The lwc2022 package includes a small example dataset called cog_data.
The dataset simulates cognitive scores following the methodology used in
the Health and Retirement Study (HRS), specifically focusing on tasks
like word recall, serial subtraction, and backwards counting. These
cognitive tasks are the core of the Langa-Weir classification system
used to assess cognitive function.
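To make the classification step concrete, here is a minimal sketch of how a total cognition score could be mapped to a category. It assumes the commonly cited 27-point Langa-Weir scale with cut points 0-6 (demented), 7-11 (cognitively impaired, not demented) and 12-27 (normal). The helper name classify_total is hypothetical and is not the package's classify() function, whose interface is not shown here.

```r
# Hypothetical helper, not part of lwc2022.
# Assumption: total scores range from 0 to 27 and use the cut points
# 0-6 = "Demented", 7-11 = "CIND", 12-27 = "Normal".
classify_total <- function(total) {
  cut(
    total,
    breaks = c(-Inf, 6, 11, 27),          # right-closed intervals
    labels = c("Demented", "CIND", "Normal")
  )
}

# Scores on either side of each cut point
example_totals <- c(0, 6, 7, 11, 12, 27)
example_labels <- as.character(classify_total(example_totals))
print(data.frame(total = example_totals, category = example_labels))
```

Using cut() keeps the thresholds in one place, so alternative cut points can be tried by changing only the breaks vector.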
The simulated dataset contains 10 observations and follows the
structure expected by the functions in the package (extract(), score(),
and classify()). Below, we detail the steps taken to simulate the
dataset.
The cog_data dataset contains 35 variables. A summary of its structure
is presented below:
# Load the package
library(lwc2022)
# Load the example dataset
data(cog_data)
# Display the structure of cog_data
str(cog_data)
#> 'data.frame': 10 obs. of 35 variables:
#> $ HHID : int 288941 234057 224021 785284 326317 465208 748794 293626 669691 689448
#> $ PN : int 93 99 72 26 7 42 9 83 36 78
#> $ SD182M1 : num 17 53 39 63 12 15 32 52 55 7
#> $ SD182M2 : num 9 51 10 23 27 99 63 7 63 27
#> $ SD182M3 : num 32 38 25 34 29 5 8 12 13 18
#> $ SD182M4 : num 33 67 27 25 38 21 15 51 57 26
#> $ SD182M5 : num 99 31 16 62 30 6 53 8 22 22
#> $ SD182M6 : num 39 31 58 17 64 60 59 34 4 13
#> $ SD182M7 : num 5 64 61 25 62 22 25 32 56 25
#> $ SD182M8 : num 23 35 40 58 30 12 31 67 56 30
#> $ SD182M9 : num 35 14 29 32 7 3 23 64 96 15
#> $ SD182M10: num 21 37 8 61 10 60 52 54 34 10
#> $ SD183M1 : num 22 12 20 56 17 56 64 35 40 56
#> $ SD183M2 : num 61 30 15 24 59 23 53 7 29 15
#> $ SD183M3 : num 23 26 38 56 32 7 27 52 5 6
#> $ SD183M4 : num 16 24 32 21 65 11 36 54 56 99
#> $ SD183M5 : num 19 25 39 64 26 9 7 34 58 13
#> $ SD183M6 : num 19 66 62 57 39 4 1 40 30 30
#> $ SD183M7 : num 62 25 16 24 64 11 58 20 40 3
#> $ SD183M8 : num 29 36 62 54 22 59 52 98 20 11
#> $ SD183M9 : num 67 65 8 56 21 55 2 53 13 56
#> $ SD183M10: num 6 67 8 54 32 96 36 55 14 63
#> $ SD142 : int 96 90 97 97 99 98 97 91 94 98
#> $ SD143 : int 86 86 89 90 80 98 89 92 90 90
#> $ SD144 : int 89 76 89 78 78 74 83 83 75 70
#> $ SD145 : int 69 76 76 66 68 79 65 77 76 64
#> $ SD146 : int 69 52 63 50 51 53 59 50 54 57
#> $ SD124 : int 0 0 0 0 1 1 0 1 0 0
#> $ SD129 : int 0 1 0 0 0 1 0 0 1 0
#> $ SD237WA : num -8 -8 -9 1 0 0 0 1 0 1
#> $ SD237WC : int 13 17 3 18 2 5 12 13 10 6
#> $ SD237WT : int 42 42 38 60 48 16 35 36 27 27
#> $ SD238WA : num -8 0 -8 -8 -8 -9 1 -8 -8 -8
#> $ SD238WC : int 9 7 9 4 2 12 9 11 7 13
#> $ SD238WT : int 37 43 33 19 12 34 21 17 12 30
The dataset contains variables for individual identifiers, cognition-related tasks (immediate/delayed word recall, serial subtraction, and backwards counting), and other variables necessary for scoring and classification.
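As an illustration of how the serial subtraction responses might be turned into a score, the sketch below counts correct subtractions for one respondent. It assumes that each answer is judged against the previous answer minus 7, so a single slip does not invalidate later answers. That scoring rule and the helper name score_serial7 are assumptions made for this example, not the package's score() implementation.

```r
# Hypothetical helper, not part of lwc2022.
# Assumption: answer i is correct if it equals (answer i-1) - 7,
# with the first answer compared against start - 7.
score_serial7 <- function(responses, start = 100) {
  previous <- c(start, head(responses, -1))  # value each answer is subtracted from
  sum(responses == previous - 7, na.rm = TRUE)
}

# A perfect run, and a run with one slip at the third step
perfect  <- score_serial7(c(93, 86, 79, 72, 65))
one_slip <- score_serial7(c(93, 86, 80, 73, 66))
print(c(perfect = perfect, one_slip = one_slip))
```

In the second run the third answer (80) is wrong, but 73 and 66 are each 7 less than the answer before them, so they still count under this rule.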
HHID: A unique household identifier.
PN: A unique personal identifier.
SD182M1-SD182M10: Responses for the Immediate Word Recall task.
SD183M1-SD183M10: Responses for the Delayed Word Recall task.
SD142-SD146: Responses for the Serial Subtraction task, where
participants are asked to subtract 7 from 100 iteratively five times.
SD124 and SD129: Responses for the Backwards Counting task, where
participants count backwards from 20. SD124 represents the first
attempt, and SD129 represents the second attempt.
SD237WA, SD237WC, SD237WT and SD238WA, SD238WC, SD238WT: Responses to a
mouse clicking test measuring accuracy, click counts, and click time.

The generate_example_data() function generates a dataset of size n
(10 by default), producing a set of cognitive test variables along with
unique identifiers. The output dataset is structured similarly to the
cognitive assessment data collected in the HRS.
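Before passing data to the package, it can help to confirm that the expected columns are present. The sketch below rebuilds the 35 column names from the naming pattern shown in the str() output and reports which ones a data frame lacks. The names expected_cols and check_columns are hypothetical helpers introduced for this example.

```r
# Build the 35 expected column names from the naming pattern in cog_data.
expected_cols <- c(
  "HHID", "PN",                          # identifiers
  paste0("SD182M", 1:10),                # immediate word recall
  paste0("SD183M", 1:10),                # delayed word recall
  paste0("SD", 142:146),                 # serial subtraction
  "SD124", "SD129",                      # backwards counting
  paste0("SD237W", c("A", "C", "T")),    # mouse clicking test, first set
  paste0("SD238W", c("A", "C", "T"))     # mouse clicking test, second set
)

# Hypothetical helper: return the expected columns missing from df.
check_columns <- function(df, expected = expected_cols) {
  setdiff(expected, names(df))
}

# Toy data frame that only carries the two identifiers
toy <- data.frame(HHID = 100001L, PN = 10L)
missing_cols <- check_columns(toy)
print(length(expected_cols))
print(head(missing_cols))
```

With the package installed, check_columns(cog_data) should return an empty character vector if the dataset matches the structure shown above.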
# Simulated dataset
generate_example_data <- function(n = 10) {
  data.frame(
    # Identifiers
    HHID = sample(100000:999999, n, replace = TRUE), # Random household ID
    PN = sample(1:99, n, replace = TRUE), # Random person number
    # THESE ARE THE VARIABLES USED IN THE LW CLASSIFICATIONS
    # Immediate word recall (10 items)
    SD182M1 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M2 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M3 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M4 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M5 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M6 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M7 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M8 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M9 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD182M10 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    # Delayed word recall (10 items)
    SD183M1 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M2 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M3 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M4 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M5 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M6 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M7 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M8 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M9 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    SD183M10 = sample(c(1:40, 51:67, 96, 98, 99), n, replace = TRUE),
    # Serial subtraction (Subtracting 7 from 100 five times)
    SD142 = sample(90:100, n, replace = TRUE), # First subtraction value
    SD143 = sample(80:99, n, replace = TRUE), # Second subtraction
    SD144 = sample(70:89, n, replace = TRUE), # Third subtraction
    SD145 = sample(60:79, n, replace = TRUE), # Fourth subtraction
    SD146 = sample(50:69, n, replace = TRUE), # Fifth subtraction
    # Backwards counting
    SD124 = sample(0:1, n, replace = TRUE), # Success on first try (1 = success, 0 = fail)
    SD129 = sample(0:1, n, replace = TRUE), # Success on second try (1 = success, 0 = fail)
    # RANDOM VARIABLES NOT USED IN LW CLASSIFICATIONS
    # Speed Test (Mouse clicking)
    SD237WA = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD237WC = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD237WT = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WA = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WC = sample(c(0, 1, -8, -9), n, replace = TRUE),
    SD238WT = sample(c(0, 1, -8, -9), n, replace = TRUE)
  )
}
The function returns a data frame with n rows and the 35 columns described above. Setting a seed makes the simulation reproducible; the first six rows are shown below:
set.seed(123)
cog_data <- generate_example_data()
knitr::kable(head(cog_data), caption = "Example of generated cognition data")
HHID | PN | SD182M1 | SD182M2 | SD182M3 | SD182M4 | SD182M5 | SD182M6 | SD182M7 | SD182M8 | SD182M9 | SD182M10 | SD183M1 | SD183M2 | SD183M3 | SD183M4 | SD183M5 | SD183M6 | SD183M7 | SD183M8 | SD183M9 | SD183M10 | SD142 | SD143 | SD144 | SD145 | SD146 | SD124 | SD129 | SD237WA | SD237WC | SD237WT | SD238WA | SD238WC | SD238WT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
288941 | 93 | 17 | 9 | 32 | 33 | 99 | 39 | 5 | 23 | 35 | 21 | 22 | 61 | 23 | 16 | 19 | 19 | 62 | 29 | 67 | 6 | 96 | 86 | 89 | 69 | 69 | 0 | 0 | -8 | 0 | 0 | -9 | -8 | -8 |
234057 | 99 | 53 | 51 | 38 | 67 | 31 | 31 | 64 | 35 | 14 | 37 | 12 | 30 | 26 | 24 | 25 | 66 | 25 | 36 | 65 | 67 | 90 | 86 | 76 | 76 | 52 | 0 | 1 | -8 | -9 | 0 | -9 | -9 | 1 |
224021 | 72 | 39 | 10 | 25 | 27 | 16 | 58 | 61 | 40 | 29 | 8 | 20 | 15 | 38 | 32 | 39 | 62 | 16 | 62 | 8 | 8 | 97 | 89 | 89 | 76 | 63 | 0 | 0 | -9 | 0 | 1 | -8 | 1 | 0 |
785284 | 26 | 63 | 23 | 34 | 25 | 62 | 17 | 25 | 58 | 32 | 61 | 56 | 24 | 56 | 21 | 64 | 57 | 24 | 54 | 56 | 54 | 97 | 90 | 78 | 66 | 50 | 0 | 0 | 1 | -8 | 0 | -9 | -8 | -9 |
326317 | 7 | 12 | 27 | 29 | 38 | 30 | 64 | 62 | 30 | 7 | 10 | 17 | 59 | 32 | 65 | 26 | 39 | 64 | 22 | 21 | 32 | 99 | 80 | 78 | 68 | 51 | 1 | 0 | 0 | -8 | 1 | -8 | -8 | 1 |
465208 | 42 | 15 | 99 | 5 | 21 | 6 | 60 | 22 | 12 | 3 | 60 | 56 | 23 | 7 | 11 | 9 | 4 | 11 | 59 | 55 | 96 | 98 | 98 | 74 | 79 | 53 | 1 | 1 | 0 | 1 | 1 | -8 | -8 | 1 |
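Because the generator relies on sample(), the seed set beforehand determines the simulated values. The sketch below illustrates this with a reduced stand-in generator that produces only the two identifier columns. The name toy_generator is hypothetical and is used here only to keep the example short.

```r
# Hypothetical stand-in for generate_example_data(), identifiers only.
toy_generator <- function(n = 10) {
  data.frame(
    HHID = sample(100000:999999, n, replace = TRUE),
    PN = sample(1:99, n, replace = TRUE)
  )
}

# Same seed before each call: the two data frames are identical.
set.seed(123)
first_run <- toy_generator()
set.seed(123)
second_run <- toy_generator()
same_seed_matches <- identical(first_run, second_run)

# No reset between calls: the random stream has advanced,
# so the next draw will generally differ.
third_run <- toy_generator()
print(same_seed_matches)
print(dim(third_run))
```

The same pattern applies to the full generator: calling set.seed() with a fixed value immediately before generate_example_data() should give the same data frame on each run within a given R version.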