Reshape grouped data to wide format works on tables in a long format with at least one predictor column, one response column, and one grouping column. An example is the ‘Indometh’ dataset with its first six rows shown in the following table.
Subject | time | conc | |
---|---|---|---|
1 | 1 | 0.25 | 1.50 |
2 | 1 | 0.50 | 0.94 |
3 | 1 | 0.75 | 0.78 |
34 | 4 | 0.25 | 1.85 |
35 | 4 | 0.50 | 1.39 |
36 | 4 | 0.75 | 1.02 |
Our target is to reshape this table into a dataset with a row for each ‘Subject’ and columns for every ‘time’ entity. The data from the column ‘conc’ should be available as data. Hence, the result should look like the following table (first 5 columns).
time | conc_first_1 | conc_first_4 | conc_first_2 | conc_first_5 |
---|---|---|---|---|
0.25 | 1.50 | 1.85 | 2.03 | 2.05 |
0.50 | 0.94 | 1.39 | 1.63 | 1.04 |
0.75 | 0.78 | 1.02 | 0.71 | 0.81 |
1.00 | 0.48 | 0.89 | 0.70 | 0.39 |
1.25 | 0.37 | 0.59 | 0.64 | 0.30 |
2.00 | 0.19 | 0.40 | 0.36 | 0.23 |
3.00 | 0.12 | 0.16 | 0.32 | 0.13 |
4.00 | 0.11 | 0.11 | 0.20 | 0.11 |
5.00 | 0.08 | 0.10 | 0.25 | 0.08 |
6.00 | 0.07 | 0.07 | 0.12 | 0.10 |
8.00 | 0.05 | 0.07 | 0.08 | 0.06 |
To achieve this result in ‘Cornerstone’ open a dataset and choose menu ‘Analysis’ -> ‘CornerstoneR’ -> ‘Reshape to Wide’ as shown in the following screenshot.
In the appearing dialog select variable ‘time’ to predictors. The unique values are shown in a new column. The response variable is ‘conc’. We want the data allocated in multiple columns with respect to the grouping variable ‘Subject’ because the response is available for every group.
‘OK’ confirms your selection and the following window appears.
Now, click the execute button (green arrow) or choose the menu ‘R Script’ -> ‘Execute’ and all calculations are done via ‘R’. Calculations are done if the text at the lower left status bar contains ‘Last execute error state: OK’. Our result is available via the menu ‘Summaries’ -> ‘Wide Data’ as shown in the following screenshot.
After clicking this menu a ‘Cornerstone’ dataset with the reshaped data opens.
Reshape to Wide can be parametrized by the ‘Script Variables’ dialog via the menu ‘R Script’ -> ‘Script variables’. The following dialog appears.
This checkbox allows the algorithm to drops rows with solely missing values. The default is checked.
The text box can hold a comma separated list of aggregation function. The default is empty, which is equivalent to ‘first’. If you select variables as ‘Auxiliaries’ in the variable selection menu they are shown in the drop-down box right from the ‘<<’ button.
Let’s take a look at the following dataset example:
Subject | time | conc | repet |
---|---|---|---|
1 | 0.25 | 1.50 | 1 |
1 | 0.25 | 1.67 | 2 |
1 | 0.25 | 1.58 | 3 |
1 | 0.50 | 0.94 | 1 |
1 | 0.50 | 0.89 | 2 |
1 | 0.50 | 0.72 | 3 |
The aggregation function is used if one response data is not identifiable by the combination of predictors and groups. In case Subject 1 for time 0.25 was measured three times. We obtain the following result using the aggregation function ‘first’, ‘mean’, and ‘sd’
time | conc_first_1 | conc_mean_1 | conc_sd_1 |
---|---|---|---|
0.25 | 1.50 | 1.583333 | 0.0850490 |
0.50 | 0.94 | 0.850000 | 0.1153256 |
As you can see new columns are added for each additional aggregation function, in this example ‘first, mean, sd’. Function ‘first’ returns the first value of the three ‘conc’ measurements 1.50, 1.67, and 1.58. Followed by its mean value ‘mean’ and standard deviation ‘sd’.
How is an auxiliary variable used in this case? In case of repeated measurements it can bee necessary to select the last or first repetition.
From the example above we would like to get the data for the highest repetition value and add ‘maxby(repet)’ to the aggregation function. As opposed to this ‘minby(repet)’ returns the ‘conc’ value for the lowest repetition value. The following table shows the corresponding results applying both functions.
time | conc_maxby_1 | conc_minby_1 |
---|---|---|
0.25 | 1.58 | 1.50 |
0.50 | 0.72 | 0.94 |
In general it is possible to use all available ‘R’ functions. Each function should return one value like the ‘mean’ or ‘sd’ function. Function which return more than one value, e.g. ‘head’, can work but it is not defined which value is shown in Cornerstone.
Reshape grouped data to long format works on tables in wide format with at least one predictor and one response column. An example is shown in the following table.
Place | Temp.Jan | Temp.July |
---|---|---|
Europe | 0 | 20 |
USA | -10 | 15 |
Asia | 20 | 50 |
Our target is to reshape this dataset into a dataset with the temperatures ‘Temp.Jan’ and ‘Temp.July’ as responses in one column and an additional column to identify the month. All predictors records should be multiplied like shown in the following table to identify each data correctly.
Place | variable | value |
---|---|---|
Europe | Jan | 0 |
USA | Jan | -10 |
Asia | Jan | 20 |
Europe | July | 20 |
USA | July | 15 |
Asia | July | 50 |
The predictor ‘time’ is available in one column because it belongs to all responses. Selected responses are split to multiple columns depending on the value in grouping column.
To achieve this result in ‘Cornerstone’ open a corresponding dataset, e.g. sample dataset ‘cities’, and choose menu ‘Analysis’ -> ‘CornerstoneR’ -> ‘Reshape to Long’ as shown in the following screenshot.
In the appearing dialog select ‘city’ as a predictor and both temperatures ‘JanTempF’ and ‘JulyTempF’ as responses. We want to stack all temperatures with respect to each city.
‘OK’ confirms your selection and the following window appears.
Before we start to reshape our data, it is necessary to check the string by which each response variable name is split. Choose menu ‘Options’ -> ‘Script Variables’ as shown in the following screenshot.
The appearing dialog shows by default an underscore (’_‘). We want to split the variable name at the word ’Temp’ because it is redundant and change the text box to ‘Temp’ as shown in the next screenshot.
If you don’t want to split the variable name at all remove all characters from the split text box. ‘OK’ confirms settings to the script variables.
Now, click the execute button (green arrow) or choose the corresponding menu ‘R Script’ -> ‘Execute’ and all calculations are done via ‘R’. The text in the lower left status bar turns to ‘Last execute error state: OK’ to identify a successful termination. Our result is available via menu ‘Summaries’ -> ‘Long Data’ as shown in the following screenshot.
After clicking a ‘Cornerstone’ dataset with the reshaped data opens.
Selected predictors are multiplied in their column. All response names are split by ‘Temp’ and available in two new columns ‘variable1’ and ‘variable2’. The corresponding value is stored in column ‘value’.
If data are available in rows, it is necessary to transpose the data set before further evaluation. An example is ‘Summary Statistic’ table from Cornerstone without groups as shown in the following table.
Statistic | MPG | Displacement | Horsepower | Weight | Acceleration |
---|---|---|---|---|---|
Count | 398.00 | 406.00 | 400.00 | 406.00 | 406.00 |
Mean | 23.52 | 194.78 | 105.08 | 2979.41 | 15.52 |
St. Dev. | 7.82 | 104.92 | 38.77 | 847.00 | 2.80 |
To use these data in further analysis or graphs, it is necessary to transpose the data so that they are available as shown in the following table.
colnames | Count | Mean | StDev |
---|---|---|---|
MPG | 398 | 23.52 | 7.82 |
Displacement | 406 | 194.78 | 104.92 |
Horsepower | 400 | 105.08 | 38.77 |
Weight | 406 | 2979.41 | 847.00 |
Acceleration | 406 | 15.52 | 2.80 |
To achieve this result in ‘Cornerstone’, select the ‘Analysis’ -> ‘CornerstoneR’ -> ‘Transpose Data’ menu from the corresponding record, as shown in the following screenshot.
In the appearing dialog select the variables ‘MPG’ to ‘Acceleration’ as predictors or responses. For this function the division is irrelevant, because all selected variables are transposed. The variable ‘Statistics’ contains names for the future columns and is optionally selected as ‘Group by’.
‘OK’ confirms your selection and the following window appears.
Now click the green button (green arrow) or select the menu ‘R Script’ -> ‘Execute’ to have all calculations performed by ‘R’. The execution was successful, if the status bar in the lower left corner shows the text ‘Last execute error state: OK’. Our result is available via the menu ‘Summaries’ -> ‘Transposed Data’ as shown in the following screenshot.
After clicking this menu, the dataset with the transposed data opens.
The ‘Transpose Data’ function converts all data into numeric values. If the conversion is not successful, missing values are inserted. This conversion can be deactivated with the ‘Script Variables’ dialog via the ‘R Script’ -> ‘Script Variables’ menu. The following dialog appears.
If you uncheck ‘Convert data to numeric’, for example, text is retained and is not replaced by a missing value.