Title: | Basic Quality Data Assurance for Epidemiological Research |
---|---|
Description: | With the provision of several tools and templates the MOSAIC project (DFG-Grant Number HO 1937/2-1) supports the implementation of central data management in epidemiological research projects. The 'MOQA' package enables epidemiologists with little or no experience in R to generate basic data quality reports for a wide range of application scenarios. See <https://mosaic-greifswald.de/> for more information. Please read and cite the corresponding open access publication (using the former package-name) in METHODS OF INFORMATION IN MEDICINE by M. Bialke, H. Rau, T. Schwaneberg, R. Walk, T. Bahls and W. Hoffmann (2017) <doi:10.3414/ME16-01-0123>. <https://methods.schattauer.de/en/contents/most-recent-articles/issue/2483/issue/special/manuscript/27573/show.html>. |
Authors: | Martin Bialke <[email protected]>, Thea Schwaneberg <[email protected]>, Rene Walk <[email protected]> |
Maintainer: | Martin Bialke <[email protected]> |
License: | AGPL-3 |
Version: | 2.0.0 |
Built: | 2024-11-20 06:33:16 UTC |
Source: | CRAN |
internal data variable
The MOSAIC Project, Martin Bialke
internal data variable
The MOSAIC Project, Martin Bialke
internal label for data variable
The MOSAIC Project, Martin Bialke
internal label for data variable
The MOSAIC Project, Martin Bialke
internal label for data variable
The MOSAIC Project, Martin Bialke
internal label for data variable
The MOSAIC Project, Martin Bialke
internal label for data variable
The MOSAIC Project, Martin Bialke
internal label for data variable
The MOSAIC Project, Martin Bialke
internal label for data variable
The MOSAIC Project, Martin Bialke
The DESCRIPTION file:
Package: | MOQA |
Type: | Package |
Title: | Basic Quality Data Assurance for Epidemiological Research |
Version: | 2.0.0 |
Date: | 2017-06-21 |
Author: | Martin Bialke <[email protected]>, Thea Schwaneberg <[email protected]>, Rene Walk <[email protected]> |
Maintainer: | Martin Bialke <[email protected]> |
Description: | With the provision of several tools and templates the MOSAIC project (DFG-Grant Number HO 1937/2-1) supports the implementation of a central data management in epidemiological research projects. The 'MOQA' package enables epidemiologists with none or low experience in R to generate basic data quality reports for a wide range of application scenarios. See <https://mosaic-greifswald.de/> for more information. Please read and cite the corresponding open access publication (using the former package-name) in METHODS OF INFORMATION IN MEDICINE by M. Bialke, H. Rau, T. Schwaneberg, R. Walk, T. Bahls and W. Hoffmann (2017) <doi:10.3414/ME16-01-0123>. <https://methods.schattauer.de/en/contents/most-recent-articles/issue/2483/issue/special/manuscript/27573/show.html>. |
License: | AGPL-3 |
Depends: | psych, gplots, grid, readr |
NeedsCompilation: | no |
Repository: | CRAN |
Packaged: | 2017-06-22 07:51:50 UTC; bialkem |
Date/Publication: | 2017-06-22 13:23:11 UTC |
Config/pak/sysreqs: | libx11-dev |
Index of help topics:
MOQA.env | MOQA.env |
codelist | codelist |
footnoteString | footnoteString |
labelCounts | labelCounts |
labelPercentage | labelPercentage |
label_boxplot | label_boxplot |
label_description | label_description |
label_normalverteilung | label_normalverteilung |
label_qnormplot | label_qnormplot |
label_unit | label_unit |
moqa | Basic Quality Data Assurance for Epidemiological Research |
mosaic.addFootnote | addFootnote |
mosaic.beginPlot | beginPlot |
mosaic.countValue | countValue |
mosaic.createSimplePdfCategorical | createSimplePdfCategorical |
mosaic.createSimplePdfCategoricalDataframe | createSimplePdfCategoricalDataframe |
mosaic.createSimplePdfMetric | createSimplePdfMetric |
mosaic.createSimplePdfMetricDataframe | createSimplePdfMetricDataframe |
mosaic.finishPlot | finishPlot |
mosaic.generateCategoricalPlot | generateCategoricalPlot |
mosaic.generateMetricPlots | generateMetricPlots |
mosaic.generateMetricTablePlot | generateMetricTablePlot |
mosaic.getTimestamp | getTimestamp |
mosaic.importToolboxSpssDataFile | importToolboxSpssDataFile |
mosaic.info | info |
mosaic.loadCsvData | loadCsvData |
mosaic.preProcessCategoricalData | preProcessCategoricalData |
mosaic.preProcessMetricData | preProcessMetricData |
mosaic.setGlobalCodelist | setGlobalCodelist |
mosaic.setGlobalDescription | setGlobalDescription |
mosaic.setGlobalMissingTreshold | setGlobalMissingTreshold |
mosaic.setGlobalUnit | setGlobalUnit |
outputPrefix | outputPrefix |
qualifiedMissingsTreshold | qualifiedMissingsTreshold |
The aim of the MOQA R package is to provide a basic assessment of data quality and to generate a set of informative graphs. In particular, researchers should not need to master R. The package enables researchers to generate reports for various kinds of metric and categorical data. In addition, it can produce general reports for multivariate input data and, if needed, detailed results for single-variable data.
CSV files as well as dataframes can be used as input to create a report. The results are saved immediately in an automatically generated PDF file. For each study variable in the input data, a separate PDF file with standard or, where applicable, customized plots and tables is produced. These standard reports let the user monitor and report data integrity and completeness. More specific reports, however, require metadata: definitions of units, variables, descriptions, code lists and categories of qualified missings.
Version 1.2
ADDED Support for metric and categorical dataframes
BUGFIX Aborted report generation in case of non-existent missings in data column
Version 2.0
RENAME Official renaming of former package name mosaicQA to MOQA
ADDED New function importToolboxSpssDataFile
Martin Bialke <[email protected]>, Thea Schwaneberg <[email protected]>, Rene Walk <[email protected]>
Maintainer: Martin Bialke <[email protected]>
mosaic-greifswald.de
## Example 1: Generate a PDF with graphs for a single metric data column, e.g. body height
# load MOQA package
library('MOQA')
# specify the CSV import file with metric data, use one column per variable
metric_datafile <- 'c:/mosaic/metric_single_var.csv'
# specify output folder
outputFolder <- 'c:/mosaic/outputs/'
# set missing threshold, optional, default is 99900
mosaic.setGlobalMissingTreshold(99900)
# set variable unit, optional
mosaic.setGlobalUnit('(cm)')
# set variable description, optional; if not set, the name of the variable is displayed
# in the table heading
mosaic.setGlobalDescription('Height')
# create PDF report, uncomment to start report generation
# mosaic.createSimplePdfMetric(metric_datafile, outputFolder)

## Example 2: Generate a PDF with graphs for a single categorical data column
# load MOQA package
library('MOQA')
# specify the import file with categorical data
# first row has to contain variable names without special characters
Categorical_datafile <- 'c:/mosaic/cat_single_var_en.csv'
# specify output folder
outputFolder <- 'c:/mosaic/outputs/'
# set threshold to detect missings, default is 99900 (adjust this line to change this
# global value, but be careful)
mosaic.setGlobalMissingTreshold(99900)
# set code list of the variable
mosaic.setGlobalCodelist(c('1=yes','2=no','99996=not specified','99997=not acquired'))
# create a simple PDF file for each variable column in the categorical data file,
# uncomment to start report generation
# mosaic.createSimplePdfCategorical(Categorical_datafile, outputFolder)

## Example 3: Generate PDFs with graphs for multiple metric data columns; generates one PDF
## for each column using the variable name for table headings
# load MOQA package
library('MOQA')
# specify the import file with metric data
# use one column per variable, first row should contain variable name, following rows should
# contain data, csv files with multiple rows are supported, decimal values should be
# formatted for example: 25.4
metric_datafile <- 'c:/mosaic/metric_multi_var.csv'
# specify output folder
outputFolder <- "c:/mosaic/outputs/"
# set threshold to detect missings, default is 99900 (adjust this line to change this
# global value, but be careful)
mosaic.setGlobalMissingTreshold(99900)
# create PDF files for the variables, uncomment to start report generation
# mosaic.createSimplePdfMetric(metric_datafile, outputFolder)

## Example 4: Generate PDFs with graphs for a metric dataframe with multiple columns;
## generates one PDF for each column using the variable name for table headings
# load MOQA package
library('MOQA')
# specify the metric dataframe with 1-n columns, here sample data is generated
metric_data <- data.frame(matrix(rnorm(20), nrow=10))
# specify output folder
outputFolder <- "c:/mosaic/outputs/"
# set threshold to detect missings, default is 99900 (adjust this line to change this
# global value, but be careful)
mosaic.setGlobalMissingTreshold(99900)
# create PDF files for the variables, uncomment to start report generation
# mosaic.createSimplePdfMetricDataframe(metric_data, outputFolder)

## Example 5: Import data from an SPSS export file generated by Toolbox for Research
## and generate a report for a specific variable
# load MOQA package
library('MOQA')
# specify import dat-file
importfile <- "c:/mosaic/import/all_in_one.dat"
# specify output folder
outputFolder <- "c:/mosaic/outputs/"
# import data
# importdata <- mosaic.importToolboxSpssDataFile(importfile)
# generate report for a specific variable, e.g. ve_temperature_ear
# pass data as dataframe to use the already given column name for a more descriptive output
# mosaic.createSimplePdfMetricDataframe(as.data.frame(importdata$ve_temperature_ear), outputFolder)
local environment to handle MOQA-internal variables
local environment
The MOSAIC Project, Martin Bialke
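The idea of keeping package-internal settings in a dedicated environment can be sketched in base R as follows. This is only an illustration of the pattern, not the MOQA source: the names `demo.env`, `set_threshold` and `get_threshold` are hypothetical.

```r
# Hypothetical stand-in for a package-local environment such as MOQA.env.
demo.env <- new.env(parent = emptyenv())

# Store a default value (99900 is the default missing threshold named in the examples).
assign("missing_threshold", 99900, envir = demo.env)

# Hypothetical setter and getter that read and write the environment.
set_threshold <- function(value) {
  assign("missing_threshold", value, envir = demo.env)
  invisible(value)
}
get_threshold <- function() {
  get("missing_threshold", envir = demo.env)
}

set_threshold(99000)
print(get_threshold())
```

Keeping such values in their own environment avoids writing to the user's global workspace.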
Add a footnote to the plot using the footnote string and the current timestamp.
mosaic.addFootnote()
Function call type: internal
The MOSAIC Project, Martin Bialke
Begin plotting the configured graphs for the loaded data and generate the output PDF file.
mosaic.beginPlot(varname, outputfolder)
varname | name of the study item or CSV column to plot graphs for |
outputfolder | name of the output folder |
Function call type: internal
The MOSAIC Project, Martin Bialke
Count occurrences of a search value in a data column.
mosaic.countValue(searchvalue, data_column)
searchvalue | value to search for |
data_column | name of study item or data column to search in |
Useful for finding qualified missings in a data column.
Count of occurrences of the specified value in the specified data column.
Function call type: internal
The MOSAIC Project, Martin Bialke
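Counting a qualified-missing code in a column can be sketched in base R as shown here. The function `count_value` is a hypothetical re-implementation for illustration, not the package's internal code, and the sample values are invented.

```r
# Hypothetical re-implementation for illustration; not the MOQA source.
count_value <- function(searchvalue, data_column) {
  # NA entries are ignored so they do not turn the sum into NA.
  sum(data_column == searchvalue, na.rm = TRUE)
}

# Invented sample column with two different qualified-missing codes.
height <- c(172, 99996, 181, 99997, 99996, NA)
print(count_value(99996, height))
```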
Create simple PDF file for categorical data
mosaic.createSimplePdfCategorical(inputfile, outputfolder)
inputfile | path to input CSV file |
outputfolder | path to output folder |
Function call type: user
The MOSAIC Project, Martin Bialke
# load MOQA package
library('MOQA')
# specify the import file with categorical data
# first row has to contain variable names without special characters
categorial_datafile <- 'c:/mosaic/cat_single_var_en.csv'
# specify output folder
outputFolder <- 'c:/mosaic/outputs/'
# set threshold to detect missings, default is 99900 (adjust this line to change this
# global value, but be careful)
mosaic.setGlobalMissingTreshold(99900)
# set code list of the variable
mosaic.setGlobalCodelist(c('1=yes','2=no','99996=not specified','99997=not acquired'))
# create a simple PDF file for each variable column in the categorical data file,
# uncomment to start report generation
# mosaic.createSimplePdfCategorical(categorial_datafile, outputFolder)
Create simple PDF file for categorical data from a dataframe
mosaic.createSimplePdfCategoricalDataframe(df, outputfolder)
df | dataframe containing categorical data |
outputfolder | path to output folder |
Function call type: user
The MOSAIC Project, Martin Bialke
Create simple PDF file for metric data
mosaic.createSimplePdfMetric(inputfile, outputfolder)
inputfile | path to input CSV file |
outputfolder | path to output folder |
Function call type: user
The MOSAIC Project, Martin Bialke
# load MOQA package
library('MOQA')
# specify the CSV import file with metric data, use one column per variable
metric_datafile <- 'c:/mosaic/metric_single_var.csv'
# specify output folder
outputFolder <- 'c:/mosaic/output/'
# set missing threshold, optional, default is 99900
mosaic.setGlobalMissingTreshold(99900)
# set variable unit, optional
mosaic.setGlobalUnit('(cm)')
# set variable description, optional
mosaic.setGlobalDescription('Height')
# create PDF report, uncomment to start report generation
# mosaic.createSimplePdfMetric(metric_datafile, outputFolder)
Create simple PDF file for metric data from a dataframe
mosaic.createSimplePdfMetricDataframe(df, outputfolder)
df | dataframe containing metric data |
outputfolder | path to output folder |
Function call type: user
The MOSAIC Project, Martin Bialke
# load MOQA package
library('MOQA')
# specify the metric dataframe with 1-n columns, here sample data is generated
metric_data <- data.frame(matrix(rnorm(20), nrow=10))
# specify output folder
outputFolder <- "c:/mosaic/outputs/"
# set threshold to detect missings, default is 99900 (adjust this line to change this
# global value, but be careful)
mosaic.setGlobalMissingTreshold(99900)
# create PDF files for the variables, uncomment to start report generation
# mosaic.createSimplePdfMetricDataframe(metric_data, outputFolder)
Finish plotting, close PDF file
mosaic.finishPlot()
Function call type: internal
The MOSAIC Project, Martin Bialke
Generate statistics and create plots for categorical data
mosaic.generateCategoricalPlot(dataframe, varname)
dataframe | data table with one or more columns (first row should contain column names/study item names/variable names) |
varname | selected column/study item/variable to plot graph for |
Function call type: internal
The MOSAIC Project, Martin Bialke
Calculate statistics and generate graphs for metric data
mosaic.generateMetricPlots(data_snippet, var_name)
data_snippet | data table with one or more columns (first row should contain column names/study item names/variable names) |
var_name | selected column/study item/variable to plot graph for |
Function call type: internal
The MOSAIC Project, Martin Bialke
Generate the missing-ratio table for metric data
mosaic.generateMetricTablePlot(data, num_of_columns, index, varname)
data | preprocessed data frame including 'valid value markers' |
num_of_columns | total number of data columns to be processed |
index | index of the column currently being processed |
varname | name of the current variable, used in the table heading |
Function call type: internal
The MOSAIC Project, Martin Bialke
Get the current timestamp formatted as %Y_%m_%d_%H%M%S
mosaic.getTimestamp()
Timestamp, e.g. '2016_09_09_143458'
Function call type: internal
The MOSAIC Project, Martin Bialke
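A timestamp in the format named above can be produced in base R with `format()` and `Sys.time()`. This is a sketch of the formatting step only; whether the package builds its timestamp exactly this way is an assumption.

```r
# Format the current system time with the pattern %Y_%m_%d_%H%M%S.
timestamp <- format(Sys.time(), "%Y_%m_%d_%H%M%S")
print(timestamp)
```

Because the pattern contains no spaces or colons, the result can be used directly in file names.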
Load a tab-separated dat-file with n columns from a 'Toolbox for Research' SPSS export into a dataframe
mosaic.importToolboxSpssDataFile(filename)
filename | filename or a complete path to a dat-file |
Function call type: user
The MOSAIC Project, Martin Bialke
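Reading a tab-separated dat-file into a dataframe can be sketched with base R's `read.delim`. This is not the package's import routine: the presence of a header row and the sample contents are assumptions, and the column `patient_id` is invented for the illustration.

```r
# Write a small tab-separated sample file; the contents are invented.
sample_file <- tempfile(fileext = ".dat")
writeLines(c("patient_id\tve_temperature_ear",
             "1\t36.8",
             "2\t99996"), sample_file)

# Assumption: the export has a header row and uses "." as the decimal mark.
importdata <- read.delim(sample_file, sep = "\t", header = TRUE,
                         stringsAsFactors = FALSE)
str(importdata)
```

A single column of the result can then be wrapped with `as.data.frame()` and passed to the dataframe-based report functions, as in Example 5.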
MOSAIC Information
mosaic.info()
Function call type: user
The MOSAIC Project, Martin Bialke
Load data from a CSV file with one or more columns. The first row should contain the name of the study item, e.g. 'height'.
mosaic.loadCsvData(filename)
filename | filename or a complete path to a file |
Function call type: user
The MOSAIC Project, Martin Bialke
Identify unique values in a data column and compute absolute, percentage and cumulative statistics
mosaic.preProcessCategoricalData(data)
data | data frame to be processed containing categorical data |
Function call type: internal
The MOSAIC Project, Martin Bialke
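The kind of frequency summary described here (unique values with absolute, percentage and cumulative figures) can be sketched in base R. The vector `answers` and the column names of `summary_table` are invented for the illustration and do not reflect the package's internal data layout.

```r
# Invented categorical answers, including one qualified-missing code.
answers <- c(1, 2, 1, 1, 99996, 2)

# Absolute frequency per unique value, sorted by value.
counts <- table(answers)
percent <- 100 * as.numeric(counts) / length(answers)

summary_table <- data.frame(
  value = names(counts),
  count = as.integer(counts),
  percent = percent,
  cumulative_percent = cumsum(percent)
)
print(summary_table)
```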
Pre-process metric data to enable the missing-ratio table
mosaic.preProcessMetricData(data)
data | data frame to be preprocessed containing metric data |
Function call type: internal
The MOSAIC Project, Martin Bialke
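How a missing threshold separates qualified missings from valid values can be sketched as follows. The comparison rule (values at or above the threshold count as missing) is an assumption made for this illustration, and the sample data are invented; the package's actual pre-processing may differ.

```r
threshold <- 99900
height <- c(172.5, 181.0, 99996, 165.2, 99997)

# Assumption: a value at or above the threshold counts as a qualified missing.
is_missing <- height >= threshold

# Share of missings among all entries, and the remaining valid values.
missing_ratio <- mean(is_missing)
valid_values <- height[!is_missing]

print(missing_ratio)
print(summary(valid_values))
```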
Set and parse a global code list for categorical data to be used in categorical plot descriptions
mosaic.setGlobalCodelist(coding)
coding | list of code and value pairs, see example for details |
Function call type: user
The MOSAIC Project, Martin Bialke
mosaic.setGlobalCodelist(c('1=yes','2=no', '99996=no information'))
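Parsing 'code=label' strings of the form shown in the example can be sketched in base R. This is an illustration of the input format, not the package's parser; the name `codelist_lookup` is hypothetical, and the sketch assumes that labels contain no further '=' character.

```r
coding <- c('1=yes', '2=no', '99996=no information')

# Split each entry at "=" into code and label (assumes one "=" per entry).
parts <- strsplit(coding, "=", fixed = TRUE)
codes <- vapply(parts, function(p) p[1], character(1))
labels <- vapply(parts, function(p) p[2], character(1))

# Named lookup vector: code -> label.
codelist_lookup <- setNames(labels, codes)
print(codelist_lookup[["99996"]])
```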
Set a global, user-defined description for the variable. Especially useful when plotting graphs for a selected data column.
mosaic.setGlobalDescription(value)
value | string value to be used as study item description, e.g. 'waist circumference' |
Function call type: user
The MOSAIC Project, Martin Bialke
mosaic.setGlobalDescription('waist circumference')
Set the global threshold for missings, e.g. 99000
mosaic.setGlobalMissingTreshold(value)
value | threshold to separate missings from valid values |
Function call type: user
The MOSAIC Project, Martin Bialke
mosaic.setGlobalMissingTreshold(99000)
Set the global unit label to be used in graphs, e.g. '(cm)'
mosaic.setGlobalUnit(value)
value | unit string to be used in graphs |
Function call type: user
The MOSAIC Project, Martin Bialke
mosaic.setGlobalUnit('(cm)')
internal data variable
The MOSAIC Project, Martin Bialke
internal data variable
The MOSAIC Project, Martin Bialke