--- title: "Introduction to GWASinspector" subtitle: "Comprehensive, efficient and easy to use quality control of genome-wide association study results" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to GWASinspector} %\VignetteEngine{knitr::rmarkdown} \usepackage[utf8]{inputenc} --- # Overview This vignette introduces the GWASinspector package, its general form and how to run the algorithm on multiple GWAS result files. Check our [website](http://GWASinspector.com) for further information and reference data sets. The manual for this package can also be accessed online from [here](http://gwasinspector.com/references/Introduction_to_GWASinspector.pdf). --- # Installation ```{r, echo = FALSE} knitr::opts_chunk$set( eval=FALSE, results = "hide", collapse = TRUE, comment = "#>", results = "asis" ) ``` 1. The easiest way to get **GWASinspector** is to install it from CRAN: ```{r} # this will automatically download and install the dependencies. install.packages("GWASinspector") ``` 1. Alternatively, you can use the installation function and zipped package from our website: ```{r} # get the installation function from our website: source('http://GWASinspector.com/references/install_GWASinspector.R') # this function will check R packages and install the dependencies from CRAN. install.GWASinspector(package.path = 'path/to/packageFile.gz') ``` --- # Required files ## Allele reference panels Comparing result files with an standard reference panel is the most important part of the QC process. This reference is used to check the alleles in the datasets and to ensure they are all in the same configuration (same strand, same coded alleles) in the post-QC data. We have created databases from the most popular refernece panel (e.g. HapMap, 1000G, HRC) which are available from our [website](http://GWASinspector.com). Database files are in SQLite format and can be downloaded as a compressed file. > Some reference panels include more than one population. The target population which should be set as a parameter in the configuration file before running the algorithm. ## The header-translation table The column names used in the input may differ between files (e.g. one file uses EFFECT_ALLELE where another uses CODED_ALLELE). This file is a table of possible column names and their standard translations. A sample file with common names is provided in the package. > A sample file including common terms is provided as part of the package and could be used as a template. The file contains a two-column table, with the left column containing the standard column-names and the right the alternatives. ## Configuration file An INI file is used to configure the parameters for running the algorithm. See the manual for details. > Key-names and section-names should not be edited or renamed. Otherwise the algorithm will not work properly. > A sample file is included in the package which should be used as a template. File paths and QC parameters are set according to comments and examples in the file. --- # Step-by-step guide to run a QC This walk-through explains how to run QC on a sample result file. ## Step 1: make sure the package is installed correctly After installation, try loading the package with the following command. ```{r} library(GWASinspector) ``` ## Step 2: check R environment Local machine and R environment can be explored by running the following function. ```{r} system_check() ``` Refer to the package dependency list in the manual for detail about mandatory and optional libraries. ## Step 3: download the standard allele-frequency reference datasets Standard allele-frequency reference datasets are available from our [website](http://GWASinspector.com). The database file should be decompressed and copied in the *references* folder [`dir_references` parameter of the config file]. > This package supports both *Rdata* and *SQLite database* files (the later is recommended). ## Step 4: get the header-translation table A copy of this file can be copied to a local folder by running the below command. This is a text file which includes most common variable/header names and can be edited according to user specifications. This file should be copied in the *references* folder [`dir_references` parameter of the config file]. The default name of this file is **alt_headers.txt**. `header_translations` field should be edited in the configuration file accordingly if this name is changed by user. ```{r} get_headerTranslation('/path/to/referenceFolder') # copy the file to selected folder ``` ## Step 5: get the configuration file The configuration file is in plain text format and is used for setting the desired parameters for running the algorithm. A template file can be copied to local folder by running the following command. The default name of this file is **config.ini**. ```{r} get_config('/home/user') # copy the file to selected folder ``` ## Step 6: modify the parameters in the configuration file Please refer to the configuration file or package manual for full detail of parameters. Parameters in this file are used for reading input files, analyzing the data and saving the reports. There are multiple lines of comment and information about each parameter (lines that start with `#` and `;` are comments and sample possible parameters, respectively). You should only change the lines that contain a key according to your specific needs. ## Step 7: run the QC function The QC is configured by the configuration (ini) file, which is imported into R through `setup_inspector` and turned into an object of the `Inspector` class. To perform the QC, process the object with `run_inspector`. A quick scan of the results can be performed via `result_inspector`, but the primary outcome of the QC are the log files and graphs generated by `run_inspector`. An exhaustive log file indicating the progress and possible warnings is also saved which can be used for localization of any problems during this run. **Example:** ```{r} ## load the package library(GWASinspector) ## import the QC-configuration file job <- setup_inspector("/home/user/config.ini") ## check the created instance ## input result files that will be inspected are also displayed job ## run the algorithm job <- run_inspector(job) ## check the results ## comprehensive report and result file are already saved in the output folder result_inspector(job) ``` --- # Test run You can run the algorithm on a sample GWAS file which is embedded in the package. Reports are generated and saved in the specified folder. ```{r} library(GWASinspector) demo_inspector('/sample_dir') ``` ---