Title: | Integrating Multiple Modalities of High Throughput Assays Using Item Response Theory |
---|---|
Description: | Provides a systematic framework for integrating multiple modalities of assays profiled on the same set of samples. The goal is to identify genes that are altered in cancer either marginally or consistently across different assays. The heterogeneity among different platforms and different samples is automatically adjusted so that the overall alteration magnitude can be accurately inferred. See Tong and Coombes (2012) <doi:10.1093/bioinformatics/bts561>. |
Authors: | Pan Tong, Kevin R Coombes |
Maintainer: | Kevin R. Coombes <[email protected]> |
License: | Apache License (== 2.0) |
Version: | 1.0.7 |
Built: | 2024-10-30 06:51:33 UTC |
Source: | CRAN |
Given the original binary matrix and item parameters, calculate the permuted latent trait by gene sampling. This function permutes the binary matrix within columns and recomputes the latent trait using the pre-specified item parameters and the permuted binary matrix.
calculatePermutedScoreByGeneSampling(originalMat, dscrmn = dscrmn, dffclt = dffclt, c = rep(0, length(dffclt)), fold = 1, parallel=FALSE)
originalMat |
The original response matrix |
dscrmn |
The estimated item discrimination parameter |
dffclt |
The estimated item difficulty parameter |
c |
The estimated item guessing parameter if available |
fold |
The number of genes to sample, expressed as a multiple of the number of genes present. Default is 1, meaning that as many genes are sampled as are present. Increasing fold increases the precision of the estimated empirical P value. |
parallel |
Logical indicating whether to use parallel computing with foreach package as backend. |
Both gene sampling and sample label permutation can be used to infer the null distribution of latent traits. For sample label permutation, one can first construct the binary matrix after permuting the sample labels and then pass it to the computeAbility() function together with the item parameters.
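The label-shuffling step of sample label permutation can be sketched in base R as follows. This is a minimal illustration under stated assumptions: tumor and normal columns are pooled and the labels reshuffled at random. `permuteSampleLabels` is a hypothetical helper, not a function of this package, and the subsequent dichotomization and computeAbility() calls are omitted.

```r
# Hypothetical helper (not part of the package): shuffle tumor/normal labels.
permuteSampleLabels <- function(tumor, normal) {
  pooled <- cbind(tumor, normal)            # genes in rows, all samples in columns
  nTumor <- ncol(tumor)
  shuffled <- sample.int(ncol(pooled))      # random reordering of the sample columns
  list(
    tumor  = pooled[, shuffled[seq_len(nTumor)], drop = FALSE],
    normal = pooled[, shuffled[-seq_len(nTumor)], drop = FALSE]
  )
}

set.seed(1)
tumor  <- matrix(rnorm(5 * 6), nrow = 5)    # 5 genes, 6 tumor samples
normal <- matrix(rnorm(5 * 3), nrow = 5)    # 5 genes, 3 normal samples
permuted <- permuteSampleLabels(tumor, normal)
dim(permuted$tumor)
dim(permuted$normal)
```

The permuted tumor and normal matrices keep their original sizes, so they can be dichotomized in the same way as the observed data.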
A vector of null latent traits by gene sampling
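The within-column permutation that underlies gene sampling can be illustrated with a short base R sketch. It assumes the simplest case (fold = 1, each column shuffled independently without replacement); `permuteWithinColumns` is a hypothetical helper and may differ from the package's internal implementation.

```r
# Hypothetical helper (not part of the package): shuffle each column independently.
permuteWithinColumns <- function(mat) {
  permuted <- mat
  for (j in seq_len(ncol(mat))) {
    # reorder the genes (rows) of column j at random
    permuted[, j] <- mat[sample.int(nrow(mat)), j]
  }
  permuted
}

set.seed(1)
X <- matrix(rbinom(20 * 4, size = 1, prob = 0.3), nrow = 20, ncol = 4)
Xperm <- permuteWithinColumns(X)
# each sample keeps its number of altered genes; only the gene assignment changes
colSums(X)
colSums(Xperm)
```

Because every column keeps its own proportion of 1's, the item (sample) characteristics are preserved while the association between genes and alterations is broken.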
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
David Magis, Gilles Raiche (2012). Random Generation of Response Patterns under Computerized Adaptive Testing with the R Package catR. Journal of Statistical Software, 48(8), 1-31.
fitOnSinglePlat, intIRTeasyRun, computeAbility
# number of items and number of genes
nSample <- 10
nGene <- 2000
set.seed(1000)
a <- rgamma(nSample, shape=1, scale=1)
b <- rgamma(nSample, shape=1, scale=1)
# true latent traits
theta <- rnorm(nGene, mean=0)
# probability of correct response (P_ij) for gene i in sample j
P <- matrix(NA, nrow=nGene, ncol=nSample)
for(i in 1:nSample){
  P[, i] <- exp(a[i]*(theta-b[i]))/(1+exp(a[i]*(theta-b[i])))
}
# binary matrix
X <- matrix(NA, nrow=nGene, ncol=nSample)
for(i in 1:nSample){
  X[, i] <- rbinom(nGene, size=1, prob=P[, i])
}
# IRT fitting
fit2PL <- fitOnSinglePlat(X, model=3)
dffclt <- coef(fit2PL$fit)[, 'Dffclt']
dscrmn <- coef(fit2PL$fit)[, 'Dscrmn']
# estimated null latent trait by gene sampling
scoreNull <- calculatePermutedScoreByGeneSampling(X, dffclt=dffclt, dscrmn=dscrmn, fold=1)
This function calculates the MLE of latent traits for a given response matrix with rows being examinees and columns being items for given item parameters.
computeAbility(respMat, dscrmn = dscrmn, dffclt = dffclt, c = rep(0, length(dffclt)), parallel=FALSE)
respMat |
The response matrix of 0 and 1 with rows being examinees and columns being items. |
dscrmn |
A vector of item discrimination parameters. |
dffclt |
A vector of item difficulty parameters. |
c |
A vector of guessing parameters. Default is all 0, indicating that no guessing is allowed. |
parallel |
Logical indicating whether to use parallel computing with foreach package as backend. |
This function is a wrapper of the thetaEst() function from catR package (Magis, 2012).
A vector of latent trait estimates for each examinee.
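To make the estimation step concrete, the following base R sketch computes the maximum likelihood estimate of a single latent trait for known item parameters. It is an illustration only and does not reproduce the catR implementation: `icc` and `mleTheta` are hypothetical helpers, and the bounded search interval of -4 to 4 is an assumption made here.

```r
# Item characteristic function of the logistic IRT model (c = 0 gives the 2PL model).
icc <- function(theta, a, b, c = 0) {
  c + (1 - c) * plogis(a * (theta - b))
}

# Hypothetical helper: MLE of theta for one response pattern by 1-D optimization.
mleTheta <- function(resp, a, b, c = rep(0, length(b)), range = c(-4, 4)) {
  negLogLik <- function(theta) {
    p <- icc(theta, a, b, c)
    -sum(resp * log(p) + (1 - resp) * log(1 - p))
  }
  optimize(negLogLik, interval = range)$minimum
}

a <- c(1.0, 1.5, 0.8, 1.2, 2.0)     # discrimination parameters
b <- c(-1.0, -0.5, 0.0, 0.5, 1.0)   # difficulty parameters
thetaLow  <- mleTheta(c(1, 0, 0, 0, 0), a, b)   # altered in one sample only
thetaHigh <- mleTheta(c(1, 1, 1, 1, 0), a, b)   # altered in four of five samples
c(thetaLow, thetaHigh)
```

A gene altered in more samples receives a larger latent trait estimate, which is the quantity the package uses to rank genes.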
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
David Magis, Gilles Raiche (2012). Random Generation of Response Patterns under Computerized Adaptive Testing with the R Package catR. Journal of Statistical Software, 48(8), 1-31.
fitOnSinglePlat, intIRTeasyRun, calculatePermutedScoreByGeneSampling
# number of items and number of genes
nSample <- 10
nGene <- 2000
set.seed(1000)
a <- rgamma(nSample, shape=1, scale=1)
b <- rgamma(nSample, shape=1, scale=1)
# true latent traits
theta <- rnorm(nGene, mean=0)
# probability of correct response (P_ij) for gene i in sample j
P <- matrix(NA, nrow=nGene, ncol=nSample)
for(i in 1:nSample){
  P[, i] <- exp(a[i]*(theta-b[i]))/(1+exp(a[i]*(theta-b[i])))
}
# binary matrix
X <- matrix(NA, nrow=nGene, ncol=nSample)
for(i in 1:nSample){
  X[, i] <- rbinom(nGene, size=1, prob=P[, i])
}
# IRT fitting
fit2PL <- fitOnSinglePlat(X, model=3)
dffclt <- coef(fit2PL$fit)[, 'Dffclt']
dscrmn <- coef(fit2PL$fit)[, 'Dscrmn']
# estimated latent trait
score <- computeAbility(X, dffclt=dffclt, dscrmn=dscrmn)
This function provides a common interface for the user so that data dichotomization can be done easily.
dichotomize(mat, matCtr, assayType = c("Expr", "Methy", "CN"), ...)
mat |
A matrix, either expression, methylation or CN |
matCtr |
A matrix corresponding to normal controls. |
assayType |
A character string specifying the assay type. It must be one of "Expr", "Methy" or "CN". For any other assay type, the program will quit. To run intIRT on other assay types, the user can manually dichotomize the data and feed them into the intIRTeasyRun function. |
... |
Additional parameters to be passed to the specific dichotomization function. |
A binary matrix of the same dimension as the input mat.
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
Tong P, Coombes KR. integIRTy: a method to identify altered genes in cancer accounting for multiple mechanisms of regulation using item response theory. Bioinformatics, 2012 Nov 15; 28(22):2861–9.
dichotomizeCN, dichotomizeExpr, dichotomizeMethy
data(OV)
binDat_expr <- dichotomize(Expr_T[1:20, ], Expr_N[1:20, ], assayType='Expr')
binDat_methy <- dichotomize(Methy_T[1:20, ], Methy_N[1:20, ], assayType='Methy')
A simple dichotomization procedure is implemented for CN data that only requires two cutoffs.
dichotomizeCN(CN, CNctr = NULL, tau1 = -0.3, tau2 = 0.3)
CN |
A matrix of gene wise copy number data for tumor samples. Rows are the genes; columns are the samples. |
CNctr |
A matrix of copy number data for normal samples. The program first guesses whether the data are paired by checking whether the tumor and normal matrices have the same number of samples. If so, the normal samples are subtracted element by element to correct for germline CN change. Otherwise, no correction is performed and the program proceeds with the tumor data only. |
tau1 |
The lower bound of the log2 ratio for coding a value as 0. |
tau2 |
The upper bound of the log2 ratio for coding a value as 0. A log2 ratio between tau1 and tau2 is converted to 0; any other value is converted to 1. |
Returns a binary matrix of the same dimension as the input. Missing values are propagated into the binary matrix.
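The two-cutoff rule and the paired correction described above can be sketched in base R as follows. `dichotomizeCNSketch` is a hypothetical stand-in written for illustration, not the package function; treating values exactly equal to tau1 or tau2 as unaltered is an assumption, since the boundary handling is not stated here.

```r
# Hypothetical sketch of the two-cutoff coding (not the package implementation).
dichotomizeCNSketch <- function(CN, CNctr = NULL, tau1 = -0.3, tau2 = 0.3) {
  # assume paired data when tumor and normal have the same number of samples
  if (!is.null(CNctr) && ncol(CNctr) == ncol(CN)) {
    CN <- CN - CNctr
  }
  # 0 inside [tau1, tau2], 1 outside; NA stays NA
  ifelse(CN >= tau1 & CN <= tau2, 0L, 1L)
}

CN <- matrix(c(-1.2, 0.1, NA, 0.45, -0.05, 0.3), nrow = 3)
binary <- dichotomizeCNSketch(CN)
binary
```

The result has the same dimension as the input, and the missing entry remains missing.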
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
Tong P, Coombes KR. integIRTy: a method to identify altered genes in cancer accounting for multiple mechanisms of regulation using item response theory. Bioinformatics, 2012 Nov 15; 28(22):2861–9.
dichotomizeExpr, dichotomizeMethy
data(OV)
binDat <- dichotomizeCN(CN_T[1:20, ], CN_N[1:20, ])
binDat[15:20, 1:2]
This function implements the z-like metric described in the paper.
dichotomizeExpr(expr, exprCtr, refUseMean = FALSE, BIthr = NULL, tau1 = -2.5, tau2 = 2.5, parallel = FALSE)
expr |
The expression matrix for tumor samples. Rows are genes and columns are samples. |
exprCtr |
Expression matrix of normal controls. Genes should be exactly the same as in the tumor matrix. The sample size does not need to match that of the tumor samples. |
refUseMean |
Logical indicating whether to use mean of normal sample as reference. Default is set to FALSE which means to use median as it is more robust. |
BIthr |
Threshold of bimodality index to flag bimodal genes. If not specified, it will be set according to the sample size of tumor samples. Specifically, if tumor sample size is over 100, BIthr=1.1. If sample size is between 50 and 100, BIthr=1.5. If sample size is below 50, BIthr=2.0. |
tau1 |
Lower bound of z-like metric to be coded as 0. |
tau2 |
Upper bound of z-like metric to be coded as 0. A z-like metric between tau1 and tau2 is converted to 0; any other value is converted to 1. |
parallel |
Logical indicating whether to use parallel backend provided by foreach and related packages. |
The parallelism is written to speed up the BI computation. If the number of genes is not large (i.e., below 4000), we recommend not using parallel, since it will only slow down the computation. Apart from the BI computation, all other operations are vectorized.
A binary matrix of the same dimension as the input expr. Missing values are propagated into the binary matrix.
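The default rule for BIthr stated in the argument description can be written as a small base R function. `chooseBIthr` is a hypothetical helper for illustration; how sample sizes of exactly 50 or 100 are handled is an assumption here (both are assigned 1.5).

```r
# Hypothetical helper: default bimodality index threshold by tumor sample size.
chooseBIthr <- function(nTumor) {
  if (nTumor > 100) {
    1.1          # large cohorts: a lower threshold suffices
  } else if (nTumor >= 50) {
    1.5          # between 50 and 100 samples
  } else {
    2.0          # small cohorts: require stronger evidence of bimodality
  }
}

sapply(c(30, 75, 150), chooseBIthr)
```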
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
Tong P, Coombes KR. integIRTy: a method to identify altered genes in cancer accounting for multiple mechanisms of regulation using item response theory. Bioinformatics, 2012 Nov 15; 28(22):2861–9.
dichotomizeCN, dichotomizeMethy, dichotomize
data(OV)
binDat <- dichotomizeExpr(Expr_T[1:200, ], Expr_N[1:200, ])
#binDat <- dichotomizeExpr(Expr_T[1:200, ], Expr_N[1:200, ], parallel=TRUE)
binDat[15:20, 1:2]
This function implements the procedure for dichotomizing methylation data described in the paper.
dichotomizeMethy(methy, methyCtr, refUseMean = FALSE)
methy |
The methylation matrix for tumor samples. Each element represents the beta value which is bounded between 0 and 1. Rows are genes and columns are samples. |
methyCtr |
Methylation matrix of normal controls. Genes should be exactly the same as in the tumor matrix. The sample size does not need to match that of the tumor samples. |
refUseMean |
Logical indicating whether to use mean of normal sample as reference. Default is set to FALSE which means to use median as it is more robust. |
A binary matrix of the same dimension as the input methy.
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
Tong P, Coombes KR. integIRTy: a method to identify altered genes in cancer accounting for multiple mechanisms of regulation using item response theory. Bioinformatics, 2012 Nov 15; 28(22):2861–9.
dichotomizeCN, dichotomizeExpr, dichotomize
data(OV)
binDat <- dichotomizeMethy(Methy_T[1:200, ], Methy_N[1:200, ])
binDat[15:20, 1:2]
This function fits the Item Response Model for one platform. It assumes the user has already dichotomized the data.
fitOnSinglePlat(data, model = 2, guessing = FALSE, sampleIndices = 1:ncol(data), geneIndices = 1:nrow(data), ...)
data |
A matrix of 0's and 1's with rows being genes (treated as examinees) and columns being samples (treated as items). |
model |
IRT model. 1: the Rasch model, where all item discriminations are set to 1; 2: all item discriminations are set to be equal, but not necessarily to 1; 3: the 2PL model, where no constraint is put on the item difficulty and discrimination parameters. |
guessing |
A logical variable indicating whether to include guessing parameter in the model. |
sampleIndices |
Indices of the samples to be fed into the model. Default is to use all samples. |
geneIndices |
Indices of the genes to be fed into the model. Default is to use all genes. |
... |
Additional options available in ltm package. Currently not used in intIRT package. |
A list giving the estimated IRT model and related information
fit |
An object returned by the ltm package. Item parameters and other auxiliary information (i.e. log-likelihood, convergence, Hessian) can be accessed from this object. For more details, please refer to the ltm package. |
model |
The model type |
guessing |
The guessing parameter |
sampleIndices |
The sample indices used in the model |
geneIndices |
The gene indices used in the model |
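For reference, the model options above can be related to the usual logistic item characteristic function. The formula below uses the standard IRT parameterization, which matches the simulation code in the examples; the symbols (gene i, sample j, discrimination a_j, difficulty b_j, guessing c_j, latent trait theta_i) are notation chosen here, and treating this as the exact internal parameterization of ltm is an assumption.

```latex
P(X_{ij} = 1 \mid \theta_i)
  = c_j + (1 - c_j)\,
    \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}}
```

With guessing = FALSE, c_j = 0 for all samples. Under this notation, model = 1 fixes a_j = 1, model = 2 imposes a common value a_j = a, and model = 3 leaves each a_j free.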
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
Rizopoulos, D. (2006) ltm: An R package for latent variable modelling and item response theory analyses. Journal of Statistical Software, 17(5), 1-25.
computeAbility, intIRTeasyRun, calculatePermutedScoreByGeneSampling
# number of items and number of genes
nSample <- 10
nGene <- 2000
set.seed(1000)
a <- rgamma(nSample, shape=1, scale=1)
b <- rgamma(nSample, shape=1, scale=1)
# true latent traits
theta <- rnorm(nGene, mean=0)
# probability of correct response (P_ij) for gene i in sample j
P <- matrix(NA, nrow=nGene, ncol=nSample)
for(i in 1:nSample){
  P[, i] <- exp(a[i]*(theta-b[i]))/(1+exp(a[i]*(theta-b[i])))
}
# binary matrix
X <- matrix(NA, nrow=nGene, ncol=nSample)
for(i in 1:nSample){
  X[, i] <- rbinom(nGene, size=1, prob=P[, i])
}
# IRT fitting
fit2PL <- fitOnSinglePlat(X, model=3)
Fits IRT models on each of the specified platforms and calculates the integrated latent trait. If required, the permuted latent trait by gene sampling is also calculated. An option for parallel computing is implemented to speed up the computation.
intIRTeasyRun(platforms, model = 3, guessing = FALSE, addPermutedScore = FALSE, fold = 1, echo = TRUE, parallel = FALSE)
platforms |
A list of response matrices representing different platforms. The number of rows (genes) must be equal across matrices, while the number of columns (samples) can differ. |
model |
The model type as described in fitOnSinglePlat. |
guessing |
A logical variable indicating whether to include guessing parameter in the model. |
addPermutedScore |
A logical variable indicating whether to also calculate permuted latent trait by gene sampling. |
fold |
The fold of sampling to calculate permuted score as used in calculatePermutedScoreByGeneSampling(). Only relevant when addPermutedScore=TRUE is used. |
echo |
A logical variable indicating whether to print out the progress information. |
parallel |
Logical indicating whether to use parallel computing with foreach package as backend. |
Parallel computing uses foreach and related packages as the backend. The parallelism assumes that computation on each platform takes a similar amount of time, and that the latent trait computation on the integrated data is comparable to that on an individual platform. By default, all parallel options are set to FALSE. Parallelism happens at the level of the individual assays and the combined data; no parallelism happens across genes, since the data transfer would only slow down the computation.
A list with following elements:
fits |
Model fits for each platform as returned by fitOnSinglePlat function |
estimatedScoreMat |
A matrix of estimated latent traits. The first several columns correspond to the individual assays; the last column represents the integrated latent trait with all data. |
permutedScoreMat |
A matrix of latent trait estimates after permuting the binary matrix within columns. This is only available if addPermutedScore is set to TRUE. The first several columns correspond to the individual assays; the last column represents the integrated data. |
dscrmnList |
A list of discrimination parameters. Each element contains all of the discrimination parameters as a vector for each assay. The last element contains the discrimination parameters for the integrated data which is formed by combining discrimination parameters from each assay sequentially. |
dffcltList |
Same format as dscrmnList except it contains difficulty parameter. |
gussngList |
Same format as dscrmnList except it contains guessing parameter. By default, this is just all 0's. |
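The layout of the parameter lists can be illustrated with a small base R sketch. The element names (`Expr`, `Methy`, `integrated`) and the numeric values are made up for illustration, and forming the integrated data by binding the binary matrices column-wise is an assumption consistent with the sequential concatenation of parameters described above.

```r
set.seed(1)
exprBin  <- matrix(rbinom(4 * 3, size = 1, prob = 0.5), nrow = 4)  # 4 genes, 3 samples
methyBin <- matrix(rbinom(4 * 2, size = 1, prob = 0.5), nrow = 4)  # 4 genes, 2 samples

# integrated data: same genes (rows), samples from all assays side by side
integratedBin <- cbind(exprBin, methyBin)

# per-assay discrimination parameters (illustrative values only)
dscrmnList <- list(Expr = c(1.2, 0.9, 1.4), Methy = c(1.5, 0.7))
# last element: parameters of all assays concatenated in assay order
dscrmnList$integrated <- unlist(dscrmnList, use.names = FALSE)

length(dscrmnList$integrated) == ncol(integratedBin)
```

Each column of the integrated matrix therefore has exactly one discrimination parameter, in the same order as the assays were supplied.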
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
Tong P, Coombes KR. integIRTy: a method to identify altered genes in cancer accounting for multiple mechanisms of regulation using item response theory. Bioinformatics, 2012 Nov 15; 28(22):2861–9.
intIRTeasyRunFromRaw, fitOnSinglePlat, calculatePermutedScoreByGeneSampling
This function performs data dichotomization, IRT fitting on individual assay, latent trait estimation for integrated data and significance assessment of latent trait by permutation. An option for parallel computing is implemented to speed up the computation.
intIRTeasyRunFromRaw(platforms, platformsCtr, assayType = c("Expr", "Methy", "CN"), model = 3, guessing = FALSE, permutationMethod = NULL, fold = 1, nPerm = 200, echo = TRUE, parallel = FALSE, ...)
platforms |
A list of matrices of the raw data for tumor samples. The matrices should have equal row number corresponding to the same set of genes. The columns representing the tumor samples can differ. |
platformsCtr |
A list of matrices of the raw data for normal control samples. The matrices should have equal row number corresponding to the same set of genes. The column number of each matrix can differ. When normal control is not available, i.e. in CN data, use NA instead. |
assayType |
A vector of assay types. Each entry must be one of "Expr", "Methy" or "CN", given in the order of the assays in the input platforms. For assays other than these three types, we recommend that the user dichotomize the data first and use the intIRTeasyRun function. |
model |
The model type as described in fitOnSinglePlat. 1: the Rasch model, where all item discriminations are set to 1; 2: all item discriminations are set to be equal, but not necessarily to 1; 3: the 2PL model, where no constraint is put on the item difficulty and discrimination parameters. |
guessing |
A logical variable indicating whether to include guessing parameter in the model. |
permutationMethod |
Which permutation method to use. It can only be 'gene sampling', 'sample label permutation' or NULL. If NULL, no permutation is performed. |
fold |
The fold of sampling to calculate permuted score as used in calculatePermutedScoreByGeneSampling(). Only relevant when permutationMethod='gene sampling' is used. |
nPerm |
Number of permutations for sample label permutation. It is effective only when permutationMethod='sample label permutation'. |
echo |
A logical variable indicating whether to print out the progress information. |
parallel |
Logical indicating whether to use parallel computing with foreach package as backend. |
... |
Additional parameters to be passed to the dichotomization functions. |
A list quite similar to the results returned by intIRTeasyRun. The following elements are included:
fits |
Model fits for each platform as returned by fitOnSinglePlat function |
estimatedScoreMat |
A matrix of estimated latent traits. The first several columns correspond to the individual assays; the last column represents the integrated latent trait with all data. |
permutedScoreMat |
A matrix of latent trait estimates after permuting the binary matrix within columns. This is only available if permutationMethod='gene sampling' is used. The first several columns correspond to the individual assays; the last column represents the integrated data. |
dscrmnList |
A list of discrimination parameters. Each element contains all of the discrimination parameters as a vector for each assay. The last element contains the discrimination parameters for the integrated data which is formed by combining discrimination parameters from each assay sequentially. |
dffcltList |
Same format as dscrmnList except it contains difficulty parameter. |
gussngList |
Same format as dscrmnList except it contains guessing parameter. By default, this is just all 0's. |
permutedScoreMatWithLabelPerm |
A matrix of latent trait estimates using sample label permutation. This is only available if permutationMethod='sample label permutation' is used. The first several columns correspond to the individual assays; the last column represents the integrated data. |
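One way the permuted scores might be used for significance assessment is to compare each observed latent trait with the null scores. The base R sketch below is an illustration under stated assumptions, not the procedure of the paper: `empiricalP` is a hypothetical helper, the test is one-sided (large latent traits are taken as evidence of alteration), and the add-one correction is a choice made here.

```r
# Hypothetical helper: one-sided empirical P value with add-one correction.
empiricalP <- function(observed, nullScores) {
  nullScores <- nullScores[!is.na(nullScores)]
  sapply(observed, function(score) {
    # share of null scores at least as large as the observed score
    (1 + sum(nullScores >= score)) / (1 + length(nullScores))
  })
}

nullScores <- c(-1, 0, 1, 2)          # toy null latent traits
pvals <- empiricalP(c(0, 10), nullScores)
pvals
```

With more null scores (for example a larger fold in gene sampling), the smallest attainable P value decreases, which is why a larger fold gives more precise estimates.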
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
Tong P, Coombes KR. integIRTy: a method to identify altered genes in cancer accounting for multiple mechanisms of regulation using item response theory. Bioinformatics, 2012 Nov 15; 28(22):2861–9.
intIRTeasyRun, fitOnSinglePlat, calculatePermutedScoreByGeneSampling
data(OV)
#
controlList <- list(Expr_N, Methy_N, CN_N)
tumorList <- list(Expr_T, Methy_T, CN_T)
# not run as it takes time
#runFromRaw <- intIRTeasyRunFromRaw(platforms=tumorList,
#             platformsCtr=controlList,
#             assayType=c("Expr", "Methy", "CN"),
#             permutationMethod="gene sampling")
Six matrices containing a subset of TCGA ovarian cancer data.
data(OV)
Each of the six objects (CN_N, CN_T, Methy_N, Methy_T, Expr_N, Expr_T) is a matrix with rows for 1000 (matched) genes and columns as samples. Gene expression, methylation and copy number data for 30 tumor samples and around 10 normal samples are provided.
This data is a subset of the TCGA ovarian cancer datasets. The full datasets can be downloaded through the TCGA data portal at: http://cancergenome.nih.gov/
Cancer Genome Atlas Research Network (2011). Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-615
This function generates binary response matrix according to the Item Characteristic Function for specified item parameter and latent traits. It can be used for simulation purposes.
simulateBinaryResponseMat(a = a, b = b, theta = theta)
a |
A vector of item discrimination parameter |
b |
A vector of item difficulty parameter |
theta |
A vector of true latent traits |
This function is not necessary for the integration purpose. It serves as a utility function to help the user conduct simulation.
A matrix of 0's and 1's where rows are genes (examinees) and columns are samples (items).
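The simulation step can also be written in vectorized base R. The sketch below assumes the two-parameter logistic item characteristic function without guessing, as in the loop-based examples elsewhere in this manual; `simulateBinarySketch` is a hypothetical helper and is not claimed to match the package function line by line.

```r
# Hypothetical vectorized sketch of binary response simulation under the 2PL model.
simulateBinarySketch <- function(a, b, theta) {
  # logit[i, j] = a[j] * (theta[i] - b[j]) for gene i and sample j
  logit <- sweep(outer(theta, b, "-"), 2, a, "*")
  prob <- plogis(logit)
  # one Bernoulli draw per gene/sample pair, filled column by column
  matrix(rbinom(length(prob), size = 1, prob = prob),
         nrow = length(theta), ncol = length(b))
}

set.seed(1000)
a <- rgamma(10, shape = 1, scale = 1)   # discrimination for 10 samples
b <- rgamma(10, shape = 1, scale = 1)   # difficulty for 10 samples
theta <- rnorm(200)                      # latent traits for 200 genes
simMat <- simulateBinarySketch(a, b, theta)
dim(simMat)
```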
Pan Tong ([email protected]), Kevin R Coombes ([email protected])
computeAbility, fitOnSinglePlat, intIRTeasyRun
# number of samples and genes to simulate
nSample <- 50
nGene <- 1000
# mean and variance of item parameters
meanDffclt_Expr <- 3; varDffclt_Expr <- 0.2
meanDscrmn_Expr <- 1.5; varDscrmn_Expr <- 0.1
# generate item parameters from gamma distribution
set.seed(1000)
Dffclt_Expr <- rgamma(nSample, shape=meanDffclt_Expr^2/varDffclt_Expr, scale=varDffclt_Expr/meanDffclt_Expr)
Dscrmn_Expr <- rgamma(nSample, shape=meanDscrmn_Expr^2/varDscrmn_Expr, scale=varDscrmn_Expr/meanDscrmn_Expr)
# generate latent trait
theta <- rnorm(nGene)
# the binary response matrix
binary_Expr <- simulateBinaryResponseMat(a=Dscrmn_Expr, b=Dffclt_Expr, theta=theta)
dim(binary_Expr)