Title: | Enhanced 'PAM50' Subtyping of Breast Cancer |
---|---|
Description: | Accurate classification of breast cancer tumors based on gene expression data is not a trivial task, and it lacks standard practices. The 'PAM50' classifier, which uses 50 gene centroid correlation distances to classify tumors, faces challenges with balancing estrogen receptor (ER) status and gene centering. The 'PCAPAM50' package leverages principal component analysis and iterative 'PAM50' calls to create a gene expression-based ER-balanced subset for gene centering, avoiding the use of protein expression-based ER data and resulting in enhanced breast cancer subtyping. |
Authors: | Praveen-Kumar Raj-Kumar [aut, cre, cph], Boyi Chen [aut], Ming-Wen Hu [aut], Tyler Hohenstein [aut], Jianfang Liu [aut], Craig D. Shriver [aut], Xiaoying Lin [aut, cph], Hai Hu [aut, cph] |
Maintainer: | Praveen-Kumar Raj-Kumar <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.2 |
Built: | 2024-11-25 07:10:18 UTC |
Source: | CRAN |
This function processes clinical data and preprocessed PAM50 expression data to form an estrogen receptor (ER)-balanced set based on IHC classification. The ER-balanced set is created by distinguishing between ER-negative and ER-positive cases; the function then produces conventional PAM50 intrinsic subtype calls.
makeCalls.ihc(df.cln, seed=118, mat, outDir=NULL)
df.cln |
Data frame of clinical data; it should include the columns 'PatientID' and 'IHC'. |
seed |
Seed for random number generation to ensure reproducibility. Default is 118. |
mat |
Matrix of preprocessed PAM50 expression data. |
outDir |
Directory for output files.If |
Returns a list containing:
Int.sbs |
Data frame with integrated subtype and clinical data. |
score.fl |
Data frame with scores from subtype predictions. |
mdns.fl |
Data frame with median values for each gene in the ER-balanced set. |
SBS.colr |
Colors associated with each subtype from the prediction results. |
outList |
Detailed results from subtype prediction functions. |
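To make the idea of an ER-balanced set concrete, here is a minimal sketch in base R on simulated data. It is not the package's implementation: the helper name `er_balanced_medians`, the toy ER labels, and the rule of down-sampling both ER groups to the size of the smaller one are assumptions made for illustration only.

```r
# Sketch only: hypothetical helper, not part of PCAPAM50.
er_balanced_medians <- function(mat, er_status, seed = 118) {
  stopifnot(ncol(mat) == length(er_status))
  set.seed(seed)
  pos <- which(er_status == "pos")
  neg <- which(er_status == "neg")
  n <- min(length(pos), length(neg))
  # Assumed balancing rule: draw n samples from each ER group
  keep <- c(pos[sample.int(length(pos), n)], neg[sample.int(length(neg), n)])
  # Per-gene medians over the balanced subset only
  gene_medians <- apply(mat[, keep, drop = FALSE], 1, median)
  list(
    balanced_ids = colnames(mat)[keep],
    medians      = gene_medians,
    # Center every sample (not just the subset) on those medians
    centered     = sweep(mat, 1, gene_medians, "-")
  )
}

# Simulated data: 5 genes x 12 samples, 9 ER-positive and 3 ER-negative
set.seed(1)
toy_mat <- matrix(rnorm(60), nrow = 5,
                  dimnames = list(paste0("gene", 1:5), paste0("S", 1:12)))
toy_er <- rep(c("pos", "neg"), times = c(9, 3))

bal <- er_balanced_medians(toy_mat, toy_er)
table(toy_er[match(bal$balanced_ids, colnames(toy_mat))])
```

The point of the design is that the medians used for gene centering come from a subset with equal ER-positive and ER-negative representation, so the centering is not dominated by whichever group is more common in the cohort.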
library(PCAPAM50)

data_path <- system.file("extdata", "Sample_IHC_PAM_Mat.Rdat", package = "PCAPAM50")
load(data_path)  # Loads Test.ihc and Test.matrix

# Prepare the data: IHC labels starting with "L" are treated as ER-positive.
# grepl() is used instead of -grep() so the assignment stays correct even if nothing matches.
is_er_pos <- grepl("^L", Test.ihc$IHC)
Test.ihc$ER_status <- ifelse(is_er_pos, "pos", "neg")
Test.ihc <- Test.ihc[order(Test.ihc$ER_status, decreasing = TRUE), ]
Test.matrix <- Test.matrix[, Test.ihc$PatientID]
df.cln <- data.frame(PatientID = Test.ihc$PatientID, IHC = Test.ihc$IHC, stringsAsFactors = FALSE)

# Call the function
result <- makeCalls.ihc(df.cln = df.cln, seed = 118, mat = Test.matrix, outDir = NULL)
This function processes clinical IHC subtyping data and preprocessed PAM50 gene expression data to form a gene expression-guided ER-balanced set. This set is created by combining IHC classification information with principal component 1 (PC1), which guides the separation. The function computes the median for each gene in this ER-balanced set, updates a calibration file, and runs subtype prediction algorithms to generate intermediate intrinsic subtype calls based on the PAM50 method. Various diagnostics and subtyping results are returned.
makeCalls.PC1ihc(df.cln, seed = 118, mat, outDir=NULL)
df.cln |
Data frame of clinical data; it should include the columns 'PatientID' and 'IHC'. |
seed |
Seed for random number generation to ensure reproducibility. Default is 118. |
mat |
Matrix of preprocessed PAM50 expression data. |
outDir |
Directory for output files.If |
Returns a list containing:
Int.sbs |
Data frame with integrated subtype and clinical data. |
score.fl |
Data frame with scores from subtype predictions. |
mdns.fl |
Data frame with median values for each gene in the ER-balanced set. |
SBS.colr |
Colors associated with each subtype from the prediction results. |
outList |
Detailed results from subtype prediction functions. |
PC1cutoff |
Cutoff values for PC1 used in subsetting. |
DF.PC1 |
Data frame of initial PCA results merged with clinical data. |
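The role of PC1 in guiding the ER separation can be illustrated with a small base-R sketch on simulated data. This is not the package's algorithm: the toy matrix, the ER labels, and in particular the cutoff rule (pick the PC1 value that minimizes disagreement with ER status) are assumptions chosen to keep the example short.

```r
# Sketch only: toy data and a hypothetical cutoff rule, not the package's code.
set.seed(118)
n_pos <- 20
n_neg <- 10
genes <- paste0("gene", 1:8)

# Simulate ER-positive and ER-negative samples with shifted expression
toy_mat <- cbind(
  matrix(rnorm(8 * n_pos, mean =  1), nrow = 8),
  matrix(rnorm(8 * n_neg, mean = -1), nrow = 8)
)
dimnames(toy_mat) <- list(genes, paste0("S", seq_len(n_pos + n_neg)))
toy_er <- rep(c("pos", "neg"), times = c(n_pos, n_neg))

# PCA with samples as observations; PC1 scores are the first column of pca$x
pca <- prcomp(t(toy_mat), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# The sign of a principal component is arbitrary, so orient PC1
# such that ER-positive samples have the larger mean
if (mean(pc1[toy_er == "pos"]) < mean(pc1[toy_er == "neg"])) pc1 <- -pc1

# Hypothetical rule: try each observed PC1 value as a cutoff and keep
# the one with the fewest samples on the "wrong" side for their ER status
candidates <- sort(pc1)
n_discordant <- sapply(candidates, function(k) sum((pc1 >= k) != (toy_er == "pos")))
pc1_cutoff <- candidates[which.min(n_discordant)]

# Samples whose PC1 side agrees with their ER status
concordant <- (pc1 >= pc1_cutoff) == (toy_er == "pos")
table(ER = toy_er, concordant)
```

In this sketch only the concordant samples would be candidates for a gene expression-guided ER-balanced set; how the package itself selects and balances them is not shown here.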
library(PCAPAM50)

data_path <- system.file("extdata", "Sample_IHC_PAM_Mat.Rdat", package = "PCAPAM50")
load(data_path)  # Loads Test.ihc and Test.matrix

# Prepare the data: IHC labels starting with "L" are treated as ER-positive.
# grepl() is used instead of -grep() so the assignment stays correct even if nothing matches.
is_er_pos <- grepl("^L", Test.ihc$IHC)
Test.ihc$ER_status <- ifelse(is_er_pos, "pos", "neg")
Test.ihc <- Test.ihc[order(Test.ihc$ER_status, decreasing = TRUE), ]
Test.matrix <- Test.matrix[, Test.ihc$PatientID]
df.cln <- data.frame(PatientID = Test.ihc$PatientID, IHC = Test.ihc$IHC, stringsAsFactors = FALSE)

# Call the function
result <- makeCalls.PC1ihc(df.cln = df.cln, seed = 118, mat = Test.matrix, outDir = NULL)
This function uses the intermediate intrinsic subtype calls and preprocessed PAM50 gene expression data to create an ER-balanced set and produce PCAPAM50 calls.
makeCalls.v1PAM(df.pam, seed = 118, mat, outDir=NULL)
df.pam |
Data frame of PAM data; it should include the columns 'PatientID' and 'PAM50'. |
seed |
Seed for random number generation to ensure reproducibility. Default is 118. |
mat |
Matrix of preprocessed PAM50 expression data. |
outDir |
Directory for output files.If |
Returns a list containing:
Int.sbs |
Data frame with integrated subtype and clinical data. |
score.fl |
Data frame with scores from subtype predictions. |
mdns.fl |
Data frame with median values for each gene in the ER-balanced set. |
SBS.colr |
Colors associated with each subtype from the prediction results. |
outList |
Detailed results from subtype prediction functions. |
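The package description states that PAM50 classifies tumors by correlation to 50-gene centroids. The sketch below shows that nearest-centroid step on made-up centroids for six hypothetical genes; the real PAM50 centroids are not reproduced, and the use of Spearman correlation is an assumption of this example rather than something stated on this page.

```r
# Sketch only: made-up centroids, not the PAM50 centroids.
genes <- paste0("gene", 1:6)
toy_centroids <- cbind(
  Basal = c( 2,  1, -1, -2,  0,  1),
  Her2  = c( 1,  2,  0, -1, -2, -1),
  LumA  = c(-2, -1,  1,  2,  0, -1),
  LumB  = c(-1, -2,  2,  1,  1,  0)
)
rownames(toy_centroids) <- genes

# Two median-centered toy samples (columns) measured on the same genes
toy_samples <- cbind(
  S1 = c( 1.8,  0.9, -1.2, -2.1,  0.1,  0.7),
  S2 = c(-1.9, -0.8,  0.7,  2.2, -0.1, -1.0)
)
rownames(toy_samples) <- genes

# Correlate every sample with every centroid:
# rows of `scores` are samples, columns are subtypes
scores <- cor(toy_samples, toy_centroids, method = "spearman")

# Assign each sample to the subtype whose centroid correlates best
calls <- colnames(scores)[apply(scores, 1, which.max)]
names(calls) <- rownames(scores)
calls
```

Because the correlations are computed on centered expression values, the choice of reference set used for gene centering directly affects which centroid a sample lands closest to.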
library(PCAPAM50)

data_path <- system.file("extdata", "Sample_IHC_PAM_Mat.Rdat", package = "PCAPAM50")
load(data_path)  # Loads Test.ihc and Test.matrix

# Prepare the data: IHC labels starting with "L" are treated as ER-positive.
# grepl() is used instead of -grep() so the assignment stays correct even if nothing matches.
is_er_pos <- grepl("^L", Test.ihc$IHC)
Test.ihc$ER_status <- ifelse(is_er_pos, "pos", "neg")
Test.ihc <- Test.ihc[order(Test.ihc$ER_status, decreasing = TRUE), ]
Test.matrix <- Test.matrix[, Test.ihc$PatientID]
df.cln <- data.frame(PatientID = Test.ihc$PatientID, IHC = Test.ihc$IHC, stringsAsFactors = FALSE)
outDir <- "Call.PC1"

# Make a secondary ER-balanced subset and derive intermediate intrinsic subtype calls
result <- makeCalls.PC1ihc(df.cln = df.cln, seed = 118, mat = Test.matrix, outDir = outDir)

# Build the input for makeCalls.v1PAM; the IHC column is optional
df.pc1pam <- data.frame(PatientID = result$Int.sbs$PatientID,
                        PAM50 = result$Int.sbs$Int.SBS.Mdns.PC1ihc,
                        IHC = result$Int.sbs$IHC,
                        stringsAsFactors = FALSE)

# Make a tertiary ER-balanced set and PCAPAM50 calls
res <- makeCalls.v1PAM(df.pam = df.pc1pam, seed = 118, mat = Test.matrix, outDir = NULL)
PCA plotting function modeled after plotPCA from DESeq.
my.plotPCA(x, intgroup, ablne = 0, colours = c("red","hotpink","darkblue", "lightblue","red3","hotpink3", "royalblue3","lightskyblue3"), LINE.V = TRUE)
x |
An ExpressionSet object, with matrix data (x) in ‘assay(x)’, produced for example by ExpressionSet(assayData=Test.matrix, phenoData=phenoData) |
intgroup |
Subtype condition: a character vector of names in ‘colData(x)’ to use for grouping. |
ablne |
An x-axis coordinate for drawing a vertical line. Default is 0. |
colours |
Colors for subtypes present in the condition. |
LINE.V |
Determines whether to draw the vertical line. Default is TRUE. |
Returns an image containing:
pcafig |
The plot. |
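For readers without an ExpressionSet at hand, the kind of plot described above can be approximated in base R. The sketch below is not `my.plotPCA`: it uses `prcomp` and base graphics on simulated data, and the group labels, colours, and variable names are placeholders.

```r
# Sketch only: base-R approximation on simulated data, not my.plotPCA itself.
set.seed(118)
toy_mat <- cbind(
  matrix(rnorm(10 * 15, mean =  1), nrow = 10),
  matrix(rnorm(10 * 15, mean = -1), nrow = 10)
)
colnames(toy_mat) <- paste0("S", 1:30)
group <- factor(rep(c("groupA", "groupB"), each = 15))
group_cols <- c(groupA = "hotpink", groupB = "darkblue")

# PCA with samples as observations
pca <- prcomp(t(toy_mat))
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

ablne  <- 0      # x-coordinate of the vertical line
line_v <- TRUE   # whether to draw it

# Null device so the sketch also runs non-interactively;
# drop pdf(NULL) and dev.off() to draw on the screen device instead
pdf(NULL)
plot(pca$x[, 1], pca$x[, 2],
     col = group_cols[as.character(group)], pch = 19,
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))
if (line_v) abline(v = ablne, lty = 2)
legend("topright", legend = names(group_cols), col = group_cols, pch = 19)
dev.off()
```

The vertical line marks a chosen position on PC1, which makes it easy to see how many samples of each group fall on either side.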
library("Biobase")
library(PCAPAM50)

data_path <- system.file("extdata", "Sample_IHC_PAM_Mat.Rdat", package = "PCAPAM50")
load(data_path)  # Loads Test.ihc and Test.matrix

# Build the phenotype annotation, one row per sample
pData <- data.frame(condition = Test.ihc$IHC)
rownames(pData) <- Test.ihc$PatientID
phenoData <- new("AnnotatedDataFrame", data = pData)
XSet <- ExpressionSet(assayData = Test.matrix, phenoData = phenoData)

# Plot the samples coloured by IHC subtype, with a vertical line at x = 2.4
my.plotPCA(XSet, intgroup = pData$condition, ablne = 2.4,
           colours = c("hotpink", "darkblue", "lightblue", "lightblue3", "red"),
           LINE.V = TRUE)