Package 'BioM2' reference manual

Title:	Biologically Explainable Machine Learning Framework
Description:	Biologically Explainable Machine Learning Framework for Phenotype Prediction using omics data, described in Chen and Schwarz (2017) <doi:10.48550/arXiv.1712.00336>. Identifying reproducible and interpretable biological patterns from high-dimensional omics data is a critical factor in understanding the risk mechanism of complex disease. As such, explainable machine learning can offer biological insight in addition to personalized risk scoring. In this process, a feature space of biological pathways is generated, and this feature space can subsequently be analyzed using WGCNA methods (described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>).
Authors:	Shunjie Zhang [aut, cre], Junfang Chen [aut]
Maintainer:	Shunjie Zhang <[email protected]>
License:	MIT + file LICENSE
Version:	1.1.1
Built:	2025-02-19 11:23:14 UTC
Source:	CRAN

Add unmapped probe

Description

Add unmapped probe

Usage

AddUnmapped(
  train = NULL,
  test = NULL,
  Unmapped_num = NULL,
  Add_FeartureSelection_Method = "wilcox.test",
  anno = NULL,
  len = NULL,
  verbose = TRUE,
  cores = 1
)

Arguments

`train`	The input training dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.
`test`	The input test dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.
`Unmapped_num`	The number of unmapped probes.
`Add_FeartureSelection_Method`	Feature selection methods. Available options are c('cor', 'wilcox.test').
`anno`	The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. For details, see data("MethylAnno").
`len`	The number of unmapped probes
`verbose`	Whether to print running process information to the console
`cores`	The number of cores used for computation.

Value

Matrix of unmapped probes
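
As a rough illustration of what ranking probes with the default 'wilcox.test' option could look like, the sketch below scores each probe column of a toy data frame with a Wilcoxon rank-sum test and keeps the best-ranked ones. The toy data, the helper name `rank_probes_by_wilcox`, and the ranking rule are assumptions made for illustration only; this is not the package's internal implementation.

```r
# Toy data: first column is the 0/1 label, remaining columns are probes
# (all values are made up for this sketch).
set.seed(1)
toy <- data.frame(
  label = rep(c(0, 1), each = 10),
  cg01  = c(rnorm(10, 0), rnorm(10, 3)),  # strongly shifted between classes
  cg02  = rnorm(20),
  cg03  = rnorm(20)
)

# Hypothetical helper: rank probes by Wilcoxon p-value, return the n best names.
rank_probes_by_wilcox <- function(data, n) {
  probes <- setdiff(colnames(data), "label")
  pvals <- vapply(probes, function(p) {
    wilcox.test(data[[p]][data$label == 0], data[[p]][data$label == 1])$p.value
  }, numeric(1))
  names(sort(pvals))[seq_len(min(n, length(pvals)))]
}

top_probes <- rank_probes_by_wilcox(toy, n = 2)
print(top_probes)
```

In this sketch `n` plays the role that `Unmapped_num` appears to play in the Usage above.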

Prediction by Machine Learning

Description

Prediction by Machine Learning with different learners ( From 'mlr3' )

Usage

baseModel(
  trainData,
  testData,
  predMode = "probability",
  classifier,
  paramlist = NULL,
  inner_folds = 10
)

Arguments

`trainData`	The input training dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.
`testData`	The input test dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.
`predMode`	The prediction mode. Currently only supports 'probability' for binary classification tasks.
`classifier`	Learners in mlr3.
`paramlist`	Search space for the learner parameters.
`inner_folds`	Number of folds for k-fold cross-validation (only supported when testData = NULL).

Value

The predicted output for the test data.
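
To make the 'probability' prediction mode concrete, here is a minimal base R sketch that fits a model on training rows and returns class-1 probabilities for test rows. It uses logistic regression (`glm`) as a stand-in learner, which is an assumption for illustration; baseModel() itself relies on mlr3 learners, and the toy data and variable names below are hypothetical.

```r
# Toy data in the documented layout: first column is the 0/1 label.
set.seed(1)
n <- 40
toy <- data.frame(label = rep(c(0, 1), times = n / 2))
toy$f1 <- rnorm(n, mean = toy$label)  # weakly informative feature
toy$f2 <- rnorm(n)                    # noise feature

train <- toy[1:30, ]
test  <- toy[31:40, ]

# Stand-in learner (assumption): logistic regression, not an mlr3 learner.
fit  <- glm(label ~ ., data = train, family = binomial())

# One probability per test sample, interpreted as P(label == 1).
prob <- predict(fit, newdata = test, type = "response")
print(round(prob, 3))
```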

Author(s)

Shunjie Zhang

Examples

library(mlr3verse)
library(caret)
library(BioM2)
data = MethylData_Test
set.seed(1)
# Split the data: 80% for training, 20% for testing
part = unlist(createDataPartition(data$label, p = 0.8))
# Use the first 10 columns (the label and 9 features); the learner is an SVM
pred = baseModel(trainData = data[part, 1:10],
                 testData = data[-part, 1:10],
                 classifier = 'svm')



Biologically Explainable Machine Learning Framework

Description

Biologically Explainable Machine Learning Framework

Usage

BioM2(
  TrainData = NULL,
  TestData = NULL,
  pathlistDB = NULL,
  FeatureAnno = NULL,
  resampling = NULL,
  nfolds = 5,
  classifier = "liblinear",
  paramlist = NULL,
  predMode = "probability",
  PathwaySizeUp = 200,
  PathwaySizeDown = 20,
  MinfeatureNum_pathways = 10,
  Add_UnMapped = TRUE,
  Unmapped_num = 300,
  Add_FeartureSelection_Method = "wilcox.test",
  Inner_CV = TRUE,
  inner_folds = 10,
  Stage1_FeartureSelection_Method = "cor",
  cutoff = 0.3,
  Stage2_FeartureSelection_Method = "RemoveHighcor",
  cutoff2 = 0.95,
  classifier2 = NULL,
  target = "predict",
  p.adjust.method = "fdr",
  save_pathways_matrix = FALSE,
  cores = 1,
  verbose = TRUE
)

Arguments

`TrainData`	The input training dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.
`TestData`	The input test dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.
`pathlistDB`	A list of pathways with pathway IDs and their corresponding genes ('entrezID' is used). For details, see data("GO2ALLEGS_BP").
`FeatureAnno`	The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. For details, see data("MethylAnno").
`resampling`	Resampling in mlr3verse.
`nfolds`	Number of folds for k-fold cross-validation (only supported when TestData = NULL).
`classifier`	Learners in mlr3.
`paramlist`	Search space for the learner parameters.
`predMode`	The prediction mode. Currently only supports 'probability' for binary classification tasks.
`PathwaySizeUp`	The upper bound on the number of genes in each biological pathway.
`PathwaySizeDown`	The lower bound on the number of genes in each biological pathway.
`MinfeatureNum_pathways`	The minimum pathway size after mapping your own data to pathlistDB (KEGG database/GO database).
`Add_UnMapped`	Whether to add unmapped probes for prediction
`Unmapped_num`	The number of unmapped probes
`Add_FeartureSelection_Method`	Feature selection methods.
`Inner_CV`	Whether to perform k-fold cross-validation on the training set.
`inner_folds`	Number of folds for the k-fold cross-validation on the training set.
`Stage1_FeartureSelection_Method`	Feature selection methods.
`cutoff`	The cutoff used for feature selection threshold. It can be any value between 0 and 1.
`Stage2_FeartureSelection_Method`	Feature selection methods.
`cutoff2`	The cutoff used for feature selection threshold. It can be any value between 0 and 1.
`classifier2`	Learner for stage 2 prediction (if classifier2 = NULL, the stage 1 learner is used).
`target`	Whether the run is used for prediction or for exploring potential biological mechanisms. Available options are c('predict', 'pathways').
`p.adjust.method`	p-value adjustment method ("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none").
`save_pathways_matrix`	Whether to output the pathway matrix file.
`cores`	The number of cores used for computation.
`verbose`	Whether to print running process information to the console

Value

A list containing prediction results and prediction result evaluation
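
The `PathwaySizeUp` and `PathwaySizeDown` arguments bound the number of genes per pathway. A minimal sketch of such a size filter on a made-up pathway list is shown here; the list contents are hypothetical, and whether the package treats the bounds as inclusive is an assumption.

```r
# Hypothetical pathway list: pathway ID -> Entrez gene IDs (made-up values).
toy_pathlistDB <- list(
  "GO:0000001" = as.character(1:5),    # 5 genes: below the lower bound
  "GO:0000002" = as.character(1:50),   # 50 genes: within bounds
  "GO:0000003" = as.character(1:500)   # 500 genes: above the upper bound
)

PathwaySizeUp   <- 200
PathwaySizeDown <- 20

# Keep pathways whose gene count lies between the two bounds
# (inclusive bounds are an assumption of this sketch).
sizes <- lengths(toy_pathlistDB)
kept  <- toy_pathlistDB[sizes >= PathwaySizeDown & sizes <= PathwaySizeUp]

print(names(kept))
```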

Examples




library(mlr3verse)
library(caret)
library(parallel)
library(BioM2)
data = MethylData_Test
set.seed(1)
# Split the data: 80% for training, 20% for testing
part = unlist(createDataPartition(data$label, p = 0.8))
Train = data[part, ]
Test = data[-part, ]
pathlistDB = GO2ALLEGS_BP
FeatureAnno = MethylAnno

# To explore biological mechanisms instead, set target = 'pathways'
pred = BioM2(TrainData = Train, TestData = Test,
             pathlistDB = pathlistDB, FeatureAnno = FeatureAnno,
             classifier = 'svm', nfolds = 5,
             PathwaySizeUp = 25, PathwaySizeDown = 20, MinfeatureNum_pathways = 10,
             Add_UnMapped = 'Yes', Unmapped_num = 300,
             Inner_CV = 'None', inner_folds = 5,
             Stage1_FeartureSelection_Method = 'cor', cutoff = 0.3,
             Stage2_FeartureSelection_Method = 'None',
             target = 'predict', cores = 1
)


Find suitable parameters for partitioning pathways modules

Description

Find suitable parameters for partitioning pathways modules

Usage

FindParaModule(
  pathways_matrix = NULL,
  control_label = 0,
  minModuleSize = seq(10, 20, 5),
  mergeCutHeight = seq(0, 0.3, 0.1),
  minModuleNum = 5,
  power = NULL,
  exact = TRUE,
  ancestor_anno = NULL
)

Arguments

`pathways_matrix`	A pathway matrix generated by the BioM2(target = 'pathways') function.
`control_label`	The label of the control group (a single number, factor, or character).
`minModuleSize`	Minimum module size for module detection. See WGCNA::blockwiseModules() for details.
`mergeCutHeight`	Dendrogram cut height for module merging. See WGCNA::blockwiseModules() for details.
`minModuleNum`	Minimum total number of modules detected.
`power`	Soft-thresholding power for network construction. See WGCNA::blockwiseModules() for details.
`exact`	Whether to divide GO pathways more accurately (only takes effect when ancestor_anno = NULL).
`ancestor_anno`	Annotations for ancestral relationships (such as data('GO_Ancestor')).

Value

A list containing recommended parameters
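
The defaults for `minModuleSize` and `mergeCutHeight` are vectors of candidate values. The sketch below enumerates every combination of those defaults with `expand.grid`; that FindParaModule() evaluates the candidates as a full grid is an assumption made here for illustration.

```r
# Default candidate values taken from the Usage section.
minModuleSize  <- seq(10, 20, 5)     # 10, 15, 20
mergeCutHeight <- seq(0, 0.3, 0.1)   # 0, 0.1, 0.2, 0.3

# Every combination of the two parameters: 3 x 4 = 12 rows.
grid <- expand.grid(minModuleSize = minModuleSize,
                    mergeCutHeight = mergeCutHeight)

print(nrow(grid))
print(head(grid, 3))
```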

Pathways in the GO database and their Ancestor

Description

Inclusion relationships between pathways

Format

A data frame :

...

Details

In the GO database, each pathway has its own ancestor pathways. This dataset maps pathways in the GO database to about 20 common ancestor pathways.

Source

From GO.db

Pathways in the GO database and their Ancestor

Description

Inclusion relationships between pathways

Format

A data frame :

...

Details

In the GO database, each pathway has its own ancestor pathways. This dataset maps pathways in the GO database to about 400 common ancestor pathways.

Source

From GO.db

An example of pathlistDB

Description

An example of pathlistDB

Format

A list :

...

Details

A list of pathways with pathway IDs and their corresponding genes ('entrezID' is used).
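
A minimal sketch of an object with this structure follows. The pathway IDs and Entrez IDs are made up for illustration and do not come from GO2ALLEGS_BP.

```r
# Named list: names are pathway IDs, elements are Entrez gene IDs (hypothetical).
toy_pathlistDB <- list(
  "GO:0000001" = c("1001", "1002", "1003"),
  "GO:0000002" = c("1002", "1004")
)

# Genes belonging to one pathway.
print(toy_pathlistDB[["GO:0000002"]])

# Long-format view: one row per (gene, pathway) pair.
gene_to_pathway <- stack(toy_pathlistDB)
colnames(gene_to_pathway) <- c("entrezID", "pathway")
print(gene_to_pathway)
```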

BioM2 Hyperparametric Combination

Description

BioM2 Hyperparametric Combination

Usage

HyBioM2(
  TrainData = NULL,
  pathlistDB = NULL,
  FeatureAnno = NULL,
  resampling = NULL,
  nfolds = 5,
  classifier = "liblinear",
  predMode = "probability",
  PathwaySizeUp = 200,
  PathwaySizeDown = 20,
  MinfeatureNum_pathways = 10,
  Add_UnMapped = TRUE,
  Add_FeartureSelection_Method = "wilcox.test",
  Unmapped_num = 300,
  Inner_CV = TRUE,
  inner_folds = 10,
  Stage1_FeartureSelection_Method = "cor",
  stage1_cutoff = 0.3,
  Stage2_FeartureSelection_Method = "RemoveHighcor",
  stage2_cutoff = 0.8,
  classifier2 = NULL,
  cores = 1,
  verbose = TRUE
)

Arguments

`TrainData`	The input training dataset. The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.
`pathlistDB`	A list of pathways with pathway IDs and their corresponding genes ('entrezID' is used). For details, see data("GO2ALLEGS_BP").
`FeatureAnno`	The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. For details, see data("MethylAnno").
`resampling`	Resampling in mlr3verse.
`nfolds`	Number of folds for k-fold cross-validation (only supported when TestData = NULL).
`classifier`	Learners in mlr3.
`predMode`	The prediction mode. Currently only supports 'probability' for binary classification tasks.
`PathwaySizeUp`	The upper bound on the number of genes in each biological pathway.
`PathwaySizeDown`	The lower bound on the number of genes in each biological pathway.
`MinfeatureNum_pathways`	The minimum pathway size after mapping your own data to pathlistDB (KEGG database/GO database).
`Add_UnMapped`	Whether to add unmapped probes for prediction.
`Add_FeartureSelection_Method`	Feature selection methods.
`Unmapped_num`	The number of unmapped probes.
`Inner_CV`	Whether to perform k-fold cross-validation on the training set.
`inner_folds`	Number of folds for the k-fold cross-validation on the training set.
`Stage1_FeartureSelection_Method`	Feature selection methods.
`stage1_cutoff`	The cutoff used for feature selection threshold. It can be any value between 0 and 1.
`Stage2_FeartureSelection_Method`	Feature selection methods.
`stage2_cutoff`	The cutoff used for feature selection threshold. It can be any value between 0 and 1.
`classifier2`	Learner for stage 2 prediction (if classifier2 = NULL, the stage 1 learner is used).
`cores`	The number of cores used for computation.
`verbose`	Whether to print running process information to the console

Value

A data frame containing hyperparameter results

An example of FeatureAnno for methylation data

Description

An example of FeatureAnno for methylation data

Format

A data frame :

...

Details

The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'.
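
The required layout can be sketched with a small made-up annotation table. The probe IDs, Entrez IDs, and the `pathway_genes` vector below are hypothetical; the lookup only illustrates how an 'ID' to 'entrezID' mapping can be used to find the probes of a pathway.

```r
# Hypothetical annotation with the two required columns.
toy_anno <- data.frame(
  ID       = c("cg01", "cg02", "cg03", "cg04"),
  entrezID = c("1001", "1002", "1002", NA),  # cg04 has no gene mapping
  stringsAsFactors = FALSE
)

# Check that the required columns are present.
stopifnot(all(c("ID", "entrezID") %in% colnames(toy_anno)))

# Probes annotated to the genes of a hypothetical pathway.
pathway_genes <- c("1002", "1004")
mapped_probes <- toy_anno$ID[toy_anno$entrezID %in% pathway_genes]
print(mapped_probes)
```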

An example of TrainData/TestData for methylation data

Description

An example of TrainData/TestData for methylation data (MethylData_Test).

Format

A data frame :

...

Details

The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.
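
A minimal sketch of a data frame in this layout, with a simple sanity check, is given here. The values and the helper name `check_layout` are hypothetical and not part of the package.

```r
# Toy dataset: label first, then one column per feature (made-up values).
set.seed(1)
toy_data <- data.frame(
  label = rep(c(0, 1), each = 5),
  cg01  = runif(10),
  cg02  = runif(10)
)

# Hypothetical helper: TRUE if the first column holds only 0/1 labels
# and every remaining column is numeric.
check_layout <- function(data) {
  all(data[[1]] %in% c(0, 1)) &&
    all(vapply(data[-1], is.numeric, logical(1)))
}

print(check_layout(toy_data))
```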

Delineate differential pathway modules with high biological interpretability

Description

Delineate differential pathway modules with high biological interpretability

Usage

PathwaysModule(
  pathways_matrix = NULL,
  control_label = NULL,
  power = NULL,
  minModuleSize = NULL,
  mergeCutHeight = NULL,
  cutoff = 70,
  MinNumPathways = 5,
  p.adjust.method = "fdr",
  exact = TRUE,
  ancestor_anno = NULL
)

Arguments

`pathways_matrix`	A pathway matrix generated by the BioM2(target = 'pathways') function.
`control_label`	The label of the control group (a single number, factor, or character).
`power`	Soft-thresholding power for network construction. See WGCNA::blockwiseModules() for details.
`minModuleSize`	Minimum module size for module detection. See WGCNA::blockwiseModules() for details.
`mergeCutHeight`	Dendrogram cut height for module merging. See WGCNA::blockwiseModules() for details.
`cutoff`	Threshold for biologically interpretable difference modules.
`MinNumPathways`	Minimum number of pathways included in the biologically interpretable difference module.
`p.adjust.method`	p-value adjustment method ("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none").
`exact`	Whether to divide GO pathways more accurately (only takes effect when ancestor_anno = NULL).
`ancestor_anno`	Annotations for ancestral relationships (such as data('GO_Ancestor')).

Value

A list containing differential module results that are highly biologically interpretable
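
The `p.adjust.method` values match the method names accepted by base R's `p.adjust()`. The short sketch below, on made-up p-values, shows how the default "fdr" differs from "bonferroni"; that the package passes this argument straight to `p.adjust()` is an assumption.

```r
# Made-up raw p-values.
pvals <- c(0.01, 0.02, 0.03, 0.5)

# "fdr" is an alias for the Benjamini-Hochberg ("BH") adjustment.
adj_fdr  <- p.adjust(pvals, method = "fdr")
adj_bonf <- p.adjust(pvals, method = "bonferroni")

print(adj_fdr)
print(adj_bonf)

# All method names known to p.adjust().
print(p.adjust.methods)
```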

Correlogram for Biological Differences Modules

Description

Correlogram for Biological Differences Modules

Usage

PlotCorModule(
  PathwaysModule_obj = NULL,
  alpha = 0.7,
  begin = 0.2,
  end = 0.9,
  option = "C",
  family = "serif"
)

Arguments

`PathwaysModule_obj`	Results produced by PathwaysModule()
`alpha`	The alpha transparency, a number in (0,1). See scale_fill_viridis() for details.
`begin`	The (corrected) hue in (0,1) at which the color map begins. See scale_fill_viridis() for details.
`end`	The (corrected) hue in (0,1) at which the color map ends. See scale_fill_viridis() for details.
`option`	A character string indicating the color map option to use. See scale_fill_viridis() for details.
`family`	The font family.

Value

a ggplot object

Visualisation of significant pathway-level features

Description

Visualisation of significant pathway-level features

Usage

PlotPathFearture(
  BioM2_pathways_obj = NULL,
  pathlistDB = NULL,
  top = 10,
  p.adjust.method = "none",
  begin = 0.1,
  end = 0.9,
  alpha = 0.9,
  option = "C",
  seq = 1
)

Arguments

`BioM2_pathways_obj`	Results produced by BioM2(target = 'pathways').
`pathlistDB`	A list of pathways with pathway IDs and their corresponding genes ('entrezID' is used). For details, see data("GO2ALLEGS_BP").
`top`	Number of significant pathway-level features visualised.
`p.adjust.method`	p-value adjustment method ("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none").
`begin`	The (corrected) hue in (0,1) at which the color map begins. See scale_fill_viridis() for details.
`end`	The (corrected) hue in (0,1) at which the color map ends. See scale_fill_viridis() for details.
`alpha`	The alpha transparency, a number in (0,1). See scale_fill_viridis() for details.
`option`	A character string indicating the color map option to use. See scale_fill_viridis() for details.
`seq`	Interval of x-coordinate.

Value

a ggplot2 object

Visualisation of the original features that make up the pathway

Description

Visualisation of the original features that make up the pathway

Usage

PlotPathInner(
  data = NULL,
  pathlistDB = NULL,
  FeatureAnno = NULL,
  PathNames = NULL,
  p.adjust.method = "none",
  save_pdf = FALSE,
  alpha = 1,
  cols = NULL
)

Arguments

`data`	The input omics data
`pathlistDB`	A list of pathways with pathway IDs and their corresponding genes ('entrezID' is used). For details, see data("GO2ALLEGS_BP").
`FeatureAnno`	The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. For details, see data("MethylAnno").
`PathNames`	A vector containing the names of pathways.
`p.adjust.method`	p-value adjustment method ("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none").
`save_pdf`	Whether to save images in PDF format
`alpha`	The alpha transparency, a number in (0,1).
`cols`	palette (vector of colour names)

Value

a plot object

Network diagram of pathways-level features

Description

Network diagram of pathways-level features

Usage

PlotPathNet(
  data = NULL,
  BioM2_pathways_obj = NULL,
  FeatureAnno = NULL,
  pathlistDB = NULL,
  PathNames = NULL,
  cutoff = 0.2,
  num = 10
)

Arguments

`data`	The input omics data
`BioM2_pathways_obj`	Results produced by BioM2()
`FeatureAnno`	The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. For details, see data("MethylAnno").
`pathlistDB`	A list of pathways with pathway IDs and their corresponding genes ('entrezID' is used). For details, see data("GO2ALLEGS_BP").
`PathNames`	A vector containing the names of pathways.
`cutoff`	Threshold for correlation between features within a pathway
`num`	The first few internal features of each pathway that are most relevant to the phenotype

Value

a ggplot object

Display biological information within each pathway module

Description

Display biological information within each pathway module

Usage

ShowModule(obj = NULL, ID_Module = NULL, exact = TRUE, ancestor_anno = NULL)

Arguments

`obj`	Results produced by PathwaysModule()
`ID_Module`	ID of the diff module
`exact`	Whether to divide GO pathways more accurately (only takes effect when ancestor_anno = NULL).
`ancestor_anno`	Annotations for ancestral relationships (such as data('GO_Ancestor')).

Value

List containing biologically specific information within the module

Stage 1 Feature Selection

Description

Stage 1 Feature Selection

Usage

Stage1_FeartureSelection(
  Stage1_FeartureSelection_Method = "cor",
  data = NULL,
  cutoff = NULL,
  featureAnno = NULL,
  pathlistDB_sub = NULL,
  MinfeatureNum_pathways = 10,
  cores = 1,
  verbose = TRUE
)

Arguments

`Stage1_FeartureSelection_Method`	Feature selection methods. Available options are c(NULL, 'cor', 'wilcox.test', 'cor_rank', 'wilcox.test_rank').
`data`	The input training dataset. The first column is the label.
`cutoff`	The cutoff used for feature selection threshold. It can be any value between 0 and 1. Commonly used cutoffs are c(0.5, 0.1, 0.05, 0.01, etc.).
`featureAnno`	The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'. For details, see data("MethylAnno").
`pathlistDB_sub`	A list of pathways with pathway IDs and their corresponding genes ('entrezID' is used). For details, see data("GO2ALLEGS_BP").
`MinfeatureNum_pathways`	The minimum pathway size after mapping your own data to pathlistDB (KEGG database/GO database).
`cores`	The number of cores used for computation.
`verbose`	Whether to print running process information to the console

Value

A list of matrices with pathway IDs as the associated list member names.
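
To show the shape of this return value, the sketch below builds a list of per-pathway matrices from toy data, keeping features whose absolute correlation with the label exceeds a cutoff. All data and names are made up, and the selection rule (absolute Pearson correlation above `cutoff`) is an assumption about the 'cor' method rather than the package's confirmed behaviour.

```r
# Toy inputs (hypothetical values).
set.seed(1)
label <- rep(c(0, 1), each = 10)
toy <- data.frame(
  label = label,
  cg01  = label + rnorm(20, sd = 0.1),  # tracks the label closely
  cg02  = rnorm(20),                    # noise
  cg03  = label + rnorm(20, sd = 0.1)   # tracks the label closely
)
toy_anno <- data.frame(ID = c("cg01", "cg02", "cg03"),
                       entrezID = c("1001", "1002", "1003"),
                       stringsAsFactors = FALSE)
toy_paths <- list("GO:0000001" = c("1001", "1002"),
                  "GO:0000002" = c("1003"))
cutoff <- 0.3

# For each pathway: map genes to probes, keep probes correlated with the label,
# and return a matrix with the label in the first column.
feature_pathways <- lapply(toy_paths, function(genes) {
  probes <- toy_anno$ID[toy_anno$entrezID %in% genes]
  r <- vapply(probes, function(p) cor(toy[[p]], toy$label), numeric(1))
  keep <- probes[abs(r) > cutoff]
  as.matrix(toy[, c("label", keep), drop = FALSE])
})

print(lapply(feature_pathways, colnames))
```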

Author(s)

Shunjie Zhang

Examples


library(parallel)
data=MethylData_Test
feature_pathways=Stage1_FeartureSelection(Stage1_FeartureSelection_Method='cor',
                 data=data,cutoff=0,
                 featureAnno=MethylAnno,pathlistDB_sub=GO2ALLEGS_BP,cores=1)


Stage 2 Feature Selection

Description

Stage 2 Feature Selection

Usage

Stage2_FeartureSelection(
  Stage2_FeartureSelection_Method = "RemoveHighcor",
  data = NULL,
  label = NULL,
  cutoff = NULL,
  preMode = NULL,
  classifier = NULL,
  verbose = TRUE,
  cores = 1
)

Arguments

`Stage2_FeartureSelection_Method`	Feature selection methods. Available options are c(NULL, 'cor', 'wilcox.test','cor_rank','wilcox.test_rank','RemoveHighcor', 'RemoveLinear').
`data`	The input training dataset. The first column is the label.
`label`	The label of dataset
`cutoff`	The cutoff used for feature selection threshold. It can be any value between 0 and 1.
`preMode`	The prediction mode. Currently only supports 'probability' for binary classification tasks.
`classifier`	Learners in mlr3
`verbose`	Whether to print running process information to the console
`cores`	The number of cores used for computation.

Value

Column index of feature
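
As an illustration of what a 'RemoveHighcor'-style filter might return, the sketch below greedily drops one feature from each highly correlated pair and returns the column indices that remain. The helper `remove_high_cor` and its greedy rule are hypothetical and are not taken from the package source.

```r
# Toy features: b is a near-duplicate of a, c is independent (made-up values).
set.seed(1)
x1 <- rnorm(50)
feats <- data.frame(
  a = x1,
  b = x1 + rnorm(50, sd = 0.01),
  c = rnorm(50)
)

# Hypothetical helper: for each kept feature, drop any later feature whose
# absolute correlation with it exceeds the cutoff; return kept column indices.
remove_high_cor <- function(x, cutoff) {
  cm <- abs(cor(x))
  keep <- seq_len(ncol(x))
  for (j in seq_len(ncol(x))) {
    if (!(j %in% keep)) next
    drop <- keep[keep > j & cm[j, keep] > cutoff]
    keep <- setdiff(keep, drop)
  }
  keep
}

kept_idx <- remove_high_cor(feats, cutoff = 0.95)
print(kept_idx)
```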

Author(s)

Shunjie Zhang

An example of FeatureAnno for gene expression

Description

An example of FeatureAnno for gene expression

Format

A data frame :

...

Details

The annotation data stored in a data.frame for probe mapping. It must have at least two columns named 'ID' and 'entrezID'.

An example of TrainData/TestData for gene expression

Description

An example of TrainData/TestData for gene expression data.

Format

A data frame :

...

Details

The first column is the label or the output. For binary classes, 0 and 1 are used to indicate the class member.

Visualisation of the results of the analysis of the pathway modules

Description

Visualisation of the results of the analysis of the pathway modules

Usage

VisMultiModule(
  BioM2_pathways_obj = NULL,
  FindParaModule_obj = NULL,
  ShowModule_obj = NULL,
  PathwaysModule_obj = NULL,
  exact = TRUE,
  ancestor_anno = NULL,
  type_text_table = FALSE,
  text_table_theme = ttheme("mOrange"),
  volin = FALSE,
  control_label = 0,
  module = NULL,
  cols = NULL,
  n_neighbors = 8,
  spread = 1,
  min_dist = 2,
  target_weight = 0.5,
  size = 1.5,
  alpha = 1,
  ellipse = TRUE,
  ellipse.alpha = 0.2,
  theme = ggthemes::theme_base(base_family = "serif"),
  save_pdf = FALSE,
  width = 7,
  height = 7
)

Arguments

`BioM2_pathways_obj`	Results produced by BioM2(target = 'pathways').
`FindParaModule_obj`	Results produced by FindParaModule().
`ShowModule_obj`	Results produced by ShowModule().
`PathwaysModule_obj`	Results produced by PathwaysModule().
`exact`	Whether to divide GO pathways more accurately (only takes effect when ancestor_anno = NULL).
`ancestor_anno`	Annotations for ancestral relationships (such as data('GO_Ancestor')).
`type_text_table`	Whether to display the result as a table.
`text_table_theme`	The theme of the table. See ggtexttable() for details.
`volin`	Whether to draw a violin plot. Can only be used when PathwaysModule_obj is provided.
`control_label`	The control group label. Can only be used when PathwaysModule_obj is provided.
`module`	The PathwaysModule ID. Can only be used when PathwaysModule_obj is provided.
`cols`	palette (vector of colour names)
`n_neighbors`	The size of local neighborhood (in terms of number of neighboring sample points) used for manifold approximation. Larger values result in more global views of the manifold, while smaller values result in more local data being preserved. In general values should be in the range 2 to 100.
`spread`	The effective scale of embedded points. In combination with min_dist, this determines how clustered/clumped the embedded points are.
`min_dist`	The effective minimum distance between embedded points. Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.
`target_weight`	Weighting factor between data topology and target topology. A value of 0.0 weights entirely on data, a value of 1.0 weights entirely on target. The default of 0.5 balances the weighting equally between data and target. Only applies if y is non-NULL.
`size`	Scatter plot point size
`alpha`	Alpha for ellipse specifying the transparency level of fill color. Use alpha = 0 for no fill color.
`ellipse`	logical value. If TRUE, draws ellipses around points.
`ellipse.alpha`	Alpha for ellipse specifying the transparency level of fill color. Use alpha = 0 for no fill color.
`theme`	The plot theme. Default: theme_base(base_family = "serif").
`save_pdf`	Whether to save images in PDF format
`width`	image width
`height`	image height

Value

a ggplot2 object
