Package 'imputeLCMD' reference manual

Title:	A Collection of Methods for Left-Censored Missing Data Imputation
Description:	A collection of functions for left-censored missing data imputation. Left-censoring is a special case of the missing not at random (MNAR) mechanism and generates non-responses in proteomics experiments. The package also contains functions to artificially generate peptide/protein expression data (log-transformed) as random draws from a multivariate Gaussian distribution, as well as a function to generate missing data (both randomly and non-randomly). For comparison purposes, the package also contains several wrapper functions for the imputation of non-responses that are missing at random. New functionality has been added: a hybrid method that allows the imputation of missing values in a more complex scenario where the missing data are both MAR and MNAR.
Authors:	Cosmin Lazar [aut], Thomas Burger [aut], Samuel Wieczorek [cre, ctb]
Maintainer:	Samuel Wieczorek <samuel.wieczorek@cea.fr>
License:	GPL (>= 2)
Version:	2.1
Built:	2025-03-29 07:36:11 UTC
Source:	CRAN

Generate expression data

Description

This function generates artificial peptide abundance data with differentially abundant (DA) proteins. Samples are drawn from a Gaussian distribution.

Usage

generate.ExpressionData(
  nSamples1,
  nSamples2,
  meanSamples,
  sdSamples,
  nFeatures,
  nFeaturesUp,
  nFeaturesDown,
  meanDynRange,
  sdDynRange,
  meanDiffAbund,
  sdDiffAbund
)

Arguments

`nSamples1`	number of samples in condition 1
`nSamples2`	number of samples in condition 2
`meanSamples`	xxx
`sdSamples`	xxx
`nFeatures`	number of total features
`nFeaturesUp`	number of features up regulated
`nFeaturesDown`	number of features down regulated
`meanDynRange`	mean value of the dynamic range
`sdDynRange`	sd of the dynamic range
`meanDiffAbund`	xxx
`sdDiffAbund`	xxx

Value

A list containing the data, the conditions label and the regulation label (up/down/no)

Generate roll up map

Description

This function generates a map for peptide-to-protein roll-up.

Usage

generate.RollUpMap(nProt, pep.Expr.Data)

Arguments

`nProt`	number of proteins to map to the peptide expression data
`pep.Expr.Data`	matrix of peptide expression data

Value

the peptide-to-protein map (for each row in pep.prot.Map, the value is the index of the protein that the peptide is mapped to)

imputation under MAR/MCAR hypothesis

Description

This function performs missing value imputation under the MAR/MCAR hypothesis. Imputation is performed for each protein containing MAR/MCAR missing values (MVs).

Usage

impute.MAR(dataSet.mvs, model.selector, method = "MLE")

Arguments

`dataSet.mvs`	expression matrix containing abundances with MVs (either peptides or proteins)
`model.selector`	binary vector; "1" indicates MAR/MCAR proteins
`method`	the method to be used for MAR/MCAR missing values. Possible values: MLE (default), SVD, KNN

Value

dataset containing only MNAR (assumed to be left-censored) missing data

Imputation under MCAR and MNAR hypothesis

Description

This function performs missing value imputation under the MCAR and MNAR hypotheses.

Usage

impute.MAR.MNAR(
  dataSet.mvs,
  model.selector,
  method.MAR = "KNN",
  method.MNAR = "QRILC"
)

Arguments

`dataSet.mvs`	expression matrix containing abundances with MVs (either peptides or proteins)
`model.selector`	binary vector; "1" indicates MCAR proteins
`method.MAR`	the method to be used for MAR missing values. Possible values: MLE, SVD, KNN (default)
`method.MNAR`	the method to be used for MNAR missing values (default: QRILC)

Value

dataset containing complete abundances
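
The hybrid idea described above (rows flagged "1" are handled by a MAR/MCAR method, the remaining rows by a left-censored method) can be sketched in base R. This is only an illustration under stated assumptions: the toy matrix is made up, and the row mean and the global minimum are simple stand-ins chosen for this sketch, not the KNN or QRILC methods the package applies.

```r
# Hypothetical data: 4 proteins (rows) x 4 samples (columns)
dataSet.mvs <- matrix(c(20, NA, 21, 22,
                        15, NA, NA, 14,
                        23, 24, NA, 25,
                        NA, 13, 12, NA),
                      nrow = 4, byrow = TRUE)

# 1 = row treated as MAR/MCAR, 0 = row treated as MNAR (left-censored)
model.selector <- c(1, 0, 1, 0)

imputed <- dataSet.mvs

# Stand-in for the MAR method: fill with the row mean of observed values
for (i in which(model.selector == 1)) {
  gaps <- is.na(imputed[i, ])
  imputed[i, gaps] <- mean(dataSet.mvs[i, ], na.rm = TRUE)
}

# Stand-in for the MNAR method: fill with the smallest observed value
# in the whole matrix, mimicking a left-censoring assumption
low <- min(dataSet.mvs, na.rm = TRUE)
for (i in which(model.selector == 0)) {
  imputed[i, is.na(imputed[i, ])] <- low
}
```

The order matters in the same way as in the description: the MAR/MCAR rows are completed first, so that only rows assumed to be left-censored still contain MVs when the second step runs.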

imputation based on quantile regression

Description

This function performs missing value imputation based on quantile regression.

Usage

impute.QRILC(dataSet.mvs, tune.sigma = 1)

Arguments

`dataSet.mvs`	expression matrix with MVs (either peptides or proteins)
`tune.sigma`	coefficient that controls the sd of the MNAR distribution

Value

a list containing: a matrix with the complete abundances, a list with the estimated parameters of the complete data distribution
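
To make the roles of the estimated distribution parameters and of `tune.sigma` more concrete, here is a simplified single-sample sketch in base R. It is an assumption-laden illustration, not the package's implementation: the simulated data, the 20% censoring level, and the inverse-CDF sampling step are all choices made for this example. The observed quantiles are regressed on standard normal quantiles to estimate the mean and sd of the complete data, and the missing values are then drawn from that normal distribution truncated to the censored region.

```r
set.seed(1)

# Hypothetical sample: Gaussian log-abundances, lowest 20% left-censored
x <- rnorm(2000, mean = 20, sd = 2)
x[x < quantile(x, 0.2)] <- NA
tune.sigma <- 1

p.na <- mean(is.na(x))
obs <- sort(x[!is.na(x)])

# Observed values occupy the upper (1 - p.na) part of the distribution,
# so match them to standard normal quantiles at those probability levels
probs <- p.na + (1 - p.na) * ppoints(length(obs))
fit <- lm(obs ~ qnorm(probs))
mu.hat <- unname(coef(fit)[1])  # intercept estimates the mean
sd.hat <- unname(coef(fit)[2])  # slope estimates the sd

# Draw imputations from the fitted normal, truncated above at the
# estimated censoring point (inverse-CDF sampling)
sd.imp <- tune.sigma * sd.hat
upper <- qnorm(p.na, mean = mu.hat, sd = sd.hat)
u <- runif(sum(is.na(x)), min = 0, max = pnorm(upper, mu.hat, sd.imp))
x.imp <- x
x.imp[is.na(x)] <- qnorm(u, mean = mu.hat, sd = sd.imp)
```

In this sketch, a `tune.sigma` below 1 narrows the distribution from which the imputed values are drawn, while the truncation point stays at the estimated censoring quantile.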

imputation using the EM algorithm

Description

This function performs missing values imputation using the EM algorithm

Usage

impute.wrapper.MLE(dataSet.mvs)

Arguments

`dataSet.mvs`	expression matrix with MVs (either peptides or proteins)

Value

expression matrix with MVs imputed

imputation based on SVD algorithm

Description

This function performs missing value imputation based on the SVD algorithm.

Usage

impute.wrapper.SVD(dataSet.mvs, K)

Arguments

`dataSet.mvs`	expression matrix with MVs (either peptides or proteins)
`K`	the number of PCs

Value

expression matrix with MVs imputed

Imputation by 0.

Description

This function performs missing values imputation by 0.

Usage

impute.ZERO(dataSet.mvs)

Arguments

`dataSet.mvs`	expression matrix containing abundances with MVs (either peptides or proteins)

Value

dataset containing complete abundances
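
As a minimal base-R sketch of what imputation by 0 amounts to (an illustration on a made-up matrix `x`, not the package's source code), every missing entry is replaced with 0 and observed values are left unchanged:

```r
# Toy expression matrix with missing values (hypothetical data)
x <- matrix(c(21.3, NA, 19.8,
              NA, 18.2, 20.1),
            nrow = 2, byrow = TRUE)

# Replace every NA with 0; observed values are untouched
x_zero <- x
x_zero[is.na(x_zero)] <- 0
```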

Generates missing values in data.

Description

This function generates missing data in a complete data matrix.

Usage

insertMVs(original, mean.THR, sd.THR, MNAR.rate)

Arguments

`original`	complete data matrix containing all measurements
`mean.THR`, `sd.THR`	parameters of the threshold distribution that controls the MV rate. `mean.THR` should initially be set so that the number of NAs produced by the initial thresholding equals the desired total missing data rate. For example, to generate 30% missing data, `mean.THR` can be set as follows: mean.THR = quantile(pepExprsData, probs = 0.3). `sd.THR` is usually set to a small value (e.g. 0.1)
`MNAR.rate`	percentage of MVs which are missing not at random

Value

A list that contains the original complete data matrix, the data matrix with missing data and the percentage of missing data
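
The interplay of `mean.THR`, `sd.THR` and `MNAR.rate` can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's source code: the simulated matrix, the variable names other than the three parameters, and the way the remaining MVs are scattered at random are choices made for this example.

```r
set.seed(42)

# Hypothetical complete log-abundance matrix: 200 features x 6 samples
original <- matrix(rnorm(200 * 6, mean = 20, sd = 2), nrow = 200)

# Choose mean.THR so that initial thresholding removes about 30% of values
mean.THR <- quantile(original, probs = 0.3)
sd.THR <- 0.1
MNAR.rate <- 50  # percentage of MVs that are missing not at random

# One noisy threshold per cell; cells below their threshold are MNAR candidates
thr <- matrix(rnorm(length(original), mean = mean.THR, sd = sd.THR),
              nrow = nrow(original))
below <- which(original < thr)

# Split the total MV budget between MNAR and random positions
n.total <- length(below)
n.MNAR <- round(MNAR.rate / 100 * n.total)
n.random <- n.total - n.MNAR

with.mvs <- original
idx.MNAR <- below[sample.int(n.total, n.MNAR)]
with.mvs[idx.MNAR] <- NA

# Remaining MVs are placed uniformly at random among still-observed cells
observed <- which(!is.na(with.mvs))
idx.random <- observed[sample.int(length(observed), n.random)]
with.mvs[idx.random] <- NA

pct.missing <- 100 * mean(is.na(with.mvs))
```

Because the total number of MVs is fixed by the thresholding step, changing `MNAR.rate` in this sketch only changes how many of them fall on low-abundance cells, not the overall missing data rate.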

Dataset PXD000022 from ProteomeXchange.

Description

This dataset was collected during a study designed to compare the protein content of the exosome-like vesicles (ELVs) released from C2C12 murine myoblasts during proliferation (ELV-MB) and after differentiation into myotubes (ELV-MT). The dataset within this package contains protein intensities processed using MaxQuant. More information can be found in the ProteomeXchange public repository (http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000022) or in the original paper (see reference).

Usage

data(intensity_PXD000022)

Format

A data frame with 660 observations on the following 7 variables.

Protein.IDs: Peptides/Proteins names
Intensity.MB.1: a numeric vector
Intensity.MB.2: a numeric vector
Intensity.MB.3: a numeric vector
Intensity.MT.1: a numeric vector
Intensity.MT.2: a numeric vector
Intensity.MT.3: a numeric vector

Source

Original MaxQuant data: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000022

References

Forterre A, Jalabert A, Berger E, Baudet M, Chikh K, et al. (2014) Proteomic Analysis of C2C12 Myoblast and Myotube Exosome-Like Vesicles: A New Paradigm for Myoblast-Myotube Cross Talk? PLoS ONE 9(1): e84153. doi:10.1371/journal.pone.0084153

Dataset PXD000052 from ProteomeXchange.

Description

This dataset was collected during a study designed to perform the proteomic analysis of the SLP76 interactome in resting and activated primary mast cells. Four SLP76 replicates (with two analytical replicates each) were affinity-purified from both resting and activated primary mast cells. The dataset within this package contains protein intensities processed using MaxQuant. More information can be found in the ProteomeXchange public repository (http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000052) or in the original paper (see reference).

Usage

data(intensity_PXD000052)

Format

A data frame with 1991 observations on the following 17 variables.

Protein.IDs: Peptides/Proteins names
iBAQ.stSLP_activ1: a numeric vector
iBAQ.stSLP_activ2: a numeric vector
iBAQ.stSLP_activ3: a numeric vector
iBAQ.stSLP_activ4: a numeric vector
iBAQ.stSLP_rest1: a numeric vector
iBAQ.stSLP_rest2: a numeric vector
iBAQ.stSLP_rest3: a numeric vector
iBAQ.stSLP_rest4: a numeric vector
iBAQ.WT_activ1: a numeric vector
iBAQ.WT_activ2: a numeric vector
iBAQ.WT_activ3: a numeric vector
iBAQ.WT_activ4: a numeric vector
iBAQ.WT_rest1: a numeric vector
iBAQ.WT_rest2: a numeric vector
iBAQ.WT_rest3: a numeric vector
iBAQ.WT_rest4: a numeric vector

Source

Original MaxQuant data: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000052

References

Bounab Y, Hesse AM, Iannascoli B, Grieco L, Coute Y, Niarakis A, Roncagalli R, Lie E, Lam KP, Demangel C, Thieffry D, Garin J, Malissen B, Daëron M, Proteomic analysis of the SH2 domain-containing leukocyte protein of 76 kDa (SLP76) interactome in resting and activated primary mast cells [corrected]. Mol Cell Proteomics, 12(10):2874-89(2013).

Dataset PXD000438 from ProteomeXchange.

Description

This dataset was collected during a study designed to compare human primary tumor-derived xenograft proteomes of the two major histological non-small cell lung cancer subtypes: adenocarcinoma (ADC) and squamous cell carcinoma (SCC). The dataset within this package contains protein intensities for 6 ADC and 6 SCC samples, processed using MaxQuant. More information can be found in the ProteomeXchange public repository (http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000438) or in the original paper (see reference).

Usage

data(intensity_PXD000438)

Format

A data frame with 3709 observations on the following 13 variables.

Protein.IDs: Peptides/Proteins names
Intensity.092.1: a numeric vector
Intensity.092.2: a numeric vector
Intensity.092.3: a numeric vector
Intensity.441.1: a numeric vector
Intensity.441.2: a numeric vector
Intensity.441.3: a numeric vector
Intensity.561.1: a numeric vector
Intensity.561.2: a numeric vector
Intensity.561.3: a numeric vector
Intensity.691.1: a numeric vector
Intensity.691.2: a numeric vector
Intensity.691.3: a numeric vector

Source

Original MaxQuant data: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000438

References

Zhang W, Wei Y, Ignatchenko V, Li L, Sakashita S, Pham NA, Taylor P, Tsao MS, Kislinger T, Moran MF, Proteomic profiles of human lung adeno and squamous cell carcinoma using super-SILAC and label-free quantification approaches. Proteomics, 14(6):795-803(2014).

Examples

  data(intensity_PXD000438)

Dataset PXD000501 from ProteomeXchange.

Description

This dataset contains three biological replicates with three technical replicates each for the conditioned media (CM) and the whole cell lysates (WCL) of C8-D1A cell lines. The dataset within this package contains protein iBAQ intensities processed using MaxQuant. More information can be found in the ProteomeXchange public repository (http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000501) or in the original paper (see reference).

Usage

data(intensity_PXD000501)

Format

A data frame with 7363 observations on the following 19 variables.

Protein.IDs: Peptides/Proteins names
iBAQ.secretome_set1_tech1: a numeric vector
iBAQ.secretome_set1_tech2: a numeric vector
iBAQ.secretome_set1_tech3: a numeric vector
iBAQ.secretome_set2_tech1: a numeric vector
iBAQ.secretome_set2_tech2: a numeric vector
iBAQ.secretome_set2_tech3: a numeric vector
iBAQ.secretome_set3_tech1: a numeric vector
iBAQ.secretome_set3_tech2: a numeric vector
iBAQ.secretome_set3_tech3: a numeric vector
iBAQ.whole_set1_tech1: a numeric vector
iBAQ.whole_set1_tech2: a numeric vector
iBAQ.whole_set1_tech3: a numeric vector
iBAQ.whole_set2_tech1: a numeric vector
iBAQ.whole_set2_tech2: a numeric vector
iBAQ.whole_set2_tech3: a numeric vector
iBAQ.whole_set3_tech1: a numeric vector
iBAQ.whole_set3_tech2: a numeric vector
iBAQ.whole_set3_tech3: a numeric vector

Source

Original MaxQuant data: http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD000501

References

Han D, Jin J, Woo J, Min H, Kim Y, Proteomic analysis of mouse astrocytes and their secretome by a combination of FASP and StageTip-based, high pH, reversed-phase fractionation. Proteomics, ():(2014).

Examples

  data(intensity_PXD000501)

Identifies rows in the data matrix affected by an MNAR missingness mechanism

Description

This function determines the rows in the data matrix affected by an MNAR missingness mechanism. It is based on the assumption that the distribution of the mean values of proteins follows a normal distribution. The method makes use of a decision function defined as a tradeoff between the empirical CDF of the proteins' means and the theoretical CDF assuming that no MVs are present.

Usage

model.Selector(dataSet.mvs)

Arguments

`dataSet.mvs`	expression matrix containing abundances with MVs (either peptides or proteins)

Value

flags vector; "1" denotes rows containing random missing values; "0" denotes rows containing left-censored missing values

peptide to protein roll-up

Description

this function performs peptide to protein roll-up

Usage

pep2prot(pep.Expr.Data, rollup.map)

Arguments

`pep.Expr.Data`	matrix of peptide expression data
`rollup.map`	the peptide-to-protein map

Value

matrix of protein expression data
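
A roll-up driven by such a map can be sketched in base R as follows. The toy data are hypothetical, and aggregating the peptides of each protein by their column-wise mean is an assumption made for this sketch; the package may aggregate differently.

```r
# Hypothetical peptide matrix: 5 peptides (rows) x 3 samples (columns)
pep.Expr.Data <- matrix(c(20, 21, 19,
                          22, 23, 21,
                          18, NA, 17,
                          19, 20, 18,
                          25, 24, 26),
                        nrow = 5, byrow = TRUE)

# rollup.map[i] is the index of the protein that peptide i maps to
rollup.map <- c(1, 1, 2, 2, 3)

# Aggregate peptides per protein by column-wise mean, ignoring NAs
# (the mean is an assumption made for this sketch)
prot.Expr.Data <- t(sapply(split(seq_len(nrow(pep.Expr.Data)), rollup.map),
                           function(idx) {
                             colMeans(pep.Expr.Data[idx, , drop = FALSE],
                                      na.rm = TRUE)
                           }))
```

The result has one row per protein and one column per sample, so three proteins and three samples here.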

Imputation with min value

Arguments

`dataSet.mvs`	expression matrix with MVs (either peptides or proteins)
`q`	the q quantile used to estimate the minimum

Imputation by random draws

Arguments

`dataSet.mvs`	expression matrix containing abundances with MVs (either peptides or proteins)
`q`	the q-th quantile used to estimate the minimum value observed for each sample
`tune.sigma`	coefficient that controls the sd of the MNAR distribution

Imputation with KNN

Arguments

`dataSet.mvs`	expression matrix with MVs (either peptides or proteins)
`K`	the number of neighbors
