Package 'opdisDownsampling' reference manual

Title:	Optimal Distribution Preserving Down-Sampling of Bio-Medical Data
Description:	An optimized method for distribution-preserving class-proportional down-sampling of bio-medical data.
Authors:	Jorn Lotsch [aut, cre], Sebastian Malkusch [aut], Alfred Ultsch [aut]
Maintainer:	Jorn Lotsch <j.lotsch@em.uni-frankfurt.de>
License:	GPL-3
Version:	1.0.1
Built:	2025-03-12 06:48:52 UTC
Source:	CRAN

Example data of hematologic marker expression.

Description

Data set of 6 flow cytometry-based lymphoma markers measured in 55,843 cells from healthy subjects (class 1) and 55,843 cells from lymphoma patients (class 2).

Usage

data("FlowcytometricData")

Details

Size 111686 x 6, stored in FlowcytometricData$[Var_1,Var_2,Var_3,Var_4,Var_5,Var_6]

Classes 2, stored in FlowcytometricData$Cls

Examples

data(FlowcytometricData)
str(FlowcytometricData)

Example data of an artificial Gaussian mixture.

Description

Data set of 30000 instances with 10 variables that are Gaussian mixtures. Each instance belongs to one of the classes Cls = 1, 2, or 3, which have different means and standard deviations and class weights of 0.5, 0.4, and 0.1, respectively.

Usage

data("GMMartificialData")

Details

Size 30000 x 10, stored in GMMartificialData$[X1,X2,X3,X4,X5,X6,X7,X8,X9,X10]

Classes 3, stored in GMMartificialData$Cls
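
To make the structure of this data set concrete, the following base R sketch simulates data of the same shape: 30000 rows, 10 Gaussian variables named X1 to X10, and a class column with weights 0.5, 0.4, and 0.1. The per-class means and standard deviations are hypothetical values chosen for illustration; the manual does not state the parameters used for GMMartificialData, and this is not the code that generated it.

```r
# Illustrative simulation only; the means and sds below are assumptions,
# not the parameters behind GMMartificialData.
set.seed(1)
n <- 30000
weights <- c(0.5, 0.4, 0.1)   # class weights from the description
means <- c(0, 3, 6)           # hypothetical class means
sds <- c(1, 1.5, 0.5)         # hypothetical class standard deviations

# Class labels in the stated proportions (15000, 12000, 3000 instances).
Cls <- rep(1:3, times = round(n * weights))

# Ten variables, each drawn from the Gaussian of the instance's class.
X <- sapply(1:10, function(j) rnorm(n, mean = means[Cls], sd = sds[Cls]))
colnames(X) <- paste0("X", 1:10)

sim <- data.frame(X, Cls = Cls)
prop.table(table(sim$Cls))
```

The last line prints the observed class proportions, which can be compared with the weights given in the description.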

Examples

data(GMMartificialData)
str(GMMartificialData)

Optimal Distribution Preserving Down-Sampling of Bio-Medical Data

Description

The package provides the necessary functions for optimal distribution-preserving down-sampling of large (bio-medical) data sets.

Usage

opdisDownsampling(Data, Cls, Size, Seed, nTrials = 1000,
TestStat = "ad", MaxCores = getOption("mc.cores", 2L), PCAimportance = FALSE)

Arguments

`Data`	the numerical data as a vector, matrix, or data frame.
`Cls`	the class information, if any, as a vector with one entry per instance in the data.
`Size`	the total number of instances, across all classes, to be drawn.
`Seed`	a predefined seed for the random draws, so that results can be reproduced.
`nTrials`	the number of random samples to draw, from which the final sample is chosen.
`TestStat`	the statistical criterion used to judge the similarity between a sample and the original data.
`MaxCores`	the maximum number of CPU cores to use for parallel computing.
`PCAimportance`	PCA-based feature selection; if enabled, only variables important in the PCA projection are considered.
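
Read together, the arguments suggest a general scheme: draw several class-proportional random samples and keep the one whose distribution is closest to the original data. The base R sketch below illustrates that idea under stated assumptions; it is not the package's implementation. The helper names (draw_proportional, max_ks, select_best) are hypothetical, and the two-sample Kolmogorov-Smirnov statistic from the stats package stands in for the default TestStat = "ad", which is not available in base R.

```r
# Conceptual sketch only; function names are hypothetical and the
# KS statistic is a stand-in for the package's similarity criterion.

# Draw row indices so that each class keeps its share of the data.
# Rounding can make the total differ slightly from `size`.
draw_proportional <- function(cls, size) {
  idx_by_class <- split(seq_along(cls), cls)
  unlist(lapply(idx_by_class, function(idx) {
    n <- max(1L, round(size * length(idx) / length(cls)))
    idx[sample.int(length(idx), n)]
  }), use.names = FALSE)
}

# Largest KS statistic over all variables, comparing the sample with the
# full data; smaller values mean the sample resembles the data more closely.
max_ks <- function(data, idx) {
  max(vapply(data, function(col) {
    unname(suppressWarnings(ks.test(col[idx], col))$statistic)
  }, numeric(1)))
}

# Draw n_trials candidate samples and keep the most similar one.
select_best <- function(data, cls, size, n_trials = 100, seed = 42) {
  set.seed(seed)
  best_idx <- NULL
  best_stat <- Inf
  for (i in seq_len(n_trials)) {
    idx <- draw_proportional(cls, size)
    stat <- max_ks(data, idx)
    if (stat < best_stat) {
      best_stat <- stat
      best_idx <- idx
    }
  }
  list(idx = sort(best_idx), stat = best_stat)
}

res <- select_best(iris[, 1:4], as.integer(iris$Species),
                   size = 60, n_trials = 50)
table(iris$Species[res$idx])
```

In this sketch, a larger n_trials gives more candidates to choose from at the cost of run time, and a fixed seed makes the selection repeatable.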

Value

Returns a list of data containing the drawn samples and the omitted data.

`ReducedData`	the selected sample data and class information.
`RemovedData`	the not-selected sample data and class information.
`ReducedInstances`	the instance numbers of the selected sample data.

Author(s)

Jorn Lotsch

References

Lotsch, J., Malkusch, S., Ultsch, A. (2021): Optimal distribution-preserving downsampling of large biomedical data sets (opdisDownsampling). PLoS One. 2021 Aug 5;16(8):e0255838. doi: 10.1371/journal.pone.0255838. eCollection 2021.

Examples

## example 1
data(iris)
Iris50percent <- opdisDownsampling(Data = iris[,1:4], Cls = as.integer(iris$Species),
  Size = 50, MaxCores = 1)
