Package: DiscreteGapStatistic 0.1.0

Eduardo Cortes

DiscreteGapStatistic: An Extension of the Gap Statistic for Ordinal/Categorical Data

The gap statistic approach is extended to estimate the number of clusters for categorical response format data. This approach and the accompanying software are designed to be used with the output of any clustering algorithm and with distances specifically designed for categorical (i.e., multiple-choice) or ordinal survey response data.
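As a rough illustration of the idea in the description, the sketch below computes a gap statistic for a small categorical matrix using base R only. It is not the package's implementation. The function names (`hamming_dist`, `within_dispersion`, `log_wk_path`, `gap_sketch`), the choice of average-linkage hierarchical clustering, and the reference distribution (each cell drawn uniformly from the observed categories) are all assumptions made for this example.

```r
# Minimal gap-statistic sketch for categorical data (base R only).
# All function names are hypothetical; this is not DiscreteGapStatistic code.

# Hamming distance: number of columns in which two rows differ.
hamming_dist <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(x[i, ] != x[j, ])
    }
  }
  as.dist(d)
}

# Pooled within-cluster dispersion W_k: for each cluster, the sum of all
# pairwise distances (ordered pairs) divided by twice the cluster size.
within_dispersion <- function(d, labels) {
  d <- as.matrix(d)
  sum(vapply(unique(labels), function(cl) {
    idx <- which(labels == cl)
    sum(d[idx, idx]) / (2 * length(idx))
  }, numeric(1)))
}

# log(W_k) for k = 1..K, assuming average-linkage hierarchical clustering.
# log(W_k) is -Inf if every cluster contains only identical rows.
log_wk_path <- function(x, K) {
  d <- hamming_dist(x)
  hc <- hclust(d, method = "average")
  vapply(seq_len(K), function(k) {
    log(within_dispersion(d, cutree(hc, k = k)))
  }, numeric(1))
}

# Gap(k) = mean over B reference data sets of log(W*_k), minus log(W_k).
# Assumed reference distribution: uniform draws over the observed categories.
gap_sketch <- function(x, K = 4, B = 20) {
  x <- as.matrix(x)
  cats <- sort(unique(as.vector(x)))
  obs <- log_wk_path(x, K)
  ref <- replicate(B, {
    draws <- cats[sample.int(length(cats), length(x), replace = TRUE)]
    log_wk_path(matrix(draws, nrow = nrow(x)), K)
  })
  ref <- matrix(ref, nrow = K)  # K rows (one per k), B columns
  rowMeans(ref) - obs
}

# Simulated survey: two groups of 20 respondents, 6 questions, 4 categories.
set.seed(1)
g1 <- matrix(sample(1:4, 20 * 6, replace = TRUE,
                    prob = c(0.45, 0.45, 0.05, 0.05)), nrow = 20)
g2 <- matrix(sample(1:4, 20 * 6, replace = TRUE,
                    prob = c(0.05, 0.05, 0.45, 0.45)), nrow = 20)
x <- rbind(g1, g2)
gap <- gap_sketch(x, K = 4, B = 20)
gap
```

Larger gap values suggest that the clustering at that k is tighter than would be expected under the reference distribution. The package's own criteria for picking k are documented under `findK`.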

Authors: Jeffrey Miecznikowski [aut], Eduardo Cortes [aut, cre]

DiscreteGapStatistic_0.1.0.tar.gz
DiscreteGapStatistic_0.1.0.tar.gz (r-4.5-noble), DiscreteGapStatistic_0.1.0.tar.gz (r-4.4-noble)
DiscreteGapStatistic_0.1.0.tgz (r-4.4-emscripten), DiscreteGapStatistic_0.1.0.tgz (r-4.3-emscripten)
DiscreteGapStatistic.pdf | DiscreteGapStatistic.html
DiscreteGapStatistic/json (API)

# Install 'DiscreteGapStatistic' in R:
install.packages('DiscreteGapStatistic', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/ecortesgomez/discretegapstatistic/issues

Datasets: concussion, mass

1.70 score, 3 scripts, 17 exports, 97 dependencies

Last updated 28 days ago from:9410d3e34a. Checks: OK: 2. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Oct 26 2024
R-4.5-linux | OK | Oct 26 2024

Exports: BhattacharyyaDist, ChisqDist, clusGapDiscr, cramersVmod, CramerV, dissbhattacharyya, disschisquare, disscramerv, disshamming, disshellinger, distanceHeat, distancematrix, findK, HellingerDist, likert.heat.plot2, ResHeatmap, SimData

Dependencies: backports, base64enc, BiocGenerics, bslib, cachem, checkmate, circlize, cli, clue, cluster, codetools, colorspace, combinat, ComplexHeatmap, cpp11, crayon, cultevo, data.table, digest, doParallel, dplyr, evaluate, fansi, farver, fastmap, fontawesome, foreach, foreign, Formula, fs, generics, GetoptLong, ggplot2, GlobalOptions, glue, gridExtra, gtable, highr, Hmisc, htmlTable, htmltools, htmlwidgets, IRanges, isoband, iterators, jquerylib, jsonlite, knitr, labeling, lattice, lifecycle, magrittr, MASS, Matrix, matrixStats, memoise, mgcv, mime, munsell, nlme, nnet, pheatmap, pillar, pkgconfig, plyr, png, Polychrome, pspearman, purrr, R6, rappdirs, RColorBrewer, Rcpp, reshape2, rjson, rlang, rmarkdown, rpart, rstudioapi, S4Vectors, sass, scales, scatterplot3d, shape, stringi, stringr, tibble, tidyr, tidyselect, tinytex, utf8, vctrs, viridis, viridisLite, withr, xfun, yaml

Readme and manuals

Help Manual

Help page | Topics
Bhattacharyya distance core function | BhattacharyyaDist
Chi-square distance core function | ChisqDist
Discrete application of clusGap, based on the implementation of the function found in the `cluster` R package | clusGapDiscr
Concussion Data | concussion
Cramer's V modified pairwise vector function, based on the function found in the lsr package. This is a simple wrapper of the usual chisq.test function. It is an adjusted version of phi = sqrt(Chisq/N), guaranteeing that values lie between 0 (no association) and 1 (association) | cramersVmod
Cramer's V core function | CramerV
Bhattacharyya distance wrapper function | dissbhattacharyya
Chi-square distance wrapper function | disschisquare
Cramer's V distance wrapper function | disscramerv
Hamming distance wrapper function. Based on the cultevo package's implementation | disshamming
Hellinger distance wrapper function | disshellinger
Sample-to-sample heatmap clustering samples according to a given categorical distance. Exploratory tool that helps to visualize/cluster blocks of observations across columns ordered according to a given categorical distance. The final output is a clustered distance matrix. This plot is intended to give the `DiscreteClusGap` user an idea of which type of categorical distance would best suit the input data. `sample2sampleHeat` is based on the `pheatmap` function from the `pheatmap` R package. Thus, any parameter found in pheatmap can be passed to `sample2sampleHeat`. | distanceHeat
Function invoking discrete distance functions | distancematrix
Criteria to determine number of clusters k | findK
Hellinger distance core function | HellingerDist
Summary heatmap for categorical/Likert data. Heatmap representation summarizing categorical/Likert data. Modified version of `likert.heat.plot` from the `likert` package. Does not allow different categorical ranges across questions. The function outputs a ggplot object to which additional layers can be added for customization. The output plot preserves the question order given by the columns of `x`. | likert.heat.plot2
mass data | mass
Heatmap assuming a given distance function and a known number of clusters. Function to display a categorical data matrix given a user-defined number of clusters `nCl`, a categorical distance `distName` and a predefined clustering method `FUNcluster`. The output displays a heatmap separating and color-labelling the resulting clusters vertically in the rows and allowing unsupervised clustering of questions in the columns. Each cell is colored according to the categorical values provided or found in the data. The clustergram is based on the `pheatmap` function from the pheatmap R package. Thus, any parameter found in pheatmap can be passed to `clusGapDiscrHeat`. This function can be used to examine the number of clusters before running `clusGapDiscrHeat`, but also after the number of clusters is determined. | ResHeatmap
Simulate Data | SimData
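
The help entry for `cramersVmod` describes a rescaling of phi = sqrt(Chisq/N) so that values fall between 0 and 1. A minimal sketch of the standard Cramer's V normalization in base R follows. The helper name `cramers_v_sketch` is hypothetical, and the package's own adjustment may differ in detail from the textbook formula used here.

```r
# Standard Cramer's V for two categorical vectors, using base R only.
# `cramers_v_sketch` is an illustrative name, not a package export.
cramers_v_sketch <- function(x, y) {
  tab <- table(x, y)
  # Pearson chi-square statistic without continuity correction;
  # warnings about small expected counts are suppressed for the demo.
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab))
  # Dividing by n * (k - 1) rescales phi = sqrt(chi2 / n) into [0, 1].
  # The result is undefined (NaN) if either vector has a single category.
  unname(sqrt(chi2 / (n * (k - 1))))
}

a <- c("yes", "yes", "no", "no", "yes", "no")
b <- c("agree", "agree", "disagree", "disagree", "agree", "disagree")
cramers_v_sketch(a, b)  # returns 1 for these perfectly associated vectors
```

A dissimilarity between two respondents could then be taken as one minus this value, although whether `disscramerv` does exactly that is not stated on this page.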