No packages match

RODBC - ODBC Database Access

An ODBC database interface.

Last updated

unixodbc

9.91 score 10 stars 32 dependents 4.4k scripts 19k downloads

PMCMRplus - Calculate Pairwise Multiple Comparisons of Mean Rank Sums Extended

For one-way layout experiments the one-way ANOVA can be performed as an omnibus test. All-pairs multiple comparisons tests (Tukey-Kramer test, Scheffe test, LSD-test) and many-to-one tests (Dunnett test) for normally distributed residuals and equal within variance are available. Furthermore, all-pairs tests (Games-Howell test, Tamhane's T2 test, Dunnett T3 test, Ury-Wiggins-Hochberg test) and many-to-one (Tamhane-Dunnett Test) for normally distributed residuals and heterogeneous variances are provided. Van der Waerden's normal scores test for omnibus, all-pairs and many-to-one tests is provided for non-normally distributed residuals and homogeneous variances. The Kruskal-Wallis, BWS and Anderson-Darling omnibus test and all-pairs tests (Nemenyi test, Dunn test, Conover test, Dwass-Steele-Critchlow- Fligner test) as well as many-to-one (Nemenyi test, Dunn test, U-test) are given for the analysis of variance by ranks. Non-parametric trend tests (Jonckheere test, Cuzick test, Johnson-Mehrotra test, Spearman test) are included. In addition, a Friedman-test for one-way ANOVA with repeated measures on ranks (CRBD) and Skillings-Mack test for unbalanced CRBD is provided with consequent all-pairs tests (Nemenyi test, Siegel test, Miller test, Conover test, Exact test) and many-to-one tests (Nemenyi test, Demsar test, Exact test). A trend can be tested with Pages's test. Durbin's test for a two-way balanced incomplete block design (BIBD) is given in this package as well as Gore's test for CRBD with multiple observations per cell is given. Outlier tests, Mandel's k- and h statistic as well as functions for Type I error and Power analysis as well as generic summary, print and plot methods are provided.

Last updated

9.25 score 7 stars 15 dependents 692 scripts 16k downloads

bayesm - Bayesian Inference for Marketing/Micro-Econometrics

Covers many important models used in marketing and micro-econometrics applications. The package includes: Bayes Regression (univariate or multivariate dep var), Bayes Seemingly Unrelated Regression (SUR), Binary and Ordinal Probit, Multinomial Logit (MNL) and Multinomial Probit (MNP), Multivariate Probit, Negative Binomial (Poisson) Regression, Multivariate Mixtures of Normals (including clustering), Dirichlet Process Prior Density Estimation with normal base, Hierarchical Linear Models with normal prior and covariates, Hierarchical Linear Models with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a mixture of normals prior and covariates, Hierarchical Multinomial Logits with a Dirichlet Process prior and covariates, Hierarchical Negative Binomial Regression Models, Bayesian analysis of choice-based conjoint data, Bayesian treatment of linear instrumental variables models, Analysis of Multivariate Ordinal survey data with scale usage heterogeneity (as in Rossi et al, JASA (01)), Bayesian Analysis of Aggregate Random Coefficient Logit Models as in BLP (see Jiang, Manchanda, Rossi 2009) For further reference, consult our book, Bayesian Statistics and Marketing by Rossi, Allenby and McCulloch (Wiley second edition 2024) and Bayesian Non- and Semi-Parametric Methods and Applications (Princeton U Press 2014).

Last updated

openblascpp

8.66 score 21 stars 45 dependents 446 scripts 18k downloads

tree - Classification and Regression Trees

Classification and regression trees.

Last updated

7.61 score 1 stars 16 dependents 5.6k scripts 15k downloads

gss - General Smoothing Splines

A comprehensive package for structural multivariate function estimation using smoothing splines.

Last updated

openblasglibc

7.30 score 3 stars 126 dependents 129 scripts 23k downloads

gee - Generalized Estimation Equation Solver

Generalized Estimation Equation solver.

Last updated

openblas

7.27 score 3 stars 17 dependents 742 scripts 16k downloads

pspline - Penalized Smoothing Splines

Smoothing splines with penalties on order m derivatives.

Last updated

6.90 score 1 stars 105 dependents 283 scripts 9.8k downloads

BGLR - Bayesian Generalized Linear Regression

Bayesian Generalized Linear Regression.

Last updated

openblas

6.62 score 2 stars 4 dependents 788 scripts 2.2k downloads

frailtypack - Shared, Joint (Generalized) Frailty Models; Surrogate Endpoints

The following several classes of frailty models using a penalized likelihood estimation on the hazard function but also a parametric estimation can be fit using this R package: 1) A shared frailty model (with gamma or log-normal frailty distribution) and Cox proportional hazard model. Clustered and recurrent survival times can be studied. 2) Additive frailty models for proportional hazard models with two correlated random effects (intercept random effect with random slope). 3) Nested frailty models for hierarchically clustered data (with 2 levels of clustering) by including two iid gamma random effects. 4) Joint frailty models in the context of the joint modelling for recurrent events with terminal event for clustered data or not. A joint frailty model for two semi-competing risks and clustered data is also proposed. 5) Joint general frailty models in the context of the joint modelling for recurrent events with terminal event data with two independent frailty terms. 6) Joint Nested frailty models in the context of the joint modelling for recurrent events with terminal event, for hierarchically clustered data (with two levels of clustering) by including two iid gamma random effects. 7) Multivariate joint frailty models for two types of recurrent events and a terminal event. 8) Joint models for longitudinal data and a terminal event. 9) Trivariate joint models for longitudinal data, recurrent events and a terminal event. 10) Joint frailty models for the validation of surrogate endpoints in multiple randomized clinical trials with failure-time and/or longitudinal endpoints with the possibility to use a mediation analysis model. 11) Conditional and Marginal two-part joint models for longitudinal semicontinuous data and a terminal event. 12) Joint frailty-copula models for the validation of surrogate endpoints in multiple randomized clinical trials with failure-time endpoints. 13) Generalized shared and joint frailty models for recurrent and terminal events. Proportional hazards (PH), additive hazard (AH), proportional odds (PO) and probit models are available in a fully parametric framework. For PH and AH models, it is possible to consider type-varying coefficients and flexible semiparametric hazard function. Prediction values are available (for a terminal event or for a new recurrent event). Left-truncated (not for Joint model), right-censored data, interval-censored data (only for Cox proportional hazard and shared frailty model) and strata are allowed. In each model, the random effects have the gamma or normal distribution. Now, you can also consider time-varying covariates effects in Cox, shared and joint frailty models (1-5). The package includes concordance measures for Cox proportional hazards models and for shared frailty models. 14) Competing Joint Frailty Model: A single type of recurrent event and two terminal events. 15) functions to compute power and sample size for four Gamma-frailty-based designs: Shared Frailty Models, Nested Frailty Models, Joint Frailty Models, and General Joint Frailty Models. Each design includes two primary functions: a power function, which computes power given a specified sample size; and a sample size function, which computes the required sample size to achieve a specified power. 16) Weibull Illness-Death model with or without shared frailty between transitions. Left-truncated and right-censored data are allowed. 17) Weibull Competing risks model with or without shared frailty between the transitions. Left-truncated and right-censored data are allowed. Moreover, the package can be used with its shiny application, in a local mode or by following the link below.

Last updated

fortranglibcopenmp

6.21 score 7 stars 2 dependents 114 scripts 1.7k downloads

sae - Small Area Estimation

Functions for small area estimation.

Last updated

6.21 score 6 stars 9 dependents 109 scripts 4.6k downloads

ash - David Scott's ASH Routines

David Scott's ASH routines ported from S-PLUS to R.

Last updated

6.13 score 193 dependents 76 scripts 15k downloads

fds - Functional Data Sets

Functional data sets.

Last updated

6.05 score 1 stars 163 dependents 127 scripts 18k downloads

matlab - 'MATLAB' Emulation Package

Emulate 'MATLAB' code using 'R'.

Last updated

5.88 score 18 dependents 645 scripts 2.2k downloads

som - Self-Organizing Map

Self-Organizing Map (with application in gene clustering).

Last updated

cpp

5.30 score 10 dependents 76 scripts 9.8k downloads

frbs - Fuzzy Rule-Based Systems for Classification and Regression Tasks

An implementation of various learning algorithms based on fuzzy rule-based systems (FRBSs) for dealing with classification and regression tasks. Moreover, it allows to construct an FRBS model defined by human experts. FRBSs are based on the concept of fuzzy sets, proposed by Zadeh in 1965, which aims at representing the reasoning of human experts in a set of IF-THEN rules, to handle real-life problems in, e.g., control, prediction and inference, data mining, bioinformatics data processing, and robotics. FRBSs are also known as fuzzy inference systems and fuzzy models. During the modeling of an FRBS, there are two important steps that need to be conducted: structure identification and parameter estimation. Nowadays, there exists a wide variety of algorithms to generate fuzzy IF-THEN rules automatically from numerical data, covering both steps. Approaches that have been used in the past are, e.g., heuristic procedures, neuro-fuzzy techniques, clustering methods, genetic algorithms, squares methods, etc. Furthermore, in this version we provide a universal framework named 'frbsPMML', which is adopted from the Predictive Model Markup Language (PMML), for representing FRBS models. PMML is an XML-based language to provide a standard for describing models produced by data mining and machine learning algorithms. Therefore, we are allowed to export and import an FRBS model to/from 'frbsPMML'. Finally, this package aims to implement the most widely used standard procedures, thus offering a standard package for FRBS modeling to the R community.

Last updated

5.28 score 12 stars 84 scripts 4.7k downloads

GSA - Gene Set Analysis

Gene Set Analysis.

Last updated

5.19 score 8 dependents 357 scripts 1.8k downloads

Directional - A Collection of Functions for Directional Data Analysis

A collection of functions for directional data (including massive data, with millions of observations) analysis. Hypothesis testing, discriminant and regression analysis, MLE of distributions and more are included. The standard textbook for such data is the "Directional Statistics" by Mardia, K. V. and Jupp, P. E. (2000). Other references include: a) Paine J.P., Preston S.P., Tsagris M. and Wood A.T.A. (2018). "An elliptically symmetric angular Gaussian distribution". Statistics and Computing 28(3): 689-697. <doi:10.1007/s11222-017-9756-4>. b) Tsagris M. and Alenazi A. (2019). "Comparison of discriminant analysis methods on the sphere". Communications in Statistics: Case Studies, Data Analysis and Applications 5(4):467--491. <doi:10.1080/23737484.2019.1684854>. c) Paine J.P., Preston S.P., Tsagris M. and Wood A.T.A. (2020). "Spherical regression models with general covariates and anisotropic errors". Statistics and Computing 30(1): 153--165. <doi:10.1007/s11222-019-09872-2>. d) Tsagris M. and Alenazi A. (2024). "An investigation of hypothesis testing procedures for circular and spherical mean vectors". Communications in Statistics-Simulation and Computation, 53(3): 1387--1408. <doi:10.1080/03610918.2022.2045499>. e) Yu Z. and Huang X. (2024). A new parameterization for elliptically symmetric angular Gaussian distributions of arbitrary dimension. Electronic Journal of Statistics, 18(1): 301--334. <doi:10.1214/23-EJS2210>. f) Tsagris M. and Alzeley O. (2025). "Circular and spherical projected Cauchy distributions: A Novel Framework for Circular and Directional Data Modeling". Australian & New Zealand Journal of Statistics, 67(1): 77--103. <doi:10.1111/anzs.12434>. g) Tsagris M., Papastamoulis P. and Kato S. (2025). "Directional data analysis: spherical Cauchy or Poisson kernel-based distribution". Statistics and Computing, 35:51. <doi:10.1007/s11222-025-10583-0>. h) Alzeley O. and Tsagris (2026). "On the generalized circular projected Cauchy distribution". Mathematics, 14(11): 1934. <doi:10.3390/math14111934>.

Last updated

5.11 score 3 stars 5 dependents 132 scripts 1.1k downloads

Compositional - Compositional Data Analysis

Regression, classification, contour plots, hypothesis testing and fitting of distributions for compositional data are some of the functions included. We further include functions for percentages (or proportions). The standard textbook for such data is John Aitchison's (1986) "The statistical analysis of compositional data". Relevant papers include: a) Tsagris M.T., Preston S. and Wood A.T.A. (2011). "A data--based power transformation for compositional data". Fourth International International Workshop on Compositional Data Analysis. <doi:10.48550/arXiv.1106.1451>. b) Tsagris M. (2014). "The k--NN algorithm for compositional data: a revised approach with and without zero values present". Journal of Data Science, 12(3): 519--534. <doi:10.6339/JDS.201407_12(3).0008>. c) Tsagris M. (2015). "A novel, divergence based, regression for compositional data". Proceedings of the 28th Panhellenic Statistics Conference, 15-18 April 2015, Athens, Greece, 430--444. <doi:10.48550/arXiv.1511.07600>. d) Tsagris M. (2015). "Regression analysis with compositional data containing zero values". Chilean Journal of Statistics, 6(2): 47--57. <https://soche.cl/chjs/volumes/06/02/Tsagris(2015).pdf>. e) Tsagris M., Preston S. and Wood A.T.A. (2016). "Improved supervised classification for compositional data using the alpha-transformation". Journal of Classification, 33(2): 243--261. <doi:10.1007/s00357-016-9207-5>. f) Tsagris M., Preston S. and Wood A.T.A. (2017). "Nonparametric hypothesis testing for equality of means on the simplex". Journal of Statistical Computation and Simulation, 87(2): 406--422. <doi:10.1080/00949655.2016.1216554>. g) Tsagris M. and Stewart C. (2018). "A Dirichlet regression model for compositional data with zeros". Lobachevskii Journal of Mathematics, 39(3): 398--412. <doi:10.1134/S1995080218030198>. h) Alenazi A. (2019). "Regression for compositional data with compositional data as predictor variables with or without zero values". Journal of Data Science, 17(1): 219--238. <doi:10.6339/JDS.201901_17(1).0010>. i) Tsagris M. and Stewart C. (2020). "A folded model for compositional data analysis". Australian and New Zealand Journal of Statistics, 62(2): 249--277. <doi:10.1111/anzs.12289>. j) Alenazi A.A. (2022). "f--divergence regression models for compositional data". Pakistan Journal of Statistics and Operation Research, 18(4): 867--882. <doi:10.18187/pjsor.v18i4.3969>. k) Tsagris M. and Stewart C. (2022). "A Review of Flexible Transformations for Modeling Compositional Data". In Advances and Innovations in Statistics and Data Science, pp. 225--234. <doi:10.1007/978-3-031-08329-7_10>. l) Alenazi A. (2023). "A review of compositional data analysis and recent advances". Communications in Statistics--Theory and Methods, 52(16): 5535--5567. <doi:10.1080/03610926.2021.2014890>. m) Tsagris M., Alenazi A. and Stewart C. (2023). "Flexible non--parametric regression models for compositional response data with zeros". Statistics and Computing, 33(106). <doi:10.1007/s11222-023-10277-5>. n) Tsagris. M. (2025). "Constrained least squares simplicial--simplicial regression". Statistics and Computing, 35(27). <doi:10.1007/s11222-024-10560-z>. o) Sevinc V. and Tsagris. M. (2026). "Energy Based Equality of Distributions Testing for Compositional Data". Communications in Statistics--Simulation and Computation. <doi:10.1080/03610918.2026.2636167>. p) Tsagris M. and Alzeley O. (2025). "Scalable approximation of the transformation--free linear simplicial--simplicial regression via constrained iterative reweighted least squares". <doi:10.48550/arXiv.2511.13296>.

Last updated

5.07 score 3 stars 9 dependents 153 scripts 1.4k downloads

pbs - Periodic B Splines

Periodic B Splines Basis

Last updated

4.67 score 29 dependents 34 scripts 5.3k downloads

conf - Visualization and Analysis of Statistical Measures of Confidence

Enables: (1) plotting two-dimensional confidence regions, (2) coverage analysis of confidence region simulations, (3) calculating confidence intervals and the associated actual coverage for binomial proportions, (4) calculating the support values and the probability mass function of the Kaplan-Meier product-limit estimator, and (5) plotting the actual coverage function associated with a confidence interval for the survivor function from a randomly right-censored data set. Each is given in greater detail next. (1) Plots the two-dimensional confidence region for probability distribution parameters (supported distribution suffixes: cauchy, gamma, invgauss, logis, llogis, lnorm, norm, unif, weibull) corresponding to a user-given complete or right-censored dataset and level of significance. The crplot() algorithm plots more points in areas of greater curvature to ensure a smooth appearance throughout the confidence region boundary. An alternative heuristic plots a specified number of points at roughly uniform intervals along its boundary. Both heuristics build upon the radial profile log-likelihood ratio technique for plotting confidence regions given by Jaeger (2016) <doi:10.1080/00031305.2016.1182946>, and are detailed in a publication by Weld et al. (2019) <doi:10.1080/00031305.2018.1564696>. (2) Performs confidence region coverage simulations for a random sample drawn from a user- specified parametric population distribution, or for a user-specified dataset and point of interest with coversim(). (3) Calculates confidence interval bounds for a binomial proportion with binomTest(), calculates the actual coverage with binomTestCoverage(), and plots the actual coverage with binomTestCoveragePlot(). Calculates confidence interval bounds for the binomial proportion using an ensemble of constituent confidence intervals with binomTestEnsemble(). Calculates confidence interval bounds for the binomial proportion using a complete enumeration of all possible transitions from one actual coverage acceptance curve to another which minimizes the root mean square error for n <= 15 and follows the transitions for well-known confidence intervals for n > 15 using binomTestMSE(). (4) The km.support() function calculates the support values of the Kaplan-Meier product-limit estimator for a given sample size n using an induction algorithm described in Qin et al. (2023) <doi:10.1080/00031305.2022.2070279>. The km.outcomes() function generates a matrix containing all possible outcomes (all possible sequences of failure times and right-censoring times) of the value of the Kaplan-Meier product-limit estimator for a particular sample size n. The km.pmf() function generates the probability mass function for the support values of the Kaplan-Meier product-limit estimator for a particular sample size n, probability of observing a failure h at the time of interest expressed as the cumulative probability percentile associated with X = min(T, C), where T is the failure time and C is the censoring time under a random-censoring scheme. The km.surv() function generates multiple probability mass functions of the Kaplan-Meier product-limit estimator for the same arguments as those given for km.pmf(). (5) The km.coverage() function plots the actual coverage function associated with a confidence interval for the survivor function from a randomly right-censored data set for one or more of the following confidence intervals: Greenwood, log-minus-log, Peto, arcsine, and exponential Greenwood. The actual coverage function is plotted for a small number of items on test, stated coverage, failure rate, and censoring rate. The km.coverage() function can print an optional table containing all possible failure/censoring orderings, along with their contribution to the actual coverage function.

Last updated

4.39 score 87 scripts 257 downloads

ars - Adaptive Rejection Sampling

Adaptive Rejection Sampling, Original version.

Last updated

cpp

4.30 score 7 dependents 65 scripts 15k downloads

GB2 - Generalized Beta Distribution of the Second Kind: Properties, Likelihood, Estimation

The GB2 package explores the Generalized Beta distribution of the second kind. Density, cumulative distribution function, quantiles and moments of the distribution are given. Functions for the full log-likelihood, the profile log-likelihood and the scores are provided. Formulas for various indicators of inequality and poverty under the GB2 are implemented. The GB2 is fitted by the methods of maximum pseudo-likelihood estimation using the full and profile log-likelihood, and non-linear least squares estimation of the model parameters. Various plots for the visualization and analysis of the results are provided. Variance estimation of the parameters is provided for the method of maximum pseudo-likelihood estimation. A mixture distribution based on the compounding property of the GB2 is presented (denoted as "compound" in the documentation). This mixture distribution is based on the discretization of the distribution of the underlying random scale parameter. The discretization can be left or right tail. Density, cumulative distribution function, moments and quantiles for the mixture distribution are provided. The compound mixture distribution is fitted using the method of maximum pseudo-likelihood estimation. The fit can also incorporate the use of auxiliary information. In this new version of the package, the mixture case is complemented with new functions for variance estimation by linearization and comparative density plots.

Last updated

3.92 score 1 stars 3 dependents 97 scripts 3.2k downloads

tcl - Testing in Conditional Likelihood Context

An implementation of hypothesis testing in an extended Rasch modeling framework, including sample size planning procedures and power computations. Provides 4 statistical tests, i.e., gradient test (GR), likelihood ratio test (LR), Rao score or Lagrange multiplier test (RS), and Wald test, for testing a number of hypotheses referring to the Rasch model (RM), linear logistic test model (LLTM), rating scale model (RSM), and partial credit model (PCM). Three types of functions for power and sample size computations are provided. Firstly, functions to compute the sample size given a user-specified (predetermined) deviation from the hypothesis to be tested, the level alpha, and the power of the test. Secondly, functions to evaluate the power of the tests given a user-specified (predetermined) deviation from the hypothesis to be tested, the level alpha of the test, and the sample size. Thirdly, functions to evaluate the so-called post hoc power of the tests. This is the power of the tests given the observed deviation of the data from the hypothesis to be tested and a user-specified level alpha of the test. Power and sample size computations are based on a Monte Carlo simulation approach. It is computationally very efficient. The variance of the random error in computing power and sample size arising from the simulation approach is analytically derived by using the delta method. Additionally, functions to compute the power of the tests as a function of an effect measure interpreted as explained variance are provided. Draxler, C., & Alexandrowicz, R. W. (2015), <doi:10.1007/s11336-015-9472-y>.

Last updated

3.70 score 5 scripts 180 downloads

FuzzySTs - Fuzzy Statistical Tools

The main goal of this package is to present various fuzzy statistical tools. It intends to provide an implementation of the theoretical and empirical approaches presented in the book entitled "The signed distance measure in fuzzy statistical analysis. Some theoretical, empirical and programming advances" <doi: 10.1007/978-3-030-76916-1>. For the theoretical approaches, see Berkachy R. and Donze L. (2019) <doi:10.1007/978-3-030-03368-2_1>. For the empirical approaches, see Berkachy R. and Donze L. (2016) <ISBN: 978-989-758-201-1>). Important (non-exhaustive) implementation highlights of this package are as follows: (1) a numerical procedure to estimate the fuzzy difference and the fuzzy square. (2) two numerical methods of fuzzification. (3) a function performing different possibilities of distances, including the signed distance and the generalized signed distance for instance with all its properties. (4) numerical estimations of fuzzy statistical measures such as the variance, the moment, etc. (5) two methods of estimation of the bootstrap distribution of the likelihood ratio in the fuzzy context. (6) an estimation of a fuzzy confidence interval by the likelihood ratio method. (7) testing fuzzy hypotheses and/or fuzzy data by fuzzy confidence intervals in the Kwakernaak - Kruse and Meyer sense. (8) a general method to estimate the fuzzy p-value with fuzzy hypotheses and/or fuzzy data. (9) a method of estimation of global and individual evaluations of linguistic questionnaires. (10) numerical estimations of multi-ways analysis of variance models in the fuzzy context. The unbalance in the considered designs are also foreseen.

Last updated

3.44 score 11 scripts 184 downloads

cmaes - Covariance Matrix Adapting Evolutionary Strategy

Single objective optimization using a CMA-ES.

Last updated

3.32 score 1 stars 4 dependents 34 scripts 5.1k downloads

convergenceDFM - Convergence and Dynamic Factor Models

Tests convergence in macro-financial panels combining Dynamic Factor Models (DFM) and mean-reverting, discrete-time Ornstein-Uhlenbeck/AR(1) factor processes. Provides: (i) static factor extraction with VAR stability checks, Portmanteau tests and rolling out-of-sample R^2, in the spirit of Stock and Watson (2002) <doi:10.1198/073500102317351921> and the Generalized Dynamic Factor Model of Forni, Hallin, Lippi and Reichlin (2000) <doi:10.1162/003465300559037>; (ii) cointegration analysis a la Johansen (1988) <doi:10.1016/0165-1889(88)90041-3>; (iii) Bayesian factor-OU/AR(1) estimation with convergence and half-life summaries grounded in Uhlenbeck and Ornstein (1930) <doi:10.1103/PhysRev.36.823> and Vasicek (1977) <doi:10.1016/0304-405X(77)90016-2>, with full Markov chain Monte Carlo convergence diagnostics; (iv) heteroskedasticity-consistent (HC) and, when the suggested 'sandwich' (Zeileis (2004) <doi:10.18637/jss.v011.i10>) and 'lmtest' packages are available, heteroskedasticity- and autocorrelation- consistent (HAC) robust inference, with a self-contained HC fallback; (v) coupling significance tests based on time-shift / block-bootstrap nulls that preserve marginal dynamics while breaking cross-series dependence; and (vi) optional PLS-based factor preselection (Mevik and Wehrens (2007) <doi:10.18637/jss.v018.i02>). Functions emphasize reproducibility (explicit seeds throughout) and clear, publication-ready summaries.

Last updated

3.30 score 1 stars 4 scripts 223 downloads

ashapesampler - Generating Alpha Shapes

Understanding morphological variation is an important task in many applications. Recent studies in computational biology have focused on developing computational tools for the task of sub-image selection which aims at identifying structural features that best describe the variation between classes of shapes. A major part in assessing the utility of these approaches is to demonstrate their performance on both simulated and real datasets. However, when creating a model for shape statistics, real data can be difficult to access and the sample sizes for these data are often small due to them being expensive to collect. Meanwhile, the landscape of current shape simulation methods has been mostly limited to approaches that use black-box inference---making it difficult to systematically assess the power and calibration of sub-image models. In this R package, we introduce the alpha-shape sampler: a probabilistic framework for simulating realistic 2D and 3D shapes based on probability distributions which can be learned from real data or explicitly stated by the user. The 'ashapesampler' package supports two mechanisms for sampling shapes in two and three dimensions. The first, empirically sampling based on an existing data set, was highlighted in the original main text of the paper. The second, probabilistic sampling from a known distribution, is the computational implementation of the theory derived in that paper. Work based on Winn-Nunez et al. (2024) <doi:10.1101/2024.01.09.574919>.

Last updated

3.30 score 8 scripts 198 downloads

synchronicity - Boost Mutex Functionality in R

Boost mutex functionality in R.

Last updated

cpp

3.18 score 1 dependents 24 scripts 4.2k downloads

climatehealth - Statistical Tools for Modelling Climate-Health Impacts

Tools for producing climate-health indicators and supporting official statistics from health and climate data. Implements analytical workflows for temperature-related mortality, wildfire smoke exposure, air pollution, suicides related to extreme heat, malaria, and diarrhoeal disease outcomes, with utilities for descriptive statistics, model validation, attributable fraction and attributable number estimation, relative risk estimation, minimum mortality temperature estimation, and plotting for reporting. These six indicators are endorsed by the United Nations Statistical Commission for inclusion in the Global Set of Environment and Climate Change Statistics. Implemented methods include distributed lag non-linear models (DLNM), quasi-Poisson time-series regression, case-crossover analysis, Bayesian spatio-temporal models using the Integrated Nested Laplace Approximation ('INLA'), and multivariate meta-analysis for sub-national estimates. The package is based on methods developed in the Standards for Official Statistics on Climate-Health Interactions (SOSCHI) project <https://climate-health.officialstatistics.org>. For methodologies, see Watkins et al. (2026) <doi:10.5281/zenodo.14865904>, Jose et al. (2026) <doi:10.5281/zenodo.14052183>, Pearce et al. (2026) <doi:10.5281/zenodo.14050224>, Byukusenge et al. (2026) <doi:10.5281/zenodo.15585042>, Dzakpa et al. (2026) <doi:10.5281/zenodo.14881886>, and Dzakpa et al. (2026) <doi:10.5281/zenodo.14871506>.

Last updated

3.18 score 10 scripts 493 downloads

bgw - Bunch-Gay-Welsch Statistical Estimation

Performs statistical estimation and inference-related computations by accessing and executing modified versions of 'Fortran' subroutines originally published in the Association for Computing Machinery (ACM) journal Transactions on Mathematical Software (TOMS) by Bunch, Gay and Welsch (1993) <doi:10.1145/151271.151279>. The acronym 'BGW' (from the authors' last names) will be used when making reference to technical content (e.g., algorithm, methodology) that originally appeared in ACM TOMS. A key feature of BGW is that it exploits the special structure of statistical estimation problems within a trust-region-based optimization approach to produce an estimation algorithm that is much more effective than the usual practice of using optimization methods and codes originally developed for general optimization. The 'bgw' package bundles 'R' wrapper (and related) functions with modified 'Fortran' source code so that it can be compiled and linked in the 'R' environment for fast execution. This version implements a function ('bgw_mle.R') that performs maximum likelihood estimation (MLE) for a user-provided model object that computes probabilities (a.k.a. probability densities). The original motivation for producing this package was to provide fast, efficient, and reliable MLE for discrete choice models that can be called from the 'Apollo' choice modelling 'R' package ( see <https://www.apollochoicemodelling.com>). Starting with the release of Apollo 3.0, BGW is the default estimation package. However, estimation can also be performed using BGW in a stand-alone fashion without using 'Apollo' (as shown in simple examples included in the package). Note also that BGW capabilities are not limited to MLE, and future extension to other estimators (e.g., nonlinear least squares, generalized method of moments, etc.) is possible. The 'Fortran' code included in 'bgw' was modified by one of the original BGW authors (Bunch) under his rights as confirmed by direct consultation with the ACM Intellectual Property and Rights Manager. See <https://authors.acm.org/author-resources/author-rights>. The main requirement is clear citation of the original publication (see above).

Last updated

fortranglibc

3.13 score 2 dependents 4 scripts 2.2k downloads

glmnetr - Nested Cross Validation for the Relaxed Lasso and Other Machine Learning Models

Cross validation informed Relaxed LASSO (or more generally elastic net), gradient boosting machine ('xgboost'), Random Forest ('RandomForestSRC'), Oblique Random Forest ('aorsf'), Artificial Neural Network (ANN), Recursive Partitioning ('RPART') or step wise regression models are fit. Cross validation leave out samples (leading to nested cross validation) or bootstrap out-of-bag samples are used to evaluate and compare performances between these models with results presented in tabular or graphical means. Calibration plots can also be generated, again based upon (outer nested) cross validation or bootstrap leave out (out of bag) samples. Note, at the time of this writing, in order to fit gradient boosting machine models one must install the packages 'DiceKriging' and 'rgenoud' using the install.packages() function. For some datasets, for example when the design matrix is not of full rank, 'glmnet' may have very long run times when fitting the relaxed lasso model, from our experience when fitting Cox models on data with many predictors and many patients, making it difficult to get solutions from either glmnet() or cv.glmnet(). This may be remedied by using the 'path=TRUE' option when calling glmnet() and cv.glmnet(). Within the 'glmnetr' package the approach of path=TRUE is taken by default. other packages doing similar include 'nestedcv' <https://cran.r-project.org/package=nestedcv>, 'glmnetSE' <https://cran.r-project.org/package=glmnetSE> which may provide different functionality when performing a nested CV. Use of the 'glmnetr' has many similarities to the 'glmnet' package and it could be helpful for the user of 'glmnetr' also become familiar with the 'glmnet' package <https://cran.r-project.org/package=glmnet>, with the "An Introduction to 'glmnet'" and "The Relaxed Lasso" being especially useful in this regard.

Last updated

3.08 score 2 scripts 825 downloads

funMoDisco - Motif Discovery in Functional Data

Efficiently implementing two complementary methodologies for discovering motifs in functional data: ProbKMA and FunBIalign. Cremona and Chiaromonte (2023) "Probabilistic K-means with Local Alignment for Clustering and Motif Discovery in Functional Data" <doi:10.1080/10618600.2022.2156522> is a probabilistic K-means algorithm that leverages local alignment and fuzzy clustering to identify recurring patterns (candidate functional motifs) across and within curves, allowing different portions of the same curve to belong to different clusters. It includes a family of distances and a normalization to discover various motif types and learns motif lengths in a data-driven manner. It can also be used for local clustering of misaligned data. Di Iorio, Cremona, and Chiaromonte (2023) "funBIalign: A Hierarchical Algorithm for Functional Motif Discovery Based on Mean Squared Residue Scores" <doi:10.48550/arXiv.2306.04254> applies hierarchical agglomerative clustering with a functional generalization of the Mean Squared Residue Score to identify motifs of a specified length in curves. This deterministic method includes a small set of user-tunable parameters. Both algorithms are suitable for single curves or sets of curves. The package also includes a flexible function to simulate functional data with embedded motifs, allowing users to generate benchmark datasets for validating and comparing motif discovery methods.

Last updated

openblascppopenmp

3.00 score 1 scripts 569 downloads

NST - Normalized Stochasticity Ratio

To estimate ecological stochasticity in community assembly. Understanding the community assembly mechanisms controlling biodiversity patterns is a central issue in ecology. Although it is generally accepted that both deterministic and stochastic processes play important roles in community assembly, quantifying their relative importance is challenging. The new index, normalized stochasticity ratio (NST), is to estimate ecological stochasticity, i.e. relative importance of stochastic processes, in community assembly. With functions in this package, NST can be calculated based on different similarity metrics and/or different null model algorithms, as well as some previous indexes, e.g. previous Stochasticity Ratio (ST), Standard Effect Size (SES), modified Raup-Crick metrics (RC). Functions for permutational test and bootstrapping analysis are also included. Previous ST is published by Zhou et al (2014) <doi:10.1073/pnas.1324044111>. NST is modified from ST by considering two alternative situations and normalizing the index to range from 0 to 1 (Ning et al 2019) <doi:10.1073/pnas.1904623116>. A modified version, MST, is a special case of NST, used in some recent or upcoming publications, e.g. Liang et al (2020) <doi:10.1016/j.soilbio.2020.108023>. SES is calculated as described in Kraft et al (2011) <doi:10.1126/science.1208584>. RC is calculated as reported by Chase et al (2011) <doi:10.1890/ES10-00117.1> and Stegen et al (2013) <doi:10.1038/ismej.2013.93>. Version 3 added NST based on phylogenetic beta diversity, used by Ning et al (2020) <doi:10.1038/s41467-020-18560-z>.

Last updated

3.00 score 2 stars 50 scripts 575 downloads

hiphop - Parentage Assignment using Bi-Allelic Genetic Markers

Can be used for paternity and maternity assignment and outperforms conventional methods where closely related individuals occur in the pool of possible parents. The method compares the genotypes of offspring with any combination of potentials parents and scores the number of mismatches of these individuals at bi-allelic genetic markers (e.g. Single Nucleotide Polymorphisms). It elaborates on a prior exclusion method based on the Homozygous Opposite Test (HOT; Huisman 2017 <doi:10.1111/1755-0998.12665>) by introducing the additional exclusion criterion HIPHOP (Homozygous Identical Parents, Heterozygous Offspring are Precluded; Cockburn et al., in revision). Potential parents are excluded if they have more mismatches than can be expected due to genotyping error and mutation, and thereby one can identify the true genetic parents and detect situations where one (or both) of the true parents is not sampled. Package 'hiphop' can deal with (a) the case where there is contextual information about parentage of the mother (i.e. a female has been seen to be involved in reproductive tasks such as nest building), but paternity is unknown (e.g. due to promiscuity), (b) where both parents need to be assigned, because there is no contextual information on which female laid eggs and which male fertilized them (e.g. polygynandrous mating system where multiple females and males deposit young in a common nest, or organisms with external fertilisation that breed in aggregations). For details: Cockburn, A., Penalba, J.V.,Jaccoud, D.,Kilian, A., Brouwer, L., Double, M.C., Margraf, N., Osmond, H.L., van de Pol, M. and Kruuk, L.E.B. (in revision). HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for bi-allelic markers. Molecular Ecology Resources, DOI to be added upon acceptance.

Last updated

2.78 score 1 stars 12 scripts 123 downloads

ordinalTables - Fit Models to Two-Way Tables with Correlated Ordered Response Categories

Fit a variety of models to two-way tables with ordered categories. Most of the models are appropriate to apply to tables of that have correlated ordered response categories. There is a particular interest in rater data and models for rescore tables. Some utility functions (e.g., Cohen's kappa and weighted kappa) support more general work on rater agreement. Because the names of the models are very similar, the functions that implement them are organized by last name of the primary author of the article or book that suggested the model, with the name of the function beginning with that author's name and an underscore. This may make some models more difficult to locate if one doesn't have the original sources. The vignettes and tests can help to locate models of interest. For more dertaiils see the following references: Agresti, A. (1983) <doi:10.1016/0167-7152(83)90051-2> "A Simple Diagonals-Parameter Symmetry And Quasi-Symmetry Model", Agrestim A. (1983) <doi:10.2307/2531022> "Testing Marginal Homogeneity for Ordinal Categorical Variables", Agresti, A. (1988) <doi:10.2307/2531866> "A Model For Agreement Between Ratings On An Ordinal Scale", Agresti, A. (1989) <doi:10.1016/0167-7152(89)90104-1> "An Agreement Model With Kappa As Parameter", Agresti, A. (2010 ISBN:978-0470082898) "Analysis Of Ordinal Categorical Data", Bhapkar, V. P. (1966) <doi:10.1080/01621459.1966.10502021> "A Note On The Equivalence Of Two Test Criteria For Hypotheses In Categorical Data", Bhapkar, V. P. (1979) <doi:10.2307/2530344> "On Tests Of Marginal Symmetry And Quasi-Symmetry In Two And Three-Dimensional Contingency Tables", Bowker, A. H. (1948) <doi:10.2307/2280710> "A Test For Symmetry In Contingency Tables", Clayton, D. G. (1974) <doi:10.2307/2335638> "Some Odds Ratio Statistics For The Analysis Of Ordered Categorical Data", Cliff, N. (1993) <doi:10.1037/0033-2909.114.3.494> "Dominance Statistics: Ordinal Analyses To Answer Ordinal Questions", Cliff, N. (1996 ISBN:978-0805813333) "Ordinal Methods For Behavioral Data Analysis", Goodman, L. A. (1979) <doi:10.1080/01621459.1979.10481650> "Simple Models For The Analysis Of Association In Cross-Classifications Having Ordered Categories", Goodman, L. A. (1979) <doi:10.2307/2335159> "Multiplicative Models For Square Contingency Tables With Ordered Categories", Ireland, C. T., Ku, H. H., & Kullback, S. (1969) <doi:10.2307/2286071> "Symmetry And Marginal Homogeneity Of An r × r Contingency Table", Ishi-kuntz, M. (1994 ISBN:978-0803943766) "Ordinal Log-linear Models", McCullah, P. (1977) <doi:10.2307/2345320> "A Logistic Model For Paired Comparisons With Ordered Categorical Data", McCullagh, P. (1978) <doi:10.2307/2335224> A Class Of Parametric Models For The Analysis Of Square Contingency Tables With Ordered Categories", McCullagh, P. (1980) <doi:10.1111/j.2517-6161.1980.tb01109.x> "Regression Models For Ordinal Data", Penn State: Eberly College of Science (undated) <https://online.stat.psu.edu/stat504/lesson/11> "Stat 504: Analysis of Discrete Data, 11. Advanced Topics I", Schuster, C. (2001) <doi:10.3102/10769986026003331> "Kappa As A Parameter Of A Symmetry Model For Rater Agreement", Shoukri, M. M. (2004 ISBN:978-1584883210). "Measures Of Interobserver Agreement", Stuart, A. (1953) <doi:10.2307/2333101> "The Estimation Of And Comparison Of Strengths Of Association In Contingency Tables", Stuart, A. (1955) <doi:10.2307/2333387> "A Test For Homogeneity Of The Marginal Distributions In A Two-Way Classification", von Eye, A., & Mun, E. Y. (2005 ISBN:978-0805849677) "Analyzing Rater Agreement: Manifest Variable Methods".

Last updated

2.70 score 141 downloads

PytrendsLongitudinalR - Create Longitudinal Google Trends Data

'Google Trends' provides cross-sectional and time-series data on searches, but lacks readily available longitudinal data. Researchers, who want to create longitudinal 'Google Trends' on their own, face practical challenges, such as normalized counts that make it difficult to combine cross-sectional and time-series data and limitations in data formats and timelines that limit data granularity over extended time periods. This package addresses these issues and enables researchers to generate longitudinal 'Google Trends' data. This package is built on 'pytrends', a Python library that acts as the unofficial 'Google Trends API' to collect 'Google Trends' data. As long as the 'Google Trends API', 'pytrends' and all their dependencies are working, this package will work. During testing, we noticed that for the same input (keyword, topic, data_format, timeline), the output index can vary from time to time. Besides, if the keyword is not very popular, then the resulting dataset will contain a lot of zeros, which will greatly affect the final result. While this package has no control over the accuracy or quality of 'Google Trends' data, once the data is created, this package coverts it to longitudinal data. In addition, the user may encounter a 429 Too Many Requests error when using cross_section() and time_series() to collect 'Google Trends' data. This error indicates that the user has exceeded the rate limits set by the 'Google Trends API'. For more information about the 'Google Trends API' - 'pytrends', visit <https://pypi.org/project/pytrends/>.

Last updated

2.70 score 3 scripts 133 downloads

wconf - Weighted Confusion Matrix

Allows users to create weighted confusion matrices and accuracy metrics that help with the model selection process for classification problems, where distance from the correct category is important. The package includes several weighting schemes which can be parameterized, as well as custom configuration options. Furthermore, users can decide whether they wish to positively or negatively affect the accuracy score as a result of applying weights to the confusion matrix. Functions are included to calculate accuracy metrics for imbalanced data. Finally, 'wconf' integrates well with the 'caret' package, but it can also work standalone when provided data in matrix form. References: Kuhn, M. (2008) "Building Perspective Models in R Using the caret Package" <doi:10.18637/jss.v028.i05> Monahov, A. (2021) "Model Evaluation with Weighted Threshold Optimization (and the mewto R package)" <doi:10.2139/ssrn.3805911> Monahov, A. (2024) "Improved Accuracy Metrics for Classification with Imbalanced Data and Where Distance from the Truth Matters, with the wconf R Package" <doi:10.2139/ssrn.4802336> Starovoitov, V., Golub, Y. (2020). New Function for Estimating Imbalanced Data Classification Results. Pattern Recognition and Image Analysis, 295–302 Van de Velden, M., Iodice D'Enza, A., Markos, A., Cavicchia, C. (2023) "A general framework for implementing distances for categorical variables" <doi:10.48550/arXiv.2301.02190>.

Last updated

2.70 score 1 scripts 148 downloads

ADLP - Accident and Development Period Adjusted Linear Pools for Actuarial Stochastic Reserving

Loss reserving generally focuses on identifying a single model that can generate superior predictive performance. However, different loss reserving models specialise in capturing different aspects of loss data. This is recognised in practice in the sense that results from different models are often considered, and sometimes combined. For instance, actuaries may take a weighted average of the prediction outcomes from various loss reserving models, often based on subjective assessments. This package allows for the use of a systematic framework to objectively combine (i.e. ensemble) multiple stochastic loss reserving models such that the strengths offered by different models can be utilised effectively. Our framework is developed in Avanzi et al. (2023). Firstly, our criteria model combination considers the full distributional properties of the ensemble and not just the central estimate - which is of particular importance in the reserving context. Secondly, our framework is that it is tailored for the features inherent to reserving data. These include, for instance, accident, development, calendar, and claim maturity effects. Crucially, the relative importance and scarcity of data across accident periods renders the problem distinct from the traditional ensemble techniques in statistical learning. Our framework is illustrated with a complex synthetic dataset. In the results, the optimised ensemble outperforms both (i) traditional model selection strategies, and (ii) an equally weighted ensemble. In particular, the improvement occurs not only with central estimates but also relevant quantiles, such as the 75th percentile of reserves (typically of interest to both insurers and regulators). Reference: Avanzi B, Li Y, Wong B, Xian A (2023) "Ensemble distributional forecasting for insurance loss reserving" <doi:10.48550/arXiv.2206.08541>.

Last updated

2.70 score 211 downloads

discoverableresearch - Checks Title, Abstract and Keywords to Optimise Discoverability

A suite of tools are provided here to support authors in making their research more discoverable. check_keywords() - this function checks the keywords to assess whether they are already represented in the title and abstract. check_fields() - this function compares terminology used across the title, abstract and keywords to assess where terminological diversity (i.e. the use of synonyms) could increase the likelihood of the record being identified in a search. The function looks for terms in the title and abstract that also exist in other fields and highlights these as needing attention. suggest_keywords() - this function takes a full text document and produces a list of unigrams, bigrams and trigrams (1-, 2- or 2-word phrases) present in the full text after removing stop words (words with a low utility in natural language processing) that do not occur in the title or abstract that may be suitable candidates for keywords. suggest_title() - this function takes a full text document and produces a list of the most frequently used unigrams, bigrams and trigrams after removing stop words that do not occur in the abstract or keywords that may be suitable candidates for title words. check_title() - this function carries out a number of sub tasks: 1) it compares the length (number of words) of the title with the mean length of titles in major bibliographic databases to assess whether the title is likely to be too short; 2) it assesses the proportion of stop words in the title to highlight titles with low utility in search engines that strip out stop words; 3) it compares the title with a given sample of record titles from an .ris import and calculates a similarity score based on phrase overlap. This highlights the level of uniqueness of the title. This version of the package also contains functions currently in a non-CRAN package called 'litsearchr' <https://github.com/elizagrames/litsearchr>.

Last updated

2.70 score 167 downloads

tirt - Testlet Item Response Theory

Implementation of Testlet and Item Response Theory. A light-version yet comprehensive and streamlined framework for psychometric analysis using unidimensional and multidimensional Item Response Theory (IRT; Baker & Kim (2004) <doi:10.1201/9781482276725>) and Testlet Response Theory (TRT; Wainer et al., (2007) <doi:10.1017/CBO9780511618765>). Designed for researchers, this package supports the estimation of item and person parameters for a wide variety of models, including binary (i.e., Rasch, 2-Parameter Logistic, 3-Parameter Logistic) and polytomous (Partial Credit Model, Generalized Partial Credit Model, Graded Response Model) formats. It also supports the estimation of Testlet models (Rasch Testlet, 2-Parameter Logistic Testlet, 3-Parameter Logistic Testlet, Bifactor, Partial Credit Model Testlet, Graded Response), allowing users to account for local item dependence in bundled items. A key feature is the specialized support for combination use and joint estimation of item response model and testlet response model in one calibration. Beyond standard estimation via Marginal Maximum Likelihood with Expectation-Maximization (EM) or Joint Maximum Likelihood, the package also offers Bayesian estimation using priors with maximum a posteriori (MAP) method for unidimensional item response theory models. It also provides functions for scale linking and equating (Mean-Mean, Mean-Sigma, Stocking-Lord) to ensure comparability across mixed-format test forms. It also facilitates fixed-parameter calibration, enabling users to estimate person abilities with known item parameters or vice versa, which is essential for pre-equating studies and item bank maintenance. Comprehensive data simulation functions are included to generate synthetic datasets with complex structures, including mixed-model blocks and specific testlet effects, aiding in methodological research and study design validation. Researchers can try multiple simulation situations.

Last updated

2.60 score 1 scripts 522 downloads

gausscov - The Gaussian Covariate Method for Variable Selection

The standard linear regression theory whether frequentist or Bayesian is based on an 'assumed (revealed?) truth' (John Tukey) attitude to models. This is reflected in the language of statistical inference which involves a concept of truth, for example confidence intervals, hypothesis testing and consistency. The motivation behind this package was to remove the word true from the theory and practice of linear regression and to replace it by approximation. The approximations considered are the least squares approximations. An approximation is called valid if it contains no irrelevant covariates. This is operationalized using the concept of a Gaussian P-value which is the probability that pure Gaussian noise is better in term of least squares than the covariate. The precise definition given in the paper "An Approximation Based Theory of Linear Regression". Only four simple equations are required. Moreover the Gaussian P-values can be simply derived from standard F P-values. Furthermore they are exact and valid whatever the data in contrast F P-values are only valid for specially designed simulations. A valid approximation is one where all the Gaussian P-values are less than a threshold p0 specified by the statistician, in this package with the default value 0.01. This approximations approach is not only much simpler it is overwhelmingly better than the standard model based approach. The will be demonstrated using high dimensional regression and vector autoregression real data sets. The goal is to find valid approximations. The search function is f1st which is a greedy forward selection procedure which results in either just one or no approximations which may however not be valid. If the size is less than than a threshold with default value 21 then an all subset procedure is called which returns the best valid subset. A good default start is f1st(y,x,kmn=15) The best function for returning multiple approximations is f3st which repeatedly calls f1st. For more information see the papers: L. Davies and L. Duembgen, "Covariate Selection Based on a Model-free Approach to Linear Regression with Exact Probabilities", <doi:10.48550/arXiv.2202.01553>, L. Davies, "An Approximation Based Theory of Linear Regression", 2024, <doi:10.48550/arXiv.2402.09858>.

Last updated

2.55 score 2 stars 59 scripts 365 downloads

BLR - Bayesian Linear Regression

Bayesian Linear Regression.

Last updated

2.43 score 27 scripts 355 downloads

ttutils - Utility Functions

Contains some auxiliary functions.

Last updated

2.41 score 5 dependents 17 scripts 331 downloads

FRACTION - Numeric Number into Fraction

Turn numeric,data.frame,matrix into fraction form.

Last updated

2.40 score 2 dependents 21 scripts 289 downloads

easynls - Easy Nonlinear Model

Fit and plot some nonlinear models.

Last updated

2.11 score 1 stars 16 scripts 394 downloads

relevent - Relational Event Models

Tools to fit and simulate realizations from relational event models.

Last updated

2.10 score 1 stars 1 dependents 42 scripts 725 downloads

INFOSET - Computing a New Informative Distribution Set of Asset Returns

Estimation of the most-left informative set of gross returns (i.e., the informative set). The procedure to compute the informative set adjusts the method proposed by Mariani et al. (2022a) <doi:10.1007/s11205-020-02440-6> and Mariani et al. (2022b) <doi:10.1007/s10287-022-00422-2> to gross returns of financial assets. This is accomplished through an adaptive algorithm that identifies sub-groups of gross returns in each iteration by approximating their distribution with a sequence of two-component log-normal mixtures. These sub-groups emerge when a significant change in the distribution occurs below the median of the financial returns, with their boundary termed as the “change point" of the mixture. The process concludes when no further change points are detected. The outcome encompasses parameters of the leftmost mixture distributions and change points of the analyzed financial time series. The functionalities of the INFOSET package include: (i) modelling asset distribution detecting the parameters which describe left tail behaviour (infoset function), (ii) clustering, (iii) labeling of the financial series for predictive and classification purposes through a Left Risk measure based on the first change point (LR_cp function) (iv) portfolio construction (ptf_construction function). The package also provide a specific function to construct rolling windows of different length size and overlapping time.

Last updated

2.00 score 4 scripts 146 downloads

meerva - Analysis of Data with Measurement Error Using a Validation Subsample

Sometimes data for analysis are obtained using more convenient or less expensive means yielding "surrogate" variables for what could be obtained more accurately, albeit with less convenience; or less conveniently or at more expense yielding "reference" variables, thought of as being measured without error. Analysis of the surrogate variables measured with error generally yields biased estimates when the objective is to make inference about the reference variables. Often it is thought that ignoring the measurement error in surrogate variables only biases effects toward the null hypothesis, but this need not be the case. Measurement errors may bias parameter estimates either toward or away from the null hypothesis. If one has a data set with surrogate variable data from the full sample, and also reference variable data from a randomly selected subsample, then one can assess the bias introduced by measurement error in parameter estimation, and use this information to derive improved estimates based upon all available data. Formulaically these estimates based upon the reference variables from the validation subsample combined with the surrogate variables from the whole sample can be interpreted as starting with the estimate from reference variables in the validation subsample, and "augmenting" this with additional information from the surrogate variables. This suggests the term "augmented" estimate. The meerva package calculates these augmented estimates in the regression setting when there is a randomly selected subsample with both surrogate and reference variables. Measurement errors may be differential or non-differential, in any or all predictors (simultaneously) as well as outcome. The augmented estimates derive, in part, from the multivariate correlation between regression model parameter estimates from the reference variables and the surrogate variables, both from the validation subset. Because the validation subsample is chosen at random any biases imposed by measurement error, whether non-differential or differential, are reflected in this correlation and these correlations can be used to derive estimates for the reference variables using data from the whole sample. The main functions in the package are meerva.fit which calculates estimates for a dataset, and meerva.sim.block which simulates multiple datasets as described by the user, and analyzes these datasets, storing the regression coefficient estimates for inspection. The augmented estimates, as well as how measurement error may arise in practice, is described in more detail by Kremers WK (2021) <arXiv:2106.14063> and is an extension of the works by Chen Y-H, Chen H. (2000) <doi:10.1111/1467-9868.00243>, Chen Y-H. (2002) <doi:10.1111/1467-9868.00324>, Wang X, Wang Q (2015) <doi:10.1016/j.jmva.2015.05.017> and Tong J, Huang J, Chubak J, et al. (2020) <doi:10.1093/jamia/ocz180>.

Last updated

2.00 score 787 downloads

treebalance - Computation of Tree (Im)Balance Indices

The aim of the 'R' package 'treebalance' is to provide functions for the computation of a large variety of (im)balance indices for rooted trees. The package accompanies the book ''Tree balance indices: a comprehensive survey'' by M. Fischer, L. Herbst, S. Kersting, L. Kuehn and K. Wicke (2023) <ISBN: 978-3-031-39799-8>, <doi:10.1007/978-3-031-39800-1>, which gives a precise definition for the terms 'balance index' and 'imbalance index' (Chapter 4) and provides an overview of the terminology in this manual (Chapter 2). For further information on (im)balance indices, see also Fischer et al. (2021) <https://treebalance.wordpress.com>. Considering both established and new (im)balance indices, 'treebalance' provides (among others) functions for calculating the following 18 established indices and index families: the average leaf depth, the B1 and B2 index, the Colijn-Plazzotta rank, the normal, corrected, quadratic and equal weights Colless index, the family of Colless-like indices, the family of I-based indices, the Rogers J index, the Furnas rank, the rooted quartet index, the s-shape statistic, the Sackin index, the symmetry nodes index, the total cophenetic index and the variance of leaf depths. Additionally, we include 9 tree shape statistics that satisfy the definition of an (im)balance index but have not been thoroughly analyzed in terms of tree balance in the literature yet. These are: the total internal path length, the total path length, the average vertex depth, the maximum width, the modified maximum difference in widths, the maximum depth, the maximum width over maximum depth, the stairs1 and the stairs2 index. As input, most functions of 'treebalance' require a rooted (phylogenetic) tree in 'phylo' format (as introduced in 'ape' 1.9 in November 2006). 'phylo' is used to store (phylogenetic) trees with no vertices of out-degree one. For further information on the format we kindly refer the reader to E. Paradis (2012) <http://ape-package.ird.fr/misc/FormatTreeR_24Oct2012.pdf>.

Last updated

1.95 score 3 dependents 9 scripts 713 downloads

orders - Sampling from k-th Order Statistics of New Families of Distributions

Set of tools to generate samples of k-th order statistics and others quantities of interest from new families of distributions. The main references for this package are: C. Kleiber and S. Kotz (2003) Statistical size distributions in economics and actuarial sciences; Gentle, J. (2009), Computational Statistics, Springer-Verlag; Naradajah, S. and Rocha, R. (2016), <DOI:10.18637/jss.v069.i10> and Stasinopoulos, M. and Rigby, R. (2015), <DOI:10.1111/j.1467-9876.2005.00510.x>. The families of distributions are: Benini distributions, Burr distributions, Dagum distributions, Feller-Pareto distributions, Generalized Pareto distributions, Inverse Pareto distributions, The Inverse Paralogistic distributions, Marshall-Olkin G distributions, exponentiated G distributions, beta G distributions, gamma G distributions, Kumaraswamy G distributions, generalized beta G distributions, beta extended G distributions, gamma G distributions, gamma uniform G distributions, beta exponential G distributions, Weibull G distributions, log gamma G I distributions, log gamma G II distributions, exponentiated generalized G distributions, exponentiated Kumaraswamy G distributions, geometric exponential Poisson G distributions, truncated-exponential skew-symmetric G distributions, modified beta G distributions, exponentiated exponential Poisson G distributions, Poisson-inverse gaussian distributions, Skew normal type 1 distributions, Skew student t distributions, Singh-Maddala distributions, Sinh-Arcsinh distributions, Sichel distributions, Zero inflated Poisson distributions.

Last updated

1.86 score 73 scripts 250 downloads

tightClust - Tight Clustering

The functions needed to perform tight clustering Algorithm.

Last updated

1.80 score 1 dependents 21 scripts 138 downloads

twl - Two-Way Latent Structure Clustering Model

Implementation of a Bayesian two-way latent structure model for integrative genomic clustering. The model clusters samples in relation to distinct data sources, with each subject-dataset receiving a latent cluster label, though cluster labels have across-dataset meaning because of the model formulation. A common scaling across data sources is unneeded, and inference is obtained by a Gibbs Sampler. The model can fit multivariate Gaussian distributed clusters or a heavier-tailed modification of a Gaussian density. Uniquely among integrative clustering models, the formulation makes no nestedness assumptions of samples across data sources -- the user can still fit the model if a study subject only has information from one data source. The package provides a variety of post-processing functions for model examination including ones for quantifying observed alignment of clusterings across genomic data sources. Run time is optimized so that analyses of datasets on the order of thousands of features on fewer than 5 datasets and hundreds of subjects can converge in 1 or 2 days on a single CPU. See "Swanson DM, Lien T, Bergholtz H, Sorlie T, Frigessi A, Investigating Coordinated Architectures Across Clusters in Integrative Studies: a Bayesian Two-Way Latent Structure Model, 2018, <doi:10.1101/387076>, Cold Spring Harbor Laboratory" at <https://www.biorxiv.org/content/early/2018/08/07/387076.full.pdf> for model details.

Last updated

1.75 score 56 scripts 241 downloads

ddecompose - Detailed Distributional Decomposition

Implements the Oaxaca-Blinder decomposition method and generalizations of it that decompose differences in distributional statistics beyond the mean. The function ob_decompose() decomposes differences in the mean outcome between two groups into one part explained by different covariates (composition effect) and into another part due to differences in the way covariates are linked to the outcome variable (structure effect). The function further divides the two effects into the contribution of each covariate and allows for weighted doubly robust decompositions. For distributional statistics beyond the mean, the function performs the recentered influence function (RIF) decomposition proposed by Firpo, Fortin, and Lemieux (2018). The function dfl_decompose() divides differences in distributional statistics into an composition effect and a structure effect using inverse probability weighting as introduced by DiNardo, Fortin, and Lemieux (1996). The function also allows to sequentially decompose the composition effect into the contribution of single covariates. References: Firpo, Sergio, Nicole M. Fortin, and Thomas Lemieux. (2018) <doi:10.3390/econometrics6020028>. "Decomposing Wage Distributions Using Recentered Influence Function Regressions." Fortin, Nicole M., Thomas Lemieux, and Sergio Firpo. (2011) <doi:10.3386/w16045>. "Decomposition Methods in Economics." DiNardo, John, Nicole M. Fortin, and Thomas Lemieux. (1996) <doi:10.2307/2171954>. "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach." Oaxaca, Ronald. (1973) <doi:10.2307/2525981>. "Male-Female Wage Differentials in Urban Labor Markets." Blinder, Alan S. (1973) <doi:10.2307/144855>. "Wage Discrimination: Reduced Form and Structural Estimates."

Last updated

1.70 score 1 stars 9 scripts 181 downloads

BivRegBLS - Tolerance Interval and EIV Regression - Method Comparison Studies

Assess the agreement in method comparison studies by tolerance intervals and errors-in-variables (EIV) regressions. The Ordinary Least Square regressions (OLSv and OLSh), the Deming Regression (DR), and the (Correlated)-Bivariate Least Square regressions (BLS and CBLS) can be used with unreplicated or replicated data. The BLS() and CBLS() are the two main functions to estimate a regression line, while XY.plot() and MD.plot() are the two main graphical functions to display, respectively an (X,Y) plot or (M,D) plot with the BLS or CBLS results. Four hyperbolic statistical intervals are provided: the Confidence Interval (CI), the Confidence Bands (CB), the Prediction Interval and the Generalized prediction Interval. Assuming no proportional bias, the (M,D) plot (Band-Altman plot) may be simplified by calculating univariate tolerance intervals (beta-expectation (type I) or beta-gamma content (type II)). Major updates from last version 1.0.0 are: title shortened, include the new functions BLS.fit() and CBLS.fit() as shortcut of the, respectively, functions BLS() and CBLS(). References: B.G. Francq, B. Govaerts (2016) <doi:10.1002/sim.6872>, B.G. Francq, B. Govaerts (2014) <doi:10.1016/j.chemolab.2014.03.006>, B.G. Francq, B. Govaerts (2014) <http://publications-sfds.fr/index.php/J-SFdS/article/view/262>, B.G. Francq (2013), PhD Thesis, UCLouvain, Errors-in-variables regressions to assess equivalence in method comparison studies, <https://dial.uclouvain.be/pr/boreal/object/boreal%3A135862/datastream/PDF_01/view>.

Last updated

1.60 score 40 scripts 251 downloads

PrInDT - Prediction and Interpretation in Decision Trees for Classification and Regression

Optimization of conditional inference trees from the package 'party' for classification and regression. For optimization, the model space is searched for the best tree on the full sample by means of repeated subsampling. Restrictions are allowed so that only trees are accepted which do not include pre-specified uninterpretable split results (cf. Weihs & Buschfeld, 2021a). The function PrInDT() represents the basic resampling loop for 2-class classification (cf. Weihs & Buschfeld, 2021a). The function RePrInDT() (repeated PrInDT()) allows for repeated applications of PrInDT() for different percentages of the observations of the large and the small classes (cf. Weihs & Buschfeld, 2021c). The function NesPrInDT() (nested PrInDT()) allows for an extra layer of subsampling for a specific factor variable (cf. Weihs & Buschfeld, 2021b). The functions PrInDTMulev() and PrInDTMulab() deal with multilevel and multilabel classification. In addition to these PrInDT() variants for classification, the function PrInDTreg() has been developed for regression problems. Finally, the function PostPrInDT() allows for a posterior analysis of the distribution of a specified variable in the terminal nodes of a given tree. In version 2, additionally structured sampling is implemented in functions PrInDTCstruc() and PrInDTRstruc(). In these functions, repeated measurements data can be analyzed, too. Moreover, multilabel 2-stage versions of classification and regression trees are implemented in functions C2SPrInDT() and R2SPrInDT() as well as interdependent multilabel models in functions SimCPrInDT() and SimRPrInDT(). Finally, for mixtures of classification and regression models functions Mix2SPrInDT() and SimMixPrInDT() are implemented. Most of these extensions of PrInDT are described in Buschfeld & Weihs (2025Fc). References: -- Buschfeld, S., Weihs, C. (2025Fc) "Optimizing decision trees for the analysis of World Englishes and sociolinguistic data", Cambridge Elements. -- Weihs, C., Buschfeld, S. (2021a) "Combining Prediction and Interpretation in Decision Trees (PrInDT) - a Linguistic Example" <doi:10.48550/arXiv.2103.02336>; -- Weihs, C., Buschfeld, S. (2021b) "NesPrInDT: Nested undersampling in PrInDT" <doi:10.48550/arXiv.2103.14931>; -- Weihs, C., Buschfeld, S. (2021c) "Repeated undersampling in PrInDT (RePrInDT): Variation in undersampling and prediction, and ranking of predictors in ensembles" <doi:10.48550/arXiv.2108.05129>.

Last updated

1.48 score 1 scripts 551 downloads

TTCA - Transcript Time Course Analysis

The analysis of microarray time series promises a deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while other genes change their expression slowly over time. The existing methods for detecting significant expression dynamics often fail when the expression dynamics show a large heterogeneity. Moreover, these methods often cannot cope with irregular and sparse measurements. The method proposed here is specifically designed for the analysis of perturbation responses. It combines different scores to capture fast and transient dynamics as well as slow expression changes, and performs well in the presence of low replicate numbers and irregular sampling times. The results are given in the form of tables including links to figures showing the expression dynamics of the respective transcript. These allow to quickly recognise the relevance of detection, to identify possible false positives and to discriminate early and late changes in gene expression. An extension of the method allows the analysis of the expression dynamics of functional groups of genes, providing a quick overview of the cellular response. The performance of this package was tested on microarray data derived from lung cancer cells stimulated with epidermal growth factor (EGF). Paper: Albrecht, Marco, et al. (2017)<DOI:10.1186/s12859-016-1440-8>.

Last updated

1.48 score 5 scripts 200 downloads

CopulaREMADA - Copula Mixed Models for Multivariate Meta-Analysis of Diagnostic Test Accuracy Studies

The bivariate copula mixed model for meta-analysis of diagnostic test accuracy studies in Nikoloulopoulos (2015) <doi:10.1002/sim.6595> and Nikoloulopoulos (2018) <doi:10.1007/s10182-017-0299-y>. The vine copula mixed model for meta-analysis of diagnostic test accuracy studies accounting for disease prevalence in Nikoloulopoulos (2017) <doi:10.1177/0962280215596769> and also accounting for non-evaluable subjects in Nikoloulopoulos (2020) <doi:10.1515/ijb-2019-0107>. The hybrid vine copula mixed model for meta-analysis of diagnostic test accuracy case-control and cohort studies in Nikoloulopoulos (2018) <doi:10.1177/0962280216682376>. The D-vine copula mixed model for meta-analysis and comparison of two diagnostic tests in Nikoloulopoulos (2019) <doi:10.1177/0962280218796685>. The multinomial quadrivariate D-vine copula mixed model for meta-analysis of diagnostic tests with non-evaluable subjects in Nikoloulopoulos (2020) <doi:10.1177/0962280220913898>. The one-factor copula mixed model for joint meta-analysis of multiple diagnostic tests in Nikoloulopoulos (2022) <doi:10.1111/rssa.12838>. The multinomial six-variate 1-truncated D-vine copula mixed model for meta-analysis of two diagnostic tests accounting for within and between studies dependence in Nikoloulopoulos (2024) <doi:10.1177/09622802241269645>. The 1-truncated D-vine copula mixed models for meta-analysis of diagnostic accuracy studies without a gold standard (Nikoloulopoulos, 2025) <doi:10.1093/biomtc/ujaf037>.

Last updated

1.34 score 2 stars 11 scripts 611 downloads

naivereg - Nonparametric Additive Instrumental Variable Estimator and Related IV Methods

In empirical studies, instrumental variable (IV) regression is the signature method to solve the endogeneity problem. If we enforce the exogeneity condition of the IV, it is likely that we end up with a large set of IVs without knowing which ones are good. Also, one could face the model uncertainty for structural equation, as large micro dataset is commonly available nowadays. This package uses adaptive group lasso and B-spline methods to select the nonparametric components of the IV function, with the linear function being a special case (naivereg). The package also incorporates two stage least squares estimator (2SLS), generalized method of moment (GMM), generalized empirical likelihood (GEL) methods post instrument selection, logistic-regression instrumental variables estimator (LIVE, for dummy endogenous variable problem), double-selection plus instrumental variable estimator (DS-IV) and double selection plus logistic regression instrumental variable estimator (DS-LIVE), where the double selection methods are useful for high-dimensional structural equation models. The naivereg is nonparametric version of 'ivregress' in 'Stata' with IV selection and high dimensional features. The package is based on the paper by Q. Fan and W. Zhong, "Nonparametric Additive Instrumental Variable Estimator: A Group Shrinkage Estimation Perspective" (2018), Journal of Business & Economic Statistics <doi:10.1080/07350015.2016.1180991> as well as a series of working papers led by the same authors.

Last updated

1.26 score 1 stars 18 scripts 252 downloads

fence - Using Fence Methods for Model Selection

This method is a new class of model selection strategies, for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. References: 1. Jiang J., Rao J.S., Gu Z., Nguyen T. (2008), Fence Methods for Mixed Model Selection. The Annals of Statistics, 36(4): 1669-1692. <DOI:10.1214/07-AOS517> <https://projecteuclid.org/euclid.aos/1216237296>. 2. Jiang J., Nguyen T., Rao J.S. (2009), A Simplified Adaptive Fence Procedure. Statistics and Probability Letters, 79, 625-629. <DOI:10.1016/j.spl.2008.10.014> <https://www.researchgate.net/publication/23991417_A_simplified_adaptive_fence_procedure> 3. Jiang J., Nguyen T., Rao J.S. (2010), Fence Method for Nonparametric Small Area Estimation. Survey Methodology, 36(1), 3-11. <http://publications.gc.ca/collections/collection_2010/statcan/12-001-X/12-001-x2010001-eng.pdf>. 4. Jiming Jiang, Thuan Nguyen and J. Sunil Rao (2011), Invisible fence methods and the identification of differentially expressed gene sets. Statistics and Its Interface, Volume 4, 403-415. <http://www.intlpress.com/site/pub/files/_fulltext/journals/sii/2011/0004/0003/SII-2011-0004-0003-a014.pdf>. 5. Thuan Nguyen & Jiming Jiang (2012), Restricted fence method for covariate selection in longitudinal data analysis. Biostatistics, 13(2), 303-314. <DOI:10.1093/biostatistics/kxr046> <https://academic.oup.com/biostatistics/article/13/2/303/263903/Restricted-fence-method-for-covariate-selection-in>. 6. Thuan Nguyen, Jie Peng, Jiming Jiang (2014), Fence Methods for Backcross Experiments. Statistical Computation and Simulation, 84(3), 644-662. <DOI:10.1080/00949655.2012.721885> <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3891925/>. 7. Jiang, J. (2014), The fence methods, in Advances in Statistics, Hindawi Publishing Corp., Cairo. <DOI:10.1155/2014/830821>. 8. Jiming Jiang and Thuan Nguyen (2015), The Fence Methods, World Scientific, Singapore. <https://www.abebooks.com/9789814596060/Fence-Methods-Jiming-Jiang-981459606X/plp>.

Last updated

1.15 score 14 scripts 183 downloads

ljr - Logistic Joinpoint Regression

Fits and tests logistic joinpoint models.

Last updated

openblas

1.11 score 1 stars 13 scripts 158 downloads

scindex - Strategic Convergence Index for Inter-Rater Reliability

Inter-rater reliability analysis for binary classification tasks involving two or more raters within a signal detection-theoretic framework. User-supplied rating data are standardised into a common long-format structure. The package automatically computes Cohen's kappa for two raters and Fleiss' kappa for multiple raters. When ground-truth labels are available, rater-specific hit rates, false-alarm rates, sensitivity, specificity, and decision thresholds are estimated from observed classification responses using standard signal detection-theoretic transformations (DeCarlo, 1998) <doi:10.1037/1082-989X.3.2.186>. The package implements the Strategic Convergence Index (SCI; Gianeselli, 2026) <doi:10.1177/00131644261417643>, defined as SCI = 1 - [Var(t_i) / Var_max], where Var(t_i) denotes the variance of rater-specific decision thresholds and Var_max denotes the reference variance under maximal threshold dispersion. SCI quantifies convergence in rater decision criteria beyond observed agreement alone and complements classical agreement coefficients by distinguishing agreement in observed categorical outcomes from convergence in latent decision thresholds under an explicit signal detection-theoretic model of categorical judgment. The package provides structured summaries and threshold-based diagnostics for applications in which similar agreement coefficients may reflect substantively different underlying decision criteria across raters.

Last updated

1.00 score 446 downloads

SDI - Slow Digestibility Index

The Slow Digestibility Index (SDI) is a tool that helps users evaluate the slow-digestion properties of crops or food matrices by combining multiple factors into a single score. It considers parameters related to starch composition [total starch (TS), amylose/amylopectin ratio (Aratio), total amylose content (TAC), and total amylopectin content (TAPC)], starch digestibility [rapidly digestible starch (RDS), slowly digestible starch (SDS) and resistant starch (RS)], structural properties [relative crystallinity (RC)], non-starch components [total protein, total oil content (TOC), and total phenolic content (TPC)], and pasting behaviour [peak viscosity (PV), pasting temperature (PT), holding strength (HS), and final viscosity (FV)].The SDI is flexible and allows users to calculate the index using all parameters or only selected ones, depending on the data available. Users can also compute a starch-based SDI (using only starch-related parameters) or a principal component analysis (PCA)-based SDI, where weights are determined automatically from the data. Thus, the SDI provides a simple way to compare and rank crops or food samples based on their slow digestion potential. The package implements SDI(),starchSDI(), genSDI(), scoreSDI(), and pcaSDI() for estimating slow digestibility index using predefined weighted TOPSIS, starch-specific TOPSIS, user-defined weighted TOPSIS, score-based normalization, and PCA based approaches, respectively. The package has been developed using the algorithm of Pandey et al. (2026) <doi:10.1016/j.jff.2026.107208>.

Last updated

1.00 score 10 scripts 433 downloads

SHAKTI - Suite for Heat-Related Adsorption Knowledge and Thermodynamic Inference

A comprehensive framework for quantifying the fundamental thermodynamic parameters of adsorption reactions—changes in the standard Gibbs free energy (delta G), enthalpy (delta H), and entropy (delta S)—is essential for understanding the spontaneity, heat effects, and molecular ordering associated with sorption processes. By analysing temperature-dependent equilibrium data, thermodynamic interpretation expands adsorption studies beyond conventional isotherm fitting, offering deeper insight into underlying mechanisms and surface–solute interactions. Such an approach typically involves evaluating equilibrium coefficients across multiple temperatures and non-temperature treatments, deriving thermodynamic parameters using established thermodynamic relationships, and determining delta G as a temperature-specific indicator of adsorption favourability. This analytical pathway is widely applicable across environmental science, soil science, chemistry, materials science, and engineering, where reliable assessment of sorption behaviour is critical for examining contaminant retention, nutrient dynamics, and the behaviour of natural and engineered surfaces. By focusing specifically on thermodynamic inference, this framework complements existing adsorption isotherm-fitting packages such as “AdIsMF” <https://CRAN.R-project.org/package=AdIsMF> <doi:10.32614/CRAN.package.AdIsMF>, and strengthens the scientific basis for interpreting adsorption energetics in both research and applied contexts. Details can be found in Roy et al. (2025) <doi:10.1007/s11270-025-07963-7>.

Last updated

1.00 score 100 downloads

LTCDM - Latent Transition Cognitive Diagnosis Model with Covariates

Implementation of the three-step approach of (latent transition) cognitive diagnosis model (CDM) with covariates. This approach can be used for single time-point situations (cross-sectional data) and multiple time-point situations (longitudinal data) to investigate how the covariates are associated with attribute mastery. For multiple time-point situations, the three-step approach of latent transition CDM with covariates allows researchers to assess changes in attribute mastery status and to evaluate the covariate effects on both the initial states and transition probabilities over time using latent logistic regression. Because stepwise approaches often yield biased estimates, correction for classification error probabilities (CEPs) is considered in this approach. The three-step approach for latent transition CDM with covariates involves the following steps: (1) fitting a CDM to the response data without covariates at each time point separately, (2) assigning examinees to latent states at each time point and computing the associated CEPs, and (3) estimating the latent transition CDM with the known CEPs and computing the regression coefficients. The method was proposed in Liang et al. (2023) <doi:10.3102/10769986231163320> and demonstrated using mental health data in Liang et al. (in press; annotated R code and data utilized in this example are available in Mendeley data) <doi:10.17632/kpjp3gnwbt.1>.

Last updated

1.00 score 149 downloads

TUGLab - A Laboratory for TU Games

Cooperative game theory models decision-making situations in which a group of agents, called players, may achieve certain benefits by cooperating to reach an optimal outcome. It has great potential in different fields, since it offers a scenario to analyze and solve problems in which cooperation is essential to achieve a common goal. The 'TUGLab' (Transferable Utility Games Laboratory) R package contains a set of scripts that could serve as a helpful complement to the books and other materials used in courses on cooperative game theory, and also as a practical tool for researchers working in this field. The 'TUGLab' project was born in 2006 trying to highlight the geometrical aspects of the theory of cooperative games for 3 and 4 players. 'TUGlabWeb' is an online platform on which the basic functions of 'TUGLab' are implemented, and it is being used all over the world as a resource in degree, master's and doctoral programs. This package is an extension of the first versions and enables users to work with games in general (computational restrictions aside). The user can check properties of games, compute well-known games and calculate several set-valued and single-valued solutions such as the core, the Shapley value, the nucleolus or the core-center. The package also illustrates how the Shapley value flexibly adapts to various cooperative game settings, including weighted players and coalitions, a priori unions, and restricted communication structures. In keeping with the original philosophy of the first versions, special emphasis is placed on the graphical representation of the solution concepts for 3 and 4 players.

Last updated

1.00 score 132 downloads

ShapePattern - Tools for Analyzing Shapes and Patterns

This is an evolving and growing collection of tools for the quantification, assessment, and comparison of shape and pattern. This collection provides tools for: (1) the spatial decomposition of planar shapes using 'ShrinkShape' to incrementally shrink shapes to extinction while computing area, perimeter, and number of parts at each iteration of shrinking; the spectra of results are returned in graphic and tabular formats (Remmel 2015) <doi:10.1111/cag.12222>, (2) simulating landscape patterns, (3) provision of tools for estimating composition and configuration parameters from a categorical (binary) landscape map (grid) and then simulates a selected number of statistically similar landscapes. Class-focused pattern metrics are computed for each simulated map to produce empirical distributions against which statistical comparisons can be made. The code permits the analysis of single maps or pairs of maps (Remmel and Fortin 2013) <doi:10.1007/s10980-013-9905-x>, (4) counting the number of each first-order pattern element and converting that information into both frequency and empirical probability vectors (Remmel 2020) <doi:10.3390/e22040420>, and (5) computing the porosity of raster patches <doi:10.3390/su10103413>. NOTE: This is a consolidation of existing packages ('PatternClass', 'ShapePattern') to begin warehousing all shape and pattern code in a common package. Additional utility tools for handling data are provided and this package will be added to as more tools are created, cleaned-up, and documented. Note that all future developments will appear in this package and that 'PatternClass' will eventually be archived.

Last updated

1.00 score 5 scripts 206 downloads

CFO - CFO-Type Designs in Phase I/II Clinical Trials

In phase I clinical trials, the primary objective is to ascertain the maximum tolerated dose (MTD) corresponding to a specified target toxicity rate. The subsequent phase II trials are designed to examine the potential efficacy of the drug based on the MTD obtained from the phase I trials, with the aim of identifying the optimal biological dose (OBD). The 'CFO' package facilitates the implementation of dose-finding trials by utilizing calibration-free odds type (CFO-type) designs. Specifically, it encompasses the calibration-free odds (CFO) (Jin and Yin (2022) <doi:10.1177/09622802221079353>), randomized CFO (rCFO), precision CFO (pCFO), two-dimensional CFO (2dCFO) (Wang et al. (2023) <doi:10.3389/fonc.2023.1294258>), time-to-event CFO (TITE-CFO) (Jin and Yin (2023) <doi:10.1002/pst.2304>), fractional CFO (fCFO), accumulative CFO (aCFO), TITE-aCFO, and f-aCFO (Fang and Yin (2024) <doi: 10.1002/sim.10127>). It supports phase I/II trials for the CFO design and only phase I trials for the other CFO-type designs. The ‘CFO' package accommodates diverse CFO-type designs, allowing users to tailor the approach based on factors such as dose information inclusion, handling of late-onset toxicity, and the nature of the target drug (single-drug or drug-combination). The functionalities embedded in 'CFO' package include the determination of the dose level for the next cohort, the selection of the MTD for a real trial, and the execution of single or multiple simulations to obtain operating characteristics. Moreover, these functions are equipped with early stopping and dose elimination rules to address safety considerations. Users have the flexibility to choose different distributions, thresholds, and cohort sizes among others for their specific needs. The output of the 'CFO' package can be summary statistics as well as various plots for better visualization. An interactive web application for CFO is available at the provided URL.

Last updated

1.00 score 563 downloads

SILFS - Subgroup Identification with Latent Factor Structure

In various domains, many datasets exhibit both high variable dependency and group structures, which necessitates their simultaneous estimation. This package provides functions for two subgroup identification methods based on penalized functions, both of which utilize factor model structures to adapt to data with cross-sectional dependency. The first method is the Subgroup Identification with Latent Factor Structure Method (SILFSM) we proposed. By employing Center-Augmented Regularization and factor structures, the SILFSM effectively eliminates data dependencies while identifying subgroups within datasets. For this model, we offer optimization functions based on two different methods: Coordinate Descent and our newly developed Difference of Convex-Alternating Direction Method of Multipliers (DC-ADMM) algorithms; the latter can be applied to cases where the distance function in Center-Augmented Regularization takes L1 and L2 forms. The other method is the Factor-Adjusted Pairwise Fusion Penalty (FA-PFP) model, which incorporates factor augmentation into the Pairwise Fusion Penalty (PFP) developed by Ma, S. and Huang, J. (2017) <doi:10.1080/01621459.2016.1148039>. Additionally, we provide a function for the Standard CAR (S-CAR) method, which does not consider the dependency and is for comparative analysis with other approaches. Furthermore, functions based on the Bayesian Information Criterion (BIC) of the SILFSM and the FA-PFP method are also included in 'SILFS' for selecting tuning parameters. For more details of Subgroup Identification with Latent Factor Structure Method, please refer to He et al. (2024) <doi:10.48550/arXiv.2407.00882>.

Last updated

1.00 score 200 downloads

VAR.spec - Allows Specifying a Bivariate VAR (Vector Autoregression) with Desired Spectral Characteristics

The spectral characteristics of a bivariate series (Marginal Spectra, Coherency- and Phase-Spectrum) determine whether there is a strong presence of short-, medium-, or long-term fluctuations (components of certain frequencies in the spectral representation of the series) in each one of them. These are induced by strong peaks of the marginal spectra of each series at the corresponding frequencies. The spectral characteristics also determine how strongly these short-, medium-, or long-term fluctuations of the two series are correlated between the two series. Information on this is provided by the Coherency spectrum at the corresponding frequencies. Finally, certain fluctuations of the two series may be lagged to each other. Information on this is provided by the Phase spectrum at the corresponding frequencies. The idea in this package is to define a VAR (Vector autoregression) model with desired spectral characteristics by specifying a number of polynomials, required to define the VAR. See Ioannidis(2007) <doi:10.1016/j.jspi.2005.12.013>. These are specified via their roots, instead of via their coefficients. This is an idea borrowed from the Time Series Library of R. Dahlhaus, where it is used for defining ARMA models for univariate time series. This way, one may e.g. specify a VAR inducing a strong presence of long-term fluctuations in series 1 and in series 2, which are weakly correlated, but lagged by a number of time units to each other, while short-term fluctuations in series 1 and in series 2, are strongly present only in one of the two series, while they are strongly correlated to each other between the two series. Simulation from such models allows studying the behavior of data-analysis tools, such as estimation of the spectra, under different circumstances, as e.g. peaks in the spectra, generating bias, induced by leakage.

Last updated

1.00 score 535 downloads

TemporalGSSA - Outputs Temporal Profile of Molecules from Stochastic Simulation Algorithm Generated Datasets

The data that is generated from independent and consecutive 'GillespieSSA' runs for a generic biochemical network is formatted as rows and constitutes an observation. The first column of each row is the computed timestep for each run. Subsequent columns are used for the number of molecules of each participating molecular species or "metabolite" of a generic biochemical network. In this way 'TemporalGSSA', is a wrapper for the R-package 'GillespieSSA'. The number of observations must be at least 30. This will generate data that is statistically significant. 'TemporalGSSA', transforms this raw data into a simulation time-dependent and metabolite-specific trial. Each such trial is defined as a set of linear models (n >= 30) between a timestep and number of molecules for a metabolite. Each linear model is characterized by coefficients such as the slope, arbitrary constant, etc. The user must enter an integer from 1-4. These specify the statistical modality utilized to compute a representative timestep (mean, median, random, all). These arguments are mandatory and will be checked. Whilst, the numeric indicator "0" indicates suitability, "1" prompts the user to revise and re-enter their data. An optional logical argument controls the output to the console with the default being "TRUE" (curtailed) whilst "FALSE" (verbose). The coefficients of each linear model are averaged (mean slope, mean constant) and are incorporated into a metabolite-specific linear regression model as the dependent variable. The independent variable is the representative timestep chosen previously. The generated data is the imputed molecule number for an in silico experiment with (n >=30) observations. These steps can be replicated with multiple set of observations. The generated "technical replicates" can be statistically evaluated (mean, standard deviation) and will constitute simulation time-dependent molecules for each metabolite. For SSA-generated datasets with varying simulation times 'TemporalGSSA' will generate a simulation time-dependent trajectory for each metabolite of the biochemical network under study. The relevant publication with the mathematical derivation of the algorithm is (2022, Journal of Bioinformatics and Computational Biology) <doi:10.1142/S0219720022500184>. The algorithm has been deployed in the following publications (2021, Heliyon) <doi:10.1016/j.heliyon.2021.e07466> and (2016, Journal of Theoretical Biology) <doi:10.1016/j.jtbi.2016.07.002>.

Last updated

1.00 score 1 scripts 172 downloads

FWRGB - Fresh Weight Determination from Visual Image of the Plant

Fresh biomass determination is the key to evaluating crop genotypes' response to diverse input and stress conditions and forms the basis for calculating net primary production. However, as conventional phenotyping approaches for measuring fresh biomass is time-consuming, laborious and destructive, image-based phenotyping methods are being widely used now. In the image-based approach, the fresh weight of the above-ground part of the plant depends on the projected area. For determining the projected area, the visual image of the plant is converted into the grayscale image by simply averaging the Red(R), Green (G) and Blue (B) pixel values. Grayscale image is then converted into a binary image using Otsu’s thresholding method Otsu, N. (1979) <doi:10.1109/TSMC.1979.4310076> to separate plant area from the background (image segmentation). The segmentation process was accomplished by selecting the pixels with values over the threshold value belonging to the plant region and other pixels to the background region. The resulting binary image consists of white and black pixels representing the plant and background regions. Finally, the number of pixels inside the plant region was counted and converted to square centimetres (cm2) using the reference object (any object whose actual area is known previously) to get the projected area. After that, the projected area is used as input to the machine learning model (Linear Model, Artificial Neural Network, and Support Vector Regression) to determine the plant's fresh weight.

Last updated

1.00 score 183 downloads

ReDirection - Predict Dominant Direction of Reactions of a Biochemical Network

Biologically relevant, yet mathematically sound constraints are used to compute the propensity and thence infer the dominant direction of reactions of a generic biochemical network. The reactions must be unique and their number must exceed that of the reactants,i.e., reactions >= reactants + 2. 'ReDirection', computes the null space of a user-defined stoichiometry matrix. The spanning non-zero and unique reaction vectors (RVs) are combinatorially summed to generate one or more subspaces recursively. Every reaction is represented as a sequence of identical components across all RVs of a particular subspace. The terms are evaluated with (biologically relevant bounds, linear maps, tests of convergence, descriptive statistics, vector norms) and the terms are classified into forward-, reverse- and equivalent-subsets. Since, these are mutually exclusive the probability of occurrence is binary (all, 1; none, 0). The combined propensity of a reaction is the p1-norm of the sub-propensities, i.e., sum of the products of the probability and maximum numeric value of a subset (least upper bound, greatest lower bound). This, if strictly positive is the probable rate constant, is used to infer dominant direction and annotate a reaction as "Forward (f)", "Reverse (b)" or "Equivalent (e)". The inherent computational complexity (NP-hard) per iteration suggests that a suitable value for the number of reactions is around 20. Three functions comprise ReDirection. These are check_matrix() and reaction_vector() which are internal, and calculate_reaction_vector() which is external.

Last updated

1.00 score 1 scripts 235 downloads

EuclideanSD - An Euclidean View of Center and Spread

Illustrates the concepts developed in Sarkar and Rashid (2019, ISSN:0025-5742) <http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&cad=rja&uact=8&ved=2ahUKEwiH4deL3q3xAhWX73MBHR_wDaYQFnoECAUQAw&url=https%3A%2F%2Fwww.indianmathsociety.org.in%2Fmathstudent-part-2-2019.pdf&usg=AOvVaw3SY--3T6UAWUnH5-Nj6bSc>. This package helps a user guess four things (mean, MD, scaled MSD, and RMSD) before they get the SD. 1) The package displays the Empirical Cumulative Distribution Function (ECDF) of the given data. The user must choose the value of the mean by equating the areas of two colored (blue and green) regions. The package gives feedback to improve the choice until it is correct. Alternatively, the reader may continue with a different guess for the center (not necessarily the mean). 2) The user chooses the values of the Mean Deviation (MD) based on the ECDF of the deviations by equating the areas of two newly colored (blue and green) regions, with feedback from the package until the user guesses correctly. 3) The user chooses the Scaled Mean Squared Deviation (MSD) based on the ECDF of the scaled square deviations by equating the areas of two newly colored (blue and green) regions, with feedback from the package until the user guesses correctly. 4) The user chooses the Root Mean Squared Deviation (RMSD) by ensuring that its intersection with the ECDF of the deviations is at the same height as the intersection between the scaled MSD and the ECDF of the scaled squared deviations. Additionally, the intersection of two blue lines (the green dot) should fall on the vertical line at the maximum deviation. 5) Finally, if the mean is chosen correctly, only then the user can view the population SD (the same as the RMSD) and the sample SD (sqrt(n/(n-1))*RMSD) by clicking the respective buttons. If the mean is chosen incorrectly, the user is asked to correct it.

Last updated

1.00 score 200 downloads