| Title: | Simulation-Based Fitting of Parametric Models with Minimum Prior Information |
|---|---|
| Description: | Implements an algorithm for fitting a generative model with an intractable likelihood using only box constraints on the parameters. The implemented algorithm consists of two phases. The first phase (global search) aims to identify the region containing the best solution, while the second phase (local search) refines this solution using a trust-region version of the Fisher scoring method to solve a quasi-likelihood equation. See Guido Masarotto (2025) <doi:10.48550/arXiv.2511.08180> for the details of the algorithm and supporting results. |
| Authors: | Guido Masarotto [aut, cre] (ORCID: <https://orcid.org/0000-0003-4697-1606>) |
| Maintainer: | Guido Masarotto <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2026-05-20 09:30:53 UTC |
| Source: | https://github.com/cran/ifit |
The main package function ifit can be used to fit complex
parametric models from which observations can be simulated. The
implemented algorithm integrates a global search phase and a local search phase.
It can hence be used when only scarce prior information is
available.
Maintainer: Guido Masarotto [email protected] (ORCID)
Guido Masarotto (2015) 'Simulation-Based Fitting of Intractable Models via Sequential Sampling and Local Smoothing', arXiv eprint 2511.08180, pp. 1-23, doi:10.48550/arXiv.2511.08180
Function enzymeSim (written in C++) simulates a realization from
the stochastic Michaelis-Menten kinetics model; function enzymeStat
computes a possible summary statistics.
enzymeSim(theta, n = 50L, E0 = 100L, S0 = 100L) enzymeStat(y, degree = 2, knots = 0.2)enzymeSim(theta, n = 50L, E0 = 100L, S0 = 100L) enzymeStat(y, degree = 2, knots = 0.2)
theta |
a vector of length 3 containing the model parameters |
n |
the process is sampled on n equispaced points on the unit interval (default 50) |
E0, S0
|
initial state of the process (default E0=100, S0=100) |
y |
a |
degree |
the degree of the B-splines basis (default 2) |
knots |
the knots of the B-splines basis (default: only one knot equal to 0.2) |
The model describes a simple case of enzyme kinetics. It involves an
enzyme E binding to a substrate S to form a complex C. This complex C
then releases a product P while simultaneously regenerating the original
enzyme E. The possible reactions are E + S -> C, C -> E + S, and C -> P + E,
with rates that constitute the three model parameters.
In probabilistic terms, the integer-valued vector
is generated by a continuous-time Markov process
such that:
The initial state is .
Process is simulated using Gillespie exact algorithm.
enzymeSim returns a ‘n x 4’ integer matrix; the ith row contains the
simulated values for E, S, C and P at time t=(i-1)/(n-1).
enzymeStat returns a numeric vector of length
2*(degree+1+length(knots)) containing the coefficents
of the B-splines curves (with degree and knots given by the second and third
arguments) fitted by least squares to the complex and product trajectories
(i.e., to the last two components of the process's state).
The summary statistics is based only on and .
Indeed, at every time point , and
.
Darren J. Wilkinson (2019) Stochastic Modelling for Systems Biology, Third edition, CRC Press
set.seed(1) theta <- c(0.5, 2.5, 1) y <- enzymeSim(theta) plot(ts(y, start = 0, frequency = 50), main = "") print(tobs <- enzymeStat(y)) # It takes some time to run tsim <- function(theta) enzymeStat(enzymeSim(theta)) m <- ifit(tobs, tsim, l = rep(0, 3), u = rep(50, 3), trace = 1000) m confint(m) diagIFIT(m) numsimIFIT(m)set.seed(1) theta <- c(0.5, 2.5, 1) y <- enzymeSim(theta) plot(ts(y, start = 0, frequency = 50), main = "") print(tobs <- enzymeStat(y)) # It takes some time to run tsim <- function(theta) enzymeStat(enzymeSim(theta)) m <- ifit(tobs, tsim, l = rep(0, 3), u = rep(50, 3), trace = 1000) m confint(m) diagIFIT(m) numsimIFIT(m)
Fits an untractable parametric model by simulation
ifit( tobs, tsim, l, u, cluster = NULL, export = NULL, trace = 0, Ntotal = 1e+06, NTotglobal = 20000, Ninit = 1000, Nelite = 100, Aelite = 0.5, Tolglobal = 0.1, Tollocal = 1, Tolmodel = 1.5, NFitlocal = 4000, NAddglobal = 100, NAddlocal = 10, Lambda = 0.1, Rhomax = 0.1 )ifit( tobs, tsim, l, u, cluster = NULL, export = NULL, trace = 0, Ntotal = 1e+06, NTotglobal = 20000, Ninit = 1000, Nelite = 100, Aelite = 0.5, Tolglobal = 0.1, Tollocal = 1, Tolmodel = 1.5, NFitlocal = 4000, NAddglobal = 100, NAddlocal = 10, Lambda = 0.1, Rhomax = 0.1 )
tobs |
A vector containing the observed summary statistics. |
tsim |
A function implementing the model to be simulated. It must take as arguments a vector of model parameter values and it must return a vector of summary statistics. |
l, u
|
Two numeric vectors of length equal to the number of the parameters containing the lower and upper bounds on the parameters. |
cluster |
if not null, the model will be simulated in parallel;
the argument can be either a cluster object (as defined in package
|
export |
a vector giving the names of the global objects
(tipically needed by function |
trace |
An integer value; if greater than zero, results will be
printed (approximately) every |
Ntotal |
The maximum number of allowed model simulations. |
NTotglobal |
The maximum number of simulations performed during the global search phase. |
Ninit, Nelite, Aelite, Tolglobal, Tollocal, Tolmodel, NFitlocal, NAddglobal, NAddlocal, Rhomax, Lambda
|
Constants affecting the details of the algorithm (see the vignette describing the algorithm) |
An object of class ifit for which a suitable print method
exists and whose elements can be accessed using the
functions described in ifit-methods
Guido Masarotto (2015) 'Simulation-Based Fitting of Intractable Models via Sequential Sampling and Local Smoothing', arXiv eprint 2511.08180, pp. 1-23, doi:10.48550/arXiv.2511.08180
# A toy model set.seed(1) n <- 3 y <- rnorm(n, 1) tobs <- mean(y) tsim <- function(theta) mean(rnorm(n, theta[1])) m <- ifit(tobs, tsim, l = -3, u = 3) m coef(m) confint(m) globalIFIT(m) numsimIFIT(m) vcov(m, "parameters") vcov(m, "statistics") jacobianIFIT(m) diagIFIT(m, plot = FALSE) estfunIFIT(m) # Logit model # It takes some time to run n <- 100 X <- seq(-1, 1, length = n) Z <- rnorm(n) X <- cbind(1, X, Z, Z + rnorm(n)) logitSim <- function(theta) rbinom(n, 1, 1 / (1 + exp(-X %*% theta))) logitStat <- function(y) as.numeric(crossprod(X, y)) theta <- c(-1, 1, 0.5, -0.5) y <- logitSim(theta) m <- ifit( tobs = logitStat(y), tsim = function(t) logitStat(logitSim(t)), l = rep(-5, 4), u = rep(5, 4), trace = 1000 ) m g <- glm(y ~ X - 1, family = binomial) rbind( ifit.estimates = coef(m), glm.estimates = coef(g), ifit.se = sqrt(diag(vcov(m))), glm.se = sqrt(diag(vcov(g))) )# A toy model set.seed(1) n <- 3 y <- rnorm(n, 1) tobs <- mean(y) tsim <- function(theta) mean(rnorm(n, theta[1])) m <- ifit(tobs, tsim, l = -3, u = 3) m coef(m) confint(m) globalIFIT(m) numsimIFIT(m) vcov(m, "parameters") vcov(m, "statistics") jacobianIFIT(m) diagIFIT(m, plot = FALSE) estfunIFIT(m) # Logit model # It takes some time to run n <- 100 X <- seq(-1, 1, length = n) Z <- rnorm(n) X <- cbind(1, X, Z, Z + rnorm(n)) logitSim <- function(theta) rbinom(n, 1, 1 / (1 + exp(-X %*% theta))) logitStat <- function(y) as.numeric(crossprod(X, y)) theta <- c(-1, 1, 0.5, -0.5) y <- logitSim(theta) m <- ifit( tobs = logitStat(y), tsim = function(t) logitStat(logitSim(t)), l = rep(-5, 4), u = rep(5, 4), trace = 1000 ) m g <- glm(y ~ X - 1, family = binomial) rbind( ifit.estimates = coef(m), glm.estimates = coef(g), ifit.se = sqrt(diag(vcov(m))), glm.se = sqrt(diag(vcov(g))) )
ifit objectsMethods and functions usable fo extracting information from an object
of class ifit
## S3 method for class 'ifit' coef(object, ...) ## S3 method for class 'ifit' vcov(object, type = c("parameters", "statistics"), ...) ## S3 method for class 'ifit' confint(object, parm, level = 0.95, ...) controlIFIT(object) globalIFIT(object) jacobianIFIT(object) numsimIFIT(object) diagIFIT(object, plot = TRUE) estfunIFIT(object)## S3 method for class 'ifit' coef(object, ...) ## S3 method for class 'ifit' vcov(object, type = c("parameters", "statistics"), ...) ## S3 method for class 'ifit' confint(object, parm, level = 0.95, ...) controlIFIT(object) globalIFIT(object) jacobianIFIT(object) numsimIFIT(object) diagIFIT(object, plot = TRUE) estfunIFIT(object)
object |
an object returned by |
... |
additional arguments, currently not used |
type |
should the (estimated) covariance matrix of the parameters or of summary statistics returned? |
parm |
a specification of which parameters are to be given confidence intervals. |
level |
the confidence level required. |
plot |
if TRUE the summary statistics are plotted and the object is returned as invisible; otherwise, the statistics are not plotted and the object is returned as visible. |
coef returns a numeric vector with the estimated parameters
vcov returns a numeric matrix containing either the
estimates or summary statistics variance-covariance matrix.
confint returns a matrix (or vector) with columns giving lower and upper
confidence limits for each parameter. The intervals assume the
asymptotic normality of the summary statistics.
controlIFIT return a list containing the values of the constants
Ninit, NFitlocal,... used to estimate the model.
globalIFIT return the vector of the estimated parameters after
the global search
jacobianIFIT returns a numeric matrix containing the estimated
jacobian of the summary statistcs mean.
numsimIFIT returns an integer vector of length 2 containing the
number of simulations used during the global and local search
phases.
diagIFIT returns, and optionally plot, a numeric vector containing
the observed summary statistics standardized with their means and
standard deviances estimated at the final parameters; and, as attributes,
the Sargan-Hansen test statistic, its degrees of freedom and
the corresponding p-value. The
class of the returned object is ifit.diag for which suitable
print and plot methods are available.
estfunIFIT returns a numeric vector containing the estimating function evaluated
at the final parameters; as attributes, the function also returns
the estimated standard errors of the estimating function, its Mahalanobis norm
and the number of degree of freedom. The
class of the returned object is ifit.estfun for which a suitable
print method is available.
A dataset describing the nocturnal movements of 66 Fowler's toad that were radio-tracked for 63 days.
toadstoads
A numerical matrix with 63 rows and 66 columns
https://github.com/pmarchand1/fowlers-toad-move/
Philippe Marchand, Morgan Boenke, and David M. Green (2017) ‘A Stochastic Movement Model Reproduces Patterns of Site Fidelity and Long-Distance Dispersal in a Population of Fowler’s Toads (Anaxyrus Fowleri)’, Ecological Modelling, 360, pp 63–69, doi:10.1016/j.ecolmodel.2017.06.025.
# White cells are missing data lattice::levelplot(sweep(toads, 2, toads[1,]), ylab="Toad", xlab="Day", at=c(-Inf, seq(-200, 200, by=10), +Inf))# White cells are missing data lattice::levelplot(sweep(toads, 2, toads[1,]), ylab="Toad", xlab="Day", at=c(-Inf, seq(-200, 200, by=10), +Inf))
Function toadSim (written in C++) simulates a realization from
the "model 1" suggested by Marchand et al. (2017);
function toadStat computes a possible summary statistics.
toadSim(theta, data) toadStat( data, lags = c(1, 2, 4, 8), q = c(0.01, seq(0.05, 0.95, by = 0.05), 0.99), dret = 10 )toadSim(theta, data) toadStat( data, lags = c(1, 2, 4, 8), q = c(0.01, seq(0.05, 0.95, by = 0.05), 0.99), dret = 10 )
theta |
a vector of length 3 containing the model parameters |
data |
a (number of days)x(number of toads) numerical matrix containing the locations of a set of toads. The simulated dataset will replicates the size, the NA pattern and the initial positions (first row) of this argument. |
lags |
an integer vector giving the lags used to compute the statistic |
q |
a numeric vector specifying the desired quantiles |
dret |
the distance used to classify the displacements as "returns" or "non-returns". |
The model describes the nocturnal movements (along a single spatial
dimension) of Fowler's toads.
It assumes that toads leave their refuges at night to forage and
hide within sand dunes during the day. After the t-th nocturnal
foraging phase, a toad is located at a displacement of meters
from its previous refuge site. The
displacements are assumed to be independent and identically
distributed realizations of a symmetric, zero-centered -stable
random variable with stability parameter and scale parameter
. The toad then either returns to one of the previously visited
sites (with probability ) or remains at its current location (with
probability ). In the former case, the refuge site is selected at
random. The model parameter vector is
.
toadSim return a numerical matrix of the same size of data
toadStat returns a numeric vector of length
length(lags)x(1+length(q)). The summary statistic is computed
from absolute displacements at time lags
classified as "returns" or "non-returns"
depending on whether they are smaller than dret.
For each lag in lags, the statistic comprises the "return" frequency
and the median and adjacent quantile differences of the logarithms of
the "non-return" distances (the quantiles are those specified in q).
Philippe Marchand, Morgan Boenke, and David M. Green (2017) ‘A Stochastic Movement Model Reproduces Patterns of Site Fidelity and Long-Distance Dispersal in a Population of Fowler’s Toads (Anaxyrus Fowleri)’, Ecological Modelling, 360, pp 63–69, doi:10.1016/j.ecolmodel.2017.06.025.
# It takes some time to run set.seed(20251025L) tobs <- toadStat(toads) tsim <- function(theta) toadStat(toadSim(theta, toads)) m <- ifit(tobs, tsim, l = c(0.01, 0, 0), u = c(2, 100, 1), trace = 1000) m confint(m) numsimIFIT(m) estfunIFIT(m) diagIFIT(m)# It takes some time to run set.seed(20251025L) tobs <- toadStat(toads) tsim <- function(theta) toadStat(toadSim(theta, toads)) m <- ifit(tobs, tsim, l = c(0.01, 0, 0), u = c(2, 100, 1), trace = 1000) m confint(m) numsimIFIT(m) estfunIFIT(m) diagIFIT(m)
Function traitSim (written in C++) simulates a realization from
the trait model suggested by Jabot (2010); function traiStat
computes a possible summary statistics.
traitSim(theta, nspecies = 1000L, population = 500L, ngen = 5000L) traitStat(n, q = c(0.01, seq(0.05, 0.95, by = 0.05), 0.99))traitSim(theta, nspecies = 1000L, population = 500L, ngen = 5000L) traitStat(n, q = c(0.01, seq(0.05, 0.95, by = 0.05), 0.99))
theta |
a vector of length 4 containing the model parameters, |
nspecies |
the number of different levels of the trait characteristic (default 1000) |
population |
number of individuals living in the community (default 500) |
ngen |
number of generations (death/birth cycles) after which the actual population is returned (default 5000) |
n |
the observed data |
q |
the quantiles used to summarize the trait distribution |
The model describes the distribution of a numeric trait in a population
of size population. The trait can only assume the values
for . The local
competitive ability of a species with trait u is proportional to
where ,
and are parameters, and denotes the probability density function of a normal random variable with mean
and variance .
The traits of the initial
population are randomly drawn with probability proportional to
. Then, for ngen steps, one individual randomly chosen
dies. It is replaced either by an immigrant (with probability
) or by a descendant of another randomly choosen existing
individual (with probability ).
In the first case (immigration), the trait of the new individual is drawn with
probability proportional to . In the second case
(reproduction), the the probability that the trait of
the new individual is u is proportional to the abundance of u in the
population times .
The vector of model parameters is .
Note that the parametrization used in this package differs from the one
originally suggested by Jabot (2010). Specifically, Jabot assumes that
and
where and are alternative parameters used in place of
and , respectively.
traitSim returns a integer vector of length nspecies containing the trait distribution
of population individuals after ngen generations
traitStat returns a numeric vector containing the species richness
(i.e., the number of distinct traits in the population),
the Gini index, and the quantiles of the trait
characteristic corresponding to the input arguments q
Franck Jabot (2010) 'A Stochastic Dispersal-Limited Trait-Based Model of Community Dynamics', Journal of Theoretical Biology, 262, pp. 650–61, doi:10.1016/j.jtbi.2009.11.004.
set.seed(1) theta <- c(0.2, 0.7, 0.2, 0.7) y <- traitSim(theta) plot(seq(0, 1, length = length(y)), y, type = "h", xlab = "trait", ylab = "frequency") print(tobs <- traitStat(y)) # It takes some time to run tsim <- function(theta) traitStat(traitSim(theta)) m <- ifit(tobs, tsim, l = rep(0, 4), u = rep(1, 4), trace = 1000) m confint(m) diagIFIT(m) numsimIFIT(m)set.seed(1) theta <- c(0.2, 0.7, 0.2, 0.7) y <- traitSim(theta) plot(seq(0, 1, length = length(y)), y, type = "h", xlab = "trait", ylab = "frequency") print(tobs <- traitStat(y)) # It takes some time to run tsim <- function(theta) traitStat(traitSim(theta)) m <- ifit(tobs, tsim, l = rep(0, 4), u = rep(1, 4), trace = 1000) m confint(m) diagIFIT(m) numsimIFIT(m)