Title: | Latent Class Analysis (LCA) with Familial Dependence in Extended Pedigrees |
---|---|
Description: | Latent Class Analysis of phenotypic measurements in pedigrees and model selection based on one of two methods: likelihood-based cross-validation and Bayesian Information Criterion. Computation of individual and triplet child-parents weights in a pedigree is performed using an upward-downward algorithm. The model takes into account the familial dependence defined by the pedigree structure by considering that a class of a child depends on his parents classes via triplet-transition probabilities of the classes. The package handles the case where measurements are available on all subjects and the case where measurements are available only on symptomatic (i.e. affected) subjects. Distributions for discrete (or ordinal) and continuous data are currently implemented. The package can deal with missing data. |
Authors: | Arafat TAYEB <[email protected]>, Alexandre BUREAU <[email protected]> and Aurelie Labbe <[email protected]> |
Maintainer: | Alexandre BUREAU <[email protected]> |
License: | GPL |
Version: | 1.3 |
Built: | 2024-10-23 06:41:25 UTC |
Source: | CRAN |
computes cumulative logistic coefficients using probabilities.
alpha.compute(p)
alpha.compute(p)
p |
a vector of probabilities (positive entries summing to 1). |
If p
has one value (equal to 1) alpha.compute
returns NA
, if it has S (S>=2)
values, alpha.compute
returns S-1
coefficients
alpha
such that if Y
is a random variable taking values in {1,...,S}
with probabilities p
, coefficients alpha[i]
are given by:
for all i<=S-1
.
The function returns alpha
: a vector of S-1
cumulative logistic coefficients.
alpha.compute
is the inverse function of p.compute
# a vector of probability p <- c(0,0.2,0,0,0.3,0.4,0.1,0,0) alpha.compute(p) #gives -Inf -1.38 0 0 1.38 0 2.19 Inf Inf p.compute(alpha.compute(rep(1/5,5)))
# a vector of probability p <- c(0,0.2,0,0,0.3,0.4,0.1,0,0) alpha.compute(p) #gives -Inf -1.38 0 0 1.38 0 2.19 Inf Inf p.compute(alpha.compute(rep(1/5,5)))
associates to a function of density parameter optimization an attribute to distinguish between ordinal and normal cases. This is an internal function not meant to be called by the user.
attrib.dens(optim.param)
attrib.dens(optim.param)
optim.param |
the function used to estimate the parameters of the measurements. |
Available optim.param
functions are optim.noconst.ordi
, optim.const.ordi
for ordinal measurements and optim.indep.norm
, optim.diff.norm
, optim.equal.norm
and optim.gene.norm
for multinormal measurements. The attribution uses the internal function attr
and the attribute name used is type
. The user can make his own optima.param
function and has to associate an attribute type
to it to be used instead of the available ones.
The function returns the same optim.param
with an attribute type
taking values in ordi
or norm
.
optim.param <- optim.indep.norm optim.param <- attrib.dens(optim.param)
optim.param <- optim.indep.norm optim.param <- attrib.dens(optim.param)
computes the density of an individual's continuous measurement vector for all latent classes, eventually taking covariates into account. This is an internal function not meant to be called by the user.
dens.norm(y.x, param, var.list = NULL)
dens.norm(y.x, param, var.list = NULL)
y.x |
a vector |
param |
a list of the multinormal density parameters: means |
var.list |
a list of integers indicating which covariates (taken from |
For each class k
, the function computes the multinormal density with means param$mu[[k]]
and variances-covariances matrix
param$sigma[[k]]
for the individual's measurement
vector. Treatment of covariates is not yet implemented, and any
provided covariate value will be ignored.
The function returns a vector dens
of length K
, where
dens[k]
is the density of the measurements if the individual belongs to class k
.
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[status==2,7:ncol(ped.cont)] #param data(param.cont) #the function applied for measurement of the first individual in the ped.ordi dens.norm(y.x=y[1,],param.cont)
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[status==2,7:ncol(ped.cont)] #param data(param.cont) #the function applied for measurement of the first individual in the ped.ordi dens.norm(y.x=y[1,],param.cont)
computes the probability of an individual's discrete measurement vector for all latent classes under a multinomial distribution product, eventually taking covariates into account. This is an internal function not meant to be called by the user.
dens.prod.ordi(y.x, param, var.list = NULL)
dens.prod.ordi(y.x, param, var.list = NULL)
y.x |
a vector |
param |
a list of the parameters alpha (cumulative logistic coefficients), see |
var.list |
a list of integers indicating which covariates (taken from |
If there are K
latent classes, d
measurements and each measurement has S[j]
possible values, alpha
is a list of d
elements, each is a K
times S[j]+length{var.list[[j]]}
matrix. For a class C=k
, dens[k]=
, where
is
computed from the cumulative logistic coefficients
alpha[[j]][k,]
and
covariates x[var.list[[j]]]
,
The function returns a vector dens
of length K
, where
dens[k]
is the probability of measurement vector y
with covariates x
,
if the individual belongs to class k
.
See Also init.ordi
,
#data data(ped.ordi) status <- ped.ordi[,6] y <- ped.ordi[status==2,7:ncol(ped.ordi)] #param data(param.ordi) #the function applied for measurement of the first individual in the ped.ordi dens.prod.ordi(y.x=y[1,],param.ordi)
#data data(ped.ordi) status <- ped.ordi[,6] y <- ped.ordi[status==2,7:ncol(ped.ordi)] #param data(param.ordi) #the function applied for measurement of the first individual in the ped.ordi dens.prod.ordi(y.x=y[1,],param.ordi)
computes the probability of measurements above connectors and their classes given the model parameters, and returns the unnormalized triplet and individual weights. This is an internal function not meant to be called by the user.
downward(id, dad, mom, status, probs, fyc, peel, res.upward)
downward(id, dad, mom, status, probs, fyc, peel, res.upward)
id |
individual ID of the pedigree, |
dad |
dad ID, |
mom |
mom ID, |
status |
symptom status: (2: symptomatic, 1: without symptoms, 0: missing), |
probs |
a list of probability parameters of the model, |
fyc |
a matrix of |
peel |
a list of pedigree peeling containing connectors by peeling order and couples of parents, |
res.upward |
result of the upward step of the peeling algorithm, see |
This function computes the probability of observations above connectors and their classes using the function downward.connect
, for each connector,
if Y_above(i)
is the observations above connector i
and S_i
and C_i
are his status and his class respectively, the functions computes
P(Y_above(i),S_i,C_i)
by computing a downward step for the parent of connector i
who is also a connector. These quantities are used by the function weight.nuc
to compute the unnormalized triplet weights ww
and the unnormalized
individual weights w
.
The function returns a list of 2 elements:
ww |
unnormalized triplet weights, an array of |
w |
unnormalized individual weights, an array of |
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See also downward.connect
.
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,param.cont,NULL)) #the upward step res.upward <- upward(id,dad,mom,status,probs,fyc,peel) #the function downward(id,dad,mom,status,probs,fyc,peel,res.upward)
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,param.cont,NULL)) #the upward step res.upward <- upward(id,dad,mom,status,probs,fyc,peel) #the function downward(id,dad,mom,status,probs,fyc,peel,res.upward)
computes the probability of the measurements above a connector and the connector latent class given the model parameters. This is an internal function not meant to be called by the user.
downward.connect(connect, parent1, parent2, bro.connect, status, probs, fyc, p.ybarF.c, res.upward)
downward.connect(connect, parent1, parent2, bro.connect, status, probs, fyc, p.ybarF.c, res.upward)
connect |
a connector in the pedigree (individual with parents and children in the pedigree), |
parent1 |
one of the connector parent who is also a connector, |
parent2 |
the other parent of the connector (not a connector), |
bro.connect |
siblings of the connector, |
status |
a vector of symptom status, |
probs |
a list of all probability parameters of the model, |
fyc |
a matrix of |
p.ybarF.c |
a array of dimension |
res.upward |
the result of the upward step of the peeling algorithm, see |
If Y_above(i)
is the measurements above connector i
and S_i
and C_i
are his status and his class respectively, the function computes
P(Y_above(i),S_i,C_i)
by computing a downward step for the parent of connector i
who is also a connector.
The function returns p.ybarF.c
updated for connector i
.
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See also downward
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #the 2nd connector generat <- peel$generation-1 connect <- peel$peel.connect[generat,] connect <- connect[connect>0][1] parent1.connect <- intersect(peel$peel.connect[generat+1,],c(dad[id==connect], mom[id==connect])) parent2.connect <- setdiff(c(dad[id==connect],mom[id==connect]),parent1.connect) bro.connect <- union(id[dad==parent1.connect],id[mom==parent1.connect]) bro.connect <- setdiff(bro.connect,connect) #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,param.cont,NULL)) #probability of the observations below p.ybarF.c <- array(1,dim=c(length(id),2,length(probs$p)+1)) #the upward step res.upward <- upward(id,dad,mom,status,probs,fyc,peel) #the function downward.connect(connect,parent1.connect,parent2.connect,bro.connect,status, probs,fyc,p.ybarF.c,res.upward)
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #the 2nd connector generat <- peel$generation-1 connect <- peel$peel.connect[generat,] connect <- connect[connect>0][1] parent1.connect <- intersect(peel$peel.connect[generat+1,],c(dad[id==connect], mom[id==connect])) parent2.connect <- setdiff(c(dad[id==connect],mom[id==connect]),parent1.connect) bro.connect <- union(id[dad==parent1.connect],id[mom==parent1.connect]) bro.connect <- setdiff(bro.connect,connect) #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm,param.cont,NULL)) #probability of the observations below p.ybarF.c <- array(1,dim=c(length(id),2,length(probs$p)+1)) #the upward step res.upward <- upward(id,dad,mom,status,probs,fyc,peel) #the function downward.connect(connect,parent1.connect,parent2.connect,bro.connect,status, probs,fyc,p.ybarF.c,res.upward)
computes triplet and individual weights the E step of the EM algorithm for all pedigrees in the data, in both cases with and without familial dependence. This is an internal function not meant to be called by the user.
e.step(ped, probs, param, dens, peel, x = NULL, var.list = NULL, famdep = TRUE)
e.step(ped, probs, param, dens, peel, x = NULL, var.list = NULL, famdep = TRUE)
ped |
a matrix representing pedigrees and measurements: |
probs |
a list of probability parameters of the model, see below for more details, |
param |
a list of measurement distribution parameters of the model, see below for more details, |
dens |
distribution of the mesurements, used in the model (multinormal, multinomial,...) |
peel |
a list of pedigree peeling containing connectors by peeling order and couples of parents, |
x |
covariates, if any. Default is |
var.list |
a list of integers indicating which covariates (taken from |
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
probs
is a list of initial probability parameters:
For models with familial dependence:
p
a probability vector, each p[c]
is the probability that an symptomatic founder is in class c
for c>=1
,
p0
the probability that a founder without symptoms is in class 0,
p.trans
an array of dimension K
times K+1
times K+1
, where K
is the number of latent classes of
the model, and is such that p.trans[c_i,c_1,c_2]
is the
conditional probability that a symptomatic individual
i
is in class c_i
given that his parents are in classes
c_1
and c_2
,
p0connect
a vector of length K
, where
p0connect[c]
is the probability that a connector without
symptoms is in class 0
,
given that one of his parents is in class c>=1
and the other in class 0,
p.found
the probability that a founder is symptomatic,
p.child
the probability that a child is symptomatic,
For models without familial dependence, all individuals are independent:
p
a probability vector, each p[c]
is the probability that an symptomatic individual is in class c
for c>=1
,
p0
the probability that an individual without symptoms is in class 0,
p.aff
the probability that an individual is symptomatic,
param
is a list of measurement density parameters: the coefficients alpha
(cumulative logistic coefficients see alpha.compute
) in
the case of discrete or ordinal data, and means mu
and variances-covariances matrices sigma
in the case of continuous data,
The function returns a list of 3 elements:
ww |
triplet posterior probabilities, an array of |
w |
individual posterior probabilities, an array of |
ll |
log-likelihood of the considered model and parameters. |
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See also weight.famdep
, lca.model
.
#data data(ped.cont) data(peel) #probs and probs data(probs) data(param.cont) #the function e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL,var.list=NULL, famdep=TRUE)
#data data(ped.cont) data(peel) #probs and probs data(probs) data(param.cont) #the function e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL,var.list=NULL, famdep=TRUE)
computes initial values of means and variance-covariance matrices for the EM algorithm in the case of continuous measurements and multinormal model.
init.norm(y, K, x = NULL, var.list = NULL)
init.norm(y, K, x = NULL, var.list = NULL)
y |
a |
K |
number of latent classes of the model, |
x |
a matrix of covariates if any, default is |
var.list |
a list of integers indicating which covariates (taken from |
The function allocates every individual to a class by a simple clustering of the data and evaluates the means and variance-covariance matrices of measurements in each class. Treatment of covariates is not yet implemented, and any provided covariate value will be ignored.
The function returns a list of 2 elements mu
and sigma
of length K
each, mu[k]
is the means vector
(of length d
) of measurements in class k
and sigma[k]
is the variances-covariances matrix
(of dimension d
times d
) of measurements in class k
.
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[status==2,7:ncol(ped.cont)] #the function init.norm(y,K=3)
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[status==2,7:ncol(ped.cont)] #the function init.norm(y,K=3)
computes the initial values of cumulative logistic coefficients alpha for the EM algorithm in the case of ordinal measurements and a product multinomial model.
init.ordi(y, K, x = NULL, var.list = NULL)
init.ordi(y, K, x = NULL, var.list = NULL)
y |
a |
K |
number of latent classes of the model, |
x |
a matrix of covariates if any, default is |
var.list |
list of integers indicating which covariates (taken from |
The function allocates every individual to a class and evaluates the cumulative logistic coefficients for each measurement and each class. Regression coefficients for the covariates are set to 0.
The function returns a list of one element alpha
which is a list of d
elements, each element alpha[[j]]
is a K
times
S-1
matrix, where S
is the number of values of the measurement y[,j]
, a row alpha[[j]][k,]
gives the the cumulative logistic
coefficients of class k
and measurement j
using alpha.compute
.
#data data(ped.ordi) status <- ped.ordi[,6] y <- ped.ordi[,7:ncol(ped.ordi)] #the function init.ordi(y[status==2,],K=3)
#data data(ped.ordi) status <- ped.ordi[,6] y <- ped.ordi[,7:ncol(ped.ordi)] #the function init.ordi(y[status==2,],K=3)
initializes the marginal transition probabilities with or without parental constraint.
init.p.trans(K, trans.const = TRUE)
init.p.trans(K, trans.const = TRUE)
K |
number of latent classes, |
trans.const |
a logical variable indicating if the parental constraint is used. Parental constraint means that the class of a subject can be only one
of his parents classes. Default is |
All non-zero transition probabilities are set to be equal. The parental constraint indicator determines which transition probabilities are non-zero.
the function returns p.trans
an array of dimension K
times K+1
times K+1
: p.trans[c_i,c_1,c_2]
is the probability that
the subject i
is assigned to class c_i
and his parents to classes c_1
and c_2
.
init.p.trans(3) #parental constraint is TRUE, init.p.trans(3,trans.const=FALSE) #parental constraint is FALSE.
init.p.trans(3) #parental constraint is TRUE, init.p.trans(3,trans.const=FALSE) #parental constraint is FALSE.
This is the main function for fitting latent class models. It performs some checks of the pedigrees (it exits if an individual has only one
parent in the pedigree, if no children is in the pedigree or if there
are not enough individuals for parameters estimation) and of the
initial values (positivity of probabilites and their summation to
one). For models with familial dependence, the child latent class
depends on his parents classes via
triplet-transition probabilities. In the case of models without
familial dependence, it performs the classical Latent
Class Analysis (LCA) where all individuals are supposed independent
and the pedigree structure is meaningless. The EM algorithm stops when
the difference between log-likelihood is smaller then tol
that
is fixed by the user.
lca.model(ped, probs, param, optim.param, fit = TRUE, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001, x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)
lca.model(ped, probs, param, optim.param, fit = TRUE, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001, x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)
ped |
a matrix or data frame representing pedigrees and measurements: |
probs |
a list of initial probability parameters (see below
for more details). The function |
param |
a list of initial measurement distribution parameters (see below for more details). The function |
optim.param |
a variable indicating how measurement distribution parameter optimization is performed (see below for more details), |
fit |
a logical variable, if |
optim.probs.indic |
a vector of logical values indicating which probability parameters to estimate, |
tol |
a small number governing the stopping rule of the EM algorithm. Default is 0.001, |
x |
a matrix of covariates (optional), default is |
var.list |
a list of integers indicating the columns of
|
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
modify.init |
a function to modify initial values of the EM algorithm, or |
The symptom status vector (column 6 of ped
) takes value 1 for
subjects that have been
examined and show no symptoms (i.e. completely unaffected
subjects). When applying the LCA to
measurements available on all subjects, the status vector must take the
value of 2 for every individual with measurements.
probs
is a list of initial probability parameters:
For models with familial dependence:
p
a probability vector, each p[c]
is the probability that an symptomatic founder is in class c
for c>=1
,
p0
the probability that a founder without symptoms is in class 0,
p.trans
an array of dimension K
times K+1
times K+1
, where K
is the number of latent classes of
the model, and is such that p.trans[c_i,c_1,c_2]
is the
conditional probability that a symptomatic individual
i
is in class c_i
given that his parents are in classes
c_1
and c_2
,
p0connect
a vector of length K
, where
p0connect[c]
is the probability that a connector without
symptoms is in class 0
,
given that one of his parents is in class c>=1
and the other in class 0,
p.found
the probability that a founder is symptomatic,
p.child
the probability that a child is symptomatic,
For models without familial dependence, all individuals are independent:
p
a probability vector, each p[c]
is the probability that an symptomatic individual is in class c
for c>=1
,
p0
the probability that an individual without symptoms is in class 0,
p.aff
the probability that an individual is symptomatic,
param
is a list of measurement distribution parameters: the coefficients alpha
(cumulative logistic coefficients see alpha.compute
) in
the case of discrete or ordinal data, and means mu
and variances-covariances matrices sigma
in the case of continuous data,
optim.param
is a variable indicating how the measurement distribution parameter estimation of the M step is performed. Two possibilities,
optim.noconst.ordi
and optim.const.ordi
, are now available in the case of discrete or ordinal measurements, and four possibilities
optim.indep.norm
(measurements are independent, diagonal variance-covariance matrix),
optim.diff.norm
(general variance-covariance matrix but equal for all classes),
optim.equal.norm
(variance-covariance matrices are different for each class but equal variance and equal covariance for a class) and
optim.gene.norm
(general variance-covariance matrices for all classes), are now available in the case of continuous measurements,
One of the allowed values of optim.param
must be entered without quotes.
optim.probs.indic
is a vector of logical values of length 4 for
models with familial dependence and 2 for models without familial
dependence.
For models with familial dependence:
optim.probs.indic[1]
indicates whether p0
will be estimated or not,
optim.probs.indic[2]
indicates whether p0connect
will be estimated or not,
optim.probs.indic[3]
indicates whether p.found
will be estimated or not,
optim.probs.indic[4]
indicates whether p.connect
will
be estimated or not.
For models without familial dependence:
optim.probs.indic[1]
indicates whether p0
will be estimated or not,
optim.probs.indic[2]
indicates whether p.aff
will be
estimated or not.
All defaults are TRUE
. If the dataset contains only nuclear families, there is no information to estimate p0connect and p.connect, and these parameters will not be estimated, irrespective of the indicator value.
The function returns a list of 4 elements:
param |
the Maximum Likelihood Estimator (MLE) of the
measurement distribution parameters if |
probs |
the MLE of probability parameters if |
When measurements are available on all subjects, the probability parameters p0
and p0connect
are degenerated to 0 and
p.afound
, p.child
and p.aff
to 1 in the output.
weight |
an array of dimension |
ll |
the maximum log-likelihood value (log-ML) if |
TAYEB, A. LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended
Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2,
LABBE, A., BUREAU, A. et MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.
#data data(ped.ordi) fam <- ped.ordi[,1] #probs and param data(param.ordi) data(probs) #the function applied only to two first families of ped.ordi lca.model(ped.ordi[fam%in%1:2,],probs,param.ordi,optim.noconst.ordi, fit=TRUE,optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE),tol=0.001,x=NULL, var.list=NULL,famdep=TRUE,modify.init=NULL)
#data data(ped.ordi) fam <- ped.ordi[,1] #probs and param data(param.ordi) data(probs) #the function applied only to two first families of ped.ordi lca.model(ped.ordi[fam%in%1:2,],probs,param.ordi,optim.noconst.ordi, fit=TRUE,optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE),tol=0.001,x=NULL, var.list=NULL,famdep=TRUE,modify.init=NULL)
Performs selection of a latent class model for phenotypic measurements
in pedigrees based on one of
two possible methods: likelihood-based cross-validation or Bayesian
Information Criterion (BIC) selection. This is the top-level
function to perform a Latent Class Analysis (LCA), which calls the
model fitting function
lca.model
. Model selection is performed among models within one of two
types: with and without familial dependence. Two families of
distributions are currently implemented: product multinomial for discrete (or
ordinal) data and mutivariate
normal for continuous data.
model.select(ped, distribution, trans.const = TRUE, optim.param, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), famdep = TRUE, selec = "bic", H = 5, K.vec = 1:7, tol = 0.001, x = NULL, var.list = NULL)
model.select(ped, distribution, trans.const = TRUE, optim.param, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), famdep = TRUE, selec = "bic", H = 5, K.vec = 1:7, tol = 0.001, x = NULL, var.list = NULL)
ped |
a matrix containing variables coding the pedigree
structure and the phenotype measurements: |
distribution |
a character variable taking the value |
trans.const |
a logical variable indicating if the parental constraint is used. Parental constraint means that the class of a subject must be one
of his parents classes. Default is |
optim.param |
a variable indicating how the measurement distribution parameter optimization is performed (see below for more details), |
optim.probs.indic |
a vector of logical values indicating which probability parameters to estimate (see below for more details), |
famdep |
a logical variable indicating if the familial dependence model is used or not. Default is |
selec |
a character variables taking the value |
H |
an integer giving the number of equal parts into which data will be splitted for the likelihood-based cross-validation model selection (see below for more details), |
K.vec |
a vector of integers, the number of latent classes of
candidate models, if |
tol |
a small number governing the stopping rule of the EM algorithm. Default is 0.001, |
x |
a matrix of covariates (optional), default is |
var.list |
a list of integers indicating the columns of
|
In the case of cross-validation based-likelihood method, data is
splitted into H
parts: H-1
parts as a training set and one part as a
test set. For each model, a validation log-likelihood is obtained by
evaluating the log-likelihood of the test set data using the parameter
values estimated in the training set. This is repeated H
times
using a different part as training set each time, and a total
validation log-likelihood is obtained by summation over the H
test sets. The best model is the one having the largest
validation log-likelihood. In the case of BIC selection method, the
BIC is computed for each candidate model. The model with the smallest
BIC is selected.
The symptom status vector (column 6 of ped
) takes value 1 for
subjects that have been
examined and show no symptoms (i.e. completely unaffected
subjects). When applying the LCA to
measurements available on all subjects, the status vector must take the
value of 2 for every individual with measurements. If covariates are used, covariate values must be provided for subjects with symptom status 0 (missing) but not for subjects with symptom status 1 (if covariate values are provided, they will be ignored).
optim.param
is a variable indicating how the measurement
distribution parameter optimization of the M step is performed. Two
possibilities,
optim.noconst.ordi
and optim.const.ordi
, are now available in the case of discrete or ordinal measurements, and four possibilities,
optim.indep.norm
(measurements are independent, diagonal variance-covariance matrix),
optim.diff.norm
(general variance-covariance matrix but equal for all classes),
optim.equal.norm
(variance-covariance matrices are different for each class but equal variance and equal covariance for a class) and
optim.gene.norm
(general variance-covariance matrices for all classes), in the case of continuous measurements.
One of the allowed values of optim.param
must be entered without quotes.
optim.probs.indic
is a vector of logical values of length 4 for
models with familial dependence and 2 for models without familial
dependence indicating which probability parameters to estimate. See the
help page for lca.model
for a definition of the parameters.
For models with familial dependence:
optim.probs.indic[1]
indicates whether p0
will be estimated or not,
optim.probs.indic[2]
indicates whether p0connect
will be estimated or not,
optim.probs.indic[3]
indicates whether p.found
will be estimated or not,
optim.probs.indic[4]
indicates whether p.connect
will
be estimated or not.
For models without familial dependence:
optim.probs.indic[1]
indicates whether p0
will be estimated or not,
optim.probs.indic[2]
indicates whether p.aff
will be
estimated or not.
All defaults are TRUE
.
The function returns a list of 5 elements, the first 3 elements are common for BIC and cross-validation model selection methods and are:
param |
the Maximum Likelihood Estimator (MLE) of the measurement distribution parameters of the selected model, |
probs |
the Maximum Likelihood Estimator (MLE) of the probability parameters of the selected model, |
weight |
an array of dimension |
If the cross-validation selection method is used, the function returns also
ll |
the value of the maximum log-likelihood (log-ML) of the selected model, |
ll.valid |
the total cross-validation log-likelihood of all candidate models, |
and if the Bayesian Information Criterion selection method is used, the function returns also
ll |
the value of maximum log-likelihood (log-ML) of all candidate models, |
bic |
the Bayesian Information Criterion
|
TAYEB, A. LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended
Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2,
LABBE, A., BUREAU, A. et MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.
See also lca.model
.
#data data(ped.cont) fam <- ped.cont[,1] #the function applied for the two first families of ped.cont model.select(ped.cont[fam%in%1:2,],distribution="normal",trans.const=TRUE, optim.indep.norm,optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE), famdep=TRUE,selec="bic",K.vec=1:3,tol=0.001,x=NULL,var.list=NULL)
#data data(ped.cont) fam <- ped.cont[,1] #the function applied for the two first families of ped.cont model.select(ped.cont[fam%in%1:2,],distribution="normal",trans.const=TRUE, optim.indep.norm,optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE), famdep=TRUE,selec="bic",K.vec=1:3,tol=0.001,x=NULL,var.list=NULL)
computes the number of free parameters of a model, depending in the number of classes, the type of parameter optimization and the used of familial dependence, to be used in BIC model selection. This is an internal function not meant to be called by the user.
n.param(y, K, trans.const = TRUE, optim.param, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), famdep = TRUE)
n.param(y, K, trans.const = TRUE, optim.param, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), famdep = TRUE)
y |
a matrix of measurements, |
K |
an integer, the number of latent classes of a candiate model, |
trans.const |
a logical variable indicating if the parental constraint is used. Parental constraint means that the class of a subject can be only one
of his parents classes. Default is |
optim.param |
a function used for parameter optimization, see |
optim.probs.indic |
a vector of logical values indicating which probability parameters to be updated, see |
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
The function returns the number of free parameters (of the measurement distribution and the probabilities of the latent classes).
See also model.select
.
data(ped.cont) y <- ped.cont[,7:ncol(ped.cont)] n.param(y,K=3,trans.const=TRUE,optim.indep.norm, optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE),famdep=TRUE)
data(ped.cont) y <- ped.cont[,7:ncol(ped.cont)] n.param(y,K=3,trans.const=TRUE,optim.indep.norm, optim.probs.indic=c(TRUE,TRUE,TRUE,TRUE),famdep=TRUE)
Estimates the cumulative logistic coefficients alpha
in the
case of multinomial (or ordinal) data with an ordinal constraint on
the parameters.
optim.const.ordi(y, status, weight, param, x = NULL, var.list = NULL)
optim.const.ordi(y, status, weight, param, x = NULL, var.list = NULL)
y |
a matrix of discrete (or ordinal) measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
the constraint on the parameters is that, for a symptom j
, the rows alpha[[j]][k,]
are equal for all classes k
except the first values.
Therefore, maximum likelihood estimators are not explicit and the
function lrm
of the package rms
is used to perform a
numerical optimization.
The function returns a list of estimated parameters param
satisfying the constraint.
#data data(ped.ordi) status <- ped.ordi[,6] y <- ped.ordi[,7:ncol(ped.ordi)] data(peel) #probs and param data(probs) data(param.ordi) #e step weight <- e.step(ped.ordi,probs,param.ordi,dens.prod.ordi,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.ordi), ncol=length(probs$p)) #the function optim.const.ordi(y[status==2,],status,weight,param.ordi,x=NULL, var.list=NULL)
#data data(ped.ordi) status <- ped.ordi[,6] y <- ped.ordi[,7:ncol(ped.ordi)] data(peel) #probs and param data(probs) data(param.ordi) #e step weight <- e.step(ped.ordi,probs,param.ordi,dens.prod.ordi,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.ordi), ncol=length(probs$p)) #the function optim.const.ordi(y[status==2,],status,weight,param.ordi,x=NULL, var.list=NULL)
Estimates the mean
mu
and parameters of the variance-covariance matrix
sigma
of a multinormal distribution for the measurements with
a general variance-covariance matrix identical for all classes.
optim.diff.norm(y, status, weight, param, x = NULL, var.list = NULL)
optim.diff.norm(y, status, weight, param, x = NULL, var.list = NULL)
y |
a matrix of continuous measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
The values of explicit estimators are computed for both mu
and
sigma
. The variance-covariance matrices sigma
are
identical for all classes. Treatment of covariates is not yet implemented, and any
provided covariate value will be ignored.
The function returns a list of estimated parameters param
.
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[,7:ncol(ped.cont)] data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont), ncol=length(probs$p)) #the function optim.diff.norm(y[status==2,],status,weight,param.cont,x=NULL, var.list=NULL)
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[,7:ncol(ped.cont)] data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont), ncol=length(probs$p)) #the function optim.diff.norm(y[status==2,],status,weight,param.cont,x=NULL, var.list=NULL)
Estimates the mean
mu
and parameters of the variance-covariance matrix
sigma
of a multinormal distribution for the measurements
with equal variance for all measurements and equal covariance between
all pairs of measurements within each class. The variance and
covariance parameters are however distinct for each class.
optim.equal.norm(y, status, weight, param, x = NULL, var.list = NULL)
optim.equal.norm(y, status, weight, param, x = NULL, var.list = NULL)
y |
a matrix of continuous measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
The values of explicit estimators are computed for both mu
and
sigma
. The variance-covariance matrices sigma
are
distinct for each class. Treatment of covariates is not yet implemented, and any
provided covariate value will be ignored.
The function returns a list of estimated parameters param
.
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[,7:ncol(ped.cont)] data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont), ncol=length(probs$p)) #the function optim.equal.norm(y[status==2,],status,weight,param.cont,x=NULL, var.list=NULL)
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[,7:ncol(ped.cont)] data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont), ncol=length(probs$p)) #the function optim.equal.norm(y[status==2,],status,weight,param.cont,x=NULL, var.list=NULL)
Estimates the mean
mu
and parameters of the variance-covariance matrix
sigma
of a multinormal distribution for the measurements with
general variance-covariance matrices distinct for each class.
optim.gene.norm(y, status, weight, param, x = NULL, var.list = NULL)
optim.gene.norm(y, status, weight, param, x = NULL, var.list = NULL)
y |
a matrix of continuous measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
The values of explicit estimators are computed for both mu
and
sigma
. This is the general case, the variance-covariance
matrices sigma
of the different classes are distinct and
unconstrained. Treatment of covariates is not yet implemented, and any
provided covariate value will be ignored.
The function returns a list of estimated parameters param
.
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[,7:ncol(ped.cont)] data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont), ncol=length(probs$p)) #the function optim.gene.norm(y[status==2,],status,weight,param.cont,x=NULL, var.list=NULL)
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[,7:ncol(ped.cont)] data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont), ncol=length(probs$p)) #the function optim.gene.norm(y[status==2,],status,weight,param.cont,x=NULL, var.list=NULL)
Estimates the mean
mu
and parameters of the variance-covariance matrix
sigma
of a multinormal distribution for the measurements with
diagonal variance-covariance matrices for each class, i.e. measurements are supposed independent.
optim.indep.norm(y, status, weight, param, x = NULL, var.list = NULL)
optim.indep.norm(y, status, weight, param, x = NULL, var.list = NULL)
y |
a matrix of continuous measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement density parameters, here is a list of |
x |
a matrix of covariates (optional). Default id |
var.list |
a list of integers indicating which covariates (taken from |
The values of explicit estimators are computed for both mu
and
sigma
. All variance-covariance matrices sigma
are
diagonal, i.e. measurements are supposed independent. Treatment of
covariates is not yet implemented, and any
provided covariate value will be ignored.
The function returns a list of estimated parameters param
.
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[,7:ncol(ped.cont)] data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont), ncol=length(probs$p)) #the function optim.indep.norm(y[status==2,],status,weight,param.cont,x=NULL, var.list=NULL)
#data data(ped.cont) status <- ped.cont[,6] y <- ped.cont[,7:ncol(ped.cont)] data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.cont), ncol=length(probs$p)) #the function optim.indep.norm(y[status==2,],status,weight,param.cont,x=NULL, var.list=NULL)
Estimates the cumulative logistic coefficients alpha
in the case of multinomial (or ordinal) data without constraint on the coefficients.
optim.noconst.ordi(y, status, weight, param, x = NULL, var.list = NULL)
optim.noconst.ordi(y, status, weight, param, x = NULL, var.list = NULL)
y |
a matrix of discrete (or ordinal) measurements (only for symptomatic subjects), |
status |
symptom status of all individuals, |
weight |
a matrix of |
param |
a list of measurement distribution parameters, here is a list |
x |
a matrix of covariates (optional). Default is |
var.list |
a list of integers indicating which covariates (taken from |
The values of explicit estimators are computed by logistic transformation of weighted empirical frequencies.
the function returns a list of estimated parameters param
.
#data data(ped.ordi) status <- ped.ordi[,6] y <- ped.ordi[,7:ncol(ped.ordi)] data(peel) #probs and param data(probs) data(param.ordi) #e step weight <- e.step(ped.ordi,probs,param.ordi,dens.prod.ordi,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.ordi), ncol=length(probs$p)) #the function optim.noconst.ordi(y[status==2,],status,weight,param.ordi,x=NULL, var.list=NULL)
#data data(ped.ordi) status <- ped.ordi[,6] y <- ped.ordi[,7:ncol(ped.ordi)] data(peel) #probs and param data(probs) data(param.ordi) #e step weight <- e.step(ped.ordi,probs,param.ordi,dens.prod.ordi,peel,x=NULL, var.list=NULL,famdep=TRUE)$w weight <- matrix(weight[,1,1:length(probs$p)],nrow=nrow(ped.ordi), ncol=length(probs$p)) #the function optim.noconst.ordi(y[status==2,],status,weight,param.ordi,x=NULL, var.list=NULL)
estimates the probability parameters (p
, p.trans
, p0
,...) in the M step of the EM algorithm in both cases
with and without familial dependence. This is an internal
function not meant to be called by the user.
optim.probs(ped, probs, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), res.weight, famdep = TRUE)
optim.probs(ped, probs, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), res.weight, famdep = TRUE)
ped |
a matrix of pedigrees data, see |
probs |
all probability parameters to be optimized, |
optim.probs.indic |
a vector of logical values indicating which probability parameters to be updated, |
res.weight |
a matrix of |
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
explicit estimators are computed in function of the weights.
the function returns the estimated probs
of all probability parameters.
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
#data data(ped.cont) data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE) #the function optim.probs(ped.cont,probs,weight,optim.probs.indic= c(TRUE,TRUE,TRUE,TRUE),famdep=TRUE)
#data data(ped.cont) data(peel) #probs and param data(probs) data(param.cont) #e step weight <- e.step(ped.cont,probs,param.cont,dens.norm,peel,x=NULL, var.list=NULL,famdep=TRUE) #the function optim.probs(ped.cont,probs,weight,optim.probs.indic= c(TRUE,TRUE,TRUE,TRUE),famdep=TRUE)
computes the probability vector using cumulative logistic coefficients
p.compute(alpha,decal)
p.compute(alpha,decal)
alpha |
a vector of cumulative logistic coefficients, the first value can
be |
decal |
offset term to be applied to sums of logistic coefficients |
If alpha
has S-1
values, p.compute
returns p
of length S
. If Y
is a random variable taking values in {1,...,S}
with
probabilities p
, coefficients alpha[i]
are given by:
for all i<=S-1
.
p
: a probability vector
p.compute
is the inverse function of alpha.compute
# a vector of probability p <- c(0,0.2,0,0,0.3,0.4,0.1,0,0) alpha <- alpha.compute(p) #gives alpha= -Inf -1.38 0 0 1.38 0 2.19 Inf Inf p.compute(alpha) #gives p
# a vector of probability p <- c(0,0.2,0,0,0.3,0.4,0.1,0,0) alpha <- alpha.compute(p) #gives alpha= -Inf -1.38 0 0 1.38 0 2.19 Inf Inf p.compute(alpha) #gives p
computes the posterior probability of measurements of a child for each class and each symptom status of the subject given the classes of both of his parents. This is an internal function not meant to be called by the user.
p.post.child(child, c.connect, c.spouse, status, probs, fyc)
p.post.child(child, c.connect, c.spouse, status, probs, fyc)
child |
a child in the pedigree, |
c.connect |
the class of one parent (who is a connector) of the child, |
c.spouse |
the class of the other parent of the child, |
status |
the symptom status vector of the whole pedigree, |
probs |
a list of all probability parameters of the model, |
fyc |
a matrix of |
the function returns p.child
a matrix of 2 times K+1
entries such that p.child[s,k]
is the posterior probability of the measurements Y_child
under status S_child=s
and when he is assigned to class k
and his parents are assigned to classes c.connect
and c.spouse
.
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
#data data(ped.cont) fam <- ped.cont[,1] dad <- ped.cont[fam==1,3] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] #a child child <- which(dad!=0)[1] data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=nrow(y),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #the function p.post.child(child,c.connect=1,c.spouse=3,status,probs,fyc)
#data data(ped.cont) fam <- ped.cont[,1] dad <- ped.cont[fam==1,3] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] #a child child <- which(dad!=0)[1] data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=nrow(y),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #the function p.post.child(child,c.connect=1,c.spouse=3,status,probs,fyc)
computes the posterior probability of measurements of a founder for each class and each symptom status of the founder. This is an internal function not meant to be called by the user.
p.post.found(found, status, probs, fyc)
p.post.found(found, status, probs, fyc)
found |
a founder in the pedigree (individual without parents in the pedigree), |
status |
the symptom status vector of the whole pedigree, |
probs |
a list of all probability parameters of the model, |
fyc |
a matrix of |
the function returns p.found
a matrix of 2 times K+1
entries: p.found[s,k]
is the posterior probability of the observations Y_found
under status S_found=s
and when he is assigned to class k
.
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2001, DOI: 10.1007/s00180-010-0224-2.
#data data(ped.cont) fam <- ped.cont[,1] dad <- ped.cont[fam==1,3] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] #a founder found <- which(dad==0)[1] data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=nrow(y),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #the function p.post.found(found,status,probs,fyc)
#data data(ped.cont) fam <- ped.cont[,1] dad <- ped.cont[fam==1,3] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] #a founder found <- which(dad==0)[1] data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=nrow(y),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #the function p.post.found(found,status,probs,fyc)
means and variance-covariance matrices for each class to be used in examples with continuous measurements.
data(param.cont)
data(param.cont)
ped.param
is a list of 2 elements:
mu
a list of K=3
(the number of latent classes) entries, each represents the means of the measurement multinormal density in the class,
sigma
a list of K=3
entries, each is the
variance-covariance matrix of the measurement multinormal density in the class.
The dimension (the number of multinormal measurements) used in the dataset is 4.
See also init.norm
data(param.cont)
data(param.cont)
list of cumulative logistic coefficients for each measurement and each class to be used in examples for discrete or ordinal models.
data(param.ordi)
data(param.ordi)
ped.param
is a list of 1 element:
alpha
a list of d=4
(the number of measurements) entries, each is a matrix of K=3
(the number of classes) times S[j]
(the number of possible values of measurement j
), a row alpha[[j]][k,]
contains the logistic coefficients of the measurement j
for class k
.
See also init.ordi
data(param.ordi)
data(param.ordi)
data set of 48 pedigrees: a matrix of pedigrees data with continuous observations to be used for examples.
data(ped.cont)
data(ped.cont)
ped
is a matrix of 10 columns:
ped[,1]
family ID,
ped[,2]
subject ID,
ped[,3]
father ID, 0 for founders (i.e. subjects having no parents in the pedigree),
ped[,4]
mother ID, 0 for founders (i.e. subjects having no parents in the pedigree),
ped[,5]
subject sex: 1 male, 2 female,
ped[,6]
symptom status (2: symptomatic, 1: without symptoms, 0: missing),
ped[,7:10]
continuous observations, NA
for
missing and without symptoms,
data(ped.cont)
data(ped.cont)
data set of 48 pedigrees: a matrix of pedigrees data with discrete or ordinal observations to be used for examples.
data(ped.ordi)
data(ped.ordi)
ped
is a matrix of 10 columns:
ped[,1]
family ID,
ped[,2]
subject ID,
ped[,3]
father ID, 0 for founders (i.e. subjects having no parents in the pedigree),
ped[,4]
mother ID, 0 for founders (i.e. subjects having no parents in the pedigree),
ped[,5]
subject sex: 1 male, 2 female,
ped[,6]
symptom status (2: symptomatic, 1: without symptoms, 0: missing),
ped[,7:10]
discrete or ordinal observations,
NA
for missing and without symptoms,
data(ped.ordi)
data(ped.ordi)
peel
is a list of 48 entries, each gives the peeling order of the pedigrees and lists the couples in the 48 pedigrees of ped.cont
and peed.ordi
.
data(peel)
data(peel)
For a pedigree f
in the data ped.cont
or ped.ordi
,
peel[[f]]
is a list of three entries:
generation
the number of generations in the pedigree,
peel.connect
a matrix of generation
rows, each giving the connectors of the generation in the order of peeling,
couple
a matrix of two columns, giving the couples in the pedigree.
See also ped.cont
and ped.ordi
.
data(peel)
data(peel)
a list of probability parameters such as the probability that a founder is assigned to each class, the transition probabilities and the probability that a child is symptomatic.
data(probs)
data(probs)
probs
a list of probability parameters:
For models with familial dependence:
p
a probability vector, each p[c]
is the probability that an symptomatic founder is in class c
for c>=1
,
p0
the probability that a founder without symptoms is in class 0,
p.trans
an array of dimension K
times K+1
times K+1
, where K
is the number of latent classes of
the model, and is such that p.trans[c_i,c_1,c_2]
is the
conditional probability that a symptomatic individual
i
is in class c_i
given that his parents are in classes
c_1
and c_2
,
p0connect
a vector of length K
, where
p0connect[c]
is the probability that a connector without
symptoms is in class 0
,
given that one of his parents is in class c>=1
and the other in class 0,
p.found
the probability that a founder is symptomatic,
p.child
the probability that a child is symptomatic,
For models without familial dependence, all individuals are independent:
p
a probability vector, each p[c]
is the probability that an symptomatic individual is in class c
for c>=1
,
p0
the probability that an individual without symptoms is in class 0,
p.aff
the probability that an individual is symptomatic,
data(probs)
data(probs)
computes the probability of observations below connectors conditionally to their classes given the model parameters. This is an internal function not meant to be called by the user.
upward(id, dad, mom, status, probs, fyc, peel)
upward(id, dad, mom, status, probs, fyc, peel)
id |
individual ID of the pedigree, |
dad |
dad ID, |
mom |
mom ID, |
status |
symptom status: (2: symptomatic, 1: without symptoms, 0: missing), |
probs |
a list of probability parameters of the model, |
fyc |
a matrix of |
peel |
a list of pedigree peeling result containing connectors by peeling order and couples of parents. |
This function computes the probability of observations below connectors conditionally to their classes using the function upward.connect
The function returns a list of 2 elements:
sum.child |
an array of dimension |
p.yF.c |
an array of dimension |
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See also upward.connect
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #the function upward(id,dad,mom,status,probs,fyc,peel)
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #the function upward(id,dad,mom,status,probs,fyc,peel)
computes the probability of the measurements below a connector conditionally to the connector latent class given the model parameters. This is an internal function not meant to be called by the user.
upward.connect(connect, spouse.connect, children.connect, status, probs, p.yF.c, fyc, sum.child)
upward.connect(connect, spouse.connect, children.connect, status, probs, p.yF.c, fyc, sum.child)
connect |
a connector in the pedigree, |
spouse.connect |
spouse of the connector, |
children.connect |
children of the connector, |
status |
a vector of symptom status of the whole pedigree, |
probs |
a list of probability parameters of the model, |
p.yF.c |
an array of dimension |
fyc |
a matrix of |
sum.child |
an array of dimension |
If Y_above(i)
is the observations below connector i
and C_i
is his class, the functions computes P(Y_below(i)|C_i)
.
The function returns a list of 2 elements:
sum.child |
an array of dimension |
p.yF.c |
a array of dimension |
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See also upward
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #a nuclear family #connector in the pedigree 1 connect <- peel$peel.connect[1,1] #soupse of connector connect spouse.connect <- peel$couple[peel$couple[,1]==connect,2] #children of connector connect children.connect <- union(id[dad==connect],id[mom==connect]) #probs and param data(probs) data(param.cont) #probabilitiy of observations above p.yF.c <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #sums over childs sum.child <- array(0,c(length(id),length(probs$p)+1,length(probs$p)+1)) #the function upward.connect(connect,spouse.connect,children.connect,status,probs, p.yF.c,fyc,sum.child)
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #a nuclear family #connector in the pedigree 1 connect <- peel$peel.connect[1,1] #soupse of connector connect spouse.connect <- peel$couple[peel$couple[,1]==connect,2] #children of connector connect children.connect <- union(id[dad==connect],id[mom==connect]) #probs and param data(probs) data(param.cont) #probabilitiy of observations above p.yF.c <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #sums over childs sum.child <- array(0,c(length(id),length(probs$p)+1,length(probs$p)+1)) #the function upward.connect(connect,spouse.connect,children.connect,status,probs, p.yF.c,fyc,sum.child)
computes the triplet and the individual weights of the E step of the EM algorithm for a pedigree in the case of familial dependence. It returns also the overall log-likelihood of the observations. This is an internal function not meant to be called by the user.
weight.famdep(id, dad, mom, status, probs, fyc, peel)
weight.famdep(id, dad, mom, status, probs, fyc, peel)
id |
individual ID of the pedigree, |
dad |
dad ID, |
mom |
mom ID, |
status |
symptom status: (2: symptomatic, 1: without symptoms, 0: missing), |
probs |
list of probability parameters of the model, |
fyc |
a matrix of |
peel |
a list of pedigree peeling containing connectors by peeling order and couples of parents |
the function calls the functions upward
and
downward
which perform the required probability
computations by processing the pedigree by nuclear family (or
equivalently by connector) following the peeling order.
the function returns a list of 3 elements:
ww |
triplet weights: an array of |
w |
individual weights: an array of |
ll |
log-likelihood. |
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See also upward
, downward
, e.step
.
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #the function weight.famdep(id,dad,mom,status,probs,fyc,peel)
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #the function weight.famdep(id,dad,mom,status,probs,fyc,peel)
the weighting algorithm proceeds by nuclear family, the function weight.nuc
computes the unnormalized triplet and individuals weights for a
nuclear family in the pedigree. This is an internal
function not meant to be called by the user.
weight.nuc(connect, spouse.connect, children.connect, status, probs, fyc, p.ybarF.c, ww, w, res.upward)
weight.nuc(connect, spouse.connect, children.connect, status, probs, fyc, p.ybarF.c, ww, w, res.upward)
connect |
a connector in the pedigree, |
spouse.connect |
spouse of the connector, |
children.connect |
children of the connector, |
status |
vector of symptom status of the whole pedigree, |
probs |
all probability parameters of the model, |
fyc |
a matrix of |
p.ybarF.c |
a array of dimension |
ww |
unnormalized triplet weights, an array of |
w |
unnormalized individual weights, an array of |
res.upward |
result of the upward step of the weighting algorithm, see |
updated ww
and w
are computed for the current nuclear family.
the function returns a list of 2 elements:
ww |
unnormalized triplet weights, an array of |
w |
unnormalized individual weights, an array of |
TAYEB et al.: Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 2011, DOI: 10.1007/s00180-010-0224-2.
See also downward
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #the first nuclear family generat <- peel$generation connect <- peel$peel.connect[generat,] connect <- connect[connect>0] spouse.connect <- peel$couple[peel$couple[,1]==connect,2] children.connect <- union(id[dad==connect],id[mom==connect]) #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #triplet and individual weights ww <- array(0,dim=c(length(id),rep(2,3),rep(length(probs$p)+1,3))) w <- array(0,dim=c(length(id),2,length(probs$p)+1)) #probability of the observations below p.ybarF.c <- array(1,dim=c(length(id),2,length(probs$p)+1)) p.ybarF.c[connect,,] <- p.post.found(connect,status,probs,fyc) #the upward step res.upward <- upward(id,dad,mom,status,probs,fyc,peel) #the function weight.nuc(connect,spouse.connect,children.connect,status,probs,fyc, p.ybarF.c,ww,w,res.upward)
#data data(ped.cont) data(peel) fam <- ped.cont[,1] id <- ped.cont[fam==1,2] dad <- ped.cont[fam==1,3] mom <- ped.cont[fam==1,4] status <- ped.cont[fam==1,6] y <- ped.cont[fam==1,7:ncol(ped.cont)] peel <- peel[[1]] #standardize id to be 1, 2, 3, ... id.origin <- id standard <- function(vec) ifelse(vec%in%id.origin,which(id.origin==vec),0) id <- apply(t(id),2,standard) dad <- apply(t(dad),2,standard) mom <- apply(t(mom),2,standard) peel$couple <- cbind(apply(t(peel$couple[,1]),2,standard), apply(t(peel$couple[,2]),2,standard)) for(generat in 1:peel$generation) peel$peel.connect[generat,] <- apply(t(peel$peel.connect[generat,]),2,standard) #the first nuclear family generat <- peel$generation connect <- peel$peel.connect[generat,] connect <- connect[connect>0] spouse.connect <- peel$couple[peel$couple[,1]==connect,2] children.connect <- union(id[dad==connect],id[mom==connect]) #probs and param data(probs) data(param.cont) #densities of the observations fyc <- matrix(1,nrow=length(id),ncol=length(probs$p)+1) fyc[status==2,1:length(probs$p)] <- t(apply(y[status==2,],1,dens.norm, param.cont,NULL)) #triplet and individual weights ww <- array(0,dim=c(length(id),rep(2,3),rep(length(probs$p)+1,3))) w <- array(0,dim=c(length(id),2,length(probs$p)+1)) #probability of the observations below p.ybarF.c <- array(1,dim=c(length(id),2,length(probs$p)+1)) p.ybarF.c[connect,,] <- p.post.found(connect,status,probs,fyc) #the upward step res.upward <- upward(id,dad,mom,status,probs,fyc,peel) #the function weight.nuc(connect,spouse.connect,children.connect,status,probs,fyc, p.ybarF.c,ww,w,res.upward)