Title: | Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data |
---|---|
Description: | Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures, clustering of presence-absence, abundance and multilocus genetic data for species delimitation, nearest neighbor based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview. |
Authors: | Christian Hennig [aut, cre], Bernhard Hausdorf [aut] |
Maintainer: | Christian Hennig <[email protected]> |
License: | GPL |
Version: | 2.3-4 |
Built: | 2024-11-24 06:52:51 UTC |
Source: | CRAN |
Here is a list of the main functions in package prabclus. Most other functions are auxiliary functions for these.
Initialises presence/absence-, abundance- and multilocus data with dominant markers for use with most other key prabclus-functions.
Initialises multilocus data with codominant markers for use with key prabclus-functions.
Generates the input format required by alleleinit.
Computes the tests introduced in Hausdorf and Hennig (2003) and Hennig and Hausdorf (2004; these tests occur in some further publications of ours but this one is the most detailed statistical reference) for presence/absence data. Allows use of the geco-dissimilarity (Hennig and Hausdorf, 2006).
Computes the test introduced in Hausdorf and Hennig (2007) for abundance data.
A classical distance-based test for homogeneity going back to Erdos and Renyi (1960) and Ling (1973).
Species clustering for biotic element analysis (Hausdorf and Hennig, 2007, Hennig and Hausdorf, 2004 and others), clustering of individuals for species delimitation (Hausdorf and Hennig, 2010) based on Gaussian mixture model clustering with noise as implemented in R-package mclust, Fraley and Raftery (1998), on output of multidimensional scaling from distances as computed by prabinit or alleleinit. See also stressvals for help with choosing the number of MDS-dimensions.
An unpublished alternative to prabclust using hierarchical clustering methods.
Visualisation of clusters of genetic markers vs. clusters of species.
Nearest neighbor based classification of observations as noise/outliers according to Byers and Raftery (1998).
Shared allele distance (see the corresponding help pages for references).
Dice distance.
geco coefficient, taking geographical distance into account.
Jaccard distance.
Kulczynski dissimilarity.
Quantitative Kulczynski dissimilarity for abundance data.
Constructs communities from geographical distances between individuals.
chord-, phiPT- and various versions of the shared allele distance between communities.
Jackknife-based test for equality of two independent regressions between distances (Hausdorf and Hennig 2019).
Jackknife-based test for equality of regression involving all distances and regression involving within-group distances only (Hausdorf and Hennig 2019).
Jackknife-based test for equality of regression involving within-group distances of a reference group only and regression involving between-group distances (Hausdorf and Hennig 2019).
Computes geographical distances from geographical coordinates.
Computes a neighborhood list from geographical distances.
A somewhat restricted function for conversion of different file formats used for genetic data with codominant markers.
kykladspecreg, siskiyou, veronica, tetragonula.
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en/
Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.
Erdos, P. and Renyi, A. (1960) On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17-61.
Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Computer Journal 41, 578-588.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
Hausdorf, B. and Hennig, C. (2007) Null model tests of clustering of species, negative co-occurrence patterns and nestedness in meta-communities. Oikos 116, 818-828.
Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, 59, 491-503.
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
Hennig, C. and Hausdorf, B. (2006) A robust distance coefficient between distribution areas incorporating geographic distances. Systematic Biology 55, 170-175.
Ling, R. F. (1973) A probability theory of cluster analysis. Journal of the American Statistical Association 68, 159-164.
Parametric bootstrap test of a null model of i.i.d., but spatially autocorrelated species against clustering of the species' population patterns. Note that most relevant functionality of prabtest (except for the use of the geco distance) is also included in abundtest, so that abundtest can also be used on binary presence-absence data.
Despite the many parameters, a standard execution (for the default test statistics, see parameter teststat below) will be
prabmatrix <- prabinit(file="path/abundmatrixfile", neighborhood="path/neighborhoodfile")
test <- abundtest(prabmatrix)
summary(test)
Note: Data formats are described on the prabinit help page. You may also consider the example datasets kykladspecreg.dat and nb.dat. Take care of the parameter rows.are.species of prabinit.
abundtest(prabobj, teststat = "distratio", tuning = 0.25, times = 1000, p.nb = NULL, prange = c(0, 1), nperp = 4, step = 0.1, step2 = 0.01, twostep = TRUE, species.fixed=TRUE, prab01=NULL, groupvector=NULL, sarestimate=prab.sarestimate(prabobj), dist = prabobj$distance, n.species = prabobj$n.species)
prabobj |
an object of class |
teststat |
string, indicating the test statistics. |
tuning |
integer or (if |
times |
integer. Number of simulation runs. |
p.nb |
numerical between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. If |
prange |
numerical range vector, lower value not smaller than 0, larger
value not larger than 1. Range where |
nperp |
integer. Number of simulations per |
step |
numerical between 0 and 1. Interval length between
subsequent choices of |
step2 |
numerical between 0 and 1. Interval length between
subsequent choices of |
twostep |
logical. If |
species.fixed |
logical. Indicates if the range sizes of the species
are held fixed
in the test simulation ( |
prab01 |
|
groupvector |
integer vector. For every species, a number
indicating the species' group membership. Needed only if
|
sarestimate |
Estimator of the parameters of a simultaneous
autoregression model corresponding to the null model for abundance
data from Hausdorf and Hennig (2007) as generated by
|
dist |
One of |
n.species |
number of species. By default this is taken from
|
For presence-absence data, the routine is described in prabtest. For abundance data, the first step under the null model is to simulate presence-absence patterns as in prabtest. The second step is to fit a simultaneous autoregression (SAR) model (Ripley 1981, section 5.2) to the log-abundances, see prab.sarestimate. The simulation from the null model is implemented in regpop.sar.
For more details see Hennig and Hausdorf (2004) for presence-absence data and Hausdorf and Hennig (2007) for abundance data and the test statistics "mean" and "groups", which can also be applied to binary data.
If p.nb=NA was specified, a diagnostic plot for the estimation of pd is plotted by autoconst. For details see Hennig and Hausdorf (2004) and the help pages of the cited functions.
An object of class prabtest, which is a list with components
results |
vector of test statistic values for all simulated
populations. For |
p.above |
p-value against an alternative that generates large
values of the test statistic (usually reasonable for
|
p.below |
p-value against an alternative that generates small
values of the test statistic (usually reasonable for
|
datac |
test statistic value for the original
data. ( |
tuning |
see above. |
distance |
|
teststat |
see above. |
pd |
|
abund |
|
sarlambda |
Estimator of the autocorrelation
parameter |
sarestimate |
the output object of |
groupinfo |
list containing information from
|
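The components p.above and p.below are Monte Carlo p-values derived from the simulated test statistics. A minimal sketch of how such p-values can be computed from a vector of simulated values is given here. The helper mc_pvalues, the toy numbers and the (1 + count)/(1 + n) convention are assumptions made for illustration; they are not taken from the abundtest source.

```r
# Hypothetical helper: one-sided Monte Carlo p-values from simulated statistics.
# The observed value is counted as one additional "simulation" (assumed convention).
mc_pvalues <- function(observed, simulated) {
  n <- length(simulated)
  list(
    p.above = (1 + sum(simulated >= observed)) / (1 + n),  # large values are extreme
    p.below = (1 + sum(simulated <= observed)) / (1 + n)   # small values are extreme
  )
}

# Toy values standing in for the test statistics of simulated populations.
simulated <- c(0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 1.4, 1.05)
observed <- 0.75
pv <- mc_pvalues(observed, simulated)
print(pv)
```

Which of the two p-values is relevant depends on whether the alternative generates large or small values of the chosen test statistic.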
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2007) Null model tests of clustering of species, negative co-occurrence patterns and nestedness in meta-communities. Oikos 116, 818-828.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
Ripley, B. D. (1981) Spatial Statistics. Wiley.
prabinit generates objects of class prab.
autoconst estimates pd from such objects.
prabtest (analogous function for presence-absence data).
regpop.sar generates populations from the null model.
prab.sarestimate (parameter estimators for simultaneous autoregression model). This calls errorsarlm (original estimation function from package spdep).
Some more information on the test statistics is given in homogen.test, lcomponent, distratio, nn, incmatrix.
Summary and print methods: summary.prabtest.
# Note: NOT RUN.
# This needs package spdep and a bunch of packages that are
# called by spdep!
# data(siskiyou)
# set.seed(1234)
# x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
#               distance="logkulczynski")
# a1 <- abundtest(x, times=5, p.nb=0.0465)
# a2 <- abundtest(x, times=5, p.nb=0.0465, teststat="groups",
#                 groupvector=siskiyou.groups)
# These settings are chosen to make the example execution
# faster; usually you will use abundtest(x).
# summary(a1)
# summary(a2)
Converts an alleleobject with codominant markers into a binary matrix with a column for each marker.
allele2zeroone(alleleobject)
alleleobject |
object of class |
A 0-1-matrix with individuals as rows and markers (alleles) as columns.
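The kind of conversion described above can be sketched in base R without the package. The function to_zeroone, the toy genotype matrix and the treatment of missing loci (all zeros) are assumptions made for illustration; allele2zeroone itself may order columns or handle missing values differently.

```r
# Hypothetical toy data: 3 individuals, 2 loci, one character per allele,
# "-" marking a missing allele.
geno <- matrix(c("AB", "AA", "BC",
                 "CC", "AC", "--"),
               nrow = 3,
               dimnames = list(paste0("I", 1:3), c("L1", "L2")))

# One 0-1 column per (locus, allele) combination:
# 1 if the individual carries that allele at that locus.
to_zeroone <- function(geno, nachar = "-") {
  cols <- list()
  for (locus in colnames(geno)) {
    alleles <- strsplit(geno[, locus], "")
    levs <- sort(setdiff(unique(unlist(alleles)), nachar))
    for (a in levs) {
      cols[[paste(locus, a, sep = ".")]] <-
        as.integer(vapply(alleles, function(p) a %in% p, logical(1)))
    }
  }
  out <- do.call(cbind, cols)
  rownames(out) <- rownames(geno)
  out
}

z <- to_zeroone(geno)
print(z)
```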
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
data(tetragonula)
ta <- alleleconvert(strmatrix=tetragonula[21:50,])
tai <- alleleinit(allelematrix=ta)
allele2zeroone(tai)
Codominant marker data (which here means: data with several diploid loci; two alleles per locus) can be represented in various ways. This function converts the formats "genepop" and "structure" into "structurama" and "prabclus". "genepop" is a version of the format used by the package GENEPOP (Rousset, 2008), "structure" is a version of what is used by STRUCTURE (Pritchard et al., 2000), another one is "structureb". "structurama" is a version of what is used by STRUCTURAMA (Huelsenbeck and Andolfatto, 2007) and "prabclus" is required by the function alleleinit in the present package.
alleleconvert(file=NULL,strmatrix=NULL, format.in="genepop", format.out="prabclus", alength=3,orig.nachar="000",new.nachar="-", rows.are.individuals=TRUE, firstcolname=FALSE, aletters=intToUtf8(c(65:90,97:122),multiple=TRUE), outfile=NULL,skip=0)
file |
string. Filename of input file, see details. One of
|
strmatrix |
matrix or data frame of strings, see details. One of
|
format.in |
string. One of |
format.out |
string. One of |
alength |
integer. If |
orig.nachar |
string. Code for missing values in input data. |
new.nachar |
string. Code for missing values in output data. |
rows.are.individuals |
logical. If |
firstcolname |
logical. If |
aletters |
character vector. String of default characters for
alleles if |
outfile |
string. If specified, the output matrix (omitting
quotes) is written to a file of this name (including row names if
|
skip |
number of rows to be skipped when reading data from a
file ( |
The formats are as follows (described is the format within R, i.e., for the input, the format of strmatrix; if file is specified, the file is read with read.table(file,colClasses="character") and should give the format explained below; note that colClasses="character" implies that quotes are not needed in the input file):
"genepop": Alleles are coded by strings of length alength and there is no space between the two alleles in a locus, so a value of "258260" means that in the corresponding locus the two alleles have codes 258 and 260.
"structure": Alleles are coded by strings of arbitrary length. Two rows correspond to each individual, the first row containing the first alleles in all loci and the second row containing the second ones.
"structureb": Alleles are coded by strings of arbitrary length. One row corresponds to each individual, containing first and second alleles in all loci (first and second allele of first locus, first and second allele of second locus etc.). This starts in the third row (the first two have locus names and other information).
"structurama": Alleles are coded by strings of arbitrary length. The two alleles in each locus are written with brackets around them and a comma in between, so "258260" in "genepop" corresponds to "(258,260)" in "structurama".
"prabclus": Alleles are coded by a single character and there is no space between the two alleles in a locus (e.g., "AC").
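The correspondence between the "genepop" and "structurama" codings can be illustrated with a few lines of base R. The function genepop_to_structurama is a hypothetical helper written for this illustration only; it mimics the roles of alength, orig.nachar and new.nachar but is not the implementation used by alleleconvert, and the way it writes missing values is an assumption.

```r
# Hypothetical helper: split fixed-width allele pairs and rewrite them
# with brackets and a comma.
genepop_to_structurama <- function(x, alength = 3,
                                   orig.nachar = "000", new.nachar = "-") {
  a1 <- substr(x, 1, alength)                 # first allele code
  a2 <- substr(x, alength + 1, 2 * alength)   # second allele code
  a1[a1 == orig.nachar] <- new.nachar         # recode missing values
  a2[a2 == orig.nachar] <- new.nachar
  paste0("(", a1, ",", a2, ")")
}

converted <- genepop_to_structurama(c("258260", "000000"))
print(converted)
```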
A matrix of strings in the format specified as format.out with an attribute "alevels", a vector of all used allele codes if format.out=="prabclus", otherwise vector of allele codes of last locus.
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Huelsenbeck, J. P., and P. Andolfatto (2007) Inference of population structure under a Dirichlet process model. Genetics 175, 1787-1802.
Pritchard, J. K., M. Stephens, and P. Donnelly (2000) Inference of population structure using multi-locus genotype data. Genetics 155, 945-959.
Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Molecular Ecology Resources 8, 103-106.
data(tetragonula)
# This uses example data file Heterotrigona_indoFO.dat
str(alleleconvert(strmatrix=tetragonula))
strucmatrix <- cbind(c("I1","I1","I2","I2","I3","I3"),
  c("122","144","122","122","144","144"),c("0","0","21","33","35","44"))
alleleconvert(strmatrix=strucmatrix,format.in="structure",
  format.out="prabclus",orig.nachar="0",firstcolname=TRUE)
alleleconvert(strmatrix=strucmatrix,format.in="structure",
  format.out="structurama",orig.nachar="0",new.nachar="-9",firstcolname=TRUE)
Shared allele distance for codominant markers (Bowcock et al., 1994). One minus proportion of alleles shared by two individuals averaged over loci (loci with missing values for at least one individual are ignored).
alleledist(allelelist,ni,np,count=FALSE)
allelelist |
a list of lists. In the "outer" list, there are
|
ni |
integer. Number of individuals. |
np |
integer. Number of loci. |
count |
logical. If |
A symmetrical matrix of shared allele distances between individuals.
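For two individuals, the definition (one minus the proportion of shared alleles, averaged over loci without missing values) can be sketched in base R as follows. The helpers shared_alleles and shared_allele_dist and the representation of an individual as a list of allele pairs are assumptions for illustration; alleledist uses its own list-of-lists input. Counting shared alleles as a multiset intersection is also an assumption.

```r
# Number of alleles shared by two allele pairs (0, 1 or 2), NA if any is missing.
shared_alleles <- function(p1, p2) {
  if (anyNA(c(p1, p2))) return(NA_integer_)
  alleles <- union(p1, p2)
  sum(vapply(alleles, function(a) min(sum(p1 == a), sum(p2 == a)), integer(1)))
}

# One minus the proportion of shared alleles, averaged over usable loci.
shared_allele_dist <- function(ind1, ind2) {
  s <- mapply(shared_alleles, ind1, ind2)
  1 - mean(s / 2, na.rm = TRUE)
}

# Toy individuals with three loci; the third locus is missing for ind1.
ind1 <- list(c("A", "B"), c("C", "C"), c(NA, NA))
ind2 <- list(c("A", "C"), c("C", "C"), c("A", "D"))
d <- shared_allele_dist(ind1, ind2)
print(d)
```

Here the first locus shares one allele, the second shares two, and the third is ignored, which gives 1 - mean(0.5, 1) = 0.25.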
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Bowcock, A. M., Ruiz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J. R., Cavalli-Sforza, L. L. (1994) High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.
alleleinit, unbuild.charmatrix
data(tetragonula)
tnb <- coord2dist(coordmatrix=tetragonula.coord[1:50,],cut=50,file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist,distance="none")
str(alleledist((unbuild.charmatrix(tai$charmatrix,50,13)),50,13))
alleleinit converts genetic data with diploid loci as generated by alleleconvert into an object of class alleleobject. print.alleleobject is a print method for such objects.
alleleinit(file = NULL, allelematrix=NULL, rows.are.individuals = TRUE, neighborhood = "none", distance = "alleledist", namode="variables", nachar="-", distcount=FALSE)
## S3 method for class 'alleleobject'
print(x, ...)
file |
string. File name. File must be in |
allelematrix |
matrix in |
rows.are.individuals |
logical. If |
neighborhood |
A string or a list with a component for
every individual. The
components are vectors of integers indicating
neighboring individuals. An individual without neighbors
should be assigned a vector |
distance |
|
namode |
one of |
nachar |
character denoting missing values. |
distcount |
logical. If |
x |
object of class |
... |
necessary for print method. |
The required input format is the output format "prabclus" of alleleconvert. Alleles are coded by a single character, so diploid loci need to be pairs of characters without space between the two alleles (e.g., "AC"). The input needs to be an individuals*loci matrix or data frame (or a file that produces such a data frame by read.table(file,stringsAsFactors=FALSE)).
alleleinit produces an object of class alleleobject (note that this is similar to class prab; for example both can be used with prabclust), which is a list with components
distmat |
distance matrix between individuals. |
amatrix |
data frame of input data with string variables in the input format, see details. Note that in the output for an individual the whole locus is declared missing if at least one of its alleles is missing in the input. |
charmatrix |
matrix of characters in which there are two rows for
every individual corresponding to the two alleles in every locus
(column). Entries are allele codes but missing values are coded as
|
nb |
neighborhood list, see above. |
ext.nblist |
a neighborhood list in which for every row in
|
n.variables |
number of loci. |
n.individuals |
number of individuals. |
n.levels |
maximum number of different alleles in a locus. |
n.species |
identical to |
alevels |
character vector with all used allele codes not including missing values. |
leveldist |
matrix in which rows are loci, columns are alleles and entries are frequencies of alleles per locus. |
prab |
useless matrix of number of factor levels corresponding to
|
regperspec |
vector of row-wise sums of |
specperreg |
vector of column-wise sums of |
distance |
string denoting the chosen distance measure, see above. |
namode |
see above. |
naprob |
probability of missing values, numeric or vector, see
documentation of argument |
nasum |
number of missing entries (individual/loci) in
|
nachar |
see above. |
spatial |
logical. |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
alleleconvert, alleledist, prabinit.
# Only 50 observations are used in order to have a fast example.
data(tetragonula)
tnb <- coord2dist(coordmatrix=tetragonula.coord[1:50,],cut=50,file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist)
print(tai)
Used for computation of the genetic distances alleledist.
allelepaircomp(allelepair1,allelepair2,method="sum")
allelepair1 |
vector of two allele codes (usually characters), or
|
allelepair2 |
vector of two allele codes (usually characters), or
|
method |
one of |
If method=="sum", number of shared alleles (0, 1 or 2), or NA. If method=="geometrical", 0, 0.5, sqrt(0.5) (in case that one of the allelepairs is double such as in c("A","B"),c("A","A")) or 1, or NA.
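The values listed for method=="geometrical" (0, 0.5, sqrt(0.5), 1) coincide with the cosine similarity between the allele count vectors of the two pairs. The sketch below computes that quantity in base R. The function pair_cosine is hypothetical, and it is an assumption that this formula matches what allelepaircomp computes internally; it is only checked here against the values named above.

```r
# Hypothetical helper: cosine similarity of allele counts of two allele pairs.
pair_cosine <- function(p1, p2) {
  if (anyNA(c(p1, p2))) return(NA_real_)
  alleles <- union(p1, p2)
  c1 <- vapply(alleles, function(a) sum(p1 == a), integer(1))  # counts in pair 1
  c2 <- vapply(alleles, function(a) sum(p2 == a), integer(1))  # counts in pair 2
  sum(c1 * c2) / sqrt(sum(c1^2) * sum(c2^2))
}

g_same   <- pair_cosine(c("A", "B"), c("A", "B"))
g_double <- pair_cosine(c("A", "B"), c("A", "A"))
g_one    <- pair_cosine(c("A", "B"), c("A", "C"))
g_none   <- pair_cosine(c("A", "B"), c("C", "D"))
print(c(g_same, g_double, g_one, g_none))
```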
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
allelepaircomp(c("A","B"),c("A","C"))
Monte Carlo estimation of the disjunction/spatial autocorrelation parameter pd for the simulation model used in randpop.nb, used for tests for clustering of presence-absence data. autoconst is the main function; autoreg performs the simulation and is executed within autoconst.
autoconst(x, prange = c(0, 1), twostep = TRUE, step1 = 0.1, step2 = 0.01, plot = TRUE, nperp = 4, ejprob = NULL, species.fixed = TRUE, pdfnb=FALSE, ignore.richness=FALSE)
autoreg(x, probs, ejprob, plot = TRUE, nperp = 4, species.fixed = TRUE, pdfnb=FALSE, ignore.richness=FALSE)
x |
object of class |
prange |
numerical range vector, lower value not smaller than 0, larger value not larger than 1. Range where the parameter is to be found. |
twostep |
logical. If |
step1 |
numerical between 0 and 1. Interval length between
subsequent choices of |
step2 |
numerical between 0 and 1. Interval length between
subsequent choices of |
plot |
logical. If |
nperp |
integer. Number of simulations per |
ejprob |
numerical between 0 and 1. Observed disjunction
probability for data |
species.fixed |
logical. If |
probs |
vector of numericals between 0 and 1. |
pdfnb |
logical. If |
ignore.richness |
logical. If |
The spatial autocorrelation parameter pd of the model for the generation of presence-absence data sets used by randpop.nb can be estimated by use of the observed disjunction probability ejprob, which is the sum of the numbers of connectivity components of all species minus the number of species, divided by the number of "presence" entries minus the number of species. This is done by a simulation of artificial data sets with characteristics of x and different pd-values, governed by prange, step1, step2 and nperp. ejprob is then calculated for all simulated populations. A linear regression of ejprob on pd is performed and the estimator of pd is determined by computing the inverse of the regression function for the ejprob-value of x.
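The disjunction probability defined above can be computed directly from a 0-1 matrix (regions as rows, species as columns) and a neighborhood list. The following base R sketch uses the hypothetical helpers n_components and disjunction_prob and toy data invented for illustration; it does not call autoconst or con.comp.

```r
# Number of connectivity components of the regions occupied by one species
# (breadth-first search over the neighborhood list).
n_components <- function(present, neighbors) {
  unvisited <- which(present)
  ncomp <- 0
  while (length(unvisited) > 0) {
    ncomp <- ncomp + 1
    queue <- unvisited[1]
    unvisited <- unvisited[-1]
    while (length(queue) > 0) {
      nb <- intersect(neighbors[[queue[1]]], unvisited)
      queue <- c(queue[-1], nb)
      unvisited <- setdiff(unvisited, nb)
    }
  }
  ncomp
}

# (sum of components - number of species) / (presences - number of species)
disjunction_prob <- function(prabmatrix, neighbors) {
  comps <- sum(apply(prabmatrix == 1, 2, n_components, neighbors = neighbors))
  n_species <- ncol(prabmatrix)
  (comps - n_species) / (sum(prabmatrix) - n_species)
}

# Toy data: 4 regions on a line (1-2-3-4) and 2 species.
nb_toy <- list(2L, c(1L, 3L), c(2L, 4L), 3L)
pm <- cbind(sp1 = c(1, 1, 0, 1), sp2 = c(0, 1, 1, 0))
ej <- disjunction_prob(pm, nb_toy)
print(ej)
```

In the toy data, sp1 occupies two components and sp2 one, with five presences in total, so the result is (3 - 2) / (5 - 2) = 1/3.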
autoconst produces the same list as autoreg with additional component ejprob. The components are
pd |
(eventually) estimated parameter |
coef |
(eventually) estimated regression coefficients. |
ejprob |
see above. |
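The relation between the components pd and coef can be illustrated by the inverse regression step alone. In the sketch below, the grid of pd-values and the simulated disjunction probabilities are made-up toy numbers lying exactly on a line, and the variable names are hypothetical; autoconst obtains such values by simulation.

```r
# Toy values: pd-values tried and the disjunction probabilities "simulated" for them.
pd_grid <- c(0.1, 0.3, 0.5, 0.7)
ej_sim  <- c(0.20, 0.40, 0.60, 0.80)

# Linear regression of ejprob on pd.
fit <- lm(ej_sim ~ pd_grid)

# Invert the fitted line at the observed disjunction probability.
ej_obs <- 0.35
pd_hat <- unname((ej_obs - coef(fit)[1]) / coef(fit)[2])
print(pd_hat)
```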
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. To appear in Systematic Biology.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
randpop.nb, prabinit, con.comp
options(digits=4)
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
ax <- autoconst(x,nperp=2,step1=0.3,twostep=FALSE)
For use in alleleinit. Creates a matrix of characters in which there are two rows for every individual corresponding to the two alleles in every locus (column) out of a list of lists, such as required by alleledist.
build.charmatrix(allelelist,n.individuals,n.variables)
allelelist |
A list of lists. In the "outer" list, there are
|
n.individuals |
integer. Number of individuals. |
n.variables |
integer. Number of loci. |
A matrix of characters in which there are two rows for every individual corresponding to the two alleles in every locus (column).
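The layout of the returned matrix can be sketched in base R. The function to_charmatrix is a hypothetical stand-in, and it assumes that the outer list runs over loci and the inner lists over individuals, which matches the dimensions used in the package example. Missing alleles are left as NA here, which may differ from the coding used by build.charmatrix.

```r
# Toy list of lists: 2 loci (outer), 3 individuals (inner), two alleles each.
alist <- list(list(c("A", "A"), c("B", "A"), c(NA, NA)),
              list(c("A", "C"), c("B", "B"), c("A", "D")))

# Two rows per individual (first and second allele), one column per locus.
to_charmatrix <- function(allelelist, n.individuals, n.variables) {
  out <- matrix(NA_character_, nrow = 2 * n.individuals, ncol = n.variables)
  for (j in seq_len(n.variables)) {
    for (i in seq_len(n.individuals)) {
      out[c(2 * i - 1, 2 * i), j] <- allelelist[[j]][[i]]
    }
  }
  out
}

cm <- to_charmatrix(alist, 3, 2)
print(cm)
```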
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
alleleinit, unbuild.charmatrix
alist <- list()
alist[[1]] <- list(c("A","A"),c("B","A"),c(NA,NA))
alist[[2]] <- list(c("A","C"),c("B","B"),c("A","D"))
build.charmatrix(alist,3,2)
This is for use in alleleinit. Given a neighborhood list of individuals, a new neighborhood list is generated in which there are two entries for each individual (entry 1 and 2 refer to individual one, 3 and 4 to individual 2 and so on). Neighborhoods are preserved and additionally the two entries belonging to the same individual are marked as neighbors.
build.ext.nblist(neighbors,n.individuals=length(neighbors))
neighbors |
list of integer vectors, where each vector contains the neighbors of an individual. |
n.individuals |
integer. Number of individuals. |
list with 2*n.individuals vectors of integers as described above.
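The construction described above can be sketched in base R. The function extend_nblist and the toy neighborhood list are hypothetical; the ordering of entries within each vector (sorted here) is an assumption and may differ from the output of build.ext.nblist.

```r
# Individual i corresponds to entries 2*i - 1 and 2*i of the extended list.
extend_nblist <- function(neighbors) {
  ext <- vector("list", 2 * length(neighbors))
  for (i in seq_along(neighbors)) {
    # Both entries of every neighboring individual.
    nb_entries <- as.vector(rbind(2 * neighbors[[i]] - 1, 2 * neighbors[[i]]))
    # Each entry is additionally a neighbor of its twin entry.
    ext[[2 * i - 1]] <- sort(c(nb_entries, 2 * i))
    ext[[2 * i]] <- sort(c(nb_entries, 2 * i - 1))
  }
  ext
}

# Toy list: individuals 1 and 2 are neighbors, individual 3 has none.
nb_toy <- list(2L, 1L, integer(0))
ext <- extend_nblist(nb_toy)
print(ext)
```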
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
data(veronica)
vnb <- coord2dist(coordmatrix=veronica.coord[1:20,], cut=20, file.format="decimal2",neighbors=TRUE)
build.ext.nblist(vnb$nblist)
This generates a listw-object as needed for estimation of a simultaneous autoregression model in package spdep from a neighborhood list of the type generated in prabinit.
build.nblist(prabobj,prab01=NULL,style="C")
prabobj |
object of class |
prab01 |
presence-absence matrix of the same dimensions as the
abundance matrix of |
style |
can take values "W", "B", "C", "U", and "S" though tests
suggest that "C" should be chosen. See |
A 'listw' object with the following members:
style |
see above. |
neighbours |
the neighbours list in |
weights |
the weights for the neighbours and chosen style, with attributes set to report the type of relationships (binary or general, if general the form of the glist argument), and style as above. |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
nb2listw (which is called)
# Not run; requires package spdep
# data(siskiyou)
# x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
#               distance="logkulczynski")
# build.nblist(x)
Generates a simulated matrix where the rows are interpreted as regions
and the columns as species, 1 means that a species is present in the
region and 0 means that the species is absent. Species are generated
in order to produce 2 clusters of species with similar ranges.
Spatial autocorrelation of a species' presences is governed by
the parameter p.nb
and a list of neighbors for each region.
cluspop.nb(neighbors, p.nb = 0.5, n.species, clus.specs, reg.group, grouppf = 10, n.regions = length(neighbors), vector.species = rep(1, n.species), pdf.regions = rep(1/n.regions, n.regions), count = TRUE, pdfnb = FALSE)
neighbors |
A list with a component for every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a list |
p.nb |
numerical between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. Note that for a given
presence-absence matrix, this parameter can be estimated by
|
n.species |
integer. Number of species. |
clus.specs |
integer not larger than |
reg.group |
vector of pairwise distinct integers not larger than
|
grouppf |
numerical. The probability of the region of
a clustered species to belong to the corresponding group of regions
is up-weighted by factor |
n.regions |
integer. Number of regions. |
vector.species |
vector of integers. The sizes
(i.e., numbers of regions)
of the species are generated randomly from
the empirical distribution of |
pdf.regions |
numerical vector of length |
count |
logical. If |
pdfnb |
logical. If |
The non-clustered species are generated as explained on the help page for randpop.nb. The general principle for the clustered species is the same, but with modified probabilities for the regions. For each clustered species, one of the two groups of regions is drawn, distributed according to the sum of its regions' probability given by pdf.regions. The first region of such a species is only drawn from the regions of this group.
A 0-1-matrix, rows are regions, columns are species.
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
autoconst estimates p.nb from matrices of class prab. These are generated by prabinit.
data(nb)
set.seed(888)
cluspop.nb(nb, p.nb=0.1, n.species=10, clus.specs=9, reg.group=1:17, vector.species=c(10))
Constructs communities from individuals using geographical distance and hierarchical clustering. Communities are clusters of geographically close individuals, formed by hclust with a specified distance cutoff.
communities(geodist,grouping=NULL, cutoff=1e-5,method="single")
communities(geodist,grouping=NULL, cutoff=1e-5,method="single")
geodist |
|
grouping |
something that can be coerced into a factor. Different
groups indicated by |
cutoff |
numeric; clustering distance cutoff value, passed on as
parameter |
method |
|
Vector of community memberships for the individuals (integer numbers from 1 to the number of communities without interruption).
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[1:90,],file.format="decimal2")
species <-c(rep(1,64),rep(2,17),rep(3,9))
communities(ver.geo,species)
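The clustering step behind communities can be sketched in base R. This is only an illustration under the assumption that communities correspond to the groups obtained by cutting an hclust tree at height cutoff; the coordinates and the helper name communities_sketch are hypothetical, and the grouping argument is ignored here.

```r
# Hypothetical planar coordinates of six individuals (three tight pairs).
xy <- rbind(c(0, 0), c(0, 0.001), c(5, 5), c(5, 5.001), c(9, 1), c(9, 1.001))
geodist <- as.matrix(dist(xy))

# Sketch only: cut a hierarchical clustering tree at height 'cutoff'.
communities_sketch <- function(geodist, cutoff = 1e-5, method = "single") {
  cutree(hclust(as.dist(geodist), method = method), h = cutoff)
}

com <- communities_sketch(geodist, cutoff = 0.01)
com
```

Individuals that lie within the cutoff of each other end up with the same community number, and the numbers run from 1 upwards without gaps.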
Constructs distances between communities: chord- (Cavalli-Sforza and Edwards, 1967), phiPT/phiST (Peakall and Smouse, 2012, Meirmans, 2006), three versions of the shared allele distance between communities, and geographical distance between communities.
communitydist(alleleobj,comvector="auto",distance="chord", compute.geodist=TRUE,out.dist=FALSE, grouping=NULL,geodist=NA,diploid=TRUE, phiptna=NA,...)
alleleobj |
if |
comvector |
either a vector of integers indicating to which
community an individual belongs (these need to be numbered from 1 to
a maximum number without interruption), or |
distance |
one of |
compute.geodist |
logical, indicating whether geographical distances between communities should be generated. |
out.dist |
logical, indicating whether |
grouping |
something that can be coerced into a factor, for
passing on to |
geodist |
matrix or |
diploid |
logical, indicating whether loci are diploid, see
|
phiptna |
if |
... |
optional arguments to be passed on to |
All genetic distances between communities are based on the information given in alleleobj; either on the alleles directly or on a genetic distance (distmat-component, see alleleinit).
The possible genetic distance measures between communities are as follows:
"chord": chord-distance (Cavalli-Sforza and Edwards, 1967)
"phipt": phiPT-distance implemented according to Peakall and Smouse, 2012. This also appears in the literature under the name phiST (Meirmans, 2006, although the definition there is incomplete and we are not sure whether this is identical).
"shared.average": average of between-community genetic distances.
"shared.chakraborty": between-community shared allele distance according to Chakraborty and Jin (1993).
"shared.problist": this implements the shared allele distance (Bowcock et al., 1994) for individuals directly for communities (one minus proportion of alleles shared by two communities averaged over loci).
list with components
comvector |
integer vector of length of the number of individuals, indicating their community membership. |
dist |
genetic distances between communities. Parameter |
cgeodist |
if |
comgroup |
vector of length of the number of communities. If
|
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Bowcock, A. M., Ruiz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J. R., Cavalli-Sforza, L. L. (1994) High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.
Cavalli-Sforza, L. L. and Edwards, A. W. F. (1967) Phylogenetic Analysis - Models and Estimation Procedures. The American Journal of Human Genetics 19, 233-257.
Chakraborty, R. and Jin, L. (1993) Determination of relatedness between individuals using DNA fingerprinting. Human Biology 65, 875-895.
Meirmans, P. G. (2006) Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution 60, 2399-2402.
Peakall, R. and Smouse P.E. (2012) GenAlEx Tutorial 2. https://biology-assets.anu.edu.au/GenAlEx/Tutorials.html
communities; refer to phipt for computation of distances between specific pairs of communities. diploidcomlist produces relative frequencies for all alleles of all loci in all communities (on which the chord- and the "shared.problist"-distances are based).
options(digits=4)
data(tetragonula)
tnb <- coord2dist(coordmatrix=tetragonula.coord[83:120,],cut=50,
  file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[83:120,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist)
tetraspec <- c(rep(1,11),rep(2,13),rep(3,14))
tetracoms <- c(rep(1:3,each=3),4,5,rep(6:11,each=2),12,rep(13:19,each=2))
c1 <- communitydist(tai,comvector=tetracoms,distance="chord",
  geodist=tnb$distmatrix,grouping=tetraspec)
c2 <- communitydist(tai,comvector=tetracoms,distance="phipt",
  geodist=tnb$distmatrix,grouping=tetraspec,compute.geodist=FALSE)
c3 <- communitydist(tai,comvector=tetracoms,distance="shared.average",
  geodist=tnb$distmatrix,grouping=tetraspec,compute.geodist=FALSE)
c4 <- communitydist(tai,comvector=tetracoms,distance="shared.chakraborty",
  geodist=tnb$distmatrix,grouping=tetraspec,compute.geodist=FALSE)
c5 <- communitydist(tai,comvector=tetracoms,distance="shared.problist",
  geodist=tnb$distmatrix,grouping=tetraspec,compute.geodist=FALSE)
round(c1$cgeodist,digits=1)
c1$comvector
c2$comvector
c3$comvector
c4$comvector
c5$comvector
round(c1$dist,digits=2)
round(c2$dist,digits=2)
round(c3$dist,digits=2)
round(c4$dist,digits=2)
round(c5$dist,digits=2)
Tests for independence between a clustering and another grouping of species. This is simply an interface to chisq.test.
comp.test(cl,spg)
cl |
a vector of integers. Clustering of species (may be taken
from |
spg |
a vector of integers of the same length, groups of species. |
chisq.test with simulated p-value is used.
Output of chisq.test.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. Systematic Biology 52, 717-723.
set.seed(1234)
g1 <- c(rep(1,34),rep(2,12),rep(3,15))
g2 <- sample(3,61,replace=TRUE)
comp.test(g1,g2)
Computes the connectivity components of an undirected graph from a matrix giving the edges.
con.comp(comat)
comat |
a symmetric logical or 0-1 matrix, where |
The "depth-first search" algorithm of Cormen, Leiserson and Rivest (1990, p. 477) is used.
An integer vector, giving the number of the connectivity component for each vertex.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Cormen, T. H., Leiserson, C. E. and Rivest, R. L. (1990), Introduction to Algorithms, Cambridge: MIT Press.
hclust, cutree for cut single linkage trees (often equivalent).
set.seed(1000)
x <- rnorm(20)
m <- matrix(0,nrow=20,ncol=20)
for(i in 1:20)
  for(j in 1:20)
    m[i,j] <- abs(x[i]-x[j])
d <- m<0.2
cc <- con.comp(d)
max(cc) # number of connectivity components
plot(x,cc)
# The same should be produced by
# cutree(hclust(as.dist(m),method="single"),h=0.2).
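For readers who want to see the idea of the depth-first search, the following base R sketch labels connectivity components with an explicit stack. It is not the code of con.comp; the helper name con_comp_sketch and the small data vector are hypothetical.

```r
# Sketch: connectivity components by iterative depth-first search.
# 'comat' is a symmetric logical (or 0-1) adjacency matrix.
con_comp_sketch <- function(comat) {
  n <- nrow(comat)
  comp <- integer(n)                 # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    current <- current + 1L          # open a new component
    stack <- start
    while (length(stack) > 0) {
      v <- stack[length(stack)]      # pop the last vertex
      stack <- stack[-length(stack)]
      if (comp[v] != 0L) next
      comp[v] <- current
      # push all unvisited vertices adjacent to v
      stack <- c(stack, which(comat[v, ] & comp == 0L))
    }
  }
  comp
}

x <- c(0, 0.1, 0.25, 1, 1.1, 3)
d <- abs(outer(x, x, "-")) < 0.2
cc <- con_comp_sketch(d)
cc
```

Components are numbered in the order in which their first vertex is met, so max(cc) is the number of components.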
Returns a vector of the numbers of connected regions per species for a presence-absence matrix.
con.regmat(regmat, neighbors, count = FALSE)
regmat |
0-1-matrix. Columns are species, rows are regions. |
neighbors |
A list with a component for every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a list |
count |
logical. If |
Uses con.comp.
Vector of numbers of connected regions per species.
Designed for use in prabtest.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
data(nb)
set.seed(888)
cp <- cluspop.nb(nb, p.nb=0.1, n.species=10, clus.specs=9, reg.group=1:17,vector.species=c(10))
con.regmat(cp,nb)
Computes geographical distances from geographical coordinates.
coord2dist(file=NULL, coordmatrix=NULL, cut=NULL, file.format="degminsec", output.dist=FALSE, radius=6378.137, fp=1/298.257223563, neighbors=FALSE)
file |
string. A filename for the coordinate file. The file
should have 2, 4 or 6 numeric columns and one row for each location.
See |
coordmatrix |
something that can be coerced into a matrix with
2, 4 or 6 columns. Matrix of coordinates, one row for each
location. See |
cut |
numeric. Only active if |
file.format |
one of
|
output.dist |
logical. If |
radius |
numeric. Radius of the earth in km used in computation (the default is the equatorial radius, but this is not the only possible choice). |
fp |
flattening of the earth; the default is from WGS-84. |
neighbors |
logical. If |
If neighbors==TRUE, a list with components
distmatrix |
distance matrix between locations. See
|
nblist |
list with a vector for every location containing the
numbers of its neighbors, see |
If neighbors==FALSE, only the distance matrix.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
German Wikipedia from 29 August 2010: https://de.wikipedia.org/wiki/Orthodrome
options(digits=4)
data(veronica)
coord2dist(coordmatrix=veronica.coord[1:20,], cut=20, file.format="decimal2",neighbors=TRUE)
Produces a matrix with clusters as rows and regions as columns, indicating how many species present in a region belong to the clusters.
crmatrix(x,xc,percentages=FALSE)
x |
object of class |
xc |
object of class |
percentages |
logical. If |
A clusters times regions matrix as explained above.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
options(digits=3)
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
xc <- prabclust(x)
crmatrix(x,xc)
crmatrix(x,xc, percentages=TRUE)
Computes a distance derived from Dice's coincidence index between the columns of a 0-1-matrix.
dicedist(regmat)
regmat |
0-1-matrix. Columns are species, rows are regions. |
The Dice distance between two species is 1 minus the Coincidence Index, which is (2*number of regions where both species are present)/(2*number of regions where both species are present plus number of regions where exactly one of the two species is present). This is S23 in Shi (1993).
A symmetrical matrix of Dice distances.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Shi, G. R. (1993) Multivariate data analysis in palaeoecology and palaeobiogeography - a review. Palaeogeography, Palaeoclimatology, Palaeoecology 105, 199-234.
options(digits=4)
data(kykladspecreg)
dicedist(t(kykladspecreg))
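As a cross-check of the definition, here is a minimal base R sketch of a Dice-type distance. It assumes the standard form of Dice's coincidence index, 2a/(n1+n2), where a is the number of regions shared by two species and n1, n2 are their range sizes; the toy matrix and the name dice_sketch are hypothetical, and agreement with dicedist in every detail is not guaranteed.

```r
# Hypothetical regions x species 0-1 matrix (rows: regions, columns: species).
regmat <- cbind(sp1 = c(1, 1, 1, 0, 0),
                sp2 = c(1, 1, 0, 1, 0),
                sp3 = c(0, 0, 0, 1, 1))

dice_sketch <- function(regmat) {
  both <- crossprod(regmat)    # regions shared by each pair of species
  n <- colSums(regmat)         # range size of each species
  1 - 2 * both / outer(n, n, "+")
}

dd <- dice_sketch(regmat)
dd
```

Species with disjoint ranges get distance 1, and the diagonal is 0.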
Calculates the ratio between the prop smallest and largest distances of a distance matrix.
distratio(distmat, prop = 0.25)
distmat |
symmetric distance matrix. |
prop |
numerical. Proportion between 0 and 1. |
Rounding is by floor for small and ceiling for large distances.
A list with components
dr |
ratio of |
lowmean |
mean of |
himean |
mean of |
prop |
see above. |
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
options(digits=4)
data(kykladspecreg)
j <- jaccard(t(kykladspecreg))
distratio(j)
Computes geco distances between the columns of a 0-1-matrix, based on a distance matrix between regions (usually, but not necessarily, this is a geographical distance).
geco(regmat,geodist=as.dist(matrix(as.integer(!diag(nrow(regmat))))), transform="piece", tf=0.1, countmode=ncol(regmat)+1)
regmat |
0-1-matrix. Columns are species, rows are regions. |
geodist |
|
transform |
transformation applied to the distances before
computation of geco coefficient, see details. "piece" means
piecewise linear, namely distance/( |
tf |
tuning constant for transformation. See |
countmode |
optional positive integer. geco shows a message after every countmode algorithm runs. |
The geco distance between two species is 0.5*(mean distance between region where species 1 is present and closest region where species 2 is present plus mean distance between region where species 2 is present and closest region where species 1 is present). 'closest' to a region could be the region itself.
It is recommended (Hennig and Hausdorf, 2006) to transform the distances first, because the differences between large distances are usually not meaningful or at least much less meaningful than differences between small distances for dissimilarity measurement between species ranges. See parameter transform.
If the between-regions distance is 1 for all pairs of non-equal regions, the geco distance degenerates to the Kulczynski distance, see kulczynski.
A symmetrical matrix of geco distances.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Hennig, C. and Hausdorf, B. (2006) A robust distance coefficient between distribution areas incorporating geographic distances. Systematic Biology 55, 170-175.
options(digits=4)
data(kykladspecreg)
data(waterdist)
geco(t(kykladspecreg),waterdist)
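The verbal definition of the geco distance can be written down directly in base R. The sketch below applies no transformation to the distances and assumes that every species is present in at least one region; geco_sketch and the toy data are hypothetical names, not part of the package. It also illustrates the remark that unit distances between distinct regions lead to the Kulczynski distance.

```r
# Hypothetical regions x species 0-1 matrix (rows: regions, columns: species).
regmat <- cbind(sp1 = c(1, 1, 1, 0, 0),
                sp2 = c(1, 1, 0, 1, 0),
                sp3 = c(0, 0, 0, 1, 1))

geco_sketch <- function(regmat, geodist) {
  geodist <- as.matrix(geodist)
  ns <- ncol(regmat)
  out <- matrix(0, ns, ns, dimnames = list(colnames(regmat), colnames(regmat)))
  for (i in seq_len(ns)) {
    for (j in seq_len(ns)) {
      ri <- which(regmat[, i] == 1)
      rj <- which(regmat[, j] == 1)
      dij <- geodist[ri, rj, drop = FALSE]
      # mean distance to the closest region of the other species, both ways
      out[i, j] <- 0.5 * (mean(apply(dij, 1, min)) + mean(apply(dij, 2, min)))
    }
  }
  out
}

# Distance 1 between all pairs of distinct regions, 0 on the diagonal.
unitdist <- 1 - diag(nrow(regmat))
g <- geco_sketch(regmat, unitdist)

# Kulczynski distance computed from its definition, for comparison.
shared <- crossprod(regmat)
n <- colSums(regmat)
kul <- 1 - 0.5 * (shared / n + t(shared / n))
all.equal(g, kul, check.attributes = FALSE)
```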
Generates a neighborhood list as required by prabinit from a matrix of geographical distances.
geo2neighbor(geodist,cut=0.1*max(geodist))
geodist |
|
cut |
non-negative numerical. All pairs of regions with
|
A list of integer vectors, giving the set of neighbors for every region.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
data(waterdist)
geo2neighbor(waterdist)
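A neighborhood list of this kind can be sketched in a few lines of base R. The sketch assumes that two distinct regions count as neighbors when their distance is at most cut and that a region is never its own neighbor; geo2neighbor_sketch and the toy distances are hypothetical.

```r
# Hypothetical distances between four regions placed on a line.
geodist <- as.matrix(dist(c(0, 1, 2, 10)))

geo2neighbor_sketch <- function(geodist, cut = 0.1 * max(geodist)) {
  geodist <- as.matrix(geodist)
  lapply(seq_len(nrow(geodist)), function(i) {
    # all regions within 'cut' of region i, excluding i itself
    setdiff(unname(which(geodist[i, ] <= cut)), i)
  })
}

nbl <- geo2neighbor_sketch(geodist, cut = 1.5)
nbl
```

The isolated fourth region receives an empty vector, and the resulting neighborhood relation is symmetric because the distance matrix is.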
Classical distance-based test for homogeneity against clustering. The test statistic is the number of isolated vertices in the graph of smallest distances. The homogeneity model is a random graph model where ne edges are drawn from all possible edges.
homogen.test(distmat, ne = ncol(distmat), testdist = "erdos")
distmat |
numeric symmetric distance matrix. |
ne |
integer. Number of edges in the data graph, corresponding to smallest distances. |
testdist |
string. If |
The "ling"-test is one-sided (rejection if the number of isolated vertices is too large), the "erdos"-test computes a one-sided as well as a two-sided p-value.
A list with components
p |
p-value for one-sided test. |
p.twoside |
p-value for two-sided test, only if |
iv |
number of isolated vertices in the data. |
lambda |
parameter of the Poisson test distribution, only if
|
distcut |
largest distance value for which an edge has been drawn. |
ne |
see above. |
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Erdos, P. and Renyi, A. (1960) On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17-61.
Godehardt, E. and Horsch, A. (1995) Graph-Theoretic Models for Testing the Homogeneity of Data. In Gaul, W. and Pfeifer, D. (Eds.) From Data to Knowledge, Springer, Berlin, 167-176.
Ling, R. F. (1973) A probability theory of cluster analysis. Journal of the American Statistical Association 68, 159-164.
options(digits=4)
data(kykladspecreg)
j <- jaccard(t(kykladspecreg))
homogen.test(j, testdist="erdos")
homogen.test(j, testdist="ling")
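The test statistic itself, the number of isolated vertices in the graph of the ne smallest distances, is easy to compute by hand. The following base R sketch covers only this statistic and the cutoff distance, not the p-values; isolated_vertices_sketch is a hypothetical name, and with tied distances the graph may contain more than ne edges.

```r
# Hypothetical one-dimensional data turned into a distance matrix.
distmat <- as.matrix(dist(c(0, 0.1, 0.3, 5, 9)))

isolated_vertices_sketch <- function(distmat, ne = ncol(distmat)) {
  d <- sort(distmat[upper.tri(distmat)])
  distcut <- d[ne]                   # largest distance that still gives an edge
  adj <- distmat <= distcut
  diag(adj) <- FALSE                 # no loops
  list(iv = sum(rowSums(adj) == 0), distcut = distcut)
}

res <- isolated_vertices_sketch(distmat, ne = 2)
res$iv
```

With ne = 2 only the first three points are linked, so the two remote points are isolated.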
Clusters a presence-absence matrix object by taking the 'h-cut'-partition of a hierarchical clustering and declaring all members of too small clusters as 'noise' (this gives a distance-based clustering method, which estimates the number of clusters and allows for noise/non-clustered points). Note that this is experimental. Often, the prabclust solution is more convincing due to the higher flexibility of that method. However, hprabclust may be more stable sometimes.
Note: Data formats are described on the prabinit help page. You may also consider the example datasets kykladspecreg.dat and nb.dat. Take care of the parameter rows.are.species of prabinit.
hprabclust(prabobj, cutdist=0.4, cutout=1, method="average", nnout=2, mdsplot=TRUE, mdsmethod="classical")
## S3 method for class 'comprabclust'
print(x, ...)
prabobj |
object of class |
cutdist |
non-negative numerical. Cutoff distance to determine the
partition, see |
cutout |
non-negative integer. Points that have at most
|
method |
string. Clustering method, see |
nnout |
non-negative integer. Members of clusters with less or
equal than |
mdsplot |
logical. If |
mdsmethod |
|
x |
|
... |
necessary for print method. |
hprabclust generates an object of class comprabclust. This is a list with components
clustering |
vector of integers indicating the cluster memberships of
the species ( |
rclustering |
vector of integers indicating the cluster memberships of
the species, noise as described under |
cutdist |
see above. |
method |
see above. |
cutout |
see above. |
nnout |
see above. |
noisen |
number of points minus |
symbols |
vector of characters corresponding to |
points |
numerical matrix. MDS configuration (if |
call |
function call. |
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
data(kykladspecreg)
data(nb)
data(waterdist)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb, geodist=waterdist, distance="geco")
hprabclust(x,mdsplot=FALSE)
Computes species*species nestedness matrix and number of nestings (inclusions) from regions*species presence-absence matrix.
incmatrix(regmat)
regmat |
0-1-matrix. Columns are species, rows are regions. |
A list with components
m |
0-1-matrix. |
ninc |
integer. Number of strict inclusions. |
neq |
integer. Number of region equalities between species (not including equality between species i and i). |
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
data(kykladspecreg)
incmatrix(t(kykladspecreg))$ninc
Computes Jaccard distances between the columns of a 0-1-matrix.
jaccard(regmat)
regmat |
0-1-matrix. Columns are species, rows are regions. |
The Jaccard distance between two species is 1-(number of regions where both species are present)/(number of regions where at least one species is present). As a similarity coefficient, this is S22 in Shi (1993).
Thank you to Laurent Buffat for improving this function!
A symmetrical matrix of Jaccard distances.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Shi, G. R. (1993) Multivariate data analysis in palaeoecology and palaeobiogeography - a review. Palaeogeography, Palaeoclimatology, Palaeoecology 105, 199-234.
options(digits=4)
data(kykladspecreg)
jaccard(t(kykladspecreg))
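The definition of the Jaccard distance translates into a short base R sketch. It is meant as an illustration of the formula, not as the package code; jaccard_sketch and the toy matrix are hypothetical, and species with empty ranges are not handled.

```r
# Hypothetical regions x species 0-1 matrix (rows: regions, columns: species).
regmat <- cbind(sp1 = c(1, 1, 1, 0, 0),
                sp2 = c(1, 1, 0, 1, 0),
                sp3 = c(0, 0, 0, 1, 1))

jaccard_sketch <- function(regmat) {
  both <- crossprod(regmat)              # regions where both species are present
  n <- colSums(regmat)
  either <- outer(n, n, "+") - both      # regions where at least one is present
  1 - both / either
}

jd <- jaccard_sketch(regmat)
jd
```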
Computes Kulczynski distances between the columns of a 0-1-matrix.
kulczynski(regmat)
regmat |
0-1-matrix. Columns are species, rows are regions. |
The Kulczynski distance between two species is 1-(mean of (number of regions where both species are present)/(number of regions where species 1 is present) and (number of regions where both species are present)/(number of regions where species 2 is present)). The similarity version of this is S28 in Shi (1993).
A symmetrical matrix of Kulczynski distances.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Shi, G. R. (1993) Multivariate data analysis in palaeoecology and palaeobiogeography - a review. Palaeogeography, Palaeoclimatology, Palaeoecology 105, 199-234.
jaccard, geco, qkulczynski, dicedist
options(digits=4)
data(kykladspecreg)
kulczynski(t(kykladspecreg))
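The Kulczynski formula can likewise be spelled out in base R. The sketch below follows the definition given in the text; kulczynski_sketch and the toy matrix are hypothetical, and every species is assumed to be present in at least one region.

```r
# Hypothetical regions x species 0-1 matrix (rows: regions, columns: species).
regmat <- cbind(sp1 = c(1, 1, 1, 0, 0),
                sp2 = c(1, 1, 0, 1, 0),
                sp3 = c(0, 0, 0, 1, 1))

kulczynski_sketch <- function(regmat) {
  both <- crossprod(regmat)     # regions where both species are present
  n <- colSums(regmat)          # range size of each species
  prop1 <- both / n             # shared regions relative to the first species
  prop2 <- t(prop1)             # shared regions relative to the second species
  1 - 0.5 * (prop1 + prop2)
}

kd <- kulczynski_sketch(regmat)
kd
```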
0-1-matrix where rows are snail species and columns are islands in the Aegean sea. An entry of 1 means that the species is present in the region.
data(kykladspecreg)
A 0-1 matrix with 80 rows and 34 columns.
Reads from example data file kykladspecreg.dat.
B. Hausdorf and C. Hennig (2005) The influence of recent geography, palaeogeography and climate on the composition of the fauna of the central Aegean Islands. Biological Journal of the Linnean Society 84, 785-795.
nb provides neighborhood information about the 34 islands. waterdist provides a geographical distance matrix between the islands.
data(kykladspecreg)
Computes the size of the largest connectivity component of the graph of ncol(distmat) vertices with edges defined by the smallest ne distances.
lcomponent(distmat, ne = floor(3*ncol(distmat)/4))
distmat |
symmetric distance matrix. |
ne |
integer. |
list with components
lc |
size of the largest connectivity component. |
ne |
see above. |
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
data(kykladspecreg)
j <- jaccard(t(kykladspecreg))
lcomponent(j)
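One way to sketch this computation in base R is to cut a single linkage tree at the ne-th smallest distance, since the resulting groups are the connectivity components of the graph with all edges up to that distance. This is an illustration with a swapped-in technique, not the package code; lcomponent_sketch is a hypothetical name, and tied distances may add edges beyond ne.

```r
# Hypothetical one-dimensional data turned into a distance matrix.
distmat <- as.matrix(dist(c(0, 0.1, 0.3, 5, 9)))

lcomponent_sketch <- function(distmat, ne = floor(3 * ncol(distmat) / 4)) {
  distcut <- sort(distmat[upper.tri(distmat)])[ne]
  # single linkage groups at height distcut = components of the edge graph
  comp <- cutree(hclust(as.dist(distmat), method = "single"), h = distcut)
  max(table(comp))
}

lc <- lcomponent_sketch(distmat)
lc
```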
Given a clustering of individuals from prabclust (as generated in species delimitation) and a clustering of markers (for example dominant markers of genetic loci), lociplots visualises the presence of markers against the clustering of individuals and computes some statistics.
lociplots(indclust,locclust,locprab,lcluster, symbols=NULL,brightest.grey=0.8,darkest.grey=0, mdsdim=1:2)
indclust |
|
locclust |
vector of integers. Clustering of markers/loci. |
locprab |
|
lcluster |
integer. Number of cluster in |
symbols |
vector of plot symbols. If |
brightest.grey |
numeric between 0 and 1. Brightest grey value used in plot for individuals with smallest marker percentage, see details. |
darkest.grey |
numeric between 0 and 1. Darkest grey value used in plot for individuals with highest marker percentage, see details. |
mdsdim |
vector of two integers. The two MDS variables taken from
|
Plot and statistics are based on the individual marker percentage,
which is the percentage of markers present in an individual of the
markers belonging to cluster no. lcluster
. In the plot, the
grey value visualises the marker percentage.
list with components
locfreq |
vector of individual marker percentages. |
locfreqmin |
vector of minimum individual marker percentages for
each cluster in |
locfreqmax |
vector of maximum individual marker percentages for
each cluster in |
locfreqmean |
vector of average individual marker percentages for
each cluster in |
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
options(digits=4)
data(veronica)
vei <- prabinit(prabmatrix=veronica[1:50,],distance="jaccard")
ppv <- prabclust(vei)
veloci <- prabinit(prabmatrix=veronica[1:50,],rows.are.species=FALSE)
velociclust <- prabclust(veloci,nnk=0)
lociplots(ppv,velociclust$clustering,veloci,lcluster=3)
Computes column-wise and row-wise numbers of missing values.
nastats(amatrix, nastr="--")
amatrix |
(any) matrix. |
nastr |
missing value indicator. |
A list with components
narow |
vector of row-wise numbers of missing values. |
nacol |
vector of column-wise numbers of missing values. |
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
xx <- cbind(c(1,2,3),c(0,0,1),c(5,3,1))
nastats(xx,nastr=0)
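The counting can be reproduced with rowSums and colSums in base R. This sketch assumes that a missing value is any entry equal to nastr and that the matrix contains no genuine NA entries; nastats_sketch is a hypothetical name.

```r
nastats_sketch <- function(amatrix, nastr = "--") {
  isna <- amatrix == nastr            # TRUE where the missing value code occurs
  list(narow = rowSums(isna), nacol = colSums(isna))
}

xx <- cbind(c(1, 2, 3), c(0, 0, 1), c(5, 3, 1))
ns <- nastats_sketch(xx, nastr = 0)
ns
```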
List of neighboring islands for 34 Aegean islands.
data(nb)
List with 34 components, all being vectors of integers (or numeric(0) in case of no neighbors) indicating the neighboring islands.
Reads from example data file nb.dat.
B. Hausdorf and C. Hennig (2005) The influence of recent geography, palaeogeography and climate on the composition of the fauna of the central Aegean Islands. Biological Journal of the Linnean Society 84, 785-795.
data(nb)
# nb <- list()
# for (i in 1:34)
#   nb <- c(nb,list(scan(file="(path/)nb.dat",
#                   skip=i-1,nlines=1)))
Tests a list of neighboring regions for proper format. Neighborhood is tested for being symmetrical. Causes an error if tests fail.
nbtest(nblist, n.regions=length(nblist))
nblist |
A list with a component for
every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a vector |
n.regions |
Number of regions. |
invisible(TRUE).
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
data(nb)
nbtest(nb)
nb[[1]][1] <- 1
try(nbtest(nb))
Computes the mean of the distances from each point to its ne-th nearest neighbor.
nn(distmat, ne = 1)
distmat |
symmetric distance matrix (not a |
ne |
integer. |
numerical.
Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896.
data(kykladspecreg)
j <- jaccard(t(kykladspecreg))
nn(j,4)
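The statistic can be sketched in base R by sorting each row of the distance matrix. The sketch assumes a full symmetric matrix with zero diagonal, so that the first sorted entry of every row is the distance of the point to itself; nn_sketch and the toy data are hypothetical.

```r
# Hypothetical one-dimensional data turned into a full distance matrix.
distmat <- as.matrix(dist(c(0, 1, 3, 7)))

nn_sketch <- function(distmat, ne = 1) {
  # position 1 of each sorted row is the zero self-distance, so skip it
  kth <- apply(distmat, 1, function(r) sort(r)[ne + 1])
  mean(kth)
}

nn1 <- nn_sketch(distmat, ne = 1)
nn2 <- nn_sketch(distmat, ne = 2)
c(nn1, nn2)
```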
Detects if data points are noise or part of a cluster, based on a Poisson process model.
NNclean(data, k, distances = NULL, edge.correct = FALSE, wrap = 0.1, convergence = 0.001, plot=FALSE, quiet=TRUE)
## S3 method for class 'nnclean'
print(x, ...)
data |
numerical matrix or data frame. |
k |
integer. Number of considered nearest neighbors per point. |
distances |
distance matrix object of class |
edge.correct |
logical. If |
wrap |
numerical. If |
convergence |
numerical. Convergence criterion for EM-algorithm. |
plot |
logical. If |
quiet |
logical. If |
x |
object of class |
... |
necessary for print methods. |
The assumption is that the noise is distributed as a homogeneous Poisson process on a certain region and the clusters are distributed as a homogeneous Poisson process with larger intensity on a subregion (disconnected in case of more than one cluster). The distances are then distributed according to a mixture of two transformed Gamma distributions, and this mixture is estimated via the EM-algorithm. The points are assigned to noise or cluster component by use of the estimated a posteriori probabilities.
NNclean returns a list of class nnclean with components
z |
0-1-vector of length of the number of data points. 1 means cluster, 0 means noise. |
probs |
vector of estimated a posteriori probabilities for each point to belong to the cluster component. |
k |
see above. |
lambda1 |
intensity parameter of cluster component. |
lambda2 |
intensity parameter of noise component. |
p |
estimated probability of cluster component. |
kthNND |
distance to kth nearest neighbor. |
The software can be freely used for non-commercial purposes, and can be freely distributed for non-commercial purposes only.
R-port by Christian Hennig, https://www.unibo.it/sitoweb/christian.hennig/en; original Splus package by S. Byers and A. E. Raftery.
Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.
library(mclust)
data(chevron)
nnc <- NNclean(chevron[,2:3],15,plot=TRUE)
plot(chevron[,2:3],col=1+nnc$z)
Auxiliary functions for communitydist. phipt computes phiPT/phiST (Peakall and Smouse, 2012, Meirmans, 2006) between two communities. cfchord computes the chord distance (Cavalli-Sforza and Edwards, 1967) between two lists of locus-wise relative allele frequencies. shared.problist computes a straightforward generalisation of the shared allele distance (Bowcock et al., 1994) between individuals for communities, namely the ‘overlap’, i.e., the sum of the minima of the allele relative frequencies. diploidcomlist constructs the input lists for cfchord and shared.problist from an alleleobject. It provides relative frequencies for all alleles of all loci in all communities.
phipt(alleleobj,comvector,i,j)
cfchord(p1,p2)
shared.problist(p1,p2)
diploidcomlist(alleleobj,comvector,diploid=TRUE)
alleleobj |
if |
comvector |
vector of integers indicating to which community an individual belongs. |
i |
integer. Number of community. |
j |
integer. Number of community. The phiPT-distance is computed
between the communities numbered |
p1 |
list. Every list entry refers to a locus and is a vector of relative frequencies of the alleles present in that locus in a community. |
p2 |
list. Every list entry refers to a locus and is a vector of
relative frequencies of the alleles present in that locus in a
community. The chord or shared allele distance is computed between
the communities encoded by |
diploid |
logical, indicating whether loci are diploid, see
|
cfchord
gives out the value of the chord
distance. shared.problist
gives out the distance
value. diploidcomlist
gives out a two-dimensional list. The
list has one entry for each community, which is itself a list. This
community list has one entry for each locus, which is a vector that
gives the relative frequencies of the different alleles in
phipt
gives out a list with components phipt, vap, n0,
sst, ssg, msa, msw
. These refer to the notation on p.2.12 and 2.15 of
Peakall and Smouse (2012).
phipt |
value of phiPT. |
vap |
variance among (between) populations (communities). |
n0 |
standardisation factor N0, see p.2.12 of Peakall and Smouse (2012). |
sst |
total distances sum of squares. |
ssg |
vector with two non- |
msa |
mean square between communities. |
msw |
mean square within communities. |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Bowcock, A. M., Ruiz-Linares, A., Tomfohrde, J., Minch, E., Kidd, J. R., Cavalli-Sforza, L. L. (1994) High resolution of human evolutionary trees with polymorphic microsatellites. Nature 368, 455-457.
Cavalli-Sforza, L. L. and Edwards, A. W. F. (1967) Phylogenetic Analysis - Models and Estimation Procedures. The American Journal of Human Genetics 19, 233-257.
Meirmans, P. G. (2006) Using the AMOVA framework to estimate a standardized genetic differentiation measure. Evolution 60, 2399-2402.
Peakall, R. and Smouse P.E. (2012) GenAlEx Tutorial 2. https://biology-assets.anu.edu.au/GenAlEx/Tutorials.html
options(digits=4)
data(tetragonula)
tnb <- coord2dist(coordmatrix=tetragonula.coord[83:120,],cut=50,
  file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[83:120,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist)
tetracoms <- c(rep(1:3,each=3),4,5,rep(6:11,each=2),12,rep(13:19,each=2))
phipt(tai,tetracoms,4,6)
tdip <- diploidcomlist(tai,tetracoms,diploid=TRUE)
cfchord(tdip[[4]],tdip[[6]])
shared.problist(tdip[[4]],tdip[[6]])
Piecewise linear transformation for distance matrices, utility
function for geco
.
piecewiselin(distmatrix, maxdist=0.1*max(distmatrix))
distmatrix |
symmetric (non-negative) distance matrix. |
maxdist |
non-negative numeric. Larger distances are transformed to constant 1. |
Transforms large distances to 1, 0 to 0 and continuously linear between 0 and
maxdist
.
A symmetrical matrix.
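The transformation described above can be written in one line of base R. The following is a minimal sketch under the assumption that the linear part is simply the distance divided by maxdist; the name piecewise_sketch is hypothetical and the code is not taken from piecewiselin.

```r
# Sketch only: distances are scaled by maxdist and capped at 1,
# so 0 stays 0 and everything at or above maxdist becomes 1.
piecewise_sketch <- function(distmatrix, maxdist = 0.1 * max(distmatrix)) {
  pmin(distmatrix / maxdist, 1)
}

d <- matrix(c(0, 1, 10,
              1, 0, 5,
              10, 5, 0), nrow = 3)
out <- piecewise_sketch(d, maxdist = 5)
out
```

The sketch assumes maxdist is strictly positive; with maxdist equal to 0 the division is undefined for zero distances.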
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
options(digits=4)
data(waterdist)
piecewiselin(waterdist)
Visualisation of various regressions on distance (or dissimilarity) data where objects are from two groups.
plotdistreg(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2],
  cols=c(1,2,3,4), pchs=rep(1,3), ltys=c(1,2,1,2),
  individual=TRUE,jointwithin=TRUE,jointall=TRUE,
  oneplusjoint=TRUE,jittering=TRUE,bcenterline=TRUE,
  xlim=NULL,ylim=NULL,xlab="geographical distance",
  ylab="genetic distance",...)
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
something that can be coerced into a factor,
defining the grouping of
objects represented by the dissimilarities |
groups |
Vector of two levels. The two groups defining the
regressions to be compared in the test. These can be
factor levels, integer numbers, or strings, depending on the entries
of |
cols |
vector of four colors (or color numbers) to be used for
plotting distances
and regression lines within the first group, within the second group,
distances between groups, and a line marking the center of the
between-groups explanatory distances, see |
pchs |
vector of three plot symbols (or numbers) to be used for
plotting distances within the first group, within the second group,
and distances between groups, see |
ltys |
vector of line type numbers to be used for single group
within-group regression, both groups combined within-group
regression, regression with all distances, and regression combining
within-groups distances of one group with between-groups distances,
see |
individual |
if |
jointwithin |
if |
jointall |
if |
oneplusjoint |
if |
jittering |
if |
bcenterline |
if |
xlim |
to be passed on to |
ylim |
to be passed on to |
xlab |
to be passed on to |
ylab |
to be passed on to |
... |
optional arguments to be passed on to |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
regeqdist
, regdistbetween
,
regdistbetweenone
, regdistdiffone
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
species <- c(rep(1,13),rep(2,22))
loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
plotdistreg(dmx=loggeo,dmy=vei$distmat,grouping=species,
  jointwithin=FALSE,jointall=FALSE,groups=c(1,2))
legend(5,0.75,c("within species 1","within species 2",
  "species 1 and between","species 2 and between"),
  lty=c(1,1,2,2),col=c(1,2,1,2))
plotdistreg(dmx=loggeo,dmy=vei$distmat,grouping=species,
  jointwithin=TRUE,jointall=TRUE,oneplusjoint=FALSE,groups=c(1,2))
legend(5,0.75,c("within species 1","within species 2",
  "all distances","all within species"),
  lty=c(1,1,1,2),col=c(1,2,3,3))
Parametric bootstrap simulation of the p-value of a test of a
homogeneity hypothesis against clustering (or significant nestedness).
Designed for use within
prabtest
. The null model is defined by
randpop.nb
.
pop.sim(regmat, neighbors, h0c = 1, times = 200, dist = "kulczynski",
  teststat = "isovertice", testc = NULL, geodist=NULL, gtf=0.1,
  n.species = ncol(regmat), specperreg = NULL, regperspec = NULL,
  species.fixed=FALSE, pdfnb=FALSE, ignore.richness=FALSE)
regmat |
0-1-matrix. Columns are species, rows are regions. |
neighbors |
A list with a component for every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a list |
h0c |
numerical. Parameter |
times |
integer. Number of simulation runs. |
dist |
"kulczynski", "jaccard" or "geco", see |
teststat |
"isovertice", "lcomponent", "distratio", "nn" or
"inclusions". See
the corresponding functions, |
testc |
numerical. Tuning constant for the test statistics. |
geodist |
matrix of non-negative reals. Geographical distances
between regions. Only used if |
gtf |
tuning constant for geco-distance if |
n.species |
integer. Number of species. |
specperreg |
vector of integers. Numbers of species per region (is calculated from the data by default). |
regperspec |
vector of integers. Number of regions per species (is calculated from the data by default). |
species.fixed |
logical. If |
pdfnb |
logical. Probability correction in |
ignore.richness |
logical. If |
List with components
results |
vector of test statistic values for the simulated matrices. |
p.above |
p-value if large test statistic leads to rejection. |
p.below |
p-value if small test statistic leads to rejection. |
datac |
test statistic value for the original data. |
testc |
see above. |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. Systematic Biology 52, 717-723.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
prabtest
, randpop.nb
,
jaccard
, kulczynski
,
homogen.test
, lcomponent
,
distratio
, nn
,
incmatrix
.
options(digits=4)
data(kykladspecreg)
data(nb)
set.seed(1234)
pop.sim(t(kykladspecreg), nb, times=5, h0c=0.35, teststat="nn", testc=3)
This is either an interface to the function errorsarlm for abundance data stored in an object of class prab, implemented for use in abundtest, or, if spatial information is to be ignored, it estimates a two-way additive unreplicated linear model for log-abundances on the factors species and region.
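For the case without spatial information, a two-way additive unreplicated model for log-abundances can be sketched with lm in base R. The toy abundance matrix, the object names and the use of R's default treatment contrasts are assumptions made for illustration; the sketch also assumes that every abundance is positive, and it does not show how prab.sarestimate treats absences.

```r
# Sketch only: log-abundance ~ region + species, one observation per cell.
set.seed(1)
abund <- matrix(rpois(12, 5) + 1, nrow = 4,
                dimnames = list(paste0("region", 1:4),
                                paste0("species", 1:3)))

# Long format: one row per (region, species) cell of the matrix
long <- data.frame(
  logab   = log(as.vector(abund)),
  region  = factor(rep(rownames(abund), times = ncol(abund))),
  species = factor(rep(colnames(abund), each = nrow(abund)))
)

fit <- lm(logab ~ region + species, data = long)
coef(fit)            # intercept, region effects, species effects
summary(fit)$sigma   # estimated error standard deviation
```

With 4 regions and 3 species there are 12 observations and 6 coefficients, leaving 6 residual degrees of freedom.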
prab.sarestimate(abmat, prab01=NULL,sarmethod="eigen",
  weightstyle="C", quiet=TRUE, sar=TRUE, add.lmobject=TRUE)
abmat |
object of class |
prab01 |
presence-absence matrix of the same dimensions as the
abundance matrix of |
sarmethod |
this is passed on as parameter |
weightstyle |
can take values "W", "B", "C", "U", and "S" though tests
suggest that "C" should be chosen. See |
quiet |
this is passed on as parameter |
sar |
logical. If |
add.lmobject |
logical. If |
A list with the following components:
sar |
see above. |
intercept |
numeric. Estimator of the intercept. |
sigma |
numeric. Estimator of error standard deviation. |
regeffects |
numeric vector. Estimator for region effects. |
speceffects |
numeric vector. Estimator for species effects. |
lambda |
numeric. Governs the degree of spatial
autocorrelation. See |
size |
integer. Length of neighborhood list generated by
|
nbweight |
numeric. Average weight of neighbors. |
lmobject |
if |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
options(digits=4)
data(siskiyou)
x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb, distance="none")
# Not run; this needs package spdep
# prab.sarestimate(x)
prab.sarestimate(x, sar=FALSE)
Clusters a presence-absence matrix object (for clustering
ranges/finding biotic elements, Hennig and Hausdorf, 2004) or
an object of genetic information (for species delimitation, Hausdorf
and Hennig, 2010)
by calculating an MDS from
the distances, and applying maximum likelihood Gaussian mixtures clustering
with "noise" (package mclust
) to the MDS points. The solution
is plotted. A standard execution (using the default distance of
prabinit
) will be
prabmatrix <- prabinit(file="path/prabmatrixfile",
neighborhood="path/neighborhoodfile")
clust <- prabclust(prabmatrix)
print(clust)
Examples for species delimitation are given below in the examples section.
Note: Data formats are described
on the prabinit
and alleleinit
help pages. You may also consider the example datasets
kykladspecreg.dat
, nb.dat
,
Heterotrigona_indoFO.txt
or MartinezOrtega04AFLP.dat
.
Note: prabclust
calls the function
mclustBIC
in package mclust. An alternative is the use of hprabclust
.
prabclust(prabobj, mdsmethod = "classical", mdsdim = 4,
  nnk = ceiling(prabobj$n.species/40), nclus = 0:9,
  modelid = "all", permutations=0)

## S3 method for class 'prabclust'
print(x, bic=FALSE, ...)
prabobj |
object of class |
mdsmethod |
|
mdsdim |
integer. Dimension of the MDS points. For
|
nnk |
integer. Number of nearest neighbors to determine the
initial noise estimation by |
nclus |
vector of integers. Numbers of clusters to perform the mixture estimation. |
modelid |
string. Model name for |
permutations |
integer. It has been found occasionally that
depending on the order of observations the algorithms |
x |
object of class |
bic |
logical. If |
... |
necessary for summary method. |
Note that if mdsmethod!="classical"
, zero distances between
non-identical objects are replaced by the smallest nonzero distance
divided by 10 to prevent the MDS methods from producing an error.
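The replacement rule just described can be illustrated in base R. This is a minimal sketch, not the code used inside prabclust; the function name fix_zero_dist is hypothetical, and "non-identical objects" is taken here to mean all off-diagonal entries.

```r
# Sketch only: replace off-diagonal zero distances by
# (smallest nonzero distance) / 10, leaving the diagonal at zero.
fix_zero_dist <- function(dm) {
  dm <- as.matrix(dm)
  off_diag <- row(dm) != col(dm)
  smallest <- min(dm[off_diag & dm > 0])
  dm[off_diag & dm == 0] <- smallest / 10
  dm
}

dm <- matrix(c(0, 0, 2,
               0, 0, 4,
               2, 4, 0), nrow = 3)
fixed <- fix_zero_dist(dm)
fixed
```

The sketch assumes that at least one off-diagonal distance is nonzero.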
print.prabclust
does not produce output.
prabclust
generates an object of class prabclust
. This is a
list with components
clustering |
vector of integers indicating the cluster memberships of
the species. Noise can be recognized by output component |
clustsummary |
output object of |
bicsummary |
output object of |
points |
numerical matrix. MDS configuration. |
nnk |
see above. |
mdsdim |
see above. |
mdsmethod |
see above. |
symbols |
vector of characters, similar to |
permchange |
logical. If |
Note that we used mdsmethod="kruskal"
in our publications, but
mdsmethod="classical"
is now the default, because of
occasional numerical instabilities of the isoMDS
-implementation
for Jaccard, Kulczynski or geco distance matrices.
Sometimes, prabclust produces an error because mclustBIC cannot handle all models properly. In this case we recommend changing the modelid parameter. "noVVV" and "VVV" are reasonable alternative choices (one of these is expected to reproduce the error, but the other one might work).
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Fraley, C. and Raftery, A. E. (1998) How many clusters? Which clustering method? - Answers via Model-Based Cluster Analysis. Computer Journal 41, 578-588.
Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, 59, 491-503.
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
mclustBIC
, summary.mclustBIC
,
NNclean
, cmdscale
,
isoMDS
, sammon
,
prabinit
, hprabclust
,
alleleinit
, stressvals
.
# Biotic element/range clustering:
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
#   neighborhood="path/neighborhoodfile")
print(prabclust(x))

# Here is an example for species delimitation with codominant markers;
# only 50 individuals were used in order to have a fast example.
data(tetragonula)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta)
print(prabclust(tai))

# Here is an example for species delimitation with dominant markers;
# only 50 individuals were used in order to have a fast example.
# You may want to use stressvals to choose mdsdim.
data(veronica)
vei <- prabinit(prabmatrix=veronica[1:50,],distance="jaccard")
print(prabclust(vei,mdsmethod="kruskal",mdsdim=3))
prabinit
converts a matrix into an object
of class prab
(presence-absence). The matrix may be read from a
file or an R-object. It may be a 0-1 matrix or a matrix with
non-negative entries (usually abundances).
print.prab
is a print method for such
objects.
Documentation here is in terms of biotic element analysis (species are to be clustered). For species delimitation with dominant markers (see Hausdorf and Hennig, 2010), individuals take the role of species and loci take the role of regions.
prabinit(file = NULL, prabmatrix = NULL, rows.are.species = TRUE,
  neighborhood = "none", nbbetweenregions=TRUE, geodist=NULL, gtf=0.1,
  distance = "kulczynski", toprab = FALSE, toprabp = 0.05, outc = 5.2)

## S3 method for class 'prab'
print(x, ...)
file |
string. non-negative matrix ASCII file (such as example dataset
|
prabmatrix |
matrix with non-negative entries. Either |
rows.are.species |
logical. If |
neighborhood |
A string or a list with a component for
every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a vector |
nbbetweenregions |
logical. If |
geodist |
matrix of non-negative reals. Geographical distances
between regions. Only used if |
gtf |
tuning constant for geco-distance if |
distance |
|
toprab |
logical. If |
toprabp |
numerical between 0 and 1, see |
outc |
numerical. Tuning constant for the outlier identification
associated with |
x |
object of class |
... |
necessary for print method. |
Species that are absent in all regions are omitted.
prabinit
produces
an object of class prab
, which is a list with components
distmat |
distance matrix between species. |
prab |
abundance or presence/absence matrix (if presence/absence, the entries are logical). Rows are regions, columns are species. |
nb |
neighborhood list, see above. |
regperspec |
vector of the number of regions occupied by a species. |
specperreg |
vector of the number of species present in a region. |
n.species |
number of species (in the |
n.regions |
number of regions. |
distance |
string denoting the chosen distance measure. |
geodist |
non-negative matrix. see above. |
gtf |
numeric. see above. |
spatial |
|
nonempty.species |
logical vector. The length is the number of species
in the original file/matrix. If |
nbbetweenregions |
see above. |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2010) Species Delimitation Using Dominant and Codominant Multilocus Markers. Systematic Biology, 59, 491-503.
read.table
, jaccard
,
kulczynski
, geco
,
qkulczynski
, nbtest
,
alleleinit
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
#   neighborhood="path/neighborhoodfile")
data(kykladspecreg)
data(nb)
prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
Parametric bootstrap test of a null model of i.i.d., but spatially
autocorrelated species against clustering of the species' occupied
areas (or alternatively nestedness). Despite the many
parameters, a standard execution (for the default test statistics, see
parameter teststat
below) will be
prabmatrix <- prabinit(file="path/prabmatrixfile",
neighborhood="path/neighborhoodfile")
test <- prabtest(prabmatrix)
summary(test)
Note: Data formats are described
on the prabinit
help page. You may also consider the example datasets
kykladspecreg.dat
and nb.dat
. Pay attention to the
parameter rows.are.species
of prabinit
.
prabtest(prabobject, teststat = "distratio",
  tuning = switch(teststat, distratio = 0.25,
    lcomponent = floor(3 * ncol(prabobject$distmat)/4),
    isovertice = ncol(prabobject$distmat), nn = 4, NA),
  times = 1000, pd = NULL, prange = c(0, 1), nperp = 4,
  step = 0.1, step2=0.01, twostep = TRUE,
  sf.sim = FALSE, sf.const = sf.sim, pdfnb = FALSE,
  ignore.richness=FALSE)

## S3 method for class 'prabtest'
summary(object, above.p=object$teststat %in% c("groups","inclusions","mean"),
  group.outmean=FALSE,...)

## S3 method for class 'summary.prabtest'
print(x, ...)
prabobject |
an object of class |
teststat |
string, indicating the test statistics. |
tuning |
integer or (if |
times |
integer. Number of simulation runs. |
pd |
numerical between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. If |
prange |
numerical range vector, lower value not smaller than 0, larger
value not larger than 1. Range where |
nperp |
integer. Number of simulations per |
step |
numerical between 0 and 1. Interval length between
subsequent choices of |
step2 |
numerical between 0 and 1. Interval length between
subsequent choices of |
twostep |
logical. If |
sf.sim |
logical. Indicates if the range sizes of the species
are held fixed
in the test simulation ( |
sf.const |
logical. Same as |
pdfnb |
logical. If |
ignore.richness |
logical. If |
object |
object of class |
above.p |
logical. |
group.outmean |
logical. If |
x |
object of class |
... |
no meaning, necessary for print and summary methods. |
From the original data, the distribution of the
range sizes of the species, the autocorrelation parameter pd
(estimated by autoconst
) and the distribution on the regions
induced by the relative species numbers are taken. With these
parameters, times
populations according to the null model
implemented in randpop.nb
are generated and the test statistic
is evaluated. The resulting p-value is (number of simulated statistic
values more extreme than the value of the original data + 1)
divided by times+1
. "More extreme" means smaller for
"lcomponent"
, "distratio"
, "nn"
, larger for
"inclusions"
, and
twice the smaller number between the original statistic value and the
"border", i.e., a two-sided test for "isovertice"
.
If pd=NULL
was
specified, a diagnostic plot
for the estimation of pd
is plotted by autoconst
.
For details see Hennig
and Hausdorf (2004) and the help pages of the cited functions.
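The one-sided p-value rule described above, (number of more extreme simulated values + 1) divided by (times + 1), can be sketched as follows. The function name boot_pvalue and its arguments are hypothetical, and counting ties as "more extreme" is an assumption of this sketch. The two-sided variant for "isovertice" is not covered.

```r
# Sketch only: one-sided parametric bootstrap p-value.
boot_pvalue <- function(sim_stats, data_stat, smaller_is_extreme = TRUE) {
  n_extreme <- if (smaller_is_extreme) {
    sum(sim_stats <= data_stat)   # e.g. "lcomponent", "distratio", "nn"
  } else {
    sum(sim_stats >= data_stat)   # e.g. "inclusions"
  }
  (n_extreme + 1) / (length(sim_stats) + 1)
}

sim <- c(1, 2, 3, 4)              # toy values standing in for simulated statistics
p_small <- boot_pvalue(sim, 2)
p_large <- boot_pvalue(sim, 5, smaller_is_extreme = FALSE)
c(p_small, p_large)
```

Adding 1 to numerator and denominator means the p-value can never be exactly zero, so its smallest possible value is 1/(times + 1).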
prabtest
produces
an object of class prabtest
, which is a list with components
results |
vector of test statistic values for all simulated populations. |
datac |
test statistic value for the original data. |
p.value |
the p-value. |
tuning |
see above. |
pd |
see above. |
reg |
regression coefficients from |
teststat |
see above. |
distance |
the distance measure chosen, see |
gtf |
the geco-distance tuning parameter (only informative if
|
times |
see above. |
pdfnb |
see above. |
ignore.richness |
see above. |
summary.prabtest
produces an object of class
summary.prabtest
, which is a list with components
rrange |
range of the simulation results (test statistic values)
of |
rmean |
mean of the simulation results (test statistic values)
of |
datac, p.value, pd, tuning, teststat, distance, times, pdfnb, abund, sarlambda |
directly taken from |
groupinfo |
if |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. Systematic Biology 52, 717-723.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
prabinit
generates objects of class prab
.
autoconst
estimates pd
from such objects.
randpop.nb
generates populations from the null model.
An alternative model is given by cluspop.nb
.
Some more information on the test statistics is given in
homogen.test
, lcomponent
,
distratio
, nn
,
incmatrix
.
The simulations are computed by pop.sim
.
options(digits=4)
data(kykladspecreg)
data(nb)
set.seed(1234)
x <- prabinit(prabmatrix=kykladspecreg, neighborhood=nb)
# If you want to use your own ASCII data files, use
# x <- prabinit(file="path/prabmatrixfile",
#   neighborhood="path/neighborhoodfile")
kpt <- prabtest(x, times=5, pd=0.35)
# These settings are chosen to make the example execution
# a bit faster; usually you will use prabtest(x).
summary(kpt)
Computes quantitative Kulczynski distances between the columns of an abundance matrix.
qkulczynski(regmat, log.distance=FALSE)
regmat |
(non-negative) abundance matrix. Columns are species, rows are regions. |
log.distance |
logical. If |
The quantitative Kulczynski distance between two species is 1 minus the mean of two ratios: (sum over regions of the minimum abundance of both species)/(sum of abundances of species 1) and (sum over regions of the minimum abundance of both species)/(sum of abundances of species 2). If the abundance matrix is a 0-1-matrix, this gives the standard Kulczynski distance.
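As an illustration, the quantitative Kulczynski distance can be computed in base R from the sum over regions of the minimum abundance of each pair of species. This is a sketch, not the code of qkulczynski: the function name qkulc_sketch is hypothetical, the log.distance option is left out, and every species is assumed to have a positive total abundance.

```r
# Sketch only: quantitative Kulczynski distances between the columns
# (species) of an abundance matrix whose rows are regions.
qkulc_sketch <- function(regmat) {
  regmat <- as.matrix(regmat)
  n <- ncol(regmat)
  totals <- colSums(regmat)
  out <- matrix(0, n, n,
                dimnames = list(colnames(regmat), colnames(regmat)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # abundance shared by species i and j, summed over regions
      shared <- sum(pmin(regmat[, i], regmat[, j]))
      out[i, j] <- 1 - 0.5 * (shared / totals[i] + shared / totals[j])
    }
  }
  out
}

m <- cbind(a = c(1, 1, 0, 0),
           b = c(1, 0, 1, 0),
           c = c(2, 0, 0, 0))
dk <- qkulc_sketch(m)
dk
```

For the 0-1 columns a and b, which share one of their two regions each, the sketch gives 1 - 0.5 * (1/2 + 1/2) = 0.5, in line with the standard Kulczynski distance.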
A symmetrical matrix of quantitative Kulczynski distances.
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
D. P. Faith, P. R. Minchin and L. Belbin (1987) Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69, 57-68.
options(digits=4)
data(kykladspecreg)
qkulczynski(t(kykladspecreg))
Generates a simulated matrix where the rows are interpreted as regions
and the columns as species, 1 means that a species is present in the
region and 0 means that the species is absent. Species are generated
i.i.d. Spatial autocorrelation of a species' presences is governed by
the parameter p.nb
and a list of neighbors for each region.
randpop.nb(neighbors, p.nb = 0.5, n.species, n.regions = length(neighbors),
  vector.species = rep(1, n.species), species.fixed = FALSE,
  pdf.regions = rep(1/n.regions, n.regions), count = TRUE, pdfnb = FALSE)
neighbors |
A list with a component for every region. The
components are vectors of integers indicating
neighboring regions. A region without neighbors (e.g., an island)
should be assigned a list |
p.nb |
numerical between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. Note that for a given
presence-absence matrix, this parameter can be estimated by
|
n.species |
integer. Number of species. |
n.regions |
integer. Number of regions. |
vector.species |
vector of integers. If
|
species.fixed |
logical. See |
pdf.regions |
numerical vector of length |
count |
logical. If |
pdfnb |
logical. If |
The principle is that a single species with given size is generated
region by region. The first region is drawn according to
pdf.regions
. For all following regions, a neighbor or
non-neighbor of the previous configuration is added (if possible),
as explained in pdf.regions
, p.nb
.
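The region-by-region principle can be sketched for a single species as follows. This is a simplified illustration, not the algorithm of randpop.nb: the function name gen_species_sketch is hypothetical, candidate regions are weighted by pdf.regions as an assumption, and falling back to all remaining regions when the chosen pool is empty is one possible reading of "if possible".

```r
# Sketch only: generate the presences of one species of a given size.
gen_species_sketch <- function(neighbors, size, p.nb = 0.5,
    pdf.regions = rep(1 / length(neighbors), length(neighbors))) {
  n <- length(neighbors)
  stopifnot(size <= n)
  # draw one candidate; a single candidate is returned directly because
  # sample() would treat a length-one number as a range
  pick <- function(cand, w) if (length(cand) == 1) cand else sample(cand, 1, prob = w)
  present <- pick(seq_len(n), pdf.regions)
  while (length(present) < size) {
    nbs <- setdiff(unique(unlist(neighbors[present])), present)
    non_nbs <- setdiff(seq_len(n), c(present, nbs))
    # with probability p.nb the new region comes from the non-neighborhood
    cand <- if (runif(1) < p.nb) non_nbs else nbs
    if (length(cand) == 0) cand <- c(nbs, non_nbs)  # assumed fallback
    present <- c(present, pick(cand, pdf.regions[cand]))
  }
  as.integer(seq_len(n) %in% present)
}

# Four regions in a chain: 1 - 2 - 3 - 4
chain <- list(2, c(1, 3), c(2, 4), 3)
set.seed(1)
v <- gen_species_sketch(chain, size = 3, p.nb = 0)
v
```

With p.nb = 0 every new region is a neighbor of the regions already occupied, so on the chain the three presences are always contiguous.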
A 0-1-matrix, rows are regions, columns are species.
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hennig, C. and Hausdorf, B. (2004) Distance-based parametric bootstrap tests for clustering of species ranges. Computational Statistics and Data Analysis 45, 875-896. http://stat.ethz.ch/Research-Reports/110.html.
Hausdorf, B. and Hennig, C. (2003) Biotic Element Analysis in Biogeography. Systematic Biology 52, 717-723.
Hausdorf, B. and Hennig, C. (2003) Nestedness of north-west European land snail ranges as a consequence of differential immigration from Pleistocene glacial refuges. Oecologia 135, 102-109.
autoconst
estimates p.nb
from matrices of class
prab
. These are generated by prabinit
.
prabtest
uses randpop.nb
as a null model for
tests of clustering. An alternative model is given by
cluspop.nb
.
data(nb)
set.seed(2346)
randpop.nb(nb, p.nb=0.1, n.species=5, vector.species=c(1,10,20,30,34))
Given two dissimilarity matrices dmx
and dmy
and an indicator
vector x
, this computes a standard least squares regression
between the dissimilarities among the objects indicated in x
.
regdist(x,dmx,dmy,xcenter=0,param)
x |
vector of logicals of length of the number of objects on which
dissimilarities |
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
xcenter |
numeric. Dissimilarities |
param |
1 or 2 or |
If param=NULL, the output object of lm. If param=1, the intercept. If param=2, the slope.
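A minimal base R sketch of a regression between dissimilarities is given below. It is not the code of regdist; the toy data are invented, and treating dmy as the response, dmx as the explanatory variable, and xcenter as a value subtracted from the dmx dissimilarities are assumptions made for the illustration.

```r
# Toy data: six objects on a line; dmy is an exact linear function of dmx.
pos <- c(0, 1, 3, 6, 10, 15)
dmx <- as.matrix(dist(pos))
dmy <- 0.5 + 0.2 * dmx
diag(dmy) <- 0

x <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)  # use the first five objects
xcenter <- 0                                 # assumed: subtracted from dmx

# One value per pair of selected objects (lower triangle only).
dx <- as.vector(as.dist(dmx[x, x])) - xcenter
dy <- as.vector(as.dist(dmy[x, x]))

fit <- lm(dy ~ dx)
intercept <- unname(coef(fit)[1])  # corresponds to param=1
slope <- unname(coef(fit)[2])      # corresponds to param=2
c(intercept, slope)
```

Each pair of objects contributes one observation, so five selected objects give ten points for the regression.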
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[1:20,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[1:20,],distance="jaccard")
regdist(c(rep(TRUE,10),rep(FALSE,10)),ver.geo,vei$distmat,param=1)
Jackknife-based test for equality of two regressions between distances. Given two groups of objects, this tests whether the regression involving all distances is compatible with the regression involving within-group distances only.
regdistbetween(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2])

## S3 method for class 'regdistbetween'
print(x,...)
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
something that can be coerced into a factor,
defining the grouping of
objects represented by the dissimilarities |
groups |
Vector of two levels. The two groups defining the
regressions to be compared in the test. These can be
factor levels, integer numbers, or strings, depending on the entries
of |
x |
object of class |
... |
optional arguments for print method. |
The null hypothesis that the regressions based on all distances and based on within-group distances only are equal is tested using jackknife pseudovalues. This assumes that a single regression is appropriate at least for the within-group distances alone. The test statistic is the difference between fitted values with x (explanatory variable) fixed at the center of the between-group distances. The test is run one-sided, i.e., the null hypothesis is only rejected if the between-group distances are larger than expected under the null hypothesis, see below.
The test cannot be run if the within-group regressions or the jackknifed within-group regressions are ill-conditioned.
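The jackknife logic described above can be sketched in base R. This is an illustration under stated assumptions, not the implementation of regdistbetween: the toy data, the helper fitdiff, leaving out one object at a time, and the use of n - 1 degrees of freedom are all choices made for this sketch.

```r
set.seed(7)
n <- 12
grouping <- rep(1:2, each = 6)
pos <- c(rnorm(6, mean = 0), rnorm(6, mean = 3))
dmx <- as.matrix(dist(pos))                    # explanatory dissimilarities
noise <- matrix(0, n, n)
noise[lower.tri(noise)] <- rnorm(n * (n - 1) / 2, sd = 0.1)
noise <- noise + t(noise)                      # symmetric noise
between <- outer(grouping, grouping, "!=")     # TRUE for between-group pairs
dmy <- 1 + 0.5 * dmx + 0.5 * between + noise   # response dissimilarities
diag(dmy) <- 0

# Center of the between-group distances of the explanatory variable.
x0 <- mean(dmx[lower.tri(dmx) & between])

# Hypothetical helper: fitted value of the all-distances regression minus
# that of the within-group regression at x0, using the objects in idx.
fitdiff <- function(idx) {
  low <- lower.tri(dmx[idx, idx])
  dx <- dmx[idx, idx][low]
  dy <- dmy[idx, idx][low]
  is.within <- !between[idx, idx][low]
  nd <- data.frame(dx = x0)
  unname(predict(lm(dy ~ dx), nd) -
         predict(lm(dy ~ dx, subset = is.within), nd))
}

theta <- fitdiff(seq_len(n))
loo <- sapply(seq_len(n), function(i) fitdiff(setdiff(seq_len(n), i)))
pseudo <- n * theta - (n - 1) * loo            # jackknife pseudovalues
jackest <- mean(pseudo)
jackse <- sd(pseudo) / sqrt(n)
tstat <- jackest / jackse
# One-sided: only large positive differences count against the null.
pval <- pt(tstat, df = n - 1, lower.tail = FALSE)
pval
```

Only the upper tail is used, which mirrors the statement that the null hypothesis is rejected only if the between-group distances are larger than expected.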
This was implemented having in mind an application in which the explanatory distances represent geographical distances, the response distances are genetic distances, and groups represent species or species-candidates. In this application, for testing whether the regression patterns are compatible with the two groups behaving like a single species, one would first use regeqdist to test whether a joint regression for the within-group distances of both groups makes sense. If this is not rejected, regdistbetween is run to see whether the between-group distances are compatible with the within-group distances. This is only rejected if the between-group distances are larger than expected under equality of regressions, because if they are smaller, this is not an indication against the groups belonging together genetically.
If a joint regression on within-group distances is rejected by regeqdist, regdistbetweenone can be used to test whether the between-group distances are at least compatible with the within-group distances of one of the groups, which can still be the case within a single species, see Hausdorf and Hennig (2019).
list of class "regdistbetween"
with components
pval |
p-value. |
coeffdiff |
difference between regression fits (all distances
minus within-group distances only) at |
condition |
condition numbers of regressions, see |
lmfit |
list. Output objects of |
jr |
output object of |
xcenter |
mean of within-groups distances of explanatory variable, used for centering. |
xcenterbetween |
mean of between-groups distances of explanatory
variable (after centering by |
tstat |
t-statistic. |
tdf |
degrees of freedom of t-statistic. |
jackest |
jackknife-estimator of difference between regression
fitted values at |
jackse |
jackknife-standard error for
|
jackpseudo |
vector of jackknife pseudovalues on which the test is based. |
testname |
title to be printed out when using
|
groups |
see above. |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
species <- c(rep(1,13),rep(2,22))
rtest2 <- regdistbetween(dmx=loggeo,dmy=vei$distmat,grouping=species,groups=c(1,2))
print(rtest2)
Jackknife-based test for equality of two regressions between distances. Given two groups of objects, this tests whether the regression involving the distances within one of the groups is compatible with the regression involving the same within-group distances together with the between group distances.
regdistbetweenone(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2],rgroup)
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
something that can be coerced into a factor,
defining the grouping of
objects represented by the dissimilarities |
groups |
vector of two levels. The two groups defining the
regressions to be compared in the test. These can be
factor levels, integer numbers, or strings, depending on the entries
of |
rgroup |
one of the levels in |
The null hypothesis that the regressions based on the distances within group rgroup and based on these distances together with the between-groups distances are equal is tested using jackknife pseudovalues. The test statistic is the difference between fitted values with x (explanatory variable) fixed at the center of the between-group distances. The test is run one-sided, i.e., the null hypothesis is only rejected if the between-group distances are larger than expected under the null hypothesis, see below. For the jackknife, observations from both groups are left out one at a time. However, the roles of the two groups are different (observations from group rgroup are used in both regressions whereas observations from the other group are only used in one of them), and therefore the corresponding jackknife pseudovalues can have different variances. To take this into account, variances are pooled, and the degrees of freedom of the t-test are computed by the Welch-Satterthwaite approximation for aggregation of different variances.
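For orientation, the textbook Welch-Satterthwaite formula for two samples with different variances can be written in base R as below. This is a generic sketch; how exactly regdistbetweenone pools the pseudovalue variances is not shown here, and the helper name welch_df and the simulated values are assumptions.

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom for the
# difference of two sample means with unequal variances.
welch_df <- function(a, b) {
  va <- var(a) / length(a)
  vb <- var(b) / length(b)
  (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
}

# Simulated stand-ins for pseudovalues from two groups of sizes 13 and 22.
set.seed(3)
a <- rnorm(13, sd = 1)
b <- rnorm(22, sd = 3)

df.ws <- welch_df(a, b)
# Cross-check against the degrees of freedom used by Welch's t-test in R.
df.ref <- unname(t.test(a, b)$parameter)
c(df.ws, df.ref)
```

The resulting degrees of freedom lie between the smaller group size minus one and the total sample size minus two.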
The test cannot be run and many components will be NA if the within-group regressions or the jackknifed within-group regressions are ill-conditioned.
This was implemented having in mind an application in which the explanatory distances represent geographical distances, the response distances are genetic distances, and groups represent species or species-candidates. In this application, for testing whether the regression patterns are compatible with the two groups behaving like a single species, one would first use regeqdist to test whether a joint regression for the within-group distances of both groups makes sense. If this is not rejected, regdistbetween is run to see whether the between-group distances are compatible with the within-group distances.
If a joint regression on within-group distances is rejected by regeqdist, regdistbetweenone can be used to test whether the between-group distances are at least compatible with the within-group distances of one of the groups, which can still be the case within a single species, see Hausdorf and Hennig (2019). This is only rejected if the between-group distances are larger than expected under equality of regressions, because if they are smaller, this is not an indication against the groups belonging together genetically. To this end, regdistbetweenone needs to be run twice, using each of the two groups in turn as rgroup. This will produce two p-values. The null hypothesis that the regressions are compatible for at least one group can be rejected if the maximum of the two p-values is smaller than the chosen significance level.
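The decision rule based on the two p-values amounts to the following. The numbers are invented for illustration; in practice they would be the pval components of two regdistbetweenone runs, one per choice of rgroup.

```r
# Hypothetical p-values from the two runs (rgroup set to each group in turn).
p.group1 <- 0.012
p.group2 <- 0.048
alpha <- 0.05

# Reject "compatible for at least one group" only if both runs reject,
# i.e. if the larger of the two p-values is below the significance level.
reject <- max(p.group1, p.group2) < alpha
reject
```

Taking the maximum means that a single large p-value is enough to keep the null hypothesis.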
list of class "regdistbetween"
with components
pval |
p-value. |
coeffdiff |
difference between regression fits (within-group
together with between-groups distances
minus within-group distances only) at |
condition |
condition numbers of regressions, see |
lmfit |
list. Output objects of |
jr |
output object of |
xcenter |
mean of within-group distances for group |
xcenterbetween |
mean of between-groups distances of explanatory
variable (after centering by |
tstat |
t-statistic. |
tdf |
degrees of freedom of t-statistic according to Welch-Satterthwaite approximation. |
jackest |
jackknife-estimator of difference between regression
fitted values at |
jackse |
jackknife-standard error for
|
jackpseudo |
vector of jackknife pseudovalues on which the test is based. |
groups |
see above. |
species |
see above. |
testname |
title to be printed out when using
|
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
species <- c(rep(1,13),rep(2,22))
loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
rtest3 <- regdistbetweenone(dmx=loggeo,dmy=vei$distmat,grouping=species,groups=c(1,2),rgroup=1)
print(rtest3)
Given two dissimilarity matrices dmx and dmy, an indicator vector x and a grouping, this computes the difference between standard least squares regression predictions at point xcenterbetween. The regressions are based on the dissimilarities in dmx vs. dmy for objects indicated in x. grouping indicates the two groups, and the difference is computed between regressions based on the within-group distances of the two groups.
regdistdiff(x,dmx,dmy,grouping,xcenter=0,xcenterbetween=0)
x |
vector of logicals of length of the number of objects on which
dissimilarities |
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
vector of length of the number of objects on which
dissimilarities |
xcenter |
numeric. Dissimilarities |
xcenterbetween |
numeric. This specifies the x- (dissimilarity)
value at which predictions from the two regressions are
compared. Note that this is interpreted as after centering by
|
Difference between standard least squares regression predictions for the two groups at point xcenterbetween.
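The quantity returned can be illustrated with a base R sketch. It is not the code of regdistdiff: the toy data, the helper within_fit, and the sign convention (first group minus second group) are assumptions.

```r
# Toy data: two groups of five objects with exact, but different, linear
# relations between the within-group dissimilarities.
pos <- c(0, 1, 3, 6, 10, 2, 4, 5, 9, 14)
grouping <- rep(1:2, each = 5)
dmx <- as.matrix(dist(pos))
dmy <- dmx
dmy[1:5, 1:5] <- 1 + 0.2 * dmx[1:5, 1:5]
dmy[6:10, 6:10] <- 2 + 0.1 * dmx[6:10, 6:10]
diag(dmy) <- 0

# Hypothetical helper: least squares regression on the within-group
# dissimilarities of group g.
within_fit <- function(g) {
  sel <- grouping == g
  dx <- as.vector(as.dist(dmx[sel, sel]))
  dy <- as.vector(as.dist(dmy[sel, sel]))
  lm(dy ~ dx)
}

xcenterbetween <- 5
nd <- data.frame(dx = xcenterbetween)
# Group 1 predicts 1 + 0.2 * 5 = 2, group 2 predicts 2 + 0.1 * 5 = 2.5.
pred.diff <- unname(predict(within_fit(1), nd) - predict(within_fit(2), nd))
pred.diff
```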
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
species <- c(rep(1,13),rep(2,22))
regdistdiff(rep(TRUE,35),ver.geo,vei$distmat,grouping=species,xcenter=0,xcenterbetween=100)
Given two dissimilarity matrices dmx and dmy, an indicator vector x and a grouping, this computes the difference between standard least squares regression predictions at point xcenterbetween. The regressions are based on the dissimilarities in dmx vs. dmy for objects indicated in x. grouping indicates the two groups, and the difference is computed between regressions based on (a) the within-group distances of the reference group rgroup and (b) these together with the between-group distances.
regdistdiffone(x,dmx,dmy,grouping,xcenter=0,xcenterbetween=0,rgroup)
x |
vector of logicals of length of the number of objects on which
dissimilarities |
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
vector of length of the number of objects on which
dissimilarities |
xcenter |
numeric. Dissimilarities |
xcenterbetween |
numeric. This specifies the x- (dissimilarity)
value at which predictions from the two regressions are
compared. Note that this is interpreted as after centering by
|
rgroup |
one of the values of |
Difference between standard least squares regression predictions for the two regressions at point xcenterbetween.
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],
                      file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
species <- c(rep(1,13),rep(2,22))
regdistdiffone(rep(TRUE,35),ver.geo,vei$distmat,grouping=species,
               xcenter=0,xcenterbetween=100,rgroup=2)
Jackknife-based test for equality of two regressions between distance matrices.
regeqdist(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2])

## S3 method for class 'regeqdist'
print(x,...)
dmx |
dissimilarity matrix or object of class
|
dmy |
dissimilarity matrix or object of class
|
grouping |
something that can be coerced into a factor,
defining the grouping of
objects represented by the dissimilarities |
groups |
Vector of two, indicating the two groups defining the
regressions to be compared in the test. These can be
factor levels, integer numbers, or strings, depending on the entries
of |
x |
object of class |
... |
optional arguments for print method. |
The null hypothesis that the regressions within the two groups are equal is tested using jackknife pseudovalues independently in both groups allowing for potentially different variances of the pseudovalues, and aggregating as in Welch's t-test. Tests are run separately for intercept and slope and aggregated by Bonferroni's rule.
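Aggregation of the intercept and slope tests by Bonferroni's rule can be sketched as follows. The p-values are invented, and whether the package reports an adjusted p-value in exactly this form is an assumption; the sketch only shows the rule itself.

```r
# Hypothetical p-values for the intercept and slope tests.
pval <- c(intercept = 0.030, slope = 0.400)
alpha <- 0.05

# Bonferroni: with k tests, multiply the smallest p-value by k (capped at 1)
# and compare with alpha; equivalently, compare min(p) with alpha / k.
k <- length(pval)
p.bonf <- min(1, k * min(pval))
reject <- p.bonf < alpha
c(p.bonf = p.bonf, reject = reject)
```

In this toy case the intercept test alone would be significant at the 5% level, but the aggregated test is not.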
The test cannot be run and many components will be NA if the within-group regressions or the jackknifed within-group regressions are ill-conditioned.
This was implemented having in mind an application in which the explanatory distances represent geographical distances, the response distances are genetic distances, and groups represent species or species-candidates. In this application, for testing whether the regression patterns are compatible with the two groups behaving like a single species, one would first use regeqdist to test whether a joint regression for the within-group distances of both groups makes sense. If this is not rejected, regdistbetween is run to see whether the between-group distances are compatible with the within-group distances. On the other hand, if a joint regression on within-group distances is rejected, regdistbetweenone can be used to test whether the between-group distances are at least compatible with the within-group distances of one of the groups, which can still be the case within a single species, see Hausdorf and Hennig (2019).
list of class "regeqdist"
with components
pval |
p-values for intercept and slope. |
coeffdiff |
vector of differences between groups (first minus second) for intercept and slope. |
condition |
condition numbers of regressions, see |
lmfit |
list. Output objects of |
jr |
list of two lists of two; output object of
|
xcenter |
mean of |
tstat |
t-statistic. |
tdf |
vector of degrees of freedom of t-statistic according to Welch-Satterthwaite approximation for intercept and slope. |
jackest |
jackknife-estimator of difference between regressions; vector with intercept and slope difference. |
jackse |
vector with jackknife-standard errors for
|
jackpseudo |
list of two lists of vectors; jackknife pseudovalues within both groups for intercept and slope estimators. |
groups |
see above. |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.
regdistbetween
, regdistbetweenone
options(digits=4)
data(veronica)
ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")
loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
species <- c(rep(1,13),rep(2,22))
rtest <- regeqdist(dmx=loggeo,dmy=vei$distmat,grouping=species,groups=c(1,2))
print(rtest)
Generates a simulated matrix where the rows are interpreted as regions and the columns as species, and the entries are abundances. Species are generated i.i.d. in two steps. In the first step, a presence-absence matrix is generated as in randpop.nb. In the second step, conditionally on presence in the first step, abundance values are generated according to a simultaneous autoregression (SAR) model for the log-abundances (see errorsarlm for the model; estimates are provided by the parameter sarestimate). Spatial autocorrelation of a species' presences is governed by the parameters p.nb and sarestimate and a list of neighbors for each region.
regpop.sar(abmat, prab01=NULL, sarestimate=prab.sarestimate(abmat),
           p.nb=NULL, vector.species=prab01$regperspec,
           pdf.regions=prab01$specperreg/(sum(prab01$specperreg)),
           count=FALSE)
abmat |
object of class |
prab01 |
presence-absence matrix of the same dimensions as the
abundance matrix of |
sarestimate |
Estimator of the parameters of a simultaneous
autoregression model corresponding to the null model for abundance
data from Hausdorf and Hennig (2007) as generated by
|
p.nb |
numeric between 0 and 1. The probability that a new
region is drawn from the non-neighborhood of the previous regions
belonging to a species under generation. If |
vector.species |
vector of integers. |
pdf.regions |
numerical vector of length |
count |
logical. If |
A matrix of abundance values, rows are regions, columns are species.
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
Hausdorf, B. and Hennig, C. (2007) Null model tests of clustering of species, negative co-occurrence patterns and nestedness in meta-communities. Oikos 116, 818-828.
autoconst
estimates p.nb
from matrices of class
prab
. These are generated by prabinit
.
abundtest
uses regpop.sar
as a null model for
tests of clustering.
randpop.nb
(analogous function for simulating
presence-absence data)
options(digits=4)
data(siskiyou)
set.seed(1234)
x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
              distance="none")
# Not run; this needs package spdep.
# regpop.sar(x, p.nb=0.046)
regpop.sar(x, p.nb=0.046, sarestimate=prab.sarestimate(x,sar=FALSE))
Distributions of species of herbs in relation to elevation on quartz diorite in the central Siskiyou Mountains. All values are per mille frequencies in transects (the number of 1 m² quadrats, among 1000 such quadrats, in which a species was observed, based on 1250 1 m² quadrats in the first 5 transects and 400 1 m² quadrats in the sixth transect). Observed presences in the transect, outside the sampling plots, were coded as 0.2. Rows correspond to species, columns correspond to regions.
data(siskiyou)
Three objects are generated:
siskiyou |
numeric matrix giving the 144*6 abundance values. |
siskiyou.nb |
neighborhood list for the 6 regions. |
siskiyou.groups |
integer vector of length 144, giving group memberships for the 144 species. |
Reads from example data files LeiMik1.dat, LeiMik1NB.dat,
LeiMik1G.dat
.
Whittaker, R. H. 1960. Vegetation of the Siskiyou Mountains, Oregon and California. Ecol. Monogr. 30: 279-338 (table 14).
data(siskiyou)
Generates average within-group distances (overall and group-wise) from a dissimilarity matrix and a given grouping.
specgroups(distmat,groupvector, groupinfo)
distmat |
dissimilarity matrix or |
groupvector |
integer vector. For every row of |
groupinfo |
list with components |
A list with components
overall |
overall average within-groups dissimilarity. |
gr |
vector of group-wise average within-group dissimilarities
(this will be |
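The two kinds of averages can be illustrated in base R. This sketch is not the code of specgroups; the toy data are invented, and taking the overall value as the mean over all within-group pairs is an assumption about the definition.

```r
set.seed(5)
pts <- matrix(rnorm(16), ncol = 2)          # eight objects in the plane
distmat <- as.matrix(dist(pts))
groupvector <- c(1, 1, 1, 2, 2, 2, 2, 2)    # groups of sizes 3 and 5

# Group-wise average within-group dissimilarity.
gr <- sapply(sort(unique(groupvector)), function(g) {
  sel <- groupvector == g
  sub <- distmat[sel, sel, drop = FALSE]
  mean(sub[lower.tri(sub)])
})

# Overall average over all pairs of objects that share a group (assumed
# definition): each within-group pair counts once.
same <- outer(groupvector, groupvector, "==")
overall <- mean(distmat[lower.tri(distmat) & same])

gr
overall
```

Under this definition the overall value is a weighted mean of the group-wise values, with weights given by the number of pairs in each group (3 and 10 here).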
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
options(digits=4)
data(siskiyou)
x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
              distance="logkulczynski")
groupvector <- as.factor(siskiyou.groups)
ng <- length(levels(groupvector))
lg <- levels(groupvector)
nsg <- numeric(0)
for (i in 1:ng) nsg[i] <- sum(groupvector==lg[i])
groupinfo <- list(lg=lg,ng=ng,nsg=nsg)
specgroups(x$distmat,groupvector,groupinfo)
Computes Kruskal's nonmetric multidimensional scaling (isoMDS) on alleleobject or prab-objects for different output dimensions in order to compare stress values.
stressvals(x,mdsdim=1:12,trace=FALSE)
x |
object of class |
mdsdim |
integer vector of MDS numbers of dimensions to be tried. |
trace |
logical. |
Note that zero distances between non-identical objects are replaced by the smallest nonzero distance divided by 10 to prevent isoMDS from producing an error.
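The replacement of zero distances can be sketched in base R as follows. This is an illustration on invented data, not the code of stressvals.

```r
# Four objects; objects 1 and 2 coincide, so their distance is zero.
d <- as.matrix(dist(c(0, 0, 1, 3)))

off <- row(d) != col(d)                # pairs of distinct objects
smallest <- min(d[off & d > 0])        # smallest nonzero distance (1 here)
d[off & d == 0] <- smallest / 10       # zero distances become 0.1

d
```

The diagonal stays at zero; only distances between distinct objects are changed.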
A list with components
MDSstress |
vector of stress values. |
mdsout |
list of full outputs of |
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
options(digits=4)
data(tetragonula)
set.seed(112233)
taiselect <- sample(236,40)
# Use data subset to make execution faster.
tnb <- coord2dist(coordmatrix=tetragonula.coord[taiselect,],
                  cut=50,file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[taiselect,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist)
stressvals(tai,mdsdim=1:3)$MDSstress
Genetic data for 236 Tetragonula (Apidae) bees from Australia and Southeast Asia, see Franck et al. (2004). The data give pairs of alleles (codominant markers) for 13 microsatellite loci.
data(tetragonula)
Two objects are generated:
tetragonula |
A data frame with 236 observations and 13 string variables. Strings consist of six digits each. The format is derived from the data format used by the software GENEPOP (Rousset 2008). Alleles have a three digit code, so a value of "258260" on variable V10 means that on locus 10 the two alleles have codes 258 and 260. "000" refers to missing values. |
tetragonula.coord |
a 236*2 matrix. Coordinates of locations of individuals in decimal format, i.e. the first number is latitude (negative values are South), with minutes and seconds converted to fractions. The second number is longitude (negative values are West). |
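The six-digit string format can be decoded with a few lines of base R. This is only an illustration of the format described above; the variable names are invented, and representing missing alleles as NA is a choice made for this sketch.

```r
# One locus of one individual: two alleles with three-digit codes.
code <- "258260"
alleles <- c(substr(code, 1, 3), substr(code, 4, 6))
alleles[alleles == "000"] <- NA   # "000" marks a missing value
alleles

# The same for several entries at once, including a fully missing one.
codes <- c("258260", "000000")
first <- substr(codes, 1, 3)
second <- substr(codes, 4, 6)
first[first == "000"] <- NA
second[second == "000"] <- NA
cbind(first, second)
```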
Reads from example data file Heterotrigona_indoFO.dat
.
Franck, P., E. Cameron, G. Good, J.-Y. Rasplus, and B. P. Oldroyd (2004) Nest architecture and genetic differentiation in a species complex of Australian stingless bees. Mol. Ecol. 13, 2317-2331.
Rousset, F. (2008) genepop'007: a complete re-implementation of the genepop software for Windows and Linux. Molecular Ecology Resources 8, 103-106.
data(tetragonula)
Converts abundance matrix into binary (logical) presence/absence matrix (TRUE if abundance>0).
toprab(prabobj)
prabobj |
object of class |
Logical matrix with same dimensions as prabobj$prab as described above.
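On a plain numeric matrix, the conversion rule corresponds to a simple comparison. The sketch below uses an invented abundance matrix rather than a prab object, so it illustrates the rule and not the function toprab itself.

```r
# Invented abundance matrix: 2 rows, 3 columns.
abund <- matrix(c(0, 0.2, 5, 0, 12, 0), nrow = 2)

# TRUE wherever the abundance is positive.
presence <- abund > 0
presence
```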
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
data(siskiyou)
x <- prabinit(prabmatrix=siskiyou, neighborhood=siskiyou.nb,
              distance="none")
toprab(x)
Creates a list of lists, such as required by alleledist, from the charmatrix component of an alleleobject.
unbuild.charmatrix(charmatrix,n.individuals,n.variables)
charmatrix |
matrix of characters in which there are two rows for
every individual corresponding to the two alleles in every locus
(column). Entries are allele codes but missing values are coded as
|
n.individuals |
integer. Number of individuals. |
n.variables |
integer. Number of loci. |
A list of lists. In the "outer" list, there are n.variables lists, one for each locus. In the "inner" list, for every individual there is a vector of two codes (typically characters, see alleleinit) for the two alleles in that locus.
Christian Hennig [email protected] https://www.unibo.it/sitoweb/christian.hennig/en
data(tetragonula)
tnb <- coord2dist(coordmatrix=tetragonula.coord[1:50,],cut=50,
                  file.format="decimal2",neighbors=TRUE)
ta <- alleleconvert(strmatrix=tetragonula[1:50,])
tai <- alleleinit(allelematrix=ta,neighborhood=tnb$nblist,distance="none")
str(unbuild.charmatrix(tai$charmatrix,50,13))
0-1 data indicating whether dominant markers are present for 583 different AFLP bands ranging from 61 to 454 bp of 207 plant individuals of Veronica (Pentasepalae) from the Iberian Peninsula and Morocco (Martinez-Ortega et al., 2004).
data(veronica)
Two objects are generated:
veronica |
0-1 matrix with 207 individuals (rows) and 583 AFLP bands (columns). |
veronica.coord |
a 207*2 matrix. Coordinates of locations of individuals in decimal format, i.e. the first number is latitude (negative values are South), with minutes and seconds converted to fractions. The second number is longitude (negative values are West). |
Reads from example data files MartinezOrtega04AFLP.dat,
MartinezKoord.dat
.
Martinez-Ortega, M. M., L. Delgado, D. C. Albach, J. A. Elena-Rossello, and E. Rico (2004). Species boundaries and phylogeographic patterns in cryptic taxa inferred from AFLP markers: Veronica subgen. Pentasepalae (Scrophulariaceae) in the Western Mediterranean. Syst. Bot. 29, 965-986.
data(veronica)
Distance matrix of overwater distances in km between 34 islands in the Aegean sea.
data(waterdist)
A symmetric 34*34 distance matrix.
Reads from example data file Waterdist.dat
, in which there is a
35th column and line with distances to Turkey mainland.
B. Hausdorf and C. Hennig (2005) The influence of recent geography, palaeogeography and climate on the composition of the fauna of the central Aegean Islands. Biological Journal of the Linnean Society 84, 785-795.
data(waterdist)