Title: | Computes Copula using Ranks and Subsampling |
---|---|
Description: | Estimation of copula using ranks and subsampling. The main feature of this method is that simulation studies show a low sensitivity to dimension, on realistic cases. |
Authors: | Jerome Collet |
Maintainer: | Jerome Collet <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.9.9.3 |
Built: | 2024-11-18 06:34:54 UTC |
Source: | CRAN |
Estimation of copula using ranks and subsampling. The main feature of this method is that simulation studies show a low sensitivity to dimension, on realistic cases.
The DESCRIPTION file:
Package: | subrank |
Type: | Package |
Title: | Computes Copula using Ranks and Subsampling |
Version: | 0.9.9.3 |
Date: | 2023-04-06 |
Author: | Jerome Collet |
Maintainer: | Jerome Collet <[email protected]> |
Description: | Estimation of copula using ranks and subsampling. The main feature of this method is that simulation studies show a low sensitivity to dimension, on realistic cases. |
License: | GPL (>= 3) |
LazyLoad: | yes |
NeedsCompilation: | yes |
Packaged: | 2023-04-05 13:54:30 UTC; Jerome |
Repository: | CRAN |
Date/Publication: | 2023-04-05 17:50:05 UTC |
Index of help topics:
corc Function to estimate copula using ranks and sub-sampling corc0 Function to estimate copula using ranks and sub-sampling, minimal version. desscop Discrete copula graph, a two-dimensional projection desscoptous Discrete copula graph, ALL two-dimensional projections estimdep Dependence estimation predictdep Probability forecasting predonfly Probability forecasting simany Test statistic distribution under any hypothesis simnul Test statistic distribution under independence hypothesis subrank-package Computes Copula using Ranks and Subsampling
Taking a sample, its dimension, and a sub-sample size, allows to estimate a discretized copula. This object has interesting features: convergence to copula, robustness with respect to dimension.
Jerome Collet
Maintainer: Jerome Collet <[email protected]>
lon <- 31 a <- 2.85 x <- rnorm(lon) y = a*x^2+rnorm(lon) tablo = as.data.frame(cbind(x,y)) c=corc(tablo,c(1,2),8) desscop(c,1,2)
lon <- 31 a <- 2.85 x <- rnorm(lon) y = a*x^2+rnorm(lon) tablo = as.data.frame(cbind(x,y)) c=corc(tablo,c(1,2),8) desscop(c,1,2)
Takes a sample, its dimension, a sub-sample size, and returns a discrete copula.
corc(dataframe, varnames, subsampsize, nbsafe=5,mixties=FALSE,nthreads=2)
corc(dataframe, varnames, subsampsize, nbsafe=5,mixties=FALSE,nthreads=2)
dataframe |
a data frame, containing the observations |
varnames |
the name of the variables we want to estimate the dependence between |
subsampsize |
the sub-sample size |
nbsafe |
the ratio between the number of sub-samples and the cardinality of the discretized copula. |
mixties |
if |
nthreads |
number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() |
cop |
an array representing the discretized copula |
ties |
the number of sub-samples with a tie |
nsubsampreal |
the effective number of sub-samples drawn |
varnames |
the name of the variables studied |
nnm |
the number of observations without missing values |
Jerome Collet
lon <- 30 a <- 2 x <- rnorm(lon) y = a*x^2+rnorm(lon) datatable = as.data.frame(cbind(x,y)) c=corc(datatable,c("x","y"),8) c sum(c$cop)
lon <- 30 a <- 2 x <- rnorm(lon) y = a*x^2+rnorm(lon) datatable = as.data.frame(cbind(x,y)) c=corc(datatable,c("x","y"),8) c sum(c$cop)
Minimal version of function corc.
corc0(datavector,sampsize,dimension,subsampsize,nboot,u,mixties=FALSE,nthreads=2)
corc0(datavector,sampsize,dimension,subsampsize,nboot,u,mixties=FALSE,nthreads=2)
datavector |
a vector, containing the observations |
sampsize |
the sample size |
dimension |
the sample dimension |
subsampsize |
the sub-sample size |
nboot |
the number of sub-samples (must be big) |
u |
a random seed, integer |
mixties |
if |
nthreads |
number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() |
the number of hits for each vector of ranks, plus 2 last values of the vector : number of ties and number of sub-samples really used.
Jerome Collet
lon <- 30 a <- 2.85 x <- rnorm(lon) y = a*x^2+rnorm(lon) c=corc0(c(x,y),lon,2,8,1e5,75014) c c0=c( 1203, 1671, 1766, 959, 1586, 1715, 1803, 1205, 1260,1988, 2348, 1917, 3506, 2045, 1340, 1093, 2694, 2757,2233, 1085, 2322, 1793, 1569, 1263, 1709, 1747, 1512,1308, 1778, 1354, 1184, 1097, 2487, 2730, 2112, 1100,2435, 2033, 1572, 1093, 1369, 1722, 1462, 1015, 1228, 1419, 1776, 1852, 1009, 1097, 1179, 1323, 1595, 1316,1477, 2628, 889, 1178, 1981, 4000, 35, 840, 2091, 4467,0, 27405) set.seed(75013) lon=30 dimension=3 sssize=4 c0==corc0(rnorm(lon*dimension),lon,dimension,sssize,1e5,75014)
lon <- 30 a <- 2.85 x <- rnorm(lon) y = a*x^2+rnorm(lon) c=corc0(c(x,y),lon,2,8,1e5,75014) c c0=c( 1203, 1671, 1766, 959, 1586, 1715, 1803, 1205, 1260,1988, 2348, 1917, 3506, 2045, 1340, 1093, 2694, 2757,2233, 1085, 2322, 1793, 1569, 1263, 1709, 1747, 1512,1308, 1778, 1354, 1184, 1097, 2487, 2730, 2112, 1100,2435, 2033, 1572, 1093, 1369, 1722, 1462, 1015, 1228, 1419, 1776, 1852, 1009, 1097, 1179, 1323, 1595, 1316,1477, 2628, 889, 1178, 1981, 4000, 35, 840, 2091, 4467,0, 27405) set.seed(75013) lon=30 dimension=3 sssize=4 c0==corc0(rnorm(lon*dimension),lon,dimension,sssize,1e5,75014)
Draws a discrete joint probability, for 2 variables, using bubbles
desscop(copest, xname, yname, normalize = FALSE, axes = TRUE)
desscop(copest, xname, yname, normalize = FALSE, axes = TRUE)
copest |
the estimated copula (the whole structure resulting from |
xname |
the name of the variable we want to put on the horizontal axis |
yname |
the name of the variable we want to put on the vertical axis |
normalize |
if TRUE, the smallest probability is rescaled to 0, and the largest to 1 |
axes |
if TRUE, puts the name of the variables on the axes |
Jerome Collet
lon <- 31 a <- 2.85 x <- rnorm(lon) y = a*x^2+rnorm(lon) tablo = as.data.frame(cbind(x,y)) c=corc(tablo,c("x","y"),8) desscop(c,"x","y") tablo = as.data.frame(cbind(x=rep(0,each=lon),y=rep(0,each=lon))) c=corc(tablo,c("x","y"),8,mixties=TRUE) desscop(c,"x","y")
lon <- 31 a <- 2.85 x <- rnorm(lon) y = a*x^2+rnorm(lon) tablo = as.data.frame(cbind(x,y)) c=corc(tablo,c("x","y"),8) desscop(c,"x","y") tablo = as.data.frame(cbind(x=rep(0,each=lon),y=rep(0,each=lon))) c=corc(tablo,c("x","y"),8,mixties=TRUE) desscop(c,"x","y")
Draws a discrete joint probability, for 2 variables, using bubbles
desscoptous(copest, normalize = FALSE)
desscoptous(copest, normalize = FALSE)
copest |
the estimated copula (the whole structure resulting from |
normalize |
if TRUE, the smallest probability is rescaled to 0, and the largest to 1 |
Jerome Collet
lon <- 31 a <- 2.85 x <- rnorm(lon) y = a*x^2+rnorm(lon) z = rnorm(lon) tablo = as.data.frame(cbind(x,y,z)) c=corc(tablo,c("x","y","z"),8) desscoptous(c)
lon <- 31 a <- 2.85 x <- rnorm(lon) y = a*x^2+rnorm(lon) z = rnorm(lon) tablo = as.data.frame(cbind(x,y,z)) c=corc(tablo,c("x","y","z"),8) desscoptous(c)
From a set of observations, builds a description of the multivariate distribution
estimdep(dataframe,varnames,subsampsize,nbsafe=5,mixties=FALSE,nthreads=2)
estimdep(dataframe,varnames,subsampsize,nbsafe=5,mixties=FALSE,nthreads=2)
dataframe |
a data frame containing the observations |
varnames |
the name of the variables we want to estimate the multivariate distribution |
subsampsize |
the sub-sample size |
nbsafe |
the ratio between the discretized copula size and the number of sub-samples |
mixties |
if |
nthreads |
number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() |
the description of the dependence, it is an object with the following parts:
cop |
the array representing the discretized copula |
margins |
the matrix representing the margins, estimated using kernel density estimation |
varnames |
the names of the variables |
Jerome Collet
lon=3000 plon=3000 subsampsize=20 ############## x=(runif(lon)-1/2)*3 y=x^2+rnorm(lon) z=rnorm(lon) donori=as.data.frame(cbind(x,y,z)) depori=estimdep(donori,c("x","y","z"),subsampsize) knownvalues=data.frame(z=rnorm(plon)) prev <- predictdep(knownvalues,depori) plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5) knownvalues=data.frame(x=(runif(lon)-1/2)*3) prev <- predictdep(knownvalues,depori) plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5) knownvalues=data.frame(y=runif(plon,min=-2,max=4)) prev <- predictdep(knownvalues,depori) plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5)
lon=3000 plon=3000 subsampsize=20 ############## x=(runif(lon)-1/2)*3 y=x^2+rnorm(lon) z=rnorm(lon) donori=as.data.frame(cbind(x,y,z)) depori=estimdep(donori,c("x","y","z"),subsampsize) knownvalues=data.frame(z=rnorm(plon)) prev <- predictdep(knownvalues,depori) plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5) knownvalues=data.frame(x=(runif(lon)-1/2)*3) prev <- predictdep(knownvalues,depori) plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5) knownvalues=data.frame(y=runif(plon,min=-2,max=4)) prev <- predictdep(knownvalues,depori) plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5)
From a set of incomplete observations, and a description of the dependence, provides simulated values of the unknown coordinates. It is also possible to simulate unconditionally, with empty observations.
predictdep(knownvalues,dependence,smoothing=c("Uniform","Beta"),nthreads=2)
predictdep(knownvalues,dependence,smoothing=c("Uniform","Beta"),nthreads=2)
knownvalues |
in case of conditional simulation, a matrix containing incomplete observations, the known coordinates being the same for all observations. If no variable name in |
dependence |
the description of the dependence we want to use to forecast, as built by function |
smoothing |
the smoothing method for input and output ranks. |
nthreads |
number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() |
the matrix of the completed observations
Jerome Collet
lon=100 plon=100 subsampsize=10 shift=0 noise=0 knowndims=1 x=rnorm(lon) y=2*x+noise*rnorm(lon) donori=as.data.frame(cbind(x,y)) depori=estimdep(donori,c("x","y"),subsampsize) ## knownvalues=data.frame(x=rnorm(plon)+shift) prev <- predictdep(knownvalues,depori) ## plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5) ## knownvalues=data.frame(x=rnorm(plon)+shift) prev <- predictdep(knownvalues,depori,smoothing="Beta") ## plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5) # souci normal si |shift|>>1 knownvalues=data.frame(z=rnorm(plon)+shift) prev <- predictdep(knownvalues,depori) ## plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5)
lon=100 plon=100 subsampsize=10 shift=0 noise=0 knowndims=1 x=rnorm(lon) y=2*x+noise*rnorm(lon) donori=as.data.frame(cbind(x,y)) depori=estimdep(donori,c("x","y"),subsampsize) ## knownvalues=data.frame(x=rnorm(plon)+shift) prev <- predictdep(knownvalues,depori) ## plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5) ## knownvalues=data.frame(x=rnorm(plon)+shift) prev <- predictdep(knownvalues,depori,smoothing="Beta") ## plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5) # souci normal si |shift|>>1 knownvalues=data.frame(z=rnorm(plon)+shift) prev <- predictdep(knownvalues,depori) ## plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5) points(donori[,1:2],col='red',pch=20,cex=.5)
From two sets of observations, first one of complete observations and second one of incomplete observations, provides simulated values of the unknown coordinates.
predonfly(completeobs,incompleteobs,varnames,subsampsize,nbpreds=1,mixties=FALSE, maxtirs=1e5,complete=TRUE,nthreads=2)
predonfly(completeobs,incompleteobs,varnames,subsampsize,nbpreds=1,mixties=FALSE, maxtirs=1e5,complete=TRUE,nthreads=2)
completeobs |
the set of complete observations. |
incompleteobs |
the set of incomplete observations. |
varnames |
the modeled variables. |
subsampsize |
the sub-sample size. |
nbpreds |
the number of predictions for each incomplete observation. |
mixties |
if |
maxtirs |
the maximum number of sub-samples, to stop the computation even if they did not provide |
complete |
If |
nthreads |
number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() |
the matrix of the completed observations
Jerome Collet
lon=100 plon=30 subsampsize=10 x=rnorm(lon) y=2*x+rnorm(lon)*0 donori=as.data.frame(cbind(x,y)) ## knownvalues=data.frame(x=rnorm(plon)) prev <- predonfly(donori,knownvalues,c("x","y"),subsampsize,100) ## plot(prev$x,prev$y,pch=20,cex=0.5, ylim=range(c(prev$y,donori$y),na.rm=TRUE),xlim=range(c(prev$x,donori$x))) points(donori[,1:2],col='red',pch=20,cex=.5) lon=3000 mg=20 dimtot=4 rayon=6 genboules <- function(lon,a,d) { ss <- function(vec) {return(sum(vec*vec))} surface=matrix(nrow=lon,ncol=d,data=rnorm(lon*d)) rayons=sqrt(apply(surface,1,ss)) surface=surface/rayons return(matrix(nrow=lon,ncol=d,data=rnorm(lon*d))+a*surface) } ############## donori=genboules(lon,rayon,dimtot) donori=as.data.frame(donori) dimconnues=3:dimtot valconnues=matrix(nrow=1,ncol=length(dimconnues),data=0) valconnues=as.data.frame(valconnues) names(valconnues)=names(donori)[3:dimtot] prev <- predonfly(donori,valconnues,names(donori),subsampsize,100) boule2=genboules(plon,rayon,2) plot(boule2[,1:2],xlab='X1',ylab='X2',pch=20,cex=.5) plot(prev$V1,prev$V2,xlab='X1',ylab='X2',pch=20,cex=.5)
lon=100 plon=30 subsampsize=10 x=rnorm(lon) y=2*x+rnorm(lon)*0 donori=as.data.frame(cbind(x,y)) ## knownvalues=data.frame(x=rnorm(plon)) prev <- predonfly(donori,knownvalues,c("x","y"),subsampsize,100) ## plot(prev$x,prev$y,pch=20,cex=0.5, ylim=range(c(prev$y,donori$y),na.rm=TRUE),xlim=range(c(prev$x,donori$x))) points(donori[,1:2],col='red',pch=20,cex=.5) lon=3000 mg=20 dimtot=4 rayon=6 genboules <- function(lon,a,d) { ss <- function(vec) {return(sum(vec*vec))} surface=matrix(nrow=lon,ncol=d,data=rnorm(lon*d)) rayons=sqrt(apply(surface,1,ss)) surface=surface/rayons return(matrix(nrow=lon,ncol=d,data=rnorm(lon*d))+a*surface) } ############## donori=genboules(lon,rayon,dimtot) donori=as.data.frame(donori) dimconnues=3:dimtot valconnues=matrix(nrow=1,ncol=length(dimconnues),data=0) valconnues=as.data.frame(valconnues) names(valconnues)=names(donori)[3:dimtot] prev <- predonfly(donori,valconnues,names(donori),subsampsize,100) boule2=genboules(plon,rayon,2) plot(boule2[,1:2],xlab='X1',ylab='X2',pch=20,cex=.5) plot(prev$V1,prev$V2,xlab='X1',ylab='X2',pch=20,cex=.5)
Simulates the test statistic, under independence
simany(sampsize,dimension,subsampsizes,sampnum,nbsafe=5,nthreads=2, fun=NULL, ...)
simany(sampsize,dimension,subsampsizes,sampnum,nbsafe=5,nthreads=2, fun=NULL, ...)
sampsize |
sample size |
dimension |
sample dimension |
subsampsizes |
vector of sub-sample sizes |
sampnum |
number of samples |
nbsafe |
the ratio between the number of sub-samples and the cardinality of the discretized copula. |
nthreads |
number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() |
fun |
the function describing the dependence. |
... |
optional arguments to |
lrs |
the distances with independent case |
lrs2mean |
the distances with theoretical value, given dependence |
scarcities |
the proportions of non-reached vector ranks |
DistTypes |
a recall of the list of the distance types: "KL","L2","L1","APE" |
Jerome Collet
depquad <- function(lon,dd,a) { x <- rnorm(lon) y0 <- a*x^2 y <- y0 + rnorm(lon) reste=rnorm((dd-2)*lon) return(c(x,y,reste)) } sims0=simany(101,3,8,50,nbsafe=1) seuils=apply(sims0$lrs,3,quantile,0.95) seuils=matrix(ncol=4,nrow=50,seuils,byrow=TRUE) sims1=simany(101,3,8,50,nbsafe=1,fun=depquad,a=0.5) apply(sims1$lrs[,1,]>seuils,2,mean)
depquad <- function(lon,dd,a) { x <- rnorm(lon) y0 <- a*x^2 y <- y0 + rnorm(lon) reste=rnorm((dd-2)*lon) return(c(x,y,reste)) } sims0=simany(101,3,8,50,nbsafe=1) seuils=apply(sims0$lrs,3,quantile,0.95) seuils=matrix(ncol=4,nrow=50,seuils,byrow=TRUE) sims1=simany(101,3,8,50,nbsafe=1,fun=depquad,a=0.5) apply(sims1$lrs[,1,]>seuils,2,mean)
Simulates the test statistic, under independence
simnul(sampsize, dimension, subsampsizes, sampnum,KL=TRUE,nbsafe=5,nthreads=2)
simnul(sampsize, dimension, subsampsizes, sampnum,KL=TRUE,nbsafe=5,nthreads=2)
sampsize |
sample size |
dimension |
sample dimension |
subsampsizes |
vector of sub-sample sizes |
sampnum |
number of samples |
KL |
if TRUE, returns the Kullback-Leibler divergence with the independent case, if FALSE, the L2 distance. There is no re-normalization, contrary to what happens for |
nbsafe |
the ratio between the number of sub-samples and the cardinality of the discretized copula. |
nthreads |
number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() |
lrs |
the distances with independent case |
scarcities |
the proportions of non-reached vector ranks |
Jerome Collet
library(datasets) # plot(swiss) c=corc(swiss,1:3,8) c RV=sum(c$cop*log(c$cop),na.rm=TRUE)+3*log(8) sims=simnul(47,3,8,100) pvalue=mean(RV<sims$lrs) pvalue RV summary(sims$lrs)
library(datasets) # plot(swiss) c=corc(swiss,1:3,8) c RV=sum(c$cop*log(c$cop),na.rm=TRUE)+3*log(8) sims=simnul(47,3,8,100) pvalue=mean(RV<sims$lrs) pvalue RV summary(sims$lrs)