Title: | Data Structure and Manipulations Tool for Host and Viral Population |
---|---|
Description: | Statistical Methods for Inferring Transmissions of Infectious Diseases from deep sequencing data (SMITID). It allows storage, indexing and querying of sequence-space-time host and viral population data. |
Authors: | Jean-Francois Rey [aut, cre] |
Maintainer: | Jean-Francois Rey <[email protected]> |
License: | GPL (>= 2) | file LICENSE |
Version: | 0.0.5 |
Built: | 2024-12-10 06:33:19 UTC |
Source: | CRAN |
Package: | SMITIDstruct |
Type: | Package |
Version: | 0.0.5 |
Date: | 2019-06-14 |
License: | GPL (>=2) |
The SMITIDstruct package contains functions and methods for manipulating host and viral population genetic, spatial and temporal data.
Jean-Francois Rey [email protected]
Maintainer: Jean-Francois Rey [email protected]
    ## Run a simulation
    library("SMITIDstruct")
    demo.SMITIDstruct.run()
Add an event code to another
addcode(code, code.add)
code | an existing code |
code.add | the code to add |
The merge of the two codes
Add a Host to a HostSet
addHost(lhost, id)
lhost | a HostSet object |
id | a host ID, as a character string |
A HostSet of Host objects with their IDs
    lhost <- list()
    lhost <- addHost(lhost,"42")
Add a new event code to an index
addIndex(index, id_host, time, code)
index | an index |
id_host | a host index in the HostSet |
time | a time |
code | an event code |
The updated index (a row is added or an existing row is updated)
Load viral population observations into Host objects
addViralObs(lhost, lvpop)
lhost | a HostSet |
lvpop | a ViralPopSet |
lhost updated with the observed viral populations
Count alleles at each position
alleleCount(mat, seq.char = c("A", "T", "G", "C"))
mat | a list of genomic sequences as a matrix, one sequence per row |
seq.char | the allele alphabet |
A matrix with one row per unique sequence and the allele counts by position in columns
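The idea of counting alleles per position can be illustrated with a minimal base-R sketch. It does not call the package: `count_alleles` is a hypothetical helper, and its output layout (alleles in rows, positions in columns) is an assumption that may differ from what `alleleCount` returns.

```r
# Hypothetical helper (not part of SMITIDstruct): count each allele at each
# alignment position. `mat` is a character matrix with one aligned sequence
# per row and one site per column.
count_alleles <- function(mat, seq.char = c("A", "T", "G", "C")) {
  counts <- sapply(seq_len(ncol(mat)), function(j) {
    # factor() with fixed levels keeps a zero count for absent alleles
    as.integer(table(factor(mat[, j], levels = seq.char)))
  })
  counts <- matrix(counts, nrow = length(seq.char))
  dimnames(counts) <- list(seq.char, paste0("pos", seq_len(ncol(mat))))
  counts
}

# Three toy sequences of equal length, split into a character matrix
seqs <- c("ACGT", "ACGA", "TCGA")
mat <- do.call(rbind, strsplit(seqs, ""))
ac <- count_alleles(mat)
print(ac)
```

Each column of the result sums to the number of sequences, which is a quick sanity check on the input matrix.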
Concatenate several viral populations into one ViralPop object
concatViralPop(lvpop, lid)
lvpop | a ViralPopSet |
lid | a vector of ViralPop IDs to concatenate |
A ViralPop object whose ID is the concatenation of all IDs and whose time is 0
Create a new ViralPop object
createAViralPop(host_id, obs_time, seq, id_seq = "seq_ID", seq_value = "seq", prop = "prop", compact = FALSE)
host_id | ID of the host in which the viral population is observed |
obs_time | time of the observation (numeric or date) |
seq | a data.frame of sequence IDs, sequences and counts |
id_seq | name of the column containing the sequence IDs |
seq_value | name of the column containing the sequences |
prop | name of the column containing the count of each sequence |
compact | boolean, default FALSE; if TRUE, will try to group identical sequences (not implemented yet) |
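As an illustration of the `seq` argument, the sketch below builds a data.frame whose column names match the defaults shown in the usage line (`seq_ID`, `seq`, `prop`). The sequence values, the host ID and the observation time are made up for the example, and the call itself is left commented out because it needs the package installed.

```r
# Toy input for the `seq` argument; column names follow the documented defaults
seq_df <- data.frame(
  seq_ID = c("SEQ_1", "SEQ_2"),   # sequence IDs (illustrative values)
  seq    = c("ACGT", "TGCA"),     # sequences as character strings
  prop   = c(46, 6),              # count of each sequence
  stringsAsFactors = FALSE
)
print(seq_df)

# With SMITIDstruct installed, the data.frame could then be passed on.
# "HOST_42" and the numeric time are hypothetical values:
# vpop <- createAViralPop(host_id = "HOST_42", obs_time = 1388534400, seq = seq_df)
```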
Create a list of Host class objects
createHost(list_host)
list_host | a character vector of host IDs |
A HostSet of Host objects with their IDs
    lh <- seq(1,30,1)
    lhost <- createHost(lh)
Create an index of time, host ID and event code
createIndex(hostlist)
hostlist | a HostSet |
A data.frame with TIME, ID_HOST and EVENTCODE as columns
Run a demo that loads a HostSet, a ViralPopSet and an index
demo.SMITIDstruct.run()
Diversity calculation using Mean Pairwise Distance
diversity.pDistance(vpop)
vpop | a ViralPop object |
The computed diversity (mean pairwise distance)
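For readers unfamiliar with the measure, here is a minimal base-R sketch of a mean pairwise distance. It is not the package's implementation: `mean_pairwise_distance` is a hypothetical helper, it works on plain character strings rather than a ViralPop object, and it treats every sequence equally, whereas the package may weight sequences by their proportions.

```r
# Hypothetical helper (not part of SMITIDstruct): unweighted mean pairwise
# p-distance, i.e. the average fraction of differing sites over all pairs
# of aligned sequences.
mean_pairwise_distance <- function(seqs) {
  chars <- strsplit(seqs, "")
  # Needs at least two sequences, all of the same length
  stopifnot(length(seqs) >= 2, length(unique(lengths(chars))) == 1)
  pairs <- combn(length(seqs), 2)   # one column per pair of sequences
  d <- apply(pairs, 2, function(p) mean(chars[[p[1]]] != chars[[p[2]]]))
  mean(d)
}

mpd <- mean_pairwise_distance(c("ACGT", "ACGA", "TCGA"))
print(mpd)
```

In this toy case the three pairs differ at 1, 2 and 1 of the 4 sites, so the mean distance is 1/3.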
Allele frequency spectrum, or site frequency spectrum: the distribution of alternative allele frequencies across all sites of the genetic sequences
diversity.sfs(vpop)
vpop | a ViralPop object |
The site frequency spectrum
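The notion of a site frequency spectrum can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's method: `site_frequency_spectrum` is a hypothetical helper, the "alternative" alleles at a site are taken to be everything except the most common allele, and the result is a table of counts rather than whatever structure `diversity.sfs` returns.

```r
# Hypothetical helper (not part of SMITIDstruct): tabulate, over all sites,
# how many sequences carry an allele other than the most common one.
site_frequency_spectrum <- function(seqs) {
  mat <- do.call(rbind, strsplit(seqs, ""))   # one sequence per row
  n <- nrow(mat)
  # Per site: number of sequences not carrying the majority allele
  alt_count <- apply(mat, 2, function(site) n - max(table(site)))
  # Number of sites for each possible alternative-allele count
  table(factor(alt_count, levels = 0:(n - 1)))
}

sfs <- site_frequency_spectrum(c("ACGT", "ACGA", "TCGA"))
print(sfs)
```

Here two sites are invariant (count 0) and two sites have a single alternative copy (count 1).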
Get host covariates
getCov(lhost, id = NA)
lhost | a HostSet |
id | a vector of host IDs (default NA: all hosts in lhost) |
A data.frame
Convert a timestamp to a date (string)
getDate(time, format = "%Y-%m-%dT%H:%M:%S")
time | a timestamp or a vector of timestamps |
format | output date format (default %Y-%m-%dT%H:%M:%S) |
The time as a date string
Get the pairwise distance of the viral populations observed in a host
getDiversity.pDistance(host, lvpop)
host | a Host object |
lvpop | a ViralPopSet object |
A data.frame with the observation time and p_distance as columns
Get the allele frequency spectrum (site frequency spectrum) of the viral populations observed in a host
getDiversity.sfs(host, lvpop)
host | a Host object |
lvpop | a ViralPopSet object |
A list indexed by time that contains allele.time and count
Get host information: status, infectedby, coordinates and time
getInfosByHostAndTime(index, lhost)
index | an index |
lhost | a list of hosts |
A data.frame with column names (id, time, infectedby, status, probabilities, X, Y)
Get host states
getStates(lhost, id = NA)
lhost | a HostSet |
id | a vector of host IDs (default NA: all hosts in lhost) |
A data.frame
Get the timeline of a host
getTimeLine(lhost, id)
lhost | a HostSet |
id | a host ID |
A data.frame
Get the timestamp of a date
getTimestamp(date, format = "%Y-%m-%dT%H:%M:%S")
date | a date (as a string) or a vector of dates |
format | the date format (default %Y-%m-%dT%H:%M:%S) |
The timestamp of the date(s)
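The relationship between a date string in the default format and a numeric timestamp can be illustrated with base R alone. This sketch does not call `getDate` or `getTimestamp`; it assumes the timestamp is a number of seconds since 1970-01-01 and that dates are interpreted in UTC, which may not match the package's behaviour.

```r
fmt <- "%Y-%m-%dT%H:%M:%S"          # the default format shown in the usage lines
date_str <- "2014-01-01T00:00:00"

# Date string -> seconds since 1970-01-01 (UTC is an assumption)
ts <- as.numeric(as.POSIXct(date_str, format = fmt, tz = "UTC"))
print(ts)

# Seconds since 1970-01-01 -> date string in the same format
back <- format(as.POSIXct(ts, origin = "1970-01-01", tz = "UTC"),
               format = fmt, tz = "UTC")
print(back)
```

Fixing the time zone explicitly on both conversions keeps the round trip stable across machines with different locale settings.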
Get a transmission tree as a data.frame
getTransmissionTree(lhost, id = NA)
lhost | a HostSet |
id | a vector of host IDs (default NA: all hosts) |
A data.frame with source, target and time as columns
    path = system.file("extdata", "data-simul/", package="SMITIDstruct")
    lhost <- list()
    lhost <- loadTree(lhost,paste(path,"/tree.txt",sep=''))
    print(getTransmissionTree(lhost))
Spatio-temporal information about a Host.
Object can be created by calling ...
ID | Host identifier |
coordinates | Host coordinates over time (as sf) |
states | Host states/status (dob, Inf...) |
sources | data.frame of time and ID of the host that infected this host |
offsprings | data.frame of time and ID of the hosts infected by this host |
ID_V_POP | data.frame of time and index of viral population observations |
covariates | data.frame of time, covariate and value for this host |
Check if a numeric is not a timestamp
is.juliendate(time)
time | a numeric |
TRUE if time is a Julian day, otherwise FALSE
Check if a string represents a date
is.StringDate(date)
date | a string or a vector of strings (without NA) |
TRUE if date contains a date format
Check if a numeric represents a timestamp
is.timestamp(time)
time | a numeric |
TRUE if time >= 1971
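One way to read the "time >= 1971" rule is as a threshold on seconds since 1970-01-01. The base-R sketch below follows that reading; `looks_like_timestamp` is a hypothetical helper, and both the UTC time zone and the exact cut-off (the start of 1971) are assumptions rather than the documented behaviour of `is.timestamp`.

```r
# Hypothetical helper (not part of SMITIDstruct): treat a numeric as a Unix
# timestamp when it is at or after 1971-01-01 00:00:00 UTC.
looks_like_timestamp <- function(time) {
  threshold <- as.numeric(as.POSIXct("1971-01-01", tz = "UTC"))  # 31536000 s
  is.numeric(time) & time >= threshold
}

# A 2014 timestamp in seconds versus a much smaller day-count style value
res <- looks_like_timestamp(c(1388534400, 2456659))
print(res)
```

Under this heuristic, small numerics such as day counts fall below the threshold and are not treated as timestamps.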
Check whether a code contains a specific code
isInCode(code, thecode)
code | list of codes to test |
thecode | the real code |
TRUE if code contains thecode, otherwise FALSE
Load host coordinates
loadCoords(lhost, dfCoords, id = "ID")
lhost | a HostSet |
dfCoords | a data.frame with host ID, time, and longitude and latitude values |
id | column name for the host ID |
lhost updated
    path = system.file("extdata", "data-simul/", package="SMITIDstruct")
    lhost <- list()
    lhost <- loadTree(lhost,paste(path,"/tree.txt",sep=''))
    coords <- read.table(file=paste(path,"/hosts_coords.txt",sep=''), header=TRUE, check.names=FALSE)
    lhost <- loadCoords(lhost,coords)
Load host covariates
loadCovs(lhost, dfCovs, id = "ID", colCovs)
lhost | a HostSet |
dfCovs | a data.frame with host IDs in rows and covariates in columns |
id | column name for the host ID |
colCovs | column names of the covariate columns |
lhost updated with covariates
Load Host objects from a file
loadHost(file = "host.txt")
file | a file containing host data |
A list of Host objects (HostSet)
Load host states
loadStates(lhost, dfStates, id = "ID", colStates)
lhost | a HostSet |
dfStates | a data.frame with host ID and states in columns and time as value |
id | column name for the host ID |
colStates | column names of the state columns |
lhost updated
    path = system.file("extdata", "data-simul/", package="SMITIDstruct")
    lhost <- list()
    class(lhost) <- "hostSet"
    lhost <- loadTree(lhost,paste(path,"/tree.txt",sep=''))
    obs <- read.table(paste(path,"/obs.txt",sep=''),header=TRUE, check.names=FALSE)
    obs.states <- c(colnames(obs[-grep("ID|Tobs.*",colnames(obs))]))
    lhost <- loadStates(lhost, obs, colStates=obs.states)
Load sources and offsprings from a file
loadTree(lhost = list(), file = "tree.txt", source = "ID-source", receptor = "ID-receptor", tinf = "Tinf", weight = "Weight")
lhost | a HostSet |
file | a file containing tree data |
source | column name for the source ID |
receptor | column name for the receptor ID |
tinf | column name for the infection time |
weight | column name for the infection weight |
The lhost parameter updated with sources and offsprings
    path = system.file("extdata", "data-simul/", package="SMITIDstruct")
    lhost <- list()
    class(lhost) <- "hostSet"
    lhost <- loadTree(lhost,paste(path,"/tree.txt",sep=''))
Load sources and offsprings from a data.frame
loadTreeDF(lhost = list(), df = data.frame(), source = "ID-source", receptor = "ID-receptor", tinf = "Tinf", weight = "Weight")
lhost | a HostSet |
df | a data.frame containing tree data |
source | column name for the source ID |
receptor | column name for the receptor ID |
tinf | column name for the infection time |
weight | column name for the infection link probability |
The lhost parameter updated with sources and offsprings
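The default column names contain hyphens, which `data.frame()` would normally rewrite. The sketch below shows one way to build an input with those exact names in base R. The host IDs, times and weights are invented for illustration, the column types are assumptions, and the package call is left commented out because it needs SMITIDstruct installed.

```r
# Toy transmission links using the default column names from the usage line.
# check.names = FALSE keeps "ID-source" and "ID-receptor" from being
# rewritten to "ID.source" and "ID.receptor".
tree_df <- data.frame(
  "ID-source"   = c("1", "1", "2"),   # infecting host (illustrative IDs)
  "ID-receptor" = c("2", "3", "4"),   # infected host
  "Tinf"        = c(10, 12, 20),      # infection time (assumed numeric)
  "Weight"      = c(0.9, 0.7, 0.8),   # link probability (assumed numeric)
  check.names = FALSE,
  stringsAsFactors = FALSE
)
print(colnames(tree_df))

# With SMITIDstruct installed, the data.frame could then be loaded:
# lhost <- loadTreeDF(lhost = list(), df = tree_df)
```

The same `check.names = FALSE` setting appears in the package's own `read.table` examples, for the same reason.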
Load a ViralPop object
loadViralObs(id, time, file)
id | host pathogen ID |
time | time of the observation (numeric or Date) |
file | a fasta file |
A new ViralPop object
Load all ViralPop objects observed in the file.obs
loadViralPop(directory, listFiles, listCol = list(id = "id", timeObs = "time", filename = "filename"), file.extension = "fasta")
directory | path to the data |
listFiles | a data.frame with host ID, observation time and file name (filename.fasta) |
listCol | a list of the listFiles column names ("id", "timeObs", "filename") |
file.extension | genotype file extension |
A vector of ViralPop objects
    path = system.file("extdata", "data-simul/", package="SMITIDstruct")
    files <- list.files(path, pattern = ".*.fasta" ,full.names=FALSE)
    lfileinfo <- sapply(files,function(x){return(substr(x,1,nchar(x)-6))})
    splitFiles <- strsplit(lfileinfo, "_");
    listF <- cbind(data.frame(matrix(unlist(splitFiles),nrow=length(splitFiles), byrow=TRUE), stringsAsFactors = FALSE), names(splitFiles))
    colnames(listF) <- c("id", "time", "filename")
    lvpop <- loadViralPop(path,listF)
Load a list of viral populations
loadViralPopSet(lvpop = list(), list)
lvpop | a ViralPopSet (default: a new one) |
list | a list (see details) |
The list has to follow this format: list$HOST_ID$TIME$list$seq_id $seq $prop
That is, a list indexed by host ID, followed by a list indexed by time (of observation). The last list contains an array of seq_ID (sequence IDs), an array of seq (sequences as characters), and an array of prop (the count of each sequence).
Example:
    $'HOST_42'$'2014-01-01T00:00:00'$seq_ID ["SEQ_1","SEQ_2"]
    $'HOST_42'$'2014-01-01T00:00:00'$seq ["ACGT","TGCA"]
    $'HOST_42'$'2014-01-01T00:00:00'$prop ["46","6"]
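The nested layout described above can be written as an R list literal. This is a sketch using the toy values from the example, with `prop` as the name of the counts element as in the format line. Counts are written as numbers here, which is an assumption since the example shows them quoted, and the package call is left commented out because it needs SMITIDstruct installed.

```r
# Host ID -> observation time -> sequence IDs, sequences and counts
obs <- list(
  HOST_42 = list(
    "2014-01-01T00:00:00" = list(
      seq_ID = c("SEQ_1", "SEQ_2"),   # sequence IDs
      seq    = c("ACGT", "TGCA"),     # sequences as characters
      prop   = c(46, 6)               # count of each sequence (assumed numeric)
    )
  )
)
str(obs)

# With SMITIDstruct installed, the list could then be loaded:
# lvpop <- loadViralPopSet(list = obs)
```

Quoting the time key is required because it is not a syntactically valid R name; access it with `obs$HOST_42[["2014-01-01T00:00:00"]]`.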
Merge a list of event codes
mergeCode(listcode)
listcode | a list of event codes |
A code
Plot the Mean Pairwise Distance of a host's viral populations over time
plotDiversity.pDistance(host, lvpop)
host | a Host object |
lvpop | a ViralPopSet object |
Plot the allele frequency spectrum of a host's viral populations over time
plotDiversity.sfs(host, lvpop)
host | a Host object |
lvpop | a ViralPopSet object |
Set host states from a data.frame
setStates(lhost, dfStates, colStates = c(id = "ID", time = "time", states = "value"))
lhost | a HostSet |
dfStates | a data.frame with host ID, states and time in columns |
colStates | vector of the column names for id, time and states |
The updated HostSet
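`loadStates` describes a wide input (one column per state, with time as the value), while `setStates` expects one row per host, time and state. The base-R sketch below converts the first layout into the second, using the default column names `ID`, `time` and `value`. The state names `infected` and `removed` and all values are hypothetical.

```r
# Wide layout: one column per state, holding the time at which it occurred
wide <- data.frame(
  ID       = c("1", "2"),
  infected = c(10, 15),    # hypothetical state name
  removed  = c(30, NA),    # NA: state not observed for this host
  stringsAsFactors = FALSE
)
state_cols <- c("infected", "removed")

# Long layout: one row per (host, time, state), as in the setStates defaults
long <- do.call(rbind, lapply(state_cols, function(s) {
  data.frame(ID = wide$ID, time = wide[[s]], value = s,
             stringsAsFactors = FALSE)
}))
long <- long[!is.na(long$time), ]          # drop unobserved states
long <- long[order(long$ID, long$time), ]  # sort by host, then time
rownames(long) <- NULL
print(long)

# With SMITIDstruct installed and an existing HostSet `lhost`:
# lhost <- setStates(lhost, long)
```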
Simulate states from the source infections
simulateStates(lhost)
lhost | a HostSet |
lhost updated with states derived from the source infection times
Viral population data containing genotypes
ID | Host identifier |
time | Observation time as numeric since 1970/01/01 |
size | Number of variants |
names | list of variant IDs with the same sequence |
genotypes | all variant genotypes (as DNAStringSet) |
proportions | proportions of each variant |