Title: | Data Structure and Manipulations Tool for Host and Viral Population |
Description: | Statistical Methods for Inferring Transmissions of Infectious Diseases from deep sequencing data (SMITID). It allows sequence-space-time host and viral population data storage, indexing and querying. |
Authors: | Jean-Francois Rey [aut, cre] |
Maintainer: | Jean-Francois Rey <[email protected]> |
License: | GPL (>= 2) | file LICENSE |
Version: | 0.0.5 |
Built: | 2025-03-10 06:14:17 UTC |
Source: | CRAN |
Package: | SMITIDstruct |
Type: | Package |
Version: | 0.0.5 |
Date: | 2019-06-14 |
License: | GPL (>=2) |
The SMITIDstruct package contains functions and methods for manipulating host and viral population sequence-space-time data.
## Run a simulation
library("SMITIDstruct")
demo.SMITIDstruct.run()
Add an event code to another one
addcode(code, code.add)
code |
an existing code |
code.add |
the code to add |
the merge of the two codes
Add a Host to a HostSet
addHost(lhost, id)
lhost |
a HostSet object |
id |
a host ID as a character string |
a HostSet of Host objects with their IDs
lhost <- list()
lhost <- addHost(lhost, "42")
Add a new event code to an index
addIndex(index, id_host, time, code)
index |
an index |
id_host |
a host index in the HostSet |
time |
a time |
code |
an event code |
the updated index (a row is added or an existing one is updated)
Load viral population observations into Host objects
addViralObs(lhost, lvpop)
lhost |
a HostSet |
lvpop |
a ViralPopSet |
lhost updated with the observed viral populations
Count alleles at each position
alleleCount(mat, seq.char = c("A", "T", "G", "C"))
mat |
a matrix of genomic sequences, one sequence per row |
seq.char |
the allele alphabet |
a matrix in which each row is a unique sequence and each column an allele count by position
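As an illustration of the idea, here is a minimal base-R sketch of per-position allele counting. `count_alleles` is a hypothetical helper, not the package's `alleleCount`, and its output layout (alleles in rows, positions in columns) is an assumption that may differ from what the package returns.

```r
# Hypothetical helper: count how often each allele occurs at each position.
# Assumes `mat` is a character matrix with one sequence per row.
count_alleles <- function(mat, seq.char = c("A", "T", "G", "C")) {
  counts <- vapply(seq_len(ncol(mat)), function(j) {
    # Tabulate column j against the full alphabet so absent alleles count as 0
    as.integer(table(factor(mat[, j], levels = seq.char)))
  }, integer(length(seq.char)))
  rownames(counts) <- seq.char
  counts
}

seqs <- c("ACGT", "ACGA", "TCGA")
mat <- do.call(rbind, strsplit(seqs, ""))  # 3 sequences x 4 positions
counts <- count_alleles(mat)
print(counts)
```

Each column of `counts` sums to the number of sequences, which is a quick sanity check on the input matrix.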
Concatenate several viral populations into one ViralPop object
concatViralPop(lvpop, lid)
lvpop |
a ViralPopSet |
lid |
a vector of ViralPop IDs to concatenate |
a ViralPop object whose ID is the concatenation of all IDs and whose time is set to 0.
Create a new ViralPop object
createAViralPop(host_id, obs_time, seq, id_seq = "seq_ID", seq_value = "seq", prop = "prop", compact = FALSE)
host_id |
ID of the host in which the viral population is observed |
obs_time |
time of the observation (numeric or date) |
seq |
a data.frame of sequence IDs, sequences and counts |
id_seq |
column name containing the sequence IDs |
seq_value |
column name containing the sequences |
prop |
column name containing the count of each sequence |
compact |
boolean (default FALSE); if TRUE, tries to group identical sequences (not implemented yet) |
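The `seq` argument can be sketched as follows, using the default column names from the usage above. The sequence values, the host ID and the observation time are made up for illustration, and the call to `createAViralPop` is left commented out because it requires the package to be installed.

```r
# Illustrative input only: column names match the defaults
# id_seq = "seq_ID", seq_value = "seq", prop = "prop".
seq_df <- data.frame(
  seq_ID = c("SEQ_1", "SEQ_2"),
  seq    = c("ACGT", "TGCA"),
  prop   = c(46, 6),
  stringsAsFactors = FALSE
)

# With SMITIDstruct loaded, the call would presumably look like this
# (host ID and timestamp are placeholder values):
# vpop <- createAViralPop(host_id = "HOST_42", obs_time = 1388534400, seq = seq_df)
print(seq_df)
```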
Create a list of Host class objects
list_host |
a character vector of host IDs |
a HostSet of Host objects with their IDs
lh <- seq(1,30,1)
lhost <- createHost(lh)
Create an index of time, host ID and event code
hostlist |
a HostSet |
a data.frame with TIME, ID_HOST and EVENTCODE as columns
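To make the shape of such an index concrete, here is a small base-R sketch of a data.frame with the three documented columns, followed by a time-window query. The event code values are placeholders (the package's actual encoding is not described here), and the rows are invented.

```r
# Illustrative index: the integer event codes are hypothetical placeholders.
index <- data.frame(
  TIME      = c(40, 10, 25),
  ID_HOST   = c(3L, 1L, 2L),
  EVENTCODE = c(1L, 1L, 2L)
)

# Keep the index ordered by time so that range queries read naturally
index <- index[order(index$TIME), ]
rownames(index) <- NULL

# Query: all events recorded between time 10 and time 30 (inclusive)
window <- index[index$TIME >= 10 & index$TIME <= 30, ]
print(window)
```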
Run a demo that loads a HostSet, a ViralPopSet and an index
Diversity calculation using the mean pairwise distance
vpop |
a ViralPop object |
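One common way to define a mean pairwise distance is sketched below in base R: the expected proportion of differing sites between two sequences drawn at random (with replacement) according to their proportions. This is an assumption about the definition, not a description of the package's implementation, and both function names are hypothetical.

```r
# Proportion of differing sites between two aligned sequences (p-distance)
p_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  mean(x != y)
}

# Proportion-weighted mean pairwise distance over all pairs i < j
mean_pairwise_distance <- function(seqs, prop) {
  n <- length(seqs)
  if (n < 2) return(0)
  w <- prop / sum(prop)  # normalise counts to frequencies
  total <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      total <- total + 2 * w[i] * w[j] * p_distance(seqs[i], seqs[j])
    }
  }
  total
}

mpd <- mean_pairwise_distance(c("ACGT", "ACGA", "TCGA"), prop = c(2, 1, 1))
print(mpd)  # 0.21875
```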
Allele frequency spectrum, or site frequency spectrum: the distribution of alternative allele frequencies across all sites of the genetic sequences
vpop |
a ViralPop object |
the site frequency spectrum
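The definition above can be sketched in base R as follows. This version takes the "alternative" alleles at a site to be everything other than the most frequent allele, which is an assumption; `site_frequency_spectrum` is a hypothetical helper and is not how the package necessarily computes its result.

```r
# Hypothetical helper: for each site, count the sequences that do not carry
# the most frequent allele, then tabulate how many sites show each count.
site_frequency_spectrum <- function(seqs) {
  mat <- do.call(rbind, strsplit(seqs, ""))  # sequences x positions
  n <- nrow(mat)
  alt_count <- apply(mat, 2, function(column) n - max(table(column)))
  # Monomorphic sites (alt_count == 0) are left out of the spectrum
  table(factor(alt_count[alt_count > 0], levels = seq_len(n - 1)))
}

sfs <- site_frequency_spectrum(c("ACGT", "ACGA", "TCGA", "TCGA"))
print(sfs)
```

In this toy alignment, one site has a single alternative allele copy and one site has two, so the spectrum has one site in each of the first two classes.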
Get the covariates of one or more hosts
getCov(lhost, id = NA)
lhost |
a HostSet |
id |
a vector of host id (default NA : all lhost) |
a data.frame
Convert a timestamp to a date (string)
getDate(time, format = "%Y-%m-%dT%H:%M:%S")
time |
a timestamp or a vector of timestamps |
format |
output date format (default %Y-%m-%dT%H:%M:%S) |
the time as a date string
Get the pairwise distance for a host over its observed viral populations
getDiversity.pDistance(host, lvpop)
host |
a Host object |
lvpop |
a ViralPopSet object |
a data.frame with the time of observation and p_distance as columns
Get the allele frequency spectrum (site frequency spectrum) for the observed viral populations of a host
getDiversity.sfs(host, lvpop)
host |
a Host object |
lvpop |
a ViralPopSet object |
a list indexed by time that contains allele.time and count
Get host information: status, infectedby, coordinates and time
getInfosByHostAndTime(index, lhost)
index |
an index |
lhost |
a list of hosts |
a data.frame with colnames (id, time, infectedby, status, probabilities, X, Y)
Get the states of one or more hosts
getStates(lhost, id = NA)
lhost |
a HostSet |
id |
a vector of host id (default NA : all lhost) |
a data.frame
Get the timeline of a host
getTimeLine(lhost, id)
lhost |
a HostSet |
id |
a host ID |
a data.frame
Get the timestamp of a date
getTimestamp(date, format = "%Y-%m-%dT%H:%M:%S")
date |
a date (as a string) or a vector of dates |
format |
the date format (default %Y-%m-%dT%H:%M:%S) |
timestamp of the date(s)
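The two conversions, timestamp to date string (`getDate`) and date string to timestamp (`getTimestamp`), can be approximated in base R as below. The helper names are hypothetical, and the use of UTC is an assumption: the package may rely on the local time zone instead.

```r
# Hypothetical stand-ins for the two conversions; UTC is assumed throughout.
to_date_string <- function(time, fmt = "%Y-%m-%dT%H:%M:%S") {
  # Seconds since 1970-01-01 -> formatted date string
  format(as.POSIXct(time, origin = "1970-01-01", tz = "UTC"),
         format = fmt, tz = "UTC")
}

to_timestamp <- function(date, fmt = "%Y-%m-%dT%H:%M:%S") {
  # Formatted date string -> seconds since 1970-01-01
  as.numeric(as.POSIXct(strptime(date, format = fmt, tz = "UTC")))
}

ts <- to_timestamp("2014-01-01T00:00:00")
back <- to_date_string(ts)
print(ts)    # 1388534400
print(back)  # "2014-01-01T00:00:00"
```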
Get a transmission tree as a data.frame
getTransmissionTree(lhost, id = NA)
lhost |
a HostSet |
id |
a vector of host IDs (default NA : all hosts) |
a data.frame with source, target and time as columns
path <- system.file("extdata", "data-simul/", package="SMITIDstruct")
lhost <- list()
lhost <- loadTree(lhost, paste(path, "/tree.txt", sep=''))
print(getTransmissionTree(lhost))
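Once a transmission tree is available as a data.frame, simple queries can be written directly against it. The sketch below traces the chain of infection back from a given host. The column names `source`, `target` and `time` are assumed from the return value described above, the data are invented, and `trace_back` is a hypothetical helper.

```r
# Invented tree in the assumed source | target | time layout
tree <- data.frame(
  source = c("1", "1", "2"),
  target = c("2", "3", "4"),
  time   = c(10, 25, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk from a host up to the root of its infection chain
trace_back <- function(tree, id) {
  chain <- id
  repeat {
    src <- tree$source[tree$target == chain[1]]
    # Stop at a host with no recorded source, or if a cycle is detected
    if (length(src) == 0 || src[1] %in% chain) break
    chain <- c(src[1], chain)
  }
  chain
}

chain <- trace_back(tree, "4")
print(chain)  # "1" "2" "4"
```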
Spatio-temporal information about Host.
Object can be created by calling ...
Host identifier
Host coordinates in time (as sf)
Host States/Status (dob, Inf...)
data.frame of time and ID of the host that infected this host
data.frame of time and ID of the hosts that have been infected by this host
data.frame of time and index of viral population observations
data.frame of time, covariate and value for this host.
Check whether a numeric is not a timestamp
time |
a numeric |
TRUE if time is a Julian day, otherwise FALSE
Check whether a string represents a date
date |
a string or a vector of strings (without NA) |
TRUE if date contains a date format
Check whether a numeric represents a timestamp
time |
a numeric |
TRUE if time >= 1971
Check whether a code contains a specific code
isInCode(code, thecode)
code |
list of codes to test |
thecode |
the real code |
TRUE if code contains thecode, otherwise FALSE
Load Hosts coordinates
loadCoords(lhost, dfCoords, id = "ID")
lhost |
a HostSet |
dfCoords |
a data.frame with host ID, time, and longitude and latitude values |
id |
colname for host ID |
lhost updated
path <- system.file("extdata", "data-simul/", package="SMITIDstruct")
lhost <- list()
lhost <- loadTree(lhost, paste(path, "/tree.txt", sep=''))
coords <- read.table(file=paste(path, "/hosts_coords.txt", sep=''), header=TRUE, check.names=FALSE)
lhost <- loadCoords(lhost, coords)
Load Hosts covariates
loadCovs(lhost, dfCovs, id = "ID", colCovs)
lhost |
a HostSet |
dfCovs |
a data.frame with host ID in rows and covariates in columns |
id |
colname for host ID |
colCovs |
colnames of covariates columns |
lhost updated with covariates
Load Host objects from a file
loadHost(file = "host.txt")
file |
a file containing host data |
a list of Host objects (HostSet)
Load Hosts states
loadStates(lhost, dfStates, id = "ID", colStates)
lhost |
a HostSet |
dfStates |
a data.frame with host ID and states in columns and time as value |
id |
colname for host ID |
colStates |
colnames of States columns |
lhost updated
path <- system.file("extdata", "data-simul/", package="SMITIDstruct")
lhost <- list()
class(lhost) <- "hostSet"
lhost <- loadTree(lhost, paste(path, "/tree.txt", sep=''))
obs <- read.table(paste(path, "/obs.txt", sep=''), header=TRUE, check.names=FALSE)
obs.states <- c(colnames(obs[-grep("ID|Tobs.*", colnames(obs))]))
lhost <- loadStates(lhost, obs, colStates=obs.states)
Load sources and offspring from a file
loadTree(lhost = list(), file = "tree.txt", source = "ID-source", receptor = "ID-receptor", tinf = "Tinf", weight = "Weight")
lhost |
a HostSet |
file |
a file containing tree data |
source |
column name for source ID |
receptor |
column name for receptor ID |
tinf |
column name for infection Time |
weight |
column name of infection weight |
the lhost parameter updated with sources and offspring
path <- system.file("extdata", "data-simul/", package="SMITIDstruct")
lhost <- list()
class(lhost) <- "hostSet"
lhost <- loadTree(lhost, paste(path, "/tree.txt", sep=''))
Load sources and offspring from a data.frame
loadTreeDF(lhost = list(), df = data.frame(), source = "ID-source", receptor = "ID-receptor", tinf = "Tinf", weight = "Weight")
lhost |
a HostSet |
df |
a data.frame containing tree data |
source |
column name for source ID |
receptor |
column name for receptor ID |
tinf |
column name for infection Time |
weight |
column name for the infection link probability (weight) |
the lhost parameter updated with sources and offspring
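A data.frame suitable for the default column names above can be sketched like this. The rows are invented, and the `loadTreeDF` call is commented out since it needs the package. Note that the default names contain a hyphen, so `check.names = FALSE` is needed to stop R from rewriting them, which is also why the file-based examples pass it to `read.table`.

```r
# Invented transmission links using the default column names
tree_df <- data.frame(
  "ID-source"   = c("1", "1", "2"),
  "ID-receptor" = c("2", "3", "4"),
  Tinf          = c(10, 25, 40),
  Weight        = c(0.9, 0.6, 0.8),
  check.names = FALSE,        # keep the hyphenated names as they are
  stringsAsFactors = FALSE
)

# Without check.names = FALSE, R turns "ID-source" into "ID.source"
mangled <- names(data.frame("ID-source" = 1))

# With SMITIDstruct loaded, the call would presumably be:
# lhost <- loadTreeDF(lhost = list(), df = tree_df)
print(colnames(tree_df))
print(mangled)  # "ID.source"
```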
Load a ViralPop object from a FASTA file
loadViralObs(id, time, file)
id |
host pathogen ID |
time |
time of the observation (numeric or Date) |
file |
a fasta file |
a new ViralPop object
Load all the ViralPop objects observed in the file.obs
loadViralPop(directory, listFiles, listCol = list(id = "id", timeObs = "time", filename = "filename"), file.extension = "fasta")
directory |
path to the directory containing the data |
listFiles |
a data.frame with host ID, observation time and file name (filename.fasta) |
listCol |
a list of the listFiles column names ("id", "timeObs", "filename") |
file.extension |
genotype file extension |
a vector of ViralPop objects
path <- system.file("extdata", "data-simul/", package="SMITIDstruct")
files <- list.files(path, pattern = ".*.fasta", full.names=FALSE)
lfileinfo <- sapply(files, function(x){return(substr(x,1,nchar(x)-6))})
splitFiles <- strsplit(lfileinfo, "_")
listF <- cbind(data.frame(matrix(unlist(splitFiles), nrow=length(splitFiles), byrow=TRUE), stringsAsFactors = FALSE), names(splitFiles))
colnames(listF) <- c("id", "time", "filename")
lvpop <- loadViralPop(path, listF)
Load a list of viral populations
loadViralPopSet(lvpop = list(), list)
lvpop |
a viralPopSet (default new one) |
list |
a list (see details) |
The list must follow this format: list$HOST_ID$TIME$list$seq_id $seq $prop, that is, a list indexed by host ID, followed by a list indexed by time (of observation). The last list contains an array of seq_ID (sequence IDs), an array of seq (sequences as characters), and an array of the count of each sequence.
Example:
$'HOST_42'$'2014-01-01T00:00:00'$seq_ID ["SEQ_1","SEQ_2"]
$'HOST_42'$'2014-01-01T00:00:00'$seq ["ACGT","TGCA"]
$'HOST_42'$'2014-01-01T00:00:00'$prop ["46","6"]
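The nested format described in the details can be built in base R roughly as follows. The element names `seq_ID`, `seq` and `prop` are taken from the example, the counts are written here as numbers rather than quoted strings (an assumption), and the `loadViralPopSet` call is commented out because it requires the package.

```r
# Host ID -> observation time -> sequence IDs, sequences and counts
vp_list <- list(
  HOST_42 = list(
    "2014-01-01T00:00:00" = list(
      seq_ID = c("SEQ_1", "SEQ_2"),
      seq    = c("ACGT", "TGCA"),
      prop   = c(46, 6)   # counts; the documented example shows them quoted
    )
  )
)

# Access mirrors the $HOST_ID$TIME$field pattern from the details
first_obs <- vp_list$HOST_42$`2014-01-01T00:00:00`
print(first_obs$seq)

# With SMITIDstruct loaded, the call would presumably be:
# lvpop <- loadViralPopSet(list = vp_list)
```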
Merge a list of event codes
listcode |
a list of event codes |
a code
Plot the mean pairwise distance of a host's viral populations over time
plotDiversity.pDistance(host, lvpop)
host |
a Host object |
lvpop |
a ViralPopSet object |
Plot the allele frequency spectrum of a host's viral populations over time
plotDiversity.sfs(host, lvpop)
host |
a Host object |
lvpop |
a ViralPopSet object |
Set host states from a data.frame
setStates(lhost, dfStates, colStates = c(id = "ID", time = "time", states = "value"))
lhost |
a HostSet |
dfStates |
a data.frame with host ID, states and time in columns |
colStates |
a vector of the column names for id, time and states |
the updated HostSet
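The long layout expected by `dfStates` (one row per host, time and state) differs from the wide layout used by `loadStates` (one column per state, with time as the value). A base-R sketch of going from one to the other is shown below. The state names `infected` and `detected` and all values are invented for illustration, and the `setStates` call is commented out because it requires the package.

```r
# Wide layout: one column per state, time as the cell value (invented data)
wide <- data.frame(
  ID       = c("1", "2"),
  infected = c(10, 25),
  detected = c(15, 32),
  stringsAsFactors = FALSE
)
state_cols <- c("infected", "detected")

# Long layout: columns ID, time and value, matching the default colStates
long <- do.call(rbind, lapply(state_cols, function(s) {
  data.frame(ID = wide$ID, time = wide[[s]], value = s,
             stringsAsFactors = FALSE)
}))
long <- long[order(long$ID, long$time), ]
rownames(long) <- NULL
print(long)

# With SMITIDstruct loaded, the call would presumably be:
# lhost <- setStates(lhost, long)
```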
Simulate states from source infections
lhost |
a HostSet |
lhost updated with states derived from the source infection times
Viral population data containing genotypes
Host identifier
Observation time as numeric since 1970/01/01
Number of variants
list of variant IDs sharing the same sequence
all variant genotypes (as DNAStringSet)
proportion of each variant