Title: | Create and Manipulate Vocalisation Diagrams |
---|---|
Description: | Create adjacency matrices of vocalisation graphs from dataframes containing sequences of speech and silence intervals, transforming these matrices into Markov diagrams, and generating datasets for classification of these diagrams by 'flattening' them and adding global properties (functionals) etc. Vocalisation diagrams date back to early work in psychiatry (Jaffe and Feldstein, 1970) and social psychology (Dabbs and Ruback, 1987) but have only recently been employed as a data representation method for machine learning tasks including meeting segmentation (Luz, 2012) <doi:10.1145/2328967.2328970> and classification (Luz, 2013) <doi:10.1145/2522848.2533788>. |
Authors: | Saturnino Luz [aut, cre] |
Maintainer: | Saturnino Luz <[email protected]> |
License: | GPL-3 |
Version: | 0.8.4 |
Built: | 2024-10-31 21:26:47 UTC |
Source: | CRAN |
S. Luz. Automatic identification of experts and performance prediction in the multimodal math data corpus through analysis of speech interaction. In Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI'13, pages 575–582, New York, NY, USA, 2013. ACM.
S. Luz. The non-verbal structure of patient case discussions in multidisciplinary medical team meetings. ACM Transactions on Information Systems, 30(3):17:1–17:24, 2012.
Dabbs, J. M. J. and Ruback, B. Dimensions of group process: Amount and structure of vocal interaction. Advances in Experimental Social Psychology 20, 123-169, 1987.
Jaffe, J. and Feldstein, S. Rhythms of dialogue. ser. Personality and Psychopathology. Academic Press, New York, 1970.
Useful links:
Report bugs at https://git.ecdf.ed.ac.uk/sluzfil/vocaldia/-/issues
Anonymise a vocalisation diagram
anonymise(vd)

## S3 method for class 'vocaldia'
anonymise(vd)

## Default S3 method:
anonymise(vd)
vd |
a vocalisation diagram (vocaldia object) |
"anonymise" a vocaldia
turn taking probability matrix by
replacing speaker names by variables is
the speaker who spoke the least and
the one who did the most
talking.
a new vocaldia with speaker names replaced by variables
s.t.
is the speaker who spoke the least
and
the one who did the most talking.
## Not run: 
data(vocdia)
x2 <- getSampledVocalMatrix(subset(atddia, id=='Abbott_Maddock_01'),
                            individual=TRUE, nodecolumn='speaker')
anonymise(x2)
## End(Not run)
appendSpeechRate: append pre-generated speech rate data (see audioproc.R)
appendSpeechRate(t, file = NULL)
t |
a table read through read.cha |
file |
speech rate file |
the data frame t with speech rates per utterance appended
Saturnino Luz
A dataset containing 38 dialogues (17 control patients, and 21 AD patients) and 7869 vocalisation events.
atddia
A data frame with 7869 rows and 7 variables:
The dialogue identifier
The start time of a speech turn or silence interval
The end time of a speech turn or silence interval
An identifier for the speaker of the turn, or Floor for silence.
The speaker's role (patient, interviewer, other, or Floor)
The transcription of the turn (blanked out for anonymity)
The diagnosis (ad or nonad)
This dataset was generated from the Carolina Conversations Collection, and used in the work described in De La Fuente, Albert and Luz: "Detecting cognitive decline through dialogue processing", 2017. For the full data set, please contact the Medical University of South Carolina (MUSC) http://carolinaconversations.musc.edu/
Compute the entropy of a distribution.
getEntropy(distribution)
distribution |
a probability distribution. |
Compute the entropy of a distribution.
a numeric value.
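For intuition, Shannon entropy can be sketched in base R. This is an illustration, not the package's implementation; in particular, the base of the logarithm (2 here) is an assumption to verify against getEntropy() itself:

```r
## Hedged sketch, not the package's getEntropy():
## Shannon entropy H = -sum(p * log2(p)) over non-zero probabilities.
entropy_sketch <- function(p) {
  p <- p[p > 0]            # drop zero-probability events (0*log(0) := 0)
  -sum(p * log2(p))
}
entropy_sketch(c(0.5, 0.5))  # a fair coin: 1 bit
entropy_sketch(c(1, 0, 0))   # a certain outcome: 0 bits
```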
getIDs get speaker IDs from CHA content
getIDs(text)
text |
a string vector containing the lines of a CHA file |
a vector with participant IDs
Saturnino Luz
Identify the type of pause between vocalisations.
getPauseType(prevspeaker, nextspeaker)
prevspeaker |
speaker of the vocalisation immediately before Floor |
nextspeaker |
speaker of the vocalisation immediately after Floor |
The type of pause a 'Floor' (silence) event represents can be: 'Pause', 'SwitchingPause', 'GrpPause', or 'GrpSwitchingPause'. See (Luz, 2013) for details.
the pause type.
getPauseType('a', 'b')
## [1] "SwitchingPause"
getPauseType('a', 'Grp')
## [1] "SwitchingPause"
getPauseType('Grp', 'Grp')
## [1] "GrpPause"
getPauseType('Grp', 'a')
## [1] "GrpSwitchingPause"
getPauseType('a', 'a')
## [1] "Pause"
getPID: get study-wide unique patient IDs from CHA content
getPID(text)
text |
a string vector containing the lines of a CHA file |
a vector with participant IDs
Saturnino Luz
Conditional (transition) probability
getPofAgivenB(a, b, ttarray)
a |
target node |
b |
source node |
ttarray |
adjacency matrix |
Retrieve P(a|b), the probability of a transition from b to a in an adjacency matrix.
a transition probability.
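For intuition, this conditional probability can be sketched in base R by normalising a row of a transition count matrix. This is a hypothetical sketch, not the package's getPofAgivenB(); it assumes rows index the source node, which should be checked against vocaldia's ttarray layout:

```r
## Hedged sketch: P(a|b) as the b -> a transition count normalised
## by all transitions out of b (rows assumed to index the source).
p_a_given_b <- function(a, b, counts) {
  counts[b, a] / sum(counts[b, ])
}
tt <- matrix(c(0, 3, 1,
               2, 0, 2,
               1, 1, 0), nrow = 3, byrow = TRUE,
             dimnames = rep(list(c('A', 'B', 'Floor')), 2))
p_a_given_b('B', 'A', tt)   # 3 / (3 + 1) = 0.75
```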
Generate a count vocalisation diagram through 'sampling'.
getSampledVocalCountMatrix(
  cdf,
  rate = 1,
  individual = FALSE,
  noPauseTypes = FALSE,
  begin = "begin",
  end = "end",
  nodecolumn = "role"
)
cdf |
a data frame consisting, minimally, of a column for vocalisation/pause start times, a column for end times, and a column identifying the speaker, speaker role or 'Floor' (for silences). |
rate |
the rate at which to sample the vocalisation events (in seconds) |
individual |
whether to include individual speakers or group them into a single Vocalisation node |
noPauseTypes |
if TRUE, ignore distinctions between pauses (SwitchingPause, GrpSwitchingPause, etc) |
begin |
the name of the column containing the start time of the vocalisation event in a row. |
end |
the name of the column containing the end time of the vocalisation event in the same row. |
nodecolumn |
the name of the column containing the node (speaker) name (e.g. 'speaker', 'role'). |
A vocalisation diagram (vocaldia) is a representation of a dialogue as a Markov process whose cell <m,n> contains the transition probability from node n to node m. This function generates a vocaldia for 'cases' (an identifier for a case, or a vector of identifiers identifying a set of cases) in the given data frame, obtained by sampling the timeline every 'rate'-th second.
a vocaldia object, consisting of a vocalisation matrix (vocmatrix) where cell <m,n> contains the counts of transitions from node n to node m, and a table of prior probabilities (stationary distribution) per node.
(Luz, 2013)
data(vocdia)
getSampledVocalCountMatrix(subset(atddia, id=='Abbott_Maddock_01'),
                           nodecolumn='role')
Generate a probabilistic vocalisation diagram through 'sampling'.
getSampledVocalMatrix(df, ...)
df |
a data frame consisting, minimally, of a column for vocalisation/pause start times, a column for end times, and a column identifying the speaker, speaker role or 'Floor' (for silences). |
... |
general parameters to be passed to getSampledVocalCountMatrix |
A vocalisation diagram (vocaldia) is a representation of a dialogue as a Markov process whose cell <m,n> contains the transition probability from node n to node m.
a vocaldia object, consisting of a vocalisation matrix (vocmatrix) where cell <m,n> contains the transition probability from node n to node m, and a table of prior probabilities (stationary distribution) per node.
Saturnino Luz [email protected]
S. Luz. Automatic identification of experts and performance prediction in the multimodal math data corpus through analysis of speech interaction. In Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI'13, pages 575–582, New York, NY, USA, 2013. ACM.
data(vocdia)
getSampledVocalMatrix(subset(atddia, id=='Abbott_Maddock_01'),
                      nodecolumn='speaker', individual=TRUE)
getSilences read silences file
getSilences(file, sildir = NULL, silsuffix = "c.mp3.csv")
file |
CSV formatted silences file |
sildir |
dir where silence files are |
silsuffix |
suffix for silence files |
a data frame of silences
Saturnino Luz
getSyllablesAndSilences: process Praat's grid for syllable nuclei, based on De Jong's approach
getSyllablesAndSilences(txtgrid)
txtgrid |
Path to Praat grid file generated by praat-syllable-syllable-nuclei-v2 |
list of syllables and silences
Saturnino Luz
De Jong, N. H. and Wempe, T. (2009). Praat script to detect syllable nuclei and measure speech rate automatically. Behavior Research Methods, 41(2):385–390, May.
getTranscript
getTranscript(text)
text |
a string vector containing the lines of a CHA file |
a list of transcriptions (participant and interviewer utterances)
Saturnino Luz
Generate a vocalisation diagram with absolute vocalisation durations.
getTurnTakingMatrix(
  df,
  begin = "begin",
  end = "end",
  nodecolumn = "role",
  individual = FALSE,
  noPauseTypes = FALSE
)
df |
a data frame consisting, minimally, of a column for vocalisation/pause start times, a column for end times, and a column identifying the speaker, speaker role or 'Floor' (for silences). |
begin |
the name of the column containing the start time of the vocalisation event in a row. |
end |
the name of the column containing the end time of the vocalisation event in the same row. |
nodecolumn |
the name of the column containing the node (speaker) name (e.g. 'speaker', 'role'). |
individual |
whether to include individual speakers or group them into a single Vocalisation node |
noPauseTypes |
if TRUE, ignore distinctions between pauses (SwitchingPause, GrpSwitchingPause, etc) |
A vocalisation diagram (vocaldia) is a representation of a dialogue as a Markov process whose cell <m,n> contains the transition probability from node n to node m. Unlike getSampledVocalCountMatrix, this function accumulates event durations directly, therefore resulting in no self-transitions (in general).
a vocaldia object, consisting of a vocalisation matrix (vocmatrix) where cell <m,n> contains the counts of transitions from node n to node m, and a table of absolute durations of vocalisation events.
S. Luz. Automatic identification of experts and performance prediction in the multimodal math data corpus through analysis of speech interaction. In Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI'13, pages 575–582, New York, NY, USA, 2013. ACM.
(Luz, 2013) and getTurnTakingMatrix.
x <- subset(atddia, id=='Abbott_Maddock_01')
getTurnTakingMatrix(x)
getTurnTakingMatrix(x, individual=TRUE)
Convert a data frame into a vocalisation diagram using counts rather than sampling.
getTurnTakingProbMatrix(df, individual = FALSE, ...)
df |
a data frame consisting, minimally, of a column for vocalisation/pause start times, a column for end times, and a column identifying the speaker, speaker role or 'Floor' (for silences). |
individual |
whether to include individual speakers or group them into a single Vocalisation node |
... |
other parameters to be passed to getTurnTakingMatrix |
Unlike getSampledVocalMatrix, this function is based on transition counts rather than sampled intervals. As a result, whereas in this version self-transitions will always be set to 0 (since a vocalisation by a speaker is never followed by another vocalisation by the same speaker), in the sampled version self-transitions will usually dominate the distribution, since the speaker who is speaking now is very likely to be the one who was speaking one second ago.
a vocaldia object, consisting of a vocalisation matrix (vocmatrix) where cell <m,n> contains the probability of a transition from node n to node m, and a table of prior probabilities (stationary distribution) per node.
(Luz, 2013) and getTurnTakingMatrix.
S. Luz. Automatic identification of experts and performance prediction in the multimodal math data corpus through analysis of speech interaction. In Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI'13, pages 575–582, New York, NY, USA, 2013. ACM.
x <- subset(atddia, id=='Abbott_Maddock_01')
getTurnTakingProbMatrix(x)
getTurnTakingProbMatrix(x, individual=TRUE)
Identify turn types
getTurnType(
  df,
  i,
  individual = FALSE,
  nodecolumn = "speaker",
  noPauseTypes = F
)
df |
a data frame consisting, minimally, of a column for vocalisation/pause start times, a column for end times, and a column identifying the speaker, speaker role or 'Floor' (for silences). |
i |
the identifier (index number) whose type will be returned |
individual |
if TRUE, return the identifier, a Pause or Grp |
nodecolumn |
the name of the column containing the node (speaker) name (e.g. 'speaker', 'role'). |
noPauseTypes |
if TRUE, ignore distinctions between pauses (SwitchingPause, GrpSwitchingPause, etc) |
Return one of Vocalisation, GrpVocalisation, ... or identifier.
a string containing the turn type or identifier.
data(vocdia)
atddia[1:10,]
getTurnType(atddia, 3, nodecolumn='role') ## a vocalisation
getTurnType(atddia, 4, nodecolumn='role') ## a pause
Identify group vocalisations
identifyGrpVocalisations(vocvector)
vocvector |
a character vector containing a sequence of vocalisation events |
Standardise identifier for group vocalisations
A vector with all events replaced by the appropriate type identifier.
data(vocdia)
identifyGrpVocalisations(atddia$speaker[1:60])
Assign types to the pauses (Floor events) in a sequence
identifyPauses(vocvector)
vocvector |
a character vector containing a sequence of vocalisation events |
Identify the pauses in a vector as one of the pauses in pauseTypes.
A vector with all Floor events replaced by the appropriate pause type identifier.
data(vocdia)
identifyPauses(atddia$speaker[1:60])
Identify switching vocalisations
identifyVocalisations(vocvector, idswitchvoc = T)
vocvector |
a character vector containing a sequence of vocalisation events |
idswitchvoc |
if TRUE, distinguish between SwitchingVocalisation and Vocalisation. |
SwitchingVocalisation is a vocalisation that signals an immediate speaker transition; that is, a transition from speaker to speaker (as opposed to speaker to Grp or speaker to Pause).
E.g (speakers A, B, C):
AAAAAAAABBBBBBBCCCCCBBBBBPauseBBBBSwitchingPauseAAAAAGrp
       ^      ^    ^    ^        ^                  ^
       |      |    |    |        |                  |
       |      |    |     --------+------------------+--> Non-SwitchingVocalisation
       |      |    |
        ------+----+------------------------------------> SwitchingVocalisation
A vector with all events replaced by the appropriate type identifier.
data(vocdia)
identifyVocalisations(atddia$speaker[1:60])
Create an igraph vocalisation diagram
igraph.vocaldia(vd, ...)
vd |
a vocalisation diagram |
... |
arguments for the layout algorithm |
Create a vocalisation diagram
an igraph
data(vocdia)
if (require('igraph'))
   igraph.vocaldia(getSampledVocalMatrix(subset(atddia, id=='Abbott_Maddock_01'),
                   individual=TRUE, nodecolumn='speaker'))
makeSessionDataSet: create a data frame for a session (e.g. cookie scene description)
makeSessionDataSet(
  f,
  sildir = NULL,
  silsuffix = "c.mp3.csv",
  srdir = "../data/ADReSS/speech_rate/",
  srsuffix = "sra",
  sprate = T
)
f |
CHA file to read |
sildir |
directory where silence profiles are stored |
silsuffix |
suffix for silence files |
srdir |
directory where speech rate csv (1 value per utterance) files are stored |
srsuffix |
the suffix of the speech rate files (default: 'sra') |
sprate |
estimate speech rate? (default: TRUE) |
a speech session data frame
Saturnino Luz
Build a data frame with vocalisation statistics
makeVocalStatsDataset(
  dir = c("data/Pitt/Dementia/cookie", "data/Pitt/Control/cookie"),
  sildir = NULL,
  silsuffix = "c.mp3.csv",
  srdir = "data/Pitt/speech_rate/",
  srsuffix = "sra",
  sprate = T
)
dir |
a string or vector containing the location (directory path) of the DementiaBank transcript files (.cha files) |
sildir |
directory where silence csv files are stored |
silsuffix |
the suffix of the silence profile files 'c.mp3.csv'. The format of such files should be the format used by Audacity label files, i.e. 'start time, end time, label' (without header), where 'label' should be 'silence' |
srdir |
directory where speech rate csv (1 value per utterance) files are stored |
srsuffix |
the suffix of the speech rate files (default: 'sra') |
sprate |
compute speech rate? (not in use yet) |
a session's vocalisation feature stats
## Not run: 
makeVocalStatsDataset(dir=c('ADReSS-IS2020-data/train/transcription/cc/',
                            'ADReSS-IS2020-data/train/transcription/cd/'),
                      sildir='ADReSS/silence/',
                      srdir='ADReSS/speech_rate/',
                      silsuffix='.wav-sil.csv')
## End(Not run)
Matrix exponentials
matrixExp(matrix, exp, mmatrix = matrix)
matrix |
a matrix |
exp |
the power to which matrix will be raised |
mmatrix |
a placeholder. |
A (sort of) exponential function for matrix multiplication (to be used with staticMatrix).
matrix^exp
data(vocdia)
matrixExp(vocmatrix$ttarray, 3)
Replace identified pause types in a data frame.
namePauses(df, nodecolumn = "role")
df |
a data frame consisting, minimally, of a column for vocalisation/pause start times, a column for end times, and a column identifying the speaker, speaker role or 'Floor' (for silences). |
nodecolumn |
the name of the column containing the node (speaker) name (e.g. 'speaker', 'role'). |
Replace all 'Floor' speakers in df by 'Pause', 'SwitchingPause', etc, and return a new data frame containing pause types in place of 'Floor' (see markov.R, identifyPauses() for a better implementation).
a data.frame with pauses in nodecolumn replaced by different pause types.
identifyPauses, for a better implementation.
data(vocdia)
x <- subset(atddia, id=='Abbott_Maddock_01')
x[1:15,1:6]
namePauses(x)[1:15,1:6]
Visualise convergence properties of vocalisation graphs
## S3 method for class 'matrixseries'
plot(x, ..., par = list(), interact = F)
x |
an object of class matrixseries: a list where each element is either the initial matrix or the product of the two preceding matrices |
... |
extra graphics parameters for plot. |
par |
a list of graphic parameters |
interact |
if TRUE, pauses the drawing after each node. |
A 'toy' for visualisation of convergence properties of vocalisation graphs. Plot the convergence paths of each vocalisation event (i.e. each row-column transition probability, grouped by colour according to the incident node).
the matrixseries
data(vocdia)
plot(staticMatrix(vocmatrix$ttarray, digits=4, history=TRUE))
Plot a vocalisation diagram
## S3 method for class 'vocaldia'
plot(x, ...)
x |
a vocalisation diagram |
... |
arguments for the layout algorithm |
Plot a vocalisation diagram
NULL
data(vocdia)
if (require('igraph'))
   plot(getSampledVocalMatrix(subset(atddia, id=='Abbott_Maddock_01'),
        individual=TRUE, nodecolumn='speaker'))
Generate ARFF files from vocalisation diagrams
printARFFfile(
  df,
  ids = c(),
  idcolumn = "id",
  noPauseTypes = F,
  sampled = 0,
  individual = TRUE,
  nodecolumn = "role",
  classcolumn = "dx",
  file = ""
)
df |
a data frame consisting, minimally, of a column for vocalisation/pause start times, a column for end times, and a column identifying the speaker, speaker role or 'Floor' (for silences). |
ids |
Ids of dialogues to generate (as defined in column named idcolumn) |
idcolumn |
the name of the column containing the dialogue id |
noPauseTypes |
if TRUE, ignore distinctions between pauses (SwitchingPause, GrpSwitchingPause, etc) |
sampled |
if >0, sample the timeline (through getSampledVocalCountMatrix) rather than counting turn transitions |
individual |
whether to include individual speakers or group them into a single Vocalisation node |
nodecolumn |
the name of the column containing the node (speaker) name (e.g. 'speaker', 'role'). |
classcolumn |
the name of the column containing the target class (or value). |
file |
name of ARFF file to be generated, or "" (print to console). |
Use this function to generate turn-taking diagrams in ARFF format for machine learning tools such as WEKA.
S. Luz. Automatic identification of experts and performance prediction in the multimodal math data corpus through analysis of speech interaction. In Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI'13, pages 575–582, New York, NY, USA, 2013. ACM.
getSampledVocalCountMatrix, getTurnTakingProbMatrix.
data(vocdia)
atdarff <- tempfile(pattern='vocaldia-', fileext='arff')
printARFFfile(atddia, individual=TRUE, classcolumn='dx',
              file=atdarff, noPauseTypes=FALSE)
library("foreign")
x1 <- read.arff(atdarff)
x1[1:3,]
## remove empty columns
x1[,c(unlist(apply(x1[1:(ncol(x1)-1)],2,sum)!=0), TRUE)]
read.cha: read CHA transcription file (format used by DementiaBank)
read.cha(file, sildir = NULL, silsuffix = "c.mp3.csv")
file |
.cha file to read |
sildir |
silences directory |
silsuffix |
silence files suffix |
a list containing the PID, a data frame containing the speaker IDs and demographics, and a data frame containing the speaker IDs, transcribed utterances, start and end times, speech rates, etc.
Saturnino Luz
Access initial matrix in a matrixseries
startmatrix(mseries)

## Default S3 method:
startmatrix(mseries)

## S3 method for class 'matrixseries'
startmatrix(mseries)
mseries |
a matrixseries object |
Access initial matrix in a matrixseries
the initial matrix.
## Not run: 
data(vocdia)
x2 <- staticMatrix(vocmatrix$ttarray, digits=4, history=TRUE)
## original matrix
startmatrix(x2)
## End(Not run)
Compute the stationary distribution for a Markov diagram
staticMatrix(matrix, limit = 1000, digits = 4, history = F)
matrix |
an adjacency matrix of transition probabilities |
limit |
maximum number of iterations until we give up on convergence |
digits |
the number of decimal places to compare |
history |
if TRUE, keep track of all matrix products |
Return the static matrix (i.e. the stationary distribution) for the Markov process represented by the given adjacency matrix. In the particular case of vocaldias, each column should roughly correspond to the amount of time a speaker held the floor. Of course, not all Markov chains converge; an example being:
      1
  /------>\
 A         B
  \<------/
      1

which gives

M = | 0 1 |   and   M^2 = | 0x0+1x1  0x1+1x0 | = | 1 0 |
    | 1 0 |               | 1x0+0x1  1x1+0x0 |   | 0 1 |
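This non-converging two-state chain can be checked directly in base R. This is a sketch for intuition only; staticMatrix itself additionally rounds to 'digits' decimal places when testing for convergence:

```r
## The periodic two-state chain: repeated multiplication oscillates
## between M and the identity, so no stationary matrix is ever reached.
M  <- matrix(c(0, 1,
               1, 0), nrow = 2, byrow = TRUE)
M2 <- M %*% M     # the identity matrix
M3 <- M2 %*% M    # equal to M again: a period-2 oscillation
all(M3 == M)      # TRUE
```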
a matrixseries object; that is, a list where each element is either the initial matrix or the product of the two preceding matrices
data(vocdia)
x2 <- staticMatrix(vocmatrix$ttarray, digits=4, history=TRUE)
## original matrix
round(x2[[1]],3)
## stationary matrix (M^139)
round(x2[[length(x2)]],3)
Create vocalisation diagram to file in dot (graphviz) notation
toDotNotation(
  vd,
  individual = T,
  varsizenode = T,
  shape = "circle",
  fontsize = 16,
  rankdir = "LR",
  nodeattribs = "fixedsize=true;",
  comment = ""
)
vd |
a vocalisation diagram |
individual |
if TRUE write individual node names |
varsizenode |
if true set varsizenode in dot |
shape |
node shape |
fontsize |
font size |
rankdir |
direction of ranking (LR, RL, etc) |
nodeattribs |
attributes for node |
comment |
comments |
Create a vocalisation diagram in dot notation
character data containing the diagram in dot format.
graphviz manual
data(vocdia)
toDotNotation(getSampledVocalMatrix(subset(atddia, id=='Abbott_Maddock_01'),
              individual=TRUE, nodecolumn='speaker'))
A vocaldia object containing a 3-speaker dialogue
vocmatrix
A list containing 2 arrays
The vocaldia adjacency matrix
The proportional durations (stationary probabilities) of each event (node)
This dataset was generated from the Multimodal Learning Analytics dataset, for the eponymous ICMI'13 Grand Challenge. The use these vocaldias were put to is described in Luz (2013). The full dataset and code are available at https://gitlab.scss.tcd.ie/saturnino.luz/icmi-mla-challenge
S. Luz. Automatic identification of experts and performance prediction in the multimodal math data corpus through analysis of speech interaction. In Proceedings of the 15th ACM on International Conference on Multimodal Interaction, ICMI'13, pages 575–582, New York, NY, USA, 2013. ACM.
Write vocalisation diagram to file in dot (graphviz) notation
write.vocaldia(vd, file = "", ...)
vd |
a vocalisation diagram |
file |
name of file to which dot diagram will be written; if "", write to STDOUT. |
... |
arguments passed on to toDotNotation. |
Write a vocalisation diagram
NULL
data(vocdia)
write.vocaldia(getSampledVocalMatrix(subset(atddia, id=='Abbott_Maddock_01'),
               individual=TRUE, nodecolumn='speaker'),
               file=tempfile(pattern='vocaldia-', fileext='.dot'))