Package 'ggseqplot'

Title: Render Sequence Plots using 'ggplot2'
Description: A set of wrapper functions that reproduces most of the sequence plots rendered with TraMineR::seqplot(). Whereas 'TraMineR' uses base R to produce the plots, this package draws on 'ggplot2'. The plots are produced on the basis of a sequence object defined with TraMineR::seqdef(). The package automates the reshaping and plotting of sequence data. Resulting plots are of class 'ggplot', i.e. components can be added and tweaked using '+' and regular 'ggplot2' functions.
Authors: Marcel Raab [aut, cre]
Maintainer: Marcel Raab
License: GPL (>= 3)
Version: 0.8.5
Built: 2024-10-30 09:25:02 UTC
Source: CRAN



Sequence Distribution Plot

Description

Function for rendering state distribution plots with ggplot2 (Wickham 2016) instead of base R's plot function that is used by TraMineR::seqplot (Gabadinho et al. 2011).

Usage

ggseqdplot(
  seqdata,
  no.n = FALSE,
  group = NULL,
  dissect = NULL,
  weighted = TRUE,
  with.missing = FALSE,
  border = FALSE,
  with.entropy = FALSE,
  linetype = "dashed",
  linecolor = "black",
  linewidth = 1,
  facet_ncol = NULL,
  facet_nrow = NULL,
  ...
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function.

no.n

Specifies whether the number of (weighted) sequences is suppressed; default is FALSE, i.e. the number is shown

group

A vector of the same length as the sequence data indicating group membership. When not NULL, a distinct plot is generated for each level of group.

dissect

If "row" or "col" is specified, separate distribution plots are displayed instead of a stacked plot; "row" and "col" display the distributions in one row or one column, respectively; default is NULL

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. weights are used if available.

with.missing

Specifies if missing states should be considered when computing the state distributions (default is FALSE).

border

If TRUE, bars are plotted with a black outline; default is FALSE (also accepts NULL)

with.entropy

Adds a line plot of cross-sectional entropies at each sequence position; default is FALSE

linetype

The linetype for the entropy subplot (with.entropy == TRUE) can be specified with an integer (0-6) or name (0 = blank, 1 = solid, 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash); default is "dashed"

linecolor

Specifies the color of the entropy line if with.entropy==TRUE; default is "black"

linewidth

Specifies the width of the entropy line if with.entropy==TRUE; default is 1

facet_ncol

Number of columns in faceted (i.e. grouped) plot

facet_nrow

Number of rows in faceted (i.e. grouped) plot

...

If group is specified, additional arguments of ggplot2::facet_wrap such as "labeller" or "strip.position" can be used to change the appearance of the plot. This does not work if dissect is used.

Details

Sequence distribution plots visualize the distribution of all states by rendering a series of stacked bar charts at each position of the sequence. Although this type of plot has been used in life course studies for several decades (see Blossfeld (1987) for an early application), it should be noted that the sizes of the different bars in stacked bar charts can be difficult to compare, particularly if the alphabet comprises many states (Wilke 2019). This issue can be addressed by breaking down the aggregated distribution with the dissect argument. Moreover, it is important to keep in mind that this plot type does not visualize individual trajectories; instead, it displays aggregated distributional information (repeated cross-sections). For a more detailed discussion of this type of sequence visualization see, for example, Brzinsky-Fay (2014), Fasang and Liao (2014), and Raab and Struffolino (2022).

The function uses TraMineR::seqstatd to obtain state distributions (and entropy values). This requires that the input data (seqdata) are stored as a state sequence object (class stslist) created with the TraMineR::seqdef function. The state distributions are reshaped into a long data format to enable plotting with ggplot2. The stacked bars are rendered by calling geom_bar; if with.entropy = TRUE, entropy values are plotted with geom_line. If the group or the dissect argument is specified, the sub-plots are produced by using facet_wrap. If both are specified, the plots are rendered with facet_grid.
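For readers who want to see what the reshaping step amounts to, here is a minimal base-R sketch on a small made-up state matrix (hypothetical data, not actcal). It computes the unweighted share of each state at each position, the kind of cross-sectional distribution TraMineR::seqstatd returns, and stacks it into a long data frame suitable for geom_bar. It ignores weights and missing states, and the column names state, position and freq are illustrative assumptions rather than the names ggseqdplot uses internally.

```r
# Hypothetical toy data: 4 sequences observed at 3 positions
seqmat <- matrix(c("A", "A", "B",
                   "A", "B", "B",
                   "B", "B", "C",
                   "A", "C", "C"),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(NULL, paste0("T", 1:3)))
alphabet <- c("A", "B", "C")

# Relative frequency of each state at each position (states x positions);
# unweighted, missing states ignored
dist <- sapply(colnames(seqmat), function(pos) {
  counts <- table(factor(seqmat[, pos], levels = alphabet))
  as.numeric(counts) / sum(counts)
})
rownames(dist) <- alphabet

# Long format: one row per state and position (column-major order of dist)
dist_long <- data.frame(
  state = rep(alphabet, times = ncol(dist)),
  position = rep(colnames(dist), each = length(alphabet)),
  freq = as.vector(dist)
)
print(dist_long)
```

Each position's shares sum to 1, which is what makes the bars stack to full height.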

The data and specifications used for rendering the plot can be obtained by storing the plot as an object. The appearance of the plot can be adjusted just like with every other ggplot (e.g., by changing the theme or the scale using + and the respective functions).

Value

A sequence distribution plot created by using ggplot2. If stored as object the resulting list object (of class gg and ggplot) also contains the data used for rendering the plot.

Author(s)

Marcel Raab

References

Blossfeld H (1987). “Labor-Market Entry and the Sexual Segregation of Careers in the Federal Republic of Germany.” American Journal of Sociology, 93(1), 89–118. doi:10.1086/228707.

Brzinsky-Fay C (2014). “Graphical Representation of Transitions and Sequences.” In Blanchard P, Bühlmann F, Gauthier J (eds.), Advances in Sequence Analysis: Theory, Method, Applications, Life Course Research and Social Policies, 265–284. Springer, Cham. doi:10.1007/978-3-319-04969-4_14.

Fasang AE, Liao TF (2014). “Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots.” Sociological Methods & Research, 43(4), 643–676. doi:10.1177/0049124113506563.

Gabadinho A, Ritschard G, Müller NS, Studer M (2011). “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software, 40(4), 1–37. doi:10.18637/jss.v040.i04.

Raab M, Struffolino E (2022). Sequence Analysis, volume 190 of Quantitative Applications in the Social Sciences. SAGE, Thousand Oaks, CA. https://sa-book.github.io/.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, Use R!, 2nd edition. Springer, Cham. doi:10.1007/978-3-319-24277-4.

Wilke C (2019). Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O'Reilly Media, Sebastopol, CA. ISBN 978-1-4920-3108-6.

Examples

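library(TraMineR)
library(ggseqplot)
library(ggplot2)  # needed for the layout functions used in the last example

# Use example data from TraMineR: actcal data set
data(actcal)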

# We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal), 300), ]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

# state distribution plots; grouped by sex
# with TraMineR::seqplot
seqdplot(actcal.seq, group = actcal$sex)
# with ggseqplot
ggseqdplot(actcal.seq, group = actcal$sex)
# with ggseqplot applying a few additional arguments, e.g. entropy line
ggseqdplot(actcal.seq, group = actcal$sex,
           no.n = TRUE, with.entropy = TRUE, border = TRUE)

# break down the stacked plot to ease comparisons of distributions
ggseqdplot(actcal.seq, group = actcal$sex, dissect = "row")

# make use of ggplot functions for modifying the plot
ggseqdplot(actcal.seq) +
  scale_x_discrete(labels = month.abb) +
  labs(title = "State distribution plot", x = "Month") +
  guides(fill = guide_legend(title = "Alphabet")) +
  theme_classic() +
  theme(plot.title = element_text(size = 30,
                                  margin = margin(0, 0, 20, 0)),
        plot.title.position = "plot")

Sequence Entropy Plot

Description

Function for plotting the development of cross-sectional entropies across sequence positions with ggplot2 (Wickham 2016) instead of base R's plot function that is used by TraMineR::seqplot (Gabadinho et al. 2011). Unlike TraMineR::seqHtplot, which renders a separate plot for each group, ggseqeplot displays group-specific entropy lines in a common plot.

Usage

ggseqeplot(
  seqdata,
  group = NULL,
  weighted = TRUE,
  with.missing = FALSE,
  linewidth = 1,
  linecolor = "Okabe-Ito",
  gr.linetype = FALSE
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function.

group

A vector of the same length as the sequence data indicating group membership. If specified, the plot shows one line for each group.

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. weights are used if available.

with.missing

Specifies if missing states should be considered when computing the entropy index (default is FALSE).

linewidth

Specifies the width of the entropy line; default is 1

linecolor

Specifies the color palette for the line(s); default is "Okabe-Ito", which contains up to 9 colors (the first is black). If more than 9 lines are to be rendered, the user has to specify an alternative color palette.

gr.linetype

Specifies if line type should vary by group; hence only relevant if group argument is specified; default is FALSE
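The 9-color limit mentioned for linecolor can be illustrated with the Okabe-Ito palette that ships with base R (grDevices::palette.colors, available since R 4.0.0). The helper pick_line_colors below is hypothetical and only mirrors the documented behavior; it is not how ggseqeplot handles colors internally.

```r
# Okabe-Ito palette from base R (grDevices, R >= 4.0.0): 9 colors, first is black
okabe_ito <- unname(grDevices::palette.colors(palette = "Okabe-Ito"))

# Hypothetical helper: one color per group, mirroring the documented limit
pick_line_colors <- function(n_groups, palette = okabe_ito) {
  if (n_groups > length(palette)) {
    stop("Only ", length(palette), " colors available; ",
         "supply an alternative palette via 'linecolor'.")
  }
  palette[seq_len(n_groups)]
}

pick_line_colors(2)  # two colors, e.g. for a grouping by sex
```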

Details

The function uses TraMineR::seqstatd to compute entropies. This requires that the input data (seqdata) are stored as state sequence object (class stslist) created with the TraMineR::seqdef function.
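As a rough illustration of the quantity being plotted, the sketch below computes the Shannon entropy of the cross-sectional state distribution at each position of a made-up state matrix. It assumes the normalization by the log of the alphabet size that TraMineR::seqstatd applies by default (norm = TRUE), so values range from 0 (all sequences in the same state) to 1 (all states equally frequent); weights and missing states are ignored.

```r
# Hypothetical toy data: 4 sequences observed at 3 positions
seqmat <- matrix(c("A", "A", "B",
                   "A", "B", "B",
                   "B", "B", "C",
                   "A", "C", "C"),
                 nrow = 4, byrow = TRUE)
alphabet <- c("A", "B", "C")

# Shannon entropy of the states in one column,
# divided by log(alphabet size) so that values lie in [0, 1]
xsect_entropy <- function(states, alphabet) {
  p <- as.numeric(table(factor(states, levels = alphabet))) / length(states)
  p <- p[p > 0]  # 0 * log(0) is taken as 0
  -sum(p * log(p)) / log(length(alphabet))
}

# One entropy value per sequence position
entropies <- apply(seqmat, 2, xsect_entropy, alphabet = alphabet)
round(entropies, 3)
```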

The entropy values are plotted with geom_line. The data and specifications used for rendering the plot can be obtained by storing the plot as an object. The appearance of the plot can be adjusted just like with every other ggplot (e.g., by changing the theme or the scale using + and the respective functions).

Value

A line plot of entropy values at each sequence position. If stored as object the resulting list object also contains the data (long format) used for rendering the plot.

Author(s)

Marcel Raab

References

Gabadinho A, Ritschard G, Müller NS, Studer M (2011). “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software, 40(4), 1–37. doi:10.18637/jss.v040.i04.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, Use R!, 2nd edition. Springer, Cham. doi:10.1007/978-3-319-24277-4.

Examples

library(TraMineR)
library(ggseqplot)

# Use example data from TraMineR: actcal data set
data(actcal)

# We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal), 300), ]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

# entropy plots grouped by sex
# with TraMineR::seqplot (entropies shown in two separate plots)
seqHtplot(actcal.seq, group = actcal$sex)
# with ggseqplot (entropies shown in one plot)
ggseqeplot(actcal.seq, group = actcal$sex)
ggseqeplot(actcal.seq, group = actcal$sex, gr.linetype = TRUE)

# manual color specification
ggseqeplot(actcal.seq, linecolor = "darkgreen")
ggseqeplot(actcal.seq, group = actcal$sex,
           linecolor = c("#3D98D3FF", "#FF363CFF"))

Sequence Frequency Plot

Description

Function for rendering a sequence index plot of the most frequent sequences of a state sequence object using ggplot2 (Wickham 2016) instead of base R's plot function that is used by TraMineR::seqplot / TraMineR::plot.stslist.freq (Gabadinho et al. 2011).

Usage

ggseqfplot(
  seqdata,
  group = NULL,
  ranks = 1:10,
  weighted = TRUE,
  border = FALSE,
  proportional = TRUE,
  ylabs = "total",
  no.coverage = FALSE,
  facet_ncol = NULL,
  facet_nrow = NULL
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function.

group

A vector of the same length as the sequence data indicating group membership. When not NULL, a distinct plot is generated for each level of group.

ranks

specifies which of the most frequent sequences should be plotted; default is the first ten (1:10); if set to 0 all sequences are displayed

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. weights are used if available.

border

If TRUE, bars are plotted with a black outline; default is FALSE (also accepts NULL)

proportional

if TRUE (default), the sequence heights are displayed proportional to their frequencies

ylabs

defines appearance of y-axis labels; default ("total") only labels min and max (i.e. cumulative relative frequency); if "share" labels indicate relative frequency of each displayed sequence (note: overlapping labels are removed)

no.coverage

Specifies whether information on total coverage (shown as a caption or, if ylabs == "share", as part of the group/facet label) is suppressed; default is FALSE, i.e. the coverage is shown

facet_ncol

Number of columns in faceted (i.e. grouped) plot

facet_nrow

Number of rows in faceted (i.e. grouped) plot

Details

The subset of displayed sequences is obtained by an internal call of TraMineR::seqtab. The extracted sequences are plotted by a call of ggseqiplot which uses ggplot2::geom_rect to render the sequences. The data and specifications used for rendering the plot can be obtained by storing the plot as an object. The appearance of the plot can be adjusted just like with every other ggplot (e.g., by changing the theme or the scale using + and the respective functions).
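The frequency table behind the plot can be sketched in base R: identical sequences are collapsed into strings, counted, and ranked, and the cumulative relative frequency of the retained ranks gives the coverage that the y-axis reports when ylabs = "total". This unweighted sketch on made-up data only approximates TraMineR::seqtab; in particular, it makes no claim about how seqtab breaks ties between equally frequent sequences.

```r
# Hypothetical toy data: 6 sequences of length 3
seqmat <- matrix(c("A", "A", "B",
                   "A", "A", "B",
                   "A", "A", "B",
                   "B", "B", "C",
                   "B", "B", "C",
                   "A", "C", "C"),
                 nrow = 6, byrow = TRUE)

# Collapse each sequence into one string and count identical sequences
seq_strings <- apply(seqmat, 1, paste, collapse = "-")
freq <- sort(table(seq_strings), decreasing = TRUE)

# Keep the most frequent ranks; share per sequence and cumulative coverage
ranks <- 1:2
top <- freq[ranks]
share <- as.numeric(top) / length(seq_strings)
coverage <- cumsum(share)
data.frame(sequence = names(top), share = round(share, 3),
           coverage = round(coverage, 3))
```

Here the two most frequent sequences cover 5 of the 6 cases.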

Experienced ggplot2 users might notice the customized labeling of the y-axes in the faceted plots (i.e. plots with a specified group argument). This has been achieved by utilizing the very helpful ggh4x package.

Value

A sequence frequency plot created by using ggplot2. If stored as object the resulting list object (of class gg and ggplot) also contains the data used for rendering the plot.

Author(s)

Marcel Raab

References

Gabadinho A, Ritschard G, Müller NS, Studer M (2011). “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software, 40(4), 1–37. doi:10.18637/jss.v040.i04.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, Use R!, 2nd edition. Springer, Cham. doi:10.1007/978-3-319-24277-4.

See Also

ggseqiplot

Examples

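library(TraMineR)
library(ggseqplot)
library(ggplot2)  # needed for scale_x_discrete() and expansion() below

# Use example data from TraMineR: actcal data set
data(actcal)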

# We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal), 300), ]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

# sequence frequency plot
# with TraMineR::seqplot
seqfplot(actcal.seq)
# with ggseqplot
ggseqfplot(actcal.seq)
# with ggseqplot applying additional arguments and some layout changes
ggseqfplot(actcal.seq,
           group = actcal$sex,
           ranks = 1:5,
           ylabs = "share") +
  scale_x_discrete(breaks = 1:12,
                   labels = month.abb,
                   expand = expansion(add = c(0.2, 0)))

Sequence Index Plot

Description

Function for rendering sequence index plots with ggplot2 (Wickham 2016) instead of base R's plot function that is used by TraMineR::seqplot (Gabadinho et al. 2011).

Usage

ggseqiplot(
  seqdata,
  no.n = FALSE,
  group = NULL,
  sortv = NULL,
  weighted = TRUE,
  border = FALSE,
  facet_scale = "free_y",
  facet_ncol = NULL,
  facet_nrow = NULL,
  ...
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function.

no.n

Specifies whether the number of (weighted) sequences is suppressed in the y-axis title or group/facet title; default is FALSE, i.e. the number is shown

group

A vector of the same length as the sequence data indicating group membership. When not NULL, a distinct plot is generated for each level of group.

sortv

Vector of numerical values sorting the sequences or a sorting method (either "from.start" or "from.end"). See details.

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. weights are used if available.

border

If TRUE, bars are plotted with a black outline; default is FALSE (also accepts NULL)

facet_scale

Specifies if y-scale in faceted plot should be free ("free_y" is default) or "fixed"

facet_ncol

Number of columns in faceted (i.e. grouped) plot

facet_nrow

Number of rows in faceted (i.e. grouped) plot

...

If group is specified, additional arguments of ggplot2::facet_wrap such as "labeller" or "strip.position" can be used to change the appearance of the plot.

Details

Sequence index plots were introduced by Scherer (2001) and display each sequence as a horizontally stacked bar or line. For a more detailed discussion of this type of sequence visualization see, for example, Brzinsky-Fay (2014), Fasang and Liao (2014), and Raab and Struffolino (2022).

The function uses TraMineR::seqformat to reshape seqdata stored in wide format into a spell/episode format. Then the data are further reshaped into the long format, i.e. for every sequence each row in the data represents one specific sequence position. For example, if we have 5 sequences of length 10, the long file will have 50 rows. In the case of sequences of unequal length not every sequence will contribute the same number of rows to the long data.
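The wide-to-long reshaping, including the example of 5 sequences of length 10 yielding 50 rows, can be mimicked in base R. Unlike ggseqiplot, this sketch goes straight from wide to long without the intermediate spell format, and it marks positions after the end of a shorter sequence with NA (TraMineR itself uses a dedicated void symbol). The data and the helper to_long are made up for illustration.

```r
# Hypothetical wide data: 5 sequences of length 10 (one column per position)
set.seed(42)
wide <- matrix(sample(c("A", "B", "C"), 50, replace = TRUE),
               nrow = 5,
               dimnames = list(paste0("id", 1:5), paste0("T", 1:10)))

# Hypothetical helper: one row per sequence and position
to_long <- function(wide) {
  long <- data.frame(
    id = rep(rownames(wide), times = ncol(wide)),
    position = rep(seq_len(ncol(wide)), each = nrow(wide)),
    state = as.vector(wide)  # column-major, matching the rep() calls above
  )
  # Positions after the end of a shorter sequence (NA) contribute no row
  long <- long[!is.na(long$state), ]
  long[order(long$id, long$position), ]
}

long <- to_long(wide)
nrow(long)            # 50

# Unequal lengths: the last sequence ends after position 6
wide[5, 7:10] <- NA
nrow(to_long(wide))   # 46
```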

The reshaped data are used as input for rendering the index plot with ggplot2's geom_rect. ggseqiplot uses geom_rect instead of geom_tile because this allows for a straightforward implementation of weights. If weights are specified for seqdata and weighted = TRUE, the height of each sequence corresponds to its weight.
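To see how weights can translate into sequence heights, consider the sketch below: each sequence occupies a band between ymin and ymax whose height equals its weight, so the bands stack without gaps. These ymin/ymax values are the kind of input geom_rect expects; the weights are made up, and the stacking order and column names are assumptions, not ggseqiplot's internal code.

```r
# Hypothetical weights for 4 sequences, stacked bottom to top in this order
w <- c(1.5, 0.5, 1, 2)

# Each sequence occupies a band whose height equals its weight
ymax <- cumsum(w)
ymin <- ymax - w
bands <- data.frame(seq = seq_along(w), ymin = ymin, ymax = ymax)
bands

# With weighted = FALSE every band would have height 1 instead
```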

If weights and a grouping variable are used and facet_scale = "fixed", the values of the y-axis are not labeled, because ggplot2 reasonably does not allow for varying scales when the facet scale is fixed.

When a sortv is specified, the sequences are arranged in the order of its values. With sortv="from.start" sequence data are sorted according to the states of the alphabet in ascending order starting with the first sequence position, drawing on succeeding positions in the case of ties. Likewise, sortv="from.end" sorts a reversed version of the sequence data, starting with the final sequence position turning to preceding positions in case of ties.
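The two sorting rules can be sketched in base R by recoding states as their rank in the alphabet and ordering lexicographically over positions, left to right for "from.start" and right to left for "from.end". The data are made up, and ggseqiplot's own implementation may differ in details such as the treatment of missing or void states.

```r
# Hypothetical toy data; the alphabet order defines the order of states
alphabet <- c("A", "B", "C")
seqmat <- matrix(c("B", "A", "A",
                   "A", "C", "B",
                   "A", "B", "C",
                   "B", "A", "C"),
                 nrow = 4, byrow = TRUE)

# Recode each state as its position in the alphabet
codes <- matrix(match(seqmat, alphabet), nrow = nrow(seqmat))

# from.start: order by the first position, ties broken by later positions
ord_start <- do.call(order, as.data.frame(codes))
# from.end: the same on the reversed sequences
ord_end <- do.call(order, as.data.frame(codes[, ncol(codes):1]))

ord_start  # 3 2 1 4
ord_end    # 1 2 4 3
```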

Note that the default aspect ratio of ggseqiplot differs from that of TraMineR::seqIplot. This is most obvious when border = TRUE. You can change the ratio either by adding code to the ggseqiplot output (e.g., + theme(aspect.ratio = ...)) or by specifying the ratio when saving the plot with ggsave.

Value

A sequence index plot. If stored as object the resulting list object also contains the data (spell format) used for rendering the plot.

Author(s)

Marcel Raab

References

Brzinsky-Fay C (2014). “Graphical Representation of Transitions and Sequences.” In Blanchard P, Bühlmann F, Gauthier J (eds.), Advances in Sequence Analysis: Theory, Method, Applications, Life Course Research and Social Policies, 265–284. Springer, Cham. doi:10.1007/978-3-319-04969-4_14.

Fasang AE, Liao TF (2014). “Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots.” Sociological Methods & Research, 43(4), 643–676. doi:10.1177/0049124113506563.

Gabadinho A, Ritschard G, Müller NS, Studer M (2011). “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software, 40(4), 1–37. doi:10.18637/jss.v040.i04.

Raab M, Struffolino E (2022). Sequence Analysis, volume 190 of Quantitative Applications in the Social Sciences. SAGE, Thousand Oaks, CA. https://sa-book.github.io/.

Scherer S (2001). “Early Career Patterns: A Comparison of Great Britain and West Germany.” European Sociological Review, 17(2), 119–144. doi:10.1093/esr/17.2.119.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, Use R!, 2nd edition. Springer, Cham. doi:10.1007/978-3-319-24277-4.

Examples

library(TraMineR)
library(ggseqplot)

# Use example data from TraMineR: actcal data set
data(actcal)

# We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal), 300), ]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

# ex1 using weights
data(ex1)
ex1.seq <- seqdef(ex1, 1:13, weights = ex1$weights)

# sequences sorted by age in 2000 and grouped by sex
# with TraMineR::seqplot
seqIplot(actcal.seq, group = actcal$sex, sortv = actcal$age00)
# with ggseqplot
ggseqiplot(actcal.seq, group = actcal$sex, sortv = actcal$age00)

# sequences of unequal length with missing state, and weights
seqIplot(ex1.seq)
ggseqiplot(ex1.seq)

# ... turn weights off and add border
seqIplot(ex1.seq, weighted = FALSE, border = TRUE)
ggseqiplot(ex1.seq, weighted = FALSE, border = TRUE)

Modal State Sequence Plot

Description

Function for rendering a modal state sequence plot with ggplot2 (Wickham 2016) instead of base R's plot function that is used by TraMineR::seqplot (Gabadinho et al. 2011).

Usage

ggseqmsplot(
  seqdata,
  no.n = FALSE,
  barwidth = NULL,
  group = NULL,
  weighted = TRUE,
  with.missing = FALSE,
  border = FALSE,
  facet_ncol = NULL,
  facet_nrow = NULL
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function.

no.n

Specifies whether the number of (weighted) sequences is suppressed; default is FALSE, i.e. the number is shown

barwidth

specifies width of bars (default is NULL); valid range: (0, 1]

group

A vector of the same length as the sequence data indicating group membership. When not NULL, a distinct plot is generated for each level of group.

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. weights are used if available.

with.missing

Specifies if missing states should be considered when computing the state distributions (default is FALSE).

border

If TRUE, bars are plotted with a black outline; default is FALSE (also accepts NULL)

facet_ncol

Number of columns in faceted (i.e. grouped) plot

facet_nrow

Number of rows in faceted (i.e. grouped) plot

Details

The function uses TraMineR::seqmodst to obtain the modal states and their prevalence. This requires that the input data (seqdata) are stored as state sequence object (class stslist) created with the TraMineR::seqdef function.
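A minimal base-R sketch of what "modal state and prevalence" means: for each position, the most frequent state and its relative frequency. The data and the helper modal_state are made up, weights and missing states are ignored, and ties are resolved in favor of the state that comes first in the alphabet, which is an assumption of this sketch and not necessarily what TraMineR::seqmodst does.

```r
# Hypothetical toy data: 4 sequences observed at 3 positions
seqmat <- matrix(c("A", "A", "B",
                   "A", "B", "B",
                   "B", "B", "C",
                   "A", "C", "C"),
                 nrow = 4, byrow = TRUE)
alphabet <- c("A", "B", "C")

# Hypothetical helper: modal state and its prevalence in one column
modal_state <- function(states, alphabet) {
  freq <- table(factor(states, levels = alphabet)) / length(states)
  i <- which.max(freq)  # first maximum wins in case of ties
  data.frame(state = alphabet[i], prevalence = as.numeric(freq[i]),
             stringsAsFactors = FALSE)
}

modal <- do.call(rbind, lapply(seq_len(ncol(seqmat)),
                               function(j) modal_state(seqmat[, j], alphabet)))
modal$position <- seq_len(ncol(seqmat))
modal
```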

The data on the modal states and their prevalences are reshaped to be plotted with ggplot2::geom_bar. The data and specifications used for rendering the plot can be obtained by storing the plot as an object. The appearance of the plot can be adjusted just like with every other ggplot (e.g., by changing the theme or the scale using + and the respective functions).

Value

A modal state sequence plot. If stored as object the resulting list object also contains the data (long format) used for rendering the plot.

Author(s)

Marcel Raab

References

Gabadinho A, Ritschard G, Müller NS, Studer M (2011). “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software, 40(4), 1–37. doi:10.18637/jss.v040.i04.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, Use R!, 2nd edition. Springer, Cham. doi:10.1007/978-3-319-24277-4.

Examples

library(TraMineR)
library(ggseqplot)

# Use example data from TraMineR: actcal data set
data(actcal)

# We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal), 300), ]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

# modal state sequence plot; grouped by sex
# with TraMineR::seqplot
seqmsplot(actcal.seq, group = actcal$sex)
# with ggseqplot
ggseqmsplot(actcal.seq, group = actcal$sex)
# with ggseqplot and some layout changes
ggseqmsplot(actcal.seq, group = actcal$sex, no.n = TRUE, border = FALSE, facet_nrow = 2)

Mean Time Plot

Description

Function for rendering a plot displaying the mean time spent in each state of a state sequence object using ggplot2 (Wickham 2016) instead of base R's plot function that is used by TraMineR::seqplot (Gabadinho et al. 2011).

Usage

ggseqmtplot(
  seqdata,
  no.n = FALSE,
  group = NULL,
  weighted = TRUE,
  with.missing = FALSE,
  border = FALSE,
  error.bar = NULL,
  error.caption = TRUE,
  facet_scale = "fixed",
  facet_ncol = NULL,
  facet_nrow = NULL
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function.

no.n

Specifies whether the number of (weighted) sequences is suppressed; default is FALSE, i.e. the number is shown

group

A vector of the same length as the sequence data indicating group membership. When not NULL, a distinct plot is generated for each level of group.

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. weights are used if available.

with.missing

Specifies if missing states should be considered when computing the state distributions (default is FALSE).

border

If TRUE, bars are plotted with a black outline; default is FALSE (also accepts NULL)

error.bar

Adds error bars based on either the standard deviation ("SD") or the standard error ("SE"); by default, no error bars are shown

error.caption

If error bars are displayed, a caption is added; this default behavior can be turned off by setting the argument to FALSE

facet_scale

Specifies if y-scale in faceted plot should be "fixed" (default) or "free_y"

facet_ncol

Number of columns in faceted (i.e. grouped) plot

facet_nrow

Number of rows in faceted (i.e. grouped) plot

Details

The information on time spent in different states is obtained by an internal call of TraMineR::seqmeant. This requires that the input data (seqdata) are stored as a state sequence object (class stslist) created with the TraMineR::seqdef function. The resulting output is then prepared to be plotted with ggplot2::geom_bar. The data and specifications used for rendering the plot can be obtained by storing the plot as an object. The appearance of the plot can be adjusted just like with every other ggplot (e.g., by changing the theme or the scale using + and the respective functions).
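The quantities behind the bars and error bars can be approximated as follows: count the positions each sequence spends in each state, then take means, standard deviations, and standard errors (SD divided by the square root of the number of sequences) across sequences. This is an unweighted sketch on made-up data; TraMineR::seqmeant may compute the standard error differently, for instance when weights are present.

```r
# Hypothetical toy data: 4 sequences of length 4
seqmat <- matrix(c("A", "A", "B", "B",
                   "A", "B", "B", "B",
                   "B", "B", "C", "C",
                   "A", "C", "C", "C"),
                 nrow = 4, byrow = TRUE)
alphabet <- c("A", "B", "C")

# Time (number of positions) each sequence spends in each state
durations <- t(apply(seqmat, 1, function(s) table(factor(s, levels = alphabet))))

# Mean time per state with SD and SE for optional error bars
mean_time <- data.frame(
  state = alphabet,
  mean = colMeans(durations),
  sd = apply(durations, 2, sd)
)
mean_time$se <- mean_time$sd / sqrt(nrow(durations))
mean_time
```

Because every sequence has the same length here, the mean times add up to the sequence length (4).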

Value

A mean time plot created by using ggplot2. If stored as object the resulting list object (of class gg and ggplot) also contains the data used for rendering the plot.

Author(s)

Marcel Raab

References

Gabadinho A, Ritschard G, Müller NS, Studer M (2011). “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software, 40(4), 1–37. doi:10.18637/jss.v040.i04.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, Use R!, 2nd edition. Springer, Cham. doi:10.1007/978-3-319-24277-4.

Examples

library(TraMineR)
library(ggseqplot)
library(ggplot2)  # needed for coord_flip() and theme() below

# Use example data from TraMineR: actcal data set
data(actcal)

# We use only a sample of 300 cases
set.seed(1)
actcal <- actcal[sample(nrow(actcal), 300), ]
actcal.lab <- c("> 37 hours", "19-36 hours", "1-18 hours", "no work")
actcal.seq <- seqdef(actcal, 13:24, labels = actcal.lab)

# mean time plot; grouped by sex
# with TraMineR::seqplot
seqmtplot(actcal.seq, group = actcal$sex)
# with ggseqplot
ggseqmtplot(actcal.seq, group = actcal$sex)
# with ggseqplot using additional arguments and some adjustments
ggseqmtplot(actcal.seq, no.n = TRUE, error.bar = "SE") +
  coord_flip() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid.major.y = element_blank(),
        legend.position = "top")

Relative Frequency Sequence Plot

Description

Function for rendering relative frequency sequence plots with ggplot2 instead of base R's plot function that is used by TraMineR::seqrfplot. Note that ggseqrfplot uses patchwork to combine the different components of the plot. The function and the documentation draw heavily from TraMineR::seqrf.

Usage

ggseqrfplot(
  seqdata = NULL,
  diss = NULL,
  k = NULL,
  sortv = "mds",
  weighted = TRUE,
  grp.meth = "prop",
  squared = FALSE,
  pow = NULL,
  seqrfobject = NULL,
  border = FALSE,
  ylab = NULL,
  yaxis = TRUE,
  which.plot = "both",
  quality = TRUE,
  box.color = NULL,
  box.fill = NULL,
  box.alpha = NULL,
  outlier.jitter.height = 0,
  outlier.color = NULL,
  outlier.fill = NULL,
  outlier.shape = 19,
  outlier.size = 1.5,
  outlier.stroke = 0.5,
  outlier.alpha = NULL
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function. seqdata is ignored if seqrfobject is specified.

diss

pairwise dissimilarities between sequences in seqdata (see TraMineR::seqdist). diss is ignored if seqrfobject is specified.

k

Integer specifying the number of frequency groups. When NULL, k is set to the minimum of 100 and the sum of weights divided by 10. k is ignored if seqrfobject is specified.

sortv

Optional sorting vector of length nrow(diss) that may be used to compute the frequency groups. If NULL, the original data order is used. If "mds" (default), the first MDS factor of diss (diss^2 when squared = TRUE) is used. Ties are randomly ordered. The string inputs "from.start" and "from.end" are also allowed (see ggseqiplot). sortv is ignored if seqrfobject is specified.

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. weights are used if available.

grp.meth

Character string. One of "prop", "first", and "random". Grouping method. See details. grp.meth is ignored if seqrfobject is specified.

squared

Logical. Should medoids (and computation of sortv when applicable) be based on squared dissimilarities? (default is FALSE). squared is ignored if seqrfobject is specified.

pow

Dissimilarity power exponent (typically 1 or 2) for computation of pseudo R2 and F. When NULL, pow is set as 1 when squared = FALSE, and as 2 otherwise. pow is ignored if seqrfobject is specified.

seqrfobject

Object of class seqrf generated with TraMineR::seqrf. Default is NULL; either seqrfobject or seqdata and diss have to be specified.

border

If TRUE, bars of the index plot are plotted with a black outline; default is FALSE (also accepts NULL)

ylab

character string specifying title of y-axis. If NULL axis title is "Frequency group"

yaxis

Controls if a y-axis is plotted. When set as TRUE, index of frequency groups is displayed.

which.plot

Character string specifying which components of the relative frequency sequence plot should be displayed. Default is "both". If set to "medoids", only the index plot of medoids is shown. If "diss.to.med", only the box plots of the group-specific distances to the medoids are shown.

quality

specifies if representation quality is shown as figure caption; default is TRUE

box.color

specifies color of boxplot borders; default is "black"

box.fill

specifies fill color of boxplots; default is "white"

box.alpha

specifies alpha value of boxplot fill color; default is 1

outlier.jitter.height

if greater than 0 outliers are jittered vertically. If greater than .375 height is automatically adjusted to be aligned with the box width.

outlier.color, outlier.fill, outlier.shape, outlier.size, outlier.stroke, outlier.alpha

parameters to change the appearance of the outliers. Uses defaults of ggplot2::geom_boxplot

Details

This function renders relative frequency sequence plots using either an internal call of TraMineR::seqrf or by using an object of class "seqrf" generated with TraMineR::seqrf.

For further details on the technicalities, we refer to the excellent documentation of TraMineR::seqrf. A detailed account of relative frequency sequence plots can be found in the original contribution by Fasang and Liao (2014).

ggseqrfplot renders the medoid sequences extracted by TraMineR::seqrf with an internal call of ggseqiplot. For the box plot depicting the distances to the medoids ggseqrfplot uses geom_boxplot and geom_jitter. The latter is used for plotting the outliers.
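The general idea of a relative frequency sequence plot (Fasang and Liao 2014) can be sketched in base R: sort the sequences along the first MDS dimension of a dissimilarity matrix, cut the sorted list into k groups of equal size, pick each group's medoid, and compute each sequence's distance to its group medoid (the input of the box plots). This is a simplified stand-in, not TraMineR::seqrf: it uses Hamming distances on made-up data instead of a seqdist result, ignores weights, and assumes n is a multiple of k rather than implementing any of the grp.meth options.

```r
# Hypothetical toy data: 8 sequences of length 4
seqmat <- matrix(c("A", "A", "A", "A",
                   "A", "A", "A", "B",
                   "A", "A", "B", "B",
                   "A", "B", "B", "B",
                   "B", "B", "B", "B",
                   "B", "B", "B", "C",
                   "B", "B", "C", "C",
                   "C", "C", "C", "C"),
                 ncol = 4, byrow = TRUE)
n <- nrow(seqmat)

# Pairwise Hamming distances: number of positions with different states
diss <- outer(seq_len(n), seq_len(n),
              Vectorize(function(i, j) sum(seqmat[i, ] != seqmat[j, ])))

# Sort sequences along the first MDS dimension
mds1 <- cmdscale(diss, k = 1)[, 1]
ord <- order(mds1)

# Assign consecutive blocks of the sorted sequences to k equal-size groups
k <- 4
group <- integer(n)
group[ord] <- rep(seq_len(k), each = n / k)

# Medoid of each group: the member with the smallest summed distance
medoids <- sapply(seq_len(k), function(g) {
  members <- which(group == g)
  members[which.min(rowSums(diss[members, members, drop = FALSE]))]
})

# Distance of every sequence to the medoid of its own group
dist_to_medoid <- diss[cbind(seq_len(n), medoids[group])]
```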

Note that ggseqrfplot renders the box plots analogously to those produced by TraMineR::seqrfplot. However, the box plots produced with TraMineR::seqrfplot and ggplot2::geom_boxplot might differ slightly due to differences in the underlying computations of grDevices::boxplot.stats and ggplot2::stat_boxplot.
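The source of such small discrepancies can be illustrated with base R. grDevices::boxplot.stats uses Tukey's hinges (computed via fivenum), whereas, based on my reading of the ggplot2 source rather than anything stated here, ggplot2::stat_boxplot takes its box limits from stats::quantile with its default type 7. The two can disagree even for simple data:

```r
x <- 1:10

# Lower and upper hinge as computed by grDevices::boxplot.stats()
grDevices::boxplot.stats(x)$stats[c(2, 4)]   # 3 8

# First and third quartile with stats::quantile() (default type 7)
unname(quantile(x, c(0.25, 0.75)))           # 3.25 7.75
```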

Note that ggseqrfplot uses patchwork to combine the different components of the plot. If you want to adjust the appearance of the composed plot, for instance by changing the plot theme, you should consult the documentation material of patchwork.

At this point, ggseqrfplot does not support a grouping option. For plotting multiple groups, I recommend producing group-specific seqrfobjects or plots and arranging them in a common plot using patchwork. See Example 6 in the vignette for further details: vignette("ggseqplot", package = "ggseqplot")

Value

A relative frequency sequence plot using ggplot.

Author(s)

Marcel Raab

References

Fasang AE, Liao TF (2014). “Visualizing Sequences in the Social Sciences: Relative Frequency Sequence Plots.” Sociological Methods & Research, 43(4), 643–676. doi:10.1177/0049124113506563.

Examples

# Load required libraries (patchwork is used for fine-tuning the plots)
library(TraMineR)
library(ggplot2)
library(patchwork)

# From TraMineR::seqrf
# Defining a sequence object with the data in columns 10 to 25
# (family status from age 15 to 30) in the biofam data set
data(biofam)
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
  "Child", "Left+Child", "Left+Marr+Child", "Divorced")

# Here, we use only 100 cases selected such that all elements
# of the alphabet be present.
# (More cases and a larger k would be necessary to get a meaningful example.)
biofam.seq <- seqdef(biofam[501:600, 10:25], labels=biofam.lab,
                     weights=biofam[501:600,"wp00tbgs"])
diss <- seqdist(biofam.seq, method = "LCS")


# Using 12 groups and default MDS sorting
# and original method by Fasang and Liao (2014)

# ... with TraMineR::seqrfplot (weights have to be turned off)
seqrfplot(biofam.seq, weighted = FALSE, diss = diss, k = 12,
          grp.meth="first", which.plot = "both")

# ... with ggseqrfplot
ggseqrfplot(biofam.seq, weighted = FALSE, diss = diss, k = 12, grp.meth="first")

# Arrange sequences by a user specified sorting variable:
# time spent in parental home; has ties
parentTime <- seqistatd(biofam.seq)[, 1]
b.srf <- seqrf(biofam.seq, diss=diss, k=12, sortv=parentTime)
# ... with ggseqrfplot (and some extra annotation using patchwork)
ggseqrfplot(seqrfobject = b.srf) +
  plot_annotation(title = "Sorted by time spent in parental home",
                  theme = theme(plot.title = element_text(hjust = 0.5, size = 18)))

Representative Sequence plot

Description

Function for rendering representative sequence plots with ggplot2 (Wickham 2016) instead of base R's plot function that is used by TraMineR::seqplot (Gabadinho et al. 2011).

Usage

ggseqrplot(
  seqdata,
  diss,
  group = NULL,
  criterion = "density",
  coverage = 0.25,
  nrep = NULL,
  pradius = 0.1,
  dmax = NULL,
  border = FALSE,
  proportional = TRUE,
  weighted = TRUE,
  stats = TRUE,
  colored.stats = NULL,
  facet_ncol = NULL
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function.

diss

pairwise dissimilarities between sequences in seqdata (see TraMineR::seqdist)

group

A vector of the same length as the sequence data indicating group membership. When not NULL, a distinct plot is generated for each level of group.

criterion

the representativeness criterion for sorting the candidate list. One of "freq" (sequence frequency), "density" (neighborhood density), "mscore" (mean state frequency), "dist" (centrality) and "prob" (sequence likelihood). See details.

coverage

coverage threshold, i.e., minimum proportion of sequences that should have a representative in their neighborhood (neighborhood radius is defined by pradius).

nrep

number of representative sequences. If NULL (default), the size of the representative set is controlled by coverage.

pradius

neighborhood radius as a percentage of the maximum (theoretical) distance dmax. Defaults to 0.1 (10%). Sequence y is redundant to sequence x when it is in the neighborhood of x, i.e., within a distance pradius*dmax from x.

dmax

maximum theoretical distance. The dmax value is used to derive the neighborhood radius as pradius*dmax. If NULL, the value of dmax is derived from the dissimilarity matrix.

border

if TRUE bars are plotted with black outline; default is FALSE (also accepts NULL)

proportional

if TRUE (default), the sequence heights are displayed proportional to the number of represented sequences

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. if available weights are used

stats

if TRUE (default), mean discrepancy in each subset defined by all sequences attributed to one representative sequence and the mean distance to this representative sequence are displayed.

colored.stats

specifies if representatives in the stats plot should be color-coded; only recommended if the number of representatives is small. If set to NULL (default), colors are used if there are 10 or fewer representatives; use TRUE or FALSE to override this manually.

facet_ncol

specifies the number of columns in the plot (relevant if !is.null(group))
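The interplay of coverage, pradius, and dmax can be sketched in a few lines of base R. rep_coverage() is a hypothetical helper, not TraMineR::seqrep. It assumes that dmax defaults to the largest observed distance and that a sequence exactly at distance pradius*dmax counts as being in the neighborhood.

```r
# Hypothetical helper (not TraMineR::seqrep): share of sequences lying within
# pradius * dmax of at least one representative. Assumptions: dmax defaults
# to the largest observed distance and the boundary counts as "within".
rep_coverage <- function(diss, reps, pradius = 0.1, dmax = max(diss)) {
  radius <- pradius * dmax
  near_rep <- diss[, reps, drop = FALSE] <= radius   # n x length(reps) logical
  mean(apply(near_rep, 1, any))                      # covered share of cases
}

# Toy example: six cases on a line; cases 2 and 5 act as representatives
x <- c(0, 1, 2, 8, 9, 20)
d <- as.matrix(dist(x))
rep_coverage(d, reps = c(2, 5))   # radius = 0.1 * 20 = 2, so 5 of 6 are covered
```

With coverage = 0.25 and nrep = NULL, representatives would be added until a value like this reaches at least 0.25.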

Details

The representative sequence plot displays a set of distinct sequences as sequence index plot. The set of representative sequences is extracted from the sequence data by an internal call of TraMineR::seqrep according to the criteria listed in the arguments section above.

The extracted sequences are plotted by a call of ggseqiplot which uses ggplot2::geom_rect to render the sequences. If stats = TRUE the index plots are complemented by information on the "quality" of the representative sequences. For further details on representative sequence plots see Gabadinho et al. (2011) and the documentation of TraMineR::plot.stslist.rep, TraMineR::seqplot, and TraMineR::seqrep.

Note that ggseqrplot uses patchwork to combine the different components of the plot. If you want to adjust the appearance of the composed plot, for instance by changing the plot theme, you should consult the documentation material of patchwork.

Value

A representative sequence plot using ggplot.

Author(s)

Marcel Raab

References

Gabadinho A, Ritschard G, Müller NS, Studer M (2011). “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software, 40(4), 1–37. doi:10.18637/jss.v040.i04.

Gabadinho A, Ritschard G, Studer M, Müller NS (2011). “Extracting and Rendering Representative Sequences.” In Fred A, Dietz JLG, Liu K, Filipe J (eds.), Knowledge Discovery, Knowledge Engineering and Knowledge Management, volume 128, 94–106. Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-19032-2_7.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, Use R!, 2nd edition. Springer, Cham. doi:10.1007/978-3-319-24277-4.

Examples

# Use examples from TraMineR
library(TraMineR)
# Defining a sequence object with the data in columns 10 to 25
# (family status from age 15 to 30) in the biofam data set
data(biofam)
# Use a sample of 150 cases
set.seed(123)
biofam <- biofam[sample(nrow(biofam),150),]
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)

# Computing the distance matrix
biofam.dhd <- seqdist(biofam.seq, method="DHD")

# Representative sequence plot (using defaults)
# ... with TraMineR::seqplot
seqrplot(biofam.seq, diss = biofam.dhd)

# ... with ggseqrplot
ggseqrplot(biofam.seq, diss = biofam.dhd)

Sequence Transition Rate Plot

Description

Function for plotting the transition rate matrix of sequence states internally computed by TraMineR::seqtrate (Gabadinho et al. 2011). The plot is generated using ggplot2 (Wickham 2016).

Usage

ggseqtrplot(
  seqdata,
  dss = TRUE,
  group = NULL,
  no.n = FALSE,
  weighted = TRUE,
  with.missing = FALSE,
  labsize = NULL,
  axislabs = "labels",
  x_n.dodge = 1,
  facet_ncol = NULL,
  facet_nrow = NULL
)

Arguments

seqdata

State sequence object (class stslist) created with the TraMineR::seqdef function.

dss

specifies if transition rates are computed for STS or DSS (default) sequences

group

A vector of the same length as the sequence data indicating group membership. When not NULL, a distinct plot is generated for each level of group.

no.n

specifies if number of (weighted) sequences is shown in grouped (faceted) graph

weighted

Controls if weights (specified in TraMineR::seqdef) should be used. Default is TRUE, i.e. if available weights are used

with.missing

Specifies if missing state should be considered when computing the transition rates (default is FALSE).

labsize

Specifies the font size of the labels within the tiles (if not specified ggplot2's default is used)

axislabs

specifies if sequence object's long "labels" (default) or the state names from its "alphabet" attribute should be used.

x_n.dodge

allows printing the labels of the x-axis in multiple rows to avoid overlap.

facet_ncol

Number of columns in faceted (i.e. grouped) plot

facet_nrow

Number of rows in faceted (i.e. grouped) plot

Details

The transition rates are obtained by an internal call of TraMineR::seqtrate. This requires that the input data (seqdata) are stored as a state sequence object (class stslist) created with the TraMineR::seqdef function. As STS-based transition rates tend to be dominated by high values on the diagonal, it might be worthwhile to examine DSS sequences instead (dss = TRUE). In this case, the resulting plot shows the transition rates between episodes of distinct states.
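Why the DSS view removes the dominant diagonal can be shown with a small base R sketch. trate(), sts_list(), and dss_list() are hypothetical helpers, not TraMineR::seqtrate. They assume time-homogeneous rates, count(from -> to) / count(from), on sequences stored as rows of a character matrix without missing states.

```r
# Hypothetical helpers (not TraMineR::seqtrate): time-homogeneous transition
# rates, i.e. count(from -> to) / count(from), for sequences stored as rows
# of a character matrix without missing states.
sts_list <- function(seqmat) lapply(seq_len(nrow(seqmat)), function(i) seqmat[i, ])
dss_list <- function(seqmat) lapply(sts_list(seqmat), function(s) rle(s)$values)

trate <- function(seqs, alphabet) {
  from <- unlist(lapply(seqs, function(s) head(s, -1)))  # state at t
  to   <- unlist(lapply(seqs, function(s) tail(s, -1)))  # state at t + 1
  counts <- table(factor(from, levels = alphabet), factor(to, levels = alphabet))
  counts / rowSums(counts)  # row i divided by its total; empty rows give NaN
}

seqs <- rbind(c("P", "P", "L", "L"),
              c("P", "L", "L", "M"))
alph <- c("P", "L", "M")
sts <- trate(sts_list(seqs), alph)  # P -> P is 1/3: repeated states fill the diagonal
dss <- trate(dss_list(seqs), alph)  # DSS: consecutive states differ, so P -> P is 0
```

Collapsing each spell to a single state (here via rle()) is what turns STS into DSS sequences, so only changes of state remain to be counted.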

In either case (DSS or STS), the transition rates are reshaped into a long data format to enable plotting with ggplot2, and the resulting data are rendered with ggplot2::geom_tile. The data and specifications used for rendering the plot can be obtained by storing the plot as an object. The appearance of the plot can be adjusted like that of any other ggplot (e.g., by changing the theme or the scale using + and the respective functions).
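The reshaping step can be sketched in base R: a square rate matrix becomes a data frame with one row per (from, to) tile. The rates and the column names from, to, and rate are made up for illustration; the internal column names used by ggseqtrplot may differ.

```r
# Base R sketch of the reshaping step; rates and column names are illustrative.
rates <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE,
                dimnames = list(from = c("Parent", "Left"),
                                to   = c("Parent", "Left")))
# one row per cell of the matrix: columns from, to, rate
long <- as.data.frame(as.table(rates), responseName = "rate")
long
```

With ggplot2 attached, a call such as ggplot(long, aes(x = to, y = from, fill = rate)) + geom_tile() would then draw one tile per transition.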

Value

A tile plot of transition rates.

Author(s)

Marcel Raab

References

Gabadinho A, Ritschard G, Müller NS, Studer M (2011). “Analyzing and Visualizing State Sequences in R with TraMineR.” Journal of Statistical Software, 40(4), 1–37. doi:10.18637/jss.v040.i04.

Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis, Use R!, 2nd edition. Springer, Cham. doi:10.1007/978-3-319-24277-4.

Examples

# Use example data from TraMineR: biofam data set
data(biofam)

# We use only a sample of 300 cases
set.seed(10)
biofam <- biofam[sample(nrow(biofam),300),]
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab, weights = biofam$wp00tbgs)

# Basic transition rate plot (with adjusted x-axis labels)
ggseqtrplot(biofam.seq, x_n.dodge = 2)

# Transition rate plots by group (with and without weights)
ggseqtrplot(biofam.seq, group=biofam$sex, x_n.dodge = 2)
ggseqtrplot(biofam.seq, group=biofam$sex, x_n.dodge = 2, weighted = FALSE)