Package 'nzilbb.vowels' reference manual

Title:	Vowel Covariation Tools
Description:	Tools to support research on vowel covariation. Methods are provided to support Principal Component Analysis workflows (as in Brand et al. (2021) <doi:10.1016/j.wocn.2021.101096> and Wilson Black et al. (2023) <doi:10.1515/lingvan-2022-0086>).
Authors:	Joshua Wilson Black [aut, cre, cph], James Brand [aut]
Maintainer:	Joshua Wilson Black
License:	MIT + file LICENSE
Version:	0.3.1
Built:	2024-11-29 13:56:37 UTC
Source:	CRAN

Permutation test of pairwise correlations

Description

Permute data a given number (n) of times, collecting pairwise correlations and testing them for significance. See plot_correlation_magnitudes() and plot_correlation_counts() for plotting functions which take the output of this function.
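The logic of this test can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `count_sig_cors()` and `permute_columns()` are hypothetical helpers, the data are simulated, and a correlation counts as significant when its `stats::cor.test()` p-value falls below alpha = 0.05.

```r
# Sketch of a permutation test on pairwise correlations (base R only).
# count_sig_cors() and permute_columns() are hypothetical helpers, not
# functions from nzilbb.vowels.
count_sig_cors <- function(df, alpha = 0.05, method = "pearson") {
  col_pairs <- utils::combn(ncol(df), 2)  # every pair of column indices
  p_values <- apply(col_pairs, 2, function(ij) {
    stats::cor.test(df[[ij[1]]], df[[ij[2]]], method = method)$p.value
  })
  sum(p_values < alpha)
}

permute_columns <- function(df) {
  # Shuffle each column independently, destroying any covariation
  df[] <- lapply(df, sample)
  df
}

# Simulated data: a, b and c share a common source; d is pure noise
set.seed(1)
n_obs <- 50
shared <- stats::rnorm(n_obs)
sim_data <- data.frame(
  a = shared + stats::rnorm(n_obs, sd = 0.5),
  b = shared + stats::rnorm(n_obs, sd = 0.5),
  c = shared + stats::rnorm(n_obs, sd = 0.5),
  d = stats::rnorm(n_obs)
)

actual_count <- count_sig_cors(sim_data)
permuted_counts <- replicate(100, count_sig_cors(permute_columns(sim_data)))

# Compare the real count with the upper tail of the permuted (null) counts
null_upper <- stats::quantile(permuted_counts, 0.95)
actual_count > null_upper
```

Comparing the real count with the 95th percentile of the permuted counts is roughly the informal check that plot_correlation_counts() visualises.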

Usage

correlation_test(pca_data, n = 100, cor.method = "pearson")

Arguments

`pca_data`	a data frame or matrix containing only continuous variables (as accepted by the `prcomp` function).
`n`	the number of times (integer) to permute the data. Warning: high values will take a long time to compute. Default: 100.
`cor.method`	method to use for correlations (default = "pearson"). Alternative is "spearman" (see `?cor.test`).

Value

object of class correlation_test, containing:

$permuted_correlations A tibble of length n of pairs from the original data, their correlations, and the significance of each correlation (as p-values).
$actual_correlations the correlations of each pair of variables in the original data and their significance (as p-values).
$iterations the number of permutations carried out.
$cor_method the form of correlation used.

Examples

  # get a small sample of random intercepts.
  pca_data <- onze_intercepts |>
    dplyr::select(-speaker) |>
    dplyr::slice_sample(n=10)

  # apply correlation test with 10 permutations.
  # actual use requires at least 100.
  cor_test <- correlation_test(pca_data, n = 10, cor.method = 'pearson')
  # Return summary of significant correlations
  summary(cor_test)

  # use spearman correlation instead.
  cor_test_spear <- correlation_test(pca_data, n = 10, cor.method = 'spearman')

Apply Lobanov 2.0 normalisation

Description

lobanov_2() takes a data frame where the first four columns are:

speaker identifiers,
vowel identifiers,
first formant values in Hertz,
second formant values in Hertz.

It returns a dataframe with two additional columns, F1_lob2 and F2_lob2, containing normalised formant values.

Usage

lobanov_2(vowel_data)

Arguments

vowel_data

a dataframe whose first four columns are speaker ids, vowel ids, F1 values, and F2 values.

Details

This function applies the Lobanov 2.0 normalisation presented in Brand et al. (2021). This variant of Lobanov normalisation is designed for datasets in which the vowel types have different token counts from one another. The Lobanov 2.0 value for a vowel is given by

$F_{lobanov2.0_i} = \frac{F_{raw_i} - \mu(\mu_{vowel_1}, \ldots, \mu_{vowel_n})}{\sigma(\mu_{vowel_1}, \ldots, \mu_{vowel_n})}$

where, for ease of notation, we assume all values are from a single speaker. We denote the n vowel types as vowel_1, ..., vowel_n, while i indicates the formant number, so that $\mu_{vowel_k}$ is the speaker's mean value of formant i for vowel_k. We implement the function for F1 and F2.
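The formula can be sketched in base R for a single speaker and a single formant. This is a minimal illustration, not the package's code: `lobanov2_single()` is a hypothetical helper, the token values are invented, and it assumes σ is the sample standard deviation computed by `stats::sd()` (n - 1 denominator).

```r
# Sketch of the Lobanov 2.0 formula for one speaker and one formant.
# lobanov2_single() is a hypothetical helper, not the package's code.
lobanov2_single <- function(formant, vowel) {
  # Mean formant value for each vowel type
  vowel_means <- tapply(formant, vowel, mean)
  # Centre and scale by the mean and SD of the vowel means (not of the raw
  # tokens), so vowel types with many tokens do not dominate.
  # Assumption: stats::sd() (n - 1 denominator) stands in for sigma.
  (formant - mean(vowel_means)) / stats::sd(vowel_means)
}

tokens <- data.frame(
  vowel = c("FLEECE", "FLEECE", "FLEECE", "TRAP", "LOT"),
  F1 = c(300, 320, 310, 700, 550)
)
tokens$F1_lob2 <- lobanov2_single(tokens$F1, tokens$vowel)
tokens
```

Although FLEECE has three tokens here and the other vowels one each, every vowel type contributes equally to the centre and scale. With several speakers, the helper would be applied separately within each speaker (for example via split()).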

Value

a dataframe matching the input dataframe with additional columns F1_lob2 and F2_lob2, containing the Lobanov 2.0 normalised F1 and F2 values respectively.

References

Brand, James, Jen Hay, Lynn Clark, Kevin Watson & Márton Sóskuthy (2021): Systematic co-variation of monophthongs across speakers of New Zealand English. Journal of Phonetics. Elsevier. 88. 101096. doi:10.1016/j.wocn.2021.101096

Examples

normed_vowels <- lobanov_2(onze_vowels)
head(normed_vowels)

Test optimal number of MDS dimensions.

Description

Generate bootstrapped confidence intervals and a permutation-based null distribution for MDS analysis. The output shows how much stress is reduced by adding an additional dimension to the MDS analysis of similarity_matrix, and of bootstrapped iterations of similarity_matrix, compared with the stress reduction expected from a matrix with no meaningful structure. This function is inspired by pca_test(), but is less connected with the statistical literature than that function. We currently reject additional dimensions if they reduce less stress than we would expect by chance; that is, when the distribution from the bootstrapped analyses sits notably lower than the permuted distribution when plotted by plot_mds_test().

Usage

mds_test(
  similarity_matrix,
  n_boots = 50,
  n_perms = 50,
  test_dimensions = 5,
  principal = TRUE,
  mds_type = "ordinal",
  spline_degree = 2,
  spline_int_knots = 2
)

Arguments

`similarity_matrix`	Square matrix of speaker similarity scores.
`n_boots`	Number of bootstrapping iterations (default: 50).
`n_perms`	Number of permutations (default: 50).
`test_dimensions`	Number of MDS dimensions to test for stress reduction (default: 5).
`principal`	Whether to apply principal axis transform to MDS (default: TRUE).
`mds_type`	What kind of MDS to apply, see `smacof::smacofSym()` (default: 'ordinal').
`spline_degree`	How many spline degrees when `mds_type` is 'mspline' (default: 2).
`spline_int_knots`	How many internal knots when `mds_type` is 'mspline' (default: 2).

Value

object of class mds_test_results, containing:

$stress_reduction a tibble containing
$n_boots Number of bootstrapping iterations.
$n_perms Number of permutation iterations.
$mds_type Type of MDS analysis (type argument passed to smacof::smacofSym()).
$principal Whether principal axis transformation is applied (passed to smacof::smacofSym()).

Examples

# Apply interval MDS to `sim_matrix`, with 5 permutations and bootstraps
# testing up to 3 dimensions. In real usage, increase `n_boots` and `n_perms`
# to at least 50.
mds_test(
 sim_matrix,
 n_boots = 5,
 n_perms = 5,
 test_dimensions = 3,
 mds_type = 'interval'
)

Speaker random intercepts from GAMMs for 100 ONZE speakers

Description

A dataset containing the speaker intercepts extracted from GAMM models fit in Brand et al. (2021).

Usage

onze_intercepts

Format

A data frame with 100 rows and 21 variables:

speaker: Anonymised speaker code (character).
F1_DRESS: Speaker intercept from GAMM model of DRESS F1.
F2_DRESS: Speaker intercept from GAMM model of DRESS F2.
F1_FLEECE: Speaker intercept from GAMM model of FLEECE F1.
F2_FLEECE: Speaker intercept from GAMM model of FLEECE F2.
F1_GOOSE: Speaker intercept from GAMM model of GOOSE F1.
F2_GOOSE: Speaker intercept from GAMM model of GOOSE F2.
F1_KIT: Speaker intercept from GAMM model of KIT F1.
F2_KIT: Speaker intercept from GAMM model of KIT F2.
F1_LOT: Speaker intercept from GAMM model of LOT F1.
F2_LOT: Speaker intercept from GAMM model of LOT F2.
F1_NURSE: Speaker intercept from GAMM model of NURSE F1.
F2_NURSE: Speaker intercept from GAMM model of NURSE F2.
F1_START: Speaker intercept from GAMM model of START F1.
F2_START: Speaker intercept from GAMM model of START F2.
F1_STRUT: Speaker intercept from GAMM model of STRUT F1.
F2_STRUT: Speaker intercept from GAMM model of STRUT F2.
F1_THOUGHT: Speaker intercept from GAMM model of THOUGHT F1.
F2_THOUGHT: Speaker intercept from GAMM model of THOUGHT F2.
F1_TRAP: Speaker intercept from GAMM model of TRAP F1.
F2_TRAP: Speaker intercept from GAMM model of TRAP F2.

Source

https://osf.io/q4j29/

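References

Brand, James, Jen Hay, Lynn Clark, Kevin Watson & Márton Sóskuthy (2021): Systematic co-variation of monophthongs across speakers of New Zealand English. Journal of Phonetics. Elsevier. 88. 101096. doi:10.1016/j.wocn.2021.101096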

Speaker random intercepts for 481 ONZE speakers

Description

A dataset containing the speaker intercepts extracted from GAMM models fit in Brand et al. (2021).

Usage

onze_intercepts_full

Format

A data frame with 481 rows and 21 variables:

speaker: Anonymised speaker code.
F1_DRESS: Speaker intercept from GAMM model of DRESS F1.
F2_DRESS: Speaker intercept from GAMM model of DRESS F2.
F1_FLEECE: Speaker intercept from GAMM model of FLEECE F1.
F2_FLEECE: Speaker intercept from GAMM model of FLEECE F2.
F1_GOOSE: Speaker intercept from GAMM model of GOOSE F1.
F2_GOOSE: Speaker intercept from GAMM model of GOOSE F2.
F1_KIT: Speaker intercept from GAMM model of KIT F1.
F2_KIT: Speaker intercept from GAMM model of KIT F2.
F1_LOT: Speaker intercept from GAMM model of LOT F1.
F2_LOT: Speaker intercept from GAMM model of LOT F2.
F1_NURSE: Speaker intercept from GAMM model of NURSE F1.
F2_NURSE: Speaker intercept from GAMM model of NURSE F2.
F1_START: Speaker intercept from GAMM model of START F1.
F2_START: Speaker intercept from GAMM model of START F2.
F1_STRUT: Speaker intercept from GAMM model of STRUT F1.
F2_STRUT: Speaker intercept from GAMM model of STRUT F2.
F1_THOUGHT: Speaker intercept from GAMM model of THOUGHT F1.
F2_THOUGHT: Speaker intercept from GAMM model of THOUGHT F2.
F1_TRAP: Speaker intercept from GAMM model of TRAP F1.
F2_TRAP: Speaker intercept from GAMM model of TRAP F2.

Source

https://osf.io/q4j29/

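References

Brand, James, Jen Hay, Lynn Clark, Kevin Watson & Márton Sóskuthy (2021): Systematic co-variation of monophthongs across speakers of New Zealand English. Journal of Phonetics. Elsevier. 88. 101096. doi:10.1016/j.wocn.2021.101096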

Monophthong data for random sample of speakers from the ONZE corpus

Description

A dataset containing the first and second formants, speech rate, gender, and year of birth for 100 random speakers from the ONZE corpus. 50 speakers are sampled with birth years before 1900 and 50 with birth years on or after 1900 to ensure a full span of the time period. Data is present for the following NZE monophthongs, represented by Wells lexical sets: DRESS, FLEECE, GOOSE, KIT, LOT, NURSE, START, STRUT, THOUGHT, TRAP. Data for FOOT is excluded due to low token counts.

Usage

onze_vowels

Format

A dataframe with 101572 rows and 8 variables:

speaker: Anonymised speaker code (factor).
vowel: Variable with Wells lexical sets for 10 NZE monophthongs. Levels: DRESS, FLEECE, GOOSE, KIT, LOT, NURSE, START, STRUT, THOUGHT, TRAP (factor).
F1_50: First formant, extracted from vowel mid-point using LaBB-CAT interface with Praat.
F2_50: Second formant, extracted from vowel mid-point using LaBB-CAT interface with Praat.
speech_rate: Average speaker speech rate for whole recording.
gender: Gender of speaker, two levels: "M", "F" (factor).
yob: Year of birth of speaker.
word: Anonymised word code (factor).

Details

This dataset is derived from the data made available in the supplementary materials of Brand et al. (2021).

Source

https://osf.io/q4j29/

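References

Brand, James, Jen Hay, Lynn Clark, Kevin Watson & Márton Sóskuthy (2021): Systematic co-variation of monophthongs across speakers of New Zealand English. Journal of Phonetics. Elsevier. 88. 101096. doi:10.1016/j.wocn.2021.101096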

Monophthong data for speakers from the ONZE corpus

Description

A dataset containing the first and second formants, speech rate, gender, and year of birth for 481 speakers from the ONZE corpus. Data is present for the following NZE monophthongs, represented by Wells lexical sets: DRESS, FLEECE, GOOSE, KIT, LOT, NURSE, START, STRUT, THOUGHT, TRAP. Data for FOOT is excluded due to low token counts.

Usage

onze_vowels_full

Format

A data frame with 414679 rows and 8 variables:

speaker: Anonymised speaker code (factor).
vowel: Variable with Wells lexical sets for 10 NZE monophthongs. Levels: DRESS, FLEECE, GOOSE, KIT, LOT, NURSE, START, STRUT, THOUGHT, TRAP (factor).
F1_50: First formant, extracted from vowel mid-point using LaBB-CAT interface with Praat.
F2_50: Second formant, extracted from vowel mid-point using LaBB-CAT interface with Praat.
speech_rate: Average speaker speech rate for whole recording.
gender: Gender of speaker, two levels: "M", "F" (factor).
yob: Year of birth of speaker.
word: Anonymised word code (factor).

Details

This dataset is derived from the data made available in the supplementary materials of Brand et al. (2021).

Source

https://osf.io/q4j29/

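References

Brand, James, Jen Hay, Lynn Clark, Kevin Watson & Márton Sóskuthy (2021): Systematic co-variation of monophthongs across speakers of New Zealand English. Journal of Phonetics. Elsevier. 88. 101096. doi:10.1016/j.wocn.2021.101096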

Flip PC loadings

Description

The sign of the loadings and scores generated by PCA is arbitrary. Sometimes it is convenient to flip them so that all positive loadings/scores become negative (and vice versa), for instance when one direction leads to a more natural interpretation, or when comparing the results of PCA across multiple data sets. This function will flip loadings and scores for PCA analyses carried out by the base R prcomp() and princomp() functions and by the pca_test() function from this package. If you specify only pc_no, the loadings and scores for that PC will be flipped. You can also specify a variable which you would like to have a positive loading in the resulting PCA; the PC is then flipped only if that variable's loading is negative.
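Why flipping is harmless can be sketched in base R. This is an illustration, not pc_flip() itself: `flip_pc()` and `flip_for_var()` are hypothetical helpers, and the built-in `USArrests` data set stands in for vowel intercepts.

```r
# Base R sketch of flipping one PC; flip_pc() and flip_for_var() are
# hypothetical helpers, not nzilbb.vowels::pc_flip().
pca <- stats::prcomp(datasets::USArrests, scale. = TRUE)

flip_pc <- function(pca, pc_no) {
  # Negate both the loadings (rotation) and the scores (x) of one PC
  pca$rotation[, pc_no] <- -pca$rotation[, pc_no]
  pca$x[, pc_no] <- -pca$x[, pc_no]
  pca
}

flip_for_var <- function(pca, pc_no, var) {
  # Flip only if `var` currently has a negative loading on this PC
  if (pca$rotation[var, pc_no] < 0) flip_pc(pca, pc_no) else pca
}

flipped <- flip_pc(pca, 2)

# Scores %*% t(loadings) recovers the centred and scaled input either way,
# so the flip changes interpretation, not the fit
all.equal(pca$x %*% t(pca$rotation), flipped$x %*% t(flipped$rotation))

murder_pos <- flip_for_var(pca, 3, "Murder")
murder_pos$rotation["Murder", 3]
```

Because loadings and scores are negated together, their products (and hence the reconstructed data) are unchanged.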

Usage

pc_flip(pca_obj, pc_no, flip_var = NULL)

Arguments

`pca_obj`	The result of a call to `prcomp()`, `princomp()` or `pca_test()`.
`pc_no`	An integer, indicating which PC is to be flipped.
`flip_var`	An optional name of a variable whose loading on the PC indicated by `pc_no` should be positive.

Value

An object matching the class of pca_obj with relevant PC modified.

Examples

  pca_obj <- prcomp(onze_intercepts |> dplyr::select(-speaker), scale=TRUE)

  # flip the second PC
  flipped_pca <- pc_flip(pca_obj, pc_no = 2)

  # flip (if necessary) the third PC, so that the "F1_GOOSE" variable has
  # a positive loading
  flipped_pca <- pc_flip(pca_obj, pc_no = 3, flip_var = "F1_GOOSE")

PCA contribution plots

Description

Plot the contribution of each variable in a data set to a given Principal Component (PC). Variables are arranged by ascending contribution to the PC, where contribution is the squared loading for the variable expressed as a percentage. These plots match those given in supplementary material for Brand et al. (2021).

Usage

pca_contrib_plot(pca_object, pc_no = 1, cutoff = 50)

Arguments

`pca_object`	a pca object generated by `prcomp` or `princomp`.
`pc_no`	the PC to be visualised. Default value is 1.
`cutoff`	the cutoff value for interpretation of the PC. Determines what total percentage contribution we want from the variables we select for interpretation. The default of 50 means that we pick the variables with the highest contribution to the PC until we have accounted for 50% of the total contributions to the PC. If set to `NULL`, no cutoff value is plotted.
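The contribution and cutoff rule can be sketched in base R. This is a minimal sketch of the rule as described, not the package's code: `pc_contributions()` is a hypothetical helper, and `USArrests` stands in for the vowel intercepts.

```r
# Base R sketch of PC contributions and the cutoff rule described above.
# pc_contributions() is a hypothetical helper; USArrests stands in for
# the vowel intercepts.
pca <- stats::prcomp(datasets::USArrests, scale. = TRUE)

pc_contributions <- function(pca, pc_no = 1, cutoff = 50) {
  pc_loadings <- pca$rotation[, pc_no]
  # Squared loadings as percentages (prcomp loadings have unit length,
  # so the squared loadings of each PC already sum to 1)
  contrib <- sort(100 * pc_loadings^2 / sum(pc_loadings^2), decreasing = TRUE)
  # Take variables, largest contribution first, until the running total
  # reaches the cutoff
  n_keep <- which(cumsum(contrib) >= cutoff)[1]
  list(contributions = contrib, selected = names(contrib)[seq_len(n_keep)])
}

res <- pc_contributions(pca, pc_no = 1, cutoff = 50)
res$contributions
res$selected
```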

Details

As with the other plotting functions in this package, the result is a ggplot2 plot. It can be modified using ggplot2 functions (see, e.g., plot_correlation_magnitudes()).

Value

ggplot object.

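References

Brand, James, Jen Hay, Lynn Clark, Kevin Watson & Márton Sóskuthy (2021): Systematic co-variation of monophthongs across speakers of New Zealand English. Journal of Phonetics. Elsevier. 88. 101096. doi:10.1016/j.wocn.2021.101096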

Examples

  onze_pca <- prcomp(onze_intercepts |> dplyr::select(-speaker), scale = TRUE)

  # Plot PC1 with a cutoff value of 60%
  pca_contrib_plot(onze_pca, pc_no = 1, cutoff = 60)

  # Plot PC2 with no cutoff value.
  pca_contrib_plot(onze_pca, pc_no = 2, cutoff = NULL)

PCA with confidence intervals and null distributions

Description

Permute and bootstrap data fed to PCA n times. Bootstrapped data is used to estimate confidence bands for the variance explained by each PC and for each loading. Squared loadings are multiplied by the squared eigenvalue of the relevant PC, which ranks the loadings of PCs that explain a lot of variance above those of PCs that explain less. This approach to PCA testing follows Camargo (2022) and Vieira (2012). It differs from Camargo's PCAtest package by separating data generation and plotting.
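The bootstrap-and-permute idea for variance explained can be sketched in base R. This is an assumption-laden illustration, not pca_test() itself: `variance_explained()` is a hypothetical helper, `USArrests` stands in for speaker-level data, intervals are simple percentile intervals, and index loadings are not covered.

```r
# Base R sketch of bootstrapped and permuted variance explained, in the
# spirit of pca_test(); variance_explained() is a hypothetical helper and
# USArrests stands in for speaker-level vowel data.
variance_explained <- function(data) {
  pca <- stats::prcomp(data, scale. = TRUE)
  pca$sdev^2 / sum(pca$sdev^2)
}

set.seed(42)
input <- datasets::USArrests
n <- 100

# Bootstrap: resample rows with replacement (sampling distribution)
boot_var <- replicate(n, variance_explained(input[sample(nrow(input), replace = TRUE), ]))
# Permutation: shuffle each column independently (null distribution)
perm_var <- replicate(n, variance_explained(as.data.frame(lapply(input, sample))))

actual <- variance_explained(input)
# Percentile 95% bootstrap interval and 95th percentile of the null, per PC
boot_ci <- apply(boot_var, 1, stats::quantile, probs = c(0.025, 0.975))
null_upper <- apply(perm_var, 1, stats::quantile, probs = 0.95)
data.frame(PC = seq_along(actual), actual,
           lower = boot_ci[1, ], upper = boot_ci[2, ], null_upper)
```

A PC whose actual variance explained sits above the permuted upper quantile explains more variance than expected from data with no covariation.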

Usage

pca_test(
  pca_data,
  n = 100,
  scale = TRUE,
  variance_confint = 0.95,
  loadings_confint = 0.9
)

Arguments

`pca_data`	data fed to the `prcomp` function.
`n`	the number of times to permute and bootstrap that data. Warning: high values will take a long time to compute.
`scale`	whether the PCA variables should be scaled (default: TRUE).
`variance_confint`	size of confidence intervals for variance explained (default: 0.95).
`loadings_confint`	size of confidence intervals for index loadings (default: 0.9).

Details

The default confidence band on variance explained is 0.95 (i.e., an alpha of 0.05). In line with Vieira (2012), the default confidence band on the index loadings is 0.9.

See plot_loadings() and plot_variance_explained() for useful plotting functions.

Value

object of class pca_test_results, containing:

$variance a tibble containing the variances explained and confidence intervals for each PC.
$loadings a tibble containing the index loadings and confidence intervals for each variable and PC.
$raw_data a tibble containing the variance explained and loadings for each bootstrapped and permuted analysis.
$variance_confint confidence interval applied to variance explained.
$loadings_confint confidence interval applied to loadings.
$n the number of iterations of both permutation and bootstrapping.

References

Camargo, Arley (2022): PCAtest: testing the statistical significance of Principal Component Analysis in R. PeerJ 10. e12967. doi:10.7717/peerj.12967

Vieira, Vasco (2012): Permutation tests to estimate significances on Principal Components Analysis. Computational Ecology and Software 2. 103–123.

Examples

onze_pca <- pca_test(
  onze_intercepts |> dplyr::select(-speaker),
  n = 10,
  scale = TRUE
)
summary(onze_pca)

Run permutation test on PCA analysis.

Description

Permute data fed to PCA a given number of times, collecting the number of significant pairwise correlations in the permuted data and the variances explained for a given number of PCs.

Usage

permutation_test(
  pca_data,
  pc_n = 5,
  n = 100,
  scale = TRUE,
  cor.method = "pearson"
)

Arguments

`pca_data`	data fed to the `prcomp` function. Remove non-continuous variables.
`pc_n`	the number of PCs to collect variance explained from.
`n`	the number of times to permute that data. Warning: high values will take a long time to compute.
`scale`	whether the PCA variables should be scaled (default = TRUE).
`cor.method`	method to use for correlations (default = "pearson"). Alternative is "spearman".

Details

This function is now superseded. Use correlation_test() for pairwise correlations and pca_test() for variance explained and loadings.

Value

object of class permutation_test, containing:

$permuted_variances n x pc_n matrix of variances explained by the first pc_n PCs in n permutations of the original data.
$permuted_correlations list of length n of significant pairwise correlations (p <= 0.05) in n permutations of the data.
$actual_variances pc_n x 2 tibble of variances explained by the first pc_n PCs with the original data.
$actual_correlations the number of significant pairwise correlations (p <= 0.05) in the original data.

Examples

permutation_test(
  onze_intercepts |> dplyr::select(-speaker),
  pc_n = 5,
  n = 10,
  scale = TRUE,
  cor.method = 'pearson'
 )

Plot of correlation counts from `correlation_test` object

Description

Plot the number of statistically significant pairwise correlations in a data set, given an alpha value, against the distribution of counts of statistically significant pairwise correlations in permuted data. This is an informal test which is useful for convincing yourself that there is structure in your data which PCA might be able to uncover.

Usage

plot_correlation_counts(cor_test, alpha = 0.05, half_violin = FALSE)

Arguments

`cor_test`	an object of class `correlation_test` generated by `correlation_test`.
`alpha`	significance level for counting a correlation as significant (default: 0.05).
`half_violin`	Plot correlation counts using a half violin plot and half point plot. Quantiles are not currently supported.

Details

The resulting plot presents the distribution of counts of statistically significant correlations at a given alpha level in the permuted data and the count of statistically significant correlations in the original data. If the red dot is above the uppermost line inside the blue violin plot, we say the number of statistically significant correlations in the real data is itself statistically significant. Usually this is used as a rough sanity check in the course of a PCA workflow and we want to see the red dot well above the violin (as in the example below).

The resulting plot is a ggplot2 plot and can be modified using functions from that package. For instance, titles can be removed using the ggplot2::labs() function (as in the examples below).

Value

ggplot object.

Examples

  # Test correlations (use at least n = 100)
  cor_test <- correlation_test(onze_intercepts |>
    dplyr::select(-speaker), n = 10)
  cor_plot <- plot_correlation_counts(cor_test)
  cor_plot

  # make statistical test more strict by reducing the alpha.
  cor_plot_strict <- plot_correlation_counts(cor_test, alpha = 0.01)

  # modify plot using `ggplot2` functions, e.g.
  cor_plot_strict +
    ggplot2::labs(title = NULL) +
    ggplot2::theme_bw()

Plot distribution of correlations from `correlation_test` object

Description

This plot type is used in Brand et al. (2021). It presents the magnitudes of the correlations from the real data as a solid red line, and the correlations from each iteration of the permutation test as light blue lines. This gives a visual sense of the distribution of random correlations compared with those in the actual data. If there are significant pairwise correlations in the data, the thick red line should be visually lower and wider across the plot than the thinner blue lines. If there are no significant pairwise correlations, then the thick red line will have the same shape as the blue lines.

Usage

plot_correlation_magnitudes(cor_test)

Arguments

cor_test

an object of class correlation_test generated by correlation_test.

Value

ggplot object.

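References

Brand, James, Jen Hay, Lynn Clark, Kevin Watson & Márton Sóskuthy (2021): Systematic co-variation of monophthongs across speakers of New Zealand English. Journal of Phonetics. Elsevier. 88. 101096. doi:10.1016/j.wocn.2021.101096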

Examples

  # Test correlations (use at least n = 100)
  cor_test <- correlation_test(onze_intercepts |>
    dplyr::select(-speaker), n = 10)
  cor_plot <- plot_correlation_magnitudes(cor_test)
  cor_plot

  # modify plot using `ggplot2` functions, e.g.
  cor_plot +
    ggplot2::labs(title = NULL) +
    ggplot2::theme_bw()

Plot PC index loadings from `pca_test` object.

Description

Index loadings (Vieira 2012) are presented with confidence intervals on the sampling distribution generated by bootstrapping and a null distribution generated by permutation.

Usage

plot_loadings(
  pca_test,
  pc_no = 1,
  violin = FALSE,
  filter_boots = FALSE,
  quantile_threshold = 0.25
)

Arguments

`pca_test`	an object of class pca_test_results generated by `pca_test`.
`pc_no`	An integer indicating which PC to plot.
`violin`	If TRUE, violin plots are added for the confidence intervals of the sampling distribution.
`filter_boots`	if TRUE, only bootstrap iterations in which the variable with the highest median loading is above `quantile_threshold` are used.
`quantile_threshold`	a real value between 0 and 1. Use this to change the threshold used for filtering bootstrap iterations. The default is 0.25.

Details

If PCs are unstable, there is an option (filter_boots) to take only the bootstrap iterations in which the variable with the highest median loading across all iterations is above quantile_threshold (default: 0.25). This helps to reveal reliable connections of this variable with other variables in the data set.

Value

ggplot object.

References

Vieira, Vasco (2012): Permutation tests to estimate significances on Principal Components Analysis. Computational Ecology and Software 2. 103–123.

Examples

  onze_pca <- pca_test(onze_intercepts |> dplyr::select(-speaker), n = 10)
  # Plot PC1
  plot_loadings(onze_pca, pc_no=1)
  # Plot PC2 with violins (not particularly useful in this case!)
  plot_loadings(onze_pca, pc_no=2, violin = TRUE)

Plot `mds_test()` results

Description

Plot output from mds_test().

Usage

plot_mds_test(mds_test)

Arguments

mds_test

Object of class mds_test_results (generated by mds_test()).

Value

ggplot object.

Examples

mds_result <- mds_test(
    sim_matrix,
    n_boots = 10,
    n_perms = 10,
    test_dimensions = 3,
    mds_type = 'interval'
 )
 plot_mds_test(mds_result)

Plot Scores from Significant PCs Against PCA Input

Description

It is sometimes useful to see the relationship between PCs and the raw values of the input data fed into PCA. This function takes the results of running pca_test(), the scores for each speaker from the PCA object, and the raw data fed into the PCA analysis. In the usual model-to-PCA analysis pipeline, the resulting plot depicts by-speaker random intercepts for each vowel and an indication of which variables are significantly loaded onto the PCs. It allows the researcher to visualise the strength of the relationship between intercepts and PC scores.

Usage

plot_pc_input(pca_object, pca_data, pca_test)

Arguments

`pca_object`	Output of `prcomp`.
`pca_data`	Data fed into `prcomp`. This should not include speaker identifiers.
`pca_test`	Output of `pca_test`

Value

a ggplot object.

Examples

pca_data <- onze_intercepts |> dplyr::select(-speaker)
onze_pca <- prcomp(pca_data, scale = TRUE)
onze_pca_test <- pca_test(pca_data, n = 10)
plot_pc_input(onze_pca, pca_data, onze_pca_test)

Plot PC loadings in vowel space

Description

Plot loadings from a PCA analysis carried out on vocalic data. Vowels are positioned at their mean values, with arrows indicating loadings. Loadings are multiplied by the standard deviation, by vowel, of the initial input data. This is fine for getting a quick, intuitive interpretation of what the PCs mean in the vowel space. When using a model-to-PCA pipeline, it is not recommended to use these plots directly in publications, as the models should control variation in vocalic readings more reliably than the raw means and standard deviations.

Usage

plot_pc_vs(vowel_data, pca_obj, pc_no = 1, is_sig = FALSE)

Arguments

`vowel_data`	A dataframe whose first four columns are speaker ids, vowel ids, F1 values, and F2 values.
`pca_obj`	The result of a call to `prcomp()`, `princomp()` or `pca_test()`.
`pc_no`	An integer, indicating which PC to plot (default is PC1).
`is_sig`	A boolean, indicating whether only 'significant' loadings, according to `pca_test` should be plotted (only works with objects of class `pca_test_results`).

Value

a ggplot object.

Examples

  onze_pca <- prcomp(onze_intercepts |> dplyr::select(-speaker), scale=TRUE)
  # Default is to plot PC1
  plot_pc_vs(onze_vowels, onze_pca)
  # Or plot another PC with `pc_no`
  plot_pc_vs(onze_vowels, onze_pca, pc_no = 3)

Create plot from `permutation_test()`.

Description

Plots results of a permutation test carried out with the permutation_test() function. Now use either correlation_test() or pca_test() and the associated plotting functions.

Usage

plot_permutation_test(permutation_results, violin = FALSE)

Arguments

permutation_results

object of class permutation_test (generated by permutation_test()).

violin

Determines whether the variances explained are depicted by distinct violin plots for each PC or by connected lines. The advantage of lines is that they correctly indicate that values for each PC depend on one another within a given permutation. That is, if an earlier PC soaks up a lot of the variation in a data set, then there is less variation left to explain by subsequent PCs. Default value is FALSE.

Value

ggplot object.

Examples

onze_perm <- permutation_test(
  onze_intercepts |> dplyr::select(-speaker),
  pc_n = 5,
  n = 10,
  scale = TRUE,
  cor.method = 'pearson'
 )
plot_permutation_test(onze_perm)

Create plot of variances explained from `pca_test` object

Description

The variance explained by each PC in a dataset is plotted with confidence intervals generated by bootstrapping and a null distribution generated by permutation. The function accepts the result of calling the pca_test function.

Usage

plot_variance_explained(pca_test, pc_max = NA, percent = TRUE)

Arguments

`pca_test`	an object of class pca_test_results generated by `pca_test`.
`pc_max`	the maximum number of PCs to plot. If NA, plot all PCs.
`percent`	if TRUE, represent variance explained as a percentage. If FALSE, represent as eigenvalues.

Details

By default, variance explained is represented as a percentage. If the argument percent is set to FALSE, then the variance explained is represented by the eigenvalues corresponding to each PC.
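The relationship between the two representations can be shown in base R. This sketch uses the built-in `USArrests` data set as a stand-in and computes the values directly from `prcomp()`; it is not the plotting function's own code.

```r
# Base R sketch: eigenvalues and percentages of variance explained
# (USArrests as stand-in data; not plot_variance_explained() itself).
pca <- stats::prcomp(datasets::USArrests, scale. = TRUE)
eigenvalues <- pca$sdev^2                        # variance of each PC's scores
percent <- 100 * eigenvalues / sum(eigenvalues)  # share of total variance
data.frame(PC = seq_along(eigenvalues), eigenvalue = eigenvalues, percent = percent)
```

With scaled input, the eigenvalues sum to the number of variables, so an eigenvalue of 1 corresponds to one variable's worth of variance.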

Value

ggplot object.

Examples

  onze_pca <- pca_test(onze_intercepts |> dplyr::select(-speaker), n = 10)
  # Plot with percentages
  plot_variance_explained(onze_pca)
  # Plot with eigenvalues and only the first 5 PCs.
  plot_variance_explained(onze_pca, pc_max = 5, percent = FALSE)

Plot vowel space for speaker or speakers.

Description

Given vowel data in which the first column identifies speakers, the second identifies vowels, the third contains F1 and the fourth contains F2 values, plot a vowel space using each speaker's mean values for each vowel. The primary purpose of this function is to generate quick plots for interactive use; for publication-quality figures it is usually best to build the plot from scratch.
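The averaging step described above can be sketched in base R, selecting columns by position as the description implies. The toy data is hypothetical, and this shows only the computation of speaker means, not how plot_vowel_space() draws them or whether its internals work this way.

```r
# Hypothetical toy data in the expected column order: speaker, vowel, F1, F2.
vowel_data <- data.frame(
  speaker = c("s1", "s1", "s1", "s1", "s2", "s2"),
  vowel   = c("FLEECE", "FLEECE", "TRAP", "TRAP", "FLEECE", "TRAP"),
  F1      = c(300, 320, 700, 720, 350, 680),
  F2      = c(2300, 2350, 1700, 1650, 2200, 1600)
)

# Mean F1 and F2 per speaker and vowel; columns are selected by position,
# so the column names themselves do not matter.
speaker_means <- aggregate(
  vowel_data[, 3:4],
  by = list(speaker = vowel_data[[1]], vowel = vowel_data[[2]]),
  FUN = mean
)
speaker_means
```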

Usage

plot_vowel_space(
  vowel_data,
  speakers = NULL,
  vowel_colours = NULL,
  label_size = 4,
  means_only = TRUE,
  ellipses = FALSE,
  point_alpha = 0.1,
  facet = TRUE
)

Arguments

`vowel_data`	data frame of vowel tokens as described above.
`speakers`	list of identifiers for the speaker(s) whose vowel space is to be plotted.
`vowel_colours`	a named list of vowel = colour entries specifying the colour in which to plot each vowel.
`label_size`	size of the vowel labels (in pts). Default is 4.
`means_only`	whether to plot means only or all data points. Default: TRUE.
`ellipses`	whether to plot 95% confidence ellipses. Only works if means_only is FALSE. Default is FALSE.
`point_alpha`	alpha value for data points if means_only is FALSE. Default: 0.1.
`facet`	whether to plot distinct speakers in distinct facets. Default is TRUE.

Value

ggplot object.

Examples

# Plot mean vowel space across
plot_vowel_space(
  onze_vowels,
  speakers = NULL,
  vowel_colours = NULL,
  label_size = 4,
  means_only = TRUE,
  ellipses = FALSE,
  point_alpha = 0.1,
  facet = FALSE
 )

Formant and amplitude for intervals of QuakeBox monologues

Description

QuakeBox monologues are divided into intervals of fixed length, within which mean values are calculated for formants, amplitude, and articulation rate. Data from 77 speakers is provided (the same sample as qb_vowels).

Usage

qb_intervals

Format

A data frame with 53940 rows and 25 variables:

interval_length: Length of interval in seconds.
speaker: Anonymised speaker code (char).
interval: Time in seconds at which interval ends.
articulation_rate: Mean articulation rate within interval.
amplitude: Mean maximum amplitude within interval.
DRESS_F1: Speaker intercept from GAMM model of DRESS F1.
DRESS_F2: Speaker intercept from GAMM model of DRESS F2.
FLEECE_F1: Speaker intercept from GAMM model of FLEECE F1.
FLEECE_F2: Speaker intercept from GAMM model of FLEECE F2.
GOOSE_F1: Speaker intercept from GAMM model of GOOSE F1.
GOOSE_F2: Speaker intercept from GAMM model of GOOSE F2.
KIT_F1: Speaker intercept from GAMM model of KIT F1.
KIT_F2: Speaker intercept from GAMM model of KIT F2.
LOT_F1: Speaker intercept from GAMM model of LOT F1.
LOT_F2: Speaker intercept from GAMM model of LOT F2.
NURSE_F1: Speaker intercept from GAMM model of NURSE F1.
NURSE_F2: Speaker intercept from GAMM model of NURSE F2.
START_F1: Speaker intercept from GAMM model of START F1.
START_F2: Speaker intercept from GAMM model of START F2.
STRUT_F1: Speaker intercept from GAMM model of STRUT F1.
STRUT_F2: Speaker intercept from GAMM model of STRUT F2.
THOUGHT_F1: Speaker intercept from GAMM model of THOUGHT F1.
THOUGHT_F2: Speaker intercept from GAMM model of THOUGHT F2.
TRAP_F1: Speaker intercept from GAMM model of TRAP F1.
TRAP_F2: Speaker intercept from GAMM model of TRAP F2.

Details

Two interval lengths are given: 60 seconds and 240 seconds.

Formant data is z-scored by speaker and vowel, while the amplitude and articulation rate are z-scored by speaker.
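The two z-scoring schemes can be sketched in base R. This assumes the standard z-score, (x - mean) / sd, computed within each group; the toy data is hypothetical and the original processing scripts (see the OSF source) may differ in detail.

```r
# Hypothetical toy data standing in for token-level measurements.
tokens <- data.frame(
  speaker   = c("a", "a", "a", "a", "b", "b", "b", "b"),
  vowel     = c("KIT", "KIT", "TRAP", "TRAP", "KIT", "KIT", "TRAP", "TRAP"),
  F1        = c(400, 420, 700, 740, 380, 400, 650, 690),
  amplitude = c(60, 64, 70, 66, 55, 59, 57, 61)
)

zscore <- function(x) (x - mean(x)) / sd(x)

# Formants: z-scored within each speaker-by-vowel group.
tokens$F1_z <- ave(tokens$F1, tokens$speaker, tokens$vowel, FUN = zscore)

# Amplitude: z-scored within each speaker only (pooling all vowels).
tokens$amplitude_z <- ave(tokens$amplitude, tokens$speaker, FUN = zscore)
tokens
```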

Original data was generated for Wilson Black et al. (2023).

Source

https://osf.io/m8nkh/

References

Wilson Black, Joshua, Jennifer Hay, Lynn Clark & James Brand (2023): The overlooked effect of amplitude on within-speaker vowel variation. Linguistics Vanguard. Walter de Gruyter GmbH. 9(1). 173–189. doi:10.1515/lingvan-2022-0086

Formants from QuakeBox 1

Description

A dataset containing formant values, amplitude, articulation rate, and following segment data for 10 New Zealand English monophthongs, along with participant demographics.

Usage

qb_vowels

Format

A data frame with 26331 rows and 14 variables:

speaker: Anonymised speaker code (char).
vowel: Wells lexical sets for 10 NZE monophthongs. Levels: DRESS, FLEECE, GOOSE, KIT, LOT, NURSE, START, STRUT, THOUGHT, TRAP, FOOT (char).
F1_50: First formant in Hz, extracted from vowel mid-point using LaBB-CAT interface with Praat.
F2_50: Second formant in Hz, extracted from vowel mid-point using LaBB-CAT interface with Praat.
participant_age_category: Age category of speaker. Values: 18-25, 26-35, 36-45, ..., 76-85 (char).
participant_gender: Gender of participant. Values: M, F (char).
participant_nz_ethnic: New Zealand ethnic category of participant. Values: NZ mixed ethnicity, NZ European, Other (char).
word_freq: CELEX frequency of the word from which the vowel token is taken.
word: Anonymised word id (char).
time: Time in seconds at which vowel segment starts.
vowel_duration: Length of vowel in seconds.
articulation_rate: Articulation rate of utterance from which token is taken.
following_segment_category: Category of following segment. NB: liquids have already been removed. Levels: labial, velar, other (factor).
amplitude: Maximum amplitude of word from which vowel token is taken, generated by LaBB-CAT interface with Praat.

Details

Original data was generated for Wilson Black et al. (2023).

Source

https://osf.io/m8nkh/

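References

Wilson Black, Joshua, Jennifer Hay, Lynn Clark & James Brand (2023): The overlooked effect of amplitude on within-speaker vowel variation. Linguistics Vanguard. Walter de Gruyter GmbH. 9(1). 173–189. doi:10.1515/lingvan-2022-0086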

Similarity matrix from online perception test.

Description

Mean similarity ratings for 38 QuakeBox speakers from an online pairwise similarity task. Random noise added.

Usage

sim_matrix

Format

A 38x38 matrix

Summary function for correlation test object. Set alpha to change significance level.

Description

Set alpha to change the significance level and n_cors to change the number of pairwise correlations listed.
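What alpha and n_cors control can be sketched in base R: compute all pairwise correlations with p-values, count those below alpha, and list the n_cors strongest. This is not the package's summary() implementation; the built-in mtcars columns stand in for vowel data, and whether the package uses a strict `<` comparison against alpha is an assumption.

```r
# Base-R sketch: pairwise correlations, counted as significant at level alpha.
alpha <- 0.05
n_cors <- 5
df <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]  # built-in stand-in data

var_pairs <- combn(names(df), 2, simplify = FALSE)
results <- do.call(rbind, lapply(var_pairs, function(p) {
  test <- cor.test(df[[p[1]]], df[[p[2]]], method = "pearson")
  data.frame(var_a = p[1], var_b = p[2],
             r = unname(test$estimate), p_value = test$p.value)
}))

# number of pairwise correlations significant at the chosen alpha
n_significant <- sum(results$p_value < alpha)

# the n_cors strongest correlations by absolute value
head(results[order(-abs(results$r)), ], n_cors)
```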

Usage

## S3 method for class 'correlation_test'
summary(object, alpha = 0.05, n_cors = 5, ...)

Arguments

`object`	object of class `correlation_test`.
`alpha`	significance level for counting correlation as significant.
`n_cors`	number of pairwise correlations to list.
`...`	additional arguments affecting the summary produced.

Value

a glue object.

Help Index

Permutation test of pairwise correlations
Apply Lobanov 2.0 normalisation
Test optimal number of MDS dimensions.
Speaker random intercepts from GAMMs for 100 ONZE speakers
Speaker random intercepts for 418 ONZE speakers
Monophthong data for random sample of speakers from the ONZE corpus
Monophthong data for speakers from the ONZE corpus
Flip PC loadings
PCA contribution plots
PCA with confidence intervals and null distributions
Run permutation test on PCA analysis.
Plot of correlation counts from `correlation_test` object
Plot distribution of correlations from `correlation_test` object
Plot PC index loadings from `pca_test` object.
Plot `mds_test()` results
Create plot from `permutation_test()`.
Create plot of variances explained from `pca_test` object