---
title: "Getting started"
description: "Introduction to topics in R."
author: ""
opengraph:
  image:
    src: "http://r-text.org/articles/text_files/figure-html/unnamed-chunk-5-1.png"
  twitter:
    card: summary_large_image
    creator: "@oscarkjell"
output: github_document #rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{text}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
evaluate = FALSE
```

The *topics*-package enables Differential Language Analysis using words, phrases and topics.

Please reference our tutorial article when using the package: **Language visualisation methods for psychological assessments**, and Ackermann L., Zhuojun G. & Kjell O.N.E. (2024). An R-package for visualizing text in topics. https://github.com/theharmonylab/topics. `DOI:zenodo.org/records/11165378`.

This Getting Started tutorial walks through the most central *topics* functions.

## Usage

In an example where the topics are used to predict the PHQ-9 score, the pipeline can be run as follows:

**1. Data Preprocessing**
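
As background for this step, a document-term matrix (DTM) records how often each term occurs in each document. The chunk below is a minimal base-R sketch of that idea on made-up sentences. It does not call or reproduce `topicsDtm()`, which may additionally handle stopwords and removal rates, and the names `docs`, `tokens`, `vocab` and `toy_dtm` are hypothetical.

```r
# Toy documents (made up for illustration; not taken from dep_wor_data)
docs <- c("sad tired lonely", "happy calm", "tired sad sad")

# Split each document into lower-case word tokens
tokens <- strsplit(tolower(docs), "\\s+")

# Vocabulary: sorted unique terms across all documents
vocab <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every document
# (rows = documents, columns = terms)
toy_dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = vocab))),
  integer(length(vocab))
))
colnames(toy_dtm) <- vocab

toy_dtm
```

Each row of `toy_dtm` is one document and each column one term, which is the kind of input a topic model is fitted on.
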
To preprocess the data, run the following command:

```{r dtm, eval = TRUE, warning=TRUE, message=TRUE}
library(topics)

dtm <- topicsDtm(
  data = dep_wor_data$Depword)

# Check the results from the dtm and refine stopwords and removal rates if necessary
dtm_evaluation <- topicsDtmEval(
  dtm)
dtm_evaluation$frequency_plot
```

**2. Model Training**
To train the LDA model, run the following command:

```{r model, eval = TRUE, warning=FALSE, message=FALSE}
model <- topicsModel(
  dtm = dtm,
  num_topics = 20,
  num_iterations = 1000)
```

**3. Model Inference**
To infer the topic distribution of each document, run the following command:

```{r preds, eval = TRUE, warning=FALSE, message=FALSE}
preds <- topicsPreds(
  model = model,
  data = dep_wor_data$Depword)
```

**4. Statistical Analysis**
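
As background for this step, the idea of relating one topic to a variable of interest while controlling for a covariate can be sketched with base R's `lm()` on simulated data. This is only an illustration under stated assumptions: the data are simulated, the names `toy`, `topic_1`, `phq9` and `age` are hypothetical, and the exact model specification and any multiple-comparison handling inside `topicsTest()` are not shown here.

```r
set.seed(1)
n <- 200

# Simulated covariate and outcome score (not real data)
age  <- rnorm(n, mean = 40, sd = 10)
phq9 <- rnorm(n, mean = 10, sd = 5)

# Simulated topic prevalence that increases with the score, plus noise
topic_1 <- 0.02 * phq9 + rnorm(n, sd = 0.1)

toy <- data.frame(topic_1, phq9, age)

# Standardised linear regression: topic score on the variable of
# interest, with age entered as a control
fit <- lm(scale(topic_1) ~ scale(phq9) + scale(age), data = toy)

# Estimate, standard error, t value and p value for the variable of interest
summary(fit)$coefficients["scale(phq9)", ]
```

In a full analysis, one such test would be run per topic, so the resulting p values would normally need correcting for multiple comparisons.
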
To analyze the relationship between the topics and the prediction variable, run the following command:

```{r test, eval = TRUE, warning=FALSE, message=FALSE}
test <- topicsTest(
  data = dep_wor_data,
  model = model,
  preds = preds,
  x_variable = "PHQ9tot",
  controls = c("Age"),
  test_method = "linear_regression")
```

**5. Visualization**
To visualize the significant topics as wordclouds, run the following command:

```{r plot_list, eval = TRUE, warning=FALSE, message=FALSE}
plot_list <- topicsPlot(
  model = model,
  test = test,
  figure_format = "png")

# showing some of the plots
plot_list$square1
```

### Articles using the topics-package

Differentiating balance and harmony through natural language analysis: A cross-national exploration of two understudied wellbeing-related concepts

### Other relevant references

The list below contains papers that analyze human language in ways similar to what is possible in *topics*.

***Methods Articles***

[Gaining insights from social media language: Methodologies and challenges.](http://www.peggykern.org/uploads/5/6/6/7/56678211/kern_2016_-_gaining_insights_from_social_media_language-_methodologies_and_challenges.pdf) *Kern et al., (2016). Psychological Methods.*

***Computer Science: Python Software***

[DLATK: Differential language analysis toolkit.](https://aclanthology.org/D17-2010/) *Schwartz, H. A., Giorgi, et al., (2017). In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations.*

[DLATK](https://github.com/dlatk/dlatk)