Package: text 1.3.0

Oscar Kjell

text: Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables into word embeddings. The word embeddings can then be used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words along various dimensions. For more information see <https://www.r-text.org>.

Authors: Oscar Kjell [aut, cre], Salvatore Giorgi [aut], Andrew Schwartz [aut]

text_1.3.0.tar.gz
text_1.3.0.tar.gz (r-4.5-noble), text_1.3.0.tar.gz (r-4.4-noble)
text_1.3.0.tgz (r-4.4-emscripten), text_1.3.0.tgz (r-4.3-emscripten)
text.pdf | text.html
text/json (API)
NEWS

# Install 'text' in R:
install.packages('text', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))


Bug tracker: https://github.com/oscarkjell/text/issues

Pkgdown site: https://r-text.org

Uses libs:
  • openjdk – OpenJDK Java runtime, using Hotspot JIT

8.33 score, 3 stars, 1 package, 408 scripts, 2.9k downloads, 9 mentions, 59 exports, 131 dependencies

Last updated 20 days ago from: 3f6ae79d6e. Checks: OK: 2. Indexed: no.

Target           Result  Date
Doc / Vignettes  OK      Dec 05 2024
R-4.5-linux      OK      Dec 05 2024

Exports: find_textrpp_env, textAssess, textCentrality, textCentralityPlot, textClassify, textClean, textCleanNonASCII, textDescriptives, textDimName, textDistance, textDistanceMatrix, textDistanceNorm, textDomainCompare, textEmbed, textEmbedLayerAggregation, textEmbedRawLayers, textEmbedReduce, textEmbedStatic, textFindNonASCII, textFineTuneDomain, textFineTuneTask, textGeneration, textLBAM, textModelLayers, textModels, textModelsRemove, textNER, textPCA, textPCAPlot, textPlot, textPredict, textPredictAll, textPredictTest, textProjection, textProjectionPlot, textQA, textrpp_initialize, textrpp_install, textrpp_install_virtualenv, textrpp_uninstall, textSimilarity, textSimilarityMatrix, textSimilarityNorm, textSum, textTokenize, textTokenizeAndCount, textTopics, textTopicsReduce, textTopicsTest, textTopicsTree, textTopicsWordcloud, textTrain, textTrainLists, textTrainN, textTrainNPlot, textTrainRandomForest, textTrainRegression, textTranslate, textZeroShot

Dependencies: backports, bit, bit64, checkmate, class, cli, clipr, clock, codetools, colorspace, commonmark, cowplot, cpp11, crayon, curl, data.table, diagram, dials, DiceDesign, digest, doFuture, dplyr, effsize, fansi, farver, float, foreach, furrr, future, future.apply, generics, ggplot2, ggrepel, ggwordcloud, globals, glue, gower, GPfit, gridtext, gtable, gtools, hardhat, here, hms, ipred, isoband, ISOcodes, iterators, jpeg, jsonlite, KernSmooth, labeling, lattice, lava, lgr, lhs, lifecycle, listenv, lubridate, magrittr, mallet, markdown, MASS, Matrix, MatrixExtra, mgcv, mlapi, modelenv, munsell, ngram, nlme, nnet, numDeriv, parallelly, parsnip, pillar, pkgconfig, png, prettyunits, prodlim, progress, progressr, purrr, R6, rappdirs, RColorBrewer, Rcpp, RcppArmadillo, RcppEigen, RcppProgress, RcppTOML, readr, recipes, reticulate, RhpcBLASctl, rJava, rlang, rpart, rprojroot, rsample, rsparse, RSpectra, scales, sfd, shape, slider, SQUAREM, stopwords, stringi, stringr, survival, text2vec, textmineR, tibble, tidyr, tidyselect, timechange, timeDate, topics, tune, tzdb, utf8, vctrs, viridisLite, vroom, warp, withr, workflows, xfun, xml2, yardstick

How to best manage computationally heavy analyses

Rendered from huggingface_in_r_and_computer_capacity.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2022-09-20
Started: 2022-09-20

Extended Installation Guide

Rendered from huggingface_in_r_extended_installation_guide.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2024-12-05
Started: 2022-09-20

Implicit Motives Tutorial

Rendered from implicit_motives_tutorial.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2024-12-05
Started: 2024-12-05

The Language-Based Assessment Model (L-BAM) Library

Rendered from LBAM.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2024-12-05
Started: 2024-12-05

L-BAM Tutorial

Rendered from lbam_tutorial.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2024-12-05
Started: 2024-12-05

Pre-registration and Researcher Degrees of Freedom

Rendered from pre_registration_and_transformers.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2024-07-30
Started: 2023-08-09

Psychological Methods: the Text Tutorial

Rendered from psychological_methods.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2023-08-09
Started: 2023-08-09

HuggingFace language models are downloaded in .cache

Rendered from removing_huggingface_transformers_cache_files.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2022-09-20
Started: 2022-09-20

Creating a Singularity Container to Run HuggingFace Transformers Models in R

Rendered from singularity_transformers_container.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2023-08-09
Started: 2022-09-20

Getting started

Rendered from text.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2024-12-05
Started: 2020-11-23

HuggingFace Transformers in R: Word Embeddings Defaults and Specifications

Rendered from huggingface_in_r.Rmd using knitr::rmarkdown on Dec 05 2024.

Last update: 2024-07-30
Started: 2022-09-20

Readme and manuals

Help Manual

Help page | Topics
Example data for plotting a Semantic Centrality Plot. | centrality_data_harmony
Data for plotting a Dot Product Projection Plot. | DP_projections_HILS_SWLS_100
Example text and numeric data. | Language_based_assessment_data_3_100
Text and numeric data for 10 participants. | Language_based_assessment_data_8
Example data for plotting a Principal Component Projection Plot. | PC_projections_satisfactionwords_40
Word embeddings from textEmbedRawLayers function | raw_embeddings_1
Semantic similarity score between single words and an aggregated word embedding | textCentrality
Plots words from textCentrality() | textCentralityPlot
Cleans text from standard personal information | textClean
Clean non-ASCII characters | textCleanNonASCII
Compute descriptive statistics of character variables. | textDescriptives
Change dimension names | textDimName
Semantic distance | textDistance
Semantic distance across multiple word embeddings | textDistanceMatrix
Semantic distance between a text variable and a word norm | textDistanceNorm
Compare two language domains | textDomainCompare
Embed text | textEmbed
Aggregate layers | textEmbedLayerAggregation
Extract layers of hidden states | textEmbedRawLayers
Pre-trained dimension reduction (experimental) | textEmbedReduce
Apply static word embeddings | textEmbedStatic
Detect non-ASCII characters | textFindNonASCII
Domain Adapted Pre-Training (EXPERIMENTAL - under development) | textFineTuneDomain
Task Adapted Pre-Training (EXPERIMENTAL - under development) | textFineTuneTask
Text generation | textGeneration
The LBAM library | textLBAM
Number of layers | textModelLayers
Check downloaded, available models. | textModels
Delete a specified model | textModelsRemove
Named Entity Recognition. (experimental) | textNER
textPCA() | textPCA
textPCAPlot | textPCAPlot
Plot words | textPlot
textPredict, textAssess and textClassify | textAssess, textClassify, textPredict
Predict from several models, selecting the correct input | textPredictAll
Significance testing of correlations. If only y1 is provided, a t-test is computed between the absolute errors from yhat1-y1 and yhat2-y1. | textPredictTest
Supervised Dimension Projection | textProjection
Plot Supervised Dimension Projection | textProjectionPlot
Question Answering. (experimental) | textQA
Initialize text required python packages | textrpp_initialize
Install text required python packages in conda or virtualenv environment | textrpp_install, textrpp_install_virtualenv
Uninstall textrpp conda environment | textrpp_uninstall
Semantic Similarity | textSimilarity
Semantic similarity across multiple word embeddings | textSimilarityMatrix
Semantic similarity between a text variable and a word norm | textSimilarityNorm
Summarize texts. (experimental) | textSum
Tokenize text-variables | textTokenize
Tokenize and count | textTokenizeAndCount
BERTopics | textTopics
textTopicsReduce (EXPERIMENTAL) | textTopicsReduce
Wrapper for topicsTest function from the topics package | textTopicsTest
textTopicsTest (EXPERIMENTAL) to get the hierarchical topic tree | textTopicsTree
Plot word clouds | textTopicsWordcloud
Trains word embeddings | textTrain
Train lists of word embeddings | textTrainLists
Cross-validated accuracies across sample-sizes | textTrainN
Plot cross-validated accuracies across sample sizes | textTrainNPlot
Trains word embeddings using random forest | textTrainRandomForest
Train word embeddings to a numeric variable. | textTrainRegression
Translation. (experimental) | textTranslate
Zero Shot Classification (Experimental) | textZeroShot
Word embeddings for 4 text variables for 40 participants | word_embeddings_4