Package: PsychWordVec 2023.9

Han-Wu-Shuang Bao

PsychWordVec: Word Embedding Research Framework for Psychological Science

An integrative toolbox for word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation tests of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arxiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arxiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').
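To make the two association tests in point (3) concrete, here is a minimal base-R sketch of the standard formulas as we understand them: the WEAT effect size with a permutation p-value (Caliskan et al., 2017) and the Relative Norm Distance (Garg et al., 2018). It is not the code behind the package's test_WEAT() or test_RND(), and it makes no claim about their arguments or output. The helper names (cosine, assoc, weat_effect_size, weat_perm_p, rnd, noisy) are hypothetical, the toy vectors are random stand-ins for real embeddings, and the permutation test samples random re-partitions (Monte Carlo) rather than enumerating every split.

```r
# Hypothetical sketch of WEAT and RND on toy vectors; NOT PsychWordVec's
# test_WEAT()/test_RND() implementation.

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# s(w, A, B): mean cosine of word w with attribute set A minus that with set B.
# A and B are matrices holding one word vector per row.
assoc <- function(w, A, B) {
  mean(apply(A, 1, cosine, b = w)) - mean(apply(B, 1, cosine, b = w))
}

# WEAT effect size: difference in mean association of target sets X and Y,
# divided by the sample SD of the associations of all target words.
weat_effect_size <- function(X, Y, A, B) {
  sx <- apply(X, 1, assoc, A = A, B = B)
  sy <- apply(Y, 1, assoc, A = A, B = B)
  (mean(sx) - mean(sy)) / sd(c(sx, sy))
}

# One-sided permutation p-value, approximated by random equal-size
# re-partitions of X and Y (Monte Carlo), not by full enumeration.
weat_perm_p <- function(X, Y, A, B, n_perm = 1000) {
  s_all <- apply(rbind(X, Y), 1, assoc, A = A, B = B)
  nx <- nrow(X)
  observed <- sum(s_all[seq_len(nx)]) - sum(s_all[-seq_len(nx)])
  perm <- replicate(n_perm, {
    idx <- sample(length(s_all), nx)
    sum(s_all[idx]) - sum(s_all[-idx])
  })
  mean(perm >= observed)
}

# RND: sum over words v in M of ||v - g1|| - ||v - g2||, where g1 and g2 are the
# unit-normalized mean vectors of two groups; negative means M is closer to group 1.
rnd <- function(M, G1, G2) {
  unit <- function(v) v / sqrt(sum(v^2))
  g1 <- unit(colMeans(G1))
  g2 <- unit(colMeans(G2))
  sum(apply(M, 1, function(v) {
    v <- unit(v)
    sqrt(sum((v - g1)^2)) - sqrt(sum((v - g2)^2))
  }))
}

# Toy data (made up): X, A, and M share one direction; Y and B share another.
set.seed(42)
dims <- 50
dir_1 <- rnorm(dims)
dir_2 <- rnorm(dims)
noisy <- function(center, n, sd = 0.5) {
  t(replicate(n, center + rnorm(length(center), sd = sd)))
}
X <- noisy(dir_1, 8); Y <- noisy(dir_2, 8)
A <- noisy(dir_1, 8); B <- noisy(dir_2, 8)
M <- noisy(dir_1, 5)

d_weat <- weat_effect_size(X, Y, A, B)
p_weat <- weat_perm_p(X, Y, A, B)
rnd_value <- rnd(M, G1 = A, G2 = B)
cat(sprintf("WEAT d = %.2f, permutation p = %.3f, RND = %.3f\n",
            d_weat, p_weat, rnd_value))
```

Because the toy target and attribute sets were built around shared directions, the sketch should report a large positive WEAT effect, a small p-value, and a negative RND; with real embeddings the inputs would be the word vectors of the chosen target and attribute words.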

Authors: Han-Wu-Shuang Bao [aut, cre]

Source: PsychWordVec_2023.9.tar.gz
Binaries:
  • PsychWordVec_2023.9.tar.gz (r-4.5-noble)
  • PsychWordVec_2023.9.tar.gz (r-4.4-noble)
  • PsychWordVec_2023.9.tgz (r-4.4-emscripten)
  • PsychWordVec_2023.9.tgz (r-4.3-emscripten)
Manual: PsychWordVec.pdf | PsychWordVec.html
API: PsychWordVec/json
NEWS

# Install 'PsychWordVec' in R:
install.packages('PsychWordVec', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))


Bug tracker: https://github.com/psychbruce/psychwordvec/issues

Pkgdown site: https://psychbruce.github.io

Uses libs:
  • openjdk - OpenJDK Java runtime, using HotSpot JIT
Datasets:
  • demodata - Demo data (pre-trained using word2vec on Google News; 8000 vocab, 300 dims).


Score: 1.70 | 9 scripts | 353 downloads | 34 exports | 230 dependencies

Last updated 1 year ago from 17d922c38c. Checks: 2 OK. Indexed: no.

Target            Result   Date
Doc / Vignettes   OK       Dec 28 2024
R-4.5-linux       OK       Dec 28 2024

Exports: as_embed, as_wordvec, cc, cos_dist, cos_sim, cosine_similarity, data_transform, data_wordvec_load, data_wordvec_subset, dict_expand, dict_reliability, get_wordvec, load_embed, load_wordvec, most_similar, normalize, orth_procrustes, pair_similarity, pattern, plot_network, plot_similarity, plot_wordvec, plot_wordvec_tSNE, sum_wordvec, tab_similarity, test_RND, test_WEAT, text_init, text_model_download, text_model_remove, text_to_vec, text_unmask, tokenize, train_wordvec

Dependencies: abind, afex, askpass, backports, base64enc, bayestestR, bit, bit64, boot, broom, broom.mixed, bruceR, bslib, cachem, car, carData, cellranger, checkmate, class, cli, clipr, clock, cluster, coda, codetools, colorspace, commonmark, corpcor, corrplot, cowplot, cpp11, crayon, curl, data.table, datawizard, Deriv, diagram, dials, DiceDesign, digest, doBy, doFuture, dplyr, effectsize, effsize, emmeans, estimability, evaluate, fansi, farver, fastmap, fastTextR, fdrtool, float, fontawesome, forcats, foreach, foreign, Formula, fs, furrr, future, future.apply, generics, ggplot2, ggrepel, ggwordcloud, glasso, globals, glue, gower, GPArotation, GPfit, gridExtra, gridtext, gtable, gtools, hardhat, haven, here, highr, Hmisc, hms, htmlTable, htmltools, htmlwidgets, httr, igraph, insight, interactions, ipred, isoband, ISOcodes, iterators, jpeg, jquerylib, jsonlite, jtools, KernSmooth, knitr, labeling, lattice, lava, lavaan, lgr, lhs, lifecycle, listenv, lme4, lmerTest, lpSolve, lubridate, magrittr, mallet, markdown, MASS, Matrix, MatrixExtra, MatrixModels, mediation, memoise, mgcv, microbenchmark, mime, minqa, mlapi, mnormt, modelenv, modelr, munsell, mvtnorm, ngram, nlme, nloptr, nnet, numDeriv, openssl, pander, parallelly, parameters, parsnip, pbapply, pbivnorm, pbkrtest, performance, pillar, pkgconfig, plyr, png, prettyunits, prodlim, progress, progressr, psych, purrr, qgraph, quadprog, quantreg, R.methodsS3, R.oo, R.utils, R6, rappdirs, RColorBrewer, Rcpp, RcppArmadillo, RcppEigen, RcppProgress, RcppTOML, readr, readxl, recipes, rematch, reshape2, reticulate, rgl, RhpcBLASctl, rio, rJava, rlang, rmarkdown, rpart, rprojroot, rsample, rsparse, RSpectra, rstudioapi, Rtsne, sandwich, sass, scales, sfd, shape, slam, slider, SparseM, SQUAREM, stopwords, stringi, stringr, survival, sys, texreg, text, text2vec, textmineR, tibble, tidyr, tidyselect, timechange, timeDate, tinytex, topics, tune, tzdb, utf8, vctrs, viridis, viridisLite, vroom, warp, withr, word2vec, workflows, writexl, xfun, xml2, yaml, yardstick, zoo

Readme and manuals

Help Manual

Help pages (topics - description):
  • as_embed, as_wordvec, pattern, [.embed - Word vectors data class: 'wordvec' and 'embed'.
  • cosine_similarity, cos_dist, cos_sim - Cosine similarity/distance between two vectors.
  • data_transform - Transform plain text of word vectors into 'wordvec' (data.table) or 'embed' (matrix), saved in a compressed ".RData" file.
  • data_wordvec_load, load_embed, load_wordvec - Load word vectors data ('wordvec' or 'embed') from an ".RData" file.
  • data_wordvec_subset, subset.embed, subset.wordvec - Extract a subset of word vectors data (with S3 methods).
  • demodata - Demo data (pre-trained using word2vec on Google News; 8000 vocab, 300 dims).
  • dict_expand - Expand a dictionary from the most similar words.
  • dict_reliability - Reliability analysis and PCA of a dictionary.
  • get_wordvec - Extract word vector(s).
  • most_similar - Find the Top-N most similar words.
  • normalize - Normalize all word vectors to unit length.
  • orth_procrustes - Orthogonal Procrustes rotation for matrix alignment.
  • pair_similarity - Compute a matrix of cosine similarity/distance of word pairs.
  • plot_network - Visualize a (partial correlation) network graph of words.
  • plot_similarity - Visualize cosine similarity of word pairs.
  • plot_wordvec - Visualize word vectors.
  • plot_wordvec_tSNE - Visualize word vectors with dimensionality reduced using t-SNE.
  • sum_wordvec - Calculate the sum vector of multiple words.
  • tab_similarity - Tabulate cosine similarity/distance of word pairs.
  • test_RND - Relative Norm Distance (RND) analysis.
  • test_WEAT - Word Embedding Association Test (WEAT) and Single-Category WEAT.
  • text_init - Install required Python modules in a new conda environment and initialize the environment, necessary for all 'text_*' functions designed for contextualized word embeddings.
  • text_model_download - Download pre-trained language models from HuggingFace.
  • text_model_remove - Remove downloaded models from the local .cache folder.
  • text_to_vec - Extract contextualized word embeddings from transformers (pre-trained language models).
  • text_unmask - <Deprecated> Fill in the blank mask(s) in a query (sentence).
  • tokenize - Tokenize raw text for training word embeddings.
  • train_wordvec - Train static word embeddings using the Word2Vec, GloVe, or FastText algorithm.
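Several of the help pages above (cosine similarity/distance, normalization, Top-N most similar words, orthogonal Procrustes alignment) rest on a few linear-algebra operations. The following is a minimal base-R sketch of those operations under stated assumptions: it is not PsychWordVec's source, it does not call or mimic the package's function signatures, the helper names (cosine, cos_distance, normalize_rows, top_n_similar, procrustes_rotation) are hypothetical, and the embeddings are made-up toy data. Procrustes here uses the textbook SVD solution.

```r
# Hypothetical sketch of the vector operations behind cosine similarity,
# normalization, Top-N neighbours, and orthogonal Procrustes alignment.
# NOT PsychWordVec's implementation.

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cos_distance <- function(a, b) 1 - cosine(a, b)

# Scale each row (word vector) to unit L2 length; dot products of unit
# vectors then equal cosine similarities.
normalize_rows <- function(m) m / sqrt(rowSums(m^2))

# Top-N neighbours of `word` by cosine similarity, excluding the word itself.
top_n_similar <- function(m, word, n = 3) {
  m <- normalize_rows(m)
  sims <- drop(m %*% m[word, ])
  sims <- sims[names(sims) != word]
  head(sort(sims, decreasing = TRUE), n)
}

# Orthogonal Procrustes: the orthogonal W minimising the Frobenius norm of
# A %*% W - B is U %*% t(V), where U D t(V) is the SVD of t(A) %*% B.
procrustes_rotation <- function(A, B) {
  s <- svd(crossprod(A, B))  # crossprod(A, B) equals t(A) %*% B
  s$u %*% t(s$v)
}

# Toy embeddings (made up): three "fruit" words and two "vehicle" words.
set.seed(7)
fruit <- rnorm(20)
vehicle <- rnorm(20)
emb <- rbind(
  apple  = fruit + rnorm(20, sd = 0.3),
  banana = fruit + rnorm(20, sd = 0.3),
  cherry = fruit + rnorm(20, sd = 0.3),
  car    = vehicle + rnorm(20, sd = 0.3),
  truck  = vehicle + rnorm(20, sd = 0.3)
)
emb_unit <- normalize_rows(emb)
neighbours <- top_n_similar(emb, "apple", n = 2)
print(round(neighbours, 3))

# Alignment check: rotate a random 100-word, 10-dim space by an orthogonal
# matrix Q, then recover Q from the two spaces.
A <- matrix(rnorm(100 * 10), nrow = 100)
Q <- qr.Q(qr(matrix(rnorm(10 * 10), nrow = 10)))  # random orthogonal matrix
B <- A %*% Q
W <- procrustes_rotation(A, B)
print(max(abs(W - Q)))  # near machine precision when the rotation is recovered
```

Normalizing once and then taking a matrix product is a common design choice for Top-N searches, since it turns every cosine similarity into a single dot product; the Procrustes check shows why the rotation is useful for aligning two embedding spaces that differ only by an orthogonal transform.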