Package: textreg 0.1.5

Luke Miratrix

textreg: n-Gram Text Regression, aka Concise Comparative Summarization

Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.

Authors: Luke Miratrix

textreg_0.1.5.tar.gz
textreg_0.1.5.tar.gz (r-4.5-noble), textreg_0.1.5.tar.gz (r-4.4-noble)
textreg_0.1.5.tgz (r-4.4-emscripten), textreg_0.1.5.tgz (r-4.3-emscripten)
textreg.pdf | textreg.html
textreg/json (API)

# Install 'textreg' in R:
install.packages('textreg', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Uses libs:
  • c++ - GNU Standard C++ Library v3
Datasets:
  • bathtub - Sample of cleaned OSHA accident summaries.
  • dirtyBathtub - Sample of raw-text OSHA accident summaries.
  • testCorpora - Some small, fake test corpora.

This package does not link to any GitHub/GitLab/R-Forge repository. No issue tracker or development information is available.

4.18 score, 1 star, 74 scripts, 4.1k downloads, 29 exports, 8 dependencies

Last updated 6 years ago from:e6960e5cdb. Checks: OK: 1, WARNING: 1. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Dec 08 2024
R-4.5-linux-x86_64 | WARNING | Dec 08 2024

Exports: build.corpus, calc.loss, clean.text, cluster.phrases, convert.tm.to.character, find.CV.C, find.threshold.C, grab.fragments, is.fragment.sample, is.textreg.corpus, is.textreg.result, list.table.chart, make.appearance.matrix, make.count.table, make.CV.chart, make.list.table, make.path.matrix, make.phrase.correlation.chart, make.phrase.matrix, make.similarity.matrix, path.matrix.chart, phrase.count, phrase.matrix, phrases, reformat.textreg.model, sample.fragments, save.corpus.to.files, stem.corpus, textreg

Dependencies: BH, cli, NLP, Rcpp, rlang, slam, tm, xml2

Using the textreg package

Rendered from bathtub_vignette.Rmd using knitr::rmarkdown on Dec 08 2024.

Last update: 2018-10-04
Started: 2018-10-04

Readme and manuals

Help Manual

Help page | Topics
Sparse regression package for text that allows for multiple word phrases. | textreg-package
Sample of cleaned OSHA accident summaries. | bathtub
Build a corpus that can be used in the textreg call. | build.corpus
Calculate total loss of model (squared hinge loss). | calc.loss
Clean text and get it ready for textreg. | clean.text
Cluster phrases based on similarity of appearance. | cluster.phrases
Convert tm corpus to vector of strings. | convert.tm.to.character
Driver function for the C++ function. | cpp_build.corpus
Driver function for the C++ function. | cpp_textreg
Sample of raw-text OSHA accident summaries. | dirtyBathtub
K-fold cross-validation to determine optimal tuning parameter. | find.CV.C
Conduct permutation test on labeling to get null distribution of regularization parameter. | find.threshold.C
Grab all fragments in a corpus with given phrase. | grab.fragments
Is object a fragment.sample object? | fragment.sample, is.fragment.sample
Is object a textreg.corpus object? | is.textreg.corpus, textreg.corpus
Is object a textreg.result object? | is.textreg.result, textreg.result
Graphic showing multiple word lists side-by-side. | list.table.chart
Convert phrases to appropriate search string. | make_search_phrases
Make phrase appearance matrix from textreg result. | make.appearance.matrix
Count number of times documents have a given phrase. | make.count.table
Plot K-fold cross-validation curves. | make.CV.chart
Collate multiple regression runs. | make.list.table
Generate matrix describing gradient descent path of textreg. | make.path.matrix
Generate visualization of phrase overlap. | make.phrase.correlation.chart
Make a table of where phrases appear in a corpus. | make.phrase.matrix
Calculate similarity matrix for set of phrases. | make.similarity.matrix
Plot optimization path of textreg. | path.matrix.chart
Count phrase appearance. | phrase.count
Make matrix of where phrases appear in corpus. | phrase.matrix
Get the phrases from the textreg.result object. | phrases
Plot the sequence of features as they are introduced with the textreg gradient descent program. | plot.textreg.result
Predict labeling with the selected phrases. | predict.textreg.result
Pretty print results of phrase sampling object. | print.fragment.sample
Pretty print textreg corpus object. | print.textreg.corpus
Pretty print results of textreg regression. | print.textreg.result
Clean up output from textreg. | reformat.textreg.model
Sample fragments of text to contextualize a phrase. | sample.fragments
Save corpus to text (and RData) file. | save.corpus.to.files
Stem corpus with annotation. | stem.corpus
Some small, fake test corpora. | testCorpora
Sparse regression of labeling vector onto all phrases in a corpus. | textreg
Call gregexpr on the content of a tm Corpus. | tm_gregexpr