Package: textreg 0.1.5

Luke Miratrix

textreg: n-Gram Text Regression, aka Concise Comparative Summarization

Function for sparse regression on raw text, regressing a labeling vector onto a feature space consisting of all possible phrases.
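To make the idea of a phrase feature space concrete, here is a minimal base-R sketch that enumerates word n-grams for a toy corpus and builds a document-by-phrase appearance matrix. It is only an illustration under stated assumptions, not the package's implementation (which relies on a C++ routine). The helper names `ngrams` and `phrase_features`, the toy documents, and the cap of two words per phrase are all hypothetical.

```r
# Hypothetical helper: list every word n-gram (up to max_n words) in one document.
ngrams <- function(doc, max_n = 2) {
  words <- strsplit(tolower(doc), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]            # drop empty tokens
  out <- character(0)
  for (n in seq_len(max_n)) {
    if (length(words) < n) break
    starts <- seq_len(length(words) - n + 1)
    grams <- vapply(starts,
                    function(i) paste(words[i:(i + n - 1)], collapse = " "),
                    character(1))
    out <- c(out, grams)
  }
  out
}

# Hypothetical helper: binary document-by-phrase matrix (dense; toy input only).
phrase_features <- function(docs, max_n = 2) {
  per_doc <- lapply(docs, ngrams, max_n = max_n)
  vocab <- sort(unique(unlist(per_doc)))
  X <- t(vapply(per_doc,
                function(p) as.numeric(vocab %in% p),
                numeric(length(vocab))))
  colnames(X) <- vocab
  X
}

docs <- c("worker fell from ladder", "worker inhaled fumes")  # made-up text
labeling <- c(1, -1)                                          # one label per document
X <- phrase_features(docs)
```

A sparse regression would then regress `labeling` on the columns of `X`. Enumerating every phrase like this grows quickly with phrase length, so it is only practical for tiny examples.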

Authors: Luke Miratrix

textreg_0.1.5.tar.gz
textreg_0.1.5.tar.gz (r-4.5-noble), textreg_0.1.5.tar.gz (r-4.4-noble)
textreg_0.1.5.tgz (r-4.4-emscripten), textreg_0.1.5.tgz (r-4.3-emscripten)
textreg.pdf | textreg.html
textreg/json (API)

# Install 'textreg' in R:

install.packages('textreg', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Uses libs:

c++ - GNU Standard C++ Library v3

Datasets:

bathtub - Sample of cleaned OSHA accident summaries.
dirtyBathtub - Sample of raw-text OSHA accident summaries.
testCorpora - Some small, fake test corpora.

On CRAN:

This package does not link to any GitHub/GitLab/R-Forge repository. No issue tracker or development information is available.

cpp

3.26 score, 1 star, 3.7k downloads, 29 exports, 8 dependencies

Last updated 7 years ago from: e6960e5cdb. Checks: 1 OK, 2 WARNING. Indexed: yes.

Target	Result	Latest binary
Doc / Vignettes	OK	Mar 08 2025
R-4.5-linux-x86_64	WARNING	Mar 08 2025
R-4.4-linux-x86_64	WARNING	Mar 08 2025

Exports: build.corpus, calc.loss, clean.text, cluster.phrases, convert.tm.to.character, find.CV.C, find.threshold.C, grab.fragments, is.fragment.sample, is.textreg.corpus, is.textreg.result, list.table.chart, make.appearance.matrix, make.count.table, make.CV.chart, make.list.table, make.path.matrix, make.phrase.correlation.chart, make.phrase.matrix, make.similarity.matrix, path.matrix.chart, phrase.count, phrase.matrix, phrases, reformat.textreg.model, sample.fragments, save.corpus.to.files, stem.corpus, textreg

Dependencies: BH, cli, NLP, Rcpp, rlang, slam, tm, xml2

Using the textreg package

Miratrix

Rendered from bathtub_vignette.Rmd using knitr::rmarkdown on Mar 08 2025.

Last update: 2018-10-04
Started: 2018-10-04

Citation

To cite package ‘textreg’ in publications use:

Miratrix L (2018). textreg: n-Gram Text Regression, aka Concise Comparative Summarization. R package version 0.1.5, https://CRAN.R-project.org/package=textreg.

ATTENTION: This citation information has been auto-generated from the package DESCRIPTION file and may need manual editing, see ‘help("citation")’.

Corresponding BibTeX entry:

  @Manual{,
    title = {textreg: n-Gram Text Regression, aka Concise Comparative
      Summarization},
    author = {Luke Miratrix},
    year = {2018},
    note = {R package version 0.1.5},
    url = {https://CRAN.R-project.org/package=textreg},
  }

Readme and manuals

textreg

textreg CRAN package

Help Manual

Help page	Topics
Sparse regression package for text that allows for multiple word phrases.	textreg-package
Sample of cleaned OSHA accident summaries.	bathtub
Build a corpus that can be used in the textreg call.	build.corpus
Calculate total loss of model (Squared hinge loss).	calc.loss
Clean text and get it ready for textreg.	clean.text
Cluster phrases based on similarity of appearance.	cluster.phrases
Convert tm corpus to vector of strings.	convert.tm.to.character
Driver function for the C++ function.	cpp_build.corpus
Driver function for the C++ function.	cpp_textreg
Sample of raw-text OSHA accident summaries.	dirtyBathtub
K-fold cross-validation to determine optimal tuning parameter.	find.CV.C
Conduct permutation test on labeling to get null distribution of regularization parameter.	find.threshold.C
Grab all fragments in a corpus with given phrase.	grab.fragments
Is object a fragment.sample object?	fragment.sample is.fragment.sample
Is object a textreg.corpus object?	is.textreg.corpus textreg.corpus
Is object a textreg.result object?	is.textreg.result textreg.result
Graphic showing multiple word lists side-by-side.	list.table.chart
Convert phrases to appropriate search string.	make_search_phrases
Make phrase appearance matrix from textreg result.	make.appearance.matrix
Count number of times documents have a given phrase.	make.count.table
Plot K-fold cross-validation curves.	make.CV.chart
Collate multiple regression runs.	make.list.table
Generate matrix describing gradient descent path of textreg.	make.path.matrix
Generate visualization of phrase overlap.	make.phrase.correlation.chart
Make a table of where phrases appear in a corpus.	make.phrase.matrix
Calculate similarity matrix for set of phrases.	make.similarity.matrix
Plot optimization path of textreg.	path.matrix.chart
Count phrase appearance.	phrase.count
Make matrix of where phrases appear in corpus.	phrase.matrix
Get the phrases from the textreg.result object.	phrases
Plot the sequence of features as they are introduced with the textreg gradient descent program.	plot.textreg.result
Predict labeling with the selected phrases.	predict.textreg.result
Pretty print results of phrase sampling object.	print.fragment.sample
Pretty print textreg corpus object.	print.textreg.corpus
Pretty print results of textreg regression.	print.textreg.result
Clean up output from textreg.	reformat.textreg.model
Sample fragments of text to contextualize a phrase.	sample.fragments
Save corpus to text (and RData) file.	save.corpus.to.files
Stem corpus with annotation.	stem.corpus
Some small, fake test corpora.	testCorpora
Sparse regression of labeling vector onto all phrases in a corpus.	textreg
Call gregexpr on the content of a tm Corpus.	tm_gregexpr
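
The calc.loss entry above names squared hinge loss. As a rough illustration of that loss in its generic textbook form, here is a base-R sketch. It assumes labels coded as -1/+1 and a vector of model scores; the function name `squared_hinge_loss` and the example values are hypothetical, and the exact quantity calc.loss returns (for instance, whether it adds a regularization penalty) is not stated on this page.

```r
# Generic squared hinge loss: sum over documents of max(0, 1 - y * score)^2.
# Assumption: y is coded -1/+1 and score is the model's real-valued prediction.
squared_hinge_loss <- function(y, score) {
  stopifnot(length(y) == length(score), all(y %in% c(-1, 1)))
  margin <- 1 - y * score      # positive when a document is inside the margin
  sum(pmax(0, margin)^2)       # correctly classified, confident documents add 0
}

y <- c(1, -1, 1)               # made-up labels
score <- c(2, 0.5, 0)          # made-up predictions
loss <- squared_hinge_loss(y, score)
```

In this toy case the first document is scored well beyond the margin and contributes nothing, while the other two contribute 2.25 and 1, for a total of 3.25.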