Package: chinese.misc 0.2.3

Jiang Wu

chinese.misc: Miscellaneous Tools for Chinese Text Mining and More

This package aims to make Chinese text mining easier, faster, and more robust to errors. A document-term matrix can be generated with a single line of code; encoding detection, word segmentation, and stop-word removal are done automatically. Some convenience tools are also supplied.

Authors: Jiang Wu [aut, cre]

chinese.misc_0.2.3.tar.gz
chinese.misc_0.2.3.tar.gz (r-4.5-noble) | chinese.misc_0.2.3.tar.gz (r-4.4-noble)
chinese.misc_0.2.3.tgz (r-4.4-emscripten) | chinese.misc_0.2.3.tgz (r-4.3-emscripten)
chinese.misc.pdf | chinese.misc.html
chinese.misc/json (API)

# Install 'chinese.misc' in R:
install.packages('chinese.misc', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/githubwwwjjj/chinese.misc/issues

33 exports | 0.61 score | 18 dependencies | 2 dependents | 31 scripts | 426 downloads

Last updated 4 years ago, from: 369fd6b193. Checks: OK: 2. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Sep 02 2024
R-4.5-linux | OK | Sep 02 2024

Exports: as.character2, as.numeric2, corp_or_dtm, create_ttm, csv2txt, DEFAULT_control1, DEFAULT_control2, DEFAULT_cutter, dictionary_dtm, dir_or_file, get_tag_word, get_tmp_chi_locale, is_character_vector, is_positive_integer, m2doc, m3m, make_stoplist, match_pattern, output_dtm, scancn, seg_file, slim_text, sort_tf, sparse_left, tf2doc, topic_trend, txt2csv, V, VC, VCR, VR, VRC, word_cor

Dependencies: BH, cli, glue, jiebaR, jiebaRD, lattice, lifecycle, magrittr, Matrix, NLP, purrr, Rcpp, rlang, slam, stringi, tm, vctrs, xml2

Readme and manuals

Help Manual

Help page | Topics
Miscellaneous Tools for Chinese Text Mining and More | chinese.misc-package, chinese.misc
An Enhanced Version of as.character | as.character2
An Enhanced Version of as.numeric | as.numeric2
Create Corpus or Document Term Matrix with 1 Line | corp_or_dtm
Create Term-Term Matrix (Term-Cooccurrence Matrix) | create_ttm
Write Texts in CSV into Many TXT/RTF Files | csv2txt
A Default Value for corp_or_dtm 1 | DEFAULT_control1
A Default Value for corp_or_dtm 2 | DEFAULT_control2
A Default Cutter | DEFAULT_cutter
Making DTM/TDM for Groups of Words | dictionary_dtm
Collect Full Filenames from a Mix of Directories and Files | dir_or_file
Extract Words of Some Certain Tags through Pos-Tagging | get_tag_word
Check The Locale Functions are to Assume | get_tmp_chi_locale
A Convenient Version of is.character | is_character_vector
A Convenient Version of is.integer | is_positive_integer
Rewrite Terms and Frequencies into Many Files | m2doc
Convert Objects among matrix, dgCMatrix, simple_triplet_matrix, DocumentTermMatrix, TermDocumentMatrix | m3m
Input a Filename and Return a Vector of Stop Words | make_stoplist
Extract Strings by Regular Expression Quickly | match_pattern
Convert or Write DTM/TDM Object Quickly | output_dtm
Read a Text File by Auto-Detecting Encoding | scancn
Convenient Tool to Segment Chinese Texts | seg_file
Remove Words through Speech Tagging | slim_text
Find High Frequency Terms | sort_tf
Check How many Words are Left under Certain Sparse Values | sparse_left
Transform Terms and Frequencies into a Text | tf2doc
Simple Rise or Fall Trend of Several Years | topic_trend
Write Many Separated Files into a CSV | txt2csv
Copy and Paste from Excel-Like Files | V
Copy and Paste from Excel-Like Files | VC
Copy and Paste from Excel-Like Files | VCR
Copy and Paste from Excel-Like Files | VR
Copy and Paste from Excel-Like Files | VRC
Word Correlation in DTM/TDM | word_cor