Package: tok 0.1.2

Daniel Falbel

tok: Fast Text Tokenization

Interfaces with the 'Hugging Face' tokenizers library to provide implementations of today's most used tokenizers such as the 'Byte-Pair Encoding' algorithm <https://huggingface.co/docs/tokenizers/index>. It's extremely fast for both training new vocabularies and tokenizing texts.

Authors: Daniel Falbel [aut, cre], Posit [cph]

tok_0.1.2.tar.gz
tok_0.1.2.tar.gz (r-4.5-noble), tok_0.1.2.tar.gz (r-4.4-noble)
tok.pdf | tok.html
tok/json (API)
NEWS

# Install tok in R:
install.packages('tok', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))
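
Once the package is installed, a round trip through a tokenizer might look like the sketch below. It assumes network access, because `from_pretrained()` downloads the tokenizer definition from the Hugging Face Hub. The model identifier "gpt2" and the variable names are chosen here for illustration only; this is a minimal sketch, not a complete tour of the package.

```r
# Assumes the tok package is installed and network access is available:
# from_pretrained() fetches the tokenizer definition from the Hugging Face Hub.
library(tok)

# "gpt2" is an example model identifier chosen for illustration.
gpt2_tok <- tokenizer$from_pretrained("gpt2")

# Encode a string; the returned encoding object exposes the token ids as $ids.
enc <- gpt2_tok$encode("Hello world")
ids <- enc$ids
print(ids)

# Decode the ids back into text.
decoded <- gpt2_tok$decode(ids)
print(decoded)
```

Other pretrained tokenizers can presumably be loaded the same way by changing the identifier, provided the Hub repository ships a compatible tokenizer file.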

Bug tracker: https://github.com/mlverse/tok/issues

20 exports, 0.71 score, 2 dependencies, 1 dependent, 231 downloads

Last updated 8 days ago from: 58dd61d3aa

Exports: decoder_byte_level, encoding, model_bpe, model_unigram, model_wordpiece, normalizer_nfc, normalizer_nfkc, pre_tokenizer, pre_tokenizer_byte_level, pre_tokenizer_whitespace, processor_byte_level, tok_decoder, tok_model, tok_normalizer, tok_processor, tok_trainer, tokenizer, trainer_bpe, trainer_unigram, trainer_wordpiece

Dependencies: cli, R6