Package: sparkwarc 0.1.6

Maintainer: Edgar Ruiz

sparkwarc: Load WARC Files into Apache Spark

Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project <http://commoncrawl.org/>.

Authors: Javier Luraschi [aut], Yitao Li [aut], Edgar Ruiz [aut, cre]

Downloads:
  • Source: sparkwarc_0.1.6.tar.gz
  • Linux binaries: sparkwarc_0.1.6.tar.gz (r-4.5-noble), sparkwarc_0.1.6.tar.gz (r-4.4-noble)
  • WebAssembly binaries: sparkwarc_0.1.6.tgz (r-4.4-emscripten), sparkwarc_0.1.6.tgz (r-4.3-emscripten)
  • Manual: sparkwarc.pdf | sparkwarc.html
  • API: sparkwarc/json

# Install 'sparkwarc' in R:
install.packages('sparkwarc', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))


Bug tracker: https://github.com/r-spark/sparkwarc/issues

Uses libs:
  • zlib: compression library
  • c++: GNU Standard C++ Library v3


Score: 1.78 | 12 scripts | 137 downloads | 6 exports | 40 dependencies

Last updated 3 years ago from db8713e982. Checks: OK: 1, NOTE: 1. Indexed: no.

Target             | Result | Date
Doc / Vignettes    | OK     | Dec 05 2024
R-4.5-linux-x86_64 | NOTE   | Dec 05 2024

Exports: cc_warc, rcpp_read_warc_sample, spark_rcpp_read_warc, spark_read_warc, spark_read_warc_sample, spark_warc_sample_path

Dependencies: askpass, blob, cli, codetools, config, cpp11, curl, DBI, dbplyr, dplyr, fansi, generics, globals, glue, httr, jsonlite, lifecycle, magrittr, mime, openssl, pillar, pkgconfig, purrr, R6, Rcpp, rlang, rstudioapi, sparklyr, stringi, stringr, sys, tibble, tidyr, tidyselect, utf8, uuid, vctrs, withr, xml2, yaml