Package: sparkwarc 0.1.6
sparkwarc: Load WARC Files into Apache Spark
Load WARC (Web ARChive) files into Apache Spark using 'sparklyr'. This makes it possible to read files from the Common Crawl project <http://commoncrawl.org/>.
Authors:
sparkwarc_0.1.6.tar.gz
sparkwarc_0.1.6.tar.gz (r-4.5-noble), sparkwarc_0.1.6.tar.gz (r-4.4-noble)
sparkwarc_0.1.6.tgz (r-4.4-emscripten), sparkwarc_0.1.6.tgz (r-4.3-emscripten)
sparkwarc.pdf | sparkwarc.html
sparkwarc/json (API)
# Install 'sparkwarc' in R:
install.packages('sparkwarc', repos = c('https://cran.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/r-spark/sparkwarc/issues
Last updated 3 years ago from:db8713e982. Checks: OK: 1, NOTE: 1. Indexed: no.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Dec 05 2024 |
R-4.5-linux-x86_64 | NOTE | Dec 05 2024 |
Exports: cc_warc, rcpp_read_warc_sample, spark_rcpp_read_warc, spark_read_warc, spark_read_warc_sample, spark_warc_sample_path
Dependencies: askpass, blob, cli, codetools, config, cpp11, curl, DBI, dbplyr, dplyr, fansi, generics, globals, glue, httr, jsonlite, lifecycle, magrittr, mime, openssl, pillar, pkgconfig, purrr, R6, Rcpp, rlang, rstudioapi, sparklyr, stringi, stringr, sys, tibble, tidyr, tidyselect, utf8, uuid, vctrs, withr, xml2, yaml
Readme and manuals
Help Manual
Help page | Topics |
---|---|
Provides WARC paths for commoncrawl.org | cc_warc |
Loads the sample warc file in Rcpp | rcpp_read_warc_sample |
Reads a WARC File using Rcpp | spark_rcpp_read_warc |
Reads a WARC File into Apache Spark | spark_read_warc |
Loads the sample warc file in Spark | spark_read_warc_sample |
Retrieves sample warc path | spark_warc_sample_path |
sparkwarc | sparkwarc |
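
As a rough illustration of how the exported functions listed above fit together, the following sketch reads the sample WARC file bundled with the package into a local Spark session and counts the rows loaded. It assumes that 'sparklyr', 'sparkwarc', 'DBI', and a local Spark installation are available. The helper name `count_sample_warc_records` and the table name `sample_warc` are hypothetical and chosen only for this example. Treat it as a minimal sketch rather than the package's recommended workflow.

```r
# Hypothetical helper: load the bundled sample WARC file into Spark and
# return the number of rows in the resulting Spark table.
count_sample_warc_records <- function(master = "local") {
  # Open a Spark connection (assumes Spark is installed locally).
  sc <- sparklyr::spark_connect(master = master)
  # Always close the connection when the function exits.
  on.exit(sparklyr::spark_disconnect(sc), add = TRUE)

  # Path of the sample WARC file shipped with sparkwarc.
  path <- sparkwarc::spark_warc_sample_path()

  # Register the WARC contents as a Spark table.
  # "sample_warc" is an arbitrary table name used for this example.
  sparkwarc::spark_read_warc(sc, "sample_warc", path)

  # Count the rows with plain Spark SQL through DBI.
  DBI::dbGetQuery(sc, "SELECT COUNT(*) AS n FROM sample_warc")$n
}
```

Calling `count_sample_warc_records()` runs the whole sequence. Nothing is executed, and no package is loaded, until the function is actually called.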