Title: | Quick Serialization of R Objects |
---|---|
Description: | Provides functions for quickly writing and reading any R object to and from disk. |
Authors: | Travers Ching [aut, cre, cph], Yann Collet [ctb, cph] (Yann Collet is the author of the bundled zstd, lz4 and xxHash code), Facebook, Inc. [cph] (Facebook is the copyright holder of the bundled zstd code), Reichardt Tino [ctb, cph] (Contributor/copyright holder of zstd bundled code), Skibinski Przemyslaw [ctb, cph] (Contributor/copyright holder of zstd bundled code), Mori Yuta [ctb, cph] (Contributor/copyright holder of zstd bundled code), Romain Francois [ctb, cph] (Derived example/tutorials for ALTREP structures), Francesc Alted [ctb, cph] (Shuffling routines derived from Blosc library), Bryce Chamberlain [ctb] (qsavem and qload functions), Salim Brüggemann [ctb] (Contributing to documentation (ORCID:0000-0002-5329-5987)) |
Maintainer: | Travers Ching <[email protected]> |
License: | GPL-3 |
Version: | 0.27.2 |
Built: | 2024-12-31 07:26:36 UTC |
Source: | CRAN |
Decodes a Z85 encoded string back to binary
base85_decode(encoded_string)
encoded_string |
A string. |
The original raw vector.
Encodes binary data (a raw vector) as ASCII text using Z85 encoding format.
base85_encode(rawdata)
rawdata |
A raw vector. |
Z85 is a binary to ASCII encoding format created by Pieter Hintjens in 2010 and is part of the ZeroMQ RFC. The encoding has a dictionary using 85 out of 94 printable ASCII characters. There are other base 85 encoding schemes, including Ascii85, which was popularized by Adobe. Z85 is distinguished by its choice of dictionary, which makes encoded strings easy to include in source code for many programming languages. The dictionary excludes all quote marks and the backslash, so encoded strings require no special treatment in R and most other languages. Note: although the official specification restricts input length to multiples of four bytes, the implementation here works with any input length. The overhead (extra bytes used relative to binary) is 25%, because every 4 bytes of input become 5 characters. In comparison, base 64 encoding has an overhead of 33.33%.
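The overhead figures quoted above follow from the group sizes of each encoding. The base R sketch below only does that arithmetic; it does not call base85_encode(), and the helper name encoding_overhead is made up for illustration.

```r
# Overhead = (encoded characters / input bytes) - 1
# `encoding_overhead` is a hypothetical helper, not part of qs.
encoding_overhead <- function(bytes_in, chars_out) chars_out / bytes_in - 1

z85_overhead    <- encoding_overhead(4, 5)  # Z85: 4 bytes -> 5 characters
base64_overhead <- encoding_overhead(3, 4)  # base 64: 3 bytes -> 4 characters

# Encoded length for an n-byte input (n chosen as a multiple of both group sizes)
n <- 1200
z85_chars    <- n / 4 * 5
base64_chars <- n / 3 * 4
```

For a 1200-byte input this gives 1500 characters with Z85 and 1600 with base 64.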
A string representation of the raw vector.
https://rfc.zeromq.org/spec/32/
Decodes a basE91 encoded string back to binary
base91_decode(encoded_string)
encoded_string |
A string. |
The original raw vector.
Encodes binary data (a raw vector) as ASCII text using basE91 encoding format.
base91_encode(rawdata, quote_character = "\"")
rawdata |
A raw vector. |
quote_character |
The character to use in the encoding, replacing the double quote character. Must be either a single quote ( |
basE91 (capital E for stylization) is a binary to ASCII encoding format created by Joachim Henke in 2005.
The overhead (extra bytes used relative to binary) is 22.97% on average. In comparison, base 64 encoding has an overhead of 33.33%.
The original encoding uses a dictionary of 91 out of 94 printable ASCII characters, excluding - (dash), \ (backslash) and ' (single quote).
The original encoding does include double quote characters, which are less than ideal for strings in R. Therefore, you can use the quote_character parameter to substitute dash or single quote.
A string representation of the raw vector.
https://base91.sourceforge.net/
Shuffles a raw vector using BLOSC shuffle routines.
blosc_shuffle_raw(x, bytesofsize)
x |
A raw vector. |
bytesofsize |
Either |
The shuffled vector
x <- serialize(1L:1000L, NULL)
xshuf <- blosc_shuffle_raw(x, 4)
xunshuf <- blosc_unshuffle_raw(xshuf, 4)
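As a rough illustration of what byte shuffling does, the base R sketch below groups the first byte of every element together, then the second byte, and so on. It is a conceptual stand-in only: the functions shuffle_bytes and unshuffle_bytes are hypothetical, assume the input length is a multiple of the element size, and are not claimed to reproduce the output of blosc_shuffle_raw(). The gzip comparison uses base R's memCompress() rather than the compressors bundled with qs.

```r
# Conceptual byte shuffle: group the i-th byte of every element together.
# `x` is a raw vector whose length is a multiple of `bytesofsize`.
shuffle_bytes <- function(x, bytesofsize) {
  stopifnot(is.raw(x), length(x) %% bytesofsize == 0)
  # One element per column, then read the matrix row by row.
  as.vector(t(matrix(x, nrow = bytesofsize)))
}

unshuffle_bytes <- function(x, bytesofsize) {
  stopifnot(is.raw(x), length(x) %% bytesofsize == 0)
  # One byte position per column, then read element by element.
  as.vector(t(matrix(x, ncol = bytesofsize)))
}

# 1000 sequential 4-byte integers, written in little-endian order
x <- writeBin(1L:1000L, raw(), size = 4, endian = "little")
xshuf <- shuffle_bytes(x, 4)
xunshuf <- unshuffle_bytes(xshuf, 4)

# Compare how well a general-purpose compressor does on each layout
size_plain    <- length(memCompress(x, type = "gzip"))
size_shuffled <- length(memCompress(xshuf, type = "gzip"))
```

After shuffling, the high-order bytes of a sequential integer vector form long runs of identical values, which is why ordered data tends to compress better in the shuffled layout.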
Un-shuffles a raw vector using BLOSC un-shuffle routines.
blosc_unshuffle_raw(x, bytesofsize)
x |
A raw vector. |
bytesofsize |
Either |
The unshuffled vector.
x <- serialize(1L:1000L, NULL)
xshuf <- blosc_shuffle_raw(x, 4)
xunshuf <- blosc_unshuffle_raw(xshuf, 4)
A helper function for decoding a string produced by encode_source() back into the original object (the reverse of base91_encode() and qserialize()).
decode_source(string)
string |
A string to decode. |
The original (decoded) object.
See encode_source() for more details.
A helper function for encoding and compressing a file or string to ASCII using base91_encode() and qserialize() with the highest compression level.
encode_source(x = NULL, file = NULL, width = 120)
x |
The object to encode (if |
file |
The file to encode (if |
width |
The output will be broken up into individual strings, with |
The encode_source() and decode_source() functions are useful for storing small amounts of data or text inline in a .R or .Rmd file.
A character vector in base91 representing the compressed original file or object.
set.seed(1); data <- sample(500)
result <- encode_source(data)
# Note: the result string is not guaranteed to be consistent between qs or zstd versions
# but will always properly decode regardless
print(result)
decoded <- decode_source(result)
identical(decoded, data)
Gets the formal name of the class and package of an ALTREP object
get_altrep_class_info(obj)
obj |
The object to inspect (typically an ALTREP object). |
The class information (class name and package name) of an ALTREP object, a character vector of length two. If the object is not an ALTREP object, returns NULL.
get_altrep_class_info(1:5)
Tests system endianness. Intel and AMD based systems are little endian, and so this function will likely return FALSE.
The qs package is not capable of transferring data between systems of different endianness. This should not matter for the large majority of use cases.
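To make the notion of byte order concrete, the base R sketch below writes the same integer in both orders. It does not call is_big_endian(); it assumes that base R's .Platform$endian reports the same byte order that function tests for.

```r
# Base R reports the platform byte order directly
platform_is_big_endian <- .Platform$endian == "big"

# The same 4-byte integer written with each byte order
little <- writeBin(1L, raw(), size = 4, endian = "little")  # least significant byte first
big    <- writeBin(1L, raw(), size = 4, endian = "big")     # most significant byte first

# With no `endian` argument, writeBin() uses the platform's native order
native <- writeBin(1L, raw(), size = 4)
```

Data written in one byte order and read in the other would be misinterpreted, which is why files cannot be moved between systems of different endianness.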
is_big_endian()
TRUE if big endian, FALSE if little endian.
is_big_endian() # returns FALSE on Intel/AMD systems
Exports the compress bound function from the lz4 library. Returns the maximum compressed size of an object of length size.
lz4_compress_bound(size)
size |
An integer size. |
Maximum compressed size.
lz4_compress_bound(100000)
lz4_compress_bound(1e9)
Compresses to a raw vector using the lz4 algorithm. Exports the main lz4 compression function.
lz4_compress_raw(x, compress_level)
x |
A raw vector to compress. |
compress_level |
The compression level used. A number >= 1 (higher is less compressed). |
The compressed data as a raw vector.
x <- 1:1e6
xserialized <- serialize(x, connection=NULL)
xcompressed <- lz4_compress_raw(xserialized, compress_level = 1)
xrecovered <- unserialize(lz4_decompress_raw(xcompressed))
Decompresses an lz4 compressed raw vector.
lz4_decompress_raw(x)
x |
A raw vector. |
The decompressed data as a raw vector.
x <- 1:1e6
xserialized <- serialize(x, connection=NULL)
xcompressed <- lz4_compress_raw(xserialized, compress_level = 1)
xrecovered <- unserialize(lz4_decompress_raw(xcompressed))
Reads the attributes of an object serialized to disk.
qattributes(file, use_alt_rep=FALSE, strict=FALSE, nthreads=1)
file |
The file name/path. |
use_alt_rep |
Use ALTREP when reading in string data (default |
strict |
Whether to throw an error or just report a warning (default: |
nthreads |
Number of threads to use. Default |
Equivalent to attributes(qread(file)), but more efficient. Attributes are stored towards the end of the file. This function will read through the contents of the file (without de-serializing the object itself), and then de-serializes the attributes only.
Because it is necessary to read through the file, pulling out attributes could take a long time if the file is large. However, it should be much faster than de-serializing the entire object first.
The attributes of the serialized object.
file <- tempfile()
qsave(mtcars, file)
attr1 <- qattributes(file)
attr2 <- attributes(qread(file))
print(attr1)
# $names
# [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"
# $row.names
# [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" ...
# $class
# [1] "data.frame"
identical(attr1, attr2) # TRUE
Helper function for caching objects for long running tasks
qcache(
  expr,
  name,
  envir = parent.frame(),
  cache_dir = ".cache",
  clear = FALSE,
  prompt = TRUE,
  qsave_params = list(),
  qread_params = list()
)
expr |
The expression to evaluate. |
name |
The cached expression name (see details). |
envir |
The environment to evaluate |
cache_dir |
The directory to store cached files in. |
clear |
Set to |
prompt |
Whether to prompt before clearing. |
qsave_params |
Parameters passed on to |
qread_params |
Parameters passed on to |
This is a (very) simple helper function to cache results of long running calculations. There are other packages specializing in caching data that are more feature complete.
The evaluated expression is saved with qsave() in <cache_dir>/<name>.qs. If the file already exists instead, the expression is not evaluated and the cached result is read using qread() and returned.
To clear a cached result, you can manually delete the associated .qs file, or you can call qcache() with clear = TRUE. If prompt is also TRUE, a prompt will be given asking you to confirm deletion. If name is not specified, all cached results in cache_dir will be removed.
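The caching logic described above can be sketched in a few lines of base R. This is a simplified illustration, not the implementation of qcache(): the function simple_cache is hypothetical, it swaps qsave()/qread() for saveRDS()/readRDS() so that it runs without qs, it writes .rds rather than .qs files, and it omits the envir, clear and prompt handling.

```r
# Hypothetical stand-in for the caching idea; uses saveRDS()/readRDS()
# in place of qsave()/qread().
simple_cache <- function(expr, name, cache_dir = ".cache") {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    # Cached: `expr` is a lazy argument and is never evaluated on this path
    return(readRDS(path))
  }
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  result <- expr  # forces evaluation of the expression
  saveRDS(result, path)
  result
}

cache_dir <- file.path(tempdir(), "simple_cache_demo")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache

counter <- 0
first  <- simple_cache({counter <- counter + 1; 6}, "six", cache_dir)
second <- simple_cache({counter <- counter + 1; 6}, "six", cache_dir)
# `counter` was incremented only by the first call
```

Because R evaluates function arguments lazily, the expensive expression is skipped entirely when a cached file is found.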
cache_dir <- tempdir()
a <- 1
b <- 5

# not cached
result <- qcache({a + b},
                 name="aplusb",
                 cache_dir = cache_dir,
                 qsave_params = list(preset="fast"))

# cached
result <- qcache({a + b},
                 name="aplusb",
                 cache_dir = cache_dir,
                 qsave_params = list(preset="fast"))

# clear cached result
qcache(name="aplusb", clear=TRUE, prompt=FALSE, cache_dir = cache_dir)
Reads an object from a raw vector.
qdeserialize(x, use_alt_rep=FALSE, strict=FALSE)
x |
A raw vector. |
use_alt_rep |
Use ALTREP when reading in string data (default |
strict |
Whether to throw an error or just report a warning (default: |
See qserialize() for additional details and examples.
The de-serialized object.
Exports the uncompressed binary serialization to a list of raw vectors. Mainly for testing and exploratory purposes.
qdump(file)
file |
A file name/path. |
The uncompressed serialization.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qsave(x, myfile)
x2 <- qdump(myfile)
Reads an object in a file serialized to disk.
qread(file, use_alt_rep=FALSE, strict=FALSE, nthreads=1)
file |
The file name/path. |
use_alt_rep |
Use ALTREP when reading in string data (default |
strict |
Whether to throw an error or just report a warning (default: |
nthreads |
Number of threads to use. Default |
The de-serialized object.
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qsave(x, myfile)
x2 <- qread(myfile)
identical(x, x2) # returns TRUE

# qs supports multithreading
qsave(x, myfile, nthreads=2)
x2 <- qread(myfile, nthreads=2)
identical(x, x2) # returns TRUE

# Other examples
z <- 1:1e7
myfile <- tempfile()
qsave(z, myfile)
z2 <- qread(myfile)
identical(z, z2) # returns TRUE

w <- as.list(rnorm(1e6))
myfile <- tempfile()
qsave(w, myfile)
w2 <- qread(myfile)
identical(w, w2) # returns TRUE
Reads an object from a file descriptor.
qread_fd(fd, use_alt_rep=FALSE, strict=FALSE)
fd |
A file descriptor. |
use_alt_rep |
Use ALTREP when reading in string data (default |
strict |
Whether to throw an error or just report a warning (default: |
See qsave_fd() for additional details and examples.
The de-serialized object.
Reads an object from a Windows handle.
qread_handle(handle, use_alt_rep=FALSE, strict=FALSE)
handle |
A Windows handle external pointer. |
use_alt_rep |
Use ALTREP when reading in string data (default |
strict |
Whether to throw an error or just report a warning (default: |
See qsave_handle() for additional details and examples.
The de-serialized object.
Reads an object from an external pointer.
qread_ptr(pointer, length, use_alt_rep=FALSE, strict=FALSE)
pointer |
An external pointer to memory. |
length |
The length of the object in memory. |
use_alt_rep |
Use ALTREP when reading in string data (default |
strict |
Whether to throw an error or just report a warning (default: |
The de-serialized object.
A helper function that reads data from the internet to memory and deserializes the object with qdeserialize().
qread_url(url, buffer_size, ...)
url |
The URL where the object is stored |
buffer_size |
The buffer size used to read in data (default |
... |
Arguments passed to |
See qdeserialize() for additional details.
The de-serialized object.
## Not run:
x <- qread_url("http://example_url.com/my_file.qs")
## End(Not run)
Reads an object in a file serialized to disk using qsavem().
qreadm(file, env = parent.frame(), ...)
qload(file, env = parent.frame(), ...)
file |
The file name/path. |
env |
The environment where the data should be loaded. |
... |
Additional arguments passed to qread(). |
This function extends qread() to replicate the functionality of base::load() to load multiple saved objects into your workspace. qload and qreadm are aliases of the same function.
Nothing is explicitly returned, but the function will load the saved objects into the workspace.
x1 <- data.frame(int = sample(1e3, replace=TRUE),
                 num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                 stringsAsFactors = FALSE)
x2 <- data.frame(int = sample(1e3, replace=TRUE),
                 num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                 stringsAsFactors = FALSE)
myfile <- tempfile()
qsavem(x1, x2, file=myfile)
rm(x1, x2)
qload(myfile)
exists('x1') && exists('x2') # returns TRUE

# qs supports multithreading
qsavem(x1, x2, file=myfile, nthreads=2)
rm(x1, x2)
qload(myfile, nthreads=2)
exists('x1') && exists('x2') # returns TRUE
Saves (serializes) an object to disk.
qsave(x, file, preset = "high", algorithm = "zstd", compress_level = 4L, shuffle_control = 15L, check_hash=TRUE, nthreads = 1)
x |
The object to serialize. |
file |
The file name/path. |
preset |
One of |
algorithm |
Ignored unless preset = "custom". |
compress_level |
Ignored unless preset = "custom". For lz4, this number must be >= 1 (higher is less compressed). For zstd, a number between |
shuffle_control |
Ignored unless preset = "custom". |
check_hash |
Default |
nthreads |
Number of threads to use. Default |
This function serializes and compresses R objects using block compression with the option of byte shuffling.
The total number of bytes written to the file (returned invisibly).
There are many possible parameters. To simplify usage, there are four main presets that perform well over a large variety of data:
"fast" is a shortcut for algorithm = "lz4", compress_level = 100 and shuffle_control = 0.
"balanced" is a shortcut for algorithm = "lz4", compress_level = 1 and shuffle_control = 15.
"high" is a shortcut for algorithm = "zstd", compress_level = 4 and shuffle_control = 15.
"archive" is a shortcut for algorithm = "zstd_stream", compress_level = 14 and shuffle_control = 15. (zstd_stream is currently single-threaded only.)
To gain more control over compression level and byte shuffling, set preset = "custom", in which case the individual parameters algorithm, compress_level and shuffle_control take effect.
The parameter shuffle_control defines which numerical R object types are subject to byte shuffling. Generally speaking, the more ordered/sequential an object is (e.g., 1:1e7), the larger the potential benefit of byte shuffling. It is not uncommon to improve compression ratio or compression speed by several orders of magnitude. The more random an object is (e.g., rnorm(1e7)), the smaller the potential benefit; byte shuffling can even make compression worse.
Integer vectors almost always benefit from byte shuffling, whereas the results for numeric vectors are mixed. To control block shuffling, add +1 to the parameter for logical vectors, +2 for integer vectors, +4 for numeric vectors and/or +8 for complex vectors.
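The additive scheme for shuffle_control amounts to a set of bit flags. The base R sketch below composes and decodes such values; the names shuffle_flags and decode_shuffle_control are hypothetical helpers for illustration and are not part of qs.

```r
# One flag per vector type, as described above
shuffle_flags <- c(logical = 1L, integer = 2L, numeric = 4L, complex = 8L)

# Compose a value: shuffle integer and numeric vectors only
shuffle_control <- sum(shuffle_flags[c("integer", "numeric")])

# Decode a value: which vector types does it cover?
decode_shuffle_control <- function(value) {
  names(shuffle_flags)[bitwAnd(value, shuffle_flags) != 0]
}

decode_shuffle_control(15L)  # all four types (the value used by most presets)
decode_shuffle_control(0L)   # no types (the value used by the "fast" preset)
```

A value composed this way would be passed together with preset = "custom", for example qsave(x, file, preset = "custom", algorithm = "zstd", compress_level = 4, shuffle_control = shuffle_control).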
x <- data.frame(int = sample(1e3, replace=TRUE),
                num = rnorm(1e3),
                char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                stringsAsFactors = FALSE)
myfile <- tempfile()
qsave(x, myfile)
x2 <- qread(myfile)
identical(x, x2) # returns TRUE

# qs supports multithreading
qsave(x, myfile, nthreads=2)
x2 <- qread(myfile, nthreads=2)
identical(x, x2) # returns TRUE

# Other examples
z <- 1:1e7
myfile <- tempfile()
qsave(z, myfile)
z2 <- qread(myfile)
identical(z, z2) # returns TRUE

w <- as.list(rnorm(1e6))
myfile <- tempfile()
qsave(w, myfile)
w2 <- qread(myfile)
identical(w, w2) # returns TRUE
Saves an object to a file descriptor.
qsave_fd(x, fd, preset = "high", algorithm = "zstd", compress_level = 4L, shuffle_control = 15L, check_hash=TRUE)
x |
The object to serialize. |
fd |
A file descriptor. |
preset |
One of |
algorithm |
Ignored unless preset = "custom". |
compress_level |
Ignored unless preset = "custom". For lz4, this number must be >= 1 (higher is less compressed). For zstd, a number between |
shuffle_control |
Ignored unless preset = "custom". |
check_hash |
Default |
This function serializes and compresses R objects using block compression with the option of byte shuffling.
The total number of bytes written to the file (returned invisibly).
There are many possible parameters. To simplify usage, there are four main presets that perform well over a large variety of data:
"fast" is a shortcut for algorithm = "lz4", compress_level = 100 and shuffle_control = 0.
"balanced" is a shortcut for algorithm = "lz4", compress_level = 1 and shuffle_control = 15.
"high" is a shortcut for algorithm = "zstd", compress_level = 4 and shuffle_control = 15.
"archive" is a shortcut for algorithm = "zstd_stream", compress_level = 14 and shuffle_control = 15. (zstd_stream is currently single-threaded only.)
To gain more control over compression level and byte shuffling, set preset = "custom", in which case the individual parameters algorithm, compress_level and shuffle_control take effect.
The parameter shuffle_control defines which numerical R object types are subject to byte shuffling. Generally speaking, the more ordered/sequential an object is (e.g., 1:1e7), the larger the potential benefit of byte shuffling. It is not uncommon to improve compression ratio or compression speed by several orders of magnitude. The more random an object is (e.g., rnorm(1e7)), the smaller the potential benefit; byte shuffling can even make compression worse.
Integer vectors almost always benefit from byte shuffling, whereas the results for numeric vectors are mixed. To control block shuffling, add +1 to the parameter for logical vectors, +2 for integer vectors, +4 for numeric vectors and/or +8 for complex vectors.
Saves an object to a Windows handle.
qsave_handle(x, handle, preset = "high", algorithm = "zstd", compress_level = 4L, shuffle_control = 15L, check_hash=TRUE)
x |
The object to serialize. |
handle |
A Windows handle external pointer. |
preset |
One of |
algorithm |
Ignored unless preset = "custom". |
compress_level |
Ignored unless preset = "custom". For lz4, this number must be >= 1 (higher is less compressed). For zstd, a number between |
shuffle_control |
Ignored unless preset = "custom". |
check_hash |
Default |
This function serializes and compresses R objects using block compression with the option of byte shuffling.
The total number of bytes written to the file (returned invisibly).
There are many possible parameters. To simplify usage, there are four main presets that perform well over a large variety of data:
"fast" is a shortcut for algorithm = "lz4", compress_level = 100 and shuffle_control = 0.
"balanced" is a shortcut for algorithm = "lz4", compress_level = 1 and shuffle_control = 15.
"high" is a shortcut for algorithm = "zstd", compress_level = 4 and shuffle_control = 15.
"archive" is a shortcut for algorithm = "zstd_stream", compress_level = 14 and shuffle_control = 15. (zstd_stream is currently single-threaded only.)
To gain more control over compression level and byte shuffling, set preset = "custom", in which case the individual parameters algorithm, compress_level and shuffle_control take effect.
The parameter shuffle_control defines which numerical R object types are subject to byte shuffling. Generally speaking, the more ordered/sequential an object is (e.g., 1:1e7), the larger the potential benefit of byte shuffling. It is not uncommon to improve compression ratio or compression speed by several orders of magnitude. The more random an object is (e.g., rnorm(1e7)), the smaller the potential benefit; byte shuffling can even make compression worse.
Integer vectors almost always benefit from byte shuffling, whereas the results for numeric vectors are mixed. To control block shuffling, add +1 to the parameter for logical vectors, +2 for integer vectors, +4 for numeric vectors and/or +8 for complex vectors.
Saves (serializes) multiple objects to disk.
qsavem(...)
... |
Objects to serialize. Named arguments will be passed to |
This function extends qsave() to replicate the functionality of base::save() to save multiple objects. Read them back with qload().
x1 <- data.frame(int = sample(1e3, replace=TRUE),
                 num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                 stringsAsFactors = FALSE)
x2 <- data.frame(int = sample(1e3, replace=TRUE),
                 num = rnorm(1e3),
                 char = sample(starnames$`IAU Name`, 1e3, replace=TRUE),
                 stringsAsFactors = FALSE)
myfile <- tempfile()
qsavem(x1, x2, file=myfile)
rm(x1, x2)
qload(myfile)
exists('x1') && exists('x2') # returns TRUE

# qs supports multithreading
qsavem(x1, x2, file=myfile, nthreads=2)
rm(x1, x2)
qload(myfile, nthreads=2)
exists('x1') && exists('x2') # returns TRUE
Saves an object to a raw vector.
qserialize(x, preset = "high", algorithm = "zstd", compress_level = 4L, shuffle_control = 15L, check_hash=TRUE)
x |
The object to serialize. |
preset |
One of |
algorithm |
Ignored unless preset = "custom". |
compress_level |
Ignored unless preset = "custom". For lz4, this number must be >= 1 (higher is less compressed). For zstd, a number between |
shuffle_control |
Ignored unless preset = "custom". |
check_hash |
Default |
This function serializes and compresses R objects using block compression with the option of byte shuffling.
A raw vector.
There are many possible parameters. To simplify usage, there are four main presets that perform well over a large variety of data:
"fast" is a shortcut for algorithm = "lz4", compress_level = 100 and shuffle_control = 0.
"balanced" is a shortcut for algorithm = "lz4", compress_level = 1 and shuffle_control = 15.
"high" is a shortcut for algorithm = "zstd", compress_level = 4 and shuffle_control = 15.
"archive" is a shortcut for algorithm = "zstd_stream", compress_level = 14 and shuffle_control = 15. (zstd_stream is currently single-threaded only.)
To gain more control over compression level and byte shuffling, set preset = "custom", in which case the individual parameters algorithm, compress_level and shuffle_control take effect.
The parameter shuffle_control defines which numerical R object types are subject to byte shuffling. Generally speaking, the more ordered/sequential an object is (e.g., 1:1e7), the larger the potential benefit of byte shuffling. It is not uncommon to improve compression ratio or compression speed by several orders of magnitude. The more random an object is (e.g., rnorm(1e7)), the smaller the potential benefit; byte shuffling can even make compression worse.
Integer vectors almost always benefit from byte shuffling, whereas the results for numeric vectors are mixed. To control block shuffling, add +1 to the parameter for logical vectors, +2 for integer vectors, +4 for numeric vectors and/or +8 for complex vectors.
Register an ALTREP class to serialize using base R serialization.
register_altrep_class(classname, pkgname)
classname |
The ALTREP class name |
pkgname |
The package the ALTREP class comes from |
register_altrep_class("compact_intseq", "base")
Allow for serialization/deserialization of promises
set_trust_promises(value)
value |
a boolean |
The previous value of the global variable trust_promises
set_trust_promises(TRUE)
Data from the International Astronomical Union. An official list of the 336 internationally recognized named stars, updated as of June 1, 2018.
data(starnames)
A data.frame with official IAU star names and several properties, such as coordinates.
Naming Stars | International Astronomical Union.
E. Mamajek et al. (2018), WG Triennial Report (2015-2018) - Star Names, Reports on Astronomy, 22 Mar 2018.
data(starnames)
Unregister an ALTREP class so that it no longer uses base R serialization.
unregister_altrep_class(classname, pkgname)
classname |
The ALTREP class name |
pkgname |
The package the ALTREP class comes from |
unregister_altrep_class("compact_intseq", "base")
Exports the compress bound function from the zstd library. Returns the maximum compressed size of an object of length size.
zstd_compress_bound(size)
size |
An integer size. |
Maximum compressed size.
zstd_compress_bound(100000)
zstd_compress_bound(1e9)
Compresses to a raw vector using the zstd algorithm. Exports the main zstd compression function.
zstd_compress_raw(x, compress_level)
x |
A raw vector to compress. |
compress_level |
The compression level used (default |
The compressed data as a raw vector.
x <- 1:1e6
xserialized <- serialize(x, connection=NULL)
xcompressed <- zstd_compress_raw(xserialized, compress_level = 1)
xrecovered <- unserialize(zstd_decompress_raw(xcompressed))
Decompresses a zstd compressed raw vector.
zstd_decompress_raw(x)
x |
A raw vector. |
The decompressed data as a raw vector.
x <- 1:1e6
xserialized <- serialize(x, connection=NULL)
xcompressed <- zstd_compress_raw(xserialized, compress_level = 1)
xrecovered <- unserialize(zstd_decompress_raw(xcompressed))