First major release. cppally's public API is now considered stable.
While there may be structural changes to the r_df and r_raw classes in the future,
cppally's vector and scalar classes are considered stable.
r_sexp.length has been deprecated in favour of the free function length.
length(r_df) now returns the number of rows instead of the number of columns, marking a shift in how cppally treats data frames: they are now seen as row-wise vectors.
Setting attributes on a plain SEXP (e.g. via cppally::attr::set_attr) is no longer supported. Use cppally types such as r_vector, r_factors, r_df and in some cases r_sexp for attribute manipulation.
Various out-of-place or trivial-to-implement r_vec member functions have been removed.
visit_vector, visit_sexp and view_sexp have been deprecated in favour of the more flexible constrained r_sexp visitors: r_sexp_visit, r_sexp_view and r_sexp_mutate. These allow concepts and custom constraints to be applied directly on the lambda's template parameter, e.g. r_sexp_visit(x, [&]<RVector T>(T vec){}). Here x is dispatched as its concrete vector type, and the call aborts at runtime if the underlying type isn't an RVector. r_sexp_view is the non-owning sibling: the wrapper handed to the lambda is a view (no extra protect), so it must not outlive x. r_sexp_mutate is for in-place mutation: it moves x into the typed wrapper (making it the sole owner), calls the lambda, then writes the result back.
r_factors elements are now treated as r_str in member functions like get and set.
r_sexp_visit now visits r_null as r_vec<r_sexp>(r_null), essentially treating NULL as an empty list but without changing the underlying data.
For example, in the pseudo-code below, when x is an r_sexp holding r_null, r_sexp_visit dispatches it as r_vec<r_sexp>(r_null), preserving its data as R's NULL but assigning its type as r_vec<r_sexp> (list).
r_sexp_visit(x, [&]<RVector T>(const T& vec) -> bool {
  return vec.is_null();
});
This preservation behaviour is not new; in fact, all r_vec<T> vectors preserve r_null by design, allowing for efficient and easier attribute manipulation with vectors that may or may not be r_null. What is new is that r_null was previously not a visitable r_sexp object and now it is.
r_df is now fully integrated into cppally.
New variadic function make_df to create in-line data frames.
Various r_df members have been added to allow easier data frame manipulation.
std::vector coercion. The following std::vector coercion directions are supported:
std::vector -> std::vector
std::vector -> cppally::r_vec
cppally::r_vec -> std::vector
Any coercion between std::vector and cppally::r_vec is possible so long as the element coercions are supported by cppally::as.
New function seq which behaves similarly to base::seq.
New function sequence which is similar to base::sequence but accepts only scalar inputs.
For named vectors, lookup by name in C++ is now much faster thanks to a lazily built hash map. It works in the following way: the first time a lookup is requested, a linear scan is done to find the named value. The second lookup triggers a hash map of name-value pairs to be built and cached with the vector; that lookup and all subsequent lookups use the cached hash map. The rationale for hashing on second lookup is covered in the 'Automatic Names Hashing' vignette.
A similar hashing approach is also used for r_factors, making conversions of strings to and from factor codes fast enough to be viable for analytical work.
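The "hash on second lookup" scheme described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not cppally's implementation: the class NamedDoubles and its members are hypothetical names, the values are plain doubles, and a duplicated name resolves to its first occurrence in both the scan and the hash map.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a named vector; not a cppally class.
class NamedDoubles {
public:
  NamedDoubles(std::vector<std::string> names, std::vector<double> values)
    : names_(std::move(names)), values_(std::move(values)) {
    assert(names_.size() == values_.size());
  }

  // Returns the value of the first element called `name`, if there is one.
  std::optional<double> get(const std::string& name) {
    if (!looked_up_before_) {
      // First lookup: a plain linear scan, so one-off lookups pay no hashing cost.
      looked_up_before_ = true;
      for (std::size_t i = 0; i < names_.size(); ++i) {
        if (names_[i] == name) return values_[i];
      }
      return std::nullopt;
    }
    // Second lookup onwards: build the index once, then reuse the cached copy.
    if (!index_) build_index();
    const auto it = index_->find(name);
    if (it == index_->end()) return std::nullopt;
    return values_[it->second];
  }

  bool is_hashed() const { return index_.has_value(); }

private:
  void build_index() {
    index_.emplace();
    index_->reserve(names_.size());
    for (std::size_t i = 0; i < names_.size(); ++i) {
      index_->emplace(names_[i], i);  // emplace keeps the first position of a repeated name
    }
  }

  std::vector<std::string> names_;
  std::vector<double> values_;
  std::optional<std::unordered_map<std::string, std::size_t>> index_;
  bool looked_up_before_ = false;
};
```

The design choice this illustrates is that code performing a single lookup never pays to hash every name, while code that looks up repeatedly pays the build cost once.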
cppally now supports copy-on-modify as an opt-in feature. This feature prevents accidental overwriting of data shared between objects, just as R does. To opt in, run cppally::use_copy_on_modify or set copy_on_modify to TRUE in cpp_source.
The major downside of this feature is significantly slower element setting, as every set must verify that the object is not referenced by another object. This check is single-threaded and thus nearly all parallel cppally code is disabled as a safety precaution. If using copy-on-modify, it is recommended to avoid writing cppally-registered R functions that rely on in-place modification.
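To make the cost of the per-set check concrete, here is a minimal std-only sketch of copy-on-modify. It is an analogy rather than cppally's mechanism: the class CowDoubles is hypothetical, and std::shared_ptr::use_count stands in for whatever reference check cppally performs on the underlying R object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical handle type for illustration; not a cppally class.
class CowDoubles {
public:
  explicit CowDoubles(std::vector<double> values)
    : data_(std::make_shared<std::vector<double>>(std::move(values))) {}

  double get(std::size_t i) const {
    assert(i < data_->size());
    return (*data_)[i];
  }

  void set(std::size_t i, double value) {
    assert(i < data_->size());
    // This check runs on every set, which is where the slowdown comes from.
    // use_count() is only approximate if other threads copy the handle
    // concurrently, so this sketch is valid for single-threaded use only.
    if (data_.use_count() > 1) {
      data_ = std::make_shared<std::vector<double>>(*data_);  // detach: copy before writing
    }
    (*data_)[i] = value;
  }

  bool shares_storage_with(const CowDoubles& other) const {
    return data_ == other.data_;
  }

private:
  std::shared_ptr<std::vector<double>> data_;
};
```

Copying a handle is cheap and shares storage; the first write through either handle detaches it, so the other handle never observes the change.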
Inspired by purrr::pmap and base::mapply, cppally::pmap is a C++ variadic function that supports applying custom C++ lambda functions across corresponding elements of multiple vectors.
With pmap it is trivial to calculate parallel (element-wise) statistics such as max and min. Example: a C++ version of base::pmax applied to two vectors.
template <RVector T, RVector U>
requires requires(typename T::data_type a, typename U::data_type b) { max(a, b); }
[[cppally::register]]
auto cpp_pmax2(T x, U y){
  return pmap([](auto a, auto b){ return max(a, b); }, x, y);
}
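For readers without cppally at hand, the idea behind a variadic pmap can be sketched with std::vector only. This is not cppally's implementation: pmap_sketch and pmax2 are hypothetical names, and the sketch assumes all inputs have the same length (it does no recycling and no NA handling).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical std-only analogue of a variadic pmap.
// Calls f(first[i], rest[i]...) for every index i and collects the results.
template <typename F, typename T, typename... Ts>
auto pmap_sketch(F f, const std::vector<T>& first, const std::vector<Ts>&... rest) {
  using R = std::invoke_result_t<F&, const T&, const Ts&...>;
  const std::size_t n = first.size();
  // Assumption: every input has the same length as the first one.
  assert(((rest.size() == n) && ...));
  std::vector<R> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back(f(first[i], rest[i]...));
  }
  return out;
}

// Element-wise maximum of two vectors, in the spirit of base::pmax.
inline std::vector<double> pmax2(const std::vector<double>& x,
                                 const std::vector<double>& y) {
  return pmap_sketch([](double a, double b) { return std::max(a, b); }, x, y);
}
```

Because the lambda receives one element from each input per call, the same helper works for any number of vectors without writing a separate loop for each arity.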
New member function reduce, a left-fold reduction functional that successively applies a binary function along the elements of a vector (from left to right).
Example: maximum value across a vector of doubles.
[[cppally::register]]
r_dbl cpp_max(r_vec<r_dbl> x){
  return x.reduce([](auto acc, auto curr){ return max(acc, curr); });
}
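The left-to-right order of a fold matters for non-commutative functions. The following std-only sketch shows that order using std::accumulate. It is not cppally's reduce: reduce_left is a hypothetical name, and the sketch assumes a non-empty input whose first element seeds the accumulator.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical left fold: computes f(f(f(x[0], x[1]), x[2]), ...).
template <typename T, typename F>
T reduce_left(const std::vector<T>& x, F f) {
  // Assumption: the input is non-empty, so x.front() can seed the accumulator.
  assert(!x.empty());
  return std::accumulate(std::next(x.begin()), x.end(), x.front(), f);
}
```

With subtraction as the binary function, the input {10, 3, 2} folds as (10 - 3) - 2 and not as 10 - (3 - 2), which is what distinguishes a left fold from a right fold.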
New alias of r_vec, r_vector.
Named-vector subsetting is now supported.
New C++ functions combine and flatten. combine is a variadic function that allows for combining multiple vectors into one, similar to base::c but always casts vectors to the common type among them. flatten allows one to flatten a list of vectors into one vector of a specified type, similar to unlist(recursive = FALSE).
Many functions that were originally r_vec-only members are now free functions that work on r_sexp and RComposite types as well, allowing for easier manipulation of lists.
All C++ reference qualifiers (T&, T&&, const T&) are now supported for registered functions, including templated ones.
New concept RVectorisable which encompasses types that are OpenMP-friendly.
New infix operator IS_IN, identical to R's %in%.
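Since IS_IN is described as identical to R's %in%, its semantics can be illustrated with a std-only sketch: for each element of the left-hand side, report whether it occurs anywhere in the right-hand side. The function name is_in and the hash-set approach are assumptions made for illustration; how cppally implements the infix form, and how it treats NA, is not shown here.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper mirroring the result shape of `x %in% table`.
template <typename T>
std::vector<bool> is_in(const std::vector<T>& x, const std::vector<T>& table) {
  // Hash the table once so each membership test is O(1) on average.
  const std::unordered_set<T> lookup(table.begin(), table.end());
  std::vector<bool> out;
  out.reserve(x.size());
  for (const T& value : x) {
    out.push_back(lookup.count(value) > 0);
  }
  // The result always has exactly one entry per element of x.
  assert(out.size() == x.size());
  return out;
}
```

The result depends only on membership, so duplicates in either input and the ordering of table do not change it.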
New C++ function coalesce().
r_psxct.datetime_str() always appends "UTC" at the end to avoid time-zone ambiguity.
When registering C++ functions, cppally.hpp is now included in the generated C++ code. Not including it caused issues when trying to compile functions that constructed factors.
Zero-length r_vec vectors can now be constructed unambiguously via r_vec<T>(0).
Math operations on mixed types that include r_dbl now give correct results when NA values are involved.