Title: | Miscellaneous Functions in C++ |
---|---|
Description: | Provides utility functions that are simple and frequently used, but may require higher performance than what can be obtained from base R. Incidentally provides support for 'reverse geocoding', such as matching a point with its nearest neighbour in another array. Used as a complement to package 'hutils' by sacrificing compilation or installation time for higher running speeds. The name is a portmanteau of the author and 'Rcpp'. |
Authors: | Hugh Parsonage [aut, cre], Simon Urbanek [ctb] (fastmatch components) |
Maintainer: | Hugh Parsonage <[email protected]> |
License: | GPL-2 |
Version: | 0.10.6 |
Built: | 2024-11-04 03:33:10 UTC |
Source: | CRAN |
Equivalent to abs(x - y) but aims to be faster by avoiding allocations.
abs_diff(x, y, nThread = getOption("hutilscpp.nThread", 1L), option = 1L)
max_abs_diff(x, y, nThread = getOption("hutilscpp.nThread", 1L))
x , y
|
Atomic, numeric, equilength vectors. |
nThread |
Number of threads to use. |
option |
An integer, provides backwards-compatible method to change results.
|
x <- sample(10)
y <- sample(10)
abs_diff(x, y)
max_abs_diff(x, y)
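The documented equivalence can be checked in base R without the package installed. The sketch below uses hypothetical helper names (abs_diff_ref, max_abs_diff_ref) that are not part of hutilscpp, and it assumes that max_abs_diff(x, y) corresponds to max(abs(x - y)), which the text above does not state explicitly.

```r
# Base R reference versions; the names are hypothetical, not part of hutilscpp.
abs_diff_ref <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  abs(x - y)
}

# Assumption: the maximum absolute difference is max(abs(x - y)).
max_abs_diff_ref <- function(x, y) {
  max(abs_diff_ref(x, y))
}

x <- c(1L, 5L, 3L)
y <- c(4L, 2L, 3L)
abs_diff_ref(x, y)
max_abs_diff_ref(x, y)
```

A comparison such as identical(abs_diff(x, y), abs_diff_ref(x, y)) could then be used to check the package function against this reference.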
A vector is empty if all(is.na(x)) with a special case for length(x) == 0.
allNA(
  x,
  expected = FALSE,
  len0 = FALSE,
  nThread = getOption("hutilscpp.nThread", 1L)
)
x |
A vector. Only atomic vectors are supported. |
expected |
|
len0 |
The result if |
nThread |
Number of threads to use (only applicable if |
allNA(c(NA, NA))
allNA(c(NA, NA, 1))
Are any values outside the interval specified?
anyOutside(x, a, b, nas_absent = NA, na_is_outside = NA)
x |
A numeric vector. |
a , b
|
Single numeric values designating the interval. |
nas_absent |
Are If |
na_is_outside |
(logical, default:
|
0L if no values in x are outside [a, b]. Otherwise, the position of the first value of x outside [a, b].
anyOutside(1:10, 1L, 10L)
anyOutside(1:10, 1L, 7L)

# na_is_outside = NA
anyOutside(c(1:10, NA), 1L, 7L)     # Already outside before the NA
anyOutside(c(NA, 1:10, NA), 1L, 7L) # NA since it occurred first

anyOutside(c(1:7, NA), 1L, 7L, na_is_outside = FALSE)
anyOutside(c(1:7, NA), 1L, 7L, na_is_outside = TRUE)

##
# N <- 500e6
N <- 500e3
x <- rep_len(hutils::samp(-5:6, size = 23), N)
bench_system_time(anyOutside(x, -5L, 6L))
# process      real
# 453.125ms 459.758ms
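The return convention (0L when nothing is outside, otherwise the first offending position) can be illustrated with a small base R sketch. The function name any_outside_ref is hypothetical, and the sketch deliberately rejects missing values rather than trying to reproduce the nas_absent and na_is_outside behaviour.

```r
# Hypothetical base R reference; not part of hutilscpp.
# Missing values are rejected so that NA handling is out of scope here.
any_outside_ref <- function(x, a, b) {
  stopifnot(is.numeric(x), !anyNA(x), length(a) == 1L, length(b) == 1L)
  w <- which(x < a | x > b)
  if (length(w)) w[1L] else 0L
}

any_outside_ref(1:10, 1L, 10L)  # no value lies outside [1, 10]
any_outside_ref(1:10, 1L, 7L)   # 8 is the first value outside [1, 7]
```

Unlike this sketch, which evaluates the whole comparison before looking for the first hit, an implementation that stops at the first value outside the interval avoids allocating the intermediate logical vector.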
Are elements of a vector even?
are_even(
  x,
  check_integerish = TRUE,
  keep_nas = TRUE,
  nThread = getOption("hutilscpp.nThread", 1L)
)
which_are_even(x, check_integerish = TRUE)
x |
An integer vector. Double vectors may also be used, but will
be truncated, with a warning if any elements are not integers.
Long vectors are not supported unless |
check_integerish |
(logical, default: |
keep_nas |
(logical, default: |
nThread |
Number of threads to use. |
For are_even, a logical vector the same length as x, TRUE whenever x is even.
For which_are_even, the integer positions of even values in x.
The same as as.integer(x) but only if x consists only of whole numbers and is within the range of integers.
as_integer_if_safe(x)
x |
A double vector. If not a double vector, it is simply returned without any coercion. |
N <- 1e6 # run with 1e9
x <- rep_len(as.double(sample.int(100)), N)
alt_as_integer <- function(x) {
  xi <- as.integer(x)
  if (isTRUE(all.equal(x, xi))) {
    xi
  } else {
    x
  }
}
bench_system_time(as_integer_if_safe(x))
#> process    real
#>  6.453s  6.452s
bench_system_time(alt_as_integer(x))
#> process    real
#> 15.516s 15.545s
bench_system_time(as.integer(x))
#> process    real
#>  2.469s  2.455s
(Used for examples and tests)
bench_system_time(expr)
expr |
Passed to |
Character to numeric
character2integer(x, na.strings = NULL, allow.double = FALSE, option = 0L)
x |
A character vector. |
na.strings |
A set of strings that shall be coerced to |
allow.double |
|
option |
Control behaviour:
|
Convenience function for coalescing to zero
coalesce0(x, nThread = getOption("hutilscpp.nThread", 1L))
COALESCE0(x, nThread = getOption("hutilscpp.nThread", 1L))
x |
An atomic vector. Or a list for |
nThread |
Number of threads to use. |
Equivalent to hutils::coalesce(x, 0) for an appropriate type of zero.
COALESCE0(x)
For complex numbers, each component is coalesced. For unsupported types, the vector is returned, silently.
coalesce0(c(NA, 2:3))
coalesce0(NaN + 1i)
Faster version of scales::comma
Comma(x, digits = 0L, big.mark = c(",", " ", "'", "_", "~", "\"", "/"))
x |
A numeric vector. |
digits |
An integer, similar to |
big.mark |
A single character, the thousands separator. |
Similar to prettyNum(round(x, digits), big.mark = ',') but rounds down and -1 < x < 0 will output "-0".
Count the number of FALSE, TRUE, and NAs.
count_logical(x, nThread = getOption("hutilscpp.nThread", 1L))
x |
A logical vector. |
nThread |
Number of threads to use. |
A vector of 3 elements: the number of FALSE, TRUE, and NA values in x.
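The order of the three counts matters when indexing the result, so a base R sketch of the same tally may help. The name count_logical_ref is hypothetical and not part of hutilscpp; the sketch makes three passes over x, whereas a single pass would be enough.

```r
# Hypothetical base R reference; not part of hutilscpp.
# Returns the counts in the documented order: FALSE, TRUE, NA.
count_logical_ref <- function(x) {
  stopifnot(is.logical(x))
  c(sum(!x, na.rm = TRUE), sum(x, na.rm = TRUE), sum(is.na(x)))
}

count_logical_ref(c(TRUE, FALSE, NA, TRUE, NA, NA))
```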
Cumulative sum unless reset
cumsum_reset(x, y = as.integer(x))
x |
A logical vector indicating when the sum should continue.
Missing values in |
y |
Optional: a numeric vector the same length as |
A vector of cumulative sums, resetting whenever x is FALSE.
The return type is double if y is double; otherwise an integer vector. Integer overflow wraps around, rather than being promoted to double type, as this function is intended for 'shortish' runs of cumulative sums.
If length(x) == 0, y is returned (i.e. integer(0) or double(0)).
cumsum_reset(c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE))
cumsum_reset(c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE),
             c(1000, 1000, 10000, 10, 20, 33, 0))
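The resetting rule can be written as a plain loop in base R. This is a sketch under assumptions rather than the package's implementation: the name cumsum_reset_ref is hypothetical, missing values in x are rejected, and the value at a position where x is FALSE is assumed to be zero (the text says only that the sum resets there).

```r
# Hypothetical base R reference; not part of hutilscpp.
cumsum_reset_ref <- function(x, y = as.integer(x)) {
  stopifnot(is.logical(x), !anyNA(x), length(y) == length(x))
  out <- y
  for (i in seq_along(x)) {
    if (!x[i]) {
      # Assumption: a FALSE in x resets the running sum to zero.
      # Assigning 0L keeps an integer 'out' integer and a double 'out' double.
      out[i] <- 0L
    } else if (i > 1L) {
      out[i] <- out[i - 1L] + y[i]
    }
  }
  out
}

x <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
cumsum_reset_ref(x)
cumsum_reset_ref(x, c(1000, 1000, 10000, 10, 20, 33, 0))
```

With an empty x the loop body never runs, so y is returned unchanged, which matches the documented behaviour for length(x) == 0.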
Equivalent to diff(minmax(x))
diam(x, nThread = getOption("hutilscpp.nThread", 1L))
thinner(x, width, nThread = getOption("hutilscpp.nThread", 1L))
x |
A numeric vector. |
nThread |
Number of threads to use. |
width |
|
A single value:
diam
The difference of minmax(x)
thinner
Equivalent to diam(x) <= width
Divisibility
divisible(x, d, nThread = getOption("hutilscpp.nThread", 1L))
divisible2(x, nThread = getOption("hutilscpp.nThread", 1L))
divisible16(x, nThread = getOption("hutilscpp.nThread", 1L))
x |
An integer vector |
d |
|
nThread |
The number of threads to use. |
Logical vector: TRUE where x is divisible by d.
divisible2, divisible16 are short for (and quicker than) divisible(x, 2) and divisible(x, 16).
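In base R the same test is usually written with the modulo operator. The sketch below uses a hypothetical name, divisible_ref, and assumes d is a single nonzero integer, since the description of d is not given above.

```r
# Hypothetical base R reference; not part of hutilscpp.
# Assumption: d is a single nonzero integer.
divisible_ref <- function(x, d) {
  stopifnot(is.integer(x), is.integer(d), length(d) == 1L, d != 0L)
  x %% d == 0L
}

divisible_ref(c(0L, 3L, 16L, 32L, -16L), 16L)
```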
Every integer
every_int(nThread = getOption("hutilsc.nThread", 1L), na = NA_integer_)
nThread |
Number of threads. |
na |
Value for |
fastmatch::fmatch and logical versions, with parallelization.
fmatchp(
  x,
  table,
  nomatch = NA_integer_,
  nThread = getOption("hutilscpp.nThread", 1L),
  fin = FALSE,
  whichFirst = 0L,
  .raw = 0L
)
finp(x, table, nThread = getOption("hutilscpp.nThread", 1L), .raw = 0L)
fnotinp(x, table, nThread = getOption("hutilscpp.nThread", 1L), .raw = 0L)
x , table , nomatch
|
As in |
nThread |
Number of threads to use. |
fin |
|
whichFirst |
|
.raw |
|
x <- c(1L, 4:5)
y <- c(2L, 4:5)
fmatchp(x, y)
fmatchp(x, y, nomatch = 0L)
finp(x, y)
Helper
helper(expr)
expr |
An expression |
The expression evaluated.
x6 <- 1:6
helper(x6 + 1)
Implies
Implies(x, y, anyNAx = TRUE, anyNAy = TRUE)
x , y
|
Logical vectors of equal length. |
anyNAx , anyNAy
|
Whether |
Logical implies: TRUE unless x is TRUE and y is FALSE.
NA in either x or y results in NA if and only if the result is unknown. In particular NA %implies% TRUE is TRUE and FALSE %implies% NA is TRUE.
If x or y are length-one, the function proceeds as if the length-one vector were recycled to the length of the other.
library(data.table)
CJ(x = c(TRUE, FALSE), y = c(TRUE, FALSE))[, ` x => y` := Implies(x, y)][]
#>        x     y  x => y
#> 1: FALSE FALSE    TRUE
#> 2: FALSE  TRUE    TRUE
#> 3:  TRUE FALSE   FALSE
#> 4:  TRUE  TRUE    TRUE

# NA results:
#> 5:    NA    NA      NA
#> 6:    NA FALSE      NA
#> 7:    NA  TRUE    TRUE
#> 8: FALSE    NA    TRUE
#> 9:  TRUE    NA      NA
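The truth table above, including the NA rows, is what base R's three-valued logic gives for !x | y. The sketch below uses a hypothetical name, implies_ref, and shows only the semantics; it makes no attempt at the performance of the package function.

```r
# Hypothetical base R reference; not part of hutilscpp.
# 'x implies y' is the same as '(not x) or y' under R's three-valued logic.
implies_ref <- function(x, y) {
  stopifnot(is.logical(x), is.logical(y))
  !x | y
}

implies_ref(NA, TRUE)    # known to be TRUE whatever x is
implies_ref(FALSE, NA)   # known to be TRUE whatever y is
implies_ref(NA, FALSE)   # unknown
implies_ref(TRUE, NA)    # unknown
```

Base R recycles a length-one argument in !x | y, which agrees with the recycling rule described above.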
Efficiently decide whether an atomic vector is constant; that is, contains only one value.
Equivalent to data.table::uniqueN(x) == 1L or forecast::is.constant(x)
is_constant(x, nThread = getOption("hutilscpp.nThread", 1L))
isntConstant(x)
x |
An atomic vector. Only logical, integer, double, and character vectors are supported. Others may work but have not been tested. |
nThread |
|
Whether or not the vector x is constant:
is_constant
TRUE or FALSE. Missing values are considered to be the same as each other, so a vector entirely composed of missing values is considered constant. Note that is_constant(c(NA_real_, NaN)) is TRUE.
isntConstant
If constant, 0L; otherwise, the first integer position at which x has a different value to the first.
This has the virtue of !isntConstant(x) == is_constant(x).
Multithreaded is_constant(x, nThread) should only be used if x is expected to be constant. It will be faster when x is constant but much slower otherwise.
Empty vectors are constant, as are length-one vectors.
library(hutilscpp)
library(data.table)
setDTthreads(1L)
N <- 1e9L
N <- 1e6 # to avoid long-running examples on CRAN

## Good-cases
nonconst <- c(integer(1e5), 13L, integer(N))
bench_system_time(uniqueN(nonconst) == 1L)
#> process    real
#> 15.734s  2.893s
bench_system_time(is_constant(nonconst))
#> process    real
#>   0.000   0.000
bench_system_time(isntConstant(nonconst))
#> process    real
#>   0.000   0.000

## Worst-cases
consti <- rep(13L, N)
bench_system_time(uniqueN(consti) == 1L)
#> process    real
#>  5.734s  1.202s
bench_system_time(is_constant(consti))
#>   process      real
#> 437.500ms 437.398ms
bench_system_time(isntConstant(consti))
#>   process      real
#> 437.500ms 434.109ms

nonconsti <- c(consti, -1L)
bench_system_time(uniqueN(nonconsti) == 1L)
#> process    real
#> 17.812s  3.348s
bench_system_time(is_constant(nonconsti))
#>   process      real
#> 437.500ms 431.104ms
bench_system_time(isntConstant(consti))
#>   process      real
#> 484.375ms 487.588ms

constc <- rep("a", N)
bench_system_time(uniqueN(constc) == 1L)
#> process    real
#> 11.141s  3.580s
bench_system_time(is_constant(constc))
#> process    real
#>  4.109s  4.098s

nonconstc <- c(constc, "x")
bench_system_time(uniqueN(nonconstc) == 1L)
#> process    real
#> 22.656s  5.629s
bench_system_time(is_constant(nonconstc))
#> process    real
#>  5.906s  5.907s
Is a vector sorted?
is_sorted(x, asc = NA)
isntSorted(x, asc = NA)
x |
An atomic vector. |
asc |
Single logical. If |
is_sorted returns TRUE or FALSE
isntSorted returns 0 if sorted or the first position that proves the vector is not sorted
Vectorized logical with support for short-circuits
and3(x, y, z = NULL, nas_absent = FALSE)
or3(x, y, z = NULL)
x , y , z
|
Logical vectors. If |
nas_absent |
(logical, default: |
For and3, the same as x & y & z; for or3, the same as x | y | z, designed to be efficient when component-wise short-circuiting is available.
Performant implementations of & and or.
Performance is high when the expressions are long (i.e. over 10M elements) and in particular when they are of the form lhs <op> rhs for binary <op>.
and3s(
  exprA,
  exprB = NULL,
  exprC = NULL,
  ...,
  nThread = getOption("hutilscpp.nThread", 1L),
  .parent_nframes = 1L,
  type = c("logical", "raw", "which")
)
or3s(
  exprA,
  exprB = NULL,
  exprC = NULL,
  ...,
  nThread = getOption("hutilscpp.nThread", 1L),
  .parent_nframes = 1L,
  type = c("logical", "raw", "which")
)
exprA , exprB , exprC , ...
|
Expressions of the form Only |
nThread |
|
.parent_nframes |
|
type |
The type of the result. |
and3s and or3s return exprA & exprB & exprC and exprA | exprB | exprC respectively. If any expression is missing it is considered TRUE for and3s and FALSE for or3s; in other words only the results of the other expressions count towards the result.
When geocoding coordinates to known addresses, an efficient way to match the given coordinates with the known is necessary. This function provides this efficiency by using C++ and allowing approximate matching.
match_nrst_haversine(
  lat,
  lon,
  addresses_lat,
  addresses_lon,
  Index = seq_along(addresses_lat),
  cartesian_R = NULL,
  close_enough = 10,
  excl_self = FALSE,
  as.data.table = TRUE,
  .verify_box = TRUE
)
lat , lon
|
Coordinates to be geocoded. Numeric vectors of equal length. |
addresses_lat , addresses_lon
|
Coordinates of known locations. Numeric vectors of equal length
(likely to be a different length than the length of |
Index |
A vector the same length as |
cartesian_R |
The maximum radius of any address from the points to be geocoded. Used to accelerate the detection of minimum distances. Note, as the argument name suggests, the distance is in cartesian coordinates, so a small number is likely. |
close_enough |
The distance, in metres, below which a match will be considered to have occurred. (The distance that is considered "close enough" to be a match.) For example, May be provided as a string to emphasize the units, e.g. |
excl_self |
(bool, default: |
as.data.table |
Return result as a |
.verify_box |
Check the initial guess against other points within the
box of radius |
A list (or data.table if as.data.table = TRUE) with two elements, both the same length as lat, giving for point lat,lon:
pos
the position (or corresponding value in Index) in addresses_lat,addresses_lon nearest to lat, lon.
dist
the distance, in kilometres, between the two points.
lat2 <- runif(5, -38, -37.8)
lon2 <- rep(145, 5)
lat1 <- c(-37.875, -37.91)
lon1 <- c(144.96, 144.978)
match_nrst_haversine(lat1, lon1, lat2, lon2)
match_nrst_haversine(lat1, lon1, lat1, lon1, 11:12, excl_self = TRUE)
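For orientation, the haversine distance and a brute-force nearest-neighbour search can be sketched in base R. The names haversine_km and nearest_ref are hypothetical, the Earth radius of 6371 km is an assumption, and the search below checks every address for every point with none of the shortcuts (cartesian_R, close_enough) that the package function offers.

```r
# Hypothetical base R sketch; not the package's implementation.
# Great-circle distance in kilometres, assuming a spherical Earth of radius R km.
haversine_km <- function(lat1, lon1, lat2, lon2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) fractionally above 1
  2 * R * asin(pmin(1, sqrt(a)))
}

# Brute force: for each point, the position and distance of the nearest address.
nearest_ref <- function(lat, lon, addresses_lat, addresses_lon) {
  pos <- integer(length(lat))
  dist <- double(length(lat))
  for (i in seq_along(lat)) {
    d <- haversine_km(lat[i], lon[i], addresses_lat, addresses_lon)
    pos[i] <- which.min(d)
    dist[i] <- d[pos[i]]
  }
  list(pos = pos, dist = dist)
}

nearest_ref(c(-37.875, -37.91), c(144.96, 144.978),
            c(-38, -37.88, -37.9), c(145, 144.96, 144.98))
```

The cost of this sketch grows with the product of the two input lengths, which is the motivation for approximate matching when both inputs are large.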
Minimum and maximum
minmax(x, empty_result = NULL, nThread = getOption("hutilscpp.nThread", 1L))
x |
An atomic vector. |
empty_result |
What should be returned when |
nThread |
Number of threads to be used. |
Vector of two elements, the minimum and maximum of x, or NULL.
Most common element
ModeC( x, nThread = getOption("hutilscpp.nThread", 1L), .range_fmatch = 1000000000, option = 1L )
x |
An atomic vector. |
nThread |
Number of threads to use. |
.range_fmatch |
If the range of |
option |
|
ModeC(c(1L, 1L, 2L))
Faster pmax() and pmin().
pmaxC(
  x,
  a,
  in_place = FALSE,
  keep_nas = FALSE,
  dbl_ok = NA,
  nThread = getOption("hutilscpp.nThread", 1L)
)
pminC(
  x,
  a,
  in_place = FALSE,
  keep_nas = FALSE,
  dbl_ok = NA,
  nThread = getOption("hutilscpp.nThread", 1L)
)
pmax0(
  x,
  in_place = FALSE,
  sorted = FALSE,
  keep_nas = FALSE,
  nThread = getOption("hutilscpp.nThread", 1L)
)
pmin0(
  x,
  in_place = FALSE,
  sorted = FALSE,
  keep_nas = FALSE,
  nThread = getOption("hutilscpp.nThread", 1L)
)
pmaxV(
  x,
  y,
  in_place = FALSE,
  dbl_ok = TRUE,
  nThread = getOption("hutilscpp.nThread", 1L)
)
pminV(
  x,
  y,
  in_place = FALSE,
  dbl_ok = TRUE,
  nThread = getOption("hutilscpp.nThread", 1L)
)
pmax3(x, y, z, in_place = FALSE)
pmin3(x, y, z, in_place = FALSE)
x |
|
a |
|
in_place |
|
keep_nas |
|
dbl_ok |
|
nThread |
|
sorted |
|
y , z
|
|
Versions of pmax and pmin, designed for performance.
When in_place = TRUE, the values of x are modified in-place. For advanced users only.
The differences are:
pmaxC(x, a) and pminC(x, a)
Both x and a must be numeric and a must be length-one.
This function will always be faster than pmax(x, a) when a is a single value, but can be slower than pmax.int(x, a) when x is short. Use this function when comparing a numeric vector with a single value.
Use in_place = TRUE only within functions when you are sure it is safe, i.e. not a reference to something outside the environment.
By design, the functions first check whether x will be modified before allocating memory to a new vector. For example, if all values in x are nonnegative, the vector is returned.
pmaxC(-5:5, 2)
pmaxC(1:4, 5.5)
pmaxC(1:4, 5.5, dbl_ok = TRUE)
# pmaxC(1:4, 5.5, dbl_ok = FALSE)  # error
Find a binary pole of inaccessibility
poleInaccessibility2(
  x = NULL,
  y = NULL,
  DT = NULL,
  x_range = NULL,
  y_range = NULL,
  copy_DT = TRUE
)
poleInaccessibility3(
  x = NULL,
  y = NULL,
  DT = NULL,
  x_range = NULL,
  y_range = NULL,
  copy_DT = TRUE,
  test_both = TRUE
)
x , y
|
Coordinates. |
DT |
A |
x_range , y_range
|
Numeric vectors of length-2; the range of |
copy_DT |
(logical, default: |
test_both |
(logical, default: |
poleInaccessibility2
A named vector containing the xmin, xmax and ymin, ymax coordinates of the largest rectangle of width an integer power of two that is empty.
poleInaccessibility3
Starting with the rectangle formed by poleInaccessibility2, the rectangle formed by stretching it out vertically and horizontally until the edges intersect the points x,y
library(data.table)
library(hutils)
# A square with a 10 by 10 square of the northeast corner removed
x <- runif(1e4, 0, 100)
y <- runif(1e4, 0, 100)
DT <- data.table(x, y)
# remove the NE corner
DT_NE <- DT[implies(x > 90, y < 89)]
DT_NE[, poleInaccessibility2(x, y)]
DT_NE[, poleInaccessibility3(x, y)]
Range of a vector using Rcpp.
range_rcpp( x, anyNAx = anyNA(x), warn_empty = TRUE, integer0_range_is_integer = FALSE )
x |
A vector for which the range is desired. Vectors with missing values are not supported and have no definite behaviour. |
anyNAx |
(logical, default: |
warn_empty |
(logical, default: |
integer0_range_is_integer |
(logical, default: |
A length-4 vector, the first two positions give the range and the next two give the positions in x where the min and max occurred.
This is almost equivalent to c(range(x), which.min(x), which.max(x)). Note that the type is not strictly preserved, but no loss should occur. In particular, logical x results in an integer result, and a double x will have double values for which.min(x) and which.max(x).
A completely empty, logical x returns c(NA, NA, NA, NA) as an integer vector.
x <- rnorm(1e3) # Not noticeable at this scale
bench_system_time(range_rcpp(x))
bench_system_time(range(x))
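The layout of the length-4 result follows from the base R expression it is compared with. The sketch below wraps that expression in a hypothetical function, range_ref, restricted to non-empty numeric input without missing values; the positions come out as doubles for double input because c() coerces to a common type.

```r
# Hypothetical base R reference; not part of hutilscpp.
# Elements: minimum, maximum, position of minimum, position of maximum.
range_ref <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0L, !anyNA(x))
  c(range(x), which.min(x), which.max(x))
}

range_ref(c(3, -1, 7, 2))
```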
Squish into a range
squish(x, a, b, in_place = FALSE)
x |
A numeric vector. |
a , b
|
Lower and upper bounds |
in_place |
(logical, default: |
A numeric/integer vector with the values of x "squished" between a and b; values above b replaced with b and values below a replaced with a.
squish(-5:5, -1L, 1L)
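The same clamping can be expressed in base R by composing pmax and pmin. The name squish_ref is hypothetical, and the sketch allocates two intermediate vectors, which a dedicated implementation can avoid.

```r
# Hypothetical base R reference; not part of hutilscpp.
# Values below a become a; values above b become b.
squish_ref <- function(x, a, b) {
  stopifnot(is.numeric(x), length(a) == 1L, length(b) == 1L, a <= b)
  pmin(pmax(x, a), b)
}

squish_ref(-5:5, -1L, 1L)
```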
Sum of logical expressions
sum_and3s(
  exprA,
  exprB,
  exprC,
  ...,
  nThread = getOption("hutilscpp.nThread", 1L),
  .env = parent.frame()
)
sum_or3s(
  exprA,
  exprB,
  exprC,
  ...,
  .env = parent.frame(),
  nThread = getOption("hutilscpp.nThread", 1L)
)
exprA , exprB , exprC , ...
|
Expressions of the form |
nThread |
|
.env |
The environment in which the expressions are to be evaluated. |
Equivalent to sum(exprA & exprB & exprC) or sum(exprA | exprB | exprC) as desired.
The count of missing values in an atomic vector, equivalent to sum(is.na(x)).
sum_isna(x, do_anyNA = TRUE, nThread = getOption("hutilscpp.nThread", 1L))
x |
An atomic vector. |
do_anyNA |
Should Ignored silently if |
nThread |
|
sum_isna(c(1:5, NA))
sum_isna(c(NaN, NA)) # 2 from v0.4.0 (Sep 2020)
Using the fastmatch hash functions, determine the unique elements of a vector, and the number of distinct elements.
unique_fmatch(x, nThread = getOption("hutilscpp.nThread", 1L))
uniqueN_fmatch(x, nThread = getOption("hutilscpp.nThread", 1L))
x |
An atomic vector. |
nThread |
Number of threads to use. |
Equivalent to unique(x) or data.table::uniqueN(x) respectively.
A faster and safer version of which.max applied to simple-to-parse logical expressions.
which_first(
  expr,
  verbose = FALSE,
  reverse = FALSE,
  sexpr,
  eval_parent_n = 1L,
  suppressWarning = getOption("hutilscpp_suppressWarning", FALSE),
  use.which.max = FALSE
)
which_last(
  expr,
  verbose = FALSE,
  reverse = FALSE,
  suppressWarning = getOption("hutilscpp_suppressWarning", FALSE)
)
expr |
An expression, such as |
verbose |
|
reverse |
|
sexpr |
Equivalent to |
eval_parent_n |
Passed to |
suppressWarning |
Either a |
use.which.max |
If |
If the expr is of the form LHS <operator> RHS and LHS is a single symbol, operator is one of ==, !=, >, >=, <, <=, %in%, or %between%, and RHS is numeric, then expr is not evaluated directly; instead, each element of LHS is compared individually.
If expr is not of the above form, then expr is evaluated and passed to which.max.
Using this function can be significantly faster than the alternatives when the computation of expr would be expensive, though the difference is only likely to be clear when length(x) is much larger than 10 million. But even for smaller vectors, it has the benefit of returning 0L if none of the values in expr are TRUE, unlike which.max.
Compared to Position for an appropriate choice of f, which_first is not much faster when the expression is TRUE for some position. However, which_first is faster when all elements of expr are FALSE. Thus which_first has a smaller worst-case time than the alternatives for most x.
Missing values on the RHS are handled specially. which_first(x %between% c(NA, 1)) for example is equivalent to which_first(x <= 1), as in data.table::between.
The same as which.max(expr) or which(expr)[1] but returns 0L when expr has no TRUE values.
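The difference from which.max in the no-match case is easy to see in base R. The sketch below defines a hypothetical helper, which_first_ref, that reproduces only the return convention; it evaluates the whole logical vector first, so it has none of the short-circuiting that makes the package function fast.

```r
# Hypothetical base R reference; not part of hutilscpp.
# Position of the first TRUE, or 0L if there is none.
which_first_ref <- function(lgl) {
  stopifnot(is.logical(lgl))
  w <- which(lgl)
  if (length(w)) w[1L] else 0L
}

x <- double(5)
which.max(x > 0)        # 1, although no element of x exceeds 0
which_first_ref(x > 0)  # 0
which_first_ref(c(FALSE, NA, TRUE))
```

Returning 0L lets callers test the result directly, for example if (which_first_ref(x > 0)) ..., without a separate any() pass.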
N <- 1e5
# N <- 1e8  ## too slow for CRAN

# Two examples, from slowest to fastest,
# run with N = 1e8 elements

                                       # seconds
x <- rep_len(runif(1e4, 0, 6), N)
bench_system_time(x > 5)
bench_system_time(which(x > 5))        # 0.8
bench_system_time(which.max(x > 5))    # 0.3
bench_system_time(which_first(x > 5))  # 0.000

## Worst case: have to check all N elements
x <- double(N)
bench_system_time(x > 0)
bench_system_time(which(x > 0))        # 1.0
bench_system_time(which.max(x > 0))    # 0.4  but returns 1, not 0
bench_system_time(which_first(x > 0))  # 0.1

x <- as.character(x)
# bench_system_time(which(x == 5))     # 2.2
bench_system_time(which.max(x == 5))   # 1.6
bench_system_time(which_first(x == 5)) # 1.3
Introduced in v 1.6.0
which_firstNA(x)
which_lastNA(x)
x |
An atomic vector. |
The position of the first/last missing value in x.
N <- 1e8
N <- 1e6 # for CRAN etc
x <- c(1:1e5, NA, integer(N))
bench_system_time(which.max(is.na(x))) # 123ms
bench_system_time(Position(is.na, x))  # 22ms
bench_system_time(which_firstNA(x))    # <1ms
At which point are all values true onwards
which_true_onwards(x)
x |
A logical vector. |
The position of the first TRUE value in x at which all the following values are TRUE.
which_true_onwards(c(TRUE, FALSE, TRUE, TRUE, TRUE))
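One way to read the definition is "one past the last FALSE". The base R sketch below follows that reading under stated assumptions: the name which_true_onwards_ref is hypothetical, missing values are rejected, and 0L is returned when x is empty or ends in FALSE, a case the text above does not specify.

```r
# Hypothetical base R reference; not part of hutilscpp.
which_true_onwards_ref <- function(x) {
  stopifnot(is.logical(x), !anyNA(x))
  n <- length(x)
  # Assumption: no such position exists if x is empty or its last value is FALSE.
  if (n == 0L || !x[n]) {
    return(0L)
  }
  falses <- which(!x)
  if (length(falses)) max(falses) + 1L else 1L
}

which_true_onwards_ref(c(TRUE, FALSE, TRUE, TRUE, TRUE))
which_true_onwards_ref(c(TRUE, TRUE))
```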
Which of three vectors are the elements (all, any) true?
which3( x, y, z, And = TRUE, anyNAx = anyNA(x), anyNAy = anyNA(y), anyNAz = anyNA(z) )
x , y , z
|
Logical vectors. Either the same length or length-1 |
And |
Boolean. If |
anyNAx , anyNAy , anyNAz
|
Whether or not the inputs have |
Same as which(exprA) where exprA is a binary expression.
whichs( exprA, .env = parent.frame(), nThread = getOption("hutilscpp.nThread", 1L) )
exprA |
An expression. Useful when of the form |
.env |
The environment in which |
nThread |
Number of threads to use. |
Integer vector, the indices of exprA that return TRUE.
Exclusive or
xor2(x, y, anyNAx = TRUE, anyNAy = TRUE)
x , y
|
Logical vectors. |
anyNAx , anyNAy
|
Could |