Title: | High Performance Tools for Combinatorics and Computational Mathematics |
---|---|
Description: | Provides optimized functions and flexible combinatorial iterators implemented in C++ for solving problems in combinatorics and computational mathematics. Utilizes the RMatrix class from 'RcppParallel' for thread safety. There are combination/permutation functions with constraint parameters that allow for generation of all results of a vector meeting specific criteria (e.g. generating integer partitions/compositions or finding all combinations such that the sum is between two bounds). Capable of generating specific combinations/permutations (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library 'libdivide'. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomás Oliveira. Finally, there is a prime counting function that implements Legendre's formula based on the work of Kim Walisch. |
Authors: | Joseph Wood [aut, cre] |
Maintainer: | Joseph Wood <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.8.5 |
Built: | 2024-12-11 06:52:42 UTC |
Source: | CRAN |
The RcppAlgos package attacks age-old problems in combinatorics and computational mathematics.
The main goal is to encourage fresh and creative approaches to foundational problems. The question that most appropriately summarizes RcppAlgos
is: "Can we do better?".
Provide highly optimized functions that facilitates a broader spectrum of researchable cases. E.g
Investigating the structure of large numbers over wide ranges:
primeFactorizeSieve(10^13 - 10^7, 10^13 + 10^7)
primeSieve(2^53 - 10^10, 2^53 - 1, nThreads = 32)
Easily explore combinations/permutations/partitions that would otherwise be inaccessible due to time of execution/memory constraints:
c_iter = comboIter(10000, 100) bigSamp = gmp::urand.bigz(3, gmp::log2.bigz(comboCount(10000, 100))) c_iter[[bigSamp]] ## flexible iterator allows random sampling
p_iter = partitionsIter(5000, 100, target = 6000) p_iter[[1e9]] ## start iterating from index = 1e9 p_iter@nextIter() p_iter@nextNIter(1e3)
comboGeneral(150, 5, constraintFun = "sum", Parallel = TRUE)
parallel::mclapply(seq(...), function(x) { temp = permuteGeneral(15, 10, lower = x, upper = y) ## analyze permutations ## output results }, mc.cores = detectCores() - 1))
partitionsGeneral(0:80, repetition = TRUE)
permuteSample(rnorm(100), 10, freqs = rep(1:4, 25), n = 15, seed = 123)
set.seed(123) comboGeneral(runif(42, 0, 50), 10, constraintFun = "mean", comparisonFun = c(">","<"), limitConstraints = c(39.876, 42.123))
Speed!!!.... You will find that the functions in RcppAlgos
are some of the fastest of their type available in R
.
Joseph Wood
The Combo
class family are S4-classes that expose C++ classes that provide access to iterators and other useful methods.
of "Combo"
and all classes inheriting from it:
nextIter
Retrieve the next lexicographical result
nextNIter
Pass an integer n to retrieve the next n lexicographical results
nextRemaining
Retrieve all remaining lexicographical results
currIter
Returns the current iteration
prevIter
Retrieve the previous lexicographical result (the next reverse lexicographical result)
prevNIter
Pass an integer n to retrieve the previous n lexicographical results (the next n reverse lexicographical results)
prevRemaining
Retrieve all remaining reverse lexicographical results
startOver
Resets the iterator
sourceVector
View the source vector
summary
Returns a list of summary information about the iterator
front
Retrieve the first lexicographical result
back
Retrieve the last lexicographical result
randomAccess
Random access method. Pass a single value or a vector of valid indices. If a single value is passed, the internal index of the iterator will be updated, however if a vector is passed the internal state will not change. GMP support allows for flexible indexing.
Joseph Wood
Partitions-class
, Constraints-class
showClass("Combo")
showClass("Combo")
Calculate the number of combinations/permutations of a vector chosen at a time with or without replacement. Additionally, these functions can calculate the number of combinations/permutations of multisets.
comboCount(v, m = NULL, ...) permuteCount(v, m = NULL, ...) ## Default S3 method: comboCount(v, m = NULL, repetition = FALSE, freqs = NULL, ...) ## Default S3 method: permuteCount(v, m = NULL, repetition = FALSE, freqs = NULL, ...) ## S3 method for class 'table' comboCount(v, m = NULL, ...) ## S3 method for class 'table' permuteCount(v, m = NULL, ...)
comboCount(v, m = NULL, ...) permuteCount(v, m = NULL, ...) ## Default S3 method: comboCount(v, m = NULL, repetition = FALSE, freqs = NULL, ...) ## Default S3 method: permuteCount(v, m = NULL, repetition = FALSE, freqs = NULL, ...) ## S3 method for class 'table' comboCount(v, m = NULL, ...) ## S3 method for class 'table' permuteCount(v, m = NULL, ...)
v |
Source vector. If |
m |
Number of elements to choose. If |
... |
Further arguments passed to methods. |
repetition |
Logical value indicating whether combinations/permutations should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all combinations/permutations of a multiset of |
A numerical value representing the total number of combinations/permutations.
When the number of results exceeds , a number of class
bigz
is returned.
## Same interface as the respective "general" functions: ## i.e. comboGeneral & permuteGeneral permuteCount(-5) permuteCount(5) comboCount(25, 12) permuteCount(15, 7, TRUE) comboCount(25, 12, freqs = rep(2, 25)) ## Return object of class 'bigz' comboCount(250, 15, freqs = rep(2, 250))
## Same interface as the respective "general" functions: ## i.e. comboGeneral & permuteGeneral permuteCount(-5) permuteCount(5) comboCount(25, 12) permuteCount(15, 7, TRUE) comboCount(25, 12, freqs = rep(2, 25)) ## Return object of class 'bigz' comboCount(250, 15, freqs = rep(2, 250))
Generate combinations or permutations of a vector with or without constraints.
Produce results in parallel using the Parallel
or nThreads
arguments. You can also apply each of the five compiled functions given by the argument constraintFun
in parallel.
The arguments lower
and upper
make it possible to generate combinations/permutations in chunks allowing for parallelization via the parallel-package
. This is convenient when you want to apply a custom function to the output in parallel.
Attack integer partition and general subset sum problems.
GMP support allows for exploration of combinations/permutations of vectors with many elements.
The output is in lexicographical order.
comboGeneral(v, m = NULL, ...) permuteGeneral(v, m = NULL, ...) ## S3 method for class 'numeric' comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'numeric' permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'factor' comboGeneral( v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'factor' permuteGeneral( v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, FUN.VALUE = NULL, ... ) ## Default S3 method: comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, FUN = NULL, FUN.VALUE = NULL, ...) ## Default S3 method: permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, FUN = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'table' comboGeneral( v, m = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' permuteGeneral( v, m = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'list' comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, ...) ## S3 method for class 'list' permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, ...)
comboGeneral(v, m = NULL, ...) permuteGeneral(v, m = NULL, ...) ## S3 method for class 'numeric' comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'numeric' permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'factor' comboGeneral( v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'factor' permuteGeneral( v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, FUN.VALUE = NULL, ... ) ## Default S3 method: comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, FUN = NULL, FUN.VALUE = NULL, ...) ## Default S3 method: permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, FUN = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'table' comboGeneral( v, m = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' permuteGeneral( v, m = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'list' comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, ...) ## S3 method for class 'list' permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, ...)
v |
Source vector. If |
m |
Number of elements to choose. If |
... |
Further arguments passed to methods. |
repetition |
Logical value indicating whether combinations/permutations should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all combinations/permutations of a multiset of |
lower |
The lower bound. Combinations/permutations are generated lexicographically, thus utilizing this argument will determine which specific combination/permutation to start generating from (e.g. |
upper |
The upper bound. Similar to If the output is constrained and In addition to the benefits listed for |
constraintFun |
Function to be applied to the elements of |
comparisonFun |
Comparison operator that will be used to compare When
In other words, the first comparison operator is applied to the first limit and the second operator is applied to the second limit. |
limitConstraints |
This is the value(s) that will be used for comparison. Can be passed as a single value or a vector of two numerical values. The default is |
keepResults |
A logical flag indicating if the result of E.g. The following are equivalent and will produce a
|
FUN |
Function to be applied to each combination/permutation. The default is |
Parallel |
Logical value indicating whether combinations/permutations should be generated in parallel using |
nThreads |
Specific number of threads to be used. The default is |
tolerance |
A numeric value greater than or equal to zero. This parameter is utilized when a constraint is applied on a numeric vector. The default value is 0 when it can be determined that whole values are being utilized, otherwise it is |
FUN.VALUE |
A template for the return value from |
For the general case, finding all combinations/permutations with constraints is optimized by organizing them in such a way that when constraintFun
is applied, a partially monotonic sequence is produced. Combinations/permutations are added successively, until a particular combination exceeds the given constraint value for a given constraint/comparison function combo. After this point, we can safely skip several combinations knowing that they will exceed the given constraint value.
There are special cases where more efficient algorithms are dyncamically deployed. These cases center around the subject of integer partitions. See partitionsGeneral
for more information.
When there are any negative values in v
and constraintFun = "prod"
, producing a monotonic set is non-trivial for the general case. As a result, performance will suffer as all combinations/permutations must be tested against the constraint criteria.
In general, a matrix with or
columns, depending on the value of
keepResults
If FUN
is utilized and FUN.VALUE = NULL
, a list is returned
When both FUN
and FUN.VALUE
are not NULL
, the return is modeled after the return of vapply
. See the 'Value' section of vapply
.
Parallel
and nThreads
will be ignored in the following cases:
When the output is constrained (except for most partitions cases)
If the class of the vector passed is character
, raw
, and complex
(N.B. Rcpp::CharacterMatrix
is not thread safe). Alternatively, you can generate an indexing matrix in parallel.
If FUN
is utilized.
If either constraintFun
, comparisonFun
or limitConstraints
is NULL
–or– if the class of the vector passed is logical
, character
, raw
, factor
, or complex
, the constraint check will not be carried out. This is equivalent to simply finding all combinations/permutations of choose
.
The maximum number of combinations/permutations that can be generated at one time is . Utilizing
lower
and upper
makes it possible to generate additional combinations/permutations.
Factor vectors are accepted. Class and level attributes are preserved except when FUN
is used.
Lexicographical ordering isn't guaranteed for permutations if lower
isn't supplied and the output is constrained.
If lower
is supplied and the output is constrained, the combinations/permutations that will be tested will be in the lexicographical range lower
to upper
or up to the total possible number of results if upper
is not given. See the second paragraph for the definition of upper
.
FUN
will be ignored if the constraint check is satisfied.
Joseph Wood
comboGeneral(4, 3) permuteGeneral(3) permuteGeneral(factor(letters[1:3]), repetition = TRUE) ## permutations of the multiset : ## c(1,1,1,2,2,3) permuteGeneral(table(c(1,1,1,2,2,3))) ## Example with list comboGeneral( v = list( p1 = matrix(1:10, ncol = 2), p2 = data.frame(a = letters, b = 1:26), p3 = as.complex(1:10) ), m = 2 ) #### Examples using "upper" and "lower": ## See specific range of permutations permuteGeneral(75, 10, freqs = rep(1:3, 25), lower = 1e12, upper = 1e12 + 10) ## Researcher only needs 10 7-tuples of mySamp ## such that the sum is greater than 7200. ## Generate some random data set.seed(1009) mySamp = rnorm(75, 997, 23) comboGeneral(mySamp, 7, constraintFun = "sum", comparisonFun = ">", limitConstraints = 7200, upper = 10) ## Similarly, you can use "lower" to obtain the last rows. ## Generate the last 10 rows comboGeneral(mySamp, 7, lower = choose(75, 7) - 9) ## Or if you would like to generate a specific chunk, ## use both "lower" and "upper". E.g. Generate one ## million combinations starting with the 900,000,001 ## lexicographic combination. t1 = comboGeneral(mySamp, 7, lower = 9*10^8 + 1, upper = 9*10^8 + 10^6) ## class of the source vector is preserved class(comboGeneral(5,3)[1,]) == class(1:5) class(comboGeneral(c(1,2:5),3)[1,]) == class(c(1,2:5)) class(comboGeneral(factor(month.name),3)[1,]) == class(factor(month.name)) ## Using keepResults will add a column of results comboGeneral(-3, 6, TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = -8, keepResults = TRUE) ## Using multiple constraints: ## Get combinations such that the product ## is between 3000 and 4000 inclusive comboGeneral(5, 7, TRUE, constraintFun = "prod", comparisonFun = c(">=","<="), limitConstraints = c(3000, 4000), keepResults = TRUE) ## Or, get the combinations such that the ## product is less than or equal to 10 or ## greater than or equal to 40000 comboGeneral(5, 7, TRUE, constraintFun = "prod", comparisonFun = c("<=",">="), limitConstraints = c(10, 40000), keepResults = TRUE) #### General subset sum problem set.seed(516781810) comboGeneral(runif(100, 0, 42), 5, constraintFun = "mean", comparisonFun = "==", limitConstraints = 30, tolerance = 0.0000002) #### Integer Partitions comboGeneral(0:5, 5, TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = 5) ## Using FUN comboGeneral(10000, 5, lower = 20, upper = 22, FUN = function(x) { which(cummax(x) %% 2 == 1) }) ## Not run: ## Parallel example generating more than 2^31 - 1 combinations. library(parallel) numCores = detectCores() - 1 ## 10086780 evenly divides choose(35, 15) and is "small enough" to ## generate quickly in chunks. system.time(mclapply(seq(1, comboCount(35, 15), 10086780), function(x) { a = comboGeneral(35, 15, lower = x, upper = x + 10086779) ## do something x }, mc.cores = numCores)) ## Find 13-tuple combinations of 1:25 such ## that the mean is less than 10 system.time(myComb <- comboGeneral(25, 13, FALSE, constraintFun = "mean", comparisonFun = "<", limitConstraints = 10)) ## Alternatively, you must generate all combinations and subsequently ## subset to obtain the combinations that meet the criteria system.time(myComb2 <- combn(25, 13)) system.time(myCols <- which(colMeans(myComb2) < 10)) system.time(myComb2 <- myComb2[, myCols]) ## Any variation is much slower system.time(myComb2 <- combn(25, 13)[,combn(25, 13, mean) < 10]) ## Test equality with myComb above all.equal(myComb, t(myComb2)) ## Fun example... see stackoverflow: ## https://stackoverflow.com/q/22218640/4408538 system.time(permuteGeneral(seq(0L,100L,10L), 8, TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = 100)) ## These are called weak integer compositions. Below, we call ## compositionsGeneral which gives the same output except it ## in lexicographical order. See 'Note' above system.time(compositionsGeneral(seq(0L,100L,10L), 8, TRUE, weak = TRUE)) ## End(Not run)
comboGeneral(4, 3) permuteGeneral(3) permuteGeneral(factor(letters[1:3]), repetition = TRUE) ## permutations of the multiset : ## c(1,1,1,2,2,3) permuteGeneral(table(c(1,1,1,2,2,3))) ## Example with list comboGeneral( v = list( p1 = matrix(1:10, ncol = 2), p2 = data.frame(a = letters, b = 1:26), p3 = as.complex(1:10) ), m = 2 ) #### Examples using "upper" and "lower": ## See specific range of permutations permuteGeneral(75, 10, freqs = rep(1:3, 25), lower = 1e12, upper = 1e12 + 10) ## Researcher only needs 10 7-tuples of mySamp ## such that the sum is greater than 7200. ## Generate some random data set.seed(1009) mySamp = rnorm(75, 997, 23) comboGeneral(mySamp, 7, constraintFun = "sum", comparisonFun = ">", limitConstraints = 7200, upper = 10) ## Similarly, you can use "lower" to obtain the last rows. ## Generate the last 10 rows comboGeneral(mySamp, 7, lower = choose(75, 7) - 9) ## Or if you would like to generate a specific chunk, ## use both "lower" and "upper". E.g. Generate one ## million combinations starting with the 900,000,001 ## lexicographic combination. t1 = comboGeneral(mySamp, 7, lower = 9*10^8 + 1, upper = 9*10^8 + 10^6) ## class of the source vector is preserved class(comboGeneral(5,3)[1,]) == class(1:5) class(comboGeneral(c(1,2:5),3)[1,]) == class(c(1,2:5)) class(comboGeneral(factor(month.name),3)[1,]) == class(factor(month.name)) ## Using keepResults will add a column of results comboGeneral(-3, 6, TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = -8, keepResults = TRUE) ## Using multiple constraints: ## Get combinations such that the product ## is between 3000 and 4000 inclusive comboGeneral(5, 7, TRUE, constraintFun = "prod", comparisonFun = c(">=","<="), limitConstraints = c(3000, 4000), keepResults = TRUE) ## Or, get the combinations such that the ## product is less than or equal to 10 or ## greater than or equal to 40000 comboGeneral(5, 7, TRUE, constraintFun = "prod", comparisonFun = c("<=",">="), limitConstraints = c(10, 40000), keepResults = TRUE) #### General subset sum problem set.seed(516781810) comboGeneral(runif(100, 0, 42), 5, constraintFun = "mean", comparisonFun = "==", limitConstraints = 30, tolerance = 0.0000002) #### Integer Partitions comboGeneral(0:5, 5, TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = 5) ## Using FUN comboGeneral(10000, 5, lower = 20, upper = 22, FUN = function(x) { which(cummax(x) %% 2 == 1) }) ## Not run: ## Parallel example generating more than 2^31 - 1 combinations. library(parallel) numCores = detectCores() - 1 ## 10086780 evenly divides choose(35, 15) and is "small enough" to ## generate quickly in chunks. system.time(mclapply(seq(1, comboCount(35, 15), 10086780), function(x) { a = comboGeneral(35, 15, lower = x, upper = x + 10086779) ## do something x }, mc.cores = numCores)) ## Find 13-tuple combinations of 1:25 such ## that the mean is less than 10 system.time(myComb <- comboGeneral(25, 13, FALSE, constraintFun = "mean", comparisonFun = "<", limitConstraints = 10)) ## Alternatively, you must generate all combinations and subsequently ## subset to obtain the combinations that meet the criteria system.time(myComb2 <- combn(25, 13)) system.time(myCols <- which(colMeans(myComb2) < 10)) system.time(myComb2 <- myComb2[, myCols]) ## Any variation is much slower system.time(myComb2 <- combn(25, 13)[,combn(25, 13, mean) < 10]) ## Test equality with myComb above all.equal(myComb, t(myComb2)) ## Fun example... see stackoverflow: ## https://stackoverflow.com/q/22218640/4408538 system.time(permuteGeneral(seq(0L,100L,10L), 8, TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = 100)) ## These are called weak integer compositions. Below, we call ## compositionsGeneral which gives the same output except it ## in lexicographical order. See 'Note' above system.time(compositionsGeneral(seq(0L,100L,10L), 8, TRUE, weak = TRUE)) ## End(Not run)
expand.grid
Where order Does Not Matter
This function efficiently generates Cartesian-product-like output where order does not matter. It is loosely equivalent to the following:
t = expand.grid(list)
t = t[do.call(order, t), ]
key = apply(t, 1, function(x) paste0(sort(x), collapse = ""))
t[!duplicated(key), ]
comboGrid(..., repetition = TRUE)
comboGrid(..., repetition = TRUE)
... |
vectors, factors or a list containing these. (See |
repetition |
Logical value indicating whether results should be with or without repetition. The default is |
If items with different classes are passed, a data frame will be returned, otherwise a matrix will be returned.
Joseph Wood
## return a matrix expGridNoOrder = comboGrid(1:5, 3:9, letters[1:5], letters[c(1,4,5,8)]) head(expGridNoOrder) tail(expGridNoOrder) expGridNoOrderNoRep = comboGrid(1:5, 3:9, letters[1:5], letters[c(1,4,5,8)], repetition = FALSE) head(expGridNoOrderNoRep) tail(expGridNoOrderNoRep)
## return a matrix expGridNoOrder = comboGrid(1:5, 3:9, letters[1:5], letters[c(1,4,5,8)]) head(expGridNoOrder) tail(expGridNoOrder) expGridNoOrderNoRep = comboGrid(1:5, 3:9, letters[1:5], letters[c(1,4,5,8)], repetition = FALSE) head(expGridNoOrderNoRep) tail(expGridNoOrderNoRep)
Generate partitions of a vector into groups. See Create Combinations in R by Groups on https://stackoverflow.com for a direct use case of when the groups sizes are equal.
Produce results in parallel using the Parallel
or nThreads
arguments.
GMP support allows for exploration where the number of results is large.
The output is in lexicographical order by groups.
comboGroups(v, numGroups = NULL, grpSizes = NULL, retType = "matrix", lower = NULL, upper = NULL, Parallel = FALSE, nThreads = NULL)
comboGroups(v, numGroups = NULL, grpSizes = NULL, retType = "matrix", lower = NULL, upper = NULL, Parallel = FALSE, nThreads = NULL)
v |
Source vector. If |
numGroups |
An Integer. The number of groups that the vector will be partitioned into. The default is |
grpSizes |
A vector of whole numbers representing the size of each group. The default is |
retType |
A string, "3Darray" or "matrix", that determines the shape of the output. The default is "matrix". Note, "3Darray" can only be used when the size of each group is uniform. When the size of each group varies, the return output will always be a matrix. |
lower |
The lower bound. Partitions of groups are generated lexicographically, thus utilizing this argument will determine which specific result to start generating from (e.g. |
upper |
The upper bound. Similar to |
Parallel |
Logical value indicating whether results should be generated in parallel using |
nThreads |
Specific number of threads to be used. The default is |
Conceptually, this problem can be viewed as generating all permutations of the vector v
and removing the within group permutations. To illustrate this, let us consider the case of generating partitions of 1:8
into 2 groups each of size 4.
To begin, generate the permutations of 1:8
and group the first/last four elements of each row.
Grp1 | Grp2 | |||||||
C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | |
R1 | | 1 | 2 | 3 | 4 | | | 5 | 6 | 7 | 8 | |
R2 | | 1 | 2 | 3 | 4 | | | 5 | 6 | 8 | 7 | |
R3 | | 1 | 2 | 3 | 4 | | | 5 | 7 | 6 | 8 | |
R4 | | 1 | 2 | 3 | 4 | | | 5 | 7 | 8 | 6 | |
R5 | | 1 | 2 | 3 | 4 | | | 5 | 8 | 6 | 7 | |
R6 | | 1 | 2 | 3 | 4 | | | 5 | 8 | 7 | 6 | |
Note that the permutations above are equivalent partitions of 2 groups of size 4 as only the last four elements are permuted. If we look at at the lexicographical permutation, we observe our second distinct partition.
Grp1 | Grp2 | |||||||
C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | |
R24 | | 1 | 2 | 3 | 4 | | | 8 | 7 | 6 | 5 | |
R25 | | 1 | 2 | 3 | 5 | | | 4 | 6 | 7 | 8 | |
R26 | | 1 | 2 | 3 | 5 | | | 4 | 6 | 8 | 7 | |
R27 | | 1 | 2 | 3 | 5 | | | 4 | 7 | 6 | 8 | |
R28 | | 1 | 2 | 3 | 5 | | | 4 | 7 | 8 | 6 | |
Continuing on, we will reach the lexicographical permutation, which represents the last result:
Grp1 | Grp2 | |||||||
C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | |
R3454 | | 1 | 6 | 7 | 5 | | |8 | 3 | 4 | 2 | |
R3455 | | 1 | 6 | 7 | 5 | | |8 | 4 | 2 | 3 | |
R3456 | | 1 | 6 | 7 | 5 | | |8 | 4 | 3 | 2 | |
R3457 | | 1 | 6 | 7 | 8 | | | 2 | 3 | 4 | 5 | |
R3458 | | 1 | 6 | 7 | 8 | | |2 | 3 | 5 | 4 | |
For this small example, the method above will not be that computationally expensive. In fact, there are only 35 total partitions of 1:8
into 2 groups of size 4 out of a possible factorial(8) = 40320
permutations. However, just doubling the size of the vector will make this approach infeasible as there are over 10 trillion permutations of 1:16
.
The algorithm in comboGroups
avoids these duplicate partitions of groups by utilizing an efficient algorithm analogous to the std::next_permutation found in the standard algorithm library in C++.
By default, a matrix is returned with column names corresponding to the associated group. If retType = "3Darray"
, a named 3D array is returned.
The maximum number of partitions of groups that can be generated at one time is . Utilizing
lower
and upper
makes it possible to generate additional combinations/permutations.
The length of grpSizes
must equal numGroups
if both grpSize
and numGroups
are provided.
Joseph Wood
## return a matrix comboGroups(8, 2) ## or a 3 dimensional array temp = comboGroups(8, 2, retType = "3Darray") ## view the first partition temp[1, , ] ## Example with groups of varying size comboGroups(8, grpSizes = c(3, 5)) total = comboGroupsCount(11, grpSizes = c(3, 3, 5)) ## Start generating from particular index comboGroups(11, grpSizes = c(3, 3, 5), lower = total - 20)
## return a matrix comboGroups(8, 2) ## or a 3 dimensional array temp = comboGroups(8, 2, retType = "3Darray") ## view the first partition temp[1, , ] ## Example with groups of varying size comboGroups(8, grpSizes = c(3, 5)) total = comboGroupsCount(11, grpSizes = c(3, 3, 5)) ## Start generating from particular index comboGroups(11, grpSizes = c(3, 3, 5), lower = total - 20)
The ComboGroups
class is an S4-class that exposes C++ classes that provide access to iterators and other useful methods.
nextIter
Retrieve the next lexicographical result
nextNIter
Pass an integer n to retrieve the next n lexicographical results
nextRemaining
Retrieve all remaining lexicographical results
currIter
Returns the current iteration
startOver
Resets the iterator
sourceVector
View the source vector
summary
Returns a list of summary information about the iterator
front
Retrieve the first lexicographical result
back
Retrieve the last lexicographical result
randomAccess
Random access method. Pass a single value or a vector of valid indices. If a single value is passed, the internal index of the iterator will be updated, however if a vector is passed the internal state will not change. GMP support allows for flexible indexing.
Joseph Wood
Combo-class
, Constraints-class
, Partitions-class
showClass("ComboGroups")
showClass("ComboGroups")
Calculate the number of partitions of a vector into groups. See the related integer sequences A025035-A025042 at OEIS (E.g. A025036 for Number of partitions of into sets of size 4.)
comboGroupsCount(v, numGroups = NULL, grpSizes = NULL)
comboGroupsCount(v, numGroups = NULL, grpSizes = NULL)
v |
Source vector. If |
numGroups |
An Integer. The number of groups that the vector will be partitioned into. The default is |
grpSizes |
A vector of whole numbers representing the size of each group. The default is |
A numerical value representing the total number of partitions of groups.
When the number of results exceeds , a number of class
bigz
is returned.
Joseph Wood
comboGroupsCount(16, 4) comboGroupsCount(16, grpSizes = c(1:4, 6)) comboGroupsCount(28, grpSizes = rep(2:5, each = 2))
comboGroupsCount(16, 4) comboGroupsCount(16, grpSizes = c(1:4, 6)) comboGroupsCount(28, grpSizes = rep(2:5, each = 2))
Returns an iterator for iterating over partitions of a vector into groups.
Supports random access via the [[
method.
GMP support allows for exploration of cases where the number of comboGroups is large.
Use the next
methods to obtain results in lexicographical order.
comboGroupsIter(v, numGroups = NULL, grpSizes = NULL, retType = "matrix", Parallel = FALSE, nThreads = NULL)
comboGroupsIter(v, numGroups = NULL, grpSizes = NULL, retType = "matrix", Parallel = FALSE, nThreads = NULL)
v |
Source vector. If |
numGroups |
An Integer. The number of groups that the vector will be partitioned into. The default is |
grpSizes |
A vector of whole numbers representing the size of each group. The default is |
retType |
A string, "3Darray" or "matrix", that determines the shape of the output. The default is "matrix". Note, "3Darray" can only be used when the size of each group is uniform. When the size of each group varies, the return output will always be a matrix. |
Parallel |
Logical value indicating whether results should be generated in parallel using |
nThreads |
Specific number of threads to be used. The default is |
Once you initialize a new iterator, the following methods are available:
nextIter
Retrieve the next lexicographical result
nextNIter
Pass an integer n to retrieve the next n lexicographical results
nextRemaining
Retrieve all remaining lexicographical results
currIter
Returns the current iteration
startOver
Resets the iterator
sourceVector
View the source vector
summary
Returns a list of summary information about the iterator
front
Retrieve the first lexicographical result
back
Retrieve the last lexicographical result
[[
Random access method. Pass a single value or a vector of valid indices. If a single value is passed, the internal index of the iterator will be updated, however if a vector is passed the internal state will not change. GMP support allows for flexible indexing.
If nextIter
is called, a named vector is returned if retType = "matrix"
. If retType = "3Darray"
, a named matrix is returned.
Otherwise a named matrix is returned when retType = "matrix"
and a named 3D array is returned when retType = "3Darray"
.
If nThreads
is utilized, it will only take effect if the number of elements requested is greater than some threshold (determined internally). E.g:
serial <- comboGroupsIter(50, 10) multi <- comboGroupsIter(50, 10, nThreads = 4) fetch1e6 <- multi@nextNIter(1e6) ## much faster than serial@nextNIter(1e6) fetch1e3 <- multi@nextNIter(1e3) ## only one thread used... same as serial@nextNIter(1e3) library(microbenchmark) microbenchmark(multi@nextNIter(1e6), serial@nextNIter(1e6), times = 20) microbenchmark(multi@nextNIter(1e3), serial@nextNIter(1e3), times = 20)
The maximum number of comboGroups that can be generated at one time is .
Joseph Wood
a = comboGroupsIter(12, 3) a@nextIter() a@nextNIter(3) a@front() all_remaining = a@nextRemaining() dim(all_remaining) a@summary() a@back() a[[5]] a@summary() a[[c(1, 17, 3)]] a@summary()
a = comboGroupsIter(12, 3) a@nextIter() a@nextNIter(3) a@front() all_remaining = a@nextRemaining() dim(all_remaining) a@summary() a@back() a[[5]] a@summary() a[[c(1, 17, 3)]] a@summary()
Generate a specific (lexicographically) or random sample of partitions of groups.
Produce results in parallel using the Parallel
or nThreads
arguments.
GMP support allows for exploration where the number of results is large.
comboGroupsSample(v, numGroups = NULL, grpSizes = NULL, retType = "matrix", n = NULL, sampleVec = NULL, seed = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE)
comboGroupsSample(v, numGroups = NULL, grpSizes = NULL, retType = "matrix", n = NULL, sampleVec = NULL, seed = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE)
v |
Source vector. If |
numGroups |
An Integer. The number of groups that the vector will be partitioned into. The default is |
grpSizes |
A vector of whole numbers representing the size of each group. The default is |
retType |
A string, "3Darray" or "matrix", that determines the shape of the output. The default is "matrix". Note, "3Darray" can only be used when the size of each group is uniform. When the size of each group varies, the return output will always be a matrix. |
n |
Number of results to return. The default is |
sampleVec |
A vector of numbers representing the lexicographical partition of groups to return. Accepts vectors of class |
seed |
Random seed initialization. The default is |
Parallel |
Logical value indicating whether results should be generated in parallel. The default is |
nThreads |
Specific number of threads to be used. The default is |
namedSample |
Logical flag. If |
These algorithms rely on efficiently generating the lexicographical result.
By default, a matrix is returned with column names corresponding to the associated group. If retType = "3Darray"
, a 3D array is returned.
Joseph Wood
## generate 10 random partitions of groups of equal size comboGroupsSample(10, 2, n = 10, seed = 123) ## generate 10 random partitions of groups of varying sizes comboGroupsSample(10, grpSizes = 1:4, n = 10, seed = 123) ## using sampleVec to generate specific results comboGroupsSample(15, 5, sampleVec = c(1, 100, 1e3, 1e6)) all.equal(comboGroupsSample(10, 5, sampleVec = 1:comboGroupsCount(10, 5)), comboGroups(10, 5)) ## Examples with enormous number of total results num = comboGroupsCount(100, 20) gmp::log2.bigz(num) ## [1] 325.5498 first = gmp::urand.bigz(n = 1, size = 325, seed = 123) mySamp = do.call(c, lapply(0:10, function(x) gmp::add.bigz(first, x))) class(mySamp) ## [1] "bigz" ## using the sampling function cbgSamp = comboGroupsSample(100, 20, sampleVec = mySamp) ## using the standard function cbgGeneral = comboGroups(100, 20, lower = first, upper = gmp::add.bigz(first, 10)) identical(cbgSamp, cbgGeneral) ## [1] TRUE ## Not run: ## Using Parallel system.time(comboGroupsSample(1000, 20, n = 80, seed = 10, Parallel = TRUE)) ## End(Not run)
## generate 10 random partitions of groups of equal size comboGroupsSample(10, 2, n = 10, seed = 123) ## generate 10 random partitions of groups of varying sizes comboGroupsSample(10, grpSizes = 1:4, n = 10, seed = 123) ## using sampleVec to generate specific results comboGroupsSample(15, 5, sampleVec = c(1, 100, 1e3, 1e6)) all.equal(comboGroupsSample(10, 5, sampleVec = 1:comboGroupsCount(10, 5)), comboGroups(10, 5)) ## Examples with enormous number of total results num = comboGroupsCount(100, 20) gmp::log2.bigz(num) ## [1] 325.5498 first = gmp::urand.bigz(n = 1, size = 325, seed = 123) mySamp = do.call(c, lapply(0:10, function(x) gmp::add.bigz(first, x))) class(mySamp) ## [1] "bigz" ## using the sampling function cbgSamp = comboGroupsSample(100, 20, sampleVec = mySamp) ## using the standard function cbgGeneral = comboGroups(100, 20, lower = first, upper = gmp::add.bigz(first, 10)) identical(cbgSamp, cbgGeneral) ## [1] TRUE ## Not run: ## Using Parallel system.time(comboGroupsSample(1000, 20, n = 80, seed = 10, Parallel = TRUE)) ## End(Not run)
Returns an iterator for iterating over combinations or permutations of a vector with or without constraints.
Supports random access via the [[
method.
GMP support allows for exploration of combinations/permutations of vectors with many elements.
The output is in lexicographical order for the next
methods and reverse lexicographical order for the prev
methods.
Learn more in vignette("iterators")
.
comboIter(v, m = NULL, ...) permuteIter(v, m = NULL, ...) ## S3 method for class 'numeric' comboIter(v, m = NULL, repetition = FALSE, freqs = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'numeric' permuteIter(v, m = NULL, repetition = FALSE, freqs = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'factor' comboIter( v, m = NULL, repetition = FALSE, freqs = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'factor' permuteIter( v, m = NULL, repetition = FALSE, freqs = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, FUN.VALUE = NULL, ... ) ## Default S3 method: comboIter( v, m = NULL, repetition = FALSE, freqs = NULL, FUN = NULL, FUN.VALUE = NULL, ... ) ## Default S3 method: permuteIter( v, m = NULL, repetition = FALSE, freqs = NULL, FUN = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' comboIter( v, m = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' permuteIter( v, m = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'list' comboIter(v, m = NULL, repetition = FALSE, freqs = NULL, ...) ## S3 method for class 'list' permuteIter(v, m = NULL, repetition = FALSE, freqs = NULL, ...)
comboIter(v, m = NULL, ...) permuteIter(v, m = NULL, ...) ## S3 method for class 'numeric' comboIter(v, m = NULL, repetition = FALSE, freqs = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'numeric' permuteIter(v, m = NULL, repetition = FALSE, freqs = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ...) ## S3 method for class 'factor' comboIter( v, m = NULL, repetition = FALSE, freqs = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'factor' permuteIter( v, m = NULL, repetition = FALSE, freqs = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, FUN.VALUE = NULL, ... ) ## Default S3 method: comboIter( v, m = NULL, repetition = FALSE, freqs = NULL, FUN = NULL, FUN.VALUE = NULL, ... ) ## Default S3 method: permuteIter( v, m = NULL, repetition = FALSE, freqs = NULL, FUN = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' comboIter( v, m = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' permuteIter( v, m = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL, FUN.VALUE = NULL, ... ) ## S3 method for class 'list' comboIter(v, m = NULL, repetition = FALSE, freqs = NULL, ...) ## S3 method for class 'list' permuteIter(v, m = NULL, repetition = FALSE, freqs = NULL, ...)
v |
Source vector. If |
m |
Number of elements to choose. If |
... |
Further arguments passed to methods. |
repetition |
Logical value indicating whether combinations/permutations should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all combinations/permutations of a multiset of |
constraintFun |
Function to be applied to the elements of |
comparisonFun |
Comparison operator that will be used to compare When
In other words, the first comparison operator is applied to the first limit and the second operator is applied to the second limit. |
limitConstraints |
This is the value(s) that will be used for comparison. Can be passed as a single value or a vector of two numerical values. The default is |
keepResults |
A logical flag indicating if the result of |
FUN |
Function to be applied to each combination/permutation. The default is |
Parallel |
Logical value indicating whether combinations/permutations should be generated in parallel using |
nThreads |
Specific number of threads to be used. The default is |
tolerance |
A numeric value greater than or equal to zero. This parameter is utilized when a constraint is applied on a numeric vector. The default value is 0 when it can be determined that whole values are being utilized, otherwise it is |
FUN.VALUE |
A template for the return value from |
Once you initialize a new iterator, the following methods are available via @
(e.g. a@nextIter()
) or $
(e.g. a$nextIter()
). The preferred practice is to use @
as it is much more efficient (See examples below). Also note that not all of the methods below are available in all cases. See Combo-class
, Constraints-class
, and Partitions-class
:
nextIter
Retrieve the next lexicographical result
nextNIter
Pass an integer n to retrieve the next n lexicographical results
nextRemaining
Retrieve all remaining lexicographical results
currIter
Returns the current iteration
prevIter
Retrieve the previous lexicographical result (the next reverse lexicographical result)
prevNIter
Pass an integer n to retrieve the previous n lexicographical results (the next n reverse lexicographical results)
prevRemaining
Retrieve all remaining reverse lexicographical results
startOver
Resets the iterator
sourceVector
View the source vector
summary
Returns a list of summary information about the iterator
front
Retrieve the first lexicographical result
back
Retrieve the last lexicographical result
[[
Random access method. Pass a single value or a vector of valid indices. If a single value is passed, the internal index of the iterator will be updated, however if a vector is passed the internal state will not change. GMP support allows for flexible indexing.
If nextIter
or prevIter
is called, a vector is returned
Otherwise, a matrix with or
columns, depending on the value of
keepResults
If FUN
is utilized, FUN.VALUE = NULL
, and either nextIter
or prevIter
is called, the result will be determined by FUN
, otherwise a list is returned.
When both FUN
and FUN.VALUE
are not NULL
, the return is modeled after the return of vapply
. See the 'Value' section of vapply
.
Parallel
and nThreads
will be ignored in the following cases:
When the output is constrained (except for most partitions cases)
If the class of the vector passed is character
, raw
, and complex
(N.B. Rcpp::CharacterMatrix
is not thread safe). Alternatively, you can generate an indexing matrix in parallel.
If FUN
is utilized.
If either constraintFun
, comparisonFun
or limitConstraints
is NULL
–or– if the class of the vector passed is logical
, character
, raw
, factor
, or complex
, the constraint check will not be carried out. This is equivalent to simply finding all combinations/permutations of choose
.
The maximum number of combinations/permutations that can be generated at one time is .
Factor vectors are accepted. Class and level attributes are preserved except when FUN
is used.
Lexicographical ordering isn't guaranteed for permutations if the output is constrained.
FUN
will be ignored if the constraint check is satisfied.
Joseph Wood
## Typical usage a = permuteIter(unique(state.region)) a@nextIter() a@nextNIter(3) a@front() a@nextRemaining() a@prevIter() a@prevNIter(15) a@summary() a@back() a@prevRemaining() a[[5]] a@summary() a[[c(1, 17, 3)]] a@summary() ## See examples for comboGeneral where lower and upper are used set.seed(1009) mySamp = sort(rnorm(75, 997, 23)) b = comboIter(mySamp, 7, constraintFun = "sum", comparisonFun = ">", limitConstraints = 7200) b@nextIter() b@nextNIter(3) b@summary() b@currIter() ## Not run: ## We don't have random access or previous methods b@back() #> Error: no slot of name "back" for this object of class "Constraints" b@prevIter() #> Error: no slot of name "prevIter" for this object of class "Constraints" ## End(Not run)
## Typical usage a = permuteIter(unique(state.region)) a@nextIter() a@nextNIter(3) a@front() a@nextRemaining() a@prevIter() a@prevNIter(15) a@summary() a@back() a@prevRemaining() a[[5]] a@summary() a[[c(1, 17, 3)]] a@summary() ## See examples for comboGeneral where lower and upper are used set.seed(1009) mySamp = sort(rnorm(75, 997, 23)) b = comboIter(mySamp, 7, constraintFun = "sum", comparisonFun = ">", limitConstraints = 7200) b@nextIter() b@nextNIter(3) b@summary() b@currIter() ## Not run: ## We don't have random access or previous methods b@back() #> Error: no slot of name "back" for this object of class "Constraints" b@prevIter() #> Error: no slot of name "prevIter" for this object of class "Constraints" ## End(Not run)
Generate the rank (lexicographically) of combinations/permutations. These functions are the complement to comboSample
and permuteSample
. See the examples below.
GMP support allows for exploration of combinations/permutations of vectors with many elements.
comboRank(..., v, repetition = FALSE, freqs = NULL) permuteRank(..., v, repetition = FALSE, freqs = NULL)
comboRank(..., v, repetition = FALSE, freqs = NULL) permuteRank(..., v, repetition = FALSE, freqs = NULL)
... |
vectors or matrices to be ranked. |
v |
Source vector. If |
repetition |
Logical value indicating whether combinations/permutations should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all combinations/permutations of a multiset of |
These algorithms rely on efficiently ranking the lexicographical combination/permutation.
A vector of class integer
, numeric
, or bigz
determined by the total number of combinations/permutations
v
must be supplied.
Joseph Wood
Lexicographical order ranking/unranking
mySamp = comboSample(30, 8, TRUE, n = 5, seed = 10, namedSample = TRUE) myRank = comboRank(mySamp, v = 30, repetition = TRUE) all.equal(as.integer(rownames(mySamp)), myRank)
mySamp = comboSample(30, 8, TRUE, n = 5, seed = 10, namedSample = TRUE) myRank = comboRank(mySamp, v = 30, repetition = TRUE) all.equal(as.integer(rownames(mySamp)), myRank)
Generate a specific (lexicographically) or random sample of combinations/permutations.
Produce results in parallel using the Parallel
or nThreads
arguments.
GMP support allows for exploration of combinations/permutations of vectors with many elements.
comboSample(v, m = NULL, ...) permuteSample(v, m = NULL, ...) ## S3 method for class 'numeric' comboSample(v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ...) ## S3 method for class 'numeric' permuteSample(v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ...) ## S3 method for class 'factor' comboSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## S3 method for class 'factor' permuteSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## Default S3 method: comboSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## Default S3 method: permuteSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' comboSample( v, m = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' permuteSample( v, m = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## S3 method for class 'list' comboSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, namedSample = FALSE, ... ) ## S3 method for class 'list' permuteSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, namedSample = FALSE, ... )
comboSample(v, m = NULL, ...) permuteSample(v, m = NULL, ...) ## S3 method for class 'numeric' comboSample(v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ...) ## S3 method for class 'numeric' permuteSample(v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ...) ## S3 method for class 'factor' comboSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## S3 method for class 'factor' permuteSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## Default S3 method: comboSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## Default S3 method: permuteSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' comboSample( v, m = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## S3 method for class 'table' permuteSample( v, m = NULL, n = NULL, sampleVec = NULL, seed = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, namedSample = FALSE, FUN.VALUE = NULL, ... ) ## S3 method for class 'list' comboSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, namedSample = FALSE, ... ) ## S3 method for class 'list' permuteSample( v, m = NULL, repetition = FALSE, freqs = NULL, n = NULL, sampleVec = NULL, seed = NULL, namedSample = FALSE, ... )
v |
Source vector. If |
m |
Number of elements to choose. If |
... |
Further arguments passed to methods. |
repetition |
Logical value indicating whether combinations/permutations should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all combinations/permutations of a multiset of |
n |
Number of combinations/permutations to return. The default is |
sampleVec |
A vector of indices representing the lexicographical combination/permutations to return. Accepts whole numbers as well as vectors of class |
seed |
Random seed initialization. The default is |
FUN |
Function to be applied to each combination/permutation. The default is |
Parallel |
Logical value indicating whether combinations/permutations should be generated in parallel. The default is |
nThreads |
Specific number of threads to be used. The default is |
namedSample |
Logical flag. If |
FUN.VALUE |
A template for the return value from |
These algorithms rely on efficiently generating the lexicographical combination/permutation. This is the process of unranking.
In general, a matrix with or
columns, depending on the value of
keepResults
If FUN
is utilized and FUN.VALUE = NULL
, a list is returned
When both FUN
and FUN.VALUE
are not NULL
, the return is modeled after the return of vapply
. See the 'Value' section of vapply
.
Parallel
and nThreads
will be ignored in the following cases:
If the class of the vector passed is character
(N.B. Rcpp::CharacterMatrix
is not thread safe). Alternatively, you can generate an indexing matrix in parallel.
If FUN
is utilized.
n
and sampleVec
cannot both be NULL
.
Factor vectors are accepted. Class and level attributes are preserved except when FUN
is used.
Joseph Wood
## generate 10 random combinations comboSample(30, 8, TRUE, n = 5, seed = 10) ## Using sampleVec to generate specific permutations fqs = c(1,2,2,1,2,2,1,2,1,2,2,1,2,1,1) s_idx = c(1, 10^2, 10^5, 10^8, 10^11) permuteSample(15, 10, freqs = fqs, sampleVec = s_idx) ## Same example using 'table' method permuteSample(table(rep(1:15, times = fqs)), 10, sampleVec = s_idx) ## Generate each result one by one... ## Same, but not as efficient as generating iteratively all.equal(comboSample(10, 5, sampleVec = 1:comboCount(10, 5)), comboGeneral(10, 5)) ## Examples with enormous number of total permutations num = permuteCount(10000, 20) gmp::log2.bigz(num) first = gmp::urand.bigz(n = 1, size = 265, seed = 123) mySamp = do.call(c, lapply(0:10, function(x) gmp::add.bigz(first, x))) class(mySamp) ## using permuteSample pSamp = permuteSample(10000, 20, sampleVec = mySamp) ## using permuteGeneral pGeneral = permuteGeneral(10000, 20, lower = first, upper = gmp::add.bigz(first, 10)) identical(pSamp, pGeneral) ## Using nThreads permPar = permuteSample(10000, 50, n = 8, seed = 10, nThreads = 2) ## Using FUN permuteSample(10000, 50, n = 4, seed = 10, FUN = sd) ## Not run: ## Using Parallel permuteSample(10000, 50, n = 80, seed = 10, Parallel = TRUE) ## End(Not run)
## generate 10 random combinations comboSample(30, 8, TRUE, n = 5, seed = 10) ## Using sampleVec to generate specific permutations fqs = c(1,2,2,1,2,2,1,2,1,2,2,1,2,1,1) s_idx = c(1, 10^2, 10^5, 10^8, 10^11) permuteSample(15, 10, freqs = fqs, sampleVec = s_idx) ## Same example using 'table' method permuteSample(table(rep(1:15, times = fqs)), 10, sampleVec = s_idx) ## Generate each result one by one... ## Same, but not as efficient as generating iteratively all.equal(comboSample(10, 5, sampleVec = 1:comboCount(10, 5)), comboGeneral(10, 5)) ## Examples with enormous number of total permutations num = permuteCount(10000, 20) gmp::log2.bigz(num) first = gmp::urand.bigz(n = 1, size = 265, seed = 123) mySamp = do.call(c, lapply(0:10, function(x) gmp::add.bigz(first, x))) class(mySamp) ## using permuteSample pSamp = permuteSample(10000, 20, sampleVec = mySamp) ## using permuteGeneral pGeneral = permuteGeneral(10000, 20, lower = first, upper = gmp::add.bigz(first, 10)) identical(pSamp, pGeneral) ## Using nThreads permPar = permuteSample(10000, 50, n = 8, seed = 10, nThreads = 2) ## Using FUN permuteSample(10000, 50, n = 4, seed = 10, FUN = sd) ## Not run: ## Using Parallel permuteSample(10000, 50, n = 80, seed = 10, Parallel = TRUE) ## End(Not run)
The Constraints
class is an S4-class that exposes C++ classes that provide access to iterators and other useful methods.
nextIter
Retrieve the next lexicographical result
nextNIter
Pass an integer n to retrieve the next n lexicographical results
nextRemaining
Retrieve all remaining lexicographical results
currIter
Returns the current iteration
startOver
Resets the iterator
sourceVector
View the source vector
summary
Returns a list of summary information about the iterator
Joseph Wood
showClass("Constraints")
showClass("Constraints")
Function for generating the complete factorization for a vector of numbers.
divisorsRcpp(v, namedList = FALSE, nThreads = NULL)
divisorsRcpp(v, namedList = FALSE, nThreads = NULL)
v |
Vector of integers or numeric values. Non-integral values will be coerced to whole numbers. |
namedList |
Logical flag. If |
nThreads |
Specific number of threads to be used. The default is |
Efficient algorithm that builds on primeFactorize
to generate the complete factorization of many numbers.
Returns an unnamed vector if length(v) == 1
regardless of the value of namedList
. If , the class of the returned vector will be integer, otherwise the class will be numeric.
If length(v) > 1
, a named/unnamed list of vectors will be returned. If max(bound1, bound2)
, the class of each vector will be integer, otherwise the class will be numeric.
The maximum value for each element in is
.
Joseph Wood
## Get the complete factorization of a single number divisorsRcpp(10^8) ## Or get the complete factorization of many numbers set.seed(29) myVec <- sample(-1000000:1000000, 1000) system.time(myFacs <- divisorsRcpp(myVec)) ## Return named list myFacsWithNames <- divisorsRcpp(myVec, namedList = TRUE) ## Using nThreads system.time(divisorsRcpp(myVec, nThreads = 2))
## Get the complete factorization of a single number divisorsRcpp(10^8) ## Or get the complete factorization of many numbers set.seed(29) myVec <- sample(-1000000:1000000, 1000) system.time(myFacs <- divisorsRcpp(myVec)) ## Return named list myFacsWithNames <- divisorsRcpp(myVec, namedList = TRUE) ## Using nThreads system.time(divisorsRcpp(myVec, nThreads = 2))
Sieve that generates the complete factorization of all numbers between bound1
and bound2
(if supplied) or all numbers up to bound1
.
divisorsSieve(bound1, bound2 = NULL, namedList = FALSE, nThreads = NULL)
divisorsSieve(bound1, bound2 = NULL, namedList = FALSE, nThreads = NULL)
bound1 |
Positive integer or numeric value. |
bound2 |
Positive integer or numeric value. |
namedList |
Logical flag. If |
nThreads |
Specific number of threads to be used. The default is |
This function is useful when many complete factorizations are needed. Instead of generating the complete factorization on the fly, one can reference the indices/names of the generated list.
This algorithm benefits greatly from the fast integer division library 'libdivide'. The following is from https://libdivide.com/:
“libdivide allows you to replace expensive integer divides with comparatively cheap multiplication and bitshifts. Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime. The result is that integer division can become faster - a lot faster.”
Returns a named/unnamed list of integer vectors if max(bound1, bound2)
, or a list of numeric vectors otherwise.
The maximum value for either of the bounds is .
Joseph Wood
divisorsRcpp
, primeFactorizeSieve
## Generate some random data set.seed(33550336) mySamp <- sample(10^5, 5*10^4) ## Generate complete factorizations up ## to 10^5 (max element from mySamp) system.time(allFacs <- divisorsSieve(10^5)) ## Use generated complete factorization for further ## analysis by accessing the index of allFacs for (s in mySamp) { myFac <- allFacs[[s]] ## Continue algorithm } ## Generating complete factorizations over ## a range is efficient as well system.time(divisorsSieve(10^12, 10^12 + 10^5)) ## Use nThreads for improved efficiency system.time(divisorsSieve(10^12, 10^12 + 10^5, nThreads = 2)) ## Set 'namedList' to TRUE to return a named list divisorsSieve(27, 30, namedList = TRUE) ## Using nThreads system.time(divisorsSieve(1e5, 2e5, nThreads = 2))
## Generate some random data set.seed(33550336) mySamp <- sample(10^5, 5*10^4) ## Generate complete factorizations up ## to 10^5 (max element from mySamp) system.time(allFacs <- divisorsSieve(10^5)) ## Use generated complete factorization for further ## analysis by accessing the index of allFacs for (s in mySamp) { myFac <- allFacs[[s]] ## Continue algorithm } ## Generating complete factorizations over ## a range is efficient as well system.time(divisorsSieve(10^12, 10^12 + 10^5)) ## Use nThreads for improved efficiency system.time(divisorsSieve(10^12, 10^12 + 10^5, nThreads = 2)) ## Set 'namedList' to TRUE to return a named list divisorsSieve(27, 30, namedList = TRUE) ## Using nThreads system.time(divisorsSieve(1e5, 2e5, nThreads = 2))
Sieve that generates the number of coprime elements for every number between bound1
and bound2
(if supplied) or all numbers up to bound1
. This is equivalent to applying Euler's phi function (often written as ) to every number in a given range.
eulerPhiSieve(bound1, bound2 = NULL, namedVector = FALSE, nThreads = NULL)
eulerPhiSieve(bound1, bound2 = NULL, namedVector = FALSE, nThreads = NULL)
bound1 |
Positive integer or numeric value. |
bound2 |
Positive integer or numeric value. |
namedVector |
Logical flag. If |
nThreads |
Specific number of threads to be used. The default is |
For the simple case (i.e. when bound2 = NULL
), this algorithm first generates all primes up to via the sieve of Eratosthenes. We use these primes to sieve over the sequence
1:n
, dividing each value by , creating a temporary value that will be subtracted from the original value at each index (i.e. equivalent to multiply each index by
but more efficient as we don't have to deal with floating point numbers). The case when
is.null(bound2) = FALSE
is more complicated but the basic ideas still hold.
This function is very useful when you need to calculate Euler's phi function for many numbers in a range as performing this calculation on the fly can be computationally expensive.
This algorithm benefits greatly from the fast integer division library 'libdivide'. The following is from https://libdivide.com/:
“libdivide allows you to replace expensive integer divides with comparatively cheap multiplication and bitshifts. Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime. The result is that integer division can become faster - a lot faster.”
Returns a named/unnamed integer vector if max(bound1, bound2)
, or a numeric vector otherwise.
The maximum allowed value is .
Joseph Wood
## Generate some random data set.seed(496) mySamp <- sample(10^6, 5*10^5) ## Generate number of coprime elements for many numbers system.time(myPhis <- eulerPhiSieve(10^6)) ## Now use result in algorithm for (s in mySamp) { sPhi <- myPhis[s] ## Continue algorithm } ## See https://projecteuler.net system.time(which.max((1:10^6)/eulerPhiSieve(10^6))) ## Generating number of coprime elements ## for every number in a range is no problem system.time(myPhiRange <- eulerPhiSieve(10^13, 10^13 + 10^6)) ## Returning a named vector eulerPhiSieve(10, 20, namedVector = TRUE) eulerPhiSieve(10, namedVector = TRUE) ## Using nThreads system.time(eulerPhiSieve(1e5, 2e5, nThreads = 2))
## Generate some random data set.seed(496) mySamp <- sample(10^6, 5*10^5) ## Generate number of coprime elements for many numbers system.time(myPhis <- eulerPhiSieve(10^6)) ## Now use result in algorithm for (s in mySamp) { sPhi <- myPhis[s] ## Continue algorithm } ## See https://projecteuler.net system.time(which.max((1:10^6)/eulerPhiSieve(10^6))) ## Generating number of coprime elements ## for every number in a range is no problem system.time(myPhiRange <- eulerPhiSieve(10^13, 10^13 + 10^6)) ## Returning a named vector eulerPhiSieve(10, 20, namedVector = TRUE) eulerPhiSieve(10, namedVector = TRUE) ## Using nThreads system.time(eulerPhiSieve(1e5, 2e5, nThreads = 2))
Implementation of the Miller-Rabin primality test. Based on the "mp_prime_p" function from the "factorize.c" source file found in the gmp library: https://gmplib.org.
isPrimeRcpp(v, namedVector = FALSE, nThreads = NULL)
isPrimeRcpp(v, namedVector = FALSE, nThreads = NULL)
v |
Vector of integers or numeric values. |
namedVector |
Logical flag. If |
nThreads |
Specific number of threads to be used. The default is |
The Miller-Rabin primality test is a probabilistic algorithm that makes heavy use of modular exponentiation. At the heart of modular exponentiation is the ability to accurately obtain the remainder of the product of two numbers .
With the gmp library, producing accurate calculations for problems like this is trivial because of the nature of the multiple precision data type. However, standard C++ does not afford this luxury and simply relying on a strict translation would have limited this algorithm to numbers less than (N.B. We are taking advantage of the signed 64-bit fixed width integer from the stdint library in C++. If we were confined to base R, the limit would have been
). RcppAlgos::isPrimeRcpp gets around this limitation with a divide and conquer approach taking advantage of properties of arithmetic.
The problem we are trying to solve can be summarized as follows:
Now, we rewrite as
, so that we obtain:
Where each product for
is smaller than the original
. With this approach, we are now capable of handling much larger numbers. Many details have been omitted for clarity.
For a more in depth examination of this topic see Accurate Modular Arithmetic with Double Precision.
Returns a named/unnamed logical vector. If an index is TRUE
, the number at that index is prime, otherwise the number is composite.
The maximum value for each element in is
.
Conrad, Keith. "THE MILLER-RABIN TEST." https://www.math.uconn.edu/~kconrad/blurbs/ugradnumthy/millerrabin.pdf.
## check the primality of a single number isPrimeRcpp(100) ## check the primality of every number in a vector isPrimeRcpp(1:100) set.seed(42) mySamp <- sample(10^13, 10) ## return named vector for easy identification isPrimeRcpp(mySamp, namedVector = TRUE) ## Using nThreads system.time(isPrimeRcpp(mySamp, nThreads = 2))
## check the primality of a single number isPrimeRcpp(100) ## check the primality of every number in a vector isPrimeRcpp(1:100) set.seed(42) mySamp <- sample(10^13, 10) ## return named vector for easy identification isPrimeRcpp(mySamp, namedVector = TRUE) ## Using nThreads system.time(isPrimeRcpp(mySamp, nThreads = 2))
Sieve that generates the number of divisors for every number between bound1
and bound2
(if supplied) or all numbers up to bound1
. This is equivalent to applying the divisor function (often written as ) to every number in a given range.
numDivisorSieve(bound1, bound2 = NULL, namedVector = FALSE, nThreads = NULL)
numDivisorSieve(bound1, bound2 = NULL, namedVector = FALSE, nThreads = NULL)
bound1 |
Positive integer or numeric value. |
bound2 |
Positive integer or numeric value. |
namedVector |
Logical flag. If |
nThreads |
Specific number of threads to be used. The default is |
Simple and efficient sieve that calculates the number of divisors for every number in a given range. This function is very useful when you need to calculate the number of divisors for many numbers.
This algorithm benefits greatly from the fast integer division library 'libdivide'. The following is from https://libdivide.com/:
“libdivide allows you to replace expensive integer divides with comparatively cheap multiplication and bitshifts. Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime. The result is that integer division can become faster - a lot faster.”
Returns a named/unnamed integer vector
The maximum allowed value is .
Joseph Wood
## Generate some random data set.seed(8128) mySamp <- sample(10^6, 5*10^5) ## Generate number of divisors for ## every number less than a million system.time(mySigmas <- numDivisorSieve(10^6)) ## Now use result in algorithm for (s in mySamp) { sSig <- mySigmas[s] ## Continue algorithm } ## Generating number of divisors for every ## number in a range is no problem system.time(sigmaRange <- numDivisorSieve(10^13, 10^13 + 10^6)) ## Returning a named vector numDivisorSieve(10, 20, namedVector = TRUE) numDivisorSieve(10, namedVector = TRUE) ## Using nThreads system.time(numDivisorSieve(1e5, 2e5, nThreads = 2))
## Generate some random data set.seed(8128) mySamp <- sample(10^6, 5*10^5) ## Generate number of divisors for ## every number less than a million system.time(mySigmas <- numDivisorSieve(10^6)) ## Now use result in algorithm for (s in mySamp) { sSig <- mySigmas[s] ## Continue algorithm } ## Generating number of divisors for every ## number in a range is no problem system.time(sigmaRange <- numDivisorSieve(10^13, 10^13 + 10^6)) ## Returning a named vector numDivisorSieve(10, 20, namedVector = TRUE) numDivisorSieve(10, namedVector = TRUE) ## Using nThreads system.time(numDivisorSieve(1e5, 2e5, nThreads = 2))
The Partitions
class is an S4-class that exposes C++ classes that provide access to iterators and other useful methods.
nextIter
Retrieve the next lexicographical result
nextNIter
Pass an integer n to retrieve the next n lexicographical results
nextRemaining
Retrieve all remaining lexicographical results
currIter
Returns the current iteration
startOver
Resets the iterator
sourceVector
View the source vector
summary
Returns a list of summary information about the iterator
front
Retrieve the first lexicographical result
back
Retrieve the last lexicographical result
randomAccess
Random access method. Pass a single value or a vector of valid indices. If a single value is passed, the internal index of the iterator will be updated, however if a vector is passed the internal state will not change. GMP support allows for flexible indexing.
Joseph Wood
Combo-class
, Constraints-class
showClass("Partitions")
showClass("Partitions")
Calculate the number of partitions/compositions of a vector chosen at a time with or without replacement. Additionally, these functions can calculate the number of partitions of multisets.
partitionsCount(v, m = NULL, ...) compositionsCount(v, m = NULL, ...) ## Default S3 method: partitionsCount(v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, ...) ## Default S3 method: compositionsCount(v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE, ...) ## S3 method for class 'table' partitionsCount(v, m = NULL, target = NULL, ...) ## S3 method for class 'table' compositionsCount(v, m = NULL, target = NULL, weak = FALSE, ...)
partitionsCount(v, m = NULL, ...) compositionsCount(v, m = NULL, ...) ## Default S3 method: partitionsCount(v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, ...) ## Default S3 method: compositionsCount(v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE, ...) ## S3 method for class 'table' partitionsCount(v, m = NULL, target = NULL, ...) ## S3 method for class 'table' compositionsCount(v, m = NULL, target = NULL, weak = FALSE, ...)
v |
Source vector. If |
m |
Width of the partition. If |
... |
Further arguments passed to methods. |
repetition |
Logical value indicating whether partitions/compositions should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all partitions of a multiset of |
target |
Number to be partitioned. If |
weak |
(Compositions only) Logical flag indicating whether to allow terms of the sequence to be zero. |
A numerical value representing the total number of partitions/compositions.
When the number of results exceeds , a number of class
bigz
is returned.
partitionsGeneral
, compositionsGeneral
## Same interface as partitionsGeneral partitionsCount(25, 5) compositionsCount(25, 5, TRUE) partitionsCount(15, 7, TRUE) partitionsCount(25, 5, freqs = rep(2, 25)) ## Return object of class 'bigz' partitionsCount(2500, 15, TRUE) compositionsCount(2500, 15, TRUE)
## Same interface as partitionsGeneral partitionsCount(25, 5) compositionsCount(25, 5, TRUE) partitionsCount(15, 7, TRUE) partitionsCount(25, 5, freqs = rep(2, 25)) ## Return object of class 'bigz' partitionsCount(2500, 15, TRUE) compositionsCount(2500, 15, TRUE)
The algorithms in RcppAlgos
go beyond the traditional integer partition algorithms and can tackle a wide variety of cases.
Efficient algorithms for partitioning numbers under various constraints:
Standard (with repetition)
Distinct
Restricted
Where each part has a specific multiplicity (i.e. when using freqs
for multisets).
Arbitrary target and source vector (e.g. partitionsGeneral(sample(1000, 20), 10, TRUE, target = 5000)
)
Produce results in parallel using the nThreads
arguments.
Alternatively, the arguments lower
and upper
make it possible to generate partitions/compositions in chunks allowing for parallelization via the parallel package.
GMP support allows for exploration of cases where the number of partitions/compositions is large.
The output is in lexicographical order.
partitionsGeneral(v, m = NULL, ...) compositionsGeneral(v, m = NULL, ...) ## Default S3 method: partitionsGeneral( v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, lower = NULL, upper = NULL, nThreads = NULL, tolerance = NULL, ... ) ## Default S3 method: compositionsGeneral( v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE, lower = NULL, upper = NULL, nThreads = NULL, tolerance = NULL, ... ) ## S3 method for class 'table' partitionsGeneral( v, m = NULL, target = NULL, lower = NULL, upper = NULL, nThreads = NULL, tolerance = NULL, ... ) ## S3 method for class 'table' compositionsGeneral( v, m = NULL, target = NULL, weak = FALSE, lower = NULL, upper = NULL, nThreads = NULL, tolerance = NULL, ... )
partitionsGeneral(v, m = NULL, ...) compositionsGeneral(v, m = NULL, ...) ## Default S3 method: partitionsGeneral( v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, lower = NULL, upper = NULL, nThreads = NULL, tolerance = NULL, ... ) ## Default S3 method: compositionsGeneral( v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE, lower = NULL, upper = NULL, nThreads = NULL, tolerance = NULL, ... ) ## S3 method for class 'table' partitionsGeneral( v, m = NULL, target = NULL, lower = NULL, upper = NULL, nThreads = NULL, tolerance = NULL, ... ) ## S3 method for class 'table' compositionsGeneral( v, m = NULL, target = NULL, weak = FALSE, lower = NULL, upper = NULL, nThreads = NULL, tolerance = NULL, ... )
v |
Source vector. If |
m |
Width of the partition. If |
... |
Further arguments passed to methods. |
repetition |
Logical value indicating whether partitions/compositions should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all partitions of a multiset of |
lower |
The lower bound. Partitions/compositions are generated lexicographically, thus utilizing this argument will determine which specific partition to start generating from (e.g. |
upper |
The upper bound. Similar to |
target |
Number to be partitioned. If |
weak |
(Compositions only) Logical flag indicating whether to allow terms of the sequence to be zero. |
nThreads |
Specific number of threads to be used. The default is |
tolerance |
A numeric value greater than or equal to zero. This parameter is utilized when a constraint is applied on a numeric vector. The default value is 0 when it can be determined that whole values are being utilized, otherwise it is |
A matrix is returned with each row containing a vector of length .
nThreads
will be ignored in the following cases (i.e. Generating the partition in these cases are currently unavailable):
With standard multisets. If zero is the only element with a non-trivial multiplicity, multithreading is possible (e.g. partitionsGeneral(0:100, freqs = c(100, rep(1, 100)), nThreads = 4)
).
If the source vector is not isomorphic to 1:length(v)
(e.g. v = c(1, 4, 6, 7, 8)
).
The maximum number of partitions/compositions that can be generated at one time is . Utilizing
lower
and upper
makes it possible to generate additional partitions/compositions.
Joseph Wood
partitionsGeneral(1) partitionsGeneral(-1:0, 1) partitionsGeneral(-1:0, 1, target = -1) partitionsGeneral(20, 5) partitionsGeneral(20, 5, repetition = TRUE) partitionsGeneral(20, 5, freqs = rep(1:4, 5)) partitionsGeneral(20, 5, TRUE, target = 80) partitionsGeneral(0:10, repetition = TRUE) partitionsGeneral(seq(2L, 500L, 23L), 5, target = 1804) compositionsGeneral(0:10, 5, repetition = TRUE) set.seed(111) partitionsGeneral(sample(1000, 20), 5, TRUE, target = 2500) system.time(one_thread <- partitionsGeneral(80, 10, TRUE)) system.time(two_threads <- partitionsGeneral(80, 10, TRUE, nThreads = 2)) identical(one_thread, two_threads)
partitionsGeneral(1) partitionsGeneral(-1:0, 1) partitionsGeneral(-1:0, 1, target = -1) partitionsGeneral(20, 5) partitionsGeneral(20, 5, repetition = TRUE) partitionsGeneral(20, 5, freqs = rep(1:4, 5)) partitionsGeneral(20, 5, TRUE, target = 80) partitionsGeneral(0:10, repetition = TRUE) partitionsGeneral(seq(2L, 500L, 23L), 5, target = 1804) compositionsGeneral(0:10, 5, repetition = TRUE) set.seed(111) partitionsGeneral(sample(1000, 20), 5, TRUE, target = 2500) system.time(one_thread <- partitionsGeneral(80, 10, TRUE)) system.time(two_threads <- partitionsGeneral(80, 10, TRUE, nThreads = 2)) identical(one_thread, two_threads)
Returns an iterator for iterating over partitions/compositions of a numbers.
Supports random access via the [[
method.
GMP support allows for exploration of cases where the number of partitions/compositions is large.
Use the next
methods to obtain results in lexicographical order.
partitionsIter(v, m = NULL, ...) compositionsIter(v, m = NULL, ...) ## Default S3 method: partitionsIter(v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, nThreads = NULL, tolerance = NULL, ...) ## Default S3 method: compositionsIter(v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE, nThreads = NULL, tolerance = NULL, ...) ## S3 method for class 'table' partitionsIter( v, m = NULL, target = NULL, nThreads = NULL, tolerance = NULL, ... ) ## S3 method for class 'table' compositionsIter( v, m = NULL, target = NULL, weak = FALSE, nThreads = NULL, tolerance = NULL, ... )
partitionsIter(v, m = NULL, ...) compositionsIter(v, m = NULL, ...) ## Default S3 method: partitionsIter(v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, nThreads = NULL, tolerance = NULL, ...) ## Default S3 method: compositionsIter(v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE, nThreads = NULL, tolerance = NULL, ...) ## S3 method for class 'table' partitionsIter( v, m = NULL, target = NULL, nThreads = NULL, tolerance = NULL, ... ) ## S3 method for class 'table' compositionsIter( v, m = NULL, target = NULL, weak = FALSE, nThreads = NULL, tolerance = NULL, ... )
v |
Source vector. If |
m |
Width of the partition. If |
... |
Further arguments passed to methods. |
repetition |
Logical value indicating whether partitions/compositions should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all partitions of a multiset of |
target |
Number to be partitioned. If |
weak |
(Compositions only) Logical flag indicating whether to allow terms of the sequence to be zero. |
nThreads |
Specific number of threads to be used. The default is |
tolerance |
A numeric value greater than or equal to zero. This parameter is utilized when a constraint is applied on a numeric vector. The default value is 0 when it can be determined that whole values are being utilized, otherwise it is |
Once you initialize a new iterator, the following methods are available:
nextIter
Retrieve the next lexicographical result
nextNIter
Pass an integer n to retrieve the next n lexicographical results
nextRemaining
Retrieve all remaining lexicographical results
currIter
Returns the current iteration
startOver
Resets the iterator
sourceVector
View the source vector
summary
Returns a list of summary information about the iterator
front
Retrieve the first lexicographical result
back
Retrieve the last lexicographical result
[[
Random access method. Pass a single value or a vector of valid indices. If a single value is passed, the internal index of the iterator will be updated, however if a vector is passed the internal state will not change. GMP support allows for flexible indexing.
If nextIter
is called, a vector is returned
Otherwise, a matrix with columns
If nThreads
is utilized, it will only take effect if the number of elements requested is greater than some threshold (determined internally). E.g:
serial <- partitionsIter(1000, 10) multi <- partitionsIter(1000, 10, nThreads = 4) fetch1e6 <- multi@nextNIter(1e6) ## much faster than serial@nextNIter(1e6) fetch1e3 <- multi@nextNIter(1e3) ## only one thread used... same as serial@nextNIter(1e3) library(microbenchmark) microbenchmark(multi@nextNIter(1e6), serial@nextNIter(1e6)) microbenchmark(multi@nextNIter(1e3), serial@nextNIter(1e3))
nThreads
will be ignored in the following cases (i.e. Generating the partition in these cases are currently unavailable):
With standard multisets. If zero is the only element with a non-trivial multiplicity, multithreading is possible.
If the source vector is not isomorphic to 1:length(v)
The maximum number of partitions/compositions that can be generated at one time is .
Joseph Wood
partitionsGeneral
, compositionsGeneral
a = partitionsIter(0:10, repetition = TRUE) a@nextIter() a@nextNIter(3) a@front() a@nextRemaining() a@summary() a@back() a[[5]] a@summary() a[[c(1, 17, 3)]] a@summary() ## Multisets... no random access b = partitionsIter(40, 5, freqs = rep(1:4, 10), target = 80) b@nextIter() b@nextNIter(10) b@summary() b@nextIter() b@currIter()
a = partitionsIter(0:10, repetition = TRUE) a@nextIter() a@nextNIter(3) a@front() a@nextRemaining() a@summary() a@back() a[[5]] a@summary() a[[c(1, 17, 3)]] a@summary() ## Multisets... no random access b = partitionsIter(40, 5, freqs = rep(1:4, 10), target = 80) b@nextIter() b@nextNIter(10) b@summary() b@nextIter() b@currIter()
Generate the rank (lexicographically) of partitions/compositions. These functions are the complement to partitions/compositionsSample
. See the examples below.
GMP support allows for exploration of partitions/compositions of vectors with many elements.
partitionsRank(..., v, repetition = FALSE, freqs = NULL, target = NULL) compositionsRank(..., v, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE)
partitionsRank(..., v, repetition = FALSE, freqs = NULL, target = NULL) compositionsRank(..., v, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE)
... |
vectors or matrices to be ranked. |
v |
Source vector. If |
repetition |
Logical value indicating whether partitions/compositions should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all partitions of a multiset of |
target |
Number to be partitioned. If |
weak |
(Compositions only) Logical flag indicating whether to allow terms of the sequence to be zero. |
These algorithms rely on efficiently ranking the lexicographical partition.
A vector of class integer
, numeric
, or bigz
determined by the total number of partitions/compositions
v
must be supplied.
Joseph Wood
Lexicographical order ranking/unranking
partitionsSample
, compositionsSample
mySamp = partitionsSample(30, 8, TRUE, n = 5, seed = 10, namedSample = TRUE) myRank = partitionsRank(mySamp, v = 30, repetition = TRUE) all.equal(as.integer(rownames(mySamp)), myRank)
mySamp = partitionsSample(30, 8, TRUE, n = 5, seed = 10, namedSample = TRUE) myRank = partitionsRank(mySamp, v = 30, repetition = TRUE) all.equal(as.integer(rownames(mySamp)), myRank)
Generate a specific (lexicographically) or random sample of partitions/compositions of a number.
Produce results in parallel using the Parallel
or nThreads
arguments.
GMP support allows for exploration of cases where the number of partitions/compositions is large.
partitionsSample(v, m = NULL, ...) compositionsSample(v, m = NULL, ...) ## Default S3 method: partitionsSample( v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, n = NULL, sampleVec = NULL, seed = NULL, nThreads = NULL, namedSample = FALSE, ... ) ## Default S3 method: compositionsSample( v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE, n = NULL, sampleVec = NULL, seed = NULL, nThreads = NULL, namedSample = FALSE, ... ) ## S3 method for class 'table' partitionsSample( v, m = NULL, target = NULL, n = NULL, sampleVec = NULL, seed = NULL, nThreads = NULL, namedSample = FALSE, ... ) ## S3 method for class 'table' compositionsSample( v, m = NULL, target = NULL, weak = FALSE, n = NULL, sampleVec = NULL, seed = NULL, nThreads = NULL, namedSample = FALSE, ... )
partitionsSample(v, m = NULL, ...) compositionsSample(v, m = NULL, ...) ## Default S3 method: partitionsSample( v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, n = NULL, sampleVec = NULL, seed = NULL, nThreads = NULL, namedSample = FALSE, ... ) ## Default S3 method: compositionsSample( v, m = NULL, repetition = FALSE, freqs = NULL, target = NULL, weak = FALSE, n = NULL, sampleVec = NULL, seed = NULL, nThreads = NULL, namedSample = FALSE, ... ) ## S3 method for class 'table' partitionsSample( v, m = NULL, target = NULL, n = NULL, sampleVec = NULL, seed = NULL, nThreads = NULL, namedSample = FALSE, ... ) ## S3 method for class 'table' compositionsSample( v, m = NULL, target = NULL, weak = FALSE, n = NULL, sampleVec = NULL, seed = NULL, nThreads = NULL, namedSample = FALSE, ... )
v |
Source vector. If |
m |
Width of the partition. If |
... |
Further arguments passed to methods. |
repetition |
Logical value indicating whether partitions/compositions should be with or without repetition. The default is |
freqs |
A vector of frequencies used for producing all partitions of a multiset of |
target |
Number to be partitioned. If |
weak |
(Compositions only) Logical flag indicating whether to allow terms of the sequence to be zero. |
n |
Number of partitions/compositions to return. The default is |
sampleVec |
A vector of numbers representing the lexicographical partitions/compositions to return. Accepts vectors of class |
seed |
Random seed initialization. The default is |
nThreads |
Specific number of threads to be used. The default is |
namedSample |
Logical flag. If |
These algorithms rely on efficiently generating the lexicographical partition. This is the process of unranking.
A matrix is returned with each row containing a vector of length .
partitionsSample
is not available for the following cases:
With standard multisets. If zero is the only element with a non-trivial multiplicity, sampling is allowed (e.g. partitionsSample(0:100, freqs = c(100, rep(1, 100)), n = 2)
)
If the source vector is not isomorphic to 1:length(v)
(e.g. v = c(1, 4, 6, 7, 8)
).
n
and sampleVec
cannot both be NULL
.
Joseph Wood
partitionsSample(100, 10, n = 5) partitionsSample(100, 10, seed = 42, n = 5, target = 200) ## retrieve specific results (lexicographically) partitionsCount(100, 10, TRUE, target = 500) ## [1] 175591757896 partitionsSample(100, 10, TRUE, target = 500, sampleVec = c(1, 1000, 175591757896))
partitionsSample(100, 10, n = 5) partitionsSample(100, 10, seed = 42, n = 5, target = 200) ## retrieve specific results (lexicographically) partitionsCount(100, 10, TRUE, target = 500) ## [1] 175591757896 partitionsSample(100, 10, TRUE, target = 500, sampleVec = c(1, 1000, 175591757896))
Prime counting function for counting the prime numbers less than an integer, , using Legendre's formula. It is based on the the algorithm developed by Kim Walisch found here: kimwalisch/primecount.
primeCount(n, nThreads = NULL)
primeCount(n, nThreads = NULL)
n |
Positive number |
nThreads |
Specific number of threads to be used. The default is |
Legendre's Formula for counting the number of primes less than makes use of the inclusion-exclusion principle to avoid explicitly counting every prime up to
. It is given by:
Where is the number of positive integers less than or equal to
that are relatively prime to the first
primes (i.e. not divisible by any of the first
primes). It is given by the recurrence relation (
is the
prime (e.g.
)):
This algorithm implements five modifications developed by Kim Walisch for calculating efficiently.
Cache results of
Calculate using
if
prime[a]
prime[a]
(i.e. Euler's totient function)
Calculate using
lookup table
Calculate all upfront
Stop recursion at if
or
instead of
Whole number representing the number of prime numbers less than or equal to .
The maximum value of is
Joseph Wood
Computing : the combinatorial method
Tomás Oliveira e Silva, Computing pi(x): the combinatorial method, Revista do DETUA, vol. 4, no. 6, March 2006, p. 761. https://sweet.ua.pt/tos/bib/5.4.pdf
## Get the number of primes less than a billion primeCount(10^9) ## Using nThreads system.time(primeCount(10^10, nThreads = 2))
## Get the number of primes less than a billion primeCount(10^9) ## Using nThreads system.time(primeCount(10^10, nThreads = 2))
Implementation of Pollard's rho algorithm for generating the prime factorization. The algorithm is based on the "factorize.c" source file from the gmp library found here https://gmplib.org.
primeFactorize(v, namedList = FALSE, nThreads = NULL)
primeFactorize(v, namedList = FALSE, nThreads = NULL)
v |
Vector of integers or numeric values. Non-integral values will be cured to whole numbers. |
namedList |
Logical flag. If |
nThreads |
Specific number of threads to be used. The default is |
As noted in the Description section above, this algorithm is based on the "factorize.c" source code from the gmp library. Much of the code in RcppAlgos::primeFactorize is a straightforward translation from multiple precision C data types to standard C++ data types. A crucial part of the algorithm's efficiency is based on quickly determining primality, which is easily computed with gmp. However, with standard C++, this is quite challenging. Much of the research for RcppAlgos::primeFactorize was focused on developing an algorithm that could accurately and efficiently compute primality.
For more details, see the documentation for isPrimeRcpp
.
Returns an unnamed vector if length(v) == 1
regardless of the value of namedList
. If , the class of the returned vector will be integer, otherwise the class will be numeric.
If length(v) > 1
, a named/unnamed list of vectors will be returned. If max(bound1, bound2)
, the class of each vector will be integer, otherwise the class will be numeric.
The maximum value for each element in is
.
Joseph Wood
primeFactorizeSieve
, factorize
## Get the prime factorization of a single number primeFactorize(10^8) ## Or get the prime factorization of many numbers set.seed(29) myVec <- sample(-1000000:1000000, 1000) system.time(pFacs <- primeFactorize(myVec)) ## Return named list pFacsWithNames <- primeFactorize(myVec, namedList = TRUE) ## Using nThreads system.time(primeFactorize(myVec, nThreads = 2))
## Get the prime factorization of a single number primeFactorize(10^8) ## Or get the prime factorization of many numbers set.seed(29) myVec <- sample(-1000000:1000000, 1000) system.time(pFacs <- primeFactorize(myVec)) ## Return named list pFacsWithNames <- primeFactorize(myVec, namedList = TRUE) ## Using nThreads system.time(primeFactorize(myVec, nThreads = 2))
Generates the prime factorization of all numbers between bound1
and bound2
(if supplied) or all numbers up to bound1
.
primeFactorizeSieve(bound1, bound2 = NULL, namedList = FALSE, nThreads = NULL)
primeFactorizeSieve(bound1, bound2 = NULL, namedList = FALSE, nThreads = NULL)
bound1 |
Positive integer or numeric value. |
bound2 |
Positive integer or numeric value. |
namedList |
Logical flag. If |
nThreads |
Specific number of threads to be used. The default is |
This function is useful when many prime factorizations are needed. Instead of generating the prime factorization on the fly, one can reference the indices/names of the generated list.
This algorithm benefits greatly from the fast integer division library 'libdivide'. The following is from https://libdivide.com/:
“libdivide allows you to replace expensive integer divides with comparatively cheap multiplication and bitshifts. Compilers usually do this, but only when the divisor is known at compile time. libdivide allows you to take advantage of it at runtime. The result is that integer division can become faster - a lot faster.”
Returns a named/unnamed list of integer vectors if max(bound1, bound2)
, or a list of numeric vectors otherwise.
The maximum value for either of the bounds is .
Joseph Wood
primeFactorize
, divisorsSieve
, factorize
## Generate some random data set.seed(28) mySamp <- sample(10^5, 5*10^4) ## Generate prime factorizations up ## to 10^5 (max element from mySamp) system.time(allPFacs <- primeFactorizeSieve(10^5)) ## Use generated prime factorization for further ## analysis by accessing the index of allPFacs for (s in mySamp) { pFac <- allPFacs[[s]] ## Continue algorithm } ## Generating prime factorizations over ## a range is efficient as well system.time(primeFactorizeSieve(10^12, 10^12 + 10^5)) ## Set 'namedList' to TRUE to return a named list primeFactorizeSieve(27, 30, namedList = TRUE) ## Using nThreads system.time(primeFactorizeSieve(1e4, 5e4, nThreads = 2))
## Generate some random data set.seed(28) mySamp <- sample(10^5, 5*10^4) ## Generate prime factorizations up ## to 10^5 (max element from mySamp) system.time(allPFacs <- primeFactorizeSieve(10^5)) ## Use generated prime factorization for further ## analysis by accessing the index of allPFacs for (s in mySamp) { pFac <- allPFacs[[s]] ## Continue algorithm } ## Generating prime factorizations over ## a range is efficient as well system.time(primeFactorizeSieve(10^12, 10^12 + 10^5)) ## Set 'namedList' to TRUE to return a named list primeFactorizeSieve(27, 30, namedList = TRUE) ## Using nThreads system.time(primeFactorizeSieve(1e4, 5e4, nThreads = 2))
Implementation of the segmented sieve of Eratosthenes with wheel factorization. Generates all prime numbers between bound1
and bound2
(if supplied) or all primes up to bound1
. See this stackoverflow post for an analysis on prime number generation efficiency in R: Generate a list of primes up to a certain number
The fundamental concepts of this algorithm are based off of the implementation by Kim Walisch found here: kimwalisch/primesieve.
primeSieve(bound1, bound2 = NULL, nThreads = NULL)
primeSieve(bound1, bound2 = NULL, nThreads = NULL)
bound1 |
Positive integer or numeric value. |
bound2 |
Positive integer or numeric value. |
nThreads |
Specific number of threads to be used. The default is |
At the heart of this algorithm is the traditional sieve of Eratosthenes (i.e. given a prime , mark all multiples of
as composite), however instead of sieving the entire interval, we only consider small sub-intervals. The benefits of this method are two fold:
Reduction of the space complexity from , for the traditional sieve, to
Reduction of cache misses
The latter is of particular importance as cache memory is much more efficient and closer in proximity to the CPU than main memory. Reducing the size of the sieving interval allows for more effective utilization of the cache, which greatly impacts the overall efficiency.
Another optimization over the traditional sieve is the utilization of wheel factorization. With the traditional sieve of Eratosthenes, you typically check every odd index of your logical vector and if the value is true, you have found a prime. With wheel factorization using the first four primes (i.e. 2, 3, 5, and 7) to construct your wheel (i.e. 210 wheel), you only have to check indices of your logical vector that are coprime to 210 (i.e. the product of the first four primes). As an example, with and a 210 wheel, you only have to check 2285 indices vs. 5000 with the classical implementation.
Returns an integer vector if max(bound1, bound2)
, or a numeric vector otherwise.
It does not matter which bound is larger as the resulting primes will be between min(bound1, bound2)
and max(bound1, bound2)
if bound2
is provided.
The maximum value for either of the bounds is .
Joseph Wood
## Primes up to a thousand primeSieve(100) ## Primes between 42 and 17 primeSieve(42, 17) ## Equivalent to primeSieve(17, 42) ## Primes up to one hundred million in no time system.time(primeSieve(10^8)) ## options(scipen = 50) ## Generate large primes over interval system.time(myPs <- primeSieve(10^13+10^6, 10^13)) ## Object created is small object.size(myPs) ## Using nThreads system.time(primeSieve(1e7, nThreads = 2))
## Primes up to a thousand primeSieve(100) ## Primes between 42 and 17 primeSieve(42, 17) ## Equivalent to primeSieve(17, 42) ## Primes up to one hundred million in no time system.time(primeSieve(10^8)) ## options(scipen = 50) ## Generate large primes over interval system.time(myPs <- primeSieve(10^13+10^6, 10^13)) ## Object created is small object.size(myPs) ## Using nThreads system.time(primeSieve(1e7, nThreads = 2))
Wrapper of std::thread::hardware_concurrency(). As stated by cppreference, the returned value should be considered only a hint.
stdThreadMax()
stdThreadMax()
An integer representing the number of concurrent threads supported by the user implementation. If the value cannot be determined, 1L
is returned.
stdThreadMax()
stdThreadMax()