The hidden R options for deescalating the error for using useNames = NA
to a warning has been removed; useNames = NA
is now always
an error.
Calling colRanks()
and rowRanks()
without explicitly specifying
argument ties.method
is deprecated since version 1.3.0
[2024-04-10]. If not explicitly specified, a deprecation warning is
now produced every 10:th call not specifying the ties.method
argument.
colTabulates()
and rowTabulates()
asserting that double values are passed, reported on the class of
the input data, not the storage type.runtime error: null pointer passed as argument 1, which is declared to never be null
bug introduced in v1.4.0 that was
detected by the UndefinedBehaviorSanitizer (UBSan) running on CRAN.rowSums2()
is now significantly faster for larger matrices.None of the error messages use a trailing period.
Addressing changes in the C API of R-devel resulted in compiler
errors such as error: implicit declaration of function 'Calloc'; did you mean 'calloc'? [-Wimplicit-function-declaration]
.
Addressing changes in stricter compiler flags of R-devel resulted
in compiler warning embedding a directive within macro arguments has undefined behavior [-Wembedded-directive]
.
colRanks()
and rowRanks()
without explicitly specifying
argument ties.method
is deprecated since version 1.3.0 [2024-04-10].
If not explicitly specified, a deprecation warning is now produced every
25:th call not specifying the ties.method
argument.validateIndices()
has been removed. It had been defunct since
version 0.63.0 (2022-11-14).Calling colRanks()
and rowRanks()
without explicitly specifying
argument ties.method
will be deprecated when using R (>=
4.4.0). The reason is that the current default is ties.method = "max"
, but we want to change that to ties.method = "average"
to
align it with base::rank()
. In order to minimize the risk for
sudden changes in results, we ask everyone to explicitly specify
their intent. The first notice will be through deprecation
warnings, which will only occur every 50:th call to keep the noise
level down. We will make it more noisy in future releases, and
eventually also escalated to defunct errors.
Using a scalar value for argument center
of colSds()
,
rowSds()
, colVars()
, rowVars()
, colMads()
, rowMads()
,
colWeightedMads()
, and rowWeightedMads()
is now defunct.
useNames = NA
is defunct.useNames = NA
is defunct in R (>= 4.4.0). Remains deprecated in
R (< 4.4.0) for now.useNames = NA
, suggested using
useNames = TRUE
twice instead of also useNames = FALSE
.useNames = TRUE
is the new default for all functions. For
backward compatibility, it used to be useNames = NA
.
colQuantiles()
and rowQuantiles()
gained argument digits
,
just like stats::quantile()
gained that argument in R 4.1.0.
colQuantiles()
and rowQuantiles()
only sets quantile percentage
names when useNames = TRUE
, to align with how argument names
of
stats::quantile()
works in base R.
colMeans2()
and rowMeans2()
gained argument refine
. If
refine = TRUE
, then the sample average for numeric matrices are
calculated using a two-pass scan, resulting in higher precision.
The default is refine = TRUE
to align it with colMeans()
, but
also mean2()
in this package. If the higher precision is not
needed, using refine = FALSE
will be almost twice as fast.
colSds()
, rowSds()
, colVars()
, and rowVars()
gained
argument refine
. If refine = TRUE
, then the sample average for
numeric matrices are calculated using a two-pass scan, resulting in
higher precision for the estimate of the center and therefore also
the variance.
Unnecessary checks for missing indices are eliminated, yielding better performance. This change does not affect user-facing API.
Made colQuantiles()
and rowQuantiles()
a bit faster for type != 7L
, by making sure percentage names are only generated once,
instead of once per column or row.
Contrary to other functions in the package, and how it works in
base R, functions colCumsums()
, colCumprods()
, colCummins()
,
colCummaxs()
, colRanges()
, colRanks()
, and colDiffs()
, plus
the corresponding row-based versions, did not drop the names
attribute when both row and column names were NULL
. Now also
these functions behaves the same as the case when neither row or
column names are set.
colQuantiles()
and rowQuantiles()
did not generate quantile
percentage names exactly the same way as stats::quantile()
, which
would reveal itself for certain combinations of probs
and
digits
.
useNames = NA
is now deprecated. Use useNames = TRUE
or
useNames = FALSE
instead.Package compiles again with older compilers not supporting the C99 standard (e.g. GCC 4.8.5 (2015), which is the default on RHEL / CentOS 7.9). This was the case also for matrixStats (<= 0.54.0).
Added more information to the error message produced when argument
center
for col-
and rowVars()
holds an invalid value.
Fix two compilation warnings on a function declaration without a prototype is deprecated in all versions of C [-Wstrict-prototypes]
.
validateIndices()
is now defunct and will eventually be removed
from the package API.colCummins()
, colCummaxs()
, rowCummins()
, and rowCummaxs()
now support also logical input.DBL_MAX
instead of
legacy S constant DOUBLE_XMAX
, which is planned to be unsupported
in R (>= 4.2.0).which
for colOrderStats()
and rowOrderStats()
is out of range, the error message now reports on the value of
which
. Similarly, when argument probs
for colQuantiles()
and
rowQuantiles()
is out of range, the error message reports on its
value too.validateIndices()
is deprecated and will eventually be removed
from the package API.Handling of the useNames
argument is now done in the native code.
Passing idxs
, rows
, and cols
arguments of type integer is now
less efficient than it used to, because the new code re-design (see
below) requires an internal allocation of an equally long
R_xlen_t
vector that is populated by indices coerced from
R_len_t
to R_xlen_t
integers.
R CMD check
would produce
a NOTE on the package installation size being large, which no
longer is the case. The downside is that extra overhead when
passing integer indices (see above comment).useNames = NA
in the previous release, colQuantiles()
and rowQuantiles()
got useNames = TRUE
.useNames = TRUE
. To drop them, set useNames = FALSE
. To preserve the
current, inconsistent behavior, set useNames = NA
, which, for
backward compatibility reasons, remains the default for now.meanOver()
and sumOver()
, and argument method
from
weightedVar()
, that have been defunct since January 2018.colVars()
and rowVars()
with argument center
now calculates
the sample variance using the n/(n-1)*avg((x-center)^2)
formula
rather than the n/(n-1)*(avg(x^2)-center^2)
formula that was used
in the past. Both give the same result when center
is the
correct sample mean estimate. The main reason for this change is
that, if an incorrect center
is provided, in contrast to the old
approach, the new approach is guaranteed to give at least
non-negative results, despite being incorrect. BACKWARD
COMPATIBILITY: Out of all 314 reverse dependencies on CRAN and
Bioconductor, only four called these functions with argument
center
. All of them pass their package checks also after this
update. To further protect against a negative impact in existing
user scripts, colVars()
and rowVars()
will calculate both
versions and assert that the result is the same. If not, an
informative error is produced. To limit the performance impact,
this validation is run only once every 50:th call, a frequency that
can be controlled by R option matrixStats.vars.formula.freq
.
Setting it to 0 or NULL will disable the validation. The default
can also be controlled by environment variable
R_MATRIXSTATS_VARS_FORMULA_FREQ
. This validation framework will
be removed in a future version of the package after it has been
established that this change has no negative impact.Now colWeightedMads()
and rowWeightedMads()
accept center
of
the same length as the number of columns and rows, respectively.
colAvgsPerRowSet()
and rowAvgsPerRowSet()
gained argument
na.rm
.
Now weightedMean()
and weightedMedian()
and the corresponding
row- and column-based functions accept logical x
, where FALSE is
treated as integer 0 and TRUE as 1.
Now x_OP_y()
and t_tx_OP_y()
accept logical x
and y
, where
FALSE is treated as integer 0 and TRUE as 1.
colQuantiles()
and rowQuantiles()
on a logical matrix should
return a numeric vector for type = 7
. However, when there were
only missing values (= NA) in the matrix, then it would return a
"logical" vector instead.
colAvgsPerRowSet()
on a single-column matrix would produce an
error on non-matching dimensions. Analogously, for
rowAvgsPerRowSet()
and single- row matrices.
colVars(x)
and rowVars(x)
with x
being an array would give
the wrong value if both argument dim.
and center
would be
specified.
The documentation was unclear on what the center
argument should
be. They would not detect when an incorrect specification was used,
notably when the length of center
did not match the matrix
dimensions. Now these functions give an informative error message
when center
is of the incorrect length.
center
of colSds()
,
rowSds()
, colVars()
, rowVars()
, colMads()
, rowMads()
,
colWeightedMads()
, and rowWeightedMads()
is now deprecated.colCumprods()
and rowCumprods()
now support also logical
input. Thanks to Constantin Ahlmann-Eltze at EMBL Heidelberg for
the patch.colCollapse()
and rowCollapse()
did not expand idxs
argument
before subsetting by cols
and rows
, respectively. Thanks to
Constantin Ahlmann-Eltze for reporting on this.
colAnys()
, rowAnys()
, anyValue()
, colAlls()
, rowAlls()
,
and allValue()
with value=FALSE
and numeric input would
incorrectly consider all values different from one as FALSE. Now
it is only values that are zero that are considered FALSE. Thanks
to Constantin Ahlmann-Eltze for the bug fix.
colQuantiles()
and rowQuantiles()
now supports only integer,
numeric and logical input. Previously, it was also possible to
pass, for instance, character
input, but that was a mistake. The
restriction on input allows for further optimization of these
functions.
The returned type of colQuantiles()
and rowQuantiles()
is now
the same as for stats::quantile()
, which depends on argument
type
.
colQuantiles()
and rowQuantiles()
with the default type = 7L
and when there are no missing values are now significantly faster
and use significantly fewer memory allocations.colDiffs()
and rowDiffs()
gave an error if argument dim.
was
of type numeric rather than type integer.
varDiff()
, sdDiff()
, madDiff()
, iqrDiff()
, and the
corresponding row- and column functions silently treated a diff
less than zero as diff = 0
. Now an error is produced.
Error messages on argument dim.
referred to non-existing argument
dim
.
Error messages on negative values in argument dim.
reported a
garbage value instead of the negative value.
The Markdown reports produced by the internal benchmark report generator did not add a line between tables and the following text (a figure caption) causing the following text to be included in a cell on an extra row in the table (at least when rendered on GitHub Wiki pages).
weightedVar()
, weightedSd()
, weightedMad()
, and their row-
and column- specific counter parts now return a missing value if
there are missing values in any of the weights w
after possibly
dropping (x
, w
) elements with missing values in x
(na.rm = TRUE
). Previously, na.rm = TRUE
would also drop (x
, w
)
elements where w
was missing. With this change, we now have that
for all functions in this package, na.rm = TRUE
never applies to
weights - only x
values.colRanks()
and rowRanks()
now supports the same set of
ties.method
as base::rank()
plus "dense"
as defined by
data.table::frank()
. For backward compatible reasons, the default
ties.method
remains the same as in previous versions. Thank to
Brian Montgomery for contributing this.
colCumsums()
and rowCumsums()
now support also logical input.
weightedVar()
, weightedSd()
, weightedMad()
, and their row-
and column- specific counter parts would produce an error instead
of returning a missing value when one of the weights is a missing
value.indexByRow(x)
where x
is a matrix is now defunct. Use
indexByRow(dim(x))
instead.stopifnot()
for internal validation,
because it comes with a great overhead. This was only used in
weightedMad()
, col-
, and rowWeightedMads()
, as well as col-
and rowAvgsPerColSet()
.Despite being an unlikely use case, colLogSumExps(lx)
/
rowLogSumExps(lx)
now also accepts integer lx
values.
The error produced when using indexByRow(dim)
with prod(dim) >= 2^31
would report garbage dimensions instead of dim
.
indexByRow(x)
, where x
is a matrix, is deprecated. Use
indexByRow(dim(x))
instead.col-
/rowSds()
explicitly replicate all arguments that are
passed to col-
/rowVars()
.weightedMedian(x, interpolate = TRUE)
works.colLogSumExps(lx, cols)
/ rowLogSumExps(lx, rows)
gave an error
if lx
has rownames / colnames.
col-
/rowQuantiles()
would lose rownames of output in certain
cases.
Functions sum2(x)
and means2(x)
now accept also logical input
x
, which corresponds to using as.integer(x)
but without the need
for neither coercion nor internal extra copies. With sum2(x, mode = "double")
it is possible to count number of TRUE elements beyond
2^31-1, which base::sum()
does not support.
Functions col-
/rowSums2()
and col-
/rowMeans2()
now accept
also logical input x
.
Function binMeans(y, x, bx)
now accepts logical y
, which
corresponds to to using as.integer(y)
, but without the need for
coercion to integer.
Functions col-
/rowTabulates(x)
now support logical input x
.
Now count()
can count beyond 2^31-1.
allocVector()
can now allocate long vectors (longer than 2^31-1).
Now sum2(x, mode = "integer")
generates a warning if typeof(x) == "double"
asking if as.integer(sum2(x))
was intended.
Inspired by Hmisc::wtd.var()
, when sum(w) <= 1
, weightedVar(x, w)
now produces an informative warning that the estimate is
invalid.
colAvgsPerColSet()
with that of rowAvgsPerColSet()
.col-
/rowLogSumExp()
could core dump R for "large" number of
columns/rows. Thanks Brandon Stewart at Princeton University for
reporting on this.
count()
beyond 2^31-1 would return invalid results.
Functions col-
/rowTabulates(x)
did not count missing values.
indexByRow(dim, idxs)
would give nonsense results if idxs
had
indices greater than prod(dim)
or non-positive indices; now it
gives an error.
indexByRow(dim)
would give nonsense results when prod(dim) >= 2^31
; now it gives an informative error.
col-
/rowAvgsPerColSet()
would return vector rather than matrix
if nrow(X) <= 1
. Thanks to Peter Hickey (Johns Hopkins
University) for troubleshooting and providing a fix.
Previously deprecated meanOver()
and sumOver()
are defunct. Use
mean2()
and sum2()
instead.
Previously deprecated weightedVar(x, w, method = "0.14.2")
is defunct.
Dropped previously defunct weightedMedian(..., ties = "both")
.
Dropped previously defunct argument centers
for
col-
/rowMads()
. Use center
instead.
Dropped previously defunct argument flavor
of colRanks()
and
rowRanks()
.
rowVars(..., method = "0.14.2")
that was added for very unlikely
needs of backward compatibility of an invalid degree-of-freedom
term is deprecated.matrixStats:::benchmark()
tried to run even
if not all suggested packages were available.Since anyNA()
is a built-in function since R (>= 3.1.0), please
use that instead of anyMissing()
part of this package. The
latter will eventually be deprecated. For consistency with the
anyNA()
name, colAnyNAs()
and rowAnyNAs()
are now also
available replacing the identically colAnyMissings()
and
rowAnyMissings()
functions, which will also be deprecated in a
future release.
meanOver()
was renamed to mean2()
and sumOver()
was renamed
to sum2()
.
Added colSums2()
and rowSums2()
which work like colSums()
and
rowSums()
of the base package but also supports efficient
subsetting via optional arguments rows
and cols
.
Added colMeans2()
and rowMeans2()
which work like colMeans()
and rowMeans()
of the base package but also supports efficient
subsetting via optional arguments rows
and cols
.
Functions colDiffs()
and rowDiffs()
gained argument dim.
.
Functions colWeightedMads()
and rowWeightedMads()
gained
arguments constant
and center
. The current implementation only
support scalars for these arguments, which means that the same
values are applied to all columns and rows, respectively. In
previous version a hard-to-understand error would be produced if
center
was of length greater than one; now an more informative
error message is given.
Package is now silent when loaded; it no longer displays a startup message.
Continuous-integration testing is now also done on macOS, in addition to Linux and Windows.
ROBUSTNESS: Package now registers the native API using also
R_useDynamicSymbols()
.
Cleaned up native low-level API and renamed native source code files to make it easier to navigate the native API.
Now using roxygen2 for help and NAMESPACE (was R.oo::Rdoc
).
rowAnys(x)
on numeric matrices x
would return rowAnys(x == 1)
and not rowAnys(x != 0)
. Same for colAnys()
, rowAlls()
, and
colAlls()
. Thanks Richard Cotton for reporting on this.
sumOver(x)
and meanOver(x)
would incorrectly return -Inf or
+Inf if the intermediate sum would have that value, even if one
of the following elements would turn the intermediate sum into
NaN or NA, e.g. with x
as c(-Inf, NaN)
, c(-Inf, +Inf)
, or
c(+Inf, NA)
.
WORKAROUND: Benchmark reports generated by
matrixStats:::benchmark()
would use any custom R prompt that is
currently set in the R session, which may not render very well.
Now it forces the prompt to be the built-in "> "
one.
The package API is only intended for matrices and vectors of type
numeric, integer and logical. However, a few functions would still
return if called with a data.frame. This was never intended to
work and is now an error. Specifically, functions colAlls()
,
colAnys()
, colProds()
, colQuantiles()
, colIQRs()
,
colWeightedMeans()
, colWeightedMedians()
, and colCollapse()
now produce warnings if called with a data.frame. Same for the
corresponding row- functions. The use of a `data.frame will be
produce an error in future releases.
meanOver()
and sumOver()
are deprecated because they were
renamed to mean2()
and sum2()
, respectively.
Previously deprecated (and ignored) argument flavor
of
colRanks()
and rowRanks()
is now defunct.
Previously deprecated support for passing non-vector, non-matrix
objects to rowAlls()
, rowAnys()
, rowCollapse()
, and the
corresponding column-based versions are now defunct. Likewise,
rowProds()
, rowQuantiles()
, rowWeightedMeans()
,
rowWeightedMedians()
, and the corresponding column-based versions
are also defunct. The rationale for this is to tighten up the
identity of the matrixStats package and what types of input it
accepts. This will also help optimize the code further.
SPEEDUP / CLEANUP: rowMedians()
and colMedians()
are now plain
functions. They were previously S4 methods (due to a Bioconductor
legacy). The package no longer imports the methods package.
SPEEDUP: Now native API is formally registered allowing for faster lookup of routines from R.
Package now installs on R (>= 2.12.0) as claimed. Thanks to Mikko Korpela at Aalto University School of Science, Finland, for troubleshooting and providing a fix.
logSumExp(c(-Inf, -Inf, ...))
would return NaN rather than
-Inf
. Thanks to Jason Xu (University of Washington) for reporting
and Brennan Vincent for troubleshooting and contributing a fix.
memcall(src, dest, 0)
call when dest == null
. Thanks to Brian
Ripley and the CRAN check tools for catching this. We could
reproduce this with gcc 5.1.1 but not with gcc 4.9.2.idxs
, rows
and
cols
were added to all functions such that the calculations are
performed on the requested subset while avoiding creating a
subsetted copy, i.e. rowVars(x, cols = 4:6)
is a much faster and
more memory efficient version than rowVars(x[, 4:6])
and even yet
more efficient than apply(x, MARGIN = 1L, FUN = var)
. These
features were added by Dongcan Jiang, Peking University, with
support from the Google Summer of Code program. A great thank you
to Dongcan and to Google for making this possible.w
and W
) default to
NULL, which corresponds to uniform weights.weightedVar(x, w)
used the wrong bias correction factor resulting
in an estimate that was tau too large, where tau = ((sum(w) - 1) / sum(w)) / ((length(w) - 1) / length(w))
. Thanks to Wolfgang Abele
for reporting and troubleshooting on this.
weightedVar(x)
with length(x) = 1
returned 0 - not NA. Same for
weightedSd()
.
weightedMedian(x, w = NA_real_)
returned x
rather than
NA_real_
. This only happened for length(w) = 1
.
allocArray(dim)
failed for prod(dim) >= .Machine$integer.max
.
CLEANUP: Defunct argument centers
for col-
/rowMads()
; use
center
.
weightedVar(x, w, method = "0.14.2")
is deprecated.
x_OP_y()
and t_tx_OP_y()
would return garbage on Solaris SPARC
(and possibly other architectures as well) when input was integer
and had missing values.product(x, na.rm = FALSE)
for integer x
with both zeros and NAs
returned zero rather than NA.
weightedMean(x, w, na.rm = TRUE)
did not handle missing values in
x
properly, if it was an integer. It would also return NaN if
there were weights w
with missing values, whereas
stats::weighted.mean()
would skip such data points. Now
weightedMean()
does the same.
(col|row)WeightedMedians()
did not handle infinite weights as
weightedMedian()
does.
x_OP_y(x, y, OP, na.rm = FALSE)
returned garbage iff x
or y
had missing values of type integer.
rowQuantiles()
and rowIQRs()
did not work for single-row
matrices. Analogously for the corresponding column functions.
rowCumsums()
, rowCumprods()
rowCummins()
, and rowCummaxs()
,
accessed out-of-bound elements for Nx0 matrices where N > 0. The
corresponding column methods has similar memory errors for 0xK
matrices where K > 0.
anyMissing(list(NULL))
returned NULL; now FALSE.
rowCounts()
resulted in garbage if a previous column had NAs
(because it forgot to update index kk in such cases).
rowCumprods(x)
handled missing values and zeros incorrectly for
integer x
(not double); a zero would trump an existing missing
value causing the following cumulative products to become zero. It
was only a zero that trumped NAs; any other integer would work as
expected. Note, this bug was not in colCumprods()
.
rowAnys(x, value, na.rm = FALSE)
did not handle missing values in
a numeric x
properly. Similarly, for non-numeric and non-logical
x
, row- and colAnys()
, row- and colAlls()
, anyValue()
and
allValue()
did not handle when value
was a missing value.
All of the above bugs were identified and fixed by Dongcan Jiang (Peking University, China), who also added corresponding unit tests.
anyMissing()
is no longer an S4 generic. This was done
as part of the migration of making all functions of matrixStats
plain R functions, which minimizes calling overhead and it will
also allow us to drop methods from the package dependencies.
I've scanned all CRAN and Bioconductor packages depending on
matrixStats and none of them relied on anyMissing()
dispatching
on class, so hopefully this move has little impact. The only
remaining S4 methods are now colMedians()
and rowMedians()
.CONSISTENCY: Renamed argument centers
of col-
/rowMads()
to
center
. This is consistent with col-
/rowVars()
.
CONSISTENCY: col-
/rowVars()
now use na.rm = FALSE
as the
default (na.rm = TRUE
was mistakenly introduced as the default in
v0.9.7).
SPEEDUP: The check for user interrupts at the C level is now done
less frequently of the functions. It does every k:th iteration,
where k = 2^20
, which is tested for using (iter % k == 0
). It
turns out, at least with the default compiler optimization settings
that I use, that this test is 3 times faster if k = 2^n
where n is
an integer. The following functions checks for user interrupts:
logSumExp()
, (col|row)LogSumExps()
, (col|row)Medians()
,
(col|row)Mads()
, (col|row)Vars()
, and
(col|row)Cum(Min|Max|prod|sum)s()
.
SPEEDUP: logSumExp(x)
is now faster if x
does not contain any
missing values. It is also faster if all values are missing or the
maximum value is +Inf - in both cases it can skip the actual
summation step.
all()
and any()
flavored methods on non-numeric and non-logical
(e.g. character) vectors and matrices with na.rm = FALSE
did not
give results consistent with all()
and any()
if there were
missing values. For example, with x <- c("a", NA, "b")
we have
all(x == "a") == FALSE
and any(x == "a") == TRUE
, whereas our
corresponding methods would return NA in those cases. The methods
fixed are allValue()
, anyValue()
, col-
/rowAlls()
, and
col-
/rowAnys()
. Added more package tests to cover these cases.
logSumExp(x, na.rm = TRUE)
would return NA if all values were NA
and length(x) > 1
. Now it returns -Inf for all length(x)
:s.
diff2()
with differences >= 3
would read spurious values
beyond the allocated memory. This error, introduced in 0.13.0, was
harmless in the sense that the returned value was unaffected and
still correct. Thanks to Brian Ripley and the CRAN check tools for
catching this. I could reproduce it locally with valgrind.anyMissing()
and rowMedians()
.Added weightedMean()
, which is ~10 times faster than
stats::weighted.mean()
.
Added count(x, value)
which is a notably faster than sum(x == value)
. This can also be used to count missing values etc.
Added allValue()
and anyValue()
for all(x == value)
and
any(x == value)
.
Added diff2()
, which is notably faster than base::diff()
for
vectors, which it is designed for.
Added iqrDiff()
and (col|row)IqrDiffs()
.
CONSISTENCY: Now rowQuantiles(x, na.rm = TRUE)
returns all NAs
for rows with missing values. Analogously for colQuantiles()
,
colIQRs()
, rowIQRs()
and iqr()
. Previously, all these
functions gave an error saying missing values are not allowed.
COMPLETENESS: Added corresponding "missing" vector functions for
already existing column and row functions. Similarly, added
"missing" column and row functions for already existing vector
functions, e.g. added iqr()
and count()
to complement already
existing (col|row)IQRs()
and (col|row)Counts()
functions.
ROBUSTNESS: Now column and row methods give slightly more informative error messages if a data.frame is passed instead of a matrix.
SPEEDUP: (col|row)Diffs()
are now implemented in native code and
notably faster than diff()
for matrices.
SPEEDUP: Made binCounts()
and binMeans()
a bit faster.
SPEEDUP: Implemented weightedMedian()
in native code, which made
it ~3-10 times faster. Dropped support for ties = "both"
,
because it would have to return two values in case of ties, which
made the API unnecessarily complicated. If really needed, then
call the function twice with ties = "min"
and ties = "max"
.
SPEEDUP: (col|row)Anys()
and (col|row)Alls()
is now notably
faster compared to previous versions.
anyMissing()
into a plain R
function, the specific anyMissing()
implementations for
data.frame:s and and list:s were dropped and is now handled by
anyMissing()
for "ANY"
, which is the only S4 method remaining
now. In a near future release, this remaining "ANY"
method will
turned into a plain R function and the current S4 generic will be
dropped. We know of no CRAN and Bioconductor packages that rely on
it being a generic function. Note also that since R (>= 3.1.0)
there is a base::anyNA()
function that does the exact same thing
making anyMissing()
obsolete.weightedMedian(..., ties = "both")
would give an error if there
was a tie. Added package test for this case.weightedMedian(..., ties = "both")
is now defunct.product()
on integer vector
incorrectly used C-level abs()
on intermediate values despite
those being doubles requiring fabs()
. Despite this, the
calculated product would still be correct (at least when validated
on several local setups as well as on the CRAN servers). Again,
thanks to Brian Ripley for pointing out another invalid
integer-double coercion at the C level.weightedMedian(..., interpolate = FALSE, ties = "both")
is
defunct.(col|row)Cumsums(x)
where x
is integer would return garbage for
columns (rows) containing missing values.
rowMads(x)
where x
is numeric (not integer) would give
incorrect results for rows that had an odd number of values (no
ties). Analogously issues with colMads()
. Added package tests
for such cases too. Thanks to Brian Ripley and the CRAN check
tools for (yet again) catching another coding mistake. Details:
This was because the C-level calculation of the absolute value of
residuals toward the median would use integer-based abs()
rather
than double-based fabs()
. Now it fabs()
is used when the values
are double and abs()
when they are integers.
(col|row)Cumsums()
, (col|row)Cumprods()
,
(col|row)Cummins()
, and (col|row)Cummaxs()
.(col|row)WeightedMeans()
with all zero weights gave mean
estimates with values 0 instead of NaN.SPEEDUP: Implemented (col|row)Mads()
, (col|row)Sds()
, and
(col|row)Vars()
in native code.
SPEEDUP: Made (col|row)Quantiles(x)
faster for x
without
missing values (and default type = 7L
quantiles). It should
still be implemented in native code though.
SPEEDUP: Made rowWeightedMeans()
faster.
(col|row)Medians(x)
when x
is integer would give invalid median
values in case (a) it was calculated as the mean of two values
("ties"), and (b) the sum of those values where greater than
.Machine$integer.max
. Now such ties are calculated using
floating point precision. Add lots of package tests.SPEEDUP: Now (col|row)Mins()
, (col|row)Maxs()
, and
(col|row)Ranges()
are implemented in native code providing a
significant speedup.
SPEEDUP: Now colOrderStats()
also is implemented in native code,
which indirectly makes colMins()
, colMaxs()
and colRanges()
faster.
SPEEDUP: colTabulates(x)
no longer uses rowTabulates(t(x))
.
SPEEDUP: colQuantiles(x)
no longer uses rowQuantiles(t(x))
.
flavor
of (col|row)Ranks()
is now ignored.(col|row)Prods()
now uses default method = "direct"
(was
"expSumLog"
).SPEEDUP: Now colCollapse(x)
no longer utilizes
rowCollapse(t(x))
. Added package tests for (col|row)Collapse()
.
SPEEDUP: Now colDiffs(x)
no longer uses rowDiffs(t(x))
. Added
package tests for (col|row)Diffs()
.
SPEEDUP: Package no longer utilizes match.arg()
due to its
overhead; methods sumOver()
, (col|row)Prods()
and
(col|row)Ranks()
were updated.
dim
. For instance, rowCounts(x, dim = c(nrow, ncol))
is the same as rowCounts(matrix(x, nrow, ncol))
, but more
efficient since it avoids creating/allocating a temporary matrix.colCounts()
is implemented in native code.
Moreover, (col|row)Counts()
are now also implemented in native
code for logical input (previously only for integer and double
input). Added more package tests and benchmarks for these
functions.sdDiff()
, madDiff()
, varDiff()
, weightedSd()
,
weightedVar()
and weightedMad()
into plain functions (were
generic functions).::
.indexByRow()
in native code and it is no
longer a generic function, but a regular function, which is also
faster to call. The first argument of indexByRow()
has been
changed to dim
such that one should use indexByRow(dim(X))
instead of indexByRow(X)
as in the past. The latter form is
still supported, but deprecated.allocVector()
, allocMatrix()
, and allocArray()
for
faster allocation numeric vectors, matrices and arrays,
particularly when filled with non-missing values.indexByRow(X)
with a matrix X
is deprecated. Instead
call it with indexByRow(dim(X))
.Better support for long vectors.
PRECISION: Using greater floating-point precision in more internal intermediate calculations, where possible.
binCounts()
and binMeans()
it is possible that a bin gets a
higher count than what can be represented by an R integer
(.Machine$integer.max = 2^31-1
). If that happens, an informative
warning is generated and the bin count is set to
.Machine$integer.max
. If this happens for binMeans()
, the
corresponding mean is still properly calculated and valid..Call()
and takes care of most of the argument validation and
construction of the return value. This function dispatch to
functions in the low-level API based on data type(s) and other
arguments. The low-level API is written to work with basic C data
types only.R_xlen_t
on R (>= 3.0.0) systems
where LONG_VECTOR_SUPPORT
is not supported.sumOver()
and meanOver()
, which are notably faster
versions of sum(x[idxs])
and mean(x[idxs])
. Moreover, instead
of having to do sum(as.numeric(x))
to avoid integer overflow when
x
is an integer vector, one can do sumOver(x, mode = "numeric")
, which avoids the extra copy created when coercing to
numeric (this numeric copy is also twice as large as the integer
vector). Added package tests and benchmark reports for these
functions.SPEEDUP: Made anyMissing()
, logSumExp()
, (col|row)Medians()
,
and (col|row)Counts()
slightly faster by making the native code
assign the results directly to the native vector instead of to the
R vector, e.g. ansp[i] = v
where ansp = REAL(ans)
instead of
REAL(ans)[i] = v
.
Added benchmark reports for anyMissing()
and logSumExp()
.
binMeans()
returned 0.0 instead of NA_real_
for empty bins."redefinition of typedef 'R_xlen_t'"
.Added benchmark reports for also non-matrixStats functions
col-
/rowSums()
and col-
/rowMeans()
.
Now all colNnn()
and rowNnn()
methods are benchmarked in a
combined report making it possible to also compare colNnn(x)
with
rowNnn(t(x))
.
Relaxed some packages tests such that they assert numerical
correctness via all.equal()
rather than identical()
.
Submitted to CRAN.
product()
incorrectly assumed that the
value of prod(c(NaN, NA))
is uniquely defined. However, as
documented in help("is.nan")
, it may be NA or NaN depending on R
system/platform.Introduced a bug in v0.9.5 causing col-
/rowVars()
and hence
also col-
/rowSds()
to return garbage. Add package tests for
these now.
Submitted to CRAN.
signTabulate()
for tabulating the number of negatives,
zeros, positives and missing values. For doubles, the number of
negative and positive infinite values are also counted.SPEEDUP: Now col-
/rowProds()
utilizes new product()
function.
SPEEDUP: Added product()
for calculating the product of a numeric
vector via the logarithm.
SPEEDUP: Made weightedMedian()
a plain function (was an S3
method).
CLEANUP: Now only exporting plain functions and generic functions.
SPEEDUP: Turned more S4 methods into S3 methods,
e.g. rowCounts()
, rowAlls()
, rowAnys()
, rowTabulates()
and
rowCollapse()
.
method
to col-
/rowProds()
for controlling how
the product is calculated.SPEEDUP: Package is now byte compiled.
SPEEDUP: Made rowProds()
and rowTabulates()
notably faster.
SPEEDUP: Now rowCounts()
, rowAnys()
, rowAlls()
and
corresponding column methods can search for any value in addition
to the default TRUE. The search for a matching integer or double
value is done in native code, which is notably faster (and more
memory efficient because it avoids creating any new objects).
SPEEDUP: Made colVars()
and colSds()
notably faster and
rowVars()
and rowSds()
a slightly bit faster.
Added benchmark reports, e.g. matrixStats:::benchmark("colMins")
.
indexByRow()
, madDiff()
, sdDiff()
and varDiff()
.trim
to madDiff()
, sdDiff()
and varDiff()
.binMeans(x, bx)
would try to access an
out-of-bounds value of argument y
iff x
contained elements that
are left of all bins in bx
. This bug had no impact on the
results and since no assignment was done it should also not
crash/core dump R. This was discovered thanks to new memtests
(ASAN and valgrind) provided by CRAN.rowProds()
would throw "Error in rowSums(isNeg) :
x must be an array of at least two dimensions"
on matrices where all rows
contained at least one zero. Thanks to Roel Verbelen at KU Leuven
for the report.weighedVar()
and weightedSd()
.MEMORY: Updated all functions to do a better job of cleaning out temporarily allocated objects as soon as possible such that the garbage collector can remove them sooner, iff wanted. This increase the chance for a smaller memory footprint.
Submitted to CRAN.
right
to binCounts()
and binMeans()
to specify
whether binning should be done by (u,v] or [u,v). Added system
tests validating the correctness of the two cases.anyMissing()
everywhere possible.ROBUSTNESS: Now importing loadMethod
from methods package
such that matrixStats S4-based methods also work when
methods is not loaded, e.g. when Rscript
is used,
cf. Section 'Default packages' in 'R Installation and
Administration'.
ROBUSTNESS: Updates package system tests such that the can run with only the base package loaded.
CLEANUP: Now only importing two functions from the methods package.
Bumped up package dependencies.
quietly
of library()
/require()
.help("rowQuantiles")
.(col|row)Mins()
and (col|row)Maxs()
much faster.rowRanges(x)
on an Nx0 matrix would give an error. Same for
colRanges(x)
on an 0xN matrix. Added system tests for these and
other special cases.(col|row)WeightedMedians()
.(col|row)Tabulates()
by replacing rm()
calls
with NULL assignments.\usage{}
lines are at most 90 characters
long.binCounts()
and binMeans()
now uses Hoare's Quicksort for
presorting x
before counting/averaging. They also no longer test in
every iteration (== for every data point) whether the last bin has been
reached or not, but only after completing a bin.logSumExp()
used an invalid check for missing
value of an integer argument. Detected by Brian Ripley upon CRAN
submission.logSumExp(lx)
and (col|row)LogSumExps(lx)
for accurately
computing of log(sum(exp(lx)))
for standalone vectors, and row
and column vectors of matrices. Thanks to Nakayama (Japan) for the
suggestion and contributing a draft in R.preserveShape
to colRanks()
. For backward
compatibility the default is preserveShape = FALSE
, but it may
change in the future.Since v0.6.4, (col|row)Ranks()
gave the incorrect results for
integer matrices with missing values.
Since v0.6.4, (col|row)Medians()
for integers would calculate
ties as floor(tieAvg)
.
(col|row)Ranks()
support "max"
(default), "min"
and
"average"
for argument ties.method
. Added system tests
validation these cases. Thanks Peter Langfelder (UCLA) for
contributing this.ties.method
to rowRanks()
and colRanks()
, but
still only support for "max"
(as before).anyMissing()
for data type raw
, which always returns FALSE.ROBUSTNESS: Added system test for anyMissing()
.
ROBUSTNESS: Now S3 methods are declared in the namespace.
example(weightedMedian)
faster.In some cases binCounts()
and binMeans()
could try to go past
the last bin resulting a core dump.
binCounts()
and binMeans()
would return random/garbage values
for bins that were beyond the last data point.
Added binMeans()
for fast sample-mean calculation in bins.
Thanks to Martin Morgan at the Fred Hutchinson Cancer Research
Center, Seattle, for contributing the core code for this.
Added binCounts()
for fast element counting in bins.
.Internal(psort(...))
call with a call
to a new internal partial sorting function, which utilizes the
native rPsort()
part of the R internals.(col|row)Prods()
handle missing values.(col|row)Prods()
would return NA instead of 0
for some elements. Added a redundancy test for the case. Thanks
Brenton Kenkel at University of Rochester for reporting on this.Added weightedMad()
from aroma.core v2.5.0.
Added weightedMedian()
from aroma.light v1.25.2.
This package no longer depends on the aroma.light package for any of its functions.
Now this package only imports R.methodsS3, meaning it no longer loads R.methodsS3 when it is loaded.
centers
of rowMads()
/colMads()
to explicitly be (col|row)Medians(x,...)
. The default behavior
has not changed.ROBUSTNESS: Added system/redundancy tests for
rowMads()
/colMads()
.
CRAN: Made the system tests "lighter" by default, but full tests
can still be run, cf. tests/*.R
scripts.
colMads()
would return the incorrect estimates. This bug was
introduced in matrixStats v0.4.0 (2011-11-11).rowMedians(..., na.rm = TRUE)
did not handle NaN (only NA). The
reason for this was the the native code used ISNA()
to test for
NA and NaN, but it should have been ISNAN()
, which is opposite to
how is.na()
and is.nan()
at the R level work. Added system
tests for this case.rowAvgsPerColSet()
and colAvgsPerRowSet()
.Added help pages with an example to rowIQRs()
and colIQRs()
.
Added example to rowQuantiles()
.
rowIQRs()
and colIQRs()
would return the 25% and the 75%
quantiles, not the difference between them. Thanks Pierre Neuvial
at CNRS, Evry, France for the report.center
in
rowMads()
and colMads()
. It added unnecessary overhead if not
needed.rowRanks()
and colRanks()
. Thanks Hector Corrada Bravo
(University of Maryland) and Harris Jaffee (John Hopkins).colMedians(x)
no longer uses
rowMedians(t(x))
; instead there is now an optimized native-code
implementation. Also, colMads()
utilizes the new colMedians()
directly. This improvement was kindly contributed by Harris Jaffee
at Biostatistics of John Hopkins, USA.colMedians()
and rowMedians()
.(col|row)Quantiles()
contains column names..First.lib()
and
.Last.lib()
.(col|row)WeightedMeans(..., na.rm = TRUE)
would incorrectly treat
missing values as zeros. Added corresponding redundancy tests
(also for the median case). Thanks Pierre Neuvial for reporting
this.colRanges(x)
would return a matrix of wrong dimension if x
did
not have any missing values. This would affect all functions
relying on colRanges()
, e.g. colMins()
and colMaxs()
. Added
a redundancy test for this case. Thanks Pierre Neuvial at UC
Berkeley for reporting this.
(col|row)Ranges()
return a matrix with dimension names.
"%#x"
in rowTabulates()
when creating
the column names of the result matrix. It gave an error OSX with R
v2.9.0 devel (2009-01-13 r47593b) current the OSX server at
R-forge.rowWeightedMedians()
to run
conditionally on aroma.light, which is only a suggested
package - not a required one. This in order to prevent R CMD check
to fail on CRAN, which prevents it for building binaries (as
it currently happens on their OSX servers).rowOrderStats()
, the stack would not become
UNPROTECTED before calling error.(col|row)Weighted(Mean|Median)s()
for weighted
averaging.R CMD check
flawlessly.(col|row)Tabulates()
for integer and raw matrices.rowCollapse()
was broken and returned the wrong elements.Added (col|row)Collapse()
.
Added varDiff()
, sdDiff()
, and madDiff()
.
Added indexByRow()
.
Added (col|row)OrderStats()
.
Added (col|row)Ranges()
and (col|row)(Min|Max)s()
.
Added colMedians()
.
Now anyMissing()
support most data types as structures.
Imported the rowNnn()
methods from Biobase.
Created.