NEWS


randomUniformForest 1.1.6 (2022-06-21)

The option 'na.action' has been added, with values 'na.action = c("accurateImpute", "fastImpute", "veryFastImpute")'. 'na.action' is also used in 'randomUniformForest( )', where it additionally accepts the value "omit". See ?rufImpute for more details and examples

One can also transform regression responses into factors and send them to the 'clusteredObject' option

Some bugs have also been corrected

release, since it uses many layers that need more optimization

comparable to 1

all the features and their effect on the dependent variable.

randomUniformForest 1.1.3

and to be more flexible. Options and examples have also been updated and are more detailed

in a compact and granular representation. It also works for classification

in a beta release

While not documented for now, one can look at the examples in the 'unsupervised.randomUniformForest( )' function

for any proximity, leading to a binary matrix. The new and default one follows exactly Breiman's formulation.

The associated option, 'sparseProximities', lets one choose between the two

or names of the variables (between quotes), or "all" (as before

instead of a part of it (which were formerly, for each tree, OOB cases in the rebalanced sample

It is now consistent with 'threads

randomUniformForest 1.1.2 (2015-01-06)

that are, in a second step, updated by extending all non pure nodes. The option is mainly designed to reduce memory

footprint while possibly providing faster convergence and computing times. Note that many other usages can be found.

could be selected many times in each node ('the sampling with replacement features'), leading to a lower accuracy

This new release allows categorical variables to possibly benefit further from large 'mtry' values.

For more information about how categorical variables are handled, see the Details section of ?randomUniformForest

The first one allows the unsupervised learning engine to be updated with new data. The second one (while experimental) allows combining many unsupervised objects

Both lead to incremental unsupervised learning while the second one will be able to handle (at least partially) shifting distributions. Some bugs have also been corrected

More precisely, if 'bagging' is enabled and 'mtry' is lower than or equal to the dimension of the data, then 'mtry' variables are sampled without replacement
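The difference between the two sampling schemes can be illustrated with a minimal base R sketch. It does not use the package's internal code; the names 'p', 'without_replacement' and 'with_replacement' and the values chosen are assumptions made only for illustration.

```r
# Hypothetical illustration in base R; not the package's internal code.
set.seed(42)
p <- 10      # number of candidate variables (dimension of the data), assumed value
mtry <- 4    # number of variables tried at a node, with mtry <= p

# Sampling without replacement: each variable is drawn at most once.
without_replacement <- sample(seq_len(p), size = mtry, replace = FALSE)

# Sampling with replacement: the same variable may be drawn several times.
with_replacement <- sample(seq_len(p), size = mtry, replace = TRUE)

# No duplicates are possible in the first case.
print(anyDuplicated(without_replacement) == 0)
```

Sampling without replacement requires 'mtry' to be at most the number of variables, which matches the condition stated above.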

randomUniformForest 1.1.1 (2014-12-04)

a subsample is learned in a full unsupervised mode and the remaining points are predicted in the MDS space, before applying clustering.

Some options have also been removed, since they are not easy to assess. The next update will allow full support

A patch has been added to force all trees to have well-defined leaves.

Note that the algorithm deletes variables that are no longer useful during the growth of a tree (which was possibly a cause for the bug

Hence it still is a missing value from the point of view of original data,

but internally it is treated as an explicit numeric value different from the other numeric ones in the variable.

randomUniformForest 1.1.0 (2014-11-10)

Cut-points choice has been changed in order to separate classification, unsupervised mode and regression.

In Regression and unsupervised learning, cut-points are chosen using the whole support of each candidate variable,

as in the initial release and the ones prior to 1.0.9. In Classification, the support is now reduced to two random points

chosen among a small number. This leads to more diversity in trees.
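One possible reading of the two cut-point schemes is sketched below in base R. This is not the package's implementation: the helper names, the 'n_candidates' parameter and its default value are assumptions, and the classification variant is only one interpretation of "two random points chosen among a small number".

```r
# Hypothetical helpers in base R; names and 'n_candidates' are assumptions.
set.seed(1)

# Regression / unsupervised: draw the cut-point uniformly over the whole support.
cut_point_full_support <- function(x) {
  runif(1, min = min(x), max = max(x))
}

# Classification (one possible reading): sample a few observed values,
# keep two of them at random, and draw the cut-point uniformly between them.
cut_point_reduced_support <- function(x, n_candidates = 5) {
  candidates <- x[sample.int(length(x), size = min(n_candidates, length(x)))]
  bounds <- range(candidates[sample.int(length(candidates), size = 2)])
  runif(1, min = bounds[1], max = bounds[2])
}

x <- rnorm(100)
cp_full <- cut_point_full_support(x)
cp_reduced <- cut_point_reduced_support(x)
```

Because the reduced support depends on which two points are drawn, cut-points vary more from one tree to another, which is consistent with the added diversity mentioned above.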

randomUniformForest 1.0.9 (2014-11-07)

Previously, the whole support was chosen. This update on the optimization criterion leads to faster computation while not changing accuracy.

pre-processing (following Breiman's ideas) + randomUniformForest proximities matrix + Multidimensional scaling +

gap statistics (for estimating the number of clusters) + kmeans (for identifying every observation

Unsupervised learning thus introduces many concepts that still need to be properly referenced.

Note that implementation for large datasets is currently not integrated

of predictions and responses. 'generic.cv( )' function proceeds to a k-fold cross-validation for any algorithm (with however some manual code

Especially for large files (> 100 000 rows), increase both speed and accuracy when using the 'rUniformForest.big( )' function

Note that the latter splits data into chunks; therefore one cannot reach the accuracy of an algorithm (even the same one) using the whole sample

More precisely, large datasets benefit most from the optimizations added

randomUniformForest 1.0.7

They were formerly matched by the same function as continuous ones, leading to some trouble in variable importance and a loss of accuracy

Now, categorical variables use their own engine, from modelling to variable importance and selection

In addition, one can now use discrete values as categorical, e.g., in order to know if one frequent value in a variable can affect the response

Note that accuracy might drop compared with the default case, in which the algorithm considers all variables as continuous ones

better match of categorical variables and 3D representation

Value of quantile was between 1 and 99 and not 0 and 1 as required

Bug correction: when a prediction object was present, importance was not correctly computed, leading to wrong interactions between features

Interactions plot was also inverted (1st order was 2nd

It is now allowed to replace all original values of train responses by a random vector sampled from those values (mean and variance

using a Gaussian distribution

The random Uniform Forests summary is now unified, and one can also use the 'model.stats( )' function to assess predictions vs responses

'fillNA2.randomUniformForest( )' now works with large files

Prediction is now possible for only one observation

A sample of any size or class distribution is allowed with the 'rebalancedsampling' option