Title: | Constructing Hierarchical Voronoi Tessellations and Overlay Heatmaps for Data Analysis |
---|---|
Description: | Facilitates building topology preserving maps for data analysis. |
Authors: | Zubin Dowlaty [aut], Mu Sigma, Inc. [cre] |
Maintainer: | "Mu Sigma, Inc." <[email protected]> |
License: | Apache License 2.0 |
Version: | 25.2.2 |
Built: | 2025-03-07 07:11:14 UTC |
Source: | CRAN |
This is the main function to perform hierarchical clustering analysis which determines optimal number of clusters, perform AGNES clustering and plot the 2D cluster hvt plot.
clustHVT( data, trainHVT_results, scoreHVT_results, clustering_method = "ward.D2", indices, clusters_k = "champion", type = "default", domains.column )
clustHVT( data, trainHVT_results, scoreHVT_results, clustering_method = "ward.D2", indices, clusters_k = "champion", type = "default", domains.column )
data |
Data frame. A data frame intended for performing hierarchical clustering analysis. |
trainHVT_results |
List. A list object which is obtained as a result of trainHVT function. |
scoreHVT_results |
List. A list object which is obtained as a result of scoreHVT function. |
clustering_method |
Character. The method used for clustering in both NbClust and hclust function. Defaults to ‘ward.D2’. |
indices |
Character. The indices used for determining the optimal number of clusters in NbClust function. By default it uses 20 different indices. |
clusters_k |
Character. A parameter that specifies the number of clusters for the provided data. The options include “champion,” “challenger,” or any integer between 1 and 20. Selecting “champion” will use the highest number of clusters recommended by the ‘NbClust’ function, while “challenger” will use the second-highest recommendation. If a numerical value from 1 to 20 is provided, that exact number will be used as the number of clusters. |
type |
Character. The type of output required. Default is 'default'. Other option is 'plot' which will return only the clustered heatmap. |
domains.column |
Character. A vector of cluster names for the clustered heatmap. Used only when type is 'plot'. |
A list object that contains the hierarchical clustering results.
[[1]] |
Summary of k suggested by all indices with plots |
[[2]] |
A dendogram plot with the selected number of clusters |
[[3]] |
A 2D Cluster HVT Plotly visualization that colors cells according to clusters derived from AGNES clustering results. It is interactive, allowing users to view cell contents by hovering over them |
Vishwavani <[email protected]>
data("EuStockMarkets") dataset <- data.frame(t = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$t hvt.results<- trainHVT(dataset[-1],n_cells = 30, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results, analysis.plots = TRUE, names.column = dataset[,1]) centroid_data <- scoring$centroidData hclust_data_1 <- centroid_data[,2:3] clust.results <- clustHVT(data = hclust_data_1, trainHVT_results = hvt.results, scoreHVT_results = scoring, clusters_k = 'champion', indices = 'hartigan')
data("EuStockMarkets") dataset <- data.frame(t = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$t hvt.results<- trainHVT(dataset[-1],n_cells = 30, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results, analysis.plots = TRUE, names.column = dataset[,1]) centroid_data <- scoring$centroidData hclust_data_1 <- centroid_data[,2:3] clust.results <- clustHVT(data = hclust_data_1, trainHVT_results = hvt.results, scoreHVT_results = scoring, clusters_k = 'champion', indices = 'hartigan')
This is the main function for displaying data in table format
displayTable(data, scroll = TRUE, limit = 20)
displayTable(data, scroll = TRUE, limit = 20)
data |
Data frame. The dataframe to be displayed in table format. |
scroll |
Logical. A value to have a scroll or not in the table. Default is TRUE. |
limit |
Numeric. A value to indicate how many rows to display. Default is 20. |
A table with proper formatting for html notebook
Vishwavani <[email protected]>
data <- datasets::EuStockMarkets dataset <- as.data.frame(data) displayTable(dataset)
data <- datasets::EuStockMarkets dataset <- as.data.frame(data) displayTable(dataset)
This is the main function that provides exploratory data analysis plots
edaPlots( df, time_column, output_type = "summary", n_cols = -1, grey_bars = NULL )
edaPlots( df, time_column, output_type = "summary", n_cols = -1, grey_bars = NULL )
df |
Dataframe. A data frame object. |
time_column |
Character. The name of the time column in the data frame. Can be given only when the data is time series |
output_type |
Character. The name of the output to be displayed. Options are 'summary', 'histogram', 'boxplot', 'timeseries' & 'correlation'. Default value is summary. |
n_cols |
Numeric. A value to indicate how many columns to be included in the output. |
grey_bars |
List. A list of timestamps where each list contains two elements: start and end period, which will be highlighted in gray in the time series plot. Default value is NULL. |
Five objects which include time series plots, data distribution plots, box plots, correlation plot and a descriptive statistics table.
Vishwavani <[email protected]>
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = as.numeric(EuStockMarkets[, "DAX"]), SMI = as.numeric(EuStockMarkets[, "SMI"]), CAC = as.numeric(EuStockMarkets[, "CAC"]), FTSE = as.numeric(EuStockMarkets[, "FTSE"])) edaPlots(dataset) edaPlots(dataset, time_column = 'date', output_type = 'timeseries', n_cols = 4)
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = as.numeric(EuStockMarkets[, "DAX"]), SMI = as.numeric(EuStockMarkets[, "SMI"]), CAC = as.numeric(EuStockMarkets[, "CAC"]), FTSE = as.numeric(EuStockMarkets[, "FTSE"])) edaPlots(dataset) edaPlots(dataset, time_column = 'date', output_type = 'timeseries', n_cols = 4)
This is the main function to create transition probability matrix The transition probability matrix quantifies the likelihood of transitioning from one state to another. States: The table includes the current states and the possible next states. Probabilities: For each current state, it lists the probability of transitioning to each of the next possible states.
getTransitionProbability( df, cellid_column, time_column, type = "with_self_state" )
getTransitionProbability( df, cellid_column, time_column, type = "with_self_state" )
df |
Data frame. The input data frame should contain two columns, cell ID from scoreHVT function and time stamp of that dataset. |
cellid_column |
Character. Name of the column containing cell IDs. |
time_column |
Character. Name of the column containing time stamps. |
type |
Character. A character value indicating the type of transition probability table to create. Accepted entries are "with_self_state" and "without_self_state". |
Stores a data frames with transition probabilities.
PonAnuReka Seenivasan <[email protected]>, Vishwavani <[email protected]>
dataset <- data.frame(t = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset[-1],n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$t dataset <- data.frame(cell_id, time_stamp) table <- getTransitionProbability(dataset, cellid_column = "cell_id",time_column = "time_stamp")
dataset <- data.frame(t = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset[-1],n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$t dataset <- data.frame(cell_id, time_stamp) table <- getTransitionProbability(dataset, cellid_column = "cell_id",time_column = "time_stamp")
This is the main function to perform Monte Carlo simulations of Markov Chain on the dynamic forecasting of HVT States of a time series dataset. It includes both ex-post and ex-ante analysis offering valuable insights into future trends while resolving state transition challenges through clustering and nearest-neighbor methods to enhance simulation accuracy.
msm( state_time_data, forecast_type = "ex-post", initial_state, n_ahead_ante, transition_probability_matrix, num_simulations = 100, trainHVT_results, scoreHVT_results, actual_data = NULL, raw_dataset, k = 5, handle_problematic_states = FALSE, n_nearest_neighbor = 1, show_simulation = TRUE, mae_metric = "median" )
msm( state_time_data, forecast_type = "ex-post", initial_state, n_ahead_ante, transition_probability_matrix, num_simulations = 100, trainHVT_results, scoreHVT_results, actual_data = NULL, raw_dataset, k = 5, handle_problematic_states = FALSE, n_nearest_neighbor = 1, show_simulation = TRUE, mae_metric = "median" )
state_time_data |
DataFrame. A dataframe containing state transitions over time(cell id and timestamp) |
forecast_type |
Character. A character to indicate the type of forecasting. Accepted values are "ex-post" or "ex-ante". |
initial_state |
Numeric. An integer indicatiog the state at t0. |
n_ahead_ante |
Numeric. A vector of n ahead points to be predicted further in ex-ante analyzes. |
transition_probability_matrix |
DataFrame. A dataframe of transition probabilities/ output of 'getTransitionProbability' function |
num_simulations |
Integer. A number indicating the total number of simulations to run. Default is 100. |
trainHVT_results |
List.'trainHVT' function output |
scoreHVT_results |
List. 'scoreHVT' function output |
actual_data |
Dataframe. A dataFrame for ex-post prediction period with teh actual raw data values |
raw_dataset |
DataFrame. A dataframe of input raw dataset from the mean and standard deviation will be calculated to scale up the predicted values |
k |
Integer. A number of optimal clusters when handling problematic states. Default is 5. |
handle_problematic_states |
Logical. To indicate whether to handle problematic states or not. Default is FALSE. |
n_nearest_neighbor |
Integer. A number of nearest neighbors to consider when handling problematic states. Default is 1. |
show_simulation |
Logical. To indicate whether to show the simulation lines in plots or not. Default is TRUE. |
mae_metric |
Character. A character to indicate which metric to calculate Mean Absolute Error. Accepted entries are "mean", "median", or "mode". Default is "median". |
A list object that contains the forecasting plots and MAE values.
[[1]] |
Simulation plots and MAE values for state and centroids plot |
[[2]] |
Summary Table, Dendogram plot and Clustered Heatmap when handle_problematic_states is TRUE |
Vishwavani <[email protected]>
dataset <- data.frame(t = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset[,-1],n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$t temporal_data <- data.frame(cell_id, time_stamp) table <- getTransitionProbability(temporal_data, cellid_column = "cell_id",time_column = "time_stamp") colnames(temporal_data) <- c("Cell.ID","t") ex_post_forecasting <- dataset[1800:1860,] ex_post <- msm(state_time_data = temporal_data, forecast_type = "ex-post", transition_probability_matrix = table, initial_state = 2, num_simulations = 100, scoreHVT_results = scoring, trainHVT_results = hvt.results, actual_data = ex_post_forecasting, raw_dataset = dataset, mae_metric = "median", show_simulation = FALSE)
dataset <- data.frame(t = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset[,-1],n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$t temporal_data <- data.frame(cell_id, time_stamp) table <- getTransitionProbability(temporal_data, cellid_column = "cell_id",time_column = "time_stamp") colnames(temporal_data) <- c("Cell.ID","t") ex_post_forecasting <- dataset[1800:1860,] ex_post <- msm(state_time_data = temporal_data, forecast_type = "ex-post", transition_probability_matrix = table, initial_state = 2, num_simulations = 100, scoreHVT_results = scoring, trainHVT_results = hvt.results, actual_data = ex_post_forecasting, raw_dataset = dataset, mae_metric = "median", show_simulation = FALSE)
This is the main function for generating flow maps and animations based on transition probabilities including self states and excluding self states. Flow maps are a type of data visualization used to represent the transition probability of different states. Animations are the gifs used to represent the movement of data through the cells.
plotAnimatedFlowmap( hvt_model_output, transition_probability_df, df, animation = "All", flow_map = "All", fps_time = 1, fps_state = 1, time_duration = 2, state_duration = 2, cellid_column, time_column )
plotAnimatedFlowmap( hvt_model_output, transition_probability_df, df, animation = "All", flow_map = "All", fps_time = 1, fps_state = 1, time_duration = 2, state_duration = 2, cellid_column, time_column )
hvt_model_output |
List. Output from a trainHVT function. |
transition_probability_df |
List. Output from getTransitionProbability function |
df |
Data frame. The input dataframe should contain two columns, cell ID from scoreHVT function and time stamp of that dataset. |
animation |
Character. Type of animation ('state_based', 'time_based', 'All' or NULL) |
flow_map |
Character. Type of flow map ('self_state', 'without_self_state', 'All' or NULL) |
fps_time |
Numeric. A numeric value for the frames per second of the time transition gif. (Must be a numeric value and a factor of 100). Default value is 1. |
fps_state |
Numeric. A numeric value for the frames per second of the state transition gif. (Must be a numeric value and a factor of 100). Default value is 1. |
time_duration |
Numeric. A numeric value for the duration of the time transition gif. Default value is 2. |
state_duration |
Numeric. A numeric value for the duration of the state transition gif. Default value is 2. |
cellid_column |
Character. Name of the column containing cell IDs. |
time_column |
Character. Name of the column containing time stamps |
A list of flow maps and animation gifs.
PonAnuReka Seenivasan <[email protected]>, Vishwavani <[email protected]>
trainHVT
scoreHVT
getTransitionProbability
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$date dataset <- data.frame(cell_id, time_stamp) table <- getTransitionProbability(dataset, cellid_column = "cell_id",time_column = "time_stamp") plots <- plotAnimatedFlowmap(hvt_model_output = hvt.results, transition_probability_df = table, df = dataset, animation = 'All', flow_map = 'All',fps_time = 1,fps_state = 1,time_duration = 3, state_duration = 3,cellid_column = "cell_id", time_column = "time_stamp")
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$date dataset <- data.frame(cell_id, time_stamp) table <- getTransitionProbability(dataset, cellid_column = "cell_id",time_column = "time_stamp") plots <- plotAnimatedFlowmap(hvt_model_output = hvt.results, transition_probability_df = table, df = dataset, animation = 'All', flow_map = 'All',fps_time = 1,fps_state = 1,time_duration = 3, state_duration = 3,cellid_column = "cell_id", time_column = "time_stamp")
This is the main plotting function to construct hierarchical voronoi tessellations in 1D,2D or Interactive surface plot.
plotHVT( hvt.results, line.width = 0.5, color.vec = "black", centroid.size = 0.6, centroid.color = "black", child.level = 1, hmap.cols, separation_width = 7, layer_opacity = c(0.5, 0.75, 0.99), dim_size = 1000, plot.type = "2Dhvt", quant.error.hmap = NULL, cell_id = FALSE, cell_id_position = "bottom", cell_id_size = 2.6 )
plotHVT( hvt.results, line.width = 0.5, color.vec = "black", centroid.size = 0.6, centroid.color = "black", child.level = 1, hmap.cols, separation_width = 7, layer_opacity = c(0.5, 0.75, 0.99), dim_size = 1000, plot.type = "2Dhvt", quant.error.hmap = NULL, cell_id = FALSE, cell_id_position = "bottom", cell_id_size = 2.6 )
hvt.results |
(1D/2DProj/2Dhvt/2Dheatmap/surface_plot) List. A list containing the output of |
line.width |
(2Dhvt/2Dheatmap) Numeric Vector. A vector indicating the line widths of the tessellation boundaries for each level. |
color.vec |
(2Dhvt/2Dheatmap) Vector. A vector indicating the colors of the boundaries of the tessellations at each level. |
centroid.size |
(2Dhvt/2Dheatmap) Numeric Vector. A vector indicating the size of centroids for each level. |
centroid.color |
(2Dhvt/2Dheatmap) Numeric Vector. A vector indicating the color of centroids for each level. |
child.level |
(2Dheatmap/surface_plot) Numeric. Indicating the level for which the plot should be displayed |
hmap.cols |
(2Dheatmap/surface_plot) Numeric or Character. The column number or column name from the dataset indicating the variables for which the heat map is to be plotted. |
separation_width |
(surface_plot) Numeric. An integer indicating the width between hierarchical levels in surface plot |
layer_opacity |
(surface_plot) Numeric. A vector indicating the opacity of each hierarchical levels in surface plot |
dim_size |
(surface_plot) Numeric. An integer controls the resolution or granularity of the 3D surface grid |
plot.type |
Character. An option to indicate which type of plot should be generated. Accepted entries are '1D','2Dproj','2Dhvt','2Dheatmap'and 'surface_plot'. Default value is '2Dhvt'. |
quant.error.hmap |
(2Dheatmap) Numeric. A number representing the quantization error threshold to be highlighted in the heatmap. When a value is provided, it will emphasize cells with quantization errors equal or less than the specified threshold, indicating that these cells cannot be further subdivided in the next depth layer. The default value is NULL, meaning all cells will be colored in the heatmap across various depths. |
cell_id |
(2Dhvt/2Dheatmap) Logical. A logical indicating whether the cell IDs should be displayed |
cell_id_position |
(2Dhvt/2Dheatmap) Character. A character indicating the position of the cell IDs. Accepted entries are 'top' , 'bottom', 'left' and 'right'. |
cell_id_size |
(2Dhvt/2Dheatmap) Numeric. A numeric vector indicating the size of the cell IDs for all levels. |
plot object containing the visualizations of reduced dimension(1D/2D) for the given dataset.
Shubhra Prakash <[email protected]>, Sangeet Moy Das <[email protected]>, Vishwavani <[email protected]>
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method="kmeans") #change the 'plot.type' argument to '2Dproj' or '2DHVT' to visualize respective plots. plotHVT(hvt.results, plot.type='1D') #change the 'plot.type' argument to 'surface_plot' to visualize the Interactive surface plot plotHVT(hvt.results,child.level = 1, hmap.cols = "DAX", plot.type = '2Dheatmap')
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method="kmeans") #change the 'plot.type' argument to '2Dproj' or '2DHVT' to visualize respective plots. plotHVT(hvt.results, plot.type='1D') #change the 'plot.type' argument to 'surface_plot' to visualize the Interactive surface plot plotHVT(hvt.results,child.level = 1, hmap.cols = "DAX", plot.type = '2Dheatmap')
This is the main function that generates diagnostic plots for hierarchical voronoi tessellations models and scoring.
plotModelDiagnostics(model_obj)
plotModelDiagnostics(model_obj)
model_obj |
List. A list obtained from the trainHVT function or scoreHVT function |
For trainHVT, Minimum Intra-DataPoint Distance Plot, Minimum Intra-Centroid Distance Plot Mean Absolute Deviation Plot, Distribution of Number of Observations in Cells, for Training Data and Mean Absolute Deviation Plot for Validation Data are plotted. For scoreHVT Mean Absolute Deviation Plot for Training Data and Validation Data are plotted
Shubhra Prakash <[email protected]>
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE, quant_method="kmeans", diagnose = TRUE, hvt_validation = TRUE) plotModelDiagnostics(hvt.results)
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE, quant_method="kmeans", diagnose = TRUE, hvt_validation = TRUE) plotModelDiagnostics(hvt.results)
This is the main plotting function to construct hierarchical voronoi tessellations and highlight the outlier cells
plotNovelCells( plot.cells, hvt.map, line.width = c(0.6), color.vec = c("#141B41"), pch = 21, centroid.size = 0.5, title = NULL, maxDepth = 1 )
plotNovelCells( plot.cells, hvt.map, line.width = c(0.6), color.vec = c("#141B41"), pch = 21, centroid.size = 0.5, title = NULL, maxDepth = 1 )
plot.cells |
Vector. A vector indicating the cells to be highlighted in the map |
hvt.map |
List. A list containing the output of |
line.width |
Numeric Vector. A vector indicating the line widths of the tessellation boundaries for each level |
color.vec |
Vector. A vector indicating the colors of the boundaries of the tessellations at each level |
pch |
Numeric. Symbol of the centroids of the tessellations (parent levels) Default value is 21. |
centroid.size |
Numeric. Size of centroids of first level tessellations. Default value is 0.5 |
title |
String. Set a title for the plot. (default = NULL) |
maxDepth |
Numeric. An integer indicating the number of levels. (default = NULL) |
Returns a ggplot object containing hierarchical voronoi tessellation plot highlighting the outlier cells
Shantanu Vaidya <[email protected]>
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method="kmeans") #selected 55,58 are for demo purpose plotNovelCells(c(55,58),hvt.results)
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method="kmeans") #selected 55,58 are for demo purpose plotNovelCells(c(55,58),hvt.results)
This is the function that produces histograms displaying the distribution of Quantization Error (QE) values for both train and test datasets, highlighting mean values with dashed lines for quick evaluation.
plotQuantErrorHistogram(hvt.results, hvt.scoring)
plotQuantErrorHistogram(hvt.results, hvt.scoring)
hvt.results |
List. A list of hvt.results obtained from the trainHVT function. |
hvt.scoring |
List. A list of hvt.scoring obtained from the scoreHVT function. |
Returns the ggplot object containing the quantization error distribution plots for the given HVT results of training and scoring
Shubhra Prakash <[email protected]>
data("EuStockMarkets") dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$date #Split in train and test train <- EuStockMarkets[1:1302, ] test <- EuStockMarkets[1303:1860, ] hvt.results<- trainHVT(train,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE, quant_method = "kmeans") scoring <- scoreHVT(test, hvt.results) plotQuantErrorHistogram(hvt.results, scoring)
data("EuStockMarkets") dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$date #Split in train and test train <- EuStockMarkets[1:1302, ] test <- EuStockMarkets[1303:1860, ] hvt.results<- trainHVT(train,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE, quant_method = "kmeans") scoring <- scoreHVT(test, hvt.results) plotQuantErrorHistogram(hvt.results, scoring)
This is the main function to create a state transition plot from a data frame. A state transition plot is a type of data visualization used to represent the changes or transitions in states over time for a given system. State refers to a particular condition or status of a cell at a specific point in time. Transition refers to the change of state for a cell from one condition to another over time.
plotStateTransition( df, sample_size = NULL, line_plot = NULL, cellid_column, time_column, v_intercept = NULL, time_periods = NULL )
plotStateTransition( df, sample_size = NULL, line_plot = NULL, cellid_column, time_column, v_intercept = NULL, time_periods = NULL )
df |
Data frame. The Input data frame should contain two columns. Cell ID from scoreHVT function and time stamp of that dataset. |
sample_size |
Numeric. An integer indicating the fraction of the data frame to visualize in the plot. Default value is 0.2 |
line_plot |
Logical. A logical value indicating to create a line plot. Default value is NULL. |
cellid_column |
Character. Name of the column containing cell IDs. |
time_column |
Character. Name of the column containing time stamps. |
v_intercept |
Numeric. A numeric value indicating the time stamp to draw a vertical line on the plot. |
time_periods |
List. A list of vectors, each containing start and end times for highlighting time periods. |
A plotly object representing the state transition plot for the given data frame.
PonAnuReka Seenivasan <[email protected]>, Vishwavani <[email protected]>
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$date dataset <- data.frame(cell_id, time_stamp) plotStateTransition(dataset, sample_size = 1, cellid_column = "cell_id",time_column = "time_stamp")
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$date dataset <- data.frame(cell_id, time_stamp) plotStateTransition(dataset, sample_size = 1, cellid_column = "cell_id",time_column = "time_stamp")
This is the main function to plot the z scores against cell ids.
plotZscore( data, cell_range = NULL, segment_size = 2, reference_lines = c(-1.65, 1.65) )
plotZscore( data, cell_range = NULL, segment_size = 2, reference_lines = c(-1.65, 1.65) )
data |
Data frame. A data frame of cell id and features. |
cell_range |
Vector. A numeric vector of cell id range for which the plot should be displayed. Default is NULL, which plots all the cells. |
segment_size |
Integer. A numeric value to indicate the size of the bars in the plot. Default is 2. |
reference_lines |
Vector. A numeric vector of confidence interval values for the reference lines in the plot. Default is c(-1.65, 1.65). |
A grid of plots of z score against cell id of teh given features.
Vishwavani <[email protected]>
data("EuStockMarkets") dataset <- data.frame(t = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$t hvt.results<- trainHVT(dataset[-1],n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") col_names <- c("Cell.ID","DAX","SMI","CAC","FTSE") data <- dplyr::arrange(dplyr::select(hvt.results[[3]][["summary"]],col_names),Cell.ID) data <- round(data, 2) plotZscore(data)
data("EuStockMarkets") dataset <- data.frame(t = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$t hvt.results<- trainHVT(dataset[-1],n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") col_names <- c("Cell.ID","DAX","SMI","CAC","FTSE") data <- dplyr::arrange(dplyr::select(hvt.results[[3]][["summary"]],col_names),Cell.ID) data <- round(data, 2) plotZscore(data)
This is the main function for creating reconciliation plots and tables which helps in comparing the transition probabilities calculated manually and from markovchain function
reconcileTransitionProbability( df, hmap_type = NULL, cellid_column, time_column )
reconcileTransitionProbability( df, hmap_type = NULL, cellid_column, time_column )
df |
Data frame. The input data frame should contain two columns, cell ID from scoreHVT function and timestamp of that dataset. |
hmap_type |
Character. ('self_state', 'without_self_state', or 'All') |
cellid_column |
Character. Name of the column containing cell IDs. |
time_column |
Character. Name of the column containing timestamps |
A list of plotly heatmap objects and tables representing the transition probability heatmaps.
PonAnuReka Seenivasan <[email protected]>, Vishwavani <[email protected]>
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$date dataset <- data.frame(cell_id, time_stamp) reconcileTransitionProbability(dataset, hmap_type = "All", cellid_column = "cell_id", time_column = "time_stamp")
dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) hvt.results<- trainHVT(dataset,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(dataset, hvt.results) cell_id <- scoring$scoredPredictedData$Cell.ID time_stamp <- dataset$date dataset <- data.frame(cell_id, time_stamp) reconcileTransitionProbability(dataset, hmap_type = "All", cellid_column = "cell_id", time_column = "time_stamp")
This function is used to remove the identified novelty cells.
removeNovelty(outlier_cells, hvt_results)
removeNovelty(outlier_cells, hvt_results)
outlier_cells |
Vector. A vector with the cell number of the identified novelty |
hvt_results |
List. A list having the results of the compressed map i.e. output of |
A list of two items
[[1]] |
Dataframe of novelty cell(s) |
[[2]] |
Dataframe without the novelty cell(s) from the dataset used in model training |
Shantanu Vaidya <[email protected]>
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method="kmeans") identified_Novelty_cells <<- c(2, 10) output_list <- removeNovelty(identified_Novelty_cells, hvt.results) data_with_novelty <- output_list[[1]] data_without_novelty <- output_list[[2]]
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method="kmeans") identified_Novelty_cells <<- c(2, 10) output_list <- removeNovelty(identified_Novelty_cells, hvt.results) data_with_novelty <- output_list[[1]] data_without_novelty <- output_list[[2]]
This function scores each data point in the test dataset based on a trained hierarchical Voronoi tessellations model.
scoreHVT( dataset, hvt.results.model, child.level = 1, mad.threshold = 0.2, line.width = 0.6, color.vec = c("navyblue", "slateblue", "lavender"), normalize = TRUE, distance_metric = "L1_Norm", error_metric = "max", yVar = NULL, analysis.plots = FALSE, names.column = NULL )
scoreHVT( dataset, hvt.results.model, child.level = 1, mad.threshold = 0.2, line.width = 0.6, color.vec = c("navyblue", "slateblue", "lavender"), normalize = TRUE, distance_metric = "L1_Norm", error_metric = "max", yVar = NULL, analysis.plots = FALSE, names.column = NULL )
dataset |
Data frame. A data frame which to be scored. Can have categorical columns if 'analysis.plots' are required. |
hvt.results.model |
List. A list obtained from the trainHVT function |
child.level |
Numeric. A number indicating the depth for which the heat map is to be plotted. |
mad.threshold |
Numeric. A numeric value indicating the permissible Mean Absolute Deviation. |
line.width |
Vector. A vector indicating the line widths of the tessellation boundaries for each layer. |
color.vec |
Vector. A vector indicating the colors of the tessellation boundaries at each layer. |
normalize |
Logical. A logical value indicating if the dataset should be normalized. When set to TRUE, the data (testing dataset) is standardized by ‘mean’ and ‘sd’ of the training dataset referred from the trainHVT(). When set to FALSE, the data is used as such without any changes. |
distance_metric |
Character. The distance metric can be L1_Norm(Manhattan) or L2_Norm(Eucledian). L1_Norm is selected by default. The distance metric is used to calculate the distance between an n dimensional point and centroid. The distance metric can be different from the one used during training. |
error_metric |
Character. The error metric can be mean or max. max is selected by default. max will return the max of m values and mean will take mean of m values where each value is a distance between a point and centroid of the cell. |
yVar |
Character. A character or a vector representing the name of the dependent variable(s) |
analysis.plots |
Logical. A logical value indicating that the scored plot should be plotted or not. If TRUE, the identifier column(character column) name should be supplied in 'names.column' argument. The output will be a 2D heatmap plotly which gives info on the cell id and the observations of a cell. |
names.column |
Character. A character or a vector representing the name of the identifier column/character column. |
Dataframe containing scored data, plots and summary
Shubhra Prakash <[email protected]>, Sangeet Moy Das <[email protected]> , Vishwavani <[email protected]>
data("EuStockMarkets") dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$date # Split in train and test train <- EuStockMarkets[1:1302, ] test <- EuStockMarkets[1303:1860, ] #model training hvt.results<- trainHVT(train,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(test, hvt.results) data_scored <- scoring$scoredPredictedData
data("EuStockMarkets") dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$date # Split in train and test train <- EuStockMarkets[1:1302, ] test <- EuStockMarkets[1303:1860, ] #model training hvt.results<- trainHVT(train,n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") scoring <- scoreHVT(test, hvt.results) data_scored <- scoring$scoredPredictedData
This function that scores the cell and corresponding layer for each data point in a test dataset using three hierarchical vector quantization (HVT) models (Map A, Map B, Map C) and returns a data frame containing the scored layer output. The function incorporates the scored results from each map and merges them to provide a comprehensive result.
scoreLayeredHVT( data, hvt_mapA, hvt_mapB, hvt_mapC, mad.threshold = 0.2, normalize = TRUE, seed = 300, distance_metric = "L1_Norm", error_metric = "max", child.level = 1, yVar = NULL )
scoreLayeredHVT( data, hvt_mapA, hvt_mapB, hvt_mapC, mad.threshold = 0.2, normalize = TRUE, seed = 300, distance_metric = "L1_Norm", error_metric = "max", child.level = 1, yVar = NULL )
data |
Data Frame. A data frame containing test dataset. The data frame should have all the variable(features) used for training. |
hvt_mapA |
A list of hvt.results.model obtained from trainHVT function while performing 'trainHVT()' on train data |
hvt_mapB |
A list of hvt.results.model obtained from trainHVT function while performing 'trainHVT()' on data with novelty(s) |
hvt_mapC |
A list of hvt.results.model obtained from trainHVT function while performing 'trainHVT()' on data without novelty(s) |
mad.threshold |
Numeric. A number indicating the permissible Mean Absolute Deviation |
normalize |
Logical. A logical value indicating if the dataset should be normalized. When set to TRUE, the data (testing dataset) is standardized by 'mean' and 'sd' of the training dataset referred from the trainHVT(). When set to FALSE, the data is used as such without any changes. (Default value is TRUE). |
seed |
Numeric. Random Seed. |
distance_metric |
Character. The distance metric can be L1_Norm(Manhattan) or L2_Norm(Eucledian). L1_Norm is selected by default. The distance metric is used to calculate the distance between an n dimensional point and centroid. The distance metric can be different from the one used during training. |
error_metric |
Character. The error metric can be mean or max. max is selected by default. max will return the max of m values and mean will take mean of m values where each value is a distance between a point and centroid of the cell. |
child.level |
Numeric. A number indicating the level for which the heat map is to be plotted. |
yVar |
Character. A character or a vector representing the name of the dependent variable(s) |
Dataframe containing scored layer output
Shubhra Prakash <[email protected]>, Sangeet Moy Das <[email protected]>, Shantanu Vaidya <[email protected]>,Somya Shambhawi <[email protected]>
data("EuStockMarkets") dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$date train <- EuStockMarkets[1:1302, ] test <- EuStockMarkets[1303:1860, ] ###MAP-A hvt_mapA <- trainHVT(train, n_cells = 150, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") identified_Novelty_cells <- c(127,55,83,61,44,35,27,77) output_list <- removeNovelty(identified_Novelty_cells, hvt_mapA) data_with_novelty <- output_list[[1]] data_with_novelty <- data_with_novelty[, -c(1,2)] ### MAP-B hvt_mapB <- trainHVT(data_with_novelty,n_cells = 10, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") data_without_novelty <- output_list[[2]] ### MAP-C hvt_mapC <- trainHVT(data_without_novelty,n_cells = 135, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", quant_method = "kmeans", normalize = TRUE) ##SCORE LAYERED data_scored <- scoreLayeredHVT(test, hvt_mapA, hvt_mapB, hvt_mapC)
data("EuStockMarkets") dataset <- data.frame(date = as.numeric(time(EuStockMarkets)), DAX = EuStockMarkets[, "DAX"], SMI = EuStockMarkets[, "SMI"], CAC = EuStockMarkets[, "CAC"], FTSE = EuStockMarkets[, "FTSE"]) rownames(EuStockMarkets) <- dataset$date train <- EuStockMarkets[1:1302, ] test <- EuStockMarkets[1303:1860, ] ###MAP-A hvt_mapA <- trainHVT(train, n_cells = 150, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") identified_Novelty_cells <- c(127,55,83,61,44,35,27,77) output_list <- removeNovelty(identified_Novelty_cells, hvt_mapA) data_with_novelty <- output_list[[1]] data_with_novelty <- data_with_novelty[, -c(1,2)] ### MAP-B hvt_mapB <- trainHVT(data_with_novelty,n_cells = 10, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method = "kmeans") data_without_novelty <- output_list[[2]] ### MAP-C hvt_mapC <- trainHVT(data_without_novelty,n_cells = 135, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", quant_method = "kmeans", normalize = TRUE) ##SCORE LAYERED data_scored <- scoreLayeredHVT(test, hvt_mapA, hvt_mapB, hvt_mapC)
This is the main function for displaying summary from model training and scoring
summary(data, limit = 20, scroll = TRUE)
summary(data, limit = 20, scroll = TRUE)
data |
List. A listed object from trainHVT or scoreHVT |
limit |
Numeric. A value to indicate how many rows to display. |
scroll |
Logical. A value to indicate whether to display scroll bar or not. Default value is TRUE. |
A consolidated table of summary for training, scoring and forecasting
Vishwavani <[email protected]>, Alimpan Dey <[email protected]>
data <- datasets::EuStockMarkets dataset <- as.data.frame(data) #model training hvt.results <- trainHVT(dataset, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE, quant_method = "kmeans", dim_reduction_method = 'sammon') summary(data = hvt.results)
data <- datasets::EuStockMarkets dataset <- as.data.frame(data) #model training hvt.results <- trainHVT(dataset, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE, quant_method = "kmeans", dim_reduction_method = 'sammon') summary(data = hvt.results)
This is the main function to construct hierarchical voronoi tessellations. This is done using hierarchical vector quantization(hvq). The data is represented in 2D coordinates and the tessellations are plotted using these coordinates as centroids. For subsequent levels, transformation is performed on the 2D coordinates to get all the points within its parent tile. Tessellations are plotted using these transformed points as centroids.
trainHVT( dataset, min_compression_perc = NA, n_cells = NA, depth = 1, quant.err = 0.2, normalize = FALSE, distance_metric = "L1_Norm", error_metric = "max", quant_method = "kmeans", scale_summary = NA, diagnose = FALSE, hvt_validation = FALSE, train_validation_split_ratio = 0.8, dim_reduction_method = "sammon", tsne_theta = 0.2, tsne_eta = 200, tsne_perplexity = 30, tsne_verbose = TRUE, tsne_max_iter = 500, umap_n_neighbors = 60, umap_n_components = 2, umap_min_dist = 0.1 )
trainHVT( dataset, min_compression_perc = NA, n_cells = NA, depth = 1, quant.err = 0.2, normalize = FALSE, distance_metric = "L1_Norm", error_metric = "max", quant_method = "kmeans", scale_summary = NA, diagnose = FALSE, hvt_validation = FALSE, train_validation_split_ratio = 0.8, dim_reduction_method = "sammon", tsne_theta = 0.2, tsne_eta = 200, tsne_perplexity = 30, tsne_verbose = TRUE, tsne_max_iter = 500, umap_n_neighbors = 60, umap_n_components = 2, umap_min_dist = 0.1 )
dataset |
Data frame. A data frame, with numeric columns (features) will be used for training the model. |
min_compression_perc |
Numeric. An integer, indicating the minimum compression percentage to be achieved for the dataset. It indicates the desired level of reduction in dataset size compared to its original size. |
n_cells |
Numeric. An integer, indicating the number of cells per hierarchy (level). |
depth |
Numeric. An integer, indicating the number of levels. A depth of 1 means no hierarchy (single level), while higher values indicate multiple levels (hierarchy). |
quant.err |
Numeric. A number indicating the quantization error threshold. A cell will only breakdown into further cells if the quantization error of the cell is above the defined quantization error threshold. |
normalize |
Logical. A logical value indicating if the dataset should be normalized. When set to TRUE, scales the values of all features to have a mean of 0 and a standard deviation of 1 (Z-score). |
distance_metric |
Character. The distance metric can be L1_Norm(Manhattan) or L2_Norm(Eucledian). L1_Norm is selected by default. The distance metric is used to calculate the distance between an n dimensional point and centroid. |
error_metric |
Character. The error metric can be mean or max. max is selected by default. max will return the max of m values and mean will take mean of m values where each value is a distance between a point and centroid of the cell. |
quant_method |
Character. The quantization method can be kmeans or kmedoids. Kmeans uses means (centroids) as cluster centers while Kmedoids uses actual data points (medoids) as cluster centers. kmeans is selected by default. |
scale_summary |
List. A list with user-defined mean and standard deviation values for all the features in the dataset. Pass the scale summary when normalize is set to FALSE. |
diagnose |
Logical. A logical value indicating whether user wants to perform diagnostics on the model. Default value is FALSE. |
hvt_validation |
Logical. A logical value indicating whether user wants to holdout a validation set and find mean absolute deviation of the validation points from the centroid. Default value is FALSE. |
train_validation_split_ratio |
Numeric. A numeric value indicating train validation split ratio. This argument is only used when hvt_validation has been set to TRUE. Default value for the argument is 0.8. |
dim_reduction_method |
Character.The dim_reduction_method can be one of "tsne", "umap", "sammon". |
tsne_theta |
Numeric.The tsne_theta is only used when dim_reduction_method is set to "tsne". Default value is 0.5 and common values are between 0.2 and 0.5. |
tsne_eta |
Numeric.The tsne_eta are used only when dim_reduction method is set to "tsne". Default value is 200. |
tsne_perplexity |
Numeric.The tsne_perplexity is only used when dim_reduction_method is set to "tsne". Default value is 30 and common values are between between 30 and 50. |
tsne_verbose |
Logical. A logical value which indicates the t-SNE algorithm to print detailed information about its progress to the console. |
tsne_max_iter |
Numeric.The tsne_max_iter is used only when dim_reduction_method is set to "tsne". Default value is 1000.More iterations can improve results but increase computation time. |
umap_n_neighbors |
Integer.The umap_n_neighbors is used only when dim_reduction_method is set to "umap". Default value is 15.Controls the balance between local and global structure in data. |
umap_n_components |
Integer.The umap_n_components is used only when dim_reduction_method is set to "umap". Default value is 2.Indicates the number of dimensions for embedding. |
umap_min_dist |
Numeric.The umap_map_dist is used only when dim_reduction_method is set to "umap". Default value is 0.1.Controls how tightly UMAP packs points together. |
A Nested list that contains the hierarchical tessellation information. This list has to be given as input argument to plot the tessellations.
[[1]] |
A list containing information related to plotting tessellations. This information will include coordinates, boundaries, and other details necessary for visualizing the tessellations |
[[2]] |
A list containing information related to Sammon’s projection coordinates of the data points in the reduced-dimensional space. |
[[3]] |
A list containing detailed information about the hierarchical vector quantized data along with a summary section containing no of points, Quantization Error and the centroids for each cell. |
[[4]] |
A list that contains all the diagnostics information of the model when diagnose is set to TRUE. Otherwise NA. |
[[5]] |
A list that contains all the information required to generates a Mean Absolute Deviation (MAD) plot, if hvt_validation is set to TRUE. Otherwise NA |
[[6]] |
A list containing detailed information about the hierarchical vector quantized data along with a summary section containing no of points, Quantization Error and the centroids for each cell which is the output of 'hvq' |
[[7]] |
model info: A list that contains model-generated timestamp, input parameters passed to the model , the validation results and the dimensionality reduction evaluation metrics table. |
Shubhra Prakash <[email protected]>, Sangeet Moy Das <[email protected]>, Shantanu Vaidya <[email protected]>,Bidesh Ghosh <[email protected]>,Alimpan Dey <[email protected]>
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method="kmeans")
data("EuStockMarkets") hvt.results <- trainHVT(EuStockMarkets, n_cells = 60, depth = 1, quant.err = 0.1, distance_metric = "L1_Norm", error_metric = "max", normalize = TRUE,quant_method="kmeans")