check_zero_width_intervals() checks interval-valued data for zero-width
intervals (min == max). It accepts both MM format (a data.frame with
paired _min/_max columns) and RSDA format (a symbolic_tbl with
symbolic_interval columns; non-interval columns are ignored). It returns
(invisibly) a logical scalar carrying a "flagged" [n, p] matrix of
degenerate cells and a "variables" vector of affected variables, and
warns by default when any are found. Useful for screening data before
passing it to downstream tools that divide by interval width (e.g.
ggInterval::ggInterval_indexImage()). A tol argument allows near-zero
widths to be flagged.
aggregate_to_symbolic(type = "int") gains a zero_width argument
controlling how zero-width intervals (min == max) in the aggregated output
are handled: "keep" (default) leaves the output untouched, returning any
zero-width intervals as-is without warning; "remove" drops every concept
containing a zero-width interval; "regenerate" re-runs the aggregation
(re-clustering or re-sampling) until none remain (effective only for
stochastic group_by such as "kmeans"/"resampling"); and "adjust"
adds a small epsilon (default 1e-07) to the upper endpoint of each
zero-width interval. A companion epsilon argument sets the adjustment
amount. For "remove" and "adjust", a single warning names the affected
variables and the action taken. Such degenerate intervals otherwise break
downstream tools such as ggInterval::ggInterval_indexImage() (which divides
by interval width); use check_zero_width_intervals() to screen for them.
Nine interval time series (ITS) datasets from global financial markets and meteorology:
sp500.its — S&P 500 index daily OHLC intervals.djia.its — Dow Jones Industrial Average daily OHLC intervals.ibovespa.its — Ibovespa (Brazil) daily OHLC intervals.crude_oil_wti.its — WTI crude oil daily OHLC intervals.merval.its — MERVAL (Argentina) daily OHLC intervals.petrobras.its — Petrobras stock daily OHLC intervals.euro_usd.its — EUR/USD exchange rate daily OHLC intervals.shanghai_stock.its — Shanghai Composite daily OHLC intervals.irish_wind.its — Irish wind energy daily intervals.Renamed 7 datasets with proper type suffixes for consistency:
| Old name | New name |
|---|---|
| airline_flights2 | airline_flights2.modal |
| crime | crime.modal |
| crime2 | crime2.modal |
| fuel_consumption | fuel_consumption.modal |
| health_insurance2 | health_insurance2.modal |
| occupations | occupations.modal |
| occupations2 | occupations2.modal |
Also added mushroom.int.mm (interval multi-modal format of mushroom data).
aggregate_to_symbolic() — convert traditional data to symbolic data format.read_symbolic_csv(), write_symbolic_csv() — read/write symbolic data as CSV.search_data() — search available datasets by keyword or type.to_all_interval_formats() — convert intervals to all supported formats at once.ARRAY_to_MM(), ARRAY_to_RSDA(), ARRAY_to_iGAP(), MM_to_ARRAY(), RSDA_to_ARRAY(), SODAS_to_ARRAY(), iGAP_to_ARRAY().int_dist p parameter bug.@usage defaults, @details, and examples.NOT_CRAN).@section Metadata blocks.R CMD check --as-cran: 0 errors, 0 warnings, 0 notes.Seventeen new datasets from R packages (intkrige, RSDA, MAINT.Data, GPCSIV, HistDAWass, symbolicDA), the Billard & Diday (2007) textbook, and the QualAr Portuguese air quality network:
utsnow.int — 415 Utah weather stations with snow load prediction intervals plus coordinates/elevation (from intkrige).lynne1.int — 10 observations with pulse rate, systolic, and diastolic pressure intervals (from RSDA).loans_by_risk_quantile.int — 35 Lending Club loan groups (A1-G5) with 4 quantile-based financial intervals (from MAINT.Data).judge1.int — 6 regions rated by Judge 1 on 4 interval variables (from GPCSIV).judge2.int — 6 regions rated by Judge 2 on 4 interval variables (from GPCSIV).judge3.int — 6 regions rated by Judge 3 on 4 interval variables (from GPCSIV).video1.int — 10 user groups with 5 video engagement interval metrics (from GPCSIV).video2.int — 10 user groups with 5 video engagement interval metrics (from GPCSIV).video3.int — 10 user groups with 5 video engagement interval metrics (from GPCSIV).lisbon_air_quality.int — 1096 daily observations of 8 pollutant concentration intervals from Lisbon (QualAr).blood.hist — 14 gender-age groups with cholesterol, hemoglobin, and hematocrit histograms (from HistDAWass).china_climate_month.hist — 60 Chinese weather stations with 168 monthly climate histograms (from HistDAWass).china_climate_season.hist — 60 Chinese stations with 56 seasonal climate histograms (from HistDAWass).exchange_rate_returns.hist — 108 monthly exchange rate return histograms (from HistDAWass).polish_cars.mix — 30 Polish car models with 9 interval + 3 multinomial variables (from symbolicDA).hierarchy.hist — 10 observations with hierarchical categories, conditional histograms, and cholesterol interval (Table 6.20).bird_color_taxonomy.hist — 20 birds with density/size histograms, tone, and fuzzy shade taxonomy (Tables 6.9/6.14).Nineteen new datasets from Billard & Diday (2020) Clustering Methodology for Symbolic Data, the HistDAWass R package, and other R packages:
genome_abundances.int — 14 genome classes with 10 dinucleotide abundance intervals (Table 3-16).china_temp_monthly.int — 15 Chinese weather stations with 12 monthly temperature intervals + elevation (Table 7-9).ecoli_routes.int — 9 E. coli transport routes with 5 biochemical interval variables (Table 8-10).loans_by_risk.int — 35 Lending Club loan groups by risk level (A-G) with 4 financial intervals (from MAINT.Data).polish_voivodships.int — 18 Polish voivodships with 9 socio-economic interval variables (from clusterSim).iris_species.hist — 3 iris species with 4 morphological histogram variables (Table 4-10).flights_detail.hist — 16 airlines with 5 flight performance histograms (Table 5-1).cover_types.hist — 7 forest cover types with 4 topographic histograms (Table 7-21).glucose.hist — 4 regions with blood glucose histograms (Table 4-14).state_income.hist — 6 US states with 4 income distribution histograms (Table 7-18).simulated.hist — 5 simulated observations with 2 histogram variables (Table 7-26).age_pyramids.hist — 229 countries with 3 age pyramid histograms (from HistDAWass).ozone.hist — 84 daily observations with 4 weather histograms (from HistDAWass).french_agriculture.hist — 22 French regions with 4 agricultural histograms (from HistDAWass).household_characteristics.distr — 12 counties with 3 categorical distribution variables (Table 6-1).county_income_gender.hist — 12 counties with gendered income histograms + sample sizes (Table 6-16).joggers.mix — 10 jogger groups with pulse rate intervals + running time histograms (Table 2-5).census.mix — 10 census regions with 6 mixed-type variables: histograms, distributions, multi-valued sets, and intervals (Table 7-23).mtcars.mix — 5 car groups with 7 interval + 4 modal variables (from ggInterval).Thirteen new datasets extracted from R packages and the Billard & Diday (2006) textbook:
cardiological.int — 44 patients with 5 interval-valued physiological measurements (from RSDA).prostate.int — 97 prostate cancer patients with 9 clinical interval variables (from RSDA).uscrime.int — 46 US states with 102 interval-valued crime statistics (from RSDA).hardwood.hist — 5 hardwood tree species with 4 histogram-valued climate variables (from RSDA).synthetic_clusters.int — 125 observations in 5 clusters with 6 interval variables (from symbolicDA).environment.mix — 14 EPA state groups with mixed interval/modal environmental data (from ggInterval).weight_age.hist — 7 age groups with histogram-valued weight distributions (Table 3.10).hospital.hist — 15 hospitals with histogram-valued cost distributions (Table 3.12).cholesterol.hist — 14 gender-age groups with cholesterol histograms (Table 4.5).hemoglobin.hist — 14 gender-age groups with hemoglobin histograms (Table 4.6).hematocrit.hist — 14 gender-age groups with hematocrit histograms (Table 4.14).hematocrit_hemoglobin.hist — 10 observations with bivariate 2-bin histograms (Table 6.8).energy_usage.distr — 10 towns with categorical fuel/heating distributions (Table 3.7).Seven new interval-valued benchmark datasets from recent SDA papers (2020-2025):
freshwater_fish.int — 12 freshwater fish species with 13 heavy metal bioaccumulation variables, 4 feeding classes (Andrade et al., 2025).fungi.int — 55 fungi specimens with 5 morphological variables, 3 genera: Amanita, Agaricus, Boletus (Andrade et al., 2025).iris.int — 30 interval observations of Fisher's iris data, 4 sepal/petal variables, 3 species (Andrade et al., 2025).water_flow.int — 316 water flow sensor readings with 47 interval features, 2 classes (Andrade et al., 2025).wine.int — 33 wine samples with 9 chemical/physical property variables, 2 classes (Andrade et al., 2025).car_models.int — 33 Italian car models with 8 specification variables, 4 categories (Andrade et al., 2025).hdi_gender.int — 183 countries with 2 World Bank gender indicator intervals and ordinal HDI classification (Alcacer et al., 2023).MM_to_RSDA() — convert MM format (_min/_max columns) to RSDA format (symbolic_tbl with complex-encoded intervals).iGAP_to_RSDA() — convert iGAP format to RSDA format via iGAP_to_MM → MM_to_RSDA.int_list_conversions() now returns 8 conversions (was 6), including MM_to_RSDA and iGAP_to_RSDA.int_convert_format() now supports to = "RSDA" for MM and iGAP sources with auto-detection.Adopted snake_case naming with type suffixes (.int, .hist, .mix, .distr, .iGAP) for all datasets. Renamed 10 existing datasets:
| Old name | New name |
|---|---|
| Abalone | abalone.int |
| Abalone.iGAP | abalone.iGAP |
| Cars.int | cars.int |
| ChinaTemp.int | china_temp.int |
| Face.iGAP | face.iGAP |
| LoansbyPurpose.int | loans_by_purpose.int |
| bird.int | bird.mix |
| soccer.bivar.int | soccer_bivar.int |
| airline_flights | airline_flights.hist |
| health_insurance | health_insurance.mix |
acid_rain.int, bats.int, credit_card.int, employment.int, oils.int, teams.int, tennis.int, temperature_city.int, trivial_intervals.int, world_cup.int (interval-valued)bird_species.mix, bird_species_extended.mix, town_services.mix (mixed symbolic)lung_cancer.hist (histogram-valued)energy_consumption.distr (distribution-valued)bank_rates, mushroom_fuzzy (other)interval_format_conversions.R):
int_detect_format() — automatically detect interval data format (RSDA, MM, iGAP, SODAS).int_list_conversions() — list available format conversion functions, with optional filtering by source/target format.int_convert_format() — unified interface for all interval format conversions with auto-detection.R CMD check: 0 errors, 0 warnings, 0 notes.interval_dist.R): int_dist(), int_dist_matrix(), int_pairwise_dist(), int_dist_all() — 14 distance measures (GD, IY, L1, L2, CB, HD, EHD, nEHD, snEHD, TD, WD, minkowski, ichino, de_carvalho) with method aliases (euclidean, hausdorff, manhattan, city_block, wasserstein).interval_geometry.R): int_width(), int_radius(), int_center(), int_overlap(), int_containment(), int_midrange().interval_position.R): int_median(), int_quantile(), int_range(), int_iqr(), int_mad(), int_mode() — all support 8 methods (CM, VM, QM, SE, FV, EJD, GQ, SPT).interval_robust.R): int_trimmed_mean(), int_winsorized_mean(), int_trimmed_var(), int_winsorized_var().interval_shape.R): int_skewness(), int_kurtosis(), int_symmetry(), int_tailedness().interval_similarity.R): int_jaccard(), int_dice(), int_cosine(), int_overlap_coefficient(), int_tanimoto(), int_similarity_matrix().interval_uncertainty.R): int_entropy(), int_cv(), int_dispersion(), int_imprecision(), int_granularity(), int_uniformity(), int_information_content().R/interval_format_conversions.R, organized by target format.R/interval_utils.R: consolidated 7 internal interval helpers, format-preparation functions (RSDA_format, set_variable_format, clean_colnames), and their internal helpers.R/histogram_utils.R: 8 internal helpers for histogram statistics.utils_validation.R to validation.R.R CMD check: 0 errors, 0 warnings, 0 notes. All 399 tests pass.R/utils_validation.R: 11 internal validation helpers centralizing all checks.RSDA_format fix: replaced 4 return("Error") with proper stop() calls.R CMD check: 0 errors, 0 warnings.NEWS.md with changelog for all versions.int_mean, int_var, int_cov, int_cor (8 methods).hist_mean, hist_var, hist_cov, hist_cor.