This vignette reproduces the weather example described in Puchhammer, Wilms and Filzmoser (2025). The original data is from Geosphere Austria (2022) and included in this package.
library(ssMRCD)
library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, unionThe original data from GeoSphere Austria (2022) is pre-cleaned and
saved in the data frame object weatherAUT2021. Additional
information can be found on the helping page.
# load the data
data("weatherAUT2021")
# inspect the data
head(weatherAUT2021)
#> p s vv t rsum rel name lon lat alt
#> 1 941.35 154.5 2.15 6.15 49.5 77.5 AIGEN/ENNSTAL 14.13833 47.53278 641
#> 2 945.80 172.5 2.95 6.25 39.5 76.0 ALLENTSTEIG 15.36694 48.69083 599
#> 3 985.50 152.0 2.55 8.20 46.5 78.0 AMSTETTEN 14.89500 48.10889 266
#> 4 893.20 123.5 1.85 5.10 66.0 74.5 BAD GASTEIN 13.13333 47.11055 1092
#> 5 984.50 176.0 1.50 8.45 40.5 75.0 BAD GLEICHENBERG 15.90361 46.87222 269
#> 6 992.05 188.5 2.20 8.95 60.5 77.0 BAD RADKERSBURG 15.99333 46.69222 207
# select variables, station names and number of observations
data = weatherAUT2021 %>% select(p:rel)
stations = weatherAUT2021$name
n = dim(data)[1]The predefined groups are based in the underlying geographical landscape consisting of Alpine mountains, hills and flatter areas in Austria.
# build 5 groups of observations based on spatial proximity and geography
cut_lon = c(min(weatherAUT2021$lon)-0.2, 12, 16, max(weatherAUT2021$lon) + 0.2)
cut_lat = c(min(weatherAUT2021$lat)-0.2, 48, max(weatherAUT2021$lat) + 0.2)
groups = ssMRCD::groups_gridbased(weatherAUT2021$lon,
weatherAUT2021$lat,
cut_lon,
cut_lat)
N = length(unique(groups))
table(groups)
#> groups
#> 1 2 3 4 5
#> 31 80 35 16 21# calculate MG-GMM
model = cellMGGMM(X = data, groups = groups,
nsteps = 100, alpha = 0.5,
maxcond = 100)
#> Iteration finished due to computational inaccuracies (increase of objective function in last step of size: 0.000119383117635152). Check convergence plot.# mixture probabilities
cat("Pi (in %):\n")
#> Pi (in %):
round(model$pi_groups*100, 2)
#> 1 2 3 4 5
#> 1 100.00 0.00 0.00 0.00 0.00
#> 2 24.97 71.95 0.00 3.08 0.00
#> 3 0.00 0.00 94.02 5.98 0.00
#> 4 12.48 0.00 5.83 64.57 17.11
#> 5 0.00 0.00 9.01 15.35 75.64# percentage of outliers
cat("% Outliers per group and variable:\n")
#> % Outliers per group and variable:
round(sapply(1:N, function(x) colMeans(1-model$W[groups == x, ]))*100, 2)
#> [,1] [,2] [,3] [,4] [,5]
#> p 6.45 3.75 2.86 0.00 0.00
#> s 0.00 2.50 0.00 0.00 0.00
#> vv 6.45 12.50 8.57 6.25 9.52
#> t 6.45 3.75 0.00 0.00 4.76
#> rsum 0.00 0.00 2.86 0.00 4.76
#> rel 0.00 2.50 0.00 6.25 4.76GeoSphere Austria (2022): https://data.hub.geosphere.at.
Puchhammer P., Wilms I. and Filzmoser P. (2025): A smooth multi-group Gaussian Mixture Model for cellwise robust covariance estimation. https://doi.org/10.48550/arXiv.2504.02547