| Title: | Comprehensive Soil Quality Index Computation and Visualization |
|---|---|
| Description: | Provides a comprehensive, modular framework for computing the Soil Quality Index (SQI) using six established methods: Linear Scoring (Doran and Parkin, 1994, <doi:10.2136/sssaspecpub35.c1>), Regression-based (Masto et al., 2008, <doi:10.1007/s10661-007-9697-z>), Principal Component Analysis-based (Andrews et al., 2004, <doi:10.2136/sssaj2004.1945>), Fuzzy Logic, Entropy Weighting (Shannon, 1948, <doi:10.1002/j.1538-7305.1948.tb01338.x>), and TOPSIS (Hwang and Yoon, 1981, <doi:10.1007/978-3-642-48318-9>). Implements four variable scoring functions: more-is-better, less-is-better, optimum-value, and trapezoidal, following Karlen and Stott (1994, <doi:10.2136/sssaspecpub35.c4>). Includes automated Minimum Data Set selection via Principal Component Analysis with Variance Inflation Factor filtering (Kaiser, 1960, <doi:10.1177/001316446002000116>), one-way ANOVA with Tukey HSD post-hoc tests, leave-one-out sensitivity analysis, and publication-quality visualization using 'ggplot2'. |
| Authors: | Sadikul Islam [aut, cre] (ORCID: <https://orcid.org/0000-0003-2924-7122>), Rajesh Kaushal [aut] |
| Maintainer: | Sadikul Islam <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.0 |
| Built: | 2026-05-20 09:59:29 UTC |
| Source: | https://github.com/cran/SQIpro |
Constructs a variable configuration data frame that specifies the scoring function type and relevant parameters for each soil indicator. This configuration table is the central object passed to all scoring and indexing functions in SQIpro.
make_config(
  variable,
  type,
  opt_low = rep(NA_real_, length(variable)),
  opt_high = rep(NA_real_, length(variable)),
  min_val = rep(NA_real_, length(variable)),
  max_val = rep(NA_real_, length(variable)),
  weight = rep(1, length(variable)),
  description = rep(NA_character_, length(variable))
)
variable |
Character vector of variable names (must match column names in the data). |
type |
Character vector of scoring types, one per variable. Must be one of:
|
opt_low |
Numeric vector. Lower bound of optimum range (required for
|
opt_high |
Numeric vector. Upper bound of optimum range (required for
|
min_val |
Numeric vector. Absolute minimum value (required for
|
max_val |
Numeric vector. Absolute maximum value (required for
|
weight |
Numeric vector of user-defined weights (0–1). Used only
when |
description |
Character vector. Optional human-readable description of each variable (units, rationale). Useful for automated reports. |
A data frame (class sqi_config) with one row per variable.
Doran, J.W., & Parkin, T.B. (1994). Defining and assessing soil quality. In J.W. Doran et al. (Eds.), Defining Soil Quality for a Sustainable Environment, pp. 1–21. SSSA Special Publication 35. doi:10.2136/sssaspecpub35.c1
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35),
  description = c("Soil pH (H2O)", "Electrical Conductivity (dS/m)",
                  "Bulk Density (g/cm3)", "Organic Carbon (%)",
                  "Microbial Biomass Carbon (mg/kg)", "Clay content (%)")
)
print(cfg)
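As a rough illustration of what such a table holds, the sketch below builds a plain base-R data frame whose columns mirror some of the make_config arguments. The object name cfg_sketch, the reduced column set, and the consistency check are assumptions made for illustration; the actual sqi_config object may carry further columns, attributes, or validation.

```r
# Hypothetical stand-in for a configuration table: one row per indicator.
cfg_sketch <- data.frame(
  variable = c("pH", "EC", "OC"),
  type     = c("opt", "less", "more"),
  opt_low  = c(6.0, NA, NA),
  opt_high = c(7.0, NA, NA),
  weight   = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Illustrative check: every "opt" row needs both optimum bounds.
needs_range <- cfg_sketch$type == "opt"
range_ok <- !needs_range |
  (!is.na(cfg_sketch$opt_low) & !is.na(cfg_sketch$opt_high))
stopifnot(all(range_ok))

print(cfg_sketch)
```

Keeping one row per variable makes it easy to look up the scoring type and bounds for a column by name.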
Renders a PCA biplot showing both variable loadings and group centroids,
using factoextra::fviz_pca_biplot. Useful for understanding
variable relationships and group separation underlying MDS selection.
plot_pca_biplot(
  mds,
  scored,
  group_col = "LandUse",
  title = "PCA Biplot of Soil Quality Variables"
)
mds |
An object returned by select_mds. |
scored |
A scored data frame (for group colour coding). |
group_col |
Character. Column for group colours. |
title |
Character. Plot title. |
A ggplot object.
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
mds <- select_mds(scored, group_cols = c("LandUse", "Depth"))
plot_pca_biplot(mds, scored, group_col = "LandUse")
Draws a radar (spider) chart comparing mean variable scores across groups. Useful for visualising the multidimensional soil quality profile of each land-use system.
plot_radar(
  scored,
  config,
  group_col,
  group_cols = group_col,
  vars = NULL,
  title = "Soil Quality Radar Profile",
  palette = c("#1b7837", "#762a83", "#d6604d", "#4393c3", "#f4a582")
)
scored |
A scored data frame from score_all. |
config |
A sqi_config object from make_config. |
group_col |
Character. Column used to define radar chart series. |
group_cols |
Character vector of all grouping columns. |
vars |
Character vector of variables to include. Defaults to all
in |
title |
Character. Plot title. |
palette |
Character vector of colours for each group. |
Invisible NULL; the chart is rendered to the active
graphics device.
Chambers, J.M., & Hastie, T.J. (1992). Statistical Models in S. Wadsworth & Brooks/Cole.
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
plot_radar(scored, cfg, group_col = "LandUse", group_cols = c("LandUse", "Depth"))
Displays a heatmap of mean variable scores (0–1) per group, allowing rapid visual identification of which variables drive high or low SQI within each land-use system.
plot_scores(
  scored,
  config,
  group_cols = "LandUse",
  group_by = group_cols[1],
  facet_by = NULL,
  palette = "RdYlGn",
  title = "Mean Variable Scores by Group"
)
scored |
A scored data frame from score_all. |
config |
A sqi_config object from make_config. |
group_cols |
Character vector. Grouping columns. |
group_by |
Character. Which grouping column to display on the x-axis. |
facet_by |
Character or |
palette |
Character. Colour palette: |
title |
Character. Plot title. |
A ggplot object.
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
plot_scores(scored, cfg, group_cols = c("LandUse", "Depth"),
            group_by = "LandUse", facet_by = "Depth")
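The quantity such a heatmap displays, the mean score of each variable within each group, can be computed with base R alone. The sketch below uses a small made-up table of scores; the values and the object names are illustrative assumptions, not package output.

```r
# Toy scored data: two land uses, two replicate rows each (values invented).
scored_toy <- data.frame(
  LandUse = rep(c("Forest", "Cropland"), each = 2),
  pH = c(0.9, 0.7, 0.5, 0.7),
  OC = c(0.8, 1.0, 0.2, 0.4)
)

# Mean score per variable and group; aggregate() sorts groups alphabetically.
group_means <- aggregate(cbind(pH, OC) ~ LandUse, data = scored_toy, FUN = mean)
print(group_means)
```

Each cell of the resulting table corresponds to one tile of a variable-by-group heatmap.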
Visualises the scoring function (0–1 transformation) for each variable in the configuration, overlaid on the observed data distribution. This plot is essential for verifying that the scoring configuration is biologically sensible before computing indices.
plot_scoring_curves(data, config, group_cols = "LandUse", ncol = 3)
data |
The raw (unscored) soil data frame. |
config |
A sqi_config object from make_config. |
group_cols |
Character vector of grouping columns to exclude. |
ncol |
Integer. Number of columns in the facet grid. Default 3. |
A ggplot object.
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
plot_scoring_curves(soil_data, cfg, group_cols = c("LandUse", "Depth"))
Visualises variable importance from sqi_sensitivity as a
horizontal bar (tornado) chart, ordered from most to least sensitive.
plot_sensitivity(sa_result, title = "Variable Sensitivity to SQI")
sa_result |
Data frame from sqi_sensitivity. |
title |
Character. Plot title. |
A ggplot object.
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
sa <- sqi_sensitivity(scored, cfg, group_cols = c("LandUse", "Depth"))
plot_sensitivity(sa)
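The idea behind a leave-one-out sensitivity ranking can be sketched in base R: drop one variable at a time, recompute an unweighted mean-score index, and record how far the index moves. This is only an assumed formulation for illustration; the metric used by sqi_sensitivity may differ, and the scores below are invented.

```r
# Toy 0-1 scores for three observations (values invented).
scores <- data.frame(
  pH = c(0.9, 0.7, 0.8),
  EC = c(0.2, 0.9, 0.5),
  OC = c(0.6, 0.6, 0.6)
)

# Index with all variables: unweighted row mean.
full_index <- rowMeans(scores)

# Mean absolute change in the index when each variable is left out.
loo_change <- sapply(names(scores), function(v) {
  reduced <- rowMeans(scores[, setdiff(names(scores), v), drop = FALSE])
  mean(abs(full_index - reduced))
})

# Most influential variable first, as in a tornado chart.
sort(loo_change, decreasing = TRUE)
```

In this toy example EC moves the index most when removed, so it would appear at the top of the chart.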
Creates a grouped bar chart of SQI values per group, with optional error bars (standard deviation computed across replicate observations before indexing) and significance letters.
plot_sqi(
  sqi_result,
  sqi_col,
  group_col,
  fill_col = NULL,
  letters_df = NULL,
  title = "Soil Quality Index",
  y_label = "SQI (0-1)",
  palette = c("#2d6a4f", "#52b788", "#95d5b2", "#d8f3dc", "#b7e4c7")
)
sqi_result |
A data frame returned by any |
sqi_col |
Character. Name of the SQI column to plot. |
group_col |
Character. Grouping column for the x-axis. |
fill_col |
Character or |
letters_df |
Data frame with columns |
title |
Character. Plot title. |
y_label |
Character. Y-axis label. |
palette |
Character vector of colours. |
A ggplot object.
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
res <- sqi_linear(scored, cfg, group_cols = c("LandUse", "Depth"))
plot_sqi(res, sqi_col = "SQI_linear", group_col = "LandUse", fill_col = "Depth")
Applies the appropriate scoring function to each soil variable according
to a configuration table produced by make_config. This
is the primary data-preparation step before computing any Soil Quality
Index.
score_all(data, config, group_cols = "LandUse", custom_fns = list())
data |
A data frame containing the soil variables. |
config |
A sqi_config object from make_config. |
group_cols |
Character vector of grouping column names to preserve
unchanged. Default is |
custom_fns |
A named list of functions for variables with
|
A data frame with the same structure as data, but with
each variable column replaced by its 0–1 score. Group columns are
preserved unchanged.
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
head(scored)
Applies an arbitrary user-defined scoring function to a numeric vector. The function must accept a numeric vector and return a numeric vector of the same length with values in [0, 1].
score_custom(x, FUN, ...)
x |
Numeric vector of raw variable values. |
FUN |
A function with signature |
... |
Additional arguments passed to |
Numeric vector of scores in [0, 1].
# Log-linear scoring for a skewed variable
mbc <- c(30, 80, 200, 400, 600)
score_custom(mbc, FUN = function(x) {
  s <- (log(x) - log(min(x))) / (log(max(x)) - log(min(x)))
  pmin(pmax(s, 0), 1)
})
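The contract stated for user-defined scoring functions (same length in, same length out, values in [0, 1]) can be checked before a function is used. The wrapper below is a hypothetical base-R helper, not part of SQIpro; apply_scoring, the saturating curve sat, and the half-saturation constant k = 200 are assumptions chosen for illustration.

```r
# Hypothetical helper: apply FUN and verify the scoring contract.
apply_scoring <- function(x, FUN, ...) {
  s <- FUN(x, ...)
  stopifnot(is.numeric(s), length(s) == length(x))
  if (any(s < 0 | s > 1, na.rm = TRUE)) {
    stop("scores must lie in [0, 1]")
  }
  s
}

# Saturating curve: the score approaches 1 as x grows; k is the value scoring 0.5.
sat <- function(x, k) x / (x + k)

mbc <- c(30, 80, 200, 400, 600)
mbc_scores <- apply_scoring(mbc, sat, k = 200)
mbc_scores
```

A function that returns values outside [0, 1] would stop with an error here instead of silently distorting the index.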
Applies a "less is better" linear scoring function, transforming raw variable values to a 0–1 score. Suitable for soil indicators where lower values denote better soil quality, such as bulk density, electrical conductivity, or heavy metal concentrations (Andrews et al., 2004).
The score is computed as:
$$S = \frac{x_{\max} - x}{x_{\max} - x_{\min}}$$
score_less(x, x_min = NULL, x_max = NULL)
x |
Numeric vector of raw variable values. |
x_min |
Numeric. Lower bound. Defaults to |
x_max |
Numeric. Upper bound. Defaults to |
Numeric vector of scores in [0, 1].
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
bd <- c(0.9, 1.1, 1.3, 1.5, 1.7)  # Bulk Density (g/cm3)
score_less(bd)
# With domain bounds
score_less(bd, x_min = 0.8, x_max = 2.0)
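A less-is-better linear score can be sketched in base R as below, assuming the score falls linearly from 1 at the lower bound to 0 at the upper bound and is clipped to [0, 1]. The function name less_is_better and the clipping step are assumptions for illustration, not the package's implementation.

```r
# Sketch: linear score that decreases as x increases.
less_is_better <- function(x,
                           x_min = min(x, na.rm = TRUE),
                           x_max = max(x, na.rm = TRUE)) {
  s <- (x_max - x) / (x_max - x_min)
  pmin(pmax(s, 0), 1)  # clip values that fall outside the bounds
}

bd <- c(0.9, 1.1, 1.3, 1.5, 1.7)  # Bulk Density (g/cm3)
bd_scores <- less_is_better(bd)
bd_scores

# With fixed domain bounds the lowest observed value no longer scores exactly 1.
bd_scores_fixed <- less_is_better(bd, x_min = 0.8, x_max = 2.0)
bd_scores_fixed
```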
Applies a "more is better" linear scoring function, transforming raw variable values to a 0–1 score. This is appropriate for soil indicators where higher values improve soil function, such as organic carbon, microbial biomass, or cation exchange capacity (Andrews et al., 2004; Karlen & Stott, 1994).
The score is computed as:
$$S = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_{\min}$ and $x_{\max}$ are taken from the observed data (or from user-supplied bounds).
score_more(x, x_min = NULL, x_max = NULL)
x |
Numeric vector of raw variable values. |
x_min |
Numeric. Lower bound for scoring. Defaults to
|
x_max |
Numeric. Upper bound for scoring. Defaults to
|
Numeric vector of scores in [0, 1].
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
Karlen, D.L., & Stott, D.E. (1994). A framework for evaluating physical and chemical indicators of soil quality. In J.W. Doran et al. (Eds.), Defining Soil Quality for a Sustainable Environment, pp. 53–72. SSSA Special Publication 35. doi:10.2136/sssaspecpub35.c4
oc <- c(0.5, 1.2, 2.1, 3.4, 4.5)  # Organic Carbon (%)
score_more(oc)
# With user-defined bounds (e.g., 0 to 5%)
score_more(oc, x_min = 0, x_max = 5)
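The choice between observed and user-supplied bounds changes the scores noticeably. The base-R sketch below assumes a linear rise from 0 at the lower bound to 1 at the upper bound; more_is_better is a hypothetical name used only for this illustration.

```r
# Sketch: linear score that increases with x, clipped to [0, 1].
more_is_better <- function(x,
                           x_min = min(x, na.rm = TRUE),
                           x_max = max(x, na.rm = TRUE)) {
  s <- (x - x_min) / (x_max - x_min)
  pmin(pmax(s, 0), 1)
}

oc <- c(0.5, 1.2, 2.1, 3.4, 4.5)  # Organic Carbon (%)

# Observed bounds: the extremes of the sample are forced to 0 and 1.
oc_observed <- more_is_better(oc)

# Fixed bounds 0 to 5%: scores become comparable across datasets.
oc_fixed <- more_is_better(oc, x_min = 0, x_max = 5)
oc_fixed  # 0.10 0.24 0.42 0.68 0.90
```

Fixed bounds are usually preferable when scores from different studies or sampling campaigns need to be compared.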
Applies a bell-shaped (peaked) scoring function appropriate for soil variables that have an optimum range, beyond which both higher and lower values reduce soil quality. Classic examples include pH (optimal 6.0–7.0 for most crops) and clay content (Liebig et al., 1996; Karlen & Stott, 1994).
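The scoring rules are:
$$S = \begin{cases} 1 & \text{if } opt_{low} \le x \le opt_{high} \\ \dfrac{x - x_{\min}}{opt_{low} - x_{\min}} & \text{if } x < opt_{low} \\ \dfrac{x_{\max} - x}{x_{\max} - opt_{high}} & \text{if } x > opt_{high} \end{cases}$$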
score_optimum(x, opt_low, opt_high, x_min = NULL, x_max = NULL)
x |
Numeric vector of raw variable values. |
opt_low |
Numeric. Lower bound of the optimum range. |
opt_high |
Numeric. Upper bound of the optimum range. |
x_min |
Numeric. Absolute minimum (score = 0). Defaults to
|
x_max |
Numeric. Absolute maximum (score = 0). Defaults to
|
Numeric vector of scores in [0, 1].
Karlen, D.L., & Stott, D.E. (1994). A framework for evaluating physical and chemical indicators of soil quality. In J.W. Doran et al. (Eds.), Defining Soil Quality for a Sustainable Environment, pp. 53–72. SSSA Special Publication 35. doi:10.2136/sssaspecpub35.c4
Liebig, M.A., Varvel, G., & Doran, J.W. (1996). A simple performance-based index for assessing multiple agroecosystem functions. Agronomy Journal, 88, 739–745. doi:10.2134/agronj1996.00021962008800050011x
ph <- c(4.5, 5.5, 6.2, 6.8, 7.0, 7.5, 8.2)
score_optimum(ph, opt_low = 6.0, opt_high = 7.0)
clay <- c(10, 18, 25, 32, 45, 60)
score_optimum(clay, opt_low = 20, opt_high = 35)
Applies a trapezoidal scoring function where scores are 1 within an
ideal plateau [opt_low, opt_high], rise linearly from 0
at min_val to 1 at opt_low, and fall linearly from 1 at
opt_high to 0 at max_val. Values outside
[min_val, max_val] receive a score of 0.
This function is more flexible than score_optimum because
the user explicitly controls the zero-score boundaries, making it
suitable for variables with well-established critical thresholds.
score_trapezoid(x, min_val, opt_low, opt_high, max_val)
x |
Numeric vector of raw variable values. |
min_val |
Numeric. Value at which score becomes 0 on the low side. |
opt_low |
Numeric. Lower bound of the plateau (score = 1). |
opt_high |
Numeric. Upper bound of the plateau (score = 1). |
max_val |
Numeric. Value at which score becomes 0 on the high side. |
Numeric vector of scores in [0, 1].
Wymore, A.W. (1993). Model-Based Systems Engineering. CRC Press, Boca Raton, FL.
Buse, R., & Lele, S. (2003). Trapezoidal membership functions in fuzzy soil quality assessment. Geoderma, 114, 177–196.
ph <- c(3.5, 5.0, 6.5, 7.0, 7.8, 8.5, 9.5)
# pH: absolute zero below 4 and above 9; ideal 6.0-7.0
score_trapezoid(ph, min_val = 4.0, opt_low = 6.0, opt_high = 7.0, max_val = 9.0)
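The trapezoidal rule described for this function (linear rise from min_val to opt_low, plateau at 1, linear fall from opt_high to max_val, 0 outside) can be written compactly in base R. The sketch below is an independent illustration with the hypothetical name trapezoid_score; it is not the package source.

```r
# Sketch: trapezoidal membership as the minimum of a rising ramp,
# the plateau value 1, and a falling ramp, floored at 0.
trapezoid_score <- function(x, min_val, opt_low, opt_high, max_val) {
  stopifnot(min_val < opt_low, opt_low <= opt_high, opt_high < max_val)
  rising  <- (x - min_val) / (opt_low - min_val)
  falling <- (max_val - x) / (max_val - opt_high)
  pmax(pmin(rising, 1, falling), 0)
}

ph <- c(3.5, 5.0, 6.5, 7.0, 7.8, 8.5, 9.5)
ph_scores <- trapezoid_score(ph, min_val = 4.0, opt_low = 6.0,
                             opt_high = 7.0, max_val = 9.0)
ph_scores  # 0.00 0.50 1.00 1.00 0.60 0.25 0.00
```

Taking the element-wise minimum of the two ramps and the plateau avoids explicit branching over the five regions.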
Identifies the most informative subset of soil variables (the Minimum Data Set, MDS) using Principal Component Analysis (PCA). Only variables with high factor loadings on principal components with eigenvalue > 1 (Kaiser criterion) are retained. Where multiple variables load highly on the same component, the one with the highest summed correlation with the others on that component is selected to minimise redundancy.
This approach follows the widely cited method of Andrews et al. (2004)
and Sharma et al. (2008), and is equivalent to the PCAIndex
algorithm in Wani et al. (2023).
select_mds(
  data,
  group_cols = "LandUse",
  load_threshold = 0.5,
  vif_threshold = 10,
  n_pc = "auto",
  verbose = TRUE
)
data |
A data frame of scored or raw soil variables (numeric
columns only, or with group columns specified in |
group_cols |
Character vector of grouping columns to exclude from
the analysis. Default: |
load_threshold |
Numeric in (0, 1). Minimum absolute factor
loading for a variable to be considered for MDS membership.
Default: |
vif_threshold |
Numeric. Maximum allowable Variance Inflation
Factor among MDS variables. Variables exceeding this are iteratively
removed. Set to |
n_pc |
Integer or |
verbose |
Logical. Print MDS selection summary. Default |
**Algorithm steps:**
1. Standardise all numeric variables (mean = 0, sd = 1).
2. Perform PCA; retain components with eigenvalue > 1.
3. For each retained component, identify variables with absolute loading ≥ load_threshold.
4. Among those, select the variable with the highest sum of absolute Pearson correlations to all others in the set (i.e., the variable that best represents the group, so the remaining ones can be dropped as redundant).
5. Optionally, remove variables with high Variance Inflation Factor (VIF > vif_threshold) among the MDS candidates.
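The PCA-based part of this procedure can be sketched with base R's prcomp on synthetic data. Everything below is an assumption made for illustration: the simulated variables, the object names, and the rule of skipping variables already chosen on an earlier component. It is not the select_mds source.

```r
set.seed(42)
n <- 200
f1 <- rnorm(n)  # synthetic latent factor driving OC, MBC, TN
f2 <- rnorm(n)  # synthetic latent factor driving BD, EC
X <- data.frame(
  OC  = f1 + rnorm(n, sd = 0.3),
  MBC = f1 + rnorm(n, sd = 0.3),
  TN  = f1 + rnorm(n, sd = 0.3),
  BD  = f2 + rnorm(n, sd = 0.3),
  EC  = f2 + rnorm(n, sd = 0.3)
)

# Standardise, run PCA, keep components with eigenvalue > 1 (Kaiser).
pca  <- prcomp(X, center = TRUE, scale. = TRUE)
eig  <- pca$sdev^2
keep <- which(eig > 1)

# Loadings as variable-component correlations: eigenvector * sqrt(eigenvalue).
loadings <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Per component: high-loading candidates, then the one with the largest
# summed absolute correlation to the other candidates.
load_threshold <- 0.5
R <- abs(cor(X))
mds_vars <- character(0)
for (j in keep) {
  cand <- rownames(loadings)[abs(loadings[, j]) >= load_threshold]
  cand <- setdiff(cand, mds_vars)  # assumption: do not pick a variable twice
  if (length(cand) == 0) next
  corr_sum <- rowSums(R[cand, cand, drop = FALSE]) - 1  # remove self-correlation
  mds_vars <- c(mds_vars, cand[which.max(corr_sum)])
}
mds_vars
```

With two simulated factors, one variable from each correlated block is retained, which is the intended reduction from five indicators to two.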
A list of class sqi_mds with:
Character vector of selected MDS variable names.
Character vector of all candidate variable names.
The PCA result object.
Matrix of factor loadings.
Numeric vector of eigenvalues.
Numeric vector of variance explained (%) per component.
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework: A quantitative soil quality evaluation method. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
Kaiser, H.F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20(1), 141–151. doi:10.1177/001316446002000116
Sharma, K.L., et al. (2008). Long-term soil management effects on soil quality indices. Geoderma, 144, 290–300. doi:10.1016/j.geoderma.2007.11.019
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "PMN", "Clay", "WHC", "DEH", "AP", "TN"),
  type = c("opt", "less", "less", "more", "more", "more",
           "opt", "more", "more", "more", "more"),
  opt_low = c(6.0, NA, NA, NA, NA, NA, 20, NA, NA, NA, NA),
  opt_high = c(7.0, NA, NA, NA, NA, NA, 35, NA, NA, NA, NA)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
mds <- select_mds(scored, group_cols = c("LandUse", "Depth"))
mds$mds_vars
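Variance Inflation Factor filtering can be illustrated in base R using the identity that the VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix. The helper vif_filter and the simulated data are assumptions for illustration; the removal order used inside select_mds may differ.

```r
# Hypothetical helper: iteratively drop the variable with the largest VIF
# until every remaining VIF is at or below the threshold.
vif_filter <- function(X, threshold = 10) {
  X <- as.matrix(X)
  repeat {
    if (ncol(X) < 2) break
    vif <- diag(solve(cor(X)))  # VIF_j = j-th diagonal of the inverse correlation matrix
    names(vif) <- colnames(X)
    if (max(vif) <= threshold) break
    X <- X[, names(vif)[-which.max(vif)], drop = FALSE]
  }
  colnames(X)
}

set.seed(1)
n <- 100
a <- rnorm(n)
X <- cbind(A = a, B = a + rnorm(n, sd = 0.1), C = rnorm(n))  # A and B nearly collinear

kept <- vif_filter(X, threshold = 10)
kept
```

One of the two near-duplicate variables is removed, while the independent variable C is kept.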
A hypothetical dataset representing soil physicochemical and biological properties across five land-use systems and two soil depths, generated using realistic parameter ranges reported in the soil quality literature. This dataset is intended for demonstrating the functions in the SQIpro package and for pedagogical purposes.
soil_data
A data frame with 100 rows and 14 variables:
- LandUse: Character. Land-use system: Natural_Forest, Agroforestry, Cropland, Grassland, Degraded_Land.
- Depth: Character. Soil depth: Surface_0_15cm or Subsurface_15_30cm.
- pH: Numeric. Soil pH (1:2.5 water suspension). Unitless. Optimal range 6.0–7.0 for most crops (Brady & Weil, 2008).
- EC: Numeric. Electrical Conductivity (dS m⁻¹). Lower values indicate less salinity stress; <0.2 dS m⁻¹ considered non-saline (Richards, 1954).
- BD: Numeric. Bulk Density (g cm⁻³). Lower values indicate better soil structure and aeration; >1.6 g cm⁻³ restricts root growth (Arshad et al., 1996).
- CEC: Numeric. Cation Exchange Capacity (cmol(+) kg⁻¹). Higher values indicate greater nutrient-holding capacity (Sparks, 2003).
- OC: Numeric. Organic Carbon (%). Higher values indicate greater soil organic matter, a key indicator of soil health (Doran & Parkin, 1994).
- MBC: Numeric. Microbial Biomass Carbon (mg kg⁻¹). Indicator of soil biological activity (Brookes, 1995).
- PMN: Numeric. Potentially Mineralizable Nitrogen (mg kg⁻¹). Indicates N-supplying capacity of soil (Stanford & Smith, 1972).
- Clay: Numeric. Clay content (%). Optimal range 20–35% for water and nutrient retention (Arshad et al., 1996).
- WHC: Numeric. Water Holding Capacity (%). Higher values indicate better moisture retention (Reynolds et al., 2009).
- DEH: Numeric. Dehydrogenase Enzyme Activity (µg TPF g⁻¹ day⁻¹). Indicator of overall microbial metabolic activity (Casida et al., 1964).
- AP: Numeric. Available Phosphorus (mg kg⁻¹). Higher values indicate better P availability for plants (Olsen & Sommers, 1982).
- TN: Numeric. Total Nitrogen (%). Higher values indicate greater N reserves (Bremner, 1996).
Parameter ranges were informed by values reported in:
- Doran and Parkin (1994) for biological indicators
- Andrews et al. (2004) for MDS indicator ranges
- Masto et al. (2008) for land-use comparison ranges
The dataset is entirely synthetic and does not represent any specific geographic location.
Synthetically generated for the SQIpro package.
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework: A quantitative soil quality evaluation method. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
Arshad, M.A., Lowery, B., & Grossman, B. (1996). Physical tests for monitoring soil quality. In J.W. Doran & A.J. Jones (Eds.), Methods for Assessing Soil Quality, pp. 123–141. SSSA Special Publication 49. doi:10.2136/sssaspecpub49.c7
Brady, N.C., & Weil, R.R. (2008). The Nature and Properties of Soils (14th ed.). Prentice Hall, New Jersey.
Doran, J.W., & Parkin, T.B. (1994). Defining and assessing soil quality. In J.W. Doran et al. (Eds.), Defining Soil Quality for a Sustainable Environment, pp. 1–21. SSSA Special Publication 35. doi:10.2136/sssaspecpub35.c1
Masto, R.E., Chhonkar, P.K., Singh, D., & Patra, A.K. (2008). Alternative soil quality indices for evaluating the effect of intensive cropping, fertilisation and manuring for 31 years in the semi-arid soils of India. Environmental Monitoring and Assessment, 136, 419–435. doi:10.1007/s10661-007-9697-z
data(soil_data)
head(soil_data)
summary(soil_data)
table(soil_data$LandUse, soil_data$Depth)
Performs a one-way ANOVA to test whether Soil Quality Index values differ significantly across land-use groups, followed by Tukey's Honest Significant Difference (HSD) test for pairwise comparisons.
sqi_anova(scored, sqi_col, group_col, alpha = 0.05)
scored |
A scored data frame from score_all. |
sqi_col |
Character. Name of the SQI column to test (e.g.,
|
group_col |
Character. Grouping variable column name (e.g.,
|
alpha |
Numeric. Significance level for the ANOVA. Default 0.05. |
A list with:
An anova object.
A TukeyHSD object.
Logical. Whether the ANOVA is significant at
alpha.
Data frame of compact letter display for plotting.
Tukey, J.W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2), 99–114. doi:10.2307/3001913
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
# Compute per-observation linear SQI for ANOVA
scored$SQI_obs <- rowMeans(scored[, cfg$variable], na.rm = TRUE)
aov_result <- sqi_anova(scored, sqi_col = "SQI_obs", group_col = "LandUse")
print(aov_result$tukey)
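The underlying statistics are available in base R through aov and TukeyHSD. The sketch below runs them on simulated index values; the group names, means, and object names are invented for illustration, and how sqi_anova assembles its return list is not shown here.

```r
set.seed(7)
# Simulated per-observation SQI for three land uses (values invented).
d <- data.frame(
  LandUse = factor(rep(c("Forest", "Cropland", "Degraded"), each = 10)),
  SQI = c(rnorm(10, 0.8, 0.05), rnorm(10, 0.6, 0.05), rnorm(10, 0.3, 0.05))
)

# One-way ANOVA: does mean SQI differ among land uses?
fit <- aov(SQI ~ LandUse, data = d)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
significant <- p_value < 0.05

# Tukey HSD: which pairs of land uses differ?
tukey <- TukeyHSD(fit)
print(tukey)
```

The grouping column is stored as a factor because TukeyHSD operates on factor terms of the fitted model.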
Runs all six SQI methods (sqi_linear, sqi_regression,
sqi_pca, sqi_fuzzy, sqi_entropy, sqi_topsis)
on the same scored dataset and returns a combined results table for
method comparison.
sqi_compare(scored, config, group_cols = "LandUse", dep_var = NULL, mds = NULL)
scored |
A scored data frame from score_all. |
config |
A sqi_config object from make_config. |
group_cols |
Character vector of grouping column names. |
dep_var |
Character. Dependent variable for |
mds |
Object from |
A data frame with one row per group and columns for each SQI
method. Also includes Mean_SQI and Rank columns.
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
results <- sqi_compare(scored, cfg, group_cols = c("LandUse", "Depth"), dep_var = "OC")
print(results)
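How a Mean_SQI and Rank column might be derived from several per-method columns can be sketched in base R. The table below is made up, and averaging across methods followed by ranking in descending order is an assumption about the summary, not a description of the sqi_compare internals.

```r
# Toy comparison table with two methods (values invented).
res <- data.frame(
  LandUse = c("Forest", "Cropland", "Degraded"),
  SQI_linear = c(0.82, 0.55, 0.31),
  SQI_entropy = c(0.78, 0.60, 0.35)
)

# Average across all SQI_* columns, then rank groups (1 = best).
sqi_cols <- grep("^SQI_", names(res), value = TRUE)
res$Mean_SQI <- rowMeans(res[, sqi_cols])
res$Rank <- rank(-res$Mean_SQI, ties.method = "min")
print(res)
```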
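Computes SQI using Shannon entropy to derive objective weights for each variable. Variables with lower information entropy (scores that differ more among observations, hence greater discriminating power) receive higher weights. This removes subjectivity from weight assignment.
The entropy weight for variable $j$ is:
$$w_j = \frac{1 - E_j}{\sum_{k=1}^{m} (1 - E_k)}$$
where $E_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$, with $p_{ij} = s_{ij} / \sum_{i=1}^{n} s_{ij}$ the share of observation $i$ in the total score of variable $j$, $n$ the number of observations, and $m$ the number of variables.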
sqi_entropy(scored, config, group_cols = "LandUse", mds_vars = NULL)
scored |
A scored data frame from score_all. |
config |
A sqi_config object from make_config. |
group_cols |
Character vector of grouping column names. |
mds_vars |
Character vector of MDS variable names. |
A data frame with group columns, SQI_entropy,
and attribute entropy_weights (named numeric vector).
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x
Li, P., Qian, H., & Wu, J. (2010). Groundwater quality assessment based on improved water quality index in Pengyang County, Ningxia, Northwest China. E-Journal of Chemistry, 7, 209–216. doi:10.1155/2010/451304
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type = c("opt", "less", "less", "more", "more", "opt"),
  opt_low = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
result <- sqi_entropy(scored, cfg, group_cols = c("LandUse", "Depth"))
attr(result, "entropy_weights")
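The standard entropy weight method can be sketched in base R as follows. The sketch assumes the common formulation in which each column of scores is normalised to proportions, the entropy is scaled by log(n), and weights are proportional to one minus the entropy; the function name entropy_weights and the toy matrix are illustrative, and the package may handle zeros or grouping differently.

```r
# Sketch of entropy weighting for a matrix of 0-1 scores (rows = observations).
entropy_weights <- function(S) {
  S <- as.matrix(S)
  n <- nrow(S)
  P <- sweep(S, 2, colSums(S), `/`)      # p_ij: share of each observation
  plogp <- ifelse(P > 0, P * log(P), 0)  # convention: 0 * log(0) = 0
  E <- -colSums(plogp) / log(n)          # entropy per variable, in [0, 1]
  d <- 1 - E                             # degree of divergence
  d / sum(d)
}

# OC varies across observations; pH is constant and carries no information.
S <- cbind(OC = c(0.1, 0.5, 0.9, 0.3), pH = c(0.8, 0.8, 0.8, 0.8))
w <- entropy_weights(S)
w

# Weighted index per observation.
sqi <- as.vector(S %*% w)
sqi
```

The constant variable ends up with a weight of essentially zero, which shows why low entropy, not high entropy, attracts weight in this scheme.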
Computes SQI using a fuzzy membership aggregation approach. Each scored variable (already 0–1) is treated as a fuzzy membership value, and groups are aggregated using either the arithmetic mean (equivalent to the linear method) or the fuzzy weighted average operator.
This approach is appropriate when variable importance is uncertain or when expert-elicited weights are available (Zhu et al., 2010; Torbert & Wood, 1992).
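The two aggregation operators can be sketched in base R as follows. The membership values `mu` and importance weights `fw` are hypothetical, and the weighted arithmetic and geometric means shown here are the textbook forms. That `sqi_fuzzy` computes them in exactly this way is an assumption.

```r
# Hypothetical fuzzy membership values (scored variables, 0-1) for one group
mu <- c(pH = 0.8, OC = 0.6, BD = 0.9)

# Hypothetical importance weights; they need not sum to 1 before normalisation
fw <- c(pH = 2, OC = 1, BD = 1)
fw <- fw / sum(fw)

# Weighted arithmetic mean (operator = "mean")
fuzzy_mean <- sum(fw * mu)

# Weighted geometric mean (operator = "geometric"); penalises low memberships more
fuzzy_geom <- prod(mu^fw)
```

The geometric form never exceeds the arithmetic one, so it is the more conservative choice when a single poor indicator should pull the index down.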
```r
sqi_fuzzy(
  scored,
  config,
  group_cols = "LandUse",
  mds_vars = NULL,
  fuzzy_weights = NULL,
  operator = c("mean", "geometric")
)
```

| Argument | Description |
|---|---|
| `scored` | A scored data frame from `score_all`. |
| `config` | A configuration data frame from `make_config`. |
| `group_cols` | Character vector of grouping column names. |
| `mds_vars` | Character vector of MDS variable names. |
| `fuzzy_weights` | Named numeric vector of fuzzy importance weights (sum need not equal 1; they are normalised internally). Defaults to equal weights. |
| `operator` | Character. Aggregation operator: `"mean"` or `"geometric"`. |
A data frame with group columns and SQI_fuzzy (0–1).
Zhu, A.X., Liu, F., Li, B., Pei, T., Qin, C., Liu, G., Wang, Y., Chen, Y., Ma, X., Qi, F., & Li, R. (2010). Differentiation of soil conditions over flat areas using land surface feedback dynamic patterns extracted from MODIS. Soil Science Society of America Journal, 74(1), 861–869.
Torbert, H.A., & Wood, C.W. (1992). Effects of soil compaction and water-filled pore space on soil microbial activity and N losses. Communications in Soil Science and Plant Analysis, 23, 1321–1331. doi:10.1080/00103629209368668
```r
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type     = c("opt", "less", "less", "more", "more", "opt"),
  opt_low  = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
result <- sqi_fuzzy(scored, cfg, group_cols = c("LandUse", "Depth"))
print(result)
```
Computes the Soil Quality Index (SQI) using the linear additive scoring
method of Doran & Parkin (1994) and Andrews et al. (2004). Each variable
score (0–1, from score_all) is averaged across replicates
within each group, optionally weighted, and then min-max normalised to
produce the final index.
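$$\mathrm{SQI}_g = \frac{\sum_{i=1}^{n} w_i \, \bar{S}_{ig}}{\sum_{i=1}^{n} w_i}$$

where $\bar{S}_{ig}$ is the mean score of variable $i$ in group $g$
and $w_i$ is the weight of variable $i$.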
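The linear additive calculation can be sketched in base R as follows: average scores within groups, take the weighted mean across variables, then min-max normalise. The data frame `scored_demo`, the weights `w`, and the output column names `SQI_raw` and `SQI` are hypothetical and chosen only for illustration.

```r
# Hypothetical scored data: two replicates per land use, scores already in 0-1
scored_demo <- data.frame(
  LandUse = rep(c("Forest", "Crop", "Pasture"), each = 2),
  pH = c(0.9, 0.8, 0.5, 0.6, 0.7, 0.7),
  OC = c(1.0, 0.9, 0.3, 0.4, 0.6, 0.7)
)

# Hypothetical variable weights (OC counted twice as heavily as pH)
w <- c(pH = 1, OC = 2)

# Step 1: mean score of each variable within each group
group_means <- aggregate(cbind(pH, OC) ~ LandUse, data = scored_demo, FUN = mean)

# Step 2: weighted mean across variables
raw <- as.vector(as.matrix(group_means[, names(w)]) %*% w) / sum(w)

# Step 3: min-max normalisation across groups (undefined if all groups are equal)
sqi <- (raw - min(raw)) / (max(raw) - min(raw))

result <- data.frame(LandUse = group_means$LandUse, SQI_raw = raw, SQI = sqi)
```

Because of the final min-max step, the best group always scores 1 and the worst always scores 0, so the normalised index is relative to the groups being compared.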
```r
sqi_linear(
  scored,
  config,
  group_cols = "LandUse",
  mds_vars = NULL,
  weights = NULL
)
```

| Argument | Description |
|---|---|
| `scored` | A scored data frame from `score_all`. |
| `config` | A configuration data frame from `make_config`. |
| `group_cols` | Character vector of grouping column names. |
| `mds_vars` | Character vector. If supplied, only these variables are used. Otherwise all numeric variables in |
| `weights` | Named numeric vector of variable weights. Defaults to equal weights (1 for all). Names must match variable names. |

A data frame with group columns plus:

- Final normalised Soil Quality Index (0–1).
- Weighted mean score before normalisation.
Doran, J.W., & Parkin, T.B. (1994). Defining and assessing soil quality. In J.W. Doran et al. (Eds.), Defining Soil Quality for a Sustainable Environment, pp. 1–21. SSSA Special Publication 35. doi:10.2136/sssaspecpub35.c1
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
```r
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type     = c("opt", "less", "less", "more", "more", "opt"),
  opt_low  = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
result <- sqi_linear(scored, cfg, group_cols = c("LandUse", "Depth"))
print(result)
```
Computes SQI using Principal Component Analysis, weighting selected MDS variables by the proportion of variance their component explains. This is the most widely cited data-driven approach in soil quality research (Andrews et al., 2004; Bastida et al., 2008).
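$$\mathrm{SQI}_{\mathrm{PCA}} = \sum_{k=1}^{K} W_k \, \bar{S}_{v_k}$$

where $W_k$ is the proportion of variance explained by component $k$,
$v_k$ is the MDS variable selected from component $k$, and
$\bar{S}_{v_k}$ is the group mean score of that variable.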
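A base R sketch of PCA-based weighting is given below. It rests on several assumptions that are not confirmed by the text: components are retained by the Kaiser rule (eigenvalue of at least 1), one variable per component is chosen by its largest absolute loading, and the weights are rescaled to sum to 1 over the retained components. The data in `scored_demo` are random and purely illustrative.

```r
set.seed(1)
n <- 30

# Hypothetical scored data (0-1 scores) for four indicators in three land uses
scored_demo <- data.frame(
  LandUse = rep(c("Forest", "Crop", "Pasture"), each = n / 3),
  pH = runif(n), OC = runif(n), BD = runif(n), EC = runif(n)
)
vars <- c("pH", "OC", "BD", "EC")

# PCA on standardised scores
pca <- prcomp(scored_demo[, vars], scale. = TRUE)
eig <- pca$sdev^2

# Kaiser rule; with standardised variables the eigenvalues average 1,
# so at least one component is always retained
keep <- which(eig >= 1)

# One MDS variable per retained component: largest absolute loading
loadings <- abs(pca$rotation[, keep, drop = FALSE])
mds_vars <- vars[apply(loadings, 2, which.max)]

# Component weights: share of variance among retained components (assumption)
W <- eig[keep] / sum(eig[keep])

# Group mean scores, then the weighted sum over the selected variables
group_means <- aggregate(scored_demo[, vars],
                         by = list(LandUse = scored_demo$LandUse), FUN = mean)
sqi <- as.vector(as.matrix(group_means[, mds_vars, drop = FALSE]) %*% W)
names(sqi) <- group_means$LandUse
```

Since the weights sum to 1 and every score lies in 0–1, the resulting index also stays within 0–1 without further normalisation.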
```r
sqi_pca(scored, config, group_cols = "LandUse", mds = NULL)
```

| Argument | Description |
|---|---|
| `scored` | A scored data frame from `score_all`. |
| `config` | A configuration data frame from `make_config`. |
| `group_cols` | Character vector of grouping column names. |
| `mds` | Object returned by |
A data frame with group columns and SQI_pca (0–1).
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
Bastida, F., Zsolnay, A., Hernandez, T., & Garcia, C. (2008). Past, present and future of soil quality indices: A biological perspective. Geoderma, 147(3–4), 159–171. doi:10.1016/j.geoderma.2008.08.007
```r
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type     = c("opt", "less", "less", "more", "more", "opt"),
  opt_low  = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
result <- sqi_pca(scored, cfg, group_cols = c("LandUse", "Depth"))
print(result)
```
Computes the SQI using stepwise multiple linear regression to identify and weight the most predictive soil variables. The dependent variable (e.g., crop yield, total biomass) determines which variables enter the model. Regression coefficients serve as weights in the index.
This follows the method described by Masto et al. (2008) and Mukherjee & Lal (2014).
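The regression-based weighting can be sketched with `lm()` and `step()` from base R's stats package. The simulated data, the `yield` response, and the use of normalised absolute coefficients as weights are assumptions made for illustration; the package may scale coefficients differently.

```r
set.seed(42)
n <- 40

# Hypothetical scored predictors (0-1) and a simulated response
d <- data.frame(OC = runif(n), MBC = runif(n), EC = runif(n))
d$yield <- 2 + 3 * d$OC + 1.5 * d$MBC + rnorm(n, sd = 0.2)  # EC has no true effect

# Stepwise selection starting from the full model
full <- lm(yield ~ OC + MBC + EC, data = d)
fit <- step(full, direction = "both", trace = 0)

# Weights from the retained coefficients (intercept dropped),
# normalised so they sum to 1 (assumed convention)
coefs <- coef(fit)[-1]
w <- abs(coefs) / sum(abs(coefs))

# Weighted index per observation, then min-max normalisation to 0-1
raw <- as.vector(as.matrix(d[, names(w), drop = FALSE]) %*% w)
sqi <- (raw - min(raw)) / (max(raw) - min(raw))
```

Variables dropped by the stepwise procedure receive no weight, so the choice of dependent variable directly controls which indicators enter the index.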
```r
sqi_regression(
  scored,
  config,
  dep_var,
  group_cols = "LandUse",
  mds_vars = NULL,
  direction = "both"
)
```

| Argument | Description |
|---|---|
| `scored` | A scored data frame from `score_all`. |
| `config` | A configuration data frame from `make_config`. |
| `dep_var` | Character. Name of the dependent variable column in |
| `group_cols` | Character vector of grouping column names. |
| `mds_vars` | Character vector of candidate predictor variable names. If |
| `direction` | Character. Direction for stepwise selection: |

A data frame with group columns plus:

- Normalised SQI (0–1).
- (attribute) Character vector of selected predictors.
Masto, R.E., Chhonkar, P.K., Singh, D., & Patra, A.K. (2008). Alternative soil quality indices. Environmental Monitoring and Assessment, 136, 419–435. doi:10.1007/s10661-007-9697-z
Mukherjee, A., & Lal, R. (2014). Comparison of soil quality index using three methods. PLOS ONE, 9(8), e105981. doi:10.1371/journal.pone.0105981
```r
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type     = c("opt", "less", "less", "more", "more", "opt"),
  opt_low  = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
# OC used as surrogate dependent variable
result <- sqi_regression(scored, cfg, dep_var = "OC",
                         group_cols = c("LandUse", "Depth"))
print(result)
```
Quantifies the contribution of each soil variable to the overall Soil Quality Index by a leave-one-out approach: each variable is removed in turn and the resulting index is compared to the full index. A larger change indicates a higher-sensitivity (more important) variable.
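The leave-one-out idea can be sketched in base R using a simple equal-weight mean as the index. The `group_means` matrix and the `index()` helper are hypothetical, and normalising `relative_importance` by the sum of mean changes is an assumption about how the 0–1 scaling is done.

```r
# Hypothetical group-mean scores (rows = groups, columns = variables)
group_means <- matrix(
  c(0.90, 0.80, 0.40,
    0.30, 0.70, 0.50,
    0.60, 0.75, 0.45),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Forest", "Crop", "Pasture"), c("OC", "pH", "BD"))
)

# Stand-in index: equal-weight mean of the scores in each group
index <- function(m) rowMeans(m)
full <- index(group_means)

# Remove each variable in turn and record the absolute change in the index
loo <- sapply(colnames(group_means), function(v) {
  keep <- colnames(group_means) != v
  change <- abs(index(group_means[, keep, drop = FALSE]) - full)
  c(mean_change = mean(change), sd_change = sd(change))
})

sa <- data.frame(
  variable = colnames(loo),
  mean_change = loo["mean_change", ],
  sd_change = loo["sd_change", ],
  row.names = NULL
)

# Relative importance scaled to sum to 1 (assumed normalisation)
sa$relative_importance <- sa$mean_change / sum(sa$mean_change)
```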
```r
sqi_sensitivity(
  scored,
  config,
  group_cols = "LandUse",
  method = c("linear", "fuzzy", "entropy", "topsis"),
  mds_vars = NULL
)
```

| Argument | Description |
|---|---|
| `scored` | A scored data frame from `score_all`. |
| `config` | A configuration data frame from `make_config`. |
| `group_cols` | Character vector of grouping columns. |
| `method` | Character. Which indexing method to use for sensitivity analysis: `"linear"`, `"fuzzy"`, `"entropy"`, or `"topsis"`. |
| `mds_vars` | Character vector of MDS variable names. If |

A data frame with columns `variable`, `mean_change` (mean absolute change in SQI when the variable is removed), `sd_change`, and `relative_importance` (0–1, normalised).
Saltelli, A., Ratto, M., Andres, T., Campolongo, F., Cariboni, J., Gatelli, D., Saisana, M., & Tarantola, S. (2008). Global Sensitivity Analysis: The Primer. John Wiley & Sons, Chichester. doi:10.1002/9780470725184
```r
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type     = c("opt", "less", "less", "more", "more", "opt"),
  opt_low  = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
sa <- sqi_sensitivity(scored, cfg, group_cols = c("LandUse", "Depth"))
print(sa)
```
Computes SQI using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), a multi-criteria decision analysis method. Each group is ranked by its Euclidean distance to the positive ideal solution (all scores = 1) and negative ideal solution (all scores = 0).
$$C_i = \frac{D_i^-}{D_i^+ + D_i^-}$$

where $D_i^+$ and $D_i^-$ are the distances of group $i$ to the positive and
negative ideal solutions. $C_i \in [0, 1]$, with higher values
indicating better soil quality.
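A minimal base R sketch of the closeness coefficient follows, with the positive ideal fixed at a score of 1 and the negative ideal at 0 for every variable, as described above. The `scores` matrix and the equal weights are hypothetical, and applying the weights before computing distances is the usual TOPSIS convention rather than something confirmed for `sqi_topsis`.

```r
# Hypothetical group-mean scores (rows = groups, columns = variables)
scores <- matrix(
  c(0.9, 0.8, 0.7,
    0.3, 0.4, 0.5,
    0.6, 0.6, 0.6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Forest", "Crop", "Pasture"), c("OC", "pH", "BD"))
)

# Equal criteria weights, normalised to sum to 1
w <- rep(1 / ncol(scores), ncol(scores))

# Weighted score matrix; the ideal points become w (all scores 1) and 0 (all scores 0)
v <- sweep(scores, 2, w, "*")

# Euclidean distances to the positive and negative ideal solutions
d_pos <- sqrt(rowSums(sweep(v, 2, w, "-")^2))
d_neg <- sqrt(rowSums(v^2))

# Closeness coefficient: 1 = at the positive ideal, 0 = at the negative ideal
closeness <- d_neg / (d_pos + d_neg)
```

With equal weights, a group whose scores are all equal to some value $s$ obtains a closeness of exactly $s$, which gives a quick sanity check on the output.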
```r
sqi_topsis(
  scored,
  config,
  group_cols = "LandUse",
  mds_vars = NULL,
  weights = NULL
)
```

| Argument | Description |
|---|---|
| `scored` | A scored data frame from `score_all`. |
| `config` | A configuration data frame from `make_config`. |
| `group_cols` | Character vector of grouping column names. |
| `mds_vars` | Character vector of MDS variable names. |
| `weights` | Named numeric vector of criteria weights. Defaults to equal weights. |
A data frame with group columns and SQI_topsis (0–1).
Hwang, C.L., & Yoon, K. (1981). Multiple Attribute Decision Making: Methods and Applications. Springer, Berlin. doi:10.1007/978-3-642-48318-9
Yoon, K. (1987). A reconciliation among discrete compromise solutions. Journal of the Operational Research Society, 38, 277–286. doi:10.1057/jors.1987.44
```r
data(soil_data)
cfg <- make_config(
  variable = c("pH", "EC", "BD", "OC", "MBC", "Clay"),
  type     = c("opt", "less", "less", "more", "more", "opt"),
  opt_low  = c(6.0, NA, NA, NA, NA, 20),
  opt_high = c(7.0, NA, NA, NA, NA, 35)
)
scored <- score_all(soil_data, cfg, group_cols = c("LandUse", "Depth"))
result <- sqi_topsis(scored, cfg, group_cols = c("LandUse", "Depth"))
print(result)
```
Checks that a data frame meets requirements for Soil Quality Index (SQI) computation: correct column types, sufficient sample sizes, absence of infinite values, and appropriate variable configuration.
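The kinds of checks listed above can be sketched in base R as follows. This is only an illustration of the idea, not the function's actual implementation. The `demo` data frame and all intermediate names are hypothetical.

```r
# Hypothetical input with one infinite value and one undersized group
demo <- data.frame(
  LandUse = c("Forest", "Forest", "Forest", "Crop", "Crop"),
  pH = c(6.1, 6.4, 6.3, 7.2, Inf),
  OC = c(1.2, 1.4, 1.1, 0.6, 0.7)
)
group_cols <- "LandUse"
min_n <- 3

# Column types: every non-grouping column should be numeric
num_cols <- setdiff(names(demo), group_cols)
all_numeric <- all(vapply(demo[num_cols], is.numeric, logical(1)))

# Infinite values in any soil variable
has_infinite <- any(vapply(demo[num_cols],
                           function(x) any(is.infinite(x)), logical(1)))

# Group sizes and groups below the minimum sample size
n_per_group <- as.data.frame(table(demo[[group_cols]]), responseName = "n")
small_groups <- as.character(n_per_group$Var1[n_per_group$n < min_n])

# Overall verdict
valid <- all_numeric && !has_infinite && length(small_groups) == 0
```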
```r
validate_data(
  data,
  group_cols = NULL,
  config = NULL,
  min_n = 3,
  verbose = TRUE
)
```

| Argument | Description |
|---|---|
| `data` | A data frame. The first column(s) should be grouping factors (character or factor); remaining columns should be numeric soil variables. |
| `group_cols` | Character vector. Names of grouping columns (e.g., |
| `config` | A data frame produced by `make_config`. |
| `min_n` | Integer. Minimum number of observations per group. Default is 3. |
| `verbose` | Logical. If |
Invisibly returns a list with components:

- `valid`: Logical. `TRUE` if all checks pass.
- Character vector of warning/info messages.
- `n_per_group`: Data frame of group sizes.
Andrews, S.S., Karlen, D.L., & Cambardella, C.A. (2004). The soil management assessment framework: A quantitative soil quality evaluation method. Soil Science Society of America Journal, 68(6), 1945–1962. doi:10.2136/sssaj2004.1945
```r
data(soil_data)
result <- validate_data(soil_data, group_cols = c("LandUse", "Depth"))
result$valid
result$n_per_group
```