---
title: "dicepro - Hyperparameter Search Space Visualization"
author: "dicepro Team"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{dicepro - Hyperparameter Search Space Visualization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse   = TRUE,
  comment    = "#>",
  fig.width  = 7,
  fig.height = 5,
  warning    = FALSE,
  message    = FALSE,
  eval       = FALSE
)
```

> **Note:** All code chunks have `eval = FALSE` and are shown for
> illustration only. To run them interactively:
> ```r
> library(dicepro)
> # copy-paste the chunks below into your R session
> ```

# Overview

This vignette explains the two **hyperparameter search space strategies**
available in dicepro and shows how to visualize the resulting
$(\gamma, \lambda)$ distributions with `create_gamma_lambda_plot()`.

The `hspaceTechniqueChoose` argument controls which strategy is used, both
in `run_experiment()` and in the plot function.

---

# The Two Strategies

## `"all"` - Independent sampling

$\lambda$ and $\gamma$ are each drawn independently from their own
log-uniform distribution:

| Parameter | Distribution | Range |
|---|---|---|
| `lambda_` | Log-uniform | $[1,\; 10^8]$ |
| `gamma`   | Log-uniform | $[1,\; 10^8]$ |
| `p_prime` | Log-uniform | $[10^{-6},\; 1]$ |

No structural constraint links the two parameters. The resulting
$(\gamma, \lambda)$ cloud fills the entire feasible rectangle uniformly
on a log-log scale.
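
As a rough illustration of what independent log-uniform sampling means, the
sketch below draws values uniformly on the $\log_{10}$ scale and
exponentiates them. The helper `rlogunif()` is hypothetical and written only
for this example; it is not a dicepro function and may differ from how the
package samples internally.

```{r sketch-all-sampling, eval=FALSE}
# Hypothetical helper (not part of dicepro): n log-uniform draws on [lo, hi],
# obtained by sampling uniformly on the log10 scale and exponentiating.
rlogunif <- function(n, lo, hi) {
  10^runif(n, min = log10(lo), max = log10(hi))
}

set.seed(1L)
n <- 200L

# Each parameter is drawn on its own, using the ranges from the table above
space_all <- data.frame(
  lambda_ = rlogunif(n, 1, 1e8),
  gamma   = rlogunif(n, 1, 1e8),
  p_prime = rlogunif(n, 1e-6, 1)
)

# Every draw stays inside its stated range
stopifnot(
  all(space_all$lambda_ >= 1    & space_all$lambda_ <= 1e8),
  all(space_all$gamma   >= 1    & space_all$gamma   <= 1e8),
  all(space_all$p_prime >= 1e-6 & space_all$p_prime <= 1)
)
```

Because the three columns are generated by separate calls, knowing `gamma`
tells us nothing about `lambda_`.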

## `"restrictionEspace"` - Linked sampling

$\gamma$ is the **base variable**; $\lambda$ is **derived** via:

$$\lambda = \gamma \times \lambda_\text{factor}, \quad
  \lambda_\text{factor} \sim \text{LogUniform}(2,\; 100)$$

| Parameter | Distribution | Range |
|---|---|---|
| `gamma`         | Log-uniform | $[1,\; 10^5]$ |
| `lambda_factor` | Log-uniform | $[2,\; 100]$  |
| `p_prime`       | Log-uniform | $[0.1,\; 1]$  |

This guarantees $2\gamma \leq \lambda \leq 100\gamma$ for every sampled
configuration, so $\lambda$ itself ranges over $[2,\; 10^7]$. The feasible
region is bounded by two diagonal lines in the log-log plane:

- **Lower bound** (red dashed): $\lambda = 2\gamma$
- **Upper bound** (green dashed): $\lambda = 100\gamma$
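
The linked construction can be sketched in a few lines of base R. As before,
`rlogunif()` is a hypothetical helper defined only for this illustration
(not a dicepro function), and the ranges are those of the table above.

```{r sketch-linked-sampling, eval=FALSE}
# Hypothetical helper (not part of dicepro): n log-uniform draws on [lo, hi]
rlogunif <- function(n, lo, hi) 10^runif(n, log10(lo), log10(hi))

set.seed(1L)
n <- 200L

gamma         <- rlogunif(n, 1, 1e5)   # base variable
lambda_factor <- rlogunif(n, 2, 100)   # multiplicative link

space_restr <- data.frame(
  gamma   = gamma,
  lambda_ = gamma * lambda_factor,     # lambda is derived, never sampled directly
  p_prime = rlogunif(n, 0.1, 1)
)

# The ratio lambda / gamma stays in [2, 100], up to floating-point rounding
ratio <- space_restr$lambda_ / space_restr$gamma
stopifnot(all(ratio >= 2 - 1e-9), all(ratio <= 100 + 1e-9))

range(space_restr$lambda_)             # contained in [2, 1e7]
```

Sampling the factor rather than $\lambda$ itself means the constraint holds
by construction, so no rejection step is needed.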

---

# Visualizing the Search Space

`create_gamma_lambda_plot()` samples 200 configurations (by default) and
renders them as a scatter plot on log-log axes.

## `"all"` - Independent space

```{r plot-all, eval=FALSE}
library(dicepro)

p_all <- create_gamma_lambda_plot(hspaceTechniqueChoose = "all")
p_all
```

The cloud fills the square $[1, 10^8]^2$ uniformly, with no structural
relationship between $\gamma$ and $\lambda$.

## `"restrictionEspace"` - Restricted space

```{r plot-restriction, eval=FALSE}
p_restr <- create_gamma_lambda_plot(hspaceTechniqueChoose = "restrictionEspace")
p_restr
```

All points fall within the diagonal band delimited by the two dashed lines.
On log-log axes, the linear relationships $\lambda = c\,\gamma$
appear as parallel straight lines.
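
To see why the lines are parallel, take base-10 logarithms of
$\lambda = c\,\gamma$:

$$\log_{10}\lambda = \log_{10}\gamma + \log_{10} c$$

In the $(\log_{10}\gamma,\; \log_{10}\lambda)$ plane this is a line of slope 1
whose intercept is $\log_{10} c$. For the two bounds used here:

$$c = 2 \;\Rightarrow\; \log_{10} c \approx 0.301, \qquad
  c = 100 \;\Rightarrow\; \log_{10} c = 2$$

The band therefore has a constant vertical width of
$2 - 0.301 \approx 1.699$ decades, whatever the value of $\gamma$.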

---

# Simulated Data

Before running the optimization, we simulate a self-consistent data set
using `simulation()`. The function returns a list with three elements:

- **`$W`** - reference signature matrix (genes × cell types)
- **`$p`** - true proportion matrix (samples × cell types)
- **`$B`** - noisy bulk expression matrix (genes × samples)

`run_experiment()` expects a `dataset` list with keys `$W`, `$P`, and
`$B`. We therefore rename `$p` to `$P` after simulation.

```{r simulate, eval=FALSE}
library(dicepro)
set.seed(2101L)

sim <- simulation(
  loi        = "gauss",
  scenario   = "hierarchical",
  nSample    = 30L,
  nGenes     = 200L,
  nCellsType = 10L,
  sigma_bio  = 0.07,
  sigma_tech = 0.07,
  seed       = 2101L
)

my_dataset <- list(
  W = sim$W,
  P = sim$p,
  B = sim$B
)

cat("W :", nrow(my_dataset$W), "genes x", ncol(my_dataset$W), "cell types\n")
cat("P :", nrow(my_dataset$P), "samples x", ncol(my_dataset$P), "cell types\n")
cat("B :", nrow(my_dataset$B), "genes x", ncol(my_dataset$B), "samples\n")
cat("Row sums of P (range):", round(range(rowSums(my_dataset$P)), 4), "\n")
```
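
Since the three matrices must agree on genes, samples, and cell types, a
small dimension check before calling `run_experiment()` can catch a
mis-assembled list early. The helper `check_dataset()` below is hypothetical
(it is not part of dicepro) and encodes only the shapes described above; it
is shown on stand-in matrices so that it runs without the simulation.

```{r check-dataset, eval=FALSE}
# Hypothetical helper (not part of dicepro): verify that W, P and B are
# present and that their dimensions are mutually consistent.
check_dataset <- function(dataset) {
  stopifnot(
    all(c("W", "P", "B") %in% names(dataset)),
    nrow(dataset$W) == nrow(dataset$B),  # same genes in W and B
    ncol(dataset$W) == ncol(dataset$P),  # same cell types in W and P
    nrow(dataset$P) == ncol(dataset$B)   # same samples in P and B
  )
  invisible(TRUE)
}

# Stand-in matrices with the dimensions requested in the simulation above:
# 200 genes, 30 samples, 10 cell types
toy <- list(
  W = matrix(1,      nrow = 200, ncol = 10),
  P = matrix(1 / 10, nrow = 30,  ncol = 10),
  B = matrix(1,      nrow = 200, ncol = 30)
)

check_dataset(toy)  # passes silently; an inconsistent list raises an error
```

The same call can be applied to `my_dataset` once the simulation has been
run.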

---

# Running the Optimization

## Strategy `"all"` - Independent sampling

```{r run-all, eval=FALSE}
results_all <- run_experiment(
  dataset               = my_dataset,
  W_prime               = 0,
  bulkName              = "SimBulk",
  refName               = "SimRef",
  hp_max_evals          = 150L,
  algo_select           = "random",
  output_base_dir       = tempdir(),
  hspaceTechniqueChoose = "all"
)

cat("Completed trials:", nrow(results_all$trials), "\n")
head(results_all$trials[, c("lambda_", "gamma", "p_prime", "loss", "constraint")])
```

## Strategy `"restrictionEspace"` - Linked sampling

```{r run-restriction, eval=FALSE}
results_restr <- run_experiment(
  dataset               = my_dataset,
  W_prime               = 0,
  bulkName              = "SimBulk",
  refName               = "SimRef",
  hp_max_evals          = 150L,
  algo_select           = "random",
  output_base_dir       = tempdir(),
  hspaceTechniqueChoose = "restrictionEspace"
)

cat("Completed trials:", nrow(results_restr$trials), "\n")
head(results_restr$trials[, c("lambda_", "gamma", "p_prime", "loss", "constraint")])
```
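
One way to confirm that a run respected the band is to compute the share of
trials whose ratio $\lambda / \gamma$ lies in $[2, 100]$. The sketch below
assumes that the trials table is a data frame with `lambda_` and `gamma`
columns, as the `head()` call above suggests; `band_fraction()` is a
hypothetical helper, not a dicepro function, and is demonstrated on a small
stand-in table.

```{r check-band, eval=FALSE}
# Hypothetical helper (not part of dicepro): share of trials whose
# lambda_ / gamma ratio lies inside [lower, upper].
band_fraction <- function(trials, lower = 2, upper = 100) {
  ratio <- trials$lambda_ / trials$gamma
  mean(ratio >= lower & ratio <= upper)
}

# Stand-in trials table using the same column names as the real output
toy_trials <- data.frame(
  gamma   = c(1, 10, 1000, 1e5),
  lambda_ = c(2, 500, 1e5, 5e4)   # ratios: 2, 50, 100, 0.5
)

band_fraction(toy_trials)
#> [1] 0.75
```

Applied to `results_restr$trials`, the value should be 1 or very close to it
(rounding can nudge a ratio just past a band edge); for
`results_all$trials` it will typically be well below 1.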

---

# Comparing the Two Strategies

Once both runs are complete, we can report the lowest-loss configuration
found by each strategy and overlay the two $(\gamma, \lambda)$
distributions to compare coverage:

```{r compare-spaces, eval=FALSE, fig.height=6}
best_all   <- results_all$trials[which.min(results_all$trials$loss), ]
best_restr <- results_restr$trials[which.min(results_restr$trials$loss), ]

cat("--- all ---\n")
cat(sprintf("  lambda = %.3g  |  gamma = %.3g  |  loss = %.4f\n",
            best_all$lambda_, best_all$gamma, best_all$loss))

cat("--- restrictionEspace ---\n")
cat(sprintf("  lambda = %.3g  |  gamma = %.3g  |  loss = %.4f\n",
            best_restr$lambda_, best_restr$gamma, best_restr$loss))

plot(
  results_all$trials$gamma,
  results_all$trials$lambda_,
  log  = "xy",
  xlim = c(1, 1e8), ylim = c(1, 1e8),  # full "all" range, so neither cloud is clipped
  pch  = 19, cex = 0.5,
  col  = adjustcolor("steelblue", 0.4),
  xlab = expression(gamma), ylab = expression(lambda),
  main = "Sampled configurations: all (blue) vs restrictionEspace (orange)"
)
points(
  results_restr$trials$gamma,
  results_restr$trials$lambda_,
  pch = 19, cex = 0.5,
  col = adjustcolor("darkorange", 0.4)
)
legend("topleft",
       legend = c("all", "restrictionEspace"),
       col    = c("steelblue", "darkorange"),
       pch    = 19, pt.cex = 1.2)
```
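
The visual impression of the overlay can be backed by simple arithmetic on
the ranges stated earlier. Measured in square decades on the log-log plane,
the `"all"` square has side 8, while the `"restrictionEspace"` band is a
parallelogram of base 5 decades (the $\gamma$ range) and height
$\log_{10}(100) - \log_{10}(2)$ decades. This is a back-of-the-envelope
figure derived from those ranges, not something dicepro reports.

```{r coverage-area, eval=FALSE}
# Areas in square decades (log10 units) of the (gamma, lambda) plane
area_all   <- (log10(1e8) - log10(1))^2                          # 8 x 8 square
area_restr <- (log10(1e5) - log10(1)) * (log10(100) - log10(2))  # 5 x ~1.7 band

round(c(all               = area_all,
        restrictionEspace = area_restr,
        share             = area_restr / area_all), 3)
```

The band covers about 8.5 of the 64 square decades, roughly 13% of the
unrestricted space, so the same trial budget samples it far more densely.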

---

# Session Info

```{r session-info, eval=FALSE}
sessionInfo()
```