---
title: "Temporal Disaggregation of IBM's GHG Emissions"
#author: "Your Name"
#date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Temporal Disaggregation of IBM's GHG Emissions}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```



# Introduction

In this vignette, we demonstrate how to use the **DisaggregateTS** package to perform temporal disaggregation of IBM's greenhouse gas (GHG) emissions data. The goal is to estimate quarterly GHG emissions based on annual data, leveraging high-frequency economic indicators.

## Background

By focusing on emissions per unit of economic output, carbon intensity accounts for the fact that larger organizations or economies may naturally produce more emissions simply due to scale. This allows for a fair comparison of sustainability performance across different entities, regardless of size.

Accurate and timely carbon accounting and the development of robust measurement frameworks are essential for effective emission reduction strategies and the pursuit of sustainable development goals. While carbon accounting frameworks offer valuable insights into emissions quantification, they are not without limitations. One of those limitations is the frequency with which this information is released, generally at an annual frequency, while most companies’ economic indicators are made public on a quarterly basis. This is a perfect example in which temporal disaggregation can be used to bridge the gap between data availability and prompt economic and financial analyses.

In this application, the variable of interest is the GHG emissions for IBM between Q3 2005 and Q3 2021, at annual frequency, resulting in 17 data points (i.e., \(\mathbf{Y} \in \mathbb{R}^{17}\)). For the high-frequency data, we used the balance sheet, income statement, and cash flow statement quarterly data between Q3 2005 and Q3 2021, resulting in 68 data points for the 112 variables (after filtering). We remove variables that have a pairwise correlation higher than 0.99, resulting in a filtered dataset with 112 variables (\(\mathbf{X} \in \mathbb{R}^{68 \times 112}\)).

In this example, we employ the adaptive LASSO method (`method = "adaptive-spTD"`) to select the best variables that can be used to recover the high-frequency observations, and we apply the `aggMat = "first"` aggregation method.

# Data Preparation

We start by loading the required packages and data.


```{r setup, message=FALSE, warning=FALSE}
library(DisaggregateTS)
```

```{r}
# Load the combined data from the package
data(combined_data)

# Extract Data_Y and Data_X from the combined data
Data_Y <- combined_data$Data_Y
Data_X <- combined_data$Data_X

# Select IBM GHG data and dates for Q3 2005 - Q3 2021
Y <- Data_Y$IBM[7:23]
Dates <- as.Date(Data_Y$Dates[7:23])

# Filter high-frequency data (Q3 2005 - Q3 2021)
X <- Data_X[24:91, ]

# Convert all columns to numeric, remove columns with NA values
X <- as.data.frame(lapply(X, as.numeric))
Y <- as.numeric(Y)
X <- X[, colSums(is.na(X)) == 0]

# Remove highly correlated variables (pairwise correlation >= 0.99)
corr_matrix <- cor(X, use = "complete.obs")
corr_matrix[upper.tri(corr_matrix)] <- 0
diag(corr_matrix) <- 0

X_filtered <- X[, !apply(corr_matrix, 2, function(x) any(abs(x) >= 0.99))]
```

# Temporal Disaggregation

```{r}
result <- disaggregate(
  Y = as.matrix(Y),
  X = as.matrix(X_filtered),
  aggMat = "first",
  aggRatio = 4,
  method = "adaptive-spTD"
)

# High-frequency estimates and coefficients
Y_HF <- result$y_Est
beta_Est <- result$beta_Est
rho_Est <- result$rho_Est

# Display estimated rho
print(paste("Estimated rho:", rho_Est))
```

# Results

```{r plot-results, fig.width=8, fig.height=5, echo=FALSE}
# Ensure Dates_Q is in Date format
Dates_Q <- as.Date(Data_X$Dates[24:91])

# Plot the disaggregated and interpolated results
plot(Dates_Q, Y_HF, type = "l", ylab = "GHG Emissions", xlab = "Time", lwd = 2, col = "black")

# Add points on top of the line
points(Dates_Q, Y_HF, col = "black", pch = 16)

points(Dates, Y, col = "red", pch = 16)

# Add legend
legend("topright", legend = c("Observed Annual", "Disaggregated"), 
       col = c("red", "black"), lty = c(NA, 1), pch = c(16, NA))
```