---
title: "Introduction to greenfeedr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to greenfeedr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

When working with a GreenFeed system, you must:

* Retrieve data from each GreenFeed unit
* Check animal visitation
* Filter and process data for further analysis

The greenfeedr package makes these steps fast and easy:

* It provides functions for downloading, reporting, and processing GreenFeed data

This document introduces you to greenfeedr's set of tools, and shows you how to apply them to data frames.

```{r setup}
library(greenfeedr)
```


## Download data with `get_gfdata()`

The first step in working with GreenFeed is to retrieve your data from the system. You can do this manually by logging into the [C-Lock Inc. GreenFeed web interface](https://ext.c-lockinc.com/greenfeed.php) with your username and password.

However, there is a simpler approach: retrieve the data through the API with the `get_gfdata()` function. This function automates data retrieval: you specify the user, password, units, and date range, and the data is downloaded directly from the C-Lock Inc. server.

If you are running multiple studies simultaneously, you can define a list of studies as follows:

```{r download data}
USER <- "your_username"
PASS <- "your_password"

studies <- list(
  list(
    Experiment = "Experiment_01",
    Unit = c(2, 3),
    StartDate = "2024-01-20",
    EndDate = Sys.Date(),
    save_dir = file.path(tempdir(), "Experiment_01")
  ),
  list(
    Experiment = "Experiment_02",
    Unit = c(212),
    StartDate = "2024-02-01",
    EndDate = Sys.Date(),
    save_dir = file.path(tempdir(), "Experiment_02")
  )
)

# Loop over all studies with 'for', calling get_gfdata() once per study
# (left commented out here because it needs a valid username and password)
# for (element in studies) {
#  get_gfdata(USER, PASS, element$Experiment, element$Unit, element$StartDate, element$EndDate, element$save_dir)
# }
```
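
Before running the loop, it can help to check each study definition and create its output folder. The following is a minimal sketch in base R only: `prepare_study()` is a hypothetical helper that is not part of greenfeedr, the end dates are fixed example values, and nothing here contacts the C-Lock Inc. server.

```r
# Hypothetical helper (not part of greenfeedr): validate one study definition
# and create its output folder if it does not exist yet
prepare_study <- function(study) {
  start <- as.Date(study$StartDate)
  end <- as.Date(study$EndDate)
  stopifnot(start <= end, length(study$Unit) >= 1)
  dir.create(study$save_dir, recursive = TRUE, showWarnings = FALSE)
  data.frame(
    Experiment = study$Experiment,
    Units = paste(study$Unit, collapse = ","),
    Days = as.integer(end - start) + 1L, # inclusive number of days
    Folder = study$save_dir
  )
}

# Example study definitions (same structure as above, with fixed end dates)
studies <- list(
  list(
    Experiment = "Experiment_01", Unit = c(2, 3),
    StartDate = "2024-01-20", EndDate = "2024-02-20",
    save_dir = file.path(tempdir(), "Experiment_01")
  ),
  list(
    Experiment = "Experiment_02", Unit = 212,
    StartDate = "2024-02-01", EndDate = "2024-02-20",
    save_dir = file.path(tempdir(), "Experiment_02")
  )
)

# One row per study: a quick overview to review before downloading
overview <- do.call(rbind, lapply(studies, prepare_study))
overview
```

A check like this catches a reversed date range or a missing unit before any download starts.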


## What does GreenFeed data look like?

The package provides daily and final datasets so that you can explore the basic format of GreenFeed data. Both are actual data from a 32-cow study.

```{r data}
# Open the daily data downloaded from C-Lock Inc. server
daily_data <- readr::read_csv(system.file("extdata", "StudyName_GFdata.csv", package = "greenfeedr"), show_col_types = FALSE)

# View the structure of the daily data
str(daily_data)

# View the first few rows of the daily data
head(daily_data)

# Open the finalized data received from C-Lock Inc.
final_data <- readxl::read_excel(system.file("extdata", "StudyName_FinalReport.xlsx", package = "greenfeedr"),
  col_types = c("text", "text", "numeric", rep("date", 3), rep("numeric", 12), "text", rep("numeric", 6))
)

# View the structure of the final data
str(final_data)

# View the first few rows of the final data
head(final_data)
```


## Report data with `report_gfdata()`

The next step in working with GreenFeed is to check animal visitation and gas production, both per animal and per day. The `report_gfdata()` function downloads the daily data and generates an easy-to-read report for checking your GreenFeed units on the farm. In addition, you can use `report_gfdata()` to generate a final report of your study by providing the final data, which you receive from C-Lock Inc. three to four weeks after the end of your study.

This function is particularly useful when several studies and units are running simultaneously and you want to confirm that each one is on track.
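
To make the idea of checking records per animal and per day concrete, here is a minimal base R sketch on a made-up data frame. It is not the code behind `report_gfdata()`; the values are invented, and only the column names (`RFID`, `StartTime`, `CH4GramsPerDay`) follow the ones used in this vignette.

```r
# Toy records (made-up values); column names mirror those used in this vignette
toy <- data.frame(
  RFID = c("A1", "A1", "A1", "B2", "B2"),
  StartTime = as.POSIXct(c(
    "2024-05-13 06:10:00", "2024-05-13 15:40:00", "2024-05-14 07:05:00",
    "2024-05-13 09:30:00", "2024-05-14 18:20:00"
  ), tz = "UTC"),
  CH4GramsPerDay = c(410, 430, 390, 350, 370)
)
toy$day <- as.Date(toy$StartTime)

# Number of records per animal and day
records <- aggregate(CH4GramsPerDay ~ RFID + day, data = toy, FUN = length)
names(records)[3] <- "n_records"

# Mean CH4 production per animal and day
daily_ch4 <- aggregate(CH4GramsPerDay ~ RFID + day, data = toy, FUN = mean)

records
daily_ch4
```

Animals with few or no records on a given day stand out immediately in a table like `records`.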

```{r report data, message = FALSE, echo = FALSE, warning = FALSE}
library(dplyr)
library(ggplot2)

file <- system.file("extdata", "StudyName_FinalReport.xlsx", package = "greenfeedr")

start_date <- "2024-05-13"
end_date <- "2024-05-25"
input_type <- "final"
plot_opt <- "All"
rfid_file <- NULL

df <- readxl::read_excel(file, col_types = c("text", "text", "numeric", rep("date", 3), rep("numeric", 12), "text", rep("numeric", 6)))
names(df)[1:14] <- c(
  "RFID",
  "AnimalName",
  "FeederID",
  "StartTime",
  "EndTime",
  "GoodDataDuration",
  "HourOfDay",
  "CO2GramsPerDay",
  "CH4GramsPerDay",
  "O2GramsPerDay",
  "H2GramsPerDay",
  "H2SGramsPerDay",
  "AirflowLitersPerSec",
  "AirflowCf"
)

# df contains finalized GreenFeed data
df <- df %>%
  ## Remove leading zeros from RFID col to match with IDs
  dplyr::mutate(
    RFID = gsub("^0+", "", RFID),
    ## Extract hours, minutes, and seconds from GoodDataDuration
    GoodDataDuration = round(
      as.numeric(substr(GoodDataDuration, 12, 13)) * 60 + # Hours to minutes
        # as.numeric(substr(GoodDataDuration, 1, 2)) * 60 +  # Hours to minutes
        as.numeric(substr(GoodDataDuration, 15, 16)) + # Minutes
        # as.numeric(substr(GoodDataDuration, 4, 5)) +
        as.numeric(substr(GoodDataDuration, 18, 19)) / 60, # Seconds to minutes
      # as.numeric(substr(GoodDataDuration, 7, 8)) / 60,
      2
    )
  ) %>%
  ## Remove data with airflow below the threshold (25 l/s) and data outside the selected time range
  dplyr::filter(
    AirflowLitersPerSec >= 25,
    as.Date(StartTime) >= as.Date(start_date) & as.Date(StartTime) <= as.Date(end_date)
  )


cols_to_convert <- c("CH4GramsPerDay", "CO2GramsPerDay", "O2GramsPerDay", "H2GramsPerDay")
df[cols_to_convert] <- lapply(df[cols_to_convert], as.numeric)

# Daily overview: total number of records per day (printed right away; `plot1` is reused below)
plot1 <- ggplot(as.data.frame(table(as.Date(df$StartTime))), aes(x = Var1, y = Freq)) +
  geom_col(color = "black") +
  labs(
    title = "Total Records Per Day",
    x = "",
    y = "Total Records"
  ) +
  geom_text(aes(label = Freq), vjust = -0.5, color = "black", size = 2.2, position = position_dodge(width = 0.9)) +
  theme_classic() +
  theme(
    plot.title = element_text(size = 11, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1.05, size = 8),
    axis.title.y = element_text(size = 10, face = "bold"),
    legend.position = "none"
  )

plot1

# Group by FarmName when a non-empty RFID file is provided; otherwise group by RFID
group_var <- if (!is.null(rfid_file) && is.data.frame(rfid_file) && nrow(rfid_file) > 0) "FarmName" else "RFID"

farmname_order <- df %>%
  dplyr::mutate(day = as.Date(EndTime)) %>%
  dplyr::group_by(!!sym(group_var), day) %>%
  dplyr::summarise(
    n = n(),
    daily_CH4 = weighted.mean(CH4GramsPerDay, GoodDataDuration, na.rm = TRUE)
  ) %>%
  dplyr::group_by(!!sym(group_var)) %>%
  dplyr::summarise(
    n = sum(n),
    daily_CH4 = mean(daily_CH4, na.rm = TRUE)
  ) %>%
  dplyr::arrange(desc(daily_CH4)) %>%
  dplyr::pull(!!sym(group_var))


# Plot 1: Total number of records per animal
plot1 <- df %>%
  dplyr::mutate(day = as.Date(EndTime)) %>%
  dplyr::group_by(!!sym(group_var), day) %>%
  dplyr::summarise(
    n = n(),
    daily_CH4 = weighted.mean(CH4GramsPerDay, GoodDataDuration, na.rm = TRUE)
  ) %>%
  dplyr::group_by(!!sym(group_var)) %>%
  dplyr::summarise(
    n = sum(n),
    daily_CH4 = mean(daily_CH4, na.rm = TRUE)
  ) %>%
  ggplot(aes(x = factor(!!sym(group_var), levels = farmname_order), y = n)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  labs(
    title = "Total Records Per Animal",
    x = "",
    y = "Total Records"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(size = 11, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1.05, size = 8),
    axis.title.y = element_text(size = 10, face = "bold"),
    legend.position = "none"
  ) +
  geom_text(aes(label = n), vjust = -1, color = "black", position = position_dodge(width = 0.9), size = 2.2)


# Plot distribution of records throughout the day
plot2 <- df %>%
  dplyr::mutate(AMPM = case_when(
    HourOfDay >= 22 ~ "10PM-4AM",
    HourOfDay < 4 ~ "10PM-4AM",
    HourOfDay >= 4 & HourOfDay < 10 ~ "4AM-10AM",
    HourOfDay >= 10 & HourOfDay < 16 ~ "10AM-4PM",
    HourOfDay >= 16 & HourOfDay < 22 ~ "4PM-10PM",
    TRUE ~ NA_character_
  )) %>%
  dplyr::group_by(!!sym(group_var), AMPM) %>%
  dplyr::summarise(n = n()) %>%
  ggplot(aes(
    x = factor(!!sym(group_var), levels = farmname_order), y = n,
    fill = factor(AMPM, levels = c("10PM-4AM", "4AM-10AM", "10AM-4PM", "4PM-10PM"))
  )) +
  geom_bar(stat = "identity", position = "fill") +
  labs(
    title = "Daily Records Distribution",
    x = "",
    y = "Percentage of total records",
    fill = "Time-Windows (24h)"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(size = 11, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1.05, size = 8),
    legend.position = "bottom",
    axis.title.y = element_text(size = 10, face = "bold")
  ) +
  scale_fill_brewer(palette = "BrBG") +
  scale_y_continuous(breaks = c(0, 0.25, 0.50, 0.75, 1), labels = c("0%", "25%", "50%", "75%", "100%"), expand = c(0, 0))


# Display the plots
plot1
plot2
```


## Process data with `process_gfdata()`

It is finally time to process your GreenFeed data for analysis. The most frequently asked questions are:

* What is the minimum acceptable time that my records should last?
* How many records should I have per day?
* How many days with records should I keep per week?

There is no single answer: these values depend on many factors, such as the duration of the study, the number of animals, feeding, and the time of year.

The `process_gfdata()` function lets you explore your data with different sets of parameters (`param1`, `param2`, and `min_time`) and check how many records and animals you retain for analysis. To do so, follow the example in the [README file](https://github.com/GMBog/greenfeedr/blob/main/README.md).
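
The kind of exploration described above can be illustrated with a small base R sketch that counts how much data survives different thresholds. This is not the implementation of `process_gfdata()`: the data are made up, `count_kept()` is a hypothetical helper, and the names `min_records_per_day` and `min_minutes` are illustrative and are not claimed to match `param1`, `param2`, or `min_time`.

```r
# Toy data (made-up values): one row per GreenFeed record
toy <- data.frame(
  RFID = c("A1", "A1", "A1", "A1", "B2", "B2", "B2"),
  day = as.Date(c(
    "2024-05-13", "2024-05-13", "2024-05-14", "2024-05-14",
    "2024-05-13", "2024-05-14", "2024-05-14"
  )),
  GoodDataDuration = c(3.5, 1.5, 4.0, 2.5, 2.2, 1.0, 5.0) # minutes
)

# Hypothetical helper: count animal-days and animals that survive one
# combination of thresholds
count_kept <- function(data, min_records_per_day, min_minutes) {
  # Drop records that are too short
  long_enough <- data[data$GoodDataDuration >= min_minutes, ]
  # Records per animal (rows) and day (columns)
  counts <- table(long_enough$RFID, long_enough$day)
  keep <- counts >= min_records_per_day
  data.frame(
    min_records_per_day = min_records_per_day,
    min_minutes = min_minutes,
    animal_days = sum(keep),
    animals = sum(rowSums(keep) > 0)
  )
}

# Try every combination of thresholds and stack the results
grid <- expand.grid(min_records_per_day = 1:2, min_minutes = c(2, 3))
summary_table <- do.call(
  rbind,
  Map(count_kept, list(toy), grid$min_records_per_day, grid$min_minutes)
)
summary_table
```

Comparing the rows of `summary_table` shows the trade-off directly: stricter thresholds leave fewer animal-days and, eventually, fewer animals.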


## Process pellet intakes and visits with `pellin()` and `viseat()`

The greenfeedr package includes additional functions to help you process daily pellet intakes and visits. To check animal visits, log into the GreenFeed web interface and, in the [data](https://ext.c-lockinc.com/data.php) tab, select “Download Large Dataset” and define the time period you want to analyze.
The folder that the system downloads to your computer contains a file named 'feedtimes'.

This is the file you will use as input for `pellin()` and `viseat()`. If you have more than one 'feedtimes' file because you are using multiple GreenFeed units in the same experiment, include them as a list of files.

Note that you should include the result obtained from the 10-drops test. If units have different gram values, define `gcup` as a vector with one element per unit.

```{r pellin and viseat}
file <- list(system.file("extdata", "feedtimes.csv", package = "greenfeedr"))

result <- pellin(
  file_path = file,
  unit = 1,
  gcup = 34,
  start_date = "2024-05-13",
  end_date = "2024-05-25",
  save_dir = tempdir()
)

head(result)
```
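
To illustrate why `gcup` may need one value per unit, the sketch below applies a per-unit grams-per-cup value to made-up visit records. It assumes that pellet intake equals the number of cup drops multiplied by the grams per cup; this is an illustration only, not the implementation of `pellin()`, and the `gcup` values, column names, and data are hypothetical.

```r
# Hypothetical grams-per-cup values for two units (not from a real test)
gcup <- c("1" = 34, "2" = 36)

# Toy visit records (made-up): one row per animal visit
visits <- data.frame(
  RFID = c("A1", "A1", "B2", "B2"),
  unit = c(1, 2, 1, 1),
  drops = c(3, 2, 4, 1)
)

# Look up the grams per cup of the unit visited, then convert drops to grams
visits$intake_g <- visits$drops * unname(gcup[as.character(visits$unit)])

# Total pellet intake per animal across all units
intake <- aggregate(intake_g ~ RFID, data = visits, FUN = sum)
intake
```

Using a single value for both units here would misstate the intake of any animal that visited the second unit.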

## Citation