Package 'OlinkAnalyze'

Title: Facilitate Analysis of Proteomic Data from Olink
Description: A collection of functions to facilitate analysis of proteomic data from Olink, primarily NPX data that has been exported from Olink Software. The functions also work on QUANT data from Olink by log-transforming the QUANT data. The functions are focused on reading data, facilitating data wrangling and quality control analysis, performing statistical analysis and generating figures to visualize the results of the statistical analysis. The goal of this package is to help users extract biological insights from proteomic data run on the Olink platform.
Authors: Kathleen Nevola [aut, cre] (<https://orcid.org/0000-0002-5183-6444>, kathy-nevola), Marianne Sandin [aut] (<https://orcid.org/0000-0001-6186-963X>, marisand), Jamey Guess [aut] (<https://orcid.org/0000-0002-4017-0923>, jrguess), Simon Forsberg [aut] (<https://orcid.org/0000-0002-7451-9222>, simfor), Christoffer Cambronero [aut] (Orbmac), Pascal Pucholt [aut] (<https://orcid.org/0000-0003-3342-1373>, AskPascal), Boxi Zhang [aut] (<https://orcid.org/0000-0001-7758-6204>, boxizhang), Masoumeh Sheikhi [aut] (MasoumehSheikhi), Klev Diamanti [aut] (<https://orcid.org/0000-0002-4922-8415>, klevdiamanti), Amrita Kar [aut] (amrita-kar), Lei Conze [aut] (leiliuC), Kristyn Chin [aut] (kristynchin-olink), Danai Topouza [aut] (dtopouza, <https://orcid.org/0000-0002-6897-9281>), Kristian Hodén [ctb] (<https://orcid.org/0000-0003-0354-0662>, kristianHoden), Per Eriksson [ctb] (<https://orcid.org/0000-0001-7633-403X>, b_watcher), Nicola Moloney [ctb], Britta Lötstedt [ctb], Emmett Sprecher [ctb], Jessica Barbagallo [ctb] (jbarbagallo), Olof Mansson [ctr] (olofmansson), Ola Caster [ctb] (OlaCaster), Olink [cph, fnd]
Maintainer: Kathleen Nevola
License: AGPL (>= 3)
Version: 4.0.1
Built: 2024-09-25 09:27:15 UTC
Source: CRAN

Help Index


Check data completeness

Description

Throws informative warnings if a dataset appears to have problems.

Usage

check_data_completeness(df)

Arguments

df

An NPX data frame, e.g. from read_NPX().

Value

None. Used for its side effects (warnings).

Examples

npx_data1 %>%
    dplyr::mutate(NPX = dplyr::if_else(
                         SampleID == "A1" & Panel == "Olink Cardiometabolic",
                         NA_real_,
                         NPX)) %>%
    OlinkAnalyze:::check_data_completeness()
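The exact checks performed by check_data_completeness() are not described here. As a hedged illustration of the kind of problem the example above creates (a sample whose NPX values are all NA for one panel), the following base R sketch flags such sample/panel combinations. The function find_incomplete_panels() and the assumed columns SampleID, Panel and NPX are hypothetical; this is not the package's implementation.

```r
# Hypothetical sketch, not OlinkAnalyze code: flag SampleID/Panel
# combinations in which every NPX value is NA and warn about them.
find_incomplete_panels <- function(df) {
  # na.pass keeps NA rows so that all-NA groups are not silently dropped
  all_na <- aggregate(NPX ~ SampleID + Panel, data = df,
                      FUN = function(x) all(is.na(x)), na.action = na.pass)
  flagged <- all_na[all_na$NPX, c("SampleID", "Panel")]
  if (nrow(flagged) > 0) {
    warning(sprintf("%d sample/panel combination(s) have no NPX values: %s",
                    nrow(flagged),
                    paste(flagged$SampleID, flagged$Panel,
                          sep = " / ", collapse = "; ")),
            call. = FALSE)
  }
  invisible(flagged)
}

# Toy data: sample A1 has only missing NPX values for the panel
toy <- data.frame(
  SampleID = rep(c("A1", "A2"), each = 2),
  Panel = "Olink Cardiometabolic",
  NPX = c(NA, NA, 1.2, 3.4)
)
flagged <- find_incomplete_panels(toy)  # emits a warning naming A1
```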

Example Sample Manifest

Description

Sample manifest is generated randomly to demonstrate use of functions in this package.

Usage

manifest

Format

This dataset contains columns:

SubjectID

Subject Identifier, A-Z

Visit

Visit Number, 1-6

SampleID

138 unique sample IDs

Site

Site1 or Site2

Details

A tibble with 138 rows and 4 columns. This manifest contains 26 example subjects, with up to 6 visits each, from 2 sites.


Combine reference and non-reference datasets

Description

The function is used by norm_internal_subset and norm_internal_bridge to combine the reference dataset, which has Adj_factor = 0, with the non-reference dataset, which is adjusted using the adjustment factors provided in adj_fct_df.

Usage

norm_internal_adjust(
  ref_df,
  ref_name,
  ref_cols,
  not_ref_df,
  not_ref_name,
  not_ref_cols,
  adj_fct_df
)

Arguments

ref_df

The reference dataset to be used in normalization (required).

ref_name

Project name of the reference dataset (required).

ref_cols

Named list of column names in the reference dataset (required).

not_ref_df

The non-reference dataset to be used in normalization (required).

not_ref_name

Project name of the non-reference dataset (required).

not_ref_cols

Named list of column names in the non-reference dataset (required).

adj_fct_df

Dataset containing the adjustment factors to be applied to the non-reference dataset (required).

Details

The function calls norm_internal_adjust_ref and norm_internal_adjust_not_ref and combines their outputs.
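A minimal base R sketch of this combination step, assuming long-format data with columns SampleID, OlinkID and NPX, an adjustment-factor table with columns OlinkID and Adj_factor, and that the factor is added to the non-reference NPX values. combine_adjusted() is a hypothetical helper, not the package's code, which also handles ArrowObjects and configurable column names.

```r
# Hypothetical sketch: the reference data keep Adj_factor = 0, the
# non-reference data get a per-assay factor added to NPX, then both are stacked.
combine_adjusted <- function(ref_df, ref_name, not_ref_df, not_ref_name, adj_fct) {
  # Reference data are left unchanged
  ref_df$Project <- ref_name
  ref_df$Adj_factor <- 0
  # Attach the per-assay adjustment factor to every non-reference row
  not_ref_df <- merge(not_ref_df, adj_fct[, c("OlinkID", "Adj_factor")],
                      by = "OlinkID", all.x = TRUE)
  not_ref_df$Project <- not_ref_name
  not_ref_df$NPX <- not_ref_df$NPX + not_ref_df$Adj_factor
  # Stack both datasets with a common column order
  cols <- c("SampleID", "OlinkID", "NPX", "Project", "Adj_factor")
  rbind(ref_df[, cols], not_ref_df[, cols])
}

ref <- data.frame(SampleID = c("S1", "S2"), OlinkID = "OID00001", NPX = c(5, 7))
not_ref <- data.frame(SampleID = c("S3", "S4"), OlinkID = "OID00001", NPX = c(4, 6))
adj <- data.frame(OlinkID = "OID00001", Adj_factor = 0.5)
out <- combine_adjusted(ref, "P1", not_ref, "P2", adj)
```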

Value

Tibble or ArrowObject with the normalized dataset.

Author(s)

Klev Diamanti


Add adjustment factors to a dataset

Description

Add adjustment factors to a dataset

Usage

norm_internal_adjust_not_ref(df, name, cols, adj_fct_df, adj_fct_cols)

Arguments

df

The dataset to be normalized (required).

name

Project name of the dataset (required).

cols

Named list of column names in the dataset (required).

adj_fct_df

Dataset containing the adjustment factors to be applied to the dataset df (required).

adj_fct_cols

Named list of column names in the dataset containing adjustment factors (required).

Value

Tibble or ArrowObject with the normalized dataset with additional columns "Project" and "Adj_factor".

Author(s)

Klev Diamanti


Modify the reference dataset to be combined with the non-reference normalized dataset

Description

Modify the reference dataset to be combined with the non-reference normalized dataset

Usage

norm_internal_adjust_ref(ref_df, ref_name)

Arguments

ref_df

The reference dataset to be used in normalization (required).

ref_name

Project name of the reference dataset (required).

Value

Tibble or ArrowObject with the reference dataset with additional columns "Project" and "Adj_factor".

Author(s)

Klev Diamanti


Compute median value of the quantification method for each Olink assay

Description

The function computes the median value of the quantification method (e.g. NPX) for each Olink assay across the samples listed in samples, and adds the column Project.

Usage

norm_internal_assay_median(df, samples, name, cols)

Arguments

df

The dataset to calculate medians from (required).

samples

Character vector of sample identifiers to be used for adjustment factor calculation in the dataset df (required).

name

Project name of the dataset that will be added in the column Project (required).

cols

Named list of column names identified in the dataset df (required).

Details

This function is typically used by the internal functions norm_internal_subset and norm_internal_reference_median, which compute the median quantification value of each assay across the samples specified in samples.
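A minimal base R sketch of a per-assay median over a chosen set of samples, assuming long-format data with columns SampleID, OlinkID and NPX. assay_median() is a hypothetical helper; the package's version also supports ArrowObjects and other quantification columns.

```r
# Hypothetical sketch: median NPX per OlinkID over the selected samples,
# returned with the columns OlinkID, Project and assay_med.
assay_median <- function(df, samples, name) {
  sub <- df[df$SampleID %in% samples, ]
  # aggregate's default na.action drops rows with missing NPX
  med <- aggregate(NPX ~ OlinkID, data = sub, FUN = median)
  names(med)[names(med) == "NPX"] <- "assay_med"
  med$Project <- name
  med[, c("OlinkID", "Project", "assay_med")]
}

npx_toy <- data.frame(
  SampleID = rep(c("S1", "S2", "S3"), times = 2),
  OlinkID = rep(c("OID00001", "OID00002"), each = 3),
  NPX = c(1, 2, 10, 4, 6, 8)
)
res <- assay_median(npx_toy, samples = c("S1", "S2"), name = "P1")
```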

Value

Tibble or ArrowObject with one row per Olink assay and the columns OlinkID, Project, and assay_med

Author(s)

Klev Diamanti


Internal bridge normalization function

Description

Internal bridge normalization function

Usage

norm_internal_bridge(
  ref_df,
  ref_samples,
  ref_name,
  ref_cols,
  not_ref_df,
  not_ref_name,
  not_ref_cols
)

Arguments

ref_df

The reference dataset to be used in normalization (required).

ref_samples

Character vector of sample identifiers to be used for adjustment factor calculation in the reference dataset (required).

ref_name

Project name of the reference dataset (required).

ref_cols

Named list of column names in the reference dataset (required).

not_ref_df

The non-reference dataset to be used in normalization (required).

not_ref_name

Project name of the non-reference dataset (required).

not_ref_cols

Named list of column names in the non-reference dataset (required).

Value

Tibble or ArrowObject with the normalized dataset.

Author(s)

Klev Diamanti


Internal function normalizing Olink Explore 3k to Olink Explore 3072

Description

Internal function normalizing Olink Explore 3k to Olink Explore 3072

Usage

norm_internal_cross_product(
  ref_df,
  ref_samples,
  ref_name,
  ref_cols,
  not_ref_df,
  not_ref_name,
  not_ref_cols
)

Arguments

ref_df

The reference dataset to be used in normalization (required).

ref_samples

Character vector of sample identifiers to be used for adjustment factor calculation in the reference dataset (required).

ref_name

Project name of the reference dataset (required).

ref_cols

Named list of column names in the reference dataset (required).

not_ref_df

The non-reference dataset to be used in normalization (required).

not_ref_name

Project name of the non-reference dataset (required).

not_ref_cols

Named list of column names in the non-reference dataset (required).

Value

Tibble or ArrowObject with a dataset with the following additional columns:

  • OlinkID_E3072: Corresponding assay identifier from Olink Explore 3072.

  • Project: Project of origin.

  • BridgingRecommendation: Recommendation of whether the assay is bridgeable or not. One of "NotBridgeable", "MedianCentering", or "QuantileSmoothing".

  • MedianCenteredNPX: NPX values adjusted based on the median of the pairwise differences in NPX values of the bridge samples between the two datasets.

  • QSNormalizedNPX: NPX values adjusted using quantile smoothing normalization based on the bridge samples.
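The median-centering idea behind MedianCenteredNPX can be sketched in base R: for each assay, take the difference between reference and non-reference NPX for every bridge sample, use the median of these differences as the adjustment factor, and add it to the non-reference NPX values. The helper bridge_adj_factors() and the column names SampleID, OlinkID and NPX are assumptions for illustration; quantile smoothing is not shown.

```r
# Hypothetical sketch of median centering on bridge samples.
bridge_adj_factors <- function(ref_df, not_ref_df, bridge_samples) {
  keep <- c("SampleID", "OlinkID", "NPX")
  # Pair each bridge sample's NPX in both datasets, assay by assay
  paired <- merge(
    ref_df[ref_df$SampleID %in% bridge_samples, keep],
    not_ref_df[not_ref_df$SampleID %in% bridge_samples, keep],
    by = c("SampleID", "OlinkID"), suffixes = c("_ref", "_not_ref")
  )
  paired$diff <- paired$NPX_ref - paired$NPX_not_ref
  # Median of the pairwise differences per assay is the adjustment factor
  adj <- aggregate(diff ~ OlinkID, data = paired, FUN = median)
  names(adj)[names(adj) == "diff"] <- "Adj_factor"
  adj
}

ref <- data.frame(SampleID = c("S1", "S2", "S3"), OlinkID = "OID00001",
                  NPX = c(5, 7, 9))
not_ref <- data.frame(SampleID = c("S1", "S2", "S4"), OlinkID = "OID00001",
                      NPX = c(4, 6.5, 3))
adj <- bridge_adj_factors(ref, not_ref, bridge_samples = c("S1", "S2"))

# Apply the factor to all non-reference samples
not_ref_adj <- merge(not_ref, adj, by = "OlinkID", all.x = TRUE)
not_ref_adj$MedianCenteredNPX <- not_ref_adj$NPX + not_ref_adj$Adj_factor
```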

Author(s)

Klev Diamanti


Internal reference median normalization function

Description

Internal reference median normalization function

Usage

norm_internal_reference_median(
  ref_df,
  ref_samples,
  ref_name,
  ref_cols,
  reference_medians
)

Arguments

ref_df

The reference dataset to be used in normalization (required).

ref_samples

Character vector of sample identifiers to be used for adjustment factor calculation in the reference dataset (required).

ref_name

Project name of the reference dataset (required).

ref_cols

Named list of column names in the reference dataset (required).

reference_medians

Dataset with columns "OlinkID" and "Reference_NPX" (required). Used for reference median normalization.
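Assuming that reference median normalization shifts each assay so that the median NPX of ref_samples matches Reference_NPX, a minimal base R sketch looks like this. reference_median_adjust() and the columns SampleID, OlinkID and NPX are hypothetical; the package's implementation may differ in details such as missing-value handling.

```r
# Hypothetical sketch: Adj_factor = Reference_NPX - median NPX of the
# selected samples, then added to every NPX value of that assay.
reference_median_adjust <- function(df, samples, reference_medians) {
  sub <- df[df$SampleID %in% samples, ]
  med <- aggregate(NPX ~ OlinkID, data = sub, FUN = median)
  names(med)[names(med) == "NPX"] <- "assay_med"
  adj <- merge(med, reference_medians, by = "OlinkID")
  adj$Adj_factor <- adj$Reference_NPX - adj$assay_med
  out <- merge(df, adj[, c("OlinkID", "Adj_factor")], by = "OlinkID", all.x = TRUE)
  out$NPX <- out$NPX + out$Adj_factor
  out
}

npx_toy <- data.frame(SampleID = c("S1", "S2", "S3"), OlinkID = "OID00001",
                      NPX = c(2, 4, 10))
ref_meds <- data.frame(OlinkID = "OID00001", Reference_NPX = 5)
out <- reference_median_adjust(npx_toy, samples = c("S1", "S2"),
                               reference_medians = ref_meds)
```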

Value

Tibble or ArrowObject with the normalized dataset.

Author(s)

Klev Diamanti


Update column names of non-reference dataset based on those of reference dataset

Description

This function handles cases in which columns referring to the same information are named differently in the reference and non-reference normalization datasets (df1 and df2). It renames only the columns panel_version, qc_warn, and assay_warn of the non-reference dataset, based on their names in the reference dataset.
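A minimal base R sketch of this renaming, assuming ref_cols and not_ref_cols are named lists that map keys such as panel_version to actual column names. rename_like_ref() is hypothetical, and the example column names are illustrative only.

```r
# Hypothetical sketch: give selected non-reference columns the names used
# in the reference dataset.
rename_like_ref <- function(ref_cols, not_ref_cols, not_ref_df,
                            keys = c("panel_version", "qc_warn", "assay_warn")) {
  for (k in keys) {
    old <- not_ref_cols[[k]]
    new <- ref_cols[[k]]
    # Rename only when both lists define the key and the names differ
    if (!is.null(old) && !is.null(new) && old != new &&
        old %in% names(not_ref_df)) {
      names(not_ref_df)[names(not_ref_df) == old] <- new
    }
  }
  not_ref_df
}

ref_cols <- list(panel_version = "Panel_Version", qc_warn = "SampleQC")
not_ref_cols <- list(panel_version = "Panel_Lot_Nr", qc_warn = "SampleQC")
not_ref_df <- data.frame(SampleID = "S1", Panel_Lot_Nr = "v1", SampleQC = "PASS")
renamed <- rename_like_ref(ref_cols, not_ref_cols, not_ref_df)
```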

Usage

norm_internal_rename_cols(ref_cols, not_ref_cols, not_ref_df)

Arguments

ref_cols

Named list of column names identified in the reference dataset.

not_ref_cols

Named list of column names identified in the non-reference dataset.

not_ref_df

Non-reference dataset to be used in normalization.

Value

not_ref_df with updated column names.

Author(s)

Klev Diamanti


Internal subset normalization function

Description

This function performs subset normalization using a subset of the samples from either or both of the reference and non-reference datasets. When all samples from each dataset are used, the function performs intensity normalization.
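A hedged base R sketch of subset normalization, assuming the adjustment factor for each assay is the median NPX of the reference subset minus the median NPX of the non-reference subset. Passing all sample IDs of each dataset corresponds to intensity normalization. subset_adj_factors() and the columns SampleID, OlinkID and NPX are hypothetical.

```r
# Hypothetical sketch: per-assay difference of subset medians.
subset_adj_factors <- function(ref_df, ref_samples, not_ref_df, not_ref_samples) {
  subset_median <- function(df, samples) {
    aggregate(NPX ~ OlinkID, data = df[df$SampleID %in% samples, ], FUN = median)
  }
  adj <- merge(subset_median(ref_df, ref_samples),
               subset_median(not_ref_df, not_ref_samples),
               by = "OlinkID", suffixes = c("_ref", "_not_ref"))
  adj$Adj_factor <- adj$NPX_ref - adj$NPX_not_ref
  adj[, c("OlinkID", "Adj_factor")]
}

ref <- data.frame(SampleID = c("S1", "S2", "S3"), OlinkID = "OID00001",
                  NPX = c(2, 4, 6))
not_ref <- data.frame(SampleID = c("S4", "S5", "S6"), OlinkID = "OID00001",
                      NPX = c(1, 1, 3))

# Intensity normalization: all samples from each dataset
adj_all <- subset_adj_factors(ref, ref$SampleID, not_ref, not_ref$SampleID)
# Subset normalization: selected samples only
adj_subset <- subset_adj_factors(ref, c("S1", "S2"), not_ref, c("S4", "S6"))
```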

Usage

norm_internal_subset(
  ref_df,
  ref_samples,
  ref_name,
  ref_cols,
  not_ref_df,
  not_ref_samples,
  not_ref_name,
  not_ref_cols
)

Arguments

ref_df

The reference dataset to be used in normalization (required).

ref_samples

Character vector of sample identifiers to be used for adjustment factor calculation in the reference dataset (required).

ref_name

Project name of the reference dataset (required).

ref_cols

Named list of column names in the reference dataset (required).

not_ref_df

The non-reference dataset to be used in normalization (required).

not_ref_samples

Character vector of sample identifiers to be used for adjustment factor calculation in the non-reference dataset (required).

not_ref_name

Project name of the non-reference dataset (required).

not_ref_cols

Named list of column names in the non-reference dataset (required).

Value

Tibble or ArrowObject with the normalized dataset.

Author(s)

Klev Diamanti


NPX Data in Long format

Description

Data is generated randomly to demonstrate use of functions in this package.

Usage

npx_data1

Format

In addition to standard read_NPX() columns, this dataset also contains columns:

Subject

Subject Identifier

Treatment

Treated or Untreated

Site

Site indicator, 5 unique values

Time

Baseline, Week.6 and Week.12

Project

Project ID number

Details

A tibble with 29,440 rows and 17 columns. Dataset npx_data1 is an Olink NPX data file (tibble) in long format with 158 unique sample IDs (including 2 repeats each of the control samples CONTROL_SAMPLE_AS 1 and CONTROL_SAMPLE_AS 2). The data also contains 1104 assays (uniquely identified using OlinkID) over 2 panels.


NPX Data in Long format, Follow-up

Description

Data is generated randomly to demonstrate use of functions in this package. The format is very similar to data(npx_data1). Both datasets can be used together to demonstrate the use of normalization functionality.

Usage

npx_data2

Format

In addition to standard read_NPX() columns, this dataset also contains columns:

Subject

Subject Identifier

Treatment

Treated or Untreated

Site

Site indicator, 5 unique values

Time

Baseline, Week.6 and Week.12

Project

Project ID number

Details

A tibble with 32,384 rows and 17 columns. npx_data2 is an Olink NPX data file (tibble) in long format with 174 unique sample IDs (including 2 repeats each of the control samples CONTROL_SAMPLE_AS 1 and CONTROL_SAMPLE_AS 2). The data also contains 1104 assays (uniquely identified using OlinkID) over 2 panels. This dataset also contains 16 bridge samples whose SampleIDs are also present in data(npx_data1). These sample IDs are: A13, A29, A30, A36, A45, A46, A52, A63, A71, A73, B3, B4, B37, B45, B63, B75.


Read in flex data

Description

Called by read_NPX

Usage

read_flex(filename)

Arguments

filename

Path to the file.

Value

tibble of data


Function to read NPX data into long format

Description

Imports an NPX or QUANT file exported from Olink Software. No alterations to the output format are allowed.

Usage

read_NPX(filename)

Arguments

filename

Path to Olink Software output file.

Value

A "tibble" in long format. Columns include:

  • SampleID: Sample ID

  • Index: Index

  • OlinkID: Olink ID

  • UniProt: UniProt ID

  • Assay: Protein symbol

  • MissingFreq: Proportion of samples below LOD

  • Panel_Version: Panel Version

  • PlateID: Plate ID

  • QC_Warning: QC Warning Status

  • LOD: Limit of detection

  • NPX: Normalized Protein Expression

Additional columns may be present or missing depending on the platform
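MissingFreq is reported in the exported file; the base R sketch below only illustrates what the proportion means, assuming it is the fraction of samples with NPX below LOD for each assay. missing_freq() is a hypothetical helper, and comparisons involving a missing NPX or LOD are ignored.

```r
# Hypothetical sketch: per-assay proportion of samples with NPX below LOD.
missing_freq <- function(df) {
  below <- df$NPX < df$LOD
  # na.rm = TRUE is passed to mean(), dropping NA comparisons
  tapply(below, df$OlinkID, mean, na.rm = TRUE)
}

npx_toy <- data.frame(
  OlinkID = rep(c("OID00001", "OID00002"), each = 3),
  NPX = c(0.5, 2, 3, 0.1, 0.2, 5),
  LOD = 1
)
mf <- missing_freq(npx_toy)
```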

Examples

file <- system.file("extdata", "Example_NPX_Data.csv", package = "OlinkAnalyze")
read_NPX(file)

Helper function to read in Olink Explore csv or txt files

Description

Helper function to read in Olink Explore csv or txt files

Usage

read_npx_csv(filename)

Arguments

filename

Path to Olink Software output txt or csv file.

Value

A "tibble" in long format. Some of the columns are:

  • SampleID: Sample ID

  • Index: Index

  • OlinkID: Olink ID

  • UniProt: UniProt ID

  • Assay: Protein symbol

  • MissingFreq: Proportion of samples below LOD

  • Panel_Version: Panel Version

  • PlateID: Plate ID

  • QC_Warning: QC Warning Status

  • LOD: Limit of detection

  • NPX: Normalized Protein Expression

Additional columns may be present or missing depending on the platform

Examples

file <- system.file("extdata", "Example_NPX_Data.csv", package = "OlinkAnalyze")
read_NPX(file)

Helper function to read in Olink Explore parquet output files

Description

Helper function to read in Olink Explore parquet output files

Usage

read_npx_parquet(filename)

Arguments

filename

Path to Olink Software parquet output file.

Value

A "tibble" in long format. Some of the columns are:

  • SampleID: Sample ID

  • OlinkID: Olink ID

  • UniProt: UniProt ID

  • Assay: Protein symbol

  • PlateID: Plate ID

  • Count: Counts from sequences

  • ExtNPX: External control normalized counts

  • NPX: Normalized Protein Expression

Additional columns may be present or missing depending on the platform

Examples

file <- system.file("extdata", "Example_NPX_Data.csv", package = "OlinkAnalyze")
read_NPX(file)

Helper function to read in Olink Explore zip csv files

Description

Helper function to read in Olink Explore zip csv files

Usage

read_npx_zip(filename)

Arguments

filename

Path to Olink Software output zip file.

Value

A "tibble" in long format. Some of the columns are:

  • SampleID: Sample ID

  • Index: Index

  • OlinkID: Olink ID

  • UniProt: UniProt ID

  • Assay: Protein symbol

  • MissingFreq: Proportion of samples below LOD

  • Panel_Version: Panel Version

  • PlateID: Plate ID

  • QC_Warning: QC Warning Status

  • LOD: Limit of detection

  • NPX: Normalized Protein Expression

Additional columns may be present or missing depending on the platform

Examples

try({ # May fail if dependencies are not installed
  file <- system.file("extdata", "Example_NPX_Data.csv", package = "OlinkAnalyze")
  read_NPX(file)
})

Function to set plot theme

Description

This function sets a coherent plot theme for the plotting functions in this package.

Usage

set_plot_theme(font = "Swedish Gothic Thin")

Arguments

font

Font family to use for text elements. Depends on the extrafont package.

Value

No return value; used as a theme for ggplot2 plots.

Examples

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg, color = as.factor(cyl))) +
  geom_point(size = 4) +
  set_plot_theme()

ggplot(mtcars, aes(x = wt, y = mpg, color = as.factor(cyl))) +
  geom_point(size = 4) +
  set_plot_theme(font = "")