HhP

Table of contents

  1. Description
  2. Methodology
  3. Quick Start

Description

In cancer research, supervised heterogeneity analysis has important implications. Such analysis has traditionally been based on clinical, demographic, and molecular variables. Recently, histopathological imaging features, which are generated as a byproduct of biopsy, have been shown to be effective for modeling cancer outcomes, and a handful of supervised heterogeneity analyses have been conducted based on such features. There are two types of histopathological imaging features: those extracted based on specific biological knowledge, and those extracted using automated image processing software. Using both types, our goal is to conduct the first supervised cancer heterogeneity analysis that satisfies a hierarchical structure. That is, the first type of imaging features defines a rough structure, and the second type defines a nested and more refined structure. We develop a penalization approach that is motivated by, but differs significantly from, penalized fusion and sparse group penalization. It has satisfactory statistical and numerical properties. In the analysis of lung adenocarcinoma data, it identifies a heterogeneity structure significantly different from the alternatives and has satisfactory prediction and stability performance.

Methodology

Model setting

Consider $n$ independent subjects with measurements $\{y_i, \boldsymbol{x}_i, \boldsymbol{z}_i\}_{i=1}^{n}$, where $\boldsymbol{x}_i = (x_{i1}, \cdots, x_{iq})$ and $\boldsymbol{z}_i = (z_{i1}, \cdots, z_{ip})$. Here, for subject $i$, $y_i$ is the response variable, and $\boldsymbol{x}_i$ and $\boldsymbol{z}_i$ are the first and second type of imaging features, respectively. Consider the heterogeneity model:

$$y_i = \boldsymbol{x}_i^{\top} \boldsymbol{\beta}_i + \boldsymbol{z}_i^{\top} \boldsymbol{\gamma}_i + \epsilon_i, \quad i = 1, \cdots, n,$$

where $\boldsymbol{\beta}_i$ and $\boldsymbol{\gamma}_i$ are the $q$- and $p$-dimensional vectors of unknown regression coefficients, respectively, and $\epsilon_i$ is the random error with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$. The intercept is omitted for simplicity of notation. We consider linear regression for a continuous response, which matches the data analysis in Section 4 of Ren et al. (2022+). The proposed approach is potentially applicable to other types of response and model. Each subject is flexibly modeled to have its own regression coefficients, and two subjects belong to the same (sub)subgroup if and only if they have the same regression coefficients.
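To make the model concrete, the following base R sketch simulates data in which every subject carries its own coefficient vectors, and subjects in the same (sub)subgroup share identical values. It does not use any HhP function. The group sizes, coefficient values, and variable names (`subgroup`, `subsub`, `alpha`, `delta`) are illustrative assumptions, not settings from the package or the paper.

```r
# Illustrative simulation from the heterogeneity model (base R only).
# All sizes and coefficient values below are assumptions for this sketch.
set.seed(1)
n <- 120  # number of subjects
q <- 2    # dimension of the first type of imaging features (x)
p <- 3    # dimension of the second type of imaging features (z)

# Hypothetical memberships: 4 sub-subgroups nested inside 2 subgroups.
subsub   <- rep(1:4, each = n / 4)      # sub-subgroup label of each subject
subgroup <- ifelse(subsub <= 2, 1, 2)   # subgroup label of each subject

# Distinct coefficient values: one row per subgroup / sub-subgroup.
alpha <- rbind(c(2, 2), c(-2, -2))                  # K1 x q
delta <- rbind(c(1, 1, 1), c(-1, -1, -1),
               c(3, 3, 3), c(-3, -3, -3))           # K2 x p

# Subject-specific coefficients: row i holds beta_i (resp. gamma_i).
beta  <- alpha[subgroup, , drop = FALSE]            # n x q
gamma <- delta[subsub, , drop = FALSE]              # n x p

# Features, errors, and responses: y_i = x_i' beta_i + z_i' gamma_i + eps_i.
x   <- matrix(rnorm(n * q), n, q)
z   <- matrix(rnorm(n * p), n, p)
eps <- rnorm(n, sd = 0.5)
y   <- rowSums(x * beta) + rowSums(z * gamma) + eps
```

Storing the coefficients as `n`-row matrices mirrors the "one coefficient vector per subject" formulation: the number of distinct rows of `beta` and `gamma` is the number of subgroups and sub-subgroups, respectively.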

Going beyond the existing literature, we consider a more sophisticated heterogeneity structure, sketched in the lower panel of Figure (a), in which the $\boldsymbol{\beta}_i$'s define a "rough" heterogeneity structure with $K_1$ subgroups and the $\boldsymbol{\gamma}_i$'s define a more refined heterogeneity structure with $K_2$ sub-subgroups. Denote by $\{\mathcal{G}_1^*, \cdots, \mathcal{G}_{K_1}^*\}$ the collection of subject index sets of the $K_1$ subgroups, and by $\{\mathcal{T}_1^*, \cdots, \mathcal{T}_{K_2}^*\}$ the collection of subject index sets of the $K_2$ sub-subgroups. The hierarchy of heterogeneity amounts to a nested structure. That is, there exists a mutually exclusive partition $\{\mathcal{H}_1, \cdots, \mathcal{H}_{K_1}\}$ of $\{1, \cdots, K_2\}$ satisfying $\mathcal{G}_{k_1}^* = \bigcup_{k_2 \in \mathcal{H}_{k_1}} \mathcal{T}_{k_2}^*$, $1 \le k_1 \le K_1$, $1 \le k_2 \le K_2$.
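The nesting condition can be checked mechanically from two label vectors. The base R sketch below is an illustration only: `is_nested` and `parent_map` are hypothetical helper names, not functions of the HhP package. The first tests whether every sub-subgroup lies inside exactly one subgroup, and the second lists, for each subgroup, the sub-subgroups it contains (the sets $\mathcal{H}_{k_1}$).

```r
# Hypothetical helpers (not part of HhP) for the nested structure.

# TRUE if every sub-subgroup is contained in exactly one subgroup.
is_nested <- function(subgroup, subsub) {
  stopifnot(length(subgroup) == length(subsub))
  # For each sub-subgroup, count the distinct subgroups its members fall in.
  n_parents <- tapply(subgroup, subsub, function(g) length(unique(g)))
  all(n_parents == 1)
}

# For each subgroup k1, the sub-subgroup labels it contains (the set H_{k1}).
# Only meaningful when is_nested() is TRUE.
parent_map <- function(subgroup, subsub) {
  lapply(split(subsub, subgroup), function(s) sort(unique(s)))
}

# Toy labels for six subjects.
subgroup <- c(1, 1, 1, 2, 2, 2)
subsub   <- c(1, 1, 2, 3, 4, 4)   # sub-subgroups 1-2 in subgroup 1, 3-4 in subgroup 2
bad      <- c(1, 2, 2, 2, 3, 3)   # sub-subgroup 2 spans both subgroups

nested_ok  <- is_nested(subgroup, subsub)
nested_bad <- is_nested(subgroup, bad)
h_sets     <- parent_map(subgroup, subsub)
```

Here `nested_ok` is `TRUE` and `nested_bad` is `FALSE`, because in `bad` the second sub-subgroup contains subjects from both subgroups.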

Regularized estimation

For simultaneous estimation and determination of the heterogeneity structure, we propose a penalized objective function $Q(\boldsymbol{\beta}, \boldsymbol{\gamma})$ (its explicit form is given in the reference below), where $\boldsymbol{\beta} = (\boldsymbol{\beta}_1, \cdots, \boldsymbol{\beta}_n)$, $\boldsymbol{\gamma} = (\boldsymbol{\gamma}_1, \cdots, \boldsymbol{\gamma}_n)$, and $p(\cdot, \lambda)$ is a concave penalty function with tuning parameter $\lambda > 0$. In our numerical study, we adopt MCP (Minimax Concave Penalty); SCAD (Smoothly Clipped Absolute Deviation Penalty) and some other penalties are also applicable. Consider $(\widehat{\boldsymbol{\beta}}, \widehat{\boldsymbol{\gamma}} )=\underset{ \boldsymbol{\beta},\boldsymbol{\gamma} }{\mathrm{argmin}} \ Q(\boldsymbol{\beta},\boldsymbol{\gamma})$. Denote $\{\widehat{\boldsymbol{\alpha}}_1 , \cdots, \widehat{\boldsymbol{\alpha}}_{\widehat{K}_1} \}$ and $\{\widehat{\boldsymbol{\delta}}_1 , \cdots, \widehat{\boldsymbol{\delta}}_{\widehat{K}_2} \}$ as the distinct values of $\widehat{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\gamma}}$, respectively. Then $\{ \widehat{\mathcal{G}}_{1}, \cdots, \widehat{\mathcal{G}}_{\widehat{K}_1} \}$ and $\{ \widehat{\mathcal{T}}_{1}, \cdots, \widehat{\mathcal{T}}_{\widehat{K}_2} \}$ constitute mutually exclusive partitions of $\{1, \cdots, n\}$, where $\widehat{\mathcal{G}}_{k_1} = \{i: \widehat{\boldsymbol{\beta}}_i = \widehat{\boldsymbol{\alpha}}_{k_1}, i=1, \cdots, n \}$, $k_1 = 1, \cdots, \widehat{K}_1$, and $\widehat{\mathcal{T}}_{k_2} = \{i: \widehat{\boldsymbol{\gamma}}_i = \widehat{\boldsymbol{\delta}}_{k_2}, i=1, \cdots, n \}$, $k_2 = 1, \cdots, \widehat{K}_2$. Collectively, they fully determine the heterogeneity structure.
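Two ingredients of this paragraph are easy to illustrate in base R: the MCP penalty, and the step that turns estimated coefficients into a partition of the subjects. The sketch below is not the package's implementation. `mcp` uses the standard MCP form with a regularization parameter `a` (the default `a = 3` is an assumption), and `partition_by_rows` is a hypothetical helper that groups subjects whose coefficient rows coincide after rounding, since numerical output is rarely exactly equal.

```r
# MCP in its standard form, for t >= 0:
#   p(t, lambda) = lambda * t - t^2 / (2 * a)   if t <= a * lambda
#                = a * lambda^2 / 2             otherwise
# The default a = 3 is an assumption of this sketch.
mcp <- function(t, lambda, a = 3) {
  t <- abs(t)
  ifelse(t <= a * lambda, lambda * t - t^2 / (2 * a), a * lambda^2 / 2)
}

# Hypothetical helper: partition subjects by the distinct rows of an
# n x d coefficient matrix (row i = estimate for subject i).
# Rows are rounded first so that near-identical estimates are merged.
partition_by_rows <- function(coef, digits = 6) {
  key    <- apply(round(coef, digits), 1, paste, collapse = ",")
  labels <- match(key, unique(key))        # group label in order of appearance
  split(seq_len(nrow(coef)), labels)       # list of subject index sets
}

# Toy estimates for four subjects: subjects 1, 2, 4 share one value.
beta_hat <- rbind(c(1, 2), c(1, 2), c(-1, 0), c(1, 2))
groups   <- partition_by_rows(beta_hat)
K1_hat   <- length(groups)                 # estimated number of subgroups
```

Applying `partition_by_rows` to the estimated $\widehat{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\gamma}}$ matrices would give the index sets $\widehat{\mathcal{G}}_{k_1}$ and $\widehat{\mathcal{T}}_{k_2}$, respectively. The flat region of `mcp` beyond `a * lambda` is what lets well-separated groups stay apart without extra shrinkage.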

Quick Start

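# Load HhP and the packages used alongside it
library(HhP)
library(Matrix)
library(MASS)
library(fmrs)
# Load the example data for the regression setting
data(example.data.reg)
n = example.data.reg$n
q = example.data.reg$q
p = example.data.reg$p
# ------------ Necessary parameters to support algorithm implementation --------
# Initial values of the coefficients
beta.init.list = gen_int_beta(n, p, q, example.data.reg)
beta.init = beta.init.list$beta.init
# Tuning parameter values (function defaults)
lambda = genelambda.obo()
# Fit the hierarchical heterogeneity model
result = HhP.reg(lambda, example.data.reg, n, q, p, beta.init)
# Summarize the fit against the true coefficients in example.data.reg$Beta0
index.list = evaluation.sum(n, q, p, result$admmres, result$abic.n, result$admmres2, example.data.reg$Beta0, result$bic.var)
index.list$err.s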

References:

  • Ren, M., Zhang, Q., Zhang, S., Zhong, T., Huang, J. & Ma, S. (2022+). Hierarchical cancer heterogeneity analysis based on histopathological imaging features. Biometrics. doi:10.1111/biom.13426