Title: Markov Chain-Based Cost-Optimal Control Charts
Description: Functions for cost-optimal control charts with a focus on health care applications. In contrast to the assumptions of traditional control chart theory, random shift sizes, random repair and random sampling times are allowed here. The package focuses on X-bar charts with a sample size of 1 (representing the monitoring of a single patient at a time). The methods are described in Zempleni et al. (2004) <doi:10.1002/asmb.521>, Dobi and Zempleni (2019) <doi:10.1002/qre.2518> and Dobi and Zempleni (2019) <http://ac.inf.elte.hu/Vol_049_2019/129_49.pdf>.
Authors: Balazs Dobi & Andras Zempleni
Maintainer: Balazs Dobi <[email protected]>
License: GPL
Version: 2.1.5
Built: 2024-10-13 07:49:33 UTC
Source: CRAN
Pseudonymised and randomised time series data of 800 patients. The patients are divided into two main groups by therapy type: insulin analogues (artificial insulins) and glucagon-like peptides (GLP, which promote insulin secretion). The patients' well-being is indicated by the blood sugar level (more accurately, the glycated haemoglobin, HbA1c).
data("diabetes")
data("diabetes")
A data frame with 87598 observations on the following 11 variables.
ID
Patient ID
DATE
Date of the sampling/observation
AGE
Age of the patient
THERAPY
Therapy type
THERAPY_COST_EUR
Therapy cost
HEALTHCARE_BURDEN_EUR
Event (e.g. heart attack) cost for the health care provider
HBA1C_AVG
Blood sugar average for the 30-day sampling cycle
HBA1C_SD
Blood sugar standard deviation for the 30-day sampling cycle
SAMPLING_IN_MONTH
Number of samplings in the 30-day sampling cycle
ICD
Diabetes diagnosis type (International Classification of Diseases)
THERAPY_VECTOR
Therapy vector of the patient, i.e. the therapy status over time, taking into account how long the therapy lasts after initiation
The example data focuses on two therapy types: insulin analogues (artificial insulins) and glucagon-like peptides (GLP, which promote insulin secretion). There are of course more treatment types; the database also lists oral antidiabetics (OAD) and human insulins, but we chose to keep the data simpler by focusing on GLP and analogue therapies. For the sake of comparison, the therapies are grouped as follows: the first group is insulin analogues with possible parallel OAD therapies, but with human insulin and GLP excluded; the second group is GLP therapies with possible parallel OAD and insulin analogue therapies, but with human insulin excluded. Essentially, we are comparing the effect and cost of insulin analogues with the effect and cost of additional GLP therapies. For cost calculations, the 21 March 2021 EUR-HUF exchange rate was used (1 EUR = 369.05 HUF).
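As a minimal sketch of this grouping (the pattern matching mirrors the full example below, where an additional "-H" filter excludes human insulin where needed):

data("diabetes")
# Analogue group: analogue therapy present, GLP excluded.
is_analogue <- grepl("ANALOGUE", diabetes$THERAPY) & !grepl("GLP", diabetes$THERAPY)
# GLP group: GLP therapy present (parallel analogue therapy allowed).
is_glp <- grepl("GLP", diabetes$THERAPY)
table(is_analogue, is_glp)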
The example below contains lengthy code which exemplifies the process of gathering data useful for control chart applications. A detailed application of this data can be found in the package's vignette.
The original dataset is based on month-aggregated time series data of diabetic patients from Hungary, gathered over the period September 2007 - September 2017. The data came from two sources: the National Health Insurance Fund of Hungary and the South-Pest Central Hospital. The former provided information about diagnoses, treatments, health care events and related costs, while the latter provided laboratory data on blood sugar levels. Patients with International Classification of Diseases (ICD) codes (diagnoses) E10, E11 and E14, and at least one blood sugar measurement, were included initially. Only the data of patients with at least one E11 (type II diabetes) diagnosis in the study period were kept. An additional homogenising filter was the requirement of an age above 40 at the time of the first diagnosis. Disease progression and therapy effectiveness estimation required at least two blood sugar (HbA1c) measurements with simultaneous therapy data. A total of 4434 patients satisfied all conditions, out of which 2151 had at least two HbA1c measurements.
https://ecmiindmath.org/2019/08/19/markov-chain-based-cost-optimal-control-charts/
data("diabetes") str(diabetes) ## Not run: ##### Example of data assessment for control chart use ##### ### Packages require(zoo) require(reshape2) RANDOMISED_DATA <- diabetes ### Functions weighted.var <- function(x, w, na.rm = FALSE) { if (na.rm) { w <- w[i <- !is.na(x)] x <- x[i] } sum.w <- sum(w) sum.w2 <- sum(w^2) mean.w <- sum(x * w) / sum(w) (sum.w / (sum.w^2 - sum.w2)) * sum(w * (x - mean.w)^2, na.rm = na.rm) } estBetaParams <- function(mu, var) { alpha <- ((1 - mu) / var - 1 / mu) * mu ^ 2 beta <- alpha * (1 / mu - 1) return(params = list(alpha = alpha, beta = beta)) } ### Setting up data # Way too high HbA1C levels are discarded as outliers RANDOMISED_DATA$HBA1C_AVG[RANDOMISED_DATA$HBA1C_AVG>20 & !is.na(RANDOMISED_DATA$HBA1C_AVG)] <- NA # Lowest HbA1c level taken into account lowest <- 4 ### Gathering data for several estimates RANDOMISED_DATA <- RANDOMISED_DATA[RANDOMISED_DATA$ID %in% RANDOMISED_DATA$ID[grepl("E11",RANDOMISED_DATA$ICD)],] # Process standard deviation sigma_param <- sigma <- sqrt(weighted.mean((RANDOMISED_DATA$HBA1C_SD[ RANDOMISED_DATA$SAMPLING_IN_MONTH>=2 & !is.na(RANDOMISED_DATA$SAMPLING_IN_MONTH)])^2, RANDOMISED_DATA$SAMPLING_IN_MONTH[RANDOMISED_DATA$SAMPLING_IN_MONTH>=2 & !is.na(RANDOMISED_DATA$SAMPLING_IN_MONTH)])) IDLIST <- unique(RANDOMISED_DATA$ID[!is.na(RANDOMISED_DATA$HBA1C_AVG)][ duplicated(RANDOMISED_DATA$ID[!is.na(RANDOMISED_DATA$HBA1C_AVG)])]) IDLIST <- unique(RANDOMISED_DATA$ID[(RANDOMISED_DATA$ID %in% IDLIST) & RANDOMISED_DATA$AGE>39]) shiftdat <- NULL stimedat <- NULL repaidat <- NULL deltats <- NULL deltaATC <- NULL for(i in IDLIST) { smalldat <- RANDOMISED_DATA[RANDOMISED_DATA$ID==i,c("DATE","HBA1C_AVG","THERAPY_VECTOR")] smalldat <- smalldat[!is.na(smalldat$DATE) & !is.na(smalldat$HBA1C_AVG),] patshiftdat <- as.data.frame(cbind(i,smalldat$DATE[2:dim(smalldat)[1]],diff(smalldat$DATE), diff(smalldat$HBA1C_AVG))[diff(smalldat$HBA1C_AVG)>2*sigma,,drop=FALSE]) if(dim(patshiftdat)[1]>1) stimedat <- rbind(stimedat,cbind(i,diff(as.Date(patshiftdat$V2)))) patrepaidat <- as.data.frame(cbind(i,diff(smalldat$DATE),(smalldat$HBA1C_AVG-lowest)[2: length(smalldat$HBA1C_AVG)]/(smalldat$HBA1C_AVG-lowest)[1:(length(smalldat$HBA1C_AVG)-1)], as.character(smalldat$THERAPY_VECTOR[1:(length(smalldat$THERAPY_VECTOR)-1)]))[ (which(diff(smalldat$HBA1C_AVG)<(-2*sigma) & smalldat$HBA1C_AVG[1:(length(smalldat$HBA1C_AVG)-1)]>6 & smalldat$HBA1C_AVG[1:(length(smalldat$HBA1C_AVG)-1)]<=20)),,drop=FALSE]) shiftdat <- rbind(shiftdat,patshiftdat) repaidat <- rbind(repaidat,patrepaidat) deltats <- rbind(deltats,cbind(i,diff(as.Date(RANDOMISED_DATA$DATE[ !is.na(RANDOMISED_DATA$HBA1C_AVG) & RANDOMISED_DATA$ID==i])))) try(deltaATC <- rbind(deltaATC,cbind(i,diff(as.Date(RANDOMISED_DATA$DATE[ !is.na(RANDOMISED_DATA$THERAPY) & RANDOMISED_DATA$ID==i])))), silent=TRUE) } colnames(shiftdat) <- c("ID","TIME","TIMEDIFF","SHIFTSIZE") colnames(deltats) <- c("ID","DeltaT") colnames(deltaATC) <- c("ID","deltaATC") # delta: expected shift size delta_param <- mean(shiftdat$SHIFTSIZE[shiftdat$TIMEDIFF>=90 & shiftdat$TIMEDIFF<184]) # Empirical distribution of elapsed times (between samplings) summary(deltats[,2]) mean(deltats[,2]) median(deltats[,2]) sd(deltats[,2]) # s: expected number of shifts per unit time stimedat <- as.data.frame(stimedat) colnames(stimedat) <- c("ID","TIMEDIFF") s_param <- 1/mean(stimedat$TIMEDIFF[stimedat$TIMEDIFF<367]) # alphas, betas: therapy effectiveness parameters colnames(repaidat) <- c("ID","TIMEDIFF","REMAIN","THERAP") repaidat$REMAIN <- 
as.numeric(as.character(repaidat$REMAIN)) repaidat$TIMEDIFF <- as.numeric(as.character(repaidat$TIMEDIFF)) repaidat$THERAP <- as.character(repaidat$THERAP) repaidat <- repaidat[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184,] repaidat$REMAIN[repaidat$REMAIN<0] <- 0 ther_eff <- as.data.frame(rbind( cbind("ANALOGUE",repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("ANALOGUE",repaidat$THERAP) & !grepl("-H",repaidat$THERAP) & !grepl("GLP",repaidat$THERAP)]), cbind("GLP",repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("GLP",repaidat$THERAP) & !grepl("-H",repaidat$THERAP)]))) ther_eff[,1] <- factor(ther_eff[,1], levels = c("ANALOGUE", "GLP")) ther_eff[,2] <- as.numeric(as.character(ther_eff[,2])) colnames(ther_eff) <- c("Type","Effectiveness") ANALOGUE <- estBetaParams(mean(repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("ANALOGUE",repaidat$THERAP) & !grepl("-H",repaidat$THERAP) & !grepl("GLP",repaidat$THERAP)]), var(repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("ANALOGUE",repaidat$THERAP) & !grepl("-H",repaidat$THERAP) & !grepl("GLP",repaidat$THERAP)])) GLP <- estBetaParams(mean(repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("GLP",repaidat$THERAP) & !grepl("-H",repaidat$THERAP)]), var(repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("GLP",repaidat$THERAP) & !grepl("-H",repaidat$THERAP)])) ### Repair cost HBA1C_fill <- NULL for (i in unique(RANDOMISED_DATA$ID[!is.na(RANDOMISED_DATA$HBA1C_AVG)])) { vec <- RANDOMISED_DATA$HBA1C_AVG[RANDOMISED_DATA$ID==i] vec[which(is.na(vec))[which(is.na(vec))<which(!is.na(vec))[1]]] <- vec[which(!is.na(vec))[1]] vec[which(is.na(vec))[which(is.na(vec))>which(!is.na(vec))[length(which(!is.na(vec)))]]] <- vec[which(!is.na(vec))[length(which(!is.na(vec)))]] vec <- na.approx(vec) HBA1C_fill <- rbind(HBA1C_fill,cbind(i,vec)) smaldat <- RANDOMISED_DATA[RANDOMISED_DATA$ID==i,] smaldat$THERAPY_COST_EUR[smaldat$THERAPY_COST_EUR==0 & smaldat$THERAPY_VECTOR!=""] <- NA if(is.na(smaldat$THERAPY_COST_EUR[1])) smaldat$THERAPY_COST_EUR[1] <- 0 new_burden <- na.locf(smaldat$THERAPY_COST_EUR) seged <- cbind(rle(is.na(smaldat$THERAPY_COST_EUR))[[2]], rle(is.na(smaldat$THERAPY_COST_EUR))[[1]]) seged[,2][seged[,1]==0] <- seged[,2][seged[,1]==0]-1 seged[,2][seged[,1]==1] <- seged[,2][seged[,1]==1]+1 if(seged[length(seged[,1]),1]==0) seged[length(seged[,2]),2] <- seged[length(seged[,2]),2]+1 seged2 <- cbind(rep(seged[,1], seged[,2]),rep(seged[,2], seged[,2])) new_burden[seged2[,1]==1] <- new_burden[seged2[,1]==1]/seged2[,2][seged2[,1]==1] RANDOMISED_DATA$THERAPY_COST_EUR[RANDOMISED_DATA$ID==i] <- new_burden } RANDOMISED_DATA$HBA1C_fill <- NA RANDOMISED_DATA$HBA1C_fill[RANDOMISED_DATA$ID%in%HBA1C_fill[,1]] <- HBA1C_fill[,2] RANDOMISED_DATA$HBA1C_fill_filter <- RANDOMISED_DATA$HBA1C_fill RANDOMISED_DATA$HBA1C_fill_filter[RANDOMISED_DATA$HBA1C_fill_filter>=10] <- NA discparam <- 150 cutHBA1C_AVG <- cut(na.omit(RANDOMISED_DATA$HBA1C_fill_filter),breaks=discparam) newlvls <- seq(min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)), max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)), (max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter))- min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)))/discparam)[1:discparam] + (max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter))- min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)))/discparam/2 levels(cutHBA1C_AVG) <- newlvls costs <- cbind(as.numeric(as.character(cutHBA1C_AVG)), RANDOMISED_DATA$THERAPY_COST_EUR[!is.na( 
RANDOMISED_DATA$HBA1C_fill_filter)]/30, as.character(RANDOMISED_DATA$THERAPY[ !is.na(RANDOMISED_DATA$HBA1C_fill_filter)])) costs <- as.data.frame(costs) colnames(costs) <- c("HBA1C","HC_BURDEN","THERAP") costs$HBA1C <- as.numeric(as.character(costs$HBA1C)) costs$HC_BURDEN <- as.numeric(as.character(costs$HC_BURDEN)) costs$THERAP <- as.character(costs$THERAP) costs$THERAP[grepl("ANALOGUE", costs$THERAP) & !grepl("GLP", costs$THERAP)] <- "ANALOGUE" costs$THERAP[grepl("GLP",costs$THERAP)] <- "GLP" cost.ANALOGUE <- as.data.frame(cbind(sort(unique(costs$HBA1C[costs$THERAP=="ANALOGUE"])), as.numeric(tapply(costs$HC_BURDEN[costs$THERAP=="ANALOGUE"], costs$HBA1C[costs$THERAP=="ANALOGUE"],mean)))) colnames(cost.ANALOGUE) <- c("HBA1C","HC_BURDEN") cost.GLP <- as.data.frame(cbind(sort(unique(costs$HBA1C[costs$THERAP=="GLP"])), as.numeric(tapply(costs$HC_BURDEN[costs$THERAP=="GLP"], costs$HBA1C[costs$THERAP=="GLP"],mean)))) colnames(cost.GLP) <- c("HBA1C","HC_BURDEN") ## ANALOGUE therapy # Mean cost.ANALOGUE <- na.omit(as.data.frame(cbind(as.numeric( costs$HBA1C[costs$THERAP=="ANALOGUE"]), costs$HC_BURDEN[costs$THERAP=="ANALOGUE"]))) colnames(cost.ANALOGUE) <- c("HBA1C","HC_BURDEN") cost.ANALOGUE <- cost.ANALOGUE[order(cost.ANALOGUE$HBA1C),] cost.ANALOGUE <- cost.ANALOGUE[cost.ANALOGUE$HBA1C>lowest,] cost.ANALOGUE$HBA1C <- cost.ANALOGUE$HBA1C-min(lowest) # Fit non-linear model mod.ANALOGUE <- nls(HC_BURDEN ~ a + b/(HBA1C + c), start = list(a = 5, b = -5, c = 1), cost.ANALOGUE, control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_ANALOGUE_plotdat <- cbind("ANALOGUE",as.data.frame(cbind(seq(0,6,6/99), predict(mod.ANALOGUE, newdata=data.frame(HBA1C = seq(0,6,6/99)))))) # Variance cost_var.ANALOGUE <- na.omit(as.data.frame(cbind(sort(unique( costs$HBA1C[costs$THERAP=="ANALOGUE"])), as.numeric(tapply(costs$HC_BURDEN[costs$THERAP=="ANALOGUE"], costs$HBA1C[costs$THERAP=="ANALOGUE"],var))))) colnames(cost_var.ANALOGUE) <- c("HBA1C","HC_BURDEN") cost_var.ANALOGUE <- cost_var.ANALOGUE[cost_var.ANALOGUE$HBA1C>lowest,] cost_var.ANALOGUE$HBA1C <- cost_var.ANALOGUE$HBA1C-min(lowest) # Fit non-linear model mod_var.ANALOGUE <- nls(HC_BURDEN ~ a + b/(HBA1C + c), start = list(a = 5, b = -3, c = 0.1), cost_var.ANALOGUE[cost_var.ANALOGUE$HBA1C<10-lowest,], control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_ANALOGUE_plotdat <- cbind(cost_ANALOGUE_plotdat, cost_ANALOGUE_plotdat[,3] - sqrt(predict(mod_var.ANALOGUE, newdata=data.frame(HBA1C = seq(0,6,6/99)))), cost_ANALOGUE_plotdat[,3] + sqrt(predict(mod_var.ANALOGUE, newdata=data.frame(HBA1C = seq(0,6,6/99))))) colnames(cost_ANALOGUE_plotdat) <- c("Data","HbA1c","Therapy cost","low","high") ## GLP # Mean cost.GLP <- na.omit(as.data.frame(cbind(as.numeric( costs$HBA1C[costs$THERAP=="GLP"]), costs$HC_BURDEN[costs$THERAP=="GLP"]))) colnames(cost.GLP) <- c("HBA1C","HC_BURDEN") cost.GLP <- cost.GLP[order(cost.GLP$HBA1C),] cost.GLP <- cost.GLP[cost.GLP$HBA1C>lowest,] cost.GLP$HBA1C <- cost.GLP$HBA1C-min(lowest) # Simple linear mod.GLP <- nls(HC_BURDEN ~ a + b * HBA1C, start = list(a = 1, b = 1), cost.GLP, control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_GLP_plotdat <- cbind("GLP",as.data.frame(cbind(seq(0,6,6/99), predict(mod.GLP, newdata=data.frame(HBA1C = seq(0,6,6/99)))))) # Variance cost_var.GLP <- na.omit(as.data.frame(cbind(sort(unique( costs$HBA1C[costs$THERAP=="GLP"])), as.numeric(tapply(costs$HC_BURDEN[costs$THERAP=="GLP"], costs$HBA1C[costs$THERAP=="GLP"],var))))) colnames(cost_var.GLP) <- c("HBA1C","HC_BURDEN") 
cost_var.GLP <- cost_var.GLP[cost_var.GLP$HBA1C>lowest,] cost_var.GLP$HBA1C <- cost_var.GLP$HBA1C-min(lowest) # Simple linear mod_var.GLP <- nls(HC_BURDEN ~ a + b*(HBA1C), start = list(a = 5, b = -3), cost_var.GLP, control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_GLP_plotdat <- cbind(cost_GLP_plotdat, cost_GLP_plotdat[,3] - sqrt(predict(mod_var.GLP, newdata=data.frame(HBA1C = seq(0,6,6/99)))), cost_GLP_plotdat[,3] + sqrt(predict(mod_var.GLP, newdata=data.frame(HBA1C = seq(0,6,6/99))))) colnames(cost_GLP_plotdat) <- c("Data","HbA1c","Therapy cost","low","high") ### Out-of-control cost COST_CUMU<-NULL for (i in unique(RANDOMISED_DATA$ID[!is.na(RANDOMISED_DATA$HEALTHCARE_BURDEN_EUR)])) { vec <- RANDOMISED_DATA$HEALTHCARE_BURDEN_EUR[RANDOMISED_DATA$ID==i] vec2 <- rollapply(vec,min(6,length(vec)), sum,align="left",partial=TRUE)/ (pmin(length(vec)-(1:length(vec))+1,6)*30) COST_CUMU <- rbind(COST_CUMU,cbind(i,vec2)) } RANDOMISED_DATA$COST_CUMU <- NA RANDOMISED_DATA$COST_CUMU[RANDOMISED_DATA$ID%in%COST_CUMU[,1]] <- COST_CUMU[,2] discparam <- 150 cutHBA1C_AVG <- cut(na.omit(RANDOMISED_DATA$HBA1C_fill_filter),breaks=discparam) newlvls <- seq(min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)), max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)), (max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter))- min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)))/discparam)[1:discparam] + (max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter))- min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)))/discparam/2 levels(cutHBA1C_AVG) <- newlvls ooc_costs <- cbind(round(as.numeric(as.character(cutHBA1C_AVG)),1), RANDOMISED_DATA$COST_CUMU[!is.na(RANDOMISED_DATA$HBA1C_fill_filter)]) ooc_costs <- as.data.frame(ooc_costs) # Mean disc_ooc_cost <- as.data.frame(cbind(as.numeric(ooc_costs[,1]),ooc_costs[,2])) colnames(disc_ooc_cost) <- c("HBA1C","COST") disc_ooc_cost <- disc_ooc_cost[order(disc_ooc_cost$HBA1C),] disc_ooc_cost <- disc_ooc_cost[disc_ooc_cost$HBA1C >= lowest,] disc_ooc_cost$HBA1C <- disc_ooc_cost$HBA1C - lowest mod.COST <- nls(COST ~ a + c*HBA1C^2, start = list(a = 1, c = 1), disc_ooc_cost) cost_COST_plotdat <- cbind("Complications",as.data.frame(cbind(seq(0, 6, 6/99), predict(mod.COST, newdata=data.frame(HBA1C = seq(0, 6, 6/99)))))) # Variance disc_ooc_cost_var <- as.data.frame(cbind(sort(unique(ooc_costs[,1])), as.numeric(tapply(ooc_costs[,2],ooc_costs[,1],var)))) colnames(disc_ooc_cost_var) <- c("HBA1C","COST") disc_ooc_cost_var <- disc_ooc_cost_var[disc_ooc_cost_var$HBA1C>=lowest,] disc_ooc_cost_var$HBA1C <- disc_ooc_cost_var$HBA1C-lowest mod_var.COST <- nls(COST ~ a + c*HBA1C^2, start = list(a = 0.5, c = 0.5), disc_ooc_cost_var, control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_COST_plotdat <- cbind(cost_COST_plotdat, cost_COST_plotdat[,3] - sqrt(predict(mod_var.COST, newdata=data.frame(HBA1C = seq(0,6,6/99)))), cost_COST_plotdat[,3] + sqrt(predict(mod_var.COST, newdata=data.frame(HBA1C = seq(0,6,6/99))))) colnames(cost_COST_plotdat) <- c("Data","HbA1c","Therapy cost","low","high") cost_plots <- rbind(cost_ANALOGUE_plotdat,cost_GLP_plotdat,cost_COST_plotdat) cost_plots$HbA1c <- cost_plots$HbA1c + lowest cost_plots[,"Therapy cost"] <- cost_plots[,"Therapy cost"]/1 cost_plots[,"low"] <- cost_plots[,"low"]/1 cost_plots[,"low"][cost_plots[,"low"]<0] <- 0 cost_plots[,"high"] <- cost_plots[,"high"]/1 cost_plots <- cost_plots ### Sampling cost: official, government-regulated prices related to sampling ### converted from HUF to EUR sampling_cost=2875/369.05 ### Empirical costs for comparison # GLP 
mean(RANDOMISED_DATA[grepl("GLP", RANDOMISED_DATA$THERAPY),]$THERAPY_COST_EUR)/30 + mean(RANDOMISED_DATA[grepl("GLP", RANDOMISED_DATA$THERAPY),]$COST_CUMU) + sampling_cost/mean(deltats[,2]) sd(RANDOMISED_DATA[grepl("GLP", RANDOMISED_DATA$THERAPY),]$THERAPY_COST_EUR/30 + RANDOMISED_DATA[grepl("GLP", RANDOMISED_DATA$THERAPY),]$COST_CUMU + sampling_cost/mean(deltats[,2])) # ANALOGUE mean(RANDOMISED_DATA[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY),]$THERAPY_COST_EUR)/30 + mean(RANDOMISED_DATA[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY),]$COST_CUMU) + sampling_cost/mean(deltats[,2]) sd(RANDOMISED_DATA[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY),]$THERAPY_COST_EUR/30 + RANDOMISED_DATA[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY),]$COST_CUMU + sampling_cost/mean(deltats[,2])) ### Empirical HbA1c distribution # GLP empi.GLP <- RANDOMISED_DATA$HBA1C_fill[grepl("GLP", RANDOMISED_DATA$THERAPY) & RANDOMISED_DATA$HBA1C_fill>=4 & RANDOMISED_DATA$HBA1C_fill<=20] cutHBA1C <- cut(na.omit(empi.GLP),breaks=100) newlvls <- seq(min(na.omit(empi.GLP)),max(na.omit(empi.GLP)), (max(na.omit(empi.GLP))-min(na.omit(empi.GLP)))/100)[1:100] + (max(na.omit(empi.GLP))-min(na.omit(empi.GLP)))/100/2 levels(cutHBA1C) <- newlvls empi.GLP <- as.data.frame(table(cutHBA1C)/sum(table(cutHBA1C))) empi.GLP$cutHBA1C <- as.numeric(as.character(empi.GLP$cutHBA1C)) empi.GLP <- cbind("GLP", empi.GLP) colnames(empi.GLP) <- c("Therapy", "HbA1c", "Probability") # ANALOGUE empi.ANALOGUE <- RANDOMISED_DATA$HBA1C_fill[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY) & RANDOMISED_DATA$HBA1C_fill>=4 & RANDOMISED_DATA$HBA1C_fill<=20] cutHBA1C <- cut(na.omit(empi.ANALOGUE),breaks=100) newlvls <- seq(min(na.omit(empi.ANALOGUE)), max(na.omit(empi.ANALOGUE)), (max(na.omit(empi.ANALOGUE))- min(na.omit(empi.ANALOGUE)))/100)[1:100] + (max(na.omit(empi.ANALOGUE))- min(na.omit(empi.ANALOGUE)))/100/2 levels(cutHBA1C) <- newlvls empi.ANALOGUE <- as.data.frame(table(cutHBA1C)/sum(table(cutHBA1C))) empi.ANALOGUE$cutHBA1C <- as.numeric(as.character(empi.ANALOGUE$cutHBA1C)) empi.ANALOGUE <- cbind("ANALOGUE", empi.ANALOGUE) colnames(empi.ANALOGUE) <- c("Therapy", "HbA1c", "Probability") # Merging datasets hba1c_empir <- rbind(empi.ANALOGUE, empi.GLP) # The list of gathered data and statistics: # sigma_param, s_param, delta_param, ANALOGUE # GLP, mod.ANALOGUE, mod_var.ANALOGUE # mod.GLP, mod_var.GLP, mod.COST, mod_var.COST # cost_plots, sampling_cost, hba1c_empir ## End(Not run)
data("diabetes") str(diabetes) ## Not run: ##### Example of data assessment for control chart use ##### ### Packages require(zoo) require(reshape2) RANDOMISED_DATA <- diabetes ### Functions weighted.var <- function(x, w, na.rm = FALSE) { if (na.rm) { w <- w[i <- !is.na(x)] x <- x[i] } sum.w <- sum(w) sum.w2 <- sum(w^2) mean.w <- sum(x * w) / sum(w) (sum.w / (sum.w^2 - sum.w2)) * sum(w * (x - mean.w)^2, na.rm = na.rm) } estBetaParams <- function(mu, var) { alpha <- ((1 - mu) / var - 1 / mu) * mu ^ 2 beta <- alpha * (1 / mu - 1) return(params = list(alpha = alpha, beta = beta)) } ### Setting up data # Way too high HbA1C levels are discarded as outliers RANDOMISED_DATA$HBA1C_AVG[RANDOMISED_DATA$HBA1C_AVG>20 & !is.na(RANDOMISED_DATA$HBA1C_AVG)] <- NA # Lowest HbA1c level taken into account lowest <- 4 ### Gathering data for several estimates RANDOMISED_DATA <- RANDOMISED_DATA[RANDOMISED_DATA$ID %in% RANDOMISED_DATA$ID[grepl("E11",RANDOMISED_DATA$ICD)],] # Process standard deviation sigma_param <- sigma <- sqrt(weighted.mean((RANDOMISED_DATA$HBA1C_SD[ RANDOMISED_DATA$SAMPLING_IN_MONTH>=2 & !is.na(RANDOMISED_DATA$SAMPLING_IN_MONTH)])^2, RANDOMISED_DATA$SAMPLING_IN_MONTH[RANDOMISED_DATA$SAMPLING_IN_MONTH>=2 & !is.na(RANDOMISED_DATA$SAMPLING_IN_MONTH)])) IDLIST <- unique(RANDOMISED_DATA$ID[!is.na(RANDOMISED_DATA$HBA1C_AVG)][ duplicated(RANDOMISED_DATA$ID[!is.na(RANDOMISED_DATA$HBA1C_AVG)])]) IDLIST <- unique(RANDOMISED_DATA$ID[(RANDOMISED_DATA$ID %in% IDLIST) & RANDOMISED_DATA$AGE>39]) shiftdat <- NULL stimedat <- NULL repaidat <- NULL deltats <- NULL deltaATC <- NULL for(i in IDLIST) { smalldat <- RANDOMISED_DATA[RANDOMISED_DATA$ID==i,c("DATE","HBA1C_AVG","THERAPY_VECTOR")] smalldat <- smalldat[!is.na(smalldat$DATE) & !is.na(smalldat$HBA1C_AVG),] patshiftdat <- as.data.frame(cbind(i,smalldat$DATE[2:dim(smalldat)[1]],diff(smalldat$DATE), diff(smalldat$HBA1C_AVG))[diff(smalldat$HBA1C_AVG)>2*sigma,,drop=FALSE]) if(dim(patshiftdat)[1]>1) stimedat <- rbind(stimedat,cbind(i,diff(as.Date(patshiftdat$V2)))) patrepaidat <- as.data.frame(cbind(i,diff(smalldat$DATE),(smalldat$HBA1C_AVG-lowest)[2: length(smalldat$HBA1C_AVG)]/(smalldat$HBA1C_AVG-lowest)[1:(length(smalldat$HBA1C_AVG)-1)], as.character(smalldat$THERAPY_VECTOR[1:(length(smalldat$THERAPY_VECTOR)-1)]))[ (which(diff(smalldat$HBA1C_AVG)<(-2*sigma) & smalldat$HBA1C_AVG[1:(length(smalldat$HBA1C_AVG)-1)]>6 & smalldat$HBA1C_AVG[1:(length(smalldat$HBA1C_AVG)-1)]<=20)),,drop=FALSE]) shiftdat <- rbind(shiftdat,patshiftdat) repaidat <- rbind(repaidat,patrepaidat) deltats <- rbind(deltats,cbind(i,diff(as.Date(RANDOMISED_DATA$DATE[ !is.na(RANDOMISED_DATA$HBA1C_AVG) & RANDOMISED_DATA$ID==i])))) try(deltaATC <- rbind(deltaATC,cbind(i,diff(as.Date(RANDOMISED_DATA$DATE[ !is.na(RANDOMISED_DATA$THERAPY) & RANDOMISED_DATA$ID==i])))), silent=TRUE) } colnames(shiftdat) <- c("ID","TIME","TIMEDIFF","SHIFTSIZE") colnames(deltats) <- c("ID","DeltaT") colnames(deltaATC) <- c("ID","deltaATC") # delta: expected shift size delta_param <- mean(shiftdat$SHIFTSIZE[shiftdat$TIMEDIFF>=90 & shiftdat$TIMEDIFF<184]) # Empirical distribution of elapsed times (between samplings) summary(deltats[,2]) mean(deltats[,2]) median(deltats[,2]) sd(deltats[,2]) # s: expected number of shifts per unit time stimedat <- as.data.frame(stimedat) colnames(stimedat) <- c("ID","TIMEDIFF") s_param <- 1/mean(stimedat$TIMEDIFF[stimedat$TIMEDIFF<367]) # alphas, betas: therapy effectiveness parameters colnames(repaidat) <- c("ID","TIMEDIFF","REMAIN","THERAP") repaidat$REMAIN <- 
as.numeric(as.character(repaidat$REMAIN)) repaidat$TIMEDIFF <- as.numeric(as.character(repaidat$TIMEDIFF)) repaidat$THERAP <- as.character(repaidat$THERAP) repaidat <- repaidat[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184,] repaidat$REMAIN[repaidat$REMAIN<0] <- 0 ther_eff <- as.data.frame(rbind( cbind("ANALOGUE",repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("ANALOGUE",repaidat$THERAP) & !grepl("-H",repaidat$THERAP) & !grepl("GLP",repaidat$THERAP)]), cbind("GLP",repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("GLP",repaidat$THERAP) & !grepl("-H",repaidat$THERAP)]))) ther_eff[,1] <- factor(ther_eff[,1], levels = c("ANALOGUE", "GLP")) ther_eff[,2] <- as.numeric(as.character(ther_eff[,2])) colnames(ther_eff) <- c("Type","Effectiveness") ANALOGUE <- estBetaParams(mean(repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("ANALOGUE",repaidat$THERAP) & !grepl("-H",repaidat$THERAP) & !grepl("GLP",repaidat$THERAP)]), var(repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("ANALOGUE",repaidat$THERAP) & !grepl("-H",repaidat$THERAP) & !grepl("GLP",repaidat$THERAP)])) GLP <- estBetaParams(mean(repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("GLP",repaidat$THERAP) & !grepl("-H",repaidat$THERAP)]), var(repaidat$REMAIN[repaidat$TIMEDIFF>=90 & repaidat$TIMEDIFF<184 & grepl("GLP",repaidat$THERAP) & !grepl("-H",repaidat$THERAP)])) ### Repair cost HBA1C_fill <- NULL for (i in unique(RANDOMISED_DATA$ID[!is.na(RANDOMISED_DATA$HBA1C_AVG)])) { vec <- RANDOMISED_DATA$HBA1C_AVG[RANDOMISED_DATA$ID==i] vec[which(is.na(vec))[which(is.na(vec))<which(!is.na(vec))[1]]] <- vec[which(!is.na(vec))[1]] vec[which(is.na(vec))[which(is.na(vec))>which(!is.na(vec))[length(which(!is.na(vec)))]]] <- vec[which(!is.na(vec))[length(which(!is.na(vec)))]] vec <- na.approx(vec) HBA1C_fill <- rbind(HBA1C_fill,cbind(i,vec)) smaldat <- RANDOMISED_DATA[RANDOMISED_DATA$ID==i,] smaldat$THERAPY_COST_EUR[smaldat$THERAPY_COST_EUR==0 & smaldat$THERAPY_VECTOR!=""] <- NA if(is.na(smaldat$THERAPY_COST_EUR[1])) smaldat$THERAPY_COST_EUR[1] <- 0 new_burden <- na.locf(smaldat$THERAPY_COST_EUR) seged <- cbind(rle(is.na(smaldat$THERAPY_COST_EUR))[[2]], rle(is.na(smaldat$THERAPY_COST_EUR))[[1]]) seged[,2][seged[,1]==0] <- seged[,2][seged[,1]==0]-1 seged[,2][seged[,1]==1] <- seged[,2][seged[,1]==1]+1 if(seged[length(seged[,1]),1]==0) seged[length(seged[,2]),2] <- seged[length(seged[,2]),2]+1 seged2 <- cbind(rep(seged[,1], seged[,2]),rep(seged[,2], seged[,2])) new_burden[seged2[,1]==1] <- new_burden[seged2[,1]==1]/seged2[,2][seged2[,1]==1] RANDOMISED_DATA$THERAPY_COST_EUR[RANDOMISED_DATA$ID==i] <- new_burden } RANDOMISED_DATA$HBA1C_fill <- NA RANDOMISED_DATA$HBA1C_fill[RANDOMISED_DATA$ID%in%HBA1C_fill[,1]] <- HBA1C_fill[,2] RANDOMISED_DATA$HBA1C_fill_filter <- RANDOMISED_DATA$HBA1C_fill RANDOMISED_DATA$HBA1C_fill_filter[RANDOMISED_DATA$HBA1C_fill_filter>=10] <- NA discparam <- 150 cutHBA1C_AVG <- cut(na.omit(RANDOMISED_DATA$HBA1C_fill_filter),breaks=discparam) newlvls <- seq(min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)), max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)), (max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter))- min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)))/discparam)[1:discparam] + (max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter))- min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)))/discparam/2 levels(cutHBA1C_AVG) <- newlvls costs <- cbind(as.numeric(as.character(cutHBA1C_AVG)), RANDOMISED_DATA$THERAPY_COST_EUR[!is.na( 
RANDOMISED_DATA$HBA1C_fill_filter)]/30, as.character(RANDOMISED_DATA$THERAPY[ !is.na(RANDOMISED_DATA$HBA1C_fill_filter)])) costs <- as.data.frame(costs) colnames(costs) <- c("HBA1C","HC_BURDEN","THERAP") costs$HBA1C <- as.numeric(as.character(costs$HBA1C)) costs$HC_BURDEN <- as.numeric(as.character(costs$HC_BURDEN)) costs$THERAP <- as.character(costs$THERAP) costs$THERAP[grepl("ANALOGUE", costs$THERAP) & !grepl("GLP", costs$THERAP)] <- "ANALOGUE" costs$THERAP[grepl("GLP",costs$THERAP)] <- "GLP" cost.ANALOGUE <- as.data.frame(cbind(sort(unique(costs$HBA1C[costs$THERAP=="ANALOGUE"])), as.numeric(tapply(costs$HC_BURDEN[costs$THERAP=="ANALOGUE"], costs$HBA1C[costs$THERAP=="ANALOGUE"],mean)))) colnames(cost.ANALOGUE) <- c("HBA1C","HC_BURDEN") cost.GLP <- as.data.frame(cbind(sort(unique(costs$HBA1C[costs$THERAP=="GLP"])), as.numeric(tapply(costs$HC_BURDEN[costs$THERAP=="GLP"], costs$HBA1C[costs$THERAP=="GLP"],mean)))) colnames(cost.GLP) <- c("HBA1C","HC_BURDEN") ## ANALOGUE therapy # Mean cost.ANALOGUE <- na.omit(as.data.frame(cbind(as.numeric( costs$HBA1C[costs$THERAP=="ANALOGUE"]), costs$HC_BURDEN[costs$THERAP=="ANALOGUE"]))) colnames(cost.ANALOGUE) <- c("HBA1C","HC_BURDEN") cost.ANALOGUE <- cost.ANALOGUE[order(cost.ANALOGUE$HBA1C),] cost.ANALOGUE <- cost.ANALOGUE[cost.ANALOGUE$HBA1C>lowest,] cost.ANALOGUE$HBA1C <- cost.ANALOGUE$HBA1C-min(lowest) # Fit non-linear model mod.ANALOGUE <- nls(HC_BURDEN ~ a + b/(HBA1C + c), start = list(a = 5, b = -5, c = 1), cost.ANALOGUE, control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_ANALOGUE_plotdat <- cbind("ANALOGUE",as.data.frame(cbind(seq(0,6,6/99), predict(mod.ANALOGUE, newdata=data.frame(HBA1C = seq(0,6,6/99)))))) # Variance cost_var.ANALOGUE <- na.omit(as.data.frame(cbind(sort(unique( costs$HBA1C[costs$THERAP=="ANALOGUE"])), as.numeric(tapply(costs$HC_BURDEN[costs$THERAP=="ANALOGUE"], costs$HBA1C[costs$THERAP=="ANALOGUE"],var))))) colnames(cost_var.ANALOGUE) <- c("HBA1C","HC_BURDEN") cost_var.ANALOGUE <- cost_var.ANALOGUE[cost_var.ANALOGUE$HBA1C>lowest,] cost_var.ANALOGUE$HBA1C <- cost_var.ANALOGUE$HBA1C-min(lowest) # Fit non-linear model mod_var.ANALOGUE <- nls(HC_BURDEN ~ a + b/(HBA1C + c), start = list(a = 5, b = -3, c = 0.1), cost_var.ANALOGUE[cost_var.ANALOGUE$HBA1C<10-lowest,], control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_ANALOGUE_plotdat <- cbind(cost_ANALOGUE_plotdat, cost_ANALOGUE_plotdat[,3] - sqrt(predict(mod_var.ANALOGUE, newdata=data.frame(HBA1C = seq(0,6,6/99)))), cost_ANALOGUE_plotdat[,3] + sqrt(predict(mod_var.ANALOGUE, newdata=data.frame(HBA1C = seq(0,6,6/99))))) colnames(cost_ANALOGUE_plotdat) <- c("Data","HbA1c","Therapy cost","low","high") ## GLP # Mean cost.GLP <- na.omit(as.data.frame(cbind(as.numeric( costs$HBA1C[costs$THERAP=="GLP"]), costs$HC_BURDEN[costs$THERAP=="GLP"]))) colnames(cost.GLP) <- c("HBA1C","HC_BURDEN") cost.GLP <- cost.GLP[order(cost.GLP$HBA1C),] cost.GLP <- cost.GLP[cost.GLP$HBA1C>lowest,] cost.GLP$HBA1C <- cost.GLP$HBA1C-min(lowest) # Simple linear mod.GLP <- nls(HC_BURDEN ~ a + b * HBA1C, start = list(a = 1, b = 1), cost.GLP, control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_GLP_plotdat <- cbind("GLP",as.data.frame(cbind(seq(0,6,6/99), predict(mod.GLP, newdata=data.frame(HBA1C = seq(0,6,6/99)))))) # Variance cost_var.GLP <- na.omit(as.data.frame(cbind(sort(unique( costs$HBA1C[costs$THERAP=="GLP"])), as.numeric(tapply(costs$HC_BURDEN[costs$THERAP=="GLP"], costs$HBA1C[costs$THERAP=="GLP"],var))))) colnames(cost_var.GLP) <- c("HBA1C","HC_BURDEN") 
cost_var.GLP <- cost_var.GLP[cost_var.GLP$HBA1C>lowest,] cost_var.GLP$HBA1C <- cost_var.GLP$HBA1C-min(lowest) # Simple linear mod_var.GLP <- nls(HC_BURDEN ~ a + b*(HBA1C), start = list(a = 5, b = -3), cost_var.GLP, control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_GLP_plotdat <- cbind(cost_GLP_plotdat, cost_GLP_plotdat[,3] - sqrt(predict(mod_var.GLP, newdata=data.frame(HBA1C = seq(0,6,6/99)))), cost_GLP_plotdat[,3] + sqrt(predict(mod_var.GLP, newdata=data.frame(HBA1C = seq(0,6,6/99))))) colnames(cost_GLP_plotdat) <- c("Data","HbA1c","Therapy cost","low","high") ### Out-of-control cost COST_CUMU<-NULL for (i in unique(RANDOMISED_DATA$ID[!is.na(RANDOMISED_DATA$HEALTHCARE_BURDEN_EUR)])) { vec <- RANDOMISED_DATA$HEALTHCARE_BURDEN_EUR[RANDOMISED_DATA$ID==i] vec2 <- rollapply(vec,min(6,length(vec)), sum,align="left",partial=TRUE)/ (pmin(length(vec)-(1:length(vec))+1,6)*30) COST_CUMU <- rbind(COST_CUMU,cbind(i,vec2)) } RANDOMISED_DATA$COST_CUMU <- NA RANDOMISED_DATA$COST_CUMU[RANDOMISED_DATA$ID%in%COST_CUMU[,1]] <- COST_CUMU[,2] discparam <- 150 cutHBA1C_AVG <- cut(na.omit(RANDOMISED_DATA$HBA1C_fill_filter),breaks=discparam) newlvls <- seq(min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)), max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)), (max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter))- min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)))/discparam)[1:discparam] + (max(na.omit(RANDOMISED_DATA$HBA1C_fill_filter))- min(na.omit(RANDOMISED_DATA$HBA1C_fill_filter)))/discparam/2 levels(cutHBA1C_AVG) <- newlvls ooc_costs <- cbind(round(as.numeric(as.character(cutHBA1C_AVG)),1), RANDOMISED_DATA$COST_CUMU[!is.na(RANDOMISED_DATA$HBA1C_fill_filter)]) ooc_costs <- as.data.frame(ooc_costs) # Mean disc_ooc_cost <- as.data.frame(cbind(as.numeric(ooc_costs[,1]),ooc_costs[,2])) colnames(disc_ooc_cost) <- c("HBA1C","COST") disc_ooc_cost <- disc_ooc_cost[order(disc_ooc_cost$HBA1C),] disc_ooc_cost <- disc_ooc_cost[disc_ooc_cost$HBA1C >= lowest,] disc_ooc_cost$HBA1C <- disc_ooc_cost$HBA1C - lowest mod.COST <- nls(COST ~ a + c*HBA1C^2, start = list(a = 1, c = 1), disc_ooc_cost) cost_COST_plotdat <- cbind("Complications",as.data.frame(cbind(seq(0, 6, 6/99), predict(mod.COST, newdata=data.frame(HBA1C = seq(0, 6, 6/99)))))) # Variance disc_ooc_cost_var <- as.data.frame(cbind(sort(unique(ooc_costs[,1])), as.numeric(tapply(ooc_costs[,2],ooc_costs[,1],var)))) colnames(disc_ooc_cost_var) <- c("HBA1C","COST") disc_ooc_cost_var <- disc_ooc_cost_var[disc_ooc_cost_var$HBA1C>=lowest,] disc_ooc_cost_var$HBA1C <- disc_ooc_cost_var$HBA1C-lowest mod_var.COST <- nls(COST ~ a + c*HBA1C^2, start = list(a = 0.5, c = 0.5), disc_ooc_cost_var, control = list(maxiter = 50000, minFactor = 0.000000000000001)) cost_COST_plotdat <- cbind(cost_COST_plotdat, cost_COST_plotdat[,3] - sqrt(predict(mod_var.COST, newdata=data.frame(HBA1C = seq(0,6,6/99)))), cost_COST_plotdat[,3] + sqrt(predict(mod_var.COST, newdata=data.frame(HBA1C = seq(0,6,6/99))))) colnames(cost_COST_plotdat) <- c("Data","HbA1c","Therapy cost","low","high") cost_plots <- rbind(cost_ANALOGUE_plotdat,cost_GLP_plotdat,cost_COST_plotdat) cost_plots$HbA1c <- cost_plots$HbA1c + lowest cost_plots[,"Therapy cost"] <- cost_plots[,"Therapy cost"]/1 cost_plots[,"low"] <- cost_plots[,"low"]/1 cost_plots[,"low"][cost_plots[,"low"]<0] <- 0 cost_plots[,"high"] <- cost_plots[,"high"]/1 cost_plots <- cost_plots ### Sampling cost: official, government-regulated prices related to sampling ### converted from HUF to EUR sampling_cost=2875/369.05 ### Empirical costs for comparison # GLP 
mean(RANDOMISED_DATA[grepl("GLP", RANDOMISED_DATA$THERAPY),]$THERAPY_COST_EUR)/30 + mean(RANDOMISED_DATA[grepl("GLP", RANDOMISED_DATA$THERAPY),]$COST_CUMU) + sampling_cost/mean(deltats[,2]) sd(RANDOMISED_DATA[grepl("GLP", RANDOMISED_DATA$THERAPY),]$THERAPY_COST_EUR/30 + RANDOMISED_DATA[grepl("GLP", RANDOMISED_DATA$THERAPY),]$COST_CUMU + sampling_cost/mean(deltats[,2])) # ANALOGUE mean(RANDOMISED_DATA[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY),]$THERAPY_COST_EUR)/30 + mean(RANDOMISED_DATA[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY),]$COST_CUMU) + sampling_cost/mean(deltats[,2]) sd(RANDOMISED_DATA[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY),]$THERAPY_COST_EUR/30 + RANDOMISED_DATA[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY),]$COST_CUMU + sampling_cost/mean(deltats[,2])) ### Empirical HbA1c distribution # GLP empi.GLP <- RANDOMISED_DATA$HBA1C_fill[grepl("GLP", RANDOMISED_DATA$THERAPY) & RANDOMISED_DATA$HBA1C_fill>=4 & RANDOMISED_DATA$HBA1C_fill<=20] cutHBA1C <- cut(na.omit(empi.GLP),breaks=100) newlvls <- seq(min(na.omit(empi.GLP)),max(na.omit(empi.GLP)), (max(na.omit(empi.GLP))-min(na.omit(empi.GLP)))/100)[1:100] + (max(na.omit(empi.GLP))-min(na.omit(empi.GLP)))/100/2 levels(cutHBA1C) <- newlvls empi.GLP <- as.data.frame(table(cutHBA1C)/sum(table(cutHBA1C))) empi.GLP$cutHBA1C <- as.numeric(as.character(empi.GLP$cutHBA1C)) empi.GLP <- cbind("GLP", empi.GLP) colnames(empi.GLP) <- c("Therapy", "HbA1c", "Probability") # ANALOGUE empi.ANALOGUE <- RANDOMISED_DATA$HBA1C_fill[grepl("ANALOGUE", RANDOMISED_DATA$THERAPY) & !grepl("GLP", RANDOMISED_DATA$THERAPY) & RANDOMISED_DATA$HBA1C_fill>=4 & RANDOMISED_DATA$HBA1C_fill<=20] cutHBA1C <- cut(na.omit(empi.ANALOGUE),breaks=100) newlvls <- seq(min(na.omit(empi.ANALOGUE)), max(na.omit(empi.ANALOGUE)), (max(na.omit(empi.ANALOGUE))- min(na.omit(empi.ANALOGUE)))/100)[1:100] + (max(na.omit(empi.ANALOGUE))- min(na.omit(empi.ANALOGUE)))/100/2 levels(cutHBA1C) <- newlvls empi.ANALOGUE <- as.data.frame(table(cutHBA1C)/sum(table(cutHBA1C))) empi.ANALOGUE$cutHBA1C <- as.numeric(as.character(empi.ANALOGUE$cutHBA1C)) empi.ANALOGUE <- cbind("ANALOGUE", empi.ANALOGUE) colnames(empi.ANALOGUE) <- c("Therapy", "HbA1c", "Probability") # Merging datasets hba1c_empir <- rbind(empi.ANALOGUE, empi.GLP) # The list of gathered data and statistics: # sigma_param, s_param, delta_param, ANALOGUE # GLP, mod.ANALOGUE, mod_var.ANALOGUE # mod.GLP, mod_var.GLP, mod.COST, mod_var.COST # cost_plots, sampling_cost, hba1c_empir ## End(Not run)
A data frame containing aggregated low-density-lipoprotein (LDL) patient data gathered from various sources.
data("LDL")
data("LDL")
A data frame with 1 observation on the following 12 variables.
target_value
Target LDL value, set according to the European guideline for patients at risk (Garmendia et al., 2000).
standard_deviation
Process standard deviation. Estimated using real-life data from Hungary, namely registry data through Healthware Consulting Ltd. (Budapest, Hungary).
expected_shift_size
Expected shift size. Derived from an average increase in LDL of around 0.8 per year, distributed over the expected number of shifts per year. Estimated with the help of a health care professional from Healthware Consulting Ltd.
num_exp_daily_shifts
Expected number of shifts per day; on average around 3 shifts per year are expected. Estimated with the help of a health professional from Healthware Consulting Ltd.
rep_size_first
First repair size distribution parameter. Estimated using an international study which included Hungary (Garmendia et al., 2000).
rep_size_second
Second repair size distribution parameter.
samp_prob_first
First sampling probability parameter. Patient non-compliance with LDL-controlling medication is quite high, and this is represented through the parametrisation of the logistic function (Lardizabal and Deedwania, 2010).
samp_prob_second
Second sampling probability parameter.
sampling_cost
Sampling cost in Euro. Estimated using the official LDL testing cost and visit cost in Hungary.
OOC_cost
Out-of-control operation (health care event) cost in Euro. Estimated using real-world data on cardiovascular event costs from Hungary.
base_rep_cost
Base repair (treatment) cost in Euro. Estimated using the simvastatin therapy costs in Hungary.
prop_rep_cost
Shift-proportional (illness-proportional) repair cost in Euro. Estimated using the simvastatin therapy costs in Hungary.
It is very difficult to give a good estimate of the type and parameters of a distribution that properly models non-compliance (the sampling probability), thus the data here can at best be regarded as a close approximation of a real-life situation. This is not a limiting factor, as patients themselves can differ vastly in their behaviour, so the evaluation of different scenarios is often required. Since high LDL levels rarely produce noticeable symptoms, the sampling probability should only depend on the time between samplings (Institute for Quality and Efficiency in Health Care, 2017). Thus, the sampling probability parameters assume the use of a logistic function, and not a beta distribution, in the Markovstat function. It is important to note that the proportional costs are usually assumed to increase according to a Taguchi-type loss function (squared loss), thus huge expenses can be generated if the patient's health is highly out-of-control. For cost calculations, the 21 March 2021 EUR-HUF exchange rate was used (1 EUR = 369.05 HUF).
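As an illustration of the assumed mechanism, the sampling probability can be sketched as a logistic function of the time elapsed since the last sampling. The functional form below (the standard logistic sigmoid with steepness q and midpoint z) is an assumption for illustration, not code from the package:

# Assumed logistic form of the sampling probability.
samp_prob <- function(t, q, z) 1 / (1 + exp(-q * (t - z)))
data("LDL")
curve(samp_prob(x, q = LDL$samp_prob_first, z = LDL$samp_prob_second),
      from = 0, to = 150,
      xlab = "Days since last sampling", ylab = "Sampling probability")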
Dobi, B. and Zempléni, A. Markov chain-based cost-optimal control charts for health care data. Quality and Reliability Engineering International, 2019;35(5):1379–1395. https://doi.org/10.1002/qre.2518
Boekholdt SM, Arsenault BJ, Mora S, et al. Association of LDL cholesterol, non-HDL cholesterol, and apolipoprotein B levels with risk of cardiovascular events among patients treated with statins: a meta-analysis. J Am Med Assoc. 2012;307(12):1302-1309. https://doi.org/10.1001/jama.2012.366
Garmendia F, Brown AS, Reiber I, Adams PC. Attaining United States and European guideline LDL-cholesterol levels with simvastatin in patients with coronary heart disease (the GOALLS study). Curr Med Res Opin. 2000;16(3):208-219. PMID: 11191012.
Lardizabal JA, Deedwania PC. Benefits of statin therapy and compliance in high risk cardiovascular patients. Vasc Health Risk Manag. 2010;6:843-853. https://doi.org/10.2147/VHRM.S9474
High cholesterol: Overview. Institute for Quality and Efficiency in Health Care. https://www.ncbi.nlm.nih.gov/books/NBK279318/ [10 September 2021] Bookshelf ID: NBK279318.
#Print data data("LDL") LDL ### # Run analysis from Dobi & Zempleni 2019. # Note that the code below is generated with updated HUF-EUR rate and # a more accurate R implementation than in the original paper. stat_LDL_cost <- Markovstat( shiftfun = 'exp', h = 50, k = 3.15-LDL$target_value, sigma = LDL$standard_deviation, s = LDL$num_exp_daily_shifts, delta = LDL$expected_shift_size, RanRep = TRUE, alpha = LDL$rep_size_first, beta = LDL$rep_size_second, RanSam = TRUE, q = LDL$samp_prob_first, z = LDL$samp_prob_second, Vd = 100, V = 6-LDL$target_value) #Defining parallel_opt parallel settings. #parallel_opt can also be left empty to be defined automatically by the function. require(parallel) num_workers <- min(c(detectCores(),2)) parall <- list(cl=makeCluster(num_workers), forward=FALSE, loginfo=TRUE) res_LDL_cost <- Markovchart( statdist = stat_LDL_cost, OPTIM = TRUE, p = 1, cs = LDL$sampling_cost, cf = LDL$base_rep_cost, coparams = c(0,LDL$OOC_cost), crparams = c(LDL$base_rep_cost,LDL$prop_rep_cost), parallel_opt = parall) num_workers <- min(c(detectCores(),2)) parall <- list(cl=makeCluster(num_workers), forward=FALSE, loginfo=TRUE) res_LDL_cost_grid <- Markovchart( statdist = stat_LDL_cost, h=seq(50,75,2.5), k=seq(0.05,0.25,0.02), p = 1, cs = LDL$sampling_cost, cf = LDL$base_rep_cost, coparams = c(0,LDL$OOC_cost), crparams = c(LDL$base_rep_cost,LDL$prop_rep_cost), parallel_opt = parall) require(ggplot2) plot(res_LDL_cost_grid, y = 'Expected \ndaily cost', mid = '#ff9494', high = '#800000', xlab = 'Days between samplings', ylab = 'Critical LDL increase') + geom_point(aes(x = res_LDL_cost$Parameters[[1]], y = res_LDL_cost$Parameters[[2]]))
#Print data data("LDL") LDL ### # Run analysis from Dobi & Zempleni 2019. # Note that the code below is generated with updated HUF-EUR rate and # a more accurate R implementation than in the original paper. stat_LDL_cost <- Markovstat( shiftfun = 'exp', h = 50, k = 3.15-LDL$target_value, sigma = LDL$standard_deviation, s = LDL$num_exp_daily_shifts, delta = LDL$expected_shift_size, RanRep = TRUE, alpha = LDL$rep_size_first, beta = LDL$rep_size_second, RanSam = TRUE, q = LDL$samp_prob_first, z = LDL$samp_prob_second, Vd = 100, V = 6-LDL$target_value) #Defining parallel_opt parallel settings. #parallel_opt can also be left empty to be defined automatically by the function. require(parallel) num_workers <- min(c(detectCores(),2)) parall <- list(cl=makeCluster(num_workers), forward=FALSE, loginfo=TRUE) res_LDL_cost <- Markovchart( statdist = stat_LDL_cost, OPTIM = TRUE, p = 1, cs = LDL$sampling_cost, cf = LDL$base_rep_cost, coparams = c(0,LDL$OOC_cost), crparams = c(LDL$base_rep_cost,LDL$prop_rep_cost), parallel_opt = parall) num_workers <- min(c(detectCores(),2)) parall <- list(cl=makeCluster(num_workers), forward=FALSE, loginfo=TRUE) res_LDL_cost_grid <- Markovchart( statdist = stat_LDL_cost, h=seq(50,75,2.5), k=seq(0.05,0.25,0.02), p = 1, cs = LDL$sampling_cost, cf = LDL$base_rep_cost, coparams = c(0,LDL$OOC_cost), crparams = c(LDL$base_rep_cost,LDL$prop_rep_cost), parallel_opt = parall) require(ggplot2) plot(res_LDL_cost_grid, y = 'Expected \ndaily cost', mid = '#ff9494', high = '#800000', xlab = 'Days between samplings', ylab = 'Critical LDL increase') + geom_point(aes(x = res_LDL_cost$Parameters[[1]], y = res_LDL_cost$Parameters[[2]]))
Wrapper for Markov chain-based cost-optimal control charts. Includes cost calculation methods for different shift size distributions and optimisation with respect to the average cost and the cost standard deviation, where the free parameters are the sampling interval (h) and the control limit/critical value (k).
Markovchart(statdist, h = NULL, k = NULL, OPTIM = FALSE, p = 1,
            constantr = FALSE, ooc_rep = 0, cs = NULL,
            cofun = cofun_default, coparams = NULL,
            crfun = crfun_default, crparams = NULL, cf = crparams,
            vcofun = vcofun_default, vcoparams = c(0, 0),
            vcrfun = vcrfun_default, vcrparams = c(0, 0),
            method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG",
                       "SANN", "Brent"),
            parallel_opt = NULL, silent = TRUE, ...)
statdist
The stationary distribution of the Markov chain. Must be an object of class Markov_stationary, as created by the Markovstat function.
h
The time between samplings. Must be a positive value; can be a numeric vector. For optimisation, this is the initial value. Inherited from statdist if not given.
k
The control limit (critical value). Must be a positive value; can be a numeric vector. For optimisation, this is the initial value. Only one-sided shifts are allowed, thus there is only one control limit. Inherited from statdist if not given.
OPTIM
Logical. Should the resulting G-value be minimised by optimising h and k? Default is FALSE.
p
The weight of the cost expectation in the calculation of the G-value; should be between 0 and 1.
constantr
Logical. Should the repair cost be assumed to occur constantly over time (TRUE) rather than only at alarm signals (FALSE)? Default is FALSE. See "Details".
ooc_rep
Numeric value between 0 and 1. The percentage of the repair cost occurring during out-of-control operation. Default is 0. If a value greater than 0 is set, then that proportion of the repair cost is incurred during out-of-control operation even without an alarm signal.
cs
Sampling cost per sampling.
cofun
A function describing the relationship between the distance from the target value and the resulting out-of-control costs. The default is calculated using a base and a distance-scaling out-of-control parameter. See "Details".
coparams
Numeric vector. Parameters of cofun.
crfun
A function describing the relationship between the distance from the target value and the resulting repair costs. The default function assumes a linear relationship between the repair cost and the distance, and uses a base and a distance-scaling repair cost parameter. See "Details".
crparams
Numeric vector. Parameters of crfun.
cf
Numeric. The false alarm cost. Only relevant when constantr=TRUE, as the repair cost is then no longer tied to the alarm itself. Default is crparams.
vcofun
A function describing the relationship between the distance from the target value and the resulting out-of-control cost variance. For the default function see "Details".
vcoparams
Numeric vector. Parameters of vcofun.
vcrfun
A function describing the relationship between the distance from the target value and the resulting repair cost variance. For the default function see "Details".
vcrparams
Numeric vector. Parameters of vcrfun.
method
Method used for optimisation. Same as the method argument of optim.
parallel_opt
A list of parallel options. See e.g. the parallel argument of the optimParallel function.
silent
Should the call be returned? Default is TRUE.
...
Further arguments to be passed down to the optimisation (e.g. lower and upper, as in the Examples).
The constantr parameter switches between different repair assumptions. In traditional control chart theory, a repair cost occurs only in case of an alarm signal. This is represented by constantr=FALSE, which is the default: the repair is a momentary cost, occurring at the time of the sampling. However, this model is inappropriate in several healthcare settings; for example, there are chronic diseases that require constant medication (repair in the sense of the model). In this approach (constantr=TRUE) the repair cost still depends on the state of the process during sampling, but occurs even if there is no alarm, and is divided by h to represent constant repair through the whole sampling interval. The repair cost should thus be given in a way that corresponds to the model used.
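As a sketch of the difference (reusing the stat_exp2 setup from the Examples section below; the cost values are illustrative only):

# Same chart evaluated under the two repair-cost assumptions.
res_alarm <- Markovchart(statdist = stat_exp2, p = 0.9, cs = 1,
                         coparams = c(10, 3), crparams = c(1, 2),
                         constantr = FALSE) # repair cost only at alarm signals
res_const <- Markovchart(statdist = stat_exp2, p = 0.9, cs = 1,
                         coparams = c(10, 3), crparams = c(1, 2),
                         constantr = TRUE)  # repair cost divided by h, incurred in every interval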
The default cofun calculates the out-of-control (OOC) cost using a base and a distance-scaling OOC parameter:

c_o = c_ob + c_os · δ²

where c_o is the total OOC cost, c_ob is the base OOC cost (incurred even without a shift), c_os is the shift-scaling cost and δ² is the squared distance from the target value. This latter term is defined this way because a Taguchi-type loss function is used. The δ² term incorporates the distances (the basis of the losses) incurred not just at the time of the sampling, but also between samplings (hence it depends on h). Even if the user defines a custom OOC cost function, this δ² term must be included, as a closed-form solution has been developed for the calculation of the squared distances in the case of exponential shifts, considerably decreasing run times. Thus the arguments of the OOC cost function should be of the form function(δ², further parameters contained in a vector). δ² is fed to the cost function as a vector (sqmudist in the code below), so the function should be vectorised with respect to this argument. The default function is:
cofun_default <- function(sqmudist, coparams) {
  cob <- coparams[1]  # base OOC cost
  cos <- coparams[2]  # distance-scaling OOC cost
  co  <- cob + cos * sqmudist
  return(co)
}
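Any custom replacement must keep this signature (the vector of squared distances first, the parameter vector second) and remain vectorised in its first argument. A hypothetical sketch with a gentler, absolute-distance loss:

# Hypothetical custom OOC cost function; same signature as cofun_default.
# sqrt(sqmudist) recovers the absolute distance from the target value.
cofun_abs <- function(sqmudist, coparams) {
  coparams[1] + coparams[2] * sqrt(sqmudist)
}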
The default vcofun also uses a Taguchi-type loss function and has the same structure and requirements as cofun. The final cost standard deviation is calculated using the law of total variance. The default vcofun is:
vcofun_default <- function(sqmudist, vcoparams) {
  vcob <- vcoparams[1]  # base OOC cost variance
  vcos <- vcoparams[2]  # distance-scaling variance parameter
  vco  <- vcob + vcos * sqmudist
  return(vco)
}
The defaults for the repair cost and the repair cost variance are simple linear functions. For crfun it is

c_r = c_rb + c_rs · δ

where the notation is the same as before and the subscript r denotes repair. A custom function can be defined more freely here, but the first argument should be δ (the distance from the target value) and the second a vector of further parameters. The default functions are:
crfun_default <- function(mudist, crparams) {
  crb <- crparams[1]  # base repair cost
  crs <- crparams[2]  # distance-scaling repair cost
  cr  <- crb + crs * mudist
  return(cr)
}
vcrfun_default <- function(mudist, vcrparams) {
  vcrb <- vcrparams[1]  # base repair cost variance
  vcrs <- vcrparams[2]  # distance-scaling variance parameter
  vcr  <- vcrb + vcrs * mudist
  return(vcr)
}
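For reference, the law of total variance mentioned above combines the state-wise cost means and variances over the stationary distribution. A standalone sketch (all names here are hypothetical, not package API):

# Var(C) = E[Var(C | state)] + Var(E[C | state]), weighted by the
# stationary distribution of the chain.
total_cost_var <- function(statd, mean_by_state, var_by_state) {
  mu <- sum(statd * mean_by_state)        # overall cost expectation
  sum(statd * var_by_state) +             # E[Var(C | state)]
    sum(statd * (mean_by_state - mu)^2)   # Var(E[C | state])
}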
The value of the function depends on the parameters. If either h or k has length greater than 1, then the G-value (the weighted average of the cost expectation and the cost standard deviation, G = p·E(cost) + (1 − p)·SD(cost)) is calculated for all given parameter combinations, without optimisation. The value of the function in this case is a data frame of class Markov_grid with length(h)*length(k) rows and three columns: h, k and the G-value.
If h and k are both of length 1 (they may be inherited from statdist), then the value of the function is a Markov_chart object: a list of length 4 that details the properties of the control chart setup.
Results
Vector of the G-value, the expected cost and the cost standard deviation.
Subcosts
Vector of sub-costs that are parts of the total expected cost.
Parameters
A vector that contains the time between samplings (h) and the control limit (k).
Stationary_distribution
The stationary distribution of the Markov chain. Further information about the stationary distribution can be calculated with the Markovstat function.
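A short usage sketch (res1 as created in the Examples section below; component names as listed above):

res1$Results                  # G-value, expected cost, cost standard deviation
res1$Subcosts                 # breakdown of the total expected cost
res1$Parameters               # h and k of the evaluated chart
res1$Stationary_distribution  # stationary distribution of the Markov chain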
Balazs Dobi and Andras Zempleni
Zempleni A, Veber M, Duarte B and Saraiva P. (2004) Control charts: a cost-optimization approach for processes with random shifts. Applied Stochastic Models in Business and Industry, 20(3), 185-200.
Dobi B and Zempleni A. (2019) Markov chain-based cost-optimal control charts for health care data. Quality and Reliability Engineering International, 35(5), 1379-1395.
Dobi B and Zempleni A. (2019) Markov chain-based cost-optimal control charts with different shift size distributions. Annales Univ. Sci. Budapest., Sect. Comp., 49, 129-146.
# Defining parallel_opt parallel settings.
# parallel_opt can also be left empty to be defined automatically by the function.
require(parallel)
num_workers <- min(c(detectCores(), 2))
parall <- list(cl = makeCluster(num_workers), forward = FALSE, loginfo = TRUE)

# Fixed shift size (essentially Duncan's cycle model) - no optimisation.
stat_deg <- Markovstat(shiftfun = "deg", h = 1, k = 1, sigma = 1, s = 0.2,
                       delta = 2.5)
res1 <- Markovchart(statdist = stat_deg, cs = 1, crparams = 20, coparams = 50)
res1

# Fixed shift size (essentially Duncan's cycle model) - with optimisation.
res2 <- Markovchart(statdist = stat_deg, OPTIM = TRUE, cs = 1, crparams = 20,
                    coparams = 50, lower = c(0.01, 0.01), upper = c(5, 5),
                    parallel_opt = parall)
res2

# Exponential shift - no optimisation - default cost functions.
stat_exp <- Markovstat(shiftfun = "exp", h = 0.5, k = 2, sigma = 1, s = 0.2,
                       delta = 2, RanRep = FALSE, Vd = 30, V = 18)
res3 <- Markovchart(stat_exp, p = 0.9, cs = 1, coparams = c(10, 3),
                    crparams = c(1, 2))
res3

# Exponential shift - with optimisation - default cost functions.
stat_exp2 <- Markovstat(shiftfun = "exp", h = 1, k = 1, sigma = 1, s = 0.2,
                        delta = 2, RanRep = TRUE, alpha = 1, beta = 3,
                        Vd = 30, V = 18)
parall <- list(cl = makeCluster(num_workers), forward = FALSE, loginfo = TRUE)
res4 <- Markovchart(statdist = stat_exp2, OPTIM = TRUE, p = 0.9, cs = 1,
                    coparams = c(10, 3), crparams = c(1, 2),
                    vcoparams = c(8, 1.5), vcrparams = c(5, 2),
                    parallel_opt = parall)
res4

# Exponential-geometric mixture shift - no optimisation -
# random sampling - custom repair variance function.
stat_expgeo <- Markovstat(shiftfun = "exp-geo", h = 1.5, k = 2, sigma = 1,
                          s = 0.2, delta = 1.2, probmix = 0.7, probnbin = 0.8,
                          disj = 2, RanRep = TRUE, alpha = 1, beta = 3,
                          RanSam = TRUE, StateDep = TRUE, a = 1, b = 15,
                          Vd = 100, V = 8)
# Custom repair cost variance function
vcrfun_new <- function(mudist, vcrparams) {
  vcrb  <- vcrparams[1]
  vcrs  <- vcrparams[2]
  vcrs2 <- vcrparams[3]
  vcr <- vcrb + vcrs / (mudist + vcrs2)
  return(vcr)
}
res5 <- Markovchart(statdist = stat_expgeo, p = 0.9, cs = 1,
                    coparams = c(10, 6), crparams = c(20, 3),
                    vcoparams = c(10000, 100), vcrfun = vcrfun_new,
                    vcrparams = c(50000, -600000, 1.5))
res5

# Exponential shift - no optimisation - vectorised.
parall <- list(cl = makeCluster(num_workers), forward = FALSE, loginfo = TRUE)
Gmtx <- Markovchart(statdist = stat_exp2,
                    h = seq(1, 10, by = (10 - 1) / 5),
                    k = seq(0.1, 5, by = (5 - 0.1) / 5),
                    p = 0.9, cs = 1, coparams = c(10, 3), crparams = c(1, 2),
                    vcoparams = c(8, 1.5), vcrparams = c(5, 2), V = 18,
                    parallel_opt = parall)
Gmtx
Wrapper for simulation of processes with a Markov chain-based control chart setup. Includes methods for different shift size distributions.
Markovsim(shiftfun = c("exp", "exp-geo"), num = 100, h, k, sigma, s, delta,
          probmix = 0, probnbin = 0.5, disj = 1, RanRep = FALSE, alpha = NULL,
          beta = NULL, RanSam = FALSE, StateDep = FALSE, a = NULL, b = NULL,
          q = NULL, z = NULL, detail = 100, Vd = 50, V, burnin = 1)
shiftfun
A string defining the shift size distribution to be used. Must be either "exp" or "exp-geo".
num
Integer. The number of sampling intervals simulated; the total time elapsed in the simulation is num*h.
h
The time between samplings. Must be a positive value.
k
The control limit (critical value). Must be a positive value. Only one-sided shifts are allowed, thus there is only one control limit.
sigma
Process standard deviation (the distribution is normal).
s
Expected number of shifts in a unit time interval.
delta
Expected shift size.
probmix
The weight of the geometric distribution in the case of an exponential-geometric mixture shift distribution; should be between 0 and 1.
probnbin
The probability parameter of the geometric distribution in the case of an exponential-geometric mixture shift distribution; should be between 0 and 1.
disj
The size of a discrete jump in the case of an exponential-geometric mixture shift distribution; must be a positive number.
RanRep
Logical. Should the repair be random? Default is FALSE (no).
alpha
First shape parameter for the random repair beta distribution.
beta
Second shape parameter for the random repair beta distribution.
RanSam
Logical. Should the sampling be random? Default is FALSE (no).
StateDep
Logical. Should the sampling probability also depend on the distance from the target value (state dependency)? If TRUE, a beta distribution is used for the sampling probability; if FALSE, a logistic function. (Both curves are illustrated in the sketch after this argument list.)
a
First shape parameter for the random sampling time beta distribution.
b
Second shape parameter for the random sampling time beta distribution.
q
The steepness of the curve of the random sampling time logistic function.
z
The midpoint of the random sampling time logistic function (the logistic sigmoid's midpoint).
detail
The detail of the simulation, i.e. how many data points (including the moment of the sampling itself) should be simulated within a unit time. Should be a positive integer greater than 1, and the user should take the length of the sampling interval h into account when setting it.
Vd
Integer discretisation parameter: the number of states after the equidistant discretisation of the state space. Should be an integer value greater than 2. This parameter is needed to calculate a stationary distribution that can be compared to the results of the Markovstat function.
V
Numeric discretisation parameter: the maximum (positive) distance from the target value taken into account. This parameter is needed to calculate a stationary distribution that can be compared to the results of the Markovstat function.
burnin
Numeric burn-in parameter: the number of samplings deemed as a burn-in period. Should be an integer greater than or equal to one.
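The StateDep argument switches between two sampling-probability shapes. The sketch below plots illustrative versions of both curves; the functional forms are an assumption based only on the argument descriptions above (beta shape parameters a and b when StateDep=TRUE, logistic steepness q and midpoint z otherwise), not the package's internal code.

#Illustrative sketch only: assumed shapes of the sampling-probability curves,
#based on the argument descriptions above (not the package's internal code).
d <- seq(0, 1, length.out=200)          #distance from target, scaled to [0,1]
a <- 0.1; b <- 1                        #beta shape parameters (StateDep=TRUE)
q <- 10; z <- 0.5                       #logistic steepness and midpoint (StateDep=FALSE)
p_beta  <- pbeta(d, shape1=a, shape2=b) #beta-distribution-based probability
p_logis <- 1/(1 + exp(-q*(d - z)))      #logistic sampling probability
plot(d, p_beta, type="l", ylim=c(0,1),
     xlab="Scaled distance from target", ylab="Sampling probability")
lines(d, p_logis, lty=2)
legend("bottomright", c("beta (StateDep=TRUE)", "logistic (StateDep=FALSE)"), lty=1:2)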
The simulation only includes the more complicated process and control chart cases, and is meant for model checking and for situations where the exact calculation is problematic (such as low probabilities in the stationary distribution leading to rounding errors). A minimal comparison sketch is given after the value list below.
A Markov_sim object, which is a list of length 4:
Value_at_samplings
The process value at each sampling.
Sampling_event
The event at each sampling; each can be a success (sampling without an alarm), an alarm (sampling with an alarm) or a failure (no sampling occurred).
Simulation_data
The simulated data (distances from the target value).
Stationary_distribution
The stationary distribution of the Markov chain, created by discretising the simulated data. See the documentation of the Markovstat function.
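The value list above suggests a direct model check: run Markovsim and Markovstat with matching parameters and discretisation, then compare the simulated stationary distribution with the exact one. A minimal sketch, with parameter values borrowed from the examples on this page and the assumption that Vd and V are set identically in both calls:

#Minimal model-checking sketch: exact versus simulated stationary distribution.
#Assumes matching discretisation (Vd, V) in Markovstat and Markovsim.
stat_chk <- Markovstat(shiftfun="exp", h=1, k=1, sigma=1, s=0.2, delta=2,
                       RanRep=TRUE, alpha=1, beta=3, RanSam=TRUE,
                       StateDep=TRUE, a=0.1, b=1, Vd=50, V=10)
sim_chk  <- Markovsim(shiftfun="exp", num=2000, h=1, k=1, sigma=1, s=0.2,
                      delta=2, RanRep=TRUE, alpha=1, beta=3, RanSam=TRUE,
                      StateDep=TRUE, a=0.1, b=1, Vd=50, V=10)
stat_chk$Stationary_distribution   #exact stationary distribution
sim_chk$Stationary_distribution    #empirical counterpart from the simulation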
Balazs Dobi and Andras Zempleni
Zempleni A, Veber M, Duarte B and Saraiva P. (2004) Control charts: a cost-optimization approach for processes with random shifts. Applied Stochastic Models in Business and Industry, 20(3), 185-200.
Dobi B and Zempleni A. (2019) Markov chain-based cost-optimal control charts for health care data. Quality and Reliability Engineering International, 35(5), 1379-1395.
Dobi B and Zempleni A. (2019) Markov chain-based cost-optimal control charts with different shift size distributions. Annales Univ. Sci. Budapest., Sect. Comp., 49, 129-146.
#Simulation using exponential shifts, random repair and random sampling.
simres1 <- Markovsim(shiftfun="exp", num=500, h=1, k=1, sigma=1, s=0.2, delta=2,
                     RanRep=TRUE, alpha=1, beta=3, RanSam=TRUE, StateDep=TRUE,
                     a=0.1, b=1, V=10)
simres1
hist(simres1[[1]], 20, freq=FALSE)

#Simulation using exponential-geometric mixture shifts, random repair and random sampling.
simres2 <- Markovsim(shiftfun="exp-geo", num=500, h=1, k=1, sigma=1, s=0.2, delta=2,
                     probmix=0.9, probnbin=0.6, RanRep=TRUE, alpha=1, beta=3,
                     RanSam=TRUE, StateDep=TRUE, a=0.1, b=1, V=10)
simres2
hist(simres2[[1]], 20, freq=FALSE)
Calculates the stationary distribution of a process described by a discrete state, discrete time Markov chain. The process is described by a degradation-repair cycle type model. The user must give parameters describing both the degradation and the repair. The process is not repaired until the problem is discovered by sampling, hence the control chart setup. The same, single element is monitored (i.e. the sample size is always 1).
Markovstat(shiftfun = c("exp", "exp-geo", "deg"), h, k, sigma, s, delta,
           probmix = 0, probnbin = 0.5, disj = 1, RanRep = FALSE, alpha = NULL,
           beta = NULL, RanSam = FALSE, StateDep = FALSE, a = NULL, b = NULL,
           q = NULL, z = NULL, Vd = 100, V, Qparam = 30)
shiftfun
A string defining the shift size distribution to be used. Must be either "exp", "exp-geo" or "deg".
h
The time between samplings. Must be a positive value.
k
The control limit (critical value). Must be a positive value. Only one-sided shifts are allowed, thus there is only one control limit.
sigma
Process standard deviation (the distribution is assumed to be normal).
s
Expected number of shifts in a unit time interval.
delta
Expected shift size. Used as the parameter (expected value) of the exponential distribution when shiftfun is "exp" or "exp-geo"; for shiftfun="deg" it gives the size of the fixed shift.
probmix
The weight of the geometric distribution in the case of an exponential-geometric mixture shift distribution; should be between 0 and 1. (A hedged sampling sketch of this mixture is given after this argument list.)
probnbin
The probability parameter of the geometric distribution in the case of an exponential-geometric mixture shift distribution; should be between 0 and 1.
disj
The size of a discrete jump in the case of an exponential-geometric mixture shift distribution; must be a positive number.
RanRep
Logical. Should the repair be random? Default is FALSE (no).
alpha
First shape parameter for the random repair beta distribution.
beta
Second shape parameter for the random repair beta distribution.
RanSam
Logical. Should the sampling be random? Default is FALSE (no).
StateDep
Logical. Should the sampling probability also depend on the distance from the target value (state dependency)? If TRUE, a beta distribution is used for the sampling probability; if FALSE, a logistic function.
a
First shape parameter for the random sampling time beta distribution.
b
Second shape parameter for the random sampling time beta distribution.
q
The steepness of the curve of the random sampling time logistic function.
z
The midpoint of the random sampling time logistic function (the logistic sigmoid's midpoint).
Vd
Integer discretisation parameter: the number of states in the equidistant discretisation of the state space. Should be an integer value greater than 2.
V
Numeric discretisation parameter: the maximum (positive) distance from the target value taken into account.
Qparam
Integer discretisation parameter: the maximum number of events taken into account within a sampling interval.
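The exponential-geometric mixture can be hard to picture from the parameter descriptions alone. The sketch below, referenced in the probmix entry above, draws shift sizes from one plausible reading of the mixture: with probability probmix the shift is a discrete jump of size disj times a geometric count, otherwise it is exponential with mean delta. This is an illustrative assumption, not the package's exact parameterisation.

#Hypothetical sampler for the exponential-geometric mixture shift size.
#Assumption: geometric jumps of size disj with weight probmix, exponential
#(mean delta) otherwise; not the package's exact parameterisation.
rshift_mix <- function(n, delta, probmix, probnbin, disj) {
  use_geo <- runif(n) < probmix              #choose the mixture component
  jumps   <- disj*(rgeom(n, probnbin) + 1)   #discrete jumps: disj, 2*disj, ...
  expo    <- rexp(n, rate=1/delta)           #continuous exponential shifts
  ifelse(use_geo, jumps, expo)
}
hist(rshift_mix(10000, delta=1.2, probmix=0.7, probnbin=0.8, disj=2), 40)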
The function returns a list object of class Markov_stationary. The list is of length 3:
Stationary_distribution
Stationary distribution of the Markov chain. The probabilities in the stationary distribution are labeled. If shiftfun is "deg", the stationary distribution consists of only four states (in-control, out-of-control, false alarm and true alarm).
Transition_matrix
The transition matrix of the Markov chain. Not printed. (See the sanity-check sketch after this list.)
Param_list
Parameters given to the function and various technical results used by the Markovchart function.
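Two quick sanity checks follow from the structure of the returned object: the stationary distribution must sum to one, and each row of the transition matrix must sum to one. A minimal sketch, assuming res2 from the examples below and that the transition matrix is stored as a base R matrix:

#Sanity checks on a Markov_stationary object (res2 from the examples below).
#Assumes the transition matrix is a base R matrix.
sum(res2$Stationary_distribution)      #a probability vector, should sum to 1
range(rowSums(res2$Transition_matrix)) #each row of a stochastic matrix sums to 1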
Balazs Dobi and Andras Zempleni
Zempleni A, Veber M, Duarte B and Saraiva P. (2004) Control charts: a cost-optimization approach for processes with random shifts. Applied Stochastic Models in Business and Industry, 20(3), 185-200.
Dobi B and Zempleni A. (2019) Markov chain-based cost-optimal control charts for health care data. Quality and Reliability Engineering International, 35(5), 1379-1395.
Dobi B and Zempleni A. (2019) Markov chain-based cost-optimal control charts with different shift size distributions. Annales Univ. Sci. Budapest., Sect. Comp., 49, 129-146.
#Fixed shift size (essentially Duncan's cycle model).
res1 <- Markovstat(shiftfun="deg", h=1, k=1, sigma=1, s=0.2, delta=2.5)
res1

#Exponential shift - perfect repair - deterministic sampling.
res2 <- Markovstat(shiftfun="exp", h=1, k=1, sigma=1, s=0.2, delta=2, Vd=30, V=18)
res2
#Notice how the In-control and the False-alarm states have non-zero probabilities.
#If the repair were random (RanRep=TRUE), these states would have zero probability.

#Exponential-geometric mixture shift - random repair - random sampling.
res3 <- Markovstat(shiftfun='exp-geo', h=1.5, k=2, sigma=1, s=0.2, delta=1.2,
                   probmix=0.7, probnbin=0.8, disj=2, RanRep=TRUE, alpha=1,
                   beta=3, RanSam=TRUE, StateDep=TRUE, a=1, b=15, Vd=40, V=8)
res3
Plotting Markov_grid control chart results.
Convenience function for plotting G-values in a contour plot as a function of the time between samplings and the critical value.
## S3 method for class 'Markov_grid'
plot(x, y = expression(atop(italic("G")*-value~per, unit~time)),
     xlab = "Time between samplings", ylab = "Critical value",
     low = "white", mid = "#999999", high = "black",
     colour = "white", nbreaks = 16, ...)
x
A Markov_grid data frame containing G-values computed over a grid of times between samplings (h) and critical values (k), as produced by Markovchart.
y
The label of the G-value scale.
xlab
A title for the x axis.
ylab
A title for the y axis.
low
Colour for the low end of the gradient.
mid
Colour for the midpoint of the gradient.
high
Colour for the high end of the gradient.
colour
Colour of the contour lines.
nbreaks
Number of contour breaks.
...
Further arguments to be passed down to the underlying plotting functions.
A plot object of class gg and ggplot, produced using the ggplot2 package.
The plot itself is made using the ggplot2 package by Hadley Wickham et al. The text on the contour lines is added with the geom_text_contour function from the metR package by Elio Campitelli.
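Because the returned object is an ordinary ggplot, further ggplot2 layers can be added to it after the fact. A minimal sketch, assuming Gmtx has been computed as in the example below:

#Customising the returned plot with standard ggplot2 layers.
#Assumes Gmtx has been computed as in the example below.
library(ggplot2)
p <- plot(Gmtx)
p + ggtitle("G-value surface") + theme_minimal()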
Balazs Dobi and Andras Zempleni
Zempleni A, Veber M, Duarte B and Saraiva P. (2004) Control charts: a cost-optimization approach for processes with random shifts. Applied Stochastic Models in Business and Industry, 20(3), 185-200.
Dobi B and Zempleni A. (2019) Markov chain-based cost-optimal control charts for health care data. Quality and Reliability Engineering International, 35(5), 1379-1395.
Dobi B and Zempleni A. (2019) Markov chain-based cost-optimal control charts with different shift size distributions. Annales Univ. Sci. Budapest., Sect. Comp., 49, 129-146.
#Defining parallel_opt parallel settings.
#parallel_opt can also be left empty to be defined automatically by the function.
require(parallel)
num_workers <- min(c(detectCores(),2))

#Exponential shift - default cost functions.
stat_exp <- Markovstat(shiftfun="exp", h=1, k=1, sigma=1, s=0.2, delta=2,
                       RanRep=TRUE, alpha=1, beta=3, Vd=30, V=18)
parall <- list(cl=makeCluster(num_workers), forward=FALSE, loginfo=TRUE)
Gmtx <- Markovchart(statdist=stat_exp, h=seq(1,10,by=(10-1)/5),
                    k=seq(0.1,5,by=(5-0.1)/5), p=0.9, cs=1,
                    coparams=c(10,3), crparams=c(1,2),
                    vcoparams=c(8,1.5), vcrparams=c(5,2),
                    V=18, parallel_opt=parall)
plot(Gmtx)