Title: | Customer Analytics Data Formatting |
---|---|
Description: | Converts customer transaction data (ID, purchase date) into an R6 class called Customer. The class stores various customer analytics calculations at the customer level. The package also contains functionality to convert data in the R6 class to data frames that can serve as inputs for various customer analytics models. |
Authors: | Ludwig Steven [aut, cre] |
Maintainer: | Ludwig Steven |
License: | GPL-3 |
Version: | 0.1 |
Built: | 2024-12-01 08:27:48 UTC |
Source: | CRAN |
Likelihood maximization for the annual halving customer retention model
annualhalfing_LL(grid, dta)
grid |
model parameters |
dta |
dataset |
Annual halving likelihood, used in the optimization routine.
A recency-frequency model used in non-contractual situations. Model assumptions: (1) increasing recency leads to a higher probability of quitting; (2) frequency is related to exponential learning curves. Reference: Segmentation and Lifetime Value Modeling in SAS (Edward Malthouse).
annualhalfingmodel(cadf.data, starting.values)
cadf.data |
cadf-formatted dataset |
starting.values |
parameter starting values for model |
Returns model parameters
library(CADF)
# Last row of each customer's data
dta <- lapply(CADF::cadf.data.sample, function(x) tail(x$data, 1))
dta <- do.call(rbind, dta)
starting.values <- c(.5, .9, .2, -.9)
annualhalfingmodel(cadf.data.sample, starting.values)
Answering machine data
A data frame with 9 rows and two columns
bigT_expand_via_apply
bigT_expand_via_apply(x)
x |
vector containing bigT, cancel and count |
x <- c(3, 1, 5)
bigT_expand_via_apply(x)
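The manual does not spell out the expansion rule, so the following is only a sketch under one reading. It assumes that x = c(bigT, cancel, count) stands for count identical customers observed for bigT periods, with cancel = 1 meaning they cancelled in period bigT. Under that reading, each record becomes one row per customer per period, which is the usual person-period layout for a logistic retention model. The helper expand_bigT() and the column names t and cancel are hypothetical, not the package's.

```r
# Hypothetical helper, assuming x = c(bigT, cancel, count).
expand_bigT <- function(x) {
  bigT   <- x[1]  # periods observed
  cancel <- x[2]  # 1 if these customers cancelled in period bigT, else 0
  count  <- x[3]  # number of customers sharing this history
  one_customer <- data.frame(
    t      = seq_len(bigT),
    cancel = c(rep(0, bigT - 1), cancel)  # cancellation only in the last period
  )
  # Repeat the single-customer history once per customer
  out <- one_customer[rep(seq_len(bigT), times = count), , drop = FALSE]
  rownames(out) <- NULL
  out
}

rows <- expand_bigT(c(3, 1, 5))
nrow(rows)        # 15
sum(rows$cancel)  # 5
```

Under this assumption, c(3, 1, 5) yields 15 rows (5 customers x 3 periods) containing 5 cancellations.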
ca_SRM
ca_SRM(df_logistic)
df_logistic |
data frame containing the data for logistic regression |
customertype1 <- c(3, 1, 5)
customertype2 <- c(12, 0, 3)
cust1 <- bigT_expand_via_apply(customertype1)
cust2 <- bigT_expand_via_apply(customertype2)
df_logistic <- rbind(cust1, cust2)
model <- ca_SRM(df_logistic)
Time-varying simple retention model. Estimates the retention rate using logistic regression and the simple retention model. Mostly used in contractual settings where there are clear opportunities for cancellation. It could be used in non-contractual situations, although the cancellation opportunities must then be defined. Not recommended for services that customers use in a revolving-door fashion; use the migration model there.
ca_SRM_time_varying(df_logistic, reference_level = 12, maxT = 12)
df_logistic |
A data frame formatted for logistic regression, with one row for each customer ID and time period; 1/0 indicates a purchase. |
reference_level |
All coefficients are interpreted relative to the reference level, which defaults to time period 12. (Note that the interpretation changes depending on how T is formulated.) |
maxT |
The number of time periods to build. |
Returns the fitted logistic model (a glm object).
library(stats)
x <- c(3, 1, 5)
df_logistic <- bigT_expand_via_apply(x)
model <- ca_SRM_time_varying(df_logistic, reference_level = 3)
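The sketch below illustrates the idea of a time-varying retention model. It is not a reproduction of ca_SRM_time_varying(). It uses base R glm(): the period enters as a factor, relevel() sets the reference period, and retention in each period is one minus the fitted cancellation probability. The toy data frame and its columns t and cancel are hypothetical.

```r
# Hypothetical person-period data: one row per customer per period,
# cancel = 1 in the period the customer cancelled, 0 otherwise.
df <- data.frame(
  t      = c(1, 2, 3, 1, 2, 1, 2, 3, 1),
  cancel = c(0, 0, 1, 0, 1, 0, 0, 0, 1)
)

# Period as a factor; coefficients are relative to the reference period "3".
df$period <- relevel(factor(df$t), ref = "3")
fit <- glm(cancel ~ period, family = binomial(link = "logit"), data = df)

# Fitted cancellation probability per period, and the implied retention rate.
newdata <- data.frame(period = factor(levels(df$period), levels = levels(df$period)))
p_cancel <- predict(fit, newdata = newdata, type = "response")
retention <- 1 - p_cancel
```

Because the model has one parameter per period, the fitted cancellation probabilities equal the observed cancellation rates: 1/2, 1/4 and 1/3 for periods 3, 1 and 2.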
CADF to purchase-string matrix. Extracts purchase strings from CADF data and formats them as an R matrix.
ca_to_ps_matrix(ca.data, maxT)
ca.data |
Data in the CADF format, generated by the CADF *_to_CADF functions and the Customer class. |
maxT |
Number of columns in the matrix |
Returns a matrix with dimensions C x maxT (number of customers by maxT): one row per customer and maxT columns.
library(CADF)
data("transactions")
customer <- subset(transactions, transactions$ID == 40)
today.study.cutoff <- max(customer$PURCHASE_DATE)
customer.40.CADF <- list(Customer$new(customer, today.study.cutoff))
psmatrix <- customer.40.CADF[[1]]$purchase_string_as_matrix
psmatrix2 <- ca_to_ps_matrix(customer.40.CADF, 15)
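Here is a minimal sketch of turning purchase strings into a C x maxT matrix. It assumes the strings are stored as characters such as "11001". Truncating long strings and padding short ones with NA are assumptions about how unobserved periods should be handled. ps_strings_to_matrix() is a hypothetical helper, not part of CADF.

```r
# Hypothetical helper: character purchase strings -> C x maxT numeric matrix.
ps_strings_to_matrix <- function(ps, maxT) {
  rows <- lapply(strsplit(ps, ""), function(chars) {
    v <- as.numeric(chars)
    length(v) <- maxT  # truncates long strings, pads short ones with NA
    v
  })
  do.call(rbind, rows)  # one row per customer
}

m <- ps_strings_to_matrix(c("11001", "101", "1111111"), maxT = 5)
dim(m)  # 3 5
```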
Converts CADF output to a dataset for the annual halving model
CADF_to_annualhalfing_data(cadf.data)
cadf.data |
CADF dataset |
Converts a CADF dataset to a dataset for Pareto/NBD modeling with the BTYD package
CADF_to_btyd_pareto_nbd(cadf.data)
cadf.data |
CADF-formatted dataset |
Convert a CADF dataset to a dataset for logistic regression
CADF_to_logistic_regression(CADF)
CADF |
CADF-formatted dataset |
Builds the transition matrix for a migration model. maxT is the maximum time cutoff, which defaults to 5. The output is a transition matrix.
CADF_to_migration_model(cadf.data, maxT = 5)
cadf.data |
Data in R list format processed by CADF functions |
maxT |
Times greater than maxT are collapsed into a single + category. |
tmatrix <- CADF_to_migration_model(cadf.data.sample)
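The sketch below illustrates a recency-based migration matrix in base R. It rests on three assumptions:

- each purchase vector starts with the first purchase;
- recency is 1 in a purchase period and grows by 1 for each period without a purchase;
- recency values above maxT are collapsed into a single "maxT+" state, following the maxT description above.

The helpers recency_from_ps() and migration_matrix() are hypothetical.

```r
# Hypothetical: recency per period for one 0/1 purchase vector.
# Assumes the vector starts with the first purchase (recency 1).
recency_from_ps <- function(ps) {
  r <- integer(length(ps))
  for (i in seq_along(ps)) {
    r[i] <- if (i == 1 || ps[i] == 1) 1L else r[i - 1] + 1L
  }
  r
}

# Hypothetical: row-normalized transition matrix between recency states,
# with recency above maxT collapsed into a single "maxT+" state.
migration_matrix <- function(ps_list, maxT = 5) {
  lv <- c(as.character(seq_len(maxT)), paste0(maxT, "+"))
  from <- character(0)
  to <- character(0)
  for (ps in ps_list) {
    r <- recency_from_ps(ps)
    state <- ifelse(r > maxT, paste0(maxT, "+"), as.character(r))
    n <- length(state)
    if (n < 2) next  # no transition observed for this customer
    from <- c(from, state[-n])
    to <- c(to, state[-1])
  }
  counts <- table(from = factor(from, levels = lv), to = factor(to, levels = lv))
  prop.table(counts, margin = 1)
}

tm <- migration_matrix(list(c(1, 1, 0, 0, 1), c(1, 0, 0, 0, 0, 0, 0)), maxT = 3)
```

A state that never occurs as a "from" state gives a row of NaN, because prop.table() divides by a zero row total.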
CADF_to_nth_purchase
CADF_to_nth_purchase(cadf.data, n)
cadf.data |
Data in R list format processed by CADF functions |
n |
the nth purchase you want to analyze |
CADF_to_nth_purchase_allrows takes CADF data and the number n of the purchase whose results you want to count.
CADF_to_nth_purchase_allrows(cadf.data, n)
cadf.data |
Data in R list format processed by CADF functions |
n |
the nth purchase |
CADF-formatted sample data
List with 2,185 customers, in CADF format
Function called during Customer$new() (the Customer R6 class) to create purchase string for the customer.
create.purchase.string(x, id.column, date.column, return.mode = "")
x |
Transactional data associated with customer id. |
id.column |
Name of the customer ID column (for example, "ID"). |
date.column |
Name of the purchase date column (for example, "PURCHASE_DATE"). |
return.mode |
Set to "matrix" to return the result as a matrix. |
A purchase string of 0s and 1s, returned as a character string.
library(CADF)
data("transactions")
customer <- subset(transactions, transactions$ID == 5)
create.purchase.string(customer, "ID", "PURCHASE_DATE")
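To show what a 0/1 purchase string encodes, here is a sketch that builds one from purchase dates by calendar month. Period 1 is the month of the first purchase, and the last period is the month containing today. The month bucketing and the helper purchase_string_from_dates() are assumptions for illustration; create.purchase.string() may differ in detail.

```r
# Hypothetical helper: 0/1 purchase string by calendar month.
purchase_string_from_dates <- function(dates, today = max(dates)) {
  dates <- as.Date(dates)
  # Index each date by year * 12 + month so differences count calendar months
  month_index <- function(d) {
    lt <- as.POSIXlt(d)
    (lt$year + 1900) * 12 + lt$mon
  }
  first <- month_index(min(dates))
  n_periods <- month_index(as.Date(today)) - first + 1
  offsets <- unique(month_index(dates) - first) + 1  # 1 = first-purchase month
  ps <- integer(n_periods)
  ps[offsets[offsets <= n_periods]] <- 1L  # purchases after `today` are ignored
  paste(ps, collapse = "")
}

purchase_string_from_dates(as.Date(c("2023-01-15", "2023-01-20", "2023-02-03", "2023-05-30")))
# "11001"
```

Two purchases in the same month still produce a single 1. This matches the YYYY_MM basis that the Customer class uses for purchase strings.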
Tracks cumulative recency
create.recency.string(x)
x |
vector of zeros and ones |
head(cadf.data.sample)
The Customer R6 class.
Call Customer$new() to convert transactional data to CADF format
output
Stores all information in R format at the customer level.
payload
Stores all computed customer information in JSON format for integration into other systems. This is not an API, but it is designed so that customer information can be imported into other formats and systems.
data
A data frame (df_customer) that stores purchase information for a single customer. It is the input data for various calculations in initialize().
id
The customer id. This will be the same ID as provided in the input transaction file.
study_name
A name to associate with the cohort study. The name can be whatever is easiest to associate with the set of customer IDs and dates included in the analysis.
study_begin_date
Begin date of the customer study. In theory this should be min(TRANSACTION_DATE) for each customer in the dataset.
timing
Monthly timing computes T as months. Most commonly utilized and is the default.
transaction_dates
All transaction dates for the customer
transaction_months
All YYYY_MM transaction dates for the customer
first_purchase_date
First purchase date for the customer.
last_purchase_date
Last purchase date for the customer.
repeat_customer
A customer is a repeat customer if both of the following are true: the customer has more than one transaction, and the second transaction date is later than the first transaction date.
repeat_customer_by_day
description
today
The study cutoff date.
T
A measure of time between the first date of activity and purchase.
T_ss
T_ss
transaction_range_complete
A consecutive sequence, usually beginning at 1.
purchase_count
purchase count
purchase_string
description
purchase_string_as_matrix
purchase string as matrix
recency_string_as_matrix
recency string as matrix
Freq
frequency count
logistic_modeling_matrix
Stores customer's logistic modeling matrix. (One row for each time period (T), 1 = purchase; 0 = no purchase)
logistic_modeling_matrix_ss
logistic_modeling_matrix_ss
logistic_modeling_matrix_custom
logistic_modeling_matrix_custom
survival_modeling_matrix
Stores customer's modeling matrix for survival analysis. For survival analysis '1' means that the customer has stopped being a customer. '0' means that the customer is continuing to be a customer.
survival_modeling_matrix_ss
survival_modeling_matrix_ss
survival_modeling_matrix_custom
survival_modeling_matrix_custom
repeat_customer
This can be used to filter out repeat customers from an analysis. Repeat status is based on YYYY_MM: a customer with only two purchases, both in January, is not a repeat customer. (repeat_customer_by_day applies the same idea by day instead of by YYYY_MM.)
Purchase strings: purchase_string uses the 'create.purchase.string' function to create a purchase string, with "1" if a purchase was made during the period and "0" otherwise. No special rules are applied, so the purchase string reflects the true purchase history. Inputs: df_customer (the data frame for a single customer), the ID column, and the purchase date column.
T
T is a cancellation time. CADF offers different ways to estimate the cancellation time:
strict_quitter: the customer leaves after the first period of inactivity. For the purchase string 11001, T = 3.
strict_stayer: T is the last period with a transaction in the purchase string. For 11001, T = 5.
As T becomes longer, strict_quitter tends to underestimate retention and strict_stayer tends to overestimate it. If you know your customers come and go at will, you can use a migration model or choose a T between strict quitter and strict stayer.
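Here is a minimal sketch of the two rules, operating on purchase strings stored as characters. The helper names are hypothetical. The text above does not say what happens when a strict quitter never has an inactive period, so returning the full string length in that case is an assumption.

```r
# Hypothetical helper: split a purchase string such as "11001" into 0/1 integers
ps_to_vector <- function(ps) as.integer(strsplit(ps, "")[[1]])

# Strict quitter: T is the first period without a purchase.
T_strict_quitter <- function(ps) {
  v <- ps_to_vector(ps)
  first_gap <- which(v == 0)[1]
  # Assumption: with no inactive period, T is the full length of the string
  if (is.na(first_gap)) length(v) else first_gap
}

# Strict stayer: T is the last period with a purchase.
T_strict_stayer <- function(ps) {
  v <- ps_to_vector(ps)
  max(which(v == 1))
}

T_strict_quitter("11001")  # 3
T_strict_stayer("11001")   # 5
```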
T_ss T_ss
T_custom
T_custom
logistic_modeling_matrix
Stores rows for the customer that contribute to a logistic modeling matrix. Assumes strict, permanent cancellation: the customer relationship starts at time 1 and ends at time N, with permanent cancellation and no pauses in between. This is usually known as a contractual relationship.
logistic_modeling_matrix_ss
Uses the strict stayer assumption.
logistic_modeling_matrix_custom
survival_modeling_matrix
Stores rows for the customer that contribute to a survival modeling matrix.
Cleanup and data storage: empties the working df_customer data frame and places the result in the class under the name 'data'.
new()
Creates a CADF profile for a given customer based on the input transactional data, usually an R list.
Customer$new(df_customer = NA, today = NA)
df_customer
Transactional data frame for a single customer.
today
The study cutoff date.
A new 'Customer' object: the customer's transactional data converted to CADF format (elements can be accessed as cadf[[1]], etc.). It represents the data for a particular customer ID in the "CADF" format. Working columns and fields:
df_customer$Tdays: days from first purchase.
df_customer$month_yr: purchase date converted to YYYY_MM format.
df_customer$Tmonths: number of months between the purchase date and the first purchase date, rounded up to the nearest month.
id: the customer ID that identifies the customer in the CADF class.
transaction_dates: all unique transaction dates for the customer.
transaction_months: all unique YYYY_MM combinations for the customer's transactions; used for building purchase strings.
clone()
The objects of this class are cloneable with this method.
Customer$clone(deep = FALSE)
deep
Whether to make a deep clone.
library(CADF)
data("transactions")
customer <- subset(transactions, transactions$ID == 40)
today.study.cutoff <- max(customer$PURCHASE_DATE)
customer.40.CADF <- Customer$new(customer, today.study.cutoff)
'f_CustomerModelingMatrix' takes a cancellation time as its input.
f_CustomerModelingMatrix(cancellation_time)
cancellation_time |
cancellation time |
f_CustomerModelingMatrix(10)
'f_CustomerSurvivalModelingMatrix' takes the cancellation time T as its input.
f_CustomerSurvivalModelingMatrix(cancellation_time)
cancellation_time |
cancellation time |
f_CustomerSurvivalModelingMatrix(10)
Compute the months between two purchase dates
f_intMonths(a, b)
a |
starting date |
b |
ending date |
Purchase string to frequency count
frequency_from_ps(x)
x |
rle object |
RLE object to frequency count
frequency_from_rle(x)
x |
rle object |
# Run-length encode a 0/1 purchase vector, then count purchases
x <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0)
x.rle <- rle(x)
frequency_from_rle(x.rle)
Gamma gamma spend model data
data frame with 2,357 rows and 6 columns
generate_date_template
generate_date_template()
dates <- generate_date_template()
The input to 'id_to_CADF' comes from an lapply() operation on a split customer dataset. If a is the split customer dataset, then a$'1' is the customer with ID 1.
id_to_CADF(data, today.study.cutoff)
data |
Transactional Data for one customerid |
today.study.cutoff |
Cutoff date that separates analysis data from holdout data. |
LD functions are utilized for learning and diagnostic use.
ld_sample_customer_matrix(numCustomers, maxT, purchaseAtT0 = TRUE)
numCustomers |
number of customers to simulate |
maxT |
number of timeperiods |
purchaseAtT0 |
If TRUE (the default), sets the first column of the matrix to 1. |
LTV transactions data
data frame with 53,998 rows and 4 columns
Likelihood function for annual halfing model
modeling.annualhalfing.likelihood(grid2, rec, freq, targetBuy)
grid2 |
Modeling parameters |
rec |
recency |
freq |
frequency |
targetBuy |
indicator if purchase was made in holdout period |
LL function for the gamma gamma spend model
modeling.LL.gamma_spend(p, q, gamma, y = data)
p |
p |
q |
q |
gamma |
gamma |
y |
data |
Probability density function (PDF) for the gamma distribution
pdf_gamma(x, r, a)
x |
between 0 and 1 for pdf |
r |
shape parameter |
a |
scale parameter |
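For reference, with shape r and scale a the gamma density is f(x) = x^(r-1) e^(-x/a) / (Gamma(r) a^r) for x > 0. The sketch below writes that formula out directly and checks it against base R's dgamma(). gamma_pdf_sketch() is a hypothetical name. Treating a as a scale rather than a rate follows the parameter description above; some customer-analytics texts use a rate instead.

```r
# Gamma density with shape r and scale a, written out explicitly:
# f(x) = x^(r - 1) * exp(-x / a) / (gamma(r) * a^r), for x > 0
gamma_pdf_sketch <- function(x, r, a) {
  x^(r - 1) * exp(-x / a) / (gamma(r) * a^r)
}

x <- c(0.5, 1, 2.5)
all.equal(gamma_pdf_sketch(x, r = 2, a = 3), dgamma(x, shape = 2, scale = 3))  # TRUE
```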
Probability density function for gamma distribution
pdf_gamma2(x, shape, scale)
x |
x |
shape |
shape parameter |
scale |
scale parameter |
The glossary for the CADF data format
## S3 method for class 'glossary' print()
Calculates T from a purchase string using a custom number of allowed skips.
ps_to_T_custom(ps, skips = 2)
ps |
Purchase string. |
skips |
Number of non-purchase periods during which the customer is still considered a customer. |
The numeric value for T.
Calculates T from a purchase string
ps_to_T_strict_quitter(ps)
ps |
Purchase string. |
The numeric value for T.
Calculates T from a purchase string under the "strict stayer" assumption.
ps_to_T_strict_stayer(ps)
ps |
Purchase string. |
The numeric value for T, which is the position of the last 1 in the purchase string
psmatrix_to_psstring
psmatrix_to_psstring(psmatrix)
psmatrix |
purchase string of 1's and 0's in matrix format |
cadf.data.sample[[4]]$purchase_string_as_matrix
Accepts a psmatrix and converts its 1/0 purchase strings to a recency-at-time-of matrix.
psmatrix_to_recency_attimeof_matrix(psmatrix)
psmatrix |
a psmatrix |
The customer analytics data format (CADF) relies heavily on correct input data. Transactional data must: (1) be a data frame with two columns; (2) have the customer ID in column 1; (3) have the transaction date in column 2, formatted as a Date object in R.
qc_transactional_data(x)
x |
R dataframe representing .. |
A number indicating whether the data passes the checks.
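Here is a sketch of the three documented checks, returning 1 for pass and 0 for fail. The helper qc_transactional_sketch() is hypothetical. This manual does not say whether qc_transactional_data() returns exactly these values.

```r
# Hypothetical check mirroring the three documented requirements.
qc_transactional_sketch <- function(x) {
  ok <- is.data.frame(x) &&
    ncol(x) == 2 &&
    inherits(x[[2]], "Date")  # column 2 must be an R Date
  as.integer(ok)  # 1 = passes, 0 = fails
}

good <- data.frame(
  ID = c(1, 1, 2),
  PURCHASE_DATE = as.Date(c("2023-01-05", "2023-02-11", "2023-01-20"))
)
bad <- data.frame(ID = 1, PURCHASE_DATE = "2023-01-05")  # date stored as text

qc_transactional_sketch(good)  # 1
qc_transactional_sketch(bad)   # 0
```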
Segmentation and LTV data
A data frame with 53998 rows and 4 columns
Function used for simulation and scenario planning
simple_migration(num.customers, pct.buy.buy, pct.nobuy.buy, n.periods)
num.customers |
Number of customers for the simulation. |
pct.buy.buy |
percentage of buyers that buy again in the next period |
pct.nobuy.buy |
percentage of non buyers that convert over to buyers |
n.periods |
number of periods |
simple_migration(200, .80, .20, 12)
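One way to read the simulation is as a two-state (buyer/non-buyer) Markov chain tracked in expected counts. Next period's buyers are the current buyers times pct.buy.buy plus the current non-buyers times pct.nobuy.buy. The sketch below assumes every customer buys in period 1 and returns expected counts, not random draws. simple_migration_sketch() is hypothetical.

```r
# Hypothetical: expected buyers/non-buyers per period in a two-state migration model.
simple_migration_sketch <- function(num.customers, pct.buy.buy, pct.nobuy.buy, n.periods) {
  buyers <- numeric(n.periods)
  buyers[1] <- num.customers  # assumption: everyone buys in period 1
  for (t in seq_len(n.periods)[-1]) {
    nonbuyers <- num.customers - buyers[t - 1]
    buyers[t] <- buyers[t - 1] * pct.buy.buy + nonbuyers * pct.nobuy.buy
  }
  data.frame(period = seq_len(n.periods), buyers = buyers,
             nonbuyers = num.customers - buyers)
}

sim <- simple_migration_sketch(200, .80, .20, 12)
head(sim$buyers, 3)  # 200 160 136
```

With these inputs the buyer count approaches 200 x 0.20 / (1 - 0.80 + 0.20) = 100, the steady state of the chain.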
Create a CADF dataset from a dataframe
## S3 method for class 'transaction.file_to_CADF' split(data, today.study.cutoff)
data |
data frame for a single customer id |
today.study.cutoff |
separate analysis and holdout data |
Simple retention model data
A data frame with 5828 rows and two columns
Time period
Whether or not there was a cancellation in the time period
...
Stockmarket put/call data
A data frame with 770 rows and 20 columns
Transaction data
A data frame with 67,944 rows and 4 columns
Customer ID
Purchase date
Number of items purchased
Total transaction amount
...
Calculates transitions between two time periods
transitions(timeperiod0, timeperiod1, buyvar = "Y", nobuyvar = "N")
timeperiod0 |
Column representing the 'from' side of the transition probability |
timeperiod1 |
Column representing the 'to' side of the transition probability |
buyvar |
field value that represents a buy, defaults to Y |
nobuyvar |
field value that represents not buy, defaults to N |
A 2 x 2 transition matrix.
timeperiod0 <- c("Y", "Y", "Y", "Y", "Y")
timeperiod1 <- c("N", "Y", "N", "Y", "N")
transitions(timeperiod0, timeperiod1)
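Here is a sketch of the same 2 x 2 tabulation using base R table() and prop.table(). It assumes the result should be row proportions, that is, the probability of moving from each 'from' state to each 'to' state. transitions_sketch() is hypothetical.

```r
# Hypothetical: row-proportion transition matrix between two periods.
transitions_sketch <- function(timeperiod0, timeperiod1, buyvar = "Y", nobuyvar = "N") {
  lv <- c(buyvar, nobuyvar)
  counts <- table(from = factor(timeperiod0, levels = lv),
                  to   = factor(timeperiod1, levels = lv))
  prop.table(counts, margin = 1)  # each observed 'from' row sums to 1
}

tm <- transitions_sketch(c("Y", "Y", "Y", "Y", "Y"), c("N", "Y", "N", "Y", "N"))
```

In the example, all five customers start as buyers. The Y row is therefore 0.4 (to Y) and 0.6 (to N), and the N row is NaN.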