Package 'CADF'

Title: Customer Analytics Data Formatting
Description: Converts customer transaction data (ID, purchase date) into an R6 class called customer. The class stores various customer analytics calculations at the customer level. The package also contains functionality to convert data in the R6 class to data.frames that can serve as inputs for various customer analytics models.
Authors: Ludwig Steven [aut, cre]
Maintainer: Ludwig Steven <[email protected]>
License: GPL-3
Version: 0.1
Built: 2024-12-01 08:27:48 UTC
Source: CRAN


Likelihood maximization for annual halfing customer retention model

Description

Likelihood maximization for annual halfing customer retention model

Usage

annualhalfing_LL(grid, dta)

Arguments

grid

model parameters

dta

dataset

Value

Annual halfing Likelihood in optimization routine


Annual Halfing Model

Description

A recency-frequency model used in non-contractual situations. Model assumptions: (1) increasing recency leads to a higher probability of quitting; (2) frequency is related to exponential learning curves. Reference: Segmentation and Lifetime Value Modeling in SAS (Edward Malthouse).

Usage

annualhalfingmodel(cadf.data, starting.values)

Arguments

cadf.data

cadf-formatted dataset

starting.values

parameter starting values for model

Value

Returns model parameters

Examples

dta <- lapply(CADF::cadf.data.sample, function(x) tail(x$data, 1))
dta <- do.call(rbind, dta)
starting.values <- c(.5,.9,.2,-.9)
annualhalfingmodel(cadf.data.sample, starting.values)

Answering machine data

Description

Answering machine data

Format

A data frame with 9 rows and two columns


bigT_expand_via_apply

Description

bigT_expand_via_apply

Usage

bigT_expand_via_apply(x)

Arguments

x

vector containing bigT, cancel and count

Examples

x <- c(3, 1, 5)
bigT_expand_via_apply(x)

Billionaires

Description

Billionaires

Format

data frame


ca_SRM

Description

ca_SRM

Usage

ca_SRM(df_logistic)

Arguments

df_logistic

data frame containing the data for logistic regression

Examples

customertype1 <- c(3, 1, 5)
customertype2 <- c(12, 0, 3)
cust1 <- bigT_expand_via_apply(customertype1)
cust2 <- bigT_expand_via_apply(customertype2)
df_logistic <- rbind(cust1, cust2)
model <- ca_SRM(df_logistic)

Time-varying simple retention model

Description

Estimates the retention rate using logistic regression and the simple retention model. Mostly used for contractual models where there are clear opportunities for cancellation. It could be used in non-contractual situations, although the cancellation opportunities should be defined. Not recommended for services that consumers use rotating-door style; use the migration model there.

Usage

ca_SRM_time_varying(df_logistic, reference_level = 12, maxT = 12)

Arguments

df_logistic

A data frame, formatted for logistic regression. 1 row for each customer id/timeperiod. 1/0 for purchase.

reference_level

All coefficients are interpreted relative to the reference level, which defaults to time period 12. (Note that the interpretation will change based on how T is formulated.)

maxT

The number of timeperiods to build.

Value

Returns logistic model results (the glm model)

Examples

library(stats)
x <- c(3, 1, 5)
df_logistic <- bigT_expand_via_apply(x)
model <- ca_SRM_time_varying(df_logistic, reference_level = 3)
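
To make the role of reference_level more concrete, here is a minimal base-R sketch; it is not the package's implementation. It assumes a toy person-period data frame with hypothetical columns bigT (time period) and cancel (1 if the customer cancelled in that period), treats the period as a factor, and uses relevel() so that each coefficient is a log-odds contrast against the chosen reference period.

```r
# Toy person-period data (assumed layout, not taken from the package):
# one row per customer and time period; cancel = 1 in the period of cancellation.
df <- data.frame(
  bigT   = c(1, 2, 3, 1, 2, 1, 2, 3, 1, 1, 2, 3, 1, 2),
  cancel = c(0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0)
)

# Treat the time period as a factor and make period 3 the reference level.
df$period <- relevel(factor(df$bigT), ref = "3")

# Logistic regression with one coefficient per non-reference period.
fit <- glm(cancel ~ period, data = df, family = binomial)

# The intercept is the log-odds of cancelling in the reference period;
# the other coefficients are differences in log-odds relative to it.
coef(fit)
```

Changing the ref argument changes which period the intercept describes, which is why the interpretation depends on the reference level.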

CADF to purchase string

Description

Extracts purchase strings from the CADF and formats them as an R matrix.

Usage

ca_to_ps_matrix(ca.data, maxT)

Arguments

ca.data

Data in the CADF format generated by the CADF _to_CADF functions and Customer class.

maxT

Number of columns in the matrix

Details

Output is a matrix. Rows are number of customers; columns = maxT

Value

Matrix with dimensions C x maxT (number of customers by maxT)

Examples

library(CADF)
data("transactions")
customer <- subset(transactions, transactions$ID == 40)
today.study.cutoff <- max(customer$PURCHASE_DATE)
customer.40.CADF <- list(Customer$new(customer, today.study.cutoff))
psmatrix <- customer.40.CADF[[1]]$purchase_string_as_matrix
psmatrix2 <- ca_to_ps_matrix(customer.40.CADF, 15)


cadf.

Description

cadf.


Convert CADF dataset into annualhalfing model dataset

Description

Converts CADF output to dataset for annual halfing model

Usage

CADF_to_annualhalfing_data(cadf.data)

Arguments

cadf.data

CADF dataset


CADF to btyd pareto nbd model

Description

Converts a CADF dataset to a dataset for btyd pareto nbd modeling

Usage

CADF_to_btyd_pareto_nbd(cadf.data)

Arguments

cadf.data

CADF-formatted dataset


CADF to logistic regression

Description

Convert a CADF dataset to a dataset for logistic regression

Usage

CADF_to_logistic_regression(CADF)

Arguments

CADF

CADF-formatted dataset


CADF_to_migration_model converts CADF data to migration model data

Description

Builds a transition matrix for a migration model. maxT is the maximum time cutoff, which defaults to 5. The output is a transition matrix.

Usage

CADF_to_migration_model(cadf.data, maxT = 5)

Arguments

cadf.data

Data in R list format processed by CADF functions

maxT

If time is greater than maxT it will be converted into a + category

Examples

tmatrix <- CADF_to_migration_model(cadf.data.sample)

CADF_to_nth_purchase

Description

CADF_to_nth_purchase

Usage

CADF_to_nth_purchase(cadf.data, n)

Arguments

cadf.data

Data in R list format processed by CADF functions

n

the nth purchase you want to analyze


CADF_to_nth_purchase_allrows takes CADF data and the purchase number n that you want to analyze.

Description

CADF_to_nth_purchase_allrows takes CADF data and the purchase number n that you want to analyze.

Usage

CADF_to_nth_purchase_allrows(cadf.data, n)

Arguments

cadf.data

Data in R list format processed by CADF functions

n

the nth purchase


CADF-formatted sample data

Description

CADF-formatted sample data

Format

List with 2,185 customers, in CADF format


Function called during Customer$new() (the Customer R6 class) to create purchase string for the customer.

Description

Function called during Customer$new() (the Customer R6 class) to create purchase string for the customer.

Usage

create.purchase.string(x, id.column, date.column, return.mode = "")

Arguments

x

Transactional data associated with customer id.

id.column

Name of the customer id column in x (for example "ID").

date.column

Name of the purchase date column in x (for example "PURCHASE_DATE").

return.mode

Set to matrix if you want result returned as a matrix

Value

purchase string in 0/1 format. Returned as string.

Examples

data("transactions")
customer <- subset(transactions, transactions$ID == 5)
create.purchase.string(customer, "ID", "PURCHASE_DATE")
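
As a rough illustration of what a 0/1 purchase string encodes, the following base-R sketch builds one from a vector of purchase dates using monthly periods. The helper sketch_purchase_string is hypothetical and is not the package's create.purchase.string; in particular, it assumes the string runs from the first to the last purchase month, whereas the package may extend it to the study cutoff.

```r
# Hypothetical helper for illustration only; not part of CADF.
sketch_purchase_string <- function(dates) {
  dates <- as.Date(dates)
  # Month index so that differences are whole calendar months.
  month_index <- as.integer(format(dates, "%Y")) * 12L +
    as.integer(format(dates, "%m"))
  # Every month from the first to the last purchase month.
  periods <- seq(min(month_index), max(month_index))
  # 1 if at least one purchase fell in the month, 0 otherwise.
  flags <- as.integer(periods %in% month_index)
  paste(flags, collapse = "")
}

# Purchases in January, February and May 2023.
sketch_purchase_string(c("2023-01-15", "2023-02-03", "2023-05-20"))
# "11001"
```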

create.recency.string

Description

Tracks cumulative recency

Usage

create.recency.string(x)

Arguments

x

vector of zeros and ones

Examples

head(cadf.data.sample)
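
One plausible reading of "cumulative recency" is the number of periods since the most recent purchase, reset to zero whenever a purchase occurs. The sketch below implements that reading in base R; sketch_recency is a hypothetical helper, and the actual output format of create.recency.string may differ.

```r
# Hypothetical helper for illustration only; not part of CADF.
# x is a vector of zeros and ones (1 = purchase in that period).
sketch_recency <- function(x) {
  rec <- integer(length(x))
  since <- 0L
  for (i in seq_along(x)) {
    # Reset the counter on a purchase, otherwise count one more idle period.
    since <- if (x[i] == 1) 0L else since + 1L
    rec[i] <- since
  }
  rec
}

sketch_recency(c(1, 1, 0, 0, 1))
# 0 0 1 2 0
```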

R6 Class representing a customer. Otherwise known as the CADF.

Description

R6 class that stores customer analytics calculations for a single customer in the CADF format.

Details

Call Customer$new() to convert transactional data to CADF format

Public fields

output

Stores all information in R format at the customer level.

payload

Stores all computed customer information in JSON format for integration into other systems. This is not quite an API but designed so that customer information can be imported to other formats and systems.

data

a data frame that stores purchase information for a single customer. Input data for various calculations in initialize (df_customer)

id

The customer id. This will be the same ID as provided in the input transaction file.

study_name

A name to associate with the cohort study. The name can be whatever is easiest to associate with the set of customer ids and dates included in the analysis.

study_begin_date

Begin date of the customer study. In theory this should be min(TRANSACTION_DATE) for each customer in the dataset.

timing

Monthly timing computes T as months. Most commonly utilized and is the default.

transaction_dates

All transaction dates for the customer

transaction_months

All YYYY_MM transaction dates for the customer

first_purchase_date

First purchase date for the customer.

last_purchase_date

Last purchase date for the customer. (Field repeat_customer: set when the customer has more than one transaction and the second transaction date is later than the first transaction date.)

repeat_customer_by_day

description

today

today. (Field T: a measure of time between the first date of activity and purchase.)

T_ss

T_ss

transaction_range_complete

shows a consecutive sequence usually beginning at 1

purchase_count

purchase count

purchase_string

description

purchase_string_as_matrix

purchase string as matrix

recency_string_as_matrix

recency string as matrix

Freq

frequency count

logistic_modeling_matrix

Stores customer's logistic modeling matrix. (One row for each time period (T), 1 = purchase; 0 = no purchase)

logistic_modeling_matrix_ss

logistic_modeling_matrix_ss

logistic_modeling_matrix_custom

logistic_modeling_matrix_custom

survival_modeling_matrix

Stores customer's modeling matrix for survival analysis. For survival analysis '1' means that the customer has stopped being a customer. '0' means that the customer is continuing to be a customer.

survival_modeling_matrix_ss

survival_modeling_matrix_ss

survival_modeling_matrix_custom

survival_modeling_matrix_custom

repeat_customer

This can be used to filter out repeat customers from analysis. Repeat customer status is based on YYYY_MM (a customer with only two purchases in January would not be a repeat customer). repeat_customer_by_day is the same flag computed by day instead of YYYY_MM.

purchase_string: Utilizes the 'create.purchase.string' function to create a purchase string. "1" if a purchase was made during the purchase period; "0" otherwise. No special rules are applied, and the purchase string reflects true purchase history.

df_customer: data frame for a single customer, with an id column and a purchase date column.

T

T is a cancellation time. CADF offers different ways to estimate the cancellation time:

strict_quitter: The customer leaves after the first period of inactivity. Example: purchase string 11001 gives T=3.

strict_stayer: T is the last period with a transaction in the purchase string. Example: purchase string 11001 gives T=5.

As T becomes longer, strict_quitter will tend to underestimate retention and strict_stayer will tend to overestimate it. If you know your customers come and go at will, you can use a migration model or choose T between strict quitter and strict stayer.

T_custom

T_custom

logistic_modeling_matrix: Stores rows for the customer that contribute to a logistic modeling matrix. Assumes strict (permanent) cancellations: the customer relationship starts at time 1 and ends at time N, with permanent cancellation and no pauses in between. This is usually known as a contractual relationship.

logistic_modeling_matrix_ss: Assumes the strict stayer assumption.

survival_modeling_matrix: Stores rows for the customer that contribute to a survival modeling matrix.

Cleanup and data storage: the working df_customer data frame is emptied and the result is placed in the class under the name 'data'.

Methods

Public methods


Method new()

Creates a CADF profile for a given customer based on the input transactional data usually an R list

Usage
Customer$new(df_customer = NA, today = NA)
Arguments
df_customer

Data frame for a single customer, with an id column and a purchase date column.

today

Study cutoff date (for example, the maximum purchase date in the data).

Returns

A new 'Customer' object: the transactional data for a particular id converted to the "CADF" format. When customers are stored in a list, access them as cadf[[1]], etc.

The following columns and fields are computed during construction:

df_customer$Tdays: days from first purchase.

df_customer$month_yr: purchase date converted to YYYY_MM format.

df_customer$Tmonths: number of months between the purchase date and the first purchase date, rounded up to the nearest month.

id: the customer id that identifies the customer in the CADF class.

transaction_dates: all unique transaction dates for the customer.

transaction_months: all unique YYYY_MM combinations for the customer's transactions. This is used for building purchase strings.


Method clone()

The objects of this class are cloneable with this method.

Usage
Customer$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

library(CADF)
data("transactions")
customer <- subset(transactions, transactions$ID == 40)
today.study.cutoff <- max(customer$PURCHASE_DATE)
customer.40.CADF <- Customer$new(customer, today.study.cutoff)

Discrete choice

Description

Discrete choice

Format

'discretechoice'


Excel data

Description

Excel data

Format

Data frame with 50 rows and 9 columns


For each customer, return a modeling matrix that is utilized for logistic regression

Description

'f_CustomerModelingMatrix' inputs are cancellation_time.

Usage

f_CustomerModelingMatrix(cancellation_time)

Arguments

cancellation_time

cancellation time

Examples

f_CustomerModelingMatrix(10)

For each customer, return a survival modeling matrix that is utilized for survival analysis

Description

'f_CustomerSurvivalModelingMatrix' inputs are T.

Usage

f_CustomerSurvivalModelingMatrix(cancellation_time)

Arguments

cancellation_time

cancellation time

Examples

f_CustomerSurvivalModelingMatrix(10)
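
The Customer class documentation describes the survival modeling matrix as using '1' for a customer who has stopped being a customer and '0' for one who continues. A minimal sketch of such a person-period layout for one customer is shown below. The helper name and the column names period and cancel are assumptions, and the real function may return a different structure or handle censoring differently.

```r
# Hypothetical helper for illustration only; not part of CADF.
# Builds one row per period up to and including the cancellation time.
sketch_survival_rows <- function(cancellation_time) {
  period <- seq_len(cancellation_time)
  data.frame(
    period = period,
    # 0 while the customer continues, 1 in the period the customer stops.
    cancel = as.integer(period == cancellation_time)
  )
}

sketch_survival_rows(3)
```

For a cancellation time of 3 this yields three rows with cancel equal to 0, 0 and 1.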

Compute the months between two purchase dates

Description

Compute the months between two purchase dates

Usage

f_intMonths(a, b)

Arguments

a

starting date

b

ending date
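
As an illustration of one way to count months between two dates, the base-R sketch below counts calendar-month boundaries crossed. The helper sketch_months_between is hypothetical; f_intMonths may round differently (the Customer class documentation mentions rounding up to the nearest month).

```r
# Hypothetical helper for illustration only; not part of CADF.
# Counts whole calendar months from date a to date b, ignoring the day of month.
sketch_months_between <- function(a, b) {
  a <- as.Date(a)
  b <- as.Date(b)
  year_diff  <- as.integer(format(b, "%Y")) - as.integer(format(a, "%Y"))
  month_diff <- as.integer(format(b, "%m")) - as.integer(format(a, "%m"))
  year_diff * 12L + month_diff
}

sketch_months_between("2023-01-15", "2023-05-20")  # 4
sketch_months_between("2022-11-30", "2023-02-01")  # 3
```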


Health Data

Description

Health Data

Format

data frame with 5,432 rows and 36 columns


Purchase string to frequency count

Description

Purchase string to frequency count

Usage

frequency_from_ps(x)

Arguments

x

rle object


RLE object to frequency count

Description

RLE object to frequency count

Usage

frequency_from_rle(x)

Arguments

x

rle object

Examples

# example code
x <- c(1,1,0,1,0,0,1,0,0,0)
x.rle <- rle(x)
frequency_from_rle(x.rle)

Gamma gamma spend model data

Description

Gamma gamma spend model data

Format

data frame with 2,357 rows and 6 columns


generate_date_template

Description

generate_date_template

Usage

generate_date_template()

Examples

dates <- generate_date_template()

Convert to CADF for a single customer id

Description

The input to 'id_to_CADF' comes from an lapply operation on a split customer dataset. If variable a is the split customer dataset, then a$'1' is the customer with ID 1.

Usage

id_to_CADF(data, today.study.cutoff)

Arguments

data

Transactional Data for one customerid

today.study.cutoff

Cutoff date that separates the analysis data from the holdout data


LD functions are utilized for learning and diagnostic use.

Description

LD functions are utilized for learning and diagnostic use.

Usage

ld_sample_customer_matrix(numCustomers, maxT, purchaseAtT0 = TRUE)

Arguments

numCustomers

number of customers to simulate

maxT

number of timeperiods

purchaseAtT0

by default sets first column of matrix to 1


LTV transactions data

Description

LTV transactions data

Format

data frame with 53,998 rows and 4 columns


Likelihood function for annual halfing model

Description

Likelihood function for annual halfing model

Usage

modeling.annualhalfing.likelihood(grid2, rec, freq, targetBuy)

Arguments

grid2

Modeling parameters

rec

recency

freq

frequency

targetBuy

indicator if purchase was made in holdout period


LL function for the gamma gamma spend model

Description

LL function for the gamma gamma spend model

Usage

modeling.LL.gamma_spend(p, q, gamma, y = data)

Arguments

p

p

q

q

gamma

gamma

y

data


PDF probability function for gamma distribution

Description

PDF probability function for gamma distribution

Usage

pdf_gamma(x, r, a)

Arguments

x

between 0 and 1 for pdf

r

shape parameter

a

scale parameter


Probability density function for gamma distribution

Description

Probability density function for gamma distribution

Usage

pdf_gamma2(x, shape, scale)

Arguments

x

x

shape

shape parameter

scale

scale parameter
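
For reference, the gamma density with a shape and scale parameterization can be written out directly and checked against base R's dgamma. The sketch below assumes the standard shape/scale form; whether pdf_gamma and pdf_gamma2 use exactly this parameterization (rather than shape/rate) is not stated in this manual, and sketch_gamma_pdf is a hypothetical name.

```r
# Hypothetical helper for illustration only; not part of CADF.
# Gamma density: f(x) = x^(shape - 1) * exp(-x / scale) / (gamma(shape) * scale^shape)
sketch_gamma_pdf <- function(x, shape, scale) {
  x^(shape - 1) * exp(-x / scale) / (gamma(shape) * scale^shape)
}

# Compare with the built-in density from the stats package.
all.equal(
  sketch_gamma_pdf(0.5, shape = 2, scale = 1.5),
  dgamma(0.5, shape = 2, scale = 1.5)
)
```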


The glossary for the CADF data format

Description

The glossary for the CADF data format

Usage

## S3 method for class 'glossary'
print()

Calculates T from a purchase string. Custom.

Description

Calculates T from a purchase string. Custom.

Usage

ps_to_T_custom(ps, skips = 2)

Arguments

ps

Purchase string.

skips

Number of non purchase periods that the customer is still considered a customer for.

Value

The numeric value for T.


Calculates T from a purchase string under the "strict quitter" assumption.

Description

Calculates T from a purchase string under the "strict quitter" assumption.

Usage

ps_to_T_strict_quitter(ps)

Arguments

ps

Purchase string.

Value

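The numeric value for T under the "strict quitter" assumption (the customer leaves after the first period of inactivity).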


Calculates T from a purchase string under the "strict stayer" assumption.

Description

Calculates T from a purchase string under the "strict stayer" assumption.

Usage

ps_to_T_strict_stayer(ps)

Arguments

ps

Purchase string.

Value

The numeric value for T, which is the position of the last 1 in the purchase string
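
The strict stayer and strict quitter rules can be sketched in a few lines of base R, using the purchase string 11001 from the Customer class documentation (T=5 for strict stayer, T=3 for strict quitter). Both helpers below are hypothetical and are not the package's ps_to_T_* functions. The strict quitter sketch additionally assumes that a string with no inactive period returns its full length, which this manual does not specify.

```r
# Hypothetical helpers for illustration only; not part of CADF.
# Both assume ps is a single string of "0"/"1" characters containing at least one "1".

# Strict stayer: T is the position of the last "1".
sketch_T_strict_stayer <- function(ps) {
  bits <- strsplit(ps, "")[[1]]
  max(which(bits == "1"))
}

# Strict quitter: T is the position of the first "0" (first period of inactivity).
# Assumption: with no "0" at all, T is the length of the string.
sketch_T_strict_quitter <- function(ps) {
  bits <- strsplit(ps, "")[[1]]
  gaps <- which(bits == "0")
  if (length(gaps) == 0) length(bits) else gaps[1]
}

sketch_T_strict_stayer("11001")   # 5
sketch_T_strict_quitter("11001")  # 3
```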


psmatrix_to_psstring

Description

psmatrix_to_psstring

Usage

psmatrix_to_psstring(psmatrix)

Arguments

psmatrix

purchase string of 1's and 0's in matrix format

Examples

cadf.data.sample[[4]]$purchase_string_as_matrix

Accepts a psmatrix and converts its 1/0 purchase strings to a recency-at-time-of matrix.

Description

Accepts a psmatrix and converts its 1/0 purchase strings to a recency-at-time-of matrix.

Usage

psmatrix_to_recency_attimeof_matrix(psmatrix)

Arguments

psmatrix

a psmatrix


The customer analytics data format (CADF) relies heavily on correct input data. Transactional data must: (1) be a data frame with two columns; (2) have the customer id in column 1; (3) have the transaction date in column 2, formatted as a Date object in R.

Description

The customer analytics data format (CADF) relies heavily on correct input data. Transactional data must: (1) be a data frame with two columns; (2) have the customer id in column 1; (3) have the transaction date in column 2, formatted as a Date object in R.

Usage

qc_transactional_data(x)

Arguments

x

R dataframe representing ..

Value

A number representing whether it passes or not.
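
The input rules listed in the description can be expressed as simple base-R checks. The sketch below is a hypothetical stand-in, not qc_transactional_data itself: it returns TRUE or FALSE rather than a number, and the column names in the toy data are assumptions.

```r
# Hypothetical helper for illustration only; not part of CADF.
# Checks: data frame, exactly two columns, second column of class Date.
sketch_qc <- function(x) {
  is.data.frame(x) && ncol(x) == 2 && inherits(x[[2]], "Date")
}

good <- data.frame(
  ID = c(1, 1, 2),
  PURCHASE_DATE = as.Date(c("2023-01-15", "2023-02-03", "2023-01-20"))
)
# Dates stored as text fail the third rule.
bad <- data.frame(ID = 1, PURCHASE_DATE = "2023-01-15")

sketch_qc(good)  # TRUE
sketch_qc(bad)   # FALSE
```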


Segmentation and LTV data

Description

Segmentation and LTV data

Format

A data frame with 53998 rows and 4 columns


Simple Migration

Description

Function used for simulation and scenario planning

Usage

simple_migration(num.customers, pct.buy.buy, pct.nobuy.buy, n.periods)

Arguments

num.customers

Number of customers for the simulation.

pct.buy.buy

percentage of buyers that buy again in the next period

pct.nobuy.buy

percentage of non buyers that convert over to buyers

n.periods

number of periods

Examples

simple_migration(200, .80, .20, 12)
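
The two-state logic behind these arguments can be sketched as a deterministic expected-count recursion. This is not the package's simple_migration (which may simulate individual customers or return a different structure); sketch_migration is a hypothetical helper that assumes every customer buys in period 1.

```r
# Hypothetical helper for illustration only; not part of CADF.
# Expected number of buyers per period in a two-state (buy / no buy) migration model.
sketch_migration <- function(num.customers, pct.buy.buy, pct.nobuy.buy, n.periods) {
  buyers <- numeric(n.periods)
  buyers[1] <- num.customers  # assumption: everyone buys in the first period
  for (t in seq_len(n.periods)[-1]) {
    nonbuyers <- num.customers - buyers[t - 1]
    # Buyers who buy again plus non-buyers who convert.
    buyers[t] <- buyers[t - 1] * pct.buy.buy + nonbuyers * pct.nobuy.buy
  }
  data.frame(
    period = seq_len(n.periods),
    buyers = buyers,
    nonbuyers = num.customers - buyers
  )
}

out <- sketch_migration(200, .80, .20, 12)
head(out, 3)
```

With these inputs the expected number of buyers falls from 200 to 160 and then 136, and it approaches 100 as the number of periods grows.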

Create a CADF dataset from a dataframe

Description

Create a CADF dataset from a dataframe

Usage

## S3 method for class 'transaction.file_to_CADF'
split(data, today.study.cutoff)

Arguments

data

data frame for a single customer id

today.study.cutoff

separate analysis and holdout data


Simple retention model data

Description

Simple retention model data

Format

A data frame with 5828 rows and two columns

bigT

Time period

cancel

Whether or not there was a cancellation in the time period

...


SRM model data

Description

SRM model data

Format

Data frame with 22 rows and 3 columns


Stockmarket put/call data

Description

Stockmarket put/call data

Format

A data frame with 770 rows and 20 columns


Transactions data

Description

Transactions data

Format

data frame with 69659 rows and 4 columns


Transaction data

Description

Transaction data

Format

A data frame with 67,944 rows and 4 columns

ID

Customer ID

PURCHASE_DATE

Purchase date

NUM_ITEMS

Number of items purchased

TOTAL

Total transaction amount

...


Calculate transition periods between two timeperiods

Description

Calculate transition periods between two timeperiods

Usage

transitions(timeperiod0, timeperiod1, buyvar = "Y", nobuyvar = "N")

Arguments

timeperiod0

Column representing the 'from' side of the transition probability

timeperiod1

Column representing the 'to' side of the transition probability

buyvar

field value that represents a buy, defaults to Y

nobuyvar

field value that represents not buy, defaults to N

Value

2 x 2 transition matrix

Examples

timeperiod0 <- c("Y", "Y", "Y", "Y", "Y")
timeperiod1 <- c("N", "Y", "N", "Y", "N")
transitions(timeperiod0, timeperiod1)
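
For comparison, a 2 x 2 transition table for the same inputs can be built with base R's table() and prop.table(). This sketch is not the package's implementation, and the layout of the matrix returned by transitions() may differ.

```r
timeperiod0 <- c("Y", "Y", "Y", "Y", "Y")
timeperiod1 <- c("N", "Y", "N", "Y", "N")

# Fix the levels so that both states appear even if one is never observed.
from <- factor(timeperiod0, levels = c("Y", "N"))
to   <- factor(timeperiod1, levels = c("Y", "N"))

# Counts of each from/to combination.
counts <- table(from, to)

# Row-wise proportions: probability of each 'to' state given the 'from' state.
# The "N" row has no observations here, so its proportions are NaN.
probs <- prop.table(counts, margin = 1)

counts
probs
```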