Title: | Finds an Optimal System of Distinct Representatives |
---|---|
Description: | Provides routines for finding an Optimal System of Distinct Representatives (OSDR), as defined by D.Gale (1968) <doi:10.1016/S0021-9800(68)80039-0>. |
Authors: | Massimo Cannas [aut, cre] |
Maintainer: | Massimo Cannas <[email protected]> |
License: | GPL-3 |
Version: | 1.1.4 |
Built: | 2024-11-28 06:27:18 UTC |
Source: | CRAN |
Provides routines for finding an Optimal System of Distinct Representatives (OSDR), as defined by D.Gale (1968) <doi:10.1016/S0021-9800(68)80039-0>.
Package: | OSDR |
Type: | Package |
Title: | Finds an Optimal System of Distinct Representatives |
Version: | 1.1.4 |
Date: | 2022-03-07 |
Authors@R: | person("Massimo", "Cannas", role = c("aut", "cre"), email = "[email protected]") |
Author: | Massimo Cannas [aut, cre] |
Maintainer: | Massimo Cannas <[email protected]> |
Description: | Provides routines for finding an Optimal System of Distinct Representatives (OSDR), as defined by D.Gale (1968) <doi:10.1016/S0021-9800(68)80039-0>. |
License: | GPL-3 |
URL: | https://cran.r-project.org/package=OSDR |
NeedsCompilation: | no |
Packaged: | 2022-03-07 15:53:48 UTC; massimo |
Repository: | CRAN |
Date/Publication: | 2022-03-08 13:30:05 UTC |
Index of help topics:
OSDR Finds an Optimal System of Distinct Representatives from a family of subsets. OSDR-package Finds an Optimal System of Distinct Representatives exdata Executive dataset (subset of) listmat From cost matrix to list of feasible assignments. matlist From list of feasible assignments to cost matrix.
The aim of this small package is to provide a function solving the assignment problem on an ordered set. This matching problem can be viewed in two ways. First, as the original ordered assignment problem, where a set of jobs, ordered by importance, must be filled by suitable applicants in the best possible way. Second, as an ordered matching problem where treated unit should be matched in order of importance with suitable controls. Suitability can be obtained by a ”caliper”: units within the caliper having zero matching cost and units outside the caliper having infinity cost. The main function OSDR
exploits an algorithm suggested by I.Anderson to find an order optimal matching, as defined by D.Gale (see OSDR
for details). The package includes some utilities and examples (both combinatorial and statistically oriented) to illustrate the use of OSDR
.
Massimo Cannas [aut, cre]
Maintainer: Massimo Cannas <[email protected]>
Gale, D. (1968) Optimal matching in an ordered set: an application of matroid theory. Journal of Combinatorial Theory 4, pp. 176-180.
Anderson, I. (1989) A first course in Combinatorial Mathematics. Oxford University Press.
Rosenbaum, P. R. (1989). Optimal matching for observational studies. Journal of the American Statistical Association 84(408): pp. 1024-1032.
#See OSDR help for the examples.
#See OSDR help for the examples.
Contains executives data for an italian firm operating in the energy sector (year 2015). The complete dataset contains 302 rows (10 firms) and 180 columns (it will be released in next package version).
data("exdata")
data("exdata")
A data frame with 28 observations on the following 8 variables.
firm
a character vector
position
a numeric vector
education
a numeric vector
year_born
a numeric vector
contract
a numeric vector
part_fulltime
a numeric vector
seniority
a numeric vector
sex
a numeric vector
position
is coded: 4 = top manager, 3 = medium/first line manager, 2 = supervisor
sex
is coded: 0 = M, 1 = F
education
is coded: Post-graduate = 5, Graduate = 4, High school = 3
contract type
is coded: fixed term = 4; permanent = 3
seniority
is measured by number of years in the current position
# load executive data data(exdata) # case study: matched samples for comparing women and men executives table(exdata$sex) table(exdata$position,exdata$sex) # There are more women and more in apical position. # A complete matching is not possible for several choices of the caliper. # Gap differences tend to be higher for higher ranks # e.g. Lynn and Thompson(1997), Above the glass ceiling? A comparison of # matched Samples of Men and Women Executives. J. of Appl. Psych. 82(3) # so we would give higher matching priority to women in higher position. # We can use OSDR to find a minimum cost matching # performing matching by decreasing hierarchical position.
# load executive data data(exdata) # case study: matched samples for comparing women and men executives table(exdata$sex) table(exdata$position,exdata$sex) # There are more women and more in apical position. # A complete matching is not possible for several choices of the caliper. # Gap differences tend to be higher for higher ranks # e.g. Lynn and Thompson(1997), Above the glass ceiling? A comparison of # matched Samples of Men and Women Executives. J. of Appl. Psych. 82(3) # so we would give higher matching priority to women in higher position. # We can use OSDR to find a minimum cost matching # performing matching by decreasing hierarchical position.
Transforms the cost matrix of the assignment problem in the corresponding list of suitable applicants for each job.
listmat(x)
listmat(x)
x |
A cost matrix where entry |
In statistical matching problems the input is usually the cost matrix while in assignment problems is the list of assignable elements. Functions matlist
and listmat
go back and forth these two representations of the input.
A list of suitable applicants. The element of the list contains the suitable applicants for job i. These are all applicants with finite assignment cost. Note that if the cost matrix contains finite non zero values there is a loss of information in the transformation.
Massimo Cannas <[email protected]>
See matlist
, the reverse function.
# a list of feasible applicants for five jobs M1<-c("A","B","C") M2<-c("A","C") M3<-c("B") M4<-c("A","C") M5<-c("A","D") M <-list(M1,M2,M3,M4,M5) M # list --> cost matrix m <- listmat(M) # cost matrix --> list l <- matlist(m)
# a list of feasible applicants for five jobs M1<-c("A","B","C") M2<-c("A","C") M3<-c("B") M4<-c("A","C") M5<-c("A","D") M <-list(M1,M2,M3,M4,M5) M # list --> cost matrix m <- listmat(M) # cost matrix --> list l <- matlist(m)
Transforms the list of suitable applicants into the equivalent cost matrix.
matlist(x)
matlist(x)
x |
An ordered list of applicants for a set of jobs. The |
In statistical matching problems the input is usually the cost matrix while in assignment problems is the list of assignable elements. Functions matlist
and listmat
go back and forth these two representations of the input.
The cost matrix of the assignment problem. The entry of the matrix is zero if the
applicant is assignable to the
job and
otherwise. Equivalently, the
entry of the matrix is zero if the
control unit can be matched to the
treated unit and
otherwise.
Massimo Cannas <[email protected]>
See also the reverse function listmat
.
# a list of feasible applicants for five jobs M1<-c("A","B","C") M2<-c("A","C") M3<-c("B") M4<-c("A","C") M5<-c("A","D") M <-list(M1,M2,M3,M4,M5) M # the corresponding cost matrix m<-listmat(M) # back to the list l<-matlist(m)
# a list of feasible applicants for five jobs M1<-c("A","B","C") M2<-c("A","C") M3<-c("B") M4<-c("A","C") M5<-c("A","D") M <-list(M1,M2,M3,M4,M5) M # the corresponding cost matrix m<-listmat(M) # back to the list l<-matlist(m)
The function finds an Optimal Set of Distinct Representatives (OSDR) from an ordered family of subsets. Optimality is order-wise in the sense specified by Gale: any other SDR cannot be larger and its elements cannot be chosen from more important sets (see details).
OSDR(M)
OSDR(M)
M |
An ordered family of subsets. In assignment problems the list specifying the feasible applicants for each job. In matching problems the list specifying the controls matchable with each treated unit. The list is assumed ordered by decreasing assignment/matching priority. |
Consider a set of jobs and a set of suitable applicants for each job. It was shown by D.Gale that there exists an optimally assignable subset of the jobs in the sense that if is optimal and
is another assignable set of jobs then: a)
and b)
for all
. The latter means that job
has as higher a priority as job
. From a graph perspective, if
is the set of jobs and
is the set of applicants the OSDR is an order wise maximum size matching of the bipartite graph with vertex set
.
Function OSDR
finds the optimal matching using an algorithmic proof of Hall's theorem due to logician D.J. Shoesmith. The algorithm assigns greedily until possible and correct backward when necessary. A message is given when a correction is attempted (when it fails another message warns that Hall's condition is not satisfied).
A list containing three elements: the OSDR (a zero in position i indicates impossibility of assigning job i), the index of optimally assignable jobs and the index of unassignable jobs. If the latter is not empty M
does not meet Hall's condition, i.e., a complete SDR is not possible.
In statistical matching problems the sets and
are the sets of treated and control units and it is usually required to find the minimum covariate distance matching of treated versus control units. When a complete matching does not exist it can be convenient to find either an order optimal matching or a minimum cost matching on the
OSDR
subset (see the gender gap example).
Massimo Cannas <[email protected]>
Gale, D. (1968) Optimal matching in an ordered set: an application of matroid theory. Journal of Combinatorial Theory 4: 176-180.
Anderson, I. (1989) A first course in Combinatorial Mathematics. Oxford University Press.
### example 1 # M is the list of suitable applicants for five jobs M1 <- c("A" , "B") M2 <- c("B" , "C") M3 <- c("B") M4 <- c("A" , "C") M5 <- c("B" , "C" , "D") M <- list(M1 , M2 , M3 , M4 , M5) OSDR(M) # $OSDR # [1] "A" "C" "B" "0" "D" # $matched # [1] 1 2 3 5 # $unmatched # [1] 4 # job 4 unmatched so Hall's condition is not satisfied: it's impossible to fill all the jobs # note that there are (order-\emph{suboptimal}) assignments of the same length of the optimal: # eg: 0CBAD , BC0AD #### example 2: sligthly modified: more than one order optimal matching M1<-c("A","B","C") M2<-c("A","C") M3<-c("B") M4<-c("A","C") M5<-c("A","D") M <-list(M1,M2,M3,M4,M5) OSDR(M) # note there are other order optimal matchings: ACB0D or CAB0D # note there are also other maximum size matchings (not order optimal): # e.g. 0CBAD or BC0AD
### example 1 # M is the list of suitable applicants for five jobs M1 <- c("A" , "B") M2 <- c("B" , "C") M3 <- c("B") M4 <- c("A" , "C") M5 <- c("B" , "C" , "D") M <- list(M1 , M2 , M3 , M4 , M5) OSDR(M) # $OSDR # [1] "A" "C" "B" "0" "D" # $matched # [1] 1 2 3 5 # $unmatched # [1] 4 # job 4 unmatched so Hall's condition is not satisfied: it's impossible to fill all the jobs # note that there are (order-\emph{suboptimal}) assignments of the same length of the optimal: # eg: 0CBAD , BC0AD #### example 2: sligthly modified: more than one order optimal matching M1<-c("A","B","C") M2<-c("A","C") M3<-c("B") M4<-c("A","C") M5<-c("A","D") M <-list(M1,M2,M3,M4,M5) OSDR(M) # note there are other order optimal matchings: ACB0D or CAB0D # note there are also other maximum size matchings (not order optimal): # e.g. 0CBAD or BC0AD