| Title: | Computing Key Indicators of the Spatial Distribution of Economic Activities |
|---|---|
| Description: | Computes a series of indices commonly used in the fields of economic geography, economic complexity, and evolutionary economics to describe the location, distribution, spatial organization, structure, and complexity of economic activities. Functions include basic spatial indicators such as the location quotient, the Krugman specialization index, the Herfindahl or the Shannon entropy indices but also more advanced functions to compute different forms of normalized relatedness between economic activities or network-based measures of economic complexity. Most of the functions use matrix calculus and are based on bipartite (incidence) matrices consisting of region - industry pairs. These are described in Balland (2017) <http://econ.geo.uu.nl/peeg/peeg1709.pdf>. |
| Authors: | Pierre-Alexandre Balland [aut, cre, cph] |
| Maintainer: | Pierre-Alexandre Balland <[email protected]> |
| License: | GPL-2 \| GPL-3 |
| Version: | 2.1 |
| Built: | 2026-07-02 21:13:22 UTC |
| Source: | https://github.com/cran/EconGeo |
This function computes the number of co-occurrences between industry pairs from an incidence (industry - event) matrix
co_occurrence(mat, diagonal = FALSE, list = FALSE)

| mat | An incidence matrix with industries in rows and events in columns |
|---|---|
| diagonal | Logical; should the diagonal values of the co-occurrence matrix be included in the output? Defaults to FALSE (diagonal values are set to 0), but can be set to TRUE (each diagonal value then gives the number of events in which that industry is found) |
| list | Logical; is the input a list? Defaults to FALSE (input = adjacency matrix), but can be set to TRUE if the input is an edge list |

The co-occurrence matrix as an R matrix object.
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
relatedness, relatedness_density

    ## generate an industry - event matrix
    set.seed(31)
    mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 5)
    rownames(mat) <- c("I1", "I2", "I3", "I4")
    colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")

    ## run the function
    co_occurrence(mat)
    co_occurrence(mat, diagonal = TRUE)

    ## generate a regular data frame (list)
    my_list <- get_list(mat)

    ## run the function
    co_occurrence(my_list, list = TRUE)
    co_occurrence(my_list, list = TRUE, diagonal = TRUE)
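For binary input, counting co-occurrences between industry pairs amounts to multiplying the incidence matrix by its own transpose. The following is a minimal base-R sketch of that idea, not the package's implementation. The helper `co_occurrence_sketch` and the toy matrix are hypothetical names introduced for illustration, and the input is assumed to contain only 0/1 values.

```r
# Hypothetical helper for illustration only; not part of EconGeo.
# Assumes `mat` is a binary (0/1) industry x event incidence matrix.
co_occurrence_sketch <- function(mat, diagonal = FALSE) {
  co <- tcrossprod(mat)          # same result as mat %*% t(mat)
  if (!diagonal) diag(co) <- 0   # drop each industry's own event count
  co
}

mat <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 1, 1,
    1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("I1", "I2", "I3", "I4"), c("E1", "E2", "E3"))
)

co <- co_occurrence_sketch(mat)
co_diag <- co_occurrence_sketch(mat, diagonal = TRUE)
print(co)
```

With `diagonal = TRUE`, the diagonal of the sketch holds the number of events in which each industry appears, which is what the description of the `diagonal` argument states.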
This function computes a simple measure of diversity of regions by counting the number of industries in which a region has a relative comparative advantage (location quotient > 1) from regions - industries (incidence) matrices
diversity(mat, rca = FALSE)

| mat | An incidence matrix with regions in rows and industries in columns |
|---|---|
| rca | Logical; should the index of relative comparative advantage (RCA, also referred to as the location quotient) first be computed? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the index of relative comparative advantage first needs to be computed |

A numeric vector giving, for each region, the number of industries in which it has a relative comparative advantage
Pierre-Alexandre Balland
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.

    ## generate a region - industry matrix with full count
    set.seed(31)
    mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
    rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat) <- c("I1", "I2", "I3", "I4")

    ## run the function
    diversity(mat, rca = TRUE)

    ## generate a region - industry matrix in which cells represent the presence/absence of a RCA
    set.seed(31)
    mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
    rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat) <- c("I1", "I2", "I3", "I4")

    ## run the function
    diversity(mat)
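The diversity count can be illustrated by computing location quotients and counting, per region, how many exceed 1. This is a minimal base-R sketch under stated assumptions rather than the package's code. The helpers `location_quotient_sketch` and `diversity_sketch` are hypothetical, and the strict threshold (location quotient > 1) follows the wording of the function description.

```r
# Hypothetical helpers for illustration only; not part of EconGeo.
location_quotient_sketch <- function(mat) {
  region_share <- mat / rowSums(mat)        # industry shares within each region
  overall_share <- colSums(mat) / sum(mat)  # industry shares in the whole sample
  sweep(region_share, 2, overall_share, "/")
}

diversity_sketch <- function(mat, rca = FALSE) {
  # Assumption: an RCA is recorded when the location quotient is strictly above 1
  if (rca) mat <- (location_quotient_sketch(mat) > 1) * 1
  rowSums(mat)
}

mat <- matrix(
  c(5, 4, 1,
    1, 1, 8,
    4, 3, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("R1", "R2", "R3"), c("I1", "I2", "I3"))
)

div <- diversity_sketch(mat, rca = TRUE)
print(div)
```

In this toy matrix, R1 and R3 are specialized in two industries each and R2 in one, so the sketch returns 2, 1, 2.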
This function computes the ease of recombination of a given technological class from technological classes - patents (incidence) matrices
ease_recombination(mat, sparse = FALSE, list = FALSE)

| mat | A bipartite adjacency matrix (can be a sparse matrix) |
|---|---|
| sparse | Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix |
| list | Logical; is the input a list? Defaults to FALSE, but can be set to TRUE if the input is a list |

A data frame with two columns: "tech" representing the technological class and "eor" representing the ease of recombination of the technological class
Pierre-Alexandre Balland
Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039

    ## generate a technology - patent matrix
    set.seed(31)
    mat <- matrix(sample(0:1, 30, replace = TRUE), ncol = 5)
    rownames(mat) <- c("T1", "T2", "T3", "T4", "T5", "T6")
    colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")

    ## generate a technology - patent sparse matrix
    library(Matrix)
    smat <- Matrix(mat, sparse = TRUE)

    ## run the function
    ease_recombination(mat)
    ease_recombination(smat, sparse = TRUE)

    ## generate a regular data frame (list)
    my_list <- get_list(mat)

    ## run the function
    ease_recombination(my_list, list = TRUE)
This function computes the Shannon entropy index from (incidence) regions - industries matrices
entropy(mat)

| mat | An incidence matrix with regions in rows and industries in columns |
|---|---|

A numeric vector representing the Shannon entropy index computed from the regions - industries matrix
Pierre-Alexandre Balland
Shannon, C.E., Weaver, W. (1949) The Mathematical Theory of Communication. Univ of Illinois Press.
Frenken, K., Van Oort, F. and Verburg, T. (2007) Related variety, unrelated variety and regional economic growth, Regional studies 41 (5): 685-697.

    ## generate a region - industry matrix
    set.seed(31)
    mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
    rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat) <- c("I1", "I2", "I3", "I4")

    ## run the function
    entropy(mat)
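As an illustration of the Shannon entropy index, the sketch below computes, for each region, minus the sum of industry shares times their logarithm. It is a base-R sketch and not the package's implementation. The helper name `entropy_sketch` is hypothetical, and the base-2 logarithm and the treatment of zero shares as contributing 0 are assumptions.

```r
# Hypothetical helper for illustration only; not part of EconGeo.
# Assumption: base-2 logarithm, and 0 * log(0) is taken to be 0.
entropy_sketch <- function(mat) {
  p <- mat / rowSums(mat)                  # industry shares within each region
  terms <- ifelse(p > 0, -p * log2(p), 0)  # zero shares contribute nothing
  rowSums(terms)
}

mat <- matrix(
  c( 25, 25, 25, 25,
    100,  0,  0,  0,
     50, 50,  0,  0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("R1", "R2", "R3"), c("I1", "I2", "I3", "I4"))
)

h <- entropy_sketch(mat)
print(h)
```

A region spread evenly over four industries reaches the maximum for four categories (2 bits under the base-2 assumption), while a region concentrated in a single industry scores 0.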
This function generates a data frame of entry events from multiple regions - industries matrices (different matrix compositions are allowed). In this function, the maximum number of periods is limited to 20.
entry_list(...)

| ... | Incidence matrices with regions in rows and industries in columns (period ... - optional) |
|---|---|

A data frame representing the entry events from multiple regions - industries matrices, with columns "region" (representing the region), "industry" (representing the industry), "entry" (representing the entry event), and "period" (representing the period)
Pierre-Alexandre Balland
Wolf-Hendrik Uhlbach
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114

    ## generate a first region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 1)
    set.seed(31)
    mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
    rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat1) <- c("I1", "I2", "I3", "I4")

    ## generate a second region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 2)
    mat2 <- mat1
    mat2[3, 1] <- 1

    ## run the function
    entry_list(mat1, mat2)

    ## generate a third region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 3)
    mat3 <- mat2
    mat3[5, 2] <- 1

    ## run the function
    entry_list(mat1, mat2, mat3)

    ## generate a fourth region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 4)
    mat4 <- mat3
    mat4[5, 4] <- 1

    ## run the function
    entry_list(mat1, mat2, mat3, mat4)
This function generates a matrix of entry events from two regions - industries matrices (different matrix compositions are allowed)
entry_mat(mat1, mat2)

| mat1 | An incidence matrix with regions in rows and industries in columns (period 1) |
|---|---|
| mat2 | An incidence matrix with regions in rows and industries in columns (period 2) |

A matrix representing the entry events from two regions - industries matrices, with rows representing regions and columns representing industries
Pierre-Alexandre Balland
Wolf-Hendrik Uhlbach
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
growth_list, entry_list, exit_list

    ## generate a first region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 1)
    set.seed(31)
    mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
    rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat1) <- c("I1", "I2", "I3", "I4")

    ## generate a second region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 2)
    mat2 <- mat1
    mat2[3, 1] <- 1

    ## run the function
    entry_mat(mat1, mat2)
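The notion of an entry event can be sketched as a cell that is 0 in period 1 and 1 in period 2. The base-R sketch below is illustrative only. The helper `entry_sketch` is hypothetical and assumes both matrices are binary and share the same row and column names, whereas the package also accepts different matrix compositions. How the package codes cells that were already present in period 1 is not reproduced here.

```r
# Hypothetical helper for illustration only; not part of EconGeo.
# Assumes binary matrices with identical row and column names.
entry_sketch <- function(mat1, mat2) {
  stopifnot(identical(dimnames(mat1), dimnames(mat2)))
  (mat1 == 0 & mat2 == 1) * 1   # 1 where an industry appears in a region
}

mat1 <- matrix(
  c(1, 0,
    0, 0,
    1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("R1", "R2", "R3"), c("I1", "I2"))
)
mat2 <- mat1
mat2["R2", "I1"] <- 1   # R2 enters I1
mat2["R3", "I2"] <- 0   # an exit, which must not count as an entry

entries <- entry_sketch(mat1, mat2)
print(entries)
```

An exit event is the mirror image (1 in period 1, 0 in period 2), so the same sketch with the two conditions swapped would flag the R3 - I2 cell instead.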
This function generates a data frame of exit events from multiple regions - industries matrices (different matrix compositions are allowed). In this function, the maximum number of periods is limited to 20.
exit_list(...)

| ... | Incidence matrices with regions in rows and industries in columns (period ... - optional) |
|---|---|

A data frame representing the exit events from multiple regions - industries matrices, with columns "region" (representing the region), "industry" (representing the industry), "exit" (representing the exit event), and "period" (representing the period)
Pierre-Alexandre Balland
Wolf-Hendrik Uhlbach
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114

    ## generate a first region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 1)
    set.seed(31)
    mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
    rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat1) <- c("I1", "I2", "I3", "I4")

    ## generate a second region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 2)
    mat2 <- mat1
    mat2[2, 1] <- 0

    ## run the function
    exit_list(mat1, mat2)

    ## generate a third region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 3)
    mat3 <- mat2
    mat3[5, 1] <- 0

    ## run the function
    exit_list(mat1, mat2, mat3)

    ## generate a fourth region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 4)
    mat4 <- mat3
    mat4[5, 3] <- 0

    ## run the function
    exit_list(mat1, mat2, mat3, mat4)
This function generates a matrix of exit events from two regions - industries matrices (different matrix compositions are allowed)
exit_mat(mat1, mat2)

| mat1 | An incidence matrix with regions in rows and industries in columns (period 1) |
|---|---|
| mat2 | An incidence matrix with regions in rows and industries in columns (period 2) |

A matrix representing the exit events from two regions - industries matrices, with rows representing regions and columns representing industries
Pierre-Alexandre Balland
Wolf-Hendrik Uhlbach
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
growth_list, exit_list, entry_list

    ## generate a first region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 1)
    set.seed(31)
    mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
    rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat1) <- c("I1", "I2", "I3", "I4")

    ## generate a second region - industry matrix in which cells represent the presence/absence
    ## of a RCA (period 2)
    mat2 <- mat1
    mat2[2, 1] <- 0

    ## run the function
    exit_mat(mat1, mat2)
This function computes the expy index of regions from (incidence) regions - industries matrices, as proposed by Hausmann, Hwang & Rodrik (2007). The index is a measure of the productivity level associated with a region's specialization pattern.
expy(mat, vec)

| mat | An incidence matrix with regions in rows and industries in columns |
|---|---|
| vec | A vector that gives GDP, R&D, education or any other relevant regional attribute that will be used to compute the weighted average for each industry |

A numeric vector representing the expy index of regions computed from the regions - industries matrix
Pierre-Alexandre Balland
Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123
Hausmann, R., Hwang, J. & Rodrik, D. (2007) What you export matters, Journal of economic growth 12: 1-25.

    ## generate a region - industry matrix
    set.seed(31)
    mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
    rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat) <- c("I1", "I2", "I3", "I4")

    ## a vector of GDP of regions
    vec <- c(5, 10, 15, 25, 50)

    ## run the function
    expy(mat, vec)
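The two-step logic behind the index in Hausmann, Hwang & Rodrik (2007) can be sketched as follows. First, each industry receives a productivity level, computed as a weighted average of the regional attribute in `vec`. Second, each region's expy is the average of those industry levels, weighted by the region's own industry shares. This is a base-R sketch of that published definition, not a copy of the package's code, which may differ in detail. The helper `expy_sketch` is hypothetical and assumes every region and every industry has a positive total.

```r
# Hypothetical helper for illustration only; not part of EconGeo.
# Assumes every row and every column of `mat` has a positive sum.
expy_sketch <- function(mat, vec) {
  shares <- mat / rowSums(mat)                       # industry shares per region
  weights <- sweep(shares, 2, colSums(shares), "/")  # normalise shares per industry
  prody <- colSums(weights * vec)                    # industry-level weighted average of vec
  drop(shares %*% prody)                             # region-level average of prody
}

mat <- matrix(
  c(5,  5,
    0, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("R1", "R2"), c("I1", "I2"))
)
vec <- c(100, 200)   # e.g. GDP per capita of R1 and R2

e <- expy_sketch(mat, vec)
print(e)
```

In this toy case I1 is found only in R1, so its productivity level equals R1's attribute (100), while I2 is weighted towards R2. R2, being fully specialized in I2, therefore scores higher than R1.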
This function creates regular data frames with three columns (regions, industries, count) from (incidence) matrices (wide to long format) using the reshape2 package
get_list(mat)

| mat | An incidence matrix with regions in rows and industries in columns (or the other way around) |
|---|---|

A data frame with three columns: "Region" (representing the region), "Industry" (representing the industry), and "Count" (representing the count of occurrences)
Pierre-Alexandre Balland

    ## generate a region - industry matrix
    set.seed(31)
    mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
    rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat) <- c("I1", "I2", "I3", "I4")

    ## run the function
    get_list(mat)
This function creates regions - industries (incidence) matrices from regular data frames (long to wide format) using the reshape2 package or the Matrix package
get_matrix(my_data, sparse = FALSE)

| my_data | A data frame with three columns (regions, industries, count) |
|---|---|
| sparse | Logical; should the returned output be a sparse matrix? Defaults to FALSE, but can be set to TRUE if the dataset is very large |

A regions - industries matrix in either dense or sparse format, depending on the value of the "sparse" parameter
Pierre-Alexandre Balland

    ## generate a region - industry data frame
    set.seed(31)
    region <- c("R1", "R1", "R1", "R1", "R2", "R2", "R3", "R4", "R5", "R5")
    industry <- c("I1", "I2", "I3", "I4", "I1", "I2", "I1", "I1", "I3", "I3")
    my_data <- data.frame(region, industry)
    my_data$count <- 1

    ## run the function
    get_matrix(my_data)
    get_matrix(my_data, sparse = TRUE)
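The wide-to-long and long-to-wide conversions can also be illustrated with base R alone. The sketch below is not how the package does it (the package relies on reshape2 or Matrix); it uses `as.data.frame(as.table(...))` and `tapply()` as stand-ins, and the object names are hypothetical. Summing counts for repeated region - industry pairs is an assumption of this sketch.

```r
# Illustration only, using base R rather than reshape2 or Matrix.
mat <- matrix(
  c(3, 0,
    1, 2,
    0, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(Region = c("R1", "R2", "R3"), Industry = c("I1", "I2"))
)

# wide to long: one row per region - industry pair
long <- as.data.frame(as.table(mat), responseName = "Count")

# long to wide: sum the counts within each region - industry pair
wide <- tapply(long$Count, list(long$Region, long$Industry), sum)

print(long)
print(wide)
```

The round trip recovers the original counts, which is a quick way to check that a long-format data frame has the three columns in the expected roles.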
This function computes the Gini coefficient. The Gini index measures spatial inequality. It ranges from 0 (perfect income equality) to 1 (perfect income inequality) and is derived from the Lorenz curve. The Gini coefficient is defined as a ratio of two areas derived from the Lorenz curve. The numerator is the area between the Lorenz curve of the distribution and the uniform distribution line (the 45-degree line). The denominator is the area under the uniform distribution line (the lower triangle). This index indicates how unequally an industry is distributed across n regions. Maximum inequality in the sample occurs when n-1 regions have a score of zero and one region has a positive score. The maximum value of the Gini coefficient is (n-1)/n, which approaches 1 (the theoretical upper limit) as the number of observations (regions) increases.
gini(mat)

| mat | A region - industry count matrix (or a numeric vector of counts for a single industry, as in the examples) |
|---|---|

The Gini coefficient or a data frame with the Gini coefficient for each industry (if the input is a matrix with multiple columns)
Pierre-Alexandre Balland
Gini, C. (1921) Measurement of Inequality of Incomes, The Economic Journal 31: 124-126
hoover_gini, locational_gini, locational_gini_curve, lorenz_curve, hoover_curve

    ## generate vectors of industrial count
    ind <- c(0, 10, 10, 30, 50)

    ## run the function
    gini(ind)

    ## generate a region - industry matrix
    mat <- matrix(
      c(
        0, 1, 0, 0,
        0, 1, 0, 0,
        0, 1, 0, 0,
        0, 1, 0, 1,
        0, 1, 1, 1
      ),
      ncol = 4, byrow = TRUE
    )
    rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat) <- c("I1", "I2", "I3", "I4")

    ## run the function
    gini(mat)

    ## run the function by aggregating all industries
    gini(rowSums(mat))

    ## run the function for industry #1 only (perfect equality)
    gini(mat[, 1])

    ## run the function for industry #2 only (perfect equality)
    gini(mat[, 2])

    ## run the function for industry #3 only (perfect inequality: max gini = (5-1)/5)
    gini(mat[, 3])

    ## run the function for industry #4 only (top 40% produces 100% of the output)
    gini(mat[, 4])
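The (n-1)/n upper bound mentioned in the description can be checked with a short base-R sketch. It uses a standard closed form of the Gini coefficient for sorted values, G = sum((2i - n - 1) * x_i) / (n * sum(x)), which is not necessarily how the package computes it. The helper `gini_sketch` is hypothetical, and an all-zero vector yields 0/0 (NaN) in this sketch.

```r
# Hypothetical helper for illustration only; not part of EconGeo.
# Closed form of the Gini coefficient for a non-negative numeric vector.
gini_sketch <- function(x) {
  x <- sort(x)          # ascending order, as along the Lorenz curve
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

g_counts <- gini_sketch(c(0, 10, 10, 30, 50))  # mixed distribution
g_equal <- gini_sketch(rep(1, 5))              # every region has the same count
g_max <- gini_sketch(c(0, 0, 0, 0, 1))         # one region holds everything

print(c(g_counts, g_equal, g_max))
```

With five regions and a single non-zero region, the sketch returns (5-1)/5 = 0.8, the sample maximum stated in the description, and equal counts give 0.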
This function generates a matrix of industrial growth by industries from two regions - industries matrices (same matrix composition from two different periods)
growth_ind(mat1, mat2)

| mat1 | An incidence matrix with regions in rows and industries in columns (period 1) |
|---|---|
| mat2 | An incidence matrix with regions in rows and industries in columns (period 2) |

A matrix of industrial growth by industries
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114

    ## generate a first region - industry matrix with full count (period 1)
    set.seed(31)
    mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
    rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat1) <- c("I1", "I2", "I3", "I4")

    ## generate a second region - industry matrix with full count (period 2)
    mat2 <- mat1
    mat2[3, 1] <- 8

    ## run the function
    growth_ind(mat1, mat2)
This function generates a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods). In this function, the maximum number of periods is limited to 20.
growth_list(...)

| ... | Incidence matrices with regions in rows and industries in columns (period ... - optional) |
|---|---|

A data frame of industrial growth in regions
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114

    ## generate a first region - industry matrix with full count (period 1)
    set.seed(31)
    mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
    rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat1) <- c("I1", "I2", "I3", "I4")

    ## generate a second region - industry matrix with full count (period 2)
    mat2 <- mat1
    mat2[3, 1] <- 8

    ## run the function
    growth_list(mat1, mat2)

    ## generate a third region - industry matrix with full count (period 3)
    mat3 <- mat2
    mat3[5, 2] <- 1

    ## run the function
    growth_list(mat1, mat2, mat3)

    ## generate a fourth region - industry matrix with full count (period 4)
    mat4 <- mat3
    mat4[5, 4] <- 1

    ## run the function
    growth_list(mat1, mat2, mat3, mat4)
This function generates a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods). In this function, the maximum number of periods is limited to 20.
growth_list_ind(...)

| ... | Incidence matrices with regions in rows and industries in columns (period ... - optional) |
|---|---|

A data frame of industrial growth in regions
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
growth_list, entry_list, exit_list

    ## generate a first region - industry matrix with full count (period 1)
    set.seed(31)
    mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
    rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat1) <- c("I1", "I2", "I3", "I4")

    ## generate a second region - industry matrix with full count (period 2)
    mat2 <- mat1
    mat2[3, 1] <- 8

    ## run the function
    growth_list_ind(mat1, mat2)

    ## generate a third region - industry matrix with full count (period 3)
    mat3 <- mat2
    mat3[5, 2] <- 1

    ## run the function
    growth_list_ind(mat1, mat2, mat3)

    ## generate a fourth region - industry matrix with full count (period 4)
    mat4 <- mat3
    mat4[5, 4] <- 1

    ## run the function
    growth_list_ind(mat1, mat2, mat3, mat4)
This function generates a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods). In this function, the maximum number of periods is limited to 20.
growth_list_reg(...)

| ... | Incidence matrices with regions in rows and industries in columns (period ... - optional) |
|---|---|

A data frame of region growth from multiple regions - industries matrices
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
growth_list, entry_list, exit_list

    ## generate a first region - industry matrix with full count (period 1)
    set.seed(31)
    mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
    rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
    colnames(mat1) <- c("I1", "I2", "I3", "I4")

    ## generate a second region - industry matrix with full count (period 2)
    mat2 <- mat1
    mat2[3, 1] <- 8

    ## run the function
    growth_list_reg(mat1, mat2)

    ## generate a third region - industry matrix with full count (period 3)
    mat3 <- mat2
    mat3[5, 2] <- 1

    ## run the function
    growth_list_reg(mat1, mat2, mat3)

    ## generate a fourth region - industry matrix with full count (period 4)
    mat4 <- mat3
    mat4[5, 4] <- 1

    ## run the function
    growth_list_reg(mat1, mat2, mat3, mat4)
This function generates a matrix of industrial growth in regions from two regions - industries matrices (same matrix composition from two different periods)
growth_mat(mat1, mat2)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2) |
A matrix of industrial growth in regions from two regions - industries matrices
Pierre-Alexandre Balland [email protected]
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
growth_list, entry_list, exit_list
## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8
## run the function
growth_mat(mat1, mat2)
This function generates a vector of industrial growth by regions from two regions - industries matrices (same matrix composition from two different periods)
growth_reg(mat1, mat2)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2) |
A vector of industrial growth by regions from two regions - industries matrices
Pierre-Alexandre Balland [email protected]
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
growth_list, entry_list, exit_list
## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8
## run the function
growth_reg(mat1, mat2)
This function computes the Hachman index from regions - industries matrices. The Hachman index indicates how closely the industrial distribution of a region resembles the one of a more global economy (nation, world). The index varies between 0 (extreme dissimilarity between the region and the more global economy) and 1 (extreme similarity between the region and the more global economy)
hachman(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
A vector of Hachman index values indicating the similarity between the industrial distribution of a region and a more global economy
Pierre-Alexandre Balland [email protected]
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
hachman(mat)
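As a rough illustration of the definition above, the index can be sketched in base R as the reciprocal of the sum, over industries, of each regional share weighted by its location quotient. This is one common formulation and is assumed here; `hachman_sketch` is a hypothetical helper, not the package's `hachman()`, and it does not handle industries or regions with a zero total.

```r
## Hypothetical helper: a sketch of one common Hachman formulation (assumption)
hachman_sketch <- function(mat) {
  reg_share <- mat / rowSums(mat)            # industry shares within each region
  ref_share <- colSums(mat) / sum(mat)       # industry shares in the reference economy
  lq <- sweep(reg_share, 2, ref_share, "/")  # location quotients
  1 / rowSums(lq * reg_share)                # reciprocal of the LQ-weighted shares
}

## two regions with the same industrial mix as the aggregate
same_mix <- matrix(c(10, 10, 30, 30), ncol = 2, byrow = TRUE)
## two fully specialized regions
specialized <- matrix(c(10, 0, 0, 10), ncol = 2, byrow = TRUE)

h_same <- hachman_sketch(same_mix)
h_spec <- hachman_sketch(specialized)
```

Under this formulation, regions that mirror the aggregate economy score 1 and the score falls toward 0 as a region's mix diverges from it.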
This function computes the Herfindahl index from (incidence) regions - industries matrices. This index is also known as the Herfindahl-Hirschman index.
herfindahl(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
A vector of Herfindahl index values indicating the concentration of industries within regions
Pierre-Alexandre Balland [email protected]
Herfindahl, O.C. (1959) Copper Costs and Prices: 1870-1957. Baltimore: The Johns Hopkins Press.
Hirschman, A.O. (1945) National Power and the Structure of Foreign Trade, Berkeley and Los Angeles: University of California Press.
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
herfindahl(mat)
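The textbook Herfindahl index is the sum of squared industry shares within each region. A minimal base R sketch under that assumption follows; `herfindahl_sketch` is a hypothetical name and may not match the package's handling of edge cases such as empty regions.

```r
## Hypothetical helper: sum of squared within-region industry shares
herfindahl_sketch <- function(mat) {
  shares <- mat / rowSums(mat)  # each row divided by its own total
  rowSums(shares^2)
}

mat <- matrix(
  c(
    100, 0, 0, 0,
    25, 25, 25, 25
  ),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("R1", "R2"), c("I1", "I2", "I3", "I4"))
)
hhi <- herfindahl_sketch(mat)
```

A region with a single industry reaches the maximum of 1, while a region spread evenly over n industries reaches the minimum of 1/n.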
This function plots a Hoover curve from regions - industries matrices.
hoover_curve(mat, pop, plot = TRUE, pdf = FALSE, pdf_location = NULL)
mat |
An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column). |
pop |
A vector of population regional count |
plot |
Logical; shall the curve be automatically plotted? Defaults to TRUE. If set to FALSE, the function will return x y coordinates that you can later use to plot and customize the curve. |
pdf |
Logical; shall a pdf be saved? Defaults to FALSE. If set to TRUE, a pdf will be compiled and saved to R's temp dir if no 'pdf_location' is specified. |
pdf_location |
Output location of pdf file |
If 'plot = FALSE', a list containing the cumulative distribution of population shares ('cum.reg') and industry shares ('cum.out') is returned. If 'plot = TRUE', no return value is specified.
Pierre-Alexandre Balland [email protected]
Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171
hoover_gini, locational_gini, locational_gini_curve, lorenz_curve, gini
## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
## run the function (30% of the population produces 50% of the industrial output)
hoover_curve(ind, pop)
hoover_curve(ind, pop, pdf = FALSE)
hoover_curve(ind, pop, plot = FALSE)
## generate a region - industry matrix
mat <- matrix(
  c(
    0, 10, 0, 0,
    0, 15, 0, 0,
    0, 20, 0, 0,
    0, 25, 0, 1,
    0, 30, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
hoover_curve(mat, pop)
hoover_curve(mat, pop, plot = FALSE)
## run the function by aggregating all industries
hoover_curve(rowSums(mat), pop)
hoover_curve(rowSums(mat), pop, plot = FALSE)
## run the function for industry #1 only
hoover_curve(mat[, 1], pop)
hoover_curve(mat[, 1], pop, plot = FALSE)
## run the function for industry #2 only (perfectly proportional to population)
hoover_curve(mat[, 2], pop)
hoover_curve(mat[, 2], pop, plot = FALSE)
## run the function for industry #3 only (30% of the pop. produces 100% of the output)
hoover_curve(mat[, 3], pop)
hoover_curve(mat[, 3], pop, plot = FALSE)
## run the function for industry #4 only (55% of the pop. produces 100% of the output)
hoover_curve(mat[, 4], pop)
hoover_curve(mat[, 4], pop, plot = FALSE)
## compare the distributions of the industries
oldpar <- par(mfrow = c(2, 2)) # Save the current graphical parameter settings
hoover_curve(mat[, 1], pop)
hoover_curve(mat[, 2], pop)
hoover_curve(mat[, 3], pop)
hoover_curve(mat[, 4], pop)
par(oldpar) # Reset the graphical parameters to their original values
## save output as pdf
hoover_curve(mat, pop, pdf = TRUE)
## To specify an output directory for the pdf,
## specify 'pdf_location', for instance as '/Users/jones/hoover_curve.pdf'
## hoover_curve(mat, pop, pdf = TRUE, pdf_location = '/Users/jones/hoover_curve.pdf')
This function computes the Hoover Gini, named after Edgar M. Hoover. The Hoover Gini is a measure of spatial inequality. It ranges from 0 (perfect equality) to 1 (perfect inequality) and is calculated from the Hoover curve associated with a given distribution of population, industries or technologies and a reference category. In this sense, it is closely related to the Gini coefficient and the Hoover index. The numerator is the area between the Hoover curve of the distribution and the uniform distribution line (45 degree line). The denominator is the area under the uniform distribution line (the lower triangle).
hoover_gini(mat, pop)
mat |
An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column). |
pop |
A vector of population regional count |
The Hoover Gini value(s). If the input matrix has a single column, the function returns a numeric value representing the Hoover Gini index. If the input matrix has multiple columns, the function returns a data frame with two columns: "Industry" (names of the industries) and "hoover_gini" (corresponding Hoover Gini values).
Pierre-Alexandre Balland [email protected]
Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171
hoover_curve, locational_gini, locational_gini_curve, lorenz_curve, gini
## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
## run the function (30% of the population produces 50% of the industrial output)
hoover_gini(ind, pop)
## generate a region - industry matrix
mat <- matrix(
  c(
    0, 10, 0, 0,
    0, 15, 0, 0,
    0, 20, 0, 0,
    0, 25, 0, 1,
    0, 30, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
hoover_gini(mat, pop)
## run the function by aggregating all industries
hoover_gini(rowSums(mat), pop)
## run the function for industry #1 only
hoover_gini(mat[, 1], pop)
## run the function for industry #2 only (perfectly proportional to population)
hoover_gini(mat[, 2], pop)
## run the function for industry #3 only (30% of the pop. produces 100% of the output)
hoover_gini(mat[, 3], pop)
## run the function for industry #4 only (55% of the pop. produces 100% of the output)
hoover_gini(mat[, 4], pop)
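The area ratio described for the Hoover Gini can be sketched in base R for a single industry vector. The sketch assumes regions are ranked by their industry-to-population share ratio and that the area under the Hoover curve is approximated with the trapezoid rule; `hoover_gini_sketch` is a hypothetical helper and the package's `hoover_gini()` may be implemented differently.

```r
## Hypothetical helper: Gini-type coefficient from the Hoover curve (assumption)
hoover_gini_sketch <- function(ind, pop) {
  ind_share <- ind / sum(ind)
  pop_share <- pop / sum(pop)
  ord <- order(ind_share / pop_share)  # rank regions by location quotient
  x <- c(0, cumsum(pop_share[ord]))    # cumulative population share
  y <- c(0, cumsum(ind_share[ord]))    # cumulative industry share
  n <- length(y)
  area <- sum(diff(x) * (y[-n] + y[-1]) / 2)  # area under the curve (trapezoids)
  1 - 2 * area  # (0.5 - area) / 0.5: area between curve and diagonal over lower triangle
}

ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
g_unequal <- hoover_gini_sketch(ind, pop)
g_proportional <- hoover_gini_sketch(pop, pop)
```

With these inputs the area under the curve is 0.345, so the coefficient is 0.31; an industry distributed exactly like the population gives 0.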
This function computes the Hoover index, named after Edgar M. Hoover. The Hoover index is a measure of spatial inequality. It ranges from 0 (perfect equality) to 100 (perfect inequality) and is calculated from the Lorenz curve associated with a given distribution of population, industries or technologies. In this sense, it is closely related to the Gini coefficient. The Hoover index represents the maximum vertical distance between the Lorenz curve and the 45 degree line of perfect spatial equality. It indicates the proportion of industries, jobs, or population that would need to be transferred from the top to the bottom of the distribution to achieve perfect spatial equality. The Hoover index is also known as the Robin Hood index in studies of income inequality.
Computation of the Hoover index:
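A common statement of the index, consistent with the 0 to 100 range described above, is given here as an assumption. The symbols are chosen for illustration: $x_i$ is the industry count of region $i$, $p_i$ its population (or other reference) count, and $N$ the number of regions.

```latex
H = 100 \times \frac{1}{2} \sum_{i=1}^{N}
    \left| \frac{x_i}{\sum_{j=1}^{N} x_j} - \frac{p_i}{\sum_{j=1}^{N} p_j} \right|
```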
hoover_index(mat, pop)
mat |
An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column). |
pop |
A vector of population regional count; if this argument is missing an equal distribution of the reference group will be assumed. |
The Hoover index value(s) as either a numeric value or a data frame with two columns: "Industry" (names of the industries) and "hoover_index" (corresponding Hoover index values).
Pierre-Alexandre Balland [email protected]
Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171
hoover_curve, hoover_gini, locational_gini, locational_gini_curve, lorenz_curve, gini
## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
## run the function (30% of the population produces 50% of the industrial output)
hoover_index(ind, pop)
## generate a region - industry matrix
mat <- matrix(
  c(
    0, 10, 0, 0,
    0, 15, 0, 0,
    0, 20, 0, 0,
    0, 25, 0, 1,
    0, 30, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
hoover_index(mat, pop)
## run the function by aggregating all industries
hoover_index(rowSums(mat), pop)
## run the function for industry #1 only
hoover_index(mat[, 1], pop)
## run the function for industry #2 only (perfectly proportional to population)
hoover_index(mat[, 2], pop)
## run the function for industry #3 only (30% of the pop. produces 100% of the output)
hoover_index(mat[, 3], pop)
## run the function for industry #4 only (55% of the pop. produces 100% of the output)
hoover_index(mat[, 4], pop)
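For a single industry vector, the index can be checked by hand with a short base R sketch. It assumes the usual half-sum of absolute share differences, scaled to 0 to 100, and an equal reference distribution when `pop` is missing; `hoover_index_sketch` is a hypothetical name, not the package function.

```r
## Hypothetical helper: half the sum of absolute share differences, times 100
hoover_index_sketch <- function(ind, pop = rep(1, length(ind))) {
  50 * sum(abs(ind / sum(ind) - pop / sum(pop)))
}

ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
h_unequal <- hoover_index_sketch(ind, pop)
h_proportional <- hoover_index_sketch(pop, pop)
```

Here the absolute share differences are 0.10, 0.05, 0.10, 0.05 and 0.20, so a quarter of the industrial count (index of 25) would have to move between regions to match the population distribution.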
This function computes a measure of complexity from the inverse of the normalized ubiquity of industries. We divide the logarithm of the total count (employment, number of firms, number of patents, ...) in an industry by its ubiquity. Ubiquity is given by the number of regions in which an industry can be found (location quotient > 1) from regions - industries (incidence) matrices
inv_norm_ubiquity(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
A vector of complexity values computed from the inverse of the normalized ubiquity of industries.
Pierre-Alexandre Balland [email protected]
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
diversity, location_quotient, ubiquity, tci, mort
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
inv_norm_ubiquity(mat)
This function computes an index of knowledge complexity of regions using the eigenvector method from regions - industries (incidence) matrices. Technically, the function returns the eigenvector associated with the second largest eigenvalue of the projected region - region matrix.
kci(mat, rca = FALSE)
mat |
An incidence matrix with regions in rows and industries in columns |
rca |
Logical; should the index of revealed comparative advantage (RCA - also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix - 0/1 - is expected as an input), but can be set to TRUE if the index of revealed comparative advantage first needs to be computed |
A vector representing the index of knowledge complexity of regions computed using the eigenvector method.
Pierre-Alexandre Balland [email protected]
Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570 - 10575.
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
location_quotient, ubiquity, diversity, morc, tci, mort
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
kci(mat, rca = TRUE)
## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
kci(mat)
## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
my_data <- data.frame(countries, products)
my_data$freq <- 1
mat <- get_matrix(my_data)
## run the function
kci(mat)
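The eigenvector method can be sketched in base R on the small Hidalgo and Hausmann network used above. The sketch assumes the region - region projection is the row-standardized product of the binary matrix with itself (rows scaled by diversity, columns by ubiquity); `kci_sketch` is a hypothetical helper, and the sign and scaling of its output are arbitrary and may differ from what `kci()` returns.

```r
## Hypothetical helper: second eigenvector of a region - region projection (assumption)
kci_sketch <- function(mat) {
  d <- rowSums(mat)                # diversity of each region
  u <- colSums(mat)                # ubiquity of each industry
  w <- (mat / d) %*% (t(mat) / u)  # D^-1 M U^-1 M': rows sum to 1
  e <- eigen(w)
  v <- Re(e$vectors[, 2])          # eigenvector of the second largest eigenvalue
  names(v) <- rownames(mat)
  list(eigenvalues = Re(e$values), kci = v)
}

## binary country - product matrix of the simple network (C1 to C4, P1 to P4)
mat <- matrix(
  c(
    1, 1, 1, 1,
    0, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 0, 1
  ),
  ncol = 4, byrow = TRUE,
  dimnames = list(c("C1", "C2", "C3", "C4"), c("P1", "P2", "P3", "P4"))
)
res <- kci_sketch(mat)
```

Because the projected matrix is row-stochastic, its largest eigenvalue is 1 and the associated eigenvector is constant, which is why the second eigenvector is the informative one.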
This function computes the Krugman index from regions - industries matrices. The higher the coefficient, the greater the regional specialization. This index is often referred to as the Krugman specialisation index and measures the distance between the distributions of industry shares in a region and at a more aggregated level (country for instance).
krugman_index(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
A vector representing the Krugman index of regional specialization computed from the regions - industries matrix.
Pierre-Alexandre Balland [email protected]
Krugman P. (1991) Geography and Trade, MIT Press, Cambridge
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
krugman_index(mat)
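The distance between a region's industry shares and the aggregate shares can be sketched in base R as a sum of absolute differences. This assumes the reference distribution is the total over all regions (some formulations exclude the region itself), and `krugman_sketch` is a hypothetical helper rather than the package's `krugman_index()`.

```r
## Hypothetical helper: sum of absolute differences between regional and aggregate shares
krugman_sketch <- function(mat) {
  reg_share <- mat / rowSums(mat)       # industry shares within each region
  ref_share <- colSums(mat) / sum(mat)  # industry shares at the aggregate level
  rowSums(abs(sweep(reg_share, 2, ref_share, "-")))
}

specialized <- matrix(c(10, 0, 0, 10), ncol = 2, byrow = TRUE)
identical_mix <- matrix(c(50, 50, 50, 50), ncol = 2, byrow = TRUE)
k_spec <- krugman_sketch(specialized)
k_same <- krugman_sketch(identical_mix)
```

Under this formulation a region with the same mix as the aggregate scores 0, and the score grows as the region specializes.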
This function computes location quotients from (incidence) regions - industries matrices. The numerator is the share of a given industry in a given region. The denominator is the share of this industry in a larger economy (overall country for instance). This index is also referred to as the index of Revealed Comparative Advantage (RCA) following Balassa (1965), or the Hoover-Balassa index.
location_quotient(mat, binary = FALSE)
mat |
An incidence matrix with regions in rows and industries in columns |
binary |
Logical; shall the returned output be a dichotomized version (0/1) of the location quotient? Defaults to FALSE (the full values of the location quotient will be returned), but can be set to TRUE (location quotient values above 1 will be set to 1 & location quotient values below 1 will be set to 0) |
A matrix of location quotients computed from the regions - industries matrix. If the 'binary' parameter is set to TRUE, the returned matrix will contain binary values (0/1) representing the location quotient. If 'binary' is set to FALSE, the full values of the location quotient will be returned.
Pierre-Alexandre Balland [email protected]
Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123.
## generate a region - industry matrix
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")
## run the function
location_quotient(mat)
location_quotient(mat, binary = TRUE)
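The ratio of shares described above can be reproduced in a few lines of base R on the example matrix. `lq_sketch` is a hypothetical helper, and the strict `> 1` cut-off used for the dichotomized version is an assumption, since the treatment of values exactly equal to 1 is not stated.

```r
## Hypothetical helper: regional industry share over aggregate industry share
lq_sketch <- function(mat) {
  reg_share <- mat / rowSums(mat)       # share of each industry within a region
  ref_share <- colSums(mat) / sum(mat)  # share of each industry in the larger economy
  sweep(reg_share, 2, ref_share, "/")
}

mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE,
  dimnames = list(paste0("R", 1:5), paste0("I", 1:5))
)
lq <- lq_sketch(mat)
lq_binary <- (lq > 1) * 1  # assumed strict threshold
```

In this matrix every region and every industry totals 100 out of 500, so each aggregate share is 0.2 and the location quotient is simply the cell value divided by 20: 5 for R1 in I1 and 3.5 for R2 in I4.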
This function computes the average location quotients of regions from (incidence) regions - industries matrices. This index is also referred to as the coefficient of specialization (Hoover and Giarratani, 1985).
location_quotient_avg(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
A vector of average location quotients computed for each region from the regions - industries matrix. The average location quotient represents the degree of specialization of each region in different industries.
Pierre-Alexandre Balland [email protected]
Hoover, E.M. and Giarratani, F. (1985) An Introduction to Regional Economics. 3rd edition. New York: Alfred A. Knopf
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
## generate a region - industry matrix
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")
## run the function
location_quotient_avg(mat)
This function computes the locational Gini coefficient as proposed by Krugman from regions - industries matrices. The higher the coefficient (theoretical limit = 0.5), the greater the industrial concentration. The locational Gini of an industry that is not localized at all (perfectly spread out) in proportion to overall employment would be 0.
locational_gini(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
A data frame with two columns: "Industry" and "Loc_gini". The "Industry" column contains the names of the industries, and the "Loc_gini" column contains the locational Gini coefficient computed for each industry from the regions - industries matrix.
Pierre-Alexandre Balland [email protected]
Krugman P. (1991) Geography and Trade, MIT Press, Cambridge (chapter 2 - p.56)
hoover_gini, locational_gini_curve, hoover_curve, lorenz_curve, gini
## generate a region - industry matrix
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")
## run the function
locational_gini(mat)
This function plots a locational Gini curve following Krugman from regions - industries matrices.
locational_gini_curve(mat, pdf = FALSE, pdf_location = NULL)
mat |
An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column). |
pdf |
Logical; shall a pdf be saved? Defaults to FALSE. If set to TRUE, a pdf will be compiled and saved to R's temp dir if no 'pdf_location' is specified. |
pdf_location |
Output location of pdf file |
No return value, produces a plot or pdf.
Pierre-Alexandre Balland [email protected]
Krugman P. (1991) Geography and Trade, MIT Press, Cambridge (chapter 2 - p.56)
hoover_gini, locational_gini, hoover_curve, lorenz_curve, gini
## generate a region - industry matrix
mat <- matrix(
  c(
    100, 0, 0, 0, 0,
    0, 15, 5, 70, 10,
    0, 20, 10, 20, 50,
    0, 25, 30, 5, 40,
    0, 40, 55, 5, 0
  ),
  ncol = 5, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")
## run the function (shows industry #5)
locational_gini_curve(mat, pdf = FALSE)
## save output as pdf
locational_gini_curve(mat, pdf = TRUE)
## To specify an output directory for the pdf,
## specify 'pdf_location', for instance as '/Users/jones/locational_gini_curve.pdf'
## locational_gini_curve(mat, pdf = TRUE, pdf_location = '/Users/jones/locational_gini_curve.pdf')
This function plots a Lorenz curve from regional industrial counts. This curve gives an indication of the unequal distribution of an industry across regions.
lorenz_curve(mat, plot = TRUE, pdf = TRUE, pdf_location = NULL)
mat |
An incidence matrix with regions in rows and industries in columns. The input can also be a vector of industrial regional count (a matrix with n regions in rows and a single column). |
plot |
Logical; shall the curve be automatically plotted? Defaults to TRUE. If set to FALSE, the function will return x y coordinates that you can later use to plot and customize the curve. |
pdf |
Logical; shall a pdf be saved? Defaults to FALSE. If set to TRUE, a pdf will be compiled and saved to R's temp dir if no 'pdf_location' is specified. |
pdf_location |
Output location of pdf file |
If 'plot = FALSE', the function returns a list with two components: - 'cum.reg': A vector of cumulative proportions of regions. - 'cum.out': A vector of cumulative proportions of industrial output. If 'plot = TRUE', the function generates a plot of the Lorenz curve and does not return a value.
Pierre-Alexandre Balland [email protected]
Lorenz, M. O. (1905) Methods of measuring the concentration of wealth, Publications of the American Statistical Association 9: 209–219
hoover_gini, locational_gini, locational_gini_curve, hoover_curve, gini
```r
## generate vectors of industrial count
ind <- c(0, 10, 10, 30, 50)

## run the function
lorenz_curve(ind)
lorenz_curve(ind, plot = FALSE)

## generate a region - industry matrix
mat <- matrix(
  c(
    0, 1, 0, 0,
    0, 1, 0, 0,
    0, 1, 0, 0,
    0, 1, 0, 1,
    0, 1, 1, 1
  ),
  ncol = 4, byrow = TRUE
)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
lorenz_curve(mat)
lorenz_curve(mat, plot = FALSE)

## run the function by aggregating all industries
lorenz_curve(rowSums(mat))
lorenz_curve(rowSums(mat), plot = FALSE)

## run the function for industry #1 only (perfect equality)
lorenz_curve(mat[, 1])
lorenz_curve(mat[, 1], plot = FALSE)

## run the function for industry #2 only (perfect equality)
lorenz_curve(mat[, 2])
lorenz_curve(mat[, 2], plot = FALSE)

## run the function for industry #3 only (perfect inequality)
lorenz_curve(mat[, 3])
lorenz_curve(mat[, 3], plot = FALSE)

## run the function for industry #4 only (top 40% produces 100% of the output)
lorenz_curve(mat[, 4])
lorenz_curve(mat[, 4], plot = FALSE)

## compare the distribution of the industries
oldpar <- par(mfrow = c(2, 2)) # save the current graphical parameter settings
lorenz_curve(mat[, 1])
lorenz_curve(mat[, 2])
lorenz_curve(mat[, 3])
lorenz_curve(mat[, 4])
par(oldpar) # reset the graphical parameters to their original values

## save output as pdf
lorenz_curve(mat, pdf = TRUE)

## To specify an output directory for the pdf,
## specify 'pdf_location', for instance as '/Users/jones/lorenz_curve.pdf'
## lorenz_curve(mat, pdf = TRUE, pdf_location = '/Users/jones/lorenz_curve.pdf')
```
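As a rough illustration of what a Lorenz curve summarises, the base-R sketch below computes the cumulative shares for a vector of counts by hand. The helper name `lorenz_points` is hypothetical and is not part of the package; `lorenz_curve()` may compute or plot its output differently.

```r
# Hypothetical helper (not part of EconGeo): cumulative shares of units and
# of the total count, i.e. the points that trace a Lorenz curve.
lorenz_points <- function(x) {
  x <- sort(x) # order units from smallest to largest
  data.frame(
    cum_pop = seq_along(x) / length(x), # cumulative share of units
    cum_out = cumsum(x) / sum(x)        # cumulative share of the total count
  )
}

ind <- c(0, 10, 10, 30, 50)
lorenz_points(ind)

# a vector like industry #4 in the examples: the bottom 60% hold nothing
lorenz_points(c(0, 0, 0, 1, 1))
```

In the second call the cumulative output share stays at 0 for the first three units, which is the sense in which the top 40% produce 100% of the output.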
This function rearranges the dimensions of a matrix based on the dimensions of another matrix
match_mat(fill, dim, missing = TRUE)
fill |
A matrix that will be used to populate the matrix output |
dim |
A matrix that will be used to determine the dimensions of the matrix output |
missing |
Logical; shall the cells of the non-matching rows/columns be set to NA? Defaults to TRUE, but can be set to FALSE to set the cells of the non-matching rows/columns to 0 instead. |
The matrix output with the dimensions rearranged based on the input 'dim' matrix.
Pierre-Alexandre Balland [email protected]
```r
## generate a first region - industry matrix
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")

## generate a second region - industry matrix
set.seed(31)
mat2 <- matrix(sample(0:1, 16, replace = TRUE), ncol = 4)
rownames(mat2) <- c("R1", "R2", "R3", "R5")
colnames(mat2) <- c("I1", "I2", "I3", "I4")

## run the function
match_mat(fill = mat1, dim = mat2)
match_mat(fill = mat2, dim = mat1)
match_mat(fill = mat2, dim = mat1, missing = FALSE)
```
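The behaviour described for `match_mat()` can be pictured with a small base-R sketch. It assumes that rows and columns are matched by name, which the documentation does not state explicitly; the helper `align_to` is hypothetical and is not the package's implementation.

```r
# Hypothetical helper (not the package's code): align 'fill' to the row and
# column names of 'dim', padding non-matching cells with NA or 0.
# Assumption: matching is done on row and column names.
align_to <- function(fill, dim, missing = TRUE) {
  out <- matrix(
    if (missing) NA else 0,
    nrow = nrow(dim), ncol = ncol(dim),
    dimnames = dimnames(dim)
  )
  rows <- intersect(rownames(dim), rownames(fill))
  cols <- intersect(colnames(dim), colnames(fill))
  out[rows, cols] <- fill[rows, cols] # copy the cells present in both matrices
  out
}

small <- matrix(1:4, ncol = 2, dimnames = list(c("R1", "R3"), c("I1", "I2")))
big <- matrix(0, nrow = 3, ncol = 2,
              dimnames = list(c("R1", "R2", "R3"), c("I1", "I2")))

align_to(fill = small, dim = big)                  # row R2 is filled with NA
align_to(fill = small, dim = big, missing = FALSE) # row R2 is filled with 0
```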
This function computes a measure of modular complexity of patent documents from technological classes - patents (incidence) matrices
modular_complexity(mat, sparse = FALSE, list = FALSE)
mat |
A bipartite adjacency matrix (can be a sparse matrix) |
sparse |
Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix |
list |
Logical; is the input a list? Defaults to FALSE (input = adjacency matrix), but can be set to TRUE if the input is an edge list |
A data frame with columns "patent" and "mod.comp" representing the patents and their corresponding modular complexity values.
Pierre-Alexandre Balland [email protected]
Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039
```r
## generate a technology - patent matrix
set.seed(31)
mat <- matrix(sample(0:1, 30, replace = TRUE), ncol = 5)
rownames(mat) <- c("T1", "T2", "T3", "T4", "T5", "T6")
colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")

## run the function
modular_complexity(mat)

## generate a technology - patent sparse matrix
library(Matrix)
smat <- Matrix(mat, sparse = TRUE)

## run the function
modular_complexity(smat, sparse = TRUE)

## generate a regular data frame (list)
my_list <- get_list(mat)

## run the function
modular_complexity(my_list, list = TRUE)
```
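For orientation, the interdependence measure of Fleming and Sorenson (2001) divides the number of classes on a patent by the summed "ease of recombination" of those classes. The base-R sketch below follows that definition on a single matrix, without the time window of "previous" patents used in the paper. The helper name `interdependence` is hypothetical, and `modular_complexity()` may differ in its details.

```r
# Hypothetical sketch (not the package's code) of Fleming and Sorenson's
# (2001) interdependence measure on a technology - patent matrix.
# Assumption: all patents in the matrix are used, with no time window.
interdependence <- function(mat) {
  mat <- (mat > 0) * 1            # binary technology - patent matrix
  cooc <- mat %*% t(mat)          # technology - technology co-occurrences
  diag(cooc) <- 0
  partners <- rowSums(cooc > 0)   # distinct classes combined with each class
  ease <- partners / rowSums(mat) # ease of recombination of each class
  n_classes <- colSums(mat)       # number of classes on each patent
  n_classes / colSums(mat * ease) # interdependence of each patent
}

mat <- matrix(
  c(1, 1, 1,
    1, 0, 1,
    0, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("T1", "T2", "T3"), c("P1", "P2", "P3"))
)
interdependence(mat)
```

Here T1 has been combined with two other classes over three patents (ease 2/3), T2 with one class over two patents (ease 1/2), and T3 with one class on one patent (ease 1), so P1 and P3 score 2 / (2/3 + 1/2) = 12/7 and P2 scores 2 / (2/3 + 1) = 6/5.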
This function computes a measure of average modular complexity of technologies (average complexity of patent documents in a given technological class) from technological classes - patents (incidence) matrices
modular_complexity_avg(mat, sparse = FALSE, list = FALSE)
mat |
A bipartite adjacency matrix (can be a sparse matrix) |
sparse |
Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix |
list |
Logical; is the input a list? Defaults to FALSE (input = adjacency matrix), but can be set to TRUE if the input is an edge list |
A data frame with columns "tech" and "avg.mod.comp" representing the technologies and their corresponding average modular complexity values.
Pierre-Alexandre Balland [email protected]
Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039
```r
## generate a technology - patent matrix
set.seed(31)
mat <- matrix(sample(0:1, 30, replace = TRUE), ncol = 5)
rownames(mat) <- c("T1", "T2", "T3", "T4", "T5", "T6")
colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")

## run the function
modular_complexity_avg(mat)

## generate a technology - patent sparse matrix
library(Matrix)
smat <- Matrix(mat, sparse = TRUE)

## run the function
modular_complexity_avg(smat, sparse = TRUE)

## generate a regular data frame (list)
my_list <- get_list(mat)

## run the function
modular_complexity_avg(my_list, list = TRUE)
```
This function computes an index of knowledge complexity of regions using the method of reflections from regions - industries (incidence) matrices. The index has been developed by Hidalgo and Hausmann (2009) for country - product matrices and adapted by Balland and Rigby (2017) to city - technology matrices.
morc(mat, rca = FALSE, steps = 20)
mat |
An incidence matrix with regions in rows and industries in columns |
rca |
Logical; should the index of relative comparative advantage (RCA - also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix - 0/1 - is expected as an input), but can be set to TRUE if the index of relative comparative advantage first needs to be computed |
steps |
Number of iteration steps. Defaults to 20, but can be set to 0 to give diversity (number of industries in which a region has an RCA), to 1 to give the average ubiquity of the industries in which a region has an RCA, to 2 to give the average diversity of regions that have similar industrial structures, or to any other number of steps less than or equal to 22. Note that above steps = 2 the index will be rescaled from 0 (minimum relative complexity) to 100 (maximum relative complexity). |
If 'steps' is set to 0, the function returns a numeric vector representing the diversification of regions. Otherwise, it returns a numeric vector representing the index of knowledge complexity of regions based on the specified number of iteration steps.
Pierre-Alexandre Balland [email protected]
Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570 - 10575.
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
location_quotient, ubiquity, diversity, kci, tci, mort
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
morc(mat, rca = TRUE)
morc(mat, rca = TRUE, steps = 0)
morc(mat, rca = TRUE, steps = 1)
morc(mat, rca = TRUE, steps = 2)

## generate a region - industry matrix in which cells represent the presence/absence of an RCA
set.seed(32)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
morc(mat)
morc(mat, steps = 0)
morc(mat, steps = 1)
morc(mat, steps = 2)

## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
my_data <- data.frame(countries, products)
my_data$freq <- 1
mat <- get_matrix(my_data)

## run the function
morc(mat)
morc(mat, steps = 0)
morc(mat, steps = 1)
morc(mat, steps = 2)
```
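The iteration behind the method of reflections can be written in a few lines of base R. The sketch below is an illustration only: the helper `reflections` is hypothetical, it works on a binary matrix, and it does not apply the 0 to 100 rescaling that `morc()` and `mort()` describe for more than two steps. The matrix is typed in by hand from the country - product pairs of the simple network used in the examples.

```r
# Hypothetical sketch of the method of reflections (Hidalgo and Hausmann 2009)
# on a binary region - industry matrix; not the package's implementation.
reflections <- function(mat, steps = 2) {
  kc0 <- rowSums(mat) # step 0 for regions: diversity
  kp0 <- colSums(mat) # step 0 for industries: ubiquity
  kc <- kc0
  kp <- kp0
  for (n in seq_len(steps)) {
    kc_new <- as.vector(mat %*% kp) / kc0    # mean industry value per region
    kp_new <- as.vector(t(mat) %*% kc) / kp0 # mean region value per industry
    kc <- kc_new
    kp <- kp_new
  }
  list(
    regions = setNames(kc, rownames(mat)),
    industries = setNames(kp, colnames(mat))
  )
}

# C1 makes P1-P4, C2 makes P2, C3 makes P3 and P4, C4 makes P4
mat <- matrix(
  c(1, 1, 1, 1,
    0, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 0, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("C", 1:4), paste0("P", 1:4))
)

reflections(mat, steps = 0)$regions    # diversity: 4, 1, 2, 1
reflections(mat, steps = 1)$regions    # average ubiquity: 2, 2, 2.5, 3
reflections(mat, steps = 1)$industries # average diversity: 4, 2.5, 3, 7/3
```

Even steps on the region side refine diversity and odd steps refine average ubiquity, which is why the two functions document the meaning of steps 0, 1 and 2 separately.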
This function computes an index of knowledge complexity of industries using the method of reflections from regions - industries (incidence) matrices. The index has been developed by Hidalgo and Hausmann (2009) for country - product matrices and adapted by Balland and Rigby (2017) to city - technology matrices.
mort(mat, rca = FALSE, steps = 19)
mat |
An incidence matrix with regions in rows and industries in columns |
rca |
Logical; should the index of relative comparative advantage (RCA - also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix - 0/1 - is expected as an input), but can be set to TRUE if the index of relative comparative advantage first needs to be computed |
steps |
Number of iteration steps. Defaults to 19, but can be set to 0 to give ubiquity (number of regions that have an RCA in an industry), to 1 to give the average diversity of the regions that have an RCA in this industry, to 2 to give the average ubiquity of technologies developed in the same regions, or to any other number of steps less than or equal to 21. Note that above steps = 2 the index will be rescaled from 0 (minimum relative complexity) to 100 (maximum relative complexity). |
If 'steps' is set to 0, the function returns a numeric vector representing the ubiquity (number of regions that have a relative comparative advantage) of industries. Otherwise, it returns a numeric vector representing the index of knowledge complexity of industries based on the specified number of iteration steps.
Pierre-Alexandre Balland [email protected]
Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570 - 10575.
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
location_quotient, ubiquity, diversity, kci, tci, morc
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
mort(mat, rca = TRUE)
mort(mat, rca = TRUE, steps = 0)
mort(mat, rca = TRUE, steps = 1)
mort(mat, rca = TRUE, steps = 2)

## generate a region - industry matrix in which cells represent the presence/absence of an RCA
set.seed(32)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
mort(mat)
mort(mat, steps = 0)
mort(mat, steps = 1)
mort(mat, steps = 2)

## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
my_data <- data.frame(countries, products)
my_data$freq <- 1
mat <- get_matrix(my_data)

## run the function
mort(mat)
mort(mat, steps = 0)
mort(mat, steps = 1)
mort(mat, steps = 2)
```
This function computes a measure of complexity by normalizing ubiquity of industries. We divide the share of the total count (employment, number of firms, number of patents, ...) in an industry by its share of ubiquity. Ubiquity is given by the number of regions in which an industry can be found (location quotient > 1) from regions - industries (incidence) matrices
norm_ubiquity(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
A numeric vector representing the measure of complexity obtained by normalizing the ubiquity of industries. Each value in the vector corresponds to the normalized complexity score of an industry.
Pierre-Alexandre Balland [email protected]
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
diversity, location_quotient, ubiquity, tci, mort
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
norm_ubiquity(mat)
```
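Read literally, the description of `norm_ubiquity()` divides each industry's share of the total count by its share of total ubiquity. The base-R sketch below implements that reading. It is an interpretation of the text, not the package's code, and the helper name `norm_ubiquity_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package's code), following the description:
# share of the total count in an industry divided by its share of ubiquity.
norm_ubiquity_sketch <- function(mat) {
  share <- mat / rowSums(mat)                   # industry shares within regions
  lq <- t(t(share) / (colSums(mat) / sum(mat))) # location quotient
  ubiquity <- colSums(lq > 1)                   # regions with LQ > 1
  (colSums(mat) / sum(mat)) / (ubiquity / sum(ubiquity))
}

mat <- matrix(
  c(6, 0,
    1, 3,
    0, 2),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("R1", "R2", "R3"), c("I1", "I2"))
)
norm_ubiquity_sketch(mat)
```

In this toy matrix I1 holds 7/12 of the total count but only one of the three region - industry specialisations, so it scores (7/12) / (1/3) = 1.75, while I2 scores (5/12) / (2/3) = 0.625.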
This function computes the prody index of industries from (incidence) regions - industries matrices, as proposed by Hausmann, Hwang & Rodrik (2007). The index gives an associated income level for each industry. It represents a weighted average of per-capita GDPs (but GDP can be replaced by R&D, education...), where the weights correspond to the revealed comparative advantage of each region in a given industry (or sector, technology, ...).
prody(mat, vec)
mat |
An incidence matrix with regions in rows and industries in columns |
vec |
A vector that gives GDP, R&D, education or any other relevant regional attribute that will be used to compute the weighted average for each industry |
A numeric vector representing the prody index of industries. Each value in the vector corresponds to the associated income level for an industry.
Pierre-Alexandre Balland [email protected]
Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123
Hausmann, R., Hwang, J. & Rodrik, D. (2007) What you export matters, Journal of economic growth 12: 1-25.
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## a vector of GDP of regions
vec <- c(5, 10, 15, 25, 50)

## run the function
prody(mat, vec)
```
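The weighted average behind the prody index can be sketched in base R as follows. The weights used here are each region's share of the industry in its own total, normalised to sum to 1 within an industry, which is proportional to the RCA; the helper `prody_sketch` is hypothetical, and `prody()` may handle edge cases such as empty rows differently.

```r
# Hypothetical sketch (not the package's code) of the prody index of
# Hausmann, Hwang and Rodrik (2007).
prody_sketch <- function(mat, vec) {
  share <- mat / rowSums(mat)             # industry shares within each region
  weights <- t(t(share) / colSums(share)) # weights sum to 1 within an industry
  colSums(weights * vec)                  # weighted average of the attribute
}

mat <- matrix(
  c(3, 1,
    1, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("R1", "R2"), c("I1", "I2"))
)
gdp <- c(10, 20)
prody_sketch(mat, gdp)
```

For I1 the regional shares are 0.75 and 0.5, giving weights 0.6 and 0.4 and a value of 0.6 * 10 + 0.4 * 20 = 14; for I2 the weights are 1/3 and 2/3, giving 50/3.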
This function computes an index of revealed comparative advantage (RCA) from (incidence) regions - industries matrices. The numerator is the share of a given industry in a given region. The denominator is the share of this industry in a larger economy (the overall country, for instance). This index is also referred to as a location quotient, or the Hoover-Balassa index.
rca(mat, binary = FALSE)
mat |
An incidence matrix with regions in rows and industries in columns |
binary |
Logical; shall the returned output be a dichotomized version (0/1) of the RCA? Defaults to FALSE (the full values of the RCA will be returned), but can be set to TRUE (RCA above 1 will be set to 1 & RCA values below 1 will be set to 0) |
A matrix representing the index of revealed comparative advantage (RCA) or location quotient. Each cell in the matrix corresponds to the RCA value for a specific region and industry. If the 'binary' parameter is set to TRUE, the returned matrix will be dichotomized, with values above 1 set to 1 and values below 1 set to 0.
Pierre-Alexandre Balland [email protected]
Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123.
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
rca(mat)
rca(mat, binary = TRUE)
```
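The numerator and denominator described above translate directly into matrix operations. The following base-R sketch is for illustration; the helper `rca_sketch` is hypothetical and treats a value of exactly 1 as "no advantage", which is an assumption since the documentation only covers values above and below 1.

```r
# Hypothetical sketch (not the package's code) of the location quotient.
rca_sketch <- function(mat, binary = FALSE) {
  share_in_region <- mat / rowSums(mat)    # numerator: industry share in region
  share_overall <- colSums(mat) / sum(mat) # denominator: industry share overall
  lq <- t(t(share_in_region) / share_overall)
  if (binary) lq <- (lq > 1) * 1           # assumption: exactly 1 maps to 0
  lq
}

mat <- matrix(
  c(6, 0,
    1, 3,
    0, 2),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("R1", "R2", "R3"), c("I1", "I2"))
)
rca_sketch(mat)
rca_sketch(mat, binary = TRUE)
```

R1 puts all of its activity in I1, which accounts for 7/12 of the overall total, so its quotient is 1 / (7/12) = 12/7.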
This function computes the Hoover coefficient of specialization from regions - industries matrices. The higher the coefficient, the greater the regional specialization. This index is closely related to the Krugman specialisation index.
spec_coeff(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
A vector representing the Hoover coefficient of specialization for each region. The values in the vector indicate the degree of regional specialization, with higher values indicating greater specialization.
Pierre-Alexandre Balland [email protected]
Hoover, E.M. and Giarratani, F. (1985) An Introduction to Regional Economics. 3rd edition. New York: Alfred A. Knopf (see table 9-4 in particular)
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
spec_coeff(mat)
```
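A common textbook form of the Hoover coefficient of specialization is half the sum of absolute differences between a region's industry shares and the overall industry shares. The base-R sketch below uses that form as an assumption; the helper `spec_coeff_sketch` is hypothetical and `spec_coeff()` may be implemented differently.

```r
# Hypothetical sketch (not the package's code): half the sum of absolute
# deviations between regional and overall industry shares.
spec_coeff_sketch <- function(mat) {
  share_in_region <- mat / rowSums(mat)    # industry shares within each region
  share_overall <- colSums(mat) / sum(mat) # industry shares in the whole economy
  deviations <- abs(t(t(share_in_region) - share_overall))
  rowSums(deviations) / 2
}

mat <- matrix(
  c(6, 0,
    1, 3,
    0, 2),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("R1", "R2", "R3"), c("I1", "I2"))
)
spec_coeff_sketch(mat)
```

With overall shares of 7/12 and 5/12, the fully specialised regions R1 and R3 score 5/12 and 7/12, while the more balanced R2 scores 1/3.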
This function computes an index of knowledge complexity of industries using the eigenvector method from regions - industries (incidence) matrices. Technically, the function returns the eigenvector associated with the second largest eigenvalue of the projected industry - industry matrix.
tci(mat, rca = FALSE)
mat |
An incidence matrix with regions in rows and industries in columns |
rca |
Logical; should the index of relative comparative advantage (RCA - also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix - 0/1 - is expected as an input), but can be set to TRUE if the index of relative comparative advantage first needs to be computed |
A numeric vector representing the index of knowledge complexity of industries. The vector contains the values of the eigenvector associated with the second largest eigenvalue of the projected industry - industry matrix.
Pierre-Alexandre Balland [email protected]
Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570 - 10575.
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
location_quotient, ubiquity, diversity, morc, kci, mort
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
tci(mat, rca = TRUE)

## generate a region - industry matrix in which cells represent the presence/absence of an RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
tci(mat)

## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
my_data <- data.frame(countries, products)
my_data$freq <- 1
mat <- get_matrix(my_data)

## run the function
tci(mat)
```
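The eigenvector method can be sketched in base R under one common construction of the projected industry - industry matrix, in which the bipartite matrix is normalised by diversity and ubiquity so that every row sums to 1. Both helpers below are hypothetical; `tci()` may build, orient, or standardise the vector differently, and the sign of an eigenvector is arbitrary.

```r
# Hypothetical sketch (not the package's code).
# Assumption: the projection is normalised by diversity and ubiquity.
project_industries <- function(mat) {
  diversity <- rowSums(mat) # industries per region
  ubiquity <- colSums(mat)  # regions per industry
  (t(mat / diversity) %*% mat) / ubiquity
}

tci_sketch <- function(mat) {
  eig <- eigen(project_industries(mat))
  # the first eigenvalue is 1 (rows sum to 1); keep the second eigenvector
  setNames(Re(eig$vectors[, 2]), colnames(mat))
}

# the simple country - product network used in the examples, typed by hand
mat <- matrix(
  c(1, 1, 1, 1,
    0, 1, 0, 0,
    0, 0, 1, 1,
    0, 0, 0, 1),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("C", 1:4), paste0("P", 1:4))
)

rowSums(project_industries(mat)) # every row sums to 1
tci_sketch(mat)
```

Because the projected matrix is row-stochastic, its leading eigenvector is constant and carries no information, which is why the second one is used.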
This function computes a simple measure of ubiquity of industries by counting the number of regions in which an industry can be found (location quotient > 1) from regions - industries (incidence) matrices
ubiquity(mat, rca = FALSE)
mat |
An incidence matrix with regions in rows and industries in columns |
rca |
Logical; should the index of relative comparative advantage (RCA - also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix - 0/1 - is expected as an input), but can be set to TRUE if the index of relative comparative advantage first needs to be computed |
A numeric vector representing the measure of ubiquity of industries. Each element of the vector corresponds to the number of regions in which an industry can be found (location quotient > 1).
Pierre-Alexandre Balland [email protected]
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
ubiquity(mat, rca = TRUE)

## generate a region - industry matrix in which cells represent the presence/absence of an RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## run the function
ubiquity(mat)
```
This function computes a weighted average of regions or industries from (incidence) regions - industries matrices.
weighted_avg(mat, vec, reg = TRUE)
mat |
An incidence matrix with regions in rows and industries in columns |
vec |
A vector that will be used to compute the weighted average for each industry/region |
reg |
Logical; shall the weighted average for regions be returned? Defaults to TRUE (requires a vector of industry values), but can be set to FALSE (requires a vector of region values) if the weighted average for industries should be returned |
A numeric vector representing the weighted average of regions or industries, depending on the value of the 'reg' argument. If 'reg = TRUE', the weighted average for regions is returned; if 'reg = FALSE', the weighted average for industries is returned.
Pierre-Alexandre Balland [email protected]
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")

## a vector for regions will be used to compute the weighted average of industries
vec <- c(5, 10, 15, 25, 50)

## run the function
weighted_avg(mat, vec, reg = FALSE)

## a vector for industries will be used to compute the weighted average of regions
vec <- c(5, 10, 15, 25)

## run the function
weighted_avg(mat, vec, reg = TRUE)
```
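One plausible reading of the two modes is sketched below in base R, assuming that the raw cell counts serve as weights. This is an assumption, since the documentation does not spell out the weighting scheme; the helper `weighted_avg_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package's code).
# Assumption: the cell counts of 'mat' are used as weights.
weighted_avg_sketch <- function(mat, vec, reg = TRUE) {
  if (reg) {
    # 'vec' holds one value per industry; returns one value per region
    as.vector(mat %*% vec) / rowSums(mat)
  } else {
    # 'vec' holds one value per region; returns one value per industry
    as.vector(t(mat) %*% vec) / colSums(mat)
  }
}

mat <- matrix(
  c(3, 1,
    1, 1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("R1", "R2"), c("I1", "I2"))
)
weighted_avg_sketch(mat, c(10, 20), reg = TRUE)  # R1: (3*10 + 1*20) / 4 = 12.5
weighted_avg_sketch(mat, c(10, 20), reg = FALSE) # I1: (3*10 + 1*20) / 4 = 12.5
```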
This function computes the z-score between pairs of technologies from a patent-technology incidence matrix. The z-score is a measure to analyze the co-occurrence of technologies in patent documents (i.e. knowledge combination). It compares the observed number of co-occurrences to what would be expected under the hypothesis that combination is random. A positive z-score indicates a typical co-occurrence, one that is observed more often than expected by chance. In contrast, a negative z-score indicates an atypical co-occurrence. The z-score has been used to estimate the degree of novelty of patents (Kim et al. 2016), scientific publications (Uzzi et al. 2013) or the relatedness between industries (Teece et al. 1994).
z_score(mat)
mat |
A patent-technology incidence matrix with patents in rows and technologies in columns |
A matrix of z-scores representing the co-occurrence of technologies in the input incidence matrix. The z-score measures the deviation of the observed co-occurrence from the expected co-occurrence under the assumption of random combination. Positive z-scores indicate typical co-occurrences, while negative z-scores indicate atypical co-occurrences.
Lars Mewes [email protected]
Kim, D., Cerigo, D. B., Jeong, H., and Youn, H. (2016). Technological novelty profile and invention's future impact. EPJ Data Science, 5 (1):1–15
Teece, D. J., Rumelt, R., Dosi, G., and Winter, S. (1994). Understanding corporate coherence. Theory and evidence. Journal of Economic Behavior and Organization, 23 (1):1–30
Uzzi, B., Mukherjee, S., Stringer, M., and Jones, B. (2013). Atypical Combinations and Scientific Impact. Science, 342 (6157):468–472
relatedness_density, co_occurrence
```r
## generate a toy incidence matrix
set.seed(2210)
techs <- paste0("T", seq(1, 5))
techs <- sample(techs, 50, replace = TRUE)
patents <- paste0("P", seq(1, 20))
patents <- sort(sample(patents, 50, replace = TRUE))
my_data <- data.frame(patents, techs)
my_data <- unique(my_data) # drop duplicated patent - technology pairs
mat <- as.matrix(table(my_data$patents, my_data$techs))

## run the function
z_score(mat)
```
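To make the comparison between observed and expected co-occurrences concrete, the base-R sketch below uses the analytic hypergeometric expectation and variance in the spirit of Teece et al. (1994). This is one possible formulation chosen for illustration; `z_score()` may rely on a different null model, and the helper `z_sketch` is hypothetical.

```r
# Hypothetical sketch (not the package's code): z-scores of technology
# co-occurrence under a hypergeometric null model.
# Assumption: analytic mean and variance are used, not random reshuffling.
z_sketch <- function(mat) {
  mat <- (mat > 0) * 1       # binary patent - technology matrix
  n_pat <- nrow(mat)         # number of patents
  n_tech <- colSums(mat)     # patents per technology
  observed <- t(mat) %*% mat # observed co-occurrences
  expected <- outer(n_tech, n_tech) / n_pat
  variance <- expected *
    outer(1 - n_tech / n_pat, (n_pat - n_tech) / (n_pat - 1))
  z <- (observed - expected) / sqrt(variance)
  diag(z) <- NA              # a technology is not compared with itself
  z
}

# four patents: two combine T1 and T2, two contain only T3
mat <- matrix(
  c(1, 1, 0,
    1, 1, 0,
    0, 0, 1,
    0, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("P", 1:4), paste0("T", 1:3))
)
z_sketch(mat)
```

T1 and T2 are expected to co-occur once (2 * 2 / 4) but do so twice, giving a positive z-score of sqrt(3); T1 and T3 never co-occur, giving -sqrt(3).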