Title: | Bibliographic Network Analysis |
---|---|
Description: | Enables the user to build a citation network/graph from bibliographic data and, based on modularity and heterocitation metrics, assess the degree of awareness/cross-fertilization between two corpora/communities. This toolset is optimized for Scopus data. |
Authors: | Christian Vincenot |
Maintainer: | Christian Vincenot <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.13 |
Built: | 2024-12-11 06:50:12 UTC |
Source: | CRAN |
Denis Diderot (1713-1784), French philosopher and co-founder of the modern encyclopedia.
This package detects and quantifies the unification or separation of two bibliographic corpora through the creation of citation networks. It can be used to study the spread of concepts across scientific disciplines, or the fusion/fission of scientific communities.
Package: | Diderot |
Type: | Package |
Version: | 0.13 |
Date: | 2020-04-17 |
License: | GPL (>=2) |
A typical workflow with the package consists of the following steps.
First, literature metadata, including references, for the two fields of study to be analyzed are downloaded from Scopus (or built manually). These data are imported with create_bibliography to create a bibliographic dataset.
Second, a graph reproducing the citation network of the bibliographic dataset is created with a call to build_graph.
Finally, statistical analysis can be performed on the graph to assess the fusion/fission state of the two corpora/communities. Heterocitation indices (i.e. share and balance) show how often publications or authors cite papers from the other corpus (see heterocitation and heterocitation_authors, respectively). Such analysis must always be preceded by a call to precompute_heterocitation, which performs the initial calculations. These metrics are complemented by traditional as well as custom modularity metrics (see compute_modularity and compute_custom_modularity, respectively), which express how strongly the communities are separated. Publications that foster mutual awareness and cross-fertilization between the corpora/communities can be identified using the usual betweenness centrality metric (see compute_BC_ranking) and the Ji index (see compute_Ji_ranking).
Christian Vincenot
Maintainer: Christian Vincenot ([email protected])
    ## Not run:
    # Two corpora on individual-based modelling (IBM) and agent-based modelling (ABM)
    # were downloaded from Scopus. The structure of each corpus is as follows:
    tt<-read.csv("IBMmerged.csv", stringsAsFactors=FALSE)
    str(tt,strict.width="cut")
    ### 'data.frame': 3184 obs. of 9 variables:
    ### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sl"..
    ### $ Title : chr "Coevolution of epidemics, social networks, and in"..
    ### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
    ### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.20"..
    ### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2."..
    ### $ Abstract : chr "This research shows how a limited supply of antiv"..
    ### $ Author.Keywords: chr "Antiviral; Behavioral economics; Epidemic; Microe"..
    ### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microe"..
    ### $ References : chr "(2009) Centre Approves Restricted Retail Sale of "..

    # Define the name of corpora (labels) and specific keywords to identify relevant
    # publications (keys).
    labels<-c("IBM","ABM")
    keys<-c("individual-based model|individual based model",
            "agent-based model|agent based model")

    # Build the IBM-ABM bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c("IBMmerged.csv","ABMmerged.csv"),
                            labels=labels, keywords=keys)
    ### [1] "File IBMmerged.csv contains 3184 records"
    ### [1] "File ABMmerged.csv contains 9641 records"

    # Build and save citation graph
    gr<-build_graph(db=db,small.year.mismatch=TRUE,fine.check.nb.authors=2,
                    attrs=c("Corpus","Year","Authors", "DOI"))
    ### [1] "Graph built! Execution time: 1200.22 seconds."
    save_graph(gr, "graph.graphml")

    # Precompute heterocitation (gr_sx is used by the analyses below)
    gr_sx<-precompute_heterocitation(gr,labels=labels,infLimitYear=1987,
                                     supLimitYear=2018)
    ###[1] "Summary of the nodes considered for computation (1987-2017)"
    ###[1] "-----------------------------------------------------------"
    ###[1] "IBM ABM IBM|ABM"
    ###[1] "1928 5378 153"
    ###[1]
    ###[1] "Edges summary"
    ###[1] "-------------"
    ###[1] "IBM->IBM/IBM->Other 5583/1086 => Prop 0.163"
    ###[1] "ABM->ABM/ABM->Other 16946/2665 => Prop 0.136"
    ###[1] "General Same/Diff 22529/3751 => Prop 0.143"
    ###[1]
    ###[1] "Heterocitation metrics"
    ###[1] "----------------------"
    ###[1] "Sx ALL / IBM / ABM"
    ###[1] "0.127 / 0.137 / 0.124"
    ###[1] "Dx ALL / IBM / ABM"
    ###[1] "-0.652 / -0.803 / -0.598"

    # Compute and plot modularity
    compute_modularity(gr_sx, 1987, 2018)
    ###[1] 0.3164805
    plot_modularity_timeseries(gr_sx, 1987, 2018, window=1000)

    # Compute and plot publication heterocitation
    heterocitation(gr_sx, labels=labels, 1987, 2005)
    ###[1] "Sx ALL / ABM / IBM"
    ###[1] "0.047 / 0.214 / 0.007"
    ###[1] "Dx ALL / ABM / IBM"
    ###[1] "-0.927 / -0.690 / -0.982"
    plot_heterocitation_timeseries(gr_sx, labels=labels, mini=-1, maxi=-1, cesure=2005)

    # Compute author heterocitation
    hetA<-heterocitation_authors(gr_sx, 1987, 2018, pub_threshold=4)
    head(hetA[order(hetA$avgDx,decreasing=TRUE),c(1)], n=10)
    ### [1] "Ashlock D." "Evora J." "Hernandez J.J." "Hernandez M." "Gooch K.J."
    ### [6] "Reinhardt J.W." "Ng K." "Kazanci C." "Senior A.M." "Ariel G."

    # Identify which publications are most impactful in terms of cross-fertilization
    jir<-compute_Ji_ranking(gr_sx, labels=labels, 1987, 2018)
    head(jir[,c(2,7)],n=3)
    ### Title Ji
    ### 758 A standard protocol for describing individual-based and agent-based models 200
    ### 4437 Pattern-oriented modeling of agent-based complex systems: Lessons from ecology 134
    ### 33 The ODD protocol: A review and first update 120
    ## End(Not run)
Builds a citation graph based on a database of bibliographic records generated with create_bibliography. This process is automatically parallelized on multicore hardware. By default, matching between titles and references is based on the full title, the publication year, and the first three authors. Publication attributes present in the data frame can be copied to graph nodes using the attrs argument.
build_graph(db, title = "Cite Me As", year = "Year", authors = "Authors", ref = "Cited References", set.title.as.name = F, attrs = NULL, verbose = F, makeCluster.type = "PSOCK", nb.cores=NA, fine.check.threshold = 1000, fine.check.nb.authors = 3, small.year.mismatch = T, debug = F)
db |
Bibliographic database created with create_bibliography. |
title |
Name of the data frame column in which publication titles are listed. |
year |
Name of the data frame column in which publication years are listed. |
authors |
Name of the data frame column in which publication authors are listed. |
ref |
Name of the data frame column in which publication references are listed. |
set.title.as.name |
Set graph vertex ID to publication title |
attrs |
Attributes of the bibliographic database (i.e. data frame column names, such as "Authors", "Year") to be set as vertex attributes. |
verbose |
Verbosity flag triggering a more detailed output during graph building. |
makeCluster.type |
Type of cluster to be used to parallelize the graph building process. For more options, see |
nb.cores |
Number of cores to be used for parallel computation. |
fine.check.threshold |
Title length under which citation matching is further confirmed based on publication year. This value can be reduced to increase performance on large bibliographic databases. By default, publication year check is always performed. |
fine.check.nb.authors |
Maximum number of authors to check against for citation matching. This value can be reduced to increase performance on large bibliographic databases. Default value is three authors. |
small.year.mismatch |
Flag indicating whether small year mismatches (+/- 1 year) should be tolerated. It is recommended to keep this flag set to TRUE to accommodate the usual inconsistencies in bibliographic databases. |
debug |
Debug flag allowing the user to browse function calls upon execution error. For more details, see |
Returns a graph object.
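The matching rule described for build_graph (full title, publication year with an optional one-year tolerance, and the first few authors) can be illustrated with a small base R sketch. The helper name matches_reference and the toy strings are made up for this illustration, and the logic is a simplified assumption, not the package's actual implementation.

```r
# Hypothetical helper (NOT part of Diderot): does a raw reference string
# plausibly cite a given bibliographic record?
matches_reference <- function(ref, title, year, authors,
                              nb_authors = 3, small_year_mismatch = TRUE) {
  ref_lc <- tolower(ref)
  # 1. The full title must appear in the reference (literal, case-insensitive).
  if (!grepl(tolower(title), ref_lc, fixed = TRUE)) return(FALSE)
  # 2. Some four-digit number in the reference must equal the record year,
  #    or differ by at most one year when small mismatches are tolerated.
  ref_years <- as.integer(unlist(regmatches(ref, gregexpr("[0-9]{4}", ref))))
  tolerance <- if (small_year_mismatch) 1L else 0L
  if (!any(abs(ref_years - year) <= tolerance)) return(FALSE)
  # 3. The first nb_authors authors (comma separated) must all be named.
  first_authors <- head(trimws(strsplit(authors, ",", fixed = TRUE)[[1]]),
                        nb_authors)
  all(vapply(first_authors,
             function(a) grepl(tolower(a), ref_lc, fixed = TRUE),
             logical(1)))
}

# Toy reference string (made up)
ref <- "Doe J., Roe R., (2010) A Toy Model of Everything, Journal of Examples"
matches_reference(ref, "A toy model of everything", 2010, "Doe J., Roe R.")
matches_reference(ref, "A toy model of everything", 2011, "Doe J., Roe R.")
matches_reference(ref, "A toy model of everything", 2011, "Doe J., Roe R.",
                  small_year_mismatch = FALSE)
```

The first two calls match (the second only thanks to the one-year tolerance), while the third does not. Checking fewer authors means fewer string searches per candidate pair, which is consistent with the performance advice given for fine.check.nb.authors.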
Christian Vincenot ([email protected])
    labels<-c("Corpus1","Corpus2")
    # Build a bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
    # Build graph
    gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
This function computes the Betweenness Centrality (BC) for each graph node (i.e. publication).
compute_BC_ranking(gr, labels, write_to_graph = F)
gr |
Citation graph |
labels |
Labels (i.e. names) of the two corpora featured in the graph. |
write_to_graph |
Flag to indicate whether to write results to the graph (i.e. save BC values as node attributes). |
If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, BC) sorted by decreasing BC. Else, returns the graph given as input to which BC values are added as node attributes.
Christian Vincenot ([email protected])
compute_citation_ranking, compute_Ji_ranking
    labels<-c("Corpus1","Corpus2")
    # Build a bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
    # Build graph
    gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
    # Compute BC
    compute_BC_ranking(gr, labels)
This function computes the citation count for each graph node (i.e. publication).
compute_citation_ranking(gr, labels, write_to_graph = F)
gr |
Citation graph |
labels |
Labels (i.e. names) of the two corpora featured in the graph. |
write_to_graph |
Flag to indicate whether to write results to the graph (i.e. save citation count values as node attributes). |
If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, citations) sorted by decreasing citation count. Else, returns the graph given as input to which citation count values are added as node attributes.
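Conceptually, the citation count of a publication is the number of incoming edges of its node in the citation graph. The base R sketch below shows this on a made-up edge list; the identifiers and the data frame layout are assumptions for illustration and do not reflect the internal structures used by compute_citation_ranking.

```r
# Toy edge list (made up): each row means that `from` cites `to`.
edges <- data.frame(
  from = c("P2", "P3", "P3", "P4", "P4", "P4"),
  to   = c("P1", "P1", "P2", "P1", "P2", "P3"),
  stringsAsFactors = FALSE
)
nodes <- c("P1", "P2", "P3", "P4")

# Citation count = number of incoming edges; uncited publications get 0.
citations <- table(factor(edges$to, levels = nodes))

# Ranking sorted by decreasing citation count.
ranking <- data.frame(
  publication = names(citations),
  citations   = as.integer(citations),
  stringsAsFactors = FALSE
)
ranking <- ranking[order(ranking$citations, decreasing = TRUE), ]
print(ranking)
```

Here P1 is cited three times, P2 twice, P3 once, and P4 never, so the ranking lists them in that order.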
Christian Vincenot ([email protected])
compute_BC_ranking, compute_Ji_ranking
    labels<-c("Corpus1","Corpus2")
    # Build a bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
    # Build graph
    gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
    # Compute Citation Ranking
    compute_citation_ranking(gr, labels)
This function computes the custom modularity of the citation graph. Custom modularity here stands for Newman's modularity computed over the subgraph comprising nodes that belong to a single corpus only and are within the time window, as well as the direct outgoing neighbors of these nodes (whatever their year tag). In other words, the citation graph considered includes publications within the time window as well as the older papers that they cite.
compute_custom_modularity(gr, infLimitYear, supLimitYear)
gr |
Citation graph |
infLimitYear |
Start year of the time window considered (included) |
supLimitYear |
End year of the time window considered (*excluded*) |
Returns the custom modularity value of the subgraph restricted to the interval [infLimitYear;supLimitYear[.
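The node selection behind custom modularity can be sketched in base R as follows. The data frames, identifiers, and the "A | B" joint label are toy assumptions for this example. The sketch keeps every cited neighbor regardless of its year or corpus, which is one reading of the description above and may differ from the package's exact behavior.

```r
labels <- c("A", "B")

# Toy publications (made up); "A | B" marks a paper belonging to both corpora.
nodes <- data.frame(
  id     = c("P1", "P2", "P3", "P4", "P5"),
  year   = c(1985, 1995, 2000, 2001, 2020),
  corpus = c("A", "A", "B", "A | B", "B"),
  stringsAsFactors = FALSE
)
# Toy citations: `from` cites `to`.
edges <- data.frame(
  from = c("P2", "P3", "P4", "P5"),
  to   = c("P1", "P2", "P3", "P3"),
  stringsAsFactors = FALSE
)

infLimitYear <- 1990  # included
supLimitYear <- 2018  # excluded

# Core nodes: single-corpus publications inside [infLimitYear; supLimitYear[.
in_window     <- nodes$year >= infLimitYear & nodes$year < supLimitYear
single_corpus <- nodes$corpus %in% labels
core <- nodes$id[in_window & single_corpus]

# Add the papers cited by core nodes, whatever their year tag.
cited <- unique(edges$to[edges$from %in% core])
kept  <- union(core, cited)
print(kept)
```

In this toy case the core is P2 and P3, and P1 (published in 1985, before the window) is pulled in because P2 cites it. P4 is left out as a joint paper and P5 falls outside the window.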
Christian Vincenot ([email protected])
compute_modularity, plot_modularity_timeseries
    labels<-c("Corpus1","Corpus2")
    # Build a bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
    # Build graph
    gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
    # Compute Custom Modularity
    compute_custom_modularity(gr, 1990, 2018)
This function computes the Ji metric for a given term from a bibliographic dataset and returns its annual evolution within the timeframe specified. This metric indicates how much the term (e.g. publication title, software name) is cited simultaneously in the references of both corpora and is thus important for cross-fertilization between the two communities. This function is run on the bibliographic dataset (created with create_bibliography) and is thus useful before graph creation or when the term to be searched is not the title of a node in the resulting graph. For instance, a publication (or e.g. software or a scientific database referenced only through a URL or grey literature) may be cited and have an impact on cross-fertilization between the two communities (whose literature is represented by the two corpora) without having its own entry in the bibliographic database; it would therefore not appear as a node in the graph created by build_graph. In such a case, the compute_Ji function can be used to assess its importance.
compute_Ji(db, pubtitle, labels, from = -1, to = -1)
db |
Bibliographic database created with create_bibliography. |
pubtitle |
Publication title, or more generally term to be searched (e.g. software name). |
labels |
Labels (i.e. names) of the two corpora featured in the graph. |
from |
Start year of the time window considered (included) |
to |
End year of the time window considered (*excluded*) |
Dataframe containing year and Ji metric value.
Christian Vincenot ([email protected])
    labels<-c("Corpus1","Corpus2")
    # Build a bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
    # Compute Ji
    compute_Ji(db, "Title1", labels, from=1990, to=2018)
This function computes the Ji metric for each graph node (i.e. publication). This metric indicates how much a publication is cited simultaneously by both corpora and is thus important for cross-fertilization between the two communities.
compute_Ji_ranking(gr, labels, infLimitYear, supLimitYear, write_to_graph=F)
gr |
Citation graph |
labels |
Labels (i.e. names) of the two corpora featured in the graph. |
infLimitYear |
Start year of the time window considered (included) |
supLimitYear |
End year of the time window considered (*excluded*) |
write_to_graph |
Flag to indicate whether to write results to the graph (i.e. save Ji values as node attributes). |
If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, citations from corpus 1, citations from corpus 2, Ji) sorted by decreasing Ji. Else, returns the graph given as input to which Ji values are added as node attributes.
Christian Vincenot ([email protected])
build_graph, precompute_heterocitation, compute_Ji
    labels<-c("Corpus1","Corpus2")
    # Build a bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
    # Build graph
    gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
    # Compute Ji ranking
    compute_Ji_ranking(gr, labels, 1990, 2018)
This function computes Newman's modularity of the citation graph restricted to a given time window and ignoring nodes belonging to both corpora simultaneously.
compute_modularity(gr, infLimitYear, supLimitYear)
gr |
Citation graph |
infLimitYear |
Start year of the time window considered (included) |
supLimitYear |
End year of the time window considered (*excluded*) |
Returns Newman's modularity value of the subgraph restricted to the interval [infLimitYear;supLimitYear[.
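For reference, Newman's modularity of a partition compares the fraction of edges that fall inside each community with the fraction expected if edges were placed at random while preserving node degrees. The sketch below computes it in base R for a toy graph. Treating citations as undirected edges, the helper name modularity_undirected, and the toy data are assumptions made for illustration; compute_modularity may handle edge direction differently.

```r
# Hypothetical helper (NOT part of Diderot): Newman's modularity of an
# undirected graph given as an edge list and a named membership vector.
# Q = sum over communities c of [ e_c / m - (d_c / (2 m))^2 ], where
# m is the number of edges, e_c the number of edges inside c, and
# d_c the sum of the degrees of the nodes in c.
modularity_undirected <- function(from, to, membership) {
  m    <- length(from)
  same <- membership[from] == membership[to]
  deg  <- table(factor(c(from, to), levels = names(membership)))
  q <- 0
  for (comm in unique(membership)) {
    in_comm <- names(membership)[membership == comm]
    e_c <- sum(same & membership[from] == comm)
    d_c <- sum(deg[in_comm])
    q <- q + e_c / m - (d_c / (2 * m))^2
  }
  q
}

# Toy graph (made up): two triangles joined by a single bridging edge.
from <- c("n1", "n2", "n1", "n4", "n5", "n4", "n3")
to   <- c("n2", "n3", "n3", "n5", "n6", "n6", "n4")
membership <- c(n1 = "A", n2 = "A", n3 = "A", n4 = "B", n5 = "B", n6 = "B")

q <- modularity_undirected(from, to, membership)
print(q)  # 5/14, i.e. about 0.357
```

Values close to 0 indicate that the two groups cite each other about as often as chance would predict, whereas higher values indicate more strongly separated communities.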
Christian Vincenot ([email protected])
compute_custom_modularity, plot_modularity_timeseries
    labels<-c("Corpus1","Corpus2")
    # Build a bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
    # Build graph
    gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
    # Compute Modularity
    compute_modularity(gr, 1990, 2018)
This function creates a bibliographic dataset based on two external corpus files, each representing the bibliography of a given domain.
create_bibliography(corpora_files, labels, keywords, retrieve_pubdates = F, clean_refs = F, encoding = NULL)
corpora_files |
Vector containing the paths to two corpus files (e.g. Scopus exports). The CSV files should contain for each record at least Authors (comma separated), Publication Title, Publication Year, and References (semicolon separated). The inclusion of DOI (for date checking; see the retrieve_pubdates option) as well as Abstract, Author.Keywords, and Index.Keywords (for the in-depth identification of publications belonging to both corpora) is strongly recommended. |
labels |
Labels (i.e. names) given to the two corpora to be analyzed. |
keywords |
Keywords identifying the two corpora |
retrieve_pubdates |
Flag indicating whether to confirm publication dates by retrieving them (see |
clean_refs |
Attempt to clean references and keep titles only. NOT RECOMMENDED, especially if |
encoding |
Character encoding used in the input files. |
Returns a dataframe containing a bibliographic dataset usable by Diderot and including all references from both corpora.
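When a corpus is built manually rather than exported from Scopus, a minimal CSV file can be prepared in base R as sketched below. The records are made up, and the column names (Authors, Title, Year, References) are taken from the Scopus export structure shown in the example of this page; that these four headers are sufficient for create_bibliography is an assumption.

```r
# Toy corpus (made-up records). Authors are comma separated and
# references are semicolon separated, as required for corpus files.
corpus <- data.frame(
  Authors    = c("Doe J., Roe R.", "Poe E."),
  Title      = c("A toy individual-based model", "Another toy study"),
  Year       = c(2010L, 2012L),
  References = c("",
                 paste("Doe J., Roe R., (2010) A toy individual-based model",
                       "Loe L., (2001) An older toy paper", sep = "; ")),
  stringsAsFactors = FALSE
)

# Write the corpus to a temporary CSV file and read it back to check it.
corpus_file <- tempfile(fileext = ".csv")
write.csv(corpus, corpus_file, row.names = FALSE)

check    <- read.csv(corpus_file, stringsAsFactors = FALSE)
required <- c("Authors", "Title", "Year", "References")
all(required %in% names(check))
```

A second file prepared the same way for the other corpus would then give the two paths expected by the corpora_files argument.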
Christian Vincenot ([email protected])
    ## Not run:
    # Two corpora on individual-based modelling (IBM) and agent-based modelling (ABM)
    # were downloaded from Scopus. The structure of each corpus is as follows:
    tt<-read.csv("IBMmerged.csv", stringsAsFactors=FALSE)
    str(tt,strict.width="cut")
    ### 'data.frame': 3184 obs. of 9 variables:
    ### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sl"..
    ### $ Title : chr "Coevolution of epidemics, social networks, and in"..
    ### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
    ### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.20"..
    ### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2."..
    ### $ Abstract : chr "This research shows how a limited supply of antiv"..
    ### $ Author.Keywords: chr "Antiviral; Behavioral economics; Epidemic; Microe"..
    ### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microe"..
    ### $ References : chr "(2009) Centre Approves Restricted Retail Sale of "..

    # Define the name of corpora (labels) and specific keywords to identify relevant
    # publications (keys).
    labels<-c("IBM","ABM")
    keys<-c("individual-based model|individual based model",
            "agent-based model|agent based model")

    # Build the IBM-ABM bibliographical dataset from Scopus exports
    db<-create_bibliography(corpora_files=c("IBMmerged.csv","ABMmerged.csv"),
                            labels=labels, keywords=keys)
    ### [1] "File IBMmerged.csv contains 3184 records"
    ### [1] "File ABMmerged.csv contains 9641 records"

    # Processed output. Note the field name changes (for standardization with ISI Web
    # of Knowledge format) and addition of the "Corpus" field (with identification of
    # joint "IBM | ABM" publications based on keywords).
    str(db, strict.width="cut")
    ### 'data.frame': 12504 obs. of 10 variables:
    ### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sloot "..
    ### $ Cite Me As : chr "Coevolution of epidemics, social networks, and indivi"..
    ### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
    ### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.2010.0"..
    ### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2.0-78"..
    ### $ Abstract : chr "This research shows how a limited supply of antiviral"..
    ### $ Author.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microecono"..
    ### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microecono"..
    ### $ Cited References: chr "(2009) Centre Approves Restricted Retail Sale of Tami"..
    ### $ Corpus : chr "IBM" "IBM | ABM" "IBM | ABM" "IBM" ...
    ## End(Not run)
This function retrieves the precise publication date by querying the Digital Object Identifier (DOI) web server. Alternatively, if extract_date_from_doi is set to TRUE, the function will first try to extract a publication year from the publication DOI string. If create_bibliography is called with retrieve_pubdates = TRUE, it calls get_date_from_doi for each record to confirm publication dates.
get_date_from_doi(doi, extract_date_from_doi)
doi |
Character string representing the Digital Object Identifier (DOI) of the publication |
extract_date_from_doi |
Flag indicating whether to try to simply extract the publication year from the DOI string before resorting to online queries to the DOI server |
Returns a date in YYYY-MM-DD format, or in YYYY-MM format if extract_date_from_doi is set to TRUE.
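As an illustration of what extracting a date from a DOI string can look like, the base R sketch below searches the DOI for a plausible year followed by a month. The helper name guess_year_month_from_doi and the regular expression are assumptions made for this example; the rule actually applied by get_date_from_doi is not described here and may differ.

```r
# Hypothetical helper (NOT the package's implementation): look for a
# four-digit year (19xx or 20xx) followed by a dot and a two-digit month.
guess_year_month_from_doi <- function(doi) {
  pattern <- "(19|20)[0-9]{2}\\.(0[1-9]|1[0-2])"
  hit <- regmatches(doi, regexpr(pattern, doi))
  # No embedded date found: return NA so the caller can fall back to
  # another source (e.g. an online query).
  if (length(hit) == 0) return(NA_character_)
  sub(".", "-", hit, fixed = TRUE)
}

guess_year_month_from_doi("10.1016/j.procs.2010.04.250")   # "2010-04"
guess_year_month_from_doi("10.1007/978-3-642-12079-4_28")  # NA
```

Many DOIs carry no date information at all, as the second call shows, which is why such an extraction can only serve as a shortcut before an online query.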
Scopus records already contain the year of publication of scientific papers indexed. However, in some cases these are inaccurate and can be verified by comparing them with the date retrieved by this function. Note that
Christian Vincenot ([email protected])
    ## Not run:
    # Query publication date from DOI server
    get_date_from_doi(doi="10.1016/j.procs.2010.04.250",extract_date_from_doi=FALSE)
    ## End(Not run)
    # Extract date from DOI string
    get_date_from_doi(doi="10.1016/j.procs.2010.04.250",extract_date_from_doi=TRUE)
This function extracts the publication title from a reference using an online query to Freecite.
get_reference_title(str)
str |
Character string representing a reference |
Returns a character string of the publication title
Christian Vincenot ([email protected])
This function calculates the heterocitation share and heterocitation balance between two corpora A and B in the time window specified. The heterocitation share (Sx) of a publication belonging to corpus A is defined as the proportion of citations in its reference list that point to publications belonging to corpus B (or A|B). The global heterocitation share for corpus A is calculated as the average heterocitation share of the publications that corpus A contains (e.g. a value of 0.2 for corpus A indicates that, on average, only 20% of the citations made by a publication in corpus A point to papers from corpus B). The heterocitation balance (Dx), on the other hand, takes into consideration the respective sizes of corpora A and B to discern how much the heterocitation share deviates from the value expected under well-mixedness (i.e. if A and B originated from a single community). For example, a value of -50% for corpus A indicates that, on average, publications in corpus A cite papers from corpus B half as frequently as expected, which suggests a lack of mutual awareness between the corpora and related communities.
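The definition of the heterocitation share can be worked through on a small made-up example in base R. The data frame layout and the identifiers below are assumptions for illustration only. The sketch covers Sx alone; the balance Dx also depends on corpus sizes and is not reproduced here.

```r
# Toy citation table (made up): one row per citation, with the corpus of
# the citing publication and the corpus of the cited publication.
citations <- data.frame(
  citing        = c("a1", "a1", "a1", "a1", "a2", "a2",    "b1", "b1"),
  citing_corpus = c("A",  "A",  "A",  "A",  "A",  "A",     "B",  "B"),
  cited_corpus  = c("A",  "A",  "A",  "B",  "A",  "A | B", "B",  "A"),
  stringsAsFactors = FALSE
)

# A citation is a heterocitation when the cited paper is not purely from
# the citing paper's own corpus (joint "A | B" papers therefore count).
hetero <- citations$citing_corpus != citations$cited_corpus

# Sx per publication: share of heterocitations in its reference list.
sx_pub <- tapply(hetero, citations$citing, mean)

# Sx per corpus: average of the publication-level shares.
pub_corpus <- tapply(citations$citing_corpus, citations$citing,
                     function(x) x[1])
sx_corpus <- tapply(as.numeric(sx_pub), as.character(pub_corpus), mean)
print(sx_corpus)
```

Publication a1 has a share of 0.25 (one heterocitation out of four references) and a2 a share of 0.5, so corpus A obtains 0.375, while corpus B obtains 0.5 from its single publication b1.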
heterocitation(gr, labels, infLimitYear, supLimitYear)
gr |
Citation graph priorly preprocessed with |
labels |
Labels (i.e. names) of the two corpora featured in the graph. |
infLimitYear |
Start year of the time window considered (included) |
supLimitYear |
End year of the time window considered (*excluded*) |
Returns a numerical vector containing, in this order, the heterocitation share (Sx) for corpus A, B and global, and the heterocitation balance (Dx) for A, B and global.
precompute_heterocitation should be called before running this function.
Christian Vincenot ([email protected])
precompute_heterocitation, plot_heterocitation_timeseries, heterocitation_authors, MC_baseline_distribution, significance_Dx
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
# Heterocitation
gr<-precompute_heterocitation(gr,labels, 1990, 2018)
heterocitation(gr,labels, 1990, 2018)
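As a rough illustration of the definitions above, the following base R sketch computes a heterocitation share and a balance for a toy citation list. It does not call Diderot. The data, the helper names `share` and `balance`, and the normalization used for the balance (relative deviation of the share from the fraction of the dataset that belongs to the other corpus) are assumptions made for this example and may differ from the package's exact formula.

```r
# Toy dataset (hypothetical): five papers in two corpora.
papers <- data.frame(
  id     = c("a1", "a2", "a3", "b1", "b2"),
  corpus = c("A",  "A",  "A",  "B",  "B"),
  stringsAsFactors = FALSE
)
# One row per citation: 'from' cites 'to'.
citations <- data.frame(
  from = c("a1", "a1", "a1", "a1", "b1", "b1"),
  to   = c("a2", "a3", "b1", "b2", "a1", "b2"),
  stringsAsFactors = FALSE
)
corpus_of <- setNames(papers$corpus, papers$id)

# Share: fraction of a paper's references that point to the other corpus.
share <- function(paper) {
  cited <- citations$to[citations$from == paper]
  if (length(cited) == 0) return(NA_real_)
  mean(corpus_of[cited] != corpus_of[paper])
}

# Balance (assumed form): relative deviation of the share from the share
# expected if references were drawn without regard to corpus.
balance <- function(paper) {
  expected <- mean(papers$corpus != corpus_of[paper])
  (share(paper) - expected) / expected
}

share("a1")
balance("a1")
balance("b1")
```

A positive balance in this sketch means the paper cites the other corpus more often than its size alone would suggest, and a negative balance means less often.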
This function computes heterocitation metrics for authors. The heterocitation share (Sx) and heterocitation balance (Dx) of an author are calculated as the average of these metrics for papers published by this author within the given time window. See the man page of heterocitation for definitions of heterocitation metrics.
heterocitation_authors(gr, infLimitYear, supLimitYear, pub_threshold = 0, remove_orphans = F, remove_citations_to_joint_papers = F)
gr | Citation graph previously preprocessed with precompute_heterocitation |
infLimitYear | Start year of the time window considered (included) |
supLimitYear | End year of the time window considered (*excluded*) |
pub_threshold | Minimum number of publications for authors to be considered. |
remove_orphans | Do not consider publications that do not cite any other paper in the dataset (i.e. orphan nodes in the citation network) |
remove_citations_to_joint_papers | Do not consider publications belonging to both corpora in the authors' average corpus calculation. |
Returns a data frame containing author name ("Authors"), number of publications ("NbPubs"), list of publication years ("Years"), list of publications corpora ("Corpus"), list of publication heterocitation share ("Sx"), list of publication heterocitation balance ("Dx"), average heterocitation share ("avgSx"), average heterocitation balance ("avgDx"), average corpus value of publications ("avgCorpus"), regression coefficient of the heterocitation share evolution ("coeffSx"), regression coefficient of the heterocitation balance evolution ("coeffDx"), regression coefficient of the evolution of the corpus value of publications ("coeffCorpus").
precompute_heterocitation should be called before running this function.
Christian Vincenot ([email protected])
precompute_heterocitation, heterocitation
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
# Heterocitation
gr<-precompute_heterocitation(gr,labels, 1990, 2018)
# Author heterocitation
heterocitation_authors(gr, 1990, 2018)
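The per-author averages and trend coefficients described above can be pictured with a small base R sketch. It works on a hand-made table of publications rather than on a Diderot graph. The data are invented, and treating the "regression coefficient" as the slope of a linear regression of Sx on publication year is an assumption of this example.

```r
# Hypothetical per-publication values (one row per author/publication pair).
pubs <- data.frame(
  Author = c("Smith", "Smith", "Smith", "Lee", "Lee"),
  Year   = c(2000, 2002, 2004, 2001, 2003),
  Sx     = c(0.10, 0.20, 0.30, 0.50, 0.40),
  stringsAsFactors = FALSE
)

# Average heterocitation share per author.
avgSx <- tapply(pubs$Sx, pubs$Author, mean)

# Slope of Sx over time per author (assumed meaning of "coeffSx").
coeffSx <- sapply(split(pubs, pubs$Author), function(d) {
  unname(coef(lm(Sx ~ Year, data = d))[2])
})

# Number of publications per author, e.g. for applying a publication threshold.
NbPubs <- table(pubs$Author)

avgSx
coeffSx
NbPubs
```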
This function loads a citation graph saved on the filesystem.
load_graph(filename)
filename | File to load |
Returns a graph object.
This function only supports graphs previously saved with Diderot's save_graph. However, as the file is actually a graphml file handled by igraph, advanced users may use this function on appropriate graphs created elsewhere, as long as they respect Diderot's structure (presence of a "Corpus" field, etc.).
Christian Vincenot ([email protected])
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
## Not run: 
save_graph(gr, "Saved.graphml")
# Load saved graph
gr<-load_graph("Saved.graphml")
## End(Not run)
This function performs Monte Carlo runs with random permutations of corpus tags in the graph provided and computes the heterocitation balance on the new graphs. The permutation is repeated over several iterations (set through the "rep" argument) and provides a baseline distribution of Dx values for the graph topology considered. This baseline can then be compared with the Dx value obtained for the original graph to evaluate whether it could merely be the result of chance (see significance_Dx).
MC_baseline_distribution(gr, labels, infYearLimit, supYearLimit, rep = 20)
gr | Graph file (created with build_graph) |
labels | List of the names of the two corpora studied (e.g. c("Computer Science", "Mathematics")), present in the "Corpus" attribute |
infYearLimit | Minimum year considered in this study |
supYearLimit | Maximum year considered in this study |
rep | Number of Monte Carlo iterations |
This function currently plots histograms of the distribution of Dx values generated through random permutations of corpus tags among the records. Returns a list containing:
Dx1 | Dx value for corpus 1 per iteration |
Dx2 | Dx value for corpus 2 per iteration |
DxALL | Global Dx value per iteration |
Christian Vincenot ([email protected])
significance_Dx, heterocitation
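The permutation idea behind this baseline can be sketched in base R without Diderot. The example below shuffles corpus tags on a toy citation list and recomputes a simple statistic after each shuffle. For brevity it uses the fraction of citations that cross the corpus boundary instead of Dx. The data and the helper name `cross_share` are hypothetical.

```r
set.seed(1)

# Toy corpus tags and citations (hypothetical data).
corpus <- c(a1 = "A", a2 = "A", a3 = "A", b1 = "B", b2 = "B", b3 = "B")
from <- c("a1", "a1", "a2", "a3", "b1", "b2", "b3", "b3")
to   <- c("a2", "a3", "a3", "b1", "b2", "b3", "b1", "a1")

# Fraction of citations linking papers with different corpus tags.
cross_share <- function(tags) mean(tags[from] != tags[to])

observed <- cross_share(corpus)

# Monte Carlo baseline: reassign the same tags to randomly chosen papers,
# keeping the citation topology fixed, and recompute the statistic.
baseline <- replicate(20, {
  shuffled <- setNames(sample(corpus), names(corpus))
  cross_share(shuffled)
})

observed
summary(baseline)
```

Comparing `observed` with the spread of `baseline` mirrors, in simplified form, the comparison that significance_Dx performs on Dx values.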
This function outputs the number of records and citations in a bibliographic database, and returns the number of citations.
nb_refs(db)
db | Bibliographic database |
Returns the number of citations in the bibliographic database
Christian Vincenot ([email protected])
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# NB refs
nb_refs(db)
This function plots and returns the annual number of new authors and cumulative number of authors in the bibliographic database.
plot_authors_count(db)
db | Bibliographic database created with create_bibliography. |
Returns a dataframe containing year, annual number of new authors (i.e. not seen before), and cumulative number of authors.
Christian Vincenot ([email protected])
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Plot authors count
## Not run: plot_authors_count(db)
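To make the returned quantities concrete, here is a minimal base R sketch of how annual new-author and cumulative author counts could be derived from a table of (year, author) pairs. The table and its column names are invented for the example and do not reflect the internal layout of a Diderot bibliographic database.

```r
# Hypothetical authorship records: one row per (publication year, author).
records <- data.frame(
  Year   = c(2000, 2000, 2001, 2001, 2003),
  Author = c("Smith", "Lee", "Smith", "Garcia", "Lee"),
  stringsAsFactors = FALSE
)

# Year in which each author first appears.
first_year <- tapply(records$Year, records$Author, min)

# Count first appearances per year, keeping years without new authors.
years <- sort(unique(records$Year))
new_per_year <- table(factor(first_year, levels = years))

counts <- data.frame(
  Year       = years,
  New        = as.integer(new_per_year),
  Cumulative = cumsum(as.integer(new_per_year))
)
counts
```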
This function plots and returns annual heterocitation share and heterocitation balance values.
plot_heterocitation_timeseries(gr_arg, labels, mini = -1, maxi = -1, cesure = -1)
gr_arg | Citation graph |
labels | Labels (i.e. names) of the two corpora featured in the graph. |
mini | Start year of the time window |
maxi | End year of the time window |
cesure | Year before which values should be cumulated. Default value is -1, which indicates that each year in the time window should be plotted. |
Returns a dataframe with year and annual values for heterocitation share (sx1, sx2 and sxall for corpus A and B and global resp.) and heterocitation balance (dx1, dx2 and dxall for corpus A and B and global resp.).
Christian Vincenot ([email protected])
precompute_heterocitation, heterocitation
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
# Heterocitation timeseries
gr<-precompute_heterocitation(gr,labels, 1990, 2018)
plot_heterocitation_timeseries(gr, labels, 1990, 2018)
This function plots and returns annual graph modularity values for predefined corpora (representing communities). See compute_modularity for details on modularity calculation.
plot_modularity_timeseries(gr_arg, mini = -1, maxi = -1, cesure = -1, window = 1, modularity_function = "normal")
gr_arg | Citation graph |
mini | Start year of the time window |
maxi | End year of the time window |
cesure | Year before which values should be cumulated. Default value is -1, which indicates that each year in the time window should be plotted. |
window | The temporal sliding window size over which modularity should be computed. |
modularity_function | Modularity function to be used for the calculation: "custom" indicates that |
Returns a dataframe containing year and annual modularity value.
Christian Vincenot ([email protected])
compute_modularity, compute_custom_modularity
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
# Compute Modularity timeseries
plot_modularity_timeseries(gr, 1990, 2018)
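For readers unfamiliar with modularity on predefined communities, the base R sketch below evaluates the standard Newman modularity for a small undirected toy graph whose nodes are pre-assigned to two groups. Treating the network as undirected and unweighted is an assumption of this example. How compute_modularity and compute_custom_modularity handle direction or weighting is described on their own pages.

```r
# Toy graph (hypothetical): two triangles joined by a single edge.
edges <- rbind(c(1, 2), c(2, 3), c(1, 3),
               c(4, 5), c(5, 6), c(4, 6),
               c(3, 4))
membership <- c(1, 1, 1, 2, 2, 2)  # predefined communities (e.g. corpora)
n <- length(membership)

# Symmetric adjacency matrix.
A <- matrix(0, n, n)
A[edges] <- 1
A[edges[, 2:1]] <- 1

k <- rowSums(A)   # node degrees
m <- sum(A) / 2   # number of edges

# Newman modularity: Q = 1/(2m) * sum_ij (A_ij - k_i k_j / (2m)) * [c_i == c_j]
same_group <- outer(membership, membership, "==")
Q <- sum((A - outer(k, k) / (2 * m)) * same_group) / (2 * m)
Q
```

Values of Q close to 0 indicate that the predefined groups are no more separated than expected by chance, while larger positive values indicate stronger separation.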
This function plots and returns the annual number of publications.
plot_publication_curve(gr, labels, k = 1)
gr | Citation graph |
labels | Labels (i.e. names) of the two corpora featured in the graph. |
k | Text font size (multiplier of cex values) |
Returns a dataframe containing year and annual publication count for each corpus and both together.
Christian Vincenot ([email protected])
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
# Publication curve
plot_publication_curve(gr,labels)
This function computes heterocitation values for each publication and stores them as node attributes in the graph. The heterocitation share of a publication belonging to corpus A is defined as the proportion of citations in its reference list that point to publications belonging to corpus B (or A|B) (e.g. a value of 0.2 for a publication in corpus A indicates that only 20% of its references point to papers from corpus B). The heterocitation balance, on the other hand, takes the respective sizes of corpora A and B into account to show how much the heterocitation share deviates from the value expected under well-mixedness, i.e. if A and B originated from a single community. For example, a value of -30% for a publication in corpus A indicates that it cites papers from corpus B 30% less frequently than expected.
precompute_heterocitation(gr, labels, infLimitYear, supLimitYear)
gr | Citation graph |
labels | Labels (i.e. names) of the two corpora featured in the graph. |
infLimitYear | Start year of the time window considered (included) |
supLimitYear | End year of the time window considered (*excluded*) |
Returns the graph gr with added node attributes Sx and Dx representing the heterocitation share and heterocitation balance respectively.
Corpus-wide heterocitation values can be computed using heterocitation.
Christian Vincenot ([email protected])
heterocitation, plot_heterocitation_timeseries, compute_Ji_ranking
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
gr<-precompute_heterocitation(gr,labels, 1990, 2018)
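The time window is half-open: infLimitYear is included and supLimitYear is excluded. A minimal base R sketch of such a filter, applied to made-up publication years, could look like this (the variable names simply reuse the argument names for readability):

```r
# Hypothetical publication years.
years <- c(1989, 1990, 2005, 2017, 2018)

infLimitYear <- 1990  # included
supLimitYear <- 2018  # excluded

# Keep years y with infLimitYear <= y < supLimitYear.
in_window <- years >= infLimitYear & years < supLimitYear
years[in_window]
```

With these bounds, a publication dated 2018 falls outside the window, so covering 2018 would require passing 2019 as the upper limit.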
This function saves a graph produced with Diderot. The resulting structure is actually a graphml file and can thus be exported to third party software.
save_graph(gr, filename)
gr | Graph object to save. |
filename | File to save the graph to. |
Christian Vincenot ([email protected])
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2), labels=labels, keywords=NA)
# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
## Not run: save_graph(gr, "Saved.graphml")
This function assesses to what extent the heterocitation balance (Dx value) calculated for a graph departs from a baseline situation. The latter typically represents the Dx values to be expected by chance, i.e. through random permutation of corpus assignment at the node/vertex level (see MC_baseline_distribution). A Shapiro-Wilk test is first run on the control distribution (using shapiro.test) and, if the normality hypothesis is not rejected, a one-sample t test (see t.test) is used to test whether value differs significantly from the control distribution. The strength of this difference is additionally assessed through Glass' delta, an estimator of effect size (Glass, McGraw, and Smith, 1981).
significance_Dx(value, control, normality_threshold=0.05)
value | Heterocitation balance (Dx) calculated for the citation network studied |
control | Baseline distribution of Dx values in control experiments |
normality_threshold | P value threshold under which the hypothesis of normality is rejected in the preliminary Shapiro-Wilk test |
Returns a list containing the p-value obtained in a one-sample t test comparing value and the control distribution (with null hypothesis being that value could come from the control distribution) or NA if the control distribution is not normal based on a Shapiro-Wilk normality test, and Glass' estimator of effect size.
Christian Vincenot ([email protected])
Glass, G. V., McGraw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills: Sage Publications.
significance_Dx, heterocitation
## Not run: 
# Heterocitation in our graph
heterocitation(gr_sx, labels=labels, 1987, 2005)
### [1] "Sx ALL / ABM / IBM"
### [1] "0.047 / 0.214 / 0.007"
### [1] "Dx ALL / ABM / IBM"
### [1] "-0.927 / -0.690 / -0.982"

# Generate a baseline distribution for Dx values obtained through chance
# Here, we run 200 iterations of node corpus permutations
baseline<-MC_baseline_distribution(gr_sx, labels, 1987, 2018, 200)

# Assess whether our observed Dx is possibly due to chance
significance_Dx(-0.927, baseline[["Dx ALL"]])
### [1] "Distribution is normal. Performing t-test."
###
###   One Sample t-test
###
### data:  value - control
### t = -323.0017, df = 319, p-value < 2.2e-16
### alternative hypothesis: true mean is not equal to 0
### 95 percent confidence interval:
###  -0.9159834 -0.9048923
### sample estimates:
### mean of x
### -0.9104379
###
### [1] "Glass' effect size: -18.0563442219448"
## End(Not run)
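The testing procedure described for significance_Dx can be approximated with base R alone. The sketch below follows the stated steps (Shapiro-Wilk check, one-sample t test on value minus control, Glass' delta) on a synthetic control sample. The control values are randomly generated stand-ins rather than real baseline output, and the Glass' delta formula used here, the difference from the control mean divided by the control standard deviation, is the textbook definition assumed for this example.

```r
set.seed(42)

# Synthetic stand-in for a baseline distribution of Dx values (assumption).
control <- rnorm(200, mean = 0, sd = 0.05)
value <- -0.927              # observed Dx, as in the example above
normality_threshold <- 0.05

# Step 1: Shapiro-Wilk normality test on the control distribution.
normality <- shapiro.test(control)

# Step 2: one-sample t test only if normality is not rejected, else NA.
if (normality$p.value >= normality_threshold) {
  p_value <- t.test(value - control)$p.value
} else {
  p_value <- NA_real_
}

# Step 3: Glass' delta, scaled by the control standard deviation.
glass_delta <- (value - mean(control)) / sd(control)

p_value
glass_delta
```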