---
title: "PubMedMining-vignette"
author: "Jeff DIDIER"
output: rmarkdown::html_vignette
description: >
  Easy function for text-mining the PubMed repository based on defined sets of terms.
  The relationship between fix-terms (related to your research topic) and pub-terms
  (terms which pivot around your research focus) is calculated using the pointwise
  mutual information (PMI) algorithm. A text file is generated with the PMI scores for
  each fix-term. Then, for each collocation pair (a fix-term + a pub-term), a text file
  is generated with related article titles and publishing years. An additional Author
  section will follow in upcoming version updates.
vignette: >
  %\VignetteIndexEntry{PubMedMining-vignette}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, setup, echo=FALSE}
library(PubMedMining)
```

This package was created for easy and fast term-based text mining of the broad PubMed article repository. To find articles relevant to your research topic, you must:

* Figure out the main terms of your research focus (here fixterms)
* Figure out important terms that might pivot around your focus (here pubterms)
* (Optional) Define an output location for the result files (default: current location)
* Have stable internet access

The terms are stored as character vectors in the corresponding variables `fixterms` and `pubterms`. The desired output path can be stored in the `output` variable.

```{r, example, eval=FALSE}
fixterms <- c("bike", "downhill")
pubterms <- c("dangerous", "extreme", "injuries")
output <- getwd() # or "YOUR/DESIRED/PATHWAY"
pubmed_textmining(fixterms, pubterms, output)
```

The function generates two kinds of results (.txt files):

* PMI scores: a pointwise mutual information score table for each fixterm, with one score per pubterm
* Relevant articles: for each fixterm + pubterm pair, a text file with the relevant article titles and publishing years

__Definition of Pointwise Mutual Information (PMI) scoring:__

Good collocation pairs have high PMI because the probability of co-occurrence is only slightly lower than the probabilities of occurrence of each word. Conversely, a pair of words whose probabilities of occurrence are considerably higher than their probability of co-occurrence gets a small PMI score. If PMI = -Inf, no articles were found for the respective collocation pair.
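
To make the scoring more concrete, the sketch below computes PMI from raw article counts using the standard textbook form `log(p(x, y) / (p(x) * p(y)))`. It is only an illustration: `pmi_from_counts()` and its arguments are hypothetical names that are not part of PubMedMining, the counts are made up, and the natural logarithm and the count-based probability estimates are assumptions, since the package internals may differ.

```r
# Hypothetical helper (not part of PubMedMining): PMI from article counts.
# n_xy:    number of articles matching both the fixterm and the pubterm
# n_x:     number of articles matching the fixterm
# n_y:     number of articles matching the pubterm
# n_total: number of articles in the searched corpus
# Assumption: natural logarithm; the package may use a different base.
pmi_from_counts <- function(n_xy, n_x, n_y, n_total) {
  p_xy <- n_xy / n_total  # estimated probability of co-occurrence
  p_x <- n_x / n_total    # estimated probability of the fixterm
  p_y <- n_y / n_total    # estimated probability of the pubterm
  log(p_xy / (p_x * p_y))
}

# Made-up counts, for illustration only
pmi_from_counts(n_xy = 50, n_x = 200, n_y = 500, n_total = 100000)

# No co-occurring articles: log(0) evaluates to -Inf in R
pmi_from_counts(n_xy = 0, n_x = 200, n_y = 500, n_total = 100000)
```

In this sketch, a pair with zero co-occurring articles yields `-Inf`, which matches the `-Inf` case described above.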