Title: | Persian Text Mining Tool for Co-Occurrence Network |
---|---|
Description: | Provides an extension to 'MadanText' for creating and analyzing co-occurrence networks in Persian text data. This package mainly makes use of the 'PersianStemmer' (Safshekan, R., et al. (2019). <https://CRAN.R-project.org/package=PersianStemmer>), 'udpipe' (Wijffels, J., et al. (2023). <https://CRAN.R-project.org/package=udpipe>), and 'shiny' (Chang, W., et al. (2023). <https://CRAN.R-project.org/package=shiny>) packages. |
Authors: | Kido Ishikawa [aut, cre], Hasan Khosravi [aut] |
Maintainer: | Kido Ishikawa <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-01-02 06:50:16 UTC |
Source: | CRAN |
This function converts the given object to a data frame.
ASDATA.FRAME(x)
x | An object to be converted into a data frame. |
Returns a data frame with rows and columns corresponding to the original object's structure. If 'x' is a matrix, each column in the matrix becomes a column in the data frame. If 'x' is a list where all elements are of the same length, each element of the list becomes a column in the data frame. Attributes such as rownames, colnames, and dimnames (if any) are preserved in the conversion.
data <- ASDATA.FRAME(matrix(1:4, ncol = 2))
This function applies clustering to a graph and extracts the largest connected component.
cluster.graph(network)
network | A graph object. |
A list containing three elements: 'gr' with the largest connected component of the graph, 'cl' with a data frame of nodes and their cluster membership, and 'node.impo' with a data frame of node importance measures like degree, closeness, and betweenness.
## Not run:
# Assuming 'network' is a predefined graph object
cluster.graph(network)
## End(Not run)
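The three returned elements can be approximated with the igraph package. This is a minimal sketch, not the package's implementation: it assumes igraph is installed, uses the "Zachary" example graph, and picks Louvain clustering as an assumed algorithm, since the documentation does not say which one cluster.graph() applies.

```r
library(igraph)

g <- make_graph("Zachary")  # example graph shipped with igraph

# Keep only the largest connected component.
comp <- components(g)
largest <- which.max(comp$csize)
gr <- induced_subgraph(g, which(comp$membership == largest))

# Cluster membership per node (Louvain is an assumed choice of algorithm).
communities <- cluster_louvain(gr)
node_ids <- seq_len(vcount(gr))
cl <- data.frame(node = node_ids,
                 cluster = as.integer(membership(communities)))

# Node importance measures named in the description above.
node.impo <- data.frame(node = node_ids,
                        degree = degree(gr),
                        closeness = closeness(gr),
                        betweenness = betweenness(gr))

result <- list(gr = gr, cl = cl, node.impo = node.impo)
str(result$cl)
```

The list mirrors the documented names 'gr', 'cl', and 'node.impo'; the column names inside each data frame are illustrative.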
This function applies community detection to a graph and returns the membership information of each node.
Community.Detection.Membership(network)
network | A graph object. |
A data frame where each row represents a node in the graph, with columns for the node name and its corresponding community membership number. This information is useful for understanding the community structure within the graph.
## Not run:
library(igraph)  # make_graph() comes from the igraph package
network <- make_graph("Zachary")
membership_info <- Community.Detection.Membership(network)
print(membership_info)
## End(Not run)
This function applies community detection to a graph and plots the result.
Community.Detection.Plot(network)
network | A graph object. |
A plot visualizing the graph with nodes colored according to their community membership. The plot also displays the modularity score as a subtitle, indicating the strength of the community structure.
## Not run:
# Assuming 'network' is a predefined graph object
# network <- make_graph("Zachary")
Community.Detection.Plot(network)
## End(Not run)
This function normalizes Persian text by replacing specific characters and applies stemming.
f3(x)
x | A character vector of Persian text. |
Returns a character vector where each element is the normalized and stemmed version of the corresponding element in the input vector. Specifically, it performs character replacement and stemming on each element of the input, thereby returning a vector of the same length but with processed text. If an element cannot be processed, it will be returned as NA in the output vector.
## Not run:
text <- c("Persian text here")
normalized_text <- f3(text)
## End(Not run)
This function filters a data frame by the specified document ID. If the ID is 0, the entire data frame is returned.
f5(UPIP, I)
UPIP | A data frame with a column named 'doc_id'. |
I | An integer representing the document ID. |
Returns a subset of the input data frame ('UPIP') containing only the rows where the 'doc_id' column matches the specified document ID 'I'. If 'I' is 0, the function returns the entire data frame unmodified. The output is a data frame with the same structure as the input but potentially fewer rows, depending on the presence and frequency of the specified ID.
data <- data.frame(doc_id = 1:5, text = letters[1:5])
filtered_data <- f5(data, 2)
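The filtering rule described above, including the special case for an ID of 0, can be sketched in base R. The helper name filter_by_doc is hypothetical and stands in for f5(); it is an illustration of the documented behaviour, not the package source.

```r
# Hypothetical helper mirroring the documented behaviour of f5().
filter_by_doc <- function(df, id) {
  stopifnot(is.data.frame(df), "doc_id" %in% names(df))
  if (id == 0) {
    return(df)  # 0 is treated as "no filter"
  }
  df[df$doc_id == id, , drop = FALSE]
}

docs <- data.frame(doc_id = c(1, 1, 2, 3), text = c("a", "b", "c", "d"))
one_doc  <- filter_by_doc(docs, 1)  # rows whose doc_id is 1
all_docs <- filter_by_doc(docs, 0)  # the full data frame, unmodified
print(one_doc)
```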
This function extracts token, lemma, and part-of-speech (POS) tag information from a given data frame and compiles them into a new data frame.
f6(UPIP)
UPIP | A data frame containing columns 'token', 'lemma', and 'upos' for tokens, their lemmatized forms, and POS tags respectively. |
Returns a new data frame with three columns: 'TOKEN', 'LEMMA', and 'TYPE'. 'TOKEN' contains the original tokens from the 'token' column of the input data frame. 'LEMMA' contains the lemmatized forms of these tokens, as provided in the 'lemma' column. 'TYPE' contains POS tags corresponding to each token, as provided in the 'upos' column. The returned data frame has the same number of rows as the input data frame, with each row representing the token, its lemma, and its POS tag from the corresponding row of the input.
data <- data.frame(token = c("running", "jumps"),
                   lemma = c("run", "jump"),
                   upos = c("VERB", "VERB"))
token_info <- f6(data)
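The column mapping described above ('token' to 'TOKEN', 'lemma' to 'LEMMA', 'upos' to 'TYPE') can be sketched in base R. The helper name extract_token_info is hypothetical; the package's f6() may be written differently.

```r
# Hypothetical helper mirroring the documented output of f6().
extract_token_info <- function(df) {
  stopifnot(all(c("token", "lemma", "upos") %in% names(df)))
  data.frame(TOKEN = df$token,
             LEMMA = df$lemma,
             TYPE  = df$upos,
             stringsAsFactors = FALSE)
}

annotated <- data.frame(token = c("running", "jumps"),
                        lemma = c("run", "jump"),
                        upos  = c("VERB", "VERB"))
info <- extract_token_info(annotated)
print(info)
```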
This function extracts tokens of a specified part of speech (POS) from the given data frame and counts their frequency.
f7(UPIP, type)
UPIP | A data frame with columns 'upos' (POS tags) and 'lemma' (lemmatized tokens). |
type | A string representing the POS to filter (e.g., 'NOUN', 'VERB'). |
Returns a data frame where each row corresponds to a unique lemma of the specified POS type. The data frame has two columns: 'key', which contains the lemma, and 'freq', which contains the frequency count of that lemma in the data. The rows are ordered in decreasing frequency of occurrence. This format is useful for quickly identifying the most common terms of a particular POS in the data.
data <- data.frame(upos = c('NOUN', 'VERB'), lemma = c('house', 'run'))
noun_freq <- f7(data, 'NOUN')
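The 'key'/'freq' output described above can be reproduced with a frequency table sorted in decreasing order. This is a base R sketch under the assumption that plain counting is all that is needed; count_lemmas_by_pos is a hypothetical name, not a package function.

```r
# Hypothetical helper: count lemmas of one POS type, most frequent first.
count_lemmas_by_pos <- function(df, type) {
  lemmas <- as.character(df$lemma[df$upos == type])
  counts <- sort(table(lemmas), decreasing = TRUE)
  data.frame(key  = as.character(names(counts)),
             freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

tagged <- data.frame(upos  = c("NOUN", "NOUN", "VERB", "NOUN"),
                     lemma = c("house", "book", "run", "house"))
noun_counts <- count_lemmas_by_pos(tagged, "NOUN")
print(noun_counts)
```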
This function iteratively applies a series of suffix modifications to a vector of Persian words.
fun.all.sums(v, TYPE = TYPE.org)
v | A character vector of Persian words. |
TYPE | A vector of suffix types for modification. |
Returns a character vector where each element corresponds to a word from the input vector 'v' with all specified suffix modifications applied. This results in a transformed vector where each word has been modified according to the series of suffix types provided in 'TYPE'. The length of the returned vector matches the length of the input vector.
## Not run:
words <- c("Persian text here")
modified_words <- fun.all.sums(words, TYPE)
## End(Not run)
This function modifies Persian words based on a specified suffix type.
fun.one.sums(v, type)
v | A character vector of Persian words. |
type | A character string representing the suffix type. |
Returns a character vector where each element corresponds to a word from the input vector 'v' with the specified suffix type modified. This results in a transformed vector where each word has been modified to remove or alter the specified suffix. The length of the returned vector matches the length of the input vector, and each word is modified independently based on the specified suffix type.
## Not run:
words <- c("Persian text here")
modified_words <- fun.one.sums(words, "Persian text here")
## End(Not run)
This function processes a data frame containing bigrams and their frequency, and creates a new data frame with separated words and their frequencies.
FUNbigrams(tf.bigrams)
tf.bigrams | A data frame with bigram terms and their frequency. |
A tibble data frame where each row represents a unique bigram from the input data. The data frame contains three columns: 'word1' and 'word2' representing the individual words in the bigram, and 'weight' representing the frequency of the bigram in the corpus. This structure facilitates further analysis of the bigram relationships and their occurrences.
tf_bigrams <- data.frame(term = c("hello_world", "shiny_app"), term_freq = c(3, 2))
bigram_info <- FUNbigrams(tf_bigrams)
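The separation of each bigram into 'word1', 'word2', and 'weight' can be sketched in base R. The sketch assumes the underscore separator and the 'term' and 'term_freq' column names seen in the example above, and it returns a plain data frame rather than a tibble; split_bigrams is a hypothetical name.

```r
# Hypothetical helper: split "word1_word2" terms into two columns.
split_bigrams <- function(tf) {
  parts <- strsplit(as.character(tf$term), "_", fixed = TRUE)
  data.frame(word1  = vapply(parts, `[`, character(1), 1),
             word2  = vapply(parts, `[`, character(1), 2),
             weight = tf$term_freq,
             stringsAsFactors = FALSE)
}

bigram_counts <- data.frame(term = c("hello_world", "shiny_app"),
                            term_freq = c(3, 2))
edges <- split_bigrams(bigram_counts)
print(edges)
```

A three-column edge list of this shape is a common input for building a weighted co-occurrence graph.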
This function modifies Persian words ending with the 'Persian text here' suffix.
fungan(v)
v | A character vector of Persian words. |
Returns a character vector where each element corresponds to a word from the input vector 'v' with the 'Persian text here' suffix modified. This results in a transformed vector where each word ending with the specified suffix is altered. The length of the returned vector matches the length of the input vector, and each word is modified independently based on the presence of the specified suffix.
## Not run:
words <- c("Persian text here")
modified_words <- fungan(words)
## End(Not run)
This function modifies Persian words ending with the 'Persian text here' suffix.
fungi(v)
v | A character vector of Persian words. |
Returns a character vector where each element corresponds to a word from the input vector 'v' with the specified suffix modified. This results in a transformed vector where each word ending with the specified suffix is altered. The length of the returned vector matches the length of the input vector, and each word is modified independently based on the presence of the specified suffix.
## Not run:
words <- c("Persian text here")
modified_words <- fungi(words)
## End(Not run)
This function modifies Persian words starting with the prefix 'Persian text here'.
funmi(v)
v | A character vector of Persian words. |
Returns a character vector where each element corresponds to a word from the input vector 'v' with the specified prefix modified. This results in a transformed vector where each word starting with the specified prefix is altered. The length of the returned vector matches the length of the input vector, and each word is modified independently based on the presence of the specified prefix.
## Not run:
words <- c("Persian text here")
modified_words <- funmi(words)
## End(Not run)
This function performs lemmatization on a vector of Persian words.
LEMMA(Y, TYPE = TYPE.org)
Y | A character vector of Persian words. |
TYPE | A vector of suffix types for modification. |
Returns a character vector where each element is the lemmatized form of the corresponding element in the input vector 'Y'. Lemmatization involves removing inflectional endings and returning the word to its base or dictionary form. The length of the returned vector matches the length of the input vector, and each word is lemmatized independently based on the specified suffix types in 'TYPE'.
## Not run:
words <- c("Persian text here")
lemmatized_words <- LEMMA(words, TYPE)
## End(Not run)
This function creates a correlation network based on specified terms and a threshold, and optionally plots it.
network.cor(dt, Terms, threshold = 0.4, pl = TRUE)
dt | A document-term matrix. |
Terms | A vector of terms to check for correlation. |
threshold | A numeric threshold for correlation. |
pl | A logical value to plot the network or not. |
If 'pl' is TRUE, a plot of the correlation network is displayed, highlighting the strength of associations between terms. If 'pl' is FALSE, a data frame with correlation pairs and their corresponding weights is returned.
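The data frame returned when 'pl' is FALSE can be approximated in base R. This sketch assumes Pearson correlation between term columns across documents and keeps pairs at or above the threshold; the documentation does not state which correlation measure network.cor() uses, and cor_pairs_sketch and its column names are hypothetical.

```r
# Hypothetical sketch: correlation pairs for selected terms in a
# document-term matrix (rows = documents, columns = terms).
cor_pairs_sketch <- function(dtm, terms, threshold = 0.4) {
  m <- as.matrix(dtm)
  cors <- cor(m)  # term-by-term Pearson correlation (assumed measure)
  out <- list()
  for (term in intersect(terms, colnames(m))) {
    r <- cors[term, ]
    keep <- setdiff(names(r)[!is.na(r) & r >= threshold], term)
    if (length(keep) > 0) {
      out[[term]] <- data.frame(from = term, to = keep,
                                weight = unname(r[keep]),
                                stringsAsFactors = FALSE)
    }
  }
  if (length(out) == 0) {
    return(data.frame(from = character(0), to = character(0),
                      weight = numeric(0)))
  }
  do.call(rbind, unname(out))
}

dtm <- cbind(text    = c(1, 0, 2, 0),
             mining  = c(2, 0, 4, 1),
             network = c(0, 3, 0, 2))
rownames(dtm) <- paste0("doc", 1:4)
cor_edges <- cor_pairs_sketch(dtm, terms = "text", threshold = 0.4)
print(cor_edges)
```

Here 'text' and 'mining' rise and fall together across documents, so that pair is kept, while the negatively correlated 'network' is dropped.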
This function calculates pointwise mutual information (PMI) for collocations in the given text data.
PMI(x)
x | A data frame with columns 'token' and 'doc_id'. |
Returns a data frame where each row represents a unique keyword (collocation) in the input data. The data frame contains columns such as 'keyword', representing the keyword, and 'pmi', representing the PMI score of that keyword. Higher PMI scores indicate a stronger association between the components of the collocation within the corpus.
data <- data.frame(token = c("word1", "word2"), doc_id = c(1, 1))
pmi_scores <- PMI(data)
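To make the score concrete, PMI for a pair of adjacent tokens can be computed as log2(P(x, y) / (P(x) * P(y))). The base R sketch below applies that textbook definition to adjacent pairs within each document; pmi_sketch is a hypothetical helper, and the package's PMI() may rely on a different estimator or on extra columns.

```r
# Hypothetical sketch: PMI for adjacent token pairs, not the package's PMI().
# PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
pmi_sketch <- function(df) {
  tokens <- as.character(df$token)

  # Build adjacent pairs separately inside each document.
  pairs <- do.call(rbind, lapply(split(tokens, df$doc_id), function(tok) {
    if (length(tok) < 2) return(NULL)
    data.frame(left = head(tok, -1), right = tail(tok, -1),
               stringsAsFactors = FALSE)
  }))
  if (is.null(pairs)) {
    return(data.frame(keyword = character(0), pmi = numeric(0)))
  }

  token_p <- table(tokens) / length(tokens)  # P(x)
  keys <- paste(pairs$left, pairs$right)
  pair_p <- table(keys) / length(keys)       # P(x, y)

  uniq <- pairs[!duplicated(keys), , drop = FALSE]
  uniq_keys <- paste(uniq$left, uniq$right)
  pmi <- log2(as.numeric(pair_p[uniq_keys]) /
              (as.numeric(token_p[uniq$left]) *
               as.numeric(token_p[uniq$right])))

  out <- data.frame(keyword = uniq_keys, pmi = pmi, stringsAsFactors = FALSE)
  out <- out[order(-out$pmi), , drop = FALSE]
  rownames(out) <- NULL
  out
}

toks <- data.frame(token  = c("text", "mining", "text", "mining", "tool"),
                   doc_id = c(1, 1, 1, 1, 1))
scores <- pmi_sketch(toks)
print(scores)
```

Pairs that occur together more often than their individual frequencies would predict receive a higher score, which matches the interpretation given above.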
This function scales a numeric vector by a specified lambda value.
ScaleWeight(x, lambda)
x | A numeric vector. |
lambda | A numeric scaling factor. |
A numeric vector where each element of the input vector 'x' is divided by the scaling factor 'lambda'. This results in a scaled version of the input vector.
scaled_vector <- ScaleWeight(1:10, 2)
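The documented behaviour is element-wise division by 'lambda', which base R vectorizes directly. The helper below is a hypothetical stand-in for ScaleWeight(), shown only to make the arithmetic explicit.

```r
# Hypothetical stand-in for ScaleWeight(): divide every element by lambda.
scale_weight_sketch <- function(x, lambda) {
  stopifnot(is.numeric(x), is.numeric(lambda), length(lambda) == 1)
  x / lambda
}

scaled <- scale_weight_sketch(1:10, 2)
print(scaled)
```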
This function contains the server-side logic for the MadanText application. It handles user inputs, processes data, and creates outputs to be displayed in the UI.
server(input, output)
input | List of Shiny inputs. |
output | List of Shiny outputs. |
This function sets up the reactive environment and output elements of the Shiny application. It is called for its side effects, updating the app's outputs in response to user inputs and reactive expressions, and is passed to Shiny as the server function rather than used for its return value.
This function sets various attributes for a given graph object, including vertex degree and edge width.
set.graph(network)
network | A graph object. |
The input graph object with added attributes: 'degree' for each vertex and 'width' for each edge. These attributes enhance the graph's visual representation and analytical capabilities.
This object defines the user interface for the MadanText Shiny application. It includes various input and output widgets for file uploads, text input, and visualization.
ui
An object of class shiny.tag.list (inherits from list) of length 4.
A Shiny UI object.