| Title: | A Simple Needmining Implementation |
|---|---|
| Description: | Showcasing needmining (the semi-automatic extraction of customer needs from social media data) with Twitter data. It uses the Twitter API handling provided by the package 'rtweet' and the text-mining algorithms provided by the package 'tm'. Niklas Kuehl (2016) <doi:10.1007/978-3-319-32689-4_14> wrote an introduction to the topic of needmining. |
| Authors: | Dorian Proksch <[email protected]>, Timothy P. Jurka [ctb], Yoshimasa Tsuruoka [ctb], Loren Collingwood [ctb], Amber E. Boydstun [ctb], Emiliano Grossman [ctb], Wouter van Atteveldt [ctb] |
| Maintainer: | Dorian Proksch <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.1 |
| Built: | 2024-10-31 06:53:37 UTC |
| Source: | CRAN |
needmining provides the basic functionality to download social media data from Twitter and to semi-automatically classify the data regarding user needs.
downloadTweets
downloads Tweets containing specified keywords from the Twitter API
downloadTweets(search_terms, n = 100, lang = "en")
search_terms | a string containing the search terms in Twitter format (use OR and AND to connect multiple search terms in one search) |
n | the number of Tweets to download. Please note that this limit is based on your Twitter account |
lang | the language of the Tweets. Default is English. Please refer to the Twitter API documentation for language codes |
This function downloads Tweets for a specified keyword list, removes line breaks, and adds a column isNeed filled with 0.
a data frame containing the Tweets as well as an additional column isNeed filled with 0
Dorian Proksch <[email protected]>
searchterm <- '"smart speaker" OR "homepod" OR "google home mini"'
## Not run:
token <- twitterLogin()
currentTweets <- downloadTweets(searchterm, n = 180)
## End(Not run)
filterTweetsMachineLearning
classifies a list of Tweets as needs based on the random forest machine learning algorithm
filterTweetsMachineLearning(dataToClassify, trainingData)
dataToClassify | a data frame containing the Tweet messages to classify |
trainingData | a data frame containing Tweet messages with a given classification (0=not a need, 1=a need) |
This function uses a machine learning algorithm (random forest) to classify needs based on their content. It requires a training data set with classified needs (indicated by 0=not a need, 1=a need). This function uses code fragments from the archived R packages maxent and RTextTools. The authors of these fragments are Timothy P. Jurka, Yoshimasa Tsuruoka, Loren Collingwood, Amber E. Boydstun, Emiliano Grossman, and Wouter van Atteveldt.
a dataframe with classified data
Dorian Proksch <[email protected]>
data(NMTrainingData)
data(NMdataToClassify)
smallNMTrainingData <- rbind(NMTrainingData[1:75,], NMTrainingData[101:175,])
smallNMdataToClassify <- rbind(NMdataToClassify[1:10,], NMdataToClassify[101:110,])
results <- filterTweetsMachineLearning(smallNMdataToClassify, smallNMTrainingData)
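Once predictions are back, it can help to score them against the known labels of the held-out rows. A minimal sketch of that bookkeeping, using stubbed 0/1 vectors in place of the real labels and classifier output so it runs without the package (the isNeed naming follows the 0/1 convention used elsewhere in this documentation):

```r
# Stub truth labels: the first 10 held-out rows are needs, the last 10 are not
actual <- c(rep(1, 10), rep(0, 10))
# Stub predictions standing in for the classifier's output labels
predicted <- c(rep(1, 8), 0, 0, rep(0, 9), 1)
# Accuracy: share of rows whose predicted label matches the known label
accuracy <- mean(predicted == actual)
accuracy  # 17 of 20 labels agree, i.e. 0.85
```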
filterTweetsNeedwords
filters a list of Tweets regarding need-indicating words
filterTweetsNeedwords(tweetMessages, needWords)
tweetMessages | a data frame containing the Tweet messages |
needWords | a string containing need words separated by ';' |
This function filters Tweets regarding a list of need-indicating words.
a filtered data frame
Dorian Proksch <[email protected]>
data(NMTrainingData)
needWordsNeedsOnly <- "need;want;wish;feature;ask;would like;improve;idea;upgrade"
needsSimple <- filterTweetsNeedwords(NMTrainingData, needWordsNeedsOnly)
needWordsExtended <- "need;want;wish;feature;ask;would like;improve;idea;upgrade;
support;problem;issue;help;fix;complain;fail"
needsSimpleExtended <- filterTweetsNeedwords(NMTrainingData, needWordsExtended)
A dataset containing 200 artificially generated messages in the Twitter format for the topic of smart speakers. These messages are inspired by real Tweets (rephrased, anonymized, all brand names removed). Furthermore, Tweets containing stopwords were removed. 100 rows contain user needs, 100 rows contain no user needs. The data is coded (0=no need, 1=a need). It can be used to test a classification algorithm.
data(NMdataToClassify)
A data frame with 200 rows and 2 variables:
Contains the message
Is a need described within the message? 0=no, 1=yes
A dataset containing 200 artificially generated messages in the Twitter format for the topic of smart speakers. These messages are inspired by real Tweets (rephrased, anonymized, all brand names removed). 100 rows contain user needs, 100 rows contain no user needs. The data is coded (0=no need, 1=a need). The data can be used to train a classification algorithm.
data(NMTrainingData)
A data frame with 200 rows and 2 variables:
Contains the message
Is a need described within the message? 0=no, 1=yes
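For quick experiments without the bundled data, the two-variable layout described above can be mimicked with a toy data frame. The column names message and isNeed used below are assumptions, mirroring the isNeed column convention mentioned for downloadTweets:

```r
# Toy data frame in the documented two-variable shape (4 rows instead of 200);
# column names message/isNeed are assumed, following the package's isNeed convention
toy <- data.frame(
  message = c("I wish the speaker supported more languages",
              "Just unboxed my new speaker",
              "Would like a feature to set alarms by voice",
              "Listening to music all day"),
  isNeed = c(1, 0, 1, 0)
)
table(toy$isNeed)  # two needs (1) and two non-needs (0)
```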
readNeedminingFile
reads a Needmining file created by the needmining package
readNeedminingFile(filename)
filename | the filename of the file to read |
This function reads a Needmining file created by the needmining package.
a data frame containing the content of the file
Dorian Proksch <[email protected]>
data(NMTrainingData)
saveNeedminingFile(filename=file.path(tempdir(), "NMTrainingData.csv"), NMTrainingData)
currentNeedData <- readNeedminingFile(file.path(tempdir(), "NMTrainingData.csv"))
removeTweetsStopwords
removes Tweets containing stopwords
removeTweetsStopwords(tweetMessages, stopWords)
tweetMessages | a data frame containing the Tweet messages |
stopWords | a string containing stopwords separated by ';' |
This function removes Tweets containing stopwords from a list of Twitter messages.
a filtered data frame
Dorian Proksch <[email protected]>
stopWords <- "review;giveaway;save;deal;win;won;price;launch;news;gift;announce;
reveal;sale;http;buy;bought;purchase;sell;sold;invest;discount;
coupon;ship;giving away"
data(NMTrainingData)
filteredTweets <- removeTweetsStopwords(NMTrainingData, stopWords)
saveNeedminingFile
saves a data frame created by the needmining package to a file
saveNeedminingFile(filename, tweetMessages)
filename | the filename to save to |
tweetMessages | an object containing the Twitter messages |
This function saves a data frame created by the needmining package to a file.
Dorian Proksch <[email protected]>
data(NMTrainingData)
saveNeedminingFile(filename=file.path(tempdir(), "NMTrainingData.csv"), NMTrainingData)
twitterLogin
creates a token for the Twitter API
twitterLogin()
This function creates a Twitter token for the Twitter API. This is necessary to use functions of the Twitter API. The login data has to be stored in the file 'TwitterLoginData.csv' in the current working directory (please refer to getwd() and setwd()). The file should have the following format:
app;consumer_key;consumer_secret;access_token;access_secret
The name of your app;your consumer_key;your consumer_secret;your access_token;your access_secret
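The expected file can also be written from R itself. A minimal sketch with placeholder credentials (substitute your own values from the Twitter developer portal; the file is written to tempdir() here for safety, whereas twitterLogin() reads it from getwd()):

```r
# Header line naming the five fields of the login file
header <- "app;consumer_key;consumer_secret;access_token;access_secret"
# Placeholder values -- replace with your real app name and keys
values <- "MyApp;my_consumer_key;my_consumer_secret;my_access_token;my_access_secret"
path <- file.path(tempdir(), "TwitterLoginData.csv")
writeLines(c(header, values), path)
readLines(path)  # two lines: the field names, then your credentials
```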
a Twitter token
Dorian Proksch <[email protected]>
## Not run:
token <- twitterLogin()
## End(Not run)