Text Mining (DataCamp): Analyzing Social Media Data in R
1. Understanding Twitter Data
1.1 Analyzing Twitter data (video)
1.2 Power of Twitter data
Instruction:
# Load the rtweet package (provides stream_tweets(), search_tweets(), etc.)
library(rtweet)
# Extract live tweets for a 120-second window
tweets120s <- stream_tweets("", timeout = 120)
# View dimensions of the data frame with live tweets
dim(tweets120s)
1.3 Pros and cons of Twitter data
1.4 Extracting Twitter data (video)
1.5 Prerequisites to set up the R environment
1.6 Search and extract tweets
Instruction:
# Extract tweets on "#Emmyawards" and include retweets
twts_emmy <- search_tweets("#Emmyawards",
n = 2000,
include_rts = TRUE,
lang = "en")
# View output for the first 5 columns and 10 rows
head(twts_emmy[,1:5], 10)
1.7 Search and extract timelines
Instruction:
# Extract tweets posted by the user @Cristiano
get_cris <- get_timeline("@Cristiano", n = 3200)
# View output for the first 5 columns and 10 rows
head(get_cris[,1:5], 10)
1.8 Components of Twitter data (video)
1.9 User interest and tweet counts
Instruction:
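# tweets_ai is assumed to be a data frame of tweets on the topic,
# extracted earlier with search_tweets()
# Create a table of users and tweet counts for the topic
sc_name <- table(tweets_ai$screen_name)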
# Sort the table in descending order of tweet counts
sc_name_sort <- sort(sc_name, decreasing = TRUE)
# View sorted table for top 10 users
head(sc_name_sort, 10)
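To see what the table-and-sort steps produce without calling the Twitter API, here is a minimal offline sketch. It assumes a hypothetical toy data frame, toy_tweets, whose screen_name column mimics the one returned by search_tweets(); the user names and rows are made up for illustration.

```r
# Hypothetical toy data standing in for tweets_ai (not real Twitter data)
toy_tweets <- data.frame(
  screen_name = c("user_a", "user_b", "user_a", "user_c", "user_a", "user_b"),
  stringsAsFactors = FALSE
)

# Count tweets per user
sc_name_toy <- table(toy_tweets$screen_name)

# Sort so the most active users come first
sc_name_toy_sort <- sort(sc_name_toy, decreasing = TRUE)

# Top users and their tweet counts
head(sc_name_toy_sort, 10)
```

The result is still a named table, so names() gives the screen names and the values give the tweet counts.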
1.10 Compare follower count
Instruction:
# Extract user data for the Twitter accounts of 4 news sites
# (lookup_users() takes the screen names as a single character vector)
users <- lookup_users(c("nytimes", "CNN", "FoxNews", "NBCNews"))
# Create a data frame of screen names and follower counts
user_df <- users[,c("screen_name","followers_count")]
# Display and compare the follower counts for the 4 news sites
user_df
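Comparing follower counts is easier once the accounts are ranked and each count is put on a common scale. The sketch below assumes a user_df-like data frame with screen_name and followers_count columns; the account names and numbers are hypothetical, not real follower data.

```r
# Hypothetical follower counts (illustrative numbers, not real account data)
user_df_toy <- data.frame(
  screen_name     = c("news_a", "news_b", "news_c", "news_d"),
  followers_count = c(500, 2000, 1200, 800),
  stringsAsFactors = FALSE
)

# Order accounts from most to fewest followers
user_df_toy <- user_df_toy[order(user_df_toy$followers_count, decreasing = TRUE), ]

# Share of the combined audience held by each account, in percent
user_df_toy$share_pct <- round(100 * user_df_toy$followers_count /
                                 sum(user_df_toy$followers_count), 1)
rownames(user_df_toy) <- NULL
user_df_toy
```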
1.11 Retweet counts
Instruction 1:
# Load dplyr for arrange() and desc()
library(dplyr)
# Create a data frame of tweet text and retweet count
rtwt <- tweets_ai[,c("text", "retweet_count")]
head(rtwt)
# Sort data frame based on descending order of retweet counts
rtwt_sort <- arrange(rtwt, desc(retweet_count))
Instruction 2:
# Create a data frame of tweet text and retweet count
rtwt <- tweets_ai[,c("text", "retweet_count")]
head(rtwt)
# Sort data frame based on descending order of retweet counts
rtwt_sort <- arrange(rtwt, desc(retweet_count))
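# Exclude rows with duplicate text; since the data frame is sorted,
# the row kept for each text is the one with the highest retweet count
rtwt_unique <- rtwt_sort[!duplicated(rtwt_sort$text), ]
# Reset row names, then print the 6 unique posts retweeted the most times
rownames(rtwt_unique) <- NULL
head(rtwt_unique)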
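Deduplicating on the text column alone can be checked offline. The sketch below uses a hypothetical toy data frame (made-up posts and counts) and base R in place of arrange(desc()); it shows that keeping the first occurrence of each text after a descending sort keeps the most retweeted copy.

```r
# Hypothetical toy data: the same text can appear several times
rtwt_toy <- data.frame(
  text          = c("post X", "post Y", "post X", "post Z", "post Y"),
  retweet_count = c(40, 15, 40, 7, 15),
  stringsAsFactors = FALSE
)

# Sort by retweet count, highest first (base R equivalent of arrange(desc()))
rtwt_toy_sort <- rtwt_toy[order(rtwt_toy$retweet_count, decreasing = TRUE), ]

# Keep the first row for each distinct text; because the data are sorted,
# that row carries the largest retweet count for the text
rtwt_toy_unique <- rtwt_toy_sort[!duplicated(rtwt_toy_sort$text), ]
rownames(rtwt_toy_unique) <- NULL
rtwt_toy_unique
```

duplicated() compares only the column it is given, whereas unique() on a plain data frame compares whole rows.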
2. Analyzing Twitter Data
2.1 Filtering tweets (video)
2.2 Filtering for original tweets
Instruction:
# Extract 100 original tweets on "Superbowl"
tweets_org <- search_tweets("Superbowl -filter:retweets -filter:quote -filter:replies", n = 100)
# count() on a vector is plyr::count(); dplyr::count() expects a data frame
# Check for presence of replies
plyr::count(tweets_org$reply_to_screen_name)
# Check for presence of quotes
plyr::count(tweets_org$is_quote)
# Check for presence of retweets
plyr::count(tweets_org$is_retweet)
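The same "original tweet" rule can also be applied locally to tweets that were already collected. This is a minimal sketch on hypothetical toy data; the column names follow the ones checked above, and newer rtweet releases may name them differently.

```r
# Hypothetical toy data with the rtweet-style columns checked above
tweets_toy <- data.frame(
  text                 = c("t1", "t2", "t3", "t4"),
  is_retweet           = c(FALSE, TRUE, FALSE, FALSE),
  is_quote             = c(FALSE, FALSE, TRUE, FALSE),
  reply_to_screen_name = c(NA, NA, NA, "someone"),
  stringsAsFactors = FALSE
)

# An original tweet is not a retweet, not a quote, and not a reply
is_original <- !tweets_toy$is_retweet &
  !tweets_toy$is_quote &
  is.na(tweets_toy$reply_to_screen_name)

tweets_org_toy <- tweets_toy[is_original, ]

# table(useNA = "ifany") is a base R way to tally values, including NA
table(tweets_org_toy$is_retweet, useNA = "ifany")
```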
2.3 Filtering on tweet language
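search_tweets() accepts a lang argument, as used in section 1.6, to request tweets in one language. The returned data also carries a lang column, so tweets already collected can be filtered locally. A minimal sketch with hypothetical toy data:

```r
# Hypothetical toy tweets with a language code per tweet
tweets_toy <- data.frame(
  text = c("bonjour", "hello", "salut", "hola"),
  lang = c("fr", "en", "fr", "es"),
  stringsAsFactors = FALSE
)

# How many tweets are in each language?
table(tweets_toy$lang)

# Keep only the French tweets
tweets_fr <- tweets_toy[tweets_toy$lang == "fr", ]
tweets_fr$text
```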
2.4 Filter based on tweet popularity
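Tweet popularity can be measured with the favorite_count and retweet_count columns of the tweet data. The sketch below filters on both, assuming those column names; the thresholds of 100 and the toy rows are illustrative choices, not values from the course.

```r
# Hypothetical toy tweets with engagement counts
tweets_toy <- data.frame(
  text           = c("a", "b", "c", "d"),
  favorite_count = c(250, 90, 400, 120),
  retweet_count  = c(130, 300, 80, 110),
  stringsAsFactors = FALSE
)

# Illustrative thresholds: at least 100 likes AND at least 100 retweets
min_faves    <- 100
min_retweets <- 100

popular <- tweets_toy[tweets_toy$favorite_count >= min_faves &
                        tweets_toy$retweet_count >= min_retweets, ]
popular$text
```

Both conditions must hold, so "b" (few likes) and "c" (few retweets) are dropped.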
2.5 Twitter user analysis
2.6 Extract user information
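In tweet-level data a user appears once per tweet, so user attributes repeat. Assuming the tweet rows carry user columns such as followers_count and friends_count (as in the rtweet output used in these notes), one row per user can be kept with duplicated(). A sketch with hypothetical toy data:

```r
# Hypothetical toy tweets: one row per tweet, so a user can appear many times
tweets_toy <- data.frame(
  screen_name     = c("user_a", "user_b", "user_a", "user_c"),
  followers_count = c(1500, 40, 1500, 300),
  friends_count   = c(200, 400, 200, 300),
  stringsAsFactors = FALSE
)

# Keep one row per user with the user-level columns
users_toy <- tweets_toy[!duplicated(tweets_toy$screen_name),
                        c("screen_name", "followers_count", "friends_count")]
rownames(users_toy) <- NULL
users_toy
```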
2.7 Explore users based on the golden ratio
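The section title refers to comparing a user's followers with their friends (the accounts the user follows). As an assumption for this sketch, the ratio is taken as followers_count / friends_count, where a value well above 1 means many more people follow the account than it follows. The toy users are hypothetical.

```r
# Hypothetical toy user data
users_toy <- data.frame(
  screen_name     = c("user_a", "user_b", "user_c"),
  followers_count = c(1500, 40, 300),
  friends_count   = c(200, 400, 300),
  stringsAsFactors = FALSE
)

# Follower-to-friend ratio; NA when the user follows nobody (avoid dividing by 0)
users_toy$ratio <- ifelse(users_toy$friends_count > 0,
                          users_toy$followers_count / users_toy$friends_count,
                          NA)

# Rank users by ratio, highest first
users_toy <- users_toy[order(users_toy$ratio, decreasing = TRUE), ]
rownames(users_toy) <- NULL
users_toy
```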
2.8 Subscribers to Twitter lists
2.9 Twitter trends
2.10 Available trends
2.11 Trends by country name
2.12 Trends by city and most tweeted trends
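Trend tables returned by rtweet's get_trends() include trend and tweet_volume columns (assumed here), and tweet_volume is often missing. A minimal sketch for ranking trends by volume with base R's aggregate(), using hypothetical toy data:

```r
# Hypothetical toy trends table; a trend can be listed more than once
# and tweet_volume can be NA
trends_toy <- data.frame(
  trend        = c("#topic1", "#topic2", "#topic1", "#topic3"),
  tweet_volume = c(12000, 5000, 12000, NA),
  stringsAsFactors = FALSE
)

# Largest reported volume per trend; the formula interface drops NA rows by default
trends_agg <- aggregate(tweet_volume ~ trend, data = trends_toy, FUN = max)

# Most tweeted trends first
trends_agg <- trends_agg[order(trends_agg$tweet_volume, decreasing = TRUE), ]
rownames(trends_agg) <- NULL
trends_agg
```

Trends with no reported volume (here #topic3) disappear from the ranking.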
2.13 Plotting Twitter data over time
2.14 Visualizing frequency of tweets
2.15 Create time series objects
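rtweet's ts_data() and ts_plot() aggregate tweets into time intervals. The sketch below reproduces the idea of hourly counts in base R so the result can be inspected without API access; the created_at timestamps are hypothetical, and hours with no tweets are kept with a count of 0.

```r
# Hypothetical toy timestamps (UTC) standing in for a created_at column
created_at <- as.POSIXct(c("2024-01-01 09:05:00", "2024-01-01 09:40:00",
                           "2024-01-01 11:15:00", "2024-01-01 11:20:00",
                           "2024-01-01 11:59:00"), tz = "UTC")

# Round each timestamp down to the start of its hour
hour_start <- as.POSIXct(trunc(created_at, units = "hours"))

# Complete hourly grid from the first to the last hour
hours <- seq(min(hour_start), max(hour_start), by = "hour")

# Count tweets per hour; factor levels come from the grid, so empty hours get 0
hour_key <- format(hour_start, "%Y-%m-%d %H:%M")
counts <- table(factor(hour_key, levels = format(hours, "%Y-%m-%d %H:%M")))

# One row per hour: time and number of tweets (n)
tweet_ts <- data.frame(time = hours, n = as.integer(counts))
tweet_ts
```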
2.16 Compare tweet frequencies for two brands
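To compare two brands, their per-interval counts need to be aligned on the same time axis. This sketch joins two hypothetical hourly count tables with merge(); the brand names and counts are made up, and an hour missing for one brand is treated as zero tweets.

```r
# Hypothetical hourly counts for two brands (made-up numbers)
hours <- as.POSIXct(c("2024-01-01 09:00", "2024-01-01 10:00",
                      "2024-01-01 11:00"), tz = "UTC")
brand_a <- data.frame(time = hours,     n_a = c(5L, 8L, 2L))
brand_b <- data.frame(time = hours[-2], n_b = c(4L, 9L))

# Full outer join on time; an hour missing for one brand becomes NA
both <- merge(brand_a, brand_b, by = "time", all = TRUE)

# Treat missing hours as zero tweets
both$n_a[is.na(both$n_a)] <- 0L
both$n_b[is.na(both$n_b)] <- 0L

# Which brand was tweeted about more in each hour? (ties go to brand_a)
both$leader <- ifelse(both$n_a >= both$n_b, "brand_a", "brand_b")
both
```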