Social Media Analysis - Importing Twitter Data Into Stata
A 2017 Stata blog post introduced the twitter2stata command, which imports Twitter data directly into Stata for analysis. The command was built using Stata’s Java plugin feature, which was improved with the release of Stata 15. You need a Twitter account to access the data, because you must set up an application within Twitter that Stata connects to through the twitter2stata command.
Twitter Application Setup Part 1:
1. Log into your Twitter account.
2. Click the arrow next to “Developer” in the top left corner, then click “Apply” at the bottom.
3. Add a mobile phone number to your account if you haven’t already.
4. Verify the phone number.
5. Select “I am requesting access for my own personal use”, name your developer account, select your country and click Next.
6. Select any use cases that apply. If it’s just for fun, it’s fine to select “Other”.
7. Answer the 4 questions. The purpose is to download Twitter data into Stata. If this is just for your personal use, then you’re not using the data with Twitter or displaying the content anywhere.
8. If you’re making the Twitter data you collect available to a government entity, answer “Yes” to the question about government use; otherwise answer “No” and click Continue.
9. Agree to the terms and conditions and click Apply.
Once you have received your developer approval, you can move on to setting up your Twitter app.
Twitter Application Setup Part 2:
1. Log into your Twitter developer account.
2. In the top right corner of your screen, click the drop-down arrow next to your name and click “Apps”.
3. On your “Apps” page, click the “Create an app” button in the top right.
4. The following steps are what I had to go through as a company rather than an individual. Because of this, some of the steps you see may differ from what I have shown below. Follow the instructions Twitter gives you to set up your app.
5. Give your app a name and give a description of what the app is for.
6. Give a website URL. If you have a website you can use that, otherwise you can just use Stata’s URL
7. Enter a description of your app. Feel free to use the one shown below.
8. Click Create.
9. Once created you will be shown the App details page for your app. There are 3 tabs across the top. Click on the “Keys and tokens” tab in the middle.
10. Click the “Create” button under the heading “Access token & access token secret”.
11. Copy your consumer API key, consumer API secret key, access token, and access token secret, and save them in a do-file to be run at the start of each twitter2stata session. An example of an appropriate do-file is shown below.
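A minimal sketch of such a do-file follows. The file name, the local macro names and the four placeholder strings are illustrative only; substitute the values from your own “Keys and tokens” tab. The setaccess subcommand is the one described in the Stata blog post, but check help twitter2stata for the exact syntax in your installed version.

```stata
* twitter_keys.do (hypothetical file name): run at the start of each session
* The four strings below are placeholders, not real credentials
local consumer_key        "YOUR_CONSUMER_API_KEY"
local consumer_secret     "YOUR_CONSUMER_API_SECRET_KEY"
local access_token        "YOUR_ACCESS_TOKEN"
local access_token_secret "YOUR_ACCESS_TOKEN_SECRET"

* Hand the credentials to twitter2stata; each local is referenced as `name'
twitter2stata setaccess "`consumer_key'" "`consumer_secret'" "`access_token'" "`access_token_secret'"
```

Local macros exist only while the do-file that defines them is running, so the setaccess line needs to sit in the same do-file as the local definitions.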
Note: each “local” item in the above do-file is referenced by opening with the grave accent key (`, next to the 1 key) and closing with the apostrophe key (', next to the Enter key). Do not use two apostrophes around the name in the twitter2stata setaccess line, as this will not work.
How to Use:
The twitter2stata command lets you:
Search tweets or users
Search a specific user’s tweets, likes, following and followers
In this example, I want to examine tweets about Season 3 of Riverdale, which started airing last week. I’m going to use the string #Riverdale to find all the tweets relating to the show and download them directly into Stata. I first run my do-file containing my tokens, as shown in the previous section, and then in the command pane I type the following:
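The command below is a hedged reconstruction, not necessarily the exact line used for this post. It assumes the credentials do-file is called twitter_keys.do (a hypothetical name) and that searchtweets accepts a clear option to replace the data in memory. The daterange() option discussed in the text is left out because its argument syntax is not shown here; see help twitter2stata for the options your version supports.

```stata
* Load the API credentials first (hypothetical file name)
do twitter_keys.do

* Search tweets containing the hashtag and load them into memory.
* clear (assumed option) replaces any dataset currently in memory.
twitter2stata searchtweets "#Riverdale", clear

* Check how many tweets and variables were downloaded
describe, short
count
```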
It takes some time to access and download all the tweets containing #Riverdale. The daterange() option I specified restricts the download to tweets from a particular date range, which can reach back at most 7 days from today. You can specify a start date and an end date if you only want tweets from part of the past week, but you cannot request tweets older than 7 days. As I haven’t specified a start or end date, Stata assumes I want all tweets from 7 days ago to today. This is appropriate here, as season 3 of Riverdale started approximately 7 days ago. Given the volume of tweets on this topic, it is unlikely that I will get every tweet made about #Riverdale in the past 7 days, but for this exercise a selection is enough. If you need all the relevant tweets on a particular topic, you may want to narrow down your search.
I retrieved a selection of 15,000 tweets containing the string #Riverdale. There are limits on how much data you can download in one go, and in this case the limit was 15,000 tweets. A lot of information is attached to each tweet, and the twitter2stata command has downloaded 45 variables. These include the tweet text, the user ID or @handle the tweet came from, the user’s name if given, the date the user’s account was created, the user description, the total number of tweets the user has liked, the user’s current number of followers, the number of accounts the user is currently following, the user’s language, the user’s specified location, the user’s latitude and longitude if geotagging is on, the list of countries the user’s tweets are withheld from, the date and time the tweet was sent, the type of device used to send the tweet (phone, tablet, desktop computer, laptop, etc.) and much more. It is amazing how much data can be gathered from a single tweet.
From here, I’m going to encode the variable "tweet_source", which contains information about what device the tweet was sent from. In the command pane I type the following:
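A sketch of this step, assuming the downloaded tweets are still in memory. The new variable name source is illustrative; encode stores the original strings as a value label with the same name as the new variable.

```stata
* Convert the string variable to a labelled numeric variable
* ("source" is an illustrative name for the new variable)
encode tweet_source, generate(source)

* List the numeric codes and the href strings attached to them
label list source
```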
There are 31 tweet sources; however, they appear as long href labels. I renamed all the labels. Below is an example of the command I used to do this, along with the tabulate command that shows how many tweets were posted per device used:
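A sketch of the relabelling, assuming the encoded variable and its value label are both called source (an illustrative name). The numeric codes below are placeholders, since the real codes depend on the alphabetical order of the href strings in your download; run label list source to look them up before editing.

```stata
* Overwrite long href labels with short names, keeping the numeric codes.
* Codes 1 and 2 are hypothetical; use the codes shown by -label list source-.
label define source 1 "Twitter for Android" 2 "Twitter for iPhone", modify

* Number of tweets per source, most frequent first
tabulate source, sort
```

The modify option changes only the listed codes and leaves the remaining labels as they were, so the sources can be renamed a few at a time.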
From this I might infer that someone who likes Riverdale is more likely to have an iPhone than any other single brand of phone, since Android, unlike iOS, runs on many different phone brands. It is worth noting that approximately 90% of all tweets collected were sent from a phone app.