Below is Python code to search Twitter and export the results to a CSV file. This is a good way to build up a database of text to use with Hadoop. I’ll be writing posts on analyzing Twitter data in the future.
The code contains the following variables, which hold your Twitter API credentials:
TWITTER_APP_KEY
TWITTER_APP_KEY_SECRET
TWITTER_ACCESS_TOKEN
TWITTER_ACCESS_TOKEN_SECRET
You must obtain these values by registering an app at https://dev.twitter.com/apps.
This code also uses the twython Python module, so you must install that as well (for example, with pip install twython).
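If you would rather not paste the keys into the script itself, one option is to read them from environment variables. This is only a minimal sketch: it assumes you have exported environment variables with the same names as the variables above, and load_credentials is a hypothetical helper of my own, not part of twython.

```python
import os

# The four values the script needs, assumed here to be set as
# environment variables with these exact names.
CREDENTIAL_NAMES = [
    "TWITTER_APP_KEY",
    "TWITTER_APP_KEY_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
]


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any are missing.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_NAMES}
```

Calling load_credentials() at the top of the script fails early with a readable message if a key is missing, rather than failing later with an authentication error from Twitter.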
Here is the code. I’m far from a Python expert (or even novice) so please provide any improvements in the comments.
import csv

from twython import Twython, TwythonError

# Supply the appropriate values
TWITTER_APP_KEY = ''
TWITTER_APP_KEY_SECRET = ''
TWITTER_ACCESS_TOKEN = ''
TWITTER_ACCESS_TOKEN_SECRET = ''

harvest_list = ['#snow']  # put the words you want to search for here

# One client can be reused for every keyword
twitter = Twython(app_key=TWITTER_APP_KEY,
                  app_secret=TWITTER_APP_KEY_SECRET,
                  oauth_token=TWITTER_ACCESS_TOKEN,
                  oauth_token_secret=TWITTER_ACCESS_TOKEN_SECRET)

# newline="" lets the csv module control line endings itself
with open("tweetfile.csv", "w", newline="", encoding="utf-8") as f:
    c = csv.writer(f)
    for tweet_keyword in harvest_list:
        try:
            # our search for the current keyword
            # (the search API returns at most 100 tweets per request)
            search_results = twitter.search(q=tweet_keyword, count=100, lang='en')
        except TwythonError as e:
            print(e)
            continue  # no results for this keyword, move on to the next one
        for tweet in search_results['statuses']:
            try:
                # double up single quotes and drop semicolons before writing
                c.writerow([tweet['text'].replace("'", "''").replace(';', '')])
            except Exception as e:
                print("############### Unexpected error:", e, "###############")
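To try the cleanup and CSV-writing step without calling Twitter at all, here is a small offline sketch. The sample tweets are made up and only mimic the shape of search_results['statuses'] (a list of dicts with a 'text' key); clean_text and write_tweets are hypothetical helper names, not part of twython.

```python
import csv
import io


def clean_text(text):
    # Same cleanup as the script: double single quotes, drop semicolons
    return text.replace("'", "''").replace(";", "")


def write_tweets(statuses, out):
    """Write one cleaned tweet text per CSV row and return the row count."""
    writer = csv.writer(out)
    count = 0
    for tweet in statuses:
        writer.writerow([clean_text(tweet["text"])])
        count += 1
    return count


# Made-up sample data shaped like search_results['statuses']
sample = [
    {"text": "It's snowing; again"},
    {"text": "Snow, snow\neverywhere"},
]

buf = io.StringIO()  # stands in for the open CSV file
n = write_tweets(sample, buf)
print(n, "rows written")
print(buf.getvalue())
```

Note that the csv module already quotes fields containing commas or line breaks, so the second sample tweet still ends up as a single row.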