Uses the Twitter Streaming API to collect real-time tweets for recent high-traffic events and persists them to Elasticsearch. The stored tweets can later be filtered through a REST API.
Python 2.7+, pip, Elasticsearch, Twitter developer app
Note: To create a Twitter developer app, visit the Twitter Application Management page.
- Move to <project-dir>, create a virtual environment, and activate it:
  $ cd <project-dir>
  $ virtualenv .environment
  $ source .environment/bin/activate
- Copy settings_sample.py to settings.py, then edit the settings related to the Twitter developer app:
  $ cp settings_sample.py settings.py
- Add the project to PYTHONPATH:
  $ export PYTHONPATH="$PYTHONPATH:." # . corresponds to the current directory (project-dir)
  If you are using PyCharm, this can be set under the run configuration instead.
- Under <project-dir>, install the requirements/dependencies:
  $ pip install -r requirements.txt
- Then run app.py:
  $ python app.py
Now you can access the application by visiting {protocol}://{host}:{port}. For localhost it is http://localhost:5000.
Congratulations! Start streaming; the data can later be filtered using the Funneling API.
Fields: In Elasticsearch, every tweet document under tweets_index contains the following fields:
- tweet_text: string
- screen_name: string
- user_name: string
- location: string
- source_device: string
- is_retweeted: boolean
- retweet_count: integer
- country: string
- country_code: string
- reply_count: integer
- favorite_count: integer
- created_at: datetime
- timestamp_ms: long
- lang: string
- hashtags: array
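As an illustration of how a raw streaming payload could be flattened into these fields, here is a minimal sketch. It is not the project's actual code: the function name `to_document` is hypothetical, and the mapping assumes the Twitter v1.1 tweet object layout (`text`, `user.screen_name`, `user.name`, `place.country`, `entities.hashtags`, and so on), which may differ from what the application really reads.

```python
import re
from datetime import datetime

# Format of `created_at` in a Twitter v1.1 payload,
# e.g. "Sun Dec 17 14:19:26 +0000 2017" (always UTC).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def to_document(raw):
    """Flatten a raw tweet payload (dict) into the fields listed above.

    Hypothetical helper: the payload keys are assumptions based on the
    Twitter v1.1 tweet object, not taken from this project's source.
    """
    user = raw.get("user") or {}
    place = raw.get("place") or {}
    entities = raw.get("entities") or {}
    created = datetime.strptime(raw["created_at"], TWITTER_TIME_FORMAT)
    return {
        "tweet_text": raw.get("text") or "",
        "screen_name": user.get("screen_name") or "",
        "user_name": user.get("name") or "",
        "location": user.get("location") or "",
        # `source` arrives as an HTML anchor; keep only its visible label.
        "source_device": re.sub(r"<[^>]+>", "", raw.get("source") or ""),
        "is_retweeted": bool(raw.get("retweeted", False)),
        "retweet_count": int(raw.get("retweet_count") or 0),
        "country": place.get("country") or "",
        "country_code": place.get("country_code") or "",
        "reply_count": int(raw.get("reply_count") or 0),
        "favorite_count": int(raw.get("favorite_count") or 0),
        "created_at": created.strftime("%Y-%m-%dT%H:%M:%S"),
        "timestamp_ms": int(raw.get("timestamp_ms") or 0),
        "lang": raw.get("lang") or "",
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
    }
```

Missing or null values fall back to an empty string or zero, so every document carries the full set of fields.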
Operators: The following operators are available to filter/query tweets:
- equals: Exact match, or the = operator for numeric/datetime values.
- contains: Full-text search.
- wildcard: Pattern match, where * stands for any sequence of characters:
  - startswith: ind* (starts with ind)
  - endswith: *ind (ends with ind)
  - wildcard: *ind* (matches ind anywhere in the string)
- gte: >= operator for numeric/datetime values.
- gt: > operator for numeric/datetime values.
- lte: <= operator for numeric/datetime values.
- lt: < operator for numeric/datetime values.
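To make the operators more concrete, the sketch below shows one plausible way each could map onto an Elasticsearch Query DSL clause. This is an assumption for illustration, not the project's implementation: `to_clause` is a hypothetical name, and the choice of `term` for equals and `match` for contains is a guess (a `term` query only matches exactly on non-analyzed fields).

```python
RANGE_OPERATORS = ("gte", "gt", "lte", "lt")


def to_clause(field, operator, query):
    """Translate one (field, operator, query) triple into a Query DSL clause.

    Hypothetical mapping; the real application may build its queries
    differently.
    """
    if operator == "equals":
        # Exact match on the stored value.
        return {"term": {field: query}}
    if operator == "contains":
        # Analyzed full-text search.
        return {"match": {field: query}}
    if operator == "wildcard":
        # `*` matches any sequence of characters, e.g. "*ind*".
        return {"wildcard": {field: query}}
    if operator in RANGE_OPERATORS:
        # gte, gt, lte and lt are also the key names used by `range`.
        return {"range": {field: {operator: query}}}
    raise ValueError("Unsupported operator: %r" % (operator,))
```

For example, `to_clause("created_at", "gte", "2017-12-17T14:18:13")` yields a `range` clause on `created_at`.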
GET /stream?keywords=cricket,hockey,virat
Starts streaming real-time tweets that contain the given keywords. The tweets are persisted in Elasticsearch under the index tweets_index with the document type tweet.
Response
{
"status": "success",
"message": "Started streaming tweets with keywords [u'cricket', u'hockey', u'virat']"
}
POST /funnel?from=0&size=20
Note: from and size can be used for limit/pagination; both are optional, and the default size is 100.
Request body
{
"sort":["created_at"], // Prefix a field with '-' for 'desc' order.
"criteria": {
"AND": [{
"fields": ["created_at"],
"operator": "gte", // equals, contains, wildcard, gte, gt, lte, lt
"query": "2017-12-17T14:18:13"
}, {
"fields": ["location"],
"operator": "wildcard",
"query": "*ind*"
}, {
"fields": ["hashtags"], // 'hashtags' is an array field.
"operator": "contains",
"query": "Cricket"
}
],
"OR": [{
"fields": ["hashtags"],
"operator": "contains",
"query": "cricket"
}, {
"fields": ["hashtags"],
"operator": "contains",
"query": "hockey"
}
],
"NOT": [{
"fields": ["source_device"],
"operator": "equals",
"query": "Twitter for Android"
}
]
}
}
Response
{
"count": {
"total": 21,
"fetched": 10
},
"results": [
{
"sort": [
1513520366000
],
"_type": "tweet",
"_source": {
"lang": "in",
"is_retweeted": false,
"retweet_count": 0,
"screen_name": "T10CricketLive",
"country": "",
"created_at": "2017-12-17T14:19:26",
"hashtags": [
"IndvSL",
"Cricket"
],
"tweet_text": "Ind 193/2 (30 ov), need 23. Karthik 15(24), Dhawan 87(79). Bowling figures of Akila Dananjaya so far: 7-0-48-1. #IndvSL #Cricket",
"source_device": "IFTTT",
"reply_count": 0,
"location": "New Delhi, India",
"country_code": "",
"timestamp_ms": "1513520366428",
"user_name": "cricGuru5167",
"favorite_count": 0
},
"_score": null,
"_index": "tweets_index",
"_id": "AWBk2AUVU3yhj98vAeu_"
},
{......},
{......},
{......},
{......},
]
}
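The funnel endpoint can also be called from a script. The following is a minimal client sketch using only the standard library; the helper names (`funnel_url`, `funnel_body`, `funnel`) are hypothetical, the base URL assumes the localhost setup described earlier, and sending the body with a `Content-Type: application/json` header is an assumption about what the application expects.

```python
import json

try:  # Python 3
    from urllib.parse import urlencode
    from urllib.request import Request, urlopen
except ImportError:  # Python 2
    from urllib import urlencode
    from urllib2 import Request, urlopen

BASE_URL = "http://localhost:5000"  # assumed local development address


def funnel_url(start=0, size=100, base_url=BASE_URL):
    """Build the /funnel URL with the optional from/size parameters."""
    return "%s/funnel?%s" % (base_url, urlencode([("from", start), ("size", size)]))


def funnel_body(criteria, sort=None):
    """Build the request body; `sort` entries may start with '-' for desc."""
    body = {"criteria": criteria}
    if sort:
        body["sort"] = list(sort)
    return body


def funnel(criteria, sort=None, start=0, size=100):
    """POST the criteria to the running application and return parsed JSON."""
    data = json.dumps(funnel_body(criteria, sort)).encode("utf-8")
    request = Request(
        funnel_url(start, size),
        data=data,
        headers={"Content-Type": "application/json"},
    )
    response = urlopen(request)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
```

With the application running, a call such as `funnel({"AND": [{"fields": ["hashtags"], "operator": "contains", "query": "cricket"}]}, sort=["-created_at"], size=20)` should return a dictionary with the `count` and `results` keys shown in the response above. Increasing `start` by `size` on each call pages through the matches.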