Building a Twitter sentiment analysis platform with AWS

Director at Yellow Mango | AWS Certified Solutions Architect | Experienced Business Solutions Architect | Agile

Overview
Twelve months ago we started to look at Twitter sentiment analysis and how organisations can use this data to better understand their products, brands and services. We launched a project called ‘BrandSocialable.com’, which aimed to give users a platform where they could view and compare sentiment for their brands. After a couple of months of development we had a beta in place; however, due to time constraints with other clients it wasn’t something we could continue to actively pursue, so we put it on hold.
So why the renewed interest in Twitter sentiment analysis? There have been two drivers for re-evaluating the project. First, I’m currently studying the AWS Associate Architect course, learning all things AWS and the power of on-demand resources. Second, another client approached us with a similar use case: using sentiment analysis to track the sentiment of movies currently being shown at cinemas.
Approach
After a couple of hours familiarising myself with my original Python code, my Twitter mining script was up and running again. However, I wanted to move on from a relational MySQL datastore to something that scaled better and offered opportunities for machine learning.
As an AWS Registered Partner we have access to a wealth of AWS-related resources, so I decided to see if there was anything we could reuse or learn from to better our offering. It didn’t take me long to stumble across a great article by Ben Snively and Viral Desai: https://aws.amazon.com/blogs/machine-learning/build-a-social-media-dashboard-using-machine-learning-and-bi-services/
The article details how AWS services can be used to build a social media dashboard using machine learning and BI services. The authors also provide a link so you can provision the stack yourself, which makes it simple to get started.
The general approach is very similar to my original concept: keyword search terms (e.g. Jurassic Park movie) are sent to the Twitter API and a series of tweets is returned. Once the tweets have been stored, NLP (natural language processing) is used to determine whether the content of each tweet is positive, neutral or negative.
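To make the classification step concrete, here is a minimal sketch of labelling tweets as positive, neutral or negative. It is not the NLP used in either implementation: the word lists and the function names `classify_sentiment` and `label_tweets` are hypothetical, and a simple word count stands in for a real NLP model or service.

```python
import re

# Hypothetical toy lexicons; a real pipeline would call a trained NLP model or service.
POSITIVE_WORDS = {"great", "love", "amazing", "good", "brilliant"}
NEGATIVE_WORDS = {"bad", "awful", "boring", "hate", "terrible"}


def classify_sentiment(text):
    """Label text as positive, neutral or negative by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


def label_tweets(tweets):
    """Attach a sentiment label to each tweet text returned by a keyword search."""
    return [{"text": tweet, "sentiment": classify_sentiment(tweet)} for tweet in tweets]
```

Calling `label_tweets` on a batch of tweet texts returns one record per tweet, which is the shape of data the rest of the pipeline stores and reports on.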
In my original design, tweet content and sentiment were piped directly into a MySQL DB, whereas the approach suggested in the article uses S3 buckets. By using S3 I wasn’t restricted by DB limits, which was a benefit. However, in the future I may consider adjusting the architecture so that a NoSQL DB such as MongoDB or DynamoDB is used, so that I can explore Elasticsearch capabilities.
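As an illustration of what file-based storage involves, the sketch below serialises a batch of tweet records as newline-delimited JSON and builds a date-partitioned object key. The key layout, the `raw` prefix and both function names are assumptions for illustration, not the layout used by the article's stack; the actual upload (for example with an S3 client) is deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_object_key(prefix, fetched_at, batch_id):
    """Build a date-partitioned object key (hypothetical layout)."""
    return f"{prefix}/{fetched_at:%Y/%m/%d}/{batch_id}.json"


def to_json_lines(records):
    """Serialise records as newline-delimited JSON, one tweet per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records) + "\n"


if __name__ == "__main__":
    batch = [{"text": "Loved the new Jurassic Park", "sentiment": "positive"}]
    key = build_object_key("raw", datetime.now(timezone.utc), "batch-001")
    body = to_json_lines(batch)
    # The body would then be uploaded to the bucket under this key.
    print(key)
    print(body, end="")
```

Writing one JSON record per line keeps each file easy to append to and easy for query tools to scan record by record.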
An overview of the architecture can be found below.

Getting started
Once I had provisioned the infrastructure and adjusted the EC2 instance type (from medium to micro, given that the usage demands are minimal), I configured Amazon Athena so that it could start to discover the raw files in the S3 buckets. As is the case with most AWS products, setting up Athena was easy, and I tested a number of SQL queries to validate the data.
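The kind of validation query involved can be sketched as follows. The table name `tweet_sentiments` and its columns are hypothetical, and an in-memory SQLite database stands in for Athena so the example runs locally; the query itself is plain SQL of the sort a SQL engine over the S3 files would accept.

```python
import sqlite3

# Hypothetical table and column names; adjust to match the real schema.
VALIDATION_QUERY = """
SELECT sentiment, COUNT(*) AS tweet_count
FROM tweet_sentiments
GROUP BY sentiment
ORDER BY sentiment
"""


def count_by_sentiment(rows):
    """Load (tweet_id, sentiment) rows into a local table and count tweets per label."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tweet_sentiments (tweet_id TEXT, sentiment TEXT)")
        conn.executemany("INSERT INTO tweet_sentiments VALUES (?, ?)", rows)
        return conn.execute(VALIDATION_QUERY).fetchall()
    finally:
        conn.close()
```

A quick count per label like this is a cheap sanity check that rows are arriving and that every tweet carries one of the three expected sentiment values.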
Now that the Twitter data was being retrieved and stored in S3, with Amazon Athena providing a relational view of the data, it was time to connect a data dashboard. The article suggested Amazon QuickSight, which connected easily and directly to the Athena data source once the access groups in AWS were amended. In my original implementation I made use of Chart.js to visualise the data, and in my opinion this is a better choice given the wide range of chart types and integration options to choose from.
Amazon QuickSight provides a user-friendly interface that allows you to generate dashboards quickly, and within a matter of minutes I was able to generate charts based on the stored data. QuickSight refreshes the data from Athena on request or on a schedule and, once it has processed the data, stores a local copy in SPICE, QuickSight's in-memory engine. This helps performance when generating reports; however, you need to be mindful of the data limits and the associated costs of adding more capacity as the SPICE dataset grows.
An extract of the dashboard from Amazon QuickSight is shown below.

Applicable use cases
Now that the platform is built out, with tweets and sentiment data available via QuickSight, we can start to look at more use cases for it. During the initial implementation the intention was to track brands, specifically those within the e-commerce/fashion domain such as Boohoo, SimplyBe and Missguided. I could then begin to track and categorise the tweets, allowing organisations to identify areas for improvement or learn what works well; for example, customer services, delivery queries and product quality were popular categories.
Given the implemented architecture, scaling the platform out into new verticals takes little effort: it is a matter of adjusting the terms that the JS file sends through to the Twitter API.
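The idea of switching verticals by changing search terms can be sketched as below. This is written in Python rather than as the JS file the stack uses, and the `SEARCH_TERMS` mapping, the vertical names and `build_track_query` are all hypothetical; joining terms with OR and quoting multi-word phrases is an assumption about how the search string is built.

```python
# Hypothetical mapping of verticals to the search terms tracked for each one.
SEARCH_TERMS = {
    "fashion": ["Boohoo", "SimplyBe", "Missguided"],
    "movies": ["Jurassic Park movie"],
}


def build_track_query(vertical):
    """Join a vertical's terms with OR, quoting multi-word terms as phrases."""
    terms = SEARCH_TERMS[vertical]
    return " OR ".join(f'"{term}"' if " " in term else term for term in terms)
```

Adding a new vertical then means adding one entry to the mapping, with no change to the rest of the pipeline.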
So what other use cases can be considered with this platform? I’ve outlined some initial thoughts, but I’m sure there are others I’ve missed:
- Tracking brand sentiment by geography
- Monitoring the sentiment of a product or service launch
- Identification of negative tweets and addressing customer concerns
- Automation of follow-up activity when either positive or negative sentiment is identified, e.g. trigger a paid ad campaign when sentiment is negative, or email the marketing team if an influencer is positive about a product
- Identification of tweet categories and sentiment e.g. customer services
Next steps
Personally I believe sentiment analysis is an extremely important consideration for any organisation (profit and not-for-profit), whether purely online, hybrid or traditional offline. Understanding how customers ‘feel’ about your brand, product or service should be at the top, or at least near the top, of how a business measures success.
With regard to architectural considerations, as mentioned earlier I would like to review piping the data into NoSQL storage so that I can begin running machine learning algorithms on the data, looking for patterns and including predictive analysis. Going back to the movie sentiment example I mentioned earlier, imagine a scenario where, based on historic data, we could predict that action movies screened in Manchester and Newcastle have a positive sentiment whereas thriller movies have a positive sentiment in Edinburgh.
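A first step towards that kind of analysis is simply summarising historic sentiment by city and genre. The sketch below is descriptive aggregation rather than a predictive model, and the record fields (`city`, `genre`, `sentiment`), the score mapping and the function name are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical numeric scores for the three sentiment labels.
SCORES = {"positive": 1, "neutral": 0, "negative": -1}


def average_sentiment(records):
    """Return the mean sentiment score for each (city, genre) pair."""
    totals = defaultdict(lambda: [0, 0])  # (city, genre) -> [score sum, tweet count]
    for record in records:
        key = (record["city"], record["genre"])
        totals[key][0] += SCORES[record["sentiment"]]
        totals[key][1] += 1
    return {key: score_sum / count for key, (score_sum, count) in totals.items()}
```

Averages like these could serve as a baseline, or as input features, for whatever machine learning approach is eventually run over the stored data.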
I feel there are opportunities to invoke events using Lambda functions based on the stored sentiment values, such as notifying a marketing team when negative sentiment is identified in a particular geographic area so that follow-up action can be taken with offline/online campaigns. A similar scenario might be that when a Twitter influencer positively mentions your brand, service or product, follow-up action is triggered to provision additional AWS services proactively, to cope with increased demand before potential customers visit your website or app.
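As a rough sketch of the notification scenario, a Python Lambda handler could scan a batch of sentiment records and flag regions with a cluster of negative tweets. The event shape (`records` with `sentiment` and `region` fields) and the threshold are hypothetical, and the handler only returns the regions to alert on; actually sending the notification is left out.

```python
# Hypothetical threshold: negative tweets per region needed before alerting.
NEGATIVE_THRESHOLD = 3


def lambda_handler(event, context):
    """Return the regions whose negative tweet count reaches the threshold."""
    negative_counts = {}
    for record in event.get("records", []):
        if record.get("sentiment") == "negative":
            region = record.get("region", "unknown")
            negative_counts[region] = negative_counts.get(region, 0) + 1
    alert_regions = sorted(
        region for region, count in negative_counts.items()
        if count >= NEGATIVE_THRESHOLD
    )
    # A real function might notify the marketing team for each region here.
    return {"alert_regions": alert_regions}
```

Using a threshold rather than alerting on every negative tweet keeps the marketing team from being paged for isolated complaints.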
If you are interested in using the sentiment analysis platform we have developed, or would like to learn how your organisation could use these techniques, then get in touch with our team; we'd love to hear from you.