Put simply, my internship with Saatchi and Saatchi Wellness was a magical experience.
This summer, I had the opportunity to work as a Data Analytics intern at Saatchi & Saatchi Wellness (SSW), a healthcare advertising agency that is part of Publicis Health. A bit about myself: I am a student at New York University working towards my Master’s degree in Applied Statistics. When I first joined the team at the start of my internship, what immediately struck me was the diversity of skill sets across the analytics team. The SSW analytics team is a group of bright people from different backgrounds (data strategy, data engineering, and data science) who share a common passion: using data to tell stories. They were also open about their own experiences and consistently pointed me towards key resources that would help me become a better analytics professional.
The most valuable lesson I learned in my time with SSW is that superior soft skills are just as important as superior technical skills in being a great data scientist. Not only do you need the technical chops to get your hands dirty and dive deep into the data, but you also need the business savvy to understand and articulate the key challenges at hand so that you can draw out relevant, actionable insights. I definitely feel like I’ve grown as a data scientist, and I am grateful for the people I had the chance to work with.
Learning how to use APIs to analyze Twitter data
Web scraping and working with APIs were areas I had always wanted to explore before beginning my internship. My two months on the data and analytics team gave me the opportunity and freedom to explore my areas of interest, and so I set out to do just that.
Web scraping, or web content extraction, can prove useful to marketers in a number of ways. At its core, web content extraction is different from simply leveraging third-party social listening tools that aggregate social media metrics. While these tools are powerful for certain specific tasks, writing code that taps into APIs gives us the ability to gather raw data in a flexible, scalable fashion. With the right data, we can build customized data visualizations as well as robust statistical and machine learning models.
An API (application programming interface) essentially gives developers a pre-defined set of requests and response formats for building applications that access the features or data of an operating system, application, or other service.
For my internship project, I chose to work with Twitter’s web-based API. Unless you’ve been living under a rock for the past decade, you know that Twitter users broadcast short messages called tweets, which others can interact with by retweeting or responding. Tweets are limited to 280 characters, which forces users to be brief in the messages they want to share. As of 2019, there are over 120 million daily unique users, making Twitter a powerful platform for understanding the voice of the consumer at scale.
What Data Can Scraping Tweets Give Us?
- Brand Twitter handles show what subjects a brand’s followers are tweeting about. This can give marketing actionable insights into the real-time concerns of existing or potential customers.
- Influencer Twitter handles allow a topic analysis of an influencer’s followers and of the brand’s influencers. This can complement the picture given by opinion polls.
First, I needed to create a Twitter Developer Account to get API tokens. Once your account is approved, you create an app with the permissions that you want. Take note of the access token and access secret: you will need those keys when authorizing your requests.
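A minimal sketch of one way to keep those keys out of the script itself, assuming they have been exported as environment variables. The variable names and the `load_credentials` helper are hypothetical, and the consumer key and secret are included on the assumption that your client library asks for them alongside the access token and secret.

```python
import os

# Hypothetical environment variable names; use whatever names you exported.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

The returned dictionary can then be passed to whichever client library handles the authorization step, so the secrets never appear in code you share.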
The API documentation is laid out on the developer website, which allowed me to start with a small piece of code and extend it with the features I needed. With API access, you can retrieve public tweets posted by a given user, tweets containing a particular term or combination of terms, tweets posted within a specific date range, and so on.
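To make those kinds of queries concrete, here is a rough sketch that only builds a search URL and sends nothing. It assumes the v1.1 standard search endpoint and its `from:`, `since:` and `until:` query operators, so check the current documentation before relying on them. `build_search_url` is a hypothetical helper, not part of any Twitter library.

```python
from urllib.parse import urlencode

# Assumed endpoint: the v1.1 standard search API.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(terms, user=None, since=None, until=None, count=100):
    """Build (but do not send) a search request URL.

    terms: words that must all appear in the tweet
    user:  restrict results to tweets from this handle (without the @)
    since/until: date strings in YYYY-MM-DD form
    """
    parts = list(terms)
    if user:
        parts.append("from:" + user)
    if since:
        parts.append("since:" + since)
    if until:
        parts.append("until:" + until)
    params = {"q": " ".join(parts), "count": count}
    # urlencode percent-encodes the query, e.g. ":" becomes "%3A".
    return SEARCH_URL + "?" + urlencode(params)
```

A real request would still need to be signed with the credentials from your developer account.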
One thing I quickly learned when scraping a bunch of tweets at a time is that there are rate limits for developers. Twitter doesn’t allow users to gather a large number of tweets at once. That being said, I was able to gather enough data for a representative sample from which to glean actionable insights.
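One common way to live with rate limits is to pause and retry when the API pushes back. The sketch below shows the idea with placeholder pieces: `RateLimitError` and `fetch_page` are hypothetical stand-ins for whatever your client library raises and calls, and the 15-minute default wait is an assumption based on the length of Twitter’s rate limit windows.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises when rate limited."""


def fetch_all(fetch_page, max_pages, wait_seconds=15 * 60, max_waits=10,
              sleep=time.sleep):
    """Collect up to max_pages pages, pausing whenever the rate limit is hit.

    fetch_page(page_number) is a hypothetical callable that returns a list of
    tweets, or an empty list when there is nothing left to fetch.
    """
    tweets = []
    page = 0
    waits = 0
    while page < max_pages:
        try:
            batch = fetch_page(page)
        except RateLimitError:
            if waits >= max_waits:
                raise  # give up rather than wait forever
            waits += 1
            sleep(wait_seconds)  # wait out the window, then retry the same page
            continue
        if not batch:
            break
        tweets.extend(batch)
        page += 1
    return tweets
```

Passing `sleep` in as a parameter makes the loop easy to try out without actually waiting 15 minutes.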
After gathering the data, the next step was to wrangle it into a suitable format for analysis. First, I created one document per follower by aggregating all of that follower’s tweets. Each follower then has a unique document, which becomes the basis of our analysis. Further cleansing is needed to make the data ready for analysis, including removing URLs, removing documents with fewer than 100 words (as they may not be informative), removing stop words, etc.
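The wrangling steps above can be sketched in a few lines of standard-library Python. This is an illustration under assumptions rather than the exact pipeline: the input is taken to be (follower, tweet text) pairs, the stop word list is a tiny placeholder, and the word count is checked before stop words are removed.

```python
import re
from collections import defaultdict

URL_PATTERN = re.compile(r"https?://\S+")
WORD_PATTERN = re.compile(r"[a-z']+")

# Tiny illustrative stop word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "it", "for", "on"}


def build_documents(tweets, min_words=100):
    """Turn (follower, tweet_text) pairs into one cleaned word list per follower."""
    # Step 1: aggregate every tweet under the follower who wrote it.
    texts = defaultdict(list)
    for follower, text in tweets:
        texts[follower].append(text)

    documents = {}
    for follower, parts in texts.items():
        # Step 2: strip URLs, lowercase, and split into words.
        combined = URL_PATTERN.sub(" ", " ".join(parts)).lower()
        words = WORD_PATTERN.findall(combined)
        # Step 3: drop documents that are too short to be informative.
        if len(words) < min_words:
            continue
        # Step 4: remove stop words.
        documents[follower] = [w for w in words if w not in STOP_WORDS]
    return documents
```

For example, `build_documents(pairs, min_words=100)` returns a dictionary keyed by follower, which is a convenient starting point for topic or sentiment models.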
Once a Twitter data pipeline is established, a whole host of data science applications based on key business questions are now possible. A few areas that I explored:
- Creating maps of where in the world a brand/company is tweeted about most
- Running sentiment analysis on tweets to see if the overall opinion of a brand/company is positive or negative
- Creating social graphs of the most popular users who tweet about a brand/company
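As a small illustration of the social graph idea in the last bullet, the sketch below counts who mentions whom and ranks users by incoming mentions. It assumes tweets arrive as (author, text) pairs and treats any `@name` in the text as a mention. Both function names are hypothetical, and a fuller analysis would likely hand the edges to a dedicated graph library.

```python
import re
from collections import Counter

MENTION_PATTERN = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count (author, mentioned_user) pairs across (author, text) tweets."""
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION_PATTERN.findall(text):
            # Ignore self-mentions; compare handles case-insensitively.
            if mentioned.lower() != author.lower():
                edges[(author.lower(), mentioned.lower())] += 1
    return edges


def most_mentioned(edges, top_n=3):
    """Rank users by how many mentions point at them."""
    totals = Counter()
    for (_, mentioned), weight in edges.items():
        totals[mentioned] += weight
    return totals.most_common(top_n)
```

Each key in `edges` is a directed link in the graph, and the counts can serve as edge weights when drawing it.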
Essentially, the Twitter API allows you to build a corpus of relevant social data, which we can continually repurpose based on key business questions and areas of curiosity. During my internship, I only scratched the surface of what is possible, and I’m excited to continue to experiment with additional use cases.
My summer internship with SSW Analytics taught me that data-related projects ultimately require grit, passion, and perseverance. They involve understanding the business context at hand, moving through multiple iterations of trial and error (and a lot of code), and ultimately tying actionable insights back to the business context in a clean, presentable, and strategically relevant format. Above all, I learned that it’s OK to make mistakes, ask for advice, and collaborate with people from different backgrounds, because at the end of the day, learning in data analytics takes time, experimentation, constant iteration, and teamwork.
Written By Christina Ho
2019 Data Analytics Intern, Saatchi & Saatchi Wellness
I am currently a second-year Master’s student in Applied Statistics at New York University. I also have a Bachelor’s in Communication and Business. I interned with the Saatchi team in the summer of 2019. Previously, I worked in advertising and marketing strategy, and I hope to combine my industry knowledge and communication skills with business intelligence, machine learning, and statistical methodology. I advocate for the importance of data-driven, fact-based research as a crucial factor in decision-making.