My Internship with SSW Analytics: How to Build an Innovative Data Science Product in Three Easy Steps

At the beginning of my nine-week internship this summer as the Data Science and Engineering Intern at SSW Analytics, I was given a deceptively challenging assignment: “Here’s some data about health insurance claims. See if you can use it to tell where doctors are referring their patients.” How does one take a single question and turn it into a meaningful analysis? And how can that analysis then be developed into a replicable, scalable product offering?

In my nine weeks with SSW Analytics, here’s what I learned:

Step 1: Ask lots of questions

I had a terrific mentor in our lead Data Scientist, Vladislav Ryvkin, who was always willing to answer any question I had, and I had a lot of questions. Because I had no previous experience in the healthcare domain, these questions ranged from basic abbreviations and definitions (is a “Doctor” different from a “Practitioner” or an “HCP”?) to broader questions about, for instance, how best to approach missing data in a particular case, or what exactly the final product needed to do.

These kinds of questions are an often overlooked but essential first step in building a data science product of any value. A successful analytics team therefore needs an open and safe environment where people are comfortable asking something embarrassingly simple. No single person can be expected to be an expert on every component of a modern data-driven software project, and leaving each person to make their own assumptions at every step leads to an unfocused product with multiple conflicting visions and goals.

Questions don’t stop once you start writing code, either. On the contrary, each new piece of code tends to raise another round of questions and decisions. Over the summer, I was able to tap into the wealth of knowledge at Saatchi & Saatchi Wellness through conversations and ideation sessions with people all across the agency: from my Analytics teammates to Brand Strategists, and even actual doctors in our Medical Strategy department. All of these discussions helped me better understand how to build and improve what eventually became ReferralMap.

Steps 2 to N: Itirate Itereight Iterate!

Now comes the fun part: actually writing code! (Note: this is also the incredibly frustrating part.)

In data science, choosing the right tool for the job is very important. In my experience, during the initial build stage of a data science project it is better to work with flexible, responsive tools that produce a slower program than with rigid tools that yield a blazingly fast program only after a long and arduous development process.

This meant a slower but more expressive language: in this case, Python. It also meant Jupyter notebooks, about which I will skip the lengthy but deserved praise. In short, they are incredibly useful for the kind of development where exploratory analysis mixes with writing production code for the final product, and where responsive, iterative development is called for.

Writing code and fixing bugs is not the only place where iteration is important. The requirements of the project change over time as a better understanding develops regarding what is possible and what is necessary, and goals need to be adjusted in response to these changes.

Showing interim results to Kevin Troyanos answered a number of questions I had, including some I hadn’t even realized I had. Collaborating freely with team members from other disciplines, each with a different perspective, led down a number of very interesting and ultimately fruitful paths.

For instance, one discussion led to the use of unsupervised techniques such as automated community detection to identify referral subnetworks:

[Figure: ClusterMap]
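The post doesn’t say which community-detection algorithm ReferralMap used, so what follows is only a minimal sketch of the idea, using weighted label propagation as a stand-in: every practitioner starts in its own community and repeatedly adopts the label carrying the most referral weight among its neighbours, until nothing changes. The function name, the `(practitioner_a, practitioner_b, referral_count)` edge format, and the use of referral counts as weights are illustrative assumptions; a production tool would more likely lean on a graph library such as NetworkX, whose `networkx.algorithms.community` module ships several detection methods.

```python
from collections import defaultdict


def label_propagation(weighted_edges, max_iter=100):
    """Split an undirected, weighted referral graph into communities.

    weighted_edges: iterable of (u, v, weight) tuples, where weight is a
    hypothetical referral count between practitioners u and v.
    Returns a list of sets of practitioner IDs, ordered by smallest member.
    """
    # Build a symmetric adjacency map, summing weights of repeated pairs.
    neighbours = defaultdict(dict)
    for u, v, w in weighted_edges:
        if u == v:
            continue  # self-referrals carry no community information
        neighbours[u][v] = neighbours[u].get(v, 0) + w
        neighbours[v][u] = neighbours[v].get(u, 0) + w

    # Every node starts in its own community, labelled by its own ID.
    labels = {node: node for node in neighbours}
    for _ in range(max_iter):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps runs reproducible
            # Total edge weight pointing at each neighbouring label.
            scores = defaultdict(float)
            for nbr, w in neighbours[node].items():
                scores[labels[nbr]] += w
            best = max(scores.values())
            top = [lbl for lbl, s in scores.items() if s == best]
            # Keep the current label if it ties for best; otherwise take
            # the smallest winning label (deterministic tie-break).
            new_label = labels[node] if labels[node] in top else min(top)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged: no node switched communities

    communities = defaultdict(set)
    for node, lbl in labels.items():
        communities[lbl].add(node)
    return sorted(communities.values(), key=min)
```

For example, two tight clusters of practitioners that refer heavily among themselves but are joined by a single weak referral link come back as two separate communities. The textbook algorithm breaks ties at random; the deterministic tie-break here trades that randomness for reproducible output.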

Another conversation led to using geospatial data to map the network topology geographically and identify patterns in referral distance:

[Figure: NetworkMap]
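Measuring referral distance boils down to computing the great-circle distance between the referring and receiving practitioners. As a rough sketch, assuming each practitioner ID has already been geocoded to a latitude/longitude pair (the `locations` dictionary and both function names below are hypothetical), the haversine formula does the job with nothing but the standard library:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def referral_distances(referrals, locations):
    """Attach a distance to each (referrer, receiver) pair.

    referrals: iterable of (referrer_id, receiver_id) tuples.
    locations: hypothetical dict mapping practitioner ID -> (lat, lon).
    Pairs where either practitioner has no known location are skipped.
    """
    out = []
    for src, dst in referrals:
        if src in locations and dst in locations:
            dist = haversine_km(*locations[src], *locations[dst])
            out.append((src, dst, dist))
    return out
```

Distances come out in kilometres under a spherical-Earth approximation, which is more than adequate for spotting patterns such as patients travelling unusually far to see a specialist.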

And yet another conversation led to a patient-journey view, leveraging the same data in a slightly different way:

[Figure: Pathway]
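One way to picture the patient-journey view, sketched here under assumed inputs rather than taken from ReferralMap itself: group claims by patient, order each patient’s visits by date, and count how often one specialty is followed by another. The `(patient_id, service_date, specialty)` tuple format and the function name are hypothetical stand-ins for whatever fields the real claims data provides.

```python
from collections import Counter, defaultdict


def specialty_transitions(claims):
    """Count specialty-to-specialty steps across all patient journeys.

    claims: iterable of (patient_id, service_date, specialty) tuples.
    Dates only need to sort chronologically (ISO strings or datetime.date).
    """
    # Collect each patient's visits.
    visits = defaultdict(list)
    for patient, date, specialty in claims:
        visits[patient].append((date, specialty))

    transitions = Counter()
    for history in visits.values():
        history.sort()  # chronological; same-day visits fall back to name order
        # Collapse repeat visits to the same specialty into a single step.
        path = []
        for _, specialty in history:
            if not path or path[-1] != specialty:
                path.append(specialty)
        # Count each consecutive (from, to) pair in this patient's path.
        transitions.update(zip(path, path[1:]))
    return transitions
```

The result is a `Counter` keyed by `(from_specialty, to_specialty)` pairs, which maps naturally onto a pathway diagram: the most common pairs become the thickest arrows.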

None of these enhancements would have happened without simple meetings, discussion, and collaboration. An environment like the one I was part of this summer at Saatchi & Saatchi Wellness allowed for easy communication and a rapid, flexible approach to problems, and that combination worked very well for this project.

Steps N+1 to ???: Polish

After a month and a half of this process, I finally had a more-or-less finished product ready to be shown off. Of course, I immediately ran into issues. For example, if a future data scientist wanted to change the dataset used to generate the network (a necessary feature), they would have had to rewrite multiple parts of the code. Because the code had been designed and written at the same time, parts left over from earlier versions were now unnecessary and slowed down the final product. Typos and small bugs abounded. Sure, the program ran, and sure, it gave the correct results, but I could barely understand how, and I had just finished writing it! In short, it was not in a final, polished state.

With the planning and decision-making already complete, the simplest solution was to rewrite the code from scratch. Well, not quite from scratch, since I now knew exactly what it needed to do and, more importantly, what it didn’t need to do. The rewritten product was faster, more reliable, and much easier to understand. From there, it could be extended far more easily to work with new data and to offer a wealth of configuration options.
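One common way to make a pipeline dataset-agnostic, shown here as a minimal sketch rather than ReferralMap’s actual design, is to pull every dataset-specific detail (here, just the column names) into a small configuration object. Purely for illustration, it assumes each claim row records both a referring and a receiving practitioner; `ClaimsSchema`, `build_referral_edges`, and all column names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimsSchema:
    """Hypothetical mapping from a dataset's column names to the fields the
    network builder needs. A new dataset gets a new schema, not new code."""
    referring_col: str
    receiving_col: str


def build_referral_edges(rows, schema):
    """Count referrals per (referring, receiving) practitioner pair.

    rows: iterable of dicts, e.g. as produced by csv.DictReader.
    Rows missing either practitioner ID are skipped rather than guessed.
    """
    edges = Counter()
    for row in rows:
        src = row.get(schema.referring_col)
        dst = row.get(schema.receiving_col)
        if src and dst:
            edges[(src, dst)] += 1
    return edges
```

Swapping in a new dataset then only means writing a new schema, for example `ClaimsSchema(referring_col="ReferringProvider", receiving_col="RenderingProvider")`, while `build_referral_edges` itself stays untouched.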

Taking time to ensure the code is in the best possible state before handing it off ended up being the right choice. It’s much better to spend two weeks in development polishing a tool than to spend two years in production dealing with a slow, buggy mess.

…Ok, so it wasn’t really three steps

If this summer taught me nothing else (and it did, but this post is getting long enough!), it taught me the importance of asking questions, especially in data science. Not asking questions means not understanding the problem, which in turn means not being able to actually solve it. Being inflexible means being unresponsive to new, potentially better ideas that come up during the process. And polish is necessary: a clunky, slow, and difficult application will hide bugs and be an unnecessary pain to use.

Ask questions, be flexible, and polish your work.


Written by Mackenzie Gray

Data Science & Engineering Intern at Saatchi & Saatchi Wellness

Mackenzie graduated with a B.A. in Biochemistry and is finishing his final year of an M.S. in Data Science. He was the Data Science and Engineering Intern at Saatchi & Saatchi Wellness in the summer of 2018, where he focused on analytics automation and building ReferralMap. He enjoys working at the intersection of data and health and is passionate about using the power of data to build tools and products that improve people’s lives. Outside of work, he enjoys hiking, playing guitar, and playing with his dog.

 
