From Math M.S. to Data Science
Photo by Franki Chamaki on Unsplash
I have an M.S. in pure math (think theorem-proving, not number-crunching), and recently decided to begin transitioning into data science, specifically machine learning/AI. My background has taught me to think rigorously and precisely, but hasn’t prepared me to do anything other than teach, either as an adjunct at a college or university, or at a high school. Research was my objective, not teaching, though the Ph.D. and tenured faculty position I had in mind would certainly have involved some teaching. It turned out I might actually be good at teaching, and for a few years I thought I could make a decent fist of it. But while the lack of a doctorate means I’m effectively shut out of mathematical research for the time being, the urge to work on new questions and ideas has never left me.
Two of my favorite math courses were graph theory and mathematical logic, and when I was still planning to pursue a Ph.D., I asked my mathematical logic professor for advice on specializing in that field. She told me that it was out of fashion, and that to specialize in mathematical logic would be academic career suicide, but that the modern application of mathematical logic was AI. Graph theory is also prominent in data structures and algorithms, and virtually all of my favorite topics have a part to play in machine learning.
I wasn’t convinced at the time, but after deciding that spending another five to six years as a student wasn’t the right move for my career for now, I finally took her advice. Following the guidance of Edouard Harris, co-founder of SharpestMinds, I started by learning Python (beginning with Jose Portilla’s Complete Python Bootcamp, and then filling in a few gaps in my foundation with Jake VanderPlas’ Whirlwind Tour of Python). By no means would I claim to have mastered Python at this time, but I have enough of the basics down to take the next step.
It then took an enormous leap of faith for me to heed the advice of Harris and many others and abandon my deeply ingrained learning pattern from grad school: thoroughly learning each component before putting them all together. Instead, I’m adopting a top-down approach: learning tools by using them, not before using them. This is an enormous mindset shift for me, and I don’t know how long it’ll take me to get used to it.
Photo by Fabian Grohs on Unsplash
So instead of moving further in Python with an advanced Python course, I’ve just completed Andrew Ng’s Deep Learning Specialization on Coursera, and am starting fast.ai’s Practical Deep Learning for Coders (both of which use Python). Co-founded by Rachel Thomas and Jeremy Howard, fast.ai embodies the top-down approach to learning, and has shown an impressive aptitude for teaching its students highly applicable skills by centering doing rather than theorizing. I’ve also been learning SQL using Mode Analytics’ SQL Tutorial, playing with more Python libraries in Jose Portilla’s Python for Data Science Bootcamp, and brushing up on my stats with Khan Academy. In keeping with the philosophy of learning by doing, I hope to be ready for an entry-level position in data science within a year, possibly applying to become a SharpestMinds mentee during my job hunt.
Coming from academia, I’m still battling within myself to feel like self-study is “worth anything,” particularly when it comes time to apply for jobs, and especially because job descriptions seeking data scientists tend to be somewhat discouraging to newcomers. But perhaps the inherent rapid iteration on ideas and technologies in data science requires everyone in the field to be adept at self-learning, and my strong mathematical background (I do have a “graduate degree in a quantitative field,” after all), combined with the ability to learn new concepts and skills on my own, will be attractive to potential employers.
(update November 2022)
A funny thing happened on the way to becoming a data scientist: I became a technical writer instead! But I get messages somewhat regularly from readers of this post, usually along one of these two lines of inquiry:
Should I (reader) become a data scientist?
I'm afraid I really can't answer that question for anyone who's not, you know, literally me, but I can say this: careers change, because humans change. If you were interested in this post, you already know this! So sure, go ahead and become proficient in Python and start learning skills that are pretty universal in tech, like:
- how to work in the command line
- the absolute basics of how to use a text editor like Vim or Nano, even if you prefer to work in an IDE most of the time
- some SQL (databases are everywhere)
And while you're learning these universal skills, find social media accounts belonging to people in several different areas of tech, even if you just read. You'll start to get a bit of a feel for how different industries function, and if you follow who those people follow, you'll get a much bigger picture of the landscape.
How can I (reader) become a data scientist?
I feel awkward answering this, since I didn't end up choosing data science, but I was actively in the job hunt phase before I abruptly changed course, having gone through multiple rounds of interviews a few times and gotten valuable advice along the way. Here's that advice: the days when companies would hire math grads simply on the basis of their degrees are long gone. No matter what kind of degree you have, companies want you to have:
- a portfolio with at least one data science project (not from a course or tutorial)
- the ability to write production-level Python
- fluency in Python DS libraries like Scikit-Learn, NumPy, and Pandas
- fluency in SQL (not just syntax, but real-world use)
- fluency in Git (also not just syntax, but real-world use)
I've made a list of my favorite resources for learning many of these skills. I hope this was helpful! Thanks for reading. 😊
Places you can find me: