I recently joined Coiled as Head of Data Science Evangelism and Marketing. In this post, I’ll tell you why.
A bit about me
I am restlessly interested in promoting data literacy, broadening access to computation, and lowering barriers to data tooling.
To understand why I joined Coiled, a company that builds products and provides support for data scientists and teams working in Python with data at scale, it’s worth providing some context on who I am, where I’ve been, and what I do. Most recently, I worked at DataCamp, an EdTech company whose mission is to democratize data science education. As an early employee there, I built out the foundational Python data science curriculum and wore other hats in marketing, product, data science and analytics, evangelism, and professional services. Prior to this, I worked in academic research (cell biology, applied mathematics, biophysics) at Yale University and the Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, where I also taught workshops on Practical Data Science for Researchers. I am restlessly interested in promoting both data and computational literacy, reducing “computational anxiety,” lowering barriers to data tooling, and increasing both the adoption and development of open-source software (OSS).
A bit about Coiled
Coiled is a company born from the open-source ecosystem to meet the increasing needs of teams, organizations, and enterprises that use OSS. For data scientists, analysts, and engineers using Python at scale, the package Dask (which “provides advanced parallelism for analytics, enabling performance at scale for the tools you love”) is an attractive option, but on its own it doesn’t meet the growing needs of teams and the enterprise. This is why we founded Coiled earlier this year: to build products that meet those pressing needs.
So, then, why did I join Coiled? To answer this, I think it’s key to answer a set of sub-questions:
- Why a company based around open-source software?
- Why a Python company?
- Why a seed stage startup?
- Why evangelism and marketing?
- Why Coiled?
Why a company based around open-source software?
I’m excited to play a part in the next level of OSS adoption around the world, both in academic research and industry.
It is unquestionable that open-source software and tooling have changed the nature of business, academic research, and data science. I came of age in academic research at a time when there was a serious shift away from proprietary software, such as MATLAB and Mathematica, toward tools such as Python and R. The collaborative, open, and transparent nature of OSS, when done correctly, results in software that meets individual end-user needs and evolves as these needs evolve. As Eric Raymond notes in his seminal essay The Cathedral and the Bazaar, “given enough eyeballs, all bugs are shallow.” This OSS mantra was named Linus’ Law in honor of Linus Torvalds, the creator of Linux. The titular “Cathedral” refers to software whose development is restricted to an exclusive group, and the “Bazaar” to software developed in public view. Moreover, as Fernando Pérez, creator of IPython and co-lead of Project Jupyter, stated in his keynote at the inaugural JupyterCon,
If science is about opening up the black box of nature, we shouldn’t be doing science with tools that we are not legally allowed to open up and understand.
It was the burgeoning impact of open source on data work of all kinds that led me to join DataCamp in early 2016 to build out its foundational and scalable Pythonic data science curriculum.
There are at least two related challenges in the OSS space:
- The need for sustainable sources of funding for open-source software development;
- OSS meets the needs of individual end users, but cannot always meet those of teams, organizations, and the enterprise.
Open-source software is undergoing a phase transition from having individual users in organizations to having large scale institutional adoption.
Here are some examples across several verticals:
- LIGO’s discovery of gravitational waves in 2016 used many OSS tools from the PyData ecosystem;
- JP Morgan’s trading platform Athena contains 35 million lines of Python code;
- Walmart uses Python, Dask, and XGBoost (a popular machine learning framework) to “tear through their massive-scale data analytics and machine learning”;
- Netflix runs over 150,000 batch Jupyter notebook jobs a day;
- T-Mobile uses R and TensorFlow to deploy machine learning models with the aim of building conversational AI to improve the customer experience;
- The Event Horizon Telescope team used the OSS PyData stack to create the first ever image of a black hole!
However, OSS is not always organization- or enterprise-ready, nor should it be. In the gap between OSS and the enterprise, we find the need for companies such as Coiled. This is why we built Coiled: to help Dask meet the needs of modern teams doing data science at scale. These needs include, but are not limited to:
- Data analysts, scientists, and engineers being able to have one-click hosted cloud deployments and seamless integration between their local workstation configurations and their at-scale data work on cloud clusters;
- Data team leads having insight into data infrastructure usage and spend across their team and the larger organization;
- IT teams having their security and authentication requirements met.
Without companies that build products and provide support for organizations to use OSS, there will simply be less adoption of OSS. I joined Coiled because I’m excited to play a part in the next level of OSS adoption around the world, both in academic research and industry.
Why a Python company?
Data science, impact, and community.
The reasons for joining a company in the Python ecosystem are simple: data scientists’ needs, impact, and community. David Robinson’s 2017 post The Incredible Growth of Python made it clear, if it wasn’t before, that Python was growing… well… in an incredible fashion… and it still is (you can play around with Stack Overflow Trends to see this yourself). There is a huge amount of demand for companies built around the PyData stack, and as a Pythonista, I hope to have an impact by meeting those needs, which I predict will only increase over the coming years.
The other reason for joining a Pythonic company is the PyData community, many of whom I’ve enjoyed working with over the years and many of whom I consider friends and mentors. In many ways, the PyData community feels like family and gives me a sense of a home away from home. PyData conferences are some of my favorite conferences to go to, as is SciPy (and there’s also something about Austin in the summertime!). Anybody who tuned in to DataFramed, the DataCamp podcast I hosted, would have noticed a recurring emphasis on the PyData community and its impact (among other patterns, of course), including early interviews with Jake VanderPlas and Katy Huff, conversations with Katharine Jarmul, and interviews with PyData tool creators and builders Wes McKinney (pandas), Skipper Seabold (statsmodels), and Brian Granger (Project Jupyter). It is also worth mentioning how important academic research and educational communities, where I feel at home due to my research background, have been to the PyData stack, which historically grew largely out of academia (though its organizational backing is now far more diverse).
Why a seed stage startup?
Leadership, learning, and a bias for getting things done.
There are two major reasons:
- I love the hustle and bustle, the do-what-needs-to-be-done, the problem-solving, and bias-for-action that’s required at smaller companies, and the necessary learning that then occurs;
- I am excited to be end-responsible for a business function and look forward to growing into a leadership position as we scale Coiled.
There are so many examples of early-stage startup problem-solving and bias-for-action that I love, but one stands out:
When I started at DataCamp, my mandate was to build out as many high-quality Pythonic data science courses as possible. Company leadership told me that collaborations with RStudio had been successful on the R curriculum side and asked me who the RStudio of PythonLand was. I told them it was Continuum Analytics, now Anaconda Inc., although there were key distinctions between them. We had breakfast early one morning with Peter Wang, then CTO of Continuum, in Cambridge, MA. Peter seemed to really dig the product and introduced us to Travis Oliphant, then CEO of Continuum. We had some great calls with Travis but, in order to get 4+ courses on the roadmap with Continuum, we needed to get in the same room. So one Friday afternoon in mid-June of 2016, I called my CEO and said, “why don’t we just fly to Austin on Sunday and pop into the Continuum office on Monday to say hi?” He loved the idea but rightly suggested that I email Travis to let him know we were planning to visit. Travis replied generously, as always, that he’d be happy to see us but might not have a lot of time as it was his birthday! En route to the Continuum office on West 6th St in downtown Austin, I decided to swing by a great hot sauce shop (with the great name Tears of Joy) and bought Travis a birthday hamper of hot sauces and salsas. Travis spent hours with us that day, talking shop, both business and technical, and took us to an early dinner before joining his family for his own birthday dinner. Two days later, as we were in a taxi heading for Austin-Bergstrom Airport, we received a signed contract from Travis for our first 4 collaborative courses that, to this day, are foundational in DataCamp’s Python curriculum, and that I had the great pleasure of working on. The DataCamp/Anaconda collaboration continues to this day and ranges from courses to co-marketing initiatives to the recent announcement of the DataCamp + Anaconda Team Edition.
This type of hustle, problem-solving, and get-it-done is something you find more readily in smaller startups, when you’re building the foundations of the machine (I’ve heard both “we’re building the railway track from the front of the train” and “we’re building the plane as it’s going down”!). Most importantly, the amount of learning that happens (and needs to!) in this context far surpasses most others.
Why evangelism and marketing?
The world doesn’t need more marketing. It needs vastly more relevant marketing.
I never considered myself a marketer; I fell into it several years ago at DataCamp. The truth was that I just loved creating content (writing, speaking, conversing) that aspiring and practicing data scientists found valuable, and there was a pressing need for this at DataCamp as the B2C business grew. In a word, it represented my ikigai, a Japanese concept often described as the intersection of (i) what you’re good at, (ii) what you love, (iii) what the world needs, and (iv) what you can be paid for (see figure below).
Now, I can hear you asking, “Wait, what, does the world really need more marketing?!” and the answer is a resounding “no.” But what the world does need is vastly more relevant marketing. For years, marketing left a sour taste in my mouth, partly because of my background in the purities of academic research, but also because so much marketing is irrelevant and plain old sucks! Working in online education made me realize that I could create content to market a product I believed in to people who would actually find it useful. And I could do so by creating content that itself delivered value immediately, whether conference talks, online live coding sessions, or a podcast.
As I hope I’ve made clear, I fundamentally believe that companies such as Coiled are essential to the ongoing adoption and future success of open-source software development. I consider it an honor to market the products we’re building at Coiled to an audience of my data scientist peers who need them, and I am so excited to evangelize the PyData stack in general, and Dask in particular, to play a role in the ongoing adoption of OSS at both the individual-contributor and organizational levels.
Why Coiled?
In addition to all the reasons stated above, we started Coiled to solve a multitude of existing problems faced by institutional users of the PyData stack and Dask (for example, Capital One, Novartis, NASA, and the Chan Zuckerberg Initiative): those who use Python to do data science, machine learning, and AI at scale, or to work with “big data.” I’m excited to be a part of this. Now that the buzz around “big data” has died down, it’s time to see it deliver real value. Put another way, I want to play a Pythonic part in the movement that takes us out of the Gartnerian Trough of Disillusionment and into the Slope of Enlightenment (see figure below).
Google Trends results for “big data” over time. If we accept the Gartner Hype Cycle as a reasonable model for technological innovation and Google queries as a proxy for hype or societal interest, we are about to emerge into the Slope of Enlightenment. If we don’t really believe the Gartner Hype Cycle, we can still see that interest peaked, then hype died down, and now it’s time to see some real value delivered.
A second key reason is that Coiled is building on top of Dask, which is not merely a Python package, but one that’s deeply ingrained in the Scientific Python ecosystem, as we’ve described here.
The third reason is that I get to work and build things with what I consider to be a wonderful team of experts in the field. This isn’t just a startup scrambling for a VC payout. This is the real deal: a team of experts from diverse fields that can deliver on hard and important problems, and I think I have a few things to offer here. I look forward to building the next generation of tools for teams of data professionals, and I am bloody stoked, as we say in Australia, to be doing this at Coiled.
Many thanks to Chris Holdgraf, Brian Granger, and Angela Bowne for their valuable and critical feedback on drafts of this essay along the way.