Many people who have an interest in open source don’t understand the intense, behind-the-scenes grind. The more successful a project is, the more burned out maintainers become. They may feel like victims of their own success, but consumers of open source don’t always see that. As a result, getting funding and adequate support for maintainers to invest appropriate time in a project feels like a constant challenge. There comes a point where volunteer maintainers just cannot do the job effectively without recruiting help.
There are some incomplete options, like single-corporation funding; startups that have product offerings with an open core or another go-to-market strategy that involves open source; or consulting. Too often, there can be friction between business necessity and the best interests of the open source community. But it’s important to strike a balance between solving business problems and creating a healthy developer ecosystem.
With these things in mind, I decided to take a new approach to funding and sustainability with Apache Arrow and Ursa Labs. I knew there was room for a more pure mission of building the open source project, creating the community around it, and taking into account the needs and requirements of everyone. I wanted to be able to invest multiple years with this mission to create a large and healthy open source development community for Arrow.
Eliminating inefficiencies and letting the community lead
After I graduated from Massachusetts Institute of Technology (MIT) in 2007, I joined AQR, an investment management firm, and was shocked by how tedious it was to do basic data analysis and statistics work. I saw a lot of smart, dedicated people who weren’t working efficiently, with a surprising amount of manual work in Microsoft Excel. I wanted to see if I could change the status quo and help make data easier to work with—and found that I really enjoyed making people and systems more efficient, and finding ways to make data tools more powerful and intuitive.
pandas was initially my reaction to being frustrated with the tools I’d been using. I think that’s what drives a lot of open source developers: They don’t find what they’re looking for, so they build it. In 2008, when I started working on pandas, Python wasn’t a popular language for data analysis and statistics yet, but I felt it had a lot of potential, especially with the right tools. We didn’t really have the term “data science” back then; people called it “data analysis” or “statistics.” For a long time, pandas was my skunkworks side project, and for the first year and a half it was a proprietary code base. The financial industry didn’t have a good track record of open sourcing projects, so even getting initial code out the door was a lot of work. But it helped me get involved in open source and the Python community.
In 2011, I took leave and eventually dropped out of graduate school to focus on pandas. I had saved money from my first job and didn’t have student loans, so I was in a very fortunate position where I could live on my savings for a while. I reasoned that I could spend about a year doing part-time work to help pay the bills and concentrate primarily on open source work. I also partnered with O’Reilly to write and publish Python for Data Analysis, which was released in October 2012.
In the next year, I recruited two of my colleagues from AQR, Adam Klein and Chang She, to work with me and explore startup opportunities for Python and pandas in the finance industry. We were able to build out pandas’ functionality to a point where it started acquiring a small but passionate user base. That was around the time that core project developers like Phillip Cloud and Jeff Reback (who has been one of the main driving forces for the last seven years) began contributing heavily. Chang and I went on to found DataPad, a startup that powered collaborative data discovery for businesses.
It surprises people to learn that I’m no longer involved in day-to-day pandas development, and that the project has been entirely community-owned and maintained since 2013. Almost all open source work I do now has an impact on pandas users in some direct or indirect way, though, and as a governance formality, I’m the project’s benevolent dictator for life (BDFL). Jeff always jokes, “Wes gets the kudos, and I get the hate mail.” But I honestly didn’t do a lot of the hard labor of building pandas’ maintainer and developer community after 2013. There have been 2,000 or more contributors at this point, and I can’t take a lot of credit for that. I really have to tip my hat to the core maintainers like Jeff, Tom Augspurger, Joris van den Bossche, Marc Garcia, and Brock Mendel. They have gone out of their way to make it easier to make contributions to pandas and to make the community more diverse and inclusive.
To grow a healthy community, it’s so important for the creators of these big projects—the people who might be a little more well-known—to make space for other maintainers to take on leadership roles. The enthusiasm from the early major contributors really made me feel confident enough to take a step back and to know that the project was in good hands. That left me free to search for more greenfield projects.
Taking time to network and identify a big challenge
I thrive on the early bootstrapping stages of projects. When a project transitions to more of an incremental improvement and maintenance-centric grind, I find myself looking for new problems to solve—problems that are often too big for me to solve as a lone wolf. Many high-impact projects can only be successful by recruiting collaborators and building a large developer and user community.
The social dynamics of building trust in large, ambitious projects can be daunting. I haven’t found any shortcuts for this part of open source development. However, I’ve learned that the more transparent you are with people about your objectives and your individual motivations, the better. Building relationships is a time-consuming process, but it’s important, and I spend a lot of time and energy on it. When possible, face-to-face interactions can make a big difference, since tone and intent can be difficult to convey over GitHub comments and email. I try to understand people’s different points of view and the kinds of problems they’re solving. Successful collaborations are also possible across language and cultural boundaries, and those require effort, too. It takes a lot for people to trust each other in open source—particularly when you’re asking somebody to commit labor or trust you to help solve their engineering problems.
It doesn’t help that there’s a culture of hero worship in open source: “The 10x engineer on GitHub.” It sometimes does take intense dedication from a small number of individuals to get a project bootstrapped, but even that isn’t always enough. What sets successful projects apart is how effectively the maintainers can handle that social side, attract contributors, and empower contributors to grow into maintainers. You want to see the group being respectful and working earnestly to define and solve problems. When people see how other maintainers are behaving, they should feel comfortable spending a lot of time with them, virtually. They should feel respected. When in doubt, contributors should assume that others have good intentions.
Funding ambitious open source innovation
The theme of my career is promoting open source data analysis and data science tools. In 2016, I co-created the Apache Arrow project with a large group of other open source data systems developers, and I’ve been working full time on the project and ecosystem ever since. My goal with the Arrow project is to make data science tools faster, more efficient, and more interoperable across programming languages.
Funding work in a long-term ambitious open source project like Apache Arrow presented a new challenge. Many companies had expressed interest in investing time and money in the project, and I found the idea of a kind of “non-profit industry consortium” highly appealing: It would enable me to build relationships with many different organizations and work with them to understand their business problems.
In early 2018, JJ Allaire and Hadley Wickham from RStudio expressed interest in partnering with me on Arrow development to help improve performance and interoperability across both the R and Python languages. Together we created Ursa Labs to provide that industry consortium model I was searching for. Two Sigma (where I had been working since 2016) joined as our first sponsor and we soon added many others: NVIDIA, Intel, Bloomberg, and G-Research, to name a few. Through these sponsorships, we’ve been able to support a distributed team of six full-time open source developers.
With Ursa Labs, our primary dedication is to the open source project—and everyone is aware of those priorities. We prioritize the maintenance and the maintainers, and set that relationship precedent from the start. The commitment we’ve made to our sponsors in return is to oversee their interests in the project and work in good faith to help them solve their business problems through open source development. We see the sponsor’s success as the project’s success, so it’s really a symbiotic relationship. When both the users and consumers are successful, it’s good for everyone.
Pushing beyond the status quo and into a sustainable future
It feels like I’m participating in the awkward teenage years of open source: We’re in the middle of a trough of disillusionment where open source developers often feel unsupported and taken for granted. I think GitHub is doing the right thing by setting up GitHub Sponsors and easing the transfer of money from individuals to developers, but actually motivating people to press the sponsor button is the next challenge. Consumers and big corporations would rather pay for specific new feature development than for the more difficult-to-understand work of project maintenance.
The biggest concern I have is that people will stop wanting to contribute to open source. They’ll look at this lifestyle and say, “These people are all miserable. Why would I want to be like them?” Right now, when recent graduates Google “open source maintainer,” they’re going to see a lot of stories about burnout and negative behavior from entitled users.
There are tens of thousands of open source projects that people depend on every day. Who do you talk to about getting $100,000 for a critical JavaScript package that has one or two maintainers? I’d like to see 10,000 open source maintainers with full-time competitive pay and benefits who are able to spend 100 percent of their time on open source development. But without some mechanism to obtain a large and reliable amount of annual funding, it’s hard to see any significant change. It’s a fundamental shift that will likely have to come from a government level. The ideal scenario would be for the US federal government to allocate hundreds of millions of dollars per year to open source development. Perhaps this could take the form of a National Open Source Development Administration that recognizes how important the open source infrastructure is for developers to support growth and innovation in the modern business world.
In the meantime, I will make as valiant an effort as I can to provide stable long-term employment for open source developers. I think it’s easier for me to get funding for Ursa Labs because of my reputation, so I will leverage that to do as much good in the world as I can. It’s up to us to raise awareness about our experiences as open source maintainers in the 2010s, and the sustainability challenges for the decade ahead. The status quo is causing a lot of collective anxiety, so we have to take action now to change things.