One Year In, Here’s What We’ve Learned About Data Science in Civic Hacking

The Data Science Working Group at Code for San Francisco is almost a year old now! And though we’re volunteer civic hackers, with all of the complications of continuity that naturally entails, we’ve accomplished a good amount of, well, good, …and a great amount of organization for greater good, which is great for the greater good.

GREAT!!

Along the way, though, we’ve also learned some hard lessons in trying to execute or operationally integrate data science through the various contexts of civic hacking. Some of these are akin to what traditional volunteer civic hackers have faced, while most are substantially different, as one might expect from a first-of-its-kind data science-specialized brigade team.

That said, we thought we’d ask some of our members who’ve been here from the start to each share a major lesson they/we have learned over this past year. Some of their answers may surprise you, while others may inspire you, but most of all, we hope that these answers will help inform the next group or generation of volunteer data scientists…

1. When it Comes to Interactive DS Solutions, Simple = Sustainable

As data scientists and engineers, we often find ourselves gravitating toward complex solutions to many of the problems our partners face. Complex solutions are sometimes needed, for their performance or strategic reliability, but they are not always sustainable. Most nonprofit and government partners do not have the resources to support complex solutions (such as reliable statistical insights or large-scale custom applications), meaning all of your work could go to waste, even if it’s actually very valuable. It’s important to use the tools that are readily available to tackle problems and think about simple solutions that your partners can sustain over time, without having to transfer tons of specialized knowledge in the future. In addition, we can save time by using popular tools for initial exploratory analysis. Focus on simple solutions that work toward solving the problem, but are both sustainable and reproducible. This makes it easy for the partner to use and for others to replicate in their own cities.

Sanat Moningi
Co-Team Lead @ DSWG / Solution Architect @ Salesforce

2. Communication is Key to Maintaining Efficiency and Positive Partner Relations

It is important to communicate effectively with your team and with whoever (e.g. a nonprofit organization) will benefit from your work. This prevents your teammates from doing extra work, which can very easily happen in collaborative data science (e.g. data munging that was already completed), and it prevents the team from creating a product that may not be useful to your end user (which is why we are here!). We use Slack and GitHub to communicate and share our work, while being sure to document everything we do. We also make sure to touch base with our end user from time to time to give status updates and realign the goals of the project if needed.

Rocio Ng, Ph.D.
Lead Scientist @ DSWG / Data Scientist @ Schoold

3. Data Science Projects are an Iterative Undertaking

Chances are high that you will not understand the dataset at the outset, so documenting and exploring are necessary first steps. Before starting to work on the problem, create a data dictionary to help you and your team understand what the fields describe (your partnering organization can verify or amend your dictionary), and profile the data using various plots and aggregation metrics to understand its structure and the relationships between the fields. When beginning to answer the question, start simple, maybe with a couple of variables; you probably do not need to build a complex machine learning model to find a reasonable solution. Throughout the project, keep a list of clarifying questions and stay in touch with your partnering organization to help you and your team work toward a solution. As basic as your results may seem at first, share your work to solicit feedback and spark new ideas; you can then iterate to build up to more interesting explanations.

Matthew Mollison, Ph.D.
Data Scientist @ Silicon Valley Data Science
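
As a rough illustration of the data dictionary and profiling step Matthew describes, here is a minimal Python sketch that uses only the standard library. The sample data, the field names (case_id, category, days_open), and the helper names are hypothetical stand-ins, not taken from any of our projects; a real partner export would need its own handling.

```python
import csv
import io
from collections import Counter


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile_fields(rows):
    """Build a starter data dictionary: one summary entry per field."""
    fields = {}
    for row in rows:
        for name, value in row.items():
            entry = fields.setdefault(name, {"missing": 0, "values": Counter()})
            if value is None or value.strip() == "":
                entry["missing"] += 1  # blank or absent cell
            else:
                entry["values"][value.strip()] += 1

    dictionary = {}
    for name, entry in fields.items():
        values = entry["values"]
        dictionary[name] = {
            "looks_numeric": bool(values) and all(_is_number(v) for v in values),
            "missing": entry["missing"],
            "distinct": len(values),
            "most_common": values.most_common(1)[0][0] if values else None,
            "description": "",  # left blank for the partner to verify or amend
        }
    return dictionary


# Hypothetical sample standing in for a partner's CSV export.
SAMPLE_CSV = """case_id,category,days_open
1,Graffiti,3
2,Pothole,
3,Graffiti,10
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
data_dictionary = profile_fields(rows)
for field, summary in data_dictionary.items():
    print(field, summary)
```

The empty description field is deliberate: the counts can be generated automatically, but what each field actually means is something only the partnering organization can confirm.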

4. Ignite the Passion in each Team Member

“I recently came across a quote by Antoine de Saint-Exupéry that resonates with my first year with the data science group: ‘If you want to build a ship, don’t drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea.’ When newcomers step into the brigade (and the amazing Code for America HQ) for the first time, they are eager to work on a problem they care about. Give them the opportunity and ownership to achieve progress in their cause, or give them a reason to create their own. Members return every week when they have seen applied data science unravel challenges faced by government agencies. As a team, we believe there is much more work to do and we long to do it.”

Catherine Zhang
Product Manager @ Workday

5. Working as a Team of Volunteers has Unique Challenges

With a volunteer group like ours, people come and go, which makes on-boarding new members a constant challenge. It also makes it difficult to maintain institutional knowledge, and this kind of knowledge (data science) can be especially costly to recapture. That’s why it is vital to document your work and thoughts through tools like Slack, GitHub, Google Docs, and IPython Notebooks. We want to make sure new people can easily discover work already in progress and not duplicate efforts. This also helps when it comes time to write a ‘final’ report or deliver a product. And when it comes to membership, it’s a good idea to document every person who comes in, as well as the breadth and depth of their skills, in some standardized way. That allows us to put together the right team for each job, and lets us tap current and past members for more esoteric expertise (e.g. GIS analysis, algorithm design, etc.).

Tyler Fielding
Informatics Engineer @ Point Blue Conservation Science

6. Data Scientists and Analysts Aren’t Always “Coders,” and That’s OK

There are plenty of capable statisticians out there, in various domains, but many haven’t been professionally privy to the standards of collaborative application development. We recognized such working discrepancies right away, through our first major project, the 311 case data analysis. Therein, much of the team was well-experienced in Git/version control, Markdown, and GitHub, but definitely not all, and that proved to be a bit of a tragic bottleneck, as some of the latter group were highly statistically capable. Nevertheless, we eventually got through it and learned some things we’ve since addressed in our protocols here. Such protocol changes include A) assigning precise, singular prompts/challenges to each project teammate, such that the need for statistical collaboration is minimized (in its execution; strategy is still often discussed as a team); B) using hosted Jupyter Notebooks, so that teammates with little Markdown or GitHub experience can reliably share, collaborate on, and upload full analyses to GitHub right away; and C) where analyses/models must be put into more robust, production-ready applications, turning to the broader brigade for full-stack expertise.

Jude Calvillo
Co-Team Lead @ DSWG / Co-Founder @ Hyperthesis, LLC

CHECKSUM(We’ve Learned Alongside Our Machines)

All in all, we’ve learned a thing or two about data science in civic hacking, but we’re still learning. As we continue to grow, we face new challenges of organizational scalability, multi-class partner relations, maximizing deliverable value, and more. Regardless, we remain excited for the “growing pains” that lie ahead, and we’re thankful for the broader SF brigade’s support thus far.

We hope you’ll join us or partner with us as we ride what we hope will be a cumulative probability curve toward becoming the #1 source for volunteer data science across the CFA brigade network… and across the country. 🙂