Data is not Enough to Solve our Problems

Brandon Pachuca
4 min read · Oct 22, 2020

How to frame the question to better solve public problems

Photo by Tatiana Fet from Pexels

When addressing a problem in the public sphere, a data scientist must be aware that every problem they touch is a wicked one. Wicked problems are not as easily defined as engineering or optimization problems. The wickedness starts at the beginning: defining the problem. A problem can be explained differently depending on whom you ask. Therefore, who is involved in round-table discussions, or perhaps more to the point who is not involved, becomes especially important, since questions of equity and fairness are essential when solving any public problem. If we want to create an accessible and viable solution, we must first ask ourselves the right questions.

There is an explosion of new data sources and bleeding-edge technologies, which continue to lower barriers to entry, particularly in the data science field. On the whole, this is an incredible achievement, since it might offer new ways to attain data literacy worldwide. It also brings new issues: if data is all around us, then every problem might look like it can be solved with more data. We cannot rely solely on technology to understand the actual problem. It is essential to break down a public problem first, before even thinking about data. We can break the process into two components: the problem statement and the analytical question.

The problem statement comes first: we must interview and understand the stakeholders affected by the problem and by any proposed solution, recognizing that there are no undos in city policy or resource allocation. Once we understand the problem, the next step is to reframe the problem statement as an analytical question: are there data or methods available to help inform the given problem?

Suppose a researcher is trying to bring more healthy food to a community. The problem statement might be that the community lacks access to healthy food. The analytical questions might be: which communities are most at risk, where are healthy food resources located, and how much would it cost to increase access for a community? These questions have clear, measurable objectives, and data might help answer them. However, data is not always available to directly address the problem statement, and the analytical question can shift as the work progresses.

In 2015, New York City had an outbreak of Legionnaires’ disease (a form of pneumonia). City agencies identified that Legionella, the bacteria that cause the disease, were growing in cooling towers across the city. Residents became severely sick, and the problem statement became: which buildings in the city were at risk of spreading Legionnaires’ disease? At the time, the city did not have a dataset containing every building with a cooling tower or a log of buildings recently inspected and cleaned.

The problem quickly became searching for a needle in a haystack with no idea where to look. The New York City Mayor’s Office of Data Analytics (MODA) frantically began collecting proxies for which buildings had cooling towers, in order to narrow down the list of buildings to inspect. The analytical question broke down into several components: which buildings are most likely to have a cooling tower, what is the contact information for each building, and has the cooling tower been inspected?

MODA created a series of machine learning approaches to narrow down which buildings would need an inspection. The first list had around 70,000 buildings, far too many for inspectors to handle. Early in the process, the algorithms were only 10% accurate when predicting which buildings had cooling towers.

The team consulted with firefighters, who gave the crucial insight that the local fire code prohibited cooling towers on buildings with fewer than seven stories. This insight cut the list in half. As MODA incorporated additional contextual knowledge from city workers and agencies, the model began reaching 80% accuracy. The city contained the outbreak within several weeks with help from MODA, which identified the buildings with cooling towers that needed to be inspected and cleaned.
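The effect of the firefighters' rule can be sketched in a few lines of Python. This is only an illustration, not MODA's actual code: the Building record, the tower_score field (standing in for a model's estimated likelihood of a cooling tower), the shortlist function, and the sample addresses are all hypothetical. The only detail taken from the account above is the seven-story threshold.

```python
from dataclasses import dataclass


@dataclass
class Building:
    address: str
    stories: int
    tower_score: float  # hypothetical model estimate that a cooling tower is present


# Threshold from the firefighters' insight: no cooling towers below seven stories.
MIN_STORIES = 7


def shortlist(buildings, min_stories=MIN_STORIES):
    """Drop buildings the domain rule excludes, then rank the rest by model score."""
    eligible = [b for b in buildings if b.stories >= min_stories]
    return sorted(eligible, key=lambda b: b.tower_score, reverse=True)


# Made-up sample records, for illustration only.
candidates = [
    Building("1 Example St", 3, 0.60),
    Building("2 Example Ave", 12, 0.40),
    Building("3 Example Blvd", 7, 0.75),
]

ranked = shortlist(candidates)
for b in ranked:
    print(b.address, b.stories, b.tower_score)
```

The point of the sketch is the ordering of the steps: the rule supplied by people who know the buildings removes candidates before any model score is consulted, so inspectors never spend time on a building the fire code already rules out.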

MODA coordinated multiple city agencies to inspect, canvass, and conduct outreach in the community to contain the Legionnaires’ outbreak effectively. Data and machine learning algorithms played a significant role in helping solve this problem. However, it took a team of people asking the right questions to create a viable solution. Data itself was not the answer, nor could it solve the problem alone. The final dataset was little more than the building address, the owner name, and whether the building had been inspected (yes or no). Only when the team began engaging with city workers and experts did the right questions for solving a wicked problem become clear.

Green, B. (2019, March 29). 6. The Innovative City: The Relationship between Technical and Nontechnical Change in City Government · The Smart Enough City. Retrieved October 15, 2020, from https://smartenoughcity.mitpress.mit.edu/pub/yyth5w6y/release/1

Legionnaires’ Disease Response. (n.d.). Retrieved October 15, 2020, from https://moda-nyc.github.io/Project-Library/projects/cooling-towers/

Rittel, H. W., & Webber, M. M. (1977). Dilemmas in a general theory of planning. Stuttgart: IGP.


Brandon Pachuca

Urban Data Scientist + Web Developer at KPF. Studied Urban Informatics at New York University.