Data-Driven Management in Public Organizations

This course introduces students to the field of performance management and the practice of data programming. It is a practical, tools-based course designed to build the foundations for strong data analysis as part of the performance management process. We will cover basic data operations, graphics, programming fundamentals, text analysis, and the creation of user interfaces. As part of the class, students will work in a team to build a model performance management system that includes a data collection process, an analytical framework, and a dashboard to report key performance indicators to stakeholders. [ SYLLABUS ]

The Rise of Data Science in the Public Sector

"While data fluency has become a skill of increased importance in government and policy, truly taking advantage of the power of big data often requires workers with specialized skills. Increased government use of data has the potential to deliver services more efficiently and improve the quality of life in communities. However, demand continues to outpace supply, leaving many cities with a severe shortage of qualified workers."  [ link to blog ]

One of the most important skills the next generation of public employees can have is an understanding of both policy and data, enabling them to understand the social context, deploy the necessary analysis and craft targeted solutions for the most pressing civic problems. [ link ]

Data Science vs. Statistics

Statistics has a lot to say about collecting data and modelling: survey sampling and design of experiments are well established fields backed by decades of research. Statisticians, however, have little to say about refining questions, thinking about the shape of data, and communicating results.

The end product of an analysis is not a model: it is rhetoric. An analysis is meaningless unless it convinces someone to take action. In business, this typically means convincing senior management who have little statistical expertise. 

Adapted from Hadley Wickham: "Data science: how is it different to statistics?" [ link ]

Performance Management vs. Program Evaluation

Historically, there's been an unfortunate and unproductive divide between people who have the same goal of getting government to make more informed and data-driven decisions. On one side, there are those tasked with measuring performance. On the other are program evaluators.

“You could look at the history of program evaluation and performance measurement as a cautionary tale of two children who were brought up in the same house but were raised by different tribes and aren’t so friendly with one another,” says Don Moynihan, a professor at the La Follette School of Public Affairs at the University of Wisconsin-Madison.

“Performance measurement is a great tool for monitoring purposes, but it doesn’t tell people why something is happening. A performance measure could tell you that childhood obesity has declined by 5 percent from last year to this year, but it doesn’t tell you that the reason is a particular government program or a change in the economy or a private-sector initiative.”

Program evaluators follow carefully prescribed standards, which makes it more likely that one evaluation can easily be compared to another. But program evaluations can be expensive and often take at least two years to complete. Legislatures and managers can't wait that long because by the time the evaluation is released the ground beneath the issue has often shifted. Evaluations take a snapshot of the status of an agency or program, but they're not useful for seeing what changes have taken place over time.

From our perspective as regular users of both evaluations and measurements, any rancor between the two groups defies common sense... If both sides of the rivalry can agree on nothing else, they certainly agree on this: The more information governments have, the better.

Adapted from: "Government's Data-Driven Frenemies"  [ link ]

Data Drives Open Innovation in Criminal Justice Reforms

July was the biggest month for prison reform in more than 20 years... The federal focus on prison policy comes none too soon. The number of federal inmates shot up from about 24,000 in 1980 to more than 215,000 in 2013. Over the same period, the cost of the system jumped nearly 600 percent. But despite all of the spending, public safety returns were weak.

The good news is that the federal government can look to the states for solutions. After Texas took the first step in 2007, states as diverse as Kentucky, Mississippi, South Dakota, and Oregon have adopted comprehensive reforms that are reining in the size and cost of their corrections systems while making communities safer. Many of the states have taken a “justice reinvestment” approach, drawing on research into effective practices as well as data about their own systems to craft policies that prioritize prison space for serious, violent offenders and use the savings to strengthen alternatives for lower-level offenders. Success in the states has put pressure on the federal government to improve its own system. ( From The Pew Charitable Trusts [ link ] )

While waiting for Congress to enact national criminal justice reform legislation, the White House is promoting evidence-based efforts to help share innovative solutions to the broken criminal justice system from the ground up. The White House is creating a platform to discuss and share successful criminal justice reform efforts from states, cities, and counties. In fact, the Data-Driven Justice Initiative highlights 67 state and local communities that have engaged in reform efforts to fix a fragmented, inefficient, and expensive criminal justice system (see list here). ( From the Nonprofit Quarterly [ link ] )

New York City Opens Up Government Algorithms [ NYT Article ]

Governments also have access to oceans of data. Algorithms can decide where kids go to school, how often garbage is picked up, which police precincts get the most officers, where building code inspections should be targeted, and even what metrics are used to rate a teacher.

Andrew Nicklin, who ran open data projects for the city and state and is now at Johns Hopkins University, said experts were still learning how algorithms affect society.

Naked algorithms are just bunches of code, and even experts can find it challenging to discern what values they express. So researchers are discussing ways to include public participation before they are written. “We can formalize certain notions of fairness and nondiscrimination, affirmatively, at the outset,” said Solon Barocas, a professor of information science at Cornell University.

At their most powerful, algorithms can decide an individual’s liberty, as when they are used by the criminal justice system to predict future criminality. ProPublica reporters examined the risk scores of 7,000 people assigned by a private company’s algorithm. The recidivism rankings were wrong about 40 percent of the time, with blacks more likely to be falsely rated as future criminals at almost twice the rate of whites, according to Julia Angwin, who led the investigation.
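The distinction in the paragraph above, between how often a score is wrong overall and how often it wrongly flags people in each group, can be made concrete with a small calculation. The sketch below is a minimal illustration in Python; the records, the group labels "A" and "B", and the function names are all hypothetical and invented for this example. It does not use the ProPublica data or reproduce its method.

```python
from collections import defaultdict

# Hypothetical toy records: (group, rated_high_risk, reoffended).
# Invented for illustration only; these are NOT the ProPublica data.
records = [
    ("A", True, False), ("A", True, False), ("A", True, True),
    ("A", False, False), ("A", False, False),
    ("B", True, False), ("B", False, True), ("B", False, False),
    ("B", False, False), ("B", False, False),
]


def overall_error_rate(rows):
    """Share of all people whose rating did not match what happened."""
    wrong = sum(1 for _, rated_high, reoffended in rows if rated_high != reoffended)
    return wrong / len(rows)


def false_positive_rates(rows):
    """Per group: share of people who did NOT reoffend but were rated high risk."""
    flagged = defaultdict(int)  # non-reoffenders rated high risk
    total = defaultdict(int)    # all non-reoffenders
    for group, rated_high, reoffended in rows:
        if not reoffended:
            total[group] += 1
            if rated_high:
                flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


error = overall_error_rate(records)
rates = false_positive_rates(records)
print("overall error rate:", error)
print("false positive rate by group:", rates)
```

In this toy data the ratings are wrong for 4 of 10 people, and each group has two errors, yet the false positive rate is 0.5 for group A and 0.25 for group B. An overall error rate can therefore look identical across groups while the kind of error each group bears differs.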

Select Performance Management Literature

Data Systems / Analytics

Grass, M. (2016). Performance Management vs. Data Analytics: An Interview With Mike Flowers. Route Fifty. [ link ]

Barrett, K. & Greene, R. (2016). Government's Data-Driven Frenemies: Performance Management vs. Program Evaluation. Governing Magazine. [ link ]

Hatry, H., and Davies, E. (2011). A Guide to Using Data‐Driven Performance Reviews. IBM Center for the Business of Government. Overview of the framework: pp 8-32. [ link ]

CompStat responsible for a 10 percent reduction in crime:  [ link ] [ video ]

Behn, R.D. (2007).  What All Mayors Would Like to Know about Baltimore’s CitiStat Performance Strategy.  IBM Center for the Business of Government, Washington, DC.  [ link ]

Moynihan, D. (2013). The New Federal Performance System: Implementing the GPRA Modernization Act. IBM Center for the Business of Government, Wash., DC. [ link ]

Matlin, C. “Matchmaker, Matchmaker, Make Me a Spreadsheet: How Data Saved OK Cupid.” FiveThirtyEight: Sept 9, 2014. [ link ]

Avirgan, J. (2015). "Why the Bronx Really Burned." FiveThirtyEight Podcasts, Oct. 29, 2015. [ link ]

Performance Measurement

Gawande, A. “The Score.” The New Yorker: October 9, 2006. [ link ]

Duhigg, C. (2012). The power of habit: Why we do what we do in life and business (Vol. 34, No. 10). Random House. CH 4: The Ballad of Paul O’Neill. [ download ]

The Robin Hood Foundation: Impact Metrics Methodology [ link ]

Kimberlin, C. L., & Winterstein, A. G. (2008). Validity and reliability of measurement instruments used in research. Am J Health Syst Pharm, 65(23), 2276-84. [ download ]

Incentives

Levitt, S. (2005). Do This, Get That: Making Sense of Incentives. Associations Now. [ link ]

Levitt, S. D., & Neckermann, S. (2014). What field experiments have and have not taught us about managing workers. Oxford Review of Economic Policy, 30(4), 639-657.

Heinrich, C.H. (2010). “Incentives and Their Dynamics in Public Sector Performance Management Systems.” Journal of Policy Analysis and Management, 29(1): 183‐208.

Lazear, E. P. (1996). Performance pay and productivity (No. w5672). National Bureau of Economic Research.

Hong, F., Hossain, T., List, J. A., & Tanaka, M. (2013). Testing the Theory of Multitasking: Evidence from a Natural Field Experiment in Chinese Factories (No. w19660). National Bureau of Economic Research.

Pentland, A. (2014). Social Physics: How Good Ideas Spread-The Lessons from a New Science. Penguin. CH 4: Engagement

 

Open Source Data Science with R

 

Job Growth for R Skills [ link ] [ blog ]

Check out some recent blogs on the strengths of R and on compensation for advanced R programmers.

 

Course Content

Week 1: Introduction to Data Programming  [ ppt ] [ lecture notes ] [ markdown ] [ lab ]
Week 2: Data Structures  [ lecture notes ] [ markdown ] [ lab ]
Week 3: Merging Data [ lecture notes ] [ markdown ] [ lab ]
Week 4: Describing Data [ lecture notes ] [ markdown ] [ lab ]
Week 5: Data Input [ github ]
Week 6: Principles of Visualization [ ppt ] [ lecture notes ] [ markdown ] [ lab ]
Week 7: Simple Graphics [ lecture notes ] [ markdown ] [ lab ]
Week 8: Advanced Graphics [ lecture notes ] [ markdown ] [ lab ] [ lab ] [ animations ]
Week 9: Maps and GIS [ preview ] [ ppt ] [ github ] [ example ]
Week 10: Basic Programming [ lecture notes ] [ markdown ] [ lab ]
Week 11: Text Analysis [ ppt ] [ lecture notes ] [ markdown ] [ examples ]
Week 12: More Text Analysis [ quiz ] [ solutions ]
Week 13: Building a Data Dashboard [ link ] [ storyboard ] [ widgets ] [ gallery ] [ shiny tutorial ]
Week 14: Final Presentations

A Tale of Twenty-Two Million Citi Bike Rides: Todd W. Schneider

 

Great API for using Census Data in R! [ gist ]

Resources for Social Network Analysis in R [ tutorial ]

Resources

Helpful Listservs

R Weekly [ link ]
R Bloggers [ link ]

Data Science Podcasts

Data Points by GovEx [ link ]
Partial Derivative [ link ]
DMV Nation [ link ]
Becoming a Data Scientist [ link ]
Data Stories [ link ]
Talking Machines [ link ]
Not So Standard Deviations [ link ]
Data Skeptic [ link ]
More Or Less [ link ]
Linear Digression [ link ]
R-Podcast [ link ]

Data Journalists, Bloggers & Civic Groups

Trend CT [ link ] [ github ] [ style guide ]
Todd Schneider [ blog ] [ github ]
I Quant NY [ blog ]
ChartsNThings: A Blog by the NYT Graphics Dept [ link ]
Data for Democracy [ link ]

Open Data for Government

The Digital Accountability and Transparency Act (DATA Act) [ overview ] [ link ] [ link ] [ link ]
Data.gov Federal Portal [ link ]
Project Open Data [ link ]
Keynote Speech on Importance of DATA Act [ link ]
40 Brilliant Open Data Projects for Smart Cities [ link ]
US Cities Open Data Census [ link ]

Ben Wellington's TED Talk on Open Data in NYC [ link ]
Background on the Open Data Movement [ link ]
Sunlight Foundation's Open Data Guidelines [ link ]
Global Impact of Open Data Book: GovLab / O'Reilly [ link ]
Progress Tracker on Federal Open Data Compliance [ link ]
How to Make Government Data Sites Better [ link ] [ link ]

Statewide Portal Tested in California [ link ]
Five Largest Cities Now Have Open Data Policies [ link ]

The Hidden Cost (and Benefits) of Open Data [ link ]
Realizing the Promise of Big Data: IBM Center for Gov.  [ link ]

Local Government Portals

Washington DC [ site ] [ shapefiles on github ] [ data community dc ]
Chattanooga Tableau Site [ link ]

Useful Data APIs

ckanr [ github ] [ vignette ]
RSocrata [ github ]
censusapi Package [ github ] [ slides ] [ tutorial ]
@unitedstates [ about ] [ github ]
Data USA [ link ] [ documentation ]
Data Science Toolkit [ link ] [ rpackage ]
Federal Government APIs [ link ]

Misc. Datasets

19 Free Public Datasets (Springboard blog) [ link ]
Google Trends R Package [ link ]

Visualization

Compendium of Clean Graphs in R: [ link ]
Gallery of ggplot geoms [ link ]
Creating More Effective Graphs [ book ] [ gallery ]
Data + Design: Ebook On Data [ pdf ]
An Economist's Guide to Visualizing Data [ pdf ]
Ferdio Data Viz Project [ link ]
Information is Beautiful [ link ]
Visuals for Teaching Statistics [ link ] [ link ]
Bl.ocks.org Graphics Gallery [ link ]
Help Me Viz Graphics Gallery [ link ]
What Makes a Map Beautiful? [ link ]
Tableau: Which chart or graph is right for you. [ link ]
Flowing Data [ link ]
Graphics in R Tutorial: [ FlowingData ]
ChartsNThings: A Blog by the NYT Graphics Dept [ link ]
Data Viz Syllabus by Quealy & Carter [ link ]
Junk Charts: Blog on Making Graphics Better [ link ]
Primer on Making Great Graphs in R [ download ]
10 Tips for Making R Graphics Look Good [ link ]
Data USA [ link ]
Citi Bike Data Visualized [ link ]
Pedestrian & Routes in US Cities Visualized [ link ] & Europe [ link ]
Winners of Infographic Awards [ link ]
Visual Essays [ link ]

Bad Graphs

How to Display Data Badly [ link ]
Clowns [ link ]
Label Your Axes [ link ]
Pie Charts [ link ] [ link ]

Data Programming

R Shiny [ link ]
R Style Guide  [ download ] [ datacamp ]
R Markdown Cheat Sheet  [ link ]
Short Reference Card  [ link ] [ link ]
Project Management Guide  [ download ] [ download ]
GitHub is Going Mainstream [ link ]

US Digital Service

Inside Obama's Stealth Startup [ link ]
Why I Joined the US Digital Service [ link ]
Five Examples of How Federal Agencies Use Big Data [ link ]

Data Science Training in Government

San Francisco [ link ]
New York [ link ]

Department of Commerce [ link ]

Dashboard Examples

Pittsburgh Building Permits [ link ]
Voter Registration in AZ [ link ]
Government Performance in Chattanooga [ link ]
Fundraising Dashboard in R [ link ]
DataUSA [ link ]
Census Reporter [ link ]
Teacher Dashboard on Student Performance [ link ]
Vehicle collisions in Edinburgh [ link ]
Traffic accidents in London [ link ]
Life Expectancy Charts [ link ] [ link ]
Rise of Inequality [ link ]
World Development Indicators [ link ]
Demographics in Catalonia, Spain [ link ]
Tableau Gallery [ link ]

Dashboard Design

R Shiny Showcase [ link ]
R Shiny Widgets Gallery [ link ]
Nonprofit Dashboard Design [ webinar ] [ slides ]
Tableau: 6 Best Practices of Effective Dashboards [ download ]

Predictive Analytics Models

Food Inspection Forecasting: case study on predictive analytics for food violations in Chicago [ link ]
Optimizing Infrastructure Repair [ measurement ] [ model ] [ news ]
Pretrial Criminal Risk Assessment for Judges [ link ]
Predicting Fire Hazards [ link ] [ model ]
Why the Bronx Really Burned - Predictive Analytics Fail [ link ]
Use Machine Learning to Predict Infrastructure Failure [ link ]
Using Prediction to Prioritize Water Infrastructure Maintenance [ link ] 
Using RFIDs to Regulate Marijuana Distribution in Colorado [ link ]
Crowd-Sourced Solutions [ about DrivenData ] [ current competitions ]
State and National Presidential Poll Aggregation [ link ]

Open Innovation

The Data-Driven Justice Initiative [ link ]
Next Stage in the Open Data Movement [ link ]
Challenge.gov: Using Competitions to Spur Innovation [ link ]
Data for Democracy [ link ]