When this came through my inbox this morning, all I could think was, “How have I never found this blog before?” Harvard Business Review has a new article – “Beware Spurious Correlations” – that features the absurd (like an example of how iPhone sales correlate visually with U.S. deaths from falling down stairs) and the serious implications for those who visualize data and infer unproven causal relationships.
The teaser from HBR:
“We all know the truism “Correlation doesn’t imply causation,” but when we see lines sloping together, bars rising together, or points on a scatterplot clustering, the data practically begs us to assign a reason. We want to believe one exists.
Statistically we can’t make that leap, however. Charts that show a close correlation are often relying on a visual parlor trick to imply a relationship. Tyler Vigen, a JD student at Harvard Law School and the author of Spurious Correlations, has made sport of this on his website, which charts farcical correlations—for example, between U.S. per capita margarine consumption and the divorce rate in Maine.”
Read the article for the full explanation, but the takeaway is this: be thoughtful about how you visualize your data, and don’t mislead your audience with charts and graphs.
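To make the point concrete, here is a minimal sketch in Python. The two series are made-up numbers chosen only for illustration (they are not real sales or mortality figures), and `pearson` and `differences` are helper names I've invented for this example. Both series simply trend upward, so their raw values correlate strongly, but the year-to-year changes do not move together at all.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def differences(xs):
    """Period-to-period changes, a simple way to strip out a shared trend."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


# Hypothetical, made-up series: two unrelated quantities that both grow over time.
series_a = [10, 12, 13, 17, 18, 22, 23, 27]
series_b = [5.0, 5.2, 5.9, 6.1, 6.8, 7.0, 7.7, 7.9]

# Correlation of the raw levels is driven almost entirely by the shared upward trend.
print("levels: ", round(pearson(series_a, series_b), 2))

# Correlation of the changes asks whether the two series actually move together.
print("changes:", round(pearson(differences(series_a), differences(series_b)), 2))
```

Differencing is only one rough check, and a low correlation of changes doesn't prove independence any more than a high correlation of levels proves causation. It is simply a quick way to see how much of a "relationship" is just two lines sloping the same way.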
Jeff Knezovich over at On Think Tanks posted some great reflections from his recent trip to the Cartagena Data Festival, breaking down why data viz isn’t just a science but also an art. Data science alone, with its emphasis on statistics, code, and often technology, can’t develop the kind of simple yet artful visualizations that we find on feature blogs like Information is Beautiful or in reports to Ministries of Health that effectively advocate for new health facilities.
One of the highlights of his post was insight into how he approaches data visualization training and design as a discipline that requires expertise in research, technology, design, and communication. Jeff unpacks (with some great resource links!) the importance of design from a visual and graphical sense, but I would argue that data viz design requires a certain level of understanding of the human experience of interacting with information. Who is your audience? How do they interact with information? What is their level of numeric literacy? How much do they care about the information you’re trying to communicate?
My team has been exploring human centered design (HCD) methods through our work on the Innovations for Maternal, Newborn, and Child Health Initiative*. At the core, HCD focuses on developing empathy with the beneficiaries of a program. In visualization design, identifying an audience for your visualization and keeping them at the center of your design process is key to creating something that makes information meaningful.
Applying these principles of design need not be onerous or feel intimidating for data visualization designers (though the facilitation guides and experts in this space can go deep in more involved program design). Next time you’re crafting something visual from a data set, think about these three things:
- Who am I creating this for? Ask yourself this question repeatedly throughout the design process, not just at the very beginning. Understand both what they say they need from your analysis and their latent needs and expectations. If you’re working on a more complex project, like developing a dashboard, creating personas for your different users could be very helpful.
- Prototype (sketch!), test, and iterate. Don’t be afraid to ask for feedback from your users or at the very least your colleagues throughout the design process. And don’t be afraid to make changes!
- How will my audience use this product? How will your audience feel when they see your graph, chart, infographic, video, or dashboard? How will they interpret and use the data analysis you’ve presented? These considerations are key to ensuring your visualizations are used to promote evidence-led decision making.
Have you deliberately applied principles of human centered design in your data viz design? Share your experiences & learning in the comments!
*The Innovations Initiative is led by Concern Worldwide and funded by the Bill & Melinda Gates Foundation. JSI serves as the global research partner for the project.
I often argue that a chart’s y-axis should always start with zero. Cole Nussbaumer (storytellingwithdata.com) and Jon Schwabish (policyviz.com) recently had a conversation about this very topic and posted it at storytellingwithdata.com, along with some example charts. In summary, Cole and Jon agreed that column/bar charts should always start their y-axis with zero. Because our eyes focus on the height of that bar, a non-zero axis distorts the relationship between columns. Savvy viewers of the chart may also be more skeptical about the truth of our visualization when they notice that the axis doesn’t start at zero.
But what if you really need to focus on small but meaningful differences between data points? Line graphs might be ok here depending on the context, because the focus in a line graph is on the relative position of the points in space rather than the height of the bars. But the relative positions can still be overstated by a non-zero axis.
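A little arithmetic shows how big the distortion can get. This is a rough sketch with made-up values (50 and 55), and `apparent_ratio` is a name I've invented for illustration. It compares how tall one bar looks relative to another once the axis no longer starts at zero.

```python
def apparent_ratio(taller, shorter, axis_start=0.0):
    """How many times taller one bar is drawn than another, given where the axis starts."""
    if axis_start >= shorter:
        raise ValueError("axis_start must be below the shorter bar's value")
    # A bar is drawn from the axis start up to its value, so that is its visible height.
    return (taller - axis_start) / (shorter - axis_start)


# Hypothetical values: a 10% real difference between two columns.
honest = apparent_ratio(55, 50)                    # y-axis starts at zero
truncated = apparent_ratio(55, 50, axis_start=45)  # y-axis starts at 45

print(f"Zero-based axis: the taller bar is drawn {honest:.1f}x as tall")
print(f"Axis from 45:    the taller bar is drawn {truncated:.1f}x as tall")
```

With these numbers, a 10% difference is drawn as one bar twice the height of the other, which is exactly the kind of exaggeration that makes savvy viewers skeptical.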
One solution Cole and Jon (and I!) like is to present two charts side-by-side: one that has a zero-starting y-axis to show the context of the data, and another that does not start at zero, but instead zooms into the chart to show the variation within a smaller range.
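Here is one way the side-by-side approach might be sketched in Python. This is my own illustration, not something from Cole and Jon's conversation: the helper names `axis_limits` and `plot_side_by_side`, the 5% padding, and the assumption that all values are non-negative are mine, and the plotting function assumes the third-party matplotlib library is installed.

```python
def axis_limits(values, zoom=False, padding=0.05):
    """Return (low, high) y-axis limits for non-negative values.

    zoom=False gives the context view, anchored at zero.
    zoom=True gives the detail view, hugging the range of the data.
    """
    lo, hi = min(values), max(values)
    if zoom:
        # Pad by a fraction of the data range; fall back to 1.0 if the data is flat.
        pad = (hi - lo) * padding or 1.0
        return (lo - pad, hi + pad)
    return (0.0, hi * (1 + padding))


def plot_side_by_side(labels, values):
    """Draw the same line twice: once with a zero-based axis, once zoomed in."""
    import matplotlib.pyplot as plt  # third-party; imported here so axis_limits works without it

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    panels = [
        (False, "Full context (y-axis starts at zero)"),
        (True, "Zoomed in (y-axis does not start at zero)"),
    ]
    for ax, (zoom, title) in zip(axes, panels):
        ax.plot(labels, values, marker="o")
        ax.set_ylim(*axis_limits(values, zoom=zoom))
        ax.set_title(title)  # say plainly which view the reader is looking at
    fig.tight_layout()
    return fig


# Example call with made-up values:
# plot_side_by_side(["Q1", "Q2", "Q3", "Q4"], [50, 52, 51, 55]).savefig("side_by_side.png")
```

Labeling each panel with what its axis is doing matters as much as the limits themselves, since the zoomed view is only honest if the reader knows it is zoomed.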
Listen to the whole conversation at storytellingwithdata.com. Are there any examples in global health/development where a non-zero axis works?
There’s been a lot in the news about the US Ebola cases, most recently the case diagnosed in New York City. Reading through my morning RSS feed, I was struck by this powerful use of an icon matrix from “Why Ebola is less deadly in America than in Africa?” It is one of the best ways I’ve seen of visualizing a primary reason why we should be less worried about Ebola in the US and should focus instead on the countries hardest hit by this outbreak, whose health systems were already fragile before the virus attacked.
The visual comparison of doctors per 100,000 population says more than any long paragraph about robust health systems or the importance of training medical professionals (though both would likely be packed with potent information). No fancy colors or long-winded text: just a simple, well-designed visual showing the staggering disparity in the availability of health workers to fight the deadly virus.
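If you want to try the idea yourself, an icon matrix is simple enough to rough out in a few lines of Python before opening a design tool. This is only a sketch: `icon_matrix` is a name I've made up, and the counts for "Country A" and "Country B" are placeholders, not the figures from the Vox piece or from any real country.

```python
def icon_matrix(count, per_row=10, icon="●"):
    """Lay out `count` icons in rows of `per_row`, returned as a multi-line string."""
    if count < 0:
        raise ValueError("count must not be negative")
    rows = []
    for start in range(0, count, per_row):
        icons_in_row = min(per_row, count - start)
        rows.append(" ".join([icon] * icons_in_row))
    return "\n".join(rows)


# Placeholder values for illustration only; swap in real, sourced figures.
doctors_per_100k = {"Country A": 25, "Country B": 2}

for country, doctors in doctors_per_100k.items():
    print(f"{country}: {doctors} doctors per 100,000 people")
    print(icon_matrix(doctors))
    print()
```

One icon per unit keeps the comparison countable at a glance, which is a big part of why this chart type works well for small whole numbers.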
Kudos to Vox for their great design!
This Friday has been a blitz of sharing data visualization resources with colleagues, from Canva to Piktochart to Tiki-Toki. In one of the meetings, a colleague shared a resource new to me: The Noun Project.
For anyone who has been desperately searching for an easy place to find simple, often free icons, the Noun Project will be your new best friend. Simply search for the topic of interest, and you’ll likely find a smattering of options.
Harvard Business Review recently posted a visualization from an article about Contextual Intelligence, focused on the cross-cultural relevance of learning about different business practices. While not health or explicitly development related, their approach to visualizing connections between countries was elegant in its simplicity and worth sharing.
The visualization uses what has become an increasingly common scrolling approach (also seen in Tableau Story Points, the Wall Street Journal, and other design tools and publications), which helps to visualize a story by guiding the reader’s eye as s/he clicks through a series of graphs or charts.
What we liked about this particular example is the smart use of color, which simplifies the initial matrix into visualizations that highlight only particular country connections.
Take a look at the full visualization on the HBR blog—does this example make you think in new ways about how to use a matrix to illustrate a data story?
The Ebola outbreak is undoubtedly frightening, with border closures being announced today in Liberia and more than 1,200 confirmed cases across West Africa, and our thoughts are with everyone being affected by the outbreak.
The Economist featured the spread of Ebola as its daily chart on 29 July, bringing together maps and some stacked bar charts with transparent backgrounds to illustrate both the severity of the current outbreak and how markedly higher the number of cases is compared to any outbreak on record.
The map and charts together provide a wealth of information in a small amount of space, and using colors in a similar palette (but distinctly different when being used to represent different kinds of data) helps to tie the visual together. Because the case numbers are so low in the bottom bar chart from 1978 to 1995, the overlay seems to work well when placed across the map, not obscuring any extra data points. The simple axes and exclusion of any data points keep the story front and center: this outbreak is the worst we’ve seen in history, and the disease has a striking mortality rate.
The incidence and mortality data coming from this outbreak, as well as data from past outbreaks, has prompted a number of graphs, charts, and maps trying to visualize the data. These include standard charts and maps as well as visual illustrations of how the disease spreads and timelines of past outbreaks, which can be great tools for raising awareness and sharing accurate information.