A History of Data Visualization and Graphic Communication
by Michael Friendly and Howard Wainer
Harvard University Press, 2021
Edward Tenner, a research affiliate of the Smithsonian and Rutgers University, is currently a Visitor in the Program in Interdisciplinary Studies at the Institute for Advanced Study in Princeton, New Jersey.
Published September 3, 2021
In July, the Financial Times economics columnist Tim Harford wrote that he had seen “Twitter posts demanding that certain statisticians be silenced or hunted down and destroyed, sometimes for doing no more than publishing graphs of Covid-19 cases and hospitalizations.”
Covid-19 has shown that information graphics can be, in more than one way, a matter of life and death. An important new study of statistical representation, A History of Data Visualization and Graphic Communication, shows that they always have been. The book provides an unmatched overview of personalities and techniques over hundreds of years with new insights — but also with a few gaps worth filling.
One important feature of the book is that both authors are award-winning statisticians, in contrast to their predecessors who came to the topic from other disciplines. Darrell Huff, who published the classic How to Lie with Statistics in 1954, was a magazine editor and journalist. Edward Tufte, author of a spectacularly successful book series on graphic information design beginning in the 1980s, originally taught political science. And before his second career, Tufte was best known for a pathbreaking analysis, Political Control of the Economy, a topic that has only grown in importance. Hans Rosling, the Swedish author of Factfulness: Ten Reasons We’re Wrong About the World — and Why Things Are Better Than You Think, whose TED talks have been viewed millions of times, was a physician by profession.
(Two of the greatest 19th-century British economists, William Stanley Jevons and Alfred Marshall, were early enthusiasts of graphical methods, according to the authors, but today’s economic data graphics wizards are probably not academics but data journalists, such as the creators of the penultimate “Graphic Detail” page of The Economist magazine.)
While the story that Friendly and Wainer tell begins with the cave paintings of Lascaux in southwestern France, they hit their stride with a graphic by the 17th-century Flemish cartographer Michael van Langren showing twelve different estimates of the longitudinal (east-west) distance between Toledo and Rome. Van Langren understood that plotting the estimates along a single line would demonstrate just how far off the primitive measures of the era could be. As we now know, the estimated points on the line all lay well east of the actual location, some as far away as Turkey: a pattern statisticians now call systematic error.
The spread was an ingenious visual aid designed to help secure financial support for van Langren’s quest for a solution to the very difficult problem of measuring longitude. (He ultimately received a grant, but left his solution in an encrypted message that still has not been decoded.)
The rise of public policy studies and political economy in Britain and on the Continent in the late 18th and early 19th centuries — hence the “state” in “statistics” — created a new era in data representation. Today’s familiar shaded thematic maps date only from the 1820s, when they were used to analyze French social conditions and possible relationships between variables like crime rates and popular education.
It was only a short step to British innovations that were a positive unintended consequence of establishing a national registry of births and deaths. The original motive, as the authors explain it, was the orderly transfer of landed wealth. But William Farr, a physician overseeing the data bureaucracy, understood that analyzing information on causes of death could save lives, and he pioneered graphs of the rise and fall of cholera mortality even though the cause of the disease was not yet understood. Another physician, John Snow, became celebrated for a map identifying a single pump as the likely source of the contagion.
The Biggest Three
The great age of statistical graphics began not with these 19th-century pioneers but with three others: the Scots entrepreneur William Playfair, whose best-known works were published in the late 18th century; the French civil engineer Charles Joseph Minard; and the unclassifiable English polymath Francis Galton, who began his professional life as a medical school dropout, mathematician and explorer.
Playfair, the leading figure of the “big bang” of data graphics, as the authors put it, had a dazzling gift for economic investigation through line graphs suggesting relationships between two or more variables over time: British import and export trade with various countries, the national debt, food prices and wages. As a civil servant assisting in the development of France’s railroad network, Minard developed innovative graphics and maps to analyze everything from the causes of a bridge collapse to railroad routes and fares to the effect of the American Civil War on flows of raw cotton to Europe. After compulsory retirement at the age of 70, he produced, among other works, a graphic of the horrendous casualties of Napoleon’s Russian campaign of 1812 that experts now consider the most beautifully compact representation of data ever made.
Francis Galton developed, among many other innovations that were stunning in their day and are familiar now, the first graphs of correlation between variables, such as the heights of parents and their children. While Galton is best known today for popularizing the “bell curve” probability distribution (and, alas, eugenics, a.k.a. “scientific” racism), the authors see his great contribution as the subtler scatterplot, which maps two variables as an array of dots that can reveal patterns concealed by numerical tables alone.
Data graphics entered a new golden age after World War II, aided by the explosion of electronic data and computers, which made it practical to process three-dimensional data. A landmark was a project at the Stanford Linear Accelerator Center in 1973 that enabled the analysis of data in up to nine dimensions, yielding insights into previously elusive topics like the origin of diabetes. Hans Rosling’s TED talks later showed how animated graphics could yield fresh insights into historical trends for lay as well as academic audiences.
Friendly-Wainer may well be the best single book on its topic, indispensable for understanding the professional heritage of graphic innovation. It generally does not deal with the popular side of graphs as presented by books like Huff’s How to Lie with Statistics. Ironically, Huff defended tobacco companies at public hearings and secretly accepted their cash to write a never-published statistical defense of the industry — perhaps from mercenary motives, but more likely to demonstrate that he could outsmart the professionals. He has been canceled from the pantheon. But the artist of How to Lie with Statistics, Irving Geis, was a giant of scientific illustration who deserved an honorable mention for his pre-computer work on visualizing protein structures.
In the vast territory Friendly and Wainer do cover, the authors not only visit the past highlights of their field, but also demonstrate how the state of the art can improve data graphics classics even further. Yet the context of the same classics also shows the limits of information design in guiding decisions.
The English clockmaker John Harrison, not the cartographer van Langren, was the one ultimately rewarded by the British government for solving the problem of determining longitude. King Louis XVI of France was duly impressed with William Playfair’s outstanding Commercial and Political Atlas when he received it as a gift in 1787, but it could not save his throne or his life once the Revolution broke out two years later.
Minard’s graphic of the Russian campaign may be the most poignant example. Friendly and Wainer remind us that Minard prepared this masterpiece in retirement as a warning to Emperor Napoleon III against another futile adventure. The emperor nonetheless repeated history by allowing himself to be drawn into a disastrous war with Prussia that destroyed the Second Empire and created a fearsome enemy in a united German Reich.
Why was Napoleon I so foolhardy in the first place? There is a tantalizing clue in the magnificently engraved large-scale map of European Russia that he ordered from his military cartographers as he was contemplating the invasion. I saw one of the 77 parts (about 40 x 60 inches) in an exhibition at the New York Public Library, and the Bibliothèque Nationale de France has put an image online. This superb miniature representation of Russia must have supported the emperor’s false sense of confidence.
And why was Francis Galton, one of the 19th century’s greatest intellects, unable to use his own brilliant data techniques to question rather than reinforce his contemporaries’ racist attitudes on Africans, evolution and intelligence?
Thus, despite the immense potential of data graphics for good decision making, facing facts through them has two inevitable limits. It can’t improve decisions if decision makers don’t pay attention. And even the cleverest software can never make assumptions on our behalf.