Data visualization means drawing graphic displays to show data. Sometimes every data point is drawn, as in a scatterplot; sometimes statistical summaries are shown, as in a histogram. The displays are mainly descriptive, concentrating on 'raw' data and simple summaries, though they can include displays of transformed data, sometimes based on complicated transformations. One person's statistics may be another person's raw data. As with other aspects of working with graphics, it would be useful to have an agreed base of concepts and terminology to build on. The main goal is to visualize data and statistics, interpreting the displays to gain information.
Data visualization is useful for data cleaning, exploring data structure, detecting outliers and unusual groups, identifying trends and clusters, spotting local patterns, evaluating modeling output, and presenting results. It is essential for exploratory data analysis and data mining, both to check data quality and to help analysts become familiar with the structure and features of the data before them. This part of data analysis is underplayed in textbooks, yet ever-present in actual investigations.
Graphics reveal data features that statistics and models may miss: unusual distributions, local patterns, clusterings, gaps, missing values, evidence of rounding or heaping, implicit boundaries, outliers, and so on. Look, for instance, at the one-sided peaks in the distributions of marathon finishing times (marastats, 2019). Graphics raise questions that stimulate research and suggest ideas. Interpreting them needs two things: experience, to identify potentially interesting features, and statistical nous, to guard against the dangers of overinterpretation. Just as graphics are useful for checking model results, models are useful for checking ideas derived from graphics (for more on models, see Hand, 2019).
This overview concentrates on static graphics. Dynamic graphics and, more especially, interactive graphics are in an exciting stage of development and have much to add. Superb examples include Human Terrain, a dynamic graphic showing the world's population in 3-D, and the interactive NameVoyager.
‘A Picture Is Worth a Thousand Words’
Famous sayings have a way of developing a life of their own. A picture is not a substitute for a thousand words; it needs a thousand words (or more). For data visualization you need to know the context, the source of the data, how and why they were collected, whether more could be collected, the reasons for drawing the displays, and how people with the necessary background knowledge advise they might be interpreted. Fisher's with the words: "No one should read this book who has not read it already." It is like that with graphics: if you have read all the supporting text, the display is often memorable and readily understandable.
Graphics on their own are insufficient; they are part of a whole. They complement text and are complemented by text. Student's reanalysis of the Lanarkshire Milk Experiment (Student, 1931) is an excellent example (and is also interesting as an early analysis of a large data set). The potential synergy of text and graphics can be appreciated by talking through your own graphics, explaining them to others.