
# lect06, Thu 04/18

Data Visualization

## Lecture References

The newly released book *Fundamentals of Data Visualization*; the lecture featured examples from Chapters 2-5 and 29.

The article from *The Economist* about their visualisations: *Mistakes, we’ve drawn a few*

## Anscombe’s Quartet

Anscombe’s Quartet, constructed by F.J. Anscombe in 1973, is a set of four datasets with nearly identical summary statistics (the means and variances of x and y, the correlation between x and y, and the linear regression line) that nonetheless look very different when plotted.

Add the following code to a new notebook in JupyterHub and check this for yourself (compute the summary statistics too):

```
"""
Anscombe's quartet
==================
"""
import seaborn as sns

sns.set_theme(style="ticks")

# Load the example dataset for Anscombe's quartet
# (columns: dataset, x, y; dataset is one of "I", "II", "III", "IV")
df = sns.load_dataset("anscombe")
print(df)  # a bare `df` only renders when it is the last line of a cell

# Show the results of a linear regression within each dataset
sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
           col_wrap=2, ci=None, palette="muted", height=3,
           scatter_kws={"s": 50, "alpha": 1})
```
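For the "compute the summary statistics too" part, here is one possible sketch in plain Python. It hardcodes the values published by Anscombe (1973) so it runs without downloading seaborn's dataset. The names `QUARTET` and `summarize` are illustrative, not part of seaborn. In the notebook you could instead work from `df`; for example, `df.groupby("dataset")[["x", "y"]].agg(["mean", "var"])` gives the means and variances per dataset.

```python
import math
import statistics

# Values as published by Anscombe (1973); datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Mean/variance of x and y, Pearson r, and least-squares slope/intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": statistics.variance(xs),  # sample variance (n - 1 denominator)
        "mean_y": my,
        "var_y": statistics.variance(ys),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

Every row should show a mean of x of 9, a variance of x of 11, a mean of y of about 7.50, a correlation of about 0.816, and a regression line of roughly y = 3.00 + 0.500x, even though the four plots look nothing alike.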

Alberto Cairo created the Datasaurus dataset “to illustrate how important it is to visualize data while analyzing it”. See: *Download the Datasaurus: Never trust summary statistics alone; always visualize your data*

Related results from Autodesk Research: *Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing*
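To get a feel for how such datasets can be generated, here is a toy sketch of the idea named in that paper's title. It is not the authors' implementation. The loop repeatedly nudges one random point. A nudge is kept only if the summary statistics, rounded to two decimals, are unchanged. It must also either move the point closer to a target shape or pass a random check whose acceptance probability (the "temperature") shrinks over time. The circular target shape, the temperature schedule, the step size, and the helper names `summary`, `dist_to_target`, and `anneal` are all hypothetical choices made for illustration.

```python
import math
import random

# Starting data: Anscombe's dataset I.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def summary(xs, ys):
    """Means, sample standard deviations, and Pearson r, rounded to 2 decimals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    stats = (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
             sxy / math.sqrt(sxx * syy))
    return tuple(round(v, 2) for v in stats)


def dist_to_target(x, y):
    """Hypothetical target shape: a circle of radius 3 centred at (9, 7.5)."""
    return abs(math.hypot(x - 9.0, y - 7.5) - 3.0)


def anneal(xs, ys, steps=10000, step_size=0.05, seed=0):
    """Perturb points toward the target while keeping the rounded stats fixed."""
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)  # work on copies
    fixed = summary(xs, ys)
    for i in range(steps):
        temp = 0.4 * (1 - i / steps)  # assumed linear cooling schedule
        k = rng.randrange(len(xs))
        old_x, old_y = xs[k], ys[k]
        new_x = old_x + rng.gauss(0, step_size)
        new_y = old_y + rng.gauss(0, step_size)
        closer = dist_to_target(new_x, new_y) < dist_to_target(old_x, old_y)
        if not (closer or rng.random() < temp):
            continue  # reject: not closer and failed the temperature check
        xs[k], ys[k] = new_x, new_y
        if summary(xs, ys) != fixed:
            xs[k], ys[k] = old_x, old_y  # reject: rounded stats changed
    return xs, ys


new_x, new_y = anneal(X, Y)
print("before:", summary(X, Y))
print("after: ", summary(new_x, new_y))
```

Because every accepted step must leave the rounded statistics untouched, the "after" line always matches the "before" line, while the points themselves drift toward the target shape.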