lect06, Thu 04/18

Data Visualization

Lecture References

The newly released book: Fundamentals of Data Visualization. The lecture featured examples from Chapters 2-5 and 29.

The article from The Economist about their visualisations: Mistakes, we’ve drawn a few

Anscombe’s Quartet

Anscombe’s Quartet, developed by F.J. Anscombe in 1973, is a set of four datasets that have nearly the same summary statistics (mean, standard deviation, and correlation) but, when plotted, result in very different graphs.

Add the following code to a new notebook in JupyterHub and verify it for yourself (compute the summary statistics too):

"""
Anscombe's quartet
==================
"""
import seaborn as sns
sns.set(style="ticks")

# Load the example dataset for Anscombe's quartet
df = sns.load_dataset("anscombe")
df

#######
# Show the results of a linear regression within each dataset
sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
           col_wrap=2, ci=None, palette="muted", height=3,
           scatter_kws={"s": 50, "alpha": 1})
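
The plot shows how different the four datasets look, but the summary statistics still need to be checked. Below is a minimal sketch of that check using only the Python standard library. It assumes the commonly published values of the quartet, transcribed by hand here (in the notebook you would read them from df instead), and the names QUARTET, pearson_r, summarize, and SUMMARY are illustrative, not part of seaborn.

```python
from statistics import mean, stdev

# Anscombe's quartet as commonly published. Assumption: hand-transcribed
# values; compare against sns.load_dataset("anscombe") before relying on them.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Mean, sample standard deviation, and correlation for one dataset."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "std_x": stdev(xs),
        "std_y": stdev(ys),
        "r": pearson_r(xs, ys),
    }


SUMMARY = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

# Print one rounded row per dataset so the rows can be compared by eye.
for name, stats in SUMMARY.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows should be nearly identical, even though the plots are not. In the notebook itself, df.groupby("dataset").agg(["mean", "std"]) and df.groupby("dataset")[["x", "y"]].corr() should give the same numbers directly from the loaded DataFrame.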

Alberto Cairo created the Datasaurus dataset “to illustrate how important it is to visualize data while analyzing it”: Download the Datasaurus: Never trust summary statistics alone; always visualize your data

Interesting results from Autodesk Research: Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing