lect06, Thu 04/18
Data Visualization
Lecture References
The newly released book Fundamentals of Data Visualization. The lecture featured examples from Chapters 2-5 and 29.
The Economist’s article about mistakes in its own visualizations: “Mistakes, we’ve drawn a few”
Anscombe’s Quartet
Anscombe’s Quartet, developed by F.J. Anscombe in 1973, is a set of four datasets that have nearly the same summary statistics (mean, standard deviation, and correlation) but, when plotted, result in very different graphs.
Add the following code to a new notebook in JupyterHub and verify it for yourself (compute the summary statistics too):
"""
Anscombe's quartet
==================
"""
import seaborn as sns
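sns.set_theme(style="ticks")
# Load the example dataset for Anscombe's quartet (seaborn downloads it on first use)
df = sns.load_dataset("anscombe")
df  # shown as a table when this is the last line of a notebook cell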
#######
# Show the results of a linear regression within each dataset
sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
col_wrap=2, ci=None, palette="muted", height=3,
scatter_kws={"s": 50, "alpha": 1})
Alberto Cairo created the Datasaurus dataset “to illustrate how important it is to visualize data while analyzing it”. See the post “Download the Datasaurus: Never trust summary statistics alone; always visualize your data”.
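Related results from Autodesk Research: “Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing”. The paper starts from one dataset and gradually reshapes it into very different-looking ones while its summary statistics stay the same.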
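The Autodesk approach works by repeatedly nudging points toward a target shape. A nudge is kept only if the summary statistics, rounded to two decimal places, are unchanged. The sketch below is a simplified illustration of that idea, not the authors' code. Several details are assumptions chosen for illustration: the function names `summary`, `stats_key` and `anneal`, the circular target, the Gaussian step size, and the linear temperature schedule. The acceptance rule is also simplified: a move is tried if it brings the point closer to the target, or with a probability equal to the current temperature.

```python
import math
import random


def summary(xs, ys):
    """Mean x, mean y, sample SD of x and y, and Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (mx, my, math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1)),
            sxy / math.sqrt(sxx * syy))


def stats_key(xs, ys, decimals=2):
    """Summary statistics rounded to `decimals` places; annealing keeps these fixed."""
    return tuple(round(s, decimals) for s in summary(xs, ys))


def anneal(xs, ys, target_dist, iterations=20_000, step=0.1,
           t_start=0.4, t_end=0.01, seed=0):
    """Nudge points toward a target shape while the rounded statistics stay fixed.

    target_dist(x, y) returns how far a point is from the (hypothetical) target shape.
    """
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)
    fixed = stats_key(xs, ys)
    for i in range(iterations):
        # Temperature falls linearly, so "uphill" moves are accepted less often over time.
        temp = t_start + (t_end - t_start) * i / iterations
        k = rng.randrange(len(xs))
        old_x, old_y = xs[k], ys[k]
        new_x, new_y = old_x + rng.gauss(0, step), old_y + rng.gauss(0, step)
        closer = target_dist(new_x, new_y) < target_dist(old_x, old_y)
        if not closer and rng.random() >= temp:
            continue  # reject: no improvement and the temperature check failed
        xs[k], ys[k] = new_x, new_y
        if stats_key(xs, ys) != fixed:
            xs[k], ys[k] = old_x, old_y  # revert: a rounded statistic changed
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.gauss(50, 10) for _ in range(60)]
    ys = [rng.gauss(50, 10) for _ in range(60)]

    # Hypothetical target shape: a circle of radius 15 centered at (50, 50).
    def ring(x, y):
        return abs(math.hypot(x - 50, y - 50) - 15)

    new_xs, new_ys = anneal(xs, ys, ring)
    print("before:", stats_key(xs, ys))
    print("after: ", stats_key(new_xs, new_ys))
    print("mean distance to ring:",
          round(sum(map(ring, xs, ys)) / len(xs), 2), "->",
          round(sum(map(ring, new_xs, new_ys)) / len(new_xs), 2))
```

Because a move is reverted whenever any rounded statistic changes, the "before" and "after" statistics always match. Only the shape of the point cloud drifts. Plotting `new_xs` against `new_ys` makes that visible, as in the Datasaurus examples.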