Data Visualization Basics
Before making more charts, we need to understand the fundamentals of data visualization design and how to decide what kind of chart to create.
Get the project source code below, and follow along with the lesson material.
Download Project Source Code
To set up the project on your local machine, please follow the directions provided in the README.md file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.
Lesson Transcript
[00:00 - 00:12] Okay, so now that we understand how to make a chart, I want to talk to you a little bit about what chart you make and how to make those decisions. So a little bit of recap.
[00:13 - 00:20] So first we made a line chart. Next we made this scatter plot and then after that we made this bar chart.
[00:21 - 00:41] But if we zoom out a little bit, there's this huge problem space of different types of charts, and even if you're using the exact same dataset, you could come up with completely different charts. It can feel a little bit overwhelming if you're sitting there with a dataset and saying, "Oh, I want to visualize this. What do I do with it?"
[00:42 - 00:59] And the first step for deciding the type of chart is usually asking yourself what kind of questions do I want to answer. So if we go back and look at each of the charts that we've already made, even with the same data, we can talk about the different questions that they can help start to answer.
[01:00 - 01:09] So first we made this timeline. So we were mapping the maximum temperature for every day in that location for an entire year.
[01:10 - 01:23] So the timelines are really good at showing trends over time and any kind of temporal pattern. So we can see, oh, it was really hot, the temperature peaks around August.
[01:24 - 01:33] There's a lot of fluctuation between days. If we had to guesstimate, there's like 10 to 20 degrees of fluctuation day to day.
[01:34 - 01:54] There seems to be more fluctuation in the wintertime and less in the summer. And we can see it's really cold in the winter, really hot in the summer, and we had a few days where the maximum temperature was below freezing around the new year.
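The day-to-day fluctuation that we eyeball on the timeline can also be computed directly. The following is a minimal sketch in Python, not the course's own code: the function name `day_to_day_changes` and the temperature values are made up for illustration and are not taken from the lesson's dataset.

```python
from statistics import mean

# Hypothetical daily maximum temperatures (illustrative values only,
# not the dataset used in the lesson).
daily_max_temps = [31, 45, 38, 52, 40, 55, 47]


def day_to_day_changes(values):
    """Return the absolute change between each pair of consecutive days."""
    return [abs(later - earlier) for earlier, later in zip(values, values[1:])]


changes = day_to_day_changes(daily_max_temps)
print(changes)                  # → [14, 7, 14, 12, 15, 8]
print(round(mean(changes), 1))  # → 11.7
```

Running the same calculation separately on winter days and summer days would be one way to check the impression that winter fluctuates more.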
[01:55 - 02:00] So that's our timeline. The next chart we made is the scatter plot, where we chose two metrics.
[02:01 - 02:08] We went with humidity versus dewpoint. And so a scatter plot is really good at showing the relationship between two metrics.
[02:09 - 02:14] So you begin to answer things like, are they correlated? Are they anti-correlated?
[02:15 - 02:24] We can see humidity and dewpoint are kind of correlated. If you have a higher humidity, you're more likely to have a higher dewpoint.
[02:25 - 02:31] Each of these dots, remember, is an individual day. But it's not-- the grouping isn't super tight.
[02:32 - 02:42] So there is still a range: here's a day with a high humidity, and here's another day with a low humidity at the same dewpoint.
[02:43 - 02:53] So this kind of scatter plot can be really good at looking at the relationship between two metrics. And you don't really get any temporal patterns in here.
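The "are they correlated?" question that a scatter plot answers visually can be put into a single number with the Pearson correlation coefficient: values near 1 mean a tight positive relationship, values near -1 mean an anti-correlation, and values near 0 mean no linear relationship. This is a rough sketch under stated assumptions: the helper `pearson_correlation` and the humidity and dew point values are hypothetical, chosen only to illustrate the idea.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a metric never changes")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical values, one entry per day (not the lesson's dataset).
humidity = [0.55, 0.62, 0.70, 0.78, 0.85]
dew_point = [30, 41, 38, 52, 60]

r = pearson_correlation(humidity, dew_point)
print(r)  # positive, but below 1 because the grouping is not perfectly tight
```

A single number hides details that the scatter plot shows, such as two days sharing a dew point at very different humidities, so it complements the chart rather than replacing it.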
[02:54 - 03:02] And then the third chart we made was this histogram, where we only looked at one metric. We chose humidity.
[03:03 - 03:17] And it's a little bit more similar to the scatter plot, where you can kind of see where the values fall for this one metric. But it really hones in on how much variance there is in this metric.
[03:18 - 03:24] Are most of the numbers in this one group? Or are there two groups that they mainly fall into?
[03:25 - 03:38] Because our humidity is most likely going to be between 0.5 and 0.9 on any given day. But there's a little bit more of a tail on the left side than on the right side.
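Under the hood, a histogram groups the values of one metric into equal-width bins and counts how many fall into each. Here is a minimal sketch of that binning step, assuming a hypothetical helper `histogram_counts` and made-up humidity values (not the lesson's dataset).

```python
def histogram_counts(values, lower, upper, bin_count):
    """Count how many values fall into each of bin_count equal-width bins."""
    if bin_count < 1 or upper <= lower:
        raise ValueError("need at least one bin and upper > lower")
    counts = [0] * bin_count
    for value in values:
        if not lower <= value <= upper:
            continue  # ignore values outside the plotted range
        index = int((value - lower) * bin_count / (upper - lower))
        # A value equal to `upper` belongs in the last bin, not one past it.
        counts[min(index, bin_count - 1)] += 1
    return counts


# Hypothetical daily humidity values (illustrative only).
humidity = [0.35, 0.52, 0.58, 0.63, 0.67, 0.71, 0.74, 0.78, 0.83, 0.88]

counts = histogram_counts(humidity, lower=0.0, upper=1.0, bin_count=10)
print(counts)  # → [0, 0, 0, 1, 0, 2, 2, 3, 2, 0]
```

In this made-up sample most days land between 0.5 and 0.9, with one low value stretching the left side, which is the kind of shape described above. Values sitting exactly on a bin edge can land in either neighboring bin because of floating-point rounding.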
[03:39 - 03:57] So histograms are really good at showing the distribution of a single metric. And so hopefully you can begin to see, even with this one data set, you can plot the same data in many different ways.
[03:58 - 04:09] And how those different formats can answer completely different questions. So the timeline isn't necessarily going to be appropriate in a place that might call for a scatter plot.
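The question-first approach from this lesson can be summarized as a small lookup from the kind of question to the chart type that suits it. This is only an illustrative sketch: the names `CHART_FOR_QUESTION` and `suggest_chart` and the question labels are hypothetical, and the mapping covers only the three charts discussed here.

```python
# Mapping from the kind of question to the chart type covered in this lesson.
CHART_FOR_QUESTION = {
    "trend over time": "timeline",
    "relationship between two metrics": "scatter plot",
    "distribution of one metric": "histogram",
}


def suggest_chart(question_type):
    """Return the chart type from this lesson that fits the given question."""
    try:
        return CHART_FOR_QUESTION[question_type]
    except KeyError:
        raise ValueError(f"no suggestion for question type: {question_type!r}")


print(suggest_chart("relationship between two metrics"))  # → scatter plot
```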
[04:10 - 04:15] Yeah, so we're going to start in the next lesson by talking about the data itself.