Types of data

We talk about the different types of data: the main buckets of qualitative and quantitative metrics, and their sub-categorizations.


This lesson is part of the Fullstack D3 Masterclass course.


When you have a new dataset, the first step is often just figuring out the structure of the data. Over here on the left, we have the name and value of every property for one of the days in our dataset. We can see there's the summary, the moon phase, precipitation, dew point, cloud cover, and the date, which is January 1st, 2018. There's really a lot to look at here, so let's see how we can categorize these properties to make sense of what the different values are.

One common grouping you might see splits types of data into two buckets: qualitative and quantitative. Qualitative data is usually a string (it doesn't have to be, but often it is), and you can usually put its values into distinct buckets. Take icon: it's a string, and a value like clear day puts this data point in the "clear day" bucket. In contrast, quantitative data is usually a number, where there aren't really buckets; instead, the values sit on a number line. Let's go into a little more detail on both of these categories.

The first sub-category of qualitative data, the one with the buckets, is binary. A binary metric is one thing or the other. A good example is raining or not raining. I'm not sure we have any good examples of this in our dataset, but imagine that precipitation probability could only be zero or one: is it going to rain, or is it not going to rain? That would be a binary metric.

The next one, still qualitative, is nominal. Nominal values are names of things. For icon, I can say clear day, sunny, cloudy, or windy. These are just names, and they don't really have an order.

The next one is ordinal. It's similar to nominal, but the values have a natural order: not windy, kind of windy, very windy.
If icon had only three options, say clear day, partially cloudy day, and cloudy day, that would be an ordinal metric.

So those are the three qualitative metric types: binary, nominal, and ordinal.

On the other side, for quantitative metrics, we first have discrete metrics. A classic example is number of kids. You can have one kid, two kids, or three kids, but you can't really have one and a half kids, unless maybe you're averaging. Discrete values have a place on a number line, so they have relationships to each other, but you can't interpolate between them. I think visibility is kind of discrete, because I think it takes set values here: integers from zero all the way up to ten.

The last one is continuous, and this is what you're going to see most often. If you see a number, it's most likely continuous, like degrees Fahrenheit. You can have 70 degrees, you can have 72 degrees, you can have 71 and a half degrees; you can have really any point between the two extremes. Most of the things on here are continuous variables: temperature min is continuous, humidity is continuous, and dew point, pressure, and wind speed are all continuous too.

Once you've labeled each metric in your dataset with its data type, you can start to figure out the structure of your dataset and then move on to the next step, which is figuring out how to visualize those different pieces of data.
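
The labeling step described in this lesson could be written down as a small lookup table. The following is only a rough sketch in Python, not code from the course: the property names (`temperatureMin`, `dewPoint`, and so on) are assumed spellings of the fields mentioned above, and `MetricType`, `bucket`, and `group_by_bucket` are hypothetical helpers introduced for illustration. The labels themselves follow the classifications given in the lesson.

```python
from enum import Enum


class MetricType(Enum):
    # Qualitative: values fall into buckets
    BINARY = "binary"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    # Quantitative: values sit on a number line
    DISCRETE = "discrete"
    CONTINUOUS = "continuous"


QUALITATIVE = {MetricType.BINARY, MetricType.NOMINAL, MetricType.ORDINAL}


def bucket(metric_type):
    """Return the top-level bucket ("qualitative" or "quantitative")."""
    return "qualitative" if metric_type in QUALITATIVE else "quantitative"


# Assumed property names; the labels mirror the lesson's own classification.
metric_types = {
    "icon": MetricType.NOMINAL,
    # The lesson only guesses that visibility takes integer values (0 to 10).
    "visibility": MetricType.DISCRETE,
    "temperatureMin": MetricType.CONTINUOUS,
    "humidity": MetricType.CONTINUOUS,
    "dewPoint": MetricType.CONTINUOUS,
    "pressure": MetricType.CONTINUOUS,
    "windSpeed": MetricType.CONTINUOUS,
}


def group_by_bucket(labels):
    """Group metric names under their top-level bucket."""
    grouped = {"qualitative": [], "quantitative": []}
    for name, metric_type in labels.items():
        grouped[bucket(metric_type)].append(name)
    return grouped


for name, metric_type in metric_types.items():
    print(f"{name}: {metric_type.value} ({bucket(metric_type)})")
```

Keeping the labels in one place like this makes it easy to revisit a guess (such as visibility being discrete) once you have looked at more of the data.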