Intro Statistics: Vocabulary: Data, Individual, Population, Sample, Parameter, Statistic, Inference

So, this is section 1.1, and honestly,
statistics is difficult for some people because it involves about as much
English and critical reading as it does mathematics. You are going to have
to be careful about the wording of things in a way that you might not have
had to be before. So, with that in mind, let’s go through and look at the first
definition here: “What is statistics?” According to the book, it’s “a
science of collecting, organizing, summarizing, and analyzing information
to draw conclusions or answer questions”. So, you’ll notice that shares some
similarities with my definition. It’s a science, first of all. And it’s about
drawing conclusions, which is basically what I said. It’s about making decisions.
And so, then the book talks a little bit more about this data that we mentioned.
What is data? The book calls it something slightly confusing here but I think this
is a little bit better. It describes characteristics of an individual. This
could be a number, like your height, or this could be a word, like the color of
your shirt. And what’s important is that this data varies. It is not the same
for everyone. We aren’t all the same height. We don’t all have the same hair
color. We don’t wear the same color shirts. And these are pretty simple
examples, but they underlie a really important fact about statistics.
We are trying to understand this variability. We want to know how things
change. Another barrier that a lot of people have to understanding statistics
is understanding the vocabulary. I highly recommend making a notebook of
definitions. We use a lot of words in here. Some of them you might have
heard before and some you might not, and even the ones that you have used
before we might use in a slightly different
way. For example, we’ve got this word “individual” here which, typically in
English, you think about that being a person. But in statistics that’s really
just a thing we can measure. It could be a car, or a classroom, or it could be a
state in the United States. It’s anything that we can measure. So the word
“individual” doesn’t necessarily mean “a person”. Another word that you might not
have seen too often is “variability”. That is a very important word in statistics.
And I’m actually going to include a link to a video that gives a really good
description of what we’re talking about when we talk about “variability”. So you
notice the last line here says “a goal of statistics is to describe and understand
sources of variability”. And why do we care about that so much? Well, let’s think
about an example. Maybe using Michael Jordan. He was one of the best basketball
players that ever played the game. But, think about Michael Jordan shooting free
throws. Whether he makes them or misses them varies. Sometimes he would make
them, sometimes he would miss them. If you just saw Michael Jordan shooting two
free throws and he missed them both and that’s all you ever saw you might say “Oh,
that person’s not very good at basketball”, but that’s actually
completely wrong. The variability of his free throw shooting kind of got in the
way of the truth of whether or not he was a good basketball player. You have to
think about the fact that almost everything in the world varies and what
you see is just a small sample of that variability. So you have to be careful
when you want to make decisions about what is actually true or not. Just
because Michael Jordan missed a few free throws doesn’t mean that he’s a bad
basketball player. So this slide is quite important because it’s talking about the
general process of statistics. So say you want to answer a question like “What
color of car is the most popular in the United States?”. Well, if you think about
what that means, it’s a question involving what we call “the population”,
which is a large group of individuals. And for this question, that population is
“every car in the United States”. That’s millions of cars. That’s hard to know
about! So how do we answer a question like that? Well what we would do is
collect a smaller number of cars, which we call “the sample”, and from that sample
of cars we would try to talk about what that tells us about which color of car
is the most popular out of all these millions of cars. We use the sample to
try to describe what’s going on with the population. That is the entire point of
statistics. So, descriptive statistics are just describing what is going on with
the sample that we took. Sticking with the car example, suppose that we were
able to collect information on the color of 50 cars and suppose 15 of those cars
happen to be red. Well, this collection of 50 cars is our sample itself. It’s the
individuals that we gather data on and this 15 right here is a number that
describes that sample and we would call that “a statistic”. That’s an important
idea. A statistic is a number that describes our sample. In contrast to a
statistic, there is another really important idea for this class, known as “a
parameter”. Sticking with the car example, suppose we wanted to know “How many red
cars are in the entire United States?”. Well, if you think about that, that’s a
number and that’s a number that describes the population. It’s describing
every single car in the United States. So that number is a parameter. So to review
some of the vocabulary we just learned, with the car example a car itself would
be what’s known as an “individual”. It’s what we’re collecting data on. And our
“population” is every single car that we’re interested in, which would be all
the cars in the United States. The “sample” is the set of cars that we actually collected
data on, which would be those 50 cars. And so the number of red cars in our
population, the number of red cars in the United States, is what’s known as a
“parameter”. And the 15 red cars in our sample is what’s known as a “statistic”.
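
To make those four words concrete, here is a small Python sketch. It is only an illustration: the population of 100,000 cars and the 28% red share are made-up numbers (we can't actually list every car in the United States), and the variable names are just ones picked for this example.

```python
import random

random.seed(1)  # fixed seed so the sketch gives the same sample each run

# Hypothetical population: 100,000 cars, 28% of them red (made-up numbers).
population = ["red"] * 28_000 + ["other"] * 72_000

# Parameter: a number that describes the whole population.
parameter = population.count("red")

# Sample: 50 individuals (cars) drawn at random from the population.
sample = random.sample(population, 50)

# Statistic: a number that describes only the sample.
statistic = sample.count("red")

print("parameter (red cars in the population):", parameter)
print("statistic (red cars in the sample):", statistic)
```

In real life we only ever get to see the statistic. The parameter is printed here only because we invented the population ourselves.
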
This is a really important idea, that we have a “population” and we have “a sample”.
Those are groups of individuals, in this case groups of cars, and we have numbers
that describe the population. Those are what are known as “parameters”, and we have
numbers that describe our “sample”, and those are what are known as “statistics”.
We will use these terms over and over again in this class, “population”, “sample”,
“parameter”, “statistic”. Parameter goes with population. There are two p’s. Statistic goes
with sample. Two s’s. That is really important vocabulary for knowing what’s
going on for the rest of the semester. Ok, so, wrapping up section 1.1, which really
has a lot of important stuff in it, we have the idea of what’s known as “inferential
statistics”. That is making an inference, a statement about something. So what’s
going on here? Inferential statistics is saying we can use our sample to talk
about the population. In our car example, 15 out of 50 cars were red. That
meant that thirty percent of our sample was red. We can make an inference by
talking about the population from the sample and say hey, because thirty
percent of our sample was red, it might be reasonable to say
something like “We estimate that thirty percent of cars in the United States are
red”. Is that totally true? Probably not. But it might be close, and that’s another
important part of statistics: how reliable our result is. How much can we
trust the idea that roughly thirty percent of cars in the United States are
red, just because thirty percent of our sample was red?
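
The arithmetic behind that estimate can be written out as a short Python sketch. The function name `sample_proportion` is just a label chosen for this illustration. The 15 and the 50 are the numbers from the car example.

```python
def sample_proportion(count, sample_size):
    """Return the fraction of the sample that has the trait of interest."""
    return count / sample_size


red_in_sample = 15  # red cars observed in the sample
sample_size = 50    # total cars in the sample

estimate = sample_proportion(red_in_sample, sample_size)

# The sample proportion is a statistic. Using it as a guess for the
# population proportion is the inference.
print(f"We estimate that {estimate:.0%} of cars in the United States are red")
```

This gives an estimate only. It does not say how close that thirty percent is to the true value for the population.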
