Welcome to Excel 2013

statistical analysis video number 43. And if you want to download this

workbook file or the PDF files for chapter 7, click on

the link below the video. Hey, chapter 7, we got

to talk about sampling, and then one of the more

amazing things, the sampling distribution, which will

be a distribution of all of our x bars and p bar too. And we’re going to learn

about the central limit theorem, which will come from

some things we learn when we create a sampling distribution. Now, I’m going to go to

actually to page 9 in the PDF. Here is page 9. In chapter 6, we looked at

our normal distribution. But we were always talking about

x values and probabilities. Not only that, but we

had the population mean and the population standard

deviation, mu and sigma. And we were talking

about x, right? But here’s chapter 7. We’re going to treat the x

bar as a random variable. That means we will plot

all of the possible x bars from a given set

of population data. And we’re going to discover

some profound things. The normal

distribution down here, our random variable

is going to be x bar. Now, in chapter 7,

we’re still going to know the population mean. But we’ll learn something

quite amazing about that. And we’re still going to know

sigma or population standard deviation. But all of our random

variable, those are going to be actual

samples we take, x bar. And we’ll also talk

about proportions where we have p bar. Then in next chapter,

chapter 8, we’ll talk about what to do

when you don’t know sigma, the population

standard deviation. Home to get back to page 1. All right, so sampling

and sampling distribution. We’re going to start off

by talking about sampling. Hey, population? That’s all of the data points. Sample? Just some of them. We’ll talk about

one or two methods of actually how to go and

pick your sample items. But you get the idea– all the items,

some of the items. Now, what happens is we

have a population mu, our mean of all of

these sales values– 289 bucks approximately, right? Well, when you took a sample

and calculated mean x bar, we got what’s called a point

estimate of our population mean, our x bar, 255. Well, what’s the difference

between these two things? It’s called sampling error. And we can see it’s 34 bucks. Now, the question

is what are we going to do with that sampling error? Is the sampling error

just sampling error? Hey, we just took

a sample, so we’re not going to have

exactly the same values, or does it represent

a true difference? So is the point

estimator good enough? And really, what we’re going

to ask is, is it reasonable? So in this chapter, we’re

going to build a model so we can check whether our

point estimator, either x bar or p bar, and the sampling

error, are those reasonable? Sometimes, we’re going

to take a sample, and it won’t be

reasonable, which means it’s not a

good point estimator. Other times, we’re

going to take a sample and we’ll run our test. And it will be reasonable. And we’ll say that,

yeah, the point estimator is a fair representation for

the population parameter. Again, the model

that we’re going to use here to help us decide

whether our point estimate is reasonable is called the

sampling distribution. But first, before we talk

about the amazing sampling distribution, let’s talk about

sampling, page 3 in our PDF, sampling. Hey, why do we take

samples in the first place? Well, a few reasons. Some populations are

just impossible to check. We can’t count all

the fish in the sea. And yet, it’s very important

sometimes for people to do samples to figure out

how many fish there are, how many dolphins,

how many orcas. Another reason, calculating off

of the population costs a lot. An example from General Mills– they might hire a firm

to test a new cereal. The sample might cost 40K. Population– that

means going out and asking every single person. No way, way too much money. Another reason that

we take samples is contacting a whole

population is time-consuming. Political polls can be

conducted in one to two days, whereas, of course, contacting

the whole population would take years. How about destructive

nature of some tests? If you’re making wine

in manufacturing, you cannot test every

single bottle of wine. If you make seeds

for farmers, you can’t go out and test every

single one because then there would be none left. And finally and most

importantly, samples are usually adequate. And a great example is

the consumer price index. Those taken from samples

are quite accurate. Now, why do we need samples? Hey, we select samples to

answer research questions about the population. Here at Highline

Community College, do Highline students think

advising should be mandatory? That’s a question

right here on campus they’re trying to

figure out right now. We could, at Highline,

get to every single person because actually we have a list

of all the registered students. But it would take a long time. And it probably would be

impossible to get everybody to answer this question. But sampling, we could

get a reasonable estimate of what people thought. For manufacturing, is the

machine filling accurately? So if you’re filling boxes of

cereals or bags of lettuce, we absolutely need to

periodically sample to make sure the

machine is working. Hey, next question

about sampling, if the population

is so big, how do we know where to go

and get the data? Well, the first thing is

we have to be careful. And the big thing we

have to be careful of is we must make sure that the

sample population is the same as the target population. Now, sample

population, that’s just population from which

the sample is drawn. And target population,

that’s the one we’re trying to actually

get at, population, we want to make the

inference about. Here’s an example to illustrate

the difference between sample and target. The goal is to gather data about

students at Laney Community College. That’s a community college

in Oakland, California. By the way, they have a great

culinary arts restaurant there. So here’s the deal. If you took a sample from Laney

College registration list, the sample population would

equal the target population. Because remember, our goal is

to gather data about students at Laney College. But if we took a

sample from people who ate at the culinary

arts restaurant, that means we just went there,

and it was convenient for us just to ask people

as they were eating, right? The problem with

that scenario is, we would get some students

from Laney College. They would be in the

target population. But we’d also get

some people eating who are not Laney College students. Those people are not in

the target population. So in this example,

the sample population would actually be people

eating at the culinary arts restaurant, whereas

the target population was all the students. So the target

population would not equal the sample population. Some other examples–

if we were trying to calculate Seattle

house price mean, and we accidentally used

some data from Burien also. The target population was

just Seattle house prices. But we accidentally mixed and

got some from Burien also. The most famous example,

example three– and I’ll let you Google this because

this is quite an amazing one– the 1936 presidential

election was between Landon and Roosevelt. And there was

a famous poll that was done. And they only used phone

directories and other lists that they mistakenly

thought represented the whole population. But they only represented

middle and upper class people. So the views they got

were highly biased. Well, this famous

poll disastrously mispredicted the outcome. And of course, Roosevelt won. Here, the sample population

was the middle upper class instead of what

they really wanted, the target population,

which was lower, middle, upper, and everything

else, all the classes together. All right, now, we got to

talk about what a frame is. And a frame as simple. It’s just a list of all the

elements in the population, or a list of elements that

the sample is selected from. Now, you can imagine that

frames are not always possible to create. So example frames

that you can create? Well, if we had a

list of all the names of registered students at

Laney College, that’s something we can get, or obtain. A list of all the companies

listed on the New York Stock Exchange, so if we

were going to do a sample on just some

of the companies, we could choose from this list. These are finite lists. If we actually have

a list, then we can use what’s called

simple random sampling. And we’ll look at an

example of that in Excel. We’ll put a list there and

randomly select some data from that list. Examples of frames that

cannot be constructed– population is too big, like

all the fish in the sea. Actually, there

was a famous study just recently too back in

2008, I think it was started. And it went on for years and

years trying to actually count all of the fish in the sea. But of course, they

were doing a sample. Also, frames that

cannot be constructed– populations from an ongoing

process, for example, machines filling

boxes of cereal, transactions

occurring at a bank, customers entering at a store. In all of these cases,

it’s theoretically infinite because there’s no upper limit. We consider it infinite because

we cannot construct a frame. Even though customers

are entering a store, we can count there’s two

people that just came in. But theoretically,

any number of people could have come in at

that particular moment. As you can imagine,

if you have a frame, then it’s a lot easier

to randomly select items from a frame than if

you don’t have a frame. Now, let’s scroll down and talk

about two sampling methods. The first one we’re

going to talk about is when you have a

finite population where you have a frame. We can use simple

random sampling. Now, we’re going to cover this

over in Excel in just a second. But the key is we’re

going to add a new column and use the RAND function to

randomly generate a number. And then we’re going to

sort those random numbers and pick whatever

items are at the top. Here’s some

information about RAND. We’ll talk about

that over in Excel. Now, the second

method for sampling is for infinite population,

or an ongoing process. That’s called random sample. Now we’re going to select any

end units in a random way. But two things must be true. Each element must be

from the same population. Sample population has to

equal target population. Now, for an ongoing

process, people walking in or an infinite

population, remember, we don’t have that

frame, that list. If we look up here for

a simple random sample, we’re actually going to have

the frame sitting right there. So that’s all of the items that

are sampled and the target. But down here, if it’s

infinite or an ongoing process, we have to be really careful

that the sample population is equal to the target population. The second thing

that must be true is each element is

selected independently. This is so we don’t have bias. We don’t want to select all

the items from a similar group, similar attitudes. Now, if you imagine you

were polling people walking into a restaurant,

well, if you only ask people of a certain

age group because you felt comfortable

talking to them, that would mean you are not

selecting sample elements independently. There is a bias. So both of these

things have to be true when you do a random sample. Now, when you’re doing

an ongoing process like filling a

machine, to make sure that each element is

from the same population, you choose samples from

the same point in time, so all elements from

the same population. So you make sure that you

pull all the bags of lettuce or all the bottles of ketchup or

whatever and right at that one point of time, and then weigh

them, sample them, et cetera. Now, I want to go over to

Excel and figure out how to do a simple random sample. Here’s the method that

we’re going to use. But I want to investigate a

little bit before we actually do the actual random sample. And we’ll look at the definition

of simple random sample. The definition is a

simple random sample of size n from a finite

population of size n is a sample selected such that

each possible sample size of n has the same probability

of being selected. Now, before we can really

make sense of this, I actually want to try and

list every single sample and see what they’re

talking about. So let’s go over to Excel. Now, before we list all

the possible samples, let’s just look at one sample. Here’s our original data set

that we started off this video with, sales rep and sales. And here is one possible sample. There’s the names. There’s the values. Now, we have a population size

big N of 7, 1, 2, 3, 4, 5, 6, 7. The population

mean or mu is 289. We just took an average. Population standard

deviation is about 74. Now, here’s our sample. Sample mean equals x bar

equals the point estimator of the population mean. So here, we come over

and we calculate our mean using the average function. When I hit Enter, 255. That’s the point estimator

of our population mean. But we can see that

there’s a big difference. And the difference is

called sampling error. We’re going to take x bar and

subtract our population mean. Enter. So our point estimator is $34

below the population mean. Now, part of what we’ll

do in this chapter is we’ll find out if our point

estimator and the sample error seem reasonable, or not. Is this sample error too big? Or is it OK? It’s just sampling error. And we’ll accept this as a

reasonable point estimator to the population mean. Now, this is just one sample. Let’s go over to the sheet,

all the possible samples. And no way. Here’s our data set. We’re going to try and

list every single sample. Now, we actually did

this earlier in the class where we looked at big N

population 7, sample size 3. We used the Combination

function to calculate how many total samples there would be. I got busy listing

every single one. Jo and Sue, I listed

those five times. And then I listed

all the matches. Oh, and I went

through and did it over and over for every

single possible sample. Now, the one we just did on this

sheet right here, Chin, Kip, Jo? That’s in here somewhere. But first, I want to actually

get every single number. So this 185 goes here. This 250 goes here. So for Chin, this

210 goes right here. Then I want to calculate

every single possible x bar. Now, there’s no way I’m going to

copy and paste, copy and paste. Instead, I use the VLOOKUP. Notice in this cell, I

need to look up Jo’s sales and bring it back over here. Here, I need to look up

Sue’s sales right there, 250, and bring it back down here. All the way down here,

I need to look up Tyrone’s sale on this

table, get that number, and bring it back over here. Sounds like the perfect

job for VLOOKUP. Now, VLOOKUP needs

a lookup value. I’m going to say, hey,

VLOOKUP, go look up Jo– and that’s a relative

cell reference– comma. The table– first column has

the items we’re matching. Subsequent columns

have things that we’re trying to get and

bring back to the cell. Now, I’m going to hit

F4 to lock that comma. Now, I need to tell VLOOKUP

which column, one or two, has the thing that I want to

get and bring back to the cell. Well, of course, the second

column has the sales number. So I just put a 2

comma exact match because we’re

exactly matching Jo in the first column, false,

or zero, close parentheses. Control Enter and watch this. I’m going to copy it over and

double click and send it down. I’m going to go to the

last cell, F2, to check it. And sure enough, it’s

looking up Kip in there. And it got exactly

the right number. Man, I would not like

to do that manually. Luckily, we have VLOOKUP. Now, we can do the x bar for

every single possible sample point. Control Enter and double

click and send it down. Now, if we look through here–

and I don’t remember who it was, Chin, Kip, Jo– So Jo and Chin, and

here it is right here. And there’s our sample. But look, that’s just one. And there’s 34 other

possible samples. Now, part of what

we’re doing here is helping us understand

that definition. But also, this is

going to lead us up to the sampling

distribution of x bar and then the central

limit theorem, which will give us real power. Now, let’s just go

back to our PDF. And notice it says a simple

random sample of size n from a finite

population of size n is a sample selected so that

each possible sample has the same probability. So here it is. We have every single

possible sample. And what’s the

probability of each one? It’s a uniform distribution– 1 divided by 35 possibilities. F4, Control Enter, double

click, and send it down. So if we’re going to do a

real simple random sample, the process we use

better have 0.0285 for every single one of

the possible samples. Again, if we calculate

randomly select a sample, it’s going to be one of these. All right, well, guess what? There is a built in function

in Excel that follows a uniform distribution. And we’re going to

use the RAND function. Now, the RAND function

is argument list. That means you just

put open parentheses, close parentheses,

and then Enter. It randomly generates a

number between 1 and 0. It’s actually greater than or

equal to 0 and less than 1. But it’s got a number

with 15 digits. And it’s random. If you hit the F9 key,

which is the evaluate key, it randomly generates

numbers between 1 and 0 up to 15 digits. This follows a

uniform distribution. And we can use this when

selecting our samples to simulate exactly this. Now, let’s scroll over here. We will come back to this

when we build our sampling distribution of x bar

and use this to build the actual distribution. That just means we take all

of our random variables x bar and list them in a histogram

or some sort of chart to look at the distribution. But what we want

to see now is how to create a simple

random sample. So we’ll click on the

sheet simple random sample. Here’s our data set. We’re going to add a column

with the RAND function. Then we’re going to sort it. And then copy and

paste the ones on top. Our sample size equals 3. We obviously have a

finite population. And here’s our frame, right? So we’re going to start

off with equals RAND. Control Enter and copy it down. Now, one thing about the

RAND function is watch this. If I start typing over here,

watch when I hit Enter. Oh, anything you do at all

triggers RAND to recalculate. So what we’re going to do

is now we’re going to sort. And I’m going to go to the data

and right-click uncollapse. So I want to click

in a single cell. And let’s, before we

apply our rule here, just watch what happens when

I sort A to Z. Remember? It’s supposed to bring

the smallest on top. So I clicked it. Is the smallest one on top? Well, I can already

see 0.0a, 0.07, 0.02– that’s the smallest. The reason why is

the act of sorting– click– is another thing that

triggers the RAND function. So when you sort, it actually

does sort the smallest number to the top and the associated

records just for a second. But when it stops, it

retriggers it again. So when we click A to Z, even

though the smallest number is not necessarily at

the top, the data set was sorted using a

uniform distribution. So here’s our rule. Add the RAND. Click sort A to Z

button five times. It could be any number– 1, 2, 10. Just always keep consistent. And then copy and paste

the top three records. So you ready? Click in a single cell. One, two, three, four, five. And that’s it. The top three, I’m going to

Control C and paste over here. Now, I can calculate x bar for

this particular sample equals average. And there we go. Control Enter. 215, so x bar,

215, and let’s all go see if we can

find that over here. It’s Sue, Jo, Chin. So oh, my heavens. It’s the very first one. Look at that. And there it is,

that particular 215. Now, we go back over to

simple random sample. Now, we could do

this another way. And this way is not necessary. This way is kind of

nice because we just take them and copy and paste. This method will

automate the process. If we do it with a

formula like we’re going to do over here to

extract randomly three records, every time

we do anything, it calculates all

of our formulas and reformulates our sample. But nevertheless, notice we

have a first column with RAND. And one of our rules is that we

need the three smallest values. Well, we can use this SMALL

function to actually retrieve the three smallest. The array– that’s this

column right here now– F4 to lock it because we’re

going to copy it down, comma and k, first

smallest, second smallest, third smallest. Boom, like that. Close parentheses, Control

Enter and copy it down. Any time they change, F9,

it automatically lists them first smallest, second

smallest, third smallest. Now notice we have a match

between some lookup value and a first column

in a lookup table. The second column

has our sales rep, which we want to return here. And our third column

has our sales, which we want to return here. So watch this– equals VLOOKUP– there is my lookup value– comma the table array,

the entire range, F4 to lock it comma. In the column index, I’m simply

going to put a 2 comma 0 close parentheses. Control Enter. Now, I can copy this down. And watch this. I’m going to cheat. I’m going to copy this over. It’s going to be totally

wrong for a second. Double click that. Click on the SMALL. And then change this to a 3. Control Enter and copy it down. And so there we have random

samples being created. By the way, if you knew

mixed cell references here, we could do one formula. But we don’t know mixed cell

references in this class, so we’re not going to– we’re going to do it

in a 2-step process. Hey, let’s calculate our

x bar Tab equals average and get this range right here. Enter. We’re even going to calculate

our population mean over here. And then I’m going

to calculate sampling error equals the particular

x bar minus our mu population mean. Wow, so that particular

sample of the 35 possible was 9.7 away. F9, and every time I hit F9,

so we’ve got a 29, a minus 40, a minus 34– that was our one we had before– 51, minus 4, minus 74, minus 16. So there we go,

simple random sample. Now, in this video, we

talked about creating a simple random sample. We used RAND,

SMALL, and VLOOKUP. We also did just the

RAND and the SORT five times and copy and paste. We also very importantly saw if

we have a particular population and sample, we can calculate

all the possible samples, list them, get the sales,

calculate the x bar. We’ll use that next video in our

sampling distribution of x bar. But here, we used it to

just see that the definition of simple random sample set,

each one of these samples has to have the

same probability. And then we talked about a point

estimator and sample error. And then back in our PDFs,

we talked about sampling. Next video, we’ll actually

create our sampling distribution of x bar

and discover something quite amazing about the mean

of the sampling distribution and about the population mean. All right, we’ll

see you next video.