Lesson 6a: Hypothesis Testing for One-Sample Proportion

As mentioned before, methods of making inferences about parameters are either estimating the parameter or testing a hypothesis about the value of the parameter. In this lesson, we will introduce the concepts of hypothesis testing. Then we will discuss hypothesis testing for a population proportion. In the next Lesson, we discuss inference for the population mean.

Objectives

Upon successful completion of this lesson, you should be able to:


6a.1 - Introduction to Hypothesis Testing

Basic Terms

The first step in hypothesis testing is to set up two competing hypotheses. The hypotheses are the most important aspect. If the hypotheses are incorrect, your conclusion will also be incorrect.

The two hypotheses are named the null hypothesis and the alternative hypothesis.

The null hypothesis is typically denoted as \(H_0\). The null hypothesis states the "status quo". This hypothesis is assumed to be true until there is evidence to suggest otherwise.

The alternative hypothesis is typically denoted as \(H_a\) or \(H_1\). This is the statement that one wants to conclude. It is also called the research hypothesis.

The goal of hypothesis testing is to see if there is enough evidence against the null hypothesis. In other words, to see if there is enough evidence to reject the null hypothesis. If there is not enough evidence, then we fail to reject the null hypothesis.

Consider the following example where we set up these hypotheses.

Example 6-1

A man, Mr. Orangejuice, goes to trial and is tried for the murder of his ex-wife. He is either guilty or innocent. Set up the null and alternative hypotheses for this example.

Putting this in a hypothesis testing framework, the hypotheses being tested are:

  1. The man is guilty
  2. The man is innocent

Let's set up the null and alternative hypotheses.

\(H_0\colon \) Mr. Orangejuice is innocent

\(H_a\colon \) Mr. Orangejuice is guilty

Remember that we assume the null hypothesis is true and try to see if we have evidence against the null. Therefore, it makes sense in this example to assume the man is innocent and test to see if there is evidence that he is guilty.

The Logic of Hypothesis Testing

We want to know the answer to a research question. We determine our null and alternative hypotheses. Now it is time to make a decision.

The decision will be either to:

  1. reject the null hypothesis, or
  2. fail to reject the null hypothesis.

Note! Why can’t we say we “accept the null”? The reason is that we are assuming the null hypothesis is true and trying to see if there is evidence against it. Therefore, the conclusion should be in terms of rejecting the null.

Consider the following table. The table shows the decision/conclusion of the hypothesis test and the unknown "reality", or truth. We do not know if the null is true or if it is false. If the null is false and we reject it, then we made the correct decision. If the null hypothesis is true and we fail to reject it, then we made the correct decision.

| Decision | Reality: \(H_0\) is true | Reality: \(H_0\) is false |
|---|---|---|
| Reject \(H_0\) (conclude \(H_a\)) |   | Correct decision |
| Fail to reject \(H_0\) | Correct decision |   |

So what happens when we do not make the correct decision?

When doing hypothesis testing, two types of mistakes may be made, and we call them Type I error and Type II error. If we reject the null hypothesis when it is true, then we have made a Type I error. If the null hypothesis is false and we fail to reject it, we have made a Type II error.

| Decision | Reality: \(H_0\) is true | Reality: \(H_0\) is false |
|---|---|---|
| Reject \(H_0\) (conclude \(H_a\)) | Type I error | Correct decision |
| Fail to reject \(H_0\) | Correct decision | Type II error |

Types of errors

Type I error: Rejecting the null hypothesis when the null hypothesis is true.

Type II error: Failing to reject the null hypothesis when the null hypothesis is false.

The “reality”, or truth, about the null hypothesis is unknown and therefore we do not know if we have made the correct decision or if we committed an error. We can, however, define the likelihood of these events.

\(\alpha\): The probability of committing a Type I error. Also known as the significance level.

\(\beta\): The probability of committing a Type II error.

Power: The probability that the null hypothesis is rejected given that it is false (i.e. \(1-\beta\)).

\(\alpha\) and \(\beta\) are probabilities of committing an error, so we want these values to be small. However, we cannot decrease both at once: for a fixed sample size, as \(\alpha\) decreases, \(\beta\) increases.

Note! Type I error is also thought of as the event that we reject the null hypothesis GIVEN the null is true. In other words, Type I error is a conditional event and \(\alpha\) is a conditional probability. The same idea applies to Type II error and \(\beta\).

Example 6-1 Cont'd.

A man, Mr. Orangejuice, goes to trial and is tried for the murder of his ex-wife. He is either guilty or not guilty. We found before that \(H_0\colon\) Mr. Orangejuice is innocent and \(H_a\colon\) Mr. Orangejuice is guilty.

Interpret Type I error, \(\alpha \), Type II error, \(\beta \).

Type I Error: A Type I error is committed if we reject \(H_0\) when it is true. In other words, the man is innocent but found guilty.

\(\alpha\): \(\alpha\) is the probability of a Type I error, i.e. the probability that Mr. Orangejuice is innocent but found guilty.

Type II Error: A Type II error is committed if we fail to reject \(H_0\) when it is false. In other words, the man is guilty but found not guilty.

\(\beta\): \(\beta\) is the probability of a Type II error, i.e. the probability that Mr. Orangejuice is guilty but found not guilty.

As you can see here, the Type I error (putting an innocent man in jail) is the more serious error. Ethically, it is more serious to put an innocent man in jail than to let a guilty man go free. So to minimize the probability of a type I error we would choose a smaller significance level.

Try it!

An inspector has to choose between certifying a building as safe or saying that the building is not safe. There are two hypotheses:

  1. Building is safe
  2. Building is not safe

Set up the null and alternative hypotheses. Interpret Type I and Type II error.

\( H_0\colon\) Building is not safe vs \(H_a\colon \) Building is safe

| Decision | Reality: \(H_0\) is true | Reality: \(H_0\) is false |
|---|---|---|
| Reject \(H_0\) (conclude \(H_a\)) | Rejecting "building is not safe" when it is not safe (Type I error) | Correct decision |
| Fail to reject \(H_0\) | Correct decision | Failing to reject "building is not safe" when it is safe (Type II error) |

Power and \(\beta \) are complements of each other. Therefore, they have an inverse relationship, i.e. as one increases, the other decreases.

It makes sense to set up \(H_0\) and \(H_a\) as above (that is, assume the building is not safe until proven otherwise). If we switched them (\(H_0\): building is safe vs. \(H_a\): building is not safe) and failed to reject \(H_0\), we could not conclude that the building is safe, because we can only fail to reject \(H_0\); we cannot accept it.

6a.2 - Steps for Hypothesis Tests


The Logic of Hypothesis Testing

A hypothesis, in statistics, is a statement about a population parameter, where this statement typically is represented by some specific numerical value. In testing a hypothesis, we gather sample data and then evaluate the evidence the data provide about the hypothesis.

How do we decide whether to reject the null hypothesis?

Six Steps for Hypothesis Tests

In hypothesis testing, there are certain steps one must follow. They are summarized below into six steps for conducting a hypothesis test.

  1. Set up the hypotheses and check conditions: Each hypothesis test includes two hypotheses about the population. One is the null hypothesis, notated as \(H_0 \), which is a statement of a particular parameter value. This hypothesis is assumed to be true until there is evidence to suggest otherwise. The second hypothesis is called the alternative, or research, hypothesis, notated as \(H_a \). The alternative hypothesis is a statement of a range of alternative values in which the parameter may fall. One must also check that any conditions (assumptions) needed to run the test have been satisfied, e.g. normality of the data, independence, and the number of success and failure outcomes.
  2. Decide on the significance level, \(\alpha \): This value is used as a probability cutoff for making decisions about the null hypothesis. The alpha value represents the probability we are willing to accept of making an incorrect decision to reject the null hypothesis. The most common \(\alpha \) value is 0.05 (5%). Other popular choices are 0.01 (1%) and 0.1 (10%).
  3. Calculate the test statistic: Gather sample data and calculate a test statistic in which the sample statistic is compared to the parameter value. The test statistic is calculated under the assumption that the null hypothesis is true, and it incorporates a measure of standard error and assumptions (conditions) related to the sampling distribution.
  4. Calculate the probability value (p-value), or find the rejection region: A p-value is found by using the test statistic to calculate the probability of obtaining the observed sample data, or data more extreme, if the null hypothesis is true. The rejection region is found by using alpha to find a critical value; the rejection region is the area that is more extreme than the critical value. We discuss the p-value and rejection region in more detail in the next section.
  5. Make a decision about the null hypothesis: In this step, we decide either to reject the null hypothesis or to fail to reject the null hypothesis. Notice we never decide to accept the null hypothesis.
  6. State an overall conclusion: Once we have found the p-value or rejection region, and made a statistical decision about the null hypothesis (i.e. we reject the null or fail to reject the null), we summarize the result into an overall conclusion for our test.

We will follow these six steps for the remainder of this Lesson. In the future Lessons, the steps will be followed but may not be explained explicitly.
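
As a preview, here is a minimal sketch (assuming Python with SciPy) that organizes the six steps into a single function for a one-sample proportion test; the test statistic it uses is the one-proportion z-statistic developed later in Section 6a.4, and the function name and arguments are illustrative only.

```python
from math import sqrt
from scipy.stats import norm

def one_prop_z_test(x, n, p0, alpha=0.05, alternative="two-sided"):
    """Sketch of the six hypothesis-testing steps for one proportion.

    x: number of successes, n: sample size, p0: hypothesized proportion,
    alternative: one of "two-sided", "greater", "less".
    """
    # Step 1: hypotheses are H0: p = p0 vs. Ha given by 'alternative';
    # check the conditions n*p0 > 5 and n*(1 - p0) > 5.
    if min(n * p0, n * (1 - p0)) <= 5:
        raise ValueError("normal approximation may not be appropriate")

    # Step 2: the significance level alpha is passed in (default 0.05).

    # Step 3: test statistic, using the standard error under H0.
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

    # Step 4: p-value for the chosen alternative hypothesis.
    if alternative == "greater":
        p_value = norm.sf(z)            # P(Z > z)
    elif alternative == "less":
        p_value = norm.cdf(z)           # P(Z < z)
    else:
        p_value = 2 * norm.sf(abs(z))   # P(|Z| > |z|)

    # Steps 5 and 6: decision, which the caller turns into a conclusion.
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return z, p_value, decision

# Example 6-4 below: 278 of 500 sampled Penn State students are from PA.
print(one_prop_z_test(278, 500, 0.5, alternative="greater"))
```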

Step 1 is a very important step to set up correctly. If your hypotheses are incorrect, your conclusion will be incorrect. In this next section, we practice with Step 1 for the one sample situations.

6a.3 - Set-Up for One-Sample Hypotheses


We will continue our discussion by considering two specific hypothesis tests: a test of one proportion, and a test of one mean. We will provide the general set up of the hypothesis and the test statistics for both tests. From there, we will branch off into specific discussions on each of these tests.

In order to make a judgment about the value of a parameter, the problem can be set up as a hypothesis testing problem. We usually set the hypothesis that one wants to conclude as the alternative hypothesis, also called the research hypothesis.

Since hypothesis tests are about a parameter value, the hypotheses use parameter notation - \(p \) for proportion or \(\mu \) for mean - in their arrangement. For tests of a proportion or a test of a mean, we would choose the appropriate alternative based on our research question.

Below are the possible hypotheses from which we would select only one of them based on the research question. The symbols \(p_0 \) and \(\mu_0 \) are used in these general statements and in practice, get replaced by the parameter value, or constant, being tested.

One Sample Proportion

| Research Question | Is the population proportion different from \(p_0\)? | Is the population proportion greater than \(p_0\)? | Is the population proportion less than \(p_0\)? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(p=p_0\) | \(p=p_0\) | \(p=p_0\) |
| Alternative Hypothesis, \(H_a\) | \(p\neq p_0\) | \(p>p_0\) | \(p<p_0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

*\(p_0\) is the hypothesized population proportion

One Sample Mean

| Research Question | Is the population mean different from \(\mu_0\)? | Is the population mean greater than \(\mu_0\)? | Is the population mean less than \(\mu_0\)? |
|---|---|---|---|
| Null Hypothesis, \(H_0\) | \(\mu=\mu_0\) | \(\mu=\mu_0\) | \(\mu=\mu_0\) |
| Alternative Hypothesis, \(H_a\) | \(\mu\neq \mu_0\) | \(\mu>\mu_0\) | \(\mu<\mu_0\) |
| Type of Hypothesis Test | Two-tailed, non-directional | Right-tailed, directional | Left-tailed, directional |

*\(\mu_0\) is the hypothesized population mean

The most important step in hypothesis testing is choosing the correct parameter of interest and correctly setting up the alternative hypothesis.

Example 6-3 Null and Alternative Hypotheses

In each of the following scenarios, determine the parameter of interest and the null and alternative hypotheses.

When debating the State Appropriation for Penn State, the following question is asked: "Are the majority of students at Penn State from Pennsylvania?"

The response variable is 'State' and is a qualitative variable. Therefore, the parameter of interest would be \(p \) the population proportion of students from PA. The hypotheses should be in terms of \(p \). The value we are testing is the “majority” (50%) or \(p_0=0.5 \). The majority also implies greater than 50%. Thus, the hypothesis set up would be a right-tailed test. \( H_0\colon p=0.5 \) vs. \(H_a\colon p>0.5 \)

A consumer test agency wants to see whether the mean lifetime of a brand of tires is less than 42,000 miles. The tire manufacturer advertises that the average lifetime is at least 42,000 miles.

The response variable here is 'lifetime' and is a quantitative variable. Therefore, set up the hypotheses in terms of \(\mu \). Here the value of \(\mu_0 \) is 42,000. With the consumer test agency wanting to research that the mean lifetime is below 42,000, we would set up the hypotheses as a left-tailed test: \( H_0\colon \mu=42000 \) vs. \(H_a\colon \mu<42000 \)

The length of a certain lumber from a national home building store is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet.

The response variable is the 'length of lumber' and is quantitative. Therefore, we set up the hypotheses in terms of \(\mu \). Here the value of \(\mu_0 \) is 8.5. With the builder wanting to check if the mean length is different from 8.5, she would set up the hypotheses as a two-tailed test: \( H_0\colon \mu=8.5 \) vs \(H_a\colon \mu\ne 8.5 \)

A political news company believes the national approval rating for the current president has fallen below 40%.

The response variable here is 'approval rating' and is a qualitative variable. Therefore, we will set up the hypothesis in terms of \(p \). In this case, the \(p_0 \) value is 0.4 and the hypotheses would be set up as a left-tailed test: \( H_0\colon p=0.4 \) vs. \(H_a\colon p<0.4 \)

6a.4 - Hypothesis Test for One-Sample Proportion


Overview

In this section, we will demonstrate how we use the sampling distribution of the sample proportion to perform the hypothesis test for one proportion.

Recall that if \(np \) and \(n(1-p) \) are both greater than five, then the sample proportion, \(\hat{p}\), will have an approximate normal distribution with mean \(p \), standard error \(\sqrt{\frac{p(1-p)}{n}}\), and estimated standard error \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).

In hypothesis testing, we assume the null hypothesis is true. Remember, we set up the null hypothesis as \(H_0\colon p=p_0 \). This is very important! This statement says that we are assuming the unknown population proportion, \(p \), is equal to the value \(p_0 \).

Since this is true, we can follow the same logic above. Therefore, if \(np_0 \) and \(n(1-p_0) \) are both greater than five, then the sampling distribution of the sample proportion will be approximately normal with mean \(p_0 \) and standard error \(\sqrt{\frac{p_0(1-p_0)}{n}}\).

We can find probabilities associated with values of \(\hat{p}\) by using the test statistic:

\[z^*=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]

Example 6-4

Referring back to a previous example, say we take a random sample of 500 Penn State students and find that 278 are from Pennsylvania. Can we conclude that the proportion is larger than 0.5?

Is 0.556 (\(=278/500\)) much bigger than 0.5? What counts as "much bigger"?

This depends on the standard deviation of \(\hat{p}\) under the null hypothesis.

The standard deviation of \(\hat{p}\), if the null hypothesis is true (i.e. when \(p_0=0.5\)), is:

\[\sqrt{\frac{p_0(1-p_0)}{n}}=\sqrt{\frac{0.5(1-0.5)}{500}}\approx 0.0224\]

We can compare the difference \(\hat{p}-p_0\) to this standard deviation by taking the ratio:

\[z^*=\frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}=\frac{0.556-0.5}{\sqrt{\frac{0.5(1-0.5)}{500}}}\approx 2.504\]

Therefore, assuming the true population proportion is 0.5, a sample proportion of 0.556 is 2.504 standard deviations above the mean.
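
To make the arithmetic concrete, here is a small sketch (assuming Python) of the calculation above:

```python
from math import sqrt

p_hat = 278 / 500               # sample proportion = 0.556
p0, n = 0.5, 500                # hypothesized proportion and sample size

se_null = sqrt(p0 * (1 - p0) / n)    # standard deviation of p-hat under H0
z_star = (p_hat - p0) / se_null      # test statistic

print(round(se_null, 4), round(z_star, 3))   # 0.0224 2.504
```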

The \(z^*\) value we found in the above example is referred to as the test statistic.

Test Statistic: The sample statistic one uses to either reject \(H_0 \) (and conclude \(H_a \)) or fail to reject \(H_0 \).

6a.4.1 - Making a Decision

6a.4.1 - Making a Decision

In the previous example for Penn State students, we found that, assuming the true population proportion is 0.5, a sample proportion of 0.556 is 2.504 standard deviations above the hypothesized proportion, \(p_0\).

Is it far enough away from 0.5 to suggest that there is evidence against the null? Is there a cutoff for the number of standard deviations that we would find acceptable?

What if instead of a cutoff, we found a probability? Recall the alternative hypothesis for this example was \(H_a\colon p>0.5 \). So if we found, for example, the probability of a sample proportion being 0.556 or greater, then we get \( P(Z>2.504)=0.0061 \).

This means that, if the true proportion is 0.5, the probability we would get a sample proportion of 0.556 or greater is 0.0061. Very small! But is it small enough to say there is evidence against the null?
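
For reference, this upper-tail probability can be computed directly (a sketch assuming Python with SciPy):

```python
from scipy.stats import norm

z_star = 2.504
p_value = norm.sf(z_star)    # P(Z > 2.504), the upper-tail probability
print(round(p_value, 4))     # approximately 0.0061
```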

To determine whether the probability is small or how many standard deviations are “acceptable”, we need a preset level of significance, which is the probability of a Type I error. Recall that a Type I error is the event of rejecting the null hypothesis when that null hypothesis is true. Think of finding guilty a person who is actually innocent.

When we specify our hypotheses, we should have some idea of what size of a Type I error we can tolerate. It is denoted as \(\alpha \). A conventional choice of \(\alpha \) is 0.05. Values ranging from 0.01 to 0.1 are also common and the choice of \(\alpha \) depends on the problem one is working on.

Once we have this preset level, we can determine whether or not there is significant evidence against the null. There are two methods to determine if we have enough evidence: the rejection region method and the p-value method.

Rejection Region Approach

We start the hypothesis test process by determining the null and alternative hypotheses. Then we set our significance level, \(\alpha \), which is the probability of making a Type I error. We can determine the appropriate cutoff called the critical value and find a range of values where we should reject, called the rejection region.

Critical values: The values that separate the rejection and non-rejection regions.

Rejection region: The set of values for the test statistic that leads to rejection of \(H_0 \).

The graphs below show us how to find the critical values and the rejection regions for the three different alternative hypotheses and for a set significance level, \(\alpha \). The rejection region is based on the alternative hypothesis.

The rejection region is the region where, if our test statistic falls, then we have enough evidence to reject the null hypothesis. If we consider the right-tailed test, for example, the rejection region is any value greater than \(c_{\alpha}\), where \(c_{\alpha}\) is the critical value.
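
As an illustration, the critical values for a chosen \(\alpha\) can be looked up from the standard normal distribution (a sketch assuming Python with SciPy):

```python
from scipy.stats import norm

alpha = 0.05
print(norm.ppf(1 - alpha))       # right-tailed: reject H0 if z* >  1.645
print(norm.ppf(alpha))           # left-tailed:  reject H0 if z* < -1.645
print(norm.ppf(1 - alpha / 2))   # two-tailed:   reject H0 if |z*| > 1.960
```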

Left-Tailed Test

Reject \(H_0\) if the test statistic is less than or equal to the critical value (\(c_\alpha\)).

[Figure: standard normal curve with the left tail, below \(c_\alpha\), shaded as the rejection region]