
Inferential Statistics

1. Concept Behind Inferential Statistics

To begin with, there are certain fundamental concepts that span this section and are used throughout statistics. Often a sample is taken; the sample is a subgroup of a larger group, the population. From the sample it is desired to learn about the population. For example, a survey is taken of 200 people living in Bangkok asking their opinion of the underground train. Results are published and comments are made about them. Is anyone truly concerned about the specific 200 people in the survey? If those 200 people do not like the underground train, does it really matter? No. In fact, well over 200 people in Bangkok have never even taken the underground train. What people really want to learn from the survey is the general opinion within Bangkok about the underground, and to do this a sample of 200 people is surveyed and asked questions. If all 200 people surveyed did not like the underground train, this is of concern only because it leads us to believe that the general populace of Bangkok does not like the underground train, and that perhaps only a small minority likes it. The sample is almost immediately, within our minds, extrapolated to the population at large; in this case the population at large is the people living in Bangkok.

Inferential statistics are used to learn about the population from the sample. Two common techniques for using a sample to learn about a population that go beyond descriptive statistics are hypothesis testing and confidence intervals. Hypothesis testing is used to test a theory. Confidence intervals are used to obtain a range of values within which the population mean, $\mu$, might plausibly lie. Technically, from a frequentist viewpoint, the population mean is either within the interval or it is not.

1.1. Hypothesis testing

In general, within hypothesis testing we wish to test a theory, a belief, or simply something of interest.
It is desired to test whether a quantity concerning the population, called a parameter, is not equal to, greater than, or less than some value. Typically the parameter is the population mean, $\mu$, or proportion, $\pi$, but not always. In hypothesis testing the theory is turned into what is called a null hypothesis, denoted $H_0$, and an alternative hypothesis, denoted $H_1$ or $H_A$. In general hypothesis testing, one may want to compare one group/sample to a specific value, say $\mu_0$. Often one may instead want to compare two groups/samples to each other, such as comparing the average salary of men, say $\mu_1$, to the average salary of women, say $\mu_2$. The alternative hypothesis is what is desired to be proved or shown true, and the null hypothesis is its opposite. Examples: If it is desired to prove the …

  • average income in Bangkok is greater than 30,000 Baht/month:
    • $H_0:$ $\mu \leq 30,000$ and $H_A:$ $\mu > 30,000$.
  • average income in Bangkok of men is greater than that of women:
    • $H_0:$ $\mu_{men} \leq \mu_{women}$ and $H_A:$ $\mu_{men} > \mu_{women}$.
  • percent of women in Hong Kong is less than 50%:
    • $H_0:$ $\pi \geq 50\%$ and $H_A:$ $\pi < 50\%$.
  • etc.
Table~h01 lists various null and alternative hypothesis combinations for one- and two-sample tests of population mean(s) and proportion(s), and how to calculate their associated p-values. Note: Table~h01 assumes the variance is known when investigating the population mean, $\mu$.

\begin{array}{|c|c|c|c|}\hline
\text{Investigate} & H_0 & H_A & \text{Calc. p-value} \\ \hline
\mu \text{ from one} & \mu = \mu_0 & \mu \neq \mu_0 & 2\times P(Z>|z|) \\
\text{group/} & \mu \geq \mu_0 & \mu < \mu_0 & P(Z < z) \\
\text{sample} & \mu \leq \mu_0 & \mu > \mu_0 & P(Z > z) \\ \hline
\pi \text{ from one} & \pi = \pi_0 & \pi \neq \pi_0 & 2\times P(Z>|z|) \\
\text{group/} & \pi \geq \pi_0 & \pi < \pi_0 & P(Z < z) \\
\text{sample} & \pi \leq \pi_0 & \pi > \pi_0 & P(Z > z) \\ \hline
\mu \text{ from two} & \mu_1 = \mu_2 & \mu_1 \neq \mu_2 & 2\times P(Z>|z|) \\
\text{groups/} & \mu_1 \geq \mu_2 & \mu_1 < \mu_2 & P(Z < z) \\
\text{samples} & \mu_1 \leq \mu_2 & \mu_1 > \mu_2 & P(Z > z) \\ \hline
\pi \text{ from two} & \pi_1 = \pi_2 & \pi_1 \neq \pi_2 & 2\times P(Z>|z|) \\
\text{groups/} & \pi_1 \geq \pi_2 & \pi_1 < \pi_2 & P(Z < z) \\
\text{samples} & \pi_1 \leq \pi_2 & \pi_1 > \pi_2 & P(Z > z) \\ \hline
\end{array}

In hypothesis testing a decision is made using what is known as a {\it p-value}. The p-value is the probability of observing what was observed, or something more extreme, assuming the null hypothesis is true. If that probability is ``very small'' the researcher rejects the null hypothesis, because we trust the data over the null hypothesis. Typically p-values less than 0.1, 0.05, or 0.01 are considered too small to be attributable to random chance, and the null hypothesis is rejected. The value below which the null hypothesis is rejected is called the {\it level of significance} and is denoted by $\alpha$. For large data sets a significance level of 0.01 is often used; in the classroom setting $\alpha=0.05$ is typical.
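As a concrete illustration of the one-sample-mean rows of Table~h01, the p-value can be computed from the z statistic $z=(\bar{x}-\mu_0)/(\sigma/\sqrt{n})$. Below is a minimal Python sketch using only the standard library; the survey numbers in it are invented for illustration.

```python
# Sketch: p-values for a one-sample z-test of the mean with known variance,
# matching the three H0/HA combinations in Table h01.
from math import sqrt
from statistics import NormalDist

def z_test_pvalue(xbar, mu0, sigma, n, alternative):
    """Return (z, p-value) for a test about the population mean mu.

    alternative: 'two-sided' (HA: mu != mu0), 'less' (HA: mu < mu0),
                 or 'greater' (HA: mu > mu0).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        p = 2 * (1 - cdf(abs(z)))   # 2 x P(Z > |z|)
    elif alternative == "less":
        p = cdf(z)                  # P(Z < z)
    else:
        p = 1 - cdf(z)              # P(Z > z)
    return z, p

# Hypothetical income survey: n = 200, sample mean 31,200 Baht,
# known sigma = 9,000 Baht, testing HA: mu > 30,000.
z, p = z_test_pvalue(xbar=31_200, mu0=30_000, sigma=9_000, n=200,
                     alternative="greater")
print(round(z, 3), round(p, 4))
```

With these invented numbers the one-sided p-value comes out below 0.05, so at the usual classroom significance level the null hypothesis would be rejected.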
\defm{ Important: If p-value $< \alpha$ then reject $H_0$. If p-value $\ge \alpha$ then fail to reject $H_0$.}

For hypothesis testing, regardless of the test chosen and the test statistic used, the steps are generally the same. This book covers only the p-value approach to hypothesis testing. Other books also cover a rejection-region approach, which is useful when a p-value cannot be calculated, for example when the researcher does not have access to a computer, as on exams. In practice, in this day and age the researcher will almost certainly have access to a computer, and almost all, if not all, statistical software calculates a p-value for hypothesis testing. For this reason only the p-value approach is covered.
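The decision rule in the box above can be expressed as a small helper function. This is an illustrative sketch; the function name and the example p-values are invented.

```python
# Sketch of the decision rule: reject H0 when p-value < alpha,
# otherwise fail to reject (never "accept") H0.
def decide(p_value, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value and alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.012))   # p-value below alpha = 0.05, so reject H0
print(decide(0.23))    # p-value above alpha = 0.05, so fail to reject H0
```

Note that the comparison is strict: a p-value exactly equal to $\alpha$ results in failing to reject $H_0$.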
  • {Steps Within Hypothesis Testing: P-value Approach
    1. Determine the null hypothesis, $H_{0}$, and the alternative hypothesis, $H_{A}$.
    2. Decide on the appropriate level of significance, $\alpha$.
    3. Determine the sample size and sampling design to use.
      • The tests in this chapter are appropriate when the data comes from a simple random sample.
      • The tests in this chapter and other statistical tests are {\bf not} appropriate when the data comes from a convenience or other type of non-probability sample.
    4. Determine the appropriate test statistic given the data and sampling design.
    5. Collect the data and calculate the appropriate test statistic.
    6. Calculate the p-value for the $H_{0}$ and $H_{A}$ combination.
    7. Make a decision whether to fail to reject $H_{0}$ or reject the $H_{0}$ by comparing the p-value to $\alpha$.
    }

After making a decision there are two possible types of error: Type I and Type II. A Type I error occurs when you reject the null hypothesis and the null hypothesis is actually true; this happens with probability $\alpha$. A Type II error occurs when you fail to reject the null hypothesis and the null hypothesis is actually false; this happens with probability $\beta$. The {\it power} of a test equals $1-\beta$, the probability of rejecting the null hypothesis when the null hypothesis is false. All possible error/no-error results of a hypothesis test are given in Table~error12.

\begin{array}{|c|c|c|}\hline
 & H_0 \text{ is true} & H_0 \text{ is false} \\ \hline
\text{Fail to reject } H_0 & P(\text{No error})=1-\alpha & P(\text{Type II error})=\beta \\ \hline
\text{Reject } H_0 & P(\text{Type I error})=\alpha & P(\text{No error})=1-\beta \\ \hline
\end{array}

\heads{Important} When a hypothesis test is performed, the result is either {\bf fail to reject} the null hypothesis or {\bf reject} the null hypothesis. {\bf Do not say ``accept''} the null hypothesis. There is a huge difference between not having enough evidence to disprove something and proving something. Rejecting $H_0$ is like disproving $H_0$, and failing to reject $H_0$ is like failing to disprove $H_0$; this is very different from saying $H_0$ is accepted or has been proved. This is a very important concept, and understanding it will help you avoid much confusion when performing hypothesis tests and working with data.

\heads{Scenario} You have a theory that on average men in Bangkok weigh more than 65 kilograms: $H_0: \mu \leq 65$ and $H_A: \mu > 65$. Data are collected, a simple random sample of size $n=100$, and the average weight from the sample is $\bar{x}=67.3$ kilograms. The statistician performs a statistical test, obtains a p-value of 0.23, and so {\bf fails to reject} $H_0$. Is he saying he believes the average male weight in Bangkok is less than or equal to 65 kilograms? No! Were he to say he accepts $H_0$, that would imply he believes the average weight is less than or equal to 65 kilograms.
What the statistician is saying is that there is not enough evidence to show the theory beyond a reasonable doubt, so he cannot reject $H_0$. This subtle difference is very important. Imagine saying to someone that the sample average is $\bar{x}=67.3$ kilograms, so you believe the population average is less than or equal to 65 kilograms; that does not make any logical sense. What is being shown by the hypothesis test is that, given the amount of data, the sample average, $\bar{x}=67.3$, and the sample standard deviation, we are not confident in saying that the population average is greater than 65 kilograms. With more information it could possibly be shown that $\mu>65$. For this reason the author tends to prefer looking at confidence intervals for a deeper understanding of what the collected data are saying.

1.2. Confidence Intervals

In general, when creating what is called a confidence interval, we wish to obtain a range of plausible values for a quantity concerning the population, a parameter. Typically the parameter is the population mean, $\mu$, or proportion, $\pi$, but not always. It is also often desired to determine a plausible range for the difference between two groups/samples, such as comparing the average salary of men, say $\mu_1$, to the average salary of women, say $\mu_2$. A $(1-\alpha) \times 100\%$ confidence interval is often explained as the probability of capturing the parameter of interest, but that is the interpretation under what is known as the Bayesian approach: Bayesians consider the parameter of interest a random variable. The author is a frequentist and considers the parameter to be an unknown constant. Under the frequentist approach, a $(1-\alpha) \times 100\%$ confidence level is the percentage of confidence intervals that are expected to contain the true value of the parameter of interest, assuming an infinite number of samples of the same size, each a simple random sample.
Of course, in practice only a single sample is taken. The confidence interval is thus often regarded as the range of plausible values for the parameter; in reality, however, the parameter's value is unknown, and it may or may not lie within the interval.
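The frequentist interpretation above can be demonstrated by simulation: draw many simple random samples from a population whose mean is known, build a 95% confidence interval from each, and count how often the intervals contain the true mean. The following Python sketch does this using only the standard library; the population parameters ($\mu = 65$, $\sigma = 12$) are invented for illustration.

```python
# Sketch: frequentist coverage of a 95% z-based confidence interval for the
# mean with known sigma. Roughly 95% of the simulated intervals should
# contain the true population mean.
import random
from math import sqrt
from statistics import NormalDist, mean

def z_confidence_interval(xbar, sigma, n, conf=0.95):
    """(1 - alpha) x 100% CI for mu with known sigma: xbar +/- z * sigma/sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # about 1.96 for conf = 0.95
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

random.seed(0)                   # fixed seed so the run is reproducible
mu, sigma, n = 65, 12, 100       # hypothetical population and sample size
trials = 2000
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    lo, hi = z_confidence_interval(mean(sample), sigma, n)
    if lo <= mu <= hi:
        hits += 1
print(hits / trials)             # should be close to 0.95
```

Each individual interval either contains $\mu$ or it does not; it is only across repeated sampling that the 95% figure has meaning, which is exactly the frequentist point made above.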