
Random Variables

Understanding random variables is essential to a deeper understanding of statistics. This section covers key terminology related to random variables.

\defs{Definitions}

  • {\bf Random variable:}
    A quantity whose value is determined by the outcome of a random process and thus cannot be predicted with certainty in advance. Random variables are often denoted by capital letters, and the possible values they can take by lowercase letters.
    1. {\bf Categorical random variable:} A random variable that results in a categorical (non-numeric) response, such as gender (male or female) or opinion (strongly disagree, disagree, ..., strongly agree).
      • {\bf Dummy coding:} Dummy coding turns a variable with two or more outcomes into one or more variables whose only possible values are 0 and 1. Categorical variables are often dummy coded for
        analysis purposes. For example, males might be assigned the value 0 and females the value 1. If there are several categories, several dummy variables are needed to capture all the information. The dummy coded data can then be treated as numerical random variables.
    2. {\bf Numerical random variable:} A random variable that results in a numerical response. Examples include the height, weight, age, or income of a randomly selected individual.
      1. {\bf Discrete random variable:} Takes countable values, typically integers, such as the number of heads observed when flipping a coin four times, $x=0,1,2,3,$ or $4$. For an example, see Table~\ref{contdisc1}.
      2. {\bf Continuous random variable:} Takes values on a continuous scale,
        such as income. For an example, see Table~\ref{contdisc1}.
  • {\bf Cumulative distribution function (c.d.f.):}
    The probability that the random variable takes a value less than or equal to $x$, that is $P(X \leq x)$, where $X$ is a random variable and $x$ is a real number.
    The c.d.f. is often denoted with a capital $F$ as $F(x)$, i.e. $F(x)=P(X \leq x)$.
  • {\bf Probability distribution function (p.d.f.):}
    1. For a discrete random variable it is the probability of a particular value occurring, $f(x)=P(X=x)$.
      • The probability distribution function has the following properties:
        1. $f(x_i) \geq 0, \quad \forall i.$
        2. $\sum_{\forall i} f(x_i)=1$
    2. For a continuous random variable, $P(X=x)=0$ for every value $x$, so the definition differs from the discrete case. The p.d.f. of a continuous random variable is a curve described by a function $f(x)$. The area under the curve over a given interval is the probability that the random variable falls within that interval.
      • The probability distribution function has the following properties:
        1. $f(x) \geq 0$
        2. $\int_{-\infty}^{\infty}{f(x)dx}=1$
        3. $F(b)-F(a)=P(a\leq X\leq b) = \int_{a}^{b}{f(x)dx}$, which is the area under the curve $f(x)$ from $a$ to $b$, $a\leq b$.
      • Note: $P(X=b)=F(b)-F(b)=\int_{b}^{b}{f(x)dx}=0$; that is, the probability of a continuous random variable equaling a specific constant, say $b$, is zero.
  • {\bf Expectation} of a random variable $X$ is its mean value over the sample space (or population) of possible outcomes, with each value weighted by its probability. {\em Expected value} can also be interpreted as the mean value that would be obtained from an infinite number of observations of the random variable.
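
As a worked illustration of the definitions above, assume a fair coin flipped four times with independent flips (fairness and independence are assumptions made for this sketch, not stated above), and let $X$ be the number of heads. Under these assumptions the probability distribution function is
\[f(x)=P(X=x)=\binom{4}{x}\left(\frac{1}{2}\right)^{4}, \quad x=0,1,2,3,4,\]
so that $f(0)=f(4)=\frac{1}{16}$, $f(1)=f(3)=\frac{4}{16}$ and $f(2)=\frac{6}{16}$. These values are non-negative and sum to one. The c.d.f. evaluated at $x=2$ is
\[F(2)=P(X\leq 2)=\frac{1}{16}+\frac{4}{16}+\frac{6}{16}=\frac{11}{16},\]
and the expectation, computed as a probability-weighted mean, is
\[E(X)=\sum_{x=0}^{4} x\,f(x)=\frac{0\cdot 1+1\cdot 4+2\cdot 6+3\cdot 4+4\cdot 1}{16}=2.\]

For a continuous counterpart, consider a hypothetical random variable $Y$ that is uniformly distributed on $[0,1]$ (again an assumption chosen only for illustration), so that $f(y)=1$ for $0\leq y\leq 1$ and $f(y)=0$ otherwise. Then $F(y)=y$ on $[0,1]$ and
\[P(0.2\leq Y\leq 0.5)=F(0.5)-F(0.2)=\int_{0.2}^{0.5}{1\,dy}=0.3,\]
while $P(Y=0.5)=0$, in line with the note on continuous random variables.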

\begin{table}
\centering
\begin{tabular}{|c|c|}\hline
Discrete & Continuous\\\hline
0& 736.1918273\\
1& 759.5668806\\
2& 812.7593044\\
3& 562.2359305\\
4& 798.2952718\\\hline
\end{tabular}
\caption{Example of Discrete and Continuous Data}
\label{contdisc1}
\end{table}

\defl{Examples of Categorical, Continuous and Discrete Data.}

  • Categorical:
    1. Gender
    2. Blood Type
    3. Marital Status
    4. Eye Color
    5. Political Party
  • Discrete:
    1. Number of people using the ATM at a certain location within the past hour.
    2. Number of brothers or sisters a person has.
    3. Number of times a person won at roulette within the past 20 spins.
  • Continuous:
    1. Income
    2. Age
    3. Height
    4. Weight
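
To sketch how dummy coding might apply to a categorical variable with more than two categories, consider blood type (A, B, AB, O). One common convention, assumed here, uses one fewer dummy variable than there are categories, with the omitted category (O in this sketch, an arbitrary choice) serving as the reference. The variable names $D_A$, $D_B$ and $D_{AB}$ are hypothetical.

\begin{center}
\begin{tabular}{|c|c|c|c|}\hline
Blood type & $D_A$ & $D_B$ & $D_{AB}$\\\hline
A & 1 & 0 & 0\\
B & 0 & 1 & 0\\
AB & 0 & 0 & 1\\
O & 0 & 0 & 0\\\hline
\end{tabular}
\end{center}

Each blood type maps to a distinct pattern of 0s and 1s, so the three dummy variables together carry all the information in the original variable.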

Binomial

\defl{Binomial Distribution has the following properties:} There is a fixed number of trials or observations, $n$, determined in advance. Each trial can take on one of two possible outcomes, labeled ``success'' and ``failure''. Each trial's outcome is determined independently of all the other trials. The probability of a success and that of a failure remain the same from …

Exponential

Exponential Distribution has the following properties: Equals the distance between successive occurrences or arrivals of a Poisson process with mean $\lambda > 0$. $\lambda$ is the average number of occurrences or arrivals per unit of time (length, space, etc.). $\frac{1}{\lambda}$ is the average time between occurrences or arrivals. \defl{Exponential Distribution:} \[f(x) = \lambda e^{-{\lambda}x} \] \[F(x) = …

Hypergeometric

\defl{Hypergeometric distribution has the following properties:} Units are selected from a finite population without replacement, and the population consists of successes and failures. The major difference between the Hypergeometric distribution and the Binomial distribution is that the probability of selecting a success is {\bf not constant and is not independent} from draw to draw. \defm{Hypergeometric …

Normal

\defl{Normal Distribution has the following properties:} Symmetrical with a bell-shaped appearance. The population mean and median are equal. An infinite range, $-\infty < x < \infty$. The approximate probability for certain ranges of $X$-values: $P(\mu - 1\sigma < X < \mu + 1\sigma) \approx 68\%$ $P(\mu - 2\sigma < X < \mu + 2\sigma) …

Poisson

\[P(X=x)=f(x) = \frac{e^{-\lambda}\lambda^{x}}{x!} \] $X$ is Poisson distributed. $x$ equals the number of successes in the interval, $x = 0,1,2,\ldots$ $0