# Get the Knowledge that sets you free...Science and Math for K8 to K12 students

## Statistical Inference

Lifetime of batteries (an example of a point estimate): Suppose, for example, that the parameter of interest is μ, the true average lifetime of batteries of a certain type. A random sample of 3 batteries might yield observed lifetimes (in hours) x₁ = 6.2, x₂ = 5.4 and x₃ = 5.0. The computed value of the sample mean lifetime is x̄ = 5.53. It is reasonable to regard 5.53 as a very plausible value of μ, our "best guess" for the value of μ based on the available sample information.
Futaleufu River: The Futaleufu is well known for its deep blue, glacier-fed waters and its white water currents. It is often counted among the world's leading white water rivers and is also famous for fly fishing. The region's main income comes from tourism, white water rafting and fly fishing.
### Estimation & t-Distribution

Introduction: In statistical inference, we begin our formal study by introducing the t-distribution and discussing how to estimate a population parameter. We will learn about confidence intervals, a way of identifying a range of values that we think is likely to contain our parameter of interest. We will develop intervals to estimate a single population mean, a single population proportion, the difference between two population means and the difference between two population proportions.

Estimation: Estimation in statistics refers to the process by which one makes inferences about a population based on information obtained from a sample. Here, we use a statistic [a value that describes a sample] as an estimate of a parameter [a value that describes a population]. An estimate of a population parameter may be expressed in two ways: as a point estimate or as an interval estimate.

• Point estimate: A point estimate is a single value of a statistic that serves as the best estimate of an unknown population parameter. For example, the sample mean x̄ is a point estimate of the population mean μ. Similarly, the sample proportion is a point estimate of the population proportion p.
• Interval estimate: An interval estimate provides an interval of values within which a population parameter is said to lie. If 'a < μ < b' is an interval estimate of the population mean μ, then it indicates that the population mean is greater than the value 'a' but less than the value 'b'.
• Example: In a study undertaken on the Futaleufu River, the mean stream velocity based on a sample of 25 measurements was found to be 12.5 m/sec. Therefore, 12.5 m/sec is our best estimate of the true mean stream velocity for the river. Imagine that we place an interval of ±2 m/sec around our estimate of mean velocity. This interval indicates that we believe the true mean stream velocity lies somewhere between 10.5 m/sec and 14.5 m/sec.
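
As a small illustration of the two kinds of estimate, the sketch below computes a point estimate from the three battery lifetimes quoted earlier and then forms the river-velocity interval from the example. The variable names are illustrative, and the 2 m/sec half-width is simply the value assumed in the example, not one derived from data.

```python
from statistics import mean

# Point estimate: the sample mean of the three observed battery lifetimes (hours).
lifetimes = [6.2, 5.4, 5.0]
point_estimate = mean(lifetimes)
print(f"Point estimate of mu: {point_estimate:.2f} hours")

# Interval estimate: the river example, a sample mean of 12.5 m/sec
# with an assumed half-width of 2 m/sec on either side.
velocity_estimate = 12.5
half_width = 2.0
interval = (velocity_estimate - half_width, velocity_estimate + half_width)
print(f"Interval estimate of mean velocity: {interval[0]} to {interval[1]} m/sec")
```

The point estimate is a single number, while the interval estimate is a pair of endpoints built around such a number.
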
### t-Distribution

The t-distribution is used to estimate population parameters when the sample size is small and/or when the population standard deviation is unknown. It is a continuous probability distribution, also known as Student's t-distribution.

According to the central limit theorem, when the sample size (n) is large enough [a common rule of thumb is n ≥ 30], the sampling distribution of the sample mean will be approximately normal. Thus, when we know the standard deviation of the population, we can compute a z-score and use the normal distribution to evaluate probabilities for the sample mean.

But sample sizes (n) are sometimes small, and often we do not know the standard deviation of the population. When either of these problems occurs, we estimate the standard deviation from the sample and rely on the t-distribution, which is similar in many respects to the normal distribution. The formula for calculating the t-statistic (or t-score) is: $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$, with $df = n - 1$.

In the above equation, $\bar{x}$ is the sample mean, $\mu$ is the population mean, $s$ is the sample standard deviation, $n$ is the sample size, $s / \sqrt{n}$ is the standard error [the standard error is the estimated standard deviation of the sample mean, obtained by using the sample standard deviation in place of the unknown population standard deviation] and 'df' is the degrees of freedom.
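
The t-statistic described above can be computed directly from a sample. The sketch below is a minimal illustration using the three battery lifetimes from earlier; the function name `t_statistic` and the hypothesized population mean of 5.0 hours are assumptions made only for this example.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """Return (t, df) for a sample and a hypothesized population mean mu."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)              # sample standard deviation (divides by n - 1)
    standard_error = s / sqrt(n)   # estimated standard deviation of the sample mean
    return (x_bar - mu) / standard_error, n - 1

lifetimes = [6.2, 5.4, 5.0]
t, df = t_statistic(lifetimes, mu=5.0)   # mu = 5.0 is an assumed value
print(f"t = {t:.3f}, df = {df}")
```

Note that `statistics.stdev` divides by n − 1, which matches the degrees of freedom used with the t-distribution.
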

• Degrees of freedom: There are many different t-distributions. The particular form of the t-distribution is determined by its degrees of freedom. The degrees of freedom refers to the number of independent observations in a set of data. When estimating a mean from a single sample of size 'n', the number of independent observations is equal to the sample size minus one, that is, degrees of freedom = n – 1. We will use the symbol t(k) to identify the t-distribution with k degrees of freedom.

The t-statistic (or t-score) follows a t-distribution provided that (i) the population from which the sample was drawn is approximately normal or the sample size is large enough, that is, n ≥ 30, and (ii) the sample drawn from the population is a simple random sample (SRS).

When the sample size (n) increases, the degrees of freedom (df) also increase. As the degrees of freedom increase, the t-distribution gets closer to the normal distribution.
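
One way to see this convergence numerically is to compare the height of the t density at zero with that of the standard normal density for several degrees of freedom. The sketch below uses the textbook density formula of the t-distribution; the function names are illustrative, and the particular degrees of freedom are arbitrary choices.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, k):
    """Density of the t-distribution with k degrees of freedom at x."""
    log_const = lgamma((k + 1) / 2) - lgamma(k / 2) - 0.5 * log(k * pi)
    return exp(log_const - (k + 1) / 2 * log(1 + x * x / k))

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return exp(-x * x / 2) / sqrt(2 * pi)

for k in (1, 5, 30, 1000):
    print(f"df = {k:>4}: t density at 0 = {t_pdf(0.0, k):.4f}")
print(f"standard normal density at 0 = {normal_pdf(0.0):.4f}")
```

With few degrees of freedom the t density is lower at the centre (and so heavier in the tails); the gap shrinks as df grows.
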

t-table: The table used for t-scores is set up differently from the table used for z-scores. In standard normal tables, the marginal entries are z-scores and the table entries are the corresponding areas under the normal curve to the left of z. In the t-table, the left-hand column gives the degrees of freedom, the top margin gives upper-tail probabilities and the table entries are the corresponding critical values of 't' required to achieve that probability. Here, we will use t* (or z*) to indicate critical values.
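
The layout of a t-table can be reproduced with a short sketch. Rather than copying values from a printed table, the code below recomputes them: it integrates the t density with Simpson's rule and searches for the critical value by bisection. This numerical approach and the function names are choices made for this illustration only; statistical software would normally provide these values directly.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, k):
    """Density of the t-distribution with k degrees of freedom at x."""
    log_const = lgamma((k + 1) / 2) - lgamma(k / 2) - 0.5 * log(k * pi)
    return exp(log_const - (k + 1) / 2 * log(1 + x * x / k))

def upper_tail(t, k, steps=400):
    """P(T > t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, k) + t_pdf(t, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 - total * h / 3

def t_critical(tail_prob, k):
    """Find t* with P(T > t*) = tail_prob by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if upper_tail(mid, k) > tail_prob:
            lo = mid      # tail still too large: t* lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# Rows: degrees of freedom. Columns: upper-tail probabilities.
tail_probs = (0.10, 0.05, 0.025)
print("df   " + "".join(f"{p:>9}" for p in tail_probs))
for k in (1, 2, 5, 10, 24):
    row = "".join(f"{t_critical(p, k):9.3f}" for p in tail_probs)
    print(f"{k:<5}{row}")
```

Each printed row corresponds to one line of a t-table: reading across gives the critical value t* for a smaller and smaller upper-tail area.
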

### Confidence intervals

A confidence interval is a type of interval estimate that expresses both the precision and the uncertainty associated with a particular sampling method. It consists of three parts: a confidence level [which describes the uncertainty of the sampling method], a sample statistic, and a margin of error [which describes the precision of the sampling method].

• Confidence level (C): A confidence level refers to the percentage of all possible samples whose confidence intervals can be expected to include the true population parameter. Suppose, for example, that all possible samples were selected from the same population and a confidence interval were computed for each sample. A 90% confidence level means that 90% of those confidence intervals would include the true population parameter.
• Statistic: A value that describes a sample is called a statistic, and a value that describes a population is called a parameter. In inferential statistics, we use statistics to estimate parameters.
• Margin of error: It expresses the maximum expected difference between the true population parameter and a sample estimate of that parameter. It is composed of two parts: the critical value and the standard error [the standard error is the estimate of a statistic's standard deviation obtained when the sample standard deviation is used in place of the unknown population standard deviation].
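
The meaning of a confidence level can be checked by simulation. The sketch below draws many samples from a population whose mean and standard deviation are known, builds a 90% interval from each, and counts how often the interval contains the true mean. The population values, sample size and number of trials are assumptions chosen only for illustration, and because the population standard deviation is treated as known here, the critical value is z* rather than t*.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                      # fixed seed so the run is repeatable

MU, SIGMA = 12.5, 3.0               # assumed "true" population values
N, TRIALS, LEVEL = 25, 2000, 0.90

z_star = NormalDist().inv_cdf(1 - (1 - LEVEL) / 2)   # critical value for 90%
margin = z_star * SIGMA / sqrt(N)                    # margin of error

covered = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    if x_bar - margin <= MU <= x_bar + margin:
        covered += 1

coverage = covered / TRIALS
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The observed fraction should land close to the chosen confidence level, though it will vary a little from run to run with a different seed.
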

Constructing a confidence interval: There are four steps to constructing a confidence interval.

• Step 1: Identify a sample statistic. Choose the statistic [Example: sample mean, sample proportion] that we will use to estimate a population parameter.
• Step 2: Select a confidence level, which describes the uncertainty of the sampling method. Often, researchers choose 90%, 95% or 99% confidence levels, but any percentage can be used.
• Step 3: Find the margin of error. Often we will need to compute the margin of error ourselves, using one of the following two equations:
• Margin of error = Critical value × Standard deviation of the statistic [when the population standard deviation is known]
• Margin of error = Critical value × Standard error of the statistic [when it must be estimated from the sample]
• Step 4: Specify the confidence interval. The uncertainty is denoted by the confidence level, and the range of the confidence interval is defined by the following equation: Confidence interval = sample statistic ± margin of error.
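
The four steps can be sketched in a few lines, here applied to the three battery lifetimes quoted earlier. The 95% confidence level is an assumption made for this example, the critical value 4.303 is the standard t-table entry for 2 degrees of freedom and an upper-tail probability of 0.025, and the calculation presumes that battery lifetimes are approximately normally distributed, which a sample of three cannot confirm.

```python
from math import sqrt
from statistics import mean, stdev

lifetimes = [6.2, 5.4, 5.0]          # battery lifetimes in hours

# Step 1: the sample statistic is the sample mean.
x_bar = mean(lifetimes)

# Step 2: confidence level (assumed here: 95%).
confidence_level = 0.95

# Step 3: margin of error = critical value * standard error.
n = len(lifetimes)
t_star = 4.303                       # t-table value for df = 2, upper tail 0.025
standard_error = stdev(lifetimes) / sqrt(n)
margin_of_error = t_star * standard_error

# Step 4: confidence interval = sample statistic ± margin of error.
lower, upper = x_bar - margin_of_error, x_bar + margin_of_error
print(f"{confidence_level:.0%} CI for mu: ({lower:.2f}, {upper:.2f}) hours")
```

With only three observations the critical value is large, so the resulting interval is wide; a larger sample would shrink both t* and the standard error.
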