Statistics – Full Detailed Notes
Descriptive statistics is the branch of statistics that deals with summarizing, organizing, and presenting data in a clear and understandable way, using numerical measures and graphical techniques.
1.1 Types of Measurement Scales
1.1.1 Nominal Scale
Nominal scale classifies data into categories without any order or magnitude. Only equality comparisons are meaningful.
- Qualitative classification
- No arithmetic operations possible
1.1.2 Ordinal Scale
Ordinal scale allows ranking or ordering of data, but differences between ranks are not meaningful.
- Relative position matters
- Intervals between ranks not equal
1.1.3 Interval Scale
Interval scale has ordered categories with meaningful differences, but no true zero point. Ratios are not meaningful.
- Equal intervals between values
- No absolute zero
\[ \text{Difference is meaningful: } (30^\circ C - 20^\circ C) = 10^\circ C \]
1.1.4 Ratio Scale
Ratio scale has all properties of interval scale plus a true zero, allowing meaningful ratios.
- Equal intervals
- True zero point
- Ratios are meaningful
\[ \text{Ratio is meaningful: } \frac{40}{20} = 2 \]
1.2 Measures of Central Tendency
Measures of Central Tendency are statistical tools used to identify the central point or typical value in a dataset. They provide a single representative figure that summarizes the entire data, making it easier to understand and compare.
1.2.1 Arithmetic Mean (AM)
Definition
The arithmetic mean is the sum of observations divided by the number of observations.
\[ \bar{x} = \frac{\sum_{i=1}^n x_i}{n}\]
Grouped Data Formula
\[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i}\]
Properties
- Sum of deviations from mean is zero: \(\sum (x_i-\bar{x})=0\).
- Minimizes sum of squared deviations.
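These formulas and properties can be checked with a minimal Python sketch (helper names such as `grouped_mean` are illustrative, not from the text):

```python
def mean(xs):
    # Arithmetic mean: sum of observations over their count.
    return sum(xs) / len(xs)

def grouped_mean(midpoints, freqs):
    # Grouped-data mean: sum(f_i * x_i) / sum(f_i).
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

data = [2, 4, 4, 4, 5, 5, 7, 9]
xbar = mean(data)
# Property check: sum of deviations from the mean is zero (up to rounding).
deviation_sum = sum(x - xbar for x in data)
```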
1.2.2 Median
Definition
The median is the middle value when data is arranged in order.
Grouped Data Formula
\[\text{Median} = L + \left(\frac{\frac{n}{2}-CF}{f}\right)h \]
where \(L\)=lower boundary of median class, \(CF\)=cumulative frequency before median class, \(f\)=frequency of median class, \(h\)=class width.
Properties
- Not affected by extreme values.
- Represents central tendency for skewed data.
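The grouped-data formula translates directly into code. A sketch with hypothetical class data (classes 0–10, 10–20, 20–30 with frequencies 5, 8, 7, so the median class is 10–20):

```python
def grouped_median(L, n, CF, f, h):
    # Median = L + ((n/2 - CF) / f) * h
    return L + ((n / 2 - CF) / f) * h

# n/2 = 10 falls in the 10-20 class: L=10, CF=5, f=8, h=10.
median = grouped_median(10, 20, 5, 8, 10)
```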
1.2.3 Mode
Definition
The mode is the most frequently occurring value.
Grouped Data Formula
\[\text{Mode} = L + \left(\frac{f_1-f_0}{2f_1-f_0-f_2}\right)h\]
where \(L\)=lower boundary of modal class, \(f_1\)=frequency of modal class, \(f_0\)=frequency before, \(f_2\)=frequency after, \(h\)=class width.
Properties
- Represents most typical value.
- Useful for categorical data.
1.2.4 Geometric Mean (GM)
Definition
GM is the \(n\)-th root of the product of observations.
\[GM = \left(\prod_{i=1}^n x_i\right)^{1/n}\]
Grouped Data Formula
\[ \log GM = \frac{\sum f_i \log x_i}{\sum f_i} \]
Properties
- Appropriate for multiplicative data.
- Less affected by extreme values than AM.
1.2.5 Harmonic Mean (HM)
Definition
HM is the reciprocal of the arithmetic mean of reciprocals.
\[ HM = \frac{n}{\sum_{i=1}^n \frac{1}{x_i}} \]
Grouped Data Formula
\[ HM = \frac{\sum f_i}{\sum \frac{f_i}{x_i}} \]
Properties
- Appropriate for rates and ratios.
- For positive observations, \(HM \le GM \le AM\).
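A short Python sketch illustrating the three means and why HM is the right average for rates (averaging 40 and 60 km/h over equal distances):

```python
import math

def am(xs): return sum(xs) / len(xs)
def gm(xs): return math.prod(xs) ** (1 / len(xs))
def hm(xs): return len(xs) / sum(1 / x for x in xs)

# Average speed over two equal distances at 40 and 60 km/h is the HM (48), not the AM (50).
speeds = [40, 60]
```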
1.3 Empirical Relation
For moderately skewed distributions:
\[\text{Mode} \approx 3\text{Median} - 2\text{Mean} \]
Features of a Good Average
- Rigidly defined
- Easy to understand and compute
- Based on all observations
- Not unduly affected by extreme values
- Capable of further algebraic treatment
1.4 Measures of Dispersion
Definition: Measures of dispersion show how much the data values spread around a central value.
1.4.1 Range
The range is the simplest measure of dispersion, defined as the difference between the maximum and minimum values.
\[R = X_{\max} - X_{\min}\]
Coefficient of Range
\[ \text{Coefficient of Range} = \frac{X_{\max} - X_{\min}}{X_{\max} + X_{\min}} \]
1.4.2 Quartile Deviation (QD)
Quartile deviation (semi-interquartile range) measures spread around the median using quartiles.
\[QD = \frac{Q_3 - Q_1}{2}\]
Coefficient of QD
\[\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}\]
1.4.3 Mean Deviation (MD)
Mean deviation is the average of absolute deviations from a central value (mean, median, or mode).
\[ MD = \frac{\sum |x_i - A|}{n} \]
where \(A\) is mean, median, or mode.
Coefficient of MD
\[ \text{Coefficient of MD} = \frac{MD}{A} \]
1.4.4 Standard Deviation (SD)
Standard deviation is the square root of the average of squared deviations from the mean.
\[ \sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \]
Coefficient of Variation (CV)
\[ CV = \frac{\sigma}{\bar{x}} \times 100\% \]
1.4.5 Variance
Variance is the square of the standard deviation, representing average squared deviation from the mean.
\[\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}\]
Applications
- Business: Variance helps assess risk in investment returns.
- Pharmacy: Variance is used to measure consistency in drug potency and dosage.
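The dispersion formulas above can be sketched in a few lines of Python (function names are illustrative):

```python
import math

def variance(xs):
    # Population variance: average squared deviation from the mean.
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / len(xs)

def sd(xs):
    # Standard deviation: square root of the variance.
    return math.sqrt(variance(xs))

def cv(xs):
    # Coefficient of variation, as a percentage of the mean.
    return sd(xs) / (sum(xs) / len(xs)) * 100

data = [10, 12, 14, 16, 18]   # mean 14, variance 8
```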
1.5 Central and Non-Central Moments
1.5.1 Non-Central Moments
The \(r\)-th non-central moment about origin is:
\[ \mu_r' = \frac{1}{n}\sum_{i=1}^n x_i^r \]
1.5.2 Central Moments
The \(r\)-th central moment about mean is:
\[ \mu_r = \frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^r \]
1.5.3 Interrelationship
Central moments can be expressed in terms of non-central moments:
\[ \mu_2 = \mu_2' - (\mu_1')^2,\quad \mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 \]
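These identities are easy to verify numerically; a minimal sketch (helper names are illustrative):

```python
def raw_moment(xs, r):
    # r-th non-central moment about the origin.
    return sum(x ** r for x in xs) / len(xs)

def central_moment(xs, r):
    # r-th central moment about the mean.
    m = raw_moment(xs, 1)
    return sum((x - m) ** r for x in xs) / len(xs)

xs = [1, 2, 3, 4, 6]
mu2 = central_moment(xs, 2)
mu2_from_raw = raw_moment(xs, 2) - raw_moment(xs, 1) ** 2
```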
1.5.4 Sheppard's Correction
When data are grouped, moments are biased. Sheppard's correction adjusts central moments:
\[ \mu_2^{corr} = \mu_2 - \frac{h^2}{12},\quad \mu_4^{corr} = \mu_4 - \frac{h^2}{2}\mu_2 + \frac{7h^4}{240} \]
where \(h\) = class interval width.
1.6 Skewness
Concept
Skewness measures asymmetry of distribution. Positive skew: tail to right; negative skew: tail to left.
Methods
- Pearson’s coefficient: \(\frac{\bar{x}-\text{Mode}}{\sigma}\)
- Bowley’s coefficient: \(\frac{Q_3+Q_1-2Q_2}{Q_3-Q_1}\)
- Moment coefficient: \(\beta_1=\frac{\mu_3^2}{\mu_2^3}\)
1.7 Kurtosis
Concept
Kurtosis measures peakedness or flatness of distribution relative to normal.
\[ \beta_2 = \frac{\mu_4}{\mu_2^2},\quad \gamma_2 = \beta_2 - 3 \]
- \(\gamma_2=0\): Mesokurtic (normal)
- \(\gamma_2>0\): Leptokurtic (peaked)
- \(\gamma_2<0\): Platykurtic (flat)
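The moment coefficients of skewness and kurtosis can be computed together; a sketch (a symmetric sample gives \(\beta_1=0\) and, being short and flat, a negative \(\gamma_2\)):

```python
def shape_coefficients(xs):
    # Returns (beta1, gamma2) = (mu3^2 / mu2^3, mu4 / mu2^2 - 3).
    n = len(xs)
    m = sum(xs) / n
    mu2 = sum((x - m) ** 2 for x in xs) / n
    mu3 = sum((x - m) ** 3 for x in xs) / n
    mu4 = sum((x - m) ** 4 for x in xs) / n
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2 - 3

beta1, gamma2 = shape_coefficients([1, 2, 3, 4, 5])  # symmetric sample
```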
Curve Fitting — Concepts and Methods
2.1 Bivariate Data
Bivariate data consists of pairs of observations \((x_i, y_i)\). Curve fitting seeks to model the relationship between \(x\) and \(y\) using a mathematical function.
2.2 Principle of Least Squares
The least squares method minimizes the sum of squared deviations between observed values and fitted curve values:
\[\min \sum_{i=1}^n (y_i - f(x_i))^2\]
2.3 Curve Fitting
2.3.1 Fitting of Straight Line
The simplest case is fitting a straight line:
\[ y = a + bx\]
Normal equations:
\[\sum y_i = na + b\sum x_i,\quad \sum x_i y_i = a\sum x_i + b\sum x_i^2\]
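Solving the two normal equations for \(a\) and \(b\) gives closed forms that code up directly; a minimal sketch:

```python
def fit_line(xs, ys):
    # Solve the normal equations for a and b in y = a + b*x.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b
```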
2.3.2 Fitting of Second Degree Polynomial
Quadratic curve:
\[y = a + bx + cx^2\]
Normal equations involve sums of \(x, x^2, x^3, x^4\).
2.3.3 Fitting of Family of Exponential Curves
Exponential curve of form:
\[y = a e^{bx}\]
Take logarithms: \(\ln y = \ln a + bx\). Fit straight line to transformed data.
2.3.4 Fitting of Power Curve
Power curve of form:
\[ y = a x^b\]
Take logarithms: \(\ln y = \ln a + b\ln x\). Fit straight line to transformed data.
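The log-transform procedure for the power curve can be sketched as follows (a straight-line fit on \((\ln x, \ln y)\), then back-transform \(a\)):

```python
import math

def fit_power(xs, ys):
    # Fit y = a * x^b by least squares on the log-transformed pairs (ln x, ln y).
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    sx, sy = sum(lx), sum(ly)
    sxx = sum(u * u for u in lx)
    sxy = sum(u * v for u, v in zip(lx, ly))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = math.exp((sy - b * sx) / n)   # back-transform the intercept
    return a, b
```

The same template fits the exponential curve \(y = ae^{bx}\) by transforming only \(y\).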
2.4 Meaning and types of correlation
Meaning
Correlation quantifies the strength and direction of association between two variables. Linear correlation assesses how well a straight line describes the relationship.
Types
- Positive: Both variables move in the same direction.
- Negative: Variables move in opposite directions.
- No linear correlation: No systematic linear association.
- Perfect: Exact linear relation, \(r=\pm 1\).
Measures of correlation
Scatter diagram
A scatter plot of \((x_i,y_i)\) reveals pattern, direction, strength, and outliers. Tight, linear upward pattern suggests strong positive correlation.
Karl Pearson’s coefficient
The product–moment correlation coefficient is
\[ r=\frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n (y_i-\bar{y})^2}} =\frac{\mathrm{Cov}(X,Y)}{s_x s_y}. \]
Rank correlation coefficient (Spearman)
For ranks \(R_i\) and \(S_i\) (no ties), with \(d_i=R_i-S_i\):
\[ r_s=1-\frac{6\sum d_i^2}{n(n^2-1)}.\]
With ties, assign average ranks and include tie correction:
\[ r_s=1-\frac{6\left(\sum d_i^2 + \frac{1}{12}\sum_j (t_j^3-t_j) + \frac{1}{12}\sum_k (u_k^3-u_k)\right)}{n(n^2-1)}, \]
where \(t_j\) and \(u_k\) are tie counts in each variable.
Properties of \(r\)
- Bounds: \(-1\le r\le 1\).
- Unit-free: Invariant to linear scaling of variables.
- Symmetry: \(r_{xy}=r_{yx}\).
- Translation invariance: Adding constants to \(x\) or \(y\) leaves \(r\) unchanged.
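Both coefficients can be computed from first principles; a sketch (the rank helper assumes no ties, matching the no-ties formula above):

```python
def pearson_r(xs, ys):
    # Product-moment correlation from deviations about the means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman_rs(xs, ys):
    # No-ties formula: r_s = 1 - 6*sum(d_i^2) / (n(n^2 - 1)).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```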
Bivariate frequency distribution and correlation
Bivariate frequency table
Data grouped simultaneously by \(X\) and \(Y\) classes with frequencies \(f_{ij}\). Use class midpoints \(x_i,y_j\) for grouped correlation.
Correlation coefficient for bivariate data
Using grouped data sums (over all cells):
\[ r=\frac{ \sum f\,xy - \frac{(\sum f\,x)(\sum f\,y)}{\sum f} } { \sqrt{ \left(\sum f\,x^2 - \frac{(\sum f\,x)^2}{\sum f}\right) \left(\sum f\,y^2 - \frac{(\sum f\,y)^2}{\sum f}\right) } }. \]
Other correlation measures and related quantities
Coefficient of concurrent deviation
Counts concordant signs of deviations from means. If \(c\) of \(n\) pairs have concurrent deviations,
\[ r_c=\pm\sqrt{\frac{2c-n}{n}}, \]
sign chosen by overall direction.
Probable error and its properties
Approximate probable error of \(r\):
\[ \mathrm{PE}=0.6745\,\frac{1-r^2}{\sqrt{n}}.\]
- Significance rule: If \(|r|>6\,\mathrm{PE}\), correlation is considered significant.
- Precision: Smaller PE implies more precise estimate of \(r\).
Coefficient of determination
\[ R^2=r^2, \]
the proportion of variance in \(Y\) explained linearly by \(X\).
Multiple and partial correlation (three variables)
Multiple correlation (one on two jointly)
Multiple correlation of \(X_1\) with \(X_2,X_3\) given pairwise \(r_{12},r_{13},r_{23}\):
\[R_{1.23}=\sqrt{\frac{r_{12}^2+r_{13}^2-2\,r_{12}r_{13}r_{23}}{1-r_{23}^2}}. \]
Partial correlation (two controlling a third)
Partial correlation between \(X_1,X_2\) controlling \(X_3\):
\[r_{12.3}=\frac{r_{12}-r_{13}r_{23}}{\sqrt{(1-r_{13}^2)(1-r_{23}^2)}}. \]
Properties and problems
- Bounds: \(0\le R_{1.23}\le 1\); \(-1\le r_{12.3}\le 1\).
- Orthogonality: If \(r_{23}=0\), then \(R_{1.23}=\sqrt{r_{12}^2+r_{13}^2}\); if in addition \(r_{13}=0\), then \(r_{12.3}=r_{12}\) and \(R_{1.23}=|r_{12}|\).
- Interpretation: Partial correlation isolates the unique linear link between two variables net of the third.
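Both formulas are direct functions of the three pairwise correlations; a sketch:

```python
import math

def partial_r(r12, r13, r23):
    # r_{12.3}: correlation of X1 and X2 after controlling for X3.
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def multiple_R(r12, r13, r23):
    # R_{1.23}: multiple correlation of X1 on X2 and X3 jointly.
    return math.sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))
```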
Intra-class correlation and correlation ratio
Intra-class correlation (ICC)
Measures similarity within groups; variance-components form:
\[\mathrm{ICC}=\frac{\sigma_b^2}{\sigma_b^2+\sigma_w^2},\]
where \(\sigma_b^2\) is between-group variance and \(\sigma_w^2\) within-group variance. In one-way ANOVA with equal group size \(k\):
\[\mathrm{ICC(1)}=\frac{MS_B-MS_W}{MS_B+(k-1)MS_W}.\]
Correlation ratio (\(\eta\))
Captures strength of possibly nonlinear association of \(Y\) with categorical or binned \(X\):
\[\eta^2=\frac{\mathrm{SS}_{\text{between}}}{\mathrm{SS}_{\text{total}}}=\frac{\sum_g n_g(\bar{y}_g-\bar{y})^2}{\sum_i (y_i-\bar{y})^2},\quad 0\le \eta^2 \le 1.\]
Practical tips and cautions
- Linearity check: Always inspect scatter plots; \(r\) can be misleading for nonlinear patterns.
- Outliers: Outliers can strongly affect \(r\); consider robust measures or diagnostics.
- Grouping bias: Grouped-data \(r\) depends on class midpoints; use fine bins or raw data when possible.
3.1 Simple Linear Regression
Models relation between one predictor and one response.
\[ y = a + bx \]
3.2 Multiple Regression
Multiple independent predictors.
\[ y = b_0 + b_1x_1 + b_2x_2 + \ldots \]
3.3 Residual Analysis
Residuals = difference between observed & fitted values.
4.1 Laspeyres Index
\[ P_L = \frac{\sum p_1 q_0}{\sum p_0 q_0} \]
4.2 Paasche Index
\[ P_P = \frac{\sum p_1 q_1}{\sum p_0 q_1} \]
4.3 Fisher’s Ideal Index
\[ P_F = \sqrt{P_L P_P} \]
5.1 Trend
Long-term movement in data.
5.2 Seasonal Variation
Patterns repeating yearly, monthly, etc.
5.3 Cyclical Variation
Long-term economic cycles.
5.4 Irregular Variation
Random or unpredictable movements.
6.1 Types of Attributes
- Positive attribute: presence of a quality
- Negative attribute: absence of a quality
6.2 Consistency of Data
Checks if attribute classification is logical.
6.3 Association of Attributes
Determines whether one attribute is related to another.
7.1 Addition Rule
\[ P(A\cup B) = P(A) + P(B) - P(A\cap B) \]
7.2 Multiplication Rule
\[ P(A\cap B) = P(A)P(B|A) \]
7.3 Conditional Probability
\[ P(A|B)=\frac{P(A\cap B)}{P(B)} \]
7.4 Bayes’ Theorem
\[ P(A|B)=\frac{P(B|A)P(A)}{P(B)} \]
8.1 Expectation / Mean
Explanation: The expected value (mean) is the long-run average of a random variable.
Discrete: \(\;E(X)=\sum_x x\,P(X=x)\). Continuous: \(\;E(X)=\int x f(x)\,dx\).
8.2 Moments and Central Moments
Explanation: \(k\)-th moment \(E(X^k)\). Central moments use \((X-E(X))^k\); variance is 2nd central moment.
\(\mathrm{Var}(X)=E[(X-E(X))^2]=E(X^2)-[E(X)]^2\).
8.3 Properties of Expectation
Explanation: Linearity: \(E[aX+bY]=aE(X)+bE(Y)\). Useful for simplifying calculations.
9.1 Probability Generating Function (PGF)
Explanation: For integer-valued \(X\): \(G_X(t)=E(t^X)=\sum_{k\ge0}P(X=k)t^k\). Encodes PMF and moments.
9.2 Moment Generating Function (MGF)
Explanation: \(M_X(t)=E(e^{tX})\). Derivatives at 0 give moments.
9.3 Characteristic Function
Explanation: \(\phi_X(t)=E(e^{itX})\). Always exists; helpful for proving limit theorems.
9.4 Law of Large Numbers (LLN)
Explanation: Sample mean \(\bar{X}_n\) converges to true mean as \(n\to\infty\) (weak & strong forms).
10.1 Bernoulli
Explanation: Single trial with success probability \(p\). Support \(\{0,1\}\).
Example: Toss coin (head=1) with \(P(1)=p\).
10.1.1 PMF
The Bernoulli PMF assigns probability \(p\) to success and \(1-p\) to failure:
\[P(X=x) = p^x (1-p)^{1-x}, \quad x \in \{0,1\}, \quad 0 \le p \le 1.\]
10.1.2 Mean
The expected value equals the success probability:
\[\mathbb{E}[X] = p.\]
10.1.3 Variance
Variance is the product of success and failure probabilities:
\[\mathrm{Var}(X) = p(1-p).\]
10.1.4 Recurrence relations
Bernoulli has two points. A useful ratio ties them together:
\[\frac{P(X=1)}{P(X=0)} = \frac{p}{1-p}.\]
10.2 Binomial
Explanation: Number of successes in \(n\) iid Bernoulli trials. \(P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}\).
10.2.1 PMF
For \(n\) independent Bernoulli trials with success probability \(p\), the number of successes \(X\) has:
\[P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k=0,1,\dots,n.\]
10.2.2 Mean
The expected number of successes equals the trials times success probability:
\[\mathbb{E}[X] = np.\]
10.2.3 Variance
Variance scales Bernoulli variance by the number of trials:
\[\mathrm{Var}(X) = np(1-p).\]
10.2.4 Recurrence relations
Adjacent Binomial probabilities are linked by:
\[\frac{P(X=k+1)}{P(X=k)} = \frac{n-k}{k+1} \cdot \frac{p}{1-p},\quad k=0,1,\dots,n-1.\]
Equivalently,
\[ P(X=k+1) = P(X=k)\, \frac{n-k}{k+1} \cdot \frac{p}{1-p}.\]
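The recurrence gives an efficient way to tabulate the whole PMF without computing binomial coefficients; a sketch:

```python
def binom_pmf_table(n, p):
    # Start from P(X=0) = (1-p)^n and apply the recurrence for each next k.
    probs = [(1 - p) ** n]
    for k in range(n):
        probs.append(probs[-1] * (n - k) / (k + 1) * p / (1 - p))
    return probs
```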
10.2.5 Binomial approximations
10.2.5.1 Normal approximation
When \(n\) is large and \(p\) is not extreme, \(X \sim \mathrm{Bin}(n,p)\) is approximated by a Normal:
\[X \approx \mathcal{N}\!\big(np,\; np(1-p)\big).\]
Use continuity correction: for a discrete event like \(P(X \le k)\), compute \(\,P(Y \le k+0.5)\) with \(Y\) Normal.
10.2.5.2 Poisson approximation
When \(n\) is large, \(p\) is small, and \(\lambda = np\) is moderate, \(X \sim \mathrm{Bin}(n,p)\) is approximated by a Poisson:
\[X \approx \mathrm{Pois}(\lambda), \quad \lambda = np.\]
Tip: Normal approximation works best when \(np(1-p)\) is large; Poisson approximation works best when \(p\) is small and \(n\) is large with fixed \(\lambda=np\).
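A quick numerical check of the Poisson approximation for one illustrative case, \(\mathrm{Bin}(100, 0.03)\) at \(k=2\) versus \(\mathrm{Pois}(3)\):

```python
import math

n, p, k = 100, 0.03, 2
lam = n * p
# Exact binomial probability versus the Poisson approximation at the same k.
exact = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
poisson = math.exp(-lam) * lam ** k / math.factorial(k)
```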
10.3 Poisson
Overview
The Poisson distribution models counts of rare, independent events in a fixed interval. The Poisson process extends this to event counts over time, assuming a constant rate and independent increments.
Explanation: Models rare events per interval; mean = variance = \(\lambda\).
Example: Calls per hour when avg \(\lambda=5\): \(P(X=k)=e^{-\lambda}\lambda^k/k!\).
10.3.1 PMF
The Poisson PMF gives the probability of observing \(k\) events in a fixed interval:
\[ P(X = k) = e^{-\lambda}\, \frac{\lambda^k}{k!}, \quad k = 0,1,2,\dots, \ \lambda > 0. \]
10.3.2 Mean equals variance
The Poisson distribution has equal mean and variance; both are \(\lambda\):
\[ \mathbb{E}[X] = \lambda, \qquad \mathrm{Var}(X) = \lambda. \]
10.3.3 Recurrence relation (optional but useful)
Adjacent probabilities satisfy a simple recurrence:
\[ \frac{P(X=k+1)}{P(X=k)} = \frac{\lambda}{k+1} \quad \Rightarrow \quad P(X=k+1) = P(X=k)\, \frac{\lambda}{k+1}. \]
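As with the binomial case, the recurrence lets us build the PMF iteratively; a sketch:

```python
import math

def poisson_pmf_table(lam, kmax):
    # Start from P(X=0) = e^{-lam} and multiply by lam/(k+1) at each step.
    probs = [math.exp(-lam)]
    for k in range(kmax):
        probs.append(probs[-1] * lam / (k + 1))
    return probs
```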
10.3.4 Approximation to binomial
Conditions and result
When \(n\) is large and \(p\) is small with \(\lambda = np\) fixed, the binomial \(\mathrm{Bin}(n,p)\) is well-approximated by a Poisson \(\mathrm{Pois}(\lambda)\):
\[ P_{\mathrm{Bin}(n,p)}(X=k) \approx e^{-\lambda}\frac{\lambda^k}{k!}, \quad \lambda = np. \]
Why it works
With small \(p\), most trials are failures; counts are driven by rare, nearly independent successes. As \(n \to \infty\), \(p \to 0\) and \(np \to \lambda\), the binomial PMF converges pointwise to the Poisson PMF.
10.3.5 Additional useful relations
Moment generating and probability generating functions
The MGF and PGF capture count behavior and enable derivations of moments and sums:
\[ M_X(t) = \exp\big(\lambda(e^{t}-1)\big), \qquad G_X(s) = \exp\big(\lambda(s-1)\big). \]
10.4 Geometric & Negative Binomial
Definition
The Geometric distribution models the number of independent Bernoulli trials until the first success, with success probability \(p\).
- Support: \(X \in \{1,2,3,\dots\}\).
- Parameter: Success probability \(p\), with \(0 < p \le 1\).
Probability Mass Function (PMF)
The PMF is derived by requiring \(k-1\) failures followed by one success:
\[ P(X=k) = (1-p)^{k-1} p, \quad k=1,2,\dots \]
Mean
The expected value is the reciprocal of success probability:
\[ \mathbb{E}[X] = \frac{1}{p}. \]
Variance
The variance is given by:
\[ \mathrm{Var}(X) = \frac{1-p}{p^2}. \]
Memoryless Property
The Geometric distribution is the only discrete distribution with the memoryless property. This means:
\[ P(X > m+n \mid X > m) = P(X > n). \]
Interpretation: The probability of waiting more than \(m+n\) trials given that we have already waited more than \(m\) trials is the same as the probability of waiting more than \(n\) trials from the start.
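The memoryless property can be verified numerically from the tail \(P(X>k)=(1-p)^k\); a sketch with illustrative values of \(p\), \(m\), \(n\):

```python
def geom_tail(p, k):
    # P(X > k) = (1-p)^k for the Geometric distribution on {1, 2, ...}.
    return (1 - p) ** k

p, m, n = 0.3, 4, 2
conditional = geom_tail(p, m + n) / geom_tail(p, m)  # P(X > m+n | X > m)
unconditional = geom_tail(p, n)                      # P(X > n)
```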
Approximations
Exponential approximation
For small \(p\), the Geometric distribution behaves like an Exponential distribution with mean \(1/p\). If \(X \sim \mathrm{Geom}(p)\), the tail probability is exactly \((1-p)^k\), so
\[ P(X > k) = (1-p)^k \approx e^{-pk} \quad \text{when } p \text{ is small}. \]
Relation to Negative Binomial
The Geometric distribution is a special case of the Negative Binomial with \(r=1\):
\[ \mathrm{Geom}(p) = \mathrm{NegBin}(r=1,p). \]
Definition
The Negative Binomial distribution models the number of trials required to achieve \(r\) successes in independent Bernoulli trials with success probability \(p\).
- Support: \(X \in \{r, r+1, r+2, \dots\}\).
- Parameters: \(r > 0\) (number of successes), \(0 < p \le 1\) (success probability).
Probability Mass Function (PMF)
The PMF is derived by requiring \(r-1\) successes in the first \(k-1\) trials and a success on the \(k\)-th trial:
\[ P(X=k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k=r,r+1,\dots \]
Mean
The expected value is:
\[ \mathbb{E}[X] = \frac{r}{p}. \]
Variance
The variance is:
\[ \mathrm{Var}(X) = \frac{r(1-p)}{p^2}. \]
Approximations
Normal approximation
For large \(r\), the Negative Binomial can be approximated by a Normal distribution:
\[ X \approx \mathcal{N}\!\left(\frac{r}{p}, \frac{r(1-p)}{p^2}\right). \]
Poisson approximation
When \(r\) is large and \(1-p\) is small with \(\lambda = r(1-p)/p\) held moderate, the number of failures \(X - r\) is approximately Poisson with mean \(\lambda\).
Memoryless Property
Important: The Negative Binomial distribution does not have the memoryless property. Only the Geometric distribution (special case with \(r=1\)) is memoryless.
For \(r>1\), the conditional probability depends on past trials, so memorylessness fails.
Explanation: Geometric = trials until first success; Negative Binomial = trials until r-th success.
Hypergeometric Distribution
Definition
The Hypergeometric distribution models the probability of drawing a certain number of successes in a sample of size \(n\) from a finite population of size \(N\) containing \(K\) successes, without replacement.
- Support: \(X \in \{0,1,\dots,n\}\).
- Parameters: \(N\) (population size), \(K\) (number of successes in population), \(n\) (sample size).
Probability Mass Function (PMF)
The PMF is given by:
\[ P(X=k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}, \quad k=0,1,\dots,n. \]
Mean
The expected value is:
\[ \mathbb{E}[X] = n \cdot \frac{K}{N}. \]
Variance
The variance is:
\[ \mathrm{Var}(X) = n \cdot \frac{K}{N} \cdot \left(1-\frac{K}{N}\right) \cdot \frac{N-n}{N-1}. \]
Sampling Without Replacement
The Hypergeometric distribution arises naturally in sampling without replacement. Unlike the Binomial distribution, which assumes independent trials with replacement, the Hypergeometric accounts for the changing composition of the population after each draw.
Relation to Binomial
When the population size \(N\) is large compared to the sample size \(n\), the Hypergeometric distribution approximates the Binomial distribution with parameters \(n\) and \(p=K/N\).
\[ \mathrm{Hypergeom}(N,K,n) \approx \mathrm{Binomial}(n,p), \quad p=\frac{K}{N}. \]
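The PMF and mean formula can be checked directly; a sketch using an illustrative population (\(N=10\), \(K=4\), \(n=3\)):

```python
import math

def hypergeom_pmf(N, K, n, k):
    # Probability of k successes in a sample of n drawn without replacement.
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# Mean check: sum over k of k * P(X=k) should equal n*K/N.
mean = sum(k * hypergeom_pmf(10, 4, 3, k) for k in range(4))
```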
11.1 Uniform
\(f(x)=\frac{1}{b-a},\; a\le x\le b\). Example: a number drawn uniformly at random between 0 and 1.
11.2 Exponential
\(f(x)=\lambda e^{-\lambda x},\; x\ge0\). Memoryless property. Example: time between arrivals.
11.3 Normal (Gaussian)
\(f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\). Example: human heights approx. normal.
11.4 Student-t, Chi-square, Gamma, Beta
Explanation: Special families used in inference (t, chi-square) and flexible modeling (Gamma, Beta).
12.1 Joint, Marginal & Conditional
Explanation: Joint density/pmf \(f_{X,Y}(x,y)\). Marginal \(f_X(x)=\int f_{X,Y}(x,y)dy\). Conditional \(f_{X|Y}(x|y)=f_{X,Y}(x,y)/f_Y(y)\).
12.2 Covariance & Correlation
\(\mathrm{Cov}(X,Y)=E[(X-E(X))(Y-E(Y))],\quad \rho=\frac{\mathrm{Cov}(X,Y)}{\sigma_X\sigma_Y}\).
12.3 Independence vs Uncorrelated
Explanation: Independence ⇒ uncorrelated, but not conversely. Dependence can exist with zero covariance.
13.1 Point Estimation
Explanation: Single-value estimators (e.g., sample mean for population mean).
13.2 Properties: Unbiasedness, Consistency, Efficiency
Explanation: Unbiased: \(E(\hat\theta)=\theta\). Consistent: \(\hat\theta\to\theta\) as \(n\to\infty\). Efficient: minimum variance among unbiased estimators.
13.3 Maximum Likelihood Estimation (MLE)
Explanation: Choose \(\hat\theta\) maximizing likelihood \(L(\theta)=\prod f(x_i;\theta)\). Asymptotically normal and efficient under regularity.
13.4 Method of Moments & Bayesian Estimation
Explanation: Method of Moments equates sample and theoretical moments. Bayesian combines prior and likelihood to obtain posterior.
14.1 Null & Alternative
Explanation: \(H_0\) is baseline (no effect), \(H_1\) is alternative. Choose significance level \(\alpha\) for type I error.
14.2 Test Statistics and p-values
Explanation: Compute statistic whose distribution under \(H_0\) is known. p-value = probability to observe statistic as extreme or more given \(H_0\).
14.3 t-test, Chi-square, F-test
Explanation: t-test for unknown sigma (small samples), chi-square for goodness-of-fit / contingency, F-test for variances or ANOVA comparisons.
14.4 Type I/II Errors & Power
Explanation: Type I error \(\alpha\) = reject \(H_0\) when true. Type II error \(\beta\) = fail to reject when false. Power = \(1-\beta\).
15.1 Sign Test
Explanation: Used to test median when distribution is unknown. Count +/− signs of differences.
15.2 Wilcoxon Signed-Rank Test
Explanation: Considers magnitudes + signs for paired samples. More powerful than sign test.
15.3 Mann–Whitney U Test
Explanation: Tests if two independent samples come from same distribution.
15.4 Kruskal–Wallis Test
Explanation: Non-parametric alternative to ANOVA for >2 samples.
Brief review of parameter, statistic, and sampling distribution
Parameter
A parameter is a fixed (but typically unknown) numerical characteristic of a population, such as the population mean \(\mu\) or proportion \(p\). Parameters do not vary with samples—they describe the entire population.
Statistic
A statistic is a function of sample data used to estimate a parameter. Common statistics include the sample mean \(\bar{X}\) and sample proportion \(\hat{p}\). Statistics vary with the sample drawn.
\[ \bar{X}=\frac{1}{n}\sum_{i=1}^n X_i,\quad \hat{p}=\frac{1}{n}\sum_{i=1}^n I\{X_i=\text{success}\}. \]
Sampling distribution
The sampling distribution is the probability distribution of a statistic across all possible samples from the population. It determines standard error, confidence intervals, and tests.
\[ \mathrm{SE}(\bar{X})= \begin{cases} \frac{\sigma}{\sqrt{n}}, & \text{i.i.d. with replacement}\\[4pt] \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}, & \text{SRS without replacement (FPC)} \end{cases} \]
Principal steps and principles in a sample survey
Define objectives and target population
Specify the survey purpose, key indicators, and the population of inference. Clear objectives drive design choices and ensure relevance.
Construct the sampling frame
List or mechanism that allows access to population units (e.g., registry, map, list of households). Frame quality affects coverage.
Choose sampling design
Select suitable probability design (SRS, stratified, cluster, multistage) to balance precision, cost, and operational constraints.
Develop instruments and protocols
Draft questionnaires, define variables, pretest, and standardize field procedures to reduce measurement error.
Train and supervise fieldwork
Train enumerators, monitor adherence, and implement quality control to minimize interviewer and processing errors.
Weighting, estimation, and inference
Compute design weights, adjust for nonresponse and post-stratification, then estimate parameters with variance accounting for design.
\[ w_i=\frac{1}{\pi_i}\times \text{NR adj.}\times \text{PS adj.},\quad \hat{\mu}=\frac{\sum_i w_i y_i}{\sum_i w_i}. \]
Analysis, reporting, and dissemination
Apply appropriate estimators, variance estimators (Taylor linearization, replicate weights), report methodology, limitations, and uncertainty.
Sampling and non-sampling errors
Sampling errors
Random errors due to using a sample instead of the full population; quantified by variance and standard error, decrease with larger effective sample size and better designs.
\[ \mathrm{SE}(\hat{p})=\sqrt{\frac{p(1-p)}{n}}\times \sqrt{\frac{N-n}{N-1}}\quad \text{(SRSWOR with FPC)}. \]
Non-sampling errors
Systematic or random errors not due to sampling: coverage error, nonresponse, measurement, processing, interviewer effects, and model/specification errors.
Advantages of sampling over census
Cost, speed, and feasibility
Sampling is cheaper, faster, and operationally feasible, enabling timely insights and iteration compared to a full enumeration.
Quality and depth
Resources can be concentrated on training, supervision, and instrument quality; more variables and higher measurement precision are possible.
Ethical and respondent burden
Sampling reduces respondent burden and privacy intrusion relative to a census collecting detailed information.
Limitations of sampling
Precision limits and rare subgroups
Small domains or rare events may require large samples or special designs (oversampling), otherwise estimates have high variance.
Bias risks
Frame coverage gaps, nonresponse, and measurement errors can bias estimates if not addressed via design, weighting, and adjustment.
Complexity in design and analysis
Clustered, stratified, multistage designs require specialized variance estimation and trained staff, increasing complexity.
Types of sampling — subjective, probability, and mixed sampling
Subjective (non-probability) sampling
Selection not based on known inclusion probabilities: convenience, purposive, quota, snowball. Useful for exploration, but limits inference to populations.
Probability sampling
Every unit has a known, non-zero probability of selection, enabling unbiased design-based inference and valid variance estimation.
- SRS/SRSWOR: Equal probabilities, simple estimators.
- Stratified sampling: Partition population; sample within strata to improve precision.
- Cluster/multistage: Sample groups (PSUs) then units; cost-effective but higher intra-cluster correlation.
- PPS: Select PSUs with probability proportional to size to balance workloads.
Mixed sampling
Combines probability and non-probability elements for practical or methodological reasons (e.g., probability core sample plus purposive oversample of hard-to-reach groups).
Additional formulas and concepts
Unbiasedness and consistency
Under probability sampling, many estimators are unbiased or asymptotically unbiased. For SRS, \(\mathbb{E}[\bar{X}]=\mu\) and \(\mathbb{E}[\hat{p}]=\pi\). Consistency improves as effective sample size grows.
Central limit theorem for sample mean
For large \(n\), \(\bar{X}\) is approximately normal, facilitating Wald-type intervals: \[ \bar{X}\overset{\cdot}{\sim}\mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right). \] With complex designs, use design-consistent variance estimators.
16.1 Simple Random Sampling (SRS)
Simple random sampling: with and without replacement
Simple random sampling with replacement (SRSWR)
Each draw is independent, every unit has equal probability on each draw, and selected units are returned before the next draw.
\[ P(\text{select unit } i \text{ on a given draw})=\frac{1}{N}, \quad \text{draws independent.} \]
Simple random sampling without replacement (SRSWOR)
All \(\binom{N}{n}\) subsets are equally likely; once selected, a unit cannot appear again.
\[ P(\text{specific sample } S)=\frac{1}{\binom{N}{n}}, \quad |S|=n. \]
Notations and terminology
- Population size: \(N\)
- Sample size: \(n\)
- Population values: \(y_1,\dots,y_N\)
- Population mean and variance: \(\mu=\frac{1}{N}\sum_{i=1}^N y_i,\quad S^2=\frac{1}{N-1}\sum_{i=1}^N (y_i-\mu)^2\)
- Sample mean and sample variance: \(\bar{y}=\frac{1}{n}\sum_{i\in s} y_i,\quad s^2=\frac{1}{n-1}\sum_{i\in s}(y_i-\bar{y})^2\)
- Population total: \(Y=\sum_{i=1}^N y_i=N\mu\)
Various probabilities of selection
First-order and second-order inclusion probabilities (SRSWOR)
Under SRSWOR, all units have equal inclusion probability; pairs have equal joint inclusion probability.
\[ \pi_i=\frac{n}{N},\qquad \pi_{ij}=\frac{n(n-1)}{N(N-1)}\quad (i\ne j). \]
Selection probabilities across draws (SRSWR)
In SRSWR, per-draw selection probability is \(1/N\), and the probability a unit appears exactly \(k\) times in \(n\) draws is Binomial.
\[ P(\text{unit } i \text{ appears } k \text{ times})=\binom{n}{k}\left(\frac{1}{N}\right)^k\left(1-\frac{1}{N}\right)^{n-k}. \]
Random numbers tables and uses
Purpose and properties
Random number tables approximate independent Uniform(0,1) digits, enabling objective selection without systematic patterns.
Practical use in SRS
Map population units to numeric labels and scan the table, skipping duplicates (SRSWOR) or counting occurrences (SRSWR).
Methods of selecting a simple random sample
Lottery method
Write unit labels on identical slips, mix thoroughly, and draw at random until \(n\) are selected. Ensures physical randomization.
Method based on random numbers
Assign numeric IDs to all units; use a random number table or generator to select IDs according to SRSWR/SRSWOR rules.
Estimates of population total, mean and their variances and standard errors
Unbiased estimators
Sample mean estimates the population mean; the Horvitz–Thompson estimator reduces to \(N\bar{y}\) for totals under SRS.
\[ \hat{\mu}=\bar{y}=\frac{1}{n}\sum_{i\in s}y_i,\qquad \hat{Y}=N\bar{y}. \]
Variances under SRSWR and SRSWOR
SRSWR (independent draws):
\[ \mathrm{Var}(\bar{y})=\frac{\sigma^2}{n}=\frac{N-1}{N}\cdot\frac{S^2}{n}\approx\frac{S^2}{n},\qquad \mathrm{Var}(\hat{Y})=N^2\,\mathrm{Var}(\bar{y}), \]
where \(\sigma^2=\frac{1}{N}\sum_{i=1}^N(y_i-\mu)^2\) is the population variance with divisor \(N\).
SRSWOR (finite population correction, FPC):
\[ \mathrm{Var}(\bar{y})=\frac{S^2}{n}\left(1-\frac{n}{N}\right),\qquad \mathrm{Var}(\hat{Y})=N^2\,\frac{S^2}{n}\left(1-\frac{n}{N}\right). \]
Estimating variance from the sample
Plug in sample variance \(s^2\) for \(S^2\) to estimate SEs:
\[ \widehat{\mathrm{Var}}(\bar{y})=\frac{s^2}{n}\left(1-\frac{n}{N}\right),\quad \widehat{\mathrm{SE}}(\bar{y})=\sqrt{\widehat{\mathrm{Var}}(\bar{y})}. \]
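These plug-in estimates are straightforward to compute; a minimal sketch (the function name `srswor_estimates` is ours):

```python
from statistics import mean, variance

def srswor_estimates(sample, N):
    """Estimates of population mean/total and their SEs under SRSWOR."""
    n = len(sample)
    ybar = mean(sample)                  # unbiased for mu
    s2 = variance(sample)                # sample variance, n-1 divisor
    var_mean = (s2 / n) * (1 - n / N)    # FPC applied
    return {"mean": ybar, "total": N * ybar,
            "se_mean": var_mean ** 0.5, "se_total": N * var_mean ** 0.5}

est = srswor_estimates([12, 15, 11, 14, 13], N=100)
```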
Determination of sample size
For estimating a mean
Target margin of error \(d\) at confidence level \(1-\alpha\) with z-score \(z_{\alpha/2}\). With prior SD \(\sigma\):
\[ n_0=\frac{z_{\alpha/2}^2\,\sigma^2}{d^2},\qquad n=\frac{n_0}{1+\frac{n_0-1}{N}}\quad \text{(FPC-adjusted)}. \]
For estimating a proportion (attribute)
With anticipated proportion \(p\) (use \(0.5\) if unknown) and margin \(d\):
\[ n_0=\frac{z_{\alpha/2}^2\,p(1-p)}{d^2},\qquad n=\frac{n_0}{1+\frac{n_0-1}{N}}. \]
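Both sample-size formulas, with the FPC adjustment, can be sketched as follows (function names are ours; in practice round \(n\) up to the next integer):

```python
def sample_size_mean(z, sigma, d, N=None):
    """n0 = z^2 sigma^2 / d^2, optionally FPC-adjusted."""
    n0 = (z ** 2 * sigma ** 2) / d ** 2
    return n0 if N is None else n0 / (1 + (n0 - 1) / N)

def sample_size_proportion(z, p, d, N=None):
    """n0 = z^2 p(1-p) / d^2, optionally FPC-adjusted."""
    n0 = (z ** 2 * p * (1 - p)) / d ** 2
    return n0 if N is None else n0 / (1 + (n0 - 1) / N)

# 95% confidence (z = 1.96), p unknown -> 0.5, margin 0.05:
n0 = sample_size_proportion(1.96, 0.5, 0.05)              # about 384.2
n_fpc = sample_size_proportion(1.96, 0.5, 0.05, N=1000)   # FPC shrinks the requirement
```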
Simple random sampling of attributes (binary outcomes)
Estimator and standard error
Let \(X_i\in\{0,1\}\) indicate “success”. The sample proportion \(\hat{p}=\frac{1}{n}\sum_{i\in s}X_i\) is unbiased for the population proportion \(P\) (written \(P\) here to avoid a clash with the inclusion probabilities \(\pi_i\) above).
\[ \mathrm{Var}(\hat{p})=\frac{P(1-P)}{n}\left(1-\frac{n}{N}\right)\approx \frac{\hat{p}(1-\hat{p})}{n}\left(1-\frac{n}{N}\right). \]
Confidence interval
Wald CI (large-sample): \(\hat{p}\pm z_{\alpha/2}\cdot \widehat{\mathrm{SE}}(\hat{p})\). For small \(n\), consider exact or adjusted intervals.
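A sketch of the Wald interval with the FPC (function name `wald_ci` is ours; valid only for large \(n\)):

```python
def wald_ci(successes, n, N, z=1.96):
    """Large-sample Wald CI for a proportion under SRSWOR (FPC included)."""
    p_hat = successes / n
    se = ((p_hat * (1 - p_hat) / n) * (1 - n / N)) ** 0.5
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_ci(30, 100, N=2000)   # point estimate 0.30
```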
Additional practical points
Duplicate handling and replacement rules
Under SRSWOR, skip duplicate IDs from random numbers; under SRSWR, record multiplicities (useful in PPSWR frameworks).
Operational checks
Verify frame coverage, randomization integrity, and documented selection steps to ensure reproducibility and auditability.
Tip: Use the finite population correction when \(n/N\) is non-negligible; otherwise, the SRSWR variance approximations are adequate and simpler.
Explanation: Every unit has equal chance to be chosen.
16.2 Stratified Sampling
Stratified Random Sampling
Population is divided into homogeneous subgroups (strata). A simple random sample is drawn independently from each stratum. This ensures representation across key subgroups.
Advantages and Disadvantages
Advantages
- Improves precision by reducing variance when strata are internally homogeneous.
- Ensures representation of all subgroups.
- Allows separate estimates for each stratum.
Disadvantages
- Requires detailed population information to form strata.
- Complexity in design and analysis compared to SRS.
- Incorrect stratification may reduce efficiency.
Estimation of Population Mean and Variance
Let population be divided into \(L\) strata, stratum \(h\) has size \(N_h\), sample size \(n_h\), mean \(\bar{y}_h\), variance \(S_h^2\). Total population size \(N=\sum_{h=1}^L N_h\).
Estimator of mean
\[ \bar{y}_{st} = \sum_{h=1}^L W_h \bar{y}_h,\quad W_h=\frac{N_h}{N}. \]
Variance of stratified mean
\[ \mathrm{Var}(\bar{y}_{st}) = \sum_{h=1}^L W_h^2\left(\frac{S_h^2}{n_h}\left(1-\frac{n_h}{N_h}\right)\right). \]
Proportional and Optimum Allocations
Proportional allocation
Sample size in each stratum proportional to stratum size:
\[ n_h = n\cdot \frac{N_h}{N}. \]
Optimum (Neyman) allocation
Allocates samples to minimize variance, considering stratum variability:
\[ n_h = n\cdot \frac{N_h S_h}{\sum_{j=1}^L N_j S_j}. \]
Comparison with SRSWOR
Variance under proportional allocation:
\[ \mathrm{Var}(\bar{y}_{st})=\frac{1}{n}\sum_{h=1}^L W_h S_h^2\left(1-\frac{n_h}{N_h}\right). \]
SRSWOR variance:
\[ \mathrm{Var}(\bar{y}_{srs})=\frac{S^2}{n}\left(1-\frac{n}{N}\right). \]
Optimum allocation generally yields lower variance than proportional allocation and SRSWOR, especially when stratum variances differ significantly.
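The estimator and both allocation rules above can be sketched directly (function names are illustrative):

```python
def stratified_mean(Nh, ybar_h):
    """y_st = sum of W_h * ybar_h with W_h = N_h / N."""
    N = sum(Nh)
    return sum(N_h / N * yb for N_h, yb in zip(Nh, ybar_h))

def proportional_allocation(n, Nh):
    """n_h proportional to stratum size N_h."""
    N = sum(Nh)
    return [n * N_h / N for N_h in Nh]

def neyman_allocation(n, Nh, Sh):
    """Neyman: n_h proportional to N_h * S_h."""
    denom = sum(N_h * S_h for N_h, S_h in zip(Nh, Sh))
    return [n * N_h * S_h / denom for N_h, S_h in zip(Nh, Sh)]

# Two strata (sizes 600, 400; SDs 10, 20), total sample n = 100:
prop = proportional_allocation(100, [600, 400])        # [60, 40]
ney = neyman_allocation(100, [600, 400], [10, 20])     # shifts sample to the variable stratum
```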
Explanation: Population divided into strata; sample drawn from each.
16.3 Systematic Sampling
Systematic Sampling Definition
Systematic sampling selects every \(k\)-th unit from a population after a random start. When \(N=nk\), the population of size \(N\) is divided into \(n\) groups of size \(k\), and one unit from each group is chosen systematically.
Merits and Demerits of Systematic Sampling
Merits
- Simple and convenient to implement.
- Ensures spread across the population.
- Useful when population list is ordered.
Demerits
- Risk of bias if population has hidden periodicity.
- Variance estimation can be complex.
Estimate of Mean and Variance
The systematic sample mean is:
\[ \bar{y}_{sys} = \frac{1}{n}\sum_{i=1}^n y_{r+(i-1)k}, \quad \text{where } r \text{ is the random start } (1\le r\le k). \]
Variance depends on population structure. For random start and no periodicity, variance approximates SRS variance:
\[ \mathrm{Var}(\bar{y}_{sys}) \approx \frac{S^2}{n}\left(1-\frac{n}{N}\right). \]
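Selecting a linear systematic sample is simple to code; a sketch assuming \(N=nk\) (the function name is ours):

```python
import random

def systematic_sample(population, n):
    """Linear systematic sample: random start, then every k-th unit (assumes N = n*k)."""
    N = len(population)
    k = N // n                     # sampling interval
    start = random.randrange(k)    # random start in 0..k-1
    return [population[start + i * k] for i in range(n)]

s = systematic_sample(list(range(20)), n=5)   # k = 4: five units, 4 apart
```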
Comparison with Stratified and SRSWOR
Systematic sampling is similar to stratified sampling with one unit per stratum when the population is ordered. Compared to SRSWOR, systematic sampling is easier operationally but variance estimation is less straightforward.
Comparison of Variance for a Linear Trend
For a population with a linear trend, systematic sampling yields lower variance than SRSWOR, while stratified sampling with one unit per stratum is better still:
\[ \mathrm{Var}_{StRS}(\bar{y}) \le \mathrm{Var}_{Sys}(\bar{y}) \le \mathrm{Var}_{SRS}(\bar{y}) \quad \text{under a linear trend}. \]
16.4 Cluster Sampling
Cluster sampling selects groups (clusters) of units rather than individual units. Useful when population is naturally grouped.
16.5 Multistage Sampling
Multistage sampling involves selecting samples in stages: first primary units, then secondary units within them, etc.
16.6 Quota Sampling
Quota sampling is a non-probability method where interviewers are given quotas to fill based on characteristics like age or gender.
Analysis of Variance (ANOVA)
Exploring the concept, models, assumptions, and applications of ANOVA in statistical analysis.
Concept and Definition
ANOVA is a statistical method used to compare means across multiple groups by analyzing variance. It separates total variation into components due to different sources.
Definition: According to R.A. Fisher, ANOVA is the separation of variance ascribable to one group of causes from the variance ascribable to other groups.
Assumptions of ANOVA
- Observations are independent
- Observations in each group come from normally distributed populations
- Treatment and environmental effects are additive
One-Way Classification
Mathematical Model
\[ y_{ij} = \mu + \alpha_i + \varepsilon_{ij} \]
Where \( y_{ij} \) is the observation from group \( i \), \( \mu \) is the overall mean, \( \alpha_i \) is the effect of group \( i \), and \( \varepsilon_{ij} \) is the random error.
Analysis with Equal Classification
Each group has the same number of observations \( n \). Total sum of squares is partitioned as:
\[ SST = SSA + SSE \]
\[ F = \frac{MSA}{MSE} = \frac{SSA/(k-1)}{SSE/(N-k)} \]
Analysis with Unequal Classification
Groups have different sizes \( n_1, n_2, \dots, n_k \). Adjusted formulas for SSA and MSE are used.
\[ SSA = \sum_{i=1}^k n_i (\bar{y}_i - \bar{y})^2 \]
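A direct computation of the one-way F statistic for unequal group sizes (a minimal sketch; the function name is ours):

```python
from statistics import mean

def one_way_anova_F(groups):
    """One-way ANOVA F for groups of (possibly) unequal sizes."""
    all_obs = [x for g in groups for x in g]
    grand = mean(all_obs)
    k, N = len(groups), len(all_obs)
    ssa = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)   # between-group SS
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)     # within-group SS
    return (ssa / (k - 1)) / (sse / (N - k))

F = one_way_anova_F([[4, 5, 6], [7, 8, 9], [1, 2, 3]])   # widely separated means -> large F
```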
Two-Way Classification
Mathematical Model
\[ y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \]
Where \( \alpha_i \) is the effect of row factor, \( \beta_j \) is the effect of column factor, and \( \varepsilon_{ij} \) is the error term.
Analysis
Total variation is split into:
\[ SST = SSA + SSB + SSE \]
F-ratios are computed for both factors:
\[ F_A = \frac{MSA}{MSE},\quad F_B = \frac{MSB}{MSE} \]
Introduction
In 1935, Sir Ronald A. Fisher laid the foundation for the subject that has come to be known by the title of his book, *The Design of Experiments*. Since then the theory of experimental design has been considerably developed and extended. Applications of this theory are found today in laboratories and research across the natural sciences, engineering, and nearly all branches of social science.
17.1 CRD (Completely Randomized Design)
Explanation: Treatments assigned completely at random.
Definition
In a Completely Randomised Design (CRD), treatments are assigned completely at random to all experimental units without any restrictions.
Terminology
- Treatments: Different conditions or factors being compared.
- Experimental units: Smallest division of material receiving treatments.
- Replication: Repetition of treatments to improve reliability.
Principles of Design of Experiments
Replication
Repeating treatments to reduce error and increase precision.
\[ SE = \frac{\sigma}{\sqrt{r}} \]
Randomisation
Ensures unbiased allocation of treatments.
Local Control
Not applicable in CRD since no blocking is used.
CRD Concept
All experimental units are homogeneous, and treatments are allocated randomly. It is the simplest design.
Advantages
- Simplicity in layout and analysis
- Flexibility in number of treatments and replications
- Easy randomisation
Disadvantages
- Not suitable for heterogeneous experimental units
- Less efficient if variability exists among units
Applications
CRD is widely used in laboratory experiments, greenhouse studies, and situations where experimental units are homogeneous.
Layout
Treatments are assigned randomly to all units. For \( t \) treatments and \( r \) replications, total units = \( t \times r \).
Statistical Analysis
Model:
\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij} \]
Where \( y_{ij} \) is observation under treatment \( i \), \( \mu \) is overall mean, \( \tau_i \) is treatment effect, and \( \varepsilon_{ij} \) is error.
ANOVA partitions total variation:
\[ SST = SSA + SSE \]
\[ F = \frac{MSA}{MSE} \]
Critical Differences
When hypothesis is significant, pairwise comparisons are made using Critical Difference (CD):
\[ CD = t_{\alpha, df} \times \sqrt{\frac{2MSE}{r}} \]
17.2 RBD (Randomized Block Design)
Explanation: Blocks remove variability; treatments randomized within blocks.
Randomized Block Design (RBD)
Concept, layout, analysis, advantages/disadvantages, Critical Difference and handling one missing observation
Concept
What it is: Randomized Block Design (RBD) groups experimental units into relatively homogeneous blocks (strata). Each block receives all treatments (or one replicate of each treatment). Randomization occurs within each block so treatments are assigned at random to plots inside the block. RBD reduces experimental error due to known nuisance variation by controlling it through blocking.
Why use RBD?
Explanation: When experimental units are heterogeneous (e.g., fertility gradient in a field), blocking isolates that heterogeneity into block effects; this reduces residual variance and increases precision of treatment comparisons. RBD is simple to implement and analyze.
Layout
General layout: For \(t\) treatments and \(r\) blocks (replications), arrange an \(r \times t\) table. Each row is a block and in each row all \(t\) treatments appear once (if possible). Label cell observation as \(y_{ij}\) = response of treatment \(i\) in block \(j\).
Notation used
Let \(y_{ij}\) be the observation for treatment \(i\) in block \(j\). Let
\(T_i=\sum_{j=1}^r y_{ij}\) (treatment total), \(B_j=\sum_{i=1}^t y_{ij}\) (block total), \(G=\sum_{i=1}^t\sum_{j=1}^r y_{ij}\) (grand total).
Practical considerations
Explanation: Blocks should be as homogeneous as possible (within-block variation minimal) and selected based on known gradients (e.g., soil depth, initial weight, machine batch). RBD is less suitable when blocks become very large (hard to keep homogeneous) or when many treatments make block size big.
| Block \ Treatment | T1 | T2 | T3 | T4 |
|---|---|---|---|---|
| Block 1 | y11 | y21 | y31 | y41 |
| Block 2 | y12 | y22 | y32 | y42 |
| Block 3 | y13 | y23 | y33 | y43 |
Statistical analysis (ANOVA for RBD)
Model (fixed-effects):
\( y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij},\quad i=1,\dots,t;\ j=1,\dots,r \)
Components: \(\mu\) = overall mean; \(\tau_i\) = treatment effect (fixed); \(\beta_j\) = block effect; \(\varepsilon_{ij}\) ~ iid \(N(0,\sigma^2)\).
Sum of squares (corrected)
Total SS (corrected): \( \mathrm{TSS} = \sum_{i=1}^t\sum_{j=1}^r y_{ij}^2 - \dfrac{G^2}{tr} \)
Treatment SS: \( \mathrm{SS}_T = \dfrac{1}{r}\sum_{i=1}^t T_i^2 - \dfrac{G^2}{tr} \)
Block SS: \( \mathrm{SS}_B = \dfrac{1}{t}\sum_{j=1}^r B_j^2 - \dfrac{G^2}{tr} \)
Error SS: \( \mathrm{SS}_E = \mathrm{TSS} - \mathrm{SS}_T - \mathrm{SS}_B \).
ANOVA table
| Source | d.f. | Sum of Squares | Mean Square | F |
|---|---|---|---|---|
| Treatments | t-1 | \(\mathrm{SS}_T\) | \(\mathrm{MS}_T=\mathrm{SS}_T/(t-1)\) | \(\dfrac{\mathrm{MS}_T}{\mathrm{MS}_E}\) |
| Blocks | r-1 | \(\mathrm{SS}_B\) | \(\mathrm{MS}_B=\mathrm{SS}_B/(r-1)\) | \(\dfrac{\mathrm{MS}_B}{\mathrm{MS}_E}\) |
| Error | (t-1)(r-1) | \(\mathrm{SS}_E\) | \(\mathrm{MS}_E=\mathrm{SS}_E/((t-1)(r-1))\) | — |
| Total | tr-1 | \(\mathrm{TSS}\) | — | — |
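The sum-of-squares formulas above can be computed directly; a minimal sketch (function name ours). The toy data below are exactly additive, so the error SS comes out zero:

```python
def rbd_ss(y):
    """RBD sums of squares; y[i][j] = response of treatment i in block j."""
    t, r = len(y), len(y[0])
    G = sum(sum(row) for row in y)
    CF = G * G / (t * r)                              # correction factor G^2 / (tr)
    tss = sum(x * x for row in y for x in row) - CF
    ss_t = sum(sum(row) ** 2 for row in y) / r - CF   # treatment SS
    ss_b = sum(sum(y[i][j] for i in range(t)) ** 2 for j in range(r)) / t - CF  # block SS
    return tss, ss_t, ss_b, tss - ss_t - ss_b

# Additive toy data: treatment and block effects explain all variation.
tss, ss_t, ss_b, ss_e = rbd_ss([[10, 12, 11], [13, 15, 14], [20, 22, 21]])
```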
Critical Difference (C.D.) and multiple comparisons
Purpose: If ANOVA rejects the global null (treatment means equal), pairwise comparisons are needed to identify which treatment pairs differ. The simplest approach is the Critical Difference (a t-based pairwise test).
Critical Difference formula
\( \text{C.D.} = t_{\alpha, \nu} \times \text{S.E.}(d) \)
For comparing means \( \bar y_i, \bar y_j\) with replications \(r_i,r_j\):
\( \text{S.E.}(d) = \sqrt{\mathrm{MS}_E\left(\dfrac{1}{r_i} + \dfrac{1}{r_j}\right)}. \)
When replications are equal \(r_i=r_j=r\):
\( \text{S.E.}(d) = \sqrt{\dfrac{2\mathrm{MS}_E}{r}}. \)
Reject equality of two means at level \(\alpha\) if \(|\bar y_i - \bar y_j| > \text{C.D.}\).
Worked example (illustrative values: \(\mathrm{MS}_E=25\), \(r=4\), \(t_{0.05,\,12}=2.179\)): S.E.\((d)\) = \(\sqrt{2\times 25/4} = \sqrt{12.5} \approx 3.536\). C.D. = \(2.179\times 3.536 \approx 7.70\). Any difference of treatment means larger than 7.70 is significant at the 5% level.
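The arithmetic can be checked mechanically with the same values (\(\mathrm{MS}_E=25\), \(r=4\), \(t=2.179\)):

```python
import math

def critical_difference(t_crit, mse, r):
    """CD = t * sqrt(2 * MSE / r) for equal replications."""
    return t_crit * math.sqrt(2 * mse / r)

cd = critical_difference(2.179, 25, 4)   # reproduces the worked value of about 7.70
```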
RBD with one missing value — estimation & analysis
Problem: When one observation is missing in an RBD, we can estimate it (by solving normal equations or using formulae) and then proceed with ANOVA. The course notes give a direct formula for the missing value estimate and the adjustment for upward bias.
Notation for one missing observation
Suppose the missing observation is \(x\) for treatment \(i_0\) in block \(j_0\). Let:
\(R'=\) total of the observed block \(j_0\) (without \(x\)), \(T'=\) total of observed treatment \(i_0\) (without \(x\)), \(G'=\) grand total of observed values (without \(x\)).
Estimate for missing value (direct formula)
One commonly used estimator is:
\( \displaystyle \hat x \;=\; \dfrac{r\,R' + t\,T' - G'}{(r-1)(t-1)}. \)
Upward bias correction: After substitution, an upward bias correction term \(B\) is computed and the treatment SS adjusted accordingly, as in the worked example that follows.
Worked numeric example
Data: \(t=5\) treatments, \(r=4\) replications. One missing observation at treatment 2, replication III. The observed totals (without x) are:
- \(R' =\) replication-III total (observed) = 135.1
- \(T' =\) treatment-2 total (observed) = 89.5
- \(G' =\) grand total (observed) = 590.2
Apply the formula:
\( \hat x = \dfrac{rR' + tT' - G'}{(r-1)(t-1)} = \dfrac{4(135.1) + 5(89.5) - 590.2}{3\times 4} = \dfrac{397.7}{12} = 33.1. \)
After substituting \(x=33.1\), the grand total becomes \(623.3\) and the treatment-2 total becomes \(122.6\); the treatment SS is then recomputed and reduced by the upward-bias term \(B=0.3645\). The final ANOVA gave Treatment MS = 130.3659, Error MS = 31.6316 and \(F \approx 4.117\) (Error d.f. = 11, one d.f. lost to the imputed value).
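The missing-value formula can be verified mechanically with the totals from this example (function name ours):

```python
def rbd_missing_value(R_obs, T_obs, G_obs, r, t):
    """Estimate one missing RBD observation: (r*R' + t*T' - G') / ((r-1)(t-1))."""
    return (r * R_obs + t * T_obs - G_obs) / ((r - 1) * (t - 1))

x_hat = rbd_missing_value(135.1, 89.5, 590.2, r=4, t=5)   # about 33.14, reported as 33.1
```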
Advantages, disadvantages & applications
Advantages
- Increases precision by removing block-to-block variability from the error term.
- Flexible: any number of treatments/replications (practically at least 2), and easy to include checks or repeated controls.
- Analysis is straightforward (simple ANOVA computations).
Disadvantages
- May give misleading results if blocks are not homogeneous.
- Not suitable for very large number of treatments (blocks would be large).
- If many plots are missing analysis becomes tedious (one or two missing observations are manageable using formulas/worked methods).
Applications
Common in agriculture (fertilizers, varieties across field blocks), manufacturing (batches as blocks), medical trials (centres as blocks) and lab experiments where a known nuisance factor can be blocked.
Problems (practice)
Below are two practice problems; work them by hand or with a calculator, following the steps shown above.
Problem 1 — Basic RBD ANOVA
Five treatments with 4 blocks. Observed treatment totals: 121.8, 122.6, 113.9, 162.8, 102.2; grand total 623.3. Compute treatment SS, block SS, error SS and test if treatments differ (use formulas in section "Statistical analysis").
Problem 2 — One missing observation (use the formula)
Same as Problem 1 but assume the observation for treatment 2 in block 3 is missing and the observed totals are \(R'=135.1\), \(T'=89.5\), \(G'=590.2\). Find \(\hat x\) and complete the ANOVA. (This is the worked example in the notes; answer: \(\hat x\approx 33.1\).)
17.3 Latin Square Design
Explanation: Controls two sources of variation.
1. Concept of LSD
(A) Complete 3×3 LSD — concise walk‑through
Suppose the 3×3 table (treatments labelled A,B,C) has observations:
|  | Col1 | Col2 | Col3 |
|---|---|---|---|
| Row1 | 10(A) | 12(B) | 11(C) |
| Row2 | 13(B) | 14(C) | 15(A) |
| Row3 | 9(C) | 8(A) | 16(B) |
Compute treatment totals: \(T_A=10+15+8=33,\; T_B=12+13+16=41,\; T_C=11+14+9=34\). Grand total \(G=108\).
Then form \(\mathrm{SS}_{trt}\), \(\mathrm{SS}_{rows}\), \(\mathrm{SS}_{cols}\) and \(\mathrm{SS}_E\). (The algebra is kept concise here; the full numeric ANOVA follows from the standard sum-of-squares formulas.)
(B) Estimation of one missing value (summary)
If one cell is missing (row \(i\), col \(j\), treatment \(k\)), estimate with:
\[ \widehat{y}_{ij(k)} = \dfrac{t(R_i + C_j + T_k) - 2G}{(t-1)(t-2)}. \]
After replacing the missing value, use the completed ANOVA but reduce the total and error d.f. by 1.
9. Problems & practice questions
- Construct a 4×4 Latin square with numeric observations; compute the ANOVA table and test treatments at the 5% level.
- For a 5×5 Latin square with one missing observation at (row 2, col 4, treatment C), given the relevant marginal totals, estimate the missing value with the missing-value formula above.
- Compare CRD, RBD and LSD using hypothetical MSEs: CRD=10, RBD=6, LSD=3. Compute relative efficiencies and interpret.
- Find an orthogonal Latin square for t=3 (note: small t has limitations) and discuss if Graeco‑Latin square is possible.
10. Critical differences: LSD vs RBD vs CRD
Summary of key differences:
| Feature | CRD | RBD (RCBD) | LSD |
|---|---|---|---|
| Blocks controlled | none | one | two (rows & columns) |
| Number of experimental units | flexible | needs blocks × treatments | must be t×t |
| Error df | N-t | (b-1)(t-1) | (t-1)(t-2) |
| When preferable | homogeneous experimental units | one nuisance factor important | two nuisance factors important & equal levels |
| Complexity | simple | moderate | higher (layout constraints) |
Example (critical differences)
If field variability exists in two directions (rows and columns), LSD will often yield lower MSE than RBD and CRD. But if only one direction matters, RBD may suffice and is simpler.
References & further reading
Standard textbooks: D. C. Montgomery, *Design and Analysis of Experiments*; Cochran & Cox, *Experimental Designs*; and class lecture notes on Latin squares. For missing-value derivations, see lecture notes on imputation in experimental designs.
Latin Square Design (LSD) — Estimating one missing value & efficiencies (versus RBD, CRD)
You can find derivations of this in standard ANOVA/DOE lecture notes (see references).
Example problem (practice)
Given an order‑3 Latin square with observed table (one missing at row2,col2):
|  | C1 | C2 | C3 |
|---|---|---|---|
| R1 | A:10 | B:12 | C:11 |
| R2 | B:13 | C:? | A:14 |
| R3 | C:9 | A:8 | B:15 |
Compute \(R_2, C_2, T_{\text{treatment at missing}}\) and \(G\) (exclude missing), then use the formula (with \(t=3\)) to estimate the missing value. (Try it — for \(t=3\) the denominator \((t-1)(t-2)=2\times1=2\).)
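One way to check your answer is to code the formula and plug in the marginal totals read off the table (row 2 observed: \(13+14=27\); column 2 observed: \(12+8=20\); treatment C observed: \(11+9=20\); grand total \(92\)). The function name is ours:

```python
def lsd_missing_value(R, C, T, G, t):
    """Estimate one missing LSD cell: [t(R + C + T) - 2G] / ((t-1)(t-2))."""
    return (t * (R + C + T) - 2 * G) / ((t - 1) * (t - 2))

# Marginal totals from the 3x3 example above (missing cell excluded everywhere):
y_hat = lsd_missing_value(R=27, C=20, T=20, G=92, t=3)
```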
Efficiency: RBD relative to CRD (intuitive + formula)
Why RBD can be more efficient: Randomized Block Design (RCBD/RBD) accounts for a known nuisance factor (blocks). When blocks explain variation, the residual (error) variance is reduced compared to CRD — so treatment comparisons become more precise.
Relative efficiency (practical definition)
A common definition of relative efficiency of RBD compared to CRD is the ratio of their mean‑squared errors (or the ratio of sample sizes required to achieve the same precision):
\[ \text{RE}_{\text{RBD:CRD}} = \frac{\text{MSE}_{\text{CRD}}}{\text{MSE}_{\text{RBD}}}. \]
If \(\text{RE}>1\), then RBD is more efficient (i.e. a CRD would need roughly \(\text{RE}\) times more observations per treatment to match the precision).
How to estimate \(\text{MSE}_{\text{CRD}}\) when you only have RBD data: common practice is to re‑compute the CRD MSE by "undoing" the block partitioning — for example, use the RCBD sums of squares to form a hypothetical CRD error:
\[ \text{MSE}_{\text{CRD}} \approx \frac{\text{SS}_{\text{blocks}} + \text{SS}_{\text{error (RBD)}}}{\text{df}_{\text{CRD}}}, \]
and then take the ratio with \(\text{MSE}_{\text{RBD}}\). Textbook notes and lecture slides give worked examples; the practical formula used by many instructors is
\[ \text{RE} \approx \frac{\text{MSE}_{\text{CRD (reconstructed)}}}{\text{MSE}_{\text{RBD}}}. \]
Short numeric illustration
Suppose an RBD ANOVA gives \(\text{MSE}_{RBD}=3.7\) and a hypothetical CRD MSE (reconstructed) would be \(\text{MSE}_{CRD}=5.2\). Then
\[ \text{RE} = 5.2/3.7 \approx 1.41. \]
This means a CRD would need about 1.41 times as many observations per treatment to achieve the same precision as the RBD.
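The reconstruction described above can be sketched in Python. Taking \(\text{df}_{\text{CRD}}\) as the pooled \(\text{df}_{\text{blocks}}+\text{df}_{\text{error}}\) is an assumption consistent with the pooled-SS formula; the function name is ours:

```python
def reconstructed_relative_efficiency(ss_blocks, df_blocks, ss_error, df_error):
    """RE = MSE_CRD(reconstructed) / MSE_RBD, pooling block and error SS for the CRD MSE."""
    mse_rbd = ss_error / df_error
    mse_crd = (ss_blocks + ss_error) / (df_blocks + df_error)   # df_CRD = pooled df (assumption)
    return mse_crd / mse_rbd

# e.g. SS_blocks = 40 (df 3), SS_error = 18 (df 8):
re = reconstructed_relative_efficiency(40, 3, 18, 8)   # MSE_RBD = 2.25, MSE_CRD = 58/11
```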
Efficiency: LSD relative to RBD and CRD
Because LSD blocks in two directions (rows and columns), it can reduce residual variance further than RBD (which blocks in one direction) provided the row and column sources actually contribute meaningful variability. The same relative efficiency idea applies:
\[ \text{RE}_{\text{LSD:RBD}} = \frac{\text{MSE}_{\text{RBD}}}{\text{MSE}_{\text{LSD}}}, \qquad \text{RE}_{\text{LSD:CRD}} = \frac{\text{MSE}_{\text{CRD}}}{\text{MSE}_{\text{LSD}}}. \]
Interpretation: values \(>1\) mean the LSD (the design whose MSE is in the denominator) is more efficient.
Comparative example (toy)
Imagine MSEs from analyses are: \(\text{MSE}_{CRD}=8.0,\; \text{MSE}_{RBD}=4.0,\; \text{MSE}_{LSD}=2.0\). Then
\[ \text{RE}_{RBD:CRD}=8/4=2, \quad \text{RE}_{LSD:RBD}=4/2=2, \quad \text{RE}_{LSD:CRD}=8/2=4. \]
So the LSD here is 4× more efficient than CRD, and 2× more efficient than RBD, for treatment comparisons.
Practical remarks and when LSD helps
- LSD is best when you have exactly \(t\) treatments and can block in two directions (e.g., rows and columns) — otherwise it's not applicable.
- The gain in precision depends on how much variation is explained by rows and columns. If the estimated row and column effects turn out to be negligible, LSD may not beat RBD.
- Degrees of freedom for error in LSD are limited: \((t-1)(t-2)\). So for small \(t\) there are few error df — be cautious when drawing strong conclusions.
- When a missing observation is imputed, adjust total and error df by −1 before testing.
More worked problems (suggested exercises)
- Take the 4×4 numeric example in section 3, replace the missing value with the estimate, form the ANOVA table and show the adjustment of df. Compute F for treatments.
- Given an RBD ANOVA with SSblocks = 40 (df = 3), SSE = 18 (df = 8), compute MSE_RBD and reconstruct MSE_CRD and the relative efficiency RE = MSE_CRD / MSE_RBD.
- Design a 5×5 LSD dataset with two strong blocking directions and show numerically how MSE drops from CRD → RBD → LSD.
18.1 Control Charts
Explanation: Monitor process variation over time.
Example: X̄ & R charts for product weight.
18.2 Process Capability
Indices: \(C_p=\frac{USL-LSL}{6\sigma}\), \(C_{pk}=\frac{\min(\mu-LSL,USL-\mu)}{3\sigma}\).
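Both indices are one-liners; a sketch showing how off-centering lowers \(C_{pk}\) but not \(C_p\) (spec limits and process parameters are illustrative):

```python
def c_p(usl, lsl, sigma):
    """Process potential: spec width over 6 sigma."""
    return (usl - lsl) / (6 * sigma)

def c_pk(usl, lsl, mu, sigma):
    """Process capability accounting for centering."""
    return min(mu - lsl, usl - mu) / (3 * sigma)

# Specs 4..10, sigma = 1: a centered process (mu = 7) has Cp == Cpk = 1;
# shifting the mean to 8 lowers Cpk to 2/3 while Cp is unchanged.
cp_val = c_p(10, 4, 1)
cpk_centered = c_pk(10, 4, 7, 1)
cpk_shifted = c_pk(10, 4, 8, 1)
```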
18.3 Acceptance Sampling
Explanation: Decide to accept/reject a batch using sample inspection.
18.4 OC Curve
Explanation: Shows probability of accepting a lot at different defect levels.
19.1 Birth, Death, Fertility Rates
Crude Birth Rate = \(\frac{\text{Births}}{\text{Population}}\times1000\).
19.2 Mortality Rates
Explanation: Age-specific & standardized mortality reflect real health patterns.
19.3 Life Tables
Explanation: Summaries of survivorship; used to compute life expectancy.
19.4 Reproduction Rates
Explanation: Measures replacement level of population.
