Title Statistical Methods in HYDROLOGY-Haan Physics & Mathematics Probability Theory Statistical Theory Statistical Analysis Autoregressive Integrated Moving Average 16.2 MB 516
##### Document Text Contents
Page 1

Statistical Methods in
HYDROLOGY

Page 258

The most common assumptions are:

1. X is a nonrandom variable measured without error, Y is a random variable, and $Y_i \mid X$ is
normally and independently distributed with mean $\alpha + \beta X$ and variance $\sigma^2$.

2. Y and X are both random variables having a joint distribution, the conditional distribution of
Y is $N(\alpha + \beta X, \sigma^2)$, and the marginal distribution of X is independent of $\alpha$, $\beta$, and $\sigma^2$.

It turns out that under either of the above conditions, the procedures given in this chapter are
valid for tests of hypotheses and confidence interval estimation at a specified level of significance.
Graybill (1961) points out that the power of the tests is not the same for the two conditions.

If X is a fixed variable measured without error and $\varepsilon_i$ is independently and identically
distributed $N(0, \sigma^2)$; or Y and X are from a bivariate normal distribution and are measured
without error; or Y and X are from a bivariate non-normal population with the conditional
distribution of Y being $N(\alpha + \beta X, \sigma^2)$ and the marginal distribution of X independent of
$\alpha$, $\beta$, and $\sigma^2$; then the least squares estimates of $\alpha$, $\beta$, and $\sigma^2$ are also maximum
likelihood estimators. The least squares estimates for the regression coefficients are unbiased.
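
The link between least squares and maximum likelihood under the normality assumption can be sketched as follows. Here $\alpha$, $\beta$, and $\sigma^2$ denote the intercept, slope, and error variance of the model; the sample size $n$ and the observation index $i$ are notation assumed for this sketch.

```latex
\ln L(\alpha,\beta,\sigma^2)
  = -\frac{n}{2}\,\ln\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(Y_i-\alpha-\beta X_i\bigr)^2
```

For any fixed $\sigma^2$, the log-likelihood is largest when the sum of squared deviations is smallest, which is the least squares criterion for $\alpha$ and $\beta$.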

If significant measurement errors are made on the X variables, then complications arise. For
this situation, see Graybill (1961) or Johnston (1963). Certainly, measurement errors are always
present; however, if these errors are small relative to X, then the theory presented in this
chapter and chapters 10, 11, and 12 may still be applied.

The reason that measurement errors on X cause problems can be seen by considering the
model $Y = \alpha + \beta X + \varepsilon$. If Y and X contain measurement errors, then Y and X are not observed.
What is observed is $Y^*$ and $X^*$, where

$$Y^* = Y + e_y, \qquad X^* = X + e_x$$

and $e_y$ and $e_x$ are the measurement errors on Y and X. Thus, the normal equations are solved
in terms of $Y^* = \alpha + \beta X^* + \varepsilon$, or $Y + e_y = \alpha + \beta(X + e_x) + \varepsilon = \alpha + \beta X + \beta e_x + \varepsilon$.
Now if $e_x$ is small in comparison to X, this latter equation becomes $Y = \alpha + \beta X + \varepsilon - e_y$, or
$Y = \alpha + \beta X + \varepsilon'$ with $\varepsilon' = \varepsilon - e_y$, which can be handled by the methods outlined in this chapter.

Recall that no distributional assumptions are required to get the least squares estimates for
$\alpha$ and $\beta$. The assumptions are involved when confidence intervals and tests of hypotheses are of
concern, or when it is desired to state that the least squares estimates for $\alpha$ and $\beta$ are also
maximum likelihood estimates. Johnston (1963) points out that the least squares estimates for
$\alpha$ and $\beta$ are biased if significant measurement errors are present on X.
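
The bias caused by measurement error on X can be illustrated with a small simulation. This is only a sketch under stated assumptions: the intercept, slope, and standard deviations are hypothetical values chosen for illustration, and the measurement error is taken to be independent of the true X. Under those assumptions the fitted slope is expected to shrink toward zero by roughly Var(X) / [Var(X) + Var(e_x)].

```python
import numpy as np


def ols_slope(x, y):
    """Least squares slope of y regressed on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))


def simulate_slopes(n=20000, alpha=2.0, beta=1.5,
                    sd_x=1.0, sd_eps=0.5, sd_ex=1.0, seed=0):
    """Return (slope using the true X, slope using X observed with error).

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    x_true = rng.normal(10.0, sd_x, n)                  # error-free X
    y = alpha + beta * x_true + rng.normal(0.0, sd_eps, n)
    x_obs = x_true + rng.normal(0.0, sd_ex, n)          # X* = X + e_x
    return ols_slope(x_true, y), ols_slope(x_obs, y)


slope_clean, slope_noisy = simulate_slopes()
print(f"slope with error-free X: {slope_clean:.3f}")
print(f"slope with noisy X*:     {slope_noisy:.3f}")
```

With equal variances for X and e_x, as assumed here, the slope fitted to the noisy X* should come out near half the true value of 1.5, while the slope fitted to the error-free X should stay close to 1.5.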

One of the assumptions used in developing confidence intervals and tests of hypotheses was
that the $\varepsilon_i$ are independent. If $\varepsilon_i$ is correlated with $\varepsilon_{i+1}$, the least squares estimates of $\alpha$ and $\beta$ are
unbiased; however, the sampling variance of the estimates of $\alpha$ and $\beta$ will be unduly large and will be
underestimated by the least squares formulas for variances, rendering the level of significance of tests of
hypotheses unknown. Also, the sampling variances on predictions made with the resulting
equation will be needlessly large. Correlation between $\varepsilon_i$ and $\varepsilon_{i+1}$ frequently arises when time series
data are being analyzed. This type of correlation is known as autocorrelation or serial correlation.

Page 259

SIMPLE REGRESSION 239

Fig. 9.5. Illustration of situation where $\mathrm{Var}(\varepsilon_i) \neq \sigma^2$ for all i.

Johnston (1963) discusses least squares estimation procedures in the presence of autocorrelation.
Autocorrelation of errors is discussed in more detail in the next chapter of this book.
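
The effect of serially correlated errors on the variance formulas can be checked by simulation. The sketch below assumes a first-order autoregressive error process with a hypothetical lag-one coefficient of 0.7 and a time index as the X variable; none of these values come from the text. It compares the empirical sampling variance of the fitted slope with the average value of the usual least squares variance formula.

```python
import numpy as np


def ar1_errors(n, rho, sd, rng):
    """Stationary AR(1) errors: e[t] = rho * e[t-1] + w[t], w ~ N(0, sd^2)."""
    e = np.empty(n)
    w = rng.normal(0.0, sd, n)
    e[0] = rng.normal(0.0, sd / np.sqrt(1.0 - rho ** 2))
    for t in range(1, n):
        e[t] = rho * e[t - 1] + w[t]
    return e


def slope_and_formula_var(x, y):
    """Least squares slope and its variance from the usual formula s^2 / Sxx."""
    n = len(x)
    xc = x - x.mean()
    sxx = np.dot(xc, xc)
    b = np.dot(xc, y - y.mean()) / sxx
    a = y.mean() - b * x.mean()
    resid = y - a - b * x
    s2 = np.dot(resid, resid) / (n - 2)
    return b, s2 / sxx


def compare_variances(n=50, rho=0.7, reps=2000, seed=1):
    """Return (empirical slope variance, mean formula variance, mean slope)."""
    rng = np.random.default_rng(seed)
    x = np.arange(n, dtype=float)          # time index used as X (assumption)
    slopes = np.empty(reps)
    formula = np.empty(reps)
    for r in range(reps):
        y = 1.0 + 0.2 * x + ar1_errors(n, rho, 1.0, rng)
        slopes[r], formula[r] = slope_and_formula_var(x, y)
    return slopes.var(ddof=1), formula.mean(), slopes.mean()


emp_var, formula_var, mean_slope = compare_variances()
print(f"mean fitted slope:           {mean_slope:.4f}")
print(f"empirical slope variance:    {emp_var:.6f}")
print(f"mean least squares formula:  {formula_var:.6f}")
```

In this setup the mean fitted slope should stay close to the true value of 0.2, while the formula variance should fall well below the empirical variance, in line with the statement that the estimates remain unbiased but their variances are underestimated.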

In some situations the assumption of homoscedasticity [$\mathrm{Var}(\varepsilon_i) = \sigma^2$ for all i] is violated.
Quite commonly, $\mathrm{Var}(\varepsilon_i)$ increases as X increases. Such a situation is depicted in figure 9.5.
Draper and Smith (1966) and Johnston (1963) discuss least squares estimation under this condition.

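
One widely used way to handle an error variance that grows with X is weighted least squares, in which each observation is weighted by the reciprocal of its assumed error variance. The sketch below is not taken from the cited references. It assumes, purely for illustration, that the error standard deviation is proportional to X, so the weights are 1/X²; the function name and all numerical values are hypothetical.

```python
import numpy as np


def weighted_least_squares(x, y, w):
    """Minimize sum(w * (y - a - b*x)**2) and return (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    xw = np.average(x, weights=w)          # weighted mean of x
    yw = np.average(y, weights=w)          # weighted mean of y
    b = np.sum(w * (x - xw) * (y - yw)) / np.sum(w * (x - xw) ** 2)
    a = yw - b * xw
    return float(a), float(b)


rng = np.random.default_rng(2)
x = np.linspace(1.0, 20.0, 200)
# Hypothetical data: error standard deviation grows in proportion to x.
y = 3.0 + 0.8 * x + rng.normal(0.0, 0.3 * x)

a_ols, b_ols = weighted_least_squares(x, y, np.ones_like(x))   # equal weights
a_wls, b_wls = weighted_least_squares(x, y, 1.0 / x ** 2)      # weights 1/Var

print(f"ordinary least squares: a = {a_ols:.3f}, b = {b_ols:.3f}")
print(f"weighted least squares: a = {a_wls:.3f}, b = {b_wls:.3f}")
```

With equal weights the function reduces to ordinary least squares. Whether weights of 1/X² are appropriate depends on how the variance actually changes with X in the data at hand.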
Another point to be made concerning hypothesis testing in general is that a statistically
significant difference and a physically significant difference are two entirely different quantities.
For example, when the hypothesis $H_0$: $\beta = 0$ was tested in example 9.4, the conclusion was that the
regression line explained a significant amount of the variation in Y. This refers to a statistically
significant amount of the variation at the chosen level of significance. It means that, recognizing
an $\alpha$% chance of an error, the relationship Y = a + bX cannot be attributed to chance. It does
not imply a cause and effect relationship between Y and X.

Looking at the confidence limits on the regression as plotted in figure 9.1 and the scatter of
the data, it can be seen that this simple relationship Y = a + bX leaves a lot to be desired in
terms of predicting annual runoff. Whether or not the derived relationship is usable depends on
the use to be made of the predicted values of Y and not on the fact that the hypothesis $H_0$: $\beta = 0$ is rejected.
It may be that the standard error of the equation, s2, is so large as to render the estimate made with
the equation in some particular application too uncertain to be used even though the equation is
explaining a statistically significant portion of the variability in the dependent variable.
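
The gap between statistical and practical significance can be illustrated numerically. The sketch below uses synthetic data, not the data of example 9.4: the sample size, coefficients, and noise level are hypothetical, and the slope is judged against the large-sample two-sided 5% critical value of about 1.96.

```python
import numpy as np


def regression_summary(x, y):
    """Fit Y = a + bX by least squares and report basic diagnostics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = np.dot(xc, xc)
    b = np.dot(xc, yc) / sxx
    a = y.mean() - b * x.mean()
    resid = y - a - b * x
    sse = np.dot(resid, resid)
    s = np.sqrt(sse / (n - 2))             # standard error of the equation
    t_stat = b / (s / np.sqrt(sxx))        # t statistic for H0: slope = 0
    r2 = 1.0 - sse / np.dot(yc, yc)        # fraction of variation explained
    return {"a": float(a), "b": float(b), "s": float(s),
            "t": float(t_stat), "r2": float(r2)}


rng = np.random.default_rng(3)
x = rng.uniform(20.0, 60.0, 2000)                 # hypothetical predictor
y = 2.0 + 0.1 * x + rng.normal(0.0, 5.0, 2000)    # weak signal, large scatter

summary = regression_summary(x, y)
print(f"slope b = {summary['b']:.3f}, t = {summary['t']:.1f}")
print(f"R^2 = {summary['r2']:.3f}, standard error s = {summary['s']:.2f}")
```

In this synthetic case the t statistic should far exceed 1.96, so the hypothesis of zero slope is rejected, yet the regression explains only a small fraction of the variation and the standard error stays close to the noise level of 5. Whether such an equation is usable depends on the application, not on the test result.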

Exercises

9.1. The following data are the maximum air and soil temperatures (bare soil at 2-inch depth)
recorded for the first 30 days of July 1973, at Lexington, Kentucky. Derive a linear relationship
via simple regression for predicting the maximum soil temperature from the maximum air

Page 515

INDEX 495

Spherical model, 430, 431, 438
Spurious correlation, 291-293
Standard deviation, 57, 281
Standard error
  multiple linear regression, 255
  simple regression, 232, 237
Standardized variable, 102, 304
Standard normal distribution, 102-104
  approximations for, 104-106
Standard random normal deviate, 323
State, 380
Stationarity, 92, 372
  in a time series, 338-340
Statistical methods, applying, 4
Statistical tests. See Hypothesis testing
Statistics
  definition of, 6
  descriptive, 42-30
  nonparametric, 194-195
  parametric, 194-195
Stepwise multiple regression, 256
Stillwater (Oklahoma), rainfall data, 341-344
Stochastic component, 337, 374
Stochastic convergence, 19
Stochastic matrix, 381
Stochastic models, 7, 370-374
  Markov models, 375-388
  purely random, 374-375
  selecting, 372-373
Stochastic process, 16, 336-366, 370-388
  continuous, 338
  uncertainty in, 390-391
Streamflow models, stochastic, 371
Student's t distribution. See t Distributions
Subset, 20
Sufficiency, 71-72
Sum of squares, 229-230
  multiple regression, 250
Symmetry, measures of, 58-59
Systematic record, 156-157
System reliability, estimating, 397, 400, 418

Talbot formula, 181
Taylor series expansion, 398, 400-401, 411-412, 414
t Distributions, 143-144
Tests of hypothesis. See Hypothesis testing
Tests of significance. See Hypothesis testing
Theory of errors, 100
Thiessen polygon method, 446-447
3-parameter lognormal distribution, 422
3-parameter Weibull distribution, 136
Time average properties, 338
Time scale
  continuous, 338
  discrete, 84, 338, 349
Time series
  ARIMA, 355-361, 363-364
  autocorrelation, 348-350
  definitions, 336-340
  independence of data, 348-349
  jumps in, 337, 346-348, 374
  parameter estimation
    least squares, 364-366
    maximum likelihood, 366-367
  periodicity, 350-355
  plot, 29
  trend analysis, 337, 340-346, 374
  variance, 351-353
Total probability theorem, 24-25
Total system variance, 298
TP 40 (United States Weather Bureau), 189, 190
Trace of matrix, 298
Transformations, 145-146
  bivariate, 47-48
  logit, 274
  multiple linear regression, 266-268
  Z, 151
Transform methods, reliability/risk analysis, 424
Transition probability, 381-383, 387
  matrix, 381
  n step, 382-383
  one step, 380-381
Translations, along the x axis, 145
Trends, in a time series, 337, 340-346, 374
Triangular distributions, continuous, 116-117
Type I error
  hypothesis testing, 196, 202
  and Kolmogorov-Smirnov test, 215
Type II error
  hypothesis testing, 196, 202-204
  and Kolmogorov-Smirnov test, 214
Type I, II, and III extreme value distributions. See Extreme value distributions

Page 516

Uncertainty
  in geostatistics, 448-449
  in stochastic processes, 390-391
Uncorrelated random variables, 281-282
Uniform distributions, continuous, 114-116
Uniformly most powerful test, 205
Uniform random number, 322
Union, of events, 21
United States Water Resources Council, 156, 182
United States Weather Bureau TP 40, 189, 190
Univariate distribution, 32-39, 44, 53-55
Unordered sampling, 26, 27

Variables
  dependent, 224, 228, 242
  independent, 245, 260
  indicator, 268-271
  lagged, 258, 260
  random. See Random variables
  selection of in regression, 254-255
  standardized, 102, 304
  uncertain, 404
Variance, 57-58
  confidence interval and, 199-200
  of design estimate, 33
  of errors, minimizing, 434-438
  first-order approximation estimate, 407-410
  global estimate for, 448
  grouped data, 58
  hypothesis tests concerning, 209, 210
  of linear function, 65, 298
  multiple regression, 245, 249
  noise, 364
  output random variable, 399-400
  of parameter estimate, 332
  point estimates, 428, 429
  population, 57-58
  of predicted value in regression, 236
  of principal components, 298-300, 304, 306
  of regression coefficients, 232-233, 261
  sample, 57-58
  sample mean, 65
  time series, 351-353
  total system, 298
  weighted linear combination, 435
Variance-Covariance matrix. See Covariance matrix
Variance inflation factor, 262
Variation, coefficient of, 58
Venn diagram
  probability, 21-22, 23
  theorem of total probability, 24

Variables
  dependent, 224, 228, 242, 257, 260
  in multivariate multiple regression, 311-312

Water quality, and frequency analysis, 192
Water quality models
  generic form, 390
  parameters, 396
Water Resources Council, 156, 182
Weibull distribution, 131, 134-138
  3-parameter, 136
Weibull plotting position, 154-156
Weighted least squares, 431
Weighted mean, 57
Weighted probability, 25
Weights, estimation, 434-438
White noise, 356

Yule-Walker equation, 361, 364

Zero, probability of, 20
Zeros, treatment of, 168-176
Zonal anisotropy, 445
Z transformation, 151