Polynomial Regression
Questions:
Is it possible for statistical interaction to occur in the context of one independent variable and the dependent variable?
Why is the inclusion of a polynomial or a logarithm in an OLS regression referred to as a nonlinear model?
How can one determine if a nonlinear relationship exists?
How can one determine the functional form of the nonlinear relationship?
How do you interpret the estimates of a polynomial regression?
What do b0, b1, and b2 represent in a polynomial regression?
How do you obtain standard errors and t-ratios for the parameter estimates in polynomial regression?
When including a polynomial must one also include the constituent variable?
Does the polynomial regression also experience increased collinearity between the lower and higher ordered term?
Can the effect of X1 on the dependent variable have fluctuating significance at different values of X1 in a polynomial regression?
Does the addition of the higher order term (X1^2) in a polynomial model change the parameter estimate of the lower order term (X1)?
Why would one use the logarithmic transformation in the case of nonlinear relationships?
How do you interpret an independent variable that is a logarithm?
How does one obtain standard errors and t-ratios for the parameter estimates for logarithmic nonlinear regression?
Can a statistical interaction occur between more than two variables in an interactive model?
Is it possible for statistical interaction
to occur in the context of one independent variable and the dependent
variable?
Yes. The impact of an independent variable on the dependent variable
can vary according to the value of that same independent variable.
There are two general types of nonlinear relationship: polynomial and
logarithmic. A polynomial regression uses regressors that are
successive powers of the independent variable. Logarithms are useful
when differences between scores should be weighted in proportion to
their ratio rather than their absolute difference. Both the polynomial
and logarithmic relationships resemble the interactive model in that
the impact of an independent variable on the dependent variable is not
constant, as it is in a traditional linear, additive regression.
Why is the inclusion of a polynomial or a
logarithm in an OLS regression referred to as a nonlinear
model?
In standard statistical analysis the relationship between the dependent
and independent variables is treated as linear. This means that the rate
of change in the dependent variable produced by changes in the
independent variable does not vary with the values of the independent
variable. A nonlinear relationship is one in which the effect of the
independent variable on the dependent variable varies according to the
value of that independent variable. Including a polynomial or
logarithmic term makes the model nonlinear in the variable, although it
remains linear in its parameters and can therefore still be estimated
by OLS.
How can one determine if a nonlinear
relationship exists?
There are several methods for determining whether the effect of an
independent variable on the dependent variable varies according to the
value of that same independent variable. First, one can use a simple
bivariate plot of the dependent variable against the lower order
independent variable (Cohen and Cohen 1983). If the data points do not
resemble a straight line, one can suspect that the relationship is not
linear. Second, one can group the independent variable into ordered
discrete categories and run a one-way analysis of variance on the
dependent variable (Bohrnstedt and Knoke 1994). Next, regress the
dependent variable on the original independent variable. Then use the
eta-square from the analysis of variance and the R^2 from the
regression to calculate an F-ratio. If the F-ratio is statistically
significant, one can conclude that the relationship is nonlinear. Third,
Darlington (1990) suggests examining a residual scatterplot of the
relationship between the dependent and independent variable. This
approach is more complex because the nonlinearity shows up only if the
appropriate plot is chosen. Finally, one can include a nonlinear term
in the regression; however, this requires that one know the correct
functional form of the relationship.
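
The second method, comparing the analysis of variance with the linear
regression, can be sketched in Python. The sketch assumes the usual test
for departure from linearity, F = [(eta-square - R^2)/(K - 2)] /
[(1 - eta-square)/(N - K)], with K categories and N cases. The function
name and the input values are hypothetical and are not taken from the
sources cited.

```python
def nonlinearity_f_ratio(eta_sq, r_sq, n_categories, n_cases):
    """F-ratio for departure from linearity.

    eta_sq: eta-square from the one-way ANOVA on the K ordered categories
    r_sq: R-square from regressing Y on the independent variable
    Returns (F, df_numerator, df_denominator).
    """
    if n_categories < 3:
        raise ValueError("need at least 3 categories to test for nonlinearity")
    if n_cases <= n_categories:
        raise ValueError("need more cases than categories")
    df_num = n_categories - 2
    df_den = n_cases - n_categories
    f_ratio = ((eta_sq - r_sq) / df_num) / ((1.0 - eta_sq) / df_den)
    return f_ratio, df_num, df_den


# Hypothetical inputs: 5 categories, 200 cases.
f_ratio, df_num, df_den = nonlinearity_f_ratio(0.30, 0.22, 5, 200)
print(f"F({df_num}, {df_den}) = {f_ratio:.2f}")
```

The resulting F is compared with the critical value for the reported
degrees of freedom.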
How can one determine the functional form
of the nonlinear
relationship?
Cohen and Cohen (1983) indicate that the curve fitting necessary to
determine the proper transformation is quite mathematically advanced.
The researcher should rely on a priori information or theory in testing
the nonlinear relationship. If one is working in an area that has
traditionally hypothesized a nonlinear relationship, then the
researcher should employ the previously specified relationship. For
example, in state legislative research the population size of a state
is typically examined in logarithmic form. In terms of polynomials, the
quadratic fit is the most commonly used transformation: the inclusion
of x and x^2, or age and age^2. If there is no a priori information and
a researcher suspects a specific relationship, one should run a
regression with and without the specified transformation. The p-value
of the individual coefficient indicates whether the included
transformation is significant, and the F test comparing the two models
indicates whether the model's fit improves with the inclusion of the
transformation.
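
A minimal sketch of the with-and-without comparison, assuming the two
regressions have already been estimated and only their R^2 values are
at hand. It uses the standard incremental F-test for nested OLS models.
The function name and the example numbers are hypothetical.

```python
def incremental_f_ratio(r_sq_full, r_sq_reduced, n_cases, k_full, k_added):
    """F-ratio for the improvement in fit from adding k_added regressors.

    r_sq_full: R-square of the model that includes the transformation
    r_sq_reduced: R-square of the model without it
    k_full: number of regressors in the full model (intercept not counted)
    Returns (F, df_numerator, df_denominator).
    """
    df_num = k_added
    df_den = n_cases - k_full - 1
    if df_num < 1 or df_den < 1:
        raise ValueError("degrees of freedom must be positive")
    f_ratio = ((r_sq_full - r_sq_reduced) / df_num) / ((1.0 - r_sq_full) / df_den)
    return f_ratio, df_num, df_den


# Hypothetical example: adding age^2 to a model that already contains age.
f_ratio, df_num, df_den = incremental_f_ratio(0.25, 0.20, 103, 2, 1)
print(f"F({df_num}, {df_den}) = {f_ratio:.2f}")
```

When a single term is added, this F equals the square of that term's
t-ratio, so the two checks described above agree.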
How do you interpret the estimates of a
polynomial regression?
The following equation will serve as the model interpreted
here:
$$Y = b_0 + (b_1 + b_2 X_1) X_1 = b_0 + b_1 X_1 + b_2 X_1^2$$
The effect of X1 on Y varies at different values of X1, so it is
necessary to determine the metric effect. This is very similar to the
metric effect for the interactive model. In this case the metric effect
is $b_1 + 2 b_2 X_1$: multiply b2 by two and by the chosen value of X1,
then add b1. This computation can be used for any value of the lower
order term. The result is an estimate of the conditional effect of X1
on Y at the specified value of X1. As with the multiplicative model in
which two independent variables interact, if a variable interacts with
itself and the squared term is not included, the researcher fails to
estimate the correct conditional relationship.
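
The metric effect, b1 plus two times b2 times X1, can be evaluated at
several values of X1 with a short Python sketch. The coefficient values
are hypothetical and were chosen only to show the effect changing sign.

```python
def conditional_effect(b1, b2, x1):
    """Slope of Y on X1 at a given X1 for Y = b0 + b1*X1 + b2*X1**2."""
    return b1 + 2.0 * b2 * x1


# Hypothetical estimates, for illustration only.
b1, b2 = 4.0, -0.05

for x1 in (10, 40, 60):
    print(f"X1 = {x1}: effect = {conditional_effect(b1, b2, x1):.2f}")
```

With these assumed values the effect is positive at low values of X1,
roughly zero near X1 = 40, and negative beyond that point.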
What do b0, b1, and b2 represent in a polynomial regression?
b0 is the predicted value of Y when X1 equals zero, b1 is the slope of
Y on X1 when X1 equals zero, and b2 describes how that slope changes as
X1 changes: the slope shifts by two times b2 for each one-unit increase
in X1. The conditional nature of the relationship means that the
standard interpretation is, strictly speaking, no longer applicable.
There are some instances where X1 never equals zero, in which case
neither the intercept nor b1 is interpretable apart from b2 (Friedrich
1982). The slopes represent tangent lines to the fitted curve;
therefore one cannot claim that a unit change in X1 leads to a constant
change in Y.
How do you obtain standard errors and
t-ratios for the parameter estimates in polynomial regression?
The standard error and t-ratio for different levels of
X1 can be obtained with
appropriate formulas. Using the outlined example above, it
is possible to determine the standard error for the metric effect by
taking the square root of the following
equation:
$$\mathrm{var}(b_1) + 4 X_1^2 \, \mathrm{var}(b_2) + 4 X_1 \, \mathrm{cov}(b_1, b_2)$$
Note that this is the standard error of the effect of X1 at the
specified value of X1. The t-ratio is then obtained by dividing the
metric effect by this standard error at the same value of X1.
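
A minimal Python sketch of this computation follows. It assumes the
variances and the covariance of b1 and b2 have been read from the
estimated coefficient covariance matrix that most regression software
can report. All numeric values and function names are hypothetical.

```python
import math


def conditional_effect_se(var_b1, var_b2, cov_b1_b2, x1):
    """Standard error of b1 + 2*b2*X1 at a given value of X1."""
    variance = var_b1 + 4.0 * x1**2 * var_b2 + 4.0 * x1 * cov_b1_b2
    if variance < 0:
        raise ValueError("inputs do not form a valid covariance matrix")
    return math.sqrt(variance)


def conditional_t_ratio(b1, b2, var_b1, var_b2, cov_b1_b2, x1):
    """t-ratio of the metric effect at a given value of X1."""
    effect = b1 + 2.0 * b2 * x1
    return effect / conditional_effect_se(var_b1, var_b2, cov_b1_b2, x1)


# Hypothetical estimates and (co)variances, for illustration only.
b1, b2 = 4.0, -0.05
var_b1, var_b2, cov_b1_b2 = 1.0, 0.0004, -0.018

for x1 in (10, 40):
    se = conditional_effect_se(var_b1, var_b2, cov_b1_b2, x1)
    t = conditional_t_ratio(b1, b2, var_b1, var_b2, cov_b1_b2, x1)
    print(f"X1 = {x1}: SE = {se:.3f}, t = {t:.2f}")
```

Under these assumed inputs the same coefficients give a large t-ratio
at X1 = 10 and a t-ratio near zero at X1 = 40.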
When including a polynomial must one also
include the constituent variable?
As in the case of a multiplicative term, polynomial regression requires
the inclusion of both the lowest order term and the subsequent higher
power terms. In the quadratic example of age and age^2, including both
terms fits a parabola that can be located anywhere. If the model
included only the higher order regressor, the parabola's maximum or
minimum would be forced to fall exactly at a value of zero on the
independent variable. This removes the ability to allow the curve to be
located anywhere (Darlington 1990).
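
One way to see this, sketched here with the generic symbols of the
quadratic model rather than the age example, is to set the metric
effect to zero and solve for the turning point:

```latex
\frac{dY}{dX_1} = b_1 + 2 b_2 X_1 = 0
\quad\Longrightarrow\quad
X_1^{*} = -\frac{b_1}{2 b_2}
```

Omitting the lower order term amounts to fixing b_1 at zero, which
places the turning point at X_1 = 0 whatever the data show.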
Does the polynomial regression also
experience increased collinearity between the lower and higher ordered
term?
Yes, the higher and lower ordered terms are usually highly correlated.
This may increase the standard error for the lower order term but poses
little threat to the standard error of the conditional slope,
$b_1 + 2 b_2 X_1$. The correlation produces substantial covariance
between the coefficients, which reduces the conditional standard error
at some levels of the lower order term to a value equal to or less than
that of the model without the higher order term. As with the
interaction between two independent variables, the collinearity in
polynomial regression is not damaging to the analysis.
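
The size of the correlation between a variable and its square can be
checked directly. The sketch below uses hypothetical data in which X1
takes the values 1 through 10; real data will give a different figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: X1 takes the values 1 through 10.
x1 = list(range(1, 11))
x1_squared = [x**2 for x in x1]

r = pearson_r(x1, x1_squared)
print(f"correlation between X1 and X1^2: {r:.3f}")
```

For strictly positive values such as these, the correlation is close to
one, which is the collinearity the answer above describes.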
Can the effect of X1
on the dependent variable have fluctuating
significance at different values of X1 in a polynomial
regression?
As with the interaction between two independent variables, the impact
of X1 on Y can be significant at some values of X1 and nonsignificant
at other values. This is because the polynomial represents a
conditional relationship, whereas the traditional linear regression
represents a general relationship. The inclusion of the polynomial
allows the effect of X1 on the dependent variable to vary with the
value of X1, whereas the linear regression estimates a constant effect
of X1 on the dependent variable. Failing to account for the nonlinear
relationship in an ordinary linear regression can lead to wrong
conclusions about the effect of the independent variable on the
dependent variable.
Does the addition of the higher order term (X1^2) in a
polynomial model change the parameter estimate of the lower order term
(X1)?
This aspect of polynomial regression has the same implications as in
nonadditive models. The inclusion of the higher order term can
dramatically affect the parameter estimate of the lower order term,
because the traditional linear regression estimates a general effect
and the nonlinear approach estimates a conditional relationship. In
essence, the parameters can differ because the polynomial regression
controls for the curvilinear effect of the independent variable on the
dependent variable.
Why would one use the logarithmic
transformation in the case of nonlinear relationships?
The use of a logarithm is warranted when the effect of the independent
variable on the dependent variable varies, for example when the effect
is stronger at low values of X than at high values. This type of
relationship is curvilinear, as is the polynomial relationship. The
polynomial relationship involves a variable multiplied by itself,
whereas the logarithmic relationship replaces the variable with its
base 10 or natural logarithm. The choice of base changes the
transformed values only by a constant multiple. The logarithm captures
proportional rather than absolute differences between values of the
same variable, for example population.
How do you interpret an independent
variable that is a logarithm?
The following equation is employed to answer this question:
$$Y = b_0 + b_1 \log_{10}(X_1)$$
Using an example taken from Knoke and Bohrnstedt (1994), the above
equation is a model estimating the impact of the base 10 logarithm of
age in years at first marriage on the expected number of children. The
OLS regression estimates are: $Y = 11.3 - 6.75 \log_{10}(X_1)$. The
nonlinear relationship indicates that the effect of age on the expected
number of children is not constant across values of age. To obtain the
predicted value at a given age, take the log of that age, multiply it
by the coefficient for X1, and add the intercept. For example, a woman
who first married at age 17 (log10 = 1.23) would be predicted to have
11.3 - (6.75)(1.23) = 3.00 children. The metric effect at a specific
age is the slope of this curve. With a natural logarithm it is the
coefficient divided by the value of age (b1/X1); with the base 10
logarithm used here it is $b_1 / (X_1 \ln 10)$.
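
The arithmetic of this example can be reproduced with a short Python
sketch. The coefficients are the estimates reported in the text; the
function names are hypothetical. The slope is computed as the calculus
derivative of the fitted equation, which for a base 10 logarithm
includes a factor of 1/ln(10).

```python
import math

# Estimates reported in the text: Y = 11.3 - 6.75 * log10(age at first marriage).
B0, B1 = 11.3, -6.75


def predicted_children(age):
    """Predicted number of children for a given age at first marriage."""
    return B0 + B1 * math.log10(age)


def metric_effect(age):
    """Slope of Y on age at a given age: derivative of B1 * log10(age)."""
    return B1 / (age * math.log(10))


for age in (17, 25, 35):
    print(f"age {age}: predicted = {predicted_children(age):.2f}, "
          f"slope = {metric_effect(age):.3f}")
```

The slope is negative at every age but shrinks toward zero as age
rises, which is the pattern a logarithmic term is meant to capture.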
How does one obtain standard errors and
t-ratios for the parameter estimates for logarithmic nonlinear
regression?
As with polynomial and multiplicative interactions, formulas can be
used to derive standard errors and t-ratios at different levels of the
logarithmic variable. For the metric effect b1/X1 described above, the
variance is:
$$\frac{1}{X_1^2} \, \mathrm{var}(b_1)$$
The standard error is the square root of this quantity, SE(b1)/X1.
With a base 10 logarithm, both the metric effect and its standard error
are additionally divided by ln 10. The t-ratio is the metric effect
divided by its standard error at a particular value of X1; because X1
cancels, it equals b1/SE(b1) at every value of X1.
Can a statistical interaction occur
between more than two variables in an interactive
model?
Higher order interaction effects are possible, meaning that an
interaction exists among three or more variables. Such an interaction
is less common than the interaction between two variables; however, it
is possible. The following equation represents a three way interaction:
$$Y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_1 X_2 + b_5 X_1 X_3 + b_6 X_2 X_3 + b_7 X_1 X_2 X_3$$
This means the effect of X1 on Y is not uniform over all combinations
of X2 and X3, that the effect of X2 on Y is not uniform over all
combinations of X1 and X3, and so on.
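
As a sketch of what this implies, differentiating the three way
equation with respect to X_1 gives the conditional effect of X_1, which
depends on both X_2 and X_3:

```latex
\frac{\partial Y}{\partial X_1} = b_1 + b_4 X_2 + b_5 X_3 + b_7 X_2 X_3
```

The corresponding expressions for X_2 and X_3 follow by the same step.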
Last updated: August 16, 1999