Polynomial Regression


Questions:
Is it possible for statistical interaction to occur in the context of one independent variable and the dependent variable?
Why is the inclusion of a polynomial or a logarithm in an OLS regression referred to as a nonlinear model?
How can one determine if a nonlinear relationship exists?
How can one determine the functional form of the nonlinear relationship?
How do you interpret the estimates of a polynomial regression?
What do b0, b1, and b2 represent in a polynomial regression?
How do you obtain standard errors and t-ratios for the parameter estimates in polynomial regression?
When including a polynomial, must one also include the constituent variable?
Does polynomial regression also experience increased collinearity between the lower- and higher-order terms?
Can the effect of X1 on the dependent variable have fluctuating significance at different values of X1 in a polynomial regression?
Does the addition of the higher order term (X1²) in a polynomial model change the parameter estimate of the lower order term (X1)?
Why would one use the logarithmic transformation in the case of nonlinear relationships?
How do you interpret an independent variable that is a logarithm?
How does one obtain standard errors and t-ratios for the parameter estimates for logarithmic nonlinear regression?
Can a statistical interaction occur between more than two variables in an interactive model?

Is it possible for statistical interaction to occur in the context of one independent variable and the dependent variable?

Yes. The impact of an independent variable on the dependent variable can vary according to the value of that same independent variable. There are two general types of nonlinear relationship: polynomial and logarithmic. A polynomial regression uses regressors that are successive powers of the independent variable (X, X², and so on). (Polynomial Example) The second type is a logarithmic relationship. Logarithms are useful when differences between scores should be expressed in proportion to their ratio rather than in terms of their absolute difference. (Logarithmic Example) Both polynomial and logarithmic relationships resemble the interactive model in that the impact of an independent variable on the dependent variable is not constant, as it is in a traditional linear, additive regression.

Why is the inclusion of a polynomial or a logarithm in an OLS regression referred to as a nonlinear model?

In standard regression analysis the relationship between the dependent and independent variables is treated as linear: the rate of change in the dependent variable produced by a change in the independent variable does not vary with the value of the independent variable. A nonlinear relationship is one in which the effect of the independent variable on the dependent variable varies according to the value of that independent variable. Adding a polynomial or logarithmic term makes the model nonlinear in this sense, because Y becomes a curved function of X. The model is still linear in its parameters (Y is a weighted sum of regressors such as X and X², or log X), which is why it can still be estimated by OLS.
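
To make the distinction concrete, the sketch below (Python with NumPy, using made-up, noise-free data) fits Y on the regressors 1, X, and X² by ordinary least squares. The fitted curve bends, yet the estimation is an ordinary linear least-squares problem in b0, b1, and b2.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 2 + 0.5*X - 0.05*X^2
x = np.arange(0.0, 11.0)                 # X = 0, 1, ..., 10
y = 2 + 0.5 * x - 0.05 * x**2

# Columns 1, X, X^2: Y is curved in X but is a weighted sum of these
# columns, i.e. linear in the parameters b0, b1, b2, so OLS applies.
design = np.column_stack([np.ones_like(x), x, x**2])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2 = coefs
print(coefs)  # recovers approximately b0 = 2, b1 = 0.5, b2 = -0.05
```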

How can one determine if a nonlinear relationship exists?

There are a few methods for determining whether the effect of an independent variable on the dependent variable varies according to the value of that same independent variable. First, one can use a simple bivariate plot of the dependent variable against the lower order independent variable (Cohen and Cohen 1983). If the data points do not resemble a straight line, one can suspect that the relationship is not linear. Second, one can group the independent variable into ordered discrete categories and run a one-way analysis of variance of the dependent variable across those categories (Bohrnstedt and Knoke 1994). Next, regress the dependent variable on the original independent variable. Then use the eta-squared from the analysis of variance and the R² from the regression to calculate an F-ratio. If the F-ratio is statistically significant, one can conclude that the relationship is nonlinear. Third, Darlington (1990) suggests examining a scatterplot of the residuals from the regression of the dependent variable on the independent variable. This approach is more demanding, because detecting the relationship requires choosing the right plot. Finally, one can include a nonlinear term in the regression; however, this requires knowing the correct functional form of the relationship.
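
The second method can be sketched as follows. The text does not spell out the F-ratio, so this assumes the usual test for departure from linearity, F = [(η² − R²)/(J − 2)] / [(1 − η²)/(N − J)], where J is the number of categories and N the number of cases, with J − 2 and N − J degrees of freedom. The function name `linearity_f_test` and the data are hypothetical.

```python
import numpy as np

def linearity_f_test(x, y):
    """F test for departure from linearity (assumed standard form).

    Compares eta^2 from a one-way ANOVA of y across the distinct values of x
    (x is assumed to be already coded into J ordered categories) with R^2
    from the bivariate linear regression of y on x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    categories = np.unique(x)
    j = categories.size
    ss_total = np.sum((y - y.mean()) ** 2)

    # eta^2: share of the variance of y explained by the category means
    ss_between = sum(
        y[x == c].size * (y[x == c].mean() - y.mean()) ** 2 for c in categories
    )
    eta_sq = ss_between / ss_total

    # R^2 of the linear regression = squared Pearson correlation
    r_sq = np.corrcoef(x, y)[0, 1] ** 2

    f = ((eta_sq - r_sq) / (j - 2)) / ((1 - eta_sq) / (n - j))
    return eta_sq, r_sq, f, (j - 2, n - j)

# Hypothetical data: five ordered categories, three cases each, U-shaped means
x = np.repeat([1, 2, 3, 4, 5], 3)
y = (x - 3) ** 2 + np.tile([-0.5, 0.0, 0.5], 5)
eta_sq, r_sq, f, df = linearity_f_test(x, y)
print(f"eta^2 = {eta_sq:.3f}, R^2 = {r_sq:.3f}, F = {f:.1f} on df {df}")
```

Here the category means follow a U shape that a straight line misses entirely (R² is essentially zero), so F is large (56 on 3 and 10 degrees of freedom); its p-value would come from an F table or a statistics library.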

How can one determine the functional form of the nonlinear relationship?

Cohen and Cohen (1983) indicate that the curve fitting needed to determine the proper transformation is quite mathematically advanced. The researcher should therefore rely on a priori information or theory when testing for a nonlinear relationship. If one is working in an area that has traditionally hypothesized a nonlinear relationship, the researcher should employ the previously specified relationship. For example, in state legislative research the population size of a state is typically examined in logarithmic form. Among polynomials, the quadratic is the most commonly used transformation: the inclusion of X and X² (for example, age and age²). If there is no a priori information but a researcher suspects a specific relationship, one should run the regression with and without the specified transformation. The p-value of the added coefficient indicates whether the transformation is significant, and an F test comparing the two models indicates whether model fit improves with its inclusion.
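
A minimal sketch of the with-and-without comparison, assuming hypothetical simulated data: the F-ratio below tests the gain in R² when the squared term is added. The helper names `r_squared` and `incremental_f` are illustrative, not from any particular package.

```python
import numpy as np

def r_squared(design, y):
    """R^2 of an OLS fit of y on the columns of `design` (including a constant)."""
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coefs
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def incremental_f(r2_reduced, r2_full, n, k_full, q):
    """F-ratio for the gain in R^2 from adding q terms to a model.

    k_full counts the predictors in the full model, excluding the intercept.
    Compare with an F distribution on (q, n - k_full - 1) degrees of freedom.
    """
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - k_full - 1))

# Hypothetical data: curvilinear age effect plus random noise
rng = np.random.default_rng(0)
age = np.arange(20.0, 61.0)
y = 10 + 0.8 * age - 0.01 * age**2 + rng.normal(0.0, 0.5, size=age.size)

ones = np.ones_like(age)
r2_linear = r_squared(np.column_stack([ones, age]), y)
r2_quadratic = r_squared(np.column_stack([ones, age, age**2]), y)
f = incremental_f(r2_linear, r2_quadratic, n=age.size, k_full=2, q=1)
print(f"R^2 without age^2: {r2_linear:.3f}, with age^2: {r2_quadratic:.3f}, "
      f"F(1, {age.size - 3}) = {f:.1f}")
```

When a single term is added (q = 1), this F equals the square of the t-ratio for the added coefficient, so the coefficient test and the model comparison agree.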

How do you interpret the estimates of a polynomial regression?

The following equation will serve as the model interpreted here:

$$Y = b_0 + (b_1 + b_2 X_1) X_1 = b_0 + b_1 X_1 + b_2 X_1^2$$


Because the effect of X1 on Y varies with the value of X1, it is necessary to compute the metric effect. This is very similar to the metric effect in the interactive model. Here the metric effect is the slope of the fitted curve, b1 + 2b2X1 (note the factor of 2: the bracketed term b1 + b2X1 in the equation above is not the slope). It is obtained by multiplying b2 by two, multiplying the result by a chosen value of X1, and adding b1. This computation can be carried out for any value of the lower order term, and the result estimates the conditional effect of X1 on Y at that value of X1. As with the multiplicative model in which two independent variables interact, if a variable interacts with itself and the squared term is omitted, the researcher fails to estimate the correct conditional relationship. (Numerical Example)
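
The calculation can be checked numerically. The sketch below uses hypothetical estimates (b0 = 1.0, b1 = 0.8, b2 = -0.01, with X1 read as age) and compares b1 + 2b2X1 with a finite-difference slope of the fitted curve.

```python
def predicted(x1, b0, b1, b2):
    """Fitted value from Y = b0 + (b1 + b2*X1)*X1."""
    return b0 + (b1 + b2 * x1) * x1

def metric_effect(x1, b1, b2):
    """Conditional (metric) effect of X1 on Y: the slope b1 + 2*b2*X1."""
    return b1 + 2 * b2 * x1

b0, b1, b2 = 1.0, 0.8, -0.01   # hypothetical estimates, X1 = age

h = 1e-4  # step size for a central-difference check of the slope
for age in (20, 40, 60):
    numeric = (predicted(age + h, b0, b1, b2) - predicted(age - h, b0, b1, b2)) / (2 * h)
    print(f"age {age}: metric effect {metric_effect(age, b1, b2):+.3f}, "
          f"numerical slope {numeric:+.3f}")
```

With these values the effect of age is positive at 20, zero at 40 (the top of the curve), and negative at 60.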

What do b0, b1, and b2 represent in a polynomial regression?

b0 is the predicted value of Y when X1 equals zero, and b1 is the slope of Y on X1 when X1 equals zero. b2 describes how the slope itself changes: the slope b1 + 2b2X1 changes by 2b2 for each one-unit increase in X1, so a positive b2 produces a U-shaped curve and a negative b2 an inverted U. Because the relationship is conditional, the standard interpretation of the coefficients no longer applies, strictly speaking. In some instances X1 never equals zero, in which case neither the intercept nor b1 is interpretable apart from b2 (Friedrich 1982). The conditional slopes are slopes of lines tangent to the fitted curve; therefore one cannot claim that each one-unit change in X1 produces the same change in Y.

How do you obtain standard errors and t-ratios for the parameter estimates in polynomial regression?

The standard error and t-ratio at different levels of X1 can be obtained with the appropriate formulas. Using the model outlined above, the standard error of the metric effect is the square root of the following expression:

$$\operatorname{var}(b_1) + 4X_1^2 \operatorname{var}(b_2) + 4X_1 \operatorname{cov}(b_1, b_2)$$

Note that this is the standard error of the effect of X1 at the specified value of X1. The t-ratio is then obtained by dividing the metric effect by this standard error at the same value of X1.
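
The formula can be applied directly to the estimated covariance matrix of the coefficients. This sketch assumes simulated data and computes the OLS covariance matrix by hand with NumPy (σ̂²(X′X)⁻¹); in practice these quantities would come from one's statistics package.

```python
import numpy as np

# Hypothetical data: ages 18-65 with a curvilinear effect plus noise
rng = np.random.default_rng(42)
x = rng.uniform(18, 65, size=200)
y = 5 + 0.30 * x - 0.003 * x**2 + rng.normal(0, 1, size=200)

design = np.column_stack([np.ones_like(x), x, x**2])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coefs
n, p = design.shape
sigma2 = resid @ resid / (n - p)                  # residual variance
cov = sigma2 * np.linalg.inv(design.T @ design)   # covariance of b0, b1, b2

b1, b2 = coefs[1], coefs[2]
var_b1, var_b2, cov_b1b2 = cov[1, 1], cov[2, 2], cov[1, 2]

def conditional_slope(x1):
    """Metric effect b1 + 2*b2*X1."""
    return b1 + 2 * b2 * x1

def conditional_se(x1):
    """sqrt(var(b1) + 4*X1^2*var(b2) + 4*X1*cov(b1, b2))."""
    return np.sqrt(var_b1 + 4 * x1**2 * var_b2 + 4 * x1 * cov_b1b2)

for age in (20, 35, 50, 65):
    slope, se = conditional_slope(age), conditional_se(age)
    print(f"age {age}: slope {slope:.4f}, SE {se:.4f}, t {slope / se:.2f}")
```

Because the standard error is evaluated at each value of X1, the t-ratios differ across ages; near the top of the curve (around age 50 in these simulated data) the slope, and hence the t-ratio, is expected to be small.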

When including a polynomial, must one also include the constituent variable?

As with a multiplicative term, polynomial regression requires including the lowest order term along with each higher power term. In a model such as the one above, including both X1 and X1² (for example, age and age²) fits a parabola, and that parabola can be located anywhere. If the model included only the higher order regressor, the parabola's maximum or minimum would be fixed exactly at X1 = 0. This removes the ability to let the curve be located anywhere (Darlington 1990).
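
The location of the curve follows from setting the slope b1 + 2b2X1 to zero, which gives the turning point X1 = -b1/(2b2). The sketch below, with hypothetical coefficients, shows that dropping the lower order term (forcing b1 = 0) pins the turning point at X1 = 0.

```python
def turning_point(b1, b2):
    """X1 at which the parabola b0 + b1*X1 + b2*X1**2 peaks (b2 < 0) or
    bottoms out (b2 > 0), i.e. where the slope b1 + 2*b2*X1 equals zero."""
    if b2 == 0:
        raise ValueError("b2 = 0: the fitted relationship is a straight line")
    return -b1 / (2 * b2)

print(turning_point(0.8, -0.01))  # hypothetical full model: peak at about X1 = 40
print(turning_point(0.0, -0.01))  # squared term only (b1 forced to 0): X1 = 0
```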

Does polynomial regression also experience increased collinearity between the lower- and higher-order terms?

Yes. The higher- and lower-order terms are usually highly correlated, particularly when X1 takes only positive values (as age does). This may increase the standard error of the lower order term, but it poses little threat to the standard error of the conditional slope, b1 + 2b2X1. The correlation produces a substantial covariance between b1 and b2 (negative when X1 and X1² are positively correlated). Because that covariance enters the standard error of the conditional slope, it offsets the inflated variances and can reduce the conditional standard error at some levels of the lower order term to a value equal to or less than that in the model without the higher order term. As with the interaction between two independent variables, the level of collinearity in polynomial regression is not damaging to the analysis.
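
A quick illustration with hypothetical integer ages from 18 to 65: age and age² are almost perfectly correlated, and the entry of (X′X)⁻¹ that fixes the sign of cov(b1, b2) is negative (the residual variance only rescales it). That negative covariance is what enters the term 4X1 cov(b1, b2) in the standard error of the conditional slope.

```python
import numpy as np

age = np.arange(18, 66, dtype=float)          # hypothetical ages 18..65
r = np.corrcoef(age, age**2)[0, 1]
print(f"corr(age, age^2) = {r:.3f}")          # close to 1

# cov(b0, b1, b2) = sigma^2 * (X'X)^-1, so the sign of cov(b1, b2)
# is the sign of the [1, 2] entry of (X'X)^-1.
design = np.column_stack([np.ones_like(age), age, age**2])
xtx_inv = np.linalg.inv(design.T @ design)
print("sign of cov(b1, b2):", np.sign(xtx_inv[1, 2]))  # negative
```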

Can the effect of X1 on the dependent variable have fluctuating significance at different values of X1 in a polynomial regression?

As with the interaction between two independent variables, the impact of X1 on Y can be significant at some values of X1 and nonsignificant at others. This is because the polynomial represents a conditional relationship, whereas traditional linear regression represents a general one. Including the polynomial allows the effect of X1 on the dependent variable to vary with the value of X1, whereas linear regression estimates a constant effect of X1 on the dependent variable. Failing to model a nonlinear relationship in an ordinary linear regression can therefore lead to wrong conclusions about the effect of the independent variable on the dependent variable.

Does the addition of the higher order term (X1²) in a polynomial model change the parameter estimate of the lower order term (X1)?

This aspect of polynomial regression has the same implications as in nonadditive models. Including the higher order term can dramatically affect the parameter estimate of the lower order term, because the traditional linear regression estimates a general effect while the nonlinear approach estimates a conditional one. In essence, the estimates differ because the polynomial regression controls for the curvilinear effect of the independent variable on the dependent variable: b1 becomes the slope at X1 = 0 rather than a single slope summarizing the whole observed range.
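
A noise-free, hypothetical example shows how far b1 can move. The data lie exactly on Y = 1 + 0.5X1 - 0.02X1² for X1 = 0, 1, ..., 20, and the same data are fitted with and without the squared term.

```python
import numpy as np

x = np.arange(0.0, 21.0)                 # hypothetical X1 = 0..20
y = 1 + 0.5 * x - 0.02 * x**2            # noise-free curvilinear relationship

ones = np.ones_like(x)
linear, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
quadratic, *_ = np.linalg.lstsq(np.column_stack([ones, x, x**2]), y, rcond=None)

print("b1 without X1^2:", round(linear[1], 6))     # one general slope
print("b1 with X1^2:   ", round(quadratic[1], 6))  # conditional slope at X1 = 0
```

The linear fit gives b1 = 0.1, which for this symmetric set of X1 values equals the conditional slope at the midpoint X1 = 10; the quadratic fit recovers b1 = 0.5, the slope at X1 = 0.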

Why would one use the logarithmic transformation in the case of nonlinear relationships?

A logarithm is warranted when the effect of the independent variable on the dependent variable varies in a particular way, for example when the effect of the independent variable is stronger at low values of X than at high values. This type of relationship is curvilinear, as is the polynomial relationship. A polynomial relationship involves a variable multiplied by itself, whereas a logarithmic relationship replaces the variable with its logarithm, using either base 10 or the natural base e. On a logarithmic scale, equal ratios become equal distances: a doubling of X adds the same amount to log X whether X is small or large. The logarithm therefore captures the proportional difference between values of the same variable, for example population.
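
The proportional character of logarithms is easy to check. In this sketch two hypothetical pairs of state populations have the same 2:1 ratio but very different absolute gaps; their base-10 logarithms differ by the same amount, log10(2) ≈ 0.301.

```python
import math

# Hypothetical populations: both pairs have a 2:1 ratio
small_pair = (1_000_000, 2_000_000)      # absolute gap: 1 million
large_pair = (10_000_000, 20_000_000)    # absolute gap: 10 million

diff_small = math.log10(small_pair[1]) - math.log10(small_pair[0])
diff_large = math.log10(large_pair[1]) - math.log10(large_pair[0])
print(diff_small, diff_large)            # both equal log10(2), about 0.301
```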

How do you interpret an independent variable that is a logarithm?

The following equation is employed to answer this question:

$$Y = b_0 + b_1 \log_{10}(X_1)$$

Using an example taken from Knoke and Bohrnstedt (1994), the equation above is a model estimating the impact of the base-10 logarithm of age in years at first marriage on the expected number of children. The OLS estimates are Y = 11.3 - 6.75 log10(X1). The nonlinear relationship means that the change in the expected number of children per year of age is not constant across the range of age. To obtain the predicted number of children at a given age, take the base-10 logarithm of that age, multiply it by the coefficient for X1, and add the intercept. For example, for a woman who first married at age 17 (log10 17 = 1.23), the model predicts 11.3 - (6.75)(1.23) = 3.00 children. The metric effect, the slope of the curve at a given age, is obtained by dividing the coefficient by that value of age and by ln 10 ≈ 2.303, that is, b1/(X1 ln 10); the simpler form b1/X1 applies when the natural logarithm is used.
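
The calculations in this example can be reproduced with the estimates quoted above (b0 = 11.3, b1 = -6.75). The sketch assumes base-10 logs, as in the model; the function names are illustrative. Using the unrounded log10(17) ≈ 1.2304 gives a prediction of about 2.99 rather than the 3.00 obtained with the rounded logarithm.

```python
import math

B0, B1 = 11.3, -6.75   # OLS estimates quoted above (Knoke and Bohrnstedt 1994)

def predicted_children(age):
    """Y = b0 + b1 * log10(age at first marriage)."""
    return B0 + B1 * math.log10(age)

def metric_effect(age):
    """Slope of the fitted curve: d/d(age) of b1*log10(age) = b1 / (age * ln 10)."""
    return B1 / (age * math.log(10))

print(round(predicted_children(17), 2))  # 2.99
print(round(metric_effect(17), 3))       # about -0.17 children per additional year
```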

How does one obtain standard errors and t-ratios for the parameter estimates for logarithmic nonlinear regression?

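As with polynomial and multiplicative interactions, formulas can be used to derive standard errors and t-ratios at different levels of the logarithmic variable. Because X1 enters the metric effect only as a fixed divisor, the standard error of the metric effect is the standard error of b1 divided by the same quantity. For the natural-log metric effect b1/X1 this is

$$SE\left(\frac{b_1}{X_1}\right) = \sqrt{\frac{1}{X_1^2}\operatorname{var}(b_1)} = \frac{SE(b_1)}{X_1}$$

With base-10 logarithms, as in the equation above, both the metric effect and its standard error are further divided by ln 10.

The t-ratio is obtained by dividing the metric effect by its standard error at a particular value of X1. Because the same divisor appears in both, this t-ratio equals the t-ratio of b1 at every value of X1.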
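
A short sketch of these formulas for the age-at-marriage model. The source reports no standard error for b1, so SE(b1) = 1.5 below is purely hypothetical. The loop shows that the metric effect and its standard error both shrink with age while their ratio, the t-ratio, stays at b1/SE(b1).

```python
import math

B1 = -6.75      # coefficient from the example above
SE_B1 = 1.5     # hypothetical standard error; not reported in the source

def metric_effect(age):
    """Base-10 metric effect: b1 / (age * ln 10)."""
    return B1 / (age * math.log(10))

def metric_se(age):
    """Only b1 is estimated, so the constant 1/(age * ln 10) scales its SE."""
    return SE_B1 / (age * math.log(10))

for age in (17, 25, 35):
    t = metric_effect(age) / metric_se(age)
    print(f"age {age}: effect {metric_effect(age):.4f}, SE {metric_se(age):.4f}, t {t:.2f}")
```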

Can a statistical interaction occur between more than two variables in an interactive model?

Higher order interaction effects are possible, meaning that an interaction exists among three or more variables. Such interactions are less common than interactions between two variables, but they can occur. The following equation represents a three-way interaction:

$$Y = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_1X_2 + b_5X_1X_3 + b_6X_2X_3 + b_7X_1X_2X_3$$


This means that the effect of X1 on Y is not uniform over all combinations of X2 and X3, that the effect of X2 on Y is not uniform over all combinations of X1 and X3, and so on.
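
Differentiating the three-way model with respect to X1 gives the conditional effect b1 + b4X2 + b5X3 + b7X2X3. The sketch below evaluates it for a few combinations of X2 and X3 using hypothetical coefficients; because Y is linear in X1 once X2 and X3 are fixed, this effect is exactly the change in predicted Y for a one-unit increase in X1.

```python
def predicted(b, x1, x2, x3):
    """Fitted Y from the three-way interaction model, b = (b0, ..., b7)."""
    b0, b1, b2, b3, b4, b5, b6, b7 = b
    return (b0 + b1 * x1 + b2 * x2 + b3 * x3
            + b4 * x1 * x2 + b5 * x1 * x3 + b6 * x2 * x3 + b7 * x1 * x2 * x3)

def effect_of_x1(b, x2, x3):
    """Conditional effect of X1: dY/dX1 = b1 + b4*X2 + b5*X3 + b7*X2*X3."""
    return b[1] + b[4] * x2 + b[5] * x3 + b[7] * x2 * x3

b = (2.0, 0.5, 0.3, -0.2, 0.1, 0.05, 0.0, -0.04)  # hypothetical coefficients
for x2, x3 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(f"X2={x2}, X3={x3}: effect of X1 = {effect_of_x1(b, x2, x3):.2f}")
```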

Last updated: August 16, 1999