Examples and data borrowed from Oscar Torres-Reyna’s excellent tutorial on panel data.
library( foreign )
library( car )
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta")
Panel$y <- Panel$y / 1000000 # change to millions of dollars
A problem that we encounter often in statistics is bias that is correlated with group structure in the data. This is sometimes referred to as Heterogeneity.
If we have panel data, data where the study variables are measured repeatedly for each group over time, we can account for much of this bias using a “fixed-effect”, a separate intercept term for each group within the data.
plot( Panel$x1, Panel$y,
col=gray(0.5,0.5),
pch=19, cex=3, bty="n",
xlab="X", ylab="Y" )
abline( lm( y~x1, data=Panel ), col="red", lwd=2 )
plot( Panel$x1, Panel$y,
col=factor(Panel$country),
pch=19, cex=3, bty="n",
xlab="X", ylab="Y" )
fe <- lm( y ~ x1 + factor(country) - 1, data=Panel )
yhat <- fe$fitted
plot( Panel$x1, yhat, col=factor(Panel$country),
pch=19, cex=2, xlab="X", ylab="Y", bty="n" )
# Or use the scatterplot() function in the car library
scatterplot( yhat~Panel$x1|Panel$country,
boxplots=FALSE, xlab="x1",
ylab="yhat",smooth=FALSE )
abline( lm(Panel$y~Panel$x1), lwd=3, col="red")
We run a fixed effects regression by adding a term for the group strucure, which in this example is “country”. Since this is a categorical variable, R will create multiple dummy variables that each represent a unique intercept for a group. When each group has an intercept like this, we need to omit the model-level intercept by adding a negative one to the model command.
# Regular OLS regression
ols <- lm( y ~ x1, data=Panel )
# Fixed Effects Model
fe <- lm( y ~ x1 + factor(country) - 1, data=Panel )
library( stargazer )
# To add a stargazer table to your markdown file,
# add this argument to your code chunk:
#
# {r, results='asis'}
stargazer( ols, fe, type="html",
omit.stat = c("adj.rsq", "f","ser"),
digits=2)
Dependent variable: | ||
y | ||
(1) | (2) | |
x1 | 494.99 | 2,475.62** |
(778.86) | (1,106.68) | |
factor(country)A | 880.54 | |
(961.81) | ||
factor(country)B | -1,057.86 | |
(1,051.07) | ||
factor(country)C | -1,722.81 | |
(1,631.51) | ||
factor(country)D | 3,162.83*** | |
(909.46) | ||
factor(country)E | -602.62 | |
(1,064.29) | ||
factor(country)F | 2,010.73* | |
(1,122.81) | ||
factor(country)G | -984.72 | |
(1,492.72) | ||
Constant | 1,524.32** | |
(621.07) | ||
Observations | 70 | 70 |
R2 | 0.01 | 0.44 |
Note: | p<0.1; p<0.05; p<0.01 |
If we would like to improve the print-out by suppressing the fixed effects in the table, you can use these arguments in stargazer.
# Suppress the printout of all the intercepts
stargazer( ols, fe, type="html",
omit.stat = c("adj.rsq", "f","ser"),
omit="country",
add.lines=list(c("Country Fixed Effects?", "No","Yes")),
digits=2)
Dependent variable: | ||
y | ||
(1) | (2) | |
x1 | 494.99 | 2,475.62** |
(778.86) | (1,106.68) | |
Constant | 1,524.32** | |
(621.07) | ||
Country Fixed Effects? | No | Yes |
Observations | 70 | 70 |
R2 | 0.01 | 0.44 |
Note: | p<0.1; p<0.05; p<0.01 |
As we would expect based upon the picture, the slope in the fixed effects model is larger than the slope in the OLS model. We also see that is it not significant in the OLS model, but it is in the fixed effects model because the fit is not better (each data point is closer to the line of fit since there are now several lines of best fit).