Panel Models in Stata and R
This page helps you take panel models you fit in Stata and refit them in R, and explains why standard errors (SEs) differ between the two. Translating in the other direction, from R to Stata, is less likely to succeed, because Stata package authors are less likely than R package authors to explicitly reproduce methods unique to other software packages.
The example code below is written with Stata-like terminology. It assumes you have some dataset dat with panel variable panelvar, time variable timevar, dependent variable depvar, any number of independent variables indepvars, and some other group variable groupvar. Substitute each of these with the names of the variables in your particular dataset.
The functions in the R code require you to install and load the plm, lmtest (which provides coeftest()), sandwich, and clubSandwich packages.
1 Panel Model Equivalents
1.1 Fixed effects
Stata:
xtset panelvar
xtreg depvar indepvars, fe
R:
mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "within")
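To see what model = "within" estimates, the base R sketch below may help. It uses simulated data (the data frame sim and the variables id, x, and y are made up for illustration) and shows that demeaning the outcome and predictor within each panel gives the same slope as including a dummy variable for every panel. It illustrates the within transformation only and is not a reproduction of plm's internals.

```r
set.seed(1)
n_panels <- 30
n_times <- 5

# Simulated panel: each panel has its own intercept that is correlated with x
sim <- data.frame(id = rep(seq_len(n_panels), each = n_times))
panel_effect <- rnorm(n_panels)[sim$id]
sim$x <- rnorm(nrow(sim)) + panel_effect
sim$y <- 2 * sim$x + panel_effect + rnorm(nrow(sim))

# Within transformation: subtract each panel's mean from x and y
sim$x_dm <- sim$x - ave(sim$x, sim$id)
sim$y_dm <- sim$y - ave(sim$y, sim$id)

# Slope from the demeaned regression and from the dummy-variable regression
b_within <- coef(lm(y_dm ~ x_dm - 1, data = sim))[["x_dm"]]
b_lsdv <- coef(lm(y ~ x + factor(id), data = sim))[["x"]]

all.equal(b_within, b_lsdv)
```

Use the demeaned lm() only to check coefficients: its SEs do not account for the estimated panel means, so they will not match those of a fixed effects model.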
1.1.1 SEs clustered by panelvar
Stata:
xtset panelvar
xtreg depvar indepvars, fe vce(cluster panelvar)
R:
mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "within")
n_groups <- length(unique(dat$panelvar))
adj <- n_groups / (n_groups - 1)
coeftest(mod,
         adj * vcovHC(mod, type = "HC1"))
See notes on finite sample size adjustments and degrees of freedom.
1.1.2 SEs clustered by groupvar
Stata:
xtset panelvar
xtreg depvar indepvars, fe vce(cluster groupvar)
R:
mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "within")
coeftest(mod,
         vcovCR(mod,
                type = "CR1S",
                cluster = dat$groupvar))
See notes on finite sample size adjustments, SEs clustered by groupvar, and degrees of freedom.
1.2 Random effects
1.2.1 Balanced
Stata:
xtset panelvar
xtreg depvar indepvars, re
R:
mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "random")
1.2.2 Unbalanced
Stata:
xtset panelvar
xtreg depvar indepvars, re sa
R:
mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "random",
      random.models = c("within", "between"))
R’s default is the Swamy and Arora model, which can be done in Stata with the sa option.
1.2.3 SEs clustered by panelvar
Stata:
xtset panelvar
xtreg depvar indepvars, re vce(cluster panelvar)
R:
mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "random")
coeftest(mod,
         vcovHC(mod,
                type = "sss"))
See note on finite sample size adjustments.
1.2.4 SEs clustered by groupvar
Stata:
xtset panelvar
xtreg depvar indepvars, re vce(cluster groupvar)
R has no equivalent.
See note on SEs clustered by groupvar.
2 Doing More
2.1 Including timevar
In Stata, timevar is included in the initial xtset: xtset panelvar timevar.
In R, timevar must be added to the index argument of plm(). Supply index with a vector of panelvar and timevar: plm(..., index = c("panelvar", "timevar")).
2.2 Including Multiple Fixed Effects
If you are fitting a model with many fixed effects with reghdfe, see the R package lfe, but note that the package is no longer being maintained.
3 Notes
3.1 Finite sample size adjustments
Stata’s xtreg applies a correction to standard errors for finite sample sizes, while R's plm does not by default. Applying some adjustment factor, such as \(\frac{\text{n_groups}}{\text{n_groups} - 1}\), will make R’s SEs the same as, or at least very close to, Stata’s SEs.
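As a rough illustration of how much this factor matters, the base R sketch below computes it for a few numbers of groups. The group counts are made up for illustration. Because the factor multiplies the variance-covariance matrix, each SE is scaled by its square root.

```r
# Hypothetical numbers of groups (clusters)
n_groups <- c(10, 50, 500)

# Factor applied to the variance-covariance matrix
adj <- n_groups / (n_groups - 1)

# Resulting factor applied to each standard error
se_scale <- sqrt(adj)

data.frame(n_groups, adj, se_scale)
```

With few groups the adjustment is noticeable (SEs about 5% larger with 10 groups), while with hundreds of groups it is negligible.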
reghdfe, on the other hand, produces SEs identical to plm()’s default, so for fixed effects models you can use reghdfe as an alternative to xtreg. Note, however, that reghdfe only supports fixed effects models.
3.2 SEs clustered by groupvar
Fixed effects models: I have not been able to figure out why the SEs slightly differ for Stata and R, even though it appears they are applying the same adjustment to the SEs.
Random effects models: As of this writing, plm, sandwich, and clubSandwich do not support clustering SEs by groups that were not included in the random effects panel model.
3.3 Degrees of freedom
Stata and R use different degrees of freedom for clustered standard errors. While the SEs and t-values will match, the p-values and confidence intervals will not. Stata uses the number of groups minus one, and R uses the number of observations minus the number of groups minus the number of predictors in the model.
To manually calculate Stata’s and R’s p-values for some t-value (tvalue), adapt the code below.
g <- length(unique(dat$panelvar)) # number of groups
n <- nobs(mod) # number of observations
k <- length(coef(mod)) # number of predictors
df_stata <- g - 1
df_r <- n - g - k
pt(abs(tvalue), df_stata, lower.tail = FALSE) * 2 # Stata's p-value
pt(abs(tvalue), df_r, lower.tail = FALSE) * 2 # R's p-value
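The same two degrees of freedom also give different confidence intervals. The sketch below uses made-up values (20 groups, 200 observations, 3 predictors, and a hypothetical estimate est with standard error se) to show how the p-values and 95% intervals diverge. None of these numbers come from a real model.

```r
# Hypothetical model dimensions
g <- 20   # number of groups
n <- 200  # number of observations
k <- 3    # number of predictors

# Hypothetical coefficient estimate and clustered SE
est <- 0.42
se <- 0.20
tvalue <- est / se

df_stata <- g - 1
df_r <- n - g - k

# Two-sided p-values under each degrees of freedom
p_stata <- pt(abs(tvalue), df_stata, lower.tail = FALSE) * 2
p_r <- pt(abs(tvalue), df_r, lower.tail = FALSE) * 2

# 95% confidence intervals under each degrees of freedom
ci_stata <- est + c(-1, 1) * qt(0.975, df_stata) * se
ci_r <- est + c(-1, 1) * qt(0.975, df_r) * se

round(c(p_stata = p_stata, p_r = p_r), 3)
rbind(ci_stata, ci_r)
```

Whenever g - 1 is smaller than n - g - k, Stata's p-value is larger and its interval wider for the same t-value.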