Panel Models in Stata and R

This page helps you take panel models you fit in Stata and refit them in R, and explains why standard errors (SEs) differ between the two. Translating in the other direction, from R to Stata, is less likely to succeed, because Stata package authors are less likely than R package authors to explicitly reproduce methods unique to other software packages.

The example code below is written with Stata-like terminology. It assumes you have a dataset dat with panel variable panelvar, time variable timevar, dependent variable depvar, any number of independent variables indepvars, and some other group variable groupvar. Substitute each of these with the names of the variables in your particular dataset.

The functions in the R code require you to install and load the plm, lmtest, sandwich, and clubSandwich packages. (The coeftest() function is provided by lmtest.)

1 Panel Model Equivalents

1.1 Fixed effects

Stata:

xtset panelvar
xtreg depvar indepvars, fe

R:

mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "within")
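As a concrete check of the fixed effects syntax, the sketch below uses the Grunfeld dataset that ships with plm (it is not part of the original examples) in place of dat, with firm standing in for panelvar. It also fits the same model by least squares with one dummy variable per firm, which should reproduce the slope estimates of the within estimator. The object names mod_fe and mod_lsdv are chosen only for illustration.

```r
library(plm)

# Grunfeld ships with plm: 10 firms observed over 20 years.
data("Grunfeld", package = "plm")

# Within (fixed effects) model; firm plays the role of panelvar.
mod_fe <- plm(inv ~ value + capital,
              data  = Grunfeld,
              index = "firm",
              model = "within")

# Same model as least squares with a dummy for each firm (LSDV).
mod_lsdv <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)

# The slope estimates from the two fits should agree.
coef(mod_fe)
coef(mod_lsdv)[c("value", "capital")]
```

The comparison with lm() is only a sanity check; for panels with many units, the within transformation used by plm avoids estimating a coefficient for every dummy.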

1.1.1 SEs clustered by panelvar

Stata:

xtset panelvar
xtreg depvar indepvars, fe vce(cluster panelvar)

R:

mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "within")

coeftest(mod,
         vcovHC(mod,
                type = "sss"))

1.2 Random effects

1.2.1 Balanced

Stata:

xtset panelvar
xtreg depvar indepvars, re

R:

mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "random")

1.2.2 Unbalanced

Stata:

xtset panelvar
xtreg depvar indepvars, re sa

R:

mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "random",
      random.models = c("within", "between"))

plm’s default random effects estimator is the Swamy and Arora estimator, which Stata fits when you add the sa option.

1.2.3 SEs clustered by panelvar

Stata:

xtset panelvar
xtreg depvar indepvars, re vce(cluster panelvar)

R:

mod <-
  plm(depvar ~ indepvars,
      dat,
      index = "panelvar",
      model = "random")

coeftest(mod,
         vcovHC(mod,
                type = "sss"))

See note on finite sample size adjustments.
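To see what clustering changes, the following sketch fits a random effects model to plm's bundled Grunfeld data (an assumption made for illustration, with firm standing in for panelvar) and places the conventional SEs next to the SEs clustered by firm. The names mod_re, se_default, vc_cluster, and se_cluster are illustrative only.

```r
library(plm)
library(lmtest)

data("Grunfeld", package = "plm")

mod_re <- plm(inv ~ value + capital,
              data  = Grunfeld,
              index = "firm",
              model = "random")

# Conventional SEs from the model's own variance estimate.
se_default <- sqrt(diag(vcov(mod_re)))

# SEs clustered by firm, with the Stata-style small-sample adjustment.
vc_cluster <- vcovHC(mod_re, type = "sss")
se_cluster <- sqrt(diag(vc_cluster))

# Side-by-side comparison of the two sets of SEs.
cbind(se_default, se_cluster)

# Coefficient table using the clustered covariance matrix.
coeftest(mod_re, vcov. = vc_cluster)
```

Computing the covariance matrix once and reusing it keeps the SE comparison and the coefficient table consistent with each other.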

1.2.4 SEs clustered by groupvar

Stata:

xtset panelvar
xtreg depvar indepvars, re vce(cluster groupvar)

R has no equivalent.

See note on SEs clustered by groupvar.

2 Doing More

2.1 Including timevar

In Stata, timevar is included in the initial xtset: xtset panelvar timevar.

In R, timevar must be added to the index argument of plm(). Supply index with a vector of panelvar and timevar: plm(..., index = c("panelvar", "timevar")).
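Declaring both index variables also makes it easy to check whether a panel is balanced, which decides between the balanced and unbalanced random effects examples above. A minimal sketch, assuming plm's bundled Grunfeld data with firm as panelvar and year as timevar (the names pdat and dims are illustrative):

```r
library(plm)

data("Grunfeld", package = "plm")

# Declare both the panel and time variables, like xtset panelvar timevar.
pdat <- pdata.frame(Grunfeld, index = c("firm", "year"))

# pdim() reports the panel dimensions and whether the panel is balanced.
dims <- pdim(pdat)
dims$nT        # number of units, time periods, and observations
dims$balanced  # TRUE when every unit is observed in every period

# The same index vector can be passed straight to plm().
mod <- plm(inv ~ value + capital,
           data  = Grunfeld,
           index = c("firm", "year"),
           model = "within")
```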

2.2 Including Multiple Fixed Effects

If you are fitting a model with many fixed effects with reghdfe, see the R package lfe, but note that the package is no longer being maintained.

3 Notes

3.1 Finite sample size adjustments

Stata’s xtreg applies a finite-sample correction to standard errors, while R does not. Applying an adjustment factor such as $$\frac{\text{n_groups}}{\text{n_groups} - 1}$$ will make R’s SEs the same as, or at least very close to, Stata’s SEs.

reghdfe, on the other hand, produces SEs identical to plm’s default, so for fixed effects models it is an alternative to xtreg that matches R without any adjustment. Note that reghdfe only supports fixed effects models, however.
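The groups-based adjustment mentioned above can be sketched in a few lines of base R. This is only an illustration under a stated assumption: the factor is taken to multiply the covariance matrix, so each SE grows by its square root. The helper adjust_vcov() is hypothetical (it does not come from any package), and the toy matrix stands in for a covariance matrix such as the one returned by vcovHC(mod).

```r
# Hypothetical helper (not from any package): rescale a cluster-robust
# covariance matrix by n_groups / (n_groups - 1).
# Assumption: the factor multiplies the variances, so SEs grow by its
# square root.
adjust_vcov <- function(vc, n_groups) {
  stopifnot(n_groups > 1)
  vc * n_groups / (n_groups - 1)
}

# Toy covariance matrix standing in for vcovHC(mod); the numbers are
# made up for illustration.
vc <- matrix(c(0.040, 0.002,
               0.002, 0.010),
             nrow = 2,
             dimnames = list(c("x1", "x2"), c("x1", "x2")))

se_unadjusted <- sqrt(diag(vc))
se_adjusted   <- sqrt(diag(adjust_vcov(vc, n_groups = 10)))

# Ratio of adjusted to unadjusted SEs is sqrt(10 / 9) for every coefficient.
se_adjusted / se_unadjusted
```

With many groups the factor is close to 1, which is why unadjusted and adjusted SEs often differ only in later decimal places.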

3.2 SEs clustered by groupvar

Fixed effects models: I have not been able to figure out why the SEs slightly differ for Stata and R, even though it appears they are applying the same adjustment to the SEs.

Random effects models: As of this writing, plm, sandwich, and clubSandwich do not support clustering SEs by groups that were not included in the random effects panel model.

3.3 Degrees of freedom

Stata and R use different degrees of freedom for clustered standard errors. While the SEs and t-values will match, the p-values and confidence intervals will not. Stata uses the number of groups minus one, and R uses the number of observations minus the number of groups minus the number of predictors in the model.

To manually calculate Stata’s and R’s p-values for some t-value (tvalue), adapt the code below.

g <- length(unique(dat$panelvar)) # number of groups
n <- nobs(mod)                    # number of observations
k <- length(coef(mod))            # number of estimated coefficients

df_stata <- g - 1
df_r <- n - g - k

pt(abs(tvalue), df_stata, lower.tail = FALSE) * 2 # Stata's p-value
pt(abs(tvalue), df_r, lower.tail = FALSE) * 2 # R's p-value
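The same two degrees of freedom also explain why confidence intervals differ. The sketch below builds 95% intervals under each convention; the estimate, SE, and sample sizes are made-up values chosen only to illustrate the calculation.

```r
# Illustrative values only: a coefficient estimate and its clustered SE.
est <- 0.50
se  <- 0.20

# Illustrative sample: 10 groups, 200 observations, 2 predictors.
g <- 10
n <- 200
k <- 2

df_stata <- g - 1       # groups minus one
df_r     <- n - g - k   # observations minus groups minus predictors

# 95% confidence intervals under each degrees-of-freedom convention.
ci_stata <- est + c(-1, 1) * qt(0.975, df_stata) * se
ci_r     <- est + c(-1, 1) * qt(0.975, df_r) * se

ci_stata
ci_r
```

Because Stata's degrees of freedom are smaller here, its critical value is larger and its interval is wider, even though the estimate and SE are identical.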