R Practical Ecotrix
Setup & Data Handling
library(package_name)
Purpose: Loads an installed package so that its functions can be called; install it once beforehand with install.packages("package_name").
Packages used:
- car (variance inflation factors via vif())
- DescTools (summary stats, residual diagnostics)
- dynlm (dynamic linear models for time series)
- ecm (error correction models)
- lmtest (model tests: RESET, Breusch-Pagan, etc.)
- nlme (generalized least squares via gls())
- readxl (import Excel files)
- tseries (JB test, etc.)
- wooldridge (access Wooldridge datasets)
data('dataset_name')
Purpose: Loads datasets from packages, e.g., wage1 from wooldridge.
read_excel("file_path") (from readxl)
Purpose: Imports data from Excel files into R.
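A minimal sketch of the loading step. It uses the built-in mtcars data set (an assumption made here so that the example runs without extra packages); the commented lines show the package-based calls described above, with a hypothetical file name.

```r
# Built-in data set, so this sketch needs no extra packages.
data("mtcars")
str(mtcars)    # variable types and first values
head(mtcars)   # first six rows

# Package-based equivalents (require the packages to be installed):
# library(wooldridge); data("wage1")
# library(readxl);     my_data <- read_excel("my_file.xlsx")  # hypothetical file name
```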
📊 Descriptive Statistics & Plots
Desc(variable) (DescTools)
Purpose: Provides comprehensive descriptive statistics.
Use: Initial data exploration to understand distribution, outliers, etc.
hist(variable)
Purpose: Plots histogram of a variable.
Use: Visual check of the shape of the distribution (approximate normality, skewness, outliers).
scatter.smooth(x, y)
Purpose: Plots a scatterplot with a LOESS (locally weighted) smooth curve added.
Use: Visual diagnostic for linearity.
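The exploration commands above might be combined as follows. This is only a sketch on the built-in mtcars data (an illustrative choice, not a data set from these notes); the Desc() call is left commented because it needs DescTools.

```r
data("mtcars")

summary(mtcars$mpg)                          # base-R summary: quartiles and mean
h <- hist(mtcars$mpg, main = "mpg",
          xlab = "Miles per gallon")         # distribution shape
scatter.smooth(mtcars$wt, mtcars$mpg,
               xlab = "Weight", ylab = "mpg")  # scatterplot plus LOESS curve

# DescTools::Desc(mtcars$mpg)               # richer summary, if DescTools is installed
```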
📈 Model Estimation
lm(y ~ x1 + x2, data = dataset)
Purpose: Fits a linear regression model.
Use: Core command for cross-sectional or time series OLS models.
Assumptions to verify after fitting (lm() itself does not test them): linearity, homoscedasticity, no autocorrelation, no multicollinearity, normality of residuals.
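A short illustration of fitting and inspecting an OLS model. The data set (mtcars) and the variables mpg, wt, and hp are assumptions chosen for the example.

```r
data("mtcars")

# Regress fuel efficiency on weight and horsepower.
model <- lm(mpg ~ wt + hp, data = mtcars)

summary(model)   # coefficients, standard errors, R-squared, overall F-test
coef(model)      # coefficient vector only
```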
dynlm(y ~ L(y, 1) + x, data = dataset) (dynlm)
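Purpose: Fits dynamic linear models; L(y, 1) is the first lag of y and d(y) is its first difference.
Use: Specifically for time-series models with lagged terms; the data should be time-series (ts or zoo) objects so that the lags are aligned correctly.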
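As a sketch, a first-order autoregression could be fitted like this. The built-in annual series LakeHuron is an assumption made for illustration, and dynlm must be installed.

```r
library(dynlm)
data("LakeHuron")   # annual time series, 1875 to 1972

# Regress the lake level on its own first lag.
ar1 <- dynlm(LakeHuron ~ L(LakeHuron, 1))
summary(ar1)
```

One observation is lost to the lag, so the model is estimated on 97 of the 98 years.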
gls(y ~ x1 + x2, data = dataset) (from nlme)
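Purpose: Generalized Least Squares estimation.
Use: Remedy for autocorrelation or heteroscedasticity. The error structure must be supplied through the correlation = argument (e.g., corAR1()) or the weights = argument (e.g., varPower()); without either, the coefficient estimates are the same as OLS.
Assumption: The specified non-spherical error structure is the correct one.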
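The sketch below contrasts gls() without and with an AR(1) error structure. The LakeHuron series and the linear-trend model are assumptions chosen for illustration; nlme must be installed.

```r
library(nlme)
data("LakeHuron")

# Put the series into a data frame with an explicit time variable.
huron <- data.frame(level = as.numeric(LakeHuron),
                    year  = as.numeric(time(LakeHuron)))

# No error structure given: coefficients coincide with OLS.
ols_like <- gls(level ~ year, data = huron)

# AR(1) errors, using the row order as the time index.
ar1_fit <- gls(level ~ year, data = huron, correlation = corAR1())
summary(ar1_fit)
```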
lag(variable, -1)
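Purpose: Creates lagged variables manually; on a ts object, -1 gives the first lag, -2 the second, and so on.
Use: Needed in time series for dynamic models or Durbin’s h test. It shifts the time index only, so it has no useful effect on a plain numeric vector, and it is masked by a different lag() if dplyr is loaded.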
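A small sketch of how the shift works, using a made-up annual series (the values are purely illustrative):

```r
# Toy annual series starting in 2001 (made-up values).
y <- ts(c(10, 12, 15, 11, 14), start = 2001)

# First lag: the value dated t is the original value from t - 1.
y_lag1 <- stats::lag(y, -1)

# Binding ts objects aligns them by date and pads with NA.
cbind(y, y_lag1)
```

The values themselves are unchanged; only the dates move forward by one period.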
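ts(variable, start = c(year, period), end = c(year, period), frequency = 12)
Purpose: Creates a time-series object; start and end are given as c(year, period), e.g., c(2020, 1) for January 2020.
Use: Required for time-series modeling; use frequency = 12 for monthly data, 4 for quarterly, and 1 for annual.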
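For instance, a monthly series could be built as follows. The numbers are simulated placeholders, not real data.

```r
set.seed(1)
x <- rnorm(24)   # 24 simulated monthly values (placeholder data)

# Monthly series starting in January 2020.
monthly <- ts(x, start = c(2020, 1), frequency = 12)
monthly                                  # printed as a year-by-month table

# Extract a sub-period by date.
window(monthly, start = c(2021, 1))
```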
🔍 Residuals and Normality Tests
resid(model)
Purpose: Extract residuals from a regression model.
jarque.bera.test(variable) (tseries)
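Purpose: Jarque-Bera test of normality, based on sample skewness and kurtosis (null: normal distribution).
Use: For raw variables or, more commonly, regression residuals; a small p-value rejects normality.
Assumption checked: Normality of residuals.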
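A sketch of the residual check, assuming the tseries package is installed and using an illustrative model on the built-in mtcars data:

```r
library(tseries)
data("mtcars")

model <- lm(mpg ~ wt + hp, data = mtcars)
e <- resid(model)              # OLS residuals

hist(e, main = "Residuals")    # visual check of the shape
jb <- jarque.bera.test(e)      # formal test; null: normality
jb
```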
🔍 Model Specification Tests
resettest(model, power=2:3, type="fitted", data=dataset) (lmtest)
Purpose: Ramsey RESET test for functional form misspecification.
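Use: Adds powers of the fitted values to the regression and tests their joint significance; rejection points to neglected nonlinearity or omitted variables.
Assumption checked: Correct functional form (null: the model is correctly specified).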
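The call might look like this on an illustrative model (mtcars and the chosen regressors are assumptions; lmtest must be installed):

```r
library(lmtest)
data("mtcars")

model <- lm(mpg ~ wt + hp, data = mtcars)

# Add squared and cubed fitted values and test them jointly (F-test).
reset_out <- resettest(model, power = 2:3, type = "fitted")
reset_out
```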
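petest(model1, model2) (lmtest)
Purpose: Compares linear vs. log-linear models using the PE test.
Use: One model has y as the dependent variable and the other has log(y); each is tested against the other, so the test can favor one form, reject both, or reject neither.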
Assumption checked: Correct model specification.
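A sketch of the comparison, assuming lmtest is installed; the mtcars variables are illustrative only.

```r
library(lmtest)
data("mtcars")

lin_model <- lm(mpg ~ wt + hp, data = mtcars)        # dependent variable in levels
log_model <- lm(log(mpg) ~ wt + hp, data = mtcars)   # dependent variable in logs

# One row per direction of the test (each model against the other).
pe_out <- petest(lin_model, log_model)
pe_out
```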
🧪 Heteroscedasticity Tests
bptest(model, studentize=FALSE) (lmtest)
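Purpose: Breusch-Pagan test for heteroscedasticity; studentize = FALSE gives the original version, which assumes normal errors, while the default studentize = TRUE gives Koenker’s robust version.
Use: Tests whether the error variance depends on the regressors (null: homoscedastic).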
Assumption checked: Homoscedasticity.
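Both variants of the test can be run on the same model, as in this sketch (illustrative mtcars model; lmtest must be installed):

```r
library(lmtest)
data("mtcars")

model <- lm(mpg ~ wt + hp, data = mtcars)

bp_orig <- bptest(model, studentize = FALSE)  # original Breusch-Pagan statistic
bp_koen <- bptest(model)                      # studentized (Koenker) version, the default
bp_orig
bp_koen
```

The degrees of freedom equal the number of regressors in the variance equation, here two.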
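bptest(model, ~ x1*x2 + I(x1^2) + I(x2^2), data = dataset) (lmtest; White’s test)
Purpose: White’s test for heteroscedasticity (general form), run as a Breusch-Pagan test on an auxiliary variance regression.
Use: More general than the basic BP test because the variance regression includes the regressors, their squares, and their interaction.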
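With two regressors, the auxiliary formula has five terms. A sketch on an illustrative mtcars model (lmtest must be installed):

```r
library(lmtest)
data("mtcars")

model <- lm(mpg ~ wt + hp, data = mtcars)

# Variance regression on levels, interaction, and squares of the regressors.
white_out <- bptest(model, ~ wt * hp + I(wt^2) + I(hp^2), data = mtcars)
white_out
```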
🛠 Heteroscedasticity Remedies
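lm(y ~ x1 + x2, data = dataset, weights = 1/x1)
Purpose: Weighted Least Squares (WLS).
Use: Remedy for heteroscedasticity when the error variance is related to an explanatory variable. The weights should be proportional to the inverse of the error variance, e.g., weights = 1/educ in a wage equation assumes the variance is proportional to educ.
Assumption: Known form of heteroscedasticity; the weights must be finite and non-negative, so the weighting variable cannot be zero.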
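A sketch of WLS next to OLS. The assumption that the error variance is proportional to wt is made purely for illustration, as is the choice of the mtcars data.

```r
data("mtcars")

ols <- lm(mpg ~ wt + hp, data = mtcars)

# Assumed variance structure: Var(u) proportional to wt, so weight by 1 / wt.
wls <- lm(mpg ~ wt + hp, data = mtcars, weights = 1 / wt)

# Compare the two sets of coefficient estimates side by side.
cbind(OLS = coef(ols), WLS = coef(wls))
```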
🔄 Autocorrelation Tests and Remedies
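durbinH(model, variable) (ecm)
Purpose: Durbin’s h test for first-order autocorrelation in the presence of a lagged dependent variable.
Use: Time series models with lagged dependent terms, where the Durbin-Watson statistic is biased toward finding no autocorrelation; the second argument identifies the lagged dependent variable in the model.
Assumption checked: No autocorrelation.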
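To show what the statistic measures, the sketch below computes Durbin’s h by hand in base R rather than calling durbinH(). The LakeHuron series and the AR(1) model are assumptions chosen for illustration.

```r
data("LakeHuron")
lvl   <- as.numeric(LakeHuron)
n_all <- length(lvl)

y    <- lvl[2:n_all]          # current value
y_l1 <- lvl[1:(n_all - 1)]    # first lag
model <- lm(y ~ y_l1)

e <- resid(model)
n <- length(e)
d <- sum(diff(e)^2) / sum(e^2)        # Durbin-Watson statistic
v <- vcov(model)["y_l1", "y_l1"]      # variance of the lagged-y coefficient

# h = (1 - d/2) * sqrt(n / (1 - n * v)); defined only when n * v < 1.
h <- if (n * v < 1) (1 - d / 2) * sqrt(n / (1 - n * v)) else NA_real_
p_value <- 2 * pnorm(-abs(h))         # h is asymptotically standard normal
c(h = h, p_value = p_value)
```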
bgtest(model, order=2) or bgtest(model, order=3) (lmtest)
Purpose: Breusch-Godfrey test for autocorrelation up to the chosen order.
Use: More flexible than Durbin-Watson; it tests higher orders and remains valid when lagged dependent variables are among the regressors.
Assumption checked: No serial correlation.
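A sketch of the test on a simple trend regression. The LakeHuron series is an illustrative assumption, and lmtest must be installed.

```r
library(lmtest)
data("LakeHuron")

huron <- data.frame(level = as.numeric(LakeHuron),
                    year  = as.numeric(time(LakeHuron)))
model <- lm(level ~ year, data = huron)

# Null: no serial correlation up to order 2.
bg_out <- bgtest(model, order = 2)
bg_out
```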
🔍 Multicollinearity Detection
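cor(dataset)
Purpose: Correlation matrix (all columns must be numeric).
Use: Preliminary check for multicollinearity.
Interpretation: High pairwise correlation between explanatory variables may suggest multicollinearity, although collinearity involving three or more variables can go undetected.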
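For example, restricting the matrix to the candidate regressors keeps the output readable. The mtcars variables are an illustrative choice.

```r
data("mtcars")

# Pairwise correlations among three candidate regressors.
cm <- cor(mtcars[, c("wt", "hp", "disp")])
round(cm, 2)
```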
vif(model) (car)
Purpose: Variance Inflation Factor (VIF), equal to 1/(1 - R²) from regressing each explanatory variable on the others.
Use: Diagnoses multicollinearity (rule of thumb: VIF > 10 is problematic).
Assumption checked: No severe multicollinearity.
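The sketch below runs vif() and reproduces one value by hand from an auxiliary regression. It assumes the car package is installed; the mtcars model is illustrative.

```r
library(car)
data("mtcars")

model <- lm(mpg ~ wt + hp + disp, data = mtcars)
vifs <- vif(model)
vifs

# Manual check for wt: 1 / (1 - R^2) from regressing wt on the other regressors.
r2_wt <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
1 / (1 - r2_wt)
```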
✅ Model Comparison & Validation
anova(model1, model2)
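Purpose: Compares nested models with an F-test.
Use: Tests the restrictions that reduce the larger model to the smaller one; both models must be fitted to the same observations. It can also compare nested WLS models that use the same weights.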
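A sketch of a nested comparison, where the smaller model drops two regressors from the larger one (illustrative mtcars variables):

```r
data("mtcars")

small <- lm(mpg ~ wt, data = mtcars)               # restricted model
big   <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # unrestricted model

# F-test of the null that the coefficients on hp and qsec are both zero.
cmp <- anova(small, big)
cmp
```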
plot(model)
Purpose: Diagnostic plots of residuals, fitted values, etc.
Use: Check linearity, heteroscedasticity, influential points.
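By default, plot() on an lm object draws four diagnostic plots one after another; a 2 x 2 layout shows them together. A sketch on an illustrative mtcars model:

```r
data("mtcars")
model <- lm(mpg ~ wt + hp, data = mtcars)

op <- par(mfrow = c(2, 2))   # 2 x 2 plotting grid
plot(model)                  # residuals vs fitted, Q-Q, scale-location, leverage
par(op)                      # restore the previous layout
```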
✅ Complete Flow of a Typical Analysis
1. Data Import & Description: read_excel(), Desc(), hist(),
scatter.smooth()
2. Model Estimation: lm(), dynlm(), gls()
3. Residual Analysis: resid(), hist(), jarque.bera.test()
4. Heteroscedasticity: bptest(), White’s Test, WLS
5. Autocorrelation: durbinH(), bgtest(), gls()
6. Multicollinearity: cor(), vif()
7. Specification Errors: resettest(), petest()
8. Model Selection: anova(), plot()