Gretl User's Guide: Econometrics

This document provides a user's guide for Gretl, an open-source statistical package for econometrics. It discusses how to install Gretl, load and structure data files, perform regressions and other analyses, and interpret output. The guide covers the basics of using Gretl as well as more advanced topics like panel data analysis, joining multiple datasets, handling real-time data, and temporal disaggregation.


Gretl User’s Guide

Gnu Regression, Econometrics and Time-series Library

Allin Cottrell
Department of Economics
Wake Forest University

Riccardo “Jack” Lucchetti
Dipartimento di Economia
Università Politecnica delle Marche

March, 2023
Permission is granted to copy, distribute and/or modify this document under the terms of the
GNU Free Documentation License, Version 1.1 or any later version published by the Free Software
Foundation (see http://www.gnu.org/licenses/fdl.html).
Contents

1 Introduction
1.1 Features at a glance
1.2 Acknowledgements
1.3 Installing the programs

I Running the program

2 Getting started
2.1 Let’s run a regression
2.2 Estimation output
2.3 The main window menus
2.4 Keyboard shortcuts
2.5 The gretl toolbar

3 Modes of working
3.1 Command scripts
3.2 Saving script objects
3.3 The gretl console
3.4 The Session concept

4 Data files
4.1 Data file formats
4.2 Databases
4.3 Creating a dataset from scratch
4.4 Structuring a dataset
4.5 Panel data specifics
4.6 Missing data values
4.7 Maximum size of data sets
4.8 Data file collections
4.9 Assembling data from multiple sources

5 Sub-sampling a dataset
5.1 Introduction
5.2 Setting the sample
5.3 Restricting the sample
5.4 Panel data
5.5 Resampling and bootstrapping

6 Graphs and plots
6.1 Gnuplot graphs
6.2 Plotting graphs from scripts
6.3 Boxplots

7 Joining data sources
7.1 Introduction
7.2 Basic syntax
7.3 Filtering
7.4 Matching with keys
7.5 Aggregation
7.6 String-valued key variables
7.7 Importing multiple series
7.8 A real-world case
7.9 The representation of dates
7.10 Time-series data
7.11 Special handling of time columns
7.12 Panel data
7.13 Memo: join options

8 Realtime data
8.1 Introduction
8.2 Atomic format for realtime data
8.3 More on time-related options
8.4 Getting a certain data vintage
8.5 Getting the n-th release for each observation period
8.6 Getting the values at a fixed lag after the observation period
8.7 Getting the revision history for an observation

9 Temporal disaggregation
9.1 Introduction
9.2 Notation and design
9.3 Overview of data handling
9.4 Extrapolation
9.5 Function signature
9.6 Handling of deterministic terms
9.7 Some technical details
9.8 The plot option
9.9 Multiple low-frequency series
9.10 Examples

10 Special functions in genr
10.1 Introduction
10.2 Cumulative densities and p-values
10.3 Retrieving internal variables (dollar accessors)

11 Gretl data types
11.1 Introduction
11.2 Series
11.3 Scalars
11.4 Matrices
11.5 Lists
11.6 Strings
11.7 Bundles
11.8 Arrays
11.9 The life cycle of gretl objects

12 Discrete variables
12.1 Declaring variables as discrete
12.2 Commands for discrete variables

13 Loop constructs
13.1 Introduction
13.2 Loop control variants
13.3 Special controls
13.4 Progressive mode
13.5 Loop examples

14 User-defined functions
14.1 Defining a function
14.2 Calling a function
14.3 Deleting a function
14.4 Function programming details
14.5 Function packages

15 Named lists and strings
15.1 Named lists
15.2 Named strings

16 String-valued series
16.1 Introduction
16.2 Creating a string-valued series
16.3 Permitted operations
16.4 String-valued series and functions
16.5 Other import formats

17 Matrix manipulation
17.1 Creating matrices
17.2 Empty matrices
17.3 Selecting submatrices
17.4 Deleting rows or columns
17.5 Matrix operators
17.6 Matrix–scalar operators
17.7 Matrix functions
17.8 Matrix accessors
17.9 Namespace issues
17.10 Creating a data series from a matrix
17.11 Matrices and lists
17.12 Deleting a matrix
17.13 Printing a matrix
17.14 Example: OLS using matrices

18 Complex matrices
18.1 Introduction
18.2 Creating a complex matrix
18.3 Indexation
18.4 Operators
18.5 Functions
18.6 File input/output
18.7 Backward (in)compatibility

19 Calendar dates
19.1 Introduction
19.2 Date and time representations
19.3 Converting between representations
19.4 Epoch day arithmetic
19.5 Other accessors and functions
19.6 Working with pre-Gregorian dates

20 Handling mixed-frequency data
20.1 Basics
20.2 The notion of a “MIDAS list”
20.3 High-frequency lag lists
20.4 High-frequency first differences
20.5 MIDAS-related plots
20.6 Alternative MIDAS data methods

21 Cheat sheet
21.1 Dataset handling
21.2 Creating/modifying variables
21.3 Neat tricks

II Econometric methods

22 Robust covariance matrix estimation
22.1 Introduction
22.2 Cross-sectional data and the HCCME
22.3 Time series data and HAC covariance matrices
22.4 Special issues with panel data
22.5 The cluster-robust estimator

23 Panel data
23.1 Estimation of panel models
23.2 Autoregressive panel models

24 Dynamic panel models
24.1 Introduction
24.2 Usage
24.3 Replication of DPD results
24.4 Cross-country growth example
24.5 Auxiliary test statistics
24.6 Post-estimation available statistics
24.7 Memo: dpanel options

25 Nonlinear least squares
25.1 Introduction and examples
25.2 Initializing the parameters
25.3 NLS dialog window
25.4 Analytical and numerical derivatives
25.5 Advanced use
25.6 Controlling termination
25.7 Details on the code
25.8 Numerical accuracy

26 Maximum likelihood estimation
26.1 Generic ML estimation with gretl
26.2 Syntax
26.3 Covariance matrix and standard errors
26.4 Gamma estimation
26.5 Stochastic frontier cost function
26.6 GARCH models
26.7 Analytical derivatives
26.8 Debugging ML scripts
26.9 Using functions
26.10 Advanced use of mle: functions, analytical derivatives, algorithm choice
26.11 Estimating constrained models
26.12 Handling non-convergence gracefully

27 GMM estimation
27.1 Introduction and terminology
27.2 GMM as Method of Moments
27.3 OLS as GMM
27.4 TSLS as GMM
27.5 Covariance matrix options
27.6 A real example: the Consumption Based Asset Pricing Model
27.7 Caveats

28 Model selection criteria
28.1 Introduction
28.2 Information criteria

29 Degrees of freedom correction
29.1 Introduction
29.2 Back to basics
29.3 Application to OLS regression
29.4 Beyond OLS
29.5 Consistency and awkward cases
29.6 What gretl does

30 Time series filters
30.1 Fractional differencing
30.2 The Hodrick–Prescott filter
30.3 The Baxter and King filter
30.4 The Butterworth filter
30.5 The discrete Fourier transform

31 Univariate time series models
31.1 Introduction
31.2 ARIMA models
31.3 Unit root tests
31.4 Cointegration test
31.5 ARCH and GARCH

32 Vector Autoregressions
32.1 Notation
32.2 Estimation
32.3 Structural VARs
32.4 Residual-based diagnostic tests

33 Cointegration and Vector Error Correction Models
33.1 Introduction
33.2 Vector Error Correction Models as representation of a cointegrated system
33.3 Interpretation of the deterministic components
33.4 The Johansen cointegration tests
33.5 Identification of the cointegration vectors
33.6 Over-identifying restrictions
33.7 Numerical solution methods

34 Multivariate models
34.1 The system command
34.2 Equation systems within functions
34.3 Restriction and estimation
34.4 System accessors

35 Forecasting
35.1 Introduction
35.2 Saving and inspecting fitted values
35.3 The fcast command
35.4 Univariate forecast evaluation statistics
35.5 Forecasts based on VAR models
35.6 Forecasting from simultaneous systems

36 State Space Modeling
36.1 Introduction
36.2 Notation
36.3 Defining the model as a bundle
36.4 Special features of state-space bundles
36.5 The kfilter function
36.6 The ksmooth function
36.7 The kdsmooth function
36.8 Diffuse initialization of the state vector
36.9 Extensions and refinements
36.10 The ksimul function
36.11 Numerical optimization
36.12 Example scripts
36.13 Graphical interface

37 Numerical methods
37.1 Derivative-based optimization methods
37.2 Derivative-free optimization methods
37.3 Numerical differentiation
37.4 Numerical integration

38 Discrete and censored dependent variables
38.1 Logit and probit models
38.2 Ordered response models
38.3 Multinomial logit
38.4 Bivariate probit
38.5 Panel estimators
38.6 The Tobit model
38.7 Interval regression
38.8 Sample selection model
38.9 Count data
38.10 Duration models

39 Quantile regression
39.1 Introduction
39.2 Basic syntax
39.3 Confidence intervals
39.4 Multiple quantiles
39.5 Large datasets

40 Nonparametric methods
40.1 Locally weighted regression (loess)
40.2 The Nadaraya–Watson estimator

41 MIDAS models
41.1 Parsimonious parameterizations
41.2 Estimating MIDAS models
41.3 Parameterization functions

III Technical details 426

42 Gretl and ODBC 427


42.1 ODBC support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
42.2 ODBC base concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
42.3 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
42.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
42.5 Connectivity details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

43 Gretl and TEX 433


43.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
43.2 TEX-related menu items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
43.3 Fine-tuning typeset output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
43.4 Installing and learning TEX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438

44 Gretl and R 439


44.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
44.2 Starting an interactive R session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
44.3 Running an R script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
44.4 Taking stuff back and forth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
44.5 Interacting with R from the command line . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
44.6 Performance issues with R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
44.7 Further use of the R library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448

45 Gretl and Ox 450


45.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
45.2 Ox support in gretl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
45.3 Illustration: replication of DPD model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452

46 Gretl and Octave 454


46.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
46.2 Octave support in gretl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
46.3 Illustration: spectral methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456

47 Gretl and Stata 458

48 Gretl and Python 460


48.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
48.2 Python support in gretl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
48.3 Illustration: linear regression with multicollinearity . . . . . . . . . . . . . . . . . . . . . 460

49 Gretl and Julia 462


49.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
49.2 Julia support in gretl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
49.3 Illustration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462

50 Troubleshooting gretl 464


50.1 Bug reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
50.2 Auxiliary programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465

51 The command line interface 466

IV Appendices 467

A Data file details 468


A.1 Basic native format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
A.2 Binary data file format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
A.3 Native database format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469

B Building gretl 470


B.1 Installing the prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
B.2 Getting the source: release or git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
B.3 Configure the source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
B.4 Build and install . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473

C Numerical accuracy 475

D Related free software 476

E Listing of URLs 477

Bibliography 478
Chapter 1

Introduction

1.1 Features at a glance


Gretl is an econometrics package, including a shared library, a command-line client program and a
graphical user interface.

User-friendly: Gretl offers an intuitive user interface; it is very easy to get up and running with
econometric analysis. Thanks to its association with the econometrics textbooks by Ramu
Ramanathan, Jeffrey Wooldridge, and James Stock and Mark Watson, the package offers many
practice data files and command scripts. These are well annotated and accessible. Two other
useful resources for gretl users are the available documentation and the gretl-users mailing
list.

Flexible: You can choose your preferred point on the spectrum from interactive point-and-click to
complex scripting, and can easily combine these approaches.

Cross-platform: Gretl’s “home” platform is Linux but it is also available for MS Windows and Mac
OS X, and should work on any unix-like system that has the appropriate basic libraries (see
Appendix B).

Open source: The full source code for gretl is available to anyone who wants to critique it, patch it,
or extend it. See Appendix B.

Sophisticated: Gretl offers a full range of least-squares based estimators, both for single equations
and for systems, including vector autoregressions and vector error correction models. Several
specific maximum likelihood estimators (e.g. probit, ARIMA, GARCH) are also provided
natively; more advanced estimation methods can be implemented by the user via generic
maximum likelihood or nonlinear GMM.

Extensible: Users can enhance gretl by writing their own functions and procedures in gretl’s
scripting language, which includes a wide range of matrix functions.

Accurate: Gretl has been thoroughly tested on several benchmarks, including the NIST reference
datasets. See Appendix C.

Internet ready: Gretl can fetch materials such as databases, collections of textbook datafiles and
add-on packages over the internet.

International: Gretl will produce its output in English, French, Italian, Spanish, Polish, Portuguese,
German, Basque, Turkish, Russian, Albanian or Greek depending on your computer’s native
language setting.

1.2 Acknowledgements
The gretl code base originally derived from the program ESL (“Econometrics Software Library”),
written by Professor Ramu Ramanathan of the University of California, San Diego. We are much in
debt to Professor Ramanathan for making this code available under the GNU General Public Licence
and for helping to steer gretl’s early development.

We are also grateful to the authors of several econometrics textbooks for permission to package for
gretl various datasets associated with their texts. This list currently includes William Greene, au-
thor of Econometric Analysis; Jeffrey Wooldridge (Introductory Econometrics: A Modern Approach);
James Stock and Mark Watson (Introduction to Econometrics); Damodar Gujarati (Basic Economet-
rics); Russell Davidson and James MacKinnon (Econometric Theory and Methods); and Marno Ver-
beek (A Guide to Modern Econometrics).
GARCH estimation in gretl is based on code deposited in the archive of the Journal of Applied
Econometrics by Professors Fiorentini, Calzolari and Panattoni, and the code to generate p-values
for Dickey–Fuller tests is due to James MacKinnon. In each case we are grateful to the authors for
permission to use their work.
With regard to the internationalization of gretl, thanks go to Ignacio Díaz-Emparanza (Spanish),
Michel Robitaille and Florent Bresson (French), Cristian Rigamonti (Italian), Tadeusz Kufel and Pawel
Kufel (Polish), Markus Hahn and Sven Schreiber (German), Hélio Guilherme and Henrique Andrade
(Portuguese), Susan Orbe (Basque), Talha Yalta (Turkish) and Alexander Gedranovich (Russian).
Gretl has benefitted greatly from the work of numerous developers of free, open-source software:
for specifics please see Appendix B. Our thanks are due to Richard Stallman of the Free Software
Foundation, for his support of free software in general and for agreeing to “adopt” gretl as a GNU
program in particular.
Many users of gretl have submitted useful suggestions and bug reports. In this connection par-
ticular thanks are due to Ignacio Díaz-Emparanza, Tadeusz Kufel, Pawel Kufel, Alan Isaac, Cri
Rigamonti, Sven Schreiber, Talha Yalta, Andreas Rosenblad, and Dirk Eddelbuettel, who maintains
the gretl package for Debian GNU/Linux.

1.3 Installing the programs


Linux
On the Linux¹ platform you have the choice of compiling the gretl code yourself or making use of a
pre-built package. Building gretl from source is necessary if you want to access the development
version or customize gretl to your needs, but this requires some technical skill; most users will want to
go for a pre-built package.
Some Linux distributions feature gretl as part of their standard offering: Debian, Ubuntu and Fe-
dora, for example. If this is the case, all you need to do is install gretl through your package
manager of choice. In addition the gretl webpage at http://gretl.sourceforge.net offers a
“generic” package in rpm format for modern Linux systems.
If you prefer to compile your own (or are using a unix system for which pre-built packages are not
available), instructions on building gretl can be found in Appendix B.

MS Windows
The MS Windows version comes as a self-extracting executable. Installation is just a matter of
downloading gretl_install.exe and running this program. You will be prompted for a location
to install the package.

Mac OS X
The Mac version comes as a gzipped disk image. Installation is a matter of downloading the image
file, opening it in the Finder, and dragging Gretl.app to the Applications folder. However, when
installing for the first time two prerequisite packages must be put in place first; details are given
on the gretl website.

¹ In this manual we use “Linux” as shorthand to refer to the GNU/Linux operating system. What is said herein about
Linux mostly applies to other unix-type systems too, though some local modifications may be needed.
Part I

Running the program

Chapter 2

Getting started

2.1 Let’s run a regression


This introduction is mostly angled towards the graphical client program; please see Chapter 51
below and the Gretl Command Reference for details on the command-line program, gretlcli.
You can supply the name of a data file to open as an argument to gretl, but for the moment let’s
not do that: just fire up the program.¹ You should see a main window (which will hold information
on the data set but which is at first blank) and various menus, some of them disabled at first.
What can you do at this point? You can browse the supplied data files (or databases), open a data
file, create a new data file, read the help items, or open a command script. For now let’s browse the
supplied data files. Under the File menu choose “Open data, Sample file”. A second notebook-type
window will open, presenting the sets of data files supplied with the package (see Figure 2.1). Select
the first tab, “Ramanathan”. The numbering of the files in this section corresponds to the chapter
organization of Ramanathan (2002), which contains discussion of the analysis of these data. The
data will be useful for practice purposes even without the text.

Figure 2.1: Practice data files window

If you select a row in this window and click on “Info” this opens a window showing information on
the data set in question (for example, on the sources and definitions of the variables). If you find
a file that is of interest, you may open it by clicking on “Open”, or just double-clicking on the file
name. For the moment let’s open data3-6.

☞ In gretl windows containing lists, double-clicking on a line launches a default action for the associated list
entry: e.g. displaying the values of a data series, opening a file.

¹ For convenience we refer to the graphical client program simply as gretl in this manual. Note, however, that the
specific name of the program differs according to the computer platform. On Linux it is called gretl_x11 while on
MS Windows it is gretl.exe. On Linux systems a wrapper script named gretl is also installed — see also the Gretl
Command Reference.

This file contains data pertaining to a classic econometric “chestnut”, the consumption function.
The data window should now display the name of the current data file, the overall data range and
sample range, and the names of the variables along with brief descriptive tags — see Figure 2.2.

Figure 2.2: Main window, with a practice data file open

OK, what can we do now? The various menu options should be fairly self-explanatory. For
now we’ll dip into the Model menu; a brief tour of all the main window menus is given in Section 2.3
below.
Gretl’s Model menu offers numerous econometric estimation routines. The simplest and
most standard is Ordinary Least Squares (OLS). Selecting OLS pops up a dialog box calling for a
model specification (see Figure 2.3).

Figure 2.3: Model specification dialog

To select the dependent variable, highlight the variable you want in the list on the left and click
the arrow that points to the Dependent variable slot. If you check the “Set as default” box this
variable will be pre-selected as dependent when you next open the model dialog box. Shortcut:
double-clicking on a variable on the left selects it as dependent and also sets it as the default. To
select independent variables, highlight them on the left and click the green arrow (or right-click the
highlighted variable); to remove variables from the selected list, use the red arrow. To select several
variables in the list box, drag the mouse over them; to select several non-contiguous variables, hold
down the Ctrl key and click on the variables you want. To run a regression with consumption as
the dependent variable and income as independent, click Ct into the Dependent slot and add Yt to
the Independent variables list.
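
For readers who prefer scripting, the point-and-click steps above have a short script counterpart. The following is a minimal sketch, assuming the sample file data3-6 is installed with the package and contains the series Ct and Yt as described above; it can be typed into a script window or the gretl console.

```hansl
# open the Ramanathan sample file used in this walk-through
open data3-6
# OLS: consumption (Ct) regressed on a constant and income (Yt)
ols Ct const Yt
```

The keyword const requests an intercept; the remaining names are the regressors.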

2.2 Estimation output


Once you’ve specified a model, a window displaying the regression output will appear. The output
is reasonably comprehensive and in a standard format (Figure 2.4).

Figure 2.4: Model output window

The output window contains menus that allow you to inspect or graph the residuals and fitted
values, and to run various diagnostic tests on the model.
For most models there is also an option to print the regression output in LaTeX format. See
Chapter 43 for details.
To import gretl output into a word processor, you may copy and paste from an output window,
using its Edit menu (or Copy button, in some contexts) to the target program. Many (not all) gretl
windows offer the option of copying in RTF (Microsoft’s “Rich Text Format”) or as LaTeX. If you are
pasting into a word processor, RTF may be a good option because the tabular formatting of the
output is preserved.² Alternatively, you can save the output to a (plain text) file then import the
file into the target program. When you finish a gretl session you are given the option of saving all
the output from the session to a single file.
Note that on the gnome desktop and under MS Windows, the File menu includes a command to
send the output directly to a printer.

☞ When pasting or importing plain text gretl output into a word processor, select a monospaced or typewriter-
style font (e.g. Courier) to preserve the output’s tabular formatting. Select a small font (10-point Courier
should do) to prevent the output lines from being broken in the wrong place.

² Note that when you copy as RTF under MS Windows, Windows will only allow you to paste the material into
applications that “understand” RTF. Thus you will be able to paste into MS Word, but not into notepad. Note also that
there appears to be a bug in some versions of Windows, whereby the paste will not work properly unless the “target”
application (e.g. MS Word) is already running prior to copying the material in question.

2.3 The main window menus


Reading left to right along the main window’s menu bar, we find the File, Tools, Data, View, Add,
Sample, Variable, Model and Help menus.

• File menu

– Open data: Open a native gretl data file or import from other formats. See Chapter 4.
– Append data: Add data to the current working data set, from a gretl data file, a comma-
separated values file or a spreadsheet file.
– Save data: Save the currently open native gretl data file.
– Save data as: Write out the current data set in native format, with the option of using
gzip data compression. See Chapter 4.
– Export data: Write out the current data set in Comma Separated Values (CSV) format, or
the formats of GNU R or GNU Octave. See Chapter 4 and also Appendix D.
– Send to: Send the current data set as an e-mail attachment.
– New data set: Allows you to create a blank data set, ready for typing in values or for
importing series from a database. See below for more on databases.
– Clear data set: Clear the current data set out of memory. Generally you don’t have to do
this (since opening a new data file automatically clears the old one) but sometimes it’s
useful.
– Working directory: Change the current working directory (or “workdir”) and specify re-
lated options. For an explanation of the role of the workdir click the Help button in the
dialog window which is presented, or refer to the documentation of the set command
with the workdir option in the command reference.
– Script files: A “script” is a file containing a sequence of gretl commands. This item
contains entries that let you open a script you have created previously (“User file”), open
a sample script, or open an editor window in which you can create a new script.
– Session files: A “session” file contains a snapshot of a previous gretl session, including
the data set used and any models or graphs that you saved. Under this item you can
open a saved session or save the current session.
– Databases: Allows you to browse various large databases, either on your own computer
or, if you are connected to the internet, on the gretl database server. See Section 4.2 for
details.
– Function packages: Manage user-contributed function packages that extend gretl’s capa-
bilities. To learn more about such packages written in gretl’s built-in matrix and scripting
language “hansl”, please refer to the “Packages” entry in the Help menu.
– Resource from addon: Access example scripts and datafiles that are shipped as part of
gretl’s official “addons”. (Addons are function packages that are more tightly integrated
with the gretl program than standard user-contributed packages.)
– Exit: Quit the program. You’ll be prompted to save any unsaved work.

• Tools menu

– Statistical tables: Look up critical values for commonly used distributions (normal or
Gaussian, t, chi-square, F and Durbin–Watson).
– P-value finder: Look up p-values from the Gaussian, t, chi-square, F, gamma, binomial or
Poisson distributions. See also the pvalue command in the Gretl Command Reference.
– Distribution graphs: Produce graphs of various probability distributions. In the resulting
graph window, the pop-up menu includes an item “Add another curve”, which enables
you to superimpose a further plot (for example, you can draw the t distribution with
various different degrees of freedom).
– Test statistic calculator: Calculate test statistics and p-values for a range of common hy-
pothesis tests (population mean, variance and proportion; difference of means, variances
and proportions).
– Nonparametric tests: Calculate test statistics for various nonparametric tests (Sign test,
Wilcoxon rank sum test, Wilcoxon signed rank test, Runs test).
– Seed for random numbers: Set the seed for the random number generator (by default
this is set based on the system time when the program is started).
– Command log: Open a window containing a record of the commands executed so far.
– Gretl console: Open a “console” window into which you can type commands as you would
using the command-line program, gretlcli (as opposed to using point-and-click).
– Start Gnu R: Start R (if it is installed on your system), and load a copy of the data set
currently open in gretl. See Appendix D.
– Sort variables: Rearrange the listing of variables in the main window, either by ID number
or alphabetically by name.
– Function packages: Handles “function packages” (see Section 14.5), which allow you to
access functions written by other users and share the ones written by you.
– NIST test suite: Check the numerical accuracy of gretl against the reference results for
linear regression made available by the (US) National Institute of Standards and Technol-
ogy.
– Preferences: Set the paths to various files gretl needs to access. Choose the font in which
gretl displays text output. Activate or suppress gretl’s messaging about the availability
of program updates, and so on. See the Gretl Command Reference for further details.
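
Some of the Tools menu items have direct script counterparts. As a rough sketch (the numerical arguments are arbitrary illustrations, not drawn from any dataset), the P-value finder corresponds to the pvalue command and the random-number seed can be fixed with set:

```hansl
# p-value for a t statistic of 2.1 with 20 degrees of freedom
pvalue t 20 2.1
# p-value for a chi-square statistic of 7.8 with 3 degrees of freedom
pvalue X 3 7.8
# fix the seed of the random number generator so that draws are reproducible
set seed 1234567
```

Setting the seed explicitly is mainly useful in scripts whose results should be repeatable from one run to the next.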

• Data menu

– Select all: Several menu items act upon those variables that are currently selected in the
main window. This item lets you select all the variables.
– Display values: Pops up a window with a simple (not editable) printout of the values of
the selected variable or variables.
– Edit values: Opens a spreadsheet window where you can edit the values of the selected
variables.
– Add observations: Gives a dialog box in which you can choose a number of observations
to add at the end of the current dataset; for use with forecasting.
– Remove extra observations: Active only if extra observations have been added automati-
cally in the process of forecasting; deletes these extra observations.
– Read info, Edit info: “Read info” just displays the summary information for the current
data file; “Edit info” allows you to make changes to it (if you have permission to do so).
– Print description: Opens a window containing a full account of the current dataset, in-
cluding the summary information and any specific information on each of the variables.
– Add case markers: Prompts for the name of a text file containing “case markers” (short
strings identifying the individual observations) and adds this information to the data set.
See Chapter 4.
– Remove case markers: Active only if the dataset has case markers identifying the obser-
vations; removes these case markers.
– Dataset structure: Invokes a series of dialog boxes which allow you to change the struc-
tural interpretation of the current dataset. For example, if data were read in as a cross
section you can get the program to interpret them as time series or as a panel. See also
Section 4.4.
– Compact data: For time-series data of higher than annual frequency, gives you the option
of compacting the data to a lower frequency, using one of four compaction methods
(average, sum, start of period or end of period).
– Expand data: For time-series data, gives you the option of expanding the data to a higher
frequency.
– Transpose data: Turn each observation into a variable and vice versa (or in other words,
each row of the data matrix becomes a column in the modified data matrix); can be useful
with imported data that have been read in “sideways”.

• View menu

– Icon view: Opens a window showing the content of the current session as a set of icons;
see section 3.4.
– Graph specified vars: Gives a choice between a time series plot, a regular X–Y scatter
plot, an X–Y plot using impulses (vertical bars), an X–Y plot “with factor separation” (i.e.
with the points colored differently depending on the value of a given dummy variable),
boxplots, and a 3-D graph. Serves up a dialog box where you specify the variables to
graph. See Chapter 6 for details.
– Multiple graphs: Allows you to compose a set of up to six small graphs, either pairwise
scatter-plots or time-series graphs. These are displayed together in a single window.
– Summary statistics: Shows a full set of descriptive statistics for the variables selected in
the main window.
– Correlation matrix: Shows the pairwise correlation coefficients for the selected variables.
– Cross Tabulation: Shows a cross-tabulation of the selected variables. This works only if
at least two variables in the data set have been marked as discrete (see Chapter 12).
– Principal components: Produces a Principal Components Analysis for the selected vari-
ables.
– Mahalanobis distances: Computes the Mahalanobis distance of each observation from
the centroid of the selected set of variables.
– Cross-correlogram: Computes and graphs the cross-correlogram for two selected vari-
ables.

• Add menu: Offers various standard transformations of variables (logs, lags, squares, etc.) that
you may wish to add to the data set. Also gives the option of adding random variables, and
(for time-series data) adding seasonal dummy variables (e.g. quarterly dummy variables for
quarterly data).

• Sample menu

– Set range: Select a different starting and/or ending point for the current sample, within
the range of data available.
– Restore full range: Undo any sample restriction and return to the full range of the dataset.
– Define, based on dummy: Given a dummy (indicator) variable with values 0 or 1, this
drops from the current sample all observations for which the dummy variable has value
0.
– Restrict, based on criterion: Similar to the item above, except that you don’t need a pre-
defined variable: you supply a Boolean expression (e.g. sqft > 1400) and the sample is
restricted to observations satisfying that condition. See the entry for genr in the Gretl
Command Reference for details on the Boolean operators that can be used.
– Random sub-sample: Draw a random sample from the full dataset.


– Drop all obs with missing values: Drop from the current sample all observations for
which at least one variable has a missing value (see Section 4.6).
– Count missing values: Give a report on observations where data values are missing. May
be useful in examining a panel data set, where it’s quite common to encounter missing
values.
– Set missing value code: Set a numerical value that will be interpreted as “missing” or “not
available”. This is intended for use with imported data, when gretl has not recognized
the missing-value code used.
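
The Sample menu items map onto the smpl command. The sketch below assumes the Ramanathan sample file data4-1, which is taken here to contain the series price and sqft; if your copy differs, substitute any cross-sectional file and adjust the names.

```hansl
# assumed sample file with house prices (price) and living area (sqft)
open data4-1
# restrict the sample by a Boolean criterion
smpl sqft > 1400 --restrict
summary price sqft
# restore the full range
smpl --full
# set the range to the first ten observations
smpl 1 10
```

Restrictions of this kind affect subsequent commands only; the underlying dataset is not altered.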

• Variable menu: Most items here operate on a single variable at a time. The “active”
variable is set by highlighting it (clicking on its row) in the main data window. Most options
will be self-explanatory. Note that you can rename a variable and can edit its descriptive label
under “Edit attributes”. You can also “Define a new variable” via a formula (e.g. involving
some function of one or more existing variables). For the syntax of such formulae, look at the
online help for “Generate variable syntax” or see the genr command in the Gretl Command
Reference. One simple example:

foo = x1 * x2

will create a new variable foo as the product of the existing variables x1 and x2. In these
formulae, variables must be referenced by name, not number.

• Model menu: For details on the various estimators offered under this menu please consult the
Gretl Command Reference. Also see Chapter 25 regarding the estimation of nonlinear models.

• Help menu: Please use this as needed! It gives details on the syntax required in various dialog
entries.
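
The transformations offered under the Add menu, and the “Define a new variable” item under the Variable menu, can also be carried out by script. A minimal sketch, again assuming the sample file data3-6 with the series Ct and Yt (the name ratio is purely illustrative):

```hansl
open data3-6
# add the logs of the two series
logs Ct Yt
# add the first lag of income
lags 1 ; Yt
# add the square of income
square Yt
# define a new variable via a formula
series ratio = Ct / Yt
```

The new series appear in the main window alongside the original variables.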

2.4 Keyboard shortcuts


When working in the main gretl window, some common operations may be performed using the
keyboard, as shown in the table below.

Key      Action
Return   Opens a window displaying the values of the currently selected variables: it is the same as selecting “Data, Display Values”.
Delete   Pressing this key has the effect of deleting the selected variables. A confirmation is required, to prevent accidental deletions.
e        Has the same effect as selecting “Edit attributes” from the “Variable” menu.
F2       Same as “e”. Included for compatibility with other programs.
g        Has the same effect as selecting “Define new variable” from the “Variable” menu (which maps onto the genr command).
h        Opens a help window for gretl commands.
F1       Same as “h”. Included for compatibility with other programs.
r        Refreshes the variable list in the main window.
t        Graphs the selected variable; a line graph is used for time-series datasets, whereas a distribution plot is used for cross-sectional data.

2.5 The gretl toolbar


At the bottom left of the main window sits the toolbar.

The icons have the following functions, reading from left to right:

1. Launch a calculator program. A convenience function in case you want quick access to a
calculator when you’re working in gretl. The default program is calc.exe under MS Win-
dows, or xcalc under the X window system. You can change the program under the “Tools,
Preferences, General” menu, “Programs” tab.

2. Start a new script. Opens an editor window in which you can type a series of commands to be
sent to the program as a batch.

3. Open the gretl console. A shortcut to the “Gretl console” menu item (Section 2.3 above).

4. Open the session icon window.

5. Open a window displaying available gretl function packages.

6. Open this manual in PDF format.

7. Open the help item for script commands syntax (i.e. a listing with details of all available
commands).

8. Open the dialog box for defining a graph.

9. Open the dialog box for estimating a model using ordinary least squares.

10. Open a window listing the sample datasets supplied with gretl, and any other data file collec-
tions that have been installed.
Chapter 3

Modes of working

3.1 Command scripts


As you execute commands in gretl, using the GUI and filling in dialog entries, those commands are
recorded in the form of a “script” or batch file. Such scripts can be edited and re-run, using either
gretl or the command-line client, gretlcli.
To view the current state of the script at any point in a gretl session, choose “Command log” under
the Tools menu. This log file is called session.inp and it is overwritten whenever you start a new
session. To preserve it, save the script under a different name. Script files will be found most easily,
using the GUI file selector, if you name them with the extension “.inp”.
To open a script you have written independently, use the “File, Script files” menu item; to create a
script from scratch use the “File, Script files, New script” item or the “new script” toolbar button.
In either case a script window will open (see Figure 3.1).

Figure 3.1: Script window, editing a command file

The toolbar at the top of the script window offers the following functions (left to right): (1) Save
the file; (2) Save the file under a specified name; (3) Print the file (this option is not available on all
platforms); (4) Execute the commands in the file; (5) Copy selected text; (6) Paste the selected text;
(7) Find and replace text; (8) Undo the last Paste or Replace action; (9) Help (if you place the cursor
in a command word and press the question mark you will get help on that command); (10) Close
the window.
When you execute the script, by clicking on the Execute icon or by pressing Ctrl-r, all output is
directed to a single window, where it can be edited, saved or copied to the clipboard. To learn
more about the possibilities of scripting, take a look at the gretl Help item “Command reference,”
or start up the command-line program gretlcli and consult its help, or consult the Gretl Command
Reference.
If you run the script when part of it is highlighted, gretl will only run that portion. Moreover, if you
want to run just the current line, you can do so by pressing Ctrl-Enter.¹
Clicking the right mouse button in the script editor window produces a pop-up menu. This gives
you the option of executing either the line on which the cursor is located, or the selected region of
the script if there’s a selection in place. If the script is editable, this menu also gives the option of
adding or removing comment markers from the start of the line or lines.
The gretl package includes over 70 example scripts. Many of these relate to Ramanathan (2002),
but they may also be used as a free-standing introduction to scripting in gretl and to various points
of econometric theory. You can explore the example files under “File, Script files, Example scripts”.
There you will find a listing of the files along with a brief description of the points they illustrate
and the data they employ. Open any file and run it to see the output. Note that long commands in
a script can be broken over two or more lines, using backslash as a continuation character.
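
As an illustration of the continuation character, here is a short sketch. It assumes the sample file data4-1 and the series names price, sqft, bedrms and baths; any regression with a long list of regressors would serve equally well.

```hansl
open data4-1
# one command, split over two lines by a trailing backslash
ols price const sqft \
  bedrms baths
```

The backslash must be the last character on the line; the indentation of the continued line is only for readability.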
You can, if you wish, use the GUI controls and the scripting approach in tandem, exploiting each
method where it offers greater convenience. Here are two suggestions.

• Open a data file in the GUI. Explore the data — generate graphs, run regressions, perform tests.
Then open the Command log, edit out any redundant commands, and save it under a specific
name. Run the script to generate a single file containing a concise record of your work.

• Start by establishing a new script file. Type in any commands that may be required to set
up transformations of the data (see the genr command in the Gretl Command Reference).
Typically this sort of thing can be accomplished more efficiently via commands assembled
with forethought rather than point-and-click. Then save and run the script: the GUI data
window will be updated accordingly. Now you can carry out further exploration of the data
via the GUI. To revisit the data at a later point, open and rerun the “preparatory” script first.
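
A “preparatory” script of the kind described in the second suggestion might look like the sketch below. The data file (data4-1, one of the sample files supplied with gretl) and the names of the new series are chosen purely for illustration.

```hansl
# Preparatory script: open the data, then define the transformations
# that later GUI work will rely on. File and series names are examples.
open data4-1
series l_price = log(price)    # log of the sale price
series l_sqft = log(sqft)      # log of the living area
series sqft2 = sqft^2          # squared term for a quadratic specification
```

After this is run from the script editor, the new series should appear in the main window, ready for point-and-click exploration.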

Scripts and data files


One common way of doing econometric research with gretl is as follows: compose a script; execute
the script; inspect the output; modify the script; run it again — with the last three steps repeated as
many times as necessary. In this context, note that when you open a data file this clears out most
of gretl’s internal state. It’s therefore probably a good idea to have your script start with an open
command: the data file will be re-opened each time, and you can be confident you’re getting “fresh”
results.
One further point should be noted. When you go to open a new data file via the graphical interface,
you are always prompted: opening a new data file will lose any unsaved work, do you really want
to do this? When you execute a script that opens a data file, however, you are not prompted. The
assumption is that in this case you’re not going to lose any work, because the work is embodied
in the script itself (and it would be annoying to be prompted at each iteration of the work cycle
described above).
This means you should be careful if you’ve done work using the graphical interface and then decide
to run a script: the current data file will be replaced without any questions asked, and it’s your
responsibility to save any changes to your data first.

1 This feature is not unique to gretl; other econometric packages offer the same facility. However, experience shows
that while this can be remarkably useful, it can also lead to writing dinosaur scripts that are never meant to be executed
all at once, but rather used as a chaotic repository to cherry-pick snippets from. Since gretl allows you to have several
script windows open at the same time, you may want to keep your scripts tidy and reasonably small.

3.2 Saving script objects


When you estimate a model using point-and-click, the model results are displayed in a separate
window, offering menus which let you perform tests, draw graphs, save data from the model, and
so on. Ordinarily, when you estimate a model using a script you just get a non-interactive printout
of the results. You can, however, arrange for models estimated in a script to be “captured”, so that
you can examine them interactively when the script is finished. Here is an example of the syntax
for achieving this effect:

Model1 <- ols Ct 0 Yt

That is, you type a name for the model to be saved under, then a back-pointing “assignment arrow”,
then the model command. The assignment arrow is composed of the less-than sign followed by a
dash; it must be separated by spaces from both the preceding name and the following command.
The name for a saved object may include spaces, but in that case it must be wrapped in double
quotes:

"Model 1" <- ols Ct 0 Yt

Models saved in this way will appear as icons in the gretl icon view window (see Section 3.4) after
the script is executed. In addition, you can arrange to have a named model displayed (in its own
window) automatically as follows:

Model1.show

Again, if the name contains spaces it must be quoted:

"Model 1".show

The same facility can be used for graphs. For example the following will create a plot of Ct against
Yt, save it under the name “CrossPlot” (it will appear under this name in the icon view window),
and have it displayed:

CrossPlot <- gnuplot Ct Yt
CrossPlot.show

You can also save the output from selected commands as named pieces of text (again, these will
appear in the session icon window, from where you can open them later). For example this com-
mand sends the output from an augmented Dickey–Fuller test to a “text object” named ADF1 and
displays it in a window:

ADF1 <- adf 2 x1
ADF1.show

Objects saved in this way (whether models, graphs or pieces of text output) can be destroyed using
the command .free appended to the name of the object, as in ADF1.free.

3.3 The gretl console


A further option is available for your computing convenience. Under gretl’s “Tools” menu you will
find the item “Gretl console” (there is also an “open gretl console” button on the toolbar in the
main window). This opens up a window in which you can type commands and execute them one
by one (by pressing the Enter key) interactively. This is essentially the same as gretlcli’s mode of
operation, except that the GUI is updated based on commands executed from the console, enabling
you to work back and forth as you wish.
In the console, you have “command history”; that is, you can use the up and down arrow keys to
navigate the list of commands you have entered to date. You can retrieve, edit and then re-enter a
previous command.
In console mode, you can create, display and free objects (models, graphs or text) as described
above for script mode.

3.4 The Session concept


Gretl offers the idea of a “session” as a way of keeping track of your work and revisiting it later.
The basic idea is to provide an iconic space containing various objects pertaining to your current
working session (see Figure 3.2). You can add objects (represented by icons) to this space as you
go along. If you save the session, these added objects should be available again if you re-open the
session later.

Figure 3.2: Icon view: one model and one graph have been added to the default icons

If you start gretl and open a data set, then select “Icon view” from the View menu, you should see
the basic default set of icons: these give you quick access to information on the data set (if any),
correlation matrix (“Correlations”) and descriptive summary statistics (“Summary”). All of these
are activated by double-clicking the relevant icon. The “Data set” icon is a little more complex:
double-clicking opens up the data in the built-in spreadsheet, but you can also right-click on the
icon for a menu of other actions.
To add a model to the Icon view, first estimate it using the Model menu. Then pull down the File
menu in the model window and select “Save to session as icon...” or “Save as icon and close”.
Simply hitting the S key over the model window is a shortcut to the latter action.
To add a graph, first create it (under the View menu, “Graph specified vars”, or via one of gretl’s
other graph-generating commands). Click on the graph window to bring up the graph menu, and
select “Save to session as icon”.
Once a model or graph is added its icon will appear in the Icon view window. Double-clicking on the
icon redisplays the object, while right-clicking brings up a menu which lets you display or delete
the object. This popup menu also gives you the option of editing graphs.

The model table


In econometric research it is common to estimate several models with a common dependent
variable—the models differing in respect of which independent variables are included, or per-
haps in respect of the estimator used. In this situation it is convenient to present the regression
results in the form of a table, where each column contains the results (coefficient estimates and
standard errors) for a given model, and each row contains the estimates for a given variable across
the models. Note that some estimation methods are not compatible with the straightforward model
table format, so gretl will not let those models be added to the model table. These methods
include non-linear least squares (nls), generic maximum-likelihood estimators (mle), generic
GMM (gmm), dynamic panel models (dpanel), interval regressions (intreg), bivariate probit models
(biprobit), AR(I)MA models (arima or arma), and (G)ARCH models (garch and arch).
In the Icon view window gretl provides a means of constructing such a table (and copying it in plain
text, LaTeX or Rich Text Format). The procedure is outlined below. (The model table can also be built
non-interactively, in script mode; see the entry for modeltab in the Gretl Command Reference.)

1. Estimate a model which you wish to include in the table, and in the model display window,
under the File menu, select “Save to session as icon” or “Save as icon and close”.

2. Repeat step 1 for the other models to be included in the table (up to a total of six models).

3. When you are done estimating the models, open the icon view of your gretl session, by se-
lecting “Icon view” under the View menu in the main gretl window, or by clicking the “session
icon view” icon on the gretl toolbar.

4. In the Icon view, there is an icon labeled “Model table”. Decide which model you wish to
appear in the left-most column of the model table and add it to the table, either by dragging
its icon onto the Model table icon, or by right-clicking on the model icon and selecting “Add
to model table” from the pop-up menu.

5. Repeat step 4 for the other models you wish to include in the table. The second model selected
will appear in the second column from the left, and so on.

6. When you are finished composing the model table, display it by double-clicking on its icon.
Under the Edit menu in the window which appears, you have the option of copying the table
to the clipboard in various formats.

7. If the ordering of the models in the table is not what you wanted, right-click on the model
table icon and select “Clear table”. Then go back to step 4 above and try again.
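
The script route to the same result can be sketched as follows. This assumes the sample file data4-1 supplied with gretl; the modeltab add and modeltab show actions are used as described in the Gretl Command Reference, where each add applies to the model most recently estimated.

```hansl
# Build a two-column model table in script mode.
# Assumption: data4-1 (series price, sqft, bedrms, baths) is available.
open data4-1
ols price const sqft
modeltab add                       # first (left-most) column
ols price const sqft bedrms baths
modeltab add                       # second column
modeltab show                      # print the assembled table
```

Both models here are estimated by OLS with a common dependent variable, which is the situation the model table is designed for.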

A simple instance of gretl’s model table is shown in Figure 3.3.

Figure 3.3: Example of model table



The graph page


The “graph page” icon in the session window offers a means of putting together several graphs
for printing on a single page. This facility will work only if you have the LaTeX typesetting system
installed, and are able to generate and view either PDF or PostScript output. The output format
is controlled by your choice of program for compiling TeX files, which can be found under the
“Programs” tab in the Preferences dialog box (under the “Tools” menu in the main window). Usually
this should be pdflatex for PDF output or latex for PostScript. In the latter case you must have a
working set-up for handling PostScript, which will usually include dvips, ghostscript and a viewer
such as gv, ggv or kghostview.
In the Icon view window, you can drag up to eight graphs onto the graph page icon. When you
double-click on the icon (or right-click and select “Display”), a page containing the selected graphs
(in PDF or EPS format) will be composed and opened in your viewer. From there you should be able
to print the page.
To clear the graph page, right-click on its icon and select “Clear”.
As with the model table, it is also possible to manipulate the graph page via commands in script or
console mode—see the entry for the graphpg command in the Gretl Command Reference.
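A minimal sketch of the script or console route is given here, under several assumptions: it is run within the GUI program (named graphs and the graph page are session features), LaTeX is installed, and the sample file data4-1 is available. The graph names g1 and g2 are made up for illustration, and the add and show actions should be checked against the graphpg entry in the Gretl Command Reference.

```hansl
# Put two saved graphs on the graph page and display it.
# Assumptions: GUI session, LaTeX installed, sample file data4-1.
open data4-1
g1 <- gnuplot price sqft      # save a named graph as a session icon
graphpg add                   # add the graph just saved to the graph page
g2 <- gnuplot price bedrms
graphpg add
graphpg show                  # compose the page and open it in the viewer
```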

Saving and re-opening sessions


If you create models or graphs that you think you may wish to re-examine later, then before quitting
gretl select “Session files, Save session” from the File menu and give a name under which to save
the session. To re-open the session later, either

• Start gretl then re-open the session file via the “File, Session files, Open session” menu item, or

• From the command line, type gretl -r sessionfile, where sessionfile is the name under which
the session was saved, or

• Drag the icon representing a session file onto gretl.


Chapter 4

Data files

4.1 Data file formats


Gretl has its own native format for data files. Most users will probably not want to read or write
such files outside of gretl itself, but occasionally this may be useful and details on the file formats
are given in Appendix A. The program can also import data from a variety of other formats. In
the GUI program this can be done via the “File, Open Data, User file” menu — note the drop-down
list of acceptable file types. In script mode, simply use the open command. The supported import
formats are as follows.

• Plain text files (comma-separated or “CSV” being the most common type). For details on what
gretl expects of such files, see Section 4.3.

• Spreadsheets: MS Excel, Gnumeric and Open Document (ODS). The requirements for such files
are given in Section 4.3.

• Stata data files (.dta).

• SPSS data files (.sav).

• SAS “xport” files (.xpt).

• Eviews workfiles (.wf1).1

• JMulTi data files.

When you import data from a plain text format, gretl opens a “diagnostic” window, reporting on its
progress in reading the data. If you encounter a problem with ill-formatted data, the messages in
this window should give you a handle on fixing the problem.
Note that gretl has a facility for writing out data in the native formats of GNU R, Octave, JMulTi and
PcGive (see Appendix D). In the GUI client this option is found under the “File, Export data” menu;
in the command-line client use the store command with the appropriate option flag.
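By way of illustration, the sketch below writes two series out as a comma-separated file and then reads that file back in with open. It assumes the sample file data4-1 supplied with gretl; the output name houses.csv is arbitrary, and the CSV format is selected here by the .csv suffix (see the store entry in the Gretl Command Reference for the option flags that select the R, Octave, JMulTi and PcGive formats).

```hansl
# Round trip: native sample data -> CSV -> back into gretl.
# Assumption: data4-1 is available; houses.csv is a made-up file name.
open data4-1
store houses.csv price sqft   # format chosen by the .csv suffix
open houses.csv               # import the plain text file just written
summary price sqft
```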

4.2 Databases


For working with large amounts of data gretl is supplied with a database-handling routine. A
database, as opposed to a data file, is not read directly into the program’s workspace. A database
can contain series of mixed frequencies and sample ranges. You open the database and select
series to import into the working dataset. You can then save those series in a native format data
file if you wish. Databases can be accessed via the menu item “File, Databases”.
For details on the format of gretl databases, see Appendix A.

1 See http://users.wfu.edu/cottrell/eviews_format/.


Online access to databases


Several gretl databases are available from Wake Forest University. Your computer must be con-
nected to the internet for this option to work. Please see the description of the “data” command
under the Help menu.

☞ Visit the gretl data page for details and updates on available data.

Foreign database formats


Thanks to Thomas Doan of Estima, who made available the specification of the database format
used by RATS 4 (Regression Analysis of Time Series), gretl can handle such databases — or at least,
a subset of same, namely time-series databases containing monthly and quarterly series.
Gretl can also import data from PcGive databases. These take the form of a pair of files, one
containing the actual data (with suffix .bn7) and one containing supplementary information (.in7).
In addition, gretl offers ODBC connectivity. Be warned: this feature is meant for somewhat ad-
vanced users; there is currently no graphical interface. Interested readers will find more info in
appendix 42.

4.3 Creating a dataset from scratch


There are several ways of doing this:

1. Find, or create using a text editor, a plain text data file and open it via “Import”.

2. Use your favorite spreadsheet to establish the data file, save it in comma-separated format if
necessary (this may not be necessary if the spreadsheet format is MS Excel, Gnumeric or Open
Document), then use one of the “Import” options.

3. Use gretl’s built-in spreadsheet.

4. Select data series from a suitable database.

5. Use your favorite text editor or other software tools to create a data file in gretl format
independently.

Here are a few comments and details on these methods.

Common points on imported data


Options (1) and (2) involve using gretl’s “import” mechanism. For the program to read such data
successfully, certain general conditions must be satisfied:

• The first row must contain valid variable names. A valid variable name is of 31 characters
maximum; starts with a letter; and contains nothing but letters, numbers and the underscore
character, _. (Longer variable names will be truncated to 31 characters.) Qualifications to the
above: First, in the case of a plain text import, if the file contains no row with variable names
the program will automatically add names, v1, v2 and so on. Second, by “the first row” is
meant the first relevant row. In the case of plain text imports, blank rows and rows beginning
with a hash mark, #, are ignored. In the case of Excel, Gnumeric and ODS imports, you are
presented with a dialog box where you can select an offset into the spreadsheet, so that gretl
will ignore a specified number of rows and/or columns.

• Data values: these should constitute a rectangular block, with one variable per column (and
one observation per row). The number of variables (data columns) must match the number
of variable names given. See also section 4.6. Numeric data are expected, but in the case of
importing from plain text, the program offers limited handling of character (string) data: if
a given column contains character data only, consecutive numeric codes are substituted for
the strings, and once the import is complete a table is printed showing the correspondence
between the strings and the codes.

• Dates (or observation labels): Optionally, the first column may contain strings such as dates,
or labels for cross-sectional observations. Such strings have a maximum of 15 characters (as
with variable names, longer strings will be truncated). A column of this sort should be headed
with the string obs or date, or the first row entry may be left blank.
For dates to be recognized as such, the date strings should adhere to one or other of a set of
specific formats, as follows. For annual data: 4-digit years. For quarterly data: a 4-digit year,
followed by a separator (either a period, a colon, or the letter Q), followed by a 1-digit quarter.
Examples: 1997.1, 2002:3, 1947Q1. For monthly data: a 4-digit year, followed by a period or
a colon, followed by a two-digit month. Examples: 1997.01, 2002:10.

Plain text (“CSV”) files can use comma, space, tab or semicolon as the column separator. When you
open such a file via the GUI you are given the option of specifying the separator, though in most
cases it should be detected automatically.
If you use a spreadsheet to prepare your data you are able to carry out various transformations of
the “raw” data with ease (adding things up, taking percentages or whatever): note, however, that
you can also do this sort of thing easily — perhaps more easily — within gretl, by using the tools
under the “Add” menu.

Appending imported data


You may wish to establish a dataset piece by piece, by incremental importation of data from other
sources. This is supported via the “File, Append data” menu items: gretl will check the new data for
conformability with the existing dataset and, if everything seems OK, will merge the data. You can
add new variables in this way, provided the data frequency matches that of the existing dataset. Or
you can append new observations for data series that are already present; in this case the variable
names must match up correctly. Note that by default (that is, if you choose “Open data” rather
than “Append data”), opening a new data file closes the current one.
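In a script the same operation is available via the append command. The following sketch is self-contained under the assumption that the sample file data4-1 is available: it first writes two CSV files holding different variables for the same observations (the file names part1.csv and part2.csv are invented for the example), then rebuilds the dataset piece by piece.

```hansl
# Create two partial files from a sample dataset (names are examples)
open data4-1
store part1.csv price sqft
store part2.csv bedrms baths

# Establish the dataset incrementally
open part1.csv       # starts a new dataset with price and sqft
append part2.csv     # adds bedrms and baths, if the data conform
varlist              # list the series now present
```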

Using the built-in spreadsheet


Under the “File, New data set” menu you can choose the sort of dataset you want to establish (e.g.
quarterly time series, cross-sectional). You will then be prompted for starting and ending dates (or
observation numbers) and the name of the first variable to add to the dataset. After supplying this
information you will be faced with a simple spreadsheet into which you can type data values. In
the spreadsheet window, clicking the right mouse button will invoke a popup menu which enables
you to add a new variable (column), to add an observation (append a row at the foot of the sheet),
or to insert an observation at the selected point (move the data down and insert a blank row).
Once you have entered data into the spreadsheet you import these into gretl’s workspace using the
spreadsheet’s “Apply changes” button.
Please note that gretl’s spreadsheet is quite basic and has no support for functions or formulas.
Data transformations are done via the “Add” or “Variable” menus in the main window.

Selecting from a database


Another alternative is to establish your dataset by selecting variables from a database.
Begin with the “File, Databases” menu item. This has four forks: “Gretl native”, “RATS 4”, “PcGive”
and “On database server”. You should be able to find the file fedstl.bin in the file selector that
opens if you choose the “Gretl native” option since this file, which contains a large collection of US
macroeconomic time series, is supplied with the distribution.
You won’t find anything under “RATS 4” unless you have purchased RATS data.2 If you do possess
RATS data you should go into the “Tools, Preferences, General” dialog, select the Databases tab,
and fill in the correct path to your RATS files.
If your computer is connected to the internet you should find several databases (at Wake Forest
University) under “On database server”. You can browse these remotely; you also have the option
of installing them onto your own computer. The initial remote databases window has an item
showing, for each file, whether it is already installed locally (and if so, whether the local version
is up to date with the version at Wake Forest).
Assuming you have managed to open a database you can import selected series into gretl’s workspace
by using the “Series, Import” menu item in the database window, or via the popup menu that ap-
pears if you click the right mouse button, or by dragging the series into the program’s main window.

Creating a gretl data file independently


It is possible to create a data file in one or other of gretl’s own formats using a text editor or
software tools such as awk, sed or perl. This may be a good choice if you have large amounts
of data already in machine readable form. You will, of course, need to study these data formats
(XML-based or “traditional”) as described in Appendix A.

4.4 Structuring a dataset


Once your data are read by gretl, it may be necessary to supply some information on the nature of
the data. We distinguish between three kinds of datasets:

1. Cross section

2. Time series

3. Panel data

The primary tool for doing this is the “Data, Dataset structure” menu entry in the graphical inter-
face, or the setobs command for scripts and the command-line interface.
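For the first two kinds of dataset, a script-mode sketch might run as follows. It uses nulldata to create an empty 40-observation dataset so that it does not depend on any particular file; the frequency and starting date are arbitrary choices for the example. (Panel structure is taken up in Section 4.5.)

```hansl
# Impose, then remove, a time-series interpretation on an empty dataset
nulldata 40
setobs 4 1990:1 --time-series   # quarterly data starting in 1990, quarter 1
setobs 1 1 --cross-section      # revert to an undated cross section
```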

Cross sectional data


By a cross section we mean observations on a set of “units” (which may be firms, countries, indi-
viduals, or whatever) at a common point in time. This is the default interpretation for a data file:
if there is insufficient information to interpret data as time-series or panel data, they are automat-
ically interpreted as a cross section. In the unlikely event that cross-sectional data are wrongly
interpreted as time series, you can correct this by selecting the “Data, Dataset structure” menu
item. Click the “cross-sectional” radio button in the dialog box that appears, then click “Forward”.
Click “OK” to confirm your selection.

Time series data


When you import data from a spreadsheet or plain text file, gretl will make fairly strenuous efforts
to glean time-series information from the first column of the data, if it looks at all plausible that
such information may be present. If time-series structure is present but not recognized, again you
can use the “Data, Dataset structure” menu item. Select “Time series” and click “Forward”; select the
appropriate data frequency and click “Forward” again; then select or enter the starting observation
and click “Forward” once more. Finally, click “OK” to confirm the time-series interpretation if it is
correct (or click “Back” to make adjustments if need be).

2 See www.estima.com

Besides the basic business of getting a data set interpreted as time series, further issues may arise
relating to the frequency of time-series data. In a gretl time-series data set, all the series must
have the same frequency. Suppose you wish to make a combined dataset using series that, in their
original state, are not all of the same frequency. For example, some series are monthly and some
are quarterly.
Your first step is to formulate a strategy: Do you want to end up with a quarterly or a monthly data
set? A basic point to note here is that “compacting” data from a higher frequency (e.g. monthly) to
a lower frequency (e.g. quarterly) is usually unproblematic. You lose information in doing so, but
in general it is perfectly legitimate to take (say) the average of three monthly observations to create
a quarterly observation. On the other hand, “expanding” data from a lower to a higher frequency is
not, in general, a valid operation.
In most cases, then, the best strategy is to start by creating a data set of the lower frequency, and
then to compact the higher frequency data to match. When you import higher-frequency data from
a database into the current data set, you are given a choice of compaction method (average, sum,
start of period, or end of period). In most instances “average” is likely to be appropriate.
You can also import lower-frequency data into a high-frequency data set, but this is generally not
recommended. What gretl does in this case is simply replicate the values of the lower-frequency
series as many times as required. For example, suppose we have a quarterly series with the value
35.5 in 1990:1, the first quarter of 1990. On expansion to monthly, the value 35.5 will be assigned
to the observations for January, February and March of 1990. The expanded variable is therefore
useless for fine-grained time-series analysis, outside of the special case where you know that the
variable in question does in fact remain constant over the sub-periods.
When the current data frequency is appropriate, gretl offers both “Compact data” and “Expand
data” options under the “Data” menu. These options operate on the whole data set, compacting or
expanding all series. They should be considered “expert” options and should be used with caution.
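In script mode, whole-dataset compaction is handled by the dataset command. The sketch below is an illustration under stated assumptions: it builds an artificial monthly dataset with nulldata, and it relies on averaging being the method applied when none is specified (see the dataset entry in the Gretl Command Reference for the alternatives).

```hansl
# Compact an artificial monthly dataset to quarterly frequency
nulldata 24
setobs 12 1990:01 --time-series   # two years of monthly observations
series x = index                  # 1, 2, ..., 24
dataset compact 4                 # compact the whole dataset to quarterly
print x --byobs                   # inspect the compacted series
```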

Panel data


Panel data are inherently three dimensional — the dimensions being variable, cross-sectional unit,
and time-period. For example, a particular number in a panel data set might be identified as the
observation on capital stock for General Motors in 1980. (A note on terminology: we use the
terms “cross-sectional unit”, “unit” and “group” interchangeably below to refer to the entities that
compose the cross-sectional dimension of the panel. These might, for instance, be firms, countries
or persons.)
For representation in a textual computer file (and also for gretl’s internal calculations) the three
dimensions must somehow be flattened into two. This “flattening” involves taking layers of the
data that would naturally stack in a third dimension, and stacking them in the vertical dimension.
Gretl always expects data to be arranged “by observation”, that is, such that each row represents an
observation (and each variable occupies one and only one column). In this context the flattening of
a panel data set can be done in either of two ways:

• Stacked time series: the successive vertical blocks each comprise a time series for a given
unit.

• Stacked cross sections: the successive vertical blocks each comprise a cross-section for a
given period.

You may input data in whichever arrangement is more convenient. Internally, however, gretl always
stores panel data in the form of stacked time series.

4.5 Panel data specifics


When you import panel data into gretl from a spreadsheet or comma-separated format, the panel
nature of the data will not be recognized automatically (most likely the data will be treated as
“undated”). A panel interpretation can be imposed on the data using the graphical interface or via
the setobs command.
In the graphical interface, use the menu item “Data, Dataset structure”. In the first dialog box
that appears, select “Panel”. In the next dialog you have a three-way choice. The first two options,
“Stacked time series” and “Stacked cross sections” are applicable if the data set is already organized
in one of these two ways. If you select either of these options, the next step is to specify the number
of cross-sectional units in the data set. The third option, “Use index variables”, is applicable if the
data set contains two variables that index the units and the time periods respectively; the next step
is then to select those variables. For example, a data file might contain a country code variable and
a variable representing the year of the observation. In that case gretl can reconstruct the panel
structure of the data regardless of how the observation rows are organized.
The setobs command has options that parallel those in the graphical interface. If suitable index
variables are available you can do, for example

setobs unitvar timevar --panel-vars

where unitvar is a variable that indexes the units and timevar is a variable indexing the periods.
Alternatively you can use the form setobs freq 1:1 structure, where freq is replaced by the “block
size” of the data (that is, the number of periods in the case of stacked time series, or the number
of units in the case of stacked cross-sections) and structure is either --stacked-time-series or
--stacked-cross-section. Two examples are given below: the first is suitable for a panel in
the form of stacked time series with observations from 20 periods; the second for stacked cross
sections with 5 units.

setobs 20 1:1 --stacked-time-series
setobs 5 1:1 --stacked-cross-section
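
For a self-contained illustration of the index-variable form, the following sketch constructs a tiny artificial panel of 2 units observed over 3 periods and then imposes the panel structure. The series names unit and year, and the way they are generated, are assumptions made purely for the example.

```hansl
# Artificial panel: 2 units x 3 years, identified by index variables
nulldata 6
series unit = ceil(index / 3)                 # 1, 1, 1, 2, 2, 2
series year = 2000 + index - 3 * (unit - 1)   # 2001..2003 for each unit
setobs unit year --panel-vars
print unit year --byobs
```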

Panel data arranged by variable


Publicly available panel data sometimes come arranged “by variable.” Suppose we have data on two
variables, x1 and x2, for each of 50 states in each of 5 years (giving a total of 250 observations
per variable). One textual representation of such a data set would start with a block for x1, with
50 rows corresponding to the states and 5 columns corresponding to the years. This would be
followed, vertically, by a block with the same structure for variable x2. A fragment of such a data
file is shown below, with quinquennial observations 1965–1985. Imagine the table continued for
48 more states, followed by another 50 rows for variable x2.

        x1
     1965   1970   1975   1980   1985
AR  100.0  110.5  118.7  131.2  160.4
AZ  100.0  104.3  113.8  120.9  140.6

If a datafile with this sort of structure is read into gretl,3 the program will interpret the columns as
distinct variables, so the data will not be usable “as is.” But there is a mechanism for correcting the
situation, namely the stack function.
Consider the first data column in the fragment above: the first 50 rows of this column constitute a
cross-section for the variable x1 in the year 1965. If we could create a new series by stacking the
first 50 entries in the second column underneath the first 50 entries in the first, we would be on the
way to making a data set “by observation” (in the second of the two forms mentioned above, stacked
cross-sections). That is, we’d have a column comprising a cross-section for x1 in 1965, followed by
a cross-section for the same variable in 1970.

3 Note that you will have to modify such a datafile slightly before it can be read at all. The line containing the variable
name (in this example x1) will have to be removed, and so will the initial row containing the years, otherwise they will be
taken as numerical data.

The following gretl script illustrates how we can accomplish the stacking, for both x1 and x2. We
assume that the original data file is called panel.txt, and that in this file the columns are headed
with “variable names” v1, v2, ..., v5. (The columns are not really variables, but in the first instance
we “pretend” that they are.)

open panel.txt
series x1 = stack(v1..v5, 50)
series x2 = stack(v1..v5, 50, 50)
setobs 50 1:1 --stacked-cross-section
store panel.gdt x1 x2

The second and third lines illustrate the syntax of the stack function, which has this signature:

series stack(list L, scalar length, scalar offset)

• L: a list of series on which to operate.

• length: an integer giving the number of observations to take from each series.

• offset: an integer giving the offset from the top of the dataset at which to start taking values
(optional, defaults to 0).

The “..” syntax in the example above constructs a list of the 5 contiguous series to be stacked.
More generally, you can define a named list of series and pass that as the first argument to stack
(see chapter 15). In this example we’re supposing that the full data set contains 100 rows, and that
in the stacking of variable x1 we wish to read only the first 50 rows from each column, so we give
50 as the second argument.
On line 3 we do the stacking for variable x2. Again we want a length of 50 for the components of
the stacked series, but this time we want to skip the first 50 rows of the original data and start
reading at row 51, and so we add a third offset argument of 50. Line 4 then imposes a panel
interpretation on the data. Finally, we save the stacked data to file, with the panel interpretation.
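As a small illustration of how the length and offset arguments interact, the following sketch builds a toy dataset in place of panel.txt: 2 units, 2 periods and 2 variables, so 4 rows in all. The artificial values and the use of nulldata are assumptions made purely for the example; only the stack calls mirror the script above.

```hansl
# Toy stand-in for panel.txt (hypothetical data): rows 1-2 hold x1, rows 3-4 hold x2
nulldata 4
genr index
series v1 = index          # "period 1" column: 1, 2, 3, 4
series v2 = 10 * index     # "period 2" column: 10, 20, 30, 40
list L = v1 v2

series x1 = stack(L, 2)      # take 2 rows from each column, starting at the top
series x2 = stack(L, 2, 2)   # take 2 rows from each column, skipping the first 2
setobs 2 1:1 --stacked-cross-section
print x1 x2 --byobs
```

Going by the description of stack given above, x1 should come out as 1, 2, 10, 20 and x2 as 3, 4, 30, 40.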
The illustrative script above is appropriate when the number of variables to be processed is small.
When there are many variables in the dataset it will be more convenient to use a loop to accomplish
the stacking, as shown in the following script. The setup is presumed to be the same as in the
previous case (50 units, 5 periods), but with 20 variables rather than 2.

open panel.txt
list L = v1..v5 # predefine a list of series
scalar length = 50
loop i=1..20
    # each variable occupies its own block of 50 rows
    scalar offset = (i - 1) * length
    series x$i = stack(L, length, offset)
endloop
setobs 50 1:1 --stacked-cross-section
store panel.gdt x1..x20

Side-by-side time series


There’s a second sort of data that you may wish to convert to gretl’s panel format, namely side-
by-side time series for a number of cross-sectional units. For example, a data file might contain
separate GDP series of common length T for each of N countries. To turn these into a single stacked
time series the stack function can again be used. An example follows, where we suppose the
original data source is a comma-separated file named GDP.csv, containing GDP data for countries
from Austria (GDP_AT) to Zimbabwe (GDP_ZW) in consecutive columns.

open GDP.csv
scalar T = $nobs # the number of periods
list L = GDP_AT..GDP_ZW
series GDP = stack(L, T)
setobs T 1:1 --stacked-time-series
store panel.gdt GDP

The resulting data file, panel.gdt, will contain a single series of length NT where N is the number
of countries and T is the length of the original dataset. One could insert revised variants of lines
3 and 4 of the script if the original file contained additional side-by-side per-country series for
investment, consumption or whatever.

Panel data marker strings


It can be helpful with panel data to have the observations identified by mnemonic markers. A
special function in the genr command is available for this purpose.
In the example under the heading “Panel data arranged by variable” above, suppose all the states
are identified by two-letter codes in the left-most column of the original datafile. When the stack
function is invoked as shown, these codes will be stacked along with the data values. If the first row
is marked AR for Arkansas, then the marker AR will end up being shown on each row containing an
observation for Arkansas. That’s all very well, but these markers don’t tell us anything about the
date of the observation. To rectify this we could do:

genr time
series year = 1960 + (5 * time)
genr markers = "%s:%d", marker, year

The first line generates a 1-based index representing the period of each observation, and the second
line uses the time variable to generate a variable representing the year of the observation. The
third line contains this special feature: if (and only if) the name of the new “variable” to generate is
markers, the portion of the command following the equals sign is taken as a C-style format string
(which must be wrapped in double quotes), followed by a comma-separated list of arguments.
The arguments will be printed according to the given format to create a new set of observation
markers. Valid arguments are either the names of variables in the dataset, or the string marker
which denotes the pre-existing observation marker. The format specifiers which are likely to be
useful in this context are %s for a string and %d for an integer. Strings can be truncated: for
example %.3s will use just the first three characters of the string. To chop initial characters off
an existing observation marker when constructing a new one, you can use the syntax marker + n,
where n is a positive integer, in which case the first n characters will be skipped.
After the commands above are processed, then, the observation markers will look like, for example,
AR:1965, where the two-letter state code and the year of the observation are spliced together with
a colon.

Panel dummy variables


In a panel study you may wish to construct dummy variables of one or both of the following sorts:
(a) dummies as unique identifiers for the units or groups, and (b) dummies as unique identifiers for
the time periods. The former may be used to allow the intercept of the regression to differ across
the units, the latter to allow the intercept to differ across periods.
Two special functions are available to create such dummies. These are found under the “Add”
menu in the GUI, or under the genr command in script mode or gretlcli.

1. “unit dummies” (script command genr unitdum). This command creates a set of dummy
variables identifying the cross-sectional units. The variable du_1 will have value 1 in each
row corresponding to a unit 1 observation, 0 otherwise; du_2 will have value 1 in each row
corresponding to a unit 2 observation, 0 otherwise; and so on.

2. “time dummies” (script command genr timedum). This command creates a set of dummy
variables identifying the periods. The variable dt_1 will have value 1 in each row correspond-
ing to a period 1 observation, 0 otherwise; dt_2 will have value 1 in each row corresponding
to a period 2 observation, 0 otherwise; and so on.
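To see what these two commands produce, here is a minimal sketch on an artificial panel of 2 units observed for 3 periods each. The dataset dimensions are assumptions chosen only to keep the printout short.

```hansl
# Artificial panel: 6 rows = 2 units x 3 periods (hypothetical sizes)
nulldata 6
setobs 3 1:1 --stacked-time-series

genr unitdum    # creates du_1 and du_2
genr timedum    # creates dt_1, dt_2 and dt_3
print du_1 du_2 dt_1 dt_2 dt_3 --byobs
```

In the printout du_1 should equal 1 on the first three rows only, while dt_1 should equal 1 on rows 1 and 4, the first period of each unit.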

If a panel data set has the YEAR of the observation entered as one of the variables you can create a
periodic dummy to pick out a particular year, e.g. genr dum = (YEAR==1960). You can also create
periodic dummy variables using the modulus operator, %. For instance, to create a dummy with
value 1 for the first observation and every thirtieth observation thereafter, 0 otherwise, do

genr index
series dum = ((index-1) % 30) == 0

Lags, differences, trends


If the time periods are evenly spaced you may want to use lagged values of variables in a panel
regression (but see also chapter 24); you may also wish to construct first differences of variables of
interest.
Once a dataset is identified as a panel, gretl will handle the generation of such variables correctly.
For example the command genr x1_1 = x1(-1) will create a variable that contains the first lag
of x1 where available, and the missing value code where the lag is not available (e.g. at the start of
the time series for each group). When you run a regression using such variables, the program will
automatically skip the missing observations.
When a panel data set has a fairly substantial time dimension, you may wish to include a trend in
the analysis. The command genr time creates a variable named time which runs from 1 to T for
each unit, where T is the length of the time-series dimension of the panel. If you want to create an
index that runs consecutively from 1 to m × T , where m is the number of units in the panel, use
genr index.
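The following sketch, again on an artificial 2-unit, 3-period panel, puts the lag, difference, time and index constructions side by side. The series x and its values are made up for the illustration.

```hansl
# Artificial panel: 2 units x 3 periods (hypothetical)
nulldata 6
setobs 3 1:1 --stacked-time-series

genr time                # 1, 2, 3 within each unit
genr index               # 1, 2, ..., 6 across the whole dataset
series x = index^2       # arbitrary values to work with
series x_1 = x(-1)       # first lag; missing at the first period of each unit
series dx = diff(x)      # first difference; missing where the lag is missing
print time index x x_1 dx --byobs
```

Rows 1 and 4, the first observation for each unit, should show missing values for both x_1 and dx, since no lag is carried over from one unit to the next.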

Basic statistics by unit


gretl contains functions which can be used to generate basic descriptive statistics for a given vari-
able, on a per-unit basis; these are pnobs() (number of valid cases), pmin() and pmax() (minimum
and maximum) and pmean() and psd() (mean and standard deviation).
As a brief illustration, suppose we have a panel data set comprising 8 time-series observations on
each of N units or groups. Then the command

series pmx = pmean(x)

creates a series of this form: the first 8 values (corresponding to unit 1) contain the mean of x for
unit 1, the next 8 values contain the mean for unit 2, and so on. The psd() function works in a
similar manner. The sample standard deviation for group i is computed as
$$ s_i = \sqrt{\frac{\sum (x - \bar{x}_i)^2}{T_i - 1}} $$

where Ti denotes the number of valid observations on x for the given unit, x̄i denotes the group
mean, and the summation is across valid observations for the group. If Ti < 2, however, the
standard deviation is recorded as 0.
One particular use of psd() may be worth noting. If you want to form a sub-sample of a panel that
contains only those units for which the variable x is time-varying, you can either use

smpl pmin(x) < pmax(x) --restrict

or

smpl psd(x) > 0 --restrict
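A minimal sketch of these per-unit functions follows, using an artificial panel in which x varies over time for unit 1 but is constant for unit 2. The data are invented for the purpose of the example.

```hansl
# Artificial panel: 2 units x 3 periods (hypothetical)
nulldata 6
setobs 3 1:1 --stacked-time-series
genr index
series x = (index <= 3) ? index : 5   # unit 1: 1, 2, 3; unit 2: 5, 5, 5

series n_x = pnobs(x)    # 3 valid observations for each unit
series m_x = pmean(x)    # 2 for unit 1, 5 for unit 2
series s_x = psd(x)      # 1 for unit 1, 0 for unit 2
print x n_x m_x s_x --byobs

# keep only the units for which x is time-varying
smpl psd(x) > 0 --restrict
printf "rows retained: %d\n", $nobs
```

Only the three rows belonging to unit 1 should survive the restriction, since unit 2 has a per-unit standard deviation of zero.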

4.6 Missing data values


Representation and handling
Missing values are represented internally as NaN (“not a number”), as defined in the IEEE 754
floating-point standard. In a native-format data file they should be represented as NA. When im-
porting CSV data gretl accepts several common representations of missing values including −999,
the string NA (in upper or lower case), a single dot, or simply a blank cell. Blank cells should, of
course, be properly delimited, e.g. 120.6,,5.38, in which the middle value is presumed missing.
As for handling of missing values in the course of statistical analysis, gretl does the following:

• In calculating descriptive statistics (mean, standard deviation, etc.) under the summary com-
mand, missing values are simply skipped and the sample size adjusted appropriately.

• In running regressions gretl first adjusts the beginning and end of the sample range, trun-
cating the sample if need be. Missing values at the beginning of the sample are common in
time series work due to the inclusion of lags, first differences and so on; missing values at the
end of the range are not uncommon due to differential updating of series and possibly the
inclusion of leads.

If gretl detects any missing values “inside” the (possibly truncated) sample range for a regression,
the result depends on the character of the dataset and the estimator chosen. In many cases, the
program will automatically skip the missing observations when calculating the regression results.
In this situation a message is printed stating how many observations were dropped. On the other
hand, the skipping of missing observations is not supported for all procedures: exceptions include
all autoregressive estimators, system estimators such as SUR, and nonlinear least squares. In the
case of panel data, the skipping of missing observations is supported only if their omission leaves
a balanced panel. If missing observations are found in cases where they are not supported, gretl
gives an error message and refuses to produce estimates.

Manipulating missing values


Some special functions are available for the handling of missing values. The Boolean function
missing() takes the name of a variable as its single argument; it returns a series with value 1 for
each observation at which the given variable has a missing value, and value 0 otherwise (that is, if
the given variable has a valid value at that observation). The function ok() is complementary to
missing; it is just a shorthand for !missing (where ! is the Boolean NOT operator). For example,
one can count the missing values for variable x using

scalar nmiss_x = sum(missing(x))

The function zeromiss(), which again takes a single series as its argument, returns a series where
all zero values are set to the missing code. This should be used with caution (one does not want
to confuse missing values and zeros), but it can be useful in some contexts. For example, one can
determine the first valid observation for a variable x using

genr time
scalar x0 = min(zeromiss(time * ok(x)))

The function misszero() does the opposite of zeromiss, that is, it converts all missing values to
zero.
If missing values get involved in calculations, they propagate according to the IEEE rules: notably,
if one of the operands to an arithmetical operation is a NaN, the result will also be NaN.
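The functions discussed in this section can be tried out on a tiny artificial series, as in the sketch below. The dataset and the positions of the missing values are assumptions chosen for illustration.

```hansl
# Artificial data with two missing values (hypothetical)
nulldata 5
genr index
series x = index
x[1] = NA
x[4] = NA

scalar nmiss_x = sum(missing(x))           # number of missing values: 2
genr time
scalar x0 = min(zeromiss(time * ok(x)))    # first valid observation: 2
series z = misszero(x)                     # missing values replaced by 0
series y = x + 1                           # NA propagates: y is missing where x is

printf "missing: %d, first valid obs: %d\n", nmiss_x, x0
print x z y --byobs
```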

4.7 Maximum size of data sets


Basically, the size of data sets (both the number of variables and the number of observations per
variable) is limited only by the characteristics of your computer. Gretl allocates memory dynami-
cally, and will ask the operating system for as much memory as your data require. Obviously, then,
you are ultimately limited by the size of RAM.
Aside from the multiple-precision OLS option, gretl uses double-precision floating-point numbers
throughout. The size of such numbers in bytes depends on the computer platform, but is typically
eight. To give a rough notion of magnitudes, suppose we have a data set with 10,000 observations
on 500 variables. That’s 5 million floating-point numbers or 40 million bytes. If we define the
megabyte (MB) as 1024 × 1024 bytes, as is standard in talking about RAM, it’s slightly over 38 MB.
The program needs additional memory for workspace, but even so, handling a data set of this size
should be quite feasible on a current PC, which at the time of writing is likely to have at least 256
MB of RAM.
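The arithmetic above is easy to reproduce for a dataset of any size. The sketch below simply recomputes the figures in the text; the scalar names are arbitrary, and 8 bytes per value is the typical size mentioned above rather than a guarantee.

```hansl
scalar n_obs = 10000
scalar n_vars = 500
scalar n_bytes = 8 * n_obs * n_vars    # 8 bytes per double-precision value
printf "%d bytes = %.2f MB\n", n_bytes, n_bytes / 1024^2
```

This prints 40000000 bytes and 38.15 MB, matching the "slightly over 38 MB" quoted above.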
If RAM is not an issue, there is one further limitation on data size (though it’s very unlikely to
be a binding constraint). That is, variables and observations are indexed by signed integers, and
on a typical PC these will be 32-bit values, capable of representing a maximum positive value of
$2^{31} - 1 = 2{,}147{,}483{,}647$.
The limits mentioned above apply to gretl’s “native” functionality. There are tighter limits with
regard to two third-party programs that are available as add-ons to gretl for certain sorts of time-
series analysis including seasonal adjustment, namely TRAMO/SEATS and X-12-ARIMA. These pro-
grams employ a fixed-size memory allocation, and can’t handle series of more than 600 observa-
tions.

4.8 Data file collections


If you’re using gretl in a teaching context you may be interested in adding a collection of data files
and/or scripts that relate specifically to your course, in such a way that students can browse and
access them easily.
There are three ways to access such collections of files:

• For data files: select the menu item “File, Open data, Sample file”, or click on the folder icon
on the gretl toolbar.

• For script files: select the menu item “File, Script files, Example scripts”.

When a user selects one of the items:

• The data or script files included in the gretl distribution are automatically shown (this includes
files relating to Ramanathan’s Introductory Econometrics and Greene’s Econometric Analysis).

• The program looks for certain known collections of data files available as optional extras,
for instance the datafiles from various econometrics textbooks (Davidson and MacKinnon,
Gujarati, Stock and Watson, Verbeek, Wooldridge) and the Penn World Table (PWT 5.6). (See
the data page at the gretl website for information on these collections.) If the additional files
are found, they are added to the selection windows.

• The program then searches for valid file collections (not necessarily known in advance) in
these places: the “system” data directory, the system script directory, the user directory,
and all first-level subdirectories of these. For reference, typical values for these directories
are shown in Table 4.1. (Note that PERSONAL is a placeholder that is expanded by Windows,
corresponding to “My Documents” on English-language systems.)

                     Linux                       MS Windows
system data dir      /usr/share/gretl/data       c:\Program Files\gretl\data
system script dir    /usr/share/gretl/scripts    c:\Program Files\gretl\scripts
user dir             $HOME/gretl                 PERSONAL\gretl

Table 4.1: Typical locations for file collections

Any valid collections will be added to the selection windows. So what constitutes a valid file collec-
tion? This comprises either a set of data files in gretl XML format (with the .gdt suffix) or a set of
script files containing gretl commands (with .inp suffix), in each case accompanied by a “master
file” or catalog. The gretl distribution contains several example catalog files, for instance the file
descriptions in the misc sub-directory of the gretl data directory and ps_descriptions in the
misc sub-directory of the scripts directory.
If you are adding your own collection, data catalogs should be named descriptions and script
catalogs should be named ps_descriptions. In each case the catalog should be placed (along
with the associated data or script files) in its own specific sub-directory (e.g.
/usr/share/gretl/data/mydata or c:\userdata\gretl\data\mydata).
The catalog files are plain text; if they contain non-ASCII characters they must be encoded as UTF-
8. The syntax of such files is straightforward. Here, for example, are the first few lines of gretl’s
“misc” data catalog:

# Gretl: various illustrative datafiles


"arma","artificial data for ARMA script example"
"ects_nls","Nonlinear least squares example"
"hamilton","Prices and exchange rate, U.S. and Italy"

The first line, which must start with a hash mark, contains a short name, here “Gretl”, which
will appear as the label for this collection’s tab in the data browser window, followed by a colon,
followed by an optional short description of the collection.
Subsequent lines contain two elements, separated by a comma and wrapped in double quotation
marks. The first is a datafile name (leave off the .gdt suffix here) and the second is a short de-
scription of the content of that datafile. There should be one such line for each datafile in the
collection.
A script catalog file looks very similar, except that there are three fields in the file lines: a filename
(without its .inp suffix), a brief description of the econometric point illustrated in the script, and
a brief indication of the nature of the data used. Again, here are the first few lines of the supplied
“misc” script catalog:

# Gretl: various sample scripts


"arma","ARMA modeling","artificial data"
"ects_nls","Nonlinear least squares (Davidson)","artificial data"
"leverage","Influential observations","artificial data"
"longley","Multicollinearity","US employment"

If you want to make your own data collection available to users, these are the steps:

1. Assemble the data, in whatever format is convenient.

2. Convert the data to gretl format and save as gdt files. It is probably easiest to convert the data
by importing them into the program from plain text, CSV, or a spreadsheet format (MS Excel
or Gnumeric) then saving them. You may wish to add descriptions of the individual variables
(the “Variable, Edit attributes” menu item), and add information on the source of the data (the
“Data, Edit info” menu item).

3. Write a descriptions file for the collection using a text editor.

4. Put the datafiles plus the descriptions file in a subdirectory of the gretl data directory (or user
directory).

5. If the collection is to be distributed to other people, package the data files and catalog in some
suitable manner, e.g. as a zipfile.

If you assemble such a collection, and the data are not proprietary, we would encourage you to
submit the collection for packaging as a gretl optional extra.

4.9 Assembling data from multiple sources


In many contexts researchers need to bring together data from multiple source files, and in some
cases these sources are not organized such that the data can simply be “stuck together” by append-
ing rows or columns to a base dataset. In gretl, the join command can be used for this purpose;
this command is discussed in detail in chapter 7.
Chapter 5

Sub-sampling a dataset

5.1 Introduction
Some subtle issues can arise when sub-sampling a dataset; this chapter attempts to explain them.
A sub-sample may be defined in relation to a full dataset in two different ways: we will refer to these
as “setting” the sample and “restricting” the sample; these methods are discussed in sections 5.2
and 5.3 respectively. In addition section 5.4 discusses some special issues relating to panel data,
and section 5.5 covers resampling with replacement, which is useful in the context of bootstrapping
test statistics.
The following discussion focuses on the command-line approach. But you can also invoke the
methods outlined here via the items under the Sample menu in the GUI program.

5.2 Setting the sample


By “setting” the sample we mean defining a sub-sample simply by means of adjusting the starting
and/or ending point of the current sample range. This is likely to be most relevant for time-series
data. For example, one has quarterly data from 1960:1 to 2003:4, and one wants to run a regression
using only data from the 1970s. A suitable command is then

smpl 1970:1 1979:4

Or one wishes to set aside a block of observations at the end of the data period for out-of-sample
forecasting. In that case one might do

smpl ; 2000:4

where the semicolon is shorthand for “leave the starting observation unchanged”. (The semicolon
may also be used in place of the second parameter, to mean that the ending observation should be
unchanged.) By “unchanged” here, we mean unchanged relative to the last smpl setting, or relative
to the full dataset if no sub-sample has been defined up to this point. For example, after

smpl 1970:1 2003:4
smpl ; 2000:4

the sample range will be 1970:1 to 2000:4.


An incremental or relative form of setting the sample range is also supported. In this case a relative
offset should be given, in the form of a signed integer (or a semicolon to indicate no change), for
both the starting and ending point. For example

smpl +1 ;

will advance the starting observation by one while preserving the ending observation, and

smpl +2 -1

will both advance the starting observation by two and retard the ending observation by one.
An important feature of “setting” the sample as described above is that it necessarily results in
the selection of a subset of observations that are contiguous in the full dataset. The structure of
the dataset is therefore unaffected (for example, if it is a quarterly time series before setting the
sample, it remains a quarterly time series afterwards).
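The commands in this section can be traced on an empty quarterly dataset covering 1960:1 to 2003:4. In the sketch below the dataset is artificial (created with nulldata), and the $t1, $t2 and $nobs accessors together with obslabel are used only to report the current range.

```hansl
# Artificial quarterly dataset, 1960:1 to 2003:4 (44 years x 4 = 176 observations)
nulldata 176
setobs 4 1960:1 --time-series

smpl 1970:1 1979:4    # the 1970s only: 40 observations
printf "%s to %s (%d obs)\n", obslabel($t1), obslabel($t2), $nobs

smpl ; 2000:4         # keep the start, move the end: 124 observations
printf "%s to %s (%d obs)\n", obslabel($t1), obslabel($t2), $nobs

smpl +1 ;             # advance the start by one quarter: 123 observations
printf "%s to %s (%d obs)\n", obslabel($t1), obslabel($t2), $nobs
```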

5.3 Restricting the sample


By “restricting” the sample we mean selecting observations on the basis of some Boolean (logical)
criterion, or by means of a random number generator. This is likely to be most relevant for cross-
sectional or panel data.
Suppose we have data on a cross-section of individuals, recording their gender, income and other
characteristics. We wish to select for analysis only the women. If we have a male dummy variable
with value 1 for men and 0 for women we could do

smpl male==0 --restrict

to this effect. Or suppose we want to restrict the sample to respondents with incomes over $50,000.
Then we could use

smpl income>50000 --restrict

A question arises: if we issue the two commands above in sequence, what do we end up with in
our sub-sample: all cases with income over 50000, or just women with income over 50000? By
default, the answer is the latter: women with income over 50000. The second restriction augments
the first, or in other words the final restriction is the logical product of the new restriction and any
restriction that is already in place. If you want a new restriction to replace any existing restrictions
you can first recreate the full dataset using

smpl --full

Alternatively, you can add the replace option to the smpl command:

smpl income>50000 --restrict --replace

This option has the effect of automatically re-establishing the full dataset before applying the new
restriction.
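The cumulative behaviour of successive restrictions, and the effect of the replace option, can be checked on a small artificial cross-section. The variables male and income below are constructed by hand purely for this illustration.

```hansl
# Artificial cross-section of 8 individuals (hypothetical values)
nulldata 8
genr index
series male = index % 2          # odd rows are men, even rows are women
series income = 10000 * index    # 10000, 20000, ..., 80000

smpl male==0 --restrict
printf "women: %d\n", $nobs                          # 4

smpl income>50000 --restrict
printf "women with income > 50000: %d\n", $nobs      # 2

smpl income>50000 --restrict --replace
printf "anyone with income > 50000: %d\n", $nobs     # 3
```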
Unlike a simple “setting” of the sample, “restricting” the sample may result in selection of non-
contiguous observations from the full data set. It may therefore change the structure of the data
set.
This can be seen in the case of panel data. Say we have a panel of five firms (indexed by the variable
firm) observed in each of several years (identified by the variable year). Then the restriction

smpl year==1995 --restrict

produces a dataset that is not a panel, but a cross-section for the year 1995. Similarly

smpl firm==3 --restrict

produces a time-series dataset for firm number 3.


For these reasons (possible non-contiguity in the observations, possible change in the structure of
the data), gretl acts differently when you “restrict” the sample as opposed to simply “setting” it. In
the case of setting, the program merely records the starting and ending observations and uses these
as parameters to the various commands calling for the estimation of models, the computation of
statistics, and so on. In the case of restriction, the program makes a reduced copy of the dataset
and by default treats this reduced copy as a simple, undated cross-section — but see the further
discussion of panel data in section 5.4.
If you wish to re-impose a time-series interpretation of the reduced dataset you can do so using the
setobs command, or the GUI menu item “Data, Dataset structure”.
The fact that “restricting” the sample results in the creation of a reduced copy of the original
dataset may raise an issue when the dataset is very large. With such a dataset in memory, the
creation of a copy may lead to a situation where the computer runs low on memory for calculating
regression results. You can work around this as follows:

1. Open the full data set, and impose the sample restriction.
2. Save a copy of the reduced data set to disk.
3. Close the full dataset and open the reduced one.
4. Proceed with your analysis.

Random sub-sampling
Besides restricting the sample on some deterministic criterion, it may sometimes be useful (when
working with very large datasets, or perhaps to study the properties of an estimator) to draw a
random sub-sample from the full dataset. This can be done using, for example,

smpl 100 --random

to select 100 cases. If you want the sample to be reproducible, you should set the seed for the
random number generator first, using the set command. This sort of sampling falls under the
“restriction” category: a reduced copy of the dataset is made.
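A reproducible random sub-sample might be drawn along the following lines. The dataset size and the seed value are arbitrary choices for the sketch.

```hansl
nulldata 1000           # artificial dataset of 1000 cases
set seed 20230301       # any fixed seed makes the draw repeatable
smpl 100 --random
printf "cases selected: %d\n", $nobs
```

Re-running the script with the same seed should select the same 100 cases.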

5.4 Panel data


Consider for concreteness the Arellano–Bond dataset supplied with gretl (abdata.gdt). This
comprises data on 140 firms (n = 140) observed over the years 1976–1984 (T = 9). The dataset is
“nominally balanced” in the sense that the time-series length is the same for all firms (this
being a requirement for a dataset to count as a panel in gretl), but in fact there are many missing
values (NAs).
You may want to sub-sample such a dataset in either the cross-sectional dimension (limit the sam-
ple to a subset of firms) or the time dimension (e.g. use data from the 1980s only). One way to
sub-sample on firms keys off the notation used by gretl for panel observations. The full data range
is printed as 1:1 (firm 1, period 1) to 140:9 (firm 140, period 9). The effect of

smpl 1:1 80:9

is to limit the sample to the first 80 firms. Note that if you instead tried smpl 1:1 80:4 this would
provoke an error: you cannot use this syntax to sub-sample in the time dimension of the panel.
Alternatively, and perhaps more naturally, you can use the --unit option with the smpl command
to limit the sample in the cross-sectional dimension, as in

smpl 1 80 --unit

The firms in the Arellano–Bond dataset are anonymous, but suppose you had a panel with five
named countries. With such a panel you can inform gretl of the names of the groups using the
setobs command. For example, given

string cstr = "Portugal Italy Ireland Greece Spain"
setobs country cstr --panel-groups

gretl creates a string-valued series named country with group names taken from the variable cstr.
Then, to include only Italy and Spain you could do

smpl country=="Italy" || country=="Spain" --restrict

or to exclude one country,

smpl country!="Ireland" --restrict

Sub-sampling a panel in the time dimension can be done via --restrict. For example, the
Arellano–Bond dataset contains a variable named YEAR that records the year of the observations
and if one wanted to omit the first two years of data one could do

smpl YEAR >= 1978 --restrict

If a dataset does not already include a suitable variable for this purpose one can use the command
genr time to create a simple 1-based time index.
Another way to sub-sample in the time dimension of a panel starts with a specification of time via
the setobs command, as in

setobs 1 1976 --panel-time

This tells gretl that panel-time is annual (frequency 1), starting in 1976. (In fact this is already done
for abdata.gdt.) Then to restrict the sample range to 1979–1982 you can do

smpl 1979 1982 --time

Note that if you apply a sample restriction that just selects certain units (firms, countries or what-
ever), or selects certain contiguous time-periods — such that n > 1, T > 1 and the time-series length
is still the same across all included units — your sub-sample will still be interpreted by gretl as a
panel.
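The three approaches to sub-sampling abdata.gdt described above can be compared in one short script. This is a sketch rather than a definitive recipe: it assumes the sample file is installed with gretl, and it relies on the $datatype accessor, which should report 3 whenever the current sub-sample is still treated as a panel.

```hansl
open abdata.gdt --quiet

# cross-sectional dimension: keep the first 80 firms
smpl 1 80 --unit
printf "first 80 firms: %d rows, datatype %d\n", $nobs, $datatype

# time dimension, via a restriction on the YEAR variable
smpl --full
smpl YEAR >= 1978 --restrict
printf "1978 onwards: %d rows, datatype %d\n", $nobs, $datatype

# time dimension, via the panel-time information
smpl --full
smpl 1979 1982 --time
printf "1979 to 1982: %d rows, datatype %d\n", $nobs, $datatype
```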

Unbalancing restrictions
In some cases one wants to sub-sample according to a criterion that “cuts across the grain” of
a panel dataset. For instance, suppose you have a micro dataset with thousands of individuals
observed over several years and you want to restrict the sample to observations on employed
women.
If we simply extracted from the total nT rows of the dataset those that pertain to women who were
employed at time t (t = 1, . . . , T ) we would likely end up with a dataset that doesn’t count as a
panel in gretl (because the specific time-series length, Ti , would differ across individuals). In some
contexts it might be OK that gretl doesn’t take your sub-sample to be a panel, but if you want to
apply panel-specific methods this is a problem. You can solve it by giving the --preserve-panel
option with smpl. For example, supposing your dataset contained dummy variables gender (with
the value 1 coding for women) and employed, you could do

smpl gender==1 && employed==1 --restrict --preserve-panel

What exactly does this do? Well, let’s say the years of your data are 2000, 2005 and 2010, and
that some women were employed in all of those years, giving a maximum Ti value of 3. But in-
dividual 526 is a woman who was employed only in the year 2000 (Ti = 1). The effect of the
--preserve-panel option is then to insert “padding rows” of NAs for the years 2005 and 2010 for
individual 526, and similarly for all individuals with 0 < Ti < 3. Your sub-sample then qualifies as
a panel.

5.5 Resampling and bootstrapping


Given an original data series x, the command

series xr = resample(x)

creates a new series each of whose elements is drawn at random from the elements of x. If the
original series has 100 observations, each element of x is selected with probability 1/100 at each
drawing. Thus the effect is to “shuffle” the elements of x, with the twist that each element of x may
appear more than once, or not at all, in xr.
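A quick way to see resampling with replacement at work is to apply it to a series whose values are all distinct, as in this sketch. The data and the seed are arbitrary.

```hansl
nulldata 10
set seed 54321
genr index
series x = index           # distinct values 1, 2, ..., 10
series xr = resample(x)    # drawn with replacement from x
print x xr --byobs
```

Because the values of x are distinct, any repeated value in xr is a visible sign of drawing with replacement.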
The primary use of this function is in the construction of bootstrap confidence intervals or p-values.
Here is a simple example. Suppose we estimate a simple regression of y on x via OLS and find that
the slope coefficient has a reported t-ratio of t0 with ν degrees of freedom. A two-tailed p-value
for the null hypothesis that the slope parameter equals zero can then be found using the t(ν)
distribution. Depending on the context, however, we may doubt whether the ratio of coefficient to
standard error truly follows the t(ν) distribution. In that case we could derive a bootstrap p-value
as shown in Listing 5.1.
Under the null hypothesis that the slope with respect to x is zero, y is simply equal to its mean plus
an error term. We simulate y by resampling the residuals from the initial OLS and re-estimate the
model. We repeat this procedure a large number of times, and record the number of cases where
the absolute value of the t-ratio is greater than t0 : the proportion of such cases is our bootstrap
p-value. For a good discussion of simulation-based tests and bootstrapping, see Davidson and
MacKinnon (2004, chapter 4); Davidson and Flachaire (2001) is also instructive.

Listing 5.1: Calculation of bootstrap p-value

nulldata 50
set seed 54321
series x = normal()
series y = 10 + x + 2*normal()
ols y 0 x
# the reported t-stat
t0 = abs($coeff[2] / $stderr[2])
# save the residuals
series u = $uhat
scalar ybar = mean(y)
# number of replications for bootstrap
scalar B = 1000
scalar tcount = 0
series ysim
loop B
    # generate simulated y by resampling
    ysim = ybar + resample(u)
    ols ysim 0 x --quiet
    scalar tsim = abs($coeff[2] / $stderr[2])
    tcount += (tsim > t0)
endloop
printf "proportion of cases with |t| > %.3f = %g\n", t0, tcount / B
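The other use mentioned above, a bootstrap confidence interval, can be sketched along similar lines. What follows is an illustration under stated assumptions, not a listing from this guide: it uses a simple percentile interval, and it resamples the residuals around the fitted values rather than around the mean of y, since no null hypothesis is being imposed. The names yhat, bs, ci_lo and ci_hi are hypothetical.

```hansl
nulldata 50
set seed 54321
series x = normal()
series y = 10 + x + 2*normal()
ols y 0 x --quiet
# save the fitted values and residuals from the initial OLS
series yhat = $yhat
series u = $uhat
# number of replications for bootstrap
scalar B = 1000
# column vector to hold the simulated slope estimates
matrix bs = zeros(B, 1)
series ysim
loop i=1..B
    # simulated y: fitted values plus resampled residuals
    ysim = yhat + resample(u)
    ols ysim 0 x --quiet
    bs[i] = $coeff[2]
endloop
# the 2.5 and 97.5 percent quantiles bound a 95 percent percentile interval
scalar ci_lo = quantile(bs, 0.025)
scalar ci_hi = quantile(bs, 0.975)
printf "95%% percentile interval for the slope: [%g, %g]\n", ci_lo, ci_hi
```

The percentile method is the simplest choice; the references cited above discuss refinements that may perform better in small samples.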
Chapter 6

Graphs and plots

6.1 Gnuplot graphs


A separate program, gnuplot, is called to generate graphs. Gnuplot is a very full-featured graphing
program with myriad options. It is available from www.gnuplot.info (but note that a suitable copy
of gnuplot is bundled with the packaged versions of gretl for MS Windows and Mac OS X). Gretl
gives you direct access, via a graphical interface, to a subset of gnuplot’s options and it tries to
choose sensible values for you; it also allows you to take complete control over graph details if you
wish.
With a graph displayed, you can click on the graph window for a pop-up menu with the following
options.

• Save as PNG: Save the graph in Portable Network Graphics format (the same format that you
see on screen).

• Save as postscript: Save in encapsulated postscript (EPS) format.

• Save as Windows metafile: Save in Enhanced Metafile (EMF) format.

• Save to session as icon: The graph will appear in iconic form when you select “Icon view” from
the View menu.

• Zoom: Lets you select an area within the graph for closer inspection (not available for all
graphs).

• Print: (Current GTK or MS Windows only) lets you print the graph directly.

• Copy to clipboard: MS Windows only, lets you paste the graph into Windows applications such
as MS Word.

• Edit: Opens a controller for the plot which lets you adjust many aspects of its appearance.

• Close: Closes the graph window.

Displaying data labels


For simple X-Y scatter plots, some further options are available if the dataset includes “case mark-
ers” (that is, labels identifying each observation).1 With a scatter plot displayed, when you move
the mouse pointer over a data point its label is shown on the graph. By default these labels are
transient: they do not appear in the printed or copied version of the graph. They can be removed by
selecting “Clear data labels” from the graph pop-up menu. If you want the labels to be affixed per-
manently (so they will show up when the graph is printed or copied), select the option “Freeze data
labels” from the pop-up menu; “Clear data labels” cancels this operation. The other label-related
option, “All data labels”, requests that case markers be shown for all observations. At present the
display of case markers is disabled for graphs containing more than 250 data points.

1 For an example of such a dataset, see the Ramanathan file data4-10: this contains data on private school enrollment for the 50 states of the USA plus Washington, DC; the case markers are the two-letter codes for the states.


GUI plot editor


Selecting the Edit option in the graph pop-up menu opens an editing dialog box, shown in Figure 6.1.
Notice that there are several tabs, allowing you to adjust many aspects of a graph’s appearance:
font, title, axis scaling, line colors and types, and so on. You can also add lines or descriptive labels
to a graph (under the Lines and Labels tabs). The “Apply” button applies your changes without
closing the editor; “OK” applies the changes and closes the dialog.

Figure 6.1: gretl’s gnuplot controller

Publication-quality graphics: advanced options


The GUI plot editor has two limitations. First, it cannot represent all the myriad options that
gnuplot offers. Users who are sufficiently familiar with gnuplot to know what they’re missing in
the plot editor presumably don’t need much help from gretl, so long as they can get hold of the
gnuplot command file that gretl has put together. Second, even if the plot editor meets your needs,
in terms of fine-tuning the graph you see on screen, a few details may need further work in order
to get optimal results for publication.
Either way, the first step in advanced tweaking of a graph is to get access to the graph command
file.

• In the graph display window, right-click and choose “Save to session as icon”.

• If it’s not already open, open the icon view window — either via the menu item View/Icon view,
or by clicking the “session icon view” button on the main-window toolbar.

• Right-click on the icon representing the newly added graph and select “Edit plot commands”
from the pop-up menu.

• You get a window displaying the plot file (Figure 6.2).
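If you are working from a script rather than the GUI, a similar result can be had with the gnuplot command and its --output option. The sketch below assumes a reasonably recent gretl in which the extension of the output filename selects the format, and in which an unrecognized extension such as .plt causes a gnuplot command file to be written. The file names and the artificial data are illustrative only.

```hansl
# artificial data, for illustration only
nulldata 50
set seed 54321
series x = normal()
series y = 10 + x + 2*normal()
# write the gnuplot command file for a scatter plot of y against x
gnuplot y x --output=myplot.plt
# or ask gretl to produce a PDF directly
gnuplot y x --output=mygraph.pdf
```

The resulting myplot.plt can then be edited by hand and passed to gnuplot, just like the file shown in the plot commands window.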

Here are the basic things you can do in this window. Obviously, you can edit the file you just
opened. You can also send it for processing by gnuplot, by clicking the “Execute” (cogwheel) icon
in the toolbar. Or you can use the “Save as” button to save a copy for editing and processing as you
wish.

Figure 6.2: Plot commands editor

Unless you’re a gnuplot expert, most likely you’ll only need to edit a couple of lines at the top of
the file, specifying a driver (plus options) and an output file. We offer here a brief summary of some
points that may be useful.
First, gnuplot’s output mode is set via the command set term followed by the name of a supported
driver (“terminal” in gnuplot parlance) plus various possible options. (The top line in the plot
commands window shows the set term line that gretl used to make a PNG file, commented out.)
The graphic formats that are most suitable for publication are PDF and EPS. These are supported
by the gnuplot term types pdf, pdfcairo and postscript (with the eps option). The pdfcairo
driver has the virtue that it behaves in a very similar manner to the PNG one, the output of which
you see on screen. This driver is provided by the version of gnuplot that is included in the gretl packages
for MS Windows and Mac OS X; if you’re on Linux it may or may not be supported. If pdfcairo is not
available, the pdf terminal may be available; the postscript terminal is almost certainly available.
Besides selecting a term type, if you want to get gnuplot to write the actual output file you need
to append a set output line giving a filename. Here are a few examples of the first two lines you
might type in the window editing your plot commands. We’ll make these more “realistic” shortly.

set term pdfcairo
set output 'mygraph.pdf'

set term pdf
set output 'mygraph.pdf'

set term postscript eps
set output 'mygraph.eps'

There are a couple of things worth remarking here. First, you may want to adjust the size of the
graph, and second you may want to change the font. The default sizes produced by the above
drivers are 5 inches by 3 inches for pdfcairo and pdf, and 5 inches by 3.5 inches for postscript
eps. In each case you can change this by giving a size specification, which takes the form XX,YY
(examples below).
