A minimal regression library for Julia
To install Alistair.jl simply run:
Pkg.clone("https://github.com/giob1994/Alistair.jl.git")
- Linear Regressions: OLS, GLS, Feasible GLS, Iterated FGLS.
- Robust Variance Matrices: simple and HCE (Eicker-White) variance.
- T-test & Confidence Intervals for estimated coefficients.
- Non-Linear Regression (prototype): generic solver for any model specified as `Y = f(X, β)`.
- Logit/Probit, Multivariate Regressions.
- Syntactic Function Interpreter for non-linear models.
- Compatibility with DataFrames.
While Julia already has some packages for regression, like GLM.jl or Regression.jl, sometimes a simple, bare-bones regression package is just what is needed. For example, GLM.jl accepts only DataFrames as input type, and the syntax relies on high-level constructs.
Alistair.jl, on the other hand, tries to be minimal, intuitive and fast.
Take the following OLS (Ordinary Least Squares) regression:
beta = [0.7; 0.45]
X = hcat(ones(100,1), collect(1.0:100))
Y = X * beta + 0.1*randn(100, 1)
Using Alistair.jl we solve this with:
using Alistair
result = linregress(X, Y, OLS(intercept=true, robust=true))
Compare it to the equivalent GLM.jl code:
using Compat, DataFrames, StatsBase, GLM
data = DataFrame(X1=X[:,2], Y1=Y[:])
OLS_result = glm(@formula(Y1 ~ X1), data, Normal(), IdentityLink())
and the Regression.jl code:
ret = solve(linearreg(X, Y; bias=1.0);
            solver=GD(),
            reg=SqrL2Reg(1.0e-3),
            options=Options(verbosity=:iter, grtol=1.0e-6 * n))
Alistair.jl works at different levels of abstraction: if you want, you can access the main solution routines directly for even (marginally) faster computation.
However, low-level access is riskier, so go at your own speed: if you are unsure of what function does what, or you are not 100% confident about the data you are handling, the higher-level functions will be happy to try and check issues for you. Don't drink and regress!
Alistair.jl is minimal, that is, it assumes that you are already aware of the data you are using... at least a bit!
This is a cheatsheet of good-to-know assumptions Alistair.jl makes:
Y [response array]
- is a 1D array containing only numbers: `Float64`/`Float32` type is recommended.
- Julia does make a distinction between `A::Array{Float64,1}` and `B::Array{Float64,2}`, even though `B` may consist of a single column of data identical to that in `A`: Alistair.jl tries to work around this and generally should execute fine. If you have problems with "left-over" singleton dimensions, consider using `squeeze()` (a short data-preparation sketch follows after these lists).
X [regressor array]
- is a 1D/2D array containing only numbers: `Float64`/`Float32` type is recommended.
- usually `X` is supposed to contain more than one regressor variable: Alistair.jl assumes that each ROW is an OBSERVATION and each COLUMN is a VARIABLE. `solve()` tries to be robust toward incorrectly specified input data, so it will check that `X` and `Y` have "compatible lengths" (i.e. `Y` has as many elements as `X` has rows) and may even try to transpose the arrays. General advice: be careful and know your data before calling any Alistair.jl solver function!
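For instance, here is a minimal data-preparation sketch in plain Julia (not an Alistair.jl API; the variable names are only illustrative) showing the shapes the solvers expect:

```julia
# X: 100 observations (rows) of 2 variables (columns); Float64 is recommended
X = hcat(ones(100, 1), collect(1.0:100))

# X * beta + noise gives a 100x1 Array{Float64,2}: it still carries a
# singleton column dimension
Y2d = X * [0.7; 0.45] + 0.1 * randn(100, 1)

# drop the singleton dimension to get a plain 1D response vector
# (squeeze(Y2d, 2) does the same thing)
Y = vec(Y2d)
```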
The simplest command Alistair.jl exposes is `solve()`, which takes care of routing your regression call to the right solver, whatever type of regression you have decided to run, and also checks that the data is consistent and well-defined.
Example:
# Do a simple OLS regression of Y over regressors in X:
res = solve(X, Y, OLS())
# NOTE: OLS() defaults to: OLS(intercept=true, robust=false)
If the `regtype` argument in `solve()` indicates a linear regression, then after some checks the function `linregress()` is called. If you want to access this function directly, you can do so:
Example:
# Do a simple OLS regression of Y over regressors in X:
# NOTE: this is equivalent to the previous example with solve()
res = linregress(X, Y, OLS())
Alistair.jl has specific functions to solve each type of regression, so this call to `linregress()` is routed to:
OLS(X, Y, intercept=true, robust=false)
which you can call yourself too, if you so want! Alistair.jl exports `OLS()` for the brave and intrepid.
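As a sketch, with the same `X` and `Y` as before, the direct call would look like this; it skips the input checks that `solve()` performs:

```julia
# Call the low-level OLS solver directly, bypassing solve()'s input checks
res = OLS(X, Y, intercept=true, robust=false)
```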
As you might have noticed, `OLS()` has an argument called `robust`. Alistair.jl allows you to specify how the variance-covariance matrix of the regression is computed, and specifically allows for the Eicker-White HCE form.
Example:
# Do a simple OLS regression of Y over regressors in X,
# but use the "Heteroskedastic Robust" form for the
# variance-covariance matrix.
res = linregress(X, Y, OLS(robust=HCEVariance()))
We all like pretty printing when it comes to results. For every result type Alistair.jl defines a nice overload of the Julia `Base.show()` method, so a nice output can be shown without hassle by simply using `print()`, `println()` or `show()`!
Example:
julia> println(linregress(X, Y, OLS()))
# ------------- RegResult [ Alistair.OLS ] ------------- #
N of obs.: 10
MSE: 0.093079
-------------------------------------------------------------------------------------------
Y | β σ [S.E.] t P > |t| [95% Conf. Interval]
-------------------------------------------------------------------------------------------
X[1] | 0.776401 0.073686 10.537 0.000 0.629030 0.408779
X[2] | 0.432530 0.011876 36.422 0.000 0.923773 0.456281
-------------------------------------------------------------------------------------------
Robustness: false
# ------------- EndResult [ Alistair.OLS ] ------------- #
Alistair specifies its own types for regressions and more. This is a useful cheatsheet:
Alistair.jl Type | Supertype |
---|---|
`AbstractRegressionType` | None |
`LinearRegressionType` | `AbstractRegressionType` |
`NonLinearRegressionType` | `AbstractRegressionType` |
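A quick sanity check of this hierarchy in the REPL (a sketch assuming these abstract types are exported) would be:

```julia
using Alistair

# Both regression-type families share the same abstract supertype
LinearRegressionType <: AbstractRegressionType     # true
NonLinearRegressionType <: AbstractRegressionType  # true
```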
Regression | Alistair.jl Type | Default form |
---|---|---|
OLS | `OLS` | `OLS(intercept=true, robust=false)` |
GLS | `GLS` | `GLS(omega; intercept=true, robust=false)`, where `omega` is the matrix chosen for the GLS "sandwich" estimator |
FGLS | `FGLS` | `FGLS(intercept=true, robust=false)` |
Iterated FGLS | `IteratedFGLS` | `IteratedFGLS(iterations; intercept=true, robust=false)`, where `iterations` is the maximum number of iterations allowed: `IteratedFGLS` may stop earlier because convergence has been reached |
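Sticking to the default forms above, a sketch of how the linear solvers could be invoked looks like this (the identity matrix is only a placeholder for whatever `omega` you actually want, and `X`, `Y` are the arrays from the earlier examples):

```julia
omega = eye(100)   # placeholder NxN weighting matrix for the GLS "sandwich"

res_gls   = linregress(X, Y, GLS(omega; intercept=true, robust=false))
res_fgls  = linregress(X, Y, FGLS())
res_ifgls = linregress(X, Y, IteratedFGLS(10))   # at most 10 iterations
```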
Regression | Alistair.jl Type | Default form |
---|---|---|
Numerical optimization-based non-linear regression | `Optimize` | `Optimize(modelfun, beta0, intercept=true, robust=false; method=NewtonTrustRegion())` |
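As a sketch of the prototype non-linear interface, using a hypothetical model function and starting guess, and routing the call through the generic `solve()` described earlier:

```julia
# Hypothetical non-linear model y = f(x, beta) with a single regressor
x = collect(0.0:0.1:10.0)
y = 2.0 * exp.(0.3 .* x) + 0.1 * randn(length(x))

modelfun(x, beta) = beta[1] * exp.(beta[2] .* x)
beta0 = [1.0, 0.1]     # starting guess for the optimizer

res_nl = solve(x, y, Optimize(modelfun, beta0))
```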
All types in this category inherit from the abstract `AbstractFittingResult` type. These types are used for output when the regression is completed:
Regression Type | Alistair.jl Result Type | Fields |
---|---|---|
Any | `genericfitresult` | `callertype`, `beta`, `residuals`, `mse`, `variance` |
Linear | `linearfitresult` | `callertype`, `beta`, `residuals`, `mse`, `variance`, `robust`, `iterations` |
NonLinear | `nlinfitresult` | `callertype`, `beta`, `residuals`, `mse`, `variance`, `robust`, `converged`, `iterations` |
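Assuming the fields listed above are plain struct fields, a fit result can be inspected like this:

```julia
res = linregress(X, Y, OLS())

res.beta       # estimated coefficients
res.residuals  # fitted residuals
res.mse        # mean squared error
res.variance   # variance-covariance matrix of the coefficients
```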
These are "support" types, that allow for an easier and more user-friendly specification of some tasks during the estimation phase:
All types in this category inherit from the abstract `AbstractVarianceMatrix` type. Note that all Variance Matrix Types are functions too! You can call them as
myVariance = <VarianceMatrixTypeYouNeed>(...)
Method | Alistair.jl Type | Function form |
---|---|---|
Default | `BasicVariance` | `BasicVariance(X, residuals, k)`, where `k` is the degrees of freedom (number of parameters in `beta`) |
HCE (Eicker-White) | `HCEVariance` | `HCEVariance(X, residuals)` |
Alistair.jl is coded using standard Julia operators like `\`, `*` and `inv()`. Since Julia automatically produces code that uses LAPACK functions [1], there is no real use in fiddling with LAPACK functions inside Julia. However, it might be interesting to explore the SIMD and parallel programming capabilities of Julia to accelerate regressions with datasets on the order of 10⁵-10⁶ observations or more.
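To make the point concrete, here is the kind of plain-operator code this amounts to; it is a sketch of a textbook OLS solve, not a copy of Alistair.jl's internals:

```julia
# Least-squares fit with the backslash operator (LAPACK-backed under the hood)
beta_hat  = X \ Y
residuals = Y - X * beta_hat

# Mean squared error with n - k degrees of freedom
n, k = size(X)
mse = sum(abs2, residuals) / (n - k)
```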