This directory contains an implementation of a small subset of Haskell. It uses combinators for the runtime execution.
The compiler can compile itself.
The language is a subset of Haskell. There is only simple Hindley-Milner polymorphism, no type classes (yet).
It has the following features:
- variables
- application
- lambda
- integer literals
- character literals
- string (list of characters) literals
- case expressions
- let expressions
- tuples
- list syntax
- list comprehensions
- arithmetic and comparison operators, but only for
Int - qualified
donotation, e.g.,IO.do - data (and newtype) type declarations
- type synonyms
- type signatures
- importing of other modules,
qualifiedandassupported, but no import list - exporting with mandatory export list
- the
Preludehas to be imported explicitely - terrible, terrible error messages
The file Example.hs contains the following:
module Example(main) where
import Prelude
fac :: Int -> Int
fac 0 = 1
fac n = n * fac(n-1)
main :: IO ()
main = do
let
rs = map fac [1,2,3,10]
putStrLn "Some factorials"
putStrLn $ showList showInt rsFirst, make sure the binaries are built. E.g., by doing make test.
Then compile the file by bin/mhs -ilib Example which produces out.comb.
Finally, run the combinator file by bin/eval.
This should produce
Some factorials
[1,2,6,3628800]
There are a number of libraries that have some of the standard Haskell functions.
But in general, the Prelude contains much, much less.
There are two primitive data types Int and Handle. These are known by the runtime system
and various primitive operations work on them. The function type, ->, is (of course) also built in.
All other types are defined with the language. They are converted to lambda terms using the Scott encoding. The runtime system knows how lists are encoded and booleans are encoded.
The compiler is written in Micro Haskell.
It takes a name of a module and compiles it to a file called out.comb.
-iDIRaddDIRto search path for modules-oFILEoutput combinators toFILEinstead ofout.comb-rrun directly, does not work if compiled with GHC-vbe more verbose, flag can be repeated
With the -v flag the processing time for each module is reported.
E.g.
importing done MicroHs.Exp, 716ms (368 + 348)
which means tha processing MicroHs.Exp.hs took 716ms,
with parsing taking 368ms and typecheck&desugar taking 348ms.
Main, the main module. Decodes flags, compiles, and writes result.Compile, top level compiler. Maintains a cache of already compiled modules.Exp, simple expression type, combinator abstraction and optimization.Expr, parsed expression type.Desugar, desugar full expressions to simple expressions.Lex, lexical analysis and indentation processing.Parse, parse and build and abstract syntax tree.Translate, convert an expression tree to its value.TypeCheck, type checker.
The runtime system is written in C and is in eval.c.
It uses combinators for handling variables, and has primitive operations
for integers and for executing IO operations.
There is a also a simple mark-scan garbage collector.
It is written in a reasonably portable C code.
Runtime flags are given between the flags +RTS and -RTS.
Between those the runtime decodes the flags, everything else is available to
the running program.
-HSIZEset heap size toSIZEcells, can be suffixed byk,M, orG, default is50M-KSIZEset stack size toSIZEentries, can be suffixed byk,M, orG, default is100k-rFILEread combinators fromFILE, instead ofout.comb-vbe more verbose, flag can be repeated
For example, bin/eval +RTS -H1M -v -RTS hello runs out.comb and the program gets the argument hello,
whereas the runtime system sets the heap to 1M cells and is verbose.
The runtime system can serialize and deserialize any expression
and keep its graph structure (sharing and cycles).
The only exception to this is file handles, which cannot be serialized (except for stdin, stdout, and stderr).
Memory allocation is based on cells. Each cell has room for two pointers (i.e., two words, i.e., 16 bytes), so it can represent an application node. One bit is used to indicate if the cell has an application or something else. If it is something else one word is a tag indicating what it is, e.g., a combinator or an integer. The second word is then used to store any payload, e.g., the number itself for an integer node.
Memory allocation has a bitmap with one bit per cell. Allocating a cell consists of finding the next free cell using the bitmap, and then marking it as used. The garbage collector first clears the bitmap and then (recursively) marks every used cell in the bitmap. There is no explicit scan phase since that is baked into the allocation.
It is possible to use smaller cells by using 32 bit "pointers" instead of 64 bit pointers. This has a performance penalty, though.
The C code for the evaluator does not use any special features, and should be portable to many platforms. It has mostly been test with MacOS and Linus, so there are undoubtedly problems on Windows.
The code has only been tested on 64 bit platforms, so again, there are lurking problems with other word sizes.
It is possible to recompile the compiler without access to a Haskell compiler.
The combinator file for the compiler itself is available in comb/mhs.comb.
The bootstrapping process takes about 20s (on a modern machine).
To bootstrap:
- build the evaluator,
make bin/eval, this requires a C compiler - compile the compiler
bin/eval +RTS -rcomb/mhs.comb -RTS -ilib -isrc -onewmhs.comb MicroHs.Main - The file
newmhs.combis the new combinator binary and it should be identical tocomb/mhs.comb. - It is also possible to bake the combinator code into the binary.
See
maketargetbin/cmhsfor how it is done. - For systems where
upxworks you can compress further compress the binary. Seebin/umhstarget.
NOTE The GC mark phase currently uses a ridiculously deep stack. You might have to increase it on your system.
-
- Q: When will it get insert feature?
- A: Maybe some time, maybe never.
-
- Q: Why are the error messages so bad?
- A: Error messages are boring. But I plan to add location information to them.
-
- Q: Why is the so much source code?
- A: I wonder this myself. Over 4000 lines of Haskell seems excessive. 1700 lines of C is also more than I'd like for such a simple system.
-
- Q: Why are the binaries so big?
- A: The combinator file is rather verbose. Compressed the combinator file for the compiler shrinks from 150kB to 20kB. The evaluator is about 40kB so the total size for runtime and (compressed) compiler is about 40k. I'm sorry if you're running on a 16 bit system.