2005.02.03
Picking the right implementation language and target language for a compiler project
by Karel Thönissen
Things you may need in a compiler: string manipulation, graph manipulation, garbage collection, reference semantics, recursion, associative arrays, hashing, ...
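Several items in that list meet in a compiler's symbol table. The following is a minimal sketch, in Python, of a scoped symbol table built as a stack of dictionaries (hash-based associative arrays). The class and method names are hypothetical, chosen only for illustration.

```python
class ScopedSymbolTable:
    """Hypothetical scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self):
        # The outermost (global) scope is always present.
        self.scopes = [{}]

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot leave the global scope")
        self.scopes.pop()

    def declare(self, name, info):
        current = self.scopes[-1]
        if name in current:
            raise KeyError(f"duplicate declaration of {name!r} in this scope")
        current[name] = info

    def lookup(self, name):
        # Search from the innermost scope outwards, so inner declarations
        # shadow outer ones. Returns None for undeclared names.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = ScopedSymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")  # shadows the outer x
print(table.lookup("x"))     # float
table.exit_scope()
print(table.lookup("x"))     # int
```

Pushing a fresh dictionary per scope keeps lookups cheap and makes shadowing fall out of the search order. A production compiler would normally store richer symbol records than the plain strings used here.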
Other things to take into account:
- if you plan to bootstrap, choose an implementation language that resembles your source language from the start (if it is a reasonable choice); otherwise you will have to write a lot of code twice
- look for compiler generators and libraries you may want to use. These can limit your choice of implementation language
- writing a professional compiler is a lot of work, so take an implementation language that was designed for programming in the large
- take an implementation language that is good at string manipulation and graph manipulation. Remember that writing the parser is the trivial part
- take the same language as implementation language and target language; this makes it possible to use your libraries both in the compiler's own implementation and in the language environment
- if it is within your power, take a target language that resembles the source language, so that fewer semantic conversions are needed during compilation
- take a target language that is lower level than the source language; otherwise the code generator has to make a lot of dangerous assumptions due to the missing information
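To make the last point concrete, here is a small Python sketch that lowers an expression tree into instructions for a hypothetical stack machine. Because the target is lower-level than the source, every source construct maps down mechanically through a recursive post-order walk. The node classes and the PUSH/LOAD/ADD/SUB/MUL instruction set are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical AST node types for a tiny expression language.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str  # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, BinOp]

# Hypothetical opcodes of the target stack machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def lower(expr, out):
    """Append stack-machine instructions for expr to out (post-order) and return out."""
    if isinstance(expr, Num):
        out.append(f"PUSH {expr.value}")
    elif isinstance(expr, Var):
        out.append(f"LOAD {expr.name}")
    elif isinstance(expr, BinOp):
        lower(expr.left, out)   # operands first...
        lower(expr.right, out)
        out.append(OPCODES[expr.op])  # ...then the operator pops both
    else:
        raise TypeError(f"unknown node {expr!r}")
    return out


# (a + 2) * b
tree = BinOp("*", BinOp("+", Var("a"), Num(2)), Var("b"))
for instr in lower(tree, []):
    print(instr)
```

Targeting a higher-level language instead would mean supplying information the source never stated, for example static types when compiling an untyped source language into a typed target.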
Careful readers will have noticed that I suggested:
- source language ~= implementation language (facilitates bootstrapping)
- implementation language = target language (facilitates dual purpose use of packages)
- source language ~= target language (allows work reduction)
Obviously, in the ideal case your source language is identical to your target language, making the implementation language irrelevant (-8.
A good choice: use the same language for targeting and implementation, and pick for this a language that semantically resembles your source language.
In this context, never underestimate the semantic differences between languages that are syntactically similar, and vice versa. For example, although Java looks like C because of the curly braces, it is semantically much closer to Pascal than to C.
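Integer division is a small example of such a difference. C99 and Java truncate the quotient toward zero, while Python's `//` operator rounds it toward negative infinity, so `-7 / 2` is -3 in C but `-7 // 2` is -4 in Python. A compiler translating a source language with one rule into a target with the other must emit a correction. The sketch below uses Python to emulate truncating division and then builds floor division from it, the way generated target code would have to; the function names are hypothetical.

```python
def c_div(a: int, b: int) -> int:
    """Emulate C99/Java integer division: the quotient is truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def floor_div_via_c(a: int, b: int) -> int:
    """Floor division built only from truncating division, as generated code
    would need when the source language floors but the target truncates."""
    q = c_div(a, b)
    # Truncation and flooring differ only when the division is inexact
    # and the operands have opposite signs; then the floor is one less.
    if q * b != a and (a < 0) != (b < 0):
        q -= 1
    return q


print(-7 // 2, c_div(-7, 2), floor_div_via_c(-7, 2))  # -4 -3 -4
```

The correction costs a multiply, a compare and a branch per division, which is exactly the kind of semantic conversion that a closer match between source and target language avoids.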