2005.02.03

 

Picking the right implementation language and target language for a compiler project

by Karel Thönissen

Things you may need in a compiler: string manipulation, graph manipulation, garbage collection, reference semantics, recursion, associative arrays, hashing, ...
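As an illustration only, here is a minimal sketch of one everyday compiler task that leans on several of these features at once: resolving identifiers against nested scopes. It uses associative arrays (Python dicts, which are hash tables), recursion over the syntax tree, and automatic memory management. The tuple-based AST format and the names `Scope` and `resolve` are hypothetical, invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """One level of a nested symbol table: a hash map plus a link to the enclosing scope."""
    parent: Optional["Scope"] = None
    symbols: dict = field(default_factory=dict)

    def declare(self, name, info):
        if name in self.symbols:
            raise NameError(f"duplicate declaration of {name!r}")
        self.symbols[name] = info

    def lookup(self, name):
        # Walk outward through the enclosing scopes until the name is found.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"undeclared identifier {name!r}")


def resolve(node, scope, out):
    """Recursively walk a hypothetical tuple-based AST and record (name, type) for every use."""
    kind = node[0]
    if kind == "block":      # ("block", [stmt, ...]) opens a new scope
        inner = Scope(parent=scope)  # never freed explicitly; the garbage collector reclaims it
        for stmt in node[1]:
            resolve(stmt, inner, out)
    elif kind == "decl":     # ("decl", name, type)
        scope.declare(node[1], node[2])
    elif kind == "use":      # ("use", name)
        out.append((node[1], scope.lookup(node[1])))
    else:
        raise ValueError(f"unknown node kind {kind!r}")
    return out


program = ("block", [
    ("decl", "x", "int"),
    ("block", [
        ("decl", "x", "string"),  # shadows the outer x
        ("use", "x"),
    ]),
    ("use", "x"),
])

print(resolve(program, Scope(), []))  # [('x', 'string'), ('x', 'int')]
```

In a language without built-in hash maps, recursion-friendly data structures or a garbage collector, each of these few lines turns into noticeably more code, which is the point of the list above.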

Other things to take into account:

  • if you plan to bootstrap, choose from the start an implementation language that resembles your source language (if that is a reasonable choice); otherwise you will have to write much of the code twice
  • look for the compiler generators and libraries you may need; their availability can limit your choice of implementation language
  • writing a professional compiler is a lot of work, so take an implementation language that was designed for programming in the large
  • take an implementation language that is good at string manipulation and graph manipulation. Remember that writing the parser is the trivial part
  • use the same language as implementation language and target language; this lets you use your libraries both in the implementation of the compiler and in the language environment
  • if it is within your power, take a target language that resembles the source language, so that fewer semantic conversions are needed during compilation
  • take a target language that is lower level than the source language; otherwise the code generator has to make many dangerous assumptions about information that is missing
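To make the last point concrete, the following is a minimal sketch of translating a higher-level source construct into a lower-level target. Both the nested-expression format and the stack-machine instruction set (`PUSH`, `LOAD`, `ADD`, `SUB`, `MUL`) are hypothetical, chosen only for illustration; a small interpreter for the target is included so the translation can be checked.

```python
import operator

# Hypothetical target instruction set: binary opcodes and their meaning.
BINOPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}
SOURCE_OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def lower(expr, code):
    """Translate a nested expression into flat stack-machine instructions.

    expr is an int literal, a variable name (str), or a tuple
    (op, left, right) with op in "+", "-", "*".
    """
    if isinstance(expr, int):
        code.append(("PUSH", expr))
    elif isinstance(expr, str):
        code.append(("LOAD", expr))
    else:
        op, left, right = expr
        lower(left, code)                 # evaluate the left operand first,
        lower(right, code)                # then the right operand,
        code.append((SOURCE_OPS[op],))    # then combine the two stack tops
    return code


def run(code, env):
    """Interpret the target code, to check that the translation preserves meaning."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(BINOPS[instr[0]](left, right))
    return stack.pop()


source = ("*", ("+", "a", 2), ("-", "b", 1))   # (a + 2) * (b - 1)
target = lower(source, [])
print(target)
print(run(target, {"a": 3, "b": 5}))  # 20
```

In this direction every source node expands into a fixed sequence of simpler target instructions, and everything the generator needs (operands, operator, evaluation order) is already present in the source tree, so nothing has to be guessed.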

Careful readers will have noticed that I suggested:

  • source language ~= implementation language (facilitates bootstrapping)
  • implementation language = target language (facilitates dual purpose use of packages)
  • source language ~= target language (allows work reduction)

Obviously, in the ideal case your source language is identical to your target language, which makes the implementation language irrelevant (-8.

A good choice: use the same language for targeting and implementation, and choose for this a language that semantically resembles your source language.

In this context: never underestimate the semantic differences between languages that are syntactically similar, and vice versa. For example, although Java looks like C because of its curly braces, semantically it is much closer to Pascal than to C.