“Let’s Build a Compiler”

By | June 24, 2011

Ever wanted to write your own programming language? Defining one is easy (relatively speaking). Building the compiler for it…well, not so much. Usually, that’s where most people (including myself) stop. To solve this problem, Jack W. Crenshaw started a series titled, Let’s Build a Compiler where he covered the basics of building a simple compiler before starting to write a real one as the series continued. It was never finished and the last article appeared in 1995. Despite stopping short of a “real” compiler, it is an excellent starting point for those wanting to start building them.

The code used in the series was designed to run on Turbo Pascal 4.0 (from back in the 1980’s). While the code is old and the compiler to compile isn’t used anymore, it’s still compilable thanks to the Free Pascal project. Alternative, just get a copy of Bloodshed’s open source Dev-Pascal IDE which includes Free Pascal.

The assembly code that is being generated is for Motorola 68000 with the SK*DOS operating system. I don’t know of anyone with a device still running on that processor and operating system so the code may not be so useful in itself. However, M86K assembly is by far, easier to read and understand than more modern assembly languages such as x86. That really does help with understanding the code generator for those (including, again, myself) who have never written anything in assembly.

All usefulness for the generated code is not lost however! The EASy68K program provides an open source interpreter for M68K assembly code. You can use it to make sure that the compiler works. Note that you will have to change the the Header() procedure which generates a header for SK*DOS (“WARMST  EQU $A01E”) which is not very useful in any way. Once you change that to something like “ORG    $1000”, then you should be able to run it properly.

If you really have that much of an issue with Pascal and M68K, there are alternatives. There is a translation of the entire series to Forth and x86 from 2005. There’s another version ported to C generating P-code. The site is unavailable at the time of this post so I’m attaching the compiler to the post. Note that it isn’t a direct translation and there are many changes. There are two more versions on this page of the Tiny language, one written in C and the other in C++ derived from the series. The first one is attached to the first post is in C and generates Propeller Assembly (PASM) code for small devices. I tested it and it works quite well. The second one, written in C++, is attached to the 8th post and generates x86 assembly for GNU/Linux and is compatible with the GNU Assembler so it may be of value for those wanting to extend their compilers into real world use. For those running on Intel Macs (couldn’t find any PowerPC version), here is another translation to C code producing x86 assembly with Mac headers. Finally, there’s a Haskell version complete with its own tutorial paralleling the original. The code generated is x86 and appears to work on Mac and Windows (I have not tested it so I can’t say for sure).

That should be more than enough to get you started on your own compiler. So how exactly should you write it? That’s really up to you but I would recommend first writing the compiler in whatever language you choose with the given M86K code generator due to its readability. After thoroughly understanding the compiler and the code it produces, start writing a code generator for x86 using any of the above implementations.

Keep in mind that the Tiny compiler is designed for learning rather than for production. The code produced by it is horrible at best (though it may still beat out certain interpreters which will remain unnamed here). For one, absolutely no optimization is performed so don’t expect to use it in a production environment where other compilers exist. You will still have to read through other (dizzying) compiler books before it becomes a possibility.

I mentioned earlier that I gave up writing an actual compiler from scratch. I’m currently in the progress of writing another one and am following my guidelines I set above. Perhaps by next year, it will finally be usable. Until then, there’s still a lot of work to be done.

Good luck to all those compiler writers out there!

2 thoughts on ““Let’s Build a Compiler”

  1. Kingston

    “Defining one is easy (relatively speaking). Building the compiler for it…well, not so much.”

    You should write about #1, then! For a lot of us, writing an implementation of a specification is easy. It’s coming up with a useful, consistent, usable specification that’s mind-bogglingly difficult.

    I look at other languages, and I usually have no idea how the creators came up with what they did. Once I had that, though, it would be easy (or at least straightforward) to write a compiler for it.

    Kind of like how I never would have thought of Craigslist or Ebay or Reddit, but if I knew those would be big ideas, I wouldn’t have much trouble implementing them. There’s no magic required to write them.

    1. North Post author

      It’s only easy if you know how to do it. This post obviously isn’t for those people who already know how. And writing a specification to a language is different from inventing the programming language. There are a lot of languages out there and many have relatively simple specifications that make sense, even to those new to the field. Notice how I didn’t say that defining a *good* language is easy. I just said that defining any old language that works is easy. It can be done within 10 minutes if you really felt like it. Now write a compiler for that language…odds are, it’ll take a lot longer than 10 minutes unless that language is REALLY simple.


Leave a Reply

Your email address will not be published. Required fields are marked *