Compile Time is Runtime

16 Feb 2012 07:54

One of the key aspects of badlang is the build system. Aside from having a rather bad[lang] bootstrapping problem, it's quite novel.

First, a runtime/interpreter is needed; all it must do is run badlang code. Even though it may lack the ability to directly generate executables, we shall still call it the compiler.

The compiler needs to be able to run a single build script file written in badlang. This script can do any number of things, but the general process would be as follows:

  1. import compiler modules (specially made available by the compiler)
  2. use compiler modules to generate a module object from the main source file
  3. pass a pointer to the main function of the main module to a function from the compiler modules that builds an executable from it.
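The three steps above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: badlang has no published API, so `Compiler`, `module_from_source`, `build_executable` and `Module` are hypothetical stand-ins for the compiler modules, and the executable builder just reports what it would emit instead of writing a binary.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Module:
    """Hypothetical module object produced by running a source file."""
    name: str
    members: dict = field(default_factory=dict)


class Compiler:
    """Hypothetical stand-in for the compiler modules a build script imports."""

    def __init__(self, sources: dict[str, Callable[[], Module]]):
        # Map from source path to that file's "top-level code".
        self.sources = sources

    def module_from_source(self, path: str) -> Module:
        # Step 2: run the file's top-level code to get a module object.
        return self.sources[path]()

    def build_executable(self, entry: Callable, output: str) -> dict:
        # Step 3: a real builder would copy everything reachable from
        # `entry` into an image; this model only describes the result.
        return {"output": output, "entry": entry.__name__}


def main_source() -> Module:
    def main():
        return "hello from main"
    return Module("main", {"main": main})


# Step 1: "import" the compiler modules (here: construct the stand-in).
compiler = Compiler({"main.bl": main_source})
# Step 2: generate a module object from the main source file.
main_module = compiler.module_from_source("main.bl")
# Step 3: hand the main function to the executable builder.
result = compiler.build_executable(main_module.members["main"], "app")
```

The point of the model is that the build script is ordinary code calling ordinary functions; nothing in it is special build-tool syntax.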

This last step is a bit complex. For it to work, badlang needs to have a moving garbage collector. The pointer to the main function serves as the starting point: it and everything it references are copied into a block of memory, much as a moving garbage collector would do. This block is then simply dumped to disk as an executable with all the needed headers and such. A little extra care can be taken to properly separate code and data, and, if desired, to recompile all the code from its LLVM IR for the desired target with high optimization.
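The copying step described above is essentially what a Cheney-style copying collector does: a breadth-first copy from a root, with a forwarding table so shared and cyclic references are copied once. The sketch below is an illustration of that standard technique, not badlang's actual implementation; `Obj` and `copy_reachable` are hypothetical names, and the "image" is a Python list in which references become indices rather than real memory addresses.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    """A heap object: some payload plus references to other heap objects."""
    payload: str
    refs: list["Obj"] = field(default_factory=list)


def copy_reachable(root: Obj) -> list[tuple[str, list[int]]]:
    """Cheney-style copy: lay out everything reachable from `root` in a flat
    image, rewriting references as indices into that image."""
    forwarding: dict[int, int] = {}  # id(original) -> index in the image
    to_space: list[Obj] = []         # originals, in the order they were copied

    def forward(obj: Obj) -> int:
        # Copy an object the first time it is seen; return its new address.
        key = id(obj)
        if key not in forwarding:
            forwarding[key] = len(to_space)
            to_space.append(obj)
        return forwarding[key]

    forward(root)
    image: list[tuple[str, list[int]]] = []
    scan = 0
    # The scan pointer chases the allocation pointer until every copied
    # object has had its references rewritten.
    while scan < len(to_space):
        obj = to_space[scan]
        image.append((obj.payload, [forward(r) for r in obj.refs]))
        scan += 1
    return image


helper = Obj("fn helper")
unused = Obj("fn unused", [helper])  # never referenced from main
main = Obj("fn main", [helper, Obj("const greeting")])
image = copy_reachable(main)
```

Here `image` holds `fn main`, `fn helper` and `const greeting` in that order, with main's references rewritten to indices 1 and 2; `fn unused` is simply never copied, which is how unreachable code falls out of the executable.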

The idea is that each module can be "run" to produce a module object. This resulting module can be saved when doing incremental compiles. An import call literally invokes the compiler to compile the module (if needed) and return the resulting module object.
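A rough Python model of "import invokes the compiler" with saved module objects follows. It is a sketch under assumptions: Python's `exec` stands in for running a badlang module, the `Compiler` class and its `import_module` method are hypothetical, and only a module's own source text is hashed (a real incremental build would also track the modules it depends on).

```python
import hashlib
from types import SimpleNamespace


class Compiler:
    """Hypothetical model: importing a module runs the compiler on it,
    reusing the saved module object when its source has not changed."""

    def __init__(self, sources: dict[str, str]):
        self.sources = sources  # path -> source text
        self.cache: dict[str, tuple[str, SimpleNamespace]] = {}
        self.compiles = 0       # how many times a module was actually run

    def import_module(self, path: str) -> SimpleNamespace:
        text = self.sources[path]
        digest = hashlib.sha256(text.encode()).hexdigest()
        cached = self.cache.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]    # incremental: reuse the saved module object
        self.compiles += 1
        # "Run" the module's top-level code (exec stands in for badlang);
        # that code may itself call import_module.
        namespace = {"import_module": self.import_module}
        exec(text, namespace)
        module = SimpleNamespace(**{
            k: v for k, v in namespace.items()
            if not k.startswith("__") and k != "import_module"
        })
        self.cache[path] = (digest, module)
        return module


compiler = Compiler({
    "util": "def double(x):\n    return 2 * x\n",
    "main": "util = import_module('util')\nanswer = util.double(21)\n",
})
main = compiler.import_module("main")  # runs main, which imports util
first_count = compiler.compiles
compiler.import_module("main")         # unchanged source: served from cache
second_count = compiler.compiles
```

Both `main` and `util` are compiled on the first import, and the second import of `main` reuses the saved object without running anything.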

This means that the code at the top level of a module is meta-programming. Code to create functions and types, generate modules, and more can live there, and it can even call arbitrary functions. This top-level code returns a module object.
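To make "top-level code is meta-programming" concrete, here is a Python sketch in which a module's top level is an ordinary function that loops to generate types and functions and then returns the module object. The generated names (`Vec2`, `dot3`, and so on), `make_dot` and `module_top_level` are hypothetical illustrations, not anything from badlang.

```python
from types import SimpleNamespace


def make_dot(n: int):
    """Generate a dot-product function specialised to n components."""
    def dot(a, b):
        return sum(a[i] * b[i] for i in range(n))
    dot.__name__ = f"dot{n}"
    return dot


def module_top_level() -> SimpleNamespace:
    """Model of a module's top-level code: ordinary code that runs at
    compile time and returns the module object it built."""
    members = {}
    for n in (2, 3, 4):
        # Generate a record type and a matching function for each size.
        fields = [f"x{i}" for i in range(n)]
        members[f"Vec{n}"] = type(f"Vec{n}", (), {"__slots__": fields})
        members[f"dot{n}"] = make_dot(n)
    return SimpleNamespace(**members)


vectors = module_top_level()
```

Callers see `vectors.Vec3` and `vectors.dot3` exactly as if they had been written out by hand.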

As a side note, it is possible to make a badlang program run rather than compile by launching a file with the compiler. Doing this would basically resemble a Python program: it would dynamically load and compile all the imported modules, generate top-level functions and classes at import time, and could start execution from some entry point in the top level of the main file.

This architecture has some impressive benefits. The compiler's functionality is exposed to the language itself, allowing dynamic compiling and loading of code if desired. It allows badlang itself to be used as the build script language and as a meta-language. It allows arbitrary compile-time actions to be performed with no special tools or configuration.

Some things that this makes possible:

  • One could write a function that is passed a path to some non-badlang source file and returns a module that provides bindings for it. This could be done with no extra build scripts, tools, or configuration. It could even go so far as to invoke the compiler for that language and statically link in the resulting library, though that would require some more features, plus configuration to know which compiler to call and how. But all of this can be done by a user-provided module, and needs no direct language or tool extensions.
  • Compiling in data files. If desired, file dependencies, say 3D models for a game, could be loaded at compile time (and properly included in incremental compiles).
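Compiling in a data file while keeping incremental builds correct mostly comes down to reading the file at compile time and remembering that the module depends on it. The Python sketch below assumes a hypothetical `BuildContext` with `embed_file` and `is_stale` methods; badlang's real mechanism is not specified in the post, and hashing file contents is just one reasonable way to detect changes.

```python
import hashlib
import tempfile
from pathlib import Path


class BuildContext:
    """Hypothetical compile-time context that records file dependencies so an
    incremental build knows when embedded data has changed."""

    def __init__(self):
        self.deps: dict[Path, str] = {}  # path -> content hash at embed time

    def embed_file(self, path: str) -> bytes:
        # Read the data at compile time; the bytes become a module constant.
        p = Path(path)
        data = p.read_bytes()
        self.deps[p] = hashlib.sha256(data).hexdigest()
        return data

    def is_stale(self) -> bool:
        # Rebuild if any embedded file changed or disappeared.
        for p, digest in self.deps.items():
            if not p.exists():
                return True
            if hashlib.sha256(p.read_bytes()).hexdigest() != digest:
                return True
        return False


with tempfile.TemporaryDirectory() as tmp:
    model = Path(tmp) / "ship.obj"
    model.write_bytes(b"v 0 0 0\n")
    ctx = BuildContext()
    ship_mesh = ctx.embed_file(str(model))  # constant baked into the module
    stale_before = ctx.is_stale()
    model.write_bytes(b"v 1 0 0\n")         # the asset is edited
    stale_after = ctx.is_stale()
```

Editing the model file marks the module as stale, so the next incremental compile re-runs the code that embedded it.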

In addition to many preprocessing and build-script-type tasks, the meta-programming and compile-time execution allow badlang itself to replace many compiler features as well:

  • Enumerations can be implemented with regular code. There is no need for special constructs like Go's iota.
  • Field and method lookups, and some other type details, can be implemented (and customized) in badlang rather than in the compiler. This allows even subclassing and structs to be library-level rather than language-level features.
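Both bullets can be sketched as plain library code. In the Python model below, `enum` hands out consecutive integers the way Go's iota does, and `Struct` computes field offsets with ordinary arithmetic instead of compiler support. The names, the primitive sizes in `SIZES`, and the C-like alignment rule (each primitive aligned to its own size, total size padded to the largest alignment) are all assumptions for illustration.

```python
from types import SimpleNamespace

SIZES = {"u8": 1, "i32": 4, "f64": 8}  # assumed primitive sizes/alignments


def enum(*names: str, start: int = 0) -> SimpleNamespace:
    """Library-level enumeration: consecutive integers, like Go's iota."""
    return SimpleNamespace(**{name: start + i for i, name in enumerate(names)})


def round_up(value: int, align: int) -> int:
    return (value + align - 1) // align * align


class Struct:
    """Library-level struct: the field layout is computed by ordinary code."""

    def __init__(self, *fields: tuple[str, str]):
        self.offsets: dict[str, int] = {}
        offset = 0
        for name, ty in fields:
            offset = round_up(offset, SIZES[ty])  # pad to the field's alignment
            self.offsets[name] = offset
            offset += SIZES[ty]
        max_align = max((SIZES[ty] for _, ty in fields), default=1)
        self.size = round_up(offset, max_align)  # trailing padding

    def field_offset(self, name: str) -> int:
        # The "field lookup" a compiler would normally perform itself.
        return self.offsets[name]


Color = enum("RED", "GREEN", "BLUE")
Particle = Struct(("alive", "u8"), ("id", "i32"), ("mass", "f64"))
```

With these assumptions `Particle` places `alive` at 0, `id` at 4 and `mass` at 8, for a total size of 16. Swapping in a different `Struct` class (packed, reordered, with a parent type) changes the layout without touching the compiler.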

To get some of these details (like customized types) to work with good performance, an additional feature is needed: compile-time execution of deterministic expressions. This amounts to a rather extended version of constant folding. Things like looking up field locations in structs are normally done at compile time, but in badlang they are instead implemented with a function that is the runtime equivalent of the traditional compile-time lookup. This function, however, can still be "run" when compiling (since it depends only on known values and has no side effects), which reduces it to the expected offset and gives the expected performance.
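The folding described above can be illustrated with a tiny expression tree: any call to a function marked pure whose arguments are all constants is evaluated during compilation and replaced by its result. This is a Python sketch, not badlang's design; `Const`, `Var`, `Call`, the `pure` marker, `ptr_add`, and the `Point` layout in `OFFSETS` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class Var:
    name: str  # value only known at run time


@dataclass(frozen=True)
class Call:
    fn: Callable
    args: tuple


Expr = Union[Const, Var, Call]
PURE: set = set()


def pure(fn):
    """Mark a function as deterministic and free of side effects."""
    PURE.add(fn)
    return fn


OFFSETS = {("Point", "x"): 0, ("Point", "y"): 8}  # assumed struct layout


@pure
def field_offset(struct: str, field: str) -> int:
    # Ordinary code, callable at run time, doing the classic lookup.
    return OFFSETS[(struct, field)]


@pure
def ptr_add(base, offset):
    return base + offset


def fold(expr: Expr) -> Expr:
    """Extended constant folding: evaluate pure calls with known arguments."""
    if isinstance(expr, Call):
        args = tuple(fold(a) for a in expr.args)
        if expr.fn in PURE and all(isinstance(a, Const) for a in args):
            return Const(expr.fn(*(a.value for a in args)))
        return Call(expr.fn, args)
    return expr


# Address of p.y: the base pointer is only known at run time.
expr = Call(ptr_add, (Var("p"), Call(field_offset, (Const("Point"), Const("y")))))
folded = fold(expr)
```

After folding, the field lookup has collapsed to `Const(8)`, while `ptr_add` stays a runtime call because `p` is unknown: the same code a hand-written compiler would emit for a field access.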

Making the type system simple to implement, yet this extensible without incurring performance overhead, will be a key feature of badlang if it ever works!

Garbage Collection

One other benefit of having compile time be runtime is that the runtime garbage collector can be used to remove inaccessible code and data. More specifically, a moving tracing garbage collector can start a traversal at the entry point function (main) and copy/move all needed data into a contiguous region of memory, or even directly into the output binary.

If sufficient care is taken when designing how code and types are represented in memory, this could even strip out reflection data for types that are never reflected on. Programs that still need dynamic code generation in some places will end up including some of the compiler modules, whereas fully statically compiled projects will omit them.
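The stripping described above follows from representing reflection data and compiler modules as ordinary objects that are only kept if something reachable from main points at them. The Python sketch below uses a plain breadth-first trace over a hypothetical object graph; the node names (`Point.reflection`, `compiler.codegen`, `eval_plugin`) are illustrative assumptions, not badlang structures.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], root: str) -> set[str]:
    """Trace everything reachable from `root`, as a tracing collector would."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for ref in graph.get(node, []):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


# Hypothetical object graph: reflection data and compiler modules are just
# more objects, kept only if something reachable refers to them.
static_program = {
    "main": ["Point", "print_point"],
    "print_point": ["Point"],
    "Point": [],                     # layout is needed, reflection is not
    "Point.reflection": ["Point"],
    "compiler.codegen": ["compiler.llvm"],
}
# Same program, but main now loads a plugin that generates code at run time
# and reflects on Point.
dynamic_program = dict(
    static_program,
    main=["Point", "eval_plugin"],
    eval_plugin=["compiler.codegen", "Point.reflection"],
)

static_image = reachable(static_program, "main")
dynamic_image = reachable(dynamic_program, "main")
```

The statically compiled program keeps only `main`, `print_point` and `Point`; the dynamic one pulls in the compiler modules and Point's reflection data because something reachable actually uses them.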


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License