Example

This is a very rough mockup of how a badlang project might look. Note that badlang is in the design stage, there is no implementation yet.

The current syntax used here is mostly that of Go.

We shall cover the language's design by walking through the build process for this sample.

An introduction and inspiration

Python is a fantastic language. There are a few things it does differently that are incredibly powerful:
Everything is done at runtime: if you write code in python, even if it is a class or function definition, it is run at runtime. You can create classes, meta-classes, functions and variables all at runtime, and they can have attributes that are looked up at runtime through lookup functions that are themselves defined at runtime. You can load and compile code at runtime, and even import C extension modules at runtime (in cpython). In a sense, this makes python its own meta-language, since python functions and classes are created by python code.
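As a concrete illustration of this point (in Python, not badlang), here is a sketch showing that modules, functions, and classes are all ordinary values built by running code; the names `made_at_runtime` and `Point` are invented for the example:

```python
# Python's "everything at runtime": modules, functions, and classes
# are constructed by executing ordinary code.
import types

# Build a module object at runtime.
mod = types.ModuleType("made_at_runtime")

def double(x):
    return 2 * x

mod.double = double                   # attach a function as a module attribute
Point = type("Point", (), {"x": 0})   # create a class dynamically

print(mod.double(21))   # 42
print(Point().x)        # 0
```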

In python, when a module is imported, the code at its base level is run, and this creates all the functions, classes and data that live in the module's namespace. Most python projects are structured such that the main module (the entry point) imports some packages, which in turn import others and so forth. Once these imports are done, there is a collection (generally a directed acyclic graph) of module namespaces full of functionality available.

Then, the "real" code starts running, and it uses these modules as needed.

What if the constructed set of modules could be saved in a compiled executable, along with an entry point function? This would be equivalent to a standard compiled C style application, but the whole set of module namespaces would be constructed by running code in the language itself.

This is exactly what badlang aims to do, in a very explicit manner: run regular badlang code to set up all the namespaces and their contents needed by the program, then generate an executable containing the entry point function and this graph of dependencies.

The build script: build.bad

var compiler Module=import("compiler");
var main Module=import("main");
compiler.optimize(main);
compiler.writeExecutable(main.entry,"main");

This does exactly what was discussed above. It imports the project's entry point module, main, which in turn imports other modules. Then, it looks up and extracts main's entry function (main.entry), and generates an executable from it, walking all the references to include every needed part of the dependency graph.

A couple of points of interest:

  • There is really nothing special about Modules. They can be put in variables and passed around.
  • Despite the static typing, and despite being regular objects of type Module, the different Module instances (and all types, actually) can have attributes with different names, accessible through dot notation.
  • There's an optimize call in here! It eagerly evaluates lots of stuff. More on that later.
  • The modules being compiled are loaded into the namespace, and are fully accessible. This script could just as easily end with "main.entry()" to run the program instead of dumping an optimized executable. That would make the "build script" really just a "script" running the whole program inside the compiler, much like python does.
  • The import is a regular function call that is evaluated when the code is run, and is simply passed the module name, or maybe path (to be determined). This includes loading the file and compiling it.
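The import-as-a-plain-function-call idea has a direct Python analogue, shown here as a sketch: importlib.import_module is an ordinary function whose result is a module object you can store in a variable and pass around like any other value.

```python
# Python analogue of badlang's import(): an ordinary function call,
# evaluated at runtime, returning a module object as a regular value.
import importlib

name = "json"                      # the module name is just runtime data
mod = importlib.import_module(name)
print(mod.dumps({"a": 1}))         # {"a": 1}
```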

build.bad is special in that it is a script, not a module like the other .bad files (perhaps it should have a different extension).
Despite this difference, it is compiled in the same manner, and uses the same syntax. There really is no special distinction, except the type of function that wraps it and what is done with it. This will be detailed later.

This looks very much like regular statically typed compilable code, and that's basically what it is, with a few interesting differences which we will address soon.

In real use, we would probably have some state set up in the compiler to make it optimize and cache (for incremental compiles) all imported modules, but let's ignore such complexities/optimizations for now.

An example entry point module, main.bad:

Here is a pretty trivial module file:

var sub Module = import("sub")

switch name {
case "entry":
    return func() {
        print(doFibonacci(7));
    }
case "doFibonacci":
    return func(a int) (int) {
        return sub.fibonacci(a);
    }
}
return nil;

This defines the body of a function that takes a string "name", and is used to resolve attributes just like python's getattr. Based on this code, if one wants to access this module's "entry" attribute (via main.entry), they will get a function that takes no arguments and returns nothing.
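The same resolution scheme can be sketched in Python, with the module's lookup function written as an ordinary function of the attribute name (the helper names `main_getattr` and `sub_fibonacci` are invented for this sketch):

```python
# A Python sketch of main.bad's attribute resolution: one lookup
# function per module, dispatching on the attribute name.
def sub_fibonacci(a):
    # stand-in for sub.fibonacci
    return 1 if a < 2 else sub_fibonacci(a - 1) + sub_fibonacci(a - 2)

def main_getattr(name):
    if name == "entry":
        return lambda: print(main_getattr("doFibonacci")(7))
    if name == "doFibonacci":
        return lambda a: sub_fibonacci(a)
    return None   # unknown attribute

main_getattr("entry")()   # prints 21
```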

The entry function happens to contain a call to doFibonacci. Since there is no local doFibonacci symbol, it is assumed to be at the module level. Thus, a call to main's attribute lookup function (in python terms, main.getattr("doFibonacci")) will be generated.

If you're paying attention, this should seem like it incurs a lot of overhead, but it gets a lot worse: consider the first line, 'var sub Module=import("sub")'. This happens to live inside main.getattr, meaning every time you get an attribute of main, it has to locate and compile the sub module! Clearly this is far too horrible.

For completeness, here is a potential (but silly) implementation of sub.bad:

switch name {
case "fibonacci":
    return func(a int) (int) {
        if a < 2 {
            return 1;
        }
        return fibonacci(a-1) + fibonacci(a-2);
    }
}
return nil;

One optimization: Partial Evaluation

A language that requires a "sufficiently smart" compiler to perform well is quite screwed unless the particular smarts the compiler needs are well defined. To perform well at all, badlang needs exactly one optimization to be implemented, and this one optimization is very heavily used. The optimization is Partial Evaluation: pre-evaluating deterministic expressions at compile time. This is what the "optimize" call in the build.bad script above does.

There will likely be syntax constructs to require that something can be pre-evaluated in this way, or throw a compile error, and there will be hooks for easily viewing what parts of the code can be run this way. It's not an ignorable transparent optimization; it is very important that the programmer keeps it in mind.

Optimizing imports

The "import" function is defined to be deterministic, and thus can always be pre-evaluated when its parameters are known. In main.bad, notice that the sub module is never reassigned, nor has its address taken. Thus, it is a constant once assigned (perhaps this should be explicit, not implicit, but that's a syntax issue). Since it is a constant once assigned, and its initial value can be pre-evaluated, the optimization pass can replace it with a single constant Module instance. This removes the horrible issue of reimporting it every time an attribute is accessed.
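The effect of this is the same as memoizing a deterministic function, which can be sketched in a few lines of Python (the `load_module` helper is invented; a dict stands in for a real Module):

```python
# Sketch of the import optimization: since importing is deterministic,
# repeated evaluations collapse to a single cached result.
from functools import lru_cache

@lru_cache(maxsize=None)
def load_module(name):
    print(f"compiling {name}")   # the expensive part happens only once
    return {"name": name}        # stand-in for a Module object

load_module("sub")   # prints "compiling sub"
load_module("sub")   # cache hit: no recompilation
```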

Optimizing attribute lookups

Python goes to some real work to make attribute lookups fast, but it is still rather slow, and has the annoyance that lookup never fails at compile time, only at run time. Badlang fixes both these issues using its one optimization. Assuming an attribute lookup function is deterministic and has no side effects, it can be pre-evaluated if the parameters are known. Since the parameter is generally the name of the symbol, which is always known, usually the attribute lookup can be replaced with a constant value (such as a function pointer).
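The before/after of that transformation can be sketched in Python (the lookup function and its attribute are invented for illustration):

```python
# A pure attribute lookup function with a compile-time-constant argument.
def module_getattr(name):
    if name == "doFibonacci":
        return lambda a: a * 2   # hypothetical attribute
    return None

# Dynamic form: the lookup re-runs on every call.
dynamic = lambda a: module_getattr("doFibonacci")(a)

# Pre-evaluated form: the lookup ran once "at compile time";
# only a direct call to a known function remains.
resolved = module_getattr("doFibonacci")
static = lambda a: resolved(a)

print(dynamic(5), static(5))   # 10 10
```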

There is also the issue of typing. The attribute lookup function returns values of many different types. This is a mess that still needs some design work, but the simple analysis is as follows: assume some boxed implementation that holds the type and value, and properly asserts the type before the conversion to the return type expected at the call site. This has lots of overhead, but since in this case the contents of the box (value and type) are deterministic, and the un-boxing has no side effects, whatever slow un-boxing is needed can also be pre-evaluated, resulting in no runtime overhead. In the cases shown in this example, the result is a direct function call to a known function, despite all the dynamic lookup and typing.
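A minimal sketch of that boxed scheme, with invented helper names (`boxed_lookup`, `unbox`) and a string tag standing in for a real type descriptor:

```python
# Boxed returns: the lookup yields (type, value); the call site asserts
# the expected type before unboxing. With deterministic box contents,
# this whole check is exactly the kind of thing pre-evaluation removes.
def boxed_lookup(name):
    if name == "doFibonacci":
        return ("func_int_to_int", lambda a: a + 1)   # hypothetical
    return ("nil", None)

def unbox(box, expected_type):
    t, v = box
    assert t == expected_type, f"expected {expected_type}, got {t}"
    return v

f = unbox(boxed_lookup("doFibonacci"), "func_int_to_int")
print(f(41))   # 42
```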

Optimizing computation

With pre-evaluation, you get the performance of constants while still computing them using arbitrary pure functions. In this example, pre-evaluation would reduce the entire program to print(21).
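That constant is easy to check: the example's fibonacci (base case 1 for a < 2, as in sub.bad) is a pure function, so the compiler may evaluate the whole call at compile time.

```python
# The example's fibonacci, transcribed from sub.bad. Pure, so a call
# with known arguments can be folded to a constant.
def fibonacci(a):
    return 1 if a < 2 else fibonacci(a - 1) + fibonacci(a - 2)

print(fibonacci(7))   # 21
```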

Other optimizations

To allow cross compiling and further optimization (and to just make things not crazy confusing), during compilation the code is kept in 2 forms: LLVM IR, and compiled code for the current architecture. The compiled code is used for making pre-evaluation calls, and is generated lazily from the LLVM IR by LLVM's JIT. This means that once badlang has finished its pre-evaluation optimizations, LLVM can run its standard optimizations. In the attribute lookup example above, badlang converts a dynamic lookup to a static call, which LLVM can then inline if desired. This means badlang can have python-like dynamic lookup, but often fully optimized C-level performance, including inlining.

This also means that a slight mistake, like introducing a potential side effect into your get-attribute function, can turn what would normally be near optimal into a horribly slow abomination. That's why programmers really need to be kept aware of the one optimization badlang requires, pre-evaluation, and the toolchain must provide tools for this.

Compiler performance

I expect roughly constant time per unit of source, except when large amounts of pre-evaluation are done. It won't be blindingly fast, but I don't see a reason for it to be particularly slow, since the modularity and locality should be good, and the optimizations are pretty straightforward assuming proper purity annotation during AST generation.
