badlang: A bad programming language


Assignment Considered Harmful - 28 May 2013 00:26


In most languages, the "=" (assignment) operator is pretty special. Let's look at an example in C:

int a;
a=1;

This code is roughly doing:

int a;
int one=1;
memcpy(&a,&one,sizeof(int));

Specifically, this assignment writes 1 into the variable a. Thus the a on the left-hand side (LHS) is not an expression referring to the value of the variable, but a reference to the variable itself, which knows its address: a variable of type integer really is a place for an integer, not an integer itself.

This means that if you were to write a function to assign ints, it would need to take a pointer to the int:

void store(int *a, int value) {
    *a=value;
}
 
store(&a, 1);

Thus, to be able to set the value of something, you need to be able to get its address, yet the syntax for setting something ("=") does not refer to the address. This is rather odd, in my opinion. It means that while you can use integer-typed variables as integers in expressions, integer-typed expressions can't be assigned to.

You can make an equivalent language where assignment is only done via an explicit pointer, such as with the store function above (obviously, replacing the *a=value with some actual memory access).

Now, let's look at a less trivial example:

int myArray[3];
myArray[1]=1;

Here, the LHS is an array element. Note that "myArray[1]" really refers to the location in the array, not whatever value is stored there, as it would on the right-hand side (RHS). To translate this to use a store function, we write:

int myArray[3];
store(&(myArray[1]),1);

Or the much simpler:

int myArray[3];
store(myArray+1,1);
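To make the "place" idea concrete, here is a small C sketch showing that myArray+1 and &(myArray[1]) are the same address, so every element write can go through the store function. The store_element helper is hypothetical, added only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* The store function from the post: the only way this code writes an int. */
void store(int *a, int value) {
    *a = value;
}

/* Hypothetical helper: writes value into element i of arr (length len)
 * without using = on an array element. */
void store_element(int *arr, size_t len, size_t i, int value) {
    assert(i < len);          /* refuse out-of-bounds writes */
    int *slot = arr + i;      /* the "place" for element i */
    assert(slot == &arr[i]);  /* same address as &(arr[i]) */
    store(slot, value);
}
```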

Now, what if you want to add types that, like arrays, can have their elements assigned to? If you are using a store function to store things, then you just need a pointer. If you want to use =, you have to return a pointer, dereference it, and then assign to it (which implicitly takes its address again).

MyClass myInstance;
*(myInstance[1])=1;

Note that dereferencing here isn't producing an integer; it's producing a place for an integer that can also be used as an integer expression. In C, you are stuck here. Fortunately, as for most problems, C++ has a feature for fixing this, references, so this can work:

MyClass myInstance;
myInstance[1]=1;

There is a lot going on there: operator[] returns an int&, so the call itself becomes something you can assign to.

How does a compiler implement this? Well, LLVM just has a store instruction, and references are just pointers, so it's simple.
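As a rough sketch of that lowering, assuming a hypothetical MyClass that simply wraps three ints, the C++ reference-returning operator[] can be written in C as a function returning a pointer; the assignment then becomes a single store through that pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container standing in for MyClass in the post. */
typedef struct {
    int items[3];
} MyClass;

/* C stand-in for `int &MyClass::operator[](size_t i)`:
 * the reference is returned as a plain pointer to the element. */
int *myclass_index(MyClass *self, size_t i) {
    assert(i < sizeof self->items / sizeof self->items[0]);
    return &self->items[i];
}

/* `myInstance[1] = 1;` lowers to roughly this: compute the address,
 * then store through it (one LLVM store instruction). */
void set_second(MyClass *myInstance) {
    *myclass_index(myInstance, 1) = 1;
}
```

This is also the C workaround shown earlier: the caller has to write the dereference explicitly, which is exactly what C++ references hide.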

Now let's suppose we have a language with no mutability and no pointers, just Static Single Assignment (SSA). Each variable refers only to its value: it will never change, and it has no address. This is very simple. Everything that does not access IO is functionally pure, and all variables are immutable constants.

Then you just create a set of types which are effectively pointers: they perform IO to their respective memory locations. This can be done as a library rather than as a core part of the language (though it will likely need a compiler tie-in for the actual Load and Store operations). There is no reason users (and libraries) can't create other kinds of pointers (safe, unsafe, atomic (load and store atomically), const (no load, no store), immutable (no store), garbage collected, owned, etc.), or types that use pointers.
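One way to picture "pointers as library types" in C is a small struct that bundles an address with the load and store operations allowed on it. The IntPtr type and its two kinds below are hypothetical illustrations, not a proposed badlang API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "pointer kind": an address plus the operations allowed on it. */
typedef struct IntPtr IntPtr;
struct IntPtr {
    int *addr;
    int  (*load)(const IntPtr *self);
    bool (*store)(const IntPtr *self, int value); /* false: store refused */
};

static int plain_load(const IntPtr *self) {
    assert(self->addr != NULL);
    return *self->addr;
}

static bool plain_store(const IntPtr *self, int value) {
    assert(self->addr != NULL);
    *self->addr = value;
    return true;
}

static bool refuse_store(const IntPtr *self, int value) {
    (void)self;
    (void)value;
    return false;
}

/* Ordinary pointer: loads and stores go straight to memory. */
IntPtr plain_ptr(int *addr) {
    IntPtr p = { addr, plain_load, plain_store };
    return p;
}

/* "Immutable" kind from the post's list: loads allowed, stores refused. */
IntPtr immutable_ptr(int *addr) {
    IntPtr p = { addr, plain_load, refuse_store };
    return p;
}
```

In a statically typed language, the immutable kind would more likely just lack a store method, turning a bad store into a compile error; returning false is the closest cheap approximation in C.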

This means there is no syntactic need for references; if you want to return something that can be stored to, simply do so. It could have a store method, operator, or be of a type you can pass to a store function that lives elsewhere.

Note that this means you don't need an assignment operator, references, or an address-of (&) operator. Instead you get functions or methods to do load and store, which could be implemented as operators if desired. You also get some way to create immutable "variables" SSA style.

Here is an example in this style that sums values from 1 to 10, using some approximate BadLang syntax. I haven't come up with an operator style syntax for load and store yet, but there could be one:

// Create a pointer to an integer named sum, and store 0 to it
sum := 0;
// Call a function "range" that takes a start index, an end index and a function that takes an integer.
// Pass in a closure over the sum variable that adds the passed-in value to it.
range(1, 10, {(i int:) store(sum, load(sum) + i);});
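For comparison, the same sum can be written in C using only explicit load and store. C has no closures, so the closure over sum becomes a function plus a context pointer. The range helper is hypothetical and, as an assumption, treats its end index as inclusive so the result covers 1 to 10:

```c
#include <assert.h>
#include <stddef.h>

/* Explicit load and store instead of = on a variable. */
static int load(const int *p) { return *p; }
static void store(int *p, int value) { *p = value; }

/* A "closure" in C: a function plus a context pointer. */
typedef void (*IntFn)(int i, void *ctx);

/* Hypothetical range: calls fn for every i from start to end, inclusive. */
static void range(int start, int end, IntFn fn, void *ctx) {
    for (int i = start; i <= end; i++) {
        fn(i, ctx);
    }
}

/* Body of the closure: ctx is the captured pointer named sum. */
static void add_to_sum(int i, void *ctx) {
    int *sum = ctx;
    assert(sum != NULL);
    store(sum, load(sum) + i);
}

int sum_1_to_10(void) {
    int slot;                 /* the location; only reached via sum */
    int *const sum = &slot;   /* sum := 0 creates an immutable pointer... */
    store(sum, 0);            /* ...and stores 0 through it */
    range(1, 10, add_to_sum, sum);
    return load(sum);
}
```

Note that sum itself is declared const: the pointer never changes, only the location it refers to does.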

I think that with a well-designed syntax, this approach can be an improvement over the one most existing languages take. It makes clear the difference between saving an intermediate value (say, the output of an expression) for reuse or clarity and creating a mutable local variable. This is because instead of creating a mutable named local variable, you create an immutable value that is a pointer to a location for the value, which can be overwritten if that's what you want. (Implementation-wise, it may point to a slot in your stack frame, a register, or whatever the compiler wants, really. The key is that it is a "where": a variable is a place.)

This has the added benefit of being easy to compile with LLVM, since it's based on SSA and Load and Store operations (LLVM IR is a language that works this way).

There are some complexities: you may end up wanting a programmer-facing feature like LLVM's GetElementPtr, which apparently confuses a lot of people. I think I have good solutions there, but it's a work in progress.
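For readers unfamiliar with GetElementPtr, its core idea can be sketched in C: compute the address of a nested element by pure arithmetic, with no memory access, and leave loading and storing to separate operations. The Record type and record_value_ptr helper are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type used only for this illustration. */
typedef struct {
    char tag;
    int  values[4];
} Record;

/* GEP-style helper: computes &records[r].values[v] by arithmetic alone.
 * Like LLVM's GetElementPtr, it never reads or writes memory. */
int *record_value_ptr(Record *records, size_t r, size_t v) {
    assert(records != NULL && v < 4);
    char *p = (char *)records;
    p += r * sizeof(Record);          /* step to records[r] */
    p += offsetof(Record, values);    /* step to its values field */
    p += v * sizeof(int);             /* step to values[v] */
    return (int *)p;
}
```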

In a language where pointers are not special, there is less of the language to learn, and more flexibility for what to build. In the end, you get a simpler language spec, simpler syntax, simpler compiler and more flexible language by pushing these features into libraries.

There's also massive potential for horrible interoperability between code using different kinds or implementations of pointers, as is the case for all language extensions, be they libraries or otherwise. Thus, there should be a good set in the standard library.

This also leads to the massive issues C++ has: simple-looking code can call into tons of library code all over the place, making tooling and debugging horrible and crippling readability. Unlike C++, however, it avoids adding tons of stuff to the actual language syntax (no references, no const, no constants, no pointers, etc.) and compiler.

Explicitly using a "store" function is a bit verbose, so some operator could be used for that if desired, perhaps "pointer<-value" representing a send/store/write to memory. Dereferencing could just use the old unary "*" operator, or maybe something else, like a Go-style receive "<-pointer". In a language that properly supports operator overloading, this is straightforward (though, again, it brings the same horrible set of issues that operator overloading does in C++).

In short, I feel modification of variables via a traditional "=" operator might be a bad design choice for a language. Loading and Storing through pointers is at least as powerful as assigning and accessing a variable, and worked well for LLVM IR. I think we should consider more broadly applying this to languages targeted at human authors.

I think it's still possible to write elegant code using collections, structures and such this way, but it will look rather different and force some rather different thinking. LLVM IR takes this approach, as does SML with its ref type.

I will continue to investigate this topic. I'm considering taking this approach with BadLang. Feedback is welcome.

Safety - 25 Feb 2012 07:13


Type and memory safety are very important. They are handled well in Go, so we will steal that design, but throw in some C++-like layered complexity for good measure.

Go's model of having complete type and memory safety, unless you explicitly use unsafe, is very nice. How can this be generalized in badlang style?

The Goal: Opt-in danger

Rather than opt-in safety, we want opt-in danger. This means that type and memory safe code is identifiable, is the default, and can be required simply by disabling the ability to opt in to danger. This comes in two flavors: danger provided by the language itself, and danger provided by other modules. Since a goal of badlang is to be simple, the danger that's provided by the language itself can actually be provided via some special core modules, and access to them can be controlled the same way as for all other modules. That means to opt in to dangers, you simply need to import a module that provides those dangers (again, like the unsafe package in Go).

It is important to note that dangers do not get inherited through use: you can use something that lacks memory safety to implement something that is (claims to be) memory safe. Otherwise nothing safe could be built using the lower-level tools!

Generalize!

Now we have reduced the safety system to which modules you can (and do) import. There are several types of "dangers", so modules can simply register themselves with a list of the danger types their public APIs expose. When importing a module, the compiler is provided with the set of modules that the module (and any sub-modules it causes to get imported) can access. To restrict access to various types of dangers, this set of modules can simply be filtered.
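A minimal sketch of that filtering in C, assuming dangers are represented as bit flags. The flag names, the Module type and both helper functions are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical danger categories a module can register. */
enum {
    DANGER_MEMORY  = 1 << 0,  /* raw memory access */
    DANGER_TYPE    = 1 << 1,  /* unchecked casts, type punning */
    DANGER_FS      = 1 << 2,  /* file system access */
    DANGER_SYSCALL = 1 << 3   /* direct syscalls */
};

typedef struct {
    const char *name;
    unsigned    dangers;      /* dangers exposed by its public API */
} Module;

/* Keeps only modules whose dangers are all within `allowed`.
 * The result is the set an importing module (and everything it
 * imports in turn) is allowed to see. Returns the number kept. */
size_t filter_modules(const Module *all, size_t n, unsigned allowed,
                      const Module **out) {
    assert(all != NULL && out != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if ((all[i].dangers & ~allowed) == 0) {
            out[kept++] = &all[i];
        }
    }
    return kept;
}

/* An import succeeds only if the module is in the visible set. */
bool can_import(const Module **visible, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(visible[i]->name, name) == 0) {
            return true;
        }
    }
    return false;
}
```

Sandboxing then amounts to calling filter_modules with a narrower mask before compiling untrusted code.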

Some use cases:

  • generally keeping most code free of memory/type errors (and free from having to check for them when debugging!)
  • ability to sandbox code (prevent direct memory access, file system access, syscalls etc)
  • Allowing, but discouraging, widespread use of some low-level basics which are needed to implement basic language features in the standard library.
  • enforcing clean plugin and other API boundaries inside a project
  • allowing explicit (and easy to find) safety violations where required, such as interfacing with other languages, and some optimizations.
  • enforcing arbitrary project level usage decisions (like preventing some part of the project from using threading, synchronous IO, exceptions, or other random things like that)

Source of Safety

So how exactly can a type and memory safe language be implemented in this system? The basic idea is to stick with the same design as Go, but allocation can be done by functions provided by libraries, which internally use operations that are not memory and type safe. It is basically a clone of Go in this respect, except that new and make come from a module in the standard library, not the language itself. Like Go, direct memory access would come through some unsafe types provided by another module. It works for Go, so there's no reason not to steal it.
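As an illustration of a safe API built on unsafe internals, here is a hypothetical library-level make for an int slice in C: callers only get bounds-checked access, while the implementation uses raw calloc and pointer indexing. Unlike Go, C has no garbage collector, so this sketch also needs an explicit free:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical library-level slice: safe public API, raw storage inside. */
typedef struct {
    int   *data;   /* private by convention: never handed to callers */
    size_t len;
} IntSlice;

/* make: zero-initialized storage from calloc; empty slice on failure. */
IntSlice make_int_slice(size_t len) {
    IntSlice s;
    s.data = calloc(len ? len : 1, sizeof(int));
    s.len = s.data ? len : 0;
    return s;
}

/* Bounds-checked load: out-of-range indexes fail instead of reading junk. */
bool slice_get(IntSlice s, size_t i, int *out) {
    if (i >= s.len) return false;
    *out = s.data[i];
    return true;
}

/* Bounds-checked store: out-of-range indexes fail instead of corrupting memory. */
bool slice_set(IntSlice s, size_t i, int value) {
    if (i >= s.len) return false;
    s.data[i] = value;
    return true;
}

/* C has no garbage collector, so the slice must be released explicitly. */
void free_int_slice(IntSlice *s) {
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```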

Compile Time is Runtime - 16 Feb 2012 07:54


One of the key aspects of badlang is the build system. Aside from having a rather bad[lang] bootstrapping problem, it's quite novel.

First, a runtime/interpreter is needed. All it needs is the ability to run badlang code. While it may lack the ability to directly generate executables, we shall still call it the compiler.

The compiler needs to be able to run a single build script file written in badlang. This script can do any number of things, but the general process would be as follows:

  1. import compiler modules (specially made available by the compiler)
  2. use compiler modules to generate a module object from the main source file
  3. pass a pointer to the main function of the main module to a function from the compiler modules that builds an executable from it.

This last step is a bit complex. For it to work, badlang needs to have a moving garbage collector. The idea is that the pointer to the main function is the starting point, and it and everything it references are copied into a block of memory, much like a moving garbage collector would do. This is then simply dumped to disk as an executable with all the needed headers and such. A little extra care can be taken to properly separate code and data, and to recompile all the code from its LLVM IR for the desired target with high optimization if desired.
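The copying step resembles the core of a copying collector. The sketch below uses Cheney's breadth-first algorithm (a standard technique, swapped in here for illustration) on a toy object graph: everything reachable from a root is packed contiguously into a new region, and everything else is left behind. The Obj layout is hypothetical, not how badlang would represent code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object: a payload plus up to two references. */
typedef struct Obj {
    int         value;
    struct Obj *ref[2];
    struct Obj *forward;   /* set once copied: its new address */
} Obj;

/* Copies one object into to-space (once) and returns its new address. */
static Obj *evacuate(Obj *o, Obj *to, size_t *top, size_t cap) {
    if (o == NULL) return NULL;
    if (o->forward) return o->forward;   /* already moved */
    assert(*top < cap);
    Obj *copy = &to[(*top)++];
    *copy = *o;
    copy->forward = NULL;
    o->forward = copy;
    return copy;
}

/* Cheney-style breadth-first copy: everything reachable from root ends
 * up packed at the start of `to`, with references rewritten to point
 * at the copies. Returns the number of objects copied. */
size_t copy_reachable(Obj *root, Obj *to, size_t cap) {
    size_t top = 0, scan = 0;
    evacuate(root, to, &top, cap);
    while (scan < top) {                 /* copied but not yet scanned */
        Obj *o = &to[scan++];
        for (int i = 0; i < 2; i++) {
            o->ref[i] = evacuate(o->ref[i], to, &top, cap);
        }
    }
    return top;
}
```

Writing the packed region to disk, along with the headers and code/data separation mentioned above, is the part this sketch leaves out.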

The idea is that each module can be "run" to produce a module object. This resulting module can be saved if doing incremental compiles. An import call literally invokes the compiler to go compile (if needed) the module, and return the resulting module object.
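A minimal sketch of import as a cached compiler call, in C. The Compiler and ModuleObj types are hypothetical, module names are assumed to be shorter than 32 characters, and the cache never invalidates, whereas real incremental compiles would also need to detect changed sources:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical module object produced by "running" a module's source. */
typedef struct {
    char name[32];
    int  build_id;       /* which compile produced this object */
} ModuleObj;

typedef struct {
    ModuleObj cache[16];
    size_t    count;
    int       compiles;  /* how many times the compiler actually ran */
} Compiler;

/* Stand-in for running the module's top-level code. */
static ModuleObj run_module(Compiler *c, const char *name) {
    ModuleObj m;
    strncpy(m.name, name, sizeof m.name - 1);
    m.name[sizeof m.name - 1] = '\0';
    m.build_id = ++c->compiles;
    return m;
}

/* import: compile on the first request, reuse the saved object afterwards. */
const ModuleObj *import_module(Compiler *c, const char *name) {
    for (size_t i = 0; i < c->count; i++) {
        if (strcmp(c->cache[i].name, name) == 0) return &c->cache[i];
    }
    assert(c->count < sizeof c->cache / sizeof c->cache[0]);
    c->cache[c->count] = run_module(c, name);
    return &c->cache[c->count++];
}
```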

This means that the code at the top level in modules is meta-programming. Code to create functions, types, generate modules and more can exist there, and it can even call arbitrary functions. This top level code returns a module object.

As a side note, it is possible to make a badlang program run rather than compile by launching a file with the compiler. Doing this would basically resemble a Python program: it would dynamically load and compile all the imported modules, generate top-level functions and classes at import time, and could start execution from some entry point in the top level of the main file.

This architecture has some impressive benefits. The compiler functionality is exposed to the language, allowing dynamic compiling and loading of code if desired. It allows badlang itself to be used as the build-script language and the meta-language. It allows arbitrary compile-time actions to be performed with no special tools or configuration.

Some things that this makes possible:

  • One could write a function that takes a path to some non-badlang source file and returns a module that provides bindings for it. This could be done with no extra build scripts, tools or configuration. It could even go so far as to invoke the compiler for that language and statically link in the resulting library, though that would require some more features, and configuration to know which compiler to call and how. But this can all be done by a user-provided module, and needs no direct language or tool extensions.
  • Compiling in data files. If desired, file dependencies, say 3D models for a game, could be loaded at compile time (and properly included in incremental compiles).

In addition to many preprocessing and build script type tasks, the meta-programming and compile time execution allows badlang itself to replace many compiler features as well:

  • Enumerations can be implemented with regular code. There is no need for special constructs like Go's iota.
  • Field and method lookups, and some other type details can be implemented (and customized) in badlang rather than in the compiler. This allows even the concept of subclassing or structs to be a library level, not a language level feature.

To get some of these details (like customized types) to work with good performance, an additional feature is actually needed: compile-time execution of deterministic expressions. This means a rather extended version of constant folding. Things like looking up field locations in structs are normally done at compile time, but in badlang they are instead implemented with a function that is the runtime equivalent of the traditional compile-time lookup. This function, however, can still be "run" when compiling (since it only depends on known values and has no side effects), which reduces it to the expected offset and gives the expected performance.
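To illustrate, here is an ordinary C function that computes struct field offsets from field sizes and alignments, assuming the natural-alignment layout rule used by common C ABIs. Because it is deterministic and side-effect free, a compiler with the extended constant folding described above could evaluate it at compile time. FieldDesc and field_offset are hypothetical, and _Alignof requires C11:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run-time description of a struct field. */
typedef struct {
    size_t size;
    size_t align;
} FieldDesc;

/* Computes the offset of field `index`: each field starts at the next
 * multiple of its alignment after the previous field ends. Depends only
 * on its arguments and has no side effects. */
size_t field_offset(const FieldDesc *fields, size_t n, size_t index) {
    assert(index < n);
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        size_t a = fields[i].align;
        off = (off + a - 1) / a * a;     /* round up to the alignment */
        if (i == index) return off;
        off += fields[i].size;
    }
    return off; /* not reached, since index < n */
}

/* A real struct to compare against. */
typedef struct {
    char  a;
    int   b;
    short c;
    int   d;
} Example;
```

Describing Example's fields as {sizeof, _Alignof} pairs for char, int, short and int, field_offset gives the same offsets as offsetof on mainstream platforms.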

Having a type system that is simple to implement, yet this extensible, without incurring performance overhead, will be a key feature for badlang if it ever works!

Garbage Collection

One other benefit of having compile time be runtime is that the runtime garbage collector can be used to remove inaccessible code and data. More specifically, a moving tracing garbage collector can start a traversal at the entry point function (main), and copy/move all needed data into a contiguous region of memory, or even directly into the output binary.

If sufficient care is taken when designing how code and types are represented in memory, this could even strip out reflection data for types that are never reflected on. Programs that still need dynamic code generation in some places will end up including some of the compiler modules, whereas projects that are fully statically compiled will omit them.

Why badlang? - 24 Jan 2012 08:57


So you think language X is bad? Join the club! And since it's bad, we need a different language, right? I think all programming languages are bad, and we need yet another bad programming language: enter badlang!

Badlang is in the design phase. It's designed to be statically dynamic through eagerly evaluating buzzwords (here buzzwords is defined to be a large set of buzzwords, or just the noun "buzzwords", you choose!).

Inspiration

Go lacks generics. How can generics be added without forcing compile or runtime size or time overheads? How about a completely unrelated language that is entirely designed around solving this one issue? It's badlang!

What if all the different flavors of generics, with their different tradeoffs all could be implemented in code, with no extensions, in a language? Such a language would allow worse hacks than C++. Surely it would be a badlang.

How can this be done? Dynamic runtime type creation seems to solve the compile time size/time bloats. Doing this same dynamic creation, but pre-evaluating it at compile time seems to save the runtime slowdown. Lots of other things can also be built with these tools too.

This focus is the core of badlang: make everything dynamic, then allow it to run at compile time to [re]move the overhead.

What does this lead to?

Design goals:

  1. simple -> most features implemented in stdlib
  2. self-metalanguage -> meta-programming is cool, but if your meta-language is not the target language, you can't have meta-[meta-[meta-[meta-[…]]]]-programming. Also, fewer languages if you reuse badlang as the metalang.
  3. usable performant code -> if you want to waste time optimizing, make it possible to go as far as desired making horrible code to make it run fast, but allow such messes to have tolerable APIs.

In short, that's badlang! Now for some bad examples:

printf(someStringConstant,[some args or not])

how about this:

printf(someStringConstant).([some args or not])

Explanation:
Here printf takes an immutable string. If the string is known at compile time, the printf function gets called then, and it returns a callable that knows how many arguments it expects and of what types. The compiler then sees a callable getting called, so it enumerates the types passed to the callable (the types of [some args or not]) and passes them to the callable's function lookup. That lookup returns a matching function, but can also throw an error if the types don't match what is expected, and it can do this at compile time if the types and the format string can all be determined.
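C cannot run printf at compile time, but the checking logic itself can be sketched at run time. This hypothetical version parses a format string into the argument types it expects and accepts a call only if the given types match; it understands only %d, %f, %s and %%:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical argument type tags. */
typedef enum { ARG_INT, ARG_DOUBLE, ARG_STRING } ArgType;

/* Parses fmt into the argument types it expects.
 * Returns the count, or -1 for unsupported or malformed formats. */
int expected_args(const char *fmt, ArgType *out, int max) {
    assert(fmt != NULL);
    int n = 0;
    for (const char *p = fmt; *p; p++) {
        if (*p != '%') continue;
        p++;
        if (*p == '%') continue;              /* literal percent sign */
        if (n == max) return -1;
        switch (*p) {
        case 'd': out[n++] = ARG_INT; break;
        case 'f': out[n++] = ARG_DOUBLE; break;
        case 's': out[n++] = ARG_STRING; break;
        default:  return -1;                  /* also catches a trailing '%' */
        }
    }
    return n;
}

/* The "function lookup" step: accept the call only if the given
 * argument types match the format exactly. */
bool check_call(const char *fmt, const ArgType *given, int count) {
    ArgType want[16];
    int n = expected_args(fmt, want, 16);
    if (n < 0 || n != count) return false;
    for (int i = 0; i < n; i++) {
        if (want[i] != given[i]) return false;
    }
    return true;
}
```

In badlang, the same check would run during compilation whenever the format string and argument types are known, turning a mismatch into a compile error.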


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License