badlang: A bad programming language

Assignment Considered Harmful - 28 May 2013 00:26

In most languages, the "=" (assignment) operator is pretty special. Let's look at an example in C:

int a;
a=1;

This code does roughly the following:

int a;
int one=1;
memcpy(&a,&one,sizeof(int));

Specifically, this assignment writes 1 into the variable a. Thus a on the left-hand side (LHS) is not an expression referring to the value of the variable, but instead some kind of reference to the variable itself, which knows its address: a variable of type integer really is a place for an integer, not an integer itself.

This means if you were to write a function to assign ints, it needs to take a pointer to the int:

void store(int *a, int value) {
    *a=value;
}
 
store(&a, 1);

Thus, to be able to set the value of something, you need to be able to get its address, but the syntax for setting something ("=") never mentions the address. This is rather odd in my opinion. It means that while you can use integer-type variables as integers in expressions, integer-type expressions can't be used as things to assign to.

You can make an equivalent language where assignment is only done via an explicit pointer, such as using the store function above (obviously, replace the *a=value with some actual memory access).

Now, let's look at a less trivial example:

int myArray[3];
myArray[1]=1;

Here, the LHS is an array element. Note that "myArray[1]" really refers to the location in the array, not whatever value is there, as it would on the right-hand side (RHS). To translate this to using a store function, we do:

int myArray[3];
store(&(myArray[1]),1);

Or the much simpler:

int myArray[3];
store(myArray+1,1);
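The array translation above can be checked with a small sketch. The load helper and the fill_and_sum function are names I made up for illustration, and store still uses an ordinary C assignment internally, standing in for the "actual memory access":

```cpp
#include <cassert>

// Hypothetical helpers: the only ways this sketch reads or writes memory.
void store(int *place, int value) { *place = value; }
int load(const int *place) { return *place; }

// Fills a three-element array using store() only, then sums it using load().
int fill_and_sum() {
    int myArray[3];
    store(&myArray[0], 10);   // first element, address taken explicitly
    store(myArray + 1, 1);    // the example from the text
    store(&(myArray[2]), 5);  // the longer spelling from the text
    // Both spellings name the same location.
    assert(&myArray[1] == myArray + 1);
    return load(myArray) + load(myArray + 1) + load(myArray + 2);
}
```

Every write goes through an explicit address, so nothing in fill_and_sum needs "=" with an array element on the left.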

Now, what if you want to add types that, like arrays, can have their elements assigned to? If you are using a store function to store things, then you just need a pointer. If you want to keep using =, you have to return a pointer, dereference it, and then assign to the result (implicitly taking its address again):

MyClass myInstance;
*(myInstance[1])=1;

Note that dereferencing here isn't making an integer; it's making a place for an integer that could also be used as an integer expression. In C, you are stuck here. Fortunately, as for most problems, C++ has a feature for fixing this: references. So this can work:

MyClass myInstance;
myInstance[1]=1;

There is a lot going on there.
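To make some of that visible, here is a minimal sketch of what MyClass might look like. The three-int layout, the slots_ member and the write_then_read function are assumptions for illustration, not a definition from anywhere else in this post:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the MyClass of the text: a fixed-size
// container of three ints.
class MyClass {
public:
    // Returns a reference, i.e. a place, so the result can sit on the
    // left-hand side of "=".
    int &operator[](std::size_t index) { return slots_[index]; }

    // Read-only overload: returns the value, not the place.
    int operator[](std::size_t index) const { return slots_[index]; }

private:
    int slots_[3] = {0, 0, 0};
};

// Writes through the returned reference, then reads the value back.
int write_then_read() {
    MyClass myInstance;
    myInstance[1] = 1;             // operator[] yields int&, "=" stores through it
    int *place = &myInstance[1];   // the reference has an address, like any place
    assert(place == &myInstance[1]);
    *place += 41;                  // writing through the pointer hits the same slot
    const MyClass &view = myInstance;
    return view[1];                // const overload: a plain value comes back
}
```

The non-const operator[] hands back a place and the const one hands back a value, which is exactly the LHS/RHS split that "=" normally hides.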

How does a compiler implement this? Well, LLVM just has a store instruction, and references are just pointers, so it's simple.

Now let's suppose we have a language with no mutability and no pointers, just Static Single Assignment (SSA). Each variable refers only to its value: it will never change, and has no address. This is very simple. Everything that does not access IO is functionally pure. All variables are immutable constants.

Then you just create a set of types which are effectively pointers. They perform IO to their respective memory locations. This can be done as a library, not as a core part of the language (though it will likely need a compiler tie-in for the actual Load and Store operations). There is no reason users (and libraries) can't create other kinds of pointers (safe, unsafe, atomic (load and store atomically), const (no load, no store), immutable (no store), garbage collected, owned etc), or types that use pointers.

This means there is no syntactic need for references; if you want to return something that can be stored to, simply do so. It could have a store method, operator, or be of a type you can pass to a store function that lives elsewhere.
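As a rough sketch of pointer kinds living in a library, here are two made-up types in C++: Ptr allows load and store, ReadOnlyPtr allows only load. All names are hypothetical, the raw C++ pointer inside each class stands in for the compiler tie-in, and the local int used as backing storage is something a real language in this style would hide:

```cpp
#include <cassert>

// Hypothetical library type: a pointer that allows both load and store.
// The raw T* inside stands in for the compiler-provided Load and Store.
template <typename T>
class Ptr {
public:
    explicit Ptr(T *raw) : raw_(raw) {}
    T load() const { return *raw_; }
    void store(T value) const { *raw_ = value; }

private:
    T *raw_;
};

// Hypothetical library type: a pointer with load but no store.
template <typename T>
class ReadOnlyPtr {
public:
    explicit ReadOnlyPtr(const T *raw) : raw_(raw) {}
    T load() const { return *raw_; }

private:
    const T *raw_;
};

// Free-function spellings, so callers can write store(p, v) and load(p).
template <typename P, typename T>
void store(const P &pointer, T value) { pointer.store(value); }

template <typename P>
auto load(const P &pointer) { return pointer.load(); }

// Increments a counter twice through one pointer and reads it through another.
int counter_demo() {
    int slot = 0;                        // backing storage for the pointed-to int
    const Ptr<int> counter(&slot);       // the pointer value itself never changes
    const ReadOnlyPtr<int> view(&slot);  // same place, load-only access
    store(counter, load(counter) + 1);
    store(counter, load(counter) + 1);
    assert(load(view) == load(counter)); // both pointers see the same place
    return load(view);
}
```

Calling store(view, 1) fails to compile because ReadOnlyPtr has no store method, so the restriction comes from the library type rather than from a language keyword.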

Note that this means you don't need an assignment operator, references, or an address-of (&) operator. Instead you get functions or methods to do load and store, which could be implemented as operators if desired. You also get some way to create immutable "variables", SSA-style.

Here is an example in this style that sums values from 1 to 10, using some approximate BadLang syntax. I haven't come up with an operator style syntax for load and store yet, but there could be one:

// Create a pointer to an integer named sum, and store 0 to it
sum := 0;
// Call a function "range" that takes a start index, an end index and a function that takes an integer
// Pass in a closure over the sum variable that adds the passed in value to it.
range(1, 10, {(i int:) store(sum, load(sum) + i);});
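For comparison, here is roughly the same program in C++. This is only an approximation: the range helper is hypothetical and assumed to include both end points, and its loop counter is an ordinary mutable C++ variable, which the BadLang version would not need:

```cpp
#include <cassert>
#include <functional>

// Hypothetical helpers mirroring the load and store functions of the text.
int load(const int *place) { return *place; }
void store(int *place, int value) { *place = value; }

// Hypothetical range(): calls body(i) for every i from first to last,
// inclusive (an assumption, so that range(1, 10, ...) covers 1 to 10).
void range(int first, int last, const std::function<void(int)> &body) {
    for (int i = first; i <= last; ++i) {
        body(i);
    }
}

// Sums 1 to 10 by storing through an immutable pointer from inside a closure.
int sum_one_to_ten() {
    int slot = 0;            // storage for the running total
    int *const sum = &slot;  // immutable pointer value, like "sum := 0" above
    assert(load(sum) == 0);
    range(1, 10, [sum](int i) { store(sum, load(sum) + i); });
    return load(sum);
}
```

The closure captures the pointer by value; since the pointer never changes, copying it is harmless, and all mutation happens in the one place it points to.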

I think with a well-designed syntax, this approach can be an improvement over the approach that most existing languages take. It makes clear the difference between saving an intermediate value (say the output from an expression) for reuse or clarity vs. creating a mutable local variable. This is because instead of creating a mutable named local variable, you create an immutable value that is a pointer to a location for the value, which can be overwritten if that's what you want. (Implementation-wise, it may point to a slot on your stack frame, a register, or whatever the compiler wants, really. The key is it is a "where": a variable is a place.)

This has the added benefit of being easy to compile with LLVM, since it's based on SSA and Load and Store operations. (LLVM IR is a language that works this way.)

There are some complexities: you may end up wanting a programmer-facing feature like LLVM's GetElementPtr, which apparently confuses a lot of people. I think I have good solutions there, but it's a work in progress.

In a language where pointers are not special, there is less of the language to learn, and more flexibility for what to build. In the end, you get a simpler language spec, simpler syntax, simpler compiler and more flexible language by pushing these features into libraries.

There's also massive potential for horrible interoperability between code using different kinds/implementations of pointers, as is the case for all language extensions, be they libraries or otherwise. Thus, there should be a good set in the standard library.

This also leads to the massive issues C++ has: simple looking code can call into tons of library code all over the place, making tooling and debugging horrible, and crippling readability. Unlike with C++ however, it avoids adding tons of stuff to the actual language syntax (no references, no const, no constants, no pointers etc) and compiler.

Explicitly using a "store" function is a bit verbose, so some operator could be used for that if desired, perhaps "pointer<-value" representing a send/store/write to memory. Dereferencing could just use the old unary "*" operator, or maybe something else, like a Go style receive "<-pointer". In a language that properly supports overloading operators, this is straightforward (though again, offers the same horrible set of issues operator overloading in C++ does).
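Here is a sketch of what operator spellings could look like using C++ overloading. C++ has no "<-" token, so "<<" stands in for the store operator; the IntPtr type and operator_demo function are made up for illustration:

```cpp
#include <cassert>

// Hypothetical pointer type with operator spellings for load and store.
class IntPtr {
public:
    explicit IntPtr(int *raw) : raw_(raw) {}

    // Load: the familiar unary "*", returning a value rather than a place.
    int operator*() const { return *raw_; }

    // Store: "pointer << value" stands in for the "pointer<-value" idea.
    void operator<<(int value) const { *raw_ = value; }

private:
    int *raw_;
};

// Stores 5, then stores (current value + 37), and returns the result.
int operator_demo() {
    int slot = 0;              // backing storage for the pointed-to int
    const IntPtr total(&slot);
    total << 5;                // store 5
    total << *total + 37;      // "+" binds tighter than "<<": store(load + 37)
    assert(*total == slot);
    return *total;
}
```

Because unary "*" here returns a plain int, "*total = 1" is rejected by the compiler: loading and storing stay separate operations.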

In short, I feel modification of variables via a traditional "=" operator might be a bad design choice for a language. Loading and Storing through pointers is at least as powerful as assigning and accessing a variable, and worked well for LLVM IR. I think we should consider more broadly applying this to languages targeted at human authors.

I think it's still possible to write elegant code using collections, structures and such this way, but it will look rather different, and force some rather different thinking. LLVM IR takes this approach, as does SML with its ref type.

I will continue to investigate this topic. I'm considering taking this approach with BadLang. Feedback is welcome.


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License