These articles are written by Codalogic empowerees as a way of sharing knowledge with the programming community. They do not necessarily reflect the opinions of Codalogic.

Understanding the C++ ABI Breakage debate

By: Pete, May 2021

If you hang around C++ Twitter you might have noticed there are people expressing the need (or not) to break the C++ ABI in order to improve C++.

Twitter being Twitter, the debate has quickly moved to "Oh yes you do, ####" and "oh no you don't, ####" without a lot of background on what the issue is actually about.

It does seem important though, so I wanted to understand what is being proposed.

ABI stands for "Application Binary Interface". But I was confused about what ABI means in the context of "C++ ABI Breakage".

My prior understanding of ABI is that it's the rules a compiler uses to lay out bits and bytes for things like function calls. For example, if a function call consists of a char argument followed by an int * argument, the bytes the compiler puts on the stack (or in registers) will depend on whether the processor is x86, x64, ARM, Power etc. Similarly, if a struct contains a char followed by an int *, the layout of bytes will depend on the processor. This is the compiler's ABI for a specific processor. Usually, you can ignore this form of ABI unless you write compilers. This is a processor-specific, programming-language-specific ABI. This is not the form of ABI being discussed here.
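To see this kind of ABI on a particular machine, a small sketch along the following lines can print where the compiler places each member. The struct name Example and the helpers padding_after_c() and print_layout() are made up for illustration, and the numbers printed will depend on the compiler and processor used.

```cpp
#include <cstddef>  // offsetof, std::size_t
#include <cstdio>   // std::printf

// Hypothetical struct for illustration: a char followed by an int *.
struct Example
{
    char c;
    int * p;
};

// Number of padding bytes this compiler inserts between c and p.
constexpr std::size_t padding_after_c()
{
    return offsetof( Example, p ) - sizeof( char );
}

// Prints the layout this compiler has chosen for this processor.
void print_layout()
{
    std::printf( "sizeof( int * )      = %zu\n", sizeof( int * ) );
    std::printf( "offset of c          = %zu\n", offsetof( Example, c ) );
    std::printf( "offset of p          = %zu\n", offsetof( Example, p ) );
    std::printf( "padding between c, p = %zu\n", padding_after_c() );
    std::printf( "sizeof( Example )    = %zu\n", sizeof( Example ) );
}
```

Calling print_layout() from main() on a 32-bit and a 64-bit build should give different answers, which is the processor dependence described above.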

Another ABI context is how an OS (such as Linux) expects bits and bytes to be laid out when making OS system calls. For example, if a processor requires 32-bit word alignment, a system call may require a single byte for one parameter, followed by 3 ignored bytes, followed by 8 bytes that contain an address. The OS doesn't care whether these bytes are written by C, C++, Fortran or Assembler. Hence this type of ABI is processor-specific but programming-language-agnostic. This is not the form of ABI being discussed here.

Compare this to an API or "Application Programming Interface". Here, an API might say:

Function foo is called as int foo( char c, int * p );

An API may also specify the format of structs. And structs may be passed to functions. Thus, an API might say:

Function bar is called as int bar( S s ); where S is defined as struct S { char c; int * p; }.

Now we're getting near to what ABI refers to in discussions of "C++ ABI Breakage".

Let's assume that my program consists of a core that contains main() and uses a library called X. Library X contains the functions foo and bar outlined previously. The system might look like this:

Program structure

Let's also assume that the code in library X hasn't changed for years. So to save time I don't bother re-compiling it from source each time I change core. I just link Library X to core.

When I call function foo the stack will end up looking something like the following:

Stack after calling foo()

This is how Library X is expecting the bytes to be laid out when I ask it to do foo. If somehow core manages to call foo as, say, foo( char c, float f, int * p ), it would not lay out the bytes in the way Library X expects them to be laid out for function foo (remember, we're not re-compiling X). Hence Library X would do the wrong thing. This is an example of an API change where the API has only been updated in one module.

Such a change in call signature could slip through in C but not in C++. C++'s function name mangling encodes the parameter types in the symbol name, so the mismatched call would fail to link. But what happens if we call bar with an instance of struct S? The stack would look something like:

Stack after calling bar()

In other words, very similar to the case of calling foo.
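As an aside on the name mangling mentioned above, here is a minimal sketch of why C++ catches the changed foo signature. The mangled names in the comments are what I'd expect from the Itanium C++ ABI scheme used by GCC and Clang; other compilers, such as MSVC, use a different scheme. The function bodies, c_foo and check_overloads() are placeholders invented for the example.

```cpp
#include <cassert>

// What Library X was built with.
// Itanium-mangled symbol: _Z3foocPi
int foo( char, int * ) { return 2; }

// What a mistaken caller might declare.
// Itanium-mangled symbol: _Z3foocfPi
int foo( char, float, int * ) { return 3; }

// With C linkage the symbol is just the bare name (possibly with a
// platform prefix), so it says nothing about the parameter types and a
// mismatched declaration in another file would still link.
extern "C" int c_foo( char, int * ) { return 2; }

// The two C++ overloads are separate functions with separate symbols.
void check_overloads()
{
    int value = 0;
    assert( foo( 'a', &value ) == 2 );
    assert( foo( 'a', 1.0f, &value ) == 3 );
    assert( c_foo( 'a', &value ) == 2 );
}
```

Because the two C++ symbols differ, a core that asks for the three-parameter foo would not find it in a Library X that only contains the two-parameter version, and the linker would complain.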

But this time, if we were to change S to struct S { char c; float f; int * p; } in core (but not re-compile Library X), the stack would end up looking like:

Stack after calling modified bar()

We'd have the same problem we had when calling foo with different parameters. However, in this case C++ would not prevent us from doing it. Not updating the API in both parts of the code would cause problems.
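A rough way to see the mismatch is to put the two versions of S side by side and compare what each side believes about the layout. In the sketch below the namespaces library_x and core, and the helpers layouts_agree() and report(), are hypothetical names used only so both definitions can live in one file. Note the mangled name of bar( S ) records only the type name S (something like _Z3bar1S under the Itanium scheme), not its members, which is why the linker can't help here. Also note that on many 64-bit targets the float may happen to fit into padding that was already there, so the numbers could match. On typical 32-bit targets I'd expect them to differ. Agreement on size and offsets is necessary for compatibility, not a guarantee of it.

```cpp
#include <cstddef>  // offsetof
#include <cstdio>   // std::printf

namespace library_x  // hypothetical: S as Library X was compiled
{
    struct S { char c; int * p; };
}

namespace core  // hypothetical: S as the re-compiled core now sees it
{
    struct S { char c; float f; int * p; };
}

// True only if both sides agree on the size of S and on where p lives.
constexpr bool layouts_agree()
{
    return sizeof( library_x::S ) == sizeof( core::S )
        && offsetof( library_x::S, p ) == offsetof( core::S, p );
}

// Prints what each side believes about S on this platform.
void report()
{
    std::printf( "Library X: sizeof( S ) = %zu, offset of p = %zu\n",
                 sizeof( library_x::S ), offsetof( library_x::S, p ) );
    std::printf( "core:      sizeof( S ) = %zu, offset of p = %zu\n",
                 sizeof( core::S ), offsetof( core::S, p ) );
    std::printf( "layouts agree: %s\n", layouts_agree() ? "yes" : "no" );
}
```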

OK so far. As we know, classes are fundamentally the same as structs but usually with the data members private. Thus we could have:

class C
{
    char c;
    int * p;
public:
    void method();
};

When using an instance of C we never directly manipulate variables c and p. Hence c and p do not form part of the API of C. The API is made up of methods like method().

Yet, if we had a function baz with signature int baz( C c );, when we called baz the stack would end up looking like:

Stack after calling baz()

In other words, just like if we had called it with an instance of struct S.

If we now add a float f to C to get:

class C
{
    char c;
    float f;
    int * p;
public:
    void method();
};

we would not have changed the API (we'd still interact with the class using the public methods), but a call to baz using C would end up with a stack frame like:

Stack after calling modified baz()
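The difference between changing the API and changing the bytes can be sketched by comparing three versions of C. The namespaces v1, v1_more_api and v2, the extra another_method() and the print_sizes() helper are made up for illustration. Whether the size of v2::C actually differs from v1::C depends on the platform, as the new float may land in what used to be padding.

```cpp
#include <cstdio>  // std::printf

namespace v1  // C as originally shipped
{
    class C
    {
        char c;
        int * p;
    public:
        void method() {}
    };
}

namespace v1_more_api  // hypothetical: API grows, private data untouched
{
    class C
    {
        char c;
        int * p;
    public:
        void method() {}
        void another_method() {}
    };
}

namespace v2  // the change made above: same API, extra private float f
{
    class C
    {
        char c;
        float f;
        int * p;
    public:
        void method() {}
    };
}

// Adding a non-virtual method changes the API but not the object's bytes.
static_assert( sizeof( v1_more_api::C ) == sizeof( v1::C ),
               "a non-virtual method adds no bytes to the object" );

// Prints the object sizes this compiler has chosen.
void print_sizes()
{
    std::printf( "sizeof( v1::C )          = %zu\n", sizeof( v1::C ) );
    std::printf( "sizeof( v1_more_api::C ) = %zu\n", sizeof( v1_more_api::C ) );
    std::printf( "sizeof( v2::C )          = %zu\n", sizeof( v2::C ) );
}
```

So the public interface and the byte layout can change independently of each other: v1_more_api changes what you can call without touching the bytes, while v2 changes the bytes without touching what you can call.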

So finally... This is the sort of thing "ABI" and "ABI Breakage" refer to in the context of the "C++ ABI Breakage" debate: the ability (or not) to change the private variables of an STL class from one C++ release to another. Changing such private variables (along with a few other things) would be deemed an ABI Break.

It's very similar to the consequences of an API break. But you don't directly program the items that are changed, so it's not an API issue. Therefore they call it an ABI break, even though there are already two other usages of the term ABI that it could be confused with. And, unlike the other forms of ABI, it's processor-agnostic: it's caused by a source code change. If I ruled the world, I would want to call it something different. Like "Library Binary Interface" or "LBI". "C++ LBI" has a nice ring to it.

But back to the plot...

Why the debate?

There are parts of the STL that could be significantly improved if the private variables of STL's classes could be changed for future versions of C++. For example, the speed of std::regex could be improved and it could be modified to work with UTF-8 encoded text.

The argument against these types of ABI breakages is that some people are not in a position to re-compile their equivalents of Library X. This might be because it hasn't changed for so long that they lost the source code. Or maybe they bought the library in binary form from a third party (such as a financial or maths package) and the third party is no longer around to give them a re-compiled version. If they were to compile their equivalent of core with a new version of C++ that included an ABI break, it would cause their application to break. The C++ community has a long tradition of making sure updates do not break old code and some feel this duty of care should include this type of ABI issue.

I hope to look into the pros and cons of each side of the argument in a future post, but for the time being I'll leave the debate here.

Thanks for reading. Have fun and code well.
