Introduction

Type checking and Pointers, Part 2

Posted by

Jahan Addison on December 12th, 2025

A few weeks after the first type-checking post, I was satisfied with the object table and type checker. So I started engineering the backend - I chose x86-64 as the first platform for code generation.

I've made a lot of headway, but one thing has stuck out to me. Recall in my last post, I talked about a type system "for free" by being prudent in my data structure decisions - in reality, it would have been harder to build this compiler without a type system.

The B language specification contains a lot of undefined behavior simply because, at the time, resources were not available as they are today. Take a look at this code:

main() {
  auto x;
  x = test("hello world");
  print(test(x));
}

test(y) {
  return(y);
}

There are some curious questions without a type system: what should be stored at x ? What is the return type of the test function? What is its size? For a clue, here's the x86-64 assembly of x = "hello world"

.data

._L_str1__:
    .asciz "hello world"

.text
    .global _start

_start:
    push rbp
    mov rbp, rsp
    lea rax, [rip + ._L_str1__]
    mov qword ptr [rbp - 8], rax

The x local variable, which is at the stack offset rbp - 8 is loaded with the address at rip + .L_str1__ - that is, x is now a pointer - and in C terms a "pointer to const char." From the B snippet, that is very unclear, and indeed difficult to implement because x is not typed, so its storage is undefined until initialization.

A solution could be multi-pass analysis, constructing and reallocating by reading the source code over and over, but I had a better idea. Here's a reminder of what my IR looks like:

__main():
 BeginFunc ;
    LOCL x;
    x = ("hello world":string:11);
_L1:
    LEAVE;
 EndFunc ;

Still, even with this much information, you'd need to backtrack (or worse, internally reallocate the %rbp stack) without knowing what x is at construction time.

I under no circumstances want to reallocate the storage on the %rbp stack; I'd consider that catastrophic. Instead, I've enforced pointer to auto declarations at compile-time for ALL pointer rvalue types, including strings, which will always be an offset of %rip in the data section. Further, function parameters must be defined with the pointer syntax if the rvalue is meant to be a pointer - the first snippet does not compile. Here's the valid B version in Credence:

main() {
  auto *x;
  x = test("hello world");
}

test(*y) {
  return(y);
}

This is great because I don't have to analyze the code in another pass to know the storage size of test, x, and the parameter y at compile-time. And, a non-pointer rvalue is not allowed to test , so test(5) for example is a compile-time error.

Moreover, I said that x is basically a const char * - but there's not much else it could be! The other option is all pointers are void* and you sort of "cast" to more specified types for accurate pointer decay, but that sounds terribly error-prone.

My type system works by inference at initialization, so auto *k and *k = 5 is now a "pointer to int", The term "pointer to int" and "pointer to K" in general becomes a little clearer if you look at this B example:

main() {
  auto *x;
  extrn strings;
  x = strings[1];
}

strings [3] "string one", "string two", "string 3";

In x86-64 assembly, it becomes:

.data

._L_str1__:
    .asciz "string 3"

._L_str2__:
    .asciz "string one"

._L_str3__:
    .asciz "string two"

strings:
    .quad ._L_str2__

    .quad ._L_str3__

    .quad ._L_str1__

.text
    .global _start
    .extern print

_start:
    push rbp
    mov rbp, rsp
    mov rax, qword ptr [rip + strings+8]
    mov qword ptr [rbp - 8], rax
    mov rax, 33554433
    mov rdi, 0
    syscall

Look at the strings label, do you see that? The stacked .quad directives? .quad is a directive to store a "quad word" or address storage that fits in an x64 register - a pointer.

That means strings is a pointer to strings const char ** . Take another look at how we assign an index to x at rpb - 8 :

  mov rax, qword ptr [rip + strings+8]
  mov qword ptr [rbp - 8], rax

%rip is the relative address register to access the data section, strings is the label of the address at the start of our strings data. Each "element" is a quad word or 8 bytes, so to get strings[1] we need [rip + strings+8]

For another example, look at a pointer to ints or int[]:

.data

ints:
    .long 1

    .long 2

    .long 3

This is how arrays are defined in the data section, and furthermore, I've demonstrated pointer decay: The address of x[1] for strings or ints will always be at the RIP address + sizeof(k) * index. However, to keep the compiler dynamic as in the original B language, I allow arrays of any non-homogeneous type, and then type them at initialization at compile-time.

Early in my career, I was taught that C is 'portable assembly,' or intrinsic. But it's more than that. C is essentially a structural map of the CPU. The hardware design didn't just support the language - it defined it. You can't edit a pointer to const char for a reason.

Type checking and Pointers, Part 2

Jahan Addison

Merry Christmas 🎄

Type Checking and Pointers