The challenge of source-to-source (III): Transforming blues
We saw in the last post of this series that, while the Abstract Syntax Tree is the true one source of knowledge of your program, we need a symbol table for efficiency. This symbol table is created after the declarations that are, of course, represented in the AST.
Our smart readers, I know you are, will have already noticed that if we have two sources of knowledge in the compiler and we are committed to perform source-to-source transformations we will have to change the AST but also the symbol table. The existence of the symbol table is, obviously, a source of redundance. When changing the program we have to be very careful keeping both structures in sync.
And this is maybe the worst thing in source-to-source.
Imagine a famous painting, let's say Leonardo Da Vinci's Mona Lisa. Imagine that we pick a drill and we bore a hole on the painting. We cannot leave the painting this way with a hole! So we fix it, we fix it in a way that nobody notices that we drilled it, although we know we did: this is a transformation in source-to-source.
Let's consider several cases when performing transformations. Let's start with the easy case: adding things. Adding things is very simple cause it is the natural process of any compiler when analyzing. The compiler starts knowing nothing and increasingly, while traversing the code, enlarges its knowledge: variables, functions, declared types. This is why adding things is easy. Well, not that easy, imagine we want to count how many times (in a non-multithreaded world) we call a function. The simplest thing to do is adding a static integer variable that does the counting.
static void f(void)
{
static int _f_count = 0;
_f_count++;
}
This case is pretty easy since we can put it in the beginning of the function and it will always work. But this transformation is also pretty useless since nobody but the function can use this variable. Ok, second approach.
int _f_count = 0;
static void f(void)
{
_f_count++;
}This works but might suffer of another problem. There might be another function called f in another file with static linkage too. Let's forget, though, this case since it is hard to solve even today (although using the filename where the function was defined can reduce the impact of this problem).
For another example, consider we want to call a function when a function is entered and another one when it is left (there is a way to do this automatically with gcc using -finstrument-functions, but not all compilers on the earth are gcc!). The enter call is as easy as before: just do it when entering. The leaving call is a bit more involved: besides of adding a call just before returning we will have to look for all return statements and add the call just before the return. This can get even tougher if we want to be sure that this leaving function is the last thing called before returning the function.
Adding is most of the times easy. Consider removing. Hah! Removing, removing is difficult. Why? Because nothing in the compiler is meant to be removed. It never happens in the compiler that parsing causes some entity to be removed. And even worse than removing is changing.
For instance, consider the following example.
vector int f(vector int a, vector int b)
{
// Element-wise plus operation
return a + b;
}
Imagine that we want to transform vector int x into int x[4] and implement the element-wise addition using a loop. So, we want this.
void f(int *a, int *b, int *_ret)
{
for (int i = 0; i < 4; i++)
_ret[i] = a[i] + b[i];
}
Doing this triggers a whole bunch of problems. We have to replace every call, every declaration of vector int in the code. This is hard. We will see in next posts how we, and others, tackle these problems.
- rferrer's blog
- Add new comment
- 695 reads
