I'm Tanui | th3madbit

Computer Science | AppSec | Poetry & Philosophy | AND THINGS ...


The Low Level Series - C Lang Translation

Published March 28, 2023

The Low Level Series.

                                 ___       ___          __                     
                                  |  |__| |__     |    /  \ |  |               
                                  |  |  | |___    |___ \__/ |/\|               
                                                                               
                              ___       ___          __   ___  __     ___  __  
                        |    |__  \  / |__  |       /__` |__  |__) | |__  /__` 
                        |___ |___  \/  |___ |___    .__/ |___ |  \ | |___ .__/ 
                                                                                               

Perhaps unintentionally, my Computer Science major has been introducing several ideas that completely floor me.

At the moment, working on Compiler Design is undoubtedly not a bed of roses. Roses are red, violets are blue, assembly code is not human readable.

I decided to make studying for my exams a little more enjoyable, because otherwise it wouldn't be. I'll write brief essays about what I'm learning, blending what I pick up in my Compiler Design class with reverse engineering.

Notwithstanding my complaints that compiler design is brain-fucking, I have come to understand just how much goes on behind the scenes to determine whether programs run or not. These machines experience a lot. Close your laptops and let them rest.

To start off the Series:

Table of Contents:

C Language Translation

In this first article, I'll look at how C source code is translated into an executable. C is a compiled language, so it complements this series perfectly. C has a relatively simple syntax and a small set of keywords and operators, which makes it easy to learn and use.

Translation Process of C

  • Preprocessing: In this stage, the preprocessor processes the source code and performs tasks such as expanding macros, including header files, and removing comments. The preprocessor generates an expanded source code file, which is then passed to the compiler.

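  • Compilation: The compiler takes the preprocessed source code and generates object code, which is machine code in a format that can be linked with other object files to create an executable file. The compiler performs tasks such as syntax and semantic analysis, code optimization, and code generation. Strictly speaking, this stage has two steps: the compiler proper translates the preprocessed source into assembly code, and the assembler (as) turns that assembly into machine code. The output of the compilation stage is an object file.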

  • Linking: The linker takes one or more object files, along with any necessary libraries, and combines them into a single executable file. It performs tasks such as symbol resolution (matching each reference to an external function or variable with its definition) and relocation (adjusting addresses once the final layout of the program is known). The output of the linking stage is an executable file.
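To make the stages concrete, here is a small sketch (the function names are hypothetical, not part of this article's program). The #define line is handled entirely by the preprocessor, the function bodies become machine code in the object file, and the call to printf is left as an unresolved reference for the linker to match against the C library. It has no main, so it is meant to be compiled on its own with gcc -c.

```c
#include <stdio.h>

/* Preprocessing: every use of SQUARE(...) is replaced textually
   before the compiler ever sees the code. */
#define SQUARE(x) ((x) * (x))

/* Compilation: this body becomes machine code in the object file. */
int square_of_sum(int a, int b)
{
    return SQUARE(a + b); /* after gcc -E: ((a + b) * (a + b)) */
}

/* Linking: printf is only declared (in stdio.h); its definition lives
   in the C library, so the linker has to resolve this reference. */
void print_square_of_sum(int a, int b)
{
    printf("(%d + %d)^2 = %d\n", a, b, square_of_sum(a, b));
}
```

The parentheses in the macro matter: expansion is purely textual, so without them SQUARE(a + b) would become a + b * a + b.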

Summary of the translation process:

Source code --> Preprocessor --> Preprocessed source code --> Compiler --> Assembly code --> Assembler --> Object code --> Linker --> Executable file

Attention (Here comes the source of your troubles):

During the translation process, errors or warnings can be generated at any stage: a missing header during preprocessing, syntax or type errors during compilation, an undefined reference during linking. An error stops the translation at that stage; warnings do not, but it's worth resolving them before moving on.
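A sketch showing where a problem would surface at each stage (the function names and the CHAR_BIT check are illustrative assumptions, not from the article's program):

```c
#include <limits.h>

/* Preprocessing stage: #error aborts translation during preprocessing
   if the condition is true. It is false on any ordinary platform,
   so nothing happens here. */
#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* A declaration is enough for the compiler to check calls to add_one. */
int add_one(int x);

/* Compilation stage: a syntax error here (say, a missing semicolon)
   would be reported by the compiler, and no object file would be written. */
int add_two(int x)
{
    return add_one(add_one(x));
}

/* Linking stage: if this definition were deleted, gcc -c would still
   succeed, and the failure ("undefined reference to `add_one'") would
   only appear when the object file is linked. */
int add_one(int x)
{
    return x + 1;
}
```

Adding -Wall to any gcc command turns on many useful warnings; they are reported but do not stop the translation.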

GCC (GNU Compiler Collection)

GCC is a popular open-source compiler suite that supports several programming languages, including C. The gcc command drives every stage of the translation process highlighted above: it runs GCC's own preprocessor and compiler, then calls the assembler and linker (from GNU binutils) for the last steps.

By default, gcc runs the entire translation process and produces an executable file. In this article, however, I'll use specific gcc options to stop the translation after each stage and inspect the result.

gcc is usually already installed if you're using a Linux distro. While you're at it, use free software and contribute to building better versions of it.

With that background I can go ahead and start the translation process.

Here is the source code I’ll be using:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char const *argv[])
{
	char message[] = "C Lang translation with th3madbit x03";
	printf("%s\n", message);
	return 0;
}

1. gcc by default:

As I mentioned earlier, gcc translates the source into an executable by default if no options are specified.

Example:

gcc source.c
./a.out
C Lang translation with th3madbit x03

With no -o option, gcc names the executable a.out.

2. gcc -> pre-processor stage:

For example, to stop the translation process after preprocessing, you can use the following command:

gcc -E source.c -o preprocessed.c

-E - stops the translation after preprocessing

-o - specifies the output file (preprocessed.c; the conventional extension for preprocessed C is .i, but any name works)

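Observe: What was 9 lines of code has grown to 1836 lines after preprocessing, almost all of it the contents of stdio.h and stdlib.h (and the headers they include) pasted in by the #include directives. (Too long to include here.)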
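The preprocessor also decides which code survives at all. Here is a small sketch using conditional compilation and predefined macros (the function names are hypothetical; __GNUC__ is defined by GCC and by compilers that imitate it, such as Clang, while __FILE__ and __LINE__ are standard C). Run it through gcc -E and only one of the two return statements remains, with __FILE__ and __LINE__ replaced by a string literal and a number.

```c
#include <stdio.h>

/* Only one branch survives preprocessing; the other is removed
   from the preprocessed output entirely. */
const char *compiler_family(void)
{
#ifdef __GNUC__
    return "GNU C compatible";
#else
    return "other";
#endif
}

/* __FILE__ and __LINE__ become a string literal and an integer
   constant during preprocessing. */
void where_am_i(void)
{
    printf("%s:%d\n", __FILE__, __LINE__);
}
```

To list every macro gcc predefines, you can run: echo | gcc -dM -E -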

3. gcc -> compilation stage:

To stop the translation process after compilation, before assembling, and get an assembly file instead, use the -S option:

gcc -S source.c -o assembly.s

Here is the output for that: (Is this the Assembly code that is human readable?)

	.file	"source.c"
	.text
	.globl	main
	.type	main, @function
main:
.LFB6:
	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$64, %rsp
	movl	%edi, -52(%rbp)
	movq	%rsi, -64(%rbp)
	movq	%fs:40, %rax
	movq	%rax, -8(%rbp)
	xorl	%eax, %eax
	movabsq	$8367801831430824003, %rax
	movabsq	$7598805589701845362, %rdx
	movq	%rax, -48(%rbp)
	movq	%rdx, -40(%rbp)
	movabsq	$2335244403110604399, %rax
	movabsq	$7593742291306768500, %rdx
	movq	%rax, -32(%rbp)
	movq	%rdx, -24(%rbp)
	movl	$813178996, -16(%rbp)
	movw	$51, -12(%rbp)
	leaq	-48(%rbp), %rax
	movq	%rax, %rdi
	call	puts@PLT
	movl	$0, %eax
	movq	-8(%rbp), %rdx
	subq	%fs:40, %rdx
	je	.L3
	call	__stack_chk_fail@PLT
.L3:
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE6:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0"
	.section	.note.GNU-stack,"",@progbits
	.section	.note.gnu.property,"a"
	.align 8
	.long	1f - 0f
	.long	4f - 1f
	.long	5
0:
	.string	"GNU"
1:
	.align 8
	.long	0xc0000002
	.long	3f - 2f
2:
	.long	0x3
3:
	.align 8
4:

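Observe: From 9 lines of source code to 64 lines of Assembly code. A few things stand out: gcc replaced printf("%s\n", message) with a call to puts, which prints a string followed by a newline; the message array is written onto the stack with movabsq, movl and movw instructions; and the %fs:40 and __stack_chk_fail lines are a stack canary check, which Ubuntu's gcc enables by default.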
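Those large movabsq numbers are simply the characters of message, eight bytes at a time. x86-64 is little-endian, so the lowest byte of each immediate lands at the lowest address. The sketch below (hypothetical helper names, with the constants copied from the assembly above) rebuilds the 38-byte array: 37 characters plus the terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the low `count` bytes of `value` into `out`, lowest byte first,
   which is the order a little-endian store puts them in memory. */
static void unpack_le(uint64_t value, size_t count, char *out)
{
    for (size_t i = 0; i < count; i++)
        out[i] = (char)((value >> (8 * i)) & 0xFF);
}

/* Rebuild message from the immediates in assembly.s:
   four 8-byte movabsq values, one 4-byte movl and one 2-byte movw. */
void rebuild_message(char out[38])
{
    unpack_le(8367801831430824003ULL, 8, out);      /* -48(%rbp) */
    unpack_le(7598805589701845362ULL, 8, out + 8);  /* -40(%rbp) */
    unpack_le(2335244403110604399ULL, 8, out + 16); /* -32(%rbp) */
    unpack_le(7593742291306768500ULL, 8, out + 24); /* -24(%rbp) */
    unpack_le(813178996ULL, 4, out + 32);           /* -16(%rbp) */
    unpack_le(51ULL, 2, out + 36);                  /* -12(%rbp): '3', NUL */
}
```

Decoding the first immediate alone, 8367801831430824003, gives "C Lang t".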

4. gcc -> object file stage:

To stop the translation process just before linking and generate an object file, use the -c option, which compiles and assembles the source but does not link it:

gcc -c source.c -o object.o

An object file is an intermediate file format produced by a compiler during the compilation process. It contains machine code in a form that is not yet executable and is meant to be further processed during the linking process to create an executable file.

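At this stage the program has been compiled and assembled without errors and is ready to be linked. References to functions defined elsewhere, such as puts, are still unresolved; filling them in is the linker's job.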
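One way to look inside an object file is nm from GNU binutils, which lists its symbols. Here is a hypothetical sketch of how C linkage shows up in nm output at gcc's default optimization level (-O0): T marks a global symbol defined in the text section, t a local one, and U a symbol that is used but not defined. For the article's own object.o, you would expect main marked T, and puts and __stack_chk_fail marked U, matching the calls in assembly.s.

```c
#include <stdio.h>

/* Internal linkage: listed with a lowercase 't' (local to this
   object file); other object files cannot call it. */
static int twice(int x)
{
    return 2 * x;
}

/* External linkage: listed with an uppercase 'T' (global, defined
   here), so other object files can call it. */
int twice_plus_one(int x)
{
    return twice(x) + 1;
}

/* puts is used but not defined here, so nm lists it as 'U'
   (undefined) until the linker resolves it against the C library. */
void announce(void)
{
    puts("object files keep unresolved references for the linker");
}
```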

5. Object file -> executable:

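Once you've generated the object file, you can link it with any other object files (and libraries) your program needs. gcc handles this too: given only object files, it skips straight to linking, calls the linker, and links the C standard library by default.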

gcc -o executable object.o
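To see the linker combine more than one object file, consider a hypothetical second file, helper.c (not part of the article's program). Any other file that declares int shout(const char *msg); can call it, and the linker connects that call to this definition.

```c
/* helper.c: hypothetical second translation unit */
#include <stdio.h>

/* External linkage: other object files can call shout once they
   declare it; the linker matches those calls to this definition.
   Returns the number of characters printed, as printf does. */
int shout(const char *msg)
{
    return printf("%s!\n", msg);
}
```

Compile it with gcc -c helper.c -o helper.o, then pass both object files to gcc: gcc -o executable object.o helper.o. If two object files defined the same external function, the link would fail with a "multiple definition" error.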

Running the executable prints the message:

./executable
C Lang translation with th3madbit x03

FIN

To sum it up, I started off with the translation process and the stages involved in it: Preprocessing -> Compilation -> Linking. I have also demonstrated how we can use gcc to generate the file from each stage and inspect what's under the hood.

That’s it for C Lang translation.

That was a lot of fun. See you in the next ones, where I'll be getting into the nuts and bolts of the x86 architecture and Assembly.