Mark Tanui | Personal Website

Welcome to Part two of The Low Level Series by me, th3madbit.

Having gone through the C Lang translation process, I’m confident that this part will be easier to follow along.
Nonetheless, this part will be quite a bit more low level than the previous one.

Part One Recap:

In my previous post I compiled 9 lines of C Lang source code into an executable. I also got into the details of how we could stop compilation at each translation stage and inspect its output.

Given an executable file, would I be able to get back the source code? I will attempt to answer the question in this part of the series.

However, if I jump straight into reverse engineering the executable right now, I’ll skip some very important info that I think we should know. With that in mind, I’ll make this part a theory class on the fundamental concepts: CPU architecture, registers, the heap, the stack and instruction sets.

These concepts will be woven together to lay a decent foundation for reverse engineering.

The eip hijacking:

The EIP register is the instruction pointer: it holds the address of the next instruction the CPU will execute. That makes it arguably the most important register you will deal with in reverse engineering.
Imagine you could change the value of the eip register to point to a memory address it normally wouldn’t. In the following tutorial, I will try to illustrate how that can be done. We can regard this as our genesis of reverse engineering.

Most of the concepts in the tutorial might be new and confusing, especially if it’s your first time. However, that is my intention. I would like to introduce you to how things work, and slowly we’ll get into why they work the way they do.

Let’s get into it:

Here’s the code that I will work with throughout the tutorial:

#include <stdio.h>
#include <stdlib.h>

void hiddenFunction(void){

	char message[] = "I'm hidden. You can't see me!"; 
	printf("%s\n", message);
	exit(0);
}

int main(void){

	printf("Hello ninja, you can't see what's hidden. Or can you?");

	return 0;
}

I compiled it to a 32-bit executable using gcc.

Try the following command to compile the same:

gcc -m32 -o myprogram myprogram.c

If you encounter an error, try installing gcc-multilib with the following command:

sudo apt-get install gcc-multilib

To debug the program as it runs in memory, I will use gdb, the GNU Debugger.

Most Linux machines come with gdb pre-installed, but if for whatever reason yours doesn’t have it, run the following:
(I acknowledge my bias towards Linux machines, and that’s okay. It’s free software, in case you forgot :)

sudo apt-get install gdb

That’s it ninjas. Let’s get into debugging the program:

Debugging the executable

Run the program with gdb: gdb myprogram

Set the disassembly flavor to Intel with set disassembly-flavor intel, set a breakpoint on the main function using b main or break main, and run the program with r.
When a breakpoint is set, the program runs until it reaches the breakpoint, at which point it pauses and waits for further instructions from the debugger.
Just think of it as a pause. Breakpoints are useful because they let us examine the state of the program at a specific point in time.

Disassemble the main function: disas main
At this point, we can see the instructions of the main function in assembly. What we are most interested in is the eip.
See where the ‘=>’ is? It marks the instruction that eip currently points to. That’s our target. Take note of the address.

Step forward one instruction with ni (a plain continue would run the program to the end, since no other breakpoint is set) and inspect the eip again: info registers eip
Note that x/1xw $eip shows something different: the first four bytes stored at the address eip holds, not the address itself.

!!Note: Our hiddenFunction will never be executed since it is never called from the main function. However, it is still loaded into memory as part of the program. Using this knowledge, we can go ahead and disassemble the function, which reveals its location in memory. If we do it right, we will be able to point the eip to it and execute the function. How cool?

Disassemble hiddenFunction: disas hiddenFunction

Take note of the memory address where the function begins.

Set the eip to the start of hiddenFunction: set $eip=<address>
Confirm that eip holds the address: info registers eip

!! Hey ninja, are you still reading?

The last step is to continue the program: c or continue

Tada! We hijacked eip and saw the hiddenFunction.

The second mini-part highlights some of the theory that matters as we continue learning rev.
It is not all the theory we need, but I’ll include some references that will give you a decent understanding if you go through them.

CPU architectures:

A CPU, or Central Processing Unit, is the primary component of a computer responsible for carrying out instructions and performing calculations. It is sometimes called the “brain” of a computer.

The architecture of a CPU refers to the way its components are organized and how they interact with each other to carry out instructions. At a high level, a CPU is made up of three main components: the control unit, the arithmetic and logic unit (ALU), and registers.

The control unit is responsible for fetching instructions from memory, decoding them, and then coordinating the execution of those instructions.

The ALU performs arithmetic and logical operations, such as addition, subtraction, and comparisons, on data that is passed to it by the control unit.

Registers are small, high-speed storage locations inside the CPU that are used to hold data and instructions that the CPU is currently working on. Registers are much faster to access than main memory, which makes them essential for efficient operation.

The CPU communicates with other components of the computer through buses, which are essentially channels that allow data to flow between different parts of the system.

Overall, the architecture of a CPU is designed to maximize speed and efficiency in executing instructions, which is crucial for the smooth operation of a computer system.

In the computer market today, one of the most common CPU architectures is x86 (32-bit and 64-bit). Both Intel and AMD produce these CPUs, which have been used in Linux, Mac and Windows systems. ARM is an equally common family of 32-bit and 64-bit CPU architectures.

You have probably downloaded an application binary and been presented with an array of options to choose from depending on the architecture of your machine. Yes, the binaries are compiled differently to accommodate the differences between CPU architectures.

But what might be the difference between these architectures? I will highlight the two areas from which the main differences emanate. One, the size of the data and addresses the CPU handles (32-bit or 64-bit) and two, the type of instruction set used.

Instruction size of the CPU (32-bit, 64-bit):

Memory Capacity: A 32-bit address can reference 2^32 bytes, so 32-bit systems can address up to 4 GB of RAM, while a 64-bit address can in theory reference 2^64 bytes, which is 16 exabytes (over 16 billion GB) of RAM. This means that 64-bit systems can handle more memory-intensive applications and processes.

Processor Architecture: A 32-bit x86 processor has 32-bit registers and can only run 32-bit code, while a 64-bit x86 processor has 64-bit registers and can run both 32-bit and 64-bit code. This means that a 64-bit processor can operate on larger values in a single instruction, which can make some workloads faster.

Instruction set Used (CISC, RISC):

ARM and x86 processors use different instruction sets.

ARM processors use the ARM instruction set, which is a reduced instruction set computing (RISC) architecture, while x86 processors use the x86 instruction set, which is a complex instruction set computing (CISC) architecture.

RISC processors typically have a smaller set of simple instructions, which makes them more efficient in certain types of computing tasks, while CISC processors have a larger set that includes complex instructions, each of which can do more work (for example, operating directly on a value in memory).

Within x86 itself, Intel and AMD each have their own implementation of the 64-bit instruction set. Although nearly identical, there are some differences between the two in the semantics of a few seldom used machine instructions (or situations), which are mainly used for system programming. Compilers generally produce executables (i.e. machine code) that avoid any differences, at least for ordinary application programs. This is therefore of interest mainly to developers of compilers, operating systems and similar, which must deal with individual and special system instructions.

The Heap:

The heap is a region of a program’s memory used for dynamic allocation. It is separate from the program’s code and from the stack. Blocks on the heap can be allocated and released in any order, unlike the stack, which grows and shrinks in last-in, first-out order.

For example in C lang, the malloc() and free() functions are used for dynamic memory allocation. The technicalities of the heap are quite complex since a lot of bookkeeping and system calls are involved. Nonetheless, in this article I will highlight its pros and cons in the most abstract way possible.

One advantage of using the heap is that it allows a program to dynamically allocate memory at runtime, which can be useful in situations where the amount of memory needed by a program is not known at compile-time. This can help to reduce memory waste and improve the efficiency of a program.

However, there are also some potential disadvantages to using the heap. One issue is that dynamic memory allocation can lead to memory fragmentation, which can make it difficult to allocate large blocks of memory. Another issue is that allocating heap memory is slower than using stack memory, since the allocator has to find a suitable free block, which can impact the performance of a program.

As a result, programmers should ensure they use heap memory correctly and that it is properly allocated and released.

The Stack and Registers.

While reading posts and docs from other ninjas, I came across some very good resources on registers and the stack.
It would be very unloving for me not to share some of those. I have used them and got great value from them.
Here are the resources:

That’s it for now. Might take a while before getting into the next part.
“Hack more, Hate less” - TCM

hack more

I'm Tanui | th3madbit

The Low Level Series - EIP Hijacking
