This is part of my series on EvilVM, a project to create a tiny injectable VM with language support for EvilTM agent features for pen-testers like me. In other words, a programming language and agent framework specifically designed for post-exploitation. The goal is to allow ad-hoc, reliable adaptation of malicious agents to situational needs during a pen-test. And to build it from scratch for fun (of course). The index and overview for this project can be found here.

Last time I created the execution loop for the VM and defined a few fundamental instructions. This time I’ll expand the instruction set with the goal of writing some code that’s actually useful. I’ll have to make a few decisions about how to handle some things – I’m not sure if these decisions will stick or not. It’s clear at this point that I may encounter issues later that will make me question early decisions, so I’m trying to keep an open mind so far.

Picking up where I left off, I have instructions for putting numbers and characters on the stack and exiting the program. It’s inauspicious, but still essential progress. The next order of business is to be able to do something with my numbers. We’ll start with math, but as I said before, I promise not to just leave this article series with an arithmetic language!

My first decision to make has to do with types – do I want to have separate instructions for manipulating each data type, or do I want to make the VM “aware” of different types and handle type checking when doing math? For clarity, here’s an example of what it might look like if I wanted to implement type-aware instructions:

uchar *vm_instr_add(VM *vm, uchar *code) {
	object *i = vm_pop(vm);
	object *j = vm_pop(vm);
	switch (i->type) {
	case TYPE_INT:
		if (j->type == TYPE_INT) j->i32_value += i->i32_value;
		break;
	case TYPE_CHAR:
		if (j->type == TYPE_CHAR) j->i8_value += i->i8_value;
		break;
	case TYPE_UCHAR:
		if (j->type == TYPE_UCHAR) j->u8_value += i->u8_value;
		break;
	case TYPE_FLOAT:
		if (j->type == TYPE_FLOAT) j->float_value += i->float_value;
		break;
	default:
		return NULL;
	}
	vm_push(j, vm);
	return code + 1;
}

The advantage of building this type of logic into my VM is that compiling expressions gets simpler on the high-level language side. I could, in principle, make this really complicated by implementing a full numeric tower into the VM itself where type conversions occur invisibly and all arithmetic operations work regardless of types. The obvious cost is that this code has to run every time the instruction is used. And to do it right, it’ll be much more complicated than what you see above – if we need to add a TYPE_INT to a TYPE_CHAR, we need to access the second value correctly and cast it, while keeping in mind what resulting data type we’ll need (e.g., the TYPE_INT is bigger than the TYPE_CHAR). It gets worse for TYPE_FLOAT instances, since we will be converting from precise integers to imprecise IEEE floats transparently.

The resulting code gets potentially complex, as the code needs to handle all possible pairs of types. As I add more types, it gets hairier. Some optimizations might work – such as taking the easy route when both types are the same (which means only two conditionals before carrying out the operation), and then using long code otherwise.

Another interesting question is whether this impacts future decisions for interacting with C APIs. I’ll likely need to support all of the fundamental data types in C. I also don’t really want to force my higher level languages to either build in type checking at runtime or to be forced to do static type analysis. What I sense at this point is that I’ll want to avoid having to make changes to the VM once I’m working at a higher level, so I’ll make the simplest decision at this point. And that means that, as cool as a full numeric tower in the VM would be, I’ll optimize for simplicity later on. So I’ll start with integer addition, subtraction, multiplication, and division:

uchar *vm_instr_add(VM *vm, uchar *code) {
	object *i = vm_pop(vm);
	object *j = vm_peek(vm);
	j->i32_value += i->i32_value;
	return code + 1;
}

uchar *vm_instr_mul(VM *vm, uchar *code) {
    ...
}

uchar *vm_instr_sub(VM *vm, uchar *code) {
    ...
}

uchar *vm_instr_div(VM *vm, uchar *code) {
    ...
}

And of course, I need to add these to the instruction function pointer table. I’ll repeat this here now, but from now on in the blog posts I won’t repeat this because it’s the same every time:

vm_funtable['+'] = &vm_instr_add;  // add     add two integers 
vm_funtable['*'] = &vm_instr_mul;  // mul     multiply two integers
vm_funtable['-'] = &vm_instr_sub;  // sub     subtract two integers
vm_funtable['/'] = &vm_instr_div;  // sub     subtract two integers

So now I should be able to do a few interesting calculations. For example, here is a program that calculates the result of the following arithmetic expression: ((3 + 4) * 6) / 3 in pseudo-assembly:

pushi 3
pushi 4
add
pushi 6
mul
pushi 3
div
halt

This translates to the “machine code” that follows:

i\x3\x0\x0\x0i\x4\x0\x0\x0+i\x6\x0\x0\x0*i\x3\x0\x0\x0/h

And this is an example of running it in the VM:

MU_TEST(test_programs) {
	VM *vm = vm_create();
	printf("Calculating '((3 + 4) * 6) / 3':\n");
	object *result = vm_exec(vm, 
	  "i\x3\x0\x0\x0i\x4\x0\x0\x0+i\x6\x0\x0\x0*i\x3\x0\x0\x0/h");
	printf("Result: %d\n", result->i32_value);
	vm_destroy(vm);
}

When I run my test suite, I get something like this:

Calculating '((3 + 4) * 6) / 3':
Result: 14

Yay, math works! But computers do a lot more than arithmetic – we want to be able to write algorithms, so we need a way to make decisions. This isn’t too hard; we just need a couple important instructions. Namely, we need “tests” and “branches”. Tests are pretty easy – for example, let’s make the equal instruction that just tests if the two numbers on the top of the stack are equal to zero:

uchar *vm_instr_equali(VM *vm, uchar *code) {
	object *j = vm_pop(vm);
	object *i = vm_pop(vm);
	object *res = vm_object(vm, TYPE_BOOLEAN);
	res->bool_value = i->i32_value == j->i32_value ? EVIL_TRUE : EVIL_FALSE;
	vm_push(res, vm);
	return code + 1;
}

So there are a couple things going on here, but one momentous decision has been made, and it’s kind of subtle. I have to choose between two options – does a “test” instruction consume its argument(s), or should it leave them on the stack underneath the new boolean value? I could do it either way, and this is another one of those things that has far-reaching consequences. For example, if I consume them, then there will be lots of places where the compiler will need to duplicate values before testing them. In the inverse case, code that branches based on tests will need to eliminate the value once the branch is taken.

In either case, it’s extra instructions. I decided to take inspiration from Forth, which is a high-level language that is built around stack manipulations. In Forth, tests consume the arguments, and since I can’t decide which is better, I’ll just go with tradition. Hence, the instruction code you see above.

And then there are “branches”. These are instructions that change control flow. The most basic one I’ll start with is jumpt, which jumps to an immediate, relative address if the top value on the stack is a boolean TRUE value:

uchar *vm_instr_jumpt(VM *vm, uchar *code) {
	int delta = *((int*)(code + 1));
	return vm_pop(vm)->i8_value ? code + delta : code + 5;
}

Since every instruction receives the VM and the current address pointer and returns the next address pointer, branches are really easy to implement. I just add or subtract the indicated relative jump and the current instruction pointer.

I have also added a few extra instructions, the details of which I won’t belabor here in the blog post. Here’s a list of some additional ones that are really handy:

swap      Swap the positions of the top two items on the stack
over      Push a copy of the second item on the stack
dup       Push a duplicate of the top item on the stack
jump      Jump to a relative address unconditionally

This gives me enough instruction set now to calculate something interesting, like factorials. Here’s the pseudo-assembly for a factorial:

      dup          # We need two copies to calculate the factorial
      dup          # duplicate the value for a test
      pushi 0      # compare it to zero
      equal        # conduct the test
      jumpt :ret0  # 0! is trivial, so we're done
loop: pushi 1      # we'll decrement the multiplicand each loop
      sub          # 
      swap         # reorder on stack so we can carry out multiplication
      over         # copy over the multiplicand (since mull will consume)
      pushi 0      # don't multiply by 0!
      equal        # test if it's zero
      jumpt :ret   # we're done when this is true
      over         # now copy up the multiplicand again
      mul          # carry out this loop's multiplication
      swap         # reorganize stack for next loop
      jump :loop   # unconditionally loop
ret0: pop          # remove stack contents
      pop          # ...
      pushi 1      # replace with 1s 
      pushi 1      # an extra one because ret: will pop it
ret:  swap         # bring last multiplicand to the front
      pop          # remove it 
      halt         # halt execution, leaving the factorial on top

To spare you the details, this ends up “assembling” as the literal value I pasted into the code below. This will display the factorials from 0! through 9!. The C code that runs this program on the VM is as shown:

for (int i = 0; i < 10; i++) {
    result = vm_object(vm, TYPE_INT);
    result->i32_value = i;
    vm_push(result, vm);
    vm_exec(vm, "\x64\x64\x69\x0\x0\x0\x0\x3D\x74\x20\x0\x0\x0\x69\x1\x0"
                "\x0\x0\x2D\x73\x4F\x69\x0\x0\x0\x0\x3D\x74\x19\x0\x0\x0"
                "\x4F\x2A\x73\x6A\xEA\xFF\xFF\xFF\x70\x70\x69\x1\x0\x0\x0\x69"
                "\x1\x0\x0\x0\x73\x70\x68"
    result = vm_pop(vm);
    printf("%d! = %d\n", i, result->i32_value);
}

And the resulting execution looks like this output from the test suite:

0! = 1
1! = 1
2! = 2
3! = 6
4! = 24
5! = 120
6! = 720
7! = 5040
8! = 40320
9! = 362880

OK, so now I’ve got a usable computer. There are lots more instructions to define, but I don’t think I’ll spend much more time in these posts going specifically over the instructions. I think any interested reader should have a good sense for what this looks like. I’ve found in testing things that it’s pretty easy to add new instructions over time, as long as I am careful not to add any that are too zany for overly specific purposes.

Next time, I think I’ll build a little assembler so that I don’t have to translate all these things by hand!