Some friends were registered for this CTF, and since I had some days off, I decided to work a bit on one RE exercise.

The binary is called BadVM:

[nathan@Jyn badvm]$ ./badvm-original
### BadVM 0.1 ###

Veuillez entrer le mot de passe:
toto
Ca mouline ...
Plus qu'un instant ... On avait la réponse depuis le début en faite :>
Perdu ...

It is a stripped 64-bit ELF PIE binary. Time to start Binary Ninja. This binary has no anti-debug nor packing techniques, just some calls to sleep. Once these calls are NOPed, we can start reversing the VM.

The VM is initialized in the function I called load_vm (0xde6). Then, the function at 0xd5f is called; let’s call it vm_trampoline.

This function chooses the next instruction to execute, loads its address into rax, and calls it. vm_trampoline is called at the end of each instruction; thus, each instruction adds a new entry to the backtrace.

This means that when returning from the first call to vm_trampoline, we can read the result and return it. This takes us back to load_vm, where the result is checked.

In case of an invalid character in the password, we have an early exit. The input is checked linearly, with no hash or anything; thus instruction counting works well.
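For reference, a counting-based brute force could look roughly like this (a sketch; count_vm_instructions is a hypothetical helper that runs the binary under the instrumentation described below and counts vm_trampoline hits for a candidate password):

import string

def brute_force(count_vm_instructions, length=30):
    # a correct character makes the VM execute more instructions before bailing out
    password = ""
    for _ in range(length):
        best = max(string.ascii_letters + string.digits + string.punctuation,
                   key=lambda c: count_vm_instructions(password + c))
        password += best
    return password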

Since I was on holiday, I decided to experiment a bit with lldb, and instrument this VM using its API.

Reversing the VM

This VM runs on a 0x300-byte buffer. Some points of interest (collected as constants in the sketch after this list):

  • 0x4: register A (rip)
  • 0x5: register B
  • 0xFF: register C (result)
  • 0x2fc: register D
  • 0x2fe: register E (instruction mask?)

  • 0x32: password buffer (30 bytes)
  • 0x2b: data buffer (xor data, 30 bytes)
  • 0x200: data start (binary’s .data is copied in this area)
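For the instrumentation scripts below, these offsets can be kept as plain Python constants (the names are mine; only the offsets come from the reversing):

# VM memory layout: offsets into the 0x300-byte buffer (names are illustrative)
REG_A_RIP    = 0x004   # register A (the VM's rip)
REG_B        = 0x005   # register B
REG_C_RESULT = 0x0ff   # register C (result)
REG_D        = 0x2fc   # register D
REG_E_MASK   = 0x2fe   # register E (instruction mask?)
PASSWORD_BUF = 0x032   # password buffer (30 bytes)
XOR_DATA_BUF = 0x02b   # xor data buffer (30 bytes)
DATA_START   = 0x200   # copy of the binary's .data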

Instructions are encoded as follows:

opcode

To select the instruction, the VM uses a jump table.

jump-table

Here is one of the instructions (a ~GOTO):

instruction

Final note: each instruction/function has the following prototype:

prototype

Instrumenting with LLDB

This VM does not check its own code, so we can freely use software breakpoints. The code is not rewritten at runtime, so offsets stay stable. This allows us to simply use LLDB’s Python API to instrument and analyze the VM’s behavior.

First step, create an lldb instance:

import lldb

def init():
    dbg = lldb.SBDebugger.Create()
    dbg.SetAsync(True)
    console = dbg.GetCommandInterpreter()

    error = lldb.SBError()
    target = dbg.CreateTarget('./badvm', None, None, True, error)
    # check error

    info = lldb.SBLaunchInfo(None)
    process = target.Launch(info, error)
    print("[LLDB] process launched")
    return dbg, target, process

Now, we can register our breakpoints. Since vm_trampoline is called before each instruction, we only need this one:

    target.BreakpointCreateByAddress(p_offset + VM_LOAD_BRKP_OFFSET)
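Here, p_offset is the binary’s load address (the PIE slide). One way to compute it with the LLDB API could be something like this (a sketch, assuming the process is already launched so the section is mapped; the helper name is mine):

def get_load_offset(target):
    # PIE slide = where a section is mapped minus its address in the file
    section = target.GetModuleAtIndex(0).GetSectionAtIndex(0)
    return section.GetLoadAddress(target) - section.GetFileAddress()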

Now, we can run. To interact with the binary, we can use LLDB’s events. By registering a listener, we get notified each time the process stops, for example when a breakpoint is hit.

listener = dbg.GetListener()
event = lldb.SBEvent()

# EVENT_STATE_CHANGED stands for lldb.SBProcess.eBroadcastBitStateChanged;
# address is the file offset of the breakpoint we expect to hit (vm_trampoline)
while True:
    if not listener.WaitForEvent(1, event):
        continue

    if event.GetType() != EVENT_STATE_CHANGED:
        # handle_event(process, program_offset, vm_memory, event)
        continue

    regs = get_gprs(get_frame(process))
    if regs['rip'] - program_offset != address:
        print("break location: 0x{:x} (0x{:x})".format(
              regs['rip'] - program_offset, regs['rip']))

To read memory or registers, we can simply do it like this:

# read one byte of VM memory (err is an lldb.SBError instance)
process.ReadUnsignedFromMemory(vm_memory + 0, 1, err)

process.selected_thread.frame[frame_number].registers
# registers[0] contains general purpose registers
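The get_frame and get_gprs helpers used in the event loop are not shown above; a minimal sketch of how they can be written with the same API:

def get_frame(process):
    # top frame of the currently selected thread
    return process.GetSelectedThread().GetFrameAtIndex(0)

def get_gprs(frame):
    # registers[0] is the "General Purpose Registers" group
    gprs = frame.GetRegisters().GetValueAtIndex(0)
    return {gprs.GetChildAtIndex(i).GetName():
            gprs.GetChildAtIndex(i).GetValueAsUnsigned()
            for i in range(gprs.GetNumChildren())}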

Now we can implement a pretty-printer to get “readable” instructions. Once everything is put together, we can dump the execution trace:

mov [0x00], 0xff
mov [0x01], 0x01
mov tmp, [0x00]  	# tmp=0xff
mov [tmp], [0x01]	# src=0x1
mov [0x00], 0x0b
mov [0x01], 0x1d
mov tmp, [0x00]  	# tmp=0xb
mov [tmp], [0x01]	# src=0x1d
mov [0x01], 0x0b
mov tmp, [0x01]  	# tmp=0xb
mov [0x00], [tmp]	# [tmp]=0x1d
mov r5, [0x00]
sub r5, [0x0a]   	# 0x1d - 0x0 = 0x1d
if r5 == 0:
    mov rip, 0x2d
mov [0x01], 0x0a
[...]

Now, we can reverse the program running in the VM:

def validate(password, xor_data):
    # returns len(xor_data) on success, else the index of the first mismatch
    if len(password) != len(xor_data):
        return -1

    D = 0
    for i in range(len(xor_data)):
        tmp = (D + 0xAC) % 0x2D
        D = tmp
        if xor_data[i] != chr(ord(password[i]) ^ tmp):
            return i

    return len(xor_data)
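Since the keystream only depends on the index, validate can be inverted directly (a small sketch; the xor data itself is dumped from the VM’s memory and not reproduced here):

def recover(xor_data):
    # password[i] = xor_data[i] ^ keystream[i]
    D, out = 0, []
    for c in xor_data:
        D = (D + 0xAC) % 0x2D
        out.append(chr(ord(c) ^ D))
    return ''.join(out)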

And we get the flag:

SCE{1_4m_not_4n_is4_d3s1yn3r}

Conclusion

This VM has no anti-debug, packing, or anything special, but it was a fun binary to reverse. To instrument the VM, lldb is useful, but using DynamoRIO would be a more elegant method.



Working on my 3D game engine is the perfect occasion to reimplement classic algorithms. On today’s menu: self-shadowed steep parallax mapping. First step: get the classic steep parallax mapping working.

parallax final result

Here are two good links to implement this algorithm:

Steep parallax mapping allows us to get a pretty good result (10 samples):

parallax closeup 1 parallax closeup 2

But something is missing. Let’s implement self-shadows.

Self-shadows are only computed for directional lights. The algorithm is very simple (see the sketch after this list):

  • convert the light direction to tangent space
  • compute steep parallax mapping
  • from the resulting coordinate, ray-march towards the light
  • if there is an intersection, reduce the exposure
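A CPU-side sketch of that march (the real implementation lives in the fragment shader; the names and height convention here are assumptions):

def self_shadow(height_at, u, v, light_ts, num_steps=2, strength=0.6):
    # height_at(u, v) samples the heightmap in [0, 1];
    # light_ts is the normalized light direction in tangent space (z up)
    lx, ly, lz = light_ts
    h = height_at(u, v) + 0.01          # start just above the surface
    du, dv = lx / lz / num_steps, ly / lz / num_steps
    dh = (1.0 - h) / num_steps          # climb towards the top of the heightfield
    for _ in range(num_steps):
        u, v, h = u + du, v + dv, h + dh
        if height_at(u, v) > h:         # the heightfield occludes the light
            return 1.0 - strength       # in shadow: reduce the exposure
    return 1.0                          # fully lit

The returned factor simply scales the directional light’s contribution.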

And then, TADAA

Shader code available here

(2 steps are more than enough for this part.)

parallax final result



Virglrenderer provides OpenGL acceleration to a guest running on QEMU.

My current GSoC project is to add support for the Vulkan API.

Vulkan is drastically different from OpenGL, so this addition is not straightforward. My current idea is to add an alternative path for Vulkan: two separate states are kept, one for OpenGL and one for Vulkan, and commands go either to the OpenGL or to the Vulkan front-end.

For now, only compute shaders are supported. The work is divided into two parts: a Vulkan ICD in MESA, and a new front-end for Virgl and vtest.

If you have any feedback, do not hesitate!

This experiment can be tested using this repository. If you have an Intel driver in use, you might be able to use the Dockerfile provided.

Each part is also available independently:



GSoC 2018 started several months ago. Once again, I found a project which caught my attention.

Vulkan-ize VirglRenderer

Virglrenderer is a library designed to provide QEMU guests with OpenGL acceleration. It is composed of several components:

  • a MESA driver, on the guest, which generates Virgl commands
  • a library, on the host, which takes virgl commands and generates OpenGL calls from them.

If you want to read more: 3D Acceleration using VirtIO.

This library was built with OpenGL in mind. Today, Vulkan is properly supported by drivers and is becoming the new standard. It might be time to bring Vulkan to QEMU’s guests!

To do so, we will need to work on two components:

  • A Vulkan ICD. Writing one for MESA sounds like a good idea.
  • A Vulkan back-end for Virglrenderer.

Now, we face the first issue: Vulkan is not designed with abstraction in mind. The days of the old glBegin/glVertex are kinda over.

If we want to avoid any unnecessary abstraction, we cannot easily reduce the number of calls made to the API. Thus, the vast majority of the VK calls will be forwarded to the host. However, there are some areas in which we can bend the rules a bit.


The rest of this post contains the same content as the announcement email (virgl ML).

Project status

  • Several Vulkan objects can be created
  • Memory can be mapped and altered on the client.
  • Changes are written/read to/from the server on flush/invalidation
  • Basic features for command buffers are supported.

As a result, a sample compute shader can be run, and the results can be read back.

I only use vtest for now. The client part lives in mesa/src/virgl.

Current behavior

To compile virglrenderer with Vulkan, the --with-vulkan option is needed. Running the server as-is does not enable Vulkan, and for now, Vulkan cannot be used in parallel with OpenGL (Issue #1). To enable Vulkan, the environment variable VTEST_USE_VULKAN must be set.

Initialization

The client driver is registered as a classic Vulkan ICD. When the loader calls vk_icdNegotiateLoaderICDInterfaceVersion, the driver connects to the server. On failure, the driver reports itself as invalid.

Once connected, the ICD fetches and caches all physical devices. It also fetches information about queues, memory, and so on. Physical devices are then exposed as virtual GPUs. Memory areas are shown as-is, except for the VK_MEMORY_PROPERTY_HOST_COHERENT bit, which is disabled. This forces the application to notify the driver of every modification made to mapped memory.
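Conceptually, the exposed memory property flags are just the host’s flags with the coherent bit cleared. A sketch of the idea (the real code is the C ICD filling VkPhysicalDeviceMemoryProperties; the constant value comes from the Vulkan spec):

VK_MEMORY_PROPERTY_HOST_COHERENT_BIT = 0x00000004

def exposed_memory_flags(host_flags):
    # hiding coherency forces the app to flush/invalidate mapped ranges,
    # which gives the ICD explicit points at which to issue transfers
    return host_flags & ~VK_MEMORY_PROPERTY_HOST_COHERENT_BIT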

The object creation part relies heavily on API-Forwarding. For now, I don’t see how I could avoid that.

Memory transfers

Once basic objects are created, the client will ask to map some memory. For now, nothing clever is done: the ICD provides a buffer. On flush, a transfer command is issued. Virglrenderer will then map the corresponding memory region, write/read, and unmap it. A memory manager could be used on the server in the future to avoid mapping/unmapping regions every time a transfer occurs.

Commands and execution

Command pool creation is forwarded to the server. For now, a command buffer is attached to its pool: to retrieve a command buffer from a handle, I need to know which pool it came from (Issue #2). Command buffer creation is also forwarded to the server.

Command buffer state is managed on the client. Each vkCmd* call modifies an internal state. Once vkEndCommandBuffer is called, the state is sent to the server. The server then calls the corresponding vkCmd* functions to match the retrieved state.
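A language-agnostic sketch of that pattern (the real implementation is C in the ICD; the recorded commands here are just examples):

class CommandBufferState:
    # client-side record of vkCmd* calls, replayed on the server
    def __init__(self):
        self.commands = []

    def cmd_bind_pipeline(self, pipeline_handle):
        self.commands.append(("bind_pipeline", pipeline_handle))

    def cmd_dispatch(self, x, y, z):
        self.commands.append(("dispatch", x, y, z))

    def end(self, send_to_server):
        # vkEndCommandBuffer: ship the whole recorded state at once;
        # the server replays it with the real vkCmd* functions
        send_to_server(self.commands)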

Code generation

Vulkan entry points are generated at compile time, heavily inspired by Intel’s entry-point generation. In addition, since object creation relies on API-Forwarding, I started to work on a code generator for these functions.

The interesting information is outlined in a JSON file, and a Python script then generates the functions used to forward object creation to the vtest pipe. Even though the Vulkan API seems pretty consistent, some specific cases and time constraints forced me to abandon it.

This script is still available in the mesa/src/virgl/tools and virglrenderer/tools folders, but it is lacking features. Also, since I had different needs on both sides of vtest, the scripts diverge a lot. The most recent version is the Virglrenderer one: it is a second iteration, and it might be easier to work with.

In the current state, I use it to generate a skeleton for the vtest functions, and then fix the implementation by hand. In the future, it could save us some time, especially if we use the same protocol for VirtIO commands.

Issues

1: (Virglrenderer) Vulkan cannot be used next to OpenGL.

There is no reason for it except a badly thought-out integration of the Vulkan initialization part into virglrenderer.

2: (Virglrenderer) Command buffers are scattered into several pools

Command buffers are scattered across the pools the client created. To fetch a command buffer’s vk-handle, I first need to fetch the corresponding pool from a logical device, then fetch the command buffer. Since VirtIO and vtest provide a FIFO, maybe we could drop the command pool creation forwarding, use only one pool per instance, and thus simplify command buffer lookups.

3: (MESA) Switching between vtest and VirtIO is not straightforward right now.

An idea could be to add a layer between the vgl_vk* functions and vtest. The vgl_vk* functions would still manage the state of the ICD; the mid-layer would convert handles and payloads to a common protocol for both VirtIO and vtest (both could use vgl handles and some metadata). Then, a backend function would choose between vtest and VirtIO.

The handles could either be forwarded as-is (vtest case) or translated to real virgl handles in the case of a kernel driver, which could do the translation or check them. The metadata, however, should not change.

4: (Virglrenderer/MESA) vtest error handling is bad.

Each command sends back a result payload and, optionally, data. This result payload contains two pieces of information: an error code, and a numerical value used as a handle or a size. On server failure, the error codes should be used.

5: bugs, bugs and bugs.

This project is absolutely NOT usable right now.

Next steps

My first step should be to rebase this project onto the current virglrenderer version and rewrite the history. In the meantime, I will rewrite the initialization part to allow both OpenGL and Vulkan to run. Then, fix the vtest/VirtIO architecture and add the new mid-layer. Once refactored, I should work on the error handling for client-server interactions.

Once in a sane state, other issues will have to be addressed.

How to test it

There is a main repo used to build and test it quickly. It contains a bash script and a Dockerfile (plus a README and a TODO).

The bash script in itself should be enough, but if the compilation fails for some reason, the Dockerfile can be used.

The README provided should be enough to make the sample app run.

Repositories



Currently working on a Vulkan extension for VirglRenderer, I need to grep the API all the time. The official documentation gives me two options:

  • search the Vulkan spec (huge PDF)
  • use my browser’s custom search engine feature and play with Khronos’ registry URLs

The first is painful, and the second is too strict (case-sensitive).

Recently, I also went to an Algolia-hosted meeting. Their search engine API looked good, and in my case, it’s free!

Thus, I took a couple of hours off from my GSoC and crafted this thing: a dirty Vulkan API search engine.

Edit 2024-04-02:

  • This utility had no users for a year. Algolia has scheduled the index deletion.
  • Its index is very outdated (Vulkan 1.0, very few KHR/vendor extensions).
  • The official Vulkan documentation has improved, making this utility obsolete.

For those 3 reasons, I am sunsetting this utility.

screenshot

