Introduction

My accounting software is useless without automatic transaction fetching. Open Banking, enabled by DSP2, is theoretically an option, but the official certification process is a major hurdle. Given that they offer a mobile and web app, I suspect there might be an API I can leverage.

Figuring out how this works might be fun; after all, the last bank auth process I looked into was… peculiar 🙃.

Step 1: Login

As with most French banks, the authentication flow typically involves:

  • Entering a digits-only username.
  • Inputting a digits-only password using a visual keyboard.
  • Potentially validating a second factor via the bank’s app.

This bank's keypad

Before sending any login info, Boursorama requires you to make a few fixed requests to gather some magic cookies. The returned values seem quite stable for a given IP.

$ curl 'https://clients.boursobank.com/connexion/'
[...]
<script>
    document.cookie="__brs_mit=<128 bit hex key>; domain=." + window.location.hostname + "; path=/; ";
</script>
[...]

To continue, this key must be included in every subsequent request as the __brs_mit cookie. Doing the same request again, but this time providing __brs_mit, yields 2 new cookies:

brsxd_secure=<some 142 char long key, not base64>
navSessionId=web<sha-256>

In addition to those, we also need a third value: form[_token]! This token can be found in the returned HTML page. It looks like a JWT (base64 sections separated by dots), but is not. It will have to be sent along with the username/password POST request.
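For illustration, here is a minimal Python sketch of this bootstrap dance (the regexes and the attribute order of the token input are assumptions based on what the page returned, not guaranteed to be stable):

import re
import requests

BASE = 'https://clients.boursobank.com'
session = requests.Session()

# First request: the page returns a <script> setting the __brs_mit cookie.
html = session.get(BASE + '/connexion/').text
brs_mit = re.search(r'__brs_mit=([0-9a-f]+);', html).group(1)
session.cookies.set('__brs_mit', brs_mit, domain='.clients.boursobank.com')

# Second request: with __brs_mit set, the server sends back brsxd_secure and
# navSessionId, and the HTML now contains the form[_token] value.
html = session.get(BASE + '/connexion/').text
token = re.search(r'form\[_token\]"[^>]*value="([^"]+)"', html).group(1)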

Step 2: SVG-based obfuscation

So far, no user information has been required. It was simply a matter of making the right requests and collecting bits from the cookies or the returned HTML.

Now comes the last hurdle: the keypad.

As usual, the user needs to type the password using a visual keyboard. No direct key input is allowed. Perhaps to prevent a keylogger from discovering your highly secure 8-digit password?

What’s interesting is that the request doesn’t contain the password itself, but rather a series of 3-letter groups. So if you type 12345567, you get NXS|GKY|KJE|YOL|JXA|JXA|YFM|YSP. The sequence changes with each page reload, but the digits remain unshuffled, and the same digits are always encoded the same way.

NXS|GKY|KJE|YOL|JXA|JXA|YFM|YSP
 1 | 2 | 3 | 4 | 5 | 5 | 6 | 7
                 ^   ^
                 same!

The idea is that the server sends down 10 SVG files, each associated with a 3-letter sequence. Then, when the user clicks on an SVG button, the JS records the associated letters and appends them to the “password”.

This SVG-group association can be found by loading https://clients.boursobank.com/connexion/clavier-virtuel?_hinclude=1 (don’t forget to include all the cookies we gathered so far).

On the returned HTML page, you’ll find a list of SVGs, each linked to a data-matrix-key attribute:

One of the 10 buttons in HTML

Now that we have 10 SVG images and their corresponding groups, how do we know what digits each image represents?

# SVGs are always the same, and the path length differs for each digit.
B64_SVG_LEN_MAP = {
        419  : 0,
        259  : 1,
        1131 : 2,
        979  : 3,
        763  : 4,
        839  : 5,
        1075 : 6,
        1359 : 7,
        1023 : 8,
        1047 : 9,
}

key = B64_SVG_LEN_MAP[len(my_svg_path)]
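Putting both pieces together, here is a sketch of how one could map each 3-letter group to its digit. The exact markup of the clavier-virtuel response, and whether the path data is base64-encoded, are assumptions; adapt the regex to the real HTML:

import re

keypad_html = session.get(BASE + '/connexion/clavier-virtuel?_hinclude=1').text

# Each button carries a data-matrix-key attribute and its SVG path data
# (markup layout assumed).
buttons = re.findall(r'data-matrix-key="([A-Z]{3})".*?<path d="([^"]+)"',
                     keypad_html, flags=re.DOTALL)

# Assumes the map keys match the length of the extracted path data.
digit_to_group = {B64_SVG_LEN_MAP[len(path)]: group for group, path in buttons}

pin = '12345567'  # the example from above
encoded = '|'.join(digit_to_group[int(d)] for d in pin)
# => 'NXS|GKY|KJE|YOL|JXA|JXA|YFM|YSP'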

Step 3: Logging in!

Equipped with the magic cookies and an SVG-to-digit conversion method, we can delve into the login step!

The login request is a multipart form POST request. It requires a few fields to be accepted.

Some make sense:

  • clientNumber: the username/client ID.
  • password: the 3-letter chain we created by understanding the digit images.
  • matrixRandomChallenge: a long hex key gathered from the HTML alongside the keypad. Probably the key used to generate the 3-letter groups, allowing the server to be stateless.
  • _token: one of the many magic values we got by fetching a particular URL (part 1, ‘form[_token]’)
  • ajx: always ‘1’
  • platformAuthenticatorAvailable: “-1”. I guess something indicating whether my device supports passkeys?

Some are just plain weird:

  • fakePassword: one ‘•’ character for each digit of the password. So ‘••••••••’, since Boursorama enforces an 8-digit password.

And some are probably related to some analytics and can be discarded:

  • passwordAck: a JSON object containing the timestamp of the click on each digit, and the X/Y coordinates of said tap relative to the button.

Initially, I feared that this last field would require a more complex “human” check, such as emulating keypad layout and calculating realistic click delays based on button distances. However, it turns out that a simple {} is a sufficient value for passwordAck.

The final form request looks like this:

form[clientNumber]: <actual client number, plain text, e.g: 12341234>
form[password]: "CEF|UGR|O....E|IKR|KNE" # 3-key sequence we computed before.
form[ajx]: 1
form[platformAuthenticatorAvailable]: "-1"
form[passwordAck]: "{}"
form[fakePassword]: "••••••••"
form[_token]: <kinda JWT token, not quite, fetched from the previous steps>
form[matrixRandomChallenge]: <very long key, 13k characters, looks like B64>

Sending this as a POST request to https://clients.boursobank.com/connexion/saisie-mot-de-passe, along with the previously gathered cookies, should yield 2 final cookies:

brsxds_d6e4a9b6646c62fc48baa6dd6150d1f7 = <actual JWT token>
ckln<sha256> = <HTML quoted 2048-bit RSA?>

The first is a simple JWT token. But what’s interesting is the cookie name: brsxds_d6e4a9b6646c62fc48baa6dd6150d1f7! Did you know d6e4a9b6646c62fc48baa6dd6150d1f7 is the MD5 hash of “prod”? 🙃 Turns out naming the cookie brsxds_prod wasn’t enough; they needed to hash the suffix.

The second cookie is a bit more mysterious. The name seems to be a SHA-256 hash prefixed with ckln; not sure why. The value itself looks to be a twice URL-encoded, base64, 2048-bit key, but I wasn’t able to figure out more. But as always: just add those to the next request, and everything works!
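Putting steps 1 to 3 together, the final request could look like this. This is a sketch reusing the session and values from the snippets above; with requests, passing the fields through files= produces the expected multipart body:

# client_number and matrix_random_challenge are plain values scraped
# from the login page (hypothetical variable names).
files = {
    'form[clientNumber]': (None, client_number),
    'form[password]': (None, encoded),       # the 3-letter groups
    'form[ajx]': (None, '1'),
    'form[platformAuthenticatorAvailable]': (None, '-1'),
    'form[passwordAck]': (None, '{}'),
    'form[fakePassword]': (None, '••••••••'),
    'form[_token]': (None, token),
    'form[matrixRandomChallenge]': (None, matrix_random_challenge),
}
resp = session.post(BASE + '/connexion/saisie-mot-de-passe', files=files)
# On success, the session now holds the brsxds_… and ckln… cookies.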

Step 4: Fetching my data

As any 90’s movie hacker would say: I’m in! Time to grab some transaction and account information to feed my software.

Unfortunately, Boursorama doesn’t appear to offer a JSON API for easy data access; it seems to rely heavily on Server-Side Rendering. To retrieve my account list, I fetched https://clients.boursobank.com/mon-budget/generate and parsed the HTML.

As for recent transactions, there’s at least a CSV exporter available:

import requests

params = {
        'movementSearch[selectedAccounts][]': account_id,
        'movementSearch[fromDate]': from_date.strftime('%d/%m/%Y'),
        'movementSearch[toDate]': to_date.strftime('%d/%m/%Y'),
        'movementSearch[format]': 'CSV',
        'movementSearch[filteredBy]': 'filteredByCategory',
        'movementSearch[catergory]': '',  # sic: misspelled parameter name, as observed
        'movementSearch[operationTypes]': '',
        'movementSearch[myBudgetPage]': 1,
        'movementSearch[submit]': ''
}

url = 'https://clients.boursobank.com/budget/exporter-mouvements'
session = requests.Session()
resp = session.get(url, cookies=cookies, params=params)

Note: if the range is invalid, or returns no results, the response is not a CSV anymore, but an HTML page showing an error message.

Final thoughts

As always with bank logins, I find this very convoluted, and I’m not sure of the added benefit. In fact, most magic cookies are simply fetched from the server once, then sent back as-is in all subsequent requests, and the only challenge (SVG→key) is quite trivial.

I’d be curious to know the rationale behind all that. Initially I thought the magic cookies might be there to prevent some kind of MITM or replay attack, but unlike OVH, which uses time-based request signatures, those keys seem quite stable.

In the future, I’d like to explore the mobile app, see if there is some JSON API I could use, because parsing HTML feels wrong.

I hope you found this post interesting!



The details of this article have been communicated to the bank, but after 6 months of silence, I’m assuming it is not an issue for them, and have decided to release this (see timeline below).

I think they are breaking the PSD2 regulation around strong authentication, but I’m not an expert on that subject.

Introduction

When I was a child, I had a 10€ monthly allowance. I remember keeping a small paper record of all my transactions, and even planning future investments. For example, I knew I had to save for 8 years to afford my driving license.

Fast forward to 2017: student, new flat, and thus the start of the great accounting spreadsheet™.

After 6 years, it became The Humongous Accounting Spreadsheet™.
Turns out, a single spreadsheet is not the ideal tool to track your every expense across multiple countries. So here we are, with My Own Accounting Tool®.

It is fancy enough, auto-categorizes most transactions, and can display pretty graphs.

Problem is, I still have to record transactions manually.

  • When I’m lucky, it’s a curl command gathered from Firefox (DevTools > copy as cURL).
  • For the others, a wonky regex-based Python script to parse statements.

My goal: automatically fetch transactions directly from my bank account. This should reduce input mistakes and accounting errors. My bank should have an API, right?

The official API

The bank seems to have a public API: https://developer.lcl.fr/.
But as far as I understand, one needs to sign an agreement with the authorities or something before getting some kind of certificate to sign requests. Not going down that path tonight!

This bank also offers a website, so unless it’s full SSR, they should have some API I can plug into.

The other API

A quick look at the network requests, and here we are: https://monespace.lcl.fr/api/*!

The most interesting routes seem to be:

  • https://monespace.lcl.fr/api/login
  • https://monespace.lcl.fr/api/login/keypad
  • https://monespace.lcl.fr/api/login/contract
  • https://monespace.lcl.fr/api/user/accounts?type=current&contract_id=XXXXXXXX
  • https://monespace.lcl.fr/api/user/<account-id>/transactions

Those should be enough to fetch my own banking information.

Step 1: Login

To access my own data, I need to log in.
For some unknown reason, banks in France LOVE weird SeCuRe visual keypads.
This bank doesn’t deviate: a 6-digit pin is the only password you need.

This bank's keypad

First surprising element: no 2FA by default? This bank does provide it (a prompt on a trusted device), but it is only required for a few specific operations. I tried logging in from a blank browser, on a phone, with a new IP, and still: only the 6-digit password.

⚠ When traveling abroad, I noticed 2FA was required on the web page once, even when logging in from an already trusted device.
I rented a VPN and tried my script from a few locations in France and Europe, and 2FA was never required. Not sure which heuristic they chose, but since I can log in from an untrusted location and an untrusted device, it seems weak.

The 2 important network requests during the login are:

  • https://monespace.lcl.fr/api/login
  • https://monespace.lcl.fr/api/login/keypad

When you load the page, a first GET request is sent to api/login/keypad.
Upon login, a POST request is sent to api/login.

⚠ I redacted some parts of the request samples, because I don’t know what those values are, nor whether they contain secrets I shall not share.

api/login/keypad GET request

{
    "keypad": "13236373539383433303XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}
  • keypad: A long, apparently random, digit-only sequence (partially redacted).

api/login POST request

{
   "callingUrl" : "/connexion",
   "clientTimestamp" : 1692997262,
   "encryptedIdentifier" : false,
   "identifier" : "XXXXXXXXXXX",
   "keypad" : "030303939303XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
   "sessionId" : "00000000000000000000001"
}
  • clientTimestamp: timestamp of the request.
  • encryptedIdentifier: always false, not sure why. Maybe something for plain HTTP requests?
  • identifier: the customer number.
  • keypad: A long, digit-only sequence (partially redacted). Maybe a challenge response?
  • sessionId: some client-side value derived from the timestamp. It seems to accept any numerical value as long as it respects some format.

Digit mangling

A large random number received, some client-side process with a keypad, and a large random number sent back. Some kind of challenge-response? Not exactly.

The keypad parameter is composed of 2 parts:

  • 13236373539383433303: a sequence determining the order of the keys on the keypad.
  • XXXXXXXXX...: the random seed used to generate that order?

So what does my login request look like with the code 011000?
"keypad": "030303939303XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

The repetition pattern looks familiar.

03 03 03 93 93 03 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 0  0  0  1  1  0 ??

Yes, that’s the pin code, mangled digit by digit, and reversed. The mangling is a bit weird:

  • take the received keypad string
  • reverse it
  • parse the digits, 2 by 2, as hex values
  • take the last 10 pairs
  • each hex value is the ASCII code of a character (0x30 → ‘0’)
  • those characters are your keypad digits, in order

I’ll spare you the JS handling the keypad, but here is the Python code to log in.

import re
import time

answer = get_json("https://monespace.lcl.fr/api/login/keypad")

keypad = answer['keypad']

# Weird mangling/obfuscation of the keypad values: once reversed, each
# hex pair is an ASCII code; the last 10 pairs are the keypad digits.
pairs = re.findall('..', keypad[::-1])
keys = [chr(int(x, base=16)) for x in pairs][-10:]
seed = "".join(chr(int(x, base=16)) for x in pairs[:-10])

password = input("Your 6 digit pin? ")
mangled = "".join(str(keys.index(x)) for x in password)
token = "".join(hex(ord(x))[2:] for x in (seed + mangled))[::-1]

payload = {
    'callingUrl': "/connexion",
    'encryptedIdentifier': False,
    'identifier': "<customer-id>",
    'keypad': token,
    'clientTimestamp': int(time.time()),
    'sessionId': "<some-random-value>"
}

post_json("https://monespace.lcl.fr/api/login", payload)
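For completeness, get_json and post_json above are just thin wrappers around a requests session, something like this sketch:

import requests

session = requests.Session()

def get_json(url):
    resp = session.get(url)
    resp.raise_for_status()
    return resp.json()

def post_json(url, payload):
    resp = session.post(url, json=payload)
    resp.raise_for_status()
    return resp.json()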

Getting the transactions

Now that we are logged in, we want to list transactions.

  • Each transaction is tied to an account.
  • Each account is tied to a contract.
  • Each contract is tied to a user.

So to get my transactions, I need to get the contract, then the account, and only then the transactions.

The initial login request returns a few pieces of info:

{
    "accessToken": "<Bearer token>",
    "refreshToken": "<Refresh token>",
    "expiresAt": "<timestamp>",
    "multiFactorAuth": null,
    "userName": "<name>",
    "birthdate": "<birthdate>",
    "[...]"
    "contracts": [
        {
            "id": "<contract-id>",
            "[...]"
        }
    ],
}

As-is, the accessToken cannot be used to fetch transactions. Instead, it is used to get a second token, one which authenticates requests made to a specific “contract”. I’m not sure how accounts are tied to “contracts”, but in my case, I have 1 contract tied to 1 account.

api/login/contract POST request

{
    "clientTimestamp": timestamp,
    "contractId": base64.b64encode(contract["id"].encode()).decode()[:-2]
}

Why is the contract ID base64 encoded, with the [:-2] apparently stripping the trailing ‘==’ padding? Maybe some code sharing with the user/accounts GET route?

api/login/contract response.

{
    "accessToken": "<another-token>",
    "refreshToken": "<refresh-token>",
    "expiresAt": "<timestamp>"
}

This access token can be used on 2 routes:

  • https://monespace.lcl.fr/api/user/accounts?type=current&contract_id=XXXXXXXX
  • https://monespace.lcl.fr/api/user/<account-id>/transactions

api/user/accounts GET request

This request takes 4 parameters:

  • type: the type of the contract/account to fetch? Here set to current.
  • contract_id: the base64 encoded contract ID.
  • is_eligible_for_identity: false. Not sure what this is about.
  • include_aggregate_account: <boolean>

It returns some information about the fetched account:

{
    "total": "<balance-in-euro>",
    "accounts": [
        {
            "type": "current",
            "iban": "<the iban>",
            "amount": {
                "date": "2023-08-25T22:45:46.892+0200",
                "value": "<balance>",
                "currenty": "EUR"
            },
            "internal_id": "<internal-account-id>",
            "external_id": "<external-account-id>",
            "[...]"
        }
    ]
}

api/user/<account-id>/transactions GET request

This request takes 2 parameters:

  • contract_id: this time, the internal_id received in the previous request.
  • range: <int32>-<int32>. From-To range of transactions to fetch. 0 is the most recent transaction.
{
    "isFailover": "<boolean>",
    "accountTransactions": [
        {
            "label": "CB some shop",
            "booking_date_time": "1970-01-01T00:00:00.000Z",
            "is_accounted": "<boolean>",
            "are_details_available": "<boolean>",
            "amount": {
                "value": -5.32,
                "currency": "EUR"
            },
            "movement_code_type": "<code>",
            "nature": "<I/CARTE/VIREMENT SEPA RECU/PRELVT SEPA RECU XXX>"
        }
    ]
}
  • movement_code_type: not sure, sometimes absent, sometimes an int (like 948).
  • nature: seems to be a free-form field, as SEPA order text can be seen there.
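In Python, fetching the 50 most recent transactions could look like this. A sketch: I’m assuming the contract access token goes into a standard Authorization: Bearer header, and reusing the session from before:

# contract_access_token, internal_id and account_id come from the
# previous responses (hypothetical variable names).
headers = {'Authorization': f'Bearer {contract_access_token}'}
params = {'contract_id': internal_id, 'range': '0-49'}
resp = session.get(
    f'https://monespace.lcl.fr/api/user/{account_id}/transactions',
    headers=headers, params=params)
transactions = resp.json()['accountTransactions']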

Getting old transactions

The api/user/<account-id>/transactions request takes a range. But if this range contains any transaction older than 90 days, the request fails: 2FA is required for such a request.

Digging a bit, I found 2 other API routes:

  • api/user/documents/accounts_statements
  • api/user/documents/documents

Those routes have no limit on the dates.

WAIT, WHAT?

Yes, they do require 2FA to call https://monespace.lcl.fr/api/user/<account-id>/transactions for transactions older than 90 days, but PDF statements since the dawn of time? Sure, NO PROBLEM.

The returned values have this format:

[
    {
        "codsoufamdoc_1": "AST",
        "datprddoccli": "2020-12-02",
        "downloadToken": "<some-token>",
        "liblg_typdoc": "Some human-readable document title",
        "libsoufamdoc_1": "Some human-readable category"
    }
]

To download the PDF, send a GET request with the downloadToken fetched in the previous request:

https://monespace.lcl.fr/api/user/documents/download?downloadToken=<token>
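So, as a sketch, grabbing every statement PDF is as simple as this (assuming the same authentication headers as before are still required):

docs = session.get('https://monespace.lcl.fr/api/user/documents/accounts_statements',
                   headers=headers).json()
for doc in docs:
    pdf = session.get('https://monespace.lcl.fr/api/user/documents/download',
                      params={'downloadToken': doc['downloadToken']},
                      headers=headers)
    # Name each file after its production date.
    with open(f"{doc['datprddoccli']}.pdf", 'wb') as f:
        f.write(pdf.content)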

Final thoughts

No two-factor authentication

A 6-digit pin. Really?
Why isn’t 2FA enforced by default? Even my empty Twitter account is more secure.

Why is the pin code mangled?

Isn’t SSL enough to secure your payload?
This rot13-like obfuscation really seems weak if that’s the worry.

Auth tokens remain valid for 21 days

The web session does auto-exit after ~30 min of inactivity.
But did you know the auth tokens remain valid for 21 days?

Anyway, I do have what I need to interoperate with my accounting application, and I can rest peacefully, knowing my personal information is safe 🙃.

All the information disclosed here is public, and freely accessible with any web browser. I was required to figure this out to build interoperability with my own software.

Disclosure timeline

  • 28-08-2023: found those weaknesses, documented them.
  • 01-09-2023: contacted the bank on Twitter via private message to ask about this.
  • 04-09-2023: contacted the bank by email, since the Twitter message hadn’t been replied to.
  • 05-09-2023: received a Twitter message saying “we received the email, we’ll reply”.
  • 21-02-2024: No news. Same behavior observed. Published this article.


This blog is simple: some .md files, generated into static HTML. No backend or complex CMS. It’s light, loads fast, and readable enough on mobile (except code blocks). Versioning is done in git. But it had one drawback: a “high” cost to publish.

Building is done with Jekyll, then the files are pushed to an FTP server. Since I publish rarely, I had no warm setup: sometimes my Ruby installation was broken, sometimes some dependencies were. Once built, pushing to the FTP was a mix of FTP FuseFS + rsync (my OVH hosting had no ssh/sshfs access). As always with manual intervention, errors could happen!

Anyway, I had some free credits on GCP, so I tried Cloud Build (I had used it in the past to set up CIs), and quickly stopped. The goal was to simplify the whole process, and using GCP was not going in the right direction.

I found out about Firebase and decided to give it a try (spoiler: the blog is hosted on Firebase as of today). It has everything I need: it’s fast, simple, and absolutely cheap for my use case!

Deploying the website was simple:

$ firebase deploy

I combined this with a GitHub workflow that spins up a Ruby Docker container, builds the website, and pushes it to Firebase:

'on':
  push:
    branches:
      - master
jobs:
  build_and_deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Prepare tree
        run: 'mkdir build'
      - name: Build docker image
        run: 'cd docker && docker build -t builder . && cd ..'
      - name: Build website
        run: 'docker run -t --rm --mount type=bind,src=/home/runner/work/blog/blog,dst=/mnt/src --mount type=bind,src=/home/runner/work/blog/blog/build,dst=/mnt/output builder'
      - uses: FirebaseExtended/action-hosting-deploy@v0
        with:
          repoToken: '${{ secrets.GITHUB_TOKEN }}'
          firebaseServiceAccount: '${{ secrets.FIREBASE_SERVICE_ACCOUNT }}'  # secret name assumed
          channelId: live
          projectId: blog-1234

This also brings another advantage: GitHub becomes my CMS. I can write a new article as long as I have GitHub access, which is quite convenient!



While working on parallax mapping, somebody told me about a cool presentation: Sparse Virtual Textures. The idea is quite simple: reimplement pagination in your shaders, allowing you to have infinite textures while keeping GPU memory usage constant.

The goal was set: add SVT support to my renderer!

Step 1 - Hand-made pagination

Pagination overview

To understand how SVT works, it is useful to understand what pagination (memory paging) is.

On most computers, data is stored in RAM. RAM is a linear buffer: its first byte is at address 0, and its last at address N.

For practical reasons, using the real address is not very convenient. Thus some clever folks invented segmentation, which then evolved into pagination.

The idea is simple: use a virtual address that the CPU translates into the real (physical) RAM address. The whole mechanism is well explained by Intel [1].

This translation is possible thanks to pagetables.

Translating every address into a new independent one would be costly, and is not needed. That’s why the whole space is divided into pages. A page is a set of N contiguous bytes; on x86, for example, we often talk about 4kB pages.

What the CPU translates are page addresses. Each page is translated as a contiguous unit, and the internal offset remains the same. This means that for N bytes, we only have to store N/page_size translations.

pagination recap

Here, on the left, you have the virtual memory, divided into 4 blocks (pages). Each block is linearly mapped to an entry in the pagetable.

The mapping can be understood as follows:

  • Take your memory address.
    • address = 9416
  • Split it into a page-aligned value and the rest.
    • 9416 => 8192 + 1224
    • aligned_address = 8192
    • rest = 1224
  • Take the aligned value, and divide it by the page size.
    • 8192 / 4096 = 2
    • index = 2
  • This result is the index in the pagetable.
  • Read the pagetable entry at this index; this is your new aligned address:
    • pagetable[2] = 20480
  • Add the rest back to this address:
    • physical_address = 20480 + 1224 = 21704
  • You have your physical address.
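The same walkthrough, as a tiny Python sketch (only pagetable[2] comes from the example above; the other entries are made up):

PAGE_SIZE = 4096
pagetable = [8192, 0, 20480, 4096]   # one physical address per virtual page

def translate(virtual_address):
    index = virtual_address // PAGE_SIZE   # which page is it in?
    rest = virtual_address % PAGE_SIZE     # offset inside the page
    return pagetable[index] + rest

assert translate(9416) == 20480 + 1224   # = 21704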

Adding the page concept to the shader

To implement this technique, I’ll need to:

  • find which pages to load
  • load them in the “main memory”
  • add this pagetable/translation technique.

This could be done using compute shaders and linear buffers, but why not use textures directly? This way I can just add a special rendering pass to compute visibility, and modify my pre-existing forward rendering pass to support pagetables.

The first step is to build the pagetable lookup system. This is done in GLSL:

  • take the UV coordinates
  • split them into page-aligned address, and the rest
  • compute page index in both X and Y dimensions
  • lookup a texture at the computed index (our pagetable)
  • add the rest back to the looked-up value
Showing UV coordinates
Showing page-aligned UV coordinates

Computing visibility

The other advantage of pagination is the ability to load/unload parts of memory at runtime. Instead of loading the whole file, the kernel only loads the required bits (pages), and only fetches new pages when required.

This is done using a pagefault:

  • User tries to access a non-yet loaded address.
  • The CPU faults, and sends a signal to the kernel (page fault).
  • The kernel determines if this access is allowed, and loads the page.
  • Once loaded, the kernel can resume the user program.

This mechanism requires hardware support: the CPU knows what a pagetable is, and has this interrupt mechanism. In GLSL/OpenGL, we don’t have such a thing. So what do we do when interrupts don’t exist? We poll!

For us, this means running an initial rendering pass, but instead of rendering the final output with lights and materials, we output the page addresses (similar to the illustration image seen above).

This is done by binding a special framebuffer and doing render-to-texture. Once the pass is completed, the output texture can be read back, and we can discover which pages are visible.

For this render pass, all materials are replaced with a simple shader:

#version 420 core

/* material definition */
uniform float textureid;
/* Size of a page in pixels. */
uniform float page_size;
/* Size of the pagetable, in pixels (aka how many entries do we have). */
uniform float pagetable_size;
/* Size in pixels of the final texture to load. */
uniform float texture_size;
/* Aspect ratio difference between this pass, and the final pass. */
uniform float svt_to_final_ratio_w; // svt_size / final_size
uniform float svt_to_final_ratio_h; // svt_size / final_size

in vertex_data {
    vec2 uv;
} fs_in;

out vec4 result;

/* Determines which mipmap level the texture should be visible at.
 * uv: uv coordinates to query.
 * texture_size: size in pixels of the texture to display.
 */
float mipmap_level(vec2 uv, float texture_size)
{
    vec2 dx = dFdx(uv * texture_size) * svt_to_final_ratio_w;
    vec2 dy = dFdy(uv * texture_size) * svt_to_final_ratio_h;

    float d = max(dot(dx, dx), dot(dy, dy));
    return 0.5f * log2(d);
}

void main()
{
    /* how many mipmap level we have for the page-table */
    float max_miplevel = log2(texture_size / page_size);

    /* what mipmap level do we need */
    float mip = floor(mipmap_level(fs_in.uv, texture_size));

    /* clamp on the max we can store using the page-table */
    mip = clamp(mip, 0.f, max_miplevel);

    vec2 requested_pixel = floor(fs_in.uv * texture_size) / exp2(mip);
    vec2 requested_page = floor(requested_pixel / page_size);

    /* Move values back into a range supported by our framebuffer. */
    result.rg = requested_page / 255.f;
    result.b = mip / 255.f;

    /* I use the alpha channel to mark "dirty" pixels.
     * On the CPU side, I first check the alpha value for > 0.5,
     * and if yes, consider this a valid page request.
     * I could also use it to store a "material" ID and support
     * multi-material single-pass SVT. */
    result.a = 1.f;
}

Once the page request list is retrieved, I can load the textures into the “main memory”.

The main memory is a simple 2D texture, and page allocation is for now simplistic: the first page requested gets the first slot, and so on until the memory is full.

“Main memory” texture

Once a page is allocated, I need to update the corresponding pagetable entry to point to the correct physical address. This is done by updating the corresponding pixel in the pagetable:

  • R & G channels store the physical address.
  • B is unused.
  • A marks the entry as valid (loaded) or not.
Pagetable texture

Rendering pass

The final pass is quite similar to a classic pass, except that instead of binding one texture for diffuse, I bind 2 textures: the pagetable and the memory.

  • bind the 3D model
  • bind the GLSL program
  • bind the pagetable and main-memory textures.

At this stage, I can display a texture too big to fit in RAM & VRAM.

Step 2: MipMapping

If you look at the previous video, you’ll notice two issues:

  • Red lines showing up near the screen edges.
  • Page load increase when zooming out.

The first issue arises because texture loading doesn’t block the current pass. This means I might request a page and not have it ready by the time the final pass is run. I could render it as black, but I wanted to make it visible.

The second issue arises because I have a 1:1 mapping between the virtual page size and the texture page size. Zooming out to show the entire plane would require loading the entire texture, which doesn’t fit in my RAM.

The solution to both of these issues is mipmaps.

  • A page at mipmap level 0 covers page_size pixels.
  • A page at mipmap level 1 covers page_size * 2 pixels
  • A page at mipmap level N covers the whole texture.

Now, I can load mipmap level N by default, and if a requested page is not available, I just go up the mip levels until I find a valid page.
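In pseudo-Python, the fallback walk looks like this (names are illustrative; the real version is a small loop over pagetable mip levels in the shader):

def lookup(resident, page_x, page_y, max_mip):
    """resident maps (mip, page_x, page_y) to a physical page."""
    for mip in range(max_mip):
        # Page coordinates halve at each coarser mip level.
        entry = resident.get((mip, page_x >> mip, page_y >> mip))
        if entry is not None:
            return entry
    # Level N is always loaded: a single page covers the whole texture.
    return resident[(max_mip, 0, 0)]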

Adding mipmaps also allows me to implement a better memory eviction mechanism:
I can now replace 4 pages with one page one level above.
So if I’m low on memory, I can just downgrade some areas and save 75% of that memory.

Finally, MipMapping reduces the bandwidth requirements: if the object is far away, why load the texture at high resolution? A low-resolution page is enough:

  • less disk load.
  • less memory usage.
  • less latency (since there are fewer pages to load).
Showing physical addresses with MipMapping

Step 3: Complex materials

The initial renderer had PBR materials. Such a material has not only an albedo map, but also normal and roughness+metallic maps. To add the new textures, there are several options:

  • New memory textures, new pagetable texture, new pass.
    • simple
    • requires an additional pass. This is not OK.

  • Same memory texture, same pagetable texture.
    • Each page in fact contains the N textures sequentially. So when one page is loaded, N textures are queried and loaded.
    • Easy to implement, but I have to load N textures.

  • Same memory texture, multiple pagetable textures.
    • pagetables are small, 16x16 or 32x32. The overhead is not huge.
    • I can unload some channels for distant objects (normal maps, for example).
    • Drawback: I now have N*2 texture samplings in the shader: one for each texture and its associated pagetable.

Because I like the flexibility of this last option, I chose to implement it. In the final version, each object has 4 textures:

  • memory (1 mip level)
  • albedo pagetable (N mip levels)
  • roughness/metallic pagetable (N mip levels)
  • normal pagetable (N mip levels)

In the following demo, page loading is done in the main thread, but limited to 1 page per frame, making the loading process very visible.

  • Bottom-left graph shows the main memory.
  • Other graphs show the pagetables and their corresponding mip-levels.

Page requests: subsampling, randomness, and frame budget.

For each frame, I need to do this initial pass to check texture visibility. Reading the framebuffer back on the CPU after each frame is quite slow, and for a 4K output, prohibitively expensive.

The good news is: I don’t need a 4K framebuffer in that case! Pages cover N pixels, so we can just reduce the framebuffer size and hope our pages will still be requested!

The demo above uses a 32x32 framebuffer, which is very small. If done naïvely, this wouldn’t work: some pages would be caught between 2 rendered pixels, and never loaded.

8x8 framebuffer, no jitter.

A way to solve that is to add some jitter to the initial pass: the page request viewpoint is not exactly the camera’s position, but the camera’s position plus some random noise.

This way, we can increase coverage without increasing the framebuffer size.

8x8 framebuffer, jitter.
  1. See Intel Architectures Developer’s Manual: Vol. 3A, Chapter 3 



I never experimented with machine learning or denoising. I guess having obscure matrices combined together to produce some result scared me a bit. Surprising for someone who loves computer graphics… 🙃
After failing an interview for an ML-related position (surprising?), I thought: enough is enough, time to play catch-up!

For this project, I started with the basics: Andrew Ng’s ML course. After a couple of days — and obviously having become the greatest ML expert in the world — I decided to tackle the easiest problem ever: image denoising!

The goal

Denoising is a complex field, and some very bright people are making a career out of it. Not my goal!

Here I’ll explore some classic denoising techniques, implement them, and once acquainted with some of the problems, build a custom model to improve the result.

The input:

challenge image

I believe this should be a good candidate:

  • has a flat shape to check edge preservation.
  • has some “noise” to keep (foliage).
  • has some small structured details (steel beams).
  • has smooth gradients (sky).

Step 1 - sanity check

pixel line

From Wikipedia:

noise is a general term for unwanted […] modifications that a signal may suffer

The graph above represents a line of pixels that is part of a smooth shade. In red are 2 bad pixels. They are bad because they interrupt the smoothness of the curve, and are thus perceived as noise.

How can we remove such outliers? Averaging! Each pixel value is averaged with its neighbors. In this case, this would help reduce the perceptible noise.

  foreach x, y in image
    neighbors = extract_window_around(image, x, y, window_size=10)
    res = average(neighbors)
    image.set(x, y, res)
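In real code, this naive version could look like the following numpy sketch (not optimized, and not the exact code used here):

import numpy as np

def box_average(image, window_size=10):
    """Average each pixel with its neighbors (naive box filter)."""
    r = window_size // 2
    h, w = image.shape[:2]
    out = np.empty_like(image)
    for y in range(h):
        for x in range(w):
            # Clamp the window at the image borders.
            window = image[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            out[y, x] = window.mean(axis=(0, 1))
    return out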

smooth, before & after

But on a real image, that’s terrible:

real, before & after

The reason for this poor performance is that we don’t discriminate valid details from noise. We lose our edges, and all details are lost.

Step 3 - Better average - YUV vs RGB

The previous image was generated by averaging RGB values over a 10-pixel sliding window. Because it averaged RGB values, it mixed colors. As a result, edges were blurred in a very perceptible way, leading to an unpleasant result.

YUV is another color representation, splitting the signal not into red, green, and blue channels, but into color (U, V) and luminosity (Y): color is encoded on two chroma axes, and luminosity is a single linear value.

If we look at the sky, the noise doesn’t seem to alter the color much, only the brightness of the blue. So averaging with the same window, but only on the luminance component, should give better results:

yuv, smooth yuv, real

Step 4 - selective average

Using YUV instead of RGB helped: the sky looks fine, and the green edges look sharper. Sadly, the rest of the image still looks bad. The reason is that I still use the same window size for the sky and the tower.

I can improve that solution using a new input: an edge intensity map. Using the well-known Sobel operator, I can generate a map of the areas to avoid (see the sketch after the pseudocode below).

  edge_map = sobel(image)
  foreach x, y in image
    window_size = lerp(10, 1, edge_map.at(x, y))
    neighbors = extract_window_around(image, x, y, window_size)
    res = average(neighbors)
    image.set(x, y, res)
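The edge map itself can be computed with any Sobel implementation; for example, with scipy (a sketch, assuming a grayscale float image):

import numpy as np
from scipy import ndimage

def edge_intensity_map(gray):
    """Normalized [0, 1] Sobel gradient magnitude."""
    gx = ndimage.sobel(gray, axis=1)   # horizontal gradient
    gy = ndimage.sobel(gray, axis=0)   # vertical gradient
    magnitude = np.hypot(gx, gy)
    return magnitude / magnitude.max()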

edge, real

  • ✅ The square edges are preserved.
  • ✅ The sky blur is gone
  • ✅ The Eiffel Tower’s edges seem preserved.
  • ❌ Artifacts visible in the sky (top-right)
  • ❌ The foliage texture is lost.
  • ❌ The metallic structure lost precision.
  • ❌ The grass mowing pattern is completely lost.

Step 5 - ML-based noise detection

In the previous step, I tried to discriminate between areas to blur and areas to keep as-is. The issue is my discrimination criterion: edges. I was focusing on keeping edges, but lost good noise like the foliage.

So now I wonder, can I split good noise from bad noise using a classification model?

  foreach x, y in image
    window = extract_window_around(image, x, y, window_size)
    bad_noise_probability = run_model(window)
    blur_window_size = lerp(1, 10, bad_noise_probability)
    res = average_pixels(image, x, y, blur_window_size)
    image.set(x, y, res)

For this model, I tried to go with a naïve approach:

  • select a set of clean images
  • generate their noisy counterparts in an image editor
  • split these images into 16x16 pixel chunks.

model training set extraction

Those would represent my training & test sets (6000 items and 600 items). The goal is now, from a 16x16 pixel window, to determine whether the pixel belongs to noise or to some detail.

Then, I would iterate over my pixels, extract the 16x16 window around each one, run the model on it, and use the resulting probability to select my blur window. My guess is that we should now be able to differentiate foliage from sky noise.
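For reference, a chunk classifier along these lines could be as small as the following Keras sketch (the article doesn’t state which framework or architecture was actually used; train_chunks/train_labels stand for the 6000-item set described above, as numpy arrays):

import tensorflow as tf

# Input: 16x16 RGB chunks; output: probability of "bad" noise.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16, 16, 3)),
    tf.keras.layers.Conv2D(16, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(train_chunks, train_labels,
          validation_data=(test_chunks, test_labels), epochs=10)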

Here is the model output: in red, the parts to clean; in black, the parts to keep.

model output

And here is the output:

final result

  • ✅ Edges are preserved.
  • ✅ Steel structure is clear in the middle.
  • ✅ Left foliage looks textured.
  • ❌ Right foliage shadows are still noisy.
  • ❌ Some areas of the steel structure are blurred.
  • ❌ Sky has artifacts.

The model training set is composed of only ~6000 chunks extracted from 4 images (2 clean, 2 noisy). Training the same model on a better dataset might be a first way to improve the noise classification.

This result seems better than the bilateral filtering, so I guess that’s enough for a first step into the ML world. I will stop there for now, and move on to the next project!

