Attempting a systematic approach

In our previous blog post, we talked about how we envisioned a solid three-layer core as the foundation for Refli. In this blog post, we explain how we are trying to make the second and third layers more concrete. We use a simple running example before applying our approach to a computation we have recently implemented: the social security contributions of freelancers. We also explain how we hope this approach can be used systematically for other computations.

Time flies and it's been more than a year since our last blog post. In the meantime, we have (partially) updated the gross-to-net calculation, but otherwise our resources have been consumed by other commitments.

More recently, we've been working on making the layered core of Refli more concrete. The conceptual layers we introduced earlier were computations, data, and legislative texts, where each layer successively rests on top of the next one. Those layers can be seen in the following diagram, on the left half:

On the right half, we introduce similar layers, but with different names. They map the conceptual layers from the left to their concrete embodiments within Refli, displayed in yellow:

Starting from the bottom layer, we see that the legislative texts exist within Refli as a mini site called Lex Iterata. We try to keep Refli URLs well organized; the corresponding URLs here start with /en/lex. Lex Iterata, as presented on Refli, is essentially just a new rendition of texts available elsewhere (in particular on the website of the Official Belgian Journal). But for us, working with it, it also exists as files, a database, and code to manipulate it, which gives us extra possibilities to process it.
Moving to the middle layer, we have a small knowledge base, called "Concepts". Its URL is predictably /en/concepts. The knowledge base records small bits of information about the concepts a legislative text describes (such as documents, computations, and variables), and the relationships between them.
Finally, the top layer contains the implementation of computations described in the legislative texts. We group those computations and make them available in a mini site called "Playground", at the URL /en/playground.

While the purpose of Lex Iterata is easy to understand, we will expand on the two other layers in the next sections. To have something concrete to talk about, we'll use a contrived example, before moving to (one of the cases of) the computation of the social security contributions of freelancers.

Legislative texts

We start with legislative texts. For our simple example, we use this made-up text:

00 MOIS 0000. — Article sur le calcul du revenu indexé

Publication: 00-00-0000
Numéro: 0000000000
Refli Concepts ✓: View in the knowledge base

Article 1er. Ce texte est un exemple utilisé pour illustrer la base de connaissances de Refli. Il ne s'agit pas d'un vrai texte législatif.

Art. 2. § 1er. Le revenu indexé est déterminé en multipliant le revenu de référence par la fraction d'indexation applicable à l'année en cours.

§ 2. La fraction d'indexation est fixée annuellement par le Roi. Le numérateur correspond à la moyenne des indices des prix à la consommation de l'année en cours, tandis que le dénominateur représente la moyenne des indices de l'année de référence. Pour l'année 2025, cette fraction est fixée à 662,28/606,30.

§ 3. Le revenu de référence utilisé pour ce calcul est celui de la deuxième année civile qui précède l'année pour laquelle la cotisation est due.

Working with such a text might remind you of some homework you did when you were a kid: you need to carefully read the text, identifying relevant information and dismissing noise.

For instance, the sentence of Art. 1. can be discarded as it doesn't specify the computation. On the other hand, the first sentence of Art. 2. introduces two variables and states that they should be multiplied together: this is the meat of our computation.
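
To make this concrete, here is a minimal Python sketch of the computation described by the made-up text. It is only an illustration, not Refli's actual code: the function and constant names are our own, and rounding to the cent is an assumption, since the example text doesn't say how the result should be rounded.

```python
import math
from decimal import Decimal
from fractions import Fraction

# Art. 2, § 2 of the made-up text: the fraction fixed for 2025.
# (Hypothetical table; other years would be added as texts fix them.)
INDEXATION_FRACTIONS = {2025: Fraction("662.28") / Fraction("606.30")}


def reference_year(contribution_year: int) -> int:
    """Art. 2, § 3: second calendar year preceding the contribution year."""
    return contribution_year - 2


def indexed_income(reference_income: Decimal, contribution_year: int) -> Decimal:
    """Art. 2, § 1: reference income multiplied by the indexation fraction."""
    fraction = INDEXATION_FRACTIONS[contribution_year]
    exact = Fraction(reference_income) * fraction  # exact rational arithmetic
    # Assumption: round half up to the cent (valid for non-negative incomes).
    cents = math.floor(exact * 100 + Fraction(1, 2))
    return Decimal(cents) / 100
```

With a reference income equal to the denominator, `indexed_income(Decimal("606.30"), 2025)` gives back the numerator, 662.28, which is a handy sanity check.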

Moving on to a real-world scenario, the computation of the social security contributions is described in the text 1967072702, "27 JUILLET 1967. - Arrêté royal n° 38 organisant le statut social des travailleurs indépendants". Other texts appearing each year update some values used in the computation, while the overall description of the computation remains the same (here is a text fixing new values for 2025: 2024205904).

Concepts and their relationships

Moving from the legislative texts layer to the knowledge base layer, we simply need to name and record the concepts we noted earlier, together with the relationships between them.

Let's assume that we record in our knowledge base that a document titled "Déclaration de revenu indexé" shows one of the variables of the above example computation, say, "Revenu indexé". (This can be done for instance by using a relational database, with a table called "presents" containing two columns, "document" and "variable".) We can use a graphical representation of such a piece of information like this:

Here is another example, where we show that our above example legislative text, "Article sur le calcul du revenu indexé", specifies the value of a variable called "Fraction d'indexation":

We can also add other concepts and relations to our little database: computations, a "describes" relation (to record that a computation is described by a specific text), and input and output variables. The complete computation (with the document, which is not described in the above text) looks like this:

Note that we don't explicitly record things such as constants or intermediate variables in the database. Instead, they are easily inferred from the recorded knowledge: an output variable that is not presented by any document is only used to structure a computation, and thus is an intermediate variable; an input variable that is set by a text is a constant.
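
As an illustration, here is a small sketch using Python's sqlite3 module. Only the "presents" table and its two columns come from the description above; the other table names, their columns, and the computation name "Calcul du revenu indexé" are hypothetical, and the real knowledge base may well be organized differently.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE presents   (document TEXT, variable TEXT);
    CREATE TABLE sets_value (legal_text TEXT, variable TEXT);  -- hypothetical
    CREATE TABLE inputs     (computation TEXT, variable TEXT); -- hypothetical
    CREATE TABLE outputs    (computation TEXT, variable TEXT); -- hypothetical
""")

# Facts from the running example.
db.execute("INSERT INTO presents VALUES (?, ?)",
           ("Déclaration de revenu indexé", "Revenu indexé"))
db.execute("INSERT INTO sets_value VALUES (?, ?)",
           ("Article sur le calcul du revenu indexé", "Fraction d'indexation"))
db.executemany("INSERT INTO inputs VALUES (?, ?)", [
    ("Calcul du revenu indexé", "Revenu de référence"),
    ("Calcul du revenu indexé", "Fraction d'indexation"),
])
db.execute("INSERT INTO outputs VALUES (?, ?)",
           ("Calcul du revenu indexé", "Revenu indexé"))

# A constant is an input variable whose value is set by a text.
constants = [row[0] for row in db.execute("""
    SELECT DISTINCT i.variable
    FROM inputs AS i JOIN sets_value AS s ON s.variable = i.variable
    ORDER BY i.variable
""")]

# An intermediate variable is an output that no document presents.
intermediates = [row[0] for row in db.execute("""
    SELECT DISTINCT variable FROM outputs
    WHERE variable NOT IN (SELECT variable FROM presents)
    ORDER BY variable
""")]
```

In this tiny example, "Fraction d'indexation" is inferred to be a constant, and there is no intermediate variable, because the only output is presented by a document.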

Returning to our real-world case, we have split the computation described in that text into multiple cases. You can visit a page with a graphical representation of the extracted concepts of one such case.

Short description

In addition to recording simple facts about a computation in our knowledge base, we also write down a very short summary of the computation. In the case of our example text, this can be something like this:

Such a description is added below the graphical representation on the computation page.

With the social security contributions of freelancers, we show how we put these ideas into practice:

Starting with the relevant legislative texts, we manually annotate them (providing the "data" part of the three layers), then draw formal connections between those texts, their annotations, the calculations they describe, and the variables the calculations require. From there, we can generate documentation and forms. We then need to manually implement the described computations.

Systematicity

The key insight of the presented approach, or at least that's our hope, is that we can use it to systematically implement the legislative texts relevant to Refli. In effect, implementing new legislative texts would require following these steps:

Identify a legislative text that we want to implement, and add it to the knowledge base.
Identify related texts, and add them to the knowledge base.
Identify, within the above texts, computations (and possibly different cases of those computations), variables, and documents the texts describe, and add them to the knowledge base.
Given an updated knowledge base, we regenerate the "Concepts" mini site. (Diagrams and most of the specific web pages are derived automatically from the data in the knowledge base.)
For each (sub)computation, we also write some notes. (They appear under the "Description" sections, near the top of the generated pages.)
Write code to describe the forms associated with the computations. Basic forms can be generated automatically from the knowledge base, by deriving which variables are computation inputs but are not set to specific values by the legislative texts.
Write code to present the computation results. Again, simple presentations of the results can be generated automatically.
Finally, implement the computations themselves. This is the most challenging task, but the documentation produced in the previous steps makes it somewhat easier.
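
The form-derivation step can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the two sets of pairs stand in for the knowledge base, and the computation name "Calcul du revenu indexé" is hypothetical.

```python
# Hypothetical stand-ins for two relations of the knowledge base.
INPUTS = {  # (computation, input variable)
    ("Calcul du revenu indexé", "Revenu de référence"),
    ("Calcul du revenu indexé", "Fraction d'indexation"),
}
SET_BY_TEXT = {  # (legislative text, variable whose value it fixes)
    ("Article sur le calcul du revenu indexé", "Fraction d'indexation"),
}


def form_fields(computation: str) -> list[str]:
    """Inputs of the computation that no legislative text sets to a value."""
    fixed = {variable for _, variable in SET_BY_TEXT}
    return sorted(
        variable
        for name, variable in INPUTS
        if name == computation and variable not in fixed
    )
```

For the running example, the only field the user has to fill in is "Revenu de référence", since the indexation fraction is already fixed by the text.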

2 Negative perspective

Possible introduction: one might think that the most obvious starting point for creating payroll management software would be the set of legislative texts published by the FPS Justice on its website.

One might also think that, since the Belgian state has argued and decided that the web distribution of texts alone allows it to fulfill its obligation to distribute legal texts to the Belgian population, these would indeed be in an appropriate digital format. However, whether in HTML or PDF format, their use does not meet the legitimate expectations one might have.

Despite these difficulties, this is nevertheless the approach chosen by Refli: starting from the legislative texts, and progressively, if possible with an approach that can be repeated over and over again, arriving at an implementation that covers the various existing cases.

3

In the previous blog post, we talked about three layers. We noted that the Data layer is not very precise. In this blog post, we specify this layer by discussing a way to link information to legal texts to better describe and facilitate the implementation of the Computation layer.

Indeed, a major challenge regarding the implementation of software like Refli is the large quantity of texts, and cases described therein, but also their evolution over time. It is therefore important to structure an approach that we hope can be easily reused.

The three layers (Texts, Data, Computations) exist under Refli behind three prefixes: `/en/lex`, `/en/concepts`, and `/en/playground`.

4

A year ago, the (very vague) idea of the Data layer was simply to attach to certain texts a list of, for example, variables. In the end, what we have done is a bit more general: we have created a mini database where we have recorded the relationship "text t describes variable v".

This representation is extremely simple, and at the same time easy to generalize. For example, we can also say "document d presents variable v", or "calculation c uses variable v" or even "calculation c calculates variable w".

With such a mini database, it becomes trivial to create a graphical representation of these relationships. We can also derive other interesting information from it. For example, if we know the variables presented in a document, which calculation is necessary to determine these variables, and which legislative text describes these variables, we can determine which documents could potentially be impacted when this text changes.
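
Here is a rough Python sketch of that last derivation. The relation names follow the sentences above, but the data structures, the function name, and the computation name "Calcul du revenu indexé" are assumptions made for the illustration.

```python
# Hypothetical relation tables, each a set of (subject, variable) pairs.
DESCRIBES = {("Article sur le calcul du revenu indexé", "Fraction d'indexation")}
USES = {
    ("Calcul du revenu indexé", "Revenu de référence"),
    ("Calcul du revenu indexé", "Fraction d'indexation"),
}
CALCULATES = {("Calcul du revenu indexé", "Revenu indexé")}
PRESENTS = {("Déclaration de revenu indexé", "Revenu indexé")}


def impacted_documents(text: str) -> list[str]:
    """Documents presenting a variable that depends on the given text."""
    # Start from the variables the text describes directly.
    affected = {variable for t, variable in DESCRIBES if t == text}
    changed = True
    while changed:  # propagate through calculations until nothing new appears
        changed = False
        for calculation, used in USES:
            if used not in affected:
                continue
            for c, produced in CALCULATES:
                if c == calculation and produced not in affected:
                    affected.add(produced)
                    changed = True
    return sorted(doc for doc, variable in PRESENTS if variable in affected)
```

Asking which documents depend on "Article sur le calcul du revenu indexé" returns "Déclaration de revenu indexé": the text describes the fraction, the calculation uses it to produce "Revenu indexé", and the document presents that variable.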

5

Our work on Refli tries to integrate into a single object, the software behind `refli.be`, everything related to its creation, whether it's the mini database of our knowledge base, the generation of diagrams derived from it, the documentation of calculations or their use.

We can notice this deep integration even on this page: the rendering of images or the presentation of forms and calculation results are not simple screenshots copied/pasted into this blog post, but are displayed exactly in the same way as you can find them in the Concepts and Playground mini-sites.

6

Once this mini database is created, it is easy to produce dedicated pages for each concept, for example a page for the "xxx" calculation or a page for the "yyy" document, etc. In a way, these pages are redundant with the initial graphic: everything is already said there. But having these dedicated pages is useful for "zooming in" on a specific part that interests us.
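
A possible way to "zoom in" on one concept, sketched in Python: we keep only the facts that mention it. The triples, the function name, and the plain-text output are assumptions for the illustration; the real pages are of course richer.

```python
# Hypothetical knowledge base as (subject, relation, object) triples.
TRIPLES = [
    ("Article sur le calcul du revenu indexé", "describes", "Fraction d'indexation"),
    ("Calcul du revenu indexé", "uses", "Revenu de référence"),
    ("Calcul du revenu indexé", "uses", "Fraction d'indexation"),
    ("Calcul du revenu indexé", "calculates", "Revenu indexé"),
    ("Déclaration de revenu indexé", "presents", "Revenu indexé"),
]


def concept_page(concept: str) -> str:
    """Plain-text page listing every recorded fact that mentions the concept."""
    lines = [concept, ""]
    for subject, relation, obj in TRIPLES:
        if concept in (subject, obj):
            lines.append(f"{subject} {relation} {obj}")
    return "\n".join(lines)
```

Calling `concept_page("Revenu indexé")` lists the calculation that produces the variable and the document that presents it, and nothing else.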

In addition to the information derived from this mini database, we can write documentation manually, which we also include on these dedicated pages. This documentation appears under the "Description" title of each page.

7

Now that we have a page briefly describing the most... (the most what? superficial? mechanical?...) aspects, we can use it as a guide to approach the implementation of the calculation, following the legislative texts.

8

A possible improvement that we are considering is to better point out, within a text, the articles, or paragraphs, which are associated with our concepts (documents, variables, calculations).

Challenges and improvements

We need example data that we can trust to validate our work. It's unfortunately too easy to misinterpret a text, or to make a mistake when taking notes or writing computer code. Once we have example data, we can use them to run automated tests, and we can also show them as part of the knowledge base.

Our current knowledge base doesn't take the evolution of legislative texts into account. It's really just a snapshot taken at the time we created it. Even Lex Iterata only presents the current version of the legislative texts. Our goal should be to retain and offer past computations too.

The knowledge base is currently used as a vehicle to organize computation-related data, to help us implement those computations. In the future, we would like to enrich it with additional concepts, such as personae, access rights, tasks, ... that Refli will also need to implement in order to become a full-fledged application.

With such a larger concept map, we can start selecting our next priorities. The map would act a bit as a visual roadmap, complementing tools such as Jira.

In some cases, the legislative texts presented on Justel don't contain the important parts: these can be relegated to annexes of the PDF version of the texts or, for instance in the case of a "convention collective de travail" (collective labour agreement), live in PDF documents elsewhere. In some cases, annexes are formatted in such a way that the text cannot be easily selected (and thus is not easily processed by a machine).