Roslyn SyntaxTree inspiration

Roslyn (the open-source .NET Compiler Platform for C# and VB) took a very interesting path in modelling the syntax of parsed code. A strong focus on immutability brought many benefits, thread safety among them, but it also made things like two-way parent-child references harder to achieve without recreating the whole tree on every keystroke in the editor.

A much more detailed and better-written explanation of the why and how is in this blog post.

In short, there are actually two trees. The first is built bottom-up; every node knows only its children, and it is truly immutable. The second is a facade built on demand from the top down: only the nodes that are actually accessed get created, each with a reference to its parent. Its immutability is only perceived through the public API; the actual state is, so to say, mutable, because child nodes are created lazily. That facade tree, called “red” in Roslyn as opposed to the internal “green” one, is thrown away on every edit and can’t be reused.
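To make the two-tree idea concrete, here is a minimal sketch in Python. It is not Roslyn's actual API: the names GreenNode, RedNode and child are made up for illustration, and the real implementation is in C# and far more involved.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GreenNode:
    """Immutable node built bottom-up; knows its children, never its parent."""
    kind: str
    text: str = ""
    children: tuple[GreenNode, ...] = ()


class RedNode:
    """Facade over a green node: adds a parent link, creates children lazily."""

    def __init__(self, green: GreenNode, parent: Optional[RedNode] = None):
        self.green = green
        self.parent = parent
        # Hidden mutable state: one cache slot per child, filled on first access.
        self._children = [None] * len(green.children)

    def child(self, index: int) -> RedNode:
        cached = self._children[index]
        if cached is None:
            cached = RedNode(self.green.children[index], parent=self)
            self._children[index] = cached
        return cached


# The green tree is built from the leaves up.
name = GreenNode("Name", "x")
value = GreenNode("Number", "1")
assignment = GreenNode("Assignment", children=(name, value))
root_green = GreenNode("Block", children=(assignment,))

# The red tree is created top-down, only for the nodes we actually visit.
root = RedNode(root_green)
red_name = root.child(0).child(0)
print(red_name.green.text, "<-", red_name.parent.green.kind, "<-", red_name.parent.parent.green.kind)
# prints: x <- Assignment <- Block
```

Note that the red wrapper for the Number node is never created here, because nobody asked for it.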

Because the first (“green”) tree is immutable and children know nothing of their parents, an edit requires replacing only the node that changed and all its ancestors up to the root, which for a reasonably balanced tree is about log n nodes. All the other nodes are reused as they are.
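A rough sketch of such an edit, again in Python and with hypothetical names (GreenNode, replace_at), assuming the edited node is addressed by a path of child indexes from the root:

```python
from __future__ import annotations
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GreenNode:
    """Immutable node that knows only its children."""
    kind: str
    text: str = ""
    children: tuple[GreenNode, ...] = ()


def replace_at(node: GreenNode, path: tuple[int, ...], new_node: GreenNode) -> GreenNode:
    """Return a new root in which the node at `path` is `new_node`.

    Only the nodes along `path` are recreated; every sibling subtree is shared.
    """
    if not path:
        return new_node
    index, rest = path[0], path[1:]
    children = list(node.children)
    children[index] = replace_at(children[index], rest, new_node)
    return replace(node, children=tuple(children))


first = GreenNode("Assignment", children=(GreenNode("Name", "x"), GreenNode("Number", "1")))
second = GreenNode("Assignment", children=(GreenNode("Name", "y"), GreenNode("Number", "2")))
old_root = GreenNode("Block", children=(first, second))

# Change the "2" into a "3": the root and `second` are recreated, nothing else.
new_root = replace_at(old_root, (1, 1), GreenNode("Number", "3"))

print(new_root.children[0] is first)                          # True: untouched sibling is shared
print(new_root.children[1].children[0] is second.children[0])  # True: the "y" name is shared
print(old_root.children[1].children[1].text)                   # 2: the old tree is unchanged
```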

The initial cost of creating such a structure is big, but the benefits… Oh, the benefits are amazing in my humble opinion. I am definitely not objective here, as I love the functional paradigm and immutability at its heart. Naturally, reading from multiple threads is completely safe without any locks: a given tree is just a tree and won’t change. Creating virtual operations (“preview changes” in Visual Studio) is as simple as it can get. Undo/redo comes nearly out of the box. Already analyzed a subtree that probably didn’t change? No need to compare it deeply, just check whether the references are equal. And there are probably many more that I can’t think of right now.
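Two of these benefits fit in a small sketch. It is Python with made-up names (GreenNode, count_leaves, history), and the “analysis” is just a stand-in that counts leaves. The cache is keyed by node identity, so a reused subtree is never analyzed twice, and undo/redo amounts to keeping the old roots around.

```python
from __future__ import annotations
from dataclasses import dataclass


# eq=False keeps identity-based equality and hashing, so a dict lookup
# on a node is a reference check rather than a deep comparison.
@dataclass(frozen=True, eq=False)
class GreenNode:
    kind: str
    text: str = ""
    children: tuple[GreenNode, ...] = ()


analysis_runs = 0
_cache = {}


def count_leaves(node: GreenNode) -> int:
    """Stand-in for an expensive analysis; results are cached per node reference."""
    global analysis_runs
    if node in _cache:
        return _cache[node]
    analysis_runs += 1
    result = 1 if not node.children else sum(count_leaves(c) for c in node.children)
    _cache[node] = result
    return result


left = GreenNode("Assignment", children=(GreenNode("Name", "x"), GreenNode("Number", "1")))
right = GreenNode("Assignment", children=(GreenNode("Name", "y"), GreenNode("Number", "2")))
v1 = GreenNode("Block", children=(left, right))

# An edit recreates `right` and the root, but reuses `left` and the "y" name.
new_right = GreenNode("Assignment", children=(right.children[0], GreenNode("Number", "3")))
v2 = GreenNode("Block", children=(left, new_right))

history = [v1, v2]  # undo/redo is just moving an index over old roots

count_leaves(v1)
runs_after_v1 = analysis_runs
count_leaves(v2)
runs_for_v2 = analysis_runs - runs_after_v1

print(runs_after_v1)  # 7: every node of the first version was analyzed
print(runs_for_v2)    # 3: only the new root, the new assignment and the new number
```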

Mapping into my domain

So with that promising perspective, I sat down and tried to map those syntax nodes and so on into my domain, which is BattleScribe datafiles: catalogues, rosters, entries, selections, groups etc. That’s where it got hard. I have been really stuck for the last week, trying out different designs.

In the meantime Visual Studio 2017 hit the web, and this weekend I decided to finally migrate my wham libraries to the new old .csproj format. Thankfully the migration was completely successful and gave no problems at all.

There are some problems to solve. I already have serialization mapping classes, which are mutable by design (setters everywhere). How do I connect that layer to the “green” tree layer? Re-creating the tree on every serialization costs too much, both in memory and in CPU time. But reusing the objects that were used for deserialization sounds rather dangerous: I’d have to remember to essentially copy the behavior of the “green” layer, so that every change recreates the same path that gets recreated in the “green” tree. That is an almost complete duplication of layers, except that the serializable layer is mutable.

I could abandon those serialization classes and write my own serialization method on every single class, but that also sounds like a bad idea. So with that in mind, I’ve started thinking about writing my own… Source Generator!

Sneak peek:

[Image: meme, “ka me ha me… to be continued”]
