Diffing and merging resx and NDepend with gmaster

Friday, August 24, 2018

My name is Ralf Koban and I've been using gmaster since its initial preview, about 1 year ago. For me, the greatest value provided by gmaster is the semantic diff and merge.

I mostly used C# with gmaster during this time. But, then I realized I was missing support for specific files in my development such as .resx resource files (famous in the .NET ecosystem) and NDepend configuration files.

So, I decided to add custom support for these XML based languages myself.

And here is the improvement. With a text-based diff, changes and relocations of NDepend rules make it impossible to follow what really happened (see the XML in the following figure).

Compare that with gmaster with NDepend support, where it is very easy to follow how rules were changed, moved and even renamed (see, for example, the rule at the top left labelled "methods too complex").

And here goes the full story :-)

A little bit about me and my job

Just a few words about me and what I do. I'm a software developer in a mid-size company. One of my responsibilities is to ensure and increase the software quality in a project for our company (besides the development itself).

It all started with SemanticMerge

Before I joined that project, a colleague of mine was constantly challenged with merging quite a lot of code because they used (and still use) several feature branches. So, he found SemanticMerge and we both evaluated it together.

I immediately bought it, because I found its ability to do a semantic diff and merge was (and still is) quite a huge win for us. Instead of several days, those merges would take us only a few hours (or even less).

Beyond the basics: moves across files with gmaster

A short time after that, gmaster was announced as a beta release. Because it had SemanticMerge included, I downloaded a copy and gave it a try.

At that time, I was working on different projects. Some used Git, some used SVN and a few used TFS (such as the project I mentioned earlier). I was lucky to be able to convert some SVN projects to Git, and even to use the Git TFS bridge to have a local Git copy of the TFS project mentioned above.

Out of curiosity, I took a look at the commit history of those projects.

The results were astonishing. gmaster showed me a lot of diffs and refactorings, including code that had been moved between multiple C# files. That was amazing, because it was exactly what I needed to gain a better and deeper understanding of the code.

Diffing and merging resx – the XML challenge for .NET developers

However, something was still missing.

For example, for a lot of resource XML files (.resx) it was still hard to spot the differences easily. Then I found information about the sortresx tool, which sorts .resx XML files alphabetically. I found that idea interesting but not really helpful in my case, because all developers on the team would have to use it to benefit from it. If only some used it, those who didn't would have trouble merging such files.

My own custom XML parser

A short time after that I read a blog post about how external parsers could be used with gmaster (and SemanticMerge).

When I read through that post, I thought that this would be a nice way to get rid of my .resx file problem. In addition, it seemed to be relatively easy to develop such a parser. So, I gave it a try and started to write my own external parser. The outcome of this can be found at https://github.com/RalfKoban/resx-semantic-external-parser.

The hard parts of coding your custom parser: location spans

During development, I found that the hard part is actually calculating the location spans correctly.

A .resx file is just plain XML, so the XML parser I used could provide me with the line number and line position. Once I had a first version of my parser ready, I gave it a try in gmaster.

It did not work. So, I contacted the gmaster devs to get an idea of what could have gone wrong. It turned out that my parser had to provide all the spans seamlessly, from the very first character of the file to the very last, without any gaps in between. This included the line break characters, which I had missed (as well as some whitespace). After I fixed that, my parser still did not work: it turned out that I had also forgotten to include the XML header.
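The "no gaps" rule can be checked mechanically before handing the result to gmaster. The following is a minimal Python sketch of such a check, not code from the actual parser. It assumes that every node reports its span as an inclusive pair of 0-based character offsets; the function name `find_span_problems` and the sample text are made up for illustration.

```python
def find_span_problems(spans, text_length):
    """Return human-readable problems for a list of (start, end) spans.

    Assumption of this sketch: each span is an inclusive pair of 0-based
    character offsets, and together the spans must cover every character
    from 0 to text_length - 1 exactly once.
    """
    problems = []
    expected = 0  # offset at which the next span has to start
    for start, end in sorted(spans):
        if start > expected:
            problems.append(f"gap: characters {expected}..{start - 1} are not covered")
        elif start < expected:
            problems.append(f"overlap: span {start}..{end} starts before offset {expected}")
        expected = max(expected, end + 1)
    if expected < text_length:
        problems.append(f"gap: characters {expected}..{text_length - 1} are not covered")
    return problems


text = '<?xml version="1.0"?>\r\n<root />\r\n'
header_end = text.index("?>") + 1   # last character of the XML header
root_start = text.index("<root")    # first character of the root element

# Forgetting the line break after the header leaves a two-character gap.
broken = [(0, header_end), (root_start, len(text) - 1)]
# Attaching the line break to the header span closes the gap.
fixed = [(0, root_start - 1), (root_start, len(text) - 1)]

print(find_span_problems(broken, len(text)))
print(find_span_problems(fixed, len(text)))
```

The first call reports the missing `\r\n` after the XML header, the second one reports nothing. Running a check like this in a unit test catches both of the mistakes described above (missing line breaks and a missing header) before the parser is ever loaded into gmaster.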

Testing my new XML parser

Finally, I got my parser up and running in gmaster. So, I gave it a try: I changed a few resources in a big .resx file and used the sortresx tool to sort the file alphabetically. Then I compared the result in gmaster and, luckily, saw the moves and changes all reported correctly. Now I was ready to semantically diff and merge .resx files. In addition, thanks to the semantic diff, conflicts in those files could be merged automatically, without any manual effort on my side.

NDepend: XML based and ready to diff and merge semantically

A while later, I thought that being able to semantically diff and even merge .resx files automatically was nice, but it would be even better to be able to diff or merge other kinds of XML files as well.

For example, I've been using NDepend for years to improve the code quality in my projects.

NDepend itself provides a project file (.ndproj) containing different customizable rules, and you can also define your own custom rules. Later versions of NDepend even allow you to place such rules in special rule files (.ndrules) that can be shared amongst different projects.

In some projects, I heavily customized some of the pre-defined rules to our own needs. Nevertheless, each time a new NDepend release came out, I wanted to benefit from the bug fixes and improvements in the pre-defined rules as well, so I did a manual diff of the rules. Sometimes this was a bit troublesome because the order of some rules had changed and my text diff showed me a lot of unrelated differences. In the end, a manual diff and merge took me about 2-3 days to complete.

In addition to that, in the project I mentioned at the beginning, my team recently had the task to switch to the WIX toolkit installation technology (WIX stands for Windows Installer XML, so the WIX files are also XML).

So, I concluded that this would be the ideal starting point for my external XML parser and I started to write one. It can be found at: https://github.com/RalfKoban/xml-semantic-external-parser.

Even though it is still in development, I'm already able to diff NDepend and WIX files, along with other kinds of XML files.

The problems I faced were similar to the .resx parser - no gaps in the files, correct detection of line breaks (and different kinds of line breaks).
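To show why the different kinds of line breaks matter, here is a small Python sketch (again an illustration, not the parser's real code). It converts a (line, column) position, as an XML parser typically reports it, into a character offset in the raw file. It assumes 1-based lines and columns, which varies between parsers, and the helper names `line_start_offsets` and `to_offset` are hypothetical.

```python
import re

# Matches the three common line break styles. "\r\n" must come first so
# that it is treated as one break rather than two.
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def line_start_offsets(text):
    """Return the character offset at which each line starts (line 1 first)."""
    starts = [0]
    for match in LINE_BREAK.finditer(text):
        starts.append(match.end())
    return starts


def to_offset(starts, line, column):
    """Convert a 1-based (line, column) position into a 0-based offset."""
    return starts[line - 1] + (column - 1)


# A file that mixes Windows, Unix and old Mac line breaks.
text = "<a>\r\n<b/>\n</a>\r"
starts = line_start_offsets(text)

print(starts)                   # offsets at which lines 1, 2, 3, ... begin
print(to_offset(starts, 2, 1))  # offset of "<b/>"
```

Two details are easy to get wrong here. The text has to be read without newline translation (in Python, `open(path, newline="")`), otherwise `\r\n` silently becomes `\n` and every offset after the first line break is off. And because XML processors normalize line breaks before handing content to the application, the offsets have to be computed from the raw text rather than from what the XML parser returns.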

Dealing with XML comments

In addition, because I wanted to be able to diff plain XML files, I had to work out how to handle XML comments properly. Shortly after starting development, I decided to treat all XML elements as containers and all attributes as terminal nodes. Because XML comments are also a kind of element, my first idea was to report them as separate elements. This did not work as expected, and after contacting the gmaster devs again, I changed my mind: comments are now considered to be part of other elements.

I also had (and probably still have) a couple of bugs in my parser, mostly regarding correct location span detection, especially in combination with the different kinds of line breaks. To fix some of them, I contacted some of the gmaster devs, who helped and supported me a lot. Thanks to all of them (and especially Míryam) for the support!
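The "elements are containers, attributes are terminal nodes" decision can be sketched in a few lines of Python. This is only an illustration of the mapping, not the parser's actual implementation: the `Node` class and the `build_tree` and `render` helpers are hypothetical, location spans are left out entirely, and XML comments are simply dropped because `xml.etree.ElementTree` ignores them by default.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "container" or "terminal"
    name: str
    children: list = field(default_factory=list)


def build_tree(element):
    """Map an XML element to a container whose attributes are terminals."""
    node = Node("container", element.tag)
    for attribute_name in element.attrib:
        node.children.append(Node("terminal", attribute_name))
    for child in element:
        node.children.append(build_tree(child))
    return node


def render(node, depth=0):
    """Return the tree as indented lines, one line per node."""
    lines = ["  " * depth + f"{node.kind}: {node.name}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


xml_text = '<root><data name="Greeting"><value>Hello</value></data></root>'
tree = build_tree(ET.fromstring(xml_text))
print("\n".join(render(tree)))
```

For the sample above this prints `root`, `data` and `value` as containers and `name` as a terminal nested under `data`. A real parser additionally has to decide which container a comment's characters belong to, so that the comment does not become a gap in the spans.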

Future work on XML-based parsers

What's still missing (and I need to think about it a bit more deeply) is support for "code reformats", like the gmaster option for C# code. Because two XML files may have their attributes in a different order and still mean the same thing, I must find a way to detect such a situation. My goal there is to reduce the noise so that the real, important differences are easier to spot.
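One possible way to detect that situation, shown here only as a Python sketch of the idea and not as something the parser does today, is to compare elements with their attributes as an unordered mapping. The function name `equal_ignoring_attribute_order` is made up. The sketch also treats surrounding whitespace in element text as insignificant and ignores text that follows a closing tag, both of which are assumptions that would not hold for every XML format.

```python
import xml.etree.ElementTree as ET


def equal_ignoring_attribute_order(a, b):
    """Compare two elements recursively, ignoring the order of attributes.

    Assumptions of this sketch: leading and trailing whitespace in element
    text is not significant, and tail text after a closing tag is ignored.
    """
    # attrib is a dict, and dict equality does not depend on order.
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(equal_ignoring_attribute_order(x, y) for x, y in zip(a, b))


original = ET.fromstring('<rule name="A" active="true"><q>x</q></rule>')
reordered = ET.fromstring('<rule active="true" name="A">\n  <q>x</q>\n</rule>')
changed = ET.fromstring('<rule active="false" name="A"><q>x</q></rule>')

print(equal_ignoring_attribute_order(original, reordered))  # True
print(equal_ignoring_attribute_order(original, changed))    # False
```

A check like this could be used to classify a difference as "reformat only" and hide it on request, while child elements are still compared in document order.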

What's also still missing is that I must find ways to consider some elements in specific kinds of XMLs to be terminal nodes. My parser is already doing that for different kinds of elements, but I have to see whether that's suitable in my case. However, my parser can "fall back" to plain XML comparison in such a situation, so currently I can "soften" that semantic diff a bit.

Conclusion

To sum it up, when it comes to developing your own external parser, keep the following in mind:

  • Think about what a container will be and what will be a terminal node (and in which situations).
  • Do report the spans for all characters in the complete file.
  • Do not miss some characters and have "gaps" in between the different nodes.
  • If there is an issue with your parser and gmaster, it is probably because you report wrong or invalid line positions and character spans (so check first whether the parser behaves correctly).
  • Activate the logging in gmaster for external parsers, because it may help you to spot problems inside your parser early on.
  • Test your parser with different formats.

We develop gmaster, a Git GUI with semantic superpowers. It parses the code to really understand how it was refactored, and then diffs and merges it as a human would, matching methods instead of lines.

If you want to try it, download it from here.

We are also the developers of SemanticMerge and Plastic SCM, a full-featured version control.
