Project Description
An algorithm to diff two strings or XElements. It not only finds the differences, but also works out how to transform one string into the other.
Two methods are provided: one returns a ChangeSet object, and the other applies a ChangeSet to transform a source string into the target.
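To illustrate the two-method design (one call produces a change set, the other replays it), here is a minimal Python sketch. It is not the project's C# API: the names `get_changeset` and `apply_changeset`, the tuple layout, and the use of Python's `difflib.SequenceMatcher` in place of the project's own diff code are all assumptions made for illustration. It works at the word level and normalizes whitespace to single spaces.

```python
import difflib
from typing import List, Tuple

# Hypothetical change representation (not the project's classes):
# (kind, start, end, new_words), where start/end index words of the SOURCE.
Change = Tuple[str, int, int, List[str]]


def get_changeset(source: str, target: str) -> List[Change]:
    """Word-level diff: which source words change, and into what."""
    a, b = source.split(), target.split()
    changes: List[Change] = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":  # 'delete', 'insert' or 'replace'
            changes.append((tag, i1, i2, b[j1:j2]))
    return changes


def apply_changeset(source: str, changes: List[Change]) -> str:
    """Rebuild the target from the source and its change set."""
    words = source.split()
    # Apply from the last change to the first so earlier indices stay valid.
    for _tag, start, end, new_words in sorted(changes, key=lambda c: c[1], reverse=True):
        words[start:end] = new_words
    return " ".join(words)


src, dst = "the quick brown fox", "the slow brown dog jumps"
print(apply_changeset(src, get_changeset(src, dst)))  # the slow brown dog jumps
```

Because the sketch splits on whitespace, a round trip reproduces the target's words exactly but not its original spacing or line breaks; a real implementation would need to keep those as tokens too.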
by KevinZ


Given two versions of a text or XML file, find out the differences between them, as well as how to transform one into the other.
We all use diff tools to compare two files, but most of them have two limitations:
1. Most diff tools tell you which lines have changed, but not which words within a line have changed.
2. Most diff tools cannot tell you how the changes happened, i.e. all the steps needed to edit the original file into the current one.

What I want is a list of modify commands: by following this list, the original file can be edited until it is identical to the target file. Once we have a series of such command lists (I call each one a ChangeSet), we can reproduce the history of editing.

A ChangeSet contains many ChangeItems. There are four kinds of ChangeItemType: Delete, Insert, Replace and Update. Each ChangeItem also carries a line number or word number indicating where the change is, as well as the new text it should be changed to. Once you have the original text and the ChangeSet, you can reproduce the changes with the ApplyDelta() function.
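The sketch below models a ChangeSet as a list of hypothetical `ChangeItem` records (a type, a word number in the original text, a count of removed words, and the new words), with an `apply_delta` function standing in for ApplyDelta(). The field names, the 0-based word numbering and the whitespace splitting are assumptions, not the project's actual C# types. Update is left out because its exact semantics are not described here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ChangeItemType(Enum):
    DELETE = "Delete"
    INSERT = "Insert"
    REPLACE = "Replace"
    # "Update" also exists in the project; its semantics are not described
    # on this page, so this sketch omits it.


@dataclass
class ChangeItem:  # hypothetical shape, for illustration only
    kind: ChangeItemType
    position: int   # word number in the ORIGINAL text (0-based here)
    count: int = 0  # original words removed (Delete / Replace)
    new_words: List[str] = field(default_factory=list)  # Insert / Replace text


def apply_delta(original: str, change_set: List[ChangeItem]) -> str:
    words = original.split()
    # Positions refer to the original text, so apply from the end backwards:
    # editing a later position never shifts an earlier one. At equal
    # positions, deletions/replacements run before insertions so an insert
    # is never swallowed by a delete at the same spot.
    ordered = sorted(
        change_set,
        key=lambda c: (c.position, c.kind is not ChangeItemType.INSERT),
        reverse=True,
    )
    for item in ordered:
        start, end = item.position, item.position + item.count
        if item.kind is ChangeItemType.DELETE:
            del words[start:end]
        elif item.kind is ChangeItemType.INSERT:
            words[start:start] = item.new_words
        elif item.kind is ChangeItemType.REPLACE:
            words[start:end] = item.new_words
    return " ".join(words)


changes = [
    ChangeItem(ChangeItemType.REPLACE, position=1, count=1, new_words=["slow"]),
    ChangeItem(ChangeItemType.DELETE, position=3, count=1),
    ChangeItem(ChangeItemType.INSERT, position=4, new_words=["jumps"]),
]
print(apply_delta("the quick brown fox", changes))  # the slow brown jumps
```

Addressing every item by its position in the original text keeps a ChangeSet easy to generate; the cost is that the applier must process items in a careful order, as the sort above does.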

Please use the Unit Test project included in the source code. It is very useful whenever you change the source code, since it verifies most common cases. It is also a good way to learn the algorithm.

Possible uses of this algorithm include:
1. Version control
2. Detecting and tracking changes, e.g. in collaborative editing like Google Docs.
3. Notifying the client side of changes to the original file on the server side. Because only the delta is transferred, the transfer size is typically much smaller.

Although I tried to make this tool smart enough to find the minimal changes between the old version and the new version, sometimes it is still not smart enough. Please let me know if you have a better algorithm to make it even smarter.

If you have any ideas, just drop me a line. Comments, suggestions, bug reports, project invitations, or even job offers are all welcome!

Contact me at kevin.zhang.canada AT gmail.com

Credit:
The diff code is adapted from Matthias Hertel, http://www.mathertel.de. I made minor changes to fit DotNET 4.0 Silverlight (there is no Hashtable in Silverlight, so Dictionary<> is used instead). You can find the original comments and copyright info in diff.cs.
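As I understand Hertel's diff.cs, the hash table matters because each distinct line is first mapped to an integer code, so the comparison itself works on integers rather than strings. Below is a minimal Python analogue of that step, where a `dict` plays the role of Dictionary<>. The function name `diff_codes` and the first-seen numbering are assumptions for illustration, not a port of the C# code.

```python
from typing import Dict, List


def diff_codes(text: str, codes: Dict[str, int]) -> List[int]:
    """Map each line to an integer code; identical lines get identical codes.

    The same `codes` dict must be shared by both texts being compared, so
    that a line appearing in both versions receives the same number.
    """
    result: List[int] = []
    for line in text.split("\n"):
        if line not in codes:
            codes[line] = len(codes)  # next unused code, in first-seen order
        result.append(codes[line])
    return result


shared: Dict[str, int] = {}
old = diff_codes("a\nb\nc", shared)  # [0, 1, 2]
new = diff_codes("a\nc\nd", shared)  # [0, 2, 3]
print(old, new)
```

Once both versions are integer sequences, the diff only needs cheap integer equality checks, and any hash map keyed by string (Hashtable, Dictionary<>, or a Python dict) serves equally well.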

Last edited May 1, 2010 at 4:43 AM by KevinZ, version 4