<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Rohland de Charmoy &#187; html</title>
	<atom:link href="http://www.rohland.co.za/index.php/tag/html/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rohland.co.za</link>
	<description>Pushing buttons...</description>
	<lastBuildDate>Sat, 04 Feb 2012 16:01:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>C# HTML Diff Algorithm</title>
		<link>http://www.rohland.co.za/index.php/2009/10/31/csharp-html-diff-algorithm/</link>
		<comments>http://www.rohland.co.za/index.php/2009/10/31/csharp-html-diff-algorithm/#comments</comments>
		<pubDate>Sat, 31 Oct 2009 08:44:25 +0000</pubDate>
		<dc:creator>Rohland</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[codeplex]]></category>
		<category><![CDATA[csharp]]></category>
		<category><![CDATA[dotnet]]></category>
		<category><![CDATA[html]]></category>

		<guid isPermaLink="false">http://www.rohland.co.za/?p=81</guid>
		<description><![CDATA[I have finally launched my first Codeplex project, very exciting :) I was inspired by <a href="http://writeboard.com">writeboard.com</a> to find some way of implementing an HTML difference viewer in an internal application I was developing. Essentially, I was looking for a way to take two blocks of HTML and compare them so that the differences are highlighted. This is extremely useful for CMS-type systems where WYSIWYG/Textile/Wiki markup is used to populate content. In most web systems where content is authored dynamically, a history of the content is tracked over time. When collaborating with a few people, this feature is critically important. What makes it extremely useful is the capability to detect what has changed between versions. This post focuses on a project I have launched to do exactly that - track the difference between two versions of HTML markup.]]></description>
			<content:encoded><![CDATA[<p>I have finally launched my first <a href="http://htmldiff.codeplex.com" title="C# Html Diff Algorithm">Codeplex project</a>, very exciting <img src='http://www.rohland.co.za/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  I was inspired by <a href="http://writeboard.com">writeboard.com</a> to find some way of implementing an HTML difference viewer in an internal application I was developing. Essentially, I was looking for a way to take two blocks of HTML and compare them so that the differences are highlighted. This is extremely useful for CMS-type systems where WYSIWYG/Textile/Wiki markup is used to populate content. In most web systems where content is authored dynamically, a history of the content is tracked over time. When collaborating with a few people, this feature is critically important. What makes it extremely useful is the capability to detect what has changed between versions. This post focuses on a project I have launched to do exactly that &#8211; track the difference between two versions of HTML markup.</p>
<p>The application I was building was developed on ASP.NET MVC (C#), so naturally I was looking for some C# code I could use to implement the difference algorithm. In searching, I could not find any libraries that were worth using. I did come across one or two command-line utilities, but nothing spectacular. I widened my search to other languages and came across a <a href="http://github.com/myobie/htmldiff">neat implementation</a> in <a href="http://www.ruby-lang.org/en/">Ruby</a>. The algorithm was developed by <a href="http://nathanherald.com/">Nathan Herald</a>, who generously made the code available to everyone under the common <a href="http://en.wikipedia.org/wiki/MIT_License">MIT license</a>.</p>
<p>So, I had the algorithm I was looking for, but I didn&#8217;t speak Ruby! This was an excellent opportunity to roll up my sleeves and learn some Ruby, so I fired up my browser, downloaded the Windows <a href="http://www.ruby-lang.org/en/downloads/">one-click installer</a> and got a simple environment up and running. After toying with the code for a bit and scratching my head at one or two alien Ruby constructs, I got the gist of how things worked. I fired up Visual Studio, created a new project and began porting the algorithm. I must admit that the process was relatively painless, and I got something working in a few hours. It took another hour or two to iron out some bugs I picked up, but in a relatively short space of time I had the C# diff library that I was originally looking for! Below is a demo of how it is used, followed by a few screenshots demonstrating the functionality when rendered in your browser.</p>
<pre class="brush: csharp; title: ; notranslate">
            string oldText = @&quot;&lt;p&gt;This is some sample text to demonstrate the capability of the &lt;strong&gt;HTML diff tool&lt;/strong&gt;.&lt;/p&gt;
                                &lt;p&gt;It is based on the Ruby implementation found &lt;a href='http://github.com/myobie/htmldiff'&gt;here&lt;/a&gt;. Note how the link has no tooltip&lt;/p&gt;
                                &lt;table cellpadding='0' cellspacing='0'&gt;
                                &lt;tr&gt;&lt;td&gt;Some sample text&lt;/td&gt;&lt;td&gt;Some sample value&lt;/td&gt;&lt;/tr&gt;
                                &lt;tr&gt;&lt;td&gt;Data 1 (this row will be removed)&lt;/td&gt;&lt;td&gt;Data 2&lt;/td&gt;&lt;/tr&gt;
                                &lt;/table&gt;&quot;;

            string newText = @&quot;&lt;p&gt;This is some sample text to demonstrate the awesome capabilities of the &lt;strong&gt;HTML diff tool&lt;/strong&gt;.&lt;/p&gt;&lt;br/&gt;&lt;br/&gt;Extra spacing here that was not here before.
                                &lt;p&gt;It is based on the Ruby implementation found &lt;a title='Cool tooltip' href='http://github.com/myobie/htmldiff'&gt;here&lt;/a&gt;. Note how the link has a tooltip now and the HTML diff algorithm has preserved formatting.&lt;/p&gt;
                                &lt;table cellpadding='0' cellspacing='0'&gt;
                                &lt;tr&gt;&lt;td&gt;Some sample &lt;strong&gt;bold text&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Some sample value&lt;/td&gt;&lt;/tr&gt;
                                &lt;/table&gt;&quot;;

            HtmlDiff diffHelper = new HtmlDiff(oldText, newText);
            string diffOutput = diffHelper.Build();
</pre>
<p>Using the sample web application provided with the project in Codeplex, the following is rendered based on the code above:</p>
<div id="attachment_119" class="wp-caption aligncenter" style="width: 587px"><a href="http://www.rohland.co.za/wp-content/uploads/2009/10/html_diff_old_text.PNG"><img src="http://www.rohland.co.za/wp-content/uploads/2009/10/html_diff_old_text.PNG" alt="Old HTML" title="html_diff_old_text" width="577" height="212" class="size-full wp-image-119" /></a><p class="wp-caption-text">Old HTML</p></div>
<div id="attachment_122" class="wp-caption aligncenter" style="width: 583px"><a href="http://www.rohland.co.za/wp-content/uploads/2009/10/html_diff_new_text.PNG"><img src="http://www.rohland.co.za/wp-content/uploads/2009/10/html_diff_new_text.PNG" alt="Updated HTML" title="html_diff_new_text" width="573" height="275" class="size-full wp-image-122" /></a><p class="wp-caption-text">Updated HTML</p></div>
<div id="attachment_123" class="wp-caption aligncenter" style="width: 588px"><a href="http://www.rohland.co.za/wp-content/uploads/2009/10/html_diff_output_text.PNG"><img src="http://www.rohland.co.za/wp-content/uploads/2009/10/html_diff_output_text.PNG" alt="HTML diff output" title="html_diff_output_text" width="578" height="320" class="size-full wp-image-123" /></a><p class="wp-caption-text">HTML diff output</p></div>
<p>You can see that the algorithm as originally developed takes care of the nasty HTML parsing to figure out how to highlight the differences. The changes are marked up using &#8220;ins&#8221; and &#8220;del&#8221; tags. You can easily style these tags as I have done. The CSS below is responsible for rendering the differences as per the example.</p>
<pre class="brush: css; title: ; notranslate">
ins {
	background-color: #cfc;
	text-decoration: none;
}

del {
	color: #999;
	background-color: #fec8c8;
}
</pre>
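<p>To give a rough feel for what marking up changes with &#8220;ins&#8221; and &#8220;del&#8221; involves, here is a minimal sketch of a word-level diff. Please note the assumptions: this is not the algorithm used by the library or by the Ruby original, it only handles plain text split on spaces (no HTML parsing at all), and the names <code>WordDiffSketch</code> and <code>Build</code> are made up for this illustration. It simply finds the longest common subsequence of words and wraps whatever falls outside it.</p>
<pre class="brush: csharp; title: ; notranslate">
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the HtmlDiff library.
public static class WordDiffSketch
{
    public static string Build(string oldText, string newText)
    {
        // Assumption: plain text, words separated by spaces. Real HTML needs proper tokenising.
        string[] a = oldText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] b = newText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        int[,] lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i &gt;= 0; i--)
            for (int j = b.Length - 1; j &gt;= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var parts = new List&lt;string&gt;();
        var deleted = new List&lt;string&gt;();
        var inserted = new List&lt;string&gt;();
        int x = 0, y = 0;

        // Walk both word lists, collecting runs of removed and added words.
        while (x &lt; a.Length || y &lt; b.Length)
        {
            if (x &lt; a.Length &amp;&amp; y &lt; b.Length &amp;&amp; a[x] == b[y])
            {
                // Unchanged word: emit any pending changes first, then the word itself.
                Flush(parts, deleted, inserted);
                parts.Add(a[x]);
                x++;
                y++;
            }
            else if (y == b.Length || (x &lt; a.Length &amp;&amp; lcs[x + 1, y] &gt;= lcs[x, y + 1]))
            {
                deleted.Add(a[x++]);   // word only exists in the old text
            }
            else
            {
                inserted.Add(b[y++]);  // word only exists in the new text
            }
        }
        Flush(parts, deleted, inserted);
        return string.Join(&quot; &quot;, parts.ToArray());
    }

    // Wraps the pending runs in del/ins tags and resets them.
    private static void Flush(List&lt;string&gt; parts, List&lt;string&gt; deleted, List&lt;string&gt; inserted)
    {
        if (deleted.Count &gt; 0)
            parts.Add(&quot;&lt;del&gt;&quot; + string.Join(&quot; &quot;, deleted.ToArray()) + &quot;&lt;/del&gt;&quot;);
        if (inserted.Count &gt; 0)
            parts.Add(&quot;&lt;ins&gt;&quot; + string.Join(&quot; &quot;, inserted.ToArray()) + &quot;&lt;/ins&gt;&quot;);
        deleted.Clear();
        inserted.Clear();
    }
}
</pre>
<p>With this sketch, <code>WordDiffSketch.Build(&quot;the capability of the tool&quot;, &quot;the awesome capabilities of the tool&quot;)</code> should return <code>the &lt;del&gt;capability&lt;/del&gt; &lt;ins&gt;awesome capabilities&lt;/ins&gt; of the tool</code>. The hard part, which the library handles and this sketch does not, is doing the same thing without breaking the surrounding HTML tags.</p>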
<p>I hope you find the library useful. I wish I had more time to add tests and more documentation to the Codeplex project, but for now I think the implementation is reasonably solid and easy to follow. If you spot any bugs, let me know and I&#8217;ll try to attend to them. Given that I am not responsible for the original implementation as developed in Ruby, it might be a bit tricky to solve some of the fundamental issues with the algorithm, but I will certainly have a crack at it, since I have quite a good understanding of how it works after porting it.</p>
<p>Link to C# implementation: <a href="http://htmldiff.codeplex.com">http://htmldiff.codeplex.com</a><br />
Link to Ruby implementation: <a href="http://github.com/myobie/htmldiff">http://github.com/myobie/htmldiff</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.rohland.co.za/index.php/2009/10/31/csharp-html-diff-algorithm/feed/</wfw:commentRss>
		<slash:comments>37</slash:comments>
		</item>
	</channel>
</rss>

