Back up your Subversion repository offsite (Windows Guide)
Posted by Rohland in Development, Uncategorized on February 1, 2010
If you work in a development environment, there’s a good chance you are using Subversion as your code repository of choice. If that’s the case, the usual suggestion for backing up is to dump the repositories onto a DVD or external drive to be stored offsite. We have been doing this for a while and have found the process painful (to say the least!). If you run Subversion and don’t have your data backed up frequently offsite, you might find yourself pushing this button sooner or later!
Near the end of last year I started looking at offsite backup options that didn’t require user intervention and was very excited to discover the svnsync command. The benefit of svnsync is that only new revisions are mirrored and not the full repository each time. This is absolutely critical if you have a repository that is quite active. Needless to say, I decided to forge ahead and try my hand at implementing automated scripts to take care of backing up our repositories online utilising the svnsync tool. As a reference I have posted the setup process here.
It’s important to note that this guide assumes you are working in a Windows environment and that you have access to a server offsite. I have referenced a few articles and other blog posts I discovered along the way to help you if you are working in a Linux environment.
Step 1 – Set up Subversion on your remote server
Create a Windows user account on your remote server which you will use to remotely access the backup repository from your main Subversion server. Take note of the account name and password you use. Once you have created the account, install VisualSVN on the server where you want to host your mirrored repositories. Ensure you select Windows Authentication on the security dialog during the installation process. Once completed, ensure that Subversion is running correctly on the remote machine by opening the VisualSVN manager and clicking on the repository address displayed. Now ensure you can access the repository from your host Subversion server. If your backup server’s name is not addressable from your host server, use the remote server’s IP address, or add an entry to your DNS server or Windows hosts file. If you opted for a DNS entry, you should be able to ping your backup server using the server’s name. Try accessing the repository again. When prompted for a username and password, use the credentials of the user account you created.
Step 2 – Configure permissions
Before setting up the repositories, we need to define which users have access rights to the backup repository. To configure this, open VisualSVN manager on the remote server, right click on the Repositories folder and choose Properties from the drop down menu. Revoke all access for the BUILTIN\Users role and then add the user account you set up in Step 1. Ensure this user has full Read/Write access.
Step 3 – Create the destination repository
Now that you have fully configured the Subversion server hosted on your remote server, we can start setting up the synchronisation process. To do this we need a destination repository to mirror your existing repository to (if you have more than one, you need to create a destination repository for each repository you want to mirror). To keep things simple, I gave the destination repository the same name as the source repository. Take note that any repository you create on the destination server should be empty (i.e. do not tick the “Create default structure” checkbox when creating the repository).
Step 4 – Configure the repository
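The next step involves setting the Pre-revision Property Change Hook. This is an important step: svnsync records its bookkeeping as revision properties on the mirror, and Subversion refuses revision property changes unless this hook exists and exits successfully. Right click on the repository you created on the destination server and select All Tasks > Manage Hooks. Click on the “Pre-revision property change hook” entry and click Edit. Enter a few blank lines and click OK and Apply.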
Step 5 – Configure SSL
We need to configure the client server to accept the SSL certificate generated by the VisualSVN installer. If you wish to use a properly signed certificate or already have one, follow this guide and ignore the rest of this step. If you want to continue using the auto generated certificate, follow Mark Wilson’s guide on how to trust the default certificate.
Step 6 – Initialise your repositories for synchronisation
Before you can synchronise your repository, you need to initialise it. To do this, run the following command on the host server (note that you need to replace the placeholders in CAPS with the relevant values):
svnsync init PATH_TO_REMOTE_REPO PATH_TO_LOCAL_REPO --sync-username REMOTE_USERNAME --sync-password REMOTE_PASSWORD --source-username HOST_USERNAME --source-password HOST_PASSWORD
Step 7 – Initialise remote repositories from a previous backup
Only run through this step if you have a relatively large repository and don’t want to mirror it from revision 0 all the way to revision xxxx (the sync process is quite slow). If you are running through these steps for a brand new repository, ignore this step. Also, please take note that if you are using PowerShell to execute these commands, “>” is equivalent to | Out-File -encoding Unicode (thanks Keith). If you are not careful, you might end up with the “Malformed dumpfile header” error. To be safe, use the standard Windows command prompt.
Dump your existing repository on your host machine by running the following script:
svnadmin dump "FILE_PATH_TO_REPO" > "REPO_NAME.db"
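As a quick sanity check before uploading, the first bytes of the dump can be inspected. This is a minimal sketch in Python and not part of the original setup: a well-formed dump begins with the text SVN-fs-dump-format-version:, while a file written through PowerShell's “>” redirect typically begins with a UTF-16 byte-order mark. The function name and its return values are hypothetical and exist only for illustration.

```python
# First bytes of a well-formed Subversion dump file.
DUMP_MAGIC = b"SVN-fs-dump-format-version:"
# Byte-order marks that suggest the file was re-encoded as UTF-16.
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def check_dump_header(path):
    """Return 'ok', 'utf16' or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(len(DUMP_MAGIC))
    if head.startswith(UTF16_BOMS):
        return "utf16"
    if head == DUMP_MAGIC:
        return "ok"
    return "unknown"
```

A result of 'utf16' suggests the dump should be recreated from the standard command prompt rather than from PowerShell.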
Once the repository dump has completed, upload it to your backup server and then run the following command on the backup/mirror server:
svnadmin load "FILE_PATH_TO_BACKUP_REPO" < "REPO_NAME.db"
Now, the next step is critical. You need to update the svn:sync-last-merged-rev revision property on the remote repository to the current revision number of the repository (you can get this information by running "svn info REPO_PATH"). To do this, run the following command:
svn propset svn:sync-last-merged-rev --revprop -r0 REV_NUMBER "PATH_TO_REMOTE_REPO"
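If you would rather not copy the revision number by hand, the lookup can be scripted. The following is a rough Python sketch, not something from the original setup: it assumes the English text output of svn info, which contains a line of the form "Revision: N", and both function names are hypothetical.

```python
import re


def parse_revision(svn_info_output):
    """Extract the number on the 'Revision:' line of `svn info` text output."""
    match = re.search(r"^Revision:\s*(\d+)\s*$", svn_info_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Revision:' line found in svn info output")
    return int(match.group(1))


def build_propset_command(rev_number, remote_repo_url):
    """Build the argument list for the propset command shown above."""
    return [
        "svn", "propset", "svn:sync-last-merged-rev",
        "--revprop", "-r0", str(rev_number), remote_repo_url,
    ]
```

The resulting list can be passed to subprocess.call, which avoids quoting problems with paths that contain spaces.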
Step 8 – Synchronise!
Basically, you are done. You simply need to run the following command on a frequent basis (best set up as a scheduled task in Windows):
svnsync sync PATH_TO_REMOTE_REPO --sync-username REMOTE_USERNAME --sync-password REMOTE_PASSWORD --source-username HOST_USERNAME --source-password HOST_PASSWORD
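If you mirror several repositories, the scheduled task can call a small wrapper instead of one command per repository. Here is a minimal Python sketch under a few assumptions of mine: the function names and the credentials dictionary are hypothetical, and the --non-interactive flag is an addition so that an unattended run fails rather than waiting for a prompt.

```python
import subprocess


def build_sync_command(remote_repo_url, sync_user, sync_password,
                       source_user, source_password):
    """Build the svnsync argument list for one mirrored repository."""
    return [
        "svnsync", "sync", remote_repo_url,
        "--sync-username", sync_user,
        "--sync-password", sync_password,
        "--source-username", source_user,
        "--source-password", source_password,
        "--non-interactive",  # assumption: never prompt in a scheduled task
    ]


def sync_all(remote_repo_urls, credentials, runner=subprocess.call):
    """Run svnsync for each mirror and return the URLs whose sync failed."""
    failed = []
    for url in remote_repo_urls:
        exit_code = runner(build_sync_command(url, **credentials))
        if exit_code != 0:
            failed.append(url)
    return failed
```

The runner parameter exists only so the loop can be exercised without a real Subversion server. In a scheduled task you would leave it at its default and log or email the returned list.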
Hope you found this useful. I might follow this post up with another blog entry on the steps I took to set up an automated script that emails me when a repository on the host machine is missing its mirrored counterpart. This is really helpful for detecting cases where a repository was set up locally but not configured for synchronisation. The ability to automatically generate the relevant scripts is also quite useful.
Rohland
References:
http://journal.paul.querna.org/articles/2006/09/14/using-svnsync/
http://www.rosshawkins.net/archive/2009/04/27/using-svnsync-properwith-visualsvn.aspx
http://devlicious.com/blogs/christopher_bennage/archive/2009/03/11/mirroring-subversion-from-windows.aspx
Implementing the Repository Pattern with LLBLGEN
Posted by Rohland in Development on January 23, 2010
This post has been sitting in draft for a while, but I have finally managed to get round to completing it. I started it back in 2009. Apologies for the delay.
It was the start of 2009, and I was investigating ORM tools for a new project we were working on at Clyral. We had been using Linq 2 SQL as our core database access layer for some time but felt we had outgrown it and were looking for something a bit more powerful and flexible. It didn’t take long for us to discover LLBLGEN. Whilst not the most intuitive acronym for an O/R mapping framework, LLBLGEN (Lower Level Business logic Layer Generator) impressed the team from the outset. After downloading the demo version and playing around with it on a test project we committed to purchasing it and since then haven’t looked back.
We started off using the Self Servicing model of the framework as it was earmarked for beginners. In time, though, we began to see that we would get more mileage from the Adapter model and adopted it as the de facto standard for our projects. It was at this point that we began looking at ways to implement the repository pattern, which simplifies the testing process and ensures that the implementation (which is often technology specific) does not get entangled with the domain model. To achieve this we needed every entity to implement an interface (or contract, if you will). The problem with this, of course, is that in C# generic variance is not supported: a collection of TodoEntity cannot be treated as a collection of ITodoEntity, even though TodoEntity implements ITodoEntity. This posed a bit of a problem because we still wanted the full representation of a given entity graph to be available through our defined interfaces. To get around this, we needed to update the LLBLGEN templates to allow us to inject our own custom collection implementation which would match our interface definitions. I have provided a few example snippets to illustrate what I am talking about. Essentially, we took the generated Todos collection (a property of the TodoList entity), defined as below:
public virtual EntityCollection<TodoEntity> Todos
{
    get
    {
        if(_todos==null)
        {
            _todos = new EntityCollection<TodoEntity>(EntityFactoryCache2.GetEntityFactory(typeof(TodoEntityFactory)));
            _todos.SetContainingEntityInfo(this, "Todolists");
        }
        return _todos;
    }
}
and added the code below to support our interface definition:
public EntityList<ITodoEntity, TodoEntity> TodosCollection
{
    get
    {
        if (_TodosCollection == null)
        {
            _TodosCollection = new EntityList<ITodoEntity, TodoEntity>(this.Todos);
        }
        return _TodosCollection;
    }
}
private EntityList<ITodoEntity, TodoEntity> _TodosCollection;
where “EntityList” is a custom wrapper we wrote to get around the generic variance issue (note that EntityList understands that a TodoEntity is an implementation of ITodoEntity). This allowed us to define our entity contracts as such:
/// <summary>
/// Interface for the entity 'TodoList'.
/// </summary>
public partial interface ITodoListEntity
{
    EntityList<ITodoEntity, TodoEntity> TodosCollection {get;}
    System.Int32 Id {get;set;}
    System.String Title {get;set;}
    System.String Description {get;set;}
    System.Int32 ProjectId {get;set;}
    System.Int16 Position {get;set;}
    System.Boolean Billable {get;set;}
    System.DateTime CreatedOn {get;set;}
    System.DateTime ModifiedOn {get;set;}
    System.String CreatedBy {get;set;}
    System.String ModifiedBy {get;set;}
    System.Guid CreatedUserId {get;set;}
    System.Guid ModifiedUserId {get;set;}
}
As you can see, a standard was implemented where the original list’s name was simply extended with the word “Collection”. After modifying the adapter’s templates and generating our templates to create the entity contracts, everything fell into place and we had our repository pattern implemented. Our repository definitions ensured that only interfaces were passed round (of course implemented using LLBLGEN’s entities) which in turn ensured that our UI (or business logic) knew nothing about the underlying implementation. One benefit of doing this is that the chaps working on the UI never had to deal with the copious number of properties and methods that hang off an LLBLGEN entity by default. Of course, these properties and methods are useful in some cases and can still be used within the repository itself.
I have attached a zip file to this post with the implementation of the EntityList class as well as the templates that we modified and added to make this all happen. Let me know what you think. Any comments or suggestions regarding the implementation are certainly welcome!
Rohland
C# HTML Diff Algorithm
Posted by Rohland in Development on October 31, 2009
I have finally launched my first Codeplex project. Very exciting!
I was inspired by writeboard.com to find some way of implementing an HTML difference viewer in an internal application I was developing. Essentially, I was looking for a way to take two blocks of HTML and compare them in a way that highlights what the differences are. This is extremely useful for CMS type systems where WYSIWYG/Textile/Wiki markup is used to populate content. In most web systems where content is authored dynamically, a history of the content is tracked over time. When collaborating with a few people, this feature is critically important. What makes it extremely useful is the capability to detect what has changed between versions. This post focuses on a project I have launched to do exactly that – track the difference between two versions of HTML markup.
The application I was building was developed on ASP.NET MVC (C#), so naturally I was looking for some C# code I could use to implement the difference algorithm. In searching, I could not find any libraries that were worth using. I did come across one or two command line utilities but nothing spectacular. I widened my search to other languages and came across a neat implementation in Ruby. The algorithm was developed by Nathan Herald, who generously made the code available to everyone under the MIT license.
So, I had the algorithm I was looking for, but I didn’t speak Ruby! This was an excellent opportunity to roll up my sleeves and learn some Ruby so I fired up my browser, downloaded the Windows one-click installer and got a simple environment up and running. After toying with code for a bit, scratching my head at one or two alien Ruby constructs I got the gist of how things worked. I fired up Visual Studio, created a new project and began the process of porting the algorithm. I must admit that the process was relatively painless and I got something working in a few hours. It took about another hour or two to iron out some bugs I picked up but essentially, in a relatively short space of time, I had the C# diff library that I was originally looking for! Below is a demo of how it is used, followed by one or two screenshots demonstrating the functionality when rendered to your browser.
string oldText = @"<p>This is some sample text to demonstrate the capability of the <strong>HTML diff tool</strong>.</p>
<p>It is based on the Ruby implementation found <a href='http://github.com/myobie/htmldiff'>here</a>. Note how the link has no tooltip</p>
<table cellpadding='0' cellspacing='0'>
<tr><td>Some sample text</td><td>Some sample value</td></tr>
<tr><td>Data 1 (this row will be removed)</td><td>Data 2</td></tr>
</table>";
string newText = @"<p>This is some sample text to demonstrate the awesome capabilities of the <strong>HTML diff tool</strong>.</p><br/><br/>Extra spacing here that was not here before.
<p>It is based on the Ruby implementation found <a title='Cool tooltip' href='http://github.com/myobie/htmldiff'>here</a>. Note how the link has a tooltip now and the HTML diff algorithm has preserved formatting.</p>
<table cellpadding='0' cellspacing='0'>
<tr><td>Some sample <strong>bold text</strong></td><td>Some sample value</td></tr>
</table>";
HtmlDiff diffHelper = new HtmlDiff(oldText, newText);
string diffOutput = diffHelper.Build();
Using the sample web application provided with the project in Codeplex, the following is rendered based on the code above:
You can see that the algorithm as originally developed takes care of the nasty HTML parsing to figure out how to highlight the differences. The changes are marked up using “ins” and “del” tags. You can easily style these tags as I have done. The CSS below is responsible for rendering the differences as per the example.
ins {
    background-color: #cfc;
    text-decoration: none;
}
del {
    color: #999;
    background-color: #FEC8C8;
}
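To give a feel for how “ins” and “del” markup is produced, here is a deliberately simplified sketch in Python. It is not the ported algorithm: it compares plain text word by word using the standard difflib module and knows nothing about HTML tags, which is exactly the hard part the library takes care of. The function name is hypothetical.

```python
import difflib
import html


def mark_changes(old_text, new_text):
    """Wrap removed words in <del> and added words in <ins> (plain text only)."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(html.escape(" ".join(old_words[i1:i2])))
            continue
        # A 'replace' is emitted as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            pieces.append("<del>" + html.escape(" ".join(old_words[i1:i2])) + "</del>")
        if tag in ("insert", "replace"):
            pieces.append("<ins>" + html.escape(" ".join(new_words[j1:j2])) + "</ins>")
    return " ".join(pieces)
```

The output of this sketch can be styled with the same CSS as above, since it uses the same two tags.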
I hope you find the library useful. I wish I had more time to add tests and more documentation to the Codeplex project, but for now I think the implementation is reasonably solid and easy to follow. If you spot any bugs, let me know and I’ll try and attend to them. Given that I am not responsible for the original implementation as developed in Ruby, it might be a bit tricky to solve some of the fundamental issues with the algorithm but I will certainly have a crack at it since I have quite a good understanding of how it works after porting it.
Link to C# implementation: http://htmldiff.codeplex.com
Link to Ruby implementation: http://github.com/myobie/htmldiff
Project deadlines and their impact
Posted by Rohland in Development on October 18, 2009
Deadlines – customers love them, developers hate them.
OK, so that was a slight generalisation. However, in my experience customers tend to set fixed deadlines without any significant consultation with their development team. This causes countless issues as expectations are misaligned right from the start. Often in a scoping meeting I hear some variant of the phrase: “We need the project launched by [insert ridiculous date here]”.
Case in point – I had a meeting last week with a prospective new client where we discussed quite an interesting project. When we got down to the specifics, the client specified the deadline for roll-out was early December. Now, given that the project is quite meaty, has no formal specification (no sweat, Agile rocks) and that there would most definitely be a lead time, I knew it would be impossible to get anything decent out by then. So, only one thing to do – attempt to level expectations! In doing so, I could sense the customer’s disappointment fill the room, killing all inspiration and excitement about what we had just discussed. This was followed by the rather predictable question: “So what do you think we can achieve by early December?”.
I hate being in this position. Clearly, I have only had one hour to digest information which the client has had months/years to think about (often with critical domain knowledge). How can I possibly give any kind of accurate indication of what can be achieved? Of course, my experience counts for something and I have some kind of feeling about the project, but I know that if I attempt to answer the question, we will be treading on dangerous territory. For example, if I vocalise my guesses (guesses, not estimates), I know the client may end up treating them as fact and then hold me accountable when, two months down the line, we don’t have the relevant bits complete. If I don’t candidly discuss my guesses, I end up looking like an inexperienced idiot. Conundrum. My only recourse is to somehow convey to the client that, in software development, it is extremely difficult to shoot from the hip when estimating timelines and cost for a project. At this point, a lot of what is being discussed goes over the client’s head (understandably); they don’t understand why it is so difficult for me to give them some kind of indication. Specifically, it perplexes them why I can’t estimate ball park figures for the total project timeline and cost. So, I end up throwing out a few numbers, knowing full well that my guess is subject to the Cone of Uncertainty, something the client may forget about and perhaps blame me for later. Did I mention I hate being in this position?
The point I am trying to make is that deadlines, much like budgets, always seem to be this fixed line in the sand. Cross it, and face the wrath of the client. Never mind that the deadline/budget was a guess to begin with and that the client was warned about it. Deadlines and milestones should have the ability to shift as the project progresses. They really should just resemble some date in the future by which all parties believe everything should be complete. I understand that business needs to set some boundaries. For example, if a new product is being launched, there are a number of projects (software, marketing etc.) that all need to culminate in a launch date. However, too often everything is left to the last minute and pressure is placed on certain parties (us) to deliver. No-one really works well under sustained pressure, so the result is always sub-optimal. In the case of the project discussed with the new client, I knew we could develop something by the beginning of December but that it would end up failing in the operational environment as severe shortcuts would need to be taken. A premature release to users would ultimately lead to a situation where quick fixes/updates would need to be patched onto an already shaky solution as operational issues arise. Suddenly, after a further two to three months, the result of quick fixes and updates to this shaky solution becomes the final solution. Not the most ideal plan.
All in all, I don’t think we can ever get away from deadlines and project constraints, but I do think all parties agreeing to a deadline should view it as a relatively fuzzy date in the future. At the very least, if the date needs to be fixed, consultation with key parties is critical before a deadline is even decided upon. The deadline should be reasonable. It is critical that you not succumb to the pressure of sitting in front of a new customer and force your team into a short sighted decision. It is always best to be candid and constantly remind the customer of the risks involved in planning a software project poorly. Budgets and delivery dates are sensitive topics because they always constrain quality. As a developer, I don’t enjoy delivering solutions to a customer when I know that with a bit more time we could have delivered something spectacular which would ultimately give the overall project a significantly better chance to succeed. We need to educate customers in this regard as many of them have never been responsible for a software project (especially SMEs) and don’t understand that quality is a critical factor in planning a project’s budget and delivery date.
On a lighter note, I’m sure we have all been in a situation similar to the one below. I particularly enjoyed drawing the parallel to the common software development milestone known as the “product shipment date”. Yup, been there.

Blog Migration
Posted by Rohland in Uncategorized on October 11, 2009
In search of a bit more flexibility, I have decided to move my existing blog at dotnet.org.za to www.rohland.co.za. For lack of a better name, I ended up with the obvious pretentious default.
I am very grateful for the service made available via the dotnet.org.za portal. A year ago it allowed me to get up and running with my blog relatively quickly. To the administrators of the portal – thank you!
In terms of moving forward, I was not sure what the correct procedure for moving my existing blog away from dotnet.org.za was, so I ended up implementing a simple script to redirect to my new domain. Hopefully, I will be able to organise a mechanism whereby future posts here reflect in the dotnet.org.za main feed. If not, then I guess that is the penalty for moving my blog.
Hopefully you subscribe to the RSS feed available here, or at the very least, check back periodically for new content. Please let me know if you spot any issues.
If you have landed here after clicking a link related to one of my older blog posts, simply use the search feature available in the side bar to locate the content you were originally hoping to review.
Cheers,
Rohland
