It’s inevitable. If you are a web developer, at some point you are going to be stuck with the problem of migrating a system between hosting providers. At that point you will discover (if you haven’t already) that it’s a bit of a pain in the ass. Not just because you have to move stuff, but because of DNS management and the infamous propagation lag.
I’ve given this quite a bit of thought lately as we were in the process of migrating many of our systems to a new hosting provider. There are two main areas you need to focus on in minimising downtime for your clients:
- Time required to take the site down, backup, move and restore everything to the new server/s.
- The DNS record update.
The second item on this list is the major issue. Because DNS records are cached, you have little control over how long your update takes to propagate to your clients. Reducing the TTL is not a foolproof solution, since many ISPs won’t obey unusually low TTL settings. People have come up with various strategies for dealing with the issue, such as implementing a temporary sub-domain while the DNS record update slowly propagates around the globe. Either of these options may be your best bet in many circumstances. However, if you are making use of a load balancer or proxy (such as Nginx), migrating your site is less of an issue.
Before migrating to our new service provider, we were not making use of a load balancer such as Nginx or HAProxy. However, we had one set up at our new hosting centre (we went with Nginx) before we began the migration. Whilst the rationale for implementing Nginx was mostly the need to create a resilient hosting infrastructure for our clients, we also gained more control over where our content is served from.
For those of you not familiar with the concept of a load balancer in the web context, it simply allows you to proxy requests through one endpoint to a dynamic number of back-end web servers. In the Nginx context, the back-end servers are configured as upstream servers. When an HTTP request is received, Nginx forwards the request to one of the back-end web servers. By default, successive requests are distributed using a round-robin algorithm so that the load is shared fairly. You can control the weighting of each upstream server to deal with a mismatch in server capabilities (think of three web servers where one is far more capable than the others due to better hardware). While Nginx handles balancing sufficiently in our case, HAProxy has far more capabilities in this area.
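To make the weighting idea concrete, here is a minimal Python sketch of weighted round-robin selection. It is only an illustration of the concept, not how Nginx implements it internally; in Nginx you would express the same thing with the `weight` parameter on a `server` line inside an `upstream` block. The server names and weights are made up.

```python
from itertools import cycle, islice


def weighted_round_robin(servers):
    """Return an endless iterator of server names, in proportion to weight.

    `servers` maps a server name to a positive integer weight.
    """
    # Expand each server into `weight` slots, then loop over the slots forever.
    slots = [name for name, weight in servers.items() for _ in range(weight)]
    return cycle(slots)


# Hypothetical pool: web1 has better hardware, so it gets twice the share.
pool = {"web1": 2, "web2": 1, "web3": 1}

# Take the first eight picks to see how requests would be spread.
first_eight = list(islice(weighted_round_robin(pool), 8))
print(first_eight)
```

Out of every four requests in this sketch, two go to `web1` and one each to `web2` and `web3`.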
Back to the DNS update issue (if the solution hasn’t become apparent yet, it will now). In our new environment, we configured our Nginx load balancer to point to all our existing web servers (i.e. not our new web servers). Once this configuration was complete, we updated all of our DNS settings and then waited a few days. As you can imagine, over the course of time, each site began resolving to our new hosting provider, but users experienced no downtime since Nginx was simply proxying those requests to the original servers. This meant that clients who had not yet received the updated DNS record were hitting the original servers directly, while clients that had received the update were also hitting them, but through Nginx at our new hosting provider. Once we were comfortable that everyone was reaching the various sites through our new setup, we simply took everything offline and performed the real migration (i.e. physically moving the databases etc. across).
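One way to get comfortable that everyone is coming through the new setup is to look at the access log on the old server and count requests that did not arrive from the proxy. The sketch below is an assumption on my part rather than a description of what we actually ran. It assumes the first field of each log line is the connecting client’s IP address (as in the common and combined log formats) and that the old server is not rewriting that address from a forwarded header. The IP addresses and log lines are made-up sample data.

```python
def count_direct_hits(log_lines, proxy_ip):
    """Count requests on the old server that did not arrive via the proxy."""
    direct = 0
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        client_ip = fields[0]  # assumed: first field is the connecting IP
        if client_ip != proxy_ip:
            direct += 1
    return direct


# Made-up sample: 203.0.113.10 stands in for the Nginx server at the new host.
sample_log = [
    '203.0.113.10 - - [01/Jan/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '198.51.100.7 - - [01/Jan/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.10 - - [01/Jan/2024:10:00:02 +0000] "GET /about HTTP/1.1" 200 734',
]

direct_hits = count_direct_hits(sample_log, proxy_ip="203.0.113.10")
print(direct_hits)  # 1
```

When the count of direct hits stays at or near zero over a reasonable window, most clients have picked up the new DNS record.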
With this approach we minimised downtime for our clients. The 2-3 hours for the physical migration of databases and systems was acceptable to our clients; 2-3 days for DNS resolution was not. This solution avoided days of potential downtime and worked well for us. The only real downsides to the approach were that our sites operated marginally slower (due to added latency between the new and old hosts) and that we had to pay for bandwidth twice during the transition period. Both of these issues were tolerable given the alternatives.
That’s it basically. I have documented the steps below for anyone who’s interested.
- A. Existing web server (DNS currently points here)
- B. New server (the Nginx server, which proxies requests to C, D and E, the new web servers).
Step 1. Update the Nginx configuration (Server B) in the new setup so that the upstream servers are set to A (not the new web servers C, D and E).
Step 2. Update the DNS record to point to B.
Step 3. Wait 2-3 days while the DNS update propagates and clients start resolving to B.
Step 4. Take your app offline and install the relevant systems on C, D and E (presumably there is also a DB server).
Step 5. Update the Nginx configuration so that the upstream servers are set back to C, D and E.
Step 6. Bring the application back online.
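The only thing that changes on B between Step 1 and Step 5 is the list of upstream servers. As a rough sketch, the small Python helper below renders the Nginx `upstream` block for each phase. The upstream name `app_backend` and all of the addresses are placeholders I have made up for illustration; substitute your own.

```python
def render_upstream(name, servers):
    """Return an Nginx `upstream` block listing the given back-end addresses."""
    lines = [f"upstream {name} {{"]
    lines += [f"    server {address};" for address in servers]
    lines.append("}")
    return "\n".join(lines)


# Placeholder addresses, not real hosts.
OLD_SERVER_A = ["198.51.100.20:80"]
NEW_SERVERS_CDE = ["10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"]

# Step 1: while DNS propagates, B forwards everything to the old server A.
phase_one = render_upstream("app_backend", OLD_SERVER_A)

# Step 5: after the migration, B forwards to the new servers C, D and E.
phase_two = render_upstream("app_backend", NEW_SERVERS_CDE)

print(phase_one)
print(phase_two)
```

The `server` block on B would then reference the upstream with `proxy_pass http://app_backend;`, and Nginx needs a reload (for example `nginx -s reload`) to pick up the change.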
1. Parts of Step 4 could be achieved before you even start the process. With the exception of the database, the rest of the application could be deployed and tested in the new environment to minimise downtime (only the database needs to be backed up and restored in the final step).
2. As described earlier, during the two to three days while your DNS update is propagating, all requests to B will be proxied to A, which means the latency between servers B and A will be added onto every request. This shouldn’t be a big problem, depending on what the latency between your hosts is.
3. If you make use of SSL, you need to set up your SSL certificate on the Nginx machine before you switch your DNS settings. I’ll probably need to cover this procedure in a separate blog post.
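If you want a rough idea of the extra latency between B and A mentioned in the second note before you commit, you could time a few TCP connections from the new host to the old one. The sketch below is my own suggestion, not part of the process we followed. The function name is hypothetical, and the TCP connect time only approximates one network round trip, so treat the number as a ballpark figure.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=3.0):
    """Return the median time in milliseconds to open a TCP connection."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Example call, run from server B against server A (hostname is a placeholder):
# print(tcp_connect_latency_ms("old-server.example.com", 80))
```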