Cleaning Up After Migrating To Drupal

Wed, 2010-02-17 13:35 by greg (Greg Harvey)

We have just finished migrating a client of ours from an old .NET system into Drupal. The last task was to write some Apache mod_rewrite conditions and rules to handle the URLs of their old website. This proved to be more trouble than I expected, mainly because I struggled to find examples of how it might work.

Firstly, the groundwork. The URL pattern to be redirected looked like this:
MainArticle.aspx?m=33818&amid=30301119

Here the amid value is the article ID, which we had carried through to Drupal. We then used the Pathauto module to make sure every article's URL took the form:
story/%amid

So there is now a reliable one-to-one mapping between the old URLs and the new ones, which we can use as the basis for making sure both bots and people are sent to the new content correctly.

Now for the tricky bit. To do the redirecting we decided to use Apache's built-in redirection features. I started out thinking we would use RedirectMatch, but I quickly found this blog post, which was a good starting point: it explains why that would not work, which saved me some time:
http://davidherron.com/content/cure-fail-when-using-redirectmatch-clean-...
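To make the problem concrete, here is a hypothetical RedirectMatch line of the kind that does not work (the pattern is an illustration, not taken from the linked post). mod_alias only ever tests its regex against the URL path, so the query string carrying amid is never seen:

```apache
# Illustration only: this does NOT redirect
#   /MainArticle.aspx?m=33818&amid=30301119
# RedirectMatch compares the regex with the URL path alone
# ("/MainArticle.aspx"); the "?m=...&amid=..." part is not included
# in the string being tested, so "amid=" can never match.
RedirectMatch 301 ^/MainArticle\.aspx.*amid=([0-9]+) /story/$1
```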

It also shows how you can achieve the same thing with RewriteRule instead, issuing a 301 (Moved Permanently) redirect, but it only tells half the story. I thought I'd perfected my rule, and it looked like this:
RewriteRule ^.*amid=([0-9]+).*$ story/$1 [R=301,L]

I tested it here, and it seemed to work:
http://civilolydnad.se/projects/rewriterule/

But no matter where I put it in Drupal's .htaccess file, it did not do a thing!

Eventually I found this comment on Drupal Groups, which finally made the penny drop:
http://groups.drupal.org/node/11361#comment-43246

The rewrite rule tester I had been using has a bug! The *real* mod_rewrite never matches a RewriteRule pattern against the query string, only against the URL path, so even though my rule seemed to work in the tester above, the querystring was simply not available to it. To get at the query string you have to test %{QUERY_STRING} in a RewriteCond instead. Partly this was me not understanding things properly, but once I had that worked out, it was plain sailing.

So here's what I finally ended up with, comments inline:


# Rewrite old-style URLs for .aspx scripts

# First we need to exclude existing files, directories and favicon.ico,
# just like Drupal does, so repeat the core conditions here:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !=/favicon.ico

# Now we need to find the value we're after in the querystring:
RewriteCond %{QUERY_STRING} ^.*amid=([0-9]+).*$

# Finally, catch any request for an aspx script and push it to our alias
# Note the R=301, our permanent redirect
# Also note the final ? on the end, to stop the old querystring being
# passed on again:
RewriteRule ^.*\.aspx.*$ /story/%1? [R=301,L]

This *does* work. It catches requests to any .aspx script whose query string contains an amid value and sends them to the new URL on the same server as a 301 redirect. The %1 in the substitution is the article ID captured by the preceding RewriteCond, which parsed the old amid value out of the query string.
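One refinement worth considering, sketched below under some assumptions: a stock Drupal 6 .htaccess (whose front-controller rule is RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]) and PCRE-style non-capturing groups, which Apache's regex engine supports. The rules need to sit above Drupal's catch-all rule, because once that rule rewrites the request to index.php and [L] ends the pass, the .aspx path is gone. The query-string condition is also tightened so that only a parameter named exactly amid matches. This is a variant of the rules above, not what we actually deployed:

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on

  # ... Drupal's existing rules (RewriteBase, www handling, etc.) ...

  # Old .aspx URLs: must come BEFORE Drupal's catch-all rule below.
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  # Only match "amid" as a whole parameter name (start of string or
  # after "&"), so something like "xamid=5" is not picked up.
  # Non-capturing groups keep the article ID in %1.
  RewriteCond %{QUERY_STRING} (?:^|&)amid=([0-9]+)(?:&|$)
  # The trailing "?" drops the old query string; on Apache 2.4+ the
  # [QSD] flag does the same job.
  RewriteRule \.aspx /story/%1? [R=301,L]

  # Drupal core: rewrite URLs of the form 'x' to 'index.php?q=x'.
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteCond %{REQUEST_URI} !=/favicon.ico
  RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
</IfModule>
```

With this in place, a request for /MainArticle.aspx?m=33818&amid=30301119 should receive a 301 pointing at /story/30301119, with the old m and amid parameters removed.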