How I Switched My Blog From Drupal To Jekyll

When I migrated my blog from Drupal to Jekyll, the job went well beyond the simple conversion the Jekyll migration scripts handle. I migrated the comments to Disqus, kept my URL aliases, migrated taxonomy terms, and cleanly handled the images and other media in my posts. Because I migrated so much, Matt Butcher asked me to share what I did. So, here goes.

As a quick note, before I started this migration I made a copy and a backup of my Drupal site. All of the changes I made were to the copy rather than the production site. I do not recommend doing this work directly on a production site.

Some Assumptions

Because I knew my site well, I was able to make some assumptions that simplified the code. These assumptions show up throughout the code, and someone whose site differs might need to do something different.

  • All my blog posts have URL aliases. None of the content should have been linked at a path like /node/123, so I discarded all links of that form.
  • The migration was from Drupal 6 to Jekyll. Migrating from Drupal 7 would have been more difficult due to the field system.
  • I didn't have any CCK fields on my content to bring over, just the body content of the nodes.

Migrating The Comments to Disqus

The first step for me was to migrate the comments to Disqus. There are two ways to do this. I exported the comments from Drupal as an XML file and uploaded it to Disqus. To set this up, I first created a new site within Disqus and installed the Disqus Drupal module and the Disqus Migrate submodule. The version I used was the development snapshot from February 3, 2012.

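To get the proper URLs, I temporarily altered my hosts file to point codeengineered.com to my local instance. That way the absolute links generated during the export used the production domain rather than a local hostname.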

Before installing these modules, I made a few modifications to suit my setup.

The first modification was to the file disqus_migrate.export.inc. These changes make the module export the comments in a form that is easier to use with Jekyll.

The first alteration was at line 335.

// Load up the thread data array for this node
$thread_data[$nid] = array(
  'title' => $node_data->title,
  'link' => url("node/" . $nid, array('absolute' => TRUE)),
  'identifier' => 'node/' . $nid,
  'post_date_gmt' => date("Y-m-d H:i:s", $node_data->created),
  'post_date_gmt_unix' => $node_data->created,
);

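In the url() call, I removed the argument 'alias' => TRUE, which tells Drupal that the path being passed in is already an alias. Without it, Drupal looks up the alias for node/$nid, so the exported link is the aliased URL rather than /node/123.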

The next change I made was at line 256.

$output .= '<item>';
$output .= '<title>' . _disqus_migrate_cleanse_xml($thread['title']) . '</title>';
$output .= '<link>' . $thread['link'] . '</link>';
$output .= '<content:encoded></content:encoded>';
//$output .= '<dsq:identifier>' . $thread['identifier'] . '</dsq:identifier>';
$output .= '<wp:post_date_gmt>' . $thread['post_date_gmt'] . '</wp:post_date_gmt>';
$output .= '<wp:comment_status>open</wp:comment_status>';

The Disqus identifier is useful if the comments stay attached to a Drupal site. Because I was moving to Jekyll, I commented this line out so that Disqus matches each thread by its link (the URL alias) instead.

To help verify that the comments made it over, I also altered disqus.module at line 431. There I commented out the identifier so my local Drupal site would look up each thread by its URL as well.

$settings = array(
  'url' => $options['url'],
  'title' => $options['title'],
  //'identifier' => $options['identifier'],
  'shortname' => $domain,
);

Then I exported the comments to an XML file and uploaded it to Disqus as a generic WXR file. Once the import was complete, the comments showed up on my local Drupal site at the URL aliases.
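Before uploading, a quick sanity check on the export can catch threads whose links were not aliased. The following is a minimal Python sketch and is not part of the Disqus module. It assumes the export is an RSS/WXR-style file whose `<item>` elements carry a plain `<link>`, like the snippet above. The `check_export` name and the sample URLs are hypothetical. The check flags links that still look like /node/123 and any leftover identifier element.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Matches unaliased Drupal paths such as http://example.com/node/123
NODE_PATH = re.compile(r"/node/\d+/?$")


def local_name(tag: str) -> str:
    """Strip ElementTree's '{namespace}' prefix: '{uri}identifier' -> 'identifier'."""
    return tag.rsplit("}", 1)[-1]


def check_export(xml_data: str | bytes) -> list[str]:
    """Return a list of problems found in a WXR-style comment export."""
    problems = []
    root = ET.fromstring(xml_data)
    for item in root.iter("item"):
        title = item.findtext("title") or "(untitled)"
        link = (item.findtext("link") or "").strip()
        if not link:
            problems.append(f"{title}: missing <link>")
        elif NODE_PATH.search(link):
            problems.append(f"{title}: unaliased link {link}")
        # Threads should be matched by URL, so no identifier is expected,
        # whatever namespace prefix it was written with.
        if any(local_name(child.tag) == "identifier" for child in item):
            problems.append(f"{title}: still has an identifier")
    return problems
```

Passing the bytes of the export file (for example, `open("comments.xml", "rb").read()`, where the file name is a placeholder) should return an empty list when every thread points at an alias.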

In my Jekyll template, where I wanted the comments to show up, I added:

If you use this, be sure to update the two variables.

Note: this is embedded as a gist because template variables in an inline snippet would be processed by Jekyll rather than displayed.

Changes To The Content

Before my content was ready to be migrated, I had to make some changes to it. For example, rather than having my images live at /sites/codeengineered.com/files/images, I wanted them to simply live at /media/images. To make changes like this, I ran UPDATE queries that use the REPLACE() string function. For example:

UPDATE `node_revisions` SET `body` = REPLACE(`body`, 'code to find', 'code to replace it with');

Cleaning up the markup was easier to do in SQL before the migration than across the static files afterward.
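To try the same REPLACE approach on a throwaway database first, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a stripped-down `node_revisions` table with only `nid` and `body` columns. The real Drupal 6 table has more columns and lives in MySQL. SQLite's `REPLACE(X, Y, Z)` behaves the same way for this purpose. The helper names and sample rows are made up for illustration.

```python
import sqlite3


def rewrite_paths(conn: sqlite3.Connection, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in node bodies.

    Returns the number of rows that contained `old`.
    """
    cur = conn.execute(
        # Same idea as the MySQL query above; the WHERE clause only
        # skips rows that do not contain the old path at all.
        "UPDATE node_revisions SET body = REPLACE(body, ?, ?) "
        "WHERE instr(body, ?) > 0",
        (old, new, old),
    )
    conn.commit()
    return cur.rowcount


def make_sample_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for node_revisions with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node_revisions (nid INTEGER, body TEXT)")
    conn.executemany(
        "INSERT INTO node_revisions (nid, body) VALUES (?, ?)",
        [
            (1, '<img src="/sites/codeengineered.com/files/images/a.png">'),
            (2, "<p>No images here.</p>"),
        ],
    )
    return conn
```

Calling `rewrite_paths(make_sample_db(), "/sites/codeengineered.com/files/images", "/media/images")` rewrites only the first row. Using `instr()` rather than `LIKE` avoids surprises from `%` or `_` appearing in a path.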

The Migration Script

For the migration script, I started with the directions on the Jekyll migration page and the built-in Drupal migration script. From there, I added permalinks populated from the URL aliases, along with my taxonomy tags. My customizations can be seen in this gist.
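The gist itself is not reproduced here, and Jekyll's own migrators are written in Ruby. Purely as an illustration of the idea, here is a minimal Python sketch. Given a node's title, creation time, body, URL alias, and tag names, it builds a Jekyll post file name and front matter with `permalink` set to the alias. The function name, the `layout: post` value, and the file-naming scheme are assumptions. In Drupal 6, the alias would come from the `dst` column of the `url_alias` table, and the tag names from `term_data` joined through `term_node`. Fetching them is left out.

```python
import json
from datetime import datetime, timezone


def jekyll_post(title: str, created: int, body: str,
                alias: str, tags: list) -> tuple:
    """Return (file name, file contents) for one Drupal node (hypothetical helper).

    `created` is the node's Unix timestamp; `alias` is its URL alias,
    e.g. "blog/my-post"; `tags` are taxonomy term names.
    """
    date = datetime.fromtimestamp(created, tz=timezone.utc)
    # Use the last path segment of the alias as the post slug.
    slug = alias.rstrip("/").rsplit("/", 1)[-1]
    filename = f"{date:%Y-%m-%d}-{slug}.html"
    # json.dumps produces double-quoted strings that YAML also accepts,
    # so titles containing colons or quotes stay valid front matter.
    front_matter = [
        "---",
        "layout: post",
        f"title: {json.dumps(title)}",
        f"date: {date:%Y-%m-%d %H:%M:%S} +0000",
        f"permalink: {json.dumps('/' + alias.lstrip('/'))}",
        f"tags: [{', '.join(json.dumps(t) for t in tags)}]",
        "---",
    ]
    return filename, "\n".join(front_matter) + "\n" + body + "\n"
```

Keeping the permalink identical to the old alias is what lets Disqus, which now matches threads by URL, attach the imported comments to the new static pages.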