Upgrading JIRA when the database encoding is wrong

I recently upgraded our instance of JIRA from a very old 3.x release to a newish 4.1. We migrated to new hardware, a new operating system, a new database type and a new database encoding at the same time. What made things interesting was that the original database wasn't UTF-8 (it was Cp1252), so when we imported the JIRA export into the new system, the import crashed spectacularly.
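To see why the import falls over, compare how the same text is stored in each encoding. The Python snippet below is only an illustration: the sample string and the helper name first_invalid_utf8_offset are made up, not taken from our export. Characters like ß and curly quotes are single bytes in Cp1252, and those bytes are not valid UTF-8.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset where UTF-8 decoding fails, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical sample text: a sharp s plus curly quotes, all of which exist in Cp1252.
text = "Straße \u201cquoted\u201d"

cp1252_bytes = text.encode("cp1252")  # one byte per character, as the old database stored it
utf8_bytes = text.encode("utf-8")     # multi-byte sequences, as the new system expects

print(cp1252_bytes)  # b'Stra\xdfe \x93quoted\x94'
print(utf8_bytes)    # b'Stra\xc3\x9fe \xe2\x80\x9cquoted\xe2\x80\x9d'

# 0xDF announces a two-byte UTF-8 sequence, but the next byte ('e') is not a continuation byte.
print(first_invalid_utf8_offset(cp1252_bytes))  # 4
print(first_invalid_utf8_offset(utf8_bytes))    # None
```

A UTF-8 reader stops at byte 0xDF (the Cp1252 ß), which is the kind of error a UTF-8 importer runs into when it is fed Cp1252 data.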

I tried to clean up some of the bad characters I found in the issues, but (un)fortunately JIRA keeps a copy of every change. So if an issue description contains bad characters and you edit and save it, the old version is tucked away safely in the issue's history, where it can't be edited. I was doomed, and had to fix up the export XML itself before importing it into the new system.

Atlassian actually provide a tool to do this, but it didn't work for us. I ended up writing a Groovy script to do the job (see below), and I've listed the steps I followed in case anyone else runs into the same problem. The script isn't quite complete, nor is it the best code in the world: as you'll see, I still had to find and replace every sharp s (ß) by hand. Anyway, this process does actually work, so good luck.
Here’s the Groovy script: fixXml.gr
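For anyone who would rather not run Groovy, here's a rough Python sketch of the same kind of clean-up. It is not a translation of fixXml.gr: the approach (keep bytes that are valid UTF-8, reinterpret invalid bytes as Cp1252, then drop characters XML 1.0 forbids) is an assumption about what such a fix needs, and the names repair, cp1252-fallback and fix_xml_sketch.py are made up for this example.

```python
import codecs
import re
import sys

# XML 1.0 "Char" production: anything outside these ranges makes the document ill-formed.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def _cp1252_fallback(err):
    """Decode error handler: reinterpret the offending bytes as Cp1252, resume after them."""
    bad = err.object[err.start:err.end]
    # errors="replace" turns the five bytes Cp1252 leaves undefined into U+FFFD
    return bad.decode("cp1252", errors="replace"), err.end


codecs.register_error("cp1252-fallback", _cp1252_fallback)


def repair(data: bytes) -> str:
    """Decode mixed UTF-8/Cp1252 bytes, then strip characters XML 1.0 does not allow."""
    text = data.decode("utf-8", errors="cp1252-fallback")
    return _ILLEGAL_XML_CHARS.sub("", text)


def main(src: str, dst: str) -> None:
    with open(src, "rb") as f:
        fixed = repair(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.write(fixed)


# Usage (hypothetical file name): python fix_xml_sketch.py export.xml export-fixed.xml
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

One limitation worth knowing: a Cp1252 byte pair can occasionally also be valid UTF-8 and slip through unchanged. For example, ß (0xDF) followed by a closing curly quote (0x94) decodes as U+07D4, so a manual check of the output is still worthwhile.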

And the steps:

  1. Unzip the JIRA export
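  2. Manually search for and replace every sharp s (ß)
  3. Run the Groovy script: groovy fixXml export.xml export-fixed.xml
  4. Verify the result with xmllint: xmllint --noout export-fixed.xml (--noout stops xmllint from echoing the whole file, so only errors are printed)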

You’ll need to have Groovy installed.
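If xmllint isn't to hand for step 4, Python's standard library can do a similar well-formedness check. This is a rough stand-in rather than a replacement: it checks well-formedness only (no DTD or schema validation), and check_well_formed is a made-up helper name. It uses a SAX parser, so the export is streamed rather than loaded into memory as a tree.

```python
import sys
import xml.sax


def check_well_formed(source):
    """Return None if source parses as well-formed XML, else (line, column, message).

    source can be a file path or a binary file object.
    """
    try:
        # A plain ContentHandler ignores every event; the default error handler
        # raises SAXParseException on the first fatal (well-formedness) error.
        xml.sax.parse(source, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as err:
        return err.getLineNumber(), err.getColumnNumber(), err.getMessage()
    return None


# Usage (hypothetical file name): python check_xml_sketch.py export-fixed.xml
if __name__ == "__main__" and len(sys.argv) == 2:
    problem = check_well_formed(sys.argv[1])
    if problem is None:
        print("well-formed")
    else:
        line, column, message = problem
        print(f"{sys.argv[1]}:{line}:{column}: {message}")
        sys.exit(1)
```

The line and column point at the first offending character, which makes it easy to jump to the spot in an editor and see what the fix-up script missed.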

