‘The Truth Is Out There’: Convert old docs to ‘x’ files format

As consultants, we haven’t done our clients any favors by not forcing the migration from Microsoft Office’s old “.doc” (Word) and “.xls” (Excel) files to the newer “.docx” and “.xlsx” formats. We’ve let clients move from doc to docx through the default settings used when creating a new file, but this hasn’t produced as fast an upgrade cycle as we expected. Instead, we find that many documents are simply edited over time or used as templates to create new files, and when saved, these new files remain in the original doc format. So here we are, 10 years after the introduction of docx and xlsx, and we’re still in the migration phase. Ten years!

Why force this migration?

The “x” at the end of the newer file types tells you that these files are built on XML. XML is an open technical standard that many applications recognize. This means your files are more likely to open successfully regardless of what software the recipient is using.

These x files are actually zip archives, and you can open them with an unzipping application (you may need to rename the file so it ends in .zip first). Once you do, you’ll see a bunch of small files. The text of your document is stored in document.xml, found within the word folder in my example. If you double-click it, you can view the document’s XML in any browser.

[Image: the many files inside a docx file]

[Image: the text part of the document shown as XML]
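The same look inside can be done with a few lines of script instead of an unzipping application. This is a minimal sketch using only Python’s standard library. The function names are made up for illustration, and it assumes a simple document (text inside tables or text boxes may come out duplicated or in an unexpected order).

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace Word uses for paragraph (w:p) and text (w:t) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def list_parts(docx_path):
    """Return the names of the small files stored inside the docx."""
    with zipfile.ZipFile(docx_path) as package:
        return package.namelist()


def read_document_text(docx_path):
    """Return the document text as a list with one string per paragraph."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t elements.
        pieces = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return paragraphs
```

Calling list_parts on a docx file returns entries such as word/document.xml, and read_document_text returns the text itself, which shows that the content is readable without Word installed.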

When your files are scanned for viruses, they are less likely to become corrupt. This is because the data parts of the file are stored separately from the text part. This separation also means that the files are more recoverable should corruption occur. As long as the text part of the document is intact, Office will rebuild the missing parts on its own as part of the document recovery process.

The new file formats also result in much smaller files. Given the explosion of data, minimizing file size is very important, and now that data storage is moving to the cloud, a smaller file size means faster access, too. These files can be one-half to one-quarter the size of the old files. In my case, the example document is a single page of plain text, so the savings are smaller but still significant.

[Image: the smaller file size of the docx version]

More privacy, too

The x file format brings a new privacy feature with it, too. With a docx file you can remove personal information from the document using the Inspect Document feature in any of the Office applications. You’ll find this feature on the File tab when your document is open.

[Image: the Document Inspector]

Continuing to use the same example file, I see in the properties of the file (also available on the File tab) that there is some personal information attached to this file.

[Image: the properties of a docx file]

This file has been around for a while, and I have no idea who Ron Sugiyama is and certainly don’t want him listed as the author of the document. So I’m going to use the Inspect Document feature to remove the authorship.

[Image: finalizing the removal of personal information from a docx file]

When I select the Document Inspector feature, I’m presented with a list of things that it can check for. You can leave them all checked. It doesn’t take long and, as we see below, when it’s finished you’ll get a chance to decide which of the items it found should be removed. Press the Remove All button next to each item that you want removed from the document.

[Image: selecting the types of personal information to check for in a docx file]

The actions you select take place the next time the document is saved.
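Because the author details are stored in one of the small files inside the docx, it is also possible to check which documents still carry a name before deciding where to run the Document Inspector. This is a minimal read-only sketch. The function name is made up for illustration, and it assumes the properties sit in docProps/core.xml, which is where Word normally writes them.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used for the author fields in docProps/core.xml.
DC_NS = "{http://purl.org/dc/elements/1.1/}"
CP_NS = "{http://schemas.openxmlformats.org/package/2006/metadata/core-properties}"


def read_author_info(docx_path):
    """Return the author and last-modified-by names stored in the file.

    A value is None when the field (or the whole properties part) is
    missing or empty. The document itself is never changed.
    """
    with zipfile.ZipFile(docx_path) as package:
        if "docProps/core.xml" not in package.namelist():
            return {"creator": None, "last_modified_by": None}
        root = ET.fromstring(package.read("docProps/core.xml"))
    creator = root.find(DC_NS + "creator")
    modifier = root.find(CP_NS + "lastModifiedBy")
    return {
        "creator": creator.text if creator is not None else None,
        "last_modified_by": modifier.text if modifier is not None else None,
    }
```

This only reports what is there. Removing the information is still best left to the Document Inspector, which knows about the other places personal data can hide.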

Easier to automate admin tasks

In my research for this article, I found this example from Microsoft.

The logo for a consulting company changed to reflect its new mission. The IT department is given the task of changing the logo in the thousands of documents currently stored on a server. In previous versions of Microsoft Office, it was necessary to either open each document individually and delete the old logo and paste the new logo, or to create and test a complex custom application to automate the task. With the new file format, the IT department creates a batch process that navigates the file structure to locate the graphic in the media folder (which is the same for each document) and swaps out the new graphic. Now when the document opens, the new logo automatically displays.

It’s a pretty powerful example of how breaking a file down into component parts can help any manager responsible for maintaining consistency across files. If the logo example doesn’t speak to you, how about the need to change the content of a footer? Copyright information, disclaimers … these things do get updated, and wouldn’t it be fantastic if they remained consistent across the whole company?
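The logo swap in Microsoft’s example can be sketched in a few lines. This is an illustration of the idea, not Microsoft’s batch process: the function name is made up, and the part name you pass in (for instance word/media/image1.png) is an assumption you would need to confirm by looking inside one of your own documents. The script writes a new copy rather than touching the original, so the input and output paths must be different.

```python
import zipfile


def replace_part(src_path, dst_path, part_name, new_bytes):
    """Copy a docx to dst_path, swapping one internal file for new_bytes.

    Returns True if part_name was found and replaced, False otherwise.
    src_path and dst_path must be different files.
    """
    replaced = False
    with zipfile.ZipFile(src_path) as source, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as target:
        # Copy every part in its original order, changing only the match.
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == part_name:
                data = new_bytes
                replaced = True
            target.writestr(item, data)
    return replaced
```

Looping this over a folder of documents gives the batch process from the example. The new image should be the same format and roughly the same dimensions as the old one, since the rest of the document still refers to it by the original name.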

So for all of these reasons we need to finally complete this migration. Again, it’s been 10 years! There’s no good reason to wait any longer.

Time for a mass conversion

Once you’ve decided to take on this task, you’ll realize that you need an automated method to make it happen. The time-consuming option of opening each and every office file and saving it in the new format is just not feasible. Your first step will probably be to search for a tool, at which point you will encounter a lot of free online conversion tools. I wouldn’t trust a single one of them. Why? Let’s see, you upload your files to a stranger who opens them, then saves them to the new file type for free. Do you think that they didn’t also collect the data that you sent to them? Free doesn’t pay their bills; your data does. So skip the online free conversion search.

Instead, there’s an old MSDN blog post from Eric White that you’ll want to read. It’s from 2008 because this is a 10-year-old problem, so the solutions to it tend to be old as well. In summary, you download two Office tools from Microsoft, edit an .ini file, and let the conversion run. You then repeat this for each folder. It’ll still be a bit of a process to get through, but it can be done.
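Since the conversion is repeated folder by folder, it helps to know up front which folders hold old files and how many. This is a minimal inventory sketch and does not convert anything. The function name is made up for illustration, and including .ppt alongside .doc and .xls is an assumption about which old formats you care about.

```python
from collections import Counter
from pathlib import Path

# Old Office extensions to look for (adjust to suit; .ppt is an assumption).
LEGACY_SUFFIXES = {".doc", ".xls", ".ppt"}


def count_legacy_files(root):
    """Return a Counter mapping each folder under root to the number of
    old-format Office files it contains. Nothing is modified."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        # Compare in lower case so REPORT.DOC is caught as well.
        if path.is_file() and path.suffix.lower() in LEGACY_SUFFIXES:
            counts[path.parent] += 1
    return counts
```

Sorting the result by count, for example with counts.most_common(), shows which folders to tackle first.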

Another choice is available from Microsoft MVP Graham Mayor. He’s written an add-in for Word 2016 that will help you bulk convert your Word files only. You’ll still need to use the Eric White MSDN blog for the rest of the file types.

Any way you cut this, it is going to take you a while to rid your network of old file types. But certainly it’s going to be quicker than waiting for your users to do the conversion for you. It’s been 10 years and those pesky old files are still around. This type of thing makes a great spare time task or a good project for your next intern.
