Information about the MS Office File Formats

A Word 8 converter

Getting the File Formats

The MS Office file formats (Word, Excel, PowerPoint, Office Binder and Office Drawing) were all made freely available on the MS MSDN website in 1998. They have since been removed, but MS made CDs of the website available to developers who registered to receive them, and these CDs are commonly available. The specifications are on CD number 2 of the three-part July 1998 edition. Only the Office 97 specifications were released, not those for previous versions. The specs are quite hard to read and often incomplete. Some fields are wrong, and some information is not fully correct, but there's nothing better available.

is an archive of file formats. I believe that the Word 6 and some of the Office 97 formats are also available from that site. New file formats are added regularly, so it's a good site to keep an eye on.

The Wine project aims to implement the Windows API so that native Windows binaries can run under Linux. Its source can be a great help with some of the MS file formats, such as the WMF file format. (My libwmf, which is based on it, can be found here.)

Freely Available Libraries and Products

There is considerable overlap between the free projects, and some of them are based upon others. The numbering of the entries reflects their usefulness and how actively they are maintained.


  1. wv can read Word 2000, 98, 97, 95 and 6 files. It's about the best GPLed, or similarly licensed, Word reader available, and the only one that handles fast-saved files correctly. Check the link for the full feature set and more information.
  2. laola includes a Word reader called elser, which is OK considering that it was written without any specifications at all, but it's far from 100%.
  3. word2x, which is for Word 6 and doesn't do fast saves.
  4. catdoc, which doesn't do fast saves or tables; also for Word 6.
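Fast saves come up in nearly every entry above, so here is a minimal sketch of how a reader might tell whether a file was fast-saved. It assumes you already have the raw bytes of the WordDocument stream, and it relies on my reading of the Word 97 header (the FIB): a 16-bit identifier of 0xA5EC at offset 0 and a 16-bit flags word at offset 10, in which bit 0x0004 marks a fast-saved ("complex") file and bits 0x00F0 count the fast saves. The function name `fib_flags` is made up for this example and is not part of wv or any other library listed here. Given that the specs themselves contain errors, treat the offsets as assumptions to check against real files.

```python
import struct

# Assumed values, taken from my reading of the Word 97 FIB layout.
WORD97_IDENT = 0xA5EC    # identifier at the start of a Word 97 FIB
F_COMPLEX = 0x0004       # flag bit: file is in fast-saved ("complex") format
QUICKSAVE_MASK = 0x00F0  # flag bits: number of consecutive fast saves


def fib_flags(word_document_stream: bytes) -> dict:
    """Report fast-save information from the first 12 bytes of the stream."""
    if len(word_document_stream) < 12:
        raise ValueError("stream too short to hold a FIB header")
    # Identifier and version number are the first two little-endian words.
    ident, nfib = struct.unpack_from("<HH", word_document_stream, 0)
    # The flags word is assumed to sit at byte offset 10.
    (flags,) = struct.unpack_from("<H", word_document_stream, 10)
    return {
        "is_word_ident": ident == WORD97_IDENT,
        "nfib": nfib,
        "fast_saved": bool(flags & F_COMPLEX),
        "quick_saves": (flags & QUICKSAVE_MASK) >> 4,
    }
```

A converter that cannot follow the fast-save piece table could use a check like this to warn the user instead of silently producing scrambled text.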

Some other word related material


  1. gnumeric reads Excel files pretty well.
  2. xls2xml


  1. libole2, which the gnumeric people wrote, is about the best OLE2 library going at the moment. It's available as its own library from Gnome, and also from the Gnome FTP site under "Unstable Sources".
  2. cole, from the filters project, can handle OLE2.
  3. laola, a Perl library for OLE2 streams.
  4. Property Set information access library
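Before handing a file to any of the OLE2 libraries above, it can be handy to check that it really is an OLE2 container. The sketch below is not taken from any of those libraries. It only compares the first eight bytes against the well-known OLE2 signature (D0 CF 11 E0 A1 B1 1A E1), and the function names are invented for this example.

```python
# The eight-byte signature found at the start of OLE2 compound files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the buffer starts with the OLE2 signature."""
    return data[:8] == OLE2_MAGIC


def file_looks_like_ole2(path: str) -> bool:
    """Same check, reading only the first eight bytes of a file on disk."""
    with open(path, "rb") as handle:
        return looks_like_ole2(handle.read(8))
```

A matching signature only says the container is OLE2. It does not say which application wrote it, so a Word, Excel or PowerPoint file will all pass.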


  1. libwmf converts WMF files into xfig, GIF and other formats.
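As a small illustration of what a WMF reader has to deal with first, here is a sketch that parses the optional 22-byte "placeable" header many WMF files start with. It is not code from libwmf. The layout used (a 32-bit key of 0x9AC6CDD7, a handle, a four-value bounding box, units per inch, a reserved field and a checksum that is the XOR of the first ten 16-bit words) is the commonly documented one, and the function name is made up for this example.

```python
import struct

PLACEABLE_KEY = 0x9AC6CDD7  # magic number that opens a placeable header
HEADER_SIZE = 22            # size of the placeable header in bytes


def parse_placeable_header(data: bytes):
    """Return header fields as a dict, or None if no placeable header."""
    if len(data) < HEADER_SIZE:
        return None
    (key, _handle, left, top, right, bottom,
     inch, _reserved, checksum) = struct.unpack_from("<IHhhhhHIH", data, 0)
    if key != PLACEABLE_KEY:
        return None
    # The checksum is the XOR of the first ten little-endian 16-bit words.
    expected = 0
    for word in struct.unpack_from("<10H", data, 0):
        expected ^= word
    return {
        "bbox": (left, top, right, bottom),
        "units_per_inch": inch,
        "checksum_ok": checksum == expected,
    }
```

Files without this header start directly with the standard metafile header, in which case the function returns None and the caller has to get the picture size some other way.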


  1. The Escher (internal Office Draw format) file format is being examined by both the gnumeric and wv projects at the moment. There's a very bad preliminary implementation in wv. Hopefully this will mature to a useful stage in the near future.

Some Commercial Products

I have very little knowledge of how well these work; you'll have to try them for yourself.
  • QuickView Plus exists for some Unix platforms, though not Linux.
  • Sun has something which displays Word files on screen, though it doesn't print.
  • Stardivision (now part of Sun as well) has a word processor that can read most Word documents fairly well.
  • Corel's word processor for Linux has a very good converter for Word 6/7/8 built in. It has made a few mistakes in conversion but retains a good bit of formatting.
  • Use Wine and the MS 16-bit Word viewer; here's a howto.