Getting the File Formats
The MS Office file formats (Word, Excel, PowerPoint, Office Binder and Office Drawing) were all
made freely available on the MS MSDN website in 1998. They have since been removed, but
MS made CDs of the website available to developers who registered to receive them. These
CDs are commonly available. The specifications appeared on the July 1998 edition,
CD number 2 of the three-part set. The specs released
were the Office 97 specifications, not those of previous versions. The specs are quite hard to read and
often incomplete. Some fields are wrong, and some information is not fully correct, but there's
nothing better available.
wotsit.org is an archive of file formats. I believe that the Word
6 and some of the Office 97 formats are also available from that site. New file formats are added
regularly, so it's a good site to keep an eye on.
The Wine project aims to implement the Windows API
so that native Windows binaries can run under Linux. Its source can be a great help for some of the
MS file formats, such as the WMF file format. (My libwmf, which is based on it, can be found
here.)
Freely Available Libraries and Products
There is considerable overlap between the free projects, and some of them are based upon others.
The order of the entries reflects their usefulness and active state.
Word
- wv can read Word 2000, 98, 97, 95 and 6 files. It's about the best GPLed, or
similarly licensed, Word reader available, and the only one that handles fastsaved files correctly.
Check the link for the full feature set and more information.
- laola includes a reader called elser,
which is OK considering that it did not use any specifications at all, but it's far from 100%.
- word2x, which is for Word 6 and doesn't do fastsaves.
- catdoc, which is also for Word 6 and doesn't do fastsaves or tables.
Some other word related material
Excel
- gnumeric reads them in pretty well.
- xls2xml
OLE2
- libole2, which the gnumeric people wrote, is about the best one going at the moment. It's available as its own
library from Gnome. It is also available from the Gnome FTP site
under "Unstable Sources".
- cole from the filters project can handle ole2.
- laola is a Perl OLE2 stream access library
- Property Set information
wmf
- libwmf converts WMF files into xfig,
GIF and other formats.
Escher
- The Escher file format (the internal Office Draw format) is being examined by both the gnumeric and
wv projects at the moment. There's a very bad preliminary implementation in wv right now. Hopefully
this will mature to a useful stage in the near future.
Some Commercial Products
I have very little knowledge of how well these work; you'll have to try them for yourself.
- QuickView Plus exists for some Unix platforms, though not Linux
- Sun has something which displays
Word files on screen, though it doesn't print
- Stardivision (now Sun as well) has a word processor that can read most Word documents fairly well
- Corel's word processor for Linux has a very good
converter for Word 6/7/8 built in. It has made a few mistakes in conversion but retains a good bit of formatting.
- Use Wine and the MS 16-bit Word viewer; here's a howto