The wv library itself

A Word 8 converter for Unix



The general overview of wv can be found on the home page. This section goes into more detail about the file format. Word documents from version 6 upwards are stored in an OLE2 file format wrapper, and the internal Word format is enclosed inside it. Each main format has two subtypes, fast save and full save, otherwise known as the complex and simple formats respectively. The complex format is quite difficult to implement correctly, but I believe that wv has achieved this. Why Microsoft created something as monstrous as the fast-save file format is unknown to me, but it is very awkward to work with.
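As a rough illustration of the OLE2 wrapper mentioned above, the following Python sketch checks whether a file begins with the 8-byte signature that OLE2 compound files start with. The function names are hypothetical and this is not how wv itself is written. The check only detects the wrapper and says nothing about whether the Word data inside is in the simple or the complex format.

```python
# Signature found at the start of an OLE2 compound file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the buffer starts with the 8-byte OLE2 signature."""
    return data[:8] == OLE2_MAGIC


def file_looks_like_ole2(path: str) -> bool:
    """Read the first 8 bytes of a file and test them (hypothetical helper)."""
    with open(path, "rb") as fh:
        return looks_like_ole2(fh.read(8))
```

A file that fails this test is not wrapped in OLE2, so it cannot be a Word 6 or later document in the sense described here. A file that passes may still be some other kind of OLE2 document.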

What do you need

All that is required is the source, but if you want to be able to handle embedded wmf files (which you do), then you need to have the following installed:
  • zlib, to decompress wmf files, which are stored compressed.
  • ImageMagick, to convert bmp files to png so that they can be shown in a browser. A mini implementation is part of wv.
  • libpng. If ImageMagick is not found, wv will attempt to find and use png itself. If ImageMagick is installed, wv will use it instead and hope that it was linked against png. If that turns out to be false, you should reinstall ImageMagick with png support or, failing that, install png and run wv's configure as ./configure --without-Magick
In general, after you install zlib, png and ImageMagick, all you have to do is
  ./configure
  make
The INSTALL file in the distribution has all the building details you need to know.

Charset Conversion

The text of a Word 8 document is often stored in Unicode. wv converts this to utf-8 so that it can be read in Netscape and other modern browsers. In older Word documents (and under certain conditions in Word 8) the text is stored in one of the Windows codepages. By default wv promotes this text to Unicode and converts it to utf-8.
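The two-step conversion described above (promote to Unicode, then encode as utf-8) can be sketched in Python as follows. This uses Python's built-in codecs rather than wv's internal converter. The function name, the is_unicode flag and the default codepage cp1252 are assumptions made for illustration, as is the assumption that the Unicode text is 16-bit little-endian.

```python
def to_utf8(raw: bytes, is_unicode: bool, codepage: str = "cp1252") -> bytes:
    """Promote document text to Unicode, then encode it as utf-8.

    raw        -- text bytes as they might appear in the document
    is_unicode -- True if the bytes are 16-bit little-endian Unicode (assumed)
    codepage   -- Windows codepage to use when is_unicode is False
    """
    if is_unicode:
        text = raw.decode("utf-16-le")   # already Unicode, just decode it
    else:
        text = raw.decode(codepage)      # promote codepage bytes to Unicode
    return text.encode("utf-8")
```

For example, to_utf8(b"\xe9", False) and to_utf8(b"\xe9\x00", True) both yield the two utf-8 bytes for "é", since the same character reaches Unicode by either route before being encoded.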

Some users dislike utf-8 as an output format and wish to convert to a different output format, such as koi8-r or the standard iso-8859-1.

wv contains an internal charset converter which can promote all Windows codepages to Unicode and can convert Unicode to:

  • utf-8
  • iso-8859-15
  • koi8-r
  • tis-620
(read the wvHtml manpage to see if any others have been added)
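To make the list above concrete, here is a minimal Python sketch that encodes Unicode text into one of those four output charsets. It relies on Python's own codecs, not on wv's converter. The names OUTPUT_CHARSETS and encode_for_output are hypothetical, and replacing unrepresentable characters with "?" is an assumption of this sketch, not a statement about what wv does in that case.

```python
# The four output charsets listed above, under the names Python's codecs accept.
OUTPUT_CHARSETS = ("utf-8", "iso-8859-15", "koi8-r", "tis-620")


def encode_for_output(text: str, charset: str) -> bytes:
    """Encode Unicode text into one of the listed output charsets.

    Characters the target charset cannot represent are replaced with "?"
    (an assumption of this sketch).
    """
    if charset not in OUTPUT_CHARSETS:
        raise ValueError("unsupported output charset: " + charset)
    return text.encode(charset, errors="replace")
```

Only utf-8 can represent every Unicode character. With the other three, text outside the target repertoire is lossy: Cyrillic text survives a round trip through koi8-r, for instance, but a euro sign sent to koi8-r comes out as "?".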

wv will always be able to do the above conversions. In addition, if configure finds that your system has an iconv implementation which can do the above conversions, then wv will be able to use all the other conversions that your iconv can handle. In practice this only happens if you have glibc 2.1 or above on your system (Red Hat 6.0 and above). In that case you have a multitude of conversions from Unicode to many other character sets. If you have iconv support, experiment with the iconv program on your system.