Microsoft Office can save documents in XML (WordProcessingML) format, so I’ve written an HTML to WordProcessingML converter for one of our projects at Synop. But while the schema is provided, there’s not much useful documentation, and there are some traps.
In WordProcessingML, lists are generated by applying a w:listPr element to a paragraph. The w:listPr points to what’s called a w:ilfo element, and it is the w:ilfo element which points to the structure that defines the style of the list, the w:listDef element. Think of it as a memory handle (a pointer to a pointer), as it works the same way.
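The handle analogy can be sketched with two plain lookup tables. This is only a simplified model: the ids, the "format" field and the resolve_style helper are made up for illustration and are not part of the WordProcessingML schema.

```python
# Simplified model of the double indirection described above.
# All ids and field names here are illustrative assumptions.
list_defs = {0: {"format": "decimal"}, 1: {"format": "bullet"}}  # stands in for w:listDef
ilfos = {1: 0, 2: 1}  # stands in for w:ilfo: ilfo id -> listDef id


def resolve_style(ilfo_id: int) -> dict:
    """Follow paragraph -> ilfo -> listDef, like dereferencing a handle."""
    return list_defs[ilfos[ilfo_id]]


before = resolve_style(1)["format"]

# Restyle every paragraph that uses ilfo 1 by changing a single pointer.
ilfos[1] = 1
after = resolve_style(1)["format"]
```

Paragraphs only ever store the ilfo id, so none of them need to be touched when the style changes.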
So far so good, as it makes moving styles around fairly easy, by just changing pointer values. You can also restart the numbering of a list inside the w:ilfo, and change the bullet characters from within the w:listPr, but for all intents and purposes the w:listDef is where the action happens.
Now the w:ilfo and w:listDef structures are all kept inside a master w:lists element at the document root, and while the ordering within each kind doesn’t matter, the grouping of like elements does. For example, you can have two w:listDefs followed by two w:ilfos which point to either of the w:listDefs, in any order, but you can’t have a w:listDef followed by a w:ilfo, followed by another w:listDef.
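A quick way to check for this trap before handing a file to Word might look like the sketch below. It uses the element names as this post uses them and leaves out every attribute and child element a real document would carry; is_grouped is a hypothetical helper, not anything Office provides.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML documents.
W = "http://schemas.microsoft.com/office/word/2003/wordml"


def is_grouped(lists_elem: ET.Element) -> bool:
    """Return True if no w:listDef appears after a w:ilfo among the children."""
    seen_ilfo = False
    for child in lists_elem:
        if child.tag == f"{{{W}}}ilfo":
            seen_ilfo = True
        elif child.tag == f"{{{W}}}listDef" and seen_ilfo:
            return False
    return True


# Stripped-down fragments: real elements carry ids and formatting children.
good = ET.fromstring(
    f'<w:lists xmlns:w="{W}"><w:listDef/><w:listDef/><w:ilfo/><w:ilfo/></w:lists>'
)
bad = ET.fromstring(
    f'<w:lists xmlns:w="{W}"><w:listDef/><w:ilfo/><w:listDef/></w:lists>'
)
```

Here is_grouped(good) is true and is_grouped(bad) is false, matching the two cases in the paragraph above.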
This of course flies in the face of having XML based handles in the first place, so my assumption was that this was a bug and not by design. However, upon checking the schema (XSD), there’s an xs:sequence which dictates that listDefs must all be included before any ilfos. So either it is by design, or whoever coded the XSD wasn’t thinking of DOM and XPath access to the data. Not only that, but the schema doesn’t actually validate (in XMLSpy), so in the immortal words of the XML nazis, it’s not technically a schema.
Anyway, aside from having to know how to read an XSD, this isn’t documented anywhere, so a tip for budding WordProcessingML developers: always put w:listDefs together, followed by w:ilfos.
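If a converter emits the two kinds of element interleaved, one option is to regroup them just before writing the file. The following is a minimal sketch under the same assumptions as the rest of this post: element names as used here, a hypothetical group_list_defs_first helper, and a made-up id attribute that exists only so the result can be inspected.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML documents.
W = "http://schemas.microsoft.com/office/word/2003/wordml"


def group_list_defs_first(lists_elem: ET.Element) -> None:
    """Reorder children in place so every w:listDef precedes everything else."""
    children = list(lists_elem)
    # list.sort is stable, so the relative order inside each group is kept.
    children.sort(key=lambda c: c.tag != f"{{{W}}}listDef")
    for child in children:
        lists_elem.remove(child)
    lists_elem.extend(children)


# The "id" attribute is an illustrative label, not a schema attribute.
lists = ET.fromstring(
    f'<w:lists xmlns:w="{W}">'
    '<w:listDef id="a"/><w:ilfo id="x"/><w:listDef id="b"/><w:ilfo id="y"/>'
    "</w:lists>"
)
group_list_defs_first(lists)
order = [child.get("id") for child in lists]
```

Because the references between the elements are by id rather than by position, moving them around like this should not change which style any paragraph ends up with.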