Indexes, tables of contents, page numbers and other dynamically constructed elements in a Word document are driven by field codes. A table of contents built from user-defined headings is typically { TOC \h \z \t “My heading style” }, and you can see the code if you right-click the table of contents and select “Toggle Field Codes” from the menu.
When you generate one of these structures, a rendered version of the field code is embedded in the document. This is what is viewed and printed, but it is not the master definition of the element; the field code is. Today I was generating a table of contents for a dynamically constructed WordprocessingML document, and found out how problematic this can be.
Generating a table of contents involves a lot of logic. First you need a list of all the titles you’re going to use, then the page numbers they’re on. But to get the correct page numbers, you need to repaginate the document with the completed table of contents included. So you grab the titles, render the list with unspecified page numbers (which are themselves stored as field codes), repaginate, then go back and render the page number field codes, embedding the non-master rendered page numbers alongside their field codes. Confusing?
If you use the \h TOC switch, you also need to generate hyperlinks from the titles in the TOC to the titles in the document, so that ctrl-clicking a TOC title jumps to that page. This involves generating infrastructure bookmarks throughout the document and referencing them from the TOC. It doesn’t help that the bookmarks aren’t actually part of the Word schema but of the amlcore schema, and are thus in a different namespace. The page numbers, which are in fact rendered field codes themselves, also need to be hyperlinked in the same way.
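To make the bookmark plumbing concrete, here is a minimal sketch in Python of the part that can be done without Word: wrapping each heading in a bookmark and emitting a TOC line that links to it. The element and attribute names (aml:annotation, w:hlink, w:bookmark) are as I understand the Word 2003 schemas. The _Toc0000 naming scheme, the helper functions and the "?" page placeholder are hypothetical, and the real page number would be a nested field code rather than plain text.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace, plus the separate AML namespace
# that bookmarks live in.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
AML = "http://schemas.microsoft.com/aml/2001/core"
ET.register_namespace("w", W)
ET.register_namespace("aml", AML)


def w(tag):
    """Qualified name in the Word namespace."""
    return f"{{{W}}}{tag}"


def aml(tag):
    """Qualified name in the AML namespace."""
    return f"{{{AML}}}{tag}"


def bookmarked_heading(text, name, bookmark_id):
    """Heading paragraph whose text sits between a bookmark start/end pair."""
    p = ET.Element(w("p"))
    start = ET.SubElement(p, aml("annotation"))
    start.set(aml("id"), str(bookmark_id))
    start.set(w("type"), "Word.Bookmark.Start")
    start.set(w("name"), name)
    run = ET.SubElement(p, w("r"))
    ET.SubElement(run, w("t")).text = text
    end = ET.SubElement(p, aml("annotation"))
    end.set(aml("id"), str(bookmark_id))
    end.set(w("type"), "Word.Bookmark.End")
    return p


def toc_entry(text, name, page="?"):
    """TOC line: a hyperlink to the bookmark, with a placeholder page number."""
    p = ET.Element(w("p"))
    link = ET.SubElement(p, w("hlink"))
    link.set(w("bookmark"), name)
    title_run = ET.SubElement(link, w("r"))
    ET.SubElement(title_run, w("t")).text = text
    page_run = ET.SubElement(link, w("r"))
    ET.SubElement(page_run, w("t")).text = page  # unknown without repagination
    return p


def build(titles):
    """Return (toc_entries, headings) sharing one bookmark name per title."""
    entries, headings = [], []
    for i, title in enumerate(titles):
        name = f"_Toc{i:04d}"  # hypothetical naming scheme
        headings.append(bookmarked_heading(title, name, i))
        entries.append(toc_entry(title, name))
    return entries, headings
```

Calling build(["Intro", "Details"]) gives two TOC paragraphs and two heading paragraphs that agree on their bookmark names. What it cannot give you is the page number, which is exactly the part that needs the rendering engine.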
To generate this mess of rendered field codes inside rendered field codes, you need access to the Word rendering engine so that you can calculate page numbers. If all you have available is XML and XSLT, that poses an interesting problem. How do you generate a Word table of contents from scratch, without invoking Word? (And without rewriting Word!)
At first you’d think that by leaving out the rendered version of the field code, Word might think to regenerate it. But due to what I’m guessing is a complex legacy issue, this isn’t the case. Leave out the rendered TOC, and you haven’t got a TOC. A simple dirty flag on the field code would probably solve the problem, triggering Word’s field code rendering, but Word currently doesn’t have anything like that for field codes. Inline styles sure, but not for field codes.
There are actually two kinds of field codes in Word. The first, unsurprisingly, is called a simple field code. The schema describes them as “run-time calculated entities in Word (for example, page numbers)”, and they look like this:
<w:fldSimple w:instr='TOC \z \t "Item title,1"'/>
While they embed the field code quite nicely, they still don’t get rendered (calculated) at run-time unless you manually tell them to. Arguably not exactly run-time.
The other kind, not surprisingly, is a complex field code, although surprisingly the schema doesn’t call it that; it is just a field code. Complex field codes use what are called fldChar markers to mark up sections of a document, however large, as field codes and their rendered views. Not much else to add here, other than that no, you can’t auto-render those when the document opens either.
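For reference, here is a rough sketch in Python of what a complex field looks like when you build it yourself: a begin marker, the instruction text, a separate marker, the rendered view, and an end marker. The fldChar, fldCharType and instrText names are as I understand the Word 2003 schema. The helper functions, the instruction string with its backslash switches, and the placeholder text are my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Word 2003 WordprocessingML namespace.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W)


def w(tag):
    """Qualified name in the Word namespace."""
    return f"{{{W}}}{tag}"


def fld_char(parent, kind):
    """Append a run holding a single fldChar marker of the given type."""
    run = ET.SubElement(parent, w("r"))
    ET.SubElement(run, w("fldChar")).set(w("fldCharType"), kind)


def complex_field(instruction, rendered_text):
    """Paragraph laid out as: begin, instruction, separate, rendered view, end."""
    p = ET.Element(w("p"))
    fld_char(p, "begin")
    instr_run = ET.SubElement(p, w("r"))
    ET.SubElement(instr_run, w("instrText")).text = f" {instruction} "
    fld_char(p, "separate")
    view_run = ET.SubElement(p, w("r"))
    ET.SubElement(view_run, w("t")).text = rendered_text
    fld_char(p, "end")
    return p


# Raw string so the backslash switches survive intact.
field = complex_field(r'TOC \z \t "Item title,1"', "Table of contents placeholder")
print(ET.tostring(field, encoding="unicode"))
```

The placeholder text only stands in for the rendered view between the separate and end markers. Word would still need to be told manually to update the field before a real table of contents appears.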
So how do you automatically generate a table of contents when you open a Word document you’ve generated outside of Word? Absolutely no idea. But if I’m right, it’s yet another example of Microsoft simply jumping on the XML bandwagon, and just exporting the underlying Word structures as XML, instead of carefully thinking about why developers might actually want to do this.