Web indexing: extending the functionality of HTML Indexer

HTML Indexer (www.html-indexer.com) is the only commercial stand-alone indexing tool that is designed solely for the indexing of web sites. For a review of the software, see Heather Hedden's article in the previous issue of The Indexer (www.hedden-information.com/Indexer_Apr_06_Hedden.pdf).

This article arises from issues that we had to overcome at TechScribe when creating the index on www.techscribe.co.uk/techw/a-z-index.htm. It shows how to extend the functionality of HTML Indexer by including special codes in the entries, then post-processing the generated HTML to obtain final HTML. (To avoid long-winded sentences, we use the term generated HTML to mean the output from HTML Indexer and the term final HTML to mean the HTML code that is used in the index page itself.)

Design goals for the index

The design goals for the index are:

Figure 1 shows an example from the screen version of the finished index.

Figure 1. Part of the finished index

Limitations of HTML Indexer

The specific limitations of HTML Indexer with respect to the design goals are:

The next section shows how to resolve these limitations.

Solutions

HTML Indexer generates HTML code that is consistent (unlike some help authoring tools). That means it is straightforward to manipulate the generated HTML programmatically.

There are three basic approaches:

Figure 2 shows the index entries in HTML Indexer, and Figure 3 shows the generated HTML in a web browser.

Figure 2. Entries in HTML Indexer

Figure 3. Part of the index from the generated HTML

To create the icons in the final index (Figure 1), the macro identifies the file extension, and then creates the relevant HTML code automatically. (Originally, we used codes in the index entries. For example, +p was converted into code that displays an image representing a PDF file.)
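The icon step can be sketched in Python rather than as a Word macro. This is a minimal illustration, not the macro used at TechScribe: the icon file names, the alt text, and the assumption that href is the first attribute of each link in the generated HTML are all hypothetical.

```python
import re

# Hypothetical mapping from file extension to (icon file, alt text).
ICONS = {
    ".pdf": ("pdf-icon.gif", "PDF file"),
    ".doc": ("word-icon.gif", "Word file"),
}

# Assumes each link looks like <a href="...">text</a>, with href first.
LINK = re.compile(r'<a\s+href="([^"]+)"[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL)


def add_icons(generated_html: str) -> str:
    """Append an icon after each link whose target has a known extension."""

    def replace(match: re.Match) -> str:
        href = match.group(1)
        # Ignore any fragment or query string when testing the extension.
        path = re.split(r"[#?]", href, maxsplit=1)[0].lower()
        for extension, (src, alt) in ICONS.items():
            if path.endswith(extension):
                return f'{match.group(0)} <img src="{src}" alt="{alt}">'
        return match.group(0)  # No known extension: leave the link unchanged.

    return LINK.sub(replace, generated_html)
```

Links to ordinary web pages pass through unchanged, so no code is needed in the index entries themselves.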

Hyperlinked main headings with subheadings

By default, HTML Indexer does not create a hyperlinked main heading if there are subheadings. You can force HTML Indexer to create a hyperlinked main heading by including HTML code in the text for the heading. (The section 'Create hyperlinked common headings' on the HTML Indexer Tips and Techniques web page shows how to do this. However, the method is cumbersome, error-prone, and not recommended by the developers of HTML Indexer.)

One solution is to create the main heading in the normal manner. The generated HTML will contain a link to the web page. For each subheading, create an entry where the heading contains some additional text that indicates the entry should be deleted during post-processing, as shown in Figure 2.
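The deletion step might look like the following Python sketch. It rests on two assumptions that the article does not confirm: that the marker text is +x (a hypothetical code, not necessarily the one in Figure 2), and that each heading occupies its own line in the generated HTML.

```python
MARKER = "+x"  # Hypothetical marker; use whatever text the index entries contain.


def drop_marked_lines(generated_html: str, marker: str = MARKER) -> str:
    """Remove every line that contains the deletion marker.

    Assumes that each heading is on its own line in the generated HTML.
    """
    kept = [line for line in generated_html.splitlines() if marker not in line]
    return "\n".join(kept)
```

After the marked heading is removed, its subheadings follow directly after the hyperlinked main heading.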

See also cross-references

Generally, a see also cross-reference should be part of a single entry. One entry for a heading and another entry for a cross-reference from that heading is not standard indexing practice. HTML Indexer creates a separate entry for a cross-reference, as shown here:

HTML Indexer creates a separate entry for a cross-reference

The solution is to create the see also text as a subheading, as shown in Figure 4.

Figure 4. See also cross-reference as a subheading

By default, the 'Sort as' entry field contains the same content as the 'X-ref heading' field, and this does not need to be changed. The <i> tag is HTML code that causes the text that follows it to be displayed in italics in a web browser. Conveniently, the angle bracket files before letters, so the subheading appears at the top of the list. (To have the cross-reference on the same line as the heading would require a simple change to the post-processing macros.)

The 'Reference Text' field cannot be empty. The neatest solution is to put the closing tag (</i>), which ends the italic text, in this field.

An alternative to using the <i> and </i> markup would be to use codes, and convert these during post-processing. This could save a few keystrokes, and it would allow conversion to semantic markup (the strictly correct option), rather than hard-coding the tags for the italic text.
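As a sketch of that alternative, the following Python function converts a pair of codes to the em element. The codes [i] and [/i] are hypothetical; any codes would work, provided that they cannot occur in normal index text.

```python
# Hypothetical codes and the semantic markup that replaces them.
CODES = {
    "[i]": "<em>",
    "[/i]": "</em>",
}


def expand_codes(generated_html: str) -> str:
    """Replace each code in the generated HTML with its markup."""
    final_html = generated_html
    for code, markup in CODES.items():
        final_html = final_html.replace(code, markup)
    return final_html
```

Because the mapping is in one place, changing from em to another element later requires a change to the table only, not to the index entries. Note that a code that does not start with an angle bracket may file differently, so the 'Sort as' field might need to be set explicitly.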

Summary

The methodology is not overly complex. You need to define a few easy-to-remember codes, and of course, you need to create macros to manipulate the generated HTML. (At TechScribe, we use Microsoft Word, but there are text editors that offer macro functionality.) After you update the index in HTML Indexer, you must copy the generated HTML to the editing tool, run the macro, and then copy the HTML to the final index.
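If a script is used instead of a macro in an editing tool, the copy-run-copy cycle can be reduced to one command. The following Python sketch shows one possible arrangement; the function names and file paths are hypothetical, and each step is any function that takes HTML text and returns HTML text.

```python
from pathlib import Path
from typing import Callable, Iterable

Step = Callable[[str], str]


def post_process(generated_html: str, steps: Iterable[Step]) -> str:
    """Apply each post-processing step in order and return the final HTML."""
    final_html = generated_html
    for step in steps:
        final_html = step(final_html)
    return final_html


def convert_file(source: Path, target: Path, steps: Iterable[Step]) -> None:
    """Read the generated HTML, post-process it, and write the final HTML."""
    generated_html = source.read_text(encoding="utf-8")
    target.write_text(post_process(generated_html, steps), encoding="utf-8")
```

For example, convert_file(Path("generated.htm"), Path("a-z-index.htm"), steps) would produce the final index from the output of HTML Indexer, where steps is a list of your own conversion functions.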

From a commercial perspective, visual appearance and consistency throughout a web site are both important. Conforming to best practice indicates that you value your index and the people it serves.
