Indexing technical documents

This article was originally published (with minor differences) in the City Information Group (www.cityinformation.org.uk) Yearbook 2004.

"A document without an index is like a country without a map."

Finding information

If a document contains the information that a reader needs but the reader cannot find it, the document is useless. Worse than useless, it's a hindrance. If I know that some information is not available, I won't waste my time looking for it. However, if I think that the information is available and I can't find it after a period of searching, all I will have achieved is frustration.

Within the field of technical writing, typical documents are user guides, manuals, online help systems and procedure guides. A document that is well structured and that has a table of contents with clear headings and subheadings can help a reader to locate information. But that's not enough. An index is usually also needed. Essentially, an index organises information that is scattered throughout a document. An index enables a reader to find information by providing search terms that direct the reader to the appropriate locations in the document.

Many small documents don't need an index. A quick scan through the document may be all that is needed to find the answer to a question. Some large documents don't need an index, or may need an index that covers only part of the document. For example, a phone book is a large document that is extremely usable, even though it does not contain an index for the main text. It does usually have a small index which, amongst other things, relates the terms that people might use to the ones that are used in the text. For example, "Author: see also technical writer". To determine whether an index is needed, one needs to know how the document will be used, its size and its structure.

Common misconceptions

Some clients have misconceptions about indexes and why they are needed. In the days before electronic versions of documents were readily available, technical manuals contained an index. Nowadays readers can search online help systems and electronic versions of printed documents for words that appear in the text. Does this mean that an electronic document doesn't need an index? Far from it. An example illustrates the point. All developers of Windows-based software know that you "close" a dialogue box, "quit" a program and "end" a network connection. Many end-users do not make a distinction (and why should they?). Therefore, although the document may contain text that refers to "quitting xyz", users who search for "closing xyz" won't find the information. Search also has the opposite problem: because it returns every instance of a term, significant or not, the results may not be of much practical use.
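The "quitting" versus "closing" problem can be illustrated with a small sketch. Everything here is hypothetical: the page texts, the page numbers and the index entries are invented for illustration, and the search is a plain substring match rather than the behaviour of any particular help system.

```python
# Hypothetical help pages: page number -> text.
PAGES = {
    12: "Quitting xyz: choose Exit from the File menu to quit the program.",
    30: "To close a dialogue box, click Cancel.",
}

# Hypothetical index entries supplied by an indexer: term -> pages.
# The indexer anticipates the words a reader might use.
INDEX = {
    "quitting xyz": [12],
    "closing xyz": [12],
    "exiting xyz": [12],
}


def full_text_search(query):
    """Return the pages whose text contains the query (case-insensitive)."""
    q = query.lower()
    return [page for page, text in sorted(PAGES.items()) if q in text.lower()]


def index_lookup(term):
    """Return the pages that the index lists for a term."""
    return INDEX.get(term.lower(), [])


print(full_text_search("quitting xyz"))  # the author's wording is found
print(full_text_search("closing xyz"))   # the reader's wording is not
print(index_lookup("closing xyz"))       # the index bridges the two
```

The search succeeds only when the reader happens to use the author's wording. The index entry for "closing xyz" exists because a person decided that readers would look for it.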

Another common misconception is that a word processor or desktop publishing tool can automatically create an index. Software can create a concordance, which is very different from an index. (In this article concordance means a list of keywords; another meaning is a list of keywords with their immediate context.) A concordance is rarely useful within the field of technical documentation. A reader needs to know about significant instances of a particular term, not every instance. Software just cannot make that distinction.
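To make the difference concrete, here is a minimal sketch of a concordance in the sense used above (a list of keywords with their locations), set beside the kind of entries an indexer might write. The three pages and the hand-written entries are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical document: page number -> text.
PAGES = {
    1: "This guide explains how to print a report.",
    2: "To print, open the report and choose Print.",
    3: "If the report is empty, check the date range before you print.",
}


def concordance(pages):
    """Map every word to the sorted list of pages on which it appears."""
    entries = defaultdict(set)
    for page, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            entries[word].add(page)
    return {word: sorted(found) for word, found in sorted(entries.items())}


# What an indexer might write instead: concepts rather than words,
# pointing only at the places where the topic is actually dealt with.
HUMAN_INDEX = {
    "printing reports": [2],
    "reports, empty": [3],
}

conc = concordance(PAGES)
print(conc["print"])   # every page mentions printing
print(conc["the"])     # trivial words are listed as well
```

The concordance lists "print" on all three pages and gives "the" an entry of its own, but it has no entry for "printing reports", because that phrase never appears in the text. Deciding which instances are significant, and what to call them, is the part that the software does not do.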

Producing an index

Until software can "understand" the real-world meanings and significance of a text, an indexer must supply the intellectual input in creating an index (although indexers certainly use software to prepare indexes). There are two basic ways in which an indexer produces an index. Either the index can be created as a stand-alone entity (the historical method) or it can be created as part of the document that is being indexed (embedded indexing). This latter method is becoming more popular in general; within the field of technical writing for software it is the only method that is used.

With the historical stand-alone method, the indexer manually specifies the pages to which a particular term in the index refers. If the pagination changes, the index must be corrected manually (even if software was used to create the index).

With embedded indexing, the indexer puts a marker (also known as a tag or a code) at each place in the document at which the term is relevant. Software generates the index based on the markers. If text is moved, the markers move with it, and the index can be regenerated easily. If text is deleted, the markers are deleted too, and again the index can be regenerated. The index still requires manual checking. For example, cross-references may no longer be valid.
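The mechanism can be sketched in a few lines. The `{{idx:term}}` marker syntax below is made up for this example and is not the syntax of any real word processor or publishing tool; the document is modelled simply as a list of pages, each holding a list of paragraphs.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax: {{idx:term}} embedded in the text.
MARKER = re.compile(r"\{\{idx:(.+?)\}\}")


def build_index(pages):
    """Generate term -> sorted page numbers from the embedded markers."""
    index = defaultdict(set)
    for number, paragraphs in enumerate(pages, start=1):
        for paragraph in paragraphs:
            for term in MARKER.findall(paragraph):
                index[term].add(number)
    return {term: sorted(nums) for term, nums in sorted(index.items())}


pages = [
    ["{{idx:installing}}Run the installer.",
     "{{idx:licences}}Enter your licence key."],
    ["{{idx:printing}}Choose Print from the File menu."],
]
before = build_index(pages)

# Edit the document: delete the printing paragraph and move the
# licence paragraph to page 2. The markers travel with the text.
moved = pages[0].pop(1)
pages[1] = [moved]
after = build_index(pages)

print(before)
print(after)
```

After the edit, regenerating the index moves "licences" to page 2 and drops "printing" without anyone retyping a page number. What the software cannot know is that a hand-written cross-reference such as "output: see printing" now points at an entry that no longer exists, which is why the regenerated index still needs checking.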

The future

Currently, indexing a document is an intellectual task that is aided by software. Indexers beware! There is software on the market that attempts to create indexes. It's not very good at all. But it will improve. Maybe it will never be quite as good as a human-created index, but it might just be good enough for many purposes.

See also

Frequently asked questions: an alternative
