Indexing technical documents
This article was first published (with small differences) in the City Information Group (www.cityinformation.org.uk) Yearbook 2004.
"A document without an index is like a country without a map." [Adapted from an unknown source.]
Finding information
If a document contains the information that a reader needs, but the reader cannot find that information, the document is useless. Worse than useless, the document causes problems. If I know that some information is not available, I will not waste my time looking for the information. However, if I think that the information is available, and if I cannot find it after a period of unsuccessful searching, I will be frustrated.
In the field of technical writing, typical documents are user guides, reference manuals, and online help. A good structure and a table of contents that contains clear headings can help a reader to find information. However, good structure and a table of contents are not sufficient. Usually, an index is also necessary. An index organises information that is scattered through a document. An index supplies search terms that tell the reader the locations of applicable information in the document.
Many small documents do not need an index. Possibly, a quick scan through a document is all that is necessary to find the answer to a question. Some large documents do not need an index, or possibly, they need an index that covers only part of the document. For example, a telephone directory is a large document that is very usable, although it does not contain an index for the primary text. Usually, a directory has a small index that relates the terms that people use to the terms that are in the text. For example, 'technical author: see technical writer'. To know whether an index is necessary, you need to know how the document will be used, its size, and its structure.
Common misconceptions
Some clients have misconceptions about indexes and why they are necessary. Before electronic versions of documents were easily available, technical manuals contained an index. Now, readers can search online help and electronic versions of printed documents for words that appear in the text. Does this mean that an electronic document does not need an index? No, as the following example shows.
All developers of Windows-based software know that you 'close' a dialog box, 'quit' a program, and 'end' a network connection. Many users do not know the different terms. Therefore, if the document contains text that refers to 'quitting xyz', and if users search for 'closing xyz', they will not find the information.
Additionally, a search returns all instances of a term. Possibly, there are so many results that the results are not of much practical use.
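The following Python sketch illustrates both search problems. The page text, the index entries, and the function names are invented for illustration. They are not from a real product.

```python
# Hypothetical document text, invented for illustration.
pages = {
    1: "To quit xyz, click Exit on the File menu.",
    2: "Quit xyz before you install an update.",
    3: "The file is saved when you quit xyz.",
}

def full_text_search(term):
    """Return the pages whose text contains the term (case-insensitive)."""
    return [n for n, text in pages.items() if term.lower() in text.lower()]

# A hand-made index (an assumption for this sketch). The indexer selects
# the important page and adds a 'see' reference from the term that
# users are likely to try.
index = {
    "quitting xyz": [1],
    "closing xyz": "quitting xyz",  # 'see' reference
}

def index_lookup(term):
    """Follow 'see' references until a list of pages is found."""
    entry = index.get(term, [])
    while isinstance(entry, str):
        entry = index.get(entry, [])
    return entry
```

In this sketch, a search for 'closing' finds nothing, and a search for 'quit' returns all pages. The index takes the reader who looks up 'closing xyz' to the one page that the indexer selected.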
Another frequent misconception is that a word processor or desktop publishing tool can automatically create an index. Software can create a concordance, which is very different from an index. (In this article, concordance means a list of keywords; another meaning is a list of keywords with their immediate context.) A concordance is not usually useful in the field of technical documentation. A reader needs to know about important instances of a particular term, not all instances of a term. Software does not know what is important and what is not important.
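To show why a concordance is different from an index, the following Python sketch creates a simple concordance (a list of words with the pages on which they appear). The page text and the function name are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical page text, invented for illustration.
pages = {
    1: "Install the printer driver. Then connect the printer.",
    2: "If the printer does not print, see Troubleshooting.",
    3: "Troubleshooting: the printer driver is not installed.",
}

def concordance(pages):
    """Map each word to every page on which the word appears."""
    result = defaultdict(list)
    for number, text in sorted(pages.items()):
        # set() makes sure that a page is listed only one time for each word
        for word in set(re.findall(r"[a-z]+", text.lower())):
            result[word].append(number)
    return dict(result)

words = concordance(pages)

# An index that a person creates contains only the important locations.
# These entries are an example of the decisions that an indexer makes.
human_index = {
    "printer driver, installing": [1],
    "troubleshooting": [3],
}
```

The concordance lists 'printer' and 'the' on all pages, because the software gives the same weight to each word. The indexer decides that only page 1 and page 3 are useful destinations.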
Producing an index
Until software can 'understand' the real-world meaning of text, an indexer must supply the intellectual input in creating an index (although indexers do use software to prepare indexes). An indexer creates an index in one of two ways:
- An index can be created as a separate document (the historical method)
- An index can be created as part of the document that is being indexed (embedded indexing). This method is becoming popular. In the field of technical writing for software, it is the primary method.
With the first method, the indexer manually specifies the pages to which a particular term in the index refers. If the page layout changes, the index must be corrected manually, even if software is used to create the index.
With embedded indexing, the indexer puts a marker (also known as a tag or a code) at each location in the document at which the term is relevant. Software generates the index based on the markers. If text is moved, the markers move with it, and the index can be regenerated easily. If text is deleted, the markers are also deleted, and the index can be regenerated. The index still requires manual checking. For example, possibly, some cross-references are no longer valid.
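The following Python sketch shows the principle of embedded indexing. The marker syntax `{ix:term}` and the function name are hypothetical. Each authoring tool has its own marker format.

```python
import re
from collections import defaultdict

# Hypothetical marker syntax for this sketch: {ix:term}
MARKER = re.compile(r"\{ix:([^}]+)\}")

def build_index(pages):
    """Collect the markers from a list of page texts. Page numbers start at 1."""
    index = defaultdict(list)
    for number, text in enumerate(pages, start=1):
        for term in MARKER.findall(text):
            if number not in index[term]:
                index[term].append(number)
    return dict(sorted(index.items()))

pages = [
    "{ix:installing}To install xyz, run the setup program.",
    "{ix:quitting}To quit xyz, click Exit.",
]
before = build_index(pages)

# The layout changes: a new page is put at the start of the document.
# The markers move with the text, thus the index is only regenerated.
pages.insert(0, "{ix:overview}This guide describes xyz.")
after = build_index(pages)
```

After the new page is added, 'installing' moves from page 1 to page 2, and 'quitting' moves from page 2 to page 3. The indexer does not correct the page numbers manually. With a separate index, the same change makes manual correction necessary.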
The future
Indexing a document is an intellectual task that is helped by software. Indexers beware! The software on the market that tries to create indexes is not very good. However, it will improve. Possibly, the indexes that it creates will never be as good as human-created indexes, but possibly, they will become good enough for many purposes.