Chapter 2. Review

This chapter is in three sections. The first examines how the end-user (i.e. student) interacts with CBT software, and some of the problems faced in this interaction. Section two discusses the kinds of questions that are often used in teaching situations, and relates this to CBT material. The final section deals with the analysis of answers, and compares what teachers/trainers and authoring languages do. The results of these investigations were taken into consideration throughout the further development of the author language routines.

2.1 OVERVIEW OF CBT PRODUCTS

In order to determine both the features that are desirable for a CBT package, and also the pitfalls to avoid, a selection of products that are commercially produced for the CBT and CALL market [s1 to s11] were reviewed. My choice of viewing material was not determined by any a priori considerations; I was able to observe only what I had access to at Dean Associates and Sheffield University EFL department. However, from personal experience, I believe that I saw a representative sample of current CBT/CAL software. My intention was to see these packages through the eyes of a 'typical user'. Only factors relevant to the use of the packages have been noted (along with some possible solutions); the pedagogical content is a matter for the author to determine. The results of this study were taken into consideration during the design of the author language functions.

Messages should correspond exactly to the options allowed. For example, if the message 'Press space bar to continue' appears, then all other keys should be ignored - if any keypress can be used to continue the program, then this should be stated.

Solution: have a pause function with a standard message.
If the program requests a Y/N response, then pressing of the Y or N keys only should be accepted. To assume that any keypress except Y is equivalent to N is not good practice (unless, of course, this is clearly stated in a suitable message).

Solution: have a function to do this.
All unnecessary keys should be disabled, so that they do not produce garbage. For example, in one program the back-arrow key was used to delete text; pressing the left arrow, however, produced spurious characters. This must be avoided.

Solution: all functions must be written so that unwanted keypresses are ignored.
If there is a list to be chosen from, and if the answer is to be typed, then nonsensical responses/typing errors should produce a message asking the student to re-enter his or her answer. (However, this type of question would better be set so that the student could make a selection by moving a highlight bar, and selecting the option by pressing the Enter key.)
When messages (or any other small sections of text) are changed, re-drawing the whole screen should be avoided; just the relevant portions should be altered.

Solution: have a function which clears a specified section of the screen.
If the user has to input more than one answer on a frame (a screenful of information in a CBT environment), there should be some way for him/her to return to previous answers to correct errors that have been noticed.

Solution: If the 'Enter' key is used to designate the end of an entry, then some other (e.g. function) key could be used to enter the set of answers.
There should be provision to escape easily from a module/section of a CBT program.

Solution: since the program will be waiting for input (either an answer, or a keypress to continue), the functions that deal with these can easily be made to exit the program, or a section of the program, if a given key, say Esc, is pressed.
With moving-blocks type questions the sequence of correct answers should be matched to the sequence of the student's answers (or vice versa), so that errors can be seen easily. This can be seen in the following two examples, where [1] is clearer than [2].
```
1.   Your [student] answer: 3e 1d 5f ...
     Correct answer       : 3c 1d 5f ...

2.   Your answer          : 3e 1d 5f ...
     Correct answer       : 1d 3c 5f ...
```
However, a far better solution would be to show this by use of colour to match pairs, or by joining pairs with lines.

Solution: have procedures which set up moving blocks questions, and which after analysis of an answer, show the student in a graphical way the correct answer. (Similar considerations apply to multiple-choice questions).
The cursor should be removed from the screen, except when/where input of an answer is required.

Solution: have procedures to hide and reveal the cursor.
Waiting for a screen to fill up with text could be boring if the student is a fast reader, so there must be a balance between showing information in small, easily assimilated chunks, and boring the student by having long delays.

Solution: since part of this problem might be due to implementation dependent variations in delay, one solution might be to have a delay procedure that uses a real time delay.
Use of the page up/down (or other) keys to show previous and next screens (text or diagrams, not response frames) is a desirable feature. However, a window system might be a better approach.
Responses to students' answers should be consistent, and relatively neutral. For example, 'correct' or 'well done' are acceptable, whereas 'fantastic' or 'brilliant' are not appropriate.

Solution: to help an author with consistency, a match function might include the output of a standard reply.
Unnecessary keypresses should be avoided. For example, after inputting an answer, the student should not normally be asked to press a key to continue to the next frame, although there may be occasions on which the author has good reason for doing this.
All instructions must match the hardware available. For example, if there is an instruction such as 'Press the PgUp key', then this key should be on the keyboard.

Solution: if the problem is merely one of nomenclature, then a keyboard overlay card would suffice.
Chapter numbers in workbooks and other associated material should match unit and/or menu numbers in the software.

Of all the problems noted, only the last two cannot be solved by the provision of carefully designed authoring functions. I do not mean to suggest either that an author could not do more to avoid some of the problems that have been noted, or that all the problems in these cases are caused by deficient software. However, to suggest that all these are due to an author's inability to fully use an AL is not credible, and as the whole purpose of an AL is to make an author's job simpler, there certainly appears to be room for improvement in the functions that are provided in ALs.
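Several of the solutions listed above (a pause with a standard message, a strict Y/N prompt, ignoring unwanted keys, and an Esc exit) reduce to one filtering routine. The following is a minimal sketch, not the routines of any AL reviewed here. The names `wait_for_key`, `pause` and `ask_yes_no` are hypothetical, keypresses are assumed to arrive as an iterable of strings, and Esc is assumed to be reported as `"\x1b"`.

```python
from typing import Iterable, Optional

ESCAPE = "\x1b"  # assumed code for the Esc key


def wait_for_key(keys: Iterable[str], allowed: str) -> Optional[str]:
    """Return the first keypress that is in `allowed`, ignoring all others.

    Returns None if Esc is pressed (or input runs out), so that the
    caller can leave the current module.
    """
    permitted = set(allowed)
    for key in keys:
        if key == ESCAPE:
            return None
        if key.lower() in permitted:
            return key.lower()
        # Any other key (arrows, function keys, letters) is simply dropped.
    return None


def pause(keys: Iterable[str]) -> bool:
    """Show the standard message and accept only the space bar."""
    print("Press space bar to continue")
    return wait_for_key(keys, " ") is not None


def ask_yes_no(keys: Iterable[str]) -> Optional[bool]:
    """Accept only Y or N; return None if the student escapes."""
    answer = wait_for_key(keys, "yn")
    return None if answer is None else answer == "y"
```

Because every input function goes through the same filter, the message shown and the keys accepted cannot drift apart, and the escape route is available wherever the program waits for input.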

2.2 USING QUESTIONS

2.2.1 Why use questions?

Apart from the obvious role that questions play in the testing of students, for training to be effective, as all teachers recognise, there has to be a high level of learner interest and involvement. One way of achieving this is by asking questions [Beard & Senior 1980]. (These must be at an appropriate level, or the learner will either struggle to answer them, or find them trivially insulting to his/her intelligence, and will in either case lose motivation. This is a pedagogic consideration which does not affect the design or implementation of match functions). However, it is not enough merely to ask a question; there must be some method of analysing the response, and acting on it. After all, one of the strengths of good CBT is that it is 'learner centred', the student being presented with information that is immediately relevant to his or her needs. This could be, for example, by reviewing material already studied, but not assimilated, or by skipping part of a course of study whose content the student is already familiar with.

2.2.2 Types of question

Most of the time we use the word 'question' without giving much, if any, thought to what it really means. Now, however, we must consider what we mean. Questions can be categorised in different ways. For example, a grammarian might use the following taxonomy, taken from Quirk et al. [1984].

Questions can be divided into three major classes according to the type of answer they expect. Those that expect the answer yes or no, such as "Have you been to Paris?", are yes-no questions; those that contain a 'wh-element', (who? what? how? etc) and expect a reply supplying the missing information posited by that element are wh- questions: What is your name? A third type of lesser importance is the alternative question, which expects as an answer one of two or more alternatives mentioned in the question: Would you like steak or chicken?

A totally different viewpoint focuses on the physical structure of the question as presented to the student, and the way in which the student is expected to reply; it need not even be a question in the grammatical sense of the word as defined above. At the top of the list, the questions are primarily based on recognition, but those nearer the bottom require more recall on the part of the student.

true/false (or yes/no)
multiple choice (mc)
ranking
matching
blank fill (sentence completion)
free answer

For the purposes of developing matching functions for CBT, the second approach is the one that is more appropriate, since we are interested in an analysis of responses to the various question types that are used by teachers. Of course, the author is still free to use any (or none) of the grammatical forms discussed above in the questions s/he sets the student.

Brown and Edmondson [1984] sum up the foregoing well. For them, a question is 'Any statement intended to evoke a [verbal] response' (brackets mine). Of course, they are referring to classroom situations, rather than everyday conversations.

For readers unfamiliar with the terminology used, examples of these types of question are set out below. Note that the examples are not complete, in that full instructions as to how the student is to enter his or her response are not given.

1) TRUE/FALSE or YES/NO

The student is presented with a sentence, and must indicate whether it is right or wrong.

COBOL is the best programming language for 'number crunching' purposes.

2) MULTIPLE CHOICE (MC)

This consists of a partial sentence, and a number (usually four or five) of options. The student must select (by typing a letter, moving a cursor block, etc) the best/most appropriate/correct option. (Clearly, the context is important!)

CBT is an acronym for ________

a) COBOL Beginners Training
b) Computer Based Trauma
c) Computer Based Training
d) Computer Biased Teaching

3) RANKING

The student is presented with some items, which must be put into some order of precedence (typing a letter/number next to them, moving them on screen, etc).

Arrange the following data storage devices according to speed of data access: floppy disk, RAM, hard disk, registers.

4) MATCHING

The student is given two groups of items, and must match items from each group (joining lines, writing a letter and a number, etc).

Match one pigment to each colour.


     PIGMENT       COLOUR

     bistre        green
     cinnabar      yellow
     bice          brown
     verdigris     blue
     gamboge       red

5) BLANK FILL (SENTENCE COMPLETION)

There are many varieties of this, but basically the idea is for a partial sentence to have its missing word(s) inserted by a student. Help can be given by replacing letters in the word by dashes, etc.

Another name for a RAM disk is a _ _ _ _ _ _ _ disk.

6) FREE ANSWER

This may range from just one word to an essay. In CBT the longest practical free answer (for immediate feedback) is usually one sentence. It is possible, however, for a student to type in an essay, and have this assessed by a trainer/teacher.

2.2.3 Limitations of CBT

Much computer based training and testing has made use of the first four question types described previously (i.e. true/false, multiple-choice, ranking, matching). I suggest that this is due to limitations of software technology, rather than being due to any inherent pedagogical advantages of these types of question. By this, I do not mean that they do not have an important role to play, but rather, there are situations in which their use is inappropriate, or at best, limiting.

It is a trivial matter to analyse true/false, multiple-choice, ranking, and matching type questions, and most ALs deal with these adequately. In the case of true/false and multiple choice analysis, the program has merely to check that a response is one of a set of allowed answers, and if it is, then the answer itself must be analysed, and depending on what it is, appropriate action is taken. Usually, if an answer is not in the allowed set, for example, if a student enters a number instead of a letter, then some visual and/or aural message is given, and the user re-enters the answer. There are usually pre-analysis procedures which 'clean up' a multiple choice answer, for example, to convert the case or remove leading spaces and extraneous punctuation marks. Methods which prevent students even making this kind of error are available; a 'pick and shoot' menu, where the student can move a highlight bar over the available options, and then press a key to select one, is a good example of this.
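The clean-up and allowed-set check described above can be sketched in a few lines. This is an illustration only; the function names `clean_choice` and `judge_choice`, and the three result strings, are assumptions made for the example rather than features of any particular AL.

```python
import string


def clean_choice(raw: str) -> str:
    """Remove surrounding spaces and punctuation, and fold to lower case."""
    return raw.strip(string.whitespace + string.punctuation).lower()


def judge_choice(raw: str, allowed: set, correct: str) -> str:
    """Classify a multiple-choice reply as 'invalid', 'correct' or 'wrong'."""
    choice = clean_choice(raw)
    if choice not in allowed:
        return "invalid"  # the caller should ask the student to re-enter
    return "correct" if choice == correct else "wrong"
```

With the options a to d and c as the correct option, a reply typed as `  C) ` is judged correct, `(a)` is judged wrong, and `3` is reported as invalid so that the student can be asked to re-enter it.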

Ranking and matching are a little more involved in that a student may rank or match some, but not all, items in the correct order. The number of combinations that are possible may be large, and so this could lead to logistic difficulties in responding to the answer.
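One way to sidestep the number of combinations is to compare the two rankings position by position and report only the places where they differ. The sketch below assumes both rankings contain the same items and have the same length; `rank_feedback` is a hypothetical name.

```python
def rank_feedback(student, correct):
    """Return (position, student_item, correct_item) for each mismatch.

    Positions are counted from 1. An empty list means the ranking
    is entirely correct.
    """
    return [
        (position, given, wanted)
        for position, (given, wanted) in enumerate(zip(student, correct), start=1)
        if given != wanted
    ]


# The data-storage ranking question, fastest first.
correct_order = ["registers", "RAM", "hard disk", "floppy disk"]
student_order = ["registers", "hard disk", "RAM", "floppy disk"]
mistakes = rank_feedback(student_order, correct_order)
```

Here `mistakes` holds two entries, for positions 2 and 3, which is enough to tell the student which items are misplaced without a separate response for every possible ordering.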

Blank fill/sentence completion matching is a halfway stage between the types of match analysis required for the question types considered above, and that required for analysing a free response answer. It is with these two types of answer that limitations are most noticeable. The next section explores how current commercial authoring language match functions try to replicate the judgements of a teacher, and the problems involved therein.

2.3 ANALYSIS OF AN ANSWER

This section details the kind of analysis that any teacher or trainer does in the classroom, when responding to answers, and relates this to the facilities that commercial match functions offer the CBT author. The classification of answer analysis is mine, based on personal experience. In the examples of the match functions that follow, I use a pseudocode which exemplifies what is to be done; each author language has its own format for doing this.

2.3.1 Accepting redundant information

Q: Who wrote 'Thomas the Tank Engine'?
A1: The Rev. W. Awdry did.
A2: It was Awdry, wasn't it?
A3: I think it was W. Awdry.

The information that is wanted is, 'The Reverend W. Awdry'. Three examples of perfectly acceptable answers have been given, but clearly, there are many more.

Most authoring languages deal with this by looking for one or more key words in a student's answer. Thus, if an ideal answer were "The cat sat on the mat", an author might select "cat", "sat", and "mat" as three keywords to be matched in the answer. If they are found in the order given, then the author assumes that the student has the right idea. However, there can be many problems with negation, which can occur in a verb as well as by using words such as 'no', 'not', etc.
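The keyword approach can be illustrated as follows. This is a sketch of the general technique, not the code of any commercial AL; `keywords_in_order` is a hypothetical name, and words are assumed to consist of letters and apostrophes only.

```python
import re


def keywords_in_order(keywords, answer):
    """Return True if every keyword occurs in the answer, in the given order."""
    words = re.findall(r"[a-z']+", answer.lower())
    position = 0
    for keyword in keywords:
        try:
            # Search only the words after the previous keyword.
            position = words.index(keyword, position) + 1
        except ValueError:
            return False
    return True


keys = ["cat", "sat", "mat"]
accepted = keywords_in_order(keys, "The cat sat on the mat")
reversed_order = keywords_in_order(keys, "The mat sat on the cat")
negated = keywords_in_order(keys, "The cat never sat on the mat")
```

The first answer is accepted and the second rejected, as intended. The third is also accepted, although it says the opposite of the model answer, which is exactly the negation problem noted above.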

2.3.2 Analysis of partially correct information

Q: List the colours of the rainbow in decreasing order of wavelength.
A: Red, orange, green, blue, yellow, violet.

(Note that this is not a ranking question, since the student is required to recall, not simply rearrange information).

Two kinds of analysis are required here. One is to note that some information (i.e. indigo) is missing, and the other is to determine that some of the information is incorrectly ordered.

In commercial authoring languages, analysis of partially correct information is done by looking for words in an answer, but not paying attention to their order. To determine, for example, whether a minimum of five out of seven items are in an answer is a relatively simple matter, if all the function has to do is return TRUE or FALSE. However, if an author wishes to tell the student what the other two items were, or to take some kind of remedial action dependent on the student's answer, there is a major difficulty that needs to be overcome. Essentially, it is to do with the combinatorial explosion problem; how are all the possible combinations of two from seven catered for? In the ALs that I have seen there is no answer to this. The author has to say something along the lines of "You found 5 of the items, the full list is ..."
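One possible way around the enumeration, offered here only as a sketch and not as a feature of the ALs reviewed, is for the function to return the items found and the items missing instead of a bare TRUE or FALSE. The name `analyse_list` is hypothetical, the answer is assumed to be a comma-separated string, and repeated items in the answer are not handled.

```python
RAINBOW = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]


def analyse_list(answer, model):
    """Return (found, missing, in_order) for a comma-separated answer.

    found and missing are listed in model order; in_order is True when
    the items the student did recall appear in the same order as in the model.
    """
    given = [item.strip(" .").lower() for item in answer.split(",")]
    found = [item for item in model if item in given]
    missing = [item for item in model if item not in given]
    recalled = [item for item in given if item in model]
    return found, missing, recalled == found


found, missing, in_order = analyse_list(
    "Red, orange, green, blue, yellow, violet.", RAINBOW
)
```

For the answer given in the example above, `missing` contains only indigo and `in_order` is False, so the author can name the omitted item and comment on the ordering without writing a separate branch for each combination.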

With a certain type of multiple choice question a related problem arises. Say, for example, that an author was looking for any two correct answers out of a group of three correct possibilities, s/he could do this by writing something like:

IF answer = (a AND b) OR (b AND c) OR (a AND c) THEN ...

However, this is tedious, and with a larger number of options could easily lead to an author making logical errors.
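Counting the correct options selected avoids writing out each pair by hand. The sketch below assumes the student's selections and the correct options are available as collections of option letters; `chose_at_least` is a hypothetical name. Like the boolean expression above, it ignores any incorrect options that were also selected.

```python
def chose_at_least(selected, correct, minimum):
    """Return True if at least `minimum` of the selected options are correct."""
    return len(set(selected) & set(correct)) >= minimum


correct_options = {"a", "b", "c"}
two_right = chose_at_least({"a", "c"}, correct_options, 2)
one_right = chose_at_least({"a", "d"}, correct_options, 2)
```

Here `two_right` is True and `one_right` is False. The same call works unchanged for, say, any three correct answers out of five, where the hand-written expression would need ten terms.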

2.3.3 Acceptance of spelling errors

Q: Name an indicating agent that is red when basic, and colourless in acids.
A: Phenolfaline.

In most cases (unless spelling were part of the test), a teacher would probably accept the answer as an unorthodox version of phenolphthalein, but might make some comment about the spelling. Similar considerations apply to punctuation, agreement of subject and verb, and other errors that occur in written answers.

Wildcard characters are used in most commercial ALs to enable an author to specify that any character(s), or group of characters, should be accepted by the match function. For example, let ? represent any one character, and * represent an unspecified number of characters.

1. match("h?llo", student_answer)
2. match("hel*", student_answer)

Match function 1 will accept "hello", "hallo", as well as something like "hjllo". Match function 2 will accept "hello", "help", and "heliotrope".
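The behaviour of the two pseudocode calls can be reproduced with Python's standard `fnmatch` module, whose `?` and `*` have the meanings given above. The wrapper below keeps the pseudocode's argument order; folding the answer to lower case and trimming spaces are assumptions made for the example. Note that `fnmatch` also treats square brackets as special characters.

```python
from fnmatch import fnmatchcase


def match(pattern, student_answer):
    """Return True if the whole answer fits the wildcard pattern."""
    return fnmatchcase(student_answer.strip().lower(), pattern)


pattern_1 = [match("h?llo", word) for word in ("hello", "hallo", "hjllo")]
pattern_2 = [match("hel*", word) for word in ("hello", "help", "heliotrope")]
```

Every entry in both lists is True, which shows both the convenience of wildcards and their danger: "hjllo" and "heliotrope" are accepted as readily as the answers the author had in mind.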

Most ALs let the author decide if things such as spaces, case, and punctuation marks are to be considered in the match. TopClass [FPC 1987] takes into account phonetic aspects, and will allow, for example, the substitution of 'ph' by 'f' to be accepted.
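A tolerant spelling match of this kind might be approximated as below. This is not TopClass's method, which is not described here; it is a sketch that combines a single phonetic substitution ('ph' for 'f') with the similarity ratio from Python's standard `difflib` module. The names `phonetic_key` and `close_spelling`, and the threshold of 0.8, are assumptions chosen for the example.

```python
import string
from difflib import SequenceMatcher


def phonetic_key(word):
    """Trim spaces and punctuation, fold case, and write 'ph' as 'f'."""
    cleaned = word.strip(string.whitespace + string.punctuation).lower()
    return cleaned.replace("ph", "f")


def close_spelling(student_answer, model, threshold=0.8):
    """Return True if the two spellings are similar enough to be accepted."""
    ratio = SequenceMatcher(
        None, phonetic_key(student_answer), phonetic_key(model)
    ).ratio()
    return ratio >= threshold


accepted = close_spelling("Phenolfaline.", "phenolphthalein")
rejected = close_spelling("methyl orange", "phenolphthalein")
```

With these settings the misspelt answer from the example is accepted and an unrelated indicator is rejected. The threshold would need tuning in practice, since too low a value accepts wrong answers that merely look similar.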

2.3.4 Acceptance of options

Q: According to Professor X, what is the most insidious drug that is freely available in contemporary Western society?

A1: television
A2: telly
A3: tv
A4: the box

The meaning of all these four answers is the same (apart from emotive considerations - e.g. 'the box' has pejorative connotations to some people, whereas 'television' is neutral). Members of a learning group are usually able to instantly recognise whether or not a particular word or phrase is a synonym for another, and no difficulties arise.

This is usually accomplished in commercial packages by having a list of options as a model answer, any one item of which, when found in the student's answer, will cause the match function to succeed.
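A list-of-options match can be sketched as follows. The name `match_any` is hypothetical, and matching on whole words regardless of case is an assumption; a given AL may behave differently.

```python
import re


def match_any(options, student_answer):
    """Return the first option found as a whole word or phrase, else None."""
    for option in options:
        pattern = r"\b" + re.escape(option) + r"\b"
        if re.search(pattern, student_answer, re.IGNORECASE):
            return option
    return None


options = ["television", "telly", "tv", "the box"]
first = match_any(options, "I think it's the TV")
second = match_any(options, "Watching the box all day")
third = match_any(options, "Radio")
```

Here `first` is "tv", `second` is "the box" and `third` is None. Returning the option that matched, not just success or failure, lets the author respond differently to a colloquial answer if desired.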

In some cases an author would want to accept more than one answer from the student, for example, in a question of the type, "What three things do you consider ... ".

One such question from a CBT package [Dean Associates 1987] had the following as possible answers:

1. bonus/surveyor phoned/phone/call
2. bricklayer's bonus/head office
3. doesn't know/didn't know/did not know

This question exemplifies more areas of difficulty. Firstly, there is the problem of how the student's answer is to be input into the match function. It could be that three answers are collected independently, and then matched independently. Alternatively, three answers could be input parameters to just one match function. A final possibility is to collect the three answers as one group, and analyse them as such.

Secondly, in answer 3, we can see that the semantic content focuses on the verb "to know". It is inefficient for the author to have to write this in its many forms; ideally the matching functions should be able to deal with situations like this. Similarly, there could be a sub-function to cope with irregular verbs.
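As a rough illustration of what such a facility might do for model answer 3, the sketch below expands the contraction "n't" before matching, so that the author lists the verb once. The function names are hypothetical, and the approach is deliberately limited: irregular contractions such as "can't" and "won't" are not expanded correctly, and other forms of the verb would need a separate table.

```python
import re


def expand_negation(text):
    """Rewrite a trailing "n't" as " not", e.g. "didn't" -> "did not"."""
    return re.sub(r"n't\b", " not", text.lower())


def mentions_not_knowing(student_answer):
    """Return True for 'does not know', 'did not know' and their contractions."""
    expanded = expand_negation(student_answer)
    return re.search(r"\b(do|does|did) not know\b", expanded) is not None
```

With this in place "doesn't know", "didn't know" and "did not know" are all matched by one pattern, while an answer such as "He knew all along" is not.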

Another problem can be seen by referring to the first model answer. This is about a telephone call (the subject of which was a workers' bonus payment, the surveyor (!) being the person who made the call). If "phone" or "call" are acceptable answers, then so should "phone-call", "phone call", "telephone", "ring", "rang", and perhaps even colloquialisms such as "bell" or "blower". Clearly, it is impractical for an author to list all the possible options, although by the judicious use of wildcard characters, this difficulty could be partially alleviated.

2.3.5 Understanding of context

Q: What's the translation of 'tempus fugit'?
A: Time flies.

The teacher understands that 'time' is a noun, rather than the verb of juvenile humour.

Understanding of context is the area of greatest deficiency in CBT, and programs can cope only superficially, for example, by keeping track of which branches the student has already been through, or allowing a specified number of attempts at a question before entering a remedial loop. This cannot be classed as real 'understanding'. In fact, all the problems mentioned can be subsumed under this heading; machines simply do not understand the world as humans do.

In all the cases shown above, the teacher is able to understand exactly what the student means, and can thus match (and judge as 'correct') the answer with the model that s/he has in mind. Furthermore, we know that if for any reason the teacher is unable to understand an answer, s/he is usually able to ask the student for clarification. Additionally, corrective feedback may be given, which could focus on spelling, appropriateness, analysis of how a student arrived at an answer (whether right or wrong), etc.

A CBT program can match a word, allow for some typographical errors, ignore a space, etc, but there is no underlying model of 'the real world'. Syntactically, computers can do wonderful things, but when it comes to semantics, there's a long way to go.