This chapter is in three sections. The first examines how the end-user (i.e. student) interacts with CBT software, and some of the problems faced in this interaction. Section two discusses the kinds of questions that are often used in teaching situations, and relates this to CBT material. The final section deals with the analysis of answers, and compares what teachers/trainers and authoring languages do. The results of these investigations were taken into consideration throughout the further development of the author language routines.
In order to determine both the features that are desirable in a CBT package and the pitfalls to avoid, I reviewed a selection of products commercially produced for the CBT and CALL market [s1 to s11]. My choice of material was not determined by any a priori considerations: I could observe only what I had access to at Dean Associates and the Sheffield University EFL department. However, from personal experience, I believe that I saw a representative sample of current CBT/CAL software. My intention was to see these packages through the eyes of a 'typical user'. Only factors relevant to the use of the packages have been noted (along with some possible solutions); the pedagogical content is a matter for the author to determine. The results of this study were taken into consideration during the design of the author language functions.
However, a far better solution would be to show this by use of colour to match pairs, or by joining pairs with lines.
1. Your [student] answer: 3e 1d 5f ...
   Correct answer: 3c 1d 5f ...
2. Your answer: 3e 1d 5f ...
   Correct answer: 1d 3c 5f ...
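As an illustration of the remedy, here is a minimal Python sketch. It assumes each pair is written as an item number followed by a single option letter, as in the example. The sketch re-orders both answers on the item number so that they line up pair by pair, and it flags each mismatch. A package could then colour the mismatched rows or draw the joining lines. The function name is hypothetical.

```python
def align_matching_feedback(student, correct):
    """Line up a student's matching answer with the correct one, row by row.

    Each answer is a list of pairs such as "3e" (item 3 matched with option e).
    Rows are keyed and sorted on the item number, so both answers appear in
    the same order whatever order each was written in.
    """
    given = {pair[:-1]: pair[-1] for pair in student}
    expected = {pair[:-1]: pair[-1] for pair in correct}
    rows = []
    for item in sorted(expected, key=int):
        choice = given.get(item, "-")  # "-" marks an item the student left unmatched
        rows.append((item, choice, expected[item], choice == expected[item]))
    return rows


for item, choice, answer, ok in align_matching_feedback(["3e", "1d", "5f"], ["1d", "3c", "5f"]):
    print(f"{item}: your answer {choice}, correct answer {answer}", "" if ok else "<- wrong")
```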
Of all the problems noted, only the last two cannot be solved by providing carefully designed authoring functions. I do not mean to suggest either that an author could not do more to avoid some of the problems noted, or that all the problems in these cases are caused by deficient software. However, it is not credible to attribute all of them to an author's inability to use an AL fully. Since the whole purpose of an AL is to make an author's job simpler, there certainly appears to be room for improvement in the functions that ALs provide.
Questions play an obvious role in testing students, but they matter for training too. As all teachers recognise, training is effective only when there is a high level of learner interest and involvement, and one way of achieving this is by asking questions [Beard & Senior 1980]. (The questions must be pitched at an appropriate level. Otherwise the learner will either struggle to answer them or find them so trivial as to insult his/her intelligence, and in either case will lose motivation. This is a pedagogic consideration which does not affect the design or implementation of match functions.) However, it is not enough merely to ask a question; there must be some method of analysing the response and acting on it. After all, one of the strengths of good CBT is that it is 'learner centred': the student is presented with information that is immediately relevant to his or her needs. This could mean, for example, reviewing material already studied but not assimilated, or skipping part of a course whose content the student already knows.
Most of the time we use the word 'question' without giving much, if any, thought to what it really means. Now, however, we must consider what we mean. Questions can be categorised in different ways. For example, a grammarian might use the following taxonomy, taken from Quirk et al.:
Questions can be divided into three major classes according to the type of answer they expect. Those that expect the answer yes or no, such as "Have you been to Paris?", are yes-no questions. Those that contain a 'wh-element' (who? what? how? etc) and expect a reply supplying the missing information posited by that element are wh-questions: What is your name? A third type, of lesser importance, is the alternative question, which expects as an answer one of two or more alternatives mentioned in the question: Would you like steak or chicken?
A totally different viewpoint focuses on the physical structure of the question as presented to the student, and on the way in which the student is expected to reply. The item need not even be a question in the grammatical sense defined above. In the list of question types given below, those near the top are based primarily on recognition, while those nearer the bottom require more recall on the part of the student.
For the purposes of developing matching functions for CBT, the second approach is more appropriate, since we are interested in analysing responses to the various question types that teachers use. Of course, the author is still free to use any (or none) of the grammatical forms discussed above in the questions s/he sets the student.
Brown and Edmondson sum up the foregoing well. For them, a question is 'Any statement intended to evoke a [verbal] response' (brackets mine). They are, of course, referring to classroom situations rather than everyday conversation.
For readers unfamiliar with the terminology used, examples of these types of question are set out below. Note that the examples are not complete, in that full instructions as to how the student is to enter his or her response are not given.
True/false
The student is presented with a sentence, and must indicate whether it is right or wrong.
COBOL is the best programming language for 'number crunching' purposes.
Multiple choice
This consists of a partial sentence and a number of options (usually four or five). The student must select the best, most appropriate, or correct option (by typing a letter, moving a cursor block, etc). (Clearly, the context is important!)
CBT is an acronym for ________
a) COBOL Beginners Training
b) Computer Based Trauma
c) Computer Based Training
d) Computer Biased Teaching
Ranking
The student is presented with some items, which must be put into some order of precedence (by typing a letter/number next to them, moving them on screen, etc).
Arrange the following data storage devices according to speed of data access: floppy disk, RAM, hard disk, registers.
Matching
The student is given two groups of items, and must match items from each group (by joining lines, writing a letter and a number, etc).
Match one pigment to each colour.
PIGMENT      COLOUR
bistre       green
cinnabar     yellow
bice         brown
verdigris    blue
gamboge      red
Blank fill (sentence completion)
There are many varieties of this, but the basic idea is that the student inserts the missing word(s) in a partial sentence. Help can be given by showing each letter of the missing word as a dash, etc.
Another name for a RAM disk is a _ _ _ _ _ _ _ disk.
Free response
This may range from a single word to an essay. In CBT the longest practical free answer (for immediate feedback) is usually one sentence. It is possible, however, for a student to type in an essay and have it assessed by a trainer or teacher.
Much computer based training and testing has made use of the first four question types described above (i.e. true/false, multiple-choice, ranking, matching). I suggest that this is due to limitations of software technology rather than to any inherent pedagogical advantage of these types of question. I do not mean that they have no important role to play. Rather, there are situations in which their use is inappropriate or, at best, limiting.
It is a trivial matter to analyse true/false, multiple-choice, ranking, and matching questions, and most ALs deal with these adequately. For true/false and multiple-choice analysis, the program merely has to check that a response is one of a set of allowed answers. If it is, the answer itself is analysed, and appropriate action is taken depending on what it is. Usually, if an answer is not in the allowed set (for example, if a student enters a number instead of a letter), some visual and/or aural message is given, and the user re-enters the answer. There are usually pre-analysis procedures which 'clean up' a multiple-choice answer, for example by converting the case or removing leading spaces and extraneous punctuation marks. Methods which prevent students from even making this kind of error are also available. A good example is a 'pick and shoot' menu, in which the student moves a highlight bar over the available options and then presses a key to select one.
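The clean-up and re-entry cycle just described can be sketched in a few lines of Python. This is a minimal sketch, assuming that options are single lower-case letters. The function names are hypothetical. The `read` parameter merely lets the prompt be replaced for testing.

```python
import string

def clean_choice(raw, allowed):
    """Pre-analysis 'clean-up' of a multiple-choice reply.

    Strips surrounding spaces and punctuation and folds to lower case, then
    returns the option if it is in the allowed set, otherwise None.
    """
    cleaned = raw.strip(string.whitespace + string.punctuation).lower()
    return cleaned if cleaned in allowed else None


def ask_choice(allowed, read=input):
    """Keep asking until the reply is one of the allowed options."""
    while True:
        choice = clean_choice(read("Your answer: "), allowed)
        if choice is not None:
            return choice
        # The "visual message" of the text; a package might also sound a tone.
        print("Please type one of:", ", ".join(sorted(allowed)))
```

With this sketch, a reply such as " (C). " is accepted as option c. A reply of "3" produces the reminder message and a fresh prompt.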
Ranking and matching are a little more involved, in that a student may rank or match some, but not all, items correctly. The number of possible combinations may be large, and this could lead to logistical difficulties in responding to the answer.
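A numeric measure of partial correctness is at least easy to compute, even though responding to each possible answer remains the hard part. Below is a minimal Python sketch using the storage-device example given earlier. For ranking, it counts the pairs of items placed in the correct relative order (the concordant-pair count behind Kendall's rank correlation). For matching, it counts correctly matched items. Both scoring rules are my own choice for illustration. I am not claiming that any of the ALs reviewed provide them.

```python
from itertools import combinations

def ranking_score(student_order, model_order):
    """Partial credit for a ranking: pairs placed in the right relative order.

    Assumes the student has ranked every item in model_order exactly once.
    Returns (pairs in the correct order, total number of pairs).
    """
    position = {item: i for i, item in enumerate(student_order)}
    pairs = list(combinations(model_order, 2))  # each (a, b) with a before b in the model
    right = sum(1 for a, b in pairs if position[a] < position[b])
    return right, len(pairs)


def matching_score(student_pairs, model_pairs):
    """Partial credit for a matching question: number of correctly matched items."""
    return sum(1 for item, choice in student_pairs.items() if model_pairs.get(item) == choice)


fastest_first = ["registers", "RAM", "hard disk", "floppy disk"]
print(ranking_score(["RAM", "registers", "hard disk", "floppy disk"], fastest_first))  # (5, 6)
```

The score says how nearly right a ranking is, but not which remedial comment to give. That is the logistical difficulty the text refers to.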
Blank fill/sentence completion matching is halfway between the types of match analysis required for the question types considered above and that required for analysing a free response answer. It is with these two types of answer that limitations are most noticeable. The next section explores how current commercial authoring language match functions try to replicate the judgements of a teacher, and the problems involved therein.
This section details the kind of analysis that any teacher or trainer does in the classroom, when responding to answers, and relates this to the facilities that commercial match functions offer the CBT author. The classification of answer analysis is mine, based on personal experience. In the examples of the match functions that follow, I use a pseudocode which exemplifies what is to be done; each author language has its own format for doing this.
Q: Who wrote 'Thomas the Tank Engine'?
A1: The Rev. W. Awdry did.
A2: It was Awdry, wasn't it?
A3: I think it was W. Awdry.
The information that is wanted is 'The Reverend W. Awdry'. Three examples of perfectly acceptable answers have been given, but clearly there are many more.
Most authoring languages deal with this by looking for one or more keywords in a student's answer. Thus, if an ideal answer were "The cat sat on the mat", an author might select "cat", "sat", and "mat" as three keywords to be matched in the answer. If they are found in the order given, the author assumes that the student has the right idea. However, negation can cause many problems, since it can be expressed within a verb (e.g. "didn't") as well as by words such as 'no', 'not', etc.
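A minimal Python sketch of ordered keyword matching follows. It adds a naive negation check that simply rejects any answer containing a word from a short list. The list and the function names are hypothetical. The point of the sketch is to show why negation is troublesome, not to solve it.

```python
import re

# Hypothetical, deliberately short list of negating words.
NEGATIONS = {"no", "not", "never", "didn't", "doesn't", "isn't", "wasn't", "can't"}

def words_of(answer):
    return re.findall(r"[a-z']+", answer.lower())

def keywords_in_order(answer, keywords):
    """True if every keyword occurs as a whole word, in the given order."""
    words = words_of(answer)
    start = 0
    for keyword in keywords:
        if keyword not in words[start:]:
            return False
        start = words.index(keyword, start) + 1
    return True

def keyword_match(answer, keywords):
    """Ordered keyword match that rejects any answer containing a negating word."""
    return keywords_in_order(answer, keywords) and not NEGATIONS & set(words_of(answer))
```

With the keywords "cat", "sat", and "mat", the sketch accepts "The cat sat on the mat." It rejects "The mat sat on the cat." (wrong order) and "The cat never sat on the mat." (negated). It also rejects "No doubt the cat sat on the mat.", which is a perfectly good answer. This is exactly the kind of false rejection a keyword approach invites.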
Q: List the colours of the rainbow in decreasing order of wavelength.
A: Red, orange, green, blue, yellow, violet.
(Note that this is not a ranking question, since the student is required to recall, not simply rearrange information).
Two kinds of analysis are required here. One is to note that some information (i.e. indigo) is missing, and the other is to determine that some of the information is incorrectly ordered.
In commercial authoring languages, partially correct information is analysed by looking for words in an answer without paying attention to their order. To determine, for example, whether at least five out of seven items are in an answer is a relatively simple matter, if all the function has to do is return TRUE or FALSE. However, the author may wish to tell the student what the other two items were, or to take some remedial action that depends on the student's answer. Then a major difficulty must be overcome: the combinatorial explosion problem. How are all the possible combinations of two from seven (21 of them) to be catered for? The ALs that I have seen offer no answer to this. The author has to say something along the lines of "You found 5 of the items; the full list is ...".
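The missing items can be named directly, by set difference, rather than by enumerating every combination. The minimal Python sketch below does this for the rainbow example. For the ordering analysis, it uses one possible technique of my own choosing, not one taken from any AL reviewed. The items treated as out of place are those lying outside a longest subsequence that is already in the correct order. All names are hypothetical.

```python
import re

RAINBOW = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]  # decreasing wavelength

def out_of_place(found, model):
    """Items to move: those outside a longest run already in model order."""
    ranks = [model.index(item) for item in found]
    best = [1] * len(ranks)   # best[i]: length of the longest in-order run ending at i
    prev = [-1] * len(ranks)  # prev[i]: index of the previous item in that run
    for i in range(len(ranks)):
        for j in range(i):
            if ranks[j] < ranks[i] and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    keep = set()
    i = max(range(len(ranks)), key=best.__getitem__, default=-1)
    while i != -1:
        keep.add(i)
        i = prev[i]
    return [item for k, item in enumerate(found) if k not in keep]


def analyse_list(answer, model, minimum):
    words = re.findall(r"[a-z]+", answer.lower())
    found = list(dict.fromkeys(w for w in words if w in model))  # student's order, repeats dropped
    return {
        "enough": len(found) >= minimum,
        "missing": [item for item in model if item not in found],  # no need to enumerate combinations
        "out_of_place": out_of_place(found, model),
    }


print(analyse_list("Red, orange, green, blue, yellow, violet.", RAINBOW, 5))
# {'enough': True, 'missing': ['indigo'], 'out_of_place': ['yellow']}
```

Comparing position by position would flag green, blue, and yellow as wrong. The longest-subsequence rule flags only yellow, which is closer to what a teacher would say.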
A related problem arises with a certain type of multiple-choice question. Suppose, for example, that an author is looking for any two correct answers out of a group of three correct possibilities. S/he could do this by writing something like:
IF answer = (a AND b) OR (b AND c) OR (a AND c) THEN ...
However, this is tedious, and with a larger number of options it could easily lead to an author making logical errors. Accepting any three of five correct options, for example, already requires ten such terms.
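Counting replaces the explicit OR of every combination. The expression above needs one term for each way of choosing k of the n correct options. A count needs only one comparison, whatever n and k are. Below is a minimal Python sketch with a hypothetical function name. The text leaves open whether choosing a wrong option as well should still be accepted. Rejecting it by default is my assumption.

```python
def at_least_k_correct(selected, correct, k, allow_wrong=False):
    """Accept any k (or more) of the correct options.

    Equivalent to OR-ing together every k-subset of the correct options.
    By default any wrong option in the selection causes rejection; that
    policy is an assumption, since the text leaves it open.
    """
    chosen, correct = set(selected), set(correct)
    return len(chosen & correct) >= k and (allow_wrong or not (chosen - correct))
```

For correct options a, b, and c with k = 2, the sketch accepts {a, c} and rejects {b}. It also rejects {a, b, d} unless `allow_wrong=True` is passed.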
Q: Name an indicating agent that is red when basic, and colourless in acids.
In most cases (unless spelling were part of the test), a teacher would probably accept the answer as an unorthodox version of phenolphthalein, but might comment on the spelling. Similar considerations apply to punctuation, subject-verb agreement, and other errors that occur in written answers.
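One way a program might approximate the teacher's judgement of "accept, but comment on the spelling" is an edit-distance threshold. This is a swapped-in technique (Levenshtein distance), not one the text attributes to any AL. The tolerance of two errors and the misspelling used below are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute (free if equal)
        previous = current
    return previous[-1]


def spelling_verdict(answer, model, tolerance=2):
    """'correct', 'misspelt' (accept, but comment on the spelling) or 'wrong'."""
    distance = edit_distance(answer.strip().lower(), model.lower())
    if distance == 0:
        return "correct"
    return "misspelt" if distance <= tolerance else "wrong"
```

For example, `spelling_verdict("phenolpthalein", "phenolphthalein")` returns "misspelt". A fixed tolerance is crude, however. For short words it would accept "car" for "cat", which no teacher would.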
Wildcard characters are used in most commercial ALs to let an author specify that any character, or group of characters, should be accepted by the match function. For example, let ? represent any one character, and * represent an unspecified number of characters.
1. match("h?llo", student_answer)
2. match("hel*", student_answer)
Match function 1 will accept "hello", "hallo", as well as something like "hjllo". Match function 2 will accept "hello", "help", and "heliotrope".
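The two match functions above can be reproduced with a small translation from wildcards to a regular expression. This minimal Python sketch assumes that the pattern must match the whole answer; an AL might instead search within the answer. Here `match` is a hypothetical stand-in for the pseudocode, not any AL's actual function.

```python
import re

def match(pattern, student_answer):
    """Whole-answer wildcard match: ? is any one character, * is any run (possibly empty)."""
    regex = "".join(
        "." if ch == "?" else ".*" if ch == "*" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, student_answer, flags=re.DOTALL) is not None
```

Under this whole-answer assumption, "hel*" accepts "heliotrope" but not "shell". Likewise, "h?llo" accepts "hjllo" but not "hllo", since ? stands for exactly one character.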
Most ALs let the author decide whether spaces, case, and punctuation marks are to be considered in the match. TopClass [FPC 1987] takes phonetic aspects into account, and will, for example, accept 'f' in place of 'ph'.
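These author-selectable options amount to normalising both the answer and the model in the same way before comparing them. Below is a minimal Python sketch. The option names are hypothetical. The sound-alike list contains only the 'ph'/'f' example from the text and does not reproduce TopClass's actual rules.

```python
import string

# Hypothetical sound-alike rules; only the 'ph' -> 'f' example is taken from the text.
SOUND_ALIKES = [("ph", "f")]

def normalise(text, ignore_case=True, ignore_spaces=True, ignore_punctuation=True, phonetic=False):
    if ignore_case:
        text = text.lower()
    if ignore_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    if ignore_spaces:
        text = "".join(text.split())
    if phonetic:
        for spelling, sound in SOUND_ALIKES:
            text = text.replace(spelling, sound)
    return text


def same_answer(answer, model, **options):
    """Compare after applying the same author-chosen normalisation to both sides."""
    return normalise(answer, **options) == normalise(model, **options)
```

With the defaults, "  Phenol-phthalein. " matches "phenolphthalein". "fenolfthalein" matches only when `phonetic=True` is passed, because both sides then become "fenolfthalein".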
Q: According to Professor X, what is the most insidious drug that is freely available in contemporary Western society?
A4: the box
All four answers have the same meaning, apart from emotive considerations: 'the box', for example, has pejorative connotations for some people, whereas 'television' is neutral. Members of a learning group can usually recognise instantly whether a particular word or phrase is a synonym for another, and no difficulties arise.
This is usually accomplished in commercial packages by having a list of options as a model answer, any one item of which, when found in the student's answer, will cause the match function to succeed.
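A list-of-options model answer of this kind might look like the following minimal Python sketch. Matching on whole words stops "the box" from succeeding inside "the boxing match". The option "TV" is an illustrative addition of mine, and the function name is hypothetical.

```python
import re

def _normalise(text):
    """Lower-case words separated by single spaces, punctuation dropped."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def any_synonym(answer, options):
    """Succeed if any listed option appears in the answer as whole word(s)."""
    text = _normalise(answer)
    return any(re.search(r"\b" + re.escape(_normalise(option)) + r"\b", text) for option in options)


MODEL = ["television", "TV", "the box"]  # "TV" is an illustrative addition
```

The sketch accepts "It's the box, of course!" and "Watching TV.", and it rejects "Tobacco".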
In some cases an author would want to accept more than one answer from the student, for example, in a question of the type, "What three things do you consider ... ".
One such question from a CBT package [Dean Associates 1987] had the following as possible answers:
1. bonus/surveyor phoned/phone/call
2. bricklayer's bonus/head office
3. doesn't know/didn't know/did not know
This question exemplifies further areas of difficulty. Firstly, there is the problem of how the student's answer is to be input to the match function. The three answers could be collected independently and then matched independently. Alternatively, the three answers could be input parameters to a single match function. A final possibility is to collect the three answers as one group and analyse them as such.
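Analysing the answers as a group raises a subtlety. One reply can contain alternatives for two model answers: "bricklayer's bonus" also contains "bonus". The grouped analysis must therefore let each reply fill at most one model answer. Below is a minimal Python sketch that tries every assignment of replies to model answers. The names are hypothetical. Trying every permutation is adequate for three answers but grows factorially; a bipartite matching algorithm would scale better.

```python
from itertools import permutations

# Model answers from the example above: each slot lists acceptable alternatives.
SLOTS = [
    ["bonus", "surveyor phoned", "phone", "call"],
    ["bricklayer's bonus", "head office"],
    ["doesn't know", "didn't know", "did not know"],
]

def hits(answer, alternatives):
    """Plain substring test, so 'phone' also catches 'phoned' and 'telephone'."""
    answer = answer.lower()
    return any(alternative in answer for alternative in alternatives)

def slots_filled(answers, slots):
    """Most slots that can be filled when each answer may fill one slot only."""
    padded = list(answers) + [""] * max(0, len(slots) - len(answers))
    return max(
        sum(hits(answer, alternatives) for answer, alternatives in zip(chosen, slots))
        for chosen in permutations(padded, len(slots))
    )
```

The result does not depend on the order in which the three answers are typed. A single reply of "It was about the bricklayer's bonus" fills only one slot, even though it contains alternatives for two.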
Secondly, we can see that the semantic content of answer 3 centres on the verb "to know". It is inefficient for the author to have to write out its many forms; ideally, the matching functions should be able to deal with situations like this. Similarly, there could be a sub-function to cope with irregular verbs.
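A sub-function of the kind suggested might generate the forms from a base verb, with irregular verbs looked up in a table. This is a minimal Python sketch. The table entries and function names are hypothetical, and the spelling rules for regular verbs are only approximated.

```python
# Hypothetical irregular-verb table (base: past tense, past participle).
IRREGULAR = {"know": ("knew", "known"), "ring": ("rang", "rung"), "see": ("saw", "seen")}

def verb_forms(base):
    """Crude inflection of an English verb; regular spelling rules are only approximated."""
    if base in IRREGULAR:
        past, participle = IRREGULAR[base]
    else:
        past = participle = base + ("d" if base.endswith("e") else "ed")
    third_person = base + ("es" if base.endswith(("s", "sh", "ch", "x", "o")) else "s")
    if base.endswith("e") and not base.endswith("ee"):
        present_participle = base[:-1] + "ing"
    else:
        present_participle = base + "ing"
    return {base, third_person, past, participle, present_participle}

def negated_forms(base):
    """'do not know', "doesn't know", 'did not know', and so on."""
    forms = set()
    for auxiliary in ("do", "does", "did"):
        forms.add(f"{auxiliary} not {base}")
        forms.add(f"{auxiliary}n't {base}")
    return forms
```

`negated_forms("know")` covers all three alternatives of answer 3, plus "don't know" and "do not know". `verb_forms("phone")` gives phone, phones, phoned, and phoning.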
Another problem can be seen by referring to the first model answer. This is about a telephone call (the subject of which was a workers' bonus payment, the surveyor (!) being the person who made the call). If "phone" or "call" are acceptable answers, then so should be "phone-call", "phone call", "telephone", "ring", "rang", and perhaps even colloquialisms such as "bell" or "blower". Clearly, it is impractical for an author to list all the possible options, although the judicious use of wildcard characters could partially alleviate this difficulty.
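To show how far wildcards can go here, the following minimal Python sketch applies a handful of patterns word by word to the answer, using the standard `fnmatch` module. The pattern list is a hypothetical author's choice.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical author patterns for "a telephone call".
PHONE_CALL = ["*phone*", "call*", "r?ng", "bell", "blower"]

def any_word_matches(answer, patterns):
    """True if any word of the answer (hyphenated words kept whole) fits any pattern."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", answer.lower())
    return any(fnmatchcase(word, pattern) for word in words for pattern in patterns)
```

Five patterns cover "phoned", "phone-call", "telephone", "ring", "rang", and "rung". Colloquialisms such as "bell" still have to be listed one by one. The wildcards also over-accept: "saxophone" and "calligraphy" would both succeed.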
Q: What's the translation of 'tempus fugit'?
A: Time flies.
The teacher understands that 'time' is a noun, rather than the verb of juvenile humour.
Understanding of context is the area of greatest deficiency in CBT, and programs can cope only superficially, for example, by keeping track of which branches the student has already been through, or allowing a specified number of attempts at a question before entering a remedial loop. This cannot be classed as real 'understanding'. In fact, all the problems mentioned can be subsumed under this heading; machines simply do not understand the world as humans do.
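The 'specified number of attempts' mechanism is simple to express, and the simplicity is the point: no understanding is involved. Below is a minimal Python sketch. The function name and outcome labels are hypothetical. Recording which branches a student has been through would simply mean storing each outcome in a history list.

```python
def run_question(get_answer, is_correct, max_attempts=3):
    """Allow up to max_attempts tries, then send the student to a remedial loop.

    Returns (outcome, attempts used); the outcome is "continue" or "remedial".
    """
    for attempt in range(1, max_attempts + 1):
        if is_correct(get_answer(attempt)):
            return "continue", attempt
    return "remedial", max_attempts
```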
In all the cases shown above, the teacher is able to understand exactly what the student means, and can thus match (and judge as 'correct') the answer with the model that s/he has in mind. Furthermore, we know that if for any reason the teacher is unable to understand an answer, s/he is usually able to ask the student for clarification. Additionally, corrective feedback may be given, which could focus on spelling, appropriateness, analysis of how a student arrived at an answer (whether right or wrong), etc.
A CBT program can match a word, allow for some typographical errors, ignore a space, etc, but there is no underlying model of 'the real world'. Syntactically, computers can do wonderful things, but when it comes to semantics, there's a long way to go.