Relevance, Contextual Effects and Least Effort

Trevor Pateman

Abstract: A fairly technical review of the first edition of Sperber and Wilson's Relevance which includes discussion of criticisms by Gazdar and Good. Focus is on concepts of relevance, context, and least effort. Material of interest to students of literary theory is given prominence in the second half of the review, with cross-references to Bakhtin/Volosinov.

Dan Sperber and Deirdre Wilson, Relevance, Communication and Cognition. Oxford: Basil Blackwell, 1986. viii + 279 pp.

Twenty years ago I heard this story: A European economist had gone to Cuba to assist the new revolutionary government. He found himself blocked at every turn. In despair, he went to see the Minister. "I'm an economist," he said. "My job is to show how to get maximum output with minimum input." The Minister smiled: "But we are revolutionaries. We want maximum output with maximum input."

This is not, I suspect, the book everyone has been waiting for. It is an introduction to a larger project in which Sperber and Wilson (hereafter, SW) are engaged. They tell us that "The substance of two more books, one on pragmatics, the other on rhetoric, is already on paper and, duly revised, might even go into print" (p. viii). This project claims to be nothing less than "a new approach to the study of human communication" (p. vii), grounded in a general view of human cognition as "geared to achieving the greatest possible cognitive effect for the smallest possible processing effort" (p. vii). Communication connects to cognition insofar as communication is driven by a principle of relevance which, SW claim, guides utterance making and interpretation.

The story sounds very much like my economist's and, indeed, many of SW's metaphors for cognitive processes derive from economics or, more generally, utilitarianism. (Wilson, incidentally, studied both economics and philosophy in her first degree.) There is passing recognition of the Cuban Minister's revolutionary sentiment in the notion that the level or amount of processing we expect to engage in varies with context - higher in a seminar (or, I would add, reading a nouveau roman) than in a pub conversation (or, reading a historical romance).

But this point is not developed and the theory of relevance is expounded in the context of a global or undifferentiated least effort assumption: speakers do no more than they have to to get their message across and, if they appear to do more, it is because the message is richer than was assumed. Hearers, likewise, interpret utterances in the first way that occurs to them, assuming that this will yield the message intended, and backtrack only if the first-shot interpretation is inadequate (is irrelevant or only weakly relevant), as, for example, when we fail to "get" a joke. (See endnote 1)

Chapter 1 "Communication" contrasts two models of verbal communication. The code model says that communication is achieved by encoding and decoding messages. The inferential model of communication says that communication is achieved by producing and interpreting evidence and, specifically, evidence of the communicator's intentions, which are differentiated into informative and communicative intentions by SW. The code model is inadequate because comprehension (understanding) involves more than decoding the linguistic signal, which is a coded pairing of phonological and semantic representations. SW assume that the code is shared by communicators ("members of the same linguistic community converge on the same language," p. 16), though this is not a necessary assumption (see endnote 2). In addition to recovering the coded pairing of sound and meaning, hearers also have to go beyond the code to get from a sentence's semantic representation to the thought it serves to express when the sentence is used in an utterance. One of SW's key ideas is that sentences generally provide only incomplete and ambiguous representations of thoughts (see endnote 3) and, later on (p. 193), they cast doubt on the principle of effability (the principle that for every thought, there is a corresponding eternal sentence). Denial of the principle of effability is connected to (and, it seems to me, required by) their more fully developed view that semantic representations are not connected to the thoughts they are used to express by means of a code. Mutual knowledge approaches to understanding are precisely attempts to extend the code approach from the level of language to the level of context but they are untenable, argue SW. Semantic representations are linked to thoughts not by means of the deployment of (coded) mutual knowledge but by means of inferences which necessarily remain non-demonstrative: they necessarily involve an element of abduction (hypothesizing). Such inferences aim to recover speakers' intentions. In arguing this way, SW's approach is inspired by the approach of H. P. Grice in Logic and Conversation.

Two connections with literary theory and, more generally, poetics can be made at this point. First, SW's approach aligns them with that of E.D. Hirsch, to whom their emphasis on the non-demonstrative nature of the link between semantic representation and thought would be congenial. Recall, for example, this passage from The Aims of Interpretation, where Hirsch's "meaning" is roughly equivalent to SW's "thought":

"Speech-act theory, in the form developed by Grice and Strawson, reasserts the linguistic priority of intention and hence of mind. It asserts the indeterminacy and hence the partial independence of meaning with respect to form and to convention. It follows that a guess [an abduction - TP] about intention is in principle a permanent feature of interpretation which no methodological system [e.g., of stylistics - TP] could ever remove. The guess itself cannot be fully determined by stylistic features, nor can stylistic features definitely confirm the guess concerning intended meaning" (p. 71).

Second, SW's approach would oppose them (like Hirsch) to such practitioners of stylistics as Roger Fowler and his associates (Fowler et al. 1979; Kress and Hodge 1979) who do indeed seem to think both that we could lengthen sentences to the point where they completely and unambiguously express our thoughts and also that this would be a desirable practice, since it would reduce the likelihood of communicative failure. SW would agree that, on their own approach, it is communicative success rather than communicative failure which requires explanation (p. 45) but they do not agree that increasing the explicitness of our sentences is a possible, necessary or desirable way of achieving such success. (See endnote 4)

To return to SW. They spend a good part of their first chapter building an account of context which avoids what they see as the weaknesses of the mutual knowledge approach to context. The leading idea is to analyze saying via showing (ostension) where what is ostended by a speaker comes with a tacit guarantee of relevance to the hearer. The upshot is a definition of ostensive-inferential communication in which "the communicator produces a stimulus which makes it mutually manifest to communicator and audience that the communicator intends, by means of this stimulus, to make manifest or more manifest to the audience a set of assumptions { I }" (p. 63). In this definition, an assumption is taken to be something an individual is capable of mentally representing and accepting as true.

Though SW deny that the inference from semantic representation to thought expressed is a demonstrative inference, they do believe that deductive inference plays a part in utterance understanding and they devote the second chapter of their book to a sketch of their theory of deductive inference. They need such a theory for the straightforward reason that, on most accounts of deductive inference, inferences from sets of premises newly conjoined in a hearer's mind (as new information in the context of old information) expand to infinity and total triviality. On both those counts, any account of deductive inference with such consequences must be psychologically implausible. They propose to solve this problem by means of a model of deduction, claimed to be psychologically realistic, in which introduction rules are barred and only elimination rules permitted. So in their model of our deductive device (which they see as operating spontaneously, unconsciously and automatically), it is not possible to derive the assumption P or Q from the assumption P. It is only possible to deploy elimination rules, such as conjunctive modus ponens and disjunctive modus ponens (p. 99). In support of their claim that such a model of deduction is psychologically realistic, they cite Rips (1983).

The function of the deductive device within the overall context of non-demonstrative inference is to derive the contextual effects of newly presented information { P } in a context of old information { C }. The most important subclass of contextual effects are contextual implications, defined as follows (see endnote 5 for other sub-classes of contextual effects):

A set of assumptions {P} contextually implies an assumption Q in the context {C} if
  1. the union of {P} and {C} non-trivially implies Q,
  2. {P} does not non-trivially imply Q, and
  3. {C} does not non-trivially imply Q.
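The three clauses can be made concrete with a small sketch. It is an illustration only, not SW's own formalism: the tuple encoding of formulas, the function names, and the choice of just two elimination rules (and-elimination and plain modus ponens) are assumptions made here for brevity, and "non-trivially implies" is approximated as "derivable using elimination rules only".

```python
# Formulas (hypothetical encoding): an atom is a string;
# ("and", a, b) is a conjunction; ("if", a, b) is a conditional.

def closure(assumptions):
    """Close a set of formulas under two elimination rules only
    (and-elimination and modus ponens). No introduction rules are
    applied, so every derived formula is a subformula of an input
    and the closure is finite."""
    derived = set(assumptions)
    changed = True
    while changed:
        changed = False
        for f in list(derived):
            new = set()
            if isinstance(f, tuple) and f[0] == "and":
                new.update([f[1], f[2]])      # and-elimination
            elif isinstance(f, tuple) and f[0] == "if" and f[1] in derived:
                new.add(f[2])                 # modus ponens
            if not new <= derived:
                derived |= new
                changed = True
    return derived


def contextually_implies(P, C, q):
    """Clauses 1-3 of the definition, read as jointly sufficient."""
    return (
        q in closure(P | C)        # 1. the union of P and C yields q
        and q not in closure(P)    # 2. P alone does not
        and q not in closure(C)    # 3. C alone does not
    )


new_info = {"it is raining"}
context = {("if", "it is raining", "the match is off")}
print(contextually_implies(new_info, context, "the match is off"))  # True
print(contextually_implies(new_info, context, "it is raining"))     # False
```

The second call returns False because the assumption is already available from the new information alone, so clause 2 fails.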

SW are now in a position to characterize their central notion of relevance in terms of the generation of contextual effects, and the discussion of relevance provides the subject matter of Chapter 3.

In Chapter 3, SW argue that for an utterance - or, more precisely, an assumption ostended by an utterance - to be relevant, it is necessary and sufficient for it to have some contextual effects and, all other things (e.g., processing costs) equal, the greater the contextual effects, the greater the relevance. Thus, relevance can be defined as follows:

Extent condition 1: an assumption is relevant in a context to the extent that its contextual effects in the context are large.
Extent condition 2: an assumption is relevant in a context to the extent that the effort required to process it in this context is small (p. 125; compare p. 145).

Two points can be made at this stage about this definition of relevance.

First, it is to be noted that SW reject the search for a quantitative concept of relevance on the grounds that it would not be psychologically realistic; they say that a version of such an approach was "once attributed to us by Gazdar and Good 1982" (p. 261). Gazdar and Good's attribution struck me as a reasonable interpretation of the goals of SW 1982a, nor did SW 1982b challenge its quantitative approach. So I think it fair to see the present definition of relevance, cited above, as involving a weakening of the theory aspired to in SW 1982a. (See endnote 6 for discussion). In particular, as SW accept, the specification of relevance which they now give will leave the relevance of some instances undecidable - "clear comparisons [are] possible only in some cases" (p. 125) - since contextual effects can be increased or decreased by varying processing cost, which is an independent variable. I take it that a goal of SW 1982a was to avoid such undecidability (see endnote 7) and, as I argue below, for the current definition of relevance to be at all useable, substantive assumptions about processing cost have to be made by SW.
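The undecidability can be seen in a toy comparison. The sketch below treats the two extent conditions as a purely comparative ordering; the integer ranks for effects and effort are hypothetical stand-ins introduced here (SW offer no such scale), and the names `Processing` and `compare_relevance` are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processing:
    effects: int  # hypothetical ordinal rank of contextual effects
    effort: int   # hypothetical ordinal rank of processing effort


def compare_relevance(a: Processing, b: Processing) -> str:
    """Compare two assumptions using only the two extent conditions.

    A verdict is returned only when neither condition pulls against the
    other; otherwise the comparison is left undecided.
    """
    a_no_worse = a.effects >= b.effects and a.effort <= b.effort
    b_no_worse = b.effects >= a.effects and b.effort <= a.effort
    if a_no_worse and b_no_worse:
        return "equal"
    if a_no_worse:
        return "first"
    if b_no_worse:
        return "second"
    return "undecided"


print(compare_relevance(Processing(3, 1), Processing(2, 1)))  # first
print(compare_relevance(Processing(3, 2), Processing(2, 1)))  # undecided
```

In the second call the first assumption has greater effects but also costs more effort, so without some further assumption about how effort trades against effects no ranking follows.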

Second, SW get involved in a real difficulty about processing costs arising from the fact that, as contextual effects increase, processing effort automatically increases. They have to say that such processing is worth the effort - as they put it, "Except when they are in a state of utter exhaustion, humans find thinking worth the effort. We can therefore draw the empirical conclusion that the processing effort needed simply to [mentally] write down a contextual implication or to raise or lower the strength of an assumption is not enough to offset the contribution thereby made to relevance" (p. 126). (See endnote 8). But this point is difficult to square with and may be in contradiction to the theory of processing which I shall come to in a moment, a theory which fixes cost at a level where it deters processing.

SW consider in Chapter 3 the question of how context { C } is determined. They reject the view that context is given by previous utterances (Section 3.4) and argue that context is determined by the search for relevance (p. 141). The problem which then arises is simply this: How do you know which { C } is optimal from the point of view of relevance without actually computing the contextual effects of all possible contexts, a computation which will incur (large) processing costs? In other words, each possible setting of { C } yields a different value for the variables, contextual effects and processing effort and there is no way of seeing what optimizes relevance without doing the computations for all settings of { C }.

SW cut through this problem in a radical fashion. They argue that the correct interpretation of an utterance (an ostensive stimulus) is the first spontaneously derived interpretation consistent with the principle of relevance, which specifies that an utterance is relevant if and only if it has some contextual effects (see pp. 166-168). Two points can be made about this solution.

First, their solution is an application of a principle of least effort, constrained but not guided by considerations of contextual effects. Their solution can be represented as the ordered application of two principles, rather than the application of a principle of relevance. One can see this by imagining an automaton built to implement SW's proposal. It would proceed as follows:

  1. Given { P } (= new stimulus or information), compute the interpretation which arises in the first { C } (=context) which occurs to you.
  2. Check that the result of (1) has some contextual effects, as defined at pp. 107-108 quoted above.
  3. If the answer to (2) is "Yes," halt.
    If the answer to (2) is "No," reinterpret { P } for the next { C } which occurs to you.

Here (2), which has to do with contextual effects, is simply a constraint on the operation of (1), which has to do with avoiding effort. Neither is a principle of relevance. Contexts are abandoned if they prove to be irrelevant; they are not determined by the search for relevance.
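The three-step automaton described above can be written out as a minimal sketch. The function and parameter names are hypothetical, the ordering of contexts by accessibility is simply taken as given, and the test for contextual effects is passed in as a black box; nothing here is SW's own procedure.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_shot_interpret(
    stimulus: str,
    contexts_by_accessibility: Iterable[str],
    has_contextual_effects: Callable[[str, str], bool],
) -> Optional[Tuple[str, int]]:
    """Return the first context in which the stimulus has some contextual
    effects, together with the number of contexts tried, or None if every
    context is exhausted without success."""
    for attempts, context in enumerate(contexts_by_accessibility, start=1):
        # Step 1: interpret in the next context that "occurs to you".
        # Step 2: check for contextual effects (a constraint, not a guide).
        if has_contextual_effects(stimulus, context):
            return context, attempts  # Step 3: "Yes" -> halt.
        # Step 3: "No" -> abandon this context and try the next one.
    return None


# Toy data: the stimulus has contextual effects only in C2 and C3.
effective = {"C2", "C3"}
result = first_shot_interpret(
    "P", ["C1", "C2", "C3"], lambda stimulus, context: context in effective
)
print(result)  # ('C2', 2)
```

Note that C3 is never examined, even if it would have yielded greater contextual effects than C2: the loop is ordered by accessibility alone, and the effects test only decides when to stop.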

Second, we could argue that, whereas in relation to processing effort arising automatically from computing extra contextual effects, SW set processing costs (arbitrarily) at a low level, here processing costs have been set (arbitrarily) high. This can be seen if we ask the question: why should it not be optimally relevant to compute contextual effects of { P } for two settings of { C } and then select the value for { C } which yields the greater contextual effects? Obviously, such a procedure involves extra processing costs but there is no a priori reason why the net benefits from a two-shot interpretation procedure should be less than those from a one-shot procedure, since the latter obviously risks missing (important) contextual effects. It may be true, as a matter of psychology, that we engage in first-shot processing but it is not clear that attributing this to an assumed high level of processing costs is at all explanatory. It is simply what you have to say, retrospectively, to fit the theory to the fact and, if so, is as vacuous as saying "People always do what they want" to explain what people do. (Compare Gazdar and Good 1982, pp. 97-98.) At this point in their exposition, SW are setting processing costs high enough to "explain" first-shot processing; earlier, they set processing costs low enough to justify the costs which automatically arise from processing contextual effects themselves.
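The comparison between one-shot and two-shot procedures can be put as a simple inequality. This is a sketch under an assumption SW themselves resist, namely that contextual effects and processing costs can be placed on a single scale; the symbols $e_i$ (contextual effects of { P } in the $i$-th context tried) and $k_i$ (cost of processing { P } in that context) are introduced here for illustration only.

```latex
\[
\underbrace{\max(e_1, e_2) - (k_1 + k_2)}_{\text{two-shot net benefit}}
\;>\;
\underbrace{e_1 - k_1}_{\text{one-shot net benefit}}
\quad\Longleftrightarrow\quad
\max(e_1, e_2) - e_1 \;>\; k_2 .
\]
```

So the two-shot procedure wins exactly when the second context adds more in contextual effects than it costs to process, that is, when $e_2 - e_1 > k_2$. Nothing in the theory fixes $k_2$ so high that this can never happen.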

It remains to consider how SW think { C } is set in the first place, since the first (most accessible) setting for { C } plays such an important part in their model of on-line utterance interpretation. One could wish for more on this topic. (See endnote 9). What I take SW to be saying (e.g., p. 138) is that the most accessible { C } is given by the encyclopaedic script, frame or prototype (see endnote 10) currently in the hearer's short-term memory at the moment of utterance or else made accessible by the utterance itself - which then creates its own context (by means which are far from clear). Contrary to expectations some may have had, the theory of relevance appears to offer no fresh light on the frame or context-setting problem, except as an account of context abandonment.

SW's longest chapter, chapter 4, is titled, "Aspects of Verbal Communication" and contains a great deal of interesting discussion on a variety of topics, not all of it directly concerning the principle of relevance and not, as far as I can see, lending itself to smoothly integrated treatment in the context of a review like the present one. The topics discussed include propositional form, parsing, metaphor, irony and speech act theory.

Early in the chapter, there is a good example of a least effort principle used to do explanatory work independent of considerations of relevance. This occurs in SW's sketch of a theory of on-line, left-to-right parsing. They ask why, for example, in the absence of contextual information, (1) is typically parsed as (2a) rather than (2b) (p. 184):

1) I saw that gasoline can explode
2a) I saw that it is possible for gasoline to explode
2b) I saw that can of gasoline explode.

They note (p. 187) that when the parser reaches "that," it has to decide whether "that" is a complementizer or a demonstrative and they argue, "Demonstrative determiners need a particular type of context: one created by pointing, for instance. In an artificial situation, the complementizer interpretation which does not need an ad hoc context, is less effort-consuming and will be preferred" (p. 187) - since the demonstrative interpretation requires the effort of creating a context. In other words, doing what comes easily is doing what comes naturally, or vice versa.

Particularly interesting for the student of poetics is the development of a notion of weak implicature, by means of which SW respond to their own view that "we see it as a major challenge for any account of human communication to give a precise description and explanation of its vaguer effects" (p. 57). Compare the utterances (3) and (4) (from p. 221):

3) My childhood days are gone.
4) My childhood days are gone, gone.

SW argue that to be optimally relevant in a context, (4) must imply some contextual effects additional to those implied by (3), since (4) involves more processing effort than (3). (See endnote 11.) However, SW argue that to maintain relevance, though the speaker of (4) must intend to produce some such additional contextual effects, she need not intend any specific contextual effect which the hearer can infer on the speaker's authority. As they put it, "there may be no cut-off point between assumptions strongly backed by the speaker, and assumptions derived from the utterance but on the hearer's sole responsibility" (p. 199). Such weak implicatures are a central element in the creation of poetic effects; they play the part in SW's theory that connotation plays in semiological theories.

SW's theory of metaphor and irony will also be of interest to students of poetics. They treat metaphor as a descriptive use of language, on a continuum with literal uses, which represents a state of affairs in virtue of its propositional form being true of that state of affairs. In contrast, irony is an interpretative use of language which involves a representation of a representation - specifically, it is an echoic use of language. In earlier publications by SW, notably SW 1981, irony was said to involve mention rather than use of linguistic forms. The present account simply substitutes "interpretation" for "mention" (see n. 25 on pp. 263-264). Both accounts will be congenial to Bakhtinians and I attempt to draw some connections between Bakhtin and SW in my essay on Bakhtin/Volosinov, also to be found on-line on this site.

In conclusion, let me say that SW's book represents a significant contribution to pragmatics and poetics. Its virtues are also its weaknesses, for in aiming at an all-round simplification of the theories of communication and cognition in terms of relevance, I do not think SW entirely avoid the dangers of vagueness and even vacuousness. I think that there are unresolved tensions between the idea of relevance as contextual effects and the idea of least effort processing. But I suspect that some of the interesting things SW say about weak implicature, metaphor and irony could be treated as independent arguments by anyone who is not wholly convinced of the truth of the general theory, on which, undoubtedly, much more forceful criticisms will be brought to bear than those I have tried to articulate here. (See endnote 12)


Fowler, Roger et al., 1979. Language and Control (London: Routledge and Kegan Paul).

Fowler, Roger, 1986. Linguistic Criticism (Oxford: Oxford UP).

Gazdar, Gerald and Good, David, 1982. "On a Notion of Relevance. Comments on Sperber and Wilson's Paper," in: Smith 1982, 88-100.

Hirsch, E.D.Jr., 1976. The Aims of Interpretation (Chicago: University of Chicago Press).

Kress, Gunther and Hodge, Robert, 1979. Language as Ideology (London: Routledge and Kegan Paul).

Levinson, Stephen, 1979. "Activity types and language," Linguistics, 17, 365-399.

Pateman, Trevor, 1982a. "David Lewis's Theory of Convention and the Social Life of Language," Journal of Pragmatics, 6, 135-157. (Also in Pateman 1987, chapter 7.)

1982b "Sperber and Wilson on Gazdar and Good: A Comment," Unpublished paper.

1983 "How is Understanding an Advertisement Possible?" in: H. Davis and P. Walton, eds., Language, Image, Media 187-204 (Oxford: Basil Blackwell).

1987 Language in Mind and Language in Society (Oxford: Oxford UP).

1989 "Bakhtin/Volosinov: Pragmatics in Semiotics." Now online on this site at www.selectedworks.co.uk/pragmatics.html

Rips, L., 1983. "Cognitive processes in propositional reasoning," Psychological Review, 90, 1, 38-71.

Smith, Neil V., ed., 1982. Mutual Knowledge (London: Academic Press).

Sperber, Dan, 1975a. Rethinking Symbolism (Cambridge: Cambridge UP).

1975b "Rudiments de rhetorique cognitive," Poetique, 23, 389-415.

1985 On Anthropological Knowledge (Cambridge: Cambridge UP).

Sperber, Dan and Wilson, Deirdre, 1981. "Irony and the Use-Mention Distinction," in: P. Cole, ed., Radical Pragmatics, 295-318 (New York: Academic Press).

1982a "Mutual Knowledge and Relevance," in: Smith 1982, 61-85.

1982b "Reply to Gazdar and Good," in: Smith 1982, 101-110.

Volosinov, V.N., 1973. Marxism and the Philosophy of Language (New York: Seminar Press).

Wilks, Y., 1985. Relevance, Points of View and Speech Acts: An Artificial Intelligence View. Memoranda in Computer and Cognitive Science, MCCS-85-25 (Las Cruces, New Mexico: Computing Research Laboratory, New Mexico State University).


1. Compare also such humor as "Where do you expect to find a legless tortoise?" - "Exactly where you left it." Here it could be said that the question is unanswerable in the first context, available or accessed by the result of left to right parsing but that the answer becomes obvious and trivial (and repellent) in the second context available or accessed by the actual giving of the answer by the joke teller. Whether such examples count for or against SW's theory is a point touched upon later in this review when I briefly consider SW's remarks on parsing. (I owe the joke to Fiona Sparks.)

2. This assumption of SW is unnecessary. The inferential approach to communication allows us to explain how communicators can succeed in communicating by means of non-identical codes. See Pateman 1982a. But see also SW 1982a, pp. 68-69 for similar sentiments.

3. SW's thoughts are roughly equivalent to the theme of Volosinov 1973.

4. Fowler's continuing commitment to the stylistics viewpoint can be gleaned from this quotation from his latest, rather muddled, book in which he quotes from David Lodge then adds his own comment: "The novelist's medium is language: whatever he does, qua novelist, he does in and through language. It follows that whatever the writer 'does' can be shown by analysis of the language" (Fowler 1986, p. 2). Not only is Fowler's comment a non sequitur; for SW, Hirsch and others, it is plainly false.

5. The other sub-classes of contextual effects are strengthenings of assumptions and contradictions between assumptions.

6. SW 1982a say, for example, that "We assume that it is possible to compare two sets of premises for the number of non-trivial implications they have. We shall maintain that this is a crucial factor in assessing the relevance of an utterance" (p. 72) and they go on to speak of a standard of "maximal relevance" (p. 74) where "Degrees of relevance depend on a ratio of input to output, where output is number of contextual implications, and input is amount of processing needed to derive these contextual implications" (p. 74). Again, "of two utterances that take the same amount of processing, it is the one with the most contextual implications that will be the more relevant; and of two utterances which have the same number of contextual implications, it is the one which takes the least amount of processing that will be the more relevant" (p. 74). These formulations can be compared with those in the book under review e.g., at pp. 125-126. In SW 1982a, the principle of relevance takes the form "The speaker tries to express the proposition which is the most relevant one possible to the hearer" (p. 75); in the present book this has become "Every act of ostensive communication communicates the assumption of its own optimal relevance" (p. 158). (I take it that the word "optimal" is used as it is by economists, such that maximization is one form of optim(al)ization but not the only one.)

7. Though I do not think they were successful then (for reasons I tried to spell out in Pateman 1982b).

8. An economist would suggest an alternative formulation at this point: processing continues until the marginal cost of extra processing equals the marginal benefit of the extra contextual effects generated by it.

9. Gazdar and Good 1982 consider the idea of first-shot interpretation and point out that, for this to be optimal, "would require that the most likely reading/context pairing would be processed first," adding the claim that "Such a tactic is not available [to SW] though since it would require at least a partial solution to the problem of finding the context that their machinery was intended to solve" (p. 94). The present book's contribution to the question of context-setting should be read with this claim in mind.

10. Regrettably, SW do not mention Levinson's concept of activity type as having any part to play in the setting of context, yet it seems to me ideally suited to this role. See Levinson 1979 and Pateman 1983.

11. I should, however, like to make the following observation. If one asks what alerts the hearer to the character of (4) as implying more contextual effects than would an utterance of (3), I suggest that the answer is not that it is longer but that it is stylistically marked for poeticalness. The hearer could then be thought of as operating a rule of thumb maxim: Put more effort into processing poetic-looking utterances. The trouble with such rule of thumb maxims is that, though supposedly derived from some higher level general principles, they can in fact be or become principles in their own right; the higher level principle drops out of the picture the more resilient such maxims become. The problems which this creates are well known to students of the rule-act controversy in the interpretation of utilitarianism. There the problem is whether it is individual cases or only classes of cases which ought to be judged in terms of their 'greatest happiness'-producing effects. It seems to me that similar problems may affect SW's work.

12. In addition to criticisms of SW 1982a in Smith 1982, see also Wilks 1985 (kindly drawn to my attention by Gerald Gazdar) which is relevant to this review in its explicit commitment to a "least effort" approach to information processing.

LIGHTLY REVISED FROM THE VERSION APPEARING UNDER THE SAME TITLE in Poetics Today, vol 7 number 4, 1986, pp 745 - 754. Copyright in the original, The Porter Institute for Poetics and Semiotics, Tel Aviv University