Key Principle
Conversation design for dialogue systems rests on a foundation of linguistic pragmatics and interaction theory. Grice's Cooperative Principle supplies maxims that serve not only as design heuristics but also as an account of how humans communicate meaning beyond literal content. Speech Act Theory reframes utterances as actions with preconditions, not just text. The Natural Conversation Framework (NCF) translates Conversation Analysis findings into 100 implementable dialogue patterns. Together, these foundations determine what a conversational interface must handle: not merely parsing words, but recognizing intentions, managing mutual understanding, and supporting the structural turns that confirm comprehension between participants.
Why This Matters
Designers who treat conversation as "text in, text out" miss the mechanisms that make dialogue work. Grice's maxims are routinely applied as design rules, but "Grice's original purpose was explaining conversational implicature -- how speakers flout maxims to convey meaning beyond literal utterances" (p. 15). Systems must handle implicature, not just follow the maxims. Speech Act Theory motivates intent classification: "When people engage in conversation they do more than simply produce utterances -- they perform actions" (p. 15). The NCF demonstrates that turns like "thanks" and "you're welcome" are not social niceties but structural confirmation of mutual understanding -- without these closure turns, a system cannot distinguish a satisfied user from a silently confused one (p. 36). Ignoring these foundations produces systems that are technically functional but conversationally incompetent.
Good Examples
- The Conversational User Interface (CUI) concept unifies fragmented terminology (161 synonyms on chatbots.org). Rather than distinguishing chatbots, virtual assistants, and dialogue systems, CUI focuses on what they share: "to engage with applications in a conversational manner, i.e., by taking turns as in a dialogue" (p. 13). The terminological proliferation reflects five independent research traditions, not meaningful technical distinctions (p. 13).
- The NCF catalogs sequence expansion types that enable adaptive dialogue: preliminary (checking preconditions before the main request), inserted (user clarification, system slot-filling), and post ("Anything else?" openings). "Sequences cannot be pre-determined but evolve on a turn-by-turn basis" (p. 37). One user completes an exchange in two turns; another needs ten.
- The three-type dialogue taxonomy maps directly to implementation complexity (p. 30): user-initiated (two-turn exchanges), system-directed (slot-filling, proactive, instructional), and multi-turn open-domain. Deployed systems handle types 1 and 2; type 3 "is generally not supported in currently deployed systems but is the focus of much research" (p. 30).
- The NCF Open Request pattern (Table 1.1, p. 36) specifies: FULL REQUEST -> GRANT -> SEQUENCE CLOSER -> RECEIPT. One-shot smart-speaker interactions typically omit the two closing turns, and with them the mechanism that confirms mutual understanding.
- The Alana socialbot demonstrated cross-turn anaphoric reference, proactive topic introduction, and graceful handling of user-initiated topic switches -- capabilities that require implementing NCF patterns beyond simple slot-filling (p. 38).
- Slot-filling is the dominant task-oriented design pattern because it converts unconstrained natural language into structured data a backend can execute: "The information required to complete the transaction is gathered into a data structure containing a number of slots to be filled" (p. 26). The design tension lies in accepting over-answering (multiple slots filled in one utterance) while tracking unfilled slots (pp. 34-35).
- Dialogue design operates across three layers -- linguistic (context, topic, error recovery), social (engagement, personality, emotion), and psychological (theory of mind). Addressing only the linguistic layer produces systems that feel robotic (p. 41).
- The happy-path/deviation model reveals that deviation responses are handcrafted per use case and "are difficult to apply more generally" (p. 33), creating the brittleness that motivates the shift toward statistical approaches.
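The slot-filling pattern and its over-answering tension, described in the list above, can be sketched in a few lines. This is a minimal illustration, not an implementation from the source: the flight-booking slots, the prompts, and the regular expressions are all hypothetical, and the regexes stand in for a real NLU component (they rely on capitalized place and day names).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical slots for an illustrative flight-booking task.
# The patterns are a crude stand-in for an NLU component and only
# match capitalized values such as "Boston" or "Friday".
SLOT_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "origin": re.compile(r"\bfrom ([A-Z][a-z]+)"),
    "destination": re.compile(r"\bto ([A-Z][a-z]+)"),
    "day": re.compile(r"\bon ([A-Z][a-z]+)"),
}

# Hypothetical system prompts, one per slot.
PROMPTS: Dict[str, str] = {
    "origin": "Where are you flying from?",
    "destination": "Where are you flying to?",
    "day": "What day do you want to travel?",
}


@dataclass
class Frame:
    """The data structure whose slots must be filled before the backend call."""

    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {name: None for name in SLOT_PATTERNS}
    )

    def update(self, utterance: str) -> List[str]:
        """Fill every empty slot the utterance mentions (over-answering),
        not only the slot the system last asked about."""
        newly_filled = []
        for name, pattern in SLOT_PATTERNS.items():
            match = pattern.search(utterance)
            if match and self.slots[name] is None:
                self.slots[name] = match.group(1)
                newly_filled.append(name)
        return newly_filled

    def next_prompt(self) -> Optional[str]:
        """Return the prompt for the first unfilled slot, or None when done."""
        for name, value in self.slots.items():
            if value is None:
                return PROMPTS[name]
        return None


frame = Frame()
frame.update("I want to fly from Boston to Denver")  # fills two slots at once
print(frame.next_prompt())  # asks only for the slot that is still empty
```

The design choice worth noting is that `update` scans for all slots on every turn, so the system never re-asks for information the user has already volunteered.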
Counterpoints
- Grice's maxims are still useful as design heuristics: "These maxims are still being used widely by dialogue designers as general recommendations for how to design conversations with automated systems" (p. 15). The critical nuance is that designers should not only follow the maxims but also anticipate how users will flout them.
- Speech Act Theory's formal apparatus (locutionary, illocutionary, perlocutionary acts) proved computationally complex and "in the worst case intractable" when implemented as plan-based models (p. 16), driving the field toward simpler statistical intent classification. The theory's insight survives in modern intent/entity classification even though its formal machinery does not.
- The NCF's 100 patterns may over-specify for simple task-oriented systems where slot-filling suffices. The framework's value scales with dialogue complexity -- it is most critical for socialbots and open-domain systems.
- Neural fluency can mask the absence of conversational structure: Meena generates plausible responses but "about one third of [conversations] degenerated into cross-turn repetitions" (p. 40) because it lacks the repair and grounding mechanisms the NCF catalogs.
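One crude way to surface the cross-turn repetition mentioned in the last point is to compare a candidate response against earlier turns by n-gram overlap. This is only an illustrative sketch: the function names, the trigram size, and the 0.5 threshold are assumptions, and it is not how Meena was evaluated nor a substitute for the repair and grounding mechanisms the NCF catalogs.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a turn."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def repeats_earlier_turn(
    history: List[str],
    candidate: str,
    n: int = 3,              # assumed n-gram size
    threshold: float = 0.5,  # assumed cut-off, would need tuning
) -> bool:
    """Flag a candidate whose n-grams mostly reappear from one earlier turn."""
    cand = ngrams(candidate, n)
    if not cand:
        # Turns shorter than n words carry no n-grams; do not flag them.
        return False
    for turn in history:
        overlap = len(cand & ngrams(turn, n)) / len(cand)
        if overlap >= threshold:
            return True
    return False


history = ["I love hiking in the mountains", "What do you like to do?"]
print(repeats_earlier_turn(history, "I love hiking in the mountains too"))
print(repeats_earlier_turn(history, "Tell me about your favourite trail"))
```

A check like this only detects surface repetition after the fact; it does not supply the conversational structure whose absence causes it.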
Key Quotes
"Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged." (p. 15)
"When people engage in conversation they do more than simply produce utterances -- they perform actions." (p. 15)
"Achieving mutual understanding is an important objective in conversational interaction and motivates conversational turns that are not part of the informational and transactional elements of a conversation." (p. 36)
"Sequences cannot be pre-determined but evolve on a turn-by-turn basis as a result of the interactional work by the participants." (p. 37)
"Achieving sustained, coherent and engaging dialog is the next frontier for Conversational AI." (p. 35)
"Rather than attempting to tease out fine distinctions between all these different terms, it is more productive to focus on what all of the terms mentioned here have in common." (p. 13)
"The ability to converse freely in natural language is one of the hallmarks of human intelligence, and is likely a requirement for true artificial intelligence." (p. 35)
Rules of Thumb
- Apply Grice's maxims as design guidelines, but also build systems that can interpret users who flout them. Implicature is not a bug; it is how humans communicate indirectly.
- Design for sequence expansion: any interaction point may require preliminary, inserted, or post-expansion turns. Rigid scripts break at the first deviation from the happy path (p. 37).
- Include closure turns (confirmation, receipt) in dialogue design. Without them, the system cannot confirm mutual understanding (p. 36).
- Use the three-type taxonomy (user-initiated, system-directed, open-domain) to select architecture. The sub-types within system-directed -- proactive, instructional, and slot-filling -- require fundamentally different dialogue management approaches (p. 30).
- Model dialogue complexity as a spectrum: one-shot, extended one-shot (slot replacement + anaphora), system-directed, and multi-turn open-domain. Each level demands new architectural capabilities (pp. 31-35).
- Recognize anaphora resolution as a qualitative jump: "Anaphora resolution is a very hard problem, especially in long multi-turn conversations" (p. 32). Even extending one-shot exchanges to handle natural follow-ups requires substantial NLP infrastructure.
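The rules on closure turns and sequence expansion can be combined in a small check that reports which base turns of the Open Request pattern are still outstanding. This is a sketch under stated assumptions: it presumes an upstream component has already labeled each turn, the function names are hypothetical, and the expansion labels `CLARIFICATION REQUEST` and `CLARIFICATION` are invented for illustration rather than taken from the NCF.

```python
from typing import Iterable, List

# Base turns of the Open Request pattern, in order, as listed in these notes.
OPEN_REQUEST: List[str] = ["FULL REQUEST", "GRANT", "SEQUENCE CLOSER", "RECEIPT"]


def missing_turns(observed: Iterable[str]) -> List[str]:
    """Return the base turns not yet seen, in pattern order.

    Any label that is not the next expected base turn is treated as an
    expansion turn (for example an inserted clarification) and skipped,
    so the same sequence may take four turns or many more.
    """
    expected = 0
    for label in observed:
        if expected < len(OPEN_REQUEST) and label == OPEN_REQUEST[expected]:
            expected += 1
    return OPEN_REQUEST[expected:]


def is_closed(observed: Iterable[str]) -> bool:
    """True once both closing turns have followed the request and grant."""
    return not missing_turns(observed)


# A one-shot exchange stops after the grant, so closure is never confirmed.
one_shot = ["FULL REQUEST", "GRANT"]
print(missing_turns(one_shot))

# An expanded exchange with a hypothetical inserted clarification still closes.
expanded = [
    "FULL REQUEST",
    "CLARIFICATION REQUEST",  # hypothetical expansion label
    "CLARIFICATION",          # hypothetical expansion label
    "GRANT",
    "SEQUENCE CLOSER",
    "RECEIPT",
]
print(is_closed(expanded))
```

A dialogue manager could use the list of missing turns to decide whether to keep the sequence open, for instance by prompting for confirmation when no sequence closer arrives.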
Related References
- core-framework.md -- the three-paradigm progression that contextualizes these design principles
- pipeline-architecture.md -- the modular architecture implementing these conversational patterns
- toolkits-and-platforms.md -- the development tools that operationalize these design principles