This is a text based conversational interface, where an artificial intelligence chats with a brand’s customers.
A ‘Conversational Interface’ is any kind of device, surface or ‘modality’ that allows a person to interact with a computer. That interaction happens by using typically human methods of interaction: spoken language, written language, gestures or a combination of any of those.
Here we will explore the definition further, look at some examples and touch on the challenges and advances that each interface brings.
The term ‘conversational interface’ can refer either to the method of interaction (voice vs text) or to the type of device you use to interact (a phone vs a website on a laptop). This can make the definition a bit confusing. As mentioned above, interfaces are also referred to as ‘modalities’. Modality is the term most commonly used when an interface combines several ways of interacting: voice and text combined is called multimodal, for example.
Interfaces come in all kinds of shapes, sizes, applications and functionalities: from a website chatbot to a Bluetooth headset with a built-in voice assistant, and from a fluffy toy that can speak with children to a refrigerator that its owner can talk with while out shopping to find out what they need to purchase.
Finally, if you chat with Gemini, Claude or ChatGPT on your laptop, you too are using a conversational interface.
Each interface comes with its own design challenges. Let’s have a look at the main types:
Text
In a text-based interface, you are designing it to work with typed natural language. Often this is further supported by other UI elements like buttons, lists, carousels, images and even video.
Voice
Here you are designing an Agent that understands human language and also responds with audio. This is more complicated, as people can speak unclearly, or with strong accents for example. Likewise, getting an AI voice to sound natural has traditionally been challenging, although LLM technology is making that easier all the time. Remember when designing for voice that it takes a lot of extra effort to get conversations back on track when they go wrong.
Multimodal (Voice and text)
This is slightly easier than using Voice alone, because people can look at the screen and use buttons, or read what was said and work out what they need to do that way.
Gestures
One can debate whether gestures should be considered ‘conversational’ - but for argument’s sake let’s just say that designing for gesture interaction entails using a narrow scope (focussing on a few journeys) and trying to utilise gestures that people commonly use. Think of Tombot - an AI-powered labrador puppy that is a companion for seniors.
Let’s start by looking at some examples of common conversational interfaces:
As you can see, most people are familiar with these interfaces, and are quite comfortable using them. Let’s look at some less common examples:
Interactive tablet based Agent in the lobby of a business for self service, for example checking in or out of a hotel, or collecting a rental car.
Robot waiter: voice-based robot waiter that takes your order in a restaurant and brings you your choices. Often a multimodal solution, featuring some kind of visual menu to support the voice interface.
App-based central heating agent: allowing you to interact with your home’s heating system whether you are at home or not. Usually text based.
Automotive in-car Agentic AI: usually voice based, and increasingly multimodal. Allowing you hands-free operation of all kinds of tools and services in your car.
As mentioned above, text-based conversational interfaces usually take the form of chatbots. Much maligned in the past, they are increasingly embraced as useful tools for customer service and communication, and even for marketing and sales.
Text based interaction is very intuitive, and because the chatbot is visually available other UI elements can be used to power the conversation effectively. Think beyond buttons to video, images, carousels, interactive lists and so forth.
Chatbots can be found in different places: on a corporate website, in an app, in social media channels such as Instagram or Facebook, and in SMS or WhatsApp.
In short, wherever people use text to communicate, there is potential for creating an Agentic conversational interface. Just make sure that you follow the CDI method and apply established design patterns to achieve the best results in your text-based Conversational Interface.
As mentioned above, audio interfaces generally present a sterner challenge than text-based interfaces. However, the results can be very good. Think of a fully interactive telephone Agent that can answer all manner of questions, including details on a person’s account for example. And, if it can’t help, this agent can route the caller to the most appropriate human service representative.
That said, in this age of GenAI there is an added challenge: GenAI agents perform best if customers ask detailed and informative questions. However, many customers have grown used to barking single-word instructions at IVRs. This leads to an insufficient prompt for the GenAI to generate a useful response. So, it’s necessary to have a fallback in which the Agent acknowledges the customer’s query and asks for more detailed information before generating a response. This can sometimes frustrate the customer.
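The fallback described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended implementation: the function names, the wording of the replies and the word-count threshold are all assumptions made for the example, and `fake_generate` is only a stand-in for whatever GenAI call your platform provides. A real Agent would likely use intent confidence or slot completeness rather than a simple word count.

```python
# Hypothetical sketch: every name and threshold here is an illustrative assumption.
MIN_WORDS = 4  # assumed cutoff below which an utterance counts as "too terse"


def needs_clarification(utterance: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when the utterance has fewer words than the assumed cutoff."""
    return len(utterance.split()) < min_words


def respond(utterance: str, generate) -> str:
    """Acknowledge terse input and ask for detail; otherwise call the generator."""
    text = utterance.strip()
    if not text:
        # Nothing usable was captured, so re-prompt politely.
        return "Sorry, I didn't catch that. How can I help you today?"
    if needs_clarification(text):
        # Acknowledge what the customer said before asking for more detail.
        return (
            f"I can help with '{text}'. "
            "Could you tell me a bit more about what you need?"
        )
    return generate(text)


def fake_generate(prompt: str) -> str:
    """Stand-in for a real GenAI call (assumption for this example)."""
    return f"[generated answer for: {prompt}]"


if __name__ == "__main__":
    print(respond("billing", fake_generate))
    print(respond("I was charged twice on my last bill", fake_generate))
```

Echoing the customer's own word back in the clarifying question is one way to soften the frustration mentioned above, since it shows the Agent did hear the request.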
Finally, as said before: apply the design patterns, make sure your Voice assistant has a suitable and clear voice, ensure that it understands queries and focus on error handling!
In this age of GenAI the technology is helping us create more and more effective Conversational interfaces. Familiar challenges remain and it’s important not to be fooled by some of the promises that GenAI seems to make: while it’s true that it’s easier than ever to build something that looks like it’s almost ‘there’, it still takes a lot of design, testing and optimization to launch an Agent that is truly helpful and delightful for your customers or employees.
Now that we’ve looked at the different kinds of Conversational Interfaces, we’d like to share a cautionary tale: with the rise of Agentic AI there have been a number of efforts to create gadgets that offer access to a GenAI Agent without much other functionality. And, although this may become a thing in the future, so far each of these initiatives has failed.
Each of these examples provides a cautionary tale about designing interfaces. The most important takeaway is that none of these products offered any clear improvement or functionality over existing smartphones or smart watches. However, we may expect certain design elements of each of these products to make some kind of return in the future.