Listen to Podcast Audio
In this episode, we get to talk with Preston So, Senior Director of Product Strategy at Oracle. We talk to Preston about his new book VOICE CONTENT AND USABILITY. We discuss the concepts of building conversational designs that are ethical, accessible, and usable.
✨ Episode Sponsor
- Auth0: https://auth0.com
- Auth0 on YouTube: https://www.youtube.com/auth0
- Auth0 on Twitch: https://www.twitch.tv/auth0
- Auth0 Avocado Labs online meetup events: https://avocadolabs.dev/
🔗 Episode Links
- Preston’s new book – Voice Content And Usability: https://abookapart.com/products/voice-content-and-usability
- Publisher: https://abookapart.com/
- Preston on Twitter: https://twitter.com/prestonso
- Preston’s Website: https://preston.so/
- Preston on LinkedIn: https://www.linkedin.com/in/prestonso/
- Oracle: https://www.oracle.com/
- Previous episode – 🪓 Headless CMS, Decoupling Drupal with Gatsby, & Conversational Design with Preston So https://www.thundernerds.io/2020/06/headless-cms-decoupling-drupal-w-gatsby-conversational-design-w-preston-so/
- Ask GeorgiaGov: https://georgia.gov/chat
- Google Cloud Dialogflow: https://cloud.google.com/dialogflow
- Diglossia: https://en.wikipedia.org/wiki/Diglossia
- Word by Word: The Secret Life of Dictionaries: https://www.amazon.com/Word-Secret-Life-Dictionaries/dp/110187094X
- Conversations with Things: UX Design for Chat and Voice: https://www.amazon.com/Conversations-Things-Design-Chat-Voice/dp/1933820268/ref=sr_1_1
- Invisible Man: https://www.amazon.com/Invisible-Man-Ralph-Ellison/dp/0679732764
- Gatsby: The Definitive Guide: https://preston.so/books/gatsby/
Brian Hinton: [00:00:00] I’m Brian Hinton.
Frederick Weiss: and I’m Frederick Philip von Weiss. And thank you so much for consuming the Thunder Nerds, a conversation with the people behind the technology that love what they do
[00:00:46] Brian Hinton: [00:00:46] and do tech good.
[00:00:52] Frederick Weiss: Yeah, thanks everybody for watching the show. If you can please go to the notification bell and subscribe.
Brian Hinton: We’d like to thank Auth0. Auth0 is this season’s sponsor. They make it easy for developers to build a custom, secure, and standards-based login by providing unified login and authentication as a service. To try them out, go to Auth0.com today. Also check out their YouTube and Twitch under the username Auth0 for some great developer resources and streams. And last but not least is Avocado Labs.
[00:01:43] I love that name. It is an online destination that their developer advocates run, organizing some great meetups. Thank you, Auth0.
[00:01:52] Frederick Weiss: [00:01:52] Yes. Thanks Auth0! Let’s go ahead and welcome our guest.
[00:01:50] Frederick Weiss: [00:01:50] Thanks so much, Brian. So with that being said, and without any dues being further, let's go ahead and get to our guest and welcome him back. We have the author of the new book, VOICE CONTENT AND USABILITY, Senior Director of Product Strategy at Oracle, and speaker, Preston So. Preston, welcome back to the show!
[00:02:17] Preston So: [00:02:17] Hey Frederick. Hey Brian. Thanks so much for having me back on Thunder Nerds. Might I say it’s a real pleasure to be back here one more time to talk about my new book. Thanks for having me.
[00:02:26] Frederick Weiss: [00:02:26] I appreciate it. And we started a little late and you have an event that you were just doing. Do you mind telling us a little bit about that event?
[00:02:32] Preston So: [00:02:32] Yes, I will. And first and foremost, my dear apologies to everyone who was waiting for this live stream. I had the misfortune of forgetting to send out a confirmation email, an email that would actually say, hey, this event is happening today. So we started a bit late and we ended a bit late.
[00:02:51] It was the launch event for my new book, which is here, Voice Content and Usability. And we had a great time discussing the implications of voice interfaces for those of us who work with the web, which is, I think, a lot of us in the Thunder Nerds audience, as well as the implications of voice on our society.
[00:03:13] And of course, the vaunted and traditional book cake, which is something that everyone at A Book Apart, my publisher, has to unveil as part of the process of launching a new book. It was a very interesting process, but very sorry to those who were waiting on this YouTube stream.
[00:03:33] Frederick Weiss: [00:03:33] Oh, sorry. Did you say a book cake?
[00:03:35] Preston So: [00:03:35] Literally a cake? Yeah. Book cake. Maybe I'm saying too much. I don't know how, like, it should be cake. Gotcha. Yeah. Not like everything is cake. Oh, it's all cake, yes. It's all cake as well, but a book cake, because basically you're supposed to have a cake that looks like your book and represents your book.
[00:03:55] Yeah. So it was a great launch event and it was a real pleasure to share a little bit about the process I went through writing the book and some of the really exciting things that I taught. Love that.
[00:04:09] Frederick Weiss: [00:04:09] And speaking of the book, we're going to be giving away three copies of the ebook courtesy of A Book Apart today.
[00:04:16] If you can just chat with us, ask us your questions. Maybe tell us you want a book we're going to randomly give away some books. So we'll be doing that as the show progresses on. Preston first, let me talk to you a little bit about you being with us last time, promoting your last book.
[00:04:35] Decoupling Drupal. Am I saying that correct?
[00:04:38] Preston So: [00:04:38] Yes. Decoupled Drupal in Practice.
[00:04:42] Frederick Weiss: [00:04:44] How was the success of that and how did that prompt you to start writing a new book? You just wrote that book not too long ago and all of a sudden you have another book.
[00:04:54] So I see a pattern: every year, a new book.
Preston So: I wish I could come out with a new book every year, like, say, R. L. Stine of Goosebumps or something like that. But this has been a really interesting process because my books tend to be very focused on really technical aspects of the ways in which we work with our content and the ways in which we work on the web.
[00:05:19] The first book I wrote was back in 2018, Decoupled Drupal in Practice. And I think one question I get a lot, and am definitely happy to answer for some of those on the call or those in the audience, is: what is it like as a technologist to write a book? Especially for those who are developers or designers.
[00:05:38] So this book is actually my first book that is not a coding book, not a technical book. It doesn't have any code snippets in it. Couple of code formatted sections that are really tiny, but it doesn't really have any sort of tutorials as to how to spin up a command line interface or things like that.
[00:05:58] It's really focused on the user experience and design audience and the accessibility audience, which is a very different audience from the audiences that I'm used to writing for. What's interesting is that Decoupled Drupal in Practice is about the architectural underpinnings or the foundation of how you can deploy content.
[00:06:49] But the other thing I will say is that I actually made the mistake, or had the privilege, or some would say the misfortune, of writing two books at the same time over the past year and a half. And the other book that I've got coming out this fall is Gatsby: The Definitive Guide, which is about Gatsby JS, the static site framework.
[00:07:08] So right back in the other direction.
[00:07:10] Brian Hinton: [00:07:10] So you're going to write
[00:07:12] Frederick Weiss: [00:07:12] every other year,
[00:07:14] Brian Hinton: [00:07:14] three minutes a year
[00:07:15] Preston So: [00:07:15] I was thinking of more of a Fibonacci sequence, actually, Brian. Like, I think I should write five and then eight and then 13. Yeah, they might get a little shorter and they might be filled with some more memes.
[00:07:25] So why Voice Content and Usability? Like, why were you like, okay, now I really think I need to write this?
[00:07:33] Yeah. Yeah. Specifically too, if I
Frederick Weiss: [00:07:34] could append to that point, Brian, why? You said yourself, like, you moved away from like a coding kind of thing. Like why go that way into
[00:07:42] Preston So: [00:07:42] the accessibility?
[00:07:44] So I've always been really into web development, but my real core interest and passion has always been for design and user experience. I started out as a web designer. I started out as a print designer. I actually also did computer programming back in those days and got into web development that way.
[00:08:02] But that wasn't necessarily an itch I got to scratch very much: this aspect of design and user experience that is beyond the web. And I've always been interested not only in how we can serve some of the users who are interacting with some of the content that we produce or some of the experiences that we create in terms of technology beyond the web.
[00:08:27] I was also really interested in how we can actually best serve users that already exist and users that are already within the demographics of the audiences that we're trying to serve. I've always been interested in web accessibility first and foremost, as well as some of the aspects of how accessibility really changes the ways that we think about other user interfaces that might not have gotten so much attention from the standpoint of how they can better serve disabled users and those who might be elderly and have a little bit more trouble, for example, using a mouse or typing on a keyboard. And those two audiences specifically, the elderly and disabled communities around the U.S., were communities that we aimed to serve with the first-ever voice interface for residents of the state of Georgia.
[00:09:15] I worked on Ask GeorgiaGov, which had the specific goal of really focusing on how we can serve residents of the state of Georgia who want to be able to find out things like how to register to vote, how they can get a small business loan, or how they can renew their fishing license, without necessarily having to incur the cognitive costs of either interacting with a screen reader driven website or interacting with, let's say, somebody in person at an agency office.
[00:09:45] And I think one of the really interesting and really unexpected insights that we found concerns a lot of the websites that we build. Obviously, we think nowadays, because so many people use the web, because disabled folks use screen readers, because so many people now are used to the paradigm of the web,
[00:10:04] the website is really the gospel of how people should now consume content and how people do consume content. But I think one of the things that's been borne out by this project is that the kinds of things that people would ask an Amazon Alexa sitting in their own home about the state of Georgia and the government capabilities that are available to them were completely different.
[00:10:26] And in some cases, diametrically opposed to the sorts of queries and things that people would search for on the georgia.gov website, which is the ultimate source of all of the information that we used. And that really illuminates a little bit of this, I would say, hidden bias that we have
[00:10:43] towards the website as the primary conduit for information, when in some ways it really should be just considered one facet of a wide variety of ways to access our content equitably. So then what do we
[00:10:56] Frederick Weiss: [00:10:56] do? Are we expected to have multiple locations for our content, like specifically. I'm going to build content for voice, or I'm going to build content for a website, and I'm going to build content that goes into an application.
[00:11:14] Or does it behoove us to write content that is uniform, and maybe in a specific way, and possibly you might answer in what way that might be, as one source of truth?
[00:11:31] Preston So: [00:11:31] That's a really challenging question. And obviously I shouldn't really go too far here without saying that some of those questions are answered in my book, Voice Content and Usability, from A Book Apart.
[00:11:42] Please don't give everything away, just a little bit. Could you read the whole book out loud, please? We'd be here all day. Yeah. Yeah, so what I will say is that this is the perennial debate, right? I think one of the things that we as designers struggle with, as we really deal with this exploding kind of menagerie of user experiences that we increasingly have to deal with, is: what do we do with our data?
[00:12:10] What do we do with all of these things that we've built that are in some ways very much oriented towards, or very focused on, the audiences that we've cultivated over time, namely our websites and mobile applications being for these very visually rooted experiences and demographics that are used to these visual experiences? There are things that are really problematic about some of the approaches that were characteristic of the early days of voice content.
[00:12:37] Let's say, when people were experimenting with voice interfaces or chat bots as a means to deliver a certain type of content, you would have a parallel version of the information that was already housed in your website. And those of us who are content designers or content strategists can really feel the pain that comes from the notion of having a set of content over here in one silo that's destined for the website and another piece of content over here that's destined for a voice interface.
[00:13:04] How do you keep those two things in sync? And now that we have regulations like GDPR and HIPAA, for example, that really make it obligatory that content stays current, or that content stays up to date with what we need, how do we actually make sure that all of this content stays up to date without having it be in a single source of truth for content?
[00:13:24] Now, my book definitely doesn't make any prescriptions about going in one direction or the other, where, oh yeah, you must do it this way, or you must do it that way, because there are exceptions to everything and nothing is ever cut and dried. However, I generally err on the side of saying: look at the case of what we did with the state of Georgia. With georgia.gov, they insisted actually that we use one single source of truth for content that was going to be an omni-channel or channel-agnostic source of truth for content, because ultimately a lot of us don't have the luxury to maintain multiple versions of content that are destined for multiple conduits of content.
[00:14:00] So we ended up keeping it all in one source and we ended up maintaining it all in one place, and having both voice and web versions of the content pull from the exact same repository of content, which ends up being more scalable in the long run, especially now that Georgia has built an additional chat bot that is a written chat bot, a textual chat bot, but also pulls from the same content.
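A minimal sketch of the single-source-of-truth idea Preston describes, assuming a hypothetical in-memory repository and hypothetical `render_web`, `render_voice`, and `render_chat` functions. None of these names, and none of the placeholder text, come from the Ask GeorgiaGov project; the only point is that every channel reads the same authored content rather than keeping its own copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentItem:
    """One entry in a hypothetical channel-agnostic content repository."""
    slug: str
    title: str
    body: str  # plain text with no presentation markup


# Hypothetical repository: the only place the content is authored and updated.
REPOSITORY = {
    "fishing-license": ContentItem(
        slug="fishing-license",
        title="Renew a fishing license",
        body="Placeholder text about renewing a fishing license.",
    ),
}


def render_web(item: ContentItem) -> str:
    """Web channel: wrap the shared content in minimal HTML."""
    return f"<article><h1>{item.title}</h1><p>{item.body}</p></article>"


def render_voice(item: ContentItem) -> str:
    """Voice channel: a short spoken lead-in followed by the same body text."""
    return f"Here is what I found under the topic: {item.title}. {item.body}"


def render_chat(item: ContentItem) -> str:
    """Text chatbot channel: title and body as one short message."""
    return f"{item.title}: {item.body}"


item = REPOSITORY["fishing-license"]
for render in (render_web, render_voice, render_chat):
    print(render(item))
```

Because each renderer only formats what the repository holds, an edit to the body shows up on every channel at once, which is the scalability argument made above.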
[00:14:25] I'm curious, in
[00:14:25] Brian Hinton: [00:14:25] the course of your research and writing of this book, was there anything that shocked you or surprised you that you didn't, like,
[00:14:33] Preston So: [00:14:33] Didn't immediately realize. Yeah. It's a great question, Brian. There's too many to list, because I think one of the things that's really tough about voice interfaces is that up until recently, it's been really challenging for a lot of those who are not computational linguists or machine learning engineers or people who are really deeply involved in some of these very low-level technologies to really get involved with voice.
[00:15:08] However, one of the things I will share is that in some ways there are really interesting emergences of some of the foibles in voice interface design when you start working with this technology, which is very reminiscent of back in the day. And those of us who are listening to Thunder Nerds
[00:15:27] who have worked on the web for a while will recognize, for example, the things that we used to deal with in the early 2000s or mid-2000s, like quirks mode compatibility, or some of the really odd browser hacks that we had to do with CSS. And there are weird things like that in voice interfaces.
[00:15:46] One example of this that I'll share, and I'll keep it just to one, is from when we built Ask GeorgiaGov, which of course is that voice interface for the residents of the state of Georgia. There was a situation where we had a retrospective. And one of the things that we did for Georgia was that they wanted to have the ability to administer and manage all this content in one single place.
[00:16:08] And we had a parallel set of logs and reports that would sit right next to the logs and reports for the website. So whenever somebody would hit a 404 error on the website, they could compare and see how many times this piece of content also errored out, for example, on the voice interface for Alexa. Were there situations where the search returned no results, or where it triggered 404 errors on the content management system that we were using to serve both the website and the voice interface?
[00:16:39] So we had this retrospective about eight months after the launch of the interface, which was in 2017. And we had a discussion about some of the logs, and we kind of leafed through them and said, okay, what are some of the errors that we're seeing? And what are some of the things that we can do to either adjust the content or maybe even do some debugging of the interface itself?
[00:17:00] There was this one result that kept on coming up over and over again, this one error, this 404 error, basically a search that somebody conducted that returned no results, no content. And it was the word Lawson's: L-A-W-S-O-N apostrophe S. And this kept on popping up over and over again. It was about 16 times,
[00:17:20] if I remember correctly, in the log, and we thought: who is searching, who wants to search for this, like, proper noun, this brand name, this person named Lawson? Did they get this confused with a different kind of application on their Alexa that they're trying to use? And we sat there and scratched our heads for a few minutes.
[00:17:38] And one of the native Georgians in the room suddenly perked up and she said, you know what? I think it's somebody who is from Georgia, who has a Southern drawl, who is trying to say the word license as in driver's license or nursing license or fishing license, and sure enough.
[00:17:57] That was exactly what happened. And this is one of those situations where, Hey you can do the best designed application that adheres to the latest and greatest standards and specifications like we did back in those days with CSS and come within an inch of perfection when it comes to these voice interfaces that we build custom.
[00:18:17] But ultimately it's in the hands of people like Amazon or Google, whether or not they can actually understand the kaleidoscope of American English dialects that we have in this country. And that we really should be able to understand. And I think it's a really good sign that yeah, these voice assistants are really good.
[00:18:35] But they're not yet at that point where they can beat us at our own game of human conversation. Yeah.
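The kind of log review described in this story can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual tooling: the `SEARCH_LOG` record layout, the sample rows, and the hand-curated `ALIASES` table are all hypothetical. The sketch tallies voice queries that returned no content and shows one possible remedy once a human has worked out what a recurring misrecognition meant.

```python
from collections import Counter

# Hypothetical log records: (channel, query, number_of_results).
SEARCH_LOG = [
    ("voice", "lawson's", 0),
    ("voice", "fishing license", 3),
    ("web", "fishing license", 3),
    ("voice", "Lawson's", 0),
    ("voice", "small business loan", 2),
]

# Hand-curated aliases added after a human reviews the log.
# This table is an assumption for illustration, not a platform feature.
ALIASES = {"lawson's": "license"}


def zero_result_queries(log, channel):
    """Count the queries on one channel that returned no content."""
    return Counter(
        query.lower() for ch, query, results in log if ch == channel and results == 0
    )


def normalize(query):
    """Swap a known misrecognized phrase for the term the user most likely meant."""
    cleaned = query.lower().strip()
    return ALIASES.get(cleaned, cleaned)


# The most frequent zero-result voice queries are the ones worth a human look.
print(zero_result_queries(SEARCH_LOG, "voice").most_common(5))
print(normalize("Lawson's"))
```

An alias table like this only patches cases someone has already diagnosed. It does not change what the speech recognizer hears, which, as Preston notes, remains in the hands of the platform vendor.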
[00:18:40] Frederick Weiss: [00:18:40] This brings me, if you don't mind really quick, Brian, to something that Todd Libby wrote here, and he also appended to his question. He wrote: were there challenging edge cases with respect to a11y that you ran into on the Georgia project?
[00:19:00] Preston So: [00:19:01] Yeah. Great question, Todd. And when it comes to the work that we did on accessibility on Ask GeorgiaGov in terms of edge cases, I will share that there were several challenges, right? And one, I think, is one of those challenges that's inherent to
[00:19:20] voice interfaces that are pure voice interfaces, which I and others define as basically a voice interface that lacks a screen. So there's no visual component, no tactile or physical cues on it. Yeah, not a GUI. You're basically just interacting with somebody through the spoken word.
[00:19:37] And I think this is not really an edge case so I don't wanna say that this answers the question, but one of the things that I think a lot of people forget, and I think is really important to keep in mind when working with voice interfaces, when it comes to extending the accessibility of your content on a website or your web properties, is the fact that pure voice interfaces that are lacking in a visual or physical component are actually not accessible to certain disabled people, namely those who are deaf or those who are deaf blind.
[00:20:10] And the notion that I think a lot of people have today is that voice interfaces can solve a lot of cases for accessibility, but that's really not the case, because when it comes to so many of the demographics that we need to serve in the disabled community, there are certain solutions that only go part of the way there and we're going to do that.
[00:20:34] Yup. And yeah, so that's, yep. Yeah. That's exactly right. How do we also make sure that we can serve content in a multimodal, consumable way to refreshable braille displays that are maybe not necessarily the same thing as the kind of, let's say, screen reader experience
[00:20:52] that's very rooted in the visual structure of a webpage? It's very early days still in this sort of notion of multimodal accessibility, or how to really make sure that a lot of the user interfaces that we have are not actually stepping on the toes of other folks who are accessing content in particular ways.
[00:21:12] The edge case, however, that I will share is that I think a lot of people also make the assumption that these voice interfaces and voice assistants can be the ultimate solution for a lot of folks who are blind or have low vision, but that's really a tough sell in some ways, because I think one of the things that's really important to recognize about these pure voice interfaces like Alexa is that they have a learning curve too.
[00:21:38] We know that screen readers like ChromeVox or JAWS have issues that require people to ascend a very steep learning curve to use them in an effective and efficient way. And voice interfaces are very much the same way. So one of the things that we encountered during our usability testing was
[00:22:01] just one of those things that we didn't necessarily expect, which is that a lot of people that we had come in and worked with and went through our usability study really had very little experience with Alexa devices. And I think for those who are looking at voice interfaces as a compelling potential sidelong alternative to screen readers, that might potentially be a little bit problematic. And how they efficiently guide users to their content, as the voice interface designer Chris Maury writes, is something to think about, which is that there is still a learning curve.
[00:22:41] And how do you actually address that learning curve in a way that makes sense to those users that you need to.
[00:22:47] Brian Hinton: [00:22:47] Yeah, I'm curious, in the sense of Georgia. At my current role, we're working on a chat bot. And one thing that we've found most difficult is, I think it's called semantic parsing, where it converts that conversation into what logically makes sense.
[00:23:03] What are they asking? And it's like the difference between someone saying capital of Georgia, and that's all they say, or what's the capital of Georgia, or Georgia capital. Did you encounter anything weird in that sense, or any cases?
[00:23:19] Preston So: [00:23:19] Yeah. I talk about this a lot in chapter three of my book, which is about writing those conversational dialogues that really are the lattice work of the voice interfaces that we produce.
[00:23:32] And it's a really challenging kind of thing because a lot of these questions, Brian are really rooted in the technology that you're using. Because some voice ecosystems or conversational ecosystems are better equipped to deal with. Let's say variations, like the ones that you mentioned just now, than others are.
[00:23:49] But there is a lot of work being done to improve the situation. So back in the day in 2016, when we worked on Ask GeorgiaGov, and in the grand scheme of voice interfaces and the history of conversation design, five years ago is a long time ago. We might as well be talking about clay tablets and abacuses at this point, because that was an era where a lot of those utterances that people would state in order to do a process of what's called
[00:24:17] intent identification, where the user interface is able to piece together a sense of what the user actually wants to achieve, which is much easier said than done, that's a process that used to be very much a sort of manually driven process. For example, let's say that
[00:24:37] you're trying to identify a question like what is the capital of Georgia? It has to be phrased like a question, let's say. And one of the things that I think is really challenging for a lot of people who are just getting started with voice interfaces is that in some of these ecosystems, some of these technologies obligate you to be very clear about defining how the user has to respond.
[00:24:56] And as we know, as users, the ways that we actually respond to some of these questions and the ways in which we actually say some of these things can be phrased completely differently from the ways in which we've actually coded the voice interfaces or conversational interfaces or chatbots to consider.
[00:25:14] And whenever we have what's called an out-of-domain error, where the chat bot or the conversational interface or voice interface isn't able to actually understand what you're saying because the way that you phrased it, even though it's perfectly logical, isn't accounted for within the context of what the voice interface is able to understand through its programming, that is a very big problem.
[00:25:39] So I talk about intent identification and the problems that occur when you have these very dedicated slots or tokens, basically this teasing-out process that you have to do with intent identification that really relies on some of these boilerplate templates that users have to use to say these things. But that's not how we speak.
[00:26:00] That's not natural, right? Nobody really wants to have to say things the same way over and over again to be understood by a voice interface, although there is usability research evidence that suggests that some users do prefer that. But there are some ecosystems now, like Dialogflow, for example, or some of the major new conversational tools that are out there, that are getting better at understanding, let's say, all the different variations that you could possibly have and being able to intelligently parse through that and say, okay, this is the intent of what the user is trying to do,
[00:26:36] even though this person might have said something that's very remote from, let's say, a normal way or the default way that we would expect.
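To make the template problem concrete, here is a minimal sketch of manually driven intent identification with slots. It assumes a hypothetical `TEMPLATES` table and a hypothetical `identify_intent` function. It is not how Alexa, Dialogflow, or any specific platform implements matching; it only illustrates why a perfectly logical phrasing falls out of domain when nobody wrote a template for it.

```python
import re

# Hypothetical utterance templates; {place} is a slot filled from the user's words.
TEMPLATES = {
    "capital_lookup": [
        "what is the capital of {place}",
        "what's the capital of {place}",
        "capital of {place}",
    ],
}


def compile_template(template):
    """Turn a template into a regex with one named group per {slot}."""
    # re.split with a capturing group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{part}>.+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def identify_intent(utterance):
    """Return (intent, slots), or ("out_of_domain", {}) when no template matches."""
    text = utterance.lower().strip(" ?.!")
    for intent, templates in TEMPLATES.items():
        for template in templates:
            match = compile_template(template).match(text)
            if match:
                return intent, match.groupdict()
    return "out_of_domain", {}


print(identify_intent("What is the capital of Georgia?"))
print(identify_intent("Georgia capital"))
```

In this sketch, "Georgia capital" is reported as out of domain until someone adds a matching template by hand. That manual upkeep is the burden that more statistical intent matching in newer tools aims to reduce.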
[00:26:45] Brian Hinton: [00:26:45] Yeah. My favorite real-life scenario of my brain being the AI trying to understand is when, somewhere, I can't remember where it was, the Midwest, they asked, what Coke do you want?
[00:26:56] And I said, Coke. And they're like, I'm sorry. Is that okay? Yeah.
[00:27:04] Preston So: [00:27:04] That's what they call it.
[00:27:07] Brian Hinton: [00:27:07] I can't imagine dealing with that sort of a scenario, isn't it? AI type? Yeah. That's funny too, cause it could be something where if you're trying to communicate something out to the bot or the voice technology, you got to think about the context of the personification of this voice or the overall brand. If I'm interacting with a hospital, I don't want the voice to sound all silly and goofy. I want it to sound like just a normal, regular voice. There are some kinds of situations that you might want, or even languages for that matter. If I'm somebody in Italy and I'm looking for
[00:27:48] Frederick Weiss: [00:27:48] a lasagna recipe, and it sends me to the Food Network and it starts reading me, like, an Emeril recipe in English, and I don't understand English. There's all kinds of interesting facets
[00:28:01] Preston So: [00:28:01] to this, yeah, this really brings up, I think a couple of interesting elements of the ways in which the conversation design or voice interface design landscape really requires us to think very differently about some of the things that we usually took for granted.
[00:28:17] And one of those really is the building blocks of language. And I'm very lucky in that. Working with voice interfaces over the past five or six years has really allowed me to scratch my itch when it comes to my academic background, which is actually in linguistics. I have a degree in linguistics. Not a lot of people know that.
[00:28:34] But the biggest issue, I think, a lot of us face is that we're moving in several directions at the same time. The first is that we're moving a lot of the ways in which we used to write user interface text or content from the written word over into the spoken word, which is a very different realm from how we normally write UI text
[00:28:56] or how we normally actually write content. And just one example to illustrate that is the fact that we don't really say the phrase "to whom it may concern" when we actually speak. And we also don't really write the word "literally" as often as we say it in conversation. So a lot of these little nuances are things that can often be missed.
[00:29:17] And there's two ways in which this really can be a problem. The first is that there are certain expectations that users will have that their voice interface reflects the kind of informal or colloquial conversation that they might have with a friend. And when it doesn't reflect that, and when the voice interface comes out with this very kind of stilted utterance or something that's very uncanny-valley-like, it can really interrupt or dislodge the user from what is called habitability in a voice interface.
[00:29:48] This is something that is talked about quite a bit in voice interface literature, where the user has to feel like they're not gonna want to actually tear their hair out, or what little hair they have, in terms of having a conversation with a voice interface. So that's number one, but I think number two is really interesting given that you alluded to some of the challenges around multilingual
[00:30:09] types of conversation. And this really comes to, I think, some of the elements of voice interface design that remain a largely unexplored area and also an area that is very challenging because of the fact that so much of our conversational technology and voice interface technology has so far been rooted in the English-speaking world.
[00:30:30] And one of those issues is when we think about the ways in which we want to serve multilingual audiences and international audiences on the web, we just have to provide translatable strings, right? We just have to provide like these versions of these different pieces of texts that we have or different pieces of content we have.
[00:30:48] But that is a very different kind of proposition when it comes to some of these other languages. And I think one of the biggest issues that we have to focus on is the fact that not all languages work like English; not all languages operate in the same kinds of systems and the same kinds of assumptions that a lot of us have about English.
[00:31:08] And one of the things that is really interesting to me is that I'm noticing more and more some of this Anglophone privilege or Anglophone bias in a lot of the voice interfaces that are coming out that are meant to be multilingual are also direct translations of an English interface because fundamentally some languages simply do not work the same way as English.
[00:31:28] There's a phenomenon in linguistics called diglossia. And this is something I talk about on my blog, preston.so. And this notion of diglossia is actually a phenomenon I also studied when I was in college, where the written form of a language is so vastly different from the spoken form of a language that they might as well be considered two different dialects or two different vernaculars.
[00:31:50] And in some cases, like Brazilian Portuguese, for example, you really have to learn two different grammatical systems and two different lexicons and two different approaches to the language in order to make yourself understood. Because if I went out on the street and I started speaking in the way that I write, I wouldn't actually be necessarily understood.
[00:32:10] I'd be understood because people would be able to understand, but it would be a very strange and off-putting conversation. What I find is very interesting with a lot of the work that conversation designers are doing today is that there's a lot of focus on efficiency and scalability, where we can build one single conversational agent or one single conversational interface that manifests as a chatbot, as a Slack bot, as a WhatsApp bot, as a Facebook Messenger bot, and as an Alexa skill and a Google.
[00:32:36] But there's a big problem with that, because that assumes that the same kind of conversation you would have with a chatbot is going to be the kind of conversation you have with a voice interface. And one of the things that we see in linguistics and also in the kinds of conversations that we have on a daily basis through email and texts and at the delegate.
[00:32:57] It isn't the case that our spoken conversations are word for word, or even letter for letter, exactly the same as our written conversations. And for those who don't speak English, for those who are operating in a realm where let's say that the language that they're writing for is not English.
[00:33:16] A lot of those considerations and concerns become a lot more important and essential when it comes to some of the design that we have to do. And I think this means that we have a long way to go in the English speaking world to understand how some of these conversational interfaces really are rooted in our ways of speaking in ways that might not be so appropriate for the rest of the world that we need to.
[00:33:38] Brian Hinton: [00:33:38] Yeah, all of this made me think of a book I recently read, Word by Word: The Secret Life of Dictionaries. And it's a fantastic book, but it's like the slang too. Like how you mentioned the different versions of Portuguese, the slang is different, like Mexican slang versus Spanish, Mexican slang versus Spain
[00:34:00] Spanish slang, very different, and English slang, different, like someone said. And also how people will say things like "cool" versus "cool," like completely different. And how to interpret that, yeah, Johnny. Yeah. Tone.
[00:34:15] Preston So: [00:34:15] Yeah. And I think this really illustrates a couple of different things.
[00:34:18] You've got the subtext that is not something in UI text or in web content or in any of the word mediums that we have. And paralinguistics, in this realm of, okay, how do you actually really reflect back the fact that the user or the interface might be speaking in a sarcastic tone or in a more assigned tone or in a very stilted tone?
[00:34:43] Like those three things can mean very different things, even though they all use the same single sentence. But the other thing that's really interesting too, Brian, and I think you raised a really good point there, which is it's not just the fact that we have all these differences between languages and the ways that they operate.
[00:34:58] We also have very important differences. Like I mentioned earlier with that Lawson's example around those of us who speak English. And one of the things that worries me a lot about some of these voice interfaces is first of all, the fact that we hear fundamentally one single dialect represented oftentimes in this realm of voice interfaces.
[00:35:19] And it's very similar in some ways to the ways in which newscasters and weather forecasters used to have to be obligated by their organizations to speak using a middle American or general American dialect. It was unacceptable in certain past decades, in the news media for somebody to speak with a Southern accent or somebody to speak with a different dialect of American English on the air.
[00:35:43] And that's something that's represented now in voice interfaces, in both a very limiting and very pernicious way. Because as we know, from interacting with so many different people from so many different walks of life, not only do we have examples of people who might be bilingual or who might be members of queer or trans communities who have to switch between different modes of speech or those who are bilingual descendants of immigrant communities who have to be able to code switch between English and Spanish. Why aren't those sorts of interesting toggles and those sorts of interesting nuances
[00:36:17] represented in voice interfaces too? Because maybe the kind of conversation that I want to have is the kind of conversation that I would have at home in New Delhi, where I'm switching in between English and Hindi mid-sentence or I'm switching in between English and what I think mid-sentence. So these sorts of considerations are not only important for those who are users of English outside of America, which I think is one example of the America-centric approach that we often have with technology all over the place.
[00:36:46] But also the fact that we have very marginalized and underrepresented, oppressed groups of people in the United States who speak in certain ways that are not reflected in how we want voice interfaces to speak as well. And I think one very compelling example, two very compelling examples of this is first of all, the fact that the ways in which people use AAVE, or African American Vernacular English, is very different from the sorts of voice interfaces that we interact with.
[00:37:14] For example, why is it that we can't hear those sorts of conversations represented in an Alexa device? It has something to do with the intrinsic bias that a lot of us have for a more middle American or general American approach to the conversations that we have, which is, of course, fundamentally and foundationally a white American form of speech.
[00:37:33] And by the same token we know that those who identify as LGBTQ have very different approaches to using certain language. There's certain code terms. There are certain colloquialisms that are really not understood by audiences that are outside of that community. And how do we make sure that voice interfaces can also represent those things?
[00:37:54] And this ties back to one of the things I talked about in the final chapter of my book, which really is focused on the problems that surface that we don't consider when we go willy-nilly into this realm of voice interfaces and serving people through conversation in ways that we don't expect. And one of those examples is think about why organizations today and think about why it is that so many people want to get into voice interfaces and want to get into chatbots in the first place.
[00:38:20] So many people are doing this because these airlines, hotels, large companies, corporations, they fundamentally want to be able to reduce the load on their customer service, frontline agents or those who are call center staffers. But if you think about it, who are these call center staffers? Who are these people who answer your call when you're calling them in the middle of the night from the airport, screaming about your lost luggage or screaming about your canceled flight?
[00:38:44] It's somebody who might be in the Philippines or somebody who might be in India or somebody who might be in the Global South, who is a person of color, who is from a lower-middle or middle-income country, who doesn't have the resources necessary to speak in a general American dialect in the same way that you would expect somebody who's from your own community to speak.
[00:39:03] And this really illustrates a very, I think, big concern in voice interfaces today, which is: when we begin to sterilize and flatten out all of these rich nuances that make our conversations with all of these different people and from all of these different lived experiences, so important to our worldview and to the ways in which we interact with the world.
[00:39:26] What does that do to our future as users? What does that do to our level of trust in our user interfaces? What does that do to the credibility and authority of those user interfaces and the information that they provide? Because let me be honest. When I think about the fact that a voice interface might lead to a Filipino call center worker or somebody who is in Mumbai, who is in a call center losing their jobs.
[00:39:52] I'm not so sure that I want that replacement to be this uncanny valley voice that is very stilted and mechanical and might not necessarily reflect the world that we live in today. And I think this really ties into a lot of the issues that we face around misinformation and automated racism and algorithmic oppression that we see around machine vision and so on and so forth. Voice interfaces and voice technology and conversational technology.
[00:40:18] These are also domains that are not exempt from the issues that we have in society. Yeah,
[00:40:24] Frederick Weiss: [00:40:24] we start losing the quality of humanity and what humanity is, but is there anything, I know you were talking a lot about in chapter six, about the future. Are there any brighter notes that you could no.
[00:40:41] Frederick, there's not, yeah. I don't want to go down the Matrix road, but are there any like cool new things that we could be looking forward to or things that we could start thinking about now that would be advantageous for us to go, oh, you know what, let me next year start thinking about this so I could get my projects.
[00:41:01] Preston So: [00:41:01] Yeah, absolutely. There's so much to think about. And obviously I wouldn't have written this book if I thought it was going to be a dystopian nightmare in the next few years, or next few decades because voice technology really does have a lot of illuminating and very interesting prospects that I think there's really important things to call out there.
[00:41:19] Not just the facts. And this is not something I mentioned very much in my book, but I do mention it very briefly in my A List Apart article, "Usability Testing for Voice Content," which is that there are a lot of people who really appreciate voice interfaces for one unexpected reason. And that is that I think, as we all know, a lot of us, especially over the course of the last year and a half.
[00:41:42] And I do want to make sure to hold space for those who are still dealing with grief or suffering right now from the consequences of the coronavirus pandemic, especially of course in India and Australia currently going through a very severe lockdown and the third wave ongoing in Africa. Voice interfaces have been shown to stave off loneliness for a lot of people.
[00:42:05] There is research that suggests that having a voice interface that is there to have a conversation with is something that could be very beneficial for mental health. And in the future, as these conversations become better and better as voice interfaces get to the point where they can do much better small talk than these really simplistic, let's say gimmicky responses that they often issue.
[00:42:28] I think we can really look forward to a lot of interesting, let's say social benefits from voice interfaces. The other one though, I think is also the fact that there is going to be more efficiency when it comes to content delivery and information delivery. There's a futurist named Mark Curtis, who refers to what's called the conversational singularity.
[00:42:47] And we know about the kind of tech or AI singularity, the conversational singularity is along the same lines, which is this notion that as we move further and further into the future, there's going to be a point in time where conversational interfaces will be indistinguishable from other humans when it comes to the kind of conversation that we three are having right now.
[00:43:09] And one of the things that I think is important to call out, of course, is, well, okay, that's a great kind of future, but conversational singularity is going to be indistinguishable, but for whom, right? Whose conversations are going to be indistinguishable? As I was just saying earlier. But I think one of the really interesting things about the conversational singularity and some of the,
[00:43:27] let's say, conversation-centric approaches that are coming out, which wash away some of the weird distortions that we have today, some of these arbitrary lines in the sand that we have, where you talk with a certain Alexa skill or a certain Google Assistant, and they can only help you with this one, certain task.
[00:43:43] They can only help you order a pizza, but they can't help you book a flight. These sorts of interactions will soon become smoother because you know what, maybe I do want to, just like I would with a hotel concierge, actually have a conversation that moves directly into ordering a pizza
[00:43:58] with extra pineapple, as it should be. And then directly into booking a flight over to my favorite vacation destination. So a lot of these efficiencies are going to become very important in the future. And I think what's going to happen in the next few decades is we'll start to see ways in which, okay.
[00:44:16] Yeah. Some of these issues that we have with how conversational interfaces work or reflect the world that we live in back at us are going to become better in terms of the efficiency and ultimately the performance of user interfaces in the same way that websites and mobile applications have become much more efficient and much more able to get us over to the things that we want to do.
[00:44:41] Frederick Weiss: [00:44:41] I remember at a Google I/O, they had a, what was the one assistant that called to book a hair appointment for somebody. And they were like, oh yeah, it's completely indistinguishable from a person that's wrong.
[00:44:53] Preston So: [00:44:53] I can totally tell I'm saying,
[00:44:57] Frederick Weiss: [00:44:57] yeah, I, yeah, I think you could tell, but they said and if you're on a phone call things you have things in the background you're trying to get through things quickly and you're like, yeah, whatever.
[00:45:09] Yeah. It could work. I'm sure. One day, like you said, a person will get like that movie, Her, with Joaquin Phoenix and Scarlett Johansson.
[00:45:17] Preston So: [00:45:17] Yeah. Who among us hasn't accidentally answered an automated phone call that sounds exactly like a conversation, one of those spam calls that we're all besieged by lately, and answered a question because it sounded so real? Or perish the thought, and this is going to be very revealing.
[00:45:32] I think we've all done this, you accidentally answer somebody's voicemail automated message saying, Hey, it's Preston. Oh, Hey I'll leave a message at the tone. Oh wait. Okay.
[00:45:43] But yeah, I think it's a really exciting time and I do think that I think one of the things that's important, and I think this book is very timely, right? Because one of the things I will admit is that when this book first was being germinated as an idea, I thought it might be a little early because this project that we did for Georgia was very early in its time.
[00:46:03] It's one of the first ever content driven information driven voice interfaces. It's also really one of the first, very few examples of state governments and local governments doing this kind of work at the time, too. But now I think it's very timely because one of the things that we've seen over the course of the past year and a half is smart speakers, smart home systems.
[00:46:24] Everyone's buying them, they're flying off the shelves and increasingly here as we re-enter the world or live with the virus as it continues to be a problem for so many of us in the world, we just start getting used to some of these other ways of interacting with content. Other ways of interacting with information, with use cases and applications that we need to actually go through.
[00:46:48] And voice is just one of those. And I think we're going to see a lot more investment and a lot more care from the user experience side, not just the developer side in this realm of, okay, we've done this for the web and the web has served us really well for the last few decades, but how do we actually make sure that some of these more multimodal approaches, as we mentioned earlier on accessibility or some of these more interesting immersive or voice and aural and immersive approaches can be things that will be compelling for users and designers and practitioners in the future as well.
[00:47:25] Frederick Weiss: [00:47:25] Makes sense? What do you think Brian? Or should we go to the lightning round? Yeah. Yeah.
[00:47:31] Brian Hinton: [00:47:31] getting close to the end here. So we're
[00:47:33] Frederick Weiss: [00:47:33] flying rats on. I've got my gloves on. Let's go ahead.
[00:47:37] Brian Hinton: [00:47:37] Yeah. So we're each gonna ask you a question, answer yours, and one at a time. And I'll go first. So would you rather be able to run at a hundred miles per hour or fly at 10?
[00:47:49] Preston So: [00:47:49] I have to think about this one. Probably fly and it's. Yeah. It's because you can see more. Yeah. That's fair.
[00:47:59] Frederick Weiss: [00:47:59] Preston. What is your favorite thing about yourself?
[00:48:04] Preston So: [00:48:04] Oh my gosh. Oh my gosh. These are some questions y'all really, I don't remember the last lightning round being like this. I think my favorite aspect about myself is that I have learned a lot and I've had the privilege of living in many different countries, which not everybody has the privilege to say.
[00:48:28] And that's given me a lot of good perspective. I'll say that. Would you rather live where it snows all the time or where the temperature never falls below a hundred
[00:48:39] degrees? Wow. This is like Snowpiercer versus thread 3d or something like that. Def. So I'm somebody who needs, so right now I am in an air conditioned room, even though it's actually not that hot of a day here in New York City, I need the cold, I cannot deal with the heat.
[00:48:57] And so yeah, it's definitely snowing all the time. I could probably be okay. In, in, in Antarctica actually, I would say, okay,
[00:49:06] Frederick Weiss: [00:49:06] Preston, what book are you yourself reading? To to learn from currently that you're
[00:49:12] Preston So: [00:49:12] enjoying. All right. I'm currently reading three different books. Not really making much progress in any of those; it's like the Fibonacci sequence of reading books and increasing those every year.
[00:49:28] One book that I'm reading, which I will share, which is a very esoteric book right now is Bosnian, Croatian, Serbian, a Textbook because I'm learning Serbo-Croatian at the moment as a language, but I'm also reading two other books that are really interesting. The first is Conversations with Things, which is a book written by Rebecca Evanhoe.
[00:49:49] And I forget the co-author's name. I have it right here. I should look at it. As well as Margot Bloomstein's book Trustworthy, which is a book about how brands can be more authentic in how they operate in terms of content strategy.
[00:50:09] Brian Hinton: [00:50:11] What current fact about your life would most impress your five-year-old self?
[00:50:19] Preston So: [00:50:19] Oh my God. Wow. My five-year-old self. Got it. I thought that was an easy question. You answered it last time. Did I really? Oh my gosh. Let me think. The fact about myself, that people I think the fact that my five-year-old self would most be impressed by is the fact that, oh my gosh after the fact.
[00:50:48] Frederick Weiss: [00:50:48] I remember last time you said moving to New York and working in New York was one of your childhood dreams,
[00:50:54] Preston So: [00:50:54] giving them the answers. That's really funny, cause that's not what I, that's not what I would say to myself actually. That's really interesting. You know what I'll say is this, actually I think this is an interesting one because just to get a little personal here when I was, and a lot of us dealt with this when we were younger, a lot of us as children, as young toddlers or as young
[00:51:14] kids, we deal with speech impediments or other issues with let's say pronouncing words correctly, or doing those sorts of things. And I grew up with a speech impediment, which makes also some of the voice technology kind of things really poignant. So what I would say is my five-year-old self would definitely be very proud of me for the fact that I can basically go on stage in front of 3000 people and not break a sweat or have this live stream with also 3000 people.
[00:51:43] Of course, there's 3000 people listening to this right now. And not break a sweat either. Yeah. With a personal note there.
[00:51:52] Frederick Weiss: [00:51:52] Nice. What is the most interesting thing that you learned in the process of writing
[00:51:58] Preston So: [00:51:58] this book? Most interesting thing that I learned in the process of writing this book, the most interesting thing I learned in the process of writing this book is probably the
[00:52:10] unexpected applications of accessibility and unexpected challenges around accessibility that occur with voice interfaces, especially given the fact that I think a lot of us are accessibility aficionados or those who are really passionate about accessibility. I think we often forget that
[00:52:31] not only are there so many different types of interfaces that we need to consider, the interface that has become the most important one today, which is the screen reader for websites, is actually not necessarily the most optimal or pleasant experience. And I already did have a sense of this because I do a lot of, and this is one thing I think everybody should do, is you should always take
[00:52:57] the sort of user interface you're building and use it from the perspective of somebody who's using a screen reader or somebody who's using an assistive interface, because it is very important to understand how people work from that perspective. But one of the things, so I already knew that screen readers were really very tough, but I guess one of the things that I didn't necessarily realize is just how much people actually really don't like the screen reader sometimes, and really see it as an obstacle to getting to what they need.
[00:53:28] That was a very long answer.
[00:53:31] Frederick Weiss: [00:53:31] That's okay.
[00:53:35] Brian Hinton: [00:53:35] What book has made you cry?
[00:53:41] Preston So: [00:53:41] What book has made me cry? Gosh. Yeah, that's a really interesting question. Wow. There's been, there's definitely been many books that have made me cry. I would say the book that both made me cry and made the deepest impact on me is probably, oh my gosh. I'm just trying to think about this now because yeah the, what I will say is the book that has definitely made the biggest impact on me and made me cry.
[00:54:22] Both of those were probably Invisible Man. Which is a book that I recommend everybody read. It's one of those books that you read in high school or college English class, but it's a very important book and something that I think everybody should definitely read. Let me
[00:54:40] Frederick Weiss: [00:54:40] I'm out of lightning round questions.
[00:54:43] Brian, do you have anything else on that? Oh, no, I think we're good. Great. Let's get to our final topic here at the end Preston. We like to ask our guests for parting words of wisdom, any kind of things that you'd like to tell our audience at the end.
[00:55:01] Preston So: [00:55:01] Yeah. Great question. I think my biggest parting words of advice for everyone, and this is not just those who are in the design field or who are in the technology world.
[00:55:16] But I think one of the things that I would recommend for everyone who is watching this or listening to this, or will watch or listen to this is that it's really important to really listen to, and uplift and amplify and also hear and take into account in your own day-to-day work and your own day-to-day life,
[00:55:41] the lived experiences of those who are completely unlike you. And by completely unlike you, all of those people who face multiple axes of marginalization or oppression, or who face very deep obstacles in our world today, who might be disabled, might be women or femmes, might be people who are queer or trans, might be people who are of color, who are Black or Indigenous.
[00:56:09] And I think one thing that is really important to me, and one thing that's very important to the way I live my life is to really deeply understand where everyone is coming from in terms of their context and in terms of how they have come to be the person that they are today. Because ultimately as practitioners of technology, as those who work on technology, the ultimate reason we're doing this is to help everybody who is our audience succeed with what they're doing.
[00:56:42] And there's no way to do that unless you really deeply understand and take the time to learn about and comprehend what it is that your audience goes through in any field that we work as people in this world that we live in.
[00:57:02] Frederick Weiss: [00:57:02] Very well said. Thank you, Preston. Again, all your social links we have: @prestonso
[00:57:07] on Twitter, your website is preston.so, prestonso on LinkedIn. And of course the new book, Voice Content and Usability by Preston So. Get it there today. Preston again, thank you so much for being on the show. We really appreciate it.
[00:57:32] Brian Hinton: [00:57:32] No, thank you for taking the time.
[00:57:35] Preston So: [00:57:35] Thank you both so much. It was such a pleasure to be here on Thunder Nerds again, and I'd love to come back sometime. Maybe I'll rehearse some lightning talk or lightning question responses for next time, but thanks so much for having me. I appreciate it. Yep.
[00:57:47] Frederick Weiss: [00:57:47] Thank you!
[00:57:48] Yeah, for the next book. We'll see you then. Thanks everybody. Oh, hold on. I got one last comment. Let's see. Thank you for all. Todd- “Thank you for all the phenomenal conversation”. Thank you so much, Todd. Thank you everybody for watching. Really appreciate it. Take care everyone.
If you have questions, or suggestions to modify the transcript, PLEASE let us know at email@example.com