Human-in-the-loop: Assisting Users to Author Text with Speech

Voice-based user interfaces (VUIs) have advanced rapidly with recent developments in natural language processing (NLP). Speech input lets users write messages, memos, or draft documents without occupying their hands with typing or their eyes with a screen. It is particularly useful when typing is impractical, for example because of an impairment, or when written drafts need to be generated quickly.
Current research on commercial VUI platforms, e.g., Alexa and Siri, focuses on understanding users’ intentions and generating “smart” or human-like responses. There is also much work on improving speech recognition rates by analyzing people’s speech patterns, such as speech repair to address natural stammers, repetitions, and filler words (e.g., “em”, “ah”) [21, 20]. Existing dictation systems, such as Dragon and Google Docs Voice Typing, mainly transcribe users’ speech and provide only limited support for modifying the dictated text by voice. Moreover, existing work provides even less support for drafting a document, such as structuring the spoken content and helping with ideation.
In fact, generating textual content with speech is inherently different from typing [4]. To better design and implement future systems, we need an in-depth understanding of users’ natural behavior, expectations, and responses while interacting with a voice-based system for this task, which is under-investigated in the existing literature. To fill this gap, this project will investigate how to support users in generating textual content with speech, across tasks ranging from composing and editing to structuring and ideating. We will invent and prototype new interaction techniques based on a conversational interface that enable users to generate and revise text with voice input and reduced visual engagement. For longer text, the interface can scaffold users’ speech to generate content with better structure and quality.
What are the important design factors for these features that affect user experience in terms of ease of use, effectiveness, and intrusiveness of the support? We will answer this question with user-centered iterative design and evaluation methods. Building on our preliminary studies of users’ natural behavior when performing writing tasks with speech, we will first develop prototypes of interaction techniques that support the identified user strategies, and then evaluate their user experience and performance in controlled user studies. The findings from this project will benefit the design and implementation of future commercial dictation systems and voice interfaces, and will answer questions about how to better support co-creation with machine intelligence.


Project number: 9048167
Grant type: ECS
Effective start/end date: 1/01/20 → …