Situating Automatic Speech Recognition Development within Communities of Under-heard Language Speakers. Thomas Reitmaier, Electra Wallington, Ondřej Klejch, Nina Markl, and 5 more authors. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems.
In this paper we develop approaches to automatic speech recognition (ASR) development that suit the needs and functions of under-heard language speakers. Our novel contribution to HCI is to show how community engagement can surface key technical and social issues and opportunities for more effective speech-based systems. We introduce a bespoke toolkit of technologies and showcase how we utilised the toolkit to engage communities of under-heard language speakers and, through that engagement process, situate key aspects of ASR development in community contexts. The toolkit consists of (1) an information appliance to facilitate spoken-data collection on topics of community interest, (2) a mobile app to create crowdsourced transcripts of collected data, and (3) demonstrator systems to showcase ASR capabilities and to feed back research results to community members. Drawing on the sensibilities we cultivated through this research, we present a series of challenges to the orthodoxy of state-of-the-art approaches to ASR development.
The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR. Ramon Sanabria, Nikolay Bogoychev, Nina Markl, Andrea Carmantini, and 2 more authors. In ICASSP 2023.
English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a result, there are many varieties of English. Despite the many advances in English automatic speech recognition (ASR) over the past decades, results are usually reported based on test datasets which fail to represent the diversity of English as spoken today around the globe. We present the first release of The Edinburgh International Accents of English Corpus (EdAcc). This dataset attempts to better represent the wide diversity of English, encompassing almost 40 hours of dyadic video call conversations between friends. Unlike other datasets, EdAcc includes a wide range of first and second-language varieties of English and a linguistic background profile of each speaker. Results on the latest public and commercial models show that EdAcc highlights shortcomings of current English ASR models. The best performing model, trained on 680 thousand hours of transcribed data, obtains an average of 19.7% word error rate (WER), in contrast to the 2.7% WER obtained when evaluated on US English clean read speech. Across all models, we observe a drop in performance on Indian, Jamaican, and Nigerian English speakers. Recordings, linguistic backgrounds, data statement, and evaluation scripts are released on our website under a CC-BY-SA license. We hope that this work will encourage future research on a wider range of English varieties to create more accessible speech technologies.
"I can’t see myself ever living any[w]ere else": Variation in (HW) in Edinburgh English. Nina Markl. Language Variation and Change, 2023.
Sociolinguistic research across Scotland in recent decades has documented an erosion of the phonemic contrast between /ʍ/ (as in which) and /w/ (as in witch). Based on acoustic phonetic analysis of 1,400 realizations produced by eighteen Edinburgh women born between 1938 and 1993, I argue that in the context of Edinburgh this is best understood as a complex sociolinguistic variable (HW) encompassing (at least) six fricated and fricationless variants. Realizations vary in type and relative duration of frication, voicing, and glide quality. Bayesian statistical analysis suggests that choice and realization of variants is conditioned by speaker’s social class, style, and phonetic context. Unlike some prior work, I do not find evidence of ongoing (apparent-time) change or an effect of contact with Southern British English. Fricated variants are most prevalent in formal speech styles and in the speech of middle-class women, while working-class speakers favor fricationless variants.
Everyone has an accent. Nina Markl and Catherine Lai. In Proc. INTERSPEECH 2023.
Automatic transcription and (de)standardisation. Nina Markl, Electra Wallington, Ondrej Klejch, Thomas Reitmaier, and 5 more authors. In Proceedings - SIGUL 2023, 2nd Annual Meeting of the Special Interest Group on Under-resourced Languages: a Satellite Workshop of Interspeech 2023; Conference date: 18-08-2023 through 20-08-2023.
In this paper we illustrate the gap between real language use and the language use assumed in ASR development through the example of isiXhosa in Langa, South Africa. Understanding speech and writing practices in context is particularly important when developing speech technologies for minoritised and under-resourced languages, and their communities.
Mind the data gap(s): Investigating power in speech and language datasets. Nina Markl. In Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion, May 2022.
Algorithmic oppression is an urgent and persistent problem in speech and language technologies. Considering power relations embedded in datasets before compiling or using them to train or test speech and language technologies is essential to designing less harmful, more just technologies. This paper presents a reflective exercise to recognise and challenge gaps and the power relations they reveal in speech and language datasets by applying principles of Data Feminism and Design Justice, and building on work on dataset documentation and sociolinguistics.
The Lothian Diary Project: Sociolinguistic Methods during the COVID-19 Lockdown. Lauren Hall-Lew, Claire Cowie, Catherine Lai, Nina Markl, and 6 more authors. Linguistics Vanguard, Mar 2022.
The Lothian Diary Project is an interdisciplinary effort to collect self-recorded audio or video diaries of people’s experiences of COVID-19 in and around Edinburgh, Scotland. In this paper we describe how the project emerged from a desire to support community members. The diaries have been disseminated through public events, a website, an oral history project, and engagement with policymakers. The data collection method encouraged the participation of people with disabilities, racialized individuals, immigrants, and low-proficiency English/Scots speakers, all of whom are more likely to be negatively affected by COVID-19. This is of interest to sociolinguists, given that these groups have been under-represented in previous studies of linguistic variation in Edinburgh. We detail our programme of partnering with local charities to help ensure that digitally disadvantaged groups and their caregivers are represented. Accompanying survey and demographic data means that this self-recorded speech can be used to complement existing Edinburgh speech corpora. Additional sociolinguistic goals include a narrative analysis and a stylistic analysis, to characterize how different people engage creatively with the act of creating a COVID-19 diary, especially as compared to vlogs and other video diaries.
Language Variation and Algorithmic Bias: Understanding Algorithmic Bias in British English Automatic Speech Recognition. Nina Markl. In 2022 ACM Conference on Fairness, Accountability, and Transparency.
All language is characterised by variation which language users employ to construct complex social identities and express social meaning. Like other machine learning technologies, speech and language technologies (re)produce structural oppression when they perform worse for marginalised language communities. Using knowledge and theories from sociolinguistics, I explore why commercial automatic speech recognition systems and other language technologies perform significantly worse for already marginalised populations, such as second-language speakers and speakers of stigmatised varieties of English in the British Isles. Situating language technologies within the broader scholarship around algorithmic bias, I consider the allocative and representational harms they can cause even (and perhaps especially) in systems which do not exhibit predictive bias, narrowly defined as differential performance between groups. This raises the question of whether addressing or “fixing” this “bias” is actually always equivalent to mitigating the harms algorithmic systems can cause, in particular to marginalised communities.
Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR. Nina Markl and Stephen Joseph McNulty. In Proceedings of the Language Resources and Evaluation Conference, Jun 2022.
Despite the fact that variation is a fundamental characteristic of natural language, automatic speech recognition systems perform systematically worse on non-standardised and marginalised language varieties. In this paper we use the lens of language policy to analyse how current practices in training and testing ASR systems in industry lead to the data bias giving rise to these systematic error differences. We believe that this is a useful perspective for speech and language technology practitioners to understand the origins and harms of algorithmic bias, and how they can mitigate it. We also propose a re-framing of language resources as (public) infrastructure which should not solely be designed for markets, but for, and with meaningful cooperation of, speech communities.
Imagining the city in lockdown: Place in the COVID-19 self-recordings of the Lothian Diary Project. Claire Cowie, Lauren Hall-Lew, Zuzana Elliott, Anita Klingler, and 2 more authors. Frontiers in Artificial Intelligence, Dec 2022.
The COVID-19 pandemic brought about a profound change to the organization of space and time in our daily lives. In this paper we analyze the self-recorded audio/video diaries made by residents of Edinburgh and the Lothian counties during the first national lockdown. We identify three ways in which diarists describe a shift in place-time, or “chronotope”, in lockdown. We argue that the act of making a diary for an audience of the future prompts diarists to contrast different chronotopes, and each of these orientations illuminates the differential impact of the COVID-19 lockdowns across the community.
(Commercial) Automatic Speech Recognition as a Tool in Sociolinguistic Research. Nina Markl. University of Pennsylvania Working Papers in Linguistics, Sep 2022.
As speech datasets used in sociolinguistic research increase in size, laborious and time-intensive manual orthographic transcription is a challenge, limiting the amount of (transcribed) data which can be analysed. In this paper, I discuss the use of (commercial) automatic speech recognition (ASR) as a tool in sociolinguistic research in the context of a case study: the Lothian Diary Project. I describe the kinds of errors produced by two commercial ASR systems for British English within the broader context of algorithmic bias in ASR, and suggest some best practices when working with ASR in sociolinguistic work.
Context-sensitive evaluation of automatic speech recognition: considering user experience & language variation. Nina Markl and Catherine Lai. In Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing, Apr 2021.
Commercial Automatic Speech Recognition (ASR) systems tend to show systemic predictive bias for marginalised speaker/user groups. We highlight the need for an interdisciplinary and context-sensitive approach to documenting this bias incorporating perspectives and methods from sociolinguistics, speech & language technology and human-computer interaction in the context of a case study. We argue evaluation of ASR systems should be disaggregated by speaker group, include qualitative error analysis, and consider user experience in a broader sociolinguistic and social context.
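As a rough illustration of what evaluation "disaggregated by speaker group" can look like in practice, the sketch below computes word error rate (WER) separately for each group, pooling edit errors and reference words within a group. It is a minimal sketch and not the evaluation code used in the paper: the function names, the group labels, and the toy transcripts are all invented for the example, and real evaluations would normally normalise text (case, punctuation) before scoring.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer_by_group(utterances):
    """Return {group: WER} for (group, reference, hypothesis) string triples.

    Errors and reference word counts are pooled per group, so long
    utterances weigh more than short ones.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in utterances:
        ref, hyp = reference.split(), hypothesis.split()
        errors[group] += edit_distance(ref, hyp)
        words[group] += len(ref)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Invented toy data; "group_a" and "group_b" are hypothetical labels.
sample = [
    ("group_a", "the cat sat on the mat", "the cat sat on the mat"),
    ("group_a", "turn the lights off", "turn the light off"),
    ("group_b", "where are you going", "were you going"),
    ("group_b", "call me later", "call me later"),
]
rates = wer_by_group(sample)
for group, rate in sorted(rates.items()):
    print(f"{group}: WER = {rate:.1%}")
```

Reporting one number per group, rather than a single pooled WER, makes performance gaps between groups visible. The abstract argues that such figures should be complemented by qualitative error analysis and attention to user experience, which a score like this does not capture.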
The Lothian Diary Project: Investigating the Impact of the COVID-19 Pandemic on Edinburgh and Lothian Residents. Lauren Hall-Lew, Claire Cowie, Stephen Joseph McNulty, Nina Markl, and 7 more authors. Journal of Open Humanities Data, Apr 2021.
Querent Intent in Multi-Sentence Questions. Laurie Burchell, Jie Chi, Tom Hosking, Nina Markl, and 1 more author. In Proceedings of the 14th Linguistic Annotation Workshop, Dec 2020.
Multi-sentence questions (MSQs) are sequences of questions connected by relations which, unlike sequences of standalone questions, need to be answered as a unit. Following Rhetorical Structure Theory (RST), we recognise that different “question discourse relations” between the subparts of MSQs reflect different speaker intents, and consequently elicit different answering strategies. Correctly identifying these relations is therefore a crucial step in automatically answering MSQs. We identify five different types of MSQs in English, and define five novel relations to describe them. We extract over 162,000 MSQs from Stack Exchange to enable future research. Finally, we implement a high-precision baseline classifier based on surface features.