Looking towards a linguistically diverse future – the role of language technology in supporting endangered languages
Around 40% of the approximately 6,500 languages used in the world today are at risk of disappearing in the near future. This endangerment is the result of a complex series of different factors, but the outcome is almost always a decrease in the number of users of a language and the domains in which the language is used, in favour of another language. The aim of Working Group 4 is to explore if and how language technology (LT) can be used to support languages that are endangered and so contribute to a linguistically diverse future, looking both at the opportunities and the challenges for endangered languages.
Over the last year this Working Group has considered the nature of language endangerment as well as the different types of LT currently available and how these might impact on language vitality. The Working Group has also explored examples from different linguistic contexts and how they are used to support languages at different stages of endangerment, recognising that LT can perform different functions, depending on the current sociolinguistic position of the language.
One function that LT can perform is to document a language and its speakers, using different tools to capture the language, including any of the cultural traditions associated with it, and to produce a description of the language. One example is the use of technology to map and preserve indigenous language and cultural traditions, as in the “Terrastories: connecting generations” project, which uses a free and open-source tool that works both online and offline and that communities themselves can use to record their own stories.
A different use of technology might be to improve communication between users and non-users of the language, in which case the technology might support translation. Translation could also be used to ensure that users of an endangered or minority language can access information that is available to majority language speakers. LT might also be used to support language acquisition, for example in education (see also the work by WG5), by creating tools and resources that can be used by a variety of different learners and by providing access to the language through technology, as in the Linguatec project.
Linguatec aims to improve the technological capacity of the Aragonese, Occitan and Basque languages through the development of language resources, tools and applications. To date this has resulted in an online dictionary, machine translation systems, speech technology and several different applications. A further example would be the Ghiti app, which allows young children to become familiar with different (minority) languages and fosters positive attitudes towards the different languages in their environment.
LT brings new opportunities for developing or increasing the domains in which a language is used, and can also provide tools to support language preservation, language learning and communication between speakers and non-speakers of the language. There are also challenges for endangered languages and LT. Many of the current technologies rely on databases of source material in the (target) language, which may not be available for minority or endangered languages.
Furthermore, when considering the creation of tools and the use of LT in supporting language diversity and ensuring a language’s vitality in the future, it is also important to consider the ethical implications. This might especially be the case where language data is limited or involves recordings of individuals who are no longer able to give their consent for these to be used. Where a language has different varieties, these might need to be considered and decisions made on what to include and exclude.
One of the important points that has been recognised by the WG in its discussions is the need for the language community to be involved at all stages if LT is to have a positive impact on language diversity and vitality. This can start with discussions with the community around their needs, requirements, and skills, and with an evaluation of both the language’s sociolinguistic vitality and its digital vitality, for example through the DLDP Digital language vitality scale.
If you are interested in the work of this Working Group, or want to find out more, please contact Claudia Soria or Ingeborg Birnie.
Blog post author
Dr Ingeborg Birnie, Lecturer, WG 4 Co-chair (ORCID – 0000-0001-8227-9364)