â€˜Lowâ€™and â€˜highâ€™varieties of Indonesian and other languages of Indonesia are poorly resourced for developing human language technologies. Many languages spoken in Indonesia, even those with very large speaker populations, such as Javanese (over 80 million), are thought to be threatened languages. The teaching of Indonesian language focuses on the prestige variety which forms part of the unusual diglossia found in many parts of Indonesia. We developed a publicly available pipeline to scrape and clean text from the PDFs of a classic Indonesian textbook, The Indonesian Way, creating a corpus. Using the corpus and curated wordlists from a number of lexicons I searched for instances of non-prestige varieties of Indonesian, finding that they play a limited, secondary role to formal Indonesian in this textbook. References to other languages used in Indonesia are usually made as a passing comment. These methods help to determine how text teaching resources relate to and influence the language politics of diglossia and the many languages of Indonesia.
|Publication status||Published - 2021|
|Event||Workshop on Computational Methods for Endangered Languages - Online|
Duration: 1 Jan 2021 → …
|Conference||Workshop on Computational Methods for Endangered Languages|
|Period||1/01/21 → …|