Word Classes in Indonesian: A Linguistic Reality or a Convenient Fallacy in Natural Language Processing?

Meladel Mistica, Timothy Baldwin, I Wayan Arka

    Research output: Contribution to conference (Paper)

    Abstract

    This paper looks at Indonesian (Bahasa Indonesia) and the claim that there is no noun-verb distinction within the language as it is spoken in regions such as Riau and Jakarta. We test this claim for the language as it is written by a variety of Indonesian speakers, using empirical methods traditionally used in part-of-speech induction. In this study we use only morphological patterns that we generate from a pre-existing morphological analyser. We find that once the distribution of the data points in our experiments matches the distribution of the text from which we gather our data, we obtain significant results that show a distinction between the class of nouns and the class of verbs in Indonesian. Furthermore, the results show promise that word classes may be labelled using morphological features alone, an approach that could be applied to out-of-vocabulary items.
    Original language: English
    Pages: 293-302
    Publication status: Published - 2012
    Event: Pacific Asia Conference on Language, Information and Computation 2011 - Singapore
    Duration: 1 Jan 2012 → …

    Conference

    Conference: Pacific Asia Conference on Language, Information and Computation 2011
    Period: 1/01/12 → …
