Mahalanobis distance with an adapted within-author covariance matrix: An authorship verification experiment

    Research output: Contribution to journal › Article

    Abstract

    The rotated delta has been argued to be a theoretically better-grounded distance measure, but its superiority has not received any empirical support. This study revisits the rotated delta (more commonly known as the Mahalanobis distance in other areas) with two different covariance matrices estimated from training data. The first covariance matrix represents the between-author variability, and the second the within-author variability. A series of likelihood ratio-based authorship verification experiments was carried out with several distance measures. The experiments used documents compiled from a large database of text messages, which allowed for a total of 2,160 same-author and 4,663,440 different-author comparisons. The Mahalanobis distance with the between-author covariance matrix performed far worse than the other distance measures, whereas the Mahalanobis distance with the within-author covariance matrix performed better than the other measures. However, its superior performance relative to the cosine distance depends on word lengths and/or the order of the feature vector. The results of follow-up experiments further illustrated that the covariance matrix representing the within-author variability needs to be trained on a sufficiently large amount of data to outperform the cosine distance: the higher the order of the vector, the more data are required for training. The quantitative results also imply that the two sources of variability, namely within- and between-author variability, are independent of each other to the extent that the latter cannot accurately approximate the former.
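The two covariance matrices contrasted in the abstract can be illustrated with a minimal sketch. This is not the study's implementation: the pooled estimator, the small ridge term added for invertibility, and all function names are assumptions made for illustration only. It assumes each document is already a numeric feature vector and that at least one author has more than one document.

```python
import numpy as np


def within_author_covariance(X, authors, ridge=1e-6):
    """Pooled within-author covariance (hypothetical helper).

    Each document is centred on its own author's mean vector, and the
    centred documents of all authors are pooled into one estimate.
    """
    X = np.asarray(X, dtype=float)
    authors = np.asarray(authors)
    labels = np.unique(authors)
    centred = np.empty_like(X)
    for a in labels:
        mask = authors == a
        centred[mask] = X[mask] - X[mask].mean(axis=0)
    dof = len(X) - len(labels)  # must be positive
    return centred.T @ centred / dof + ridge * np.eye(X.shape[1])


def between_author_covariance(X, authors, ridge=1e-6):
    """Covariance of the per-author mean vectors (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    authors = np.asarray(authors)
    means = np.array([X[authors == a].mean(axis=0) for a in np.unique(authors)])
    return np.cov(means, rowvar=False) + ridge * np.eye(X.shape[1])


def mahalanobis(x, y, cov):
    """Mahalanobis distance between x and y under the covariance matrix cov."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve cov @ z = d rather than forming the inverse explicitly.
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))


def cosine_distance(x, y):
    """One minus the cosine similarity of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
```

The same `mahalanobis` function serves both variants; only the covariance matrix passed to it changes. Because the within-author matrix has as many rows and columns as the feature vector, it has more free parameters to estimate as the vector grows, which is consistent with the abstract's observation that higher-order vectors require more training data.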
    Original language: English
    Pages (from-to): 1051–1072
    Journal: Digital Scholarship in the Humanities
    Volume: 37
    Issue number: 4
    Publication status: Published - 2022
