Estimating the Strength of Authorship Evidence with a Deep-Learning-Based Approach

Shunichi Ishihara, Satoru Tsuge, Mitsuyuki Inaba, Wataru Zaitsu

    Research output: Contribution to conference › Paper

    Abstract

    This study is the first likelihood ratio (LR)-based forensic text comparison study in which each text is mapped onto an embedding vector using RoBERTa as the pre-trained model. The scores obtained with cosine distance and probabilistic linear discriminant analysis (PLDA) were calibrated to LRs with logistic regression; the quality of the LRs was assessed by the log-LR cost (Cllr). Although the documents in the experiments were very short (maximum 100 words), the Cosine and PLDA systems reached Cllr values of 0.55595 and 0.71591, respectively. The effectiveness of deep-learning-based text representation is discussed by comparing the results of the current study with those of previous studies of systems based on conventional feature engineering and tested with longer documents.
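
The two quantities named in the abstract, a cosine-distance score between two embedding vectors and the log-LR cost, can be illustrated with a minimal sketch. It uses the standard textbook definition of Cllr and is not taken from the paper; the function names `cosine_distance` and `cllr` and the toy inputs are assumptions made for illustration only, and the RoBERTa embedding and logistic-regression calibration steps are omitted.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cllr(same_author_lrs, diff_author_lrs):
    """Log-LR cost under its standard definition (assumed here).

    same_author_lrs: LRs from comparisons where both texts share an author.
    diff_author_lrs: LRs from comparisons where the authors differ.
    Both lists must be non-empty and contain strictly positive LRs.
    """
    # Penalty for same-author comparisons grows as the LR falls below 1.
    same_term = sum(math.log2(1.0 + 1.0 / lr) for lr in same_author_lrs)
    same_term /= len(same_author_lrs)
    # Penalty for different-author comparisons grows as the LR rises above 1.
    diff_term = sum(math.log2(1.0 + lr) for lr in diff_author_lrs)
    diff_term /= len(diff_author_lrs)
    return 0.5 * (same_term + diff_term)


if __name__ == "__main__":
    # Toy vectors standing in for text embeddings (hypothetical values).
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
    # A system that always reports LR = 1 carries no information.
    print(cllr([1.0], [1.0]))
```

Under this definition, a system that always outputs LR = 1 scores exactly 1, so values below 1, such as those reported in the abstract, indicate that the LRs carry useful information; lower is better.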
    Original language: English
    Pages: 1-5
    Publication status: Published - 2022
    Event: The 20th Annual Workshop of the Australasian Language Technology Association - Adelaide, SA
    Duration: 1 Jan 2022 → …

    Conference

    Conference: The 20th Annual Workshop of the Australasian Language Technology Association
    Period: 1/01/22 → …
