Likelihood ratio estimation for authorship text evidence: An empirical comparison of score- and feature-based methods

Shunichi Ishihara, Michael Carne

    Research output: Contribution to journal › Article

    Abstract

    This study compares score- and feature-based methods for estimating forensic likelihood ratios for text evidence. Three feature-based methods built on different Poisson-based models with logistic regression fusion are introduced and evaluated: a one-level Poisson model, a one-level zero-inflated Poisson model and a two-level Poisson-gamma model. These are compared with a score-based method that employs the cosine distance as a score-generating function. The two types of methods are compared using the same data (i.e., documents attributable to 2,157 authors) and the same feature set, which is a bag-of-words model using the 400 most frequently occurring words. Their performance is evaluated via the log-likelihood ratio cost (Cllr) and its composites: discrimination cost (Cllrmin) and calibration cost (Cllrcal). The results show that (1) the feature-based methods outperform the score-based method by a Cllr value of 0.14–0.2 when their best results are compared and (2) a feature selection procedure can further improve performance for the feature-based methods. Some distinctive performance characteristics associated with likelihood ratios produced using the feature-based methods are described, and their implications are discussed with real forensic casework in mind.
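
    As a rough illustration of two quantities named in the abstract, the sketch below computes a cosine distance between two bag-of-words count vectors and the log-likelihood ratio cost (Cllr) from a set of likelihood ratios, using the standard Cllr definition. The function names, the toy vectors and the example likelihood ratios are all made up for illustration; the study's Poisson-based models, logistic regression fusion and calibration steps are not reproduced here.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two count vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood ratio cost from linear-scale likelihood ratios.

    Same-author comparisons are penalised for small LRs,
    different-author comparisons for large LRs.
    """
    if not same_author_lrs or not different_author_lrs:
        raise ValueError("both lists of likelihood ratios must be non-empty")
    same_cost = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    diff_cost = sum(math.log2(1 + lr) for lr in different_author_lrs)
    return 0.5 * (same_cost / len(same_author_lrs)
                  + diff_cost / len(different_author_lrs))


# Hypothetical word-count vectors for two documents (illustrative only).
doc_a = [12, 0, 3, 7]
doc_b = [10, 1, 4, 6]
score = cosine_distance(doc_a, doc_b)

# Hypothetical likelihood ratios (illustrative only).
cost = cllr([8.0, 20.0, 0.5], [0.1, 0.02, 3.0])
```

    A system whose likelihood ratios are all equal to 1 carries no information and yields a Cllr of exactly 1; lower values indicate better performance.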
    Original language: English
    Journal: Forensic Science International
    Volume: 334
    Publication status: Published - 2022