Strength of forensic text comparison evidence from stylometric features: a multivariate likelihood ratio-based analysis

    Research output: Contribution to journal > Article

    Abstract

    An experiment in forensic text comparison (FTC) within the likelihood ratio (LR) framework is described, in which authorship attribution was modelled with word- and character-based stylometric features. Chatlog messages of 115 authors were selected from a chatlog archive containing real pieces of chatlog evidence used to prosecute paedophiles. Four different text lengths (500, 1000, 1500 and 2500 words) were used for modelling in order to investigate how system performance is influenced by sample size. Strength of authorship attribution evidence (or LR) was estimated with the Multivariate Kernel Density formula. Performance was primarily assessed with the log-likelihood ratio cost (Cllr), but assessments of other metrics, e.g. credible interval and equal error rate, are also given. Taking into account the small number of features used for modelling authorship attribution, results are promising. Even with a small sample size of 500 words, the system achieved a discrimination accuracy of c. 76% (Cllr = 0.68258). With a sample size of 2500 words, a discrimination accuracy of c. 94% (Cllr = 0.21707) was obtained. Larger sample size is beneficial to FTC, resulting in an improvement in discriminability, an increase in the magnitude of the consistent-with-fact LRs and a decrease in the magnitude of the contrary-to-fact LRs. It was found that 'Average character number per word token', 'Punctuation character ratio', and vocabulary richness features are robust features, which work well regardless of sample size. The results demonstrate the efficacy of the LR framework for analysing authorship attribution evidence.
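
For readers unfamiliar with the primary metric, the following is a minimal sketch of how the log-likelihood ratio cost (Cllr) is commonly computed from two sets of LRs. It uses the standard definition of Cllr and is not taken from the article itself; the function name `cllr` and its argument names are illustrative assumptions.

```python
import math


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood ratio cost (standard definition; names are illustrative).

    same_author_lrs: LRs from comparisons where both texts share an author.
    different_author_lrs: LRs from comparisons of texts by different authors.
    All LRs must be strictly positive.
    """
    # Penalty for same-author comparisons: large when the LR is small.
    same_penalty = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs)
    same_penalty /= len(same_author_lrs)

    # Penalty for different-author comparisons: large when the LR is large.
    diff_penalty = sum(math.log2(1 + lr) for lr in different_author_lrs)
    diff_penalty /= len(different_author_lrs)

    return 0.5 * (same_penalty + diff_penalty)
```

Under this definition, a system that always outputs LR = 1 (no information) scores exactly 1, and values closer to 0 indicate better performance. That gives context for the reported drop from 0.68258 at 500 words to 0.21707 at 2500 words.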
    Original language: English
    Pages (from-to): 67-98
    Journal: The International Journal of Speech, Language and the Law
    Volume: 24
    Issue number: 1
    Publication status: Published - 2017
