Abstract
The performance of two different procedures for calculating
the likelihood ratio (LR) for forensic text comparison (FTC) is
empirically compared. One is the multivariate kernel density (MVKD)
procedure with so-called lexical features; the MVKD procedure has
been successfully applied to various types of evidence, including texts.
The other is a procedure based on character N-grams; the N-gram is a
widely used, robust probabilistic language model. The effectiveness of
character N-grams has been reported in authorship analysis; however, to
the best of my knowledge, it has not yet been applied to LR-based FTC.
In this study, the log-likelihood-ratio cost (Cllr), which is an appropriate
assessment metric for LR-based systems, is used to assess the
performance of the two procedures. It will be reported that the MVKD
procedure outperforms the character N-gram procedure. Through the
comparison of the two procedures, this study also demonstrates how the
weight of evidence (i.e., the LR) can be estimated from text messages.
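As a minimal illustration of the character N-gram features discussed above: the sketch below extracts relative frequencies of character bigrams from a text. The function name and preprocessing choices (no case folding or padding, fixed n) are illustrative assumptions, not the paper's actual implementation.

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Relative frequencies of character n-grams in a text.

    A simple feature extractor of the kind evaluated in the study;
    the exact preprocessing used in the original experiments
    (case folding, padding, choice of n) is not specified here.
    """
    # Slide a window of length n over the text to collect grams.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    # Normalise counts to relative frequencies.
    return {g: c / total for g, c in counts.items()}
```

Such frequency profiles can then be compared between a questioned text and texts of known authorship.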
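The log-likelihood-ratio cost used as the assessment metric has a standard closed form (Brümmer and du Preez, 2006), averaging a logarithmic penalty over same-author and different-author comparisons. A sketch, assuming LRs are supplied as plain floats:

```python
import math

def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood-ratio cost (Cllr).

    same_author_lrs: LRs from comparisons of texts by the same author
    diff_author_lrs: LRs from comparisons of texts by different authors
    Lower is better; a system that always outputs LR = 1 scores 1.0.
    """
    # Same-author LRs should be large: penalise small values.
    pen_same = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs) / len(same_author_lrs)
    # Different-author LRs should be small: penalise large values.
    pen_diff = sum(math.log2(1 + lr) for lr in diff_author_lrs) / len(diff_author_lrs)
    return 0.5 * (pen_same + pen_diff)
```

Unlike a simple error rate, Cllr penalises not just wrong-direction LRs but also poorly calibrated ones, which is why it is the appropriate metric for LR-based systems.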
| Original language | English |
|---|---|
| Pages | 39-47 |
| Publication status | Published - 2015 |
| Event | Australasian Language Technology Association Workshop ALTA 2015, Western Sydney University, Parramatta. Duration: 1 Jan 2015 → … |