The performance of two procedures for calculating the likelihood ratio (LR) in forensic text comparison (FTC) is empirically compared. The first is the multivariate kernel density (MVKD) procedure applied to so-called lexical features; the MVKD procedure has been successfully applied to various types of evidence, including texts. The second is based on character N-grams, a widely used and robust probabilistic language model. The effectiveness of character N-grams has been reported in authorship analysis; however, to the best of my knowledge, they have not yet been applied to LR-based FTC. In this study, the log-likelihood-ratio cost (Cllr), an appropriate assessment metric for LR-based systems, is used to assess the performance of the two procedures. It will be reported that the MVKD procedure outperforms the character N-gram procedure. Through the comparison of the two procedures, this study also demonstrates how the weight of evidence (i.e., the LR) can be estimated from text messages.
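The abstract mentions two components that can be illustrated concretely: extracting character N-grams from a text, and scoring a set of LRs with the log-likelihood-ratio cost (Cllr). The sketch below is not the paper's implementation; it assumes the standard Cllr definition from Brümmer and du Preez, and `char_ngrams` is a hypothetical helper for overlapping character N-gram counts.

```python
from collections import Counter
from math import log2

def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood-ratio cost (standard Brummer & du Preez form).

    Penalises same-author comparisons whose LR is below 1 and
    different-author comparisons whose LR is above 1; a perfectly
    calibrated, perfectly discriminating system approaches 0.
    """
    ss = sum(log2(1 + 1 / lr) for lr in same_author_lrs) / len(same_author_lrs)
    ds = sum(log2(1 + lr) for lr in diff_author_lrs) / len(diff_author_lrs)
    return 0.5 * (ss + ds)

# Toy LRs: same-author pairs should yield LR > 1, different-author LR < 1.
print(char_ngrams("banana", n=2))
print(round(cllr([8.0, 4.0, 2.0], [0.125, 0.25, 0.5]), 3))  # → 0.359
```

Lower Cllr is better, which is why it is used here to compare the MVKD and character N-gram procedures directly on the strength of the LRs they produce.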
Publication status: Published - 2015
Event: Australasian Language Technology Association Workshop ALTA 2015, Western Sydney University, Parramatta
Duration: 1 Jan 2015 → …