What’s your favourite thing about SpaCy? Mine’s SpaCy.
Selection of two corpora that are not domain-specific, freely available, and in English: the Groningen Meaning Bank (GMB) and the CoNLL 2003 corpus.
Selection of five NER libraries that are free and open-source software, well-documented, available for Linux, and can recognise at least three types of entities: persons, organisations, and locations.
Comparison of each NER library’s generated NER annotations with annotations in the “gold data”, which contains the annotations that we’d expect. This is done by computing the precision, recall, and F-score for each library.
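As a rough illustration of the scoring step, here is a minimal sketch of how precision, recall, and F-score could be computed for one library against the gold data. The source does not say how a match is counted. This sketch assumes an annotation is correct only when its span and entity type both match the gold annotation exactly, and that the F-score is the balanced F1 (the harmonic mean of precision and recall). The `Entity` type, the `score` function, and the sample annotations are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """A hypothetical annotation: token span plus entity type."""
    start: int
    end: int
    label: str


def score(gold: set[Entity], predicted: set[Entity]) -> tuple[float, float, float]:
    """Return (precision, recall, F-score) under exact span-and-label matching."""
    # Assumption: a prediction counts only if it is identical to a gold annotation.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    # Balanced F-score (F1): harmonic mean of precision and recall.
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Made-up annotations for illustration; not taken from either corpus.
gold = {
    Entity(0, 2, "PERSON"),
    Entity(5, 6, "LOCATION"),
    Entity(9, 11, "ORGANISATION"),
}
predicted = {
    Entity(0, 2, "PERSON"),        # correct span and type
    Entity(5, 6, "ORGANISATION"),  # correct span, wrong type: counted as an error
}

precision, recall, f_score = score(gold, predicted)
print(f"precision={precision:.2%} recall={recall:.2%} f_score={f_score:.2%}")
```

Running the same function once per entity type, and once over all annotations, would give per-entity and overall rows like those in the table below. A more lenient matching rule, such as token-level or partial-span matching, would produce different numbers.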
| Library | Entity | CoNLL 2003 Precision | CoNLL 2003 Recall | CoNLL 2003 F-score | GMB Precision | GMB Recall | GMB F-score |
|---|---|---|---|---|---|---|---|
| Stanford NLP | Location | 91.30 | 88.73 | 90.00 | 83.10 | 63.64 | 72.08 |
| | Organisation | 86.32 | 80.92 | 83.53 | 71.40 | 47.42 | 56.99 |
| | Person | 92.72 | 82.68 | 87.41 | 78.59 | 84.70 | 81.53 |
| | Overall | 90.06 | 73.67 | 81.05 | 79.81 | 63.74 | 70.88 |
| NLTK | Location | 52.47 | 65.47 | 58.26 | 77.13 | 77.10 | 77.12 |
| | Organisation | 36.20 | 24.80 | 29.44 | 42.06 | 35.54 | 38.53 |
| | Person | 61.09 | 66.11 | 63.50 | 38.07 | 55.87 | 45.28 |
| | Overall | 51.78 | 45.56 | 48.47 | 60.96 | 63.91 | 62.40 |
| GATE | Location | 59.63 | 78.63 | 67.82 | 79.03 | 48.16 | 59.85 |
| | Organisation | 50.58 | 21.29 | 29.96 | 45.08 | 37.68 | 41.05 |
| | Person | 69.53 | 62.67 | 65.92 | 46.53 | 53.70 | 49.86 |
| | Overall | 61.48 | 47.44 | 53.55 | 61.72 | 46.78 | 53.22 |
| OpenNLP | Location | 76.54 | 52.22 | 62.08 | 84.34 | 45.84 | 59.40 |
| | Organisation | 38.06 | 14.87 | 21.39 | 59.27 | 30.64 | 40.39 |
| | Person | 83.94 | 37.17 | 51.52 | 62.34 | 41.98 | 50.17 |
| | Overall | 68.68 | 30.44 | 42.18 | 37.35 | 41.71 | 39.41 |
| SpaCy | Location | 73.38 | 75.36 | 74.36 | 77.04 | 56.64 | 65.28 |
| | Organisation | 40.95 | 36.24 | 38.45 | 41.20 | 36.50 | 38.70 |
| | Person | 66.89 | 56.22 | 61.09 | 67.41 | 69.14 | 68.27 |
| | Overall | 60.94 | 49.01 | 54.33 | 66.15 | 54.32 | 59.66 |