Difference between revisions of "Resources"

Revision as of 15:21, 19 March 2010

Named Entity Recognition in Questions: Towards a Golden Collection

A set of nearly 5,500 manually annotated questions for use as a training corpus in machine-learning-based NER systems. The named entities in these questions were identified and classified into three categories: Person, Location and Organization. To meet the demands posed by questions, we extended and adapted the guidelines of the NER shared task of the Conference on Computational Natural Language Learning (CoNLL) 2003.

These corpora are freely available for research purposes. You can download the training corpus (Train_5500questions_NEannotated.txt) and the testing corpus (Test_500questions_NEannotated.txt).

Further details on building these question corpora can be found in [2]. We kindly ask you to cite this publication whenever you use the resource.
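As a rough illustration of how such a corpus might be consumed by an NER pipeline, the sketch below reads token-level annotations and collects the entity spans. The file layout is an assumption, not something stated on this page: it supposes a CoNLL-2003-style column format (one token and its tag per line, blank lines between questions) with hypothetical tags such as `B-PER`, `I-LOC` and `O`. The sample questions and the function names `read_sentences` and `extract_entities` are invented for the example; check the downloaded files for the actual format before relying on it.

```python
from typing import Iterable, List, Optional, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, tag) pairs


def read_sentences(lines: Iterable[str]) -> List[Sentence]:
    """Group 'token ... tag' lines into sentences separated by blank lines.

    Assumes the first column is the token and the last column is the tag.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Return (entity text, label) pairs from B-/I-/O style tags."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close any open entity, then start a new one.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            tokens.append(token)
        else:
            # Any other tag (e.g. 'O') ends the open entity.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


# Hypothetical sample in the assumed format (not taken from the corpus).
SAMPLE = [
    "Who O", "founded O", "Microsoft B-ORG", "? O",
    "",
    "Where O", "is O", "New B-LOC", "York I-LOC", "? O",
]

if __name__ == "__main__":
    for sentence in read_sentences(SAMPLE):
        print(extract_entities(sentence))
```

To run this against a downloaded file, the list `SAMPLE` could be replaced by an open file handle, provided the file really follows the assumed column layout.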

@@ Line 1: / Line 1: @@
 '''''Named Entity Recognition in Questions: Towards a Golden Collection'''''
-* 5,500 annotated questions to be used as training corpus in machine learning based NER systems. The named entities in these questions were identified and classified according to the categories: <code>Person</code>, <code>Location</code> and <code>Organization</code>. We extended and particularized the guidelines of the shared task of the Conference on Computational Natural Language Learning (CoNLL) 2003 on NER to face the demands presented by questions.
+* A set of nearly 5,500 manually annotated questions to be used as training corpus in machine learning based NER systems. The named entities in these questions were identified and classified according to the categories: <code>Person</code>, <code>Location</code> and <code>Organization</code>. We extended and particularized the guidelines of the shared task of the Conference on Computational Natural Language Learning (CoNLL) 2003 on NER to face the demands presented by questions.
-Further details on building the question corpus can be found in [[Publications#2010|[2]]]. We kindly ask you to cite this publication whenever you use the resource.
+These corpora are freely available for research purposes. You can download the training corpus [[Media:Train_5500questions_NEannotated.txt|here]], and the testing corpus [[Media:Test_500questions_NEannotated.txt|here]].
+Further details on building this question corpora can be found in [[Publications#2010|[2]]]. We kindly ask you to cite this publication whenever you use the resource.
