Computing Veracity – the Fourth Challenge of Big Data

PHEME RTE dataset

For Natural Language Processing-based information verification, we have built a new Recognizing Textual Entailment (RTE) resource from Twitter data. The PHEME RTE dataset is compiled from naturally occurring contradictions in manually labeled claims in tweets related to crisis events and is, to our knowledge, the first resource for 3-way judgement RTE in the social media and verification domain. From about 500 English tweets related to 70 unique claims, we created 5.4k RTE pairs. The pairs are built with a semi-automatic method that is portable across languages and domains but requires event and claim annotations. The resource, its creation method, and a pilot RTE evaluation are described in the following paper:

Piroska Lendvai, Isabelle Augenstein, Kalina Bontcheva, Thierry Declerck (2016). Monolingual Social Media Datasets for Detecting Contradiction and Entailment. Proc. of LREC 2016.
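
To give a rough idea of how event and claim annotations can drive pair generation, here is a minimal Python sketch. It is not the procedure from the paper: the `Tweet` fields, the `build_rte_pairs` function, the `contradicting_claims` set, the label names, and the pairing rule (same claim gives entailment, claims manually marked as contradictory give contradiction, other claims within the same event give unknown) are all assumptions made for illustration, and the tweets are placeholders rather than dataset content.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tweet:
    text: str
    event: str     # hypothetical event annotation
    claim_id: str  # hypothetical claim annotation


def build_rte_pairs(tweets, contradicting_claims):
    """Return (text, hypothesis, label) triples for tweets of the same event.

    Assumed rule, for illustration only:
      same claim                       -> ENTAILMENT
      claims marked as contradictory   -> CONTRADICTION
      any other claims, same event     -> UNKNOWN
    """
    pairs = []
    for a, b in combinations(tweets, 2):
        if a.event != b.event:
            continue  # only pair tweets within one event
        if a.claim_id == b.claim_id:
            label = "ENTAILMENT"
        elif frozenset((a.claim_id, b.claim_id)) in contradicting_claims:
            label = "CONTRADICTION"
        else:
            label = "UNKNOWN"
        pairs.append((a.text, b.text, label))
    return pairs


# Placeholder data, not taken from the PHEME RTE dataset.
tweets = [
    Tweet("placeholder tweet A1", "event1", "c1"),
    Tweet("placeholder tweet A2", "event1", "c1"),
    Tweet("placeholder tweet B1", "event1", "c2"),
    Tweet("placeholder tweet C1", "event1", "c3"),
]
# Hypothetical manual annotation: claims c1 and c2 contradict each other.
contradicting_claims = {frozenset(("c1", "c2"))}

pairs = build_rte_pairs(tweets, contradicting_claims)
for text, hypothesis, label in pairs:
    print(label, "|", text, "|", hypothesis)
```

Because the pairing depends only on the event and claim labels, not on the tweet text, a scheme of this kind carries over to other languages and domains as long as those annotations exist.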
