Workshop on Noisy, Unstructured Text

We’re holding a workshop on dealing with noisy text, like the social media content that Pheme relies upon. The workshop will be held in Copenhagen, Denmark on September 7, 2017, in conjunction with the top-tier EMNLP conference. Here’s the official call for papers.

Call for Papers

We seek submissions of regular papers on original and unpublished work (same page limit as the EMNLP main conference). 1-page abstracts on work-in-progress or work published elsewhere are also welcome and will *not* be included in the conference proceedings. All accepted submissions will be presented as posters. Additionally, selected submissions will be presented orally. Shared task participants are also encouraged (but not required) to submit system description papers and present posters; the top systems will be invited (but not required) to present orally.

Important Dates

Submissions Deadline: Friday, June 2
Notification: Friday, June 30
Camera-Ready: Friday, July 14
Workshop: September 7, EMNLP – Copenhagen, Denmark

Workshop website:

Submission URL:

Topics of interest include but are not limited to:

  • NLP Preprocessing of Noisy Text
      • Part of speech tagging
      • Named entity tagging, including a wide range of categories, e.g. product names
      • Chunking of user-generated text
      • Parsing
  • Text Normalization and Error Correction
      • Normalizing noisy text for downstream tasks and for human readability
      • Error detection and correction
  • Paraphrase identification and semantic similarity of short text or noisy text
  • User prediction, e.g. geolocation, gender, age, etc.
  • Bilingual translation of noisy text
  • Information extraction from noisy text
  • Multilingual NLP in noisy text
  • Colloquial language, e.g. idiom detection
  • Domain adaptation to user-generated text
  • Geolocation prediction
  • Global and regional trend detection and event extraction
  • Extracting user demographics, profiles and major life events
  • Detecting rumors, contradictory information, sarcasm and humor on social media
  • Sentiment analysis
  • Temporal aspects of user-generated content (resolving time expressions, concept drift, diachronic analyses, etc.)

All submissions should conform to the EMNLP 2017 style guidelines. Long and short paper submissions must be anonymized. Abstract submissions should include author information (and, if applicable, a footnote on the front page stating where the work was published). Please submit your papers at the softconf link.

Shared task: Novel and Emerging Entity Recognition

This shared task focuses on identifying unusual, previously-unseen entities in the context of emerging discussions. Named entities form the basis of many modern approaches to other tasks (like event clustering and summarisation), but recall on them is a real problem in noisy text, even among annotators. This drop tends to be due to novel entities and surface forms. Take for example the tweet “so.. kktny in 30 mins?”: even human experts find the entity kktny hard to detect and resolve. This task will evaluate the ability to detect and classify novel, emerging, singleton named entities in noisy text.

Shared task organisers: Leon Derczynski (University of Sheffield), Marieke van Erp (VU University Amsterdam), Nut Limsopatham (University of Cambridge), Eric Nichols (Honda Research Institute, Japan)

Workshop Organizers

Leon Derczynski (The University of Sheffield)
Wei Xu (The Ohio State University)
Alan Ritter (The Ohio State University)
Tim Baldwin (The University of Melbourne)

Invited Speakers

Miles Osborne (Bloomberg)
Bill Dolan (Microsoft Research)
Dirk Hovy (University of Copenhagen)

Program Committee

David Bamman (University of California, Berkeley)
Kalina Bontcheva (University of Sheffield)
Claire Cardie (Cornell University)
Colin Cherry (National Research Council Canada)
Grzegorz Chrupała (Tilburg University)
Marina Danilevsky (IBM Research)
Seza Doğruöz (Tilburg University)
Heba Elfardy (Columbia University)
Noura Farra (Columbia University)
Eric Fosler-Lussier (The Ohio State University)
Kevin Gimpel (Toyota Technological Institute at Chicago)
Weiwei Guo (Yahoo! Research)
Ben Hachey (Hugo AI)
Masato Hagiwara (Duolingo)
Ed Hovy (Carnegie Mellon University)
Jing Jiang (Singapore Management University)
Nobuhiro Kaji (Yahoo! Research)
Emre Kiciman (Microsoft Research)
Chen Li (University of Texas at Dallas)
Wang Ling (Google DeepMind)
Fei Liu (University of Central Florida)
Huan Liu (Arizona State University)
Rada Mihalcea (University of Michigan)
Smaranda Muresan (Columbia University)
Preslav Nakov (Qatar Computing Research Institute)
Naoaki Okazaki (Tohoku University)
Miles Osborne (Bloomberg)
Ellie Pavlick (University of Pennsylvania)
Daniel Preoţiuc-Pietro (University of Pennsylvania)
Will Radford (Hugo AI)
Afshin Rahimi (The University of Melbourne)
Shourya Roy (Xerox Research)
Alla Rozovskaya (City University of New York)
Derek Ruths (McGill University)
Andrew Schwartz (Stony Brook University)
Djamé Seddah (University Paris-Sorbonne)
Richard Sproat (Google Research)
Anders Søgaard (University of Copenhagen)
Benjamin Strauss (The Ohio State University)
Jeniya Tabassum (The Ohio State University)
Joel Tetreault (Yahoo! Research)
Svitlana Volkova (Pacific Northwest National Laboratory)
Byron C. Wallace (University of Texas at Austin)
Xiaojun Wan (Peking University)
Jun-Ming Xu (University of Wisconsin-Madison)
Diyi Yang (Carnegie Mellon University)
Yi Yang (Georgia Tech)
Guido Zarrella (MITRE)
Ming Zhou (Microsoft Research)

Workshop and prize sponsors to be announced

Anti-harassment Policy:

