Discourse Annotation
A workshop immediately following ACL '04 in Barcelona, Spain
July 25-26, 2004
Full Paper Submissions: March 22, 2004
Preliminary Program
A preliminary version of the program for the workshop is available here.
Workshop Overview
Advances in language technology draw on a combination of annotated
empirical data and linguistic theory. The richer the annotation, the
more that can potentially be learned and applied to unseen data.
Thus the Penn TreeBank (PTB), with its part-of-speech (POS) tags
and syntactic annotation, has been more useful than corpora annotated
for POS-tags alone, and PropBank, in which PTB is annotated with
predicate-argument relations, will be useful for more applications
than the PTB alone.
Two gross features of PTB and PropBank are that they annotate
sentence/clause-level features and that they were undertaken
with communal agreement (albeit somewhat contentious at first).
Similar, largely communal projects have been undertaken for
dialogue annotation, including MATE (now NITE).
Discourse annotation (in contrast with sentence-level
annotation) has taken a somewhat different
course. While an early communal effort (DRI) to annotate discourse
structure according to a consensus framework failed to achieve its
goal, the value of discourse-annotated corpora continued to be
recognized. As a result, diverse grass-roots efforts have
been producing individual corpora annotated for a wide variety of
phenomena such as
- referring/attributive expressions and coreference;
- spatial/temporal expressions and spatial/temporal relations;
- other anaphoric and/or elliptic expressions and their discourse
dependencies;
- discourse units and their relations to one another;
- information structure themes and the themes/rhemes that license
them;
- discourse connectives and what they connect;
- contexts of interpretation;
- cognitive accessibility scales (e.g. animacy);
- types of speech (direct, indirect, free indirect).
Groups involved in these efforts appear to be using (or
planning to use) these corpora for a range of applications that
include: empirical testing of theoretical claims/hypotheses;
supporting second-language acquisition of discourse-sensitive
linguistic devices; training resolution procedures for co-referring
expressions or other anaphors that can be used in annotating
additional texts or in supporting technologies such as information
extraction, question answering, summarization, and/or text generation;
training discourse parsers that can be used for annotating additional
texts or for reducing the amount of manual effort needed in the
process; and probabilistic sentence and text realization.
The workshop is neutral as to whether consensus annotation is possible
for every type of discourse phenomenon. Its aims are rather to:
- bring a fuller range of discourse annotation activity to the
attention of researchers working on discourse phenomena and their
usefulness for language technologies;
- highlight tools used in the annotation process or used to display
or further analyse the results of annotation;
- discuss obstacles to some (all?) forms of discourse-level
annotation, such as the greater subjectivity that seems involved
in making judgments related to, for example, bracketing and
labelling;
- identify gaps in this work (e.g., in the range of genres being
annotated);
- stimulate researchers by showing the uses to which other
researchers are putting their data;
- discuss (in small groups and in feedback sessions) whether we
already have, or could together create, a significantly large,
reusable corpus (or set of corpora) annotated for multiple
discourse and sentence-level phenomena, as a much richer basis
for both assessing theories and building better tools.
With these aims in mind, we solicit papers on:
- discourse annotation projects (in any language);
- uses made of discourse annotated corpora, alone or together
with other forms of annotation;
- tools for discourse annotation (e.g., for assisting manual
annotation or for (semi-)automating the process) or for analysing
discourse annotated data;
- tools for integrating layers of annotation (different types of
word-, sentence-, and discourse-level markup);
- requirements for annotated corpora from the perspective of
computational linguistics (e.g., vis-a-vis data sharing,
comparison, integration/alignment, etc.);
- experiments with integrating and exploiting different layers of
annotation (from word to discourse level).
In addition to being presented, the papers will be used to structure
the above-mentioned small group discussions and feedback sessions.
Submissions are limited to original, unpublished work. Papers should
be written in English.
Schedule
Paper submissions due at midnight GMT on March 22, 2004
Notification of acceptance for papers: April 30, 2004
Camera ready papers due: May 24, 2004
Workshop date: July 25-26, 2004
Co-chairs
Professor Bonnie Webber
School of Informatics
University of Edinburgh
2 Buccleuch Place
Edinburgh EH8 9LW
UK
email: bonnie@inf.ed.ac.uk
phone: +44 131 650 4190
fax: +44 131 650 4587
Professor Donna Byron
Dept. of Computer and Information Science
Ohio State University
395 Dreese Laboratory
2015 Neil Avenue
Columbus, Ohio 43210
USA
email: dbyron@cis.ohio-state.edu
phone: +1 614-292-6370
fax: +1 614-292-2911
donna byron
Last modified: Thu Jan 8 17:55:08 EST 2004