TED Task

IWSLT proposes challenging research tasks and an open experimental infrastructure for the scientific community working on spoken and written language translation. The IWSLT 2012 Evaluation Campaign includes the TED Task, which is the translation of TED Talks, a collection of public speeches on a variety of topics. Three tracks are proposed, each addressing a different research task:

  • ASR track: automatic transcription of talks from audio to text (in English)
  • SLT track: speech translation of talks from audio (or ASR output) to text (from English to French)
  • MT track: text translation of talks for two official language pairs plus ten optional language pairs:
    • official: from English to French and from Arabic to English
    • optional: from German, Dutch, Polish, Portuguese-Brazil, Romanian, Russian, Slovak, Slovenian, Turkish and Chinese to English
The main challenges of the proposed tracks are:
  • Open domain ASR, clean transcription of spontaneous speech, detection and removal of non-words, and talk style and topic adaptation.
  • Open domain SLT, translation of speech or ASR output into true-case punctuated text, and talk style and topic adaptation.
  • Open domain MT between distant languages, and talk style and topic adaptation.
Training of MT systems and of language models for ASR is restricted to data supplied by the organizers. No training data are distributed for ASR acoustic modeling; participants may use any publicly available data recorded before 31 December 2010.
 
ASR Track (English)
 
 
SLT Track (English to French)
 
MT Track (English to French, Arabic to English + 10 additional language pairs)
  • Input format: NIST XML format, true case with punctuation
  • Output format: NIST XML format, true case with punctuation (example)
  • Coding: UTF-8
  • Submission guidelines
  • Evaluation: BLEU and subjective ranking
  • Training data: 
  • Development data: from the WIT3 website
  • Test data: from the WIT3 website
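To make the BLEU criterion listed above more concrete, the following is a minimal sketch of corpus-level BLEU with one reference per segment. It is not the campaign's official scoring tool: the function names (`ngrams`, `corpus_bleu`), the whitespace tokenization, and the case-sensitive matching are assumptions made for illustration only, and official scores may differ because of tokenization and normalization details.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with a single reference per segment.

    Assumption: tokens are obtained by whitespace splitting and compared
    case-sensitively; this is a simplification, not the official tool.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = 0
    ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok = hyp.split()
        ref_tok = ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on the mat"]
    print(corpus_bleu(hyps, refs))
```

Because matching is done on the raw tokens, casing and punctuation errors lower the score, which fits the requirement that output be true case with punctuation.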
Test References
The references for the tst2011 set for the ASR, SLT and MT tracks are available to participants here.
 
Registration
To participate in one or more tracks, please fill in the Registration Form and send it by e-mail to iwslt2012.ted AT gmail DOT com.
Participants are requested to submit a system paper describing their work and to present it at the workshop.
 
Acknowledgments
  • TED talks data are copyright of TED Conference LLC and distributed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license
  • Google Books ngrams are copyright of Google Inc. and distributed under a Creative Commons Attribution 3.0 license
  • WMT12 data are kindly supplied by the NAACL 2012 7th Workshop on Statistical Machine Translation
  • MultiUN data are kindly supplied by the EuroMatrixPlus project
Supplied out-of-domain monolingual and parallel training data
 
TED Task Evaluation Team
  • Marcello Federico (FBK, Italy), Evaluation Chair
  • Michael Paul (NICT, Japan), Evaluation Chair
  • Mauro Cettolo (FBK, Italy), MT data processing (surname at fbk dot eu)
  • Sebastian Stueker (KIT, Germany), ASR data processing
  • Luisa Bentivogli (FBK, Italy), Subjective evaluation
  • Giovanni Moretti (CELCT, Italy), Subjective evaluation