JokeR


CLEF Workshop:

Automatic Pun and Humour Translation

Call for Participation

Context

Humour remains one of the most difficult aspects of intercultural communication: understanding humour often requires understanding implicit cultural references and/or double meanings, and this raises the question of its (un)translatability. Wordplay is a common source of humour due to its attention-getting and subversive character. The translation of humour and wordplay is therefore in high demand. Modern translation depends heavily on technological aids, yet few works have treated the automation of humour and wordplay translation, or the creation of humour corpora. The goal of the JOKER workshop is to bring together translators and computer scientists to work on an evaluation framework for wordplay, including data and metric development, and to foster work on automatic methods for wordplay translation.

Tasks

We invite you to submit both automatic and manual runs! Manual intervention should be reported.

Deadlines

Access

Sign up at the CLEF website (https://clef2022-labs-registration.dei.unipd.it/). All team members should join the JOKER mailing list (https://groups.google.com/u/4/g/joker-project). After registration, you will receive an email with information on how to get access to the data.

The data is split into three folders, one per shared task. Each task folder is further split into training data and test data.

Metadata will be made available when the participants’ results are published.

Result submission:

Participants should put their run results in the Documents folder created for their user account and submit them by email to contact@joker-project.com.

The email subject has to be in the format [CLEF TASK <NUMBER>] TEAM_ID.

Runs should be submitted as a ZIP archive of the corresponding JSON files. Manual runs may be submitted in CSV format.

A confirmation email will be sent within 2 days after the submission deadline.
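The packaging steps above can be sketched in Python. This is a minimal illustration, not an official tool: the function names `email_subject` and `package_runs` are hypothetical, and naming each JSON file after its run identifier is an assumption, since the call does not prescribe file names.

```python
import json
import zipfile
from pathlib import Path


def email_subject(task_number: int, team_id: str) -> str:
    """Build the subject line in the format [CLEF TASK <NUMBER>] TEAM_ID."""
    return f"[CLEF TASK {task_number}] {team_id}"


def package_runs(runs: dict, archive_path: Path) -> Path:
    """Write each run as one JSON file inside a single ZIP archive.

    `runs` maps a run identifier to the list of records for that run.
    Using "<run identifier>.json" as the file name is an assumption.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for run_id, records in runs.items():
            archive.writestr(f"{run_id}.json", json.dumps(records))
    return archive_path
```

For example, `email_subject(2, "TEAM_ID")` yields the subject line for a Task 2 submission, and `package_runs({"TEAM_task_2_run1": records}, Path("runs.zip"))` creates the archive to attach.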

Task 1: Classify and explain instances of wordplay.

Train data format: List of classified wordplay instances in a JSON format or a CSV file (for manual runs) with the following fields:

Example:

[{"ID":"noun_1063","WORDPLAY":"Elimentaler","LOCATION":"Elimentaler","INTERPRETATION":"Emmental (cheese) + Eliminator","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null},{"ID":"pun_341","WORDPLAY":"Geologists can be sedimental about their work.","LOCATION":"sedimental","INTERPRETATION":"sentimental\/sediment","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null}]
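As a rough illustration of how records in this shape can be read, the following sketch parses the two example records with Python's standard `json` module and tallies two of the classification fields. The variable names are illustrative only. Note that the `\/` escape in the raw JSON decodes to a plain slash, so the key becomes `HORIZONTAL/VERTICAL`.

```python
import json
from collections import Counter

# The two training records from the example, kept as raw JSON text.
sample = r'''[{"ID":"noun_1063","WORDPLAY":"Elimentaler","LOCATION":"Elimentaler","INTERPRETATION":"Emmental (cheese) + Eliminator","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null},{"ID":"pun_341","WORDPLAY":"Geologists can be sedimental about their work.","LOCATION":"sedimental","INTERPRETATION":"sentimental\/sediment","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null}]'''

records = json.loads(sample)

# JSON false/null arrive as Python False/None; "\/" arrives as "/".
layout_counts = Counter(record["HORIZONTAL/VERTICAL"] for record in records)
type_counts = Counter(record["MANIPULATION_TYPE"] for record in records)

for record in records:
    print(record["ID"], "->", record["LOCATION"], "|", record["INTERPRETATION"])
```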

Test data input format: List of wordplay instances to classify in a JSON format or a CSV file (for manual runs) with the following fields:

Input example:

[{"ID":"noun_1","WORDPLAY":"Ambipom"},{"ID":"het_1011","WORDPLAY":"These are my parents, said Einstein relatively"}]

Test data output format:

List of classified wordplay instances in a JSON format or a CSV file (for manual runs) with the following fields:

Output example:

[{"RUN_ID":"RT_task_1_run1","MANUAL":1,"ID":"noun_1063","WORDPLAY":"Elimentaler","TARGET_WORD":"Elimentaler","DISAMBIGUATION":"Emmental (cheese) + Eliminator","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null},{"RUN_ID":"RT_task_1_run1","MANUAL":1,"ID":"pun_341","WORDPLAY":"Geologists can be sedimental about their work.","TARGET_WORD":"sedimental","DISAMBIGUATION":"sentimental\/sediment","HORIZONTAL\/VERTICAL":"vertical","MANIPULATION_TYPE":"Similarity","MANIPULATION_LEVEL":"Sound","CULTURAL_REFERENCE":false,"CONVENTIONAL_FORM":false,"OFFENSIVE":null}]

Output format checker

You can check the output format with this Python 3 script, which requires the Pandas library: Download Python output checker
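For a quick local sanity check before running the official script, a sketch along the following lines may help. It is not the official checker and does not reproduce its rules: the field list is copied from the Task 1 output example, treating every field as required is an assumption, and `find_format_problems` is a hypothetical name.

```python
import json

# Field names copied from the Task 1 output example. Treating all of them
# as required is an assumption, not the official checker's rule set.
EXPECTED_FIELDS = {
    "RUN_ID", "MANUAL", "ID", "WORDPLAY", "TARGET_WORD", "DISAMBIGUATION",
    "HORIZONTAL/VERTICAL", "MANIPULATION_TYPE", "MANIPULATION_LEVEL",
    "CULTURAL_REFERENCE", "CONVENTIONAL_FORM", "OFFENSIVE",
}


def find_format_problems(json_text: str) -> list:
    """Return human-readable problems; an empty list means none were found."""
    try:
        records = json.loads(json_text)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error}"]
    if not isinstance(records, list):
        return ["top-level value must be a list of records"]
    problems = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {index}: not a JSON object")
            continue
        missing = EXPECTED_FIELDS - set(record)
        extra = set(record) - EXPECTED_FIELDS
        if missing:
            problems.append(f"record {index}: missing {sorted(missing)}")
        if extra:
            problems.append(f"record {index}: unexpected {sorted(extra)}")
    return problems
```

Passing the text of a run file to `find_format_problems` returns an empty list when every record carries exactly the expected fields.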

Evaluation. Pilot Task 1 includes both classification and interpretation components. Classification performance will be evaluated with respect to accuracy, while interpretation performance will be evaluated semi-manually.
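To make the classification measure concrete, here is a minimal sketch of accuracy as the fraction of instances whose predicted label equals the gold label. How the organisers aggregate over the individual fields is not stated here, so this single-field version, the `accuracy` function name, and the treatment of missing predictions as errors are all assumptions. The label "OtherType" in the usage example is a made-up placeholder.

```python
def accuracy(gold: dict, predicted: dict) -> float:
    """Fraction of gold instances whose predicted label matches exactly.

    Both arguments map an instance ID to a label. Instances absent from
    `predicted` count as errors (an assumption about the scoring).
    """
    if not gold:
        raise ValueError("gold labels must not be empty")
    correct = sum(
        1 for key, label in gold.items()
        if key in predicted and predicted[key] == label
    )
    return correct / len(gold)


# Usage with the two example IDs; "OtherType" is a placeholder label.
gold_types = {"noun_1063": "Similarity", "pun_341": "Similarity"}
predicted_types = {"noun_1063": "Similarity", "pun_341": "OtherType"}
score = accuracy(gold_types, predicted_types)
```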

Result submission. Participants should put their run results in the Documents folder created for their user account and submit them by email to contact@joker-project.com. The email subject has to be in the format [CLEF TASK 1] TEAM_ID.

Task 2: Translate single words containing wordplay.

Train data format: List of translated wordplay instances in a JSON format or a CSV file (for manual runs) with the following fields:

Example:

[{"id":"noun_1","en":"Ambipom","fr":"Capidextre"}]

Test data input format: List of wordplay instances to translate in a JSON format or a CSV file (for manual runs) with the following fields:

Input example:

[{"id":"noun_1185","en":"Fungun"}]

Test data output format:

List of translated wordplay instances in a JSON format or a CSV file (for manual runs) with the following fields:

Output example:

[{"RUN_ID":"OFFICIAL_task_2_run1","MANUAL":1,"id":"noun_1","en":"Ambipom","fr":"Capidextre"}]
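The step from test input to run output can be sketched as follows: each input record keeps its `id` and `en` fields and gains `RUN_ID`, `MANUAL` and `fr`. The helper name `build_run` is hypothetical, and writing `MANUAL` as 0 for automatic runs is an assumption, since the example shows only the value 1. The translation string below is a placeholder, not a real translation.

```python
import json


def build_run(test_records: list, translations: dict, run_id: str, manual: bool) -> list:
    """Attach RUN_ID, MANUAL and the French translation to each test record.

    `translations` maps an instance id to its French rendering.
    """
    run = []
    for record in test_records:
        run.append({
            "RUN_ID": run_id,
            "MANUAL": 1 if manual else 0,  # 0 for automatic runs is assumed
            "id": record["id"],
            "en": record["en"],
            "fr": translations[record["id"]],
        })
    return run


test_input = json.loads('[{"id":"noun_1185","en":"Fungun"}]')
placeholder = {"noun_1185": "<your translation>"}  # placeholder text only
run = build_run(test_input, placeholder, "TEAM_task_2_run1", manual=False)
run_json = json.dumps(run, ensure_ascii=False)
```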

Output format checker

You can check the output format with this Python 3 script, which requires the Pandas library: Download Python output checker

Evaluation. Human evaluators will manually annotate the submitted translations according to both subjective measures and more concrete features, such as whether wordplay exists in the target text, whether it corresponds to the type used in the source text, and whether the target text preserves the semantic field.

Result submission. Participants should put their run results in the Documents folder created for their user account and submit them by email to contact@joker-project.com. The email subject has to be in the format [CLEF TASK 2] TEAM_ID.

Task 3: Translate entire phrases containing wordplay.

Train data format: List of translated wordplay instances in a JSON format or a CSV file (for manual runs) with the following fields:

Example:

[{"id":"pun_724_1","en":"My name is Wade and I'm in swimming pool maintenance.","fr":" Je m\u2019appelle Jacques Ouzy, je m\u2019occupe de l\u2019entretien des piscines."}]

Test data input format: List of wordplay instances to translate in a JSON format or a CSV file (for manual runs) with the following fields:

Input example:

[{"id":"het_713","en":"Ever since my mineral extraction facility was converted to parking, I've had a lot on my mine."}]

Test data output format:

List of translated wordplay instances in a JSON format or a CSV file (for manual runs) with the following fields:

Output example:

[{"RUN_ID":"JCM_task_3_run1","MANUAL":1,"id":"pun_724_1","en":"My name is Wade and I'm in swimming pool maintenance.","fr":" Je m\u2019appelle Jacques Ouzy, je m\u2019occupe de l\u2019entretien des piscines."}]
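The French text in these examples uses JSON `\u2019` escapes for the typographic apostrophe. The sketch below, using only Python's standard `json` module, suggests that both the escaped and the literal spelling carry the same data, so either should be readable by a JSON parser. Whether the organisers prefer one form over the other is not stated.

```python
import json

# The Task 3 training example, kept as raw JSON text with its \u2019 escapes.
raw = r'''[{"id":"pun_724_1","en":"My name is Wade and I'm in swimming pool maintenance.","fr":" Je m\u2019appelle Jacques Ouzy, je m\u2019occupe de l\u2019entretien des piscines."}]'''

records = json.loads(raw)
french = records[0]["fr"]  # escapes are now real apostrophe characters

# json.dumps escapes non-ASCII characters by default ...
escaped = json.dumps(records)
# ... and writes them literally with ensure_ascii=False.
literal = json.dumps(records, ensure_ascii=False)

# Both spellings decode to identical records.
same_data = json.loads(escaped) == json.loads(literal) == records
```

When writing `literal` to disk, opening the file with `encoding="utf-8"` keeps the characters intact across platforms.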

Output format checker

You can check the output format with this Python 3 script, which requires the Pandas library: Download Python output checker

Evaluation. Human evaluators will manually annotate the submitted translations according to both subjective measures and more concrete features, such as whether wordplay exists in the target text, whether it corresponds to the type used in the source text, and whether the target text preserves the semantic field.

Result submission. Participants should put their run results in the Documents folder created for their user account and submit them by email to contact@joker-project.com. The email subject has to be in the format [CLEF TASK 3] TEAM_ID.

Terms of Use

By downloading and using JOKER data, you agree to the terms of use. Any use of the data for a purpose other than academic research violates the intended use of these data.

Therefore, by downloading and using these data you give the following assurances with respect to the JOKER data:

  1. You will not use nor permit others to use the data in the JOKER datasets in any way except for classes and academic research.
  2. You will not at any time disclose, give, or transmit (in any manner or form or for any purpose) the data (or any portion thereof) to any location or person, including but not limited to making the data available on the Internet and copying the data onto any cloud-based storage system.
  3. You will not release nor permit others to release the dataset or any part of it to any person.

If the conditions for access to the data for scientific purposes are violated, this access may be withdrawn from the research entity and/or from the researcher. The research entity may also be liable to pay compensation for damages to third parties or be asked to take disciplinary action against the offending researcher.

How to Cite

If you extend or use this work, please cite the paper where it was introduced:

Ermakova, L., Miller, T., Puchalski, O., Regattin, F., Mathurin, É., Araújo, S., 
Bosser, A.-G., Borg, C., Bokiniec, M., Corre, G. L., Jeanjean, B., Hannachi, R., 
Mallia, Ġ., Matas, G., & Saki, M. (2022). 
CLEF Workshop JOKER: Automatic Wordplay and Humour Translation. 
In M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg, & V. Setty (Eds.), 
Advances in Information Retrieval (Vol. 13186, pp. 355–363). Springer International Publishing. 
https://doi.org/10.1007/978-3-030-99739-7_45

Paper

Download .BIB

JOKER@CLEF presentation (pdf)

1st Call for Participation (pdf)

This project has received a government grant managed by the National Research Agency under the program "Investissements d'avenir" with the reference ANR-19-GURE-0001.

JokeR is supported by The Human Science Institute in Brittany (MSHB)