From Shared Tasks to Benchmark Datasets:
Lessons from NTCIR’s 26-Year Journey

A Tutorial at LREC 2026

Event: The 15th edition of the Language Resources and Evaluation Conference (LREC 2026)

Location: Palau de Congressos de Palma, Palma de Mallorca (Spain)

Date: 11-16 May 2026 (Tutorial Date: TBA)

Abstract

This half-day tutorial will introduce the history and experiences of the NTCIR evaluation series (NII Testbeds and Community for Information access Research) from its inception in 1999 to the present. We will focus on how NTCIR pioneered multilingual information access evaluation, the challenges of collecting benchmark datasets (e.g., for Japanese, Chinese, and other languages), and how these challenges were addressed.

The tutorial is introductory and accessible to researchers from adjacent areas, providing an overview of NTCIR from its beginnings to the present, so no prior knowledge of NTCIR is required. We will cover the evolution of NTCIR’s shared tasks, the methods used to gather and curate evaluation corpora, and lessons learned in organizing international benchmark test collections.

Attendees will gain insight into NTCIR’s contributions to information retrieval (IR) and natural language processing (NLP) research – from cross-lingual search to question answering and beyond – and understand best practices for creating and sharing evaluation resources. The target audience includes researchers and practitioners interested in language resources and evaluation campaigns. By the end of the tutorial, participants will have a clear understanding of NTCIR’s legacy, the practical aspects of running collaborative evaluations, and open challenges for future multilingual evaluation initiatives.

Tutorial Outline

Part 1: NTCIR History and Test Collection Development (1 hour)

Presented by Noriko Kando

This part will introduce the inception and evolution of NTCIR. We will cover the motivations behind the first NTCIR workshop in 1999. Noriko Kando will recount how the early Japanese and Chinese IR test collections were constructed essentially from scratch, discussing challenges in document collection, topic design, relevance criteria, and handling linguistic differences (e.g., word segmentation). This part also highlights the community-building aspect of NTCIR.

Part 2: Participants, Experiments, and Key Findings (1 hour)

Presented by Makoto P. Kato

This part focuses on insights gained from NTCIR’s many tasks (Ad Hoc retrieval, QA, Patent Search, Math Retrieval, Lifelog, etc.) and on their contributions to the research community. We will present statistics on the global growth of the NTCIR community and discuss notable experimental findings and methodological innovations that emerged from these shared tasks.

Part 3: Organizing Evaluation Campaigns and Dataset Sharing (1.5 hours)

Presented by Chung-Chi Chen

The final part focuses on the practical aspects of organizing shared evaluation campaigns. Drawing on experience from FinNum, FinArg, and NTCIR coordination, this session outlines the end-to-end process: developing task proposals, recruiting participants, curating datasets (including ethics and privacy considerations), handling submissions, and conducting assessment. It provides practical guidance for organizing reliable and impactful evaluation campaigns.

Presenters

Tutorial Materials

Slides and Resources

The materials for this tutorial will be available here closer to the conference date.

Download Link (Coming Soon)