From Shared Tasks to Benchmark Datasets:
Lessons from NTCIR’s 26-Year Journey

A Tutorial at LREC 2026

Event: The 15th edition of the Language Resources and Evaluation Conference (LREC 2026)

Location: Palau de Congressos de Palma, Palma de Mallorca (Spain)

Date: 11-16 May 2026 (Tutorial Date: TBA)

Abstract

This half-day tutorial will introduce the history and experiences of the NTCIR evaluation series (NII Testbeds and Community for Information access Research) from its inception in 1999 to the present. We will focus on how NTCIR pioneered multilingual information access evaluation, the challenges of collecting benchmark datasets (e.g., for Japanese, Chinese, and other languages), and how these challenges were addressed.

The tutorial is introductory and accessible to researchers from adjacent areas, providing an overview of NTCIR from its beginnings to the present, so no prior knowledge of NTCIR is required. We will cover the evolution of NTCIR’s shared tasks, the methods used to gather and curate evaluation corpora, and lessons learned in organizing international benchmark test collections.

Attendees will gain insight into NTCIR’s contributions to information retrieval (IR) and natural language processing (NLP) research – from cross-lingual search to question answering and beyond – and understand best practices for creating and sharing evaluation resources. The target audience includes researchers and practitioners interested in language resources and evaluation campaigns. By the end of the tutorial, participants will have a clear understanding of NTCIR’s legacy, the practical aspects of running collaborative evaluations, and open challenges for future multilingual evaluation initiatives.

Tutorial Outline

Part 1: NTCIR History and Test Collection Development (1 hour)

Presented by Noriko Kando

This part will introduce the inception and evolution of NTCIR. We will cover the motivations behind the first NTCIR workshop in 1999. Noriko Kando will recount how the early Japanese and Chinese IR test collections were constructed essentially from scratch, discussing challenges in document collection, topic design, relevance criteria, and handling linguistic differences (e.g., word segmentation). This part also highlights the community-building aspect of NTCIR.

Part 2: Participants, Experiments, and Key Findings (1 hour)

Presented by Makoto P. Kato

This part focuses on insights gained from NTCIR’s many tasks (Ad Hoc retrieval, QA, Patent Search, Math Retrieval, Lifelog, etc.) and on their contributions to the research community. We will present statistics on the global growth of the NTCIR community and discuss notable experimental findings and methodological innovations that emerged from these shared tasks.

Part 3: Organizing Evaluation Campaigns and Dataset Sharing (1.5 hours)

Presented by Chung-Chi Chen

The final part focuses on the practical aspects of organizing shared evaluation campaigns. Drawing on experience from FinNum, FinArg, and NTCIR coordination, this session outlines the end-to-end process: developing task proposals, recruiting participants, curating datasets (including ethics and privacy considerations), handling submissions, and conducting assessment. It provides practical guidance for organizing reliable and impactful evaluation campaigns.

Presenters

Tutorial Materials

Slides and Resources

The materials for this tutorial will be available here closer to the conference date.

Download Link (Coming Soon)