Guidelines for Task 3

Overview

Task 3 focuses on information retrieval (IR) to address questions patients may have when reading their clinical reports.

This is a standard TREC-style IR task using:

  • a 2012 crawl of approximately one million medical documents, made available in plain text form by the EU-FP7 Khresmoi project,
  • general public queries that individuals may realistically pose based on the content of their discharge summaries.

The goal of Task 3 is to retrieve the documents relevant to each user query.


The training collection (document set, sample development queries, and result set) was distributed to registered task participants. Participants had one month to explore the collection and develop retrieval techniques, after which test queries for the task were released.


The evaluation was conducted using test queries generated by medical professionals from a manually extracted set of disorders highlighted in Task 1. A mapping between each query and the matching Task 1 discharge summary (from which the disorder was taken) is provided. Task participants were free to obtain access to these discharge summaries and to use them as an external resource if desired.


Participants were allowed to use any external resources in their systems.


They were asked to submit up to seven ranked runs:

  • Run 1 (mandatory) is a baseline: only the title and description fields of the query may be used, and no external resources (including discharge summaries, corpora, ontologies, etc.) may be used.
  • Runs 2-4 (optional): any experiment WITH the discharge summaries.
  • Runs 5-7 (optional): any experiment WITHOUT the discharge summaries.

One of the runs from 2-4 and one of the runs from 5-7 must use only the title and desc fields of the queries. The runs must be ranked in order of priority (1-7, with 1 being the highest priority). Submitted runs must follow the TREC format, illustrated below.
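For reference, each line of a TREC-format run file has six whitespace-separated columns: query identifier, the literal Q0, document identifier, rank, score, and run tag. The query and document identifiers below are purely illustrative, not actual identifiers from the collection:

  qtest1 Q0 doc0001 1 15.372 TeamX_run1
  qtest1 Q0 doc0417 2 14.908 TeamX_run1
  qtest2 Q0 doc2230 1 12.441 TeamX_run1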


A Perl script is provided to check each CLEF eHealth 2013 Task 3 submission for errors.
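The provided script remains the authoritative checker; the Python sketch below only illustrates the kinds of checks a submission must pass (six columns per line, a Q0 literal, numeric rank and score, a single run tag). It is an illustration under those assumptions, not a reimplementation of the official script.

  import sys

  def check_run(path):
      # Illustrative sanity check for a TREC-format run file (not the official checker).
      tags = set()
      with open(path) as f:
          for lineno, line in enumerate(f, 1):
              fields = line.split()
              if len(fields) != 6:
                  print(f"line {lineno}: expected 6 columns, got {len(fields)}")
                  continue
              qid, literal, docid, rank, score, tag = fields
              if literal != "Q0":
                  print(f"line {lineno}: second column should be the literal Q0")
              try:
                  int(rank)
                  float(score)
              except ValueError:
                  print(f"line {lineno}: rank must be an integer and score a number")
              tags.add(tag)
      if len(tags) > 1:
          print(f"multiple run tags found: {sorted(tags)}")

  if __name__ == "__main__":
      check_run(sys.argv[1])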


Post-submission relevance assessment was conducted using a pool of the submitted runs. Result sets for the task and performance measures were distributed to participants.


Evaluation results for the participant submissions were distributed to participants before the CLEF eHealth Workshop.

Outcome Measures

Evaluation focused on precision at ranks 5 and 10 (P@5, P@10) and normalized discounted cumulative gain at ranks 5 and 10 (NDCG@5, NDCG@10). In addition, mean average precision (MAP) and other suitable IR evaluation measures were considered for the submitted runs.
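As a reminder of what these measures compute, the Python sketch below calculates P@k and NDCG@k for a single query. The log2 discount is a standard choice and the relevance values are made-up example data; official scores are computed with trec_eval.

  import math

  def precision_at_k(ranked_docs, relevant, k):
      # Fraction of the top-k retrieved documents that are relevant.
      return sum(1 for d in ranked_docs[:k] if d in relevant) / k

  def ndcg_at_k(ranked_docs, relevance, k):
      # NDCG@k with graded relevance; relevance maps document id -> gain (0 if unjudged).
      def dcg(gains):
          return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
      gains = [relevance.get(d, 0) for d in ranked_docs[:k]]
      ideal = sorted(relevance.values(), reverse=True)[:k]
      return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

  # Illustrative top-5 ranking and graded relevance judgements for one query.
  run = ["d3", "d7", "d1", "d9", "d2"]
  qrels = {"d3": 2, "d1": 1, "d5": 2}
  print(precision_at_k(run, set(qrels), 5))  # 0.4
  print(ndcg_at_k(run, qrels, 5))            # approximately 0.66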

Tools for Evaluation

Evaluation metrics can be computed with the trec_eval tool, which is available from the trec_eval website.
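Assuming a qrels file and a run file in TREC format (the file names below are placeholders), a typical invocation looks like the following; the -m options for selecting measures and cutoffs are available in trec_eval 9.x:

  trec_eval qrels.test.txt myrun.txt
  trec_eval -m map -m P.5,10 -m ndcg_cut.5,10 qrels.test.txt myrun.txt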