TY - JOUR
KW - Neglected Diseases
KW - Artificial Intelligence
KW - Decision Support Systems, Clinical
KW - Mobile Applications
KW - dermatology
KW - leprosy
KW - Deep learning
AU - Deps P
AU - Amorim BBC
AU - Repsold T
AU - Almonfrey D
AU - do Espírito Santo RB
AU - Loureiro RM
AU - Enechukwu NA
AU - Barreiro TZ
AU - Rodrigues MM
AU - Fonseca AMF
AU - Lima GN
AU - Florian MC
AU - Ruiz-Postigo JA
AB -

Objectives.

To independently evaluate the World Health Organization (WHO) Skin Neglected Tropical Diseases (NTDs) application, focusing on the diagnostic performance of its underlying artificial intelligence model for leprosy detection. The primary objective was to determine the proportion of images in which leprosy appeared among the model’s Top-5 diagnostic predictions. The secondary objective was to qualitatively analyze diagnostic error patterns.

Methods.

A data set of 439 anonymized clinical images from confirmed leprosy cases (1996–2024) was analyzed, spanning the full clinical spectrum (indeterminate, tuberculoid, borderline/dimorphous, and lepromatous/Virchowian forms) and including reactional and atypical presentations. After excluding 16 images due to processing errors, 423 images were retained: 367 classical leprosy lesions and 56 reactional or atypical leprosy-related presentations. All images were evaluated using the WHO desktop version of the visual classifier. Top-5 sensitivity (recall) for leprosy was estimated, alongside a qualitative error analysis focusing on intrapatient inconsistencies and challenging lesion types.

Results.

The model achieved an overall Top-5 sensitivity (recall) of 84.9%, with higher sensitivity for classical lesions (87.2%) than for reactional or atypical presentations (69.6%). Qualitative review revealed inconsistent predictions for visually similar lesions from the same patient, and misclassifications concentrated among necrotic, inflammatory, and infiltrative lesions.

Conclusions.

The WHO Skin NTDs application demonstrates substantial promise as a clinical decision-support and educational tool, especially for classical leprosy. Performance gaps for reactional and atypical forms highlight the need for algorithmic refinement. Enhancing data set diversity and integrating patient-level context may improve diagnostic robustness.

 

BT - Revista Panamericana de Salud Pública
DA - 04/2026
DO - 10.26633/rpsp.2026.40
LA - ENG
M3 - Article
PB - Pan American Health Organization
PY - 2026
SP - 1
EP - 8
T2 - Revista Panamericana de Salud Pública
TI - Independent assessment of the WHO Skin Neglected Tropical Diseases application for leprosy detection
UR - https://iris.paho.org/server/api/core/bitstreams/4fb0c2ef-3958-4132-856c-f4ad2135109d/content
VL - 50
SN - 1020-4989, 1680-5348
ER -
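
The abstract's main metric, Top-5 sensitivity (recall), is the share of confirmed leprosy images for which leprosy appears among the model's five highest-ranked predictions. The following is a minimal sketch of that calculation, not the authors' code: the function names, the label string "leprosy", and the toy prediction lists are all hypothetical, and the real output format of the WHO classifier is not described in the abstract. The last step only checks that the two reported subgroup sensitivities, weighted by their image counts, agree with the reported overall figure.

```python
from typing import Sequence


def top_k_sensitivity(
    ranked_predictions: Sequence[Sequence[str]],
    target: str = "leprosy",  # hypothetical label string
    k: int = 5,
) -> float:
    """Fraction of images whose first k ranked labels include the target.

    Every image is assumed to be a confirmed case of the target disease,
    so this fraction is the Top-k sensitivity (recall).
    """
    if not ranked_predictions:
        raise ValueError("at least one image is required")
    hits = sum(1 for labels in ranked_predictions if target in labels[:k])
    return hits / len(ranked_predictions)


def pooled_sensitivity(groups: Sequence[tuple[int, float]]) -> float:
    """Combine per-group sensitivities, weighting each by its image count."""
    total = sum(n for n, _ in groups)
    return sum(n * s for n, s in groups) / total


# Toy, made-up rankings (illustration only, not study data).
toy = [
    ["psoriasis", "leprosy", "tinea"],                                   # hit at rank 2
    ["eczema", "tinea", "vitiligo", "scabies", "psoriasis", "leprosy"],  # rank 6: miss
    ["leprosy"],                                                         # hit at rank 1
]
toy_result = top_k_sensitivity(toy)

# Figures reported in the abstract: 367 classical images at 87.2%,
# 56 reactional or atypical images at 69.6%.
overall = pooled_sensitivity([(367, 0.872), (56, 0.696)])
print(f"toy Top-5 sensitivity: {toy_result:.3f}")
print(f"pooled sensitivity from reported subgroups: {overall:.1%}")
```

Because the subgroup percentages are rounded, the pooled value is only expected to match the reported overall sensitivity to one decimal place.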