Title: Independent assessment of the WHO Skin Neglected Tropical Diseases application for leprosy detection
Authors: Deps P; Amorim BBC; Repsold T; Almonfrey D; do Espírito Santo RB; Loureiro RM; Enechukwu NA; Barreiro TZ; Rodrigues MM; Fonseca AMF; Lima GN; Florian MC; Ruiz-Postigo JA
Publisher: Pan American Health Organization
Date: 04/2026
Volume: 50
Pages: 1-8
Keywords: Neglected Diseases; Artificial Intelligence; Decision Support Systems, Clinical; Mobile Applications; dermatology; leprosy; Deep learning
URL: https://iris.paho.org/server/api/core/bitstreams/4fb0c2ef-3958-4132-856c-f4ad2135109d/content
Abstract:
Objectives.
To independently evaluate the World Health Organization (WHO) Skin Neglected Tropical Diseases (NTDs) application, focusing on the diagnostic performance of its underlying artificial intelligence model for leprosy detection. The primary objective was to determine the proportion of images in which leprosy appeared among the model’s Top-5 diagnostic predictions. The secondary objective was to qualitatively analyze diagnostic error patterns.
Methods.
A data set of 439 anonymized clinical images from confirmed leprosy cases (1996–2024) was analyzed, spanning the full clinical spectrum (indeterminate, tuberculoid, borderline/dimorphous, and lepromatous/Virchowian forms) and including reactional and atypical presentations. After excluding 16 images due to processing errors, 423 images were retained: 367 classical leprosy lesions and 56 reactional or atypical leprosy-related presentations. All images were evaluated using the WHO desktop version of the visual classifier. Top-5 sensitivity (recall) for leprosy was estimated, alongside a qualitative error analysis focusing on intrapatient inconsistencies and challenging lesion types.
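Because every image in the data set comes from a confirmed leprosy case, Top-5 sensitivity reduces to the share of images whose five highest-ranked predictions include leprosy. The following is a minimal sketch of that calculation, not the authors' code: the function name `top_k_sensitivity`, the label strings, and the toy rankings are all hypothetical, and the classifier's real output format is not described in the record.

```python
from typing import Sequence


def top_k_sensitivity(
    ranked_predictions: Sequence[Sequence[str]],
    target: str = "leprosy",
    k: int = 5,
) -> float:
    """Return the fraction of images whose top-k ranked labels include `target`.

    Assumes every image is a confirmed positive for `target`, so each image
    is either a hit (true positive) or a miss (false negative).
    """
    if not ranked_predictions:
        raise ValueError("no images to evaluate")
    hits = sum(1 for labels in ranked_predictions if target in labels[:k])
    return hits / len(ranked_predictions)


# Made-up rankings for four images (illustrative only, not study data).
toy = [
    ["leprosy", "tinea", "psoriasis", "vitiligo", "eczema"],   # hit at rank 1
    ["tinea", "psoriasis", "leprosy", "eczema", "scabies"],    # hit at rank 3
    ["psoriasis", "eczema", "tinea", "scabies", "vitiligo"],   # miss
    ["eczema", "tinea", "scabies", "vitiligo", "psoriasis", "leprosy"],  # rank 6: miss
]

toy_sensitivity = top_k_sensitivity(toy)
print(f"Top-5 sensitivity on toy data: {toy_sensitivity:.2f}")
```

Slicing each ranking to its first `k` entries keeps the function independent of how many labels the classifier returns per image.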
Results.
The model achieved an overall Top-5 sensitivity (recall) of 84.9%, with higher sensitivity for classical lesions (87.2%) than for reactional or atypical presentations (69.6%). Qualitative review revealed inconsistent predictions for visually similar lesions from the same patient, and misclassifications concentrated among necrotic, inflammatory, and infiltrative lesions.
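As a quick consistency check, the overall figure should equal the average of the two subgroup sensitivities weighted by subgroup size. The sketch below uses only the counts and percentages reported in this abstract; the variable names are illustrative.

```python
# Subgroup sizes and Top-5 sensitivities as reported in the abstract.
n_classical, n_atypical = 367, 56
sens_classical, sens_atypical = 0.872, 0.696

n_total = n_classical + n_atypical

# Size-weighted average of the subgroup sensitivities.
pooled = (n_classical * sens_classical + n_atypical * sens_atypical) / n_total

print(f"{n_total} images, pooled Top-5 sensitivity = {pooled:.1%}")
```

The weighted average rounds to 84.9%, in agreement with the reported overall sensitivity for the 423 retained images.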
Conclusions.
The WHO Skin NTDs application demonstrates substantial promise as a clinical decision-support and educational tool, especially for classical leprosy. Performance gaps for reactional and atypical forms highlight the need for algorithmic refinement. Enhancing data set diversity and integrating patient-level context may improve diagnostic robustness.
ISSN: 1020-4989, 1680-5348