Utilizing deep learning from mobile phone photos for early detection of horizontal strabismus: a screening approach

Abstract

We developed and validated an artificial-intelligence pipeline for binary screening of horizontal strabismus versus orthotropia using smartphone-acquired facial images and geometric landmark analysis. The two-stage system combines a Real-Time Detection Transformer (RT-DETR), which localizes nine ocular landmarks per eye across three gaze directions (left, center, right), with supervised machine-learning classifiers. A feature set of five biometric ratios was derived from landmark coordinates, including the canthi, limbi, and corneal light reflexes. The model was trained on facial images from 150 participants (96 with strabismus and 54 controls). To address class imbalance and improve generalizability, the Synthetic Minority Oversampling Technique (SMOTE) and 4-fold cross-validation were applied. RT-DETR achieved an intersection over union of 0.62 and a mean center-point error of 6.52 pixels in landmark localization. The Random Forest classifier achieved an accuracy of 0.95, sensitivity of 0.96, specificity of 0.94, positive predictive value of 0.97, and negative predictive value of 0.92. This study demonstrates the feasibility of combining transformer-based landmark detection with geometric ratios for strabismus screening. The framework performs well under controlled conditions; while the biometric ratios allow feature-level inspection, further research is required to establish full clinical interpretability and performance in uncontrolled environments.
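The classification stage described above (five ratio features, SMOTE oversampling inside 4-fold cross-validation, Random Forest) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values are synthetic stand-ins for the unpublished biometric ratios, the SMOTE step is a hand-rolled nearest-neighbour interpolation rather than a library call, and all hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

def smote(X_min, n_new, k=5, rng=rng):
    """Minimal SMOTE: interpolate between each minority sample and a
    random one of its k nearest minority-class neighbours."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i][rng.integers(1, k + 1)]  # skip column 0 (the sample itself)
        lam = rng.random()
        samples.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(samples)

# Synthetic stand-in data: 5 biometric ratios per subject,
# 96 strabismus cases (label 1) vs 54 controls (label 0), as in the study.
X = np.vstack([rng.normal(1.0, 0.15, size=(96, 5)),
               rng.normal(0.7, 0.15, size=(54, 5))])
y = np.array([1] * 96 + [0] * 54)

accs = []
for tr, te in StratifiedKFold(n_splits=4, shuffle=True, random_state=0).split(X, y):
    X_tr, y_tr = X[tr], y[tr]
    # Oversample the minority class (controls) inside the training fold only,
    # so the held-out fold stays untouched.
    X_min = X_tr[y_tr == 0]
    n_new = int((y_tr == 1).sum() - len(X_min))
    X_bal = np.vstack([X_tr, smote(X_min, n_new)])
    y_bal = np.concatenate([y_tr, np.zeros(n_new, dtype=int)])
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_bal, y_bal)
    accs.append(clf.score(X[te], y[te]))

print(f"mean 4-fold accuracy: {np.mean(accs):.2f}")
```

Applying SMOTE within each training fold, rather than before splitting, avoids leaking synthetic copies of test-fold subjects into training, which would inflate the reported accuracy.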
