-
Information:
School of Psychology, Xinjiang Normal University, Urumqi
-
Keywords:
Chinese developmental dyslexia; SMOTEENN oversampling; Stacking ensemble; Small-sample imbalanced data; Predictive model
-
Abstract:
Early identification of developmental dyslexia (DD) in real classroom settings is constrained by small samples and class-imbalanced data, and it remains unclear whether the predictors of DD change across grade levels. Using reading and cognitive ability assessments from 219 elementary school children in grades 1 to 5, this study builds a predictive model that combines SMOTEENN oversampling with stacking ensemble learning, and applies SHAP analysis to identify the key predictors and how they differ by grade. The results show that: (1) phonological awareness and reading accuracy are the core predictors of DD and make important contributions across grades; (2) grade-specific modeling indicates that DD prediction in the lower grades (1 to 3) relies mainly on “basic cognitive abilities” (e.g., pinyin reading fluency), whereas in the higher grades (4 to 5) it shifts toward “reading efficiency indicators” (e.g., reading accuracy and reading fluency), revealing developmental change in the reading and cognitive measures that predict DD. The study offers a feasible approach to DD prediction with small, imbalanced samples, and provides psychological evidence for understanding the cognitive developmental trajectory of DD and for precise, grade-specific screening.
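To make the resampling step concrete, the following is a minimal standard-library sketch of the general SMOTEENN idea: SMOTE first synthesizes minority-class samples by interpolating between neighbouring minority points, then Edited Nearest Neighbours (ENN) removes samples whose label disagrees with their neighbours. It is not the authors' implementation and not the imbalanced-learn API. The function names, the toy two-feature data, the majority-vote ENN rule, and the settings `k=3` and `seed=0` are all illustrative assumptions.

```python
import math
import random


def k_nearest(idx, points, k):
    """Indices of the k points closest (Euclidean) to points[idx], excluding idx."""
    dists = sorted((math.dist(points[idx], p), j)
                   for j, p in enumerate(points) if j != idx)
    return [j for _, j in dists[:k]]


def smote(X, y, minority, k=3, seed=0):
    """Add synthetic minority samples until both classes are the same size.

    Assumes a binary problem with at least two minority samples.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(X) - len(minority_pts)) - len(minority_pts)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_pts))
        # Pick one of the nearest minority neighbours and interpolate toward it.
        j = rng.choice(k_nearest(i, minority_pts, min(k, len(minority_pts) - 1)))
        gap = rng.random()
        synthetic.append([a + gap * (b - a)
                          for a, b in zip(minority_pts[i], minority_pts[j])])
    return X + synthetic, y + [minority] * len(synthetic)


def edited_nearest_neighbours(X, y, k=3):
    """Keep a sample only if a strict majority of its k neighbours share its label."""
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in k_nearest(i, X, k)]
        if votes.count(y[i]) * 2 > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smoteenn(X, y, minority, k=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning (simplified sketch)."""
    X_over, y_over = smote(X, y, minority, k, seed)
    return edited_nearest_neighbours(X_over, y_over, k)


# Hypothetical toy data: label 0 = majority class, label 1 = minority class.
# The last majority point sits inside the minority cluster, i.e. a noisy sample.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [5.1, 5.1],
     [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

X_res, y_res = smoteenn(X, y, minority=1)
print("before:", y.count(0), "majority,", y.count(1), "minority")
print("after: ", y_res.count(0), "majority,", y_res.count(1), "minority")
```

On this toy data the minority class grows from 3 to 8 samples and the noisy majority point is removed, which is why SMOTEENN is usually described as combined over- and under-sampling. In practice a maintained library implementation would be preferable, with resampling applied to the training folds only so that synthetic samples do not leak into evaluation.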
-
DOI:
10.35534/pc.0806146 (registering DOI)
-
Cite:
杨智予, 姜小婷, 博思坦·马合木提江, 杨若涵. (2026). A predictive model of Chinese developmental dyslexia based on small-sample imbalanced data [in Chinese]. 中国心理学前沿, 8(6), 988-998.