Using Machine Learning to Advance High School Dropout Prediction and Prevention

Degree type
PhD
Graduate group
Education
Discipline
Education
Subject
dropout prevention
early warning systems
machine learning
predictive analytics
Copyright date
01/01/2025
Author
Alam, Anika
Abstract

The importance of high school completion for jobs and postsecondary opportunities is well documented. Because federal law treats the high school graduation rate as a core performance indicator, schools, districts, and states face pressure to actively monitor and assess high school completion. This study employs machine learning techniques to identify students at risk of exiting high school in either 9th or 10th grade. I find increased precision when applying resampling techniques to balance the training data, and that logistic regression performs similarly to more complex algorithms. When assessing the algorithmic fairness of the models, I find that most models tend to discriminate against students based on group membership in English proficiency, disability, and economic disadvantage attributes. Post-hoc analyses of the XGBoost model reveal that a student’s age in 8th grade, followed by middle grade absences, especially chronic absenteeism, are predictive of early exit. This study advances the current state of knowledge in the field by (1) generating synthetic data to improve model accuracy, (2) ensuring that model predictions do not deepen structural inequities, and (3) exploring novel approaches to enhance the explainability of “black box” models, ultimately generating actionable insights for practitioners and stakeholders.
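As a rough illustration of two ideas named in the abstract, the sketch below balances a training set by resampling and then compares error rates across student groups. It is not the author's method: it substitutes plain random oversampling (duplicating minority-class rows) for whatever resampling or synthetic-data technique the dissertation uses, and it uses the false positive rate per group as one possible fairness check. The field names `exited` and `ell`, the toy records, and both function names are hypothetical.

```python
import random
from collections import Counter


def oversample_minority(rows, label_key="exited", seed=0):
    """Randomly duplicate minority-class rows until all classes match the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap to the largest class size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


def false_positive_rate_by_group(rows, predictions, group_key, label_key="exited"):
    """Per group: share of students who did NOT exit but were flagged as at risk."""
    flagged = Counter()
    negatives = Counter()
    for row, pred in zip(rows, predictions):
        if row[label_key] == 0:
            negatives[row[group_key]] += 1
            flagged[row[group_key]] += int(pred == 1)
    return {group: flagged[group] / n for group, n in negatives.items()}


# Hypothetical toy records: 18 students who stayed, 2 who exited early.
train = [{"exited": 0, "ell": i % 2} for i in range(18)]
train += [{"exited": 1, "ell": 1}, {"exited": 1, "ell": 0}]

balanced = oversample_minority(train)
class_counts = Counter(row["exited"] for row in balanced)

# A deliberately biased "model" that flags every student with ell == 1.
predictions = [row["ell"] for row in train]
fpr = false_positive_rate_by_group(train, predictions, group_key="ell")
```

In practice, resampling of this kind would be applied only to the training split, so that evaluation data keeps the original class proportions. A large gap between groups in `fpr` would signal that a model flags one group's non-exiting students far more often than another's.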

Advisor
Bowden, A. Brooks
Date of degree
2025