Abstract
Purpose
To use machine learning (ML) to predict adult height (AH) based on growth measurements until age 6 years.
Methods
Growth data from 1596 subjects (798 boys) aged 0-20 years from the longitudinal GrowUp 1974 Gothenburg cohort were used to train multiple ML regressors. Of these, 100 were used for model comparison and the rest were used for 5-fold cross-validation. The winning model, Random Forest (RF), was first validated on 684 additional subjects from the 1974 cohort. It was additionally validated using 1890 subjects from the GrowUp 1990 Gothenburg cohort and 145 subjects from the Edinburgh Longitudinal Growth Study cohort.
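The partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how subjects were assigned to the comparison set or to the folds, so the random shuffle, the seed, and the function names `partition_subjects` and `cv_splits` are hypothetical, and integer IDs stand in for the 1596 subjects.

```python
import random

def partition_subjects(subject_ids, n_comparison=100, n_folds=5, seed=0):
    """Split IDs into a model-comparison set and k cross-validation folds.

    Assumption: a simple random split; the source does not describe
    the actual assignment procedure (e.g. stratification by sex).
    """
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    comparison, rest = ids[:n_comparison], ids[n_comparison:]
    # Deal the remaining IDs round-robin so fold sizes differ by at most one.
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return comparison, folds

def cv_splits(folds):
    """Yield (train_ids, validation_ids), holding out one fold at a time."""
    for k, held_out in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train, held_out

# 1596 placeholder subject IDs, matching the cohort size in the Methods.
comparison, folds = partition_subjects(range(1596))
print(len(comparison), [len(f) for f in folds])
# prints: 100 [300, 299, 299, 299, 299]
```

With 100 subjects set aside, 1496 remain, so each of the five validation folds holds 299 or 300 subjects and each training split holds the other four folds.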
Results
RF with 51 regression trees produced the most accurate predictions. The best predicting features were sex and height at age 3.4-6.0 years. Observed and predicted AH were 173.9±8.9 cm and 173.9±7.7 cm, respectively, with a mean prediction error of -0.4±4.0 cm. Validation on the 684 additional GrowUp 1974 subjects gave a correlation of r=0.87 between predicted and observed AH (R²=0.75). When validated on the 1990 Gothenburg and Edinburgh cohorts (completely unseen by the learned RF model), the prediction accuracy was r=0.88 in both cases (R²=0.77). AH in short children was over-predicted and AH in tall children was under-predicted. Prediction absolute error correlated negatively with AH (p<0.0001).
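The pattern of over-prediction in short children and under-prediction in tall children is what one would expect when predictions are pulled toward the cohort mean. The following sketch illustrates this on synthetic data only; it does not use or reproduce the study data. The shrinkage factor 0.75, the noise SD of 4.0 cm, the sample size, and the sign convention (error = predicted minus observed) are all assumptions chosen for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)
MEAN_AH, SD_AH = 173.9, 8.9  # observed AH mean and SD quoted in the Results

# Synthetic "observed" heights and predictions shrunk toward the mean.
# The factor 0.75 and the 4.0 cm noise are hypothetical, not from the study.
observed = [rng.gauss(MEAN_AH, SD_AH) for _ in range(2000)]
predicted = [MEAN_AH + 0.75 * (o - MEAN_AH) + rng.gauss(0, 4.0) for o in observed]

# Assumed sign convention: positive error means AH was over-predicted.
errors = [p - o for o, p in zip(observed, predicted)]

r = pearson(observed, predicted)   # agreement between predicted and observed AH
r_err = pearson(observed, errors)  # relation between signed error and observed AH

# Mean signed error in subjects more than 1 SD below / above the mean.
short = [e for o, e in zip(observed, errors) if o < MEAN_AH - SD_AH]
tall = [e for o, e in zip(observed, errors) if o > MEAN_AH + SD_AH]

print(f"r(observed, predicted) = {r:.2f}")
print(f"r(observed, error)     = {r_err:.2f}")
print(f"mean error, short = {mean(short):+.1f} cm, tall = {mean(tall):+.1f} cm")
```

In this toy setting the signed error correlates negatively with observed height: short subjects receive positive errors on average and tall subjects negative ones, even though predicted and observed heights remain strongly correlated overall.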
Conclusions
We show successful, validated ML prediction of AH using growth measurements before age 6 years. The most important features for prediction were sex and height at age 3.4-6.0 years. Prediction errors result in over-estimates of AH for short subjects and under-estimates for tall subjects. Prediction by ML can be generalized to other cohorts.
This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)