Can machine learning predict injury in elite-level youth football players? To a certain extent, yes, it can.

It was a great pleasure to have contributed to this study. Elite-level youth football is known to entail a high injury risk. This is often attributed to early specialization, high training loads, and high training and game intensities. To target injury risk mitigation strategies specifically at young footballers, knowledge of both modifiable and non-modifiable risk factors is crucial. In practice, however, it is often not feasible for clubs and coaches to perform thorough player screening for injury risk management purposes: time and financial means are simply limited. There is consequently strong interest in assessing injury risk with field-specific and relatively easy screening tests, such as the motor performance tests that many clubs already use to monitor player development. The aim of this study was therefore to use a machine learning approach to evaluate the risk of injury in elite-level youth football players, based on such available data.

  • The first aim was to use preseason test results to assess the accuracy of a machine learning model predicting injury during the season.

  • The second aim was to apply a similar model to correctly classify different types of injuries, namely overuse and acute injuries.

Methods

A total of 734 players in the U10 to U15 age categories (mean age, 11.7 ± 1.7 yr) from seven Belgian youth academies were prospectively followed during one season. Football exposure and occurring injuries were monitored continuously by the academies’ coaching and medical staff, respectively. Preseason anthropometric measurements (height, weight, and sitting height) were taken and test batteries to assess motor coordination and physical fitness (strength, flexibility, speed, agility, and endurance) were performed. Extreme gradient boosting algorithms (XGBoost) were used to predict injury based on the preseason test results. Subsequently, the same approach was used to classify injuries as either overuse or acute.

Results

During the season, half of the players (n = 368) sustained at least one injury. Of the first occurring injuries, 173 were identified as overuse and 195 as acute injuries. The machine learning algorithm was able to identify the injured players in the hold-out test sample with 85% precision, 85% recall (sensitivity) and 85% accuracy (f1 score). Furthermore, injuries could be classified as overuse or acute with 78% precision, 78% recall, and 78% accuracy.
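For readers less familiar with these metrics, the sketch below shows how precision, recall, F1 score, and accuracy are computed from a confusion matrix. The counts are hypothetical, chosen only so that every metric lands on 85%; they are not the study's hold-out results. Note that accuracy and F1 score are separate measures, even though they can coincide, as they do in this made-up example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)            # share of predicted injuries that were real
    recall = tp / (tp + fn)               # share of real injuries that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # share of all correct calls
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical hold-out counts (NOT taken from the study).
metrics = classification_metrics(tp=34, fp=6, fn=6, tn=34)
print(" ".join(f"{name}={value:.2f}" for name, value in metrics.items()))
```

With balanced classes and equal numbers of false positives and false negatives, all four values agree; with imbalanced classes they can diverge considerably.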

FIGURE 1: SHAP. The features in the model are listed from the relatively most (top) to least (bottom) important by their global impact on the model. Dots representing the SHAP values for each feature value of an individual in the dataset are plotted horizontally next to the feature. Overlapping points are jittered in the y-axis direction to give a sense of the distribution of the Shapley values per variable. The higher the absolute value (either positive or negative), the higher the importance in the classification decision-making process. Positive SHAP values represent a higher probability of a positive prediction (i.e., being injured). Each dot is colored by the value (i.e., measured value) of the feature for an individual, where blue represents the lower values (e.g., worse SBJ score), and red the higher values (e.g., better SBJ score). A gray dot represents a missing value. (A) SHAP values of the variables in the model predicting injury. Positive SHAP values represent a higher chance of injury. (B) SHAP values of the variables in the model classifying overuse and acute injuries. Positive SHAP values represent the risk of acute injuries, whereas negative values correspond to overuse injury risk. BMI, body mass index; n, number of; SBJ, standing broad jump; KTK3, 3 subtest short version of the Körperkoordinationstest für Kinder; YoYo IR1, YoYo Intermittent Recovery Test Level 1.
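To give some intuition for what a SHAP value is, the sketch below computes exact Shapley values by brute force for a tiny, made-up risk function. Everything in it is hypothetical: the function `toy_risk`, its coefficients, the reference ("baseline") player, and the example player are invented for illustration and say nothing about the direction or size of effects in the study. The SHAP library uses far more efficient algorithms for tree models; this version simply averages each feature's marginal contribution over every order in which features can be revealed.

```python
from itertools import permutations


def toy_risk(features):
    """Hypothetical injury-risk score. This is NOT the study's model."""
    score = 0.5
    score += 0.02 * (features["bmi"] - 18.0)
    score -= 0.001 * (features["sbj_cm"] - 160.0)
    score -= 0.0001 * (features["yoyo_ir1_m"] - 800.0)
    # Made-up interaction between two of the features.
    if features["bmi"] > 20.0 and features["sbj_cm"] < 150.0:
        score += 0.1
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the reference player
        previous = model(current)
        for name in order:
            current[name] = instance[name]    # reveal one real feature value
            value = model(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical reference player and individual player.
baseline = {"bmi": 18.0, "sbj_cm": 160.0, "yoyo_ir1_m": 800.0}
player = {"bmi": 21.0, "sbj_cm": 140.0, "yoyo_ir1_m": 600.0}

phi = shapley_values(toy_risk, player, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The values always sum to the difference between the player's score and the baseline score, which is why each dot in a SHAP plot can be read as that feature's share of one individual's prediction.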

Conclusions

In this unique study, we observed that a machine learning model was reasonably accurate in the prediction of injury in elite-level youth football players based on preseason test results. It is also possible to classify players sustaining overuse or acute injuries with a slightly lower accuracy based on the same measures. Practitioners could use this information to assess the risk of particular types of injuries before the start of the competitive season. This information would allow academies to focus the available (financial) resources for injury risk management on those players with a higher injury risk.

Rommers, N., Rössler, R., Verhagen, E., Vandecasteele, F., Verstockt, S., Vaeyens, R., Lenoir, M., D'Hondt, E., Witvrouw, E. A Machine Learning Approach to Assess Injury Risk in Elite Youth Football Players. Med. Sci. Sports Exerc., Vol. 52, No. 8, pp. 1745–1751, 2020.