How Should We Screen Overweight and Obese Adolescents for Risk of Type 2 Diabetes in Large Public Health Initiatives?

1Department of Psychiatry, New York University School of Medicine, New York, USA 2Department of Child and Adolescent Psychiatry, New York University School of Medicine, New York, USA 3Department of Neurology, New York University School of Medicine, New York, USA 4Department of Medicine, New York University School of Medicine, New York, USA 5Department of Radiology, New York University School of Medicine, New York, USA 6Nathan Kline Institute for Psychiatric Research, Orangeburg, USA #Authors contributed equally to this manuscript (co-first authors)


Introduction
Methods: In 1,712 overweight or obese high school students who had fasting blood measures of glucose and insulin, and could therefore be used as a training sample, we tested whether the anthropometric measures body mass index, waist/height ratio, % body fat, and mean arterial blood pressure were sufficient to identify insulin resistance (IR), and whether additional blood markers (triglyceride levels, hemoglobin A1C, and C-reactive protein), which were obtained from the same blood draw as the fasting glucose and insulin, added to the sensitivity and specificity of the anthropometric measures in detecting IR. Insulin resistance was identified by a homeostatic model assessment value ≥ 3.99. We used Random Forest (RF), a nonparametric recursive partitioning classification method, to ascertain how well the demographic and anthropometric variables, or those variables plus the blood markers, classified our adolescents carrying excess weight into those with and without IR. Our goal was to have high sensitivity of detection; we were not concerned about low specificity, since all adolescents carrying excess weight could benefit from a lifestyle intervention.
Results: Demographic and anthropometric measures predicted IR with 89.14% sensitivity and 32.72% specificity. Body mass index, waist/height ratio, age, and % body fat had the highest importance in the RF models. Adding blood data increased sensitivity and specificity by 2.6% and 5.77%, respectively, with triglycerides and C-reactive protein added and % body fat dropped as variables of importance.

Conclusion:
Adding blood parameters to the anthropometric variables only increased sensitivity by 2.6%, indicating that the high sensitivity achieved by anthropometric measurements alone may be adequate for predicting IR in adolescents carrying excess weight.

glucose tolerance, IR is a strong predictor of type 2 diabetes [9]. There is also clear evidence that obese adolescents, particularly those with Metabolic Syndrome, have cognitive and structural brain abnormalities when compared to their metabolically healthy peers, and that their degree of IR predicts the brain impairments [10]. Therefore, it is very important to identify which adolescents carrying excess weight are more likely to be IR.

List of Abbreviations
The two gold-standard methods for the quantification of IR are the euglycemic hyperinsulinemic clamp technique [11,12] and the frequently sampled intravenous glucose tolerance test [13]. These techniques allow dynamic measurement of insulin function, but they are complicated, fairly invasive, need to be done in a medical setting, and require specialized expertise [14], which makes them less suitable for routine medical screenings [15]. The homeostatic model assessment of insulin resistance (HOMA-IR) [16] is a less invasive method for estimating IR that uses fasting insulin and glucose blood concentrations. Studies suggest that HOMA-IR has higher reliability in the measurement of IR in children and adolescents than other methods that also use fasting glucose and insulin levels [17].
There is no consensus on the best predictor(s) of IR during adolescence. Some report adiposity as the most important determinant [18], while other studies describe obesity as the most important risk factor after accounting for sex, age, and/or race/ethnicity [2,9]. Although obtaining an overnight fasting blood sample to measure glucose and insulin levels is fairly straightforward, it imposes a significant burden on consumers and is not very practical for community screening efforts. Therefore, establishing a predictive measure of IR in youth that is valid, reliable, cost-effective, does not overburden consumers, and can be used for large community screenings, particularly among adolescents carrying excess weight, would be an important first step in identifying youth at risk for developing type 2 diabetes.
Our goal was to ascertain whether demographic characteristics and anthropometric measures could predict IR with sufficiently high sensitivity to be used as a screening tool without blood-derived markers such as cholesterol profile (including triglycerides (TG)), hemoglobin A1C, or C-reactive protein (CRP). Since all adolescents carrying excess weight could benefit from interventions to improve their lifestyle, we focused mostly on sensitivity and were less concerned with the level of specificity. In our own study, the intervention merely involved educational materials that were given to the parent/student together with the results of the medical evaluation; a high probability of false positives was therefore not a major concern, whereas missing true positives was considered much more concerning. Other screening applications may have different considerations, and a cost-benefit analysis may call for a different balance between sensitivity and specificity. We obtained fasting blood samples from a convenience sample of overweight and obese urban adolescents to test the following hypotheses:
1) Screening based on anthropometric measurements has sufficient sensitivity to identify adolescents with IR.
2) Additional blood markers add little to anthropometric measures in the prediction of IR.

Methods
To test our hypotheses we utilized Random Forest (RF), a model-free classification approach that uses recursively partitioned classification analyses.

Study Participants
The data used in this study are drawn from the Banishing Obesity and Diabetes in Youth (BODY) Project, described in detail elsewhere [19]. The BODY Project, a school-based medical screening and education program, was conducted from 2007 to 2014. The project was supported by the Nathan S. Kline Institute for Psychiatric Research and the New York University Langone Medical Center's Community Service Plan. It was approved by the institutional review boards of the New York University School of Medicine, the New York City Department of Education, the New York City Department of Health and Mental Hygiene, and the Nathan Kline Institute.
Students from 17 New York City public high schools (grades 9-12 and 13 to 21 years old) participated in the BODY Project. Informed consent was obtained directly from all participants aged 18 years and older. Parental consent and participant assent were obtained from students under 18 years of age. All participants were compensated for their time and received a medical report with educational information regarding their individual medical results. Self-reported pregnancy and existing diabetes diagnosis were the only exclusion criteria. A total of 3,088 unique students participated during the 7 years of the project; of these, 1,712 participants were identified as carrying excess weight (overweight or obese), were thus considered to be at increased risk for having IR, and were used here for hypothesis testing. However, to better describe all the students who participated in the BODY Project, Table 1 contrasts lean and overweight/obese adolescents.

Demographic Characteristics:
Students self-reported their sex, birthdate, and race/ethnicity.

Anthropometric Measurements and Body Mass Index (BMI):
Using standardized methods, trained BODY Project staff measured height, weight, and waist circumference. BMI was calculated with two different methods: 1) BMI percentile was obtained using the BMI percentile calculator for children and teens provided by the Centers for Disease Control and Prevention (CDC). In addition to height and weight, the calculator takes into account the subject's age and sex. Participants with a BMI at or above the 85th percentile were considered overweight or obese. 2) Raw BMI values were also calculated with the formula weight (kg) / height² (m²), which allowed us to include age and sex as individual variables in the RF model. Waist-to-height ratio (WHR) was calculated as waist circumference (cm) divided by height (cm). Quantum IV bioelectrical impedance analyzers (RJL Systems) and the bioelectrical impedance method were used to measure body composition and % body fat (PBF) [20]. Mean blood pressure (MBP) was calculated as twice the diastolic blood pressure (DBP) plus the systolic blood pressure (SBP), divided by three: MBP = ((DBP × 2) + SBP)/3.

Blood Measurements:
A blood sample was obtained in the early morning at the participant's school, prior to the beginning of morning classes and after a 10-12 hour fast. Glucose and insulin levels; lipid profile (total cholesterol, low density lipoprotein (LDL), high density lipoprotein (HDL), and TG levels); hemoglobin A1C; and C-reactive protein (CRP) were obtained. Prior to the blood draw, students were questioned about whether they had consumed anything other than water after dinner the night before. Students who reported consuming any calories, including sweetened gum, were rescheduled. After the blood draw and anthropometric measurements, students were given a simple breakfast and sent to class. Standard clinical pathology methods were used for the blood tests. A glucose oxidase method (VITROS 950 AT; Johnson & Johnson) was used for the measurement of fasting blood glucose, and insulin was assayed using chemiluminescence (Advia Centaur; Bayer Corporation). All assays were conducted at the NYU Langone Medical Center Clinical Pathology Laboratories.
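The anthropometric formulas described in this section (raw BMI, WHR, and MBP) can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the text; the function names and the example participant are hypothetical and are not part of the BODY Project data or software.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Raw BMI: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def waist_to_height_ratio(waist_cm: float, height_cm: float) -> float:
    """WHR: waist circumference (cm) divided by height (cm)."""
    return waist_cm / height_cm


def mean_blood_pressure(sbp: float, dbp: float) -> float:
    """MBP = ((DBP x 2) + SBP) / 3, as defined in the text."""
    return (2 * dbp + sbp) / 3


# Hypothetical participant (values invented for illustration only).
print(round(bmi(80.0, 1.65), 2))                     # raw BMI in kg/m^2
print(round(waist_to_height_ratio(95.0, 165.0), 2))  # WHR (unitless)
print(mean_blood_pressure(sbp=115.0, dbp=70.0))      # MBP in mmHg
```

Note that the BMI percentile used to define overweight/obese status additionally depends on CDC age- and sex-specific reference data, which this sketch does not attempt to reproduce.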

Estimation of Insulin Resistance:
Fasting glucose and insulin levels were used to compute HOMA-IR. The HOMA-IR score was calculated using the following formula: [glucose (mg/dL) × insulin (µIU/mL)]/405. As previously described in adolescent populations [21,22], we used a HOMA-IR ≥ 3.99 as a conservative cut point for defining someone as having IR. Insulin resistance is a continuum, and although we could have chosen a lower cut score for IR, we felt that a HOMA-IR ≥ 3.99 was a conservative threshold that identifies a clinically relevant level of abnormality while still allowing behavioral interventions during this pre-clinical stage of increased risk of diabetes.
Once we ascertained which variables were important in the RF prediction of IR, we were interested in identifying potential cut scores for those variables, which could offer clinicians some guidance as to when an abnormality could be concerning. To accomplish this we ran receiver operator characteristic (ROC) curves for the anthropometric variables that had been identified by RF as being predictive of IR. Given that the ranges of BMI and PBF differ widely between overweight and obese adolescents, we ran ROC curves for each group separately in order to arrive at practical and potentially useful cut scores for those two measurements.
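As a worked illustration of the HOMA-IR formula and the ≥ 3.99 cut point given above, the following minimal Python sketch computes the score and the resulting IR label. The function names and the example glucose and insulin values are hypothetical.

```python
HOMA_IR_CUTOFF = 3.99  # cut point used in this study


def homa_ir(glucose_mg_dl: float, insulin_uiu_ml: float) -> float:
    """HOMA-IR = [glucose (mg/dL) x insulin (uIU/mL)] / 405."""
    return glucose_mg_dl * insulin_uiu_ml / 405


def is_insulin_resistant(glucose_mg_dl: float, insulin_uiu_ml: float,
                         cutoff: float = HOMA_IR_CUTOFF) -> bool:
    """Label a participant as IR when HOMA-IR is at or above the cut point."""
    return homa_ir(glucose_mg_dl, insulin_uiu_ml) >= cutoff


# Hypothetical fasting values, for illustration only.
print(homa_ir(90, 18))               # 1620 / 405 = 4.0
print(is_insulin_resistant(90, 18))  # True: 4.0 >= 3.99
print(is_insulin_resistant(85, 10))  # False: about 2.10
```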

Statistical Analysis and Classification and Regression Trees:
The Random Forest (RF) classification method is an extension of recursive partitioning that grows multiple trees rather than one [23]. In contrast to recursive partitioning, which grows a single logical if/then tree leading to a predicted classification, RF does no pruning. It is an ensemble classification algorithm consisting of a collection of unpruned recursive partitioning decision trees, built from multiple bootstrap samples of the original data. Each tree casts a "vote" as to which group an individual belongs to, and the classification, based on the average over the trees, thus has increased prediction precision. Approximately one-third of study participants are excluded in the construction of a specific tree in each bootstrap sample. Using this so-called out-of-bag sample as the test data, RF calculates the error rate of the derived classification forest [24]. RF provides an importance estimate of each of the features (independent variables) and thus informs the value of any one feature for classification modeling. The sensitivity and specificity of the resulting RF classification algorithm can be obtained. RF is known to be a highly accurate classifier, and its use in medical diagnosis or decision-making [25] is rapidly increasing.
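The bootstrap and out-of-bag ideas described above can be illustrated without a full RF implementation. The sketch below is not the Salford Predictive Modeling software used in this study and grows no trees; it only simulates bootstrap resampling to show why roughly one-third of participants (about 1/e, or 36.8%, for large samples) are left out of each tree, and how per-tree votes could be combined by simple majority. All function names are hypothetical.

```python
import random


def bootstrap_indices(n: int, rng: random.Random) -> list[int]:
    """Draw n participant indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def mean_out_of_bag_fraction(n: int, n_trees: int, seed: int = 0) -> float:
    """Average share of participants never drawn for a tree's sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        in_bag = set(bootstrap_indices(n, rng))
        total += 1 - len(in_bag) / n
    return total / n_trees


def majority_vote(votes: list[int]) -> int:
    """Combine per-tree 0/1 votes; ties resolve to 0 (non-IR) in this sketch."""
    return int(2 * sum(votes) > len(votes))


# With 1,712 participants, the out-of-bag share is close to one-third.
print(round(mean_out_of_bag_fraction(n=1712, n_trees=100), 3))
print(majority_vote([1, 1, 0]))
```

Because each participant is out-of-bag for some trees, predictions for that participant can be scored using only the trees that never saw them, which is what allows the error rate to be estimated without a separate test set.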
For this study we first ran RF with only anthropometric measurements and demographics (variables that do not require a fasting blood sample), including BMI, WHR, PBF, age, sex, ethnicity, and mean arterial blood pressure. We then repeated this analysis utilizing only the variables that achieved an importance score > 20 out of 100. As a next step we added blood markers (LDL, HDL, TG, total cholesterol, CRP, and hemoglobin A1C) to the demographic and anthropometric variables and reran RF to predict IR. We report sensitivities and specificities here for the models that included only those variables with importance values of 20 or greater.
SPSS software (v.20; SPSS Inc, Chicago, IL) was also used to perform descriptive statistics of the lean and overweight/obese groups, as well as to identify outliers (≥ 3 standard deviations from the mean of their group) for all variables other than height, weight, or waist circumference; those outliers were excluded value-wise from the analyses. Salford Predictive Modeling (SPM) was used to perform RF.
A total of 3,088 high school students participated in the BODY Project and, as can be seen in Table 1, the lean and overweight/obese groups were significantly different on all basic anthropometric measurements as well as on mean arterial blood pressure and the blood assays. However, as can also be seen in Table 1, the two groups did not differ on basic demographic characteristics except for age. Although the difference in mean age between the two groups was only 0.4 years, given the relatively low standard deviation and large number of subjects this difference was highly significant. We found that 55.44% of the participants (1,712) were classified as overweight or obese; they constitute the sample that was used in the RF analyses. Of the 1,712 students carrying excess
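To make the value-wise outlier rule concrete, here is a minimal Python sketch, assuming that "value-wise" means only the offending measurement is set to missing while the participant's other variables are kept. It is not the SPSS procedure itself; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev


def mask_outliers(values: list, n_sd: float = 3.0) -> list:
    """Replace values >= n_sd standard deviations from the group mean with None.

    Existing missing values (None) are ignored when computing the mean and
    standard deviation and are passed through unchanged.
    """
    observed = [v for v in values if v is not None]
    center, spread = mean(observed), stdev(observed)
    return [
        None if v is not None and abs(v - center) >= n_sd * spread else v
        for v in values
    ]


# Hypothetical triglyceride-like values for one group; 400 is an outlier.
group_values = [90, 95, 100, 105, 110] * 4 + [400]
cleaned = mask_outliers(group_values)
print(cleaned[-1])                         # None: the outlying value is dropped
print(cleaned[:-1] == group_values[:-1])   # True: all other values are kept
```

The function would be applied separately within each group (lean, overweight/obese) and to each variable other than height, weight, and waist circumference.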

Results
Among the descriptive and anthropometric variables, the order of variable importance for detecting IR/non-IR in overweight/obese adolescents (Figure 1) was BMI, WHR, age, and PBF, with a sensitivity of 89.14% and a specificity of 32.72% (Figure 2). When blood-derived measures were added as potential predictors, BMI, WHR, TG, age, and CRP had the highest importance scores (Figure 3). The sensitivity of this analysis was 91.6% and its specificity was 38.49% (Figure 4).
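For readers less familiar with these metrics, sensitivity is the share of truly IR adolescents that the classifier flags, and specificity is the share of non-IR adolescents that it correctly clears. The short Python sketch below computes both from true and predicted labels; the labels are invented for illustration and are not the study data.

```python
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = IR, 0 = non-IR."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # IR, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # IR, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-IR, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-IR, flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical screening result: 10 IR adolescents (9 flagged) and
# 20 non-IR adolescents (7 cleared, 13 flagged as false positives).
y_true = [1] * 10 + [0] * 20
y_pred = [1] * 9 + [0] * 1 + [0] * 7 + [1] * 13
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

A screen tuned this way misses few true cases at the price of many false positives, which matches the trade-off the study accepts for a low-cost educational intervention.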

Random Forest Results
Overweight Group: A BMI cut-off of 26.5 kg/m² predicted IR with a sensitivity of 60% and a specificity of 50%. A WHR of 0.51 had a sensitivity of 55% and a specificity of 50%, and a PBF of 31.75% had a sensitivity of 65% and a specificity of 50%.

Cut off Values
Obese Group: A BMI cut-off value of 31.75 kg/m² had a sensitivity of 72% and a specificity of 50% in the prediction of IR as defined by HOMA-IR. Likewise, a WHR cut-off of 0.58 had 75% sensitivity and 50% specificity, and a PBF cut-off of 35.45% had 62% sensitivity and 50% specificity.

weight 954 (55.72%) identified themselves as Hispanic, 451 (26.34%) as Black and 149 (8.70%) as Asian. Fewer than 3% of the study participants were White. Of these overweight/obese students, 405 qualified as IR with a HOMA-IR ≥ 3.99. The proportion of students with and without IR was equivalent by sex (P = 0.6).

Based on our analyses using only demographic and anthropometric variables, we were able to predict IR with a sensitivity of 89.14%. Adding blood parameters to the prediction model increased the sensitivity by only 2.66% (from 89.14% to 91.8%). Given the importance of detecting IR among adolescents carrying excess weight in a practical and cost-effective way, these data make a strong case for the use of BMI, WHR, age, and PBF to detect children at risk of diabetes and early cardiovascular disease. Although age is one of the predictive variables, it likely only enters the model because of the known and normal developmental aspects of insulin function, with older adolescents becoming more insulin sensitive [26]. Although PBF is determined very straightforwardly using inexpensive equipment, the prediction of IR could be streamlined further: upon removing PBF as a variable in the RF prediction of IR, the sensitivity of classification drops only 3.21%, to 85.93%, suggesting that BMI, WHR, and age alone are very robust predictors.
The RF results identify the important variables predicting IR but do not provide cut-off values that could be used in clinical applications. We ran ROC curves to provide an example of possible cut scores for the variables that predicted high risk of IR, separately for overweight and obese adolescents. This model suggests that, with an average sensitivity and specificity, overweight adolescents with a BMI > 26.6 kg/m², a WHR > 0.51, and a PBF > 31.75%, and obese adolescents with a BMI > 31.75 kg/m², a WHR > 0.58, and a PBF > 35.45%, are at increased risk of having a sufficiently elevated HOMA-IR to be categorized as IR. Not surprisingly, the sensitivity and specificity for these ROC curves were much lower for the overweight adolescents, a group at much lower risk of IR. That said, depending on the types of individuals who are screened, the cut scores reported here may not be appropriate for detecting IR across a broader spectrum of BMI or among adolescents with a different genetic vulnerability. For example, Stern et al., in a study using the euglycemic clamp technique and classification and regression trees, suggested that BMI > 28.7 kg/m² increases the risk of IR in normal subjects [15]. In another study of Asian Indian adolescents, a group at much higher genetic risk, BMI > 22.6 kg/m² was suggested as the cut-off point increasing the risk of IR [14]. To our knowledge no WHR and PBF cut-off scores have been suggested for the prediction of IR, although the assumption has always been that the higher those two variables are, the higher the risk.
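One way a cut score can be read off a ROC curve is sketched below in Python. The selection rule used here (the most sensitive cut-off whose specificity is at least 50%) is an assumption suggested by the reported cut-offs all having 50% specificity; the text does not state the exact rule the authors applied. Function names and the toy BMI values are hypothetical.

```python
def roc_points(values: list[float], labels: list[int]) -> list[tuple[float, float, float]]:
    """For each candidate cut-off c, score the rule 'value > c means IR'.

    Returns (cutoff, sensitivity, specificity) triples in ascending cut-off order.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v > cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v <= cutoff)
        points.append((cutoff, tp / n_pos, tn / n_neg))
    return points


def pick_cutoff(points, min_specificity: float = 0.5):
    """Assumed rule: highest sensitivity among cut-offs meeting the specificity floor."""
    eligible = [p for p in points if p[2] >= min_specificity]
    return max(eligible, key=lambda p: p[1]) if eligible else None


# Toy data (invented): BMI in kg/m^2 and IR status (1 = IR).
bmi_values = [25, 26, 27, 28, 29, 30, 31, 32]
ir_status = [0, 0, 1, 0, 1, 0, 1, 1]
print(pick_cutoff(roc_points(bmi_values, ir_status)))
```

Other selection rules, such as maximizing the sum of sensitivity and specificity, would generally give different cut scores, which is one reason the reported values should be treated as examples rather than clinical thresholds.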

Discussion
Previous studies predicting IR mostly used blood markers along with some anthropometric measurements (especially BMI) [14,15]. In 2009, Goel and colleagues [14], utilizing recursive partitioning, reported that the combination of anthropometric measurements and routine biochemical parameters was the most sensitive model for predicting IR in adolescents. In 2012, employing a multiple regression approach in a subset of the patients studied here, our group reported that waist circumference was a highly informative predictor of HOMA-IR in obese adolescents [27]. In the current RF analysis, we build on these previous findings using a much larger set of participants and confirm, using an unbiased partitioning method, that anthropometric measurements alone were the most important predictors of IR in overweight/obese adolescents. Sensitivity of the prediction increases only minimally by adding blood parameters; BMI and WHR remained the best individual predictors. The strong predictive role of BMI has also been validated in previous studies [14,15]. TG and CRP levels were the two laboratory values that added to anthropometric variables in contributing to the prediction of IR. However, together they added only 2.6% to the sensitivity of detection of the anthropometric variables alone. It is well established that elevations in serum TGs are commonly associated with IR and represent a valuable clinical marker of the Metabolic Syndrome [28]. Different studies have demonstrated that there is a correlation between fasting insulin concentrations and CRP concentrations in plasma [29-31], suggesting that IR and inflammatory processes are related. Even though it is not yet clear whether inflammation is the direct result of obesity or whether it leads to IR, in these data CRP emerged as one of the predictors of IR, although with a lower importance value than anthropometric measures. It is interesting to note that other common lab values associated with obesity, such as hemoglobin A1C levels, did not add to the prediction of IR. Therefore, although some clinicians and researchers have advocated for the potential of using hemoglobin A1C as a predictor of IR and/or diabetes [32,33], we found no support for this in a large community-residing, non-clinical sample of adolescents of color carrying excess weight.
One limitation of this study is the fact that we used a non-dynamic estimate of IR, HOMA-IR, which is based on fasting glucose and insulin levels. The gold standard for measuring IR dynamically in humans is the hyperinsulinemic-euglycemic clamp technique [11]. However, it would be unfeasible to use this technique with the large numbers of subjects reported on here.

Limitation and Strengths
Another limitation is that the specificity of the RF-produced prediction was quite low. However, for detection of IR among adolescents carrying excess weight, this is not a major concern. False positives would most likely receive an educational intervention intended to improve lifestyle, which would carry little cost and could have benefits beyond reducing their risk of IR. However, clinicians and public health practitioners considering other possible applications of these results need to conduct their own cost-benefit analysis and decide whether the relatively low specificity is a problem. Also, our study population was a population of convenience drawn from a school-based medical screening program. Participants were not selected randomly, and students carrying excess weight were preferentially targeted for the screening program. This limitation could make the study results difficult to generalize to the general population. However, the goal of this study was to define a predictive model for overweight/obese adolescents only, using RF as a model-free means of analysis that does not assume normality of data.
A major strength of this study is the large number of participants, which is in contrast to previous studies on this topic. Also, our study included predominantly ethnic minorities, groups that, likely through a combination of genetic and socioeconomic factors, are at highest risk of obesity and metabolic dysregulation from obesity. Another significant strength is the analytic method used in this study. RF has many advantages over traditional statistical techniques such as logistic regression because it accommodates both linear and non-linear relationships, handles missing data, and may reveal complex relationships between multiple predictor variables. To our knowledge, RF has not been utilized in any study relating the predictive role of various anthropometric and biochemical measures to IR.
We would like to acknowledge Dr. Carole Siegel for her suggestions on this paper. This study was supported by the Nathan S. Kline Institute and the NYU Langone Community Service Plan.

Figure 1: Random Forest variable importance for demographic characteristics and anthropometric measurements alone as predictors of insulin resistance defined by HOMA-IR ≥ 3.99

Figure 2: Receiver operator characteristic curve for Random Forest model with basic demographic characteristics and anthropometric measurements only as predictors of insulin resistance defined by HOMA-IR ≥ 3.99

Figure 3: Random Forest variable importance for demographic characteristics, anthropometric and routine blood measurements as predictors of insulin resistance defined by HOMA-IR ≥ 3.99

Figure 4: Receiver operator characteristic curve for Random Forest model with basic demographic characteristics, anthropometric and routine blood measurements as predictors of insulin resistance defined by HOMA-IR ≥ 3.99

Table 1: Demographic information and clinical characteristics of overweight/obese vs. lean participants. Continuous variables are presented as mean (standard deviation). BMI: Body Mass Index; Mean BP: Mean Blood Pressure; HDL: High Density Lipoprotein; LDL: Low Density Lipoprotein; HbA1c: Hemoglobin A1c; HOMA: Homeostatic Model Assessment; CRP: C-Reactive Protein