Feature name | Type | Description and values | % missing
Encounter ID | Numeric | Unique identifier of an encounter | 0%
Patient number | Numeric | Unique identifier of a patient | 0%
Race | Nominal | Values: Caucasian, Asian, African American, Hispanic, and other | 2%
Gender | Nominal | Values: male, female, and unknown/invalid | 0%
Age | Nominal | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100) | 0%
Weight | Numeric | Weight in pounds | 97%
Admission type | Nominal | Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available | 0%
Discharge disposition | Nominal | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available | 0%
Admission source | Nominal | Integer identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital | 0%
Time in hospital | Numeric | Integer number of days between admission and discharge | 0%
Payer code | Nominal | Integer identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay | 52%
Medical specialty | Nominal | Integer identifier of the specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon | 53%
Number of lab procedures | Numeric | Number of lab tests performed during the encounter | 0%
Number of procedures | Numeric | Number of procedures (other than lab tests) performed during the encounter | 0%
Number of medications | Numeric | Number of distinct generic names administered during the encounter | 0%
Number of outpatient visits | Numeric | Number of outpatient visits of the patient in the year preceding the encounter | 0%
Number of emergency visits | Numeric | Number of emergency visits of the patient in the year preceding the encounter | 0%
Number of inpatient visits | Numeric | Number of inpatient visits of the patient in the year preceding the encounter | 0%
Diagnosis 1 | Nominal | The primary diagnosis, coded as the first three digits of its ICD-9 (International Classification of Diseases, Ninth Revision) code; 848 distinct values | 0%
Diagnosis 2 | Nominal | Secondary diagnosis, coded as the first three digits of ICD-9; 923 distinct values | 0%
Diagnosis 3 | Nominal | Additional secondary diagnosis, coded as the first three digits of ICD-9; 954 distinct values | 1%
Number of diagnoses | Numeric | Number of diagnoses entered into the system | 0%
Glucose serum test result | Nominal | Indicates the range of the result or whether the test was not taken. Values: “>200”, “>300”, “normal”, and “none” if not measured | 0%
A1c test result | Nominal | Indicates the range of the result or whether the test was not taken. Values: “>8” if the result was greater than 8%, “>7” if the result was greater than 7% but less than 8%, “normal” if the result was less than 7%, and “none” if not measured | 0%
Change of medications | Nominal | Indicates whether there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change” | 0%
Diabetes medications | Nominal | Indicates whether any diabetic medication was prescribed. Values: “yes” and “no” | 0%
23 features for medications | Nominal | For each of the generic names metformin, repaglinide, nateglinide, chlorpropamide, glimepiride, acetohexamide, glipizide, glyburide, tolbutamide, pioglitazone, rosiglitazone, acarbose, miglitol, troglitazone, tolazamide, examide, sitagliptin, insulin, glyburide-metformin, glipizide-metformin, glimepiride-pioglitazone, metformin-rosiglitazone, and metformin-pioglitazone, the feature indicates whether the drug was prescribed or its dosage changed; the value is “no” if the drug was not prescribed | 0%
Readmitted | Nominal | Days to inpatient readmission. Values: “<30” if the patient was readmitted in less than 30 days, “>30” if the patient was readmitted in more than 30 days, and “No” for no record of readmission | 0%
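Several of the encodings above need a little handling before modeling. The snippet below is a minimal sketch. It assumes each encounter is a Python dict keyed by hypothetical snake_case names (`age`, `weight`, `a1c_result`, `readmitted`) and that a truly missing value is stored as `None`. It shows three things:

- how a "% missing" figure can be computed for one feature;
- how an Age bracket such as "[70, 80)" can be reduced to its lower bound;
- how the three-way Readmitted label can be collapsed into a binary "readmitted within 30 days" target. This binary framing is an illustrative choice and is not part of the table.

The records are made up for illustration and are not drawn from the dataset.

```python
# Hypothetical encounter records; keys and values are illustrative only.
RECORDS = [
    {"age": "[70, 80)", "weight": None, "a1c_result": ">8", "readmitted": "<30"},
    {"age": "[50, 60)", "weight": None, "a1c_result": "none", "readmitted": "No"},
    {"age": "[80, 90)", "weight": 180.0, "a1c_result": "normal", "readmitted": ">30"},
    {"age": "[60, 70)", "weight": None, "a1c_result": ">7", "readmitted": "No"},
]


def percent_missing(records: list[dict], feature: str) -> float:
    """Percentage of records whose value for `feature` is None.

    An explicit category such as "none" (test not taken) counts as a
    recorded value, which is why a test-result feature can show 0% missing.
    """
    missing = sum(1 for r in records if r.get(feature) is None)
    return 100.0 * missing / len(records)


def age_lower_bound(bracket: str) -> int:
    """Turn a 10-year bracket such as "[70, 80)" into its lower bound (70)."""
    lower, _upper = bracket.strip("[)").split(",")
    return int(lower)


def readmitted_within_30_days(label: str) -> int:
    """Assumed binary target: 1 for "<30", 0 for ">30" or "No"."""
    return 1 if label == "<30" else 0


if __name__ == "__main__":
    print(f"weight missing: {percent_missing(RECORDS, 'weight'):.0f}%")
    print(f"a1c_result missing: {percent_missing(RECORDS, 'a1c_result'):.0f}%")
    print([age_lower_bound(r["age"]) for r in RECORDS])
    print([readmitted_within_30_days(r["readmitted"]) for r in RECORDS])
```

With these four toy records, weight is missing in 3 of 4 (75%). The A1c result is never `None`, because "none" is a recorded category.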