AutoML Challenge Result Collection
Technical Memorandum

Introduction

This report documents the data we collected to analyze the AutoML challenge. All results were obtained from the Codalab platform at http://codalab.org/AutoML. See the brief description of the challenge and results given at ICML 2016. Other events and related publications are listed at http://automl.chalearn.org. The STARTING KIT of the challenge is also available.

AutoML challenge phases.

Challenge datasets and winners

The datasets of the challenge are downloadable. It is possible to make post-challenge submissions in all rounds using the “clone” websites. Statistics on datasets of the challenge are given in Table \ref{tab:datasets}. We show in Table \ref{tab:resu-winners} the results of the challenge winners.

Challenge result data

We collected in Table \ref{tab:resutab} all prediction results and ranking scores for all participants, on all datasets and for all phases (the phases are described in Table \ref{tab:phases}):

  1. LEADERBOARD RESULTS: Ranking scores for all phases as shown on the leaderboards of http://codalab.org/AutoML (in separate files).

  2. ALL SUBMISSIONS: Ranking scores for all submissions, including those not shown on the leaderboard. We concatenated all the results in 3 files (for automl, final, and tweakathon phases). The participants singled out by a (*) in Table \ref{tab:stars} are represented in color.

  3. PREDICTIONS AND CODE OF WINNERS
    For the winners given in Table \ref{tab:resu-winners}, we provided:
    1) Code: A table with pointers to the winners’ challenge submission archives, organized as follows:
        round[xx]_[phase]_[participant]/
        Example: round2_automl_A/ or round4_final_I/
        Rounds run from 0 to 5.
        Phases are: automl, [tweakathon], final, clone.
        Participants (with and without star): A, B, D, etc.
    Some archives contain only the submitted code; others contain both code and results.
    2) Prediction results: We provide the input to the scoring program for the same submissions.
    Those should be identical to the results found in the challenge submission archive (when present).

  4. SYSTEMATIC STUDY
    For selected participants who took part in most phases, we performed a systematic study in which we re-ran the code of the last AutoML phase (downloadable from Table \ref{tab:code}) on all datasets of all rounds. To score the results, we used “clones” of the original challenge website. We provide the results for all participants marked with a star (see Table \ref{tab:stars}). The results are organized as follows, in directories corresponding to participants:
    1) Scores: We provide a table recapitulating the scores on all datasets of all rounds.
    2) Submissions made and prediction results: We provide zip files with detailed results. Subdirectories correspond to phases. There are several types of phases (see Table \ref{tab:phases}). Results are not necessarily available for all phases for all participants, because some participants did not enter all phases. In particular, participants who did not provide their code have no AutoML or “clone” results. Each subdirectory contains 3 zip files:

    • ParticipantName-input.zip: submission made (code and/or results).

    • ParticipantName-prediction-output.zip: output of the code run, i.e. predictions made.

    • ParticipantName-output.zip: output of the scoring program, i.e. scores.
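
The naming conventions listed above (archive directories of the form round[xx]_[phase]_[participant]/ and the three zip files per subdirectory) can be handled with a small helper. The following is a minimal sketch, not part of the released material: the function names are hypothetical, and it assumes that the round is a single unpadded digit (as in the examples round2_automl_A/ and round4_final_I/) and that the participant is a single uppercase letter.

```python
import re
from typing import NamedTuple

# Phase names as listed in the text above.
PHASES = ("automl", "tweakathon", "final", "clone")

# Assumed pattern: round<0-5>_<phase>_<participant letter>, optional trailing slash.
ARCHIVE_RE = re.compile(r"^round([0-5])_(" + "|".join(PHASES) + r")_([A-Z])/?$")


class Archive(NamedTuple):
    round_id: int
    phase: str
    participant: str


def parse_archive_name(name: str) -> Archive:
    """Split an archive directory name into (round, phase, participant)."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized archive name: {name!r}")
    return Archive(int(match.group(1)), match.group(2), match.group(3))


def result_zip_names(participant_name: str) -> list[str]:
    """Names of the three zip files expected in each phase subdirectory."""
    return [
        f"{participant_name}-input.zip",              # submission made
        f"{participant_name}-prediction-output.zip",  # predictions made
        f"{participant_name}-output.zip",             # scores
    ]


print(parse_archive_name("round2_automl_A/"))
print(result_zip_names("ParticipantName"))
```

Names that do not follow the assumed pattern raise a ValueError, which makes deviations from the convention easy to spot.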

Table \ref{tab:1} recapitulates the scores of the systematic study. We used the “clone” results where available; when they were missing, we used the leaderboard scores of either the AutoML or the Final phase directly (see the table of score origin).
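
The fallback rule just described (clone score first, leaderboard score otherwise) can be sketched as follows. This is an illustration only: the function name and the example values are hypothetical, and the relative order in which the AutoML and Final leaderboard scores are tried is an assumption, since the text does not specify it.

```python
from typing import Optional

# Order of preference: clone first, then leaderboard phases.
# Trying "automl" before "final" is an assumption of this sketch.
PREFERENCE = ("clone", "automl", "final")


def pick_score(scores_by_phase: dict[str, Optional[float]]) -> tuple[Optional[float], Optional[str]]:
    """Return (score, origin) for the first preferred phase that has a score."""
    for phase in PREFERENCE:
        score = scores_by_phase.get(phase)
        if score is not None:
            return score, phase
    return None, None


# Hypothetical values, for illustration only (not taken from the result files).
example = {"clone": None, "automl": None, "final": 0.75}
print(pick_score(example))
```

Returning the origin together with the score makes it possible to rebuild a table of score origin alongside the score table.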

——————————————————————————————————————————————————————————————-

\label{tab:phases} Different phases.
AutoML: Phases in which code is blind tested on 5 new datasets.
Tweakathon: Long development phases without time/resource constraint (result submission). Validation set results displayed on leaderboard.
Final: Short phases run at end of Tweakathon with no new submission, to compute and display on leaderboard final test set results.
Clone: Systematic post-challenge runs performed on “cloned” websites for each round using the last code of the top ranking participants.


——————————————————————————————————————————————————————————————-

\label{tab:resutab} Downloadable AutoML challenge results. To download entire directories from the table below, first import the data in your own Google drive, then right click to download and save.
Name Description Link
(1) Leaderboard CSV tables downloadable from original leaderboards in all phases (visible scores) 119.6 Ko
(2) All submissions CSV tables downloadable from original admin in all phases (all scores) 74.8 Ko
(3) Predictions and code of winners Prediction results and code for top ranking participants, all phases 6.8 Ko
(4) Systematic study (clone results) CSV tables downloadable from original admin in all phases (all submissions) 52 Ko

——————————————————————————————————————————————————————————————-

\label{tab:stars} Top ranking participants for each round used in the systematic study. A (*) marks participants who took part in most phases, or for whom we have a version of the code that we could run systematically on all phases.
Abbreviation Name Codalab ID Star
A AAD Freiburg aad_freiburg (*)
B Abhishek Thakur abhishek4 (*)
D Damir Jajetic djajetic (*)
I Eugene Tuv ideal.intel.analytics (*)
J James Lloyd jrl44 and backstreet.bayes (*)
L Lisheng Sun lise_sun (*)
M Marc Boulle marc_boulle (*)
P Jungtaek Kim postech.mlg_exbrain -
R Organizers reference (*)
S Victor Kocheganov asml.intel.analytics -
T Tadej Stajer tadejs -
V Matthias Vonrohr matthias.vonrohr -

——————————————————————————————————————————————————————————————-

\label{tab:code} Code that we ran for the systematic study (the so-called “clone” results).
Participant Code
aad_freiburg Aad_freiburg
Abhishek (abhishek4) Abhishek cpu , Abhishek gpu
Djajetic (djajetic) Djajetic
JamesLloyd JamesLloyd
postech.mlg_exbrain postech.mlg_exbrain
Lise_sun Lise_sun

——————————————————————————————————————————————————————————————-

\label{tab:1} The scores of the systematic study (A=aad_freiburg, R=Reference, M=MarcBoulle, L=Lise_sun, D=Djajetic, I=Ideal.intel, B=Abhishek, J=JamesLloyd).
Name A B D I J L M R
ADULT 0.8196 0.817776 0.8111 0.826222 0.813512 0.796962 0.814966 0.8173
CADATA 0.7978 0.792406 0.7753 0.813205 0.094834 0.787527 0.642396 0.7579
DIGITS 0.9543 0.939351 0.8282 0.963198 0.727975 0.947165 0.862431 0.8707
DOROTHEA 0.6639 0.871564 0.8239 0.887741 0.821667 0.837204 0.790628 0.7005
NEWSGROUPS 0.4806 0.460781 0.6383 0.589376 0.330142 0.052108 0.375205 0.5640
CHRISTINE 0.4916 0.464491 0.4772 0.553743 0.479846 0.458733 0.452975 0.4221
JASMINE 0.6274 0.613895 0.6198 0.645786 0.621868 0.612756 0.556948 0.5627
MADELINE 0.8154 0.590686 0.6446 0.813030 0.567384 0.583827 0.178949 0.5274
PHILIPPINE 0.6638 0.528731 0.5232 0.715266 0.521441 0.524443 0.453259 0.5129
SYLVINE 0.8973 0.874073 0.8903 0.934401 0.894377 0.871339 0.825459 0.8895
ALBERT 0.3793 0.322143 0.3627 0.374544 0.318619 0.342394 0.353325 0.3209
DILBERT 0.9406 0.786052 0.7545 0.980403 0.205467 0.237472 0.456052 0.7902
FABERT 0.3558 0.193274 0.3292 0.351617 0.025622 0.180991 0.212973 0.2394
ROBERT 0.4586 0.332090 0.3268 0.513582 0.212340 0.397557 0.369236 0.3617
VOLKERT 0.3346 0.256738 0.2795 0.369673 0.108845 0.149575 0.143360 0.2484
ALEXIS 0.7462 0.652066 0.6719 0.755245 0.619634 0.675878 0.617896 0.6449
DIONIS 0.8969 0.319426 0.7541 0.925978 0.022466 0.872194 0.811004 0.3097
GRIGORIS 0.7284 0.761328 0.7966 0.968569 0.541645 0.877537 0.963473 0.7532
JANNIS 0.5481 0.383526 0.4143 0.416623 0.235539 0.364760 0.388719 0.3995
WALLIS 0.7137 0.626787 0.7358 0.707129 0.122579 0.227869 0.584236 0.6181
EVITA 0.5902 0.594572 0.5805 0.613747 0.592082 0.589508 0.517537 0.4148
FLORA 0.4952 0.506437 0.4980 0.526335 0.022466 0.418871 0.507126 0.3710
HELENA 0.2241 0.225066 0.1531 0.245525 0.062051 0.202160 0.187143 0.0804
TANIA 0.4713 0.757413 0.3887 0.727105 0.533747 0.598563 0.659744 0.5378
YOLANDA 0.3241 0.371978 0.2871 0.386293 0.022466 0.241496 0.190242 0.2602
ARTURO 0.7458 0.799447 0.7775 0.772712 0.302994 0.717143 0.700492 0.7738
CARLO 0.4456 0.373970 0.4278 0.179569 0.357667 0.400558 0.369823 0.1425
MARCO 0.5485 0.711727 0.6902 0.535431 0.664495 0.538370 0.677593 0.2526
PABLO 0.3004 0.290594 0.3091 0.270825 0.030457 0.286216 0.250741 0.2828
WALDO 0.5865 0.563447 0.5726 0.606862 0.560087 0.556697 0.461757 0.5645
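
As an illustration of how the score table above can be used, the sketch below computes an average rank per participant over the five round 1 datasets, in the spirit of the average rank used to rank challenge participants. It is computed here on the systematic-study scores, so it need not coincide with the ranks reported for the challenge itself. The helper names are hypothetical, and ties are not handled specially (none occur in these rows).

```python
# Scores copied from the round 1 rows of the score table above.
PARTICIPANTS = ["A", "B", "D", "I", "J", "L", "M", "R"]
ROUND1 = {
    "CHRISTINE":  [0.4916, 0.464491, 0.4772, 0.553743, 0.479846, 0.458733, 0.452975, 0.4221],
    "JASMINE":    [0.6274, 0.613895, 0.6198, 0.645786, 0.621868, 0.612756, 0.556948, 0.5627],
    "MADELINE":   [0.8154, 0.590686, 0.6446, 0.813030, 0.567384, 0.583827, 0.178949, 0.5274],
    "PHILIPPINE": [0.6638, 0.528731, 0.5232, 0.715266, 0.521441, 0.524443, 0.453259, 0.5129],
    "SYLVINE":    [0.8973, 0.874073, 0.8903, 0.934401, 0.894377, 0.871339, 0.825459, 0.8895],
}


def ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def average_ranks(table):
    """Average, over the datasets in `table`, of each participant's rank."""
    totals = [0] * len(PARTICIPANTS)
    for scores in table.values():
        for i, r in enumerate(ranks(scores)):
            totals[i] += r
    return {p: totals[i] / len(table) for i, p in enumerate(PARTICIPANTS)}


avg = average_ranks(ROUND1)
for name, value in sorted(avg.items(), key=lambda kv: kv[1]):
    print(name, value)
```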

\label{tab:datasets} Datasets of the AutoML challenge. Rnd=round. Num=dataset number within the round. C=number of classes. Cbal=class balance. Sparse=sparsity. Miss=fraction of missing values. Cat=categorical variables. Irr=fraction of irrelevant variables. Pte, Pva, Ptr=number of examples of the test, validation, and training sets, respectively. N=number of features. Ptr/N=aspect ratio of the dataset.
Rnd Num DATASET TASK Metric Time C Cbal Sparse Miss Cat Irr Pte Pva Ptr N Ptr/N
0 1 ADULT multilabel F1 300 3 1 0.16 0.011 1 0.5 9768 4884 34190 24 1424.58
0 2 CADATA regression R2 200 0 NaN 0 0 0 0.5 10640 5000 5000 16 312.5
0 3 DIGITS multiclass BAC 300 10 1 0.42 0 0 0.5 35000 20000 15000 1568 9.57
0 4 DOROTHEA binary AUC 100 2 0.46 0.99 0 0 0.5 800 350 800 100000 0.01
0 5 NEWSGROUPS multiclass PAC 300 20 1 1 0 0 0 3755 1877 13142 61188 0.21
1 1 CHRISTINE binary BAC 1200 2 1 0.071 0 0 0.5 2084 834 5418 1636 3.31
1 2 JASMINE binary BAC 1200 2 1 0.78 0 0 0.5 1756 526 2984 144 20.72
1 3 MADELINE binary BAC 1200 2 1 1.2e-06 0 0 0.92 3240 1080 3140 259 12.12
1 4 PHILIPPINE binary BAC 1200 2 1 0.0012 0 0 0.5 4664 1166 5832 308 18.94
1 5 SYLVINE binary BAC 1200 2 1 0.01 0 0 0.5 10244 5124 5124 20 256.2
2 1 ALBERT binary F1 1200 2 1 0.049 0.14 1 0.5 51048 25526 425240 78 5451.79
2 2 DILBERT multiclass PAC 1200 5 1 0 0 0 0.16 9720 4860 10000 2000 5
2 3 FABERT multiclass PAC 1200 7 0.96 0.99 0 0 0.5 2354 1177 8237 800 10.3
2 4 ROBERT multiclass BAC 1200 10 1 0.01 0 0 0 5000 2000 10000 7200 1.39
2 5 VOLKERT multiclass PAC 1200 10 0.89 0.34 0 0 0 7000 3500 58310 180 323.94
3 1 ALEXIS multilabel AUC 1200 18 0.92 0.98 0 0 0 15569 7784 54491 5000 10.9
3 2 DIONIS multiclass BAC 1200 355 1 0.11 0 0 0 12000 6000 416188 60 6936.47
3 3 GRIGORIS multilabel AUC 1200 91 0.87 1 0 0 0 9920 6486 45400 301561 0.15
3 4 JANNIS multiclass BAC 1200 4 0.8 7.3e-05 0 0 0.5 9851 4926 83733 54 1550.61
3 5 WALLIS multiclass AUC 1200 11 0.91 1 0 0 0 8196 4098 10000 193731 0.05
4 1 EVITA binary AUC 1200 2 0.21 0.91 0 0 0.46 14000 8000 20000 3000 6.67
4 2 FLORA regression ABS 1200 0 NaN 0.99 0 0 0.25 2000 2000 15000 200000 0.08
4 3 HELENA multiclass BAC 1200 100 0.9 6e-05 0 0 0 18628 9314 65196 27 2414.67
4 4 TANIA multilabel PAC 1200 95 0.79 1 0 0 0 44635 22514 157599 47236 3.34
4 5 YOLANDA regression R2 1200 0 NaN 1e-07 0 0 0.1 30000 30000 400000 100 4000
5 1 ARTURO multiclass F1 1200 20 1 0.82 0 0 0.5 2733 1366 9565 400 23.91
5 2 CARLO binary PAC 1200 2 0.097 0.0027 0 0 0.5 10000 10000 50000 1070 46.73
5 3 MARCO multilabel AUC 1200 24 0.76 0.99 0 0 0 20482 20482 163860 15299 10.71
5 4 PABLO regression ABS 1200 0 NaN 0.11 0 0 0.5 23565 23565 188524 120 1571.03
5 5 WALDO multiclass BAC 1200 4 1 0.029 0 1 0.5 2430 2430 19439 270 72
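
The aspect ratio Ptr/N in the last column is simply the number of training examples divided by the number of features. A minimal check on three rows of the table above (the function name is hypothetical; rounding to two decimals mirrors the table):

```python
# (Ptr, N) pairs copied from the dataset table above.
DATASETS = {
    "ADULT": (34190, 24),
    "DOROTHEA": (800, 100000),
    "SYLVINE": (5124, 20),
}


def aspect_ratio(ptr: int, n: int) -> float:
    """Number of training examples per feature."""
    return ptr / n


for name, (ptr, n) in DATASETS.items():
    print(name, round(aspect_ratio(ptr, n), 2))
```

Values far below 1 (such as DOROTHEA) indicate many more features than training examples, while large values (such as ADULT) indicate the opposite regime.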

——————————————————————————————————————————————————————————————-

\label{tab:resu-winners} Results of the AutoML challenge winners. \(<R>\) is the average rank over all five data sets of the round and was used to rank the participants. \(<S>\) is the average score over the five data sets of the round. UP is the percent increase in performance between the average performance of the winners in the AutoML phase and the Final phase of the same round.
Rnd Place AutoML-Ended AutoML-Winner AutoML-\(<R>\) AutoML-\(<S>\) Final-Ended Final-Winner Final-\(<R>\) Final-\(<S>\) UP (\(\%\))
0 1 NA NA NA NA 02/14/15 ideal 1.40 0.8159 NA
0 2 NA NA NA NA 02/14/15 abhi 3.60 0.7764 NA
0 3 NA NA NA NA 02/14/15 aad 4.00 0.7714 NA
1 1 02/15/15 aad 2.80 0.6401 06/14/15 aad 2.20 0.7479 15
1 2 02/15/15 jrl44 3.80 0.6226 06/14/15 ideal 3.20 0.7324 15
1 3 02/15/15 tadej 4.20 0.6456 06/14/15 amsl 4.60 0.7158 15
2 1 06/15/15 jrl44 1.80 0.4320 11/14/15 ideal 2.00 0.5180 35
2 2 06/15/15 aad 3.40 0.3529 11/14/15 djaj 2.20 0.5142 35
2 3 06/15/15 mat 4.40 0.3449 11/14/15 aad 3.20 0.4977 35
3 1 11/15/15 djaj 2.40 0.0901 02/19/16 aad 1.80 0.8071 481
3 2 11/15/15 NA NA NA 02/19/16 djaj 2.00 0.7912 481
3 3 11/15/15 NA NA NA 02/19/16 ideal 3.80 0.7547 481
4 1 02/20/16 aad 2.20 0.3881 05/1/16 aad 1.60 0.5238 31
4 2 02/20/16 djaj 2.20 0.3841 05/1/16 ideal 3.60 0.4998 31
4 3 02/20/16 marc 2.60 0.3815 05/1/16 abhi 5.40 0.4911 31
GPU 1 NA NA NA NA 05/1/16 abhi 5.60 0.4913 NA
GPU 2 NA NA NA NA 05/1/16 djaj 6.20 0.4900 NA
GPU 3 NA NA NA NA 05/1/16 aad 6.20 0.4884 NA
5 1 05/1/16 aad 1.60 0.5282 NA NA NA NA NA
5 2 05/1/16 djaj 2.60 0.5379 NA NA NA NA NA
5 3 05/1/16 post 4.60 0.4150 NA NA NA NA NA
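
The UP column can be illustrated with a short calculation. The sketch below assumes that UP is the relative increase of the mean \(<S>\) of the three listed winners from the AutoML phase to the Final phase, with the AutoML mean as the baseline. The exact formula is not spelled out in the caption, so this reading is an assumption. Under it, rounds 1, 2 and 4 give values that round to the 15, 35 and 31 shown in the table above.

```python
def mean(values):
    return sum(values) / len(values)


def percent_increase(automl_scores, final_scores):
    """Relative increase (in %) of the mean Final score over the mean AutoML score."""
    base = mean(automl_scores)
    return 100.0 * (mean(final_scores) - base) / base


# <S> values of the three winners per phase, copied from the table above.
WINNERS = {
    1: ([0.6401, 0.6226, 0.6456], [0.7479, 0.7324, 0.7158]),
    2: ([0.4320, 0.3529, 0.3449], [0.5180, 0.5142, 0.4977]),
    4: ([0.3881, 0.3841, 0.3815], [0.5238, 0.4998, 0.4911]),
}

for rnd, (automl, final) in WINNERS.items():
    print(rnd, round(percent_increase(automl, final)))
```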

Results

Below are visualizations of the raw data, without normalization.

\label{fig:fig1} Line plot of scores (participants x labels).

\label{fig:fig2} Line plot of scores (data x labels).

\label{fig:fig3} Scatter plot of scores (data x labels).

\label{fig:fig4} SVD plot (data and participants).

Figure \ref{fig:fig4} shows a plot of the score matrix obtained with a singular value decomposition (SVD). Datasets are drawn as ellipses whose height depends on the number of training points \(P_{tr}\) and whose width depends on the number of features \(N\).
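
As a sketch of the underlying decomposition, assume that the rows of the score matrix \(S\) are the \(n\) datasets and its columns the \(p\) participants, with \(r = \min(n, p)\). The SVD and one common choice of two-dimensional plotting coordinates can then be written as below. The scaling actually used for the figure is not specified here, so these coordinates are an assumption.

\[
S = U \Sigma V^{\top}, \qquad
U \in \mathbb{R}^{n \times r},\quad
\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_r),\quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge 0,\quad
V \in \mathbb{R}^{p \times r},
\]
\[
\text{dataset } i \mapsto (\sigma_1 U_{i1},\ \sigma_2 U_{i2}), \qquad
\text{participant } j \mapsto (\sigma_1 V_{j1},\ \sigma_2 V_{j2}).
\]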

\label{fig:fig5} t-SNE plot (data and participants).

Figure \ref{fig:fig5} shows a t-SNE embedding of the score matrix (van der Maaten 2008).

\label{fig:fig6} Dendrogram (double hierarchical clustering of rows and columns).

The figures below show box plots of the scores, by participant and by dataset.

\label{fig:fig7} Box plot by participant.

\label{fig:fig8} Box plot by dataset (ordered by descending median, colored by task).

\label{fig:fig9} Box plot by dataset (colored by task).

\label{fig:fig10} Box plot by dataset (ordered by descending median, colored by round).

\label{fig:fig11} Box plot by dataset (task, round).

Some general observations about box plots

  1. In Fig. \ref{fig:fig7} (ordered by median), the samples aad_freiburg, ideal-intel, djajetic and abhishek appear to have similar centres, which exceed those of lise-sun, reference and marc-boulle. The aad_freiburg sample appears to have smaller variability than the other samples. All samples are reasonably symmetric, except marc-boulle, which is skewed. There are no obvious outliers in any of the samples.

  2. In Fig. \ref{fig:fig8} (ordered by median) and Fig. \ref{fig:fig9}, the boxes are colored by task. There are obvious outliers in several samples. Interpretation: two types of tasks stand out (multiclass in green and regression in red).

  3. In Fig. \ref{fig:fig10} (ordered by median) and Fig. \ref{fig:fig11}, the boxes are colored by round. There are obvious outliers in several samples.
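
The notions of centre and variability used in these observations correspond to the median and the interquartile range drawn by a box plot. The sketch below computes them with the Python standard library for the round 1 scores of two participants, copied from the systematic-study score table. The helper name is hypothetical, and the quartile convention (the default "exclusive" method of statistics.quantiles) is an assumption that may differ from the one used by the plotting software behind the figures.

```python
import statistics

# Round 1 scores (CHRISTINE, JASMINE, MADELINE, PHILIPPINE, SYLVINE)
# copied from the systematic-study score table.
SCORES = {
    "A": [0.4916, 0.6274, 0.8154, 0.6638, 0.8973],
    "M": [0.452975, 0.556948, 0.178949, 0.453259, 0.825459],
}


def box_summary(values):
    """Quartiles, median and interquartile range of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


for name, values in SCORES.items():
    print(name, box_summary(values))
```

Running the same summary over all 30 datasets per participant would give the quantities that the box plot by participant displays.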

\label{fig:fig12} Correlation matrix

\label{fig:fig13} Median - Quartile (data and participants)


Conclusions and further work


Acknowledgment

References

  1. Laurens van der Maaten, Geoffrey Hinton. Visualizing data using t-SNE. The Journal of Machine Learning Research 9, 85 (2008).
