Thursday, January 12, 2017

2017 MLS Superdraft Analysis

Last year my Superdraft model did fairly well, given that it only uses college and PDL team name, position, GA status, and USYNT history. This year, however, it seemed less accurate: it projected three North Carolina players (Colton Storm, Walker Hume, and Tucker Hume) to go over 2000 minutes, while none of them seemed to be particularly close to that mark, so I've decided to try a different method. You can find the old model's output under Raw Computer Model Exp. Mins in the posted spreadsheet, which also has all the raw data used for my new method.

I first created a polynomial model for minutes played in a player's first two years as a function of draft position. I then used mock drafts and big boards to simulate where each player would be picked, and ran the player's position in each mock draft/big board through the polynomial model to produce Expected Minutes numbers.
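To make the two steps concrete, here is a minimal sketch of the idea. The quadratic coefficients and the list of positions are made up for illustration; they are not the fitted values or the data from the spreadsheet, and the sketch maps each observed position straight through the polynomial rather than running a random simulation.

```python
from statistics import median

# Hypothetical coefficients (constant term first) for
# minutes = c0 + c1*pick + c2*pick**2. These are NOT the post's fitted values.
COEFFS = [2800.0, -60.0, 0.33]


def expected_minutes(pick, coeffs=COEFFS):
    """Evaluate the polynomial at a draft position, floored at zero minutes."""
    value = sum(c * pick ** power for power, c in enumerate(coeffs))
    return max(0.0, value)


def player_projection(picks):
    """Turn a player's mock-draft/big-board positions into expected minutes."""
    return [expected_minutes(p) for p in picks]


# Hypothetical positions for one player across five mock drafts/big boards.
picks = [2, 2, 1, 3, 2]
minutes = player_projection(picks)
print("median expected minutes:", median(minutes))
```

Each player ends up with one expected-minutes value per mock draft or big board, and that list is what a box plot would summarize.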

You can view the full viz here. It sometimes takes a while to load and is a bit small for my taste.


A couple of notes:

Since Abu Danladi was only selected first, second, or third, and was most often selected second, his box plot is just a straight line. This doesn't mean that Danladi has a very high chance of playing exactly 2690 minutes; the box plot appears this way because he is selected second in more than half of the mock drafts.
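A quick way to see why the box collapses: when more than half of the samples share one value, the first quartile, median, and third quartile can all land on that value. The sample counts below and the expected-minutes values for the first and third picks are hypothetical; only the 2690 figure comes from the post. The quartile method of Python's `statistics.quantiles` may also differ from the one the viz uses.

```python
from statistics import quantiles

# Hypothetical sample: 2 sources pick him first, 7 second, 2 third.
# 2800 and 2507 are placeholder expected-minutes values for picks 1 and 3.
samples = [2800] * 2 + [2690] * 7 + [2507] * 2

# Quartiles of the expected-minutes distribution (default "exclusive" method).
q1, q2, q3 = quantiles(samples, n=4)
print(q1, q2, q3)

# The box spans q1..q3, so it has zero height when the quartiles coincide.
box_height = q3 - q1
print("box height:", box_height)
```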

Also, this has a slight bias towards Soccer By Ives' draft material, since SBI has published 7 mock drafts and big boards compared to TopDrawerSoccer's 4 and MLSSoccer.com's 2. This shows up especially in Reagan Dunk's box plot, as SBI has ranked him anywhere between 21 and 31, compared to TDS's range of 5 to 11.

Finally, I used a clustering algorithm to group the results into more concrete “rankings” that describe the difference between two players in a way that normal 1, 2, 3, 4... rankings do not. If you simply rank the players, the difference between Jeremy Ebobisse and Abu Danladi is one spot, the same as the difference between Jacori Hayes and Nick DePuy. However, there is really no difference between Ebobisse’s and Danladi’s expected minutes (0 difference in median exp. mins), while there is a much bigger difference between Hayes’ and DePuy’s expected minutes (366 difference in median exp. mins).


Positions      Cluster # (as appears on spreadsheet)   Players (Median Exp. Mins)
1-3            2                                       Abu Danladi (2690)
                                                       Jeremy Ebobisse (2690)
                                                       Miles Robinson (2507)
4-8            8                                       Jackson Yueill (2176)
                                                       Jonathan Lewis (2176)
                                                       Chris Odoi-Atsem (1886)
                                                       Brandon Aubrey (1886)
                                                       Jacori Hayes (1886)
9-12           7                                       Nick DePuy (1520)
                                                       Eric Klenofsky (1414)
                                                       Niko Hansen (1316)
                                                       Lalas Abubakar (1316)
13-24 (w/6)    3                                       Zeiko Lewis (1061)
                                                       Justin Schmidt (922)
                                                       Julian Gressel (804)
                                                       Guillermo Delgado (621)
                                                       Francis DeVries (621)
                                                       Walker Hume (585)
                                                       Christian Thierjung (585)
13-24 (w/3)    6                                       Colton Storm (922)
                                                       Adonijah Reid (863)
                                                       Brian Wright (804)
                                                       Shamit Shome (804)
                                                       Reagan Dunk (661)
25-31          1                                       Brian Nana-Sinkam (551)
                                                       Nazeem Bartman (469)
                                                       Chris Nanco (407)
                                                       Jordan Wilson (407)
                                                       Michael Amick (361)
                                                       Jakob Nerwinski (337)
                                                       Marcus Epps (243)
31-44          4                                       15 players
46-69          5                                       25 players

These clusters are determined algorithmically, with a little extra weight given to drafts published on the 13th. I'm not sure exactly how clusters 3 and 6 differ; when I eventually figure this out, I'll update this.
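For readers who want a feel for how clustering on expected minutes differs from a plain ranking, here is a rough sketch. It is not the algorithm used for the table: it is a simple gap-based split that cuts the sorted medians at the largest jumps, it only looks at the twelve highest medians from the table, and it ignores the extra weight for drafts published on the 13th.

```python
def gap_clusters(values, k):
    """Split values into k groups at the k-1 largest gaps between sorted neighbours."""
    ordered = sorted(values, reverse=True)
    # Index i stands for the gap between ordered[i] and ordered[i + 1].
    gap_indices = sorted(
        range(len(ordered) - 1),
        key=lambda i: ordered[i] - ordered[i + 1],
        reverse=True,
    )
    cuts = sorted(gap_indices[: k - 1])
    clusters, start = [], 0
    for cut in cuts:
        clusters.append(ordered[start : cut + 1])
        start = cut + 1
    clusters.append(ordered[start:])
    return clusters


# Median expected minutes for the top twelve players in the table.
medians = [2690, 2690, 2507, 2176, 2176, 1886, 1886, 1886, 1520, 1414, 1316, 1316]

clusters = gap_clusters(medians, k=3)
for group in clusters:
    print(group)
```

With three groups, the cuts fall at the 331-minute and 366-minute gaps, which lines up with the 1-3, 4-8, and 9-12 position groups in the table. A real clustering method working on the full distributions could well behave differently further down the board.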

Interestingly, FC Dallas whittled down the combine invitees to a 29-player list; it would be interesting to see how much that list overlaps with these clusters.

Teams picking at the beginning of a new cluster (like Columbus picking 9th) should keep track of where players from previous clusters go and target players that fall outside of their cluster positions. It might have looked like Atlanta made a misstep when it traded for the 8th pick, but it looks like just the right position according to the clusters, because I would bump up Shome due to his GA status.

*I only used Soccer By Ives, TopDrawerSoccer, and MLSSoccer.com for their big boards and mock drafts since I trust their credibility. Other mock drafts can be found at NBC Sports' ProSoccerTalk, Bleacher Report, and SB Nation.

**I used the second version of the MLS Combine invite list. This excludes players like Trevor Haberkorn, Liam Callahan, and Alex Neff, who appeared on multiple mock drafts and big boards.

***I published a modified version of this to Quakes Epicenter on the 12th, before four more big boards/mock drafts were posted. The top clusters contain all the same players, except that the new cluster analysis adds Nanco, Nerwinski, and Epps.