Extensive stats: Present top 10 players at the previous 4 GS - APRIL 2010
Jul 17th, 2010, 12:34 AM
Re: Extensive stats: The present top 10 players at the previous 4 GS!

Jul 18th, 2010, 11:32 PM

I took this data and ran a cluster analysis, which basically tells you how similar the players are to each other. A lot of this probably isn't groundbreaking, but it's still interesting.

You can see the similarities in the dendrograms. For those who don't know how to interpret dendrograms: the closer a linkage ( -+--+ ) is to 0 (on the 0 to 25 scale), the more similar the players it joins. So, for example, in the first graph, Venus and Stosur are similar because they're linked very early on. If you look at Serena, she is linked to the rest of the group only at the 25 mark, which means that in this test she's very different from all the other players.
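For anyone curious how a chart like this is built, here is a minimal sketch of agglomerative (bottom-up) clustering in Python. It assumes average linkage on Euclidean distance and a linear rescale so that the last merge lands at 25; the tool and settings actually used for the dendrograms in this post aren't stated, so treat those choices, and the toy players A to D, as hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two players' stat vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(data):
    """Merge clusters bottom-up; return a list of (cluster_a, cluster_b, distance)."""
    clusters = {name: [name] for name in data}  # cluster label -> member players
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # average linkage: mean distance over every cross-cluster pair of players
            dists = [euclidean(data[p], data[q]) for p in clusters[a] for q in clusters[b]]
            d = sum(dists) / len(dists)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


def rescale(merges, top=25.0):
    """Linearly rescale merge distances so the largest one sits at `top`."""
    largest = max(d for _, _, d in merges)
    return [(a, b, top * d / largest) for a, b, d in merges]


if __name__ == "__main__":
    # hypothetical two-variable stats for four made-up players
    toy = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 0.0), "D": (6.0, 0.0)}
    for a, b, d in rescale(average_linkage(toy)):
        print(f"{a} joins {b} at {d:.1f}")
    # A joins B at 5.0
    # C joins D at 5.0
    # A+B joins C+D at 25.0
```

A and B (and C and D) link early on the 0 to 25 scale, while the two pairs only join at the very end, the same pattern as Serena joining everyone else at 25 in the first graph. Ties between equally close pairs are broken by whichever pair happens to be checked first.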

To help read the graphs, put the players in perspective. Serena is regarded as the best server (it helps her win matches), so being far from her is not a positive, although not necessarily a big negative. Venus and Stosur are also known for relying on their serves, so they're another reference point.

The first dendrogram tests the similarities in serve, specifically these variables: aces per service game, double faults per service game, % on 1st serve, win % on 1st serve, % on 2nd serve, win % on 2nd serve, % of broken serve and probability of having your serve broken.
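The serve variables mix per-game counts (aces, double faults) with percentages, which sit on very different numeric scales. If they are clustered raw, whichever variable has the largest spread tends to dominate the distances. A common remedy is to convert each variable to z-scores first; whether that was done for these dendrograms isn't stated, so the sketch below is only an illustration, and the player names and numbers in it are made up.

```python
import statistics


def standardize(table):
    """Convert each column of a {player: [value, ...]} table to z-scores."""
    names = list(table)
    n_vars = len(table[names[0]])
    result = {name: [] for name in names}
    for j in range(n_vars):
        column = [table[name][j] for name in names]
        mean = statistics.mean(column)
        sd = statistics.stdev(column)  # sample standard deviation; needs 2+ players
        for name in names:
            # a constant column carries no information, so map it to 0 instead of dividing by 0
            result[name].append((table[name][j] - mean) / sd if sd else 0.0)
    return result


if __name__ == "__main__":
    # hypothetical: [aces per service game, win % on 1st serve]
    raw = {"P1": [0.5, 60.0], "P2": [0.3, 70.0], "P3": [0.1, 80.0]}
    for name, z in standardize(raw).items():
        print(name, [round(v, 2) for v in z])
```

After this step an ace-rate gap and a first-serve-win gap of "one standard deviation" count the same in the distance, which is usually what you want when the variables have no common unit.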

similarities in serve

```
     C A S E         0         5        10        15        20        25
Label           Num  +---------+---------+---------+---------+---------+

venus             2   -+-----+
stosur            7   -+     +---------+
kuznetsova        4   -+-----+         |
safina            3   -----------------+-------------------------------+
wozniacki         5   -+-------------+ |                               |
azarenka          6   -+             +-+                               |
dementieva        9   ---------+-----+                                 |
jankovic         10   ---------+                                       |
serena            1   -------------------------------------------------+
```

serve speed
variables: 1st serve speed, 2nd serve speed

```
     C A S E         0         5        10        15        20        25
Label           Num  +---------+---------+---------+---------+---------+

safina            3   -+-------+
jankovic         10   -+       +---------------------------+
kuznetsova        4   ---+-+   |                           |
wozniacki         5   ---+ +---+                           +-----------+
azarenka          6   -----+                               |           |
serena            1   -+-------+                                       |
stosur            7   -+       +-------------+                         |
venus             2   ---------+             +-------------------------+
dementieva        9   -----------------------+
```
Serena, Sam (Stosur) and Venus (close by) have similar serve speeds. Nothing new there. Radwanska doesn't link to anyone immediately, and that's because she's by far the slowest server.

performance in points
variables: average winners, average winners minus aces, average UE (unforced errors), % of winners in rallies, % of UE in rallies, % of FE (forced errors) in rallies, near winners (forced errors by opponents).

```
     C A S E         0         5        10        15        20        25
Label           Num  +---------+---------+---------+---------+---------+

kuznetsova        4   -+---+
stosur            7   -+   +-----+
venus             2   -----+     +-----------+
serena            1   -----------+           +-------------------------+
azarenka          6   -+-----------+         |                         |
dementieva        9   -+           +---------+                         |
safina            3   -------------+                                   |
wozniacki         5   ---+-----+                                       |
jankovic         10   ---+     +---------------------------------------+
```
Interesting to see JJ (Jankovic) paired with Woz (Wozniacki). I thought she had a more aggressive game.

player at the net
variables: average number of net approaches, net approach success, probability of approaching the net during a rally

```
     C A S E         0         5        10        15        20        25
Label           Num  +---------+---------+---------+---------+---------+

serena            1   -+---+
dementieva        9   -+   +---------------------+
kuznetsova        4   -+---+                     |
azarenka          6   -+                         +---------------------+
safina            3   ---+---+                   |                     |
wozniacki         5   ---+   +-------------------+                     |
jankovic         10   -------+                                         |
venus             2   ---------+-----+                                 |
stosur            7   ---------------+
```
Perspective: Sam is the player who approaches the net the most, but she doesn't have the highest success rate. Serena doesn't go to the net often and has one of the lowest success rates.

general game play
variables: all of the above

```
     C A S E         0         5        10        15        20        25
Label           Num  +---------+---------+---------+---------+---------+

venus             2   -+-------+
stosur            7   -+       +---------------+
kuznetsova        4   ---------+               +-------------------+
safina            3   -------+-------+         |                   |
dementieva        9   -------+       +---------+                   +---+
azarenka          6   ---------------+                             |   |
serena            1   ---------------------------------------------+   |
wozniacki         5   -+-----------------+                             |
jankovic         10   -+                 +-----------------------------+
```

Jul 24th, 2010, 04:55 PM

Jul 25th, 2010, 11:25 AM

First of all, I'd like to apologize for not replying to or updating the thread on a regular basis; putting this together was a bit time-consuming, and I've been too busy with other stuff lately. But I'm glad for the interest from all of you, and I appreciate your comments!

One of the reasons I made this thread was to show just how much data it is possible to extract from the normal Grand Slam match statistics sheets. And I haven't even explored all the possibilities yet (winning points when receiving vs. serving could be interesting to look at, to name just one unexplored set of data)! But I have to admit that collecting all the data needed for these analyses is too time-consuming, so I'm not going to make a complete update anytime soon. Some of the stats are probably a bit useless anyway? Maybe I'll update selected parts of these stats after US Open 2010 (with FO 2010 and Wimbledon 2010 match stats added). Perhaps I'll add one or two new players to the lot (at the expense of someone else). We'll see...

The thread title is a bit misleading, of course, as it does not represent the present top 10 players but the players who were the top 10 at the given time (and it's no longer the 4 previous GS either!). Any suggestions for a better thread title?

Thanks for the dendrograms! The cluster analysis is a really interesting way to look at similarities, but I have to stress that the set of data analyzed is relatively small (as I'm sure you've realized yourself): it's just 4 tournaments over a timespan of 8 months, which means that any shifts in the players' form or any slight injuries during this 8-month period may (probably) have a large impact on the resulting stats.

Jul 25th, 2010, 11:46 AM

Quote:
Originally Posted by n1_and_uh_noone
I seriously think though that 'unforced errors' against someone like Wozniacki are really forced, but that is my opinion
I partly agree. Forced vs. unforced can at times be really difficult to distinguish, but I'm just presenting the data as they appear in the official stats (forced errors actually don't appear directly, but they are easy to calculate by subtraction). Btw., I've had a new look at how Stosur and Wozniacki have won their points at the 2 most recent Grand Slams (FO 2010 + Wimbledon 2010) and compared these with the old charts (also presented on the first page of the thread). In my former analysis these two players were the 2 extremes, but now they are actually not too far apart:

Distribution of points won in a rally:

Wozniacki - FO 2009 + Wimbledon 2009 + USO 2009 + AO 2010 (old chart):

Stosur - FO 2009 + Wimbledon 2009 + USO 2009 + AO 2010 (old chart):

The two new charts (FO 2010 + Wimbledon 2010):
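As mentioned above, forced errors don't appear directly in the official stats but can be recovered by subtraction. Here is a minimal sketch of that step and of a points-won breakdown like the charts in this post. It assumes that every point a player wins is either her own winner (aces included), an opponent's unforced error (double faults included), or an opponent's forced error; how the official sheets actually file aces and double faults is an assumption here, and `MatchLine` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MatchLine:
    """Hypothetical per-player totals copied from one match stats sheet."""
    points_won: int
    winners: int      # assumed to include aces
    opponent_ue: int  # opponent's unforced errors, assumed to include double faults


def forced_errors_drawn(m: MatchLine) -> int:
    """Opponent's forced errors ("near winners"), recovered by subtraction."""
    fe = m.points_won - m.winners - m.opponent_ue
    if fe < 0:
        raise ValueError("winners + opponent UE exceed points won; check the sheet")
    return fe


def points_won_breakdown(matches):
    """Share of all points won that came from winners, opponent UE and opponent FE."""
    won = sum(m.points_won for m in matches)
    return {
        "winners": sum(m.winners for m in matches) / won,
        "opponent UE": sum(m.opponent_ue for m in matches) / won,
        "opponent FE": sum(forced_errors_drawn(m) for m in matches) / won,
    }


if __name__ == "__main__":
    # two made-up matches, not real data
    season = [MatchLine(80, 30, 25), MatchLine(70, 20, 30)]
    for source, share in points_won_breakdown(season).items():
        print(f"{source}: {share:.0%}")
```

Summing the totals before dividing (rather than averaging per-match percentages) weights long matches more heavily; either choice is defensible, and the thread doesn't say which one the charts used.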

Jul 25th, 2010, 01:28 PM

Quote:
Originally Posted by angliru
...

The thread title is a bit misleading, of course, as it does not represent the present top 10 players but the players who were the top 10 at the given time (and it's no longer the 4 previous GS either!). Any suggestions for a better thread title?
Extensive stats: The present top 10 players at the previous 4 GS! APRIL 2010

Quote:
Originally Posted by angliru
Thanks for the dendrograms! The cluster analysis is a really interesting way to look at similarities, but I have to stress that the set of data analyzed is relatively small (as I'm sure you've realized yourself): it's just 4 tournaments over a timespan of 8 months, which means that any shifts in the players' form or any slight injuries during this 8-month period may (probably) have a large impact on the resulting stats.
Sure, but you work with what you can. Outside of Grand Slams, you can't find such detailed stats on the matches, which is a shame; a bigger and better sample would keep some of us stat nerds busy for weeks.

Jul 25th, 2010, 10:08 PM

Quote:
Originally Posted by cellardoor
Extensive stats: The present top 10 players at the previous 4 GS! APRIL 2010

Sure, but you work with what you can. Outside of Grand Slams, you can't find such detailed stats on the matches, which is a shame; a bigger and better sample would keep some of us stat nerds busy for weeks.
Of course. As mentioned in my previous post, I might have a go at another set of stats after the US Open. Maybe I can present combined match stats from 8 consecutive Grand Slams (all slams since AO 2009). I'm not sure my next attempt will have 10 players; maybe there will be just 8 players or so, but I may throw in some new ones like Schiavone, Sharapova, Zvonareva or Na Li, or whoever seems interesting at that time (ranking being the most obvious criterion, but not necessarily the only one). I'm quite sure the stats won't be as extensive as in this thread.

All suggestions are welcome.
