The last time I downloaded a ranking PDF from the WTA was during the Australian Open, and I see they've changed the format again. They've replaced zeros with blank fields, which looks like a giant problem for me.
Consider Wozniacki's line in the current WTA rankings:
1 (1) WOZNIACKI, CAROLINE DEN 9930 23 470 200 280 200
The various fields are, in order:
This week's ranking,
Last week's ranking,
Total ranking points,
Points earned last week,
Points coming off,
16th event, and
The problem is that players don't have values in every field, especially the "Points added" field; see Vera Zvonareva's line:
3 (3) ZVONAREVA, VERA RUS 7815 20 320 125 60
On the other hand, there are people like Shahar Peer:
11 (11) PEER, SHAHAR ISR 3030 22 60 60 60
Which "60" goes in which field?
Is there any good way to parse the new WTA PDFs? I used to use a Perl script, but now I can't build a list for each player by splitting on spaces. Yes, I know that names contain varying numbers of spaces, but one could get around that problem by reversing the list when necessary to get at the last items in the list. Now, however, thanks to the blank fields, it's no longer obvious what the last item in the list is.
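To make the problem concrete, here is a minimal sketch in Python (not my original Perl script). The regular expression and the function name `parse_line` are illustrative assumptions based only on the three sample lines above, not on the full PDF. It shows that the leading fields (rankings, name, country) can still be pulled out reliably, while the trailing numbers come back as lists of different lengths with no indication of which columns were left blank.

```python
import re

# Hypothetical pattern, inferred from the three sample lines only:
# rank, (previous rank), name, 3-letter country code, then a run of numbers.
LINE_RE = re.compile(
    r"^(?P<rank>\d+)\s+\((?P<prev>\d+)\)\s+"
    r"(?P<name>.+?)\s+(?P<country>[A-Z]{3})\s+"
    r"(?P<numbers>\d+(?:\s+\d+)*)\s*$"
)

def parse_line(line):
    """Split one ranking line into its fixed fields plus the trailing numbers."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "rank": int(m.group("rank")),
        "prev": int(m.group("prev")),
        "name": m.group("name"),
        "country": m.group("country"),
        # Blank columns have vanished here, so positions are ambiguous.
        "numbers": [int(n) for n in m.group("numbers").split()],
    }

lines = [
    "1 (1) WOZNIACKI, CAROLINE DEN 9930 23 470 200 280 200",
    "3 (3) ZVONAREVA, VERA RUS 7815 20 320 125 60",
    "11 (11) PEER, SHAHAR ISR 3030 22 60 60 60",
]

for line in lines:
    rec = parse_line(line)
    print(rec["name"], len(rec["numbers"]), rec["numbers"])
```

Wozniacki's line yields six trailing numbers, while Zvonareva's and Peer's yield five each, and nothing in the extracted text says which column is missing. If the PDF extraction step could preserve column positions (for example, a layout-preserving text dump), slicing by character offset might recover the blanks, but that depends on the extraction tool and is not something the sample lines confirm.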