PyEx: A statistic you can sink your teeth into

By Ryan Buckland / Expert

North Melbourne are 2014’s consensus ‘surge up the ladder’ team. Most pundits have them in the top four, and some have even been bold enough to say they’ll go all the way and take the title off Fremantle (who are my pick).

Last week’s insipid display may cause some to waver, although the Bombers seem to find something when their club is in the headlines in the lead-up to the real show (and, yes, that was a play on words).

The Roos finished last year in 10th place, with a 10-12 record and a very strong percentage of 119.5 (good enough for sixth overall). In kicking 342.255 on the season (that is, 342 goals and 255 behinds), the Roos kicked more maximums than anyone except Geelong and Hawthorn, while their accuracy trailed only the Dockers.

Defensively, they were merely middling, but in a competition where just under half the teams make the finals, that’s good enough.

North were in 10 matches where the final margin was less than two goals – what I consider a ‘close’ game from an analytical perspective – and won just three of them. And they had a ‘contender’s’ off-season, in so far as they brought in specific players and coaches to target deficiencies.

So it’s no surprise they’re the sweethearts of this year’s footy season.

Well, I hate to burst the AFL world’s bubble, but the numbers say North finished pretty much exactly where they should have last year. I don’t mean the numbers I’ve presented above, nor the numbers of revisionist commentators who said so after the fact. I’m talking about a statistic I’d like to introduce you to: Pythagorean Expected Wins (let’s call it PyEx; it can’t be a stat unless it’s got a cool abbreviation).

You’re probably thinking: I know this Pythagoras thingy from somewhere! It’s the theorem we all learnt in high school for calculating the length of one side of a right-angled triangle. In a sports statistics sense, it was first championed by the godfather of sports analytics, Bill James.

If you’re interested in more information about the theory, shoot over to the Wikipedia page (which is, surprisingly, pretty good).

Essentially what it boils down to is that a team’s winning record should be a function of its offensive potency and defensive abilities, and that the two of these are, statistically, unrelated (that is, a team’s offense is independent of its defence).

It was first used in baseball, where it’s quite obvious the two are distinct. It’s since spread to a range of other American sports, and is able to accurately predict the wins a team should have throughout a season.

PyEx’s heaviest use, in baseball in particular, is to test the role of luck and/or outside factors in a team’s wins and losses. The theory behind this is fairly sound: if a team’s quality is measured by its offense and defence (PyEx), then this should be reflected in its overall wins and losses. Any deviation from this is caused by in-game or situational factors largely beyond the team’s control; a crucial free kick, for example, may lead to a close loss.

It has some issues, for sure. To be fully accurate, it needs a big sample: it’s incredibly effective over a 162-game baseball slugfest, and less so in a 16-game gridiron grind.

For the AFL, there’s an argument that the offensive and defensive elements of the game aren’t independent – which is something I’d concede. However, as you’ll see towards the end of the article, PyEx is very, very accurate once a couple of tweaks are applied.
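Since PyEx depends only on the ratio of points scored to points conceded, a club’s published ‘percentage’ can stand in for the raw scores. Here’s a minimal sketch in Python – note the exponent is a tuning parameter: Bill James’s baseball original used 2, and roughly 4.5 is my own back-of-envelope fit to the 2013 AFL figures, not an official constant.

```python
def pyex_wins(points_for, points_against, games, exponent=4.5):
    """Pythagorean expected wins: the share of games a team 'should'
    win given only its scoring (offense) and conceding (defence).
    exponent=2 is Bill James's baseball original; ~4.5 is a rough
    AFL fit of my own, not an official constant."""
    ratio = points_for / points_against
    win_share = ratio ** exponent / (ratio ** exponent + 1)
    return games * win_share

# Hawthorn 2013: percentage of 135.7 over 22 home-and-away games
print(round(pyex_wins(135.7, 100.0, 22), 1))  # → 17.6
```

A sanity check: a team scoring exactly as much as it concedes comes out at 11 of 22 wins, as you’d hope.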

But enough talk, what does the straight up PyEx formula say about the 2013 AFL home-and-away season?

(Note, if you just want to see the final results, skip ahead to the last table of figures.)

 

Team                      Wins   %        Ladder Rank   PyEx Wins   PyEx Rank
Hawthorn                  19     135.7%   1             17.6        1
Geelong                   18     135.6%   2             17.6        2
Fremantle                 16.5   134.1%   3             17.1        3
Sydney                    15.5   132.5%   4             17.1        4
Richmond                  15     122.8%   5             15.7        5
Collingwood               14     115.0%   6             14.3        7
Essendon*                 14     107.3%   7             12.7        9
Port Adelaide             12     102.4%   8             11.6        11
Carlton                   11     106.7%   9             12.6        10
North Melbourne           10     119.5%   10            15.2        6
Adelaide                  10     108.1%   11            12.9        8
Brisbane Lions            10     89.6%    12            8.4         14
West Coast                9      95.3%    13            9.8         12
Gold Coast                8      91.7%    14            8.9         13
Western Bulldogs          8      85.1%    15            7.2         15
St Kilda                  5      82.6%    16            6.6         16
Melbourne                 2      54.1%    17            1.3         17
Greater Western Sydney    1      51.0%    18            0.9         18

*Note I’ve chosen to place Essendon in their W/L position on the ladder, as we’re not testing the effect of executive orders on ladder position.

So PyEx gets it pretty well on the money from a rankings point of view. The top five teams according to PyEx ended up in the top five positions on the ladder, and it got the bottom four too – both in order, no less.

But in the middle things get a bit dicey. PyEx says North were the sixth-best team in the league, despite finishing 10th on the ladder, while Port Adelaide (even with Ken Hinkley) were only the 11th-best team, despite coming eighth on the W/L ladder.

You’ll also notice the wins and PyEx wins are quite different for a number of teams – particularly at the very pointy end of the ladder (Hawthorn miss out on nearly a win and a half), but also at the bottom (St Kilda gain 1.6 wins), and, well, the middle too, with North ‘gaining’ more than five wins using PyEx.

What’s the issue? Well, we want the formula to give a true account of a team’s luck (good or bad) over the course of a season, so we can enter the following season with some guide as to which teams may perform better (or worse) based on some external elements, like a more favourable draw, more luck in the close games, or the impact of changes in personnel. So, some tweaks are required.

My investigations led to two conclusions: PyEx in the AFL doesn’t appropriately take into account the role of defence in the creation of scoring shots (and so overvalues offensive potency), and it ignores the role close losses play in determining the number of wins a side has over the year.

Let’s mix these ingredients into PyEx and see what we get.

Close losses
As I foreshadowed earlier, close losses aren’t really taken into account in PyEx. But when you think about it, they can’t be. All PyEx is doing is trying to put a value on a team’s offensive and defensive potency; it won’t look at circumstances where a team wins or loses by a small margin.

Take North, who were involved in 10 games – 10 of their 22 – where the margin was two goals or less. That’s a phenomenal number; I might check this later on, but I’d hazard a guess it’s the most in any AFL season, and probably VFL too (although this is less likely as the game was lower scoring in the past).

There were 45 games decided by two goals or less over the whole home-and-away season; 44 produced a winner, and one was a draw (between Fremantle and Sydney). How often should a side win a close game? A fair starting point is to assume that, over the long term, a side will win 50 per cent of the games it plays that are decided by two goals or less.

You could argue that some teams win more, consistently, but I’d almost guarantee that this stat will regress to the mean over the long run.
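That 50 per cent baseline makes each side’s close-game ledger a one-liner – a sketch in Python:

```python
def close_game_luck(played, won):
    """Close wins above or below the 50 per cent baseline.
    Positive means the side banked more close wins than the
    long-run expectation; negative means fewer."""
    return won - 0.5 * played

# Brisbane Lions 2013: won 5 of their 7 close games
print(close_game_luck(7, 5))   # → 1.5
# North Melbourne: won just 3 of 10
print(close_game_luck(10, 3))  # → -2.0
```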

Right, so if a side can expect to win 50 per cent of its close games, how did each side fare last year?

Team                      Played   Won   % Won   +/- Close Wins
Brisbane Lions            7        5     71%     +1.5
Essendon                  5        4     80%     +1.5
Port Adelaide             8        5     63%     +1.0
Fremantle                 4        2.5   63%     +0.5
Hawthorn                  5        3     60%     +0.5
Melbourne                 1        1     100%    +0.5
West Coast                5        3     60%     +0.5
Western Bulldogs          5        3     60%     +0.5
Geelong                   8        4     50%     0.0
Richmond                  4        2     50%     0.0
Carlton                   7        3     43%     -0.5
Collingwood               3        1     33%     -0.5
Gold Coast                3        1     33%     -0.5
Greater Western Sydney    1        0     0%      -0.5
Sydney                    2        0.5   25%     -0.5
Adelaide                  8        3     38%     -1.0
St Kilda                  4        1     25%     -1.0
North Melbourne           10       3     30%     -2.0

North Melbourne won two fewer close games than we would expect last year but, as you can see, its winning percentage of 30 per cent isn’t the worst in the league.

St Kilda were involved in four close games, and won just one of them – with our analysis saying they could expect to have won at least one more.

The top end of this ladder is also quite intriguing. The Bombers were involved in five close games last year, and managed to snag four of them, earning them 1.5 more wins than we would expect over the long run. How about the Lions, though! Involved in seven close games, and they managed to edge over the line in five of them – again, good enough for 1.5 more wins than we would generally expect.

Let’s now add these figures to PyEx, and see where we get. Note, I’ll be adding each team’s +/- close wins to its PyEx figure, as we’re trying to calibrate the reasons why PyEx overstates or understates a team’s wins. As we’ve said, PyEx theoretically assumes a team will win 50 per cent of its close games, so any surplus or deficit of actual close wins is luck PyEx can’t see – folding it back in squares PyEx’s cold statistical analysis with the colour and excitement of the actual competition.

Look, just take my word for it, and if anyone wants to duke it out in the comments, we’ll deal with it then.
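Mechanically, the reconciliation amounts to adding each side’s close-game +/- onto its raw PyEx figure. A sketch, with raw PyEx inputs taken from the first table:

```python
def luck_adjusted_pyex(raw_pyex, close_played, close_won):
    """Fold close-game luck back into raw PyEx, which implicitly
    assumes a 50 per cent record in close games."""
    return raw_pyex + (close_won - 0.5 * close_played)

# North Melbourne: raw PyEx 15.2, won just 3 of 10 close games
print(round(luck_adjusted_pyex(15.2, 10, 3), 1))  # → 13.2
# Essendon: raw PyEx 12.7, won 4 of 5 close games
print(round(luck_adjusted_pyex(12.7, 5, 4), 1))   # → 14.2
```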

Team                      Wins   %        Ladder Rank   PyEx Wins   PyEx Rank
Hawthorn                  19     135.7%   1             18.1        1
Geelong                   18     135.6%   2             17.6        3
Fremantle                 16.5   134.1%   3             17.6        2
Sydney                    15.5   132.5%   4             16.6        4
Richmond                  15     122.8%   5             15.7        5
Collingwood               14     115.0%   6             13.8        7
Essendon*                 14     107.3%   7             14.2        6
Port Adelaide             12     102.4%   8             12.6        9
Carlton                   11     106.7%   9             12.1        10
North Melbourne           10     119.5%   10            13.2        8
Adelaide                  10     108.1%   11            11.9        11
Brisbane Lions            10     89.6%    12            9.9         13
West Coast                9      95.3%    13            10.3        12
Gold Coast                8      91.7%    14            8.4         14
Western Bulldogs          8      85.1%    15            7.7         15
St Kilda                  5      82.6%    16            5.6         16
Melbourne                 2      54.1%    17            1.8         17
Greater Western Sydney    1      51.0%    18            0.4         18

Right, so not a lot has changed in terms of the positions on the ladder. The top five and bottom four remain the same (although the Cats and Dockers have changed places), North and Adelaide have dropped back, while the Bombers have rocketed up into sixth position (causing everyone else to shift down a position or two).

But what we’re more interested in is the PyEx Wins versus the actual wins. And we’re significantly closer now. The biggest deviations remain North (10 Wins versus a PyEx of 13.2), West Coast (nine versus 10.3) and Hawthorn (19 v 18.1).

PyEx still hasn’t quite given GWS its only win of 2013, but as you can see, 12 teams are now rated within one win of their actual wins, while seven are within half a win (which I’d consider pretty good). This compared to 10/4 in the straight PyEx.

I’m not satisfied, though. Let’s now add the last ingredient: the under-appreciated role defence plays in creating scoring opportunities.

Adjusting offense
The final tweak we’ll make is to reduce, slightly, the influence of offense on PyEx Wins. This is to acknowledge that, unlike other sports, defensive skill plays a direct role in the ability to create scoring opportunities.

We’ve seen this emerge in recent years, with an increasing percentage of scores coming from direct turnovers, and more and more of those turnovers resulting from the application of pressure by the defensive side.

I’m unfortunately not in a position to put some solid numbers behind how much we should cut ‘points for’ in PyEx. Intuitively, though, it makes sense. What I found in my fiddling with the numbers is that in the 2013 home-and-away season, reducing the influence of offense on PyEx by 2.7 per cent gives the best result.
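As a sketch, the fully baked version looks like this in Python. The 2.7 per cent haircut on ‘points for’ is the figure fitted above, while the exponent of 4.5 is my own rough fit (the article doesn’t specify one), so treat the constants as illustrative:

```python
def fully_baked_pyex(points_for, points_against, games,
                     close_played, close_won,
                     exponent=4.5, offense_haircut=0.027):
    """PyEx with two tweaks: (1) points-for trimmed to credit
    defence's role in creating scoring shots, and (2) close-game
    luck folded back in against the 50 per cent baseline."""
    ratio = points_for * (1 - offense_haircut) / points_against
    win_share = ratio ** exponent / (ratio ** exponent + 1)
    return games * win_share + (close_won - 0.5 * close_played)

# Hawthorn 2013: percentage 135.7, won 3 of 5 close games
print(round(fully_baked_pyex(135.7, 100.0, 22, 5, 3), 1))
```

With these constants Hawthorn lands around 17.6 – within a tenth of the table’s 17.7; the residual gap is down to my guessed exponent.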

Now that our PyEx is fully baked, where have we landed?

Team                      Wins   %        Ladder Rank   PyEx Wins   PyEx Rank
Hawthorn                  19     135.7%   1             17.7        1
Geelong                   18     135.6%   2             17.1        3
Fremantle                 16.5   134.1%   3             17.2        2
Sydney                    15.5   132.5%   4             16.1        4
Richmond                  15     122.8%   5             15.1        5
Collingwood               14     115.0%   6             13.2        7
Essendon*                 14     107.3%   7             13.6        6
Port Adelaide             12     102.4%   8             11.9        9
Carlton                   11     106.7%   9             11.4        10
North Melbourne           10     119.5%   10            12.6        8
Adelaide                  10     108.1%   11            11.2        11
Brisbane Lions            10     89.6%    12            9.2         13
West Coast                9      95.3%    13            9.6         12
Gold Coast                8      91.7%    14            7.8         14
Western Bulldogs          8      85.1%    15            7.1         15
St Kilda                  5      82.6%    16            5.1         16
Melbourne                 2      54.1%    17            1.7         17
Greater Western Sydney    1      51.0%    18            0.3         18

OK, so we haven’t quite nailed the ladder. But take a look at the PyEx wins versus actual wins. We’ve now got to a situation where 15 of the 18 teams have PyEx wins within one of their actual wins, and seven are within half a win.

Here’s the table of the final PyEx versus actual wins, so you can see more clearly.

Fully baked               Wins   PyEx Wins   +/-
Hawthorn                  19     17.7        +1.3
Geelong                   18     17.1        +0.9
Fremantle                 16.5   17.2        -0.7
Sydney                    15.5   16.1        -0.6
Richmond                  15     15.1        -0.1
Collingwood               14     13.2        +0.8
Essendon                  14     13.6        +0.4
Port Adelaide             12     11.9        +0.1
Carlton                   11     11.4        -0.4
North Melbourne           10     12.6        -2.6
Adelaide                  10     11.2        -1.2
Brisbane Lions            10     9.2         +0.8
West Coast                9      9.6         -0.6
Gold Coast                8      7.8         +0.2
Western Bulldogs          8      7.1         +0.9
St Kilda                  5      5.1         -0.1
Melbourne                 2      1.7         +0.3
Greater Western Sydney    1      0.3         +0.7

When reading this table, what matters is the +/- figure. This, effectively, is the difference in terms of wins between a team’s actual result and the result implied by their offensive and defensive skills (and the interaction between defence and offence) and their luck in winning close games.

A plus indicates that a team’s overall wins likely owe something to circumstance rather than quality; or, if you’d like, their W/L column is overstated. A minus means the opposite.

Let’s take Hawthorn, for example. Hawthorn ended the year on top of the ladder with 19 wins. Yet, according to the PyEx formula, they were only good enough for 17.7 wins – implying that circumstances outside their control gave them an extra 1.3 wins over the course of the year. This also means that, if we were to replay the 2013 season from scratch, we would expect Hawthorn to end up with between 17 and 18 wins – more likely 18.

It also means they are a prime candidate for regression this year, as those factors which gave them an extra 1.3 wins may or may not be repeated in 2014. Whether they are able to offset that by increasing their offensive or defensive potency isn’t for me to judge.

Make no mistake, the figures still show that Hawthorn was the best side in 2013, but perhaps by less than was implied by their 19 wins.

Now, have a look at North Melbourne. Even after accounting for their terrible luck in close games, and adjusting down offense (which North are more associated with than defence), North were still 2.6 wins short of what you would tend to expect given their output. Adelaide were in a similar boat, which took me by surprise a bit.

This makes North Melbourne your prime bounce (pardon the pun) candidate for 2014 (particularly after their off-season moves), followed closely by Adelaide. Don’t rule out an improvement from Fremantle and West Coast based on these figures, too; and if we think Buddy is worth his price tag, Sydney should get a lift in 2014 as well.

So, I commend PyEx to the masses. I’ll revisit this using 2014 numbers when we get to, say, the halfway mark of the season to see how your team is tracking. And if The Crowd considers it useful, I may even go back over previous seasons and see where we get to.

The Crowd Says:

2014-03-28T07:26:14+00:00

Cat

Roar Guru


so many of the stats kept have serious flaws too. If a player takes a set shot from 20 meters out dead in front of goals, and shanks it out on the full, why is that not counted as a shot on goal? That would be like baseball deciding a player who struck out doesn't count as an at bat. Shots on goal should not be limited to only those that manage to score. But then imagine the impact that would have on certain players stats. I'd also like some clarity on what exactly counts as a disposal and how disposal efficiency is calculated. So many times I watch games and disagree with what the official stats say.

AUTHOR

2014-03-28T03:42:56+00:00

Ryan Buckland

Expert


Well, apparently a couple of ex-baseballers that are involved in Australian sports have developed a WAR-type stat for AFL players - but I've been told by a couple of insiders that its pretty awful. And you are certainly right about the AFL and stats. The AFL has outsourced it all to Champion Data, who charge ridiculous amounts of money to access them (I know because I've inquired about getting some stats). Hopefully as the AFL develops its media unit they'll open it up a bit - but I'm not holding my breath.

2014-03-28T03:35:10+00:00

Cat

Roar Guru


Sorry I knew I should have been clearer that I wasn't accusing you of massaging number to purposely fulfill your expected results, but I think now that you have made your educated guesses to what adjustments need to be made based on last season, it would be interesting if those exact numbers were used for the past 10-15 seasons and see if they still held up. I agree with your reasoning for the adjustments, just not sold yet on the magnitude of those adjustments, they sound good for last year because they fit reasonably well, will they for other seasons? Hopefully we'll see your further research and find out. Now if someone could translate VORP into AFL that would be amazing (and amazingly difficult). Having grown up on baseball and loving stats (I used to watch games on TV and keep score in the old baseball score books just for fun) I am often frustrated in how poorly AFL stats are used and tracked.

2014-03-28T03:00:52+00:00

Greg

Guest


You've just finished reading "Trading Bases" haven't you?

AUTHOR

2014-03-27T08:48:11+00:00

Ryan Buckland

Expert


Hi Bogan Baiter (awesome) You're right - you can't necessarily quantify the intangibles. But the +/- is saying that "the intangible factors" that occur in a game are worth "plus" or "minus" a certain number of wins for a team. In North's case, the intangibles cost them between 2 and 3 wins over the season, while Hawthorn gained between 1 and 2 wins. Note the intangibles are things, theoretically, which are beyond the team's control. That's if you believe scoring power and defensive prowess are the ultimate statistics of offense and defense, which, well, they kind of are for mine.

AUTHOR

2014-03-27T08:43:35+00:00

Ryan Buckland

Expert


Hi guys, I do plan on testing this over the past 10 years at least in terms of the actual data. What it should show is that those teams with a "minus" score do better in the following year than they did in the test year in terms of actual wins. But as you point out Pillock it doesn't take into account personnel changes - which is much more difficult to do in the AFL than in baseball. Effectively what it is saying is that if the season was played over again, you would "expect" (for example) North Melbourne to have ended up between 12 and 13 wins, as oppose to 10. And the adjustments weren't -just- made so the data fit the theory. I made them because they in my mind a deficiency in the theory as its applied to the AFL. I believe they do a similar thing in basketball, which is high scoring, has close games, and has less of a distinction between offence and defence than the stat's birthsport, baseball. Thanks for reading!

2014-03-27T07:23:29+00:00

Cat

Roar Guru


I'd be more interested how the formula shaked out over say the past 10-15 seasons, a single season is too small a sample size to know if the formula works out or if the 'adjustments' were just made to fit the desired results.

2014-03-27T07:20:57+00:00

Cat

Roar Guru


Shame it takes an utter embarrassment to motivate that team.

2014-03-27T07:05:58+00:00

Pillock

Guest


The trouble with this system is that it takes no account of changes in personnel which usually occurs during the off season. It's like making money if its all about the past history teachers would be the richest people around. Interesting article all the same. Maybe a mid season follow up to test the theory.

2014-03-27T04:35:45+00:00

Bogan Baiter

Guest


a very good analysis, however we can't really statistically quantify other intangible factors, most relevantly in north's case, the mental toughness required to win the close games. With the exception of the lions (and conversely sydney and to a lesser extent geelong) the teams that were at the top of the ladder had good records in tight games. norths lack of defensive pressure, and unwillingness to play "dour", tight, large number of contested possessions football will continue to hold them back. to be fair they have realised this andf apparently made tackling their number one priority in pre-season apparently, but as yet the results and stats (notwithstanding they are pre-season and rd 1) haven't changed. Essendon probably couldn't believe their luck

2014-03-27T01:33:16+00:00

Daws

Guest


Great investigation, hopefully some of that luck rubs off on the Dockers this year!

2014-03-26T23:07:36+00:00

Ash of Geelong

Guest


It may take all weekend to read that but one thing I can tell you is the Roos will galvanised after their effort last week similar to the way they were after the Hawthorn belting 2 years ago and went on to win 10 in a row.

2014-03-26T23:07:06+00:00

Cat

Roar Guru


problem for NM is when you lose a game 123 to 124, most will say they were unlucky to lose by a point, and to a degree they are right, but the real issue is any team that gives up 124 points a game is also damn lucky to even still be close to winning. Thats not a winning defense and NM haven't done anything to address that deficiency (not that I have seen so far anyway).

2014-03-26T22:35:02+00:00

DingoGray

Roar Guru


my head is hurting afer readng that... I was never any good at Math

Read more at The Roar