The Roar

Locking up AFL data will only make it less valuable

Roar Guru
26th August, 2019

Earlier this month the AFL website published an article based on data generated by Stats Insider rather than Champion Data.

Champion Data is the official data provider for the AFL, while Stats Insider has only just started recording similar match data by hand.

When I first saw it, I was dumbfounded. Surely it would have been easier and more reliable to use the official source of the numbers? These thoughts were shared on social media.

I don’t expect fans to be provided with all the data, but the shot data behind stories like this should be available for public consumption.

If an AFL journalist has an idea for an article that needs only a simple data cut, having to request the data isn’t the situation AFL fans and journalists should be in.

What the use of Stats Insider suggests to me is that it was simply easier to go there than to go through the process with Champion Data.

This is my impression from talking to people within the industry, who have noted that not only would there be a wait, but that the visual wouldn’t have been provided, only the summaries.

What fans are able to get from the box score is only the total summary at the end of the game – not important information like when the goal was scored, if the kick missed everything, or anything describing the ‘pressure’ of the kick when it takes place.

So, fans are left on the far left of the data access spectrum. This article and the data in it represent a happy middle ground. No information about the pressure on the shot was given, nor about the speed of the player or the number of defenders around the player when the shot was taken. Just, in my opinion, middle ground.

If we are to believe Marc McGowan – and I do – the data is available to journalists on request. But why can’t it just be available?

And my original question is still waiting for an answer.

That’s the situation we are in – only one provider of numbers for everything, from a simple shot plot to the player ratings based on possession chains.

No one is saying that all the data should be provided. But if you write a piece online you must find it interesting, and if you are paid for providing content at the very least you must think that other people find it interesting.

For example, take one of McGowan’s articles, like this one on the Brisbane Lions.

No one thinks that this data is particularly advanced, or that the way it’s being used is particularly state of the art.

But what if you liked the premise of the article but you are not a Brisbane fan? You can’t investigate your team in the way that Marc did in his article.

Let’s say you’re a Brisbane fan – instead of the team summary you might be wondering, does my favourite player change the way he plays using the metrics Marc has used? But you can’t do this.

Good analytical pieces should promote debate among fans and analysts. If I were sitting down with my mate and mentioned that there’s a cool article on Brisbane showing that how they transition the ball changes week to week, her first question would probably be whether the Eagles also do that.

Unfortunately, because the data is locked up, and the journalist who used it didn’t decide to cover that part, that’s it – that’s where the conversation ends. What should be a jumping-off point instead becomes the end of the road. Is this the situation we want to be in?

Champion Data (CD) do great work – let me state that point-blank. Let’s take one of the more popular metrics used, the Player Ratings. Did you know they came from Dr Karl Jackson’s PhD?

Tim Bedin, one of their data scientists, has a master’s in physics. The people crunching the numbers are highly qualified and do great work.

My question is: why must we limit the research and understanding of football to only a select few?

I want to address this idea that there is no money in making this data available for fans.

The National Football League – yep, the American one, with the Super Bowl – recently ran a big ‘data bowl’ whereby they allowed anyone access to player tracking data.

When I interviewed Michael Lopez on my podcast, Chilling with Charlie, one of the points he made was that people at clubs were unsure as to the full value of this data – they themselves wanted to see what people could come up with.

This competition was won not by insiders, but outsiders: university students! That’s right, folks, they entered and they won! They were able to showcase their talents to the league and to clubs and many were then able to get employment within clubland.

The data providers didn’t go under because a subset of data was released. I’d argue they may even be better off, because clubs were able to see the value in the data – they wanted to try out some of the presentations themselves and they were able to invest in people that have shown they have the skills.

Would there be less pressure on the AFL around scoring and congestion? Should important questions like this only be addressable by a small subset of people? Can the best possible solutions only come from them? Or could the AFL benefit from a more open data policy like the NFL did? But hiring an analyst isn’t making money for Champion Data – so how does that work?

It’s a little complex – so I’ll break down a few things I find quite weird.

First, the cap. AFL clubs have a salary cap on players as well as something called a soft cap. This soft cap applies to money spent on data, including from Champion Data, and club analysts.

The premium product from Champion Data, I am informed, is around $150,000 per season. Only about five clubs pay for it, currently. The AFL sets the soft cap, forcing clubs to make decisions about the amount of money they spend on data versus hiring their own analysts, and even spending on AFLW.

If you were an organisation being asked to pony up $150,000 for some data, you’d want to know what you’re going to get out of it. And what you get out of it has two elements: the data quality itself and the people in place to extract the value from the data.

The data quality is certainly there: CD hire numerous people at each and every game to ensure the collection of data. But good analysts cost money, and great analysts even more – getting these people to extract value from complex datasets is an expensive business.

So some (or most) clubs choose not to build their own expertise in analysis, instead relying on the product from Champion Data. When they do work in-house, the soft cap pressures them to keep costs down. The soft cap limits football department spending, restricting clubs’ ability to hire the best data scientists, which means they must rely on Champion Data’s products.

This arrangement is a little questionable to me in light of the fact that CD is both the exclusive provider of stats and 51 per cent privately owned.

It means that each year, clubs are essentially forced to funnel money into the pockets of private individuals, and strongly disincentivized, if not outright prevented, from taking steps to wean themselves off the service by hiring their own experts. Is this really appropriate?

So, by giving more people the opportunity to show that they can get value in the data, maybe clubs might be more willing to invest in the higher tier product from Champion Data.

On a different topic – what is going on regarding access to data? I’m told by McGowan that clubs are the most worthy of data, which I agree with.

Now let’s examine what clubs get for their money. Why are clubs writing their own tools, like web scrapers, to get some of their data?

What is going on here? The Telstra data comes from CD. If clubs value it, why wouldn’t they just purchase it directly?

Logically, it suggests that whatever price point it’s at, the clubs feel it’s too much, given what they know about its value.

Knowing its value is key here, as it’s very hard to estimate a dataset’s potential without the ability to explore it. And that ability has two parts: access and capability, both subject to decisions that fall under the soft cap.

I know of at least four clubs who pay for raw data from CD, not to use the advanced CD metrics based on that data, but to instead create their own metrics from it.

This is partly because of a lack of trust in metrics they can’t actually reproduce, and partly because clubs value different inputs and they want control over these inputs in their models.

This seems wasteful, requiring clubs to spend soft cap money on something they don’t actually want – money that could have gone toward their own in-house analyst, or perhaps the AFLW.

Some clubs have found ways around this. A common one, which you might have noticed, is that clubs develop relationships with universities.

This essentially provides them with a workforce outside the soft cap: university students working on honours, masters or PhD projects.

This is incredibly stifling for the industry – when I went to the World Congress of Science and Football I watched numerous AFL presentations trying to answer things around game style, congestion and a whole heap of other interesting things.

I felt almost embarrassed, but the questions to the presenters consisted of:
– This is only one side’s data, how does it generalise?
– This data has both sides but only one match (data only available from match simulations)
– These kinds of questions are already being looked at in other sports, what’s next?

The common answer: we need more data. But it’s not available. It exists, but is locked up by the complex relationship that exists between clubs, the AFL and Champion Data, stifling research into the game.

I watched David Rath present at the conference on issues around congestion, including the metrics those at AFL House use to judge whether they are heading in the right direction.

It was a great talk, and by the end, you could see how the process worked – as a fan, you felt comfortable that at the very least, the process was right.

But I was lucky – I was able to go because I am a student and I presented at this conference, which meant the university paid my way.

If I’d just wanted to hear about how the AFL went about thinking about rule changes and how to assess their impact, it would have cost $600 for one day.

Should information like what David presented – the process by which changes were decided on and the evaluation of those rules – be so restricted?

Later this month, Karl Jackson will present ‘The Story of Scoring’ – how data has changed the narrative. Unfortunately, fans can’t hear this unless they pony up $500.

So this is the situation we are in. Research is being stifled, while fans and journalists don’t have easy access to data.

This is where I think the online community – the wonks, the nerds – can and want to play a role. People are itching to be able to show a club or the league that they have the technical ability and the thinking capacity to ask interesting questions in ways that clubs can utilize.

No one thinks for a second that people will start collecting the data themselves and CD will go out of business. But what is the harm in someone being the next ‘Figuring Footy’, who went from “Hey, let me show you I can do expected scores” to being hired by an AFL club? Do examples like that really undermine Champion Data’s value proposition?

What is the harm in articles like Jordan’s, instead of just showing goals and misses, being able to show us fans who is the best kick under pressure? The best kick when sprinting into an open goal? This kind of information isn’t new in clubland – what’s the harm in the public knowing?

Opening up data, just a little bit more data, wouldn’t do the league or Champion Data any harm. Allowing data-savvy fans to write better content than what is currently available should only create more appetite for it.

My very last point is about the online community, and why this kind of work has to come from the online community under the system in place.

While I have no interest in working for an AFL club, I have numerous friends who do. The sad reality is that while analyst jobs exist, the pay for them is well below what they could get elsewhere – especially for data science roles.

To paint a bleak picture, I have heard of club analysts who are paid the equivalent of three days’ work a week and have to find another role. But they do so simply because of a love of the industry and/or club.

I have heard stories about issues getting an analyst for a job in AFL land. The feedback to the hiring manager was that people are requesting too much money. The response? Find someone who will take a pay cut to be in the AFL system.

The online community operate outside of their normal 9-5 that pays the bills simply because they genuinely love the sport.

They would like to grow the analysis behind their sport for free, like other bloggers have overseas.

That’s the situation we are in, unfortunately – a bunch of people who want to grow the sport analytically, being denied the opportunity.
