Author Topic: Lots of data to look at (Read 4809 times)

mocat · « **Reply #25 on:** October 16, 2012, 02:13:21 PM »

Quote from: 8manpick on October 16, 2012, 02:10:18 PM

Quote from: mocat on October 16, 2012, 02:04:10 PM
well be my guest. have fun compiling that much data. it's much harder to find out how many possessions a team has than simply adding passing attempts + rushing attempts

I know... it would be nice if the stat services had it available though. Not like it would be hard or unheard of, they do it for basketball.

possessions in basketball are essentially the same as plays in football; teams usually average somewhere in the 60-80 range

8manpick · « **Reply #26 on:** October 16, 2012, 02:20:24 PM »

Quote from: mocat on October 16, 2012, 02:13:21 PM

possessions in basketball are essentially the same as plays in football; teams usually average somewhere in the 60-80 range

Oh boy, tell me more

The point is that the possession is the smallest unit you can break a football game into to get the best idea of an offense's (or defense's) effectiveness. It eliminates the pts/play bias that causes a team that averages 12 plays per possession to appear half as good as one that averages 6 plays per possession when they score at the same rate.
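Here is a worked version of the 12-vs-6 example. The season totals are made up (12 possessions and six touchdowns for each team), so treat them as assumptions rather than anyone's real stats. Both offenses score 3.5 points per possession, but the slower one shows exactly half the points per play:

```python
def points_per_play(points: float, plays: int) -> float:
    """Average points scored per offensive snap."""
    return points / plays


def points_per_possession(points: float, possessions: int) -> float:
    """Average points scored per offensive possession (drive)."""
    return points / possessions


# Hypothetical totals shared by both teams: 12 possessions, 6 touchdowns.
POSSESSIONS = 12
POINTS = 6 * 7  # 42 points

slow_plays = POSSESSIONS * 12  # 12 plays per possession -> 144 plays
fast_plays = POSSESSIONS * 6   # 6 plays per possession  -> 72 plays

slow_ppp = points_per_play(POINTS, slow_plays)
fast_ppp = points_per_play(POINTS, fast_plays)
slow_pposs = points_per_possession(POINTS, POSSESSIONS)
fast_pposs = points_per_possession(POINTS, POSSESSIONS)

print(f"12 plays/possession: {slow_ppp:.3f} pts/play, {slow_pposs:.2f} pts/possession")
print(f" 6 plays/possession: {fast_ppp:.3f} pts/play, {fast_pposs:.2f} pts/possession")
# 12 plays/possession: 0.292 pts/play, 3.50 pts/possession
#  6 plays/possession: 0.583 pts/play, 3.50 pts/possession
```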

SwiftCat · « **Reply #27 on:** October 16, 2012, 02:34:03 PM »

Exactly. I'm sure the charts would still show results similar to points per play, but I still think it'd be a better indicator of a team's overall efficiency.

mocat · « **Reply #28 on:** October 16, 2012, 02:46:03 PM »

Quote from: 8manpick on October 16, 2012, 02:20:24 PM

Oh boy, tell me more

The point is that the possession is the smallest unit you can break a football game into to get the best idea of an offense's (or defense's) effectiveness. It eliminates the pts/play bias that causes a team that averages 12 plays per possession to appear half as good as one that averages 6 plays per possession when they score at the same rate.

yeah i get what you're saying, except k-state is absolutely off the charts on PPP, even with our 12 plays per possession

SleepFighter · « **Reply #29 on:** October 16, 2012, 04:05:03 PM »

JFC guys.

http://www.adjustedstats.com/ratings-stats/cfbteams.php?team=Kansas+St.

SleepFighter · « **Reply #30 on:** October 16, 2012, 04:12:49 PM »

To summarize for the straight to the bottom crowd, K-State is #1 in the country in both raw and adjusted points per possession.

Rage Against the McKee · « **Reply #31 on:** October 16, 2012, 04:26:56 PM »

Quote from: SleepFighter on October 16, 2012, 04:12:49 PM

To summarize for the straight to the bottom crowd, K-State is #1 in the country in both raw and adjusted points per possession.

Also #1 in adjusted points per play

Stevesie60 · « **Reply #32 on:** October 16, 2012, 04:51:56 PM »

Quote from: SleepFighter on October 16, 2012, 04:05:03 PM

JFC guys.

http://www.adjustedstats.com/ratings-stats/cfbteams.php?team=Kansas+St.

Win probability for Saturday: .823

Score prediction: 47-33

CHONGS · « **Reply #33 on:** October 16, 2012, 05:09:47 PM »

I think there is a slight difference in "philosophy", as it were, in how to compare teams. My goal is to use as few parameters as possible. I have no doubt that a model with 27 fit parameters will fit the data better than a model with 3. I appreciate the limitations of using only 2+1 statistics (points per play scored, points per play given up, and losses), but that is in fact my goal. There is a strong correlation between the Pythagorean win % calculated with OE and DE and the actual win %.

Another limitation is the availability of statistics in computable form. Possessions are not an easy stat to extract for every team for every game, but I agree I would love to have it.
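For reference, here is the Pythagorean expectation in its usual form, fed with per-play rates as described above. Reading OE/DE as points per play scored and allowed is an assumption from context, and so is the default exponent: 2.37 is a figure often quoted for the NFL, and a college model would want to fit its own value.

```python
def pythagorean_win_pct(oe: float, de: float, exponent: float = 2.37) -> float:
    """Expected winning fraction from offensive and defensive efficiency.

    oe: points per play scored (assumed meaning of OE)
    de: points per play allowed (assumed meaning of DE)
    exponent: assumed default; fit it to historical college results
    """
    if oe < 0 or de < 0 or (oe == 0 and de == 0):
        raise ValueError("efficiencies must be non-negative and not both zero")
    scored = oe ** exponent
    allowed = de ** exponent
    return scored / (scored + allowed)


def expected_wins(oe: float, de: float, games: int, exponent: float = 2.37) -> float:
    """Pythagorean win fraction scaled to a schedule of `games` games."""
    return games * pythagorean_win_pct(oe, de, exponent)


# Usage with hypothetical rates (not real team stats):
print(pythagorean_win_pct(0.5, 0.5))                # 0.5: equal rates -> .500
print(round(pythagorean_win_pct(0.6, 0.3, 2.0), 3))  # 0.8
```

Comparing `expected_wins` with each team's actual wins over a season is the kind of check the post describes.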

michigancat · « **Reply #34 on:** October 16, 2012, 05:48:56 PM »

Quote from: Chingon on October 16, 2012, 05:09:47 PM

I think there is a slight difference in "philosophy", as it were, in how to compare teams. My goal is to use as few parameters as possible. I have no doubt that a model with 27 fit parameters will fit the data better than a model with 3. I appreciate the limitations of using only 2+1 statistics (points per play scored, points per play given up, and losses), but that is in fact my goal. There is a strong correlation between the Pythagorean win % calculated with OE and DE and the actual win %.

Another limitation is the availability of statistics in computable form. Possessions are not an easy stat to extract for every team for every game, but I agree I would love to have it.

seems like extracting possessions would be easy. Punts, turnovers, scores, or ends of halves signify ends of possession. Is that in the wolfram download thingy?
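Here is a minimal sketch of that counting rule. It assumes a play-by-play feed where each play carries the offense and a result tag. The `Play` record and the tag names are hypothetical and do not match any real stats service's format. A made or missed field goal is treated as ending the drive.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Play:
    offense: str  # team with the ball (hypothetical field name)
    result: str   # hypothetical tag, e.g. "run", "pass", "punt", "td", "end_half"


# Tags that end a possession: punts, turnovers, scores, and ends of halves.
DRIVE_ENDERS = {
    "punt", "interception", "fumble_lost", "turnover_on_downs",
    "td", "fg_made", "fg_missed", "safety", "end_half", "end_game",
}


def count_by_drive_endings(plays: list[Play]) -> dict[str, int]:
    """Count possessions per team by counting plays that end a drive."""
    counts: dict[str, int] = {}
    for play in plays:
        if play.result in DRIVE_ENDERS:
            counts[play.offense] = counts.get(play.offense, 0) + 1
    return counts


# Hypothetical sequence: A scores, B punts, A's last snap is at the half.
plays = [
    Play("A", "run"), Play("A", "pass"), Play("A", "td"),
    Play("B", "pass"), Play("B", "punt"),
    Play("A", "run"), Play("A", "end_half"),
]
print(count_by_drive_endings(plays))  # {'A': 2, 'B': 1}
```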

CHONGS · « **Reply #35 on:** October 16, 2012, 06:02:40 PM »

Quote from: michigancat on October 16, 2012, 05:48:56 PM

seems like extracting possessions would be easy. Punts, turnovers, scores, or ends of halves signify ends of possession. Is that in the wolfram download thingy?

Hmmm, I would have to see how close punts + turnovers + FG attempts + fourth downs not converted + safeties comes to the number of offensive possessions; it might just be close enough. I will miss drives stopped by the end of the half, and multiple-turnover plays might muck it up, but those should be statistically small.
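Here is a rough sketch of that box-score estimate. One caveat: the sum in the post has no term for touchdowns, which also end a possession, so this sketch adds offensive touchdowns as an extra input. That term is an addition, not part of the original list. Official turnover totals usually count only interceptions and lost fumbles, which is why failed fourth downs stay a separate term. All parameter names are hypothetical.

```python
def estimate_possessions(punts: int, turnovers: int, fg_attempts: int,
                         failed_fourth_downs: int, safeties: int,
                         offensive_tds: int) -> int:
    """Rough count of offensive possessions from box-score totals.

    Misses drives that end with the clock running out at the half or the
    end of the game, as noted in the post.
    """
    return (punts + turnovers + fg_attempts + failed_fourth_downs
            + safeties + offensive_tds)


def points_per_possession(points: int, possessions: int) -> float:
    """Points per possession, or 0.0 if no possessions were counted."""
    return points / possessions if possessions else 0.0


# Hypothetical single-game line, not real data.
poss = estimate_possessions(punts=3, turnovers=2, fg_attempts=2,
                            failed_fourth_downs=1, safeties=0, offensive_tds=4)
print(poss, points_per_possession(4 * 7 + 2 * 3, poss))
```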

SwiftCat · « **Reply #36 on:** October 16, 2012, 06:03:16 PM »

Is a turnover on downs listed as a turnover? What about if a fumble is returned for a TD?

michigancat · « **Reply #37 on:** October 16, 2012, 06:06:19 PM »

Quote from: Chingon on October 16, 2012, 06:02:40 PM

Hmmm, I would have to see how close punts + turnovers + FG attempts + fourth downs not converted + safeties comes to the number of offensive possessions; it might just be close enough. I will miss drives stopped by the end of the half, and multiple-turnover plays might muck it up, but those should be statistically small.

Is there not a timestamp on plays?

And I think a multiple turnover play should probably count as a new possession - the next play is 1st and 10 (or a score) no matter what.
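If the play-by-play carries the quarter or clock, the rule above could be sketched like this. A new possession starts whenever the offense changes, a new half begins, or the previous snap involved a turnover. That last condition means a multi-turnover play that ends with the same team's 1st and 10 still counts as a fresh possession. The `Snap` record and its fields are hypothetical. One known gap: a recovered onside kick gives the same team back-to-back drives with no turnover between them, so those would merge unless the kick were flagged too.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Snap:
    offense: str            # team with the ball at the snap (hypothetical field)
    half: int               # 1 or 2, derived from the play's quarter/timestamp
    turnover: bool = False  # ball changed hands at least once during the play


def count_by_offense_changes(snaps: list[Snap]) -> dict[str, int]:
    """Count possessions per team from an ordered list of snaps."""
    counts: dict[str, int] = {}
    prev: Snap | None = None
    for snap in snaps:
        new_possession = (
            prev is None
            or snap.offense != prev.offense  # the other team has the ball
            or snap.half != prev.half        # halftime ends every drive
            or prev.turnover                 # multi-turnover play: fresh 1st and 10
        )
        if new_possession:
            counts[snap.offense] = counts.get(snap.offense, 0) + 1
        prev = snap
    return counts


snaps = [
    Snap("A", 1), Snap("A", 1, turnover=True),  # fumble lost, then recovered back
    Snap("A", 1),                                 # A's new 1st and 10
    Snap("B", 1), Snap("B", 1),
    Snap("A", 2),                                 # second half
]
print(count_by_offense_changes(snaps))  # {'A': 3, 'B': 1}
```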

CHONGS · « **Reply #38 on:** October 16, 2012, 06:11:47 PM »

Quote from: SwiftCat on October 16, 2012, 06:03:16 PM

Is a turnover on downs listed as a turnover? What about if a fumble is returned for a TD?

I don't think it officially is, but I could be wrong.

The trouble again will be compiling all of these stats. While in principle this could maybe be extracted from the website SleepFighter mentioned, it would take downloading/scraping at least 700 webpages by the end of the year. I think they would greatly frown upon that.

Right now I extract my stats from NCAA and I only have to scrape 5 or so pages. It is in fact through my own record keeping that I can break it down into game-by-game stats. There is also the process of building the schedule matrix, which can be a pain in the ass.
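The post doesn't spell out what the schedule matrix contains. One common form, assumed here, is a symmetric team-by-team table of how many times each pair played. That table lets you average any stat over a team's opponents, which gives a basic strength-of-schedule number. Team names and games below are placeholders.

```python
from __future__ import annotations


def build_schedule_matrix(games: list[tuple[str, str]]) -> tuple[list[str], list[list[int]]]:
    """Return (teams, matrix) where matrix[i][j] counts games between teams i and j."""
    teams = sorted({team for game in games for team in game})
    index = {team: i for i, team in enumerate(teams)}
    matrix = [[0] * len(teams) for _ in teams]
    for a, b in games:
        i, j = index[a], index[b]
        matrix[i][j] += 1
        matrix[j][i] += 1  # symmetric: the game counts for both teams
    return teams, matrix


def schedule_average(stat: list[float], matrix: list[list[int]]) -> list[float]:
    """Average of `stat` over each team's opponents, weighted by games played."""
    averages = []
    for row in matrix:
        games_played = sum(row)
        total = sum(g * s for g, s in zip(row, stat))
        averages.append(total / games_played if games_played else 0.0)
    return averages


# Placeholder schedule: A played B twice, and each other pair once.
teams, matrix = build_schedule_matrix([("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")])
print(teams)   # ['A', 'B', 'C']
print(matrix)  # [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```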

michigancat · « **Reply #39 on:** October 16, 2012, 06:16:44 PM »

Quote from: Chingon on October 16, 2012, 06:11:47 PM

I don't think it officially is, but I could be wrong.

The trouble again will be compiling all of these stats. While in principle this could maybe be extracted from the website SleepFighter mentioned, it would take downloading/scraping at least 700 webpages by the end of the year. I think they would greatly frown upon that.

Right now I extract my stats from NCAA and I only have to scrape 5 or so pages. It is in fact through my own record keeping that I can break it down into game-by-game stats. There is also the process of building the schedule matrix, which can be a pain in the ass.

can you link to the pages you scrape?

CHONGS · « **Reply #40 on:** October 16, 2012, 06:18:50 PM »

Quote from: michigancat on October 16, 2012, 06:06:19 PM

Is there not a timestamp on plays?

And I think a multiple turnover play should probably count as a new possession - the next play is 1st and 10 (or a score) no matter what.

The trouble is getting this data and being able to compute with it. It's the trouble almost every company has: all this data and no efficient, feasible way to use it. In my opinion, the work required does not provide a big enough benefit. Averaged over a whole game, across a whole season, and against all teams, I imagine points per play and points per possession will differ merely by a scaling factor. That scaling factor is irrelevant if you normalize the data in a consistent manner, and it should not affect the overall correlation with actual winning %.
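Here is a quick check of the scaling-factor argument. Pearson correlation does not change when every value is multiplied by the same positive constant. So if points per possession really were points per play times one common factor, its correlation with winning % would be identical. The numbers below are made up for illustration. The argument only holds when the factor is the same for every team.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up per-team values, for illustration only.
pts_per_play = [0.35, 0.50, 0.62, 0.41, 0.28]
win_pct = [0.40, 0.65, 0.90, 0.50, 0.20]

SCALE = 5.5  # hypothetical plays-per-possession factor, same for every team
pts_per_poss = [SCALE * p for p in pts_per_play]

r_play = pearson_r(pts_per_play, win_pct)
r_poss = pearson_r(pts_per_poss, win_pct)
print(f"{r_play:.6f} {r_poss:.6f}")  # same value twice, up to floating-point noise
```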

CHONGS · « **Reply #41 on:** October 16, 2012, 06:23:17 PM »

This is an example:
http://statistics.ncaafootball.com/merge/tsnform.aspx?c=ncaa-football&page=cfoot/stat/ncaa-team-totaloff.htm

(Note: only 120 teams are reported; the 4 newest are left out, but they don't really matter much anyway.)
