Saturday, April 7, 2007

Numbers Numbers Numbers

I almost held off on posting this because I was having too much fun with the K-State basketball situation, but...I guess I’ll move on. For now.

I’ve officially embraced my stat-nerddom like never before. In preparation for the upcoming 2007 football season, I’m diving into box scores and swimming around a bit. And I plan on doing this all summer. Let’s just say that, as much as I enjoy basketball and basketball stats, football ranks much higher on my list. You’ve been warned.

Here was my first item of business: look at 2006 box scores for all Big XII teams and do a simple correlation. What would happen if I looked at a bunch of different statistical categories? Which categories would have the highest correlation to actual success and failure for each team? Would the key categories be the same for every team? Absolutely, positively not.

WARNING: For those of you (ahem, The Beef) who begin to get a headache when statistics terms come up, please skip over the italicized portion below.

What is a correlation? From Wikipedia, which has all the answers: "In probability theory and statistics, correlation, also called correlation coefficient, indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation or co-relation refers to the departure of two variables from independence, although correlation does not imply causation. In this broad sense there are several coefficients, measuring the degree of correlation, adapted to the nature of data." Crystal clear? Basically, it means that when one variable goes up, the other tends to go up too (or, with a negative correlation, to go down). Correlation isn't the same as causation, but the closer the coefficient is to 1 (or -1), the stronger the relationship between the two numbers.

A correlation coefficient can only fall between -1 and 1. In all of this correlation analysis, keep in mind that I’m looking at the absolute values of these correlations. Just about any category involving opponents’ success probably had a negative correlation (i.e., the fewer yards your opponents gained, the more successful your team tended to be), but since I was looking for the strongest overall correlation, I ranked by absolute values.
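For anyone who wants to try this at home, here's a rough Python sketch of the kind of calculation described above: a Pearson correlation between one stat category and a team's results, then a ranking of the categories by absolute value. It assumes success is coded per game as 1 for a win and 0 for a loss (I haven't said exactly how success was measured, so treat that as an assumption), and the five-game numbers are made up for illustration, not real 2006 box scores.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the 1/n factors cancel, so they're left out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_by_strength(categories, outcome):
    """Sort categories by |r| against the outcome, strongest first, keeping the sign."""
    scored = [(name, pearson(values, outcome)) for name, values in categories.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical five-game season: 1 = win, 0 = loss (made-up numbers).
outcome = [1, 0, 1, 1, 0]
categories = {
    "First Downs": [24, 15, 22, 20, 17],
    "Opponents' First Downs": [14, 23, 16, 19, 21],
    "Time of Possession (min)": [31, 29, 28, 32, 30],
}

for name, r in rank_by_strength(categories, outcome):
    print(f"{name}: r = {r:+.2f}, |r| = {abs(r):.2f}")
```

With these made-up numbers, First Downs comes out on top (r = +0.90), Opponents' First Downs is right behind it with a negative sign (r = -0.85), and Time of Possession barely registers (r = +0.29). That's exactly why ranking by absolute value matters: otherwise every opponent category would sink to the bottom of the list.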

Okay, now that that's out of the way...

First, here were the statistical categories with the five highest correlations to success, conference-wide:

1. First Down Ratio (your total FDs versus your opponent’s FDs) (correlation: 0.68)
2. Opponents’ Total First Downs (0.59)
3. Third Down Conversion Ratio (your 3rd down conversion rate versus your opponent’s) (0.53)
4. Opponents’ Yards Per Passing Attempt (0.51)
5. Opponents’ Third Down % (0.48)
Now, your first impression on seeing those five is probably: DUH. Of COURSE first downs and third down conversions matter. Well, everything matters to some degree. There’s no denying that. If the top category had been rushing yards or opponents’ total yards or time of possession, you’d have said ‘duh’ to that too. That was kind of my idea in looking at this. Everything matters, but what matters the most to each team?
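The ratio categories on that list are easy to pull out of a box score. Here's a minimal Python sketch, assuming "ratio" means your number divided by your opponent's (the categories are described as yours "versus" theirs, so a simple difference would be another reasonable reading). The BoxScore fields and the sample game are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    # Hypothetical per-game fields pulled from a box score (not a real data feed).
    first_downs: int
    opp_first_downs: int
    third_conv: int
    third_att: int
    opp_third_conv: int
    opp_third_att: int


def first_down_ratio(g: BoxScore) -> float:
    # Assumed definition: above 1.0 means you moved the chains more than your opponent.
    return g.first_downs / g.opp_first_downs


def third_down_ratio(g: BoxScore) -> float:
    # Assumed definition: your 3rd-down conversion rate divided by your opponent's.
    return (g.third_conv / g.third_att) / (g.opp_third_conv / g.opp_third_att)


game = BoxScore(first_downs=24, opp_first_downs=16,
                third_conv=7, third_att=14, opp_third_conv=4, opp_third_att=16)
print(round(first_down_ratio(game), 2))  # 1.5
print(round(third_down_ratio(game), 2))  # 2.0
```

One ratio per game, lined up against that game's result, gives the column of numbers that goes into the correlation. A game where the opponent never converted (or never attempted) a third down would need special handling, since the division blows up.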

Well, here’s a look at each team’s top five and what it probably suggests about each team.

Missouri

1. Rushing Attempts (0.84)
2. Time of Possession (0.72)
3. First Down Ratio (0.71)
4. Rushing Yards (0.70)
5. Opponents' Rushing Attempts (0.70)

There were a few weird things about Missouri’s numbers. First of all, as you’ll see, these correlation numbers are much higher than other teams’. In other words, these categories were more tied to Missouri’s success/failure than other teams’ highest categories were to theirs. Also, the 0.84 correlation for Missouri’s rushing attempts was just about the highest correlation on the board. Obviously some of that is cause and effect running in reverse: Missouri is more likely to run the ball when they’ve already got the lead. But does that also mean they should run the ball more at any time in the game?

One other weird thing: for every other team in the conference, time of possession meant next to nothing (which was surprising in and of itself). But for Mizzou, it was just about the most important thing. Gary Pinkel spends a lot of time talking about how TOP doesn’t matter—it’s total plays that matters. Well, total plays had a relatively high correlation (in the 0.6 range), but TOP meant more. I wasn’t expecting that.

Baylor

1. Opponents’ turnovers (0.74)
2. Opponents’ completion % (0.70)
3. Opponents’ 3rd down conversion attempts (0.69)
4. Opponents’ 3rd down conversion % (0.64)
5. Third Down Conversion Ratio (0.60)
Takeaways meant more to Baylor than to any other team, and by a pretty wide margin. They needed to create some extra opportunities for themselves, and when they didn’t do it, they lost.

Colorado

1. Opponents’ Rushing Attempts (0.87)
2. First Down Ratio (0.80)
3. Third Down Conversion Ratio (0.77)
4. Total Plays Ratio (0.71)
5. Opponents’ Total Plays (0.71)
Colorado was the only team whose correlations were stronger than Mizzou’s, and all of these categories have to do with ball control...which makes sense. Colorado didn’t have many explosive weapons on offense, and their defense was good at preventing the big plays. Whoever was able to dictate the tempo won the game.

Iowa State

1. Third Down Ratio (0.79)
2. Opponents’ Passing Attempts (0.67)
3. Rushing Yards (0.66)
4. 3rd Down Conversion % (0.66)
5. Opponents’ Pass Completions (0.64)
For everyone else in the conference, opponents’ passing attempts and completions were just about the least important categories on the list. However, for Iowa State they were extremely important. To me, that suggests that Iowa State had trouble building a lead, but when they had one (and their opponents therefore had to pass a lot), they were decent at holding onto it.

Kansas

1. Opponents’ First Downs (0.67)
2. 3rd Down Conversion Rate (0.67)
3. Opponents’ Yards per Carry (0.60)
4. Opponents’ Total Plays (0.57)
5. First Down Ratio (0.54)
Like Colorado, ball control meant absolutely everything to Kansas. If their opponents were running the ball well and getting first downs, Kansas was screwed. However, if KU was able to string together some first downs, they were in good shape.

Kansas State

1. 3rd Down Conversion Ratio (0.86)
2. 3rd Down Conversion Rate (0.75)
3. 3rd Down Conversions (0.66)
4. Team Passing Attempts (0.57; this was a negative correlation, so more passing went with losing)
5. Turnovers (0.55)
Sense a trend here? Me too.

Nebraska

1. Rushing Yards (0.82)
2. 3rd Down Conversion Ratio (0.75)
3. Rushing Attempts (0.75)
4. Pass Completion % (0.72)
5. 3rd Down Conversion Rate (0.71)
This tells me that Zac Taylor winning Big XII Offensive Player of the Year was an even bigger joke than I thought it was. If this team was running the ball well, Nebraska was winning. If they weren’t, they were losing. And if they were running the ball well, that opened up the passing game. Zac Taylor was about the 9th most important player on the offensive side of the ball.

Okay, maybe that was a bit overboard. But my point is valid, and you know it.

Oklahoma

1. Opponents’ Completion % (0.74)
2. First Down Ratio (0.65)
3. Penalty Yards (0.61)
4. 3rd Down Conversion Ratio (0.58)
5. Opponents’ Rushing Yards (0.54)
This was a pretty unique set of categories. We knew Paul Thompson wasn’t all that important to OU’s overall success (his one job was “Don’t screw up,” and he did a decent job of that), but this pretty much verifies that the entire offense was assigned the same role. That’s surprising considering that OU was a strong rushing team...especially when that Peterson guy was healthy. However, when you think about it, OU’s improvement coincided with their defense’s significant improvement. The defense was a huge disappointment for the first month or so of the season, but after the Texas game, things clicked, and OU didn’t lose again the rest of the season (to a team not named Boise State, anyway).

Oklahoma State

1. Opponents’ Completion % (0.72)
2. First Down Ratio (0.65)
3. Opponents’ Yards Per Passing Attempt (0.63)
4. Opponents’ Rushing Yards (0.61)
5. Rushing Attempts (0.61)
This makes sense to me. OSU’s offense was consistently good all year. Lots of explosiveness and big plays. However, the defense...not so good. When the defense—particularly the pass defense—stepped up, success followed.

Texas

1. Yards Per Pass Attempt (0.75)
2. Yards Per Pass Completion (0.73)
3. First Down Ratio (0.70)
4. Opponents’ Yards Per Passing Attempt (0.64)
5. Pass Completion % (0.63)
Big plays = good. Giving up big plays = bad.

Texas A&M

1. Opponents’ Total First Downs (0.67)
2. First Down Ratio (0.65)
3. Opponents’ Yards Per Rush (0.62)
4. Opponents’ 3rd Down Attempts (0.57)
5. Opponents’ Pass Completion % (0.57)
I have absolutely no idea what to make of this. Seriously. Do you? Um, ball control’s important, I guess?

Texas Tech

1. First Down Ratio (0.81)
2. Third Down Ratio (0.76)
3. Rushing Yards (0.68)
4. Turnover Ratio (0.67)
5. Yards Per Pass Attempt (0.66)
This one makes sense too. Just like OSU (even more so), the offensive yards were always there. They’re always going to get first downs, but if you get more than they do, you’re probably going to win. Also, when they’re ahead, they run more...just like Missouri.

Over the next couple of weeks—in the lead-up to the Black & Gold Game—I’ll be taking a look at each Big XII team, and I figure...you know...since I spent all this time looking at numbers, I should probably use them in those previews too, huh?

Again, you’ve been warned.