Sunday, August 12, 2007

The Captain's Crazy College Football Formula: Or Why I Never Actually Bet Money on Sports

As my anticipation and glee for the coming college football season mounts, I find myself contemplating what the future might hold. I'd like to think that after following the sport for a few years I have some understanding of which teams have a chance to win the national championship and which do not. After all you can impress a lot of people (and possibly make some money) if you can state in August who will come out on top in January. Surely all college football fans try their best to predict what will unfold when the games are actually played, but I have to question whether rational and objective measures are actually worth any more than random guessing. To that end I am going to gather up my best college football predictions and test to see whether they are any more successful than an arbitrary set of guesses. Let me start by putting together my selection of teams I believe are most likely to win the national championship.

1.
For starters, I know the following ten things, in no particular order, to be true:
1). The Tennessee Volunteers will not win the BCS Championship game.
2). The Florida State Seminoles will not win the BCS Championship game.
3). The Oklahoma Sooners will not win the BCS Championship game.
4). The Miami Hurricanes will not win the BCS Championship game.
5). The Ohio State Buckeyes will not win the BCS Championship game.
6). The Louisiana State Tigers will not win the BCS Championship game.
7). The Southern California Trojans will not win the BCS Championship game.
8). The Texas Longhorns will not win the BCS Championship game.
9). The Florida Gators will not win the BCS Championship game.
10). No team from a conference other than the six so-called BCS conferences will win the championship game.

I hold that these statements have the weight of evidence behind them and are not an arbitrary effort to slim down my pool of potential candidates or, worse still, a reflection of any sort of bias on my part. What, you want to see that evidence? Well, I am probably setting a bad precedent by giving in to you like this, but what the heck, I like showing off anyway.

My claim that teams from the "mid-major" football conferences won't be winning the title is probably without controversy. After all, the BCS system is heavily tilted in favor of the traditional powers in the traditional conferences. We saw this laid out quite blatantly last year when, after all the bowls were played, only one undefeated team was left standing, sudden national darling Boise State from the WAC, and as far as I can tell they didn't receive so much as a first place vote, let alone a national title. So I'm pretty sure we can all agree it will have to be a team from the Big East, ACC, Big 12, Pac-10, Big 10, or SEC, or else a high profile independent school. (But we all know we're really just talking about Notre Dame here, right?) Personally I don't like the setup, so don't complain to me, fans from other conferences. I just have to include it in my calculations.

Now a college football fan who is truly savvy (read: has paid attention to college football for the last decade) will notice that the nine specific teams I listed above have all won national championships recently. In fact these are the nine teams to have won the first nine Bowl Championship Series title games since the system reached its current format that includes all the major conferences (plus a disproportionate amount of Notre Dame of course). I am firmly convinced that once a team wins the BCS title game something happens that permanently spoils their chances of winning it again. There have been nine BCS championship games, which means eight years in which a previous winner could have won a second BCS title. In those eight seasons a previous winner has played in the BCS championship game six times, so we've had enough occasions to determine whether or not there is some kind of trend. Now think real hard: how many times has a team that won the championship game since the founding of the BCS gone on to win it a second time?

The correct answer is zero. Realizing the 0 for 6 trend (you could even expand it to 7 missed chances at multiple championships, since Florida State appeared in the first BCS title game and lost, went on to win it the next year, then lost it the following year) was my first indicator that something was up. Some of you may have thought of the University of Southern California, who were awarded a national championship by the AP reporters poll and then went on to win the BCS title game the next year. I think this actually demonstrates how this is uniquely tied to the BCS game itself. The year after USC won a championship through the officially sanctioned process of the BCS they joined the ranks of previous winners to lose with a chance to repeat. So let's be clear, I allow for the fact that the AP voters may award a title to any of the nine teams I listed above in a collective fit of pique, as they did with USC. The polls are mercurial creatures anyway, and there's no certainty that the same strange cosmic forces that seem to control the BCS system have any effect on them.


I can even present additional proof that this is a uniquely BCS thing and not just some general cyclical condition in which college football teams only get the stars lined up for a championship once in a very great while. The University of Florida won the BCS championship last year, having won a national championship just two years before the BCS system began. So it's not recent champions who are at a disadvantage; this is a condition unique to those who have won the BCS title game.

I cannot identify what is at work here: possibly the copy-catting by other teams of the techniques used by each successive champion, some kind of laxness in a program once it "arrives", or maybe just a bizarre and previously undetected kink in the BCS system that manages to call up just the right team to crush all of these potential dynasties. Whatever the reason, when those nine teams have already gone 0-fer in all their chances so far, I see no reason to imagine this will be the year that breaks the trend. I can easily imagine any one of the nine previous winners making the title game this season, but I am sure that if they do they will lose to a team that has not won the BCS championship yet.

So to begin my search for teams that by objective and quantifiable standards seem well suited to win the championship, I have to throw out all past BCS winners. Many have already picked one or more of these teams as their favorite to win, but I look at the evidence and have to say I'm more comfortable looking past the usual suspects. This means that I'm also looking past several teams that have some impressive evidence to suggest they are perfectly suited to win a championship.

2.
So what traits am I looking for in my pick to win it all? As I said I am looking for characteristics of a team that I can measure and analyze and that are ideally perfectly objective or less ideally a subjective measure that has been grounded and counterbalanced somehow. I also think we can focus on some factors that have been identified as crucial elements of a championship team that might get overlooked for some other snazzier looking characteristics of a team.

For starters, it has been demonstrated over and over that teams with notable players at skill positions and a tendency to perform impressively on offense are often overrated. They tend to be bested by teams that control the ball well, mostly with a strong running game, and play outstanding defense. We've seen this happen at least three times in the BCS era alone. The Miami Hurricanes lost to the Ohio State Buckeyes in 2003. The USC Trojans lost to the Texas Longhorns in 2006. The Ohio State Buckeyes lost to the Florida Gators in 2007. (These three teams may also have been given too much credit going in because they all had previously won a BCS championship).

Further, we can say that having a highly ranked recruiting class in your immediate past isn't going to improve your odds of winning the title. So I will not be worrying about how many blue chippers or Rivals.com five-star high-schoolers a team has brought in. I will give much more credence to the strength of returning starters than to hot shot recruits.

A few more factors warrant attention as I hunt out prospective champions. Several experts have also pointed out that teams without strong veteran offensive and defensive lines tend to fall short of the hype. I also believe a team has to play a schedule that's perceived to be worthy of a champion to get a shot. Again, Boise State had the best record in the country last season, but the championship went to a team that played the toughest schedule and managed to lose only once. Also I think you have to weigh a team's value against their relative strength compared to their conference. All nine BCS winners also won their conference. Plus, what good does it do you to be the third best team in the country if that means you're only the second best team in your conference? Take, for example, Michigan of 2006.

To sum all that up, I am looking for measurements of a team's rushing offense, defense, offensive and defensive lines, and schedule, relative both to the national setting and to each team's respective conference, while incorporating the presence of returning starters. If possible, I want them all to fall into the same general metric so I can easily compare and combine the various values that I manage to concoct. It is a tall order, but I think I can do it. After all, I'm the one who set this task out for myself, so I would look quite the fool if I couldn't complete it.

3.
I started by basing all my measurements on percentiles, as in what percent of a given population a given team is better than in a given element of football. This works well because I can easily use it at both the conference and national levels. It maintains a level of absolute value in that all high performing teams will have a recognizably large number and all low performing teams will have a noticeably small number (unlike with some ordinals, where you can guess a 7 is good and a 2 is bad in one measurement, but how does that compare to a 12 and an 81 in another?). Plus percentiles bring in a relative factor a straight up metric wouldn't, by showing us everything set to an easily comprehensible 0 to 100 scale.
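For anyone who wants to see the rank-to-percentile conversion spelled out, here is a minimal sketch. The function name and the exact convention are my assumptions for illustration: it counts the share of the *other* teams in the population that a team finished ahead of, so the best team scores 100 and the worst scores 0. The spreadsheets may use a slightly different convention.

```python
def rank_to_percentile(rank: int, population: int) -> float:
    """Percent of the other teams that a team ranked `rank` (1 = best) finished ahead of.

    Assumed convention: the best team gets 100.0 and the worst gets 0.0.
    """
    if population < 2:
        raise ValueError("need at least two teams to compare")
    if not 1 <= rank <= population:
        raise ValueError("rank must be between 1 and population")
    teams_behind = population - rank
    return 100.0 * teams_behind / (population - 1)


# The same function works for a 12-team conference or a 120-team national field.
conference_score = rank_to_percentile(3, 12)
national_score = rank_to_percentile(30, 120)
```

Because both results land on the same 0 to 100 scale, a conference ranking and a national ranking can be compared or combined directly.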

As for the metrics themselves I have split them up into six categories, rushing on a national scale, rushing relative to conference, aggregate strength of the lines on both sides of the ball, defense on a national scale, defense relative to conference, and strength of schedule. I will admit that to make this work I did have to include subjective figures in with objective stats but I tried to balance that out. Let's take them one at a time.

For rushing on a national scale, called National Rushing in my charts, I decided to credit a team for how much of their rushing success they were returning from last year. This meant I took their rushing rank from last year (converted into a percentile of course) and multiplied it by the percentage of offensive starters they were returning. This seems to be a fair measure to me because it doesn’t penalize a team for losing a proven running back to the NFL draft, nor does it give a team a bonus for bringing in some hotshot left tackle recruit. It rewards the system of rushing a team has in place, because a system will outlast losses of players or even coaches and will make even mediocre recruits look like Heisman hopefuls. Thus a team that finished in the 92nd percentile in rushing yards but only returned 6 starters on offense will get as much credit as a team that ended up in the 55th percentile and returned 10 starters. Think of it as a line of credit: the more success a team stored up last year the more credit it gets, and the more assets a team returns the more credit it gets. So the best way for a team to get a high rating was to have run the ball well last year and to have brought back most of their offense, as Texas A&M did.
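The arithmetic behind that comparison can be sketched in a few lines. The function name is made up for illustration, and I am assuming the "percentage of starters returning" is measured out of the 11 starters on offense.

```python
OFFENSIVE_STARTERS = 11  # assumption: returning starters are counted out of 11


def national_rushing(percentile: float, returning_starters: int) -> float:
    """Last year's rushing percentile scaled by the share of offensive starters returning."""
    return percentile * returning_starters / OFFENSIVE_STARTERS


# The two hypothetical teams from the paragraph above.
strong_but_depleted = national_rushing(92, 6)     # roughly 50.2
average_but_experienced = national_rushing(55, 10)  # 50.0
```

The two scores come out nearly identical, which is the point of the line-of-credit idea: last year's production and this year's returning experience can trade off against each other.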

For rushing on the conference scale, Conference Rushing on the charts, I had to include some subjective projected rankings. I took the projected rankings of the running back units for all the teams from both Athlon Sports and Scout.com, averaged them together, and created a percentile ranking for each team within its respective conference. Combining these rankings is meant to limit the flaws to which subjective rankings and projections are prone. So teams that are generally regarded to have the best running attack in a large conference, like Arkansas, are going to receive high scores.
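Here is a rough sketch of how averaging two sets of projected rankings and converting them to a conference percentile might look. The team labels and rankings are placeholders rather than actual Athlon or Scout.com projections, and the tie handling (tied teams do not count as beating each other) is my assumption.

```python
def conference_percentiles(source_a: dict, source_b: dict) -> dict:
    """Average two projected unit rankings (1 = best) and convert to a 0-100 scale.

    Each argument maps team name -> projected rank within the conference.
    A team's score is the percent of other teams with a worse averaged rank.
    """
    averaged = {team: (source_a[team] + source_b[team]) / 2 for team in source_a}
    others = len(averaged) - 1
    scores = {}
    for team, value in averaged.items():
        teams_beaten = sum(1 for other in averaged.values() if other > value)
        scores[team] = 100.0 * teams_beaten / others
    return scores


# Placeholder four-team conference; the numbers are invented for illustration.
first_source = {"Team A": 1, "Team B": 2, "Team C": 3, "Team D": 4}
second_source = {"Team A": 2, "Team B": 1, "Team C": 3, "Team D": 4}
scores = conference_percentiles(first_source, second_source)
```

When the two sources disagree about who is first, as they do here for Team A and Team B, the averaging splits the difference instead of trusting either projection outright.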

Measuring the strength of teams in the trenches is difficult because there aren’t many accurate stats kept in college football for how well offensive and defensive lines play their roles. So this metric, Average Lines on the charts, is again dependent on the subjective projections from Scout.com and Athlon. I merely averaged together the projected unit rankings within the conference from both sources for both the offensive and defensive lines. This meant that to get a high score a team had to be perceived as being among the best on both sides of the trenches within their conference, as Rutgers is.

I measured defense on the national scale, National Defense on the charts, much as I did national rushing. In this case I combined the teams’ percentile scores in both yards and points allowed. Then I multiplied every team’s score by the percentage of defensive starters they returned. Again this is based on my belief that football programs retain a certain base level of defense that survives changes in personnel, coaches, or scheme and can only be elevated as players gain experience. This is why teams with a strong reputation for defense will usually get good scores, and will get great scores when they bring back a lot of their experienced players, such as Virginia Tech this year.

Defense compared to the conference, in the charts as Conference Defense, was evaluated pretty much like rushing compared to the conference. This time I included the subjective projections from Athlon and Scout.com for defensive lines, defensive backs, linebackers, and overall defense for each team within their respective conference. Here a team must be generally judged to have one of the best all around defenses within their conference, as Wisconsin is this season.

Finally, I had to create a metric for the strength of a team’s schedule. This was a little tricky because most schedule ratings depend on a certain number of games having been played to evaluate the relative strength of each team on a team’s schedule. I didn’t want to rely entirely on last year’s results either, since the strength of each team is obviously variable from year to year. So once again I had to rely on some subjective measures, namely all the preseason polls and rankings. Thankfully there are websites that already gather up all these preseason predictions for me. Hooray for the Internet! Is there anything it doesn’t know?

Since these are all just speculation, I didn’t want to put too much emphasis on the rankings of individual teams. Instead I divided all the world of Division I college football into a rough hierarchy of five groups. By playing teams from each of these groups a team can earn points that count toward giving it a better schedule ranking. Of course the points are different for each of the five groups and different if a team is playing them in or out of conference.

At the bottom you have all the teams from the lower football subdivision (generally called D-IAA but now officially titled the Football Championship Subdivision for the usual no good reason). These teams have their own play-offs and aren’t eligible for the national championship in Division IA (or the Football Bowl Subdivision, again no good reason) and are generally on a BCS conference team’s schedule as a way to guarantee a victory. I figure any team that tries to schedule an easy victory will be looked down upon by poll voters, so I attached a value of -1 point to all games against lower subdivision teams (which are by definition out of conference).

Then come the bottom feeders of the upper subdivision, those teams which are generally agreed to be in the bottom half of the 120 teams playing for a shot at the BCS championship. Again teams may schedule them out of conference trying to get an easy win, so no points are awarded for playing teams ranked by Athlon and Scout.com in the bottom 60 out of conference. In conference things can be different, because every team within a conference has more familiarity with a regular opponent’s style and system, and the grind of a conference schedule can lead to upsets more easily, which is why I give one point for playing these teams in conference.

Next are teams that make the top 60 but not the top 30. These are the teams likely to go to a bowl game but not likely to appear in the Top 25 polls. They pose some threat of upset to any team and so are worth two points in conference, but they are worth more out of conference, three points, since it will impress voters to take on a solid but not great team by choice. Similarly, teams that are by consensus in the top 30 going into the season are likely to be ranked for substantial parts of the season and therefore will be very impressive pelts for a team to collect, so I award four points for in conference games against them and five points for out of conference games against these potential Top 25 teams.

Then I have teams considered to be in the top ten by the preseason consensus. Since these are the teams considered most likely to make a BCS bowl and likely contenders for the championship, they are valued most of all, at six points within conference schedules and seven points outside of them. After all, playing one of these teams by choice would really make an impression on the voters, probably more than any other victory could. I think these points provide ratings that make intuitive sense, fairly reward scheduling a tough out of conference slate even with a weak conference schedule, and sufficiently handicap teams that schedule cupcakes out of conference even if they play in a tough league. Overall this means teams from strong conferences that schedule games against big time opponents, as Oregon does this year, are going to get high scores.
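The whole point scheme fits in one small lookup table. The sketch below uses the point values from the paragraphs above; the tier labels, the function name, and the sample schedule are invented for illustration and do not describe any real team's slate.

```python
# Points per opponent, keyed by (tier, in_conference).
# Lower-subdivision opponents are always out of conference, so there is no True entry.
SCHEDULE_POINTS = {
    ("lower_subdivision", False): -1,
    ("bottom_60", True): 1,
    ("bottom_60", False): 0,
    ("top_60", True): 2,
    ("top_60", False): 3,
    ("top_30", True): 4,
    ("top_30", False): 5,
    ("top_10", True): 6,
    ("top_10", False): 7,
}


def schedule_points(games) -> int:
    """Sum the points for an iterable of (tier, in_conference) pairs, one per opponent."""
    return sum(SCHEDULE_POINTS[game] for game in games)


# A made-up 12-game schedule: 8 conference games and 4 out of conference.
sample_schedule = (
    [("top_10", True)] * 2
    + [("top_30", True)] * 2
    + [("top_60", True)] * 2
    + [("bottom_60", True)] * 2
    + [("lower_subdivision", False), ("bottom_60", False),
       ("top_60", False), ("top_30", False)]
)
total = schedule_points(sample_schedule)  # 26 in conference + 7 out of conference = 33
```

Swapping the lower subdivision opponent for a top ten team out of conference would move this sample from 33 points to 41, which shows how heavily the scheme rewards ambitious scheduling.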

After all of those convoluted computations I created a list of twenty teams who showed some potential to win the national championship this season. What's more, I can now present them to you in the order in which my system ranks them. I cannot promise that the teams listed at the top have the best chances to play in the BCS title game (I cannot say for certain how well they each match up with the teams on their respective schedules), but the teams at the top are definitely the ones in which I have the most confidence.

They are:

1) Wisconsin Badgers
2) Georgia Tech Yellow Jackets
3) Rutgers Scarlet Knights
4) Texas A&M Aggies
5) UCLA Bruins
6) Michigan Wolverines
7) Virginia Tech Hokies
8) Louisville Cardinals
9) Oregon State Beavers
10) Arkansas Razorbacks
11) West Virginia Mountaineers
12) Auburn Tigers
13) Nebraska Cornhuskers
14) Iowa Hawkeyes
15) Oregon Ducks
16) Boston College Eagles
17) Georgia Bulldogs
18) California Golden Bears
19) Penn State Nittany Lions
20) Wake Forest Demon Deacons

Since Blogger hates tables, it’s hard for me to share all of my data. However I spent so much time putting it all together and computing all the figures that I would hate to let it go to waste. That’s why I've posted all my spreadsheets here. If anyone cares to review my math, the numbers for the teams I’m analyzing are on Sheet 1 (not that I actually included any of it, but I’ve tried to make my research methods as clear as possible, which is why I prattled on so much above).

4.

After all of these vaguely objective looking computations, the question remains: can you get any sort of insight into who’s going to win the college football national championship based on analysis and logic, or is it all a crapshoot? I really wanted all of this unscholarly research to test whether reasoned picks are any better than arbitrary ones, so maybe I should propose an experiment.

If all of that number crunching from above has any real value, then the pool of teams I selected and measured should be more likely to produce the national champion than a pool of teams arbitrarily selected. In a gambling scenario, we could imagine two hypothetical bettors each interested in placing a futures bet (i.e. wagering on the outcome of some far off event with set pay-out odds like 30-1 or 5-2, much like someone betting on a horse race, except this horse race doesn’t happen for four months and no one really knows how good any of the horses are yet or exactly which of 119 different horses are likely to actually be in a race of 6 to 12 horses in the end). Both of these hypothetical gamblers want to use a big chunk of money they got (how they acquired this money isn’t really relevant to my experiment, so just hypothesize whatever gets you tingly inside) to bet on who will win the national championship in the college FBS (do you even remember what this stands for anymore? *sigh* go ahead, scroll back up the page, I’ll wait). We’ll give them some credit for being rational gamblers and say they will put the same amount of money on every team in their pool rather than chasing bets (a pointless effort to try to cover losses by betting big either when you’re especially confident in your pick or when you can get a potentially high payoff). However they each have their own way of deciding on whom to bet. One uses the same analytical method I described above (clever bloke he is) to pick the 20 teams I’ve studied, while the other bases his picks on whims and passing fancies to get a hodgepodge of 17 teams and a “cover his bases” bet on the Field, as in all the long shot teams on which Vegas didn’t offer individual odds. I actually selected the arbitrary teams from a hodgepodge of past BCS champions and teams with comparable odds to the ones chosen objectively. You can see the teams lined up together on Sheet 2 of my Excel charts, where they are paired up based on having comparable odds to win the championship according to Vegas.

So having staked out 37 of the 60 or so teams with any real hope of playing for the championship and throwing on a pick for the rest of the field outside of the major conferences and independents, it seems likely that one of our two pools will include the eventual national champion. For our hypothetical gamblers that could mean a big payday, something to cover their losses, or just a whole lot of money flushed down the toilet. No matter what, we’ll have some evidence that using careful analysis when placing bets either does or does not increase your odds of winning. I’ve included a chart breaking down the expected payoffs for the different teams. Remember, not all of these teams get a big enough payoff to cover all the bets our gamblers are making. That’s just part of the risk when you make multiple attempts at a futures bet. Before you ask, there would be no point loading up on teams with bigger payoffs or betting more heavily on teams the analysis scored higher. Both of those may seem like good ways to protect against losses, but remember it’s far more likely that money would just be lost too.
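To make the "not every payoff covers all the bets" point concrete, here is a small sketch of the net result for a gambler who puts an equal stake on every team in his pool. The function name and the $10 stake are hypothetical, and the odds strings are just the sample lines mentioned earlier (30-1 and 5-2), not actual Vegas lines for any team.

```python
from fractions import Fraction


def net_result(odds: str, stake: int, bets_placed: int) -> Fraction:
    """Net profit when exactly one ticket wins at `odds` (e.g. "30-1").

    Assumes the same `stake` was placed on each of `bets_placed` selections.
    """
    numerator, denominator = (int(part) for part in odds.split("-"))
    winnings = stake * Fraction(numerator, denominator)  # profit on the winning ticket
    losses = stake * (bets_placed - 1)                   # every other ticket loses its stake
    return winnings - losses


long_shot_wins = net_result("30-1", 10, 20)  # 300 won, 190 lost elsewhere: +110
favorite_wins = net_result("5-2", 10, 20)    # 25 won, 190 lost elsewhere: -165
```

With 20 equal bets the winner has to pay at least 19-1 just to break even, and with 18 bets (17 teams plus the Field) the break-even line is 17-1. A short-priced favorite winning still leaves the gambler in the red.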

There are a lot of ways this could play out and so many ways to break down the betting strategies employed that I’ll have to write an in-depth follow up at the end of the season. So on top of all of my preexisting excitement for the college football season, I now have the added anticipation of how this crazy experiment will work out and a chance to study the results. That may seem a little geeky (okay, it definitely seems very geeky), but all you can expect sports wagering to bring you is some added excitement to the experience of following the games. That is, of course, unless I’m the one extremely fortunate person to crack the code of college football. We’ll just have to wait and see.
