Sunday, June 7, 2009

And Now For Something Completely Similar

There is obviously a big difference between knowing what happened and knowing what will happen. Many of us cannot even agree on what did happen (the well-publicized problems with the BCS, for example), so how can we expect to know what will happen? However, knowing our limitations and the limitations of numbers can sometimes help us get closer to predicting the future. There is a very good book called The Wisdom of Crowds by James Surowiecki, which describes how large groups of people containing mostly non-experts and a few experts are very good at making predictions. He gives countless examples, including financial markets, the Iowa Electronic Markets (an extremely accurate predictor of political elections, well in advance), and even a county fair game of guessing the weight of an ox.

I should digress for a short moment to give the basic theory behind the book for those who do not take the time to read the linked articles. The idea is fundamentally simple: in large groups of people with good and bad information, the bad information cancels out the other bad information, leaving only the good information. The guessing game example probably illustrates the point best. Imagine going to a county fair where there is a competition to guess the weight of an ox. At this fair, besides you, are other members of your community: lawyers, doctors, janitors, teachers, bus drivers, etc. But there are also butchers, farmers, chefs, and food wholesalers. The first group represents the non-experts. Though they may have very keen eyes, they probably don't know exactly what to look for when judging the weight of an ox. Some will guess much too low and some will guess much too high. Some will guess a little low and some will guess a little high. However, there is no reason to assume that their errors will be systematic. Quite simply, there is no reason to believe that the errors will pile up on the low end or the high end. In a group of large enough size, the bad low guesses and the bad high guesses should cancel each other out. Then there are the experts. They too will guess lower and higher than the actual weight, and some may guess almost exactly. Again, though, there is no reason to assume their errors will be mostly high or mostly low. In the end, you have a large number of low guesses and a large number of high guesses, which when averaged should be very near the actual weight of the ox. This example is based on a real occurrence at a 1906 country fair attended by the statistician Francis Galton, where the average of all the guesses was 1 pound from the actual weight, whereas no individual guess was nearly as close.
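To see the cancellation argument in action, here is a quick toy simulation. The spreads of the guesses are made-up numbers; the only real figure is the ox's weight of 1,198 pounds from Galton's account.

```python
# A toy version of the ox contest: mostly noisy non-experts plus a few
# sharper experts, all unbiased. The error spreads are made-up numbers.
import random

TRUE_WEIGHT = 1198  # pounds, the actual weight in Galton's account
random.seed(1906)

guesses = [random.gauss(TRUE_WEIGHT, 150) for _ in range(800)]  # non-experts
guesses += [random.gauss(TRUE_WEIGHT, 40) for _ in range(50)]   # experts

print(sum(guesses) / len(guesses))  # typically within a few pounds of 1198
```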

With this concept in mind, I pursued what I thought was the best way of predicting what will happen in the upcoming season. One more digression: I should mention that I used offensive and defensive efficiencies from the Relative Ratings, as opposed to adjusted pts/game, for predicting 2009. I theorized that although a team's adjusted points for and against may fluctuate, how those points relate to the nation should remain more consistent year to year. For instance, say Team A had an adjusted points for of 40 pts/game in 2008 and 30 pts/game in 2007. The percent difference is 33%, which is pretty large. But let's say the national average adjusted points for was 20 pts/game in 2008 and 17.5 pts/game in 2007. That means Team A had an adjusted offensive efficiency of 2.00 (40 / 20) in 2008 and 1.71 (30 / 17.5) in 2007. This represents a 17% difference from 2007 to 2008, half as much as when using adjusted pts/game. So although they "scored" 10 pts/game more, they weren't as drastically more efficient.
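Here is that arithmetic as a tiny snippet, using the numbers from the Team A example:

```python
# Team A's example from above: raw adjusted pts/game swings 33%,
# but the nation-normalized efficiency swings only ~17%.

def efficiency(adj_ppg: float, national_avg_ppg: float) -> float:
    """Adjusted efficiency: adjusted pts/game relative to the national average."""
    return adj_ppg / national_avg_ppg

eff_2008 = efficiency(40.0, 20.0)   # 2.00
eff_2007 = efficiency(30.0, 17.5)   # ~1.71

print((40.0 - 30.0) / 30.0)              # 0.33 -> 33% raw swing
print((eff_2008 - eff_2007) / eff_2007)  # ~0.17 -> 17% efficiency swing
```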

Ok. Great. So what? Using the last 10 years (the BCS years), I went looking for trends. How did teams' efficiencies change from year to year? Were there three/four/five-year cycles, gradual improvements or declines, so-called unpredictable ascensions and collapses? Remembering The Wisdom of Crowds, I decided to build projections with all of these possibilities in mind. I used regression and my own analytical ideas to calculate 11 separate offensive and defensive efficiencies for each team. Some have names like "Payback" or "Course Correction"; others have more boring names like "ADV" and "5YRAVG". Each method tries to model a different aspect of how a team may change year to year depending on what happened in the past. Most importantly, each method has its own strengths and weaknesses; some of the predictions for 2009 will be too high, and some will be too low.

I then figured out which methods were best, and when, and assigned probabilities to each method being selected. This is kind of like asking, "what is the population of experts and non-experts in my sample?" So let's say "ADV" was the best predictor of a team's efficiencies about 10% of the time in the past. Then, 10% of the time, any given team would have the "ADV" 2009 efficiency assigned to it.
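In code, the selection step looks something like this sketch. Only the "ADV" weight comes from the example above; the other weights (and the seven methods not shown) are hypothetical placeholders.

```python
# Picking one of the 11 projection methods for a team, weighted by how
# often each method was historically the best predictor.
import random

method_weights = {
    "ADV": 0.10,                # best predictor ~10% of the time (from the text)
    "5YRAVG": 0.12,             # hypothetical weight
    "Payback": 0.08,            # hypothetical weight
    "Course Correction": 0.07,  # hypothetical weight
    # ...the remaining seven methods would fill out the rest
}

methods = list(method_weights)
weights = list(method_weights.values())
chosen = random.choices(methods, weights=weights, k=1)[0]
print(chosen)  # this method's 2009 efficiency gets assigned to the team
```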

I then wrote a fairly simple macro in Excel that calculates a random number for every team, for both offense and defense. It then assigns one of the 11 offensive or defensive efficiencies based on that random number (and on the previously calculated probabilities). Every game in the season is then simulated and the wins/losses/points tabulated. The process repeats until the requested number of simulations is completed. There is also some randomness built into each individual game, which prevents a team like Florida from beating a team like Troy in 100% of simulations. After all, unthinkable upsets sometimes happen (Appalachian State vs. Michigan comes to mind).
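For the curious, here is a minimal Python sketch of that loop. The real thing is an Excel macro, and the schedule and efficiency inputs here are hypothetical placeholders; the win probability uses the Log5 formula and the 2.8 exponent noted in the next paragraph.

```python
# Monte Carlo season simulation: draw one of each team's 11 candidate
# efficiencies, convert to a rating, and play out every game with
# per-game randomness so upsets remain possible.
import random
from collections import defaultdict

EXP = 2.8  # the fitted Pythagorean exponent for college football

def log5(ra: float, rb: float) -> float:
    """Bill James' Log5 probability that team A beats team B."""
    return (ra - ra * rb) / (ra + rb - 2 * ra * rb)

def simulate(schedule, off, dfn, n_sims=30_000):
    """schedule: list of (team_a, team_b) games; off/dfn: team -> list of
    11 candidate efficiencies. (In the real model the draw is weighted by
    each method's historical accuracy, per the previous paragraph.)"""
    total_wins = defaultdict(int)
    for _ in range(n_sims):
        o = {t: random.choice(effs) for t, effs in off.items()}
        d = {t: random.choice(effs) for t, effs in dfn.items()}
        r = {t: o[t] ** EXP / (o[t] ** EXP + d[t] ** EXP) for t in o}
        for a, b in schedule:
            winner = a if random.random() < log5(r[a], r[b]) else b
            total_wins[winner] += 1
    return {t: w / n_sims for t, w in total_wins.items()}  # avg wins per sim
```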

Hopefully, if you've simulated enough times, you are left with what amounts to the truth. As The Wisdom of Crowds argues, whether you have experts (good predictions) or non-experts (bad predictions) in a group, given enough people (simulations) the bad will cancel each other out and the good will be left. I have included the results of 30,000 simulations below. Why 30,000? Because there are 121 rows in the table (120 FBS teams plus the I-AA group described below), and each team has 11 possible offensive ratings and 11 possible defensive ratings, or 121 combinations, making 121 x 121 = 14,641 team/combination possibilities. I figured if you double that number you should get a good representation of all the possibilities, and I then rounded to the nearest 10,000 because I like round numbers. A note about the table below: Log5 Wins uses the Log5 formula and each team's Relative Rating (reminder: RR = Off^2.8 / (Off^2.8 + Def^2.8)) to get a probability of each team winning, while ABS (Absolute) Wins uses the simulated points scored by each team to declare a winner. You will also notice a team called DIV I-AA GROUP. This represents all teams in what is now called the Football Championship Subdivision, or FCS for short. I don't like having to do this, but as of yet I haven't been able to come up with a good way to properly value the skill level of an FCS opponent on an individual basis. Interestingly, I have them winning 7 games in 2009, which is their average over the last 10 seasons. Is that evidence my methods are somewhat reasonable? I don't know. Onto the numbers:

TEAM  2008 WINS  AVG LOG5 WINS  AVG ABS WINS  MAX ABS WINS  % OF SIMS
AIR FORCE 8 7.6 7.5 12 0.82
AKRON 5 6.3 6.1 12 0.22
ALABAMA 12 9.7 9.6 12 10.83
ARIZONA 7 6.8 6.8 12 0.27
ARIZONA ST 5 6.7 6.7 12 0.27
ARKANSAS 5 5.7 5.8 12 0.11
ARKANSAS ST 6 6.0 5.9 12 0.09
ARMY 3 4.5 4.6 12 0.03
AUBURN 5 6.9 6.9 12 0.72
BALL ST 12 8.7 8.5 12 3.88
BAYLOR 4 4.6 4.7 12 0.03
BOISE ST 12 10.9 10.9 13 17.25
BOSTON COLLEGE 9 8.4 8.1 12 2.19
BOWLING GREEN 6 5.8 5.6 12 0.19
BUFFALO 8 6.3 6.1 12 0.35
BYU 10 8.0 8.1 12 1.83
CALIFORNIA 8 7.5 7.3 12 0.96
CENTRAL FLORIDA 4 6.2 6.2 12 0.23
CENTRAL MICHIGAN 8 6.0 6.1 12 0.17
CINCINNATI 11 7.2 7.2 12 0.76
CLEMSON 6 8.1 7.5 12 1.49
COLORADO 5 4.9 4.9 12 0.05
COLORADO ST 6 5.4 5.6 12 0.07
CONNECTICUT 7 7.0 7.0 12 0.49
DIV I-AA GROUP 2 6.4 7.0 47 0.00
DUKE 4 4.5 4.7 12 0.01
EAST CAROLINA 9 6.6 6.5 12 0.25
EASTERN MICHIGAN 3 4.4 4.6 12 0.02
FLORIDA 12 10.4 10.3 12 22.61
FLORIDA ATLANTIC 6 5.8 5.7 12 0.24
FLORIDA INTL 5 3.6 3.8 12 0.01
FLORIDA ST 7 6.8 6.7 12 0.58
FRESNO ST 7 7.0 7.2 13 0.22
GEORGIA 9 7.5 7.1 12 1.33
GEORGIA TECH 8 7.2 7.2 12 0.89
HAWAII 7 8.1 7.9 13 0.72
HOUSTON 7 7.5 7.3 12 0.86
IDAHO 2 3.0 3.4 12 0.00
ILLINOIS 5 6.5 6.3 13 0.10
INDIANA 3 3.9 4.3 12 0.00
IOWA 8 8.0 7.8 12 1.35
IOWA ST 2 4.6 4.8 12 0.03
KANSAS 7 7.1 6.8 12 0.65
KANSAS ST 5 6.3 6.3 12 0.17
KENT ST 4 5.0 5.4 12 0.08
KENTUCKY 6 6.1 6.1 12 0.18
LA LAFAYETTE 6 5.6 5.8 12 0.12
LA MONROE 4 6.0 6.2 13 0.04
LOUISIANA TECH 7 4.7 4.6 12 0.00
LOUISVILLE 5 5.9 6.2 12 0.31
LSU 7 8.3 8.3 12 2.18
MARSHALL 4 4.8 5.0 12 0.05
MARYLAND 7 5.7 5.9 12 0.24
MEMPHIS 6 5.6 5.8 12 0.10
MIAMI FL 7 6.2 6.3 12 0.33
MIAMI OH 2 3.8 4.1 12 0.02
MICHIGAN 3 6.5 6.4 12 0.24
MICHIGAN ST 9 7.2 6.8 12 0.46
MIDDLE TENN ST 5 5.1 4.9 10 0.38
MINNESOTA 7 5.2 5.4 12 0.09
MISSISSIPPI 8 7.4 7.3 12 0.83
MISSISSIPPI ST 4 3.5 3.6 12 0.01
MISSOURI 9 8.5 8.3 12 3.02
NAVY 8 8.5 8.4 14 0.45
NEBRASKA 8 7.2 7.2 12 0.73
NEVADA 7 7.4 7.1 12 0.71
NEW MEXICO 4 5.5 5.5 12 0.17
NEW MEXICO ST 3 4.3 4.6 13 0.00
NORTH CAROLINA 8 6.4 6.5 12 0.26
NORTH CAROLINA ST 6 5.2 5.6 12 0.06
NORTH TEXAS 1 2.8 3.1 11 0.01
NORTHERN ILLINOIS 6 7.2 7.1 12 0.86
NORTHWESTERN 9 6.7 6.6 12 0.32
NOTRE DAME 6 5.9 5.9 12 0.44
OHIO 4 5.4 5.5 12 0.15
OHIO ST 10 9.8 9.9 12 12.85
OKLAHOMA 12 9.2 8.9 12 7.21
OKLAHOMA ST 9 7.1 7.1 12 0.77
OREGON 9 7.8 7.4 12 1.10
OREGON ST 8 7.6 7.4 12 0.75
PENN ST 11 10.2 10.2 12 18.25
PITTSBURGH 9 7.5 7.3 12 1.00
PURDUE 4 5.5 5.6 12 0.16
RICE 9 5.3 5.5 12 0.11
RUTGERS 7 8.9 8.6 12 5.00
SAN DIEGO ST 2 4.1 4.1 12 0.01
SAN JOSE ST 6 4.7 4.7 12 0.01
SMU 1 4.5 4.8 12 0.01
SOUTH CAROLINA 7 6.5 6.4 12 0.31
SOUTH FLORIDA 7 7.5 7.6 12 1.18
SOUTHERN MISS 6 7.5 7.4 12 0.83
STANFORD 5 5.2 5.4 12 0.05
SYRACUSE 3 3.2 3.3 11 0.03
TCU 10 9.8 9.8 12 16.74
TEMPLE 5 4.8 4.9 12 0.02
TENNESSEE 5 6.6 6.5 12 0.43
TEXAS 11 9.9 9.8 12 12.72
TEXAS A&M 4 5.5 5.5 12 0.11
TEXAS TECH 10 8.2 7.9 12 1.47
TOLEDO 3 4.6 4.8 12 0.04
TROY 8 7.2 7.0 11 1.48
TULANE 2 3.1 3.3 12 0.00
TULSA 10 7.4 7.1 12 1.10
UAB 4 3.9 4.4 12 0.03
UCLA 4 4.9 5.1 12 0.05
UNLV 5 4.6 4.7 12 0.02
USC 11 10.2 10.4 12 26.00
UTAH 12 8.4 8.3 12 2.88
UTAH ST 3 4.3 4.2 12 0.01
UTEP 5 6.2 6.1 12 0.21
VANDERBILT 6 5.6 5.8 12 0.10
VIRGINIA 5 5.5 5.8 12 0.13
VIRGINIA TECH 9 8.7 8.3 12 3.66
WAKE FOREST 7 6.9 6.9 12 0.57
WASHINGTON 0 3.7 3.7 12 0.01
WASHINGTON ST 2 2.7 2.9 11 0.02
WEST VIRGINIA 8 8.8 8.5 12 4.33
WESTERN KENTUCKY 1 3.8 3.8 10 0.25
WESTERN MICHIGAN 9 7.0 6.8 12 0.61
WISCONSIN 7 7.0 6.6 12 0.65
WYOMING 4 3.6 3.7 12 0.01

Saturday, June 6, 2009

2008 SAWP and Relative Ratings

I thought I should probably post the ratings from 2008. One important detail: these are the ratings going into the bowl season, and therefore they do not include any bowl results. This is to show which teams my methods thought were the best after the regular season (including conference championships). Remember, SAWP only counts wins and losses, and RR only cares about points for and against.

First, the SAWP Ratings:

RNK TEAM RATING
1 OKLAHOMA 0.7173
2 TEXAS 0.7125
3 FLORIDA 0.6992
4 UTAH 0.6852
5 BOISE ST 0.6742
6 ALABAMA 0.6636
7 USC 0.6634
8 TEXAS TECH 0.6625
9 PENN ST 0.6510
10 OHIO ST 0.6481
11 PITTSBURGH 0.6281
12 CINCINNATI 0.6262
13 BALL ST 0.6229
14 TCU 0.6186
15 GEORGIA 0.6106
16 MICHIGAN ST 0.6074
17 GEORGIA TECH 0.5926
18 OKLAHOMA ST 0.5925
19 OREGON ST 0.5907
20 VIRGINIA TECH 0.5874
21 BYU 0.5838
22 BOSTON COLLEGE 0.5808
23 OREGON 0.5790
24 NORTH CAROLINA 0.5759
25 NEBRASKA 0.5742

Here are the Relative Ratings:

RNK TEAM RATING
1 USC 0.8278
2 FLORIDA 0.8197
3 PENN ST 0.8139
4 TCU 0.8079
5 TEXAS 0.8060
6 ALABAMA 0.8059
7 OKLAHOMA 0.8036
8 BOISE ST 0.8020
9 OHIO ST 0.7968
10 IOWA 0.7934
11 MISSOURI 0.7616
12 MISSISSIPPI 0.7551
13 UTAH 0.7462
14 TEXAS TECH 0.7439
15 FLORIDA ST 0.7355
16 OKLAHOMA ST 0.7324
17 OREGON ST 0.7320
18 OREGON 0.7270
19 CALIFORNIA 0.7255
20 BALL ST 0.7228
21 RUTGERS 0.7221
22 CLEMSON 0.7093
23 BOSTON COLLEGE 0.7031
24 ARIZONA 0.6937
25 PITTSBURGH 0.6933

Introduction to the Sportstatician

Hello Everyone (Anyone?),

I should start with a disclaimer. I am neither employed in the world of sports nor officially a statistician. Though at times in my profession I must perform many of the same tasks as a statistician, that is not my official job title. I am more properly labeled a research analyst (I have an engineering background), which involves thoroughly analyzing huge amounts of data for trends and oftentimes performing statistical testing on those numbers. It is an important distinction (I think), and in the interest of full disclosure, something I should mention. I bring this up only because I do not intend to mislead anyone with the blog title of "The Sportstatician." I thought it was clever, so I chose it. With all of that stated at the beginning, I'll move on.

Having long been interested in both math and sports, a few years ago I decided to combine the two into what has become my favorite hobby. Using pretty simple mathematical concepts, I developed my own rating system to rank college basketball and football (college and pro) teams. I will not claim to have invented this method, as it is quite simple and has most likely been in use by someone, somewhere for a long time. I know Ken Pomeroy uses quite a similar method in his rankings of college basketball teams, and although I came upon this method on my own, I will claim neither novelty nor genius. What I have found, though, is that the method is quite powerful at properly evaluating what has happened and what will (read: should) happen.

So how do I calculate the ratings? Like Pomeroy, my system assigns both an offensive and a defensive rating to each team by adjusting the number of points scored and points allowed to reflect strength of schedule. This is done in two main steps. First, a team's points scored in each game are multiplied by the national average points scored and then divided by that opponent's average points allowed. The same method is used for points allowed, but the denominator is instead the opponent's average points scored. Every team now has an adjusted points scored and points allowed for every game. The second step repeats the first, but uses the adjusted averages in place of the raw averages. This is almost exactly Pomeroy's approach, except he adds in possessions, which for basketball makes a huge (and beneficial) difference. As I said earlier, I do not claim this method is new, just that it works.
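As a rough sketch of those two steps in Python (the game-record structure here is a hypothetical placeholder; each game appears once per team):

```python
# One adjustment pass: scale points scored by the national average over
# the opponent's average points allowed. Running the pass a second time
# on the adjusted output is the "second step" described above.
from collections import defaultdict

def adjust_offense(games):
    """games: list of (team, opponent, points_scored), one record per team
    per game. Returns the same records with adjusted points scored."""
    allowed = defaultdict(list)  # points each team has given up
    all_points = []
    for team, opp, pts in games:
        allowed[opp].append(pts)  # points scored on `opp` = points `opp` allowed
        all_points.append(pts)

    national_avg = sum(all_points) / len(all_points)
    avg_allowed = {t: sum(p) / len(p) for t, p in allowed.items()}

    return [(team, opp, pts * national_avg / avg_allowed[opp])
            for team, opp, pts in games]

# Hypothetical two-game sample, each game listed once per team.
raw_games = [
    ("A", "B", 28), ("B", "A", 17),
    ("A", "C", 35), ("C", "A", 21),
]

# Step 1 on raw points, step 2 on the adjusted points; points allowed
# get the mirror-image treatment using opponents' points scored.
step1 = adjust_offense(raw_games)
step2 = adjust_offense(step1)
print(step2)
```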

Now every team has an adjusted average points scored and points allowed. To get each team's rating, I have used Bill James' Log5 and Pythagorean formulae along with the 2008 schedule to find the best exponent, which for college football is approximately 2.8. I have labeled this the Relative Rating, due to a lack of imagination. There is a week-adjusted rating (where more recent games count more towards the rating) and a non-week-adjusted rating. The week-adjusted version is better for predictive purposes and for correlating to the BCS (more on that in a minute), but I prefer the non-week-adjusted version because it's less biased.
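In code, the rating itself is essentially one line (the inputs below are hypothetical adjusted averages):

```python
# Relative Rating from the adjusted averages, Pythagorean-style with the
# fitted exponent of 2.8.

def relative_rating(adj_pf: float, adj_pa: float, exp: float = 2.8) -> float:
    """RR = PF^exp / (PF^exp + PA^exp)."""
    return adj_pf ** exp / (adj_pf ** exp + adj_pa ** exp)

print(relative_rating(31.0, 18.5))  # a strong team lands well above 0.5
```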

In 2008, USC and Florida had the highest Relative Ratings after the regular season and therefore were determined by the system to be the two best teams. I liked this result, because that was how I subjectively felt, even though in the end the voters and computers disagreed.

Ahh, those BCS computers; they don't care at all about points scored and points allowed. I understand the reasoning behind the rule - evidently the powers that be believe teams would run up the score even more than they already do (paging Urban Meyer) - but because wins and losses are absolutes, they aren't necessarily good indicators of what actually happened on the field. Surely the voters are influenced by margin of victory, yet the computers are prohibited from considering it. Alas, that is the way things are done, so in another stroke of unimagination, I developed my SAWP (Schedule Adjusted Win Percentage) Rating.

The SAWP rating method also uses simple mathematics, plus the power of Excel's Solver utility. Similar to the Relative Rating, there are only a few main steps to calculating SAWP. First, three variables are assigned: WP (win points), LP (loss points), and Z (denominator). WP is the value the computer assigns to a win, LP is likewise the value the computer assigns to a loss, and Z is the value used in the denominator of the following equations:

Winning Team Points (WTP) = WP / (Z - Opponent's Win%)
Losing Team Points (LTP) = LP / (Z - Opponent's Win%)

At the beginning, I assign WP = 1, LP = 0, and Z = 1.5, but it really does not matter what they start out at because the computer will solve for the optimal values.

A team's WTP and LTP values are then averaged. This average is then compared to the team's actual win percentage using the squared error. I also impose the requirements that WP + LP = 1 and that Z = WP + 1. The computer then attempts to minimize the total error by changing the values of WP, LP, and Z. The result is the SAWP Rating; in 2008, Oklahoma and Texas were the top-ranked teams, which is also what the BCS computers concluded.
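For those without Excel, here is a sketch of the same fit using scipy in place of Solver. The game data is a hypothetical placeholder, and note that the two constraints reduce the problem to a single free parameter.

```python
# Fit WP (and therefore LP = 1 - WP and Z = WP + 1) by minimizing the
# squared error between each team's average game points and its actual
# win percentage. A scipy stand-in for the Excel Solver step above.
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical inputs: (won?, opponent's win%) per game, plus actual win%.
results = {
    "TEAM A": [(True, 0.583), (False, 0.667), (True, 0.417)],
    "TEAM B": [(True, 0.417), (True, 0.583), (False, 0.750)],
}
actual_wp = {"TEAM A": 2 / 3, "TEAM B": 2 / 3}

def sse(wp: float) -> float:
    lp, z = 1.0 - wp, wp + 1.0  # the WP + LP = 1 and Z = WP + 1 constraints
    err = 0.0
    for team, games in results.items():
        pts = [(wp if won else lp) / (z - opp_wp) for won, opp_wp in games]
        err += (np.mean(pts) - actual_wp[team]) ** 2
    return err

best = minimize_scalar(sse, bounds=(0.5, 1.0), method="bounded")
print(best.x, 1 - best.x, best.x + 1)  # fitted WP, LP, Z
```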

Using the historical results of the final BCS rankings, the computer calculated a composite rating of RR and SAWP to determine which two teams would make the BCS championship. The formula works for every year of the BCS championship (1998 and on) except 2008, where it felt Florida should have played Texas. Nothing's perfect.