Strangely enough, I find myself writing again about Bill Simmons. I found his latest article interesting and well thought out, with his conclusions generally supported by his arguments. So why am I writing? Simmons did a great job breaking down film and the problems with the type of statistics used. I took issue with the fact that he concludes this “proves” the lack of predictive power of statistics, when I thought he should have concluded that he had used statistical and observational analysis correctly. Simmons missed a golden opportunity to show readers how to synthesize statistics and small-sample observations.
The setup: Week 10, Patriots at the Colts, with the Patriots leading 34-28. The Patriots had the ball on their own 28-yard line with 2:03 left to play, and it was 4th-and-2. Belichick decided to go for the first down rather than punt. There might have been some issue with the ball being spotted in the wrong place, but essentially, the Colts stopped the Patriots. Turnover on downs. The Colts scored on their series, after dragging out the clock, and won the game by a point.
First, Simmons does what I like sports writers to do: combine on-the-field observation with the context of what one usually sees from football teams in the aggregate (i.e. some group analysis, which usually does mean statistical analysis). I happen to think his argument for punting, on this specific play, is stronger than, for example, Joe Posnanski’s and Gregg Easterbrook’s posts about the statistical analyses that generally supported Belichick’s decision. Simmons’s arguments were stronger because he specifically placed his observations of the game, and of the Patriots’ performance leading up to this last offensive call, in the context of aggregate statistics. True to form, however, he followed this by trashing the statistical analysis, rather than concluding that he had properly evaluated a single performance and identified how the Patriots deviated from the aggregate.
Simmons’s argument is that most stat-heads used the wrong set of probabilities. Posnanski, Easterbrook and Simmons all presented the statistical arguments that the Patriots had a greater chance of winning had they gone for the conversion rather than punting. To be fair, the difference might have been slight; numerically, of course, one probability was higher than the other (Tim Graham of ESPN arrived at a 1.5% difference in win probability). Had Simmons focused on reconciling the statistical assumptions with how Belichick’s play calling lowered the Patriots’ chances of getting the first down, I believe he would have provided a wonderful illustration of how one goes about reconciling statistical/probability estimates with actual events. Unfortunately, Simmons ignores the probability of winning, focuses on the probability of losing, and asserts that punting was unequivocally the correct call.
Simmons had a contrary opinion from Easterbrook and Posnanski on the punting issue, but all three of them found problems with Belichick’s coaching in the last minutes of play preceding the 4th-down conversion attempt. All three pointed out issues with game management (such as 2 timeouts that were called just to make sure the right players were on the field) and with play calling (rushing on first down, passing on the next two downs). That last sequence suggested that the decision to play out the fourth down rather than punt was a spontaneous one. Simmons broke that down nicely, suggesting that rushing on third down would have made more sense if one were in fact planning on a 4th-down conversion. Finally, the actual play on 4th down was atrocious, as the Patriots limited their options drastically by going with an empty backfield. In this formation, there was no running option, and the Colts simply jammed Brady to hurry his throw. As it happens, he connected with Kevin Faulk, but short of the first down.
I don’t think anything here contradicts the aggregate story (such as a greater than even chance of getting 2 yards). The fact is, there was much circumstantial evidence that Belichick might have flubbed the play. After all, there are no guarantees; just because the average play nets 5 yards doesn’t mean the players just stand there, waiting for the refs to spot the ball up field. You need to select a play and then execute it. As the saying goes, that’s why they play the game. The players still need to give their fullest effort.
What one should consider is how Belichick reduced the Patriots’ chance of converting by using a bad strategy. And Simmons actually did this. He noted that this play was essentially a 2-point conversion attempt, as both offense and defense were lined up to attack and defend a short field (i.e. defending the end zone with the line of scrimmage at the 2-yard line). There seemed to have been some confusion between the special teams and the offense, as it wasn’t clear to the players whether they were attempting a punt or not, necessitating a timeout that could have been used later to challenge the Faulk bobble (see Posnanski’s post). Simmons presented some stats showing that 2-point conversions had a lower success rate (on the road; I have issues with Simmons’s selective stat picking, but that piece wasn’t exactly a peer-reviewed article). By his reasoning, it was unreasonable to conclude that the Colts would have rolled back down field to score with under 2 minutes to go, possessing only 1 timeout. (Granted, the Colts did exactly that on their preceding drive. It probably was an aberration and wouldn’t happen again, but a stat here would be nice, comparing how long in distance and time an average NFL drive is.) The Colts also had an inexperienced, young receiver corps, which might have increased the Patriots’ chances of stopping the Colts after a punt.
So, even if the average 4th-down conversion rate is around 60%, the Patriots did not maximize their likelihood of success. Thus the stat-heads, in essence, should have altered the assumptions in their calculations based on the on-field observations from the last couple of minutes of the game. Maybe the Patriots should have punted.
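To make the point about altered assumptions concrete, here is a minimal sketch of the comparison in Python. Only the 60% conversion figure comes from the discussion above; the two Colts scoring probabilities are made-up placeholders, and I am assuming for simplicity that a successful conversion ends the game. None of this is any analyst's actual model.

```python
def win_prob_go(p_convert, p_opp_scores_short):
    """Win probability when going for it.

    Simplifying assumption: a conversion wins the game outright;
    a failure leaves the opponent a short field.
    """
    return p_convert + (1.0 - p_convert) * (1.0 - p_opp_scores_short)


def win_prob_punt(p_opp_scores_long):
    """Win probability when punting: the opponent must drive a long field."""
    return 1.0 - p_opp_scores_long


def break_even_conversion(p_opp_scores_short, p_opp_scores_long):
    """Conversion rate at which going for it and punting are equally good."""
    return 1.0 - p_opp_scores_long / p_opp_scores_short


# Hypothetical placeholder inputs, NOT real NFL numbers.
P_SHORT = 0.50  # chance the opponent scores after a failed conversion
P_LONG = 0.30   # chance the opponent scores after a punt

if __name__ == "__main__":
    print("punt:", win_prob_punt(P_LONG))
    print("break-even conversion rate:", break_even_conversion(P_SHORT, P_LONG))
    # 0.60 is the league-average figure quoted above; the lower values
    # stand in for a conversion chance reduced by poor play calling.
    for p_convert in (0.60, 0.50, 0.40, 0.30):
        print(p_convert, "go:", win_prob_go(p_convert, P_SHORT))
```

With these placeholder inputs, going for it only beats punting while the conversion chance stays above the break-even rate. That is the sense in which on-field observations should change the inputs rather than discredit the calculation.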
There are some arguments against punting. Easterbrook focused on the specific offense/defense matchups as determined by this particular game. He wrote that, on the previous possession, the Colts drove 79 yards in 1:40, without a timeout, for a touchdown. He also noted that, to his eyes, the Patriots defense seemed a step behind the Colts offense. Also, the Patriots were playing against a weak secondary. As it happened, Brady and company rolled up 370 yards on the night. It seemed like they should have had a greater than league-average chance of converting the 4th down. They might have had a slightly lower than league-average chance of defending ~70 yards, had they punted, as they had just shown they could give up a long drive (although Simmons pointed out that the Patriots stopped the Colts in 5 of the last 7 defensive series in that game).
Again, the two arguments are whether the Patriots could stop Manning with under 2 minutes left and whether Brady plus Faulk, Welker, and Moss could gain 2 yards. On the field, there are probably enough game-related distractions and observations for Belichick. As Posnanski said, there might have been a lot going on in Belichick’s mind. It might have taken him until the last second to come to some conclusion about what to do on that fourth down. He probably did know, in general terms, the arguments above, but they might not have led to a clear-cut answer. He might have just decided that there was a very good chance his QB would find a way to get the 2 yards. Although I support Simmons’s argument (and only because I think the win probability is shaded just slightly more towards punting, with Simmons’s modifications taken into account), I’m not sure that punting is a clear answer with so much time left on the clock, against a quarterback like Manning.
I think both the punt and no-punt observational arguments are valid. And the whole point of statistics is to help you weigh these alternatives against some metric (i.e. the league average). Where it actually detracts from the analysis (to the non-statistician’s mind) is when the likelihoods of a positive outcome, for the considered alternatives, are rather similar.
The two points here are that 1) contrary to Simmons’s point that observations are somehow better, observations also led to two contradictory, sound conclusions about the overall strategy, and 2) with the situation as stated, punting was still not a guarantee of a win (punting would have become the better option as the time left to play decreased).
The problem with relying on observations is that we have a tendency to shoehorn these anecdotes into fitting the conclusions that we want to draw. That’s why having some statistics can provide a context for evaluating single-sample observations. You can’t do what Simmons did, which is to say that the aggregate is wrong because of the details in this situation (wrong play selection, or no strategy leading up to a 4th-down conversion attempt), just as you couldn’t argue against the punt if a punt-return touchdown had happened. In the aggregate, these things are aberrations. Even if Simmons’s arguments for punting were strong, they probably should have moved the outcome only to a greater than 50% win probability, not the 100% win that Simmons thinks. In other words, you can’t just turn a 60% win probability into 100% just because you chose it. In the aggregate, both plays would yield a win more than 50% of the time.
Some other criticisms of Simmons’s piece: not all stats are created equal. Examples of what not to do with stats include Simmons using spurious stats, like how often 3 TDs are scored in the 4th quarter, to bolster his point. But why limit it to the 4th quarter? Why not just look at how often 3 TDs are scored in any quarter? Or why look only at 2-point conversion plays on the road? I know Simmons made a point about how this particular play was set up like one, but the proper comparison is still against all 2-yard attempts or against all 2-point conversion plays. The problem is that he made no attempt to discuss the validity of that particular stat in general before analyzing the breakdowns. In some regards, it might be simpler to prove the general case before the specific one. And certainly it helps to present all the splits, not just the ones that support your case.
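One way to see why narrow splits deserve suspicion is to put an interval around them. The sketch below computes a Wilson score interval for a success rate; the attempt counts are invented purely for illustration and are not Simmons's numbers or real NFL totals.

```python
import math


def wilson_interval(successes, attempts, z=1.96):
    """Approximate 95% Wilson score interval for a success rate."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    p_hat = successes / attempts
    z2 = z * z
    denom = 1.0 + z2 / attempts
    center = (p_hat + z2 / (2.0 * attempts)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / attempts + z2 / (4.0 * attempts ** 2)
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical counts: same observed rate, very different sample sizes.
    samples = (("narrow split", 6, 10), ("all attempts", 600, 1000))
    for label, made, tried in samples:
        low, high = wilson_interval(made, tried)
        print(f"{label}: {made}/{tried}, interval {low:.3f} to {high:.3f}")
```

The same observed rate is far less informative when it comes from a handful of plays, which is why the general case is worth establishing before leaning on a narrow split.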
Part of the issue with probability and statistics is that people do not have the luxury of the long run or multiple trials. We only have this one trial. Which brings us to the asymmetry referred to in the title of this post. Models run one way: one can build them by collecting multiple observations, but it is a mug’s game to apply them to predict a specific event. Something might happen, until it does; the model is probabilistic, but the outcome is binary. That is part of the difficulty in accepting statistical models.
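The asymmetry can be sketched with a small simulation. This assumes, purely for illustration, that each attempt is an independent trial with the roughly 60% success rate mentioned earlier; the function names and the seed are mine.

```python
import random


def simulate_attempts(p_success, n_trials, seed=0):
    """Return a list of binary outcomes (True = converted)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [rng.random() < p_success for _ in range(n_trials)]


def success_rate(outcomes):
    """Fraction of trials that succeeded."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    p = 0.60  # assumed per-attempt conversion probability
    one_game = simulate_attempts(p, 1, seed=1)
    many_games = simulate_attempts(p, 100_000, seed=1)
    # A single trial is either a success or a failure, never "60%".
    print("single attempt converted:", one_game[0])
    # Over many trials the observed rate settles near the model's 60%.
    print("long-run rate:", round(success_rate(many_games), 3))
```

The long-run rate is what the model describes; the single attempt is what the coach and the fans actually get to see.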
I thought that Simmons’s piece indicated that he did not separate the overall strategy from the details of the execution. As he is so fond of arguing, the details cannot be captured by a measure as simple as “conversion”. There are many ways of getting there: is a recovered fumble an ideal way of converting a 4th down? How about a penalty against the defense? Was it a 4th-and-inches grind forward? Was it an 8-yard pass against a weak opponent? Did the coach rest the first-string defense in the fourth quarter, with the game well in hand? However, this was in the context of a Brady plus Welker, Faulk, and Moss offense that had nearly 400 yards on the night. That is a detail that Simmons did not dwell on. The players gave the Patriots a legitimate shot at converting the 4th down. It was the play calling from Belichick that failed the Patriots. I thought it was unfair for Simmons to trash the strategy based on the example of this particular play.
And to spread the criticism a bit, I don’t think it makes sense to never punt, as Easterbrook maintains (though he argues this from an aesthetic perspective). The contribution of that particular play to the overall win probability depends on the situation. It is the coach’s job to identify the most significant factors in terms of the aggregate (i.e. results across the whole NFL) and then apply them to an analysis of how his particular offensive and defensive play calls maximize the actual performance of his players.
Simmons missed a great opportunity to show how a proper analysis should be done. He could have supported the obvious point that, hey, to make the most of that 60% success rate, you need to treat this like a normal play in a scripted series, not like a 2-point conversion. He even said as much; another one of his points is that Belichick did not treat the whole series like a four-down set. Doing so would have enhanced the overall chance of success. Instead, he raised the metaphorical equivalent of the “blogger-in-Mom’s-basement” attack against stat-heads: that they don’t watch the games, and that watching the game would have told you what the correct strategy was. I don’t think that was the case at all, as the contrary view can be derived using Easterbrook’s assumptions.