I recently heard a fun episode of This American Life, called “Kid Politics”. Ira Glass presented three stories about children being forced to make grown-up choices. The second story is an in-studio interview with Dr. Roberta Johnson, geophysicist and Executive Director of the National Earth Science Teachers Association, and Erin Gustafson, a high-school student. The two represented a meeting of minds between a scientist presenting the best evidence for human-induced climate change and a student who, in her words, does not believe in climate change.

It is worth listening to; Ms. Gustafson is certainly articulate, and she is entitled to think what she wants. I simply emphasize that Ms. Gustafson uses language that suggests she is engaged in a defense of beliefs rather than an exploration of scientific ideas.

Ira Glass, near the end of the interview, asks Dr. Johnson to present the best pieces of evidence arguing in favor of anthropogenic climate change. Dr. Johnson speaks of the analysis of ice cores, where carbon dioxide levels can be detected. This can be correlated to evidence of temperature. Ms. Gustafson points out that apparently, in the 1200s, there was a human record of a warm spell – I gathered it was somewhere in Europe, although the precise location and the extent of this unseasonably hot weather were not mentioned – at a time when CO2 levels were low.

Clearly, Ms. Gustafson has shown enough interest in the topic to find some facts or observations to counter a scientific conclusion. She then calls for scientists to show her all evidence, after which she herself will get to decide. I suppose at this point, I’m going to trespass into Kruger-Dunning territory and speak about expertise, evidence, and the use of logic.

In general, I do not think it is a good approach for scientists to simply argue from authority. I admit, this comes from a bias in my interests in writing about science for a lay audience. I focus on the methods and experiment design, rather than the conclusions; my hope is that by doing that, the authority inherent in the concept of “expertise” will be self-evident. That is, I show you (not just tell you) what others (or I) have done in thinking about and investigating a problem. By extension, I hope I have informed myself sufficiently before I prepare some thoughts on the matter, shooting specifically for fresh metaphors and presentation. (As an aside, I suppose that this might be a mug’s game, given the findings of Kruger and Dunning.)

If a scientist has done his or her job, one is left with a set of “facts”. These facts populate any school textbook. But the facts are more than that: with a bit of thought and elaboration, they can act as models. I dislike the distinction people make when they argue that we need to teach kids how to think and not a set of facts. I argue that learning “how to think” depends crucially on how well a student has been taught to deal with facts. These skills include using facts as assumptions in deductive reasoning, weighing whether a fact has solid evidence behind it, and using facts as if they were models.

Here’s my issue with how Ms. Gustafson, and other anti-science proponents (like anti-evolutionists), argue. Let’s say we were told that gas expands upon heating. One might take this as a given and immediately think of consequences. If these consequences are testable, then you’ve just made up an experiment. Inflate a balloon and tie it off. If temperature increases lead to volume increases, one might immerse the balloon in hot water to see if it grows larger. One might choose to examine the basis of thermal expansion of gas, and one will find that the experiments have been well documented since the 1700s (Charles’s Law). A reasonable extrapolation of this fact is that, if heating gas increases its volume, then perhaps cooling gas will lead to a contraction. One might have seen a filled balloon placed in liquid nitrogen (at −196 °C) solidify; it also shrivels up.
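The balloon thought experiment can be put into numbers. What follows is a minimal sketch, assuming the air in the balloon behaves as an ideal gas at constant pressure, so that volume is proportional to absolute temperature (Charles’s Law). The starting temperature of 20 °C and the hot-water bath at 80 °C are assumptions chosen for illustration, and the function name is made up for this example.

```python
def charles_volume(v1, t1_celsius, t2_celsius):
    """Volume of an ideal gas at constant pressure after going from t1 to t2.

    Charles's Law: V1 / T1 = V2 / T2, with temperatures in kelvin.
    """
    t1_kelvin = t1_celsius + 273.15
    t2_kelvin = t2_celsius + 273.15
    if t1_kelvin <= 0 or t2_kelvin <= 0:
        raise ValueError("temperatures must be above absolute zero")
    return v1 * t2_kelvin / t1_kelvin


ROOM_TEMP_C = 20.0     # assumed starting temperature
HOT_WATER_C = 80.0     # assumed hot-water bath
LIQUID_N2_C = -196.0   # temperature given in the text

# Relative volume of a balloon that starts at 1.0 (arbitrary units).
print("in hot water:      ", round(charles_volume(1.0, ROOM_TEMP_C, HOT_WATER_C), 2))
print("in liquid nitrogen:", round(charles_volume(1.0, ROOM_TEMP_C, LIQUID_N2_C), 2))
```

Under these assumptions the balloon grows by roughly a fifth in the hot water and shrinks to roughly a quarter of its volume in liquid nitrogen. A real balloon shrivels even more than that, since the ideal-gas picture ignores the rubber and the fact that air partly condenses at that temperature; the point is only that one stated fact already yields testable predictions in both directions.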

Depending on how well facts are presented, they can be organized within a coherent framework, as textbooks, scientific reviews, and introductions in peer-reviewed articles already do. My graduate advisor characterized this context fitting as “provenance.” No idea is truly novel; even if one does arrive at an idea through inspiration and no obvious antecedents, it is expected that this idea have a context. It isn’t that the idea has to follow from previous ideas. The point is to draw threads together and, if necessary, make new links to old ideas. The end point is a coherent framework for thinking about the new idea.

Of course, logic and internal consistency are no guarantee of truth; that is why a scientist does the experiment. What hasn’t really been emphasized about science is that it is as much about communication as it is about designing repeatable experiments. Although scientists tend to say, “Show me,” it turns out that they also like a story. It helps make the pill easier to swallow. The most successful scientists write convincingly; the art is choosing the right arguments and precedents to pave the way for the acceptance of empirical results. This is especially important if the results are controversial.

The error Ms. Gustafson makes is that she thinks by refuting one fact, she can refute an entire tapestry of scientific evidence and best models (i.e. “theory”). She points to one instance where carbon dioxide levels do not track with the expected temperature change. But in what context? Is it just the one time out of 1000 such points? I would hazard a guess that the frequency of divergence is probably higher than that, but unless the number of divergences is too high, one might reasonably suppose that the two correlate more often than not. (Causation is a different matter;  correlation is not causation.)

But let us move on from that. I have a more elemental disagreement with Ms. Gustafson’s point. Let’s say that one agrees that carbon dioxide is a greenhouse gas. A simple model is that this gas (like other greenhouse gases such as water vapor, methane, and nitrous oxide) absorbs heat in the form of infrared radiation. Some of this energy is transferred into non-radiative processes. Eventually, light is re-emitted (also as infrared radiation) to bring the greenhouse molecule to a less energetic state. Whereas the incoming infrared light had a distinct direction, radiation by the greenhouse molecule will occur in all directions. Thus some fraction of the light is sent back towards the source while some other light essentially continues on its original path. If infrared light approaches Earth from space, then these gases act as a barrier, reflecting some light back into space. Absorption properties of molecules can be identified in a lab. We can extend these findings to ask, what would happen to infrared heat that is emitted from the surface of the planet?

A reasonable deduction might be that, just as out near the edge of the atmosphere, greenhouse gases near the Earth’s surface also absorb and reflect a fraction of heat. Only this time, the heat remains near the Earth’s surface. One logical question is, how does this heat affect the bulk flow of air through the atmosphere? (An answer is that the heat may be absorbed by water, contributing to the melting of icebergs. Another related answer is that the heat may drive evaporation and increase the kinetic energy of water vapor, providing energy to atmospheric air flows and ultimately to weather patterns.)

For someone who ignores greenhouse gas induced global warming, dismissing the contribution of carbon dioxide isn’t just a simple erasure of a variable in some model. What the global warming denier is really asking is that the known physical properties of carbon dioxide be explained away or modified. Again, the point is that carbon dioxide has measurable properties. To say that it does not contribute in some way to “heat retention” is to ask why the same molecule won’t absorb and re-emit infrared radiation in the atmosphere in the same way that was observed in the lab. In other words, simply eliminating the variable would require us to explain why there are two different sets of physical laws that apply to carbon dioxide. In turn, this would require a lot of work to provide context, or provenance, for the idea.

Yes, one might argue that scientists took a reductionist approach that somehow removed some other effector molecule if they measured carbon dioxide properties using pure samples. Interestingly enough, the composition of the atmosphere is well known. Not only that, one can easily obtain the actual “real-world” sample and measure its ability to absorb unidirectional infrared and radiate in all directions. This isn’t to say that thermodynamics of gases and their effects on the climate of Earth is simple. But it is going to take more than a simplistic type of question, such as to posit that there is some synergistic effect between carbon dioxide and some other greenhouse gas or some as-yet unidentified compound, so that we actually modify the working model physicists and chemists have about absorption and transfer of energy.

If you think that it seems rather pat for a scientist to sit and basically discriminate among all these various counter-arguments, I am sorry to disabuse you of the notion that scientists weigh all facts equally. Ideally, the background of the debaters ought not to matter. Hence, you will get scientists to weigh your criticisms more heavily if you show the context of the idea. The more relevant and cohesive your argument, the more seriously you will be taken. Otherwise, your presentation may do you the disservice of giving the appearance that you are simply guessing. That’s one problem with anti-science claimants: all too often it sounds like they are trying to throw out as many criticisms as possible, hoping that they will get lucky and have one stick.

Take evolution: if one suggests that mankind is not descended from primates, then one is saying that mankind was in fact created de novo. That is fine, in and of itself, but let’s fill out the context. Let’s not focus on the religious texts, but instead consider all the observations we have to explain away.

If we were to go on and try to explain mankind as a special creation, how would we go about explaining mankind’s exceptionalism? Can we even show that we are exceptional? Our physiology is similar to that of other mammals. We even share physical features with other primates. Sure, we have large brains, among the largest brain mass to body mass ratios in the animal kingdom. Yet we differ in about 4% of our genome compared to chimpanzees. Further, at a molecular level, we are hard pressed to find great differences. We simply work the same way as a lot of other creatures. We have the same proteins; despite the obvious differences between man and mouse, even a weak similarity between our proteins means a sequence homology of around 70%. It seems to me that at multiple levels, at a physiological level, at the level of physical appearances, and at a genomic level, we are of the same mettle as other life on Earth. Yes, the fact is that we do differ from these other lifeforms, but it seems more logical to suggest that mankind is one type of life in a continuum of the possible lifeforms that can exist on Earth. It just seems likely that whatever process led to such a variety of creatures, man must also have been “created” by that process.

I hate to harp on this, but a fellow grad student and I had such arguments while we were both doing our thesis work. My friend is a smart guy, but he still makes the same mistake that anti-evolutionists make: assuming that by disproving natural selection, one has therefore provided some support for creationism. We argued about Darwin’s theory and whether it can be properly extended from a microscopic domain. He was willing to concede that evolution occurs at a microbiotic level – for “simple” organisms, evolution makes sense, since fewer genes mean less complexity and therefore changes can be just as likely to be beneficial as deleterious.

I thought the opposite. If an organism is “simpler” – namely because it contains a smaller genome – it is even more crucial for a given organism that a mutation be beneficial. A larger genome, from empirical data, generally contains more variants of a given protein. This in itself reflects the appropriation of existing genes and their products for new functions. One possibility is that an increase in isoforms of a protein also suggests how mutations can occur without the organism suffering ill effects directly: there is a redundancy of protein and function. Also, my friend seems to regard fitness as a “winner takes all” sort of game – as in, the organism lives. I merely saw the “win” as an increase in the probability that the animal will have a chance to mate, not organismal longevity. Sure, this is a just-so story; I think his argument is better than the usual creationist claptrap, but only in the trivial sense that, yes, we need to take care not to over-interpret our data or models and yes, scientific theories – although they are our best models – are temporary in the sense that we can revise them when better evidence comes along.

To go back to the way Ms. Gustafson and my friend argue, it behooves them to explain the exceptional circumstances by which we, or carbon dioxide, can act differently from our best model (i.e. theory) and yet conform to it most of the time.

Thus, despite Ms. Gustafson’s call for “all the evidence”, I somehow was left thinking no amount of evidence will persuade her. Part of the problem is that, like the religious who misapply ideas of meaning found in their bibles to the physical evidence generated by scientists, she misapplies her political views to provide the context through which she views scientific evidence about global warming. She should have used logic to deduce that global climate does not predict local weather, and scientific principles to determine whether global warming is part of a normal cycle for the Earth or is in fact due to circumstances like an increase in greenhouse gases. Instead, she probably thought of global warming in terms of regulations and taxes pushed, generally in the United States, by Democrats. Thus, Ms. Gustafson speaks, in Stephen Jay Gould’s term, from the magisterium of meaning (as defined by her political and religious beliefs) and not from the magisterium of science. In this case, she isn’t defending her theory about how the world works; her motivation is to fit the observations to her political and religious ideals.

Can we really separate the political from the scientific? If some scientist argues that there is a problem, it seems difficult to find ways to argue against them. My only suggestion is that Ms. Gustafson and others like her consider their arguments more carefully. Nitpicking specific examples is counter-productive. All theories can be criticized in this way. However, integrating the counter-example is not a straight-forward process, especially if simplistic criticism is at odds with some other firmer, more fundamental observation that even Ms. Gustafson has no problems accepting.



Kate Shaw, over at Ars Technica, reported on a recent study suggesting that there is no intrinsic acoustic property of the “best” violins from the time of Antonio Stradivari and Giuseppe Guarneri “del Gesu” that would naturally attract professional violinists. She does a good job explaining both the methods and findings, and also placing the significance of the research into context.

The upshot of the study is that

… it definitely counters the wisdom that these old, highly valuable violins are unmatched in quality. In many cases, the old and new instruments are equal in quality – in some, the new models are superior to their “golden age” counterparts.


An important point is that this was a double-blind study, where the experimenters and the violinists did not know which violin was being assigned when. Violinists did not do better than chance when identifying the so-called “golden age” violin, nor did they necessarily prefer the older models to the new ones.

As an aside, I once attended a performance of Vivaldi’s Four Seasons, played by a string quartet on four such golden age instruments. The thing I remember most from that concert is that a few of the high notes were screeched out. Otherwise, it was a reasonable performance. I am sure we would not have lost anything had we remained ignorant of the provenance of the violins. With that said, sometimes it is not the quality of the tool but its history that gives it value. The fact remains that human culture possesses this violin that was made over 300 years ago, hand-crafted in the master’s workshop, and played by generations of virtuoso violinists. It is a bit of living history, infused by our hands.



Joe Posnanski has written another thoughtful piece on the divide between writers of a statistical bent and those who prefer the evidence of their eyes.  I highly recommend it; Posnanski distills the arguments into one about stories. Do statistics ruin them? His answer is no. Obviously, one should use statistics to tell other stories, if not necessarily better ones. He approached this by examining how one statistic, “Win Probability Added”, helped him look at certain games with fresh eyes.

My only comment here is that I’ve noticed on his and other sites (such as Dave Berri’s Wages of Wins Journal) that one difficulty in getting non-statisticians to look at numbers is that they tend to desire certainty. What they usually get from statisticians, economists, and scientists are reams of ambiguity. The problem comes not when someone is able to label Michael Jordan as the greatest player of all time*; the problem comes when one is left trying to place merely great players against each other.

* Interestingly enough, it turns out the post I linked to was one where Prof. Dave Berri was defending himself against a misperception. It seems writers such as Matthew Yglesias and King Kaufman had mistaken Prof. Berri’s argument using his Wins Produced and WP48 statistics, thinking that Prof. Berri wrote other players were “more productive” than Jordan. To which Prof. Berri replied, “Did not”, but also gave some nuanced approaches to how one might look at statistics. In summary, Prof. Berri focused on the difference in performance of Jordan above that of his contemporary peers.

The article I linked to about Michael Jordan shows that, when one compares numbers directly, care should be taken to place them into context. For example, Prof. Berri writes that, in the book Wages of Wins, he devoted a chapter to “The Jordan Legend.” At one point, though, he writes that

 in 1995-96 … Jordan produced nearly 25 wins. This lofty total was eclipsed by David Robinson, a center for the San Antonio Spurs who produced 28 victories.

When we examine how many standard deviations each player is above the average at his position, we have evidence that Jordan had the better season. Robinson’s WP48 of 0.449 was 2.6 standard deviations above the average center. Jordan posted a WP48 of 0.386, but given that shooting guards have a relatively small variation in performance, MJ was actually 3.2 standard deviations better than the average player at his position. When we take into account the realities of NBA production, Jordan’s performance at guard is all the more incredible.

If one simply looked at the numbers, it does seem like a conclusive argument that Robinson, having produced more “wins” than Jordan, should be the better player. The nuance comes when Prof. Berri places that into context. Centers, working closer to the basket, ought to have more high-percentage shooting opportunities, rebounds, and blocks. His metric of choice, WP48, takes these into consideration. When one then looks at how well Robinson performed above his proper comparison group (i.e. other centers), we see that Robinson’s exceptional performance is something one should expect when comparing against other positions but is not beyond the pale when compared to other centers. However, Jordan’s performance, when compared to other guards, shows him to be in a league of his own.

That argument was accomplished by taking absolute numbers (generated for all NBA players, for all positions) and placing them into context (comparing to a specific set of averages, such as by position.)
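The arithmetic behind “standard deviations above the average” is simple enough to sketch. The WP48 values and the 2.6 and 3.2 figures come from the quoted passage; the positional average of 0.100 is my assumption for illustration (the quote does not give the averages), and the function names are made up for this example.

```python
def z_score(value, mean, sd):
    """How many standard deviations a value sits above (or below) the mean."""
    return (value - mean) / sd


def implied_sd(value, mean, z):
    """Spread of the comparison group implied by a value and its z-score."""
    return (value - mean) / z


# ASSUMPTION: the same average WP48 at every position, used only to
# back out the spreads. The quoted passage does not state the averages.
ASSUMED_POSITION_MEAN = 0.100

# From the quote: Robinson 0.449 (2.6 SD above centers),
# Jordan 0.386 (3.2 SD above shooting guards).
center_sd = implied_sd(0.449, ASSUMED_POSITION_MEAN, 2.6)
guard_sd = implied_sd(0.386, ASSUMED_POSITION_MEAN, 3.2)

print("implied spread among centers:", round(center_sd, 3))
print("implied spread among guards: ", round(guard_sd, 3))
print("Robinson z:", round(z_score(0.449, ASSUMED_POSITION_MEAN, center_sd), 1))
print("Jordan z:  ", round(z_score(0.386, ASSUMED_POSITION_MEAN, guard_sd), 1))
```

Under that assumed average, the spread among centers comes out larger than the spread among shooting guards, which is the whole argument: a smaller raw number can sit further from its own group’s average when that group is more tightly bunched.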

This is where logic, math, and intuition can get you. I don’t think most people would have trouble understanding how Prof. Berri constructed his arguments. He tells you where his numbers came from, why there might be issues with them, and where they go against “conventional wisdom”; in this case, the way he structured his analysis resolved the difference (it isn’t always the case that he’ll confirm conventional wisdom – see his discussions of Kobe Bryant).

However, I would like to focus on the fact that Prof. Berri’s difficulties came when his statistics generated larger numbers for players not named Michael Jordan. (I will refer people to a recent post listing a top-50 of NBA players on the Wages of Wins Journal.*)

* May increase blood pressure.

In most people’s minds, that clearly leads to a contradiction: how can this guy, with smaller numbers, be better than the other guy? Another way of putting this is: differences in numbers always matter, and they matter in the way “intuition” tells us.

In this context, it is understandable why people give such significance to 0.300 over 0.298. One is larger than the other, and it’s a round number to boot. Over 500 at-bats, the difference between a .300 hitter and a .298 hitter translates to 1 hit. For most people who work with numbers, such a difference is non-existent. However, if one were to perform “rare-event” screening, such as for cells in the blood stream that were marked with a probe that “lights” up for cancer cells, then a difference of 1 or 2 might matter. In this case, the context is that, over a million cells, one might expect to see, by chance, 5 or so false-positives in a person without cancer. However, in a person with cancer, that number may jump to 8 or 10.
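Both halves of that comparison can be sketched in a few lines. The batting arithmetic follows directly from the numbers above. For the screening example, I am assuming that chance false-positives follow a Poisson distribution with a mean of 5 per million cells; that model is my assumption for illustration, not something established here, and the function names are made up.

```python
import math


def hits(average, at_bats):
    """Number of hits implied by a batting average over a number of at-bats."""
    return round(average * at_bats)


def poisson_tail(threshold, mean):
    """P(X >= threshold) for a Poisson-distributed count with the given mean."""
    below = sum(
        math.exp(-mean) * mean**k / math.factorial(k) for k in range(threshold)
    )
    return 1.0 - below


# Batting: .300 versus .298 over 500 at-bats.
print("hit difference:", hits(0.300, 500) - hits(0.298, 500))

# Screening: ASSUMED Poisson background of 5 false-positives per million cells.
BACKGROUND_MEAN = 5.0
for count in (8, 10):
    chance = poisson_tail(count, BACKGROUND_MEAN)
    print(f"chance of {count} or more from background alone: {chance:.3f}")
```

The design choice is to ask the same question of both numbers: how often would a difference this size turn up with nothing real behind it? Under the assumed background, a count of 10 is much harder to reach by chance than a count of 8, so whether a small difference matters depends on what one expects to see in the first place.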

For another example: try Bill Simmons’s ranking of the top 100 basketball players in his book, The Book of Basketball. Frankly, a lot of the descriptions, justifications, arguments, and yes, statistics that Simmons cites look similar. However, my point here is that, in his mind, Simmons’s ranking scheme matters. The 11th best player of all time lost something by not being in the top-10, but he is still better off than the 12th best player. Again, as someone who works with numbers, I think it might make a bit more sense to just class players into cohorts. The interpretation here is that, at some level, any group of 5 (or even 10) players ranked near one another are practically interchangeable in terms of practicing their craft. The difference between two teams of such players is only good for people forced to make predictions, like sportswriters and bettors. With that said, if one is playing GM, it is absolutely a valid criterion to put a team of these best players together based on some aesthetic consideration. It’s just as valid to simply go down a list and pick the top-5 players as ordered by some statistic.* If two people pick their teams in a similar fashion, then it is likely a crap shoot as to which will be the better team in any one-off series. Over time (like an 82-game season), such differences may become magnified. Even then, the win difference between the two teams may be 2 or 3.

* Although some statistics are better at accounting for variance than others.

How this leads back to Posnanski is as follows. In a lot of cases, he does not just simply rank numbers; partly, he’s a writer and story teller. The numbers are not the point; the numbers illustrate. Visually, there isn’t always a glaring difference between them, especially when one looks at the top performances.

Most often, the tie-breaker comes down to the story, or, rather, what Posnanski wishes to demonstrate. He’ll find other reasons to value them. In the Posnanski post I mentioned, I don’t think the piece would make a good story, even if it highlighted his argument well, had it ended differently.

In a previous post, I wrote about financial models. The point is that a scientific model generally simplifies. While simplification gives models their power, one must also take care to assess whether adapting or transplanting the model to new fields is valid. Hence some disconnects between economic models and the financial tools based on these models.

Here’s another illustration. I was talking with my friend about his thesis. R. is interested in building a model of the olfactory bulb. This structure is interesting; it is well defined anatomically into three layers. The top layer contains neuropil structures called glomeruli. Glomeruli contain the axon projections from the primary sensory neurons and dendritic branches of the neurons in the bulb. Both these “main” neurons and so-called interneurons form connections within this layer. Since this is where raw signals from the nose arrive, it is called the input layer. Together, these cells form a network and reshape the responses into new neural activity patterns, relayed to deeper olfactory processing areas of the brain.

The middle layer contains the cell bodies of the olfactory bulb output neurons. As mentioned, these cells, called mitral or tufted cells (usually termed M/T cells), send a main dendrite to the glomerulus. Each cell also sends secondary dendrites laterally, within the middle layer. The third layer, the granule cell layer, contains interneurons that form connections between the laterally spread dendrites in the middle layer. This forms a second point within the olfactory bulb where the raw input from the nose can be reshaped, repatterned, and repackaged for subsequent processing.

OK: my friend spoke of his troubles. He needed to convert the sensory neuron activity (from the nose), which differs for different smells. The features that are important seem to be when the activity begins (onset latency), how long it lasts (duration), and how intense it is (basically how often the neuron “fires” an action potential). There are some other subtleties, naturally. Each smell evokes activity in a great many olfactory neurons, some of which respond with a different set of characteristics. The idea is to build the model so that the responses from bulb output neurons can be calculated, given the set of parameters (i.e. the input activity patterns). Ultimately, these input neural patterns can be related to the actual behavior that helped shape them (such as the sniffing that an animal might engage in as it homes in on some odor source).

His trouble came with integrating the Hodgkin-Huxley model of the action potential (this is basically derived from physical/thermodynamic first principles) and determining how this model would generate action potential “spikes” in a way that mimics what the olfactory bulb neurons would do, given the pattern of input activity and the two layers of interneuronal influence within the bulb. It seemed like a set of nested differential equations – that is, the action potentials varied over time, with the degree of influence from the various interneurons also changing in time. That’s a real cluster-eff.

I thought I had a brilliant idea (and I still think it’s nice.) I suggested that he can simply build a phase space to describe all the possible arrangements of his input patterns. Each point in this abstract descriptive space can be correlated to a set of output profiles (i.e. how the bulb neurons eventually respond.) He can, in the end, identify the bulb response most likely to result from a given set of input patterns.

The problem is that this is a descriptive model. The Hodgkin-Huxley model would have the advantage of being an actual, theoretical model. Once this is in place, he can literally predict, down to the number of spikes and when they fire, the output of the olfactory bulb.

So yes, that, in a nutshell, is the difference between data-mining versus something derived from first principles. While one might be able to infer the same conclusions from a descriptive model, the theoretical model might be easier to work with when extending it slightly further than what had been observed by scientists. As Justin Fox warns, such extensions can be perilous if one does not take care to worry about validity.

I hadn’t quite planned on reading about the rise of mathematical financial theory and efficient market hypothesis,  but that is what I did.

As is my wont, I will digress and say that a prime theme of Moneyball is not that statistics are better than visual pattern recognition: it is that when markets exist, so do arbitrage opportunities. Lewis’s writing style is to group his subjects into opposing camps, to the detriment of his story. So the tension between scouts and stats geeks dominates the book. It’s a more interesting book, if you like people stories.

The Moneyball story isn’t simply that OBP is a good statistic; it was an undervalued metric, in the sense that players with high OBP weren’t paid highly compared to, say, batters with high home run totals and batting averages. Whether Billy Beane was the first one to “discover” OBP (he wasn’t) is incidental to the observation that no one was actively making use of that information. While GMs at the time were starting to identify other metrics, no one put their money where their mouths were: high OBP players were not paid a premium. Because of that pricing difference (OBP contributes strongly to runs scored and thus wins, but GMs did not pay well for it), one might be able to buy OBP talent on the cheap. Now, that arbitrage opportunity has disappeared, as teams with money (read: Red Sox and Yankees) have bid up the price. That means high OBP now commands a premium. Thus what worked before (a winning strategy on the cheap) no longer works now. It is a combination of fiscal constraints and incorrect pricing that gave Beane an edge. The fact that there was a better stat is beside the point; the fact that there was an arbitrage opportunity is absolutely the point.

This brings us to financial markets. If prices for stocks in a company were set by supply and demand, then rational buyers and sellers essentially agree on a fair price due to the fact that the seller has control of the product (i.e. stocks) and can name its price, while buyers need not purchase the stock if they  find the deal poor. In other words, opposing rational interests create a balance between something being charged too much or too little.

Is this price the correct price?

From a simple question, much of mathematical economics was developed to help investors, fund managers, brokers and bankers identify the worth of the various products they buy and sell today. The most successful of these theories is that markets are efficient: prices in a financial market such as the New York Stock Exchange are not only the optimum price for sellers and buyers, but also reflect a conclusion about the value of the product. That is, this price correctly values the company whose stock is being sold. There are different forms of this efficient market theory: they differ in which kinds of “information” are accounted for in the price. A weak version of efficient market theory suggests that stock prices reflect all past public information. A semi-strong form of this theory is that new publicly available information is accounted for in the price of a stock, in a large financial market. The strong form of this theory is that even private (i.e. inside) information is accounted for in the price.

This might seem strange to people, given that a) we just saw a financial market meltdown because finance sector personnel did not evaluate sub-prime mortgage bonds correctly, b) bubbles existed both before and after we had complicated performance metrics (Dutch tulip mania and the dot-com bubble), and c) there are enough shenanigans involving insider trading.

At any rate, one difference that I will focus on is that economic scientists (i.e. economists, a breed we should separate from the operators in the financial market), like most scientists, seek general explanations. Because their tool of trade is mathematics, economists prefer to derive their conclusions from first principles. Generally, statistical analysis is thought of as a way of either testing a theory or helping guide the development of one. Statistical models are empirical and ad hoc. They rely on the type of technique one uses and on how one “scores” the observations, and they are, as a rule, not good at describing things that were unseen. A good theory is a framework for distilling some “essence”, or a less complex principle, that governs the events that happen and that led to the “observations.” Usually, the goal is to isolate the few variables that presumably give rise to a phenomenon. These distinctions are not so firm, of course, in practice. Good observations are needed to provide the theorists with curves to fit, mathematically. And even good theories fall apart (again, they are still based on observations – boundary conditions are a key area where theories fail).

What does all this have to do with financial markets and efficient markets? While we have evidence of inefficient markets, these events may have been rare or the result of a confluence of exacerbating factors. However, one thing that scientists would pay heed to is that pricing differences were proven to exist, mathematically, and derived from the same set of equations used to describe market efficiency. Joseph Stiglitz proved that there can’t be a so-called strong form of an efficient stock market, since information gathering in fact adds value and has a cost. The summary of his conclusion is that, if markets were perfect and all agents have perfect information, then everyone would have to agree on the price. If that were true, then there would be no trading (or rather, speculating), since no one would price things differently. When people are privy to different information, it may lead to pricing differences. That in turn, must lead to arbitrage opportunities (no matter how small.) Thus the “strong form” of market efficiency cannot exist.

I was talking with a friend who has an MBA. He wasn’t too keen on hearing that the efficient market hypothesis may not be entirely proper when I described Justin Fox’s book, The Myth of the Rational Market, to him. I was approaching things from a scientific perspective; I know that models are simplifications. Even the best of them can be found inadequate. And this is what I want to focus on: although models may not describe everything exactly, that is fine. It does not detract from their usefulness.

From Fox’s book, and also William Poundstone’s Fortune’s Formula, the reader sees some difficulties with the efficient market theory. For one, the theory was originally posited to explain why prices, in the very short term (daily), varied around some mean. Sure, over time, the overall price increases, but at every iota of time, one can see that prices tick up and down by a very small fraction of the price. This is known as the random walk, first mathematically described in the doctoral thesis of Louis Bachelier. One bit of genius is that Holbrook Working pointed out that these random price fluctuations may in fact indicate that the market has worked properly and efficiently to set a proper price. Otherwise, we would see huge price movements that reflect the buying and selling of stock due to new information. In other words, the price of a stock constitutes the mean around which we see a “natural” variation.
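To make the random walk concrete, here is a minimal sketch in Python. It is only an illustration of the idea described above, not Bachelier’s or Working’s actual mathematics; the function name and every number in it (starting price, tick size, drift, number of steps) are made up.

```python
import random

def simulate_prices(start=100.0, steps=250, tick=0.005, drift=0.0004, seed=0):
    """Return a list of prices following a simple random walk.

    All parameter values are invented for illustration only.
    """
    rng = random.Random(seed)
    prices = [start]
    for _ in range(steps):
        # Each step the price ticks up or down by a small fraction,
        # with equal probability, on top of a slight upward drift.
        direction = 1 if rng.random() < 0.5 else -1
        change = drift + direction * tick
        prices.append(prices[-1] * (1 + change))
    return prices

prices = simulate_prices()
# The largest one-step move, as a fraction of the previous price.
largest_move = max(abs(b / a - 1) for a, b in zip(prices, prices[1:]))
print(f"start {prices[0]:.2f}, end {prices[-1]:.2f}, "
      f"largest single move {largest_move:.2%}")
```

In this toy version no single step moves the price by more than about half a percent, which is the “natural” variation around a slowly changing mean; a large jump would have to come from something outside the model, such as new information.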

And from that, much followed. Both Poundstone and Fox talked at length about pricing differences. In some sense, market efficiency, although implying both speed and precision, did not address the rate of information propagation. Eugene Fama suggested that information spread in a market is near instantaneous (as in, all pricing changes are set and reset constantly at a proper level). In the theory’s original form, I think this instantaneous rate resulted from a mathematical trick. Bachelier was able to “forecast” into the near, near future, showing the stock price can tick up or down. His work was extended into many instants by a brilliant mathematical trick. By assuming that stock transactions can be instantly updated and without cost, one can build up a trajectory of many near instants by constantly updating one’s stock portfolio. The near, near future can now be any arbitrary future moment.

Again, my only point here is not that the efficient market theory is wrong and must be discarded. I was fascinated by the description of counter examples and the possibility that some of the assumptions helping to build up a mathematical framework may need revision.

My boss and I were talking about the direction of our research. He thought that models of cell signaling pathways were lacking in rigor (by that he means a mathematical grounding). He, having a physics background, scoffed at the idea that biology is a hard science, because biological models are mostly empirical and do not “fall out” from considering first principles (i.e. based on assumptions, postulates, and deductive reasoning). I, being the biologist, tried defending the view that it is. Biology, like any sort of system, is complex. There are some simple ideas that can help explain a lot (for instance, evolution and genetic heritability). The concept of the action potential, in neurons, can in fact be derived from physical principles (it is simply the movement of ions down an electrochemical gradient, which can be derived from thermodynamics). In fact, neurons can be modeled as a set of circuits. For example, one recent bit of work my supervisor and I published, using UV absorption as a way to measure nucleic acid and protein mass in cells, is based on simple physical properties (the different, intrinsic absorption of light by the two molecules), which can be described by elementary physical and mathematical models.

However, the description of how networks of neurons may work, how such physical phenomena can give rise to animals and thoughts, and in turn how individuals may act in concert with others and form a societal organism, is wildly complex. Further, there can be multiple principles at work, none of which is necessarily derivable or deducible from a common set of ur-assumptions. For example, Newton’s laws of motion can be derived from Einstein’s theory of relativity. However, there is no such common root for some basic ideas about human behavior (such as those leading to pricing correctness in market efficiency and game theory), for how humans may interact (as described by network theory), or for how something as seemingly nebulous and human-dependent as “information” can actually be described by Boolean algebra and a mathematical treatment of circuits.

I should be clear: I am simply noting that some fields are closer to being modeled by precise, mathematical rules than others. Reductionism works; even the process of trying to identify key features underlying natural phenomena is helpful. However, one should also keep in mind that wildly successful theories may change, as we obtain better tools and make more accurate measurements.

I think an important point that Fox makes, then, is that we do have a number of observations suggesting that markets are not entirely efficient. For example, there is price momentum (a tendency for stock prices to continue moving in a particular direction), there is a significant amount of evidence suggesting that humans do not always act rationally (they tend to overvalue their property but discount things they do not own), and there are clear signs that herd mentality sometimes results (a la price momentum or bubbles). Fox also points out something rather important: even as economists point out inefficiencies in the market, those inefficiencies seem to disappear once known. Part of it could be statistical quirks: by chance, one might expect to see patterns in the noise of large, complex systems. Another part of it is that, once known, the information is in fact integrated into future stock prices. This places economists in a bind: if the effect is false, one might be justified in ignoring it as noise or a mirage of improper statistical analysis. However, if the effect is real, then the appearance of price incorrectness clearly reflects market inefficiency. At the same time, the effect disappeared once known, which suggests that the market price corrected itself, just as efficient market theorists predicted.
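The “statistical quirks” possibility is easy to demonstrate with a toy simulation. The sketch below is my own illustration, not anything from Fox’s book: it invents a thousand “trading rules” that guess up or down by coin flip, then counts how many look impressive anyway. The function name, the 60% threshold, and the 100-day window are all arbitrary assumptions.

```python
import random

def count_lucky_rules(n_rules=1000, n_days=100, threshold=0.60, seed=1):
    """Count coin-flip 'trading rules' that look predictive purely by luck."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_rules):
        # Each rule guesses at random, so its true hit rate is exactly 50%.
        hits = sum(rng.random() < 0.5 for _ in range(n_days))
        if hits / n_days >= threshold:
            lucky += 1
    return lucky

lucky = count_lucky_rules()
print(f"{lucky} of 1000 random rules were right at least 60% of the time")
```

With these settings, roughly 3% of purely random rules should be expected to clear the bar. None of them has any real predictive power, so their “edge” would vanish on new data, which from the outside looks much like an inefficiency disappearing once it is known.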

As one can imagine, there are opposing camps of thought.

Further compounding the difficulty is the fact that it has been hard to integrate non-rational agents into traditional market theory. Current theory treats pricing as an equilibrium, consistent with the idea that information and rational agents pull and push prices this way and that, but ultimately the disturbances are minor and the overall price of the stock is in fact the proper, true price. Huge disturbances are interpreted as movements in the equilibrium point, but they must arise from external forces (that is, from effects not modeled within the efficient market model – which actually leads to an inelegance of the variety that mathematicians and physicists dislike). As the number of contingencies increases, one might as well resort to a statistically based, empirical model. Which brings us back to the original point of how well we understood the phenomenon.

On the other hand, no one who wishes to modify efficient market theory has successfully integrated the idea of the irrational agent. The advantage of doing so would be that pricing changes – correct or incorrect – are based on the actions of “irrational” agents. Thus we would no longer be looking at an assumption of a correct price and deviations from that price. We could, presumably, derive the current price by adding into the model the systematic errors made by agents. Thus even huge deviations from proper prices (i.e. bubbles, undervaluations, and perhaps even the rate of information incorporation) would be predicted by the model. However, such a model remains just out of reach. In other words, efficient market opponents do not yet have a complete and consistent system to replace and improve the existing one. By default, efficient market theory is what continues to be taught in business schools.

My interest in the Fox and Poundstone books is precisely in how difficult it is to incorporate new ideas if an existing one is in place. It is this intellectual inertia that results in the concept of memes as ideas that take on a life of their own (in that ideas exist for their own reproductive sake) and in the Kuhnian paradigm shifts that have to occur in science. My specific application has always been in how non-scientists deal with new ideas. If scientists themselves are setting up in opposing camps, what must laymen be doing when faced with something they do not understand?


A list of the top 7 scientific articles, in genetics for the last month, as ranked by the Faculty of 1000.


There’s also an interesting article in this week’s Nature about efforts to find and archive old data. Part of it is for historical interest, but in a field like climatology, it can be vital to keep the primary data for local weather over a “small” 100 year time frame.

Christian Specht wrote a short, cute analysis on citation mutations. He has a follow-up. Basically, these result from typos by authors or typesetters. This isn’t the problem. The problem is that some typos are inherited. Specht speculates that the inheritance (i.e. typos being copied and propagated through citations in other papers) is a problem because it implies that authors simply copy old references from other papers. I guess the ideal would be for authors to use their own reference databases or to build up their citations from the actual paper.

I’m not sure if this problem is as distressing as Specht writes (although, to be fair, he isn’t exactly worried). He simply made the point that there is likely much copying of old references – even if we can’t detect the occurrence because most people usually copy the correct reference.

Specht worries that the incorrect references may be an indication that scientists do not always read the papers they cite. I would add, simply, that maybe some scientists are lazy; if a paper already contains a properly formatted bibliography for the journal to which a new paper is being submitted, I can see why some authors might simply save time and make a copy.

Or perhaps the level of scrutiny for a paper usually doesn’t reach into the bibliography, which, ideally, would involve the authors actually searching for each paper and checking whether the page numbers match those from the article.


A press release from Queen Mary University of London:

Professor Lars Chittka from Queen Mary’s School of Biological and Chemical Sciences said: “In nature, bees have to link hundreds of flowers in a way that minimises travel distance, and then reliably find their way home – not a trivial feat if you have a brain the size of a pinhead! Indeed such travelling salesmen problems keep supercomputers busy for days. Studying how bee brains solve such challenging tasks might allow us to identify the minimal neural circuitry required for complex problem solving.”

The team used computer controlled artificial flowers to test whether bees would follow a route defined by the order in which they discovered the flowers or if they would find the shortest route. After exploring the location of the flowers, bees quickly learned to fly the shortest route.
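To see why the order of visits matters, here is a small Python sketch of the comparison the experiment describes: the round trip flown in discovery order versus the shortest possible round trip. The flower coordinates are invented, and brute-force search is my own stand-in; the press release says nothing about how the bees (or the researchers) actually compute routes.

```python
from itertools import permutations
from math import dist

def route_length(home, stops):
    """Length of a round trip: home, then each stop in order, then home."""
    points = [home, *stops, home]
    return sum(dist(a, b) for a, b in zip(points, points[1:]))

def shortest_route(home, flowers):
    """Try every visiting order; only practical for a handful of flowers."""
    return min(permutations(flowers),
               key=lambda order: route_length(home, order))

# Hypothetical layout: made-up coordinates, listed in "discovery order".
home = (0, 0)
flowers = [(4, 0), (0, 3), (4, 3), (0, 6)]

discovery = route_length(home, flowers)
best = shortest_route(home, flowers)
print(f"discovery order: {discovery:.1f} units")
print(f"shortest route:  {route_length(home, best):.1f} units via {best}")
```

For this invented layout the discovery-order trip is 24 units and the shortest is 18. The catch is that the number of possible orders grows factorially with the number of flowers, so checking every one quickly becomes impractical, which is why the problem keeps computers busy.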


Some interesting book reviews: Lee Smolin reviews Roger Penrose’s Cycles of Time, in which Penrose speculates about how the universe got its start. The mind-bender is that there might be no such official beginning, at least for our universe. Shame on me, although I am aware of Roger Penrose’s work, I had no idea how significant an impact he has had in physics. As Smolin writes in Nature,

We should pay attention because Penrose has repeatedly been far ahead of his time. The most influential person to develop the general theory of relativity since Einstein, Penrose established the generalized behaviour of space-time geometry, pushing that theory beyond special cases. Our current understanding of black holes, singularities and gravitational radiation is built with his tools.


In the same issue of Nature, Jascha Hoffman reviews Charles Seife’s Proofiness, where Seife creates a “taxonomy of statistical malfeasance”.



An interesting paper in Nature: a comparison of unique human genomes. The 1000 Genomes Project Consortium sequenced 882 people, with varying degrees of coverage (i.e. total nucleotides sequenced), which has to do with time and costs. There were 2 mother-father-daughter trios sequenced at high coverage, 179 individuals sequenced at low coverage, and 697 individuals who had only the coding sequences within their genomes sequenced. This type of research will enable researchers to categorize the genetic differences between closely and distantly related individuals. Further development of individual genome sequencing may enable disease likelihood calculations, possibly tailored drug treatments for disease, a finer-scale look at population migrations, and the identification of genetic correlates of phenotypic variation. Finally, the identification of single nucleotide changes (polymorphisms) between individuals will also help researchers expand the number of markers that are linked to a disease (and in fact has already guided researchers in expanding the probes in microarray chips that detect these new markers).

A second interesting paper, this one published in Science. Workers were able to identify a specific neural circuit, in zebrafish, that processes visual information. Specifically, this circuit is tuned to small objects, perhaps used in the capture of the zebrafish’s prey.
