SP500: Statistical Analysis of Frequency Data
Statistics can be based on measurement data and/or frequency data. Generally, technical analysts concern themselves with measurement data. Examples include calculating the RSI, CCI, and/or moving averages, all of which are derived from SP500 price data. Typically these prices are used to create all sorts of technical indicators. In addition, other basic statistics such as mean, median, mode, skewness, and kurtosis can be calculated. But prices can also be expressed simply as up or down from the previous day, or from any other period such as a year, quarter, month, week, or hour. The number of times prices moved up and down can be used to determine whether those changes occurred by chance. This essay will show you how frequency data is used with respect to the SP500 futures.
The first basic question we will ask is: can we increase our odds in forecasting future market direction? In an effort to answer this question, this essay will review the basic frequency data statistics generated by the SP500 (cash and futures).
Part 1: Annual Frequency Data
The easiest place to start is with annual data, but even with this smaller data set there are numerous ways to create frequency data. First, you could evaluate only the last price of the year and classify it as up or down. Second, you could tabulate the 252 daily changes within the year. So each time frame can be represented by several data sets. If you were so inclined, you could tabulate the changes within the year on an hourly, minute, or tick basis. It's up to you, but the point is that each time frame can be analyzed many ways.
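As a minimal sketch of how such a tally could be built, the Python snippet below classifies each price in a series as up, down, or unchanged relative to the previous one. The function name `count_changes` and the sample prices are illustrative assumptions, not SP500 data, and the same function would work for yearly, daily, or tick prices.

```python
def count_changes(prices):
    """Tally up, down, and unchanged moves between consecutive prices."""
    counts = {"up": 0, "down": 0, "unchanged": 0}
    for previous, current in zip(prices, prices[1:]):
        if current > previous:
            counts["up"] += 1
        elif current < previous:
            counts["down"] += 1
        else:
            counts["unchanged"] += 1
    return counts


# Hypothetical closing prices, for illustration only.
sample_closes = [100.0, 101.5, 101.0, 101.0, 103.2]
print(count_changes(sample_closes))
```

Unchanged periods are counted separately here; how to treat them (drop them or assign them to one side) is a choice the analyst has to make before testing the counts.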
So if you look at the last price of the year for the SP500 cash index for the last 21 years, as shown in the table below, you'll see that there were 16 up years and 5 down years. These data clearly favor an up forecast. The question, though, is how likely it is that this could occur due to chance. If the numbers are put into the chi-square formula, the resulting score is 4.1 (the value 4.81 represents the basic calculation before applying the correction for continuity). This value of 4.1 is greater than the critical value of 3.84 for one degree of freedom at the 5% significance level. This means that if up and down years were equally likely, a split this lopsided would turn up through normal variation less than 5% of the time. So the annual bias is up. In layman's terms, if you're going to guess which way the market will go, then choose up, because the difference between the number of up years and down years is not likely to be due to chance. This statistic has validity and should be factored into your analysis.
The 95% doesn't mean that you'd be right 19 out of 20 times, because the data shows that only 16 of the past 21 years went up, which is roughly 3 out of every 4 years. The 95% means that you are confident that the number of up/down years isn't random, or didn't happen by chance. But there's still a 5% chance that it could be.
So that's where the buy and hold strategy gets its foundation: from this one simple statistic. The problem is when investors buy at the top of a trading range and prices decline from those peak levels. Investors then need to wait for prices to return to those levels to break even. The better time to buy is when the market is at the low end of the range. So if you combine the "buy and hold" strategy with the "buy low, sell high" axiom, you can invest successfully for the long haul.
So when do you "buy low"? Well, that's the problem. Nobody knows, but if you were to invest when prices are near the previous year's low, then that's a good start (there are statistics for this too). The other problem is that nobody knows when this upward bias will stop. The past 200 years tell us that it always goes higher, but many of the G7 countries have failed investors at least once every hundred years. There have been crises that we tend to forget about, but you forget them at your peril because these events were wipeouts. In the USA, we had two such events: the Great Depression and January 1973 to December 1974. 1987 could be considered the third event, but it was only a 25% loss. In Europe and Japan, investors all experienced a massive wipeout, or hyperinflation. No investor was spared. This is why "buy and hold" must be tempered with self-preservation. You shouldn't lose it all or expose yourself to such losses.
If you're profitable, you should at least get out at break even. If you're not profitable yet, then it is emotionally difficult to sell at a loss. But the alternative of hanging around for years waiting to break even costs you more, because there are always other opportunities. So try to move to another investment and learn from your mistake. Don't invest all of your money at once. Invest 1/5 of it at a time. This way you have the chance to average in, convert a loss into a break even, and start again. Or you can decide that it was a bad investment, note that you only used 1/5 of your money, and get out. The point is that you aren't emotionally tied up in the investment, so you are more objective and better able to evaluate the investment's future prospects.
Part 2: Annual Frequency Data of Daily Price Changes
The second method of evaluating annual frequency data is to examine the daily price changes within the year. For example, in 2003 there were 252 trading days: 137 up and 115 down. Now if the SP500 were truly random, then you would expect the number of up days to equal the number of down days: 126 up to 126 down. But in the real world things get a little sloppy, and sampling doesn't produce the exact number that we would expect. Plus, if you look at different dates, you won't find the exact same ratio of up to down days throughout the series. It changes slightly. So there is a specific amount of variation that is acceptable, or that can be considered random variation. Anything beyond that normal amount of variation gives us cause to state that something besides chance has influenced the number of up to down days.
In an effort to explain this better, look at the difference between the expected number of up days and the observed number of up days. It was eleven. This isn't a big difference. In addition, the difference between the expected number of down days and the observed number of down days was also eleven. This is also a small difference. Statistically, these small differences convert into a score of 1.92. Essentially, any score less than 3.84 means that there was no significant difference between the number of up days and down days. In the world of statistics, this means that we can't say anything meaningful about the data. There is nothing to explain. The difference that we observed (137 vs. 115) from the expected (126 vs. 126) is consistent with random sampling variation, that is, with chance. This variation was not attributable to any one cause.
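The arithmetic behind the 1.92 score can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the software used for the essay: the function names `chi_square` and `p_value_1df` are hypothetical, the expected counts assume a 50/50 split, and the p-value helper relies on the identity that a chi-square variable with one degree of freedom is a squared standard normal.

```python
import math


def chi_square(up, down):
    """Chi-square score for up/down counts against a 50/50 expectation."""
    expected = (up + down) / 2
    return (up - expected) ** 2 / expected + (down - expected) ** 2 / expected


def p_value_1df(score):
    """Upper-tail probability of a chi-square score with one degree of freedom."""
    return math.erfc(math.sqrt(score / 2))


score_2003 = chi_square(137, 115)
print(round(score_2003, 2))               # 1.92, the score quoted for 2003
print(round(p_value_1df(score_2003), 3))  # chance of a split at least this uneven
print(round(p_value_1df(3.84), 3))        # about 0.05 at the critical value
```

Read this way, 3.84 is simply the score at which the chance of seeing such an uneven split in a truly 50/50 series drops to about 5%.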
And in case you're wondering, the minimum difference that would identify a non-random series in a 252-day year is 142 up days to 110 down days. The question then is how many years had a ratio of 142:110 or greater of up to down days, or down to up days, and whether those years forecast higher or lower prices the subsequent year. Below is a table of results for the past 21 years, plus the partial year 2004.
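The 142:110 threshold can be checked by brute force. The sketch below, with the hypothetical helper `minimum_significant_up_days`, walks upward from an even split until the chi-square score first exceeds 3.84; it assumes the uncorrected formula and a 50/50 expectation.

```python
def chi_square(up, down):
    """Chi-square score for up/down counts against a 50/50 expectation."""
    expected = (up + down) / 2
    return (up - expected) ** 2 / expected + (down - expected) ** 2 / expected


def minimum_significant_up_days(total_days, critical=3.84):
    """Smallest up-day count at or above half whose score exceeds the critical value."""
    if total_days <= 0:
        raise ValueError("total_days must be positive")
    for up in range(total_days // 2, total_days + 1):
        if chi_square(up, total_days - up) > critical:
            return up
    return None  # no split of this many days can reach the critical value


up_days = minimum_significant_up_days(252)
print(up_days, 252 - up_days)  # 142 110
```

By symmetry the same threshold applies in the other direction, so 110 up days to 142 down days would be equally significant.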
| Year | Up Days | Down Days | SPX EOY Price | Significance Level |
| --- | --- | --- | --- | --- |
| 1983 | 139 | 114 | 164.93 | 2.47 |
| 1984 | 113 | 140 | 167.24 | 2.88 |
| 1985 | 138 | 114 | 211.28 | 2.29 |
| 1986 | 141 | 112 | 242.17 | 3.32 |
| 1987 | 143 | 110 | 247.08 | 4.30 |
| 1988 | 140 | 113 | 277.72 | 2.88 |
| 1989 | 150 | 102 | 353.4 | 9.14 |
| 1990 | 135 | 118 | 330.22 | 1.14 |
| 1991 | 124 | 129 | 417.09 | 0.10 |
| 1992 | 131 | 123 | 435.71 | 0.25 |
| 1993 | 130 | 123 | 466.45 | 0.19 |
| 1994 | 133 | 119 | 459.27 | 0.78 |
| 1995 | 156 | 96 | 615.93 | 14.29 |
| 1996 | 138 | 116 | 740.74 | 1.91 |
| 1997 | 141 | 112 | 970.43 | 3.32 |
| 1998 | 140 | 112 | 1229.23 | 3.11 |
| 1999 | 129 | 123 | 1469.25 | 0.14 |
| 2000 | 121 | 131 | 1320.28 | 0.40 |
| 2001 | 120 | 128 | 1148.08 | 0.26 |
| 2002 | 112 | 140 | 879.82 | 3.11 |
| 2003 | 137 | 115 | 1111.92 | 1.92 |
| 2004+ | 73 | 67 | 1086 | 0.26 |
Note: Significance levels > 3.84 indicate a non-random distribution of up/down days. + = mid-year; the last trade date used was 7/23/04.
As discussed earlier, the first interesting fact regarding this table is that there were only 5 down years out of the last 21 years. Second, if we focus on the up/down intra-year frequency data, only 3 out of the last 21 years (1987, 1989, and 1995) produced scores above the 3.84 threshold, meaning that their split of up to down days was unlikely to be due to chance. This means that using these data wasn't helpful, considering that an additional 13 years went up despite the random nature of the number of up/down days. Another interesting fact was that there weren't any significant non-random down years, despite there being 5 down years. So clearly, using frequency data didn't help us to forecast the subsequent year's prices.
As a matter of fact, of the 3 years that did have a significance level greater than 3.84, one was followed by a down year. This shows us that these data didn't provide any predictive value. And if you look at those years in which the number of down days exceeded up days, there were 5 years (even though, as I'm showing you, grouping them this way is a logical flaw, because these 5 years are statistically irrelevant). Only 3 out of the 5 predicted a down year in the subsequent year. So again, this didn't help much.
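The significant years can be recomputed directly from the table. The sketch below copies the 1983 to 2003 rows (the partial 2004 row is left out), flags every year whose score exceeds 3.84, and reports whether the following year closed higher or lower. The names `TABLE` and `significant_years` are assumptions made for this illustration.

```python
CRITICAL = 3.84

# (year, up days, down days, end-of-year SPX price), copied from the table above.
TABLE = [
    (1983, 139, 114, 164.93), (1984, 113, 140, 167.24), (1985, 138, 114, 211.28),
    (1986, 141, 112, 242.17), (1987, 143, 110, 247.08), (1988, 140, 113, 277.72),
    (1989, 150, 102, 353.40), (1990, 135, 118, 330.22), (1991, 124, 129, 417.09),
    (1992, 131, 123, 435.71), (1993, 130, 123, 466.45), (1994, 133, 119, 459.27),
    (1995, 156, 96, 615.93), (1996, 138, 116, 740.74), (1997, 141, 112, 970.43),
    (1998, 140, 112, 1229.23), (1999, 129, 123, 1469.25), (2000, 121, 131, 1320.28),
    (2001, 120, 128, 1148.08), (2002, 112, 140, 879.82), (2003, 137, 115, 1111.92),
]


def chi_square(up, down):
    """Chi-square score for up/down counts against a 50/50 expectation."""
    expected = (up + down) / 2
    return (up - expected) ** 2 / expected + (down - expected) ** 2 / expected


def significant_years(table, critical=CRITICAL):
    """Return (year, score, next-year direction) for rows scoring above critical."""
    results = []
    for row, next_row in zip(table, table[1:]):
        year, up, down, close = row
        score = chi_square(up, down)
        if score > critical:
            direction = "up" if next_row[3] > close else "down"
            results.append((year, round(score, 2), direction))
    return results


for year, score, direction in significant_years(TABLE):
    print(year, score, "following year:", direction)
```

Because each row is paired with the one after it, the final row (2003) is never tested here; that is harmless for this table, since its score of 1.92 is below the threshold anyway.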
As a tangent, let's apply frequency analysis to a real-world event
As an interesting exercise, I also broke from the sample and wondered whether Congress's passage of the tax reduction on dividends led to a statistically significant event. So I tallied the number of up/down days for 3, 6, and 12 months after the passage of the bill in late May 2003. Starting on 5/27/2003, I computed these results:
| Months | Up Days | Down Days | SPX Last Price | Change | Significance Level |
| --- | --- | --- | --- | --- | --- |
| 3 | 40 | 26 | 996.79 | +45.31 | 2.97 |
| 6 | 76 | 55 | 1058.20 | +106.72 | 3.37 |
| 12 | 144 | 111 | 1121.28 | +169.80 | 4.27 |

Note: Significance levels > 3.84 indicate a non-random distribution of up/down days. SPX last price on 5/27/2003 = 951.48.
As the table illustrates, the passage of the favorable income tax reduction bill on dividends saw a steady increase in the non-randomness throughout the year. Prices of the SPX rose consistently at each period as did the significance level. However, I must emphatically state that this does not prove causality! This again would be symptomatic of faulty logic. These data do not prove any statement. These data just show a probable connection.
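The figures in this table can be reproduced with the same chi-square arithmetic. The following sketch recomputes the score and the price change for each window; the names `WINDOWS` and `summarize` are hypothetical, and the inputs are copied from the table rather than recalculated from raw prices.

```python
START_PRICE = 951.48  # SPX last price on 5/27/2003, from the table note

# (months after passage, up days, down days, SPX last price), from the table.
WINDOWS = [(3, 40, 26, 996.79), (6, 76, 55, 1058.20), (12, 144, 111, 1121.28)]


def chi_square(up, down):
    """Chi-square score for up/down counts against a 50/50 expectation."""
    expected = (up + down) / 2
    return (up - expected) ** 2 / expected + (down - expected) ** 2 / expected


def summarize(windows, start_price):
    """Return (months, score, point change, percent change) for each window."""
    rows = []
    for months, up, down, last_price in windows:
        change = last_price - start_price
        rows.append((months, round(chi_square(up, down), 2),
                     round(change, 2), round(100 * change / start_price, 1)))
    return rows


for months, score, change, pct in summarize(WINDOWS, START_PRICE):
    flag = "non-random" if score > 3.84 else "within chance"
    print(f"{months:>2} months: score {score}, change {change:+.2f} ({pct}%), {flag}")
```

Only the 12-month window clears 3.84, which is consistent with the caution above: the shorter windows on their own would not be distinguishable from chance.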
It is easy to make the leap that this is the process, or the cause, for the rise in prices, but as we all know, there are many other variables that exist that weren't identified nor described. The fact that I didn't identify nor describe these other factors doesn't mean that these data prove causality. These data, in the manner in which I have associated them, mean that there is a non-random event, or a connection, between the passage of the favorable changes to the tax code and the rise in prices of the SPX during this time frame. If I wanted to prove this causality, I would need to perform another set of computations that calculate the multi-variate correlation amongst all known variables. The point to be made here is that you and I see a clear connection, but it's not 100% proof.
In addition, if there is a lesson to be learned from studying this one reaction of the market to favorable tax law changes, then you would have to walk away from this essay with a keener interest in what Congress is doing to our income tax code. The simple observation, and remarkable fact, is that SPX prices rose nearly 18% in the year after Congress reduced the income tax on dividends. Now the question is, what will you do the next time Congress significantly modifies the income tax code?
Conclusions & Summary
This essay's purpose was to demonstrate how to ...
I demonstrated the difference between measurement data and frequency data for the SP500. In addition, I helped you to see that frequency data for price data can be tabulated in various time frames. Annual frequency data can be tabulated in terms of end-of-year prices or intra-year daily price changes. It could also be tabulated in quarterly, monthly, weekly, hourly, minute, or tick formats. The results for annual frequency data on an end-of-year basis showed us that the distribution of up to down years was clearly non-random.
However, the intra-year ratio of up/down daily price changes was primarily due to chance. Only 3 out of the past 21 years demonstrated non-random distributions, and they didn't have any predictive value. In summary, the number of up/down days within a year didn't have any predictive value from any perspective. It didn't matter if up years were isolated from down years; there wasn't anything useful for a trader to gain from these data.
Lastly, this foray into frequency data analysis also demonstrates how not to interpret data, or how to spot faulty logic. It would be all too easy to examine the table of data and group the data into those years that had a net up/down number of days. As an example, I simply grouped those years in which the net number of days was down, without considering whether these variations could be due to chance. There were 5 years in which the net number of days was down. It would be tempting to create a trading strategy from these data, but the problem is that you must first identify whether these data are the result of normal random variation or not. If they do not pass the test of being more significant than normal random variation, then any further analysis is void, or faulty.
These frequency data of the SP500 also serve as an introduction to the basis for the "random-walk" debate. The random-walk theory is a simple one. It proposes that each daily price change is really a flip of the coin, and that the sum of those random price changes produces the patterns that we see. It's chance rather than causality that determines future prices, and if future price changes are by chance, then attempting to predict the future is nothing more than gambling. You could flip a coin and let that decide your strategy. The chances of success would be the same.
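The coin-flip picture can be made concrete with a small simulation. The sketch below is an illustration under assumptions, not part of the original analysis: it generates many hypothetical 252-day years in which every day is a fair coin flip, and counts how often the chi-square score exceeds 3.84 anyway. The function name `fraction_flagged`, the trial count, and the seed are arbitrary choices.

```python
import random


def chi_square(up, down):
    """Chi-square score for up/down counts against a 50/50 expectation."""
    expected = (up + down) / 2
    return (up - expected) ** 2 / expected + (down - expected) ** 2 / expected


def fraction_flagged(trials=2000, days=252, critical=3.84, seed=1):
    """Share of purely random years whose up/down split still looks significant."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    flagged = 0
    for _ in range(trials):
        up = sum(rng.random() < 0.5 for _ in range(days))  # fair coin per day
        if chi_square(up, days - up) > critical:
            flagged += 1
    return flagged / trials


print(fraction_flagged())
```

The result should land near 5%, which is what the 5% significance level means: even a pure coin-flip market would produce an occasional "non-random" year, so a few significant years out of 21 are weak evidence on their own.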
I'm not writing this to refute the theoretical underpinnings of this argument, but rather to help you understand why this argument exists. As the data showed you, prices tended to rise despite the fact that the sum of up/down days each year was mainly due to chance. And in fact, there was a year in which the number of up/down days was significantly greater than chance and the following year was down. There was even a year in which the number of up/down days was up and the following year was down. So there is no predictability in the series, which implies random behaviour. These data are what support the "random walk" theory. When you examine the actual results and put them to traditional statistical tests, there are no statistically significant relationships, other than that the trend is up and always has been. That's why "buy and hold" exists, and the best method to execute this strategy is to "buy low, and sell high".
I'm not stating that you should use this strategy; as a matter of fact, I have described a seasonal strategy that is superior to simply "buy and hold". But I am writing this essay in an attempt to explore the empirical data from which these axioms materialized and to explain the mathematical foundations behind them. In addition, I'm hoping that this essay will help you to become more critical when you examine other data presented to you. And if possible, hopefully you'll include in your analysis of the facts an assessment of the odds that chance alone produced those results, and consider the possibility of normal random variation.
Formula used: Chi Square
$$\chi^2 = \frac{(\text{Observed Up} - \text{Expected Up})^2}{\text{Expected Up}} + \frac{(\text{Observed Down} - \text{Expected Down})^2}{\text{Expected Down}}$$
Note: When the expected count is 10 or less, a correction for continuity is applied: the absolute difference between observed and expected is reduced by 0.5 before it is squared and divided by the expected count. This corrects the bias induced by non-continuous (whole-number) data.
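A minimal sketch of the formula with an optional continuity correction follows. It assumes the standard (Yates) form of the correction, in which 0.5 is subtracted from the absolute difference before squaring, and a 50/50 expectation; the `continuity_correction` flag and the 8-up, 2-down sample are hypothetical choices for illustration.

```python
def chi_square(up, down, continuity_correction=False):
    """Chi-square score against a 50/50 expectation, optionally Yates-corrected."""
    expected = (up + down) / 2
    shift = 0.5 if continuity_correction else 0.0
    total = 0.0
    for observed in (up, down):
        # Shrink the gap by 0.5 when correcting, but never below zero.
        diff = max(abs(observed - expected) - shift, 0.0)
        total += diff ** 2 / expected
    return total


# Hypothetical small sample: 8 up periods and 2 down periods (expected 5 each).
print(chi_square(8, 2))                              # 3.6
print(chi_square(8, 2, continuity_correction=True))  # 2.5
```

With counts this small the correction matters a great deal; with roughly 126 expected days per side, as in the yearly table, it changes the score only slightly.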
created 12/9/03, The Small Investor's Software Co. All rights reserved.