Thursday, January 26, 2012

Moneyball and Investing: Data, Information and my 2012 Update

I loved Moneyball, both the book, by Michael Lewis, and the movie starring Brad Pitt, because they bring together two things I love: baseball and numbers. At the risk of shortchanging the book, its central story is a simple one. For most of baseball’s hundred-plus years of existence, insiders (baseball managers, scouts and experts) have used stories and narratives to keep themselves above the riffraff (which is where you and I as fans belong). Thus, scouts claimed to have special skills (based on their long history of having done this before) to find potential superstars in high schools and the minor leagues, and managers justified their personnel decisions and game day choices with gut feeling and baseball instincts. Billy Beane, the general manager of the Oakland A’s, a storied but budget-constrained franchise, upended the game by shunning hoary tradition and putting his faith in the numbers.

I think that financial markets and baseball have a great deal in common. Equity research analysts are our baseball scouts, asking us to trust their storytelling skills when picking stocks. Executives at companies are our baseball managers, flaunting their industry experience and asking us to trust their gut feeling and instincts when it comes to big decisions. Like Billy Beane, I trust the numbers far more than either analyst stories or managerial instincts, and it is for that reason that I started gathering raw data on individual companies about two decades ago and computing industry averages for a few key inputs into investments: risk, return and growth. Initially, it was a limited exercise, where I looked at only US companies and a handful of statistics. I put those numbers online, not anticipating many downloads, but was pleasantly surprised at how many people seemed to find the data useful. (I won’t flatter myself. The fact that it was free did help…)

Each year my coverage has expanded, driven partially by external demand and mostly by easier access to raw data. Starting in 2003, I went global and a year or two later started providing data on the individual companies as well. So, here is where the long windup is leading. I have just finished the January 2012 update to my data. You can get to it by going to the updated data page on my website:
My sample includes all firms that are (a) publicly traded, (b) listed on any global exchange and (c) covered by the data sources that I use (Value Line for US companies, Capital IQ and Bloomberg for non-US companies). In January 2012, there were 41,803 companies in my overall dataset.

I have computed industry averages for about 35 variables, covering a wide range of inputs:
a. Risk measures and hurdle rates: Betas and standard deviations, as well as costs of equity and capital, by sector.
b. Profitability measures: Profit margins (net and operating), tax rates and returns on equity and capital.
c. Growth measures/ estimates: Historical growth rates in revenues and earnings, as well as forecasted growth rates (where available)
d. Financial leverage (debt) measures: Book value and market value debt to equity and debt to capital ratios.
e. Dividend policy measures: Dividend yields and payout ratios, as well as cash statistics (cash as a percent of firm value).
f. Equity multiples: Price earnings ratios (current, trailing, forward), PEG ratios, Price to Book ratios and Price to Sales ratios.
g. Enterprise value multiples: Enterprise value to EBIT, EBITDA, revenues and invested capital.
I generally stay away from macroeconomic data, but I do report equity risk premiums (historical and implied) over time and marginal tax rates across countries.

You are welcome to use whatever data you want from this site, but please keep in mind the following caveats:
1. Data yields estimates, not facts: In these days of easy data access and superb tools for analysis, it is easy to be lulled into believing that you are looking at facts, when you are really looking at estimates (and very noisy ones at that). Every number that is on my site, from the historical equity risk premium to the average PE ratio for chemical companies, is an estimate (and adding more decimal points to my numbers will not make them more precise).
2. Data has to be measured: That is again stating the obvious, but implicit in this statement are two points. The first is that someone (an accountant, a data service, me) is doing the measurement and imposing his or her judgment on the measured value. The second is that there can be error in measurement. Thus, with my data, you can be assured that there are errors and mistakes in the final numbers. While I can blame some of these mistakes on the data services that I get my raw data from, many are mine. So, if you find a mistake or even something that looks like a mistake, please let me know and I promise you two things. First, I will not be defensive about it and will take a look at the issue you have raised. Second, if I do find myself in error, I will fix the error as soon as I can. (With a staff of one (me), this data service can get stretched sometimes… So, please have some patience).
3. Data for post-mortems versus data for predictions: As I see it, data can be used in two ways. The first is to generate post-mortems (about past performance) and the other is to make forecasts for the future. Given my focus on corporate finance and valuation, I am more interested in the latter than the former. Thus, my data definitions are more attuned to forecasting than to after-the-fact analysis. Just to provide an example, the cost of capital that I am interested in computing for a company is the cost of capital that I can use for the next five years, not the one for the last three years.
4. Data anchoring: Whether we like it or not, our instinct when confronted with a number, and asked to decide whether it is high or low, is to compare it to what we consider reasonable numbers (at least in our minds). Thus, if I came to you with a stock with a PE of 10, your determination of whether the stock is cheap or expensive will depend largely on what you think the average PE is across all stocks and what comprises a high or low PE. All too often, in the absence of updated and comprehensive data, these are guesses. It is for this reason that analysts and investors create rules of thumb: an EV/EBITDA of less than six is cheap, a PEG ratio less than one is cheap or a stock that trades at less than book value is cheap. But who comes up with these rules of thumb? And do they work? The only way to answer these questions is to look at the data across all companies and make your own judgments.
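
To make the anchoring point concrete, here is a minimal sketch of checking where a PE of 10 falls within a cross-section of PE ratios, rather than guessing at what is "high" or "low". The PE values below are invented purely for illustration (they are not from the dataset), and `percentile_rank` is a hypothetical helper, not something provided on the site.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, sample):
    """Percent of the sample strictly below `value`, counting ties as half."""
    ordered = sorted(sample)
    below = bisect_left(ordered, value)            # count of entries < value
    ties = bisect_right(ordered, value) - below    # count of entries == value
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical PE ratios, made up for this example only.
sample_pes = [6.0, 8.5, 9.0, 11.0, 12.5, 14.0, 15.5, 18.0, 22.0, 35.0]

rank = percentile_rank(10.0, sample_pes)
print(f"A PE of 10 sits at about the {rank:.0f}th percentile of this sample")
```

With real data, the same comparison could be run within an industry as well as across all stocks, since a PE that looks low against the whole market may be ordinary within its sector.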

There is one final, general point about data that I have to make, and it relates back to Moneyball. Much as I agree with Billy Beane on the importance of data, I think that his mistake was focusing far too much on the data. The data should be the starting point for your assessments, but not the ending point. Stories do matter, if they can be backed up by the data or help draw implications from it. The secret to great investing is a happy marriage between plausible investment stories and numbers, with the recognition that even the best sounding stories have to be abandoned at some point, if the numbers don’t back them up. So, explore the data and make it your own!!


Tom said...

Thanks a lot for the efforts! Very much appreciated and very interesting data indeed.

One thing is puzzling me slightly, though it is not a major concern since it works for the Global file: the betas for all of Europe, for example, do not average to 1. If I remember correctly, last year the global beta was above 1 and the one for Europe was also below 1...

farmland as an investment said...

Very interesting post. Data is obviously critical, but there should be a balance between data and story. If you looked purely at data, you might miss critical information about a company that could lead you to believe it will perform in a different manner down the road than the pure data indicates.

Aswath Damodaran said...

The betas are a problem. Capital IQ estimates the betas against local indices and they don't always aggregate to make sense. For instance, last year, the average beta for emerging market companies on the whole was less than 0.70. This year, I did introduce a global standardization requirement, where the average beta for companies globally has to be one. That does mean that betas in regional breakdowns (such as Europe) have changed. I continue to wrestle with this issue, since there is no easy solution.
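
A rough sketch of why regional averages move when a global constraint is imposed: the exact standardization method is not described above, so this assumes the simplest version, where every beta is multiplied by one common factor so that the unweighted global average equals one. The region names and beta values are hypothetical.

```python
def mean(values):
    """Simple (unweighted) average."""
    return sum(values) / len(values)


def rescale_betas(betas_by_region):
    """Multiply every beta by one factor so the global simple average is 1.

    Assumption: an unweighted average and a single multiplicative factor;
    the actual standardization used for the dataset may differ.
    """
    all_betas = [b for betas in betas_by_region.values() for b in betas]
    factor = 1.0 / mean(all_betas)
    return {
        region: [b * factor for b in betas]
        for region, betas in betas_by_region.items()
    }


# Hypothetical raw betas estimated against local indices.
raw = {
    "Europe": [0.6, 0.8, 0.7],
    "Emerging": [0.5, 0.7],
    "US": [1.1, 1.2, 1.0],
}

scaled = rescale_betas(raw)
global_mean = mean([b for betas in scaled.values() for b in betas])
print(f"Global average after rescaling: {global_mean:.3f}")
for region, betas in scaled.items():
    print(f"{region}: average beta {mean(betas):.3f}")
```

Under this assumption only the global average is pinned to one. Each regional average is scaled by the same factor, so a region such as Europe can still sit above or below one.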

michael s said...

Thank you for making this available.

One question. On the 'Working Capital Requirements by Industry' files, the percentage columns on first year (1/99) file add across as I would expect (A/R+Inv-A/P = NonCashWC), but in later years they don't, by substantial amounts. Is there some other factor involved, or something in the way this is aggregating that's causing this? (Since these are totals by industry it's difficult to look up the answer myself).

Many thanks.

Aswath Damodaran said...

Because I also include the catch all item "other current liabilities" (which includes other non-interest bearing current liabilities like deferred taxes) and "other current assets" (which include other non-cash current assets).
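
As a small illustration of that answer, the sketch below compares the three-component measure (A/R + Inv - A/P) with a fuller non-cash working capital measure that also includes the two catch-all items. The function names and the percent-of-sales figures are hypothetical, chosen only to show how the gap arises.

```python
def three_component_wc(ar, inventory, ap):
    """Accounts receivable + inventory - accounts payable."""
    return ar + inventory - ap


def non_cash_wc(ar, inventory, other_current_assets, ap, other_current_liabilities):
    """Non-cash current assets minus non-interest-bearing current liabilities."""
    return (ar + inventory + other_current_assets) - (ap + other_current_liabilities)


# Hypothetical industry figures, each as a percent of sales.
ar, inventory, ap = 12.0, 10.0, 8.0
other_ca, other_cl = 4.0, 7.0

narrow = three_component_wc(ar, inventory, ap)
full = non_cash_wc(ar, inventory, other_ca, ap, other_cl)
print(f"A/R + Inv - A/P:      {narrow:.1f}% of sales")
print(f"Non-cash WC (full):   {full:.1f}% of sales")
print(f"Gap from other items: {full - narrow:.1f} points")
```

The gap between the two measures is just other current assets minus other current liabilities, which is why the columns stop adding across once those items are included.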

michael s said...


I asked because I was curious about the trends in working capital attached to sales, including ‘by industry’ – and to the extent we can estimate them, also what the incremental working capital rates might be for various industries. I’m guessing that AP, Inv, and AR are probably the most useful components for that purpose – perhaps more so than total reported non-cash WC. But I’m not entirely sure.

Sometimes it seems that we see some businesses, and perhaps entire industries, where steady revenue and earnings growth equates to lower value, from the combination of low margins and apparently high incremental WC rates, where the incremental WC is some multiple of the incremental earnings rate. A lot of heavy manufacturing looks like it may fall in this category.

Comparing the WC% (on an AR+INV-AP basis) to industry net margins is interesting. What’s also interesting are the trends – for example most WC rate improvement the past decade has apparently come from AR (I would have initially guessed inventory, but that may have been more the case in the past).

Looking at companies within industries is interesting also. For example, in Retail Stores we see an overall 5% non-cash WC rate (using the 3 components), however Walmart's negative WC rate suggests it might have a significant ‘growth’ advantage.

Again thanks – this is interesting.

Aswath Damodaran said...

What you are doing is what I hoped would come out of the data.. Explore the data and see both the similarities and the differences... So, have fun with it!

ju said...

Dear mr. Aswath,

I just recall the other day that you have post some topic of bond or equity, which one is better market indicator. But i can't find the topic here.


factoring company said...

thanks for the recent update. well written article. i really appreciate this writing.

Tom said...

Hello! Thanks for the feedback and "fixing".

The emerging market beta was indeed a problem, but then again, there is hardly any reliable data available, and most of the time the betas are run against local indices...

Michael said...

This is a great collection, thanks a lot! I'm not sure if you were aware, but a few of the links to the data files are broken. Specifically the following: Chinacompfirm10.xls, Chinacompfirm09.xls, Indiacompfirm10.xls, Indiacompfirm09.xls, debtfundJapan.xls, divfundGlobal.xls, vebitChina.xls, and vebitIndia.xls.

Aswath Damodaran said...

Will fix those...

Unknown said...

This post is regarding the data sets provided by you. I observe that the data sets are updated up to January 2012. I request you to provide the updated sets up to March 2012. It will benefit all your followers.


Unknown said...

You wrote very well in this article, this contains lot of valuable informative information. The data must be very clearly and perfectly about the business, this is the main consideration thing.

Juju said...

Thanks Prof. By the way, I just dropped by to ask what the difference is between a market risk premium calculated as an implied risk premium and the market risk premium from a service provider such as Bloomberg (they are using market cap * average DDM index). And like you said, many people still hand off their responsibility for the data to the service provider, even though they don't know the principle behind the formula or how the service provider derives the number.


Lets rename wall street money ball street.

Learn Stock Market said...

Thank u for your sharing.Your blog has a unique feature that can make the people who reads become happy.After reading your blog,I feel happy.

Unknown said...

Fantastic post I like the post really like the method that you identified things, you do an excellent career a lot of some others as if you as a result of which type of beneficial weblogs present attention to help you related to many points. I read another useful information sites out of your websites and also I am a whole lot engaged with all your blogging expertise, My partner and i likewise did start to write information sites and also this type websites truly support myself out. My partner and i witout a doubt book marked ones web page and distributed the internet websites to our acquaintances not merely me however every one of them such as your current blogging expertise, desire anyone compose far more interesting weblogs this way a single in addition to best of luck to your future weblogs.
