In 1992, I had just finished a spreadsheet that contained the average PE ratios for companies in different sectors in the United States. There was little of substance in it, but I decided that since I had it, I might as well share it. I posted that spreadsheet for students in my class to download and made it available to others who visited my website (more hopeful thinking than an actual plan, since there were relatively few people looking for data online). Each year since, I have added to the data collection, initially by expanding my list of data items for US companies and, in the last decade, by adding non-US companies. It is my first task each year, it takes up the first week of the year, and I uploaded the data for the 2014 update today.
I never imagined that a foray into data sharing that started with one spreadsheet of a single statistic would grow to 285 spreadsheets in 2014, covering more than a thousand data items, or that my universe of stocks would include 40,906 listed companies in 131 markets.
You can find them all in the data section of my website, but rather than bore you with the details in this post, I will focus instead on the what, the why and the what next of data.
The what: It starts with raw data!
In the last three decades, we have witnessed a revolution in data access that we need to step back to appreciate. In the 1980s, unless you worked at a university or an investment bank, your access to data was not just limited but often non-existent. I remember trekking to the library (yes, the place with real books and reference stacks and the Dewey decimal system) to review Value Line summary sheets for individual companies and the industry averages that S&P published at the start of every year. I had access to Compustat through the university but it was accounting-focused (with very few market numbers) and dated.
The first glimmers of the data revolution came in the 1990s, and for me, it began with Value Line offering an electronic version of its data, delivered on a CD every month by mail. That was the basis for my first data updates, and Value Line remains my base for US data, more because of my familiarity with it and its history than any special characteristics. In fact, there are databases with richer detail, not just in terms of having more data items for US companies, but also in bringing in listings in other markets. My decision to expand my data updates from US to global companies was triggered by my access to Bloomberg terminals that were installed at the Stern School of Business about a decade ago. About five years ago, I started tapping into Capital IQ, an S&P product that is one of the more comprehensive databases for global companies today. In addition to accounting data, it includes market data and corporate governance data on individual companies, as well as an easy interface for screening and downloading data.
My focus in data analysis is to consolidate the data into a form that is not only less overwhelming but also more usable in valuation and corporate finance endeavors. To that end, I compute averages on key statistics (profitability measures, risk measures and financial leverage measures) across industries and geographical groupings. I also use the raw data to put my spin on corporate finance measures (cost of capital, excess returns) for individual companies.
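To make the idea of an industry average concrete, here is a minimal Python sketch, not the method behind my datasets, that computes return on equity by industry in two ways: as a simple average of each firm's ROE and as an aggregate ratio (total net income divided by total book equity). The `Company` record, its field names, the sample firms and the choice to skip firms with non-positive book equity are all hypothetical assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Company:
    # Hypothetical record layout; real vendor data will have different fields.
    name: str
    industry: str
    net_income: float
    book_equity: float


def industry_roe(companies):
    """Return {industry: (simple_average_roe, aggregate_roe)}.

    simple_average_roe: mean of each firm's own net income / book equity.
    aggregate_roe: industry-wide net income / industry-wide book equity.
    Firms with zero or negative book equity are skipped (an assumed filter).
    """
    groups = defaultdict(list)
    for c in companies:
        if c.book_equity > 0:
            groups[c.industry].append(c)

    result = {}
    for industry, firms in groups.items():
        simple = sum(f.net_income / f.book_equity for f in firms) / len(firms)
        aggregate = sum(f.net_income for f in firms) / sum(f.book_equity for f in firms)
        result[industry] = (simple, aggregate)
    return result


if __name__ == "__main__":
    # Made-up firms, for illustration only.
    sample = [
        Company("A", "Software", 50.0, 200.0),
        Company("B", "Software", -10.0, 100.0),
        Company("C", "Utilities", 30.0, 300.0),
        Company("D", "Utilities", 20.0, 250.0),
    ]
    for industry, (simple, aggregate) in industry_roe(sample).items():
        print(f"{industry}: simple avg ROE {simple:.2%}, aggregate ROE {aggregate:.2%}")
```

The two measures can diverge: in the made-up Software group, the simple average ROE is 7.5% while the aggregate ROE is about 13.3%, because the aggregate gives more weight to the firm with the larger equity base.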
The why: It is purely self-interest!
While I am gratified that there are some out there who use my data in their analyses, I want to be clear that there is very little that is altruistic about my efforts. So, in case you are curious, here are the reasons why I think that the week that I spend at the start of each year is well spent.
- Anchor Angst: Behavioral economists, starting with Kahneman and Tversky, have noted that investors and analysts look for anchors, starting points for making judgments, when making decisions. They also noted that these anchors are often either skewed (by an investor's own experiences and history) or based on fiction, leading to bad decisions. So, what is a low PE ratio in today’s market, or a high revenue multiple? Rather than make those judgments based on bad information, I find it useful to look at the data each year and let it inform my assessments. It is this theme that I used for my update last year, where I used one of my favorite books/movies, Moneyball, to illustrate the power of data.
- It is a time saver: This may seem like an odd claim to make, after I have spent a week collecting and processing the data, but I am convinced that my efforts during the last week will save me time over the course of the year. As some of you are aware, I not only teach a valuation class but also value companies frequently, both in the context of the class and to satisfy my curiosity. While the starting data for my valuations comes from the company’s financial statements, the key inputs in valuation are often industry-wide risk and profitability measures. The industry averages that I computed this week will often be the numbers that I return to over and over again during the course of this year.
- Go global: It is easy to talk “global”, but it remains true that we are most comfortable staying “local”. This is true not only for investors, who continue to have a home bias (overinvesting in their domestic markets), but also for businesses and academics. In fact, much of finance research, while paying lip service to the global market, continues to have a US focus. One reason that I have extended and deepened my analysis of global companies over time is to fill in the gaps in my knowledge of listed companies in many of the smaller markets. It is telling that 80% of the time that I spent in the last week was on non-US data, a significant jump from the cursory efforts I made a decade ago when I started reporting global numbers.
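One way to replace a gut-feel anchor with data is to ask where a given PE ratio falls in the distribution of its peers. The Python sketch below is a minimal illustration, not how my datasets are built: it computes a percentile rank, and the function name, the tie convention (ties count as half) and the decision to drop missing or non-positive PE ratios are all assumptions.

```python
from bisect import bisect_left, bisect_right


def pe_percentile_rank(pe, peer_pes):
    """Percentile rank (0-100) of `pe` within `peer_pes`.

    Missing (None) and non-positive PE ratios, e.g. money-losing firms,
    are dropped before ranking (an assumed convention). Ties count as half.
    """
    clean = sorted(p for p in peer_pes if p is not None and p > 0)
    if not clean:
        raise ValueError("no usable peer PE ratios")
    below = bisect_left(clean, pe)            # peers strictly below pe
    ties = bisect_right(clean, pe) - below    # peers equal to pe
    return 100.0 * (below + 0.5 * ties) / len(clean)


if __name__ == "__main__":
    # Hypothetical peer group; None and -5.0 stand in for unusable entries.
    peers = [8.0, 10.0, 12.0, 15.0, 18.0, 22.0, 30.0, None, -5.0]
    for pe in (12.0, 25.0):
        print(f"PE {pe:.0f} sits at the {pe_percentile_rank(pe, peers):.1f}th percentile")
```

With this made-up peer group, a PE of 12 ranks around the 36th percentile and a PE of 25 around the 86th, which is a more defensible answer to "is this PE low?" than a number carried over from memory.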
The what next: Caveat emptor!
If you do decide to download and use any of the data on my website, here are a few things that I hope you will keep in mind:
- Data can be subjective: Contrary to the widely held view that numbers are objective, the statistics that you will see in my datasets reflect my judgments and points of view, some of which you may agree with and some of which you may disagree with, perhaps vehemently. For instance, my estimates of equity risk premiums for individual countries are largely based upon sovereign ratings and CDS spreads, both bond market measures of default risk. Similarly, my estimates of costs of capital for individual companies are built on my estimates of relative risk (beta) for these companies, which are in turn estimated from the sectors that they operate in and their policies on debt.
- Bludgeon, not scalpel: One of the key differences between analyzing one company and trying to assess tens of thousands of companies is that you cannot have much nuance in the estimation approaches that you use for the latter. For example, for an individual company, I will try to estimate the cost of debt based on an actual or synthetic bond rating. With multitudes of companies, I use a much looser approximation, where I tie the cost of debt to the variability in the stock price. Bottom line: If you are valuing an individual company, go to the source (the annual report and financial filings) and not the line for that company in my data set. If you are analyzing an entire sector, you can use my approximated data in your analysis.
- There will be mistakes in the raw data: I am incredibly grateful to Value Line, Bloomberg and S&P for giving me access to the raw data on companies, but there is always potential for human error at the data input stage. While I run my own tests to try to catch data input errors, I will miss a few. Thus, if you find a company in my database that has a return on equity of 20,000% or a PE ratio of 0.1, odds are that there is something wrong in the raw data for that company.
- The outlier conundrum: Even if the raw data is accurate, the ratios and multiples computed from that data can sometimes yield absurd values. Thus, the PE ratio for a company with earnings fading towards zero will head towards infinity. With individual companies, you notice these absurdities and either adjust for them or look for alternative statistics. With large samples, though, that case-by-case oversight is difficult, and while I could have set arbitrary limits (ignoring PE ratios greater than 200, for instance), I was reluctant to put my imprint on the data. So, if you see strange numbers for some statistics, they are what came out of the data.
- The law of large numbers is your ally: The flip side of large samples is a positive one, since outliers have less of an impact on statistics computed over very large samples. Thus, when I compute my averages, I am comforted by knowing that I have hundreds of firms in each sector and that strange numbers from a few companies will have only a small impact on those averages.
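The caveats above can be illustrated with a short Python sketch. It first flags rows that look like data input errors, using hypothetical thresholds (an ROE above 1,000% in absolute value, or a positive PE below 1) chosen only to catch cases like the 20,000% ROE or 0.1 PE ratio mentioned above; flagged rows are meant for review, not deletion. It then shows how one firm with earnings near zero distorts a simple average PE, while the median PE and an aggregate PE (total market value over total earnings) barely move. The function names and sample numbers are made up, and this is not the screening I run on my own data.

```python
from statistics import mean, median


def flag_suspect(rows, max_abs_roe=10.0, min_pe=1.0):
    """Return names of rows that look like data input errors.

    rows: iterable of (name, roe, pe), with ROE as a decimal (2.0 = 200%)
    and pe possibly None. Thresholds are hypothetical defaults:
    |ROE| > 1,000% or 0 < PE < 1.
    """
    return [
        name
        for name, roe, pe in rows
        if abs(roe) > max_abs_roe or (pe is not None and 0 < pe < min_pe)
    ]


def sector_pe_summary(firms):
    """firms: iterable of (market_cap, net_income).

    Uses only firms with positive earnings and returns
    (mean PE, median PE, aggregate PE = total market cap / total earnings).
    """
    positive = [(cap, ni) for cap, ni in firms if ni > 0]
    if not positive:
        raise ValueError("no firms with positive earnings")
    pes = [cap / ni for cap, ni in positive]
    aggregate = sum(cap for cap, _ in positive) / sum(ni for _, ni in positive)
    return mean(pes), median(pes), aggregate


if __name__ == "__main__":
    rows = [("A", 0.15, 12.0), ("B", 200.0, 9.0), ("C", 0.10, 0.1), ("D", -0.05, None)]
    print("Check by hand:", flag_suspect(rows))

    # The last firm has earnings of 0.01, so its PE is roughly 8,000.
    firms = [(100.0, 10.0), (200.0, 20.0), (150.0, 10.0), (80.0, 0.01)]
    avg, med, agg = sector_pe_summary(firms)
    print(f"mean PE {avg:.1f}, median PE {med:.1f}, aggregate PE {agg:.1f}")
```

With these made-up numbers, the mean PE is about 2,009, while the median (12.5) and the aggregate PE (about 13.2) stay close to the typical firm. In a sector with hundreds of firms, the same single outlier would pull even the simple mean far less, since its weight shrinks with the sample size.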
P.S.: As always, there are dozens of links and data sets on my data page, and I am sure that I have screwed up on some of them. If you find any missing links or have issues with the data, please let me know and I will fix them as soon as I can.