If you use these tax rates, the following two references are the most relevant and provide details of the simulation procedure:
You may also want to read the following paper, which shows that the simulated MTRs downloadable from this website are close approximations (indeed, the best available) to tax rates simulated with federal government tax return data. It also shows how to fill in missing values when the simulated tax rate is missing, though I recommend that you run your analysis both with and without this correction ...
The tax rates and some tax laws are described in Graham, John R., and Michael Lemmon, 1998, Measuring Corporate Tax Rates and Tax Incentives: A New Approach, Journal of Applied Corporate Finance 11, 54-65.
See the following reference for guidance on whether to use the BEFORE or AFTER interest expense MTRs (the BEFORE MTRs are used in this paper): Graham, John R., Michael Lemmon, and James Schallheim, 1998, Debt, Leases, Taxes, and the Endogeneity of Corporate Tax Status, Journal of Finance 53, 131-162.
Related papers include:
Please send an email message to email@example.com if you use the tax rates. This will allow me to notify you if the tax rates are updated or modified.
The tax rate file is arranged as follows. The first column has 8 characters. The first 6 characters are the company's CNUM from COMPUSTAT. The seventh and eighth characters are the first two digits of the company's CIC code from Compustat. The eight characters are analogous to a CUSIP. The original idea behind using eight characters was that you could match exactly to each firm's 8-digit CUSIP. Some firms, like GM, have several Compustat 6-digit CNUMs (one for the parent firm, one each for various subsidiaries), so to match precisely, you need to match on more than six digits. Unfortunately, Compustat changes CIC codes so often that using the 8-digit code in my file from, say, 1998 will not match perfectly for 1997, 1996, etc. for some firms. In other words, you may be stuck matching by 6-digit CNUM and hand-checking for possible problems for firms that have subsidiaries or tracking stock. Note: this is not a problem for most firms. Also, the 6-digit/8-digit difference is a potential issue when using ANY Compustat data and is not specific to the tax rates. Note: Thanks to Elaine Harwood for pointing some of this out to me. (Also, see info about column 5 below.)
The second column is the four digit year.
The third column is the simulated corporate MTR based on income BEFORE interest expense has been deducted.
The fourth column is the simulated corporate MTR based on income AFTER interest expense has been deducted.
The fifth column is Permno (and the 6th, 7th, and 8th are other relevant Permnos for this company, if any). I asked a Ph.D. student to match the 8-digit CUSIPs (described above) to Permnos, then to fill in missing Permnos by checking 6-digit CNUMs (as described above), and then to do a hand search to fill in any additional missing Permnos. However, there are still missing values. This feature is fairly new, so if you decide to merge by Permno, you might want to verify the accuracy of some of your matches by hand and/or also use CUSIP/CNUM to match.
The sixth column is GVKey.
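The matching advice above (try the full 8-character code first, then fall back to the 6-character CNUM and hand-check) can be sketched in Python. This is only an illustration: the function names `split_identifier` and `match_firm` are hypothetical, and the identifiers in the usage note are made up rather than taken from the file.

```python
def split_identifier(code: str) -> tuple[str, str]:
    """Split the 8-character code into (6-character CNUM, 2-character CIC prefix)."""
    code = code.strip()
    if len(code) != 8:
        raise ValueError(f"expected 8 characters, got {len(code)}: {code!r}")
    return code[:6], code[6:]


def match_firm(code: str, known_codes: set[str]) -> list[str]:
    """Return the exact 8-character match if there is one.

    Otherwise return every known code that shares the 6-character CNUM.
    More than one candidate (or a candidate with different last two
    characters) is a signal that the match should be checked by hand.
    """
    if code in known_codes:
        return [code]
    cnum, _ = split_identifier(code)
    return sorted(c for c in known_codes if c[:6] == cnum)
```

For example, with the made-up set `{"12345610", "12345620", "65432110"}`, looking up `"12345630"` finds no exact match and returns both `"12345610"` and `"12345620"`, which is the situation (a parent plus subsidiaries, or a changed CIC code) that calls for hand-checking.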
There are 228,764 firm-year tax rate observations from 1980-2017. THEREFORE YOU CANNOT READ THE ENTIRE FILE USING old versions of EXCEL, though new Excel versions can handle it. Note that since the data end in 2017, they do not reflect changes made to the tax code starting in January 2018.
I believe these "updated" tax rates are superior to the original tax rates described in some of the above 1996 references. I do not feel that there are major problems with the originals. I would not say that the originals are technically "wrong", just that they have been improved upon. The "updated" files differ from the original tax rates as follows:
1) A "rolling historical" period is used to calculate the mean and drift of the taxable income forecast (rather than the entire sample period as in the originals). At least three historical observations are required to calculate a tax rate. Most tax rates have more historical data as the first tax rates in the sample are from 1980, while the historical data start as early as 1973. Still, a firm first in existence in 1982 can have a tax rate calculated as early as 1984.
2) The book NOL carryforwards are zeroed out for the "before-financing" tax rates (because historical interest can affect the current, cumulative NOL carryforward). For the prefinancing tax rates, NOL carryforwards are calculated from scratch using COMPUSTAT data, with the first possible historical observation being 1973.
3) The drift of the taxable income forecast is constrained to be nonnegative (versus allowing negative values in the original). See the "Proxies for the MTR" reference. Extraordinary and discontinued items do not affect the drift and volatility calculations (although they did in the original). Note, however, that extraordinary and discontinued items do affect the level of taxable income, and hence the tax bill due.
4) One-third of rental expense is added to the COMPUSTAT "interest expense" figure, to account for the interest implicit in the rental payments, when calculating "before-financing" tax rates. See the Graham, Lemmon, and Schallheim reference.
5) The marginal tax rate is calculated based on adding $10,000 to year t income (rather than $1 million as in the original; note that the original references incorrectly say that $1 was added to year t income). $10,000 is chosen as the smallest dollar amount that is not overwhelmed by rounding in the COMPUSTAT data.
6) The AMT is phased in over a seven-year period to more closely represent tax law. The original tax rates assumed that the full AMT effect was felt in 1987.
7) The tax laws that were instituted in late 1997 (e.g., 2-year carryback and 20-year carryforward) are included in the tax rates for 1998.
8) The 2001 tax rates use a two-year carryback period. In conjunction with an economic stimulus package, tax law changed in 2002, retroactive to 2001, to make the carryback period five years. The 2001 rates use a two-year carryback to reflect the tax law that was in place at the end of 2001, and therefore are the tax rates that management probably had in mind when they made year-end 2001 decisions. The 2002 tax rates include the five-year carryback feature. This tax provision expired starting with 2003 tax rates, and therefore a two-year carryback is in effect again starting in 2003.
Similarly, the 2009 tax rates use a five-year carryback period (because this was a feature of the federal stimulus package). Technically, firms could choose between using the five-year carryback in the 2008 or 2009 tax years, but given that the law was not signed until 2009, I left 2008 with a two-year carryback (under the assumption that management decisions in 2008 were made without knowing that there would eventually be a five-year carryback period retroactive to 2008 tax data).
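The carryback periods described in items 7 and 8 can be summarized as a small lookup. This is a sketch for readers, not the code used to produce the tax rates: the function name `carryback_years` is hypothetical, it covers only 1998-2017, and it assumes that every year not singled out above (including 2010-2017) uses the two-year carryback.

```python
def carryback_years(tax_year: int) -> int:
    """Carryback period (in years) implied by the notes above for a tax-rate year.

    Only 1998-2017 are covered. Years not explicitly discussed in the notes
    are ASSUMED here to use the two-year carryback.
    """
    if not 1998 <= tax_year <= 2017:
        raise ValueError(f"year {tax_year} is outside the range covered by this sketch")
    # 2002 and 2009 use the temporary five-year carryback. 2001 and 2008 stay
    # at two years, even though the law changes were retroactive to those years.
    return 5 if tax_year in (2002, 2009) else 2
```

Keeping 2001 and 2008 at two years mirrors the reasoning above: the rates are meant to reflect the law that management knew about when making decisions in those years.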
One final item of documentation that is true for all the tax rates, but was not printed in some of the references, is how I handle a missing observation that is surrounded by observations with valid data. For example, CNUM=000021105 has data available from 1985-1995, except that 1988 is missing. The 1988 data is filled in with 1987 values but ONLY when calculating tax rates for 1989 through the last year in the sample. No tax rate is calculated for 1988. The 1987 tax rate calculation is influenced by 1988 data only to the extent that a forecast of 1988 is made using the random walk model, using 1987 data as the last historical period.
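The gap-filling rule can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions: the function name `history_through` is hypothetical, each year is reduced to a single number, the values in the usage note are made up, and gaps longer than one year (which the text above does not discuss) are simply carried forward in the same way.

```python
def history_through(data: dict[int, float], year: int) -> dict[int, float]:
    """Return the year-by-year history used when calculating the rate for `year`.

    `data` maps each year with valid data to a value; missing years are absent.
    A missing year inside the history is filled with the prior year's value.
    No history is returned for a year that is itself missing, because no tax
    rate is calculated for such a year.
    """
    if year not in data:
        raise ValueError(f"no tax rate is calculated for {year}: data are missing")
    history: dict[int, float] = {}
    for y in range(min(data), year + 1):
        # Use the observed value if present, otherwise carry the prior year forward.
        history[y] = data[y] if y in data else history[y - 1]
    return history
```

With made-up data for 1985, 1986, 1987, and 1989, the history for 1989 contains a 1988 entry equal to the 1987 value, the history for 1987 contains no 1988 entry at all, and asking for 1988 raises an error.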