Every tax is a pay cut. Every tax cut is a pay raise.
Citizens for Limited Taxation
Commonwealth of Massachusetts Expenditures
Source of Financial Information
How I did this life-sucking project on state finances
I can now tell you that while it does not require knowledge of rocket science or brain surgery, it does require a master's degree in Computer Science to break through the sheer obfuscation in the state budgets for the Commonwealth of Massachusetts. Coincidentally, I have one of those.
I started this project back around December of 2007 and finished the bulk of it by May of 2008.
The primary source of financial information is, obviously, the Massachusetts State Legislature home page, where the budgets are automagically created from all those wheelbarrows of money we send in on April 15th each year. House Speaker DiMasi is the alleged brains behind the outfit.
I converted each of these budget HTML files into a text file and wrote two Perl programs: sec.pl to extract the data and plugin.pl to merge it into an ASCII file, delimited by "|", called mass-expenditures. A sample entry follows:
0320-0001|Supreme Court: chief justices|897209|934978|952518|912413|||||||
The first field is a state-supplied expenditure code, and the second is a short description I wrote after reading the lengthier text in the actual budget. The remaining ten fields hold the values found for each year (in the sample above, only the first four are filled in). After plowing through ten years of state budgets, there were over 2,600 unique expenditure lines each year.
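The original extraction was done in Perl and is not shown here. As an illustration only, the following Python sketch reads one mass-expenditures record back into its parts. It assumes exactly ten yearly columns, a trailing "|" at the end of each record (which is how the sample entry appears to be laid out), and that a blank column simply means no value was recorded for that year; the names `parse_line` and `Expenditure` are hypothetical.

```python
from typing import List, NamedTuple, Optional

NUM_YEARS = 10  # assumption: ten yearly columns per record


class Expenditure(NamedTuple):
    code: str                    # state-supplied code, e.g. "0320-0001"
    description: str             # hand-written short description
    values: List[Optional[int]]  # one amount per year; None where blank


def parse_line(line: str) -> Expenditure:
    """Split one "|"-delimited mass-expenditures record (hypothetical helper)."""
    code, description, *amounts = line.rstrip("\n").split("|")
    # Pad or trim to exactly NUM_YEARS columns; the trim discards the empty
    # field produced by the trailing "|".
    amounts = (amounts + [""] * NUM_YEARS)[:NUM_YEARS]
    values = [int(a) if a.strip() else None for a in amounts]
    return Expenditure(code, description, values)


record = parse_line(
    "0320-0001|Supreme Court: chief justices|897209|934978|952518|912413|||||||"
)
print(record.code, record.values[:5])
# prints: 0320-0001 [897209, 934978, 952518, 912413, None]
```

Keeping blanks as `None` rather than 0 preserves the difference between "no entry that year" and "zero dollars", which matters when cross-checking against the budget pages.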
After generating the data for ten years, I used a simple shell script make-html.sh to sort the data and generate the HTML for ten years of detailed expenditures. The Perl program genhtml.pl was then used to generate the HTML for any given year.
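make-html.sh and genhtml.pl are not reproduced here. As a rough sketch of the same idea, this Python function sorts the records by expenditure code and renders one year's detailed expenditures as an HTML table. The function name `year_table`, the in-memory dictionary layout and the three-column table markup are all assumptions, not the site's actual code or HTML.

```python
import html
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of mass-expenditures:
# code -> (description, list of ten yearly values, None where blank)
Row = Tuple[str, List[Optional[int]]]


def year_table(rows: Dict[str, Row], year_index: int) -> str:
    """Render one year's detailed expenditures as an HTML table, sorted by code."""
    out = ["<table>", "<tr><th>Code</th><th>Expenditure</th><th>Amount</th></tr>"]
    # Codes are fixed-width "NNNN-NNNN", so sorting them as strings
    # gives the same order as sorting numerically.
    for code in sorted(rows):
        description, values = rows[code]
        amount = values[year_index]
        if amount is None:
            continue  # no value recorded for this line in the chosen year
        out.append(
            f"<tr><td>{html.escape(code)}</td>"
            f"<td>{html.escape(description)}</td>"
            f"<td>{amount:,}</td></tr>"
        )
    out.append("</table>")
    return "\n".join(out)


rows = {
    "0320-0001": ("Supreme Court: chief justices",
                  [897209, 934978, 952518, 912413] + [None] * 6),
}
print(year_table(rows, 0))
```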
The bulk of this data extraction was done through several Perl programs I wrote that scanned the web pages, extracted the relevant numbers and merged them into the resulting mass-expenditures file. The hard part was the stare-and-compare phase, where I had to cross-check my values against the web pages. This sucked untold hours out of my miserable life stream.
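How plugin.pl actually merges the yearly extractions is not described, so the following is only a Python sketch of the merge step under stated assumptions: each year's extraction is a hypothetical dictionary mapping an expenditure code to a (description, amount) pair, a code missing from a given year gets a blank column, and each output row ends with a trailing "|" to match the sample entry. In the original the description was written by hand; here the first description seen simply stands in for it.

```python
from typing import Dict, List, Tuple

# Hypothetical per-year extraction: code -> (description, amount)
YearData = Dict[str, Tuple[str, int]]


def merge_years(per_year: List[YearData]) -> List[str]:
    """Merge per-year extractions into "|"-delimited mass-expenditures rows."""
    # Every code that appears in any year, in expenditure-code order.
    codes = sorted(set().union(*per_year))
    rows = []
    for code in codes:
        # Stand-in for the hand-written description: first one found.
        description = next(year[code][0] for year in per_year if code in year)
        # One column per year; blank when the code is absent that year.
        cols = [str(year[code][1]) if code in year else "" for year in per_year]
        rows.append("|".join([code, description, *cols]) + "|")
    return rows
```

Fed the four yearly amounts from the sample entry plus six empty years, this reproduces that entry exactly.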
I then sorted this file by expenditure code and removed the expenditure values, leaving only the expense code and descriptor, to create a file called mass-expense-codes. In that file I remapped the 2,600 expenditures and reduced them to about 80 general categories: each line beginning with "#" names the category for the code lines that follow it. For example, the following entries:
# District Attorney
0340-0100|Suffolk District Attorney
# State Police
0340-0101|Suffolk District Attorney: overtime state police
# District Attorney
0340-0114|Suffolk District Attorney: Project Sentry
0340-0200|Middlesex District Attorney
# State Police
0340-0201|Middlesex District Attorney: state police overtime
would generate the following expense code mappings in the file specific-to-general-mapping:
0340-0100 District Attorney
0340-0101 State Police
0340-0114 District Attorney
0340-0200 District Attorney
0340-0201 State Police
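The remapping rule in the Suffolk and Middlesex example can be stated as: a "#" line sets the current category, and every code line after it maps to that category until the next "#" line (which is why 0340-0200 lands under District Attorney). Here is a minimal Python sketch of that rule; the function name `build_mapping` and the choice to reject a code that appears before any "#" line are assumptions, not details from the original Perl.

```python
from typing import Dict, Iterable, Optional


def build_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Turn mass-expense-codes lines into a code -> general category map."""
    mapping: Dict[str, str] = {}
    category: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A "#" line names the category for the code lines that follow.
            category = line[1:].strip()
            continue
        code = line.split("|", 1)[0]
        if category is None:
            raise ValueError(f"code {code} appears before any '#' category line")
        mapping[code] = category
    return mapping


SAMPLE = """\
# District Attorney
0340-0100|Suffolk District Attorney
# State Police
0340-0101|Suffolk District Attorney: overtime state police
# District Attorney
0340-0114|Suffolk District Attorney: Project Sentry
0340-0200|Middlesex District Attorney
# State Police
0340-0201|Middlesex District Attorney: state police overtime""".splitlines()

# Prints the same five "code category" lines as the mapping listed above.
for code, category in sorted(build_mapping(SAMPLE).items()):
    print(code, category)
```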
The 80 or so general categories were chosen mostly by the sheer number of entries that fell into each one. Obviously, there were lots of Health and Education entries.
Another Perl program, mec.pl, would then read the expenditure values and the category remapping file, perform the remapping, add up all the assorted values in each category for each year and cough up an ASCII summary file, general-commonwealth-expenses.
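A minimal Python sketch of the remap-and-total step, not mec.pl itself: each specific code is looked up in the code-to-category mapping and its yearly values are added into that category's running totals. The function names, the "Unmapped" catch-all for codes missing from the mapping, the treatment of blank years as contributing nothing, and the "|"-delimited summary layout are all assumptions; the actual format of general-commonwealth-expenses is not given.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def summarize(expenditures: Dict[str, List[Optional[int]]],
              mapping: Dict[str, str],
              num_years: int = 10) -> Dict[str, List[int]]:
    """Remap each specific code to its general category and total each year."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0] * num_years)
    for code, values in expenditures.items():
        # Hypothetical catch-all so unmapped codes are not silently lost.
        category = mapping.get(code, "Unmapped")
        for year, amount in enumerate(values[:num_years]):
            if amount is not None:  # blank years add nothing
                totals[category][year] += amount
    return dict(totals)


def format_summary(totals: Dict[str, List[int]]) -> List[str]:
    """One "|"-delimited line per category (assumed layout)."""
    return ["|".join([category, *map(str, amounts)])
            for category, amounts in sorted(totals.items())]
```

Routing unknown codes to an explicit bucket makes it easy to spot expenditures that were missed during the remapping, instead of having them quietly drop out of the totals.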
I then manually arranged the entries in this file by general category. For instance, I wanted to clump all the entries having to do with criminal justice together because there seem to be so many of them.
Then, gen_mec_html.pl was used to generate the appropriate HTML, which is the final tally of state expenditures.
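A recap of files and programs:
sec.pl: Perl program that extracts the data from each budget text file
plugin.pl: Perl program that merges the extracted data into mass-expenditures
mass-expenditures: "|"-delimited file of expenditure codes, descriptions and yearly values
make-html.sh: shell script that sorts the data and generates the HTML for ten years of detailed expenditures
genhtml.pl: Perl program that generates the HTML for any given year
mass-expense-codes: expense codes and descriptors grouped under "#" category lines
specific-to-general-mapping: each expense code mapped to its general category
mec.pl: Perl program that applies the remapping and totals each category for each year
general-commonwealth-expenses: ASCII summary file of the general categories
gen_mec_html.pl: Perl program that generates the HTML for the final tally of state expenditures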
Send comments to: firstname.lastname@example.org