Tenets of Wilson: Math

Showing posts with label Math. Show all posts

2021-04-01

Opening Day!

1984 Topps - Glossy All-Stars #12 Whitey Herzog

I've always enjoyed one quirk of these All-Star cards, particularly for the managers. This card is from the 1984 set, features the 1983 All-Star game, which the Manager earned by being in the 1982 World series. But it's April, forget the All-Star break, forget the World Series, it's Opening Day.

So how does Whitey Herzog handle Opening Day? Let's see some stats

Year	Team	Result
1990	STL	Win
1989	STL	Loss
1988	STL	Loss
1987	STL	Win
1986	STL	Win
1985	STL	Loss
1984	STL	Win
1983	STL	Loss
1982	STL	Win
1981	STL	Loss
1979	KCR	Win
1978	KCR	Loss
1977	KCR	Win
1976	KCR	Loss
1973	TEX	Loss
Total	STL	5-5
Total	KCR	2-2
Total	TEX	0-1
Overall		7-8

7-8? That's no fun. So let's get creative. Whitey took over midseason as manager 3 times. 1980 for the Cardinals, and 1975 for the Royals, as you might expect from the years listed above. He also managed 4 games for the Angels between managers in 1974. So let's add in Whitey's personal opening days.

Date	Team	Result
1980-06-09	STL	Win
1975-07-25	KCR	Win
1974-06-26	TEX	Loss
Overall		2-1

Note: I'm not totally sure June 26 was his first game of 1974, but it seems to line up with the records of the previous manager, and starts a 2 for 4 stretch, which would be his final record that year.

We're up to 9-9, that's at least a .500 record. Can we get even more creative? How about 1981, which had a strike mid-year and the season was divided into halves. Everyone got a fresh start, a 0-0 record, and new shot at the playoffs, regardless of your previous record. Sounds like an opening day scenario to me. How'd he do in 1981's second opening day, in August?

Date	Team	Result
1981-08-10	STL	Win

There we go! A 10-9 "Opening Day" record, achieved completely legitimately with no funny business.

Opening Days
STL	5-5
KCR	2-1
TEX	0-1
Opening Days	7-8
First Days
STL	1-0
KCR	1-0
CAL	0-1
First Days	2-1
2nd Half
STL	1-0
Total	10-9

Just for fun, here's a video from one of my favorite Whitey Herzog stories. He was interested in becoming commissioner but was passed over for Bart Giamatti, previously president of Yale. Marv joked that if he didn't get it, there'd be an opening at Yale, and Whitey did not appreciate the joke. Seems a bit petty, but it's on the milder side of what we've seen from athletes and managers getting annoyed by reporters.

2016-05-09

Fairfield Football Repack #9

I think I'm about done with these repacks, for two reasons. First, my local Target is running out, and the next closest Target's are just out of my way enough to be annoying. Next, finishing a set via random loose packs is provably hard, which is why good collation in full boxes is extremely helpful in finishing sets. At this point I'm much more likely to get a duplicate to trade, but I'll need a trading partner before I can consider that helpful. Maybe I'll write up a post on the math of random packs one of these days. If you just can't wait for me, read up on the Coupon Collector's Problem,

2013 Score
#68 Matthew Stafford
#116 Matt Cassel
#125 Chandler Jones
#220 London Fletcher
Airmail #252 Robert Griffin III
Franchise #298 Robert Griffin III
Future Franchise #307 DeMarco Murray
2013 Rookie #432 Tyler Bray

Two Matt's don't make a right, I always say. Nor do two RGIIIs for that matter. London Fletcher was one of my favorite Rams, and I guess for awhile I thought he was much older when he played in St. Louis, because I would just be surprised every year he was still in the league. Even today, he's only 40.

2013 Score
#4 Michael Floyd
#12 Tony Gonzalez
#173 Jared Cook
#264 Terrell Suggs

I got another (now former) Ram in Jared Cook. I'll probably follow the players who were St. Louis Rams and hope they do well on their new teams, but I'm not so sure about the team itself in a year or two when a large majority of the players are new to the team in LA, like Jared Goff. It'll be interesting to see who the last St. Louis Ram standing is, in both LA and the NFL as a whole.

2013 NFL Teenymates
Cleveland, Green Bay, Tampa Bay, New York

Green Bay and Tampa were new, so I now need just 3 Running Backs; Seattle, Buffalo, and Houston.

2014 NFL Teenymates
Buffalo, Cleveland, Baltimore, Dallas

Cleveland and Dallas were new in this group, bringing my list down to 7.

2013 NFL Teenymates puzzle

2013 puzzle progress

This puzzle is now dangerously close to finished, with just 3 spots to go. The middle two pieces are the rest of the Teenymates name, and the piece at the bottom is Pittsburgh.

2014 NFL Teenymates puzzle

2014 puzzle progress

New England and New Orleans were new to my puzzle, and are keeping that Miami piece from hanging off in the ether. Nearly complete puzzles are much easier to move around and scan.

NFL Teenymates
	Figures				Puzzle Pieces
Team	QB	RB	WR	OL	1	2	3	4
STL	0	1	1	0	0	1	0	0
SF	0	1	3	0	0	1	1	0
SEA	0	0	0	0	0	1	1	0
AZ	0	1	0	0	0	1	1	0
GB	0	1	0	0	0	2	1	0
CHI	0	2	1	0	0	1	3	0
MIN	0	1	4	0	0	1	4	0
DET	0	1	1	0	0	2	3	0
TB	0	1	1	0	0	1	0	0
ATL	0	2	1	0	0	1	0	0
NO	0	1	2	0	0	1	1	0
CAR	1	1	1	0	0	2	1	0
DAL	0	1	1	0	0	1	3	0
NYG	0	2	4	0	0	1	2	0
WAS	0	1	1	0	0	1	1	0
PHI	0	2	1	0	0	2	0	0
Team	QB	RB	WR	OL	1	2	3	4
NE	0	1	1	0	0	1	1	0
NYJ	0	2	3	0	0	2	2	0
BUF	2	0	3	0	0	1	1	0
MIA	0	1	1	0	0	1	1	0
SD	1	1	1	0	0	1	1	0
OAK	1	2	1	0	0	1	1	0
DEN	1	1	1	0	0	2	0	0
KC	0	1	0	0	0	2	0	0
IND	0	1	1	0	0	1	3	0
HOU	0	0	0	0	0	1	0	0
JAC	0	2	0	0	0	1	4	0
TEN	0	2	2	0	0	1	1	0
PIT	0	2	2	0	0	0	2	0
CIN	0	1	0	0	0	2	1	0
CLE	0	2	1	0	0	1	1	0
BAL	0	1	3	0	0	1	1	0
Piece	QB	RB	WR	OL	1	2	3	4
Tee					0	1	0	0
nyM					0	0	1	0
ates					0	0	1	0

RB Special: Glow-in-the-Dark

WR Special: Monster (x2), Orange

2016-03-14

Pi Day 2016

As a math guy, I should really do more to celebrate Pi Day. It seems to have become more well known in recent years, but in case you don't know, today is 3/14, and Pi is the transcendental number that begins 3.14..., so today has been declared Pi Day. Rather than geek out about it from a math perspective, I've opted for a baseball slant. I haven't really posted anything about it for a few years, since 2013 when I declared it Pete Incaviglia Day, because at the time he was the only P.I. in the major leagues.

1993 Topps - ToppsGold #7 Pete Incaviglia

Just over a month after that Pi Day, we got another P.I. in the majors - Phil Irwin. I don't have any cards of his, but he played with Pittsburgh in 2013 and Texas in 2014, for a total of 8.2 innings. Phil's going to have to put in a little more work to get his name on Pi Day, so Happy Pete Incaviglia Day!

2013-03-14

Pi Day 2013

Last year, I brought you Felix Pie - all 2 cards of his I owned at the time. And yes, I know the pronunciation is not the same as pi, but I'm a simple man, seeking out cheap laughs. This year, I hereby declare.

Pete Incaviglia Day

That's right, Pete Incaviglia. Did you know that - at least according to baseball reference's player listings - he's the only MLB player in history with the initials P.I.? I bet you didn't, nor did you care...nor do you care now. But here's ol' Pete anyway.

1990 Donruss #48 Pete Incaviglia

1991 Donruss #464 Pete Incaviglia

1992 Upper Deck #271 Pete Incaviglia

1993 Topps - ToppsGold #7 Pete Incaviglia

1994 Fleer Ultra #246 Pete Incaviglia

It was easy to choose the scans to use, because these are the only 5 Incaviglia cards I have, though I have multiple copies of the 1991 Donruss.

But wait, there's more.

1997 Topps #550 Pete Incaviglia

I don't own this card, but it won me a few other cards in the 2011 Prime 9 redemption set, when I picked it out based on a small cropped piece.

Now, everyone go enjoy some pie.

2013-03-13

eBay Wins #63

I've been hunting for penny cards again, and this time I came up with 6 1989 Donruss commons.

1989 Donruss
#296 Walt Terrell
#302 Mike Brumley
#308 Ron Robinson
#314 John Shelby
#320 John Farrell
#325 Kevin Bass

I wasn't sure I could even get out a full sentence about these 6 cards. None of them were ever Cardinals, and everyone's in pretty standard late-80s uniforms, except maybe Mike Brumley up there in a brown Padres (presumably batting practice) jersey. Then I noticed that, by simply lining up these 6 non-consecutively numbered cards in order, the left-to-right gradient colors matched up perfectly. If I hadn't already sleeved and filed these away, I would have re-scanned in one long strip, because the Robinson and Shelby would even match up.

I wonder what the odds are of that happening.

When I start wondering about numbers, I can't help myself; I have to try to calculate the answer. To make it straighforward, assume 6 random different cards, and put them in a fixed order. There are 660 cards in 1989 Donruss, and 5 color patterns - Red-Yellow, Yellow-Green, Green-Blue, Blue-Purple, Purple-Red. Assume that exactly 1/5 of cards - or 132 - have each pattern. The color of the first card won't matter. The next card has a 132/659 chance of matching the border. It would be 132 in 660, or 1 in 5, but the first card has been removed from the pool. Assuming that works, the next card has a 132/658 chance of aligning, the fourth card's is 132/657, the fifth's is 132/656, and the sixth should match the first card, so there are 131 of 655 remaining cards that can finish it off.

By my count, that's just a 0.03% chance of this lineup happening. That's enough to make me want to go play the state lottery tonight.

Of course, these cards aren't randomized, they're sorted by number, and Donruss doesn't seem to follow a pattern of colors, so I'd have to enter all of the cards into a table by color and check each possible collection of 6 cards for the gradient match to get the true odds for this set. I'll stick that on my to-do list, somewhere after re-staining my deck, but before solving the world's energy crisis.

eBay Bargain Tracker
Total Cards Bought	2115
Total Spent	$40.02
Per Card	1.892 cents

2012-05-30

2012 Google Code Jam, Round 1B

In the second part of Round 1, with 1000 of the qualifiers ineligible due to already advancing, 3273 people scored points. Only the top 1000 would advance. This put the cutoff at 27 points in 1 hour 19 minutes, or 28 points or more.

I first attempted Problem A: Safety in Numbers.This problem is framed as a reality show that is apparently similar to Dancing with the Stars. Here are the bare facts, with a longer description available at the link.

There are N contestants
Each contestant gets a score from the judges from 0 to 100
Based on audience voting, the second part of the score is determined by multiplying the percentage of audience votes received by the total points awarded by the judges.

For example, if there are 2 contestants, scored 20 and 10 by the judges, they will split an additional 30 points based on the share of audience votes, so the final score may fall anywhere from 50-10 to 20-40.

In case of a tie for the lowest score, no one is eliminated that week, otherwise the lowest is eliminated.

The judges' scores are given, the problem is to find the minimum percentage of audience votes required to guarantee the contestant is saved no matter how the rest of the audience votes are distributed. In the 20-10 case above, the answer is

33.33 66.67

because if the first contestant gets 1/3 of the vote, they'll get 10 more points for a total of 30, and the second contestant can't beat him, but can only tie him. The same logic applies for contestant 2. a 2/3 vote would give him 30 and make him unbeatable (but tie-able, this is important to remember but I'll stop saying it now).

I struggled for awhile with this problem, first thinking I just needed to split the audience vote between the each contestant and the worst judged contestant, which works in the 2 contestant case. Another sample set was:
24 30 21
with a stated answer of
34.666667 26.666667 38.666667
This one flustered me because my algorithm said that giving 34.6% of the 75 audience points to #1 yielded a total of 50 points, and giving the other 65.3% to #3 yielded a total of 70, meaning #1 would lose. After a few minutes, I smacked myself in the forehead and realized #1 would be safe, because #2 would be stuck at 30 in that case. This led me to a closer solution, that each contestant needed enough audience votes to get to 1/N of the total points. So for this case with 75 judge points (and 75 audience points), that meant percentages that led to 50 total points would be the correct answer.

There was one more subtlety, what if a contestant already had more than their 1/N share without any audience votes at all? For example:
45 1 1 1
In this case, there are 96 total points, so each one would need 24 based on the previous assumption. But giving 25% to #2 would yield for him 13 total points, over the required 1/N, but #3 and #4 could each get 37.5% and beat him. I modified my algorithm again to ignore any contestants, and their points, if they were already over the 1/N threshold. The rest of the contestants would need to reach, in this case
1/(N-1) * (96 - 45) = 17 points
This yielded an answer of
0 33.3 33.3 33.3
which makes sense. Contestant #1 is already safe, and the others are on even footing, so splitting the points 3 ways keeps them all safe.

I used this algorithm to solve the easy problem, but as it turns out, it wasn't exactly correct. There were more subtleties around having lots of point values near the 1/N cutoff, and still more that were higher than the new cutoff of 1/(N-1) of a lesser value. As a result, my large solution failed. The problem author suggested using a binary search in the post-contest analysis, which was way off from what I did.

I spent lots of time on Problem A, so then decided to go straight for Problem C: Equal Sums, as it was worth a few more points than problem B if solved in full, and I thought I might need them.

This problem has a very simple setup, and a very complex answer. Given a set of integers, find two subsets with the same sum. The small input would be a set of 20 positive integers under 10,000. I actually iterated over all subsets, at least until I found an answer. This meant for each set, I could loop over up to 2^20, or 1,048,576 subsets if none had a matching sum. I wasn't sure that was feasible in the 4 minutes allowed, but it wound up working.

For the large input, the set had 500 integers under 1,000,000,000,000. This meant using the same algorithm was likely not an option, as 2^500 is a very large number of subsets to loop over, and I certainly wouldn't be able to store all of the sums in the case that I had to get all the way to the end and find there were no matches. I did some searching online and found that this is roughly equivalent to an NP-complete problem, the Subset Sum problem. I was very low on time by this point so I just modified my original code to loop over all 2^500 solutions, after all, I had 8 minutes. I ran it, and it finished very quickly, causing me to question if it really did what it was supposed to. I had no time to check as the clock was winding down, so I submitted the large input without looking at it.

Just a minute or so later I found out it was wrong. There was another bug in my code that reads the input, as it can't handle numbers greater than 32 bits. Once again, at least I've found something to fix for next year. If you want an in-depth explanation of the solution to this problem, I highly suggest reading the contest analysis for it, but in a nutshell, because of the size of the numbers and the sheer numbers of available subsets, there must be matching sum subsets, and lots of them. It is actually sufficient to choose random 6-number subsets until two with matching sums are found. It seems crazy, but that's math for you. I've actually gone back to read the "Rigorous proof" section of the analysis several times, and each time it makes a little more sense to me.

In the end, I finished with a paltry 16 points, putting me in 2262nd place for this round. Round 1C was the next day at 4am local time, since the Code Jam tries to scatter the start times to be fair to the whole world. I opted not to wake up for that one, so that ends my Code Jam journey this year.

2012-05-16

2012 Google Code Jam, Round 1A

In the next round of the Google Code Jam this year, 3686 people scored at least some points. Only the top 1000 would advance, though, which took at least 53/100 points and a fast enough time.

I first attacked Problem B. Kingdom Rush. It has a long explanation you can see at the link, but here's a short summary:

There are N levels to finish
Each level can be completed with a 1 star or 2 star rating; if it is previously completed with 1 star, completing it with 2 stars only earns one additional star, and completing it with 1 star is meaningless. Thus, the maximum possible stars to earn is 2N
There is a prerequisite number of stars to get a 1 star rating on a level, and a greater or equal prerequisite to finish the level with 2 stars.

The object is to find the fewest number of runs possible to complete all levels with 2 stars, if it's possible at all. Levels don't have to be completed in any order.

One sample input is:
0 2
0 1
meaning that there are 2 levels, and the first level has a pre-req of 0 for 1 star, and 2 for 2 stars. One example run could be finishing level 2 with 1 star (now you have 1 star), then finishing level 2 with with 2 stars (now you have 2 stars), then finishing level 1 with 2 stars in one single run because the pre-req is only 2 stars, for a total of 3 runs.

An impossible input would be
1 1
1 1
1 1
because, starting with 0 stars, there's no way to finish any level first.
Also impossible would be
0 0
0 0
0 6
because you couldn't get 6 stars before attempting 2 stars on the last level.

My first approach was to try to finish the first possible 2-star level, and if one wasn't reachable with the current level of stars, try the first possible 1 star level. This solved the test data, but my submissions for the small input kept getting rejected, so I knew there must be an additional trick. I tried to come up with solutions that broke my algorithm, and discovered one.
0 1
0 3

With this input, first I'd try all the 2-star values, and fail. Then I'd take the first 1-star value, leaving:
0 1
0 3

The next run would take the first possible 2-star value, at 1.
0 1
0 3

Next, the 2 star search would fail again, and it would take the next 0.
0 1
0 3

Then the 3, for a total of 4 runs.
0 1
0 3

But, I can see a faster way. First, take the last 0
0 1
0 3

Now that I have 1 star, I can complete the first level all at once
0 1
0 3

And with 3 stars, I can complete the second level, for a total of 3 runs
0 1
0 3

Adding the run for the final level lets this one get finished in 3 runs instead of the previous 4. So the flaw in the algorithm was in taking the first available 1-star, when instead I needed to take the 1-star associated with the highest possible 2-star value, in order to leave the others available to possibly finish both stars in one single run. I fixed my algorithm and submitted the small input, which was then correct. I submitted the large input and had to wait until the end to find out it if it was right, which it was.

Next, I attempted Problem A. Password Problem. Again, it has a long explanation, I'll try to boil down to some bullet points.

I have already typed A characters of my password, which is B characters wrong.
The chances I typed each of the A previous characters correctly are given
All future characters will be typed correctly
My choices are

keep typing
backspace 1 to A characters and continue typing
hit enter and start over.

The object is to choose the strategy to minimize the expected number of keystrokes.
Side note: I don't usually like expected value that much.

Here was the contest example:

Suppose my password is "guest" and I have already typed the first two characters, but I had a 40% chance of making a mistake when typing each of them. Then there are four cases:

I typed "gu" without error. This occurs with probability 0.6 * 0.6 = 0.36.
I typed the 'g' correctly but I made a mistake typing the 'u'. Then I have two letters typed still, but the second one is wrong: "gX". (Here, the 'X' character represents a mistyped letter.) This occurs with probability 0.6 * 0.4 = 0.24.
I typed the 'u' correctly but I made a mistake typing the 'g': "Xu". This occurs with probability 0.4 * 0.6 = 0.24.
I made a mistake typing both letters, so I have two incorrect letters: "XX". This occurs with probability 0.4 * 0.4 = 0.16.

I don't know how many mistakes I actually made, but for any strategy, I can calculate the expected number of keys required to use it. This is shown in the table below:

        "gu" "gX" "Xu" "XX" Expected
Probability                            0.36 0.24 0.24 0.16 -
Keystrokes if I keep typing            4      10   10   10 7.84
Keystrokes if I press backspace once   6       6   12   12 8.4
Keystrokes if I press backspace twice  8       8    8    8 8
Keystrokes if I press enter right away 7       7    7    7 7

My first thought here was that I would need to calculate the 2^A possibilities, so if I got character 1 right or wrong, combined with if I got character 2 right or wrong, etc. Since A could be up to 99999 in the problem statement, I knew that wouldn't be feasible. Then I realized it only mattered what the odds were of typing the first x characters correct, and anything after the first wrong character didn't effect the answer to the problem. that is why the columns under "Xu" and "XX" look the same

I treated the "keep typing" case as pressing backspace 0 times, so that it fit more nicely in my code. Now there were only 2 cases to check. Pressing enter right away was easy, the answer was always the same, so the expected value was always B+2, for the first enter, the B characters of the password, and the final enter.

Since the probability could be different for each letter, I stored them in a vector with the intent to multiply the ones I needed. Then I realized that what I really needed was the cumulative probability of getting all letters right up to any given letter, so I stored the values as

p1, p1*p2, p1*p2*p3,...

resulting in a new vector

P(0), P(1), P(2), P(3)...

Now I could figure out the answer in linear time. Letting the number of backspaces be C, and looping over C from 0 to A, the expected value with C backspaces was the product of all of the probabilities of typing characters 0 to A-C correct, times 2C+B-A+1 (C backspaces, C replacement characters, the rest of the password, plus the enter key), plus 1 minus that product times 2C+2B-A+2 ( The 2C+B-A+1 from before, plus another B characters when it is rejected, and the final enter).

So the answer to each case, given A, B, and the vector of p's is:

min(B+2,P(A-C)(2C+B-A+1),(1-P(A-C))(2C+2B-A+2))

for all C 0 to A

(I really need to get LaTeX working for Blogger to write fancier equations.)

I thought I had cleverly solved this one, and submitted the small input. Unfortunately, the large was wrong because I had not planned on reading in 900,000 characters from the input file on a single line, and some code I wrote several years ago broke. This is because each case was input as:
A B
p1 p2 p3 ... pA

Where px is the probability of getting character x correct. The probabilities were in the form x.xxxxxx, so a total of 9 characters each including numbers, decimal point, and space. With A up to 99999, there were nearly 900k characters on one of the lines.

So the takeaway from this round was to fix my input reader. I finished with 43 points, placing at #1463.

On to Round 1B!

2012-03-14

Pi Day 2012

Since I'm writing this Pi day post in advance, I don't know exactly how many pictures of Felix Pie you've seen today, but I'm going to guess a lot if I know my pun-loving blogger compatriots. Here's one more.

2011 Topps #106 Felix Pie

But wait, there's more than just a pun here. One of my other interests is mathematics. Did you know there's a movement among mathematicians with too much time on their hands to replace pi with tau, which is equal to 2 times pi? No? Yes? Why am I pretending I can hear you? We're going to need another Pie.

2011 Topps - Diamond Anniversary Parallel #106 Felix Pie

Here is one of the main starting points to read about it: Pi Is Wrong! Also one of my favorite math-based web comics, Spiked Math, covered the topic Tuesday. Basically, there are all kinds of reasons that a fundamental constant of 2pi makes more sense than pi. It's kind of like if we measured cars' speeds in half-miles per hour, and called them elims, and Interstate speed limits were 130 or 140 elims per hour instead of 65 or 70 miles per hour. All the math would still work, it might just feel a little off compared to other usages of the mile.

Alright, everyone go enjoy some pie.

2011-12-15

The Pujols Contract, part II

Last week I posted about the Albert Pujols contract when it looked like it was down to the Marlins and Cardinals. I was definitely a little taken aback by the sudden announcement of the Angels contract.

The post centered around a claim that Pujols would be paying way more in state taxes in St. Louis. I got a comment, which is still rare on my blog, asking about the difference from the Angels, so I figured I might as well run the numbers. Here they are.

City	State	Rate	Angels	Income	Tax	Cardinals	Income	Tax
HOU	TX	0.00%				9	12.22	0
MIA	FL	0.00%				4	5.43	0
SEA	WAS	0.00%	10	16.08	0
TB	FL	0.00%	3	4.82	0
TEX	TX	0.00%	10	16.08	0
PHI	PA	3.07%				3	4.07	0.13
PIT	PA	3.07%				6	8.15	0.25
DET	MI	4.35%	7	11.25	0.49	3	4.07	0.18
ARI	AZ	4.54%				3	4.07	0.18
COL	CO	4.63%	3	4.82	0.22	3	4.07	0.19
CHC	IL	5%				9	12.22	0.61
CHW	IL	5%	3	4.82	0.24
BOS	MA	5.30%	3	4.82	0.26
BAL	MD	5.50%	2	3.22	0.18
CIN	OH	5.93%				9	12.22	0.72
CLE	OH	5.93%	6	9.65	0.57
ATL	GA	6%				3	4.07	0.24
KC	MO	6%	3	4.82	0.29	3	4.07	0.24
STL	MO	6%				81	110.00	6.60
MIL	WI	7.75%				6	8.15	0.63
MIN	MN	7.85%	6	9.65	0.76
DC	DC	8.50%				4	5.43	0.46
NYM	NY	8.97%				4	5.43	0.49
NYY	NY	8.97%	6	9.65	0.87
LAA	CA	9.30%	81	130.22	12.11
LAD	CA	9.30%	3	4.82	0.45	7	9.51	0.88
OAK	CA	9.30%	9	14.47	1.35
SD	CA	9.30%	3	4.82	0.45	3	4.07	0.38
SF	CA	9.30%				2	2.72	0.25
TOR	Canada, eh		4	0	0
			158	254	18.22	162	220	12.45
					Profit of LAA over STL			28.22

To deal with the fact that the Angels do travel to Toronto, I just spread the income over just the 158 games in the US. This essentially treats Toronto as the weighted average of the other games. I'm sure it doesn't throw off the end number by much, but I have no idea what those games do to the tax bill.

This contract was also $34 Million more than the reported $220 Million Cardinals offer. If the Cardinals had decided to match it, Albert would have been $3.85 Million better off in St Louis, due to the high tax rates in California (although somewhat alleviated by the games in Texas). In fact, for $249.92 Million, the Cardinals could have had Albert for the same after-state-tax price. That high tax average may come down a bit once Houston moves to the AL West, however, giving him another in-division 0% tax opponent.

In the end, Albert didn't have to look at nickels and dimes, or even hundreds of thousands, to make his decision. He got tens of millions more, and I hope he's happy in Southern California. I also hope the Cardinals are ultimately happy with their decision. Landing Pujols would have certainly cost a lot of money, but would have paid some returns in ticket sales and wins. Hopefully we'll be able to parlay some of that money into 2 or 3 good players to help the team win even more.

2011-12-07

The Pujols Contract

Do you know what makes me mad (with a small m)? Bad Math.

Last night on my way home from class, I heard Cliff Saunders say something provocative to get reactions, as talk radio hosts usually do. He said all things being equal, if he were Pujols, he'd take the Miami offer over the Cardinals, then, as though he could feel 10,000 St. Louisans gasping and yelling "How DARE he?!", he followed with "And let me tell you why. State Taxes. Albert's $220 Million will be about $165 Million after Missouri taxes, and Florida has NO state income tax."

My BS meter immediately went off before even having to calculate the numbers. Some quick math and I discovered Mr. Saunders thinks he's paying 25% state tax in Missouri. I know Missouri basically works out to a flat 6% if you make enough to survive, putting the bill at around $13.2 Million, as opposed to Cliff's $55 Million . His accountant better be worried, Cliff may be coming for the 19% of his income the cheater pocketed last year. To his credit, he did come back from break and apologize for the math error (what I think he really meant was, sorry for making up numbers on the spot to be provocative), and state that the right bill would be around $13 million. So the Cardinals should simply offer $234+ million so that after taxes, it works out the same, right?

Then I started thinking about the funny mental image I get every year I do my state taxes. At least in Missouri, there's a specific section asking if you are a performer of some sort and if any of your income was earned in other states, making me wistfully think of running off to join the circus. My understanding is that this applies to professional athletes as well, as I recall Sports Illustrated throwing out the "Fun Fact" of how much California state tax Alex Rodriguez would pay during his 10-year, $252 Million deal. So, I ran the numbers using next year's schedule as a guide. I know Houston is leaving the NL Central and interleague play changes from year to year, but all I have to go on is the 2012 schedule for now, so let's assume everything is consistent year to year, and ignore anything but the 162 game schedule for simplicity

Assumptions for simplicity:
Pujols will pay the top marginal tax rate on ALL of his money
Tax rates will stay the same for 10 years.
Each regular season game counts for 1/162 of his pay.
The schedule and states of teams stay the same for the next 10 years.

My source for the state tax rates.

City	State	Rate	Marlins	Income	Tax	Cardinals	Income	Tax
HOU	TX	0.00%	3	4.07	0	9	12.22	0
MIA	FL	0.00%	81	110.00	0	4	5.43	0
SEA	WAS	0.00%	0	0	0	0	0	0
TB	FL	0.00%	3	4.07	0	0	0	0
TEX	TX	0.00%	0	0	0	0	0	0
PHI	PA	3.07%	9	12.22	0.38	3	4.07	0.13
PIT	PA	3.07%	3	4.07	0.13	6	8.15	0.25
DET	MI	4.35%	0	0	0	3	4.07	0.18
ARI	AZ	4.54%	4	5.43	0.25	3	4.07	0.18
COL	CO	4.63%	4	5.43	0.25	3	4.07	0.19
CHC	IL	5%	3	4.07	0.20	9	12.22	0.61
CHW	IL	5%	0	0	0	0	0	0
BOS	MA	5.30%	3	4.07	0.22	0	0	0
BAL	MD	5.50%	0	0	0	0	0	0
CIN	OH	5.93%	3	4.07	0.24	9	12.22	0.72
CLE	OH	5.93%	3	4.07	0.24	0	0	0
ATL	GA	6%	9	12.22	0.73	3	4.07	0.24
KC	MO	6%	0	0	0	3	4.07	0.24
STL	MO	6%	3	4.07	0.24	81	110.00	6.60
MIL	WI	7.75%	4	5.43	0.42	6	8.15	0.63
MIN	MN	7.85%	0	0	0	0	0	0
DC	DC	8.50%	9	12.22	1.04	4	5.43	0.46
NYM	NY	8.97%	9	12.22	1.10	4	5.43	0.49
NYY	NY	8.97%	0	0	0	0	0	0
LAA	CA	9.30%	0	0	0	0	0	0
LAD	CA	9.30%	3	4.07	0.38	7	9.51	0.88
OAK	CA	9.30%	0	0	0	0	0	0
SD	CA	9.30%	3	4.07	0.38	3	4.07	0.38
SF	CA	9.30%	3	4.07	0.38	2	2.72	0.25
TOR	Canada, eh		0	0	0	0	0	0
			162	220	6.57	162	220	12.44
					Extra Cost of STL over MIA			5.87

I'm glad neither team is travelling to Toronto. I don't know how that works.

So, all things being equal, the total difference over 10 years to Albert is around $5.87 Million, or $587,000 per year. It will also scale linearly to the actual contract value, so if he signs for $200 million the difference is closer to $533,000 per year. I suppose it's not an insignificant sum, but it's hard to think he'd leave fans that would likely never turn on him, even when he's a 41 year old base-clogger, and the chance to be a Cardinal for life, someone the kids want to see at opening day ceremonies for the next 40-60 years, the next Stan Musial in this town, for under $6 million. Of course, the Cardinals are low-balling him, as it'll mean a lot to ticket sales to have him around for the rest of his career and life, so maybe they could throw in another $600k a year. Add 20 cents to each ticket sold for the next 10 years and they should have it. Look at me, the Great Compromiser.

2011-05-23

2011 Google Code Jam: Round 1

Again, here is the most important detail first: I advanced!

As of this writing I'm sitting at 910th place in Round 1B. Assuming no disqualifications of me or anyone else, that's where I'll stay. As a reminder, Round 1 is split into sub-rounds A, B, and C. Everyone can compete in 1A, and the top 1000 advance. All but those 1000 can compete in 1B, and again, the top 1000 advance. Finally, everyone but the already-advanced 2000 gets one final shot in 1C, and again, the top 1000 advance. This mean there will be 3000 coders in Round 2, and I'm one of them.

As Google noted, Round 1A was brutal. I skipped the lowest-value Problem A out of fear it wouldn't be enough to advance (As it turns out, solving it in 42:01 would have been enough). I also felt like there were some tricky cases in it that made it harder than it should be for the points offered. Problem B also seemed like it could wind up taking too long to run a quickly coded solution. So, I focused in on the highest-value, Problem C.

You are playing a game with a fancy deck of cards. Each card has three bonus numbers: a card bonus c, a score bonus s, and a turn bonus t. Some of the cards start in your hand, while the rest are in a deck on the table. You start with one turn.

On each turn, you can choose any card from your hand and play it. If it has bonus numbers c, s, t, then the following happens:

The card is discarded from your hand, and it can never be used again.

You draw the first c cards from the deck into your hand. If the deck has fewer thanc cards in it, you draw all of them.

Your total score increases by s.

Your number of remaining turns increases by t.

If you do not have any cards in your hand at the start of a turn, then nothing happens on that turn. Your goal is to get as high a score as possible before running out of turns.

For example, suppose your hand and deck contain the following cards:
         +---+---+---+            +---+---+---+
   HAND: | c | s | t |      DECK: | c | s | t |
         +---+---+---+            +---+---+---+
Card #1: | 0 | 0 | 2 |   Card #4: | 1 | 1 | 0 |
Card #2: | 0 | 5 | 0 |   Card #5: | 0 | 1 | 1 |
Card #3: | 2 | 1 | 1 |   Card #6: | 2 | 2 | 0 |
         +---+---+---+            +---+---+---+
The following table shows how you can get a score of 8 from these cards. The first three columns show your hand, the number of turns left, and your score before playing each card, and the final column shows which card to play.
+---------+------------+-------+------+ | Hand | Turns left | Score | Play | +---------+------------+-------+------+ | 1, 2, 3 | 1 | 0 | 1 | | 2, 3 | 2 | 0 | 3 | | 2, 4, 5 | 2 | 1 | 2 | | 4, 5 | 1 | 6 | 5 | | 4 | 1 | 7 | 4 | | 6 | 0 | 8 | - | +---------+------------+-------+------+
As you can see, the card bonuses and turn bonuses allow you to chain together a long sequence of cards before you have to stop.

My first instinct turned out to be correct, but it went south from there. I figured out the best thing to do is play any cards with a turn bonus first. That would certainly have no negative effect on the potential final score. From there I tried varying strategies, from playing the cards with the highest card bonus, to playing the highest score when you're down to your last turn, to a recursive brute force algorithm. None worked for even the small input. I ended with 0 points.

Round 1B turned out much better. I just about jumped out of my chair when I saw Problem A. It asked for a fairly simple RPI calculation. I've written lots of code like this for various personal projects, such as writing an algorithm designed to be "Better than the BCS" and decide the best team in College Football and any sport. At the 32:03 mark, I had full credit for the small and large input. I wondered if that would be enough to advance, but I suspected it might not. I read the next problems and took a quick break to think about which one I could do best.

Problem B looked most plausible to solve quickly, so I went for it.

Last year, several hot dog vendors were lined up along a street, and they had a tricky algorithm to spread themselves out. Unfortunately, the algorithm was very slow and they are still going. All is not lost though! The hot dog vendors have a plan: time to try a new algorithm!

The problem is that multiple vendors might be selling too close to each other, and then they will take each other's business. The vendors can move along the street at 1 meter/second. To avoid interfering with each other, they want to stand so that every pair of them is separated by a distance of at least D meters.

Remember that the street is really long, so there is no danger of running out of space to move in either direction. Given the starting positions of all hot dog vendors, you should find the minimum time they need before all the vendors are separated (each two vendors are at least D meters apart from each other).

My solution wasn't quite optimal, so it worked for the small input, but failed for the large input. In fact, I let my solution on the large input run much longer than the allotted 8 minutes. After an hour, I finally stopped it, still with no solution. My first naive solution was to move in increments of 0.5s, start with the leftmost vendor, and move everyone 0.5m to the left, as long as they would not get closer than D meters to the vendor to the left. In that same 0.5s, I did the same thing starting from the right and looping to the left, moving everyone to the right as long as they wouldn't get too close. I refined the solution a bit before attempting the large input, but still it took too long to run.

With my time expired on Problem B and not enough time to think out a solution on Problem C, I was down to scoreboard watching. I saw my rank drop slowly from 561, down to 999th with 19 minutes to go. When it hit 1009 with 18 minutes to go I knew I had to get out of the house for a bit before I went crazy. I ran 2 quick errands and returned about 45 minutes after the end of the contest, and saw I finished in 910th place, good enough to advance. This happens because Google gives an "optimistic" score during the competition, so anyone who submits a solution for the large input gets credit until the correctness of the solution is revealed at the end. My large input solution for Problem A was correct, and enough other solutions were incorrect to vault me back into the top 1K.

For the 3rd consecutive year I've made the top 3000. I'm hoping I can get one step further this year and hit the top 500 to make it to Round 3, but I'm pretty happy with how far I've gotten already.