Tenets of Wilson: codejam

Showing posts with label codejam. Show all posts

2012-05-30

2012 Google Code Jam, Round 1B

In the second part of Round 1, with 1000 of the qualifiers ineligible due to already advancing, 3273 people scored points. Only the top 1000 would advance. This put the cutoff at 27 points in 1 hour 19 minutes, or 28 points or more.

I first attempted Problem A: Safety in Numbers.This problem is framed as a reality show that is apparently similar to Dancing with the Stars. Here are the bare facts, with a longer description available at the link.

There are N contestants
Each contestant gets a score from the judges from 0 to 100
Based on audience voting, the second part of the score is determined by multiplying the percentage of audience votes received by the total points awarded by the judges.

For example, if there are 2 contestants, scored 20 and 10 by the judges, they will split an additional 30 points based on the share of audience votes, so the final score may fall anywhere from 50-10 to 20-40.

In case of a tie for the lowest score, no one is eliminated that week, otherwise the lowest is eliminated.

The judges' scores are given, the problem is to find the minimum percentage of audience votes required to guarantee the contestant is saved no matter how the rest of the audience votes are distributed. In the 20-10 case above, the answer is

33.33 66.67

because if the first contestant gets 1/3 of the vote, they'll get 10 more points for a total of 30, and the second contestant can't beat him, but can only tie him. The same logic applies for contestant 2. a 2/3 vote would give him 30 and make him unbeatable (but tie-able, this is important to remember but I'll stop saying it now).

I struggled for awhile with this problem, first thinking I just needed to split the audience vote between the each contestant and the worst judged contestant, which works in the 2 contestant case. Another sample set was:
24 30 21
with a stated answer of
34.666667 26.666667 38.666667
This one flustered me because my algorithm said that giving 34.6% of the 75 audience points to #1 yielded a total of 50 points, and giving the other 65.3% to #3 yielded a total of 70, meaning #1 would lose. After a few minutes, I smacked myself in the forehead and realized #1 would be safe, because #2 would be stuck at 30 in that case. This led me to a closer solution, that each contestant needed enough audience votes to get to 1/N of the total points. So for this case with 75 judge points (and 75 audience points), that meant percentages that led to 50 total points would be the correct answer.

There was one more subtlety, what if a contestant already had more than their 1/N share without any audience votes at all? For example:
45 1 1 1
In this case, there are 96 total points, so each one would need 24 based on the previous assumption. But giving 25% to #2 would yield for him 13 total points, over the required 1/N, but #3 and #4 could each get 37.5% and beat him. I modified my algorithm again to ignore any contestants, and their points, if they were already over the 1/N threshold. The rest of the contestants would need to reach, in this case
1/(N-1) * (96 - 45) = 17 points
This yielded an answer of
0 33.3 33.3 33.3
which makes sense. Contestant #1 is already safe, and the others are on even footing, so splitting the points 3 ways keeps them all safe.

I used this algorithm to solve the easy problem, but as it turns out, it wasn't exactly correct. There were more subtleties around having lots of point values near the 1/N cutoff, and still more that were higher than the new cutoff of 1/(N-1) of a lesser value. As a result, my large solution failed. The problem author suggested using a binary search in the post-contest analysis, which was way off from what I did.

I spent lots of time on Problem A, so then decided to go straight for Problem C: Equal Sums, as it was worth a few more points than problem B if solved in full, and I thought I might need them.

This problem has a very simple setup, and a very complex answer. Given a set of integers, find two subsets with the same sum. The small input would be a set of 20 positive integers under 10,000. I actually iterated over all subsets, at least until I found an answer. This meant for each set, I could loop over up to 2^20, or 1,048,576 subsets if none had a matching sum. I wasn't sure that was feasible in the 4 minutes allowed, but it wound up working.

For the large input, the set had 500 integers under 1,000,000,000,000. This meant using the same algorithm was likely not an option, as 2^500 is a very large number of subsets to loop over, and I certainly wouldn't be able to store all of the sums in the case that I had to get all the way to the end and find there were no matches. I did some searching online and found that this is roughly equivalent to an NP-complete problem, the Subset Sum problem. I was very low on time by this point so I just modified my original code to loop over all 2^500 solutions, after all, I had 8 minutes. I ran it, and it finished very quickly, causing me to question if it really did what it was supposed to. I had no time to check as the clock was winding down, so I submitted the large input without looking at it.

Just a minute or so later I found out it was wrong. There was another bug in my code that reads the input, as it can't handle numbers greater than 32 bits. Once again, at least I've found something to fix for next year. If you want an in-depth explanation of the solution to this problem, I highly suggest reading the contest analysis for it, but in a nutshell, because of the size of the numbers and the sheer numbers of available subsets, there must be matching sum subsets, and lots of them. It is actually sufficient to choose random 6-number subsets until two with matching sums are found. It seems crazy, but that's math for you. I've actually gone back to read the "Rigorous proof" section of the analysis several times, and each time it makes a little more sense to me.

In the end, I finished with a paltry 16 points, putting me in 2262nd place for this round. Round 1C was the next day at 4am local time, since the Code Jam tries to scatter the start times to be fair to the whole world. I opted not to wake up for that one, so that ends my Code Jam journey this year.

2012-05-16

2012 Google Code Jam, Round 1A

In the next round of the Google Code Jam this year, 3686 people scored at least some points. Only the top 1000 would advance, though, which took at least 53/100 points and a fast enough time.

I first attacked Problem B. Kingdom Rush. It has a long explanation you can see at the link, but here's a short summary:

There are N levels to finish
Each level can be completed with a 1 star or 2 star rating; if it is previously completed with 1 star, completing it with 2 stars only earns one additional star, and completing it with 1 star is meaningless. Thus, the maximum possible stars to earn is 2N
There is a prerequisite number of stars to get a 1 star rating on a level, and a greater or equal prerequisite to finish the level with 2 stars.

The object is to find the fewest number of runs possible to complete all levels with 2 stars, if it's possible at all. Levels don't have to be completed in any order.

One sample input is:
0 2
0 1
meaning that there are 2 levels, and the first level has a pre-req of 0 for 1 star, and 2 for 2 stars. One example run could be finishing level 2 with 1 star (now you have 1 star), then finishing level 2 with with 2 stars (now you have 2 stars), then finishing level 1 with 2 stars in one single run because the pre-req is only 2 stars, for a total of 3 runs.

An impossible input would be
1 1
1 1
1 1
because, starting with 0 stars, there's no way to finish any level first.
Also impossible would be
0 0
0 0
0 6
because you couldn't get 6 stars before attempting 2 stars on the last level.

My first approach was to try to finish the first possible 2-star level, and if one wasn't reachable with the current level of stars, try the first possible 1 star level. This solved the test data, but my submissions for the small input kept getting rejected, so I knew there must be an additional trick. I tried to come up with solutions that broke my algorithm, and discovered one.
0 1
0 3

With this input, first I'd try all the 2-star values, and fail. Then I'd take the first 1-star value, leaving:
0 1
0 3

The next run would take the first possible 2-star value, at 1.
0 1
0 3

Next, the 2 star search would fail again, and it would take the next 0.
0 1
0 3

Then the 3, for a total of 4 runs.
0 1
0 3

But, I can see a faster way. First, take the last 0
0 1
0 3

Now that I have 1 star, I can complete the first level all at once
0 1
0 3

And with 3 stars, I can complete the second level, for a total of 3 runs
0 1
0 3

Adding the run for the final level lets this one get finished in 3 runs instead of the previous 4. So the flaw in the algorithm was in taking the first available 1-star, when instead I needed to take the 1-star associated with the highest possible 2-star value, in order to leave the others available to possibly finish both stars in one single run. I fixed my algorithm and submitted the small input, which was then correct. I submitted the large input and had to wait until the end to find out it if it was right, which it was.

Next, I attempted Problem A. Password Problem. Again, it has a long explanation, I'll try to boil down to some bullet points.

I have already typed A characters of my password, which is B characters wrong.
The chances I typed each of the A previous characters correctly are given
All future characters will be typed correctly
My choices are

keep typing
backspace 1 to A characters and continue typing
hit enter and start over.

The object is to choose the strategy to minimize the expected number of keystrokes.
Side note: I don't usually like expected value that much.

Here was the contest example:

Suppose my password is "guest" and I have already typed the first two characters, but I had a 40% chance of making a mistake when typing each of them. Then there are four cases:

I typed "gu" without error. This occurs with probability 0.6 * 0.6 = 0.36.
I typed the 'g' correctly but I made a mistake typing the 'u'. Then I have two letters typed still, but the second one is wrong: "gX". (Here, the 'X' character represents a mistyped letter.) This occurs with probability 0.6 * 0.4 = 0.24.
I typed the 'u' correctly but I made a mistake typing the 'g': "Xu". This occurs with probability 0.4 * 0.6 = 0.24.
I made a mistake typing both letters, so I have two incorrect letters: "XX". This occurs with probability 0.4 * 0.4 = 0.16.

I don't know how many mistakes I actually made, but for any strategy, I can calculate the expected number of keys required to use it. This is shown in the table below:

        "gu" "gX" "Xu" "XX" Expected
Probability                            0.36 0.24 0.24 0.16 -
Keystrokes if I keep typing            4      10   10   10 7.84
Keystrokes if I press backspace once   6       6   12   12 8.4
Keystrokes if I press backspace twice  8       8    8    8 8
Keystrokes if I press enter right away 7       7    7    7 7

My first thought here was that I would need to calculate the 2^A possibilities, so if I got character 1 right or wrong, combined with if I got character 2 right or wrong, etc. Since A could be up to 99999 in the problem statement, I knew that wouldn't be feasible. Then I realized it only mattered what the odds were of typing the first x characters correct, and anything after the first wrong character didn't effect the answer to the problem. that is why the columns under "Xu" and "XX" look the same

I treated the "keep typing" case as pressing backspace 0 times, so that it fit more nicely in my code. Now there were only 2 cases to check. Pressing enter right away was easy, the answer was always the same, so the expected value was always B+2, for the first enter, the B characters of the password, and the final enter.

Since the probability could be different for each letter, I stored them in a vector with the intent to multiply the ones I needed. Then I realized that what I really needed was the cumulative probability of getting all letters right up to any given letter, so I stored the values as

p1, p1*p2, p1*p2*p3,...

resulting in a new vector

P(0), P(1), P(2), P(3)...

Now I could figure out the answer in linear time. Letting the number of backspaces be C, and looping over C from 0 to A, the expected value with C backspaces was the product of all of the probabilities of typing characters 0 to A-C correct, times 2C+B-A+1 (C backspaces, C replacement characters, the rest of the password, plus the enter key), plus 1 minus that product times 2C+2B-A+2 ( The 2C+B-A+1 from before, plus another B characters when it is rejected, and the final enter).

So the answer to each case, given A, B, and the vector of p's is:

min(B+2,P(A-C)(2C+B-A+1),(1-P(A-C))(2C+2B-A+2))

for all C 0 to A

(I really need to get LaTeX working for Blogger to write fancier equations.)

I thought I had cleverly solved this one, and submitted the small input. Unfortunately, the large was wrong because I had not planned on reading in 900,000 characters from the input file on a single line, and some code I wrote several years ago broke. This is because each case was input as:
A B
p1 p2 p3 ... pA

Where px is the probability of getting character x correct. The probabilities were in the form x.xxxxxx, so a total of 9 characters each including numbers, decimal point, and space. With A up to 99999, there were nearly 900k characters on one of the lines.

So the takeaway from this round was to fix my input reader. I finished with 43 points, placing at #1463.

On to Round 1B!

2012-04-30

2012 Google Code Jam, Qualification Round

TL;DR version: Once again, I've qualified this year!

Here's last year's Qualification Round post for details on how the Google Code Jam works.

I was incredibly busy during the 25 hours Qualifcation round, and in fact for the third consecutive year I was out of town during the round. The cutoff was 20 points, achievable by solving the small and large inputs for one of the problems. My strategy was thus to log in shortly after the start, about 7pm Friday, and let the ideas roll around in my head all day Saturday so that I could spend a very short time coding during a brief break Saturday afternoon. Since the feedback from the codejam website doesn't reveal success or failure on the large input version until the contest is over, too late to try for more points. Even so, I was in a time crunch and so decided to trust I could get it right. Since I waited until near the end and solved the minimum, my rank was 15,158 out of the 15,193 that qualified. That's way more than the 9000+ that qualified last year. It feels like I just made it, but anyone who could earn enough points could make it. 17,803 earned at least some points, but 2,610 didn't get the 20 required.

I chose Problem B, Dancing With the Googlers, worth a total of 20 points, figuring it's the easiest one that had enough points to allow me to advance.

You're watching a show where Googlers (employees of Google) dance, and then each dancer is given a triplet of scores by three judges. Each triplet of scores consists of three integer scores from 0 to 10 inclusive. The judges have very similar standards, so it's surprising if a triplet of scores contains two scores that are 2 apart. No triplet of scores contains scores that are more than 2 apart.

For example: (8, 8, 8) and (7, 8, 7) are not surprising. (6, 7, 8) and (6, 8, 8) are surprising. (7, 6, 9) will never happen.

The total points for a Googler is the sum of the three scores in that Googler's triplet of scores. The best result for a Googler is the maximum of the three scores in that Googler's triplet of scores. Given the total points for each Googler, as well as the number of surprising triplets of scores, what is the maximum number of Googlers that could have had a best result of at least p?

For example, suppose there were 6 Googlers, and they had the following total points: 29, 20, 8, 18, 18, 21. You remember that there were 2 surprising triplets of scores, and you want to know how many Googlers could have gotten a best result of 8 or better.

With those total points, and knowing that two of the triplets were surprising, the triplets of scores could have been:

10 9 10
6 6 8 (*)
2 3 3
6 6 6
6 6 6
6 7 8 (*)

The cases marked with a (*) are the surprising cases. This gives us 3 Googlers who got at least one score of 8 or better. There's no series of triplets of scores that would give us a higher number than 3, so the answer is 3.

Input

The first line of the input gives the number of test cases, T. T test cases follow. Each test case consists of a single line containing integers separated by single spaces. The first integer will be N, the number of Googlers, and the second integer will be S, the number of surprising triplets of scores. The third integer will be p, as described above. Next will be Nintegers t_i: the total points of the Googlers.

I found this one rather easy, as it was just a quick math problem once I thought about it. In the regular case, seeking a best score of p would mean that the worst possible set of scores is p, p-1, p-1, and in the surprising case, p, p-2, p-2. Since scores couldn't be negative, I had to account for p being 0 or 1 as a special case. So the total for the regular case had to be at least 3p-2, and at least 0. In the surprising case, the total had to be at least 3p-4, and again, at least 0 (realistically, at least 2, with scores 2, 0, 0, or else it wouldn't be surprising).

I simply looped over the listed values, and if one was at least the regular total, it counted. If it wasn't, and S was still at least 1, and it was at least the surprising case total, it counted, and I subtracted 1 from S.

So, for the case of:
3 1 5 15 13 11
N S p t0 t1 t2

There were 3 Dancers to score. 1 had surprising scores (a spread of 2). I was to find how many could have a best score of at least 5. The regular target was 13 (5+4+4), and the surprising target was 11 (5+3+3). So, t0 and t1 qualified as regular, and t2 was the surprising case, and the answer was 4.

Rinse and Repeat for 100 cases and 1 to 100 competitors per case, and I solved the easy and hard versions rather quickly.

I'll post about Round 1A and if necessary, 1B and 1C later.

2011-05-23

2011 Google Code Jam: Round 1

Again, here is the most important detail first: I advanced!

As of this writing I'm sitting at 910th place in Round 1B. Assuming no disqualifications of me or anyone else, that's where I'll stay. As a reminder, Round 1 is split into sub-rounds A, B, and C. Everyone can compete in 1A, and the top 1000 advance. All but those 1000 can compete in 1B, and again, the top 1000 advance. Finally, everyone but the already-advanced 2000 gets one final shot in 1C, and again, the top 1000 advance. This mean there will be 3000 coders in Round 2, and I'm one of them.

As Google noted, Round 1A was brutal. I skipped the lowest-value Problem A out of fear it wouldn't be enough to advance (As it turns out, solving it in 42:01 would have been enough). I also felt like there were some tricky cases in it that made it harder than it should be for the points offered. Problem B also seemed like it could wind up taking too long to run a quickly coded solution. So, I focused in on the highest-value, Problem C.

You are playing a game with a fancy deck of cards. Each card has three bonus numbers: a card bonus c, a score bonus s, and a turn bonus t. Some of the cards start in your hand, while the rest are in a deck on the table. You start with one turn.

On each turn, you can choose any card from your hand and play it. If it has bonus numbers c, s, t, then the following happens:

The card is discarded from your hand, and it can never be used again.

You draw the first c cards from the deck into your hand. If the deck has fewer thanc cards in it, you draw all of them.

Your total score increases by s.

Your number of remaining turns increases by t.

If you do not have any cards in your hand at the start of a turn, then nothing happens on that turn. Your goal is to get as high a score as possible before running out of turns.

For example, suppose your hand and deck contain the following cards:
         +---+---+---+            +---+---+---+
   HAND: | c | s | t |      DECK: | c | s | t |
         +---+---+---+            +---+---+---+
Card #1: | 0 | 0 | 2 |   Card #4: | 1 | 1 | 0 |
Card #2: | 0 | 5 | 0 |   Card #5: | 0 | 1 | 1 |
Card #3: | 2 | 1 | 1 |   Card #6: | 2 | 2 | 0 |
         +---+---+---+            +---+---+---+
The following table shows how you can get a score of 8 from these cards. The first three columns show your hand, the number of turns left, and your score before playing each card, and the final column shows which card to play.
+---------+------------+-------+------+ | Hand | Turns left | Score | Play | +---------+------------+-------+------+ | 1, 2, 3 | 1 | 0 | 1 | | 2, 3 | 2 | 0 | 3 | | 2, 4, 5 | 2 | 1 | 2 | | 4, 5 | 1 | 6 | 5 | | 4 | 1 | 7 | 4 | | 6 | 0 | 8 | - | +---------+------------+-------+------+
As you can see, the card bonuses and turn bonuses allow you to chain together a long sequence of cards before you have to stop.

My first instinct turned out to be correct, but it went south from there. I figured out the best thing to do is play any cards with a turn bonus first. That would certainly have no negative effect on the potential final score. From there I tried varying strategies, from playing the cards with the highest card bonus, to playing the highest score when you're down to your last turn, to a recursive brute force algorithm. None worked for even the small input. I ended with 0 points.

Round 1B turned out much better. I just about jumped out of my chair when I saw Problem A. It asked for a fairly simple RPI calculation. I've written lots of code like this for various personal projects, such as writing an algorithm designed to be "Better than the BCS" and decide the best team in College Football and any sport. At the 32:03 mark, I had full credit for the small and large input. I wondered if that would be enough to advance, but I suspected it might not. I read the next problems and took a quick break to think about which one I could do best.

Problem B looked most plausible to solve quickly, so I went for it.

Last year, several hot dog vendors were lined up along a street, and they had a tricky algorithm to spread themselves out. Unfortunately, the algorithm was very slow and they are still going. All is not lost though! The hot dog vendors have a plan: time to try a new algorithm!

The problem is that multiple vendors might be selling too close to each other, and then they will take each other's business. The vendors can move along the street at 1 meter/second. To avoid interfering with each other, they want to stand so that every pair of them is separated by a distance of at least D meters.

Remember that the street is really long, so there is no danger of running out of space to move in either direction. Given the starting positions of all hot dog vendors, you should find the minimum time they need before all the vendors are separated (each two vendors are at least D meters apart from each other).

My solution wasn't quite optimal, so it worked for the small input, but failed for the large input. In fact, I let my solution on the large input run much longer than the allotted 8 minutes. After an hour, I finally stopped it, still with no solution. My first naive solution was to move in increments of 0.5s, start with the leftmost vendor, and move everyone 0.5m to the left, as long as they would not get closer than D meters to the vendor to the left. In that same 0.5s, I did the same thing starting from the right and looping to the left, moving everyone to the right as long as they wouldn't get too close. I refined the solution a bit before attempting the large input, but still it took too long to run.

With my time expired on Problem B and not enough time to think out a solution on Problem C, I was down to scoreboard watching. I saw my rank drop slowly from 561, down to 999th with 19 minutes to go. When it hit 1009 with 18 minutes to go I knew I had to get out of the house for a bit before I went crazy. I ran 2 quick errands and returned about 45 minutes after the end of the contest, and saw I finished in 910th place, good enough to advance. This happens because Google gives an "optimistic" score during the competition, so anyone who submits a solution for the large input gets credit until the correctness of the solution is revealed at the end. My large input solution for Problem A was correct, and enough other solutions were incorrect to vault me back into the top 1K.

For the 3rd consecutive year I've made the top 3000. I'm hoping I can get one step further this year and hit the top 500 to make it to Round 3, but I'm pretty happy with how far I've gotten already.

2011-05-14

2011 Google Code Jam: Qualification Round

First, the most important detail: I Qualified!

There are 10,565 Qualifiers as of this writing, including - or should that be excluding - at least one above my rank (I started at 9912, now 9911) who has been disqualified.

I was a bit surprised when I logged in. Google changed the format from solving one small and one large input to qualify, to needing 25 points. The first problem was worth 10 and 10, the next two were 10 and 15, and the last was 10 and 20. This meant if I wanted to solve just one, it couldn't be the first. It also meant if I wasn't too confident and I could only manage to solve the small version of the problem, I could pick up 2 more problems and solve their small input as well, and still advance.

I chose Problem B, as it looked fairly simple.

As a wizard, you can invoke eight elements, which are the "base" elements. Each base element is a single character from {Q, W, E, R, A, S, D, F}. When you invoke an element, it gets appended to your element list. For example: if you invoke W and then invoke A, (we'll call that "invoking WA" for short) then your element list will be [W, A].

We will specify pairs of base elements that combine to form non-base elements (the other 18 capital letters). For example, Q and F might combine to form T. If the two elements from a pair appear at the end of the element list, then both elements of the pair will be immediately removed, and they will be replaced by the element they form. In the example above, if the element list looks like [A, Q, F] or [A, F, Q] at any point, it will become [A, T].

We will specify pairs of base elements that are opposed to each other. After you invoke an element, if it isn't immediately combined to form another element, and it is opposed to something in your element list, then your whole element list will be cleared.

So basically, add a letter to a list, then see if it needs to combine with the previous letter to make a new one. If not, loop over the whole list to see if the list needs to be cleared. Looping over the list worried me a bit for the large input, as I would only have 8 minutes to solve it, and I've run out of time before when I wrote a quick and dirty, brute-force solution. Luckily the Qualification Round usually isn't overly complex, so it worked out.

I also solved Problem A on paper, but I didn't have time to actually code, debug, and submit the solution. I did go in and check my solution on Monday and found that sure enough, it worked. I might have been pretty disappointed if the large input on problem B hadn't worked out but I had held back a problem A solution. It runs in in O(n) time, where n is the number of button presses, so I was happy to at least have accomplished one "good" solution.

Now I get up to 3 tries to make the top 3000, next Friday, Saturday, and Sunday. As I'm in the Central Time Zone, so I'll have to be pretty desperate to do Round 1C at 4am.

2011-04-13

2011 Google Code Jam

I've been signed up for a few days now, but I'm very excited about the Google Code Jam. This will be my 3rd time competing. The format has been pretty consistent for the last 3 competitions, with one 24 hour qualification round, in which anyone solving one Small Input and one Large Input problem advances, followed by Round 1, split into 3 sub-rounds. The top 1000 from each sub-round move on to round 2, and if you fail in Round 1A, you can retry in Round 1B and 1C. Round 2 whittles the 3000 down to 500, who all win a coveted Google Code Jam T-shirt. Round 3 takes the 500 down to 25, who win a trip to the Finals.

Here's my Google Code Jam history

2009

Qualification Round

2467th place

Round 1A

1278th, not quite enough to move on

Round 1B

767th, advanced to round 2

Round 2

1055th, 7 points short of moving on

2010

Qualification Round

4156th. I was in the middle of a business trip to London and finished it about 1:15am, so I blame my low finish on that :)

Round 1A

915th, my time saved me on this one. I was stumped after solving problem 1, so I just painfully watched my rank drop and hoped it wouldn't break 1000. If I had been 8:40 later I would have been worse than 1000.

Round 2

1828th. I felt like I should have advanced this year, but I hit too many problems for which I didn't have a good mathematical understanding. At least I solved one to get on the board.

I've done a lot of dynamic programming this year, which I feel has been my weakness in the past. There's usually at least one dynamic programming problem per round, so hopefully that'll give my scores a little bounce. This year, again, I'll be traveling during the Qualification Round, but at least I'll be closer to my own time zone.