Advent of Code 2017 in R

I was really impressed by the way Peter Norvig showed his solutions to the 2016 Advent of Code puzzles in one continuous IPython notebook. With my new powers of embedding R code in Org mode, I want to do something similar this year.

Day 0: Preparations

I am very unprepared, but I’m going to pretend I’m not and update this section as I go …

I have only just started learning R, so I’m not an expert on library choices yet. For this article, I just import all of tidyverse because parts of it are needed here and there. The tidyverse library contains, among other things, the magrittr library, which provides basically two things: the %>% pipe operator and the . implicit variable. Even though it’s a tiny library, it can make the code so much easier to read and write! It also includes the stringr, dplyr, tidyr and ggplot2 libraries, whose strongest selling point is that they provide a uniform grammar for talking about strings, data transformation, data cleaning and plotting, respectively.
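To give a quick flavour of that style (a throwaway example of my own, not part of the puzzle code), the pipe threads a value through a chain of calls, and the dot stands in for that value inside a braced expression:

## The same computation, written nested and as a pipeline
round(sqrt(sum(1:10)), 1)
1:10 %>% sum %>% sqrt %>% round(1)

## The dot refers to the value flowing through the pipe
1:10 %>% { sum(.) / length(.) }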

library(tidyverse)
## Note to self: these imports should NOT be needed...
library(tidyr)
library(readr)
library(stringr)
library(purrr)
library(dplyr)
library(ggplot2)
library(httr)
library(testthat)

get_input <- function(day)
      "https://two-wrongs.com/src/aux/aoc17/input%d.txt" %>%
          sprintf(day) %>% GET %>% content %>% str_trim

as_lines <- . %>% textConnection %>% readLines

I also prepare a function to download the input for each day’s puzzle; the httr library provides the GET function. I find this %>%-based magrittr style very clean and easy to read. Without it, I would have written the get_input function as

get_input_nopipe <- function(day) {
    url <- "https://two-wrongs.com/src/aux/aoc17/input%d.txt"
    response <- GET(sprintf(url, day))
    str_trim(content(response))
}

Sure, if you are used to C-like languages, then maybe you even prefer this verbose, nested tangle-of-spaghetti version. But if you have some experience with the Unix shell, or functional programming, you’ll probably see the structure more clearly in the first version.

Day 1: Inverse Captcha

In this puzzle, we get as our input a long string of digits, and then we are supposed to pick all digits that match the next digit in the string (including the last digit, if it matches the first) and return the sum of those. So, e.g. 122311 should return 4, as in 2+1+1.

Solution

offset <- function (v, n) c(tail(v, n), head(v, -n))

test_that("offset works", {
    expect_equal(offset(c(), 2), c())
    expect_equal(offset(c(1), 5), c(1))
    v <- c(5, 2, 9, 4, 1, 6)
    ## Offsetting by -4 should be the same as offsetting by 2
    ## for a vector of length 6
    expect_equal(offset(v, -4), offset(v, 2))
})


repeated <- function(v) which(v == offset(v, 1))
test_that("repeated works", expect_equal(
  repeated(c(1, 2, 2, 3, 1, 1, 5, 1)), c(1, 3, 6)))


solve_captcha <- function (...)
    str_split(list(...), "") %>%
        map(. %>% parse_integer %>% {
            sum(.[repeated(.)])
        }) %>%
        as_vector

test_that("day 1 examples pass", expect_equal(
  solve_captcha("1122", "1111", "1234", "91212129"),
  c(3, 4, 0, 9)))

print(solve_captcha(get_input(1)))

And we appear to have the correct result!

[1] 1141

Reasoning

Table 1: A sloppy illustration of the process

Vector              1  2  2  3  1  1
Shifted vector      2  2  3  1  1  1
Matching elements      =        =  =
Boolean result      F  T  F  F  T  T
which result           2        5  6
Indexes into           v        v  v
Vector again        1  2  2  3  1  1
Sum                    2        1  1

To solve the captcha, I start by splitting the input string(s) into characters and convert each character to a number. Now I want the index for each number that matches the next number.

To get this, I think it will be convenient to do a vectorised comparison (the regular == operator in R) between the vector of numbers and itself, offset by one place (I guess “cycled” or “rolled” one place ahead would be more correct terminology, since the back comes onto the head). However, this offset function is not part of any R library I use, so I had to define it as

offset <- function (v, n) c(tail(v, n), head(v, -n))

The vector of booleans we get from the == operator can be translated into indices with the which function. In other words, the following function returns the indices that contain elements which are repeated.

repeated <- function(v) which(v == offset(v, 1))

Finally, summing the elements referenced by those indices is a piece of cake – we can simply index the vector by the other vector and we get a new vector with only those elements.
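Putting the pieces together on the example from the puzzle description (just an illustration, reusing offset and repeated from above):

digits <- str_split("122311", "")[[1]] %>% parse_integer
digits[repeated(digits)]       ## the digits that equal the digit before them (circularly): 1 2 1
sum(digits[repeated(digits)])  ## 4, as promised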

Day 2: Corruption Checksum

Today we have a bunch of numbers arranged in rows. We need to compute, and I bet you will have to re-read this, the sum over all rows of the difference between the largest and smallest number in each row.
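As a quick sanity check of my reading of the task (an illustration only; the actual solution below works on the raw tab-separated input):

rows <- list(c(5, 1, 9, 5), c(7, 5, 3), c(2, 4, 6, 8))
sum(map_dbl(rows, ~ diff(range(.x))))   ## 8 + 4 + 6 = 18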

Solution

compute_checksum <-  function(document) {
    read_tsv(document, col_names=FALSE) %>%
        group_by(row=rownames(.)) %>%
        gather("col", "val", -row) %>%
        summarise(span=diff(range(val, na.rm=TRUE))) %>%
        summarise(checksum=sum(span)) %>%
        .[["checksum"]]
}

test_that("day 2 example passes", {
    expect_equal(compute_checksum("
5       1       9       5
7       5       3
2       4       6       8"), 18)
})

print(compute_checksum(get_input(2)))

Resulting in

[1] 45351

Reasoning

We can solve this the correct way, but we can also very quickly get a sense of whereabouts the correct answer should be. One of the cool things about R is how easy it is to just look at the numbers, and instantly get an intuitive sense for what they are like.

day2_data <- gather(
    read_tsv(get_input(2), col_names=FALSE) %>%
    mutate(row=as.integer(rownames(.))),
    column, value, -row
)

day2_plot <- ggplot() +
    theme_classic() +
    theme(legend.position="none",
          panel.background=element_blank(),
          plot.background=element_blank()) +
    scale_x_continuous(expand=c(0,0),
                       breaks=seq(0, 6500, 500)) +
    scale_y_continuous(expand=c(0,0)) +
    scale_fill_gradient(low="gray50", high="black")

day2_plot +
    geom_histogram(
        data=day2_data %>% select(row, value),
        aes(x=value, fill=row, group=row),
        breaks=seq(0,6500,100))

[Figure: histogram of all the values in the day 2 spreadsheet]

It looks as though most values hang out below 500. However – that doesn’t help us a whole lot because to solve this problem we need to look only at the extremes of each row.

Spontaneously, it looks like most rows will have a difference of about 5000 between their extremes. Maybe a standard deviation of 500? Just guessing; it’s really hard to tell from this picture. If the guess were true, then we could use the central limit theorem (which says that when you sum up a bunch of random values with mean \(\mu\) and standard deviation \(\sigma\), your result will have an approximately normal distribution with mean \(n\mu\) and standard deviation \(\sigma\sqrt{n}\)) to approximate the answer as

\[N(16 \times 5 000, 500 \times \sqrt{16}) = N(80 000, 2 000)\]

for which a 95% confidence interval would land in the range 76000–84000, a guess which I happen to know is wildly incorrect. (Why do the numbers overall represent such a bad estimate for the answer? Probably because of the skewed distribution with the long tail.) It’s hard to tell from that plot where all rows begin and end.
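Here is that back-of-the-envelope calculation in R, using the guessed mean span of 5000 and standard deviation of 500 for the 16 rows (the numbers are guesses from the plot, nothing more):

n_rows <- 16
total  <- n_rows * 5000
spread <- 500 * sqrt(n_rows)
total + c(-2, 2) * spread   ## roughly the 95% interval: 76000 84000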

Instead, let’s annotate the data with its maximum and minimum across the groups, and plot those ranges instead.

day2_summary <- day2_data %>%
    group_by(row) %>%
    summarise(low=min(value), high=max(value)) %>%
    select(row, low, high) %>%
    arrange(high-low) %>%
    mutate(size_order=as.integer(rownames(.)))

day2_plot +
    geom_tile(data=day2_summary,
              aes(x=size_order,
                  y=(low+high)/2,
                  width=1,
                  height=high-low)) +
    scale_x_continuous("", expand=c(0,0), breaks=c()) +
    scale_y_continuous("value", expand=c(0,0),
                       breaks=seq(0, 6200, 500)) +
    geom_hline(aes(yintercept=3500),
               linetype="dashed", color="chocolate2") +
    geom_hline(aes(yintercept=100),
               linetype="dashed", color="chocolate2")

[Figure: the min to max range of each row, ordered by the size of the range, with dashed lines at 100 and 3500]

This makes it clearer that maybe we’re looking at a real range of 100–3500, and then the central limit theorem gives us the range 50000–58000 (which is actually fairly close to the true value!).

Anyway, just straight up computing the value gives us the correct answer with no problems.

Day 3: Spiral Memory

This puzzle had me stumped for an embarrassingly long time.

Solution

spiral_distance <- Vectorize(function(i) {
    ## Compute how many rows we need to get to i
    rows <- rep(0:ceiling(sqrt(4*i)/4), each=4)
    ## Compute the values of the corners for rows
    corner <- matrix(4*rows^2 + 2*rows*(0:3 - 1),
                     ncol=4, byrow=T)
    distance <- abs(corner - i)
    nearest <- which(distance == min(distance),
                     arr.ind=TRUE)
    ## The distance is 2*r - d, but since we are
    ## 1-indexing we need to subtract by one.
    manhattan <- max(0, 2*(nearest[1,"row"] - 1) -
                        distance[nearest][1] - 1)
    unname(manhattan, force=TRUE)
})

test_that("day 3 examples pass", {
    examples <-  c(1, 12, 23, 1024)
    solutions <-  c(0, 3, 2, 31)
    expect_equal(spiral_distance(examples), solutions)
})

print(spiral_distance(277678))

It is correct!

[1] 475

Reasoning

At some point, I concluded that I couldn’t figure it out in my head; I needed to draw a grid on paper. In doing so, I discovered that if the memory addresses are zero-indexed instead, the index of the four corners of the rectangles formed appear to follow a pattern.

Radius   Upper-right (c=0)   Upper-left (c=1)   Lower-left (c=2)   Lower-right (c=3)
r=0      0                   0                  0                  0
r=1      2                   4                  6                  8
r=2      12                  16                 20                 24
r=3      30                  36                 42                 48

[Figure 3: Two equivalent paths to the same destination]

This is nice, because computing the minimum distance to the centre from any corner is trivial: it’s twice the radius, since we walk the radius once to reach an axis and then the radius again along that axis to the origin. This is also the farthest distance possible given a particular radius.

So if you have the distance to the corner (which we can compute by simply subtracting the memory addresses of the location of the data and the corner) and the radius the corner belongs to, then it’s trivial to compute the distance to the data location by subtracting the two.

corner <- matrix(
    c(0, 0, 0, 0,
      2, 4, 6, 8,
      12, 16, 20, 24,
      30, 36, 42, 48,
      56, 64, 72, 80),
    ncol=4, byrow=T)
print(corner)
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    2    4    6    8
[3,]   12   16   20   24
[4,]   30   36   42   48
[5,]   56   64   72   80

Assuming we have the matrix above, we can compute the distances to all corners for an index, say, 23.

address <- 23

proximity <- abs(corner - address)
print(proximity)
     [,1] [,2] [,3] [,4]
[1,]   23   23   23   23
[2,]   21   19   17   15
[3,]   11    7    3    1
[4,]    7   13   19   25
[5,]   33   41   49   57

We use abs because we’re only interested in the distance, not in which direction the corner lies. We see already which corner is the closest – all we have to do is convince the computer.

nearest <- which(proximity == min(proximity), arr.ind=TRUE)
print(nearest)
     row col
[1,]   3   4

We ask the computer the same thing we asked ourselves. Which corner has the minimum proximity to our location? That’s the nearest one.

distance <- 2*nearest[1,"row"] - proximity[nearest][1] - 2
print(distance)
row
  3

Then computing the distance from our location is just the 2r minus d calculation from before; note that this walkthrough uses zero-indexed addresses, which is why the spiral_distance function above subtracts an extra one when it is fed the puzzle’s 1-indexed addresses. But how do we get the matrix of corners we started from? Well, with a little thinking we can arrive at the conclusion that the matrix is built from the recursive formula

\[ n(r,c) = \begin{cases} 0 & \mathrm{if}\quad{}r = 0 \\ n(r-1,3) + 2r + 2rc & \mathrm{otherwise} \end{cases} \]

Which basically says that for rectangles \(r = 0, 1, 2, 3, \ldots\) and corners \(c = 0, 1, 2, 3\), we can find the address of a corner by taking the base address of the rectangle (its first corner, which sits 2r addresses past the last corner of the previous rectangle) and adding the corner number times the side length 2r.
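A quick sanity check (my own throwaway code, not used in the solution): the recursive definition reproduces the corner table above.

n_corner <- function(r, c)
    if (r == 0) 0 else n_corner(r - 1, 3) + 2*r + 2*r*c

outer(0:3, 0:3, Vectorize(n_corner))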

Shuffling things around, this leads to the closed (i.e. not recursive) formula

\[ n(r, c) = 4r^2 + 2r(c-1) \]

Solving this equation gives us, for any address \(n\), a maximum number of rows

\[ \left \lceil \frac{1 + \sqrt{4n + 1}}{4} \right \rceil \]

Which we simplify a bit (we’re not interested in exact results) and then use to compute the matrix with the closed formula.
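For example (illustration only), the number of rectangles considered for each of the example addresses comes out as:

addresses <- c(1, 12, 23, 1024)
ceiling(sqrt(4 * addresses) / 4)   ## 1 2 3 16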

At this point, we could go even further and eliminate the matrix entirely, by simply calculating which two corners are candidates for being closest and then discarding everything else. However, that is left as an exercise for the reader.

Day 4: High-Entropy Passphrases

Finally! Something I can bang out in five minutes!

Solution

unique_words <- function(str) {
    words <- str_split(str, " ")
    duplicates <- map(words, anyDuplicated)
    ## Return boolean vector which is true only for
    ## elements that are not duplicated
    duplicates == 0
}

count_valid_passphrases <- function(passphrases)
    sum(unique_words(passphrases))

test_that("day 4 examples pass", {
    expect_equal(unique_words("aa bb cc dd ee"), TRUE)
    expect_equal(unique_words("aa bb cc dd aa"), FALSE)
    expect_equal(unique_words("aa bb cc dd aaa"), TRUE)
})

print(count_valid_passphrases(get_input(4) %>% as_lines))
[1] 451

Reasoning

There’s not much to say here. Pull out the words, look for duplicates, check whether the number of duplicates is zero. Count those lines.
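In case anyDuplicated is unfamiliar, this is all it does: it returns the index of the first duplicated element, or zero when there is none, which is why comparing the result against zero marks the valid passphrases.

anyDuplicated(c("aa", "bb", "cc", "dd", "ee"))   ## 0: all words unique
anyDuplicated(c("aa", "bb", "cc", "dd", "aa"))   ## 5: "aa" repeats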

This was, however, the first time I got an incorrect answer! Mostly because I misread the question as “How many invalid passphrases are there?” rather than how many valid.

Day 5: Twisty Trampolines

This day was really fun! I decided early that I wanted a vectorised solution rather than the obvious way, so I had to rewire my brain a bit to get it working correctly.

Solution

This is an optimised version; see the simpler version further down for an equivalent, but theoretically much more pleasing, formulation.

twisty_trampolines <- function(relative) {
    n <- length(relative)

    ## Convert relative jumps to absolute vector indices
    memory <- matrix(relative + 1:n, n, 1)

    ## Initialise program counter
    pc <- matrix(diag(n)[1,])

    ## Preallocate jump matrix
    target <- matrix(FALSE, n, n)

    ## While there are active program counters
    while (any(pc > 0)) {
        ## Boolean vector of active program counters, controls
        ## which values are updated this iteration
        active <- pc > 0

        ## Construct program counter matrix with jump targets
        target[, active] <- outer(1:n, memory[active, 1], `==`)

        ## Increment value in this memory location
        memory[active, 1] <- memory[active, 1] + pc[active, 1]

        ## Compute new program counter locations from jump matrix
        pc[,1] <- target[, active, drop=FALSE] %*% pc[active, 1]
    }

    ## Figure out how much the memory has changed from initial value
    sum(memory - relative - 1:n)
}

test_that("day 5 example passes", {
    expect_equal(twisty_trampolines(c(0,3,0,1,-3)), 5)
})

print(twisty_trampolines(
    get_input(5) %>% as_lines %>% as.integer
))

After a little while (unsurprisingly, this approach is slower than a naïve loop concerned with only one-dimensional values), we get our result.

Reasoning

The first thing I did, before I even decided on an approach, was to convert the relative jump locations to absolute vector indices. It just seemed easier to deal with. I then picked an approach where I represent the pc (program counter, i.e. the index of the current memory address) as a one-hot vector. I had a weak intuition this would let me play to the strengths of R and the built-in vector operations. So when the program starts, the pc is

\[ \left( \begin{array}{cc} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{array} \right) \]

Then, after two very productive decisions, I hit a brick wall. I had no idea how to continue. I sat down and had a Deep Think. How can I compute the new one-hot pc vector based on the memory contents and the old pc? I decided to store the computation of the new pc as a matrix where the columns represent the current pc, and the rows represent the next pc. In other words, if the memory is

\[ \left( \begin{array}{cc} 3 & 1 & 4 & 3 & 2 \end{array} \right)^{\intercal} \]

then the matrix representation of this memory would be

\[ \left( \begin{array}{cc} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right) \]

This representation of the memory has surprisingly many neat properties. The most important one for our purposes is that when this matrix is multiplied by the pc vector, we get the new pc. I want you to pause and think about what just happened. By picking the right data structures, we managed to turn our (rather complicated) problem into “perform a multiplication”. (Generally, before starting to answer a question, it may be helpful to think of alternative ways to ask the same question. Surprisingly often, you just have to ask the question in the right way to get an obvious, simple and clear answer with no effort.)

\[ \left( \begin{array}{cc} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right) \dot{} \left( \begin{array}{cc} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{array} \right) = \left( \begin{array}{cc} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{array} \right) \]
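We can verify this with a throwaway snippet (the values below are the example memory and pc, not puzzle input):

memory <- c(3, 1, 4, 3, 2)
target <- outer(seq_along(memory), memory, `==`)
pc <- c(0, 0, 0, 1, 0)
target %*% pc   ## the new pc: 0 0 1 0 0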

Another neat property of this matrix is that if we want to have multiple parallel program counters, all we have to do is put more “ones” into the pc. We don’t have to change anything in our code, and all simultaneous program counters will Just Work. In a similar fashion, we can also introduce non-deterministic jumps by putting more than one value in a column in the target matrix. (Currently, each memory location contains a jump to a single, specific other memory location. We can also imagine that, for example, when the program counter points to location 7, it can jump to location 3 with a 75% probability, location 10 with a 10% probability, and stay where it is with a 15% probability. The resulting vector will then be the probabilities of the program counter being a particular value. All for free; no changes to the code required. We’re now starting to realise the power of representing things as matrices.)
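For instance, reusing target from the snippet above, starting two program counters at locations 1 and 4 needs nothing new; a single multiplication advances both:

target %*% c(1, 0, 0, 1, 0)   ## both end up at location 3: 0 0 2 0 0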

The target matrix is constructed in R as the equality-based outer product of the memory and its indices.

twisty_trampolines <- function(relative) {
    n <- length(relative)

    ## Convert relative jumps to absolute vector indices
    memory <- matrix(relative + seq(relative), n, 1)

    ## Initialise program counter
    pc <- matrix(diag(n)[1,])

    ## While there are active program counters
    while (any(pc > 0)) {
        ## Construct program counter matrix with jump targets
        ## (memory[, 1] drops the matrix dimensions so outer gives an n-by-n matrix)
        target <- outer(seq(memory), memory[, 1], `==`)

        ## Increment value in this memory location
        memory <- memory + pc

        ## Compute new program counter locations from jump matrix
        pc <- target %*% pc
    }

    ## Figure out how much the memory has changed from initial value
    sum(memory - relative - seq(relative))
}

However, there’s a big problem with this code.

Sure, it’s very general and handles multiple program counters and nondeterministic jump targets … when do we use that? We don’t. This puzzle only has a single program counter and jump target. So only one out of every thousand values we compute will actually be used. That’s bad.

Without loss of generality, we can optimise our code to handle the salient case much faster: we simply refuse to update values that we know will not be used. We compute the boolean vector pc > 0, which is a filter of the active memory locations. We then slice the vectors and matrices in our computations such that only the active memory locations are updated. (Other locations will contain invalid values, but since they’re inactive, that doesn’t bother us. Once they become active, they will be computed correctly again.)

We make sure the target matrix, which is the largest data structure in the function, is pre-allocated at the start of the function, rather than allocated anew each iteration. A micro-optimisation we also perform is that we replace several calls to seq by the constant vector 1:n, which should be slightly faster.

Day 6: Memory Reallocation

In this puzzle we repeatedly take the memory bank with the most blocks, empty it, and hand its blocks out one at a time to the following banks (wrapping around), and we want to know how many such redistribution cycles happen before we reach a configuration we have seen before.

Solution

memory_realloc <- function(memory) {
    n <- length(memory)

    ## TRUE when the current memory configuration does not appear in the history
    unseen <- function(history, memory) {
        history %>%
            group_by(time) %>%
            filter(all(blocks == memory)) %>%
            nrow(.) == 0
    }

    history <- data_frame(time=integer(), bank=integer(), blocks=integer())

    i <- 0
    while (unseen(history, memory)) {
        ## Add this memory to history
        history <- bind_rows(history, data_frame(time=i, bank=seq(memory), blocks=memory))

        ## Empty the fullest bank and hand its blocks out one at a time
        ## to the following banks, wrapping around
        from <- which.max(memory)
        to <- sort(((1:memory[from] + from - 1) %% n) + 1)
        memory[from] <- 0
        memory <- memory + tabulate(to, nbins=n)
        i <- i + 1
    }

    ## Number of redistribution cycles performed before a repeat shows up
    i
}

test_that("day 6 example passes", {
    expect_equal(memory_realloc(c(0, 2, 7, 0)), 5)
})

Reasoning

The heart of the solution is the redistribution step, plus the check against previously seen configurations. Tracing the example input by hand, one redistribution cycle per line (the memory before the cycle, the bank that gets emptied, and the memory after):

0 2 7 0   empty bank 3, hand blocks to banks 4 1 2 3 4 1 2   ->   2 4 1 2
2 4 1 2   empty bank 2, hand blocks to banks 3 4 1 2         ->   3 1 2 3
3 1 2 3   empty bank 1, hand blocks to banks 2 3 4           ->   0 2 3 4
0 2 3 4   empty bank 4, hand blocks to banks 1 2 3 4         ->   1 3 4 1
1 3 4 1   empty bank 3, hand blocks to banks 4 1 2 3         ->   2 4 1 2

After the fifth cycle we are back at 2 4 1 2, a configuration we have already seen, so the answer for the example is 5.
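To connect the trace back to the code, here is the first redistribution step of the example spelled out in R (an illustration of the from/to/tabulate bookkeeping in memory_realloc):

memory <- c(0, 2, 7, 0)
n <- length(memory)
from <- which.max(memory)                           ## bank 3 holds the most blocks
to <- sort(((1:memory[from] + from - 1) %% n) + 1)  ## banks 1 1 2 2 3 4 4
memory[from] <- 0
memory + tabulate(to, nbins = n)                    ## 2 4 1 2, as in the trace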