Advent of Code 2020, Days 1-5
Each month of 2020 has seem to come with a hobby obsession, and December is no exception - this year I’m resolved to try Advent of Code on each day. I’ve roped myself into “competing” against friends from the R4DS Slack group, which kind of sucks because a lot of them have a time zone advantage on me!
I’d been sharing my code a day at a time, but I think I’m going to re-wrap these posts into five-problem chunks and reflect on them as I go. Days 1-5 have been well within my wheelhouse so far - generalizing the problem to the second step being relatively achievable every time and total solves being under an hour for the most part. I’m currently still hammering everything into a tibble and cleaning it first, which really does help me grok the solutions.
Some statistics on Days 1-5:
## # A tibble: 5 × 6
## day time_1 rank_1 time_2 rank_2 timediff_to_2
## <dbl> <chr> <dbl> <chr> <dbl> <drtn>
## 1 5 07:15:37 26932 07:25:04 25548 9.4 mins
## 2 4 00:19:57 3991 00:30:59 1369 11.0 mins
## 3 3 00:33:36 6826 01:01:46 8007 28.2 mins
## 4 2 07:40:44 35065 07:49:04 33133 8.3 mins
## 5 1 09:26:00 34474 09:31:58 31630 6.0 mins
Favourite solve: hands down, Day 4 - I really liked how easy it was to solve after applying the usual set of data cleaning.
Least favourite solve: Day 5 - I’m jealous of the people who had the experience with binary enough to recognize you could just convert the string to 0s and 1s and thus receive seat numbers and row numbers. I appreciated the practice of writing recursive functions for later, though!
Packages used
For the most part, I’ve used tidyverse pretty darn consistently thus far, who are doing these problems in R as well.
suppressPackageStartupMessages({
library(tidyverse)
library(here)
library(lubridate)
})
Day One
This was a bit of stream of consciousness, and I hadn’t had the experience with AoC enough to realize and be prepared to tackle a second extension problem immediately afterwards - I think you can see this change in my approach to solutions for later days.
— Day 1: Report Repair —
After saving Christmas five years in a row, you’ve decided to take a vacation at a nice resort on a tropical island. Surely, Christmas will go on without you.
The tropical island has its own currency and is entirely cash-only. The gold coins used there have a little picture of a starfish; the locals just call them stars. None of the currency exchanges seem to have heard of them, but somehow, you’ll need to find fifty of these coins by the time you arrive so you can pay the deposit on your room.
To save your vacation, you need to get all fifty stars by December 25th.
Collect stars by solving puzzles. Two puzzles will be made available on each day in the Advent calendar; the second puzzle is unlocked when you complete the first. Each puzzle grants one star. Good luck!
Before you leave, the Elves in accounting just need you to fix your expense report (your puzzle input); apparently, something isn’t quite adding up.
Specifically, they need you to find the two entries that sum to
2020
and then multiply those two numbers together.For example, suppose your expense report contained the following:
1721 979 366 299 675 1456
In this list, the two entries that sum to
2020
are1721
and299
. Multiplying them together produces1721 * 299 = 514579
, so the correct answer is514579
.Of course, your expense report is much larger. Find the two entries that sum to
2020
; what do you get if you multiply them together?
— Data —
https://adventofcode.com/2020/day/1/input
Copy-paste + datapasta’s Fiddle Selection. (Thank you, Miles!)
input_data <- c(
1630, 1801, 1917, 1958, 1953, 1521, 1990, 1959, 1543, 1798,
638, 1499, 1977, 1433, 1532, 1780, 1559, 1866, 1962, 1999,
1623, 1772, 1730, 1670, 1791, 1947, 1961, 1523, 959, 1998,
1693, 1490, 1712, 910, 1635, 1837, 586, 1590, 1741, 1739,
1660, 1883, 1777, 1734, 1413, 1456, 1511, 1957, 1738, 1685,
1677, 1419, 1566, 1639, 1578, 1922, 1856, 1946, 1965, 1649,
1854, 1610, 1806, 1424, 1616, 218, 1678, 1992, 1985, 903,
1626, 1412, 1964, 671, 1692, 1571, 1690, 1587, 1933, 1367,
1585, 1575, 498, 1601, 2005, 1711, 1948, 1991, 1580, 1704,
207, 1560, 1867, 1600, 1594, 1930, 1541, 1832, 1613, 1599,
1757, 71, 1534, 1940, 1982, 1960, 1530, 1908, 1857, 1410,
1987, 1526, 1546, 2002, 1923, 1972, 1752, 1984, 1754, 1916,
1942, 1980, 1608, 1398, 1438, 1955, 1968, 1799, 1976, 1847,
1775, 1904, 1983, 1945, 1554, 1486, 1527, 1884, 1553, 1736,
1561, 1513, 1695, 1431, 1997, 1405, 1872, 1434, 1679, 1609,
105, 1582, 1795, 1826, 1886, 1472, 2007, 1617, 1978, 1669,
1764, 1865, 1773, 1993, 1666, 1583, 2009, 1969, 2001, 1659,
1833, 1713, 1893, 2000, 1520, 1652, 1437, 1556, 1633, 1386,
1819, 1973, 1426, 1975, 2010, 1863, 1593, 1996, 1796, 1986,
1995, 657, 1784, 1644, 1941, 1596, 1849, 1065, 1927, 1525)
— Part One: Find a solution where the sum of two entries equals 2020. —
The first solution that comes to mind is finding the difference each entry and 2020, and then left-joining the entries onto this difference.
pair_2020 <- tibble(expense = input_data) %>%
mutate(difference = 2020 - expense) %>%
left_join(
x = .,
y = .,
by = c("expense"="difference")
) %>%
filter(!is.na(expense.y))
pair_2020
## # A tibble: 2 × 3
## expense difference expense.y
## <dbl> <dbl> <dbl>
## 1 586 1434 1434
## 2 1434 586 586
pair_2020 %>%
mutate(multiply = expense * expense.y) %>%
distinct(multiply) %>%
pull()
## [1] 840324
— Part Two: Find a solution where the sum of three entries equals 2020 —
Okay, I can’t use the first approach here, so now I’ll hit crossing to make a big dataframe, then sum it all afterwards and filter to where sum == 2020.
triple_2020 <- crossing(first = input_data,
second = input_data,
third = input_data) %>%
mutate(total = first + second + third,
product = first * second * third) %>%
filter(total == 2020)
triple_2020
## # A tibble: 6 × 5
## first second third total product
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 207 903 910 2020 170098110
## 2 207 910 903 2020 170098110
## 3 903 207 910 2020 170098110
## 4 903 910 207 2020 170098110
## 5 910 207 903 2020 170098110
## 6 910 903 207 2020 170098110
triple_2020 %>%
distinct(product) %>%
pull()
## [1] 170098110
Day Two
This was a bit of an exercise in parsing, cleaning, and mapping, which was pretty comfortable for me.
— Problem —
To try to debug the problem, they have created a list (your puzzle input) of passwords (according to the corrupted database) and the corporate policy when that password was set.
For example, suppose you have the following list:
- 1-3 a: abcde
- 1-3 b: cdefg
- 2-9 c: ccccccccc
Each line gives the password policy and then the password. The password policy indicates the lowest and highest number of times a given letter must appear for the password to be valid. For example, 1-3 a means that the password must contain a at least 1 time and at most 3 times.
In the above example, 2 passwords are valid. The middle password, cdefg, is not; it contains no instances of b, but needs at least 1. The first and third passwords are valid: they contain one a or nine c, both within the limits of their respective policies.
— Input Data —
input_data <- read.delim("https://github.com/tanho63/advent_of_code/raw/master/2020/day-02.txt",header = FALSE)
— How many passwords are valid?—
password_table_one <- input_data %>%
separate(1,into = c("qty","character","password"),sep = " ") %>%
separate(qty, into = c("min","max"), sep = "-",convert = TRUE) %>%
mutate(character = str_remove(character,":"),
password = str_split(password,""),
count = map2_dbl(password,character,~sum(.x %in% .y)),
valid = count <= max & count >=min)
head(password_table_one)
## min max character password count
## 1 1 4 j j, j, j, q, z, m, g, b, j, w, p, j 5
## 2 2 4 w s, c, k, w, w, f 2
## 3 1 13 b b, c, b, b, b, b, b, b, b, b, b, b, h, b, b 13
## 4 10 11 x x, j, x, x, x, x, x, x, x, x, z, x, x, x 12
## 5 13 14 d d, d, d, d, d, d, d, d, d, d, d, d, d, d 14
## 6 16 18 s k, s, t, t, b, j, s, s, t, p, n, s, v, t, c, j, n, x 4
## valid
## 1 FALSE
## 2 TRUE
## 3 TRUE
## 4 FALSE
## 5 TRUE
## 6 FALSE
sum(password_table_one$valid)
## [1] 418
— Part Two: how many passwords are valid under the second policy?—
While it appears you validated the passwords correctly, they don’t seem to be what the Official Toboggan Corporate Authentication System is expecting.
The shopkeeper suddenly realizes that he just accidentally explained the password policy rules from his old job at the sled rental place down the street! The Official Toboggan Corporate Policy actually works a little differently.
Each policy actually describes two positions in the password, where 1 means the first character, 2 means the second character, and so on. (Be careful; Toboggan Corporate Policies have no concept of “index zero”!) Exactly one of these positions must contain the given letter. Other occurrences of the letter are irrelevant for the purposes of policy enforcement.
Given the same example list from above:
- 1-3 a: abcde is valid: position 1 contains a and position 3 does not.
- 1-3 b: cdefg is invalid: neither position 1 nor position 3 contains b.
- 2-9 c: ccccccccc is invalid: both position 2 and position 9 contain c.
How many passwords are valid according to the new interpretation of the policies?
password_table_two <- input_data %>%
separate(1,into = c("qty","character","password"),sep = " ") %>%
separate(qty, into = c("pos1","pos2"), sep = "-",convert = TRUE) %>%
mutate(character = str_remove(character,":"),
password = str_split(password,""),
password = pmap(list(password,pos1,pos2),~magrittr::extract(..1,c(..2,..3))),
count = map2_dbl(password,character, ~sum(.x %in% .y)),
valid = count == 1)
head(password_table_two)
## pos1 pos2 character password count valid
## 1 1 4 j j, q 1 TRUE
## 2 2 4 w c, w 1 TRUE
## 3 1 13 b b, h 1 TRUE
## 4 10 11 x x, z 1 TRUE
## 5 13 14 d d, d 2 FALSE
## 6 16 18 s j, x 0 FALSE
sum(password_table_two$valid)
## [1] 616
Day Three
I visualized this problem in a tibble format from the start, but instead of going back to the front I visualized binding more columns onto the table, which necessitated a while loop. Once there, it became a simpler subsetting problem. I did get tripped up forgetting to iterate from 1,1 instead of 0,0, and then there were some order of operations problems with going down two rows instead of one, but all in all not a terrible experience solving.
Day three!
— Problem —
You start on the open square (.) in the top-left corner and need to reach the bottom (below the bottom-most row on your map).
The toboggan can only follow a few specific slopes (you opted for a cheaper model that prefers rational numbers); start by counting all the trees you would encounter for the slope right 3, down 1:
From your starting position at the top-left, check the position that is right 3 and down 1. Then, check the position that is right 3 and down 1 from there, and so on until you go past the bottom of the map.
Starting at the top-left corner of your map and following a slope of right 3 and down 1, how many trees would you encounter?
— Input Data —
Converting to a matrix of 1s and 0s, where “#” is 1 and “.” is 0
input_matrix <- read_table("https://github.com/tanho63/advent_of_code/raw/master/2020/day-03.txt",col_names = "x") %>%
mutate(x = str_split(x,"")) %>%
unnest_wider(x,names_sep = "_") %>%
mutate_all(~case_when(.x == "#" ~ 1, .x == "." ~ 0)) %>%
as.matrix()
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## x = col_character()
## )
— Problem 1 —
Starting at the top-left corner of your map and following a slope of right 3 and down 1, how many trees would you encounter?
So the pattern repeats to the right until you get down to the bottom, i.e. until you access row 323 in this case.
input_rows <- nrow(input_matrix)
input_rows
## [1] 323
323 is the goal rows, and that means we’ll need ~ 969 columns. Tackling this via cbind + a while loop.
input_columns <- input_rows * 3
wide_matrix <- input_matrix
while(ncol(wide_matrix) < input_columns) wide_matrix <- cbind(wide_matrix,input_matrix)
Now building an accessor tibble with row numbers and column numbers
row_accessor <- 2:input_rows # start at 1,1 so the next row is 2
column_accessor <- 1:input_rows * 3 + 1 # start at 1,1 and move right three, so next column is 4
column_accessor <- column_accessor[-323] # need the accessors to be same length
indices <- tibble(
row = row_accessor,
column = column_accessor) %>%
mutate(path = map2_dbl(row,column,~wide_matrix[.x,.y]))
head(indices)
## # A tibble: 6 × 3
## row column path
## <int> <dbl> <dbl>
## 1 2 4 1
## 2 3 7 1
## 3 4 10 1
## 4 5 13 1
## 5 6 16 1
## 6 7 19 1
sum(indices$path)
## [1] 280
— Problem 2: Find this path for multiple slopes —
Find the path for
- 1,1
- 3,1
- 5,1
- 7,1
- 1,2
A good time to generalise the previous steps!
fn_slope <- function(right,down,matrix){
# Calculate number of columns
column_count <- (nrow(matrix) * right)
# Create matrix
wide_matrix <- matrix
while(ncol(wide_matrix) < column_count) wide_matrix <- cbind(wide_matrix,matrix)
# Create accessors
row_accessor <- (1:nrow(matrix) * down) + 1
row_accessor <- row_accessor[row_accessor <=nrow(matrix)]
column_accessor <- (1:nrow(matrix) * right) + 1
column_accessor <- column_accessor[1:length(row_accessor)]
x <- tibble(row = row_accessor,
column = column_accessor) %>%
mutate(path = map2_dbl(row,column,~wide_matrix[.x,.y]))
sum(x$path)
}
toboggan_paths <- tibble(right = c(1,3,5,7,1),
down = c(1,1,1,1,2)) %>%
mutate(paths = map2_dbl(right,down,fn_slope,input_matrix))
toboggan_paths
## # A tibble: 5 × 3
## right down paths
## <dbl> <dbl> <dbl>
## 1 1 1 77
## 2 3 1 280
## 3 5 1 74
## 4 7 1 78
## 5 1 2 35
reduce(toboggan_paths$paths,`*`)
## [1] 4355551200
Day Four
This one was immensely satisfying to crack - after applying my tibble cleaning on it, it was super easy to create the filters and counts as necessary, and super fast too! I still don’t consider myself a regex expert, but it turns out I know more than I think.
— Problem Description —
The automatic passport scanners are slow because they’re having trouble detecting which passports have all required fields. The expected fields are as follows:
- byr (Birth Year) - iyr (Issue Year) - eyr (Expiration Year) - hgt (Height) - hcl (Hair Color) - ecl (Eye Color) - pid (Passport ID) - cid (Country ID)
Passport data is validated in batch files (your puzzle input). Each passport is represented as a sequence of key:value pairs separated by spaces or newlines. Passports are separated by blank lines.
Count the number of valid passports - those that have all required fields
. Treat cid as optional. In your batch file, how many passports are valid?
— Input Data —
input_data <- read_lines("https://github.com/tanho63/advent_of_code/raw/master/2020/day-04.txt")
head(input_data)
## [1] "iyr:2015 cid:189 ecl:oth byr:1947 hcl:#6c4ab1 eyr:2026"
## [2] "hgt:174cm"
## [3] "pid:526744288"
## [4] ""
## [5] "pid:688706448 iyr:2017 hgt:162cm cid:174 ecl:grn byr:1943 hcl:#808e9e eyr:2025"
## [6] ""
Cleaning data into a tibble.
df_passports <- tibble(data = input_data) %>%
mutate(blank = ifelse(data == "",1,0),
passport_id = cumsum(blank)+1) %>%
filter(!blank) %>%
select(-blank) %>%
separate_rows(data, sep = " ") %>%
separate(data, sep = ":", into = c("key","value")) %>%
pivot_wider(names_from = key, values_from = value)
head(df_passports)
## # A tibble: 6 × 9
## passport_id iyr cid ecl byr hcl eyr hgt pid
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 2015 189 oth 1947 #6c4ab1 2026 174cm 526744288
## 2 2 2017 174 grn 1943 #808e9e 2025 162cm 688706448
## 3 3 2019 124 oth 1933 #733820 2001 159in 111220591
## 4 4 2026 291 oth 1942 #fffffd 2024 159cm 812929897
## 5 5 2013 83 amb 1974 #ceb3a1 2028 191cm 524032739
## 6 6 <NA> 221 gry 1963 eefed5 2029 183cm 88405792
— Problem 1: How many complete/non-missing passports? —
valid_p1 <- df_passports %>%
select(-cid) %>%
filter_all(~!is.na(.x))
head(valid_p1)
## # A tibble: 6 × 8
## passport_id iyr ecl byr hcl eyr hgt pid
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 2015 oth 1947 #6c4ab1 2026 174cm 526744288
## 2 2 2017 grn 1943 #808e9e 2025 162cm 688706448
## 3 3 2019 oth 1933 #733820 2001 159in 111220591
## 4 4 2026 oth 1942 #fffffd 2024 159cm 812929897
## 5 5 2013 amb 1974 #ceb3a1 2028 191cm 524032739
## 6 7 2018 grn 1923 #18171d 2021 181cm 777881168
nrow(valid_p1)
## [1] 264
— Problem 2: Validate columns —
Time to add more validation beyond non-missing!
valid_p2 <- valid_p1 %>%
mutate_at(c("byr","iyr","eyr"),as.numeric) %>%
filter(
between(byr,1920,2002),
between(iyr,2010,2020),
between(eyr,2020,2030),
case_when(
str_ends(hgt,"cm") & between(parse_number(hgt),150,193) ~ TRUE,
str_ends(hgt,"in") & between(parse_number(hgt),59,76) ~ TRUE,
TRUE ~ FALSE
),
str_detect(hcl,"^#[A-z,0-9]{6}$"),
ecl %in% c("amb","blu","brn","gry","grn","hzl","oth"),
str_detect(pid,"^[0-9]{9}$"),
)
head(valid_p2)
## # A tibble: 6 × 8
## passport_id iyr ecl byr hcl eyr hgt pid
## <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr> <chr>
## 1 1 2015 oth 1947 #6c4ab1 2026 174cm 526744288
## 2 2 2017 grn 1943 #808e9e 2025 162cm 688706448
## 3 5 2013 amb 1974 #ceb3a1 2028 191cm 524032739
## 4 7 2018 grn 1923 #18171d 2021 181cm 777881168
## 5 8 2016 gry 1941 #a5e1b5 2027 178cm 062495008
## 6 10 2020 blu 1925 #888785 2023 188cm 117915262
nrow(valid_p2)
## [1] 224
Day Five
As I said earlier, this one was not bad - I took my time to practice recursion, which came in handy later (on day seven, for example). I’m mostly just upset that I couldn’t recognize you could just convert to binary.
Instead of zones or groups, this airline uses binary space partitioning to seat people. A seat might be specified like FBFBBFFRLR, where F means “front”, B means “back”, L means “left”, and R means “right”.
The first 7 characters will either be F or B; these specify exactly one of the 128 rows on the plane (numbered 0 through 127). Each letter tells you which half of a region the given seat is in. Start with the whole list of rows; the first letter indicates whether the seat is in the front (0 through 63) or the back (64 through 127). The next letter indicates which half of that region the seat is in, and so on until you’re left with exactly one row.
The last three characters will be either L or R; these specify exactly one of the 8 columns of seats on the plane (numbered 0 through 7). The same process as above proceeds again, this time with only three steps. L means to keep the lower half, while R means to keep the upper half.
So, decoding FBFBBFFRLR reveals that it is the seat at row 44, column 5.
Every seat also has a unique seat ID: multiply the row by 8, then add the column. In this example, the seat has ID 44 * 8 + 5 = 357.
— Data —
input_05 <- read_lines("https://github.com/tanho63/advent_of_code/raw/master/2020/day-05.txt")
— Problem 1: highest seat ID? —
A good time to dust off some recursive functions!
search_function <- function(string,range){
midpoint <- length(range)%/%2
accessor <- str_sub(string,1,1)
new_range <- switch(
accessor,
"L" = ,
"F" = range[1:midpoint],
"R" = ,
"B" = range[-(1:midpoint)])
if(length(new_range) == 1) return(new_range)
new_string <- str_remove(string,"^.")
search_function(new_string,new_range)
}
search_function("FBFBBFF",0:127)
## [1] 44
df_boardingpasses <- tibble(input = input_05) %>%
mutate(row_string = str_sub(input,1,7),
seat_string = str_sub(input,-3,-1),
row = map_dbl(row_string,search_function,0:127),
seat = map_dbl(seat_string,search_function,0:7),
seat_id = row * 8 + seat)
head(df_boardingpasses)
## # A tibble: 6 × 6
## input row_string seat_string row seat seat_id
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FFBFBBBRLR FFBFBBB RLR 23 5 189
## 2 BBFFBFFRRR BBFFBFF RRR 100 7 807
## 3 BFFFFBBRRR BFFFFBB RRR 67 7 543
## 4 FFBFFBBLRR FFBFFBB LRR 19 3 155
## 5 FFBFBFBLLL FFBFBFB LLL 21 0 168
## 6 FFBFFBFLRR FFBFFBF LRR 18 3 147
max(df_boardingpasses$seat_id)
## [1] 818
— Problem 2: Your Seat —
It’s a completely full flight, so your seat should be the only missing boarding pass in your list. However, there’s a catch: some of the seats at the very front and back of the plane don’t exist on this aircraft, so they’ll be missing from your list as well.
Your seat wasn’t at the very front or back, though; the seats with IDs +1 and -1 from yours will be in your list.
actual_seats <- df_boardingpasses %>%
arrange(row,seat) %>%
filter(
row != min(row),
row != max(row)
)
x <- seq.int(min(actual_seats$seat_id),max(actual_seats$seat_id))
x[!(x %in% actual_seats$seat_id)]
## [1] 559