Introduction
Unless you spent 2022 on the moon, you’ve heard of Wordle, but just in case you haven’t, here’s the story. You get up to six guesses to identify a secret five-letter word. The game tells you whether each letter of your guess appears in the word and, if it does, whether it’s in the correct place.
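The feedback rule can be sketched in a few lines of R. This is a rough illustration rather than the game's actual code: the function name `wordle_feedback()` and the labels "correct", "present", and "absent" are made up for this example, and it assumes that each letter of the secret word can account for at most one letter of the guess.

```r
# Hypothetical helper, not part of the original analysis.
# Labels each letter of the guess as "correct" (right letter, right place),
# "present" (in the word, wrong place) or "absent" (not in the word).
wordle_feedback <- function(guess, secret) {
  g <- strsplit(tolower(guess), "")[[1]]
  s <- strsplit(tolower(secret), "")[[1]]
  stopifnot(length(g) == length(s))

  result <- rep("absent", length(g))

  # first pass: letters in exactly the right place
  exact <- g == s
  result[exact] <- "correct"

  # secret letters that have not been matched yet
  remaining <- s[!exact]

  # second pass: right letter, wrong place
  # (assumption: each remaining secret letter can be used only once)
  for (i in which(!exact)) {
    hit <- match(g[i], remaining)
    if (!is.na(hit)) {
      result[i] <- "present"
      remaining <- remaining[-hit]
    }
  }
  result
}

wordle_feedback("crane", "cider")
# "correct" "present" "absent" "absent" "present"
```

Doing the exact matches first matters when a guess repeats a letter: a letter already matched in place should not also be reported as present elsewhere.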
Data
My Scores
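library(tidyverse) # read_csv(), mutate(), select(), ggplot()
library(vtable)    # st()

# read the first two columns as numbers and skip the third
read_csv("wordle_scores.csv", col_types = "nn-") |>
  mutate(source = "me", puzzle = row_number()) ->
  wordle_scores

# summary statistics, leaving out the helper columns
wordle_scores |>
  select(-source, -puzzle) |>
  st(title = "My Wordle Scores: Summary Statistics")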
My Wordle Scores: Summary Statistics
Variable | N | Mean | Std. Dev. | Min | Pctl. 25 | Pctl. 75 | Max
score | 466 | 4 | 1 | 2 | 3 | 5 | 7
Visualization
First, let’s make one of those trend-and-distribution charts that I love so much. See the post about my Jeopardy! Coryat scores for another one.
For a more detailed comparison against the scores from Twitter, we’ll look at the cumulative distributions.
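As a small worked example of what a cumulative distribution means here, the sketch below uses a handful of made-up scores (not my real data) and base R only. It assumes, as the charts do, that a failed puzzle is recorded as 7.

```r
# Made-up scores for illustration only; 7 stands for a failed puzzle ("X")
scores <- c(3, 4, 4, 5, 2, 4, 3, 7, 4, 5)

# count every possible score from 1 to 7, including those that never occur
counts <- table(factor(scores, levels = 1:7))

# share of puzzles finished in that many guesses or fewer
percentile <- cumsum(counts) / sum(counts)
percentile
#   1   2   3   4   5   6   7
# 0.0 0.1 0.3 0.7 0.9 0.9 1.0
```

Each value answers the question "what fraction of puzzles took this many guesses or fewer?", so the curve always ends at 1.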
library(ggExtra) # ggMarginal() comes from ggExtra

# main plot
(wordle_scores |>
    ggplot() +
    # aesthetic mapping
    aes(x = puzzle, y = score) +
    # visual elements representing the data
    geom_line(colour = "#b4b4b4") +
    geom_smooth(se = FALSE, colour = "black") +
    geom_point(colour = "#dc2828") +
    # scales
    scale_y_continuous(limits = c(7.25, 0.75),
                       breaks = 1:7,
                       labels = c(1:6, "X"),
                       trans = "reverse") +
    scale_x_continuous(expand = c(0, 0), breaks = NULL) +
    # labels
    labs(title = "My Wordle Scores",
         subtitle = "Trend and distribution",
         x = "",
         y = "Score") +
    # theming
    theme_bw() +
    theme(panel.grid.minor = element_blank())) |>
  # add the marginal histogram
  ggMarginal(type = "histogram",
             margins = "y",
             fill = "#b4b4b4",
             yparams = list(bins = 7, center = 0, binwidth = 1))
# combine the scores from twitter with my own
wordle_scores |>
  full_join(twitter_scores) |>
  # count the occurrences of each possible combination of source and score
  group_by(source, score) |>
  count() |>
  ungroup() |>
  complete(source, score, fill = list(n = 0)) |>
  # get the cumulative percentiles for each source
  arrange(source, score) |>
  group_by(source) |>
  mutate(percentile = cumsum(n)/sum(n)) |>
  # add some helper columns for evil secondary axis trickery later
  mutate(axis = case_when(percentile == 1 ~ "n",
                          source == "me" ~ "l",
                          TRUE ~ "r")) |>
  mutate(
    r_label_colour = case_match(axis, "r" ~ "grey30"),
    l_label_colour = case_match(axis, "l" ~ "grey30"),
    r_tick_linetype = case_match(axis, "r" ~ "solid", .default = "blank"),
    l_tick_linetype = case_match(axis, "l" ~ "solid", .default = "blank")) ->
  # need to save this dataframe so we can refer to it within the ggplot call
  temp

temp |>
  ggplot() +
  aes(x = score, y = percentile, fill = source) +
  # visual elements representing the data
  geom_line(linetype = "dotted") +
  geom_point(size = 3, shape = 21, colour = "black") +
  # scales
  scale_x_continuous(breaks = 1:7,
                     labels = c(1:6, "X"),
                     expand = c(0, 0)) +
  ## evil secondary axis trickery part one
  scale_y_continuous(breaks = temp$percentile,
                     labels = scales::label_percent(accuracy = 0.1),
                     limits = c(0, 1),
                     expand = c(0, 0),
                     sec.axis = dup_axis()) +
  scale_fill_manual(values = c("#dc2828", "white")) +
  # labels
  labs(y = "",
       x = "Score",
       title = "My Wordle Scores vs Twitter",
       subtitle = "Cumulative distributions",
       fill = "Source") +
  # theming
  theme_bw() +
  ## evil secondary axis trickery part two
  theme(panel.grid.minor = element_blank(),
        axis.text.y.right = element_text(colour = temp$r_label_colour),
        axis.text.y.left = element_text(colour = temp$l_label_colour),
        axis.ticks.y.right = element_line(linetype = temp$r_tick_linetype),
        axis.ticks.y.left = element_line(linetype = temp$l_tick_linetype))
Observations
I solve fewer puzzles in two or fewer guesses than the people who posted their scores to Twitter, but I do better on the harder words. I suspect this is a result of the bias speculated about above: there are probably many people who tweeted only because they got the puzzle in two guesses.
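To show how a pattern like this reads off two cumulative curves, here is a toy comparison using base R's `ecdf()`. Both score vectors are invented purely for illustration and are not the real data; the names `mine` and `twitter` are just labels for this sketch.

```r
# Two invented sets of scores, for illustration only (7 = failed puzzle)
mine    <- c(3, 3, 4, 4, 4, 4, 5, 5, 3, 4)
twitter <- c(2, 2, 3, 4, 4, 5, 5, 6, 6, 7)

# ecdf() returns a function giving the share of values <= its argument
cdf_mine    <- ecdf(mine)
cdf_twitter <- ecdf(twitter)

# share solved in two or fewer guesses, and in four or fewer
comparison <- data.frame(
  guesses = c(2, 4),
  mine    = cdf_mine(c(2, 4)),
  twitter = cdf_twitter(c(2, 4))
)
comparison
```

In this toy data the invented Twitter scores are ahead at two guesses (0.2 against 0), but the curves cross and `mine` is ahead by four guesses (0.8 against 0.5). A crossover like that is the shape behind "fewer early solves, fewer late struggles".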
References & Further Reading