Cities with Nice Weather

Mean-variance optimality.
Published

May 12, 2022

Modified

June 1, 2024

Introduction

I live near Toronto. It’s springtime, and currently about 30 °C. In my opinion, Toronto is too hot in the summer and too cold in the winter. I’d like to know which cities have the least deviation from a tolerable average temperature.

Tools

This page was created using Quarto. I’m using the tidyverse for data wrangling, and ggplot2 and ggExtra for plotting.

Data

First, I created a CSV file comprising all the information in the Wikipedia article List of cities by average temperature.

library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(ggExtra)    # ggMarginal()

read_csv("temps.csv", show_col_types = FALSE) ->
  city_temps

head(city_temps)
# A tibble: 6 × 15
  Country City         Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct
  <chr>   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Algeria Algiers     11.2  11.9  12.8  14.7  17.7  21.3  24.6  25.2  23.2  19.4
2 Algeria Tamanrass…  12.8  15    18.1  22.2  26.1  28.9  28.7  28.2  26.5  22.4
3 Algeria Reggane     16    18.2  23.1  27.9  32.2  36.4  39.8  38.4  35.5  29.2
4 Angola  Luanda      26.7  28.5  28.6  28.2  27    23.9  22.1  22.1  23.5  25.2
5 Benin   Cotonou     27.3  28.5  28.9  28.6  27.8  26.5  25.8  25.6  26    26.7
6 Benin   Parakou     26.5  28.7  29.6  29    27.5  26.1  25.1  24.7  25    26.1
# ℹ 3 more variables: Nov <dbl>, Dec <dbl>, Year <dbl>

Each row corresponds to a distinct city. There are two text columns containing each city’s name and country, twelve numeric columns indicating the “averages of the daily highs and lows”1 for each month, and one additional numeric column containing the same figure for the entire year. The units are degrees Celsius. 455 cities are included in the data.

  • 1 This is a bit ambiguous, but no matter. The article also points out that “the actual daytime temperature in a given month will be 3 to 10 °C higher than the temperature listed here, depending on how large the difference between daily highs and lows is.”

    We’ll define the “deviation” mentioned above as the difference between the values recorded for the hottest and coldest months, and the “average” as the value recorded for the whole year overall.

    We’ll ignore any other weather characteristics like humidity, rain, wind, diurnal temperature difference, etc.2

  • 2 This may or may not be reasonable depending on your personal preferences about weather.

    city_temps |>
      rename_with(tolower) |>
      rowwise() |>
      transmute(city,
                avg = year,
                range = max(c_across(jan:dec)) - min(c_across(jan:dec))) ->
      city_temps
    
    head(city_temps)
    # A tibble: 6 × 3
    # Rowwise: 
      city          avg range
      <chr>       <dbl> <dbl>
    1 Algiers      17.4 14   
    2 Tamanrasset  21.7 16.1 
    3 Reggane      28.3 23.8 
    4 Luanda       25.8  6.5 
    5 Cotonou      27.2  3.30
    6 Parakou      26.8  4.9 
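
To make the range calculation concrete, here is a minimal sketch on made-up data. The two cities, their temperatures, and the four month columns are invented for illustration and are not rows from temps.csv. It uses base R only, so it runs without the tidyverse, and it is meant as an alternative view of the rowwise() / c_across() step above rather than a replacement for it.

```r
# Toy data: two made-up cities, NOT rows from temps.csv.
toy <- data.frame(
  city = c("Mildville", "Swingtown"),
  jan  = c(14, -5),
  apr  = c(16,  8),
  jul  = c(19, 24),
  oct  = c(17, 10),
  stringsAsFactors = FALSE
)

month_cols <- c("jan", "apr", "jul", "oct")

# For each row, take the hottest month minus the coldest month.
toy$range <- apply(toy[month_cols], 1, max) - apply(toy[month_cols], 1, min)

toy[c("city", "range")]
```

The city with the flatter year ends up with the smaller range, which is the quantity the rest of the post tries to minimize.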

    Summary Statistics

    Now we can investigate the distribution of each of our two variables.

    Here are the default summaries:

    city_temps |> 
      select(avg, range) |>
      summarize()
              N    Mean   SD     Min    Q1 Median    Q3  Max
    1   avg 455   18.00 8.12   -14.4 12.45   18.6 25.65 30.5
    2 range 455   13.75 9.53     0.7  5.65   12.1 21.00 58.1

    Which cities correspond to the extremes for each variable?

    city_temps |>
      filter(
        # keep rows holding the min or max of either variable
        avg   %in% (city_temps |> pull(avg)   |> range())  |
        range %in% (city_temps |> pull(range) |> range())) |>
      arrange(avg)
    # A tibble: 4 × 3
    # Rowwise: 
      city         avg  range
      <chr>      <dbl>  <dbl>
    1 Gjoa Haven -14.4 42    
    2 Yakutsk     -8.8 58.1  
    3 Honiara     26.5  0.700
    4 Assab       30.5  8.7  

    Let’s see the values for Toronto as a baseline, and save them for later:

    city_temps |>
      filter(city == "Toronto")
    # A tibble: 1 × 3
    # Rowwise: 
      city      avg range
      <chr>   <dbl> <dbl>
    1 Toronto   9.4    26

    city_temps |>
      filter(city == "Toronto") |>
      pull(avg) ->
      toronto_avg
    
    city_temps |>
      filter(city == "Toronto") |>
      pull(range) ->
      toronto_range

    By global standards, Toronto is cool on average, but in keeping with my subjective perception, the deviation from that average over the year is quite large.

    Plots

    Let’s look at a scatter plot with marginal histograms:

    city_temps |>
      ggplot(aes(x = avg, y = range)) +
      geom_point(alpha = 0.33, colour = "#dc2828") +
      geom_vline(xintercept = toronto_avg,
                 linetype = "dashed",
                 alpha = 0.33) +
      geom_hline(yintercept = toronto_range,
                 linetype = "dashed",
                 alpha = 0.33) +
      labs(title = "Average Temperature vs Range by City",
           x = "Average Temperature (°C)",
           y = "Difference Between Hottest and Coldest Months (°C)") +
      theme_bw() ->
      plot
    
    plot |>
      ggMarginal(type = "histogram", fill = "#b4b4b4", size = 10) ->
      plot
    
    plot

    Here Toronto is indicated by the dashed lines.

    We can see there’s a negative association between a city’s average temperature and the range of temperatures experienced there. In particular, there’s a big cluster of very hot cities which have little difference between their hottest and coldest months.
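
One way to put a number on an association like this is a rank correlation. The sketch below is only an illustration on invented values (the vectors toy_avg and toy_range are hypothetical). I haven't reported the coefficient for the real data here, so the same call would need to be run on city_temps$avg and city_temps$range to get it.

```r
# Made-up averages and ranges (°C), NOT taken from temps.csv.
toy_avg   <- c(-10,  0, 10, 20, 28)
toy_range <- c( 40, 30, 22, 12,  3)

# Spearman's rho only looks at ranks, so it measures how consistently
# range falls as the average rises, without assuming a straight line.
rho <- cor(toy_avg, toy_range, method = "spearman")
rho
```

A value near -1 means warmer cities almost always have smaller ranges, and a value near 0 means no consistent ordering.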

    Ten tropical cities fall into both the hottest decile and the least varying decile:

    city_temps |> 
      filter(range < quantile(city_temps$range, 0.1),
             avg   > quantile(city_temps$avg,   0.9)) |>
      select(city)
    # A tibble: 10 × 1
    # Rowwise: 
       city         
       <chr>        
     1 Lodwar       
     2 Palembang    
     3 Pontianak    
     4 Kuala Lumpur 
     5 Malé         
     6 Lanka Colombo
     7 Oranjestad   
     8 Willemstad   
     9 Panama City  
    10 Barranquilla 

    While these cities see very little temperature variation throughout the year, they are much too hot.

    Zooming In

    The area of this plot I’m most interested in is the vertical slice around Toronto. Let’s see the same plot, including only the cities within one degree of Toronto’s average temperature.3 We’ll exclude the marginal histograms but add labels to the cities.

  • 3 I haven’t defined an ideal average temperature, but any city with a similar average and smaller range than Toronto is a clear improvement.

    city_temps |>
      filter(abs(avg - toronto_avg) <= 1) |>
      ggplot(aes(x = avg, y = range, label = city)) +
      geom_point(colour = "#dc2828") +
      geom_text(size = 4, nudge_x = 0.01, hjust = "left") +
      geom_vline(xintercept = toronto_avg,
                 linetype = "dashed",
                 alpha = 0.33) +
      geom_hline(yintercept = toronto_range,
                 linetype = "dashed",
                 alpha = 0.33) +
      labs(title = "Average Temperature vs Range by City (Detail 1)",
           x = "Average Temperature (°C)",
           y = "Difference Between Hottest and Coldest Months (°C)") +
      theme_bw()

    So it seems that La Paz, Edinburgh, or Dublin might be good options.

    But which cities are the best? These would be the ones with the smallest range for a given maximum average. Let’s find them.

    Finding the Cities with the Nicest Weather

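    We want to know, for each maximum average temperature, the city that has the minimum range of temperatures. Put another way: sorting the cities from coldest to warmest, we keep every city whose range is no larger than that of any colder city. These are the cities that form the “bottom-left edge” of our first plot.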

    Nine cities fit this criterion:

    city_temps |>
      arrange(avg) |>
      cbind(city_temps |> arrange(avg) |> pull(range) |> cummin()) |>
      rename(running_min = 4) |>
      filter(range == running_min) |>
      select(city)
            city
    1 Gjoa Haven
    2     Dikson
    3       Nuuk
    4  Reykjavík
    5    Stanley
    6     La Paz
    7      Cusco
    8     Bogotá
    9    Honiara
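
The running-minimum trick used above may be easier to follow on a tiny example. This is a minimal base R sketch with five invented cities (A to E and their values are hypothetical, not from the data set). It follows the same logic as the pipeline above: sort by average, take the cumulative minimum of range, and keep the rows that match it.

```r
# Made-up cities described by (avg, range); NOT the real data.
toy <- data.frame(
  city  = c("A", "B", "C", "D", "E"),
  avg   = c(-5,  2,  8, 15, 25),
  range = c(30, 35, 12, 20,  4),
  stringsAsFactors = FALSE
)

# Sort from coldest to warmest, then track the smallest range seen so far.
toy <- toy[order(toy$avg), ]
toy$running_min <- cummin(toy$range)

# A city is on the "bottom-left edge" when it sets (or ties) the running minimum.
frontier <- toy$city[toy$range == toy$running_min]
frontier
```

Here B and D drop out because a colder city already has a smaller range than they do.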

    Of these, the first two have temperatures which are more variable than Toronto, so we can remove them from consideration.

    Let’s plot the final seven candidates:

    city_temps |>
      arrange(avg) |>
      cbind(city_temps |> arrange(avg) |> pull(range) |> cummin()) |>
      rename(running_min = 4) |>
      filter(range == running_min) |>
      select(-running_min) |>
      filter(range <= toronto_range) |>
      ggplot(aes(x = avg, y = range, label = city)) +
      geom_point(colour = "#dc2828") +
      geom_text(size = 4, nudge_x = 0.5, hjust = "left") +
      geom_vline(xintercept = toronto_avg,
                 linetype = "dashed",
                 alpha = 0.33) +
      scale_x_continuous(expand = expansion(mult = 0.15)) +
      labs(title = "Average Temperature vs Range by City (Detail 2)",
           x = "Average Temperature (°C)",
           y = "Difference Between Hottest and Coldest Months (°C)") +
      theme_bw()

    Again we see that La Paz has a similar overall average temperature to Toronto, but much less annual variability. Cusco and Bogotá are warmer but even less variable.

    Reykjavík and Stanley are colder than Toronto, and while they offer a smaller reduction in variability than La Paz, Cusco, or Bogotá, they have the benefit (for me) of being 98%+ English-speaking.

    Nuuk and Honiara are right out.

    Next Steps

    It would be interesting to use detailed time series for each city and a utility function on temperatures (perhaps including wind chill and humidex) to determine which cities are truly mean-variance optimal.
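
As a rough sketch of what such a utility function might look like, one simple assumption is a quadratic penalty around a preferred temperature. Everything here is hypothetical: the target of 18 °C, the six temperatures, and the quadratic form itself are my own placeholders, not something derived from the data. The appeal of this form is that the average penalty splits exactly into a term for how far the mean is from the target and a term for the variance.

```r
# Hypothetical preferred temperature and made-up observations (°C).
target <- 18
temps  <- c(8, 12, 16, 20, 22, 24)

# Average utility under a quadratic penalty around the target.
utility <- -mean((temps - target)^2)

# The same number, split into a "mean" term and a "variance" term
# (population variance: dividing by n rather than n - 1).
mu         <- mean(temps)
pop_var    <- mean((temps - mu)^2)
decomposed <- -((mu - target)^2 + pop_var)

all.equal(utility, decomposed)
```

Under this assumption, a city is better when its mean is closer to the target and when its temperatures vary less, which is the trade-off the plots above show with range standing in for variance.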

    Of course, one should probably not choose a place to live based solely on the weather.