Supervision Violations Data and Creating Highcharts

Introduction

This model code will allow you to produce three data visualizations from the public report Supervision Violations and Their Impact on Incarceration. Utilizing the provided model code in the R statistical programming language on your local computer, you will be able to replicate the process of importing data, cleaning and wrangling these data, and finally creating data visualizations displaying supervision and non-supervision violation prison admission trends through area charts, supervision violation prison admission trends by violation type through bar charts, and a hex map of change in total prison admissions. This model code is reproducible and includes quality assurance checks for accuracy.

Highcharts

Create responsive and interactive plots using Highcharts. Additional information on Highcharts can be found in the R package documentation: jkunst.com/highcharter/.

Set Up

To follow along with this tutorial, you’ll need to have the R programming language installed as well as several R packages. If you don’t already have these packages installed, you can install them:

install.packages(c("tidyverse", "highcharter", "sf", 
                  "jsonlite", "geojsonsf", "scales", 
                  "rjson"))

After the packages are installed, we are able to load them into our session using the library() function.

library(tidyverse)
library(highcharter) 
library(sf)
library(jsonlite)
library(geojsonsf)
library(scales) 
library(rjson)

General Data Preparation

Download Data

We use data collected by The CSG Justice Center from state corrections departments, which provided annual counts of total prison admissions and prison populations as well as counts of prison admissions due to supervision violations. These violations are further broken down by the type of supervision (probation or parole) as well as by new offense and technical violation admissions when available. This data is used in all three charts. You can download supervision data from CSGJC’s Supervision Violations Impact on Incarceration tool or using the link shown below.

Import that data into R:

svvi_data_url <- "https://github.com/CSGJusticeCenter/va_data/raw/main/model_code/violation_admissions/MCLC_2024-05-29.csv"

svii_raw_download <- read_csv(svvi_data_url)

svii_raw_download
#> # A tibble: 3,200 × 4
#>    state   metric                                      year total
#>    <chr>   <chr>                                      <dbl> <dbl>
#>  1 Alabama Total Admissions                            2018 14054
#>  2 Alabama Supervision Violation Admissions            2018  6080
#>  3 Alabama Probation Violation Admissions              2018  3752
#>  4 Alabama Probation New Offense Violation Admissions  2018  2069
#>  5 Alabama Probation Technical Violation Admissions    2018  1683
#>  6 Alabama Parole Violation Admissions                 2018  2328
#>  7 Alabama Parole New Offense Violation Admissions     2018  1231
#>  8 Alabama Parole Technical Violation Admissions       2018  1097
#>  9 Alabama Total Population                            2018 27191
#> 10 Alabama Supervision Violation Population            2018   206
#> # ℹ 3,190 more rows

Data Wrangling

When you download the data, there are 16 unique metrics. Notice that each metric appears 200 times in the dataset. This makes sense since there are 50 states and each state has 4 years of data (2018, 2019, 2020, and 2021); 50*4 = 200.

svii_raw_download |>
  count(metric)
#> # A tibble: 16 × 2
#>    metric                                         n
#>    <chr>                                      <int>
#>  1 Parole New Offense Violation Admissions      200
#>  2 Parole New Offense Violation Population      200
#>  3 Parole Technical Violation Admissions        200
#>  4 Parole Technical Violation Population        200
#>  5 Parole Violation Admissions                  200
#>  6 Parole Violation Population                  200
#>  7 Probation New Offense Violation Admissions   200
#>  8 Probation New Offense Violation Population   200
#>  9 Probation Technical Violation Admissions     200
#> 10 Probation Technical Violation Population     200
#> 11 Probation Violation Admissions               200
#> 12 Probation Violation Population               200
#> 13 Supervision Violation Admissions             200
#> 14 Supervision Violation Population             200
#> 15 Total Admissions                             200
#> 16 Total Population                             200

The 16 unique full metrics can be separated into three distinct concepts or variables:

adm_or_pop: Prison Admissions or Prison Population
prob_or_par: Parole, Probation, or NA for metrics that include both
metric: simplified metric categories of New Offense Violation, Technical Violation, Supervision Violation (includes both New Offense or Technical for both parole and probation), Parole Violation (includes both New Offense or Technical), Probation Violation (includes both New Offense or Technical) or Total (total prison admissions or population)

svii_data <- svii_raw_download |> 
  rename(
    count = total, 
    full_metric = metric
  ) |> 
  mutate(
    adm_or_pop = word(full_metric, -1), 
    prob_or_par = ifelse(
      word(full_metric, 1) %in% c("Parole", "Probation"), 
      word(full_metric, 1), NA
      ), 
    metric = case_when(
      is.na(prob_or_par) == TRUE ~ word(full_metric, 1, -2), 
      word(full_metric, 2) == "Violation" ~ word(full_metric, 1, 2), 
      word(full_metric, 2) %in% c("New", "Technical") ~ word(full_metric, 2, -2)
    )
  )

svii_data
#> # A tibble: 3,200 × 7
#>    state   full_metric                  year count adm_or_pop prob_or_par metric
#>    <chr>   <chr>                       <dbl> <dbl> <chr>      <chr>       <chr> 
#>  1 Alabama Total Admissions             2018 14054 Admissions <NA>        Total 
#>  2 Alabama Supervision Violation Admi…  2018  6080 Admissions <NA>        Super…
#>  3 Alabama Probation Violation Admiss…  2018  3752 Admissions Probation   Proba…
#>  4 Alabama Probation New Offense Viol…  2018  2069 Admissions Probation   New O…
#>  5 Alabama Probation Technical Violat…  2018  1683 Admissions Probation   Techn…
#>  6 Alabama Parole Violation Admissions  2018  2328 Admissions Parole      Parol…
#>  7 Alabama Parole New Offense Violati…  2018  1231 Admissions Parole      New O…
#>  8 Alabama Parole Technical Violation…  2018  1097 Admissions Parole      Techn…
#>  9 Alabama Total Population             2018 27191 Population <NA>        Total 
#> 10 Alabama Supervision Violation Popu…  2018   206 Population <NA>        Super…
#> # ℹ 3,190 more rows

Area Chart of Prison Admissions by Type

Data Preparation

First we need to specify which state we want to use in the chart.

this_state <- "Alabama" # select state

Next, we need to filter our data to only include specific metrics for the specific state:

state == thisState: this chart is shown for a single state of interest
full_metric %in% c("Total Admissions", "Supervision Violation Admissions"): look at total admissions and supervision violation admissions

state_admissons_and_supervisions <- svii_data |> 
  filter(
    state == this_state, 
    full_metric %in% c("Total Admissions", "Supervision Violation Admissions")
  ) |> 
  select(state, year, count, metric)

state_admissons_and_supervisions
#> # A tibble: 8 × 4
#>   state    year count metric               
#>   <chr>   <dbl> <dbl> <chr>                
#> 1 Alabama  2018 14054 Total                
#> 2 Alabama  2018  6080 Supervision Violation
#> 3 Alabama  2019 14148 Total                
#> 4 Alabama  2019  6360 Supervision Violation
#> 5 Alabama  2020 10080 Total                
#> 6 Alabama  2020  4761 Supervision Violation
#> 7 Alabama  2021  9663 Total                
#> 8 Alabama  2021  4401 Supervision Violation

We want to show the breakdown of total prison admissions by supervision violation admissions and non-supervision violation admissions. In the data, we have total admissions and supervision violation admissions, but will need to calculate the non-supervision violation admissions.

Start by pivoting the current data to a wider format.

state_admissons_and_supervisions_wide <- state_admissons_and_supervisions |> 
  pivot_wider(names_from = metric, values_from = count)

state_admissons_and_supervisions_wide
#> # A tibble: 4 × 4
#>   state    year Total `Supervision Violation`
#>   <chr>   <dbl> <dbl>                   <dbl>
#> 1 Alabama  2018 14054                    6080
#> 2 Alabama  2019 14148                    6360
#> 3 Alabama  2020 10080                    4761
#> 4 Alabama  2021  9663                    4401

Next, we calculate the non-supervision violation admissions by subtracting supervision admissions from total admissions. Also, rename the total admissions variable (Total) to clearly mark it as total admissions.

state_calc_nonsupervision_adm <- state_admissons_and_supervisions_wide|> 
  mutate(`Non-Supervision Violation` = `Total` - `Supervision Violation`) |> 
  rename(total_admissions = Total)

state_calc_nonsupervision_adm
#> # A tibble: 4 × 5
#>   state    year total_admissions `Supervision Violation` Non-Supervision Viola…¹
#>   <chr>   <dbl>            <dbl>                   <dbl>                   <dbl>
#> 1 Alabama  2018            14054                    6080                    7974
#> 2 Alabama  2019            14148                    6360                    7788
#> 3 Alabama  2020            10080                    4761                    5319
#> 4 Alabama  2021             9663                    4401                    5262
#> # ℹ abbreviated name: ¹`Non-Supervision Violation`

Pivot the data back to the longer form so the disaggregated data is stacked. We also calculated the percent of total admissions for each subgroup; we will use this value in tooltips.

state_admissions_breakdown <- state_calc_nonsupervision_adm |> 
  pivot_longer(
  cols = c(`Supervision Violation`, `Non-Supervision Violation`), 
  names_to = "metric", values_to = "count") |> 
  mutate(perc = percent(count/total_admissions, accuracy = 1))

state_admissions_breakdown
#> # A tibble: 8 × 6
#>   state    year total_admissions metric                    count perc 
#>   <chr>   <dbl>            <dbl> <chr>                     <dbl> <chr>
#> 1 Alabama  2018            14054 Supervision Violation      6080 43%  
#> 2 Alabama  2018            14054 Non-Supervision Violation  7974 57%  
#> 3 Alabama  2019            14148 Supervision Violation      6360 45%  
#> 4 Alabama  2019            14148 Non-Supervision Violation  7788 55%  
#> 5 Alabama  2020            10080 Supervision Violation      4761 47%  
#> 6 Alabama  2020            10080 Non-Supervision Violation  5319 53%  
#> 7 Alabama  2021             9663 Supervision Violation      4401 46%  
#> 8 Alabama  2021             9663 Non-Supervision Violation  5262 54%

Next, we want to specify that if the total is 0, set the value to NA. In the same step, we create the text that will be used in the tooltip. The tooltip is the text that pops up on the screen when the mouse is hovering over the chart. Tooltip styling is done with html.

plot_state_admissions_breakdown <- state_admissions_breakdown |> 
  mutate(
    count = ifelse(count == 0, NA, count),
    tooltip = paste0(
      "<b>", state, " - ", year, "</b><br>",
      metric, " Admissions: ", comma(count), "<br>",
      "Percentage of Total Admissions: ", perc, "<br>", 
      "Total Admissions: ", comma(total_admissions)
      )
    )

plot_state_admissions_breakdown
#> # A tibble: 8 × 7
#>   state    year total_admissions metric                    count perc  tooltip  
#>   <chr>   <dbl>            <dbl> <chr>                     <dbl> <chr> <chr>    
#> 1 Alabama  2018            14054 Supervision Violation      6080 43%   <b>Alaba…
#> 2 Alabama  2018            14054 Non-Supervision Violation  7974 57%   <b>Alaba…
#> 3 Alabama  2019            14148 Supervision Violation      6360 45%   <b>Alaba…
#> 4 Alabama  2019            14148 Non-Supervision Violation  7788 55%   <b>Alaba…
#> 5 Alabama  2020            10080 Supervision Violation      4761 47%   <b>Alaba…
#> 6 Alabama  2020            10080 Non-Supervision Violation  5319 53%   <b>Alaba…
#> 7 Alabama  2021             9663 Supervision Violation      4401 46%   <b>Alaba…
#> 8 Alabama  2021             9663 Non-Supervision Violation  5262 54%   <b>Alaba…

# view single tool tip 
plot_state_admissions_breakdown$tooltip[1]
#> [1] "<b>Alabama - 2018</b><br>Supervision Violation Admissions: 6,080<br>Percentage of Total Admissions: 43%<br>Total Admissions: 14,054"

Creating Highcharts Plot

Before creating the Highcharts plot, specify that the thousands separator should be a comma in the Highcharts setting options. The default for Highcharts is to use a space. We can also create a set theme, and this theme can be used for other plots as well. You only need to do this once every session.

# this sets the thousands separator to a comma 
# so 1000 will be displayed as "1,000"
hcoptslang <- getOption("highcharter.lang")
hcoptslang$thousandsSep <- ","
options(highcharter.lang = hcoptslang)

plot_theme <- hc_theme(
  chart = list(style = list(fontFamily = "Arial", color = "#666666")),
  title = list(
    align = "center",
    style = list(
      fontFamily = "Arial",
      fontWeight = "bold",
      color = "black",
      fontSize = "16px"
      )
    ),
  legend = list(align  = "center", verticalAlign = "top"),
  xAxis = list(gridLineWidth = 0, lineWidth = 0, tickLength = 0),
  yAxis = list(gridLineWidth = 0)
  )

We use the hchart() function to create the highchart object and specify that we should use the dataset we created in the previous section. In this command, we also define which variables in the data should be assigned to x- and y-axes, as well as the variable which defines the group in the chart.

hc_area <- plot_state_admissions_breakdown |> 
  hchart(
    type = "area", # set the type of chart to be an area chart 
    stacking = "normal", # specify to stack the counts on top of each other 
    hcaes(x = year, y = count, group = metric),
    color = c("#C7E8F5", "#D6C246")
    )

hc_area

Next, we add information on the x and y axis, denoting the breaks for the x axis, and the style for the y axis. We remove titles from both x and y axis. We add a main title for the plot. Another crucial step is specifying the tooltips (recall that we created a variable called tooltip).

hc_area <- hc_area |> 
  hc_xAxis(title = "", tickPositions = c(2018, 2019, 2020, 2021)) |> 
  hc_yAxis(title = "", labels = list(format = "{value:,.0f}")) |>
  hc_title(text = paste0(this_state, ": Prison Admissions")) |> 
  hc_tooltip(formatter = JS("function(){return(this.point.tooltip)}"))

hc_area

Next, we specify the plot options. This allows us to set a variety of options:

Specify how the cursor should look as it interacts with the series in the chart.
Adjust the border width of each area segment.
Hide the point markers unless hovering over them with the cursor.
Add accessibility components and descriptive text.

hc_area <- hc_area |> 
  hc_plotOptions(
    series = list(animation = FALSE, cursor = "pointer", borderWidth = 3),
    area = list(marker = list(enabled = FALSE)),
    accessibility = list(
      enabled = TRUE,
      keyboardNavigation = list(enabled = TRUE),
      point = list(
        valueDescriptionFormat = "{point.state}, {point.year}, {point.metric},
        {point.count:,.0f}"
        ),
      linkedDescription = paste0("This is an area chart for the state of", 
                                  this_state, " 
                                  displaying the total prison admissions 
                                  disaggregated by admission type: supervision 
                                  violation admissions and new offense 
                                  (or other non-violation) admissions"),
      landmarkVerbosity = "one"
      )
    )

hc_area

The final step is adding our premade theme to the chart.

hc_area <- hc_area |> 
    hc_add_theme(plot_theme)

hc_area

Bar Chart of Violation Admissions by Type

Data Preparation

First, we need to specify which state we want to use in this chart.
Skip this step if you have already specified the state.

this_state <- "Alabama" # select state

Next, we need to filter our data to the specific values we are interested in: Alabama Admissions to Prison from Parole by Type (new offense violation or technical violation)

state == thisState: filter to state of interest
adm_or_pop == "Admissions": filter to admissions only
metric %in% c("New Offense Violation", "Technical Violation"): filter to show the type disaggregation

state_supervision_type <- svii_data |> 
  filter(
    state == this_state &
    adm_or_pop == "Admissions" & 
    metric %in% c("New Offense Violation", "Technical Violation")
  )

state_supervision_type
#> # A tibble: 16 × 7
#>    state   full_metric                  year count adm_or_pop prob_or_par metric
#>    <chr>   <chr>                       <dbl> <dbl> <chr>      <chr>       <chr> 
#>  1 Alabama Probation New Offense Viol…  2018  2069 Admissions Probation   New O…
#>  2 Alabama Probation Technical Violat…  2018  1683 Admissions Probation   Techn…
#>  3 Alabama Parole New Offense Violati…  2018  1231 Admissions Parole      New O…
#>  4 Alabama Parole Technical Violation…  2018  1097 Admissions Parole      Techn…
#>  5 Alabama Probation New Offense Viol…  2019  2372 Admissions Probation   New O…
#>  6 Alabama Probation Technical Violat…  2019  1596 Admissions Probation   Techn…
#>  7 Alabama Parole New Offense Violati…  2019  1266 Admissions Parole      New O…
#>  8 Alabama Parole Technical Violation…  2019  1126 Admissions Parole      Techn…
#>  9 Alabama Probation New Offense Viol…  2020  1306 Admissions Probation   New O…
#> 10 Alabama Probation Technical Violat…  2020  1838 Admissions Probation   Techn…
#> 11 Alabama Parole New Offense Violati…  2020  1375 Admissions Parole      New O…
#> 12 Alabama Parole Technical Violation…  2020   242 Admissions Parole      Techn…
#> 13 Alabama Probation New Offense Viol…  2021  1073 Admissions Probation   New O…
#> 14 Alabama Probation Technical Violat…  2021  1776 Admissions Probation   Techn…
#> 15 Alabama Parole New Offense Violati…  2021  1295 Admissions Parole      New O…
#> 16 Alabama Parole Technical Violation…  2021   257 Admissions Parole      Techn…

Next, we’ll combine probation and parole admissions so that we can plot the total number of admissions from supervision, while keeping the variable that indicates if the admissions was a new offense violation or technical violation. We’ll also calculate the total number of returns from supervisions and the percentage of supervision violation admissions that were due to new offenses versus technical violations.

state_supervision_breakdown <- state_supervision_type |> 
  group_by(state, year, metric, adm_or_pop) |> 
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop") |>
  group_by(state, year) |> 
  mutate(
    all_sup_returns = sum(count),
    pct = count / all_sup_returns
    ) |> 
  ungroup()

state_supervision_breakdown
#> # A tibble: 8 × 7
#>   state    year metric                adm_or_pop count all_sup_returns   pct
#>   <chr>   <dbl> <chr>                 <chr>      <dbl>           <dbl> <dbl>
#> 1 Alabama  2018 New Offense Violation Admissions  3300            6080 0.543
#> 2 Alabama  2018 Technical Violation   Admissions  2780            6080 0.457
#> 3 Alabama  2019 New Offense Violation Admissions  3638            6360 0.572
#> 4 Alabama  2019 Technical Violation   Admissions  2722            6360 0.428
#> 5 Alabama  2020 New Offense Violation Admissions  2681            4761 0.563
#> 6 Alabama  2020 Technical Violation   Admissions  2080            4761 0.437
#> 7 Alabama  2021 New Offense Violation Admissions  2368            4401 0.538
#> 8 Alabama  2021 Technical Violation   Admissions  2033            4401 0.462

Next, we want to specify that if the total is 0, set the value to NA. In the same step, we create the text that will be used in the tooltip.

plot_state_supervision_breakdown <- state_supervision_breakdown |>
  mutate(
    count = ifelse(count == 0, NA, count),
    tooltip = paste0(
      "<b>", state, " - ", year, "</b><br>",
      metric, " Admissions: ", comma(count, 1), "<br>",
      "Percentage of Total Supervision Admissions: ", percent(pct, 1),"<br>", 
      "Total Return from Supervision Admissions: ", comma(all_sup_returns, 1)
      )
    )

plot_state_supervision_breakdown
#> # A tibble: 8 × 8
#>   state    year metric            adm_or_pop count all_sup_returns   pct tooltip
#>   <chr>   <dbl> <chr>             <chr>      <dbl>           <dbl> <dbl> <chr>  
#> 1 Alabama  2018 New Offense Viol… Admissions  3300            6080 0.543 <b>Ala…
#> 2 Alabama  2018 Technical Violat… Admissions  2780            6080 0.457 <b>Ala…
#> 3 Alabama  2019 New Offense Viol… Admissions  3638            6360 0.572 <b>Ala…
#> 4 Alabama  2019 Technical Violat… Admissions  2722            6360 0.428 <b>Ala…
#> 5 Alabama  2020 New Offense Viol… Admissions  2681            4761 0.563 <b>Ala…
#> 6 Alabama  2020 Technical Violat… Admissions  2080            4761 0.437 <b>Ala…
#> 7 Alabama  2021 New Offense Viol… Admissions  2368            4401 0.538 <b>Ala…
#> 8 Alabama  2021 Technical Violat… Admissions  2033            4401 0.462 <b>Ala…

# view a single tool tip 
plot_state_supervision_breakdown$tooltip[1]
#> [1] "<b>Alabama - 2018</b><br>New Offense Violation Admissions: 3,300<br>Percentage of Total Supervision Admissions: 54%<br>Total Return from Supervision Admissions: 6,080"

Create Highcharts Plot

Skip this step if you have already specified thousands separator and created the plot theme.

Code

# this sets the thousands separator to a comma 
# so if you 1000 it will be displayed as "1,000"
hcoptslang <- getOption("highcharter.lang")
hcoptslang$thousandsSep <- ","
options(highcharter.lang = hcoptslang)
  
plot_theme <- hc_theme(
  chart = list(style = list(fontFamily = "Arial", color = "#666666")),
  title = list(
    align = "center",
    style = list(
      fontFamily = "Arial",
      fontWeight = "bold",
      color = "black",
      fontSize = "16px"
      )
    ),
  legend = list(align  = "center", verticalAlign = "top"),
  xAxis = list(gridLineWidth = 0, lineWidth = 0, tickLength = 0),
  yAxis = list(gridLineWidth = 0)
  )

We start by creating a highchart object with the dataset created in the previous section.

hc_bar <- plot_state_supervision_breakdown |> 
   hchart(
    type = "column", # specify the type of chart to be a column chart 
    hcaes(x = year, y = count, group = metric),
    color = c("#D25E2D", "#EDB799")
  )

Next, we remove titles from the x and y axis, add a main chart title, and display the created text in the variable tooltip for the tooltips in the chart.

hc_bar <- hc_bar |> 
  hc_xAxis(title = "") |> 
  hc_yAxis(title = "", labels = list(format = "{value:,.0f}")) |> 
  hc_title(text = paste0(this_state, ": Supervision Violation Admissions by Type")) |> 
  hc_tooltip(formatter = JS("function(){return(this.point.tooltip)}"))

hc_bar

Next, we specify the plot options. This allows us to set a variety of options:

Specify how the cursor should look as it interacts with the series in the chart.
Adjust the border width of each area segment.
Hide the point markers unless hovering over them with the cursor.
Add accessibility components and descriptive text.

hc_bar <- hc_bar |> 
    hc_plotOptions(
    series = list(animation = FALSE, cursor = "pointer", borderWidth = 3),
    area = list(marker = list(enabled = FALSE)),
    accessibility = list(
      enabled = TRUE,
      keyboardNavigation = list(enabled = TRUE),
      point = list(
        valueDescriptionFormat = "{point.state}, {point.year}, {point.metric},
        {point.count:,.0f}"
        ),
      linkedDescription = paste0("This is a bar chart for the state of", 
                                 this_state, 
                                 "displaying the total prison admissions due to 
                                 supervision violations, disaggregated by technical 
                                 violations vs. new offense violations."),
      landmarkVerbosity = "one"
      )
    )

hc_bar

Now we want to style the chart.

hc_bar <- hc_bar |> 
  hc_add_theme(plot_theme)

hc_bar

Hex Map

We will create a U.S. map by state that shows the percent change in total admissions between 2018 and 2021. The colors of each state on the map will indicate the magnitude of the change. Rather than use the standard U.S. state map, we’ll create a hexbin map, which depicts each state as an equal-sized hexagon.

Data Preparation

Hex Map Data

To create a hex map of the 50 states, you need to download the hex map coordinates. The hex map coordinates are referenced on the R Graph Gallery, Hexbin map in R: an example with US State. A cleaned version of the coordinates can be imported using the link below. If you are interested in downloading the data and cleaning it yourself, please review Appendix: Hex Map Data Prep.

hex_url <- "https://github.com/CSGJusticeCenter/va_data/raw/main/model_code/violation_admissions/us_hex_map.json"
hex <- fromJSON(file = hex_url)

Prison Admissons Data from SVII Dataset

We will display data downloaded from the Supervision Violations and Their Impact on Incarceration report.

We need to filter the admissions data to the specific values used in the hex map: Change in Total Admissions to State Prison from 2018 to 2021.

metric == "Total": filter to total metric (total admissions to state prison)
adm_or_pop == "Admissions": filter to admissions only
year %in% c(2018, 2021): filter to 2018 and 2021 so you can calculate the change between the two years

total_prison_adm <- svii_data |> 
  filter(metric == "Total", adm_or_pop == "Admissions", year %in% c(2018, 2021))

total_prison_adm
#> # A tibble: 100 × 7
#>    state      full_metric       year count adm_or_pop prob_or_par metric
#>    <chr>      <chr>            <dbl> <dbl> <chr>      <chr>       <chr> 
#>  1 Alabama    Total Admissions  2018 14054 Admissions <NA>        Total 
#>  2 Alabama    Total Admissions  2021  9663 Admissions <NA>        Total 
#>  3 Alaska     Total Admissions  2018 32627 Admissions <NA>        Total 
#>  4 Alaska     Total Admissions  2021    NA Admissions <NA>        Total 
#>  5 Arizona    Total Admissions  2018 18361 Admissions <NA>        Total 
#>  6 Arizona    Total Admissions  2021 11518 Admissions <NA>        Total 
#>  7 Arkansas   Total Admissions  2018  9204 Admissions <NA>        Total 
#>  8 Arkansas   Total Admissions  2021  8123 Admissions <NA>        Total 
#>  9 California Total Admissions  2018 35391 Admissions <NA>        Total 
#> 10 California Total Admissions  2021 29425 Admissions <NA>        Total 
#> # ℹ 90 more rows

Only select variables that are needed (this makes the pivoting easier). Then pivot the data so that the years (2018, 2021) are their own columns.

total_prison_adm_wide <- total_prison_adm |> 
  select(state, full_metric, year, count) |> 
  # pivot wider so values for 2018 and 2021 are in 2 different columns 
  pivot_wider(names_from = year, values_from = count)

total_prison_adm_wide
#> # A tibble: 50 × 4
#>    state       full_metric      `2018` `2021`
#>    <chr>       <chr>             <dbl>  <dbl>
#>  1 Alabama     Total Admissions  14054   9663
#>  2 Alaska      Total Admissions  32627     NA
#>  3 Arizona     Total Admissions  18361  11518
#>  4 Arkansas    Total Admissions   9204   8123
#>  5 California  Total Admissions  35391  29425
#>  6 Colorado    Total Admissions   9985   5086
#>  7 Connecticut Total Admissions  21018  12717
#>  8 Delaware    Total Admissions  13358   9899
#>  9 Florida     Total Admissions  31285  20800
#> 10 Georgia     Total Admissions  18275  13611
#> # ℹ 40 more rows

Next, we mutate the data by creating new variables to reflect the years in the change calculations, calculating the change between 2018 and 2021, adding state abbreviation identifier, creating tooltip text, and formatting the text displayed for the percent change.

plot_total_prison_adm_change <- total_prison_adm_wide |> 
  mutate(
    state_abb = state.abb[match(state.name, state)],
    n_change = `2021` - `2018`,
    pct_change = n_change / `2018`  * 100,
    tooltip = paste0(
      "<b>", state, "</b><br>",
      "2018 admissions: ", comma(`2018`, 1), "<br>",
      "2021 admissions: ", comma(`2021`, 1), "<br>",
      "Change in admissions 2018 to 2021: ", comma(n_change, 1), "<br>",
      "Percent change in admissions 2018 to 2021: ", percent(pct_change, 1, scale = 1)
      ),
    changelabel = ifelse(is.na(pct_change), "-", percent(pct_change, 1, scale = 1))
    )

plot_total_prison_adm_change
#> # A tibble: 50 × 9
#>    state       full_metric   `2018` `2021` state_abb n_change pct_change tooltip
#>    <chr>       <chr>          <dbl>  <dbl> <chr>        <dbl>      <dbl> <chr>  
#>  1 Alabama     Total Admiss…  14054   9663 AL           -4391      -31.2 <b>Ala…
#>  2 Alaska      Total Admiss…  32627     NA AK              NA       NA   <b>Ala…
#>  3 Arizona     Total Admiss…  18361  11518 AZ           -6843      -37.3 <b>Ari…
#>  4 Arkansas    Total Admiss…   9204   8123 AR           -1081      -11.7 <b>Ark…
#>  5 California  Total Admiss…  35391  29425 CA           -5966      -16.9 <b>Cal…
#>  6 Colorado    Total Admiss…   9985   5086 CO           -4899      -49.1 <b>Col…
#>  7 Connecticut Total Admiss…  21018  12717 CT           -8301      -39.5 <b>Con…
#>  8 Delaware    Total Admiss…  13358   9899 DE           -3459      -25.9 <b>Del…
#>  9 Florida     Total Admiss…  31285  20800 FL          -10485      -33.5 <b>Flo…
#> 10 Georgia     Total Admiss…  18275  13611 GA           -4664      -25.5 <b>Geo…
#> # ℹ 40 more rows
#> # ℹ 1 more variable: changelabel <chr>

Create Highcharts Plot

We can create a set theme for the hex map. We also specify the minimum and maximum values in the dataset.

map_theme <- hc_theme(
  chart = list(style = list(fontFamily = "Arial", color = "#666666")),
  title = list(
    style = list(
      fontFamily = "Arial",
      fontWeight = "bold",
      color = "black",
      fontSize   = "30px"
      )
    )
  )


min_map <- min(plot_total_prison_adm_change$pct_change, na.rm = TRUE)
max_map <- max(plot_total_prison_adm_change$pct_change, na.rm = TRUE)

We start by plotting the hex map data and adding the labels (the state abbreviation and percent change).

hc_hex <- highchart() |> 
  hc_add_series_map(
    map = hex,
    df = plot_total_prison_adm_change,
    joinBy = "state_abb",
    value = "pct_change",
      dataLabels = list(
        enabled = TRUE,
        useHTML = TRUE,
        formatter = JS("function() {return '<div style=\"text-align:center;\">' +
          '<span style=\"font-weight:bold;\">' + this.point.state_abb + '</span><br>' +
          '<span>' + this.point.changelabel + '</span>' +
          '</div>';}"),
        style = list(
          fontSize = "14px",
          fontWeight = "regular"
          )
        ),
    nullColor = "#e8e8e8",
    accessibility = list(
      point = list(
        valueDescriptionFormat = "state: {point.state}, percent change: {point.value:.1f}"
        )
      )
    )

hc_hex

Next, we use the min/max values we calculated to establish a gradient color legend. This will fill each state hexagon with a color based on that state’s percent change. We also specify where to put the legend on the chart.

hc_hex <- hc_hex |> 
    hc_colorAxis(
    min = min_map,
    max = max_map,
    stops = color_stops(4, c("#004270", "#236ca7", "#C7E8F5", "#FFFFFF")),
    labels = list(format = "{value}%")
    ) |> 
  hc_legend(
    align = "right",
    layout = "vertical",
    verticalAlign = "top",
    y = 300
  )

hc_hex

Next, we specify the chart title and what variable to use as the tooltip text.

hc_hex <- hc_hex |> 
  hc_title(text = "Change in Total Admissions to State Prison<br>2018–2021") |> 
  hc_tooltip(
    formatter = JS("function(){return(this.point.tooltip)}"),
    outside = TRUE
  )

hc_hex

Finally, we add the theme that we created at the top of this section.

hc_hex <- hc_hex |> 
  hc_add_theme(map_theme)

hc_hex

R Session Info

#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.utf8
#>  ctype    English_United States.utf8
#>  tz       America/New_York
#>  date     2024-08-09
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  assertthat    0.2.1      2019-03-21 []  CRAN (R 4.4.0)
#>  backports     1.5.0      2024-05-23 []  CRAN (R 4.4.0)
#>  bit           4.0.5      2022-11-15 []  CRAN (R 4.4.0)
#>  bit64         4.0.5      2020-08-30 []  CRAN (R 4.4.0)
#>  broom         1.0.6      2024-05-17 []  CRAN (R 4.4.0)
#>  class         7.3-22     2023-05-03 []  CRAN (R 4.4.1)
#>  classInt      0.4-10     2023-09-05 []  CRAN (R 4.4.0)
#>  cli           3.6.2      2023-12-11 []  CRAN (R 4.4.0)
#>  colorspace    2.1-0      2023-01-23 []  CRAN (R 4.4.0)
#>  crayon        1.5.3      2024-06-20 []  CRAN (R 4.4.1)
#>  curl          5.2.1      2024-03-01 []  CRAN (R 4.4.0)
#>  data.table    1.15.4     2024-03-30 []  CRAN (R 4.4.0)
#>  DBI           1.2.2      2024-02-16 []  CRAN (R 4.4.0)
#>  digest        0.6.36     2024-06-23 []  CRAN (R 4.4.1)
#>  dplyr       * 1.1.4      2023-11-17 []  CRAN (R 4.4.0)
#>  e1071         1.7-14     2023-12-06 []  CRAN (R 4.4.0)
#>  evaluate      0.24.0     2024-06-10 []  CRAN (R 4.4.1)
#>  fansi         1.0.6      2023-12-08 []  CRAN (R 4.4.0)
#>  fastmap       1.2.0      2024-05-15 []  CRAN (R 4.4.0)
#>  forcats     * 1.0.0      2023-01-29 []  CRAN (R 4.4.0)
#>  generics      0.1.3      2022-07-05 []  CRAN (R 4.4.0)
#>  geojsonsf   * 2.0.3      2022-05-30 []  CRAN (R 4.4.0)
#>  ggplot2     * 3.5.1      2024-04-23 []  CRAN (R 4.4.0)
#>  glue          1.7.0      2024-01-09 []  CRAN (R 4.4.0)
#>  gtable        0.3.5      2024-04-22 []  CRAN (R 4.4.0)
#>  highcharter * 0.9.4.9000 2024-06-07 []  Github (batpigandme/highcharter@6644cf7)
#>  hms           1.1.3      2023-03-21 []  CRAN (R 4.4.0)
#>  htmltools     0.5.8.1    2024-04-04 []  CRAN (R 4.4.0)
#>  htmlwidgets   1.6.4      2023-12-06 []  CRAN (R 4.4.0)
#>  igraph        2.0.3      2024-03-13 []  CRAN (R 4.4.0)
#>  jsonlite    * 1.8.8      2023-12-04 []  CRAN (R 4.4.0)
#>  KernSmooth    2.23-24    2024-05-17 []  CRAN (R 4.4.1)
#>  knitr         1.48       2024-07-07 []  CRAN (R 4.4.1)
#>  lattice       0.22-6     2024-03-20 []  CRAN (R 4.4.1)
#>  lifecycle     1.0.4      2023-11-07 []  CRAN (R 4.4.0)
#>  lubridate   * 1.9.3      2023-09-27 []  CRAN (R 4.4.0)
#>  magrittr      2.0.3      2022-03-30 []  CRAN (R 4.4.0)
#>  munsell       0.5.1      2024-04-01 []  CRAN (R 4.4.0)
#>  pillar        1.9.0      2023-03-22 []  CRAN (R 4.4.0)
#>  pkgconfig     2.0.3      2019-09-22 []  CRAN (R 4.4.0)
#>  proxy         0.4-27     2022-06-09 []  CRAN (R 4.4.0)
#>  purrr       * 1.0.2      2023-08-10 []  CRAN (R 4.4.0)
#>  quantmod      0.4.26     2024-02-14 []  CRAN (R 4.4.0)
#>  R6            2.5.1      2021-08-19 []  CRAN (R 4.4.0)
#>  Rcpp          1.0.13     2024-07-17 []  CRAN (R 4.4.1)
#>  readr       * 2.1.5      2024-01-10 []  CRAN (R 4.4.0)
#>  rjson       * 0.2.21     2022-01-09 []  CRAN (R 4.4.0)
#>  rlang         1.1.3      2024-01-10 []  CRAN (R 4.4.0)
#>  rlist         0.4.6.2    2021-09-03 []  CRAN (R 4.4.0)
#>  rmarkdown     2.27       2024-05-17 []  CRAN (R 4.4.0)
#>  rstudioapi    0.16.0     2024-03-24 []  CRAN (R 4.4.0)
#>  scales      * 1.3.0      2023-11-28 []  CRAN (R 4.4.0)
#>  sessioninfo   1.2.2      2021-12-06 []  CRAN (R 4.4.0)
#>  sf          * 1.0-16     2024-03-24 []  CRAN (R 4.4.0)
#>  stringi       1.8.4      2024-05-06 []  CRAN (R 4.4.0)
#>  stringr     * 1.5.1      2023-11-14 []  CRAN (R 4.4.0)
#>  tibble      * 3.2.1      2023-03-20 []  CRAN (R 4.4.0)
#>  tidyr       * 1.3.1      2024-01-24 []  CRAN (R 4.4.0)
#>  tidyselect    1.2.1      2024-03-11 []  CRAN (R 4.4.0)
#>  tidyverse   * 2.0.0      2023-02-22 []  CRAN (R 4.4.0)
#>  timechange    0.3.0      2024-01-18 []  CRAN (R 4.4.0)
#>  TTR           0.24.4     2023-11-28 []  CRAN (R 4.4.0)
#>  tzdb          0.4.0      2023-05-12 []  CRAN (R 4.4.0)
#>  units         0.8-5      2023-11-28 []  CRAN (R 4.4.0)
#>  utf8          1.2.4      2023-10-22 []  CRAN (R 4.4.0)
#>  vctrs         0.6.5      2023-12-01 []  CRAN (R 4.4.0)
#>  vroom         1.6.5      2023-12-05 []  CRAN (R 4.4.0)
#>  withr         3.0.0      2024-01-16 []  CRAN (R 4.4.0)
#>  xfun          0.46       2024-07-18 []  CRAN (R 4.4.1)
#>  xts           0.14.0     2024-06-05 []  CRAN (R 4.4.0)
#>  yaml          2.3.9      2024-07-05 []  CRAN (R 4.4.1)
#>  zoo           1.8-12     2023-04-13 []  CRAN (R 4.4.0)
#> 
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Appendix: Hex Map Data Prep

Hex Map Coordinates

To create a hex map of the 50 states, you need to download the hex map coordinates. The hex map coordinates are referenced on the R Graph Gallery, Hexbin map in R: an example with US State. The hexagon boundaries can be downloaded in geojson format from Carto or using the link shown below.

raw_hex_url <- "https://github.com/CSGJusticeCenter/va_data/raw/main/model_code/violation_admissions/us_states_hexgrid.geojson"

rawhex <- read_sf(raw_hex_url) |>
  # rename identifier as 'state_abb'
  select(state_abb = iso3166_2) |>
  # remove DC as our data is for the 50 states only 
  filter(state_abb != "DC") |>
  # create new variable with full state name 
  mutate(state_name = state.name[match(state_abb, state.abb)])

rawhex
#> Simple feature collection with 50 features and 2 fields
#> Geometry type: POLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -137.9747 ymin: 26.39343 xmax: -69.90286 ymax: 55.3132
#> Geodetic CRS:  WGS 84
#> # A tibble: 50 × 3
#>    state_abb                                                 geometry state_name
#>  * <chr>                                                <POLYGON [°]> <chr>     
#>  1 ME        ((-72.62574 55.3132, -69.90286 54.40843, -69.90286 52.5… Maine     
#>  2 RI        ((-72.62574 49.57439, -69.90286 48.54431, -69.90286 46.… Rhode Isl…
#>  3 VT        ((-80.79436 52.53744, -78.07148 51.57081, -78.07148 49.… Vermont   
#>  4 OK        ((-110.746 35.79821, -108.0231 34.51297, -108.0231 31.8… Oklahoma  
#>  5 NC        ((-91.68585 39.5301, -88.96298 38.30704, -88.96298 35.7… North Car…
#>  6 VA        ((-88.96298 43.0717, -86.2401 41.91257, -86.2401 39.530… Virginia  
#>  7 WV        ((-94.40873 43.0717, -91.68585 41.91257, -91.68585 39.5… West Virg…
#>  8 CA        ((-124.3603 39.5301, -121.6375 38.30704, -121.6375 35.7… California
#>  9 KS        ((-108.0231 39.5301, -105.3002 38.30704, -105.3002 35.7… Kansas    
#> 10 KY        ((-99.85447 43.0717, -97.1316 41.91257, -97.1316 39.530… Kentucky  
#> # ℹ 40 more rows

Next, we need to reformat the data. First, we set the coordinate reference system (or CRS) by referring to a specific EPSG code. The code used below (3857) is a reference to the WGS 84 / Pseudo-Mercator CRS which will “square” the hexagons.

# Reformat hex data
hex <- rawhex |>
  # set CRS to WGS 84 / Pseudo-Mercator
  st_transform(3857) |>
  # convert sf object to geojson format 
  geojsonsf::sf_geojson() |>
  # converts geojson format to list format (to integrate with Highcharts) 
  jsonlite::fromJSON(simplifyVector = FALSE)

hex$type
#> [1] "FeatureCollection"

hex$features[1] # example of data for single state (Maine)
#> [[1]]
#> [[1]]$type
#> [1] "Feature"
#> 
#> [[1]]$properties
#> [[1]]$properties$state_abb
#> [1] "ME"
#> 
#> [[1]]$properties$state_name
#> [1] "Maine"
#> 
#> 
#> [[1]]$geometry
#> [[1]]$geometry$type
#> [1] "Polygon"
#> 
#> [[1]]$geometry$coordinates
#> [[1]]$geometry$coordinates[[1]]
#> [[1]]$geometry$coordinates[[1]][[1]]
#> [[1]]$geometry$coordinates[[1]][[1]][[1]]
#> [1] -8084660
#> 
#> [[1]]$geometry$coordinates[[1]][[1]][[2]]
#> [1] 7422891
#> 
#> 
#> [[1]]$geometry$coordinates[[1]][[2]]
#> [[1]]$geometry$coordinates[[1]][[2]][[1]]
#> [1] -7781551
#> 
#> [[1]]$geometry$coordinates[[1]][[2]][[2]]
#> [1] 7247891
#> 
#> 
#> [[1]]$geometry$coordinates[[1]][[3]]
#> [[1]]$geometry$coordinates[[1]][[3]][[1]]
#> [1] -7781551
#> 
#> [[1]]$geometry$coordinates[[1]][[3]][[2]]
#> [1] 6897891
#> 
#> 
#> [[1]]$geometry$coordinates[[1]][[4]]
#> [[1]]$geometry$coordinates[[1]][[4]][[1]]
#> [1] -8084660
#> 
#> [[1]]$geometry$coordinates[[1]][[4]][[2]]
#> [1] 6722891
#> 
#> 
#> [[1]]$geometry$coordinates[[1]][[5]]
#> [[1]]$geometry$coordinates[[1]][[5]][[1]]
#> [1] -8387769
#> 
#> [[1]]$geometry$coordinates[[1]][[5]][[2]]
#> [1] 6897891
#> 
#> 
#> [[1]]$geometry$coordinates[[1]][[6]]
#> [[1]]$geometry$coordinates[[1]][[6]][[1]]
#> [1] -8387769
#> 
#> [[1]]$geometry$coordinates[[1]][[6]][[2]]
#> [1] 7247891
#> 
#> 
#> [[1]]$geometry$coordinates[[1]][[7]]
#> [[1]]$geometry$coordinates[[1]][[7]][[1]]
#> [1] -8084660
#> 
#> [[1]]$geometry$coordinates[[1]][[7]][[2]]
#> [1] 7422891

Why is CRS important?

The CRS determines how the map is laid out into a 2-dimensional format. The hexagon borders are based on actual lat/long values for the U.S. states. Notice how the default CRS results in slightly wrapped hexagons whereas the new CRS makes all hexagons square