Supervision Violations Data and Creating Highcharts
Introduction
This model code will allow you to produce three data visualizations from the public report Supervision Violations and Their Impact on Incarceration. Using the provided model code in the R statistical programming language on your local computer, you will be able to replicate the process of importing data, cleaning and wrangling these data, and finally creating the visualizations: area charts of supervision and non-supervision violation prison admission trends, bar charts of supervision violation prison admission trends by violation type, and a hex map of the change in total prison admissions. This model code is reproducible and includes quality assurance checks for accuracy.
Highcharts
Create responsive and interactive plots using Highcharts. Additional information on Highcharts can be found in the R package documentation: jkunst.com/highcharter/.
Set Up
To follow along with this tutorial, you’ll need to have the R programming language installed as well as several R packages. If you don’t already have these packages installed, you can install them:
install.packages(c("tidyverse", "highcharter", "sf",
"jsonlite", "geojsonsf", "scales",
"rjson"))
After the packages are installed, we are able to load them into our session using the library() function.
library(tidyverse)
library(highcharter)
library(sf)
library(jsonlite)
library(geojsonsf)
library(scales)
library(rjson)
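If you rerun this tutorial regularly, you may prefer to replace the install.packages() call above with a check that installs only the packages that are missing. Run it before the library() calls. The snippet below is a small sketch of that approach using base R's requireNamespace(). The names pkgs, is_installed, and missing_pkgs are our own and are not part of the original model code.

```r
# Packages used in this tutorial
pkgs <- c("tidyverse", "highcharter", "sf", "jsonlite", "geojsonsf", "scales", "rjson")

# requireNamespace() returns FALSE when a package is not installed;
# when it is installed, the namespace is loaded but not attached
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing; do nothing if everything is already available
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```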
General Data Preparation
Download Data
We use data collected by The CSG Justice Center (CSGJC) from state corrections departments, which provided annual counts of total prison admissions and prison populations, as well as counts of prison admissions due to supervision violations. These violations are further broken down by type of supervision (probation or parole) and, when available, by new offense and technical violation admissions. These data are used in all three charts. You can download the supervision data from CSGJC’s Supervision Violations Impact on Incarceration tool or by using the link shown below.
Import that data into R:
svvi_data_url <- "https://github.com/CSGJusticeCenter/va_data/raw/main/model_code/violation_admissions/MCLC_2024-05-29.csv"
svii_raw_download <- read_csv(svvi_data_url)
svii_raw_download
#> # A tibble: 3,200 × 4
#> state metric year total
#> <chr> <chr> <dbl> <dbl>
#> 1 Alabama Total Admissions 2018 14054
#> 2 Alabama Supervision Violation Admissions 2018 6080
#> 3 Alabama Probation Violation Admissions 2018 3752
#> 4 Alabama Probation New Offense Violation Admissions 2018 2069
#> 5 Alabama Probation Technical Violation Admissions 2018 1683
#> 6 Alabama Parole Violation Admissions 2018 2328
#> 7 Alabama Parole New Offense Violation Admissions 2018 1231
#> 8 Alabama Parole Technical Violation Admissions 2018 1097
#> 9 Alabama Total Population 2018 27191
#> 10 Alabama Supervision Violation Population 2018 206
#> # ℹ 3,190 more rows
Data Wrangling
When you download the data, there are 16 unique metrics. Notice that each metric appears 200 times in the dataset. This makes sense since there are 50 states and each state has 4 years of data (2018, 2019, 2020, and 2021); 50*4 = 200.
svii_raw_download |>
  count(metric)
#> # A tibble: 16 × 2
#> metric n
#> <chr> <int>
#> 1 Parole New Offense Violation Admissions 200
#> 2 Parole New Offense Violation Population 200
#> 3 Parole Technical Violation Admissions 200
#> 4 Parole Technical Violation Population 200
#> 5 Parole Violation Admissions 200
#> 6 Parole Violation Population 200
#> 7 Probation New Offense Violation Admissions 200
#> 8 Probation New Offense Violation Population 200
#> 9 Probation Technical Violation Admissions 200
#> 10 Probation Technical Violation Population 200
#> 11 Probation Violation Admissions 200
#> 12 Probation Violation Population 200
#> 13 Supervision Violation Admissions 200
#> 14 Supervision Violation Population 200
#> 15 Total Admissions 200
#> 16 Total Population 200
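The check described above can be automated so that the script fails loudly if a future download changes shape. The sketch below uses a hypothetical helper, check_metric_counts(), that is not part of the original model code. It is demonstrated on an illustrative skeleton with the same shape as the download (50 states × 16 metrics × 4 years, with no real counts), not on the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: stops with an error unless every metric appears
# exactly once per state-year combination
check_metric_counts <- function(df, n_states = 50, n_years = 4) {
  counts <- count(df, metric)
  stopifnot(all(counts$n == n_states * n_years))
  stopifnot(nrow(df) == nrow(counts) * n_states * n_years)
  invisible(counts)
}

# Illustrative skeleton with the same shape as the download (no real counts):
# 8 metric types x 2 measures = 16 metrics, for 50 states and 4 years
types <- c(
  "Parole New Offense Violation", "Parole Technical Violation", "Parole Violation",
  "Probation New Offense Violation", "Probation Technical Violation", "Probation Violation",
  "Supervision Violation", "Total"
)
metrics <- c(paste(types, "Admissions"), paste(types, "Population"))
skeleton <- expand_grid(state = state.name, metric = metrics, year = 2018:2021)

metric_counts <- check_metric_counts(skeleton)
nrow(skeleton)
#> [1] 3200
```

With the real data, the equivalent call would be check_metric_counts(svii_raw_download). The total of 3,200 rows is 16 metrics × 200 rows each.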
The 16 unique full metrics can be separated into three distinct concepts, or variables:
- adm_or_pop: Prison Admissions or Prison Population
- prob_or_par: Parole, Probation, or NA for metrics that include both
- metric: simplified metric categories of New Offense Violation, Technical Violation, Supervision Violation (includes both New Offense and Technical violations for both parole and probation), Parole Violation (includes both New Offense and Technical), Probation Violation (includes both New Offense and Technical), or Total (total prison admissions or population)
svii_data <- svii_raw_download |>
  rename(
    count = total,
    full_metric = metric
  ) |>
  mutate(
    adm_or_pop = word(full_metric, -1),
    prob_or_par = ifelse(
      word(full_metric, 1) %in% c("Parole", "Probation"),
      word(full_metric, 1), NA
    ),
    metric = case_when(
      is.na(prob_or_par) ~ word(full_metric, 1, -2),
      word(full_metric, 2) == "Violation" ~ word(full_metric, 1, 2),
      word(full_metric, 2) %in% c("New", "Technical") ~ word(full_metric, 2, -2)
    )
  )
svii_data
#> # A tibble: 3,200 × 7
#> state full_metric year count adm_or_pop prob_or_par metric
#> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 Alabama Total Admissions 2018 14054 Admissions <NA> Total
#> 2 Alabama Supervision Violation Admi… 2018 6080 Admissions <NA> Super…
#> 3 Alabama Probation Violation Admiss… 2018 3752 Admissions Probation Proba…
#> 4 Alabama Probation New Offense Viol… 2018 2069 Admissions Probation New O…
#> 5 Alabama Probation Technical Violat… 2018 1683 Admissions Probation Techn…
#> 6 Alabama Parole Violation Admissions 2018 2328 Admissions Parole Parol…
#> 7 Alabama Parole New Offense Violati… 2018 1231 Admissions Parole New O…
#> 8 Alabama Parole Technical Violation… 2018 1097 Admissions Parole Techn…
#> 9 Alabama Total Population 2018 27191 Population <NA> Total
#> 10 Alabama Supervision Violation Popu… 2018 206 Population <NA> Super…
#> # ℹ 3,190 more rows
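The following sketch shows how the word() rules above map each label onto the three new variables. It applies the same mutate() to four labels copied from the list of 16 metrics. The example_metrics tibble is illustrative only.

```r
library(dplyr)
library(stringr)

# Four full metric labels copied from the list of 16 metrics above
example_metrics <- tibble(full_metric = c(
  "Total Admissions",
  "Supervision Violation Population",
  "Parole Violation Admissions",
  "Probation Technical Violation Admissions"
))

parsed <- example_metrics |>
  mutate(
    # last word: "Admissions" or "Population"
    adm_or_pop = word(full_metric, -1),
    # first word, but only when it names a supervision type
    prob_or_par = ifelse(
      word(full_metric, 1) %in% c("Parole", "Probation"),
      word(full_metric, 1), NA
    ),
    metric = case_when(
      # no supervision type: drop the last word
      is.na(prob_or_par) ~ word(full_metric, 1, -2),
      # e.g. "Parole Violation Admissions" -> "Parole Violation"
      word(full_metric, 2) == "Violation" ~ word(full_metric, 1, 2),
      # drop the first and last words
      word(full_metric, 2) %in% c("New", "Technical") ~ word(full_metric, 2, -2)
    )
  )

parsed$metric
# "Total" "Supervision Violation" "Parole Violation" "Technical Violation"
```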
Area Chart of Prison Admissions by Type
Data Preparation
First we need to specify which state we want to use in the chart.
this_state <- "Alabama" # select state
Next, we need to filter our data to include only the specific metrics for the selected state:
- state == this_state: this chart is shown for a single state of interest
- full_metric %in% c("Total Admissions", "Supervision Violation Admissions"): look at total admissions and supervision violation admissions
state_admissons_and_supervisions <- svii_data |>
  filter(
    state == this_state,
    full_metric %in% c("Total Admissions", "Supervision Violation Admissions")
  ) |>
  select(state, year, count, metric)
state_admissons_and_supervisions
#> # A tibble: 8 × 4
#> state year count metric
#> <chr> <dbl> <dbl> <chr>
#> 1 Alabama 2018 14054 Total
#> 2 Alabama 2018 6080 Supervision Violation
#> 3 Alabama 2019 14148 Total
#> 4 Alabama 2019 6360 Supervision Violation
#> 5 Alabama 2020 10080 Total
#> 6 Alabama 2020 4761 Supervision Violation
#> 7 Alabama 2021 9663 Total
#> 8 Alabama 2021 4401 Supervision Violation
We want to show the breakdown of total prison admissions into supervision violation admissions and non-supervision violation admissions. The data contain total admissions and supervision violation admissions, but we need to calculate the non-supervision violation admissions ourselves.
Start by pivoting the current data to a wider format.
state_admissons_and_supervisions_wide <- state_admissons_and_supervisions |>
  pivot_wider(names_from = metric, values_from = count)
state_admissons_and_supervisions_wide
#> # A tibble: 4 × 4
#> state year Total `Supervision Violation`
#> <chr> <dbl> <dbl> <dbl>
#> 1 Alabama 2018 14054 6080
#> 2 Alabama 2019 14148 6360
#> 3 Alabama 2020 10080 4761
#> 4 Alabama 2021 9663 4401
Next, we calculate the non-supervision violation admissions by subtracting supervision violation admissions from total admissions. We also rename the total admissions variable (Total) to total_admissions to clearly mark it as total admissions.
state_calc_nonsupervision_adm <- state_admissons_and_supervisions_wide |>
  mutate(`Non-Supervision Violation` = `Total` - `Supervision Violation`) |>
  rename(total_admissions = Total)
state_calc_nonsupervision_adm
#> # A tibble: 4 × 5
#> state year total_admissions `Supervision Violation` Non-Supervision Viola…¹
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Alabama 2018 14054 6080 7974
#> 2 Alabama 2019 14148 6360 7788
#> 3 Alabama 2020 10080 4761 5319
#> 4 Alabama 2021 9663 4401 5262
#> # ℹ abbreviated name: ¹`Non-Supervision Violation`
Pivot the data back to a longer format so the disaggregated counts are stacked. We also calculate each subgroup's percentage of total admissions. This value is used in the tooltips.
state_admissions_breakdown <- state_calc_nonsupervision_adm |>
  pivot_longer(
    cols = c(`Supervision Violation`, `Non-Supervision Violation`),
    names_to = "metric", values_to = "count"
  ) |>
  mutate(perc = percent(count / total_admissions, accuracy = 1))
state_admissions_breakdown
#> # A tibble: 8 × 6
#> state year total_admissions metric count perc
#> <chr> <dbl> <dbl> <chr> <dbl> <chr>
#> 1 Alabama 2018 14054 Supervision Violation 6080 43%
#> 2 Alabama 2018 14054 Non-Supervision Violation 7974 57%
#> 3 Alabama 2019 14148 Supervision Violation 6360 45%
#> 4 Alabama 2019 14148 Non-Supervision Violation 7788 55%
#> 5 Alabama 2020 10080 Supervision Violation 4761 47%
#> 6 Alabama 2020 10080 Non-Supervision Violation 5319 53%
#> 7 Alabama 2021 9663 Supervision Violation 4401 46%
#> 8 Alabama 2021 9663 Non-Supervision Violation 5262 54%
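A quick quality assurance check for this step is that, for every year, the supervision and non-supervision counts add back up to total admissions. The sketch below rebuilds the Alabama numbers from the tables above by hand, as an illustrative input rather than a new download, and then runs that check. The names al, al_long, and qa are our own.

```r
library(dplyr)
library(tidyr)
library(scales)

# Alabama values copied from the wide table above (illustrative input)
al <- tibble(
  state = "Alabama",
  year = c(2018, 2019, 2020, 2021),
  total_admissions = c(14054, 14148, 10080, 9663),
  `Supervision Violation` = c(6080, 6360, 4761, 4401)
) |>
  mutate(`Non-Supervision Violation` = total_admissions - `Supervision Violation`)

# Same reshaping and percentage step as in the tutorial
al_long <- al |>
  pivot_longer(
    cols = c(`Supervision Violation`, `Non-Supervision Violation`),
    names_to = "metric", values_to = "count"
  ) |>
  mutate(perc = percent(count / total_admissions, accuracy = 1))

# QA check: within each year, the two stacked groups must add back up to the total
qa <- al_long |>
  group_by(year) |>
  summarize(parts = sum(count), total = first(total_admissions))
stopifnot(all(qa$parts == qa$total))
```

With the real data, the same group_by()/summarize() check can be applied directly to state_admissions_breakdown.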
Next, if a count is 0, we set it to NA. In the same step, we create the text that will be used in the tooltip. The tooltip is the text that pops up on the screen when the mouse hovers over the chart. Tooltip styling is done with HTML.
plot_state_admissions_breakdown <- state_admissions_breakdown |>
  mutate(
    # replace zero counts with NA
    count = ifelse(count == 0, NA, count),
    # HTML text shown when hovering over a point
    tooltip = paste0(
      "<b>", state, " - ", year, "</b><br>",
      metric, " Admissions: ", comma(count), "<br>",
      "Percentage of Total Admissions: ", perc, "<br>",
      "Total Admissions: ", comma(total_admissions)
    )
  )
plot_state_admissions_breakdown
#> # A tibble: 8 × 7
#> state year total_admissions metric count perc tooltip
#> <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr>
#> 1 Alabama 2018 14054 Supervision Violation 6080 43% <b>Alaba…
#> 2 Alabama 2018 14054 Non-Supervision Violation 7974 57% <b>Alaba…
#> 3 Alabama 2019 14148 Supervision Violation 6360 45% <b>Alaba…
#> 4 Alabama 2019 14148 Non-Supervision Violation 7788 55% <b>Alaba…
#> 5 Alabama 2020 10080 Supervision Violation 4761 47% <b>Alaba…
#> 6 Alabama 2020 10080 Non-Supervision Violation 5319 53% <b>Alaba…
#> 7 Alabama 2021 9663 Supervision Violation 4401 46% <b>Alaba…
#> 8 Alabama 2021 9663 Non-Supervision Violation 5262 54% <b>Alaba…
# view a single tooltip
plot_state_admissions_breakdown$tooltip[1]
#> [1] "<b>Alabama - 2018</b><br>Supervision Violation Admissions: 6,080<br>Percentage of Total Admissions: 43%<br>Total Admissions: 14,054"
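The order of the pieces inside paste0() determines how the label reads, so one way to confirm the tooltip code is to rebuild a single tooltip by hand and compare it with the expected string above. This is a sketch. The row list is typed in by hand from the Alabama 2018 values and is not produced by the pipeline.

```r
library(scales)

# One row of plot_state_admissions_breakdown, typed in by hand (illustrative input)
row <- list(
  state = "Alabama", year = 2018, metric = "Supervision Violation",
  count = 6080, perc = "43%", total_admissions = 14054
)

# The metric name must come directly before " Admissions: " for the label to read correctly
tooltip <- with(row, paste0(
  "<b>", state, " - ", year, "</b><br>",
  metric, " Admissions: ", comma(count), "<br>",
  "Percentage of Total Admissions: ", perc, "<br>",
  "Total Admissions: ", comma(total_admissions)
))

identical(
  tooltip,
  "<b>Alabama - 2018</b><br>Supervision Violation Admissions: 6,080<br>Percentage of Total Admissions: 43%<br>Total Admissions: 14,054"
)
#> [1] TRUE
```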
Creating Highcharts Plot
Before creating the Highcharts plot, set the thousands separator to a comma in the Highcharts language options. The Highcharts default is a space. We also create a reusable theme that can be applied to other plots as well. You only need to do this once per session.
# this sets the thousands separator to a comma
# so 1000 will be displayed as "1,000"
hcoptslang <- getOption("highcharter.lang")
hcoptslang$thousandsSep <- ","
options(highcharter.lang = hcoptslang)
plot_theme <- hc_theme(
  chart = list(style = list(fontFamily = "Arial", color = "#666666")),
  title = list(
    align = "center",
    style = list(
      fontFamily = "Arial",
      fontWeight = "bold",
      color = "black",
      fontSize = "16px"
    )
  ),
  legend = list(align = "center", verticalAlign = "top"),
  xAxis = list(gridLineWidth = 0, lineWidth = 0, tickLength = 0),
  yAxis = list(gridLineWidth = 0)
)
We use the hchart() function to create the highchart object and specify that we should use the dataset we created in the previous section. In this command, we also define which variables in the data should be assigned to the x- and y-axes, as well as the variable that defines the groups in the chart.
hc_area <- plot_state_admissions_breakdown |>
  hchart(
    type = "area",       # set the type of chart to be an area chart
    stacking = "normal", # specify to stack the counts on top of each other
    hcaes(x = year, y = count, group = metric),
    color = c("#C7E8F5", "#D6C246")
  )
hc_area
Next, we configure the axes. We set the tick positions on the x-axis, format the y-axis labels with a thousands separator, and remove the titles from both axes. We also add a main title for the plot. Another crucial step is specifying the tooltips (recall that we created a variable called tooltip).
hc_area <- hc_area |>
  hc_xAxis(title = "", tickPositions = c(2018, 2019, 2020, 2021)) |>
  hc_yAxis(title = "", labels = list(format = "{value:,.0f}")) |>
  hc_title(text = paste0(this_state, ": Prison Admissions")) |>
  hc_tooltip(formatter = JS("function(){return(this.point.tooltip)}"))
hc_area
Next, we specify the plot options. This allows us to set a variety of options:
- Turn off the chart's initial animation.
- Specify how the cursor should look as it interacts with the series in the chart.
- Adjust the border width of each area segment.
- Hide the point markers unless hovering over them with the cursor.
- Add accessibility components and descriptive text.
hc_area <- hc_area |>
  hc_plotOptions(
    series = list(animation = FALSE, cursor = "pointer", borderWidth = 3),
    area = list(marker = list(enabled = FALSE)),
    accessibility = list(
      enabled = TRUE,
      keyboardNavigation = list(enabled = TRUE),
      point = list(
        valueDescriptionFormat = "{point.state}, {point.year}, {point.metric}, {point.count:,.0f}"
      ),
      linkedDescription = paste0(
        "This is an area chart for the state of ", this_state,
        ", displaying the total prison admissions disaggregated by admission type: ",
        "supervision violation admissions and new offense ",
        "(or other non-violation) admissions"
      ),
      landmarkVerbosity = "one"
    )
  )
hc_area
The final step is adding our premade theme to the chart.
hc_area <- hc_area |>
  hc_add_theme(plot_theme)
hc_area
Bar Chart of Violation Admissions by Type
Data Preparation
First, we need to specify which state we want to use in this chart.
Skip this step if you have already specified the state.
this_state <- "Alabama" # select state
Next, we need to filter our data to the specific values we are interested in: the selected state's admissions to prison from supervision (probation and parole) by violation type (new offense violation or technical violation).
- state == this_state: filter to the state of interest
- adm_or_pop == "Admissions": filter to admissions only
- metric %in% c("New Offense Violation", "Technical Violation"): filter to show the type disaggregation
state_supervision_type <- svii_data |>
  filter(
    state == this_state &
      adm_or_pop == "Admissions" &
      metric %in% c("New Offense Violation", "Technical Violation")
  )
state_supervision_type
#> # A tibble: 16 × 7
#> state full_metric year count adm_or_pop prob_or_par metric
#> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 Alabama Probation New Offense Viol… 2018 2069 Admissions Probation New O…
#> 2 Alabama Probation Technical Violat… 2018 1683 Admissions Probation Techn…
#> 3 Alabama Parole New Offense Violati… 2018 1231 Admissions Parole New O…
#> 4 Alabama Parole Technical Violation… 2018 1097 Admissions Parole Techn…
#> 5 Alabama Probation New Offense Viol… 2019 2372 Admissions Probation New O…
#> 6 Alabama Probation Technical Violat… 2019 1596 Admissions Probation Techn…
#> 7 Alabama Parole New Offense Violati… 2019 1266 Admissions Parole New O…
#> 8 Alabama Parole Technical Violation… 2019 1126 Admissions Parole Techn…
#> 9 Alabama Probation New Offense Viol… 2020 1306 Admissions Probation New O…
#> 10 Alabama Probation Technical Violat… 2020 1838 Admissions Probation Techn…
#> 11 Alabama Parole New Offense Violati… 2020 1375 Admissions Parole New O…
#> 12 Alabama Parole Technical Violation… 2020 242 Admissions Parole Techn…
#> 13 Alabama Probation New Offense Viol… 2021 1073 Admissions Probation New O…
#> 14 Alabama Probation Technical Violat… 2021 1776 Admissions Probation Techn…
#> 15 Alabama Parole New Offense Violati… 2021 1295 Admissions Parole New O…
#> 16 Alabama Parole Technical Violation… 2021 257 Admissions Parole Techn…
Next, we’ll combine probation and parole admissions so that we can plot the total number of admissions from supervision, while keeping the variable that indicates whether each admission was due to a new offense violation or a technical violation. We’ll also calculate the total number of returns from supervision and the percentage of supervision violation admissions that were due to new offenses versus technical violations.
state_supervision_breakdown <- state_supervision_type |>
  group_by(state, year, metric, adm_or_pop) |>
  summarize(count = sum(count, na.rm = TRUE), .groups = "drop") |>
  group_by(state, year) |>
  mutate(
    all_sup_returns = sum(count),
    pct = count / all_sup_returns
  ) |>
  ungroup()
state_supervision_breakdown
#> # A tibble: 8 × 7
#> state year metric adm_or_pop count all_sup_returns pct
#> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Alabama 2018 New Offense Violation Admissions 3300 6080 0.543
#> 2 Alabama 2018 Technical Violation Admissions 2780 6080 0.457
#> 3 Alabama 2019 New Offense Violation Admissions 3638 6360 0.572
#> 4 Alabama 2019 Technical Violation Admissions 2722 6360 0.428
#> 5 Alabama 2020 New Offense Violation Admissions 2681 4761 0.563
#> 6 Alabama 2020 Technical Violation Admissions 2080 4761 0.437
#> 7 Alabama 2021 New Offense Violation Admissions 2368 4401 0.538
#> 8 Alabama 2021 Technical Violation Admissions 2033 4401 0.462
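As a quality assurance check on this step, the new offense and technical violation admissions, summed across probation and parole, should reproduce the Supervision Violation Admissions count from the raw data (6,080 for Alabama in 2018). The sketch below runs that check on Alabama's 2018 values, typed in from the raw download output above as illustrative input. The names al_2018 and by_type are our own.

```r
library(dplyr)

# Alabama 2018 admissions copied from the raw download output above (illustrative input)
al_2018 <- tibble(
  prob_or_par = c("Probation", "Probation", "Parole", "Parole"),
  metric = c(
    "New Offense Violation", "Technical Violation",
    "New Offense Violation", "Technical Violation"
  ),
  count = c(2069, 1683, 1231, 1097)
)
supervision_violation_admissions <- 6080 # "Supervision Violation Admissions", Alabama 2018

# Combine probation and parole within each violation type, then compute shares
by_type <- al_2018 |>
  group_by(metric) |>
  summarize(count = sum(count), .groups = "drop") |>
  mutate(all_sup_returns = sum(count), pct = count / all_sup_returns)

# QA check: the type breakdown should add up to the published supervision total
stopifnot(sum(by_type$count) == supervision_violation_admissions)
```

The new offense and technical breakdown is only available for some states. In another state, a mismatch may therefore reflect missing components rather than an error in the code.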
Next, if a count is 0, we set it to NA. In the same step, we create the text that will be used in the tooltip.
plot_state_supervision_breakdown <- state_supervision_breakdown |>
  mutate(
    # replace zero counts with NA
    count = ifelse(count == 0, NA, count),
    # HTML text shown when hovering over a column
    tooltip = paste0(
      "<b>", state, " - ", year, "</b><br>",
      metric, " Admissions: ", comma(count, 1), "<br>",
      "Percentage of Total Supervision Admissions: ", percent(pct, 1), "<br>",
      "Total Return from Supervision Admissions: ", comma(all_sup_returns, 1)
    )
  )
plot_state_supervision_breakdown
#> # A tibble: 8 × 8
#> state year metric adm_or_pop count all_sup_returns pct tooltip
#> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 Alabama 2018 New Offense Viol… Admissions 3300 6080 0.543 <b>Ala…
#> 2 Alabama 2018 Technical Violat… Admissions 2780 6080 0.457 <b>Ala…
#> 3 Alabama 2019 New Offense Viol… Admissions 3638 6360 0.572 <b>Ala…
#> 4 Alabama 2019 Technical Violat… Admissions 2722 6360 0.428 <b>Ala…
#> 5 Alabama 2020 New Offense Viol… Admissions 2681 4761 0.563 <b>Ala…
#> 6 Alabama 2020 Technical Violat… Admissions 2080 4761 0.437 <b>Ala…
#> 7 Alabama 2021 New Offense Viol… Admissions 2368 4401 0.538 <b>Ala…
#> 8 Alabama 2021 Technical Violat… Admissions 2033 4401 0.462 <b>Ala…
# view a single tooltip
plot_state_supervision_breakdown$tooltip[1]
#> [1] "<b>Alabama - 2018</b><br>New Offense Violation Admissions: 3,300<br>Percentage of Total Supervision Admissions: 54%<br>Total Return from Supervision Admissions: 6,080"
Create Highcharts Plot
Before creating the Highcharts plot, set the thousands separator to a comma in the Highcharts language options. The Highcharts default is a space. We also create a reusable theme that can be applied to other plots as well.
Skip this step if you have already set the thousands separator and created the plot theme.
# this sets the thousands separator to a comma
# so 1000 will be displayed as "1,000"
hcoptslang <- getOption("highcharter.lang")
hcoptslang$thousandsSep <- ","
options(highcharter.lang = hcoptslang)
plot_theme <- hc_theme(
  chart = list(style = list(fontFamily = "Arial", color = "#666666")),
  title = list(
    align = "center",
    style = list(
      fontFamily = "Arial",
      fontWeight = "bold",
      color = "black",
      fontSize = "16px"
    )
  ),
  legend = list(align = "center", verticalAlign = "top"),
  xAxis = list(gridLineWidth = 0, lineWidth = 0, tickLength = 0),
  yAxis = list(gridLineWidth = 0)
)
We start by creating a highchart object with the dataset created in the previous section.
hc_bar <- plot_state_supervision_breakdown |>
  hchart(
    type = "column", # specify the type of chart to be a column chart
    hcaes(x = year, y = count, group = metric),
    color = c("#D25E2D", "#EDB799")
  )
Next, we remove the titles from the x- and y-axes, add a main chart title, and use the text stored in the tooltip variable for the chart's tooltips.
hc_bar <- hc_bar |>
  hc_xAxis(title = "") |>
  hc_yAxis(title = "", labels = list(format = "{value:,.0f}")) |>
  hc_title(text = paste0(this_state, ": Supervision Violation Admissions by Type")) |>
  hc_tooltip(formatter = JS("function(){return(this.point.tooltip)}"))
hc_bar
Next, we specify the plot options. This allows us to set a variety of options:
- Turn off the chart's initial animation.
- Specify how the cursor should look as it interacts with the series in the chart.
- Adjust the border width of each column.
- Hide the point markers unless hovering over them with the cursor. This area-chart setting is carried over from the previous chart and does not affect columns.
- Add accessibility components and descriptive text.
hc_bar <- hc_bar |>
  hc_plotOptions(
    series = list(animation = FALSE, cursor = "pointer", borderWidth = 3),
    area = list(marker = list(enabled = FALSE)),
    accessibility = list(
      enabled = TRUE,
      keyboardNavigation = list(enabled = TRUE),
      point = list(
        valueDescriptionFormat = "{point.state}, {point.year}, {point.metric}, {point.count:,.0f}"
      ),
      linkedDescription = paste0(
        "This is a bar chart for the state of ", this_state,
        ", displaying the total prison admissions due to supervision violations, ",
        "disaggregated by technical violations vs. new offense violations."
      ),
      landmarkVerbosity = "one"
    )
  )
hc_bar
Finally, we style the chart by adding our premade theme.
hc_bar <- hc_bar |>
  hc_add_theme(plot_theme)
hc_bar
Hex Map
We will create a U.S. map by state that shows the percent change in total admissions between 2018 and 2021. The colors of each state on the map will indicate the magnitude of the change. Rather than use the standard U.S. state map, we’ll create a hexbin map, which depicts each state as an equal-sized hexagon.
Data Preparation
Hex Map Data
To create a hex map of the 50 states, you need to download the hex map coordinates. The hex map coordinates are referenced on the R Graph Gallery, Hexbin map in R: an example with US State. A cleaned version of the coordinates can be imported using the link below. If you are interested in downloading the data and cleaning it yourself, please review Appendix: Hex Map Data Prep.
hex_url <- "https://github.com/CSGJusticeCenter/va_data/raw/main/model_code/violation_admissions/us_hex_map.json"
# call rjson::fromJSON() explicitly; jsonlite::fromJSON() has no `file` argument
hex <- rjson::fromJSON(file = hex_url)
Prison Admissions Data from SVII Dataset
We will display data downloaded from the Supervision Violations and Their Impact on Incarceration report.
We need to filter the admissions data to the specific values used in the hex map: Change in Total Admissions to State Prison from 2018 to 2021.
- metric == "Total": filter to the total metric (total admissions to state prison)
- adm_or_pop == "Admissions": filter to admissions only
- year %in% c(2018, 2021): filter to 2018 and 2021 so you can calculate the change between the two years
total_prison_adm <- svii_data |>
  filter(metric == "Total", adm_or_pop == "Admissions", year %in% c(2018, 2021))
total_prison_adm
#> # A tibble: 100 × 7
#> state full_metric year count adm_or_pop prob_or_par metric
#> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 Alabama Total Admissions 2018 14054 Admissions <NA> Total
#> 2 Alabama Total Admissions 2021 9663 Admissions <NA> Total
#> 3 Alaska Total Admissions 2018 32627 Admissions <NA> Total
#> 4 Alaska Total Admissions 2021 NA Admissions <NA> Total
#> 5 Arizona Total Admissions 2018 18361 Admissions <NA> Total
#> 6 Arizona Total Admissions 2021 11518 Admissions <NA> Total
#> 7 Arkansas Total Admissions 2018 9204 Admissions <NA> Total
#> 8 Arkansas Total Admissions 2021 8123 Admissions <NA> Total
#> 9 California Total Admissions 2018 35391 Admissions <NA> Total
#> 10 California Total Admissions 2021 29425 Admissions <NA> Total
#> # ℹ 90 more rows
Select only the variables that are needed (this makes the pivoting easier). Then pivot the data so that each year (2018 and 2021) becomes its own column.
total_prison_adm_wide <- total_prison_adm |>
  select(state, full_metric, year, count) |>
  # pivot wider so values for 2018 and 2021 are in 2 different columns
  pivot_wider(names_from = year, values_from = count)
total_prison_adm_wide
#> # A tibble: 50 × 4
#> state full_metric `2018` `2021`
#> <chr> <chr> <dbl> <dbl>
#> 1 Alabama Total Admissions 14054 9663
#> 2 Alaska Total Admissions 32627 NA
#> 3 Arizona Total Admissions 18361 11518
#> 4 Arkansas Total Admissions 9204 8123
#> 5 California Total Admissions 35391 29425
#> 6 Colorado Total Admissions 9985 5086
#> 7 Connecticut Total Admissions 21018 12717
#> 8 Delaware Total Admissions 13358 9899
#> 9 Florida Total Admissions 31285 20800
#> 10 Georgia Total Admissions 18275 13611
#> # ℹ 40 more rows
Next, we add several new variables. These are the state abbreviation (used to join the data to the hex map), the numeric change and percent change in admissions between 2018 and 2021, the tooltip text, and a formatted percent-change label displayed on each hexagon. When the change cannot be calculated, the label is a dash.
plot_total_prison_adm_change <- total_prison_adm_wide |>
  mutate(
    # look up each row's abbreviation by its full state name
    state_abb = state.abb[match(state, state.name)],
    n_change = `2021` - `2018`,
    pct_change = n_change / `2018` * 100,
    tooltip = paste0(
      "<b>", state, "</b><br>",
      "2018 admissions: ", comma(`2018`, 1), "<br>",
      "2021 admissions: ", comma(`2021`, 1), "<br>",
      "Change in admissions 2018 to 2021: ", comma(n_change, 1), "<br>",
      "Percent change in admissions 2018 to 2021: ", percent(pct_change, 1, scale = 1)
    ),
    changelabel = ifelse(is.na(pct_change), "-", percent(pct_change, 1, scale = 1))
  )
plot_total_prison_adm_change
#> # A tibble: 50 × 9
#> state full_metric `2018` `2021` state_abb n_change pct_change tooltip
#> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <chr>
#> 1 Alabama Total Admiss… 14054 9663 AL -4391 -31.2 <b>Ala…
#> 2 Alaska Total Admiss… 32627 NA AK NA NA <b>Ala…
#> 3 Arizona Total Admiss… 18361 11518 AZ -6843 -37.3 <b>Ari…
#> 4 Arkansas Total Admiss… 9204 8123 AR -1081 -11.7 <b>Ark…
#> 5 California Total Admiss… 35391 29425 CA -5966 -16.9 <b>Cal…
#> 6 Colorado Total Admiss… 9985 5086 CO -4899 -49.1 <b>Col…
#> 7 Connecticut Total Admiss… 21018 12717 CT -8301 -39.5 <b>Con…
#> 8 Delaware Total Admiss… 13358 9899 DE -3459 -25.9 <b>Del…
#> 9 Florida Total Admiss… 31285 20800 FL -10485 -33.5 <b>Flo…
#> 10 Georgia Total Admiss… 18275 13611 GA -4664 -25.5 <b>Geo…
#> # ℹ 40 more rows
#> # ℹ 1 more variable: changelabel <chr>
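The lookup state.abb[match(state, state.name)] returns, for each row, the position of that row's state in R's built-in state.name vector. It therefore does not depend on row order. By contrast, state.abb[match(state.name, state)] only gives the right answer when the rows happen to be in the same alphabetical order as state.name. The sketch below checks the lookup and the change calculations on three states from the table above, deliberately out of alphabetical order. The adm tibble is illustrative input, not a new download.

```r
library(dplyr)
library(scales)

# Illustrative input: three states from the table above, NOT in alphabetical order
adm <- tibble(
  state = c("Colorado", "Alabama", "Alaska"),
  `2018` = c(9985, 14054, 32627),
  `2021` = c(5086, 9663, NA)
)

changes <- adm |>
  mutate(
    # position of each row's state in state.name, so row order does not matter
    state_abb = state.abb[match(state, state.name)],
    n_change = `2021` - `2018`,
    pct_change = n_change / `2018` * 100,
    # a missing 2021 value gives NA, which is labeled "-" on the map
    changelabel = ifelse(is.na(pct_change), "-", percent(pct_change, 1, scale = 1))
  )

changes$state_abb   # "CO" "AL" "AK"
changes$changelabel # "-49%" "-31%" "-"
```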
Create Highcharts Plot
We create a separate theme for the hex map. We also calculate the minimum and maximum percent change across states, ignoring missing values. These values set the range of the color scale.
map_theme <- hc_theme(
  chart = list(style = list(fontFamily = "Arial", color = "#666666")),
  title = list(
    style = list(
      fontFamily = "Arial",
      fontWeight = "bold",
      color = "black",
      fontSize = "30px"
    )
  )
)
min_map <- min(plot_total_prison_adm_change$pct_change, na.rm = TRUE)
max_map <- max(plot_total_prison_adm_change$pct_change, na.rm = TRUE)
We start by plotting the hex map data and adding the labels (the state abbreviation and percent change).
hc_hex <- highchart() |>
  hc_add_series_map(
    map = hex,
    df = plot_total_prison_adm_change,
    joinBy = "state_abb",
    value = "pct_change",
    dataLabels = list(
      enabled = TRUE,
      useHTML = TRUE,
      formatter = JS("function() {return '<div style=\"text-align:center;\">' +
        '<span style=\"font-weight:bold;\">' + this.point.state_abb + '</span><br>' +
        '<span>' + this.point.changelabel + '</span>' +
        '</div>';}"),
      style = list(
        fontSize = "14px",
        fontWeight = "normal" # "normal" is the valid CSS keyword for regular weight
      )
    ),
    nullColor = "#e8e8e8",
    accessibility = list(
      point = list(
        valueDescriptionFormat = "state: {point.state}, percent change: {point.value:.1f}"
      )
    )
  )
hc_hex
Next, we use the min/max values we calculated to establish a gradient color legend. This will fill each state hexagon with a color based on that state’s percent change. We also specify where to put the legend on the chart.
hc_hex <- hc_hex |>
  hc_colorAxis(
    min = min_map,
    max = max_map,
    stops = color_stops(4, c("#004270", "#236ca7", "#C7E8F5", "#FFFFFF")),
    labels = list(format = "{value}%")
  ) |>
  hc_legend(
    align = "right",
    layout = "vertical",
    verticalAlign = "top",
    y = 300
  )
hc_hex
Next, we specify the chart title and what variable to use as the tooltip text.
hc_hex <- hc_hex |>
  hc_title(text = "Change in Total Admissions to State Prison<br>2018–2021") |>
  hc_tooltip(
    formatter = JS("function(){return(this.point.tooltip)}"),
    outside = TRUE
  )
hc_hex
Finally, we add the theme that we created at the top of this section.
hc_hex <- hc_hex |>
  hc_add_theme(map_theme)
hc_hex
R Session Info
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14 ucrt)
#> os Windows 10 x64 (build 19045)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate English_United States.utf8
#> ctype English_United States.utf8
#> tz America/New_York
#> date 2024-08-09
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> assertthat 0.2.1 2019-03-21 [] CRAN (R 4.4.0)
#> backports 1.5.0 2024-05-23 [] CRAN (R 4.4.0)
#> bit 4.0.5 2022-11-15 [] CRAN (R 4.4.0)
#> bit64 4.0.5 2020-08-30 [] CRAN (R 4.4.0)
#> broom 1.0.6 2024-05-17 [] CRAN (R 4.4.0)
#> class 7.3-22 2023-05-03 [] CRAN (R 4.4.1)
#> classInt 0.4-10 2023-09-05 [] CRAN (R 4.4.0)
#> cli 3.6.2 2023-12-11 [] CRAN (R 4.4.0)
#> colorspace 2.1-0 2023-01-23 [] CRAN (R 4.4.0)
#> crayon 1.5.3 2024-06-20 [] CRAN (R 4.4.1)
#> curl 5.2.1 2024-03-01 [] CRAN (R 4.4.0)
#> data.table 1.15.4 2024-03-30 [] CRAN (R 4.4.0)
#> DBI 1.2.2 2024-02-16 [] CRAN (R 4.4.0)
#> digest 0.6.36 2024-06-23 [] CRAN (R 4.4.1)
#> dplyr * 1.1.4 2023-11-17 [] CRAN (R 4.4.0)
#> e1071 1.7-14 2023-12-06 [] CRAN (R 4.4.0)
#> evaluate 0.24.0 2024-06-10 [] CRAN (R 4.4.1)
#> fansi 1.0.6 2023-12-08 [] CRAN (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [] CRAN (R 4.4.0)
#> forcats * 1.0.0 2023-01-29 [] CRAN (R 4.4.0)
#> generics 0.1.3 2022-07-05 [] CRAN (R 4.4.0)
#> geojsonsf * 2.0.3 2022-05-30 [] CRAN (R 4.4.0)
#> ggplot2 * 3.5.1 2024-04-23 [] CRAN (R 4.4.0)
#> glue 1.7.0 2024-01-09 [] CRAN (R 4.4.0)
#> gtable 0.3.5 2024-04-22 [] CRAN (R 4.4.0)
#> highcharter * 0.9.4.9000 2024-06-07 [] Github (batpigandme/highcharter@6644cf7)
#> hms 1.1.3 2023-03-21 [] CRAN (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [] CRAN (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [] CRAN (R 4.4.0)
#> igraph 2.0.3 2024-03-13 [] CRAN (R 4.4.0)
#> jsonlite * 1.8.8 2023-12-04 [] CRAN (R 4.4.0)
#> KernSmooth 2.23-24 2024-05-17 [] CRAN (R 4.4.1)
#> knitr 1.48 2024-07-07 [] CRAN (R 4.4.1)
#> lattice 0.22-6 2024-03-20 [] CRAN (R 4.4.1)
#> lifecycle 1.0.4 2023-11-07 [] CRAN (R 4.4.0)
#> lubridate * 1.9.3 2023-09-27 [] CRAN (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [] CRAN (R 4.4.0)
#> munsell 0.5.1 2024-04-01 [] CRAN (R 4.4.0)
#> pillar 1.9.0 2023-03-22 [] CRAN (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [] CRAN (R 4.4.0)
#> proxy 0.4-27 2022-06-09 [] CRAN (R 4.4.0)
#> purrr * 1.0.2 2023-08-10 [] CRAN (R 4.4.0)
#> quantmod 0.4.26 2024-02-14 [] CRAN (R 4.4.0)
#> R6 2.5.1 2021-08-19 [] CRAN (R 4.4.0)
#> Rcpp 1.0.13 2024-07-17 [] CRAN (R 4.4.1)
#> readr * 2.1.5 2024-01-10 [] CRAN (R 4.4.0)
#> rjson * 0.2.21 2022-01-09 [] CRAN (R 4.4.0)
#> rlang 1.1.3 2024-01-10 [] CRAN (R 4.4.0)
#> rlist 0.4.6.2 2021-09-03 [] CRAN (R 4.4.0)
#> rmarkdown 2.27 2024-05-17 [] CRAN (R 4.4.0)
#> rstudioapi 0.16.0 2024-03-24 [] CRAN (R 4.4.0)
#> scales * 1.3.0 2023-11-28 [] CRAN (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [] CRAN (R 4.4.0)
#> sf * 1.0-16 2024-03-24 [] CRAN (R 4.4.0)
#> stringi 1.8.4 2024-05-06 [] CRAN (R 4.4.0)
#> stringr * 1.5.1 2023-11-14 [] CRAN (R 4.4.0)
#> tibble * 3.2.1 2023-03-20 [] CRAN (R 4.4.0)
#> tidyr * 1.3.1 2024-01-24 [] CRAN (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [] CRAN (R 4.4.0)
#> tidyverse * 2.0.0 2023-02-22 [] CRAN (R 4.4.0)
#> timechange 0.3.0 2024-01-18 [] CRAN (R 4.4.0)
#> TTR 0.24.4 2023-11-28 [] CRAN (R 4.4.0)
#> tzdb 0.4.0 2023-05-12 [] CRAN (R 4.4.0)
#> units 0.8-5 2023-11-28 [] CRAN (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [] CRAN (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [] CRAN (R 4.4.0)
#> vroom 1.6.5 2023-12-05 [] CRAN (R 4.4.0)
#> withr 3.0.0 2024-01-16 [] CRAN (R 4.4.0)
#> xfun 0.46 2024-07-18 [] CRAN (R 4.4.1)
#> xts 0.14.0 2024-06-05 [] CRAN (R 4.4.0)
#> yaml 2.3.9 2024-07-05 [] CRAN (R 4.4.1)
#> zoo 1.8-12 2023-04-13 [] CRAN (R 4.4.0)
#>
#>
#> ──────────────────────────────────────────────────────────────────────────────
Appendix: Hex Map Data Prep
Hex Map Coordinates
To create a hex map of the 50 states, you need to download the hex map coordinates. These coordinates are referenced in the R Graph Gallery post Hexbin map in R: an example with US State. The hexagon boundaries can be downloaded in GeoJSON format from Carto or by using the link shown below.
raw_hex_url <- "https://github.com/CSGJusticeCenter/va_data/raw/main/model_code/violation_admissions/us_states_hexgrid.geojson"
rawhex <- read_sf(raw_hex_url) |>
  # rename identifier as 'state_abb'
  select(state_abb = iso3166_2) |>
  # remove DC as our data is for the 50 states only
  filter(state_abb != "DC") |>
  # create new variable with full state name
  mutate(state_name = state.name[match(state_abb, state.abb)])
rawhex
#> Simple feature collection with 50 features and 2 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -137.9747 ymin: 26.39343 xmax: -69.90286 ymax: 55.3132
#> Geodetic CRS: WGS 84
#> # A tibble: 50 × 3
#> state_abb geometry state_name
#> * <chr> <POLYGON [°]> <chr>
#> 1 ME ((-72.62574 55.3132, -69.90286 54.40843, -69.90286 52.5… Maine
#> 2 RI ((-72.62574 49.57439, -69.90286 48.54431, -69.90286 46.… Rhode Isl…
#> 3 VT ((-80.79436 52.53744, -78.07148 51.57081, -78.07148 49.… Vermont
#> 4 OK ((-110.746 35.79821, -108.0231 34.51297, -108.0231 31.8… Oklahoma
#> 5 NC ((-91.68585 39.5301, -88.96298 38.30704, -88.96298 35.7… North Car…
#> 6 VA ((-88.96298 43.0717, -86.2401 41.91257, -86.2401 39.530… Virginia
#> 7 WV ((-94.40873 43.0717, -91.68585 41.91257, -91.68585 39.5… West Virg…
#> 8 CA ((-124.3603 39.5301, -121.6375 38.30704, -121.6375 35.7… California
#> 9 KS ((-108.0231 39.5301, -105.3002 38.30704, -105.3002 35.7… Kansas
#> 10 KY ((-99.85447 43.0717, -97.1316 41.91257, -97.1316 39.530… Kentucky
#> # ℹ 40 more rows
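The `mutate()` step relies on the base R vectors `state.abb` and `state.name`, which list the 50 states in the same order. `match()` finds the position of each abbreviation in `state.abb`, and that position then picks the full name from `state.name`. An abbreviation outside the 50 states, such as "DC", matches nothing and yields `NA`, which is one reason it helps to filter it out first. The sketch below shows this lookup on a few hard-coded abbreviations and adds a simple quality assurance check. The helper `check_hex_states()` is hypothetical and is not part of the original model code.

```r
# state.abb and state.name ship with base R (datasets package) and list
# the 50 U.S. states in the same order, so match() can translate between them.
abbs <- c("ME", "RI", "DC", "CA")
full_names <- state.name[match(abbs, state.abb)]
print(full_names)  # "DC" has no match, so its name is NA

# Hypothetical QA helper: stop with an error unless the table covers exactly
# the 50 states, each once, with a non-missing full name.
check_hex_states <- function(df) {
  stopifnot(
    anyDuplicated(df$state_abb) == 0,
    setequal(df$state_abb, state.abb),
    !anyNA(df$state_name)
  )
  invisible(TRUE)
}

# A complete table of the 50 states passes the check...
good <- data.frame(state_abb = state.abb,
                   state_name = state.name[match(state.abb, state.abb)])
check_hex_states(good)

# ...while one that still includes DC fails it.
bad_abbs <- c(state.abb, "DC")
bad <- data.frame(state_abb = bad_abbs,
                  state_name = state.name[match(bad_abbs, state.abb)])
bad_result <- try(check_hex_states(bad), silent = TRUE)
inherits(bad_result, "try-error")
```

With the real data, `check_hex_states(rawhex)` should run silently. `rawhex` has `state_abb` and `state_name` columns, and `$` works on sf objects as it does on data frames.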
Next, we need to reformat the data. First, we reproject the data to a different coordinate reference system (CRS), identified by its EPSG code. The code used below, 3857, refers to the WGS 84 / Pseudo-Mercator CRS, which will “square” the hexagons. We then convert the sf object to GeoJSON and, finally, to a nested R list that Highcharts can use.
# Reformat hex data
hex <- rawhex |>
  # reproject to WGS 84 / Pseudo-Mercator (EPSG:3857)
  st_transform(3857) |>
  # convert sf object to geojson format
  geojsonsf::sf_geojson() |>
  # convert geojson format to list format (to integrate with Highcharts)
  jsonlite::fromJSON(simplifyVector = FALSE)
hex$type
#> [1] "FeatureCollection"
hex$features[1] # example of data for single state (Maine)
#> [[1]]
#> [[1]]$type
#> [1] "Feature"
#>
#> [[1]]$properties
#> [[1]]$properties$state_abb
#> [1] "ME"
#>
#> [[1]]$properties$state_name
#> [1] "Maine"
#>
#>
#> [[1]]$geometry
#> [[1]]$geometry$type
#> [1] "Polygon"
#>
#> [[1]]$geometry$coordinates
#> [[1]]$geometry$coordinates[[1]]
#> [[1]]$geometry$coordinates[[1]][[1]]
#> [[1]]$geometry$coordinates[[1]][[1]][[1]]
#> [1] -8084660
#>
#> [[1]]$geometry$coordinates[[1]][[1]][[2]]
#> [1] 7422891
#>
#>
#> [[1]]$geometry$coordinates[[1]][[2]]
#> [[1]]$geometry$coordinates[[1]][[2]][[1]]
#> [1] -7781551
#>
#> [[1]]$geometry$coordinates[[1]][[2]][[2]]
#> [1] 7247891
#>
#>
#> [[1]]$geometry$coordinates[[1]][[3]]
#> [[1]]$geometry$coordinates[[1]][[3]][[1]]
#> [1] -7781551
#>
#> [[1]]$geometry$coordinates[[1]][[3]][[2]]
#> [1] 6897891
#>
#>
#> [[1]]$geometry$coordinates[[1]][[4]]
#> [[1]]$geometry$coordinates[[1]][[4]][[1]]
#> [1] -8084660
#>
#> [[1]]$geometry$coordinates[[1]][[4]][[2]]
#> [1] 6722891
#>
#>
#> [[1]]$geometry$coordinates[[1]][[5]]
#> [[1]]$geometry$coordinates[[1]][[5]][[1]]
#> [1] -8387769
#>
#> [[1]]$geometry$coordinates[[1]][[5]][[2]]
#> [1] 6897891
#>
#>
#> [[1]]$geometry$coordinates[[1]][[6]]
#> [[1]]$geometry$coordinates[[1]][[6]][[1]]
#> [1] -8387769
#>
#> [[1]]$geometry$coordinates[[1]][[6]][[2]]
#> [1] 7247891
#>
#>
#> [[1]]$geometry$coordinates[[1]][[7]]
#> [[1]]$geometry$coordinates[[1]][[7]][[1]]
#> [1] -8084660
#>
#> [[1]]$geometry$coordinates[[1]][[7]][[2]]
#> [1] 7422891
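The nested `[[1]]` structure above comes from `simplifyVector = FALSE`. With that setting, `jsonlite::fromJSON()` keeps every JSON array as an R list instead of collapsing it into vectors, matrices, or data frames, so the result has the same nesting as the GeoJSON text. The sketch below compares both settings on a tiny FeatureCollection with one feature. The triangle and the `"XX"` code are invented for illustration. The sketch calls `jsonlite::fromJSON()` with the package prefix, as the model code does. The prefix also avoids ambiguity, because the rjson package loaded in the setup exports its own `fromJSON()`.

```r
# Requires the jsonlite package (installed during setup).
# A made-up FeatureCollection with a single triangular polygon
geojson_txt <- '{
  "type": "FeatureCollection",
  "features": [{
    "type": "Feature",
    "properties": {"state_abb": "XX"},
    "geometry": {
      "type": "Polygon",
      "coordinates": [[[0, 0], [1, 0], [0, 1], [0, 0]]]
    }
  }]
}'

# simplifyVector = FALSE: every JSON array stays an R list, as in `hex`
nested <- jsonlite::fromJSON(geojson_txt, simplifyVector = FALSE)
first_vertex <- nested$features[[1]]$geometry$coordinates[[1]][[1]]
str(first_vertex)  # a list holding the two coordinates of one vertex

# Default (simplifyVector = TRUE): the features array becomes a data frame
simplified <- jsonlite::fromJSON(geojson_txt)
class(simplified$features)
```

The nested version is addressed exactly like `hex$features[[1]]$geometry$coordinates` in the printout above. The simplified version would require different indexing.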
Why is the CRS important?
The CRS determines how the map is projected onto a two-dimensional surface. The hexagon borders are based on actual latitude/longitude values for the U.S. states. Notice how the default CRS results in slightly warped hexagons, whereas the new CRS makes all hexagons “square.”
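The sketch below makes the difference visible in numbers. It takes Maine's six vertices in EPSG:3857 meters, copied from the `hex$features[1]` printout above. It converts them back to longitude/latitude with the inverse spherical Mercator formulas, assuming the sphere radius of 6,378,137 m (the WGS 84 semi-major axis) used by EPSG:3857. It then compares edge lengths when each set of coordinates is treated as plain x/y values. The helper `edge_lengths()` is hypothetical. Plotting raw degrees as x/y is also a simplification of how a default plot may render unprojected data.

```r
# Maine's hexagon in EPSG:3857 meters, copied from the hex$features[1]
# printout; the 7th vertex repeats the 1st to close the ring.
x_m <- c(-8084660, -7781551, -7781551, -8084660, -8387769, -8387769, -8084660)
y_m <- c( 7422891,  7247891,  6897891,  6722891,  6897891,  7247891,  7422891)

# Inverse spherical (Web) Mercator: meters -> degrees of longitude/latitude
radius <- 6378137
lon_deg <- x_m / radius * 180 / pi
lat_deg <- (2 * atan(exp(y_m / radius)) - pi / 2) * 180 / pi

# Hypothetical helper: Euclidean length of each edge between consecutive vertices
edge_lengths <- function(x, y) sqrt(diff(x)^2 + diff(y)^2)

edges_m <- edge_lengths(x_m, y_m)            # six (nearly) identical edges
edges_deg <- edge_lengths(lon_deg, lat_deg)  # unequal edges in raw degrees

round(edges_m)
round(edges_deg, 2)
round(c(lon_deg[1], lat_deg[1]), 4)  # first vertex back in lon/lat
```

The first vertex converts back to roughly the (-72.62574, 55.3132) pair shown in the `rawhex` printout. Every edge measured in meters comes out at about 350,000, so the hexagon is regular in this CRS. The edges measured in raw degrees differ noticeably, which corresponds to the warping described above.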