---
title: "Drainage Area Estimation"
author: "dblodgett@usgs.gov"
output:
  rmarkdown::html_vignette:
    tabset: true
vignette: >
  %\VignetteIndexEntry{Drainage Area Estimation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{css, echo = FALSE}
/* Override html_vignette narrow body to allow wider figures */
body, div.main-container, div.body, div.contents {
  max-width: 1600px !important;
  width: 95% !important;
}
img {
  max-width: 100% !important;
  height: auto !important;
}
.vignette-note {
  background: #f8f8f8;
  border-left: 3px solid #ccc;
  padding: 8px 12px;
  margin: 8px 0 16px 0;
  font-size: 0.9em;
}
```

```{r setup, include = FALSE}
library(nhdplusTools)

#pkgdown build uses this
build_local <- (Sys.getenv("BUILD_VIGNETTES") == "TRUE")

#cran package uses this
cran_build <- (Sys.getenv("BUILD_VIGNETTES_CRAN") == "TRUE")

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 12,
  fig.height = 7,
  dpi = 150,
  out.width = "100%",
  eval = build_local && !cran_build,
  fig.path = "drainage_area_figures/"
)

huc12 <- NULL
wbd_gdb <- "../../Data/final_wbd/WBD_National_GDB/WBD_National_GDB.gdb"
if(file.exists(wbd_gdb) & !cran_build) {
  huc12 <- sf::read_sf(wbd_gdb, "WBDHU12")
  # Rename columns to the schema expected by
  # get_drainage_area_estimates() and the vignette plot helpers.
  rename_map <- c(
    huc12 = "huc_12",
    noncontributingareaacres = "ncontrb_a",
    hutype = "hu_12_type"
  )
  for(src in names(rename_map)) {
    if(src %in% names(huc12) && !(rename_map[[src]] %in% names(huc12)))
      names(huc12)[names(huc12) == src] <- rename_map[[src]]
  }
}

# Local NHDPlusV2 national snapshot. Gated behind USE_LOCAL_NHDP so the
# read only happens when explicitly requested. When enabled and the GDB
# is present, NHDWaterbody and CatchmentSP are passed through
# waterbody_data / catchment_data so the point-in-waterbody lookup
# (Malheur), the gap-zone catchment retrieval, and the full-network
# catchment retrieval are resolved locally instead of via the OGC API.
# NHDFlowline_Network is read for the vignette's flowline-plot fetch.
waterbody_data <- NULL
catchment_data <- NULL
flowlines_data <- NULL
if(Sys.getenv("USE_LOCAL_NHDP") == "TRUE") {
  nhdp_gdb <- paste0("../../Data/nhdp/NHDPlusNationalData/",
    "NHDPlusV21_National_Seamless_Flattened_Lower48.gdb")
  if(file.exists(nhdp_gdb)) {
    waterbody_data <- sf::read_sf(nhdp_gdb, "NHDWaterbody")
    catchment_data <- sf::read_sf(nhdp_gdb, "CatchmentSP")
    flowlines_data <- sf::read_sf(nhdp_gdb, "NHDFlowline_Network")
    flowlines_data <- rename_geometry(flowlines_data, "geom")
    names(flowlines_data) <- tolower(names(flowlines_data))
  }
}

has_davidson <- FALSE
oldoption <- options(scipen = 9999)
```

*A fully rendered version of this vignette with figures is available at
<https://doi-usgs.github.io/nhdplusTools/articles/drainage_area_estimation.html>.*

`get_drainage_area_estimates()` combines Watershed Boundary Dataset
(WBD) delineations and non-surface-contributing area estimates with
National Hydrography Dataset Plus version 2 (NHDPlusV2) catchment
areas for the gap between a basin outlet and the nearest upstream
HU12. Non-contributing areas captured in HU12 boundaries are included.

This vignette shows output of the function on five basins that span a range of
hydrologic settings -- from basins where topographic signal dominates to arid,
glacial, and endorheic settings where it does not.

## How `get_drainage_area_estimates()` Works

The function produces drainage area estimates by stitching together
two kinds of spatial data: HU12 polygon areas for the bulk of the
upstream basin and NHDPlusV2 catchment areas for the gap between the
gage (or other outlet) and the nearest upstream HU12 boundaries.
Because HU12 polygons carry a non-contributing area attribute
(`ncontrb_a`) and non-surface-contributing HU12s can be retrieved from
HU10 or HU8 groupings, the estimates separate total drainage area
from surface network contributing drainage area.

### Algorithm Steps

1. **Resolve the start feature.** A Network Linked Data Index (NLDI)
   feature list (e.g.
   `featureSource = "nwissite", featureID = "USGS-05406500"`) is
   resolved to an NHDPlusV2 COMID via the NLDI. An `sfc_POINT` inside
   a waterbody triggers a waterbody lookup to find all outlet flowlines.

2. **Negotiate the outlet catchment.** For gage-based starts, the
   function computes where the gage sits along its outlet flowline as a
   0--100 measure. This measure determines whether the outlet catchment
   needs to be split (see *Catchment splitting* below).

3. **Fetch the upstream network and HU12 pour points.** The full
   upstream flowline network is retrieved along with all HU12 pour
   points on that network. These pour points mark the downstream outlet
   of each HU12 watershed unit that drains to the gage.

4. **Identify immediate HU12 outlets.** A network navigation
   finds which HU12 outlets are directly upstream of the gage with no
   intervening HU12 outlet. These define the boundary between the
   "HU zone" (covered by HU12 polygon areas) and the "gap zone"
   (covered by individual catchment areas).

5. **Fetch and assemble HU12 polygons.** HU12 polygons are retrieved
   at three query levels: the specific NLDI-identified HU12 IDs, all
   HU12s within upstream HU10 boundaries, and all HU12s within
   upstream HU8 boundaries. The broader queries can capture
   HU12s that share a parent HU but lack an on-network pour point --
   common in prairie-pothole and playa-dominated landscapes. A
   disconnect filter removes HU12s pulled incidentally by the broader
   queries that are not hydrologically connected. Each level is a
   superset of the narrower level, and each produces both a total and a
   contributing-only area estimate.

6. **Compute gap area.** Catchment areas between the gage and the
   immediate HU12 outlets are summed. Split-catchment logic is applied
   at both the HU12 outlets and (optionally) the gage point.

7. **Assemble scalar estimates.** Six drainage area estimates are
   computed: total and contributing at the HU12, HU10, and HU8
   query levels. Each equals the HU12 polygon area sum plus the gap
   area. A seventh scalar, `network_da_sqkm`, is the NHDPlusV2
   cumulative drainage area at the outlet -- a comparison baseline.

8. **NHDPlus High Resolution (NHDPlusHR) estimate (optional).** When
   `nhdplushr = TRUE`, the function fetches NHDPlusHR flowlines and
   catchments for the basin bounding box, indexes the start point to
   the HR network (disambiguating by drainage area), navigates
   upstream, and sums catchment areas.

### Catchment Splitting

When a gage sits partway along a flowline rather than at its outlet,
the downstream portion of the catchment does not contribute to the
gage. The function calls the NLDI split-catchment service to divide
the catchment at the gage point and counts only the upstream portion.
The `outlet_split_threshold_m` parameter (default 100 m) sets the
minimum gage-to-outlet distance before splitting is performed; if the
gage is closer than this threshold, the full catchment is used.

A parallel split occurs at HU12 outlets: the HU boundary cuts
through a catchment, and the portion upstream of the pour point
overlaps with HU12 area already counted. The split-catchment service
divides this catchment and the upstream overlap is excluded from the
gap area to avoid double-counting.

The *Outlet Catchment Splitting* section later in this vignette
illustrates both splits using Davidson Creek (USGS streamgage 08110075).

### Running with Local Data

Three parameters reduce dependence on web services:

- `local_navigation = TRUE` loads the NHDPlusV2 Value Added Attributes
  via `get_vaa()` for network navigation and flowline attribute
  lookups. Only HU12 pour points are still fetched from the NLDI.

- `huc12_data` accepts a pre-loaded `sf` data.frame of HU12 polygons
  (e.g. from the WBD National Geodatabase). When provided, all
  HU12 polygon queries are resolved by subsetting this table locally
  instead of calling WBD web services. The table must include `huc_12`
  and `ncontrb_a` columns (case-insensitive).

- `huc12_outlets` accepts either a path to a GPKG or a pre-loaded `sf`
  data.frame of on-network HU12 pour points. A national CONUS file is
  available from Blodgett, D.L., 2022, *Mainstem Rivers of the
  Conterminous United States* (ver. 3.0, February 2026): U.S.
  Geological Survey data release,
  [doi:10.5066/P13LNDDQ](https://doi.org/10.5066/P13LNDDQ) -- use the
  `hu_points` layer of `finalwbd_outlets.gpkg`. The function renames
  `COMID` and `FinalWBD_HUC12` to `comid` and `identifier` and filters
  the table to the upstream network automatically. When supplied, no
  NLDI `huc12pp` queries are issued.

Using all three together eliminates all WBD and NLDI `huc12pp` calls
and most flowline attribute calls. The offline call looks like:

```r
result <- get_drainage_area_estimates(
  start, local_navigation = TRUE,
  huc12_data = wbd_sf,
  huc12_outlets = "path/to/finalwbd_outlets.gpkg"
)
```

### Web Services and Performance

By default the function contacts several web services. Each adds
latency and is subject to rate limits or outages:

- **NLDI** (`findNLDI`): resolves the start feature, navigates the
  upstream network, and retrieves HU12 pour points. The `huc12pp`
  query is skipped when `huc12_outlets` is supplied.

- **NHDPlusV2 OGC API** (`get_nhdplus`): fetches flowline attributes,
  catchment geometries in the gap zone, and (when `catchments = TRUE`)
  the full upstream catchment polygon set. Skipped for flowline
  attributes when `local_navigation = TRUE`.

- **WBD / HU service** (`get_huc`, `get_huc12_by_huc`): fetches
  HU12 polygons by ID or by parent HU. Skipped when `huc12_data` is
  provided.

- **Split-catchment service** (`get_split_catchment`): divides
  catchments at HU12 outlets and at the gage point. Always called.

- **NHDPlusHR service** (`get_nhdphr`): fetches high-resolution
  flowlines and catchments for the basin bounding box. Skipped when
  `nhdplushr = FALSE`.

For large basins (e.g. the Brazos at Rosharon) the cumulative time for
these calls can be substantial. Setting `nhdplushr = FALSE` eliminates
the most expensive single call. Providing `huc12_data` and using
`local_navigation = TRUE` together can reduce total run time to a
fraction of the default, at the cost of requiring local data files.
The vignette's `fetch_or_load` wrapper caches results as RDS files so
that repeated runs avoid repeating the web service calls entirely.

## Fetch and Cache Results

Each basin result is stored in a separate RDS file in the nhdplusTools
data directory so that subsequent runs skip the web service calls.

```{r fetch_data, message = TRUE}
library(sf)

data_dir <- nhdplusTools_data_dir()
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# On-network HU12 pour points from the Mainstem Rivers data release
# (Blodgett 2022, doi:10.5066/P13LNDDQ). Downloaded once and cached so
# subsequent builds reuse the local file instead of hitting the NLDI.
outlets_path <- file.path(data_dir, "finalwbd_outlets.gpkg")
outlets_url <- paste0(
  "https://prod-is-usgs-sb-prod-publish.s3.amazonaws.com/",
  "65cbc0b3d34ef4b119cb37e9/finalwbd_outlets.gpkg")
if(!file.exists(outlets_path)) {
  message("Downloading HU12 outlets GPKG to ", outlets_path)
  download.file(outlets_url, outlets_path, mode = "wb")
}

# Define the six study sites
sites <- list(
  black_earth = list(featureSource = "nwissite",
    featureID = "USGS-05406500"),
  french_broad = list(featureSource = "nwissite",
    featureID = "USGS-03451500"),
  brazos = list(featureSource = "nwissite",
    featureID = "USGS-08116650"),
  james = list(featureSource = "nwissite",
    featureID = "USGS-06468000"),
  malheur = st_sfc(st_point(c(-118.8, 43.3)), crs = 4326),
  purgatoire = list(featureSource = "nwissite",
    featureID = "USGS-07128500"),
  davidson = list(featureSource = "nwissite",
    featureID = "USGS-08110075")
)

# Skip NHDPlusHR for large basins to keep fetch times reasonable
hr_basins <- c("black_earth", "french_broad", "davidson", "purgatoire",
  "james")

fetch_or_load <- function(name, start) {
  rds_path <- file.path(data_dir,
    paste0("da_est_", name, ".rds"))
  if(file.exists(rds_path)) {
    message("Loading cached: ", name)
    return(readRDS(rds_path))
  }
  message("Fetching: ", name)
  result <- tryCatch(
    get_drainage_area_estimates(start, catchments = TRUE,
      huc12_data = huc12,
      huc12_outlets = outlets_path,
      waterbody_data = waterbody_data,
      catchment_data = catchment_data,
      local_navigation = TRUE,
      nhdplushr = name %in% hr_basins),
    error = function(e) {
      warning("Failed for ", name, ": ", conditionMessage(e),
        call. = FALSE)
      NULL
    })
  if(!is.null(result)) saveRDS(result, rds_path)
  result
}

da_results <- Map(fetch_or_load, names(sites), sites)
da_results <- Filter(Negate(is.null), da_results)
```

```{r fetch_flowlines, message = TRUE}
fetch_flowlines <- function(name, da_result) {
  rds_path <- file.path(data_dir, paste0("fl_", name, ".rds"))
  if(file.exists(rds_path)) {
    message("Loading cached flowlines: ", name)
    return(readRDS(rds_path))
  }
  message("Fetching flowlines: ", name)
  comids <- da_result$all_network$comid
  fl <- if(!is.null(flowlines_data)) {
    message("  Subsetting from local NHDFlowline_Network...")
    flowlines_data[flowlines_data$comid %in% comids, ]
  } else {
    tryCatch(
      get_nhdplus(comid = comids, realization = "flowline"),
      error = function(e) {
        warning("Flowline fetch failed for ", name, ": ",
          conditionMessage(e), call. = FALSE)
        NULL
      })
  }
  if(!is.null(fl) && nrow(fl) > 0) saveRDS(fl, rds_path)
  fl
}

flowlines <- Map(fetch_flowlines, names(da_results), da_results)
flowlines <- Filter(Negate(is.null), flowlines)
```

```{r fetch_nwis_da, message = TRUE}
# Fetch NWIS drainage area for sites with an nwissite featureSource
nwis_ids <- vapply(sites, function(s) {
  if(is.list(s) && identical(s$featureSource, "nwissite"))
    s$featureID else NA_character_
}, character(1))
nwis_ids <- nwis_ids[!is.na(nwis_ids)]

nwis_da <- if(length(nwis_ids) > 0) {
  tryCatch({
    ml <- dataRetrieval::read_waterdata_monitoring_location(
      unname(nwis_ids))
    # Match returned rows back to site names by monitoring_location_id
    idx <- match(nwis_ids, ml$monitoring_location_id)
    # drainage_area and contributing_drainage_area are in sq miles
    data.frame(
      name = names(nwis_ids),
      nwis_da_sqmi = ml$drainage_area[idx],
      nwis_contrib_da_sqmi = ml$contributing_drainage_area[idx],
      nwis_da_sqkm = ml$drainage_area[idx] * 2.58999,
      nwis_contrib_da_sqkm = ml$contributing_drainage_area[idx] *
        2.58999,
      stringsAsFactors = FALSE
    )
  }, error = function(e) {
    warning("NWIS site fetch failed: ", conditionMessage(e),
      call. = FALSE)
    NULL
  })
} else NULL
```

## Return Structure

Each result is a list with scalar drainage area estimates and spatial
data frames. Here are the elements for Black Earth Creek:

```{r structure}
names(da_results$black_earth)
```

The scalar estimates (square kilometers) across all basins:

```{r summary_table}
basin_labels <- c(
  black_earth = "Black Earth Creek",
  french_broad = "French Broad",
  brazos = "Brazos at Rosharon",
  james = "James River",
  malheur = "Malheur Lake",
  davidson = "Davidson Creek",
  purgatoire = "Purgatoire River"
)

summary_df <- data.frame(
  basin = basin_labels[names(da_results)],
  network_da = vapply(da_results, \(x) x$network_da_sqkm, numeric(1)),
  da_huc12 = vapply(da_results, \(x) x$da_huc12_sqkm, numeric(1)),
  da_huc10 = vapply(da_results,
    \(x) ifelse(is.na(x$da_huc10_sqkm), NA_real_, x$da_huc10_sqkm),
    numeric(1)),
  da_huc08 = vapply(da_results,
    \(x) ifelse(is.na(x$da_huc08_sqkm), NA_real_, x$da_huc08_sqkm),
    numeric(1)),
  nhdplushr = vapply(da_results,
    \(x) ifelse(is.na(x$nhdplushr_network_dasqkm), NA_real_,
      x$nhdplushr_network_dasqkm),
    numeric(1))
)

# Add NWIS drainage areas where available
if(!is.null(nwis_da)) {
  summary_df$nwis_da <- ifelse(
    names(da_results) %in% nwis_da$name,
    nwis_da$nwis_da_sqkm[match(names(da_results), nwis_da$name)],
    NA_real_)
  summary_df$nwis_contrib_da <- ifelse(
    names(da_results) %in% nwis_da$name,
    nwis_da$nwis_contrib_da_sqkm[match(names(da_results), nwis_da$name)],
    NA_real_)
} else {
  summary_df$nwis_da <- NA_real_
  summary_df$nwis_contrib_da <- NA_real_
}

knitr::kable(summary_df, digits = 1,
  col.names = c("Basin", "Network DA", "HU12 DA",
    "HU10 DA", "HU8 DA", "NHDPlusHR DA",
    "NWIS DA", "NWIS Contributing DA"),
  caption = "Drainage area estimates (sq km) by basin and method")
```

```{r helpers, include = FALSE}
library(ggplot2)

# USGS USTopo provider for maptiles. The "?f=jpg" suffix is a no-op
# query parameter that gives maptiles a recognized extension hint.
usgs_topo_provider <- list(
  src = "USGS_USTopo",
  q = paste0("https://basemap.nationalmap.gov/arcgis/rest/services/",
    "USGSTopo/MapServer/tile/{z}/{y}/{x}?f=jpg"),
  sub = NA,
  cit = "USGS The National Map: USGSTopo"
)

# Cache directory for tiles
tile_cache <- file.path(nhdplusTools_data_dir(), "da_v_tiles")
dir.create(tile_cache, recursive = TRUE, showWarnings = FALSE)

# Compute a target tile zoom level so the rendered tiles are sharp
# at the figure size we use. Web Mercator world width is 40075016 m
# and tile size is 256 px. The figure pixel width drives the
# resolution we want from the tile mosaic.
target_zoom <- function(sf_obj, fig_px = 1700) {
  bb <- st_bbox(sf_obj)
  span_m <- max(bb["xmax"] - bb["xmin"], bb["ymax"] - bb["ymin"])
  if(!is.finite(span_m) || span_m <= 0) return(10L)
  z <- log2(40075016 * fig_px / (256 * span_m))
  as.integer(min(max(round(z), 4L), 15L))
}

# Fetch USGS topo tiles for an sf bbox. Returns a list with a color
# matrix (suitable for annotation_raster) and the tile bounding box.
# Uses annotation_raster (not geom_raster) so the fill aesthetic
# stays free for choropleth use. When fig_ratio is provided, the
# bbox is expanded to match the figure aspect ratio via
# fit_bbox_to_fig so that tiles cover the full canvas.
fetch_topo_raster <- function(sf_obj, zoom = NULL,
  fig_ratio = NULL) {
  fit <- if(!is.null(fig_ratio))
    fit_bbox_to_fig(sf_obj, fig_ratio = fig_ratio) else NULL
  if(!is.null(fit)) {
    bb <- st_bbox(c(
      xmin = fit$xlim[1], xmax = fit$xlim[2],
      ymin = fit$ylim[1], ymax = fit$ylim[2]
    ), crs = st_crs(3857))
    bb_sf <- st_as_sfc(bb)
  } else {
    bb <- st_bbox(sf_obj)
    pad_x <- (bb["xmax"] - bb["xmin"]) * 0.05
    pad_y <- (bb["ymax"] - bb["ymin"]) * 0.05
    bb["xmin"] <- bb["xmin"] - pad_x
    bb["xmax"] <- bb["xmax"] + pad_x
    bb["ymin"] <- bb["ymin"] - pad_y
    bb["ymax"] <- bb["ymax"] + pad_y
    bb_sf <- st_as_sfc(bb, crs = st_crs(sf_obj))
  }
  if(is.null(zoom))
    zoom <- target_zoom(st_transform(bb_sf, 3857))
  args <- list(
    x = bb_sf, provider = usgs_topo_provider, crop = TRUE,
    cachedir = tile_cache, zoom = zoom
  )
  tiles <- tryCatch(
    do.call(maptiles::get_tiles, args),
    error = function(e) {
      warning("Tile fetch failed: ", conditionMessage(e),
        call. = FALSE)
      NULL
    })
  if(is.null(tiles)) return(NULL)
  rgb_arr <- terra::as.array(tiles)
  if(length(dim(rgb_arr)) != 3 || dim(rgb_arr)[3] < 3) return(NULL)
  col_mat <- matrix(
    rgb(rgb_arr[, , 1], rgb_arr[, , 2], rgb_arr[, , 3],
      maxColorValue = 255),
    nrow = dim(rgb_arr)[1], ncol = dim(rgb_arr)[2]
  )
  ext <- terra::ext(tiles)
  list(
    raster = col_mat,
    xmin = ext[1], xmax = ext[2],
    ymin = ext[3], ymax = ext[4]
  )
}

# Expand a bbox to match the figure aspect ratio so that coord_sf
# fills the canvas without white sidebars. The basin content is
# centered and the narrower axis is padded with additional basemap.
fit_bbox_to_fig <- function(sf_obj, fig_ratio = 12 / 7, pad = 0.08) {
  bb <- st_bbox(st_transform(sf_obj, 3857))
  dx <- unname(bb["xmax"] - bb["xmin"])
  dy <- unname(bb["ymax"] - bb["ymin"])
  if(!is.finite(dx) || !is.finite(dy) || dx <= 0 || dy <= 0)
    return(NULL)
  # Add uniform padding first
  cx <- unname((bb["xmin"] + bb["xmax"]) / 2)
  cy <- unname((bb["ymin"] + bb["ymax"]) / 2)
  dx <- dx * (1 + pad * 2)
  dy <- dy * (1 + pad * 2)
  # Expand the narrower axis to match the target aspect ratio
  current_ratio <- dx / dy
  if(current_ratio < fig_ratio) {
    dx <- dy * fig_ratio
  } else {
    dy <- dx / fig_ratio
  }
  list(
    xlim = c(cx - dx / 2, cx + dx / 2),
    ylim = c(cy - dy / 2, cy + dy / 2)
  )
}

# Add basemap as annotation_raster (does not consume fill scale)
add_topo <- function(p, sf_obj, fig_ratio = 12 / 7) {
  topo <- fetch_topo_raster(sf_obj, fig_ratio = fig_ratio)
  if(is.null(topo)) return(p)
  p + annotation_raster(
    topo$raster,
    xmin = topo$xmin, xmax = topo$xmax,
    ymin = topo$ymin, ymax = topo$ymax
  )
}

# Resolve gage point in target CRS
get_gage_pt <- function(da, target_crs) {
  pt <- st_geometry(da$start_feature)
  if(!inherits(pt, "sfc_POINT"))
    pt <- st_centroid(pt)
  st_transform(st_sf(geometry = pt), target_crs)
}

# Pick the spatially-largest of the available HUC12 sets returned by
# get_drainage_area_estimates: hu12_by_huc08 (if present), otherwise
# hu12_by_huc10, otherwise hu12_by_huc12. Largest is determined by
# the sum of HUC12 dasqkm in each set.
largest_hu12_set <- function(da) {
  candidates <- list(
    huc12 = da$hu12_by_huc12,
    huc10 = da$hu12_by_huc10,
    huc08 = da$hu12_by_huc08
  )
  candidates <- Filter(
    function(x) !is.null(x) && nrow(x) > 0, candidates)
  areas <- vapply(candidates,
    function(x) sum(x$dasqkm, na.rm = TRUE), numeric(1))
  candidates[[which.max(areas)]]
}

# Compute a single merged basin polygon for a given HUC level set
# (hu12_by_huc12, hu12_by_huc10, or hu12_by_huc08), unioned with
# extra_catchments and the local catchments around the gage.
# Uses hydroloom::dissolve_polygons() with gap_tolerance to absorb
# sliver gaps between polygons from different services (WBD HUC12s,
# NHDPlusV2 catchments, split-catchment service).
compute_basin_polygon <- function(da, hu_layer, target_crs,
  tol_m = 250) {
  common_crs <- st_crs(hu_layer)
  pieces <- list()
  pieces$hu <- st_geometry(hu_layer)

  if(!is.null(da$extra_catchments) && nrow(da$extra_catchments) > 0)
    pieces$extra <- st_geometry(
      st_transform(da$extra_catchments, common_crs))

  if(!is.null(da$split_catchment) && nrow(da$split_catchment) > 0) {
    sc <- da$split_catchment
    sc_full <- sc[sc$id == "catchment", ]
    if(nrow(sc_full) > 0)
      pieces$catch <- st_geometry(st_transform(sc_full, common_crs))
  }

  all_polys <- st_sf(
    geometry = do.call(c, pieces)
  ) |> st_set_crs(common_crs)

  hydroloom::dissolve_polygons(all_polys,
    gap_tolerance = tol_m,
    max_hole_area = Inf,
    single_polygon = TRUE) |>
    st_transform(target_crs)
}

# Common theme: no axis ticks, labels, or grid; legend sized for
# readability
map_theme <- function() {
  theme_void() +
    theme(
      legend.position = "bottom",
      legend.box = "horizontal",
      legend.text = element_text(size = 11),
      legend.title = element_text(size = 12),
      legend.key.size = unit(0.6, "cm"),
      plot.title = element_blank(),
      plot.margin = margin(2, 2, 2, 2)
    )
}

# Add the gray basin underlay to a plot. The polygon represents the
# full estimated drainage area envelope (largest available HUC level
# plus extra catchments and local catchments).
add_basin_underlay <- function(p, basin_poly) {
  p + geom_sf(data = basin_poly,
    fill = "gray30", color = NA, alpha = 0.15,
    inherit.aes = FALSE)
}

# Mode 1: Drainage Area Boundaries
plot_boundaries <- function(da, basin_label) {
  # Reproject working layers to Web Mercator (3857) so the basemap
  # tiles align cleanly.
  target_crs <- st_crs(3857)
  gage_pt <- get_gage_pt(da, target_crs)
  basin_poly <- compute_basin_polygon(da, largest_hu12_set(da), target_crs)

  # Dissolve boundaries at each HU level. Each HU level is combined
  # with extra_catchments and split_catchment via compute_basin_polygon
  # so every boundary reaches the gage instead of stopping at the
  # lowest complete HU outlet.
  boundaries <- list()
  boundaries[["HU12"]] <- st_geometry(
    compute_basin_polygon(da, da$hu12_by_huc12, target_crs))
  if(!is.null(da$hu12_by_huc10))
    boundaries[["HU10"]] <- st_geometry(
      compute_basin_polygon(da, da$hu12_by_huc10, target_crs))
  if(!is.null(da$hu12_by_huc08))
    boundaries[["HU8"]] <- st_geometry(
      compute_basin_polygon(da, da$hu12_by_huc08, target_crs))
  if(!is.null(da$all_catchments))
    boundaries[["Network catchments (NHDPlusV2)"]] <- st_union(
      st_transform(da$all_catchments, target_crs))
  if(!is.null(da$nhdplushr_boundary))
    boundaries[["NHDPlusHR"]] <- st_transform(
      da$nhdplushr_boundary, target_crs)

  # Linetype, color, and linewidth all keyed off the source factor so
  # one merged "DA boundary" legend distinguishes the layers.
  # Network catchments (NHDPlusV2) gets a saturated magenta dotted
  # line so it reads distinctly against the basemap and against HU10's
  # longdash. NHDPlusHR uses orange to separate from the HU layers.
  all_ltypes <- c(
    "HU12" = "solid",
    "HU10" = "longdash",
    "HU8" = "dotdash",
    "Network catchments (NHDPlusV2)" = "22",
    "NHDPlusHR" = "solid"
  )
  all_colors <- c(
    "HU12" = "gray10",
    "HU10" = "gray10",
    "HU8" = "gray10",
    "Network catchments (NHDPlusV2)" = "magenta",
    "NHDPlusHR" = "#D55E00"
  )
  all_lwds <- c(
    "HU12" = 0.8,
    "HU10" = 0.8,
    "HU8" = 0.8,
    "Network catchments (NHDPlusV2)" = 0.5,
    "NHDPlusHR" = 0.6
  )
  ltypes  <- all_ltypes[names(boundaries)]
  lcolors <- all_colors[names(boundaries)]
  lwds    <- all_lwds[names(boundaries)]

  bnd_sf <- do.call(rbind, lapply(seq_along(boundaries), function(i) {
    st_sf(source = factor(names(boundaries)[i],
        levels = names(boundaries)),
      geometry = boundaries[[i]])
  }))

  hu12_pp <- if(!is.null(da$hu12_outlet))
    st_transform(da$hu12_outlet, target_crs) else NULL

  # Combine gage + HU12 outlets into one points layer with a single
  # shape scale. Keeping shape only (no color aes) frees the color
  # scale for the boundary symbology.
  pts_list <- list(
    st_sf(label = "Stream gage / outlet",
      geometry = st_geometry(gage_pt))
  )
  if(!is.null(hu12_pp))
    pts_list <- c(pts_list, list(
      st_sf(label = "HU12 outlet",
        geometry = st_geometry(hu12_pp))))
  pts_sf <- do.call(rbind, pts_list)
  pts_sf$label <- factor(pts_sf$label,
    levels = c("Stream gage / outlet", "HU12 outlet"))

  fig_lims <- fit_bbox_to_fig(basin_poly)

  p <- ggplot() |>
    add_topo(basin_poly) |>
    add_basin_underlay(basin_poly)

  # HU12 outlets drawn first so basin boundaries overlay them.
  if(!is.null(hu12_pp))
    p <- p + geom_sf(
      data = pts_sf[pts_sf$label == "HU12 outlet", ],
      shape = 4, size = 1, color = "darkred", stroke = 0.6,
      alpha = 0.75, inherit.aes = FALSE)

  p <- p +
    geom_sf(data = bnd_sf,
      aes(linetype = source, color = source, linewidth = source),
      fill = NA, inherit.aes = FALSE) +
    scale_linetype_manual(values = ltypes,  name = "DA boundary") +
    scale_color_manual(   values = lcolors, name = "DA boundary") +
    scale_linewidth_manual(values = lwds,   name = "DA boundary") +
    geom_sf(
      data = pts_sf[pts_sf$label == "Stream gage / outlet", ],
      shape = 17, size = 5, color = "black", fill = "white",
      stroke = 1.4, inherit.aes = FALSE) +
    coord_sf(crs = target_crs, expand = FALSE,
      xlim = fig_lims$xlim, ylim = fig_lims$ylim) +
    labs(title = paste(basin_label, "-- Drainage Area Boundaries")) +
    map_theme()

  p
}

# Mode 2: Stream Network
plot_network <- function(da, fl, basin_label) {
  target_crs <- st_crs(3857)
  gage_pt <- get_gage_pt(da, target_crs)
  fl <- st_transform(fl, target_crs)
  basin_poly <- compute_basin_polygon(da, largest_hu12_set(da), target_crs)

  # fcode 46003 = intermittent, 46007 = ephemeral, 46006 = perennial
  fl$flow_class <- ifelse(
    fl$fcode %in% c(46003L, 46007L), "Intermittent/Ephemeral",
    "Perennial/Other"
  )
  fl$lwd <- pmin(pmax(fl$streamorde, 1) / 3, 1.5)

  pts <- st_sf(label = "Stream gage / outlet",
    geometry = st_geometry(gage_pt))

  fig_lims <- fit_bbox_to_fig(basin_poly)

  ggplot() |>
    add_topo(basin_poly) |>
    add_basin_underlay(basin_poly) +
    geom_sf(data = fl,
      aes(color = flow_class, linewidth = lwd),
      show.legend = "line", inherit.aes = FALSE) +
    scale_color_manual(
      values = c("Perennial/Other" = "steelblue",
        "Intermittent/Ephemeral" = "lightskyblue"),
      name = "Flow class") +
    scale_linewidth_identity() +
    ggnewscale_color(pts, target_crs) +
    coord_sf(crs = target_crs, expand = FALSE,
      xlim = fig_lims$xlim, ylim = fig_lims$ylim) +
    labs(title = paste(basin_label, "-- Stream Network")) +
    map_theme()
}

# Brazos-specific variant of plot_network that adds a Caprock Escarpment
# trace and label per LH-217625600 / LH-2005481448. The escarpment is
# the eastern margin of the Llano Estacado and the physiographic
# boundary between the High Plains and the Rolling Plains (Wermund,
# 1996). The polyline below approximates its trace through the Brazos
# basin's northwestern arm.
plot_network_brazos <- function(da, fl, basin_label) {
  target_crs <- st_crs(3857)
  caprock_coords <- matrix(c(
    -101.05, 34.50,
    -101.00, 33.70,
    -100.95, 32.85
  ), ncol = 2, byrow = TRUE)
  caprock_line <- sf::st_transform(
    sf::st_sfc(sf::st_linestring(caprock_coords), crs = 4326),
    target_crs)
  caprock_anchor <- sf::st_transform(
    sf::st_sfc(sf::st_point(c(-101.05, 34.50)), crs = 4326),
    target_crs)
  caprock_line_sf <- sf::st_sf(geometry = caprock_line)
  caprock_label_sf <- sf::st_sf(
    label = "Caprock Escarpment",
    geometry = caprock_anchor)

  plot_network(da, fl, basin_label) +
    geom_sf(data = caprock_line_sf,
      color = "black", linewidth = 0.8,
      inherit.aes = FALSE) +
    geom_sf_text(data = caprock_label_sf, aes(label = label),
      size = 3.8, fontface = "italic", color = "gray15",
      hjust = 0, nudge_x = 8000, nudge_y = 22000,
      inherit.aes = FALSE)
}

# Helper to add the gage point as a labeled layer using a shape scale
# that doesn't conflict with other color scales already in use.
ggnewscale_color <- function(pts, target_crs) {
  list(
    geom_sf(data = pts, aes(shape = label),
      size = 5, color = "black", fill = "white", stroke = 1.4,
      inherit.aes = FALSE),
    scale_shape_manual(
      values = c("Stream gage / outlet" = 17),
      name = NULL)
  )
}

# Mode 3: HUC12 Type (hu_12_type)
plot_type <- function(da, fl, basin_label,
  fig_ratio = 12 / 8) {
  target_crs <- st_crs(3857)
  largest <- largest_hu12_set(da)
  hu12 <- st_transform(largest, target_crs)

  type_labels <- c(
    "S" = "Standard",
    "C" = "Closed basin",
    "M" = "Multiple outlets",
    "F" = "Frontal",
    "W" = "Water",
    "D" = "Drainage",
    "I" = "Island"
  )
  hu12$type_label <- factor(
    type_labels[hu12$hu_12_type],
    levels = c("Standard", "Closed basin", "Multiple outlets",
      "Frontal", "Water", "Drainage", "Island")
  )

  gage_pt <- get_gage_pt(da, target_crs)
  fl <- st_transform(fl, target_crs)
  basin_poly <- compute_basin_polygon(da, largest, target_crs)

  pts <- st_sf(label = "Stream gage / outlet",
    geometry = st_geometry(gage_pt))

  fig_lims <- fit_bbox_to_fig(basin_poly, fig_ratio = fig_ratio)

  # 30% alpha baked into the fill so the legend swatches match the map.
  # Standard is light grey so the legend swatch reads against white.
  col30 <- function(col) adjustcolor(col, alpha.f = 0.3)
  type_colors <- c(
    "Standard" = adjustcolor("gray85", alpha.f = 0.4),
    "Closed basin" = col30("#8B4513"),
    "Multiple outlets" = col30("#555555"),
    "Frontal" = col30("#228B22"),
    "Water" = col30("#1E90FF"),
    "Drainage" = col30("#9467BD"),
    "Island" = col30("#FFC107")
  )

  p <- ggplot() |>
    add_topo(basin_poly, fig_ratio = fig_ratio) |>
    add_basin_underlay(basin_poly) +
    geom_sf(data = fl, color = "steelblue", linewidth = 0.25,
      alpha = 0.5, inherit.aes = FALSE) +
    geom_sf(data = hu12,
      aes(fill = type_label),
      color = "gray50", linewidth = 0.5,
      inherit.aes = FALSE) +
    scale_fill_manual(
      values = type_colors,
      drop = TRUE,
      name = "HU12 type")

  p +
    ggnewscale_color(pts, target_crs) +
    coord_sf(crs = target_crs, expand = FALSE,
      xlim = fig_lims$xlim, ylim = fig_lims$ylim) +
    labs(title = paste(basin_label, "-- HU12 Types")) +
    map_theme() +
    theme(legend.box = "vertical") +
    guides(
      fill = guide_legend(
        nrow = 1, order = 1,
        override.aes = list(color = "gray50", linewidth = 0.5)))
}

# Per-basin summary table
basin_summary_table <- function(da, basin_name = NULL) {
  vals <- data.frame(
    source = c("Network", "HU12",
      "HU10", "HU8", "NHDPlusHR"),
    total_sqkm = c(
      da$network_da_sqkm,
      da$da_huc12_sqkm,
      ifelse(is.na(da$da_huc10_sqkm), NA_real_, da$da_huc10_sqkm),
      ifelse(is.na(da$da_huc08_sqkm), NA_real_, da$da_huc08_sqkm),
      ifelse(is.na(da$nhdplushr_network_dasqkm), NA_real_,
        da$nhdplushr_network_dasqkm)
    )
  )
  # Add NWIS rows if this basin has NWIS data
  if(!is.null(nwis_da) && !is.null(basin_name) &&
    basin_name %in% nwis_da$name) {
    row <- nwis_da[nwis_da$name == basin_name, ]
    vals <- rbind(vals, data.frame(
      source = c("NWIS", "NWIS contributing"),
      total_sqkm = c(row$nwis_da_sqkm, row$nwis_contrib_da_sqkm)
    ))
  }
  knitr::kable(vals, digits = 1,
    col.names = c("Source", "Area (sq km)"),
    caption = "Drainage area estimates by source")
}
```

## Basin Vignettes {.tabset}

### French Broad River at Asheville

Well-determined basin with strong topographic signal
(USGS streamgage 03451500). Contributing area essentially equals total
drainage area across all sources.

#### Drainage Area Boundaries

```{r french_broad_boundaries}
plot_boundaries(da_results$french_broad, "French Broad")
```

```{r fb_table}
basin_summary_table(da_results$french_broad, "french_broad")
```

::: {.vignette-note}
- All boundary sources converge tightly around the same watershed outline.
- The HU12-, HU10-, and HU8-derived boundaries are nearly identical,
  consistent with a basin whose divides are topographically unambiguous.
- The gage (triangle) sits on the main stem French Broad River at
  Asheville.
:::

#### Stream Network

```{r french_broad_network}
plot_network(da_results$french_broad, flowlines$french_broad,
  "French Broad")
```

::: {.vignette-note}
- The network is predominantly perennial with dense tributary coverage
  in the Appalachian headwaters.
- Intermittent reaches are sparse, concentrated in low-order headwater
  channels.
- HU12 outlets (x markers) align with major tributary junctions.
:::

#### HU12 Types

```{r french_broad_type, fig.height = 8}
plot_type(da_results$french_broad, flowlines$french_broad,
  "French Broad")
```

::: {.vignette-note}
- Every HU12 in the French Broad basin is classified as Standard.
- The uniform light-grey fill confirms that all units link to a downstream
  HU12 through normal surface drainage.
:::

### Brazos River, West Texas

Arid/ephemeral connectivity basin (USGS streamgage 08116650).
Transmission losses and disconnected uplands create large differences
between total and contributing drainage area.

#### Drainage Area Boundaries

```{r brazos_boundaries}
plot_boundaries(da_results$brazos, "Brazos at Rosharon")
```

```{r bz_table}
basin_summary_table(da_results$brazos, "brazos")
```

::: {.vignette-note}
- Boundary sources diverge substantially.  HU-derived boundaries
  extend further west into arid uplands than the network-derived
  boundary.
- The gap between HU12 total area and contributing area reflects
  large noncontributing designations in western sub-basins.
:::

#### Stream Network

```{r brazos_network}
plot_network_brazos(da_results$brazos, flowlines$brazos,
  "Brazos at Rosharon")
```

::: {.vignette-note}
- Extensive intermittent and ephemeral reaches dominate the western
  (upstream) portion of the basin.
- Perennial flow is concentrated in the lower main stem and major
  tributaries east of the Caprock Escarpment, the physiographic
  boundary between the High Plains (Llano Estacado) and the Rolling
  Plains traced on the figure.
- The transition from ephemeral headwaters to perennial mainstem
  illustrates why contributing area is smaller than total area.
:::

#### HU12 Types

```{r brazos_type, fig.height = 8}
plot_type(da_results$brazos, flowlines$brazos,
  "Brazos at Rosharon")
```

::: {.vignette-note}
- Closed-basin HU12s (brown) concentrate in the arid western uplands
  of the Llano Estacado, where surface water never reaches the Brazos.
- Multiple-outlet HU12s (grey) have more than one outlet to the same
  downstream system — for example, a HU12 with a braided channel
  where flow leaves through more than one point.
- Frontal HU12s (green), where present, are coastal units that drain
  directly to the Gulf of Mexico rather than to a downstream HU12.
- Standard HU12s (light grey) dominate the perennial eastern main stem.
:::

### James River / Cottonwood Lake

Glacial prairie basin (USGS streamgage 06468000). Prairie potholes
create a large noncontributing fraction that varies by sub-basin.

#### Drainage Area Boundaries

```{r james_boundaries}
plot_boundaries(da_results$james, "James River")
```

```{r jm_table}
basin_summary_table(da_results$james, "james")
```

::: {.vignette-note}
- Boundary estimates diverge in the upper basin where glacial
  topography creates ambiguous divides.
- The HU12-derived total area exceeds the contributing area by a
  notable margin, reflecting prairie pothole storage.
:::

#### Stream Network

```{r james_network}
plot_network(da_results$james, flowlines$james, "James River")
```

::: {.vignette-note}
- The network includes a mix of perennial and intermittent reaches.
- Intermittent channels are common in the upper basin where glacial
  drift creates closed depressions and episodic connectivity.
- The lower main stem is perennial and well-defined.
:::

#### HU12 Types

```{r james_type, fig.height = 8}
plot_type(da_results$james, flowlines$james, "James River")
```

::: {.vignette-note}
- Two closed-basin HU12s (brown) appear in the upper prairie-pothole
  headwaters — these are units with no surface outlet to the James.
- The remaining HU12s are Standard (light grey), including the lower
  perennial main stem.
:::

### Black Earth Creek

Small Driftless-Area basin in southern Wisconsin
(USGS streamgage 05406500). Well-determined drainage area with
minimal noncontributing fraction.

#### Drainage Area Boundaries

```{r black_earth_boundaries}
plot_boundaries(da_results$black_earth, "Black Earth Creek")
```

```{r be_table}
basin_summary_table(da_results$black_earth, "black_earth")
```

::: {.vignette-note}
- All boundary sources converge on a compact watershed outline.
- The basin is small enough to fall within a single HU10, so HU10-
  and HU8-level boundaries are not computed separately.
:::

#### Stream Network

```{r black_earth_network}
plot_network(da_results$black_earth, flowlines$black_earth,
  "Black Earth Creek")
```

::: {.vignette-note}
- A short, predominantly perennial network drains the Driftless Area
  landscape.
- Few intermittent headwater reaches are present.
:::

#### HU12 Types

```{r black_earth_type, fig.height = 8}
plot_type(da_results$black_earth, flowlines$black_earth,
  "Black Earth Creek")
```

::: {.vignette-note}
- The single HU12 covering Black Earth Creek is Standard type
  (light-grey fill), consistent with a well-defined basin in
  Driftless-Area terrain.
:::

### Malheur Lake

Malheur Lake basin is an endorheic (closed) basin with no surface
outlet. The terminal feature is a lake rather than a stream gage.

Because the start point falls in a closed-basin HU12 (Malheur Lake,
171200010710) with no flowline pour point on the NHD network,
`get_drainage_area_estimates()` auto-promotes the parent HUC10
(1712000107) into `HU_inclusion_override`. The HU08 estimate then
spans the full Harney-Malheur HU8 17120001 -- including the lake
itself, Harney Lake, and the frontal HU12s along the lake margins --
rather than only the on-network HU12s reachable from the inflow
tributaries.

#### Drainage Area Boundaries

```{r malheur_boundaries}
plot_boundaries(da_results$malheur, "Malheur Lake")
```

```{r ml_table}
basin_summary_table(da_results$malheur, "malheur")
```

::: {.vignette-note}
- The basin is endorheic -- all boundaries terminate at Malheur Lake
  with no downstream outlet.
- HU-derived and network-derived boundaries diverge in the
  surrounding high-desert uplands where surface connectivity is
  ambiguous.
- The shaded grey area is the merged basin envelope used as a
  background underlay, not an HU8 boundary.
- The start feature (triangle) marks the centroid of the Malheur Lake
  waterbody polygon rather than a stream gage.
:::

#### Stream Network

```{r malheur_network}
plot_network(da_results$malheur, flowlines$malheur, "Malheur Lake")
```

::: {.vignette-note}
- Intermittent and ephemeral reaches dominate the network, particularly
  in the southern and eastern tributaries.
- Perennial flow is limited to the Silvies River and Donner und Blitzen
  River corridors draining into Malheur and Harney Lakes.
- The network terminates at the lake with no surface outlet downstream.
:::

#### HU12 Types

```{r malheur_type, fig.height = 8}
plot_type(da_results$malheur, flowlines$malheur, "Malheur Lake")
```

::: {.vignette-note}
- Closed-basin HU12s (brown) in HU8 17120001 -- Malheur Lake itself,
  Harney Lake, Sunset Valley, and Lower Riddle Creek -- mark
  Harney-Malheur units where surface drainage terminates internally.
  These are now included in the HU08 estimate via the closed-basin
  auto-override.
- The brown HU12s south and southwest of the lake (Hay Lake,
  Capehart Lake, Mule Springs Valley, Smoky Hollow, Lake On The
  Trail, Foster Lake) belong to HU8 17120004 Silver Creek-Silver
  Lake -- a separate endorheic basin pulled into the HU08 estimate
  because Silvies/Donner upstream HU12s share its parent HU8.
- Water HU12s (blue) — for example, Mud Lake — designate large open
  water bodies coded as Water type in the current WBD.
- Multiple-outlet HU12s (grey) appear on the low-gradient flats of
  the Harney Basin.
- The surrounding Standard HU12s (light grey) supply the Silvies and
  Donner und Blitzen corridors that actually reach the lake.
:::

### Purgatoire River near Las Animas

Boundary placement uncertainty basin in southeastern Colorado
(USGS streamgage 07128500). Flat terrain adjacent to the basin
produces a drainage divide that different delineation methods place
in different locations, resulting in divergent drainage area estimates
for the gage and all downstream stations. Documented in
[Dupree and Crowfoot (2012, TM 11-C6)](https://pubs.usgs.gov/tm/11c6/).

#### Drainage Area Boundaries

```{r purgatoire_boundaries}
plot_boundaries(da_results$purgatoire, "Purgatoire River")
```

```{r purgatoire_table}
basin_summary_table(da_results$purgatoire, "purgatoire")
```

::: {.vignette-note}
- Boundary sources diverge in the flat terrain along the western and
  southern margins of the basin where the drainage divide is poorly
  defined.
- Manual delineation from USGS topographic maps assigned a
  noncontributing area adjacent to the basin as part of the Purgatoire
  drainage; the WBD placed that area in the neighboring hydrologic
  unit to the west.
- The resulting differences propagate to all downstream gages on the
  Purgatoire and Arkansas Rivers.
:::

#### Stream Network

```{r purgatoire_network}
plot_network(da_results$purgatoire, flowlines$purgatoire,
  "Purgatoire River")
```

::: {.vignette-note}
- The upper basin drains steep terrain along the Sangre de Cristo
  Range with predominantly perennial flow.
- The lower basin crosses the high plains where intermittent and
  ephemeral channels are more common and terrain gradients weaken.
- The transition from montane headwaters to plains illustrates why
  the divide becomes uncertain in the lower-gradient portions of the
  basin.
:::

#### HU12 Types

```{r purgatoire_type, fig.height = 8}
plot_type(da_results$purgatoire, flowlines$purgatoire,
  "Purgatoire River")
```

::: {.vignette-note}
- Every HU12 in the Purgatoire basin is Standard type (light-grey fill).
- The boundary-placement-uncertainty story here is about where the WBD
  places the divide between Standard HU12s, not about closed-basin or
  multiple-outlet designations within the basin.
:::

## Outlet Catchment Splitting

When a gage sits partway along a flowline rather than at its outlet,
the gage's outlet catchment is only partly upstream of the gage. The
downstream portion does not contribute to the gage and should be
excluded. A similar situation arises at HU12 outlets: the HU
boundary cuts through a catchment and the portion upstream of the
pour point overlaps with the HU area that is already counted.

`get_drainage_area_estimates()` handles both splits automatically.
The `outlet_split_threshold_m` parameter (default 100 m) controls the
minimum gage-to-outlet distance before the split is performed.

This example uses a small Texas gage (USGS streamgage 08110075,
Davidson Creek) where the gage falls roughly two thirds of the way
up its flowline.

```{r outlet_split_prep, include = FALSE}
da_dav <- da_results$davidson
fl_dav <- flowlines$davidson
has_davidson <- !is.null(da_dav) && !is.null(fl_dav) &&
  !is.null(da_dav$outlet_split_catchment)

if(has_davidson) {
  target_crs <- st_crs(3857)

  gage_pt <- get_gage_pt(da_dav, target_crs)
  gage_comid <- as.integer(da_dav$start_feature$comid)

  # all catchments in the gap between the gage and HUC12 outlets
  extra_cat <- st_transform(da_dav$extra_catchments, target_crs)
  all_cat <- st_transform(da_dav$all_catchments, target_crs)
  fl_sf <- st_transform(fl_dav, target_crs)
  hu12_pts <- st_transform(da_dav$hu12_outlet, target_crs)

  # HUC12 split catchment (split at the HUC12 pour point)
  hu12_sc <- st_transform(da_dav$split_catchment, target_crs)
  hu12_catch_full <- hu12_sc[hu12_sc$id == "catchment", ]
  hu12_catch_split <- hu12_sc[hu12_sc$id == "splitCatchment", ]
  hu12_comid <- as.integer(na.omit(hu12_sc$catchmentID))

  # gage outlet split catchment (split at the gage point)
  osc <- st_transform(da_dav$outlet_split_catchment, target_crs)
  osc_full <- osc[osc$id == "catchment", ]
  osc_split <- osc[osc$id == "splitCatchment", ]

  # flowlines in the gap area
  gap_comids <- c(gage_comid, extra_cat$featureid, hu12_comid)
  gap_fl <- fl_sf[fl_sf$comid %in% gap_comids, ]

  # focus bbox: union of gap catchments + split catchments + gage
  focus_geom <- st_union(c(
    st_geometry(extra_cat),
    st_geometry(hu12_catch_full),
    st_geometry(osc_full),
    st_geometry(gage_pt)
  ))
  focus_bb <- st_bbox(focus_geom)
  pad_x <- (focus_bb["xmax"] - focus_bb["xmin"]) * 0.05
  pad_y <- (focus_bb["ymax"] - focus_bb["ymin"]) * 0.05
  focus_xlim <- c(focus_bb["xmin"] - pad_x, focus_bb["xmax"] + pad_x)
  focus_ylim <- c(focus_bb["ymin"] - pad_y, focus_bb["ymax"] + pad_y)
}
```

#### Overview: Gage and HU Outlet Positions

The gage (triangle) and the nearest HU12 outlet (x) sit on
different catchments with a gap between them. The gap catchments
(light blue) and split catchment boundaries define the area between
the gage and the HU12 boundary.

```{r outlet_split_overview, fig.height = 10, eval = has_davidson}
p_overview <- ggplot() |>
  add_topo(focus_geom) +
  # all catchments as light underlay
  geom_sf(data = all_cat, fill = "gray30", color = NA, alpha = 0.12,
    inherit.aes = FALSE) +
  # gap catchments
  geom_sf(data = extra_cat, fill = "lightblue", color = "steelblue",
    linewidth = 0.3, alpha = 0.5, inherit.aes = FALSE) +
  # HUC12 split catchment — full outline
  geom_sf(data = hu12_catch_full, fill = NA, color = "gray10",
    linewidth = 0.6, linetype = "dashed", inherit.aes = FALSE) +
  # gage outlet catchment — full outline
  geom_sf(data = osc_full, fill = NA, color = "gray10",
    linewidth = 0.6, inherit.aes = FALSE) +
  # flowlines in the gap
  geom_sf(data = gap_fl, color = "steelblue", linewidth = 0.5,
    inherit.aes = FALSE) +
  # HUC12 outlet
  geom_sf(data = hu12_pts[hu12_pts$comid == hu12_comid, ],
    shape = 4, color = "darkred", size = 1, stroke = 0.6, alpha = 0.5,
    inherit.aes = FALSE) +
  # gage point
  geom_sf(data = gage_pt, shape = 17, color = "black",
    fill = "white", size = 5, stroke = 1.4,
    inherit.aes = FALSE) +
  coord_sf(crs = target_crs, xlim = focus_xlim, ylim = focus_ylim,
    expand = FALSE) +
  labs(title = paste0("Davidson Creek (USGS streamgage 08110075)",
    " -- Gage and HU12 Outlet Positions")) +
  map_theme()
print(p_overview)
```

::: {.vignette-note}
- The gage (triangle) and the HU12 pour point (x) sit on different
  catchments with several gap catchments (light blue) between them.
- The dashed outline marks the catchment where the HU12 pour point
  falls; the solid outline marks the gage's catchment.
- Both the gage and the HU12 outlet sit partway along their
  respective flowlines, so splitting is needed at both locations.
:::

#### HU Outlet Detail

The HU12 split catchment is small enough that it is not visible in the
overview. This view zooms to a 500 m square centered on the HU12 outlet.

```{r outlet_split_huc_zoom, fig.height = 10, eval = has_davidson}
hu12_outlet_pt <- hu12_pts[hu12_pts$comid == hu12_comid, ]
hu12_coords <- st_coordinates(hu12_outlet_pt)
half_side <- 250 # meters in EPSG:3857
zoom_xlim <- c(hu12_coords[1, "X"] - half_side,
  hu12_coords[1, "X"] + half_side)
zoom_ylim <- c(hu12_coords[1, "Y"] - half_side,
  hu12_coords[1, "Y"] + half_side)

p_huc_zoom <- ggplot() |>
  add_topo(hu12_outlet_pt) +
  # gap catchments
  geom_sf(data = extra_cat, fill = "lightblue", color = "steelblue",
    linewidth = 0.3, alpha = 0.5, inherit.aes = FALSE) +
  # HUC12 split catchment — full outline
  geom_sf(data = hu12_catch_full, fill = NA, color = "gray10",
    linewidth = 0.6, linetype = "dashed", inherit.aes = FALSE) +
  # HUC12 split portion
  geom_sf(data = hu12_catch_split, fill = "orange", color = "gray10",
    linewidth = 0.4, alpha = 0.5, inherit.aes = FALSE) +
  # flowlines in the gap
  geom_sf(data = gap_fl, color = "steelblue", linewidth = 0.5,
    inherit.aes = FALSE) +
  # HUC12 outlet
  geom_sf(data = hu12_outlet_pt,
    shape = 4, color = "darkred", size = 1, stroke = 0.6, alpha = 0.5,
    inherit.aes = FALSE) +
  coord_sf(crs = target_crs, xlim = zoom_xlim, ylim = zoom_ylim,
    expand = FALSE) +
  labs(title = paste0("Davidson Creek -- HU12 Outlet Detail",
    " (500 m view)")) +
  map_theme()
print(p_huc_zoom)
```

::: {.vignette-note}
- The HU12 outlet (x) and its split catchment (orange fill, dashed
  outline) are visible at this scale.
- The split catchment upstream of the HU outlet is excluded from the
  gap area because it overlaps with the HU12 drainage area.
:::

#### Split Catchments: Upstream and Downstream Portions

Two catchments are split. At the gage, the downstream portion is
removed because it does not contribute flow to the gage. At the
HU12 outlet, the upstream portion is removed because it overlaps
with the HU12 area already counted in the drainage area estimate.

```{r outlet_split_detail, fig.height = 10, eval = has_davidson}
# Build an sf with labeled polygons for a single legend
split_layers <- rbind(
  st_sf(
    role = "Upstream of gage (included)",
    geometry = st_geometry(osc_split)),
  st_sf(
    role = "Downstream of gage (excluded)",
    geometry = st_difference(
      st_geometry(osc_full), st_geometry(osc_split))),
  st_sf(
    role = "Upstream of HU outlet (excluded, overlaps HU)",
    geometry = st_geometry(hu12_catch_split)),
  st_sf(
    role = "Downstream of HU outlet (included as local area)",
    geometry = st_difference(
      st_geometry(hu12_catch_full), st_geometry(hu12_catch_split)))
)

split_colors <- c(
  "Upstream of gage (included)" = "#4DAF4A",
  "Downstream of gage (excluded)" = "#E41A1C",
  "Upstream of HU outlet (excluded, overlaps HU)" = "#FF7F00",
  "Downstream of HU outlet (included as local area)" = "#377EB8"
)

split_layers$role <- factor(split_layers$role,
  levels = names(split_colors))

p_split <- ggplot() |>
  add_topo(focus_geom) +
  # gap catchments as underlay
  geom_sf(data = extra_cat, fill = "gray80", color = "gray60",
    linewidth = 0.2, alpha = 0.3, inherit.aes = FALSE) +
  # split polygons with role-based fill
  geom_sf(data = split_layers,
    aes(fill = role), color = "gray20", linewidth = 0.4,
    alpha = 0.6, inherit.aes = FALSE) +
  scale_fill_manual(values = split_colors, name = NULL) +
  # flowlines
  geom_sf(data = gap_fl, color = "steelblue", linewidth = 0.5,
    inherit.aes = FALSE) +
  # HUC12 outlet
  geom_sf(data = hu12_pts[hu12_pts$comid == hu12_comid, ],
    shape = 4, color = "darkred", size = 1, stroke = 0.6, alpha = 0.5,
    inherit.aes = FALSE) +
  # gage
  geom_sf(data = gage_pt, shape = 17, color = "black",
    fill = "white", size = 5, stroke = 1.4,
    inherit.aes = FALSE) +
  coord_sf(crs = target_crs, xlim = focus_xlim, ylim = focus_ylim,
    expand = FALSE) +
  labs(title = paste0("Davidson Creek -- Split Catchment Roles"),
    subtitle = paste0(
      "Flowline measure at gage: ",
      round(da_dav$outlet_flowline_measure, 1),
      "; downstream removed: ",
      round(osc_full$dasqkm - osc_split$dasqkm, 2), " km\u00B2")) +
  map_theme() +
  guides(fill = guide_legend(ncol = 1))
print(p_split)
```

::: {.vignette-note}
- **Green**: the portion of the gage's catchment upstream of the gage —
  this area is included in the drainage area estimate.
- **Red**: the portion downstream of the gage — excluded because it
  does not contribute flow to the gage.
- **Orange**: the portion of the HU12 outlet catchment upstream of the
  pour point — excluded because this area is already counted as part
  of the HU12 drainage area.
- **Blue**: the portion downstream of the HU12 outlet — included as
  local area in the gap between the HU boundary and the gage.
- The `outlet_split_threshold_m` parameter (default 100 m) controls
  whether the gage split is performed. If the gage is within the
  threshold distance of the catchment outlet, no split occurs.
:::

```{r teardown, include = FALSE}
options(oldoption)
```
