The dodgr_dists_categorical function enables multiple distances to be aggregated along distinct categories of edges with a single query. This is particularly useful for examining the proportions of total distances routed along different edge categories. The following three sub-sections describe the three main uses and interfaces of the dodgr_dists_categorical function. Each of these requires the input graph to have an additional column named "edge_type", which labels discrete categories of edges. These labels can be of any discrete kind, from integer values to character labels or factors. The labels are retained in the result, as demonstrated below.
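As a minimal sketch of what these categorical aggregates represent, the following base R snippet uses a small, hypothetical edge table (not a real dodgr graph; all values are invented for illustration) to define an edge_type column and to sum the edge lengths of a single route within each category.

```r
# Hypothetical edge table; values are invented purely for illustration.
edges <- data.frame (
    from_id = c ("a", "b", "c", "d"),
    to_id = c ("b", "c", "d", "e"),
    d = c (100, 250, 50, 100),
    highway = c ("path", "path", "steps", "service")
)
# Any discrete labels can serve as edge types; here they are copied from
# the "highway" column.
edges$edge_type <- edges$highway

# Treat the four edges as one route from "a" to "e", and aggregate the
# edge lengths within each category.
dist_by_type <- tapply (edges$d, edges$edge_type, sum)
prop_by_type <- dist_by_type / sum (edges$d)
print (round (prop_by_type, 2))
```

The function described here performs this kind of aggregation for every routed pair of points at once, rather than for a single hand-picked route.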
The “default” interface of the dodgr_dists_categorical function requires the same three mandatory parameters as dodgr_distances: the graph on which the distances are to be calculated; the from points from which distances are to be calculated; and the to points to which distances are to be calculated. As for dodgr_distances, the from and to arguments can be either vertex identifiers (generally as from_id and to_id columns of the input graph), or two-column coordinates for spatial graphs. The following code illustrates the procedure using the internal data set, hampi, first reducing the graph to its largest connected component to ensure all points are mutually reachable.
graph <- weight_streetnet (hampi, wt_profile = "foot")
graph <- graph [graph$component == 1, ]
graph$edge_type <- graph$highway
table (graph$edge_type)
##
## path primary residential secondary service steps
## 2767 106 32 560 184 28
## track unclassified
## 518 454
That network then has 8 distinct edge types. Submitting this graph to the function, and calculating pairwise distances between all points, then gives the following result:
v <- dodgr_vertices (graph)
from <- to <- v$id
d <- dodgr_dists_categorical (graph, from, to)
class (d)
## [1] "list" "dodgr_dists_categorical"
length (d)
## [1] 9
sapply (d, dim)
## distances path primary residential secondary service steps track
## [1,] 2270 2270 2270 2270 2270 2270 2270 2270
## [2,] 2270 2270 2270 2270 2270 2270 2270 2270
## unclassified
## [1,] 2270
## [2,] 2270
The result has the dedicated class, dodgr_dists_categorical, which is itself a list of matrices: one of overall distances, followed by one for each distinct edge type. This class enables a convenient summary method which converts data on aggregate distances along each category of edges into overall proportions:
summary (d)
## Proportional distances along each kind of edge:
## path: 0.5134
## primary: 0.0162
## residential: 4e-04
## secondary: 0.1561
## service: 0.0607
## steps: 0.0018
## track: 0.1017
## unclassified: 0
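One plausible way to arrive at such proportions is sketched below, under the assumption that each proportion is the summed distance for one category divided by the summed overall distance. The package's actual summary method may differ in detail, for example in how NA values are treated. A small hypothetical list of matrices stands in for the real result:

```r
# Hypothetical stand-in for the list returned by the function: an overall
# "distances" matrix followed by one matrix for each edge type.
d_toy <- list (
    distances = matrix (c (0, 10, 10, 0), nrow = 2),
    path = matrix (c (0, 6, 8, 0), nrow = 2),
    service = matrix (c (0, 4, 2, 0), nrow = 2)
)
# Total distance over all pairs, then the share of each category.
total <- sum (d_toy$distances, na.rm = TRUE)
props <- vapply (d_toy [-1], function (m) sum (m, na.rm = TRUE) / total,
                 numeric (1))
print (props)
```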
If summary results like those immediately above are all that is desired, without any additional information from the full distance matrices, then the proportions_only parameter can be used to return them directly:
dodgr_dists_categorical (graph, from, to,
                         proportions_only = TRUE)
## path primary residential secondary service steps
## 0.5132842476 0.0160102898 0.0004094485 0.1561022988 0.0606827519 0.0018216218
## track unclassified
## 0.1017598199 0.1499295217
Queries with proportions_only = TRUE are constructed in a different way in the underlying C++ code, with the main difference being that the full list of matrices is not stored, so that these queries generally use considerably less memory. For most jobs, this should also translate to faster queries, as illustrated in the following benchmark:
bench::mark (full = dodgr_dists_categorical (graph, from, to),
             prop_only = dodgr_dists_categorical (graph, from, to,
                                                  proportions_only = TRUE),
             check = FALSE, time_unit = "s") [, 1:3]
## # A tibble: 2 × 3
## expression min median
## <bch:expr> <dbl> <dbl>
## 1 full 1.18 1.18
## 2 prop_only 0.369 0.418
The third and final use of the dodgr_dists_categorical function is through the dlimit parameter, used to specify a distance threshold within which categorical distances are to be aggregated. This is useful for examining the relative proportions of different edge types traversed when travelling in any and all directions away from each point or vertex of a graph.
When a dlimit parameter is specified, the to parameter is ignored, and distances are aggregated along all possible routes away from each from point, out to the specified dlimit. The value of dlimit must be specified in the same units as the edge distance values contained in the input graph. For spatial graphs obtained with dodgr_streetnet() or dodgr_streetnet_sc(), for example, as well as the internal hampi data, these distances are in metres, and so dlimit must be specified in metres.
The result is then a single data.frame in which each row represents one of the from points, and there is one column of aggregate distances for each edge type, plus an initial column of overall distances. The following code illustrates:
dlimit <- 2000 # in metres
d <- dodgr_dists_categorical (graph, from, dlimit = dlimit)
dim (d)
## [1] 2270 9
head (d)
## distance path primary residential secondary service steps
## 339318500 10085.921 7931.597 0 0 0 2069.0068 0
## 339318502 4143.007 3527.852 0 0 0 615.1548 0
## 2398958028 4160.784 3545.630 0 0 0 615.1548 0
## 1427116077 4179.437 3564.283 0 0 0 615.1548 0
## 7799710916 6201.269 5418.331 0 0 0 782.9384 0
## 339318503 6225.570 5610.415 0 0 0 615.1548 0
## track unclassified
## 339318500 0 85.31705
## 339318502 0 0.00000
## 2398958028 0 0.00000
## 1427116077 0 0.00000
## 7799710916 0 0.00000
## 339318503 0 0.00000
The row names of the resultant data.frame are the vertex identifiers specified in the from parameter. Such results can easily be combined with spatial information on the vertices obtained from the dodgr_vertices() function to generate spatial maps of relative proportions around each point in a graph or network. Summary statistics can also readily be extracted, for example,
hist (d$path / d$distance,
xlab = "Relative proportions of trips along paths", main = "")
Trips along paths are roughly evenly distributed between 0 and 1. In contrast, proportions of trips along service ways – used to facilitate motorised vehicular access in the otherwise car-free area of Hampi, India – are distinctly different:
hist (d$service / d$distance,
xlab = "Relative proportions of trips along service ways", main = "")
These distributions provide more detailed and nuanced insights than the overall summary function above, which only revealed overall relative proportions of 0.51 and 0.06 for paths and service ways, respectively. The results within the distance threshold reveal that the distributional forms of proportional distances differ as much as the aggregate values, and that both aspects of the function provide distinct insights into proportional distances along categories of edge types.
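Combining such proportions with vertex coordinates could look like the following sketch. The two data frames are hypothetical stand-ins, with invented identifiers and values, for the distance-threshold result and for the output of dodgr_vertices(), which is assumed here to provide id, x and y columns.

```r
# Hypothetical stand-in for a distance-threshold result.
d_toy <- data.frame (
    distance = c (1000, 500, 0),
    path = c (800, 100, 0),
    service = c (200, 400, 0),
    row.names = c ("v1", "v2", "v3")
)
# Hypothetical stand-in for vertex coordinates, in a different row order.
v_toy <- data.frame (
    id = c ("v3", "v1", "v2"),
    x = c (76.47, 76.45, 76.46),
    y = c (15.33, 15.31, 15.32)
)
# Proportion of distance along paths; rows with zero total distance are
# set to NA instead of yielding NaN from 0 / 0.
prop_path <- ifelse (d_toy$distance > 0, d_toy$path / d_toy$distance, NA)
# Match result rows to vertices by identifier rather than by position.
index <- match (rownames (d_toy), v_toy$id)
xy <- data.frame (v_toy [index, c ("id", "x", "y")], prop_path = prop_path)
print (xy)
```

Matching by identifier avoids any reliance on the two tables sharing the same row order, and the resulting table of coordinates and proportions can be passed to any plotting or mapping routine.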
This use of the function also relies on distinct routines in the underlying C++ code that are even more efficient than the previous case of proportional distances. The following code benchmarks the three modes:
bench::mark (full = dodgr_dists_categorical (graph, from, to),
             prop_only = dodgr_dists_categorical (graph, from, to,
                                                  proportions_only = TRUE),
             dlimit = dodgr_dists_categorical (graph, from, dlimit = 2000),
             check = FALSE, time_unit = "s") [, 1:3]
## # A tibble: 3 × 3
## expression min median
## <bch:expr> <dbl> <dbl>
## 1 full 0.925 0.925
## 2 prop_only 0.387 0.398
## 3 dlimit 0.0899 0.0995
Finally, note that the efficiency of distance-threshold queries scales non-linearly with increases in dlimit, with queries quickly becoming less efficient for larger values of dlimit.
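This scaling can be checked empirically with a sketch like the following, which assumes the dodgr package is installed and repeats the graph construction used above. The helper time_dlimits() and the chosen dlimit values are illustrative only and are not part of the package; timings will vary between machines.

```r
# Illustrative helper (not part of dodgr): elapsed time in seconds of one
# distance-threshold query for each value of dlimit.
time_dlimits <- function (graph, from, dlimits) {
    vapply (dlimits, function (dl) {
        system.time (
            dodgr::dodgr_dists_categorical (graph, from, dlimit = dl)
        ) [["elapsed"]]
    }, numeric (1))
}

# Only run the timings if dodgr is available.
if (requireNamespace ("dodgr", quietly = TRUE)) {
    graph <- dodgr::weight_streetnet (dodgr::hampi, wt_profile = "foot")
    graph <- graph [graph$component == 1, ]
    graph$edge_type <- graph$highway
    from <- dodgr::dodgr_vertices (graph)$id
    dlimits <- c (500, 1000, 2000, 4000) # in metres
    print (data.frame (dlimit = dlimits,
                       seconds = time_dlimits (graph, from, dlimits)))
}
```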