Snow published two version of his cholera map. The first, which appeared in the book On The Mode Of Communication Of Cholera (Snow 1855a), is more famous. The second, which appeared in the Report On The Cholera Outbreak In The Parish Of St. James, Westminster, During The Autumn Of 1854 (Snow 1855b), is more important. What makes it so is that Snow adds a graphical annotation of the “neighborhood”, or set of residences, that he believes is most likely to make use of the Broad Street pump:
By identifying the pump’s neighborhood, Snow sets limits on where we would expect to find fatalities and, more importantly, where we would expect not to find fatalities. Ideally, this would help support his arguments that cholera is a waterborne disease and that the Broad Street pump is the source of the outbreak. As Snow puts it: “it will be observed that the deaths either very much diminish, or cease altogether, at every point where it becomes decidedly nearer to send to another pump that to the one in Broad street” (Snow 1855b, 109).
To help assess Snow’s claims, I provide functions that allow one to analyze and visualize the two approaches to computing pump neighborhoods: Voronoi tessellation, based on the Euclidean distance between pumps, and walking distance, based on the distance travelled along the network of roads in Soho. In either case, the guiding principle is the same: all else being equal, people choose the closest pump. Thus, by aggregating across individual cases, be they observed or simulated, an image of a pump’s neighborhood will emerge.
Not only does this package allows one to compare the two methods, it also allows one to explore different scenarios. For example, Snow argues that residents found the water from the Carnaby and Little Marlborough Street pump (#6) to be of low quality and actually preferred to go to the Broad Street pump (#7).1 Using this package, one can examine this possibility.2
Cliff and Haggett (1988) appear to be the first to use Voronoi tessellation3 to compute pump neighborhoods. In their digitization, Dodson and Tolber (1992) included coordinates for 13 Voronoi cells. These are available in HistData::Snow.polygons.
To allow for replication, I use deldir::deldir(). I find that, with the exception of the border between the neighborhoods of the Market Place and the Adam and Eve Court pumps (pumps #1 and #2), Dodson and Tobler’s computation are otherwise identical.
To allow users to explore the data with this approach, neighborhoodVoronoi() creates an R object that describes the desired neighborhood configuration. The results can be visualized with plot() and summarized with summary().
The figure below plots the “addresses” and Voronoi cells for the 13 pumps in the original map.
plot(neighborhoodVoronoi())
The data frame below summarizes the results in the map. Pearson is (Count - Expected) / sqrt(Expected):
summary(neighborhoodVoronoi())
## pump.id Count Percent Expected Pearson
## 1 1 0 0.00 19.491634 -4.4149330
## 2 2 1 0.31 6.234668 -2.0964402
## 3 3 10 3.12 13.983773 -1.0653256
## 4 4 13 4.05 30.413400 -3.1575562
## 5 5 3 0.93 26.463696 -4.5611166
## 6 6 39 12.15 39.860757 -0.1363352
## 7 7 182 56.70 27.189136 29.6895576
## 8 8 12 3.74 22.112172 -2.1504470
## 9 9 17 5.30 15.531287 0.3726776
## 10 10 38 11.84 18.976903 4.3668529
## 11 11 2 0.62 24.627339 -4.5595790
## 12 12 2 0.62 29.671129 -5.0799548
## 13 13 2 0.62 46.444106 -6.5215206
The most frequent criticism against applying Voronoi tessellation to Snow’s map is that it assumes that people can walk directly to pumps in “as-the-crow-flies” fashion (i.e., Euclidean distance) rather than in indirect fashion along the network of roads.
I see Snow’s annotation as an attempt to addresses this issue. He writes that the annotation outlines “the various points which have been found by careful measurement to be at an equal distance by the nearest road from the pump in Broad Street and the surrounding pumps” (Snow 1855b, 109). In other words, his annotation appears to be based on some computation of walking distance. While the details of are lost to history, I replicate and extend his efforts by writing functions that allow one to compute and visualize the weighted shortest paths between a fatality and its nearest pump along the network of roads.4 While the computation of walking distance is by no means new (see Shiode, 2012), I hopefully offer a more transparent, user-friendly way to apply this technique.
The walking distance approach is based on computing the shortest path between a given fatality and its nearest pump along the network of roads. For example, the figure below plots the path between the case 150 and its nearest pump, the Broad Street pump (“p7”).
walkingPath(150)
By drawing paths for all observed cases, we can begin to get a sense of the different pump neighborhoods.
plot(neighborhoodWalking())
The summary results are:
summary(neighborhoodWalking())
## pump Count Percent Expected Pearson
## 1 p1 0 0.00 23.691938 -4.8674365
## 2 p2 0 0.00 1.308699 -1.1439840
## 3 p3 12 3.74 16.983963 -1.2093591
## 4 p4 6 1.87 24.497021 -3.7371898
## 5 p5 1 0.31 13.063716 -3.3377034
## 6 p6 44 13.71 48.975392 -0.7109488
## 7 p7 189 58.88 31.758191 27.9023093
## 8 p8 14 4.36 23.784361 -2.0062577
## 9 p9 32 9.97 24.243784 1.5752507
## 10 p10 20 6.23 19.288336 0.1620419
## 11 p11 2 0.62 25.240538 -4.6259068
## 12 p12 1 0.31 31.818384 -5.4634982
## 13 p13 0 0.00 36.345676 -6.0287375
To get a sense of entire neighborhoods, I extend the approach from observed data to “expected” or simulated data. Using sp::spsample() and sp::Polygon(), I place 5000 regularly spaced points across the map and compute their walking path to the nearest pump. This allows me to see the full extent and range of the different pump neighborhoods.
I do this in two ways. In the first, I identify neighborhoods by coloring roads.5
plot(neighborhoodWalking(), observed = FALSE)
In the second, I identify neighborhoods by coloring regions.6
plot(neighborhoodWalking(), street = FALSE, observed = FALSE)
While the latter is easier to “read” due to its familiarity, it is potentially less accurate and more misleading. Conceptually, the problem is that the latter takes the focus away from roads and walking distances, and puts it on regions, which may not have a relevant meaning because they include points or locations where there are no roads or residences. Computationally, the problem is that the shape of neighborhoods will be sensitive to the algorithm that determines the closest pump. Depending on our choices, this can produce different results.7
neighborhoodWalking() is computationally intensive. With default arguments and a single core, it takes about 1-2 minutes to run on a 2.3 GHz Intel Core i7.8 The reason is that function performs three significant tasks: 1) it computes the shortest walking path between each observed fatality and all selected pumps; 2) it computes the shortest walking path between each simulated fatality and all selected pumps; and 3) to reduce over-plotting it computes the minimal set of segments needed to represent all walking paths within a given neighborhood.
If you want to explore many different scenarios, it may be better to pre-compute the data or to run in batch mode. For convenience, seven configurations return pre-computed results. The first three uses the original 13 pumps. The second three the revised 14 pumps. The three configurations are: 1) the default set of arguments, which uses all pumps; 2) the default set excluding the pump on Little Marlborough Street (pump 6); and 3) the default set with just the Little Marlborough Street and the Broad Street pumps (6 and 7). The seventh and final is Snow’s Broad Street pump neighborhood, which represents his graphical annotation of the version of the map that appeared in the Vestry report.
Snow writes: “It requires to be stated that the water of the pump in Marlborough Street, at the end of Carnaby Street, was so impure that many persons avoided using it; and I found that the persons who died near this pump, in the beginning of September, had water from the Broad Street pump.”↩
In fact, one can include or exclude any set of pumps one wants. And one can select from two different sets of pump: the 13 pumps in the original map or the 14 pumps in the revised map in Vestry Report. Note that because computing walking distances is computationally intensive, I’ve included six pre-computed scenarios. For details, see the Notes section of the help page for neighborhoodWalking().↩
Another approach is to use GIS. For applications that don’t need to consider the actual historic walking distances, a layers-based approach, which typically relies on current maps, may be sufficient: e.g., https://www.theguardian.com/news/datablog/2013/mar/15/john-snow-cholera-map. To reconstruct the roads represented on Snow’s map, one might also consider John Mackenzie’ approach at https://www1.udel.edu/johnmack/frec682/cholera/cholera2.html.↩
Shiode (2012) uses this approach.↩
Mackenzie (N.D) uses this approach.↩
Cliff and Haggett also produce an adjusted Voronoi cells that reflect walking distances: “So far we have assumed that access to each pump was determined by ‘crow-fly’ distances. While the physical effort of carrying water mean that most people visited their nearest pump, recognition must be made of the complex street pattern in this area of London. Therefore in (D), the Thiessen polygons have been adjusted to take into account the patterns of access made possible by the street system shown in diagram (A) (Cliff and Haggett 1988, 53). However, details about how this was done do not appear to be available. Moreover, because the graphic only shows the outline of the polygon and not the streets, comparisons with other approaches is difficult.↩
With 4 physical cores and 8 logical ones, it takes about 45 seconds.↩