TESTING

An important strength of the SHALSTAB model is that, because it can be treated as a parameter-free model, it can be tested with field data and can be rejected. The model succeeds if the majority of landslide scars occur in grid cells with low values of q/T. If high values of q/T are necessary to account for the location of most of the landslides, the model is not a significant improvement over a simple slope map (in which every spot steeper than a critical slope is considered equally unstable). We see the possibility of rejection as beneficial because it should lead to important questions about the quality of the topographic data and of the data used to test the model, and to questions about the role of processes or factors not considered by the model.

Testing SHALSTAB requires that care be given to data collection and analysis. The essential idea is that aerial photographs (or sometimes field work) are used to map all shallow landslide scars in an area. The scars are then overlain onto the map of q/T values, and each scar is assigned a q/T. Histograms of landslide-associated q/T values are then made.
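The overlay-and-histogram step can be sketched in a few lines of code. This is a minimal illustration, not part of the original analysis: the log(q/T) grid and the per-scar cell masks are hypothetical inputs, and we adopt the minimum-value assignment rule described in the text.

```python
import numpy as np

def scar_qt_histogram(log_qt_grid, scar_masks, bins):
    """Assign each mapped scar a log(q/T) value and histogram the results.

    log_qt_grid : 2-D array of log(q/T) values, one per grid cell.
    scar_masks  : list of boolean arrays (same shape as the grid), one per
                  scar, marking the cells that scar touches.
    bins        : histogram bin edges for the hazard categories.

    Each scar is assigned the minimum log(q/T) it touches, following the
    approach described in the text.
    """
    values = np.array([log_qt_grid[mask].min() for mask in scar_masks])
    counts, edges = np.histogram(values, bins=bins)
    return values, counts, edges
```

The resulting histogram is what gets compared, category by category, against the null model discussed below.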

There are several issues to consider here. The standard of landslide mapping must be much higher than normal, particularly when the grid size is 10 m or smaller. It is not common for mappers to report their uncertainty in plotting scars on maps, but it is unlikely that a scar can be located to within one scar width of its actual location, and commonly the error is much worse than that. This error arises in large part because the typical base topographic maps are locally inaccurate, leaving few clues for precisely locating the scars, and because errors arise in transferring observations from aerial photographs to maps even when the base maps are reasonably good. Commonly the mapper does not or cannot distinguish between the landslide scar and the debris flow runout track; the model applies only to the former.

Once a landslide map is made, it can be digitized and overlaid onto a map of log(q/T) values. We take the lowest value of q/T that is touched by the landslide scar. We use this approach because there is always uncertainty in locating the scar accurately, and we assume that the least stable cell controls the stability of the slide (recall, too, that the slope is calculated over an area nine times that of the individual cell, hence selecting the minimum q/T to represent the potential instability associated with a slide scar makes sense). This approach, however, introduces a bias because for any polygon we always pick the lowest q/T value. Consider placing random polygons the same size as the typical landslide scar (and much bigger than individual cells) on the digital terrain map. If we again record just the lowest value, then the random model will also be biased toward low q/T values. If SHALSTAB is successful, however, the random biased model (given the same number of landslides as observed) will predict significantly fewer landslide scars at low q/T values than are observed.
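The random-placement null model described in the text can be sketched as follows. This is an illustrative implementation only: the rectangular scar footprint (scar_rows by scar_cols cells) is a simplifying assumption standing in for scar-sized polygons, and the grid is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_null_min_qt(log_qt_grid, scar_rows, scar_cols, n_scars, n_runs=10):
    """Null model: drop scar-sized rectangles at random on the log(q/T)
    grid and record the minimum value each one touches, so the null
    carries the same low-bias as the minimum-value rule used for mapped
    scars.  Returns one array of minima per run; running it several
    times gauges the effect of sample size."""
    nrows, ncols = log_qt_grid.shape
    runs = []
    for _ in range(n_runs):
        mins = np.empty(n_scars)
        for i in range(n_scars):
            r = rng.integers(0, nrows - scar_rows + 1)
            c = rng.integers(0, ncols - scar_cols + 1)
            mins[i] = log_qt_grid[r:r + scar_rows, c:c + scar_cols].min()
        runs.append(mins)
    return runs
```

Histogramming these minima with the same bins as the observed scars gives the biased random distribution against which the observed distribution is compared.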

If the observed and random distributions are the same, then SHALSTAB may have limited predictive power. The reason for the word "may" is that such a result may simply reveal that we cannot adequately test the model this way, rather than show that the model has no predictive power. The larger the landslide scar, the more likely its random placement will intersect a low value of q/T; hence the ability to distinguish model performance from random depends on the size of the scar. We consider this random comparison a tough test, and if the observed scars show a higher concentration at a given q/T value relative to the random case, it is likely due to the success of the model. It is important, as a reminder, that the actual landslide scar (and not the debris flow runout) be mapped and that the random model scar size be similar to the actual scar size.

An example may help illustrate these points. As part of a SHALSTAB validation study in the Northern California Coast Ranges, landslide scars were mapped in six watersheds; their locations were digitized, and a minimum q/T value was determined for each slide. Scars of average size were then randomly placed on the map and the minimum q/T was again noted. The same number of random scars as mapped in the watershed was used, and the random model was run 10 times to estimate the effects of sample size on the model outcomes. Figure 17 shows results for the largest (143 km2) watershed, which had the highest number of landslides (432 in-unit failures and 91 road failures in about a 20-year period). Data are plotted as landslide density, which is the number of landslides per unit area of the corresponding hazard category. This graph shows that for log(q/T) values less than -2.5 the actual landslide density is higher than that obtained from the random model, for both in-unit failures and those associated with roads. We would expect even better performance if the topographic base map were of higher resolution. Perhaps surprisingly, in the analyses we have performed in Northern California, Oregon, and Washington, SHALSTAB performed as well for road failures as for in-unit failures.
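The landslide-density metric used in Figure 17 can be computed as sketched below. This is an illustrative helper, not the original analysis code; the inputs (per-scar log(q/T) values, per-cell log(q/T) values, hazard-category bin edges, and cell area) are assumed names.

```python
import numpy as np

def landslide_density(scar_log_qt, cell_log_qt, bin_edges, cell_area_km2):
    """Landslide density as defined in the text: number of landslides per
    unit area of each log(q/T) hazard category.

    scar_log_qt  : one log(q/T) value per mapped scar.
    cell_log_qt  : one log(q/T) value per grid cell in the watershed.
    bin_edges    : edges of the hazard categories.
    cell_area_km2: area of a single grid cell, in km^2.
    """
    n_slides, _ = np.histogram(scar_log_qt, bins=bin_edges)
    n_cells, _ = np.histogram(cell_log_qt, bins=bin_edges)
    area_km2 = n_cells * cell_area_km2
    dens = np.full(len(n_slides), np.nan)  # NaN where a category has no area
    nonzero = area_km2 > 0
    dens[nonzero] = n_slides[nonzero] / area_km2[nonzero]
    return dens
```

Normalizing by category area is what makes the observed and random curves comparable: a category that covers little of the watershed can still have a high density if slides concentrate there.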

So far, Dietrich et al. (1993), Montgomery and Dietrich (1994), Pack and Tarboton (1997), and Montgomery et al. (in press) have published results comparing SHALSTAB predictions (Pack and Tarboton used the equation but did their own programming) with landslide locations. In all four cases the results were favorable, although none used the random model as the null hypothesis. Pack and Tarboton report that 91% of the landslides in the 739 km2 Trout Lake Basin in British Columbia fell within the high hazard zone, which covered just 13% of the landscape. It appears from their data that high hazard was defined as log(q/T) of -3.3 and smaller. A study of 3,224 landslides in 14 watersheds in Oregon and Washington (Montgomery et al., in press) compared landslide locations with a modified form of SHALSTAB in which root strength is added and soil depth is treated as a constant. The majority of slides occurred in areas of low q/T, and the frequency of shallow landslides (number of slides per km2) was related to q/T values.

Another way to test and calibrate the model is to map the landslides in the field and measure drainage area, a, scar width, b, and local slope. A plot of these data on a graph of a/b against slope should show clustering, and the data can then be used to estimate the q/T for instability (as illustrated earlier with digital terrain data from an example reported by Montgomery and Dietrich, 1994). Figure 18 illustrates this approach. Field data collected by crews working for the BLM in the Oregon Coast Range are plotted. The two vertical lines define the threshold for chronic instability (tan θ ≥ 1.0) and the threshold for unconditionally stable conditions (tan θ ≤ 0.375). The curved line between the two vertical ones is the slope stability model underlying SHALSTAB (equation 7). The two curved diagonal lines define the boundary between saturated and unsaturated conditions for two possible log(q/T) cases. The upper curve defines the q/T used in the slope stability model and suggests that a log(q/T) of -3.1 is a good descriptor of these data.
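The stability boundary plotted in Figure 18 can be sketched numerically. Equation 7 is not reproduced in this section, so the code below assumes the commonly published SHALSTAB threshold, q/T = (b/a) sin θ (ρs/ρw)(1 - tan θ / tan φ), rearranged for the critical a/b; the default parameter values (ρs/ρw = 1.6, tan φ = 1.0) are illustrative, though they are consistent with the tan θ ≥ 1.0 and tan θ ≤ 0.375 thresholds cited in the text.

```python
import numpy as np

def critical_a_over_b(slope_deg, q_over_T, rho_ratio=1.6, tan_phi=1.0):
    """Critical drainage area per unit contour width, a/b, above which a
    site is predicted unstable, assuming the commonly published SHALSTAB
    threshold (cf. Montgomery and Dietrich, 1994):

        q/T = (b/a) * sin(theta) * (rho_s/rho_w) * (1 - tan(theta)/tan(phi))

    rearranged for a/b.  rho_ratio is the bulk density ratio rho_s/rho_w
    and tan_phi the friction slope; both are illustrative defaults.
    """
    theta = np.radians(slope_deg)
    # Relative wetness h/z required for failure (infinite-slope, cohesionless):
    wetness = rho_ratio * (1.0 - np.tan(theta) / tan_phi)
    return np.sin(theta) * wetness / q_over_T
```

With these defaults the curve goes to zero as tan θ approaches tan φ = 1.0 (chronic instability) and requires full saturation at tan θ = 0.375, matching the two vertical thresholds in Figure 18.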

 

Copyright 1998, William Dietrich and David Montgomery
For problems or questions regarding this web site, contact bill@geomorph.berkeley.edu.
Last updated: November 29, 1998.