Chapter 5


5.1 Introduction

This chapter investigates the automatic classification of macro morphological landforms using GIS and digital elevation models (DEMs). In the past, macro morphological landforms were classified manually from contour maps, and Hammond's (1954 and 1964) procedure has to a certain extent become the de facto standard. A process developed by Dikau et al. (1991), which automates Hammond's manual procedure using GIS, is applied to the study area. Although this produces a classification that resembles the landforms in the area well, it has some problems. A new process is presented that partly solves these problems. Landform classification is very sensitive to the operational definition used, and this is demonstrated. An application of fuzzy set theory based on the notion of entropy is used to present this sensitivity.

For landscape classification, landform should be classified by morphology rather than rock type, structure, age or origin. It is usually the morphology that gives the greatest visual impression to the general public. Usually the rock type or structure is not even seen from a reasonable distance as the land may be covered by trees or buildings. Landscape assessment is concerned with the present character rather than the genesis. Genetic concepts are useful for understanding the processes forming the landforms but do not necessarily describe the appearance of a landform. The aims of a visual landscape classification are different from those of a genetic geomorphological classification, and therefore a different approach is required.

Within the fields of geomorphology and hydrology, the automatic mapping of morphological landforms has been of interest, for instance in modelling erosion (Dikau et al., 1991), providing watershed information (Band, 1986), and mapping land components (Dymond et al., 1995). Morphological landform classification has also long been of interest to climatologists for developing climate models - topoclimatology (Geiger, 1971). Although these disciplines have a different purpose for landform information compared to landscape research, the ideas and methods they have initiated are very useful. In general, geomorphological classifications are based at the meso-relief, micro-relief and nano-relief levels, while landscape classification needs to incorporate macro-relief and some elements of meso-relief (Linton, 1970). Dikau (1989) defines the macro landform scale as landforms greater than 10 square km and less than 1000 square km in area.
 
 

5.2 Manual classification

Hammond (1954 and 1964) developed a macro morphological landform classification that was applied to the whole of North and South America. Wallace (1955) used Hammond's classification, with a few modifications, to classify New Zealand's landforms. Hammond's classification is very quantitative, with clear, explicit definitions that can be easily applied by other researchers. It is perhaps this quality that explains why Hammond's classification has been so widely applied. The classification scheme used by Hammond is presented in Figure 5.1. A combination of three parameters was used to identify different landforms: relative (local) relief, slope, and profile type. Relative relief is the maximum difference in height over a certain area. Hammond used a square grid measuring 9.65 km (6 miles) across to determine the search area. After experimenting with different grid sizes, Hammond (1964) found that this size was

      "neither too small as to cut individual slopes in two and thus distort the determination of local relief, nor so large as to include areas of excessive diversity" (p.17).
       

Gentle slope is used to distinguish areas of relief from areas of no relief. Hammond chose 8 percent inclination as the upper limit of gentle slope, justifying this value by saying that it

      "falls within the range of inclination in which the difficulty of machine cultivation increases rapidly, erosion of cultivated fields becomes troublesome, easy movement of vehicles becomes impeded, and in general one becomes highly conscious that he [sic] has a sloping surface to deal with" (p.17).

He also noted that the Soil Conservation Service in the U.S. had used this threshold. However, the method used to identify this critical gradient is not explained by Hammond. As discussed in section 0, this is an elusive parameter to define. Profile type is explained in more detail in section 0. It is a means for expressing whether flat areas are above or below the surrounding terrain and so is used for identifying tablelands.

Subsequent to Hammond's work, other landform classification schemes have been developed. Many are adaptations of Hammond's work, and Table 5.1 summarizes three of these.

Wallace (1955) produced the only morphological classification of landforms for the whole of New Zealand (refer to Figure 5.2). A 1:1,000,000 base map was used, and the classification was completed nearly forty years ago. As previously mentioned, Wallace used a method based on Hammond's scheme. Regarding future developments, Wallace (1955) remarked that he

      "earnestly hoped that others with more advanced concepts and better databases will work on a larger scale and reveal the inadequacies of this early effort" (p. 27).

Wallace did not explicitly calculate slopes because this would have been too laborious. Today, such slope information is easily available because automatic extraction of information from digital databases has advanced considerably. These data would probably have been beyond Wallace's wildest hopes. Despite these advances, which will be discussed and demonstrated in this chapter, there has been very little further development of this type of morphological classification in New Zealand since his attempt. This study will try to fulfil Wallace's hope.

The only other real initiative or discussion on morphological landform classification in New Zealand since Wallace's effort has been in response to the Protected Natural Areas (PNA) programme (Myers et al., 1987). The PNA programme was instigated to satisfy the requirements of the Reserves Act (1977) which established provisions for

      "...the preservation of representable samples of all classes of ecosystems and landscape...".

A discussion on landform classification resulting from this produced two papers: "Terrain evaluation for rapid ecological survey" (Crozier and Owen, 1983); and "A landform classification for PNA surveys in Southern Alps" (Whitehouse, Basher, and Tonkin, 1990). It appears that the main emphasis of the PNA survey was the protection of ecosystems and, in particular, significant representations of natural flora. As a result, there was no deliberation over visual landscape assessment theory. Crozier and Owen's classification scheme is based on the work of Wallace, which in turn can be traced back to the work of Hammond. The classification scheme devised by Whitehouse et al. appears to have been the scheme adopted in the PNA programme for the Southern Alps. This scheme was genetically based, which means that landform data collected for the PNA programme are not the most appropriate for a visual landscape classification. Landform data from the PNA programme are also difficult to use because most are not in digital format and because the definitions of the different landform classes are not precise enough. For example, "valley floor" is defined as "the comparatively broad, flat bottom of a valley". How broad is broad? With several different field teams, there could be inconsistency between different areas.

There have been many publications that describe New Zealand's landforms from a genetic perspective. A recent notable example is Soons and Selby (1982), but this does not help much in developing a landform classification that needs to be morphological.
 
 

5.3 Automated classification

Computers have been used for extracting terrain parameters from DEMs for at least the last twenty years. Collins (1975) discussed different algorithms that could be used for identifying features such as tops of hills, bottoms of depressions, watershed or depression boundaries and areas, storage potential of watersheds, slope, and aspect. With the development of commercial GIS and national digital databases (NDDB) in the mid 1980s, there has been a resurgence of interest in this field (Dikau, 1989, Weibel, 1988, Weibel and Heller, 1991, Dikau et al., 1991, and Moore et al., 1993). Significant advances have been made, and many processes for identifying these parameters are now becoming standard functions within a GIS. Functions have been developed for generalising extensive terrain surfaces using triangulated irregular networks (TIN) (Midtbo, 1992). TIN and other algorithms have been used for generating DEMs from contours (Weibel and Heller, 1993), and slope can be obtained easily from either a TIN or a DEM. It is not the intention of this thesis to discuss the mechanics of these functions in detail, as many general GIS books do this (e.g. Aronoff, 1991). What is of interest in this thesis is how these parameters can be used to identify different landforms.

In landscape research, there have been only a few published works on automatic landform classification. Barbanente et al. (1992) developed routines for identifying ravines and cliffs automatically. These are not features that can justifiably be included in a landscape classification because of the need to generalise. Jackson (1990) used GIS to identify certain terrain parameters using what are now fairly well known GIS functions. It is now necessary to determine more complex parameters and how these parameters can be used for identifying landforms.

The identification of parameters (parameterization) is an important first step in identifying landforms. These parameters are then used to develop parametric signatures of different landforms (described as formalisation). Dikau (1989) used this approach to identify plateaux, convex scarps, straight front slopes, concave foot-slopes, scarp forelands, cuesta scarps, valleys and small drainage ways, and crests. Many of these landform features are, however, at the nano-meso scale, which is too detailed for a landscape classification that requires macro scale landforms.

Dikau, Brabb, and Mark (1991), in a little-known publication, developed automated routines that do identify macro landforms. The process they developed automates Hammond's manual process almost exactly and produces a similar result, which they demonstrated on the landforms of the entire state of New Mexico in the United States. Given that Hammond's classification has, to a certain extent, become the standard approach for a morphological landform classification, this is a significant development. In any classification, standardisation is important. The automated process developed by Dikau et al. is therefore of particular relevance to this thesis and will be discussed in detail.
 
 

5.3.1 Automating Hammond's classification scheme

Table 5.2 compares Hammond's scheme with the automated scheme developed by Dikau et al. The main difference between the two approaches is the number of classes identified and the method of generalization. The combination of parameter classes that Hammond's classification identifies could provide as many as 96 landform units, but it only identifies the more common landform units, which totalled 45. Perhaps this was required for practical reasons. The automated approach identifies all 96 landform units. Hammond's process also merged areas smaller than 2072 square kilometres into adjacent areas, so that the information could be generalized on to a 1:5,000,000 scale map. The automated approach does not do this.

Another difference concerns the use of spatial averaging windows. While a square window of similar size was used by Dikau et al. (9.8 km sides compared to Hammond's 9.65 km), the averaging procedure was different. Hammond's approach moves the window along in 9.65 km steps, which means that all the area within the window is generalised to one landform type. With the automated approach a neighbourhood function is used, as described in section 3.2.1.1, and its window moves in 200m steps, where 200m is the raster cell length. For each step, a generalization of the window is calculated and assigned to the focal cell (the cell in the centre of the window). With Hammond's scheme, areas near the edge of the window could easily be generalised wrongly, because information outside the window boundary that may be important to these areas is not considered. This problem is partly solved with the automated approach using a neighbourhood focal function.

The basic procedures used in the automated approach developed by Dikau et al. are described in Table 5.3, which identifies the three components required: slope, relative relief, and profile type. Slope was calculated using a three by three moving window on a DEM, with the nine elevation points in the window used at each placement. Relative relief was calculated using a 49 by 49 moving window on a DEM (200m cell size). For each window placement, the difference between maximum and minimum elevation was used as the measure of relative relief. Figure 5.3 illustrates how the profile type was identified. As mentioned previously, profile type is used to determine whether the flat areas are above or below the surrounding terrain and is used principally for identifying tablelands. Three classes are distinguished: lowland gentle sloping, upland gentle sloping, and not gentle sloping. Upland and lowland profiles are identified by first calculating the maximum elevation within the moving window. The height of the central grid cell is subtracted from this. If the difference is less than half of the relative relief within the moving window, then the central cell is identified as upland. Otherwise, the central cell is lowland. The resulting upland and lowland coverage is then overlaid with a slope coverage to identify upland and lowland gentle sloping areas. The percentage of gentle sloping areas that are in lowland profiles is then calculated using a focal neighbourhood function.
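The relative relief and upland/lowland steps described above can be sketched as follows. This is a minimal illustration in plain Python rather than the routines of Dikau et al.: the function names, the tiny example DEM, and the clipping of the window at the grid edge are all assumptions made for the example. For a 49 by 49 window, `half` would be 24.

```python
def window_values(dem, row, col, half):
    """Elevations in a square window centred on (row, col), clipped at the grid edge (an assumption)."""
    rows = range(max(0, row - half), min(len(dem), row + half + 1))
    cols = range(max(0, col - half), min(len(dem[0]), col + half + 1))
    return [dem[r][c] for r in rows for c in cols]


def relief_and_profile(dem, half):
    """Return (relative_relief, profile) grids for a square window of side 2 * half + 1 cells."""
    relief, profile = [], []
    for r in range(len(dem)):
        relief_row, profile_row = [], []
        for c in range(len(dem[0])):
            values = window_values(dem, r, c, half)
            highest, lowest = max(values), min(values)
            local_relief = highest - lowest
            # Upland if the focal cell lies less than half the relative relief
            # below the highest point in the window; otherwise lowland.
            is_upland = (highest - dem[r][c]) < local_relief / 2
            relief_row.append(local_relief)
            profile_row.append("upland" if is_upland else "lowland")
        relief.append(relief_row)
        profile.append(profile_row)
    return relief, profile


# Hypothetical elevations in metres: a flat lowland beside a raised block.
dem = [
    [100, 100, 100, 100, 100],
    [100, 100, 100, 100, 100],
    [100, 100, 500, 480, 470],
    [100, 100, 480, 490, 480],
    [100, 100, 470, 480, 500],
]
relief, profile = relief_and_profile(dem, half=1)
```

In this example the cell at the corner of the raised block is upland and the plain beside it is lowland, even though both windows have the same relative relief of 400m.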

Once these three components have been identified and classified, unique combinations are found by overlaying them. These are listed in Table 5.4, where the codes are the same as those used in Hammond's scheme (refer to Figure 5.1). The subclasses are labelled using a capital letter, a number, and a small letter, which represent the different components used for identifying the subclasses. The capital letters from A to D represent the slope classes, the numbers from 1 to 6 represent the relative relief classes, and the small letters from a to d represent the profile classes. The combinations of the different classes identify the 96 different subclasses. Once the subclasses are identified, the landform classes and types are determined by grouping the subclasses as shown in Table 5.4.
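The count of 96 subclasses follows directly from the labelling scheme (4 slope classes x 6 relative relief classes x 4 profile classes). A short sketch that enumerates the codes; the variable names are illustrative only, and the grouping into landform classes and types in Table 5.4 is not reproduced here.

```python
from itertools import product

SLOPE_CLASSES = "ABCD"      # capital letter: slope class
RELIEF_CLASSES = "123456"   # number: relative relief class
PROFILE_CLASSES = "abcd"    # small letter: profile class

# Every unique combination of the three components, e.g. "A1a" or "D6d".
subclass_codes = [
    slope + relief + profile
    for slope, relief, profile in product(SLOPE_CLASSES, RELIEF_CLASSES, PROFILE_CLASSES)
]
```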

The database used by Dikau et al. for classifying the landforms of New Mexico was a 100m grid DEM. This was used to generate a 200m grid DEM. The software they used was a grid modelling system, an image processing system, and ARC/INFO. The hardware they used was a Sun Sparc 2, Vax 4000, Microvax II, and Prime.
 
 

5.3.2 Automated classification of New Zealand's landforms

Given that Hammond's landform classification scheme is reasonably well recognised and accepted, and that this scheme has previously been automated, it was decided that an automated process based on Hammond's scheme should be investigated for classifying New Zealand's landforms. ARC/INFO, a Sun Sparc 10 workstation, and a 100m contour database with spot heights were used. The contour database was converted to a 200m grid DEM using ARC/INFO's TIN and TIN-to-grid functions. The process was thereafter similar to that developed by Dikau et al. (1991). A range of neighbourhood functions, as discussed in section 3.2.1.1, were used, as well as a slope function within the GRID module of ARC/INFO and a classify function (CLASS). The same class intervals, codes and labels were used as in Dikau et al. (1991). Figure 5.4 shows the different stages of the process for the Banks Peninsula region. First a DEM was produced. From this, slope was calculated and then classed as either less than 8 percent, or greater than or equal to 8 percent. The "mean slope" was calculated by assigning the value 100 to areas that were gentle sloping (< 8%) and the value zero to areas that were not. A focal mean function with a neighbourhood analysis window (NAW) of 5600m radius was then used to calculate the percentage of the neighbouring area that was gentle sloping. These percentages, classed into intervals, define the "mean slope" component. Relative relief was calculated from the DEM using a focal range function and a NAW of 5600m. A circular pattern results because of the influence of high points that affect the whole of the circular NAW. The relative relief values were classed into six intervals. Profile was calculated from the DEM by using a focal maximum function and relative relief to identify upland and lowland profiles. This was then combined with the slope classes to identify the three profile classes.

The profile component is represented by "profile percent" classes, which describe the percentage of gentle sloping areas that are in lowland profiles. The spatial averaging procedure used to accomplish this was as follows. A focal sum function counts the number of cells in the neighbourhood that are gentle sloping, and also the number of cells classed as lowland gentle sloping. From these values, the percentage of gentle slope areas that are lowland can be calculated. Figure 5.5 shows the resulting landform classes for the study area. The processing time was about two hours.
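The focal mean and focal sum steps above can be illustrated with a small sketch. It is written in plain Python and is not the ARC/INFO GRID implementation: the function names, the example grids, and the decision to ignore neighbours that fall off the grid are assumptions. In the study the window radius would be 5600m / 200m = 28 cells, whereas a radius of one cell is used here to keep the example readable.

```python
def circle_offsets(radius_cells):
    """Row/column offsets of cells whose centres lie within radius_cells of the focal cell."""
    span = range(-radius_cells, radius_cells + 1)
    limit = radius_cells * radius_cells
    return [(dr, dc) for dr in span for dc in span if dr * dr + dc * dc <= limit]


def focal_sum(grid, offsets):
    """Sum of grid values over each cell's neighbourhood, ignoring offsets outside the grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[r + dr][c + dc]
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            )
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]


def percent_gentle(gentle, offsets):
    """'Mean slope' component: percentage of the neighbourhood that is gentle sloping."""
    ones = [[1] * len(row) for row in gentle]
    n_cells = focal_sum(ones, offsets)
    n_gentle = focal_sum(gentle, offsets)
    return [[100 * g / n for g, n in zip(g_row, n_row)] for g_row, n_row in zip(n_gentle, n_cells)]


def profile_percent(gentle, lowland, offsets):
    """Percentage of gentle sloping neighbourhood cells that are lowland (None where none are gentle)."""
    lowland_gentle = [[g * l for g, l in zip(g_row, l_row)] for g_row, l_row in zip(gentle, lowland)]
    n_gentle = focal_sum(gentle, offsets)
    n_lowland_gentle = focal_sum(lowland_gentle, offsets)
    return [
        [100 * lg / g if g else None for g, lg in zip(g_row, lg_row)]
        for g_row, lg_row in zip(n_gentle, n_lowland_gentle)
    ]


# Hypothetical slope values (percent) and lowland flags (1 = lowland, 0 = upland).
slope = [[2, 3, 12], [4, 20, 15], [1, 9, 30]]
lowland = [[1, 1, 1], [1, 0, 0], [0, 0, 0]]
gentle = [[1 if s < 8 else 0 for s in row] for row in slope]  # 8 percent threshold

offsets = circle_offsets(1)
mean_slope = percent_gentle(gentle, offsets)
profile_pct = profile_percent(gentle, lowland, offsets)
```

Counting gentle cells and lowland gentle cells separately, then dividing, mirrors the two focal sums described in the text; cells with no gentle sloping neighbours have no defined profile percentage.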

One difference between the process developed in this study and that developed by Dikau et al. is the shape of the NAW. Dikau et al. used a square window, while the process developed in this study uses a circle. A circle seems more appropriate than a square because every point on the boundary of a circle is the same distance from the focal cell, which is not true of a square. With the latest GIS technology it is easy to use a circle as a moving window. Perhaps it was not a viable option when Dikau et al. were developing their process. The radius used for the search window in this study was calculated to be 5529m so that the area of the window would be the same as that of the square window used by Dikau et al., and close to that used by Hammond. This radius is rounded to a multiple of the cell size, which with a 200m cell size becomes 5600m.
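The radius can be checked with a few lines of arithmetic: a circle with the same area as a square of side 9.8 km has radius sqrt(9.8^2 / pi) km. The variable names below are illustrative, and rounding to the nearest whole number of cells is an assumption about how the rounding was done.

```python
import math

side_m = 9800   # side of the square window used by Dikau et al.
cell_m = 200    # raster cell size

# Radius of the circle whose area equals that of the square window.
radius_m = math.sqrt(side_m ** 2 / math.pi)

# Round to the nearest whole number of cells, then convert back to metres.
radius_rounded_m = round(radius_m / cell_m) * cell_m
```

This gives a radius of about 5529m, which becomes 28 cells, or 5600m, after rounding.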

The automated process produces a classification (Figure 5.5) that resembles the landforms of this area and is similar to Wallace's classification of the same area. It is difficult to compare these two classifications quantitatively since Wallace's (1955) classification is not available digitally. Wallace classifies virtually all of Banks Peninsula's landform as "low mountains". The automated approach also identifies a significant proportion of Banks Peninsula as "low mountains", but it recognises that large parts of Banks Peninsula have flat areas, either as broad spurs on the far eastern parts of the peninsula or as valley floors. These flat areas have affected the classification and have resulted in a proportion of Banks Peninsula being identified as "open low mountains". The automated approach has also integrated plains and hills to generate a class that is a composition of these classes. As identified in the criteria given in section 2.9, composition is important for landscape classification.

The automated process, however, does have some problems. The first of these is the large, regularly shaped block in the Canterbury Plains identified as "flat or nearly flat plains" in Figure 5.5. In reality there is no significant visual difference in landform between this area and the neighbouring areas on the Canterbury Plains. This block is the result of difficulties in producing an accurate TIN when the contours are far apart. This in turn affects the slope calculation, which is important for distinguishing classes. The problem could be resolved if more contours or spot heights were added.

A second problem with the automated approach is the way classes change as the distance away from the areas of relief increases. For example, in Figure 5.5 the area between the Canterbury Plains and Banks Peninsula has a series of classes going from "plains" to "plains with hills" to "plains with high hills" to "plains with low mountains" to "low mountains". This reflects a progressive change in relative relief towards Banks Peninsula and is not a particularly desirable result. It is not how you would expect people to conceptualize the landforms in this area. As discussed above, it is desirable to have a composition class that incorporates the change from plains to mountains but this should not be done with progressive zonation.

A third problem with this automated approach is that some areas that are quite different in appearance are being classified the same. This is particularly the case with areas classified as "open". Some areas are "open" because they are at the interface between the plains and the mountains, while other areas are "open" because they are in a broad valley or on flat spurs. The process cannot distinguish between these different landforms. On the north eastern side of Banks Peninsula an area is classified as "open low mountains" and, as previously noted, this is because of the large flat spurs in this region. It does not seem appropriate that this area should be classified the same as areas that are at the interface between mountains and plains. The operational definition is unable to distinguish some objects that are of micro or meso scale, such as flat spurs, from objects that are of macro scale, such as plains. It is also for this reason that some areas are classified as "tablelands" when they are just ordinary hills.

Related to this scale issue is slope. Slope is very dependent on the scale at which it is measured, a matter that will become more apparent in section 5.4.3 when the effects of cell size are examined. This process uses the same slope criteria as Hammond (8 percent), but measures slope at a different scale, thereby, in effect, adopting a different slope criterion. It is necessary to determine whether this new slope criterion is appropriate. This issue regarding slope is discussed further in section 5.4.4.

If it were thought appropriate that conical volcanoes should be identified in the classification, then this could in theory be included in an automated process. Dikau (1989) shows how concave and convex surfaces (in any direction) can be identified by using aspect and slope. It seems viable that conical shapes could be identified by their convex surfaces in the horizontal direction and, possibly, concave surfaces in the vertical direction, to develop a parametric signature of conical volcanoes. However, the issue is whether it is appropriate that volcanoes are included in a landscape classification.

Although this automated classification has problems, it nevertheless has important advantages over manual processes. These are that it is totally explicit and that it can also be applied to large areas to produce results relatively quickly. This automated approach can also be viewed as just the start of a process that can evolve as better techniques develop. Because the process is explicit, one can analyse and improve on it.
 
 

5.4 Sensitivity to operational definition

The automated approach developed by Dikau et al. (1991) and subsequently implemented in New Zealand is very dependent on critical thresholds specified for different parameters. For example, an eight percent slope threshold is used, and particular bounds are chosen for the component class intervals. The process also uses a neighbourhood analysis window that is defined by its radius. It would be interesting to know the effect of changing these values. With GIS and the use of macro programmes, it is possible to structure the process so that different thresholds can be easily changed. The macro used to run the landform classification process developed in this study contains variables for all parametric thresholds. These variables are defined at the beginning by a separate sub-macro. As the processing time was only two hours, it was possible to produce many different classifications that were the result of different parameter settings. Figure 5.6, Figure 5.7, and Figure 5.8 show, respectively, the effect of different slope thresholds, relative relief class intervals, and NAWs on the resulting landform classification (the relative relief class intervals are altered by dividing or multiplying the class bounds by the factors shown in Figure 5.7). The amount of agreement (i.e. the percentage of cells with the same class) between the classification that uses 2 percent slope and the classification that uses 14 percent slope is 21% for the Banks Peninsula area. The agreement between classifications with relative relief class bounds divided by 4 and multiplied by 4 is 91%, and between a NAW of 1,000m and 10,000m radius is 43%. These figures show that the resulting classification is very dependent on how these parameters, especially slope and the NAW, are defined. However, the sensitivity to these parameters will depend on location.
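The agreement measure used above, the percentage of cells assigned the same class in two classifications, can be sketched as follows. The function name and the two small example grids are hypothetical and are not taken from the Banks Peninsula data.

```python
def percent_agreement(classification_a, classification_b):
    """Percentage of cells that have the same class in two grids of equal shape."""
    pairs = [
        (a, b)
        for row_a, row_b in zip(classification_a, classification_b)
        for a, b in zip(row_a, row_b)
    ]
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical 2 x 2 classifications produced with two different slope thresholds.
low_threshold = [["plains", "low mountains"], ["hills", "low mountains"]]
high_threshold = [["plains", "open low mountains"], ["hills", "low mountains"]]

agreement = percent_agreement(low_threshold, high_threshold)
```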

The sensitivity analysis does not produce surprising results. Given the way the process is structured, changing the definition of gentle sloping from less than 2 percent slope to less than 14 percent slope can be expected to produce more "open mountains". By definition, in this classification process, an area must contain a certain proportion of flat areas to be classified as "open". With a 14 percent threshold, more areas are identified as gentle sloping, and therefore more area is identified as "open". The changes in relative relief levels have not affected the classification outcome substantially for the Banks Peninsula region, but it is easy to conceive that changes in relative relief classes could affect the outcome in locations where the topography is close to the boundary between a mountain and a hill.

The effect of different NAW radii on the classification process is more complicated. It needs to be remembered that NAWs are used at many different stages of the process. They are used to calculate the percentage of area that is gentle sloping, to calculate the relative relief, and three times when calculating profile. The same size NAW was used for all these operations. The radius of the NAW affects the boundary between areas of relief and no relief, so the distinction between the classes "plains" and "plains with hills or mountains" changes with different radii. With relative relief, the larger the NAW, the more likely it is that the difference between the highest point and the lowest point will be greater. The size of the NAW also affects the amount of generalisation. When the NAW radius is only 1000m, the classification is more detailed than when the NAW radius is 10,000m. With a 1000m radius, micro relief is being identified, such as flat spots on the eastern spurs that have been identified as tablelands. As discussed previously, in landscape classification it is the identification of macro landforms rather than micro landforms that is important. Small flat areas on spurs are not macro relief.

Figure 5.6, Figure 5.7, and Figure 5.8 show 21 different landform classifications of the same area. For each figure only one parameter has been altered and the others have been held constant. If the combined effect of changing several parameters simultaneously were investigated, then hundreds of different classifications would be produced.

5.4.1 A definitive classification

When Hammond produced his landform classification, it would not have been practical to investigate the effects of different operational definitions. The definitions of the different landforms had to be chosen in advance and only these implemented, as this task alone would have been laborious enough. Now, with GIS technology, it is possible to investigate different parameter thresholds. But it is still difficult to choose which operational definitions are appropriate, as this depends on whose conceptual model is being considered. For example, a Dutch person will probably have a different definition of a mountain from a Nepalese person. When viewing landforms, some people may focus on small areas, while others may view more widely and get an overall impression. As demonstrated, it is now possible to produce many different conceptual models of landforms, but having hundreds of classifications is of little use to research that needs a single frame of reference. A single classification needs to be decided upon.

One way of choosing an appropriate classification is to use the class that occurs most frequently (the majority) for a given cell across a wide range of classifications that represent many different conceptual models. This can be easily implemented with GIS, and the more advanced GIS software can do it with one command. Although hundreds of different conceptual models can be created, it seems that with ARC/INFO (version 6.2) only 47 coverages could be incorporated in the majority function. Figure 5.9 shows the majority of 45 different classifications. The following parameter settings were used:

Five slope settings - 4, 6, 8, 10, and 12 percent;

Three relative relief settings - Hammond's, Hammond's divided by 2, and Hammond's multiplied by 2; and

Three NAW radii - 2400m, 5600m, and 8400m.

The combination of all these settings produces 45 different classifications. It should be noted that when the majority function is used in ARC/INFO and there is no clear majority for a particular cell (i.e. when two or more classes share the highest frequency), no value is assigned to that cell. For Banks Peninsula there were a few cells where this was the case, and for these the cell value from Hammond's parameter settings was used instead. It should also be noted that a cell size of 400m was used because of the amount of processing involved.
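The parameter combinations and the majority rule with its tie handling can be sketched as below. This is an illustration in plain Python, not the ARC/INFO majority function: the names are hypothetical, and the relative relief settings are represented simply as multipliers of Hammond's class bounds.

```python
from collections import Counter
from itertools import product

slope_thresholds = [4, 6, 8, 10, 12]   # percent
relief_factors = [0.5, 1, 2]           # multipliers applied to Hammond's class bounds
naw_radii_m = [2400, 5600, 8400]       # metres

# One classification is produced for each combination of settings.
settings = list(product(slope_thresholds, relief_factors, naw_radii_m))


def majority_class(cell_classes, fallback):
    """Most frequent class for one cell; use the fallback when the top frequency is shared."""
    ranked = Counter(cell_classes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return fallback  # no clear majority, e.g. take the class from Hammond's settings
    return ranked[0][0]


clear = majority_class(["hills", "hills", "plains"], fallback="plains")
tied = majority_class(["hills", "plains"], fallback="plains")
```

Applying `majority_class` to every cell, with the class from Hammond's parameter settings as the fallback, reproduces the rule described above for cells without a clear majority.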

A majority classification could be used as a definitive classification because it incorporates a wide range of conceptual models. However, a majority classification is sensitive to the range of conceptual models chosen, and perhaps a different range is more desirable. With GIS this majority calculation is very quick, so different ranges of parameter settings could easily be experimented with. On the other hand, it could also be argued that Hammond's classification should be the definitive classification, as it has been in use since 1954 and has become a de facto standard.
 
 

5.4.2 An application of fuzzy set theory

As discussed in section 2.9, landscapes are fuzzy entities, as they are based on human conceptualization and this varies between different people. Fuzzy set theory provides a means of presenting this fuzziness by showing, for each cell, the degree of membership of the different classes. Using the example presented in the previous section, membership is calculated by comparing all 45 outcomes. For each class, a coverage is created that shows the degree of membership (frequency of occurrence) for each cell. The membership of each class was calculated by first generating a grid coverage that consisted of only the value for that class, for example a grid coverage that consisted only of 2 (2 corresponded to "tablelands"). An "equal to" function was then used to count, for each cell, how many of the 45 classifications equalled the value of this single-valued coverage. This provided information on the membership of that class. This process was repeated for all the classes. Figure 5.10 shows the results for the landform types. In this case there are only five possible classes, so this information can be easily presented. When there are hundreds of different classes, which will be the case with a landscape classification that consists of the unique combinations of four different attributes, this information will not be easy to present and would in fact be too much for anyone to assimilate.
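The membership count described above can be sketched as follows. The function name and the three tiny example grids are hypothetical; only the code 2 for "tablelands" comes from the text, and the other class codes are placeholders.

```python
def membership_grid(classifications, class_value):
    """For each cell, count how many classifications assign it to class_value."""
    n_rows, n_cols = len(classifications[0]), len(classifications[0][0])
    return [
        [sum(grid[r][c] == class_value for grid in classifications) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three hypothetical classifications of the same 1 x 3 strip of cells.
classifications = [
    [[2, 2, 4]],
    [[2, 3, 4]],
    [[2, 3, 3]],
]
tableland_membership = membership_grid(classifications, class_value=2)
```

Dividing each count by the number of classifications gives the proportion used later for the entropy calculation.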

One way of presenting this membership information for easier assimilation is to use the notion of entropy (Wilson, 1970; Ashby, 1994). Entropy provides information on the distribution of the membership of the different classes for a given area (in this case a cell). It is implemented by first calculating, for each class, the proportion of the 45 outcomes that are assigned to that class. Thus if a particular cell is assigned to class A in 15 outcomes, the coverage for class A will show a value P of 0.33 for that cell, while coverages for the other classes will show P values totalling 0.67. The entropy coverage is then created by combining these P values with the formula for entropy (Eqn. 5.1). If the membership of one class is very high and the membership of all the other classes is low, then entropy will be low. If the memberships of all the classes are fairly even and no class stands out, then entropy will be high. Low entropy indicates a high degree of consensus between classifications, and high entropy means there is very little consensus between classifications.

The equation for entropy of a cell is:

The entropy calculated from the 45 different landform classifications generated for the Banks Peninsula area is shown in Figure 5.11.
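As a minimal sketch of the entropy calculation described above, the following assumes the Shannon form, H = -sum(P_i * ln P_i), summed over the classes that occur in a cell. The natural logarithm is an assumption; a different base only rescales the values. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def cell_entropy(labels):
    """Shannon entropy (natural log) of the class labels given to one cell."""
    n = len(labels)
    entropy = 0.0
    for count in Counter(labels).values():
        p = count / n  # proportion P of outcomes assigned to this class
        entropy -= p * math.log(p)
    return entropy


def entropy_grid(classifications):
    """Entropy for every cell across a list of equally sized class grids."""
    rows, cols = len(classifications[0]), len(classifications[0][0])
    return [
        [cell_entropy([grid[r][c] for grid in classifications]) for c in range(cols)]
        for r in range(rows)
    ]


# Toy data: three 1x2 classifications (the study used 45).
outcomes = [[[1, 2]], [[1, 3]], [[1, 4]]]
result = entropy_grid(outcomes)
```

In this toy case the first cell receives the same class in every outcome, so its entropy is zero (full consensus), while the second cell receives a different class each time, which gives the maximum entropy for three outcomes.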

The entropy values show that when the classes are general there is more agreement, but as the classes become more specific there is less agreement. It is interesting to speculate whether this reflects consensus in society. Are people more likely to agree that a particular landform is a mountain but less likely to agree whether the mountain is high or low?

Entropy appears useful for evaluating landscape classifications and their application. For instance, one use for a landscape classification is as a frame of reference for psychophysical landscape assessment, as discussed in section 2.5.1. It would be appropriate if the photos for the public preference surveys were taken of areas where there is agreement over their classification. Entropy provides this information.

Figure 5.11

The entropy values shown in Figure 5.11 are not specific to any one classification. They provide general information about a particular area. However, it is possible to provide consensus information that is specific to one classification. If a definitive classification is agreed upon (and perhaps this will be a majority classification), then it is appropriate to obtain consensus information that is specific to that classification. This can be done by again using the "equal to" function to count, for each cell, how many of the 45 classifications equal the suggested definitive classification. If the majority classification, as shown in Figure 5.9, is accepted as the definitive classification, then the amount of agreement between this and the 45 different landform classifications can be calculated. The result is shown in Figure 5.12. It can be argued that this approach (which will now be referred to as the agreement model) is better than the use of entropy. The agreement model is easier to understand and to implement within GIS. On the other hand, entropy does provide additional information about all the other possible classes that could be assigned to a given area.
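The agreement model can be sketched as follows. This is an illustration under stated assumptions: grids are small lists of lists, the majority class is taken as the most frequent label per cell, and ties are broken by whichever class is encountered first, a detail the text does not specify. The function names are hypothetical.

```python
from collections import Counter


def majority_grid(classifications):
    """Most frequent class per cell (ties go to the first class encountered)."""
    rows, cols = len(classifications[0]), len(classifications[0][0])
    return [
        [Counter(g[r][c] for g in classifications).most_common(1)[0][0]
         for c in range(cols)]
        for r in range(rows)
    ]


def agreement_grid(classifications, reference):
    """Per cell, the number of classifications that equal the reference grid."""
    rows, cols = len(reference), len(reference[0])
    return [
        [sum(g[r][c] == reference[r][c] for g in classifications)
         for c in range(cols)]
        for r in range(rows)
    ]


# Three toy 2x2 classifications stand in for the 45 outcomes.
outcomes = [
    [[1, 2], [2, 3]],
    [[1, 2], [3, 3]],
    [[2, 2], [3, 3]],
]
majority = majority_grid(outcomes)
agreement = agreement_grid(outcomes, majority)
```

Any other candidate definitive classification, such as one produced with Hammond's parameter settings, could be passed as `reference` in place of the majority grid.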

This application of fuzzy set theory is simpler than that used by Burrough (1989) and Burrough et al. (1992) for soil classification. Nevertheless, it is still an effective application. The approach of Burrough et al. (1992) is more complex because it considers the probability of the different parameter settings that produce the possible outcomes, whereas in this study the probability of the different parameter settings is assumed to be equal. This assumption is necessary because it is not known what the probability of the different settings should be. Perhaps some settings, such as 14 percent slope, are unlikely to agree with public perception, and this should be incorporated in the process by assigning such a parameter setting a low probability. This application is also simpler because it uses simulation to determine membership rather than complex mathematical calculations. It should be remembered that the results from the fuzzy set theory applications presented above do not express the statistical probability of a class. The results can only be used as a relative indication of the probability of different classes.

5.4.3 The effects of cell size on the classification process

The effects of using different cell sizes on the process were also investigated, and produced some interesting results. Figure 5.13 shows that different cell sizes have a significant effect on the resulting landform classification. Over the whole study area, the agreement between 200m and 500m cell sizes for the landform classes was 90%, although for Banks Peninsula it was only 61%. The reason for this effect of cell size was investigated by visualizing, for each cell size, the individual stages of the process. Figure 5.14 and Figure 5.15 show the process for 100m and 1000m cell sizes respectively. It is apparent that variation in the slope classes is causing most of the variation in the output. Figure 5.16 and Figure 5.17 show the effect of cell size on slope classes (70% agreement between 100m and 1000m cell size for Banks Peninsula), and "mean slope" (54% agreement between 100m and 1000m cell size for Banks Peninsula) respectively. The reason for this variation in slope with different cell sizes becomes apparent when the cells are examined in relation to the contours and TIN lines (Figure 5.18). With this automated process the DEM is produced from the TIN coverage. The DEM is then used to determine slope by using a neighbourhood function that compares the heights of the neighbouring cells and then calculates slope. From Figure 5.18, it is clear that as the cell size is increased the detail in the topography is lost. With a 100m cell size, non-macro topography is identified, such as flat spots on spurs and ridge tops, and small steep sections. With the larger cell sizes, such topography is lost and it even appears that detail at the macro scale is lost as well. This difference thus affects the "mean slope" (Figure 5.17). The effect depends on the presence or absence of different scales of topography, and whether this topography consists of flat objects or steep objects. It illustrates the scale dependency of slope that Dymond and Harmsworth (1994) and Moore et al. (1993) have also illustrated.
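The scale dependency can be illustrated with a small synthetic example. This sketch makes several assumptions that are not taken from the study: the DEM is sampled at cell centres from a hypothetical height function (rather than interpolated from a TIN), and slope is computed with simple central differences as a stand-in for the GIS neighbourhood function.

```python
import math


def sample_dem(height, extent_m, cell_size_m):
    """Sample a height function at cell centres to make a square DEM."""
    n = extent_m // cell_size_m
    return [
        [height((c + 0.5) * cell_size_m, (r + 0.5) * cell_size_m) for c in range(n)]
        for r in range(n)
    ]


def slope_percent(dem, cell_size_m):
    """Slope in percent from central differences; border cells are skipped."""
    rows, cols = len(dem), len(dem[0])
    slopes = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size_m)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size_m)
            slopes.append(100 * math.hypot(dzdx, dzdy))
    return slopes


def ridge(x, y):
    """Hypothetical terrain: a 50 m high, 200 m wide ridge on a flat plain."""
    return 50.0 if 1100 <= x < 1300 else 0.0


# Maximum slope found over a 3000 m square at two cell sizes.
max_slope = {
    size: max(slope_percent(sample_dem(ridge, 3000, size), size))
    for size in (100, 1000)
}
```

With 100m cells the narrow ridge produces steep cells on its flanks, while with 1000m cells it falls between cell centres and the same ground appears completely flat.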
 
 

5.4.4 Slope - the elusive parameter

Slope is a critical parameter for identifying landforms and is used in manual methods as well as in automated methods. Yet slope is difficult to measure objectively. Measuring slope objectively using manual techniques in the field usually requires that a scale be specified by choosing a particular slope length. Calculating the mean slope using a slope length of one metre will give a different result to using a slope length of one kilometre. It is also necessary to specify where these slope lengths begin and finish. For practical reasons, manual methods for calculating the mean slope of an area have not been explicit, and so it is difficult to automate these using GIS.

A comparison was made between GIS generated slope measurements and manual slope measurements for the whole of the study area. The LRI contains manually measured slope information classed into intervals for areal units. The LRI slope information was reclassed as flat if it contained a slope interval less than 12 percent, otherwise it was reclassed as non-flat. It was then stored as a 200m resolution GIS layer. For comparison, a GIS generated slope coverage was produced from a 200m cell size DEM. From this, a range of flat/non-flat coverages was produced based on the following thresholds: 1, 2, 4, 6, 8, and 12 percent. These were then compared with the classified LRI slope coverage by calculating the amount of agreement (the number of cells classified the same). The agreements for the different slope thresholds were as follows:

Slope threshold (%)    Percentage agreement
1                      87
2                      88
4                      88
6                      88
8                      87
12                     84

These agreement figures appear to be quite high but they actually reflect quite significant differences between manual and GIS slope measurements. The analysis was done on very general slope classes (just two classes) and these classes have a dramatic effect on the classification outcome. If two classifications were derived for the study area and they both used a 12 percent threshold, but one was based on the GIS slope measurements and the other on the LRI data, then only 84% of the area in the classifications would be in agreement (i.e. 16% would be different). This analysis shows that it is unwise to take slope thresholds based on manual measurement and use them in classifications based on GIS measurement. The GIS slope measurements used in this study and by Dikau et al. (1991) are not flawed; they are just obtained differently.
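The threshold comparison can be sketched as below. The slope values and the reference flat/non-flat grid are invented toy data, not LRI or DEM values from the study, and treating a cell as flat when its slope is strictly below the threshold is an assumption.

```python
def flat_mask(slope_grid, threshold_percent):
    """True where slope is below the threshold (treated here as 'flat')."""
    return [[s < threshold_percent for s in row] for row in slope_grid]


def percent_agreement(grid_a, grid_b):
    """Percentage of cells that hold the same value in both grids."""
    pairs = [
        (a, b)
        for row_a, row_b in zip(grid_a, grid_b)
        for a, b in zip(row_a, row_b)
    ]
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


# Toy GIS slope values (percent) and a toy manually classed reference grid.
gis_slope = [[0.5, 3.0, 7.0], [10.0, 15.0, 1.5]]
reference_flat = [[True, True, False], [False, False, True]]

agreement = {
    t: percent_agreement(flat_mask(gis_slope, t), reference_flat)
    for t in (1, 2, 4, 6, 8, 12)
}
```

The threshold with the highest agreement is the one that best reproduces the reference classes, which is the basis on which a GIS threshold could be calibrated against manual measurements.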

If slope information from the LRI is used in the process then the "mean slope" is relatively stable with different cell sizes, as shown in Figure 5.19. There is 98% agreement between 100m and 1000m cell size for Banks Peninsula. It is apparent from a comparison of Figure 5.17 and Figure 5.19 that using LRI slope information provides a more stable result in relation to cell size than using the DEM derived slope information. The slope information in the LRI is obtained from field measurements that are determined at a macro scale. This information is stored in a polygon coverage. Because these polygons are large, detail is not lost when they are converted to grids, even with large cell sizes. The problem with using the LRI is that the slope information for each areal unit is given as an interval. If the terrain within the areal unit is variable then this slope interval may be large. There can also be more than one slope interval given for an areal unit. It can therefore be difficult to determine whether the slope of an areal unit is above or below the slope criterion. With the LRI data it was assumed that an areal unit was "not flat" if it contained a slope interval that extended above the critical slope threshold of 8%, and because slope information is stored in intervals this resulted in a 12% threshold being used. It should be noted that the LRI may be inconsistent because of the difficulties in determining a totally explicit field method for calculating slope, and that not all countries have access to such databases.

As demonstrated in the previous section, the "mean slope" determined from DEMs changes considerably when the cell size is changed. How do we know what is the best cell size to use? Also, is it desirable to have a process that is dependent on a particular cell size? What happens if an accurate DEM with 200m cell size is not available? Alternative methods for automatically calculating slope were therefore investigated.

Instead of calculating slope from a DEM, it is possible to derive slope from a TIN (based on the slope of the triangle facets), and then convert this slope information directly to a grid coverage. Figure 5.20 shows the effect of different cell sizes on slope obtained directly from a TIN. There is 53% agreement in slope classes between 100m and 1000m cell size for Banks Peninsula. There are some obvious differences between this figure and Figure 5.16, where slope is obtained from a DEM, especially with larger cell sizes. The slope calculated directly from a TIN is still very sensitive to cell size because of the effects of micro topography. The TIN identifies micro relief objects but these are generalised when converted to a grid coverage. The degree of generalisation depends on the cell size used. The use of a TIN therefore does not solve the problem.

Another alternative method for determining "mean slope" that reduces the effect of micro relief and is less sensitive to changes in cell size is to first remove small flat areas from the slope class grid before the "mean slope" is calculated (slope can be calculated from either a DEM or directly from a TIN). Small flat areas can easily be identified by their size. From the definition for macro landform size given by Dikau (1989), this threshold size should be 10 square kilometres. Once identified, these flat areas can be converted to non-flat areas. This approach is implemented in the following section.
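A minimal sketch of the size filter is given below. It assumes that flat regions are groups of flat cells joined edge to edge (4-connectivity), which the text does not specify, and it uses a breadth-first search in place of a GIS region-grouping function. The function name and the toy grid are hypothetical.

```python
from collections import deque


def remove_small_flat_areas(flat, cell_size_m, min_area_km2=10.0):
    """Reclassify flat regions smaller than min_area_km2 as non-flat."""
    rows, cols = len(flat), len(flat[0])
    min_cells = min_area_km2 * 1_000_000 / (cell_size_m ** 2)
    result = [row[:] for row in flat]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not flat[r0][c0] or seen[r0][c0]:
                continue
            # Collect one connected flat region with a breadth-first search.
            region, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and flat[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(region) < min_cells:
                for r, c in region:
                    result[r][c] = False  # too small: treat as non-flat
    return result


# Toy grid with 1000 m cells: one 12-cell flat block and one isolated flat cell.
flat = [
    [True, True, True, True, False, False],
    [True, True, True, True, False, True],
    [True, True, True, True, False, False],
    [False, False, False, False, False, False],
]
macro_flat = remove_small_flat_areas(flat, cell_size_m=1000)
```

With 1000m cells each cell covers one square kilometre, so the 12-cell block survives the 10 square kilometre threshold and the isolated cell is converted to non-flat.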
 
 

5.5 A new automated landform classification process

As previously mentioned, Dikau et al.'s (1991) classification process has certain problems: it produces a progressive zonation when landform changes from plains to relief, it does not distinguish open valleys from a plains-mountain interface, and it is affected by micro relief. A new process was therefore developed that partly solves these problems. This process was developed using a 500m cell size to ensure the processing time was not too great. It will be demonstrated that the outcome is not severely affected by cell size.

Figure 5.21 and Figure 5.22 show the different steps in the first phase of the process, which in summary produces three classifications of landform:

        1) a set of six relief types,

        2) a division of "flat" types into open valley and plain, and

        3) identification of a special class of tableland within the "plain" type.
         
         

Starting with a DEM, a slope grid was derived as in Dikau et al.'s (1991) process, and this was classified according to slope. However, a 4 percent threshold was used instead of an 8 percent threshold to distinguish the low gradient cells. The reason for this is discussed later. Any flat areas that were less than 10 square kilometres in size were then converted to non-flat areas to produce a "macro-slope classes" grid. The next three steps identified open valleys. An open valley is a large flat area that has relief on opposite sides. This pattern was identified using an expand and shrink sequence (as used for identifying indented coastlines in the previous chapter). Areas identified as non-flat were expanded by 3000 metres (with a 500m cell size this corresponds to six cells), and then shrunk by 3000m. The effect of these two steps was that flat enclosed and semi-enclosed areas (open valleys) became non-flat. Open valleys were then identified by using a conditional statement on the "macro-slope classes" grid and the "shrunken" grid. That is, if a cell was flat in the "macro-slope classes" grid and was not flat in the "shrunken" grid then it was classed as an open valley. For an area to remain classified as an open valley, it also had to be more than 10 square kilometres in size. A conditional statement was used for this.
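The expand and shrink sequence can be sketched as follows. This is an illustration only: it uses a square neighbourhood measured in whole cells, whereas the GIS functions used in the study may use a different neighbourhood shape, and it omits the final 10 square kilometre size check. The function names and the toy grid are hypothetical.

```python
def expand(mask, k):
    """Grow the True cells of a grid by k cells (square neighbourhood)."""
    rows, cols = len(mask), len(mask[0])
    return [
        [
            any(
                mask[rr][cc]
                for rr in range(max(0, r - k), min(rows, r + k + 1))
                for cc in range(max(0, c - k), min(cols, c + k + 1))
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def shrink(mask, k):
    """Shrink the True cells by k cells, by expanding the False cells."""
    inverted = [[not v for v in row] for row in mask]
    return [[not v for v in row] for row in expand(inverted, k)]


def open_valleys(non_flat, k):
    """Flat cells that become non-flat after an expand and shrink of k cells."""
    closed = shrink(expand(non_flat, k), k)
    rows, cols = len(non_flat), len(non_flat[0])
    return [
        [(not non_flat[r][c]) and closed[r][c] for c in range(cols)]
        for r in range(rows)
    ]


# Toy cross-section repeated over three rows: relief, a 4-cell-wide valley,
# relief again, then a plain that is open to the eastern edge.
row = [True, True, False, False, False, False, True, True,
       False, False, False, False]
non_flat = [row[:] for _ in range(3)]
valleys = open_valleys(non_flat, k=2)
```

Expanding by k cells and shrinking again closes gaps up to 2k cells wide, which is why a 3000m expand and shrink corresponds to a 6000m maximum valley width. In the toy grid the 4-cell valley is picked out while the plain, which has relief on one side only, is not.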

Relative relief was determined as in Dikau et al.'s (1991) process, by using a focal range function. For areas that were previously identified as non-flat, the relative relief was classified into five classes to produce a relief type grid. The relief classes were:

0-150m - Low hills

150-600m - Hills

600-900m - High hills

900-1500m - Mountains

Above 1500m - High mountains

These relative relief classes are slightly different to those used by Dikau et al. They are intended to reflect how New Zealanders conceptualise terrain in New Zealand, although there is no substantive evidence to suggest how this is. The Banks Peninsula region is classified as high hills by Glasson (1991) in a visual assessment study. A relative relief interval of 600-900m achieves this. Two mountain classes are recognised, distinguishing the grander mountains, which often have permanent snow and bare rock, from the others. It should be noted that flat cells defined by gradient were maintained as flat areas even though some had high relative relief neighbourhoods.
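A focal range and the relief classes listed above can be sketched as follows. The window size (given as `k` cells either side, in a square window) is a hypothetical parameter, since the neighbourhood used is not stated in this section, and assigning boundary values such as exactly 150m to the upper class is also an assumption.

```python
import bisect

RELIEF_BREAKS = [150, 600, 900, 1500]  # metres of relative relief
RELIEF_NAMES = ["Low hills", "Hills", "High hills", "Mountains", "High mountains"]


def focal_range(dem, k):
    """Maximum minus minimum elevation within k cells of each cell."""
    rows, cols = len(dem), len(dem[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                dem[rr][cc]
                for rr in range(max(0, r - k), min(rows, r + k + 1))
                for cc in range(max(0, c - k), min(cols, c + k + 1))
            ]
            out_row.append(max(window) - min(window))
        out.append(out_row)
    return out


def relief_class(relative_relief_m):
    """Name of the relief class for a relative relief value in metres."""
    return RELIEF_NAMES[bisect.bisect_right(RELIEF_BREAKS, relative_relief_m)]


# Toy elevations in metres.
dem = [[0, 100, 200], [50, 800, 300], [0, 0, 0]]
relief = focal_range(dem, k=1)
centre_class = relief_class(relief[1][1])
```

In the full process this classification would be applied only to cells already identified as non-flat, with flat cells kept as flat regardless of their relative relief.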

Tablelands were identified from upland and lowland profiles and these profiles were identified in a similar way to Dikau et al.'s process. However, the actual identification of Tablelands was simpler than Dikau et al.'s because "profile percent" classes were not used. Instead, if an area was upland and flat in the macro-slope coverage, then it was identified as a tableland. No tablelands were identified in the whole region using this process.
 
 

A coverage that has the potential to identify eight morphological landform classes (five relief types, plains, open valley, and tableland) was then produced by overlaying the maps of relief types, open valleys, and tablelands. Figure 5.23 shows this for the whole study area. This landform components map cannot be used in a landscape classification in this form because it does not contain composition classes, but instead identifies the sharp boundaries between different landform types (e.g. plains and mountains). However, it could be used for other purposes (e.g. climate and hydrology modelling).

Once the landforms had been conceptualised, the second phase of the landform classification could commence. Landform compositions were identified in a similar way to that used for the landcover attributes. Each of the eight landform components was singled out into an individual grid, with the value 100 assigned to cells where the particular component is present, and the value zero where it is not. A focal mean function, with a 3000m radius NAW, was then applied to each component grid, and these mean values were placed into one of four class intervals (the results are shown in Figure 5.24 and Figure 5.25). These eight spatial influence grids were then overlaid to produce a new grid that contained unique combinations of them (a vector representation is shown in Figure 5.25). Since eight grids were combined and each had four possible classes, the combined grid had 65,536 possible unique classes. However, there were only 613 unique combinations in the study area. Twenty-two landform classes were then identified by querying this combined coverage. The classes are listed in Table 5.5 under level 1, and the definitions used to identify them are described in Appendix 4. The classes have been chosen because of their distinctiveness in form, and to a certain extent reflect the classes used by past classifications. Checks were made to ensure that the definitions were mutually exclusive and exhaustive, as described in section 4.2.3. Not all these landforms existed in the study area. The resulting level 1 classification is shown in Figure 5.26.
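The composition step can be sketched as follows. The class breaks (25, 50 and 75) are hypothetical because the four class intervals are not given in this section, a square window stands in for the 3000m radius neighbourhood, and only two toy components are combined rather than eight.

```python
import bisect


def focal_mean(grid, k):
    """Mean of the values within k cells of each cell (square window)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - k), min(rows, r + k + 1))
                for cc in range(max(0, c - k), min(cols, c + k + 1))
            ]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def influence_class(mean_value, breaks=(25, 50, 75)):
    """Place a focal mean (0-100) into one of four classes, numbered 0 to 3."""
    return bisect.bisect_right(breaks, mean_value)


def unique_combinations(class_grids):
    """Set of per-cell class tuples across several spatial influence grids."""
    rows, cols = len(class_grids[0]), len(class_grids[0][0])
    return {
        tuple(g[r][c] for g in class_grids)
        for r in range(rows)
        for c in range(cols)
    }


# Two toy component grids: 100 where the component is present, 0 elsewhere.
hills = [[100, 100, 0], [100, 0, 0], [0, 0, 0]]
plains = [[0 if v else 100 for v in row] for row in hills]

class_grids = [
    [[influence_class(m) for m in row] for row in focal_mean(g, k=1)]
    for g in (hills, plains)
]
combos = unique_combinations(class_grids)
max_classes = 4 ** 8  # eight grids with four classes each
```

The count of possible combinations follows directly from the arithmetic: four classes in each of eight grids gives 4 to the power 8, or 65,536. As in the study, the number of combinations that actually occur is far smaller than the number that is possible.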

In deriving a landform component map, several parameter thresholds had to be determined - 4 percent slope, a 6000m maximum valley width criterion, and, as already discussed, the various relative relief classes. A slope of 4 percent was used for distinguishing flat and non-flat areas. This differs from Hammond's 8 percent, which was also adopted by Dikau et al. (1991). As discussed in section 5.4.4, using DEMs to derive slope produces a different result compared to using field measurements. Therefore it is likely that a different slope threshold is needed with automation compared to Hammond's method. The effects of different slope thresholds were investigated by implementing the process with different slope thresholds (Figure 5.27). The amount of agreement between the use of a 1% slope threshold and an 8% threshold is 67%. With 8 percent, 7,528 more cells were classed as plains or open valleys than with 1 percent. The opposite occurred for the classes containing relief. Low hills and hills are virtually absent with 8 percent, and the non-relief classes extend well into areas that can be regarded as relief.

A comparison was made between the resulting slope classes and the slope information in the LRI (similar to that shown in section 5.4.4 but this time using a 500m cell size). As previously discussed, the LRI slope information is based on areal units, slope is given in class intervals, and occasionally more than one interval is given to an areal unit. Despite this, it still provides the best available representation of slope against which a comparison can be made. A slope interval of 0-7 degrees (based on LRI intervals of 1-3 and 4-7) was used to represent flat areas. The 4 percent threshold produced a slope class grid that had the highest agreement with the LRI (91%). The slope thresholds of 1 percent and 8 percent both had agreements of only 88%. Four percent therefore seems an appropriate threshold. Even when 4 percent was compared with the LRI slope interval of 1-3 degrees, the agreement was still high (90%). Although hills are not very well represented with a 4 percent threshold, it appears more suitable for identifying the extent of open valleys.

A 6000m maximum valley width threshold was decided upon by assessing the effects of different width criteria. Valley widths vary considerably and topographic maps show that these can be 5000m in the Rangitata catchment. To be sure all such valleys were identified, 6000m was decided upon (this was achieved by using an expand and shrink of 3000m). If the maximum valley width criterion is set too high then some large basins become identified as open valleys.

The landform classification can be easily generalised by grouping different classes. This was done to produce six different levels of generalisation. The way the different classes were grouped is shown in Table 5.5. Figure 5.28 shows graphically the effect of different levels of generalisation. No keys are provided with this figure to avoid cramming, but the colours are the same as used in Figure 5.26 and the keys can be ascertained by using this and Table 5.5. Like the rationale for the level 1 classes, the classes in levels 2-6 have been chosen because of their distinctiveness in form. At the more general levels this distinctiveness needs to be more apparent.

This new process produces a landform classification that does not have the same problems as that developed by Dikau et al. (1991). The interface between relief and plains is not identified as a progressive zonation, valley floors are distinguished, and micro relief does not significantly alter the outcome. Cell size, however, still affects the classification. There is 89% agreement between level 1 classifications based on 200m and 500m cell sizes. This is similar to the 90% found for Dikau et al.'s landform classes. However, for a comparison between this new process and Dikau et al.'s to be valid, it needs to be done at a similar level of generalisation. For level 3, which has a similar number of classes to Dikau et al.'s landform types, there is 93% agreement between 200m and 500m cell sizes. Cell size is still affecting the calculation of slope classes with this new process, despite the removal of small flat areas. Slope classes particularly affect the boundaries of large open valleys that gradually get steeper and therefore do not have a distinct boundary.

What this classification identifies as open valleys perhaps does not agree with how most people conceptualize valleys. The definition of an open valley as a large flat area that has non-flat areas on opposite sides is perhaps too simple. People often associate rivers with valleys, so perhaps a river must be in the vicinity. This could be incorporated in the classification process. Another issue is that where there is an isolated hill surrounded by flat areas, the flat area between the hill and a nearby non-flat area becomes identified as a valley. This can be seen in Figure 5.26 on the edge of the Canterbury Plains. This is a problem with the process. One may also think that the maximum width of a valley should be determined by how high the surrounding relief is. For example, in the head of the Rangitata catchment the relief is very high, so although the flat areas are very wide (5 km), one still gets an impression of being in a valley. If the surrounding relief had been only low hills then this area perhaps would not be conceptualised as a valley. This problem could be solved with context dependent definitions that take the relative relief into account, but this makes the process more complicated.

As with the components discussed in chapter 4, the use of a 3000m search radius for determining the spatial influence of different components can also be questioned. There has been no cognitive research that can be used for determining what spatial influence different components of the landscape have on people's conceptualisation of the landscape. One could argue that this figure should not be constant for landforms. Some components, such as high mountains, have more spatial influence than other components, such as low hills. The use of context dependent search radii could also be incorporated into the process.
 
 

5.6 Summary

Automating landform classification is an interesting challenge. It produces classifications that have a good resemblance to those from manual methods, and because definitions are explicit they can be easily identified, questioned, and improved. This has been demonstrated with Dikau et al.'s (1991) process. Several problems were encountered when applying it to the study area: it produced a progressive zonation when landform changes from plains to mountains; it did not distinguish open valleys from a plains-mountain interface; and it was affected by micro relief. Also, the same slope threshold was used as Hammond's even though slope was measured differently. Although automating existing quantitative manual processes is an important step in the evolution of automation, definitions may need to be calibrated. This is the case with slope measurements. The effects of scale and generalisation also need special attention.

Dikau et al.'s (1991) process can be improved by adopting a 4% slope threshold, removing non-macro relief, identifying open valleys using an expand/shrink sequence, using different relative relief classes, and using spatial influence information for each component to identify landform compositions. A new process has been developed that adopts these improvements. There are opportunities for improving the process further with the use of more context dependent definitions, and the identification of particular distinctive landforms such as conical volcanoes.