CHAPTER 4


4.1 Introduction

This chapter outlines the development of a process for automatically classifying landcover for use in a landscape classification. For the purpose of this study, landcover consists of vegetation, naturalness and water. The classification of these three attributes is addressed separately. Although they are related (for instance, exotic vegetation affects the naturalness of an area), this separation is necessary to simplify the task. Most of the components that contribute to landcover are already conceptualised in the digital databases available in New Zealand, whereas landform components are not. For this reason the classification of landcover is relatively simple compared to the classification of landform and is therefore presented first. All figures and tables are placed at the end of the chapter.
 
 

4.2 Vegetation
 
 

4.2.1 Past research

Virtually all manual landscape classifications that have used the physical landscape components have used vegetation as an attribute. The vegetation classes have been based on major differences in vegetation form. Classes such as grassland, scrub, and forest, and classes that are compositions of these forms, have been commonly used (e.g. Linton, 1970, and Auckland Regional Authority, 1984). These classes are similar to those used in Raunkiaer's life form classification (cit. in Tansley, 1946). The plant taxonomy used by botanists and ecologists has not been used because it is based on plant evolution rather than outward appearance.

The use of GIS for incorporating vegetation information within landscape studies is becoming increasingly common. Lesslie et al. (1988) and Kliskey and Kearsley (1993) both incorporated vegetation within their wilderness identification processes, but only used the distinction between exotic and natural vegetation. Bird et al. (1994) used GIS to monitor landscape change and included many different vegetation classes. The classes were manually derived from aerial photos and GIS was used for analysing change. In New Zealand there has not been any attempt to derive suitable vegetation classes for a landscape classification automatically from existing databases.

The automatic classification of vegetation using remote sensing has been widely researched (Leckie, 1990), and the results appear promising for use within landscape studies. DOSLI has completed a pilot project that successfully mapped broad vegetation classes using Landsat images (Dept. of Survey and Land Information, 1994). The use of remote sensing techniques will not be investigated in this study because remote sensing is concerned with creating NDDB, while this study is concerned with using NDDB.
 
 

4.2.2 Suitable databases

Landcare has produced a digital vegetation map of New Zealand (Newsome, 1995), and vegetation information is also included in their LRI. The Ministry of Forestry has produced a coverage of indigenous and exotic forests, and DOSLI, as mentioned previously, has experimented with the use of Landsat images to produce a landcover map of the central North Island. Landcare's vegetation database is currently the most suitable for use in a landscape classification. It has nationwide coverage with 47 different vegetation classes. It was derived from the LRI and from field work, but is slightly dated since it was based on field work from 1981-1987. Newsome (1995) notes that the accuracy of this database is acceptable at the scale of mapping, which was 1:1,000,000, but cautions that the exotic forests and pasture-scrubland classes have changed since it was published. In contrast, DOSLI's landcover database, which is being derived from Landsat images, uses a base map at a scale of 1:50,000. However, it has only 20 different classes and has only been completed for the central North Island. The Ministry of Forestry data sets are nationwide and were developed from base maps at a scale of 1:250,000, but only record the presence of two classes, exotic forest and indigenous forest.

For this study Landcare's vegetation database (Newsome, 1987) was updated using the Ministry of Forestry's exotic forest database. If Landcare's database was not recording exotic forest in a certain area and the Ministry of Forestry's was, then Landcare's database was changed to exotic forest, otherwise it did not change. These two databases were used because they were available for the study area, and also because they identify classes that are necessary for a landscape classification. The age of Landcare's database was not considered a serious drawback in this study, where the primary concern is the classification process. Once a process has been developed, it can be easily applied to current databases when they become available.
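The update rule described above can be sketched as a cell-by-cell conditional overlay. This is a minimal illustration only, not the procedure actually run in ARC/INFO. It assumes both databases have already been rasterised to the same grid, that F9 is the code used for exotic forest, and that a cell coded GF4, FS8 or F9 counts as "already recording exotic forest"; the function and variable names are hypothetical.

```python
# Assumed codes: F9 stands for exotic forest, and GF4/FS8/F9 are treated as
# classes that already record exotic forest. Both are assumptions.
EXOTIC_FOREST = "F9"
ALREADY_EXOTIC = {"GF4", "FS8", "F9"}

def update_with_exotic_forest(landcare, mof_exotic):
    """Recode a Landcare cell to exotic forest only where the Ministry of
    Forestry grid records exotic forest and the Landcare grid does not."""
    return [
        [
            EXOTIC_FOREST if mof and lc not in ALREADY_EXOTIC else lc
            for lc, mof in zip(lc_row, mof_row)
        ]
        for lc_row, mof_row in zip(landcare, mof_exotic)
    ]

# Toy 2 x 2 grids (invented values, for illustration only).
landcare = [["G1", "F1"], ["GF4", "S1"]]
mof_exotic = [[True, False], [True, False]]
updated = update_with_exotic_forest(landcare, mof_exotic)
```

In this toy example only the pasture cell (G1) is recoded; the GF4 cell is left unchanged because it already records exotic forest.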
 
4.2.3 Classification process

Landcare's vegetation database contains 47 classes, which are listed in Figure 4.1. Newsome (1987) describes precisely what these classes are. This database is provided in vector format and one of the first tasks was to convert it to a raster format with a cell size of 500m. Very little spatial accuracy is lost during this conversion because the minimum size polygon of the vector coverage is 500 ha which is considerably larger than the cell size used. To preserve the vegetation class information in the grid coverage, this attribute had to be represented by integers before being converted.
 
 

Some of Landcare's classes are too detailed to be used in a landscape classification and so were generalised. It is doubtful whether the general public perceive the difference between a lowland podocarp-broadleaved forest and a highland podocarp-broadleaved forest from a distance. To most people these would be just indigenous forests. Even if people could perceive the difference, it is still doubtful whether this distinction would be significant in determining landscape quality. Twelve groups were created from Landcare's original classes. These are listed below, along with Landcare's classes that constitute each group.

1 Horticulture (C1, C2)

2 Pasture (G1, G2, GS1, GS2, GS3, GS6, GF1, GF2, GF3, GF4)

3 Tussock grassland (G3, G4, G5, G6, GS4, GS5, GS7, GS8, GF5, GF6)

4 Lowland indigenous scrub (GS1, GS2, GS3, S1, S2, FS1, FS2, FS3, FS4, FS5, FS6, FS8, M4)

5 Exotic scrub (GS6, S4, FS8)

6 Alpine scrub (GS4, GS5, GS7, GS8, S3, FS7, M4)

7 Indigenous forest (GF1, GF2, GF3, GF5, GF6, FS1, FS2, FS3, FS4, FS5, FS6, FS7, F1, F2, F3, F4, F5, F6, F7, F8)

8 Exotic forest (GF4, FS8, F9)

9 Alpine herbfields, rock, and ice (M1)

10 Wetland (M2)

11 Sanddune (M3)

12 Vegetation not significant (Urban areas, lakes, and large rivers)

This level of generalisation was selected to ensure that important vegetation groups were included. These groups form the basis of the twelve components of the vegetation attribute; they distinguish major changes in the form or colour of the vegetation, and whether it is native to New Zealand. The groups are based predominantly on the author's knowledge of New Zealand's vegetation, along with information from Newsome (1987). Newsome (1987) also groups the classes, as shown in Figure 4.1. However, these groups were not used because they do not distinguish between exotic and native vegetation, nor between tussock and lowland pasture. If the original vegetation database had contained information on the form, colour, and naturalness of each class then this generalisation could have been implemented automatically. It will become apparent that when different compositions of these groups are considered and these are combined with other landscape attributes, the classification becomes quite detailed.

The list above shows that some of Landcare's classes have been included in more than one group. For example, pasture exotic forest (GF4) is included in group 2 (pasture) and group 8 (exotic forest). It is necessary to do this to establish the presence or absence of each group, and it does not matter whether these groups spatially overlap. A separate grid coverage was made for each group, and these are illustrated in Figure 4.2 and Figure 4.3.
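The way one Landcare class can contribute to several group grids can be sketched as follows. This is a minimal illustration using two of the twelve groups and a toy class grid; the dictionary layout and function name are hypothetical, and the values 100 and 0 anticipate the presence/absence coding used in the next step.

```python
# Two of the twelve groups, with memberships taken from the list above.
GROUPS = {
    "pasture": {"G1", "G2", "GS1", "GS2", "GS3", "GS6",
                "GF1", "GF2", "GF3", "GF4"},
    "exotic_forest": {"GF4", "FS8", "F9"},
}

def presence_grid(class_grid, members):
    """Return a grid holding 100 where the cell's class belongs to the
    group and 0 elsewhere."""
    return [[100 if c in members else 0 for c in row] for row in class_grid]

# Toy class grid (invented values): GF4 belongs to both groups.
class_grid = [["GF4", "G1"], ["F9", "F1"]]
pasture = presence_grid(class_grid, GROUPS["pasture"])
exotic_forest = presence_grid(class_grid, GROUPS["exotic_forest"])
```

Because each group has its own grid, the GF4 cell is marked present in both the pasture grid and the exotic forest grid without any conflict.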

Altitude information was used to allocate occurrences of some Landcare classes to the groupings. M4 (Pakihi heathland communities) can exist in a (sub) alpine environment and a lowland environment (Newsome, 1987). To know what group to assign different areas of this class, it was necessary to use altitude information, which can be easily implemented with GIS. A threshold of 500m was used to assign this class to either lowland scrub or (sub) alpine scrub. Landcare's databases also contain a class that consists of ice, snow, scree, and sand. Sand dunes were distinguished from this class using an altitude threshold of 200m.
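The two altitude rules can be written as simple conditionals. This is a sketch under stated assumptions: the text gives the thresholds (500m and 200m) but not how cells lying exactly on a threshold were treated, nor explicitly that sand dunes are the low-altitude part of the mixed class, so both choices below are assumptions. The function names are hypothetical.

```python
def assign_m4(altitude_m, threshold_m=500):
    """Assign an M4 cell to a scrub group by altitude (metres).
    Treating a cell exactly at the threshold as alpine is an assumption."""
    if altitude_m >= threshold_m:
        return "alpine scrub"
    return "lowland indigenous scrub"

def is_sand_dune(altitude_m, threshold_m=200):
    """Treat a cell of the ice/snow/scree/sand class as sand dune when it
    lies below the threshold (assumed direction of the rule)."""
    return altitude_m < threshold_m

low_m4 = assign_m4(120)
high_m4 = assign_m4(900)
```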

Once the 12 single theme vegetation component grids had been derived, vegetation compositions were then determined. This required a series of steps. Considering each vegetation component grid separately, the value 100 was assigned to the cells where the component was present, and the value zero to the cells where it was absent. A focal mean function with a neighbourhood analysis window (NAW) radius of 3000m was passed over each grid. The resulting mean values give the percentage of the NAW that contains the vegetation component. This in effect describes the spatial influence of each vegetation component for each cell. The rationale for a 3000m NAW will be discussed in section 4.2.4 along with the effects of other NAW radii. The results were classified (based on critical thresholds) into four levels of spatial influence and are shown in Figure 4.4 and Figure 4.5. The threshold levels 0%, 20%, and 50% were used since 0% indicates presence/absence, 20% seems an appropriate minimum presence (this is discussed in section 4.2.5), and 50% indicates a majority. It was not necessary to determine the spatial influence of the 12th class because urban areas and water are classified later.
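The focal mean and the four-level classification can be sketched in plain Python as below. This is an illustration of the idea rather than the ARC/INFO GRID function itself. It assumes a circular window measured between cell centres, integer distances in metres, that cells falling outside the grid are simply ignored, and that the class boundaries are closed on the lower side; none of these details is stated in the text. All names are hypothetical.

```python
def circular_offsets(radius_m, cell_m):
    """Row/column offsets of cells whose centres lie within radius_m."""
    reach = int(radius_m // cell_m)
    return [
        (dr, dc)
        for dr in range(-reach, reach + 1)
        for dc in range(-reach, reach + 1)
        if (dr * dr + dc * dc) * cell_m ** 2 <= radius_m ** 2
    ]

def focal_mean(grid, radius_m, cell_m):
    """Mean of the values inside the circular window around each cell.
    Window cells outside the grid are ignored (an assumption)."""
    rows, cols = len(grid), len(grid[0])
    offsets = circular_offsets(radius_m, cell_m)
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [
                grid[r + dr][c + dc]
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            out[r][c] = sum(vals) / len(vals)
    return out

def influence_level(pct):
    """Four levels from the 0%, 20% and 50% thresholds:
    0 = absent, 1 = above 0 and below 20, 2 = 20 to below 50, 3 = 50 or more."""
    if pct <= 0:
        return 0
    if pct < 20:
        return 1
    if pct < 50:
        return 2
    return 3

# Toy presence grid (100 = present, 0 = absent) with a one-cell radius.
presence = [
    [0, 0, 0],
    [0, 100, 0],
    [0, 0, 0],
]
means = focal_mean(presence, radius_m=500, cell_m=500)
levels = [[influence_level(v) for v in row] for row in means]
```

With a 3000m radius and 500m cells the same functions apply unchanged; only the number of offsets in the window grows.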

These 11 spatial influence grids were then overlaid to produce a grid that contained the unique combinations of these grids. The last map in Figure 4.5 shows a vector representation of this combined grid. There were 360 unique combinations identified in the study area. If all the possible combinations had been present in the study area this would have totalled 4,194,304. If the spatial influence grids had been classified into more than four levels, then the process would have the potential to identify even more combinations. For example, if each coverage had contained five levels then there is the potential for 48,828,125 unique combinations to be identified. It was necessary to keep the number of levels to around four since the software appears to have a limit of about 10,000,000 potentially unique combinations.
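The overlay step and the combination counts quoted above can be illustrated as follows. The sketch assigns an identifier to each distinct tuple of levels as it is first encountered, which is one simple way to obtain unique combinations; it is not claimed to be how the GIS software does this internally. The function name and the toy grids are hypothetical.

```python
def combine(level_grids):
    """Overlay several level grids. Returns a grid of combination ids and
    a lookup table from each tuple of levels to its id."""
    rows, cols = len(level_grids[0]), len(level_grids[0][0])
    ids = {}
    combined = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            key = tuple(g[r][c] for g in level_grids)
            combined[r][c] = ids.setdefault(key, len(ids) + 1)
    return combined, ids

# Two toy level grids (invented values).
forest = [[0, 1], [0, 1]]
tussock = [[3, 3], [0, 3]]
combined, lookup = combine([forest, tussock])

# Potential number of combinations for 11 grids with 4 or 5 levels each.
potential_four_levels = 4 ** 11
potential_five_levels = 5 ** 11
```

The two potential totals reproduce the figures given in the text: 4,194,304 for four levels and 48,828,125 for five.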
 
 

This combined grid still contains information on the spatial influence of the 11 vegetation components. Therefore, vegetation compositions can be identified by querying it. For example, an indigenous forest-tussock composition is common in New Zealand. This class can be identified by querying for areas where there is a spatial influence of both indigenous forest and tussock. Forty-six different vegetation classes (not all are compositions) were identified in this way. These are listed in Table 4.1 under level 1. The definitions used for identifying each class are given in Appendix 1, and were implemented using ARCPLOT. These definitions generalise the large number of possible compositions to a manageable size by using a relatively complex set of rules based on many different attribute values. Not all these vegetation classes existed in the study area. The resulting vegetation classification is shown in Figure 4.6.

Many different classes could have been identified using the spatial influence information. The classes identified in Table 4.1 under level 1 have been chosen because they reflect major differences in appearance (form and colour), naturalness, and contentiousness. Some classes, such as wetlands, are contentious in landuse planning. What is important about this process is not the actual classes identified but the fact that it demonstrates that vegetation compositions (associations) can be expressed. As our understanding of landscapes improves and a substantive rationale develops for the importance of different classes then the above classes can be revised using other explicit definitions.

By using ARCPLOT to select compositions based on a set of definitions, it is possible to list the compositions that have not been accounted for. Depending on the remaining compositions, the definitions were either altered so that the compositions were included or a definition for a new class was developed if this was considered appropriate. It was also possible to check that the definitions were mutually exclusive by counting the number of compositions selected for each class. If the total number selected was greater than the number of compositions available, then some compositions were selected twice and therefore the definitions overlapped and alterations were needed. A check was also made to ensure all areas were selected.
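The two checks described above (that every composition is selected, and that no composition is selected by more than one definition) can be sketched as a single pass over the table of unique combinations. This is an illustration only: the text describes counting selections in ARCPLOT, whereas the sketch tests each combination directly. The three definitions and the level values are invented for the example and are not those of Appendix 1.

```python
def check_definitions(combinations, definitions):
    """Return the combinations matched by no definition and those matched
    by more than one definition."""
    unassigned, overlapping = [], []
    for combo in combinations:
        matches = [name for name, rule in definitions.items() if rule(combo)]
        if not matches:
            unassigned.append(combo)
        elif len(matches) > 1:
            overlapping.append((combo, matches))
    return unassigned, overlapping

# Hypothetical definitions over spatial influence levels (0 to 3).
definitions = {
    "forest": lambda c: c["forest"] >= 2 and c["tussock"] < 2,
    "forest-tussock": lambda c: c["forest"] >= 2 and c["tussock"] >= 2,
    "tussock": lambda c: c["forest"] < 2 and c["tussock"] >= 2,
}

combinations = [
    {"forest": 3, "tussock": 0},
    {"forest": 2, "tussock": 2},
    {"forest": 0, "tussock": 3},
    {"forest": 1, "tussock": 1},
]
unassigned, overlapping = check_definitions(combinations, definitions)
```

Here the last combination is left unassigned, which signals that either an existing definition should be widened or a new class defined, as described in the text.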

In theory it is not necessary to combine the grids and then query the attribute table of the combined grid. It is possible to use a series of conditional statements that query each individual component grid; however, this would be a slower process, and it would have been difficult to check that the definitions were mutually exclusive. With ARC/INFO the quickest method is to use ARCPLOT to query the attribute table of the combined grid.

This vegetation classification can be further generalised by grouping different classes. This was done to produce six different levels of generalisation. The way the different classes were grouped is shown in Table 4.1. Figure 4.7 shows graphically the effect of different levels of generalisation. No keys are provided with this figure to avoid crowding, but the colours are the same as those used in Figure 4.6 and the keys can be ascertained by using this and Table 4.1. Such generalisation is important for reasons discussed in section 2.9. This will become even more apparent when the different landscape attributes are combined to produce a landscape classification.

Why were six levels of generalisation developed and what rationale is there for the different classes within each level? The fact that there are six levels is not particularly important. What is important is that different levels of generalisation can be easily expressed. Perhaps it will become apparent if the classification is used in landscape research which of the levels are important. The classes used for levels 2-6 were chosen for similar reasons as the level 1 classes. They reflect important differences in appearance, naturalness, and contentiousness, but as the levels become more general these reasons need to be more apparent.

The generalisation process used to identify the six levels uses relatively simple conditional rules based on the existence of one attribute. This is kept simple so that a hierarchical structure is produced whereby the relationships between generalisation levels can be easily interpreted. Information on classes at the general levels can be applied to classes that are at more detailed levels because the classes feed into each other - many to one going from detailed to general. Complex conditional rules based on many different attribute values would not have been appropriate because the links between levels would have also been complex - many to many.
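The many-to-one structure can be pictured as a chain of lookup tables, one per step between levels. The class names below are invented for illustration and do not reproduce Table 4.1; only three levels are shown.

```python
# Hypothetical lookup tables: each maps a class to its parent class at the
# next, more general level (many to one).
LEVEL_MAPS = [
    {   # level 1 -> level 2
        "indigenous forest": "forest",
        "exotic forest": "forest",
        "pasture": "grassland",
        "tussock grassland": "grassland",
    },
    {   # level 2 -> level 3
        "forest": "woody vegetation",
        "grassland": "herbaceous vegetation",
    },
]

def generalise(level1_class, to_level):
    """Follow the chain of lookup tables from level 1 up to to_level."""
    cls = level1_class
    for mapping in LEVEL_MAPS[: to_level - 1]:
        cls = mapping[cls]
    return cls

pasture_level2 = generalise("pasture", 2)
exotic_level3 = generalise("exotic forest", 3)
```

Because every class has exactly one parent, anything known about a general class applies to all the detailed classes beneath it, which is the property the text relies on.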
 
 

4.2.4 The neighbourhood analysis window (NAW)

The classification process described above used a 3000m radius NAW to determine the spatial influence of different vegetation components. This radius was selected after a careful investigation of the effects of a range of radii from 1000m to 5000m. Figure 4.8 compares the effect of three different search radii - 1000m, 3000m, and 5000m (the key is the same as for Figure 4.6). The amount of agreement (percentage of area with the same class) between the 1000m and 5000m search radii is low, at 61%. When the search radius is small, fewer compositions and many small discrete areas are identified. With a search radius of 5000m there is a lot more generalisation, a few large discrete areas are identified, and many of these are classified as compositions. It is difficult to know which search radius is more appropriate. If a large search radius is used then areas that are far away are being used to classify the focal area. This is not appropriate if those areas are too far away.
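The agreement measure used here, the percentage of area assigned the same class by two classifications, reduces to a cell count when both grids share the same cell size. A minimal sketch, with hypothetical names and toy grids:

```python
def percent_agreement(grid_a, grid_b):
    """Percentage of cells with the same class in both grids. With equal
    cell sizes this equals the percentage of area in agreement."""
    total = same = 0
    for row_a, row_b in zip(grid_a, grid_b):
        for a, b in zip(row_a, row_b):
            total += 1
            same += a == b
    return 100.0 * same / total

# Toy classifications (invented values) that differ in one cell of four.
small_radius = [[1, 2], [3, 4]]
large_radius = [[1, 2], [3, 5]]
agreement = percent_agreement(small_radius, large_radius)
```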

A 3000m neighbourhood search radius is large enough to go beyond small hills, but not so large as to require considerable amounts of processing when the resolution of the raster coverage is 500m. The search radius should be related to how people view and experience landscapes. However, sufficient cognitive research on this is not available. People can often see for more than 3000m but how much detail is perceived beyond this distance? Does the foreground of a view have more impact than the background? If so, by how much? To address such questions, it would be useful to know more about how people experience landscape. It is probably highly variable, and is dependent not only on the person but also on the situation. Discussion and empirical research are required. For the time being, different landscape classifications can be created using different search radii, and the variability in the results can be presented. Variability can also be represented using fuzzy set theory, and an application of this will be presented in section 5.4.2 with regard to landforms.

It should be noted that the search radius is measured as horizontal distance, and does not incorporate ground distance, which also has a vertical component. This is because a two dimensional grid is used to represent a three dimensional surface. The effect is that, in terms of ground distance, the neighbourhood extent is greater in hilly and mountainous areas. This may not be appropriate because in such areas topography can reduce the amount of movement or exploration. However, one mountain top may be easily viewed from another mountain top even though the ground distance between the two may be considerable because of a deep valley in between.

Annuluses could also be used for determining the shape of the NAW. The annulus shape consists of one smaller circle within a larger circle (donut shape). Cells that fall outside the radius of the smaller circle but inside the radius of the larger circle will be included in the processing of the neighbourhood. An annulus would enable the spatial influence of components to be specified for a range of distances. It would be possible to quantify the spatial influence of components for different degrees of proximity - close, medium distance, and far away. This information could then be used for developing complex definitions for different landscape classes. Not only can annuluses be used, but wedges can also be specified that control the aspect of the NAW, e.g. 0-90 degrees. This would provide even more opportunity for specifying the exact nature of the spatial influence of different components. The way different landscape components are composed could then be quantified. The NAW can also be weighted by using a kernel. This kernel could enable an appropriate distance decay function to be specified for each landscape component, which could then be incorporated in the spatial influence calculations. The problem with using these GIS features, which are available with ARC/INFO, is that it is not known which complex compositions are important, or which distance decay functions should be used for the different components. Therefore, these features will not be used in this study.
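Although these options are not used in this study, the annulus and the distance decay kernel are easy to picture in code. The sketch below is illustrative only: the annulus is taken to exclude cells exactly on the inner radius and include those on the outer radius, and the linear decay function is an arbitrary choice, since the text stresses that the appropriate function is not known. Wedges are omitted, distances are assumed to be integer metres, and all names are hypothetical.

```python
def annulus_offsets(inner_m, outer_m, cell_m):
    """Offsets of cells whose centres lie beyond inner_m and within
    outer_m of the focal cell."""
    reach = int(outer_m // cell_m)
    return [
        (dr, dc)
        for dr in range(-reach, reach + 1)
        for dc in range(-reach, reach + 1)
        if inner_m ** 2 < (dr * dr + dc * dc) * cell_m ** 2 <= outer_m ** 2
    ]

def linear_decay(distance_m, radius_m):
    """Hypothetical kernel weight falling linearly from 1 at the focal
    cell to 0 at the edge of the window."""
    return max(0.0, 1.0 - distance_m / radius_m)

# Ring between 500m and 1000m on a 500m grid.
ring = annulus_offsets(500, 1000, 500)
mid_weight = linear_decay(1500, 3000)
```

Computing a focal mean over several such rings (close, medium distance, far away) would give one spatial influence value per ring and per component.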

4.2.5 What is a significant amount of spatial influence?

When determining definitions for vegetation compositions, it was necessary to specify what amount of spatial influence of a particular component is significant. For example, if 1% of a neighbourhood is grass and 99% is forest, should this be called "forest", "grassland-forest", or "forest with a small amount of grass"? In this circumstance it was considered that "forest" was appropriate, because the influence of grass was not significant enough. For most of the class definitions a 20% threshold was used for the important vegetation components. It is difficult to know whether this is appropriate as it depends on how people perceive landscapes, and there is no substantive research on this. Thresholds other than 20% were also tested, and Figure 4.9 shows the results. The difference in the outcome is significant. The amount of agreement between the 10% threshold and the 30% threshold is only 63%. When a low threshold is used there is a high mix of vegetation components, and not many "pure classes" are identified, while with a high threshold there are more "pure classes".

Some components dominate over other components. For example, all things being equal, forest dominates landscapes more than grass. Therefore, the thresholds used in the definitions are related to the components being used. It will also be noticed from the definitions in Appendix 1 that the "or" statement is used. This is so that a range of combinations can be considered.
 
 

4.2.6 Sensitivity to cell size

The cell size greatly affects the processing speed. If the cell size is halved then the number of cells in the grid coverage increases by a factor of four, and therefore any operation that processes each cell will take much longer. However, there is more to it than that. When a neighbourhood function is used, the NAW of each cell needs to be analysed. If the NAW is set at a certain distance in metres, then the number of cells within the radius increases as the cell size is reduced. For example, with a NAW of 3000m and a cell size of 500m, there will be approximately 110 cells within that radius that will need to be processed. If the cell size were reduced to 200m then there would be approximately 700 cells that would need to be processed. Therefore, reducing the cell size not only increases the number of focal cells but also increases the number of neighbouring cells.
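The approximate cell counts quoted above can be checked by counting the cell centres that fall within the radius. The sketch assumes a circular window measured between cell centres and integer distances in metres; the function name is hypothetical.

```python
def cells_in_naw(radius_m, cell_m):
    """Number of cells whose centres lie within radius_m of the focal
    cell, including the focal cell itself."""
    reach = int(radius_m // cell_m)
    return sum(
        1
        for dr in range(-reach, reach + 1)
        for dc in range(-reach, reach + 1)
        if (dr * dr + dc * dc) * cell_m ** 2 <= radius_m ** 2
    )

cells_500m = cells_in_naw(3000, 500)
cells_200m = cells_in_naw(3000, 200)
```

Under these assumptions the counts are 113 and 709, in line with the figures of approximately 110 and 700 given in the text. Moving from 500m to 200m cells therefore multiplies the work per focal cell by about six, on top of a 6.25-fold increase in the number of focal cells.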

Cell size also has an effect on the spatial accuracy of the boundary of the classes. With larger cell sizes this will be less accurate. Also, when a vector coverage is converted to a grid then small objects may be lost if the cell size is too large.

To see how sensitive the classification is to cell size, a range of different cell sizes was experimented with. Figure 4.10 shows the results. There is not a significant difference between the use of 300, 500, and 700m cell sizes. The agreement between the use of 300 and 700m cell sizes is 95%. There is a difference in the coarseness of the boundaries of the classes. This is difficult to see at the scale used in Figure 4.10, but it is obviously coarser with the larger cell size. The variation resulting from different cell sizes is not great because the minimum size polygon in Landcare's vegetation database is 500 ha. The cell sizes are significantly less than this (9, 25, and 49 ha) so very little detail is lost during vector to raster conversion. It should be noted that the search radius was held constant. Different cell sizes affect the speed of the processing quite significantly. With a cell size of 500m the process can be completed in a couple of hours, but with a cell size of 300m approximately eight hours is required.
 

4.3 Naturalness
 

4.3.1 Introduction

Although people are familiar with the concept of naturalness, it is a difficult concept to define. In this thesis it relates to the degree of development or cultural influence in a landscape. It is chiefly concerned with the amount of cultural modification of the surface cover. Naturalness therefore spans a spectrum from very unnatural landscapes, such as urban environments, to untouched landscapes, such as wilderness areas. Whether a landscape is natural or not depends on how different aspects of human modification are perceived. Is an exotic forest perceived as natural and is this more natural than an agricultural landscape? What is natural depends very much on the individual and therefore requires public perception studies to ascertain this information scientifically.

For landscape classification it is not actually necessary to rank the naturalness of different areas, as a nominal classification is sufficient. Areas that are similar in naturalness need to be grouped together. This can be done by classifying naturalness character rather than ranking naturalness. Areas that are similar in terms of human modification need to be identified. It is possible to do this using a range of parameters as will be demonstrated in this chapter.
 
 

4.3.2 Past research

Naturalness has been a common attribute used in landscape studies, although a range of different approaches has been used to define it. Bennett (1985) ranked naturalness for different areal units using a score of one to five, and this tended to be an intuitive procedure using field observations, rather than using explicit guidelines. Linton (1970) incorporated naturalness with vegetation and landuse. He used fairly broad classes - urbanized and industrial, farmland, and wild landscapes. The Manchester study (cit. in Countryside Commission, 1988) incorporated a whole array of man-made components, such as towns and villages, railways, roads, power lines, and buildings.

The application of GIS for classifying naturalness is not new; however, it appears that most initiatives have focused on the wilderness end of the naturalness spectrum (Lesslie et al., 1988; Kliskey and Kearsley, 1993). This study will attempt to identify the whole range of the naturalness spectrum, from urban and rural areas to remote areas. There does not appear to have been any published research that investigates the use of GIS for automatically identifying this range of landscapes. The research on wilderness identification does provide a starting point from which a process can be developed.

Lesslie et al. (1988) used four indicators to identify wilderness. These were:

1) Remoteness from settlement,

2) Remoteness from access,

3) Aesthetic naturalness (free from structures), and

4) Biophysical naturalness.

Apart from biophysical naturalness, these indicators were obtained by using GIS to measure the nearest distance from each cell, in a raster representation, to various human entities, such as settlements, roads, structures, and logging operations etc. These distance measurements were then used to derive the different indicators. These indicators were then classed and weighted before being combined to ascertain wilderness quality.

In this study, elements of biophysical naturalness have been classified under vegetation, and therefore it is not necessary to include this again in a naturalness classification. It is also not necessary for a landscape character classification to specify quality, but instead it should distinguish character that may explain differences in quality.

Kliskey and Kearsley (1993) used indicators similar to those of Lesslie et al. (structures, access, vegetation, and use levels). These indicators were mostly obtained using buffers around different unnatural entities in vector coverages. These buffer coverages were then overlaid and "wilderness purism" scores calculated.

As previously discussed in section 3.4.2, the use of neighbourhood mean functions is more appropriate than the nearest distance calculations used by Lesslie et al. and is also more appropriate than the buffer functions used by Kliskey and Kearsley (1993).
 
 

4.3.3 The automated process

The automated process developed in this study identifies 22 different classes of naturalness. These are listed in Table 4.2 under level 1. The major problem with classifying naturalness is that there is a lack of information on how people conceptualise naturalness. There is a certain amount of information at the wilderness end of the naturalness spectrum (Stankey and Schreyer, 1987, and Kliskey and Kearsley, 1993), but not at the other end. Although urban areas are already conceptualised in topographical maps and in NDDBs, this has not been based on how the public conceptualise urban areas. The intermediary classes between urban and wilderness have also not been explicitly conceptualised. Common terms, such as "rural", "town", and "settlement", give some clues to how development in the countryside is conceptualised; however, they can be quite vague. The classes used by the ARA (1984) also help.

A clue that can be used for deciding upon different classes is the amount of contention that exists over different development initiatives in the countryside. In well settled areas such as the Canterbury Plains, people are relatively unconcerned about the impact of new roads or buildings on the landscape; however, in undeveloped areas they are concerned (for example the Bealey Hotel in Arthur's Pass). The quality of the landscape is more sensitive to subtle changes at the natural end than at the developed end of the naturalness spectrum. This implies that more classes are required at the more natural end, which is how the classes in this study have been organised.

The availability of information in digital databases also affects which naturalness classes can be identified. The databases used in this study for identifying naturalness classes were:

262 (1:250,000) topographic database (DOSLI),

Digital Chart of the World (DCW) (ESRI), and

Supermap2 (Statistics New Zealand).

Access was not available to DOSLI's 1:50,000 topographic database because of cost and also because it had not yet been fully developed for the study area. It is possible to speculate on the use of additional information from this source.

The main database used was the 262 topographical database. When one looks at a hardcopy of a 1:250,000 topographic map, it is possible to assess naturalness in different areas based on the number of roads, structures, railways, pylons, urban areas, etc. The method used in this study attempts to simulate this assessment by using a method similar to that used for classifying vegetation. Here, 15 single component layers were obtained from the three vector databases listed above and converted to raster coverages (500m cell size) with a value of 100 assigned to areas where the components are present, and zero where they are absent. The spatial influence of the components was then expressed using a focal mean function. Figures 4.11, 4.12 and 4.13 show the spatial influence classes of the different components and the extent of the actual components themselves. The different neighbourhood search radii used for each component are stated in these figures, as well as the different class intervals used. It should be noted that the search radii and class intervals are not the same for each component. The reason for this will become apparent later. The spatial influence classes were then used as parameters for defining naturalness classes.

Urban areas were identified from the secondary roads labelled "urban" in the DOSLI topographic database. A focal mean function with a NAW of 3000m was used to determine the spatial influence of these roads. Urban areas were defined as areas where this spatial influence was greater than 10% (refer to Appendix 2). This was considered the best approach for identifying urban areas because it was consistent. The DOSLI topographic database does not contain a polygon coverage of urban areas. The DCW contains an urban area layer, and the Ministry of Forestry have also digitised an urban layer, but these were not considered as consistent as DOSLI's "urban roads" layer.

The towns, large settlements, and small settlements, were identified by integrating the settlement layer of the DCW with the population data from Supermap2. With Supermap2 it is possible to select groups of meshblocks confined together to constitute a town or city. Each group is assigned a place name, and because the meshblocks are reasonably confined together they can be used as point information without having the problem associated with generalising over a large area. It was possible to automatically relate the DCW's location of places with Supermap2's population of towns or cities using place names. Some modification to the place names of the DCW coverage was required because it is necessary that these be spelt exactly the same as place names in Supermap2 in order for this transfer of data to work. Also, places were added to the DCW that were distinguished by Supermap2 as a town or city but were not present in the original DCW. Small settlements were places identified in the updated DCW but were too small to be distinguished by Supermap2. Large settlements were identified by Supermap2 as having a population less than 500, and a town had a population greater than or equal to 500 but was not identified as an urban area described previously.
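The name-based join and the population rule described above can be sketched as a dictionary lookup. The place names and populations below are invented for illustration, and the function and variable names are hypothetical; the exact-match lookup mirrors the requirement that place names be spelt identically in both sources.

```python
def classify_places(dcw_places, supermap_population, urban_places=()):
    """Classify each place in the (updated) DCW settlement layer using the
    Supermap2 population matched on an identically spelt place name."""
    classes = {}
    for name in dcw_places:
        if name in urban_places:
            classes[name] = "urban area"       # identified separately
            continue
        population = supermap_population.get(name)
        if population is None:
            classes[name] = "small settlement"  # not distinguished by Supermap2
        elif population < 500:
            classes[name] = "large settlement"
        else:
            classes[name] = "town"
    return classes

# Invented example data.
dcw_places = ["Place A", "Place B", "Place C", "Place D"]
supermap_population = {"Place A": 320, "Place B": 4100, "Place D": 60000}
place_classes = classify_places(
    dcw_places, supermap_population, urban_places={"Place D"}
)
```

A place whose name is spelt differently in the two sources would fall through to "small settlement" in this sketch, which is why the place names had to be reconciled before the transfer of data.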

The 11 other component layers were themes represented in DOSLI's topographic database. The combined roads were derived from national and provincial highways, and sealed and unsealed secondary roads. The secondary roads (sealed and unsealed) excluded forestry and urban secondary roads.

The walking track layer of DOSLI's topographical database was not used because it was too inconsistent. In some places it was detailed and contained the same information as the hard copy maps, and in other places it did not.

Once all 15 spatial influence layers had been derived and classified, they were overlaid. This resulted in 2139 unique combinations for the study area. A vector representation of this coverage is shown in Figure 4.13. Twenty-two naturalness classes were then derived by querying the attribute table of this combined coverage. The definitions are described in detail in Appendix 2, and the actual classes are listed in Table 4.2 under level one. Utility includes pylons and railways. Checks were made to ensure that the definitions were mutually exclusive and exhaustive as described in section 4.2.3. The result of this process for the study area is shown in Figure 4.14. It appears from this figure that a buffer function was used; however, this was not so. The classes identified have been chosen because of their importance in planning disputes. At the more natural end of the spectrum subtle changes in naturalness can be contentious, and therefore more classes are needed. The classes also reflect the information that was available which, as will be discussed in section 4.3.6, was deficient for identifying some classes.
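The overlay and the checks on the definitions can be sketched as follows. The three layers, their class codes, and the four class definitions are hypothetical stand-ins for the 15 layers and 22 classes of the study (the actual definitions are in Appendix 2); the only detail taken from the text is that utility covers pylons and railways.

```python
from collections import Counter

# Hypothetical classified spatial influence layers, one class code per cell.
layers = {
    "roads":   [0, 0, 1, 2, 2, 0],
    "pylons":  [0, 1, 1, 0, 0, 0],
    "railway": [0, 0, 0, 0, 1, 0],
}
names = list(layers)

# Overlay: each cell becomes a tuple of class codes; count the unique tuples.
combinations = Counter(zip(*(layers[name] for name in names)))


def has_utility(rec):
    return rec["pylons"] > 0 or rec["railway"] > 0


# Hypothetical class definitions expressed as queries on a combination.
definitions = {
    "natural": lambda rec: rec["roads"] == 0 and not has_utility(rec),
    "utility": lambda rec: rec["roads"] == 0 and has_utility(rec),
    "road": lambda rec: rec["roads"] > 0 and not has_utility(rec),
    "road with utility": lambda rec: rec["roads"] > 0 and has_utility(rec),
}


def assign_classes(combinations, names, definitions):
    """Assign one class per combination, failing if the definitions overlap
    (not mutually exclusive) or leave a combination unclassified (not exhaustive)."""
    assigned = {}
    for combo in combinations:
        record = dict(zip(names, combo))
        matches = [cls for cls, rule in definitions.items() if rule(record)]
        if len(matches) != 1:
            raise ValueError(f"{combo} matched {len(matches)} classes: {matches}")
        assigned[combo] = matches[0]
    return assigned


assigned = assign_classes(combinations, names, definitions)
```

Because the check runs over the unique combinations rather than over every cell, it remains cheap even when the overlay yields a few thousand combinations.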

This naturalness classification was then generalised by grouping classes. Table 4.2 shows how the classes were grouped for each of the six levels of generalisation developed. Figure 4.15 graphically illustrates the effect of this generalisation. No key is provided with this figure to avoid cramming, but the colours are the same as those used in Figure 4.14, and the key can be ascertained by using this and Table 4.2. Like level 1, the classes in levels 2-6 maintain more detail with the more natural classes because of the contentiousness at this end of the spectrum. However, at a very general level, even detail here is lost.

A few details in the naturalness classification process warrant further clarification. To identify reasonably developed areas in the countryside (which are classed as "developed rural" and "rural" in this study), the spatial influence of the combined road layer was used. This was considered the best indicator of this class, although these areas contain much more infrastructure than this. It is fairly safe to assume that where there is a lot of development there will be a high density of roads. The more intense the farming, the more activity there will be, and therefore the more access that will be required. For the study area a 10,000m radius NAW was considered the most appropriate for identifying the classes "developed rural" and "rural" because they are very general. This was decided upon after examining the results from a range of different search radii, from 3000m to 20,000m. The density of buildings could also have been used for identifying these classes as it shows a similar pattern to the density of roads. However, the structures layer is probably not as consistent as the road layers because of the difficulty in defining a structure. It was therefore considered less effective.

Not all the information that was available was used in the definitions. For example, although national highways and provincial highways were identified separately, they were grouped together in the definitions. After different options were considered, it was decided that deriving separate classes based on these components was not necessary. The difference in naturalness between national and provincial highways is too inconsistent to be of any use in a landscape study. For example, is there a significant difference in naturalness between the Lewis Pass road, which is a national highway, and the Arthur's Pass road, which is a provincial highway?

An outcome of this process which is perhaps undesirable is the very small slivers distinguished at some levels. For example, along the road across Arthur's Pass there are many small areas identified as "utility" and "highway", rather than "highway with utility". This is because the highway and pylon components do not occupy the same area. The pylons are often located a few hundred metres from the road. The spatial influence of these two components therefore differs, and it is not possible to allow for this in the definitions without adverse effects.
 

4.3.4 Cell size

This automated process for classifying naturalness is affected quite significantly by cell size. The reason for this is that the original vector coverages consist of lines and points. When these are converted to a raster image, each cell is generalised so that any cell that overlays a point or line will be assigned the attribute value of the line or point. When a vector coverage of roads is converted to a raster coverage with a cell size of 500m, the road is represented by 500m wide cells, although the road may in reality be only 20m wide. If a 20m cell size was used then the raster coverage would be closer to reality. One may think that using a 500m cell size will lead to major errors. However, since roads are likely to have a spatial influence of more than 500m this is not a serious problem for landscape classification. The difference in outcome between different cell sizes occurs when focal means are calculated. Consider an isolated straight road converted to a raster grid with a cell size of 500m, with the value 100 assigned to cells where the road is present and zero where it is absent, and the focal mean calculated from a surrounding (square) neighbourhood of 2500 hectares (10 X 10 cells). The focal mean of the cells where the road is present will be equal to the number of cells where roads are present in the neighbourhood, which will equal 10, times 100 (the value of these cells), divided by the total number of cells in the neighbourhood (100). This would be:

10 X 100 / 100 = 10

Now if a cell size of 20m was used and the neighbourhood extent stayed the same, the number of cells in the neighbourhood would be 62500 (250 X 250 cells), and the number of cells where roads are present within the neighbourhood would equal 250. The focal mean for cells where roads are present would therefore be:

250 X 100 / 62500 = 0.4

These focal means are used to define naturalness, and since these values change with cell size then the definition will also change with cell size. Thus, for a given NAW area, the naturalness definition is dependent on the cell size. This sensitivity to cell size could be reduced if the definitions were based on the actual area of the components (eg. hectares of highway), rather than the number of cells representing the components. However, this would require knowing the average area of a cell that a component occupies. This is difficult to ascertain.
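The two worked figures above can be reproduced, and extended to other cell sizes, with a short calculation. This is a sketch under the same simplifying assumptions as the example in the text: an isolated straight road, one cell wide, crossing a square neighbourhood of 5000m by 5000m (2500 hectares). The function name is hypothetical.

```python
def focal_mean_on_road(cell_size_m, window_side_m=5000, present_value=100):
    """Focal mean at a road cell for a straight, one-cell-wide road that
    crosses a square neighbourhood of the given side length."""
    cells_per_side = window_side_m // cell_size_m
    road_cells = cells_per_side          # the road crosses the window once
    total_cells = cells_per_side ** 2
    return road_cells * present_value / total_cells


for cell_size in (500, 250, 100, 20):
    print(cell_size, focal_mean_on_road(cell_size))
```

Under these assumptions the focal mean reduces to 100 times the cell size divided by the side length of the neighbourhood, so it falls in direct proportion to the cell size. A definition such as "focal mean greater than 10" that identifies the road at 500m would therefore never be met at 20m.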

4.3.5 The use of Supermap2

As mentioned in section 3.3, Supermap2 is a database of the census results, produced by Statistics New Zealand, and organised by meshblocks or areal units. Among other things, it is possible to use information in Supermap2 to get an impression of development in different areas. Supermap2 contains information on the number of dwellings and this can be subdivided by the type of dwelling - hotel, motel, house, etc. There is also information on the population of different meshblocks. Available as "clip-ons" to Supermap2 are the business directory databases, which contain information on the extent of different industries in different regions based on the number of employees. All this information could be used to develop an impression of the type of development that exists in different areas. It is possible to use Supermap2 to distinguish an area as very tourism orientated, based on the number of hotels or the number of employees working in hotels. However, Supermap2 has not been used significantly in this classification process because of problems relating to spatial accuracy.

A major problem with Supermap2 is that the information is organised by meshblocks whose sizes are related to population density. In towns and cities the meshblocks are quite small, while in rural areas they can be quite big. The spatial accuracy of the information is affected by the size of the meshblocks, and so in rural areas the spatial inaccuracy can be quite significant. In this study the emphasis is on rural areas; therefore, if Supermap2 data were used, significant error would arise. Figure 4.16 demonstrates this for population and dwelling numbers in the Mackenzie district. These are mapped as density in order to correct for different meshblock areas. Because of the large meshblocks, significant generalisation occurs, which can lead to misleading results. For example, along the road between Twizel and the Mount Cook village, most of the dwellings and population are close to the main road, yet the meshblocks extend well beyond the road. It would be inappropriate to generalise the effect of these dwellings and populations over the whole extent of the meshblocks. Furthermore, if the meshblock boundaries were organised differently, it is highly probable that the statistics in this figure would change significantly. This is commonly known as the modifiable areal unit problem, which was reviewed recently by Wrigley (1995).

Meshblock data are appropriate for classifying towns or cities because in these areas the meshblocks are more confined so less generalisation occurs. The process does this at a very general level by determining the population of different urban areas.

To distinguish agricultural and forestry landscapes in rural areas, it is better to use other parameters than meshblock statistics. Agricultural landscapes can be identified by vegetation and by the number of roads (which indicates activity), while forestry can be accurately identified by the presence of exotic plantations.
 
 

4.3.6 Information deficiencies

The process described above identifies as many important classes of naturalness as possible from existing databases. There are, however, some important classes that it has not been possible to identify. These are rural landscapes affected by tourism, mining, electricity generation, and other industries. If any of these industries are big enough at a particular location then perhaps they would be identified as a settlement or town; however, often they are not. As discussed in section 2.4, tourism development is a major contentious landscape issue, yet the extent of this industry is not mapped in rural areas. Some information may be ascertained from the 1:50,000 topographic database on the extent of ski fields, but not accommodation. The structures layer of the topographical databases only specifies that one or more buildings exist at a particular location, and it does not specify the type or size of the structure. The structure could be a small farm house or a hotel with 100 rooms.

Information on mining activities would be useful as this is also a contentious landscape issue in rural areas. Mining sites are mapped in topographical databases but the information is not detailed enough. It is only specified that at a particular point there is a mine. It is not possible to ascertain whether the mine is an underground mine, an opencast mine, or an old derelict mine overgrown with vegetation. It is therefore inappropriate to use this information for landscape classification as these different mines have significantly different impacts on the landscape. It would also be useful to have other major industrial sites mapped out and available in digital format. They would be relatively cheap to produce as they would only consist of point information, and there are not many major industries in rural areas. Such information could also be used for other planning issues, for example transportation. Statistics New Zealand does survey all industries, and it would not be too difficult for them to obtain grid references of each industry and map this information. However, Statistics New Zealand is restricted by law from making this information public. It appears from the directories of available databases that there has perhaps been more emphasis on mapping nature, for example animals, plants, and wetlands, and less emphasis on unnatural and potentially harmful things, for example industries, and hydro dams. Both types of information are needed to address environmental issues.
 
4.4 Influence of water

Classifying the influence of water, for the purposes of this study, requires classifying coastal areas, and identifying rivers and lakes. There is very little discussion about this in the literature regarding landscape classification, except for rivers (Mosley, 1989), and even less on doing this automatically within GIS.
     

4.4.1 Classification of coast

Early coastal classifications recognized two classes - emerging and submerging (Davis, 1902). Shepard (1937, and 1938) introduced the concepts of primary and secondary coasts. Primary coasts were based on the influence of land based processes, such as fluvial activity, glaciation, aeolian processes, or denudational processes. Secondary coasts were based on marine processes. Valentine (1952) developed a coastal classification that incorporates the above classifications.

Landscape classification is not concerned with genesis but only with the contemporary appearance. Therefore, Valentine's classification is inappropriate. For a coastal classification to be useful in a landscape classification, it also needs to be at a very generalised level. This is because landscape classification needs to incorporate a wide range of other attributes. Coastal morphology is more relevant for this study. Weerakkody (1993) used remote sensing to identify three features important to coastal morphology - the coastline indentedness, plan-curvature, and orientation. Indentedness was described as being formed by headlands, islets, spits, river mouths, lagoonal outfalls, beach rocks, rock outcrops, sea cliffs, coral reef, and engineering structures. Plan-curvature of the coastline uses notions such as concave, convex, or straight to describe the coastlines that are not particularly indented. Orientation was used because of the effects this has on marine activity, such as the refraction effect of waves, longshore drifting, and direction of littoral currents.

Existing digital databases in New Zealand contain very little information on the coast. Although beach rocks, rock outcrops, sea cliffs, and engineering structures appear on hard copy maps, this information has not been digitised. All that is available is an outline of the coast. From this it is possible to see which coasts are indented and which are not; however, this information has not been conceptualised in the database. The coastline therefore needs to be spatially analysed to distinguish these classes.

It was decided that the coastal classification for this study should have four classes at its most detailed level - indented, very indented, non-indented, and non-coastal. Indentedness is considered important because of the prevalence of common language that describes this feature of the landscape, for example bay, headland, inlet, fiord, and sounds.

4.4.2 The coastal classification process

Indented coastlines can be identified by using expand and shrink functions. These functions actually expand or shrink a specified class by a specified number of cells. Figure 4.17 shows the different steps in the process for the Banks Peninsula region. The process starts with a grid that has one value for land and another value for sea. The land is then expanded 2500m (five cells) into the sea, using an expand function. This output is then shrunk by the same amount. The net effect of this expand/shrink sequence is that indented sea becomes land and the only sea is open. The semi-enclosed sea can then be identified by comparing the original land coverage with this open sea coverage. The coast can then be easily classified as indented or non-indented depending on the percentage of the neighbourhood that is indented or non-indented. If there was an indented coast within a 5500m radius then the coast was classified as indented. A very indented coast was defined as an indented coast that was further than 9500m from the open sea. This was identified using a buffer function on the open sea grid. Three coastline types were therefore classified. The spatial influence of these was set at 3000m inland using a buffer function which acted equally on all three classes.
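The expand/shrink sequence can be sketched on a small grid. The sketch assumes that each step of expansion grows a class into all eight neighbouring cells, and that cells beyond the grid edge are ignored; the text does not say how the GIS functions treated either point. The grid is hypothetical and uses an expansion of one cell for brevity, where the study used five cells (2500m at a 500m cell size).

```python
def expand(grid, n):
    """Grow the True cells of grid by n cells, using 8-connected neighbours."""
    rows, cols = len(grid), len(grid[0])
    for _ in range(n):
        grid = [[any(grid[rr][cc]
                     for rr in range(max(r - 1, 0), min(r + 2, rows))
                     for cc in range(max(c - 1, 0), min(c + 2, cols)))
                 for c in range(cols)]
                for r in range(rows)]
    return grid


def shrink(grid, n):
    """Shrink the True cells by n cells: grow the complement, then invert."""
    grown_complement = expand([[not v for v in row] for row in grid], n)
    return [[not v for v in row] for row in grown_complement]


# Hypothetical coast: 1 = land, 0 = sea, with a one-cell-wide inlet in column 4.
ROWS = [
    "000000000",
    "000000000",
    "000000000",
    "111101111",
    "111101111",
    "111101111",
    "111111111",
]
land = [[ch == "1" for ch in row] for row in ROWS]

# Expand land into the sea, then shrink it back by the same amount.
closed_land = shrink(expand(land, 1), 1)

# Semi-enclosed sea: cells that are land after the sequence but sea originally.
enclosed_cells = {(r, c)
                  for r, row in enumerate(closed_land)
                  for c, is_land in enumerate(row)
                  if is_land and not land[r][c]}
```

In this grid the three inlet cells are returned as semi-enclosed sea, while the open sea in the top three rows is unchanged by the sequence.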

The original land grid was obtained by converting DOSLI's 1:250,000 topographic polygon coverage of coast to a raster coverage with 500m cell size. The DCW or the Ministry of Forestry's database could have also been used, but they were less detailed, especially the DCW. The actual coastline can be identified by converting the arcs of the polygons to a grid coverage. However, this did not overlay precisely with the land grid because the generalisation effects of converting lines to grids are different from converting polygons to grids. Instead, the process shrinks the land grid by one cell, and then the difference between this and the original land grid gives the coastline.
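The derivation of the coastline from the land grid can be sketched in the same way. As before, the eight-neighbour rule and the treatment of the grid edge are assumptions, the island grid is hypothetical, and shrink_one_cell is an illustrative name rather than a GIS function.

```python
def shrink_one_cell(land):
    """Keep a land cell only if every neighbouring cell (8-connected) is land.

    Assumption: cells beyond the grid edge are ignored.
    """
    rows, cols = len(land), len(land[0])

    def surrounded(r, c):
        return all(land[rr][cc]
                   for rr in range(max(r - 1, 0), min(r + 2, rows))
                   for cc in range(max(c - 1, 0), min(c + 2, cols)))

    return [[surrounded(r, c) for c in range(cols)] for r in range(rows)]


# Hypothetical island: a 3 x 3 block of land in a 5 x 5 grid of sea.
land = [[1 <= r <= 3 and 1 <= c <= 3 for c in range(5)] for r in range(5)]

shrunk = shrink_one_cell(land)

# Coastline: land cells that are removed by shrinking the land by one cell.
coastline = [[land[r][c] and not shrunk[r][c] for c in range(5)]
             for r in range(5)]
```

Because the coastline cells are taken from the land grid itself, they overlay it exactly, which is the property that the converted polygon arcs lacked.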

This process expands and shrinks land by 2500m to identify indentedness. The effect of using 2500m is that inlets or bays that are less than 5000m across are identified as indented. One may question why 5000m should be used as a threshold and not 1000m or 10,000m. If a large number is used, reasonably straight coastlines that are also slightly concave become identified as indented. A range of thresholds was experimented with, and it was decided that 5000m was the most appropriate. The other thresholds that needed to be specified were also determined through experimentation. These include the 9500m from the open sea used to distinguish between indented and very indented, and the 3000m radius NAW used to determine the spatial influence of the indented sea. What these thresholds should be is fairly arbitrary. What is important, however, is that these figures are explicitly stated.
 
 

4.4.3 Classification of rivers and lakes

Mosley (1989) characterised a range of different "riverscapes" based on an extensive perception study. Many of these characteristics are covered in this study by the other landscape attributes. Also, some characteristics are too detailed to be included in a total landscape classification, such as river straightness, and eroded banks. It was decided that it was appropriate to only include rivers and lakes over a certain size. This is consistent with Linton (1970) and the need to generalise.

DOSLI's topographical databases, DCW, and Ministry of Forestry databases contain river and lake layers. The DCW contains both rivers and lakes in one hydro layer, but these cannot be distinguished using attribute information. The Ministry of Forestry databases could be used, but it was initially considered that DOSLI's topographical database was better because it had information on the size of the rivers. It was also considered more consistent and accurate.

Large lakes were distinguished from smaller lakes by size using a threshold of 500 hectares. The spatial influence of these large lakes was then ascertained using a focal mean function with a 3000m search radius. If a cell had a large lake present within a 3000m radius then it was classified as lake.

It is possible to identify regions in which there are many small lakes. This can be easily done by first identifying the small lakes and then using a focal mean function. This would identify another class, but was not done because of the need to generalise.

Larger rivers can be identified by their hierarchy level given in DOSLI's 1:250,000 database. This hierarchy appears to be based on the Strahler method (cit. in Petts and Foster, 1985), except for level 7 (the highest level), which is braided rivers. Level 7 is difficult to classify according to size, because it includes the large, braided rivers, such as the Rangitata, and many small, braided reaches of small rivers. It was not possible to automatically distinguish large braided rivers from small braided rivers. Since the inclusion of sections of small braided rivers was not appropriate for this study, a database developed by Landcare was used instead. This database was derived from DOSLI's 1:250,000 database by isolating levels 5, 6, and 7, and manually deleting the small braided reaches. It is not possible to say exactly what specifications were used. Once a suitable database of rivers had been obtained, the spatial influence of rivers was determined, as for lakes, with a 3000m NAW.
 
 

4.4.4 Water classification

Once the coast, lakes, and rivers had been classified, and their spatial influences determined, a water classification was produced by overlaying this spatial influence information with a process similar to that used for classifying vegetation and naturalness. Figure 4.18 illustrates this process. Eleven unique combinations were identified out of a possible 16. Since there were a low number of possible combinations, all of these were used for level 1 of the water classification. Figure 4.19 shows this classification. The names of the classes indicate how they were defined; however, precise definitions are given in Appendix 3. Like the other main landscape attributes, this classification was generalised down to six different levels by hierarchically grouping classes together. Table 4.3 shows how this was done, and Figure 4.20 graphically illustrates the results. Again, no keys are provided with this figure to avoid cramming, but the colours are the same as those used in Figure 4.19, and the keys can be ascertained by using this and Table 4.3. The classes used at each level of generalisation reflect important differences in appearance and contentiousness. The distinction of a coastal class was maintained throughout the generalisation because these areas are given special consideration in the Resource Management Act.
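One reading that is consistent with the figure of 16 possible combinations is four coastal classes combined with the presence or absence of lake influence and of river influence (4 X 2 X 2). This is an assumption; the precise definitions are those in Appendix 3. Under that assumption, the possible and observed combinations can be enumerated as below, using a handful of hypothetical cells in place of the study area.

```python
from itertools import product

# Assumed inputs to the water overlay (see the caveat above).
COAST_CLASSES = ("non-coastal", "non-indented", "indented", "very indented")
LAKE_INFLUENCE = (False, True)
RIVER_INFLUENCE = (False, True)

possible = set(product(COAST_CLASSES, LAKE_INFLUENCE, RIVER_INFLUENCE))

# Hypothetical cells: (coast class, lake influence, river influence).
cells = [
    ("non-coastal", False, False),
    ("non-coastal", False, True),
    ("non-coastal", True, False),
    ("indented", False, True),
    ("non-coastal", False, True),
]

observed = set(cells)
unused = possible - observed  # combinations that never occur in the data
```

Combinations that never occur, such as a very indented coast that also lies within the influence of a large lake, need no class definition, which is one way that fewer than 16 level 1 classes can result.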

The process for classifying water is sensitive to cell size because rivers and coasts are linear features that become misrepresented with vector to raster conversion, as discussed with regard to the naturalness classification (refer to section 4.3.4).
 
 

4.5 Summary

The processes used to classify vegetation, naturalness, and water follow a common sequence of steps. Once the appropriate NDDBs have been decided upon, the important landscape components are identified by grouping or generalising different objects in the NDDBs, and generating single theme (or binary) coverages. Except for indented coastlines, all the components are conceptualised in existing NDDBs, thus making this step relatively simple. It was possible to conceptualise indented coastlines by using a combination of expand and shrink functions. The spatial influence of each component is then calculated with a focal mean function, and this information is then used to define landscape attribute classes using overlay composites.

The following decisions were required to classify vegetation, naturalness, and water:

      . the NDDBs used,

      . the generalisation of these NDDBs,

      . the determination of spatial influence, which included

          . the size of the neighbourhood analysis window (NAW), and

          . the spatial influence thresholds,

      . the definition of the attribute classes, and

      . the cell size.
       
       

With the classification of coastlines it was also necessary to decide what constitutes an indented coastline, and what distance from the open sea makes an indented coastline very indented.

The sensitivity of the classification to the size of the NAW and to the spatial influence thresholds was investigated and found to be substantial. The effects of different cell sizes depended on whether the components were originally represented by lines or polygons. Objects originally represented by lines were distorted significantly by vector to raster conversion, and this subsequently affected the class definitions. This distortion depends on cell size and needs to be built into the definitions. To avoid having different sets of definitions for different cell sizes, it is necessary to decide on an appropriate cell size. With a cell size of 500m the spatial detail of the NDDBs is not unduly lost, the necessary components can be represented, and the processing speed is acceptable.