3.1 Introduction
Rhind (1988, p.26) stated that,
"existing GIS systems do not contain the ability to express high level geographic concepts. Instead they are entirely or very substantially based upon storage of coordinate data and their attributes - essentially low level conceptualizations of the objects under consideration. Human beings evidently store multiple levels of conceptualization of objects, sometimes in a "soft" or "fuzzy" fashion... ."
From the previous chapter it is apparent that the concept of landscape is complex, and Rhind's statement therefore raises the question of whether GIS can express landscapes adequately. This chapter addresses the challenge of classifying landscapes using GIS by framing it as an operational definition problem. In a GIS context, Lay (1991) identifies three factors that need to be balanced in operational definitions: the human concept model, the characteristics of the digital databases, and GIS capabilities. The previous chapter provided information on the human concept model of landscape. The first part of this chapter gives a brief overview of GIS capabilities, in particular focal neighbourhood functions, as these are important in later chapters, and introduces the available digital databases. Nyerges (1991a and 1991b) provides a more sophisticated framework for developing operational definitions by dividing geographical meaning into four abstractions: classification, generalisation, association, and aggregation. The challenge is to represent these abstractions using GIS; this is discussed with particular regard to generalisation and association, as these are the more difficult to represent. This chapter also introduces the study area, the method of research, and a means of validation. Lastly, past research in automated landscape classification is briefly reviewed, although the detail of this is left to later chapters.
3.2 GIS Overview
GIS is a collective term commonly accepted for describing computer systems that can manipulate geographic data. This includes the following operations:
. acquisition and verification
. compilation
. storage
. updating and editing
. management and exchange
. retrieval and presentation
. analysis and combination.
Geographical data can be defined as consisting of information on the qualities of and the relationships between objects that are uniquely georeferenced (Bernhardsen, 1992).
GIS is a relatively new technology that only became well recognized and widely utilized with the development of commercial GIS software in the 1980s, although the basic principles were conceived in the early 1960s with the first system, the Canadian Geographic Information System (Maguire, 1989). The key to the enormous value of GIS is that it allows users to analyse and manipulate large databases, select data by theme, search for particular features in particular areas, and update databases quickly. GIS can also produce a variety of outputs, including maps, graphs, data lists, and summary statistics. The benefits, components, and functions of GIS have been thoroughly reviewed elsewhere (Maguire, 1989, Aronoff, 1991, and Cassettari, 1993).
Data models for GIS can be divided into two categories: vector and raster. In brief, a vector data model represents features as points, lines, and polygons, while a raster data model represents space as a regular array of cells (pixels); raster data sets are commonly called grids. Both data models have advantages and disadvantages. Raster (or grid) format is a simpler data structure, while a vector data structure can be complex but provides an accurate representation of boundaries and of linear and point features. Different data models suit different analysis functions: overlay and neighbourhood analysis functions are easily computed with raster data models, while vector format is more efficient for network analysis (Aronoff, 1991). GIS now commonly provide functions that convert data from vector to raster and vice versa. Such functions will be used in this research, as both data models will be used at different stages depending on the analysis functions required.
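As a minimal sketch of what a vector-to-raster conversion involves, the Python below assigns a value to every grid cell whose centre falls inside a polygon, using a ray-casting point-in-polygon test. The function names, the centre-of-cell assignment rule, and placing the grid origin at the top-left corner are assumptions made for illustration; commercial GIS use more efficient algorithms and offer other assignment rules.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices; the last vertex joins the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_polygon(polygon, x_min, y_max, cell_size, nrows, ncols, value=1):
    """Convert a vector polygon to a grid (list of rows).

    A cell receives `value` if its centre falls inside the polygon, else 0.
    Row 0 is the top row, starting at y_max; column 0 starts at x_min.
    """
    grid = []
    for r in range(nrows):
        y = y_max - (r + 0.5) * cell_size  # y coordinate of the row's cell centres
        row = []
        for c in range(ncols):
            x = x_min + (c + 0.5) * cell_size  # x coordinate of the cell centre
            row.append(value if point_in_polygon(x, y, polygon) else 0)
        grid.append(row)
    return grid
```

For example, rasterising the square with corners (1, 1) and (3, 3) onto a 4 x 4 grid of 1-unit cells with origin (0, 4) marks the four central cells with 1. The centre-of-cell rule keeps the sketch short; a cell only partly covered by the polygon is either wholly in or wholly out, which is one source of the boundary inaccuracy attributed to raster data above.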
A major part of GIS is cartographic modelling (or GIS modelling). This is concerned with how data are used rather than with gathering, maintaining, and conveying data. It is, as the term suggests, the development of models (or representations) expressed in cartographic form (Tomlin, 1990), but it is concerned more with process than with product. Tomlin (1990) identifies two types of cartographic modelling: descriptive and prescriptive. Descriptive modelling describes "what is" or perhaps "what could be", and uses analysis of form and position together with synthesis of cartographic characteristics. Prescriptive modelling is concerned with "what should be" and is problem solving, especially regarding allocation (eg. selecting locations to satisfy stated objectives). Landscape classification is predominantly descriptive, as it is concerned with describing "what is there".
An important objective of cartographic modelling is to derive meaningful information from what can be an overwhelming amount of data (Cassettari, 1993, Maguire, 1989). Planners need clear, single-theme models that can then be incorporated in a landuse information model (as discussed previously). One theme identified as important for planners is landscape. A wealth of different databases could provide information relating to this theme, but in their raw form these are not helpful to planners, who do not have the time to interpret them. Automated landscape classification is about converting large quantities of data into useful information.
3.2.1 GIS analysis functions
The processing of digital data into information with GIS requires the use of analysis functions. Aronoff (1991) provides a useful classification of GIS analysis functions, which has also been adopted by Cassettari (1993) (refer to figure 3.1). This research will use and/or discuss many of them. For example, it will use: various functions for the maintenance of spatial and attribute data; retrieval, classification, and measurement functions; overlay functions; various neighbourhood functions; topographic functions; interpolation functions; some connectivity functions, such as proximity measures and intervisibility (viewshed); and output formatting functions. As mentioned previously, raster data models (or grids) are particularly useful for spatial analysis, especially neighbourhood analysis. In a vector data model, neighbourhood analysis is virtually limited to the use of buffer zones, whereas grids offer other possibilities. The most promising of these are what are commonly called "focal neighbourhood functions" (Tomlin, 1993). Since these functions are an important part of this study, they are discussed in detail below.
3.2.1.1 Focal neighbourhood functions
With focal neighbourhood functions, each cell within the specified coverage in turn becomes the centre for processing (Figure 3.2). When a cell is being processed, the values of the cells within the specified neighbourhood of that central cell are included in the processing. The process could, for example, calculate the mean of all cell values within the neighbourhood. The result is then assigned to the cell of a new grid at the same position as the central cell, and the next cell in the grid coverage becomes the centre of processing. This continues for all the cells that are available for processing. As can be imagined, this can involve a lot of processing, especially when the neighbourhood consists of many cells and the grid coverage contains many cells. The neighbourhood can be of any shape, which can either be specified directly in some GIS software or custom designed. Focal functions have been used extensively for image processing in remote sensing (Matler, 1987), where they are used for different types of filters (kernels).
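The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular GIS: the grid is a list of rows, the neighbourhood shape is given as a list of (row, column) offsets from the central cell so that any shape can be used, and neighbours falling outside the grid are simply ignored (an assumption; GIS packages differ in how they treat edges and missing data). The function names are hypothetical.

```python
def focal_apply(grid, offsets, func):
    """Generic focal neighbourhood function.

    Each cell in turn becomes the centre of processing: `func` is applied to
    the values of the cells at the given (row, col) `offsets` from the centre,
    and the result is written to the same position in a new grid.
    Neighbours falling outside the grid are ignored.
    """
    nrows, ncols = len(grid), len(grid[0])
    result = [[None] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            values = [grid[r + dr][c + dc] for dr, dc in offsets
                      if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
            result[r][c] = func(values)
    return result


def square_offsets(radius):
    """Offsets for a square neighbourhood; radius 1 gives 3 x 3 cells."""
    return [(dr, dc) for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)]


def focal_mean(grid, offsets):
    """Focal mean: the average of the values in each cell's neighbourhood."""
    return focal_apply(grid, offsets, lambda values: sum(values) / len(values))


# A custom neighbourhood shape: the centre cell and its four direct neighbours.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
```

For the grid `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` with a 3 x 3 square neighbourhood, the central cell receives the mean of all nine values (5.0), while the top-left corner cell receives the mean of only the four cells that exist around it (3.0). The nested loops also make the processing cost explicit: the number of operations grows with the number of grid cells multiplied by the number of cells in the neighbourhood.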
In a good GIS many different focal functions are available. The focal mean has been mentioned above. Other focal functions that will be used in this study are as follows:
Focal sum - which is the sum of the values within the specified neighbourhood;
Focal maximum - which is the highest value within the specified neighbourhood;
Focal range - which is the difference between the focal maximum and the focal minimum;
Focal majority - which is the most frequent value within the neighbourhood; and
Focal variety - which is the number of unique classes within the neighbourhood.
Other focal functions are described in GIS manuals and in Tomlin (1993). Focal functions can be applied to both discrete and continuous data. However, some functions are better suited to certain data types; for instance, focal majority and focal variety are more suited to discrete data, since they count class values.
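The focal functions listed above can be sketched as follows, using a square neighbourhood of a given radius and ignoring cells beyond the grid edge. This is an illustrative sketch with hypothetical function names, not a reproduction of any GIS package. In particular, the rule for ties in focal majority is an assumption: here the smallest of the tied values is returned, whereas some GIS return no data when there is no single most frequent value.

```python
from collections import Counter


def focal(grid, radius, func):
    """Apply `func` to the square neighbourhood (side 2*radius + 1) of every
    cell, ignoring neighbours beyond the grid edge."""
    nrows, ncols = len(grid), len(grid[0])
    return [[func([grid[rr][cc]
                   for rr in range(max(0, r - radius), min(nrows, r + radius + 1))
                   for cc in range(max(0, c - radius), min(ncols, c + radius + 1))])
             for c in range(ncols)]
            for r in range(nrows)]


def focal_sum(grid, radius=1):
    return focal(grid, radius, sum)


def focal_max(grid, radius=1):
    return focal(grid, radius, max)


def focal_range(grid, radius=1):
    # Focal maximum minus focal minimum.
    return focal(grid, radius, lambda v: max(v) - min(v))


def _majority(values):
    counts = Counter(values)
    top = max(counts.values())
    # Tie rule (an assumption): smallest of the most frequent values.
    return min(v for v, n in counts.items() if n == top)


def focal_majority(grid, radius=1):
    return focal(grid, radius, _majority)


def focal_variety(grid, radius=1):
    # Number of unique classes in the neighbourhood.
    return focal(grid, radius, lambda v: len(set(v)))
```

Applied to a continuous elevation grid such as `[[10, 20, 30], [40, 50, 60], [70, 80, 90]]`, focal range gives a simple measure of local relief (80 at the centre). Applied to a discrete class grid, focal majority and focal variety summarise the dominant class and the diversity of classes around each cell, which is why they suit discrete data.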
When a landscape is assessed manually, the overall impression of an area is considered. Focal functions are particularly powerful for landscape classification because they can capture the character of the area surrounding a particular point, and therefore some of the holistic (compositional) qualities of landscapes. Duffield and Coppock (1975) used focal mean functions to identify recreational landscapes, but since then focal functions do not appear to have been used for landscape classification, although they have considerable potential. They have also been shown to be useful for automatic landform classification (Dikau, 1991), which will be discussed in chapter 5. It will be argued later in this chapter that focal functions are now the most effective functions available in GIS for expressing spatial association for landscape classification.
3.3 National digital databases
The term digital database, for this study, refers to geographical databases that are in digital format and can be incorporated in a GIS. Information stored in these databases is geographically referenced. The term national digital database (NDDB) refers to such a database that covers, or is intended to cover, a whole nation. NDDBs have emerged with the development of GIS, and their full utilization is yet to be realized: as a recent technology, their potential still needs to be developed and experimented with. Constructing such databases can be time consuming and expensive, so it is preferable to use existing databases where they are appropriate.
Most Western countries have developed, or are developing, national digital databases of their environmental resources. In the United States, digital topographic data covering the whole country are available at a scale of 1:100,000, and the United States Geological Survey aims to complete digitising its 1:24,000 scale maps by the year 2000 (Southard, 1987). In Britain, a topographical database covering the whole country is available at a scale of 1:25,000, and the Ordnance Survey aims to complete digitising all its large scale maps (1:1,250 and 1:10,000) by about the year 2010 (Maguire, 1989). National databases are also being developed for less developed countries (United Nations Environmental Monitoring Programme, 1990).
The development of digital databases is also being instigated at a global level (Clark et al., 1991), usually by combining national digital databases. The Digital Chart of the World (DCW), developed by the Environmental Systems Research Institute in the US, is an example of such a database. It provides digital coverage of the whole globe, stored on CD-ROMs, with information ranging from roads to political boundaries and waterways. The Global Resource Information Database (GRID), being developed through the United Nations Environmental Monitoring Programme (1990), also aims to develop a global digital database. Other well known global databases are the World Data Bank I and II files, which contain information on contours, river networks, and coastlines, all digitised from 1:1,000,000 maps (Maguire, 1989). This study will mostly use national databases; however, global databases can be more accessible than national databases and can therefore be useful for analysis at a national level. It is conceivable that a process, once developed, could be applied to the whole globe with the development of global databases and more powerful GIS.
The databases suitable for landscape classification were mostly identified from database directories, such as those of the Department of Statistics (now Statistics New Zealand) (1992) and Newsome (1995). The following criteria were used to identify the relevant databases:
1) The databases need to contain information on at least one of the four important landscape attributes at a national scale;
2) They need to have an appropriate level of spatial and attribute accuracy; and
3) They need to be accessible to the researcher.
Using these criteria, the databases described in Table 3.1 were identified and will be used in this study. Most NDDBs are derived from hard copy maps, and the scales of these maps are specified in Table 3.1. DOSLI's 1:250,000 databases and Newsome's vegetation database are used for most of this study, but where these were deficient for particular themes other databases were used. The advantages and disadvantages of these databases will be discussed when the different landscape attributes are classified in chapters 4 and 5. Many other databases are being developed by Regional Councils and DOC; however, some of these are not available or not consistent at a national scale, and some are not relevant. Unfortunately, DOSLI's 1:50,000 topographic database was incomplete for the study area and would also have been too expensive to obtain. Accessibility is a severe limiting factor in the use of DOSLI's topographical databases: the topographical data used in this study cost about $30,000 to purchase from DOSLI. Fortunately, access was secured through Landcare Research Ltd under a collaborative agreement. Without Landcare's support this research would have been severely limited.
It should be kept in mind that this thesis investigates the potential for GIS to classify landscape. It is not intended to produce a current, usable classification, and so no attempt will be made to identify and remove specific errors propagated from databases. If the classification produced in this thesis contains substantial errors resulting from the databases, these should be reduced over time as the databases are upgraded. The real issue is whether GIS can incorporate the important compositional and generalised nature of landscapes. Nevertheless, database errors will be discussed later to determine whether improvement is needed.
3.3.1 Sources of national digital databases
There are two main sources of national digital databases: remote sensing, and the scanning and manual digitising of existing information. These two sources are discussed separately below. Global Positioning Systems (GPS) are another source of digital data; they are used for collecting spatial information, usually in conjunction with either field work or the scanners used for remote sensing.
3.3.1.1 Remote sensing
The term "remote sensing" refers to the observation of a target using a device located some distance away from it (Curran, 1985). This includes taking normal photographs, using aeroplanes to take stereoscopic photographs and scan infrared images, and using satellites to scan a wide variety of wavelengths. All of these can be used as primary data sources for information on the landscape.
Of particular interest are images obtained in digital format from scanners, as these can be analysed conveniently with computers using "image processing" techniques. For environmental sciences these images are typically derived from scanners on satellites; however, scanners mounted on aeroplanes are becoming increasingly important.
The first unmanned satellite designed to provide systematic global coverage of the earth's resources was the Earth Resources Technology Satellite (ERTS-1, later renamed Landsat-1) (Aronoff, 1991), launched in 1972. Since then an array of different remote sensing satellites has been launched, ranging from geostationary satellites, which remain fixed above one point on the earth's surface and are usually used for weather forecasting (eg. Meteosat-2), to sun-synchronous satellites that orbit the earth (eg. Landsat-5 and Spot-2). In practice, continuous acquisition of digital scans of the earth's surface from these satellites has been prevented by cloud cover and the lack of local ground receiving stations. The current generation of radar satellites, which can image through cloud, will help to overcome the cloud problem. Image resolutions vary with the scanner, from 10 x 10 m for the panchromatic scanner on Spot-1, to 56 x 79 m for the multispectral scanner on Landsat-5, and 1 km or more for the Advanced Very High Resolution Radiometer (AVHRR) on the National Oceanic and Atmospheric Administration (NOAA) satellites. From these images it is possible, using image processing, to derive digital information on a range of environmental attributes, such as topography, vegetation, landuse, and influence of water. This information can then be incorporated within a GIS.
Scanners mounted on airborne platforms can provide even more detailed environmental information. The images are analysed in a similar way to satellite images, using raster based image analysis software. However, for the same areal coverage as satellite images, this can be a more expensive option.
3.3.1.2 Scanning and manual digitising
A large amount of environmental information has been gathered through field observation, topographic map interpretation, or photo interpretation. In this way, considerable information has been obtained on vegetation, soils, geology, landforms, fauna, landuse, archaeological sites, karst systems, and topography (Department of Statistics, 1992). An important means of conveying this information has been the map. With the development of sensitive office-based scanners, many of these maps are being converted into digital form. Manually digitising these maps is also an option, but this is becoming less important as scanning technology improves. Scanning and digitising capture the spatial extent of different entities; however, attribute information describing the entities must also be input.
Many mapping agencies around the world are scanning the different layers of their maps so that they can be easily updated and republished. It appears that the main reason these topographical databases are being developed is to aid cartography. It is perhaps a coincidence that these topographical databases can also be used for complex automated spatial analysis within GIS.
3.3.2 Classification of digital databases
Databases can be classified by many different data characteristics, for instance point or area, discrete or continuous, and integer or real. A useful classification could be based on the degree of input processing that they have had, and on whether they are specific or general purpose. Such a classification exists for data in general, and distinguishes between primary and secondary data (O'Brien, 1992).
Primary databases consist of raw data that have not yet been analysed and do not necessarily present any meaningful information for a particular context. Digital databases in this category include remotely sensed images, such as those from SPOT and LANDSAT, and digital data obtained from field observations and GPS.
Secondary databases have already been processed to meet the needs of the collectors. Digital databases that fall into this category include digitised topographic maps, DCW, LRI, and Supermap 2. The agencies that supply these databases are in the information business and are therefore producing generalised databases that will suit a wide range of clients. They are usually derived from primary digital databases, or digitized or scanned from maps that were originally derived from field observations or remote sensing.
It would be useful to distinguish another category, here labelled tertiary digital databases, to refer to digital databases that contain only the information relevant to a specific issue. A tertiary database could be derived directly by processing a primary or secondary database, or digitised from maps. Landcare's digital vegetation map would be an example of such a tertiary digital database. A database could, however, be secondary for some purposes and tertiary for others. For the landscape issue being addressed in this study, Newsome's vegetation database would be regarded as a secondary database, as further processing of this information is required.
This study aims to develop a database that could be categorized as tertiary, by processing secondary databases. There is no point in deriving a landscape database from a primary digital database if it can be done more efficiently from a secondary database. However, using secondary databases has disadvantages: the processing used to derive them, which usually involves generalisation, is often not known, and it is therefore difficult to determine their quality.
It could be argued that it is better to derive a tertiary database by digitising or scanning tertiary maps. However, a suitable map must be available for the particular purpose, and digitising can be expensive. Moreover, specific theme maps are usually not suitable for landscape classification purposes. For example, the landform map of Norway (Klemsdal and Sjulsen, 1988) is based on a genetic rather than a morphological classification, and, as will be discussed in chapter 5, it is landform morphology that is important for a landscape classification.
3.4 Operational definitions
The automatic classification of landscapes is influenced by three factors: the human conceptual model, the characteristics of the digital databases, and GIS capabilities. As already mentioned, Lay (1991) labels the balancing, or integration, of these factors "operational definition". Operational definition is not a foreign concept in geographical analysis (Mitchell, 1993), although Lay's interpretation is a slight variant because it concerns automation. Automation requires that the human concept model be formulated so that it can be "operationalized" with existing databases and GIS capabilities. With automation, the compromises made to the human concept model can be considerable, but these can be outweighed by the benefits of automation. That an automated approach may not represent a particular landscape precisely is not sufficient reason to discard it: the speed, explicitness, consistency, and repeatability of an automated representation may outweigh the disadvantages of misrepresentation. To classify landscapes automatically, it is necessary to understand the nature of landscapes, the available databases, and GIS functions. The first was discussed in section 2.9, and the other two have just been discussed in this chapter. The formulation of operational definitions now needs to be considered.
Kliskey and Kearsley's (1993) attempt to automate the mapping of wilderness also needed to address operational definition issues. They used a public perceptual survey to help determine more precisely the nature of wilderness so that definitions of it could be constructed. However, this does not appear to have been a useful method for deriving operational definitions, because public perceptual surveys still provide only general definitions. For example, some people identified remoteness as an important component of wilderness, but remoteness is ambiguous: what distance from huts, tracks, and roads constitutes remote? A precise definition requires numbers and mathematical relationships, and most people do not think about landscape classes in this way. Consequently, although Kliskey and Kearsley based their definitions on a perceptual study of the term wilderness, implementing them within a GIS required many arbitrary decisions about their mathematical interpretation (for instance, the extent of the buffer zones surrounding the tracks used to identify areas with different degrees of remoteness).
With automated classification, the issue that needs to be investigated is the transition from concepts (or geographical meaning) to operational definitions. This is the emphasis of this study, although attention will also be given to the meaning behind different concepts. A perceptual survey of the public's concepts of different landscape attributes will not be conducted. Instead, the content category research discussed in section 2.9 provides some direction for a landscape classification. Definitions used by previous manual methods and definitions found in the literature will also be used where appropriate, supplemented if necessary by personal judgement.
In the documented dialogue between Carlson (1977) and Ribe (1982) over the possibility of quantifying scenic beauty, Ribe (p.69) states that:
"Numbers, when used for equations and statistics, provide a powerful means of rigorously describing, testing and analysing relationships in ways not possible through the use of only qualitative concepts and description".
With GIS, it is possible to extract from digital databases an almost unlimited number of different kinds of measurements on different aspects of the landscape. Not only can the quantity of different components be measured, for example the length of road or the area of mountainous terrain, but these measurements can also be made at different levels of scale and combined with other measurements so that associations can be measured. This is a powerful advantage of GIS and digital databases, and it does not appear to have been utilized fully for landscape classification, especially with regard to identifying landcover. An important part of this research will be the identification of useful parameters for identifying different landscape classes. It can be argued that GIS can measure some parameters that are impractical to measure manually, simply because of the number of calculations involved. An example of such a parameter is the density of roads within a given radius, calculated for 15 million points systematically located throughout New Zealand. This can be done within 10 minutes using modern computer hardware, whereas attempting it manually would not be practical. Such a measurement could be useful for constructing an operational definition of naturalness.
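The road density example can be expressed as a focal function over a circular neighbourhood. The sketch below is hypothetical and assumes that roads have already been rasterised so that road cells hold 1 and all other cells 0, and that each road cell represents roughly one cell-width of road; density is then road length divided by the area of the neighbourhood that lies within the grid. It illustrates the calculation only: a pure Python loop would be far too slow for 15 million points, where a GIS's optimised focal functions would be used instead.

```python
def circle_offsets(radius_cells):
    """(row, col) offsets of cells whose centres lie within a circle."""
    r2 = radius_cells * radius_cells
    return [(dr, dc)
            for dr in range(-radius_cells, radius_cells + 1)
            for dc in range(-radius_cells, radius_cells + 1)
            if dr * dr + dc * dc <= r2]


def road_density(road_grid, cell_size_km, radius_km):
    """Approximate road density (km of road per km^2) within radius_km of
    every cell.

    Assumes road cells hold 1 and other cells 0, and that each road cell
    represents about one cell-width of road. Cells beyond the grid edge are
    excluded from both the road length and the area.
    """
    radius_cells = round(radius_km / cell_size_km)
    offsets = circle_offsets(radius_cells)
    nrows, ncols = len(road_grid), len(road_grid[0])
    density = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            n_cells = 0
            n_road = 0
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    n_cells += 1
                    n_road += road_grid[rr][cc]
            road_km = n_road * cell_size_km
            area_km2 = n_cells * cell_size_km ** 2
            density[r][c] = road_km / area_km2
    return density
```

For a 5 x 5 grid of 1 km cells with a road along the middle row and a 1 km radius, the central cell's neighbourhood contains five cells, three of which are road, giving a density of 0.6 km/km². Values like these could then be thresholded, or used in a fuzzy membership function, as part of an operational definition of naturalness.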
As discussed in chapter 2, landscape perception is a complex cognitive process that, among other things, involves generalisation, composition, and classification. Before operational definitions of landscape classes can be implemented within GIS, their exact nature must be known in terms of mathematical relationships. However, landscapes are usually expressed in words rather than quantitatively, and the challenge is to express the meaning of these words quantitatively. For example, how can a mountain be expressed mathematically? Nyerges (1991a) provides an interesting discussion of how geographical meaning (conceptualisation) can be represented, or formulated, in what he calls semantic data models. He argues that for computers to automate geographical models of reality, geographical meaning must be included.
Four types of geographical abstractions are important in providing sufficient knowledge of meaning to perform structure identification. These are classification, generalization, aggregation, and association (Nyerges, 1991b). "A classification abstraction is created when one or more entities are assigned to an entity class" (p.1489). A generalisation abstraction is "created when a specific character of an entity class can be identified such that it is described as a subclass of the original class" (p.1490). Aggregation and association are both forms of geographical neighbourhood. "An aggregation is created when entities of the same or different entity classes form part of a more complex entity as a rigid structuring of parts" (p.1491). With aggregation there must be a substantive connection between entities. With association, entities are grouped as well, but this is based on looser relationships.
If the four above abstractions form the basis of structural geographic identification, how can these be represented within GIS? Nyerges (1991a) outlines a range of techniques for representing knowledge within a semantic data model. These are type hierarchy, functional dependency, domain role, definition, schema, attached procedure, and inference rule. The following summarizes these.
Type hierarchy is the ordering of classes according to generality.
Functional dependency indicates whether the entities are primary or secondary referents. Primary referents are independent, while secondary are functionally dependent on primary entities.
Domain roles interpret the interaction of an entity in relation to another.
Definition can be of three types: (1) classical - the use of conditions to show inclusion or exclusion; (2) prototype - the use of best examples to determine inclusion or exclusion; and (3) probabilistic - the use of statistical commonality to demonstrate inclusion or exclusion.
Schema describes default (ie. normal) occurring roles that an entity type plays in relation to another type.
Attached procedures are a set of external procedures that are used depending on a set of criteria.
Inference rule represents reasoning based on explicitly stored knowledge of entity classes.
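The three kinds of definition listed above can be contrasted with a small, hypothetical example for the class "mountain". The sketch assumes that each location is described by a single attribute, local relief in metres, and the threshold, prototype values, and labelled examples are invented for illustration; they are not values used in this study.

```python
# Hypothetical values for illustration only; not thresholds used in this study.
MOUNTAIN_MIN_RELIEF_M = 300.0                               # classical rule
PROTOTYPE_RELIEF_M = {"mountain": 800.0, "hill": 150.0}     # "best examples"


def classical_is_mountain(relief_m):
    """Classical definition: inclusion if a stated condition is met."""
    return relief_m >= MOUNTAIN_MIN_RELIEF_M


def prototype_class(relief_m):
    """Prototype definition: assign to the class whose best example is closest."""
    return min(PROTOTYPE_RELIEF_M,
               key=lambda name: abs(relief_m - PROTOTYPE_RELIEF_M[name]))


def probabilistic_mountain(relief_m, labelled_examples, window_m=100.0):
    """Probabilistic definition: the share of labelled (relief, label) examples
    with relief within window_m of relief_m that are labelled "mountain".
    Returns None if no examples are similar enough."""
    similar = [label for relief, label in labelled_examples
               if abs(relief - relief_m) <= window_m]
    if not similar:
        return None
    return similar.count("mountain") / len(similar)
```

With these invented values, a location with 400 m of relief is a mountain under the classical rule, a hill under the prototype rule (it is closer to the hill prototype), and about two-thirds mountain under the probabilistic rule if two of three similar examples are labelled mountain. The disagreement illustrates why the choice of definition type matters for an operational definition.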
Formalising structural meaning into four abstraction types, and then outlining representation techniques, is an important attempt to develop definitions of geographical meaning that can be implemented with computers. These building blocks can perhaps be implemented within a GIS, but they now need to be tested in relation to particular entities; an attempt to automate landscape classification will provide such a test.
The four types of abstraction can be used to express the nature of landscapes. Landscape can be seen as an association of components. For example, a mountainous, forested landscape is an association of mountains and forest. Different components can be seen as an aggregation of sub-components. For example, a forest is an aggregation of large trees that may consist of a range of species. It has already been stated that landscapes are a generalisation, and the previous example is also an example of this. This demonstrates that abstractions are interrelated and complex. Describing landscapes using these abstractions raises questions as to the exact nature of the associations, and how components are aggregated, generalised, and classified. To answer these, it is necessary to express these abstraction types using representation techniques. This research will attempt to do this using representation techniques available within GIS. Nyerges' formulation of possible techniques provides a useful overview at a generalised level. To develop an automated approach requires the exact specification of GIS functions, such as overlays, conditional statements, and neighbourhood functions. The language used to express representation techniques in this thesis will therefore be at a GIS level rather than at the general level used by Nyerges.
One representation technique that Nyerges did not mention specifically is fuzzy set theory, although this could be regarded as a form of inference rule. The foundations of fuzzy set theory were laid by Zadeh (1965). Since then it has been of growing research interest, especially with the development of GIS. Fuzzy set theory provides a strict mathematical framework in which imprecise conceptual phenomena can be studied. It can be thought of as a generalization of classical set theory: instead of the binary choice between membership and non-membership, an element has a weighted degree of membership, usually expressed on a scale from 0 to 1. This weighting of membership allows a continuum of possible values that can be used to describe imprecise terms (Zimmermann, 1992). For example, with landforms there is no clear distinction between mountains and hills, and some areas may be described as either. Such an area could be classified as 50 percent mountain and 50 percent hill, while areas that are clearly mountains or hills could be described as 100 percent mountain or 100 percent hill, respectively. Landscapes are inherently fuzzy because they are human constructs: different people perceive landscapes differently, and this needs to be incorporated in a classification. Fuzzy set theory provides a theoretical framework for expressing this fuzziness, which now needs to be incorporated within operational definitions.
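The mountain and hill example can be sketched with a simple membership function. The version below assumes a linear ramp on local relief, which is only one of many possible membership-function shapes, and the relief thresholds of 200 m and 400 m are hypothetical. Hill is treated as the complement of mountain, a simplification that ignores other landform classes such as plains. Zadeh's standard fuzzy intersection (the minimum of the memberships) is included to show how an association such as "mountainous and forested" might be expressed.

```python
def mountain_membership(relief_m, lower=200.0, upper=400.0):
    """Degree of membership (0 to 1) in the fuzzy set 'mountain'.

    Linear ramp: 0 at or below `lower`, 1 at or above `upper`.
    The thresholds are hypothetical, not values used in this study.
    """
    if relief_m <= lower:
        return 0.0
    if relief_m >= upper:
        return 1.0
    return (relief_m - lower) / (upper - lower)


def hill_membership(relief_m, lower=200.0, upper=400.0):
    """'Hill' as the complement of 'mountain' (a two-class simplification)."""
    return 1.0 - mountain_membership(relief_m, lower, upper)


def fuzzy_and(*memberships):
    """Zadeh's standard fuzzy intersection: the minimum membership."""
    return min(memberships)
```

With these thresholds, an area with 300 m of relief is 50 percent mountain and 50 percent hill, while areas with 500 m and 100 m of relief are 100 percent mountain and 100 percent hill respectively. If the same area had a forest membership of 0.8, its membership in "mountainous and forested" would be `fuzzy_and(0.5, 0.8)`, that is 0.5.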
Of the four abstractions presented by Nyerges, classification and aggregation are relatively easy to represent using GIS. Objects can be assigned to classes simply by selecting them and naming them, and aggregation can be implemented using overlay techniques. The representation of generalisation and association is more complex and will be discussed in detail in the following sections. Related to operational definition is the need to balance complexity with functionality, and this will also be discussed separately.
3.4.1 Generalisation
As discussed in chapter 2, landscape perception involves generalisation and it is necessary to incorporate this in a landscape classification. This generalisation is complex because it is an overall impression of an area obtained from exploration and movement. The question is: How can GIS incorporate this? This section discusses in more detail why generalisation is an issue, and also identifies techniques for resolving this issue.
Many existing databases contain far more information than is needed for a landscape classification, and much of this information cannot be perceived in reality from a reasonable distance. These databases may have been developed by researchers in specialised fields, such as botanists, soil scientists, and geomorphologists, with special purposes in mind, for instance to provide an understanding of geomorphological processes, to protect species diversity, or to determine the optimal crop. For deriving a landscape classification, it is not always appropriate to import these databases directly without some form of generalisation. For example, Landcare's digital vegetation map has many more classes than can normally be perceived from a distance, as it was not developed for this purpose.
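Reclassifying detailed attribute classes into broader, perceivable ones is the simplest form of attribute generalisation. The sketch below assumes a hypothetical lookup table from detailed vegetation classes to broad classes; the class names are illustrative and are not Landcare's actual legend.

```python
# Hypothetical mapping from detailed vegetation classes to broad classes
# that can be perceived from a distance.
BROAD_CLASS = {
    "beech forest": "forest",
    "podocarp-broadleaf forest": "forest",
    "manuka scrub": "scrub",
    "matagouri scrub": "scrub",
    "short tussock grassland": "tussock",
    "tall tussock grassland": "tussock",
    "improved pasture": "pasture",
}


def generalise(cells, lookup=BROAD_CLASS, default="other"):
    """Reclassify every cell of a raster (list of rows) to its broad class."""
    return [[lookup.get(cls, default) for cls in row] for row in cells]


detailed = [
    ["beech forest", "podocarp-broadleaf forest", "manuka scrub"],
    ["tall tussock grassland", "short tussock grassland", "improved pasture"],
]
broad = generalise(detailed)
print(broad)

# Number of distinct classes before and after generalisation.
n_before = len({cls for row in detailed for cls in row})
n_after = len({cls for row in broad for cls in row})
print(n_before, "->", n_after)
```

Only the attribute values change; the spatial arrangement of cells is untouched, which distinguishes this from spatial generalisation.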
Generalisation is a contemporary problem resulting from developments in information technology. In the past, the degree of detail used in a model or classification was limited by resources, especially finance (Jeffers, 1973), and classifications contained as much information as could be obtained within the project budget. Converting the "firehose of data" available today into useful information is increasingly a generalisation problem, and techniques are now required to process this information and derive adequate generalisations. With GIS and national digital databases, producing a detailed classification appears to be the easy option; the harder option is to produce a meaningful generalised classification.
With landscape classification, the composition of landscape components needs to be considered. Since there are many different landscape components, there is the potential for a very large number of possible compositions. These compositions need to be generalised to keep the number of classes at a useful level. It is difficult to know exactly what level of generalisation is appropriate, because the classification may be used at different scales. It was concluded in section 2.8 that a hierarchical classification with a range of levels of generalisation is needed.
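The scale of the problem can be seen with simple arithmetic: the number of possible compositions is the product of the number of classes in each layer. The class counts below are hypothetical, and the four layers anticipate the attributes used later in this thesis (landform, vegetation, naturalness, and water).

```python
from math import prod

# Hypothetical class counts per attribute layer, before and after generalisation.
detailed = {"landform": 12, "vegetation": 20, "naturalness": 6, "water": 5}
generalised = {"landform": 5, "vegetation": 6, "naturalness": 3, "water": 3}

n_detailed = prod(detailed.values())        # 12 * 20 * 6 * 5
n_generalised = prod(generalised.values())  # 5 * 6 * 3 * 3
print(n_detailed, n_generalised)
```

Not every combination occurs in practice, so these figures are upper bounds, but they show how quickly generalising each layer reduces the number of potential classes.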
Nyerges (1991a) distinguishes between cartographic generalisation and the use of generalisation for geographical database abstraction. Cartographic generalisation commonly involves selection, simplification, classification, induction, and symbolisation of maps. It is concerned with removing unwanted detail when a change of scale takes place, and with removing unwanted detail for thematic mapping (Armstrong, 1991). Newsome's (1987) vegetation map, for example, uses shading and symbols to express three levels of generalisation, so that different amounts of detail become apparent when the map is viewed from different distances. Generalisation for database abstraction is concerned more with "a concept having a more general interpretation than some other concept with a more specific interpretation" (Nyerges, 1991a, p. 67). This is the way the term is used in the philosophy of science literature, and it is the intended use in this thesis, since the concern here is with database abstraction. However, although cartographic generalisation may serve different purposes from database abstraction, the generalisation operations involved can be similar. For instance, grouping trees into one symbol and calling the symbol a forest is an example of both cartographic generalisation and generalisation for database abstraction.
Within GIS there is a range of different generalisation techniques available. Shea (1991) calls these "rule groups" and has provided a model that portrays them (refer to figure 3.3). The model has been provided in relation to cartographic generalisation but may be useful for geographical abstraction. Conditional rules are the basic mechanisms for generalisation, of which there are five types:
(1) existence, which tests for presence or absence;
(2) scope, which tests for specific instances of some characteristic;
(3) fact, which tests for truth or falsity;
(4) value, which examines an entity's attribute values; and
(5) relation, which addresses cartographic and topographic relations.
These conditions can be applied within three types of actions:
(1) logic control, which directs the search and reasoning techniques;
(2) spatial transformations, which affect spatial data; and
(3) attribute transformations, which affect attribute data.
The relevance of the logic control actions in this model is questionable, as they specify the type of generalisation rule that should be applied rather than being generalisation rules themselves.
The combination of conditional rules and actions can then be applied at various degrees of severity to suit requirements. In this model three levels are presented (figure 3.3): generic, thematic, and user.
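To make the pairing of conditions and actions concrete, the following sketch applies two value conditions, each paired with an action, at three severity levels. The objects, minimum areas, and reclassification table are hypothetical; the level names reuse Shea's generic, thematic, and user labels, but the thresholds are not taken from his model.

```python
# Hypothetical minimum areas (ha) below which objects are deleted, per severity level.
MIN_AREA_HA = {"generic": 500, "thematic": 100, "user": 20}
# Hypothetical reclassification of detailed classes into broader ones.
RECLASS = {"manuka scrub": "scrub", "matagouri scrub": "scrub"}


def generalise_objects(objects, level):
    """Apply two condition/action pairs at the given severity level."""
    kept = []
    for obj in objects:
        # Value condition on area -> spatial transformation: delete small objects.
        if obj["area_ha"] < MIN_AREA_HA[level]:
            continue
        # Value condition on class -> attribute transformation: reclassify.
        kept.append({**obj, "cls": RECLASS.get(obj["cls"], obj["cls"])})
    return kept


objects = [
    {"id": 1, "cls": "beech forest", "area_ha": 800},
    {"id": 2, "cls": "manuka scrub", "area_ha": 150},
    {"id": 3, "cls": "matagouri scrub", "area_ha": 30},
    {"id": 4, "cls": "improved pasture", "area_ha": 10},
]

for level in ("generic", "thematic", "user"):
    result = generalise_objects(objects, level)
    print(level, [(o["id"], o["cls"]) for o in result])
```

The same rules produce progressively more detailed outputs as the severity level moves from generic to user, which is the sense in which the rule/action combinations are "applied at various degrees of severity".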
In designing a landscape classification process, different types of generalisation need to be considered. For example, should spatial information be generalised by deleting objects, or should attribute information be reclassified into more general classes? It also needs to be decided which type of conditional rule should be applied. Conditions can be complex, involving many different objects and their values, or they can be based simply on the existence of one class. The importance of different types of rules will be demonstrated in chapters 4 and 5.
3.4.2 Association
Within GIS, there are many different methods for expressing association, and it is necessary to determine which are the most appropriate. As previously mentioned, overlays can be used for expressing aggregation, but overlays could also be used to express association, since the distinction between these two abstractions is not always clear. Overlaying by itself is limited for expressing neighbourhood associations, as it cannot identify whether two objects are within the vicinity of each other unless they occupy the same space. Other functions have therefore been used in conjunction with overlays to express wider neighbourhood associations. For landscape classification these have included buffer functions (Kliskey and Kearsley, 1993), nearest distance calculations (Lesslie et al., 1988), viewshed analysis (Bishop and Hulse, 1994), and focal neighbourhood functions (Duffield and Coppock, 1975). It will be argued that focal neighbourhood functions (described earlier in this chapter) are the most appropriate for this task.
A buffer function only indicates that a particular entity is present within a specified distance. It does not indicate how much of that entity is present, or how far away it is (except that it is within the buffer zone). Nearest distance calculations determine the distance to the nearest object in question but do not indicate the magnitude of that object. For example, suppose the spatial influence of roads on a particular point needs to be determined, and from that point there is one road 10 km to the south and another 11 km to the north. Nearest distance will give a value of 10 km whether or not the road to the north exists. If a focal mean function were used instead, the output would be affected by both roads and is therefore more sensitive. However, if a road also passed through the central point, the nearest distance would be zero, whereas the focal mean would be affected by this road and also by the roads in the distance. This may or may not be appropriate, since the distant roads may be considered too far away to matter. If roads beyond a certain distance should not be included, this can be achieved by limiting the neighbourhood search radius.
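The road example can be reproduced on a small raster. This sketch assumes a hypothetical 31 x 31 grid of 1 km cells with east-west roads 10 km south and 11 km north of the central cell; `nearest_distance`, `within_buffer`, and `focal_mean` are illustrative helpers written for this example, not ARC/INFO functions.

```python
import math

CELL_KM = 1.0  # hypothetical cell size


def make_grid(n, road_rows):
    """n x n raster of 0/1 values with east-west roads along the given rows
    (row index increases southwards)."""
    return [[1 if r in road_rows else 0 for _ in range(n)] for r in range(n)]


def road_cells(grid):
    return [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v]


def nearest_distance(grid, r0, c0):
    """Euclidean distance (km) from (r0, c0) to the nearest road cell."""
    return min(math.hypot(r - r0, c - c0) for r, c in road_cells(grid)) * CELL_KM


def within_buffer(grid, r0, c0, dist_km):
    """Buffer test: only says whether any road lies within dist_km."""
    return nearest_distance(grid, r0, c0) <= dist_km


def focal_mean(grid, r0, c0, radius_km):
    """Mean cell value within a circular neighbourhood; cells off the grid are ignored."""
    rad = radius_km / CELL_KM
    k = int(rad)
    values = []
    for dr in range(-k, k + 1):
        for dc in range(-k, k + 1):
            r, c = r0 + dr, c0 + dc
            if dr * dr + dc * dc <= rad * rad and 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                values.append(grid[r][c])
    return sum(values) / len(values)


centre = (15, 15)
south_only = make_grid(31, {25})  # road 10 km south of the centre
both = make_grid(31, {4, 25})     # plus a road 11 km north

for name, grid in (("south only", south_only), ("both roads", both)):
    print(name,
          nearest_distance(grid, *centre),
          within_buffer(grid, *centre, 12),
          round(focal_mean(grid, *centre, 12), 3),
          round(focal_mean(grid, *centre, 10.5), 3))
```

Nearest distance and the buffer test give the same answer whether or not the northern road exists, whereas the focal mean over a 12 km radius rises when it is added. Reducing the radius to 10.5 km excludes the northern road again, which corresponds to limiting the neighbourhood search radius.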
Although viewshed analysis is becoming a standard function within GIS, focal mean functions may be more appropriate for determining the spatial influence of different objects. This is because landscape perception is affected not just by what is directly visible, but also by what has been experienced through movement and exploration. This point was made in chapter 2, and is based on the work of Zube, Sell, and Taylor (1982), who reviewed 160 landscape articles from 20 different journals. The need to incorporate movement and exploration has therefore been stated as a criterion for landscape classification. Focal mean functions can express the spatial influence of objects within the vicinity of a particular point regardless of whether or not those objects are in direct line of sight. Focal neighbourhood functions will therefore be the main GIS function used to express spatial association, and their effect is demonstrated in chapters 4 and 5.
3.4.3 Complexity versus functionality
Operational definition requires that the process can be run adequately within the confines of GIS while also providing adequate representation. The process needs to be complex enough to give useful results, but also needs to be functional within GIS. Moore et al. (1993) provide a graph showing the tradeoff between complexity and functionality in mathematical modelling (refer to Figure 3.4). It shows that if a model is too complex, requiring a substantial amount of data and processing, the model will not function very well, because the demands on computation will be too great. It is difficult to know the exact shape of this negative relationship. It is shown as an exponential curve, but it need not be; the figure is simply an abstract illustration that is useful for discussing this important tradeoff.
Moore et al.'s figure, however, considers functionality only in terms of "ease of use" (p.198). Functionality should also take into account how well the model depicts reality, which is also a function of complexity. If a landscape model is too simple and does not reveal the important subtleties present in reality, then the model is not functioning very well. It should be noted that complexity and degree of generalisation are not the same thing. A model can be quite general, with only a few broad classes, while the process for deriving this generalisation is very complex, involving large detailed databases and sophisticated calculations.
To develop a functioning model, it is therefore necessary to choose an appropriate level of complexity that is computationally feasible, and that can identify important subtleties. It appears from past research that manual methods have been unable to balance these two criteria to produce a functioning model. This is either because the manual models were too computationally demanding, thus requiring considerable resources, or the resulting classification was not complex enough to be of any use. The question that will be addressed in this thesis is: Can GIS function acceptably at the required levels of complexity?
Moore et al. (1993) suggested two important principles that a model should follow - parsimony and modesty. A model should be parsimonious in that it should not be more complex than it needs to be, and should include only the smallest possible number of parameters. A model should be modest by not pretending to do too much.
3.5 Investigation method
It has already been said that this research is about investigating the application of GIS to landscape classification. The discussion so far has provided a general theoretical framework, and identified major issues, such as operational definition and generalisation. It is now necessary to put this theory into practice and actually apply GIS to this problem. This section outlines the method that will be used for doing this, as well as the study area and a discussion on validation issues.
It has been argued that landscape is composed of landform, vegetation, naturalness, and water. To simplify landscape classification, these attributes will be classified separately, and then the landscape classes can be constructed from the unique combinations of these four layers. When classifying the separate attributes, it will be necessary to consider that some of these attributes, for instance vegetation and naturalness, are interlinked.
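Constructing landscape classes from the unique combinations of the four layers amounts to a raster overlay in which each distinct combination of input values receives its own output class. A minimal sketch, assuming small hypothetical layers with illustrative class labels:

```python
def combine(*layers):
    """Assign a landscape class id to each unique combination of input classes.

    Every layer is a raster given as a list of rows with identical dimensions.
    Returns the output raster and the mapping combination -> class id.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    class_ids = {}  # combination tuple -> landscape class id
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            combo = tuple(layer[r][c] for layer in layers)
            # New combinations get the next id; known ones reuse their id.
            out[r][c] = class_ids.setdefault(combo, len(class_ids) + 1)
    return out, class_ids


landform = [["plain", "plain", "hill"], ["hill", "mountain", "mountain"]]
vegetation = [["pasture", "pasture", "scrub"], ["scrub", "forest", "forest"]]
naturalness = [["modified", "modified", "modified"], ["semi", "natural", "natural"]]
water = [["none", "river", "none"], ["none", "none", "none"]]

landscape, legend = combine(landform, vegetation, naturalness, water)
print(landscape)
for combo, class_id in legend.items():
    print(class_id, combo)
```

Because each layer is classified separately first, interlinked attributes such as vegetation and naturalness still need to be handled when those individual layers are built.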
The main method used for developing an automated landscape classification can be regarded as a kind of simulation. Simulation can be defined as the representation of the characteristics of one system through the use of another system, such as a computer. The system being represented here is manual landscape classification, and the characteristics deemed important have been incorporated in the criteria listed in chapter 2. Simulation is a powerful tool within GIS packages that have macro language capabilities. A process, once developed, can easily be altered simply by changing parts of the program. The sensitivity of different parameters can be investigated by using a range of parameter settings and comparing the resulting outcomes either visually or quantitatively. Parameter settings can be changed using variables within a "Do Loop", so that many different outcomes can be produced with relative ease. The GIS used for this investigation was ARC/INFO 6.1.2, and the hardware was a Sun SPARCstation 10 workstation.
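The thesis implemented parameter sweeps with a "Do Loop" in ARC/INFO's macro language. The sketch below shows the same idea in Python rather than AML: a hypothetical forest raster is classified using a focal mean at several neighbourhood radii, and the number of cells classified as forested is recorded for each setting so that outcomes can be compared quantitatively. The grid, radii, and 0.5 threshold are assumptions for illustration.

```python
def focal_mean_grid(grid, radius):
    """Mean of the cells within `radius` cells of each cell
    (circular window, truncated at the grid edges)."""
    n_rows, n_cols = len(grid), len(grid[0])
    k = int(radius)
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            window = [
                grid[r + dr][c + dc]
                for dr in range(-k, k + 1)
                for dc in range(-k, k + 1)
                if dr * dr + dc * dc <= radius * radius
                and 0 <= r + dr < n_rows
                and 0 <= c + dc < n_cols
            ]
            out[r][c] = sum(window) / len(window)
    return out


def classify(grid, radius, threshold=0.5):
    """1 where the focal mean of forest cover reaches the threshold, else 0."""
    fm = focal_mean_grid(grid, radius)
    return [[1 if v >= threshold else 0 for v in row] for row in fm]


# Hypothetical 10 x 10 raster: the western four columns are forest (1).
forest = [[1 if c < 4 else 0 for c in range(10)] for _ in range(10)]

results = {}
for radius in (1, 2, 3):  # the parameter varied on each pass of the loop
    classified = classify(forest, radius)
    results[radius] = sum(sum(row) for row in classified)

for radius, count in results.items():
    print(f"radius {radius}: {count} of 100 cells classified as forested")
```

Any other parameter, such as the threshold or the cell size, could be swept in the same way by changing the variable controlled by the loop.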
Display of outcomes can be a problem because of the quantity produced. To facilitate comparison between maps, information on the hard copy outputs will be kept to a minimum; for instance, most of the maps will not contain north arrows or scale bars. For all maps, north is up. The scales vary, but Figure 3.5 has a scale bar from which the approximate scale of the other maps can be ascertained. For all raster-based maps, the cell size will be given. Many maps will also display the main roads and hydrology layers for geographical reference.
The classification process developed in this study will convert vector databases to raster databases to aid spatial analysis. With raster databases it is necessary to decide on an appropriate cell size. The effects of cell size are complex and will be a major part of this study. Consideration will be given to the processing speed, the spatial resolution of the NDDBs, and the objects that are being identified. A cell size of 500m will initially be used, but the effects of smaller and larger cell sizes will be investigated.
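The processing cost of a cell size can be estimated from the number of cells it produces. The sketch below uses the study-area grid given in section 3.5.1 (328 rows by 453 columns at 500m) and assumes the same extent is resampled at other cell sizes, rounding partial cells up.

```python
import math

ROWS_500, COLS_500 = 328, 453              # study-area grid at 500 m cells
HEIGHT_M, WIDTH_M = ROWS_500 * 500, COLS_500 * 500


def grid_size(cell_m):
    """Rows, columns, and total cells needed to cover the extent at cell_m."""
    rows = math.ceil(HEIGHT_M / cell_m)
    cols = math.ceil(WIDTH_M / cell_m)
    return rows, cols, rows * cols


for cell in (250, 500, 1000):
    rows, cols, n = grid_size(cell)
    hectares_per_cell = cell * cell / 10_000
    print(f"{cell} m: {rows} x {cols} = {n} cells, {hectares_per_cell:g} ha each")
```

At 500m each cell is 25 ha, so the 148,584 cells cover about 3.7 million hectares, matching the study-area total; halving the cell size to 250m roughly quadruples the number of cells and hence the processing load.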
3.5.1 Study area
Four factors were considered in choosing a study area. The first was that the area should have a suitable range of landscapes so that the generality of the landscape classification process can be tested. This requires that the study area vary significantly in landform, vegetation, naturalness, and the influence of water. Second, it helps if the study area is well known to the researcher, so that the outputs can easily be compared with reality. If this were not the case, each output would have to be systematically compared with representations of reality such as hardcopy topographic maps and photographs, and field visits might also be necessary. When the study area is known, the outputs can be assessed more quickly and spurious output spotted. A third consideration was that the necessary digital data could be obtained. Finally, it is beneficial if the area has already been classified manually.
The area chosen is a cross-section of the middle of the South Island of New Zealand (figure 3.5). It covers approximately 3.7 million hectares in total, of which only 2.8 million hectares are land. When divided into 500m cells, it forms a matrix of 328 rows by 453 columns. The study area contains a large variety of landscapes. On the east coast are the extensive Canterbury Plains; dividing east from west are the Southern Alps, with mountains up to 2500m high; and on the west coast there is a relatively narrow strip of flat and hilly landforms. Banks Peninsula on the east coast is an extinct volcano with hilly to mountainous topography. The vegetation over the study area also varies. There is expansive pasture on the Canterbury Plains and the adjacent foothills, a mix of forest and tussock in the Southern Alps, and a mix of forest, scrub, and pasture on the west coast. A range of human modification also exists. Christchurch is an industrialised urban area of 300,000 people, while parts of the Southern Alps and the west coast are relative wilderness. Several large rivers and lakes are present, as well as a range of coastlines. An assessment of the landscape in this area can be obtained from DOSLI's NZMS 262 (1:250,000) topographical maps, sheet numbers 10-13, and from Landcare's vegetation map. The study area is well known to the author, who has travelled extensively throughout the region through work and through a passion for exploration and outdoor recreation. Most of the important databases for this area were also obtainable after some negotiation. A disadvantage of the study area is that its landscape has not previously been classified manually using the attributes landform, vegetation, naturalness, and water. The only area in New Zealand that has been classified using these attributes is the Auckland region (ARA, 1984), which would not have been appropriate for this study because of its lack of landscape contrast and because it is unfamiliar to the author. The landscape in the study area has, however, been classified using different attributes. The most notable of these classifications is the one developed for the survey of natural resources (Ministry of Works and Development, 1983), which used formal artistic criteria and tended to evaluate rather than classify character. The Canterbury Regional Council recently commissioned a landscape study, but the resulting classification was an inventory of physical characteristics, similar to the Land Resource Inventory. It includes information, such as soil and genetic geomorphology, that is not directly relevant to a perceptual landscape classification.
3.5.2 Validity
To develop a classification process, there must be some way of assessing the worth of the output. Without some form of assessment of validity one cannot say whether a classification is useful or know whether it needs improving. Validating a computer generated landscape classification is particularly difficult because of the complex nature of landscapes. It is not possible to develop a landscape classification and then compare this with the real world because landscape classes are human constructs that only exist in the mind. The components of landscapes can be assessed in the field, but landscapes are a generalisation and composition of these components. Validation of landscape classifications has not been seriously discussed in the literature, and is in itself a theoretical issue.
In science, it appears that classifications are validated by further research. Classifications can be seen as representations of knowledge. As knowledge of a particular field increases with research, the validity of the existing classification can be assessed. It is in this way that classifications have evolved. By using a landscape classification as a frame of reference in applied or theoretical research, the usefulness of that classification will become apparent. If inconsistencies result between different areas for the same class, then the classification has perhaps not captured the essence of the landscape character. For example, if a public preference survey shows that the quality of a class in one area is high, and in another area the same class is low, then the important characteristics of landscape may not have been totally incorporated in the classification. Unfortunately, in this study it is not possible to validate a landscape classification in this way because of the time and resources required to do further research. It will therefore not be possible to say whether the resulting classification is valid in this sense.
Two approaches for validating a process are: (1) to examine the outputs by comparing with a desired output, or (2) to examine the process itself (which includes input). If the process appears sound then the output can be assumed to be valid. With landscape classification it is difficult to validate an output because there is no correct output with which to compare it. Validation therefore needs to be predominantly process based.
A comparison will be made between existing manual approaches and a GIS approach, based on the general and specific criteria, to determine whether there has been an improvement. Only the classification processes can be compared, since there is no means of physically comparing the outcomes. Even comparing processes is difficult because manual methods are often intuitive.
Errors are another issue that will need to be considered in assessing validity. Errors include database errors, computational errors, and logical errors. Because GIS is particularly powerful at manipulating spatial information, errors can easily be propagated (Goodchild, 1993). It is therefore necessary to confirm that this has not happened in order to ensure that the classification is valid.
3.6 Past research
The use of computers for landscape assessment has mostly been limited to programs that give perspective views (such as VIEWIT), photomontage, and overlays of grids and polygons (Brown, 1981). Past research in automated landscape classification is extremely limited. There does not appear to be any automated process for classifying landscapes as a whole, although there has been some work on using GIS to identify some attributes of the landscape. Barbanente et al. (1992) used GIS and digital databases to automatically identify three landscapes: cliff, ravine, and a system of farms in a regular grid. Lesslie et al. (1988) used GIS to map wilderness areas in Tasmania, and Kliskey and Kearsley (1993) did a similar study in New Zealand. As mentioned previously, Duffield and Coppock (1975) used a primitive GIS with focal neighbourhood function capabilities to delineate recreational landscapes. These researchers have worked independently of the landscape theory previously discussed. They have examined only one part of the landscape, and have therefore not addressed which attributes are important for a total landscape classification. The components that they identified are not necessarily relevant to a landscape classification, and many of them cannot easily be incorporated into a total landscape classification because they are too detailed.
Jackson (1990) discussed the application of GIS for identifying landscape features in New Zealand using digital terrain models (DTM). This is one of the earliest published applications of this type of research in New Zealand. Jackson demonstrated how information on slope, aspect, contours, and views could be obtained. This was at a time when some of these functions were quite new in commercial GIS. Many more functions have since become available, and there has been considerable progress in the development of national digital databases, although some of the availability problems that Jackson mentioned are still pertinent today. It is now possible to implement some of Jackson's ideas on a larger scale and identify more features of the landscape.
There are many other landscape character assessment studies that have used GIS, including Brooke (1994) and Canterbury Regional Council (1993). In these studies GIS has been used mainly as a presentation tool, and the analysis has been done by non-GIS means. These studies are of limited use for this research because it is the automation of analysis within GIS that is being investigated. Although their criteria are not explicitly stated, they appear to be based on criteria different from those established in this study.
There has been some work by geomorphologists and hydrologists using GIS to automatically extract terrain information from digital databases, which is of considerable relevance to this research. Lay (1991), Cowen (1993), Dikau (1989), Tang (1992) and Weibel and DeLotto (1988) have all discussed different aspects of this type of research. These works are of interest because terrain information is important for characterising the landform attribute of the landscape classification, and because some of the techniques can be applied in this study.
Past research will be reviewed in detail in chapters 4 and 5 with specific reference to the different landscape attributes.
3.7 Summary
There is an increasing range of NDDB that contain information useful for landscape classification. Most of these databases contain low level conceptual information and can be classed as primary or secondary NDDB. It is important for decision makers that tertiary NDDB, which address a single theme such as landscape, are derived from them. A landscape classification requires high level conceptual information. Operational definitions based on four abstractions (classification, generalisation, association, and aggregation) provide a framework for deriving this information from the low level information available in NDDB. A difficult challenge in using GIS to classify landscapes automatically is expressing association and generalisation. Focal neighbourhood functions can be used to express association, and it has been argued that they are more appropriate than buffer functions, nearest distance calculations, and viewshed analysis. Generalisation can be achieved using attribute and spatial transformations along with a range of different conditional rules. The role of these procedures will be demonstrated in the following chapters, which deal systematically with each of the four attributes (vegetation, naturalness, water, and landform) to be used in the landscape classification.