Interactive species distribution reporting, mapping and modelling using the World Wide Web

Tony Boston and David Stockwell Environmental Resources Information Network (ERIN), Canberra, Australia.

Second International WWW Conference '94: Mosaic and the Web, Chicago, USA

Abstract

The Australian Environmental Resources Information Network (ERIN) is setting up a national facility allowing ready access to key information about the Australian environment. The World Wide Web represents a revolution in our ability to provide access to such information through easy-to-use hypermedia computer interfaces.

This paper describes links which have been set up between the WWW and the ERIN database to allow fast, easy access to information on the distribution of plants and animals, on nature conservation and other protected areas, management information on government programs and projects, and information on data sets held by ERIN. Using the WWW Common Gateway Interface (CGI), a generic Perl script accepts form parameters from a Web client and passes them to database reports and other programs, which output HTML documents for viewing by the Web client.

One such application, an integrated, automated spatial reporting, mapping and modelling system for species distributions, is described in detail. Users simply specify a species name or click on a geographic region, and the system automatically produces reports, data files, maps and images of bioclimatic models based on the species' recorded distribution. The system interactively accesses the ERIN database for records of species occurrence, then passes these data to mapping and modelling programs. The modelling program GARP (Genetic Algorithm for Rule Set Production) simulates both BIOCLIM analyses and a variety of other forms of habitat modelling. The output from the modelling programs is a predicted potential distribution for a species. These predictions can be compared and validated against field observations and used to indicate new areas where a particular species may occur.

1. Introduction

Access to environmental information held in government and other databases should be made as easy as possible. There is a growing interest in, and need for, environmental information from government decision makers, scientists and the general public. The World Wide Web represents a revolution in our ability to provide access to such information through easy-to-use hypermedia computer interfaces.

The Australian Environmental Resources Information Network (ERIN) is setting up a national facility allowing ready access to key information about the Australian environment. ERIN was formed in 1989 to "draw together, upgrade and supplement information on the distribution of endangered species, vegetation types and heritage sites". ERIN aims to provide access to information about the environment, for use in decision making, through an easy-to-use computer interface. ERIN is using Internet information retrieval tools such as the World Wide Web and Gopher to provide access to information held in its database and geographic information systems.

The ERIN WWW server was made public in August 1993 and contains information on biodiversity, terrestrial landscapes, marine and coastal environments, weather and climate, human impacts and the state of the environment. This service is accessed about 2000 times per day, transferring an average of 9 MB of images and information per day to people in Australia (over 50%), the USA (30%), Canada (5%), the UK (4%) and over 30 other countries. The server uses the NCSA httpd v1.3 software running on a single-processor Sun SPARCstation 10 with the SunOS 4.1.3 operating system.

The ERIN database is made up of six integrated modules which have been implemented on a Sun DataCentre 2000 using ORACLE version 7 and the Solaris 2.3 OS. The database contains information on plants and animals and their recorded distributions, scientific names for plants and animals and the history of name changes, the conservation status of plants and animals, information on nature conservation areas in Australia, Australian government environmental programs and projects and a meta-data database of information on data sets held by ERIN and associated agencies.

2. WWW interface to environmental information

The World Wide Web [Bern1] is a wide-area hypermedia information retrieval project started in 1989 by Tim Berners-Lee at the CERN European Laboratory for Particle Physics. Documents in the World Wide Web are written in Hypertext Markup Language (HTML) [Bern2] and can contain references to images, sound and video files as well as marked-up text. URLs (Uniform Resource Locators) [Bern3] are used to locate information on the World Wide Web; each URL specifies the protocol, server and document identifier of the information to be retrieved.

Using the HTML hypertext format and URLs, it is possible to provide more than simple access to hypertext documents. Information references in the WWW can refer to dynamically generated information (Putz 1994) just as easily as to pre-existing files. Using the WWW Common Gateway Interface (CGI) [McCool], programs and scripts written in Perl, C or the UNIX shell can access data stores of any type and dynamically return information to a Web client based on query criteria provided by the user. This provides a powerful mechanism for making information available from diverse data sources such as databases, geographic information systems and modelling packages, and it is being used at many WWW sites around the world. Web clients treat dynamically generated documents exactly as they treat static documents, and these documents can likewise contain embedded references (URLs) to further information.

The interface between the WWW and the ERIN database is shown in Figure 1 below:

The database is accessible from any Web client that supports the ISMAP and forms facilities, such as NCSA Mosaic [Mosaic]. A generic CGI script was written in Perl (Wall and Schwartz 1990) to interface between a Web client and the ERIN database. The script takes the parameters a user enters in a form (sent with the 'POST' method) or supplies in a URL (sent with the 'GET' method), assembles them and passes them to a report which runs against the ERIN database. The reports are written in the SQR report writer but could just as easily be written in plain SQL or in SQL embedded in C. For example, the interface to information on Australian Public Lands and Protected Areas consists of the form shown in Figure 2. It uses the 'POST' method, and the report name is coded into the form as a hidden field:
<input TYPE="hidden" NAME="report" VALUE="MAN100R">
The user enters query parameters into form fields and selects values from pick lists before pressing the query button to access the database:

The same query could be performed using a URL in the 'GET' format:
http://kaos.erin.gov.au/cgi-bin/ERIS.pl?report=MAN100R&name=heron&region=QLD&type=Nature+Conservation+Reserve:Marine+Park
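The core of the generic script, reading the form data and separating the report name from the query parameters, can be sketched as follows. This is a minimal Python illustration, not the authors' Perl script: it assumes the standard URL-encoded form format, and the function names `read_cgi_query` and `dispatch` are hypothetical. Running the selected SQR report is deliberately left out.

```python
import io
from urllib.parse import parse_qs, urlsplit


def read_cgi_query(environ, stdin) -> str:
    """Return the raw form data as the CGI interface delivers it:
    the QUERY_STRING variable for 'GET', or CONTENT_LENGTH characters
    of standard input for 'POST'."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    return environ.get("QUERY_STRING", "")


def dispatch(query: str):
    """Decode url-encoded form data and split off the 'report' field.

    Returns (report_name, params), where params maps each remaining field
    name to its list of values ('+' is decoded to a space). Running the
    named report is outside this sketch.
    """
    fields = parse_qs(query, keep_blank_values=True)
    report = fields.pop("report", [None])[0]
    if report is None:
        raise ValueError("the form did not supply a 'report' field")
    return report, fields


# The 'GET' example from the text.
url = ("http://kaos.erin.gov.au/cgi-bin/ERIS.pl?report=MAN100R&name=heron"
       "&region=QLD&type=Nature+Conservation+Reserve:Marine+Park")
report, params = dispatch(urlsplit(url).query)

# The same report requested with 'POST' (request body simulated with StringIO).
post_body = read_cgi_query({"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "14"},
                           io.StringIO("report=MAN100R"))
post_report, post_params = dispatch(post_body)

# In a live CGI script the call would be dispatch(read_cgi_query(os.environ, sys.stdin)).
```

Keeping the report name in a form field is what lets one script serve every report: a new query needs only a new report and form, not a new CGI program.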

The reports return information to the client in HTML format, with embedded links to more detailed information held in the database. These links contain the information required to re-query the database (usually a table's primary key) in the 'GET' format. Using the primary key makes re-querying the database fast and efficient, especially as the record is already in memory. The interface allows users to specify a number of query parameters and receive a summary report of the records which satisfy these criteria, before examining individual records in detail. Using the WWW, it is possible to build an interface which lets the user navigate the network of information stored in the database via the links (primary and foreign keys) which exist between tables in the database schema. For instance, the user might navigate from information about a project, to the organisations and people involved in it, to the data sets generated by the project, and finally to the actual data. The power of the Web is that reports run against the database can return HTML documents and images which contain embedded links to related information. In this sense, users' access paths to information are dynamic.
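Links of this kind, each carrying a primary key in a 'GET' URL, can be generated along the following lines. In this Python sketch the detail report name `MAN110R`, the key field `id` and the record are all hypothetical; only the script URL comes from the example above.

```python
from html import escape
from urllib.parse import urlencode

SCRIPT_URL = "http://kaos.erin.gov.au/cgi-bin/ERIS.pl"


def detail_link(report: str, key_field: str, key_value, label: str) -> str:
    """Return an HTML anchor whose 'GET' URL re-queries one record by key."""
    url = SCRIPT_URL + "?" + urlencode({"report": report, key_field: key_value})
    # Escaping turns '&' into '&amp;' so the browser does not read
    # sequences such as '&id' as character entity references.
    return f'<A HREF="{escape(url)}">{escape(label)}</A>'


def summary_list(rows, report: str, key_field: str) -> str:
    """Render (primary key, label) rows as an HTML list of detail links."""
    items = [f"<LI>{detail_link(report, key_field, key, label)}"
             for key, label in rows]
    return "<UL>\n" + "\n".join(items) + "\n</UL>"


# Hypothetical detail report, key field and record.
page = summary_list([(1234, "Example Reserve")], "MAN110R", "id")
print(page)
```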

3. Interactive species distribution reporting, mapping and modelling

The ERIN database contains over 1 million records of the occurrence of plants and animals in Australia. This database has been compiled from museum and herbarium specimen records as well as observational information from flora and fauna surveys. The information compiled has focused on endangered species and major landcover vegetation species such as Eucalypts, Acacias, Chenopods and arid grass species. The data have been loaded and validated for species name and geographic location. Most of these data are freely available, but for some data sets the exact locations of the species are sensitive, and in these cases the latitude and longitude are rounded to the centre of a 1/2-degree grid cell.

Users require answers to questions such as:

What is the distribution of species X?
Where else might species X occur?
What species occur in region Y?

The application allows the user to access the information either by species name or by region, where the regions are the 1:100,000 Australian topographic map sheets.

3.1 Access by species name

The user can query the database for a particular species. The user enters the genus and species names, and optionally the subspecies name, and selects the type of output desired. At present the output types are:

Data File - raw data records in a comma delimited ASCII file which can be loaded into the user's own package for analysis
Report - an ASCII report file of all data for the species
Map - displays the data as a series of points on a map of Australia
BIOCLIM Model - produces a bioclimatic model of the predicted distribution of the species based on climatic variables
GARP Model - produces a model of the potential distribution of the species, derived by a genetic algorithm from environmental variables

The output type form field has a name of "report" and a value which corresponds to the report to be run. This report name is sent to the CGI Perl script and determines which output type is returned to the client.

3.1.1 Help system

Help is provided for the scientific names of plants and animals; access via common names is planned as a future enhancement. The help system first asks users whether they need help with plant or animal names, and for the first letter of the genus name. On querying the database, a list of genus names is returned in a scrollable pick list. After the user chooses a genus name and presses the query button, a form is returned with all species names for that genus. The user can then select the species name and output type and query the database. This series of help screens is controlled by the generic CGI Perl script and a single SQR report, which detects what the user requires from hidden fields in the form. Hidden fields in dynamically generated HTML documents provide a mechanism for keeping track of what users have queried ('state' information).
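The hidden-field mechanism used by the help system can be sketched as follows. The field names (`group`, `initial`, `genus`), the report name `HELP` and the step names are hypothetical; the sketch only illustrates how state written into one generated form is read back to choose the next screen.

```python
from html import escape


def hidden_fields(state: dict) -> str:
    """Write query state as hidden INPUT fields, so it comes back to the
    CGI script with the user's next form submission."""
    return "\n".join(
        f'<INPUT TYPE="hidden" NAME="{escape(name)}" VALUE="{escape(str(value))}">'
        for name, value in state.items())


def next_help_step(fields: dict) -> str:
    """Choose the next help screen from the fields that were submitted."""
    if "genus" in fields:
        return "species_list"   # genus chosen: list its species
    if "initial" in fields:
        return "genus_list"     # plant/animal and first letter given
    return "choose_group"       # first visit: ask plant or animal


# Hypothetical state carried on the species-list form for the koala's genus.
state = {"report": "HELP", "group": "animal", "genus": "Phascolarctos"}
form_state = hidden_fields(state)
print(form_state)
```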

3.1.2 Map output

The map output option produces a map of the Australian continent and displays the species occurrence records as a series of points. Figure 3 shows a typical output for the koala (Phascolarctos cinereus).

Also included is a report summarising the institutions from which ERIN obtained the information and the years during which the original data were collected. The user can find additional details about records from a particular location by selecting an output type (Data File or Report) and clicking on the map. This uses the ISMAP form facility to combine a form choice (Data File or Report) with the point the user selects on the map. The form calls a CGI Perl script which records the user's choice in a temporary file and calls the 'imagemap' program to translate the X,Y pixel location on the map into the corresponding 1:100,000 map sheet. The imagemap program then calls another CGI Perl script, which assembles all the user parameters and passes them to the chosen SQR report. The report returns either a data file or a report of all records of the chosen species found within the selected 1:100,000 map sheet.

The map output is built from location data (latitude and longitude) retrieved from the ERIN database. These are converted to MIFF image format by a program called 'rasterize', overlain on an outline of Australia and then converted to GIF format by the ImageMagick [Cristy] public domain software. A typical species map of 1000 points takes about 15 seconds to complete, and the resulting file is between 5 and 15 Kbytes depending on the number of points. The map is a simple raster file in a geographical projection at a scale of approximately 1:30,000,000 with a cell size of 10 minutes. Each 1:100,000 map sheet is 1/2 degree (30 minutes) square, which corresponds to a square of 3 x 3 pixels on the map image.
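The conversions between coordinates, map pixels and 1:100,000 sheets implied by these figures can be sketched as follows. This Python sketch rests on assumptions: the map image's top-left corner at 112°E, 10°S is hypothetical, sheets are assumed to align with half-degree boundaries from that corner, and sheets are identified by (column, row) rather than by their official sheet numbers.

```python
CELL_DEG = 10 / 60          # 10-minute map cells, as in the text
SHEET_DEG = 30 / 60         # 1:100,000 sheets are 1/2 degree square
WEST, NORTH = 112.0, -10.0  # hypothetical top-left corner of the map image
PIXELS_PER_SHEET = round(SHEET_DEG / CELL_DEG)   # 30' / 10' = 3


def lonlat_to_pixel(lon, lat):
    """(x, y) pixel of the 10-minute cell containing a point; y grows southward."""
    return int((lon - WEST) // CELL_DEG), int((NORTH - lat) // CELL_DEG)


def pixel_to_sheet(x, y):
    """(column, row) of the half-degree sheet containing a clicked pixel."""
    return x // PIXELS_PER_SHEET, y // PIXELS_PER_SHEET


def sheet_bounds(col, row):
    """(west, south, east, north) of a sheet in decimal degrees, e.g. to
    restrict a database query to records inside that sheet."""
    west = WEST + col * SHEET_DEG
    north = NORTH - row * SHEET_DEG
    return west, north - SHEET_DEG, west + SHEET_DEG, north


x, y = lonlat_to_pixel(149.1, -35.3)   # a hypothetical record
sheet = pixel_to_sheet(x, y)
print((x, y), sheet, sheet_bounds(*sheet))
```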

3.1.3 Model output

The modelling program GARP (Genetic Algorithm for Rule Set Production) [Stockwell] simulates both BIOCLIM analyses and a variety of other forms of habitat modelling. The output from the modelling programs is a predicted potential distribution for a species. These predictions can be compared and validated against field observations and used to indicate new areas where a particular species may occur.

BIOCLIM analyses can be used for predicting the potential distribution of species or vegetation types based on the values of a set of climatic variables estimated for all sites. The approach was conceptually developed by Nix (1986) and first made available as a package by Busby (1986). The BIOCLIM method uses 36 climatic variables estimated at each site where the species has been recorded: the 12 monthly mean values for each of precipitation, maximum temperature and minimum temperature. Combinations of these are used to derive 16 indices considered to have biological influence. These are used to build a climate profile for the species, which is then used to predict its potential distribution. An example of a BIOCLIM model for the species Eucalyptus tetrodonta is shown in Figure 4:

The central assumptions of this model are:

the distribution of a species is very strongly influenced by climate
the climate variables are normally distributed
all variables with restricted ranges influence the species of interest
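A climate profile of the kind described above can be sketched as a simple envelope test. This Python sketch assumes, in line with the normality assumption listed above, that a site is predicted suitable when every variable lies within k standard deviations of its mean over the recorded sites; the value k = 2 and the two example variables are illustrative only, and the BIOCLIM package's own envelope rule may differ.

```python
from statistics import mean, stdev


def climate_profile(sites):
    """Mean and standard deviation of each variable over the recorded sites.
    `sites` holds one list of variable values per site (needs >= 2 sites)."""
    return [(mean(col), stdev(col)) for col in zip(*sites)]


def within_envelope(profile, site, k=2.0):
    """Predict the site suitable if every variable is within mean +/- k SD."""
    return all(abs(v - m) <= k * s for (m, s), v in zip(profile, site))


# Hypothetical values of two indices (annual mean temperature in deg C,
# annual precipitation in mm) at three recorded sites.
recorded = [[20.0, 900], [22.0, 1100], [21.0, 1000]]
profile = climate_profile(recorded)
print(profile)
print(within_envelope(profile, [21.5, 1050]))   # True
print(within_envelope(profile, [26.0, 1000]))   # False
```

Because every variable must fall inside its range, a single restricted variable is enough to exclude a site, which is how the third assumption above enters the prediction.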

The GARP model uses a genetic algorithm (Stockwell and Noble 1992) to predict potential species distributions from the BIOCLIM indices together with 13 other terrain, soil and geology variables. It aims to develop models with:

all forms of environmental data, including categorical data and both presence and presence/absence data
arbitrary distributions, including bi-modal (two-peaked)
concise rules using few variables

The following sections describe the operation of the GARP application:

3.1.3.1 Database

The data are extracted from the ERIN database as an ASCII point coverage file listing the locations at which the species has been recorded:

X coordinate (longitude in decimal degrees)
Y coordinate (latitude in decimal degrees)

These data serve as input to the rasterize program, which outputs the number of occurrences in each 10-minute grid cell as a list of bytes, scaled between 1 and 254 and arranged in a row-by-row scan pattern. This format is called a 'layer'.
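The rasterize step can be sketched as follows. The linear scaling of counts into 1 to 254, the use of 0 for cells with no records and the grid extent in the example are assumptions; the text specifies only the 10-minute cells, the 1 to 254 range and the row-ordered byte layout.

```python
def rasterize(points, west, north, ncols, nrows, cell=10 / 60):
    """Build a 'layer' from (longitude, latitude) points in decimal degrees.

    Counts the points in each cell of a grid whose top-left corner is at
    (west, north), then returns nrows * ncols bytes in row-by-row order.
    Non-empty cells are scaled linearly into 1..254; empty cells are 0.
    """
    counts = [0] * (ncols * nrows)
    for lon, lat in points:
        col = int((lon - west) // cell)
        row = int((north - lat) // cell)
        if 0 <= col < ncols and 0 <= row < nrows:   # ignore points off the grid
            counts[row * ncols + col] += 1
    top = max(counts, default=0)
    span = max(top - 1, 1)
    return bytes(0 if c == 0 else 1 + (c - 1) * 253 // span for c in counts)


# Three hypothetical records on a 3 x 2 grid with its corner at 112E, 10S.
layer = rasterize([(112.05, -10.05), (112.05, -10.10), (112.20, -10.05)],
                  west=112.0, north=-10.0, ncols=3, nrows=2)
print(list(layer))   # [254, 1, 0, 0, 0, 0]
```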

The program 'presample' uses these layers to draw two random samples of the point data, which are output as a training file 'train' and a testing file 'test'. A point is not selected if a mask variable contains 0 or 255 at that location, and options allow values of the predicted variable to be selected in given proportions.
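The sampling rule can be sketched in Python as below. Representing samples as (cell index, predicted value) pairs is an assumption, and the option for sampling values of the predicted variable in given proportions is not implemented in this sketch.

```python
import random


def presample(predicted: bytes, mask: bytes, n_train: int, n_test: int, seed=None):
    """Draw disjoint random 'train' and 'test' samples of grid cells.

    Cells whose mask value is 0 or 255 are never selected. Each sample is a
    list of (cell index, predicted value) pairs.
    """
    rng = random.Random(seed)
    usable = [i for i, m in enumerate(mask) if m not in (0, 255)]
    chosen = rng.sample(usable, min(n_train + n_test, len(usable)))
    pairs = [(i, predicted[i]) for i in chosen]
    return pairs[:n_train], pairs[n_train:]


predicted = bytes([1, 2, 3, 4, 5, 6])
mask = bytes([0, 10, 10, 255, 10, 10])   # cells 0 and 3 are masked out
train, test = presample(predicted, mask, 2, 2, seed=1)
print(train, test)
```

Drawing both samples in one call keeps 'train' and 'test' disjoint, so the accuracy later measured on 'test' is not inflated by cells the model was fitted to.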

3.1.3.2 Modelling

The output of the presample program, called 'train', is used as input to the GARP algorithm in the program 'initial'. Initial produces an initial model as a starting point for the GARP algorithm, and the GARP algorithm produces a file 'model' as output.
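The relationship between the initial model and the genetic algorithm can be illustrated with a deliberately small genetic algorithm. This Python sketch is not GARP: it evolves a single range (envelope) rule rather than a set of rules of several types, scores rules by accuracy on the training data, and uses truncation selection, crossover and mutation with arbitrary settings. Taking the initial model as the range of each variable over the presence records is an assumption about what 'initial' produces.

```python
import random


def predict(rule, x):
    """A range rule predicts presence (1) if every variable is within range."""
    return int(all(lo <= v <= hi for (lo, hi), v in zip(rule, x)))


def accuracy(rule, data):
    """Fraction of (variables, presence) training pairs predicted correctly."""
    return sum(predict(rule, x) == y for x, y in data) / len(data)


def initial_rule(data):
    """Starting model: each variable's range over the presence records."""
    present = [x for x, y in data if y == 1]
    return [(min(col), max(col)) for col in zip(*present)]


def mutate(rule, rng, scale):
    """Move one bound of one variable by a random step, keeping lo <= hi."""
    rule = list(rule)
    i = rng.randrange(len(rule))
    lo, hi = rule[i]
    if rng.random() < 0.5:
        lo += rng.gauss(0, scale[i])
    else:
        hi += rng.gauss(0, scale[i])
    rule[i] = (min(lo, hi), max(lo, hi))
    return rule


def crossover(a, b, rng):
    """Child takes each variable's range from one parent or the other."""
    return [ra if rng.random() < 0.5 else rb for ra, rb in zip(a, b)]


def evolve(data, generations=50, size=20, seed=0):
    """Evolve range rules; the fitter half of each generation survives."""
    rng = random.Random(seed)
    start = initial_rule(data)
    scale = [max(hi - lo, 1e-9) / 4 for lo, hi in start]
    pop = [start] + [mutate(start, rng, scale) for _ in range(size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda r: accuracy(r, data), reverse=True)
        parents = pop[: size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                           rng, scale)
                    for _ in range(size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda r: accuracy(r, data))


# Hypothetical training data: ([temperature, rainfall], presence).
training = [([20, 900], 1), ([22, 1100], 1), ([21, 1000], 1),
            ([30, 300], 0), ([10, 2000], 0), ([21.5, 400], 0)]
model = evolve(training)
print(model, accuracy(model, training))
```

Because the fitter half of each generation is carried forward unchanged, the best rule found can never score worse on the training data than the initial model.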

Once the model has been developed using GARP, the model can be used with the program 'verify' to test the performance of the model. Verify provides an estimate of overall predictive accuracy and a confusion matrix which tabulates the accuracy of the prediction in each category over the files train and test.
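The two figures reported by 'verify' can be computed as below; in the program they would be produced once for 'train' and once for 'test'. The categories and predictions in the example are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each actual category received each predicted one."""
    return Counter(zip(actual, predicted))


def overall_accuracy(matrix):
    """Proportion of cases on the diagonal (prediction equals actual)."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


def format_matrix(matrix, categories):
    """Lay the counts out with actual categories as rows."""
    header = "actual/pred" + "".join(f"{c:>6}" for c in categories)
    rows = [f"{a:>11}" + "".join(f"{matrix[(a, p)]:>6}" for p in categories)
            for a in categories]
    return "\n".join([header] + rows)


# Hypothetical presence (1) / absence (0) outcomes for six test cells.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1]
matrix = confusion_matrix(actual, predicted)
print(format_matrix(matrix, [0, 1]))
print(overall_accuracy(matrix))
```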

The program 'predict' then generates the prediction. Predict takes the file 'population' as input and outputs four layers:

predI - the prediction of the model
fragI - the rule used to predict the outcome
certI - the certainty, a form of error
probI - a probability surface for the prediction

The program 'translate' translates the rules into natural language.
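How an ordered rule set could yield the predI and fragI layers, and how 'translate' might phrase a rule, can be sketched as follows. The rule representation (a dictionary of variable ranges plus an outcome), first-match ordering and the sentence template are assumptions; the certI and probI layers are not modelled.

```python
def apply_rules(rules, cells):
    """Apply an ordered rule set to each cell's variable values.

    Each rule is (ranges, outcome), where ranges maps a variable index to
    an inclusive (low, high) range. The first rule whose ranges all match
    decides the cell. Returns two lists: pred (the predicted outcome) and
    frag (the 1-based number of the rule used); both are 0 where no rule
    applies.
    """
    pred, frag = [], []
    for values in cells:
        for number, (ranges, outcome) in enumerate(rules, start=1):
            if all(lo <= values[i] <= hi for i, (lo, hi) in ranges.items()):
                pred.append(outcome)
                frag.append(number)
                break
        else:   # no rule matched this cell
            pred.append(0)
            frag.append(0)
    return pred, frag


def translate(rule, names):
    """Phrase a rule as an English sentence using the variable names."""
    ranges, outcome = rule
    conditions = " and ".join(f"{names[i]} is between {lo} and {hi}"
                              for i, (lo, hi) in ranges.items())
    verdict = "present" if outcome == 1 else "absent"
    return f"If {conditions} then the species is predicted to be {verdict}."


# Hypothetical variables and rules (1 = present, 0 = absent).
names = ["annual mean temperature", "annual rainfall (mm)"]
rules = [({0: (20, 22), 1: (900, 1100)}, 1), ({1: (0, 500)}, 0)]
pred, frag = apply_rules(rules, [[21, 1000], [25, 300], [15, 700]])
print(pred, frag)   # [1, 0, 0] [1, 2, 0]
print(translate(rules[0], names))
```

Letting a rule name only some variables (the second rule uses rainfall alone) is one way to get the concise rules that GARP aims for.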

3.1.3.3 Visualisation

The layers are output in MIFF image format and then converted to GIF format by the public domain software ImageMagick for viewing on the WWW. An overall visualisation of the model output is produced, as shown in Figure 4, together with images showing the original point data, the probability of occurrence, the model prediction over the area, an explanation of the rules used to make predictions, and the certainty (error) of the probability surface.

A typical BIOCLIM or GARP analysis takes approximately 1 minute to complete. The modelling programs which make up GARP were written in C by David Stockwell and are available on request by emailing davids@erin.gov.au.

3.2 Access by geographic region

This part of the application is still under development. The user is presented with a map of Australia and can zoom into a particular state or territory using a simple ISMAP call. The state is then displayed as an ISMAP form with a pick list of the category of species required:

Acacias
Chenopods
Eucalypts
Grasses
Rare or Threatened Plants
All Plants in the ERIN database
Birds
Koalas
Snakes
Endangered and Vulnerable Animals

The state/territory map has a grid overlay of the 1:250,000 map sheets. Clicking on the map resolves the selected point to a 1:100,000 map sheet and queries the database for a list of species recorded on that map sheet. The user can then click on a particular species name, and the system builds a form from which they can access the reports, maps and models described in Section 3.1.

4. Future Enhancements

The species reporting, mapping and modelling application described here has many limitations. Among the most obvious are the inability to zoom in and the lack of contextual information such as cities, major roads and rivers. It may be possible in future to extend the GARP modelling application to display vector data, or to move the mapping part of the application to another mapping package such as ARC/INFO so that such data held in the ERIN GIS can be displayed. Improvements to the ISMAP facility in Web clients, such as a built-in ability to define a region by clicking and dragging (to form a box) or to define a polygon by a series of mouse clicks, would enable better spatial searching and zooming.

5. Conclusion

Using the World Wide Web, dynamic query and retrieval interfaces can be created easily between WWW clients and other data sources such as databases, geographic information systems and modelling systems. The WWW has created an open, well documented, simple and effective means for data centres to provide easy to use and interesting hypermedia interfaces to their information.

6. Acknowledgements

David Crossley of ERIN, who prepared the maps and ISMAP files used for access by geographic region (Section 3.2).
ERIN Unit staff for their work in collating the data in the ERIN database and for encouragement, support and suggestions.

7. References

[Bern1]
Berners-Lee, T. (1994).
The World Wide Web Initiative.
CERN WWW Documentation.
http://info.cern.ch/hypertext/WWW/TheProject.html
[Bern2]
Berners-Lee, T. and Connolly, D. (Eds) (1994).
Hypertext Markup Language (HTML).
CERN, IETF Internet Draft v1.2.
http://info.cern.ch/hypertext/WWW/MarkUp/HTML.html
[Bern3]
Berners-Lee, T. (Ed) (1994).
Uniform Resource Locators
CERN, IETF Internet Draft.
http://info.cern.ch/hypertext/WWW/Addressing/URL/Overview.html
Busby, J.R. (1986).
A biogeoclimatic analysis of Nothofagus cunninghamii (Hook.) Oerst. in southeastern Australia.
Australian Journal of Ecology 11: 1-7

[Cristy]
Cristy, J. (1994).
ImageMagick v 2.3.6 94/01/01
Copyright 1994 E. I. du Pont de Nemours & Company
ftp://ftp.x.org/contrib/applications/ImageMagick/

[McCool]
McCool, R. (1994).
The Common Gateway Interface.
NCSA WWW Documentation.
http://hoohoo.ncsa.uiuc.edu/cgi/overview.html
[Mosaic]
Mosaic Project Team (1994).
NCSA Mosaic Home Page.
NCSA WWW Documentation.
http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/NCSAMosaicHome.html
Nix, H. (1986).
A biogeographic analysis of Australian elapid snakes.
Atlas of elapid snakes of Australia. Australian Flora and Fauna Series, Number 7. pp. 4-15. Bureau of Flora and Fauna. Canberra: Australian Government Publishing Service.

Putz, S. (1994).
Interactive Information Services using the World-Wide Web Hypertext.
CERN: First International Conference on World-Wide Web, May 25-27, 1994. http://www1.cern.ch/PapersWWW94/putz.ps
Stockwell, D.R.B. and Noble, I.R. (1992).
Induction of sets of rules from animal distribution data: a robust and informative method of data analysis.
Mathematics and Computers in Simulation 33: 385-390

[Stockwell]
Stockwell, D.R.B. (1994).
Genetic Algorithm for Rule-set Production (GARP).
ERIN WWW Server.
http://kaos.erin.gov.au/general/biodiv_model/ERIN/GARP/home.html
Wall L. and Schwartz R.L. (1990).
Programming Perl.
Sebastopol, CA: O'Reilly & Associates.

About the authors

Tony Boston completed a Bachelor of Science (Hons) degree majoring in Geology from the Australian National University in 1983. After working as a Petroleum Geologist in industry and government for 4 years, he moved into the computing field and completed a Graduate Diploma in Computing Studies from the University of Canberra in 1992.

Since 1990 Tony has worked at the Environmental Resources Information Network managing the development of the ERIN database and more recently the ERIN Gopher and World Wide Web services which went public in March and August 1993 respectively. His main interests lie in the development and use of computer applications to store, manipulate, analyse and display environmental information.

David Stockwell completed a Bachelor of Science (Hons) degree majoring in Statistics from the Australian National University in 1987. He completed a PhD degree in applications of machine learning to Ecology with the Ecosystems Dynamics Group, ANU in 1992. His research interests include modelling theory (especially in ecology), AI approaches to modelling and biodiversity databases.

Since 1992 David has completed a number of consultancies in the area of ecological modelling. In a collaborative project involving ERIN, the Australian Nature Conservation Agency and the Tasmanian Parks and Wildlife Service, he developed GARP (Genetic Algorithm for Rule-set Production), which uses environmental data collected through field surveys to produce species distribution maps and models.

Please address all enquiries to:

Tony Boston <tony@erin.gov.au>
Database/ WWW/ Gopher Manager
Environmental Resources Information Network (ERIN)
Department of the Environment, Sport and Territories
GPO Box 787 Canberra ACT 2601 Australia

Last modified: Wed Sep 14 14:30:05 EST 1994