Principles of
Geographic
Information
Systems
MODULE-2: Data management and processing
systems
Compiled by: Ujwala Sav
Ujwala.sav@vsit.edu.in
Vidyalankar School of
Information Technology
Wadala (E), Mumbai
www.vsit.edu.in
Certificate
This is to certify that the e-book titled “Principles of
Geographic Information Systems” comprises
all elementary learning tools for a better understanding of the relevant concepts.
This e-book is comprehensively compiled as per the predefined eight parameters
and guidelines.
Date: 22-01-2022
Mrs. Ujwala Sav
Assistant Professor
Department of IT
DISCLAIMER: The information contained in this e-book is compiled and distributed
for educational purposes only. This e-book has been designed to help learners
understand relevant concepts with a more dynamic interface. The compiler of this
e-book and Vidyalankar Institute of Technology give full and due credit to the
authors of the contents, developers and all websites from wherever information has
been sourced. We acknowledge our gratitude towards the websites YouTube,
Wikipedia, and Google search engine. No commercial benefits are being drawn from
this project.
Unit II
Contents:
Spatial Data Analysis
• Hardware and software trends
• Geographic information systems
• Stages of spatial data handling
• Database management systems
• GIS and spatial databases
Recommended Books
Principles of Geographic Information Systems- An Introductory Text Book
Editors: Otto Huisman and Rolf A. Fourth 2009
Principles of Geographic Information Systems P.A Burrough and R.A.McDonnell
Oxford University Press Third 1999
Fundamentals of Spatial Information Systems, R.Laurini and D. Thompson,
Academic Press 1994
Fundamentals of Geographic Information Systems Michael N.Demers Wiley
Publications Fourth 2009
Introduction to Geographic Information Systems Chang Kang-tsung
(Karl),McGrawHill Any above 3rd Edition 2013 7th Edition
GIS Fundamentals: A First Text on Geographic Information Systems Paul Bolstad
XanEdu Publishing Inc 5th Edition
Prerequisites and Linking:
Unit II Pre-requisites Sem. II Sem. III Sem. IV Sem. V Sem. VI
Data management and processing systems DBMS CGA Projects
Chapter 3
Data management and processing systems
The ability to manage and process spatial data is a critical component for any functioning
GIS. Data processing systems refer to hardware and software components which can
process, store and transfer data.
3.1 Hardware and software trends
Advances in computer hardware seem to take place at an ever-increasing rate. Every
several months, a faster, more powerful processor generation replaces the previous
one. The performance of today’s handheld computers is a multiple of the performance
that the first PC had when it was introduced in the early 1980s. In fact, current
PCs have orders of magnitude more memory and storage capacity than the so-called
minicomputers of 25 years ago. Computers are also becoming increasingly affordable.
Hand-held computers are now commonplace in business and personal use, equipping field
surveyors with powerful tools, complete with GPS capabilities for instantaneous
georeferencing. To support these hardware trends, software providers continue to
produce application programs and operating systems that, while providing a lot more
functionality, also consume significantly more memory. In general, software
technology has developed somewhat slower and often cannot fully utilize the
possibilities offered by the exponentially growing hardware capabilities. Existing
software obviously performs better when run on faster computers.
Today almost any computer on Earth can connect to some network, and contact
computers virtually anywhere else, allowing fast and reliable exchange of (spatial) data.
Mobile phones are more and more frequently used for data communication. The UMTS
protocol (Universal Mobile Telecommunications System) allows digital communication
of text, audio, and video at a rate of approximately 2 Mbps. The new HSDPA protocol
offers up to 10 times this speed. Looking at these developments, it is clear that the
combination of a GPS receiver, a portable computer and a mobile phone has already
dramatically changed our world, certainly so for out-of-office activities of Earth
science professionals.
Wireless LANs (Local Area Networks), under the so-called WiFi standard, nowadays
offer a bandwidth of up to 108 Mbps on a single connection point, to be shared
between computers. They are more and more used for constructing a computer network
in office buildings and in private homes.
Standard ‘dial-up’ telephone modems allow rates up to 56 kbps. Digital telephone
links (ISDN) support much higher rates: up to 1.5 Mbps. ADSL technology, widely
available through telephone companies on standard copper-wire networks, supports
transfer rates anywhere between 2 and 20 Mbps towards the customer (downstream),
and between 1 and 8 Mbps towards the network (upstream), depending on the internet
provider and the quality of the network infrastructure.
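To get a feel for these rates, the short Python sketch below estimates how long a data set would take to transfer at the nominal rates quoted above. The 100 MB file size and the helper name transfer_seconds are illustrative assumptions, and real throughput is lower than the nominal rate because of protocol overhead and shared connections.

```python
# Nominal data rates quoted in the text, in bits per second.
RATES_BPS = {
    "Dial-up modem (56 kbps)": 56_000,
    "ISDN (1.5 Mbps)": 1_500_000,
    "UMTS (2 Mbps)": 2_000_000,
    "ADSL downstream (20 Mbps)": 20_000_000,
    "WiFi (108 Mbps, shared)": 108_000_000,
}


def transfer_seconds(size_bytes, rate_bps):
    """Idealised transfer time: size in bits divided by the line rate."""
    return size_bytes * 8 / rate_bps


# Hypothetical data set size: 100 MB, taken here as 100 * 10**6 bytes.
FILE_SIZE_BYTES = 100 * 10**6

if __name__ == "__main__":
    for name, rate in RATES_BPS.items():
        minutes = transfer_seconds(FILE_SIZE_BYTES, rate) / 60
        print(f"{name:28s} {minutes:8.1f} min")
```

The same data set that needs only seconds on a fast link takes hours over a dial-up modem, which is why network capacity matters so much for exchanging spatial data.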
3.2 Geographic information systems
GIS provides a range of capabilities to handle georeferenced data, including:
1. Data capture and preparation
2. Data management (storage and maintenance)
3. Data manipulation and analysis, and
4. Data presentation.
Planning projects require data sources, both spatial and non-spatial, from different
national institutes, like national mapping agencies, geological, soil, and forest survey
institutes, and national census bureaus. The data sources obtained may be from
different time periods, and the spatial data may be in different scales or projections.
With the help of a GIS, the spatial data can be stored in digital form in world
coordinates. This makes scale transformations unnecessary, and the conversion
between map projections can be done easily with the software.
3.2.1 GIS software
GIS can be a data store (i.e. a system that stores spatial data), a toolbox, a technology,
an information source or a field of science. The main characteristics of a GIS software
package are its analytical functions that provide means for deriving new
geoinformation from existing spatial and attribute data.
The use of tools for problem solving is one thing, but the production of these tools is
something quite different. Not all tools are equally well-suited for an application, and
they can be improved and perfected to better serve a particular need or application.
The discipline of geographic information science is driven by the use of our GIS tools.
Some GISs have traditionally focused more on support for raster-based functionality,
others more on (vector-based) spatial objects. We can safely state that any package
that provides support for only rasters or only objects is not a complete GIS.
Well-known, full-fledged GIS packages include ILWIS, Intergraph’s GeoMedia, ESRI’s
ArcGIS, and MapInfo from MapInfo Corp.
There is no GIS package which is necessarily ‘better’ than another one: this depends
on factors such as the intended application, and the expertise of its user. ILWIS’s
traditional strengths are in raster processing and scientific spatial data analysis,
especially in project-based GIS applications. Intergraph, ESRI and MapInfo products
have been known better for their support of vector-based spatial data and their
operations, user interface and map production (a bit more typical of institutional GIS
applications).
Video on GIS: https://www.youtube.com/watch?v=-ZFmAAHBfOU
Video on GIS Software:
Types of GIS Software's | Types of GIS Software | gis mapping software | Geology GIS - YouTube
3.2.2 GIS architecture and functionality
A GIS consists of several functional components—components which support key GIS
functions. These are data capture and preparation, data storage, data analysis, and
presentation of spatial data. The figure below shows a diagram of these components, with arrows
indicating the data flow in the system. For a particular GIS, each of these components
may provide many or only a few functions.
Fig. Functional components of a GIS
Video on GIS System Architecture and Components
https://www.youtube.com/watch?v=4ze4rHXlxrM
3.2.3 Spatial Data Infrastructure (SDI)
Many organizations are forced to work in a cooperative setting in which geographic
information is obtained from, and provided to, partner organizations and the general
public. The sharing of spatial data between the various GISs in those organizations is
of key importance, and aspects of data dissemination, security, copyright
and pricing require special attention. The design and maintenance of a Spatial Data
Infrastructure (SDI) deals with these issues.
An SDI is defined as “the relevant base collection of technologies, policies and
institutional arrangements that facilitate the availability of and access to spatial data”.
Fundamental to those arrangements are in a wider sense the agreements between
organizations and in the narrow sense, the agreements between software systems on
how to share the geographic information.
Standards exist for all facets of GIS, ranging from data capture to data presentation.
They are developed by different organizations, of which the most prominent are the
International Organization for Standardization (ISO) and the Open Geospatial
Consortium (OGC).
Additional learning: To know details about OGC standards.
Standards (ogc.org.tw)
Video on SDI: https://www.youtube.com/watch?v=Dwd7jZD0sMM
3.3 Stages of spatial data handling
The functions for capturing data are closely related to the disciplines of surveying
engineering, photogrammetry, remote sensing, and the processes of digitizing, i.e. the
conversion of analogue data into digital representations.
3.3.1 Spatial data capture and preparation
Traditional techniques for obtaining spatial data, typically from paper sources,
included manual digitizing and scanning. The table below lists the main methods and devices
used for data capture. In recent years there has been a significant increase in the
availability and sharing of digital (geospatial) data.
Method and devices
Manual digitizing
• coordinate entry via keyboard
• digitizing tablet with cursor
• mouse cursor on the computer monitor (heads-up digitizing)
Automatic digitizing
• (digital) photogrammetry
• scanner
Semi-automatic digitizing
• line-following software
Input of available digital data
• CD-ROM or DVD-ROM
• via computer network or internet (including geo-webservices)
Additional learning: Explore the GIS data capturing methods.
Want to know more about data capture methods in GIS? (uizentrum.de)
The data, once obtained in some digital format, may not be quite ready for use in the
system. This may be because the format obtained from the capturing process is not
quite the format required for storage and further use, which means that some type of
data conversion is required.
3.3.2 Spatial data storage and maintenance
The way that data is stored plays a central role in the processing and the eventual
understanding of that data. In most of the available systems, spatial data is organized
in layers by theme and/or scale.
In a GIS, features are represented with their (geometric and non-geometric) attributes
and relationships. The geometry of features is represented with primitives of the
respective dimension: a windmill probably as a point, an agricultural field as a polygon.
Cells, pixels and voxels:
Vector data types describe an object through its boundary, thus dividing the space
into parts that are occupied by the respective objects.
The raster approach subdivides space into (regular) cells, mostly as a square
tessellation of dimension two or three. These cells are called either cells or pixels in 2D,
and voxels in 3D.
The data indicates for every cell which real world feature it covers, in case it represents
a discrete field. In case of a continuous field, the cell holds a representative value for
that field.
Raster representation
Advantages
• simple data structure
• simple implementation of overlays
• efficient for image processing
Disadvantages
• less compact data structure
• difficulties in representing topology
• cell boundaries independent of feature boundaries
Vector representation
Advantages
• efficient representation of topology
• adapts well to scale changes
• allows representing networks
• allows easy association with attribute data
Disadvantages
• complex data structure
• overlay more difficult to implement
• inefficient for image processing
• more update-intensive
Raster encoding
In the simplest raster encoding, the cell values are stored as one long list, row by
row. This simple encoding scheme is known as row ordering. The header of the raster file
will typically inform how many rows and columns the raster has, which encoding
scheme is used, and what sort of values are stored for each cell. Raster files can be
quite big data sets. For computational reasons, it is wise to organize the long list of
cell values in such a way that spatially nearby cells are also near to each other in the
list.
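As an illustration of row ordering, the sketch below flattens a small raster into a list row by row and converts between (row, column) positions and list positions. It also shows Morton (Z-order) indexing as one well-known way to keep spatially nearby cells close together in the list. The text does not name a specific scheme, so Morton order is our choice for illustration, and the function names and raster values are hypothetical.

```python
def row_order_index(row, col, n_cols):
    """Position of cell (row, col) in a row-ordered list."""
    return row * n_cols + col


def row_order_cell(index, n_cols):
    """Inverse of row_order_index: list position back to (row, col)."""
    return divmod(index, n_cols)


def morton_index(row, col, bits=16):
    """Z-order position: interleave the bits of row and col."""
    index = 0
    for b in range(bits):
        index |= ((col >> b) & 1) << (2 * b)      # column bits at even positions
        index |= ((row >> b) & 1) << (2 * b + 1)  # row bits at odd positions
    return index


# A 3 x 4 raster of land-use codes (made-up values).
raster = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]
n_rows, n_cols = len(raster), len(raster[0])

# Row ordering: first row left to right, then the second row, and so on.
flat = [value for row in raster for value in row]
```

With row ordering, two cells that are vertical neighbours lie a whole row width apart in the list. With Z-ordering, the four cells of each 2 x 2 block are stored next to each other.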
DBMS and spatial databases
GIS software packages provide support for both spatial and attribute data, i.e. they
accommodate spatial data storage using a vector approach, and attribute data using
tables. Historically, however, database management systems (DBMSs) have been
based on the notion of tables for data storage. For some time, substantial GIS
applications have been able to link to an external database to store attribute data and
make use of its superior data management functions. Currently, all major GIS packages
provide facilities to link with a DBMS and exchange attribute data with it. Spatial
(vector) and attribute data are still sometimes stored in separate structures, although
they can now be stored directly in a spatial database.
Data maintenance
Maintenance of (spatial) data can best be defined as the combined activities to keep
the data set up-to-date and as supportive as possible to the user community. It deals
with obtaining new data, and entering them into the system, possibly replacing
outdated data. The purpose is to have an up-to-date stored data set available. After a
major earthquake, for instance, we may have to update our road network data to
reflect that roads have been washed away, or have otherwise become impassable.
Updating spatial data stems from the requirements that the data users impose, as well
as the fact that many aspects of the real world change continuously. These data
updates can take different forms. It may be that a complete, new survey has been
carried out, from which an entirely new data set is derived that will replace the current
set.
3.3.3 Spatial query and analysis
The most distinguishing parts of a GIS are its functions for spatial analysis, i.e. operators
that use spatial data to derive new geoinformation. Spatial queries and process models
play an important role in this functionality. One of the key uses of
GISs has been to support spatial decisions. Spatial decision support systems (SDSS) are
a category of information systems composed of a database, GIS software, models, and
a so-called knowledge engine which allow users to deal specifically with locational
problems.
In a GIS, data are usually grouped into layers (or themes). Usually, several themes are
part of a project. The analysis functions of a GIS use the spatial and non-spatial
attributes of the data in a spatial database to provide answers to user questions. GIS
functions are used for maintenance of the data, and for analyzing the data in order to
infer information from it. Analysis of spatial data can be defined as computing new
information that provides new insight from the existing, stored spatial data.
Example: consider planning a new road. In mountainous areas this is a complex
engineering task with many cost factors, which include the number of tunnels and
bridges to be constructed, the total length of the tarmac, and the volume of rock and
soil to be moved. GIS can help to compute such costs on the basis of an up-to-date
digital elevation model and soil map.
3.3.4 Spatial data presentation
The presentation of spatial data, whether in print or on-screen, in maps or in tabular
displays, or as ‘raw data’, is closely related to the disciplines of cartography, printing
and publishing. The presentation may either be an end-product, for example as a
printed atlas, or an intermediate product, as in spatial data made available through the
internet.
Method and devices
Hard copy
• printer
• plotter (pen plotter, ink-jet printer, thermal transfer printer, electrostatic plotter)
• film writer
Soft copy
• computer screen
Output of digital data sets
• magnetic tape
• CD-ROM or DVD
• the Internet
3.4 Database management systems
Designing a database is not an easy task. Firstly, one must consider carefully what the
database purpose is, and who its users will be. Secondly, one needs to identify the
available data sources and define the format in which the data will be organized within
the database. This format is usually called the database structure. Lastly, data can be
entered into the database. It is important to keep the data up-to-date, and it is
therefore wise to set up the processes for this, and make someone responsible for
regular maintenance of the database.
3.4.1 Reasons for using a DBMS
There are various reasons why one would want to use a DBMS for data storage and
processing.
• A DBMS supports the storage and manipulation of very large data sets.
• A DBMS can be instructed to guard over data correctness.
• A DBMS supports the concurrent use of the same data set by many users.
• A DBMS provides a high-level, declarative query language.
• A DBMS always includes data backup and recovery functions to ensure data availability.
• A DBMS allows the control of data redundancy.
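Two of these points, guarding data correctness and a declarative query language, can be tried out with the SQLite engine that ships with Python. This is a minimal sketch: the Observation table, its columns and the sample values are invented for illustration and do not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throw-away in-memory database
conn.execute(
    """
    CREATE TABLE Observation (
        ObsId    INTEGER PRIMARY KEY,
        Station  TEXT NOT NULL,
        Rainfall REAL NOT NULL CHECK (Rainfall >= 0)  -- correctness rule
    )
    """
)
conn.executemany(
    "INSERT INTO Observation (ObsId, Station, Rainfall) VALUES (?, ?, ?)",
    [(1, "North", 12.5), (2, "North", 0.0), (3, "South", 30.2)],
)

# The DBMS rejects data that violates the declared constraint.
try:
    conn.execute("INSERT INTO Observation VALUES (4, 'South', -5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Declarative query: we state what we want, not how to compute it.
totals = conn.execute(
    "SELECT Station, SUM(Rainfall) FROM Observation "
    "GROUP BY Station ORDER BY Station"
).fetchall()
```

The negative rainfall value never reaches the table, and the total per station is obtained without writing any loop over the records.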
3.4.2 Alternatives for data management
The decision whether to use a DBMS will depend, among other things, on how much
data there is or will be, what type of use will be made of it, and how many users might
be involved.
When the data set is small, its use relatively simple, and with just one user, we might
use simple text files, and a text processor. Think of a personal address book as an
example, or a small set of simple field observations. Text files offer no support for data
analysis whatsoever, except perhaps in alphabetical sorting.
If our data set is still small and numeric by nature, and we have a single type of use in
mind, a spreadsheet program will suffice. This might be the case if we
have a number of field observations with measurements that we want to prepare for
statistical analysis, for example. However, if we carry out region- or nation-wide
censuses, with many observation stations and/or field observers and all sorts of
different measurements, one quickly needs a database to keep track of all the data.
Video on Introduction to database https://www.youtube.com/watch?v=d11viALaCvA
3.4.3 The relational data model
A data model is a language that allows the definition of:
The structures that will be used to store the base data,
The integrity constraints that the stored data must obey at all moments in time, and
The computer programs used to manipulate the data.
For the relational data model, the structures used to define the database are
attributes, tuples and relations. Computer programs either perform data extraction from
the database without altering it, in which case we call them queries, or they change
the database contents, and we speak of updates or transactions.
Relations, tuples and attributes
A table or relation is itself a collection of tuples (or records). In fact, each table is a
collection of tuples that are similarly shaped.
An attribute is a named field of a tuple, with which each tuple associates a value, the
tuple’s attribute value.
An attribute’s domain is a (possibly infinite) set of atomic values, such as the set of
integer number values or the set of real number values.
When a relation is created, we need to indicate what type of tuples it will store. This
means that we must:
1. Provide a name for the relation,
2. Indicate which attributes it will have, and
3. Set the domain of each attribute.
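In SQL, these three steps correspond to a single CREATE TABLE statement. The sketch below uses Python's built-in sqlite3 module. The Parcel relation and its attributes PId, Location and AreaSize follow the query examples in section 3.4.4, but the chosen domains (INTEGER, INTEGER, REAL) and the sample tuple are assumptions. SQLite treats declared column types as affinities rather than strictly enforced domains, so other DBMSs are stricter on this point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Parcel (      -- 1. name of the relation
        PId      INTEGER,      -- 2. its attributes, each with
        Location INTEGER,      -- 3. an (assumed) domain
        AreaSize REAL
    )
    """
)

# Each inserted row is one tuple of the relation (made-up values).
conn.execute("INSERT INTO Parcel VALUES (3421, 2001, 435.0)")

# Read the structure of the relation back from the database catalogue.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
schema = [
    (name, col_type)
    for _, name, col_type, *_ in conn.execute("PRAGMA table_info(Parcel)")
]
```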
Finding tuples and building links between them
A key of a relation comprises one or more attributes. A value for these attributes
uniquely identifies a tuple.
If we have a value for each of the key attributes, we are guaranteed to find no more
than one tuple in the table with that combination of values. It remains possible that
there is no tuple for the given combination.
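A key can be declared when the relation is created, after which the DBMS enforces uniqueness. In this sketch (using Python's sqlite3 module, with made-up parcel values), PId is assumed to be the key of Parcel, and find_parcel is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Parcel ("
    "PId INTEGER PRIMARY KEY, Location INTEGER, AreaSize REAL)"
)
conn.execute("INSERT INTO Parcel VALUES (3421, 2001, 435.0)")

# A second tuple with the same key value is refused by the DBMS.
try:
    conn.execute("INSERT INTO Parcel VALUES (3421, 2002, 550.0)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False


def find_parcel(pid):
    """Return the single tuple for a key value, or None if there is none."""
    cursor = conn.execute("SELECT * FROM Parcel WHERE PId = ?", (pid,))
    return cursor.fetchone()
```

Looking up an existing key value returns exactly one tuple, while an unknown key value returns None, matching the two cases described above.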
Video on DATABASE RELATIONAL MODEL
https://www.youtube.com/watch?v=zVwI9jrRWRw
3.4.4 Querying a relational database
The three query operators have some traits in common. First, all of them require input
and produce output, and both input and output are relations! This guarantees that the
output of one query (a relation) can be the input of another query, and this gives us
the possibility to build more and more complex queries.
The first operator is tuple selection. Tuple selection works like a filter: it allows
tuples that meet the selection condition to pass, and disallows tuples that do not meet
the condition.
SELECT * FROM Parcel
WHERE AreaSize > 1000
A second operator is attribute projection. Attribute projection works like a tuple
formatter: it passes through all tuples of the input but reshapes each of them in the
same way.
SELECT PId, Location FROM Parcel
Our third query operator differs from the two above in that it requires two input
relations.
The join operator takes two input relations and produces one output relation, gluing
two tuples together (one from each input relation), to form a bigger tuple, if they meet
a specified condition.
SELECT *
FROM TitleDeed, Parcel
WHERE TitleDeed.Plot = Parcel.PId
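Tuple selection and attribute projection are unary operators: each takes a single input
relation. The join operator is binary, since it takes two input relations.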
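The three operators can be tried on a small in-memory database. The sketch below runs the queries from the text with Python's sqlite3 module. The sample tuples, the Owner attribute of TitleDeed and the attribute domains are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Parcel (PId INTEGER PRIMARY KEY, Location INTEGER, AreaSize REAL);
    CREATE TABLE TitleDeed (Plot INTEGER, Owner TEXT);  -- Owner is hypothetical
    INSERT INTO Parcel VALUES (1, 101, 800.0), (2, 102, 1500.0), (3, 103, 2400.0);
    INSERT INTO TitleDeed VALUES (2, 'A. Smith'), (3, 'B. Jones');
    """
)

# Tuple selection: keep only the tuples that satisfy the condition.
selection = conn.execute(
    "SELECT * FROM Parcel WHERE AreaSize > 1000 ORDER BY PId"
).fetchall()

# Attribute projection: keep every tuple, but only some of its attributes.
projection = conn.execute(
    "SELECT PId, Location FROM Parcel ORDER BY PId"
).fetchall()

# Join: glue together tuples from two relations that meet the join condition.
join = conn.execute(
    "SELECT * FROM TitleDeed, Parcel "
    "WHERE TitleDeed.Plot = Parcel.PId ORDER BY Parcel.PId"
).fetchall()

# Because every result is again a relation, operators can be nested:
# here a projection is applied to the output of a selection.
nested = conn.execute(
    "SELECT PId FROM (SELECT * FROM Parcel WHERE AreaSize > 1000) ORDER BY PId"
).fetchall()
```

Parcel 1 has no title deed in this sample data, so it does not appear in the join result: the join only produces a combined tuple where the condition is met.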
3.5 GIS and spatial databases
3.5.1 Linking GIS and DBMS
GIS software provides support for spatial data and thematic or attribute data. GISs
have traditionally stored spatial data and attribute data separately. This required the
GIS to provide a link between the spatial data (represented with rasters or vectors),
and their non-spatial attribute data. The strength of GIS technology lies in its built-in
‘understanding’ of geographic space and all functions that derive from this, for
purposes such as storage, analysis, and map production.
DBMSs offer much better table functionality, since they are specifically designed for
this purpose. A lot of the data in GIS applications is attribute data, so it made sense to
use a DBMS for it. For this reason, many GIS applications have made use of external
DBMSs for data support. In this role, the DBMS serves as a centralized data repository
for all users, while each user runs her/his own GIS software that obtains its data from
the DBMS. This meant that a GIS had to link the spatial data represented with rasters
or vectors, and the attribute data stored in an external DBMS.
With raster representations, each raster cell stores a characteristic value. This value can
be used to look up attribute data in an accompanying database table.
With vector representations, our spatial objects—whether they are points, lines or
polygons—are automatically given a unique identifier by the system. This identifier is
usually just called the object ID or feature ID and is used to link the spatial object (as
represented in vectors) with its attribute data in an attribute table.
The ID in the vector system functions as a key, and any reference to an ID value in the
attribute database is a foreign key reference to the vector system. For example, in
Figure 3.8, parcel is a table with attributes, linked to the spatial objects stored in a GIS
by the Location column. Obviously, several tables may make references to the vector
system, but it is not uncommon to have some main table for which the ID is actually
also the key.
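The linking mechanism can be pictured with two plain Python structures: one standing in for the vector store of the GIS and one for the attribute table in the DBMS. All identifiers, coordinates and attribute values below are made up. The point is only that Location acts as a foreign key into the vector store.

```python
# Vector side (GIS): feature ID -> polygon geometry as a list of (x, y) vertices.
features = {
    2001: [(0, 0), (20, 0), (20, 30), (0, 30)],
    2002: [(20, 0), (50, 0), (50, 30), (20, 30)],
}

# Attribute side (DBMS): rows of a Parcel table; Location refers to a feature ID.
parcel_table = [
    {"PId": 3421, "Location": 2001, "AreaSize": 600.0},
    {"PId": 3422, "Location": 2002, "AreaSize": 900.0},
]


def geometry_of(pid):
    """Follow the foreign key from a parcel row to its geometry."""
    for row in parcel_table:
        if row["PId"] == pid:
            return features[row["Location"]]
    return None  # no parcel with this key


def attributes_of(feature_id):
    """Go the other way: from a feature ID to the attribute rows that refer to it."""
    return [row for row in parcel_table if row["Location"] == feature_id]
```

A click on a polygon in the map view would use attributes_of to fetch its table data, while a table query that selects parcels would use geometry_of to draw them.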
3.5.2 Spatial database functionality
The main problem with this arrangement was that a DBMS needs additional functionality
in order to process and manage spatial data. As the capabilities of our hardware to
process information have increased, so too has the desire for better ways to represent
and manage spatial data. During the 1990s, object-oriented and object-relational data
models were developed for just this purpose. These extend standard relational models
with support for objects, including ‘spatial’ objects.
Currently, GIS software packages are able to store spatial data using a range of
commercial and open source DBMSs such as Oracle, Informix, IBM DB2, Sybase, and
PostgreSQL, with the help of spatial extensions. Some GIS software packages have
integrated database ‘engines’, and therefore do not need these extensions.
ESRI’s ArcGIS, for example, has the main components of the MS Access database
software built-in. This means that the designer of a GIS application can
choose whether to store the application data in the GIS or in the DBMS.
GIS and spatial databases
Spatial data can be stored in a special database column, known as the geometry
column (or feature or shape, depending on the specific software package), as shown
in the figure below. This means GISs can rely fully on DBMS support for spatial data, making use
of a DBMS for data query and storage (and multi-user support), and GIS for spatial
functionality. Small-scale GIS applications may not require a multi-user capability and
can be supported by spatial data support from a personal database.
Fig. Geometry data stored directly in a spatial database table
Querying a spatial database
A spatial DBMS provides support for geographic coordinate systems and
transformations. It also provides storage of the relationships between features,
including the creation and storage of topological relationships.
For example, the following query finds the Thai restaurants within 2000 m of the
Hilton hotel:
SELECT R.Name
FROM Restaurants AS R, Hotels AS H
WHERE R.Type = 'Thai' AND H.Name = 'Hilton' AND
ST_Intersects(R.Geometry, ST_Buffer(H.Geometry, 2000))
In this case the WHERE clause uses the ST_Intersects function to perform a spatial join
between a 2000 m buffer of the selected hotel and the selected subset of restaurants.
The Geometry column carries the spatial data.
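To make the logic of this query concrete without a spatial DBMS, the sketch below reproduces it in plain Python under strong simplifying assumptions. Hotels and restaurants are points with planar coordinates in metres, so a 2000 m buffer around a hotel is a circle, and a restaurant intersects it when its distance to the hotel is at most 2000 m. The names and coordinates are invented, and a real spatial database would handle arbitrary geometries and coordinate systems.

```python
import math

# (name, type, x, y) with x, y in metres in some projected coordinate system.
restaurants = [
    ("Baan Thai", "Thai", 1200.0, 900.0),
    ("Siam Garden", "Thai", 3000.0, 2500.0),
    ("Pasta Roma", "Italian", 500.0, 400.0),
]
# (name, x, y)
hotels = [("Hilton", 0.0, 0.0), ("Ritz", 5000.0, 5000.0)]


def within_buffer(px, py, cx, cy, radius):
    """True if point (px, py) lies inside the circular buffer around (cx, cy)."""
    return math.hypot(px - cx, py - cy) <= radius


def thai_near(hotel_name, radius=2000.0):
    """Names of Thai restaurants within `radius` metres of the named hotel."""
    return [
        r_name
        for r_name, r_type, rx, ry in restaurants
        for h_name, hx, hy in hotels
        if r_type == "Thai"
        and h_name == hotel_name
        and within_buffer(rx, ry, hx, hy, radius)
    ]
```

The two nested loops mirror the FROM clause over both tables, and the three conditions mirror the WHERE clause: two attribute conditions and one spatial condition.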
Additional learning: To learn the mapping traits/diversity/species richness at
different resolution with code. Spatial Data Analysis Case Studies (rspatial.org)
Graded Questions
1. What are the stages of spatial data handling in GIS?
2. Write a note on Spatial Data presentation.
3. State the difference between raster and vector representation.
4. What is a DBMS? Explain the reasons for using one.
5. Explain the relational data model using a suitable example.
6. How can a query be implemented in a relational database? What are the three query
operators?
7. Explain Timestamp based protocol
8. What are the different ways of spatial data capture and preparation? Explain.
9. What is data capture and maintenance? What are the input methods and devices
used?
10. What are the functional components of a GIS?
Multiple Choice Questions
1) Full form of SDI is_____________________
A. spatial data industry
B. static data infrastructure
C. Spatial data infrastructure
D. none of them
2) Standards for all facets of GIS are developed by ___________ and _____________
A. ISI, ISO
B. ISO, OGC
C. ISI, OGC
D. none of them
3) Raster approach subdivides space into regular cells, In 2D these cells are called
A. pixel
B. Voxel
C. cell
D. cluster
4) Raster approach subdivides space into regular cells, In 3D these cells are called
A. pixel
B. Voxel
C. cell
D. cluster
5) Raster encoding is also called______________
A. column encoding
B. table encoding
C. row encoding
D. All of them
6) SDSS stands for_______________________
A. spatial data support system
B. structural data support system
C. Spatial decision support system
D. None of them
7) Primary key is represented in table by using____________
A. underscore(_)
B. hyphen(-)
C. Underline(________)
D. PK
8) The set of tuples in a relation at some point in time is called
A. instance
B. row
C. attribute
D. All of them
9) A spatial database allows users to ______________ the data
A. Store
B. Query
C. Manipulate
D. All of the above
10) The two unary query operators are ________ and ___________
A. Tuple selection and attribute projection
B. Tuple selection
C. Attribute projection
D. All of the above
11) OGC stands for _________________________
A. Open Geospatial Consortium
B. Origin of GIS Committee
C. Origami of GIS Council
D. All of the above
12) GIS software packages are able to store spatial data using a range of commercial
and open source DBMSs, such as ______________
A. Oracle
B. Informix
C. IBM DB2
D. All of the above
13) _________________ representation has a complex data structure
A. Vector
B. Raster
C. Boolean
D. Relational
14) The first operator, __________ selection, works like a filter: it allows tuples that meet
the selection condition to pass, and disallows tuples that do not meet the condition.
A. Table
B. Cell
C. Tuple
D. Column
15) DBMS works at the ________________ of GIS software to store and represent data.
A. Front End
B. Back End
C. Side End
D. Last End
Additional Activity: Learning by Doing LbD
Click on the link and solve the puzzle
https://puzzel.org/en/jigsaw/play?p=-MtytgfSBcvkaHQPV4pw
***