DBM449 Lab 4 OpenGIS Implementation in mySQL
OBJECTIVES
Understand and become familiar with current capabilities and limitations of the OpenGIS implementation in MySQL.
Learn to create, update, and use spatial indices.
Explore practical approaches to calculating distances between points on the Earth’s surface.
Understand fundamentals of geotagging.
Create stored procedures to determine real-world distances, and to process spatial queries returning result sets of data points within a bounding rectangle.
Explore visualization of GIS data.
- PARTS LIST
- MySQL (dev.mysql.com/downloads)
III. PROCEDURE
The argument could be made that Business Intelligence (BI) and Data Analytics revolutionized Online Analytical Processing (OLAP) by making it simple for users to traverse, examine, and visualize different aggregations of data over the dimension of time. Geographic Information Systems, once an arcane, rare, expensive, and highly specialized type of information system, have brought about a similar revolution using the spatial dimension. As these systems have become affordable and entered the mainstream—indeed, they are now ubiquitous—they have also become mainstream; or perhaps it would be more accurate to say that mainstream DBMS systems have come to commonly adopt and integrate the specialized data structures and algorithms required to implement spatially enabled, data-driven systems at will.
In this laboratory exercise, you will create a GIS-enabled database by implementing a spatially indexed table, populating it with spatially encoded data, and creating stored procedures to provide augmented functionality to determine distances between points, and to process queries returning results containing points within spatially defined boundaries. Finally, you will learn to express and explore spatial data in its most natural and intuitive form: visually displayed as maps and plots.
This lab may be completed using MySQL running on either your own computer, or the DeVry iLab. In either case, it is presumed that you will begin the initial lab step AFTER addressing any necessary routine housekeeping chores, such as creating an appropriate schema (e.g., DBM449LAB2), and creating any necessary user accounts, permissions, and so on.
Note: At the time this lab was written, the full OPENGIS standard was not implemented in the current production release of MySQL, but new features were being added with each point release of MySQL. It is entirely possible—in fact, quite likely—that improved OPENGIS compliance, including functions for distance calculation for spherical projections (e.g., points on the Earth’s surface) and circular proximities (e.g., “within radius of”) will become built-in to the MySQL database. However, as the study of the underlying mathematical and topographical principles needed to implement non-planar distance calculations, and for determining envelope or bounding box point results are a very worthwhile study, you should implement your own stored procedures for these functions, rather than substituting any built-in capabilities that become available. You may, however, repeat steps using such features as available, in order to compare and study the similarities and differences between your calculation methods and those later implemented as part of the OPENGIS API.
Important Further Note: At the time of this writing, spatial indices are supported ONLY in the MyISAM storage engine, and not in the InnoDB or other storage engines. BE SURE TO CREATE YOUR DATABASE TABLE FOR THIS LAB USING THE MyISAM STORAGE ENGINE!
DBM449 Lab 4 Questions
Objectives: (In your own words what was this lab designed to accomplish? What was its purpose?)
Results: (Discuss the steps you used to complete your lab. Were you successful? What did you learn? What were the results? Explain what you did to accomplish each step. You can include screen shots, code listings, and so on. to clearly explain what you did. Be sure to record all results specifically directed by the lab procedure. Number all results to reflect the procedure number to which they correspond.)
- Create the table indicated in the following ERD.
- Be sure you have first addressed the assigned research for this week’s unit involving Spatial Indices, and then use the following DDL to create a spatial index on the table just created:
- Paste the complete SQL Data Definition Language (DDL) you used to create this table and index into your lab report.
- Choose a point of interest (e.g., your house, your local DeVry campus, etc.), and at least three additional points within 20 miles, and three additional points more than 40 miles from the first point. For example, I chose my house, and three favorite restaurants in town, and three favorite restaurants in a distant town where I used to live. Using Google Maps or other service capable of converting street addresses to geographical (longitude and latitude) coordinates with good precision, note the geolocation of each point. Record this data in your lab report.
- From your research, you should anticipate that you cannot simply insert these values directly. Model your insert statements for the data to be inserted into your Points table on the following example.
- CHECKPOINT QUESTION: Explain what the GeomFromText() function does, and why it is necessary to use this? Paste your response into your lab report.
- Run your insert statement(s) to add the data to the Points table. Paste a screen shot showing your SQL statement AND result into your lab report.
- Attempt to retrieve all of the table’s contents using a SELECT * statement. You should find that this does not produce readable results. Your results may resemble the following.
- CHECKPOINT QUESTION: Why does this query not produce the results you might typically expect from a SELECT * statement? How can the AsText() function be incorporated into a query returning every field in the table in a readable format? Paste your response into the lab report.
- Execute the query you composed in the previous step, and paste a screenshot of the results into your lab report. The results should be similar to the following.
- Calculating Distances on Earth’s Surface (Spherical, or Nonplanar Distance Calculation)
CHECKPOINT QUESTION: Your assigned research and graded threaded discussion questions this week should quickly lead you to discover that although the Pythagorean Theorem is marvelously useful for calculating the distance between points on a Cartesian planar surface, on a curved surface (such as the surface of the Earth), the further apart two points reside from one another, the greater is the error that results from misapplication of this formula to a curved (in this case, roughly spheroidal) surface. Better (less imprecise) results can be obtained by making use of the Great Circle Formula, haversine formulas, and cosine transforms. You will need to select and appropriate formula, and compose a stored procedure which can be used to calculate the geographic distances between points in your table. You will also need to use a coefficient or conversion factor so that the units of the results are expressed appropriately (e.g., kilometers, meters, miles, yards, feet, etc.), and with reasonable precision. Record your determination of the formula you will use, the reason you believe this is a good approach, and discuss both the degree of precision/error to be expected, and the units you elected to use for your measurement, and why. Record your answer in the lab report.
- Compose and install your stored procedure or function for calculating geographic distance, into the database. Take a screen shot showing your SQL statement, and the result showing that the procedure was successfully created. Paste this into the lab report.
- Spatial Queries: Retrieving Data Points Within a Bounding Polygon
CHECKPOINT QUESTION: Your assigned research and graded discussion questions this week will inform your understanding of the use of a bounding box or bounding polygon used to return all spatially indexed points stored in the database which reside within the area defined by the boundary. Parameters for a bounding rectangle can minimally be specified using the vertex points of either diagonal. For example, the upper-left corner, and the lower-right corner. In such case, all points with a horizontal value equal to or between the x-axis elements of the bounding points, that also have vertical value equal to or between the y-axis elements of the bounding points, reside within the qualifying region. You will want to easily be able to center this bounding rectangle on a point which you choose. How will you accomplish this? Design and document the stored procedure or function you will use to implement a bounding rectangle function, and paste your analysis and design into the lab report.
- Install the stored procedure or function you designed in the previous step into the database. Create a screen shot showing the SQL used to create the procedure, and the result of its successful creation.
- Write SQL to use your bounding box function, centered on your original point of interest, and all of the surrounding points of interests within 20 miles (horizontal and vertical distance) from that point. The results should show that points outside the region are not returned by this spatial query result. Paste a screen shot showing your query and the result, into your lab report.
- CHECKPOINT QUESTION: It is possible for a point residing within 20 miles of your original point to correctly be omitted by the bounding box query. Explain why this is the case, and what improvements/refinements might be undertaken in order to improve upon this.
- Visualization: Mapping and Displaying Spatial Data Graphically
CHECKPOINT QUESTION: Having created a stored procedure that can easily calculate the distance between any two points in your table, it will occur to you that you could easily create queries that would find “the point B, nearest a given point, A, meeting some additional criteria”. However, consider carefully that you would do this by using a calculated field (Distance). For how many points in your database would the query need to calculate Distance? What are the implications of this to performance and efficiency, if your table is quite large (millions of rows)? What approach could you take that would result in greater efficiency, perhaps allowing Distance to be calculated for a relatively small subset? (Hint: Think about your bounding box function. It returns a small set of points within a given proximity of a specified point, and does so pretty efficiently if the proper indexes are available, because it filters for latitude and longitude values within a bounded numerical range. What if you calculated Distance for only this subset, and further filtered for the minimum Distance?) Record your answers to these questions in your lab report.
- With your answers to the previous questions in mind, formulate an EFFICIENT query that returns only the latitudes and longitudes for two points: the original point, and its nearest neighbor, in a single row (Hints: 1. A JOIN statement might be useful here; 2. It may be convenient to use the X() function and Y() function on your point data type columns, for example: “SELECT X(Points.location) as longitude1, Y(Points.location) as latitude1 FROM Points;”).
- Test your query, and when you are satisfied that it is working correctly, paste a screen shot showing your query and its results into your lab report.
- Modify your query using concatenation and string manipulation functions as needed so that the output of the results resembles:
- Notice that the highlighted elements in this output should be string literals. Only the X() and Y() values from the first and second points are values obtained from the database.
- Take a screenshot of your query, showing both the SQL and the result, and paste it into your lab report.
- Test the URL you have generated in the previous query, by pasting it into the address bar of your internet browser. A route map should be generated with characteristics similar to the figure below (your map will, of course, reflect the unique points/locations you selected for your database).
- In your lab report, provide a description explaining the route image, for example, “Closest Pizza Parlor to my home.”
- CHECKPOINT QUESTION: What are the benefits of displaying spatial data visually? What are some examples of this sort of spatial visualization of GIS data OTHER than driving directions for consumers? Record your response in your lab report.
- Conclusions: (After completing this lab, in your own words, what conclusions can you draw from this experience?)
- File Format: MS-Word .docx
- Total Number of Pages: 15
- Paper Format: No Specific Format
- Figures Included: Yes
- Custom Solution Available: Yes