Map Arlington 2: Importing Shapefile Data into PostGIS¶
The Problem¶
Arlington County, Virginia makes GIS data about the county available through its Arlington County, Va. GIS Data website.
I want to import this data into my map_arlington
database. Specifically,
I would like to add tables for
voter precincts,
civic associations,
county parks,
and zip codes.
Moving the Database¶
Before attempting to add the new tables, I wanted to make a copy of the database and move it to a development machine, so on the host machine I ran:
$ pg_dump -U [user] map_arlington > map_arlington_export.sql
I then copied map_arlington_export.sql
to the new server and ran:
$ psql map_arlington < map_arlington_export.sql
With the database copied, it was time to figure out how to add the new data.
Identifying the Spatial Reference System¶
One of the essential problems with using geospatial data from different sources it getting the data into a common spatial reference system. PostGIS stores spatial reference system information using a Spatial Reference System Identifier (SRID). For my task of adding the Arlington GIS data to my database derived from OSM data, I need to do the following:
Find the SRID of the OSM data in my database.
Find the SRID of the Arlington County data.
Convert the Arlington County data into the spatial reference system of the OSM data and load it into the database.
Tools¶
To accomplish these three tasks, I will use shp2pgsql, ogrinfo, and the web application Prj2EPSG.
shp2pgsql
comes with the postgis
package and ogrinfo
with the
gdal-bin
package. I installed both of these with:
$ sudo aptitude install postgis gdal-bin
Process¶
Before proceeding to add new tables to my database from the Arlington County
data, I decided to remove the gen0
and gen1
tables and rename each
table to remove the osm_new_
that started each table name.
Here is a sample part of my psql session to illustrate this:
map_arlington=# \dt *landusages*
List of relations
Schema | Name | Type | Owner
--------+-------------------------+-------+---------
public | osm_new_landusages | table | [username]
public | osm_new_landusages_gen0 | table | [username]
public | osm_new_landusages_gen1 | table | [username]
(3 rows)
map_arlington=# DROP TABLE osm_new_landusages_gen0;
DROP TABLE
map_arlington=# DROP TABLE osm_new_landusages_gen1;
DROP TABLE
map_arlington=# ALTER TABLE osm_new_landusages RENAME TO landusages;
ALTER TABLE
map_arlington=# \dt *landusages*
List of relations
Schema | Name | Type | Owner
--------+------------+-------+---------
public | landusages | table | [username]
(1 row)
When I finished, the complete list of tables looked like this:
map_arlington=# \dt
List of relations
Schema | Name | Type | Owner
--------+------------------+-------+---------
public | aeroways | table | [username]
public | amenities | table | [username]
public | boundary | table | [username]
public | buildings | table | [username]
public | landusages | table | [username]
public | mainroads | table | [username]
public | minorroads | table | [username]
public | motorways | table | [username]
public | places | table | [username]
public | railways | table | [username]
public | spatial_ref_sys | table | [username]
public | transport_areas | table | [username]
public | transport_points | table | [username]
public | waterareas | table | [username]
public | waterways | table | [username]
(15 rows)
To find the spatial referencing system in use by OSM, I just listed the boundary table with:
map_arlington=# \d boundary
id | integer | not null default ...
osm_id | bigint |
name | character varying(255) |
type | character varying(255) |
admin_level | smallint |
geometry | geometry(Geometry,900913) |
Geometry types in PostGIS are listed with geometry
followed by parentheses
enclosing a pair of values: the geometry (possible values include Point
,
Polyline
, Polygon
, MultiPoint
, MultiPolyline
, MultiPolygon
,
and Geometry
), and the SRID.
In this case the SRID is 900913, the Google Projection.
To find the spatial referencing system used by the Arlington County data, I ran the following command inside the directory with the unzipped voter precint shapefile:
$ ogrinfo -al -so Voter_Precinct.shp
The -al
switch means “all layers” and the -so
means “summary only”. The
output of this command was:
INFO: Open of `Voter_Precinct.shp'
using driver `ESRI Shapefile' successful.
Layer name: Voter_Precinct
Geometry: Polygon
Feature Count: 52
Extent: (11860792.110333, 6987491.520335) - (11900924.090333, 7026256.980335)
Layer SRS WKT:
PROJCS["NAD83_Virginia_North_ftUS",
GEOGCS["GCS_North_American_1983",
DATUM["North_American_Datum_1983",
SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.017453292519943295]],
PROJECTION["Lambert_Conformal_Conic_2SP"],
PARAMETER["standard_parallel_1",39.2],
PARAMETER["standard_parallel_2",38.03333333333333],
PARAMETER["latitude_of_origin",37.66666666666666],
PARAMETER["central_meridian",-78.5],
PARAMETER["false_easting",11482916.667],
PARAMETER["false_northing",6561666.667],
UNIT["Foot_US",0.30480060960121924]]
OBJECTID: Integer (10.0)
PRECINCT: Integer (10.0)
HOUSE: Integer (10.0)
SENATE: Integer (10.0)
PREC_NAME: String (80.0)
POLLING_PL: String (80.0)
ADDRESS: String (80.0)
LABEL: String (80.0)
Shapearea: Real (24.15)
Shapelen: Real (24.15)
What I am interested in here is the PROJCS[...]
. I want to turn that
information into an SRID. For that I used the web application
Prj2EPSG. The following screen shot shows the
result I was looking for:
The SRID is 2283 (NAD83_Virginia_North_ftUS).
Knowing both the source and target SRIDs, I was ready to load the shapefile data into my database:
$ shp2psql -s 2283:900913 Voter_Precinct.shp precincts | psql -d map_arlington
I then changed directories to each of the shapefile directories in turn, and ran the same command for each shapefile. When I finished, my database tables consisted of:
List of relations
Schema | Name | Type | Owner
--------+--------------------+-------+---------
public | aeroways | table | jelkner
public | amenities | table | jelkner
public | boundary | table | jelkner
public | buildings | table | jelkner
public | civic_associations | table | jelkner
public | county_parks | table | jelkner
public | landusages | table | jelkner
public | mainroads | table | jelkner
public | minorroads | table | jelkner
public | motorways | table | jelkner
public | places | table | jelkner
public | precincts | table | jelkner
public | railways | table | jelkner
public | spatial_ref_sys | table | jelkner
public | transport_areas | table | jelkner
public | transport_points | table | jelkner
public | waterareas | table | jelkner
public | waterways | table | jelkner
public | zip_codes | table | jelkner
(19 rows)
To demonstrate visually that everything worked, I connected to the database from QGIS and loaded the civic associations from the Arlington data on top of the county boundary from the OSM data:
Finally, I ran pg_dump
again to export the database and move it back to
the production server.