Vote Arlington 3: Processing the Data¶
I began this investigation with the hypothesis that I will find a direct correlation between average income and voting participation rate in the November 2010 general election in Arlington, Virginia.
By correlation I
am referring to a statistical association between the 51 bivariate data points of (avg_income,
vote_rate)
for each of the voting precincts in Arlington County, Virginia.
Before the statistical analysis can be done, I need to calculate the values of the bivariate data. I’ll do voting participation rate first, since that will be by far the easier of the two to calculate.
Voting Participation Rate¶
I will define voting participation rate for this investigation as the voter turnout divided by the voting age population within each voting precinct expressed as a percentage.
There are other possible ways to define this value that would have other possible implications. I could have used number of registered voters, for example, instead of voter age population. I chose voting age population because I am interested in testing the hypothesis that communities with lower average income have less “voice” in the political process rather than investigating the personal behavior of individual voters.
To compute the voting participation rate with my given definition, I ran the following script:
SELECT precinct_num, precinct_name,
((voter_turnout * 100)::numeric / vote_age_pop)::numeric(4, 2) AS vote_rate
FROM voting_precincts ORDER BY precinct_num;
which produced the following table:
precinct_num | precinct_name | vote_rate
--------------+--------------------+-----------
01 | Arlington | 18.69
02 | Ashton Heights | 37.61
03 | Aurora Hills | 50.38
04 | Ballston | 47.02
05 | Barcroft | 33.83
06 | Crystal City | 23.91
07 | Cherrydale | 45.03
08 | Hume | 26.66
09 | Columbia | 32.45
10 | Wilson | 22.69
11 | East Falls Church | 53.39
12 | Fairlington | 51.70
13 | Glen Carlyn | 34.86
14 | Clarendon | 34.75
15 | Lyon Park | 32.33
16 | Lyon Village | 40.89
17 | Overlee Knolls | 49.44
18 | Park Lane | 21.96
19 | Rosslyn | 22.58
20 | Thrifton | 57.51
21 | Virginia Highlands | 28.15
22 | Abingdon | 43.29
23 | Westover | 40.19
24 | Woodlawn | 38.39
25 | Arlington Forest | 38.25
26 | Fillmore | 29.68
27 | Jefferson | 38.27
28 | Claremont | 27.21
29 | Dominion Hills | 53.85
30 | Glebe | 25.58
31 | Lexington | 52.80
32 | Oakridge | 26.09
33 | Rock Spring | 58.23
34 | Yorktown | 52.29
35 | Madison | 59.91
36 | Marshall | 40.57
37 | Nottingham | 53.52
38 | Arlington View | 22.36
39 | Ashlawn | 38.54
40 | Virginia Square | 28.29
41 | Woodbury | 21.32
42 | Shirlington | 29.62
43 | Arlington Mill | 12.87
44 | Dawson | 28.64
45 | Buckingham | 23.39
46 | Central | 24.27
47 | Four Mile Run | 33.47
48 | Courtlands | 30.18
49 | Monroe | 28.15
50 | Crystal Plaza | 18.50
51 | Taylor | 37.22
(51 rows)
Average Income¶
Finding average income per voting precinct is a greater challenge, since I will have to derive the data from three different tables. Here is the plan:
Use a spatial query to determine which census blocks are in which voting precincts.
Determine the total income for each census block by multiplying census block population by the income percapita for the census block taken from the block group table.
Calculate total income and total population for each voting precinct with and where is each census block in the precinct, and divide these two values to obtain percapita income for the voting precinct.
The first task is accomplished by:
SELECT
vp.precinct_num,
vp.precinct_name,
cb.tract,
cb.block,
cb.total_pop
FROM
voting_precincts AS vp
LEFT JOIN
census_blocks AS cb
ON
ST_Within(cb.geom, vp.geom)
ORDER BY
vp.precinct_num;
with the following top of the 2188 row table showing it works as desired:
precinct_num | precinct_name | tract | block | total_pop
--------------+--------------------+--------+-------+-----------
01 | Arlington | 102701 | 1007 | 194
01 | Arlington | 102600 | 2002 | 98
01 | Arlington | 102600 | 1006 | 70
01 | Arlington | 102701 | 1005 | 189
01 | Arlington | 102600 | 1004 | 135
01 | Arlington | 102701 | 2001 | 335
01 | Arlington | 102600 | 2003 | 83
01 | Arlington | 102600 | 1007 | 53
01 | Arlington | 102600 | 1002 | 1375
01 | Arlington | 102600 | 1003 | 150
01 | Arlington | 102701 | 1000 | 567
01 | Arlington | 102600 | 2004 | 86
01 | Arlington | 102702 | 2000 | 100
01 | Arlington | 102701 | 1006 | 550
01 | Arlington | 102701 | 1002 | 170
01 | Arlington | 102600 | 2001 | 92
01 | Arlington | 102600 | 1005 | 68
01 | Arlington | 102600 | 1008 | 59
01 | Arlington | 102600 | 1009 | 40
The second task is calculated with:
SELECT
cb.tract, cb.block, bg.income_percap * cb.total_pop AS total_income
FROM
census_blocks AS cb
LEFT JOIN
block_groups AS bg
ON
cb.tract || substring(cb.block from 1 for 1) = bg.tract || bg.block_group
ORDER BY
cb.tract
the top rows of whose result are:
tract | block | total_income
--------+-------+--------------
100100 | 4000 | 8726958
100100 | 3006 | 3953691
100100 | 3004 | 5202225
100100 | 1003 | 4829517
100100 | 4008 | 3464136
100100 | 3020 | 3814965
100100 | 4007 | 3131046
100100 | 1004 | 2069793
100100 | 4024 | 3264282
100100 | 1006 | 3073329
100100 | 1008 | 3198771
100100 | 2007 | 9349784
100100 | 1009 | 0
100100 | 4019 | 5595912
100100 | 1000 | 11540664
100100 | 1001 | 1317141
100100 | 1011 | 8341893
100100 | 1012 | 3512376
The final result is obtained with:
CREATE TABLE precinct_data
AS (SELECT
vp.precinct_num,
vp.precinct_name,
SUM(bg.income_percap * cb.total_pop) / SUM(cb.total_pop)
AS income_per_cap,
((vp.voter_turnout * 100)::numeric / vp.vote_age_pop)::numeric(4, 2)
AS vote_rate
FROM
census_blocks AS cb
LEFT JOIN
block_groups AS bg
ON
cb.tract || substring(cb.block from 1 for 1) = bg.tract || bg.block_group
LEFT JOIN
voting_precincts as vp
ON
ST_Within(cb.geom,vp.geom)
GROUP BY
vp.precinct_num,
vp.precinct_name,
vote_rate
ORDER BY
vp.precinct_num);
which creates a new table named precinct_data
:
our_arlington=# \dt
List of relations
Schema | Name | Type | Owner
--------+------------------+-------+---------
public | block_groups | table | jelkner
public | census_blocks | table | jelkner
public | precinct_data | table | jelkner
public | spatial_ref_sys | table | jelkner
public | voting_precincts | table | jelkner
(5 rows)
Ordering the precinct_data
table by income_per_cap
produces the
following:
precinct_num | precinct_name | income_per_cap | vote_rate
--------------+--------------------+----------------+-----------
43 | Arlington Mill | 18226 | 12.87
01 | Arlington | 26398 | 18.69
38 | Arlington View | 35605 | 22.36
13 | Glen Carlyn | 36116 | 34.86
30 | Glebe | 36627 | 25.58
28 | Claremont | 38218 | 27.21
47 | Four Mile Run | 40988 | 33.47
05 | Barcroft | 41136 | 33.83
09 | Columbia | 42684 | 32.45
26 | Fillmore | 43755 | 29.68
45 | Buckingham | 44886 | 23.39
23 | Westover | 45908 | 40.19
27 | Jefferson | 49073 | 38.27
32 | Oakridge | 50625 | 26.09
04 | Ballston | 51939 | 47.02
25 | Arlington Forest | 54068 | 38.25
39 | Ashlawn | 54438 | 38.54
44 | Dawson | 55198 | 28.64
18 | Park Lane | 56868 | 21.96
24 | Woodlawn | 58522 | 38.39
49 | Monroe | 60482 | 28.15
21 | Virginia Highlands | 60756 | 28.15
19 | Rosslyn | 60990 | 22.58
29 | Dominion Hills | 61231 | 53.85
10 | Wilson | 61583 | 22.69
07 | Cherrydale | 61699 | 45.03
12 | Fairlington | 61883 | 51.70
11 | East Falls Church | 61939 | 53.39
42 | Shirlington | 62106 | 29.62
15 | Lyon Park | 62315 | 32.33
50 | Crystal Plaza | 63895 | 18.50
17 | Overlee Knolls | 64294 | 49.44
31 | Lexington | 65480 | 52.80
36 | Marshall | 66046 | 40.57
08 | Hume | 68363 | 26.66
22 | Abingdon | 69594 | 43.29
16 | Lyon Village | 69839 | 40.89
02 | Ashton Heights | 70247 | 37.61
03 | Aurora Hills | 70811 | 50.38
40 | Virginia Square | 71906 | 28.29
46 | Central | 72095 | 24.27
34 | Yorktown | 72890 | 52.29
41 | Woodbury | 73800 | 21.32
14 | Clarendon | 76054 | 34.75
37 | Nottingham | 76403 | 53.52
06 | Crystal City | 77172 | 23.91
51 | Taylor | 84365 | 37.22
33 | Rock Spring | 85660 | 58.23
48 | Courtlands | 86792 | 30.18
20 | Thrifton | 98391 | 57.51
35 | Madison | 108370 | 59.91
(51 rows)
Before moving to the final statistical analysis, a few maps to help visualize this table would be helpful.
Time again for QGIS. Using the DB Manager, I used a natural join to add the data from the table above to the one with the precinct polygons.
Loading this layer and viewing its attribute table shows the needed information:
Which I used to make the following two choropleth maps:
All that is left now is to analyze the data, but a cursory look at the table and maps makes confirmation of the hypothesis look promising.