2018 is finally here, and with it midterm elections. While national eyes will be focused on control of Congress, Pennsylvania's voters will decide in November whether to reelect Democratic incumbents Governor Tom Wolf and Senator Bob Casey, or support as-yet undecided Republican challengers.
Within swing-state Pennsylvania, Philadelphia plays the role of Democratic bastion. Elections often boil down to two questions: (1) how do Pittsburgh and Philadelphia's suburbs swing between parties, and (2) can Philadelphia's turnout overcome those swings?
I've downloaded data from the *amazing* Open Elections Project. Let's look ahead to November.
Philadelphia and Pennsylvania's turnout
Turnout in PA Governor races is typically 2/3 of the turnout in Presidential elections. Just under 6 Million votes are typically cast for President, while under 4 Million are cast for Governor. This means that there is significant room for growth in 2018 turnout with an energized electorate.
The closest proxy for 2018 is probably 2006: in that year, an unpopular Republican President and an incumbent Democratic Governor led to a sweeping victory across the state, with Ed Rendell winning reelection by 19.8 percentage points. Coincidentally, Bob Casey was running then as now, and this wave carried him to a 17.4 point win over incumbent Rick Santorum.
That election also saw the highest statewide turnout for a Governor's race, with 4.1 million Pennsylvanians voting. Democratic voters in 2018 appear to be even more motivated than twelve years ago; if turnout closes even a fraction of the 2 Million vote gap between Presidential and Gubernatorial races, we could expect record vote counts.
Despite its reputation as a swing state, Pennsylvania entered 2016 with the Democrat having won six of the prior seven President & Governor races. Of course, swinginess reasserted itself in 2016, as Trump carried the state overall by 44 thousand votes. Interestingly, Rendell's big Democratic win in 2006 came even while the Republican count of votes held steady: he received 55,000 more votes than in 2002, and that leap was responsible for the surging victory.
Within the state, Philadelphia both casts the most votes and is the most Democratic county. In 2014, Philadelphia supported Wolf with 88% of the vote. He carried the state with 55%.
Together, Philadelphia and its four-county suburbs--Bucks, Chester, Delaware and Montgomery--constitute 31% of the state's population and 33% of its votes. Philadelphians vote disproportionately more in Presidential elections, making up 11.8% of the electorate on average, as opposed to 10.8% of the electorate in Governor's races. That difference may seem small, but combined with the fact that Philadelphia often votes 36 percentage points more Democratic than the rest of the state, this single swing can change statewide results by 0.36 percentage points. For a sense of size, Donald Trump won the state by 1.2 percentage points.
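The 0.36 figure above is just the change in Philadelphia's share of the electorate multiplied by how much more Democratic the city votes. A minimal sketch of that arithmetic, using only the numbers quoted in the text (the function and variable names are made up for illustration):

```python
# Philadelphia's share of the statewide electorate, from the text above.
share_presidential = 0.118  # average share in Presidential elections
share_governor = 0.108      # average share in Governor's races

# How much more Democratic Philadelphia votes than the rest of the state,
# in percentage points (figure from the text).
philly_dem_lean_points = 36.0


def statewide_shift(share_a, share_b, lean_points):
    """Change in the statewide result, in percentage points, when a region's
    share of the electorate moves from share_b to share_a and that region
    votes lean_points more for one party than everyone else."""
    return (share_a - share_b) * lean_points


shift = statewide_shift(share_presidential, share_governor, philly_dem_lean_points)
print(f"{shift:.2f} percentage points")
```

This treats the 36-point lean as fixed across elections, which is a simplification.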
Philadelphia and its suburbs regularly vote more Democratic than the rest of the state. Across the years, however, there are some subtle trends. Philadelphia's suburbs appear to be the most susceptible to waves: the swing from Democrat to Republican between 2006 and 2010 was much more dramatic than in Philadelphia or the rest of the state. In 2016, those same suburbs exemplified the anti-Trump movement, and actually voted *more* for Clinton than they had for Obama in 2012. Both Philadelphia and the rest of PA did the opposite.
Turnout vs. Persuasion in Pennsylvania
In a previous post, I decomposed swings in votes into changes in turnout and changes in party selection. Let's apply that same calculation to the state, county by county.
A recap: the calculation decomposes each county's effect on the statewide election results into the effect of its turnout swings and of swings in its party preference. If a county has a score of 0.2 in Party Variability, that means that its typical swing in party preference changes the statewide election outcomes by 0.2 percentage points. If a county has strong average turnout but varies in which party it votes for, it will have a high Party Variability score. If a county always votes for one party, and its turnout varies from election to election, it will have a high Turnout Variability score. Counties that always have low turnout or don't vote strongly for a given party will have low scores, as will counties that never vary.
In almost every county, party variability dominates turnout variability. The single, huge exception is Philadelphia. Philadelphia's score of 0.37 means that its changes in turnout between Governor and Presidential races can change the statewide election by 0.74 percentage points, as it goes from one deviation below its mean to one deviation above. All other counties almost entirely affect the state by swings in party preference.
(Note: This calculation focuses on county-level results, and I don't have data on individual voter turnout. Some of the Party Variability is almost certainly within-county turnout differentials: if within a county the Republicans are more likely to vote than the Democrats in a given election, that will appear in this calculation as a change in that county's party preference. As such, this calculation almost certainly overstates Party Variability and understates Turnout Variability. Nonetheless, Philadelphia's larger score is telling.)
Party Variability impacts the entire state, and the map of the Variability impact largely looks like the map of voters.
By comparison, only Philadelphia has a consistent-enough party preference and enough turnout variability for its turnout to have a large effect.
Now that 2018 is finally here, there is a lot of analysis to do. In the coming year, I will be profiling local congressional districts, analyzing state and local polls, and trying to understand our city and state a little bit better with each week.
Do you have thoughts for a post? An open question? A calculation or map you'd like to see? Let me know!
Sorry for the sparse posts. I am in the process of trying to wrangle state election data into a useable form, to ramp up for the 2018 Governor's race.
(Big, big shout-out to the Open Elections Project)
Once the data is in working order, keep an eye out for some thoughts on PA in 2018. But in the meantime, I thought I would present a sequence of maps that made me laugh out loud, and reinforce some basic rules about maps and data. (I am not the first to come up with these, but PA makes them *super* striking).
How did Pennsylvania vote in the 2014 Gubernatorial Election? Democrat Tom Wolf beat Republican incumbent Tom Corbett by 1.92 M votes (55%) to 1.57 M (45%). However, Corbett won 43 counties, to Wolf’s 24.
A naïve map of the election, coloring counties by winner, makes the election look outstanding for Corbett.
The binary nature of this map hides the fact that much of PA is actually purple. Here’s a map where we color counties by a continuous version of the percentage gap between Wolf and Corbett.
The race looks a lot closer.
This map isn’t perfect either. It shows us how the land-area of Pennsylvania voted, but land doesn’t vote. People do. Some of these pixels represent many fewer voters than others. If we now change the transparency of the colors to represent the voters per square mile, we get a map in which the strength of the color is proportional to the number of votes the pixel represents.
With this map, we see... um... why Wolf won.
There is another way to correctly represent counties' vote totals: ignore the polygons altogether, and represent counties by dots at their centroids, scaled by the number of votes.
We see that Pittsburgh and Philadelphia carry the largest weight, and just how important the Philadelphia suburbs are.
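For anyone making a dot map like this, one detail matters: which dimension of the dot is scaled. The post doesn't say which was used here. A common choice, sketched below as an assumption, is to make the dot's area (not its radius) proportional to votes, so a county with four times the votes looks four times as big rather than sixteen. The county names and vote counts are made up for illustration.

```python
import math


def dot_radius(votes, max_votes, max_radius=20.0):
    """Radius for a county's dot so that the dot's AREA is proportional
    to its vote count. The largest county gets radius max_radius."""
    return max_radius * math.sqrt(votes / max_votes)


# Hypothetical vote totals, not real Pennsylvania data.
county_votes = {"County A": 400_000, "County B": 100_000, "County C": 25_000}

max_votes = max(county_votes.values())
radii = {name: dot_radius(v, max_votes) for name, v in county_votes.items()}

for name, r in radii.items():
    print(f"{name}: radius {r:.1f}")
```

Scaling the radius directly by votes would exaggerate the big counties, since the eye reads area.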
A number of Philadelphians find themselves engaged in politics like never before. Many are considering running for office. While you may not be eyeing a run for Governor, the 2018 primary includes the race for 3,372 Democratic Committeepeople. What would it take for you to be one of them?
What's a committeeperson?
Philadelphia is divided into 66 political wards, which serve as the geographic unit for political party organization. The Wards are subdivided into 1,686 divisions. Divisions contain about 1,000 people, and typically cover about 11 city blocks. Your division dictates your polling place.
Both the Democrats and Republicans represent their divisions with two elected committeepeople. These elections are held every four years in the party primary. Every committeeperson is up for reelection in 2018.
The bulk of a committeeperson's role is to be the foot soldiers for the party—managing get out the vote efforts and distributing endorsements. They participate in Ward meetings, and are the residents' conduit for communications (the effectiveness of this depends on the Ward and committeeperson, obviously). In some Wards, committeepeople also participate in selecting candidates to endorse. Finally, committeepeople are the ones who select Ward leaders, in a vote shortly after their election.
For more reading, I recommend the Committee of Seventy's Guide. But here, let's look at the voting patterns, and what it takes to win. Being a committeeperson is a natural point of entry to city government. And, frankly, it’s the easiest race you could run.
How many votes does it take to win?
Many committee positions are left completely empty. In 2014, 348 of the 3,372 Democratic committee positions went unfilled. An additional 95 candidates won with only a single vote, and 214 winning candidates received ten or fewer. In the map above, win count is the number of votes received by the second winning candidate.
How do people win with so few votes? Many won as write-in candidates, without their name on the ballot. All 95 one-vote winners were write-ins, and presumably either wrote in their own name or have really proud mothers. In total, 275 candidates won as write-ins. Almost all of the write-in winners were in divisions with fewer than two names on the ballot, though amazingly 11 write-in candidates won in divisions by beating someone on the ballot.
Some 168 divisions (10%) had no candidates on the ballot, while 273 (16%) had one candidate. In the rest, 1,012 divisions (60%) had exactly two candidates, one for each position, and only 233 (14%) had a competitive race with three or more.
What to look for in May 2018
Petitions for committee candidates to get on the ballot are due March 6, 2018. If you're interested, the Committee of Seventy provides a useful guide for How to Run for Office.
Even if you're not running for office, stay tuned. One of the important open questions of 2017 is the extent to which newfound national political energy will translate into local political engagement. Will our new political world affect city and state governance? May's committeeperson race will be an important signal.
 I focus on Democratic committeepeople. If you want to be a Republican Committeeperson, the competition is non-existent.
One of the ongoing debates in politics is whether elections are won by persuading swing voters to vote for your side or maximizing turnout among your base. We instinctively talk about campaigns as if they are battling for voters' minds. Can Hillary win over Working Class Whites? Can Donald Trump limit his harm among racial minorities? This implies that the most important movement in elections is due to voters who are on the fence, changing their minds.
Recent push-back has argued that simply nobody changes their minds anymore, and the people who do probably aren't all that likely to vote anyway. What really matters is energizing your base, and making sure they vote. There can be once-a-generation political realignments, but for the most part people just vote for their party. The difference between victories for one party or another is which party is more energized.
There are smarter people than me having this debate (my favorite recent book has been the eye-opening Democracy for Realists), but what I can do is look at the Philadelphia version. Where in Philadelphia does Turnout matter? Where does Persuasion?
How to measure the difference between turnout and persuasion
One of the favorite tools of demographers is the decomposition. This is a (surprisingly simple) mathematical technique for breaking down the change in something into the relative importance of the change in its constituent parts. In the footnotes, I develop a formula that decomposes the variability in a party's win total (measured as the gap in votes between the winning party and the losing party) into variability of turnout plus variability of party preference. The equation boils down to
Variability in Gap = (Variability in Preferences) x (Average Turnout) + (Variability in Turnout) x (Average Preference)
= (Party Variability Component) + (Turnout Variability Component)
The equation reveals two campaign strategies, by maximizing each of the two addends:
(1) maximize the first component by persuading voters in divisions with variable preferences and high turnout.
(2) maximize the second component by turning out voters in divisions with highly variable turnout, where the average preference is strongly for your party.
These components boil down to obvious strategies, but the decomposition gives us a way to score divisions based on each. A division's score along each component tells us what impact that division has on changes in the citywide election.
Party variability measures the impact on the citywide Democrat-Republican gap that a division's preference swings have. A score of 0.02 means that when the division changes its party percentage by its average swing, the citywide gap changes by 0.02 percentage points. This will depend on (1) how swing-y that division is, i.e. what constitutes its "average swing", and (2) its typical turnout. Divisions that swing between Democrats and Republicans, and where everyone votes, will have high scores.
Turnout variability measures the impact on the gap that a division's turnout swings have. A score of 0.02 means that when the division increases its turnout by its average swing, the citywide gap changes by 0.02 percentage points. This will depend on (1) how much turnout varies in that division, which determines its "average swing", and (2) its typical party gap. Divisions where turnout varies wildly but which are strongly Democratic or Republican will have high scores.
(I use "relative turnout", which is a division's proportion of the citywide total, rather than vote counts. Thus, large swings for a Presidential election aren't considered a swing unless the division votes disproportionately more than other divisions in that year. This also means that citywide turnout efforts won’t affect the score, only localized ones.)
Overall, it appears that persuasion has a larger impact on Philadelphia's vote gap than turnout. The longer tail in the top facet means that many divisions have high scores along this dimension. This is largely because turnout doesn't vary much between divisions--a division that represents 1% of the city's votes will do so in most elections. The total votes for the city as a whole fluctuates dramatically, but divisions' share within a given year doesn't.
I expect this result to reverse when we zoom out to the state level; relative turnout variability is likely much higher between the city and the rest of the state than it is within the city.
Where Turnout Matters
Turnout will matter in divisions where the percentage of voters varies from year to year, and where voter preferences are overwhelmingly Democratic or Republican. The map below colors divisions by the turnout component score, with the hue representing whether Republicans or Democrats benefit from an increase in turnout. The Northeast and Manayunk have low scores--they are too split between parties for turnout to help one party more than another. West Philly also has surprisingly low scores: turnout there is simply too stable for turnout efforts to have an effect (except for in University City). Interestingly, the predominately-Hispanic section of North Philly has very high scores: voters turn out for Presidential elections and not for other elections, and they vote overwhelmingly Democratic.
Where Persuasion Matters
Persuasion campaigns will have an impact in divisions where the party preferences vary from election to election, and where turnout is always high. The map below colors divisions by their party preference score. The Northeast, Manayunk, and Port Richmond now light up: these are all neighborhoods where people vote at high rates, and swing between Democrats and Republicans. North Philly and West Philly are simply too consistent in party preferences--Democrats win with 99% of the vote--for convincing to matter.
What does this mean for the state?
Our most important upcoming election is the 2018 race for Governor. In a future post, I will replicate this analysis at the state level. What counties can be persuaded? What counties need to be mobilized? I expect that the state results could be very different than for within Philadelphia. Stay tuned!
 Of course, strategies for convincing and strategies for turnout are not so separable. There are probably a lot of campaign actions that achieve both.
Appendix: The Decomposition
First, I'll motivate demographers' common decomposition in the case of only two years, then I'll expand it to a multi-year version. (Apologies for the hideous math, I haven't been able to figure out how to typeset math in this blog yet.)
Suppose we have a value G that is the product of two numbers, P and T. In the current case, G is the citywide vote gap between Democrats and Republicans, defined as the total votes for Democrats minus the total votes for Republicans. P is the percentage gap between Democrats and Republicans, and T is the total turnout in votes. Notice that
G = P * T
Consider the case where we observe these values in two time periods, 0 and 1. We are interested in what caused the gap G to change between year 0 and year 1. It could be changes in the percentage P (call this persuasion) or changes in turnout T (call this mobilization). That is, we are interested in G_1 – G_0. We can write
G_1 – G_0 = (P_1 * T_1) – (P_0 * T_0)
A little bit of algebra can show that:
G_1 – G_0 = (P_1 – P_0) * Avg(T) + (T_1 – T_0) * Avg(P)
These two addends are our components. The first addend can be interpreted as the change in the vote gap due to the changes in the proportion voting for the Democrat, scaled by the average number of voters. The second is the change in the gap due to changes in the total people voting, scaled by the average party preference.
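To check that the two components really do add up to the change in the gap, here is a minimal sketch of the two-period decomposition. The numbers are hypothetical, and P is written as a proportion (0.60 means a 60-point gap) rather than a percentage.

```python
def decompose_two_periods(p0, t0, p1, t1):
    """Split the change in the gap G = P * T between period 0 and period 1
    into a preference component and a turnout component."""
    avg_t = (t0 + t1) / 2
    avg_p = (p0 + p1) / 2
    preference_component = (p1 - p0) * avg_t  # (P_1 - P_0) * Avg(T)
    turnout_component = (t1 - t0) * avg_p     # (T_1 - T_0) * Avg(P)
    return preference_component, turnout_component


# Hypothetical values, not real election data.
p0, t0 = 0.60, 1000  # period 0: 60-point gap, 1,000 votes
p1, t1 = 0.70, 1200  # period 1: 70-point gap, 1,200 votes

pref, turn = decompose_two_periods(p0, t0, p1, t1)
change_in_gap = p1 * t1 - p0 * t0  # G_1 - G_0

print(f"preference component: {pref:.1f}")
print(f"turnout component:    {turn:.1f}")
print(f"sum of components:    {pref + turn:.1f}")
print(f"actual change in gap: {change_in_gap:.1f}")
```

Here the gap grows by 240 votes: 110 from the change in preference and 130 from the change in turnout.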
A simple picture helps build intuition for this decomposition. Below, we are interested in the difference of the areas of the two rectangles, G_1 and G_0. Notice that the area of each can be written as base times height, or P * T. The area from G_0 to G_1 changes both because P changes and T changes. But how much of the change was due to P, and how much due to T?
One way to decompose the area is to draw the diagonal between the top right corners. The total difference between G_1 and G_0 is then the sum of the areas of the two trapezoids.
Remembering the 8th grade formula for the area of a trapezoid (area = height * avg(base lengths)) gives us the decomposition from above. Notice that the trapezoid on the right is the first component, and the trapezoid on the top is the second.
We aren't limited to doing this for only the city as a whole. We can calculate this decomposition for each division. The total gap in the city is the sum of the gaps in each precinct, so the change in the citywide gap is given by summing over each precinct i:
G_1 – G_0 = SUM(G_i1 – G_i0)
= SUM(P_i1 * T_i1 – P_i0 * T_i0)
= SUM( (P_i1 – P_i0) * Avg(T_i) + (T_i1 – T_i0) * Avg(P_i) )
= SUM( Preference_Component_i + Turnout_Component_i)
We can thus calculate each precinct’s two scores, and then the total change in the city-wide gap between year 0 and year 1 as the sum of both scores for all precincts.
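A minimal sketch of the per-precinct version, assuming each year's data is a dictionary mapping a precinct name to a (P, T) pair. The data layout and the two precincts are invented for illustration; P is a proportion, with positive values meaning a Democratic gap.

```python
def decompose_precincts(year0, year1):
    """Return {precinct: (preference_component, turnout_component)} for the
    change between two years. Each input maps precinct -> (P, T)."""
    scores = {}
    for precinct in year0:
        p0, t0 = year0[precinct]
        p1, t1 = year1[precinct]
        preference_component = (p1 - p0) * (t0 + t1) / 2
        turnout_component = (t1 - t0) * (p0 + p1) / 2
        scores[precinct] = (preference_component, turnout_component)
    return scores


# Hypothetical precincts, not real data.
year0 = {"base": (0.8, 500), "swing": (0.1, 1000)}
year1 = {"base": (0.8, 700), "swing": (-0.1, 1000)}

scores = decompose_precincts(year0, year1)

# Citywide change in the gap, computed directly from G = SUM(P_i * T_i).
gap0 = sum(p * t for p, t in year0.values())
gap1 = sum(p * t for p, t in year1.values())
total_from_scores = sum(pref + turn for pref, turn in scores.values())

print(scores)
print(f"change in citywide gap: {gap1 - gap0:.1f}")
print(f"sum of all components:  {total_from_scores:.1f}")
```

In this toy example the "base" precinct moves the gap only through turnout and the "swing" precinct only through preference, and the two together account for the whole citywide change.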
Thus far we've only been using two years. However, we want to be able to incorporate all of the elections at once.
The absolute value of the first term can be written as
|(P_i1 – P_i0) * Avg(T_i)| = (|P_i1 – Avg(P_i)| + |P_i0 – Avg(P_i)|) * Avg(T_i),
(this relies on the fact that Avg(P_i) is between P_i0 and P_i1). This may look complicated, but it has a nice interpretation: it is the size of typical deviation of P_i from its average value, scaled by the average turnout.
We similarly write
|T_i1 – T_i0| * Avg(P_i) = (|T_i1 – Avg(T_i)| + |T_i0 – Avg(T_i)|) * Avg(P_i)
where I’ve left Avg(P_i) outside of the absolute value so the sign captures partisanship: positive values represent a Democratic gap, negative a Republican gap (signs are arbitrary).
This inspires a multi-year version of the decomposition:
Decomposed Variability = Avg(|P_ij – Avg(P_i)|) * Avg(T_i) + Avg(|T_ij – Avg(T_i)|) * Avg(P_i)
= Preference Variability Component + Turnout Variability Component
Notice that the last term does not use the absolute value on P_i. This means that it will have the sign of the typical partisan gap, and thus the last term is capable of showing which party an increase in turnout will help.
(Statsy readers: Avg(|P_ij – Avg(P_i)|) is like a linear version of the variance).
This allows us to score a division i based on whether its percent-Democrat varies more, or its turnout. It’s no longer an accounting identity, but it does simplify to the accounting identity above when you only consider two years.
We could be done here, but to really compare across years, I divide a precinct’s vote total by that year’s City vote total, T_j. This means that the left hand side can be interpreted as the percentage gap between Democrats and Republicans, rather than a raw vote count. For example, if the Republican won 52% to 48%, the gap would be 4 percentage points.
Decomposed Pct Gap = Avg(|P_ij – Avg(P_i)|) * Avg(T_ij / T_j) + Avg(|T_ij/T_j – Avg(T_ij/T_j)|) * Avg(P_i)
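A minimal sketch of how the final formula could be computed, assuming the data is stored as per-division lists with one entry per election. This is an illustration of the equation above, not the code used for the post; the input layout, function name, and example divisions are all hypothetical. P is a proportion, so multiply the scores by 100 to read them as percentage points.

```python
from statistics import mean


def variability_scores(pref, votes):
    """Score each division on the two components of the multi-year formula.

    pref[d]  = list of P_ij (Democratic-minus-Republican gap as a proportion),
               one entry per election j.
    votes[d] = list of T_ij (total votes cast), one entry per election j.
    Returns {d: (party_variability, turnout_variability)}.
    """
    n_elections = len(next(iter(votes.values())))
    # T_j: citywide vote total in each election.
    city_totals = [sum(votes[d][j] for d in votes) for j in range(n_elections)]

    scores = {}
    for d in votes:
        # Relative turnout T_ij / T_j.
        rel = [votes[d][j] / city_totals[j] for j in range(n_elections)]
        avg_p = mean(pref[d])
        avg_rel = mean(rel)
        pref_dev = mean([abs(p - avg_p) for p in pref[d]])
        turn_dev = mean([abs(r - avg_rel) for r in rel])
        # The turnout term keeps the sign of Avg(P_i), so it shows which
        # party benefits from higher turnout in this division.
        scores[d] = (pref_dev * avg_rel, turn_dev * avg_p)
    return scores


# Two hypothetical divisions over two elections.
pref = {"swing": [0.1, -0.1], "base": [0.9, 0.9]}
votes = {"swing": [500, 500], "base": [500, 1500]}

scores = variability_scores(pref, votes)
for d, (party, turnout) in scores.items():
    print(f"{d}: party variability {party:.4f}, turnout variability {turnout:.4f}")
```

The "swing" division scores only on party variability, because its average preference is zero, while the "base" division scores only on turnout variability, because its preference never moves.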