I’ve spent the last few months trying to see if there’s a way I could plausibly predict November. Sites like FiveThirtyEight do a plenty good job of national races, but what can we say about state races? Could Democrats win the Pennsylvania House? The PA Senate?
Well, I finally think I’ve got a model that does a plausible job.
Soon, I’ll publish some predictions for the winners. But first, let’s look at turnout.
Trends in PA Turnout
Today, I’m focusing on the turnout in even-year general elections (so all Presidential or Gubernatorial races) since 2002. I’m only going to use the two-party vote, treating the total votes for President or Governor as turnout rather than using the actual turnout. This ignores third-party voters and people who skipped the topline election altogether. The difference between this and actual turnout won’t be large, and this makes the predictions later easier.
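To make that definition concrete, here is a minimal Python sketch of the turnout proxy. The party codes and vote counts are made up for illustration and are not taken from the actual election data.

```python
# Hypothetical topline results for one precinct in one year (made-up numbers).
topline_votes = {"DEM": 412, "REP": 388, "LIB": 9, "GRN": 4}


def two_party_turnout(votes):
    """Turnout proxy: Democratic plus Republican votes in the topline race."""
    return votes.get("DEM", 0) + votes.get("REP", 0)


def two_party_dem_share(votes):
    """Democratic share of the two-party vote (NaN if nobody voted)."""
    total = two_party_turnout(votes)
    return votes.get("DEM", 0) / total if total else float("nan")


turnout = two_party_turnout(topline_votes)
dropped = sum(topline_votes.values()) - turnout  # third-party votes ignored
dem_share = two_party_dem_share(topline_votes)

print(turnout, dropped, round(dem_share, 3))
```

Voters who skipped the topline race entirely never show up in either number, which is the other piece the proxy leaves out.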
Between 3.5M and 4.1M Pennsylvanians voted in the midterms since 2002.
What’s a good guess for turnout this year? 2006 seems like an obvious benchmark. In that year, an incumbent Democratic Governor and Senate candidate Bob Casey, Jr capitalized on a national Democratic surge against an unpopular president. Sounds familiar. In that election, 4.09 million Pennsylvanians voted for governor. The other high was from the other wave election in the period: Republicans' sweep of 2010.
I’ve built a model that predicts turnout of every precinct using data from even years from 2002 – 2016. The model uses information on the election (if it’s a midterm, the party in the presidency, whether local races are contested, the incumbency of local races, the presence of female candidates, and district population), and allows for different precinct-level responses to midterm elections, presidential party, and turnout growth or shrinkage over time.
The thing that makes predicting state races so hard is that there aren’t surveys. Without them, it’s really hard to find good proxies for voter excitement and disproportionate interest. Instead, I’ve built the model to simulate the full distribution of types of elections, from very Democratic to very Republican, and then give the entire range of possible results. We can then use that to either (a) examine the full range of possible outcomes, or (b) plug in specific values and see the results, for example “what if the election looked like 2006?”
To achieve this, I've modeled the correlations in turnout among precincts, to identify groups of precincts that all turn out together. Some precincts all come out disproportionately in midterms, others come out only in Democratic wave years. It’s this factor that is the biggest unknown moving into November: what type of election will it be. These correlations create a lot of uncertainty: you can't rely on the Law of Large Numbers to cancel out all of the districts' idiosyncrasies.
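The Law of Large Numbers point can be illustrated with a toy simulation. This is not the model described in this post, just a sketch under made-up assumptions: every precinct has the same baseline, and turnout is multiplied by a shared "type of election" shock plus independent precinct noise, both on the log scale. All constants below are hypothetical.

```python
import math
import random
import statistics

random.seed(0)

N_PRECINCTS = 500  # hypothetical number of precincts
BASELINE = 500     # hypothetical baseline votes per precinct
SIGMA = 0.05       # hypothetical sd of each shock, on the log scale
N_SIMS = 200


def simulate_total(shared_sd, local_sd):
    """One simulated statewide total: a shared shock plus per-precinct noise."""
    shared = random.gauss(0, shared_sd) if shared_sd > 0 else 0.0
    return sum(
        BASELINE * math.exp(shared + random.gauss(0, local_sd))
        for _ in range(N_PRECINCTS)
    )


# Independent precincts: the noise mostly cancels in the statewide sum.
independent = [simulate_total(0.0, SIGMA) for _ in range(N_SIMS)]
# Correlated precincts: the shared shock moves everyone together.
correlated = [simulate_total(SIGMA, SIGMA) for _ in range(N_SIMS)]

sd_independent = statistics.stdev(independent)
sd_correlated = statistics.stdev(correlated)
print(round(sd_independent), round(sd_correlated))
```

With independent noise the spread of the statewide total shrinks relative to its size as precincts are added; the shared shock does not shrink, so it ends up dominating the uncertainty.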
So, does the model work?
Testing the Model: 2016
To test the model, let’s pretend it’s September 2016. Using only data from 2002-2014, I fit the model and then generate predictions for 2016 turnout.
In 2016, I would have estimated 5.68 million votes cast statewide, with a 95% credible interval of 5.16M – 6.29M (the uncertainty is huge, but listen, science is hard, and I’m a serious person). In reality, 6.01 million votes were cast for President. I undershot it by a little bit, but the result is well within the interval.
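For the arithmetic behind that holdout check, here is a small Python sketch using the numbers quoted above. The variable names are just for illustration.

```python
# Numbers quoted in the holdout test above.
predicted = 5.68e6
ci_low, ci_high = 5.16e6, 6.29e6
actual = 6.01e6

inside_interval = ci_low <= actual <= ci_high
error = predicted - actual   # negative means the prediction undershot
pct_error = error / actual   # error as a fraction of the actual turnout

print(inside_interval, round(error), f"{pct_error:.1%}")
```

The miss works out to roughly 330,000 votes, or about 5.5% of the actual total, comfortably inside the interval.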
Capturing relative turnout is arguably more important for final results than overall. Which places voted more than usual, and which less? Let’s compare the model’s predictions for Vote in 2016 / Vote in 2012, compared to the actual values.
I did less well on that. Above is a plot of the observed turnout growth in each geography (measured as turnout in 2016 divided by turnout in 2014) versus what the model would have predicted. A perfect prediction would have all of the points on the 45-degree line.
There may be some correlation between my predicted growths and the observed results, but it’s weak. It turns out that the growth depends heavily on the partisanship of the election: the correlation factors that I discussed above. Since I don't know what those are ahead of time, I have to simulate them from all of the possibilities, resulting in the elliptical blob above. The model easily identifies these factors retrospectively--I can say for example that 2006 was a very strong Democratic year--but I don’t in general have a way to predict them for an upcoming election.
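As a sketch of how that comparison can be scored, the snippet below computes a Pearson correlation between predicted and observed growth ratios. The five growth values are invented for illustration and are not the model's output.

```python
import math

# Hypothetical growth ratios (later-year turnout / base-year turnout) per geography.
predicted_growth = [1.40, 1.55, 1.60, 1.70, 1.80]
observed_growth = [1.50, 1.45, 1.75, 1.60, 1.85]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(predicted_growth, observed_growth)
# Mean absolute vertical distance from the 45-degree line (perfect prediction = 0).
mean_abs_gap = sum(
    abs(o - p) for p, o in zip(predicted_growth, observed_growth)
) / len(predicted_growth)

print(round(r, 2), round(mean_abs_gap, 3))
```

A correlation near 1 with a small gap would put the points on the 45-degree line; a blob shows up as a low correlation.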
Enough delay. What do I predict for turnout in 2018?
There will be 4,295,981 votes for Governor.
This strikes me as high. It’s higher turnout than any midterm in my dataset. But the model did relatively well in the holdout test of 2016, and I don’t want to commit the sin of post-hoc adjusting. So this is my prediction, and I'm sticking to it.
What are the arguments for this astronomical number? You, a person who somehow reads this blog and thus are well down the elections analysis rabbit-hole, might have noticed unprecedented excitement for a midterm, and be unsurprised by a high prediction. But the model doesn't have that info. Instead, it does see that (a) a Republican is president, which increases midterm turnout more in Democratic precincts than a Democratic president increases in Republican ones, (b) many more races are contested, including in the newly-redrawn congressional districts and a ton of contested state house seats, and (c) after all of the adjustments, turnout has been steadily increasing since 2002. All of these combine to create a prediction for midterm turnout that is unprecedented in the dataset. And some of those features, particularly the contested races, are probably serving as proxies for voter enthusiasm.
There’s a lot of uncertainty in the prediction because, again, science is hard. The 95% credible interval is 3.85M to 4.72M. That interval would include the turnout of the last two wave midterm elections—4.09M in 2006 and 4.00M in 2010—and exclude the lower-turnout years of 2002 and 2014.
Within Philadelphia, I project 460,000 voters, with a 95% CI of (410,000, 517,000). Even at the lower end, that would beat out the 2006 and 2010 turnout highs.
What does the model have to say about precinct-specific changes? Below is a plot of its predictions in Philadelphia, relative to turnout in 2014. Keep in mind that these predictions are equivalent to the blob plot above: there's a loose predictive power, but a ton of noise based on what type of election this ends up being.
I predict particularly high turnout in Center City East and the River Wards, upwards of 60% growth over 2014. That one bright yellow precinct in the River Wards is because of population changes that have seen increasing midterm turnout, and a competitive State House election in a neighborhood that hasn't seen one for years. West Philly, North Philly, up to West Oak Lane, will likely turn out similarly to 2014, given their largely uncontested races.
So I tentatively expect record turnout, at least among elections since 2002. Will it happen? I’ve over-predicted turnout before. We’ll see if I learned my lesson.
Until that test comes, let’s brazenly barrel forward and predict the actual results. Coming soon.
Data comes, as always, from the amazing Open Elections Project.
I also leaned heavily on Ballotpedia to complement and extend the data.
GIS data is from the US Census.
Welcome to the Sixty-Six Wards District Profiles!
Democrats need to pick up 20 seats to tie up the PA House. There are 19 districts currently held by Republicans that voted for Clinton in 2016. All of them are in the Philadelphia region. I've profiled Delco's District 168, Philadelphia's Districts 170 and 177, and Bucks' District 178.
Today, let's look at Chester County, which showed some of the region's strongest anti-Trump shifts. If Democrats want to have any chance to win the House, they need to win districts like these, which have strong Republican incumbents but voted for Clinton in 2016. District 155 exemplifies these important districts: Republican incumbent Becky Corbin won by 16 points in 2016 even as Clinton won the district by 8. She's being challenged by Danielle Friel Otten, running from the left with endorsements by Emily's List, Planned Parenthood, and the SEIU.
District 155 gerrymanders through Chester, combining slices of Democratic Phoenixville and Spring City in the East with vast swaths of lower-density Republican lands.
Corbin was first elected in 2012, and won reelection by 21 and 16 points in 2014 and 2016, respectively. In 2016, voters gave her a landslide victory even while voting strongly for Clinton, providing a 24 point swing that was typical of districts in Chesco.
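The 24-point swing is the gap between the two 2016 margins. A quick Python sketch of that arithmetic, using the convention (an assumption here) that positive margins mean a Republican lead:

```python
# Margins in percentage points; positive = Republican lead (sign convention assumed).
state_house_margin_2016 = 16    # Corbin won by 16
presidential_margin_2016 = -8   # Clinton won by 8

# How much more Republican the State House vote was than the presidential vote.
swing = state_house_margin_2016 - presidential_margin_2016
print(swing)
```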
[Interactive State House Map] [Interactive Presidential Map]
All but one precinct voted more for Corbin than for Trump, with 18 of the 30 precincts showing an over-20 point difference between the Republicans.
While Phoenixville and Spring City are denser than the rest of the area, the sweeping Republican stretches in Uwchlan (above Downingtown and Exton) still dominate the vote, by virtue of their huge landmasses. Friel Otten lives in Uwchlan, which may prove important in swaying this traditionally Republican stronghold.
The district is basically entirely White, with the most diverse Census Tract being in the heart of Phoenixville, which is still 71% non-Hispanic White, and 19% Hispanic (and only a third of that tract is even in the district). No other tract is less than 79% White.
Lining up the precinct votes in order of 2016 vote shows the limit of the Democratic precincts' impact, and the steep hill for Friel Otten. She needs many of the broadly Republican precincts to shift--like they did for Clinton--and can't just rely on the local urban cores.
Unlike the other districts I've looked at, this one shows almost no difference in relative turnouts between 2014 and 2016: all precincts increased their votes by 75%, regardless of party. In every other district, Democrats turned out disproportionately more in 2016. In those cases, I thought it boded well for Democrats: if an engaged Democratic party looked more like 2016 than 2014, that would help them. Here, that looks less likely: the huge engagement differences between 2014 and 2016 didn't yield any Democratic edge, so it's hard to imagine them getting a turnout edge here. Any progress will have to be changing the minds of voters.
This district is a stretch for Democrats, but it is the type of district--with a strong Republican incumbent but steep anti-Trump sentiment--where they will have to do well to take the State House in November.
Election data from the Open Elections Project
Population data from the 2016 American Community Survey 5-year estimates.
Boundaries and GIS data from election-geodata
Base maps provided by maps.stamen.com/
For the last few weeks, I've been posting District Profiles of the hot races in the lower chamber of PA's General Assembly, aka the State House. One cool aspect of being able to generate plots systematically is it provides a compelling way to compare data across districts, once you've oriented yourself to a given plot.
Today, I present those Multiple Minis.
The Bar Plot
Maps are great, but sometimes organizing data in a non-geographic plot lets you emphasize a different relationship. One useful way to visualize all of the precincts in a district is by lining them up in order of Democratic result. Below is that bar plot from last week's post on Northeast Philly's District 170:
Each vertical bar is a precinct, divided into its Republican and Democratic votes. The width of the bars is proportional to the votes in that precinct, so the total red and blue area represents the total outcome in the district.
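Here is a minimal Python sketch of the geometry behind that kind of plot, leaving out the drawing itself. The precinct names and vote counts are hypothetical, and sorting from least to most Democratic is an assumption about the ordering.

```python
# Hypothetical precincts: (name, dem_votes, rep_votes).
precincts = [
    ("A", 300, 100),
    ("B", 150, 250),
    ("C", 200, 200),
]


def bar_layout(precincts):
    """Order precincts by Democratic share; bar width = share of district votes."""
    ordered = sorted(precincts, key=lambda p: p[1] / (p[1] + p[2]))
    total = sum(dem + rep for _, dem, rep in ordered)
    bars, left = [], 0.0
    for name, dem, rep in ordered:
        width = (dem + rep) / total
        bars.append({
            "name": name,
            "left": left,                    # x position of the bar's left edge
            "width": width,                  # proportional to precinct votes
            "dem_share": dem / (dem + rep),  # height of the blue segment
        })
        left += width
    return bars


bars = bar_layout(precincts)
blue_area = sum(b["width"] * b["dem_share"] for b in bars)
district_dem_share = sum(d for _, d, _ in precincts) / sum(
    d + r for _, d, r in precincts
)
print([b["name"] for b in bars], round(blue_area, 4), round(district_dem_share, 4))
```

Because each bar's width is its share of the district's votes, the blue area adds up to the district-wide Democratic share, which is why the colored area can be read as the overall outcome.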
This plot is useful to see the overall outcome in the district, the diversity of the precincts in the district (the slope of the change), as well as any skew in the tails. PA-170, for example, has precincts whose votes ranged from 75% Republican to 66% Democratic.
To get a sense of an entire county, I've generated this plot for every district, for the 2016 Presidential results (since a bunch of State House races were non-competitive). I've laid out the districts in relatively geographic order (for a full map of the districts, see the Appendix at the bottom).
I'll move North to South through the suburbs. Here's Bucks County, with super competitive districts, and a lot of districts with urban-ish cores that create Democratic skews:
Comparatively, here's Montgomery County, which was much more Democratic, especially in the south east corner closer to Philadelphia:
Chester is almost as competitive as Bucks, though four districts have intense skews at Democratic cores, especially out West.
In Delaware County, five districts are competitive. Four were overwhelmingly Democratic, including Chester City and Upper Darby.
Finally, gigantic Philadelphia shows its diversity, with much of the city voting overwhelmingly for Clinton, but the Northeast, Manayunk, the Riverwards, and parts of South Philly housing significant Trump votes.
The Scatter Plot
The other plot that I've generated for each district is a scatter plot of the 2016 Presidential vote versus State House vote. Below is the plot for PA-170.
This plot contains a ton of information. Each dot is a precinct, sized by its total votes. Moving left-to-right represents more Democratic votes in the presidential election, and moving bottom-to-top more Democratic votes in the State House. In the top right quadrant are precincts that voted for Clinton and the Democratic Representative, in the bottom left are precincts that voted for Trump and the Republican.
Precincts along the 45-degree line voted for Trump exactly as much as they voted for their representative; points below the line voted more for the Republican rep than for Trump, and points above the line voted more for Trump than for their Republican rep. In PA-170, most points are below the line, indicating that the District voted more for Martina White, the Republican State House candidate, than for Trump. That imbalance was just enough that White won the district and Trump lost it.
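The reading of the 45-degree line can be sketched in a few lines of Python. The precinct numbers below are invented, the tolerance is arbitrary, and the sketch assumes each precinct cast the same number of votes in both races, which real data won't satisfy exactly.

```python
# Hypothetical precincts: (total_votes, pres_dem_share, house_dem_share).
precincts = [
    (1000, 0.55, 0.45),
    (800, 0.48, 0.40),
    (600, 0.60, 0.62),
]


def classify(pres_dem_share, house_dem_share, tol=0.005):
    """Position relative to the 45-degree line (x = President, y = State House)."""
    gap = house_dem_share - pres_dem_share
    if abs(gap) <= tol:
        return "on the line"
    # Below the line: less Democratic for State House than for President,
    # i.e. the Republican rep outperformed Trump.
    return "above the line" if gap > 0 else "below the line"


labels = [classify(pres, house) for _, pres, house in precincts]

total_votes = sum(v for v, _, _ in precincts)
pres_share = sum(v * pres for v, pres, _ in precincts) / total_votes
house_share = sum(v * house for v, _, house in precincts) / total_votes

print(labels, round(pres_share, 3), round(house_share, 3))
```

In this made-up district the Democrat carries the presidential vote while the Republican carries the State House vote, the same split-ticket pattern described for PA-170.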
In multiple mini fashion, here are those results for every district in Bucks County:
Districts PA-144, 142, and 18 all had uncontested State House Republicans, and PA-140 an uncontested Democrat, explaining the 100% results. PA-145 voted for both the Republican rep and Trump, and showed no anti-Trump shift: the precincts all lie right on the 45-degree line. PA-143, 178, 31, and 29 show anti-Trump skews that we'll become familiar with in later plots, and PA-141 exhibits a rare pro-Trump skew: Democrat Tina Davis far outpaced Clinton.
Here is Montgomery County:
Districts PA-70, 149, 148, 153, and 154 voted almost entirely for Democrats. The others were more balanced, but with serious anti-Trump shifts. This includes PA-61, 151, and 152, in which vast majorities of the precincts voted for Clinton but for the Republican representative.
Here's Chester County:
PA-26 had an uncontested Republican, and all of the other districts showed significant anti-Trump shifts. You can see the skew that was visible in the bar charts above: PA-13, 74, and 26 all have precincts that are stretched far more Democratic than the others.
In Delaware County, PA-160 was uncontested, and PA-166, 164, and 159 were overwhelmingly Democratic. The others all show anti-Trump shifts.
In Philadelphia, most of the districts had uncontested Democrats. The two competitive races, 177 and 170, had typical anti-Trump shifts.
These plots give us a nice lay of the land. Philadelphia, as a whole, is a strong Democratic city with swing-y suburbs, and those suburbs had significant anti-Trump shifts. It remains to be seen if those incumbents will do even better in 2018, when Trump isn't on the ballot, or if the intervening two years have caused the Trump antipathy to seep down to the rest of the Republican party.
Election data from the Open Elections Project
Here are the full district maps, in their gerrymandered glory.