Welcome to the Sixty-Six Wards District Profiles!
Democrats need to pick up 20 seats to tie up the PA House, and 19 districts currently held by Republicans voted for Clinton in 2016. All of them are in the Philadelphia region. I've profiled Delco's District 168 and Philadelphia's District 177. Both are nested in Congressional Districts that FiveThirtyEight rates "Solid", specifically "Solid Democrat".
The only Congressional District in the region that is considered "Likely"--one step lower--is Bucks County's CD 1, where incumbent Brian Fitzpatrick (R) is favored to defeat challenger Scott Wallace, with a 78% chance as of this writing.
Bucks County has within it some fascinating State House races. Maybe the most interesting is District 178.
District 178 covers Northampton Township, to the West of Newtown, and stretches up to the NJ border, including New Hope.
Until December 2017, the district had been represented by Scott Petri (R), who stepped down to become the Executive Director of the PPA. He was replaced in a Special Election in May, in which Democrat Helen Tai beat Republican Wendi Thomas by the narrowest of margins.
How close was the special election? Tai won by 101 votes, 6,366 - 6,265. Less than 1% of the vote. In fact, in the separate primaries that happened the exact same day, Thomas received 6,649 votes, and Tai received 6,269. Tai won by the margin of people who forgot to push both buttons. It was close.
The two are facing off again in November. Could the Special Election result be carried over to a higher-turnout general?
The district swung hard towards the Democrat in the Special Election. In 2016, Petri won 61% of the state house vote, even as Trump won only 51%.
[Interactive State House Map] [Interactive Presidential Map]
New Hope and neighboring Solebury voted overwhelmingly for Clinton, and even marginally for Democratic Rep Dougherty. The entire south of the district voted for Petri and for Trump.
Every single precinct in the District voted more heavily for Petri than for Trump in 2016.
Here's the corresponding map for the Special Election. Every precinct swung towards the Democrats.
[Interactive Map Here]
The swing was huge. Below is a similar scatter of precincts, but comparing 2016 to the 2018 Special election. Points above a 45-degree line mean that they voted more Democratic in 2018 than 2016. (Every point is above that imaginary line)
[Interactive Plot Here]
The district is heavily White. Some 89.5% of the residents are non-Hispanic White, with 4.5% of the residents Asian, 2.6% of the residents Hispanic, and 1.5% of the residents Black.
[Interactive Map Here]
The turnout of the district typically favors the south, where population and votes are denser. Here is the turnout for 2016.
[Interactive Map Here]
But turnout changed dramatically for the Special Election. Comparing the relative change in turnout from 2016 to the Special Election, the north lights up. In 2016, 38,588 votes were cast. In the Special Election, 12,631 were, for an average ratio of 33%. But the north voted at rates up to 40% of 2016, while much of the south was below 30%. Overall, the south can still carry the day, but by much less than before.
[Interactive Plot Here]
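The ratios in the paragraph above are simple divisions of Special Election votes by 2016 votes. Here is a minimal sketch of that arithmetic. The two district-wide totals are the ones quoted in the text; the precinct names and counts are made up for illustration and are not real returns.

```python
# District-wide totals quoted in the text.
votes_2016 = 38_588
votes_special = 12_631

district_ratio = votes_special / votes_2016
print(f"District-wide: {district_ratio:.0%} of 2016 turnout")

# Hypothetical precinct counts (NOT real data), only to show the comparison.
precincts = {
    "north_example": {"votes_2016": 1_000, "votes_special": 400},
    "south_example": {"votes_2016": 2_000, "votes_special": 580},
}


def relative_turnout(precinct):
    """Special Election votes as a fraction of 2016 votes."""
    return precinct["votes_special"] / precinct["votes_2016"]


for name, precinct in precincts.items():
    ratio = relative_turnout(precinct)
    side = "above" if ratio > district_ratio else "below"
    print(f"{name}: {ratio:.0%} ({side} the district-wide ratio)")
```

A precinct above the district-wide ratio makes up a larger share of the electorate in the Special Election than it did in 2016.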
In 2014, 23,544 votes were cast for Governor, with Corbett beating Wolf with 54% of the vote. Petri ran unopposed for the PA House.
So we should probably expect at least twice as many voters in November as in the Special Election. Will the Republicans from 2014 and 2016 show up as they do in every midterm? Or is the shift more substantial?
This race could serve as a sign of a huge Democratic wave in November. If Tai wins reelection, that will mean that a District that voted for Trump in 2016 and had a long-serving Republican representative was caught up in a Democratic wave. It would have been impossible to imagine a Democrat winning this district four years ago, but Helen Tai did it in May.
Election data from the Open Elections Project
Population data from the 2016 American Community Survey 5-year estimates.
Boundaries and GIS data from election-geodata
Base maps provided by maps.stamen.com/
Welcome to the Sixty-Six Wards District Profiles! Democrats need to pick up 20 seats to tie up the PA House, and 19 districts currently held by Republicans voted for Clinton in 2016. All of them are in the Philadelphia region. Yesterday, I profiled Delco's District 168, where DSA-supported Kristin Seale is challenging Republican incumbent Christopher Quinn. But what if you wanted an interesting election, but insisted on taking the El? Well, we've got a fascinating race right in Port Richmond.
Today: Philadelphia’s PA House District 177
District 177 covers the River Wards above Lehigh, with an arm that stretches up to the Boulevard and Rhawn. The district has been represented by John Taylor since 1984, who headed Philadelphia’s Republican City Committee until recently stepping down. It largely contains the River Wards, while carving out a space for Taylor's home, which lies in the Frankford arm of the boundaries.
With Taylor stepping down, this race doesn’t have an incumbent for the first time in 34 years. Democrat Joe Hohenstein will be facing off against Republican Patty Kozlowski. Hohenstein challenged Taylor in 2016 and lost 55-45, and won a crowded Democratic Primary with a plurality of 37% of the vote. Kozlowski ran unopposed.
In 2016, Taylor beat Hohenstein by 10 points, even though Clinton beat Trump by 17 among the same voters. Taylor won 44 of the 71 divisions.
[Interactive map here]
The spatial map can be a little misleading, as those divisions along the river are largely commercial/industrial, and have low population densities. A map of votes per mile shows the relative strengths of the districts.
[Interactive map here]
The district also manages to gerrymander out significant chunks of non-White Frankford. The one nub that ventures into predominantly Hispanic and Black neighborhoods along Castor Ave also contains the divisions that voted most strongly for Hohenstein two years ago.
[Interactive map here]
So how close is the race overall? The plot below lines up divisions by their 2016 percent Democrat, with widths denoting a division's total votes. A ton of divisions sat between 40-50% Democrat in 2016, which would mean FiveThirtyEight's 16 point swing towards the Democrats (an 8 percent increase in the vote) would easily put them over.
[Interactive plot here]
There are two huge good signs for Democrats: the retirement of Taylor, and how well Clinton did in 2016. She won by 17 points, even as Taylor carried the district. Every single division voted more strongly for Taylor than for Trump in 2016, indicating either favor for the incumbent or anti-Trumpism.
[Interactive plot here]
A huge question in this district is going to be turnout. The district is extremely diverse, racially and politically, and the relative mobilization of precincts could determine who wins.
Turnout in the 2014 midterm was extremely weighted towards the Republican divisions. While the district as a whole saw 2.8 times as many voters in 2016 as in 2014, many Democratic divisions saw *four times* the turnout. The heavily Republican divisions, because they turn out so strongly in midterms, had less relative growth.
There's one more data-point, and it's a doozy. Democratic turnout in May's primary was 4,533 voters. Only 1,541 Republicans voted in May, but Kozlowski was running unopposed.
Four years ago, only 13,237 people voted in the *general*, for the Governor's race. It's hard to figure out exactly what this means, since in 2014 this district didn't even have a Democratic Primary, though it *did* have a Governor's primary, which 2018 didn't.
The map of Democratic turnout, though, is revelatory. The map below shows votes in the May primary as a fraction of total votes in 2016. This gives a good measure of relative mobilization, assuming that the 2016 election represents a peak attainable turnout.
The very bottom of the district, along Lehigh Ave, went bonkers in the Primary, with over 40% of 2016 turnout in some divisions. Those are also where Hohenstein cleaned up:
Election data from www.philadelphiavotes.com
Population data from the 2016 American Community Survey 5-year estimates.
Boundaries and GIS data from www.opendataphilly.org
Base maps provided by maps.stamen.com/
Updates: I've edited this article to reflect the fact that Representative Taylor lives in the Frankford arm of the district, explaining at least that part of its odd shape.
Welcome to the Sixty-Six Wards District Profiles! I'll be walking through the most interesting races occurring in Philadelphia and the burbs (mostly the burbs), starting with the PA House. Democrats need to pick up 20 seats to tie up the PA House. Nineteen districts currently held by Republicans nonetheless voted for Clinton; all of them are in the Philadelphia region.
Today: Delaware County’s PA House District 168.
District 168 covers the Northeast of Delco, including parts of Media. It is represented by Christopher Quinn, a Republican who was first elected in 2016. Before that, it was represented by longtime Republican rep Thomas Killion, and has never been represented by a Democrat.
In 2016, Quinn beat Democratic challenger Diane Levy by 12 points. Clinton beat Trump by 7 among the same voters. In Special Elections since, FiveThirtyEight has calculated a 16 point swing towards Democrats nationally, which would provide a 4 point Democratic victory if it held here. Of course, Quinn won significantly in 2016 in this anti-Trump district, even in the year that Trump was on the ballot.
This year, Quinn is being challenged by Democrat Kristin Seale. She won a competitive primary by only 100 votes, and would be the first woman and first Democrat to hold the seat. She is running as a strong progressive, endorsed by the DSA and several union groups.
In 2016, the Republican Quinn won by 12 points, and won in 33 of the district’s 40 precincts.
The precincts that voted for Democrat Levy were all centered around Media, the district’s “urban” core.
Those precincts are both the densest residentially (they represented 19% of the votes) and the most racially diverse, though no precinct is less than 74% White.
Lining up the precincts by their results in the 2016 State House race shows how dramatically Republican the district was.
For Kristin Seale to win, she would need to swing those marginal precincts towards her (raise the bars) or *drastically* increase turnout in the Democratic precincts (widen the Democratic bars). The plot makes it pretty clear that just increasing turnout in Media probably won’t cut it.
The map of the 2016 Presidential race looks completely different from the State House race of the same election, with 23 of the precincts voting for Clinton:
In fact, every single precinct voted more for Quinn than they did for Trump.
Finally, how should we expect turnout to compare to 2016? Some 38,000 votes were cast in the State House race in 2016, compared to 16,000 in 2014. The Democratic precincts saw much larger jumps in turnout between elections, mirroring a longstanding rule that midterm election turnout favors Republicans. If a Blue Wave means that Democratic turnout is more like a Presidential year, the election could be hers.
There are a number of open questions that are unanswerable. The biggest is: What should we make of a district that simultaneously elected a Republican State Rep and swung hard towards Clinton, both to landslide victories in the same election? Will these Delco voters stick with their favored incumbent, especially without Trump on the ballot? Or has the intervening two years of a Trump presidency turned these voters, who already disliked Trump, against Republicans all the way down the ballot? We’ll see in November.
Election data from the Open Elections Project
Population data from the 2016 American Community Survey 5-year estimates.
Boundaries and GIS data from election-geodata
Base maps provided by maps.stamen.com/
On Thursday, I looked at the composition of the Pennsylvania House. The message: it could be in play, but only in the most extreme wave election. How about the Pennsylvania Senate? It's much safer for Republicans. How close could it get?
The Composition of the PA Senate
The Senate has 50 members, who serve four-year terms. That means 25 are up for election in November. Currently, the Senate is lopsided: 34 Republicans versus 16 Democrats (in a State that was evenly split in the 2016 election). Among those elected in 2014 and up for reelection, 18 are Republicans, and 7 Democrats. Nine of those 2014 races were uncontested, including four Democrats and five Republicans.
To even the Senate, Democrats need to pick up 9 seats. That’s a steep, steep task with only half of the districts on the ballot. Suppose we assume a Democratic swing of 16 points from 2014. Only three of the districts won by Republicans would switch parties. Another three districts went Republican by between 16 and 20 points. Of course, some of those uncontested districts could be in play too.
Alternatively, use the 2016 Presidential Election as a guide: there are seven districts that voted for Republican Senators in 2014 but for Hillary Clinton in 2016, and one that voted for a Democratic Senator but for Donald Trump. Five of those districts are in Philadelphia’s suburbs.
District by District
Below is a breakdown of the districts with 2018 elections. I've broken them into chunks based on the party of the current Senator, and whether they voted for Clinton or for Trump. Below, I present each district's 2016 results (as the two-party vote, meaning excluding third parties and non-votes), and turnout in the 2014 and 2018 primaries.
First, the seven districts with Republican senators defending seats that voted for Clinton.
There is one district that's the opposite, with a Democratic Senator but which Trump won.
There are 11 Republican Senators in districts that Trump did win. Three of those are within a 16 point swing.
Finally, there are six Democratic Senators defending districts that also voted for Hillary.
Could the Senate be in play?
Democrats would need to gain 9 seats to even up the State Senate. To pull that off, they would have to win all but one of the ten districts that either (a) voted for Clinton or (b) were within a 16 point gap in 2014. That's a long shot. But even a less lopsided Senate could have a big impact on state politics, and gains now could set up a real Senate challenge when the other 25 senators are up for election in 2020.
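The seat arithmetic behind that conclusion is short enough to write out. This is a sketch using only the counts quoted in this post; the variable names are mine, and it assumes Democrats hold every seat they are defending.

```python
# Counts quoted in the post.
total_seats = 50
dem_seats = 16

# Seats needed for an even 25-25 chamber.
pickups_needed = total_seats // 2 - dem_seats

# Republican-held seats on the 2018 ballot that look reachable:
clinton_districts = 7      # voted for Clinton in 2016
within_swing = 3           # Trump districts within a 16-point swing of 2014

targets = clinton_districts + within_swing
allowed_misses = targets - pickups_needed

print(f"Pickups needed: {pickups_needed}")
print(f"Plausible targets: {targets}")
print(f"Targets Democrats could afford to lose: {allowed_misses}")
```

If Democrats also lost the one seat they hold in a Trump-won district, even a clean sweep of the ten targets would only just reach the tie.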
November 6th is the midterm election, the first national election since Donald Trump became president. Nationally, Democrats are expected to make gains. The discussion has largely centered on whether the "Blue Wave" will be large enough to help Democrats take control of the U.S. House or even the Senate.
Pennsylvania has important state and local races as well. Voters here will be choosing a Governor and Senator, in two elections where Democratic incumbents are well-positioned. How about the down-ballot races? Could the Pennsylvania House of Representatives be in play?
The Composition of the PA House
The Pennsylvania House has been solidly Republican since 2010, and has had a Republican majority for 20 out of the last 24 years. In 2016, 121 Republicans won versus 82 Democrats. Of those races, 51 were uncontested Republicans, and 47 uncontested Democrats.
In order to take back the House, Democrats need to pick up 20 seats. In a typical year, that would be impossible. But this year? Nationally, in Special Elections for Congress, Democrats have seen an average 16 point increase over their baseline. This would imply that a race that went 58-42 to Republicans would be in play. If every state house district swung towards Democrats by 16 points from 2016, Democrats would pick up *17 seats*. And that's not counting the races that Democrats failed to contest in 2016.
Philadelphia is vitally important in this picture. Some 13 of those Republican districts within 16 points are in Philadelphia or its four-county suburbs.
A surge of that size is probably not going to happen, but it shows what's in play. These hyper-local races probably benefit from very strong incumbency, and it's unclear how national signals will translate.
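The uniform-swing arithmetic used above can be sketched as follows. This is an illustration, not the code behind the seat counts: the function names are mine, and the district margins in the last example are hypothetical rather than real 2016 results. A 16-point swing in the margin corresponds to 8 points of two-party vote share moving from one side to the other.

```python
def margin_after_swing(dem_margin, swing=16.0):
    """Democratic margin (in points) after a uniform swing toward Democrats."""
    return dem_margin + swing


def would_flip(dem_margin, swing=16.0):
    """True if a Republican-won race becomes a Democratic win (ties excluded)."""
    return dem_margin < 0 and margin_after_swing(dem_margin, swing) > 0


# Examples from the text.
print(margin_after_swing(42 - 58))  # a 58-42 Republican win becomes a tie
print(margin_after_swing(-12))      # a 12-point loss becomes a 4-point win

# Hypothetical 2016 Democratic margins (NOT real results).
example_margins = {"A": -30.0, "B": -15.0, "C": -4.0, "D": 10.0}
flips = [district for district, m in example_margins.items() if would_flip(m)]
print(flips)
```

Uncontested races have no meaningful 2016 margin, so they would have to be handled separately rather than passed through this function.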
Pennsylvania has 19 districts with a Republican representative that were won by Hillary Clinton. Stunningly, every single one of them is in Philadelphia or its suburbs. And in two of those, Democrats didn't even field a candidate. If Donald Trump has nationalized the election and energized Democratic voters, Democrats could capitalize on this anti-Trump sentiment.
Plenty of people are trying to read the tea leaves in May's Primary turnout. Could surges in Democratic Primary turnout presage victories in November? I look at turnout below. This discussion comes with a strong caveat: it's unclear how well primary turnout translates to general election results, and that is even further complicated by the fact that turnout is driven by where races are competitive, and we had many (many) more competitive primaries than in 2014, the last midterm. That being said, below is a plot of primary turnout changes, from 2014 - 2018, broken down by 2016 election results.
District by District
Finally, let's look at the raw data for the districts. To make this manageable, I've broken them into chunks based on whether the primaries were contested, party of the Representative, and 2016 Presidential results. Below, I present each district's 2016 results (as the two-party vote, meaning excluding third parties and non-votes), and turnout in the 2014 and 2018 primaries.
First, the 39 districts that the Democrats didn't contest in 2016 but now have a candidate, including two that Hillary won.
Next, the other 17 districts that voted for Republican representatives but also for Hillary in 2016. They're all in Philadelphia and its suburbs.
Next, the 53 districts where a Republican representative won a contested race and Trump did too. Four of these were within 16 points of a Democratic victory in 2016.
Where were the districts that were extra Trumpy? Here are the 17 that voted for a Democratic Representative, but also for Trump (including 8 where the Republicans didn't field a candidate).
Next, the 65 districts where Democrats won both the State House and Presidential race.
Finally, the twelve districts that Democrats didn't contest in 2016 and still aren't contesting in 2018.
Coming up next
What do we make of all this? Unfortunately, it's unclear how this election will break. Are the districts that voted for a Republican Representative but Clinton in 2016 more likely to swing towards the Democrats, given the first two years of Trump's presidency, or will they continue to support the incumbent, especially with Trump not on the ballot? Will the districts with primary surges show up? How important is the fact that Democrats have fielded candidates in 39 districts that went uncontested in 2016, and can that shift turnout there in ways that change the Governor's race?
As the summer comes to a close, I'll be looking at a few of these Philadelphia-area races, and trying to understand the landscape for November.
A ton of you shared your voter number as part of the live tracker on May 15th. This data helps me get at a question I've wondered for some time: When do Philadelphians vote? The answer may surprise you.
When all was said and done, I had 619 valid submissions from election day. This included data from 50 wards, and from every hour of the day. Each submission consisted of the time of day, the ward + division, and the cumulative count of votes at that division up to that time.
One important thing to note is that my data is not representative of the city as a whole. The map below shows where data was submitted from. I have many more points from Philadelphia's white, predominantly wealthy wards. My model will be overweighted towards these people. I break down the results by race and income, as much as is possible, at the end.
When did Philadelphia Vote?
For each point, I divide the current count by the final vote count in that division to calculate the proportional vote total at that time. This puts divisions on comparable footing. I then model the cumulative percentage of the vote across the city at each hour, treating each row of data as independent.
Above is the distribution of votes across the day. About 19% of the votes came from 7-9 am. Another 16% of the votes came from 11am - 1pm, during the lunchtime surge. Some 42% of the votes came between 4 and 7pm.
I was amazed by the steadily increasing nature of the plot; I expected the distribution to be much more heavily weighted to before and after work. Voting clearly slows down between 10 and 11am, and 1 and 3pm, but the lunchtime push is much stronger than I expected. Still, after work (starting around 4pm) is by far the strongest surge.
The results between 7 and 8pm require some discussion. I don't impose any rule that the total be increasing, and the model actually predicts that negatively many people voted in the last hour. This is due to the thunderstorm that covered the city at exactly that hour. Don't overinterpret the negative sign, the model isn't that smart. Basically, nobody voted. Assuming that the trend from 6-7 would have continued, we could have seen another 30,000 votes without the storm. (This is also what broke my prediction...).
One other weird thing is that my final total doesn't finish at 100%. This means that people were reporting numbers at 8pm that didn't match the city's reported total. I'm not sure how this happened; maybe they had actually voted earlier and got confused by the submission form?
Another way to view this data is to break voting down by hour.
The highest proportion of votes came from 6-7pm, with nearly 15% of the vote. Next was 7am with 11%, followed by the 4-6pm surge, with just under 10% of the vote each hour.
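The two bookkeeping steps in this section can be sketched in a few lines: dividing each reported voter number by the division's final count, and differencing a cumulative curve to get hourly shares. This is a simplified illustration with made-up submissions, final counts, and cumulative shares; it does not reproduce the smoothing model used for the plots.

```python
def proportional_votes(submissions, final_counts):
    """Convert (division, hour, count) rows to (hour, share of final vote)."""
    rows = []
    for division, hour, count in submissions:
        final = final_counts.get(division)
        if final:  # skip divisions with no (or zero) final count
            rows.append((hour, count / final))
    return sorted(rows)


def hourly_shares(cumulative_by_hour):
    """Difference a cumulative-share curve into per-interval shares."""
    hours = sorted(cumulative_by_hour)
    return {
        (start, end): cumulative_by_hour[end] - cumulative_by_hour[start]
        for start, end in zip(hours, hours[1:])
    }


# Hypothetical submissions: (ward-division, hour of day, voter number).
submissions = [("01-01", 8.5, 40), ("01-01", 17.0, 150), ("02-03", 12.25, 90)]
final_counts = {"01-01": 200, "02-03": 180}
print(proportional_votes(submissions, final_counts))

# Hypothetical cumulative shares at 7am, 8am, 9am and 10am.
print(hourly_shares({7: 0.0, 8: 0.25, 9: 0.5, 10: 1.0}))
```

Because nothing forces the fitted cumulative curve to be non-decreasing, a differenced share can come out negative, which is what happened in the 7-8pm hour.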
Votes by Ward Groups
This pattern could obviously vary by neighborhood type. I don't have quite enough data from certain parts of the city to do a fine breakdown, but we can get a crude idea by using Race and Income data from the American Community Survey. I group wards based on the most-represented race/ethnicity, and whether the median household income is below or over $60,000. This gives four groups: Black under $60K, Hispanic under $60K, White under $60K, and White over $60K. I only have two data points from the Hispanic wards, but let's look at the remaining three groups.
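As a rough sketch of that grouping rule: take the race/ethnicity with the largest population share in the ward, then split on median household income. The function name and the example shares are hypothetical, and treating a median of exactly $60,000 as "over" is my assumption.

```python
def ward_group(race_shares, median_income, threshold=60_000):
    """Label a ward by its largest race/ethnicity group and income bracket."""
    plurality = max(race_shares, key=race_shares.get)
    # Assumption: a median of exactly $60,000 counts as "over".
    bracket = "under $60K" if median_income < threshold else "over $60K"
    return f"{plurality} {bracket}"


# Hypothetical wards (NOT real ACS figures).
print(ward_group({"Black": 0.62, "White": 0.25, "Hispanic": 0.13}, 41_000))
print(ward_group({"Black": 0.10, "White": 0.78, "Hispanic": 0.12}, 85_000))
```

The label describes the ward as a whole, so divisions inside a ward can look quite different from their group.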
It's also important to note that I don't actually use the race or income of the division. Even within Black or White low-income wards, I think that my data will overrepresent wealthier and whiter divisions. For example, among the Black wards, 51% of the submissions come from three wards: Ward 22 in East Mount Airy, 36 in Point Breeze, and 46 in University City, all wards that are notoriously gentrifying. This suggests that maybe the data will represent the gentrified divisions, and still not capture the Blacker neighborhoods. Maybe we'll get more representative data in November.
With that being said, here is the same model fit only within each ward group.
In the Black wards, the overall trend looks similar to the city as a whole, though significantly more of the votes came between 8 and 9am, and fewer from 9 to 10. There was also a more exaggerated lunchtime rush, with 13.6% of the votes coming from 12-1pm.
The White wards with incomes less than $60K are much steadier through the day. Some 12.2% of the votes came before 8am, though five other hours--12-1, 2-3, and 4-7--broke 10% of the vote.
Curiously, the White wards with incomes over $60K were the wards with the lowest share of the voting coming before 9am--only 17.9%--but where voting continued between 9 and 10am and where the after-work surge started as early as 3pm. These may be wards where people live close to work, and can arrive to work late or cut out early to vote.
Election day was last Tuesday. I’ll do a full analysis of the results in a later post, but first I want to talk about the live election tracker, which you all participated in beyond my wildest dreams. How did it do?
I developed and tuned the model without having any data that would look like what we actually got on election day: I had final vote counts by division, but not the time-series—how many people voted by 8am? How many by 3pm?—that I would need to use on election day. Now that I have that, let’s evaluate how the model worked.
Here was my last prediction from Tuesday night, for Democratic turnout.
I don’t have the final Democratic turnout yet, but the final overall count was… (drumroll)… 172,466. For Democrats and Republicans combined. At historic rates, this would mean about 162 thousand Democratic votes. That’s well outside my margin of error, nearly 4 standard errors below the estimate. Whoops.
So what happened? I’ve dug into the model and broken the results into three categories: the good, the bad, and the really really stupid.
First, let me remind you what the model was. Residents across the city submitted three data points: their Ward+Division, the time of day, and the voter number at their precinct.
I modeled the cumulative votes V for submission i, in Division d, at time t, using the log scale:
log(V_idt) = alpha_t + beta_d + e_i
This appears easy to fit, but the sparsity of the data makes it quite complicated. We don’t observe every division at every time, obviously, so we need to borrow information across data points. I used an E-M style algorithm with the following parametrizations:
alpha_t ~ loess(t)
beta_d ~ MultiVariateNormal(mu, Sigma)
e_i ~ normal(0, sigma_e)
This model implicitly assumes that all divisions have the same distribution of votes through the day. For example, while divisions can have different vote totals overall, if one division has 30% of its votes by 9am, then all do (plus or minus noise). I did not enforce that the time series should be non-decreasing, though obviously it must be. Instead, I figured the data would largely corroborate that fact. This does mean that the estimated trend can in general go down, due to noise when the true trend is flat. Eh, it largely works.
As for the divisions, mu is forced to have mean 0, so the overall level of voting is absorbed into alpha. I use mu and Sigma estimated on the historic final turnout counts of each division, and the covariance matrix of that turnout. I discuss what that covariance looks like here.
The final turnout is estimated as exp(alpha) * sum(exp(beta_d)), ignoring the small correction for asymmetry in exp(e_i).
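To make the last formula concrete, here is a minimal sketch of how a citywide estimate falls out once alpha and the beta_d values have been fitted. It covers only this final step, not the E-M fitting itself, and the alpha and beta values are made up for illustration.

```python
import math


def division_estimates(alpha, betas):
    """Per-division turnout, exp(alpha + beta_d), for each division d."""
    return {d: math.exp(alpha + b) for d, b in betas.items()}


def estimated_turnout(alpha, betas):
    """Citywide turnout: exp(alpha) * sum over divisions of exp(beta_d)."""
    return math.exp(alpha) * sum(math.exp(b) for b in betas.values())


# Hypothetical fitted values (NOT from the real model): a citywide level of
# 100 votes per division on the log scale, and two division effects that
# average to zero.
alpha = math.log(100)
betas = {"01-01": math.log(0.5), "01-02": math.log(2.0)}

print(division_estimates(alpha, betas))
print(estimated_turnout(alpha, betas))
```

The citywide total is just the sum of the per-division estimates, which is why divisions with no submissions still contribute through their borrowed beta_d.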
Evaluating the Model: The good, the bad, and the really really stupid.
Good: your participation.
We ended the day with 641 submissions. Holy shit. Y’all are awesome.
[Note: I’m including some straggler submissions that came in through the rest of the night, which is why this result and what’s below don’t match the night-of results exactly. None of this changes the substantive results]
Really Really Stupid: What does Voter Count represent?
I announced that I was estimating Democratic turnout, because I remembered someone from the last primary saying that voter numbers are broken down by party. Turns out that’s wrong. And I never bothered to confirm. So the whole time I said I was estimating Democratic turnout, the model was actually estimating the total number of voters. So I should have been pointing to that overall 172 thousand popular count, and that's a lot closer to my estimate. From here on out, let's pretend I didn't mess that up, and use the 172K as the test comparison.
Bad (but defensible): Smoothing and the thunderstorm.
I never imagined I would have over 600 data points. To give you a sense of how many points I was expecting, I built and tested the model on datasets of 40. At that size, it was really important to strongly borrow information across time and divisions, and not overfit single points. So I hard-coded very strong smoothing into the time series. Notice how the time plot above keeps going up in a straight line from 5pm to 8pm.
It turns out (because y'all are amazing) that I had enough data points to use much less smoothing, and let the model identify more nuanced bends in the data. That would have helped a lot. Here’s what the plot looks like if I use much less smoothing:
You may also have noticed the giant thunderstorm that blanketed the city at 7pm. It turns out that voting completely flattened that hour. The over-smoothed model just assumed the linear trend would keep on going.
When I fit the model with the more adaptive smoothing parameter, the prediction would have been 173,656 (CI: 157,150 - 192,058). That’s… really close.
Good: Ward relative turnout
On the beta side of the model, things turned out pretty well. Here is the plot I created with the change in votes from 2014.
Here is a recent tweet from Commissioner Schmidt’s office.
The maps match, albeit with 39 and 26 switching places as well as 18 and 31. (Full disclosure: the size of the increase I was showing in Ward 26 made me *really* worried on election day that I had messed something up. Turns out that increase was real).
It's worth emphasizing how cool this is: with 641 submissions, many from before noon, we were able to almost perfectly rank wards based on their increased turnout.
Another way to evaluate the spatial part of the model is to compare my Division-level predictions with the true turnout. Below is a plot of my final estimate for each division on the x-axis, with the difference between the true turnout and my prediction (the residual) on the y-axis.
The residuals are centered nicely around zero. Notice that I only had data from 329 divisions, so 1,357 of these estimates came in divisions for which we observed no data, and were entirely borrowed from other divisions based on historic correlations.
Okay (?) ¯\_(ツ)_/¯: Confidence Intervals
In statistics, it's usually a lot easier to estimate a value than it is to put error bars around it. And a point estimate is of no use without a measure of its uncertainty. I've got a somewhat complicated model; how did my purported uncertainty do?
One way to test this out is the bootstrap. You sample from your observed data over and over with replacement, to create simulated, plausible data sets. You can then calculate your point estimate as if that were your data, and look at the distribution of those estimates. Voila, you have an estimate of the uncertainty in your method. The benefit of this is that you can mechanically explore the uncertainty in your full process, rather than needing to rely on math that uses perhaps-invalid assumptions.
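Here is a minimal sketch of that resampling loop, using a percentile interval. It is an illustration only: the estimator is a plain mean standing in for the full turnout model, and the data values are hypothetical.

```python
import random
import statistics


def bootstrap_interval(data, estimator, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        # Resample n points with replacement, then re-run the estimator.
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical observations (NOT real submissions).
data = [412, 388, 450, 397, 431, 405, 420, 376, 444, 399]
low, high = bootstrap_interval(data, statistics.mean)
print(f"Point estimate: {statistics.mean(data):.1f}")
print(f"95% bootstrap interval: ({low:.1f}, {high:.1f})")
```

In the real setting, `estimator` would refit the whole model on each resampled set of submissions, which is far slower than taking a mean.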
This model is not a perfect use case of the bootstrap because it relies so heavily on having data from a variety of divisions. The bootstrap will necessarily provide fewer divisions than the data we have, because data points get repeated. Thus, we would expect the bootstrap uncertainty to be larger than the true uncertainty in the model with real data.
The bootstrap CIs are 35% larger than the estimated CIs I provided. I frankly don't have a great sense of whether this means my method underestimates the uncertainty, or if this is due to having fewer divisions in the typical bootstrap sample. I need to break out some textbooks and explore.
This project had some errors this time around, but it seems like with some easy fixes we could build something that does really, really well. Here are some additional features I hope to build in.
While you guys are awesome, some of you still made mistakes. Here is a plot from 12:30:
I was pretty sure that negatively many people didn't vote at lunch. That trend is entirely driven by that one person who reported an impossibly low number in their division at 12:30.
Here's another plot:
Someone claimed that they were the 10,821,001,651st voter in their division. That is larger than the population of the world and seems implausible.
Through the day I implemented some ad-hoc outlier detection methods, but for the most part my strategy was to manually delete the points that were obviously wrong. But there are some points that are still unclear, and which I ended up leaving in. I hope to build in more sophisticated tests for removing outliers by November.
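As one possible starting point, here is a sketch of a simple robust outlier test. It is not what I ran on election day. It assumes the reports being compared come from similar divisions at roughly the same time, and the `flag_outliers` helper, the 3.5 threshold, and the example numbers (apart from the 10,821,001,651st voter) are made up for illustration.

```python
import math
import statistics

def flag_outliers(values, threshold=3.5):
    """Return a list of booleans: True where a value looks like an outlier.

    Uses a robust z-score on the log scale: distance from the median,
    divided by the (rescaled) median absolute deviation, or MAD.
    """
    logs = [math.log(v) for v in values]
    med = statistics.median(logs)
    mad = statistics.median([abs(x - med) for x in logs])
    if mad == 0:
        # No spread to judge against, so flag nothing.
        return [False] * len(values)
    # 1.4826 rescales the MAD to match a standard deviation for Normal data.
    return [abs(x - med) / (1.4826 * mad) > threshold for x in logs]

# Hypothetical voter numbers reported around the same time of day.
reports = [85, 92, 110, 78, 101, 97, 3, 10821001651]
flags = flag_outliers(reports)
for value, is_outlier in zip(reports, flags):
    print(value, "OUTLIER" if is_outlier else "ok")
```

Because the median and MAD barely move when one wild value shows up, a single impossible report can't hide itself by dragging the average toward it.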
Predicting final outcomes
Because nobody had collected time-of-day data on Philadelphia voting before, I couldn't predict the end of day turnout. This means, for example, at 3pm I could estimate how many people had voted as of 3pm, but was not predicting what final turnout would look like. I simply didn't know how voting as of 3pm correlated with voting after 3pm. But *now*, we have one election's worth of time-of-day data! I am going to work up an analysis on the time-of-day patterns of voting (stay tuned to this blog!) and see if it's reasonable to add a live, end-of-day prediction to the tool.
See you in November!
I'm going to put aside the live modelling business for a while, and spend the next few months looking at static analyses of these results, and what they might mean for November. But don't worry, the Live Election Tracker will be back in November!
Share your voter number at bit.ly/sixtysixturnout!
Philadelphia's 2018 Primary elections are coming on May 15th. I'm excited to announce that Sixty-Six Wards will be *live tracking* turnout on election day. And I need help from you!
What I need from you
On election day, vote! When you sign in to vote, you can see what number voter you are in your division. After voting, log in to bit.ly/sixtysixturnout and share with me your (1) Ward, (2) Division, (3) Time of Day, and (4) Voter Number. Using that information, I've built a model that estimates the turnout across the city (see below).
You'll then be able to track the live election results at jtannen.github.io/election_tracker. Will turnout beat the 165,000 who voted in 2014, even with a non-competitive Senate and Governor primary? Will the surge seen in 2017 continue?
This estimation can only get better with more data points. So encourage your friends to vote, and share their voter number too!
Some notes about the data collection: I only collect the four data points above (Ward, Division, Time, and Voter Number), and no identifying information on submitters. I *will* share this data publicly--again, only those four questions--in hopes that it can prove useful to others. And I am only using Democratic primary results (sorry Republicans, but there are simply too few of you, especially in off-presidential years, for me to think I could make any valid estimate).
Now for what you really care about: the math.
Estimating turnout live requires simultaneously estimating two things: each Division's relative mobilization and the time pattern of voters throughout the day. The 100th voter means something different in a Division that had 50 voters in 2014 than it does in a Division that had 200, and it means something different at 8am than it does at 7pm. Further, Philadelphia has 1,686 Divisions, and I don't think we'll get data on every Division (no matter how well my dedicated readers blast out the link). I use historic correlations among Divisions to guess the current turnout in Divisions for which no one has submitted data.
To estimate turnout, I model the turnout X in division i at time t, in year y, as
log(X_ity) = a_y + b_ty + d_iy + e_ity
The variable a represents the overall turnout level in the city for this year. b_t is the time trend, which starts at exp(b_t) = 0 and finishes at exp(b_t) = 1 so the time trend progresses from 0% of voters having voted at 7am to 100% having voted at 8pm. d_i represents Division i's relative turnout versus other divisions for this year, and e_it is noise.
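Because the model is additive on the log scale, it is multiplicative on the count scale. Here is a small worked example with made-up numbers. Reading exp(a) as a typical Division's end-of-day count is an assumption made to keep the arithmetic simple, and the function names are hypothetical.

```python
import math

def expected_voters(a_y, b_t, d_i):
    """Expected voter count implied by log(X) = a + b_t + d_i (noise ignored)."""
    return math.exp(a_y + b_t + d_i)

def implied_division_effect(voter_number, a_y, b_t):
    """Invert the model: the d_i implied by a single reported voter number."""
    return math.log(voter_number) - a_y - b_t

a_y = math.log(100)   # hypothetical: a typical Division ends the day at 100 voters
b_t = math.log(0.40)  # hypothetical: 40% of the day's votes are in by this time
d_i = math.log(1.5)   # hypothetical: this Division runs 50% above typical

x = expected_voters(a_y, b_t, d_i)         # 100 * 0.40 * 1.5
d_back = implied_division_effect(x, a_y, b_t)
print(f"expected voters so far: {x:.1f}")
print(f"recovered multiplier: {math.exp(d_back):.2f}")
```

This is why the 100th voter means different things in different places: the same voter number implies a very different d_i depending on a, on the time of day, and on the Division's usual size.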
To best estimate d_i, especially for Divisions with no submitted data, I use historical data. Using the Philadelphia primaries since 2002 (excluding 2009, where something is weird with the data), I estimate each Division's average relative turnout versus other Divisions (its "fixed effect"), and the correlation among Divisions' log turnout across years. Divisions are often very similar to each other: when one Division turns out strongly in an election, similar Divisions do too.
Here are the estimated average relative turnouts (the fixed effects):
We see a familiar pattern. Center City, Germantown and Mount Airy, and Overbrook all vote at disproportionately high rates, while the universities, North Philly, and the northern River Wards vote at disproportionately low rates.
That's across all years. But the way Divisions over- or under-perform these averages in a given year creates patterns as well. Divisions' turnouts are correlated with each other.
We have 1,686 Divisions and only 16 years, so we need to simplify the covariance matrix to estimate covariance among Divisions. To do that, I use Singular-Value Decomposition to identify three dimensions of turnout. These dimensions represent groups of Divisions that swing together: when one Division in the group turns out higher than usual, the others do too. The signs are not meaningful; some years the Divisions with a positive sign turn out higher, other years those with negative signs. What's important is that the positive and negative signs move oppositely.
SVD assigns a score in each dimension for the Divisions and for the years. Divisions with a positive score in a dimension turn out more strongly in years with positive scores; Divisions with a negative score turn out more strongly in years with negative scores.
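To show what "a score for the Divisions and for the years" looks like, here is a toy sketch. A real analysis would call a library SVD routine; this stand-in uses plain-Python power iteration and recovers only the first dimension. The 4x4 matrix of residual log turnout (turnout after removing each Division's fixed effect) is made up.

```python
import math

def leading_singular_triplet(matrix, n_iter=100):
    """Leading (u, sigma, v) of a matrix by power iteration.

    Rows are Divisions and columns are years, so u holds the Division
    scores and v holds the year scores for the first dimension.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Uneven starting vector, so it is not orthogonal to the answer here.
    v = [1.0 / (j + 1) for j in range(n_cols)]
    for _ in range(n_iter):
        u_raw = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
        v_raw = [sum(matrix[i][j] * u_raw[i] for i in range(n_rows))
                 for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v_raw))
        v = [x / norm for x in v_raw]
    u_raw = [sum(row[j] * v[j] for j in range(n_cols)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in u_raw))
    u = [x / sigma for x in u_raw]
    return u, sigma, v

# Hypothetical residuals: Divisions 1-2 swing together, 3-4 swing the other way.
residuals = [
    [0.3, -0.2, 0.4, -0.3],
    [0.2, -0.3, 0.3, -0.2],
    [-0.3, 0.2, -0.4, 0.3],
    [-0.2, 0.3, -0.3, 0.2],
]
u, sigma, v = leading_singular_triplet(residuals)
print("Division scores:", [round(x, 2) for x in u])
print("Year scores:    ", [round(x, 2) for x in v])
```

Flipping the sign of every Division score and every year score at once gives an equally valid answer, which is why only the relative signs carry meaning.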
Eyeballing the score maps together with the years' scores serves as a sanity check, and provides intuition for the underlying story. The dimensions are ordered from the strongest separation to the weakest. I'm not going to pay too close attention to the specific values of the scores; what matters most is the relative values.
Dimension 1 has clearly identified the racial divide in the city. Divisions with positive scores are predominantly Black and Hispanic, while divisions with negative scores are predominantly White (again, the signs are not meaningful). Divisions with positive scores voted disproportionately in 2012 and 2003 (President Obama's and Mayor Street's reelections, respectively), while divisions with negative scores turned out particularly strongly in 2017. Interestingly, the year with the lowest score, meaning the greatest disproportionate turnout in non-Hispanic White neighborhoods, was the 2017 DA's race won by Larry Krasner.
It's less obvious from the map what Dimension 2 captures, but the time series is clear: Dimension 2 identifies Divisions where relative turnout has steadily increased over the 16 years. This also lines up with the neighborhoods that have gentrified: Powelton, Fishtown, Fairmount, and Girard Estates have seen the strongest trends up, while broad swaths of North Philly and the Greater Northeast have seen relative decreases over time.
Dimension 3 identified Divisions that surge in turnout specifically for Presidential primaries. Penn and Drexel obviously see the strongest swings, though Southwest Philly and Hispanic sections of North Philly also have positive scores: these neighborhoods voted strongly in 2016 and 2008, relative to their typically low overall turnout.
These dimensions allow me to calculate the smoothed covariance matrix Sigma, among divisions. In a given year, then, the vector of Division effects, d, is drawn from a multivariate Normal:
d ~ MVNormal(mu, Sigma),
with mu the Division fixed effects.
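The multivariate Normal is what lets data from one Division inform another. As a sketch of the idea (not necessarily the exact computation inside the fitted model), here is the standard conditional mean for just two Divisions: one with a submitted data point and one without. All the numbers and the function name are hypothetical.

```python
def conditional_mean(mu_unobs, mu_obs, cov, var_obs, d_obs):
    """E[d_unobs | d_obs] when (d_obs, d_unobs) are jointly Normal."""
    return mu_unobs + (cov / var_obs) * (d_obs - mu_obs)

mu_obs = 0.2     # hypothetical fixed effect of the Division with data
mu_unobs = -0.1  # hypothetical fixed effect of the Division without data
var_obs = 0.04   # hypothetical variance of the observed Division's effect
cov = 0.03       # hypothetical covariance between the two Divisions
d_obs = 0.4      # this year's estimated effect in the Division with data

guess = conditional_mean(mu_unobs, mu_obs, cov, var_obs, d_obs)
print(f"updated guess for the Division without data: {guess:.2f}")
```

The Division with data is running 0.2 above its usual level, so the correlated Division gets pulled up too, by 0.75 x 0.2 = 0.15. With zero covariance, the guess would stay at the fixed effect.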
The time series of voting throughout the day is currently completely unknown. What fraction of voters vote before 8am? What fraction at lunch time? Knowing how to interpret a data point at 11am hinges on the time profile of voting. For this model, I assume that all Divisions have the same time profile: all divisions have the same fraction of their voters vote before time t, with possible noise. Since I don't have historical data for this, I will estimate it on the fly.
I model the total time effect including the annual intercept, a_y + b_t, using a loess smoother.
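For intuition, here is a stripped-down version of what a loess smoother does at a single point: fit a weighted straight line to the nearby data, with weights that fall off with distance. This sketch skips the robustness iterations that full loess implementations offer, and the function name, the `frac` value, and the data are all made up.

```python
def loess_point(xs, ys, x0, frac=0.5):
    """Local linear fit at x0 using tricube weights on the nearest points."""
    n = len(xs)
    k = max(2, int(round(frac * n)))
    # Bandwidth: distance from x0 to its k-th nearest point.
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

# Hypothetical data: hours since 7am vs. log of reported voter numbers.
hours = [0.5, 1.0, 2.0, 3.0, 4.0, 5.5, 6.0, 7.5, 9.0, 10.0, 11.5, 12.5]
log_votes = [1.2, 2.0, 2.9, 3.3, 3.6, 3.9, 4.0, 4.2, 4.5, 4.7, 4.9, 5.0]
print(f"smoothed value at 12:30pm: {loess_point(hours, log_votes, 5.5):.2f}")
```

Repeating this at every time of day traces out the smooth curve for a_y + b_t without having to assume a shape in advance.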
The full model
Having specified each term in
log(X_ity) = a_y + b_ty + d_iy + e_ity,
I fit this model using Maximum Likelihood.
The output is a joint modeling of (a) the time distribution across the city and (b) the relative strength of turnout in neighborhoods. Come election day, you'll be able to see estimates of the current turnout, as well as how strong turnout is in your neighborhood!
See you on Election Day!
2018 could be an exciting moment for Philadelphia elections. The primary will give us an important signal for what to expect in November. Please help us generate live election estimates by voting, sharing your data, and getting your friends to do the same. See you May 15th!
Back in December, I looked at how many votes it takes to become a Democratic Party Committeeperson. Philadelphia's Committeepeople are the foot soldiers of the Party, responsible for getting out the vote and organizing the party in the 1,686 Divisions. Each Division has 2 committeepeople, so every four years a potential 3,372 Philadelphians are elected to the post (I focus on Democrats here, though the same is true for Republicans). In 2014, 348 (10%) of those positions went completely unfilled, while another 275 of the positions were won by Write-In candidates, usually in districts with fewer than two candidates on the ballot.
An open question for the upcoming May Primary is whether Democrats' newfound energy will translate to the rank-and-file positions of the local Party. Well, applications have been filed and the Commissioner's office released the official slate of candidates. How do the numbers bode for that trickle-down energy?
The hypothetical surge in Committeeperson candidates definitely did not materialize.
The surge in candidates didn't happen
In total, there are currently 3,204 candidates on the Democratic ballot, after a number of applications were contested and rejected. That compares to 3,098 that survived to the election in 2014. The counts are from slightly different points in the process. The 2014 data uses election results of non-write-in candidates, while the 2018 uses the recently released Commissioner's data on candidates who survived potential challenges.
There are currently a total of 558 seats that have no candidate on the ballot, 61 fewer than four years ago.
Some 1,615 of those candidates are incumbents. To calculate incumbent candidates, I use fuzzy text matching on candidate names between 2014 and 2018. This tests whether two names are the same based on the fraction of characters that are different. Matching is harder than it may seem because of variability in how candidates write out their names: spellings may change, a candidate may identify as Junior in one but not another, Elizabeth may change her listed name to Betsy. Rather than manually assign incumbency, I automate the process; I've spot-checked the fuzzy matching on 40 borderline matches and think I've got a good first-order approximation to incumbency assignments.
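For a flavor of how this kind of matching works, here is a minimal sketch using Python's standard-library `difflib`. It is a stand-in, not my actual matching code: `SequenceMatcher.ratio()` is a similarity score rather than a literal count of differing characters, and the 0.85 threshold, the suffix list, and the example names are all made up.

```python
from difflib import SequenceMatcher

SUFFIXES = {"jr", "sr", "ii", "iii"}

def normalize(name):
    """Lowercase, drop punctuation, and drop suffixes such as Jr."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    return " ".join(t for t in cleaned.split() if t not in SUFFIXES)

def same_person(name_a, name_b, threshold=0.85):
    """Guess whether two ballot names refer to the same candidate."""
    ratio = SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
    return ratio >= threshold

# Hypothetical names illustrating the cases described above.
print(same_person("John A. Smith Jr.", "John A Smith"))      # suffix dropped
print(same_person("Katherine Johnson", "Kathrine Johnson"))  # spelling change
print(same_person("Elizabeth Jones", "Betsy Jones"))         # nickname
```

The Junior and misspelling cases clear the threshold, but Elizabeth-to-Betsy does not. Nicknames are exactly the kind of case that character-based matching misses, which is why the borderline matches deserve a spot check.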
The wards with the highest number of candidates per division are Wards 1 and 2 in Queen Village, 55 in the lower Northeast, and 46 in West Philly. The wards with the lowest include 27 and 20, which include Penn and Temple, respectively.
The most astonishing increase is in the 58th Ward in the Northeast, which has 66 more candidates in 2018 than it did in 2014. Ward 42 saw the greatest decrease.
There has not been an obvious surge in energy across the city. 34 of the 66 wards saw increases in the total number of candidates, while 29 saw a decrease.
Which Wards have the energy?
Maybe to see the energy, we need to look in specific places in the city. I often find two useful ways to break up the city: by race, and by vote in the 2016 primary. Race often captures an important axis for identity and experience in the city, while vote in the 2016 primary does two things: it differentiates White wards between more establishment (Clinton) and less establishment (Sanders) voters, and it picks out predominantly Black wards that nonetheless are in the process of gentrifying. Wards 46 and 47 neighbor Penn and Temple, respectively, and are predominantly Black Wards that voted relatively strongly for Sanders, largely because of the sizeable young White population.
There is one dimension that is not captured by the Race x 2016 primary distinction, and that is the Trumpiness of White voters. For example, many White Northeast Wards voted for Sanders, but then also swung towards Trump in the general election. This suggests that their Sanders votes may not have been a declaration of progressivism, but a vote against Clinton. Looking at Sanders-voting White wards will conflate the young progressives in the center of the city with the anti-Clinton voters of the Northeast.
For reference in the upcoming discussion, here is a map of wards' predominant race and ethnicity, calculated using the 2012-2016 American Community Survey.
Below is a plot of the average number of 2018 committeeperson candidates per division, plotted by predominant race and 2016 Primary vote. For the most part, White wards have the most candidates, followed by Black wards and then last Hispanic wards. The Black ward with the most candidates is ward 46, which is actually a rapidly gentrifying ward that Bernie almost won.
The most interesting trend is within Hispanic wards, where the trend in White and Black wards is reversed: the number of candidates on the ballot increases with vote for Clinton. Wards 7, 19, and 43 voted strongest for Clinton in the city, and have around 2 candidates per Division. I read this reversal as demonstrating that political organization is different within Hispanic wards from the others. In Hispanic wards, party organization is correlated with establishment votes, while the connection is less clear in other types of wards.
So how about the change in the number of candidates? Did this newfound energy make Bernie-loving wards mobilize committeepeople en masse?
Finally, those changes in candidates running also mean that White wards have the most non-incumbents running. Some 42% of candidates in White wards are incumbents, compared to 53% in Hispanic wards and 57% in Black wards.
What to look for in the Primary
The May 15th primary will seat many new Committeepeople, and decide which Democrats run for all of the important (newly-redistricted) U.S. House and the PA State House and Senate races. The surge in energy that many predicted didn't seem to materialize in residents running for committeeperson, but the primary will give us a much better sense of who is energized where, ahead of the national November midterms (and race for governor).
I've got some exciting news planned for the primary. Stay tuned!