Mapping Congressional Cosponsorship Over Time
A recently developed interest of mine has been visualizing and studying social networks. At first, I looked to revisit the Instagram API I used in the past to study follower/following relationships among verified accounts (who doesn't want to know if Taylor Swift is really the most central celebrity ever?). Unfortunately, Instagram has since closed off its API, so I turned to a more timely dataset: sponsorships and cosponsorships of legislation.
This article mainly introduces the dataset, visualizes the networks over time, and does some preliminary analysis. Much deeper analysis will follow in the future.
The Data
The data is all available through the GovTrack API, which tracks all legislation that goes through the House and Senate. All code for this article is found on GitHub and interactive visualizations can be found on Plotly, a visualization tool that can layer interactivity onto regular matplotlib graphs.
I tracked sponsorships and cosponsorships in House and Senate bills from Congress #100 (1987-89) to Congress #114 so far (2015-17) for bills and joint resolutions. I disregarded approval status because we simply want to study the social dynamics of which legislators support one another, regardless of the outcome of the bill. We do not include simple resolutions, which do not have the force of law.
Visualizing Networks
All networks are constructed as directed graphs. Nodes represent congresspeople who have sponsored legislation, and edges lead into these nodes from other nodes who cosponsor, or support, their legislation. Edge weights are assigned by the number of times a cosponsor has cosponsored a particular congressperson's legislation, so more cosponsorships from Person B to Person A will result in a greater edge weight from B to A. Node weights are assigned by the total number of cosponsorships that person has received, that is, the weighted in-degree of that node, so more cosponsorships from anyone to Person A will result in a greater node weight for Person A. In the Senate, a bill may have multiple sponsors, whereas in the House that is not the case. Thus, most of the analysis that follows compares cosponsorships, which give a more direct basis of comparison between the two chambers.
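As a minimal sketch of the construction described above, the edge and node weights can be tallied with plain counters. The `bills` records and their field names (`sponsor`, `cosponsors`) are hypothetical and chosen for illustration; they are not the GovTrack API schema, and this is not necessarily how the code behind this article is organized.

```python
from collections import Counter

# Hypothetical bill records; field names are assumptions for illustration only.
bills = [
    {"sponsor": "A", "cosponsors": ["B", "C"]},
    {"sponsor": "A", "cosponsors": ["B"]},
    {"sponsor": "B", "cosponsors": ["A"]},
]

def build_edges(bills):
    """Count (cosponsor, sponsor) pairs; each count is a directed edge weight."""
    edges = Counter()
    for bill in bills:
        for cosponsor in bill["cosponsors"]:
            edges[(cosponsor, bill["sponsor"])] += 1
    return edges

def node_weights(edges):
    """Weighted in-degree: total cosponsorships each sponsor has received."""
    weights = Counter()
    for (_, sponsor), weight in edges.items():
        weights[sponsor] += weight
    return weights

edges = build_edges(bills)
weights = node_weights(edges)
```

Here B cosponsors A twice, so the edge from B to A carries weight 2, and A's node weight is the sum over all of its incoming edges.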
All node positions are computed using force-directed drawing algorithms, which place pairs of nodes with greater edge weights closer together and those with lesser edge weights further apart. Nodes that sit closer together therefore share more weight, or cosponsorships. Most of these graphs show a tight cluster of nodes in the center, meaning those nodes have many cosponsorships with many of the other central nodes. The most recent Congress (#114) is shown directly below; earlier Congresses are linked after it.
We also look at average eigenvector and in-degree centralities over time for the top 50 senators and representatives. Each legislator's centrality is averaged over the number of terms they served.
In-degree centrality of a node is proportional to the number of edges pointing into that node, which measures the effectiveness of a congressperson in attracting cosponsors. Eigenvector centrality also depends on the degree of connections, but additionally counts the centrality of those connections: $C_{E}(v_{i}) = \frac{1}{\lambda}\sum_{j \neq i} A_{j,i} C_{E}(v_{j})$, where $A$ is the square adjacency matrix of the network and $\lambda$ is some constant. If we write $C_{E}(v)$ as the vector of the eigenvector centralities of all nodes, then $\lambda C_{E}(v) = A^{T}C_{E}(v)$, so $\lambda$ is an eigenvalue of $A^{T}$ and $C_{E}(v)$ is the corresponding eigenvector. In this context, the most central senators and representatives are those connected to influential senators and representatives.
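One way to see the eigenvector relation at work is power iteration: repeatedly replace each node's score with the weighted sum of the scores of the nodes pointing at it, then rescale. The sketch below is an illustration, not the code used for this article. It iterates with $I + A^{T}$ instead of $A^{T}$, a common shift that keeps the same eigenvectors while avoiding oscillation on some graphs. The toy `edges` dictionary, keyed by (cosponsor, sponsor), is made up.

```python
def eigenvector_centrality(edges, max_iter=500, tol=1e-12):
    """Power iteration on (I + A^T), where A[src][dst] is the edge weight."""
    nodes = sorted({node for pair in edges for node in pair})
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(max_iter):
        # Start from the old scores (the identity shift), then add A^T * score.
        new = dict(score)
        for (src, dst), weight in edges.items():
            new[dst] += weight * score[src]
        # Rescale so the scores sum to 1.
        total = sum(new.values())
        new = {node: value / total for node, value in new.items()}
        converged = sum(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score

# Toy network: B and C cosponsor A's bills, and A cosponsors theirs.
edges = {("B", "A"): 2, ("C", "A"): 1, ("A", "B"): 1, ("A", "C"): 1}
scores = eigenvector_centrality(edges)
```

In this toy network A collects the most incoming weight and ends up with the highest score, while B and C, each supported only by A, tie.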
Senate
House
Modularity
Next, we look at the divisiveness of the House and Senate by mapping modularity over time. Given a partition of the nodes into groups, modularity essentially sums, over all pairs of nodes in the same group, the difference between the actual edge weight between them and the weight expected if edges were placed at random. High modularity indicates that edge connections are not random: connections are dense within groups and sparse between them. First, we look at the modularity given a partition along political party lines:
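To make the definition concrete, here is a small sketch that computes modularity for a given partition. It assumes the directed, weighted form $Q = \frac{1}{m}\sum_{i,j}\left[A_{i,j} - \frac{k_{i}^{out}k_{j}^{in}}{m}\right]\delta(c_{i}, c_{j})$, where $m$ is the total edge weight; the article does not say which variant was used, so treat that choice as an assumption. The legislators and party labels are made up.

```python
from collections import Counter

def modularity(edges, community):
    """Directed, weighted modularity of the partition given by `community`."""
    m = sum(edges.values())
    out_by_group, in_by_group = Counter(), Counter()
    within = 0
    for (src, dst), weight in edges.items():
        out_by_group[community[src]] += weight
        in_by_group[community[dst]] += weight
        if community[src] == community[dst]:
            within += weight  # actual weight inside a group
    # Expected within-group weight if edges were placed at random.
    groups = set(community.values())
    expected = sum(out_by_group[g] * in_by_group[g] for g in groups) / m
    return (within - expected) / m

# Toy chamber: heavy cosponsorship within parties, light across them.
edges = {
    ("D1", "D2"): 3, ("D2", "D1"): 3,
    ("R1", "R2"): 3, ("R2", "R1"): 3,
    ("D1", "R1"): 1, ("R1", "D1"): 1,
}
by_party = {"D1": "D", "D2": "D", "R1": "R", "R2": "R"}
mixed = {"D1": "x", "R1": "x", "D2": "y", "R2": "y"}

q_party = modularity(edges, by_party)
q_mixed = modularity(edges, mixed)
```

The party partition scores well above zero because most of the weight stays within parties, while the mixed partition scores below zero.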
We can see that modularity in the House of Representatives is typically higher than in the Senate, a conclusion supported by Y. Zhang, 2009. This may be explained by the intuition that House elections are smaller and more local than statewide Senate elections, and thus have a higher likelihood of electing more partisan people. We can see modularity rise dramatically during the 112th Congress for both the Senate and the House, a period that has been deemed "the least productive since the Civil War" due to extreme polarization. This period followed the 2010 midterm elections, in which more partisan congresspeople were elected and Republicans took control of the House while gaining seats in the Senate. This polarization was compounded by the fact that Obamacare was signed in March 2010, before these midterm elections.
Next, we look at the clusters that the Louvain method identifies in these Congresses over time and compare them to the actual split along party lines. This network cluster detection method iteratively maximizes the modularity measure (the measure of divisiveness):
High error periods generally correspond to periods when party-modularity is not high, as expected.
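The comparison between detected clusters and party lines can be scored in several ways, and the exact error measure is not spelled out here. As one hypothetical choice, the sketch below labels each detected cluster with its majority party and reports the fraction of legislators whose own party differs from that label. The cluster assignments and party labels are made up.

```python
from collections import Counter, defaultdict

def cluster_party_error(cluster, party):
    """Fraction of nodes whose party differs from their cluster's majority party."""
    members = defaultdict(list)
    for node, cluster_id in cluster.items():
        members[cluster_id].append(node)
    mismatches = 0
    for nodes in members.values():
        counts = Counter(party[node] for node in nodes)
        majority_size = counts.most_common(1)[0][1]
        mismatches += len(nodes) - majority_size
    return mismatches / len(cluster)

# Toy example: one Republican lands in the mostly Democratic cluster.
cluster = {"D1": 0, "D2": 0, "R1": 0, "R2": 1, "R3": 1}
party = {"D1": "D", "D2": "D", "R1": "R", "R2": "R", "R3": "R"}
error = cluster_party_error(cluster, party)
```

Under this measure, an error of 0 means the detected clusters never mix parties; it does not penalize a single party being split across several clusters.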
What's Next?
Now that we have introduced the data and done some basic visualization and analysis, there is a lot left to explore. Possible next steps, as time allows, include:
1) Panel-regressing centrality measures on senator or representative characteristics over time. Are women less central? Are older people more or less central? Does being from a certain state automatically make you more or less central?
2) Reciprocity. Are some pairs always cosponsoring each other's bills?
3) Predicting cosponsorship edges based on the characteristics of the sponsor, or even the content of the bill.