Can the Country Be United Again Reddit

Abstract

Echo chambers in online social networks, whereby users' beliefs are reinforced by interactions with like-minded peers and insulation from others' points of view, have been decried as a cause of political polarization. Here, we investigate their role in the debate around the 2016 US elections on Reddit, a key platform for the success of Donald Trump. We identify Trump vs Clinton supporters and reconstruct their political interaction network. We observe a preference for cross-cutting political interactions between the two communities rather than within-group interactions, thus contradicting the echo chamber narrative. Furthermore, these interactions are asymmetrical: Clinton supporters are particularly eager to answer comments by Trump supporters. Beside asymmetric heterophily, users show assortative behavior for activity, and disassortative, asymmetric behavior for popularity. Our findings are tested against a null model of random interactions, by using two different approaches: a network rewiring which preserves the activity of nodes, and a logit regression which takes into account possible confounding factors. Finally, we explore possible socio-demographic implications. Users show a tendency for geographical homophily and a small positive correlation between cross-interactions and voter abstention. Our findings shed light on public opinion formation on social media, calling for a better understanding of the social dynamics at play in this context.

Introduction

Polarization is a defining characteristic of contemporary politics¹. Polarization along party lines in the United States is on the rise², and the 2016 elections have deepened the divide³. This polarization is easy to detect online, and especially on social media, where people share their opinions liberally. Indeed, several platforms have been the subject of polarization studies, from Twitter, to YouTube, to Facebook^4,5,6,7. On Twitter, in particular, political polarization can be exacerbated by social bots that amplify divisive messages⁸. Polarized issues fall not only along ideological fault-lines⁴, but can also touch any collectively resonant topic⁹.

Several scholars have identified social media itself as a cause of polarization, citing "echo chambers" as a cause^10,11,12. Echo chambers are situations in which users have their beliefs reinforced due to repeated interactions with like-minded peers and insulation from others' points of view^13,14. The dynamics leading to echo chambers on online social networks have been associated with selective exposure¹⁵, biased assimilation¹⁶, and group polarization¹⁷; in particular, Garrett¹⁰ pointed to the pursuit of opinion reinforcement as a possible cause. Echo chambers have been empirically observed and characterized around several controversial topics, such as abortion or vaccines^13,18. Many have expressed concern that, as citizens become more polarized about political issues, they do not hear the arguments of the opposite side, but are rather surrounded by people and news sources who express only opinions they agree with (e.g., Mark Zuckerberg¹⁹). However, the very existence of such echo chambers has been recently questioned^20,21, as well as their relation with the news feed algorithm of different social media platforms¹⁴. In particular, the effect of echo chambers in increasing political polarization has been put under scrutiny^21,22.

In this paper, we consider a highly polarized issue, the 2016 US presidential elections, and investigate the role of echo chambers on social media in exacerbating the debate. We focus on Reddit as a platform where to study political interactions between groups with opposite views. Reddit is a social news aggregation website; in 2016, it was the seventh most visited website in the US, with more than 200 million visitors. It was a key platform for the success of Donald Trump's political campaign²³. Given this context, we set out to characterize the interaction patterns between opposing political communities on Reddit, by considering supporters of the two main presidential candidates, Clinton and Trump. Then, we look at the way they interact in a common arena of political discussion (i.e., the most popular subreddit related to politics), by reconstructing the information flow between users, as determined by their comments and replies. Are echo chambers responsible for the increased polarization on Reddit during the 2016 electoral cycle²⁴?

Our empirical investigation shows that there is no evidence of echo chambers in this case. On the contrary, cross-cutting political interactions between the two communities are more frequent than expected. This heterophily is not symmetric with respect to the two groups: Clinton supporters are particularly eager to answer comments by Trump supporters, an asymmetry that is not explained by other confounders. Finally, we ask how these findings are modulated by socio-demographics, environmental characteristics of the Reddit users involved in the discussions, determined by geo-localization of such users. Our results point to a preference for geographical homophily in online interactions: users are more likely to interact with other users from their own state. We find a statistically significant (albeit small) positive correlation between cross-interaction and voter abstention, which may support the hypothesis that exposure to cross-cutting political opinions is associated with diminished political participation^25,26,27. We obtain a similar effect, although in the negative direction, for living in a swing state: cross-interactions are suppressed in this case.

These results have important implications in terms of our understanding of public opinion formation. It is often assumed that echo chambers can be pierced by increasing the amount of cross-cutting content and interactions between the polarized sides^25,28. Instead, the present study shows that polarization around a highly controversial issue, such as the 2016 U.S. Presidential elections²⁹, can co-exist with a large presence of cross-cutting interactions. The nature of these interactions might even increase polarization via a "backfire effect"³⁰, as recently empirically found for Twitter²². Alternatively, cross-cutting interactions and polarization might be the result of growing underlying socio-economic divisions^31,32. Overall, our findings call for a better understanding of the social dynamics at play in this context before suggesting technical solutions for such social systems, which could have unintended consequences.

Political interactions on Reddit

We gather data from Reddit, through the Pushshift collection³³. Reddit is organized in communities, called subreddits, that share a common topic and a specific set of rules. Users subscribe to subreddits, which contribute to the news feed of the user (their home) with new posts. Inside each subreddit, a user can post, or comment on other posts and comments. Thus, the overall discussion under each post evolves as a tree structure, growing over time. In addition, users can also upvote posts and comments to show approval; they manifest disapproval with a downvote. Each message is therefore associated with a score, which is the number of upvotes minus the number of downvotes it has received.

Given the two-party nature of the US political system and the polarized state of its political discourse, we approach the problem by modeling the interactions between groups of users labeled by their political leaning, specifically, according to which candidate they support in the 2016 presidential elections. We then model such interactions as a weighted, directed network, where nodes represent users and links represent comments between them. On top of political leaning, we also characterize the users in terms of their activity, i.e., their propensity to engage in interactions with other peers, and popularity, as given by the score assigned to their comments. In the rest of this section we explain these three steps in more detail.

Political leaning of Reddit users

We identify the political leaning of Reddit users by looking at their posting behavior. With respect to the 2016 US presidential elections, users can be characterized as supporting the Democratic candidate, Hillary Clinton, or the Republican candidate, Donald Trump. On Reddit, we identify specific subreddits dedicated to supporting the main presidential candidates. For Donald Trump, we select the subreddit r/The_Donald; for Hillary Clinton, we choose the subreddits r/hillaryclinton and r/HillaryForAmerica.

The subreddit r/The_Donald was created in June 2015, at the beginning of Donald Trump's campaign for the Republican party nomination. It has been one of the largest online communities of Trump supporters, with 269,904 users in November 2016. Participation in this subreddit is a valid proxy to study Donald Trump support, as the rules of this subreddit explicitly state that the community is for "Trump Supporters Only", and that dissenting users will be removed. As such, it has been previously used in the literature to analyze the behavior of Trump supporters^34,35. r/hillaryclinton and r/HillaryForAmerica are the main communities that supported Hillary Clinton's campaign in 2016. The former was created in 2015, while the latter was created in 2016 specifically to support her presidential bid. In November 2016, they were able to attract 35,142 and 3025 Reddit users, respectively. Since the stated goal of these communities is to support her presidential campaign, and they forbid the use of the subreddit to campaign for other candidates, we consider active participation in these communities as a good proxy for support for the Democratic party candidate. We call these subreddits the home communities for each candidate. We identify 117,011 users who actively posted on r/The_Donald in 2016, and 13,821 on r/hillaryclinton and r/HillaryForAmerica. Given the massive use of Reddit as a political tool by Trump's campaign²³, the difference in size between the two communities is not surprising.

Although these subreddits are dedicated to supporters of the candidate, we find that 3702 users post in both subreddits (2.9%). In order to disambiguate the leaning for these users, we retrieve the Reddit score of their comments in the home communities. The score represents the difference between the number of upvotes and the number of downvotes assigned by other users visiting the same subreddit. Upvotes are generally understood to encode approval, appreciation, or agreement; downvotes encode their opposites. Thus, a user with a higher score on Clinton and a lower score on Trump is most likely a Democratic supporter. Following this reasoning, among all users who posted on both home communities, we consider a user a Clinton supporter if the average score of their comments on the Clinton home community is larger than the average score on the Trump home community, and vice versa. Users with tied scores are discarded, as they represent just 5% of the users who posted on both home communities (0.145% of the overall set of users).

Therefore, we define the political leaning of a user u as a binary label $L_{u}$, assigned as Clinton supporter ($L_{u} =C$) if they post only on Clinton's home community, or if they post on both communities and have a larger average score on Clinton's community, and as Trump supporter otherwise ($L_{u} =T$). Our method identifies 10,240 users as Clinton supporters and 110,806 users as Trump supporters.
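The labeling rule can be illustrated with a minimal sketch. The function name and the list-based input layout are hypothetical choices made for illustration, not the authors' code; tied users are returned as None, reflecting the discarding step described above.

```python
from statistics import mean


def political_leaning(clinton_scores, trump_scores):
    """Return "C", "T", or None for a user who posted in a home community.

    clinton_scores / trump_scores: scores of the user's comments in the
    Clinton / Trump home communities (hypothetical input layout).
    """
    if not clinton_scores and not trump_scores:
        raise ValueError("user never posted in a home community")
    if not trump_scores:        # posted only on the Clinton side
        return "C"
    if not clinton_scores:      # posted only on the Trump side
        return "T"
    avg_c, avg_t = mean(clinton_scores), mean(trump_scores)
    if avg_c == avg_t:          # tied average scores: user is discarded
        return None
    return "C" if avg_c > avg_t else "T"
```

For instance, a user with comment scores [4, 6] on the Clinton side and [1, -2] on the Trump side would be labeled "C" under this rule.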

Network of interactions on Politics

To study the interactions between the two sides, we need a community that is visited regularly by both groups, but which is still topically related to politics and popular enough. The best candidate for such a role is r/politics, since it is the largest political subreddit. We collect all submissions and comments in the year 2016. From the collected comments, we reconstruct the network of political interactions among the users we previously identified. Among these users, 31,218 authored a message on r/politics in 2016 and thus appear as nodes V in the graph ($N_T={27{,}012}$ Trump supporters, $N_C={4206}$ Clinton supporters). Nodes correspond to users with known political leaning, while a weighted, directed link (u,v) corresponds to user u posting a comment as a response to user v. The weight $w_{uv}$ corresponds to the number of such interactions from u to v. Note that the link direction represents the interaction, and is opposite to the information flow (user u should have read what v wrote in order to respond, but it is not guaranteed that v will read u's answer).
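As a rough sketch of this construction, the weighted, directed network can be stored as a mapping from (author, target) pairs to reply counts. The function name and the assumption that replies are already available as (author, target) pairs are illustrative, not taken from the study.

```python
from collections import Counter


def build_interaction_network(replies, leaning):
    """Count replies between labeled users.

    replies: iterable of (author, target) pairs, one per comment in which
             `author` responds to `target` (hypothetical input layout).
    leaning: dict mapping user -> "C" or "T" for users with a known label.
    Returns a Counter mapping (u, v) -> w_uv.
    """
    weights = Counter()
    for author, target in replies:
        # Keep only interactions where both endpoints have a known leaning.
        if author in leaning and target in leaning:
            weights[(author, target)] += 1
    return weights


# Toy example with made-up users.
replies = [("a", "b"), ("a", "b"), ("b", "a"), ("a", "z")]
leaning = {"a": "C", "b": "T"}
network = build_interaction_network(replies, leaning)
```

Here the reply to the unlabeled user "z" is dropped, and the link from "a" to "b" gets weight 2.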

Table 1 Main properties of the Politics network: number of users N, divided in Trump/Clinton supporters $N_T$ / $N_C$, number of links E, average degree $\langle k \rangle$, reciprocity $\rho$ (fraction of bidirectional links over the total), and total number of interactions W.

In the Politics network, the probability to observe a node labelled as $X \in \{C,T\}$ (henceforth, X node for brevity) in the network is $P(X) = N_X/N$, corresponding to $P(T) \simeq 0.87$ for Trump, $P(C) \simeq 0.13$ for Clinton. The main properties of the Politics network are reported in Table 1. The joint probability to observe an interaction from an X node to a Y node reads

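$$\begin{aligned} P(X \rightarrow Y) = \frac{W_{XY}}{W}, \qquad \begin{pmatrix} P(C \rightarrow C) & P(C \rightarrow T) \\ P(T \rightarrow C) & P(T \rightarrow T) \end{pmatrix} = \frac{1}{W} \begin{pmatrix} W_{CC} & W_{CT} \\ W_{TC} & W_{TT} \end{pmatrix}, \end{aligned} \tag{1}$$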

where the rows of the matrix indicate the leaning of the author of a comment, and the columns that of the target, $W = {716{,}765}$ is the total weight of the links in the network (that is, the number of interactions between all considered nodes), and $W_{XY}$ is the weight of directed links from X nodes to Y nodes:

$$\begin{aligned} W_{XY} = \sum \limits _{u,v \in V \mid L_{u} =X \wedge L_{v} =Y} w_{uv}. \end{aligned}$$

We denote with $W_{\rightarrow X} = \sum _Y W_{YX}$ the number of interactions received by X nodes ($\sum _Y$ denotes the sum over all possible label assignments to Y), and with $W_{X \rightarrow }$ the ones originated by X nodes. It follows that $\sum _{XY} W_{YX} = \sum _{X}W_{X \rightarrow } = \sum _{X}W_{\rightarrow X} = W$.

Diagonal elements of the matrix in Eq. (1) correspond to the interactions within political groups, off-diagonal elements to those across groups. The sum by rows (columns) of the matrix in Eq. (1) corresponds to the probability that an X node initiates (receives) an interaction, $P(X \rightarrow ) = \frac{W_{X \rightarrow }}{W}$ ($P(\rightarrow X ) = \frac{W_{\rightarrow X}}{W}$). From Eq. (1), interactions across communities, or cross-interactions, look symmetric between Clinton and Trump communities. However, joint probabilities do not take into account the difference in size between the two groups. This effect stems from the fact that the probability that Clinton nodes initiate an interaction, $P(C \rightarrow ) = W_{C \rightarrow }/W \simeq 0.35$, is much larger than the fraction of Clinton supporters in the network, $N_C/N \simeq 0.13$, which implies that Clinton supporters have a much larger weighted out-degree than Trump ones.
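The quantities $W_{XY}$, $P(X \rightarrow Y)$, $P(X \rightarrow )$ and $P(\rightarrow X)$ can be computed with a few lines of code. The sketch below assumes the network is stored as a dictionary mapping (u, v) to $w_{uv}$; function names and the toy data are hypothetical and do not reproduce the values of the study.

```python
from collections import defaultdict


def group_weights(weights, leaning):
    """W_XY: total weight of links from X nodes to Y nodes."""
    w_xy = defaultdict(int)
    for (u, v), w in weights.items():
        w_xy[(leaning[u], leaning[v])] += w
    return dict(w_xy)


def joint_and_marginals(w_xy):
    """Return P(X -> Y), P(X ->) and P(-> Y) from the group weights."""
    total = sum(w_xy.values())
    joint = {pair: w / total for pair, w in w_xy.items()}
    p_out, p_in = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_out[x] += p   # row sum: probability that an X node initiates
        p_in[y] += p    # column sum: probability that a Y node receives
    return joint, dict(p_out), dict(p_in)


# Toy example (made-up users and weights, not data from the study).
weights = {("a", "b"): 2, ("b", "a"): 1, ("b", "c"): 1}
leaning = {"a": "C", "b": "T", "c": "T"}
joint, p_out, p_in = joint_and_marginals(group_weights(weights, leaning))
```

In this toy network the single Clinton node initiates half of all interactions although it is one of three nodes, which mirrors, at a much smaller scale, the gap between $P(C \rightarrow )$ and $N_C/N$ discussed above.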

These characteristics can be further inspected by considering the conditional probability to observe an interaction from an X node to a Y node, given that the first node has leaning X,

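$$\begin{aligned} P(X \rightarrow Y \vert X) = \frac{P(X \rightarrow Y)}{P(X \rightarrow )} = \frac{W_{XY}}{W_{X \rightarrow }}, \qquad \begin{pmatrix} P(C \rightarrow C \vert C) & P(C \rightarrow T \vert C) \\ P(T \rightarrow C \vert T) & P(T \rightarrow T \vert T) \end{pmatrix} \simeq \begin{pmatrix} 0.28 & 0.72 \\ 0.38 & 0.62 \end{pmatrix}. \end{aligned} \tag{2}$$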

By looking at the columns of Eq. (2), in absence of homophilic or heterophilic effects, one would expect the elements of each column to be equal: given the author of a comment, the probability to interact with the two groups would be equal, given only by the size of the group. Instead, we can observe that Clinton supporters tend to interact more with Trump supporters (72% of interactions) than Trump supporters themselves within the community (62%). The same effect is visible for Trump supporters, who are more likely to interact with Clinton ones (38% of interactions) than the Clinton community within itself (28% of interactions). These intuitions will be solidified in Section 3, by comparing these values to a null model of random social interactions.
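The conditional probabilities amount to normalizing each row of the group-weight matrix by the out-going weight of that group. A minimal sketch follows; the function name is hypothetical, and the counts are made up so that they reproduce the percentages quoted above rather than the actual $W_{XY}$ of the study.

```python
def conditional_probabilities(w_xy, labels=("C", "T")):
    """P(X -> Y | X) = W_XY / W_{X ->}, for every pair of labels."""
    cond = {}
    for x in labels:
        row_total = sum(w_xy.get((x, y), 0) for y in labels)  # W_{X ->}
        for y in labels:
            cond[(x, y)] = w_xy.get((x, y), 0) / row_total if row_total else 0.0
    return cond


# Illustrative counts only, chosen to match the reported percentages.
w_xy = {("C", "C"): 28, ("C", "T"): 72, ("T", "C"): 38, ("T", "T"): 62}
cond = conditional_probabilities(w_xy)
```

Comparing entries within a column, e.g. cond[("C", "T")] against cond[("T", "T")], gives the heterophily reading described in the text.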

Finally, we compare the average sentiment polarity of each type of interaction. To do so, first we measure the sentiment polarity (ranging from $-1$ to 1) of the textual content of each interaction according to VADER³⁶; then, we compute the average values according to the possible pairs of labels. In this way, we obtain:

(3)

First, we observe that interactions among Trump supporters are more negative than interactions among Clinton supporters (average sentiment of 0.0126 vs 0.0575). In addition, cross-cutting interactions between groups have on average a more negative sentiment than interactions within groups. That is, Clinton supporters commenting on Trump supporters have an average sentiment of 0.0110, while when commenting on other Clinton supporters the average sentiment is 0.0575. The same is true for Trump supporters. This difference is consistent with the hypothesis that cross-cutting interactions are a potential expression of conflict.

Reddit score and activity of users

Political interactions on Reddit can be further characterized in terms of the score assigned to each comment or submission, and the activity of users, i.e., their propensity to engage in interactions with other peers.

In network terms, the activity of a user u, $a_u$, can be measured by the total weight of out-going links from node u, which corresponds to the out-strength of node u: $a_u =\sum _v w_{uv}$. Figure 1a shows the activity distribution P(a) in the Politics network, plotted separately for Clinton and Trump supporters, both with typical heavy-tailed behavior. The activity distribution of Trump supporters decays more quickly than that of Clinton supporters, thus indicating a propensity of Clinton supporters to engage in a larger number of interactions.

The Reddit score of a comment is a measure of its popularity and, as such, it strongly depends on the subreddit where this comment is posted: popular comments posted on the subreddit r/The_Donald will likely be unpopular in subreddits where opposite political views dominate, such as Clinton-oriented subreddits. We define the popularity of a user u on a subreddit as the average score of their comments on that subreddit, $s_u$, and it will thus depend on the subreddit under consideration. Figure 1b shows the popularity distribution P(s) of users in the Politics network, separately for Clinton and Trump supporters. While the functional form of the P(s) distribution is similar for Clinton and Trump supporters, comments by Clinton supporters have much larger scores on average, while the scores of Trump supporters span a larger interval of values. This observation implies that the overall attitude on the politics subreddit is more favorable to comments from Clinton than from Trump supporters, although users classified as Trump supporters are a much larger set than Clinton supporters.
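Both user-level quantities are simple aggregates. The sketch below computes the activity $a_u$ as the out-strength of each node and the popularity $s_u$ as the average comment score; the function names and the dictionary-based inputs are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def activity(weights):
    """a_u = sum_v w_uv, the out-strength of each node u."""
    a = defaultdict(int)
    for (u, _v), w in weights.items():
        a[u] += w
    return dict(a)


def popularity(comment_scores):
    """s_u = average score of the comments of u on the subreddit.

    comment_scores: dict mapping user -> list of comment scores
    (hypothetical input layout); users without comments are skipped.
    """
    return {u: mean(scores) for u, scores in comment_scores.items() if scores}


# Toy example with made-up users and scores.
weights = {("a", "b"): 2, ("a", "c"): 1, ("b", "a"): 4}
scores = {"a": [10, -2], "b": [1, 2, 3], "c": []}
a_u = activity(weights)
s_u = popularity(scores)
```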

This liberal bias in the general opinion of r/politics, however, does not seem to discourage Trump supporters from commenting in large numbers. Therefore, since we wish to study the two communities and how they interact, r/politics is the best arena to observe such interactions. Our set of users of interest is not representative of r/politics users. However, we are not interested in studying the typical behavior of users in this subreddit, but in analyzing how these two polarized communities interact in this arena. The fact that the two communities are not representative of the politics subreddit is therefore of no consequence.

Comparison with a null model of random interactions

To understand whether the empirical patterns observed in the previous section represent a consistent behavior, we need to compare them with a theoretical null model of interactions. The simplest null model for our data follows the hypothesis that the interactions are unaffected by the political leaning of users. In mathematical terms, the null model is a directed, weighted, random network (RN). This network is obtained by reshuffling links of the original network while preserving the in- and out-strength (weighted degree) of each node.
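One simple way to obtain such a random network is stub matching: every unit of weight is split into an out-going stub and an in-coming stub, and the in-coming stubs are shuffled. The sketch below is an illustration under that assumption, not necessarily the rewiring procedure used in the study; in particular, it does not exclude self-loops.

```python
import random
from collections import Counter


def rewire(weights, seed=None):
    """Return a random network with the same in- and out-strength sequence.

    weights: dict mapping (u, v) -> w_uv.
    Simplification: self-loops may appear in the rewired network.
    """
    rng = random.Random(seed)
    sources, targets = [], []
    for (u, v), w in weights.items():
        sources.extend([u] * w)   # one out-going stub per interaction
        targets.extend([v] * w)   # one in-coming stub per interaction
    rng.shuffle(targets)          # random matching of out- and in-stubs
    return Counter(zip(sources, targets))


# Toy example with made-up users.
weights = {("a", "b"): 2, ("b", "c"): 1, ("c", "a"): 3}
random_network = rewire(weights, seed=1)
```

Because only the pairing of stubs changes, each node keeps exactly the number of interactions it initiated and received in the original network.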

In this network, the probability to observe a link from an X node to a Y node is the product of two independent probabilities: the probability that an X node initiates an interaction, and the probability that a Y node receives an interaction,

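$$\begin{aligned} P_{RN}(X \rightarrow Y) = P(X \rightarrow )\, P(\rightarrow Y) = \frac{W_{X \rightarrow }\, W_{\rightarrow Y}}{W^2}. \end{aligned} \tag{4}$$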

The RN model preserves both the in- and out-strength sequence of nodes, while rewiring connections among them, thus following the so-called configuration model³⁷. In this RN model, the conditional probability to observe a link from an X node to a Y node, given that the first node has leaning X, reads

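$$\begin{aligned} P_{RN}(X \rightarrow Y \vert X) = \frac{P_{RN}(X \rightarrow Y)}{P(X \rightarrow )} = P(\rightarrow Y) = \frac{W_{\rightarrow Y}}{W}. \end{aligned} \tag{5}$$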

In the following, we investigate deviations of observed data from this RN model, so as to highlight specific patterns of behavior.

Difference of conditional and joint probabilities

The difference between the empirical and random joint probabilities, given by Eqs. (1) and (4), respectively, is shown in Fig. 2a. Cross-interactions between opposite political groups in the Politics network happen more often than expected in a RN model, with an odds ratio of 1.195. This observation implies that there is a certain degree of heterophily in interactions, i.e., the preference to interact with users from the opposite political group. This result is surprising, considering the ample literature about homophily in social networks and particularly about echo chambers in political discussion on social media^4,13. The difference between empirical and random conditional probabilities, given by Eqs. (2) and (5), respectively, is reported in Fig. 2b. The Politics network is characterized by an asymmetry between the two political groups: Clinton supporters interact with Trump supporters more than the other way around, with a 6.2% increase with respect to a random null model on one side and 3.3% on the other.
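The comparison between empirical and random joint probabilities can be sketched as follows, using the fact that the null-model probability is the product of the out-going and in-coming marginals. Function names and the toy counts are hypothetical and are not the values behind Fig. 2.

```python
def null_joint(w_xy, labels=("C", "T")):
    """P_RN(X -> Y) = W_{X ->} * W_{-> Y} / W^2."""
    total = sum(w_xy.values())
    w_out = {x: sum(w_xy.get((x, y), 0) for y in labels) for x in labels}
    w_in = {y: sum(w_xy.get((x, y), 0) for x in labels) for y in labels}
    return {(x, y): w_out[x] * w_in[y] / total**2
            for x in labels for y in labels}


def joint_deviation(w_xy, labels=("C", "T")):
    """Empirical minus null-model joint probability for each label pair."""
    total = sum(w_xy.values())
    rn = null_joint(w_xy, labels)
    return {pair: w_xy.get(pair, 0) / total - p for pair, p in rn.items()}


# Toy counts (made up): cross-group pairs are over-represented.
w_xy = {("C", "C"): 10, ("C", "T"): 30, ("T", "C"): 20, ("T", "T"): 40}
dev = joint_deviation(w_xy)
```

Positive entries mark pairs of groups that interact more than expected at random; since both the empirical and the null probabilities sum to one, the deviations sum to zero.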

Given that Trump and Clinton supporters are different according to several metrics, there may be some confounding effects in the interactions. In particular, we explore the roles of activity (the strength of the node) and popularity (the average score of the node). To give a visualization similar to the ones in Fig. 2, we define activity and score classes from the distributions shown in Fig. 1. We manually define four score classes, from low to high score, and eight activity classes, from low to high activity. In the following we indicate for brevity a user of score class s as an s-user, and a user of activity class a as an a-user.

Next, we define the empirical probability to observe a link from an a-node to an $a'$-node, $P(a\rightarrow a')$, by applying Eq. (1) to activity classes, i.e., $P(a\rightarrow a')= W_{a,a'}/W$, where $W_{a,a'}$ is the number of interactions from a-nodes to $a'$-nodes. The same can be done for the conditional probability, by applying Eq. (2) to activity classes. For random interactions represented by the null model, one can obtain the joint probability $P_{RN}(a\rightarrow a')$ and conditional probability $P_{RN}(a \rightarrow a' \vert a)$. Figure 3 (left) shows the difference between empirical and random conditional probabilities for interactions with respect to activity classes. Positive values indicate that a pair of classes interacts more than expected by random chance; negative values indicate that they interact less than expected. Thus, we observe that users show an assortative behavior with respect to activity, i.e., users with high activity tend to interact with similarly active users, and the same holds for users with low activity.

We can also define the probability to observe a link from an s-node to an $s'$-node, $P(s\rightarrow s')$, by applying Eq. (1) to score classes. The same can be done for the conditional probability and for the null model. Figure 3 (right) shows the difference between empirical and random conditional probabilities for interactions with respect to score classes. Users show a disassortative, slightly asymmetric behavior with respect to scores: users with low score tend to interact with users with high score, while the opposite is slightly less frequent. This is due to popular comments attracting many comments from other users, generally of low popularity.

Given these characteristics of the network, it is important to understand how the differences in behavior with respect to activity and score affect the heterophily and asymmetry results found for the leaning. To do so, we need a unified model that puts all these ingredients together, and compares the resulting model against the random network null model. The next section explains our approach to tackle this task.

Logit regression model

So far, we have recognized the effects of different groups and user characteristics in the commenting behavior. We now wish to assess whether the effects we have identified so far are statistically significant, and what is the relationship among them. In particular, we want to see how the variables of interest for our study (community interactions) are confounded by the popularity variables. There is no need to control for activity variables, as our null model already takes that aspect into account, as explained next. To do so, we design a logit model, in order to quantify the odds of within-group and cross-group interaction, and the role of each variable. Such a model also allows us to study the effect of geographical-based variables in the next section.

Our logit model defines the probability of u interacting with v as a function of the features of u and v. We consider three sets of features: community interaction features (the political labels we defined), confounding Reddit features (popularity metrics), and environmental features which capture real-world phenomena.

For community features, similarly to the previous section, we consider the following set of binary features:

Clinton support: 1 if $L_u=C$ (u is a Clinton supporter), 0 otherwise.
Cross-group: 1 if u and v support different candidates ($L_u \ne L_v$), 0 otherwise.
Clinton support, Cross-group: interaction feature between the previous variables, 1 if $L_u=C$ and $L_v=T$, 0 otherwise.

The combinations of these variables, that we assume to be independent, represent all four possible scenarios of political labels (depicted in Fig. 2a).
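The encoding of the three community features for a pair (u, v) can be sketched as below. The function and key names are hypothetical; only the definitions of the three binary variables come from the text.

```python
def community_features(l_u, l_v):
    """Binary community features for an interaction from u (author) to v (target).

    l_u, l_v: political labels of author and target, "C" or "T".
    """
    clinton_support = int(l_u == "C")
    cross_group = int(l_u != l_v)
    return {
        "clinton_support": clinton_support,
        "cross_group": cross_group,
        # Interaction term: 1 only for a Clinton author and a Trump target.
        "clinton_support_x_cross_group": clinton_support * cross_group,
    }


# The four possible author/target scenarios.
scenarios = {(x, y): community_features(x, y) for x in "CT" for y in "CT"}
```

A Trump author replying to a Trump target has all three features equal to 0 and therefore acts as the baseline scenario of the regression.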

We then add several variables to control for potential effects of different levels of popularity between supporters of Clinton and Trump. We operationalize these confounding features as follows:

Average score: average score obtained, separately, by v (target) and by u (author) ($s_v$ and $s_u$).
Difference in average score: the absolute difference between the average scores of v and u, namely $\vert s_v - s_u \vert$.
Fraction of positives: the fraction of comments with a positive score over the total number of comments, separately for v (target) and for u (author).
Difference in fraction of positives: the absolute difference between the fractions of u and v.

We quantile-normalize these features so that they all display the same distribution, thus making it easy to interpret the coefficients of our logit regression model.
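As an illustration of quantile normalization, the sketch below maps each value of a feature to its empirical quantile, so that every feature ends up approximately uniform on (0, 1]. The choice of a uniform target distribution and the average-rank treatment of ties are assumptions made here; the text does not specify which reference distribution was used.

```python
def quantile_normalize(values):
    """Replace each value by its empirical quantile (average rank / n)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    quantiles = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the block
        for k in range(i, j + 1):
            quantiles[order[k]] = avg_rank / n
        i = j + 1
    return quantiles


# Toy heavy-tailed feature: the outlier 1000 no longer dominates the scale.
normalized = quantile_normalize([10, 1000, 20, 20])
```

After this transformation only the ordering of the values matters, which puts heavy-tailed features such as average scores on the same scale as bounded ones such as fractions.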

We choose to use both the average score and the fraction of comments with positive score because the scores have a heavy-tailed distribution, therefore the average might be skewed. Conversely, the fraction of positives represents a summary statistic on a boolean property, and thus captures a different aspect of the data. We also add the differences to include link-based features that capture the dynamics of the interaction between the specific author and target.

To model the effects of these variables on comment creation, we use a logistic regression model. In this data set, we have $N={31{,}218}$ users and $W={716{,}765}$ comments. It is thus infeasible to use the complete set of negative links (i.e., pairs of users (u,v) where u did not interact with v). We therefore resort to sampling the set of negatives. This sampling procedure only changes the value of the intercept coefficient, which is not of interest, and does not affect the estimation of the important parts of the model (the coefficients of the community variables under study).

It is important to carefully choose the sampling strategy, as it needs to faithfully represent the null model we are considering. In the null model we presented in Sect. 3, we consider the author and the target of the comments to be fixed, that is, we rewire the network while preserving the in- and out-strength of each node, equivalent to a configuration model³⁷. In the sampling strategy of negative links, we follow exactly the same procedure. We choose node u with probability proportional to their out-strength or activity $a_u$, and node v with probability proportional to their in-strength, defined in Sect. 2.3. If the link between u and v exists, we discard it. This way, the negative sample reflects exactly the null model presented in Sect. 3: the probability of considering a pair of nodes is just the product of two independent probabilities, namely the probability that a node u initiates an interaction and the probability that a node v receives it. The role of logistic regression is thus to capture how the variables we consider alter the chances of observing a link $u \rightarrow v$.
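The negative-sampling procedure can be sketched as follows. The function name, the `max_tries` guard, and the decision to also skip self-pairs (u = v) are assumptions added for this illustration; the strength-proportional choice of u and v and the discarding of existing links follow the description above.

```python
import random


def sample_negatives(weights, n_samples, seed=None, max_tries=None):
    """Sample non-existing links (u, v) following the null model.

    weights: dict mapping (u, v) -> w_uv for the observed network.
    u is drawn proportionally to out-strength, v proportionally to in-strength.
    """
    rng = random.Random(seed)
    out_strength, in_strength = {}, {}
    for (u, v), w in weights.items():
        out_strength[u] = out_strength.get(u, 0) + w
        in_strength[v] = in_strength.get(v, 0) + w
    authors, author_w = zip(*out_strength.items())
    targets, target_w = zip(*in_strength.items())

    limit = max_tries if max_tries is not None else 100 * n_samples
    negatives, tries = [], 0
    while len(negatives) < n_samples and tries < limit:
        tries += 1
        u = rng.choices(authors, weights=author_w)[0]
        v = rng.choices(targets, weights=target_w)[0]
        if (u, v) in weights:   # existing link: discard
            continue
        if u == v:              # self-pair: skipped (assumption of this sketch)
            continue
        negatives.append((u, v))
    return negatives


# Toy example with made-up users.
weights = {("a", "b"): 3, ("b", "c"): 1, ("c", "a"): 2}
negatives = sample_negatives(weights, n_samples=20, seed=0)
```

The sampled pairs, labeled 0, would then be stacked with the observed interactions, labeled 1, to fit the logistic regression.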

Logit regression results

First, we present results for the model that only considers the community variables. We report the odds ratios obtained for this model in the first column of Table 2. All the coefficients are statistically significant at the 0.1% level. These results confirm our analysis so far:

(i) comments on r/politics are heterophilic: the likelihood of u answering to v increases when u and v support different candidates (odds ratio 1.195);
(ii) Clinton supporters are less likely to leave a comment than Trump's (odds ratio 0.942), in general;
(iii) however, Clinton supporters are asymmetrically slightly more likely to leave a comment when the user they are responding to supports the other candidate (odds ratio 1.064).

Table 2 Odds ratios obtained by logistic regression.

We can also compute the interaction matrix obtained from this model. To do so, we multiply the odds ratios obtained by the model for each of the four possible combinations of groups between author and target. In this way, we obtain

(6)

The results obtained via this methodology are in line with those presented in the previous section (i.e., Fig. 2).
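The multiplication of odds ratios can be made explicit with a short sketch. It assumes that a Trump author replying to a Trump target is the baseline scenario (all community features equal to 0, odds ratio 1), which follows from the feature definitions; the function name is hypothetical, and the inputs are the three odds ratios reported for the community-only model.

```python
def interaction_matrix(or_clinton, or_cross, or_clinton_cross):
    """Combined odds ratio for each (author, target) pair of groups.

    Each entry is the product of the odds ratios of the features that are
    active for that pair; the T -> T scenario is the baseline.
    """
    return {
        ("T", "T"): 1.0,
        ("T", "C"): or_cross,
        ("C", "C"): or_clinton,
        ("C", "T"): or_clinton * or_cross * or_clinton_cross,
    }


# Odds ratios of the community-only model reported in the text.
matrix = interaction_matrix(or_clinton=0.942, or_cross=1.195,
                            or_clinton_cross=1.064)
```

Both off-diagonal entries exceed the corresponding within-group entries, which is the heterophily pattern discussed above.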

Then, we control for the other Reddit variables we analyzed, in order to assess whether these effects are robust or if they can be explained by considering these other features. We report results for models that include the average score, the fraction of positively scored comments, and both, in the other columns of Table 2. Including these variables affects neither the odds ratios nor the statistical significance of the community features. This result confirms that the effects we observe for political interactions across communities are not confounded by these other user characteristics.

These control variables, in addition, show that users with a higher average score are more likely both to initiate and receive interactions. This relationship is heterophilic: a large difference in average score is associated with an increased likelihood. We can interpret average score as a proxy measure for visibility: authors able to attract a large number of upvotes are also more prolific, and they also attract more comments, which explains the larger-than-one ratios. They also tend to trigger a response even from unpopular authors, thus explaining the heterophily. Again, the results obtained with the logit regression model are in agreement with what is observed by comparing empirical interactions with a random network (i.e., Fig. 3).

Authors with a larger fraction of positively scored comments are also more likely to send and receive comments. In fact, since score is a measure of the social feedback from the community, this shows that authors more aligned with the community tend to be more active in it, which is not surprising. With this variable, however, the relationship is homophilic: positively scored users are more likely to comment on one another, and so are negatively scored users. This result might be an effect of the community trying not to "feed the trolls"³⁸.

Sociodemographic implications

In this section, we investigate the connections between the online interactions present in our data and offline socio-demographic factors. In particular, our research question is the following: which environmental factors are associated with higher levels of online cross-group interactions? While we cannot show any causal effect of the environmental factors, such an observational study can provide insights for theory generation and follow-up investigation.

To answer our research question, we need a proxy of the socio-demographic environment of users. We choose US states as a proxy, as this is the finest spatial granularity we can reliably infer for Reddit users. We infer the state of each user according to the information gathered by Balsamo et al.³⁹, which is based on the usage of local Reddit communities. Out of our set of 121,046 Reddit users, we are able to geo-localize 37% of them at the state level. Henceforth, we restrict our analysis only to comments authored by users in this set. State information is slightly unbalanced (36% for Trump's supporters, 43% for Clinton's). The number of users we obtain for each state closely resembles their population (Spearman R 0.97).
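As a reminder of what the Spearman coefficient measures, the sketch below computes it as the Pearson correlation between ranks. The per-state counts are made up for illustration, and the helper assumes there are no tied values.

```python
import numpy as np

def ranks(values):
    """Rank positions starting at 0; assumes no ties for simplicity."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(len(values))
    return result

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

# Made-up numbers for five hypothetical states (not the paper's data).
users_per_state = [120, 45, 300, 80, 10]
population = [5.1e6, 2.9e6, 19.5e6, 2.0e6, 0.6e6]

rho = spearman(users_per_state, population)
print(f"Spearman correlation: {rho:.2f}")
```

Because only the ordering of the states matters, the coefficient is insensitive to the large difference in scale between user counts and population counts.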

Table 3 Odds ratios obtained by logistic regression.

First, we ask whether there could be homophilic behavior due to geographic proximity, explained by common interests (e.g., local issues), culture, and norms. To do so, we include in our regression model a dummy variable "same state" that indicates whether u and v come from the same US state. Then, we select a set of macroscopic attributes of each state that we hypothesize might be related to users' online behavior in a political community. These variables present a basic sketch of the environment of the authors, as represented by the state they live in. In particular, we focus on the following political, economic, and demographic variables:

Swing state: a dummy variable which indicates whether the author lives in a US state that had a 2016 presidential election margin of less than 4% for any candidate.
Clinton/Trump share: the shares of votes obtained by these two candidates in the 2016 elections in the state where the author lives.
Non-vote share: fraction of the population that did not vote for either of the two major candidates in that state.
Unemployment: unemployment rate in 2016 in the state (source: US Bureau of Labor Statistics).
Gini coefficient: income inequality in the state, as measured by the Gini coefficient in 2010 (data from the American Community Survey, conducted by the US Census Bureau).
Median income: median household income in 2016 (source: American Community Survey).
High school: fraction of the population with a high school degree or higher (source: 2013–2017 American Community Survey).
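The electoral variables in the list above can be derived from raw vote counts roughly as follows. This is a sketch under stated assumptions: the record layout, the use of total ballots cast as the denominator for vote shares, and the use of a population figure as the denominator for the non-vote share are all hypothetical choices, and the numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class StateVotes:
    """Hypothetical per-state record; field names are illustrative."""
    clinton_votes: int
    trump_votes: int
    total_votes: int  # all ballots cast, including third parties
    population: int   # assumed denominator for the non-vote share

def vote_shares(state):
    """Clinton and Trump shares of the ballots cast."""
    return (state.clinton_votes / state.total_votes,
            state.trump_votes / state.total_votes)

def is_swing_state(state, threshold=0.04):
    """Dummy variable: margin between the two candidates below 4%."""
    clinton, trump = vote_shares(state)
    return abs(clinton - trump) < threshold

def non_vote_share(state):
    """Fraction of the population that voted for neither major candidate."""
    return 1.0 - (state.clinton_votes + state.trump_votes) / state.population

close_race = StateVotes(clinton_votes=480, trump_votes=500, total_votes=1000, population=2000)
safe_state = StateVotes(clinton_votes=400, trump_votes=550, total_votes=1000, population=1500)

print(is_swing_state(close_race), is_swing_state(safe_state))
print(round(non_vote_share(close_race), 2))
```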

We normalize all numerical variables via quantile normalization, so that they display the same distribution. The data related to voting behavior refer to the election of November 2016, while our comments are in general gathered from the whole electoral year. This procedure is coherent with our hypothesis: is there any difference in behavior in the general population of a US state that could manifest itself also on social media, and that affected the electoral process?
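A minimal sketch of quantile normalization is given below, using one common definition: every column is mapped onto a shared reference distribution, taken here as the mean of the sorted columns. The text does not say which reference distribution was used, so this choice is an assumption; ties are not handled, and the input values are invented.

```python
import numpy as np

def quantile_normalize(X):
    """Give every column of X the same distribution while keeping within-column ranks.

    Assumes no tied values inside a column.
    """
    X = np.asarray(X, dtype=float)
    order = np.argsort(X, axis=0)               # row indices that sort each column
    ranks = np.argsort(order, axis=0)           # rank of each entry within its column
    reference = np.sort(X, axis=0).mean(axis=1) # shared target distribution
    return reference[ranks]

# Rows are hypothetical states, columns are hypothetical variables.
X = np.array([
    [5.0, 40.0, 3.0],
    [2.0, 10.0, 4.0],
    [3.0, 30.0, 6.0],
    [4.0, 20.0, 8.0],
])
Q = quantile_normalize(X)
print(Q)
```

After the transformation each column contains exactly the same set of values, so coefficients estimated for different variables are expressed on a comparable scale.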

We build a logistic regression model for each one of these variables separately. In each model, besides the studied variable, we also include the interaction feature between the selected environmental variable and the cross-group feature. This way, we capture whether the selected environmental variable has an effect on the likelihood of a user interacting with another user who supports a different presidential candidate. We also include in each model all the Reddit-related variables analyzed so far, since they all emerged as significant. We repeat this analysis for each variable, with and without including the same state feature in each of the other models, which may act as a large confounder for the other environmental variables. We report only results including this variable, but the two cases are quantitatively similar.
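The structure of one such model can be illustrated with a small, self-contained logistic regression fitted by Newton-Raphson. Everything here is an assumption made for illustration: the data are synthetic, the coefficients used to simulate them are arbitrary, and the Reddit-related and same-state features of the actual models are left out for brevity.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

rng = np.random.default_rng(0)
n = 20000

# Synthetic features (not the paper's data).
cross_group = rng.integers(0, 2, n).astype(float)  # 1 if author and target support different candidates
env = rng.normal(size=n)                           # a normalized state-level variable

# Design matrix: intercept, cross-group, environmental variable, and their interaction.
X = np.column_stack([np.ones(n), cross_group, env, cross_group * env])

# Arbitrary coefficients, used only to simulate whether an interaction occurs.
true_beta = np.array([-1.0, 0.3, 0.1, 0.2])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta = fit_logit(X, y)
odds_ratios = np.exp(beta)
for name, value in zip(["intercept", "cross_group", "env", "cross_group x env"], odds_ratios):
    print(f"{name:<18} odds ratio {value:.3f}")
```

The exponentiated coefficient of the interaction column is the quantity of interest: a value above one means that the environmental variable is associated with a higher likelihood of cross-group interactions specifically, beyond its association with interactions in general.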

Table 3 shows the odds ratios and the statistical significance obtained by these models. We consider significant for our hypothesis only models where both the analyzed variable and its interaction with the cross-group variable are significant. Note that the inclusion of environmental factors does not significantly alter the odds ratios and the significance obtained by the community variables, thus further testifying to the robustness of our main results on heterophily and asymmetry.

We summarize the findings obtained via the models in Table 3 as follows.

(1) There is in fact a significant ($p < 0.001$) homophily among users living in the same US state. This result suggests that geographical proximity affects the likelihood of political interactions on Reddit. Similar results are known for other communities (for instance, the Brexit leave campaign⁴⁰). A possible explanation is that geographical location subsumes other characteristics; for instance, close-by users could be more likely to share similar interests, and therefore to comment on the same, locally relevant topic. Furthermore, nation-wide political campaigns also involve local matters and candidates: discussing those issues might gather local users.
(2) We find a significant correlation with non-vote. In particular, states where individuals are most likely to abstain from voting for Trump or Clinton in the presidential elections are also those where cross-party interactions on Reddit are most likely ($p < 0.001$). Moreover, in those states, users seem less likely to leave a comment, although with less significance ($p < 0.05$). This finding is consistent with the idea, well discussed in the literature, that exposure to cross-cutting political views is associated with diminished political participation^25,26,27.
(3) We find swing states to be less likely to show cross-party interactions, but to foster more homophilic interactions ($p < 0.05$). This result is somewhat surprising, but is consistent with previous findings; e.g., that face-to-face interactions within families decrease when there is political disagreement, which is exacerbated by massive political ad campaigns in swing states⁴¹.
(4) Higher rates of unemployment seem to be associated with the same effect: users coming from states with higher unemployment are more likely to leave a comment.
(5) We observe a slight variation ($p < 0.05$) between Trump-leaning and Clinton-leaning states: cross-group interactions are less likely in states with higher shares of Trump votes.
(6) The other variables we test do not show a statistically significant correlation: the likelihood of interactions does not seem to be correlated with income inequality or education level.

Discussion

In this work, we analyzed cross-group interactions between supporters of Trump and those of Clinton on Reddit during the 2016 US presidential elections. To this aim, we reconstructed the interaction network among these users on the main political discussion community, r/politics. We observe that, despite the political polarization, these groups tend to interact more across than among themselves, that is, the network exhibits heterophily rather than homophily. This finding emerges by comparison with a null model of random social interactions, implemented both as a network rewiring that preserves the activity of users, and as a logistic regression model for link prediction which takes into account possible confounding factors.
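To make the rewiring null model concrete, the sketch below shows one possible way to implement it: the targets of all comments are shuffled, so that every author keeps the same number of comments written and every user keeps the same number of comments received. This is an assumed implementation, not necessarily the exact procedure used in the analysis; the user names, groups, and edges are invented, and self-replies produced by the shuffle are not filtered out.

```python
import random
from collections import Counter

def group_pair_counts(edges, group):
    """Count comments for each (author group, target group) pair."""
    return Counter((group[author], group[target]) for author, target in edges)

def rewired_expectation(edges, group, n_samples=200, seed=0):
    """Average pair counts over random rewirings that shuffle comment targets."""
    rng = random.Random(seed)
    authors = [author for author, _ in edges]
    targets = [target for _, target in edges]
    totals = Counter()
    for _ in range(n_samples):
        rng.shuffle(targets)
        totals.update(group_pair_counts(zip(authors, targets), group))
    return {pair: count / n_samples for pair, count in totals.items()}

def observed_over_expected(edges, group, n_samples=200, seed=0):
    """Ratio above 1: the pair interacts more than in the random null model."""
    observed = group_pair_counts(edges, group)
    expected = rewired_expectation(edges, group, n_samples, seed)
    return {pair: observed.get(pair, 0) / value for pair, value in expected.items()}

# Toy network: T = Trump supporter, C = Clinton supporter (hypothetical users).
group = {"a": "T", "b": "T", "c": "C", "d": "C"}
edges = [("a", "c"), ("c", "a"), ("b", "d"), ("d", "b"),
         ("a", "d"), ("c", "b"), ("a", "b"), ("c", "d")]

ratios = observed_over_expected(edges, group)
for pair, ratio in sorted(ratios.items()):
    print(pair, round(ratio, 2))
```

In this toy network the cross-group ratios are above one and the within-group ratios below one, which is the pattern the text calls heterophily.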

Overall, our findings show that Reddit has been a tool for political discussion between opposing points of view during the 2016 elections. This behavior is in stark contrast with the echo chambers observed in other polarized debates regarding different topics, on several social media platforms. While it has been argued that polarization on social media can result in the presence of echo chambers, in which users do not hear opposing views, here we observe the reverse phenomenon: polarization is associated with increased interactions between groups holding opposite opinions. However, this relation between polarization and heterophily might not go beyond the digital realm. Reportedly, people perceive that they encounter more disagreement in online than in offline interactions⁴². Further research should be dedicated to understanding whether the heterophily found in this social network is specific to the 2016 presidential elections, or whether it applies to politics in general, and thus might be a general feature of the Reddit platform¹⁴.

Several works in the literature have tried quantifying the presence of echo chambers in different social media, although most of them have not studied Reddit. Conover et al.⁴ analyze 250,000 tweets from the 2010 US congressional midterm elections. They measure the ratio between the observed and expected numbers of links in a random model: they find that users are more likely to interact with people with whom they agree, with an odds ratio of 1.2–1.3 for mentions and 1.7–2.3 for retweets, thus concluding that the retweet network is highly polarized. In our data, the same measure gives opposite results. Garimella et al.⁹ quantify the presence of echo chambers in controversial political discussion on Twitter. Similarly to our methodology, they identify two ingredients that create an echo chamber: the opinion that is shared, and the social network that allows the opinion to echo (operationalized as the follow network). By looking at the correlation between the opinions expressed by a user and the opinions the user is exposed to, they confirm the presence of echo chambers (i.e., users with different opinions tend not to follow each other), but only when the topic is controversial. Bakshy et al.²⁵ look at the interaction between users and news on Facebook. Again, the same two main ingredients are the focus of the study: the user opinion (operationalized as self-reported ideological alignment on a liberal-conservative axis), and the social network (friendship network). Their findings show the presence of homophily in the social network, which is also the main cause of the decrease in exposure to ideologically cross-cutting content (compared to the effect of the news feed algorithm). Given these results, it is possible that the organization of social media as a social network (e.g., Twitter and Facebook), rather than a social forum such as Reddit, fosters the creation of echo chambers.

As previously discussed, there is no unanimous consensus on the effects of echo chambers in public discourse, or even on their very existence. Ref.²⁰, for instance, challenged the hypothesis that echo chambers actually reduce the diversity of content to which social media users are exposed. The authors found that many users check news sources different from their usual ones, often offline. They thus argue against generalizing single-media studies to describe the complexity of a high-choice multiple media environment. Along the same lines, Ref.⁴³ highlights the importance of taking into account the temporal dimension in the formation (or dissolution) of echo chambers. The authors of Ref.⁴³ found that the exchange of information with respect to some controversial topics on Twitter starts and persists as a national conversation in which different political sides participate, before sliding into an echo-chamber picture.

As with any empirical work, our work presents some limitations. First, Reddit users are not representative of the US population, and instead have strong socio-demographic biases. Reddit users are more likely to be young males⁴⁴; this finding is also confirmed by informal surveys proposed to users⁴⁵. Furthermore, Reddit is much more popular in urban rather than rural areas⁴⁴. More importantly, it has been shown that the political leaning and general interests of Reddit users may differ from those of the general population⁴⁶. Even so, the present study is focused on the Reddit community, and does not claim any socio-demographic implication for the general population, besides the ones specifically addressed by the geolocation of users, which actually have the opposite causal direction. In this respect, a second limitation is evident: the state-level aggregation is coarse-grained, and does not take into consideration differences between areas within the same state (e.g., urban vs rural). Nevertheless, the state level is the finest spatial granularity we can reliably infer for a large-enough sample of Reddit users.

Despite these limitations, we find several interesting patterns regarding sociodemographic and environmental factors associated with an increase in the likelihood of interactions between like-minded individuals⁴⁷. To test this hypothesis, we analyzed the effect of different environmental factors by inferring the state of each user according to the information gathered by Balsamo et al.³⁹. We observed an effect of geographical homophily on the r/politics network: interactions between users located in the same state are significantly more likely than random chance. At the same time, users from the same state are less likely to interact when they support different candidates. Therefore, we speculate that while different political views foster interactions in the general case, geographical location might act more as a barrier.

Among other environmental factors, we also observed a correlation between the likelihood of cross-group connections and the fraction of the population that abstained from voting. This finding suggests prudence when defining diversity of exposure as a normative goal. Similar results were measured through surveys, with Mutz²⁶ arguing that conflicts within one's own social environment can produce ambiguity, which can in turn reduce the intensity of support for one's side. Interestingly, Mutz²⁶ observes such a decrease even in the absence of new information. Further empirical evidence is needed to understand this phenomenon and which additional factors may drive it. We leave this question as important future work. While weighing our understanding of social media as a dissonating chamber or as an echoic one, we cannot escape the question of whether dissonance damages the pursuit of common goals for political groups, or whether it produces more realistic and less enthusiastic views of the available candidates.

References

1. Pew Research Center. The Partisan Divide on Political Values Grows Even Wider (Tech. Rep., 2017).
2. Baldassarri, D. & Gelman, A. Partisans without constraint: Political polarization and trends in American public opinion. Am. J. Sociol. 114, 408–446 (2008).
3. Jacobson, G. C. Polarization, gridlock, and presidential campaign politics in 2016. Ann. Am. Acad. Polit. Soc. Sci. 667, 226–246 (2016).
4. Conover, M. D. et al. Political polarization on twitter. In 5th International AAAI Conference on Weblogs and Social Media (2011).
5. Garimella, K., De Francisci Morales, G., Gionis, A. & Mathioudakis, M. Quantifying Controversy in Social Media. In WSDM '16: 9th ACM International Conference on Web Search and Data Mining, 33–42 (2016).
6. An, J., Quercia, D. & Crowcroft, J. Partisan sharing: Facebook evidence and societal consequences. In COSN'14: ACM Conference on Online Social Networks, 13–24 (2014).
7. Bessi, A. et al. Users polarization on facebook and youtube. PLoS One 11, e0159641 (2016).
8. Caldarelli, G., De Nicola, R., Del Vigna, F., Petrocchi, M. & Saracco, F. The role of bot squads in the political propaganda on twitter. Commun. Phys. 3, 1–15 (2020).
9. Garimella, K., De Francisci Morales, G., Gionis, A. & Mathioudakis, M. Quantifying controversy on social media. ACM Trans. Soc. Comput. 1, 3 (2018).
10. Garrett, R. K. Echo chambers online?: Politically motivated selective exposure among Internet news users. J. Comput. Mediat. Commun. 14, 265–285 (2009).
11. Gilbert, E., Bergstrom, T. & Karahalios, K. Blogs are echo chambers: Blogs are echo chambers. In 42nd Hawaii International Conference on System Sciences, 1–10 (2009).
12. Quattrociocchi, W., Scala, A. & Sunstein, C. R. Echo chambers on Facebook. Available at SSRN 2795110 (2016).
13. Garimella, K., De Francisci Morales, G., Gionis, A. & Mathioudakis, M. Political discourse on social media: Echo chambers, gatekeepers, and the price of bipartisanship. In Proceedings of the 2018 World Wide Web Conference, 913–922 (2018).
14. Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W. & Starnini, M. Echo chambers on social media: A comparative analysis (2020). arXiv:2004.09603
15. Klapper, J. T. The Effects of Mass Communication (Free Press, Mumbai, 1960).
16. Lord, C. G., Ross, L. & Lepper, M. R. Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. J. Pers. Soc. Psychol. 37, 2098 (1979).
17. Baumann, F., Lorenz-Spreen, P., Sokolov, I. M. & Starnini, M. Modeling echo chambers and polarization dynamics in social networks. Phys. Rev. Lett. 124, 048301. https://doi.org/10.1103/PhysRevLett.124.048301 (2020).
18. Cossard, A. et al. Falling into the echo chamber: The italian vaccination debate on twitter. In ICWSM '20: Fourteenth International AAAI Conference on Web and Social Media, 130–140 (2020).
19. Zuckerberg, M. Building global community (2017).
20. Dubois, E. & Blank, G. The echo chamber is overstated: The moderating effect of political interest and diverse media. Inf. Commun. Soc. 21, 729–745 (2018).
21. Guess, A., Nyhan, B., Lyons, B. & Reifler, J. Avoiding the Echo Chamber About Echo Chambers (Knight Foundation, Miami, 2018).
22. Bail, C. A. et al. Exposure to opposing views on social media can increase political polarization. Proc. Natl. Acad. Sci. 115, 9216–9221 (2018).
23. Karpf, D. Digital politics after trump. Ann. Int. Commun. Assoc. 41, 198–207 (2017).
24. Nithyanand, R., Schaffner, B. & Gill, P. Online political discourse in the Trump era. arXiv:1711.05303 (2017).
25. Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on facebook. Science 348, 1130–1132 (2015).
26. Mutz, D. C. The consequences of cross-cutting networks for political participation. Am. J. Polit. Sci. 838–855, 20 (2002).
27. Huckfeldt, R., Mendez, J. M. & Osborn, T. Disagreement, ambivalence, and engagement: The political consequences of heterogeneous networks. Polit. Psychol. 25, 65–95 (2004).
28. Garimella, K., De Francisci Morales, G., Gionis, A. & Mathioudakis, M. Reducing controversy by connecting opposing views. In WSDM '17: 10th ACM International Conference on Web Search and Data Mining, 81–90 (2017).
29. Johnston, R., Jones, K. & Manley, D. An increasingly polarized America. Atlas of the 2016 Elections 104–110 (2018).
30. Nyhan, B. & Reifler, J. When corrections fail: The persistence of political misperceptions. Polit. Behav. 32, 303–330 (2010).
31. Duca, J. V. & Saving, J. L. Income inequality and political polarization: Time series evidence over nine decades. Rev. Income Wealth 62, 445–466 (2016).
32. Storper, M. Separate Worlds? Explaining the current wave of regional economic polarization. J. Econ. Geogr. 18, 247–270. https://doi.org/10.1093/jeg/lby011 (2018).
33. Baumgartner, J., Zannettou, S., Keegan, B., Squire, M. & Blackburn, J. The pushshift reddit dataset. arXiv:2001.08435 (2020).
34. Flores-Saviaga, C. I., Keegan, B. C. & Savage, S. Mobilizing the trump train: Understanding collective action in a political trolling community. In Twelfth International AAAI Conference on Web and Social Media (2018).
35. Massachs, J., Monti, C., De Francisci Morales, G. & Bonchi, F. Roots of trumpism: Homophily and social feedback in donald trump support on reddit. In Proceedings of the 12th ACM Conference on Web Science (2020).
36. Hutto, C. J. & Gilbert, E. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In 8th International AAAI Conference on Weblogs and Social Media (2014).
37. Molloy, M. & Reed, B. A critical point for random graphs with a given degree sequence. Random Struct. Algorithms 6, 161–180 (1995).
38. Bergstrom, K. "Don't feed the troll": Shutting down debate about community expectations on reddit.com. First Monday 16, 20 (2011).
39. Balsamo, D., Bajardi, P. & Panisson, A. Firsthand opiates abuse on social media: monitoring geospatial patterns of interest through a digital cohort. The World Wide Web Conference 2572–2579 (2019).
40. Bastos, M., Mercea, D. & Baronchelli, A. The geographic embedding of online echo chambers: Evidence from the brexit campaign. PLoS One 13, 20 (2018).
41. Chen, M. K. & Rohla, R. The effect of partisanship and political advertising on close family ties. Science 360, 1020–1024 (2018).
42. Vaccari, C. How prevalent are filter bubbles and echo chambers on social media? Not as much as conventional wisdom has it (2018).
43. Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A. & Bonneau, R. Tweeting from left to right: Is online political communication more than an echo chamber? Psychol. Sci. 26, 1531–1542 (2015).
44. Duggan, M. & Smith, A. 6% of online adults are reddit users. Pew Internet Am. Life Project 3, 1–10 (2013).
45. Finlay, C. Age and gender in reddit commenting and success. J. Inf. Sci. Theory Pract. 2, 18–28. https://doi.org/10.1633/JISTaP.2014.2.3.2 (2014).
46. Singer, P., Flöck, F., Meinhart, C., Zeitfogel, E. & Strohmaier, M. Evolution of reddit: From the front page of the internet to a self-referential community? In Proceedings of the 23rd International Conference on World Wide Web, 517–522, https://doi.org/10.1145/2567948.2576943 (New York, NY, USA, 2014).
47. Gentzkow, M. & Shapiro, J. M. Ideological segregation online and offline. Q. J. Econ. 126, 1799–1839 (2011).

Acknowledgements

GDFM and MS acknowledge the support from Intesa Sanpaolo Innovation Center. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Affiliations

ISI Foundation, Via Chisola 5, 10126, Turin, Italy

Gianmarco De Francisci Morales, Corrado Monti & Michele Starnini

Contributions

G.D.F.M., C.M., M.S. designed the study. G.D.F.M., C.M., M.S. analyzed and discussed the results. G.D.F.M., C.M., M.S. wrote the manuscript. All authors approved the final version of the manuscript.

Corresponding authors

Correspondence to Gianmarco De Francisci Morales, Corrado Monti or Michele Starnini.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

De Francisci Morales, G., Monti, C. & Starnini, M. No echo in the chambers of political interactions on Reddit. Sci Rep 11, 2818 (2021). https://doi.org/10.1038/s41598-021-81531-x

Received: 03 July 2020
Accepted: 05 January 2021
Published: 02 February 2021
DOI: https://doi.org/10.1038/s41598-021-81531-x

Source: https://www.nature.com/articles/s41598-021-81531-x