CS224W Project Proposal: Characterizing and Predicting Dogmatic Networks

Emily Alsentzer, Shirbi Ish-Shalom, Jonas Kemp

1. Introduction

Increasing polarization has been a defining feature of the 21st century [1]. Systematic evidence shows that elevated dogmatism, a tendency to assert opinions as truths and ignore opposing viewpoints, has increasingly polarized discourse on topics ranging from the environment to health, politics, and guns [2,3,4,5]. Some researchers attribute the immense polarization between groups to stagnation in the pace and consistency of reform [6]. Other large bodies of research have investigated how social, economic, or psychological factors contribute to elevating dogmatism, with a primary focus on individual behavior.

However, the past decade has seen fundamental changes in the structure of social interactions with the advent of the Digital Age. Today, people can control who, how, when, and where they interact with others. At the click of a button, they can unfollow people with whom they disagree. We therefore propose that dogmatism is not a phenomenon resulting from individual behavior, but rather results from the customized structure of the social network with which a user communicates. With the new age of personalized information consumption, we expect that investigating the structure of social networks will uncover information about how an individual's interactions with their social network instigate or perpetuate dogmatism.

In the remaining sections of this proposal, we review three papers that address concepts relevant to our area of research. We discuss these papers' relationship to our topic, and use them as a basis to formulate the specific research question we wish to investigate. Finally, we propose a concrete plan to address this question, including a dataset, a methodological plan, and our expected results and deliverables.

2. Literature Review

2.1 Predicting Positive and Negative Links in Online Social Networks [7]

2.1.1 Summary

Leskovec, Huttenlocher, and Kleinberg develop a machine learning model to predict the sign of links in an online social network using information about local structure, such as node degree and triads. The model, a logistic regression classifier, succeeds in predicting sign with high accuracy on real-world social networks from Epinions, Slashdot, and Wikipedia. Comparing their results to the classical theories of balance and status in signed social networks, the authors find that while both theories are reasonably accurate in reduced-form models, they cannot capture the subtleties of interaction in the full networks with the same accuracy as the learned model. Furthermore, at the level of global network structure, only the predictions of status theory are empirically supported by the data under consideration.

2.1.2 Critique

The key result in this paper clearly demonstrates that local network structure provides information about the nature of interactions between community members, and that in turn these interactions may have
implications for global properties of the network. However, two open questions emerge for further research. First, edge sign is an inherently crude measure of the nature of a human relationship. While many human interaction networks could be represented as signed networks, in most cases this would abstract away important subtleties of interactions between users (with the exception of some limited settings, such as a voting network). Theories of balance and status in signed networks are well-developed, but could a more complex aspect of interactions (such as dogmatism) lead to similarly well-formed predictions about structural properties of the network? Or, alternatively, could a prediction model use information from network structure to make predictions about the dogmatism of interactions in that network? Second, while the authors do compare global network properties to expectations from theory, model predictions are based only on local structure, which steers the analysis toward individual-level behavior. Can we instead characterize, for example, the overall nature of discourse in a community based on its global network properties? These questions form the starting point for our investigation, and we will return to them later as we build our research proposal.

2.2 Identifying Dogmatism in Social Media: Signals and Models [8]

2.2.1 Summary

Fast and Horvitz present a statistical model for binary classification of online comments to identify dogmatism in social media. Feature engineering included bag-of-words features and linguistic features derived from the Linguistic Inquiry and Word Count (LIWC) lexicon. The final model achieved a training accuracy of 0.881 and a test accuracy of 0.791. With this model, Fast and Horvitz labelled millions of unannotated posts to answer four questions about how dogmatic language shapes the Reddit community:

1. What subreddits have the highest and lowest levels of dogmatism?
2. How do dogmatic beliefs cluster?
3. What user behaviors are predictive of dogmatism?
4. How does dogmatism impact a conversation?

While not primarily focused on network science, Fast and Horvitz's work directly relates to course content on human behavior online. The paper explores how psychological theory translates into real-world data, finding that the features with the most predictive power (such as negative emotion, second person singular pronouns, and present tense) align well with current psychological theories. Additionally, in their examination of the clustering of dogmatism, Fast and Horvitz link subreddits in which the same user posts dogmatic comments, thereby developing a network of subreddits connected by common dogmatic users.

2.2.2 Critique

A key strength of the analysis is its rigorous definition of dogmatism, with findings validated against relevant psychological theory. Subjecting the training and test sets to multiple layers of filtering inspires additional confidence in the overall robustness of the model. Moreover, layering analysis of the Reddit
ecosystem on top of the dogmatism model offers further validation by confirming prior intuitions about human behavior. For example, the most dogmatic subreddits were found to be oriented around politics and religion, while the least dogmatic subreddits tend to focus on hobbies. In particular, the Reddit analysis offers a natural avenue to connect the study of dogmatism to more explicitly network-based questions.

However, the analysis is not without weaknesses. Establishing ground truth presents a notable challenge: despite recruiting only Master workers on Amazon Mechanical Turk (AMT) to label comments, the authors find that comments in the middle two quartiles for dogmatism rating (on a 1-5 scale) exhibit inter-rater agreement no better than chance. Analysis is therefore limited to the top and bottom quartiles only; yet even for these comments, α = 0.69 (where α = 0 is equivalent to chance and α = 1 denotes perfect agreement). This indicates that even with a clear definition of dogmatism, judgments of how it is expressed in communication can be highly subjective.

The accuracy of the model presents another limitation. A test accuracy of roughly 79%, applied across millions of comments, implies hundreds of thousands, if not millions, of misclassifications. Between this and the lack of agreement between AMT workers, the challenges of modeling a phenomenon as complex as dogmatism become clear. Exploring classifiers beyond logistic regression, particularly those without assumptions of linearity, might offer a first step towards improving results.

2.3 A Measure of Polarization on Social Media Networks Based on Community Boundaries [9]

2.3.1 Summary

As discussed in the introduction, polarization is closely related to dogmatism, with more dogmatic discourse tending to increase polarization between opposing groups. Guerra et al. approach this question from a network perspective, arguing that the traditional metric of modularity is not a sufficiently direct measure of polarization, and proposing a new metric based on network boundary conditions between the communities. Specifically, they develop a model in which polarization is defined in terms of nodes' likelihood of connecting to others outside their group, relative to those within their group. They also demonstrate empirically that nonpolarized networks are more likely to have many popular (i.e. high degree) nodes along the boundary, whereas in polarized networks intergroup antagonism reduces crossover.

2.3.2 Critique

Guerra et al. offer a compelling model of polarization, but its reliance on inter-community boundary conditions is both a key strength and a key weakness. The authors rightly note that a) antagonism between two communities is likely to be most evident in the structure of the boundary, and b) inference on the polarization of communities that do not share a boundary may not be appropriate (as the communities may simply be unrelated or disconnected). Their model's explicit consideration of these assumptions is its foremost advance over previous metrics. However, as discussed in the introduction, features of discourse in a community such as elevated dogmatism contribute to the rise of polarization. Thus, while boundary analysis may be necessary to
determine the existence of polarization, intrinsic features of a community should predict a propensity for antagonism and polarization, irrespective of actual relationships to other communities. The model developed by Guerra et al. is descriptive rather than predictive; we plan instead to approach the predictive problem in our research.

3. Literature Discussion & Brainstorming

Fundamentally, our project addresses a similar question to the sign prediction problem: can we predict the nature of discourse and interactions in a network based on structural properties? However, we extend the problem in two important ways. First, rather than edge sign we adopt dogmatism as our measure of interest, per the work of Fast and Horvitz. While this is a much more complex and challenging measure to quantify accurately, it captures a dimension of human interaction that goes beyond mere positive or negative sentiment, and one that is especially relevant in the current political climate. Second, we choose to focus on the properties of a community as a whole rather than individual links. We aim to characterize the dogmatism of groups rather than individuals because, as the model proposed by Guerra et al. suggests, group-level interactions ultimately define polarization. Indeed, a large fraction of research on the topic emphasizes individual-level analysis, and thereby risks missing relevant phenomena on a larger scale. Our approach can potentially offer a predictive complement to the polarization model, insofar as we hypothesize that high community-level dogmatism might by proxy indicate the likelihood of a community developing polarized relationships.

4. Proposal

4.1 Problem Statement

Social media is playing an increasingly important role in shaping the national discourse around conversations related to race, gender, politics, and other contested topics.
Online users can instantly connect to individuals across the country and around the world with diverse backgrounds and beliefs. Fast and Horvitz suggest that while dogmatism is a deeper personality trait, its expression may be influenced by engagement with other dogmatic users online. In light of these findings, we hope to better understand specifically how online interactions can influence dogmatism. In particular, we will investigate the network characteristics of dogmatic Reddit communities with the ultimate goal of predicting the formation of dogmatic groups online.

4.2 Data

We will use Reddit data from over 2000 subreddit communities, courtesy of TA Will Hamilton, in order to understand the relationship between network properties and community dogmatism. We have monthly interaction networks for four-week periods for each subreddit during 2014. In the interaction networks, each node is a user, and two users are connected if they replied in the same linear thread within three comments of one another. Only users who commented at least 50 times in 2014 are included.
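
The edge rule described above can be illustrated with a short sketch in plain Python. It is an illustration under our own assumptions, not the code used to build the dataset: the function name `build_interaction_edges`, the representation of a thread as an ordered list of author names, and the reading of "within three comments" as a positional distance of at most three are all hypothetical.

```python
from itertools import combinations


def build_interaction_edges(thread_authors, window=3, active_users=None):
    """Return undirected user-user edges for one linear comment thread.

    thread_authors: authors in comment order (hypothetical input format).
    window: maximum positional distance between two comments (our assumed
            reading of "within three comments of one another").
    active_users: optional set of users to keep, e.g. those with at least
                  50 comments in the year.
    """
    edges = set()
    for i, j in combinations(range(len(thread_authors)), 2):
        if j - i > window:
            continue  # comments too far apart in the thread
        u, v = thread_authors[i], thread_authors[j]
        if u == v:
            continue  # no self-loops
        if active_users is not None and (u not in active_users or v not in active_users):
            continue  # drop users below the activity cutoff
        edges.add(frozenset((u, v)))
    return edges


# Toy thread with made-up user names.
thread = ["alice", "bob", "carol", "alice", "dave", "erin"]
edges = build_interaction_edges(thread)
```

In this toy thread every pair of users is connected except bob and erin, whose only comments are four positions apart.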

4.3 Specific Aims

4.3.1. Label the sentiment polarity and level of dogmatism of every subreddit community in 2014.

We will apply the TextBlob sentiment classifier and Ethan Fast's dogmatism classifier in order to label both the polarity and level of dogmatism for each user in the 26,000 communities in our dataset (~2000 subreddits x 13 four-week snapshots). A network will be considered dogmatic if the average dogmatism of its users is higher than a given threshold, which will be empirically determined by calculating the average dogmatism of known dogmatic networks from the Fast and Horvitz paper. We will randomly divide the data into training and test sets, keeping all snapshots of the same subreddit in either the training set or the test set. We will also ensure that each set includes both dogmatic and non-dogmatic communities.

4.3.2. Characterize the network properties of both dogmatic and non-dogmatic networks.

Using the training set alone, we will perform an exploratory analysis to determine whether there are certain network properties that are characteristic of dogmatic and non-dogmatic networks. The network properties we will consider include, but are not limited to: clustering coefficient, average path length, triadic closure, degree and excess degree distributions, diameter, size and number of connected components, various metrics of centrality, and the presence of bridges and strong and weak ties. We hypothesize that more closed triads and cliques will be indicative of dogmatic communities.

4.3.3. Predict the level of dogmatism in a subreddit community using network properties as features.

After describing the features of both dogmatic and non-dogmatic communities, we will use these features to develop a classifier to predict the presence of dogmatism in a community.
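
As a rough sketch of how snapshots could be turned into features and split for this step, the following plain-Python code computes two of the properties named in 4.3.2 (average clustering coefficient and number of connected components) and performs the subreddit-level split from 4.3.1. The function names, the adjacency-dictionary input format, the `(subreddit, period)` key layout, and the 20% test fraction are assumptions made for illustration; in practice a graph library would likely replace the hand-written routines.

```python
import random
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient; adj maps node -> set of neighbours."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # coefficient counted as 0 for fewer than two neighbours
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def count_components(adj):
    """Number of connected components, via iterative depth-first search."""
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return components


def split_by_subreddit(snapshot_keys, test_fraction=0.2, seed=0):
    """Split (subreddit, period) keys so each subreddit lands in one set only."""
    subreddits = sorted({name for name, _ in snapshot_keys})
    random.Random(seed).shuffle(subreddits)
    n_test = max(1, int(round(test_fraction * len(subreddits))))
    test_names = set(subreddits[:n_test])
    train = [k for k in snapshot_keys if k[0] not in test_names]
    test = [k for k in snapshot_keys if k[0] in test_names]
    return train, test


# Toy network: a triangle (a, b, c), a pendant node d, and an isolated node e.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}, "e": set()}
features = [average_clustering(toy), count_components(toy)]

# Hypothetical snapshot keys: five made-up subreddits with three periods each.
keys = [(name, period) for name in ["r/a", "r/b", "r/c", "r/d", "r/e"] for period in range(3)]
train_keys, test_keys = split_by_subreddit(keys)
```

Splitting on the subreddit name rather than on individual snapshots keeps highly similar networks from the same community out of both sets at once, which is the point of the constraint stated in 4.3.1.
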
We will use Python's scikit-learn toolkit to develop naive Bayes, support vector machine, and random forest classifiers, making sure to weight classes to account for imbalanced class sizes. Finally, we will perform feature importance analysis to determine which features are most important in predicting dogmatic networks.

4.3.4. Predict the formation of dogmatic communities by incorporating temporal features describing network changes over time into our algorithm.

If we are able to accomplish the above specific aims, we additionally plan to explore whether we can predict the formation of a dogmatic community. Rather than using each monthly snapshot of a subreddit as a separate training example, we will instead consider only the ~2000 individual subreddits. In order to predict formation, we will examine temporal motifs describing changing network connectivity over time and include these as features in our machine learning algorithms.

4.4 Evaluation

We will evaluate the success of our models by calculating sensitivity, specificity, and F1 scores on our training and test sets.

4.5 Deliverables

Upon the completion of this project, we will have developed a better understanding of the network properties associated with dogmatism in online Reddit communities, and we will have produced a model for predicting the level of dogmatism in static communities. Time permitting, we will also have extended our model to account for temporal trends in order to predict the formation of dogmatic communities over time.

References

1. Doherty, Carroll. "7 things to know about polarization in America." Pew Research Center (2014).
2. Jacobson, Gary C. "Partisan polarization in American politics: A background paper." Presidential Studies Quarterly 43.4 (2013): 688-708.
3. Guber, Deborah Lynn. "A cooling climate for change? Party polarization and the politics of global warming." American Behavioral Scientist (2012): 0002764212463361.
4. Baker, Jeffrey P. "Mercury, vaccines, and autism: one controversy, three histories." American Journal of Public Health 98.2 (2008): 244-253.
5. Wozniak, Kevin H. "American public opinion about gun control remained polarized and politicized in the wake of the Sandy Hook mass shooting." USApp American Politics and Policy Blog (2015).
6. Frye, Timothy. Building states and markets after communism: the perils of polarized democracy. Cambridge University Press, 2010.
7. Leskovec, Jure, Daniel Huttenlocher, and Jon Kleinberg. "Predicting positive and negative links in online social networks." Proceedings of the 19th international conference on World wide web. ACM, 2010.
8. Fast, Ethan, and Eric Horvitz. "Identifying Dogmatism in Social Media: Signals and Models." arXiv preprint arXiv:1609.00425 (2016).
9. Guerra, Pedro Henrique Calais, et al. "A Measure of Polarization on Social Media Networks Based on Community Boundaries." ICWSM. 2013.