That's Your Evidence?: Using Mechanical Turk To Develop A Computational Account Of Debate And Argumentation In Online Forums

Natural Language and Dialogue Systems Lab
Prof. Marilyn Walker

Debate and Deliberation: Key Human Activity
Navy Research Lab funding
IARPA, 3rd year of funding
Identify subgroups on sides of issue

Persuasion and Argumentation on ConvinceMe

STANCE
An overall position held by a person towards an object, idea or position (Somasundaran & Wiebe, 2009)
Stance Groups: IARPA's subgroups (cells)

Stance Classification: Death Penalty
Yes we should keep it: I value human life so much that if someone takes one than his should be taken. Also if someone is thinking about taking a life they are less likely to do so knowing that they might lose theirs
No we should not: There is no proof that the death penalty acts as a deterrent, plus due to the finalty of the sentence it would be impossible to amend a mistaken conviction which happens with regualrity especially now due to DNA and improved forensic science

Dialogic Properties of ConvinceMe
Every site offers different contextual affordances
ConvinceMe provides three sources of dialogue structure
  Original post: the topic, and responses on either side can be considered a response to the original post
  Rebuttal links: explicitly link to a previous post on the other side
  Temporal context at the time of your post: what the page looked like and which existing posts the user could see (this is lost)
Timestamps are only by day: we get a partial order by day, plus order within a day only via rebuttals
No agree links

Death Penalty: Monologic Posts
Yes we should keep it: I value human life so much that if someone takes one than his should be taken. Also if someone is thinking about taking a life they are less likely to do so knowing that they might lose theirs
No we should not: There is no proof that the death penalty acts as a deterrent, plus due to the finalty of the sentence it would be impossible to amend a mistaken conviction which happens with regualrity especially now due to DNA and improved forensic science

Death Penalty: Rebuttal Chain
RIGHT: Studies have shown that using the death penalty saves 4 to 13 lives per execution. That alone makes killing murderers worthwhile. When Texas and Florida were executing people one after the other in the late 90's, the murder rates in both states plunged, like Rosie O'donnel off a diet...
WRONG: What studies? I have never seen ANY evidence that capital punishment acts as a deterrant to crime. I have not seen any evidence that it is "just" either. That's your evidence? What happened to those studies? In the late 90s a LOT of things were different than the periods preceding and following the one you mention. We have no way to determine what of those contributed to a lower murder rate, if indeed there was one. You have to prove a cause and effect relationship and you have failed.

How do humans do at this task?

1113 Debates, 4873 posts
Topic                      Rebuttals   P/A
Cats v. Dogs               40%         1.68
Firefox vs. IE             40%         1.28
Mac vs. PC                 47%         1.85
Superman/Batman            34%         1.41
2nd Amendment              59%         2.09
Abortion                   70%         2.82
Climate Change             69%         2.97
Communism vs. Capitalism   70%         3.03
Death Penalty              62%         2.44
Evolution                  76%         3.91
Exist God                  77%         4.24
Gay Marriage               65%         2.12
Healthcare                 80%         3.24
Marijuana Legalization     52%         1.55
Ideological topics always more than 50% rebuttals
More author investment
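The two columns can be recomputed from raw posts. The following is a minimal sketch, assuming that P/A stands for posts per author and that each post records its author and whether it is a rebuttal. The field names `author` and `is_rebuttal` and the toy data are hypothetical, not the ConvinceMe schema.

```python
def topic_stats(posts):
    """Return (share of rebuttal posts, posts per author) for one topic.

    posts: list of dicts with hypothetical keys 'author' and 'is_rebuttal'.
    """
    n_posts = len(posts)
    n_rebuttals = sum(1 for p in posts if p["is_rebuttal"])
    n_authors = len({p["author"] for p in posts})
    return n_rebuttals / n_posts, n_posts / n_authors


# Toy data, invented for illustration only.
toy_posts = [
    {"author": "a", "is_rebuttal": False},
    {"author": "b", "is_rebuttal": True},
    {"author": "a", "is_rebuttal": True},
    {"author": "c", "is_rebuttal": False},
]
rebuttal_share, posts_per_author = topic_stats(toy_posts)
print(f"Rebuttals: {rebuttal_share:.0%}, P/A: {posts_per_author:.2f}")
```

A higher P/A means the same authors keep returning to the debate, which is one way to read "more author investment".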

Mechanical Turk Stance Siding

Data Preparation

Map Debates into Topic Sets
Open Debates matching "capital punishment"
  Should Capital Punishment be Allowed?   Mar 10
  Do you agree with capital punishment?   Sep 22
Open Debates matching "death penalty"
  Capital Punishment   Feb 09
  Should young adults who are convicted of extreme crimes, be issued the death penalty?   Jan 28
  Should death penalty be repealed again in the Philippines?   Sep 15
  Should the death penalty be brought back?why?   Jun 24
  death penalty   Mar 14
  Is the death penalty morally correct as it is SUPPOSED to be used in the United States?   Aug 27
  death penalty   Mar 02
  The Death Penalty should be legal   Feb 04
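One way to group debates into a topic set is to match their titles against a few query phrases. This is a sketch only: the slide does not say how matching was done, so case-insensitive substring matching, the `TOPIC_QUERIES` table and the `map_to_topics` helper are all assumptions made for illustration.

```python
# Hypothetical query phrases per topic; only the two phrases shown on the slide are used.
TOPIC_QUERIES = {
    "Death Penalty": ["capital punishment", "death penalty"],
}


def map_to_topics(titles, topic_queries):
    """Assign each debate title to every topic whose query phrase it contains."""
    topic_sets = {topic: [] for topic in topic_queries}
    for title in titles:
        lowered = title.lower()
        for topic, queries in topic_queries.items():
            if any(query in lowered for query in queries):
                topic_sets[topic].append(title)
    return topic_sets


titles = [
    "Should Capital Punishment be Allowed?",
    "death penalty",
    "The Death Penalty should be legal",
    "Cats v. Dogs",
]
topic_sets = map_to_topics(titles, TOPIC_QUERIES)
print(topic_sets["Death Penalty"])
```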

ConvinceMe: 1113 Two-Sided Debates

Mechanical Turk: HIT
9 annotators/post

Mechanical Turk: Human Topline

Human Topline for Stance Classification
Class          Correct   Total   Accuracy
Rebuttal       606       827     .73
Non-Rebuttal   427       493     .87
Overall accuracy about 78%
Harder for humans to classify stance of rebuttals
Rebuttals are more context dependent
Sometimes people post on wrong side
Harder for humans to classify ideological posts
76% of ideological posts sided correctly, 85% non-ideological
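The accuracies follow directly from the counts. As a quick check, reading the two rows as 606 correct of 827 rebuttals and 427 correct of 493 non-rebuttals:

```python
# (correct, total) per class, read from the table above.
counts = {"Rebuttal": (606, 827), "Non-Rebuttal": (427, 493)}

accuracy = {label: correct / total for label, (correct, total) in counts.items()}
overall = sum(c for c, _ in counts.values()) / sum(t for _, t in counts.values())

for label, value in accuracy.items():
    print(f"{label}: {value:.2f}")   # Rebuttal: 0.73, Non-Rebuttal: 0.87
print(f"Overall: {overall:.2f}")     # Overall: 0.78
```

The pooled figure, 1033 correct out of 1320, is where "about 78%" comes from.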

Stance Classification & Rebuttal Classification

Experimental Setup
Stance classification within topic
Remove cases where the majority of annotators got it wrong
But have additional NOISY data, self-annotated (as posted)
Explore the role of different feature sets
10-fold cross-validation
Naïve Bayes, JRip, SVM learners
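The filtering step can be sketched as a simple majority check over the annotator votes for each post (9 per post in the HIT). The record layout (`side`, `votes`) and the helper name `majority_correct` are hypothetical, and the strict-majority rule is an assumption about how "majority got it wrong" was applied.

```python
def majority_correct(votes, posted_side):
    """True if more than half of the annotators chose the side the post was filed under."""
    agreeing = sum(1 for vote in votes if vote == posted_side)
    return agreeing > len(votes) / 2


# Toy posts, invented for illustration; each has 9 annotator votes.
posts = [
    {"id": 1, "side": "yes", "votes": ["yes"] * 7 + ["no"] * 2},
    {"id": 2, "side": "no", "votes": ["yes"] * 5 + ["no"] * 4},
]

# Keep only the posts that most annotators sided correctly.
kept = [p for p in posts if majority_correct(p["votes"], p["side"])]
print([p["id"] for p in kept])
```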

Context Features: (naïve) IsRebuttal, Poster, Parent Post Features
RIGHT: Studies have shown that using the death penalty saves 4 to 13 lives per execution. That alone makes killing murderers worthwhile. When Texas and Florida were executing people one after the other in the late 90's, the murder rates in both states plunged, like Rosie O'donnel off a diet...
WRONG: What studies? I have never seen ANY evidence that capital punishment acts as a deterrant to crime. I have not seen any evidence that it is "just" either. That's your evidence? What happened to those studies? In the late 90s a LOT of things were different than the periods preceding and following the one you mention. We have no way to determine what of those contributed to a lower murder rate, if indeed there was one. You have to prove a cause and effect relationship and you have failed.
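A rough sketch of how such context features might be assembled: the post's own features, an IsRebuttal flag, the poster, and the same features computed on the parent post under a `parent:` prefix. The unigram extractor stands in for the real feature sets, and every name here (`post_features`, `with_context`, the dict keys) is hypothetical.

```python
import re


def post_features(text):
    """Toy stand-in for the real feature sets: lowercase unigram counts."""
    feats = {}
    for token in re.findall(r"[a-z0-9']+", text.lower()):
        key = f"uni={token}"
        feats[key] = feats.get(key, 0) + 1
    return feats


def with_context(post, parent=None):
    """Add IsRebuttal, Poster and prefixed parent-post features to a post's features."""
    feats = post_features(post["text"])
    # Assumption: a post counts as a rebuttal exactly when it links to a parent post.
    feats["is_rebuttal"] = 1 if parent is not None else 0
    feats[f"poster={post['poster']}"] = 1
    if parent is not None:
        for name, value in post_features(parent["text"]).items():
            feats[f"parent:{name}"] = value
    return feats


parent = {"poster": "A", "text": "Studies have shown that using the death penalty saves 4 to 13 lives per execution."}
reply = {"poster": "B", "text": "What studies? I have never seen ANY evidence"}
features = with_context(reply, parent)
print(features["is_rebuttal"], features["parent:uni=studies"])
```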

STANCE Classification Results
Topic                 MTurk   Uni     Best    Best FeatSet
Cats v. Dogs          94      59.23   62.31   All, no context
Firefox vs. IE        74      51.25   53.75   LIWC, no context
Mac vs. PC            76      53.33   56.67   LIWC, no context
Superman Batman       89      54.84   57.26   LIWC with context
2nd Amendment         69      56.41   69.23   Unigram with context
Abortion              75      50.97   53.70   LIWC with context
Climate Change        66      53.65   58.33   LIWC
Comm vs. Capitalism   68      48.81   56.55   LIWC with context
Death Penalty         79      51.80   57.55   Generalized Dep POS with context
Evolution             72      57.24   57.24   Unigram, no context
Existence of God      73      52.71   53.42   Generalized Dep POS with context
Gay Marriage          88      60.28   60.28   Unigram, no context
Healthcare            86      52.13   60.64   LIWC with context
MJ Legalization       81      57.55   59.43   All, no context
Two topics unigram best
Ideological topics: context tends to help
Need better context features

Compare Somasundaran & Wiebe 2010
Results range from 60 to 70%
Arg + Sent statistically better than Unigram
Their unigram baseline is higher
Data doesn't contain rebuttals
Domain (#posts)     Distribution   Unigram   Sentiment   Arguing   Arg+Sent
Overall (2232)      50             62.50     55.02       62.59     63.93
Guns Rights (306)   50             66.67     58.82       69.28     70.59
Gay Rights (846)    50             61.70     52.84       62.05     63.71
Abortion (550)      50             59.1      54.73       59.46     60.55
Creationism (530)   50             64.91     56.60       62.83     63.96
Table 4: Accuracy of the different systems

Current Work: Incorporate Social Network Models
Use information over whole discussion
Graph-based algorithm
  Speaker always agrees with self (P/A ranges 1.28 to 3.91)
  Rebuttal indicates disagreement
  A, B both disagree with C, agree with each other
Evolution Topic: now 71.5% accuracy (from 57%)
Firefox vs. IE: no improvement
Graph topology differences, under investigation now
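The three constraints can be illustrated with a small signed-graph propagation. This is a sketch under stated assumptions, not the algorithm behind the reported numbers: posts by the same author are linked with an agree edge (+1), rebuttal links are disagree edges (-1), and a known stance is spread outwards breadth-first. The function name `propagate` and the data layout are hypothetical.

```python
from collections import deque


def propagate(posts, rebuttals, seeds):
    """Spread stance labels (+1 / -1) over a signed post graph.

    posts: {post_id: author}; rebuttals: [(post_id, parent_id)];
    seeds: {post_id: +1 or -1} for posts whose stance is already known.
    """
    edges = {pid: [] for pid in posts}
    # Rebuttal indicates disagreement.
    for child, parent in rebuttals:
        edges[child].append((parent, -1))
        edges[parent].append((child, -1))
    # Speaker always agrees with self: chain together each author's posts.
    by_author = {}
    for pid, author in posts.items():
        by_author.setdefault(author, []).append(pid)
    for pids in by_author.values():
        for a, b in zip(pids, pids[1:]):
            edges[a].append((b, +1))
            edges[b].append((a, +1))
    labels = dict(seeds)
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for neighbour, sign in edges[current]:
            if neighbour not in labels:  # first label wins; conflicts are ignored here
                labels[neighbour] = labels[current] * sign
                queue.append(neighbour)
    return labels


# C posts first; A and B both rebut C; A posts again later.
posts = {1: "C", 2: "A", 3: "B", 4: "A"}
labels = propagate(posts, rebuttals=[(2, 1), (3, 1)], seeds={1: +1})
print(labels)
```

In the toy thread, A and B both disagree with C and therefore end up on the same side, and A's later post inherits A's stance. Posts unreachable from any seed stay unlabeled, which is one way thread topology could limit how much such a method helps.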

QUESTIONS?

Discourse Relation Recognition
Rebuttal is a kind of discourse relation
Initiative Response Recognition
  Wang & Rosé 2009
  Get about 70% accuracy, LSA-CART
  Don't distinguish rebuttals from other responses
Feature Engineering for Discourse Relation Recognition
  Marcu & Echihabi 2002: Cartesian pairs
  Pitler, Louis & Nenkova, 2009

http://pcon.soe.ucsc.edu/mturk_external/convinceme/cme.php?totalhits=100&pagegroup=1

Related Work
Stance Classification in (Ideological) Online Debates
  Somasundaran & Wiebe, 2009; 2010
  Discourse Relations (in scope of concession)
  Argumentation Features
  Results range from 60% to 70% accuracy
Debate Side Classification (congress or forums)
  Thomas et al. 2006; Bansal, Cardie 2010; Mishne & Glance; Murakami & Raymond 2010
  Use graph-based algorithms (social network structure via Mincut, Maxcut)
  Simple features for agreement/disagreement from text improve performance

Compare Somasundaran & Wiebe 09, 10
Topic               Posts   OPPr + Discourse Feats Best Accuracy
Firefox vs. IE      62      66.13%
Windows vs. Mac     15      66.67%
SonyPS3 vs. WII     36      61.0%
Opera vs. Firefox   4       100%

Domain (#posts)     Distribution   Unigram   Sentiment   Arguing   Arg+Sent
Overall (2232)      50             62.50     55.02       62.59     63.93
Guns Rights (306)   50             66.67     58.82       69.28     70.59
Gay Rights (846)    50             61.70     52.84       62.05     63.71
Abortion (550)      50             59.1      54.73       59.46     60.55
Creationism (530)   50             64.91     56.60       62.83     63.96
Table 4: Accuracy of the different systems

Feature Sets
Set                        Description/Examples
Post Info                  IsRebuttal, Poster
Unigrams                   Word frequencies
Bigrams                    Word pair frequencies
Cue Words                  Initial unigram, bigram, and trigram
Repeated Punctuation       Collapsed into one of the following: ??, !!, ?!
LIWC                       LIWC measures and frequencies
Dependencies               Dependencies derived from the Stanford Parser
Generalized Dependencies   Dependency features generalized with respect to POS of the head word and opinion polarity of both words
Opinion Dependencies       Subset of Generalized Dependencies with opinion words from MPQA
Context Features           Matching features used for the post, from the parent post
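Two of the simpler feature sets can be sketched in a few lines. The sketch assumes that cue words are the first one, two and three tokens of a post and that any run of two or more `?`/`!` characters is collapsed by kind. The tokenizer, the feature names and both helper functions are hypothetical.

```python
import re


def cue_words(text):
    """Initial unigram, bigram and trigram of a post, as feature strings."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {f"cue{n}=" + " ".join(tokens[:n]) for n in (1, 2, 3) if len(tokens) >= n}


def repeated_punct(text):
    """Collapse each run of repeated ?/! into one of '??', '!!' or '?!'."""
    feats = []
    for run in re.findall(r"[?!]{2,}", text):
        kinds = set(run)
        if kinds == {"?"}:
            feats.append("??")
        elif kinds == {"!"}:
            feats.append("!!")
        else:
            feats.append("?!")  # mixed run of ? and !
    return feats


cues = cue_words("What studies? I have never seen ANY evidence")
puncts = repeated_punct("Really??? No!!!! What?!")
print(sorted(cues), puncts)
```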

LIWC Lexical Categories
Feature                   Example
Anger words               hate, kill, pissed
Metaphysical issues       God, heaven, coffin
Physical state/function   ache, breast, sleep
Inclusive words           with, and, include
Social processes          talk, us, friend
Family members            mom, brother, cousin
Past tense verbs          walked, were, had
References to friends     pal, buddy, coworker
Causation                 because, know, ought
Discrepancy               should, would, could

Dependencies
Derived using the Stanford Parser
Example: Well, maybe the pistol and the hunting rifle, what do you need an automatic weapon? Have the deer gotten faster?
'amod(weapon, automatic)', 'dep:dobj(need, weapon)', 'dep:det(weapon, an)', ...

Generalized Dependencies (POS)
Generalized over POS of head word
Joshi and Rosé 2009 show that semi-lexicalized dependency features are better than fully lexicalized or fully generalized
Example: Well, maybe the pistol and the hunting rifle, what do you need an automatic weapon? Have the deer gotten faster?
'amod(nn, automatic)', 'dep:dobj(vbp, weapon)', 'dep:det(nn, an)', ...
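The generalization step replaces the head word of each dependency with its part-of-speech tag and keeps the dependent word. A minimal sketch, assuming the parser output is already available as (relation, head, dependent) triples plus a word-to-tag map; the triples and tags below are typed in by hand from the slide rather than produced by a parser, and `generalize_pos` is a hypothetical helper.

```python
def generalize_pos(dependency, pos_tags):
    """Back off the head word of a dependency triple to its lowercase POS tag."""
    relation, head, dependent = dependency
    return f"{relation}({pos_tags[head].lower()}, {dependent})"


# Hand-entered triples and tags for the example sentence fragment.
dependencies = [
    ("amod", "weapon", "automatic"),
    ("dep:dobj", "need", "weapon"),
    ("dep:det", "weapon", "an"),
]
pos_tags = {"weapon": "NN", "need": "VBP", "automatic": "JJ", "an": "DT"}

generalized = [generalize_pos(dep, pos_tags) for dep in dependencies]
print(generalized)
```

Keeping the dependent word lexicalized while backing off the head is the semi-lexicalized form the slide refers to.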

Generalized Dependencies (Opinion)
MPQA opinion dictionary applied to each word independently
Intended to approximate Somasundaran & Wiebe's opinion target
Example: but if I want to own a pistol, a shotgun, and some fancy automatic, well that's my right.
'dep_opinion:amod(automatic, positive)', 'dep_opinion:poss(positive, my)', ...
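The opinion variant can be sketched the same way: each word of a dependency is looked up independently in a polarity lexicon and replaced by its polarity if found. The two-entry lexicon below is a toy stand-in for the MPQA dictionary, its polarity values are assumed for illustration, and `generalize_opinion` is a hypothetical helper.

```python
# Toy polarity lexicon; a stand-in for MPQA, with entries assumed for this example.
TOY_LEXICON = {"fancy": "positive", "right": "positive"}


def generalize_opinion(dependency, lexicon):
    """Replace each opinion word in a dependency triple with its polarity.

    Returns None when neither word is in the lexicon, so only opinion-bearing
    dependencies yield a feature.
    """
    relation, head, dependent = dependency
    if head not in lexicon and dependent not in lexicon:
        return None
    new_head = lexicon.get(head, head)
    new_dependent = lexicon.get(dependent, dependent)
    return f"dep_opinion:{relation}({new_head}, {new_dependent})"


# Hand-entered triples for the example sentence fragment.
dependencies = [
    ("amod", "automatic", "fancy"),
    ("poss", "right", "my"),
    ("det", "pistol", "a"),
]
opinion_feats = [
    feat
    for feat in (generalize_opinion(dep, TOY_LEXICON) for dep in dependencies)
    if feat is not None
]
print(opinion_feats)
```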