Radiomics for Disease Characterization: An Outcome Prediction in Cancer Patients

Magnuson, S. J., Peter, T. K., and Smith, M. A. Department of Biostatistics, University of Iowa. July 19, 2018.
Magnuson, Peter, Smith (Wheaton) 7/19/2018

Background Information
Lung cancer is the leading cause of cancer-related mortality in the United States, with 234,030 new cases expected in 2018.
Data: 200 CT scans from University of Iowa Hospital patients.
410 quantitative imaging biomarkers (intensity, shape, texture) were used for the analysis, along with 5 patient demographics (lobe, age, race, gender, packs per year).
45% of cases were benign and 55% were malignant.

Project Objective
To develop a statistical model that predicts the malignant/benign status of each patient's lesion.

Background: Descriptive Statistics
Statistic   Age (years)   Packs Smoked (per year)
Minimum     24            0
Mean        59.88         26.18
Median      60            20
Maximum     90            150

Data Preprocessing: Filtering Variables

Filtering Variables
Because many predictors are highly correlated, we remove noninformative or redundant variables to improve model stability and performance.
[Figure: heat map of predictor correlations]

Filtering Variables: Methods for Data Filtering
1. Correlation: remove predictors until all absolute pairwise correlations are below a specified threshold (0.95).
2. Near-zero variance: remove predictors that are constant or nearly constant.
When these filters were applied to the full data set, 348 predictors were removed.
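
The two filtering rules can be sketched in plain Python. This is a simplified, hypothetical illustration, not the caret functions (`findCorrelation`, `nearZeroVar`) that implement these steps with their own, more detailed rules: the zero-variance step here only drops strictly constant columns, and the correlation step repeatedly takes the most correlated pair and removes whichever member has the larger mean absolute correlation, until every absolute pairwise correlation is below 0.95. The column names and values are made up.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def drop_constant(columns):
    """Zero-variance step: keep only predictors with more than one distinct value."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}


def mean_abs_corr(name, abs_corr):
    """Mean absolute correlation of one predictor with all other remaining predictors."""
    vals = [v for pair, v in abs_corr.items() if name in pair]
    return sum(vals) / len(vals)


def correlation_filter(columns, threshold=0.95):
    """Greedy filter: remove predictors until every pairwise |r| is below threshold."""
    keep = dict(columns)
    while len(keep) > 1:
        abs_corr = {(a, b): abs(pearson(keep[a], keep[b])) for a, b in combinations(keep, 2)}
        (a, b), largest = max(abs_corr.items(), key=lambda item: item[1])
        if largest < threshold:
            break  # all remaining pairs are below the threshold
        # Drop the member of the worst pair that is more correlated with everything else.
        del keep[a if mean_abs_corr(a, abs_corr) >= mean_abs_corr(b, abs_corr) else b]
    return keep


predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],  # nearly a copy of x1 (r > 0.99)
    "x3": [5, 3, 4, 1, 2],
    "x4": [7, 7, 7, 7, 7],     # constant, removed by the zero-variance step
}
kept = correlation_filter(drop_constant(predictors))
print(sorted(kept))  # ['x2', 'x3']
```

The constant-column step runs first because a constant column has no defined correlation.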

Model Selection and Assessment: AUC and ROC

Model Selection and Assessment: AUC
AUC is the area under the receiver operating characteristic (ROC) curve. It estimates the probability that a randomly selected subject with a malignant lesion has a higher model-predicted probability than a randomly selected subject with a benign lesion. The closer the AUC is to 1.0 (100% specificity and 100% sensitivity), the better the predictive performance; the closer it is to 0.50, the closer the model is to random guessing.
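
The probabilistic definition of AUC can be computed directly by comparing every malignant/benign pair. A minimal sketch, assuming labels coded 1 = malignant and 0 = benign; ties count as one half, the usual Mann-Whitney convention. The example scores are invented.

```python
from itertools import product


def auc_pairwise(probs, labels):
    """AUC = P(score of a random malignant case > score of a random benign case).

    labels: 1 = malignant, 0 = benign. Tied scores count as one half.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# 3 malignant and 2 benign cases: 5 of the 6 pairs are ordered correctly.
print(auc_pairwise([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0]))
```

This brute-force version is quadratic in the number of cases, which is fine for 200 scans; rank-based formulas give the same value faster.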

Model Selection and Assessment: AUC Range Scale
AUC range    Rating
0.97-1.00    Excellent
0.92-0.97    Very Good
0.75-0.92    Good
0.50-0.75    Fair

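K-Fold Repeated Cross-Validation
[Diagram: Original Data split into Fold 1, Fold 2, Fold 3]
Cross-validation estimate of the performance metric, AUC:
$$\widehat{\mathrm{AUC}} = \frac{1}{50}\sum_{r=1}^{5}\sum_{k=1}^{10}\mathrm{AUC}_{rk}$$
where $\mathrm{AUC}_{rk}$ is the AUC computed on held-out fold $k$ of repetition $r$.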
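
A sketch of how the 50 held-out AUCs behind the estimate could be generated, assuming 5 repetitions of 10-fold cross-validation (5 × 10 = 50, as read from the summation limits). The function names are hypothetical, and unlike some packaged resampling routines this illustration does not stratify the folds by outcome.

```python
import random
import statistics


def repeated_kfold(n, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV over n rows.

    Each repeat reshuffles the rows and deals them into k disjoint test folds.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(k):
            test = sorted(order[f::k])  # every k-th shuffled row forms fold f
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            yield r, f, train, test


def cv_estimate(fold_aucs):
    """Average of the R*K per-fold AUCs: (1 / (R*K)) * sum_r sum_k AUC_rk."""
    return statistics.fmean(fold_aucs)


splits = list(repeated_kfold(200))
print(len(splits))  # 50
# For each split: fit on train, compute the AUC on test, then average with cv_estimate.
```

Reshuffling before each repeat is what makes the 5 repeats differ; averaging over all 50 folds reduces the variance of the estimate compared with a single 10-fold split.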

Elastic Net: Model Details, Filtering vs. Non-filtering

Model Details: Elastic Net
Logistic regression finds the parameters that maximize the binomial likelihood function, L(p). The parameters can be regularized by subtracting a penalty from the log-likelihood. There are two types of penalties:
1. Ridge
2. LASSO (least absolute shrinkage and selection operator)
The elastic net combines the two types of penalties.

Model Details: Elastic Net
The elastic net maximizes the penalized log-likelihood
$$\log L(p) - \lambda\left[(1-\alpha)\,\frac{1}{2}\sum_{j=1}^{P}\beta_j^2 + \alpha\sum_{j=1}^{P}\lvert\beta_j\rvert\right]$$
λ controls the total amount of penalization.
α is the mixing percentage: when α = 1 it is a pure lasso penalty; when α = 0 it is a pure ridge-regression-like penalty.
This combines the effective regularization of the ridge-type penalty with the feature-selection quality of the LASSO penalty.
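
The penalized objective can be written out term by term. This illustrative sketch (function names hypothetical) only evaluates the objective for given coefficients rather than optimizing it, and leaves the intercept unpenalized, as is conventional. glmnet, for example, additionally divides the log-likelihood by the sample size, which only rescales λ.

```python
import math


def log_likelihood(beta0, beta, X, y):
    """Binomial log-likelihood of a logistic regression with intercept beta0."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-eta))  # predicted probability of malignancy
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total


def elastic_net_penalty(beta, lam, alpha):
    """lam * [(1 - alpha) * 0.5 * sum(beta_j^2) + alpha * sum(|beta_j|)]."""
    ridge = 0.5 * sum(b * b for b in beta)
    lasso = sum(abs(b) for b in beta)
    return lam * ((1 - alpha) * ridge + alpha * lasso)


def penalized_objective(beta0, beta, X, y, lam, alpha):
    """Quantity to maximize: log-likelihood minus the elastic net penalty."""
    return log_likelihood(beta0, beta, X, y) - elastic_net_penalty(beta, lam, alpha)


# Invented data: three cases, two predictors.
X = [[0.2, 1.0], [1.5, -0.3], [-0.7, 0.4]]
y = [0, 1, 0]
print(penalized_objective(-0.5, [1.2, -0.1], X, y, lam=0.1, alpha=0.5))
```

Setting alpha to 1.0 or 0.0 reproduces the pure lasso and pure ridge penalties described above.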

Filtering vs. Non-filtering: Elastic Net

Random Forest: Decision Trees, Model Details, Filtering vs. Non-filtering

A Forest of Decision Trees
We can apply the same concept of decision making to classifying data.

Random Forest Model Details
A random forest takes a majority vote over a collection of decision trees to improve accuracy and reduce prediction variability.
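
The voting step can be sketched as follows. Each tree here is a hypothetical callable returning a class label; a real random forest also grows each tree on a bootstrap sample and considers a random subset of predictors at each split, which this sketch does not show. The vote fraction is returned as a rough class probability.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return (winning label, fraction of trees voting for it).

    Counter.most_common orders equal counts by first occurrence, so a tie goes
    to the label that appears first in tree_predictions.
    """
    counts = Counter(tree_predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tree_predictions)


def forest_predict(trees, case):
    """trees: hypothetical fitted classifiers, each a callable mapping a case to a label."""
    return majority_vote([tree(case) for tree in trees])


# Five trees' votes for one lesion.
print(majority_vote(["malignant", "benign", "malignant", "malignant", "benign"]))
# ('malignant', 0.6)
```

Because each tree gets exactly one vote, all trees contribute equally, in contrast with boosting.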

Filtering vs. Non-filtering: Random Forest

Stochastic Gradient Boosting: Model Details, Filtering vs. Non-filtering

Model Details: Stochastic Gradient Boosting
Boosting is influenced by learning theory: a number of weak classifiers are combined to produce an ensemble.
Basic principles of boosting:
1. The algorithm seeks an additive model of decision trees that minimizes a given loss function.
2. The algorithm is initialized with the best guess of the response.
3. The negative gradient of the loss (the residual) is calculated, and a model is fit to the residuals.
4. The current model is added to the previous model.
5. The procedure continues for a specified number of iterations.
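
The five steps map onto a compact toy implementation for a 0/1 outcome under logistic loss, using one-split trees (stumps) as the weak learners. The "stochastic" part is assumed here to be random row subsampling at each iteration; the learning rate, subsample fraction, and leaf values (plain means of the pseudo-residuals, rather than the line-search or Newton-step leaf estimates typical of production implementations) are simplifications for illustration, and all names are hypothetical.

```python
import math
import random


def fit_stump(X, resid, rows):
    """Least-squares one-split tree on the sampled rows: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in rows}):
            left = [resid[i] for i in rows if X[i][j] <= t]
            right = [resid[i] for i in rows if X[i][j] > t]
            if not right:
                continue  # the largest value leaves nothing on the right
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # all sampled rows identical: fall back to a constant
        mean = sum(resid[i] for i in rows) / len(rows)
        return 0, math.inf, mean, mean
    return best[1:]


def predict_stump(stump, x):
    j, t, lv, rv = stump
    return lv if x[j] <= t else rv


def boost(X, y, n_iter=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Toy stochastic gradient boosting: an additive model of stumps on the log-odds scale."""
    rng = random.Random(seed)
    n = len(y)
    base_rate = sum(y) / n  # assumes both classes are present
    f0 = math.log(base_rate / (1 - base_rate))  # step 2: best constant guess (log-odds)
    F = [f0] * n
    stumps = []
    for _ in range(n_iter):  # step 5: fixed number of iterations
        # step 3: negative gradient of the logistic loss, y - p (the "residual")
        resid = [yi - 1 / (1 + math.exp(-fi)) for yi, fi in zip(y, F)]
        rows = rng.sample(range(n), max(1, int(subsample * n)))  # random subsample of rows
        stump = fit_stump(X, resid, rows)
        stumps.append(stump)
        # step 4: add the shrunken new tree to the current model
        F = [fi + learning_rate * predict_stump(stump, xi) for fi, xi in zip(F, X)]
    return f0, learning_rate, stumps


def predict_proba(model, x):
    f0, learning_rate, stumps = model
    f = f0 + learning_rate * sum(predict_stump(s, x) for s in stumps)
    return 1 / (1 + math.exp(-f))


X = [[float(v)] for v in range(10)]
y = [0] * 5 + [1] * 5
model = boost(X, y, n_iter=50)
print(predict_proba(model, [0.0]), predict_proba(model, [9.0]))
```

Each stump depends on the residuals left by all earlier stumps, which is the dependence between trees contrasted with random forests.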

Model Details: Stochastic Gradient Boosting
Boosting bears similarities to random forests, and the two models often give comparable predictive performance. However, they are constructed differently.
In a random forest, all trees are built independently, each tree is grown to maximum depth, and all trees contribute equally.
In boosting, each tree depends on the previous trees, the trees have minimal depth, and they contribute unequally to the model.

Filtering vs. Non-filtering: Stochastic Gradient Boosting

Model Comparison

Index
A method to identify a probability cut point that optimizes sensitivity and specificity with respect to the prevalence rate and the misclassification cost. For each candidate cut point,
$$\text{index} = (1-\text{sens})^2 + r\,(1-\text{spec})^2, \qquad r = \frac{1-p}{\text{cost}\cdot p},$$
and the chosen cut point is the one that minimizes the index. Here p = prevalence = 0.50 and cost = (cost of a false negative)/(cost of a false positive) = 4.0, so r = 0.25.

Index Table: Stochastic Gradient Boosting
Cut point   Index (mean)   Sensitivity (mean)   Specificity (mean)
0.50        0.12           0.70                 0.78
0.45        0.09           0.78                 0.71
0.40        0.07           0.86                 0.63
0.35        0.06           0.90                 0.59
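
Plugging the table's mean sensitivity and specificity into the index formula with p = 0.50 and cost = 4.0 (so r = 0.25) gives a quick check of which cut point the index favors. The helper names are hypothetical. Because the index is convex, the index of averaged sensitivity and specificity is smaller than the average of per-resample indices, so these values come out below the table's mean index, but the ranking agrees: 0.35 has the lowest index.

```python
def index_weight(prevalence, cost):
    """r = (1 - p) / (cost * p), with cost = cost of a false negative / cost of a false positive."""
    return (1 - prevalence) / (cost * prevalence)


def cut_point_index(sensitivity, specificity, r):
    """(1 - sens)^2 + r * (1 - spec)^2; smaller is better."""
    return (1 - sensitivity) ** 2 + r * (1 - specificity) ** 2


def best_cut_point(candidates, prevalence=0.50, cost=4.0):
    """candidates: (cut_point, sensitivity, specificity) tuples; returns the one minimizing the index."""
    r = index_weight(prevalence, cost)
    return min(candidates, key=lambda c: cut_point_index(c[1], c[2], r))


# Mean sensitivity and specificity from the index table.
table = [(0.50, 0.70, 0.78), (0.45, 0.78, 0.71), (0.40, 0.86, 0.63), (0.35, 0.90, 0.59)]
r = index_weight(0.50, 4.0)  # 0.25
for cut, sens, spec in table:
    print(cut, round(cut_point_index(sens, spec, r), 4))
print(best_cut_point(table))  # (0.35, 0.9, 0.59)
```

With cost = 4.0, a missed malignancy is weighted four times as heavily as a false alarm, so r < 1 down-weights lost specificity and the index favors lower, more sensitive cut points.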

Conclusions: Main Takeaways, Future Work

Main Takeaways and Future Work
The stochastic gradient boosting model had the best performance, given its high AUC and relatively low variability.
Filtering noticeably improved the random forest models.
The logistic regression using only the demographic predictors performed the best; however, using the biomarkers alone did improve predictive performance.
We plan to explore the index values further and to explore deep neural networks.

Acknowledgments
Dr. Brian J. Smith, Professor, Department of Biostatistics, University of Iowa
National Heart, Lung, and Blood Institute (NHLBI), grant #HL131467

Thank You! *Waits for Audience to Clap*

Variable Importance: Elastic Net

Variable Importance: Random Forest

Variable Importance: Logistic Regression