Efficient Model Checking of Fault-Tolerant Distributed Protocols
… (1), Johannes Kinder (1, 2), Marco Serafini (3), and Neeraj Suri (1)
(1) Technische Universität Darmstadt, Germany; (2) EPFL, Lausanne, Switzerland; (3) Yahoo! Research, Barcelona, Spain
28th June 2011
Tractable Model Checking
Message-passing is widely used (cloud, actors, etc.).
Complex designs: concurrency, faults, etc.
Existing tools don't scale: MoDist, Mace, CrystalBall, etc.
State space intractability → state space reductions.
Our Proposed State Reductions: Message-Passing Systems, Fault-Tolerant Systems
Single-message steps: receive & store m1; receive & store m2; ...; receive & store mk & TAKE ACTION.
QUORUM TRANSITION: receive & store m1, ..., mk & TAKE ACTION in one step.
Contribution 1: EVALUATION of MC efficiency with quorum transitions.
PARTIAL-ORDER REDUCTION: the interleavings t1 t2 and t2 t1 are reduced to one representative order, t1 t2.
Contribution 2: Propose & evaluate SYSTEMATIC transition refinements for message-passing.
Talk Structure
PART I: QUORUM TRANSITIONS
PART II: TRANSITION REFINEMENT
PART III: TOOL SUPPORT
Quorum Transitions in Practice
Paxos, crash-tolerant consensus (Lamport 2001): "If the proposer receives a response to its prepare requests (...) from a majority of acceptors, ..."
ABD, crash-tolerant storage (Attiya et al. 1995): "..., at least (n + 1)/2 processors received the message M."
Byzantine-tolerant multicast (Reiter 1994): "When a process receives these (2|V^x| + 1)/3 echoes ..."
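The thresholds quoted above can be computed directly. This is a minimal sketch, assuming that fractional thresholds are rounded up to whole message counts and letting n stand in for the view size |V^x| in the multicast case; the helper names are hypothetical. It also computes the overlap between two quorums, which is why such thresholds are safe: two majorities of n processes always share at least one process.

```python
def majority_quorum(n: int) -> int:
    """Smallest integer >= (n + 1) / 2, i.e. a strict majority of n processes."""
    return (n + 2) // 2


def echo_quorum(n: int) -> int:
    """Smallest integer >= (2n + 1) / 3 (assumption: n stands in for |V^x|)."""
    return (2 * n + 3) // 3


def min_overlap(n: int, q: int) -> int:
    """Fewest processes that any two quorums of size q out of n must share."""
    return max(0, 2 * q - n)


if __name__ == "__main__":
    for n in (4, 5, 6):
        maj, echo = majority_quorum(n), echo_quorum(n)
        print(f"n={n}: majority={maj} (overlap {min_overlap(n, maj)}), "
              f"echo={echo} (overlap {min_overlap(n, echo)})")
```

Integer arithmetic is used instead of math.ceil on floats so that the rounding is exact for every n.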
Quorum Transition: Definition
An atomic, process-local event that consumes a set X of messages, changes the local state of the process, and sends zero or more messages.
Quorum transition via single-message transitions t_L: execute s_i --t_L(m_i)--> s_{i+1} repeatedly, i.e., proceed from state s_i to s_{i+1} by consuming a single message m_i. These executions are concurrent with other transitions t_C.
[Figure: the quorum transition "receive & store X = {m1, ..., mk} & TAKE ACTION" versus the sequence t_L(m1): receive & store m1; t_L(m2): receive & store m2; ...; t_L(mk): receive & store mk & TAKE ACTION, with other transitions t_C interleaved between the steps.]
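Why single-message transitions inflate the state space can be seen in a toy model (our assumption, not one of the benchmarks below): one process must collect k messages that are already in its channel, while an unrelated process takes c_max independent steps t_C. With single-message transitions t_L, every subset of stored messages is a distinct intermediate state; the quorum transition consumes X atomically. The names explore and make_model are hypothetical.

```python
from collections import deque


def explore(initial, successors):
    """Breadth-first search; returns the set of reachable states."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def make_model(k, c_max, quorum):
    """State = (stored messages, action taken?, counter of the concurrent process)."""
    msgs = frozenset(range(k))  # hypothetical messages m_0..m_{k-1}, already in the channel

    def successors(state):
        stored, acted, c = state
        if not acted:
            if quorum:
                # Quorum transition: consume all of X and take the action atomically.
                yield (msgs, True, c)
            else:
                # Single-message transitions t_L(m): store one message;
                # storing the last one also takes the action.
                for m in msgs - stored:
                    new = stored | {m}
                    yield (new, new == msgs, c)
        if c < c_max:
            # t_C: an unrelated concurrent transition.
            yield (stored, acted, c + 1)

    return (frozenset(), False, 0), successors


for quorum in (False, True):
    init, succ = make_model(k=3, c_max=2, quorum=quorum)
    print("quorum" if quorum else "single-message", len(explore(init, succ)))
```

In this toy model the single-message variant reaches 2^k × (c_max + 1) states and the quorum variant 2 × (c_max + 1), i.e. 24 versus 6 for k = 3 and c_max = 2. It only illustrates the trend, not the measured numbers.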
Evaluation: Model Checking with Quorum Transitions
Protocol (# processes)  Property     Result    Single-message trans. Quorum trans.
                                               States      Time      States      Time
Paxos (6)               Consensus    Verified  6,247,530   23h       2,822,764   9h37m
Faulty Paxos (6)        Consensus    Bug       524         12s       279         10s
Echo Mcast (5)          Agreement    Verified  9,222       2m22s     652         12s
Echo Mcast (4)          Agreement    Verified  9,986       1m55s     2,787       31s
Echo Mcast (6)          Wrong agr.   Bug       66          9s        48          6s
Regular storage (5)     Regularity   Verified  185,711     33m49s    20,039      3m4s
Regular storage (6)     Wrong reg.   Bug       72,937      12m37s    41,331      6m46s
Space/time reductions: up to 89%/91%.
Relation to Existing Work
Specification of message-passing systems (Agha et al. 1997, Attiya et al. 2004): quorum transitions are explicitly defined or expressible, but their state space implications were not studied or were shown to be inefficient (Bokor et al. 2010).
Model checking support for message-passing systems (SPIN, Basset, etc.): quorum transitions are not directly supported.
Addressed here: quorum transitions in practice.
PART II: TRANSITION REFINEMENT
Partial-Order Reduction (Clarke et al. 2000)
Intuition: a diamond pattern formed by the independent transitions t1 and t2. Explore one representative execution order, say, s1 --t1--> s2 --t2--> s4. Partial order: t1 and t2 are unordered in every run.
Property preservation: all deadlock states are visited; preservation of other properties can be included.
In practice: the diamond is detected by "guessing the future", i.e., by over-approximating the successors of s1. Reduction: only t1 is executed in s1.
[Figure: the diamond s1 --t1--> s2 --t2--> s4 and s1 --t2--> s3 --t1--> s4, reduced to the path through s2.]
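The diamond and its reduction can be reproduced on a toy system with a hypothetical state encoding (x, y). Here t1 writes only x, t2 writes only y, and neither disables the other, so all transitions are pairwise independent; under that assumption any single enabled transition forms a persistent set, and exploring just one per state still reaches the only deadlock state s4.

```python
from collections import deque


def enabled(state):
    """t1 is enabled while x == 0, t2 while y == 0."""
    x, y = state
    ts = []
    if x == 0:
        ts.append("t1")
    if y == 0:
        ts.append("t2")
    return ts


def fire(t, state):
    """t1 sets x, t2 sets y: they touch disjoint variables, hence are independent."""
    x, y = state
    return (1, y) if t == "t1" else (x, 1)


def explore(initial, por):
    seen, queue = {initial}, deque([initial])
    while queue:
        s = queue.popleft()
        ts = enabled(s)
        if por and ts:
            # All transitions are independent here, so one enabled
            # transition suffices (it is a persistent set).
            ts = ts[:1]
        for t in ts:
            nxt = fire(t, s)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


s1 = (0, 0)
assert fire("t2", fire("t1", s1)) == fire("t1", fire("t2", s1))  # the diamond closes in s4
print(len(explore(s1, por=False)), len(explore(s1, por=True)))  # prints: 4 3
```

Full exploration visits s1, s2, s3, s4; the reduced one skips s3 but still reaches the deadlock state (1, 1).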
POR Is Inefficient with Coarse Transitions
Successor states are coarsely over-approximated.
If t1 can enable t' (according to the over-approximation), there is no reduction in s1.
If t1 cannot enable t', s1 can be reduced.
[Figure: successors of s1 under t1 and t2, with t' enabled in some successor states.]
Transition Refinement: Definition (generalized from Godefroid 1996)
The refinement of a transition t ⊆ S × S is a set of transitions t1, ..., tk such that ti ⊆ t for every i and t1 ∪ ... ∪ tk = t.
All properties are preserved (refinement is a "re-labeling" of the state graph).
Transition refinement improves POR. HOW?
[Figure: one state graph over s1, ..., s4 labeled three ways: coarse, refined (t1, t2), and over-refined (t1, ..., t4).]
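Treating a transition as a set of (source, target) pairs, the refinement condition can be checked mechanically. The states and transitions below are hypothetical; the sketch shows that a valid refinement leaves the unlabeled state graph unchanged (only the labels differ), while dropping an edge violates the definition.

```python
def is_refinement(t, parts):
    """True iff every part is contained in t and the parts together cover t."""
    return all(p <= t for p in parts) and set().union(*parts) == t


def state_graph(transitions):
    """Unlabeled state graph: all (source, target) edges of all transitions."""
    return set().union(*transitions)


# Hypothetical coarse transitions over states s1..s4.
t = {("s1", "s2"), ("s3", "s4")}
t_prime = {("s1", "s3"), ("s2", "s4")}

# Refine t into one transition per edge.
t_a, t_b = {("s1", "s2")}, {("s3", "s4")}

print(is_refinement(t, [t_a, t_b]))  # True
print(is_refinement(t, [t_a]))       # False: the edge (s3, s4) is lost
print(state_graph([t, t_prime]) == state_graph([t_a, t_b, t_prime]))  # True
```

An over-refinement (for example, splitting t_prime into its two edges as well) also satisfies is_refinement; the definition only guarantees property preservation, not that a refinement helps POR.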
Our 1st Proposed Refinement: Reply Split
A reply transition, reply, sends acknowledgements: Process 3.reply receives m1 from Process 1 and m2 from Process 2 and answers with "ACK1" and "ACK2", respectively.
Coarse over-approximation: m1 can enable ACK2.
Reply split: reply is split into reply-1 and reply-2, where Process 3.reply-i consumes mi and sends "ACKi".
Better over-approximation: m1 cannot enable ACK2.
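The gain from the split can be sketched as a static "can enable" analysis. Each transition is summarized by the senders whose messages can trigger it and the processes it may acknowledge; the Transition record, reply_split, and acks_enabled_by are hypothetical illustrations, not MP-Basset's API. The coarse summary of reply must assume that m1 may lead to ACK2, while the split summaries rule this out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    name: str
    consumes_from: frozenset  # senders whose messages can trigger this transition
    acks_to: frozenset        # processes this transition may send an ACK to


def reply_split(t):
    """One transition per sender; each acknowledges only the sender it consumed from."""
    return [Transition(f"{t.name}-{i}", frozenset({i}), frozenset({i}) & t.acks_to)
            for i in sorted(t.consumes_from)]


def acks_enabled_by(sender, transitions):
    """Over-approximation: ACKs that a message from `sender` may cause."""
    acks = set()
    for t in transitions:
        if sender in t.consumes_from:
            acks |= t.acks_to
    return acks


coarse = [Transition("reply", frozenset({1, 2}), frozenset({1, 2}))]  # Process 3.reply
split = reply_split(coarse[0])                                        # reply-1, reply-2

print(sorted(acks_enabled_by(1, coarse)))  # [1, 2]: m1 "can enable" ACK2
print(sorted(acks_enabled_by(1, split)))   # [1]: m1 cannot enable ACK2
```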
Our 2nd Proposed Refinement: Quorum Split
A split transition receives messages from a fixed quorum.
Example: splitting the majority function maj with 4 processes. Process 4.maj, which receives m1 and m2 from two of Processes 1-3, is split into Process 4.maj-1, maj-2, and maj-3, each enabled by messages from one fixed pair of senders.
Coarse over-approximation: for all i, mi can enable maj.
Better over-approximation: e.g., m1 is necessary to enable maj-1.
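A quorum split can be generated systematically by enumerating the fixed quorums. This sketch assumes, as the figure suggests, that Process 4.maj fires once it holds messages from any two of Processes 1-3; quorum_split, enabled, and needed_by are hypothetical helper names.

```python
from itertools import combinations


def quorum_split(senders, size):
    """One split transition per fixed quorum: split name -> required senders."""
    return {f"maj-{i}": frozenset(q)
            for i, q in enumerate(combinations(sorted(senders), size), start=1)}


def enabled(splits, received):
    """A split transition is enabled once messages from all of its quorum arrived."""
    return sorted(name for name, q in splits.items() if q <= received)


def needed_by(splits, sender):
    """Refined over-approximation: the splits that require this sender's message."""
    return sorted(name for name, q in splits.items() if sender in q)


splits = quorum_split({1, 2, 3}, 2)  # maj-1: {1, 2}, maj-2: {1, 3}, maj-3: {2, 3}
print(enabled(splits, {1, 2}))       # ['maj-1']
print(needed_by(splits, 1))          # ['maj-1', 'maj-2']
```

Under the coarse view, a message from any sender can enable maj; after the split, m3 from Process 3 is irrelevant to maj-1, which gives POR more room to reduce.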
Reply and Quorum (Combined) Split Results
Protocol (# processes)  Property     Result    Coarse trans.          Combined split
                                               States       Time      States      Time
Paxos (6)               Consensus    Verified  2,822,764    9h37m     548,061     3h30m
Faulty Paxos (6)        Consensus    Bug       279          10s       105         8s
Echo Mcast (5)          Agreement    Verified  652          12s       232         12s
Echo Mcast (4)          Agreement    Verified  2,787        31s       1,165       18s
Echo Mcast (6)          Agreement    Verified  >12 million  >48h      7,087,193   42h21m
Echo Mcast (6)          Wrong agr.   Bug       48           6s        48          9s
Regular storage (5)     Regularity   Verified  20,039       3m4s      18,451      4m31s
Regular storage (6)     Wrong reg.   Bug       41,331       6m46s     6,987       2m34s
State/time reductions: up to 81%/64% (in addition to the PART I reductions).
Caveats: splits can be inefficient, e.g., buggy multicast (6). POR with split transitions increases time overhead, e.g., storage (5).
Relation to Existing Work
User-defined transitions: the input of the POR algorithm is the set of all (user-defined) transitions. Static POR (Clarke et al. 2000); dynamic POR (Godefroid et al. 2005); other POR implementations/semantics, e.g., Kahlon et al. 2009.
Operation refinement (Godefroid 1996): no systematic refinement schemes; no evaluation.
PART III: TOOL SUPPORT: MP-Basset
The MP-Basset Model Checker
Structure: MP-Basset > Basset > Java PathFinder (each tool builds on the next).
MP-Basset. Verifier: stateful, POR-powered model checker for safety properties. Language: Java restriction for message-passing systems with quorum transitions. http://www.deeds.informatik.tu-darmstadt.de/peter/mpbasset/
Basset (Lauterburg et al. 2009): stateful model checker for safety properties; Java restriction with single-message transitions only.
Java PathFinder (JPF): stateful model checker for Java programs.
Possible Extensions / Future Work
Combination with symmetry reduction. Liveness model checking. Unmodified protocol implementations.
Summary: Practical MC of protocols with quorum transitions
1. Evaluated the use of quorum transitions in practical verification.
2. Developed & evaluated transition refinement strategies for message-passing systems.
3. Implemented 1 + 2, available as a public tool.