Smith Waterman Algorithm - Performance Analysis
Armin Bundle
Department of Computer Science
University of Erlangen
Seminar mucosim SS 2016
Outline
1. The Smith Waterman Algorithm
   The concept
2. Profiling and data structure
3. The code
4. Likwid performance measurement
5. Problems and Outlook
The Smith Waterman Algorithm
The concept (1)
The Smith-Waterman algorithm performs local sequence alignment, i.e. it finds similar regions in, e.g., DNA or protein sequences. A sequence alignment is a sequence of edit operations.
The Smith Waterman Algorithm
The concept (2)
- It is a variation of the Needleman-Wunsch algorithm, which compares two sequences and computes a global similarity score; Smith-Waterman instead scores the best local alignment
- Application area: searching for genes whose sequences are similar to those of well-known genes
- The algorithm uses dynamic programming
- The complexity is quadratic: O(mn) time and memory for sequences of lengths m and n
The Smith Waterman Algorithm
First step: matrix initialisation
The first row and the first column of the score matrix are set to 0: F(i, 0) = F(0, j) = 0.
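A minimal C sketch of the initialisation step, assuming the (m+1) x (n+1) matrix is stored as one contiguous row-major block of ints with n+1 columns. The helper name sw_init_matrix is hypothetical; this is not the code shown on the later slides.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate the (m+1) x (n+1) score matrix F as one
   contiguous row-major block and set its first row and column to 0.
   Returns NULL if the allocation fails; the caller releases it with free(). */
int *sw_init_matrix(size_t m, size_t n)
{
    size_t cols = n + 1;
    int *F = malloc((m + 1) * cols * sizeof *F);
    if (F == NULL)
        return NULL;
    for (size_t j = 0; j <= n; ++j)
        F[j] = 0;          /* row 0:    F(0, j) = 0 */
    for (size_t i = 0; i <= m; ++i)
        F[i * cols] = 0;   /* column 0: F(i, 0) = 0 */
    return F;
}
```

The interior cells are left uninitialised because the next step overwrites every one of them.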
The Smith Waterman Algorithm
Example input data
Gap penalty f = -1, MatchScore m = 2, MismatchScore mm = -1
Scoring function:
$$w(x, y) = \begin{cases} m, & x = y \\ mm, & \text{otherwise} \end{cases}$$
Evaluate the neighbours:
$$F(i, j) = \max \begin{cases} 0 \\ F(i-1, j-1) + w(x_i, y_j) \\ F(i-1, j) + f \\ F(i, j-1) + f \end{cases}$$
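The recurrence for a single cell translates directly into C. This is a sketch with the example parameters hard-coded as constants; the names GAP, MATCH, MISMATCH, w and sw_cell are hypothetical, and xi, yj stand for the characters x_i and y_j.

```c
/* Example parameters: gap penalty f, MatchScore m, MismatchScore mm. */
enum { GAP = -1, MATCH = 2, MISMATCH = -1 };

/* Scoring function w(x, y): MatchScore if x == y, MismatchScore otherwise. */
static int w(char x, char y)
{
    return x == y ? MATCH : MISMATCH;
}

static int max2(int a, int b) { return a > b ? a : b; }

/* F(i, j) from its three neighbours and the characters x_i, y_j.
   The 0 term lets a new local alignment start at this cell. */
int sw_cell(int diag, int up, int left, char xi, char yj)
{
    int best = 0;
    best = max2(best, diag + w(xi, yj)); /* F(i-1, j-1) + w(x_i, y_j) */
    best = max2(best, up + GAP);         /* F(i-1, j) + f */
    best = max2(best, left + GAP);       /* F(i, j-1) + f */
    return best;
}
```

For example, a match next to an all-zero neighbourhood gives sw_cell(0, 0, 0, 'A', 'A') == 2, while a mismatch there is clamped to 0.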
The Smith Waterman Algorithm
Second step: calculating the local alignment scores in the matrix
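The fill step applies the recurrence row by row and remembers the highest score, because a local alignment ends at the maximum cell rather than at the bottom-right corner. A minimal C sketch, assuming the row-major layout with n+1 columns and an already zeroed first row and column; sw_fill is a hypothetical name, not the presented code.

```c
#include <string.h>

enum { GAP = -1, MATCH = 2, MISMATCH = -1 };

static int max2(int a, int b) { return a > b ? a : b; }

/* Fill F for sequences a (length m) and b (length n), row by row.
   F is row-major with n+1 columns; row 0 and column 0 must already be 0.
   Returns the best local score and stores the position of its first
   occurrence in *bi, *bj (0, 0 if every score is 0). */
int sw_fill(const char *a, const char *b, int *F, size_t *bi, size_t *bj)
{
    size_t m = strlen(a), n = strlen(b), cols = n + 1;
    int best = 0;
    *bi = *bj = 0;
    for (size_t i = 1; i <= m; ++i) {
        for (size_t j = 1; j <= n; ++j) {
            int diag = F[(i - 1) * cols + (j - 1)]
                       + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            int up   = F[(i - 1) * cols + j] + GAP;
            int left = F[i * cols + (j - 1)] + GAP;
            int v = max2(0, max2(diag, max2(up, left)));
            F[i * cols + j] = v;
            if (v > best) {          /* track the traceback start cell */
                best = v;
                *bi = i;
                *bj = j;
            }
        }
    }
    return best;
}
```

For "TACG" against "ACG" the best score is 6, reached at (4, 3) by the common substring ACG.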
The Smith Waterman Algorithm
Third step: traceback through the matrix
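The traceback starts at the cell with the best score and follows the move that produced each value until it reaches a cell with score 0. A C sketch, assuming a score matrix already filled by the recurrence with the example parameters (row-major, n+1 columns); sw_traceback and its output buffers are hypothetical, and ties are resolved in favour of the diagonal move.

```c
#include <string.h>

enum { GAP = -1, MATCH = 2, MISMATCH = -1 };

/* Trace back from cell (i, j) of the filled matrix F until a 0 score is
   reached. The aligned fragments are written to out_a and out_b (each needs
   room for m + n + 1 chars), with '-' marking a gap. */
void sw_traceback(const char *a, const char *b, const int *F,
                  size_t i, size_t j, char *out_a, char *out_b)
{
    size_t cols = strlen(b) + 1, k = 0;
    while (i > 0 && j > 0 && F[i * cols + j] > 0) {
        int here = F[i * cols + j];
        int w = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
        if (here == F[(i - 1) * cols + (j - 1)] + w) {    /* diagonal: (mis)match */
            out_a[k] = a[i - 1]; out_b[k] = b[j - 1]; --i; --j;
        } else if (here == F[(i - 1) * cols + j] + GAP) { /* gap in b */
            out_a[k] = a[i - 1]; out_b[k] = '-'; --i;
        } else {                                          /* gap in a */
            out_a[k] = '-'; out_b[k] = b[j - 1]; --j;
        }
        ++k;
    }
    out_a[k] = out_b[k] = '\0';
    /* The fragments were collected back to front; reverse them in place. */
    for (size_t l = 0, r = k; l + 1 < r; ++l, --r) {
        char t = out_a[l]; out_a[l] = out_a[r - 1]; out_a[r - 1] = t;
        t = out_b[l]; out_b[l] = out_b[r - 1]; out_b[r - 1] = t;
    }
}
```

For "ACG" against "AG" the best cell is (3, 2) with score 3, and the traceback yields the alignment ACG / A-G.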
Profiling and data structure
Profiling
Profiling and data structure
Data structure
Input value: scale (reference value 41)
Arrays, with sequence size = 1 << (scale / 2):
- main sequence & match sequence: 1 MB (scale 41)
- goodscores & scores: 4.8 kB
- goodendsi, goodendsj, index & best: 2.4 kB
- weights: 0.6 kB
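The sequence size formula can be checked with a few lines of C. This sketch assumes one byte (char) per residue, which the slide does not state, and the function names are hypothetical.

```c
#include <stddef.h>

/* Residues per sequence for a given scale, following the slide's formula
   size = 1 << (scale / 2); note the integer division. */
size_t sequence_length(unsigned scale)
{
    return (size_t)1 << (scale / 2);
}

/* Bytes for one sequence array, assuming one char per residue. */
size_t sequence_bytes(unsigned scale)
{
    return sequence_length(scale) * sizeof(char);
}
```

For the reference value 41, integer division gives 41 / 2 = 20 and 1 << 20 = 1,048,576 bytes, i.e. 1 MB, which matches the figure above if it is per sequence array. Because of the integer division, scale 40 yields the same size.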
The code
The code (1)
The code
The code (2)
The code
The code (3)
Likwid performance measurement
Likwid performance measurement - values
Metric                    | Value
Branch misprediction rate | 7.8e-6
Load to store ratio       | 5.5
CPI                       | 0.42
L2 bandwidth [MBytes/s]   | 5702
L2 data volume [GBytes]   | 606.2
L2 miss rate              | 0.0084
L3 bandwidth [MBytes/s]   | 5180
L3 data volume [GBytes]   | 550.0
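As a consistency check, assuming that likwid's data volume equals bandwidth times the runtime of the measured region, both cache levels imply the same measurement time of roughly 106 s. This is why the data volumes are read as amounts in GBytes rather than rates in GBytes/s:

```latex
$$t_{L2} = \frac{606.2\ \text{GBytes}}{5702\ \text{MBytes/s}} \approx 106.3\ \text{s},
\qquad
t_{L3} = \frac{550.0\ \text{GBytes}}{5180\ \text{MBytes/s}} \approx 106.2\ \text{s}$$
```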
Likwid performance measurement
Runtime
Problems and Outlook
Problems & Outlook
Problems
- Running the code with MPI
- Getting access to a node for memory measurements
- Applying the roofline model
Outlook
- Change the data structure or the order of the sequence array accesses
- Use MPI to see how much the performance increases
- Use the SIMD capabilities of CPUs
- Port the code to GPUs
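One common way to change the access order with SIMD or parallel execution in mind is to traverse the matrix along anti-diagonals (a "wavefront"): all cells with i + j = d depend only on diagonals d-1 and d-2, so they can be computed independently. The sketch below is sequential, uses the example scoring parameters and a row-major layout, and is not taken from the presented code; sw_fill_wavefront is a hypothetical name.

```c
#include <string.h>

enum { GAP = -1, MATCH = 2, MISMATCH = -1 };

static int max2(int a, int b) { return a > b ? a : b; }

/* Fill F in anti-diagonal (wavefront) order instead of row by row.
   Cells with i + j == d depend only on diagonals d-1 and d-2, so the inner
   loop has no loop-carried dependency; its iterations could be spread over
   SIMD lanes or threads, which this sequential sketch does not do.
   F is row-major with n+1 columns; row 0 and column 0 must already be 0.
   Returns the best local score. */
int sw_fill_wavefront(const char *a, const char *b, int *F)
{
    size_t m = strlen(a), n = strlen(b), cols = n + 1;
    int best = 0;
    for (size_t d = 2; d <= m + n; ++d) {
        size_t i_lo = d > n ? d - n : 1;      /* keeps j = d - i <= n */
        size_t i_hi = d - 1 < m ? d - 1 : m;  /* keeps j >= 1 and i <= m */
        for (size_t i = i_lo; i <= i_hi; ++i) {
            size_t j = d - i;
            int diag = F[(i - 1) * cols + (j - 1)]
                       + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            int up   = F[(i - 1) * cols + j] + GAP;
            int left = F[i * cols + (j - 1)] + GAP;
            int v = max2(0, max2(diag, max2(up, left)));
            F[i * cols + j] = v;
            best = max2(best, v);
        }
    }
    return best;
}
```

The wavefront order produces the same matrix as the row-by-row fill, but on a row-major layout consecutive cells of one anti-diagonal are n elements apart in memory, so exploiting it efficiently may also require the data structure change listed above.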
Appendix
Sources
https://pressbit.wordpress.com/2014/03/07/lokalessequenzalignment-mit-dem-smith-waterman-algorithmus-in-c (Mar 7, 2014)