Performance vs Productivity
When designing HPC applications, the trade-off between developer productivity and application performance is a major concern. Developer-friendly HPC programming models provide abstractions that shorten application development turnaround time and reduce the likelihood of programmer error; however, they may not expose fine-grained control over performance-critical factors. Conversely, a programming model that delivers high performance may be tedious for interdisciplinary researchers, leading to longer turnaround times and a higher likelihood of semantic errors.

Graph algorithms are widely used in scientific computing and data analytics, for example in biological and social network analysis, grouping similar proteins, and modeling transportation and communication networks. In recent years, the rapid growth of social, information, and genomic data has created demand for systems with large memory capacity and substantial computational power. With the advancement of high-throughput parallel architectures such as multicore CPUs and many-core GPUs, the gap between the observed and expected performance of graph-based applications is widening. Researchers in this domain have adopted standard programming models and interfaces for shared-memory parallelism, distributed-memory parallelism, or a combination of both. However, designing abstractions for parallel execution, handling data communication, porting code across parallel architectures (e.g., CPU to GPU), and managing memory footprint often add to the programmer's difficulty. Given the irregular memory access behavior of graph algorithms, studying the performance of graph kernels built on different programming models is a good indicator of how those models compare. Evaluating key graph kernels on exascale computing platforms is therefore useful to practitioners and researchers of both exascale systems and graph algorithms.
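To make the irregular-access point concrete, the sketch below (a hypothetical illustration, not code from any of the kernels evaluated here) traverses a tiny graph stored in Compressed Sparse Row (CSR) form. The read of values[col_idx[e]] is data-dependent and scattered across memory, which is the access pattern that makes graph kernels hard for caches and prefetchers regardless of the programming model used.

```cpp
// Minimal CSR traversal sketch (illustrative only).
// For each vertex u, neighbor IDs live in col_idx[row_ptr[u] .. row_ptr[u+1]),
// and values[] is read through those IDs, so consecutive loads touch
// scattered addresses -- the irregular access typical of graph kernels.
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Tiny 4-vertex graph in CSR form.
    std::vector<int64_t> row_ptr = {0, 2, 3, 5, 6};       // neighbor-list offsets
    std::vector<int64_t> col_idx = {1, 3, 2, 0, 3, 1};    // neighbor vertex IDs
    std::vector<double>  values  = {1.0, 2.0, 3.0, 4.0};  // one value per vertex

    std::vector<double> sums(values.size(), 0.0);

    // The outer loop parallelizes trivially (e.g., with an OpenMP parallel-for),
    // but the indirect values[col_idx[e]] loads defeat caching and prefetching.
    for (size_t u = 0; u + 1 < row_ptr.size(); ++u) {
        for (int64_t e = row_ptr[u]; e < row_ptr[u + 1]; ++e) {
            sums[u] += values[col_idx[e]];   // data-dependent, scattered reads
        }
    }

    for (size_t u = 0; u < sums.size(); ++u)
        std::printf("vertex %zu: neighbor sum = %.1f\n", u, sums[u]);
    return 0;
}
```

A productivity-oriented model would hide the loop structure and data placement behind an abstraction, while a performance-oriented model would expose them (and the associated tuning burden) to the programmer; the memory-access pattern itself remains irregular either way.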