Development of high performance PC clusters computing system for engineering applications

number: 
1643
English
Degree: 
Author: 
Mahmoud Shuker Mahmoud
Supervisor: 
Dr. Raad S. Fyath
Dr. Imad H. Al-Hussaini
year: 
2007

Abstract: Supercomputing technologies have long been limited to the wealthiest nations because of the extremely high cost of setting up and maintaining supercomputers. Recently, the availability of high-speed networks and increasingly powerful commodity microprocessors has made clusters, or networks of computers, an attractive platform for cost-effective parallel computing. High performance computing based on clusters therefore allows scientists and engineers to tackle very complex problems using a low-cost system with supercomputing capabilities. The first part of this thesis focuses on the design, implementation, and performance evaluation of a cost-effective PC-based cluster system. The concept is demonstrated by the installation of two clusters built entirely from commodity personal computer components. These clusters use 13 PCs with Intel Pentium III microprocessors (one as the master node and the others as compute nodes), switched Fast Ethernet as the communication fabric, and a free distribution of Linux as the operating system. To compensate for the limited number of processors available in low-cost cluster systems used to parallelize complex applications, a library of parallel routines based on Single Instruction Multiple Data (SIMD) techniques was developed. To preserve the low-cost concept, the library routines were built on the MultiMedia eXtensions (MMX) and Streaming SIMD Extensions (SSE) technologies available in most general-purpose Intel processors. These techniques provide parallelism within processor registers. To ensure high performance, the library routines were coded in inline assembly. Finally, most of the library routines were written to be general-purpose so that they can be reused in the development of future applications.
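As an illustration of the register-level parallelism described above, the following is a minimal sketch of an SSE routine that processes four single-precision values per instruction. It is not taken from the thesis library: the function name saxpy_simd is hypothetical, and the sketch uses compiler intrinsics from xmmintrin.h rather than the inline assembly the thesis reports, with a scalar fallback for builds without SSE.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: four packed floats per 128-bit register */
#endif

/* Hypothetical routine (not from the thesis): y[i] = alpha * x[i] + y[i]. */
void saxpy_simd(float alpha, const float *x, float *y, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Broadcast alpha into all four lanes of one register. */
    const __m128 valpha = _mm_set1_ps(alpha);

    /* Main loop: four elements per iteration, unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(valpha, vx), vy));
    }
#endif

    /* Scalar tail (or the whole array when SSE is unavailable). */
    for (; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

In a cluster setting, each compute node would call such a routine on its own share of the data, so the node-level and register-level parallelism multiply.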
A cluster that supports multiple levels of parallelism was achieved by combining coarse-grain parallelism, obtained by distributing data among the cluster nodes, with fine-grain parallelism, obtained through the proposed SIMD-based library routines. This combination delivered higher performance at a good price/performance ratio. In the second part of this thesis, four applications were therefore modified to expose multiple levels of parallelism, and their performance was evaluated. These applications are low-level image processing, vector quantization, an H.261 video encoder, and a steganalysis system. With multilevel parallelism, most of the developed applications exhibited superlinear speedup, exceeding the conventional upper bound on speedup in a parallel system, which equals the number of processors available in that system. For example, the speedup reaches approximately 230 in one of the parallelized applications on a 13-node cluster that supports multiple levels of parallelism.
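The speedup figure can be put in context with the standard definitions of speedup and parallel efficiency. The symbols below are not from the abstract: $T_1$ is assumed to be the run time of the sequential program and $T_p$ the run time on $p$ nodes.

$$
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
$$

Superlinear speedup means $S(p) > p$, that is, $E(p) > 1$. For the reported case, $p = 13$ and $S \approx 230$ give

$$
E(13) \approx \frac{230}{13} \approx 17.7
$$

so each node contributes roughly 17.7 times the throughput of the sequential baseline. This is consistent with the abstract's account that the gain comes from multiplying node-level parallelism by register-level SIMD parallelism, assuming the baseline is the sequential program without SIMD.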