Jan 11 2009

## Papers & Publications

### Abstract

In this paper, a parallel solution framework for the linear static analysis of large structures on PC clusters is presented. The framework consists of two main steps: data preparation and parallel solution. The parallel solution is performed by a substructure based method with direct solvers. The aim of the data preparation step is to create the best possible substructures so that the parallel solution time is minimized. An actual structural model was solved utilizing both homogeneous and heterogeneous PC clusters to illustrate the performance and applicability of the presented framework.

**Keywords:** *substructure; parallel solution; heterogeneous clusters; homogeneous clusters; workload balancing; partitioning; repartitioning*

**A Comparative Study on Two Different Direct Parallel Solution Strategies for Large-Scale Problems**

**by T. Bahcecioglu, S. Ozmen and O. Kurc**

**Abstract**

This paper presents a comparative study on two different direct parallel solution strategies for the linear solution of large scale actual finite element models: global and domain-by-domain. The global solution strategy was examined by utilizing the parallel multi-frontal equation solver, MUMPS, together with a finite element program. In a similar manner a substructure based parallel solution framework was utilized for investigating the domain-by-domain strategy. Various large-scale structural models were solved with both solution strategies in order to illustrate the efficiencies and weaknesses of each solution strategy. The test runs were performed on a homogeneous PC cluster composed of eight computers connected with an ordinary 1 GBit network switch.

**Keywords:** *Parallel solution, multi-frontal, substructure, large-scale, PC cluster*

**A Substructure Based Parallel Dynamic Solution of Large Systems on Homogeneous PC Clusters**

**by S. Ozmen, T. Bahcecioglu and O. Kurc**

**Abstract**

This study focuses on developing a parallel solution framework for the linear dynamic analysis of large structural models on homogeneous PC clusters. The framework consists of two separate stages. The first stage prepares the data for the parallel solution, which involves partitioning. The second stage is a fully parallel finite element analysis that utilizes a substructure based solution approach with direct solvers to perform implicit integration. The linear dynamic analysis of a large scale model was performed on a homogeneous PC cluster, and the number of computers was varied in order to demonstrate the performance and the efficiency of the overall solution framework. The performance of the implemented framework was also compared with that of the widely acknowledged parallel direct solver, MUMPS.

**Keywords:** *Dynamic Analysis; Parallel Solution; Substructure; Workload Balancing; PC Clusters*

**A Comparative Study on Parallel Solution Algorithms for Linear Dynamic Analysis**

**by T. Bahcecioglu, S. Ozmen and O. Kurc**

**Abstract**

Linear dynamic analysis is widely utilized for solving soil-structure interaction problems with the finite element method. When the domain is modeled in three dimensions, the size of the problem, and hence the solution time, increases significantly. Parallel computing therefore not only reduces the solution time but also allows larger models to be analyzed.

Newmark’s time integration algorithms have been implemented in various analysis software. Depending on the integration parameters, the integration can be performed in two different ways. The first approach, implicit Newmark, requires the assembly and solution of the system equations. From a computational point of view, the fully assembled system matrices have to be factorized once and solved repeatedly at every time step. The second approach, explicit Newmark, performs the solution at the element level. In other words, at every time step, only the element forces are computed and the system level matrices are never formed. The explicit algorithms are, however, conditionally stable and may require very small time steps.
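The "factorize once, solve at every time step" pattern of the implicit approach can be sketched as follows. This is a minimal serial illustration, not the implementation described in the paper: it assumes an undamped system in free vibration, the average-acceleration parameters (beta = 1/4, gamma = 1/2), and small dense matrices stored as Python lists. The helper names `cholesky`, `chol_solve`, `matvec`, `newmark_implicit` and `energy` are hypothetical.

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def newmark_implicit(M, K, u0, v0, dt, n_steps, beta=0.25, gamma=0.5):
    """Undamped free vibration; returns displacement and velocity after n_steps."""
    n = len(u0)
    a0 = 1.0 / (beta * dt * dt)
    a2 = 1.0 / (beta * dt)
    a3 = 1.0 / (2.0 * beta) - 1.0
    # Effective stiffness K + a0*M is factorized ONCE, outside the time loop.
    K_eff = [[K[i][j] + a0 * M[i][j] for j in range(n)] for i in range(n)]
    L_eff = cholesky(K_eff)
    # Consistent initial acceleration from M a = -K u0.
    a = chol_solve(cholesky(M), [-f for f in matvec(K, u0)])
    u, v = list(u0), list(v0)
    for _ in range(n_steps):
        # Only a right-hand side and two triangular solves per step.
        rhs = matvec(M, [a0 * u[i] + a2 * v[i] + a3 * a[i] for i in range(n)])
        u_new = chol_solve(L_eff, rhs)
        a_new = [a0 * (u_new[i] - u[i]) - a2 * v[i] - a3 * a[i] for i in range(n)]
        v = [v[i] + dt * ((1.0 - gamma) * a[i] + gamma * a_new[i]) for i in range(n)]
        u, a = u_new, a_new
    return u, v

def energy(M, K, u, v):
    """Kinetic plus strain energy of the linear system."""
    kinetic = 0.5 * sum(vi * mv for vi, mv in zip(v, matvec(M, v)))
    strain = 0.5 * sum(ui * ku for ui, ku in zip(u, matvec(K, u)))
    return kinetic + strain
```

With these parameters the scheme is unconditionally stable for linear problems, so the time step is chosen for accuracy rather than stability.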

This study presents a comparison of the parallel versions of the implicit and explicit Newmark time integration algorithms. The parallel version of the implicit integration algorithm starts by forming the system matrices in parallel. Then, the system matrices are factorized utilizing either a multi-frontal solver or a substructure based (multiple-front) solver. The system is solved, and the displacements obtained on the host process are distributed to the corresponding processors to be used in the calculations at the next time step. When the substructure based solver is utilized for the solution, the substructures are formed in such a way that the computational work among processors/computers is balanced. The parallel version of the explicit integration algorithm first partitions the structural model among processors/computers. During partitioning, a dual graph representation of the structural model and the PARMETIS library are utilized. Once the partitioning is finalized, the substructures at every processor/computer are created. At every time step, the element forces are computed, the equations are solved, and the nodal accelerations, velocities and displacements are computed.
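The element-level character of the explicit approach can be illustrated with a small serial sketch. It is not the parallel algorithm of the paper: it assumes a one-dimensional chain of springs fixed to the ground at one end, lumped nodal masses, no damping or external load, and the explicit Newmark parameters beta = 0, gamma = 1/2. The function names `internal_forces` and `newmark_explicit` are hypothetical. Note that no system stiffness matrix is ever assembled.

```python
def internal_forces(stiffnesses, u):
    """Assemble nodal internal forces element by element.

    Element e is a spring joining node e-1 (or the fixed ground when e == 0)
    to node e, so there are as many elements as free nodes.
    """
    f = [0.0] * len(u)
    for e, k in enumerate(stiffnesses):
        left = u[e - 1] if e > 0 else 0.0
        force = k * (u[e] - left)  # element force from its elongation
        f[e] += force
        if e > 0:
            f[e - 1] -= force
    return f

def newmark_explicit(masses, stiffnesses, u0, v0, dt, n_steps):
    """Explicit Newmark (beta = 0, gamma = 1/2) with lumped masses.

    Conditionally stable: dt must stay below 2 / omega_max of the model.
    """
    n = len(u0)
    u, v = list(u0), list(v0)
    f = internal_forces(stiffnesses, u)
    a = [-f[i] / masses[i] for i in range(n)]
    for _ in range(n_steps):
        # Displacement update uses only known quantities from the current step.
        u = [u[i] + dt * v[i] + 0.5 * dt * dt * a[i] for i in range(n)]
        # Element forces at the new configuration; no matrix factorization.
        f = internal_forces(stiffnesses, u)
        a_new = [-f[i] / masses[i] for i in range(n)]
        v = [v[i] + 0.5 * dt * (a[i] + a_new[i]) for i in range(n)]
        a = a_new
    return u, v
```

Because each element contributes its forces independently, a loop of this kind distributes naturally once the elements have been partitioned among processors. Only the forces at nodes shared between partitions would then need to be exchanged.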

Soil-structure interaction problems were chosen as the test case, and models with different mesh and earthquake characteristics were used for comparison, focusing on the speed-up, accuracy and numerical stability provided by both algorithms. Analyses were performed on a homogeneous PC cluster running the Windows operating system, using different numbers of processors and computers.

**Keywords:** *Dynamic Analysis; Parallel Solution; Partitioning; Newmark Algorithm; Substructure; PC Clusters*

**Schur Complement Computation for Dense Matrices on GPGPUs**

**by S. Ozmen and O. Kurc**

**Abstract**

In this paper, Schur complement computation for symmetric positive definite dense matrices with a partial Cholesky factorization that utilizes GPGPUs is presented. The partial Cholesky algorithm was implemented for obtaining the Schur complement of a dense matrix on a hybrid system of CPUs and CUDA-supported graphics processors by using either the provided CUBLAS routines or optimized custom routines. The performance of the blocked algorithm was investigated on dense matrices of various sizes by utilizing different CUDA-supporting graphics cards, and the solution times were compared not only with the pure CPU timings but also with the timings of CULA, which is a commercial dense linear algebra library for GPGPUs.
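The idea of obtaining a Schur complement from a partial Cholesky factorization can be sketched on the CPU as follows. This is an unblocked, pure-Python illustration under stated assumptions, not the blocked GPU algorithm of the paper: the matrix is a small symmetric positive definite list of lists, and the function name `partial_cholesky_schur` and its return layout are hypothetical. Stopping a right-looking Cholesky factorization after the first `k` columns leaves S = A22 - A21 A11^{-1} A12 in the trailing block.

```python
import math

def partial_cholesky_schur(A, k):
    """Eliminate the first k unknowns of the SPD matrix A.

    Returns (L_panel, S): the first k columns of the Cholesky factor and the
    Schur complement of the leading k-by-k block.
    """
    n = len(A)
    W = [row[:] for row in A]  # working copy; only the lower triangle is used
    for j in range(k):
        W[j][j] = math.sqrt(W[j][j])
        for i in range(j + 1, n):
            W[i][j] /= W[j][j]
        # Right-looking rank-1 update of the trailing lower triangle.
        for c in range(j + 1, n):
            for r in range(c, n):
                W[r][c] -= W[r][j] * W[c][j]
    L_panel = [[W[i][j] if j <= i else 0.0 for j in range(k)] for i in range(n)]
    # Mirror the updated lower triangle to return a full symmetric block.
    S = [[W[max(r, c)][min(r, c)] for c in range(k, n)] for r in range(k, n)]
    return L_panel, S
```

In a blocked variant the rank-1 updates are grouped into matrix-matrix operations, which is the part that typically benefits from BLAS-3 style routines on a GPU.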

**Keywords:** *Schur Complement; Cholesky Factorization; Dense Linear Algebra; Hybrid Algorithms; CUDA*