Users:General FEM Analysis/Solvers Reference/Linear solvers

From Carat++ Public Wiki
Jump to: navigation, search


Contents

Best practice

In this section all available linear solvers are explained and an example for the input syntax is given. In-house solvers are implemented as a back-up if no external solver library is available. This is why they are used in almost every benchmark example.

It is strongly recommended to use the MKL or TRILINOS package to reduce equation solving time. Best practice for standard problems on a personal computer:

 PC-SOLVER 1: MKL_PARDISO
 DATA_STORAGE = IN_CORE    ! IN_CORE  OUT_OF_CORE  SWAP
 BANDWITH = CUTHILL_MCKEE  ! can even be switched of by commenting this line

For more details see the following sections with detailed explanations for all solver settings

In-house solvers

Carat++ provides two direct linear equation solvers that do not depend on any 3rd part library. Sparse matrix algebra and equation solving are part of the in-house software development.

Crout Skyline

Crout Skyline is one of the most frequently used solvers. It turned out to be the standard solver for Windows users or Linux users that do not use the TRILINOS software package. As the name already tells, it is a Crout matrix decomposition algorithm derived from Gaussian elimination. It transforms the matrix K into the form K = L D L' where L is a triangular and D a diagonal matrix. Sparse matrix storage is done by a skyline method, [1] which is not very memory efficient, but provides quick matrix data access.

A Crout Skyline solver is defined in the input file as follows:

 PC-SOLVER 1: CROUT_SKYLINE
 BANDWITH = CUTHILL_MCKEE

The last line in the input block activates a bandwidth optimization based on the Cuthill-McKee algorithm. If a skyline storage scheme is used a bandwidth optimization is strictly recommended, because it reduces the memory overhead generated by the skyline method dramatically.

LDL

This was the first linear solver available in Carat++. It works on a Cholesky decomposition based Matrix factorization of the form K = L D L'. The LDL solver is outperformed by the Crout Skyline solver, but it is based on a compressed sparsity column scheme (CSC) [2] which is a memory efficient and general storage method for sparse matrices.

An input sequence is shown below.

 PC-SOLVER 1: SANDIA_LDL
 BANDWITH = CUTHILL_MCKEE

Although the memory request of the CSC storage scheme does not depend on the matrix's bandwidth an optimization should be performed anyway if a direct solver is used as it reduces the fill-in produced by the equation solving algorithm.

MKL-solvers

CG

General Info:

  • Conjugate Gradient Iterative Solver with optional Jacobi preconditioner
  • The CG method may fail to compute the solution or compute the wrong solution if the matrix of the system is not symmetric and positive definite.


An input sequence is shown below:

 PC-SOLVER 1 : MKL_CG
 MAX_NUM_ITER = 5000
 REL_TOLERANCE = 1E-25
 PRECOND=JACOBI ! PRECOND=NONE
 BANDWITH = CUTHILL_MCKEE

PARDISO

General Info:

  • State of the art solver
  • The Pardiso solver is able to solve any kind of problems. Symmetric, Unsymmetric, etc. Up to know the solver can be used for this two kind of problems.
  • Solver can be run in different data storage modes
    • IN_CORE: All data is kept in main memory (default)
    • OUT_OF_CORE All data is written to hard disk (ooc_* files, stored in working directory)
    • SWAP: Keeps data in-core until main memory size is exceeded.


An input sequence is shown below:

 PC-SOLVER 1: MKL_PARDISO
 BANDWITH = CUTHILL_MCKEE
 MATRIX_TYPE = FULL   !SYMMETRIC or FULL -> Default Vaule 'SYMMETRIC'
 DATA_STORAGE = IN_CORE    ! IN_CORE  OUT_OF_CORE  SWAP

TRILINOS-solvers

TRILINOS offers two packages to solve large and sparse linear equation systems: Amesos, which is an interface package to direct solution algorithms, and Aztec00, providing different iterative solution methods. All TRILINOS solvers are based on a compressed row storage (CRS) [3] scheme which is very similar to the CSC method, inheriting all its drawbacks and advantages.

Amesos

Although Amesos already includes an equation solving algorithm it is not narrowed to this algorithm. Amesos provides an interface to several of the most famous and powerful open source equation solver available at this point of time. All this algorithms can be plugged in into TRILINOS and used via the same input interface.

The input syntax is shown below the explanations of the particular solvers.

Amesos KLU

KLU is Amesos' native solver, working on a pivoting LU-factorization. It performs in the range of the Crout Skyline solver.

Amesos UMFPACK

This is probably the most famous algorithm accessible via Amesos. UMFPACK appears as a built-in routine in MATLAB and is a set of routines for solving unsymmetric sparse linear systems. [4] It is a very powerful solver and yielded computation times that were smaller by a factor of 7 compared to KLU on bigger benchmarks (<100.000 degrees of freedom). The UMFPACK solver package has to be installed separately and linked via the TRILINOS build.

A drawback of the usage of UMFACK via Amesos is that Amesos adresses the 32-bit routines of UMFPACK which are only able to adress 2GB of memory.

Amesos SuperLU distributed

TRILINOS supports data management for sparse matrices on distributed memory architectues. For this reason it is common-sense to provide a distributed memory system supporting equation solver (KLU and UMFPACK can be used for parallel computing, too, but the whole data will be shifted to one machine to perform the solution in serial). The SuperLU algorithm provided Xiaoye Sherry Li [5] turned out to be a good choice for this purpose, as it is available for serial architectures as well as for shared and distributed memory systems.

 PC-SOLVER 1: TRILINOS_AMESOS
 TYPE=KLU   ! choose one of the following: KLU  UMFPACK  SUPER_LU_DIST
 BANDWITH = CUTHILL_MCKEE

Aztec00

Aztec00 is the package of TRILINOS which supports iterative equation solving. It provides several Krylow subspace solution methods such as GMRES, CG, CGS, BiCG, BiCGStab or TFQMR. As the current applications within Carat++ only deliver equation systems which are symmetric positive definite, the CG algorithm is selected by hard-code.

As preconditioning has got a heavy influence onto the convergence behaviour of a Krylow method, Aztec provides several preconditioning approaches usch as Jacobi preconditioning, the most simple approach, or preconditioning by Neumann or least squares polynomials. Incomplete Cholesky or LU factorization can be applied, too. In case of parallel computing, these methods will precondition each process independent from the other processes. So the convergence behaviour might depend on the number of processes involved.

The following section will present the input sequence and the paramters of the Aztec00 solver.

Compulsory Parameters
Parameter Values, Default(*) Description
PC-SOLVER int Solver ID
PC-SOLVER TRILINOS_AZTEC00
REL_TOLERANCE real break criteria
MAX_NUM_ITER int maximum number of iterations to be performed
PRECOND NONE, JACOBI, NEUMANN, LS, DOMAIN_DECOMP select a pre-conditioner
SUBDOMAIN ICC, LU preconditioning on domain decomposition level by incomplete factorization, either Cholesky or LU. Select PRECOND = DOMAIN_DECOMP to use a sub-domain preconditioner.
REORDER 0,1 determines is RCM reordering is used within incomplete factorization (1) or not (0)
TYPE_OVERLAP SYMMETRIC, STANDARD determines how to treat preconditioning when different processes have computes different value for shared variables

STANDARD: Only concider the information concidered by the hosting process
SYMMETRIC: Add the results to keep the preconditioner symmetric.

OVERLAP int augment each sub-process matrix with rows/columns of the neighboring processes in int steps.
GRAPH_FILL 0,1,2 controlls the fill-in produced by incomplete factorization preconditioning. Higher vales will lead to a better preconditionier, but request more memory
KEEP_INFO 0,1 controlls if matrix information is kept in memory after the solving (1) or not(0)
REUSE 0,1,2 reuse information from previous solution

0: Don't reuse anything
1: Reuse preconditioner information, eg. symbolic factorization
2: Reuse complete preconditioner and use former solution as initial guess

OUTPUT 0,-1,int Give information about iteration progress

0: No information
-1: Summary at the end
int: Print information every int-th step

DIAGNOSTICS 0,1 Print out diagnostic data (1) or not (0)
BANDWITH CUTHILL_MCKEE, NONE Choose the bandwidth optimization method


 PC-SOLVER 1 : TRILINOS_AZTECOO
 REL_TOLERANCE=1E-7  MAX_NUM_ITER=2000
 PRECOND=JACOBI        !choose:  NONE  JACOBI  NEUMANN  LS  DOMAIN_DECOMP
 SUBDOMAIN_SOLVE=ICC   !choose:  ICC   LU 
 REORDER=1  TYPE_OVERLAP=SYMMETRIC OVERLAP=1
 GRAPH_FILL=0  KEEP_INFO=0  REUSE=0 OUTPUT=0 DIAGNOSTICS=0
 BANDWITH = CUTHILL_MCKEE



References

  1. http://www.netlib.org/linalg/html_templates/node96.html
  2. http://www.netlib.org/linalg/html_templates/node92.html#SECTION00931200000000000000
  3. http://www.netlib.org/linalg/html_templates/node91.html#SECTION00931100000000000000
  4. http://www.cise.ufl.edu/research/sparse/umfpack/
  5. http://crd.lbl.gov/~xiaoye/SuperLU/index.html




Whos here now:   Members 0   Guests 1   Bots & Crawlers 0
 
Personal tools
Content for Developers