David Strubbe edited Parallelization2.tex  over 9 years ago

different regions that are assigned to each processor. For most operations, only the boundaries of the regions need to be communicated among processors. Since the grid can have a complicated shape dictated by the shape of the molecule, it is far from trivial to distribute the grid points among processors. For this task we use a third-party library called {\sc ParMETIS}~\cite{Karypis_1996}. This library provides routines to partition the grid, ensuring a balanced distribution of points while minimizing the size of the boundary regions, and hence the communication costs. An example
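The trade-off the partitioner optimizes can be illustrated with a small self-contained sketch (this does not use the actual ParMETIS interface; the disc-shaped grid, the bisection rule, and all function names are hypothetical): grid points inside a disc are split between two domains, and the points whose finite-difference stencil reaches into the other domain are exactly those that would have to be communicated.

```python
# Toy illustration of real-space domain decomposition (NOT the
# Octopus/ParMETIS interface): grid points inside a disc are split
# between two "processors" by coordinate bisection, and we count the
# boundary points that a nearest-neighbor stencil would need to exchange.

def disc_grid(radius):
    """Grid points (i, j) lying inside a disc of the given radius."""
    return {(i, j)
            for i in range(-radius, radius + 1)
            for j in range(-radius, radius + 1)
            if i * i + j * j <= radius * radius}

def bisect(points):
    """Assign each point to domain 0 or 1 by the sign of its x index."""
    return {p: (0 if p[0] < 0 else 1) for p in points}

def boundary_size(points, owner):
    """Points with at least one stencil neighbor owned by the other domain."""
    stencil = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    boundary = 0
    for (i, j) in points:
        for (di, dj) in stencil:
            q = (i + di, j + dj)
            if q in points and owner[q] != owner[(i, j)]:
                boundary += 1
                break
    return boundary

points = disc_grid(20)
owner = bisect(points)
sizes = [sum(1 for p in points if owner[p] == d) for d in (0, 1)]
print("points per domain:", sizes)
print("boundary points:", boundary_size(points, owner))
```

A real partitioner such as ParMETIS works on the adjacency graph of the grid and solves the same problem for an arbitrary number of domains: balance the point counts while minimizing exactly this kind of boundary, and hence the communication volume.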

Additional parallelization is provided by other data-decomposition approaches that are combined with domain decomposition. These include parallelization over \(k\)-points and spin, and over KS states. The first strategy is quite efficient, since the operations for each \(k\)-point or spin component are independent. However, it is limited by the size of the system, and it is often not available (as in the case of closed-shell molecules, for example). The efficiency of the parallelization over KS states depends on the type of calculation being performed. For ground-state calculations, the orthogonalization and subspace-diagonalization routines~\cite{Kresse_1996} require the communication of states. In Octopus this is handled by parallel dense linear-algebra operations provided by the ScaLAPACK library~\cite{scalapack}. For real-time propagation, on the other hand, the orthogonalization is preserved by the propagation~\cite{Castro_2006}, so there is no need to communicate KS states between processors. This makes real-time TDDFT extremely efficient on massively parallel computers~\cite{Andrade_2012,Schleife_2014}. An operation that needs special care in parallel is the solution of the Poisson equation. Otherwise, it constitutes a bottleneck in parallelization, as a
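Why \(k\)-point parallelization is so efficient can be seen from a toy round-robin distribution (purely illustrative; the function name and the layout are assumptions, not the Octopus implementation): each process receives its own subset of \(k\)-point indices and works on it with no communication, so the only cost is the residual load imbalance when the number of \(k\)-points is not a multiple of the number of processes.

```python
# Toy sketch of k-point/spin parallelization (NOT Octopus code): the
# work for each k-point is independent, so a simple round-robin
# assignment distributes it with at most one extra k-point per process.

def distribute(nk, nprocs):
    """Round-robin assignment of k-point indices to processes."""
    return [list(range(rank, nk, nprocs)) for rank in range(nprocs)]

groups = distribute(nk=8, nprocs=3)
print(groups)  # -> [[0, 3, 6], [1, 4, 7], [2, 5]]

loads = [len(g) for g in groups]
print("load imbalance:", max(loads) - min(loads))  # -> 1
```

This also makes the limitation quoted above concrete: with few \(k\)-points (or a single spin channel), some processes would receive empty lists, so this level of parallelism simply cannot absorb more processors.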