Parallelization in Octopus is performed on different levels. The most
basic one is domain decomposition, where the grid is divided into
different regions that are assigned to each processor. For most
operations, only the boundaries of the regions need to be communicated
among processors. Since the grid can have a complicated shape dictated
by the shape of the molecule, it is far from trivial to distribute the grid points among processors. For this task we use a third-party library called {\sc
ParMETIS}~\cite{Karypis_1996}. This library provides routines to
partition the grid, ensuring a balanced number of points per processor
while minimizing the size of the boundary regions, and hence the
communication costs. An example
of grid partitioning is shown in Fig.~\ref{fig:partitioning}.
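The idea that only boundary points must be exchanged can be made concrete with a minimal sketch (this is illustrative Python, not Octopus code; the actual partitioning is done by {\sc ParMETIS}). The function `boundary_points` is a hypothetical helper that, given an assignment of grid points to ranks, returns the points a rank must receive from its neighbours:

```python
# Illustrative sketch (not Octopus code): after a grid is partitioned,
# each process needs only the points of other regions that are direct
# grid neighbours of its own points.
def boundary_points(points, owner, my_rank):
    """Points owned by other ranks that are grid neighbours of my points."""
    pts = set(points)
    halo = set()
    for (x, y) in points:
        if owner.get((x, y)) != my_rank:
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in pts and owner[nb] != my_rank:
                halo.add(nb)
    return halo

# A 4x2 grid split into left/right halves between two ranks:
grid = [(x, y) for x in range(4) for y in range(2)]
owner = {(x, y): 0 if x < 2 else 1 for (x, y) in grid}
print(sorted(boundary_points(grid, owner, 0)))  # -> [(2, 0), (2, 1)]
```

Only the two points along the cut are communicated; a good partition keeps this boundary set small relative to the number of owned points.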
Additional parallelization is provided by other data-decomposition
approaches that are combined with domain decomposition. These include parallelization over \(k\)-points and spin, and over Kohn-Sham states.
The first parallelization strategy is quite efficient, since for each \(k\)-point or spin component the operations are independent. However, it is limited by the size of the system, and often cannot be used at all (as in the case of closed-shell molecules, for example).
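Both the efficiency and the limitation of this strategy can be seen in a small sketch (hypothetical Python, not the Octopus implementation): whole \((k,\sigma)\) combinations are dealt out to process groups, which then work independently, but the number of combinations bounds the usable parallelism.

```python
# Hypothetical sketch: k-point/spin parallelization assigns whole
# (k-point, spin) combinations to process groups; each group works
# independently, since operations do not couple different k-points.
def distribute(n_kpoints, n_spin, n_groups):
    """Round-robin assignment of (k, spin) indices to process groups."""
    work = [(k, s) for k in range(n_kpoints) for s in range(n_spin)]
    return [work[g::n_groups] for g in range(n_groups)]

# 4 k-points x 2 spin components over 3 groups: 8 tasks, split 3/3/2.
print([len(g) for g in distribute(4, 2, 3)])  # -> [3, 3, 2]

# A closed-shell molecule has a single (k, spin) combination, so all
# but one group would sit idle: this level of parallelism is unusable.
print(distribute(1, 1, 4))  # -> [[(0, 0)], [], [], []]
```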
The efficiency of the parallelization over Kohn-Sham states depends on the type of calculation being performed. For ground-state calculations, the orthogonalization and subspace-diagonalization routines~\cite{Kresse_1996} require the communication of states. In Octopus this is handled by parallel dense linear-algebra operations provided by the ScaLAPACK library~\cite{scalapack}. For real-time propagation, on the other hand, orthogonality is preserved by the propagation~\cite{Castro_2006} and there is no need to communicate Kohn-Sham states between processors. This makes real-time TDDFT extremely efficient on massively parallel computers~\cite{Andrade_2012,Schleife_2014}.
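Why real-time propagation avoids this communication can be checked with a toy example (pure Python, not the Octopus propagator): applying a unitary time step \(U = e^{-iH\,\Delta t}\) to every state keeps an orthonormal set orthonormal, so no re-orthogonalization, and hence no exchange of states between processors, is needed.

```python
import cmath

# Toy illustration: repeatedly apply a unitary step to two orthonormal
# "states" and verify that their overlap and norms are unchanged.
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))

# Unitary for the 2x2 Hamiltonian H = sigma_x with time step dt:
# U = exp(-i*H*dt) = cos(dt)*I - i*sin(dt)*sigma_x
dt = 0.3
c, s = cmath.cos(dt), cmath.sin(dt)
U = [[c, -1j * s], [-1j * s, c]]

psi1, psi2 = [1.0 + 0j, 0j], [0j, 1.0 + 0j]  # orthonormal initial states
for _ in range(100):  # many time steps
    psi1, psi2 = matvec(U, psi1), matvec(U, psi2)

print(abs(dot(psi1, psi2)))            # overlap: still ~0
print(abs(abs(dot(psi1, psi1)) - 1))   # norm deviation: still ~0
```

In a ground-state calculation, by contrast, each orthogonalization step couples all states, which is why ScaLAPACK-style collective linear algebra is required there.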
An operation that needs special care in parallel is the solution of the
Poisson equation. Otherwise, it constitutes a parallelization bottleneck, as a
single Poisson solution is required independently of the number of states in the system. A considerable effort has been devoted to the
problem of finding efficient parallel Poisson solvers that can keep up
with the rest of the code~\cite{Garc_a_Risue_o_2013}. We have found that the most efficient methods are based on FFTs, which require a different domain
decomposition to perform efficiently. This introduces the additional
problem of transferring the data between the two different data
partitions. In Octopus this was overcome by creating a mapping at
initialisation stage and using it during execution to efficiently
communicate only the data that is strictly necessary between
processes~\cite{Alberdi_2014}.
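The precomputed mapping can be sketched as follows (hypothetical Python, not the Octopus implementation from Ref.~\cite{Alberdi_2014}): given the owner of each point in the domain partition and in the FFT-friendly partition, one builds, once at initialisation, the exact list of points each rank pair must exchange.

```python
# Hypothetical sketch: precompute, at initialisation, which grid points
# each rank must send when switching from the domain partition to an
# FFT-friendly slab partition; at run time only these are communicated.
def transfer_map(points, domain_owner, fft_owner):
    """send[(src, dst)] -> points src must send to dst for the FFT layout."""
    send = {}
    for p in points:
        src, dst = domain_owner[p], fft_owner[p]
        if src != dst:
            send.setdefault((src, dst), []).append(p)
    return send

# 4x4 grid: domain partition by quadrant (4 ranks), FFT partition by
# x-slabs (2 ranks).
pts = [(x, y) for x in range(4) for y in range(4)]
dom = {(x, y): (x // 2) * 2 + y // 2 for (x, y) in pts}
fft = {(x, y): x // 2 for (x, y) in pts}
m = transfer_map(pts, dom, fft)
print(sorted(m))  # -> [(1, 0), (2, 1), (3, 1)]
```

Only three of the possible rank pairs need to communicate at all, and each exchanges exactly the points whose owner changes between the two layouts.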