
Sample Run

Here is a sample procedure showing how to use xi3 in practice. We consider 8 particles at the sites of a simple cubic (SC) lattice in a (5, 5, 5) periodic box. A sample configuration file (xi3.scm) is included in the package:
; sample set file for xi3
; SC lattice config of 8 particles in (5,5,5) box
; $Id: manual.tex,v 1.5 2008/10/12 20:16:53 kichiki Exp $
(define version    "F")     ; version. "F", "FT", or "FTS"
(define flag-mat   #t)      ; #t => matrix scheme, #f => atimes scheme
(define flag-notbl #f)      ; #t => no-table,      #f => with table

(define np         8)       ; number of particles
(define ewald-eps  1.0e-12) ; cut-off limit for Ewald summation

; lattice vector
(define lattice '(5.0  5.0  5.0))

; configuration of particles
(define x #(
0.0  0.0  0.0
2.5  0.0  0.0
0.0  2.5  0.0
0.0  0.0  2.5
0.0  2.5  2.5
2.5  0.0  2.5
2.5  2.5  0.0
2.5  2.5  2.5
))

; list of time ratio Tr/Tk for Ewald summation (optional)
;(define ewald-trs
;  '(0.1
;    1.0
;    10.0
;    100.0
;    ))
Here is a part of the result:
# F version table matrix
0.110000 0.245379 22.087 21.575 0.512 1.33163581314065055e-01 2197 125 1713 80
0.121000 0.249308 21.106 20.592 0.514 1.33163581314059309e-01 2197 125 1713 80
0.133100 0.253300 20.812 20.224 0.588 1.33163581314069690e-01 2197 125 1689 92
...
Each line of the output consists of 10 columns in this case, that is, for the F version with table. The first 5 columns are the same in all cases. The first and second columns are $ R_T$ and $ \xi $ (see below for details). The next 3 are CPU times in milliseconds for the total, real-space, and reciprocal-space calculations, respectively (in the sample above, the third column is the sum of the fourth and fifth). The next column is, for the F version, the averaged velocity obtained by the calculation (see below for details). For the FT and FTS versions, there are 2 and 3 numbers there, respectively. The next two integers are the numbers of lattice points within the range for the real and reciprocal summations. Note that these numbers are for cubic regions (not spherical ones). For the non-table version, we take into account all lattice points within the cubic region specified by the numbers of lattice points in the $ x$, $ y$, and $ z$ directions. For the table version, on the other hand, we apply more complicated (and empirical) criteria for truncating the lattice sum; roughly speaking, this reduces the region from cubic to spherical. In that case, two final integers are added: the actual numbers of points used in the real and reciprocal summations.
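The column layout described above can be made concrete with a small parser. This is only a sketch for the 10-column case (F version with table). The field names are hypothetical labels chosen here, not names used by xi3, and the order of the three CPU-time columns (total, real, reciprocal) is an assumption based on the sample output, where the third column equals the sum of the fourth and fifth.

```python
from typing import NamedTuple


class Xi3Row(NamedTuple):
    """One output line of xi3 (F version, with table). Field names are hypothetical."""
    r_t: float            # column 1: R_T
    xi: float             # column 2: xi
    cpu_total_ms: float   # column 3: assumed total CPU time (ms)
    cpu_real_ms: float    # column 4: assumed real-space CPU time (ms)
    cpu_recip_ms: float   # column 5: assumed reciprocal-space CPU time (ms)
    avg_velocity: float   # column 6: averaged velocity (F version: one number)
    n_real_cubic: int     # column 7: real-space lattice points in the cubic region
    n_recip_cubic: int    # column 8: reciprocal lattice points in the cubic region
    n_real_used: int      # column 9: real-space points actually summed
    n_recip_used: int     # column 10: reciprocal points actually summed


def parse_row(line: str) -> Xi3Row:
    """Split one whitespace-separated data line into typed fields."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return Xi3Row(*map(float, fields[:6]), *map(int, fields[6:]))


# First data line of the sample result shown above.
sample = ("0.110000 0.245379 22.087 21.575 0.512 "
          "1.33163581314065055e-01 2197 125 1713 80")
row = parse_row(sample)
print(row.r_t, row.xi, row.n_real_used, row.n_recip_used)
```

For the FT and FTS versions, or for the non-table version, the number of columns differs, so the fixed 10-column check would have to be adapted.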

In the xi3 program, as well as in the libstokes library, the parameter $ R_T$ is used instead of $ \xi $. $ R_T$ is a rough estimate of the CPU-time ratio between the real and reciprocal summations and is related to $ \xi $ by

$\displaystyle R_T = \frac{ \left( l_x l_y l_z \xi^3 \right)^2 }{\pi^3} = \frac{T_{real}}{T_{recip}} ,$ (2.31)

where

$\displaystyle T_{real} \propto l_x l_y l_z \xi^3 , \quad T_{recip} \propto k_x k_y k_z = \frac{\pi^3}{l_x l_y l_z \xi^3} .$ (2.32)

Figure 2.1: $ \xi $ versus $ R_T$.
\includegraphics[width=7cm]{figures/FIG-xi3-xi}
Figure 2.1 shows $ \xi $ vs. $ R_T$. ($ R_T$ and $ \xi $ are the first and second columns of the result file.) Note that this conversion is implemented in the routine xi_by_tratio().
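Solving Eq. (2.31) for $ \xi $ gives $ \xi = (\pi^3 R_T)^{1/6} / (l_x l_y l_z)^{1/3}$. The following sketch evaluates this expression for the three $ R_T$ values in the sample result. The function name xi_from_tratio is hypothetical; this is an illustration of the formula only, not the libstokes routine itself, whose exact signature is not shown here.

```python
import math


def xi_from_tratio(r_t: float, lx: float, ly: float, lz: float) -> float:
    """Invert R_T = (lx*ly*lz*xi**3)**2 / pi**3 for xi (hypothetical helper)."""
    volume = lx * ly * lz
    return (math.pi ** 3 * r_t) ** (1.0 / 6.0) / volume ** (1.0 / 3.0)


# R_T values from the first column of the sample result, in the (5, 5, 5) box.
for r_t in (0.110000, 0.121000, 0.133100):
    print(f"{r_t:.6f} {xi_from_tratio(r_t, 5.0, 5.0, 5.0):.6f}")
```

The printed values can be compared with the second column of the sample result (0.245379, 0.249308, 0.253300).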

As $ R_T$ changes, the numbers of lattice points in the real and reciprocal summations change: the former decreases and the latter increases as $ R_T$ increases. Because the calculated result is independent of $ \xi $, and therefore of $ R_T$, we can use this parameter to tune the calculation of the Ewald summation. That is, we can choose the value of $ \xi $ that minimizes the computational cost. This is the whole purpose of the xi3 program. Figure 2.2 shows the CPU times for the real and reciprocal spaces and the total.

Figure 2.2:
\includegraphics[width=7cm]{figures/FIG-xi3-CPU}
As we see, the total CPU time has a clear minimum. In this example (an SC lattice of $ N=8$ particles in a $ (5, 5, 5)$ periodic box), the minimum is around $ R_T\approx 4$.
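Locating this minimum from a result file is a simple scan over the rows. The sketch below assumes that the third column holds the total CPU time (in the sample result it equals the sum of the fourth and fifth columns). It uses only the three rows excerpted above, so it illustrates the procedure and does not reproduce the minimum near $ R_T\approx 4$; the function name is hypothetical.

```python
# The three data rows excerpted from the sample result (not the full scan).
result_text = """\
# F version table matrix
0.110000 0.245379 22.087 21.575 0.512 1.33163581314065055e-01 2197 125 1713 80
0.121000 0.249308 21.106 20.592 0.514 1.33163581314059309e-01 2197 125 1713 80
0.133100 0.253300 20.812 20.224 0.588 1.33163581314069690e-01 2197 125 1689 92
"""


def fastest_row(text: str) -> tuple:
    """Return (R_T, xi, total CPU ms) of the row with the smallest total time."""
    best = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        fields = line.split()
        r_t, xi = float(fields[0]), float(fields[1])
        total_ms = float(fields[2])  # assumption: column 3 is the total CPU time
        if best is None or total_ms < best[2]:
            best = (r_t, xi, total_ms)
    if best is None:
        raise ValueError("no data rows found")
    return best


best = fastest_row(result_text)
print(f"R_T = {best[0]}, xi = {best[1]}, total = {best[2]} ms")
```

Since CPU timings fluctuate from run to run, the row found this way is best read as an estimate of the neighborhood of the optimum rather than an exact value.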

As stated above, the calculated result is independent of $ \xi $ and therefore of $ R_T$. This is a mathematical property of the Ewald summation, and therefore it is a good check for the code:

The results should be the same for various $ \xi $ (and therefore $ R_T$).
In practice, we truncate the lattice summations at the point where the terms become small enough. The criterion is given by another parameter, ewald_eps. In this example, we take $ {\tt ewald\_eps} = 10^{-12}$. (Small enough, isn't it?) In the code of xi3, we do not solve a physical problem but simply calculate the product $ \mathbf{A}\cdot\mathbf{x}$ for the mobility matrix $ \mathbf{A}$ and a vector $ \mathbf{x} = (1,1,\cdots,1)^\dagger$. The 6th column of the result that xi3 generates is the average of $ \mathbf{A}\cdot\mathbf{x}$, that is, a kind of averaged velocity. (``A kind of'' means that the average is taken element-wise rather than particle-wise.)
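To illustrate what an element-wise average of $ \mathbf{A}\cdot\mathbf{x}$ means, here is a minimal sketch. The 3x3 matrix contains made-up toy values and is not a mobility matrix computed by xi3 or libstokes; the helper name matvec is also hypothetical. With $ \mathbf{x}$ set to all ones, each element of the product is simply a row sum of the matrix, and the average runs over all elements of the resulting vector.

```python
def matvec(a, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


# Toy symmetric matrix (made-up values, NOT a real mobility matrix).
a = [
    [2.0, 0.5, 0.0],
    [0.5, 2.0, 0.5],
    [0.0, 0.5, 2.0],
]
x = [1.0] * len(a)          # x = (1, 1, ..., 1)

u = matvec(a, x)            # each element is a row sum of the matrix
average = sum(u) / len(u)   # element-wise average over all components
print(u, average)
```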
Figure 2.3:
\includegraphics[width=7cm]{figures/FIG-xi3-err}
Figure 2.3 shows the calculated results versus $ R_T$. The values on the y-axis are the absolute values of the difference from the result at $ R_T \approx 10$, which I simply picked as a reference to see the fluctuations, in other words, the empirical error. (Note that this approach works only if everything is working correctly; otherwise it does not.) It looks OK. Actually, my cut-off criterion with ewald_eps might be a little too strict (because the error is less than 1.0e-13, one order lower than expected). But it does no harm, so I leave it as it is.
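The same kind of check can be sketched on the three rows excerpted in the sample result. Because the row at $ R_T \approx 10$ is not part of that excerpt, the first row is used as the reference here; that choice, and the variable names, are assumptions made for this illustration only.

```python
# (R_T, 6th column) pairs taken from the three sample rows shown earlier.
rows = [
    (0.110000, 1.33163581314065055e-01),
    (0.121000, 1.33163581314059309e-01),
    (0.133100, 1.33163581314069690e-01),
]

# Stand-in reference: the first row (the text uses the point near R_T = 10).
ref_r_t, ref_value = rows[0]

# Absolute difference of each result from the reference value.
deviations = [(r_t, abs(value - ref_value)) for r_t, value in rows]
max_deviation = max(dev for _, dev in deviations)

for r_t, dev in deviations:
    print(f"{r_t:.6f} {dev:.3e}")
print(f"max deviation: {max_deviation:.3e}")
```

For these three rows the deviations stay below 1.0e-13, in line with the error level quoted above.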


Kengo Ichiki 2008-10-12