Python with HPC/FEniCS
Here are some plots for a simple Poisson solver, using Python. The red bar is the total run time,
and the right-hand bar is the actual computation.
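(The solver itself isn't shown in the post; a minimal legacy-DOLFIN Poisson solve along the following lines is roughly the kind of computation being timed. The mesh size and boundary condition are placeholders, not the actual benchmark settings.)

```python
# Sketch of a minimal Poisson solve with legacy DOLFIN - illustrative only,
# not the exact code behind the plots above.
from dolfin import *

mesh = UnitSquareMesh(64, 64)              # placeholder mesh size
V = FunctionSpace(mesh, "Lagrange", 1)

u, v = TrialFunction(V), TestFunction(V)
f = Constant(1.0)

a = inner(grad(u), grad(v))*dx
L = f*v*dx
bc = DirichletBC(V, Constant(0.0), "on_boundary")

u_h = Function(V)
solve(a == L, u_h, bc)
```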
So there is a problem: by the time the core count goes above 1000, we are spending more time just loading Python (“from dolfin import *”) than actually computing.
Let’s look at the module imports:
python -v -c "from dolfin import *" 2>&1 | grep -e "\.py" | grep instant | wc -l
and similarly for each of the other modules.
Here is the breakdown in number of .py files per module:
| Module | .py files |
| ------ | --------- |
| six | 5 |
| instant | 20 |
| FIAT | 46 |
| dolfin | 70 |
| numpy | 146 |
| ffc | 154 |
| python | 166 |
| ufl | 202 |
| sympy | 319 |
| **TOTAL** | **1128** |
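If you would rather not run the grep pipeline once per package, a short Python sketch along these lines gives a similar per-package breakdown from sys.modules (the exact counts will vary between FEniCS versions and installations):

```python
# Sketch: count loaded .py files per top-level package after importing dolfin.
import sys
from collections import Counter

from dolfin import *  # triggers the whole import chain

counts = Counter()
for name, mod in list(sys.modules.items()):
    path = getattr(mod, "__file__", None)
    if path and path.endswith((".py", ".pyc")):
        counts[name.split(".")[0]] += 1

for pkg, n in counts.most_common():
    print("{0:<10} {1}".format(pkg, n))
```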
Now, 1128 files might be OK to load on one machine, but multiplied across 1152 cores that makes about 1.3M file accesses. From what I can understand, Lustre (used on many HPC systems) is not really optimised for this kind of small-file, metadata-heavy access. Nor is NFS.
Fortunately, there is a partial remedy available. Python natively supports imports from a .zip file, so if we just zip up all our files (I can certainly do that for all the FEniCS folders, though maybe not for Python itself) and do something like:
export PYTHONPATH=~/fenics.zip:$PYTHONPATH
then each core will only read one file (the .zip) for instant, FIAT, dolfin, ufl and ffc, decompress it, and load the modules from memory.
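Building the zip is straightforward. The sketch below bundles the pure-Python parts of the offending packages; the site-packages path is an assumption and needs to point at your own FEniCS install:

```python
# Sketch: bundle the pure-Python parts of selected packages into one zip
# for use on PYTHONPATH. The site-packages path is an assumption - point
# it at your own FEniCS install. Shared object .so files are skipped.
import os
import zipfile

SITE = os.path.expanduser("~/fenics/lib/python2.7/site-packages")
PACKAGES = ["instant", "FIAT", "dolfin", "ufl", "ffc", "sympy"]

zf = zipfile.ZipFile(os.path.expanduser("~/fenics.zip"), "w", zipfile.ZIP_DEFLATED)
for pkg in PACKAGES:
    for root, _, files in os.walk(os.path.join(SITE, pkg)):
        for name in files:
            if name.endswith((".py", ".pyc")):
                full = os.path.join(root, name)
                # store paths relative to site-packages so "import dolfin" still resolves
                zf.write(full, os.path.relpath(full, SITE))
zf.close()
```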
Testing out this idea on ARCHER showed that it does work. I compressed FEniCS and sympy (the two worst offenders), and it reduced the load time on 768 cores from 71s to 39s.
Some caveats: you can’t zip up shared object .so files, because Python uses dlopen() to access these, so they have to stay as separate files. Also, although we can bring down the load time, the fundamental scaling is still bad, so at some point it is probably necessary to simply reduce the number of machines that read from the filesystem and distribute the file contents using MPI (see the sketch below). It would be nice to be able to use a FUSE virtual filesystem, or maybe expose an HDF5 file as a mount point and use MPI-IO to access it…
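As a very rough sketch of the “distribute with MPI” idea (assuming mpi4py is available and that /tmp is node-local storage, neither of which was part of the ARCHER test), only rank 0 would touch the shared filesystem and everyone else would receive the zip over the interconnect:

```python
# Sketch: one rank reads the module zip from the shared filesystem and
# broadcasts it; every other rank imports from a node-local copy.
# This only helps the pure-Python modules: .so files still come from the
# shared filesystem, and mpi4py itself must already be importable.
import os
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
data = None
if comm.rank == 0:
    with open(os.path.expanduser("~/fenics.zip"), "rb") as fh:
        data = fh.read()
data = comm.bcast(data, root=0)              # one Lustre read, N-1 network copies

local = "/tmp/fenics-%d.zip" % os.getpid()   # assumes /tmp is node-local
with open(local, "wb") as fh:
    fh.write(data)
sys.path.insert(0, local)

from dolfin import *  # now resolved from the local zip
```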
Any comments, most welcome!