August 17, 2015
Parallelization Made Easy in pSeven
Parallelization in pSeven can be understood in several ways:
- Internal parallelization: some blocks, like Optimizer or ApproxBuilder, can create parallel threads internally to speed up the algorithms.
- Branch parallelization: different branches of a workflow can start in parallel when there are no dependencies between them.
- Parallelization of the whole workflow: this feature allows dealing with batch input and performing several independent calculations at a time.
Imagine a problem that requires evaluating tens or hundreds of design space points. Engineering simulations can be time-consuming, so the idea of running them in parallel comes to mind right away. This note shows how easily simulations can be parallelized in pSeven.
First, we should make sure that different workflow instances running in parallel do not interact with each other (for example, do not write to the same file on disk). This can be achieved with the sandbox concept.
- A sandbox is the working directory of a block at run time; it contains the files used and created by the block during execution.
- A prototype directory can be set for a block sandbox; its contents are copied to the sandbox when the sandbox is initialized.
- Sandboxes, together with the transfer of files and variables through ports, make a workflow path-independent.
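The sandbox idea can be illustrated outside pSeven with a few lines of plain Python. This is only an analogy under stated assumptions, not pSeven's implementation or API: the helper names `init_sandbox` and `run_block` and the file names `scale.txt` and `result.txt` are hypothetical. Each run gets a fresh directory seeded from a prototype directory, so runs never write to the same file.

```python
import shutil
import tempfile
from pathlib import Path


def init_sandbox(prototype: Path, root: Path) -> Path:
    """Create a uniquely named sandbox under root and copy the prototype into it."""
    sandbox = Path(tempfile.mkdtemp(prefix="sandbox_", dir=str(root)))
    # dirs_exist_ok=True lets copytree fill the directory that mkdtemp
    # has just created (requires Python 3.8 or newer).
    shutil.copytree(prototype, sandbox, dirs_exist_ok=True)
    return sandbox


def run_block(sandbox: Path, x: float) -> float:
    """Hypothetical stand-in for a simulation that reads and writes only in its sandbox."""
    scale = float((sandbox / "scale.txt").read_text())
    result = scale * x
    (sandbox / "result.txt").write_text(str(result))
    return result


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # The prototype holds the input files every run starts from.
        prototype = root / "prototype"
        prototype.mkdir()
        (prototype / "scale.txt").write_text("2.0")

        results = []
        for x in (1.0, 2.5):
            sandbox = init_sandbox(prototype, root)  # one sandbox per run
            results.append(run_block(sandbox, x))
        return results
```

Because every run only touches paths relative to its own sandbox, the same code works wherever the sandbox happens to be created, which is the path-independence property mentioned above.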
Second, the computational workflow should be grouped into a composite block. A composite block acts as a wrapper for a part of the workflow and allows organizing workflows hierarchically. The inputs and outputs of the internal workflow become ports of the composite block.
Third, we should enable and set up the Parallel Execution feature of the composite block. The chosen ports then expect a list of input points.
pSeven automatically splits this list into single points and starts a computational workflow for each of them. One can set the maximum number of parallel instances running at a time. Once any of these virtual workflows finishes its calculation, it receives the next point from the input list. The result is also a list, containing the responses of the computational model. This is how workflow execution parallelization works in pSeven.
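The scheduling pattern described above (split a list into points, run a bounded number of instances, hand the next point to whichever instance becomes free, collect a list of responses) resembles a standard worker pool. A minimal sketch using Python's `concurrent.futures` follows. It is not pSeven code: `simulate` is a hypothetical stand-in for the computational workflow, and returning responses in input order is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(point):
    """Hypothetical stand-in for the workflow: one design point in, one response out."""
    x, y = point
    return x ** 2 + y ** 2


def run_parallel(points, max_instances=4):
    # At most max_instances evaluations run at a time; as soon as a worker
    # finishes a point, it picks up the next pending one.
    with ThreadPoolExecutor(max_workers=max_instances) as pool:
        # map() yields the responses in the same order as the input points.
        return list(pool.map(simulate, points))


points = [(0, 0), (1, 2), (3, 4), (-1, 1)]
responses = run_parallel(points, max_instances=2)
```

Here `max_instances` plays the role of the maximum number of parallel instances. For a real simulation launched as an external process, threads are usually sufficient because the workers spend their time waiting for that process.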
Note that parallelization combines naturally with other pSeven features. The most important use cases are:
- Rebuilding a geometry model usually cannot be parallelized, so pSeven automatically prevents CAD blocks in different parallel instances from accessing a CAD system simultaneously.
- Since we are using sandboxes, HPC calculations can be run without trouble through the built-in cluster manager integration. Each ShellScript block simply creates its own job on the cluster.
- Parallelization speeds up not only data gathering for Design of Experiments but also the optimization process. Optimizer in pSeven can operate in batch mode, in which it generates several points at each iteration. Together with workflow parallelization, this feature can significantly reduce optimization time. The Optimizer batch mode will be covered in more detail in forthcoming notes.
By Anton Saratov, Application engineer, DATADVANCE