In short: we recommend a modern NVIDIA GeForce graphics card!
flox-GPU has special hardware requirements: it requires a graphics card (GPU). If you are uncertain whether your system can run flox-GPU, get the free demo and see if it works for you. Run some of the test sets and compare your run times with the benchmark times given for different hardware.
Why does flox-GPU need a GPU?
The strengths of flox-GPU rely on the raw computing power of today’s GPUs. In terms of FLOPS, cutting-edge GPUs are ahead of almost all CPU solutions in a comparable price range.
What GPU do I need?
flox-GPU uses NVIDIA’s CUDA API to target the GPUs in your system. It exploits the performance of the many parallel processors in modern GPUs. The nature of the computation maps precisely onto the architecture of these devices, and the resulting performance (computed cells per unit of time) is not achievable on state-of-the-art CPUs. flox-GPU therefore requires a supported GPU!
Since CUDA is a vendor-specific standard, the only GPUs supported are those manufactured by NVIDIA. While we would love to support other GPUs as well, the use of CUDA does not allow for this yet. Among NVIDIA GPUs, the following product lines are supported:
- NVS (not recommended)
- GeForce
- Quadro
- Tesla
A few remarks about the different product lines:
NVS:
A line of multi-display GPUs, which are technically supported (see compute capability list) but not recommended because of their limited memory size and computational power.
GeForce:
While technically designed (and branded) for computer gaming, GeForce cards provide the most bang for the buck when working with flox-GPU. They exist in a wide variety of models and versions, which differ in clock speed, memory size and cooling options. Powerful models from last year’s generation are affordable and work well. The Titan/Titan X products are also part of the GeForce product line and usually provide the best combination of memory and processor speed. If they fit your budget, we recommend those.
Quadro:
The Quadro GPUs, NVIDIA’s professional workstation line, provide excellent computational power and memory configurations. The signed drivers, special rendering advantages (think CAD applications) and generally higher reliability compared to GeForce make these cards quite a bit more expensive while providing little benefit for flox-GPU.
Tesla:
Designed specifically for GPGPU processing of the kind flox-GPU performs. Tesla cards provide high performance in 64-bit floating point calculations, although flox-GPU does not require this. This line of cards also provides error-correcting code (ECC) memory, which together with their moderate clock speeds makes them more reliable than GeForce cards.
flox-GPU requires an NVIDIA GPU with compute capability 3.0 or higher. The compute capability is a version number that identifies the architectural features of a GPU, which a program (like flox-GPU) can then take advantage of.
NVIDIA publishes a list of its GPUs together with the compute capability each one provides.
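If the NVIDIA driver is already installed, the compute capability can also be checked on the machine itself. The following is a minimal sketch, not part of flox-GPU: it assumes that the installed `nvidia-smi` supports the `compute_cap` query field (older drivers may not), and the function names are hypothetical.

```python
import subprocess

# Minimum compute capability stated in the requirements above.
MIN_COMPUTE_CAPABILITY = (3, 0)


def parse_compute_caps(csv_text):
    """Parse lines of the form 'name, major.minor' into (name, (major, minor))."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so that commas in the GPU name do no harm.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        major, minor = (int(part) for part in cap.split("."))
        gpus.append((name, (major, minor)))
    return gpus


def supported_gpus(gpus, minimum=MIN_COMPUTE_CAPABILITY):
    """Return the names of all GPUs that meet the minimum compute capability."""
    return [name for name, cap in gpus if cap >= minimum]


def query_gpus():
    """Ask nvidia-smi for the installed GPUs.

    Assumption: the driver's nvidia-smi knows the 'compute_cap' field.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)
```

Calling `supported_gpus(query_gpus())` on a machine with a working driver returns the names of the cards that satisfy the 3.0 requirement; an empty list means none of the installed cards qualifies.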
Which GPU model serves you best depends entirely on the domain size, the time available per simulation and, generally speaking, the circumstances under which you want to work with flox-GPU. There is therefore no perfect GPU for flox-GPU; performance depends on your simulation demands. Still, here are a few pointers:
GPU clock rate:
While the clock rate is not always the limiting factor in a simulation, faster is usually better. Most cards provide a base clock and a boost clock. As long as the card is not thermally restricted (e.g. by a poorly ventilated case, inadequate cooling or overheating of individual cards), it usually runs at its boost clock without issues.
GPU memory:
GPU memory size limits the wet domain size and, to a lesser degree, the overall domain size. flox-GPU manages memory for the wet domain dynamically, so you can model very large domains without concern if only a small percentage of the cells are actually wet (see the documentation for details). The following table gives a rough figure for the memory needed by a completely wet domain of a given size (which in terms of memory is the worst case).
domain size [cells] | estimated GPU memory required [MB] |
100×100 | 1 |
200×200 | 4 |
500×500 | 25 |
1000×1000 | ~200 |
3000×3000 | ~850 |
4000×4000 | ~1400 |
7000×7000 | ~4500 |
Turning different features in flox-GPU on or off obviously changes these numbers. They only serve as a rough estimate.
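The first rows of the table (100×100, 200×200, 500×500) correspond to about 100 bytes per cell, and most of the larger rows come out slightly below that. As a back-of-the-envelope sketch, the estimate can be written as a one-line formula. The constant of 100 bytes per cell is an assumption read off the table, not a figure taken from flox-GPU itself, and the function names are hypothetical.

```python
# Assumption: roughly 100 bytes per wet cell, read off the table above.
# This is not an official flox-GPU constant.
BYTES_PER_CELL = 100


def estimate_wet_domain_mb(nx, ny, bytes_per_cell=BYTES_PER_CELL):
    """Rough GPU memory estimate in MB for a completely wet nx-by-ny domain."""
    return nx * ny * bytes_per_cell / 1_000_000


def fits_on_gpu(nx, ny, gpu_memory_mb, bytes_per_cell=BYTES_PER_CELL):
    """Check whether the worst-case (fully wet) estimate fits into GPU memory."""
    return estimate_wet_domain_mb(nx, ny, bytes_per_cell) <= gpu_memory_mb
```

For example, `estimate_wet_domain_mb(500, 500)` gives 25.0 MB, matching the table. Because enabled features change the real footprint, treat the result as an order-of-magnitude check only.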
GPU cooling:
Generally speaking, you want your GPU to run as cool as possible. A long simulation can keep the GPU under high load (80% and more) for the whole duration of the run, which can result in relatively high temperatures. Modern hardware is fully capable of handling these temperatures under load for an extended period of time. However, if the case or obstructed airflow limits how much heat can be carried away from the GPU, the card will throttle its clock or shut down completely before it overheats and damages itself. Avoid this with a good cooling solution. If you plan to use several GPUs in the same workstation, make sure to either space them adequately or use a sufficiently powerful cooling solution. Reference coolers usually provide better cooling in a workstation with three or more GPUs, while custom (so-called “aftermarket”) coolers usually work best in single- or dual-card setups.
What else should I consider for my workstation besides the GPU?
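To see whether a card throttles during a long run, its temperature, clock and load can be sampled with `nvidia-smi`. The sketch below parses the output of `nvidia-smi --query-gpu=temperature.gpu,clocks.sm,clocks.max.sm,utilization.gpu --format=csv,noheader,nounits`. The thresholds (80% load, 80% of the maximum clock) are illustrative assumptions, not values recommended by NVIDIA or flox-GPU, and the function names are hypothetical. Cards that report `[N/A]` for a field are not handled.

```python
def parse_gpu_status(csv_text):
    """Parse lines of 'temperature, SM clock, max SM clock, utilization'.

    Expects one line per GPU with plain integer values (nounits format).
    """
    status = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        temp_c, clock_mhz, max_clock_mhz, util_pct = (
            int(field.strip()) for field in line.split(",")
        )
        status.append(
            {
                "temp_c": temp_c,
                "clock_mhz": clock_mhz,
                "max_clock_mhz": max_clock_mhz,
                "util_pct": util_pct,
            }
        )
    return status


def looks_throttled(gpu, min_util_pct=80, clock_ratio=0.8):
    """Heuristic: a busy GPU running well below its maximum SM clock.

    Both thresholds are assumptions chosen for illustration.
    """
    busy = gpu["util_pct"] >= min_util_pct
    slow = gpu["clock_mhz"] < clock_ratio * gpu["max_clock_mhz"]
    return busy and slow
```

A card that is flagged repeatedly while its temperature stays high is a candidate for better cooling or more spacing.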
CPU:
Generally speaking, the speed of your CPU is negligible compared to the speed of your GPU when tuning for performance in flox-GPU. The CPU is typically busy less than 10% of the time and idle for the rest. Any current mid-range CPU will work fine with flox-GPU.
Memory (CPU):
Most simulations are limited by the memory on the GPU (state-of-the-art GPUs can have up to 12 GB of memory) and not by the memory available to the CPU. However, exceptionally large domains can use a lot of CPU memory too, so if you plan to work on very large domains, you will need a lot of RAM. As a rule of thumb, we recommend system RAM of about 2-3 times the amount of GPU memory, taking into account that the machine is not only running flox-GPU.
Hard drive:
When working locally on a machine, input and output (especially for large simulation domains) benefit from fast read and write operations on the disk, so an SSD or similar is advisable. Similarly, when working over a network, the connection speed can affect the user experience.
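The rule of thumb is simple arithmetic and can be sketched as follows. The function name is hypothetical, and the factors 2 and 3 are taken directly from the recommendation above.

```python
def recommended_ram_gb(gpu_memory_gb, low_factor=2, high_factor=3):
    """Return the (lower, upper) system RAM recommendation in GB.

    Applies the 2-3x rule of thumb to the given amount of GPU memory.
    """
    return gpu_memory_gb * low_factor, gpu_memory_gb * high_factor
```

For a 12 GB card this gives a range of 24 to 36 GB of system RAM.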