Allocate 3 nodes, each with 1 task of 24 CPUs:

```shell
salloc -N 3 -n 3 -c 24
```

Within the allocation, make sure Dask and its dependencies are available (e.g. by activating a virtual or conda environment). Start a scheduler on the first node and three workers (one per node):

```shell
$ srun -n1 -N1 -r0 dask-scheduler --scheduler-file scheduler.json &>> scheduler.log &
$ srun -n3 -N3 --cpus-per-task=24 dask-worker --scheduler-file scheduler.json --nthreads=24 &>> worker.log &
```

Start a Python prompt and connect a client to the scheduler:

```python
In [1]: from dask.distributed import Client, progress

In [2]: client = Client(scheduler_file='scheduler.json')

In [3]: from dask import array as darr

In [4]: xy = darr.random.uniform(0, 1, size=(int(1000e9 / 2 / 8), 2), chunks=(int(1e9 / 8 / 2), 2))

In [5]: pi = 4 * ((xy ** 2).sum(axis=-1) < 1.0).mean()

In [6]: pi = pi.persist()

In [7]: progress(pi)
[########################################] | 100% Completed | 1min 2.6s

In [8]: print(pi.compute())
3.141586099776
```
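
The `dask.array` pipeline above is a chunked Monte Carlo estimate of π: sample points uniformly in the unit square and multiply the fraction that lands inside the quarter circle by 4. A minimal single-machine sketch of the same estimate, using only the standard library (no Dask) and a hypothetical sample count `n`:

```python
import random

def estimate_pi(n, seed=0):
    """Monte Carlo estimate of pi: 4 times the fraction of random
    points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # seeded for reproducibility
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1.0:  # same test as (xy ** 2).sum(axis=-1) < 1.0
            inside += 1
    return 4 * inside / n

print(estimate_pi(1_000_000))
```

The distributed version does exactly this, but draws its samples as a chunked Dask array so the per-chunk counts are computed in parallel across the workers.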