| Are there significant research advantages to super-fast turnaround (less than 15 minutes, enabled by massive parallelism) in this domain?
No, the 15-minute turnaround time isn't important for the dataset we have at the moment, but showing that it was possible was considered important from a science perspective because of the upcoming LSST telescope. LSST will generate an amount of data equivalent to our entire current dataset every 3-4 days, so we needed to demonstrate scale in order to show we can accommodate that, as well as future planned extensions to the algorithm. The actual science runs by the project are usually done on a few hundred nodes over a couple of hours.
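To put rough numbers on that headroom, here's a quick back-of-envelope sketch using only the round figures above (the exact values are illustrative):

```julia
# Rough keep-up arithmetic, using only the round numbers mentioned above.
arrival_interval_h = 3.5 * 24   # LSST: a full-dataset-equivalent every ~3-4 days
science_run_h      = 2.0        # today's science runs: a couple of hours

headroom = arrival_interval_h / science_run_h
println("Keep-up margin: ~$(round(Int, headroom))x")
# => ~42x: the margin that planned algorithm extensions would eat into,
#    which is why demonstrating much larger scale mattered.
```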
| Do you feel like a massively parallel system with Xeon Phi nodes is a good match for this problem? Or did the code get optimized to run at high scale on Cori Phase II because that's where you were given compute resources?
Cori Phase II worked well for this problem, though I wouldn't be surprised if GPUs would have been a better fit (though harder to program, of course, and at the time the Julia GPU infrastructure probably wasn't quite ready yet - even KNL was a struggle, since LLVM was still completing support for it). The Celeste project is still ongoing (focused on science goals more than extra parallelism or performance improvements at the moment), and I wouldn't be surprised if there were an attempt to run on Summit at some point, especially now that Julia's GPU compiler is much more mature.
Actually, one of the biggest problems we failed to anticipate was getting the data from disk to the compute units quickly enough. Early in the project we crashed the interconnect on the machine, so for the challenge run we weren't allowed to do anything other than pull the data directly from disk (lest we bring down the machine again while other challenge runs were ongoing). I haven't really looked at the interconnect on Summit, so I can't say how well it would handle that.
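To give a sense of why that bites, here's a back-of-envelope sketch of the sustained read rate a 15-minute pass implies (the ~50 TB dataset size is an assumed round figure for illustration, not an exact project number):

```julia
# Back-of-envelope: sustained storage bandwidth implied by a 15-minute pass.
dataset_tb   = 50.0       # assumed round figure for the image set, in TB
turnaround_s = 15 * 60    # 15-minute target, in seconds

rate_gb_s = dataset_tb * 1000 / turnaround_s
println("~$(round(rate_gb_s, digits = 1)) GB/s sustained from storage")
# => ~55.6 GB/s aggregate, which has to come straight off disk if you
#    can't fan the data out over the interconnect.
```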
| Does this approach effectively scale down to e.g. a university that can afford a large storage array and beefy commercial servers?
Yes, it scales down fairly well. In fact, you could probably do it with spot instances on a public cloud. The biggest challenge would once again be getting the data to the compute units quickly enough; that's quite demanding on the network (and ideally you want to pre-stage the data in memory). It's certainly feasible to run this over the SDSS dataset on a large-ish university cluster in a few hours. It's probably less feasible on LSST data once that comes online, but maybe by that point improvements in compute and storage speed will have made up for it and it'll become feasible again.
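For what it's worth, the pre-staging idea is simple to express with Julia's Distributed stdlib. This is only a minimal sketch with a made-up shard layout (`shard_1.dat`, ...) and a placeholder compute step, not the project's actual I/O code:

```julia
using Distributed
addprocs(4)    # or point this at your cluster / spot-instance workers

@everywhere function stage_shard(path::AbstractString)
    # Each worker reads its own shard straight into memory once,
    # so the later compute pass never waits on disk or the network.
    return read(path)   # Vector{UInt8}
end

# For a self-contained demo, write a few dummy shards to a shared path.
for i in 1:nworkers()
    write("shard_$(i).dat", rand(UInt8, 1_000_000))
end

# Stage one shard per worker; each entry of `staged` is a Future whose
# data lives in that worker's memory.
staged = [remotecall(stage_shard, w, "shard_$(j).dat")
          for (j, w) in enumerate(workers())]

# The compute step (here just a byte count as a placeholder) then runs
# against data that is already resident on each worker.
results = [remotecall_fetch(fut -> length(fetch(fut)), w, staged[j])
           for (j, w) in enumerate(workers())]
println(results)
```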
>No, the 15-minute turnaround time isn't important for the dataset we have at the moment, but showing that it was possible was considered important from a science perspective because of the upcoming LSST telescope. LSST will generate an amount of data equivalent to our entire current dataset every 3-4 days, so we needed to demonstrate scale in order to show we can accommodate that, as well as future planned extensions to the algorithm. The actual science runs by the project are usually done on a few hundred nodes over a couple of hours.
I've heard that the folks working on the EHT array need months to crunch numbers. Could something like this be used to speed up that process? Or is there some other reason that would prohibit that?
I don't know. I'd imagine the folks working on the EHT are already making use of plenty of HPC for image reconstruction. It's quite a different problem from the Celeste application, of course, so this work isn't directly applicable, but if they ever wanted to rewrite their code in Julia, they should give us a shout ;).