Supercomputers for Weather Forecasting Have Come a Long Way (top500.org)
100 points by jonbaer on March 1, 2017 | 87 comments



    Currently the most powerful weather forecasting computer is the UK Met Office
    machine, a Cray XC40 machine that delivers 6.8 petaflops and sits at number 11
    on the TOP500. 
But this does not run the production forecast; that is done on one of two smaller machines, each capable of just over 3 petaflops.

Source: I work for Cray at the UK Met Office.


So the XC40 is used for research purposes? I'd be very interested to know more about your workflow.


That XC40 system is for collaboration work, but it also serves as a test bed of sorts: as it's so much larger, users can try scaling up models on it.


What is the 'bigger' one used for then?

Edit : Answered while I was posting this


This is a very good point. Large clusters are typically shared resources, which is a big problem for the US forecasts, since we have to balance time/CPU against timeliness of results.


Well, production runs on one of two identical machines. This kind of redundancy is not uncommon for weather sites. One of these will run the production version of the forecast and the other will be running version n+1. We flip between the two semi-regularly to ensure that, if one machine were to have a major issue, production would be unaffected when forced to move to the other machine. Smaller weather sites may have a single machine partitioned into two.


Please can you tell us what forecasts it is running then?


What does it do then?


Is it running fortran? Which version?


It runs Cray Linux (based on SUSE) and assuredly somebody is running Fortran on it in some form.

Many people think these large machines are intended for a small handful of users that run extremely large jobs. That's typically not the case; they typically have tens of users per day, up to hundreds per week, depending on the institution. With diverse users come diverse software and the challenges that go with it.


Not sure exactly what you're after here. The systems have assorted versions of Fortran compilers from both Cray and Intel. With regard to the versions of the language standard used/required by the models, I'm not sure.


A reason for the slow adoption of accelerators is that these code bases are pretty old and mostly Fortran based. I don't think we are going to see a shift to accelerator-based weather models any time soon: most scientists working on these models don't have the experience with, or don't want to work in, a programming methodology that works well with accelerators (at least that's my experience). Also, in a lot of cases people just prefer to work with an already existing code because they are just doing a PhD and don't want to translate everything to a new language.

The COSMO model which I'm working on is 18 years old. We ported the compute intensive part of the model to C++ using a stencil library and integrated OpenACC pragmas (think OpenMP, but for GPUs) in the rest of the code base.

I'm not a big fan of OpenACC, because it requires the user to make assumptions about the underlying hardware, and it takes quite a bit of thinking to get high-performance code. It is quite time consuming to integrate the pragmas so that the code performs. OpenACC-capable compilers used to be very unstable; we regularly got compiler breakages and regressions. It got a bit better since we started sending the code to the vendors, but we still see regressions from time to time. All (usable) OpenACC implementations are proprietary, so we have a vendor dependency. The PGI OpenACC binaries used to be pretty slow, but they are now almost up to par with the Cray binaries.
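
For readers who haven't seen OpenACC, here is a minimal, hypothetical sketch of what an annotated Fortran kernel looks like (a toy five-point stencil, not COSMO code; array sizes and data clauses are made up). The directives are plain comments to a compiler that doesn't speak OpenACC, so the same source still builds and runs on the CPU:

    program acc_stencil
      implicit none
      integer, parameter :: nx = 512, ny = 512
      real :: phi(nx, ny), phi_new(nx, ny)
      integer :: i, j

      call random_number(phi)
      phi_new = phi

      ! Offload the loop nest to the accelerator; data clauses control host/device copies
      !$acc parallel loop collapse(2) copyin(phi) copy(phi_new)
      do j = 2, ny - 1
        do i = 2, nx - 1
          ! five-point average of the neighbours, a classic stencil kernel
          phi_new(i, j) = 0.25 * (phi(i-1, j) + phi(i+1, j) &
                                + phi(i, j-1) + phi(i, j+1))
        end do
      end do
      !$acc end parallel loop

      print *, 'sample interior value:', phi_new(nx/2, ny/2)
    end program acc_stencil

Much of the tuning effort described above goes into exactly those data clauses: placing them so arrays are not shuffled between host and device on every kernel launch.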

The successor model to COSMO, ICON, developed at DWD and MPI, is also written in Fortran using OpenACC pragmas (and I think OpenMP pragmas, because they plan to use it on Xeon Phi). The code is interesting in the sense that they are using an icosahedral grid, which stands in contrast to the square grid COSMO uses: data accesses are not straightforward. Keep in mind that Fortran does not have easy abstractions for data accesses outside of square grids, so you have to use a couple of macros/functions/etc. to get to the fields you need. Disclaimer: I have not seen the code myself, but the DWD certainly knows how to write high-performance code, so I'm certain most of the code is optimized.


Hi! I'm actually working on an OSS transpiler project that is made to convert CPU-only array based Fortran applications such that they can run on both CPU and GPU. It's been in the works since 2012 and I'm just now gearing up for a 1500x1300 2km run on a Japanese supercomputer, results will get submitted soon.

https://github.com/muellermichel/Hybrid-Fortran


Oh, that's pretty neat. A colleague of mine is working on a similar project applying directives to allow loop reorderings for OpenACC and OpenMP code (the CLAW project). He's using the Omni-Compiler though.


You mean VC? Yup, he's visited me in Tokyo, we've talked about the two projects ;-)


Haha, yes exactly I meant him. The world is small :)


Do you work at MeteoSwiss perchance?


Yes


Ha, thought so; it's a small world ;)

MeteoSwiss utilizes K80 GPUs in the system hosted at CSCS though, right? Or were you saying you don't expect a shift to accelerator-based weather models as a general trend?


Yes, exactly: Piz Kesch and Escha. The code is currently also run on Piz Daint on Pascal. Piz Kesch/Escha is pretty special because the system has 16 GPUs per node, which makes the MPI configuration pretty hairy compared to one-GPU-per-node systems like Daint.

I meant it as a general trend. There's interest from the community to run the GPU model on their system but unfortunately the official COSMO code is not fully GPU ready yet.


Cool, thanks for replying. Didn't know you guys were actually running anything on Daint (I'd heard tests were planned). Your earlier comment was the first time I'd heard of the ICON model too. Quite an informative day on HN.


You're welcome. This is what they presented last year (very nice stuff!): ftp://ftp-anon.dwd.de/pub/DWD/Forschung_und_Entwicklung/CUS2016_presentations_PDF/06_Dynamics_and_Numerics/01_Invited_Z%E4ngl/ICON_CUS_2016.pdf (there's a seminar at DWD next week where they will probably give an update). The ICON model is also interesting because they are able to interleave a local-area model over Europe with their global model.

Well, it's actually mostly Jenkins that runs the code for us on Daint. ETH, which I'm also affiliated with, is starting to use the model in GPU mode for climate runs on Daint.

What are you working on at Cray?


> What are you working on at Cray?

I'm a sysadmin, so I get called when things go bump in the night, and I maintain the running of the systems. I also get to help out with other systems in the EMEA region.


Cool, then you are probably a person who is able to answer this question: what is a good way to mix C++ and Fortran build environments? Is there a way to mix the PrgEnv-{Cray/PGI} so that I can compile C++/CUDA code with GCC 5.x/6.x and the Fortran code in OpenACC mode with PGI/Cray?

Right now, we have quite an elaborate build environment where we first load the GCC build environment to compile the C++ code base. Then we load a Fortran build environment to compile the Fortran code base. I would rather have just one environment to work in. In the most recent Cray and PGI environments (6.0.3) I also have to add the linker path to the GCC version I used for the C++. This is to avoid linker issues with the C++11 ABI in the newer GCC compilers (somehow, the Cray/PGI environment always wants to link against old GCC libraries). It's not a big problem with our build scripts, but annoying nevertheless.


Not sure on that, sadly; our users don't really mix compilers within a build. With regard to the Cray modules, they do go out of their way to stop you mixing compilers (in terms of setting conflicts in the module files). I'll ask one of our application guys tomorrow to see if there's something I may not be aware of.


Can we take a moment to reflect on how awesome and revolutionary (in changing the world, not in its tech) the Linux kernel (and the *nix ecosystem around it) is?

I remember when the TOP500 was being taken over by Beowulf clusters (https://en.wikipedia.org/wiki/Beowulf_cluster); Linux made that happen. At the time (and today) it ran on everything from small embedded systems to some of the world's fastest supercomputers, and practically everything in between. Incredible compared to the alternative contemporary OSs.


For all its warts, in my opinion Linux (and the *BSDs) are some of the greatest software engineering efforts of our time. I think we'll be lucky to ever see another FOSS kernel reach the same level of ubiquity as Linux in our lifetime.


I've always wondered how meteorology works in terms of computer models. As someone who knows nothing about it, what's some good material to start learning how it works?

Now that we have powerful computers and lots of data freely available (well, maybe less now thanks to Trump), I've always dreamed of running an old computer model, say from the 1980s, at home.


You can do better than the 1980s - the WRF (Weather Research and Forecasting) model [1] is up to date and will run on a PC (albeit not at super high resolution). You can provide boundary conditions with freely available data from NOAA [2].

However, if you want to run the kind of forecasts that a weather center would do, your big issue would be getting input data. Some of the biggest advances in forecasting have come from improving the amount of satellite data being 'assimilated' into weather forecasts - look at how much better the southern hemisphere became compared to the northern hemisphere during the early satellite era (1980-2000). Getting this data in a timely fashion is more of a challenge to do at home.

[1] - http://www2.mmm.ucar.edu/wrf/users/

[2] - https://nomads.ncdc.noaa.gov/

[3] - Page 3 in http://www.ecmwf.int/sites/default/files/elibrary/2012/14553...


On this topic, you can now run WRF in Docker [1] [2], something that I have successfully done in the past on my MacBook Pro. I was even able to generate NCL [3] images from the output. However, I don't think you can truly take advantage of HPC capability via the Docker route the way you can if you have direct access to a supercomputer. Nevertheless, this makes the lives of graduate students the world over much easier, since installing WRF is no easy task.

[1] https://www.ral.ucar.edu/projects/ncar-docker-wrf

[2] https://github.com/NCAR/container-wrf

[3] https://www.ncl.ucar.edu/


Also, I believe most weather forecasting makes good use of ensemble runs - take your initial data set from observations, introduce N sets of appropriately sized random variations in the initial data, and run N simulations. Then you look, e.g., at the spread of results, which gives you information about the certainty of your prediction.
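
As a toy illustration of that recipe (the member count, perturbation size, and the chaotic map standing in for the forecast model are all made up), a sketch might look like:

    program ensemble_sketch
      implicit none
      integer, parameter :: n_members = 20, n_steps = 30
      real, parameter :: sigma = 1.0e-3   ! assumed initial-condition uncertainty
      real :: x(n_members), u1, u2, pi, ens_mean, ens_spread
      integer :: k, t

      pi = 4.0 * atan(1.0)

      ! Perturb a single "analysis" value with Gaussian noise (Box-Muller transform)
      do k = 1, n_members
        call random_number(u1)
        call random_number(u2)
        x(k) = 0.4 + sigma * sqrt(-2.0 * log(1.0 - u1)) * cos(2.0 * pi * u2)
      end do

      ! Step every member forward; a chaotic logistic map stands in for the model
      do t = 1, n_steps
        x = 3.9 * x * (1.0 - x)
      end do

      ens_mean   = sum(x) / n_members
      ens_spread = sqrt(sum((x - ens_mean)**2) / n_members)
      print *, 'ensemble mean:', ens_mean, '  spread:', ens_spread
    end program ensemble_sketch

The spread of the members is the "certainty" signal: a tight cluster means the forecast is robust to initial-condition errors, a wide one means it isn't.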


I suppose these perturbations would need to be extremely small, because of the butterfly effect.


The butterfly effect is overrated. There is always divergent behaviour in the long term, but most of the runs will come out very similar. In fact, it is a major problem to introduce enough variation into the parameters of the ensemble run to capture the natural variation which is observed by the data acquisition systems. The US systems use vector breeding and European systems tend to use the singular vector method, which as I understand it is a bit more effective.

http://journals.ametsoc.org/doi/abs/10.1175/2008MWR2498.1


I'm guessing they use something like Gaussian noise with a width approx. equal to the uncertainty in the initial conditions.


"Bred vectors" are an example: https://en.wikipedia.org/wiki/Bred_vector

Note that these vectors are generated not by just sampling the uncertainty in I.C.'s. Of course, this is because the space of I.C. perturbations is too high-dimensional to cover. In the method above, selection of perturbation direction is based on the adjustments implied when new data is sync'ed to the model.
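
A toy sketch of that breeding cycle (again with a chaotic map standing in for the model, and a made-up amplitude): integrate a control and a perturbed state for a short window, rescale their difference back to the chosen amplitude, and reuse it as the next cycle's perturbation. In a real model the difference is a full 3D field whose structure picks out the growing directions; with a scalar toy the "direction" collapses to a sign, but the cycle is the same.

    program bred_vector_sketch
      implicit none
      real, parameter :: amp = 1.0e-4      ! assumed (made-up) perturbation amplitude
      real :: control, perturbed, diff
      integer :: cycle_no, t

      control   = 0.4
      perturbed = control + amp

      do cycle_no = 1, 10
        ! Short "forecast" of both states with a chaotic map standing in for the model
        do t = 1, 5
          control   = 3.9 * control   * (1.0 - control)
          perturbed = 3.9 * perturbed * (1.0 - perturbed)
        end do
        diff      = perturbed - control
        diff      = sign(amp, diff)        ! rescale back to the chosen amplitude
        perturbed = control + diff         ! bred perturbation for the next cycle
        print *, 'cycle', cycle_no, ' bred perturbation:', diff
      end do
    end program bred_vector_sketch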

There are other techniques.


Well honestly you have to start with the science & physics. The 'model' (call it numerical weather prediction) is just a bunch of physics coded up and plugged together. For instance, you have a cloud parameterization model that describes cloud formation & precipitation, which interacts with a land surface model, which interacts with an ocean model, etc.

These models are mainly just computer-code translations of physical equations. You get an equation that models how air (a fluid) moves, code it up, and set up a numerical process to integrate it over small time steps, as in the toy sketch below. Depending on how many assumptions you want to make, the equations can be relatively simple or highly complex.
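
A hedged toy version of "code up an equation and step it forward in time": 1D linear advection, du/dt + c du/dx = 0, with a first-order upwind scheme (grid size, wave speed, and step count are all arbitrary). Real dynamical cores solve far richer equation sets, but the time-stepping skeleton is the same:

    program advect1d
      implicit none
      integer, parameter :: nx = 200, nsteps = 400
      real, parameter :: c = 1.0, dx = 1.0 / real(nx)
      real, parameter :: dt = 0.4 * dx / c           ! CFL-stable time step
      real :: u(nx), u_new(nx)
      integer :: i, n

      ! Initial condition: a smooth bump on a periodic domain [0, 1)
      do i = 1, nx
        u(i) = exp(-200.0 * (real(i - 1) * dx - 0.3)**2)
      end do

      ! March forward in time with a first-order upwind difference
      do n = 1, nsteps
        u_new(1) = u(1) - c * dt / dx * (u(1) - u(nx))   ! periodic boundary
        do i = 2, nx
          u_new(i) = u(i) - c * dt / dx * (u(i) - u(i-1))
        end do
        u = u_new
      end do

      print *, 'bump maximum after advection:', maxval(u)
    end program advect1d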

In short, it's fuckin insane. This has some good stuff in it https://www.ral.ucar.edu/projects/armyrange/references/forec...


> In short, it's fuckin insane.

Just take a moment to consider the feedback loops: huge cloud cover blocks sunlight, so the ground temperature is cooler, so ground air pressure is lower, which probably acts like a sponge to pull in any nearby clouds, which would then increase chances of precipitation. I would imagine the time of day matters too, because cloud cover at night can trap in radiant heat... fucking insane.

I wonder if they put an open bounty on the problem, say for specific geographical regions: find a better prediction scheme for city X, win $Y


Don't worry, meteorologists are brutally competitive in their own way. They don't say anything, they just put up a chart comparing forecast accuracy.


Kinda disappointed by the lack of historical data about the sort of supercomputers used for weather modelling over the years. Mainly because I want to know which era of weather forecasting could conceivably run on my PC, my phone, or the latest Raspberry Pi.


As I recall from when I followed the supercomputer space in more detail, weather was one of the areas that tended to use specialized supercomputers like IBM BlueGene rather than big clusters because the weather models were harder to parallelize. Go back in time and I'm sure you'll see quite a few high-end IBM pSeries. Go further back and there will be things like Cray vector machines.


No, it is inherently a highly parallelizable problem; it is just that the code was originally written for the original vector supercomputer CPUs (Cray, NEC SX) and had to be rewritten for MPI-style clusters.

It's ironic because now we have to rewrite everything to use SIMD/vector accelerators (GPUs).


I windsurf, and wind prediction is absolutely horrible, especially in areas with complex terrain. I understand why it's hard: it's a 3-dimensional problem, pretty much all weather stations are sitting on the surface, and the wind ones are quite sparse.

We need 10x-100x more stations/sensors for the data to be good. I've been thinking of ways to create a cheap solution for a simple weather station that would ping the data once a minute to a central server over something like GSM, but I'm more of a software guy. Determining wind strength/direction is not that hard, but making it all weatherproof is.


Aviation is all over this. Most airports have an Internet-connected weather station reporting wind speed, wind direction, temperature, and even dewpoint (which is a measurable quantity). This raw data is fairly high quality, publicly available in a standard format, and fed into weather processing systems. Another data source that comes from aviation is airplanes, especially airliners, because the airline companies really want to avoid areas of strong turbulence. So they mandate their pilots to make pilot reports of wind speed, temperature, and turbulence to air traffic controllers, who in turn make them available to weather forecasters (another data point; again, a standard format and reasonably well trained reporters make for pretty good quality data overall).

There's a huge community already feeding data into Weather Underground. I am not sure if NOAA uses Weather Underground data, but I suspect they do not, because the quality of the data is not very reliable (station mount, model, terrain in the immediate vicinity of the station, etc.).

Weather balloons and satellites fill in most of the rest of the data sources.

All weather models are 3D. Terrain is a major factor in weather. The weather forecast is usually pretty good in the time window of the next 24 hours around very specific places, such as the TAF report for major airports, which covers the 2 miles around the airport and resolves weather to within a 2-hour time frame. Of course, you likely won't be windsurfing there, and I suspect you want higher-resolution weather, such as to within 1/4 mile, with a 30-minute event bracket, 7 days out. Wouldn't we all!


1) You're confusing temperature prediction (which is indeed not bad, and is what most people want) with wind prediction.

2) Yes, the models are 3D, but the data is 2D and is sparse. I'm fully aware of the airport wind data, and it's not even close to being enough, because the number of airports itself is very limited, especially outside of the US.


No, wind prediction is a major part of weather forecasts, and is particularly critical for aviation. I don't think I missed your question here. You seemed to want to get raw data... and I wanted to make sure you knew that there is already a community (Weather Underground).


No, I want more data, a lot more. I know the existing data is mostly available.


There is a startup that has designed an ultrasonic sensor that can make many measurements of the wind field above the instrument. The instrument is on a smallish trailer, towed by a pickup to the test site. Once there, it can take measurements of the wind field up to 200m or so above the trailer. The idea is to make a cheaper replacement for the met towers that wind turbine site surveys use today.

It's much more expensive than a ground-level mechanical anemometer, but also much less expensive than a tower with an anemometer.


AFAIK, there are not that many large-scale operational retrievals of wind velocity by remote methods. The ultrasound technique you mention for remote retrieval (which was news to me) is apparently based on this kind of rig: https://en.wikipedia.org/wiki/SODAR

Ocean surface winds are retrieved on vast scales by satellite measurements, due to their effect on the surface roughness, which is measured by radar (e.g., https://manati.star.nesdis.noaa.gov/products.php). But that's an exceptional case.


Which start-up?


My information was out of date. The startup was Second Wind. They have since been bought by Vaisala, and their product currently ships under the Triton name.


Airports where wind is especially problematic (e.g. HKG) use special radars to look at the turbulence. There is some research on running a local high-res model with radar used for initialisation, but the short timescales make it difficult.

http://www.atrad.com.au/products/wind-profilers/boundary-lay...

Also, SODAR: http://www.vaisala.com/en/energy/Weather-Measurement/Remote-...


Balloons.

The trick is to make them so cheap they're disposable. Preferably biodegradable, though that'd be tough for the electronics. Weatherproofing is impossible.


Or attach them to large cargo ships?


That would get us data for areas where almost no humans are nearby. Not too valuable.


Wrong! We want data from offshore so we can model what happens when the system moves over land. Also, shipping is valuable and has humans who can drown.


Couldn't every wind turbine be a wind sensor? Sure, they are not distributed evenly, but at least here in Europe pretty much all wind-rich areas are full of them, so they could provide some data that is obviously available, as they have to be managed centrally somehow.


Here's a great (paywalled) Review on numerical weather forecasting: http://www.nature.com/nature/journal/v525/n7567/full/nature1...


It seems like weather forecasting has actually gotten really good over the last 30 years, except for snow. Are there any good statistics for relative accuracy out say 1-3 days?


Weather forecast accuracy is a statistic anyone can measure and generate. Put the forecasts and the actual weather outcomes together in a spreadsheet and examine the results each day for the 1-, 3-, and 5-day forecasts. NOAA and the UK Met Office both do this for all of their forecasts; it's called "verification", and they measure their forecasting bias. In general, the statistics I've seen suggest the 1-day forecasts are ~95% accurate, 3-day forecasts are ~80% accurate, and 5-day forecasts are ~70% accurate. This may vary if you live in a hard-to-forecast area.
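
As a back-of-the-envelope illustration of that spreadsheet exercise (the temperatures below are invented, not real verification data), the bias, mean absolute error, and a simple "within 3 degrees" hit rate fall out of a few array operations:

    program verify_sketch
      implicit none
      integer, parameter :: n = 7
      real, parameter :: forecast(n) = [21.0, 18.5, 25.0, 30.0, 12.0, 15.5, 22.0]
      real, parameter :: observed(n) = [22.5, 18.0, 23.0, 31.0, 10.5, 16.0, 21.0]
      real :: err(n)

      err = forecast - observed

      print *, 'bias (deg):               ', sum(err) / n
      print *, 'mean absolute error (deg):', sum(abs(err)) / n
      print *, 'hit rate within 3 deg:    ', real(count(abs(err) <= 3.0)) / n
    end program verify_sketch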

For example, surface temperature forecast verification for Jan 2017: http://www.mdl.nws.noaa.gov/~verification/ndfd/index.php?mo_...


That does not seem like a great way to test for accuracy. ed: as described.

I can predict Hawaii's temperatures out 20 years based on historical data much better than Chicago's. So some adjustment for area variability should be accounted for. People also care a lot more about how far off the temperatures are. So, being off by 5 degrees every day is better than being off by 1 degree for 4 days and then 20 for the 5th.

I assume there are some adjustments for this stuff?



What constitutes a hard to forecast area? Any examples?


A lot of variability in weather. Multiple weather systems tending to come together. It's a lot harder to forecast Boston than it is Las Vegas. It also depends what you're trying to forecast. If you live in a very dry area, you're going to be right about precipitation the vast bulk of the time.


Sure. The most challenging areas to forecast are typically areas with poor data availability (mountains) or with topographic features that impact the weather significantly (the Great Lakes affect winter forecasts significantly). I imagine the Rocky Mountains and Great Lakes region both have lower than average forecast accuracy.


I've always wondered why so many weather websites make it hard to compare actual vs predicted weather forecasts for historical data.


Possibly they don't want to make it easy to determine their accuracy rate.


People are always happier being grumpy and ignorant than informed and accepting.


It can be pretty amazing. I subscribe to a specialized forecasting service for glider flying, which takes the output from the big models and uses it to derive parameters useful for flying.

I remember flying on one day when the forecast said that I'd be able to climb to 3,000ft in one spot, and to 6,000ft in another spot only about five miles away. In the air, I tried the first spot because it looked better to me. I topped out at 3,000ft. I moved over to the second spot and sure enough, 6,000ft. It was pretty amazing how good that one was.


What's the software/service that you were using?


It's called XCSkies: http://www.xcskies.com/


So snow is super hard to forecast because of snow bands. These happen at the mesoscale, and typically span only 1-2 grid points of the models used to forecast. The exact processes that contribute to banding are not very well measured, either, so it's very difficult to get a good prediction for it.

Due to the banding, you can get extreme variations in snowfall, leading to, "I thought we were supposed to get a foot, but we only got an inch!" type of comments. Or, bands will line up with the storm motion and just pelt an area harder than expected.

Example: http://imgur.com/i0j4I7g


The Updraft blog by Minnesota Public Radio just had a small piece about how the various weather models change and what that means for local snowfall.

http://blogs.mprnews.org/updraft/2017/02/snowcover-view-from... - under "GFS model shifted". They do similar articles semi regularly.


In general, precipitation seems to be one of the tougher areas, especially as it can be very localized. With snow, you then also throw in rain/snow lines which, where I live in New England, are very commonly somewhere within the greater Metro area.

But in the big picture sense, and for things like hurricanes, forecasting has improved a lot. It's unlikely we'd be caught by surprise today as happened with the Blizzard of '78 when massive numbers of cars got stuck on the highway and had to be abandoned or people were stuck at their offices for a week. (The weather events still occur of course but they're less likely to catch people unprepared.)


That happened in Chicago in 2011 - a blizzard caused hundreds of cars to get stuck on Lake Shore Drive. I'm not sure if it wasn't forecast or people were just stubborn.


>or people were just stubborn

It's probably less true than it used to be but, for years, a lot of people in Boston were really paranoid about big snow storms. Everyone knew someone, or at least knew someone who knew someone, who had been evacuated from Route 128 or slept in their office for a week.

About a decade later a friend of mine moved to Boston and she once told me that she had never seen a northern city where people took off home from work at the first snowflake the way they did in the Boston area.


Coming from upstate New York, I have accepted the longtime reality that the weather will always be shittier than expected, so we pack according to probable shittiness instead of expected forecast.

My girlfriend is from Florida, and even after three years of living here she is still unused to the habit of keeping ice scrapers, umbrellas, rain boots and an extra jacket in the car at all times.


"Welcome to Syracuse, where the seasons don't matter and the weather is made up"

Upstate weather is a mystery. When I was in school we had 80 degree weather towards the end of the spring semester before a small snowfall the day before graduation.


A one-day temperature forecast is now typically accurate within about two to 2.5 degrees, according to National Weather Service data. https://www.washingtonpost.com/opinions/five-myths-about-wea...


Could someone please give some references/publications for the algorithms, models, and implementation strategies currently used to run weather forecast predictions on these machines?


https://mpas-dev.github.io/

This is a model that didn't quite make it as the replacement for the GFS in the US. It had broad backing from the research community, and if you follow their references back you'll find a wealth of research.


http://www.wrf-model.org

You can run a version of it yourself.


Apparently perfect simulation will require zettaflops!

We can't even achieve exascale right now! (But we'll get there)


There is no such thing as perfect simulation.

And, it's more likely that when people realize how incredibly hard it is to build exascale machines, and how unproductive they will be, they'll just end up training deep learning models which approximate the perfect simulations well enough and cost-effectively enough on commodity GPUs that nobody will buy or build supercomputers any more.


Baidu estimates it takes ~20 ExaFLOPs to train a deep net. Consider that it's not uncommon to churn through ~100 hyperparameter combinations (let alone architectures), especially for something as high end as this, and you realize that deep learning is no escape from the compute problem!

There's a reason why the big DL players hire lots of HPC guys for deep learning. Also, parallelizing SGD is a highly non-trivial task; until very recently, you had to linearly lower the learning rate as you added new workers. And keep in mind that bandwidth is far more expensive (economically, temporally, and energy-wise) than compute.

That being said, many people are already exploring deep learning for weather simulation (such as Yandex, I believe) and it has worked very well; it is near or beating SOTA, IIRC, so I definitely think there's a future there.

For the record, I think exascale is achievable, but not with the current architecture (admittedly I'm biased, as the founder of a deep learning and supercomputing chip startup); I think the objective evidence is pretty strong that with a new architecture, exascale is possible. On the other hand, zettascale may be the end of supercomputer scaling. Moore's Law, Dennard scaling, et al. are no saviors either, since communication and scheduling logic now dominates the cost of computation.


You mean "20 ExaFLOP". "20 ExaFLOPs" is a per-second measure.

FWIW, I work at Google, have a background in parallelizing simulations, and built an execution system that runs huge hyperparameter combinations. We called it "exacycle" because it exceeded 1 exaflop / second (no communication between tasks) using only idle cycles.

I think you probably missed my point: it's now pretty well established that for any physical simulation process, you can train a net using far less energy and get an equivalent quality result. The training itself doesn't require lots of communication unless the model is enormous.


Yes, exaflop, I'm so used to writing "FLOPs" that I wrote it xD

You may have a background in parallelizing simulations, but I beg to differ about parallelizing deep learning training. There is a lot of communication involved: for one, your model is very likely to be enormous (perhaps even necessitating model parallelism), and second, there is a LOT of communication even with data parallelism.


I guarantee that communication is small compared to what the weather simulators require. Deep learning systems typically work fine on large-bandwidth, medium-latency systems. They are highly tolerant of async updates.

Most parallelism in DL uses coarse exchange, although I agree with regard to large models. That's similarly true for every supercomputer application (it's the only reason people use supercomputers now; it's much easier and faster if your problem fits on a single system).



