Introducing Zaius, Google and Rackspace’s Open Server Running IBM Power9 (googleblog.com)
116 points by kungfudoi on Oct 15, 2016 | 24 comments



IBM have done lots of work to port stuff to ppc64le (edit: wrong, see reply), which I think is the particular arch for P9. One of the reasons for introducing the -le variant was to make porting easier.

They ported Go; in my day job I've seen them porting various parts of Cloud Foundry and supporting infrastructure.

Basically I'm interested to see if Google open these up on GCP. As points of differentiation go, it would be a doozy.


> They ported Go;

No, they didn't. Minux Ma, a major Go contributor unaffiliated with IBM, ported Go to POWER.


I'm sorry. Once again I am getting wires crossed in my head between POWER and 390.


It would be nice if they would continue to maintain the big endian PowerPC branch of Go, as that appears to have atrophied in recent years.

From a software quality standpoint, older Apple PowerPC gear is probably the cheapest and highest-performance big endian hardware to test on (PowerMac and Xserve G5s are in the $100-300 range on the used market), and it's still natively supported by a lot of mainline distros, Ubuntu in particular.

The only other BE systems out there of comparable performance are SPARC systems, which are either very expensive or have low single-threaded performance (T1 and T2 based).
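
As a minimal illustration of the kind of bug that big endian testing catches, here is a sketch in Python. It is not taken from Go or from any project mentioned in this thread, and `decode_assuming_le` is a hypothetical helper invented for the example. It shows how code that silently assumes little-endian layout returns a wrong value when handed big-endian bytes:

```python
import struct
import sys

value = 0x0A0B0C0D

little = struct.pack("<I", value)  # explicit little-endian layout
big = struct.pack(">I", value)     # explicit big-endian layout
native = struct.pack("=I", value)  # byte order of the host CPU


def decode_assuming_le(raw: bytes) -> int:
    """Deliberately non-portable: always treats the bytes as little-endian."""
    return int.from_bytes(raw, "little")


# Correct when the bytes really are little-endian...
print(hex(decode_assuming_le(little)))
# ...but the bytes come out reversed when they were written big-endian.
print(hex(decode_assuming_le(big)))
print("host byte order:", sys.byteorder)
```

On a little-endian host such a helper passes every test that round-trips native data, which is why the mistake tends to surface only on a big endian machine.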


> It would be nice if they would continue to maintain the big endian PowerPC branch of Go

Who is "they"?

If you are referring to the Go project, big endian power is supported.

> as that appears to have atrophied in recent years.

What exactly has atrophied?

> Apple PowerPC [...] highest performance big endian systems available

???

POWER8 (and soon POWER9) systems are available today, in big endian mode. You don't need obsolete hardware.

Modern MIPS64 is also faster than old Apple G5s, albeit it can be tricky to get one.

ARM64 is available in server grade hardware, with current levels of performance. Unfortunately, even though ARM64 supports big-endian, nobody deploys in big-endian mode. When I did the ARM64 Go port, I did run my ARM64 hardware in big-endian mode for a while, but as that required creating my own distribution, I never committed big-endian ARM64 support to the Go port. If a big-endian ARM64 distribution ever appears, we'll definitely add ARM64 big-endian support to Go.

> The only other BE systems out there of comparable performance are SPARC systems

FYI, I am working on a SPARC64 Go port. It's in a very advanced stage. I hope it will be ready for Go 1.8. I am using a S7-2 system, 4.13GHz, 128 threads, 256GB RAM, very good single threaded performance. I can assure you it's very performant, but yes, it's more expensive than POWER.

In any case, thanks for your support! We definitely need more awareness of non-x86 architectures.


If Rackspace, given their major support for various opensource projects, were to provide POWER9 runners for, say, Gitlab CI, this could be a major help in porting software. Or they could, like IBM, provide SSH access to interested projects. But the CI part is important to ensure there's no regression, and given the scarce availability of POWER9 (or even POWER8) hardware to the general public, let alone opensource developers, Gitlab CI integration sounds like the more practical service.


IBM sponsor a fleet of POWER-based systems at the OSU Open Source Lab[1].

Edit: You already said this ("Or, they could, like IBM provide SSH access to interested projects."), and if you'll excuse me, I'm going to go hide in shame.

In my day job we've interacted with an IBM team who are porting our entire buildpacks pipeline[2][3] (which uses Concourse) to run on ppc64le. We fall under the Cloud Foundry heading in the list of projects.

The eventual goal is that we will be able to run x86 workers (on a regular commercial cloud) and some POWER workers at OSU-OSL or SoftLayer, and build both kinds of binaries from the same pipeline.

I believe the eventual eventual goal is that all Cloud Foundry pipelines and products will be fully available across both x86 and ppc64le, including first-class integration with any pipeline producing binaries. Given that buildpacks represents the bulk of the binary volume, it makes sense to ensure our entire pipeline works on ppc64le.

Disclosure: I work for Pivotal, not IBM, and I'm not able to commit either to anything.

[1] http://osuosl.org/services/powerdev

[2] https://buildpacks.ci.cf-app.com/

[3] https://github.com/cloudfoundry/buildpacks-ci


What would be the course of action for an opensource project to set up a CI worker there (ideally per-commit on X branches, not periodic) such that it could be integrated in a pre-merge check? I'm not bound to Gitlab CI runners, but it's the first thing that came to mind given the popularity of github and gitlab.


I honestly have no idea. I imagine access is mediated by OSU, not IBM. The contact page (http://osuosl.org/contact) seems like the place to start.

One of the tricky parts about running PRs is that you're running arbitrary code, for which the main threat is the exfiltration of secrets. You need to lock down the workers fairly tightly to avoid unintended consequences. I'd be interested in reading more about how Travis, CircleCI, Gitlab et al do it -- some light googling didn't turn up any specifics.

Edit: looks like CircleCI call this out explicitly and state their defences -- https://circleci.com/docs/fork-pr-builds/#security-implicati...


That CircleCI post seems to talk about different issues than the ones I see when I think about the safety of random CI jobs.

It's a non-trivial problem to solve, especially with caching of artifacts involved. You'd probably want to run a sandbox inside a vm and secure the vm itself first, while having only ephemeral storage attached. Barring a container escape via just read/write/execve allowed inside the sandbox, which could probably also be used to escape the surrounding vm, there isn't much you can do if you support running random stuff in a CI job.

Actually, maybe CI needs to be limited to tools that can run on something like ZeroVM.

Limiting persistent state and spinning up machines (vm or bare metal) for each job, while having no permanently active job runners, sounds like another defense to consider.

That said, I very much doubt any of the CI services goes to such great lengths, given the limitations involved.


> Limiting persistent state and spinning up machines (vm or bare metal) for each job, while having no permanently active job runners, sounds like another defense to consider.

I can imagine how I'd do this with Concourse, but it'd be confusingly meta in approach -- a pipeline which builds a new pipeline with a new worker for each PR.

I still think the exfiltration threat is the worst. Any secret injected into the environment of any tested codebase is vulnerable -- especially if your logs are public.
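
One common mitigation is to never hand secrets to untrusted builds in the first place. The following is a rough sketch of that idea in Python, not how Concourse or any CI service mentioned here actually does it; `SAFE_VARS`, `scrubbed_env` and `run_untrusted` are hypothetical names invented for the example. The child process only sees environment variables that appear on an explicit allow-list:

```python
import os
import subprocess
import sys

# Assumption: the only variables an untrusted job legitimately needs.
SAFE_VARS = frozenset({"PATH", "HOME", "LANG"})


def scrubbed_env(env, allowed=SAFE_VARS):
    """Return a copy of env containing only allow-listed variable names."""
    return {name: value for name, value in env.items() if name in allowed}


def run_untrusted(cmd, env=None):
    """Run cmd with a scrubbed environment and capture its output."""
    env = os.environ if env is None else env
    return subprocess.run(
        cmd,
        env=scrubbed_env(env),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    probe = "import os; print(os.environ.get('DEPLOY_KEY', 'missing'))"
    parent_env = dict(os.environ, DEPLOY_KEY="not-a-real-key")
    result = run_untrusted([sys.executable, "-c", probe], parent_env)
    print(result.stdout.strip())
```

This only covers secrets passed through the environment. It does nothing about credentials on disk, network access, or sandbox escapes, which are the harder problems raised earlier in the thread.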


> I still think the exfiltration threat is the worst. Any secret injected into the environment of any tested codebase is vulnerable -- especially if your logs are public.

Fair point, though instead of worrying about that, I think the real solution is to have test-only keys and also make sure logs can be shared without fear of leaking data.


We (buildpacks team) get some of the way by ensuring that all secrets in our logs are redacted -- we actually wrote a rough-and-ready tool (concourse-filter[0]) for this purpose. It works on a whitelist principle. Any environment variable emitted to stdout or stderr is redacted unless it appears on a whitelist[1].
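
The whitelist principle can be sketched roughly like this in Python. This is an illustration of the idea only, not concourse-filter's actual implementation; the `redact` function, the `[redacted]` marker and the sample variable names are all assumptions made for the example:

```python
def redact(line, env, whitelist):
    """Replace the value of every non-whitelisted variable found in line."""
    # Longest values first, so a secret that contains another secret
    # is replaced as a whole rather than partially.
    secrets = sorted(
        (value for name, value in env.items() if name not in whitelist and value),
        key=len,
        reverse=True,
    )
    for secret in secrets:
        line = line.replace(secret, "[redacted]")
    return line


if __name__ == "__main__":
    env = {"API_TOKEN": "s3cr3t", "HOME": "/home/ci"}
    print(redact("token=s3cr3t home=/home/ci", env, whitelist={"HOME"}))
```

Defaulting to redaction means a newly added secret is hidden without anyone having to remember to register it. The trade-off is that very short or common values (for example a variable set to `1`) get redacted everywhere unless they are whitelisted.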

You're right that in the longer run, providing per-test keys will be the safest option. It's on our radar as part of the overall "3 Rs" effort[2].

[0] https://github.com/pivotal-cf-experimental/concourse-filter

[1] https://github.com/cloudfoundry/buildpacks-ci/blob/1c345c30e...

[2] https://medium.com/built-to-adapt/the-three-r-s-of-enterpris...


Right. Unfortunately, Rotate and Repave are not common practice, just like periodically restoring backups isn't.


We're working on it. One day I expect it'll be considered normal.


Since Zaius also seems to have fully open firmware for most (all, including USB controller?) pieces, it would be nice to get something like a <$3k workstation. I would say Raptor Talos, but they seem to have their hands full with the POWER8 workstation and I wouldn't want them to divert resources to a P8 -> P9 move and further delay their project.


Dumb question: with lots of NVLink/OpenCAPI bandwidth for GPUs/FPGAs, what's all the PCIe bandwidth for? I count something like 150 GB/sec of it in that system diagram. Terabit ethernet?


Based on what's not on the board, I'd guess network and/or local storage.


Storage! Of course


Will I be able to buy this machine?


I wouldn't count on it. OCP designs seem to be targeted at companies that buy servers and switches by the truckload.



You don't seem to be new here (>3000 days). Is this really the first time you've seen that the duplicate detection algorithm doesn't work reliably on HN?

This bug is not even really a problem: it gives unlucky good submissions a second chance.


It's not a bug; resubmissions are explicitly allowed by the software for stories that haven't gotten significant attention.



