ARM server market

rbanffy · on Oct 12, 2015

One crucial point when we talk about high core count parts is memory and IO bandwidths. In the end, that'll determine the overall performance of your server. I don't see ARM parts, due to their relatively low single-thread performance, as good for compute-bound workloads, but, with memory and IO channels significantly faster than current x86-based parts, they could be competitive in IO bound ones.

I'd love to see more diversity in the server space, but reality is harsh.

kev009 · on Oct 12, 2015

This is where IBM cleans shop. POWER is pretty mean when it comes to bandwidth. You pay the tax to be a couple years ahead of the Intel curve if you need it.

rbanffy · on Oct 12, 2015

Indeed, but there is no magic there. If ARM-based servers with high core count can be made with bandwidths in the same league as POWER, IBM's high-margin server business may end up being threatened.

Also, we cannot forget POWER is very good with single threaded performance.

reacweb · on Oct 12, 2015

I fully agree. With a dedicated server, memory and bandwidth are more often the bottleneck than CPU. I am using scaleway.com since 2 weeks. My small arm server runs squid (to bypass the proxy of my society) and a little static server that fits almost entirely in RAM. It works like a charm.

sheraz · on Oct 12, 2015

I've also been playing with Scaleway's ARM servers for a scraping project and to learn more about distributed computing. (details here in other HN thread https://news.ycombinator.com/item?id=10371316)

I've been able to scale my scraping to over almost 100 requests per second in aggregate across 1.8MM domains. And these little Atom quad core machines are great.

I just got my account upgraded to "developer" which mean I can spin up 100 machine (400 cores!), so I'm going to be experimenting with that.

j3th9n · on Oct 12, 2015

FWIW: The Scaleway servers are not Atom, but ARM based servers.

kabdib · on Oct 12, 2015

The ARM systems I'm familiar with have absolutely terrible memory and I/O bandwidth.

I assume these server-class ARM systems are better?

rwmj · on Oct 12, 2015

And finally a standardized platform, so you can just install any commercial Linux distro on any hardware:

https://www.arm.com/about/newsroom/arm-ecosystem-collaborate...

No more random fork of uboot, proprietary boot method and guess-the-device-tree. Instead you get the goodness(?) of UEFI, ACPI and standard minimum hardware.

wyldfire · on Oct 12, 2015

Are there any breadboard-style releases implementing this spec? I'm looking for the low(er) cost and open ecosystem that I find in ODROID and RasPi style boards, but still the robustness and standardization of this common spec.

rwmj · on Oct 12, 2015

There's a new development board coming real soon that will implement it. It is also implemented by the firmware on APM X-Gene (Mustang) and HP Moonshot. Not sure about Cavium as I've not used it.

However it's unlikely you'll find a small format SBSA-compliant board any time soon, because the server stack simply assumes a lot more RAM than is available on phone/tablet SoCs. For example, the minimum RAM required to run RHEL/aarch64 is 1 GB/pCPU (so in reality 4-8GB min), and even the Snapdragon 808 only has 3 GB. I personally wouldn't be happy doing development work for 64 bit ARM SBSA with less than 8 GB of RAM, and for OpenStack, 32 GB is the minimum I'd recommend.

(I do run Fedora/aarch64 on my LG G4 phone though :-)

apendleton · on Oct 12, 2015

I think server-class ARM is interesting, but it's always struck me as much more interesting for on-premises stuff... like, you want a cheap on-site NAS that doesn't use much power, or a generic machine you can use as a switch, or maybe even an intranet host or something. Anything in the category of stuff that is always on but not very resource-intensive.

In cloud deployments, though, sharing a bigger box amongst lots of virtual machines just seems way more economical.

esaym · on Oct 12, 2015

I kept a close eye on the ARM server market for years waiting for it to happen and bring cheap multi core boxes to the masses. Ultimately, I stopped caring when quad core Xeon boxes got down to less than $400 and draw less than 30 watts idle: http://amzn.com/B00FE2G79C

skrause · on Oct 12, 2015

I still think that 30 watts idle are way too much for at-home equipment. At German electricity prices we're still dealing with $75 a year just for electricity. What I want to see is a reasonably powerful machine that is <5 watts idle (for the board, CPU and RAM) plus the power of the installed hard disks.

creshal · on Oct 12, 2015

Right now you'd need NUC-style nettops for that kind of wattage. I'm not sure the performance qualifies as "reasonably powerful", though…

creshal · on Oct 12, 2015

Entry-level Xeons are ridiculously cheap and powerful nowadays. Our infrastructure went from $5000/server (Netburst era) to $1200 nowadays.

And even for smaller purposes, x86 SoCs are getting good, like PCEngines' 6-12W APU boards that serve nicely as routers/firewalls.

makomk · on Oct 12, 2015

Of course, the price and power consumption of those quad core Xeon boxes probably has a lot to do with Intel wanting to kill the ARM server market.

skrause · on Oct 12, 2015

And people are still saying that Intel doesn't have any competition anymore because AMD is doing so bad these days...

tracker1 · on Oct 12, 2015

I think that where ARM could have a lot to offer is in distributing storage and compute more evenly... for distributed bigtable or cassandra-like storage, it could make a lot of sense to pair 16-32gb of fast ssd storage to a 4-8 core ARM node... With the amount of I/O bound data per compute node reduced, and lower cost per node, this could yeild much better results than the big boxes for distributed databases.

For many other chores, I'm not sure if it makes as much sense... but that's juse my $.02 on the issue. I still think it's a pretty cool option, but not sure how well it works for a lot of different areas.

mangeletti · on Oct 12, 2015

It's interesting. I agree with you, and what I think about when I read your comment is that ARM serves best the role of decentralized computing in general (things like what you mentioned and a whole lot more), which brings up a bigger question: is there a larger role for decentralized computing in the future than there is centralized computing (server farms, cloud, etc.)? My guess is yes, and I'm not just talking about the "Internet of things", etc. With lambda architecture and data pipelining there is a good chance that these sorts of advancements (the advent of cheap ARM servers, etc.) and other changes in the ecosystem will mean a lot.

ksec · on Oct 13, 2015

I am going to presume, for ARM to win any market share, they will have to offer a competitive advantage. ( Correct me if I am wrong )

So where does ARM's Server Chip actually offer an Advantage?

Price - The CPU is only part of the Server's Cost. When you look at the whole picture, The Memory, I/O Controller & SSD, Network, will all cost the same. The higher price of the memory and SSD / HDD, the lower total cost % of the CPU. And once you lowered to a certain percentage, you want an incredibly cheap, Atom Server CPU is available for less then $50.

Performance - Even in the chance that any ARM's Server CPU market manage to make a competitive CPU core against Xeon, which in itself is an incredible achievement, they will have to also compete against Intel's world class ECC Memory Controller, Network Controller, and I/O controller. All three are the main reason AMD CPU didn't offer any competition even when they are much cheaper.

Power - This is similar to Price, once you factor in the memory and I/O, CPU's role is relatively small. And the power / energy usage scenario on a server much flavor Intel's rather then ARM.

So where does it offer a advantage? When you want a small baremetal server that is cheap, but in the world where VM is a common place, there is no reason why you cant have a small instance of it running on a much larger CPU. And even Intel admit it, none of their Server Customers, ( Read NONE ) wanted the Atom as they thought. It turns out everyone wanted Xeon-D, you pay a little premium for HUGE amount of flexibility. Intel wanted the Atom to disrupt itself rather then ARM, and it turns out it was the Xeon-D that did that.

rphlx · on Oct 13, 2015

The current batch of ARM microserver parts are indeed pretty primitive compared to x86, but each year they will copy more & more of the tricks and optimizations that Intel implemented 10-15 years ago, including the superb virtualization tech. Still I agree they are most likely going to fail due to the inevitable patent infringement lawsuits, the fab gap, and/or customer reluctance WRT deploying ARM SW.

steckerbrett · on Oct 12, 2015

I thought ARM servers might be nice on scaleway but was quite disappointed in the performance. 32bit and all disk IO over a network connection is really not a good start. Maybe that will improve when we see more ARM64 around.

hyperbovine · on Oct 12, 2015

At .004 euro cents an hour, seriously, what were you expecting? 32 bits seems like they threw in 16 bits for free.

steckerbrett · on Oct 12, 2015

It's approximately the same price as Digital Oceans lowest offering, which has significantly higher performance in every area.

kawera · on Oct 12, 2015

I'm not sure they are comparable. I have a bunch of servers with both Scacleway and DO and, despite the slow disk access on Scacleway's, you get 2GB of memory per server which works well for some microservices.

A DO instance with the same amount of memory costs $20, compared to €2 for an IP-less server on Scacleway.

dbaupp · on Oct 12, 2015

Are you (and parent) looking at something other than their pricing pages? It seems to me that scaleway costs 0.6 euro cents/hour, but also generally has higher numbers for most features than Digital Ocean: more cores, more RAM, more disk, more bandwidth.

https://www.digitalocean.com/pricing/ https://www.scaleway.com/pricing/

steckerbrett · on Oct 12, 2015

My comparison was between the C1 offering from Scaleway and the lowest DigitalOcean package. Scaleway has far larger disk and memory, but the disk is remote over the network link and the 4 ARM cores are 32bit and terribly slow. I really like Scaleways configuration and control panel but the performance just isn't there. It's likely better for a lot of things other people are doing but I needed the IO and CPu performance.

ju-st · on Oct 12, 2015

How did you measure CPU performance? With DO you have propably a high "burst" cpu performance but if you keep using a lot of CPU ressources they will probably supsense your VM because of "abuse" or something (my experience with other VM providers). On Scaleway you have worse CPU performance but you can use the full CPU 24/7.

0x4a42 · on Oct 12, 2015

I tested scaleway C1 for compiling with GCC and doing some basic lores video converting with ffmpeg and they are terribly slow for this kind of tasks.

steckerbrett · on Oct 12, 2015

Just general feeling compiling some packages and messing with 'openssl speed'.

4ad · on Oct 12, 2015

Compiling software is a somewhat meaningful metric across architectures (with caveats), but openssl speed is not really indicative of the CPU performance, rather it's benchmarking its own fast assembly routines. That only matters if your workload is crypto-bound.

rjsw · on Oct 12, 2015

You could also be benchmarking the interface between openssl and crypto hardware, several ARM SoCs have crypto accelerators.

malvim · on Oct 12, 2015

Two words, though: unmetered bandwidth.

steckerbrett · on Oct 12, 2015

Digital Ocean claims to meter bandwidth but they don't.

codexon · on Oct 12, 2015

This is true but betting a business it will remain this way is foolish.

They have stated in multiple pages that they plan to meter bandwidth.

IanCal · on Oct 12, 2015

I assume you mean 0.004 euros, 0.004 euro cents * 24 * 30 would be 2.88 euro cents per month.

kdeldycke · on Oct 12, 2015

Even less than that, as price is monthly capped: https://scaleway.com/faq/billing/#-How-do-I-take-advantage-o...

tracker1 · on Oct 12, 2015

I think that tethering the storage to the compute nodes would work better.. 16-32GB on the same board with the arm chip, in clusters per nU, per rack ... Reducing the bottlenecks from data storage to the processor. In this kind of scenario widely distributed databases could be a much better fit in ARM than in x86 based servers.

Imagine how fast Cassandra or something similar could run if your filtering nodes ran with the data, and much more widely distributed than typical... it could literally be a night and day difference in the amount of data that can be processed in a rack.

philipw · on Oct 12, 2015

That is Scacleway's implementation, not so much a limitation of ARM. Have a look at Applied Micro based servers for something more "server" like..

rogeryu · on Oct 12, 2015

I tried to run a Tomcat server on Scaleway, but it simply didn't work, and it was nothing complex, just jspwiki. Too bad, the price was really good.

edouardb · on Oct 12, 2015

What was your issue?

samcheng · on Oct 12, 2015

I've been out of the loop for a while datacenter-wise, and so fail to see the immediate value here.

Are these servers running virtualized instances? Are they much cheaper as far as $/request or watt/request (or other efficiency metric? How do they compare to a slice of an Intel server, which has the benefit of a more mature environment?

i_have_to_speak · on Oct 12, 2015

Current ARM server chips are much worse than Intel chips on the performance/watt metric [1]. The datacenter play would necessarily have to target consumers with low performance needs and offer them a better price. That would be tough because you have to compete against the already-cheap $5/server/mo Intel market.

AMD's SeaMicro was arguably the best shot at offering an alternate ARM-server commoditization model. AMD shut down SeaMicro in April [2].

[1] http://www.anandtech.com/show/8357/exploring-the-low-end-and... [2] http://www.theregister.co.uk/2015/04/16/amd_q1_2015_earnings...

Twirrim · on Oct 12, 2015

ARM has typically been more efficient, for both heat and power consumption, while losing out on computational power.

That's enabled people to cram significantly more processing cores into a server rack. 4 years ago HP release a server line with 288 quad-core ARM processors in 4U, for a total of 1152 cores. Obviously that introduces the complexity of needing code that is suited to massive parallelism. Each processor there took up just 1.5 watts idle, 5 watts under load. By comparison the Atom from Intel at the time had comparable performance, but consumed 8.5 watts, just over 70% more, and didn't idle to as low consumption.

There are a number of problems that are significant with large scale data-centre operations, but power consumption and heat generation are way up there at the top. In theory a data-centre filled with ARM based servers would give you comparable performance, but with cheaper electrical and climate control bills.

The emphasis there is on "in theory" :) The reality at the moment is that companies have spent years trying to get ARM based, massive core count servers to be a thing, and it hasn't really worked out. Switching processor architectures is never something that can be done lightly. Software has to be compiled and supported provided for it, and isn't necessarily tuned for the ARM architecture. The x86 architecture's performance characteristics have been really well understood, and software likely designed with those characteristics in mind.

Historically ARM has also lacked Windows server support, and had very variable quality linux support (not helped by every Tom, Dick and Harry SoC company do the darnedest things in pursuit of creating 'value'.) As far as I've heard, they've since done (or started on?) a huge restructuring of the ARM path in the kernel to clean up the mess and provide significantly better customisation opportunities.

In part it has also seemed like a case of "no one ever got fired for buying IBM". You've only got a certain budget, do you gamble on an unproven (to you) architecture, or stick the the tried and tested Intel?

nkurz · on Oct 12, 2015

Twirrim: ARM has typically been more efficient, for both heat and power consumption, while losing out on computational power.

i_have_to_speak : Current ARM server chips are much worse than Intel chips on the performance/watt metric.

It's hard to square these two statements. It might depend on how you define "efficiency". If efficiency is the amount of energy to perform a calculation, I don't think ARM is more efficient. This paper from a couple years ago concludes that ISA is no longer a defining factor: http://www.embedded.com/design/connectivity/4436593/Analysis...

Separately, this is an excellent article comparing recent generations of Intel against each other, showing that although power use has been going up, "instructions per cycle" has been going up even faster, resulting in a net improvement in energy efficiency: http://kentcz.com/downloads/P149-ISCA14-Preprint.pdf

Twirrim · on Oct 12, 2015

I wonder if the question we ought to be asking is "How idle is your server farm"? ARM still uses significantly less power than Intel chips do when idle. If your fleet is working hard all the time, it would be a no-brainer to go with Intel. What if your fleet spends 50% of the time not working? 70%? There presumably is some tipping point there.

bsder · on Oct 12, 2015

Cloud services probably upended this.

If my server farm is idle any significant amount, I'm going to take those servers offline. If I need capacity, I'll spin up some cloud instances in the short term, and I'll bring my own servers online until I'm at an appropriate idle/busy metric.

I suspect that the days of 99% idle servers are long gone. I suspect that utilization is probably above 70%.

pjc50 · on Oct 12, 2015

The interesting market is the "personal" server, which is idle 99% of the time but cannot be turned off as the owner only has 1 to start with.

discodave · on Oct 15, 2015

That's not an interesting market because it is tiny and will get relatively smaller going forward.

There are 2, maybe 3 interesting markets that will come to dominate computing even more than they already do in the next few years:

Cloud, Mobile and IoT (maybe)

Desktops, personal servers and any other non-cloud server workloads will be a rounding error.

nkurz · on Oct 12, 2015

I suspect there is a tipping point, but that it's probably much lower than you suggest: maybe 10-25% of a single core, thus some single digit percentage of the entire processor.

Modern Intel processors can shut down unneeded cores almost completely, and frequency scaling gives you another range of efficient power reduction. It's only when you get lower than that that you are losing significant power at 'idle'.

Unlike a small battery powered device, I'd guess that the difference in idle power for the CPU is never going to be the deciding factor, as keeping the non-CPU rest-of-the-machine running will dwarf the difference. What workload would you envision as having the greatest advantage? Maybe if you were running a single instance per dedicated core?

ergo14 · on Oct 12, 2015

https://www.hetzner.de/us/hosting/produktmatrix/rootserver-p... - this is an interesting offering, Hetzner is one of bigger hosting companies in Europe. They seem to be experimenting with cheap ARM offerings recently.

anjanb · on Oct 12, 2015

Looking forward to a high performance JVM on the ARM server. Is there a way to rent an ARM server which has a high-performance JVM ?

needusername · on Oct 12, 2015

Oracle JDK has full support for C2 and hard float on ARM 64. That code is not in OpenJDK. Red Hat is working on C1 in OpenJDK for Java 9.

enevill · on Oct 13, 2015

Full support for C1 & C2 is available in OpenJDK 7,8 & 9. Prebuilt binaries may be downloaded from http://openjdk.linaro.org/releases.htm. Sources are upstreamed for jdk9 (http://hg.openjdk.java.net/jdk9/dev). For jdk8 they are maintained in the OpenJDK aarch64 project (http://hg.openjdk.java.net/aarch64-port/jdk8). For jdk7 they are maintained in the IcedTea project (http://icedtea.classpath.org/hg/icedtea7-forest).

The default java on Ubuntu trusty (14.04) and vivid (15.04) LTS releases is openjdk. From my 48 core aarch64 platform running trusty.

ed@arm64:~/jdk8/jdk8$ cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=14.04 DISTRIB_CODENAME=trusty DISTRIB_DESCRIPTION="Ubuntu Trusty Tahr (development branch)" ed@arm64:~/jdk8/jdk8$ uname -a Linux arm64 3.18.0-g57fcd51 #1 SMP Fri Jun 26 17:33:54 PDT 2015 aarch64 aarch64 aarch64 GNU/Linux ed@arm64:~/jdk8/jdk8$ java -version java version "1.7.0_51" OpenJDK Runtime Environment (IcedTea 2.4.6) (7u51-2.4.6-1ubuntu4) OpenJDK 64-Bit Server VM (build 25.0-b70, mixed mode) ed@arm64:~/jdk8/jdk8$

OpenJDK is also the default java on RHEL and fedora.

the_why_of_y · on Oct 14, 2015

There was a talk about the port last year, with slides:

https://archive.fosdem.org/2014/schedule/event/openjdk_aarch...

Tepix · on Oct 12, 2015

I think these servers can be interesting for increased privacy. You can use them as a low-cost, low energy dedicated server where noone can snoop on you undetected (unlike a VPS).

eva1984 · on Oct 12, 2015

Until they succeeds to persuade AWS/GCE/Azure to sell ARM backed instances, they will be taken seriously.

SixSigma · on Oct 12, 2015

Let's not lose sight of who James Hamilton is :

From his Linkedin :

VP & Distinguished Engineer

Amazon.com

December 2008 – Present (6 years 11 months)

A member of the Amazon Web Services team. Specializes in infrastructure efficiency, reliability and scaling. Prior to joining Amazon.com, James was Microsoft Data Center Futures Architect. He has spent more than 20 years working on high-scale services, database management systems, and compilers.

And a couple of biographies :

http://highscalability.com/blog/2015/1/12/the-stunning-scale...

http://www.wired.com/2013/02/james-hamilton-amazon/

AstroJetson · on Oct 12, 2015

He presently lives on his trawler, he's presently on an island 1200 miles west of Australia.

  http://blog.mvdirona.com/

eva1984 · on Oct 12, 2015

Oh, that actually makes more sense...

whiterabbit_obj · on Oct 12, 2015

Actually, Amazon bought last year Annapurna Labs for $400M. Annapurna Labs, while being secretive, was known to design ARM chips to servers (Fabless), and is probably one of the only startups in the recent years in the stagnant chips market. Why would Amazon invest & buy a chip design firm if it was not intending to ditch Intel and try placing some ARM chips in its servers?

wmf · on Oct 12, 2015

But those ARMs may be used in infrastructure that is never exposed to customers.

RP_Joe · on Oct 12, 2015

That link is dead.

tw04 · on Oct 12, 2015

His site doesn't appear to be able to handle the load. Google has it cached thought:

http://webcache.googleusercontent.com/search?q=cache:CAYvaWj...