One crucial point when we talk about high core count parts is memory and IO bandwidth. In the end, that's what determines the overall performance of your server. Due to their relatively low single-thread performance, I don't see ARM parts as a good fit for compute-bound workloads, but with memory and IO channels significantly faster than current x86-based parts, they could be competitive in IO-bound ones.
I'd love to see more diversity in the server space, but reality is harsh.
This is where IBM cleans up. POWER is pretty mean when it comes to bandwidth. You pay the tax to be a couple years ahead of the Intel curve if you need it.
Indeed, but there is no magic there. If ARM-based servers with high core count can be made with bandwidths in the same league as POWER, IBM's high-margin server business may end up being threatened.
Also, we can't forget that POWER is very good at single-threaded performance.
I fully agree. With a dedicated server, memory and bandwidth are more often the bottleneck than the CPU. I've been using scaleway.com for two weeks now. My small ARM server runs squid (to bypass my company's proxy) and a little static server that fits almost entirely in RAM. It works like a charm.
I've also been playing with Scaleway's ARM servers for a scraping project and to learn more about distributed computing. (details here in other HN thread https://news.ycombinator.com/item?id=10371316)
I've been able to scale my scraping to almost 100 requests per second in aggregate across 1.8MM domains. And these little quad core ARM machines are great.
I just got my account upgraded to "developer", which means I can spin up 100 machines (400 cores!), so I'm going to be experimenting with that.
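For anyone curious about the shape of the per-node fetch loop, here's a simplified sketch. The domain list and worker count are placeholders; the real coordination is described in the thread linked above.

    # Simplified per-node fetch loop; DOMAINS stands in for whatever
    # slice of the 1.8MM domains this node gets assigned.
    import concurrent.futures
    import requests

    DOMAINS = ["example.com", "example.org"]

    def fetch(domain):
        try:
            r = requests.get("http://%s" % domain, timeout=10)
            return domain, r.status_code
        except requests.RequestException as e:
            return domain, type(e).__name__

    # ~100 req/s aggregate over 100 machines is only ~1 req/s per node,
    # so a small thread pool per quad core box is plenty.
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        for domain, status in pool.map(fetch, DOMAINS):
            print(domain, status)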
No more random forks of U-Boot, proprietary boot methods and guess-the-device-tree. Instead you get the goodness(?) of UEFI, ACPI and a standard minimum hardware set.
Are there any breadboard-style releases implementing this spec? I'm looking for the low(er) cost and open ecosystem that I find in ODROID and RasPi style boards, but with the robustness and standardization of this common spec.
There's a new development board coming real soon that will implement it. It is also implemented by the firmware on APM X-Gene (Mustang) and HP Moonshot. Not sure about Cavium as I've not used it.
However it's unlikely you'll find a small format SBSA-compliant board any time soon, because the server stack simply assumes a lot more RAM than is available on phone/tablet SoCs. For example, the minimum RAM required to run RHEL/aarch64 is 1 GB/pCPU (so in reality 4-8GB min), and even the Snapdragon 808 only has 3 GB. I personally wouldn't be happy doing development work for 64 bit ARM SBSA with less than 8 GB of RAM, and for OpenStack, 32 GB is the minimum I'd recommend.
(I do run Fedora/aarch64 on my LG G4 phone though :-)
I think server-class ARM is interesting, but it's always struck me as much more interesting for on-premises stuff... like, you want a cheap on-site NAS that doesn't use much power, or a generic machine you can use as a switch, or maybe even an intranet host or something. Anything in the category of stuff that is always on but not very resource-intensive.
In cloud deployments, though, sharing a bigger box amongst lots of virtual machines just seems way more economical.
I kept a close eye on the ARM server market for years waiting for it to happen and bring cheap multi core boxes to the masses. Ultimately, I stopped caring when quad core Xeon boxes got down to less than $400 and drew less than 30 watts idle: http://amzn.com/B00FE2G79C
I still think that 30 watts idle is way too much for at-home equipment. At German electricity prices that's still about $75 a year just for electricity. What I want to see is a reasonably powerful machine that is <5 watts idle (for the board, CPU and RAM) plus the power of the installed hard disks.
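To check the math (the €/kWh rate is my assumption, roughly current German household prices):

    # Yearly electricity cost of a given idle draw.
    # 0.28 EUR/kWh is an assumed German household rate; adjust to taste.
    def yearly_cost_eur(watts, eur_per_kwh=0.28):
        kwh_per_year = watts * 24 * 365 / 1000.0
        return kwh_per_year * eur_per_kwh

    print(yearly_cost_eur(30))  # ~74 EUR/year, i.e. the ~$75 above
    print(yearly_cost_eur(5))   # ~12 EUR/year for a 5 W board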
I think that where ARM could have a lot to offer is in distributing storage and compute more evenly... for distributed Bigtable or Cassandra-like storage, it could make a lot of sense to pair 16-32 GB of fast SSD storage with a 4-8 core ARM node... With the amount of IO-bound data per compute node reduced, and a lower cost per node, this could yield much better results than the big boxes for distributed databases.
For many other chores, I'm not sure if it makes as much sense... but that's just my $.02 on the issue. I still think it's a pretty cool option, but I'm not sure how well it works for a lot of different areas.
It's interesting. I agree with you, and what I think about when I read your comment is that ARM best serves the role of decentralized computing in general (things like what you mentioned and a whole lot more). That brings up a bigger question: is there a larger role for decentralized computing in the future than for centralized computing (server farms, cloud, etc.)? My guess is yes, and I'm not just talking about the "Internet of things". With lambda architecture and data pipelining, there is a good chance that these sorts of advancements (the advent of cheap ARM servers, etc.) and other changes in the ecosystem will mean a lot.
I am going to presume that, for ARM to win any market share, they will have to offer a competitive advantage. (Correct me if I am wrong.)
So where do ARM's server chips actually offer an advantage?
Price - The CPU is only part of the server's cost. When you look at the whole picture, the memory, I/O controller, SSD and network will all cost the same. The higher the price of the memory and SSD/HDD, the lower the CPU's share of the total cost.
And once that share drops to a certain percentage, you want an incredibly cheap CPU, and an Atom server CPU is already available for less than $50.
Performance - Even in the off chance that an ARM server CPU vendor manages to make a core competitive with Xeon, which in itself would be an incredible achievement, they will also have to compete against Intel's world-class ECC memory controller, network controller, and I/O controller. All three are the main reason AMD CPUs didn't offer any competition even when they were much cheaper.
Power - This is similar to price: once you factor in the memory and I/O, the CPU's role is relatively small. And the power/energy usage profile of a server much favors Intel rather than ARM.
So where does it offer an advantage? When you want a small bare-metal server that is cheap. But in a world where VMs are commonplace, there is no reason why you can't have a small instance running on a much larger CPU.
And even Intel admits it: none of their server customers (read: NONE) wanted the Atom as they expected. It turns out everyone wanted Xeon-D; you pay a little premium for a HUGE amount of flexibility. Intel wanted the Atom to disrupt itself rather than let ARM do it, and it turns out it was the Xeon-D that did that.
The current batch of ARM microserver parts are indeed pretty primitive compared to x86, but each year they will copy more & more of the tricks and optimizations that Intel implemented 10-15 years ago, including the superb virtualization tech. Still I agree they are most likely going to fail due to the inevitable patent infringement lawsuits, the fab gap, and/or customer reluctance WRT deploying ARM SW.
I thought ARM servers might be nice on Scaleway but was quite disappointed in the performance. 32-bit CPUs and all disk IO over a network connection are really not a good start. Maybe that will improve when we see more ARM64 around.
I'm not sure they are comparable. I have a bunch of servers with both Scaleway and DO and, despite the slow disk access on Scaleway's, you get 2GB of memory per server, which works well for some microservices.
A DO instance with the same amount of memory costs $20, compared to €2 for an IP-less server on Scaleway.
Are you (and parent) looking at something other than their pricing pages? It seems to me that Scaleway costs 0.6 euro cents/hour, but also generally has higher numbers for most features than DigitalOcean: more cores, more RAM, more disk, more bandwidth.
My comparison was between the C1 offering from Scaleway and the lowest DigitalOcean package. Scaleway has far larger disk and memory, but the disk is remote over the network link and the 4 ARM cores are 32-bit and terribly slow. I really like Scaleway's configuration and control panel, but the performance just isn't there. It's likely better for a lot of the things other people are doing, but I needed the IO and CPU performance.
How did you measure CPU performance? With DO you probably get high "burst" CPU performance, but if you keep using a lot of CPU resources they will probably suspend your VM for "abuse" or something (my experience with other VM providers). On Scaleway you have worse CPU performance, but you can use the full CPU 24/7.
Compiling software is a somewhat meaningful metric across architectures (with caveats), but openssl speed is not really indicative of CPU performance; rather, it's benchmarking its own hand-tuned assembly routines. That only matters if your workload is crypto-bound.
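If you want something crude that at least separates burst from sustained behaviour, repeating a fixed chunk of pure CPU work for a few minutes will show whether throughput degrades. This only gives numbers comparable machine-to-machine, not an absolute measure (and Python measures the interpreter as much as the CPU):

    # Crude sustained-CPU check: repeat a fixed chunk of integer work
    # long enough to get past any turbo/burst window. If the per-pass
    # time climbs, the provider is throttling you.
    import time

    def spin(n=5_000_000):
        total = 0
        for i in range(n):
            total += i * i % 7
        return total

    for p in range(30):  # bump this to run for several minutes
        start = time.time()
        spin()
        print("pass %d: %.2fs" % (p, time.time() - start))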
I think that tethering the storage to the compute nodes would work better... 16-32 GB on the same board as the ARM chip, in clusters per rack unit and per rack... reducing the bottleneck between data storage and the processor. In that kind of scenario, widely distributed databases could be a much better fit on ARM than on x86-based servers.
Imagine how fast Cassandra or something similar could run if your filtering nodes ran with the data, and much more widely distributed than typical... it could literally be a night and day difference in the amount of data that can be processed in a rack.
I've been out of the loop for a while datacenter-wise, and so fail to see the immediate value here.
Are these servers running virtualized instances?
Are they much cheaper as far as $/request or watts/request (or some other efficiency metric)?
How do they compare to a slice of an Intel server, which has the benefit of a more mature environment?
Current ARM server chips are much worse than Intel chips on the performance/watt metric [1]. The datacenter play would necessarily have to target consumers with low performance needs and offer them a better price. That would be tough because you have to compete against the already-cheap $5/server/mo Intel market.
AMD's SeaMicro was arguably the best shot at offering an alternate ARM-server commoditization model. AMD shut down SeaMicro in April [2].
ARM has typically been more efficient, for both heat and power consumption, while losing out on computational power.
That's enabled people to cram significantly more processing cores into a server rack. Four years ago HP released a server line with 288 quad-core ARM processors in 4U, for a total of 1152 cores. Obviously that introduces the complexity of needing code suited to massive parallelism. Each processor there drew just 1.5 watts idle and 5 watts under load. By comparison, the Atom from Intel at the time had comparable performance but consumed 8.5 watts, just over 70% more, and didn't idle down to as low a consumption.
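Putting those per-chip numbers at chassis scale (straight multiplication, nothing more):

    # Chassis-scale arithmetic for the HP example above (4U, 288 SoCs).
    socs = 288
    arm_idle_w, arm_load_w = 1.5, 5.0
    atom_load_w = 8.5  # the comparable-performance Atom of the era

    print("ARM chassis idle: %d W" % (socs * arm_idle_w))       # 432 W
    print("ARM chassis load: %d W" % (socs * arm_load_w))       # 1440 W
    print("Atom-equivalent load: %d W" % (socs * atom_load_w))  # 2448 W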
There are a number of problems that are significant with large scale data-centre operations, but power consumption and heat generation are way up there at the top. In theory a data-centre filled with ARM based servers would give you comparable performance, but with cheaper electrical and climate control bills.
The emphasis there is on "in theory" :) The reality at the moment is that companies have spent years trying to make ARM-based, massive core count servers a thing, and it hasn't really worked out. Switching processor architectures is never something that can be done lightly. Software has to be compiled for it and support provided, and isn't necessarily tuned for the ARM architecture. The x86 architecture's performance characteristics are really well understood, and software is likely designed with those characteristics in mind.
Historically ARM has also lacked Windows server support, and had very variable quality Linux support (not helped by every Tom, Dick and Harry SoC company doing the darnedest things in pursuit of creating 'value'). As far as I've heard, they've since done (or started on?) a huge restructuring of the ARM path in the kernel to clean up the mess and provide significantly better customisation opportunities.
In part it has also seemed like a case of "no one ever got fired for buying IBM". You've only got a certain budget; do you gamble on an unproven (to you) architecture, or stick with the tried and tested Intel?
Twirrim: ARM has typically been more efficient, for both heat and power consumption, while losing out on computational power.
i_have_to_speak: Current ARM server chips are much worse than Intel chips on the performance/watt metric.
It's hard to square these two statements. It might depend on how you define "efficiency". If efficiency is the amount of energy to perform a calculation, I don't think ARM is more efficient. This paper from a couple years ago concludes that ISA is no longer a defining factor: http://www.embedded.com/design/connectivity/4436593/Analysis...
Separately, this is an excellent article comparing recent generations of Intel against each other, showing that although power use has been going up, "instructions per cycle" has been going up even faster, resulting in a net improvement in energy efficiency: http://kentcz.com/downloads/P149-ISCA14-Preprint.pdf
I wonder if the question we ought to be asking is "How idle is your server farm"? ARM still uses significantly less power than Intel chips do when idle. If your fleet is working hard all the time, it would be a no-brainer to go with Intel. What if your fleet spends 50% of the time not working? 70%? There presumably is some tipping point there.
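To make that tipping point concrete, here's a toy model. Every number in it is invented purely to show the shape of the trade-off; plug in real idle/load draws and throughput figures to find the actual crossover:

    # Toy model: energy per unit of work as a function of idle fraction.
    # All power and throughput numbers below are invented for illustration.
    def joules_per_unit(idle_frac, idle_w, load_w, work_per_sec):
        avg_power = idle_frac * idle_w + (1 - idle_frac) * load_w
        work_done = (1 - idle_frac) * work_per_sec
        return avg_power / work_done

    for f in (0.0, 0.5, 0.9, 0.99):
        intel = joules_per_unit(f, idle_w=30, load_w=100, work_per_sec=25)
        arm = joules_per_unit(f, idle_w=1, load_w=5, work_per_sec=1)
        print("idle %4.0f%%: intel %6.1f J/unit, arm %6.1f J/unit"
              % (f * 100, intel, arm))

    # With these made-up numbers Intel wins when busy (4.0 vs 5.0 J/unit),
    # but ARM pulls ahead somewhere between 50% and 90% idle.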
If my server farm is idle any significant amount, I'm going to take those servers offline. If I need capacity, I'll spin up some cloud instances in the short term, and I'll bring my own servers online until I'm at an appropriate idle/busy metric.
I suspect that the days of 99% idle servers are long gone. I suspect that utilization is probably above 70%.
I suspect there is a tipping point, but that it's probably much lower than you suggest: maybe 10-25% of a single core, thus some single digit percentage of the entire processor.
Modern Intel processors can shut down unneeded cores almost completely, and frequency scaling gives you another range of efficient power reduction. It's only when you get lower than that that you are losing significant power at 'idle'.
Unlike a small battery powered device, I'd guess that the difference in idle power for the CPU is never going to be the deciding factor, as keeping the non-CPU rest-of-the-machine running will dwarf the difference. What workload would you envision as having the greatest advantage? Maybe if you were running a single instance per dedicated core?
I think these servers can be interesting for increased privacy. You can use them as a low-cost, low-energy dedicated server where no one can snoop on you undetected (unlike a VPS).
A member of the Amazon Web Services team. Specializes in infrastructure efficiency, reliability and scaling. Prior to joining Amazon.com, James was Microsoft Data Center Futures Architect. He has spent more than 20 years working on high-scale services, database management systems, and compilers.
Actually, Amazon bought Annapurna Labs last year for $400M. Annapurna Labs, while secretive, was known to design ARM chips for servers (fabless), and is probably one of the only chip startups of recent years in the stagnant chip market.
Why would Amazon invest in and buy a chip design firm if it was not intending to ditch Intel and try placing some ARM chips in its servers?