It is in memory performance, which is what I assumed was being measured here.

kllrnohj · on Jan 6, 2021

How are you defining memory performance and where are your supporting comparisons? This article only discusses the M1's behavior, and makes no comparisons to any other CPU.

FabHK · on Jan 6, 2021

FWIW, I ran it on a MacBook Pro (13-inch, 2019, Four Thunderbolt 3 ports), 2.4 GHz Quad-Core Intel Core i5, 8 GB 2133 MHz LPDDR3:

  two  : 49.6 ns  (x 5.5)
  two+ : 64.8 ns  (x 5.2)
  three: 72.8 ns  (x 5.6)

EDIT to add: above was just `cc`. Below is with `cc -O3 -Wall`, as in Lemire's article:

  two  : 62.8 ns  (x 7.1)
  two+ : 69.2 ns  (x 5.5)
  three: 95.3 ns  (x 7.3)

namibj · on Jan 6, 2021

You _need_ to use -mnative because it otherwise retains backwards compatibility to older x86.

FabHK · on Jan 6, 2021

  (base) Coding % cc -mnative two-three.c
  clang: error: unknown argument: '-mnative'

  (base) Coding % cc -v
  Apple clang version 12.0.0 (clang-1200.0.32.28)
  Target: x86_64-apple-darwin20.2.0
  Thread model: posix

astrange · on Jan 7, 2021

It's spelled "-march=native" in gcc and "-arch x86_64h" in clang.

It doesn't make much difference though: autovectorization doesn't work very well, and there is not a lot of special optimization for newer x86 CPUs.

namibj · on Jan 8, 2021

All recent Intel Core-i microarchitectures require using full vector width loads to max out L1d bandwidth, because the load/store units don't actually care about the width of a load, as long as it doesn't cross a cache line (in which case the typical penalty is an additional cycle).

Only using 128 bit wide instructions on a core that has 512 bit hardware results in 4x less L1d bandwidth.

africanboy · on Jan 6, 2021

There must be something wrong there. On my late 2014 laptop, which has

    Type: DDR4
    Speed: 2133 MT/s

I get

    two  : 27.1 ns (3x)
    two+ : 28.6 ns (2.2x)
    three: 39.7 ns (3x)

which is not much, considering this is an almost 6-year-old system with 2x slower memory.

FabHK · on Jan 6, 2021

Dunno, I didn't reboot and didn't close all other programs (browser, editor, mail, calendar, notes, editor)... Top shows

Load Avg: 2.36, 2.01, 1.97 CPU usage: 2.10% user, 3.39% sys, 94.49% idle