
It is in memory performance, which is what I assumed was being measured here.



How are you defining memory performance and where are your supporting comparisons? This article only discusses the M1's behavior, and makes no comparisons to any other CPU.


FWIW, I ran it on a MacBook Pro (13-inch, 2019, Four Thunderbolt 3 ports), 2.4 GHz Quad-Core Intel Core i5, 8 GB 2133 MHz LPDDR3:

  two  : 49.6 ns  (x 5.5)
  two+ : 64.8 ns  (x 5.2)
  three: 72.8 ns  (x 5.6)
EDIT to add: above was just `cc`. Below is with `cc -O3 -Wall`, as in Lemire's article:

  two  : 62.8 ns  (x 7.1)
  two+ : 69.2 ns  (x 5.5)
  three: 95.3 ns  (x 7.3)


You _need_ to use -mnative because it otherwise retains backwards compatibility to older x86.


  (base) Coding % cc -mnative two-three.c
  clang: error: unknown argument: '-mnative'

  (base) Coding % cc -v
  Apple clang version 12.0.0 (clang-1200.0.32.28)
  Target: x86_64-apple-darwin20.2.0
  Thread model: posix


It's spelled "-march=native" in gcc and "-arch x86_64h" in clang.

It doesn't make much difference, though: autovectorization doesn't work very well, and there is not a lot of special optimization for newer x86 CPUs.
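One way to see whether an architecture flag changes anything is to look at the compiler's predefined feature macros. A minimal sketch, assuming GCC or Clang on x86 (both define `__SSE2__`, `__AVX__`, `__AVX2__`, and `__AVX512F__` when the matching extension is enabled for the target); the function name `widest_vector_isa` is made up for illustration:

```c
/* Report the widest x86 vector extension the compiler was allowed to use.
   The macros are predefined by GCC and Clang according to the target flags;
   the function name is hypothetical. */
static const char *widest_vector_isa(void) {
#if defined(__AVX512F__)
    return "AVX-512 (512-bit)";
#elif defined(__AVX2__)
    return "AVX2 (256-bit)";
#elif defined(__AVX__)
    return "AVX (256-bit)";
#elif defined(__SSE2__)
    return "SSE2 (128-bit)";
#else
    return "no x86 vector extension enabled";
#endif
}
```

Printing the result from one build with the extra architecture flag and one without shows whether the flag widened the target at all. It says nothing about whether the autovectorizer actually used the wider instructions.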


All recent Intel Core-i microarchitectures require using full vector width loads to max out L1d bandwidth, because the load/store units don't actually care about the width of a load, as long as it doesn't cross a cache line (in which case the typical penalty is an additional cycle).

Using only 128-bit-wide instructions on a core that has 512-bit hardware results in 4x less L1d bandwidth.
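The arithmetic behind that 4x figure can be sketched as follows, assuming a fixed number of load ports that each complete one load per cycle regardless of width, and 64-byte cache lines. Both numbers are illustrative assumptions rather than measurements of any particular core, and the helper names are made up:

```c
#include <stdbool.h>
#include <stdint.h>

enum { CACHE_LINE_BYTES = 64 };  /* assumed cache line size */

/* Peak L1d load bandwidth in bytes per cycle, assuming every load port
   completes one load per cycle whatever the load width is. */
static unsigned peak_l1d_bytes_per_cycle(unsigned load_ports, unsigned load_bits) {
    return load_ports * (load_bits / 8);
}

/* True if a load of `bytes` bytes starting at `addr` spans two cache lines,
   which is the case that costs extra. */
static bool crosses_cache_line(uintptr_t addr, unsigned bytes) {
    return (addr % CACHE_LINE_BYTES) + bytes > CACHE_LINE_BYTES;
}
```

With two assumed load ports, 512-bit loads move 128 bytes per cycle and 128-bit loads move 32, which is the 4x gap. An aligned 64-byte load at offset 0 stays inside one line, while the same load at offset 32 crosses into the next.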


There must be something wrong there. On my late-2014 laptop, which has

    Type: DDR4
    Speed: 2133 MT/s
I get

    two  : 27.1 ns (3x)
    two+ : 28.6 ns (2.2x)
    three: 39.7 ns (3x)
which is not much, considering this is an almost 6-year-old system with 2x slower memory.


Dunno, I didn't reboot and didn't close all other programs (browser, editor, mail, calendar, notes, editor)... Top shows

Load Avg: 2.36, 2.01, 1.97 CPU usage: 2.10% user, 3.39% sys, 94.49% idle



