For the etcd benchmarks we used the 'benchmark' tool from etcd's source tree (commit d62ce55, roughly 3.1.0-rc1). AFAIK that tool does use the v3 API.
The paper was completed before etcd 3.1 was released. etcd's docs stated (at that time) that the client is not linearizable. We only tested writes.
As for performance, it's hard to say whether our results match the official etcd results (we were not aware of them; were they published at the time?). The hardware and software setup we used for benchmarking is different.
I mean that without the ?quorum flag, stale reads are possible. I tested etcd v3 (HTTP API): the reads were incredibly fast, but once I set the flag, read latency became the same as write latency.
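For anyone who wants to reproduce the comparison, here is a rough sketch of the two kinds of read. It assumes the v2 keys HTTP endpoint (`/v2/keys/<key>`, where `quorum=true` is the query parameter) and a node at a made-up local address; the helper names `v2_read_url` and `timed_get` are my own, not part of any etcd client.

```python
import time
import urllib.request
from urllib.parse import quote, urlencode


def v2_read_url(endpoint: str, key: str, quorum: bool = False) -> str:
    """Build a GET URL for etcd's v2 keys API (hypothetical helper).

    Without quorum the answering member serves the read from its local
    state, so the value may be stale. With quorum=true the read goes
    through consensus, like a write.
    """
    url = f"{endpoint.rstrip('/')}/v2/keys/{quote(key.lstrip('/'))}"
    if quorum:
        url += "?" + urlencode({"quorum": "true"})
    return url


def timed_get(url: str, timeout: float = 5.0) -> float:
    """Return the wall-clock seconds one GET takes (needs a running etcd)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
    return time.perf_counter() - start


# Example against an assumed local node (not executed here):
#   fast = timed_get(v2_read_url("http://127.0.0.1:2379", "foo"))
#   slow = timed_get(v2_read_url("http://127.0.0.1:2379", "foo", quorum=True))
```

A single request is noisy, so in practice you would average over many reads and send them to a follower, not only the leader.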
Also, I believe the authors of the paper tested against the old etcd v2 API/backend, although they used etcd3.
The new API/backend has significantly higher performance.
I work on etcd.