Found this http://manpages.ubuntu.com/manpages/trusty/man2/perf_event_o... and that man page doesn't instill much confidence in the reliability of these counters. The comment for CPU_CYCLES says "Be wary of what happens during CPU frequency scaling", the comment for INSTRUCTIONS says "these can be affected by various issues, most notably hardware interrupt counts", BRANCH_INSTRUCTIONS says "Prior to Linux 2.6.34, this used the wrong event on AMD processors", and so on.
If I wanted to measure what OP was measuring, I would disable frequency scaling (probably doable on overclocker-targeted motherboards; a search also turns up utilities for both Windows and Linux that claim to do it), measure the elapsed time, then multiply by the now-fixed clock frequency to get cycles.
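A minimal sketch of that approach, assuming the core really is pinned to a single frequency for the whole run: time the workload with `clock_gettime(CLOCK_MONOTONIC)` and convert with cycles ≈ seconds × frequency (equivalently, elapsed time divided by the clock period). `ASSUMED_HZ`, `workload()` and the helper names are hypothetical placeholders, not anything from the thread.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical pinned clock frequency in Hz. Replace with whatever you
   locked the CPU to after disabling frequency scaling. */
#define ASSUMED_HZ 3.0e9

/* Wall-clock seconds between two CLOCK_MONOTONIC timestamps. */
static double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* cycles = seconds * Hz. Only meaningful if the core ran at exactly
   `hz` for the entire interval (no scaling, no turbo). */
static double estimate_cycles(double seconds, double hz) {
    return seconds * hz;
}

/* Hypothetical workload under test; volatile keeps the loop from
   being optimized away. */
static void workload(void) {
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 10000000UL; i++) sum += i;
}

/* Time one run of workload() and convert it to an approximate cycle count. */
static double time_workload_in_cycles(double hz) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    workload();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return estimate_cycles(elapsed_seconds(start, end), hz);
}
```

The estimate is only as good as the frequency assumption: if the clock drifts during the run, the time is still right but the cycle figure is not.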
CPU_CYCLES counts cycles, so the wall-clock time per counted cycle varies with frequency. If you're trying to see how many cycles something that fits in L1 takes, CPU_CYCLES is the right thing to measure.
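For comparison, here is a sketch of reading the CPU_CYCLES counter directly via the `perf_event_open(2)` syscall documented in the man page linked above. glibc provides no wrapper, so it goes through `syscall(SYS_perf_event_open, ...)`. Counting only user-space cycles (`exclude_kernel`, `exclude_hv`) is an assumption on my part, chosen partly because restrictive `perf_event_paranoid` settings often still allow it. `count_cycles` and `busy_loop` are hypothetical names.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no perf_event_open() wrapper; call the raw syscall. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Count user-space CPU cycles spent in fn() on the calling thread.
   Returns -1 if the counter can't be opened or read (no PMU, VM or
   container without counter access, restrictive perf_event_paranoid). */
static long long count_cycles(void (*fn)(void)) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* created stopped; enabled just before fn() */
    attr.exclude_kernel = 1; /* user-space cycles only */
    attr.exclude_hv = 1;

    /* pid 0 = this thread, cpu -1 = whichever CPU it runs on. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1) return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* With read_format == 0, read() returns a single u64 count. */
    uint64_t count = 0;
    ssize_t n = read(fd, &count, sizeof count);
    close(fd);
    return n == (ssize_t)sizeof count ? (long long)count : -1;
}

/* Hypothetical workload; volatile prevents the loop being removed. */
static void busy_loop(void) {
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000UL; i++) sum += i;
}
```

Calling `count_cycles(busy_loop)` returns a raw cycle count that doesn't depend on what frequency the core happened to run at, which is exactly why it differs from the time-based estimate when scaling is active.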
The parent is pointing to documentation that suggests the counter is derived from elapsed time and clock frequency, and perhaps not accurately under dynamic scaling. They seem aware of what CPU_CYCLES is supposed to do.
The documentation is not the best. CPU_CYCLES is genuinely counting cycles.
perf is all about reading actual hardware counters. It's awesome for this. There is essentially nothing made up about perf's output, except to the extent that the hardware itself reports inexact output. (For example, perf annotate may attribute events to an instruction near the instruction in question on older hardware, because older hardware has a small amount of skew when sampling.)
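As a sketch of reading more than one real hardware counter at once, the example below opens CPU_CYCLES as a group leader and INSTRUCTIONS as a member, then reads both with `PERF_FORMAT_GROUP` so they cover exactly the same interval. This uses the group interface from `perf_event_open(2)`. The function name and the user-space-only setting are my own hypothetical choices, not something perf itself does.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Layout returned by read() on a group leader with PERF_FORMAT_GROUP
   (and no other read_format flags): nr, then one value per event. */
struct group_read { uint64_t nr; uint64_t values[2]; };

/* Count user-space cycles and instructions for fn() as one group.
   Returns 0 on success, -1 if the counters are unavailable. */
static int count_cycles_and_instructions(void (*fn)(void),
                                         uint64_t *cycles,
                                         uint64_t *instructions) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1; /* leader starts stopped; members follow it */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_GROUP;

    int leader = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (leader == -1) return -1;

    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 0; /* scheduled together with the leader */
    int member = (int)perf_event_open(&attr, 0, -1, leader, 0);
    if (member == -1) { close(leader); return -1; }

    ioctl(leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    fn();
    ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    struct group_read buf;
    ssize_t n = read(leader, &buf, sizeof buf);
    close(member);
    close(leader);
    if (n != (ssize_t)sizeof buf || buf.nr != 2) return -1;

    *cycles = buf.values[0];       /* leader's value comes first */
    *instructions = buf.values[1];
    return 0;
}

static void busy_loop(void) {
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000UL; i++) sum += i;
}
```

Grouping matters because the kernel then schedules both events onto the PMU together, so a ratio such as instructions per cycle is computed from counts over the same window rather than two separately multiplexed ones.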