The 8051 uses several clock cycles (typically 12 or 4, depending on the variant [1]) for one actual machine cycle. So an instruction listed as taking "2" cycles can actually take 24 clock cycles.
Of course, some modern 8051 clones are more efficient. Some of them can execute one instruction per actual clock cycle.
Even then, one ARM instruction can often do the work of 2-10 8051 instructions. The 8051 is particularly bad at pointer arithmetic (except incrementing a pointer by one) and, being an 8-bit CPU, at 16/32-bit math.
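To illustrate why wider math costs extra instructions on an 8-bit CPU, here is a minimal sketch in Python (not 8051 code) of a 16-bit addition done with 8-bit steps only. The function name `add16_on_8bit_alu` is made up for this example; the two steps correspond roughly to the 8051's ADD and ADDC instructions, and the MOVs needed to shuffle bytes through the accumulator are not modelled.

```python
def add16_on_8bit_alu(a, b):
    """Add two 16-bit values using only 8-bit operations (hypothetical helper)."""
    # Split both operands into low and high bytes, as an 8-bit CPU stores them.
    a_lo, a_hi = a & 0xFF, (a >> 8) & 0xFF
    b_lo, b_hi = b & 0xFF, (b >> 8) & 0xFF

    # Step 1: add the low bytes (ADD on the 8051) and keep the carry-out.
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 8
    lo = lo_sum & 0xFF

    # Step 2: add the high bytes plus the carry (ADDC on the 8051).
    hi = (a_hi + b_hi + carry) & 0xFF

    # Reassemble the 16-bit result; overflow past 16 bits wraps around.
    return (hi << 8) | lo
```

A 32-bit CPU such as an ARM does the same addition in a single instruction, which is where part of the 2-10x difference comes from.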
"Microcontrollers (and many other electrical systems) use crystals to synchronize operations. The 8051 uses the crystal for precisely that: to synchronize its operation. Effectively, the 8051 operates using what are called "machine cycles." A single machine cycle is the minimum amount of time in which a single 8051 instruction can be executed, although many instructions take multiple cycles.
A cycle is, in reality, 12 pulses of the crystal. That is to say, if an instruction takes one machine cycle to execute, it will take 12 pulses of the crystal to execute. Since we know the crystal is pulsing 11,059,000 times per second and that one machine cycle is 12 pulses, we can calculate how many instruction cycles the 8051 can execute per second:
11,059,000 / 12 = 921,583
This means that the 8051 can execute 921,583 single-cycle instructions per second. Since a large number of 8051 instructions are single-cycle instructions it is often considered that the 8051 can execute roughly 1 million instructions per second, although in reality it is less--and, depending on the instructions being used, an estimate of about 600,000 instructions per second is more realistic."