I don't think compressed sensing is really extracting more information than Shannon allows; it simply exploits the fact that the signal we are interested in is sparse, so we don't need to sample "everything". But this is somewhat outside my area of expertise, so my understanding could be wrong.
Maybe I’m mixing up Shannon’s limit with the sampling rate imposed by the Nyquist-Shannon sampling theorem.
> Around 2004, Emmanuel Candès, Justin Romberg, Terence Tao, and David Donoho proved that given knowledge about a signal's sparsity, the signal may be reconstructed with even fewer samples than the sampling theorem requires.[4][5] This idea is the basis of compressed sensing
…
> However, if further restrictions are imposed on the signal, then the Nyquist criterion may no longer be a necessary condition.
> A non-trivial example of exploiting extra assumptions about the signal is given by the recent field of compressed sensing, which allows for full reconstruction with a sub-Nyquist sampling rate. Specifically, this applies to signals that are sparse (or compressible) in some domain
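To make the quoted idea concrete, here is a small numerical sketch in Python/NumPy. It assumes a length-256 signal that is sparse in the standard basis (only 4 nonzero entries), takes 96 random linear measurements, and recovers it with orthogonal matching pursuit (OMP). OMP is a greedy stand-in chosen for brevity; the Candès/Romberg/Tao/Donoho results are usually stated for ℓ1 minimization. The sizes, the seed, and the `omp` helper are illustrative assumptions, not anything from the quoted text.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal matching pursuit: greedily choose k columns of A that explain y."""
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the part of y not yet explained.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all chosen coefficients by least squares, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x


rng = np.random.default_rng(0)
n, m, k = 256, 96, 4                      # signal length, measurements, nonzeros

# A length-256 signal with only 4 nonzero entries (sparse in the standard basis).
x_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))

# 96 random Gaussian measurements: far fewer equations than unknowns.
A = rng.normal(size=(m, n)) / np.sqrt(m)
y = A @ x_true

x_hat = omp(A, y, k)
print("max reconstruction error:", np.max(np.abs(x_hat - x_true)))
```

Without the sparsity prior, 96 equations in 256 unknowns have infinitely many solutions; it is knowing that only k entries are nonzero that makes exact recovery possible.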
In so many words, Shannon gave a proof showing that in general the sample rate of a digital sensor puts an upper bound on the frequency of any signal that sensor is able to detect.
Unlike the Nyquist-Shannon theory, compressed sensing is not generally applicable: it requires a sparse signal.
As with many other optimization techniques, it’s a trade-off between soundness and completeness.
That is not correct. Digital sensors detect frequencies above the Nyquist limit all the time, which is why they need an analog antialiasing filter in front of them. What they can't do is distinguish those frequencies from their baseband aliases.
You could just as correctly say 'Nyquist-Shannon theory is not generally applicable; it requires a bandlimited signal' (which is why compressed sensing doesn't violate it).
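A quick NumPy check of the point above, borrowing the numbers from the 3.2 MHz example further down the thread (1 MHz sample rate, so the Nyquist limit is 0.5 MHz). The tone above the limit still produces perfectly healthy samples, so the sensor does "detect" it, but those samples are identical to the ones from a 0.2 MHz tone. That is why out-of-band energy has to be removed by an analog filter before sampling. The sample count and zero phase are arbitrary choices for illustration.

```python
import numpy as np

fs = 1.0e6                      # sample rate: 1 MHz, so the Nyquist limit is 0.5 MHz
n = np.arange(32)               # sample indices
t = n / fs                      # sample times in seconds

high = np.cos(2 * np.pi * 3.2e6 * t)   # 3.2 MHz tone, far above Nyquist
low = np.cos(2 * np.pi * 0.2e6 * t)    # its 0.2 MHz baseband alias

# The sensor clearly picks up energy from the 3.2 MHz tone...
print("RMS of 3.2 MHz samples:", np.sqrt(np.mean(high**2)))
# ...but its samples cannot be told apart from those of the 0.2 MHz tone.
print("max difference:", np.max(np.abs(high - low)))
```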
Thank you for the clarification; great point about the importance of distinguishing the acts of "detecting" and "making sense of" some signal/data/information.
Consider the simple example of frequency aliasing. If you sample a 3.2 MHz sine wave at a 1 MHz sample rate, it looks the same as a 0.2 MHz wave. But if you know a priori that the signal only has frequency components between 3 and 3.5 MHz, then you know the 0.2 MHz you are measuring is actually 3.2 MHz: you can fully reconstruct the original signal even though you are sampling far below twice its highest frequency (7 MHz).
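The reasoning above can be sketched as a small helper. `unalias` is a hypothetical function written for illustration: it enumerates every frequency k·fs ± f_apparent that would fold onto the measured baseband tone and keeps the one inside the band you know a priori. It assumes the band fits within a single Nyquist zone of width fs/2 (3 to 3.5 MHz does, at fs = 1 MHz); if it doesn't, more than one candidate survives and the helper refuses to guess.

```python
import math


def unalias(f_apparent, fs, band_lo, band_hi):
    """Return the unique frequency in [band_lo, band_hi] that aliases to f_apparent.

    Hypothetical helper: only works when the known band admits exactly one alias,
    e.g. when it fits inside a single Nyquist zone of width fs/2.
    """
    candidates = set()
    k_max = math.ceil(band_hi / fs) + 1
    for k in range(k_max + 1):
        # Every k*fs +/- f_apparent produces the same samples as f_apparent.
        for f in (k * fs - f_apparent, k * fs + f_apparent):
            if band_lo <= f <= band_hi:
                candidates.add(f)
    if len(candidates) != 1:
        raise ValueError(f"ambiguous or no alias in band: {sorted(candidates)}")
    return candidates.pop()


# Numbers from the example: 1 MHz sample rate, signal known to lie in 3 to 3.5 MHz.
print(unalias(0.2e6, 1.0e6, 3.0e6, 3.5e6))
```

Widening the assumed band to 3 to 3.6 MHz makes a measured 0.45 MHz tone ambiguous (3.45 MHz and 3.55 MHz both fold onto it), and the call raises `ValueError`. This is essentially the bandpass-sampling (undersampling) condition: prior knowledge only helps if it rules out all but one alias.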
Interestingly, in a philosophical way, you might never be able to know the “original signal”, since any signal can also technically be the alias of an infinite number of other signals, including the one used for sampling.
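The remark can be made precise with one line of trigonometry. Sampling at times $t = n/f_s$ (integer $n$) and taking a zero-phase cosine at frequency $f$, for any integer $k$:

$$
\cos\!\left(2\pi\,(k f_s \pm f)\,\frac{n}{f_s}\right)
= \cos\!\left(2\pi k n \pm 2\pi f \frac{n}{f_s}\right)
= \cos\!\left(2\pi f \frac{n}{f_s}\right)
$$

because cosine is even and $2\pi$-periodic. So every frequency $k f_s \pm f$ yields exactly the same samples (with a nonzero phase offset, the $-$ branch also flips the sign of the phase). Setting $f = 0$ and $k = 1$ gives a tone at the sampling frequency itself, which is sampled as a constant, indistinguishable from DC.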