Just a consideration (I am a mathematician, but have nothing to do with signal processing or its mathematics, so don't overestimate my knowledge in this area):
Let us consider a periodic signal (though, I believe, this assumption can be weakened). If we have to assume that every frequency from the inverse of the period length up to the Nyquist frequency might occur in the signal, the sampling theorem tells us exactly how much information we need: samples at twice the highest frequency.
Now some "practical" consideration: Assume that the signal we want to sample is very "well-behaved", e.g. we know that many of its low frequencies are 0 or near 0 (or, more generally, we have some other equation that tells us "what the signal has to look like"). So if we reconstruct the frequencies of the signal, but some value other than 0 appears in these Fourier coefficients, we know it has to come from "noise" that originates in some frequency above the Nyquist frequency.
My mathematical intuition tells me it is plausible that such a trick might be used to reconstruct the signal exactly even if we sample at less than 2 times the highest occurring frequency. Why? Because we know more about the signal than the sampling theorem assumes. So an aliased "wrong" reconstruction, whose existence the sampling theorem guarantees whenever a frequency above the Nyquist frequency occurs, does not matter to us: we can plausibly rule it out, since such a signal cannot have 0s in the "excluded low frequencies" Fourier coefficients.
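To make this concrete, here is a small sketch of the idea (my own toy example, not a standard algorithm): if we know in advance which few Fourier coefficients can be nonzero, the reconstruction reduces to a small linear system, and far fewer samples than the Nyquist rate suffice. The signal length, the support set, and the sample count below are all made-up illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64                              # full Nyquist-rate grid length (arbitrary choice)
support = np.array([10, 17, 23])    # frequencies we KNOW may be nonzero (assumed prior)
coeffs = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def synth(t):
    # signal value at sample indices t, built only from the known frequencies
    return sum(c * np.exp(2j * np.pi * k * t / N) for c, k in zip(coeffs, support))

t_full = np.arange(N)
x_full = synth(t_full)              # the "true" signal on the full grid

# Observe only M = 8 samples, far fewer than the N = 64 the sampling
# theorem would demand without the sparsity assumption.
t_obs = rng.choice(N, size=8, replace=False)
y = synth(t_obs)

# Restricting the DFT synthesis matrix to the known support gives an
# overdetermined 8x3 system; least squares recovers the coefficients.
A = np.exp(2j * np.pi * np.outer(t_obs, support) / N)
c_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

x_rec = np.exp(2j * np.pi * np.outer(t_full, support) / N) @ c_hat
err = np.max(np.abs(x_rec - x_full))  # should be near machine precision
```

The hard part in practice, of course, is that the support is usually *not* known, only its small size; that is where the optimization machinery of compressed sensing comes in.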
This would explain to me why such an impossible-looking algorithm works so well in practice.
It is very plausible to me that such a trick is already used somewhere in engineering.