You can! The problem in that case is that it's not a convolution (blur) per se. Each pixel is an average of its n subpixels, P(1)=(S1(1)+S2(1)+...+Sn(1))/n, P(2)=(S1(2)+...+Sn(2))/n -- but as you can see, no subpixels are shared between pixels, whereas in a convolution neighboring outputs overlap. That is, in the deconvolution case there are as many equations as unknowns, whereas in the upscaling case you're creating n unknowns out of a single measurement.
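To make the counting argument concrete, here's a rough numpy sketch. Everything in it is my own toy setup: `blur_matrix` and `binning_matrix` are hypothetical helpers, the signal is 1-D and periodic, and the kernel `[0.2, 0.6, 0.2]` and the factor 2 are arbitrary choices.

```python
import numpy as np

def blur_matrix(n, kernel):
    """Circular convolution as a matrix: n outputs from n inputs (hypothetical helper)."""
    A = np.zeros((n, n))
    half = len(kernel) // 2
    for i in range(n):
        for k, w in enumerate(kernel):
            # Neighboring rows overlap: each input contributes to several outputs.
            A[i, (i + k - half) % n] = w
    return A

def binning_matrix(n_pixels, factor):
    """Each pixel is the mean of `factor` disjoint subpixels (hypothetical helper)."""
    A = np.zeros((n_pixels, n_pixels * factor))
    for j in range(n_pixels):
        # Rows never overlap: no subpixel is shared between pixels.
        A[j, j * factor:(j + 1) * factor] = 1.0 / factor
    return A

# Deblurring: 8 equations, 8 unknowns (kernel chosen arbitrarily so the system is invertible).
blur = blur_matrix(8, [0.2, 0.6, 0.2])
blur_rank = np.linalg.matrix_rank(blur)

# Upscaling by 2: 4 equations, 8 unknowns.
binning = binning_matrix(4, 2)
binning_rank = np.linalg.matrix_rank(binning)

print("blur:   ", blur.shape, "rank", blur_rank)
print("binning:", binning.shape, "rank", binning_rank)
```

The blur system is square and, for this kernel, full rank, so it can in principle be inverted. The binning system has more columns than its rank, so infinitely many subpixel images map to the same pixels.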

To estimate those subpixels, then, you're forced to make additional assumptions; if you assume they are independent you would simply estimate Sk(j)=P(j). The traditional (easiest) assumption is that the image is somewhat "bandlimited" -- it has no variations (frequencies) faster than the pixel grid can represent (roughly half a cycle per pixel), so no variation at subpixel scale. If this were the case, you could reconstruct the subpixels perfectly [1], save for some noise. But it is not always the case (it fails spectacularly when you have edges), and the result is upscale blur. So simple linear upscaling algorithms work by reaching a compromise between blur and edge enhancement.
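Here's a small sketch of that bandlimited assumption in action, under some loud simplifications of mine: the signal is 1-D and periodic, pixels are point samples rather than subpixel averages, there is no noise, and `fourier_upsample` is a hypothetical helper that does sinc-style interpolation by zero-padding the spectrum. The test signals are made up for illustration.

```python
import numpy as np

def fourier_upsample(x, factor):
    """Upsample a periodic 1-D signal by zero-padding its spectrum (hypothetical helper)."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[:len(spectrum)] = spectrum
    # irfft divides by the output length, so rescale to keep amplitudes unchanged.
    return np.fft.irfft(padded, n * factor) * factor

factor = 4
n_fine = 128
t = np.arange(n_fine) / n_fine

# Case 1: bandlimited signal (2 and 5 cycles, well below the coarse grid's limit of 16).
smooth = np.sin(2 * np.pi * 2 * t) + 0.5 * np.cos(2 * np.pi * 5 * t)
smooth_up = fourier_upsample(smooth[::factor], factor)
smooth_err = np.max(np.abs(smooth_up - smooth))

# Case 2: a hard edge, which is not bandlimited.
step = (t >= 0.5).astype(float)
edge_up = fourier_upsample(step[::factor], factor)
edge_err = np.max(np.abs(edge_up - step))

print("max error, bandlimited signal:", smooth_err)
print("max error, step edge:         ", edge_err)
print("step reconstruction range:    ", edge_up.min(), edge_up.max())
```

The smooth signal should come back essentially exactly, while the step gets smeared across the edge and rings above 1 and below 0 next to it. That ringing-versus-blur trade-off is what practical linear upscaling kernels try to balance.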

If you want to do better, though, you have to use non-linear kernels and have good underlying models for your image content. A promising approach is to use machine learning/neural networks: http://engineering.flipboard.com/2015/05/scaling-convnets/

[1] https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampli...