The running-sum-difference approach suggested above is a box filter, which has the best possible noise suppression for a given step-function delay, although in the frequency domain it looks appalling. It uses more RAM, but not that much. The single-pole RC filter you're suggesting is much nicer in the frequency domain, but in the time domain it's far worse.
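For anyone who wants to poke at the two filters, here is a minimal sketch, not anyone's production code. The class names `BoxFilter` and `SinglePoleFilter` and the parameters `n` and `alpha` are made up for illustration. The box filter keeps the last `n` running sums in a ring buffer and subtracts the oldest from the newest, which is where the extra RAM goes. The single-pole filter needs only one state variable.

```python
from collections import deque


class BoxFilter:
    """Moving average of the last n samples via a difference of running sums."""

    def __init__(self, n):
        self.n = n
        self.total = 0                        # running sum of every sample so far
        self.history = deque([0] * n, maxlen=n)  # running sums from the last n steps

    def update(self, x):
        self.total += x
        oldest = self.history[0]              # running sum as of n samples ago
        self.history.append(self.total)       # maxlen evicts the oldest entry
        return (self.total - oldest) / self.n


class SinglePoleFilter:
    """Discrete single-pole (RC-style) low-pass: y += alpha * (x - y)."""

    def __init__(self, alpha):
        self.alpha = alpha                    # assumed 0 < alpha <= 1
        self.y = 0.0

    def update(self, x):
        self.y += self.alpha * (x - self.y)
        return self.y
```

With integer samples the running sum stays exact. With floats it slowly accumulates rounding error, so a real implementation would probably want integers or an occasional resync.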
Not really? Sort of? I don't really have a good answer here. It depends on what you mean by "due to". It's certainly due to the impulse response, since in the sense I meant "far worse" the impulse response is the only thing that matters.
Truncating the impulse response after five time constants wouldn't really change its output noticeably, and even if you truncated it after two or three time constants it would still be inferior to the box filter for this application, though less bad. So in that sense the problem isn't that it's infinite.
Likewise, you could certainly design a direct-form IIR filter that did a perfectly adequate job of approximating a box filter for this sort of application, and that might actually be a reasonable thing to do if you wanted to do something like this with a bunch of op-amps or microwave passives instead of code.
So the fact that the impulse response is infinite is neither necessary nor sufficient for the problem.
The problem with the simple single-pole filter is that by putting so much weight on very recent samples, you sort of throw away some information about samples that aren't quite so recent and become more vulnerable to false triggering from a single rapid mouse movement, so you have to set the threshold higher to compensate.
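To put rough numbers on that, here is a small sketch comparing the impulse-response weights of the two filters. One assumption of mine, not from the comment above: the two filters are matched by white-noise gain (sum of squared weights) rather than by step delay, which for a box of length `n` means `alpha = 2 / (n + 1)`. The function names and the choice `n = 15` are hypothetical.

```python
def box_weights(n):
    """Impulse response of a length-n box filter: n equal weights."""
    return [1.0 / n] * n


def single_pole_weights(alpha, length):
    """First `length` impulse-response samples of y += alpha * (x - y)."""
    return [alpha * (1.0 - alpha) ** k for k in range(length)]


def noise_gain(weights):
    """Output variance / input variance for white input noise."""
    return sum(w * w for w in weights)


n = 15
alpha = 2.0 / (n + 1)                   # assumption: match the white-noise gain
box = box_weights(n)
rc = single_pole_weights(alpha, 400)    # 400 terms; the tail beyond is negligible

# Weight on the newest sample = peak response to a one-sample spike.
spike_ratio = rc[0] / box[0]

print("noise gain, box:        ", noise_gain(box))
print("noise gain, single-pole:", noise_gain(rc))
print("spike response ratio:   ", spike_ratio)
```

Under that matching, the single-pole filter's response to a one-sample spike comes out at nearly twice the box filter's (the ratio is `2n / (n + 1)`), which is the "too much weight on very recent samples" effect in miniature.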
Reading all of you sounding super smart and saying stuff I don’t recognize (but perhaps utilize without knowing the terms) used to make me feel anxious about being an impostor. Now it makes me excited that there are so many more secrets to discover in my discipline.
It turns out that pretty much any time you have code that interacts with the world outside computers, you end up doing DSP. Graphics processing algorithms are DSP; software-defined radio is DSP; music synthesis is DSP; Kalman filters for position estimation are DSP; PID controllers for thermostats or motor control are DSP; converting sonar echoes into images is DSP; electrocardiogram analysis is DSP; high-frequency trading is DSP (though most of the linear theory is not useful there). So if you're interested in programming and also interested in graphics, sound, communication, or other things outside of computers, you will appreciate having studied DSP.
Don't worry, this is a domain of magic MATLAB functions and Excel data analysis and multiply-named (separately invented about four times on average in different fields) terms for the same thing, with incomprehensible jargon and no simple explanation untainted by very specific industry application.