There are many ways to think about the Kalman filter. Here are a few that I like:
* You can think of it as a Bayesian update process for linear Gaussian systems. That is: given a prior belief of the state of a system (and an uncertainty about that belief), and a measurement about the system (and uncertainty about that measurement), the Kalman filter tells you how to combine the prior with the measurement. This is very hard to do in general, but has an exact solution if your system is Linear-Gaussian. That's magical!
* You can also think of it as a "better way to average". If I gave you two quantities that reflected some "true" value and asked you what the true value was, you would probably average them. The Kalman filter does you one better, because it tells you to average the two quantities weighted by how confident you feel about each one (there's a small sketch of this right after the list).
* If you like control theory, you can think of the Kalman filter as the dual of the Linear-Quadratic Regulator. That is, the KF is the optimal state estimator for Linear Gaussian systems in the same way that the LQR is the optimal (minimum cost) controller for LG systems. It's also worth pointing out that if the system you are estimating is being controlled, the KF can incorporate control inputs as well!
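To make the "weighted average" point concrete, here's a minimal sketch of the scalar update in Python (my own toy numbers, not from any particular library):

    # Fuse a prior estimate with a measurement, weighting each by its
    # confidence (inverse variance). This is the scalar Kalman update.
    def fuse(prior_mean, prior_var, meas_mean, meas_var):
        k = prior_var / (prior_var + meas_var)   # Kalman gain
        mean = prior_mean + k * (meas_mean - prior_mean)
        var = (1.0 - k) * prior_var              # fused uncertainty shrinks
        return mean, var

    # A confident prior (variance 1) is pulled only slightly by a noisy
    # measurement (variance 9): the result lands near the prior, not midway.
    print(fuse(20.0, 1.0, 26.0, 9.0))            # -> (20.6, 0.9)

With equal variances this reduces to the plain average of the two values, which is exactly the point of the second bullet.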
* You can think of it as a factor graph with linear residuals and Gaussian noise functions in factors that connect a chain of variables, with all but the most recent variable marginalized. It's a well-known fact that linear, Gaussian factors result in a closed-form expression that gives the optimal maximum a posteriori estimate. The Kalman filter exploits this very special case. You can also write an LQR down with a factor graph (as the parent commented, the KF and LQR are duals).
That's the same as the first point in the parent comment's list: a factor graph is a visualisation of the conditional probability distribution. But yes it is very helpful to draw out the factor graph (or Bayesian graph) for the Kalman filter, probably more useful than just writing out the equations.
By the way (as if my original comment above isn't already nitpicky enough, this is even worse...):
It bugs me when people use the word "optimal" in the Gaussian / Bayesian formulation. As the top-level comment above says, if you assume the various prior and conditional distributions are Gaussian then the posterior distribution is Gaussian too. This is not optimal, it's exact, just like you wouldn't say x=2 is the optimal solution to x+1=3.
It is the optimal solution in the quadratic optimisation formulation, as the top-level comment also correctly said.
I'm not a mathematician at all (mechanical engineer), but to me, "exact" sounds like "deterministic", as opposed to "stochastic".
I thought "optimal" conveyed the idea of "literally the best possible solution, but you're still in the presence of a fully random system here".
Which might be the wrong interpretation, but hopefully it explains why some people (who aren't necessarily familiar with rigorous mathematics) use optimal.
I do see your point. But if you're talking about a probability or probability distribution, it can still be an exact solution to a model. For example, if I throw two standard dice, what is the probability of throwing two sixes? The answer is 1/36. To me, it sounds odd to describe 1/36 as the "optimal" solution to that problem, even though it's stochastic. Even "exact" solution is a bit odd, I'll concede, but a lot less so. "The solution" or "the answer", with no more qualification needed, sounds best to me.
It's an estimator in this case. A fixed number is an estimator too, it's just not going to be desirable in most cases. But single numbers are absolutely and unquestionably also valid estimators.
In any case, what I'm trying to get at there is that in estimator theory there is a concept of optimality for an estimator over a distribution.
Sure, a single number is a trivial example of a procedure/formula.
But an estimator estimates an unknown parameter from data (or in such a trivial estimator possibly without data) - and I believe this is central to the confusion.
Yes, point 2 is really the elevator pitch of Kalman filters. It enables sensor fusion: averaging a fast noisy sensor with a slow accurate sensor, or even adding a model as one less-confident input to the filter.
As pointed out elsewhere in this thread, demonstrating a Kalman filter with only one input doesn’t really show their real potential.
I think the state of the art can work with something other than a Gaussian distribution, either in the input data or the predicted one (which, with nonlinear models, can be very irregular). Isn't that the point of the unscented Kalman filter and all the ones that generate lots of hypotheses to check the target distribution? I probably don't use the correct vocabulary here... sorry.
But doesn't that only describe one part of it? It also gives you a way to automatically figure out the "how confident you are" about each value over time.
I'm struggling to see the advantage over the arithmetic mean if I don't have a model of the true value. How is the confidence estimated for each value, given the example from the article where the variance of the temperature sensor is considered to be constant?
Imagine you want to know the depth of liquid in a tank you're filling up - and you've got a noisy depth gauge and a noisy flow-rate meter.
If you merely take the mean of the last 10 depth gauge readings, your smoothed depth reading will always lag behind the true depth, being about 5 readings out-of-date.
By fusing together the noisy depth gauge, and the noisy flow-rate meter, and a model saying how fast depth rises with flow, you can average out the noise without creating the same level of lag.
This is useful in applications like GPS receivers - there's noise so you do need filtering, but for driving through complex junctions, the last thing you want is a 5 second delay!
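A rough sketch of that tank example in Python (all the numbers, names, and noise levels here are made up, just to show the predict-with-flow / correct-with-gauge loop):

    import random

    dt = 1.0           # seconds between gauge readings (assumed)
    area = 2.0         # tank cross-section in m^2 (assumed)
    q_var = 0.02 ** 2  # process noise: how much we trust the flow model
    r_var = 0.10 ** 2  # depth gauge noise variance

    depth_est, depth_var = 0.0, 1.0   # initial belief about the depth (m)
    true_depth = 0.0

    for _ in range(50):
        true_depth += 0.1 / area * dt          # ground truth for the simulation
        flow = 0.1 + random.gauss(0, 0.01)     # noisy flow-rate reading (m^3/s)

        # Predict: push the depth forward using the flow meter and the model.
        depth_est += flow / area * dt
        depth_var += q_var

        # Update: correct the prediction with the noisy depth gauge.
        z = true_depth + random.gauss(0, 0.10)
        k = depth_var / (depth_var + r_var)
        depth_est += k * (z - depth_est)
        depth_var *= 1 - k

    print(true_depth, depth_est)   # the estimate tracks with little lag

Unlike a moving average of the gauge alone, the estimate doesn't trail the rising level, because the prediction step already accounts for the inflow.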
If you have separate estimates / noisy measurements of the same quantity, and also have some estimate of their variance (e.g. pollsters or weather forecasters with different accuracy track records), you'd probably want a weighted mean (inversely by their variance) rather than a simple mean. If you have a system that runs for some time, you'd usually (but not always) want an average with higher weights placed on the recent past than on the distant past. (The arithmetic mean is the optimal estimate in a model where the true value is unknown but time-constant.)
The Kalman filter is "just" a recipe to calculate the weights for optimal estimation, given different model assumptions. The difficult part is not applying the formulas, but mostly in coming up with a good model of the movements of the things you want to measure/estimate.
At a high level, the Kalman filter offers two advantages over a simple arithmetic mean (which correspond to its two steps, predict and update):
* The Kalman filter can account for changes in state. This is useful when the value you're measuring is changing over time (e.g. the position of a moving vehicle, or the water level in a bucket being filled, mentioned in another comment). The type of variation that can be handled by the Kalman filter will depend on what assumptions you make - how you configure it, if you like. Often you would account for velocity, but you can go a step further and account for acceleration too. A moving average will always lag behind a bit, even when movement is totally linear and there is no error in the measurement at all.
* A Kalman filter effectively gives more weight to recent measurements and less weight to older ones, e.g. a moving average with period 4 will have weights of (..., 0, 0, 0.25, 0.25, 0.25, 0.25), whereas a Kalman filter might effectively have weights of (..., 0.125, 0.25, 0.5); there's a small numerical sketch of this after this comment. You can even have a continuously-varying time gap between measurements, so if a new measurement comes in then it will update the estimate a lot if the previous measurement was a long time ago, whereas if the previous measurement was extremely recent then it will effectively be averaged with the new one. One downside of this is that if the state changes a lot very suddenly then the Kalman filter will remember the old state, to some extent, whereas a moving average will forget it entirely once it drops out of the window; but this is offset to some extent by the previous point.
If you don't have a model of the various confidences then you'll need to guess them - if you do a really bad job then it might work out worse than a moving average. But if the Kalman filter's working better then you can always make use of the state estimate while ignoring the computed uncertainty.
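To make the geometric weighting in the second bullet concrete, here's a toy scalar random-walk filter (my own parameters) showing the gain settling to a constant, at which point the filter is just an exponentially weighted average of past measurements:

    q, r = 0.5, 1.0      # process and measurement noise variances (made up)
    p = 10.0             # initial state variance

    for step in range(10):
        p += q           # predict: uncertainty grows by the process noise
        k = p / (p + r)  # update: Kalman gain
        p *= 1 - k
        print(step, round(k, 4))

    # The gain converges to a fixed value, so each new measurement gets
    # weight k and older ones decay geometrically by (1 - k) per step.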
Author here. A unitary model is still a model; I chose it mainly just to minimize the mental overhead for the reader. But even if you use a different model, the Kalman filter will still converge to a frequency-domain filter as long as the dynamics are not time-dependent (i.e., constant dynamics and measurement noise).
Feels kind of disingenuous though. The article makes it seem like you're suggesting you can measure angle with simply a gyroscope + a low pass filter. But in reality, that method does not work at all (hence the famous NASA problem that the Kalman filter solved). You need a multiple-input state space model to see the magic of Kalman filters.
Hmm, I'm not trying to make that suggestion at all. How did I make that impression? I'd like to correct it.
My goal with this article is only to help readers who have wondered how time-domain approaches like Kalman filtering relate to frequency-domain approaches like low-pass filtering, and to connect those dots. It's not a comprehensive article about the magic of Kalman filters.
(I do not have a PhD in control systems and don't specialize in it, so take my input with a grain of salt.) I think it's because Kalman filters and low pass filters are trying to achieve similar but different end goals. A Kalman filter is trying to get a measurement and lower its uncertainty. A low pass filter is trying to remove high fluctuations, assuming those fluctuations are noise. As mentioned before, a low pass filter cannot achieve what a Kalman filter can. The example you show works because the noise is set to the high fluctuations. I think explaining a multiple-input system is extremely critical to Kalman filters. Without it, it makes it a bit deceiving. For instance, if you do the same exercise in the article but measuring angle, you quickly see how a gyroscope + lowpass filter never reaches the level of measurement compared to gyroscope + accelerometer. Even more fun: you can increase the noise in the gyroscope + accelerometer while keeping the gyroscope + lowpass filter noise the same, and the gyroscope + accelerometer with a Kalman filter would still perform better.
Control theory person here. In the linear case the resulting "system" is a linear time-invariant system. That's the standard state-space system that you can convert to obtain a transfer function. And a higher-order transfer function, through partial fraction decomposition, can always be written as a sum of simple lead-lag and second-order filters. So the distinction in the post, and also in the discussion here, is kind of orthogonal semantics. What NASA did was to "extend" the Kalman filter to nonlinear cases (actually their extension was also not so big, but if it works it works). In short, every linear system can be written as a sum of trivial components such as low passes/notches/anti-notches etc. This is in fact how you implement higher-order controllers in real time.
The MIMO distinction is also not super important, since you can also have MIMO low pass filters. The real difficulty is obtaining the coefficients of these filters and hopefully finding the best ones. That's where you start getting into the optimality and the real contribution of these tools. But as a side effect you must assume that the noise is Gaussian, otherwise you lose much of the niceties of the theoretical guarantees. This is basically the biggest control-theoretic disadvantage of Kalman filters and the reason why other domains keep rediscovering it while in control it is "kinda, sorta" falling out of grace.
Thanks, and I broadly agree with what you say. It's only under specific conditions (system dynamics, process covariance, measurement matrices, and measurement covariance are all static) that the Kalman filter converges to the Wiener filter, as Kálmán says in his paper.
I would contest that the reason a frequency filter doesn't work on a gyroscope is not because it's a MIMO system (which frequency domain techniques can generalize to; you just end up with n x m transfer functions) but because the system is not static, so it doesn't satisfy the conditions that would cause the Kalman filter to converge to a fixed-coefficient filter.
The title is probably the issue. People read it and go in expecting you to make that case. In seeking brevity it seems you lost sufficient context and nuance.
Even as a static model, it is not a low pass filter.
In fact, in the static case, a Kalman filter is exactly the same as the recursive least squares estimator, which is provably the optimal unbiased estimator.
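A small sketch of that static case (measurement variance arbitrarily set to 1): with no process noise, the scalar Kalman filter reproduces the running mean, i.e. recursive least squares for a constant.

    est, p = 0.0, 1e9        # vague prior on the constant being estimated
    for n, z in enumerate([2.0, 4.0, 6.0, 8.0], start=1):
        k = p / (p + 1.0)    # gain with unit measurement variance
        est += k * (z - est)
        p *= 1 - k
        print(n, est)        # prints (approximately) the running mean: 2, 3, 4, 5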
In the static case, the Kalman filter converges to a fixed (frequency domain) filter after running for a while. The type of frequency behaviour depends on the system properties. The filter it converges to is the Wiener filter for that system.
FWIW, in the context of robotics, where there are real inputs you set, the motion model ends up being extremely useful and behaves much differently than just a frequency-domain filter.
But the overall subject of comparison to just a low pass filter is very interesting
Absolutely! And thank you for saying so. (To be clear, the answer to the question posed in the title was never intended to be yes. The answer was intended to be sometimes.)
In one of my other articles where I applied the Kalman filter to a more complex system, a frequency-domain filter would have failed. But in this article I tried to stick to the simplest possible system, so that the math wouldn't get in the way of illustrating the way in which Kalman and Wiener filters relate to each other, since it's an interesting connection that doesn't get much attention.
Not a robotics expert. But my understanding of the effect of input/motion models on the state estimation of Kalman filters is that it conditions the error covariance of the state estimate.
So, when the quality of measurements is very high (almost no noise), then the motion model provides very little benefit. And when the measurement quality is very low, the motion model reduces the error ellipse around the real state of the system.
So, even when there are inputs and a motion model you could imagine the Kalman filter as a frequency-domain filter (e.g. an IIR/FIR filter), where the tap weights are being updated based on some known function.
If the quality of the measurements is high enough, and the sampling frequency is high enough, then the Kalman filter should be able to converge to a very good estimate of the state, even with no motion model, right? In which case it would have the exact same characteristics as an adaptive, slowly time-varying frequency-domain filter.
> If the quality of the measurements is high enough, and the sampling frequency is high enough, then the Kalman filter should be able to converge to a very good estimate of the state, even with no motion model, right?
No. Consider this discrete-time linear system with 2-dimensional state [x1, x2] and one output y:
x1' = 2 * x1
x2' = x2
y = x1 + x2
Suppose we see the output sequence 1, 2, 4, 8, ... If we know the motion model, then we know that the initial state must have been [1, 0]. (In technical terms, the pair (A, C) is observable.) But if we don't have the motion model -- all we know is that y = x1 + x2 -- there's no hope of ever getting a state estimate, even if our y measurement has zero noise.
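For what it's worth, the standard way to check that claim is the observability matrix; here's a quick numpy sketch of the system above:

    import numpy as np

    A = np.array([[2.0, 0.0],
                  [0.0, 1.0]])   # x1' = 2*x1, x2' = x2
    C = np.array([[1.0, 1.0]])   # y = x1 + x2

    # Observability matrix [C; C A]: full rank means the initial state can
    # be recovered from the output sequence -- but only if you know A.
    O = np.vstack([C, C @ A])
    print(np.linalg.matrix_rank(O))   # 2, so the pair (A, C) is observable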
Totally agreed. Some people just gloss over the state-space model aspect of control systems, which is pretty much the foundation of all these types of filters.
Hey, there's also all the interacting multiple model (IMM) fanciness, where you can run (and make compete, or merge) several models together, and for some kinds of tracking activities, "the thing doesn't move and is just fixed and noisy" is part of the constellation of competing models.
Only did a quick skim, but it's a clickbait-y title, since it seems like they're using a single-input Kalman filter, which then, yeah…it'll be a low pass filter. The article also doesn't show any confidence bands which is a huge part of Kalman filters too. But the magic of a Kalman filter is multiple inputs. One thing that took a long time to click for me is that multiple noisy measurements taken at the same time become one less-noisy measurement, thanks to the inherent nature of Bayesian statistics. That's where the magic is and why Kalman filters are not low pass filters.
> The article also doesn’t show any confidence bands which is a huge part of Kalman filters too.
The figure does show confidence bands in pale blue, although they're hard to see because the data points kind of obscure them. But since the covariance deterministically converges to a constant within a few samples, regardless of the data, the confidence bands are not very interesting in the example in the article.
There is no rule that a low pass filter must be single input. You just get a matrix of transfer functions. Most control loops with a reference input and sensor noise can be viewed this way.
A paper I once found really illuminating on what a Kalman filter was and how it related to similar models is "A Unifying Review of Linear Gaussian Models", by Sam Roweis and Zoubin Ghahramani. It can be found here in pdf format: http://www.stat.columbia.edu/~liam/teaching/neurostat-fall20...
Meta comment: I really appreciate that the author is taking the time to reply to people with queries (not this post). Thanks! It makes this community better.
This is kind of like picking a scalar input function with a constant derivative to assert that Newton's method is just error feedback -- in that special case that's all it is, and you won't get an intuition that Newton's method constructs and solves a local linear approximation from the function's Taylor expansion unless you look at a more complicated case.
Same applies here: with a single input and a trivial model, Kalman is just a lowpass filter.
Isn't it more than that though? It can be used to combine multiple data sources with varying degrees of uncertainty and change the confidence in each sensor on the fly. That's more than a low pass filter.
Absolutely. The Kalman filter has many uses, and filtering a single stream of data is only one of them. Sensor fusion is an important application, which I believe is mentioned in the first footnote.
Author here. Yes, if the system has a high-pass behaviour, then the corresponding Kalman filter will converge to a high-pass filter, and so on. Non-low-pass-like behaviour can emerge from the prediction step, if the model is not unitary. If there's interest, I might write a followup article where a Kalman filter takes on a bandpass nature.
I read Gaussian and Cauchy distributions as inputs, but can you add arbitrary distributions other than those two examples, and expect the filter to yield artifacts from those distributions? Naively, I'm thinking of different coloured noise as inputs, and what kind of artifacts a Kalman filter would produce over them.
I'd expect BPF behavior when estimating the rate of a noisy input (e.g. state: speed, observation: position), but I can't think of a process model that would be truly high-pass.
I've used it to compensate for directional measurement bias.
If you pass a moving probe through a magnetic field, your measured results vary by direction (with or perpendicular to the flux) and velocity.
If you want to measure a geomagnetic field using an aircraft, the heading matters.
To normalise in post processing you calibrate a Kalman filter by flying precessing butterfly wing patterns in a known relatively level flux area and then use that to remove the magnetic signature of the aircraft and heading from data collected over multiple headings and days.
(There are a few other twists - diurnal flux and the induced field from the Earth's GMF interactions need to be isolated, etc.)
Conceptually, every step of applying a Kalman filter in the simplest case is just updating a Gaussian prior representing position with a new datapoint sampled with (assumed) Gaussian error. The result is equivalent to convolving a Gaussian representing position with a Gaussian representing noise.
In the frequency domain, that's equivalent to multiplying one Gaussian by another one with zero mean, which will always put higher weight on low frequencies. No matter what the gain is in the Kalman filter, that'll still be true as far as I can tell. As the gain varies, the cutoff within the low-pass filter will change though.
It gets harder to analyze when you start using non-linear models to update position though. Generally, I think the same logic applies.
Yes, it can easily correspond to a BPF, for example when trying to track a sinusoidal signal, say the AC line frequency. Then you can be reasonably certain that it will be near 50 Hz (or 60 Hz), and the resulting Kalman filter will be a sharp band-pass filter centered around that (when neglecting higher harmonics).
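A sketch of that sinusoid-tracking setup (my own toy parameters, standard textbook Kalman equations): the state is the in-phase and quadrature components of the line signal, and the model rotates them by the nominal 50 Hz phase step each sample.

    import numpy as np

    f0, fs = 50.0, 1000.0          # nominal line frequency and sample rate (assumed)
    w = 2 * np.pi * f0 / fs        # phase advance per sample

    A = np.array([[np.cos(w), -np.sin(w)],
                  [np.sin(w),  np.cos(w)]])   # spin the state by w each step
    C = np.array([[1.0, 0.0]])                # observe only the in-phase component
    Q = 1e-6 * np.eye(2)                      # tiny process noise: frequency stays near 50 Hz
    R = np.array([[1.0]])                     # measurement noise

    x = np.zeros((2, 1))
    P = np.eye(2)

    def step(z):
        # One predict/update cycle; in steady state this behaves like a
        # narrow band-pass filter centered on f0.
        global x, P
        x_pred = A @ x
        P_pred = A @ P @ A.T + Q
        K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
        x = x_pred + K @ (z - C @ x_pred)
        P = (np.eye(2) - K @ C) @ P_pred
        return float(x[0, 0])

    # Feed it noisy samples of a 50 Hz tone and it locks on, while
    # attenuating components far from 50 Hz.
    for n in range(200):
        step(np.sin(w * n) + np.random.normal(0, 1.0))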
It's a noise filter, so it wouldn't be high pass or band pass. It's approximately a low pass filter when it hasn't converged precisely to a low pass filter.
As the author points out: there isn’t any one Kalman filter. The Kalman filter is really a recipe for constructing optimal linear predictive filters, but the actual characteristics of the resulting filter will depend on the dynamics, state variables, and sensors that you’ve tuned it for, and that dependence gets reflected numerically in the various matrices that your specific Kalman filter is built from.
To me the biggest magic of Kalman filtering wasn't the low-pass nature (yes, it's a multidimensional low pass filter!), but rather issues around system identification. Namely, if and when the system equations evolve as much as the hidden state, then things get really interesting! That specific formulation (joint system estimation + Kalman smoothing) was fairly interesting and useful for the applications I worked on. Obviously your mileage may vary :)
As an addendum, I want to say a well designed whitening matrix + IIR filter can replace a Kalman filter, again depending on the application. Just makes things easier to understand, debug etc. Works if your vector is somehow decomposable into roughly independent scalars.
The generic Kalman filter is also relatively old and slow to adapt.
Explicit noise-regularized (or at least variance-regularized) filters do better, especially when the signal is corrupted by non-Gaussian noise, where the Kalman filter can diverge.