A few thoughts come to mind. Some have already been said, others not.
1. Seems a bit weird to be looking at the accelerometer data yet miss the obvious approach of summing up the acceleration to get the velocity and then summing that up to get position. Yes, I know about drift, but even then I'd assume the fairly constant, several-second g-force of pulling in or out of a station or taking a curve would be a strong signal that is easy to distinguish from short-lived jostling.
2. The "train moving" frequency you discovered via Fourier analysis was most likely the hunting oscillation. This has to do with how the wheels of the train are designed to force it to turn opposite to any deviation from the direction of the track. Thus there is a back-and-forth "hunting" for the center that is completely determined by the geometry of the track and the wheels, and therefore the length of track per complete back-and-forth cycle (aka the wavelength) is constant. The frequency of the oscillation (aka back-and-forth cycles per second) is just the velocity of the train divided by this constant length. This fact could be leveraged to estimate the actual train speed rather than just moving/not moving.
3. Combining 1 and 2: integrating acceleration and confirming / correcting the estimated velocity against the expected hunting oscillation would likely be the most powerful / reliable model.
4. Using a classifier seems like overkill here. But I'm sure at some point it was easier to just raw-data it than to work out a theory-driven model that accounted for all the practical confounding factors.
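To make point 2 concrete, here is a rough sketch of turning a measured hunting frequency into a speed estimate using speed = frequency × wavelength. Everything in it is an assumption for illustration: the 15 m wavelength is a made-up placeholder (the real value depends on the wheel and track geometry), the input is a synthetic sine wave rather than real phone data, and the function names are hypothetical.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC component (naive DFT)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset (e.g. gravity)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n


def speed_from_hunting(freq_hz, wavelength_m):
    """Speed (m/s) = cycles per second * metres of track per cycle."""
    return freq_hz * wavelength_m


# Synthetic lateral acceleration: a pure 1.5 Hz oscillation sampled at 50 Hz.
SAMPLE_RATE_HZ = 50.0
ASSUMED_WAVELENGTH_M = 15.0  # hypothetical placeholder, not a measured value
samples = [math.sin(2 * math.pi * 1.5 * i / SAMPLE_RATE_HZ) for i in range(200)]

freq_hz = dominant_frequency(samples, SAMPLE_RATE_HZ)
speed_mps = speed_from_hunting(freq_hz, ASSUMED_WAVELENGTH_M)
print(f"hunting frequency: {freq_hz:.2f} Hz, estimated speed: {speed_mps:.1f} m/s")
```

With real data the peak would be much noisier, and the wavelength would have to be calibrated per vehicle type, so treat this as the idea rather than a working estimator.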
I'm one of the developers behind this project, so I want to jump in because we did explore some of these solutions.
You are correct in saying the low frequency acceleration from starts, stops and turns can be distinguished from the higher frequency noise.
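For anyone curious what that separation can look like, here is a minimal sketch using a first-order low-pass filter on synthetic data. This is not the code from the project; the 0.5 Hz cutoff, the sample rate, and the signal itself are all assumptions chosen for illustration.

```python
import math


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order (exponential) low-pass filter over a list of samples."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    out = []
    y = samples[0]
    for x in samples:
        y += alpha * (x - y)  # move a fraction of the way toward the new sample
        out.append(y)
    return out


# Synthetic signal: a sustained 1.0 m/s^2 acceleration (train pulling away)
# plus 10 Hz "jostling" with twice that amplitude, sampled at 100 Hz for 5 s.
SAMPLE_RATE_HZ = 100.0
raw = [
    1.0 + 2.0 * math.sin(2 * math.pi * 10.0 * i / SAMPLE_RATE_HZ)
    for i in range(500)
]
smooth = low_pass(raw, SAMPLE_RATE_HZ, cutoff_hz=0.5)

raw_dev = max(abs(r - 1.0) for r in raw[-100:])
smooth_dev = max(abs(s - 1.0) for s in smooth[-100:])
print(f"max deviation from 1.0: raw={raw_dev:.2f}, filtered={smooth_dev:.2f}")
```

The filtered trace stays close to the sustained acceleration while the raw one swings wildly, which is the distinction being described. It does nothing for the orientation problem, of course.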
One big challenge was orientation. From the sensor's perspective, acceleration can look the same as deceleration or a turn if you rotate the phone. If we integrated the gyro readings to track orientation, the error would grow quadratically, and we found magnetometer readings unreliable depending on the vehicle.
Your point about the hunting oscillation is interesting and I agree, estimating the speed would be a great improvement.
Well done! Have you considered spoofing Wi-Fi/BLE signals in addition to accelerometer data? This way you could verify that you are at the right station.
That's just an inertial navigation system: quite useful for missiles before/without usable GNSS, but probably also quite inaccurate with the MEMS accelerometers in phones.