An interesting side point is that the graph optimization approach used here is somewhat similar to modern graph-based visual SLAM.
The graph in the article can be seen as a factor graph. VSLAM systems usually have a (roughly bipartite) factor graph whose vertices/variables are either keyframes or ‘good’ features, with edges/factors between features and the frames that see them, and between adjacent frames; each of these edges is a factor in the graph. This structure results in very large but sparse graphs, and there are factor graph optimization libraries that take advantage of this sparsity (e.g. g2o or GTSAM). These libraries also use specialized optimization techniques for some of the nonlinear manifolds (e.g. SO(3)) that arise in SLAM problems.
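For a sense of what those libraries look like in practice, here's a minimal GTSAM (Python bindings) pose-graph toy -- not the article's graph, just the standard odometry-chain example:

    import numpy as np
    import gtsam

    # Toy 2D pose graph: three poses chained by "between" factors,
    # plus a prior to pin down the gauge freedom.
    graph = gtsam.NonlinearFactorGraph()
    noise = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.1, 0.1, 0.05]))

    graph.add(gtsam.PriorFactorPose2(0, gtsam.Pose2(0, 0, 0), noise))
    graph.add(gtsam.BetweenFactorPose2(0, 1, gtsam.Pose2(1, 0, 0), noise))
    graph.add(gtsam.BetweenFactorPose2(1, 2, gtsam.Pose2(1, 0, 0), noise))

    # Deliberately sloppy initial guesses; the optimizer exploits the sparsity.
    initial = gtsam.Values()
    initial.insert(0, gtsam.Pose2(0.1, -0.1, 0.05))
    initial.insert(1, gtsam.Pose2(1.2, 0.1, -0.05))
    initial.insert(2, gtsam.Pose2(1.8, -0.2, 0.1))

    result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
    print(result)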
Maybe they only uploaded a 1080p version of the video - but I was expecting higher def. ...then again, I suppose interplanetary bandwidths are probably not great.
30+ years ago, I had friends reconstructing crime scenes for court proceedings. Architectural drawings and 3D scenes. They used AutoCAD and AutoSolid (?). Showing stuff like blood and ballistics.
Super effective. They turned my stomach.
I don't have words for these Forensic Architecture recreations. I almost feel like I'm there (present).
I can only imagine their future VR recreations will be overpowering.
It'd be pretty neat to lift this up into 3D: you could probably reverse the transforms to recover the camera pose for each frame, then drop it into a scene alongside the camera frustum and the topography, so we could see exactly how much steering the descent stage did to hit its target and how fast it was descending at every stage.
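If the article's per-frame homographies were combined with a guessed camera intrinsic matrix, OpenCV can split each one into candidate rotation/translation pairs -- a rough sketch (K here is entirely made up):

    import numpy as np
    import cv2

    # H: 3x3 homography mapping frame pixels to the orthographic map (from the
    # article's alignment step). K is a guessed pinhole intrinsic matrix for
    # the descent camera -- fx, fy, cx, cy are placeholders, not real values.
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    H = np.eye(3)  # stand-in; use the homography recovered for a given frame

    # A homography induced by a plane admits up to four (R, t, n) solutions;
    # the physically plausible one has the plane normal facing the camera.
    num, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
    for R, t, n in zip(rotations, translations, normals):
        print(R, t.ravel(), n.ravel())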
On the other hand, I'd fully expect NASA/JPL to have IMU telemetry from the EDL. If I'm not completely mistaken, it would eventually get published in the PDS here: https://naif.jpl.nasa.gov/naif/data.html
I’ve been thinking more about the navigation of their little helicopter.
On Earth we're used to being able to use GPS for route planning. I wonder if there'd be any benefit to using this process in reverse: constantly determining one's position in 3D space above the surface with a downward-facing camera and stored satellite imagery, cross-referenced with whatever gyro/accelerometer-based positioning they're using. Maybe what they've got already is sufficient for anything you'd want to do in the near future.
> I wonder if there'd be any benefit to using this process in reverse: constantly determining one's position in 3D space above the surface with a downward-facing camera and stored satellite imagery, cross-referenced with whatever gyro/accelerometer-based positioning they're using
That is pretty much exactly how TRN worked for the EDL. I don't think Ingenuity has much in terms of navigation ability, probably just basic INS. But it's also not intended to fly any extended distances, so it doesn't really need any navigation abilities. I'd imagine future copters would use TRN-style navigation.
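A crude sketch of the "camera vs. stored orbital map" matching idea (nothing like the real Lander Vision System -- just ORB features and a homography, with made-up file names):

    import cv2
    import numpy as np

    # Match a downward-facing camera frame against a stored orbital map to
    # estimate where the vehicle is over the terrain. File names are placeholders.
    map_img = cv2.imread("stored_orbital_map.png", cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread("downward_camera_frame.png", cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=2000)
    kp_map, des_map = orb.detectAndCompute(map_img, None)
    kp_frm, des_frm = orb.detectAndCompute(frame, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_frm, des_map), key=lambda m: m.distance)[:200]

    src = np.float32([kp_frm[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_map[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Project the frame centre into map coordinates: that map pixel (times the
    # map's metres-per-pixel) is the estimated ground position under the camera.
    h, w = frame.shape
    centre = cv2.perspectiveTransform(np.float32([[[w / 2, h / 2]]]), H)
    print("estimated position in map pixels:", centre.ravel())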
The graph-based approach is interesting... but I wonder if better and far simpler results might be had by simply using a few iterations of optical flow to refine the alignment of each frame, starting from the alignment of the previous frame?
As a bonus, the transformation could operate on images after they've been projected onto a deformable mesh, to model the hills etc.
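Something like this, maybe -- a sketch using OpenCV's pyramidal Lucas-Kanade flow, where H_prev is assumed to be the previous frame's 3x3 frame-to-map homography:

    import cv2
    import numpy as np

    def align_next_frame(prev_gray, curr_gray, H_prev):
        """Chain the previous frame's map alignment with frame-to-frame flow."""
        # Track a few hundred corners from the previous frame into the current one.
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                           qualityLevel=0.01, minDistance=10)
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                       prev_pts, None)
        ok = status.ravel() == 1
        # Robust similarity transform mapping prev-frame points to curr-frame points.
        M, _ = cv2.estimateAffinePartial2D(prev_pts[ok], next_pts[ok],
                                           method=cv2.RANSAC)
        # Invert it (curr -> prev), lift to 3x3, and compose with the previous
        # frame's alignment to get curr -> map.
        M_inv = cv2.invertAffineTransform(M)
        M_inv3 = np.vstack([M_inv, [0, 0, 1]])
        return H_prev @ M_inv3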
I love that you can see the approach angle in the distortion of the field. Seeing how long it takes for that to square up also helps convey how thin the atmosphere is.
I've done this kind of stuff through a point and click UI in GIS software. It's really cool seeing a lot of the underlying math and concepts laid out like this.
ESRI software has had this raster function for quite a while, at least 20 years. Usually 2 or 3 points would suffice. Using hundreds of points was unnecessary.
Hundreds of points lets you get a good average. 2 or 3 requires that you've definitely clicked the same point on both images; a human can use other bits of the image to work that out, but a computer finds it harder.
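Roughly the difference between an exactly-determined fit and a robust averaged one (src/dst here are just synthetic stand-ins for matched control points):

    import cv2
    import numpy as np

    # src, dst: matched control points in image and map coordinates (synthetic).
    src = (np.random.rand(300, 2) * 1000).astype(np.float32)
    dst = src + np.random.randn(300, 2).astype(np.float32)  # noisy "clicks"

    # 3 points: the affine warp is exactly determined, so every pixel of click
    # error goes straight into the result.
    M_exact = cv2.getAffineTransform(src[:3], dst[:3])

    # Hundreds of points: RANSAC throws out bad matches, and least squares
    # averages the rest, so individual errors largely cancel out.
    M_robust, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC,
                                             ransacReprojThreshold=3.0)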
A next step could be to leave the already-projected images where they are and only draw over them, while marking the latest frame with a border. One could even use regions covered by multiple frames to perform multi-frame super-resolution.
Excellent post! I wonder why SIFT didn't find sufficient keypoints early on; it's typically a beast of a method for such a task. It looks like there's some intensity variation (the satellite image is darker), but I'm not sure that would explain it all.
The SIFT algorithm discards low-contrast keypoints.
In the beginning the surface looks quite blurry (it seems the camera is auto-focusing on the heat shield), which probably causes only low-quality keypoints to be found on the surface.
Additionally, if the algorithm also caps the maximum number of keypoints per image, the situation gets even worse, because strong keypoints on the heat shield (which had to be discarded "manually" later) compete against weak keypoints on the surface.
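For what it's worth, OpenCV's SIFT exposes both of those knobs, so one could speculatively loosen them for the blurry early frames:

    import cv2

    # Default SIFT: caps keypoints by response (when nfeatures > 0) and drops
    # low-contrast ones via contrastThreshold -- both work against a blurry,
    # low-contrast surface competing with a sharp heat shield in frame.
    sift_default = cv2.SIFT_create()

    # Looser settings: no cap on keypoint count and a lower contrast threshold,
    # so weaker surface keypoints survive (at the cost of more spurious matches).
    sift_loose = cv2.SIFT_create(nfeatures=0, contrastThreshold=0.01,
                                 edgeThreshold=20)

    img = cv2.imread("descent_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
    kp, des = sift_loose.detectAndCompute(img, None)
    print(len(kp), "keypoints")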
5 meters. However, the "intended target" is not simply defined.
The landing ellipse for Perseverance was 7.7km by 6.6km. The goal is to land at a safe spot within the ellipse rather than land at a specific location.
The new Terrain Relative Navigation capability determines the rover's position relative to the surface during descent by comparing camera images to onboard satellite imagery. On Earth you'd use GPS. No GPS on Mars.
Once the rover knows its position, it can determine the safest spot to land using an onboard hazard map. The spot it chose to land at and the spot it actually landed at were 5 meters apart.
> Once the rover knows its position, it can determine the safest spot to land using an onboard hazard map. The spot it chose to land at and the spot it actually landed at were 5 meters apart.
To add a bit more info, poorly remembered from this excellent We Martians episode[0] interviewing Swati Mohan, who is the Mars 2020 Guidance, Navigation and Controls Operations Lead and was the voice of the landing. Go listen to it!
On the way down an image is taken. Using data about how the atmospheric entry is going, and with a lot of constraints that include the hazard map and what kinds of manoeuvres are possible with the descent system (in particular it does a divert, and there are minimum and maximum distances the divert must lie between), a single pixel is chosen from that image to aim for. That pixel represents a 10m x 10m square, and the rover landed within 5m of that square.
The hazard map is created from images with a 1m x 1m resolution, from one of the orbiters (Mars Reconnaissance Orbiter I think). Those images are scaled down for the hazard map, as the on-board image processing had very tight bounds on how long it could search for a valid landing site. The podcast goes into some cool detail about that whole system and its technical design.
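Purely as an illustration of that selection step (made-up hazard scores and divert limits, nothing to do with the actual flight software):

    import numpy as np

    # hazard: lower is safer; one cell ~ one 10 m x 10 m square of the map.
    hazard = np.random.rand(200, 200)      # placeholder hazard map
    current = np.array([100, 100])         # estimated position (cell indices)
    d_min, d_max = 5, 40                   # allowed divert range in cells (made up)

    # Distance of every cell from the current position.
    ys, xs = np.indices(hazard.shape)
    dist = np.hypot(ys - current[0], xs - current[1])

    # Only cells the descent stage can actually divert to are candidates.
    candidates = np.where((dist >= d_min) & (dist <= d_max), hazard, np.inf)

    # Aim for the safest reachable cell.
    target = np.unravel_index(np.argmin(candidates), hazard.shape)
    print("target cell:", target)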
There is an obvious case where you can't rely on GPS on Earth.
Pershing-2 missiles had radar correlation guidance back in the 80's.
An obvious consequence of Google maps imagery and open source is that a capable college student can make an optical terminal guidance unit out of a mobile phone.
So it looks like it landed a little over 1km from the center of the oval, if that's your question.
When precisely talking about space travel, things tend to be discussed as "nominal" instead of being on target or correct. This is because some variance is expected, and systems are designed to work successfully within that variance. In that sense, Perseverance landed within the landing oval and on a safe landing spot, so it was 0 meters away from target.
An analogy would be that it hit the bullseye and got the points, even if it wasn't exactly in the middle of the dartboard.
If you look to the left of the landing site in the final seconds, you can make out an ancient river delta. That is one of the prime targets they want to investigate.
Or perhaps more importantly, did the terrain navigation software correctly choose an optimal landing location? It seems like it chose one of the rockiest places.