
I work on time-of-flight cameras that need to handle the kind of data that you're referring to.

Each pixel takes multiple measurements over time of the intensity of reflected light that matches the emission pulse encodings. The result is essentially a vector of intensity over a set of distances.

A low depth resolution example of reflected intensity by time (distance):

  i: _ _ ^ _ ^ - _ _
  d: 0 1 2 3 4 5 6 7

In the above example, the pixel would exhibit an ambiguity between distances of 2 and 4.
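A minimal sketch of how that per-pixel return vector might look and how the ambiguity shows up as two peaks (Python/NumPy is my assumption here, not how our pipeline is actually written):

  import numpy as np

  # Per-pixel reflected intensity indexed by distance bin (the toy example above).
  intensity = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 0.5, 0.0, 0.0])
  distances = np.arange(len(intensity))  # bin index stands in for distance

  # A peak is any bin brighter than both of its neighbours.
  is_peak = (intensity[1:-1] > intensity[:-2]) & (intensity[1:-1] > intensity[2:])
  peaks = distances[1:-1][is_peak]

  print(peaks)  # -> [2 4]: two plausible returns, hence the ambiguity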

The simplest solution is to select the weighted average or median distance, which results in "flying pixels" or "mixed pixels", for which there are efficient existing filtering techniques. The bottom line is that for applications like low-latency obstacle detection on a cost-constrained mobile robot, some compression of depth information is required; see the sketch below.
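As a rough illustration (again hypothetical Python/NumPy, not the actual pipeline), collapsing that same return vector to a single depth shows where the flying pixel comes from:

  import numpy as np

  intensity = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 0.5, 0.0, 0.0])
  distances = np.arange(len(intensity), dtype=float)

  # Intensity-weighted average depth.
  weighted_avg = np.sum(intensity * distances) / np.sum(intensity)

  # Median-like depth: the bin where cumulative intensity crosses half the total.
  cumulative = np.cumsum(intensity)
  median_depth = distances[np.searchsorted(cumulative, cumulative[-1] / 2.0)]

  print(weighted_avg)  # 3.4 -- between the true surfaces at 2 and 4: a "flying pixel"
  print(median_depth)  # 4.0 -- happens to land on a real surface in this toy case

The weighted average lands in empty space between the two real surfaces, which is exactly the kind of point the downstream flying-pixel filters try to detect and discard.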

For the sake of inferring a highly realistic model from an image, neural radiance fields or Gaussian splats may best generate the representation that you might be envisioning, where there would be a volumetric representation of material properties like hair. This comes with higher compute costs, however, and doesn't factor in semantic interpretation of a scene. The top-performing results in photogrammetry have tended to use a combination of less expensive techniques like this one to better handle sparsity of scene coverage, and then refine the result using more expensive techniques [1].

1: https://arxiv.org/pdf/2404.08252
