In addition, for the purposes of characterising the system using NTP, ideally one should also either avoid any ensembling / combining of sources, because that's just pulling in multiple sources of noise, or prove that doing so does not affect the final results, or, if it does, show by how much.

There's so much more that can be picked apart here, because it's an absolute rabbit hole of a topic. For example, saturate the links a little, or a little more, especially with bursty traffic in both directions (or do an 80-20 cycle), and watch those measurements go out the window; only with PTP-capable switches at every hop will you survive this. The telecom industry has done this ad nauseam, for years, with appropriate standardised measurements, test masks and requirements.

And this whole business is also not fundamentally PTP vs. NTP, because the principles are exactly the same. The difference is that PTP was designed with hardware timestamping in mind, and it would serve no more useful purpose than NTP had NTP gained support for one-step operation, hardware timestamping and network assistance. But the default PTP profile uses known multicast groups, and thus known destination MACs, which made it the easiest entry into hardware packet matching: early "PTP-enabled" NICs only timestamped PTP packets (and most only multicast), and only more modern ones can timestamp all packets, NTP included.

And as far as the RasPi goes: for time sync, at least in terms of COTS equipment, Intel is king, but that's because they had smart people working hard for years to purposefully integrate time-aware functionality into the architecture (Hey Kevin and team!): invariant TSC, ART, culminating with PCIe PTM. But that's where you're aiming for the tens to single-digit ns region.

You can easily deliver sub-10 ns sync to a NIC, but a huge source of uncertainty is time transfer from your hardware-timestamping NIC to the OS clock. PTM is the only way to do this in hardware. Otherwise, with Solarflare being the only non-PTM exception I've worked with, comparing NIC to OS time is literally reading the time register on the NIC and the kernel time in quick succession in batches (granted, with local interrupts disabled), and then picking the pair of reads that seems to have taken the least amount of time. Unknowns on top of unknowns.
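
To make the "read in batches, keep the fastest pair" approach concrete, here is a minimal Python sketch of the selection logic only. It is an illustration, not how any particular driver does it: `take_batch`, `best_offset`, `read_nic_ns` and `read_sys_ns` are hypothetical names, the NIC read is a caller-supplied function, and assuming the NIC register was sampled at the midpoint of the read window is exactly the kind of guess that makes this method uncertain. Real implementations do this in the kernel with interrupts disabled (on Linux, for example, behind the `PTP_SYS_OFFSET` ioctl), not from user-space Python.

```python
import time
from typing import Callable, List, Tuple

# One sample: (system time before, NIC time, system time after), all in ns.
Sample = Tuple[int, int, int]


def take_batch(read_nic_ns: Callable[[], int],
               read_sys_ns: Callable[[], int] = time.time_ns,
               n: int = 10) -> List[Sample]:
    """Bracket each NIC clock read with two system clock reads, n times."""
    batch = []
    for _ in range(n):
        before = read_sys_ns()
        nic = read_nic_ns()      # hypothetical: stands in for the register read
        after = read_sys_ns()
        batch.append((before, nic, after))
    return batch


def best_offset(batch: List[Sample]) -> Tuple[int, int]:
    """Return (offset_ns, window_ns) from the sample with the shortest window.

    Assumption: the NIC time was captured halfway between the two system
    reads. The true capture point is unknown, so the error is bounded only
    by half the window.
    """
    before, nic, after = min(batch, key=lambda s: s[2] - s[0])
    window = after - before
    midpoint = before + window // 2
    return nic - midpoint, window


# Example with made-up numbers: the second sample has the tightest window.
samples = [(1000, 5600, 1200), (2000, 6520, 2040), (3000, 7600, 3300)]
offset_ns, window_ns = best_offset(samples)
```

The window of the winning sample is the best you can say about the uncertainty: even the fastest pair only tells you the NIC read happened somewhere inside it, which is why a hardware mechanism like PTM matters once you are chasing single-digit nanoseconds.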