In addition, for the purposes of characterising the system using NTP, one should ideally either avoid any ensembling / combining of sources, because that just pulls in multiple sources of noise, or prove that doing so does not affect the final results, or, if it does, show by how much.
There's so much more that can be picked apart here because it's an absolute rabbit hole of a topic. For example, saturate the links a little or a little more, especially with bursty traffic in both directions (or do an 80-20 cycle), and watch those measurements go out the window; you will only survive this with PTP-capable switches at every hop. The telecom industry has done this ad nauseam for years, with appropriate standardised measurements, test masks and requirements.
And this whole business is also not fundamentally PTP vs. NTP, because the principles are exactly the same. The difference is that PTP was designed with hardware timestamping in mind, and it would serve no purpose more useful than NTP had NTP gained support for one-step operation, hardware timestamping and network assistance. But the default PTP profile uses known multicast groups and thus known destination MACs, which made it the easiest entry into hardware packet matching: early "PTP-enabled" NICs only timestamped PTP packets (and most only multicast), and only more modern ones can timestamp all packets, NTP included.
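To make "the principles are exactly the same" a bit more concrete: both protocols boil down to a two-way exchange that yields four timestamps, and the offset estimate assumes the path delay is the same in both directions. Below is a minimal sketch using NTP-style naming (t1..t4), which is my choice for illustration; PTP's delay request-response mechanism does the same arithmetic with the message roles arranged differently. The function name and the example numbers are made up.

```python
def two_way_offset(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from a two-way exchange.

    t1: request sent    (client clock)
    t2: request received (server clock)
    t3: reply sent      (server clock)
    t4: reply received  (client clock)

    Returns (offset, delay), where offset is server time minus client time.
    The offset is only exact if the forward and return path delays are equal.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Illustrative numbers: the client is 5 units behind the server.
# Symmetric path, 2 units each way, 1 unit of server processing:
symmetric = two_way_offset(100, 107, 108, 105)

# Same clocks, but 3 units out and 1 unit back (e.g. queueing in one direction):
asymmetric = two_way_offset(100, 108, 109, 105)
```

In the symmetric case the estimate recovers the true offset; in the asymmetric case it is wrong by half the difference between the two path delays, and nothing in the four timestamps reveals that. This is why load-dependent queueing hurts so much, and why timestamping closer to the wire and network assistance help either protocol.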
And as far as the RasPi goes - for time sync, at least in terms of COTS equipment, Intel is king, but that's because they had smart people working hard for years to purposefully integrate time-aware functionality into the architecture (Hey Kevin and team!) - invariant TSC, ART, culminating in PCIe PTM. But this is where you are aiming for the tens to single-digit ns region.
You can easily deliver sub-10 ns sync to a NIC, but a huge source of uncertainty is time transfer from your hardware-timestamping NIC to the OS clock. PTM is the only way to do this in hardware. Otherwise, with Solarflare being the only non-PTM exception I've worked with, comparing NIC to OS time is literally reading the time register on the NIC and the kernel time in quick succession in batches (granted, with local interrupts disabled), and then picking the pair of reads that seems to have taken the least amount of time. Unknowns on top of unknowns.
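For anyone who hasn't seen the "batch of reads, keep the quickest" approach, here is a rough sketch of the idea. It is a simulation, not driver code: `FakeHardware`, `best_offset_sample` and the latency numbers are all hypothetical, and real implementations do this in the kernel against an actual NIC register. Each sample brackets one NIC read between two system clock reads and assumes the NIC was sampled at the midpoint of that window.

```python
from typing import Callable, NamedTuple, Optional


class Sample(NamedTuple):
    duration_ns: int   # time the sys/NIC/sys read sequence took
    offset_ns: float   # estimated NIC time minus system time


def best_offset_sample(read_sys_ns: Callable[[], int],
                       read_nic_ns: Callable[[], int],
                       batch: int = 10) -> Sample:
    """Take `batch` bracketed reads and keep the one with the shortest window."""
    best: Optional[Sample] = None
    for _ in range(batch):
        before = read_sys_ns()
        nic = read_nic_ns()
        after = read_sys_ns()
        duration = after - before
        # Assumption: the NIC register was sampled halfway through the window.
        midpoint = before + duration / 2
        sample = Sample(duration, nic - midpoint)
        if best is None or sample.duration_ns < best.duration_ns:
            best = sample
    return best


class FakeHardware:
    """Hypothetical stand-in for a system clock plus a NIC clock behind a bus."""

    def __init__(self, nic_offset_ns, latencies_ns):
        self.now = 0                         # "true" system time in ns
        self.nic_offset_ns = nic_offset_ns   # true NIC-minus-system offset
        self.latencies = iter(latencies_ns)  # (request, response) bus delays

    def read_sys_ns(self):
        return self.now

    def read_nic_ns(self):
        to_nic, from_nic = next(self.latencies)
        self.now += to_nic                   # request travels to the NIC
        value = self.now + self.nic_offset_ns
        self.now += from_nic                 # response travels back
        return value


# The true offset is 1000 ns; only the second read has symmetric bus delays.
hw = FakeHardware(1000, [(400, 900), (300, 300), (1200, 200)])
result = best_offset_sample(hw.read_sys_ns, hw.read_nic_ns, batch=3)
```

With these made-up latencies the shortest window also happens to be the symmetric one, so the estimate lands on the true offset, while the two slower samples are off by hundreds of ns. On real hardware the shortest read is merely the least bad guess, since the midpoint assumption can still be wrong, and that is the "unknowns on top of unknowns" part.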
There's so much more that can be picked apart here because it's an absolute rabbit hole of a topic
That pretty much sums it up, and I agree with everything you stated. There are countless variables that one could spend a lifetime trying to understand, tune and compensate for, and all of that changes with each combination of hardware, and refreshing hardware is inevitable. It can be a never-ending game. I just tune for good enough for my needs, that being slightly better than defaults.