Another attractive method for parallel computation has been suggested by G. Estrin [Proc. Western Joint Computing Conf. 17 (1960), 33–44]; for n = 7, Estrin’s method is:

    a1 = u7*x + u6    b1 = u5*x + u4    c1 = u3*x + u2    d1 = u1*x + u0    x2 = x*x
    a2 = a1*x2 + b1   c2 = c1*x2 + d1   x4 = x2*x2
    a3 = a2*x4 + c2
Here a3 = u(x). However, an interesting analysis by W. S. Dorn [IBM J. Res. and Devel. 6 (1962) 239–245] shows that these methods might not actually be an improvement over the second-order rule, if each arithmetic unit must access a memory that communicates with only one processor at a time.
* * *
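For the curious, the pairing-and-squaring idea generalizes to any n. A minimal sketch in Python (my own code and naming, not from the text), checked against Horner's rule:

```python
# Estrin's scheme: combine coefficients into first-degree pairs, then
# combine pairs using successive squares of x (x, x^2, x^4, ...).
# All combines at a given level are mutually independent, which is
# what makes the method attractive for parallel hardware.

def estrin(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) by Estrin's scheme."""
    level = list(coeffs)
    power = x
    while len(level) > 1:
        if len(level) % 2:          # pad so pairs line up
            level.append(0)
        # each pair (low, high) -> low + high * power; all independent
        level = [level[i] + level[i + 1] * power
                 for i in range(0, len(level), 2)]
        power *= power              # x, x^2, x^4, ...
    return level[0]

def horner(coeffs, x):
    acc = 0
    for a in reversed(coeffs):
        acc = acc * x + a
    return acc

# n = 7 (eight coefficients, a0..a7):
coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
assert estrin(coeffs, 2) == horner(coeffs, 2)
```

For n = 7 this performs the three rounds of the quoted scheme: four pair-combines, then two, then one, with the squarings of x runnable alongside.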
Personally, this seems like a lot of implementation trouble and communication overhead, and I suspect it will lose out badly to a GPU cranking out evaluations at separate points.
For my own use I went with neither Estrin's nor Horner's. I profiled a bunch of equivalent parse trees and then zeroed in on the one that performed well on the target machine. You may not need communication, though: as long as the expression decomposes into mutually independent subtrees, the compiler can exploit that to generate better code.
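One simple independent-subtree split (my own illustration, not the poster's actual trees): evaluate the even- and odd-indexed coefficients as two half-length polynomials in x^2, each by Horner's rule, and combine at the end, since u(x) = E(x^2) + x * O(x^2):

```python
# Even/odd split: the two Horner chains share no data, so an
# out-of-order core (or a compiler with enough freedom) can
# overlap them without any explicit communication.

def horner(coeffs, x):
    acc = 0
    for a in reversed(coeffs):
        acc = acc * x + a
    return acc

def split_eval(coeffs, x):
    even = coeffs[0::2]             # a0, a2, a4, ...
    odd = coeffs[1::2]              # a1, a3, a5, ...
    x2 = x * x
    return horner(even, x2) + x * horner(odd, x2)

coeffs = [3, 1, 4, 1, 5, 9, 2, 6]   # a0..a7
assert split_eval(coeffs, 2) == horner(coeffs, 2)
```

Each chain has half the serial dependency length of a single Horner evaluation, at the cost of one extra multiply and add to recombine.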
(The passage quoted above is from Knuth, page 488.)