I don't have a link, but something similar was done with custom kernels, with Redis compiled to run natively on top of Xen. The performance gain was only ~13%, so in this case it wasn't worth the trouble.
But if you have a large HPC cluster, getting 13% more out of each compute node is definitely worth the trouble.
EDIT: see the link in the child post.