As the bug report mentions, the sleep(1) call appears to be wrapped in a #define that (I assume) won't be active in normal OpenSSL builds. At least in v0.9.8o (the only version I checked), the only references I see are:
ssl/s2_pkt.c:

    #ifdef PKT_DEBUG
    if (s->debug & 0x01) sleep(1);
    #endif
There are two references like that to PKT_DEBUG (read and write); the only other is:
ssl/ssl_locl.h:

    /*#define PKT_DEBUG 1 */
I suspect this is a non-issue. Interesting though.
Sleep(1) is very noisy on Windows machines. The default time slice is ~15-20 ms [1], and calling Sleep() with anything less than that just relinquishes the rest of the current slice. Windows has a multimedia timer API that can be used to get intervals down to 1 ms, but it requires system calls that increase your system load. So you usually just need to use proper synchronization anyway, which is what the Python guys should have done :)
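A minimal sketch of that multimedia timer API (timeBeginPeriod/timeEndPeriod from winmm), just as an illustration and not anything from the article:

    /* Sketch: raise the Windows timer resolution so Sleep(1) wakes up near
       1 ms instead of ~15 ms. Link against winmm.lib. Note this raises the
       tick rate system-wide, which is the extra load mentioned above. */
    #include <windows.h>
    #include <mmsystem.h>
    #include <stdio.h>

    int main(void)
    {
        if (timeBeginPeriod(1) != TIMERR_NOERROR) {  /* request 1 ms granularity */
            fprintf(stderr, "could not raise timer resolution\n");
            return 1;
        }

        Sleep(1);            /* now closer to 1 ms than to a full 15 ms slice */

        timeEndPeriod(1);    /* always pair with timeEndPeriod to restore default */
        return 0;
    }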
This crummy Sleep() implementation has some nice effects on programmers. Those who like to solve problems with lots of copy/paste code are forced to think about using proper synchronization primitives when running high resolution loops that wait for events, or their code just won't run very fast.
When I did driver programming on Windows, it was well known that Sleep had a resolution of 10 ms; it is based on the interrupt timer (not the high-frequency timer). You could change the interrupt timer's period, but its ticks are what drive Sleep. Not counting the effect of context switching, since you are waiting on timer ticks your actual sleep time varies from 10 ms to just under 20 ms. 15 ms is a nice way to say "on average", but I would not rely on that figure.
Timers are hard to get right. Tread warily, programmers! This is one of those areas where it is good to understand some things about the computer hardware behind the software.
EDIT: I should add that the high-frequency timer is not a panacea either. It will work for you most of the time, but there are two circumstances that will occasionally trip you up:
(1) At least in Windows XP and 2000, there is a KB article (I do not remember its number now) explaining that, for a certain small set of manufacturers, a sudden spike in load on the PCI bus causes the high-frequency timer to be jumped ahead to account for the time lost during that laggy spike. This correction is not accurate. It means that if your initial timestamp is X and you are waiting for it to reach X+Y, wall-clock time may still be between X and X+Y, but Windows itself has altered the timestamp to X+Y+Z, and your software thinks the time has elapsed. I personally experienced this bug.
(2) You actually have more than one high-frequency timer -- one per CPU in your system. Once you start running on a system with multiple CPUs, how do you guarantee that the API is giving you a timestamp from the same timer each time? I remember there may have been a way to choose if you dropped to assembly to make the query, but the API at the time did not offer a choice. The timer starts counting at power-up, so if one CPU powers up after the other you get a timestamp skew. Some high-frequency timing algorithms attempt to correct for this skew; I do not remember all the details now.
"The issue has two components: rate of tick and whether all cores (processors) have identical values in their time-keeping registers. There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized. In such cases, programmers can only get reliable results by locking their code to a single CPU."
The entry also mentions that hibernation can affect the counters. I wonder whether power-saving features that speed up or slow down the CPU could also have an effect.
Noisy means there is a lot of variance in the actual time the process spends sleeping. When you say sleep(1), most OSes interpret that as "sleep for as short as you can." Depending on the scheduler internals, that can vary a lot.
Which OS interprets sleep(1) (i.e., "sleep for 1 second") as "sleep for as short as you can"?
On WinAPI, Sleep is denominated in milliseconds.
On BSD, sleep(3) is a library wrapper around nanosleep(2).
Linux's man pages make no mention of the magic number "1" as a "sleep 1 timeslice" shortcut. Also, older Linux man pages warn that sleep(3) can be implemented in terms of alarm(2), which is used all over POSIX as an I/O timeout and would blow up the world if it alarmed in milliseconds.
If you want to sleep "as short as you can", sleep for 0 seconds, or call any other system call to yield your process back to the scheduler.
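For the record, a minimal POSIX sketch of sleeping for a genuinely short interval; nanosleep(2) is the tool for that, since sleep(3) is seconds-denominated:

    /* Sketch: a 1 ms sleep on POSIX via nanosleep(2). sleep(3) takes whole
       seconds, so it cannot express sub-second waits at all. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec req = { .tv_sec = 0, .tv_nsec = 1000000L };  /* 1 ms */

        /* nanosleep can return early on a signal; the unslept remainder would
           be written to a second timespec if one were passed. */
        if (nanosleep(&req, NULL) != 0)
            perror("nanosleep");

        return 0;
    }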
Thanks for the correction. I was just talking about the de facto behavior I have seen on Linux and BSD for very short sleep intervals (way shorter than 1 second), not necessarily about the behavior as specified by the system call. I should have been clearer.
Ok, but (a) this article is talking about literally POSIX sleep(3), and (b) there is a ton of confusion on this thread about whether sleep is ms-denominated or seconds-denominated.
Wait, what? If I call sleep(1), you're saying it's going to sleep for THIRTY EIGHT SECONDS?
People, it's right there in the man pages.
Are you maybe thinking about WinAPI's Sleep? That's ms-denominated. It would make sense that attempting to sleep for 1 millisecond wouldn't work, and would build in the time for the scheduler and the timeslices for every other process. We're talking about OpenSSL and POSIX sleep(3).
People seem to have a skewed perception of how to defeat timing attacks, generally. At the end of the day, it's more about making things constant time than trying to make the timing difficult to detect. Simple example:
You have two hashes and want to see if they're equal. The naive approach is to iterate over both hashes byte by byte, comparing them, and break as soon as you find a byte that doesn't match. That approach, however, could be vulnerable to a timing attack, because an attacker can potentially measure how many iterations it ran. An implementation that's resistant to timing attacks instead XORs each pair of corresponding bytes and ORs the results into an accumulator; if that accumulator is zero at the end of the loop, the hashes are equal. That approach runs in constant time rather than depending on the data you're dealing with.
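A minimal sketch of that constant-time comparison, purely as an illustration (this is not OpenSSL's or Python's actual code):

    /* Compare two equal-length buffers without an early exit: XOR each pair
       of bytes and OR the result into an accumulator. Runtime depends only
       on n, not on where the first mismatch occurs. Returns 0 iff equal. */
    #include <stddef.h>

    static int ct_compare(const unsigned char *a, const unsigned char *b, size_t n)
    {
        unsigned char acc = 0;
        for (size_t i = 0; i < n; i++)
            acc |= a[i] ^ b[i];
        return acc;
    }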
Besides, I see no evidence that the sleep was intended to thwart timing attacks. It looks like it's all about easing the debug process somehow.
Incidentally, the primary problem here is not the mere presence of a debug flag that governs a sleep, it's the fact that PySSL_SSLdo_handshake sets that debug flag. Right?
In other words, it's not a bug in OpenSSL itself, but rather the Python wrapper for OpenSSL. That's how I understand it.
A second-long sleep on every read or write? If this were actually happening, it sounds like it would create unheard-of performance problems for any significant transfer.
Or was this not noticed because all the major frameworks like cherrypy and twisted are still using the pyopenssl wrapper?
Is there any evidence that this bugfix actually changes the performance?
it is actually more reliable to sleep than to block. by definition blocking is unreliable because you don't know exactly when it will unblock. You do know when a sleep will end though. I also want a variable delay between writes.
Uhhh. I think you need to study this topic some more.
> it is actually more reliable to sleep than to block. by definition blocking is unreliable because you don't know exactly when it will unblock.
A block ends when the NIC can handle more data. You can't just wait a second and assume the NIC can handle the data; that's where the "unreliable" part comes in. You assume it can handle the data, but you are not checking. And the way to check is by polling, blocking, or receiving a signal (see the sketch after this comment). Waiting is not a way to check.
> You do know when a sleep will end though.
It makes no difference that you know when the sleep will end. It's irrelevant; all you care about is whether the NIC can accept more data or not.
> I also want a variable delay between writes.
If you want variable delays then add them, but that has nothing whatsoever to do with making sure the NIC doesn't lose data.
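To make that concrete, here's a sketch of my own (not from the thread) of checking rather than waiting: block in poll() until the kernel says the socket is writable, then write.

    /* Wait until the socket can accept more data, then write. Assumes 'fd' is
       a connected socket; a sleep-based delay gives no such guarantee. */
    #include <poll.h>
    #include <unistd.h>

    ssize_t write_when_ready(int fd, const void *buf, size_t len)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };

        /* Blocks until the kernel reports the socket writable (or an error). */
        if (poll(&pfd, 1, -1) < 0)
            return -1;

        return write(fd, buf, len);
    }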
http://thedailywtf.com/Articles/The-Speedup-Loop.aspx