Anecdotally, my father-in-law was chief of staff for a small city hospital and they had a rule about not scheduling afternoon surgeries on important days like anniversaries, birthdays, etc. Basically, if the surgery started going long, you didn't want the surgeon worrying about missing their spouse's birthday, a birthday with their kids, or an anniversary.
Not that many surgeries were actually delayed, as the surgeon just juggled surgeries and consults/paperwork/insurance to fit.
If this is fairly standard practice, then an afternoon birthday surgery would be an emergency situation and, hence, more deadly. Given the paper said some surgeons take the day off entirely, any surgeon with that habit would be performing an emergency surgery.
The problem is amusingly circular. Even if you reject the conjecture in the parent comment, you will be tempted to reduce the number of birthday surgeries due to the increased mortality. This will mean that birthday surgeries are only done in even more desperate circumstances, which of course will increase the risk.
So mitigation of this problem will lead to the percentage increasing even more! Actually, it turns out that it is possibly better if the percentage is high!
You could control for patient characteristics (age, severity of the condition, etc), and that was indeed done here, see the paper. It also specifically addresses this issue:
> The major threat to the internal validity of our findings is that surgeons may selectively operate on sicker and more complex patients on their birthday, perhaps because those patients cannot have their procedures delayed. However, this is unlikely to explain our findings because we found that patients who underwent surgery on the surgeon’s birthday were similar in all observable characteristics to patients who underwent surgery on other days. Furthermore, severity of illness as measured by predicted mortality, and the number of procedures performed per surgeon, also did not differ based on whether a surgery occurred on a surgeon’s birthday compared with other days.
It seems to me that the analysis is quite carefully done:
> Findings were qualitatively unaffected by: using in-hospital mortality instead of 30 day mortality; additionally adjusting for the timing of the surgery; including both hospital and surgeon fixed effects in the same regression models; excluding potentially outlier surgeons with the highest mortality; using logistic regression models instead of linear probability models; using random effects models instead of fixed effects models; restricting our analysis to surgeons who performed procedures on their birthdays; additionally adjusting for the day of the year; or excluding surgeons who were born on the outlier birthdays (supplementary eTables 5-13).
> [...]
> The study findings were qualitatively unaffected when the analysis was restricted to procedures with the highest average mortality or to patients with the highest severity of illness (supplementary eTables 16 and 17).
This is a great point to raise, but it's worth noting that it directly contradicts the GP's anecdotal observation. Had they instead found greater severity on birthdays and attempted to statistically correct for it, the two would be compatible. Instead, they looked, and found that there was no underlying difference to correct for. The question then becomes whether the GP is wrong, whether the hospital in question didn't have such a policy, or whether the measurements used in the study were insufficient to pick up the difference in severity.
I don't know which of these is true, but despite the apparent statistical significance of the finding, I wouldn't be confident assuming that the result is generally applicable. While not impossible, it strikes me as suspicious that they found no differences whatsoever in the surgeons' birthday vs non-birthday schedules. I somewhat wonder if by "no difference" they really meant "no statistically significant difference", which in this case wouldn't justify their lack of adjustment.
Furthermore, note that there really is a significant "avoiding surgeries on birthdays" effect: 2064 in 980,876 operations were done on a birthday, which is 1 in 475, rather than the 1 in 365 if there were no such effect. That's a reduction of 23%, which is rather suspicious given that we're trying to explain a 23% increase in mortality rate.
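For anyone who wants to check that arithmetic, here is a minimal sketch. It uses only the two counts quoted above and assumes a plain 365-day year (no leap-year correction).

```python
# Counts quoted in the comment above; nothing else is assumed.
total_procedures = 980_876
birthday_procedures = 2_064

# Observed: one procedure in every N falls on the surgeon's birthday.
one_in_n = total_procedures / birthday_procedures

# Share expected if a birthday were treated like any other day (365-day year).
expected_share = 1 / 365
observed_share = birthday_procedures / total_procedures

# Relative shortfall of birthday procedures versus that uniform expectation.
reduction = 1 - observed_share / expected_share

print(f"1 in {one_in_n:.0f}")
print(f"reduction: {reduction:.1%}")
```

This reproduces the "1 in 475" figure and a shortfall of roughly 23%.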
So what mechanism is responsible for that reduction, and is it likely to affect surgeries differently based on how urgent and specialized (and therefore dangerous) they are? Since the authors restricted it to surgeons that have done at least one surgery on their birthday, that rules out blanket "never on birthday" policies. It seems like the only mechanism that wouldn't affect them differently is "the surgeon is already on vacation in another country and can't get here for the operation" (and they choose to take vacations on their birthday more frequently). One could probably check vacation-day records relatively easily...
I think you have it backwards. The risk of the individual surgeries is not increased. Only the risk that a surgeon will be confronted with an above average critical surgery is increased.
It is not even clear that there actually is a problem. It's just a weird way to slice the data to produce an effect.
> The risk of the individual surgeries is not increased.
They never said that it was (unless they've edited their comment since you replied to it). They just said that the percentage [of deaths on surgeons' birthdays] will increase, and that is correct.
But the more you avoid birthday surgeries altogether, the more those that end up still happening are extreme cases with high mortality rates. So if your target is to avoid this scary statistical anomaly, you might instead want to promote more benign surgeries on birthdays.
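A toy calculation makes the circularity concrete. All numbers below are made up for illustration (share of urgent cases, the two mortality rates); none of them come from the paper. Per-case risk is held fixed, and only the mix of cases that still happen on the birthday changes.

```python
def birthday_mortality(urgent_share, routine_deferred, m_urgent, m_routine):
    """Mortality among the cases that are still operated on the birthday.

    urgent_share:     fraction of incoming cases that cannot wait (hypothetical)
    routine_deferred: fraction of deferrable cases moved to another day
    m_urgent, m_routine: fixed per-case mortality for each group (hypothetical)
    """
    urgent = urgent_share
    routine = (1 - urgent_share) * (1 - routine_deferred)
    return (urgent * m_urgent + routine * m_routine) / (urgent + routine)


# Hypothetical inputs: 20% urgent cases, 15% vs 3% mortality.
for deferred in (0.0, 0.5, 0.9):
    rate = birthday_mortality(0.2, deferred, m_urgent=0.15, m_routine=0.03)
    print(f"{deferred:.0%} of routine cases deferred -> birthday mortality {rate:.1%}")
```

Under these assumed numbers the birthday mortality climbs from 5.4% to 7.0% to about 11.6% as more routine cases are moved, even though no individual surgery became riskier.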
I don't think surgeons are fungible assets. There is a lot of planning and study taking place before a complex operation. Not to mention most surgeons have specialties.
Surgeons are generally specialists in one particular area and possibly even specialists within that area (e.g., only do knee replacements). Surgeons get better with experience; it's a skill, and different surgeries are different enough that experience doesn't transfer too much. There are also changes over time in best practices, so skills degrade not just due to lack of recent experience. If I remember correctly, the best predictor of the outcome of your surgery is how many similar surgeries the doctor does per year.
My gut instinct is that this is incorrect, too, but I don't know enough about surgery to make a compelling argument.
I'd back myself to pick up Ruby (a language I've never touched before) and be productive, more than I'd trust a surgeon who only has experience with heart surgery to operate on my brain. Maybe that's ignorant of me.
I don't think that's the scenario @jrh206 was talking about, though. Most code written in Ruby doesn't have the sort of immediate risk to life or limb surgeries do.
For non-emergency surgeries it's often a long-term relationship where the same doctor who has seen the patient a few times would be the one operating - so if that particular surgeon isn't available for whatever reason, the planned operation would be rescheduled to a different date with the same doctor, not to a different doctor on the same day.
> If this is fairly standard practice, then an afternoon birthday surgery would be an emergency situation and, hence, more deadly
This is spot on. The causality can be both ways.
Note that it would be interesting to dig into what is really computed here, because the whole wording seems intentionally sensationalistic.
1) "23% more likely to die" seems _huge_, but it applies to an already very small chance. The mortality rate just goes from 5.6% to 7%. Using this logic, moving from 0.1% mortality rate to 0.3% would mean "you are 3 times more likely to die".
2) Comparing mortality rates only makes sense if the distribution of operation complexity is identical for these days. As the parent post suggests, it seems very likely that low complexity operations are postponed until after a surgeon's birthday.
3) Where are the confidence intervals? I refuse to even consider looking at a statistic if error bounds and significance metrics are not provided.
That may very well all be provided in the underlying paper, but the article itself does not really discuss these points.
> 1) "23% more likely to die" seems _huge_, but it applies to an already very small chance. The mortality rate just goes from 5.6% to 7%. Using this logic, moving from 0.1% mortality rate to 0.3% would mean "you are 3 times more likely to die".
But that is indeed precisely what it means. The 737 MAX might have increased the accident rate from 1 in a million to 3 in a million, and that would have been a tripling. That is not sensationalistic.
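Both readings are arithmetically right; they are just different quantities. A small sketch using the figures from the comments above (the helper name `compare_rates` is mine, purely illustrative):

```python
def compare_rates(baseline, new):
    """Return (relative change, absolute change in percentage points).

    Both rates are given as fractions, e.g. 0.056 for 5.6%.
    """
    relative = new / baseline - 1
    absolute_pp = (new - baseline) * 100
    return relative, absolute_pp


# Figures taken from the comments above (7% is the commenter's rounded value).
examples = {
    "mortality 5.6% -> 7%": (0.056, 0.07),
    "mortality 0.1% -> 0.3%": (0.001, 0.003),
    "accidents 1 -> 3 per million": (1e-6, 3e-6),
}
for label, (baseline, new) in examples.items():
    relative, absolute_pp = compare_rates(baseline, new)
    print(f"{label}: {new / baseline:.2f}x, +{relative:.0%} relative, "
          f"+{absolute_pp:.4f} pp absolute")
```

Note that with the rounded 7% the relative increase comes out at 25% rather than the headline 23%. The tripling cases are a 3x relative risk in both examples, while the absolute change is 0.2 percentage points in one and 0.0002 in the other, which is why it helps to report both.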
> 3) Where are the confidence intervals?
In the paper: "(7.2% v 5.6%; adjusted difference 1.6%, 95% confidence interval 0.4% to 2.8%; P=0.01)"
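As a sanity check, the quoted interval and P value hang together. The sketch below assumes the interval is a symmetric normal-approximation (Wald) interval, which is my assumption and not something the quote states; the function name is illustrative.

```python
import math


def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back out the standard error, z score and two-sided p value
    from a 95% interval, assuming a symmetric normal approximation."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Adjusted difference and interval as quoted above, in percentage points.
se, z, p = p_from_ci(1.6, 0.4, 2.8)
print(f"SE ~ {se:.2f} pp, z ~ {z:.2f}, two-sided p ~ {p:.3f}")
```

Under that assumption the standard error is about 0.6 percentage points and the p value lands close to the reported 0.01.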
It's about the success of surgery on that surgeon's birthday, and calculating the implications of switching the surgery types etc etc, while in a real world situation you'd just use the other surgeon - who hopefully has a different birthday. I agree there's gonna be some smaller effect even then, but less and less the more surgeons you have and the more randomly distributed their birthdays are.
> If this is fairly standard practice, then an afternoon birthday surgery would be an emergency situation and, hence, more deadly.
As pointed out by jlebar, this is controlled for by comparing similar emergency surgeries.
"The patients were all Medicare beneficiaries aged 65 to 99. They had all undergone one of 17 common emergency surgical procedures between 2011 and 2014."
There are emergencies, and there are EMERGENCIES. The former can often wait 24 hours; the latter cannot wait at all. Medicare data does not capture that at all - it only captures the categorical, "surgery type."
e.g., someone has a run of the mill cholecystitis that needs to come out. It can go when there's an opening in the surgical schedule, or tomorrow morning. That's an "emergency" - it came in through the ED, wasn't elective.
Then there's the person w/ chole that looks septic and you're afraid they're going to perf or already have. That person is going to the OR now.
Under Medicare coding, both of those are lap choles, CPT 47562. This doesn't control for that at all, except in the broadest of ways.
Also, a 65yo surgical candidate and a 99yo surgical candidate are wildly different. 99yo isn't going under the knife for anything other than immediate threat of death or unendurable pain. In the lap chole example above, I'm going with a trial of abx in the 99yo unless he's absolutely about to perf; 65yo, sure, let's take the gallbladder out - once he's progressed to sx chole, odds are really good it'll have to come out within the next two years. I think most surgeons would rather do it at 65 than 67.
Looking at the 2x2 of 65, 99, emergency, and EMERGENCY, you capture an incredibly large variety of severity and risk.
This is the most frustrating pattern in online discussion of science. A post title presents a conclusion, then the top comment proposes some alternative cause of the conclusion or some hypothesized methodological weakness in the research, then readers assume that the research is bad. Often, the exact specific criticism appears directly in the paper because the researchers thought of that too.
> Often, the exact specific criticism appears directly in the paper because the researchers thought of that too.
From the paper:
"The major threat to the internal validity of our findings is that surgeons may selectively operate on sicker and more complex patients on their birthday, perhaps because those patients cannot have their procedures delayed. However, this is unlikely to explain our findings because we found that patients who underwent surgery on the surgeon’s birthday were similar in all observable characteristics to patients who underwent surgery on other days. Furthermore, severity of illness as measured by predicted mortality, and the number of procedures performed per surgeon, also did not differ based on whether a surgery occurred on a surgeon’s birthday compared with other days."
No reason to be frustrated. Things are almost never right or wrong, particularly in complex science.
In this case, based on another comment above about "emergency procedure" having multitude of meanings, you're most likely wrong in that the paper has a rebuttal to the top post. The hypothesis then, is that the actual urgency of surgeries is not controlled precisely enough to state that they cannot affect the measurement.
> The major threat to the internal validity of our findings is that surgeons may selectively operate on sicker and more complex patients on their birthday, perhaps because those patients cannot have their procedures delayed. However, this is unlikely to explain our findings because we found that patients who underwent surgery on the surgeon’s birthday were similar in all observable characteristics to patients who underwent surgery on other days. Furthermore, severity of illness as measured by predicted mortality, and the number of procedures performed per surgeon, also did not differ based on whether a surgery occurred on a surgeon’s birthday compared with other days.
> we found that patients who underwent surgery on the surgeon’s birthday were __similar in all observable characteristics__ to patients who underwent surgery on other days
My father is a surgeon at a small hospital and my mother just got her hip replaced a few weeks ago (at a different, larger city hospital) and the first thing he insisted on was she was scheduled as the first patient of the day.
My father is a physician. He always recommends that if you are getting a test done, try to schedule it for the middle of the week and not near a holiday. It seems the lab techs make more mistakes on Fridays, weekends, etc. Just his anecdotal experience.
Getting surgeons to adopt the kind of "It's obvious but point and speak or you're fired"-style checklists a la operating an aircraft has reduced complications (from the minor to deaths) by several percent in the NHS. It's perhaps worrying given how low-hanging some of these fruit are - i.e. "Do we have the right patient?".
Back when I was still practicing as an anesthesiologist (1977-2015) I had a pack of 3x5 cards I carried in my scrubs pocket on each of which was an exhaustively detailed list I'd made of EVERYTHING I needed at hand to perform specialized procedures such as inserting an arterial line (putting a #20g plastic catheter through the skin on the inside of the wrist into the radial artery for direct beat-to-beat monitoring of arterial blood pressure, a measurement employed for seriously ill or unstable patients). I would assemble a tray with the following items in the OR before going to ICUs or the ER because invariably one or more of the items I would need would not be present and would take time to procure.
For example:
ARTERIAL LINE
• several sterile alcohol skin wipes
• 3cc syringe with 25g needle [for skin infiltration of local anesthetic at puncture site]
• bottle of 2% lidocaine with epinephrine 1/1000
• 2x2 cotton gauze pads to use for pressure on failed puncture sites
• 3 #20 gauge plastic catheters (22 gauge for small children)
• 2 surgical towels to drape over hand and lower arm to absorb blood that accompanied successful arterial puncture
• size 7.5 sterile surgical gloves for me to wear while performing procedure
• specialized 1" waterproof plastic skin tape to secure and protect catheter in situ
I was constantly amazed by how my colleagues would have to stop and wait for something not present in the unit they were called to.
Conversely, when I was called because of an inability to insert an A-line, as they were referred to, and wasn't in a place where I could assemble my desired materials, I'd proceed with the materials at hand, all the while thinking "this could have been done a lot better...."
Note added after HN editing hour closed: The reason I created these Procedure Cards is that when I was really tired — for example at 1 a.m. after I'd been on call and working 18 hours since 7 a.m. the previous day, with six more hours to go till I was relieved at 7 a.m. — I invariably forgot something I needed and had to wait or make do because of this fatigue-induced lapse. With my list, I was able to do what I knew how to do without having to try and remember a zillion specific things.
> It's perhaps worrying given how low-hanging some of these fruit are - i.e. "Do we have the right patient?"
It turns out that while it's a good idea to check things this "low-hanging", the value is far larger than catching wrong patients, so don't worry too much!
Much of the value comes from essentially disrupting routine with an opportunity to stop, and from creating a culture of speaking up. I think the NHS was the organisation to trial having the nurse run the checklist, which had the effect of empowering the "lowest level" person in the operating theatre. Studies showed that even just having everyone in the room speak once increased the chance of subsequent communication, and ultimately improved patient outcomes.
Atul Gawande was one of the key people in designing and rolling out these checklists and wrote a book about it that I'd recommend – The Checklist Manifesto.
That's excellent! It never occurred to me it could have such a positive effect in a team situation. Empowering the supporting members is very insightful: they may be supporting, but they are just as critical, and they need to feel appreciation for the value they provide to stay sharp and motivated. I hope this can be applied to other contexts; thanks for sharing this.
If I remember the anecdotes correctly, the focus on nurses wasn't as cheery as this. Basically, the nurses often knew when the surgeon was in the process of making a mistake, up to cutting the wrong leg off a patient, but kept quiet because surgeons had the power to ruin their careers if they "embarrassed" them. So the innovation was really about forcing surgeons to not terrify their nurses into silence.
Ouch. This reminds me of the problems nurses faced in the 80s as described in "Pragmatic Thinking and Learning". That book made it sound as though the issues were mostly historic; I guess there have been a number of different battles fought.
+1 on Atul Gawande's book The Checklist Manifesto. It's an interesting read if you're into aviation or healthcare or anticipate being in hospital one day.
I'm probably going to sound silly now, but I do this when doing any technical work I deem important or critical enough: point and say out loud what it's for, or what state it should be in, or in the imperative what to do to it... It does work. It somehow catches silly mistakes compared to keeping everything in your mind, and also gives you confidence because it works like a checklist.
What you're describing is very similar to Rubber duck debugging. The idea here is that forcing you to explain what you're trying to do makes it easier for you to catch your mistakes.
It's similar to why formal verification is so important in hardware, because it's effectively forcing you to be specific about semantics and lets the computer walk through things for you.
Reminds me of a scene in House, where the cynical veteran doctor about to receive surgery on his right leg uses a sharpie to write "NOT THIS LEG" on his left leg.
Suboptimal. The "NOT" might be covered by something, and then what remains is "THIS LEG". I would have written "NO NO NO" on one leg, and "THIS THIS THIS" on the other.
In aviation, standard phraseology is generally carefully designed such that (most) subsets of a phrase are distinct from the opposite phrase. For example, when ATC warns of traffic, you reply either "traffic in sight" or "negative contact". When ATC hears only half of either, they still know what you meant.
I know people who have been instructed to do this by medical staff prior to surgery. I have personally helped someone mark a mole they couldn't reach on their back.
Can confirm - had to point to the area that was supposed to be operated on while lying in the pre-op (or whatever it's called) room, just before being knocked out... Was quite scary ("you should know what you're doing!"), but quite sensible on second thought.
> It's perhaps worrying given how low-hanging some of these fruit are...
You don't realise how "bad" normal operational discipline is until you've seen it done right. The risk isn't so much that people skip a step, but that phantom steps start creeping in because people aren't quite sure of the standard. That saps a surprising amount of resources that could be used for checking mistakes. And then people get disorganised and potential holes appear.
A big part of excellence isn't doing the right steps, it is trimming out the steps that don't need to be there, to focus attention on the stuff that works.
Might have surprised me before 2020, but the amount of people throwing hissy fits over masks this year has prepared me for the ridiculous stubbornness of humans asked to change.
If you want to read more on the background of this, The Checklist Manifesto is a great book and has lots of applications when it comes to IT operations. Stupid checklists in markdown that you can check off as you go through have certainly improved our performance in ad hoc tasks that we don't automate.
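For the IT-operations case, even a few lines of script can keep such a markdown checklist honest. This is only a sketch: the runbook text and the helper `unchecked_items` are hypothetical, and it assumes the common `- [ ]` / `- [x]` task-list syntax.

```python
import re

# Matches task-list lines such as "- [ ] step" or "* [x] step".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$")


def unchecked_items(markdown_text):
    """Return the text of every checklist item that is not ticked yet."""
    items = []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


# Hypothetical runbook, purely for illustration.
runbook = """\
# Rotate TLS certificate
- [x] Confirm which host we are on
- [ ] Back up the current certificate
- [x] Install the new certificate
- [ ] Reload the web server and check the expiry date
"""

remaining = unchecked_items(runbook)
for item in remaining:
    print("still open:", item)
```

Refusing to close a ticket while `remaining` is non-empty is the ad hoc equivalent of the "did everyone speak up?" step.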
Sad anecdote: a friend of mine had surgery to remove a problematic mole on his back, and not only did the surgeon remove the wrong one, but they even got upset with him when my friend mentioned something like "I thought it was further up my back".
One of the studies showed that teams that completed the checklist had lower mortality, but argued that the teams completing checklists might be systematically different from the teams that didn't complete the checklists, and thus the difference could not be attributed to the checklist, per se. Not wrong, but doesn't amount to refutation.
Note that in the debate you cited, both proponent and opponent advocated the (continued) use of checklists.
Side note: I could see how you could do a blinded RCT, but not how you could do a double blind RCT here.
I think checklists are probably just fine, but I think they are overhyped despite a lack of scientific comparison to other interventions (I would much rather have sharpie on my legs and barcodes on the surgical sponges, given the choice) - and in particular, checklists are not directly relevant to the content of this article.
I just got surgery in California and noticed I was asked to state my name and birthday anytime I moved rooms or saw a new person. Seems like this is now part of protocol in a lot of places.
That is standard procedure in the US for every trained medical professional, all the way down to the MA who measures your height and weight at a routine physical.
I got in a medium-severity bicycle accident in 2010 and while I was being whisked around the hospital for different tests and procedures I had to do that too.
Procedures in hospitals are interesting. I recently visited a hospital and there was a red sign above the table where nurses dose medicines for patients: "No room numbers on trays, no bed numbers on trays". Seems like a low hanging fruit but a very non-obvious one (well at least non-obvious to me).
As an advocate of checklists for specific tasks, yes that stuff is good. Oftentimes there are a lot of checks that one would go "yeah no duh." But in most good checklists, a checkbox exists because that was an issue before.
I’m going to guess (article won’t load for me) it is because the title should be “proportion of elderly deaths is 23% higher for surgery done on surgeon's birthday”, which might be a different thing if working on your birthday is confounded with a certain type of surgery, e.g. emergency surgery and being on call vs. elective surgery.
That is not correct, the article talks about methodology:
> The patients were all Medicare beneficiaries aged 65 to 99. They had all undergone one of 17 common emergency surgical procedures between 2011 and 2014. Examples of those 17 procedures included cardiovascular surgeries, hip and femur fracture, appendectomy, and small bowel resection. The study focused on emergency surgery, so as to minimize the potential selection bias. For example, surgeons might otherwise choose patients based on their illness severity, or patients might choose their surgeon.
It isn't a good point. This was explicitly controlled for in the study. People are just making wild guesses about methodological limitations that don't exist.
>The effect size of surgeons’ birthday observed in our analysis (1.3 percentage point increase or a 23% increase in mortality), though substantial, is comparable to the impact of other events, including holidays (eg, Christmas and New Year) and weekends, which have been argued to affect the quality of patient care.
I recently had a complex facial surgery. Before doing so, I tracked down all the information I could about my surgeon.
Everyone had good comments, good healing, very impressive track record, etc.
Everyone but one patient. She complained about how her surgery was rushed, how quickly she was out of the operating theater, how she is scarred from it, etc.
At first, I thought she was lying. Why was she such an outlier?
I asked her for more details. It turns out her surgery was in the afternoon of the last Friday before the Christmas holidays.
After knowing this, I made sure to schedule mine during the middle of the week, away from holidays. My surgery went well.
I am glad to finally have statistics on this gut feeling I had. 23% is a LOT, and the 30% for the holidays is even worse. I had made a mental note to avoid those days; now it's become a rule.
Just a note: this title (like many) reports _relative_ change as the percentage. E.g. 1% -> 1.5% is reported as a "50% increase". It is IMHO very misleading. At some point I developed a gut reaction to any statistic: "Relative or absolute?"
> The study, which appears today in the British Medical Journal (BMJ), looked at 980,876 procedures performed in US hospitals by 47,489 surgeons. Of those procedures, 2,064 (0.2%) took place on a surgeon’s birthday.
So, only a fraction of the surgeons performed operations on their birthdays. Would be interesting to compare the outcomes using the same surgeon group: surgeries on birthday vs same-surgeon surgeries on other dates.
I never took the day off on my birthday, but a lot of people do. So it might also make sense to ask why these surgeons are operating on their birthdays. Did they take the day off and go back to perform surgery on a critical patient? Are they overworked? Etc.
It’s coming up on a year since I got wheeled into the operating room as my Christmas present for 2019. I’m obviously still here to tell the tale, but the thought did cross my mind: “how much of a hurry are you to get home, Dr. Takiyama?” But according to TFA, a Christmas Day surgery was about equal in risk to one on the surgeon’s birthday. Which means my worries weren’t entirely unfounded.
Is that a thing in the US, taking a day off on your birthday? To the best of my knowledge, nobody does that in Europe - certainly not in the Netherlands.
It's probably just a personal thing. I'm in Europe and always take my birthday off - I don't understand why anybody would want to work on their birthday. Even if I don't have plans to celebrate until the weekend the idea of working on my birthday is very depressing (in fact I rarely celebrate my birthday and if I do it's something small like dinner with a couple of friends). To be honest I thought it would be more prevalent in Europe given we seem to get more time off and taking time off is less 'frowned upon' (at least this is my experience compared with my US based colleagues).
> I don't understand why anybody would want to work on their birthday.
I personally was under the impression that once you left childhood and the impatience of getting presents your birthday pretty much became just a day as usual. That also seems to be how my college treats it but it might be cultural. Europe is not a homogeneous place.
As far as I know it's not a thing here, either. I've known one or two that do, but in my experience the vast majority of people will celebrate their birthday on the nearest weekend, if they do so at all.
It's debatable whether the UK is "in Europe" but all my employees take their birthdays off, as does my wife. It seems to be hugely common here. I don't do it though because I just don't see the point.
Depends on where in Europe you are and how much the people you know value getting wasted or stoned. Personally, I prefer taking off at birthday + 1 day after.
You make a good point, though I'd like to add that one would expect 1/365 = .274% of procedures to be on a random day (like a birthday). .21% is not so far off that it can't be random chance.
Interesting question, wondered that myself as well. My statistics are a little rusty, so doing a quick simulation shows that the range is about .25-.29 (code sample 1). However, things like operations aren't completely random, and you might be more correct by looking at something like a Poisson process, resulting in a range of about .14-.44 for this sample size (code sample 2).
All in the end to be taken with a grain of salt of course, as the planning component of operations eliminates most statistical properties :)
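The code samples mentioned above are not reproduced in the thread. The following is not that code, just a rough stand-in for the first one: a normal approximation to the binomial, treating every procedure as independently landing on the surgeon's birthday with probability 1/365. Both of those are simplifying assumptions of mine.

```python
import math

# Counts from the article; p assumes a 365-day year and a uniform chance per day.
n = 980_876
observed = 2_064
p = 1 / 365

expected = n * p
sd = math.sqrt(n * p * (1 - p))  # binomial standard deviation

# Approximate 95% range for the share of procedures falling on a birthday.
low = (expected - 1.96 * sd) / n
high = (expected + 1.96 * sd) / n

# How many standard deviations the observed count sits from the expectation.
z = (observed - expected) / sd

print(f"expected share: {p:.3%}, 95% range: {low:.3%} to {high:.3%}")
print(f"observed share: {observed / n:.3%}, z = {z:.1f}")
```

Under this pure-chance model the plausible range is roughly .26% to .28%, in the same ballpark as the simulated .25-.29 above, and the observed .21% sits about 12 standard deviations below expectation. Whether a looser model (such as the Poisson-process range quoted above) is the more appropriate one is a separate question that this sketch does not settle.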
They included fixed effects for hospitals and surgeons: "To test whether our findings were affected by including both hospital and surgeon fixed effects in the same regression models." [1]
Perhaps the link could be changed to the paper itself?
Ah, couldn't access the page so had assumed parent was correct reporting that they hadn't done this control. Sorry for getting that wrong - won't trust similar posts in the future.
> They also found that some surgeons did not work on their birthdays
Is this partially that better/more experienced/senior/more affluent doctors are taking their birthday off, hence people getting more junior doctors?
It would be interesting to see how many more people die on a particular surgeon's birthday compared with the rest of their year, rather than the birthday of the surgeon a patient gets.
I'd guess that few surgeons start boozing at breakfast on their birthday or have a lunchtime drinking session to celebrate. They are more likely to be rushing off to a party than to actually be impaired.
I remember a previous study showing that junior doctors were actually associated with better results since they were more recently trained and had less ingrained habits/biases.
I would be surprised if it did. I believe the reduction in doctors’ overtime was a wash for patient safety, as more shift handoffs and less sleep deprivation more or less canceled out, except in surgery, where it increased mistakes.
I suppose what you want is someone with enough experience, but who hasn't been out of medical school for too long. Probably 7-10 years of experience would be the sweet spot.
I wonder if this applies to other domains as well.
> Research on judges has yielded similar results. It has found, for example, that external factors as diverse as outdoor temperatures and sports results can influence judges’ decisions.
When I hear of these funky effects I always wonder how they relate to AI. Presumably this is somehow related to some kind of lossy compression of the state of the world, maybe similar to principal components[0]? Where the "I feel grumpy" component is a mix of defendant-is-guilty, temperature-is-low and my-team-lost.
It might also be related to the binding problem[1]?
Also interesting is "Impossibly Hungry Judges". Many of these correlation effects are paradoxically so strong that they _cannot_ be the true explanation.
"The phenomenon of favorable decisions peaking after a meal break is likely an artifact of the order of case presentation. It is not evidence that meal breaks affect the boards’ decisions."
Yup. Even ignoring the lack of randomization, there’s another issue with the hungry judges paper that should have raised some eyebrows.
They fit a model using both the case order (1st, 2nd, 3rd, etc) and the time elapsed. Either is significantly related to the case’s outcome, but when both are included, the rank explains away the time elapsed. This is obviously not compatible with increasing hunger/decreasing blood sugar, since those should depend on wall-time.
I don't have a reference handy, but I seem to recall that that study on judges' verdicts being influenced by temperature, how close they were to lunchtime and so on, failed to replicate. I wouldn't be surprised if the OP study fails to replicate as well. They have a relatively high number of data points (to get an idea of order of magnitude: mortality of 145 across 2064 operations on birthdays) but only reach a P-value of 0.03 on their main conclusion.
The judges-at-lunch thing turned out not to be a good “natural experiment” because the prisoners were systematically ordered.
The parole board considered cases from one prison at a time. Within each prison, prisoners representing themselves went last and they tended to fare worse than those with attorneys. The judges tended to take meal breaks between prisons, and... poof, there’s the result.
For human beings: How about simple plain distraction as the problem?
If you refer to AI, there are many examples where the training data is biased. One funny example was enemy tank recognition that saw enemies whenever there was gloomy weather, because the sample images of enemy tanks were all shot in such weather conditions to make them appear sinister to human eyes.
If you refer to a mental model, I guess it might simply be a resource management problem. Just because we do not experience the distraction actively it does not mean it is not there. How exactly distraction is compensated is irrelevant to this explanation. Explaining this with mathematical terms is probably pretty arbitrary and leads to framing (in a psychological sense). But I also like speculating on AI ;)
“The study, which appears today in the British Medical Journal (BMJ), looked at 980,876 procedures performed in US hospitals by 47,489 surgeons. Of those procedures, 2,064 (0.2%) took place on a surgeon’s birthday”
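Actual data:
2,064 of 980,876 operations were on a birthday; at a 6.9% death rate that is about 142 deaths.
Expected (if birthdays were like any other day, using the non-birthday death rate):
1/365 = 0.0027 of all operations, i.e. about 2,687 operations.
With the better 5.6% death rate, that is about 150 deaths.
So the death counts differ by only about 6% (142 actual vs. 150 expected); basically they are doing fewer operations on birthdays, and more of those are emergencies, so the death rate is increasing, it seems.
It means they are doing about 23% fewer operations on a birthday than on any normal day.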
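The back-of-the-envelope numbers above can be reproduced with a few lines of Python. This is only a rough sketch: the variable names are made up for illustration, and it assumes birthdays and operations are spread evenly over a 365-day year, which the study itself does not claim.

```python
# Figures quoted in this thread; variable names are illustrative only.
total_ops = 980_876        # all procedures in the study
birthday_ops = 2_064       # procedures performed on the surgeon's birthday
birthday_rate = 0.069      # mortality on birthdays (6.9%)
other_rate = 0.056         # mortality on other days (5.6%)

# Assumption: if birthdays were ordinary days, 1/365 of all operations
# would fall on them.
expected_ops = total_ops / 365

actual_deaths = birthday_ops * birthday_rate
expected_deaths = expected_ops * other_rate

# Fraction of "missing" birthday operations relative to an ordinary day.
shortfall = 1 - birthday_ops / expected_ops

print(f"expected birthday operations: {expected_ops:.0f}")
print(f"actual deaths: {actual_deaths:.0f}, expected deaths: {expected_deaths:.0f}")
print(f"operations shortfall: {shortfall:.0%}")
```

The shortfall is a statement about volume only; it says nothing on its own about which patients ended up being operated on.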
More than a decade ago we had to digitize some org’s patient cards into a database. After running a few “test” queries against that dataset out of curiosity, we found out that you’re less likely to be alcoholic-y if you live on the 4th-5th floor. The explanation I thought of was that they hang out outside, but those who live higher up have a harder time going home and prefer to not go. Buildings with 6+ stories usually have an elevator and “the problem” disappears.
This might be different elsewhere, but here, the apartments on lower floors are typically rented and they're cheaper than those on higher floors. It's the case at the building I currently live in, although it's because on the lower floors, there are more apartments (smaller ones) than on the higher ones. Roof apartments are all sold out, not rented.
So, I could imagine, that the higher floors are occupied by more wealthy people, who might be less prone to alcoholism. This theory, however, doesn't check out for high rise prefab houses. But it could perhaps cause the deviation?
Perhaps. The answer is it’s unknown. It was the first time I met correlation so clearly irl, and my idle guess was naive and obviously stupid.
There was no info on building types or sqft/m2, but lower and higher floors are historically cheaper here (1st is a little loud/non-private and last is a little cold and flood-prone).
"The effect size of surgeons’ birthday observed in our analysis (1.3 percentage point increase or a 23% increase in mortality), though substantial, is comparable to the impact of other events, including holidays (e.g., Christmas and New Year) and weekends. (...) But the authors say the “natural experiment” in the present study is more revealing than, for example, holiday-related mortality rates. That is because “those events not only affect physicians’ performance but also influence patients’ decision to seek care (i.e., patients seeking care on these special days might be sicker than those seeking care on other days), as well as hospital staffing.” Unless, of course, the patients know their surgeon’s birthday, which is unlikely (though that may change if this study becomes widely known)."
That doesn't exclude the possibility that surgeons may be assigned different patients on their birthdays. Some studies on the 'weekend effect' [1] seem to also control for illness severity, not clear if that was done here.
[1] https://en.wikipedia.org/wiki/Weekend_effect
I feel like we need an opposite saying for sentiments like this to the age old "correlation does not equal causation", something like "correlation does not mean automatically dismiss"
The comment you replied to has nothing to do with "correlation does not equal causation". The point was that fishing for results, by trying everything, is guaranteed to find something.
So if, in a study like this, they looked at differences in mortality rate between: 1) men and women, 2) younger and older people, 3) younger and older surgeons, 4) surgeries in summer vs winter, 5) surgeries in morning vs evening, 6) etc etc.... And it came down all the way to n) surgeries on birthday - to find an effect. Then it would be almost guaranteed that such a finding is spurious.
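To put a rough number on "guaranteed to find something": a minimal sketch of the family-wise error rate, assuming every hypothesis tested is actually null and the tests are independent (both are simplifying assumptions, and the function name is made up for illustration).

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across num_tests independent
    tests when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


for n in (1, 5, 20, 100):
    print(f"{n:>3} tests -> {familywise_error_rate(n):.1%} chance of a spurious hit")
```

With 20 comparisons at p < 0.05 the chance of at least one spurious "finding" is already well over half, and it keeps climbing toward certainty as the list of comparisons grows.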
> Then it would be almost guaranteed that such a finding is spurious.
But that's the fallacy. You can't just preemptively assume that there are no real correlations.
You definitely want to use a smaller p threshold when you look for more things, but it's quite possible to hit real correlations with a pile of plausible hypotheses.
As an example: Let's say just 1/150 of your hypotheses hit a real correlation, and you're inappropriately using a p<.05 test. Tiny signal, huge noise. But even in that pessimistic case, more than 10% of your positives are real. Far from a guarantee.
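The example above can be checked with a small sketch. The share of positives that are real depends on the statistical power of the tests, which the comment does not state, so `power` here is an assumed parameter and the function name is hypothetical.

```python
def positive_predictive_value(prior: float, alpha: float = 0.05,
                              power: float = 1.0) -> float:
    """Fraction of 'significant' results that reflect a real effect.

    prior: share of tested hypotheses that are actually true
    alpha: false-positive rate per test
    power: chance a real effect is detected (an assumption, not from the thread)
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


print(f"power 1.0: {positive_predictive_value(1 / 150):.1%}")
print(f"power 0.8: {positive_predictive_value(1 / 150, power=0.8):.1%}")
```

Under these assumptions the "more than 10%" figure holds when real effects are almost always detected (close to 12% at perfect power), and slips just under 10% at the conventional 80% power.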
> [...] but it's quite possible to hit real correlations with a pile of plausible hypotheses.
Yes of course. But the trouble is that, if you do this p-hacking expedition, you are guaranteed to find those correlations in pure noise. So if you use a procedure that will find something in noise - you cannot also use it to claim to have found something in your data.
In the words of statistics philosopher Deborah Mayo - "A conjecture passes a test only if a refutation would probably have occurred if it's false". In this case no refutation would have occurred if the correlation is false. Hence - the result is equivalent as if no test has actually been performed.
Or, a more simplistic example, imagine if someone observes an asteroid and says "it might be aliens". Some astro-physicists then describe that all the observed properties of that object behave just as we expect them to behave in the case of an asteroid. But the person might then reply with: "yeah, but it still might have been aliens".
I feel that the same is true for "yeah, but the correlation might still be true".
> In the words of statistics philosopher Deborah Mayo - "A conjecture passes a test only if a refutation would probably have occurred if it's false".
Sure, one weak result out of many doesn't pass. But not passing is a far cry from "almost guaranteed" to be spurious.
> Hence - the result is equivalent as if no test has actually been performed.
A result like that takes a big list of plausible correlations and distills it down. If you think even a handful of the original list items are likely to have merit, then the distilled list is useful for suggesting where you should collect more data.
> Or, a more simplistic example, imagine if someone observes an asteroid and says "it might be aliens".
What fraction of asteroids do you expect to be aliens?
If it's one in a billion, then cutting the list by a factor of 20 is useless. If it's one in a hundred, then cutting the list by a factor of 20 is very helpful.
> I feel that the same is true for "yeah, but the correlation might still be true".
It depends on the original list being sufficiently plausible. You can't distill tap water into vodka.
The only way to make the results believable is if they pre-registered the study based on a proposed mechanism of action, and then validated the results. Otherwise we can never know how many different attempts at crafting signal from the noise were attempted.
This is a huge problem in scientific papers; it's very, very common to see results for all kinds of metrics with confidence intervals or p-values and then see a few "significant" measurements, without a mention of the fact that multiple tests were made - and implicitly, possibly many more tests were made during the exploratory phase of the research. What does significance even mean then? Hard to say (there are techniques to try and compensate of course, but they have their own issues).
One simple way we can at least mitigate that problem is by requiring far lower p-values (or wider CI's), and where that's not feasible, require a much clearer-eyed explanation and acceptance of the fact that such research cannot be trivially supported by statistics, and instead additionally requires careful experimental setup and consideration of causal networks.
Basically: if you have p = 0.0001 or whatever I'm more willing to believe that publication biases and multiple testing aren't super likely to cause false positives that often. But without that, you want a clear hypothesis and proposed way to test published beforehand, and just one test, and ideally a clear hypothesis about causation etc too, so you can critically push and prod the results to try and distinguish noise from signal. A p=0.03 just isn't very obvious, at all.
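One common way to make "require far lower p-values" concrete is a Bonferroni correction. That is not something the comment above proposes by name; it is just one of the standard techniques it alludes to, sketched here with hypothetical function names and an assumed number of tests.

```python
def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the overall
    false-positive rate at or below alpha across num_tests tests."""
    return alpha / num_tests


def survives_correction(p_value: float, num_tests: int,
                        alpha: float = 0.05) -> bool:
    """True if p_value is still significant after the Bonferroni correction."""
    return p_value <= bonferroni_threshold(num_tests, alpha)


# How many tests (made public or not) can each p-value tolerate?
for p in (0.03, 0.0001):
    for m in (1, 2, 100, 1000):
        print(f"p={p}, tests={m}: significant={survives_correction(p, m)}")
```

A p of 0.03 stops being significant as soon as a second test is counted, while p = 0.0001 holds up through a few hundred. That is the intuition behind trusting the smaller value more when the number of attempts is unknown.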
In general, I think modern science is too reliant on statistics over complex systems. In the effort to tease out significance, it needs to try and correct for all kinds of known interference (confounders) and other effects, and thus needs to use more advanced statistical models and less general assumptions about distributions (whether for significance, or for mathematical tractability), to the point that it's just very hard for anyone to say they didn't make some systematic error somewhere. And sure, being an expert in the subject matter and having an expert statistician on hand helps, but making reasoning errors is too easy; too human to reliably avoid. Instead of seeking signals in noise, we should be targeting research more narrowly to parts of the puzzle that we can measure better, then use classical plain logic to put the pieces together - not try and measure the whole thing in one go. After we put all the well-measured pieces together, validating with tricky statistics is reasonable as a sanity check, but not much more than that. If common sense is hard, statistics is harder, even for statisticians.
Interpreting results like this as any more than "huh, that's something we could look into" is unwise.
Why are you and others assuming this was the case and it wasn't the original hypothesis they had in mind?
They explain the reasoning for selecting this hypothesis to test:
>Operations performed on birthdays of surgeons might provide a unique opportunity to assess the relationship between personal distractions and patient outcomes, under the hypothesis that surgeons may be more likely to become distracted or feel rushed to finish procedures on their birthdays, and therefore patient outcomes might worsen on those days.
Because a popular hobby on HN is trying to find mistakes in studies by guessing from the title without reading the article.
Incidentally, and interestingly, that is how bias in the real world often works - people making different assumptions for different groups in the absence of evidence.
Did they pre-register before conducting the study? If not, then how can you possibly assume the opposite? They can always write the justification for choosing the hypothesis after the fact.
EDIT: This need not be done maliciously, that could very well be the actual reason why they decided to look at the birthday. What is concealed is how many other possibilities, equally well justified, were considered.
I always liked xkcd's "Correlation doesn't imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing 'look over there'." (https://xkcd.com/552/) although it is rather long.
The website is currently unavailable so I can't read the article. However p-hacking, to which I believe the parent comment was referring, is a separate issue.
I agree that the incessant belaboring of the difference between correlation and causation in these types of threads is tiresome, but I don't think it applies in this case.
How about "if you find a correlation and want to report this as if you think it's significant, then you should formulate a specific hypothesis and perform independent experiments to test it, rather than publishing junk."
Do we? Wait for an hour and there will be a bunch of plausible..apparent descriptions that “invalidate” that “they’re just negligent on their birthday”.
Well, that's how you discover things. You go fish for correlations, and publish any interesting findings, so another group can get independent data and check them.
In the meantime, a news reporter gets your (probably spurious) correlations and announces them to the entire world as "the TRUTH! science says so, and you don't doubt science, do you?"
That's basically how science gets done in any complex field where we can't test things directly.
For those wondering about that “elderly”: the researchers only looked at “100% fee-for-service Medicare beneficiaries aged 65 to 99 years who underwent one of 17 common emergency surgical procedures in 2011-14” (https://www.bmj.com/content/371/bmj.m4381), so they didn’t check whether this applies to non-elderly patients.
> The study focused on emergency surgery, so as to minimize the potential selection bias. For example, surgeons might otherwise choose patients based on their illness severity, or patients might choose their surgeon.
The results are a lot less interesting when you read that the risk increase they found is comparable to that present during regular weekends:
>The effect size of surgeons’ birthday observed in our analysis (1.3 percentage point increase or a 23% increase in mortality), though substantial, is comparable to the impact of other events, including holidays (eg, Christmas and New Year) and weekends, which have been argued to affect the quality of patient care.
I suspect a causation along those lines would be that surgeons are less likely to push back surgeries to the day after their birthday if they're worried that the patient may soon become too sick to operate on.
> The researchers emphasized that this study focused on common procedures, and on older Medicare patients. This means that the findings may not apply to other types of patients, or to other surgical procedures.
Is there any reason Medicare would play a role in the surgeon's performance?
Medicare (as in the method of payment) obviously wouldn't, but by only polling Medicare patients they bias the patients, as Medicare patients are predominately the elderly.
I read something similar based on how long the surgeon had been working that day. I always wondered how plausible it was to show up for surgery day and ask the surgeon how many hours of sleep they got the night before and how long they have been working that day.
In general I wonder whether there is some QA/QC process involved in surgery, maybe having some sort of independent evaluators observe each surgery and ensure the work is being done up to the laid-down guidelines.
I’ve worked with a lot of surgeons and have heard/seen a lot of unbelievable things. Human beings are human. And often poorly educated in principles and ethics.
OK and how is this observational fact of any scientific relevance? Well there could be a similar observation like Elderly people are X% more likely to die if the moon is in the fifth phase of jupiter. So by some hocus-pocus astrological deduction this is a Study worth considering?
So you have 365 days in the year when a surgery can take place but only one day when a surgeon can have a birthday. You compare two samples of 364:1 ratio ... it wouldn't be surprising if the tiny sample gets some crazy deviations. Yet you find nothing so you're forced to dig in deeper (well this kind of content does well on the internet, you'd better find something) and bam there's a deviation for elderly patients, now you've got something to get your work talked about ...
Funny the website that hosts the study* even displays an "altmetric" chart showing how many media talk about it, how many tweets etc. Well done science :)