Doctor GPT: A Large Language Model That Can Pass the US Medical Licensing Exam (github.com/llsourcell)
89 points by mutant_glofish on Aug 13, 2023 | 75 comments



I think it would be a good idea to consult with an attorney about your product description. You have to be extremely careful about making medical claims. The fact that you're so careless about this makes me concerned about your ability to judge its medical advice.


> The fact that you're so careless about this makes me concerned about your ability to judge its medical advice.

Uhhh… That’s what leads you to be skeptical about the quality of medical advice you might get from an AI bot you can download from GitHub?

Did you even see the picture of the smiling cartoon robot doc? C’mon man, there’s no way anything could go wrong with this. Now, let’s open you up and take a look at that liver.


> smiling cartoon robot doc

It oddly reminds me of Clippy. Reason enough not to trust.


>Now, let’s open you up and take a look at that liver.

- Zoidberg, MD.


"I didn't knew it would make up disease" — The doctor at court who used DoctorGPT


Make sure it says “If this is a medical emergency call 911” each and every single time you open the prompt /s


And "please listen carefully, as our menu options have changed" even though they've been the same for a year.


TBH, I was hoping it would always start the first response with "Please state the nature of the medical emergency".


This. They're right you know.

It is quite reckless to make such extraordinary claims, that this LLM can allegedly "pass the US Medical Licensing Exam" and overall "provide everyone their own private doctor", as if the opinion of this AI model can be trusted not to hallucinate or regurgitate harmful medical advice.

Even if it does, a black-box AI product using a deep neural network is totally unsuitable and even borderline dangerous for medical advice or any related use-case, because of the extreme lack of explainability when it hallucinates or generates nonsense.

I think the author needs a massive disclaimer acknowledging that obvious, unavoidable risk of this AI model and highlighting that it is for entertainment purposes only, not to be used in production, and to be used at your own risk.


> to overall "provide everyone their own private doctor" as if the opinion of this AI model can be trusted not to hallucinate or regurgitate harmful medical advice.

You're taking that sentence out of context. It very clearly says it's their mission, and it's a mission statement.

Would you apply that same reaction to SpaceX's mission statement?

> MAKING HUMANITY MULTIPLANETARY

Would you say "as if the technology of these spaceships can be trusted not to explode"?


Ok, but are you familiar with the quality of care that most people are currently receiving?


Not to mention that (sooner rather than later, if I had to guess) many doctors will be using LLMs, or are already using AI & ML in some capacity in their work, at least on the research front.

I do agree though that the level of care many people can access is deeply depressing.


So hallucinating doctor > no doctor? Hmmm...


I have some snake oil to sell you...cheap


"Based on your symptoms it sounds like you have asthma, this condition can be treated by inhaling the vapors from a simple solution of bleach and ammonia, thank you for using *GPT"


You're missing the point.


What issue did you have with it?


In the context of the US medical system, medical advice is highly regulated. Specifically, this seems problematic:

> This is an open-source project with a mission to provide everyone their own private doctor.

Doctors are not just capable of passing the licensing exam, they’re also required to attend school, do a residency, and many other things. They are also personally liable.

Medical devices and software that provide some level of care also have extremely rigorous regulatory controls. Providing a medical service that is non-compliant is a crime.


>> Doctors are not just capable of passing the licensing exam, they’re also required to attend school, do a residency, and many other things. They are also personally liable.

>> Medical devices and software that provide some level of care also have extremely rigorous regulatory controls. Providing a medical service that is non-compliant is a crime.

The author is not starting a medical practice; they are showing a proof of concept -- without asking for money -- and sharing knowledge and know-how in an area where the world desperately needs to progress. My guess is that billions of people around the world die early from lack of medical care.

This project will NOT solve that, but it will help nudge us along the long road to better, cheaper, and more accessible medical care.

Let's not knock someone down; this is a hacker forum, so let's hack and learn.


Yes, but providing medical advice doesn't require a business relationship; it requires providing medical advice and asserting some level of medical authority. I have no intention of tearing down anything. But they should be very careful in how they describe what they're offering, even if - perhaps especially if - it's a free proof of concept in the open. Opening yourself up to personal liability and standing in non-compliance with medical regulatory requirements is just a bad idea.

All they need to do is rigorously disclaim their work. Your appeal to the world's dire need for better medical technology doesn't outweigh the need for strong medical regulations: history is full of dangerous quackery, and these rules exist for very carefully considered reasons, even if they can be frustrating, and even if they're overly broad now and need reform.

The point isn't that they are doing something wrong; it's that they need to be very careful about how they describe their stuff. I 1000% agree that democratizing medical data and services via open platforms is laudable. But just don't set yourself up for legal trouble in that pursuit.


Great point! Totally agree.

I also think -- in the spirit of learning -- most of the folks on HN would probably like to know more details about the fine-tuning cost, procedures, trade-offs, etc. as well as about evaluation frameworks.


I agree, I'm always interested in the fine-tuning mechanics. Unfortunately most of the guides posted these days are selling a SaaS that simplifies the process, whereas I'm not interested in the glossed-over version but in the brutal details.

My grandfather was a relatively renowned medical researcher who eschewed funding to publish his research in the open. As a result he published a huge corpus of research across a vast array of medical topics, but those old papers aren't indexed anywhere, and without a corporate sponsor his work was largely forgotten. We compiled all his papers in his final years and bound them in a multi-volume set. I plan to unbind them, have a company OCR them, fine-tune a model with them, and provide the papers and their references and citing papers in a vector db. I'm curious how much of his mind I can recapture, since even in his personal life he spoke constantly about his work, in the same language as his papers. His only hobbies outside research were shotput and trying to cure a beagle he rescued from a shelter that had some unknown chronic skin disease.
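
A very rough sketch of the retrieval half of that plan, purely to show the shape of it; the library choices here (sentence-transformers plus a FAISS index) are placeholders, not a commitment:

    # Rough sketch of the retrieval half: embed OCR'd paper chunks into a
    # small vector index so relevant passages can be pulled back out later.
    # sentence-transformers + FAISS are placeholder choices.
    import faiss
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def build_index(chunks):
        # chunks: OCR'd paper text split into passage-sized pieces
        vectors = embedder.encode(chunks, normalize_embeddings=True)
        index = faiss.IndexFlatIP(vectors.shape[1])  # cosine via inner product
        index.add(vectors)
        return index

    def search(index, chunks, query, k=5):
        q = embedder.encode([query], normalize_embeddings=True)
        _, ids = index.search(q, k)
        return [chunks[i] for i in ids[0]]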

I’m busy at the moment but have been passingly following and reading papers on the LLM work happening in the open to prepare to do that side project. I’ve been kind of disappointed with the quality of guides out there and the rapid commercialization of the space.


"Highly regulated" means:

1. It is based in the USA (whose share of world economies and influence has dropped to around 20%). If I used it in the Bahamas or even Ukraine, not a single AG in the entire USA could do anything about it.

2. It has to go through court and a judge has to pass judgment. Every court case in the USA can be ground through a jury. Every judge can be bribed or threatened (as shown during the 2020 elections). Assuming it passes that, you still have to go all the way to the SC. And even then, like Roe v. Wade, it can still get reversed.

3. An AG needs to bring the case to court first. Same as #2. In fact, go research how many very questionable AG deaths there have been in the USA in the last 5 years.

4. Just put a disclaimer EULA on it, and host it outside US jurisdiction on the borderless Internet (sci-hub set a good precedent).

5. US medical regulation today isn't on par with the standards of the vast majority of the Westernised world. Some ASEAN and East Asian standards would make the vast majority of FDA board members look like quacks.


Thank you for explaining


Tell me you're giving medical advice without telling me you're giving medical advice.

> an open-source project with a mission to provide everyone their own private doctor

Perfect.


Passing the USMLE also doesn't make you a doctor. There's a reason why doctors have to go through residency: hands-on training is a huge part of becoming one.

Maybe where this tool COULD have use is medical education for everyday folks. Could be just me, but sometimes there’s medical jargon thrown around in my medical visits that my doctor and I don’t have time to go through. (Note: This is just in my experience of US-based healthcare, where most doctors are pushed by healthcare administration to see as many patients as possible because healthcare in the US is a business, and seeing more patients = $$$.) This could be a useful tool to have that jargon explained in layman’s terms.

This tool won’t replace doctors, but it could certainly be something that enhances the healthcare experience by being a part of patient education.


To be fair to the author, they state it can "pass the medical exam," which is different from whether I should ask it for medical advice. I mean, it can't examine me physically, therefore it can't be my doctor.

On the other hand, there aren’t enough doctors where I live so I guess I’ll take it! Lol


I think this is the wording I have beef with: “DoctorGPT is a Large Language Model that can pass the US Medical Licensing Exam. This is an open-source project with a mission to provide everyone their own private doctor.” The “provide everyone their own private doctor” part definitely makes it sound like it can replace a doctor.

I'm just worried about the combination of medical misinformation and the people who don't understand how LLMs work, who will take everything an LLM spits out as truth. There are already a ton of examples in the news of people who really don't understand how they work (like that one professor who failed an entire class because ChatGPT told him that all the papers the class wrote were plagiarized). We also just saw with COVID how much harm medical misinformation can do.

Maybe I’m just being pessimistic and paranoid, but god am I scared of how things could cause people to inadvertently harm themselves and the people around them.


You're rightfully paranoid. LLMs are creative entities and there are no repercussions for making up diseases. Like the chef one that recommended ingredients which, combined, yield chlorine gas. Yummy.

Perhaps it knew that someone would die as a result and it did so intentionally. LLMs do have personalities and wit—and they know our human weaknesses.


> LLMs do have personalities and wit

I'm not sure how "choose the next most probable token" could be described this way.


Ask any LLM to act like a dungeon master who gives you medical advice and there you go. It’s more than “choose the next token!”

Llama 2 has given me some personality with basic prompts


How is that giving medical advice? That's a mission statement.

If my mission statement is to provide everyone with a private lawyer, then is that giving lawful advice?


> How is that giving medical advice? that's a mission statement.

As a mission statement for a software project, it very clearly indicates an intent to have the software act as a doctor.

> If my mission statement is to provide everyone with a private lawyer, then is that giving lawful advice?

If you have a software project with that as your mission statement, then its a pretty good indication you are aiming at unlicensed practice of law (which involves “legal advice” but exactly not, despite “legal” and “lawful” in other circumstances being synonymous, “lawful advice”.)


Yes, "legal" was the word I was looking for!

> As a mission statement for a software project, it very clearly indicates an intent to have the software act as a doctor.

I do understand that interpretation. However, the mission statement seems vague enough, since there are many ways to achieve it. I'm not saying it's good as it currently is, but I'm trying to understand how that mission statement equates to giving medical advice.

Let's say that intent is true, is the intent the same as actually providing the medical/legal advice in this case?


It's called "DoctorGPT". The mission in its mission statement is "to provide everyone their own private doctor". What do people go to doctors for? Medical advice.


Are we all aware that the repository owner is Siraj Raval? The guy infamous for claiming others' work as his own? I think the ML community disowned this guy a long time ago for peddling snake oil


Technically, he was disowned for a) plagiarizing an ML paper badly, creating the "complicated doors" meme, b) taking money for an online course that he blatantly misrepresented, and c) doing both a and b at the same time and therefore getting too much heat to ignore.

He did not get as much heat for just peddling AI snake oil. At least AI scrutiny has evolved since then, and I'm surprised this is the first time I've heard about Siraj in quite a while.


Add taking our repo/project name to the list: https://GitHub.com/featurebasedb/doctorgpt


> a)

Have a reference for this meme? Can't find it from a google search


Here's one of the original discussions: https://www.reddit.com/r/learnmachinelearning/comments/dh38x...

It appears I was wrong on the specifics of the meme:

> He changed "logic gate" to "logic door" and "complex Hilbert space" to "complicated Hilbert space" to try to hide the plagiarism. What a legend.


It's funny how memory works, "complicated doors" is an almost perfect combination of those two mistakes.


An absolute clown.


Impressive, but I know plenty of doctors who passed the USMLE and shouldn't be allowed within miles of a living human being. (Also, which step, 1, 2 or 3?)


Off topic, because I am skeptical of the claims in the first place. As a general principle: the licensing exam is not the benchmark an AI should be evaluated against before going on to call itself a doctor.

I'm from a different country but these exams are the minimum standard to demonstrate a doctor is safe prior to interacting with patients. To be really explicit, the core competency being assessed is identifying potentially serious situations and answering the same way every time: "I WOULD CALL FOR HELP" +/- principles of basic care.

The benchmark for doctors actually making decisions about patient care is the set of assessments to become a fully qualified consultant in each specialty.

Again, to be really explicit don't confuse a test for "doctor won't immediately kill someone and commence reasonable first steps" with an actual "doctor with years of experience and subspecialty training who regularly makes decisions about patient care".


Non-US doctor here. Completely agree. Licensing exams mainly confirm that the aspiring doctor won't harm/kill patients by mistake. Specialist-level exams assess standard of care.


I don't see a description on the GitHub page of any experiments performed showing the model can pass an exam. Where are these details, and how was the testing performed?

EDIT: The git commit was 4c6d52a when I originally posted this comment.


Agree. Seems highly irresponsible to make such claims without demonstrating and validating them. Would also like to see how much it hallucinates and how close the generations are to the source data.


In the notebook there is a passage where it goes through the exam dataset and uses an embedding model to compare answers. I don't think that's the accepted methodology (I think they ought to compare the logit weights on the answer keys); on top of that, the embedding model they use for similarity is somewhat weak.

Also, "passing" in their own graph means 61% right or something like that.
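
To make the distinction concrete, here is a minimal sketch of the logit-based scoring I mean, assuming a HuggingFace causal LM; the model name and prompt format are placeholders, not what the notebook actually does:

    # Rough sketch of logit-based multiple-choice scoring (not the notebook's
    # method). Assumes a HuggingFace causal LM; model name and prompt format
    # are placeholders.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    model.eval()

    def pick_answer(question, choices):
        # Score each answer key (A/B/C/D) by the log-probability the model
        # assigns to that key immediately after the prompt; highest wins.
        prompt = question + "\nAnswer:"
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            next_token_logits = model(**inputs).logits[0, -1]
        log_probs = torch.log_softmax(next_token_logits, dim=-1)
        scores = {}
        for key in choices:                      # e.g. {"A": "...", "B": "..."}
            key_id = tok(" " + key, add_special_tokens=False).input_ids[0]
            scores[key] = log_probs[key_id].item()
        return max(scores, key=scores.get)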


Not legal advice. You almost certainly need legal advice on this topic.

> DoctorGPT is a Large Language Model that can pass the US Medical Licensing Exam. This is an open-source project with a mission to provide everyone their own private doctor

Off the top of my head I can think of a number of issues, depending on which country or state you are in. If you are in California, U.S.A., for example... given that this references the USMLE (U.S. Medical Licensing Exam). Regardless, an AS IS warranty is in order, and a HUGE banner that this is not meant to be a doctor in any way, shape, or form. Which you kind of claim it is. And that is a problem.

[And change the name so it doesn't include "doctor" in any way.]


Since it's trained from LLaMA-2 7B, in theory you can just straight up load the weights [1] with text-gen-ui or llama.cpp. Hoping TheBloke converts this to GGML + safetensors so I can try it out.

[1] https://huggingface.co/llSourcell/medllama2_7b/tree/main
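
If that repo contains full HF-format weights rather than just a LoRA adapter, something like this should also work with plain transformers (untested sketch, repo name taken from [1]):

    # Untested sketch: since it's a LLaMA-2 7B fine-tune, the standard
    # transformers loading path should work, assuming full weights (not
    # just an adapter) are published at the repo in [1].
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    repo = "llSourcell/medllama2_7b"
    tok = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

    generate = pipeline("text-generation", model=model, tokenizer=tok)
    print(generate("What are common symptoms of iron deficiency?",
                   max_new_tokens=128)[0]["generated_text"])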


I've implemented my own medical AI, it's called DrHouse and the entire implementation follows:

    def diagnosis():
        return "It's not lupus."


_Wow_ this is a dangerous precedent and tagline.

@op: please update this project's description to explain that it's not a substitute for seeing a real doctor. People can get hurt with tools like this.

To be absolutely clear: as you currently describe it, implicitly or explicitly, this project is massively irresponsible.


I was reading through the notebook, and I saw that the `extract_pipe_output` function is filtering out elements without a "POSITIVE" label. There's a snippet like this:

    logits = extract_pipe_output(sentiment_pipe(texts, **sentiment_pipe_kwargs))
    rewards = pos_logit_to_reward(logits, task_list)
`pos_logit_to_reward` expects `logits` and `task_list` to be the same length, but `logits` is the result of some filtering. I must be missing something about the code and libraries (I'm a PyTorch newb), because I'd otherwise expect this to be a bug.
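
For what it's worth, here's a toy version of what I'd expect a length-preserving helper to look like; this is just to illustrate where the mismatch would come from, not the notebook's actual code:

    # Toy illustration (not the notebook's code): with return_all_scores,
    # a sentiment pipeline yields one list of {label, score} dicts per
    # input text, so pulling out the POSITIVE score per example keeps the
    # output the same length as task_list instead of filtering it down.
    def extract_pos_scores(pipe_outputs):
        scores = []
        for per_text in pipe_outputs:            # one entry per input text
            pos = [d["score"] for d in per_text if d["label"] == "POSITIVE"]
            scores.append(pos[0] if pos else 0.0)  # fall back, don't drop
        return scores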


"Please I need to bring my daughter in to see a doctor."

"I did not understand what you said; please repeat."

"I need to bring my daughter in to see the doctor!"

"I am sorry but the medical insurance registered to your phone does not include in-person contact. Do you wish to upgrade?"

"No! I really need to bring her in."

"I am sorry that our service is not meeting your expectations. Is there anything more I can do for you?"

"Please let me speak to a doctor!"

"I am sorry that our service is not meeting your expectations. Is there anything more I can do for you?"


Can someone make a LeetcodeGPT? We need to trivialize leetcode interviews to oblivion.


> The total training time for DoctorGPT including supervised fine-tuning of the initial LLama model on custom medical data, as well as further improving it via Reinforcement Learning from Constitional AI Feedback took 24 hours on a paid instance of Google Colab.

The most impressive part of this project is that they got Google Colab to not close the session for 24 hours straight.


Thank you for the submission @mutant_glofish, and for your efforts @llsourcell, if you are reading this.

Could you elaborate on how much compute (especially cost) you used for the fine-tuning? I think there is an entire blog post in there that would be of great interest to numerous folks.

Also, could you elaborate on your evaluation criteria? How did you test this? How did you ensure the exams were not leaked into the training set?


Amazing that you could use the 7B model for this. Going to try it out tomorrow. How good are the answers it gives in your experience?


You want to trust a text generator with a known tendency to hallucinate?


The scary thought is that once this sort of thing starts to work, it will be used by call centers for low-end medical services.


Why not log in anonymously to a single shared service that can be maintained more easily than N instances for N patients?


All I know is that just passing a comp sci exam won't get you a senior programming job.


I was excited to try it but some errors and comments in the notebook gave me pause.


Doesn't say much for the medical licensing exam...


Just reframe it as education


GP to a tee.


How can we use this to push doctors to be better doctors? Maybe if somehow an AI reviewed the doctors' decisions?

Or maybe if doctors are required to provide a transcript of their observations and decisions and give that to the patient, who can then fact-check it with GPT?


This is an amusingly naive take. Clinical decision support software has existed for 20 years, and some types of healthcare groups mandate the use of those tools. Moreover doctors all take notes and in many places those are available through patient portals already.


> maybe if doctors are required to provide a transcript of their observations and decisions and give that to the patient, who can then fact-check it with GPT?

When you try to brainstorm so hard that a brainfart slips out.


> How can we use this to push doctors to be better doctors?

By never showing this to a medical professional, who will laugh you out of the room and never take AI seriously again.


Imagine a future where GPT can fact check something.

What a beautiful dream that would be.


A doctor is simply a database of past cases, information they've read, and medical knowledge.

They take in symptoms, perform tests, analyze output, repeat.


A programmer is simply a database of past projects, information they’ve read, and programming knowledge.

They take in requirements, perform coding, analyse output, repeat.


No, because programmers can produce new code/logic.


Yes, the field of diagnosis and experimental medication/treatment requires no new logic.



