the problem is that for medical coding, this translates to "a small number of procedures will be coded wrong", and that's not a meaningfully better situation than "a small number of procedures can't be coded", and in fact is probably worse. So you need a reasonably high confidence threshold, and really in most cases you probably want to have a human manually review the problem (or questionable) cases.
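A minimal sketch of the threshold-plus-human-review setup might look like the following. The `Prediction` type, the `route` function, the 0.95 cutoff, and the placeholder codes are all hypothetical and chosen only for illustration. Predictions at or above the threshold are auto-accepted, and everything else goes to a queue for manual review:

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    procedure_id: str
    code: str          # hypothetical predicted code (placeholder, not a real code set)
    confidence: float  # model's estimated probability that the code is correct, in [0, 1]


def route(predictions, threshold=0.95):
    """Split predictions into auto-accepted and human-review lists.

    The 0.95 default is an assumed value; in practice it would be tuned
    against how costly a wrong code is compared to a manual review.
    """
    accepted, review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            # Low-confidence (questionable) cases are sent to a human coder.
            review.append(p)
    return accepted, review


preds = [
    Prediction("p1", "CODE-A", 0.99),
    Prediction("p2", "CODE-B", 0.80),
    Prediction("p3", "CODE-C", 0.95),
]
accepted, review = route(preds)
print([p.procedure_id for p in accepted])  # ['p1', 'p3']
print([p.procedure_id for p in review])    # ['p2']
```

Raising the threshold trades fewer wrongly auto-coded procedures for a larger manual-review queue, which is the tradeoff the comment is pointing at.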