
Give this project a try. I've been using it with promising results.

https://github.com/matthsena/AlcheMark




I tried it with one PDF and was surprised to see it connect to a cloud service:

  2025-05-14 07:58:49,373 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): openaipublic.blob.core.windows.net:443
  2025-05-14 07:58:50,446 - urllib3.connectionpool - DEBUG - https://openaipublic.blob.core.windows.net:443 "GET /encodings/o200k_base.tiktoken HTTP/1.1" 200 361 3922

The project's README doesn't mention that anywhere...


The project's README mentions that it uses tiktoken[0], which is a separate project created by OpenAI.

tiktoken downloads its encoding files (the o200k_base.tiktoken in your log is one of them) the first time you use them, but it does not mention that. It does cache the files, so you shouldn't see more of those connections, if I'm understanding the code correctly.

[0] <https://github.com/openai/tiktoken>
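
To make the "download once, then cache" behavior concrete, here is a minimal sketch of the general pattern. This is not tiktoken's actual code: cached_fetch and fake_fetch are hypothetical names, the fetcher is a stand-in so nothing touches the network, and keying the cache on a hash of the URL is my assumption about a reasonable design.

```python
import hashlib
import os
import tempfile
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], bytes], cache_dir: str) -> bytes:
    """Return the bytes for `url`, calling `fetch` only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Assumption: key the entry on a hash of the URL so it maps to a safe filename.
    key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key)

    if os.path.exists(path):
        # Cache hit: no request is made.
        with open(path, "rb") as f:
            return f.read()

    # Cache miss: this is the only place a network request would happen.
    data = fetch(url)

    # Write to a temporary file and rename it, so an interrupted write
    # never leaves a partial cache entry behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)
    return data


calls = []


def fake_fetch(url: str) -> bytes:
    # Hypothetical stand-in for an HTTPS GET; records each "request".
    calls.append(url)
    return b"fake encoding data"


URL = "https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken"

with tempfile.TemporaryDirectory() as cache_dir:
    first = cached_fetch(URL, fake_fetch, cache_dir)
    second = cached_fetch(URL, fake_fetch, cache_dir)

print(f"requests made: {len(calls)}")
```

The second call is served from disk, which matches seeing the connection in the log only on the first run. If I remember right, tiktoken reads a TIKTOKEN_CACHE_DIR environment variable for the cache location, but check tiktoken/load.py before relying on that.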


I'll check it out!



