rcarmo 5 months ago | on: Llama3 implemented from scratch

I'd like to see this using ONNX and streaming from storage (I have my reasons, but they are mostly about using commodity hardware for "slow" batch processing without a GPU).