
What do they mean by "evaluating the model on the corpus" and "evaluates the corpus on the model"?

I know what an LLM is and I know very well what Bayesian Optimization is, but I don't understand what this library is trying to do.

I am guessing it's trying to test the model's ability to generate correct and relevant responses to a given input.

But who is the judge?




Same. "Evaluate" and "corpus" need to be defined. I don't think OP intended this to be clickbait but without clarification it sounds like they're claiming 10x faster inference, which I'm pretty sure it's not.


Hi, OP here. It's not 10 times faster inference, but faster evaluation. You evaluate on a dataset to check whether your model is performing well. This takes a lot of time (it might take longer than training if you are just fine-tuning a pre-trained model on a small dataset)!

So the pipeline goes training -> evaluation -> deployment (inference).

Hope that explanation helps!


"Evaluate" refers to measuring the accuracy of a model on a standard dataset for the purpose of comparing model performance. AKA a benchmark.

https://rentruewang.github.io/bocoel/research/


Right, I guess I am not familiar with how automated benchmarks for LLMs work. I assumed that deciding whether an LLM answer was good required human evaluation.


Multiple-choice tests, LM eval (e.g. have GPT-4 rate an answer, or use M-of-N GPT-4 ratings as pass/fail), and perplexity (i.e. how accurately the model can reproduce a corpus it was trained on).

Lots of ways to evaluate without humans. Most (nearly all) LLM benchmarks are fully automated, without any humans involved.
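
To make that concrete, here's a minimal multiple-choice eval sketch (placeholder model and toy data, not tied to this project): score each answer choice by the model's log-likelihood and count how often the top-scoring choice matches the gold answer.

  # Minimal sketch of a fully automated multiple-choice eval.
  # "gpt2" and the two toy questions below are placeholders.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

  def choice_logprob(question: str, choice: str) -> float:
      ids = tok(question + " " + choice, return_tensors="pt").input_ids
      with torch.no_grad():
          out = model(ids, labels=ids)  # loss = mean negative log-likelihood per token
      return -out.loss.item() * ids.shape[1]  # approximate total log-likelihood

  dataset = [  # toy items: (question, choices, index of gold answer)
      ("The capital of France is", ["Paris", "Berlin", "Madrid"], 0),
      ("Water freezes at", ["0 degrees Celsius", "50 degrees Celsius"], 0),
  ]

  correct = 0
  for question, choices, gold in dataset:
      scores = [choice_logprob(question, c) for c in choices]
      correct += int(max(range(len(choices)), key=scores.__getitem__) == gold)

  print("accuracy:", correct / len(dataset))

No human in the loop: the benchmark score comes entirely from comparing log-likelihoods against the answer key.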


The "eval" phase is done after a model is trained to assess its performance on whatever tasks you wanted it to do. I think this is basically saying, "don't evaluate on the entire corpus, find a smart subset."


Hi, OP here. So you evaluate LLMs on corpora to measure their performance, right? Bayesian optimization is here to select points (in the latent space) and tell the LLM where to evaluate next. To be precise, entropy search is used here, coupled with some latent-space reduction techniques like N-sphere representation and embedding whitening. Hope that makes sense!
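
For anyone who wants a concrete picture, here is a rough sketch of that loop (generic code, not bocoel's actual API; PCA stands in for the whitening/dimension-reduction step and plain uncertainty sampling stands in for entropy search): fit a Gaussian-process surrogate on the items scored so far, pick the most uncertain corpus point to evaluate next, and read the full-corpus estimate off the surrogate.

  # Rough sketch only; evaluate_item and the random embeddings are placeholders.
  import numpy as np
  from sklearn.decomposition import PCA
  from sklearn.gaussian_process import GaussianProcessRegressor
  from sklearn.gaussian_process.kernels import Matern

  rng = np.random.default_rng(0)
  embeddings = rng.normal(size=(10_000, 384))  # stand-in for corpus embeddings

  def evaluate_item(i: int) -> float:
      """Hypothetical: run the LLM on corpus item i and return its score."""
      return float(rng.random())

  # Reduce the latent space before fitting the surrogate (PCA here; the
  # project mentions whitening / N-sphere representations instead).
  latent = PCA(n_components=16).fit_transform(embeddings)

  # Seed with a few randomly evaluated items.
  evaluated = list(rng.choice(len(latent), size=8, replace=False))
  scores = [evaluate_item(i) for i in evaluated]

  gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

  for _ in range(50):  # evaluation budget: 50 more LLM calls
      gp.fit(latent[evaluated], scores)
      mean, std = gp.predict(latent, return_std=True)
      std[evaluated] = -np.inf           # never re-evaluate an item
      nxt = int(np.argmax(std))          # most uncertain point in the latent space
      evaluated.append(nxt)
      scores.append(evaluate_item(nxt))

  # The surrogate's mean over the whole corpus approximates the full-corpus
  # score from only ~58 LLM calls instead of 10,000.
  print("estimated corpus score:", float(gp.predict(latent).mean()))

The acquisition function OP describes is entropy search rather than the max-uncertainty rule above, but the shape of the loop is the same.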


The definition of "evaluate" isn't clear. Do you mean inference?


Perhaps I should clarify it in the project README. It's the phase where you evaluate how well your model is performing. So the pipeline goes training -> evaluation -> deployment (inference), corresponding to the usual dataset splits in supervised learning: training (training set) -> evaluation (validation set) -> deployment (test set).



