Same. "Evaluate" and "corpus" need to be defined. I don't think OP intended this to be clickbait but without clarification it sounds like they're claiming 10x faster inference, which I'm pretty sure it's not.
Hi, OP here. It's not 10 times faster inference, but faster evaluation. You use evaluation on a dataset to check if your model is performing well. This takes a lot of time (might be more than training if you are just finetuning a pre-trained model on a small dataset)!
So the pipeline goes training -> evaluation -> deployment (inference).
Multiple-choice tests, LLM-as-judge evals (e.g. have GPT-4 rate an answer, or use M-of-N GPT-4 ratings as pass/fail), perplexity (i.e. how well the model predicts the text of a given corpus).
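For the perplexity one, here's a minimal sketch with Hugging Face transformers (the model name and text are just placeholders, not tied to any particular benchmark):

```python
# Rough sketch: perplexity of a causal LM on a small text sample.
# Assumes the `transformers` and `torch` packages; "gpt2" is a placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean cross-entropy loss
    # over the sequence; perplexity is just exp(loss).
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = torch.exp(outputs.loss).item()
print(f"Perplexity: {perplexity:.2f}")
```

Lower perplexity means the model assigns higher probability to the reference text, and no human needs to look at anything.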
There are lots of ways to evaluate without humans. Most (nearly all) LLM benchmarks are fully automated.
The "eval" phase is done after a model is trained to assess its performance on whatever tasks you wanted it to do. I think this is basically saying, "don't evaluate on the entire corpus, find a smart subset."
Hi, OP here. So you evaluate an LLM on a corpus to measure its performance, right? Bayesian optimization is used here to select points (in the latent space), i.e. to decide where to evaluate the LLM next. To be precise, entropy search is used, coupled with some latent-space reduction techniques like an N-sphere representation and embedding whitening. Hope that makes sense!
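Not the library's actual code, but here's a toy sketch of the general flavor: fit a surrogate over example embeddings and pick the next example to score where the surrogate is most uncertain. This stand-in uses a scikit-learn GP with plain uncertainty sampling instead of the entropy search and latent-space tricks mentioned above, and every name and number in it is made up for illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Placeholder "latent space": embeddings of 1,000 eval examples.
embeddings = rng.normal(size=(1000, 16))

def run_llm_eval(idx):
    # Hypothetical stand-in for actually scoring example `idx` with the LLM.
    return float(np.tanh(embeddings[idx, :3].sum()) + 0.05 * rng.normal())

# Score a small random seed set first.
evaluated = list(rng.choice(len(embeddings), size=20, replace=False))
scores = [run_llm_eval(i) for i in evaluated]

for _ in range(50):  # evaluate 50 more examples, chosen adaptively
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), normalize_y=True)
    gp.fit(embeddings[evaluated], scores)
    mean, std = gp.predict(embeddings, return_std=True)
    std[evaluated] = -np.inf      # don't re-pick already-evaluated examples
    nxt = int(np.argmax(std))     # most uncertain example
    evaluated.append(nxt)
    scores.append(run_llm_eval(nxt))

# Estimate overall performance from the surrogate instead of the full corpus.
print("Estimated mean score:", gp.predict(embeddings).mean())
```

The point is that you only call the expensive LLM scoring function 70 times instead of 1,000, and let the surrogate fill in the rest.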
Perhaps I should clarify it in the project README. Evaluation is the phase where you check how well your model is performing. So the pipeline goes training -> evaluation -> deployment (inference), which maps onto the usual supervised-learning splits: training set -> validation set -> test set.
I know what an LLM is and I know very well what Bayesian optimization is. But I don't understand what this library is trying to do.
I am guessing it's trying to test the model's ability to generate correct and relevant responses to a given input.
But who is the judge?