Evaluating Large Language Models Trained on Code (arxiv.org)
11 points by aray on July 8, 2021 | 1 comment



> On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%.

Interesting that they are comparing their model with GPT-J.
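For context, "functional correctness" here means a completion is scored by executing it against the task's hidden unit tests, not by text similarity to a reference solution. A minimal sketch of that check, using an illustrative task and completion (not an actual HumanEval item, and without the sandboxing a real harness would need):

    def check_functional_correctness(prompt, completion, tests):
        # A problem counts as solved if the assembled program runs
        # the hidden tests without raising. Sketch only: a real
        # harness would sandbox and time-limit this exec().
        program = prompt + completion + "\n" + tests
        try:
            exec(program, {})
            return True
        except Exception:
            return False

    # Illustrative task in the HumanEval format: the prompt is a
    # function signature plus docstring; the model must synthesize
    # the body.
    PROMPT = (
        "def incr_list(l):\n"
        '    """Return a list with every element incremented by 1."""\n'
    )
    COMPLETION = "    return [x + 1 for x in l]\n"  # hypothetical model output
    TESTS = (
        "assert incr_list([1, 2, 3]) == [2, 3, 4]\n"
        "assert incr_list([]) == []\n"
    )

    print(check_functional_correctness(PROMPT, COMPLETION, TESTS))  # True

Scoring by running tests rather than matching reference text is what lets the benchmark give credit to any correct implementation, however it's written.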



