Is this because the final representation in BERT-style models is more globally focused, rather than being optimized for next-token prediction?