Hacker News
New Gemini 1.5 Pro (0801) tops the LMSys leaderboard (twitter.com/lmsysorg)
9 points by zopper 9 months ago | past
Gemma 2B: Scores better than GPT 3.5 Turbo (twitter.com/lmsysorg)
8 points by FergusArgyll 9 months ago | past
Chatbot Arena Leaderboard: Gemini 1.5 Flash, Pro and Advanced Results (twitter.com/lmsysorg)
57 points by tosh 11 months ago | past | 38 comments
GPT-2 Chatbots Top the Arena with +50 Elo, Strongest Model Ever (twitter.com/lmsysorg)
2 points by georgehill on May 14, 2024 | past
Gemini 1.5 Pro is now #2 on the leaderboard (twitter.com/lmsysorg)
1 point by kmisiunas on April 23, 2024 | past
Gemini 1.5 moves to #2 on the lmsys arena leaderboard (twitter.com/lmsysorg)
3 points by petulla on April 23, 2024 | past
Llama-3 now top-5 on the Arena leaderboard (twitter.com/lmsysorg)
4 points by tosh on April 22, 2024 | past
Lmsys Arena results are out: Claude 3 Opus behind turbo, ahead of classic GPT4 (twitter.com/lmsysorg)
3 points by vitorgrs on March 7, 2024 | past | 1 comment
Google's Bard shows big leap on LLM performance leaderboard (twitter.com/lmsysorg)
132 points by mkmk on Jan 26, 2024 | past | 91 comments
Mistral Medium reaches Claude-level performance on Chatbot Arena (twitter.com/lmsysorg)
4 points by reissbaker on Jan 10, 2024 | past | 2 comments
Gemini Pro, Mixtral (Mistral-Small) vs. GPT3.5 in LLM Arena (twitter.com/lmsysorg)
2 points by Palmik on Dec 16, 2023 | past
Vicuna v1.5 series, featuring 4K and 16K context, based on Llama 2 (twitter.com/lmsysorg)
168 points by tosh on Aug 3, 2023 | past | 43 comments
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena (twitter.com/lmsysorg)
20 points by weichiang on June 16, 2023 | past
Google PaLM 2 ranked 6th on the LLM benchmark in the wild (twitter.com/lmsysorg)
1 point by weichiang on May 25, 2023 | past
Chatbot Arena: a crowd-sourced LLM leaderboard (twitter.com/lmsysorg)
1 point by weichiang on May 12, 2023 | past | 1 comment
Chatbot Arena Leaderboard: OpenAI GPT-4 and Anthropic Claude Take the Lead (twitter.com/lmsysorg)
2 points by MMMercy2 on May 10, 2023 | past
Fastchat-T5: 4x smaller but more powerful than Dolly-v2, commercial use ready (twitter.com/lmsysorg)
7 points by zhisbug on April 28, 2023 | past | 1 comment
Vicuna releases its secret to finding available A100s on the cloud to train it (twitter.com/lmsysorg)
4 points by zhwu on April 13, 2023 | past | 2 comments
State-of-the-Art Chatbot, Vicuna-7B, now runs on MacBook with GPU acceleration (twitter.com/lmsysorg)
126 points by weichiang on April 6, 2023 | past | 84 comments
State-of-the-art open-source chatbot, Vicuna-13B, just released model weights (twitter.com/lmsysorg)
271 points by weichiang on April 3, 2023 | past | 139 comments
