In this session, Jonathan from Shujin AI will talk about LLM benchmarks and the evaluation metrics behind them. He will address intriguing questions such as whether Gemini truly outperformed GPT-4V. Learn how to review benchmarks effectively and understand popular benchmarks like ARC, HellaSwag, MMLU, and more.
Topics covered:
🧠 Did Gemini really beat GPT-4V?
The performance showdown between Gemini and GPT-4, based on objective, detailed benchmark results.
🔍 What exactly are ARC, HellaSwag, MMLU, etc.?
Gain insights into some of the most popular benchmarks in the LLM arena, such as ARC, HellaSwag, and MMLU.
💪 How should you review benchmarks, and what should you look out for?
Jonathan will guide you through a step-by-step process to assess these benchmarks critically, helping you understand the strengths and limitations of different models.
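If you want to poke at one of these benchmarks yourself before the session, here is a minimal sketch that loads the HellaSwag validation split with the Hugging Face `datasets` library and prints a single multiple-choice item. The dataset name `"hellaswag"` and the field names `ctx`, `endings`, and `label` reflect the dataset as commonly published on the Hub; treat them as assumptions and adjust if the hosted schema differs.

```python
# Minimal sketch: inspect one HellaSwag example to see what the benchmark
# actually tests (sentence completion with four candidate endings).
# Assumes the "hellaswag" dataset on the Hugging Face Hub exposes the
# fields ctx, endings, and label; adjust if the schema has changed.
from datasets import load_dataset

ds = load_dataset("hellaswag", split="validation")

example = ds[0]
print("Context:", example["ctx"])
for i, ending in enumerate(example["endings"]):
    print(f"  ({i}) {ending}")
print("Correct ending index:", example["label"])
```

Seeing raw benchmark items like this makes it much easier to judge reported scores: you can check how ambiguous the questions are and whether the task matches what a leaderboard number claims to measure.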