Scaling Private RAG with Open-Source and Custom Models šŸš€

Watch here: https://youtu.be/2tm0b8_TVr8?feature=shared

In this session, Chaoyu Yang, Founder and CEO at BentoML, discussed the practical considerations of building private Retrieval-Augmented Generation (RAG) applications using a mix of open-source and custom LLMs. He also covered OpenLLM (https://github.com/bentoml/OpenLLM) and how it helps with LLM deployments. Topics covered:

āœ… The benefits of self-hosting open-source LLMs or embedding models for RAG.
āœ… Common best practices for optimizing inference performance in RAG.
āœ… BentoML for building RAG as a service, seamlessly chaining language models with various components, including text and multi-modal embedding, OCR pipelines, semantic chunking, classification models, and reranking models (a minimal sketch follows this list).
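
To make the "RAG as a service" idea concrete, below is a minimal sketch of a BentoML-style service that chains a self-hosted embedding model with an LLM served behind an OpenAI-compatible endpoint (as OpenLLM provides). The model names, port, and in-memory corpus are illustrative assumptions, not details from the talk.

```python
import bentoml
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

# Toy in-memory corpus; a real deployment would use a vector database.
DOCUMENTS = [
    "BentoML packages models and Python code into deployable services.",
    "OpenLLM serves open-source LLMs behind an OpenAI-compatible API.",
]


@bentoml.service
class PrivateRAG:
    def __init__(self) -> None:
        # Self-hosted embedding model (model name is an example choice).
        self.embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
        # Client pointed at a locally hosted LLM endpoint; URL/port are assumed.
        self.llm = OpenAI(base_url="http://localhost:3000/v1", api_key="not-needed")
        # Pre-compute normalized document embeddings for cosine similarity.
        self.doc_vecs = self.embedder.encode(DOCUMENTS, normalize_embeddings=True)

    @bentoml.api
    def query(self, question: str, top_k: int = 2) -> str:
        # Embed the question and retrieve the most similar documents.
        q_vec = self.embedder.encode([question], normalize_embeddings=True)[0]
        scores = self.doc_vecs @ q_vec
        top_ids = np.argsort(scores)[::-1][:top_k]
        context = "\n".join(DOCUMENTS[i] for i in top_ids)
        # Generate an answer grounded in the retrieved context.
        resp = self.llm.chat.completions.create(
            model="llama-3-8b-instruct",  # whatever model the local server hosts
            messages=[
                {"role": "system", "content": f"Answer using this context:\n{context}"},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content
```

Run locally with `bentoml serve`, pointing it at the module that defines the service; additional components from the talk (OCR, semantic chunking, reranking) would slot in as extra steps or separate BentoML services composed together.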