In this session, Chaoyu Yang, Founder and CEO of BentoML, discussed the practical considerations of building private Retrieval-Augmented Generation (RAG) applications using a mix of open source and custom LLMs.
He also introduced OpenLLM (https://github.com/bentoml/OpenLLM) and how it can help with LLM deployments.
Topics covered:
✅ The benefits of self-hosting open source LLMs or embedding models for RAG.
✅ Common best practices in optimizing inference performance for RAG.
✅ Using BentoML to build RAG as a service, seamlessly chaining language models with components such as text and multi-modal embedding models, OCR pipelines, semantic chunking, classification models, and reranking models.
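The chaining pattern in the last topic (embed, retrieve, rerank, generate) can be sketched with stub components. This is a minimal illustration, not BentoML or OpenLLM API code: all function names are hypothetical, and the reranker and LLM call are placeholders for real model services.

```python
def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def retrieve(query_vec: list[float], docs: list[dict], k: int = 2) -> list[dict]:
    # Rank documents by embedding similarity and keep the top k.
    scored = sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return scored[:k]

def rerank(query: str, candidates: list[dict]) -> list[dict]:
    # Placeholder reranker: a real service would call a cross-encoder model.
    return sorted(candidates,
                  key=lambda d: query.lower() in d["text"].lower(),
                  reverse=True)

def generate(prompt: str) -> str:
    # Placeholder for a call to a self-hosted LLM endpoint.
    return f"Answer based on: {prompt}"

# Toy corpus with pre-computed (hypothetical) embeddings.
docs = [
    {"text": "BentoML serves models as APIs.", "vec": [1.0, 0.0]},
    {"text": "OCR extracts text from images.", "vec": [0.0, 1.0]},
]
query_vec = [0.9, 0.1]  # would come from an embedding model in practice

hits = retrieve(query_vec, docs, k=1)
context = " ".join(d["text"] for d in rerank("serving", hits))
answer = generate(context)
```

In a production setup, each stub would be a separately scaled model service, with the orchestration layer wiring them into one inference pipeline.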