Fine-tuning LLMs vs RAG - Which Approach for Production?
Hi ML community! I'm working on a project that requires domain-specific knowledge for a chatbot, and I'm torn between fine-tuning an existing LLM (like Llama or Mistral) and implementing a RAG (Retrieval-Augmented Generation) system.

What are your experiences with production deployments? RAG seems more maintainable, but fine-tuning might give better performance. Cost and update frequency are also considerations. Any insights would be appreciated!
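For context, here's the shape of the RAG pipeline I have in mind. This is just a toy sketch: the documents are made up, and I'm using a bag-of-words cosine similarity as a stand-in for a real embedding model, so only the overall flow (retrieve a chunk, inject it into the prompt) reflects what I'd actually deploy.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; in production these would be chunked docs.
DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The support hotline is open Monday to Friday, 9am to 5pm.",
    "Llama and Mistral are open-weight LLMs that can be fine-tuned.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words term counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str] = DOCS) -> str:
    """Return the single most similar document chunk to the query."""
    return max(docs, key=lambda d: cosine(embed(query), embed(d)))

def build_prompt(query: str) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    context = retrieve(query)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("How many days do I have to return a purchase for a refund?")
```

The appeal for my use case is that updating the knowledge base just means editing `DOCS` (or the vector store behind it), with no retraining step.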