This Streamlit application implements a Retrieval-Augmented Generation (RAG) system for intelligent document-based question answering, enabling users to upload PDFs and interactively query their contents.
## Features

- PDF document upload and processing
- Advanced text chunking and embedding
- Vector storage using Pinecone
- AI-powered question answering with Mistral
- Interactive chat interface
## Tech Stack

- Streamlit
- Pinecone
- LangChain
- Mistral AI
- HuggingFace Embeddings
## How It Works

1. Upload PDF files through the Streamlit interface
2. Extract and chunk the text with LangChain text splitters
3. Generate high-dimensional embeddings for each chunk
4. Store the vectorized documents in a Pinecone index
5. Retrieve the chunks most relevant to a query
6. Generate an answer with Mistral AI
7. Return the answer with source document references
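The chunking step above can be sketched in plain Python. The app itself uses LangChain splitters, but the core idea is the same: fixed-size windows that overlap so sentences straddling a boundary appear in both neighbors. The `chunk_size` and `overlap` values here are illustrative, not the app's actual settings.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so context isn't lost at chunk boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 1200 characters with 500-char windows stepping by 450 -> 3 chunks,
# each sharing 50 characters with its neighbor.
chunks = chunk_text("A" * 1200, chunk_size=500, overlap=50)
```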
## Requirements

- streamlit
- pinecone-client
- langchain
- transformers
- mistralai
## API Keys

- Pinecone API Key
- Mistral AI API Key
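In a Streamlit app these keys would typically live in `.streamlit/secrets.toml` and be read via `st.secrets`. The key names below are illustrative and must match whatever names the app code looks up.

```toml
# .streamlit/secrets.toml — keep out of version control
PINECONE_API_KEY = "your-pinecone-key"
MISTRAL_API_KEY = "your-mistral-key"
```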
## Embedding Model

- Model: `BAAI/bge-large-en-v1.5`
- Dimensions: 1024
- Device: CPU/CUDA
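Retrieval over these 1024-dimensional embeddings comes down to vector similarity. A stdlib sketch of cosine similarity, the metric Pinecone indexes are commonly configured with (an assumption here, not confirmed by the app's code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

At query time, the question is embedded with the same model and the index returns the stored chunks with the highest similarity scores.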
## Usage

1. Upload PDF documents
2. Click "Process Documents"
3. Ask questions in the chat interface
4. Receive AI-generated answers with source references
## Example Workflow

User uploads research papers ➡️ documents are chunked and embedded ➡️ user asks "What are the key findings?" ➡️ AI retrieves relevant sections ➡️ generates a comprehensive answer
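The "retrieve, then generate" step in this workflow amounts to stuffing the top-ranked chunks into the model's prompt. A minimal sketch, with a prompt template that is illustrative rather than the app's actual one:

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a RAG prompt: numbered context chunks first, then the user's question."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt(
    "What are the key findings?",
    ["The study reports a 12% improvement...", "Limitations include sample size."],
)
```

The assembled string is then sent to Mistral AI, and the `[number]` markers let the app map the answer back to source documents.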
## Security

- Secrets managed via Streamlit
- Temporary file handling
- Secure API key management
## Future Enhancements

- Multi-language support
- Enhanced embedding models
- More granular source tracking
- Advanced filtering options
## License

MIT
## Author

- Gauri Sharan
## Acknowledgments

- Streamlit Community
- Pinecone
- Mistral AI
- LangChain Team