Multi-RAG Portal is a web-based tool for querying PDF documents in natural language. It uses Retrieval-Augmented Generation (RAG): you upload PDF files, the system indexes them, and your questions are answered from the contents of the uploaded files. The backend is built with FastAPI and the frontend with Streamlit.
- Document Submission: Easily send in your PDF documents for indexing.
- Intelligent Q&A: Ask questions about your PDFs and receive context-rich, human-like answers.
- User-Friendly UI: Enjoy a clean, intuitive interface that simplifies interaction.
- Interactive Frontend: Benefit from an interactive Streamlit-based frontend for enhanced user experience.
Before starting, ensure you have the following set up:
- Python 3.10.12 or later
- pip (Python’s package manager)
- Node.js and npm (if you need to adjust frontend components)
- Streamlit: For the frontend interface
For processing PDF files, you’ll need poppler-utils:
- Ubuntu/Debian:
sudo apt-get install poppler-utils
- macOS:
brew install poppler
- Windows: Obtain Poppler from its official source and include its bin directory in your PATH.
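To confirm that Poppler is visible on your PATH after installation, a small check like the following may help. It is a minimal sketch that assumes pdf2image relies on the `pdftoppm` and `pdfinfo` executables shipped with Poppler; the helper name `missing_poppler_tools` is hypothetical and not part of this project.

```python
import shutil

# Assumption: pdf2image shells out to these Poppler command-line tools.
REQUIRED_TOOLS = ("pdftoppm", "pdfinfo")


def missing_poppler_tools(tools=REQUIRED_TOOLS):
    """Return the names of the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_poppler_tools()` suggests Poppler is installed correctly; any names returned point to executables that still need to be added to PATH.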
git clone https://github.com/RohmaButt/MultiModal-RAG.git
cd MultiModal-RAG
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
pip install -r requirements.txt
- For PDF processing:
pip install pdf2image pillow
- On Ubuntu/Debian:
sudo apt-get install poppler-utils
- On macOS:
brew install poppler
- On Windows:
Download and install Poppler manually. Add its bin directory to your PATH.
The application requires an OpenAI API key for generating embeddings and processing queries. You can set this up using a .env file.
- a. Install python-dotenv
pip install python-dotenv
- b. Create a .env file
In the root directory of your project, create a file named .env and add your OpenAI API key:
OPENAI_API_KEY=your-openai-api-key-here
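The sketch below shows one way a script could read the key from the .env file and fail early if it is missing. The function name `load_openai_key` is hypothetical, and how the project's own modules load the key is not shown here. Only `load_dotenv` from python-dotenv and the standard `os.environ` are used.

```python
import os


def load_openai_key(env_path=".env"):
    """Return OPENAI_API_KEY, loading it from env_path when python-dotenv is available."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package

        # By default, variables already set in the environment are not overridden.
        load_dotenv(env_path)
    except ImportError:
        # python-dotenv is not installed; rely on the existing environment only.
        pass

    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling `load_openai_key()` once at startup makes a missing or misnamed key visible immediately rather than on the first query.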
Activate Your Virtual Environment (if not already activated):
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
uvicorn api:app --reload
Open a web browser and navigate to http://localhost:8000 to access the FastAPI backend (primarily used by the Streamlit frontend).
Ensure the Backend is Running:
Make sure the FastAPI server is active as described above.
In a separate terminal window, navigate to the project directory and run:
streamlit run frontend.py
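If the frontend cannot reach the API, a quick reachability check can tell you whether the backend is actually listening. This is a minimal sketch using only the standard library; the function name `backend_is_up` is hypothetical, and the URL assumes uvicorn's default host and port as used above.

```python
import urllib.error
import urllib.request


def backend_is_up(url="http://localhost:8000/", timeout=2.0):
    """Return True if an HTTP server answers at url, False if the connection fails."""
    # Bypass any configured HTTP proxy, since the target is a local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even if with an error status code.
        return True
    except OSError:
        # Connection refused, timeout, or DNS failure.
        return False
```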
- Use the Streamlit interface to upload PDFs and submit queries.
- View responses directly within the Streamlit app, including any embedded images.
The FastAPI backend provides the following endpoints:
- GET /: Serves the main HTML page (primarily used for backend testing; the Streamlit frontend interacts directly with the API).
- POST /upload_pdf/: Endpoint for uploading PDF files.
- POST /query/: Endpoint for submitting queries.
- GET /
- Description: Serves the main HTML page.
- Usage: Access via http://localhost:8000/ in a web browser.
- Response: Returns the index.html file located in the static directory.
- POST /upload_pdf/
- Description: Uploads a PDF file for processing.
- Parameters:
- file (form data): The PDF file to upload.
- Response:
- Success (200):
{
"filename": "uploaded_file.pdf",
"message": "File uploaded successfully"
}
- Error (400):
{
"detail": "Only PDF files are allowed"
}
- POST /query/
- Description: Submits a natural language query related to the uploaded PDFs.
- Parameters:
- question (JSON body): The user's query.
{
"question": "What is the main result of the paper?"
}
- Response:
- Success (200):
{
"response": "The main result of the paper is..."
}
- Error (400):
{
"detail": "Please upload a PDF first"
}
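The two POST endpoints above can be exercised from a script without the Streamlit frontend. The following is a minimal sketch using only the standard library, assuming the backend runs at http://localhost:8000 and accepts the `file` form field and `question` JSON field as documented. The helper names (`build_multipart`, `post`, `upload_pdf`, `ask`) are hypothetical and not part of the project.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumption: default uvicorn host and port


def build_multipart(field, filename, data):
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post(path, body, content_type):
    """POST raw bytes to the backend and decode the JSON reply."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def upload_pdf(pdf_path):
    """Send a PDF to POST /upload_pdf/ using the 'file' form field."""
    with open(pdf_path, "rb") as handle:
        body, content_type = build_multipart(
            "file", os.path.basename(pdf_path), handle.read()
        )
    return post("/upload_pdf/", body, content_type)


def ask(question):
    """Send a question to POST /query/ as a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return post("/query/", body, "application/json")
```

With the backend running, `upload_pdf("paper.pdf")` followed by `ask("What is the main result of the paper?")["response"]` would return the answer text, where `paper.pdf` is a placeholder file name. A 400 reply, such as the "Please upload a PDF first" error, surfaces as `urllib.error.HTTPError`.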
MultiModal-RAG/
├── api.py
├── addvectorstore.py
├── main.py
├── load_files.py
├── imgproc.py
├── ragretriever.py
├── retrieval.py
├── frontend.py
├── requirements.txt
├── .env
├── static/
│ └── index.html
├── uploads/
│ └── (uploaded PDFs and extracted images)
└── README.md