Research Paper Classifier with Gemini AI

Automated AI-Powered Research Paper Categorization System

Overview

The Research Paper Classifier is a Streamlit application that uses Google's Gemini AI to categorize research papers into predefined domains automatically. It is aimed at researchers, academics, and AI enthusiasts who want to streamline paper organization and metadata management.

Key Features

- 🤖 Gemini AI Integration: Utilizes state-of-the-art LLM capabilities for accurate document classification
- 📁 Batch Processing: Handles multiple PDF files simultaneously with configurable input directories
- ⚙️ Customizable Categories: Supports both default and user-defined classification categories
- 📊 CSV Metadata Management: Maintains structured records of classifications with reasoning
- 📈 Real-Time Progress Tracking: Interactive progress bar and detailed processing logs
- 🔒 Secure API Handling: Safe management of Gemini API credentials

Installation

Clone Repository:

git clone https://github.com/Anas-Altaf/Doc-Annotator_py.git
cd Doc-Annotator_py

Create Virtual Environment:

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

Install Dependencies:

pip install streamlit pandas google-genai python-dotenv

Configuration

Gemini API Key:

- Obtain a key from Google AI Studio
- Store it in a .env file:
```
GEMINI_API_KEY=your_key_here
```
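As a rough illustration of how the key stored above could be read at startup, here is a minimal standard-library sketch. The helper names `read_env_file` and `get_gemini_api_key` are hypothetical and not taken from app.py; since python-dotenv is listed among the dependencies, the application itself most likely loads the file with that package instead.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or read_env_file(path).get("GEMINI_API_KEY", "")
    if not key or key == "your_key_here":
        raise RuntimeError("GEMINI_API_KEY is missing or still set to the placeholder")
    return key
```

Failing early on the placeholder value tends to give a clearer message than an authentication error from the API later on.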

Directory Setup:
```
mkdir -p downloaded_papers metadata
```
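The `mkdir -p` command above assumes a Unix-like shell. A portable alternative is to create the folders from Python; the sketch below uses the default paths from this README, and the function name `ensure_directories` is a hypothetical helper rather than something defined in app.py.

```python
from pathlib import Path


def ensure_directories(
    pdf_dir: str = "downloaded_papers",
    csv_path: str = "metadata/papers_metadata.csv",
) -> tuple[Path, Path]:
    """Create the PDF input folder and the CSV's parent folder if missing."""
    pdf_path = Path(pdf_dir)
    pdf_path.mkdir(parents=True, exist_ok=True)

    output_path = Path(csv_path)
    # Only the parent folder is created; the CSV itself is written later.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    return pdf_path, output_path
```

Calling this once before processing should also avoid the FileNotFoundError that occurs when either folder does not exist.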

Usage

Launch Application:
```
streamlit run app.py
```
Interface Guide:
- PDF Directory: Path containing research papers (default: ./downloaded_papers)
- CSV Output Path: Metadata storage location (default: ./metadata/papers_metadata.csv)
- API Key: Your Gemini API key (masked input)
- Custom Categories: Optional user-defined classification labels
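One way the default and custom categories might be combined is sketched below. It assumes the custom labels arrive as a single comma-separated string, which this README does not confirm; the function name `merge_categories` and the example labels are purely illustrative.

```python
def merge_categories(defaults: list[str], custom_text: str) -> list[str]:
    """Append user-supplied labels to the defaults, dropping blanks and duplicates."""
    merged: list[str] = []
    seen: set[str] = set()
    custom = [part.strip() for part in custom_text.split(",")]
    for name in list(defaults) + custom:
        key = name.casefold()  # compare labels case-insensitively
        if name and key not in seen:
            seen.add(key)
            merged.append(name)
    return merged


# Hypothetical labels, not the application's actual defaults.
categories = merge_categories(["NLP", "Computer Vision"], "robotics, nlp")
```

Deduplicating case-insensitively keeps the first spelling seen, so a default label is never replaced by a differently cased custom one.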
Classification Process:
- Click "Start Classification" to initiate processing
- Monitor real-time progress in the dashboard
- View results in the interactive DataFrame display
- Access historical data through generated CSV files
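The classification process above can be sketched as a simple batch loop. This is a simplified outline under stated assumptions, not the code in app.py: the `classify` callable stands in for the Gemini request, the CSV column names are hypothetical, and the standard `csv` module is used here in place of pandas to keep the example dependency-free.

```python
import csv
from pathlib import Path
from typing import Callable, Optional


def classify_directory(
    pdf_dir: str,
    csv_path: str,
    classify: Callable[[Path], tuple[str, str]],
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> list[dict[str, str]]:
    """Classify every *.pdf in pdf_dir and write one CSV row per file."""
    pdfs = sorted(Path(pdf_dir).glob("*.pdf"))
    rows: list[dict[str, str]] = []
    for index, pdf in enumerate(pdfs, start=1):
        try:
            category, reasoning = classify(pdf)
        except Exception as exc:
            # Record the failure and keep going so one bad PDF does not stop the batch.
            category, reasoning = "ERROR", str(exc)
        rows.append({"filename": pdf.name, "category": category, "reasoning": reasoning})
        if on_progress is not None:
            on_progress(index, len(pdfs))  # e.g. update a progress bar

    output = Path(csv_path)
    output.parent.mkdir(parents=True, exist_ok=True)
    with output.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename", "category", "reasoning"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Passing the classifier in as a callable keeps the loop testable without network access; in a Streamlit app, `on_progress` could forward the fraction `index / total` to a progress bar.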

Screenshots

Main application interface with configuration options

Real-time progress tracking during classification

Final classification results with export options

Architecture

graph TD
    A[User Interface] --> B[PDF Directory]
    A --> C[Gemini API]
    B --> D[PDF Processor]
    C --> E[AI Classification]
    D --> E
    E --> F[CSV Metadata]
    F --> G[Results Visualization]

Troubleshooting

Common Issues:

- FileNotFoundError: Ensure the PDF and metadata directories exist before processing
- API Authentication Error: Verify that the Gemini API key is correct
- Invalid Response Format: Check that the PDF is readable and that the AI response can be parsed
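For the invalid response format case, a defensive parser can help pinpoint what went wrong. The sketch below assumes the model is asked to reply with a JSON object containing `category` and `reasoning` fields; that response shape and the name `parse_classification` are assumptions for illustration, not the format app.py necessarily uses.

```python
import json
import re


def parse_classification(text: str, allowed: list[str]) -> tuple[str, str]:
    """Extract (category, reasoning) from a model reply, or raise ValueError."""
    # Grab the outermost {...} span so surrounding prose or code fences are ignored.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model response")
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON in model response: {exc}") from exc

    category = str(data.get("category", "")).strip()
    if category not in allowed:
        raise ValueError(f"unexpected category: {category!r}")
    return category, str(data.get("reasoning", "")).strip()
```

Raising a single exception type with a specific message makes it easier to log which file failed and why, rather than writing a malformed row to the CSV.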

Debugging:

# Enable debug logging
streamlit run app.py --logger.level=debug

Performance

| Metric | Specification |
| --- | --- |
| Average speed | 5-100 PDFs/minute |
| Maximum file size | 50 MB per PDF |
| Supported languages | English technical text |
| Accuracy range | 99-100% (varies by domain) |

Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository
2. Create a feature branch (git checkout -b feature/improvement)
3. Commit your changes (git commit -m 'Add amazing feature')
4. Push to the branch (git push origin feature/improvement)
5. Open a Pull Request

License

Distributed under MIT License. See LICENSE for more information.
