This project is a web application built with ReactJS (frontend) and Python (backend) that detects whether a given text is human-written or AI-generated. It participates in the Voight-Kampff Generative AI Authorship Verification 2024 challenge.
- AI vs Human Detection: Classifies text as human-written or AI-generated.
- Machine Learning Models: Scores text with a pre-trained GPT-2 language model.
- ReactJS Frontend: Provides a user-friendly interface.
- Python Backend: Handles AI detection logic with robust backend processing.
- Flask:
  - Lightweight Python web framework for building web applications.
  - Provides HTTP request routing and response generation.
- Flask-CORS:
  - Extension that enables Cross-Origin Resource Sharing (CORS).
  - Allows the server to respond to requests from different origins.
- PyTorch:
  - Deep learning library used to load and run models.
  - Runs the GPT-2 model for computing perplexity.
- transformers:
  - Uses GPT2LMHeadModel and GPT2TokenizerFast to load the GPT-2 model and tokenizer.
  - GPT-2 is a pre-trained language model for text generation and analysis.
- regex:
  - Performs text processing tasks like character extraction and sentence splitting.
- OrderedDict:
  - Maintains the order of dictionary entries for structured results.
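The text-processing and result-ordering roles described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function names, the splitting pattern, and the result keys (`label`, `perplexity`, `burstiness`) are assumptions.

```python
import re
from collections import OrderedDict


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace.

    Assumption: a simple punctuation-based rule; the project's real
    pattern may differ.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def build_result(label, perplexity, burstiness):
    """Collect scores in a fixed key order (hypothetical key names)."""
    result = OrderedDict()
    result["label"] = label
    result["perplexity"] = perplexity
    result["burstiness"] = burstiness
    return result


if __name__ == "__main__":
    for sentence in split_sentences("Hello there. How are you? Fine!"):
        print(sentence)
    print(build_result("Human-written", 92.4, 17.1))
```

Plain dictionaries also preserve insertion order since Python 3.7, so OrderedDict here mainly makes the intent explicit.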
The core algorithm uses GPT-2 to calculate Perplexity and Burstiness of input text:
- Perplexity:
  - Measures how well GPT-2 predicts the text.
  - Lower perplexity suggests AI-generated text; higher perplexity suggests human-written text, consistent with the thresholds used for labeling.
  - Computed as the exponential of the average negative log-likelihood per token.
- Burstiness:
  - Measures variation in perplexity across lines.
  - Low burstiness (uniform perplexity from line to line) is more typical of AI-generated content; human writing usually varies more.
- Thresholds:
  - Perplexity < 60: Likely AI-generated.
  - Perplexity 60–80: Most likely AI, but requires more text for confirmation.
  - Perplexity > 80: Likely human-written.
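The scoring steps above can be sketched as follows. This is a rough illustration under stated assumptions, not the project's implementation: burstiness is taken as the population standard deviation of per-line perplexities (the text only says "variation"), the 60–80 band is reported as AI-generated with a flag asking for more text, and all function names are hypothetical. `gpt2_token_nlls` needs `torch` and `transformers` installed and downloads the `gpt2` weights on first use.

```python
import math
from statistics import pstdev


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_nlls:
        raise ValueError("need at least one token score")
    return math.exp(sum(token_nlls) / len(token_nlls))


def burstiness(line_perplexities):
    """Assumption: spread of per-line perplexity, measured as population std dev."""
    if len(line_perplexities) < 2:
        return 0.0
    return pstdev(line_perplexities)


def label_from_perplexity(ppl):
    """Apply the thresholds; returns (label, needs_more_text)."""
    if ppl < 60:
        return "AI-generated", False
    if ppl <= 80:
        return "AI-generated", True  # most likely AI, more text needed
    return "Human-written", False


def gpt2_token_nlls(text):
    """Per-token negative log-likelihoods under GPT-2 (lazy heavy imports)."""
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()
    # GPT-2 has a 1024-token context, so longer inputs are truncated here.
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024).input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Position i predicts token i + 1, so drop the last logit and the first id.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return nll.tolist()
```

A typical call chain would be `perplexity(gpt2_token_nlls(line))` for each line, then `burstiness` over the resulting list and `label_from_perplexity` on the overall score.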
- Provides a POST route (/) accepting JSON payloads with a text field.
- Returns a label ("AI-generated" or "Human-written") with perplexity and burstiness scores.
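A minimal sketch of how such a route could be wired, assuming a scoring callable `score_text(text)` that returns a `(perplexity, burstiness)` pair. The names `analyse_payload`, `create_app`, and `score_text`, the response keys, and the error messages are hypothetical; only the `/` POST route, the `text` field, and the two labels come from the description above. Flask and Flask-CORS are imported inside the factory so the validation logic can be exercised on its own.

```python
def analyse_payload(payload, score_text):
    """Validate the JSON payload and build a (body, status_code) pair."""
    if not isinstance(payload, dict):
        return {"error": "expected a JSON object"}, 400
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return {"error": "'text' must be a non-empty string"}, 400

    perplexity, burstiness = score_text(text)
    # Thresholds from the README: above 80 is treated as human-written.
    label = "Human-written" if perplexity > 80 else "AI-generated"
    body = {"label": label, "perplexity": perplexity, "burstiness": burstiness}
    return body, 200


def create_app(score_text):
    """Application factory; requires Flask and Flask-CORS to be installed."""
    from flask import Flask, jsonify, request
    from flask_cors import CORS

    app = Flask(__name__)
    CORS(app)  # allow the React frontend, served from another origin, to call the API

    @app.route("/", methods=["POST"])
    def analyse():
        # silent=True yields None instead of raising on malformed JSON
        payload = request.get_json(silent=True)
        body, status = analyse_payload(payload, score_text)
        return jsonify(body), status

    return app
```

Keeping validation in a plain function makes it easy to unit-test without starting a server; `create_app(my_scorer).run()` would then serve it locally.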
- Content authenticity checks (articles, blogs, essays)
- AI detection in education (preventing AI plagiarism)
- Content moderation on social media or forums
Modules used: Flask, Flask-CORS, PyTorch, transformers, regex, and OrderedDict.
Algorithms used: Perplexity (text predictability), Burstiness (variation in sentence predictability), and thresholding for labeling the text as AI or human-written.
Use case: AI vs. human text detection, content authenticity verification.
To set up the project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/ankitsharma-tech/Generative-AI-Detection.git
  cd Generative-AI-Detection
- Install frontend dependencies:
  cd frontend
  npm install
- Install backend dependencies:
  cd ../server
  pip install -r requirements.txt
- Start the backend server:
  python app.py
- Start the frontend server in a separate terminal, from the frontend directory:
  npm start
- Open your browser and go to http://localhost:3000.
- Upload a pair of texts (one human-written and one AI-generated).
- Click on the "Analyse" button to see the results.
- The dataset used in this project consists of a collection of human-written and AI-generated texts.
- Texts are analyzed to calculate metrics like Perplexity and Burstiness to determine their likely origin (AI or human).
- Perplexity: Measures how well the language model predicts the next word in a text. Lower perplexity suggests AI generation, while higher perplexity suggests human authorship.
- Burstiness: Measures the variation in perplexity across different lines of text. Lower burstiness is more typical of AI-generated text; human writing usually varies more.
Contributions are welcome! If you have suggestions for improvements or new features, please fork the repository and submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.