This project is based on a research paper titled "Learn With Martian: A Tool For Creating Assignments That Can Write And Re-Write Themselves" presented at the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023).
The paper describes Learn, an educational technology platform that uses natural language processing and AI to automatically generate and improve educational questions and assignments.
Main Capabilities:
- Automatically generates questions from course materials (textbooks, PDFs, videos, etc.)
- Uses large language models like GPT-3, Codex, and T5 for question generation
- Implements spaced repetition to optimize student learning (see the scheduling sketch after this list)
- Provides analytics on question effectiveness using Item Response Theory
- Can create rich, interactive questions with code, animations, and games
- Allows questions to be automatically re-written and improved based on student performance data
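The paper does not publish its scheduling algorithm, so as a concrete reference point for the recreation, here is a minimal SM-2-style spaced-repetition update. All names and constants are illustrative, not taken from the paper:

```python
# Minimal SM-2-style spaced-repetition update (illustrative sketch; the
# paper does not specify which scheduler Learn actually uses).
from dataclasses import dataclass

@dataclass
class CardState:
    interval_days: int = 1   # days until the next review
    repetitions: int = 0     # consecutive successful reviews
    ease: float = 2.5        # ease factor, bounded below at 1.3

def review(card: CardState, quality: int) -> CardState:
    """quality: 0 (total blackout) .. 5 (perfect recall)."""
    if quality < 3:
        # Failed recall: restart the repetition sequence
        return CardState(interval_days=1, repetitions=0, ease=card.ease)
    if card.repetitions == 0:
        interval = 1
    elif card.repetitions == 1:
        interval = 6
    else:
        interval = round(card.interval_days * card.ease)
    # Classic SM-2 ease-factor update
    ease = max(1.3, card.ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return CardState(interval_days=interval, repetitions=card.repetitions + 1, ease=ease)
```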
Results from Case Study:
- Tested at University of Pennsylvania with over 1,000 students
- Students using Learn scored 0.29 standard deviations higher on exams
- 83% of students preferred Learn over traditional reading quizzes
- Every 15 minutes of additional studying with Learn led to a 0.08σ improvement in exam scores
The tool aims to reduce the workload for instructors while improving educational outcomes through AI-powered question generation and adaptive learning techniques.
This project aims to recreate that system: a web-based educational platform that automatically generates, displays, and improves questions and assignments using NLP and machine learning techniques.
Frontend:
- Framework: Next.js 14+ with App Router
- UI Library: React 18+
- Styling: Tailwind CSS + shadcn/ui components
- State Management: Zustand or TanStack Query
- Rich Text Editor: Lexical or TipTap
- Code Editor: Monaco Editor (VS Code editor)
- Math Rendering: KaTeX
- Charts/Analytics: Recharts or Tremor
Backend:
- Framework: FastAPI (Python) or Node.js with Hono/Bun
- API Layer: GraphQL with Apollo Server or tRPC
- Database: PostgreSQL with Prisma ORM
- Vector Database: Pinecone or Weaviate (for embeddings)
- Queue System: BullMQ (Redis-based) or Temporal
- File Storage: S3-compatible (AWS S3, Cloudflare R2)
- Caching: Redis with Upstash for serverless
Infrastructure:
- Deployment: Vercel (frontend) + Railway/Fly.io (backend)
- Container Orchestration: Docker + Kubernetes or Cloud Run
- Monitoring: Sentry + Datadog/New Relic
- Analytics: PostHog or Mixpanel
```python
class LLMClients:
    def __init__(self):
        # Current frontier models (2025); the client classes below are
        # stand-ins for the respective vendor SDKs
        self.claude_client = Anthropic(api_key=CLAUDE_KEY)    # Claude 3 Opus/Sonnet
        self.openai_client = OpenAI(api_key=OPENAI_KEY)       # GPT-4o, GPT-4 Turbo
        self.google_client = GoogleAI(api_key=GOOGLE_KEY)     # Gemini Pro 1.5
        self.cohere_client = Cohere(api_key=COHERE_KEY)       # Command R+
        # Open source alternatives
        self.llama_client = Together(api_key=TOGETHER_KEY)    # Llama 3 70B
        self.mistral_client = Mistral(api_key=MISTRAL_KEY)    # Mixtral 8x7B
        # Specialized models
        self.embedding_model = OpenAIEmbeddings()             # text-embedding-3-large
        self.code_model = OpenAI()                            # GPT-4 with code interpreter
```
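The block above only instantiates clients; a minimal sketch of how multiple providers give redundancy is a try-in-order fallback. Every callable here is an illustrative stand-in for a per-provider wrapper, not a real SDK call:

```python
# Minimal multi-provider fallback: try each provider in order and return
# the first successful completion. `generate_fn` callables are hypothetical
# async wrappers around the clients in LLMClients.
from typing import Callable, Sequence

async def generate_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable]],
) -> str:
    errors = {}
    for name, generate_fn in providers:
        try:
            return await generate_fn(prompt)
        except Exception as exc:  # rate limits, timeouts, outages
            errors[name] = exc
    raise RuntimeError(f"All providers failed: {errors}")
```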
```tsx
// Frontend upload component contract with Next.js
interface MaterialUploader {
  uploadFile: (file: File) => Promise<ProcessedMaterial>
  scrapeWebsite: (url: string) => Promise<ScrapedContent>
  processDocument: (doc: Document) => Promise<ExtractedText>
}
```
```python
# Backend processing with FastAPI
from fastapi import FastAPI, UploadFile
from langchain.document_loaders import PyPDFLoader, UnstructuredAPIFileIOLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from faster_whisper import WhisperModel  # For audio/video transcription

class DocumentProcessor:
    async def process_upload(self, file: UploadFile):
        # Modern document processing pipeline; loaders expect a path on
        # disk, so persist the upload first (save_to_tmp is a hypothetical helper)
        path = await self.save_to_tmp(file)
        splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
        if file.content_type == 'application/pdf':
            pages = PyPDFLoader(path).load()
            return splitter.split_documents(pages)
        elif file.content_type in ['audio/mpeg', 'video/mp4']:
            # Use Whisper locally (or a hosted service such as AssemblyAI)
            transcriber = WhisperModel('large-v3')
            segments, _ = transcriber.transcribe(path)
            transcript = " ".join(segment.text for segment in segments)
            # Chunk for LLM consumption
            return splitter.split_text(transcript)
```
```python
from typing import List, Optional
from anthropic import AsyncAnthropic
from pydantic import BaseModel
import instructor  # For structured outputs

class ModernQuestionGenerator:
    def __init__(self):
        # Initialize with multiple models for redundancy/comparison
        # (these wrapper classes are illustrative placeholders)
        self.primary_llm = Claude3Opus()
        self.secondary_llm = GPT4Turbo()
        self.code_llm = GPT4CodeInterpreter()
        # Instructor patches the Anthropic client for structured outputs
        self.instructor_client = instructor.from_anthropic(AsyncAnthropic())

    async def generate_questions(
        self,
        material: str,
        question_type: QuestionType,   # domain types defined elsewhere
        bloom_level: BloomLevel,
        style_examples: Optional[List[Question]] = None,
    ) -> List[Question]:
        # Use structured generation with Pydantic models
        class QuestionOutput(BaseModel):
            question: str
            answer: str
            explanation: str
            difficulty: int
            concepts: List[str]

        # Generate with automatic retry and validation
        questions = await self.instructor_client.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=4096,
            system=self.get_system_prompt(),
            messages=[{"role": "user", "content": material}],
            response_model=List[QuestionOutput],
            max_retries=3,
        )
        return questions
```
```tsx
// Using Next.js 14 with Server Components
// (MonacoEditor and KaTeXRenderer must be 'use client' components)
export default async function QuestionDisplay({
  questionId
}: {
  questionId: string
}) {
  // Server-side data fetching
  const question = await getQuestion(questionId)
  return (
    <Card>
      <CardContent>
        <QuestionRenderer
          content={question.content}
          type={question.type}
        />
        {question.type === 'code' && (
          <MonacoEditor
            language={question.language}
            theme="vs-dark"
            options={{ minimap: { enabled: false } }}
          />
        )}
        {question.hasMath && (
          <KaTeXRenderer content={question.mathContent} />
        )}
      </CardContent>
    </Card>
  )
}
```
```python
class AIAnalytics:
    def __init__(self):
        self.llm = Claude3Sonnet()            # Fast model for analysis (placeholder wrapper)
        self.embeddings = OpenAIEmbeddings()
        self.vector_store = Pinecone()

    async def analyze_student_responses(
        self,
        responses: List[StudentResponse]
    ) -> AnalysisReport:
        # Use embeddings for semantic similarity
        response_embeddings = await self.embeddings.embed_documents(
            [r.answer for r in responses]
        )
        # Cluster similar responses
        clusters = self.cluster_responses(response_embeddings)
        # Generate insights using the LLM
        insights = await self.llm.analyze({
            "task": "analyze_student_misconceptions",
            "data": clusters,
            "context": "educational_assessment"
        })
        return AnalysisReport(
            common_errors=insights.errors,
            suggestions=insights.improvements,
            concept_gaps=insights.gaps
        )
```
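The paper measures question effectiveness with Item Response Theory; the analytics class above would sit alongside something like the following minimal two-parameter logistic (2PL) sketch. This is an illustration of the standard 2PL model, not the paper's exact formulation:

```python
# Minimal 2PL Item Response Theory sketch (illustrative; the paper uses
# IRT for question analytics but does not publish its exact model).
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Probability that a student of ability `theta` answers correctly.
    a: discrimination (how sharply the item separates ability levels)
    b: difficulty (ability at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Items whose p_correct barely changes with theta (low `a`) discriminate
# poorly and are natural candidates for automatic re-writing.
```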
```tsx
// Real-time updates over native WebSockets
// (with Socket.io you would use io(...) and socket.on(...) instead)
import { useEffect, useState } from 'react'

export function useRealtimeQuestions(assignmentId: string) {
  const [questions, setQuestions] = useState<Question[]>([])

  useEffect(() => {
    const ws = new WebSocket(`wss://api.learnwithmartian.com/ws/${assignmentId}`)
    // Native WebSockets expose a single message stream rather than named
    // events, so dispatch on a `type` field in the payload
    ws.addEventListener('message', (event) => {
      const msg = JSON.parse(event.data)
      if (msg.type === 'question:updated') {
        setQuestions(prev => updateQuestion(prev, msg.data))
      } else if (msg.type === 'analytics:live') {
        // Update live analytics dashboard
      }
    })
    return () => ws.close()
  }, [assignmentId])

  return questions
}
```
```python
class MultimodalQuestionGenerator:
    def __init__(self):
        self.vision_llm = GPT4Vision()    # placeholder vision wrapper
        self.audio_llm = Gemini15Pro()    # placeholder multimodal wrapper

    async def generate_from_image(
        self,
        image: bytes,
        context: str
    ) -> List[Question]:
        # Generate questions about diagrams, charts, etc.
        return await self.vision_llm.analyze_image(
            image=image,
            prompt="Generate educational questions about this image",
            context=context
        )

    async def generate_from_video(
        self,
        video_url: str
    ) -> List[Question]:
        # Extract frames and audio, generate comprehensive questions
        frames = await self.extract_key_frames(video_url)
        transcript = await self.transcribe_audio(video_url)
        return await self.audio_llm.generate_multimodal_questions(
            frames=frames,
            transcript=transcript
        )
```
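`extract_key_frames` is referenced above but never defined; a minimal OpenCV-based sampling sketch follows, assuming `opencv-python` is installed and the video is available as a local file (the helper name and sampling interval are illustrative):

```python
# Hypothetical extract_key_frames helper: sample one frame every N seconds
# and JPEG-encode it for transport to a vision model.
import cv2

def extract_key_frames(video_path: str, every_n_seconds: float = 10.0) -> list[bytes]:
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * every_n_seconds))
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            ok, buf = cv2.imencode('.jpg', frame)
            if ok:
                frames.append(buf.tobytes())
        index += 1
    cap.release()
    return frames
```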
```python
class PersonalizationEngine:
    def __init__(self):
        self.recommendation_model = TensorFlowRecommender()
        self.difficulty_adjuster = DifficultyAdjustmentModel()
        self.learning_style_detector = LearningStyleClassifier()
        self.llm = Claude3Sonnet()  # placeholder rewriting model

    async def personalize_questions(
        self,
        student: Student
    ) -> List[Question]:
        # Get the student's learning profile
        profile = await self.analyze_student_profile(student)
        # Adjust question selection and difficulty
        questions = await self.select_questions(
            difficulty=profile.optimal_difficulty,
            learning_style=profile.style,
            weak_concepts=profile.concepts_to_improve
        )
        # Rewrite questions to match the student's level
        personalized = await self.llm.rewrite_for_student(
            questions=questions,
            student_level=profile.knowledge_level,
            preferred_style=profile.communication_style
        )
        return personalized
```
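`DifficultyAdjustmentModel` above is a placeholder; one common realization is an Elo-style rating update, where each student and question carries a rating and both move after every response. All names and the K-factor below are illustrative:

```python
# Elo-style difficulty adjustment sketch (one possible implementation of
# DifficultyAdjustmentModel; constants are illustrative).
def elo_update(student_rating: float, question_rating: float,
               correct: bool, k: float = 32.0) -> tuple[float, float]:
    # Expected probability the student answers correctly
    expected = 1.0 / (1.0 + 10 ** ((question_rating - student_rating) / 400.0))
    delta = k * ((1.0 if correct else 0.0) - expected)
    # The student gains rating for beating expectations;
    # the question's difficulty rating moves in the opposite direction
    return student_rating + delta, question_rating - delta
```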
- Authentication: Clerk, Auth.js, or Supabase Auth
- API Security: Rate limiting with Upstash, API key rotation
- Data Privacy: GDPR/CCPA compliance tools
- Encryption: End-to-end encryption for sensitive data
- Audit Logging: Comprehensive activity tracking
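The list above calls for rate limiting; a minimal fixed-window sketch over a generic Redis client follows (the production stack suggests Upstash, which speaks the same Redis commands; the limits are illustrative):

```python
# Fixed-window rate limiter sketch using redis-py.
import redis

r = redis.Redis()

def allow_request(api_key: str, limit: int = 60, window_seconds: int = 60) -> bool:
    bucket = f"ratelimit:{api_key}"
    count = r.incr(bucket)
    if count == 1:
        # First hit in this window: start the expiry clock
        r.expire(bucket, window_seconds)
    return count <= limit
```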
```python
class SecurityException(Exception):
    """Raised when a prompt fails safety screening."""

class LLMSafetyWrapper:
    def __init__(self):
        self.llm = Claude3Sonnet()                    # placeholder generation client
        self.moderation = OpenAIModerationAPI()
        self.prompt_guard = PromptInjectionDetector()

    async def safe_generate(self, prompt: str) -> str:
        # Check for prompt injection before anything reaches the model
        if await self.prompt_guard.is_malicious(prompt):
            raise SecurityException("Potential prompt injection detected")
        # Generate with safety constraints
        response = await self.llm.generate(
            prompt=prompt,
            safety_settings={
                "block_harmful": True,
                "academic_only": True
            }
        )
        # Post-generation moderation
        if not await self.moderation.is_safe(response):
            return await self.regenerate_safely(prompt)
        return response
```
```tsx
// Edge caching with Vercel KV or Upstash
import { kv } from '@vercel/kv'

export const questionCache = {
  async get(key: string) {
    return await kv.get(key)
  },
  async set(key: string, value: any, ttl: number = 3600) {
    await kv.set(key, value, { ex: ttl })
  }
}
```

```tsx
// React Server Components with streaming
import { Suspense } from 'react'

export default async function QuestionList() {
  const questions = await questionCache.get('questions')
  if (!questions) {
    // Stream questions as they're generated
    return (
      <Suspense fallback={<QuestionSkeleton />}>
        <StreamingQuestions />
      </Suspense>
    )
  }
  return <Questions data={questions} />
}
```
```yaml
# docker-compose.yml for local development
version: '3.8'
services:
  frontend:
    build: ./frontend
    environment:
      - NEXT_PUBLIC_API_URL=http://localhost:8000
    ports:
      - "3000:3000"
  backend:
    build: ./backend
    environment:
      - DATABASE_URL=postgresql://...
      - REDIS_URL=redis://redis:6379
    ports:
      - "8000:8000"
  postgres:
    image: postgres:16
    environment:
      - POSTGRES_PASSWORD=postgres  # dev-only default; the image requires one
    volumes:
      - postgres_data:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
  vector_db:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
volumes:
  postgres_data:
```