Game Analyzing Model Methods Attentively · Guessing Alternative Model Mechanics Analytically · Grasping Attention Mechanism Mysteries Accessibly
╭─────────────────────────────────────────────────────────╮
│                                                         │
│       ☇ GAMMA - LLM Learning & Experimentation ☇        │
│                                                         │
╰─────────────────────────────────────────────────────────╯
GAMMA is a comprehensive toolkit for exploring, comparing, and experimenting with Large Language Models (LLMs). It transforms complex AI concepts into interactive experiences.
- ☇ Interactive Game: Predict what the model will generate next and compete against AI
- ☛ Chat Interface: Simple, direct conversations with any supported model
- ☰ Tutorial Mode: Learn how LLMs work through guided lessons
- ☄ Quick Inference: Single-shot generation with performance metrics
- ☲ Model Comparison: Side-by-side analysis of different models
- ⚗ Mind Meld: Experimental multi-model collaboration system
- ⚗ Language Comparison: TypeScript vs JavaScript LLM code generation benchmarks
- ⚘ Color Library: dream.js - Material Design 3 color utilities with HCT color space
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install base requirements
pip install -r requirements.txt
# Choose your engine (install at least one):
pip install -r requirements-pytorch.txt # PyTorch (recommended)
pip install -r requirements-llamacpp.txt # llama.cpp for GGUF models
pip install -r requirements-onnx.txt # ONNX Runtime
pip install -r requirements-mlx.txt # Apple Silicon
# For language comparison benchmarks (optional)
cd src/benchmarks/dream
npm install
cd ../../..
# Unified CLI
python gamma.py game # Interactive game
python gamma.py comparison # Model comparison
python gamma.py mind-meld # Mind meld experiments
python gamma.py language-comparison # Benchmarks
# Direct entry points
python game.py # Interactive mode
python game.py --chat # Chat mode
python game.py --tutorial # Tutorial mode
python game.py --prompt "Explain quantum computing" # Quick inference
GAMMA supports models from multiple sources with automatic detection:
# GAMMA auto-detects Ollama models
ollama list
# Use directly - no configuration needed
python game.py # Interactive menu shows your Ollama models
Features:
- ☑ Auto-detection of all Ollama models
- ☑ No downloads required
- ☑ Works with either the llamacpp or ollama engine
- ☑ Deduplicates models found in multiple locations
- ☑ Shows model source (Ollama, HuggingFace, local files)
# Auto-downloaded on first use
python game.py --engine pytorch --model google/gemma-2-2b-it
# For gated models (like Gemma), login first:
huggingface-cli login
# Place GGUF files in models/ directory
python game.py --engine llamacpp --model models/my-model.gguf
# Or create symlinks to Ollama models:
ln -s ~/.ollama/models/blobs/sha256-abc123... models/qwen-coder.gguf
| Engine | Best For | Hardware | Status | When to Use |
|---|---|---|---|---|
| llamacpp | GGUF models | CPU, GPU (ROCm/CUDA) | ☑ Fully Supported | Default: quantized models, efficient inference |
| pytorch | HF Transformers | CUDA, ROCm, MPS | ☑ Fully Supported | Full-precision models, latest HF models |
| tensorflow | TF/Keras models | CUDA, CPU | ⚠ Experimental | TF-specific models or pipelines |
| jax | JAX/Flax models | TPU, CUDA | ⚠ Experimental | TPU support or JAX models |
| onnx | ONNX Runtime | CPU, CUDA, DirectML | ⚠ Experimental | Cross-platform, DirectML on Windows |
| mlx | MLX-optimized | Apple M1/M2/M3/M4 | ⚠ Experimental | Apple Silicon MLX optimizations |
Quick Guide:
- ☐ Local models (Ollama) → Use `llamacpp`
- ☁ HuggingFace models → Use `pytorch` (or `llamacpp` for GGUF)
- ♁ Apple Silicon → Use `llamacpp` (or `mlx` if you have MLX models)
- ☐ Windows without CUDA → Use `llamacpp` or `onnx`
- ⚗ TPU/specialized → Use the matching engine (`jax` for TPU, `tensorflow` for TF Serving)
Engine Selection Logic:
- Interactive menu auto-detects Ollama models → recommends llamacpp
- Falls back to PyTorch if HuggingFace is authenticated
- Shows warnings for gated models without authentication
- Displays available VRAM and memory requirements
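A minimal sketch of that priority order, assuming hypothetical helpers for the detection steps (the shipped logic lives in the interactive menu and may differ):

```python
# Hypothetical sketch of the selection order above; names are
# illustrative, not GAMMA's actual API.
def recommend_engine(ollama_models: list[str], hf_authenticated: bool) -> str:
    if ollama_models:
        # Local GGUF files from Ollama run through the llamacpp engine
        return "llamacpp"
    if hf_authenticated:
        # Authenticated HuggingFace users can pull full-precision models
        return "pytorch"
    # Otherwise default to llamacpp for CPU-friendly quantized models
    return "llamacpp"
```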
Note: Ollama is a model provider (like HuggingFace), not an engine. GAMMA uses the llamacpp
engine to run Ollama's GGUF files directly.
python game.py
Features:
- Hardware detection (GPU, VRAM, CPU)
- Auto-detects Ollama models
- Shows memory requirements before loading
- Recommends engines based on your setup
- Local vs. downloadable model indicators
Menu Options:
- Just Play - Classic game with smart defaults
- Quick Tutorial - Start learning immediately
- Quick Compare - Compare 2 small models
- Classic Game - Full configuration options
- Tutorial Mode - Customized learning experience
- Comparison Mode - Multi-model analysis
- Mind Meld Mode - Experimental collaboration
# Classic game with specific model
python game.py --engine llamacpp --model models/qwen3-coder-30b.gguf
# Chat mode
python game.py --engine ollama --model qwen3-coder:30b --chat
# Single-shot inference with performance stats
python game.py --prompt "Write a Python hello world" --steps 20
# Compare two models
python game.py --comparison \
--comparison-models \
llamacpp:models/model1.gguf \
ollama:qwen3:30b
# Tutorial with specific model
python game.py --tutorial --engine pytorch --model google/gemma-2-2b-it
# Advanced options
python game.py \
--engine llamacpp \
--model models/my-model.gguf \
--temperature 0.7 \
--top-k 40 \
--top-p 0.95 \
--steps 50 \
--show-attention \
--verbose
# Core Settings
--engine ENGINE # ollama, llamacpp, pytorch, etc.
--model MODEL # Model name or path
--steps N # Max generation steps (default: 8)
--temperature T # Sampling temperature 0.1-2.0 (default: 0.7)
--top-k K # Top-K filtering (default: 8)
--top-p P # Nucleus sampling 0.0-1.0 (default: 0.95)
# Display Options
--show-attention # Show attention heatmaps
--verbose # Detailed explanations
--num-choices N # Choices per round (default: 4)
# Game Modes
--chat # Chat mode
--tutorial # Tutorial mode
--comparison # Comparison mode
--prompt "TEXT" # Single-shot inference
# Engine-Specific
--llama-cpp-n-gpu-layers N # GPU layers for llama.cpp (-1 = all)
--llama-cpp-n-ctx N # Context size (default: 2048)
--load-in-4bit # 4-bit quantization (PyTorch)
--load-in-8bit # 8-bit quantization (PyTorch)
- Auto-Detection: Finds models in Ollama, HuggingFace cache, and local directories
- Memory Estimation: Calculates VRAM requirements before loading
- Smart Defaults: Recommends engine and models based on hardware
- Deduplication: Detects same model in multiple locations
- Source Indicators: Shows where each model comes from
- Hardware Detection: CUDA, ROCm, Metal, CPU backends
- GPU Offloading: Configurable layer distribution
- KV Cache: Efficient context management
- Quantization: 4-bit and 8-bit support (PyTorch)
- Memory Mapping: Efficient model loading
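The memory estimate above is, at heart, parameters × bytes per weight plus runtime overhead. A back-of-the-envelope version (illustrative only, not GAMMA's actual calculation):

```python
# Rough VRAM estimate; GAMMA's real calculation may weigh more factors.
def estimate_vram_gb(n_params_billion: float, bits_per_weight: int = 16,
                     overhead: float = 1.2) -> float:
    """Weights dominate: params * bytes-per-weight, plus ~20% overhead
    for KV cache, activations, and runtime buffers."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

print(estimate_vram_gb(9, 16))  # fp16 9B model: ~21.6 GB
print(estimate_vram_gb(9, 4))   # 4-bit quantized: ~5.4 GB
```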
Predict the model's next token choice. Learn about:
- Temperature effects on randomness
- Top-K and Top-P sampling
- Attention mechanisms
- Token probabilities
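For reference, these knobs are conventionally applied in order: temperature scales the logits, top-k keeps the K most likely tokens, and top-p keeps the smallest set whose cumulative probability reaches P. A minimal sketch (illustrative; each engine ships its own sampler):

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, top_k=8, top_p=0.95):
    logits = np.asarray(logits, dtype=np.float64) / temperature  # <1 sharpens, >1 flattens
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                              # softmax
    order = np.argsort(probs)[::-1][:top_k]           # keep top-k candidates
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1   # smallest set >= top_p
    order = order[:cutoff]
    kept = probs[order] / probs[order].sum()          # renormalize survivors
    return int(np.random.choice(order, p=kept))

token_id = sample_next_token(np.random.randn(32000))  # dummy 32k-token vocab
```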
Simple conversation interface with:
- Multi-turn conversations
- Context preservation
- Exit commands (`/quit`, `/exit`, `/bye`)
Interactive lessons covering:
- How LLMs work
- Tokenization
- Sampling strategies
- Attention visualization
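The attention lessons build on the standard scaled dot-product formula, weights = softmax(QKᵀ/√d): each row says how much one token attends to every other. A minimal NumPy version of the math behind the heatmaps (a teaching sketch, not GAMMA's rendering code):

```python
import numpy as np

def attention_weights(Q, K):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)      # each row sums to 1

# 5 tokens with 64-dim heads -> a 5x5 "who attends to whom" heatmap
W = attention_weights(np.random.randn(5, 64), np.random.randn(5, 64))
```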
Side-by-side model analysis:
- Compare predictions
- See probability differences
- Understand model biases
- Test prompts across models
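One simple way to put numbers on those probability differences is KL divergence plus top-k overlap between two models' next-token distributions. A hypothetical helper (comparison mode has its own displays):

```python
import numpy as np

def compare_distributions(p, q, k=8):
    """p, q: next-token probability vectors over a shared vocabulary."""
    eps = 1e-12
    kl = float(np.sum(p * np.log((p + eps) / (q + eps))))  # KL(p || q)
    top_p = set(np.argsort(p)[::-1][:k])
    top_q = set(np.argsort(q)[::-1][:k])
    overlap = len(top_p & top_q) / k                       # shared top-k picks
    return kl, overlap
```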
Multi-model collaboration featuring:
- Dynamic model swapping
- KV cache bridging
- Weighted averaging
- Agreement-based ensembling
- Custom swap strategies
# Download a model file from the HuggingFace Hub
python tools/download_model.py --repo-id <REPO_ID> --filename <FILENAME>
# Serve a model over a local API
python tools/run_api_server.py --model <MODEL> --engine <ENGINE>
gamma/
├── gamma.py # Unified CLI entry point
├── game.py # Game entry point
├── src/
│ ├── game/ # ☇ Game-specific code
│ │ ├── game_logic.py
│ │ ├── game_displays.py
│ │ └── tutorial_mode.py
│ ├── comparison/ # ☲ Model comparison tools
│ │ └── comparison_mode.py
│ ├── mind_meld/ # ⚗ Multi-model collaboration
│ │ ├── strategies/
│ │ └── advanced/
│ ├── benchmarks/ # ⚗ Benchmarking suite
│ │ ├── mind_meld_benchmark.py
│ │ └── dream/ # DREAM: TypeScript vs JavaScript benchmarks
│ │ ├── index.js
│ │ ├── tasks/ # 20+ coding tasks
│ │ ├── runner/
│ │ └── evaluator/
│ ├── color_utils/ # ⚘ Color utilities
│ │ ├── dream.js # Material Design 3 colors
│ │ ├── demo/ # MILCHICK demo
│ │ └── test/
│ ├── core/ # Core shared utilities
│ │ ├── engine_interface.py
│ │ ├── interactive_menu.py
│ │ ├── model_catalog.py
│ │ └── ...
│ └── engines/ # Model execution engines
│ ├── pytorch_engine.py # ☑ Fully implemented
│ ├── llamacpp_engine.py # ☑ Fully implemented
│ ├── ollama_engine.py # ☑ Fully implemented
│ └── ...
├── tools/ # Standalone CLI tools
├── models/ # Local model storage
├── requirements*.txt # Dependencies
└── docs/ # Documentation
# GAMMA auto-detects your Ollama models
python game.py
# Select Ollama from the menu
# Choose your model from the list
# Models are marked with ☐ (local) and show size
python game.py \
--engine llamacpp \
--model models/qwen3-coder-30b.gguf \
--chat
python game.py \
--comparison \
--comparison-models \
ollama:qwen3-coder:30b \
ollama:deepseek-r1:32b \
--prompt "Write a Python function to calculate fibonacci"
# Use quantization for large models
python game.py \
--engine pytorch \
--model google/gemma-2-9b-it \
--load-in-4bit \
--chat
# Learn about LLMs interactively
python game.py --tutorial
# Or with a specific model
python game.py --tutorial \
--engine ollama \
--model gemma3:1b-it-qat
# Check Ollama is running
ollama list
# Restart GAMMA
python game.py
# Use smaller model
python game.py --model google/gemma-2-2b-it
# Use quantization
python game.py --load-in-4bit
# Reduce context size
python game.py --llama-cpp-n-ctx 1024
# Use CPU layers
python game.py --llama-cpp-n-gpu-layers 0
# For gated models (Gemma, Llama, etc.)
huggingface-cli login
# Or set token
export HF_TOKEN=your_token_here
# Rebuild llama-cpp-python with GPU support
pip uninstall llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
# Or for ROCm (AMD)
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
Multi-model collaboration system (experimental):
python game.py \
--mind-meld \
--meld-models \
pytorch:google/gemma-2-2b-it \
pytorch:Qwen/Qwen2-1.5B-Instruct \
--meld-strategy round_robin
Swap Strategies:
- `fixed_interval`: Swap every N tokens
- `round_robin`: Rotate through models
- `pattern`: Swap on specific patterns
- `confidence`: Swap when the model is uncertain
- `random`: Random switching
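As one example, the confidence strategy boils down to a single check: hand off when the active model's top token probability drops below a threshold. A sketch (the shipped strategies live in src/mind_meld/strategies/ and may differ):

```python
# Illustrative confidence trigger; the threshold is a made-up default.
def should_swap(next_token_probs, threshold=0.5):
    """Swap models when the current model is uncertain."""
    return max(next_token_probs) < threshold
```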
Advanced Features:
- Weighted averaging of logits
- Agreement-based ensembling (ABE)
- KV cache bridging (limited support)
- Vocabulary translation
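Weighted averaging, the simplest of these, blends per-token logits before sampling. A minimal sketch assuming both models share a vocabulary (real Mind Meld also handles vocabulary translation):

```python
import numpy as np

# Illustrative blend; agreement-based ensembling would instead upweight
# tokens that both models rank highly.
def meld_logits(logits_a, logits_b, weight_a=0.5):
    return weight_a * np.asarray(logits_a) + (1.0 - weight_a) * np.asarray(logits_b)
```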
See Mind Meld Documentation for details.
# In game.py or a custom script
engine_config = {
    'llama_cpp_n_ctx': 4096,         # context window size in tokens
    'llama_cpp_n_gpu_layers': -1,    # -1 = offload all layers to GPU
    'llama_cpp_lib_verbose': False,  # silence llama.cpp's own logging
    'seed': 42                       # fixed seed for reproducible sampling
}
engine = get_engine('llamacpp', 'models/model.gguf', engine_config)
GAMMA and PAWS/REPLOID are complementary tools serving different purposes:
GAMMA:
- ☇ Educational focus - Learn how LLMs work
- ☲ Model comparison - Understand differences between models
- ⚗ Experimentation - Test sampling strategies, attention mechanisms
- ☐ Local operation - All models run locally
PAWS/REPLOID:
- ☇ Development focus - AI-assisted code generation
- ☲ Multi-agent competition - 3-5 LLMs compete with test-driven consensus
- ⚗ Production workflows - Git-backed reproducibility
- ☥ Visual review - Browser interface with diff viewer
Use GAMMA when: Learning about LLMs, testing models, exploring AI concepts.
Use PAWS/REPLOID when: Developing software, refactoring code, making production changes.
Both projects share:
- Philosophy of transparency and human oversight
- Support for local models (Ollama, HuggingFace)
- Educational value through clear explanations
- Open-source MIT license
Contributions welcome! Areas for help:
- ☇ Bug fixes and improvements
- ☐ Documentation
- ⚗ Tests
- ⚛ New game modes
- ⚙ Engine implementations
- ⛶ Benchmarking tools
See CONTRIBUTING.md for guidelines.
MIT License - See LICENSE for details.
- ☰ Full docs: docs/
- ⚠ Report issues: GitHub Issues
- ☛ Discussions: GitHub Discussions
Made by developers who believe understanding AI is the first step to using it wisely.
☇ Interactive Learning × ☲ Model Comparison × ⚗ Experimentation = GAMMA