
A modern, full-stack chat application demonstrating how to integrate a React frontend with a Go backend and run local Large Language Models (LLMs) using Docker's Model Runner. This project features a comprehensive Redis-powered observability stack with real-time monitoring, analytics, and distributed tracing.

This project showcases a complete Generative AI interface with enterprise-grade observability that includes:
- React/TypeScript frontend with a responsive chat UI
- Go backend server for API handling
- Integration with Docker's Model Runner to run Llama 3.2 locally
- Redis Stack with TimeSeries for data persistence and analytics
- Comprehensive observability with metrics, logging, and tracing
- NEW: Redis-powered analytics with real-time performance monitoring
- Enhanced Docker Compose setup with full observability stack
- Interactive chat interface with message history
- Real-time streaming responses (tokens appear as they're generated)
- Light/dark mode support based on user preference
- Dockerized deployment for easy setup and portability
- Run AI models locally without cloud API dependencies
- Cross-origin resource sharing (CORS) enabled
- Integration testing using Testcontainers
- Redis-powered metrics and performance monitoring
- Structured logging with zerolog
- Distributed tracing with OpenTelemetry & Jaeger
- Grafana dashboards for visualization
- Advanced llama.cpp performance metrics
- Redis Stack with TimeSeries, Search, and JSON support
- Redis Exporter for Prometheus metrics integration
- Token Analytics Service for usage tracking
- Production-ready health checks and service dependencies
- Auto-configured Grafana with Prometheus and Redis datasources
The application now consists of a comprehensive observability stack:

```
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│    Frontend     │ >>>  │     Backend     │ >>>  │  Model Runner   │
│   (React/TS)    │      │      (Go)       │      │   (Llama 3.2)   │
└─────────────────┘      └─────────────────┘      └─────────────────┘
      :3000                    :8080                    :12434
                                 │                        │
┌─────────────────┐      ┌──────────────┐        ┌─────────────────┐
│     Grafana     │ <<<  │  Prometheus  │        │     Jaeger      │
│   Dashboards    │      │    Metrics   │        │     Tracing     │
└─────────────────┘      └──────────────┘        └─────────────────┘
      :3001                   :9091                    :16686

┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│   Redis Stack   │      │  Redis Exporter │      │ Token Analytics │
│   DB + Insight  │      │  (Prometheus)   │      │     Service     │
└─────────────────┘      └─────────────────┘      └─────────────────┘
   :6379, :8001                :9121                    :8082

┌─────────────────┐
│ Redis TimeSeries│
│     Service     │
└─────────────────┘
      :8085
```
Prerequisites:

- Docker and Docker Compose
- Git
- Go 1.19 or higher (for local development)
- Node.js and npm (for frontend development)
Before starting, pull the required model:
```bash
docker model pull ai/llama3.2:1B-Q8_0
```
Start the complete AIWatch observability stack:
```bash
# Clone the repository
git clone https://github.com/collabnix/aiwatch.git
cd aiwatch

# Start the complete stack (builds and runs all services)
docker-compose up -d --build
```
After deployment, access these services:
| Service | URL | Credentials | Purpose |
|---------|-----|-------------|---------|
| AIWatch Frontend | http://localhost:3000 | - | Main chat interface |
| Grafana | http://localhost:3001 | admin/admin | Monitoring dashboards |
| Redis Insight | http://localhost:8001 | - | Redis database GUI |
| Prometheus | http://localhost:9091 | - | Metrics collection |
| Jaeger | http://localhost:16686 | - | Distributed tracing |
| Token Analytics | http://localhost:8082 | - | Usage analytics API |
| TimeSeries API | http://localhost:8085 | - | Redis TimeSeries service |
After deployment, verify the observability stack is working:
1. Check Grafana Connection:
   - Visit http://localhost:3001
   - Log in with admin/admin
   - Go to Configuration > Data Sources
   - Verify the Prometheus datasource shows "✅ Data source is working"
   - Verify the Redis datasource is configured

2. Check Prometheus Targets:
   - Visit http://localhost:9091/targets
   - All targets should show State: UP:
     - prometheus:9090 (Prometheus itself)
     - redis-exporter:9121 (Redis metrics)
     - backend:9090 (Backend metrics)
     - token-analytics:8082 (Analytics metrics)

3. View the Pre-built Dashboard:
   - In Grafana, go to Dashboards
   - Open "AIWatch Redis Monitoring"
   - You should see Redis metrics: Memory Usage, Connected Clients, Commands/sec
Each service in the Redis-powered stack plays a specific role:

- Redis Database (Port 6379)
  - Primary data store for chat history and session management
  - Redis TimeSeries for metrics storage
  - Redis JSON for complex data structures
  - Redis Search for full-text capabilities

- Redis Insight (Port 8001)
  - Web-based Redis GUI for database inspection
  - Real-time monitoring of Redis performance
  - Key-value browser and query interface

- Redis Exporter (Port 9121)
  - Exports Redis metrics to Prometheus
  - Monitors memory usage, command statistics, and connection counts
  - Integrates with alerting systems

- Token Analytics Service (Port 8082)
  - Tracks token usage patterns and costs
  - API endpoint for analytics queries
  - Integrates with the frontend metrics display

- Redis TimeSeries Service (Port 8085)
  - Dedicated API for time-series data operations
  - Historical performance data storage
  - Real-time metrics aggregation
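As an illustration of how such metrics can flow through Redis TimeSeries, here is a minimal Go sketch using the go-redis client with raw TS.ADD/TS.RANGE commands; the key name `aiwatch:tokens:output` is hypothetical, not taken from this repository:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Record a sample under a hypothetical key; "*" lets Redis assign the timestamp.
	if err := rdb.Do(ctx, "TS.ADD", "aiwatch:tokens:output", "*", 42).Err(); err != nil {
		panic(err)
	}

	// Read back the last hour of samples ("+" means the latest timestamp).
	from := time.Now().Add(-time.Hour).UnixMilli()
	res, err := rdb.Do(ctx, "TS.RANGE", "aiwatch:tokens:output", from, "+").Result()
	if err != nil {
		panic(err)
	}
	fmt.Println(res)
}
```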
Together, these services provide:

- Real-time Redis Metrics: Memory usage, commands/sec, connections
- Token Usage Analytics: Input/output tokens, cost tracking, usage patterns
- Performance Monitoring: Response times, throughput, error rates
- Historical Data: Time-series storage of all metrics for trend analysis
- Grafana Integration: Pre-configured dashboards for Redis monitoring
- Auto-configured Datasources: Prometheus and Redis datasources automatically set up
The frontend is built with React, TypeScript, and Vite:
```bash
cd frontend
npm install
npm run dev
```
This will start the development server at http://localhost:3000.
The Go backend can be run directly:
```bash
go mod download
go run main.go
```
Make sure to set the required environment variables from `backend.env`:

- `BASE_URL`: URL for the model runner
- `MODEL`: Model identifier to use
- `API_KEY`: API key for authentication (defaults to "ollama")
- `REDIS_ADDR`: Redis connection address (redis:6379)
- `LOG_LEVEL`: Logging level (debug, info, warn, error)
- `LOG_PRETTY`: Whether to output pretty-printed logs
- `TRACING_ENABLED`: Enable OpenTelemetry tracing
- `OTLP_ENDPOINT`: OpenTelemetry collector endpoint
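For reference, a `backend.env` might look like the following; the values are illustrative (the model runner URL and OTLP endpoint in particular are assumptions to check against your own setup):

```
# Illustrative values, adjust to your environment
BASE_URL=http://localhost:12434/engines/v1/   # Model Runner endpoint (port from the diagram above)
MODEL=ai/llama3.2:1B-Q8_0
API_KEY=ollama
REDIS_ADDR=redis:6379
LOG_LEVEL=info
LOG_PRETTY=true
TRACING_ENABLED=true
OTLP_ENDPOINT=jaeger:4318   # OTLP/HTTP collector endpoint (assumed)
```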
End to end, a chat request flows like this:

- The frontend sends chat messages to the backend API
- The backend formats the messages and sends them to the Model Runner
- Chat history and session data are stored in Redis
- The LLM processes the input and generates a response
- The backend streams the tokens back to the frontend as they're generated
- Token analytics are collected and stored in Redis TimeSeries
- Redis metrics are exported to Prometheus for monitoring
- Observability components collect metrics, logs, and traces throughout the process
- Grafana dashboards provide real-time visualization of system performance
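As a minimal, self-contained sketch of the streaming step (not the repository's actual handler), here is how a Go backend can push tokens to the browser with server-sent events:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// streamChat pushes tokens to the client as they are "generated".
// In the real backend the tokens would come from the Model Runner's
// streaming response; here a static slice stands in for them.
func streamChat(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	w.Header().Set("Access-Control-Allow-Origin", "*") // CORS, as in the feature list

	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}

	for _, token := range []string{"Hello", ", ", "world", "!"} {
		fmt.Fprintf(w, "data: %s\n\n", token)
		flusher.Flush() // deliver each token to the client immediately
		time.Sleep(100 * time.Millisecond)
	}
}

func main() {
	http.HandleFunc("/chat/stream", streamChat)
	http.ListenAndServe(":8080", nil)
}
```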
Project structure:

```
├── compose.yaml            # Complete observability stack deployment
├── backend.env             # Backend environment variables
├── main.go                 # Go backend server
├── frontend/               # React frontend application
│   └── src/                # Source code
│       ├── components/     # React components
│       ├── App.tsx         # Main application component
│       └── ...
├── pkg/                    # Go packages
│   ├── logger/             # Structured logging
│   ├── metrics/            # Prometheus metrics
│   ├── middleware/         # HTTP middleware
│   ├── tracing/            # OpenTelemetry tracing
│   └── health/             # Health check endpoints
├── prometheus/             # Prometheus configuration
│   └── prometheus.yml      # Scraping configuration
├── grafana/                # Grafana configuration
│   ├── provisioning/       # Auto-configuration
│   │   ├── datasources/    # Prometheus & Redis datasources
│   │   └── dashboards/     # Dashboard provisioning
│   └── dashboards/         # Pre-built dashboard JSON files
├── redis/                  # Redis configuration
│   └── redis.conf          # Redis server configuration
├── observability/          # Observability documentation
└── ...
```
The application includes detailed llama.cpp metrics displayed directly in the UI:
- Tokens per Second: Real-time generation speed
- Context Window Size: Maximum tokens the model can process
- Prompt Evaluation Time: Time spent processing the input prompt
- Memory per Token: Memory usage efficiency
- Thread Utilization: Number of threads used for inference
- Batch Size: Inference batch size
These metrics help in understanding the performance characteristics of llama.cpp models and can be used to optimize configurations.
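For example, tokens per second is simply generated tokens divided by wall-clock generation time. A hedged Go sketch of how such display metrics could be derived (the type and field names are illustrative, not the repository's actual code):

```go
package metrics

import "time"

// llamaStats derives UI metrics from raw counters; field names are
// illustrative, not the repository's actual types.
type llamaStats struct {
	TokensGenerated int
	GenDuration     time.Duration
	MemoryUsedBytes int64
}

// TokensPerSecond = generated tokens / wall-clock generation time.
func (s llamaStats) TokensPerSecond() float64 {
	if s.GenDuration <= 0 {
		return 0
	}
	return float64(s.TokensGenerated) / s.GenDuration.Seconds()
}

// MemoryPerToken = bytes used / tokens generated.
func (s llamaStats) MemoryPerToken() float64 {
	if s.TokensGenerated == 0 {
		return 0
	}
	return float64(s.MemoryUsedBytes) / float64(s.TokensGenerated)
}
```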
The project includes comprehensive observability features:

Metrics:
- Model performance (latency, time to first token)
- Token usage (input and output counts)
- Request rates and error rates
- Active request monitoring
- Redis performance metrics (memory, commands, connections)
- Token analytics with cost tracking
- llama.cpp-specific performance metrics
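A minimal sketch of registering such metrics with the Prometheus Go client (the metric names here are hypothetical, not the backend's actual ones):

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical metric names for illustration.
var (
	// End-to-end chat request latency, labeled by model.
	RequestLatency = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name: "aiwatch_request_duration_seconds",
		Help: "End-to-end chat request latency.",
	}, []string{"model"})

	// Running token counts, labeled "input" or "output".
	TokensTotal = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "aiwatch_tokens_total",
		Help: "Input and output tokens processed.",
	}, []string{"direction"})
)
```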
Logging:
- Structured JSON logs with zerolog
- Log levels (debug, info, warn, error, fatal)
- Request logging middleware
- Error tracking
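A minimal zerolog example showing the structured style described above (the field names are illustrative):

```go
package main

import (
	"os"

	"github.com/rs/zerolog"
	"github.com/rs/zerolog/log"
)

func main() {
	zerolog.SetGlobalLevel(zerolog.InfoLevel)

	// Pretty console output for development; JSON is zerolog's default otherwise.
	log.Logger = log.Output(zerolog.ConsoleWriter{Out: os.Stderr})

	// Structured fields travel with the message as key-value pairs.
	log.Info().
		Str("model", "ai/llama3.2:1B-Q8_0").
		Int("output_tokens", 128).
		Msg("chat completion finished")
}
```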
Tracing:
- Request flow tracing with OpenTelemetry
- Integration with Jaeger for visualization
- Span context propagation
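A minimal OpenTelemetry sketch of wrapping a unit of work in a span (tracer and span names are illustrative; it assumes a TracerProvider is configured elsewhere, otherwise the calls are no-ops):

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// handleChat traces one chat completion; pass ctx downstream so child
// spans (e.g. the Model Runner call) join the same trace.
func handleChat(ctx context.Context, prompt string) {
	tracer := otel.Tracer("aiwatch-backend")
	ctx, span := tracer.Start(ctx, "chat.completion")
	defer span.End()

	span.SetAttributes(attribute.Int("prompt.chars", len(prompt)))
	_ = ctx // hand ctx to downstream calls for context propagation
}
```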
For more information, see the Observability Documentation in the `observability/` directory.
The Redis setup includes:
- Persistence: RDB and AOF enabled for data durability
- Memory Optimization: Configured for optimal performance
- Security: Protected mode disabled for development (configure for production)
- TimeSeries: Enabled for metrics storage
- Networking: Bridge network for service communication
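A `redis.conf` along these lines would match that description; the values are examples to adapt, not the repository's exact file:

```
# Durability: snapshotting plus append-only log
save 900 1
appendonly yes
appendfsync everysec

# Development only: enable auth/protected mode in production
protected-mode no

# Keep memory bounded (example values)
maxmemory 512mb
maxmemory-policy allkeys-lru
```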
All services include:
- Health Checks: Automated service health monitoring
- Restart Policies: Automatic restart on failure
- Resource Limits: Memory and CPU constraints
- Logging: Centralized log collection
Auto-configuration covers:

- Grafana Datasources: Automatically configured Prometheus and Redis connections
- Dashboard Provisioning: Pre-built Redis monitoring dashboard
- Prometheus Targets: All services automatically discovered and monitored
You can customize the application by:

- Changing the model in `backend.env` to use a different LLM
- Modifying the frontend components for a different UI experience
- Extending the backend API with additional functionality
- Customizing the Grafana dashboards for different metrics
- Adjusting llama.cpp parameters for performance optimization
- Configuring Redis for different persistence and performance requirements
- Adding custom analytics using the Token Analytics Service API
- Creating custom dashboards in Grafana for specific monitoring needs
- Adding new datasources in `grafana/provisioning/datasources/`
The project includes integration tests using Testcontainers:
```bash
cd tests
go test -v
```
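The pattern these tests follow looks roughly like this sketch, assuming the testcontainers-go library (this is not the repository's actual test code): spin up a throwaway Redis container and point the code under test at it.

```go
package tests

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func TestWithRedis(t *testing.T) {
	ctx := context.Background()

	// Start a disposable Redis Stack container for the duration of the test.
	redisC, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "redis/redis-stack:latest",
			ExposedPorts: []string{"6379/tcp"},
			WaitingFor:   wait.ForListeningPort("6379/tcp"),
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}
	defer redisC.Terminate(ctx)

	// host:port of the mapped Redis port, usable as REDIS_ADDR for the backend under test.
	endpoint, err := redisC.Endpoint(ctx, "")
	if err != nil {
		t.Fatal(err)
	}
	t.Logf("redis available at %s", endpoint)
}
```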
Common issues:

- Model not loading: Ensure you've pulled the model with `docker model pull`
- Connection errors: Verify Docker network settings and that Model Runner is running
- Streaming issues: Check CORS settings in the backend code
- Metrics not showing: Verify that Prometheus can reach the backend metrics endpoint
- Redis connection failed: Check Redis container status and network connectivity
- llama.cpp metrics missing: Confirm that your model is indeed a llama.cpp model
- Grafana dashboards empty: Ensure Prometheus is collecting metrics and data source is configured correctly
- Redis Insight not accessible: Check if port 8001 is available and Redis container is running
- Token analytics not working: Verify Redis TimeSeries module is loaded and service dependencies are met
- Performance degradation: Monitor Redis memory usage and consider adjusting configuration
- Data not persisting: Check Redis volume mounts and persistence configuration
If Grafana dashboards show "No data":
1. Check Datasource Configuration:

   ```bash
   # Verify Prometheus is accessible from the Grafana container
   docker exec aiwatch-grafana wget -qO- http://prometheus:9090/api/v1/query?query=up
   ```

2. Check Prometheus Targets:

   ```bash
   # View Prometheus targets status
   curl http://localhost:9091/api/v1/targets
   ```

3. Restart the Stack (if needed):

   ```bash
   docker-compose down
   docker-compose up -d --build
   ```
A common cause is Docker networking: services inside the Compose network must address each other by service name (e.g. `prometheus:9090`) rather than `localhost:9090`. This stack handles that by:

- ✅ Mounting the `prometheus.yml` configuration file properly
- ✅ Using correct service names in Prometheus targets
- ✅ Auto-configuring Grafana datasources with proper internal URLs
- ✅ Adding a pre-built Redis monitoring dashboard
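The resulting `prometheus.yml` scrape configuration looks roughly like this, using the service names from the targets listed earlier (a sketch, not the exact file):

```yaml
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["prometheus:9090"]
  - job_name: redis
    static_configs:
      - targets: ["redis-exporter:9121"]
  - job_name: backend
    static_configs:
      - targets: ["backend:9090"]
  - job_name: token-analytics
    static_configs:
      - targets: ["token-analytics:8082"]
```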
Monitor service health using:
```bash
# Check all container status
docker-compose ps

# View specific service logs
docker-compose logs redis
docker-compose logs grafana
docker-compose logs prometheus
docker-compose logs token-analytics
```
For production Redis deployments:

- Memory Management: Configure `maxmemory` and eviction policies
- Persistence: Balance RDB and AOF based on your use case
- Networking: Use Redis clustering for high availability
- Monitoring: Set up alerts for memory usage and connection limits
For llama.cpp tuning:

- Thread Configuration: Optimize thread count based on CPU cores
- Memory Settings: Configure the context window based on available RAM
- Batch Processing: Adjust batch size for optimal throughput
If upgrading from a previous version:

- Backup existing data (if any)
- Stop current services: `docker-compose down`
- Use the new compose file: `docker-compose up -d --build`
- Verify all services: check health endpoints and Grafana dashboards
- Import existing data into Redis if needed
This project is licensed under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Docker Model Runner team for local LLM capabilities
- Redis Stack for comprehensive data management
- Grafana and Prometheus communities for observability tools
- OpenTelemetry project for distributed tracing standards