A professional iOS application for real-time speech recognition with advanced audio processing, timing data capture, and comprehensive export capabilities. Built with Swift and SwiftUI, this app provides high-quality transcription services with professional-grade timing formats suitable for video editing and accessibility workflows.
- Real-time Speech Recognition: High-quality transcription using Apple's Speech framework with millisecond precision
- Audio Recording: Configurable audio quality settings with native hardware format detection
- Timing Data Capture: Precise timing information for each transcription segment
- Session Management: Save and manage multiple recording sessions with metadata
- Multiple Text Formats: Export as Plain Text, RTF, or Markdown
- Professional Timing Formats: Export timing data in SRT, VTT, TTML, and JSON formats
- Audio + Timing Export: Combined audio and timing data export for professional workflows
- Native Sharing: iOS share sheet integration and Files app support
- Intelligent Autoscroll: Automatically follows new text with manual override capability
- Customizable Interface: Adjustable themes (Light, Dark, High Contrast) and text sizes
- Audio Visualization: Real-time waveform display and VU meter
- Accessibility: VoiceOver support and accessibility-focused design
- Automated Build System: Comprehensive build and test automation scripts
- Quality Assurance: Unit tests and UI tests with detailed reporting
- Development Tools: Quick iteration scripts for rapid development cycles
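The VU meter listed among the features needs a level value derived from raw audio samples. The following is a minimal sketch of one common approach (RMS level converted to decibels relative to full scale), not the app's actual implementation. The function names `rmsLevel`, `decibels`, and `meterFraction`, and the -80 dB floor, are assumptions made for illustration.

```swift
import Foundation

/// Root-mean-square level of a buffer of samples in the range -1...1.
func rmsLevel(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

/// Converts a linear level to decibels relative to full scale (dBFS).
/// Silence is clamped to `minimumDb` (an assumed floor) instead of -infinity.
func decibels(_ level: Float, minimumDb: Float = -80) -> Float {
    guard level > 0 else { return minimumDb }
    let db = Float(20 * log10(Double(level)))
    return max(minimumDb, db)
}

/// Maps a dB value onto 0...1 so it can drive the length of a meter bar.
func meterFraction(_ db: Float, minimumDb: Float = -80) -> Float {
    let fraction = (db - minimumDb) / -minimumDb
    return min(1, max(0, fraction))
}
```

In a real recording path the samples would come from the audio input buffers; here they are a plain array so the arithmetic can be checked in isolation.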
Microphone Input → Native Format Detection → Audio Recording → Speech Recognition → Timing Data Capture
- SpeechRecognizer: Core speech recognition with Apple's Speech framework
- AudioRecordingManager: High-quality audio recording with configurable settings
- TimingDataManager: Precise timing data capture and session management
- ExportManager: Multiple export formats with background processing
- AudioPlaybackManager: Synchronized audio playback with text highlighting
- Session Storage: Local storage of recording sessions with timing metadata
- Export System: Professional format support for video editing workflows
- Cache Management: Efficient caching of audio files and processed data
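As a rough illustration of how a session with timing metadata could be stored locally and exported as JSON, the sketch below uses `Codable` with `JSONEncoder` and `JSONDecoder`. The types `TranscriptSegment` and `RecordingSession` and their field names are hypothetical; the project's own models may look different.

```swift
import Foundation

/// Hypothetical model of one transcribed segment; not taken from the project.
struct TranscriptSegment: Codable, Equatable {
    let text: String
    let startTime: Double  // seconds from the start of the session
    let endTime: Double    // seconds from the start of the session
}

/// Hypothetical model of a saved recording session.
struct RecordingSession: Codable, Equatable {
    let title: String
    var segments: [TranscriptSegment]

    /// End time of the latest segment, in seconds (0 for an empty session).
    /// Computed properties are not included in the encoded JSON.
    var duration: Double { segments.map(\.endTime).max() ?? 0 }
}

/// Encodes a session as human-readable JSON with stable key order.
func encodeSession(_ session: RecordingSession) throws -> Data {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(session)
}

/// Restores a session from previously encoded JSON data.
func decodeSession(_ data: Data) throws -> RecordingSession {
    try JSONDecoder().decode(RecordingSession.self, from: data)
}
```

Keeping the segment times as plain seconds makes it straightforward to derive the other timing formats from the same stored data.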
- Xcode 15.0 or later
- iOS 15.0 or later (supports 99%+ of active iOS devices)
- iPhone 6s or newer / iPad Air 2 or newer
- Clone the repository:
git clone https://github.com/SerialForBreakfast/SpeechDictation.git
- Open the project in Xcode:
open SpeechDictation.xcodeproj
Use the included automation scripts for development:
# Build only (default)
./utility/build_and_test.sh
# Build with unit tests
./utility/build_and_test.sh --enableUnitTests
# Target specific simulator
./utility/build_and_test.sh --simulator-id <UUID>
# Quick iteration for development
./utility/quick_iterate.sh
- Select your target device or simulator
- Build and run the project
- Grant microphone and speech recognition permissions when prompted
- Start Recording: Tap "Start Listening" to begin real-time transcription
- View Transcript: Watch as speech is converted to text in real-time
- Export Results: Use the export button to save in various formats
- Manage Sessions: Access previous recordings and timing data
- Text Export: Plain text, RTF, or Markdown formats
- Timing Export: SRT subtitles, VTT captions, TTML, or JSON data
- Audio Export: Combined audio and timing data for video editing
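To make the timing formats concrete, here is a minimal sketch of how timed segments could be rendered as SRT and WebVTT text. It is not the app's `ExportManager` code: `TimedSegment`, `timestamp`, `makeSRT`, and `makeVTT` are hypothetical names introduced for this example.

```swift
/// Hypothetical segment type for illustration; not taken from the project.
struct TimedSegment {
    let text: String
    let start: Double  // seconds from the start of the recording
    let end: Double    // seconds from the start of the recording
}

/// Left-pads a non-negative integer with zeros to the given width.
func padded(_ value: Int, width: Int) -> String {
    let digits = String(value)
    return String(repeating: "0", count: max(0, width - digits.count)) + digits
}

/// Formats seconds as HH:MM:SS, the given separator, then milliseconds.
func timestamp(_ seconds: Double, millisecondSeparator: String) -> String {
    let totalMs = Int((max(0, seconds) * 1000).rounded())
    let hours = totalMs / 3_600_000
    let minutes = (totalMs / 60_000) % 60
    let secs = (totalMs / 1000) % 60
    let ms = totalMs % 1000
    return "\(padded(hours, width: 2)):\(padded(minutes, width: 2)):"
        + "\(padded(secs, width: 2))\(millisecondSeparator)\(padded(ms, width: 3))"
}

/// One cue: the time range line followed by the cue text.
func cueLines(_ segment: TimedSegment, separator: String) -> String {
    let range = timestamp(segment.start, millisecondSeparator: separator)
        + " --> "
        + timestamp(segment.end, millisecondSeparator: separator)
    return range + "\n" + segment.text + "\n"
}

/// SRT: numbered cues, comma before the milliseconds.
func makeSRT(_ segments: [TimedSegment]) -> String {
    var blocks: [String] = []
    for (index, segment) in segments.enumerated() {
        blocks.append("\(index + 1)\n" + cueLines(segment, separator: ","))
    }
    return blocks.joined(separator: "\n")
}

/// WebVTT: "WEBVTT" header, period before the milliseconds.
func makeVTT(_ segments: [TimedSegment]) -> String {
    let cues = segments.map { cueLines($0, separator: ".") }
    return "WEBVTT\n\n" + cues.joined(separator: "\n")
}
```

The two formats differ mainly in the header, the cue numbering, and the millisecond separator, so a single timestamp helper can serve both.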
- Themes: Light, Dark, and High Contrast modes
- Text Size: Adjustable for better readability
- Audio Quality: Configure recording quality settings
SpeechDictation-iOS/
├── SpeechDictation/               # Main application
│   ├── Services/                  # Audio recording, playback, timing management
│   ├── Speech/                    # Speech recognition and audio processing
│   ├── UI/                        # SwiftUI views and components
│   ├── Models/                    # Data models and structures
│   └── SpeechDictationApp.swift   # App entry point
├── SpeechDictationTests/          # Unit tests
├── SpeechDictationUITests/        # UI automation tests
├── utility/                       # Build and development automation
│   ├── build_and_test.sh          # Comprehensive build automation
│   ├── quick_iterate.sh           # Fast development iteration
│   └── README.md                  # Utility documentation
└── memlog/                        # Project documentation
    ├── tasks.md                   # Comprehensive task management
    ├── changelog.md               # Project change history
    └── directory_tree.md          # Project structure documentation
- Minimum: iOS 15.0 (covers 99%+ of active devices)
- Current: Tested through iOS 18.x
- iPhone: iPhone 6s and newer
- iPad: iPad Air 2 and newer
- Audio Hardware: Native format detection ensures compatibility with all configurations
The project includes comprehensive build automation:
- Prerequisite validation: Checks Xcode, simulators, project structure
- Build automation: Clean builds with error handling
- Test execution: Unit and UI tests with detailed reporting
- Performance metrics: Build time, test counts, project size tracking
- All features must pass unit tests
- Code follows Swift concurrency best practices
- UI is accessible and follows iOS design guidelines
- Performance meets established benchmarks
- Core speech recognition with timing data
- Export and sharing system with professional formats
- Intelligent autoscroll system
- Build automation and quality assurance tools
- Text editing and correction capabilities
- Recording session management (pause/resume)
- Enhanced audio playback and review
- Advanced waveform visualization
- File management system
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Make your changes with appropriate tests
- Use the build automation scripts to validate
- Submit a pull request
This project is licensed under the MIT License. See the LICENSE file for details.
For questions, suggestions, or support:
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Professional speech recognition for iOS with timing data precision