SignesTrad: Real-Time French Sign Language to Text Interpreter

SignesTrad is an edge AI solution that bridges communication gaps between the deaf community and hearing individuals by translating French Sign Language (LSF) to text in real time. Leveraging the STM32N6 board and computer vision, our proposed device will capture hand gestures through its integrated camera, process them using optimized neural networks, and instantly display the translated French text. This portable, low-latency solution will work independently, without requiring an internet connection.
- Data Acquisition System: We utilize the MIPI camera interface of the STM32N6 board to capture high-quality video input at 30fps, with the camera positioned to clearly view the signer's hands and upper body movements.
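To make the capture/preprocessing step concrete, the sketch below downscales an RGB565 camera frame to a square model input using nearest-neighbor sampling and expands each pixel to 8-bit RGB. The buffer sizes, function name, and the 224x224 input resolution are our assumptions for illustration, not project code.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative preprocessing sketch (names and sizes are assumptions):
 * downscale a VGA RGB565 frame to the model's input resolution with
 * nearest-neighbor sampling, expanding to 8-bit-per-channel RGB. */
#define SRC_W 640
#define SRC_H 480
#define DST_W 224   /* assumed MobileNetV3-style input size */
#define DST_H 224

void preprocess_frame(const uint16_t *src, uint8_t *dst /* DST_W*DST_H*3 */)
{
    for (int y = 0; y < DST_H; y++) {
        int sy = y * SRC_H / DST_H;                 /* nearest source row */
        for (int x = 0; x < DST_W; x++) {
            int sx = x * SRC_W / DST_W;             /* nearest source column */
            uint16_t px = src[sy * SRC_W + sx];
            uint8_t *out = &dst[(y * DST_W + x) * 3];
            out[0] = (uint8_t)(((px >> 11) & 0x1F) << 3);  /* R: 5 -> 8 bits */
            out[1] = (uint8_t)(((px >> 5)  & 0x3F) << 2);  /* G: 6 -> 8 bits */
            out[2] = (uint8_t)((px & 0x1F) << 3);          /* B: 5 -> 8 bits */
        }
    }
}
```

On the real device the source buffer would come from the camera pipeline via DMA; here it is just an array so the logic can be tested on a host.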
- AI Processing Pipeline: The core of our solution involves a two-stage deep learning approach:
  - A modified MobileNetV3 model for hand and pose detection that identifies and tracks key points on the hands, arms, and face
  - A temporal GRU (Gated Recurrent Unit) network that analyzes sequences of movements to recognize grammatical structures specific to LSF
- Optimization for Edge Deployment: We plan to employ several techniques to ensure optimal performance on the STM32N6:
  - Model quantization to 8-bit precision
  - Layer fusion to reduce memory transfers
  - Custom activation functions optimized for the Neural-ART architecture
  - Hardware-specific memory allocation to minimize data transfer bottlenecks
- User Interface: A clean, intuitive interface displays the translated text on the integrated LCD screen. The system includes:
  - Real-time text display with minimal latency (<200ms from gesture to text)
  - Confidence indicators for ambiguous interpretations
  - Simple controls for adjusting sensitivity and language preferences
  - Battery status and system diagnostics
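The confidence indicator could be as simple as a marker appended to words the model is unsure about. A minimal sketch, where the thresholds and function name are hypothetical choices for illustration:

```c
/* Hypothetical helper: map a softmax confidence score to a display marker
 * appended after the translated word. Thresholds are illustrative. */
const char *confidence_marker(float confidence)
{
    if (confidence >= 0.85f) return "";    /* confident: plain text */
    if (confidence >= 0.60f) return "?";   /* ambiguous: flag for the reader */
    return "??";                           /* low confidence: double flag */
}
```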
- The powerful STM32N6 microcontroller serves as the system's brain, coordinating all operations
- The Neural-ART NPU accelerates neural network inference by up to 30x compared to CPU-only execution
- The MIPI connector interfaces with our custom camera module for high-quality video input
- The 32MB HexaRAM provides sufficient memory for our model's activation maps and intermediate results
- The onboard LCD display presents translated text to the user
- The SD card slot stores our model weights and optional recording capabilities for system improvement
- Camera Interface Module: Will handle video capture, preprocessing, and frame management
- AI Inference Engine: Will coordinate the execution of our neural networks on the Neural-ART NPU
- Sign Language Processing: Will post-process network outputs to handle linguistic features of LSF
- User Interface Manager: Will control display output and user input
- System Management: Will handle power, connectivity, and resource allocation
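The modules above can be tied together in a simple per-frame loop. The sketch below wires capture, inference, and display behind function pointers so each module can be developed and tested independently; all names and signatures are illustrative assumptions, not the project's actual interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative top-level pipeline: each module is a function pointer so it
 * can be stubbed out in tests or swapped for a hardware implementation. */
typedef struct {
    bool (*capture_frame)(uint8_t *frame);                     /* camera module */
    int  (*run_inference)(const uint8_t *frame, float *conf);  /* AI engine */
    void (*display_text)(int sign_id, float conf);             /* UI manager */
} Pipeline;

/* Run one capture -> infer -> display cycle.
 * Returns the recognized sign id, or -1 if no frame was available. */
int pipeline_run_once(const Pipeline *p, uint8_t *frame_buf)
{
    float conf = 0.0f;
    if (!p->capture_frame(frame_buf))
        return -1;                         /* no frame: skip this cycle */
    int sign = p->run_inference(frame_buf, &conf);
    if (sign >= 0)
        p->display_text(sign, conf);       /* only show recognized signs */
    return sign;
}
```

This separation mirrors the module list above and keeps the camera, NPU, and display code independently replaceable.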
| Metric | Target | Notes |
| --- | --- | --- |
| Sign recognition accuracy | 85-90% | For isolated signs |
| Sentence comprehension | 75-80% | For simple sentences |
| Processing latency | <250 ms | From gesture to display |
| Frame rate | 12-15 FPS | Sufficient for fluid interpretation |
| Power consumption | ~1.5 W | Estimated during active use |
| Initial vocabulary size | 500-600 words | First implementation |
| Boot time | ~5 s | From power-on to ready state |
- Month 1-2: Dataset collection and model training
- Month 3: Initial algorithm implementation and optimization
- Month 4: Hardware integration and testing
- Month 5: User interface development and performance tuning
- Month 6: Final testing, validation, and documentation
- Social Impact: Our project addresses a real-world accessibility challenge faced by approximately 300,000 deaf individuals in France who use LSF as their primary means of communication.
- Technical Innovation: We push the boundaries of what's possible with edge AI on microcontrollers, demonstrating how complex computer vision and natural language processing can be optimized for embedded systems.
- Complete Utilization of STM32N6 Capabilities: Our solution leverages virtually all the key features of the STM32N6 Discovery Kit, from its Neural-ART NPU to its camera interface, memory resources, and display capabilities.
- Practical Implementation: SignesTrad is designed to be usable in everyday situations, with careful attention to user experience, battery life, and real-world performance.
- Future Potential: The project establishes a foundation for expanded capabilities and could lead to commercial applications that bring tangible benefits to the deaf community.