AI-powered detection of genuine, synthesized, and replayed speech using ensemble deep learning models.
Drag & drop your audio file here, or click to browse
Our ensemble model combines the RawNet2, AASIST, and ECAPA-TDNN architectures to analyze audio at both the raw-waveform and spectral-feature levels. The system extracts over 40 acoustic features, including MFCCs, chroma, spectral contrast, and temporal patterns.
AI-Generated: Speech synthesized by TTS systems (VITS, Bark, etc.)
Replay Attacks: Recorded audio played back through speakers
Genuine: Authentic human speech recorded directly
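How the three models' outputs are merged into one of the classes above is not described here. As a rough illustration only, the sketch below assumes each model emits three logits and that the ensemble takes a weighted average of their softmax probabilities. The label strings, the `fuse` helper, and the example logits are all hypothetical.

```python
import math

# Hypothetical label order; the real system's class indices are not documented here.
LABELS = ("genuine", "ai_generated", "replay")


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(model_logits, weights=None):
    """Weighted average of per-model class probabilities (an assumed fusion rule).

    model_logits maps a model name to a list of three logits, one per label.
    weights maps a model name to a non-negative weight; equal weights by default.
    """
    names = list(model_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)

    fused = [0.0] * len(LABELS)
    for name in names:
        probs = softmax(model_logits[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight

    best = max(range(len(LABELS)), key=fused.__getitem__)
    return LABELS[best], fused


# Made-up logits for a single clip, purely for illustration.
example = {
    "rawnet2": [2.0, 0.1, -1.0],
    "aasist": [1.5, 0.3, -0.5],
    "ecapa_tdnn": [0.2, 1.0, -0.3],
}
label, probs = fuse(example)
print(label, [round(p, 3) for p in probs])
```

Averaging probabilities rather than raw logits keeps any single model with a large output scale from dominating the decision. A production system might instead learn the fusion weights or use a stacked classifier.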
WAV, MP3, FLAC, OGG, M4A
Sample rates: 8 kHz, 16 kHz, 22.05 kHz, 44.1 kHz, 48 kHz
Mono or Stereo (auto-converted to mono)
Max duration: 30 seconds
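The input limits listed above could be checked before a file reaches the models. The following is a minimal sketch, assuming the service rejects out-of-range files rather than resampling or trimming them, and that stereo is downmixed by averaging the two channels. The function names and constants are illustrative and not taken from the actual service.

```python
from pathlib import Path

# Limits copied from the list above; the names are hypothetical.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 44100, 48000}
MAX_DURATION_SECONDS = 30.0


def validate_upload(filename, sample_rate, num_frames):
    """Return a list of human-readable problems; an empty list means the file is acceptable."""
    problems = []

    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")

    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {sample_rate} Hz")
    elif num_frames / sample_rate > MAX_DURATION_SECONDS:
        # Duration is only meaningful once the sample rate is known to be valid.
        duration = num_frames / sample_rate
        problems.append(f"too long: {duration:.1f} s (max {MAX_DURATION_SECONDS:.0f} s)")

    return problems


def to_mono(frames):
    """Downmix by averaging the channels of each frame (assumed strategy)."""
    return [sum(frame) / len(frame) for frame in frames]


print(validate_upload("sample.wav", 16000, 16000 * 10))  # acceptable: empty list
print(validate_upload("sample.aac", 11025, 11025 * 5))   # two problems reported
print(to_mono([(1.0, 3.0), (0.5, 0.5)]))
```

Returning a list of problems rather than raising on the first one lets the upload form show every issue at once.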