Audio-based emotion recognition has many applications in human-computer interaction, mental health assessment, and customer service analytics. This paper presents a machine-learning-based, on-device system that recognizes seven emotions (i.e., anger, disgust, fear, happiness, neutrality, sadness, and surprise) from audio on low-cost embedded devices. We show how the speaker's emotional state influences acoustic features such as intensity and shimmer. Classifying emotions from audio remains challenging, however, because the same emotion can sound markedly different across speakers. Our extensive evaluation of lightweight machine learning models yields an overall F1-score of 61.2% with a response time below 50 ms and a memory footprint of 256 KB on modern embedded devices.
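To make the acoustic features named above concrete, the sketch below shows one way intensity and shimmer could be extracted using Praat via the open-source parselmouth package. This is an illustration, not the paper's actual pipeline; the pitch floor/ceiling (75/600 Hz) and the shimmer arguments are standard Praat defaults assumed here.

```python
# Illustrative sketch only -- not the paper's implementation.
# Extracts two of the acoustic features mentioned in the abstract
# (mean intensity and local shimmer) with Praat via parselmouth.
import parselmouth
from parselmouth.praat import call

def intensity_and_shimmer(wav_path: str) -> tuple[float, float]:
    snd = parselmouth.Sound(wav_path)

    # Mean intensity (dB) over the whole utterance, energy-averaged.
    intensity_db = call(snd.to_intensity(), "Get mean", 0, 0, "energy")

    # Local shimmer: average cycle-to-cycle amplitude variability.
    # Pitch floor/ceiling (75/600 Hz) and the remaining arguments are
    # the standard Praat defaults, assumed here for illustration.
    points = call(snd, "To PointProcess (periodic, cc)", 75, 600)
    shimmer = call([snd, points], "Get shimmer (local)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)
    return intensity_db, shimmer
```

Features like these would then feed a lightweight classifier; on an embedded target, the extraction itself would typically be reimplemented in C rather than Python.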