Applied machine learning
AI Voice Emotion Detector
An independent machine learning project that classifies emotion from speech. It uses the RAVDESS dataset, MFCC feature extraction with Librosa, and a Random Forest classifier, and it produces evaluation artifacts such as classification reports and confusion matrices.
Implementation highlights
Dataset preparation and preprocessing
I structured the workflow around RAVDESS audio files, label handling, and a clean preprocessing path so the model would receive consistent inputs instead of ad hoc feature blobs.
MFCC feature extraction
Librosa was used to extract MFCC-based features from each clip. That gave the project a practical speech representation layer without turning it into a deep-learning-only exercise.
Training and evaluation
I trained a Random Forest classifier, then looked at confusion matrices and per-class reporting to understand where the model was separating emotions well and where confusion remained.
Project gallery
Screens, outputs, and working views

Training output and classification report
A training run showing the generated feature data, reported accuracy, and the class-by-class evaluation summary.

Confusion matrix visualization
A matrix view of prediction quality that made it easier to see which emotion classes were being separated well and which were not.
Overview
I built this project to work through a complete, grounded ML pipeline on speech data instead of stopping at model fitting alone.
Emotion classification from audio is interesting because it forces preprocessing, feature representation, and evaluation to matter just as much as the model choice.
The project stays intentionally interpretable: real dataset, real feature extraction, classical model, and output artifacts that make the results easy to inspect.
Problem / Goal
Speech emotion recognition is harder than a simple label lookup because emotional expression is noisy, audio quality varies from clip to clip, and similar classes can overlap in feature space.
The project needed a workflow that was understandable from raw data to final evaluation, not just a model with a headline result.
That meant treating preprocessing and feature extraction as major parts of the implementation instead of assuming the classifier could recover from weak inputs.
Approach / Architecture
I used the RAVDESS dataset as the source of labeled speech samples and organized the data so each audio clip could be tied cleanly to its emotion label.
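RAVDESS encodes each clip's metadata directly in its filename as hyphen-separated fields, with the third field holding the emotion code. A minimal sketch of tying a clip to its label might look like this (the helper name and directory layout are illustrative, not the project's actual code):

```python
# RAVDESS filenames look like "03-01-06-01-02-01-12.wav";
# the third hyphen-separated field is the emotion code.
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def label_from_filename(path: str) -> str:
    """Map a RAVDESS audio filename to its emotion label."""
    stem = path.rsplit("/", 1)[-1].removesuffix(".wav")
    emotion_code = stem.split("-")[2]
    return EMOTIONS[emotion_code]

print(label_from_filename("Actor_12/03-01-06-01-02-01-12.wav"))  # fearful
```

Parsing labels from a single, well-defined naming scheme is what keeps the label handling clean instead of relying on manually maintained mappings.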
Using Librosa, I extracted MFCC features from each sample and converted the audio into a structured representation suitable for classical ML workflows.
After organizing and normalizing the dataset, I trained a Random Forest classifier in scikit-learn to keep the project grounded in a model that is practical and interpretable.
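A minimal training-and-evaluation sketch follows. The random `X` and `y` are placeholders standing in for the MFCC feature matrix and emotion labels; the scikit-learn calls themselves are standard.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Placeholder data standing in for the MFCC feature matrix and emotion labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))        # 200 clips x 40 MFCC features
y = rng.integers(0, 8, size=200)      # 8 RAVDESS emotion classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))
```

Stratifying the split keeps the per-class proportions similar between train and test, which matters for a per-class report on eight emotion labels.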
Matplotlib and the evaluation utilities from scikit-learn made it easier to inspect the results with more than just one summary score.
Engineering details
MFCC extraction was central because it converted raw waveform data into compact features that a traditional classifier could work with effectively.
The preprocessing pipeline mattered as much as the model itself. Clean labels, consistent feature structure, and normalization were all part of getting meaningful output.
I used a Random Forest because it gave me a solid baseline and made the behavior easier to reason about than a more opaque end-to-end setup.
The confusion matrix and classification report showed which emotions were easy to separate and which were being confused with one another.
This project was valuable because it treated the entire ML workflow as the deliverable, not just the final training call.
Challenges
Audio work adds its own preprocessing complexity, and small decisions in feature extraction can change the quality of the input substantially.
Emotion labels are not perfectly separable, so the evaluation had to be framed honestly rather than oversold.
Keeping the code organized took deliberate effort, so that dataset handling, feature extraction, training, and plotting stayed easy to reason about.
What I learned
This project made the full ML pipeline feel more concrete to me, especially the importance of preprocessing and evaluation in audio work.
It also reinforced the value of interpretable outputs like confusion matrices when a model's mistakes matter as much as its successes.
More broadly, it reminded me that good ML writeups should explain the data and the workflow, not just the model name.