# Building Scalable ML Pipelines in Production
Machine learning in production is vastly different from Jupyter notebook experiments. After years of building ML systems at scale, these are the patterns I've found actually work.
## The Reality of Production ML
Most ML tutorials end where the real challenges begin. Getting a model to work locally is maybe 10% of the effort. The remaining 90% involves:
- **Data pipeline reliability** - Your model is only as good as your data
- **Feature consistency** - Training/serving skew is a silent killer
- **Model versioning** - Because you will need to roll back
- **Monitoring** - Detecting drift before it impacts users
## Architecture That Scales
The key insight is treating ML systems as software systems first. This means:
### 1. Feature Stores Are Non-Negotiable
```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# entity_df: a DataFrame of entity keys and event timestamps,
# used for point-in-time-correct joins against the offline store
# Consistent features for training and serving
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["user_features:age", "user_features:activity_score"],
).to_df()
```
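The same feature references can back the serving path, which is what keeps training and serving consistent. Reusing the `store` object above, a minimal online lookup with Feast looks roughly like this (the `user_id` entity key is an assumption about the feature definitions, not something defined in this post):

```python
# At request time, read the same features from the online store
online_features = store.get_online_features(
    features=["user_features:age", "user_features:activity_score"],
    entity_rows=[{"user_id": 1001}],
).to_dict()
```

Because both paths resolve the same feature definitions, training/serving skew becomes much harder to introduce by accident.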
### 2. Immutable Data Pipelines
Every transformation should be versioned and reproducible. Use tools like DVC or MLflow to track the following (a minimal tracking sketch follows this list):
- Raw data versions
- Preprocessing steps
- Feature engineering logic
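One way to make those three things auditable is to log the exact input version and the transformation code with every pipeline run. A minimal MLflow sketch, where the file paths, parameter values, and run name are illustrative rather than taken from any specific project:

```python
import hashlib

import mlflow

def file_sha256(path: str) -> str:
    """Fingerprint a raw data snapshot so the exact input version is recorded."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

with mlflow.start_run(run_name="feature_build"):
    mlflow.log_param("raw_data_sha256", file_sha256("data/raw/events.parquet"))
    mlflow.log_param("preprocessing", "drop_nulls,standardize_v2")
    # Keep the transformation code itself alongside the run
    mlflow.log_artifact("pipelines/feature_engineering.py")
```

DVC covers the same ground from the command line: `dvc add` pins raw data versions and `dvc repro` re-runs a declared pipeline when its inputs change.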
### 3. Shadow Deployments
Never ship directly to production. Run new models in shadow mode first:
```python
# Log the candidate's predictions without affecting users;
# the response served to the user still comes from the production model
production_prediction = production_model.predict(features)
shadow_prediction = new_model.predict(features)
log_shadow_result(shadow_prediction, production_prediction)
```
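One way to keep the shadow call off the request path is to hand the candidate's inference to a background worker, so a slow or crashing shadow model can never affect user latency. A sketch of that shape, where the model objects, helper name, and logger are placeholders rather than any particular framework's API:

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("shadow")
_executor = ThreadPoolExecutor(max_workers=2)

def predict_with_shadow(features, production_model, candidate_model):
    """Serve the production prediction; score the candidate off the request path."""
    production_prediction = production_model.predict(features)

    def _shadow() -> None:
        try:
            shadow_prediction = candidate_model.predict(features)
            logger.info("shadow=%s production=%s", shadow_prediction, production_prediction)
        except Exception:
            # A failing candidate must never affect the user-facing response
            logger.exception("shadow inference failed")

    _executor.submit(_shadow)
    return production_prediction
```

Comparing the two logged streams offline tells you whether the candidate is safe to promote before any user traffic depends on it.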
## Monitoring Is Not Optional
Set up alerts for:
- **Input drift** - Feature distributions changing
- **Output drift** - Prediction distribution shifts
- **Performance metrics** - Latency, throughput, error rates
The goal is catching issues before your users do; a simple drift check is sketched below.
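Input and output drift both come down to comparing a reference distribution against recent traffic. One simple, widely used statistic is the population stability index (PSI); here is a sketch, assuming you keep a training-time sample per feature (the helper name is mine, not a library API):

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time feature sample and recent serving values."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Clip serving values into the reference range so out-of-range traffic
    # lands in the outermost buckets instead of being dropped
    actual = np.clip(actual, edges[0], edges[-1])
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid log(0) on empty buckets
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))
```

A common heuristic treats PSI above roughly 0.1 as worth investigating and above 0.25 as significant drift; the same check applies per input feature and to the prediction distribution.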
## Conclusion
Production ML is software engineering with statistical challenges. Treat it accordingly, and you'll build systems that actually work at scale.