
1.4 Classical ML Engineering: Algorithms in Production

The Meta-Narrative

Classical models such as Random Forests, XGBoost, and SVMs are not relics. On tabular data, gradient-boosted trees often match or outperform deep learning, and they remain a dominant choice on Kaggle leaderboards and in fraud detection, credit scoring, and recommender systems. This chapter goes beyond the theory (covered in 1.2) to focus on the engineering of classical ML: hyperparameter tuning strategies, interpretability, handling imbalanced data, and deployment patterns.


Why Default Hyperparameters Are Wrong

Every dataset has a different optimal configuration. The question is: how do you find it efficiently?

Grid search: an exhaustive search over a predefined parameter grid. Simple, but the cost grows exponentially with the number of parameters: \(k\) parameters with \(n\) values each require \(n^k\) evaluations.

Use when: Small search space, cheap model training.
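
To make the \(n^k\) growth concrete, here is a minimal sketch that counts the configurations in a small grid. The search space below is hypothetical (three parameters, four values each), and the 5-fold cross-validation multiplier is an assumption chosen for illustration.

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical search space: k = 3 hyperparameters, n = 4 candidate values each.
param_grid = {
    "max_depth": [2, 4, 6, 8],
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "subsample": [0.6, 0.7, 0.8, 1.0],
}

grid = ParameterGrid(param_grid)

# Number of distinct configurations: 4 ** 3 = 64.
n_configs = len(grid)

# With 5-fold cross-validation (an assumption), every configuration is fit 5 times.
n_fits = n_configs * 5

print(f"{n_configs} configurations, {n_fits} model fits")
```

Adding a fourth parameter with four values would multiply both numbers by four, which is why grid search is only practical for small spaces and cheap models.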

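Random search: Bergstra & Bengio (2012) showed that random search is more efficient than grid search when only a few hyperparameters really matter. A common rule of thumb: with 60 random trials, there is about a 95% chance that at least one trial lands in the best 5% of the search space, since \(1 - 0.95^{60} \approx 0.95\).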

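Why it works: The objective usually has low effective dimensionality, meaning only a few hyperparameters matter much for a given dataset. A grid keeps re-testing the same few values of those important hyperparameters while it varies the irrelevant ones, whereas random search tries a new value of every hyperparameter in each trial.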
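
The two arguments for random search can be checked with a few lines of arithmetic. This is a sketch under simple assumptions: trials are independent and uniform over the space, and the 3 x 3 grid and the `learning_rate` range are hypothetical values chosen for illustration.

```python
import random

def prob_hit_top_fraction(n_trials, fraction=0.05):
    """Probability that at least one of n independent uniform trials
    lands in the best `fraction` of the search space."""
    return 1.0 - (1.0 - fraction) ** n_trials

p60 = prob_hit_top_fraction(60)
print(f"P(at least one of 60 trials in the top 5%) = {p60:.3f}")

# Same budget of 9 evaluations, two strategies, one important hyperparameter.
# A 3 x 3 grid over (learning_rate, max_depth) only ever tries 3 learning rates.
grid_learning_rates = {lr for lr in (0.01, 0.1, 0.3) for _depth in (2, 4, 8)}

# Random search draws a fresh learning rate in every trial.
rng = random.Random(0)
random_learning_rates = {rng.uniform(0.01, 0.3) for _ in range(9)}

print(len(grid_learning_rates), "distinct learning rates with the grid")
print(len(random_learning_rates), "distinct learning rates with random search")
```

If `learning_rate` is the hyperparameter that matters, the random strategy has explored it three times more densely for the same cost.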

Bayesian optimization: models the objective function \(f(\theta)\) with a probabilistic surrogate, classically a Gaussian Process, and uses an acquisition function \(\alpha(\theta)\) to decide where to sample next:

\[ \theta_{next} = \arg\max_\theta \; \alpha(\theta) = \arg\max_\theta \; \mathbb{E}[I(\theta)] \]

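where \(I(\theta) = \max(0, f(\theta) - f(\theta^*))\) is the improvement over the best configuration observed so far, \(\theta^*\). Its expectation under the surrogate's posterior, \(\mathbb{E}[I(\theta)]\), is the Expected Improvement acquisition function.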

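Tools: Optuna, Hyperopt, SMAC. Note that Optuna and Hyperopt are built around a Tree-structured Parzen Estimator (TPE) surrogate and SMAC typically uses a random forest rather than a Gaussian Process; the sample, evaluate, update loop is the same.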
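
When the surrogate's prediction at a point is Gaussian with mean \(\mu\) and standard deviation \(\sigma\), the expected improvement has a well-known closed form. The sketch below implements it for a maximization problem using only the standard library; the function name and the two example candidates are illustrative assumptions, not part of any library's API.

```python
import math

def expected_improvement(mu, sigma, best_so_far):
    """Closed-form Expected Improvement for maximization, assuming the
    surrogate predicts f(theta) ~ Normal(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(0.0, mu - best_so_far)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best_so_far) * cdf + sigma * pdf

best = 0.80  # best cross-validation score observed so far (hypothetical)

# Candidate A: predicted slightly better than the best, with little uncertainty.
ei_a = expected_improvement(mu=0.82, sigma=0.01, best_so_far=best)

# Candidate B: predicted slightly worse, but the surrogate is very unsure.
ei_b = expected_improvement(mu=0.78, sigma=0.10, best_so_far=best)

print(f"EI(A) = {ei_a:.4f}, EI(B) = {ei_b:.4f}")
```

In this example candidate B scores higher even though its predicted mean is lower, which illustrates how Expected Improvement trades off exploitation against exploration.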

graph LR
    A["Define Search Space"] --> B["Sample Configuration"]
    B --> C["Train + Evaluate"]
    C --> D["Update Surrogate Model"]
    D --> E{"Budget Exhausted?"}
    E --> |"No"| B
    E --> |"Yes"| F["Return Best Config"]
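
The loop in the diagram can be sketched end to end on a toy problem. Everything specific here is an assumption for illustration: the one-dimensional `objective` stands in for "train + evaluate", the candidate grid is the search space, a scikit-learn Gaussian Process is the surrogate, and Expected Improvement picks the next point. Production tools such as Optuna handle these steps internally.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Toy stand-in for "train + evaluate": a score to maximize, peaking at x = 0.3.
    return -(x - 0.3) ** 2

rng = np.random.default_rng(0)

# Define search space: a dense grid of candidate values in [0, 1].
candidates = np.linspace(0.0, 1.0, 201).reshape(-1, 1)

# Sample a few initial configurations at random and evaluate them.
X_obs = rng.uniform(0.0, 1.0, size=(3, 1))
y_obs = objective(X_obs).ravel()
initial_best = y_obs.max()

budget = 12
for _ in range(budget):
    # Update surrogate model on everything observed so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp.fit(X_obs, y_obs)

    # Score every candidate with Expected Improvement (maximization form).
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero at observed points
    z = (mu - y_obs.max()) / sigma
    ei = (mu - y_obs.max()) * norm.cdf(z) + sigma * norm.pdf(z)

    # Sample the next configuration, evaluate it, and record the result.
    x_next = candidates[np.argmax(ei)].reshape(1, 1)
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next).ravel())

# Budget exhausted: return the best configuration seen.
best_x = X_obs[np.argmax(y_obs), 0]
print(f"best x = {best_x:.3f}, best score = {y_obs.max():.5f}")
```

The surrogate is refit from scratch on each iteration, which is fine for a handful of points but is one reason Gaussian Process methods become expensive as the number of trials grows.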

Handling Imbalanced Data

When one class is 1% of the dataset (fraud, rare diseases):

| Technique | Mechanism | When to Use |
| --- | --- | --- |
| Class Weights | Increase loss for minority class | Built into most classifiers |
| SMOTE | Synthetic oversampling of minority | Small datasets |
| Undersampling | Reduce majority class | Very large datasets |
| Focal Loss | Down-weight easy examples: \(L = -\alpha(1-p_t)^\gamma \log(p_t)\) | Deep learning |
| Threshold Tuning | Adjust decision threshold post-training | Any classifier |
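
Two of the techniques above, class weights and threshold tuning, need nothing beyond scikit-learn. The sketch below is illustrative only: the synthetic dataset, the 5% positive rate, the logistic regression model, and the choice of F1 as the threshold criterion are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 5% positives (an assumption for the demo).
X, y = make_classification(n_samples=5000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Class weights: "balanced" scales each class's loss inversely to its frequency.
plain = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_fit, y_fit)

recall_plain = recall_score(y_val, plain.predict(X_val))
recall_weighted = recall_score(y_val, weighted.predict(X_val))
print(f"minority recall: plain={recall_plain:.2f}, weighted={recall_weighted:.2f}")

# Threshold tuning: keep the unweighted model, but replace the default 0.5
# cut-off with the threshold that maximizes F1 on the validation split.
proba = plain.predict_proba(X_val)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_val, proba)
f1 = 2 * precision[:-1] * recall[:-1] / np.maximum(precision[:-1] + recall[:-1], 1e-12)
best_threshold = thresholds[np.argmax(f1)]

y_pred_tuned = (proba >= best_threshold).astype(int)
print(f"tuned threshold = {best_threshold:.3f}, "
      f"recall = {recall_score(y_val, y_pred_tuned):.2f}")
```

The threshold is chosen on a validation split here; reporting final performance would need a separate test set that played no part in choosing it.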

The Deep Insight: Don't Optimize Accuracy

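On imbalanced data with a 1% positive class, a model that always predicts the majority class achieves 99% "accuracy" while detecting no positives at all. When the positive class is rare, prefer PR-AUC (area under the Precision-Recall curve) over ROC-AUC: the false positive rate used by ROC-AUC is diluted by the large number of true negatives, so ROC-AUC can look strong even when precision is poor.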
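
The accuracy trap is easy to reproduce. The following sketch uses a synthetic label vector with exactly 1% positives and compares a majority-class predictor with purely random scores; the data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             recall_score, roc_auc_score)

rng = np.random.default_rng(0)

# 10,000 labels with exactly 1% positives.
n = 10_000
y_true = np.zeros(n, dtype=int)
y_true[:100] = 1

# A "model" that always predicts the majority (negative) class.
y_majority = np.zeros(n, dtype=int)
acc = accuracy_score(y_true, y_majority)                  # 9900 / 10000
rec = recall_score(y_true, y_majority, zero_division=0)   # finds no positives

# An uninformative scorer: random scores unrelated to the labels.
scores = rng.random(n)
pr_auc = average_precision_score(y_true, scores)  # baseline is near the positive rate
roc_auc = roc_auc_score(y_true, scores)           # baseline is near 0.5

print(f"majority class: accuracy={acc:.2f}, recall={rec:.2f}")
print(f"random scores:  PR-AUC={pr_auc:.3f}, ROC-AUC={roc_auc:.3f}")
```

The chance-level PR-AUC sits close to the positive rate rather than 0.5, so gains on the rare class show up clearly instead of being compressed near the top of the scale. `average_precision_score` is one common estimator of PR-AUC.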

Model Interpretability

| Method | Type | Scope | Strengths |
| --- | --- | --- | --- |
| Feature Importance | Model-specific | Global | Built into trees; fast |
| SHAP | Model-agnostic | Local + Global | Theoretically grounded (Shapley values) |
| LIME | Model-agnostic | Local | Works on any black-box model |
| PDP / ICE | Model-agnostic | Global | Visual, intuitive |

🚀 Lab: Complete ML Pipeline with Optuna Tuning

import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

# Binary classification dataset with numeric features.
X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Each trial samples one configuration from the search space.
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'max_depth': trial.suggest_int('max_depth', 2, 8),
        # Log scale: learning rates are explored evenly across orders of magnitude.
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'min_samples_split': trial.suggest_int('min_samples_split', 2, 20),
        'subsample': trial.suggest_float('subsample', 0.6, 1.0),
    }
    model = GradientBoostingClassifier(**params, random_state=42)
    # Mean 5-fold cross-validated accuracy is the value Optuna maximizes.
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    return scores.mean()

# Seed the sampler so repeated runs propose the same configurations.
sampler = optuna.samplers.TPESampler(seed=42)
study = optuna.create_study(direction='maximize', sampler=sampler)
study.optimize(objective, n_trials=50, show_progress_bar=True)

print(f"Best CV accuracy: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")

References

  • Bergstra, J. & Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. JMLR.
  • Chawla, N. V. et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. JAIR.
  • Lundberg, S. M. & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS.