Telco Churn Predictor

A two-stage churn model with explainable drivers on the IBM Telco dataset.

Problem
Which customers are likely to churn, and what drives each prediction so retention can be targeted?
Built
A two-stage pipeline: a gradient-boosted risk scorer feeding an XGBoost classifier, with explainable drivers.
Models / methods
Gradient-boosted trees for risk scoring (stage one) and XGBoost, itself a gradient-boosting library, for classification (stage two).
Result
0.82 ROC-AUC on a held-out test set.
Strength shown
Practical pipeline design plus explainable, actionable output.
ROC curve, XGBoost (AUC 0.82)
Confusion matrix
Top-20 feature importances (XGBoost)

Charts are real outputs from the project, and diagrams show its actual architecture.

01 Objective

Predict which customers are likely to churn and explain why, so retention efforts can be targeted at the highest-risk accounts.

02 Dataset / input

The IBM Telco customer dataset: contract, billing, tenure, and service-usage attributes, with churn as the binary target. The held-out test set behind the confusion matrix contains roughly 1,400 customers.

03 Model approach

  • Cleaning, encoding, and feature engineering on customer attributes
  • A two-stage pipeline: a gradient-boosted risk scorer feeding an XGBoost classifier
  • Threshold tuning and class-imbalance handling for fair evaluation
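The threshold-tuning step above can be sketched in plain Python. This is a minimal illustration, not the project's code: the write-up does not say which metric the threshold was tuned against, so F1 on the churn class is an assumption, and the function names and toy validation data are hypothetical.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 for the churn class when predicting churn for score >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(y_true, scores):
    """Try each distinct score as a cutoff; return (best_threshold, best_f1)."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        f1 = f1_at_threshold(y_true, scores, t)
        if f1 > best_f1:  # strict '>' keeps the lowest cutoff on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation split, made up for illustration: 1 = churned, 0 = stayed.
# Churners are the minority class, as in most churn data.
y_val = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.60, 0.80]

threshold, f1 = tune_threshold(y_val, p_val)
default_f1 = f1_at_threshold(y_val, p_val, 0.5)
```

On this toy data the tuned cutoff falls below the default 0.5, which is common when the positive class is rare. The cutoff should be chosen on a validation split and only then applied to the held-out test set, otherwise the reported test metrics are optimistic.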

04 Results / metrics

Reached 0.82 ROC-AUC on a held-out test set. Feature-importance analysis identified contract type, tenure, and charges as the strongest churn drivers, giving retention teams concrete factors to act on.
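To make the headline metric concrete: ROC-AUC equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, so 0.82 means the model ranks such a pair correctly about 82% of the time. The sketch below computes it from that pairwise definition. It is for illustration only, with a hypothetical function name and made-up scores; a project would normally call a library routine instead.

```python
def roc_auc(y_true, scores):
    """Share of (churner, non-churner) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and scores: 1 = churned, 0 = stayed.
y = [0, 0, 1, 0, 1, 1]
s = [0.10, 0.30, 0.35, 0.40, 0.60, 0.80]
auc = roc_auc(y, s)  # 8 of the 9 pairs are ordered correctly
```

Because the metric depends only on ranking, it is unaffected by the classification threshold and says nothing about whether the scores are well-calibrated probabilities.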

05 Deployment / reproducibility

Reproducible notebook plus a saved model artefact ready to score new customers in batch.

06 Limitations

  • A single public dataset that may not match a given operator's customer mix
  • Static snapshot: no time-aware or streaming churn signals

07 Future improvements

  • Calibrated churn probabilities tied to retention-offer economics
  • Drift monitoring once deployed against live data
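The first item above, tying calibrated probabilities to retention-offer economics, could take the form of a simple expected-value rule. The sketch below is one possible framing and not part of the project: it assumes the offer cost is paid for every targeted customer, that an offer keeps a would-be churner with a fixed `save_rate`, and that all function names and numbers are hypothetical.

```python
def expected_gain(p_churn, customer_value, offer_cost, save_rate):
    """Expected net gain from making a retention offer to one customer.

    p_churn        calibrated churn probability from the model
    customer_value value retained if the customer stays (assumed known)
    offer_cost     cost of the offer, paid whether or not it works (assumption)
    save_rate      chance the offer keeps a customer who would have churned
    """
    return p_churn * save_rate * customer_value - offer_cost


def break_even_probability(customer_value, offer_cost, save_rate):
    """Churn probability above which an offer has positive expected gain."""
    return offer_cost / (save_rate * customer_value)


# Illustrative numbers only, not taken from the project.
value, cost, save = 600.0, 30.0, 0.25

cutoff = break_even_probability(value, cost, save)  # 30 / (0.25 * 600)
gain_high_risk = expected_gain(0.50, value, cost, save)
gain_low_risk = expected_gain(0.10, value, cost, save)
```

Under these assumptions the decision threshold comes from the economics rather than from a classification metric, which is only meaningful if the model's scores are calibrated probabilities.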

08 Key takeaway

A clean, explainable classifier with a real metric (0.82 ROC-AUC) and drivers a business could act on.