Predicting customer churn or attrition with call center data set

Description

This project uses the customer service call dataset to predict customer churn and identify the key features influencing this behavior. The dataset, an MLC++ machine learning software challenge problem, is sourced from the model data package in R. It contains 19 input variables, which capture customer characteristics and company interactions, and 1 binary outcome.

Technologies

Python 3, pandas, NumPy, Seaborn, Matplotlib, statsmodels, scikit-learn

Overview

The primary objective of this project is to demonstrate how to use a customer service call dataset to predict customer churn and identify the key features influencing this behavior. This project follows a systematic approach, including data preprocessing, exploratory data analysis (EDA), feature engineering, model building, model performance evaluation, and cross-validation. By applying robust predictive models, I identified the most significant features influencing customer churn, ultimately helping businesses enhance their retention strategies. The dataset contains various data points that highlight customer characteristics and company interactions but does not include personally identifiable information.
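The workflow described above (preprocessing, model building, and evaluation) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the DataFrame is synthetic, and the column names (`service_calls`, `day_minutes`, `intl_plan`, `churn`) and the choice of logistic regression are assumptions made only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the churn data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "service_calls": rng.poisson(2, n),
    "day_minutes": rng.normal(180, 50, n),
    "intl_plan": rng.choice(["yes", "no"], n, p=[0.1, 0.9]),
})
# Synthetic binary outcome loosely tied to two of the inputs.
logit = (-2.0 + 0.5 * df["service_calls"]
         + 1.0 * (df["intl_plan"] == "yes")).to_numpy()
df["churn"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

num_cols = ["service_calls", "day_minutes"]
cat_cols = ["intl_plan"]

# Preprocessing: scale numeric inputs, one-hot encode categorical inputs.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hold out a stratified test set so both splits keep the churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    df[num_cols + cat_cols], df["churn"],
    test_size=0.25, stratify=df["churn"], random_state=0,
)

model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Hold-out ROC AUC: {auc:.4f}")
```

Wrapping the preprocessing and the classifier in a single `Pipeline` keeps the scaler and encoder fitted on the training split only, which avoids leaking information from the test data into the model.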

Project Outcome

Of the 8 models developed, two (2) are suitable for making predictions on a new dataset. The primary model, Model 8, included interaction terms and polynomial features and demonstrated the highest performance, with a mean ROC AUC score of 0.6798 and a 95% confidence interval of 0.0117. These findings underscore the importance of advanced feature engineering and model selection techniques in accurately predicting customer churn. More importantly, they provide actionable insights for enhancing customer retention strategies, thereby contributing to the stability and growth of businesses.
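As an illustration of how a model with interaction and polynomial terms might be scored by cross-validated ROC AUC, here is a hedged sketch. It uses placeholder data from `make_classification` rather than the churn dataset, assumes logistic regression as the classifier, and computes the 95% interval half-width with a normal approximation over the fold scores. The source does not state how its interval was derived, so that formula is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Placeholder data (not the churn dataset): 19 inputs, imbalanced binary outcome.
X, y = make_classification(
    n_samples=1000, n_features=19, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)

# degree=2 adds squared terms and all pairwise interaction terms.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("clf", LogisticRegression(max_iter=2000)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

mean_auc = scores.mean()
# Assumed method: normal-approximation half-width from the fold scores.
half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
print(f"Mean ROC AUC: {mean_auc:.4f} (95% CI half-width: {half_width:.4f})")
```

Passing `interaction_only=True` to `PolynomialFeatures` would keep the pairwise products and drop the squared terms, which is one way to compare the contribution of interactions against full polynomial expansion.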