Project Statement
Portobello Tech is an app innovator who has devised an intelligent way of predicting employee turnover within the company. It periodically evaluates employees' work details, including the number of projects they worked on, average monthly working hours, time spent in the company, promotions in the last five years, and salary level.
Data from prior evaluations captures employees' satisfaction in the workplace. This data can be used to identify patterns in employees' work styles and in their interest in continuing to work for the company. The HR Department owns the data and uses it to predict employee turnover.
As the ML Developer assigned to the HR Department, you have been asked to create ML programs that analyze the data and predict turnover, and to use the results to implement targeted retention strategies.
Data Dictionary
The original dataset provides various metrics on employee performance, satisfaction, and status.
| Feature Column | Description |
|---|---|
| satisfaction_level | Employee's level of satisfaction with their job |
| last_evaluation | Rating between 0 and 1 that the employee received at their last evaluation |
| number_project | Number of projects the employee is involved in |
| average_montly_hours | Average number of hours per month the employee spends at the office |
| time_spend_company | Number of years the employee has spent at the company |
| Work_accident | 0 = no accident, 1 = had an accident during their stay |
| left | Target variable: 0 = employee stayed, 1 = employee left |
| promotion_last_5years | Number of promotions in the last five years |
| Department | Department the employee belongs to |
| salary | Salary level as a category (e.g., low, medium, high) |
Project Workflow Pipeline
Data Quality & Exploratory Analysis (EDA)
- Perform data quality checks by looking for missing values.
- Use EDA to understand the factors that contribute to employee turnover.
- Plot a heatmap of the correlation matrix between numerical features.
- Draw distribution plots for `satisfaction_level`, `last_evaluation`, and `average_montly_hours`.
- Compare the project counts of employees who stayed and employees who left using bar charts.
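The EDA steps above can be sketched with pandas. This is a minimal illustration under stated assumptions. `make_toy_hr_data` is a hypothetical helper that fills the data-dictionary columns with random values, because the dataset file is not named here. With the real data, `df` would come from something like `pd.read_csv(...)`. The code computes the tables behind each chart. Drawing them is then one call each, for example `seaborn.heatmap(corr, annot=True)`, `seaborn.histplot(data=df, x=col, hue="left")`, or `project_counts.plot(kind="bar")`.

```python
import numpy as np
import pandas as pd


def make_toy_hr_data(n=500, seed=123):
    """Hypothetical helper: random rows with the data-dictionary columns.

    Synthetic stand-in data, NOT the real HR dataset.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "satisfaction_level": rng.uniform(0, 1, n).round(2),
        "last_evaluation": rng.uniform(0, 1, n).round(2),
        "number_project": rng.integers(2, 8, n),
        "average_montly_hours": rng.integers(90, 320, n),
        "time_spend_company": rng.integers(1, 11, n),
        "Work_accident": rng.integers(0, 2, n),
        "promotion_last_5years": (rng.random(n) < 0.05).astype(int),
        "Department": rng.choice(["sales", "technical", "support", "hr"], n),
        "salary": rng.choice(["low", "medium", "high"], n),
    })
    # Make leaving likelier at low satisfaction so the target is imbalanced.
    p_left = 0.05 + 0.5 * (df["satisfaction_level"] < 0.45)
    df["left"] = (rng.random(n) < p_left.to_numpy()).astype(int)
    return df


df = make_toy_hr_data()

# Data quality check: number of missing values per column.
missing = df.isnull().sum()

# Correlation matrix of the numeric columns (the input to the heatmap).
corr = df.select_dtypes(include="number").corr()

# Per-class summaries of the three features that get distribution plots.
dist_cols = ["satisfaction_level", "last_evaluation", "average_montly_hours"]
dist_summary = df.groupby("left")[dist_cols].describe()

# Employees per project count, split into stayers (left=0) and leavers (left=1):
# the data behind the bar chart.
project_counts = pd.crosstab(df["number_project"], df["left"])

print(missing)
print(corr.round(2))
print(project_counts)
```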
Employee Segmentation (Clustering)
- Focus on employees who left, using their `satisfaction_level` and `last_evaluation` values.
- Perform K-Means clustering to create 3 distinct clusters of churned employees.
- Extract business conclusions based on cluster characteristics.
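The clustering step can be sketched with scikit-learn's `KMeans`. This is a minimal illustration on synthetic values, not the project's data. With the real dataset, `left_df` would be `df.loc[df["left"] == 1, ["satisfaction_level", "last_evaluation"]]`. The per-cluster means are the "cluster characteristics" from which business conclusions would be drawn. The sketch assumes `satisfaction_level` is on the same 0-1 scale as `last_evaluation`, so it clusters the two features without rescaling.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(123)
# Synthetic stand-in for the employees who left (NOT real data).
left_df = pd.DataFrame({
    "satisfaction_level": rng.uniform(0, 1, 300).round(2),
    "last_evaluation": rng.uniform(0, 1, 300).round(2),
})

features = ["satisfaction_level", "last_evaluation"]
# n_init runs K-Means from 10 random starts and keeps the best solution.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=123)
left_df["cluster"] = kmeans.fit_predict(left_df[features])

# Size and average feature values of each cluster, used to interpret the segments.
profile = left_df.groupby("cluster").agg(
    n_employees=("satisfaction_level", "count"),
    mean_satisfaction=("satisfaction_level", "mean"),
    mean_evaluation=("last_evaluation", "mean"),
)
print(profile.round(2))
print(kmeans.cluster_centers_.round(2))
```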
Data Preprocessing & Resampling
- Separate categorical and numeric variables, applying `get_dummies()` to the categorical variables.
- Perform an 80:20 stratified train-test split (`random_state=123`).
- Handle class imbalance by upsampling the training dataset using the SMOTE technique.
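The preprocessing steps can be sketched as follows. This is a minimal illustration with assumptions stated here. `make_toy_hr_data` is the same hypothetical synthetic-data helper used for EDA. The `smote` function is a simplified from-scratch version of SMOTE for a binary target, written out so the interpolation is visible. In practice, imbalanced-learn's `SMOTE(random_state=123).fit_resample(X_train, y_train)` is the usual route. Note two caveats. First, interpolating one-hot columns produces fractional dummy values; imbalanced-learn's `SMOTENC` handles categorical features explicitly. Second, nearest-neighbour distances are dominated by large-scale columns such as `average_montly_hours` unless features are scaled first. Only the training split is resampled, so the test set keeps the real class ratio.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def make_toy_hr_data(n=500, seed=123):
    """Hypothetical helper: synthetic rows with the data-dictionary columns."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "satisfaction_level": rng.uniform(0, 1, n).round(2),
        "last_evaluation": rng.uniform(0, 1, n).round(2),
        "number_project": rng.integers(2, 8, n),
        "average_montly_hours": rng.integers(90, 320, n),
        "time_spend_company": rng.integers(1, 11, n),
        "Work_accident": rng.integers(0, 2, n),
        "promotion_last_5years": (rng.random(n) < 0.05).astype(int),
        "Department": rng.choice(["sales", "technical", "support", "hr"], n),
        "salary": rng.choice(["low", "medium", "high"], n),
    })
    p_left = 0.05 + 0.5 * (df["satisfaction_level"] < 0.45)
    df["left"] = (rng.random(n) < p_left.to_numpy()).astype(int)
    return df


df = make_toy_hr_data()

# Separate categorical and numeric columns; one-hot encode the categoricals.
cat_cols = ["Department", "salary"]
num_cols = [c for c in df.columns if c not in cat_cols + ["left"]]
X = pd.concat([df[num_cols], pd.get_dummies(df[cat_cols], dtype=int)], axis=1)
y = df["left"]

# 80:20 split, stratified on the target so both parts keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=123)


def smote(X, y, k=5, seed=123):
    """Simplified SMOTE for a binary target: create synthetic minority rows by
    interpolating between a minority sample and one of its k nearest minority
    neighbours, until both classes have the same size."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    X_min = X[y == minority]
    n_new = counts.max() - counts.min()
    # Ask for k + 1 neighbours because each point's closest match is usually itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), n_new)          # random minority samples
    partner = neigh[base, rng.integers(0, k, n_new)]   # one random neighbour each
    gap = rng.random((n_new, 1))                       # position along the segment
    X_syn = X_min[base] + gap * (X_min[partner] - X_min[base])
    return np.vstack([X, X_syn]), np.concatenate([y, np.full(n_new, minority)])


X_train_res, y_train_res = smote(X_train, y_train)
print("before:", np.bincount(y_train), "after:", np.bincount(y_train_res))
```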
Model Training & Evaluation
- Train Logistic Regression, Random Forest, and Gradient Boosting classifiers.
- Apply 5-fold cross-validation and evaluate the resulting classification reports.
- Identify the best model and justify the choice of evaluation metrics (ROC/AUC curves, confusion matrix).
- Determine whether to optimize for recall or precision based on the business use case.
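The training and evaluation loop can be sketched with scikit-learn. This is an illustration on the same hypothetical synthetic data, not the project's reference solution. To stay short, it trains on the non-resampled split. When SMOTE is used, the resampler belongs inside each cross-validation fold, for example via imbalanced-learn's `Pipeline`, so that synthetic rows never leak into validation folds. The sketch ranks models by test ROC AUC and also reports recall for class 1. Recall is the metric to favour if missing an employee who will leave costs more than a false alarm; precision is the one to favour if interventions are expensive. The hyperparameters shown are illustrative defaults, not tuned values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix,
                             recall_score, roc_auc_score)
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_hr_data(n=500, seed=123):
    """Hypothetical helper: synthetic rows with the data-dictionary columns."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "satisfaction_level": rng.uniform(0, 1, n).round(2),
        "last_evaluation": rng.uniform(0, 1, n).round(2),
        "number_project": rng.integers(2, 8, n),
        "average_montly_hours": rng.integers(90, 320, n),
        "time_spend_company": rng.integers(1, 11, n),
        "Work_accident": rng.integers(0, 2, n),
        "promotion_last_5years": (rng.random(n) < 0.05).astype(int),
        "Department": rng.choice(["sales", "technical", "support", "hr"], n),
        "salary": rng.choice(["low", "medium", "high"], n),
    })
    p_left = 0.05 + 0.5 * (df["satisfaction_level"] < 0.45)
    df["left"] = (rng.random(n) < p_left.to_numpy()).astype(int)
    return df


df = make_toy_hr_data()
# One-hot encode the categorical columns, then make a stratified 80:20 split.
X = pd.get_dummies(df.drop(columns="left"), columns=["Department", "salary"], dtype=int)
y = df["left"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=123)

models = {
    # Logistic regression is scale-sensitive, so it gets a StandardScaler first.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=123),
    "gradient_boosting": GradientBoostingClassifier(random_state=123),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

results = {}
for name, model in models.items():
    # 5-fold CV: every training row is predicted by a model that did not see it.
    cv_pred = cross_val_predict(model, X_train, y_train, cv=cv)
    print(f"== {name} (5-fold CV on training data) ==")
    print(classification_report(y_train, cv_pred, digits=3))

    # Refit on the whole training set and score the held-out test set.
    model.fit(X_train, y_train)
    test_pred = model.predict(X_test)
    test_proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "auc": roc_auc_score(y_test, test_proba),
        "recall": recall_score(y_test, test_pred),  # share of actual leavers caught
        "confusion": confusion_matrix(y_test, test_pred),
    }

# Rank by test ROC AUC; recall is reported alongside for the business decision.
best = max(results, key=lambda name: results[name]["auc"])
print("best model by AUC:", best, results[best])
```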
Predictive Retention Strategies
- Predict the probability of employee turnover on test data using the best model.
- Categorize employees into four defined risk zones based on the predicted probabilities.
- Provide targeted retention strategies for each zone.
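| Risk Zone | Predicted Turnover Probability (Score) |
|---|---|
| Safe Zone | Score < 20% |
| Low-Risk Zone | 20% < Score < 60% |
| Medium-Risk Zone | 60% < Score < 90% |
| High-Risk Zone | Score > 90% |

Evaluation of the HR Employee Retention Analysis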
By categorizing employees into different zones based on their predicted turnover probability, we can implement more targeted and effective retention strategies:
1. Safe Zone (Green) - Predicted Turnover Probability < 20%
- Description: These employees are highly likely to stay. They generally have high satisfaction, good performance, and feel engaged.
- Strategy: Maintain engagement and reinforce positive aspects. Focus on regular check-ins, recognizing achievements, providing opportunities for growth (even if they're already satisfied), and ensuring they remain advocates for the company culture. These employees can also serve as mentors for newer staff.
- Actions: Continue with standard HR practices, leadership development programs, and opportunities for internal mobility.
2. Low-Risk Zone (Yellow) - Predicted Turnover Probability 20% - 60%
- Description: These employees have some indicators that could lead to turnover but are not at immediate high risk. They might be passively looking for other jobs or could be swayed by minor external factors.
- Strategy: Proactive engagement and identification of potential friction points. Implement early intervention strategies to address any emerging concerns before they escalate. Focus on career development and ensuring a clear path forward.
- Actions: Conduct "stay interviews" to understand what keeps them engaged, offer mentorship, provide skill development opportunities, clarify career progression paths, and regularly review compensation and benefits to ensure competitiveness.
3. Medium-Risk Zone (Orange) - Predicted Turnover Probability 60% - 90%
- Description: Employees in this zone show significant indicators of potential turnover. They are likely disengaged or actively exploring other options. Urgent and direct intervention is required.
- Strategy: Aggressive retention efforts focused on understanding their specific grievances and offering tailored solutions. This group might include those performing well but with low satisfaction or those feeling stagnant.
- Actions: Direct manager intervention, personalized career coaching, review of workload and responsibilities, potential reassignment to new projects, addressing specific concerns about work environment or management, and consideration of salary adjustments or promotion opportunities where appropriate. Emphasize their value to the company.
4. High-Risk Zone (Red) - Predicted Turnover Probability > 90%
- Description: These employees are highly likely to leave, and some may have already made up their minds. This group often includes those with very low satisfaction and/or poor performance.
- Strategy: While retention efforts can still be made, it might also be pragmatic to focus on knowledge transfer and smooth transitions if departure is inevitable. For those who are still undecided, a last-ditch effort might involve significant changes (e.g., new role, significant pay raise, extensive support).
- Actions: Immediate and direct conversation with HR and management to understand their intentions. Offer substantial incentives or changes if appropriate and feasible. If departure is certain, focus on minimizing disruption, ensuring a positive exit experience, and conducting thorough exit interviews to gather critical feedback for future prevention.
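Assigning employees to the four zones is a simple bucketing of predicted probabilities. The sketch below uses the cut-offs from the workflow's risk-zone table (20%, 60%, 90%). The table's strict inequalities leave the exact boundary values unassigned, so this sketch assumes a probability that falls exactly on a cut-off belongs to the higher zone. In the pipeline, the input would be the best model's `predict_proba(X_test)[:, 1]`. The sample probabilities below are arbitrary illustrative inputs.

```python
import numpy as np

# Zone labels in ascending order of risk, and the cut-offs between them.
ZONE_NAMES = np.array([
    "Safe Zone (Green)",
    "Low-Risk Zone (Yellow)",
    "Medium-Risk Zone (Orange)",
    "High-Risk Zone (Red)",
])
ZONE_CUTOFFS = [0.2, 0.6, 0.9]


def assign_risk_zone(probabilities):
    """Map predicted turnover probabilities (0-1) to risk-zone labels.

    np.digitize (right=False) returns i such that cutoffs[i-1] <= p < cutoffs[i],
    so a value exactly on a cut-off lands in the higher zone (an assumption).
    """
    p = np.asarray(probabilities, dtype=float)
    return ZONE_NAMES[np.digitize(p, ZONE_CUTOFFS)]


probs = np.array([0.05, 0.2, 0.45, 0.75, 0.9, 0.97])
zones = assign_risk_zone(probs)
for p, z in zip(probs, zones):
    print(f"{p:.2f} -> {z}")
# 0.05 -> Safe Zone (Green)
# 0.20 -> Low-Risk Zone (Yellow)
# 0.45 -> Low-Risk Zone (Yellow)
# 0.75 -> Medium-Risk Zone (Orange)
# 0.90 -> High-Risk Zone (Red)
# 0.97 -> High-Risk Zone (Red)
```

Counting employees per zone, for example with `np.unique(zones, return_counts=True)`, gives HR a quick view of how much retention effort each tier needs.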