Customer Retention Analysis and Churn Prediction for a B2B Company

Customer retention is a top priority for many companies. Acquiring new customers can be several times more expensive than retaining existing ones. So how can a company improve customer retention?

Business Goals

Understanding why customers churn is a key component of a data-driven customer retention strategy. Identifying the customers who are likely to churn and preventing their attrition is a challenging task, and simple business heuristics often fall short.

That is why the goal of our project was to perform an in-depth analysis of the data with respect to customer retention, build a mechanism for identifying customers at risk of churn, and support churn prevention.

Our client operates a B2B software-as-a-service (SaaS) business worldwide and has more than 50,000 registered users.

Customer Retention Analysis & Churn Prediction: Solution Implementation

In-depth analysis of data and business processes

In the first stage of the project, it is essential to analyze the business processes and the data. We ran a statistical analysis of all available attributes and analyzed the existing data structure, the customer care department's actions, and all related business aspects.

Tasks that have been performed from a business point of view:

  • Gathering requirements from business departments
  • Customer care processes analysis
  • Analysis of existing IT infrastructure
  • Preparation of the project plan

Tasks that have been performed from the data point of view:

  • Analysis of available data types
  • Visual data analysis
  • Correlation analysis
  • Outlier detection (DBSCAN, Isolation Forest)
  • Missing values analysis
  • Analysis and definition of the “target” variable
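
The data-side checks above (missing values, correlations, and outlier detection with DBSCAN and Isolation Forest) can be sketched with pandas and scikit-learn. This is a minimal illustration rather than the project's actual code: the `profile_customers` helper, the column names, and the `eps`, `min_samples`, and `contamination` values are hypothetical and would need tuning on real data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler


def profile_customers(df: pd.DataFrame, contamination: float = 0.05) -> dict:
    """Summarize missing values, correlations, and outliers in numeric columns."""
    numeric = df.select_dtypes(include="number")

    # Share of missing values per column, largest first.
    missing_share = numeric.isna().mean().sort_values(ascending=False)

    # Pairwise Pearson correlations (pandas skips missing values pairwise).
    correlations = numeric.corr()

    # Both detectors need complete, comparably scaled rows:
    # fill gaps with column medians, then standardize.
    filled = numeric.fillna(numeric.median())
    scaled = StandardScaler().fit_transform(filled)

    # Isolation Forest labels outliers -1 and inliers 1.
    iso_labels = IsolationForest(contamination=contamination, random_state=0).fit_predict(scaled)

    # DBSCAN labels points outside any dense cluster as noise (-1).
    dbscan_labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(scaled)

    return {
        "missing_share": missing_share,
        "correlations": correlations,
        "isolation_forest_outliers": list(numeric.index[iso_labels == -1]),
        "dbscan_noise": list(numeric.index[dbscan_labels == -1]),
    }


# Synthetic example: one extreme revenue value and a block of missing values.
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "monthly_revenue": rng.normal(1000, 100, 200),
    "active_users": rng.normal(50, 5, 200),
    "support_tickets": rng.poisson(3, 200).astype(float),
})
customers.loc[0, "monthly_revenue"] = 50_000.0
customers.loc[1:10, "active_users"] = np.nan

report = profile_customers(customers)
print(report["missing_share"])
print(report["isolation_forest_outliers"][:5], report["dbscan_noise"][:5])
```

Running both detectors is useful because they disagree in informative ways: Isolation Forest always flags a fixed share of rows, while DBSCAN flags only points that sit far from any dense region.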

The results and insights from these steps give us a better understanding of which variables we can generate at the data preparation stage and what the system architecture will look like.

Data Preparation

At this stage, the main task was to prepare the data for machine learning modeling. It is important to aggregate the data properly and to create every useful variable.

When building the features that will be used as variables in modeling, we do not limit ourselves to simple aggregation; we extract as many insights as possible from each attribute. For example, from the variable “income/revenue” we can derive “average income”, but the model will perform better if we also prepare variables such as:

  • “last month income/revenue amount”,
  • “average income/revenue in last x months”,
  • “nominal growth/decrease in income/revenue in last x months”,
  • “real (%) growth/decrease in income/revenue in last x months”,
  • and similar variables.
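
A minimal pandas sketch of these revenue features, assuming a hypothetical long table with one row per customer per month (`customer_id`, `month`, `revenue`) and a window of `x` months. The helper and column names are illustrative, and “real (%) growth” is read here as the percentage change over the window.

```python
import numpy as np
import pandas as pd


def revenue_features(monthly: pd.DataFrame, x: int = 3) -> pd.DataFrame:
    """Per-customer revenue features from one row per customer per month.

    Expects columns customer_id, month (sortable), and revenue.
    """
    rows = {}
    ordered = monthly.sort_values(["customer_id", "month"])
    for customer_id, revenue in ordered.groupby("customer_id")["revenue"]:
        values = revenue.to_numpy(dtype=float)
        last = values[-1]
        # Revenue x months before the last month; fall back to the first
        # available month for customers with a shorter history.
        baseline = values[-x - 1] if len(values) > x else values[0]
        nominal = last - baseline
        rows[customer_id] = {
            "last_month_revenue": last,
            f"avg_revenue_last_{x}m": values[-x:].mean(),
            f"nominal_change_last_{x}m": nominal,
            f"pct_change_last_{x}m": nominal / baseline * 100 if baseline else np.nan,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


monthly = pd.DataFrame({
    "customer_id": ["A", "A", "A", "A", "B", "B"],
    "month": ["2024-01", "2024-02", "2024-03", "2024-04", "2024-03", "2024-04"],
    "revenue": [100, 120, 90, 150, 200, 100],
})
features = revenue_features(monthly, x=3)
print(features)
```

Falling back to the first month for short histories is one design choice; a production pipeline might prefer to leave these features missing and let the model handle them.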

It is also very important to define the target variable, and the definition may differ by industry. For our client, a company that offers SaaS software, we jointly defined churn as 60 days of inactivity. This means that the model predicts “the probability that a given company will leave in the next 60 days”.
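
The 60-day target definition can be written as a labeling rule. The sketch below assumes, hypothetically, that activity is available as a list of activity dates per company and that labels are built relative to a snapshot date: a company already active by the snapshot is labeled 1 (churn) if it shows no activity in the following 60 days, and 0 otherwise. The function and company names are illustrative.

```python
from datetime import date, timedelta

CHURN_WINDOW_DAYS = 60  # inactivity window agreed with the business


def label_churn(activity: dict[str, list[date]], snapshot: date,
                window_days: int = CHURN_WINDOW_DAYS) -> dict[str, int]:
    """Label each company 1 (churn) or 0 (retained) relative to a snapshot date."""
    horizon = snapshot + timedelta(days=window_days)
    labels = {}
    for company_id, dates in activity.items():
        # Only companies already active by the snapshot belong to the scored population.
        if not any(d <= snapshot for d in dates):
            continue
        # Any activity inside (snapshot, snapshot + window] means the company stayed.
        stayed = any(snapshot < d <= horizon for d in dates)
        labels[company_id] = 0 if stayed else 1
    return labels


activity = {
    "company_a": [date(2024, 1, 10), date(2024, 2, 20)],  # active again within 60 days
    "company_b": [date(2024, 1, 5)],                      # no activity after the snapshot
    "company_c": [date(2024, 3, 15)],                     # first seen after the snapshot
}
print(label_churn(activity, snapshot=date(2024, 1, 31)))
```

Building labels for several consecutive snapshot dates yields a historical training set in which each row answers exactly the question the model is later asked.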

Machine Learning Modeling

At this stage, it is very important to create a proper machine learning model in accordance with best practices.

During this process, our team:

  • Preprocessed the data: cleaned it and transformed it into an appropriate format,
  • Performed feature selection to choose the most relevant set of variables,
  • Selected an appropriate metric to measure model performance (accuracy, precision, recall, F1-score, Fβ-score),
  • Trained several models (Random Forest, neural networks, XGBoost) and optimized their hyperparameters, for instance using Bayesian hyperparameter optimization,
  • Validated the stability of the model on historical data (e.g., using cross-validation techniques),
  • Analyzed the model's results using feature importance, partial dependence plots (PDPs), probability distributions, error plots, and more.
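
Metric selection, model comparison, and the stability check can be sketched with scikit-learn on synthetic data. This is an illustration under assumptions, not the project's pipeline: `make_classification` stands in for the real feature table, `GradientBoostingClassifier` stands in for XGBoost and a scaled logistic regression stands in for the neural network (to keep the example dependency-free), and the F2 score (Fβ with β = 2) is an assumed metric choice. Bayesian hyperparameter search is omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import fbeta_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the real feature table (about 10% churners).
X, y = make_classification(n_samples=1000, n_features=12, n_informative=5,
                           weights=[0.9, 0.1], random_state=42)

# F2 weights recall more than precision: missing a churner is assumed to cost
# more than contacting a customer who would have stayed anyway.
f2 = make_scorer(fbeta_score, beta=2)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, scoring=f2, cv=cv)
    # The mean shows typical performance; the std shows stability across folds.
    results[name] = (scores.mean(), scores.std())

best = max(results, key=lambda name: results[name][0])
for name, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:>20}: F2 = {mean:.3f} ± {std:.3f}")
```

Stratified folds keep the churn rate similar in every fold. When training rows are monthly snapshots, a time-ordered split such as scikit-learn's `TimeSeriesSplit` avoids training on later periods and validating on earlier ones.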

Machine learning prediction flow:

[Figure: Machine learning prediction flow]

Deployment and Integration

After completing the modeling stage, we integrated the solution with the existing infrastructure.

  • The solution is fully connected to the data warehouse. T-SQL routines process and prepare the data for the modeling part,
  • Machine learning models are built and deployed in Python (packages: scikit-learn, XGBoost, Keras, TensorFlow). All of these are open source, so no additional paid licenses are required. The models' results are saved in the data warehouse,
  • The user interface (UI) uses the data warehouse as its source and presents the churn prediction results together with other important and relevant data (tables),
  • The results were also integrated with CRM systems and with notifications for customer service departments that explain why a given customer/user is likely to leave the service.
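
The scoring step of this architecture (read prepared features from the warehouse, score them, write probabilities back) can be sketched in Python. In the described solution the features come from T-SQL routines in the data warehouse; here an in-memory SQLite database stands in for the warehouse, and the table names (`customer_features`, `churn_predictions`), the columns, and the toy logistic regression model are all hypothetical.

```python
import sqlite3
from datetime import date

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature columns prepared upstream (by T-SQL routines in the source).
FEATURES = ["last_month_revenue", "avg_revenue_last_3m", "support_tickets"]


def score_customers(conn: sqlite3.Connection, model, run_date: date) -> int:
    """Score every row in customer_features and append results to churn_predictions."""
    rows = conn.execute(
        f"SELECT customer_id, {', '.join(FEATURES)} FROM customer_features"
    ).fetchall()
    if not rows:
        return 0
    ids = [row[0] for row in rows]
    X = np.array([row[1:] for row in rows], dtype=float)
    # Column 1 of predict_proba is the probability of class 1 (churn).
    churn_probability = model.predict_proba(X)[:, 1]
    conn.executemany(
        "INSERT INTO churn_predictions (customer_id, run_date, churn_probability) VALUES (?, ?, ?)",
        [(cid, run_date.isoformat(), float(p)) for cid, p in zip(ids, churn_probability)],
    )
    conn.commit()
    return len(rows)


# In-memory SQLite stands in for the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_features (customer_id TEXT, last_month_revenue REAL, "
             "avg_revenue_last_3m REAL, support_tickets REAL)")
conn.execute("CREATE TABLE churn_predictions (customer_id TEXT, run_date TEXT, churn_probability REAL)")
conn.executemany("INSERT INTO customer_features VALUES (?, ?, ?, ?)",
                 [("A", 1500, 1400, 1), ("B", 200, 800, 9), ("C", 900, 950, 3)])

# Toy model fitted on made-up history, only so the example runs end to end.
history_X = np.array([[1500, 1400, 1], [1200, 1100, 2], [300, 900, 8], [100, 700, 10]], dtype=float)
history_y = np.array([0, 0, 1, 1])
model = LogisticRegression(max_iter=1000).fit(history_X, history_y)

print(score_customers(conn, model, date(2024, 5, 1)))
```

Appending rows with a `run_date` rather than overwriting keeps a history of daily scores, which the UI and CRM integration can then read from the same table.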

The architecture of the solution:

[Figure: Machine learning churn prediction]

Customer Retention Analysis & Churn Prediction: Analytics Interface

The customer retention analysis UI helps users understand how different business decisions influence customers. By analyzing customer churn, business users can see trends in product or service satisfaction and dissatisfaction.

Analysis based on cohorts and demographic data can be very helpful. It delivers insights into what drives particular customer decisions (price changes, new products or services, product upgrades, changes in customer communication, and more).

Dashboards provide a convenient interface where you can visualize and analyze data and focus on key performance indicators (KPIs) from across your organization, helping you gain valuable insight and make quick and accurate decisions.

Key customer retention benefits:

  • Split customers into cohorts and custom lists to find out who is driving your business growth and answer complex questions about your next investments,
  • Conduct in-depth analysis to gain insight into correlations between different subscriptions and business activities,
  • Monitor all KPIs in one place to understand business performance.

Customer Retention Rate Monitoring

The customer retention rate is an essential metric in any B2B business. It is the ratio of the number of customers retained to the number at risk (the customers present at the start of the period). It helps monitor how well the company keeps its customers and supports analyzing trends and tracking customer success performance within the company.
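
Expressed over one period, with S customers at the start, E at the end, and N new customers acquired during the period, the customers at risk are S and the retained customers are E − N, so the retention rate is (E − N) / S × 100. A minimal Python version of this common formula (the function name is illustrative):

```python
def retention_rate(customers_at_start: int, customers_at_end: int, new_customers: int) -> float:
    """Customer retention rate (%) for one period: retained customers / customers at risk."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    retained = customers_at_end - new_customers  # E - N: start-of-period customers still present
    return retained * 100 / customers_at_start   # divided by S, the customers at risk


# 200 customers at the start, 30 acquired during the month, 210 at the end:
# 180 of the original 200 stayed, so the retention rate is 90%.
print(retention_rate(customers_at_start=200, customers_at_end=210, new_customers=30))
```

Subtracting new customers matters: without it, strong acquisition in a month would mask losses among existing customers.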

[Figure: Customer churn]

It also lets you analyze how the customer base changes over time: how many customers were newly acquired, lost, inactive, and active in each month, compared month over month.
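
These month-to-month counts can be derived from the set of active customer IDs in each month. The sketch below assumes a hypothetical input that maps `YYYY-MM` month strings (which sort chronologically) to sets of active IDs, and treats a customer as inactive when they were active in some earlier month but not in the current one.

```python
def monthly_movement(active_by_month: dict[str, set[str]]) -> list[dict]:
    """Month-over-month counts of active, new, lost, and inactive customers.

    Keys are "YYYY-MM" strings, which sort chronologically.
    """
    rows = []
    seen: set[str] = set()      # every customer active in any earlier month
    previous: set[str] = set()  # customers active in the previous month
    for month in sorted(active_by_month):
        current = active_by_month[month]
        rows.append({
            "month": month,
            "active": len(current),
            "new": len(current - seen),         # first month ever active
            "lost": len(previous - current),    # active last month, not this month
            "inactive": len(seen - current),    # known customers not active this month
        })
        seen |= current
        previous = current
    return rows


active_by_month = {
    "2024-01": {"A", "B"},
    "2024-02": {"A", "C"},
    "2024-03": {"C"},
}
for row in monthly_movement(active_by_month):
    print(row)
```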

Churn Prediction Analysis

Customer churn analysis examines the rate at which a company loses customers. It helps B2B companies identify the causes of customer loss and implement effective customer retention strategies.

Combining financial KPIs with demographics helps you understand which customer segments to focus on.

The main information in the report:

  • Churn probability forecast for a specific client (scoring),
  • Detailed information on the reasons for churn,
  • Highlights of the riskiest segments and customers,
  • Historical customer activity.
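
A segment-level view like this can be assembled from the scoring output. A minimal pandas sketch, assuming a hypothetical table of scored customers (`customer_id`, `segment`, `churn_probability`) and an illustrative 0.5 threshold for “at risk”:

```python
import pandas as pd


def risk_overview(scores: pd.DataFrame, threshold: float = 0.5, top_n: int = 3):
    """Summarize churn scores per segment and list the highest-risk customers.

    Expects columns customer_id, segment, and churn_probability.
    """
    by_segment = (
        scores.groupby("segment")
        .agg(
            customers=("customer_id", "count"),
            mean_churn_probability=("churn_probability", "mean"),
            # Customers at or above the (assumed) risk threshold.
            at_risk=("churn_probability", lambda p: int((p >= threshold).sum())),
        )
        .sort_values("mean_churn_probability", ascending=False)
    )
    top_customers = scores.nlargest(top_n, "churn_probability")
    return by_segment, top_customers


scores = pd.DataFrame({
    "customer_id": ["A", "B", "C", "D", "E", "F"],
    "segment": ["SMB", "SMB", "SMB", "Enterprise", "Enterprise", "Mid-market"],
    "churn_probability": [0.8, 0.6, 0.1, 0.2, 0.3, 0.9],
})
by_segment, top_customers = risk_overview(scores)
print(by_segment)
print(top_customers)
```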

Cohort Analysis

A cohort analysis report helps you analyze how many customers stop using your company’s services and over what period, and understand what is causing undesirable shifts in these dynamics.

The example cohort analysis chart supports advanced filtering (by subscription, region, segment, and cohort).

Each row represents a cohort of customers who started using the company's services in the same month. Each column represents a subsequent month of activity, and each cell shows the percentage of that cohort's customers who remain active.
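
A retention matrix of this shape can be computed with pandas. This sketch assumes a hypothetical activity table with one row per customer per active month (`customer_id`, and `month` as a datetime on the first day of the month). A customer's cohort is their first active month, and each cell counts customers active in that exact month, so reactivated customers reappear in later columns.

```python
import pandas as pd


def cohort_retention(activity: pd.DataFrame) -> pd.DataFrame:
    """Share (%) of each start-month cohort active k months after starting.

    Expects one row per customer per active month, with columns customer_id
    and month (a datetime on the first day of the month).
    """
    activity = activity.copy()
    # A customer's cohort is the first month in which they were active.
    activity["cohort"] = activity.groupby("customer_id")["month"].transform("min")
    # Whole months elapsed since the cohort month (0 = the starting month).
    activity["months_since_start"] = (
        (activity["month"].dt.year - activity["cohort"].dt.year) * 12
        + (activity["month"].dt.month - activity["cohort"].dt.month)
    )
    counts = (
        activity.groupby(["cohort", "months_since_start"])["customer_id"]
        .nunique()
        .unstack()
    )
    # Divide each row by its cohort size (the month-0 column) to get percentages.
    return counts.div(counts[0], axis=0).mul(100).round(1)


activity = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B", "C", "D", "D"],
    "month": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01",
                             "2024-01-01", "2024-02-01",
                             "2024-01-01",
                             "2024-02-01", "2024-03-01"]),
})
print(cohort_retention(activity))
```

Cells for months that a cohort has not reached yet stay empty (NaN), which produces the familiar triangular shape of a cohort chart.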

Customer Retention – Behavior Analysis

The customer behavior analysis report shows why an individual customer may decide to leave. After a customer ID is entered, the following report appears.

[Figure: Customer behavior analysis]

  • Red: features that increase the probability of customer churn
  • Blue: features that decrease the probability of customer churn
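
The source does not state how the red and blue contributions are computed. One simple way to produce them, shown here with a logistic regression as an assumed stand-in model, is to take each feature's contribution to the churn log-odds relative to an average customer, `coef_j * (x_j - mean_j)`; for a linear model with independent features this coincides with SHAP values, which are a common choice for tree models such as XGBoost. The `explain_customer` helper, the feature names, and the data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def explain_customer(model: LogisticRegression, X_background: np.ndarray,
                     x: np.ndarray, feature_names: list[str]) -> list[tuple[str, float, str]]:
    """Rank one customer's per-feature contributions to the churn log-odds.

    For a linear model, coef_j * (x_j - mean_j) is feature j's contribution
    relative to a customer with average feature values; the contributions sum
    to the difference between the two customers' log-odds.
    """
    baseline = X_background.mean(axis=0)
    contributions = model.coef_[0] * (x - baseline)
    order = np.argsort(-np.abs(contributions))  # strongest effects first
    return [
        (feature_names[j], float(contributions[j]),
         "red" if contributions[j] > 0 else "blue")  # red raises churn risk, blue lowers it
        for j in order
    ]


# Hypothetical standardized features and labels, for illustration only.
feature_names = ["revenue_trend", "support_tickets", "active_seats"]
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (0.8 * X[:, 1] - 1.2 * X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)
model = LogisticRegression().fit(X, y)

customer = np.array([-1.5, 2.0, 0.1])
for name, value, color in explain_customer(model, X, customer, feature_names):
    print(f"{name:>16}: {value:+.2f} ({color})")
```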

Feature Importance

This chart shows the importance of each variable in the trained ML model.

The variables that matter most in determining whether a customer decides to leave appear at the very top.

This type of report helps you find the key factors that influence a customer's decision to leave.

It can also reveal important factors that no one has paid attention to before.
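
Importance rankings like this can be produced with scikit-learn. The sketch below uses synthetic data in which, by construction (`shuffle=False`), only the first two of five hypothetically named features carry signal, and compares the random forest's built-in impurity importances with permutation importance on a held-out set. It illustrates the technique and is not the project's model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names; with shuffle=False only the first two columns are informative.
feature_names = ["pct_change_last_3m", "support_tickets", "active_seats",
                 "region_code", "tenure_months"]
X, y = make_classification(n_samples=800, n_features=5, n_informative=2, n_redundant=0,
                           shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based importances: free with tree ensembles, computed on training data.
impurity = sorted(zip(feature_names, model.feature_importances_),
                  key=lambda kv: kv[1], reverse=True)

# Permutation importance: drop in held-out accuracy when one feature is shuffled.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
permutation = sorted(zip(feature_names, result.importances_mean),
                     key=lambda kv: kv[1], reverse=True)

for name, score in permutation:
    print(f"{name:>20}: {score:.3f}")
```

Impurity-based importances can overstate high-cardinality or noisy features, so checking them against permutation importance on held-out data is a common safeguard before drawing business conclusions.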

Customer Retention Analysis & Churn Prediction: Results

  • A customized machine learning model that predicts churn with an accuracy of up to 90%, helping to prevent it,
  • Business departments have at their disposal a tool that integrates various data sources and presents conclusions in one convenient interface (UI),
  • The time needed to analyze an individual customer's activity in order to understand why they opted out has been reduced,
  • The implemented system saves working time by processing data on tens of thousands of customers daily and drawing conclusions automatically,
  • The efficiency and performance of the customer service department have been improved by increasing the level of automation in the company,
  • The implemented solutions help to react quickly to retain customers, resulting in increased retention and reduced overall customer acquisition costs.

