Predictive Analytics 101: A Beginner’s Guide

Picture this: You are about to present the most recent data analysis project to company executives. Your dataset has the potential to influence new marketing campaigns, develop RFP material, and spur new sales. It is sitting on the cloud for easy access and interpretation. You even have a dashboard with visualisations that perfectly illustrate the dataset’s immense power. You are going to make waves.

Five minutes into the slides, an exec interrupts you, “How will this data change in the future?” Before you can answer, another exec says, “How do we know this dashboard is really telling us everything?”

Taken aback, you stop to think. The data you are showing these execs is accurate — your QA team was knee-deep in testing for months. But can you really say if and how this data will change? The dataset and dashboard are just snapshots in time. No one can predict the future.

But what if you could get close? Modern brands need more than just point-in-time reporting. They need to mitigate future risk, increase sales and customer satisfaction, and streamline their processes. To do so, companies in countless industries are turning to predictive analytics. Harnessing the power of predictive analytics means understanding its current applications, its intersection with the cloud, and the science behind it.

What is predictive analytics?

Predictive analytics is the practice of aggregating and analysing historical data to anticipate future outcomes. Aggregating multiple datasets connects the dots between different departments, business processes, and types of data (structured vs. unstructured).

However, simply aggregating various data points does not necessarily indicate future behaviour. Predictive analytics leverages statistical techniques like data modelling, machine learning, and even artificial intelligence to uncover patterns in big data.

While these patterns cannot predict exactly what will happen in the future, predictive analytics can identify trends, herald disruptive industry changes, and allow for more data-driven decision making.

Practical applications of predictive analytics

Any field that captures data is a candidate for predictive analysis. Everything from enhancing cybersecurity to developing more targeted marketing to strengthening actuarial performance is fair game.

Predictive analytics in healthcare

Healthcare is a primary use case for predictive analytics. A major issue in healthcare is the difficulty of predicting patient risk. Actuarial teams need to establish optimal insurance rates and prepare governmental reimbursement requests for members with various health issues.

Due to this need, health insurance agencies were some of the first companies to adopt big data practices. Actuaries use predictive analytics to determine things like a patient’s predisposition for developing a worsened condition, or a patient’s likelihood of participating in sponsored wellness activities.

Predictive analytics allows health insurance companies to examine patterns of risk among patients of similar age, with similar conditions, and with similar social determinants of health. Armed with this information, health insurance companies are able to make more informed financial and ethical decisions.

Predictive analytics in finance

Lending, a key function of the financial services industry, has been revolutionised by predictive analytics. Before a bank gives out a loan, it wants to make sure that the customer is trustworthy. Ultimately, it wants its money back. So how do underwriters gauge that trust?

Until several years ago, underwriters would judge an applicant based on past performance and personal hunches. Underwriters would review the applicant’s history and debt-to-income ratio to arrive at an interest rate through a convoluted process. As new financial laws emerged, lenders had to develop a more statistically relevant method for underwriting.

The lending industry underwent a revolution when third-party predictive analytics models like VantageScore and FICO Score became available. These models allowed lenders to calculate accurate risk-based interest pricing, and limited subjective bias. Instead of basing interest rates on a few outdated metrics, the VantageScore and FICO Score models are based on the performance of millions of borrowers with similar spending tendencies.

Predictive analytics in the real world

While hypothetical use cases are interesting, what about real-world applications of predictive analytics?

Meeting the customer where they shop

Children’s clothing and accessories retailer Tape à l'œil has a base of 250 stores in France, 25 in Belgium, and 11 in Poland, and a partnership network that operates in the Middle East, North Africa, and overseas. Guillaume Porquier, Information Systems Director at Tape à l'œil, describes the critical importance of predictive analytics: “If a product is not available at time T, the sale is lost. You run the risk both that the customer will turn to another retailer, and that the product will have to be sold at a discount at the end of the season.” 

Internal and external data provide the stores with insight into customer behaviours and buying habits. This information can then be incorporated into predictive models. With the data collected online, the company can also enhance its customer marketing data with customer journey analysis and predictive analytics. 

The company can now capture digital traffic data from European merchant websites and post KPIs for the management team. They also capture customer feedback through surveys from different angles, such as customer level of satisfaction. Finally, they retrieve campaign results from Facebook and Instagram to cross-reference them in the data lake, making all this information available to the various teams who need the data. 

Engaging users with the right recommendations

Lenovo, a personal technology company that produces PCs and smartphones, serves customers in more than 160 countries. Faced with the challenge of standing out in a highly competitive industry, Lenovo realised that offering innovative products was not enough. They needed to create new categories of products to enhance the customer experience.

In order to be highly effective, Lenovo set a goal to understand customer needs by using data sets outlining their expectations, behaviours, and preferences. To do so, the company developed a channel-agnostic and real-time predictive analytics practice that involved acquiring data from a variety of touch points. This predictive analytics model helped Lenovo improve the customer experience and achieve an 11% increase in revenue per retail unit.

Establishing a 360 view of the customer

Air France–KLM is a world leader in its three main business lines: passenger transportation, cargo transportation, and aeronautics maintenance. With 90 million annual customers and 2.5 million monthly unique web visitors, Air France–KLM treats data management as a key priority for maintaining customer satisfaction.

Leveraging their data, Air France–KLM developed a 360° customer approach based on predictive analytics. From providing call centre agents with a complete customer history to sending targeted promotional offers and launching customer service bots, the company created an exceptional customer experience by anticipating needs. In fact, Air France–KLM went as far as identifying customers’ main stress factors and creating a proactive plan of action to mitigate any potential issues.

How predictive analytics works

Predictive analytics seems like magic, but it stems from statistical science. At its core, predictive modelling assigns a weight or score to particular variables in a large dataset. These scores are then combined to calculate the probability of a certain event occurring in the future.
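To make the idea of weights and probabilities concrete, here is a minimal sketch in Python. It assumes a logistic scoring function, which is one common way to turn a weighted score into a probability and not necessarily the method any particular tool uses. The variable names, weights, and the loan-default scenario are all hypothetical; in practice the weights would be fitted to historical data.

```python
import math

# Hypothetical weights for illustration only. A real model would learn
# these values from historical data instead of having them hard-coded.
WEIGHTS = {"missed_payments": 0.5, "years_as_customer": -0.25}
INTERCEPT = -1.0


def score(record):
    """Combine the weighted variables into a single score."""
    return INTERCEPT + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)


def probability(record):
    """Map the score onto a 0-1 probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score(record)))


# Hypothetical customer: 3 missed payments, 2 years with the company.
example = {"missed_payments": 3, "years_as_customer": 2}
print(probability(example))  # score is 0.0, so this prints 0.5
```

A positive weight pushes the probability up as the variable grows, and a negative weight pulls it down. Here, each missed payment raises the estimated chance of default, while each year of customer history lowers it.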

There are two main statistical modelling approaches used in predictive analytics: classification models and regression models.

Classification models

Classification models are typically binary. For example, one company might be interested in member enrolment. A classification model will tell you if a member is likely to either stay with the company or disenroll in a given timeframe, based on certain criteria. 
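The enrolment example can be sketched in a few lines of Python. This sketch uses a k-nearest-neighbours classifier, chosen here only because it is short enough to write by hand; the article does not say which classifier a company would use. The member history, the two variables, and the labels are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical history: (months_enrolled, complaints_last_year) -> outcome.
HISTORY = [
    ((36, 0), "stay"),
    ((48, 1), "stay"),
    ((30, 0), "stay"),
    ((3, 4), "disenroll"),
    ((6, 3), "disenroll"),
    ((2, 5), "disenroll"),
]


def classify(member, k=3):
    """Label a member by majority vote among the k most similar past members."""
    nearest = sorted(HISTORY, key=lambda row: math.dist(row[0], member))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((40, 1)))  # long-standing member with few complaints
print(classify((4, 4)))   # new member with several complaints
```

The output is always one of two labels, which is what makes this a binary classification. A production model would also scale the variables so that one measured in months does not dominate one measured in single digits.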

Regression models

Regression models are less black and white. Instead of a 0 or 1, regression models will predict an actual number. Consider an example in healthcare: let’s say a member has a BMI of 29. A regression model might predict that the member’s BMI could drop 3 points in the next year with a consistent, healthy diet.
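As a rough illustration of predicting a number, the sketch below fits a straight line to a handful of BMI readings using ordinary least squares and then extrapolates to month 12. The readings are invented for this example, and a single-variable straight line is an assumption made for brevity; a real model would use far more data and more variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for a single independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical readings: months on a healthy diet and the member's BMI.
months = [0, 3, 6, 9]
bmi = [29.0, 28.0, 27.5, 26.5]

slope, intercept = fit_line(months, bmi)


def predict_bmi(month):
    """Predicted BMI after a given number of months."""
    return intercept + slope * month


print(round(predict_bmi(12), 2))  # about 25.75, roughly 3 points below the start
```

Unlike the classification case, the result is a value on a continuous scale, so the model can say how much the BMI is expected to change and not only whether it will.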

Three techniques for predictive analytics: decision trees, regression, and neural networks

There are several techniques data scientists use to construct classification and regression models, namely decision trees, regression, and neural networks.

  1. Decision trees visually represent a path of choices. Each internal node of the tree tests a variable, each branch is one possible outcome of that test, and each leaf is a classification (a yes or no). Decision trees are one of the more attractive techniques for modelling because they are simple to comprehend and many implementations can handle missing values.
  2. Regression is another popular modelling tool. As discussed earlier, regression is typically used with continuous data as opposed to binary data. Different data questions require different applications of regression. For instance, simple linear regression is used when a single independent variable is used to explain an outcome. If multiple independent variables have an effect on an outcome, multiple regression is most appropriate. Logistic regression does not follow the same convention as linear and multiple regression: it is used when the dependent variable is binary. For example, a logistic regression could be used to evaluate how the odds of a patient having a heart attack (binary variable) change with each additional BMI point (continuous variable).
  3. Neural networks represent the final, most complicated technique. This method is becoming more and more in-demand because perfectly linear relationships are rare in nature. Neural networks allow for more sophisticated pattern recognition by learning non-linear relationships through layers of interconnected nodes.
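Of the three techniques, the decision tree is the easiest to show in code. The sketch below walks a small hand-built tree for the earlier enrolment example. The questions, thresholds, and labels are hypothetical; in practice a learning algorithm would derive the tree from historical data, and this sketch does not handle missing values.

```python
# Hypothetical hand-built tree. Internal nodes ask a question about one
# variable; leaves carry the final classification.
TREE = {
    "question": "complaints_last_year",
    "threshold": 2,
    "if_below": {"leaf": "stay"},
    "if_at_or_above": {
        "question": "months_enrolled",
        "threshold": 12,
        "if_below": {"leaf": "disenroll"},
        "if_at_or_above": {"leaf": "stay"},
    },
}


def predict(node, member):
    """Follow branches from the root until a leaf is reached."""
    while "leaf" not in node:
        value = member[node["question"]]
        if value < node["threshold"]:
            node = node["if_below"]
        else:
            node = node["if_at_or_above"]
    return node["leaf"]


member = {"complaints_last_year": 3, "months_enrolled": 6}
print(predict(TREE, member))  # prints "disenroll"
```

Because every prediction is just a path of yes-or-no questions, the reasoning behind any single result can be read straight off the tree, which is a large part of why this technique is considered simple to comprehend.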

Although these statistical methods are not new, they are being more widely accepted and used. This can be attributed to the rise in popularity of the cloud.

Big data, the cloud, and the future of predictive analytics

Before the cloud, predictive analytics seemed impossible. Computers did not have the capacity to house petabytes of data, let alone have enough processing power to run labyrinthian data models. The cloud offers companies a way to compile and combine multiple, huge data sets and easily scale their models.

There are many emerging, cloud-based predictive analytics products. In the future, the cloud will enable companies to build their own machine learning models. Machine learning teaches a computer to find patterns in the data, which reduces manual work and allows for greater interpretation and extrapolation.

The cloud also allows for more customisation and flexibility. With the advent of the internet of things (IoT) on the cloud, predictive analytics tools could get even more granular in their assessment of people’s everyday habits.

Modern predictive analytics software and tools

Now that companies are able to readily retrieve large datasets from the cloud, there is plenty of room for big data analysis, and there are many cloud-based predictive analytics software options on the market. While it is vital to have a team of experts to interpret data models, software is necessary to reduce the time needed to collect, clean, and analyse data. Predictive analytics software can digest both stored and real-time data, and assist in appropriate formatting.

In addition, most cloud-based predictive analytics software integrates well with ERP systems, digital analytics software, and business intelligence platforms that most companies already have. Business intelligence teams can also use predictive analytics software to demonstrate the value of predictive analytics in visual form via dashboards.

Talend is an example of big data software that is universally applicable. Since Talend is an open-source integration platform, it is versatile enough to assist in data preparation, data management, and cloud integration. As companies mature and develop their predictive analytics practice, the first task will be to migrate their data to the cloud.

Ready to get started? Try Talend Data Fabric today to start transforming your company’s data.
