Feature stores are an important part of the modern data-driven landscape. They are a great way to improve the efficiency and productivity of your team. But what are feature stores? Does your organisation need one, and why? And when should you start building one?
Feature engineering is the process of transforming raw data into features that can be used by machine learning (ML) models for predictions and analytics. It involves extracting important information from the data, cleaning it up, and converting it into a format that the model can understand.
A feature is a specific attribute of data that is useful for modelling. Features are usually built from aggregations: sums, averages, minimums, maximums, and so on. These aggregations then serve as the inputs a machine learning model uses to predict something.
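As a minimal sketch of the aggregations described above, the following Python snippet derives per-customer features from a handful of raw transactions. The transaction records, the field names and the `aggregate_features` helper are hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical raw transactions: (customer_id, amount)
transactions = [
    ("c1", 20.0),
    ("c1", 35.0),
    ("c1", 5.0),
    ("c2", 100.0),
    ("c2", 300.0),
]


def aggregate_features(rows):
    """Group amounts by customer and compute simple aggregate features."""
    amounts = defaultdict(list)
    for customer_id, amount in rows:
        amounts[customer_id].append(amount)
    return {
        customer_id: {
            "total_spend": sum(values),
            "avg_spend": sum(values) / len(values),
            "min_spend": min(values),
            "max_spend": max(values),
        }
        for customer_id, values in amounts.items()
    }


features = aggregate_features(transactions)
print(features["c1"])
```

Each customer ends up with one row of numeric features, which is the shape most models expect as input.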
Uber’s ML engineering team built and popularised feature stores back in 2017 when they introduced Michelangelo, an ML-as-a-service platform, which made building, deploying and operating ML solutions a bearable process. Today you can find a wide range of competing feature stores, each with their own unique benefits and capabilities. For example, Google has one in Vertex AI and Amazon has one in SageMaker.
Features can be numeric or categorical and can be extracted from data in several ways. Take, for example, a model built to predict fraudulent transactions. A relevant feature might be whether a person’s spending habits seem unusual, or whether they’ve made any purchases in a different country.
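The fraud example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production fraud check: the `fraud_features` function, its parameters and the sample values are all hypothetical, and "unusual spending" is approximated here by how far a new amount sits from the customer's historical average.

```python
from statistics import mean, pstdev


def fraud_features(history, new_amount, new_country, home_country):
    """Derive one numeric and one yes/no feature for a new transaction.

    history: non-empty list of the customer's past transaction amounts
    (hypothetical data).
    """
    avg = mean(history)
    spread = pstdev(history)
    # Numeric feature: distance of the new amount from the average,
    # measured in standard deviations. Falls back to 0.0 when all past
    # amounts are identical and the spread is therefore zero.
    deviation = (new_amount - avg) / spread if spread > 0 else 0.0
    return {
        "spend_deviation": deviation,
        # Yes/no feature: was the purchase made outside the home country?
        "is_foreign_purchase": new_country != home_country,
    }


example = fraud_features(
    [10.0, 20.0, 30.0], new_amount=80.0, new_country="FR", home_country="ZA"
)
print(example)
```

A model would receive both values as inputs; neither feature decides on its own whether a transaction is fraudulent.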
A feature store, then, is a centralised data management system that lets you store, manage and distribute features to ML models. Used well, a feature store can improve the accuracy of your models, save time and increase productivity. More importantly, it reduces the amount of time that data scientists spend on discovering and calculating features that are often repeated within the same company.
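To make "store, manage and distribute" concrete, here is a toy in-memory sketch in Python. It is not how any particular product works: the `InMemoryFeatureStore` class and its method names are hypothetical, and real feature stores add persistence, versioning and low-latency serving on top of this basic idea.

```python
class InMemoryFeatureStore:
    """Toy feature store: register feature functions once, then reuse them."""

    def __init__(self):
        self._definitions = {}  # feature name -> function computing it from raw data
        self._values = {}       # (entity_id, feature name) -> stored value

    def register(self, name, func):
        """Manage: record how a feature is computed, under a unique name."""
        if name in self._definitions:
            raise ValueError(f"feature {name!r} is already registered")
        self._definitions[name] = func

    def materialise(self, entity_id, raw):
        """Store: compute every registered feature for one entity."""
        for name, func in self._definitions.items():
            self._values[(entity_id, name)] = func(raw)

    def get_features(self, entity_id, names):
        """Distribute: return stored features for training or serving."""
        return {name: self._values[(entity_id, name)] for name in names}


store = InMemoryFeatureStore()
store.register("total_spend", sum)
store.register("max_spend", max)
store.materialise("c1", [20.0, 35.0, 5.0])
row = store.get_features("c1", ["total_spend", "max_spend"])
print(row)
```

Because every model asks the store for a feature by name, the calculation is written once and shared rather than repeated by each team.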
Bear in mind that feature stores aren’t necessary for every company: if you’re not doing any ML, or only have a small amount of slow-changing data, then you can probably stop reading here.
Key reasons
While there are plenty of reasons to invest in a feature store, here are the key ones:
Repeatability
As soon as you find yourself doing the same thing repeatedly, it is worth writing code that does it for you. The same applies to features. Instead of your data scientists starting their modelling from scratch, a feature store lets them reuse features already developed for similar models, saving time and money. By extracting your features into a reusable format, you avoid rebuilding them every time you want to use them.
Centralisation
As we all know, data often sits in various spaces around organisations. Having a feature store centralises your data and makes it easier to access. If everyone connects to the same data source, for the visualisations and for the ML models, it becomes your single source of truth.
Auditability
Biased data and algorithms skew decision making in ways that might disadvantage certain interest groups. For example, a recommendation engine for a news agency could develop a bias towards content that drives “likes” — and therefore more newsfeed time — because it stirs an angry emotional response from an audience. With a feature store, you can easily identify what data your model has been trained on and compare that to the actual feeds it’s receiving.
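One simple way to compare training data with live feeds is to check whether a feature's average has moved. The Python sketch below assumes you can pull the same feature from both sources; the `mean_shift` helper, the sample values and the 25% threshold are hypothetical choices for illustration, and a real audit would look at more than the mean.

```python
from statistics import mean


def mean_shift(training_values, live_values):
    """Relative change in a feature's mean between training and live data."""
    train_mean = mean(training_values)
    live_mean = mean(live_values)
    if train_mean == 0:
        raise ValueError("training mean is zero; relative shift is undefined")
    return (live_mean - train_mean) / abs(train_mean)


# Hypothetical values of one feature, as seen in training and in production
training = [10.0, 12.0, 11.0, 9.0]
live = [15.0, 16.0, 14.0, 18.0]

DRIFT_THRESHOLD = 0.25  # assumed tolerance: flag shifts larger than 25%

shift = mean_shift(training, live)
drifted = abs(shift) > DRIFT_THRESHOLD
print(f"relative shift: {shift:.2f}, drifted: {drifted}")
```

A flagged feature is a prompt to investigate, not proof of bias: it shows only that the model now sees data unlike the data it was trained on.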
Cost and time benefits
Feature stores make data easily accessible to analysts and scientists so that they can build models and analyse results. This includes data that is stored in different formats or locations. Feature stores can lower computing costs, too. When data is dispersed across different locations or formats, it can be difficult and expensive to process. A feature store helps to consolidate all that data into a manageable format, making things easier and more affordable.
Weighing up whether to build a feature store? Learn more via our free downloadable Feature Store e-book written by our machine learning experts at Teraflow.
- The authors, Christian Viljoen and Dominic Kafka, are with Teraflow
- This promoted content was paid for by the party concerned