We’re actually in the process of deploying our feature store using a combination of Feast and Dagster’s SDAs! From my point of view, the components of a feature store are:
1. Feature catalog: Allows you to codify, version-control, and share a central catalog of features across the org/team.
2. Offline store: Facilitates the creation of training datasets with point-in-time correctness for features/labels.
3. Online store: Low-latency storage layer for inference.
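To make #2 concrete: point-in-time correctness means a training row labeled at time T may only use feature values that were already known at or before T, so no future information leaks into training. A minimal plain-Python sketch of that lookup (made-up entity and values, not Feast’s actual API):

```python
from datetime import datetime

# Hypothetical feature rows: (entity_id, event_timestamp, value)
feature_rows = [
    ("driver_1", datetime(2024, 1, 1), 0.2),
    ("driver_1", datetime(2024, 1, 3), 0.5),
    ("driver_1", datetime(2024, 1, 5), 0.9),
]

def point_in_time_value(entity_id, as_of, rows):
    """Return the latest feature value at or before `as_of` (no leakage)."""
    candidates = [(ts, v) for e, ts, v in rows if e == entity_id and ts <= as_of]
    return max(candidates)[1] if candidates else None

# A label observed on Jan 4 must only see features known by Jan 4,
# so it picks up the Jan 3 value (0.5), not the Jan 5 one (0.9).
print(point_in_time_value("driver_1", datetime(2024, 1, 4), feature_rows))
```

The offline store does exactly this kind of as-of join for you, at scale, when generating training datasets.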
Dagster’s SDAs are amazing for creating your catalog of data assets, which includes features as a subset, so they somewhat cover #1.
However, one of the big value-adds of a feature store such as Feast is that #2 and #3 are built directly through Feast’s API on the exact same datasets, which ensures there is no training-serving skew caused by using different datasets for training and serving. Feast handles the ingestion of batch features into your online store, but it also supports streaming features (and pushes them down to your offline store to generate future training datasets), which isn’t something you would typically handle within Dagster’s SDAs, from my POV.
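The skew argument is worth making concrete: because both stores are populated from the same feature definitions, the training path and the serving path apply identical logic. A toy illustration of that idea (hypothetical feature and field names, not Feast code):

```python
# Hypothetical shared feature definition. Because both the offline
# (training) and online (serving) paths call this single function,
# they cannot drift apart -- no training-serving skew.
def trips_per_hour(total_trips, hours_online):
    return total_trips / hours_online if hours_online else 0.0

offline_row = {"total_trips": 12, "hours_online": 4}  # from the offline store
online_row = {"total_trips": 3, "hours_online": 1}    # from the online store

training_feature = trips_per_hour(**offline_row)
serving_feature = trips_per_hour(**online_row)
```

The failure mode a feature store guards against is reimplementing `trips_per_hour` twice (once in a batch pipeline, once in the serving service) and having the two copies silently diverge.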
We wrote an internal package (still at a very early stage) to integrate Feast with SDAs. It essentially generates Dagster assets from a Feast repository and automatically creates the lineage to your underlying assets, similar to what the `load_assets_from_dbt_project` method does in the `dagster-dbt` package. We can also schedule Dagster jobs that update features in our online store through Feast’s API, leveraging the SDA lineage (which is extremely powerful!); this lets us update our online store whenever the underlying data has been refreshed by other systems in our stack.
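For a flavor of the generation pattern (heavily simplified, with made-up names; this is not our actual package nor Dagster’s real API): iterate over the feature views found in a Feast repository and emit one asset definition per view, with a dependency edge pointing at the table its batch source reads from — that edge is what gives you the lineage.

```python
# Hypothetical sketch: derive asset specs (name + upstream deps) from the
# feature views in a Feast repository. All names here are invented for
# illustration; a real integration would walk the repo via feast's API
# and emit actual Dagster asset definitions.
feature_views = [
    {"name": "driver_hourly_stats", "source_table": "raw_driver_events"},
    {"name": "trip_stats", "source_table": "raw_trip_events"},
]

def asset_specs_from_feast_repo(views):
    """One asset spec per feature view, wired to its upstream source table."""
    return [{"asset_key": v["name"], "deps": [v["source_table"]]} for v in views]

specs = asset_specs_from_feast_repo(feature_views)
for spec in specs:
    print(spec["asset_key"], "<-", spec["deps"][0])
```

With the lineage in place, a downstream job (e.g. one that materializes features into the online store) can be triggered whenever an upstream asset is refreshed, which is the scheduling behavior described above.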
We’re still very early in the implementation of the above, but I’m happy to answer any follow-up questions you have!