I was going to start digging into feature stores but having dagster #random

George Pearse

06/27/2022, 3:33 PM

I was going to start digging into feature stores, but having just had a peek at Feast, I've realised that the API and functionality can be very similar to Software Defined Assets in Dagster (maybe with a more column-centric, rather than table / asset-centric, view). Would I get much in the way of additional benefits from adopting one?

George Pearse

06/27/2022, 3:34 PM

Is it Dagster SDAs + a storage solution with fast serving?

sandy

06/27/2022, 3:42 PM

My understanding is that Feast's value-add is in helping you align feature definitions between:

• An offline store that's used for model training
• An online store that's used for model serving

I believe @Charles Lariviere is using Feast together with software-defined assets, so he might have more intelligent things to say about how they fit together than I do. We've internally discussed a Feast + SDA integration.
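To make the offline / online alignment concrete, here is a minimal pure-Python sketch. It does not use Feast's API; the event data and the names `avg_trip_minutes`, `build_offline_store`, and `materialize_online_store` are all hypothetical. The point is only that both stores are derived from one feature definition, so training and serving cannot drift apart.

```python
from datetime import datetime

# Hypothetical raw events: (driver_id, event_time, trip_minutes).
EVENTS = [
    ("d1", datetime(2022, 6, 1), 10.0),
    ("d1", datetime(2022, 6, 2), 20.0),
    ("d2", datetime(2022, 6, 1), 30.0),
]


def avg_trip_minutes(rows):
    """The single feature definition shared by training and serving."""
    values = [minutes for _, _, minutes in rows]
    return sum(values) / len(values)


def build_offline_store(events):
    """Offline store: one feature value per (entity, timestamp), for training."""
    offline = []
    for driver_id, ts, _ in events:
        # Only events for this entity that happened at or before `ts`.
        history = [e for e in events if e[0] == driver_id and e[1] <= ts]
        offline.append(
            {
                "driver_id": driver_id,
                "event_time": ts,
                "avg_trip_minutes": avg_trip_minutes(history),
            }
        )
    return offline


def materialize_online_store(offline_rows):
    """Online store: only the latest value per entity, for fast lookup."""
    online = {}
    # Later rows overwrite earlier ones, so each key ends at its newest value.
    for row in sorted(offline_rows, key=lambda r: r["event_time"]):
        online[row["driver_id"]] = row["avg_trip_minutes"]
    return online


offline_store = build_offline_store(EVENTS)
online_store = materialize_online_store(offline_store)
```

Because the online store is materialized from the offline rows rather than computed by separate serving code, a change to `avg_trip_minutes` reaches both at once.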

❤️ 1

Charles Lariviere

06/27/2022, 8:11 PM

We’re actually in the process of deploying our feature store using a combo of Feast and Dagster’s SDAs! From my point of view, the components of a feature store are:

1. Feature catalog: Allows you to codify, version-control, and share a central catalog of features across the org/team.
2. Offline store: Facilitates the creation of training datasets with point-in-time correctness for features/labels.
3. Online store: Low-latency storage layer for inference.

Dagster’s SDAs are amazing for creating your catalog of data assets, which includes features as a subset, so they somewhat cover #1. However, one of the big value-adds of a feature store such as Feast is that #2 and #3 are built directly through Feast’s API on the exact same datasets, which ensures that there is no training-serving skew caused by using different datasets for training and serving. Feast handles the ingestion of batch features into your online store, but also supports streaming features (and then pushes them down to your offline store to generate future training datasets), which isn’t something you would typically handle within Dagster’s SDAs, from my POV.

We wrote an internal package (which is at a very early stage) to integrate Feast + SDA. It essentially generates Dagster assets from a Feast repository and automatically creates the lineage to your underlying assets, similar to what the `load_assets_from_dbt_project` function does in the `dagster-dbt` package. We can also schedule Dagster jobs that update features in our online store through Feast’s API, leveraging the SDA lineage (which is extremely powerful!). That allows us to update our online store when the underlying data has been refreshed by other systems in our stack.

We’re still very early in the implementation of the above, but I’m happy to answer any follow-up questions you have!
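The "point-in-time correctness" in component #2 can be sketched in plain Python. This is an illustration of the idea only, not Feast's implementation; the data and the `point_in_time_join` helper are hypothetical. Each label row is paired with the most recent feature value recorded at or before the label's timestamp, so no future information leaks into training.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history per entity, sorted by timestamp:
# (timestamp, feature_value).
FEATURE_HISTORY = {
    "user_1": [
        (datetime(2022, 6, 1), 3),
        (datetime(2022, 6, 10), 7),
        (datetime(2022, 6, 20), 12),
    ],
}

# Hypothetical label events: (entity_id, label_time, label).
LABELS = [
    ("user_1", datetime(2022, 6, 5), 0),
    ("user_1", datetime(2022, 6, 15), 1),
    ("user_1", datetime(2022, 5, 1), 0),  # earlier than any feature value
]


def point_in_time_join(labels, feature_history):
    """Attach to each label the latest feature value known at label time."""
    rows = []
    for entity_id, label_time, label in labels:
        history = feature_history.get(entity_id, [])
        timestamps = [ts for ts, _ in history]
        # Number of feature rows with timestamp <= label_time.
        idx = bisect_right(timestamps, label_time)
        # No feature value existed yet: leave it missing instead of
        # borrowing a value from the future.
        feature = history[idx - 1][1] if idx > 0 else None
        rows.append((entity_id, label_time, feature, label))
    return rows


training_rows = point_in_time_join(LABELS, FEATURE_HISTORY)
```

A naive join on entity ID alone would give every label the newest value (12 here), which is exactly the leakage a feature store's offline retrieval is meant to prevent.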

🍗 1

🤯 2

George Pearse

06/27/2022, 8:18 PM

Beautiful answer, cheers @Charles Lariviere. Do you know if feature stores are particularly low latency for embeddings / vectors?

🙌 1

Charles Lariviere

06/27/2022, 8:28 PM

Are you referring to nearest neighbour search on embeddings? I don't think that's something that Feast currently supports but I may be wrong; it's more of a key-value lookup (i.e. get features for a given entity). You can do ANN on Redis though (which is one of the available backends for Feast’s online store).
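The difference between the two access patterns can be sketched in plain Python. Everything here is hypothetical (the `ONLINE_STORE` contents and both helper names), and the search is an exact brute-force scan; an ANN index such as the ones vector databases provide approximates the same ranking without scanning every vector.

```python
import math

# Hypothetical online store: entity ID -> embedding.
ONLINE_STORE = {
    "item_a": [1.0, 0.0],
    "item_b": [0.9, 0.1],
    "item_c": [0.0, 1.0],
}


def get_online_features(entity_id):
    """Key-value lookup: you must already know which entity you want."""
    return ONLINE_STORE[entity_id]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbours(query, k=2):
    """Similarity search: rank every stored vector against the query."""
    scored = sorted(
        ONLINE_STORE.items(),
        key=lambda kv: cosine_similarity(query, kv[1]),
        reverse=True,
    )
    return [entity_id for entity_id, _ in scored[:k]]
```

The lookup answers "what are the features of item_a?", while the search answers "which items are most similar to this vector?", which is the question a key-value feature store does not address on its own.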

George Pearse

06/27/2022, 8:29 PM

I'm actually using Qdrant and love it. Just not quite sure how low latency it gets yet. Wasn't sure if feature stores had overlapping functionality with vector databases or not. Qdrant supports positive and negative example recommendations, hybrid search, and collection updates.

Charles Lariviere

06/27/2022, 9:28 PM

Interesting! I didn’t know of Qdrant, I’ll look it up!
