Joel Olazagasti

05/17/2023, 3:47 PM
Does anyone else who uses ingestion-as-code have a use case where the same set of streams needs both an incremental update schedule and a full refresh schedule? We're syncing from Salesforce, and our setup has many formula fields, which slowly drift with incremental syncs, so once a week we'd like to do a full refresh to bring everything up to date. The issue I'm running into is that if I set up 2 connections with different sync patterns in Airbyte, they show up as separate data assets in Dagster. Is there a neat way to coalesce those, or maybe some alternative solution?

Guy McCombe

05/23/2023, 2:02 PM
I’m also interested in this FWIW. Have you had any breakthroughs?

Joel Olazagasti

05/23/2023, 2:09 PM
For right now I just have the full-refresh sync manually configured & scheduled in Airbyte to work around it. My team is actually meeting with Salesforce later today about their native Snowflake integration. My understanding of the product is that you essentially just mount their data lake as a stage in Snowflake, and have real-time zero-copy access to the underlying data. It's unclear if the product is live, but I'm holding off on a more robust solution here until I know I have to. If that option isn't available, I'll probably look into extending the ingestion-as-code library a bit, either adding the ability to define an Airbyte schedule on the connector definition, or a non-asset-based way to create a sync. I think the larger concept of having 2 assets that coalesce to 1 seems a lot more difficult, and would probably require some changes deeper in Dagster's core functionality.
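
Rough sketch of what the non-asset-based route might look like: one scheduled job decides which of the two Airbyte connections to trigger on a given day, so only one logical dataset is ever exposed downstream. This is plain Python and doesn't call any Airbyte or Dagster API. The connection IDs, the Sunday full-refresh day, and the `pick_connection` helper are all made up for illustration.

```python
from datetime import date

# Hypothetical connection IDs; real ones would come from the Airbyte instance.
INCREMENTAL_CONNECTION_ID = "salesforce-incremental"
FULL_REFRESH_CONNECTION_ID = "salesforce-full-refresh"

# Assumed schedule: full refresh once a week on Sunday.
# date.weekday() counts Monday as 0, so Sunday is 6.
FULL_REFRESH_WEEKDAY = 6


def pick_connection(run_date: date) -> str:
    """Return the ID of the connection that should be synced on run_date."""
    if run_date.weekday() == FULL_REFRESH_WEEKDAY:
        return FULL_REFRESH_CONNECTION_ID
    return INCREMENTAL_CONNECTION_ID
```

Whatever actually triggers the sync would take the returned ID and kick off that connection, with both connections writing to the same destination tables.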