Could someone help me understand whether a Dagster graph is a good fit for my use case?
I’m building something similar to Plaid. I’m planning on using a dagster graph to orchestrate my data pipeline. Here’s a simplified version of what it looks like:
1. Extract the user’s bank transactions. This involves logging into their bank with their submitted credentials, scraping their bank transactions, and saving this raw data to bucket storage.
2. Transform the bank transactions. This involves downloading the raw data from bucket storage, transforming it into our standardized format, and saving the result to a Postgres table.
This graph must be run on a per-user basis. There are two cases in which this graph is run:
1. When a user is first created
2. During a daily job to refresh all users’ data
Thus, if I have 1M users, then I would need something that supports 1M concurrent graph runs. Is this achievable with dagster?
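For a rough sense of scale, here is a back-of-the-envelope calculation of how many runs would need to be in flight at once if the daily refresh could be spread over a time window instead of starting all at once. The 60-second run duration and the window lengths are assumptions on my part, not measurements, and `required_concurrency` is just a helper I wrote for this estimate.

```python
import math


def required_concurrency(num_users: int, seconds_per_run: float, window_seconds: float) -> int:
    """Minimum number of simultaneous runs needed to finish every user's run within the window."""
    total_run_seconds = num_users * seconds_per_run
    return math.ceil(total_run_seconds / window_seconds)


# Assumed: 1M users, 60 s per run, refresh spread evenly over 24 hours.
print(required_concurrency(1_000_000, 60, 24 * 3600))  # 695

# Assumed: same workload squeezed into a 1-hour window.
print(required_concurrency(1_000_000, 60, 3600))  # 16667
```

If all users really must be refreshed at the same moment, the number is the full 1M, which is the case I am asking about.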