# integration-dbt
u
hey folks! we’re heavy dbt users running 10+ builds per hour, and we want to move our orchestration layer to Dagster. We’re currently trying to figure out the best Dagster deployment for us. We thought about using Dagster Cloud, but our understanding is that a dbt build would consume Serverless workers that have far more resources than we actually need to run dbt. How do y’all recommend handling dbt-focused Dagster deployments?
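A minimal sketch of what a dbt-focused Dagster deployment might look like with the dagster-dbt integration; the project path and the hourly cron are placeholder assumptions, not details from this thread:

```python
from dagster import AssetSelection, Definitions, ScheduleDefinition, define_asset_job
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project

DBT_PROJECT_DIR = "path/to/dbt_project"  # hypothetical path

# Each dbt model is loaded as a Dagster asset. Dagster only orchestrates;
# the SQL itself runs in the warehouse, so the worker needs few resources.
dbt_assets = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR)

hourly_dbt_job = define_asset_job("hourly_dbt_build", selection=AssetSelection.all())

defs = Definitions(
    assets=dbt_assets,
    schedules=[ScheduleDefinition(job=hourly_dbt_job, cron_schedule="0 * * * *")],
    resources={"dbt": dbt_cli_resource.configured({"project_dir": DBT_PROJECT_DIR})},
)
```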
r
We’d still recommend that you use Dagster Cloud so that you don’t have to worry about managing the infrastructure for the orchestration layer. Re: your question about Serverless workers, that’s really a pricing question; you can check our website for details. You’re correct that in the case of dbt, the worker is just dispatching the compute, which all happens in the data warehouse layer.
u
do you have a sense of how many workers 1 dbt build uses? is it just 1 worker?
r
Just to be clear, in Serverless we don’t charge by number of workers; we charge by compute minute. But yes, a `dbt build` is a single process, and within that it’s threaded. So it’s just one worker.
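To make the “one process, threaded” point concrete, here is a hedged sketch (not from the thread) of a single op shelling out to dbt; the whole build is one step on one worker, and the intra-run parallelism comes from dbt’s own `--threads` flag:

```python
import subprocess

from dagster import job, op

@op
def run_dbt_build():
    # One OS process; dbt runs up to 8 models concurrently via threads.
    subprocess.run(["dbt", "build", "--threads", "8"], check=True)

@job
def dbt_build_job():
    run_dbt_build()
```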
u
gotcha!
> we don’t charge by worker number. We charge by compute minute.
but if I’m doing something heavy I’ll need more than one process/worker, so I’ll be paying for more compute, right?
r
Yeah, if your heavier computations translate into increased step duration for your Dagster jobs, then yes, you’ll pay for that.
u
sounds good!
f
Hey Ricardo. Our pricing model actually just changed due to user feedback. You can read the details here. The punchline is that we’ve dropped the existing compute-duration-based pricing in favor of event-based pricing (called Dagster Credits), plus per-seat pricing above certain limits.
Sorry for the timing, but this happened just last week.
u
hey @Fraser Marlow! thanks for letting me know
unfortunately I think this would be even more expensive for us… We have thousands of dbt models that are materialized every hour, and my understanding is that each 1k dbt models materialized hourly would cost ~$2.4k/month under the new pricing model 😞 And that would be only for orchestration; the real compute happens in Snowflake, which already costs us tens of thousands of dollars.
👍 4
j
I agree with @0xDoing_ that this new pricing doesn't work well for dbt-based projects. We use dbt with BigQuery and have ~1.2k models that we materialize several times a day. In the past 30 days, we materialized 174,991 models. Ignoring op and seat costs, Dagster credits on the Team plan would cost us (174,991 - 10,000) * $0.03 + $100 = $5,049.73. For reference, our entire BigQuery bill last month was $11,129.41. We currently materialize these models in one of our k8s clusters with 2Gi and 200 millicores, so about $10/mo worth of VM compute on GCP with n2 machine types.
👀 2
Not that Dagster's pricing should be crazy low, since it provides a lot of value that we're sold on, but it's difficult to justify the cost of Dagster Cloud to finance when it's ~45% of our current DWH spend.
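For reference, the Team-plan arithmetic above, reproduced as a quick script (assuming, as the formula implies, that the $100/mo base covers the first 10,000 credits):

```python
# Team plan math from the message above: $100/mo base covering the first
# 10,000 credits, then $0.03 per additional credit (rates per the thread).
materializations = 174_991
credits_cost = (materializations - 10_000) * 0.03 + 100
print(f"${credits_cost:,.2f}/mo")  # $5,049.73/mo
```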
f
Thanks for the feedback and the context. This is really helpful as we think this through.
👍🏽 1
👍 1
s
Question for you @0xDoing_ and @Joseph Florencio: what % of your models are views? We're investigating changes that would make it easy to avoid running models that are views if the models haven't changed since the last time they ran. Depending on your setup, that might trim costs significantly?
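One way to approximate this today from the dbt side, sketched under the assumption that production artifacts are available for state comparison; the selector flags are standard dbt CLI syntax, but the paths are hypothetical:

```python
import subprocess

# Option 1: never rebuild views at all.
subprocess.run(
    ["dbt", "build", "--exclude", "config.materialized:view"],
    check=True,
)

# Option 2: rebuild only models whose code changed since the last
# production run (and their downstream deps), via dbt state comparison.
subprocess.run(
    ["dbt", "build", "--select", "state:modified+", "--state", "prod-artifacts/"],
    check=True,
)
```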
u
hey @sandy! I would say ~40% of the models are views or ephemeral. That would help with the costs, but 60% of several thousand bucks is still a lot.
j
@sandy out of those materializations, 73,262 are views. So ~42%.
Full disclosure: we do not use Dagster in production yet. I’ve prototyped a migration for my company, and I’ll be doing the migration sometime this quarter.

My 2 cents: I think the challenge is that not all materializations are equal. A company with a single data analyst can easily create hundreds of dbt models in a short time, many of which will be trivial, essentially mirrors of data loaded from providers like Segment/Rudderstack/Fivetran/etc. On the other side you have pipelines written in frameworks like Airflow that are substantially more difficult to write, require some amount of compute, and that you’ll typically have far fewer of. This is super reasonable pricing for our Airflow-style jobs. It doesn’t feel as reasonable for dbt.

I’ve been debating between Dagster OSS and Cloud Hybrid. In the past 30 days we spent 12,091 minutes invoking dbt, which under the previous Dagster pricing I assume would map to 12,091 * $0.03 = $362.73/mo, or ~$4.35k/yr. The new pricing would make it $60k/yr without view optimizations, or $36k/yr with them, just for dbt invocations. For smaller companies like mine, this essentially takes the cloud offering off the table.

Compared to SaaS pricing for GitLab, GitHub, Astronomer, GCP Cloud Composer, dbt, and others, this feels like double paying: once for the premium per-seat pricing, and a second time for these tiny materializations even when you bring your own compute.

FWIW, I can completely understand why you came up with this new pricing model, and it likely makes sense in a lot of cases. I don’t have a proposal for a pricing improvement that can’t be abused and doesn’t penalize dbt users so much. Just wanted to relay my feedback.
👍 2
g
I'm in the same boat: plenty of dbt materializations and little compute time. We'll try to trim our pipelines to avoid useless materializations, but I'm not sure it will help significantly. In a perfect world my business users wouldn't mind daily `dbt run`s, but I lost that battle to hourly ones, and that leads to way too many materializations. :sadblob:
d
hey everyone, just wanted to acknowledge the feedback. Thanks for taking the time to share with us; we’re reading everything and discussing.
daggy love 3
☝️ 1
h
Hi everyone, I'm exploring the new event-based pricing model. @Dagster Jarred and @Fraser Marlow: in my project, I use dynamic mapping and collect to run my ops (https://docs.dagster.io/_apidocs/dynamic). How does Dagster count credits when I use map and collect?
d
hey hoang! this would be a great question for the primary thread since other people might also ask it, but the map step clones the mapped op for each separate dynamic output, and each executed op there would consume 1 credit
👍 1
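A minimal sketch of the pattern under discussion: the fan-out op yields one DynamicOutput per item, `.map()` clones the downstream op per output (one credit each, per the answer above), and `.collect()` fans back in:

```python
from dagster import DynamicOut, DynamicOutput, job, op

@op(out=DynamicOut())
def fan_out():
    # Three dynamic outputs -> three cloned `process` ops at runtime.
    for i in range(3):
        yield DynamicOutput(i, mapping_key=str(i))

@op
def process(item: int) -> int:
    return item * 2

@op
def merge(results: list) -> int:
    return sum(results)

@job
def dynamic_job():
    merge(fan_out().map(process).collect())
```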
h
Thank you @Dagster Jarred for confirming it.
y
@Dagster Jarred I think we're in the same situation regarding the new pricing. Currently we use Prefect Cloud, but we're looking for an alternative since we don't have the dev time to manage the infra, which is a little flaky (ECS tasks).

We have an ETL job from Mongo to Snowflake. It runs every 15 minutes and syncs around 80 collections to Snowflake. We want to use a single @job that runs 80 @ops. I assume @ops are a good fit here since they allow parallel execution.

Let's say the job runs 3 times an hour. Calculating the number of ops per month: 3 * 80 * 730 = 175,200 ops, which would cost 175,200 * $0.03 = $5,256 per month. (Even if we cut down to syncing every hour, we're at $1,752 per month.) In Prefect this would be super cheap, as you're not billed per task (a task is like an op).

Am I missing something here regarding the use case and my calculations?
👍 1
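A hedged sketch of the shape described above: one @job fanning out to one @op per collection via a small factory, so the syncs can run in parallel under the default multiprocess executor. Collection names and the sync body are placeholders:

```python
from dagster import job, op

COLLECTIONS = ["users", "orders", "events"]  # ~80 collections in practice

def make_sync_op(collection: str):
    # Factory gives each op a unique name so 80 of them can share one job.
    @op(name=f"sync_{collection}")
    def _sync():
        ...  # placeholder: copy one Mongo collection into Snowflake
    return _sync

@job
def mongo_to_snowflake():
    # No dependencies between the ops, so they are eligible to run in parallel.
    for collection in COLLECTIONS:
        make_sync_op(collection)()
```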
d
hey @Yaron Levi let me follow up with you on DM
👍 1
h
Yeah, I think with Dagster Cloud's new pricing, the cost is very high compared to alternatives like Prefect.
d
going to follow up with you on DM
hey friends, not sure if this will be hugely impactful for anyone on this thread (you're probably generating what we'd consider 'enterprise' levels of activity), but we've just expanded the number of credits included in the Solo and Team plans:

- Solo -> 7,500 credits included per month
- Team -> 30,000 credits included per month

This hasn't been announced yet, but we rolled out the actual price changes today, and the pricing page itself will change tomorrow.
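Against the monthly volumes quoted earlier in the thread, the expanded Team allowance works out roughly as follows (assuming the $0.03/credit overage rate quoted earlier still applies):

```python
# Rough overage check using monthly credit counts quoted in this thread.
TEAM_INCLUDED = 30_000
RATE = 0.03  # assumption: same per-credit rate as quoted earlier

for name, monthly_credits in [("~175k dbt models", 174_991), ("80-collection sync", 175_200)]:
    overage = max(0, monthly_credits - TEAM_INCLUDED) * RATE
    print(f"{name}: ${overage:,.2f}/mo in overage credits")
# ~175k dbt models: $4,349.73/mo; 80-collection sync: $4,356.00/mo
```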