# integration-dbt

Nicolas May

05/01/2023, 1:53 PM
Hi there! I'm trying to use Dagster to run full dbt DW builds using dagster-dbt library, but I can't get a full build to run in production... Details in 🧵
The dev deployment is Docker compose on my local machine. The prod deployment is Docker compose on a GCP VM.
• 🟢 Full DW builds work fine in dev. (See first screenshot.)
• 🟢 Small dbt jobs work fine in prod. (See second screenshot.)
• 🔴 Full DW build does not work in prod. The dbt step just hangs at `Started execution of step "run_dbt_1236b"` for hours before I terminate the job. It never gets to the step that should be:

```
Executing command: dbt --no-use-color --log-format json run --project-dir /opt/dagster/app/dagster_integrations/resources/../.. --profiles-dir /opt/dagster/app/dagster_integrations/resources/../../profile --select *
```

(See third screenshot.)
I'm guessing it has something to do with the size of our dbt project (533 models, 1319 tests, 0 snapshots, 14 analyses, 859 macros, 0 operations, 18 seed files, 269 sources, 6 exposures, 0 metrics), but I can't get any observability into why things just stall.
• prod VM: 4 vCPUs w/ 16 GB memory
• local dev env Docker resources: 6 vCPUs w/ 9 GB memory

Tim Castillo

05/01/2023, 4:43 PM
Hmm, 533 should be well within bounds for Dagster to run. If you go into the raw compute logs (screenshot below), do you see anything in the stderr or stdout?

Nicolas May

05/01/2023, 4:44 PM
No... I just see `No log file available`
Thanks for replying btw
I should add... The image is running Dagster 1.2.7 and dbt 1.1.1

Tim Castillo

05/01/2023, 4:47 PM
Odd that it doesn't have compute logs, which may indicate that the process isn't starting at all.

Nicolas May

05/01/2023, 4:49 PM
Here's a job that ran okay but doesn't have compute logs

Tim Castillo

05/01/2023, 4:50 PM
Sorry, sent that message without reading your most recent one: I think streaming compute logs are a more recent addition to the integration, so it might not work with dbt 1.1.1. any particular reason preventing you from upgrading to dbt 1.4+? All good if there is; just trying to figure out the right ways to debug this.

Nicolas May

05/01/2023, 4:52 PM
The AEs who do the bulk of the dbt work aren't ready for an upgrade just yet... That's my impression anyway
I'm just reflecting the env they use in development and what's currently running in Airflow... Trying to migrate the full DW build away from Airflow but running into this
Would a dbt upgrade fix this?

Tim Castillo

05/01/2023, 4:55 PM
It's a possibility. Let me reach out to more of the team to help out with this.

Nicolas May

05/01/2023, 4:55 PM
tysm!
My confusion is that it works fine in local dev env

owen

05/01/2023, 6:08 PM
hi @Nicolas May! definitely odd, and I'm struggling to come up with a super plausible explanation. how is dagster being deployed? I'm specifically curious about what database you're using for event log storage. another question would be how many of the sources in your dbt project connect to upstream dagster assets

Nicolas May

05/01/2023, 6:12 PM
> how is dagster being deployed?
Dagster is deployed using Docker compose on a GCP VM.
> I'm specifically curious about what database you're using for event log storage.
The backend DB is GCP CloudSQL Postgres 14.
> how many of the sources in your dbt project connect to upstream dagster assets
There are quite a few Fivetran assets that are upstream of the dbt assets... and not all of them are included in this problematic job... It's hard to capture in a pic because it's big
You may be able to see from the screenshots that we've got three Fivetran syncs' asset groups upstream of the full dbt step... But we've excluded many that are upstream... Should I try to include them?
Would some of the excluded Fivetran assets being stale or never materialized be blocking this dbt step? ... even though Dagster isn't really materializing the Fivetran assets, but instead is just giving observability into the Fivetran syncs?
I'll add that the working prod dbt job does include all of its upstream Fivetran assets
Also... thanks so much for following up on this! It's def a head-scratcher for me

owen

05/01/2023, 6:21 PM
thanks for that info! my best guess at the moment (although it's a shaky one) is that this specific setup is causing the framework to make a bunch of queries to your event log database, which are causing the step to hang.
trying to confirm if that's plausible right now

Nicolas May

05/01/2023, 6:22 PM
That's interesting... The local dev env just uses a Postgres service defined in the docker-compose.yaml... It doesn't point to any persistent DB

owen

05/01/2023, 6:48 PM
I'm feeling a bit more confident on this hypothesis -- if you have easy observability into your backend postgres db, it'd be great to take a look at that to confirm if this is happening. I'd expect to see a large volume of queries right when the step starts. each one should be fairly quick but I could imagine this filling up a queue. also looking into what we can do on our side to make this not fire off as many queries
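If you have direct `psql` access to the instance, one way to confirm this hypothesis is to watch live activity right as the step starts; a sketch with placeholder connection details (the host, user, and database names here are assumptions, not from the thread):

```shell
# Placeholder connection string -- substitute your CloudSQL host,
# user, and database. Lists non-idle sessions and how long each query
# has been running; a flood of short event-log queries at step start
# would support the theory.
psql "host=10.0.0.5 user=dagster dbname=dagster" -c "
  SELECT pid,
         state,
         now() - query_start AS running_for,
         left(query, 80)     AS query
  FROM pg_stat_activity
  WHERE state <> 'idle'
  ORDER BY query_start;"
```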

Nicolas May

05/01/2023, 6:49 PM
Okay... Thanks much owen! I'll see what I can do to track down queries around the time this step tries to start

Bojan

05/01/2023, 7:14 PM
@Nicolas May that's highly likely gonna be the db
i have had similar issues, it really comes down to the amount of events your runs generate; at one point the tables get too long for the workload that you have
you're gonna see those 60 second timeouts popping on the db side. Can you prune asset events etc?
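A quick way to check whether the event tables really have grown too long is to rank table sizes in the Dagster storage database; again a sketch with placeholder connection details:

```shell
# Placeholder connection string. Shows the ten largest tables by total
# size (including indexes) -- the event log table dominating by a wide
# margin would point at pruning or a bigger instance.
psql "host=10.0.0.5 user=dagster dbname=dagster" -c "
  SELECT relname,
         pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_statio_user_tables
  ORDER BY pg_total_relation_size(relid) DESC
  LIMIT 10;"
```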

Nicolas May

05/01/2023, 7:15 PM
Thanks Bojan... I guess, how can I improve the situation?

Bojan

05/01/2023, 7:16 PM
or is that something that you under no circumstances want to do?
what's the size of your pg instance?

Nicolas May

05/01/2023, 7:17 PM
db-g1-small
Scale up?

Bojan

05/01/2023, 7:17 PM
ya for sure

Nicolas May

05/01/2023, 7:17 PM
Any size recommendations?

Bojan

05/01/2023, 7:18 PM
is that gcp or aws

Nicolas May

05/01/2023, 7:18 PM
GCP... CloudSQL Postgres 14

Bojan

05/01/2023, 7:19 PM
also how often do you run the whole workload, and back to my original question of can you prune assets and events
ya pg on gcp does not use t shirt sizes but it has its own way of picking sizes
n

Nicolas May

05/01/2023, 7:20 PM
Full DW build is once daily when nothing else is running... We can try to prune assets, but I'll need to talk with my AEs to see what's minimally possible
b

Bojan

05/01/2023, 7:21 PM
ah no need then
i build mine every fivetran sync
which happens every 10 minutes hence the question
ya you need to up the db
go for 4 cpus, 8 gig
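On CloudSQL that 4 vCPU / 8 GB shape maps to a custom machine tier; a sketch assuming a `gcloud`-managed instance (the instance name is a placeholder, and note that changing the tier restarts the instance):

```shell
# Tier format is db-custom-<vCPUs>-<memory in MiB>: 4 vCPUs, 8 GiB = 8192 MiB.
# Patching the tier causes a brief restart, so run it outside the
# nightly build window.
gcloud sql instances patch my-dagster-db --tier=db-custom-4-8192
```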

Nicolas May

05/01/2023, 7:23 PM
Okay... thanks for the rec

Bojan

05/01/2023, 7:49 PM
btw gcp pg should give some nice monitoring out of the box. You should be able to figure out io load there if this size doesn't pan out.

Nicolas May

05/03/2023, 11:30 AM
Just wanna follow up on this thread... We turned on GCP CloudSQL [Query Insights](https://cloud.google.com/sql/docs/postgres/using-query-insights) to get some observability into the query load, and we definitely see spikes right before `dbt run` commands.
We upgraded the GCP CloudSQL DB instance from a shared-CPU, 1.7 GB memory `db-g1-small` machine type to a 2 vCPU, 8 GB memory `db-custom-2-7680`, and that did the trick... Now the jobs that include dbt are starting super quick and not failing.
Thanks again for your help @Tim Castillo @owen @Bojan

Bojan

05/03/2023, 11:31 AM
sweet, note that you might want to start pruning the old jobs/events/assets

Nicolas May

05/03/2023, 12:33 PM
Good call out... How have you gone about that?
I've set the sensor/schedule tick retention per this, but not sure how to prune old events
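For reference, the schedule/sensor tick retention mentioned above lives in the instance's `dagster.yaml`; a sketch of that block (the day counts are examples, and `-1` means keep forever):

```yaml
retention:
  schedule:
    purge_after_days: 90   # drop schedule ticks older than 90 days
  sensor:
    purge_after_days:
      skipped: 7           # prune skipped ticks aggressively
      failure: 30
      success: -1          # keep successful ticks forever
```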

Tim Castillo

05/03/2023, 12:45 PM

Nicolas May

05/03/2023, 12:45 PM
Thanks much!