# ask-community
m
👋 I would like to try Dagster at my company, and I have the following requirements:
• Being able to run batch jobs with dbt and pandas DataFrames
• Ability to validate input/output with Great Expectations (GE) or similar
• Ability to move off pandas DataFrames to a distributed framework easily the day the data gets too big (in a couple of years)
• Ability to run jobs incrementally in a reproducible way
How should I deploy Dagster in your opinion? Should I go with a monorepo approach, or multi-repo (one for dbt jobs, one for pandas jobs) as suggested in the deploying-with-Helm docs? If we go with the multi-repo approach, can we schedule jobs from different repos? Thanks in advance for your help 🙏
a
I’ve not played with dbt or GE, but I don’t think you’ll encounter much trouble here, since Dagster provides methods for both structural validation and value validation. I’ve used the Helm deployment with the multi-user-repo approach, and it’s a day or two to set up if you have a decent DevOps person; I recommend that. I made the same pandas choice and hope that a transition to Dask or similar won’t be too painful 😛
👌 1
Each user code repo can define schedules as part of their @repo; that won’t be an issue
m
Each user code repo can define schedules as part of their @repo
So can you schedule jobs coming from different @repos?
Also, any recommendations/experience with running jobs incrementally in Dagster?
a
Yes, the schedules defined in the user code repos will show up in Dagit (though there’s no way in Dagit itself to change a schedule, other than turning it on/off, at the moment)
When you say incrementally, I assume you mean something like a partitioned job, e.g. daily partitions. Dagster does support that as well
m
When you say incrementally, I assume you mean something like a partitioned job
Might be it. Like the ability to read increments of data (not necessarily time series) and append the output at the end. Will have a look at partitioned jobs
a