# ask-community
c
Hi everyone, we are moving our data platform to Dagster and are really enjoying the journey! 🙂 That said, a few questions remain for us; we would like to know more about best practices. We have dbt code that translates directly into assets, and Python scripts that execute on a schedule. Right now there is a separate lineage for assets and for jobs, and we think it would be cool to have a single lineage covering all our steps. Do you recommend writing jobs that execute assets, or translating each job into assets? What are the pros and cons of going with assets vs. jobs? I'm curious to hear your recommendations. Thanks in advance
v
In my experience there’s a lot of benefit from moving your jobs into assets. It requires a small mindset shift, but it’s well worth it, not only for ergonomics in the system, but also because it brings your thinking closer to what data users see and interact with. As far as creating jobs to materialize assets, I’d recommend you take a look at declarative scheduling. I haven’t been able to fully adopt it into my entire infra due to a small limitation in how I set it up, but it’s running for parts of it and it’s been pretty incredible.
c
As you mentioned, your dbt models are an easy fit for assets, but your Python scripts might be translatable to assets as well. I think the important question to ask is: do those Python scripts create some sort of defined software artifact? If so, they are probably a good fit for software-defined assets too, which means everything can live in one coherent asset graph.
c
Thanks for your insights @Vinnie & @chris. What makes me hesitate about my jobs being "assets" is that their result will not always "materialize" into something. What I mean is that they are basically data transfer and loading scripts, but the data might or might not be there. (We have a piece of software that transfers JSON data to an SFTP server on an irregular basis, sometimes every day, sometimes every 2 days, sometimes later, so it isn't schedulable on the Dagster side; a Python script then brings the data up to Snowflake.) This means that at the end of our job/script, we are not sure we actually produced a result. Do you think it still makes sense to translate them into assets?
v
You can definitely put that logic into schedules/sensors and only kick off an asset materialization if it will result in an output. In terms of scheduling/partitioning, you could use `DynamicPartitions`; maybe check this related thread from a few days ago too: https://dagster.slack.com/archives/C01U954MEER/p1676890440544719
c
I see. From my understanding, the asset abstraction seems pretty rich. Asset-based pipelines seem to be the way to go 🙂! Thank you guys for your insights