# community-showcase
d
Hi, I created my first pipeline using Dagster and I have used assets to build the steps of the ETL process. I am a bit confused about the difference between ops, jobs and assets. My understanding is that jobs can run ops and accept configuration (e.g. I can change the behaviour of the code based on execution time). Is that something that cannot be achieved with assets? What about scheduling? Is scheduling more "powerful" when applied to ops + jobs vs assets? I am also considering doing this course: https://corise.com/course/dagster to get a more in-depth overview of all the functionality that Dagster offers, so that I can build things the right way from the start. Has anyone tried it? Is it good?
t
As I understood it, assets bundle ops with input/output management. You can create a job to materialize an asset with `define_asset_job`. Sadly, the documentation isn't very clear (to me) about when something described for ops and jobs can also be used for assets (or how to do it).
If you don't create a job for an asset, then there is an `_ASSET_JOB_0` job (at least for me) that materializes all of them.
d
For now I am just materialising from the UI ("Materialize all") and the pipeline runs as intended! @Tobias Pankrath
v
This takes some getting used to, I’ll give you that. Under the hood, an `asset` is an `op`; the significant difference between the two is that `asset`s track their outputs and dependencies. You can also configure assets at runtime by passing config through the “Materialize” button in the GUI or through run requests. The usual advice is to use `asset`s whenever the operation materializes a persistent object that you want to track and expose lineage for. If you want to run a sequence of operations without tracking every single output, you can use a graph-backed asset.
Scheduling should work the same way either way; the only difference, as Tobias said, is that you need to wrap `asset`s in a `job` and then schedule the job. I’m hoping this limitation will be removed in the future for fully declarative scheduling (as a sidenote, you can use `sensor`s to materialize `asset`s without first defining a `job`).
Regarding the course, I took it last year, and even though I already knew a lot about Dagster, I still left with a lot of nice little bonuses that I implemented in my work. I also know that Dennis has done a lot of work in the background since that edition (e.g. the concept of assets was still fairly fresh last November, so it wasn’t a major part of the course). Paging @dhume
d
Thanks for the plug. This will be the third time I’m running the course, and I always try to keep up with Dagster, which is very hard given how quickly it evolves. We start with `ops` in the first week and end with `assets`. The course shows the differences between them but still lets you apply similar patterns (like schedules) to both.
d
@dhume @Vinnie Thank you guys. This is amazing and incredibly helpful!! bless you
t
> (as a sidenote, you can use `sensor`s to materialize `asset`s without first defining a `job`)
That's new, isn't it? I am pretty sure I had to define a job for a sensor a couple of weeks ago, at least that's what was in the docs.
v
@Tobias Pankrath it’s experimental, but supported. https://docs.dagster.io/_apidocs/schedules-sensors#dagster.sensor
> That's new, isn't it? I am pretty sure I had to define a job for a sensor a couple of weeks ago, at least that's what was in the docs.
Fairly new: a couple of months ago.