Axel Bock
04/30/2023, 1:59 PM
`AssetMaterialization`s are of no help either.
then i don’t know - at all - how to connect `Job`s, `Op`s, and `Asset`s. can i write `def my_op(my_asset): ...`? or something like this:
# for example purposes, i would love to have ...
# my_zip_asset = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
# ... and split_asset_op() to split that into singular
# assets each representing one number
@job
def my_job():
    # called automatically once zip_asset is available?!
    # how do i define zip_asset, when it is being read
    # from a queue? as an op? how?
    split_asset_op(my_zip_asset)
    # if i get this correctly, the OP will then "materialize"
    # an asset called "zip_sub_asset" or whatever i want to
    # name it, and i then can react on this ... later? somewhere else?

@job
def process_splits():
    # called automatically when the "zip_sub_asset" is available?!
    # since this is not an annotated function - how do i reference
    # the asset without a function?
    # stupid op that converts 5 to "0005"
    zfill_op(zip_sub_asset)
if anybody could help me with a couple of lines of code instead of documentation pointers, i would be amazingly grateful. i really really don’t get it.
Tim Castillo
04/30/2023, 2:34 PM
`persist_to_storage` isn't a function that Dagster provides. It's a hypothetical function to show that the user would write the data to storage themselves.
• Where the asset data will be depends on where you write it. If you're manually writing to the file system, it'll be on the file system that Dagster is running on.
◦ If you're using an I/O manager (e.g. returning data from an asset), by default the data will be written to storage as a pickled file, either under the directory defined by the `DAGSTER_HOME` env var, if defined, or in a directory under your Dagster project prefixed with `tmp*` if `DAGSTER_HOME` isn't defined.
• re: connecting ops, jobs, and assets
◦ Because of the dynamic nature of what you're doing, I'd recommend using ops to create assets, rather than the `@asset` definition.
◦ Instead, it'd be easier to use an op- and graph-based approach that generates assets.
So it'd be:
1. Op to get/create/magically invent the ZIP file
2. Downstream Op that processes those ZIP files
3. Wrap those in a graph to let you loop over each ZIP file
4. Meanwhile, you can use `AssetMaterialization`s to tell Dagster that these assets are being built during these runs
Axel Bock
04/30/2023, 5:38 PM
• what is `ReadMaterializationConfig`? is that again something that is created by the user, without any hint except - maybe - the op name `read_materialization`?
• where (in that example) is `asset_event.dagster_event.asset_key` coming from? it’s not a property of `DagsterEvent`, and there is - of course - no explanation.
◦ is it maybe the `asset_key=` parameter from the `AssetMaterialization` creation? would `description` also be a part of it? is there any mention of that magic anywhere, at all?
the longer i look at that documentation, the more i wonder if the docs are just horribly sub-par, or the whole thing just utterly over-brained and complex.
as i said, i am really getting frustrated here, and so far the only reason i didn’t abandon the whole thing yet is that i am really stubborn.
daniel
04/30/2023, 9:56 PM
`asset_key` is actually a property of `DagsterEvent`: https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/_core/events/__init__.py#L632-L634 - we'll look into why it's not appearing in the API docs, there may be a bug in the parsing library that we're using to turn annotations into API docs.
We'll pass the broader feedback about the docs being unclear on to the team working on docs improvements as well - appreciate that there's a steep learning curve with many concepts and that there are many places we can make them better.
daniel
05/01/2023, 3:07 AM
Axel Bock
05/02/2023, 10:27 AM