Fabio Picchi
04/17/2023, 10:26 AM
Andras Somi
04/17/2023, 10:55 AM
Fabio Picchi
04/17/2023, 11:20 AM
Fabio Picchi
04/17/2023, 11:21 AM
Le Yang
04/17/2023, 2:11 PM
Fabio Picchi
04/17/2023, 2:14 PM
Fabio Picchi
04/17/2023, 2:16 PM
Fabio Picchi
04/17/2023, 2:17 PM
Le Yang
04/17/2023, 2:32 PM
Andras Somi
04/17/2023, 3:29 PM
Fabio Picchi
04/17/2023, 3:34 PM
Fabio Picchi
04/17/2023, 3:34 PM
Fabio Picchi
04/17/2023, 3:35 PM
owen
04/17/2023, 8:48 PM
A Nothing-type output results in no serde (serialization/deserialization) behavior, meaning the upstream and downstream assets do not communicate the location of files.
• This is fine when the filepath is a function of the asset key plus other static metadata about the asset, since both the upstream and the downstream asset can run that same function to figure out where to read or write the file. However, if the filepath might change at runtime, this pattern won't work.
• You can also add metadata to a specific materialization event at runtime, but that materialization event is not (by default) available in the body of the downstream asset. So, theoretically, you could call context.add_output_metadata in the upstream asset to record the file location, then in the downstream asset query the Dagster instance database for the upstream event (along with that metadata) and use it to determine where to read from. This would avoid having to serialize this information to an external system like S3, but would require a bit of tinkering inside every asset that uses this pattern.
Dmitry Ustimov
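A minimal sketch of the first bullet above, where the file location is derived purely from the asset key so that the upstream and downstream assets can compute it independently. The base directory, the function name `path_for_asset`, and the default extension are all hypothetical and not part of Dagster; the asset key is represented as a plain list of strings, similar to what an asset key's path looks like.

```python
from pathlib import PurePosixPath

# Hypothetical base location; any static, shared setting would do.
BASE_DIR = PurePosixPath("/data/staging")


def path_for_asset(asset_key_path, extension="csv"):
    """Map an asset key path such as ["raw", "orders"] to a file location.

    The result depends only on static inputs, so the asset that writes the
    file and the asset that reads it can both call this and agree on the
    location without passing anything between them.
    """
    *parents, name = asset_key_path
    return str(BASE_DIR.joinpath(*parents, f"{name}.{extension}"))


# The upstream asset would write here ...
write_location = path_for_asset(["raw", "orders"])
# ... and the downstream asset would recompute the same value to read it.
read_location = path_for_asset(["raw", "orders"])
```

As the message notes, this only works while the path is fully determined by static information; anything chosen at runtime (a timestamped filename, for instance) would need to be communicated some other way.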
07/16/2023, 10:04 AM
owen
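07/18/2023, 9:11 PM
from dagster import asset

# custom_s3_bytes_io_manager, snowflake_pandas_io_manager, and
# download_file_from_internet are assumed to be defined elsewhere.

@asset(io_manager_def=custom_s3_bytes_io_manager)
def staged_file():
    # Return the raw bytes; the IO manager decides where they are stored.
    file_bytes = download_file_from_internet()
    return file_bytes

@asset(io_manager_def=snowflake_pandas_io_manager)
def table(staged_file):
    # staged_file is just raw bytes, which can be parsed / transformed as desired
    # ... parse it into a dataframe or something
    return parsed_dataframe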
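The custom_s3_bytes_io_manager used in the snippet above is not a built-in. Below is a rough sketch of what such an IO manager might look like if it stored raw bytes with no pickling step. It is written without importing dagster so that it runs standalone: in a real project the class would presumably subclass dagster's IOManager and be given a boto3 S3 client. The class names, the bucket name, the key prefix, and the in-memory FakeS3Client are all assumptions made for illustration; only the put_object / get_object call shapes mirror boto3.

```python
import io
from types import SimpleNamespace


class RawBytesS3IOManager:
    """Hypothetical IO manager that stores asset outputs as raw bytes."""

    def __init__(self, s3_client, bucket, prefix="dagster"):
        self.s3 = s3_client
        self.bucket = bucket
        self.prefix = prefix

    def _key(self, context):
        # Build the object key from the asset key path, e.g. ["staged_file"].
        return "/".join([self.prefix, *context.asset_key.path])

    def handle_output(self, context, obj):
        # No pickling: the bytes are uploaded exactly as returned by the asset.
        if not isinstance(obj, bytes):
            raise TypeError("this IO manager only accepts bytes outputs")
        self.s3.put_object(Bucket=self.bucket, Key=self._key(context), Body=obj)

    def load_input(self, context):
        # Download the object and hand the raw bytes to the downstream asset.
        response = self.s3.get_object(Bucket=self.bucket, Key=self._key(context))
        return response["Body"].read()


class FakeS3Client:
    """In-memory stand-in for a boto3 S3 client, for demonstration only."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}


# Simulate one write by the upstream asset and one read by the downstream asset.
fake_s3 = FakeS3Client()
manager = RawBytesS3IOManager(fake_s3, bucket="my-bucket")
ctx = SimpleNamespace(asset_key=SimpleNamespace(path=["staged_file"]))
manager.handle_output(ctx, b"raw file contents")
round_tripped = manager.load_input(ctx)
```

Deriving the object key from the asset key means the upstream write and the downstream read agree on the location without exchanging any extra information.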
In this case, you're correct that we don't have a native IOManager built for this type of operation, but writing your own custom IOManager is generally not too difficult. For example, to store raw bytes you would only need to slightly modify the existing S3 pickle IO manager to remove the pickling step: https://sourcegraph.com/github.com/dagster-io/dagster/-/blob/python_modules/libraries/dagster-aws/dagster_aws/s3/io_manager.py?L41
Dmitry Ustimov
07/19/2023, 3:32 PM