# ask-community
s
hi y'all - just getting started with Dagster, and I've got a couple of questions about asset persistence.
• we're hoping to run Dagster locally to help us iterate on some workflows we're building (an ideal flow might be: write some code / click "Materialize" / view the resulting CSV / edit the code / repeat), and I'm having a hard time figuring out the best way to view the result. We're using local file storage, so obviously we can just visit the path for a given asset, but (1) the file is in pickle format and presumably needs to be unpickled to be properly viewed, and (2) it's somewhat cumbersome to have to use a terminal/etc. for viewing the file when almost everything else can be done with Dagster. Does anyone have recommendations on what to do here?
• relatedly, that resulting CSV is meant to be directly served to customers (e.g., for them to download by clicking a link); what's the "Dagster way" to model / think about publishing that raw result (without metadata) for download? (is "publish" a `@job` that uploads to S3 and persists the blob path to our production customer DB? is this use case better served by a custom `IOManager`? etc.)
c
Regarding (1) - check out https://docs.dagster.io/_apidocs/assets#dagster.AssetValueLoader.load_asset_value. I think that solves your use case (at least via Python).
Regarding (2) - I think this is still a fit for assets. Essentially, you think about the persisted blob path as a software artifact and model an asset based upon that.
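A minimal sketch of that loading workflow, assuming a Dagster version where `Definitions.load_asset_value` is available (it wraps the `AssetValueLoader` linked above); the `customer_report` asset name is just a placeholder:

```python
from dagster import Definitions, asset


@asset
def customer_report():
    ...  # placeholder: produces the CSV-bound DataFrame


defs = Definitions(assets=[customer_report])

# Round-trips through the configured IO manager, so the pickle is
# deserialized for you. Run this where your Dagster instance
# (DAGSTER_HOME) can see the prior materializations.
df = defs.load_asset_value("customer_report")
print(df.head())
```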
m
I've had some success by adding markdown metadata to give me the top N rows of my dataframe in Dagit. You can see my post here to get an idea. You may also customize the io_manager to save to .csv (or some other format) so you don't necessarily have to use pickle files.
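A hedged sketch of the markdown-metadata trick Mark describes, using `context.add_output_metadata` and `MetadataValue.md` (standard Dagster APIs); the asset name and data are placeholders:

```python
import pandas as pd
from dagster import MetadataValue, asset


@asset
def customer_report(context) -> pd.DataFrame:
    df = pd.DataFrame({"customer": ["acme"], "total": [42]})  # placeholder data
    context.add_output_metadata(
        {
            # Renders as a markdown table on the asset's page in Dagit.
            # Note: DataFrame.to_markdown() requires the `tabulate` package.
            "preview": MetadataValue.md(df.head(10).to_markdown()),
            "num_rows": len(df),
        }
    )
    return df
```

And a minimal custom IO manager along the lines of Mark's second suggestion, persisting DataFrames as CSV instead of pickle; this uses the classic `IOManager` / `@io_manager` APIs, and the paths and config are illustrative:

```python
import os

import pandas as pd
from dagster import IOManager, io_manager


class CsvIOManager(IOManager):
    """Persist DataFrame-valued assets as CSV instead of pickle."""

    def __init__(self, base_dir: str):
        self.base_dir = base_dir

    def _path(self, context) -> str:
        # One CSV per asset key, under base_dir.
        return os.path.join(self.base_dir, *context.asset_key.path) + ".csv"

    def handle_output(self, context, obj: pd.DataFrame) -> None:
        path = self._path(context)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        obj.to_csv(path, index=False)

    def load_input(self, context) -> pd.DataFrame:
        return pd.read_csv(self._path(context))


@io_manager(config_schema={"base_dir": str})
def csv_io_manager(init_context):
    return CsvIOManager(init_context.resource_config["base_dir"])
```

Bound under the `io_manager` resource key, this makes every downstream asset read/write CSV by default, so the files on disk are directly viewable.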
s
@chris - for (2) - makes sense! the CSV we're producing is partitioned by `customer` and `date`, so presumably we'll need both of those partition keys available when uploading / generating the blob path. it sounds like partitioning the dependent job with the same `partition_def` is the way to do that? (at least, I was able to get it working that way). for (1) - yep, that helps, though what I'm really looking for is the ability to view the asset contents in Dagit directly.
@Mark - that's a great idea. it's not the whole asset, but it'll certainly give us enough to spot-check results, and I think that's what we're looking for here to grease the iteration wheels. thanks!!
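For reference, a rough sketch of that shared-partitioning setup as assets, assuming a Dagster version with `MultiPartitionsDefinition` and multi-dimensional partition keys; the customer list, dates, and blob paths are all hypothetical, and the actual S3 upload / DB write is elided:

```python
import pandas as pd
from dagster import (
    DailyPartitionsDefinition,
    MultiPartitionsDefinition,
    StaticPartitionsDefinition,
    asset,
)

# Hypothetical two-dimensional partitioning: customer x date.
partitions_def = MultiPartitionsDefinition(
    {
        "customer": StaticPartitionsDefinition(["acme", "globex"]),
        "date": DailyPartitionsDefinition(start_date="2023-01-01"),
    }
)


@asset(partitions_def=partitions_def)
def customer_report(context) -> pd.DataFrame:
    keys = context.partition_key.keys_by_dimension
    # ... build this customer/date slice of the CSV ...
    return pd.DataFrame({"customer": [keys["customer"]], "date": [keys["date"]]})


@asset(partitions_def=partitions_def)
def published_report(context, customer_report: pd.DataFrame) -> str:
    # Same partitions_def, so both partition keys are available when
    # generating the blob path for the upload.
    keys = context.partition_key.keys_by_dimension
    blob_path = f"reports/{keys['customer']}/{keys['date']}.csv"
    # Upload `customer_report` to S3 at blob_path and persist the path
    # to the customer-facing DB here (boto3 / SQL calls omitted).
    return blob_path
```

Modeling `published_report` as its own asset whose value is the blob path matches chris's framing above: the persisted path is itself a software artifact.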