# ask-community
c
This is a cross-post from the #dagster-snowflake channel, per recommendation of Jaime... I have a question that may be related to how I'm using dagster with Snowflake, or might be that I'm not storing assets properly, etc. This is with a multi-container docker setup. I have a module that I'm mounting as a volume, per the docs. The structure is similar to this example, where I have packages/folders in my assets directory. In the linked example, there are packages for `activity_analytics`, `core`, and `recommender`. In my case, let's say I have `package1` and `package2`. As with the example, `all_assets` ends up being `[*package1_assets, *package2_assets]` and is passed to `Definitions` in the same way that the assets are in the linked example. If I go into dagit and launch a run to materialize all of the `package1` and `package2` assets, which are in different groups, all works fine. However, if I modify the code in `package2` and then update the git repository and restart dagster/dagit, the system no longer knows that the `package1` assets were materialized, in spite of the fact that `package1` did not change at all. I can understand that `package2` would reset; however, I'd like to figure out how to set this up so that only the assets associated with the package that changed would need to be rematerialized for dagster to know about them. Most of my work/assets is being pushed to Snowflake. Is this simply a matter of dagster not knowing how to find them on Snowflake? Or will all of the assets associated with all of the packages always reset when I push a code change to a single package?
o
hi @clay! I think this is actually unrelated to snowflake/io-managers in general. Dagster keeps track of historical state inside the DagsterInstance. Essentially, this is a database that has a record of every event that's happened when running dagster code (i.e. "a step started" or "an asset was materialized"). This is separate from, and independent of, the physical assets in Snowflake. The fact that history is disappearing when restarting dagster/dagit indicates to me that the instance itself is disappearing between runs. My guess is that in your setup you're using the default DagsterInstance behavior, which writes these events to a sqlite database on the local filesystem. If that filesystem lives inside a docker container and the container is restarted, I imagine this would wipe out that local database. To avoid this, you'll want a more persistent DagsterInstance setup -- our deploy_docker example will help you get started down that path; it includes a docker compose file that sets up a local Postgres database for you, which should be persistent between deploys
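For anyone following along: a `dagster.yaml` that points the instance's storage at Postgres looks roughly like this (hostname and env var names below are placeholders; the deploy_docker example has the full, working version):

```yaml
# Sketch of a dagster.yaml with Postgres-backed instance storage.
# Values are illustrative; adapt hostname/credentials to your compose setup.
storage:
  postgres:
    postgres_db:
      hostname: docker_postgresql
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      db_name:
        env: DAGSTER_PG_DB
      port: 5432
```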
c
Thanks, @owen. I'm using Postgres, actually, which was set up following the deploy_docker example. It seems that the history is lost if anything in the `Definitions` for the module changes? For instance, the addition of a job to the list of jobs?
or the addition of assets to the assets list?
o
hm sorry for barking up the wrong tree there, but I think the overall theme of the answer is still the same -- there's no reason that the history should be lost due to a code change (unless that code change modifies the asset keys for your assets, in which case the materializations for the old key will not show up in the UI, but would still be present in the database)
are you observing that if you don't change anything, but reload everything, your history is retained?
c
That's correct
I have two code locations, both set up as mounted volumes w/containers for managing each. If I change one, the other is fine... it's just that the history is lost for the one I change
If I just stop/start then all is fine wrt history preservation. It's just when, for example, I add a new "package" to one of the code locations (with logically separate code) that the history for the other packages in the same code location is lost
o
just to summarize, your situation is:
• a single DagsterInstance, with postgres storage
• two separate code locations, each in their own docker containers
• each code location has multiple logical sets of artifacts (jobs, assets, etc.)
Making any change to a code location wipes out the history for all assets in that code location, but has no impact on the assets in the other location. Is that basically correct?
c
That's correct -- or seems to be. Within one code location, I have a structure like the example project I linked above (also here). In that, for instance, were I to add a few assets and/or change the flow in `core`, then the history for all of them would be wiped out. Or, if I added a 4th subfolder called `clay` or something, and then added new assets/groups/jobs related to it, then the history for the `activity_analytics`, `core`, and `recommender` assets also would disappear upon restarting docker. Or... I think that's what's happening. Thanks for the input. I will poke around with it more to see if I can determine what's going on.
o
got it -- let me know if I can assist at all. one thing I'm curious about is what precisely you mean by the history for the assets disappearing (specifically, which pages you're looking at in the UI). if you're comfortable with `psql`, then checking the number of rows in the `event_logs` table should help determine whether anything is actually getting deleted. I'm biased towards thinking that's very unlikely, but I've been surprised before on these things
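Concretely, the check might look something like this from `psql` (assuming the default schema, with the `event_logs` table mentioned above; run it before and after a restart and compare):

```sql
-- Total event-log rows; if history were truly deleted,
-- this count would drop after the restart.
SELECT count(*) FROM event_logs;

-- Materialization events specifically.
SELECT count(*)
FROM event_logs
WHERE dagster_event_type = 'ASSET_MATERIALIZATION';
```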
c
ok, i'll check that out in a bit - or maybe tomorrow. Here's kind of what I mean... imagine my code location has package1 and package2 as subfolders, similar to that example. I load it and run/materialize everything from both packages in dagit. At this point, I can docker-compose down and up and rebuild, etc. and all is fine. If I modify package2 or create a package3, do a pull into the directory (which is a mounted volume in the code container), then subsequently docker-compose down and up again, dagster and dagit are unaware that I ever materialized the assets in package1 or package2.
d
this is very odd.. clay, when you say it's losing the history - which specific page in Dagit are you using to check that?
any chance you could share your dagster.yaml?
c
I'm looking at the asset group and/or job related to the particular code of interest
d
another question - when you say "restart dagster/dagit" - what command are you running to do that? Is there any chance it could be removing your postgres container and clearing out all the history there?
c
that is entirely possible. i moved to mounting my code as volumes so that I could avoid rebuilding the containers, thinking that was the issue. I have to do something asap for my boss at the moment but will share later
I just wanted to get back to you here @daniel - I believe you are correct: I was stomping my postgres container inadvertently. Apologies for wasting your time. 🙂
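For anyone hitting the same thing: mounting a named volume at Postgres's data directory lets the history survive `docker-compose down`/`up` even if the container itself is recreated (service and volume names below are illustrative, not from the deploy_docker example verbatim):

```yaml
# Sketch: persist Postgres data across container rebuilds
# by mounting a named volume at the data directory.
services:
  docker_postgresql:
    image: postgres:15
    environment:
      POSTGRES_USER: ${DAGSTER_PG_USERNAME}
      POSTGRES_PASSWORD: ${DAGSTER_PG_PASSWORD}
      POSTGRES_DB: ${DAGSTER_PG_DB}
    volumes:
      - dagster_pg_data:/var/lib/postgresql/data

volumes:
  dagster_pg_data:
```

Note that `docker-compose down -v` would still remove the named volume (and the history with it).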
d
no worries - easy mistake to make
👍 1