# ask-community
c
This is a cross-post from the #dagster-snowflake channel, per recommendation of Jaime... I have a question that may be related to how I'm using dagster with Snowflake, or might be that I'm not storing assets properly, etc. This is with a multi-container docker setup. I have a module that I'm mounting as a volume, per the docs. The structure is similar to this example, where I have packages/folders in my assets directory. In the linked example, there are packages for `activity_analytics`, `core`, and `recommender`. In my case, let's say I have `package1` and `package2`. As with the example, `all_assets` ends up being `[*package1_assets, *package2_assets]` and is passed to `Definitions` in the same way that the assets are in the linked example. If I go into dagit and launch a run to materialize all of the `package1` and `package2` assets, which are in different groups, all works fine. However, if I modify the code in `package2` and then update the git repository and restart dagster/dagit, the system no longer knows that the `package1` assets were materialized, in spite of the fact that `package1` did not change at all. I can understand that `package2` would reset; however, I'd like to figure out how to set this up so that only the assets associated with the package that changed would need to be rematerialized for dagster to know about them. Most of my work/assets is being pushed to Snowflake. Is this simply a matter of dagster not knowing how to find them on Snowflake? Or will all of the assets associated with all of the packages always reset when I push a code change to a single package?
o
hi @clay! I think this is actually unrelated to snowflake/io-managers in general. Dagster keeps track of historical state inside the DagsterInstance. Essentially, this is a database that has a record of every event that's happened when running dagster code (i.e. "a step started" or "an asset was materialized"). This is separate from, and independent of, the physical assets in Snowflake. The fact that history is disappearing when restarting dagster/dagit indicates to me that the instance itself is disappearing between runs. My guess is that in your setup you're using the default DagsterInstance behavior, which writes these events to a sqlite database on the local filesystem. If that filesystem lives inside a docker container and the container is restarted, I imagine this would wipe out that local database. To avoid this, you'll want a more persistent DagsterInstance setup -- our deploy_docker example will help you get started down that path; it includes a docker compose file that sets up a local Postgres database for you, which should be persistent between deploys
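For anyone following along: a `dagster.yaml` that points the instance's storage at Postgres looks roughly like this (hostname and env var names below are placeholders; the deploy_docker example has the full, working version):

```yaml
# Sketch of a dagster.yaml with Postgres-backed instance storage.
# Values are illustrative; adapt hostname/credentials to your compose setup.
storage:
  postgres:
    postgres_db:
      hostname: docker_postgresql
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      db_name:
        env: DAGSTER_PG_DB
      port: 5432
```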
c
Thanks, @owen. I'm using Postgres, actually, which was set up following the deploy_docker example. It seems that the history is lost if anything in the `Definitions` for the module changes? For instance, the addition of a job to the list of jobs?
or the addition of assets to the assets list?
o
hm sorry for barking up the wrong tree there, but I think the overall theme of the answer is still the same -- there's no reason that the history should be lost due to a code change (unless that code change modifies the asset keys for your assets, in which case the materializations for the old key will not show up in the UI, but would still be present in the database)
are you observing that if you don't change anything, but reload everything, your history is retained?
c
That's correct
I have two code locations, both set up as mounted volumes w/containers for managing each. If I change one, the other is fine... it's just that the history is lost for the one I change
If I just stop/start then all is fine wrt history preservation. It's just when, for example, I add a new "package" to one of the code locations (with logically separate code) that the history for the other packages in the same code location is lost
o
just to summarize, your situation is:
• a single DagsterInstance, with postgres storage
• two separate code locations, each in their own docker containers
• each code location has multiple logical sets of artifacts (jobs, assets, etc.)
Making any change to a code location wipes out the history for all assets in that code location, but has no impact on the assets in the other location. Is that basically correct?
c
That's correct -- or seems to be. Within one code location, I have a structure like the example project I linked above (also here). In that, for instance, were I to add a few assets and/or change the flow in `core`, then the history for all of them would be wiped out. Or, if I added a 4th subfolder called `clay` or something, and then added new assets/groups/jobs related to it, then the history for the `activity_analytics`, `core`, and `recommender` assets also would disappear upon restarting docker. Or... I think that's what's happening. Thanks for the input. I will poke around with it more to see if I can determine what's going on.
o
got it -- let me know if I can assist at all. one thing I'm curious about is what precisely you mean by the history for the assets disappearing (specifically, which pages you're looking at in the UI). if you're comfortable with `psql`, then checking the number of rows in the `event_logs` table should help determine whether anything is actually getting deleted. I'm biased towards thinking that's very unlikely, but I've been surprised before on these things
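Concretely, the check might look something like this from `psql` (assuming the default schema, with the `event_logs` table mentioned above; run it before and after a restart and compare):

```sql
-- Total event-log rows; if history were truly deleted,
-- this count would drop after the restart.
SELECT count(*) FROM event_logs;

-- Materialization events specifically.
SELECT count(*)
FROM event_logs
WHERE dagster_event_type = 'ASSET_MATERIALIZATION';
```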
c
ok, i'll check that out in a bit - or maybe tomorrow. Here's kind of what I mean... imagine my code location has package1 and package2 as subfolders, similar to that example. I load it and run/materialize everything from both packages in dagit. At this point, I can docker-compose down and up and rebuild, etc. and all is fine. If I modify package2 or create a package3, do a pull into the directory (which is a mounted volume in the code container), then subsequently docker-compose down and up again, dagster and dagit are unaware that I ever materialized the assets in package1 or package2.
d
this is very odd.. clay, when you say it's losing the history - which specific page in Dagit are you using to check that?
any chance you could share your dagster.yaml?
c
I'm looking at the asset group and/or job related to the particular code of interest
d
another question - when you say "restart dagster/dagit" - what command are you running to do that? Is there any chance it could be removing your postgres container and clearing out all the history there?
c
that is entirely possible. i moved to mounting my code as volumes so that I could avoid rebuilding the containers, thinking that was the issue. I have to do something asap for my boss at the moment but will share later
I just wanted to get back to you here @daniel - I believe you are correct: I was stomping my postgres container inadvertently. Apologies for wasting your time. 🙂
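For anyone hitting the same thing: mounting a named volume at Postgres's data directory lets the history survive `docker-compose down`/`up` even if the container itself is recreated (service and volume names below are illustrative, not from the deploy_docker example verbatim):

```yaml
# Sketch: persist Postgres data across container rebuilds
# by mounting a named volume at the data directory.
services:
  docker_postgresql:
    image: postgres:15
    environment:
      POSTGRES_USER: ${DAGSTER_PG_USERNAME}
      POSTGRES_PASSWORD: ${DAGSTER_PG_PASSWORD}
      POSTGRES_DB: ${DAGSTER_PG_DB}
    volumes:
      - dagster_pg_data:/var/lib/postgresql/data

volumes:
  dagster_pg_data:
```

Note that `docker-compose down -v` would still remove the named volume (and the history with it).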
d
no worries - easy mistake to make
👍 1