Hi I've got a weird niche use-case of Dagit that I...
# ask-community
c
Hi, I've got a weird niche use case of Dagit that I'd like to figure out the best way to solve. I've got dagster running in-process in some cloud container. This itself is a bit strange but bear with me 😅 -- after dagster's done executing in-process in the container, our tool writes out all data that dagster produced to S3, including `history/` and `compute_logs/`. I want to download these from S3 and visualize them in Dagit locally. What's the best way to "register" existing runs with the dagster DB? I would have expected that simply adding them into my local `history/` and `compute_logs/` folders would be a good start, but I think that doesn't actually send them to the db volume.
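Concretely, the workflow I'm picturing is roughly this (just a sketch -- the bucket and prefix names are made up, and I'm assuming compute logs belong under `storage/` per the default local layout):
```sh
# rough sketch -- bucket and prefix names are made up
export DAGSTER_HOME="$HOME/dagster_home"
mkdir -p "$DAGSTER_HOME"

# pull down what the in-process run wrote out to S3
aws s3 sync s3://our-bucket/run-artifacts/history/ "$DAGSTER_HOME/history/"
aws s3 sync s3://our-bucket/run-artifacts/compute_logs/ "$DAGSTER_HOME/storage/"

# then browse it in the UI
dagit
```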
o
hi @Chaitya! in this case, are you trying to continually add to an existing local instance, or just be able to view what happened in one "session"? Trying to continually append to an existing db will definitely run into some issues, as there are some cross-run indexes that need to be maintained (e.g. asset materialization events have autoincrement storage ids), but I'd expect loading just a single run's `history/` to work ok
c
Just trying to visualize a single session. I think we'd use a deployed instance for cases where we want to do things that are more complex in the UI. Our use case is more that we're using dagster for some in-process graph execution, and find the UI to be a nice way to inspect output logs and benchmarks. Would love to just be able to feed in some output data and view it in the UI.
I did make some progress here -- I'm able to mount my folders into the dagster-dev container at the DAGSTER_HOME directory, following the structure defined in https://docs.dagster.io/deployment/dagster-instance#default-local-behavior. My file structure looks like this:
```
/var/lib/dagster:
   dagster.yaml
   history/
      runs.db
      runs/
        index.db
        ...
   storage/
      ...
         some_log.out
         some_log.complete
         some_log.err
```
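For reference, the mount invocation looks roughly like this (a sketch -- the image name and host paths are placeholders):
```sh
# sketch -- image name and host paths are placeholders
docker run \
  -e DAGSTER_HOME=/var/lib/dagster \
  -v "$PWD/dagster.yaml:/var/lib/dagster/dagster.yaml" \
  -v "$PWD/history:/var/lib/dagster/history" \
  -v "$PWD/storage:/var/lib/dagster/storage" \
  -p 3000:3000 \
  dagster-dev dagit -h 0.0.0.0 -p 3000
```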
I think dagster still isn't aware of where my data is, though. When trying to spin up the container, I get the error:
```
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/14/e3q8)
```
o
that directory structure looks reasonable -- do you have the DAGSTER_HOME env var set to that path, and are the permissions on that directory such that the process can read/write to it?
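e.g. from inside the container, something like this (a quick sketch):
```sh
# confirm the env var is set and the process user can actually see the files
echo "$DAGSTER_HOME"
ls -la "$DAGSTER_HOME" "$DAGSTER_HOME/history"
```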
c
yeah, both should be true.
let me double check the permissions
I mean, changing `dagster.yaml` does indeed impact how the UI runs... so I assume we are able to read that `dagster.yaml`
I guess it's possible that the read/write permissions are different for the directories in that folder, though.
let me see
looks OK, I think:
```
-rw-rw-r-- 1 1002 1005 1153 Jun 15 18:14 dagster.yaml
drwxr-xr-x 3 1002 1005 4096 Jun 15 18:13 history
drwxr-xr-x 2 root root 4096 Jun 15 18:19 logs
drwxr-xr-x 6 1002 1005 4096 Jun 15 18:13 storage
```
o
just as a sanity check, if you `chmod 777` the directory, what happens? `dagster.yaml` only needs read access, but I believe the db files need both read and write
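i.e. something like this (just for debugging -- not a real fix):
```sh
# blunt permissions test, recursive so the nested db/log files are covered
chmod -R 777 /var/lib/dagster
```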
c
looks like the same issue even when all of the files are rwx
so the underlying issue was that Docker does not handle symlinks very well in nested directories. We were mounting the history/storage folders into the container, but all of the underlying files were symlinks into my local system (because we're doing this via Bazel). As a result, we uploaded a bunch of broken links instead of the actual db files. Manually mounting the files worked, but I would like to avoid that. Gonna look into using a Docker volume for this data instead.
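For posterity, one workaround we're considering is to dereference the symlinks before the data leaves the Bazel output tree, e.g. (paths here are made up):
```sh
# follow symlinks (-rL) when copying out of the Bazel output tree,
# so the real db/log files get uploaded instead of dangling links
cp -rL bazel-bin/my_tool/dagster_out/history ./history
cp -rL bazel-bin/my_tool/dagster_out/storage ./storage
```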