# ask-community
Hi all, I am using dagster locally with a Postgres backend, which I specified in dagster.yaml in DAGSTER_HOME:
      username: ""
      password: ""
      hostname: ""
      db_name: ""
      port: 5432
My understanding is that everything dagster previously wrote to the "storage" folder under DAGSTER_HOME (before I specified Postgres, when it was using SQLite) should now land in my Postgres database instead of on my local filesystem. But dagster is still writing assets to the "storage" folder under DAGSTER_HOME locally. Is this expected?
This is a little confusing (and I might be wrong about one or two things, writing mostly from memory here), but the postgres storage you're setting in your `dagster.yaml` only relates to what's written to a backend database: that is (non-exhaustive) schedule information, heartbeats, run history, dynamic partitions, etc. What you're likely still seeing in your `storage` folder are the outputs of your IO Manager (if you are using the default IO Manager, see https://docs.dagster.io/concepts/io-management/io-managers) and probably the compute logs for each run (https://docs.dagster.io/deployment/dagster-instance#compute-log-storage).
Ok, thanks for answering! Is there a way to stop the IO manager from outputting these things? We are not explicitly reading or writing locally. We are copying a blob in Azure Blob Storage from one account to another using the
. And exactly when executing a run that does one of these file copies, it creates a "file" (?) in the `storage` folder. The only reason for this behaviour I can think of is that dagster needs it in order to know that the asset has been materialized (i.e. that the copy was successful, with a "placeholder" file kept locally in the `storage` folder).
Whatever you return from a dagster `op` or `asset` gets passed to the IO Manager, which then takes care of saving it so that downstream `op`s and `asset`s can load it and process it further. The way to stop it from creating that file would be to override the (either default or selected) IO Manager. As far as I can tell, there's one in the `dagster-azure` library: https://docs.dagster.io/_apidocs/libraries/dagster-azure#dagster_azure.adls2.adls2_pickle_io_manager As an example, if you're handling all your IO logic within the op or asset and returning its remote path (as I've often seen in Airflow implementations), dagster will pass this remote path to the IO Manager, which will then write it to a file, blob storage, a database, or wherever you tell it to.
I can confirm what Vinnie is saying here: the default IO manager uses the filesystem. If you are using Azure Blob Storage, you should use the Azure IO manager to handle passing data between ops/assets.
Thanks @sean! 🙂