# ask-community
d
Hello, I'm using the dagster-duckdb-pandas integration per this doc. In the snippet
{"io_manager": duckdb_pandas_io_manager.configured({"database": "my_db.duckdb"})}
it appears the "my_db.duckdb" path is relative to the current script. But how do I specify a path that is located somewhere else entirely? I tried
"database": {"env": "DUCKDB_DB_PATH"},
where my .env file contains just
DUCKDB_DB_PATH=/home/some_user/databases/nhtsa.duckdb
but I got an error message saying it couldn't find the database file because it was looking at
path_to_my_dagster_script/home/some_user/databases/nhtsa.duckdb
So it appears the value of my DUCKDB_DB_PATH environment variable is being appended to the path of the Dagster script file. I'm not sure if this is a bug or if I need to specify the location of my DuckDB database file in a different manner. Also, I think the default schema for DuckDB is "main", not "public". Maybe that needs to be changed in the documentation as well?
j
Hey @Daniel Kim, as far as I know the db should be able to create databases stored in other paths. For example, in our unit testing, I’m able to store the db in a tmp_path by doing
"database": os.path.join(tmp_path, "unit_test.duckdb")
I inspected the test, and tmp_path is
PosixPath('/private/var/folders/ns/r7rp0cg558zdj1yjm3p66qn80000gn/T/pytest-of-jamie/pytest-65/test_duckdb_io_manager_with_as0')
which is definitely not the path where the test file is. Maybe try joining the path with the os library to see if that makes a difference? What OS are you running? That might also make a difference. As for public vs main: all of our database IO managers share a default schema name, which is public, so if the schema isn’t set by the user, we manually set it to public.
d
Thanks! I'm using Windows. Using pathlib did the trick for me. Thank you!
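For anyone landing on this thread later, here is a minimal sketch of the pathlib approach, using only the standard library. The helper name duckdb_database_path is hypothetical and not part of Dagster; the resulting dict is assumed to be passed to duckdb_pandas_io_manager.configured(...) as in the snippet above. The sketch also shows that a POSIX-style path such as /home/some_user/... is not treated as absolute under Windows path rules, which may be relevant to the behavior described in the question, although the thread does not confirm the cause.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath
from typing import Optional


def duckdb_database_path(raw_path: str, base_dir: Optional[Path] = None) -> str:
    """Hypothetical helper: return an absolute path string for a DuckDB file.

    Relative paths are anchored at base_dir (default: current working directory).
    """
    path = Path(raw_path).expanduser()
    if not path.is_absolute():
        base = base_dir if base_dir is not None else Path.cwd()
        path = base / path
    # resolve() normalizes the path; the file does not need to exist yet.
    return str(path.resolve())


# The path from the .env file in the question.
raw = "/home/some_user/databases/nhtsa.duckdb"

# Under POSIX rules this is absolute; under Windows rules it has no drive,
# so pathlib does not consider it absolute.
posix_is_absolute = PurePosixPath(raw).is_absolute()
windows_is_absolute = PureWindowsPath(raw).is_absolute()

# Config dict with an explicit absolute path, assumed to be passed to
# duckdb_pandas_io_manager.configured(...) as shown earlier in the thread.
io_manager_config = {"database": duckdb_database_path("my_db.duckdb")}
```

On Windows, giving the database location with a drive letter (or building it with pathlib as above) avoids relying on how a drive-less path is interpreted.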