Hi - I'm a dagster super noob - Just ran through t...
# ask-community
l
Hi - I'm a dagster super noob - Just ran through the tutorial project and have data writing to duckdb -- what's the recommended way to query local duckdb data? I want to verify (per the tutorial) that data exists:
Copy code
To validate that this worked, materialize your entire asset graph. The I/O manager should store the DataFrame in a table called analytics.hackernews.topstories and continue to store the IDs and most frequent words in a directory called data
from the tutorial https://docs.dagster.io/tutorial/saving-your-data at step
Choosing an I/O manager for each asset#
Can I look at the contents of
analytics.hackernews.topstories
from the UI? Do I need to install the duckdb client? If using the duckdb client, where is the local db saved by default? Also - what's the advantage of using local duckdb vs parquet? Just that you can query the data via SQL?
🤖 1
t
Hi! You should already have the duckdb python package installed, so you can open up a Python REPL, connect to the database yourself, and query it from here, ex.
import duckdb
conn = duckdb.connect(database="databasefilename.duckdb")
conn.execute("select count(*) from table").fetchall()
💯 1
l
How do I find the value for this:
databasefilename.duckdb
?
Copy code
>>> import duckdb
>>> conn = duckdb.connect(database="analytics.hackernews.duckdb")
>>> conn.execute("SELECT COUNT(*) FROM topstories")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
duckdb.CatalogException: Catalog Error: Table with name topstories does not exist!
Did you mean "pg_views"?
LINE 1: SELECT COUNT(*) FROM topstories
                             ^
>>> conn.execute("SELECT COUNT(*) FROM analytics.hackernews.topstories")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
duckdb.CatalogException: Catalog Error: Table with name topstories does not exist!
Did you mean "temp.pg_catalog.pg_views"?
LINE 1: SELECT COUNT(*) FROM analytics.hackernews.topstories
                             ^
>>> conn.execute("SHOW TABLES")
<duckdb.DuckDBPyConnection object at 0xffff7c6499f0>
>>> list(conn.execute("SHOW TABLES"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'duckdb.DuckDBPyConnection' object is not iterable
>>> result = conn.execute("SHOW TABLES")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn = duckdb.connect(database="analytics.hackernews.duckdb")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn = duckdb.connect(database="public")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn = duckdb.connect(database="analytics")
>>> conn.execute("SHOW TABLES").fetchall()
[]
image.png
t
I believe the file is just called analytics.hackernews. Not at my computer right now so I can't confirm. It should have been made in your project directory
l
Copy code
>>> conn = duckdb.connect(database="analytics.hackernews")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn = duckdb.connect(database="analytics.hackernews.duckdb")
>>> conn.execute("SHOW TABLES").fetchall()
[]
figured it out -- thank you
Copy code
>>> import duckdb
>>> conn = duckdb.connect(database="analytics.hackernews")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn.execute("SELECT COUNT(*) FROM topstories").fetchall()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
duckdb.CatalogException: Catalog Error: Table with name topstories does not exist!
Did you mean "public.topstories"?
LINE 1: SELECT COUNT(*) FROM topstories
                             ^
>>> conn.execute("SELECT COUNT(*) FROM public.topstories").fetchall()
[(100,)]
have to load
analytics.hackernews
and then select count(*) from public.topstories
thanks!
daggy love 1