Luke Dziurzynski
08/06/2023, 5:20 PMTo validate that this worked, materialize your entire asset graph. The I/O manager should store the DataFrame in a table called analytics.hackernews.topstories and continue to store the IDs and most frequent words in a directory called data
from the tutorial https://docs.dagster.io/tutorial/saving-your-data at step Choosing an I/O manager for each asset#
Can I look at the contents of analytics.hackernews.topstories
from the UI? Do I need to install the duckdb client? If using the duckdb client, where is the local db saved by default?
Also - what's the advantage of using local duckdb vs parquet? Just that you can query the data via SQL?Tim Castillo
08/06/2023, 7:42 PMimport duckdb
conn = duckdb.connect(database="databasefilename.duckdb")
conn.execute("select count(*) from table").fetchall()
Luke Dziurzynski
08/06/2023, 7:50 PMHow do I find the value for this:?databasefilename.duckdb
Luke Dziurzynski
08/06/2023, 7:51 PM>>> import duckdb
>>> conn = duckdb.connect(database="analytics.hackernews.duckdb")
>>> conn.execute("SELECT COUNT(*) FROM topstories")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
duckdb.CatalogException: Catalog Error: Table with name topstories does not exist!
Did you mean "pg_views"?
LINE 1: SELECT COUNT(*) FROM topstories
^
>>> conn.execute("SELECT COUNT(*) FROM analytics.hackernews.topstories")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
duckdb.CatalogException: Catalog Error: Table with name topstories does not exist!
Did you mean "temp.pg_catalog.pg_views"?
LINE 1: SELECT COUNT(*) FROM analytics.hackernews.topstories
^
>>> conn.execute("SHOW TABLES")
<duckdb.DuckDBPyConnection object at 0xffff7c6499f0>
>>> list(conn.execute("SHOW TABLES"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'duckdb.DuckDBPyConnection' object is not iterable
>>> result = conn.execute("SHOW TABLES")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn = duckdb.connect(database="analytics.hackernews.duckdb")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn = duckdb.connect(database="public")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn = duckdb.connect(database="analytics")
>>> conn.execute("SHOW TABLES").fetchall()
[]
Luke Dziurzynski
08/06/2023, 7:51 PMTim Castillo
08/06/2023, 7:52 PMLuke Dziurzynski
08/06/2023, 7:52 PM>>> conn = duckdb.connect(database="analytics.hackernews")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn = duckdb.connect(database="analytics.hackernews.duckdb")
>>> conn.execute("SHOW TABLES").fetchall()
[]
Luke Dziurzynski
08/06/2023, 7:58 PMLuke Dziurzynski
08/06/2023, 7:58 PM>>> import duckdb
>>> conn = duckdb.connect(database="analytics.hackernews")
>>> conn.execute("SHOW TABLES").fetchall()
[]
>>> conn.execute("SELECT COUNT(*) FROM topstories").fetchall()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
duckdb.CatalogException: Catalog Error: Table with name topstories does not exist!
Did you mean "public.topstories"?
LINE 1: SELECT COUNT(*) FROM topstories
^
>>> conn.execute("SELECT COUNT(*) FROM public.topstories").fetchall()
[(100,)]
Luke Dziurzynski
08/06/2023, 7:58 PManalytics.hackernews
and then select count(*) from public.topstoriesLuke Dziurzynski
08/06/2023, 8:08 PM