# integration-bigquery
k
Hi! Total noob to dagster here. There are some bigquery scheduled jobs that we are currently running that we want to define dependencies for. Is this a good use case for dagster?
v
Sure, it can be. You can use those tables as source assets and define their dependencies
k
Since these tend to be CREATE TABLE statements I’m thinking to define these as Ops and load the resulting tables as Source Assets. Is this a good practice?
v
You could also have those be assets, my idea with source assets would be to use that until you're ready to move everything from scheduled queries to dagster-managed
k
ic ic. thanks for the reply. it seems like the general sense is that assets are superior - but on the other hand i don’t want to load stuff to memory and keep the computation and storage strictly within bigquery. would assets still be superior under that restriction?
v
Yes, that would be a case of defining your own IO Manager. By default and in most (maybe all?) dagster-provided IO Managers, the assumption is that you're processing your data in memory and returning the result, the IO Manager then handles loading it somewhere. You could also define an IO Manager that takes a SQL query and runs it in BQ or similar, there's lots of ways to go here.
k
hmm i see - i guess that’s why the quickstart-gcp defines the custom bigquery iomanager?
v
The "why" question is something I'd defer to @jamie, but I'd assume yes.
k
still i guess the fastest way to migrate to dagster is to use bigqueryresource to run bigquery sql within ops and graphs? otherwise we’d need to write our own custom iomanagers
j
hey @Kazushi Nagayama - sorry to throw yet more info at you, but if you want to keep all computation done within bigquery, your best bet will likely be to use the BigQueryResource (just a configurable wrapper around a bigquery client) and set up asset dependencies using deps
from dagster import asset
from dagster_gcp import BigQueryResource

@asset
def orders(bigquery: BigQueryResource) -> None:
    sql = "SOME SQL THAT CREATES A TABLE NAMED ORDERS"
    with bigquery.get_client() as client:
        # .result() blocks until the query job has finished
        client.query(sql).result()

# renamed from `orders` (placeholder name): two assets cannot share a name
@asset(
    deps=[orders]
)
def returned_orders(bigquery: BigQueryResource) -> None:
    with bigquery.get_client() as client:
        client.query("SELECT * FROM ORDERS WHERE returned='True'").result()
also - could you point to where in the quickstart-gcp example you see a custom BQ IO manager? i thought we changed that a while ago to use the built in dagster BQ IO manager
k
@jamie hey thanks! yeah I ended up using BigQueryResource. I thought you were using custom iomanager here, but i may be reading it wrong - https://github.com/dagster-io/quickstart-gcp/blob/main/quickstart_gcp/io_managers.py
j
oh i see - you’re totally correct. we have another example project within the dagster repo called quickstart-gcp so i was looking at the wrong files
k
oh i didn't realise there was another one
@jamie this was what i was doing before you replied
from dagster import job, op
from dagster_gcp import BigQueryResource

def execute_query_from_file(bigquery: BigQueryResource, file: str) -> bool:
    with bigquery.get_client() as client, open(file, "r") as f:
        query = f.read()
        # .result() blocks until the query job has finished, so the
        # next op only starts after this table has been written
        client.query(query).result()
    return True

@op
def generate_processed_logs(bigquery: BigQueryResource) -> bool:
    return execute_query_from_file(bigquery=bigquery, file="queries/processed_logs.sql")

@op
def generate_annotated_logs(bigquery: BigQueryResource, previous_step: bool) -> bool:
    # previous_step is unused; it only forces this op to run after the first one
    return execute_query_from_file(bigquery=bigquery, file="queries/annotated_logs.sql")

@job
def process_and_annotate_logs() -> None:
    generate_annotated_logs(previous_step=generate_processed_logs())
i ended up not defining these as assets because these queries insert into preexisting bigquery tables and wasn't sure how they would be handled within dagster assets
i felt it was a bit hacky but it did what i wanted
j
i wouldn’t call that hacky! it looks good to me
k
cool cool. we went for doing this for streaming insert + declaring source assets based on preexisting tables. good to hear it looks good to you!