Joe Schmid
11/08/2021, 8:24 PM
from dagster import job, op, Output
owen
11/08/2021, 8:58 PM
AssetMaterialization from dagster as well, if you haven't already
Joe Schmid
11/08/2021, 9:00 PM
owen
11/08/2021, 9:24 PM
remote_storage_path, which in this case is the string '.csv'. If you want dagster to write the contents of df to a remote storage system, you have a few options. What remote storage system are you using, and how do you usually store stuff there?
Joe Schmid
11/08/2021, 9:27 PM
owen
11/08/2021, 9:33 PM
Joe Schmid
11/08/2021, 9:35 PM
owen
11/08/2021, 9:40 PM
df.to_csv("your_chosen_path.csv") before your AssetMaterialization event, which will store it to whatever local path you want. To store remotely, you'll probably want to get the contents of the df as a csv-formatted string (which would be csv_string = df.to_csv()), at which point you'd have to call the azure API to upload that string to a specific location
Joe Schmid
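A minimal sketch of the two options owen describes for df, assuming pandas is installed. The helper names, the index=False argument, and the upload_text callable are hypothetical and not from the thread; upload_text stands in for whatever storage client call (for example an Azure upload) you would actually use.

```python
import pandas as pd


def save_locally(df: pd.DataFrame, path: str) -> None:
    # Write the dataframe to a local CSV file. index=False is an assumption
    # (not from the thread); it leaves the row index out of the file.
    df.to_csv(path, index=False)


def store_remotely(df: pd.DataFrame, upload_text, remote_path: str) -> str:
    # With no path argument, to_csv returns the CSV contents as a string.
    csv_string = df.to_csv(index=False)
    # upload_text is a hypothetical callable wrapping the real storage
    # client; it receives the destination path and the CSV text.
    upload_text(remote_path, csv_string)
    return csv_string
```

Passing the uploader in as a plain callable keeps the op code independent of any particular storage SDK, so it can be swapped for a fake in tests.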
11/08/2021, 9:41 PM
owen
11/08/2021, 9:41 PM
Joe Schmid
11/08/2021, 9:46 PM
owen
11/08/2021, 10:03 PM
Joe Schmid
11/09/2021, 7:17 PM
owen
11/09/2021, 7:22 PM
configured function is a dictionary where each key is the name of a configurable field, and the value is the value you want to set it to. So in this case you'd want this to be fs_io_manager.configured({"base_path": "/mnt/c/psf/dagstertesting/"}) (i.e. "set base_path to be /mnt/c/...")
Joe Schmid
11/09/2021, 7:28 PM
owen
11/09/2021, 7:32 PM
Joe Schmid
11/09/2021, 7:36 PM
owen
11/09/2021, 10:32 PM
Joe Schmid
11/09/2021, 10:33 PM
job: ODBC Extract DAG
op: open connection to ODBC data source
op: pull table list & definitions
op: for each table:
- read rows into dataframe
- save dataframe to csv
- save dataframe to SQL table in db example-db-yyyymmdd (create db if it does not exist)
op: send notification to slack
owen
11/17/2021, 5:50 PM
required_resource_keys on the ops that talk to the database. This has a few benefits, including allowing you to substitute this resource for a mock one if you want to test your ops. As for the AssetMaterializations, you can optionally yield an AssetMaterialization every time that you save a dataframe to a SQL table. This won't impact the functionality of your job in any way, but it will help Dagster keep track of changes to that table over time, which might be useful. You could also yield AssetMaterializations for those saved csv files if you want those to be tracked as well.
Joe Schmid
11/18/2021, 11:50 PM