sephi
05/13/2020, 8:55 AM
alex
05/13/2020, 8:16 PM
prha
05/13/2020, 8:24 PM
@daily_schedule decorator on dagit (see /schedules/<schedule_name>/)
sephi
05/14/2020, 7:21 AM
composite_solid)
• Our pipeline runs the various composite_solids
• Each composite_solid generates various statistics on the dataset / dataframe (with context.log)
3. We would like to compare these statistics between different runs
sephi
05/14/2020, 10:27 AM
Asset manager maybe a good candidate to keep the dataframe statistics/metrics, and thereafter run comparisons on those assets, or would the logs be a better candidate?
We could not find any example for populating the Asset manager - assistance would be appreciated
prha
05/14/2020, 4:24 PM
Materializations which can be emitted from a solid with some metadata. During your pipeline execution, you could yield a materialization with a unique asset_key string, and attach whatever metadata you want to it. After your pipeline run, there should be an entry on the Assets tab in dagit for that asset key, along with the metadata that you’ve entered.
prha
05/14/2020, 4:25 PM
prha
05/14/2020, 4:28 PM
prha
05/14/2020, 4:29 PM
sephi
05/14/2020, 5:47 PM
sephi
05/14/2020, 5:49 PM
sephi
05/17/2020, 8:04 AM
Materialize data that is saved into an HDFS filesystem.
We tried to use EventMetadataEntry.path - and can see the EVENT TYPE Materialization that was generated in the logs.
However, we could not see anything in the Assets tab.
We are using the sqlite storage - thus have an empty dagster.yaml file (running version 0.7.12).
What are we missing in order to see results in the Assets tab?
prha
05/18/2020, 3:48 PM
asset_key param to the Materialization:
yield Materialization(label='my_materialization_label', asset_key='my_asset_key', metadata_entries=[EventMetadataEntry.path('my_path', label='my_path_label')])
prha
05/18/2020, 3:48 PM
label param at the moment - they might coalesce into a single param soon, but for now they are separate while we figure out the ideal API
sephi
05/18/2020, 7:00 PM
prha
05/18/2020, 8:04 PM
sqlite storage. Unfortunately, we currently only support asset-based features on our dagster-postgres storage. This is a product of the way we structured our sqlite implementation of event log storage, which makes asset-based queries very difficult.
sephi
05/20/2020, 3:48 AM
postgres db - in how many places do we need to change the configuration?
prha
05/20/2020, 5:28 PM
dagster.yaml), you would need to configure this:
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_url: {my_postgres_url}
prha
05/20/2020, 5:29 PM
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_url: {my_postgres_url}
schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_url: {my_postgres_url}
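The storage sections quoted above all take the same postgres_url, so one way to keep them in sync is to render them from a single value. This is only a minimal sketch: render_postgres_storage_config is a hypothetical helper (not part of dagster), the URL in the usage line is a placeholder, and the module and class names are copied from the messages above.

```python
# Hypothetical helper (not part of dagster): renders the three storage
# sections quoted in this thread from a single Postgres URL.
STORAGE_SECTIONS = [
    ("event_log_storage", "dagster_postgres.event_log", "PostgresEventLogStorage"),
    ("run_storage", "dagster_postgres.run_storage", "PostgresRunStorage"),
    ("schedule_storage", "dagster_postgres.schedule_storage", "PostgresScheduleStorage"),
]


def render_postgres_storage_config(postgres_url):
    """Return dagster.yaml text in which every section uses postgres_url."""
    lines = []
    for section, module, class_name in STORAGE_SECTIONS:
        lines.append(f"{section}:")
        lines.append(f"  module: {module}")
        lines.append(f"  class: {class_name}")
        lines.append("  config:")
        # Quote the URL so characters such as ':' or '@' stay a plain string.
        lines.append(f'    postgres_url: "{postgres_url}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(render_postgres_storage_config("postgresql://user:password@localhost:5432/dagster"))
```

The sketch does not escape quotes or backslashes inside the URL, so a URL containing those characters would need extra handling.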
sephi
06/01/2020, 3:57 PM
assets tab.
We can view the various ASSET KEY - and in each asset we see the various runs. However we can only view the DETAILS of the Last Materialized Event - how can we compare the results between the various runs?
prha
06/01/2020, 5:06 PM
prha
06/01/2020, 5:06 PM
FloatMetadataEntry on the materialization, we compare that numeric value in a graph over time
sephi
06/02/2020, 10:12 AM
EventMetadataEntry.float -
Maybe it is worthwhile to expand the documentation in https://docs.dagster.io/docs/apidocs/solids#dagster.EventMetadataEntry
prha
06/02/2020, 4:52 PM
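To make the float-metadata suggestion in this thread more concrete, here is a minimal sketch of recording dataset statistics as float entries on a materialization, together with a plain-Python comparison of two runs' statistics. It assumes a dagster release from this period that exposes Materialization and EventMetadataEntry.float(value, label); the helper names column_stats, stats_materialization and compare_runs are hypothetical and not part of dagster.

```python
import statistics


def column_stats(values):
    """Compute a few summary statistics for one numeric column."""
    return {
        "count": float(len(values)),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def stats_materialization(asset_key, stats):
    """Build a Materialization carrying each statistic as a float entry.

    Assumes the 2020-era dagster API discussed in this thread. The import is
    deferred so the pure-Python helpers work without dagster installed.
    """
    from dagster import EventMetadataEntry, Materialization

    return Materialization(
        label=asset_key,
        asset_key=asset_key,
        metadata_entries=[
            EventMetadataEntry.float(float(value), label=name)
            for name, value in sorted(stats.items())
        ],
    )


def compare_runs(previous, current):
    """Return current minus previous for every statistic present in both runs."""
    return {
        name: current[name] - previous[name]
        for name in sorted(previous.keys() & current.keys())
    }


if __name__ == "__main__":
    run_a = column_stats([1.0, 2.0, 3.0])
    run_b = column_stats([2.0, 4.0, 6.0, 8.0])
    print(compare_runs(run_a, run_b))
```

Inside a solid, the materialization could be emitted with something like yield stats_materialization('my_asset_key', column_stats(values)) before the solid yields its output. The compare_runs helper is independent of dagster and only illustrates an offline comparison of statistics collected from two runs.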