#announcements

sephi

05/13/2020, 8:55 AM
Hi, has there been any work done on comparing logs? For example, assuming we are running some pipeline, we would like to compare the results of different runs, or of specific steps across runs.

alex

05/13/2020, 8:16 PM
Not on log output directly, but we are starting to show some cross-run information. @prha can guide you if you provide more context on what you want to track across runs.

prha

05/13/2020, 8:24 PM
Hi @sephi… can you give me some info on how you’re running these pipelines and what you’d like to compare? I’ve been working on creating cross-run longitudinal views for partitioned pipelines on the schedules page. They should be available in dagit for schedules defined with the `@daily_schedule` decorator (see `/schedules/<schedule_name>/`).

sephi

05/14/2020, 7:21 AM
Hi,
1. Thx for all your great work.
2. Some background on our process:
• We have a collection of data sets.
• Each data set goes through a process that generates some sort of dataframe (wrapped in a `composite_solid`).
• Our `pipeline` runs the various `composite_solid`s.
• Each `composite_solid` generates various statistics on the dataset/dataframe (with `context.log`).
3. We would like to compare these statistics between different runs.
Update: we just read that the Asset manager may be a good candidate for keeping the dataframe statistics/metrics, so that we could then run comparisons on those assets. Or would the logs be a better candidate? We could not find any example of populating the Asset manager; assistance would be appreciated.
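Whichever store ends up holding the numbers, the comparison itself can be small. Below is a minimal sketch, assuming (hypothetically) that each run's statistics have already been collected into a flat dict mapping a metric name to a number; `compare_runs` and the metric names are illustrative only and are not part of Dagster.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-run statistics: metric name -> numeric value.
RunStats = Dict[str, float]
Comparison = Dict[str, Tuple[Optional[float], Optional[float], Optional[float]]]


def compare_runs(previous: RunStats, current: RunStats) -> Comparison:
    """Return {metric: (previous, current, delta)} for every metric seen in either run.

    delta is None when the metric is missing from one of the two runs.
    """
    result: Comparison = {}
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name)
        new = current.get(name)
        delta = new - old if old is not None and new is not None else None
        result[name] = (old, new, delta)
    return result


if __name__ == "__main__":
    # Made-up statistics for two runs, for illustration only.
    run_a = {"row_count": 1000.0, "null_fraction": 0.02}
    run_b = {"row_count": 1200.0, "null_fraction": 0.05, "distinct_ids": 950.0}
    for metric, (old, new, delta) in compare_runs(run_a, run_b).items():
        print(f"{metric}: {old} -> {new} (delta={delta})")
```

Reporting metrics that appear in only one run (with a `None` delta) makes added or dropped statistics visible instead of silently ignoring them.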

prha

05/14/2020, 4:24 PM
Some background on assets and the asset manager, which might be a good option for you. We have `Materializations`, which can be emitted from a solid with some metadata. During your pipeline execution, you could yield a materialization with a unique `asset_key` string and attach whatever metadata you want to it. After your pipeline run, there should be an entry on the Assets tab in dagit for that asset key, along with the metadata that you’ve entered.
If you’ve entered numeric metadata with that materialization, the Assets tab will show a graph of those values across runs (by execution time).
The asset manager is still in alpha… let me know if you have questions after playing around with it.
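Conceptually, that graph groups materializations by `asset_key` and plots each numeric metadata value against execution time. The sketch below models the idea in plain Python so it runs without Dagster; `MaterializationRecord`, its fields, and `numeric_series` are hypothetical stand-ins, not Dagster's event log schema or dagit's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

MetadataValue = Union[float, int, str]


@dataclass
class MaterializationRecord:
    """Hypothetical stand-in for one materialization event."""

    asset_key: str
    timestamp: float  # execution time, e.g. seconds since the epoch
    metadata: Dict[str, MetadataValue] = field(default_factory=dict)


def numeric_series(
    records: List[MaterializationRecord], asset_key: str
) -> Dict[str, List[Tuple[float, float]]]:
    """Group the numeric metadata of one asset key into (timestamp, value) series."""
    series: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    matching = sorted(
        (r for r in records if r.asset_key == asset_key), key=lambda r: r.timestamp
    )
    for record in matching:
        for label, value in record.metadata.items():
            # Only numeric entries can be graphed; text/path entries are skipped.
            # bool is a subclass of int, so it is excluded explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                series[label].append((record.timestamp, float(value)))
    return dict(series)


if __name__ == "__main__":
    # Illustrative records only; asset keys and values are made up.
    records = [
        MaterializationRecord("stats_dataset_a", 200.0, {"row_count": 1200, "path": "/data/a"}),
        MaterializationRecord("stats_dataset_a", 100.0, {"row_count": 1000}),
        MaterializationRecord("stats_dataset_b", 150.0, {"row_count": 50}),
    ]
    print(numeric_series(records, "stats_dataset_a"))
    # -> {'row_count': [(100.0, 1000.0), (200.0, 1200.0)]}
```

Sorting by timestamp before grouping means each series comes out in execution order, which is what a cross-run graph needs.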

sephi

05/14/2020, 5:47 PM
Great - will try playing around with it next week.
BTW, when searching for "Materialization" on the site, you only get the API docs and not the other resources.
Hi, we are trying to materialize data that is saved into an HDFS filesystem. We tried to use `EventMetadataEntry.path` and can see an event with EVENT TYPE `Materialization` generated in the logs. However, we could not see anything in the Assets tab. We are using the sqlite storage, so our `dagster.yaml` file is empty (running version 0.7.12). What are we missing in order to see results in the Assets tab?

prha

05/18/2020, 3:48 PM
Hi sephi… You’ll need to add an explicit `asset_key` param to the Materialization:
from dagster import EventMetadataEntry, Materialization
yield Materialization(label='my_materialization_label', asset_key='my_asset_key', metadata_entries=[EventMetadataEntry.path('my_path', label='my_path_label')])
It might seem redundant with the `label` param at the moment. They might coalesce into a single param soon, but for now they are separate while we figure out the ideal API.

sephi

05/18/2020, 7:00 PM
Thx for the update, will try tomorrow and will update.

prha

05/18/2020, 8:04 PM
Oh, also, I just read that you’re using `sqlite` storage. Unfortunately, we currently only support asset-based features on our `dagster-postgres` storage. This is a product of the way we structured our sqlite implementation of event log storage, which makes asset-based queries very difficult.

sephi

05/20/2020, 3:48 AM
So if we spin up a `postgres` DB, in how many places do we need to change the configuration?

prha

05/20/2020, 5:28 PM
In your instance configuration (e.g. `dagster.yaml`), you would need to configure this:
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_url: {my_postgres_url}
But at that point, many people configure their schedule storage and run storage to use the same postgres DB as well:
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_url: {my_postgres_url}
schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_url: {my_postgres_url}
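Because all three storages usually point at the same database, one option is to generate this part of `dagster.yaml` from a single URL so the three entries cannot drift apart. This is a minimal stdlib sketch; `render_postgres_storage_config` and the example URL are hypothetical, and the module/class names are copied from the config above. The URL is emitted quoted so YAML reads it as a string (an unquoted `{...}` value is parsed as a YAML mapping).

```python
import json

# (module, class) for each storage, copied from the dagster.yaml config above.
POSTGRES_STORAGES = {
    "event_log_storage": ("dagster_postgres.event_log", "PostgresEventLogStorage"),
    "run_storage": ("dagster_postgres.run_storage", "PostgresRunStorage"),
    "schedule_storage": ("dagster_postgres.schedule_storage", "PostgresScheduleStorage"),
}


def render_postgres_storage_config(postgres_url: str) -> str:
    """Return dagster.yaml text pointing all three storages at one postgres URL."""
    # A JSON string literal is also a valid YAML double-quoted scalar.
    quoted_url = json.dumps(postgres_url)
    blocks = []
    for key, (module, class_name) in POSTGRES_STORAGES.items():
        blocks.append(
            f"{key}:\n"
            f"  module: {module}\n"
            f"  class: {class_name}\n"
            f"  config:\n"
            f"    postgres_url: {quoted_url}\n"
        )
    return "".join(blocks)


if __name__ == "__main__":
    # Hypothetical connection string; replace with your own.
    print(render_postgres_storage_config("postgresql://user:password@localhost:5432/dagster"))
```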

sephi

06/01/2020, 3:57 PM
Hi, we have updated the storage to use a postgres DB and can view the assets in the `assets` tab. We can view the various ASSET KEYs, and in each asset we see the various runs. However, we can only view the DETAILS of the Last Materialized Event. How can we compare the results between the various runs?

prha

06/01/2020, 5:06 PM
Right now, there’s not a way to do this… what values would you want to compare?
If you have a `FloatMetadataEntry` on the materialization, we plot that numeric value in a graph over time.

sephi

06/02/2020, 10:12 AM
Hi, thx, we were able to work with `EventMetadataEntry.float`. Maybe it is worthwhile to expand the documentation at https://docs.dagster.io/docs/apidocs/solids#dagster.EventMetadataEntry

prha

06/02/2020, 4:52 PM
Yes, thanks, I’m working on improving all of the Materialization documentation for the next release, including some of the asset stuff