Does Dagster integrate with any tools for data governance (managing dataset permissions, etc.)? The main thing I've looked into so far is Apache Atlas: https://atlas.apache.org/#/QuickStart. Basically, making sure a datapoint is never used for the wrong thing (e.g. training), irrespective of where it's stored.
06/23/2022, 1:43 PM
I can't speak directly for Atlas, but I know that Immuta and Ranger mostly handle permissions based on the address of the data. I'm sure you could build hooks in Dagster to update metadata, etc., but I'm guessing you'd be rolling your own integration.
06/23/2022, 8:00 PM
I'm not aware of existing integrations between Dagster and Atlas. Is there functionality that you'd want in particular?
06/23/2022, 8:27 PM
Super early exploration at the moment. Clients give us different permissions with respect to the use of their data. Looking for ways to set those permissions, link them to formal docs, and then control downstream use (mostly train on / don't train on).
Not very familiar with tooling in this space.
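For illustration, the "train on / don't train on" idea can be sketched in plain Python, independent of any governance tool. This is only a minimal sketch: `Record`, `allowed_uses`, `agreement_ref`, and `filter_for_use` are hypothetical names invented here, not part of Dagster, Atlas, Immuta, or Ranger.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Record:
    """A single datapoint carrying its own usage permissions (hypothetical schema)."""
    record_id: str
    client: str
    allowed_uses: FrozenSet[str]  # e.g. {"analytics", "training"}
    agreement_ref: str            # pointer to the formal document granting the permissions


def filter_for_use(records: Iterable[Record], use: str) -> List[Record]:
    """Keep only the records whose permissions explicitly include `use`."""
    return [r for r in records if use in r.allowed_uses]


records = [
    Record("a1", "client_a", frozenset({"analytics", "training"}), "contracts/client_a.pdf"),
    Record("b2", "client_b", frozenset({"analytics"}), "contracts/client_b.pdf"),
]

# Only records that permit training should reach a training step.
trainable = filter_for_use(records, "training")
```

Because the permission travels with the record rather than with a storage location, the check gives the same answer wherever the data is stored.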
@sandy I'm kind of after solutions that would let you see everything a datapoint has been used for, e.g. every time it was part of an upstream data asset in a job (but at the single-datapoint level). That would need a lot of metadata storage, though.
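A rough sketch of what datapoint-level usage tracking could look like, again in plain Python. `UsageLog`, `record_run`, and `uses_of` are hypothetical names made up for this example, and a real setup would presumably persist the log somewhere rather than keep it in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class UsageLog:
    """In-memory log of which jobs consumed which datapoints (hypothetical helper)."""

    def __init__(self) -> None:
        self._uses: Dict[str, List[str]] = defaultdict(list)

    def record_run(self, job_name: str, record_ids: Iterable[str]) -> None:
        """Note that `job_name` read every datapoint in `record_ids`."""
        for record_id in record_ids:
            self._uses[record_id].append(job_name)

    def uses_of(self, record_id: str) -> List[str]:
        """Return every job that has consumed the given datapoint, in call order."""
        return list(self._uses.get(record_id, []))


log = UsageLog()
log.record_run("train_model", ["a1", "b2"])
log.record_run("weekly_report", ["a1"])

history = log.uses_of("a1")  # ["train_model", "weekly_report"]
```

The log gains one entry per datapoint per run, so it grows with (number of datapoints) x (number of runs), which is where the storage cost comes from.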
06/24/2022, 3:11 PM
Got it. Alas, I'm not very familiar with tooling in this space either, so I might not be the best person to ask.
07/01/2022, 10:45 AM
Hey @sandy, just looping back on this very briefly. I just realised Atlas is geared mostly towards Hadoop and big-data storage, so I won't be going near it. https://atlan.com/open-source-data-governance-tools/ has a good list of data governance tools if you're interested.