# feature-pipes
@schrockn The `ext architecture` link is broken, as it points to an internal repo. Additionally, if the `ext` protocol formalizes how structured and unstructured messages are passed to/from Dagster, have you thought about making it so that the remote execution context is able to modify the state of the DAG in the Dagster server? Specifically, my first thought when reading through the proposal was that it could provide the basis for Dagster to do double duty as a metadata and logging platform (which is what most ML experiment tracking platforms are at their core), without necessarily orchestrating anything (i.e., Dagster doesn't trigger the job itself; it just receives information about it). It'd be awesome if I could:
• create, from an external environment using `ext`, an asset that does not exist in the `definitions` in the environment where the Dagster server is running; and
• run a process (like training an ML model) in a Jupyter notebook somewhere and stream related logs back to Dagster. This would, in essence, be like manually pressing the "Materialize" button in the Dagster UI, but triggered from outside the Dagster server.
The combination of these two things would allow me to potentially use Dagster as a replacement for an experiment tracking platform, and also as the "single pane of glass" for both more experimental, ad-hoc work and production jobs that are properly defined as a DAG in Dagster (and orchestrated by it). This is definitely coloring outside the lines of "pure" orchestration, though. If this makes sense and is at least somewhat interesting/feasible, I can also post in the GH Discussion.
👍 1
D 1
> The `ext architecture` link is broken, as it points to an internal repo.
Thanks! Fixed.
This is a great comment and we’ve been toying with going in this direction.
> create, from an external environment using `ext`, an asset that does not exist in the `definitions` in the environment where the Dagster server is running; and
This is interesting. That asset keys are defined at definition time is baked into Dagster in fairly deep ways. It is one of the reasons why you can render the same asset graph on your laptop that gets deployed: the asset graph does not depend on state. If we were to do this, you would still likely have to define your asset keys and deps ahead of time.
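To make that constraint concrete, here's a minimal sketch in plain Python (a hypothetical stand-in, not Dagster's actual API) of why a definition-time asset graph can be rendered anywhere: the keys and deps are declared up front as data, with no runtime state involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetSpec:
    """Hypothetical stand-in for a definition-time asset declaration:
    just a key and its upstream dependencies, no runtime state."""
    key: str
    deps: tuple = ()


def render_graph(specs):
    """Render the asset graph purely from declared specs. Because this
    needs no database or run history, the same graph can be drawn on a
    laptop or in a deployed server."""
    return {spec.key: list(spec.deps) for spec in specs}


specs = [
    AssetSpec("raw_events"),
    AssetSpec("features", deps=("raw_events",)),
    AssetSpec("model", deps=("features",)),
]

print(render_graph(specs))
```

An asset created ad hoc from an external environment would break this property, which is why the keys would still need to be declared ahead of time.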
> run a process (like training an ML model) in a Jupyter notebook somewhere and stream related logs back to Dagster. This would, in essence, be like manually pressing the "Materialize" button in the Dagster UI, but triggered from outside the Dagster server.
The above is far more doable. With the asset graph defined ahead of time, streaming in events that attach themselves to those keys as appropriate is a sensible extension.
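As an illustration of that shape, here's a minimal sketch of the external-process side streaming a structured message back, keyed to a pre-declared asset. The JSON-lines envelope and the function names are assumptions for illustration, not the actual `ext` wire format or API.

```python
import json

# Asset keys declared ahead of time in the Dagster definitions.
KNOWN_ASSET_KEYS = {"model"}


def report_asset_materialization(asset_key, metadata):
    """Build a structured message an external process (e.g. a Jupyter
    notebook) could stream back to the orchestrator. The envelope here
    is an illustrative assumption, not ext's real protocol."""
    return json.dumps({
        "method": "report_asset_materialization",
        "params": {"asset_key": asset_key, "metadata": metadata},
    })


def ingest(line):
    """Orchestrator side: parse the message and only attach events to
    keys that were declared at definition time."""
    msg = json.loads(line)
    key = msg["params"]["asset_key"]
    if key not in KNOWN_ASSET_KEYS:
        raise ValueError(f"unknown asset key: {key}")
    return msg


line = report_asset_materialization("model", {"accuracy": 0.93})
print(ingest(line)["params"]["metadata"])
```

The `ingest` check is the crux: events stream in from outside, but they can only attach to an asset graph that already exists.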
👍 2