max

06/11/2020, 11:20 PM
Workspace, host and user process separation, and repository definition

Dagit and other tools no longer load a single repository containing user definitions such as pipelines into the same process as the framework code. Instead, they load a "workspace" that can contain multiple repositories sourced from a variety of different external locations (e.g., Python modules and Python virtualenvs, with containers and source control repositories soon to come). The repositories in a workspace are loaded into their own "user" processes distinct from the "host" framework process. Dagit and other tools now communicate with user code over an IPC mechanism.

This architectural change has several advantages:
- Dagit no longer needs to be restarted when there is an update to user code.
- Users can use repositories to organize their pipelines, but still work on all of their repositories using a single running Dagit.
- The Dagit process can now run in a separate Python environment from user code, so pipeline dependencies do not need to be installed into the Dagit environment.
- Each repository can be sourced from a separate Python virtualenv, so teams can manage their dependencies (or even their own Python versions) separately.

We have introduced a new file format, workspace.yaml, in order to support this new architecture. The workspace.yaml file encodes which repositories to load and where they are located, and supersedes the repository.yaml file and associated machinery.

As a consequence, Dagster internals are now stricter about how pipelines are loaded. If you have written scripts or tests in which a pipeline is defined and then passed across a process boundary (e.g., using the multiprocess_executor or dagstermill), you may now need to wrap the pipeline in the reconstructable utility function for it to be reconstructed across the process boundary.

In addition, rather than instantiating the RepositoryDefinition class directly, users should now prefer the @repository decorator. As part of this change, the @scheduler and @repository_partitions decorators have been removed, and their functionality subsumed under @repository.
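
To make the host/user split a bit more concrete, here is a rough, stdlib-only sketch of the idea. It is not how Dagster implements it. The names `USER_CODE`, `CHILD_SCRIPT`, `list_pipelines`, and `my_repository` are made up for illustration, a "repository" is reduced to a function returning a dict, and JSON over stdout stands in for the real IPC mechanism. The point is that the host only knows where the user code lives and which interpreter to run it with, and never imports it.

```python
import json
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical user code: a "repository" is just a function returning
# named pipelines (pipeline name -> list of step names).
USER_CODE = textwrap.dedent(
    """
    def my_repository():
        return {"ingest": ["extract", "load"], "train": ["fit", "score"]}
    """
)

# Script run inside the user process: load the user module from its file
# path, then report the pipeline names back to the host as JSON on stdout.
CHILD_SCRIPT = textwrap.dedent(
    """
    import importlib.util, json, sys
    path, attr = sys.argv[1], sys.argv[2]
    spec = importlib.util.spec_from_file_location("user_repo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    print(json.dumps(sorted(getattr(module, attr)())))
    """
)


def list_pipelines(python_executable, module_path, attr):
    """Ask a separate user process which pipelines a repository defines."""
    result = subprocess.run(
        [python_executable, "-c", CHILD_SCRIPT, str(module_path), attr],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


with tempfile.TemporaryDirectory() as tmp:
    repo_file = Path(tmp) / "user_repo.py"
    repo_file.write_text(USER_CODE)
    # sys.executable stands in for a per-repository virtualenv interpreter.
    pipeline_names = list_pipelines(sys.executable, repo_file, "my_repository")

print(pipeline_names)
```

Because the host passes only a file path and an attribute name across the boundary, the user code's dependencies need to exist only in the interpreter that runs the child process. The same reasoning is why a pipeline object has to be rebuilt from a reference on the other side of a process boundary rather than handed over directly.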

schrockn

06/11/2020, 11:27 PM
I just want to reiterate here what a massive change to the system this is, but it is a great one and puts us on a trajectory toward all sorts of interesting things. However, given that it is such a huge change, some of y’all might run into bugs!

Binh Pham

06/13/2020, 7:02 AM
- Dagit no longer needs to be restarted when there is an update to user code.
When does dagit refresh, upon code save? I couldn't get dagit 0.8 to refresh.

schrockn

06/13/2020, 4:28 PM
@Binh Pham we had to temporarily remove this feature, as it now requires a wholly different implementation, but we will re-add it soon!
Thanks for letting us know that this is still valuable