danil
02/04/2021, 7:00 PM
[...] `workspace.yaml` with 2 pipelines under the same directory. Reading the documentation was of no help since it covers only simple use cases.
Here is the structure of the folder from where I run `dagit`:
- local_repo.py
- workspace.yaml
- dagster_baby_pipeline (folder with pipeline definition)
- dagster_try (folder with pipeline definition)
In `local_repo.py` I try to import `baby_pipeline.py` from `dagster_baby_pipeline` and `hello_cereal.py` from `dagster_try` to return from the `@repository` function. From the logs I suspect that those modules aren't getting loaded into the Dagster process, hence the relative imports don't work.
If I specify `working_directory` for ONLY the `dagster_baby_pipeline` folder in `workspace.yaml` and don't import `dagster_try`, then it works. It seems that if I create a repository for each pipeline in its respective module and load them separately in `workspace.yaml`, it will achieve what I want but will require boilerplate overhead.
Can you please clarify what is happening under the hood with imports in this scenario and what the best practice is for using these abstractions? Is there a way to load both the `dagster_baby_pipeline` and `dagster_try` modules so they can be accessed by a single `local_repo.py`?
alex
02/04/2021, 7:18 PM
(dagenv38) /tmp/project:$ tree
.
├── folder_a
│   └── __init__.py
├── folder_b
│   └── __init__.py
├── repo.py
└── workspace.yaml
2 directories, 4 files
(dagenv38) /tmp/project:$ cat workspace.yaml
load_from:
  - python_file:
      relative_path: repo.py
      working_directory: .
(dagenv38) /tmp/project:$ cat repo.py
from dagster import repository
from folder_a import my_pipeline
from folder_b import other_pipeline

@repository
def my_repo():
    return [my_pipeline, other_pipeline]
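A rough way to see why the layout above works: Python can only resolve `from folder_a import ...` when the folder that contains `folder_a` is on `sys.path`, and judging by Dagster's own error text, `working_directory` is what supplies that folder. The sketch below is plain Python with no Dagster involved. It builds a throwaway copy of the layout in a temp directory, and the string values are hypothetical stand-ins for real pipeline objects.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway copy of the layout; the strings are hypothetical stand-ins
# for real pipeline definitions.
root = Path(tempfile.mkdtemp())
for folder, name in [("folder_a", "my_pipeline"), ("folder_b", "other_pipeline")]:
    (root / folder).mkdir()
    (root / folder / "__init__.py").write_text(f"{name} = '{name} placeholder'\n")


def try_import(module_name):
    """Return True if the module can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


# The project root is not on sys.path yet, so the package cannot be found.
before = try_import("folder_a")

# Assumption: this is roughly the effect of `working_directory: .`
sys.path.insert(0, str(root))
after = try_import("folder_a") and try_import("folder_b")

print(before, after)
```

The only thing that changes between the two attempts is the `sys.path` entry, which is the knob the workspace setting appears to control.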
danil
02/04/2021, 7:43 PM
[...] `__init__.py` file present. If we omit the `dagster_try` package for now and only deal with `dagster_baby_pipeline` for the sake of simplicity, here is what we get:
Happy Path:
(dagster_baby_pipeline) danil@dk dagster-pipelines % cat local_repo.py
from dagster import repository
import baby_pipeline

@repository
def repository():
    return [baby_pipeline.pipeline]
(dagster_baby_pipeline) danil@dk dagster-pipelines % cat workspace.yaml
load_from:
  - python_file:
      relative_path: local_repo.py
      working_directory: /Users/danil/desktop/dagster-pipelines/dagster_baby_pipeline/
(dagster_baby_pipeline) danil@dk dagster-pipelines %
Sad Path:
(dagster_baby_pipeline) danil@dk dagster-pipelines % cat local_repo.py
from dagster import repository
from dagster_baby_pipeline import baby_pipeline

@repository
def repository():
    return [baby_pipeline.pipeline]
(dagster_baby_pipeline) danil@dk dagster-pipelines % cat workspace.yaml
load_from:
  - python_file:
      relative_path: local_repo.py
      working_directory: .
(dagster_baby_pipeline) danil@dk dagster-pipelines %
Error log:
UserWarning: Error loading repository location local_repo.py:dagster.core.errors.DagsterUserCodeProcessError: dagster.core.errors.DagsterImportError: Encountered ImportError: `No module named 'io_solids'` while importing module local_repo from file /Users/danil/Desktop/dagster-pipelines/local_repo.py. Local modules were resolved using the working directory `/Users/danil/Desktop/dagster-pipelines`. If another working directory should be used, please explicitly specify the appropriate path using the `-d` or `--working-directory` for CLI based targets or the `working_directory` configuration option for `python_file`-based workspace.yaml targets.
`io_solids` is inside `dagster_baby_pipeline`:
(dagster_baby_pipeline) danil@dk dagster-pipelines % tree
.
├── Pipfile
├── Pipfile.lock
├── dagster_baby_pipeline
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── README.md
│   ├── __init__.py
│   ├── baby_pipeline.py
│   ├── configs
│   │   └── pipeline_config.yaml
│   ├── data
│   │   ├── forestfires.csv
│   │   └── processed_df.csv
│   ├── io_solids.py
│   └── logic.py
├── dagster_try
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── __init__.py
│   ├── cereal.csv
│   ├── config.py
│   ├── config_env.yml
│   ├── hello_cereal.py
│   ├── hello_dagster.py
│   ├── inputs.py
│   └── inputs_config.yml
├── local_repo.py
└── workspace.yaml
4 directories, 24 files
(dagster_baby_pipeline) danil@dk dagster-pipelines %
alex
02/04/2021, 7:57 PM
[...] `import io_solids` or `from .io_solids import ...` (edit: correct relative import)
[...] `dagster_baby_pipeline` is in the python path, i believe you need to do relative imports once you are inside the module
dagster is failing to load recursively everything under the current working directory
this is less of a dagster specific issue and more to do with how python works https://docs.python.org/3/tutorial/modules.html, you should see the same behavior if you just do `python local_repo.py`
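To make the point about imports inside a package concrete, here is a minimal sketch in plain Python, no Dagster needed. It assumes a throwaway package with the hypothetical names `demo_pipeline` and `demo_io_solids`, puts only the parent folder on `sys.path`, and compares a bare sibling import with an explicit relative one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_pipeline"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "demo_io_solids.py").write_text("def load():\n    return 'loaded'\n")

# Bare sibling import: only resolves if the package folder itself is on sys.path.
(pkg / "implicit.py").write_text("import demo_io_solids\n")
# Explicit relative import: resolved against the containing package.
(pkg / "explicit.py").write_text("from .demo_io_solids import load\n")

# Only the parent folder goes on the path, as when the working directory
# is the project root.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    importlib.import_module("demo_pipeline.implicit")
    implicit_error = None
except ImportError as exc:
    implicit_error = str(exc)

explicit = importlib.import_module("demo_pipeline.explicit")

print(implicit_error)
print(explicit.load())
```

The bare import fails with the same kind of `No module named ...` message as in the error log, while the relative import finds its sibling regardless of where the process was started.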
danil
02/04/2021, 8:28 PM
[...] `io_solids` gets imported from `baby_pipeline.py` as `import io_solids`. `import .io_solids` throws a syntax error. Yes, I can confirm `python local_repo.py` throws the same issue.
What would be the best practice to organize the pipelines so that they can be in their own modules and be successfully run from Dagit?
  File "/Users/danil/Desktop/dagster-pipelines/./dagster_baby_pipeline/baby_pipeline.py", line 2
    import .io_solids as io
           ^
SyntaxError: invalid syntax
alex
02/04/2021, 8:37 PM
[...] `from .io_solids import ...`
danil
02/04/2021, 8:44 PM
(dagster_baby_pipeline) danil@dk dagster-pipelines % python local_repo.py
Traceback (most recent call last):
  File "local_repo.py", line 2, in <module>
    from dagster_baby_pipeline import baby_pipeline
  File "/Users/danil/Desktop/dagster-pipelines/dagster_baby_pipeline/baby_pipeline.py", line 2, in <module>
    from .io_solids import io_solids as io
ImportError: cannot import name 'io_solids' from 'dagster_baby_pipeline.io_solids' (/Users/danil/Desktop/dagster-pipelines/dagster_baby_pipeline/io_solids.py)
alex
02/04/2021, 8:44 PM
[...] `from .io_solids import io_solids as io` - do you have an attribute named `io_solids` in the file `io_solids`?
[...] `from .io_solids import *` but otherwise i think you want to just import the things from that file manually: `from .io_solids import solid_x, solid_y, ...`
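For reference, a small sketch of the three spellings that came up, using a throwaway package. The names `demo_pkg_alias` and `read_csv_solid` are hypothetical. It also shows `from . import io_solids as io`, which is standard Python for binding the sibling module itself under an alias; that variant was not suggested in the thread, so treat it as an optional extra.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# `import .module` is not valid syntax; the parser rejects it before anything runs.
try:
    compile("import .io_solids as io", "<demo>", "exec")
    syntax_ok = True
except SyntaxError:
    syntax_ok = False

root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg_alias"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "io_solids.py").write_text("def read_csv_solid():\n    return 'rows'\n")

# Asks for an attribute called `io_solids` inside io_solids.py, which does not exist.
(pkg / "wrong.py").write_text("from .io_solids import io_solids as io\n")
# Binds the sibling module itself to the alias `io`.
(pkg / "right.py").write_text("from . import io_solids as io\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    importlib.import_module("demo_pkg_alias.wrong")
    wrong_error = None
except ImportError as exc:
    wrong_error = str(exc)

right = importlib.import_module("demo_pkg_alias.right")

print(syntax_ok)
print(wrong_error)
print(right.io.read_csv_solid())
```

Listing names explicitly (`from .io_solids import solid_x, solid_y`) or using the module alias keeps it obvious where each solid comes from, which `import *` does not.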
bob
02/04/2021, 9:14 PM
[...] `dagster new-repo` that would generate a working Dagster project skeleton with a single repository. You can view the skeleton code here on Phabricator. If you want to try out the CLI command, you can `git checkout` my feature branch with instructions here
danil
02/04/2021, 9:19 PM
`from .io_solids import *` seems to be the remedy for this problem! Thanks so much - this is resolved.
[...] `baby_pipeline` from the Dagit UI alongside others, which is a big win - thanks guys. However, once I get inside the `dagster_baby_pipeline` module and try to run the pipeline separately, I get the import error the other way:
(dagster_baby_pipeline) danil@dk dagster-pipelines % tree
.
├── Pipfile
├── Pipfile.lock
├── __init__.py
├── dagster_baby_pipeline
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── README.md
│   ├── __init__.py
│   ├── baby_pipeline.py
│   ├── configs.py
│   ├── constants.py
│   ├── data
│   │   ├── forestfires.csv
│   │   └── processsed_df.csv
│   ├── io_solids.py
│   └── logic.py
├── dagster_try
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── __init__.py
│   ├── cereal.csv
│   ├── config.py
│   ├── config_env.yml
│   ├── hello_cereal.py
│   ├── hello_dagster.py
│   ├── inputs.py
│   └── inputs_config.yml
├── local_repo.py
└── workspace.yaml
3 directories, 26 files
(dagster_baby_pipeline) danil@dk dagster-pipelines % cd dagster_baby_pipeline
(dagster_baby_pipeline) danil@dk dagster_baby_pipeline % dagster pipeline execute -f baby_pipeline.py --preset local
Traceback (most recent call last):
  File "/Users/danil/.local/share/virtualenvs/dagster_baby_pipeline-Vr5WxBjI/lib/python3.8/site-packages/dagster/core/code_pointer.py", line 94, in load_python_file
    return import_module_from_path(module_name, python_file)
  File "/Users/danil/.local/share/virtualenvs/dagster_baby_pipeline-Vr5WxBjI/lib/python3.8/site-packages/dagster/seven/__init__.py", line 50, in import_module_from_path
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "baby_pipeline.py", line 2, in <module>
    from .io_solids import *
ImportError: attempted relative import with no known parent package
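The traceback above can be reproduced without Dagster. This is a sketch, assuming a throwaway package with the hypothetical name `demo_standalone_pkg`. Loading a file by path as a top-level module (which is what the `exec_module` frame in the traceback suggests is happening) leaves it with no parent package, so the relative import has nothing to resolve against. Importing the same file through its package, with the parent folder on `sys.path`, works.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "demo_standalone_pkg"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "io_solids.py").write_text("ANSWER = 42\n")
(pkg / "baby_pipeline.py").write_text("from .io_solids import ANSWER\n")

# 1) Load the file by path as a top-level module: it has no parent package.
spec = importlib.util.spec_from_file_location(
    "baby_pipeline", str(pkg / "baby_pipeline.py")
)
module = importlib.util.module_from_spec(spec)
try:
    spec.loader.exec_module(module)
    direct_error = None
except ImportError as exc:
    direct_error = str(exc)

# 2) Import the same file through its package, with the parent folder on sys.path.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
via_package = importlib.import_module("demo_standalone_pkg.baby_pipeline")

print(direct_error)
print(via_package.ANSWER)
```

So the file itself is fine in both cases; what differs is whether Python knows which package it belongs to at load time.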
[...] `dagster_baby_pipeline` with my coworker via a GitHub repo (hence running on its own is a must) while being able to load multiple pipelines in the Dagit UI at the same time.
[...] `from dagster_baby_pipeline import constants` inside `baby_pipeline` for Dagit to work and `dagster pipeline execute -f baby_pipeline.py -d .. --preset local` for running from within the module. `-d ..` saved the day!!
Michael Lynton
03/18/2021, 7:49 PM