Hey Dagsters and Early Adopters, Thank you for the...
# announcements
d
Hey Dagsters and Early Adopters, thank you for the awesome work. I am in the early stages of exploring Dagster and loving it so far, though not without bumps 🙂 I am having trouble importing pipelines from their respective modules when using Dagit and `workspace.yaml` with 2 pipelines under the same directory. The documentation did not help, since it covers only simple use cases. Here is the structure of the folder from which I run `dagit`:
- local_repo.py
- workspace.yaml
- dagster_baby_pipeline (folder with pipeline definition)
- dagster_try (folder with pipeline definition)
In `local_repo.py` I try to import `baby_pipeline.py` from `dagster_baby_pipeline` and `hello_cereal.py` from `dagster_try`, and return both from a `@repository` function. From the logs I suspect that those modules aren't getting loaded into the Dagster process, hence the relative imports don't work. If I specify `working_directory` for ONLY the `dagster_baby_pipeline` folder in `workspace.yaml` and don't import `dagster_try`, then it works. It seems that if I create a repository for each pipeline in its respective module and load them separately in `workspace.yaml`, I will get what I want, but at the cost of boilerplate overhead. Can you please clarify what is happening under the hood with imports in this scenario, and what the best practice is for using these abstractions? Is there a way to load both the `dagster_baby_pipeline` and `dagster_try` modules so they can be accessed by a single `local_repo.py`?
Let me know if you'd like me to create a ticket in GitHub issues or paste the code here. Whatever makes it easier to debug.
a
Can you share more about the error you are seeing? Unless you make this a package you can't do “relative” imports, but working-directory-based imports should work
the quick test i did
(dagenv38) /tmp/project:$ tree
.
├── folder_a
│   └── __init__.py
├── folder_b
│   └── __init__.py
├── repo.py
└── workspace.yaml

2 directories, 4 files
(dagenv38) /tmp/project:$ cat workspace.yaml
load_from:
  - python_file:
      relative_path: repo.py
      working_directory: .
(dagenv38) /tmp/project:$ cat repo.py
from dagster import repository
from folder_a import my_pipeline
from folder_b import other_pipeline


@repository
def my_repo():
    return [my_pipeline, other_pipeline]
d
I do have all of them as packages, with an `__init__.py` file present. If we omit the `dagster_try` package for now and deal only with `dagster_baby_pipeline` for the sake of simplicity, here is what we get:
Happy Path:
(dagster_baby_pipeline) danil@dk dagster-pipelines % cat local_repo.py
from dagster import repository
import baby_pipeline

@repository
def repository():
    return [baby_pipeline.pipeline]%
(dagster_baby_pipeline) danil@dk dagster-pipelines % cat workspace.yaml 
load_from:
  - python_file:
      relative_path: local_repo.py
      working_directory: /Users/danil/desktop/dagster-pipelines/dagster_baby_pipeline/

(dagster_baby_pipeline) danil@dk dagster-pipelines %
Sad Path:
(dagster_baby_pipeline) danil@dk dagster-pipelines % cat local_repo.py
from dagster import repository
from dagster_baby_pipeline import baby_pipeline

@repository
def repository():
    return [baby_pipeline.pipeline]%
(dagster_baby_pipeline) danil@dk dagster-pipelines % cat workspace.yaml 
load_from:
  - python_file:
      relative_path: local_repo.py
      working_directory: .

(dagster_baby_pipeline) danil@dk dagster-pipelines %
Error log:
UserWarning: Error loading repository location local_repo.py:dagster.core.errors.DagsterUserCodeProcessError: dagster.core.errors.DagsterImportError: Encountered ImportError: `No module named 'io_solids'` while importing module local_repo from file /Users/danil/Desktop/dagster-pipelines/local_repo.py. Local modules were resolved using the working directory `/Users/danil/Desktop/dagster-pipelines`. If another working directory should be used, please explicitly specify the appropriate path using the `-d` or `--working-directory` for CLI based targets or the `working_directory` configuration option for `python_file`-based workspace.yaml targets.
`io_solids` is inside `dagster_baby_pipeline`:
(dagster_baby_pipeline) danil@dk dagster-pipelines % tree
.
├── Pipfile
├── Pipfile.lock
├── dagster_baby_pipeline
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── README.md
│   ├── __init__.py
│   ├── baby_pipeline.py
│   ├── configs
│   │   └── pipeline_config.yaml
│   ├── data
│   │   ├── forestfires.csv
│   │   └── processed_df.csv
│   ├── io_solids.py
│   └── logic.py
├── dagster_try
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── __init__.py
│   ├── cereal.csv
│   ├── config.py
│   ├── config_env.yml
│   ├── hello_cereal.py
│   ├── hello_dagster.py
│   ├── inputs.py
│   └── inputs_config.yml
├── local_repo.py
└── workspace.yaml

4 directories, 24 files
(dagster_baby_pipeline) danil@dk dagster-pipelines %
So my understanding is that Dagster fails to recursively load everything under the current working directory.
@alex Thanks for getting back to me so quickly - appreciate it.
a
are you doing `import io_solids` or `from .io_solids import ...`?
edit: corrected the relative import
the former won't work unless `dagster_baby_pipeline` is in the python path, i believe you need to do relative imports once you are inside the module
"dagster is failing to load recursively everything under the current working directory"
this is less of a dagster specific issue and more to do with how python works https://docs.python.org/3/tutorial/modules.html, you should see the same behavior if you just do `python local_repo.py`
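A minimal sketch of the point above, using plain Python and no Dagster. The names `demo_pkg`, `demo_io_helpers`, and `demo_pipeline_mod` are hypothetical stand-ins for `dagster_baby_pipeline`, `io_solids`, and `baby_pipeline`. It builds a throwaway package, puts only the parent directory on `sys.path` (roughly what `working_directory: .` appears to do, judging by the error message), and tries each style of sibling import:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(sibling_import_line: str) -> str:
    """Import a package module from the package's parent directory.

    Returns "ok" on success, otherwise the name of the ImportError subclass.
    """
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "demo_io_helpers.py").write_text("VALUE = 42\n")
        (pkg / "demo_pipeline_mod.py").write_text(sibling_import_line + "\n")

        # Only the parent directory is importable, not the package folder itself.
        sys.path.insert(0, root)
        try:
            importlib.import_module("demo_pkg.demo_pipeline_mod")
            return "ok"
        except ImportError as exc:
            return type(exc).__name__
        finally:
            sys.path.remove(root)
            # Forget the throwaway modules so each call starts clean.
            for name in [n for n in sys.modules if n.startswith("demo_pkg")]:
                del sys.modules[name]


print(try_import("import demo_io_helpers"))               # bare sibling import
print(try_import("from .demo_io_helpers import VALUE"))   # relative import
```

The bare import fails because the package folder itself is not on the import path; the relative form resolves the sibling through the package instead.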
d
`io_solids` gets imported from `baby_pipeline.py` as `import io_solids`. `import .io_solids` throws a syntax error. Yes, I can confirm that `python local_repo.py` throws the same error. What would be the best practice for organizing pipelines so that they can live in their own modules and still run successfully from Dagit?
File "/Users/danil/Desktop/dagster-pipelines/./dagster_baby_pipeline/baby_pipeline.py", line 2
    import .io_solids as io
           ^
SyntaxError: invalid syntax
a
oh my bad, you have to do `from .io_solids import ...`
d
Hmm, that doesn't work either, and the error isn't helpful:
(dagster_baby_pipeline) danil@dk dagster-pipelines % python local_repo.py
Traceback (most recent call last):
  File "local_repo.py", line 2, in <module>
    from dagster_baby_pipeline import baby_pipeline
  File "/Users/danil/Desktop/dagster-pipelines/dagster_baby_pipeline/baby_pipeline.py", line 2, in <module>
    from .io_solids import io_solids as io
ImportError: cannot import name 'io_solids' from 'dagster_baby_pipeline.io_solids' (/Users/danil/Desktop/dagster-pipelines/dagster_baby_pipeline/io_solids.py)
a
`from .io_solids import io_solids as io`: do you have an attribute named `io_solids` in the file `io_solids`?
you can do `from .io_solids import *`, but otherwise i think you want to just import the things from that file manually: `from .io_solids import solid_x, solid_y, ...`
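To make the difference between these import forms concrete, here is a small sketch in plain Python. Everything in it is hypothetical: `demo_names_pkg` and `demo_solids` stand in for `dagster_baby_pipeline` and `io_solids`, and `solid_x` / `solid_y` are ordinary functions rather than real Dagster solids. It reports which public names each import line binds, or the error text when the import fails:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents for a module playing the role of io_solids.py.
SOLIDS_SOURCE = '''
def solid_x():
    return "x"

def solid_y():
    return "y"
'''


def names_after(import_line: str):
    """Return the sorted public names the import line binds, or the ImportError text."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_names_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "demo_solids.py").write_text(SOLIDS_SOURCE)
        (pkg / "consumer.py").write_text(import_line + "\n")

        sys.path.insert(0, root)
        try:
            module = importlib.import_module("demo_names_pkg.consumer")
            return sorted(n for n in vars(module) if not n.startswith("_"))
        except ImportError as exc:
            return str(exc)
        finally:
            sys.path.remove(root)
            for name in [n for n in sys.modules if n.startswith("demo_names_pkg")]:
                del sys.modules[name]


for line in (
    "from .demo_solids import demo_solids as io",  # no such attribute in the module
    "from .demo_solids import solid_x, solid_y",   # explicit names
    "from .demo_solids import *",                  # every public name
    "from . import demo_solids as io",             # the module itself, aliased
):
    print(line, "->", names_after(line))
```

The last form, `from . import <module> as io`, is the relative-import counterpart of `import io_solids as io` and keeps the `io.` prefix at call sites, which can be easier to read than a star import.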
b
@danil If you're interested, I'm currently working on a CLI command, `dagster new-repo`, that would generate a working Dagster project skeleton with a single repository. You can view the skeleton code here on Phabricator. If you want to try out the CLI command, you can `git checkout` my feature branch with instructions here
d
thanks @bob, I will check it out - that sounds pretty much what I am doing: setting up local dev environment to play with multiple pipelines using Dagit.
@alex `from .io_solids import *` seems to be the remedy for this problem! Thanks so much, this is resolved.
@alex @bob Following up on this again: I am currently able to launch `baby_pipeline` from the Dagit UI alongside others, which is a big win. Thanks, guys. However, once I get inside the `dagster_baby_pipeline` module and try to run the pipeline separately, I get the import error the other way:
(dagster_baby_pipeline) danil@dk dagster-pipelines % tree
.
├── Pipfile
├── Pipfile.lock
├── __init__.py
├── dagster_baby_pipeline
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── README.md
│   ├── __init__.py
│   ├── baby_pipeline.py
│   ├── configs.py
│   ├── constants.py
│   ├── data
│   │   ├── forestfires.csv
│   │   └── processsed_df.csv
│   ├── io_solids.py
│   └── logic.py
├── dagster_try
│   ├── Pipfile
│   ├── Pipfile.lock
│   ├── __init__.py
│   ├── cereal.csv
│   ├── config.py
│   ├── config_env.yml
│   ├── hello_cereal.py
│   ├── hello_dagster.py
│   ├── inputs.py
│   └── inputs_config.yml
├── local_repo.py
└── workspace.yaml

3 directories, 26 files
(dagster_baby_pipeline) danil@dk dagster-pipelines % cd dagster_baby_pipeline 
(dagster_baby_pipeline) danil@dk dagster_baby_pipeline % dagster pipeline execute -f baby_pipeline.py --preset local                           
Traceback (most recent call last):
  File "/Users/danil/.local/share/virtualenvs/dagster_baby_pipeline-Vr5WxBjI/lib/python3.8/site-packages/dagster/core/code_pointer.py", line 94, in load_python_file
    return import_module_from_path(module_name, python_file)
  File "/Users/danil/.local/share/virtualenvs/dagster_baby_pipeline-Vr5WxBjI/lib/python3.8/site-packages/dagster/seven/__init__.py", line 50, in import_module_from_path
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "baby_pipeline.py", line 2, in <module>
    from .io_solids import *
ImportError: attempted relative import with no known parent package
Are you aware of any Python wizardry to have it both ways: to run it as a module from a higher-level directory using Dagit, and also to run it standalone? Being able to do both is important for a seamless dev UX. My use case is that I want to share `dagster_baby_pipeline` with my coworker via a GitHub repo (hence running on its own is a must) while still being able to load multiple pipelines in the Dagit UI at the same time.
I was able to figure this out by using imports like `from dagster_baby_pipeline import constants` inside `baby_pipeline` for Dagit to work, and `dagster pipeline execute -f baby_pipeline.py -d .. --preset local` for running from within the module. `-d ..` saved the day!!
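For anyone landing here later, a rough sketch of why that combination works, in plain Python without Dagster. The traceback above shows the file being loaded by path via `spec.loader.exec_module`, so the sketch loads a file the same way. The names `demo_pipeline_pkg`, `demo_constants`, and `demo_pipeline` are hypothetical, and treating `-d ..` as "put the parent directory on the import path" is an assumption based on the error message earlier in the thread:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_standalone(import_line: str, add_parent_to_path: bool) -> str:
    """Load a package's file by path, the way a `-f some_file.py` target is loaded."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pipeline_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "demo_constants.py").write_text("ANSWER = 42\n")
        target = pkg / "demo_pipeline.py"
        target.write_text(import_line + "\n")

        if add_parent_to_path:
            sys.path.insert(0, root)  # assumed rough equivalent of `-d ..`
        try:
            # Loaded as a top-level module, so it has no parent package.
            spec = importlib.util.spec_from_file_location("demo_pipeline", target)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            return "loaded"
        except ImportError as exc:
            return str(exc)
        finally:
            if add_parent_to_path:
                sys.path.remove(root)
            for name in [n for n in sys.modules if n.startswith("demo_pipeline_pkg")]:
                del sys.modules[name]


# Relative import: fails when the file is loaded on its own.
print(load_standalone("from .demo_constants import ANSWER", add_parent_to_path=True))
# Absolute package import: works once the parent directory is importable.
print(load_standalone("from demo_pipeline_pkg import demo_constants", add_parent_to_path=True))
# Same absolute import without the parent directory: the package cannot be found.
print(load_standalone("from demo_pipeline_pkg import demo_constants", add_parent_to_path=False))
```

Absolute imports through the package name depend only on the parent directory being importable, which holds both when Dagit loads `local_repo.py` from the top-level folder and when the file is executed from inside the package with the working directory pointed one level up.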
m
@danil thanks for closing the loop on this. I am just getting my feet wet but have been reading through posts that relate to “structure” and this helped a ton.