I'm trying to add dagster to an existing git repos...
# dagster-plus
r
I'm trying to add dagster to an existing git repository in a subdirectory of the repository root. Given the following directory structure, how do I define my code location in dagster_cloud.yaml?
├── dagster
│   ├── example_project
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── assets.py
│   ├── example_project_tests
│   │   ├── __init__.py
│   │   └── test_assets.py
│   ├── pyproject.toml
│   ├── requirements.txt
│   ├── setup.py
├── dagster_cloud.yaml
├── some_other_project
│   ├── ...
├── doc
│   ├── ...
Please also critique any structural misunderstandings I may have in configuring this project. We do not plan to support multiple versions of Python within our implementation at this time.
#setup.py

from setuptools import find_packages, setup

setup(
    name="optios-dagster",
    packages=find_packages(exclude=["*_tests"]),
    install_requires=[
        "dagster",
        "dagster-cloud",
        "boto3",
        "pandas",
        "matplotlib",
    ],
    extras_require={"dev": ["dagit", "pytest", "requests-mock"]},
)
#pyproject.toml

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.dagster]
module_name = "example_project"
I can run dagster dev from the dagster subdirectory successfully. Ideally, I'd also like to configure dagster dev to run without parameters from the repository root, just to be consistent with production. That is not required, however (particularly if you tell me that's not how cloud works).
p
To set up your git repo to deploy locations to Dagster Cloud, you can use your dagster_cloud.yaml file to map code locations to the source directory you're trying to deploy: https://docs.dagster.io/concepts/code-locations#cloud-deployment
r
I'm struggling to understand what file Dagster Cloud needs as an entry point. Is it dagster/setup.py, dagster/example_project/__init__.py, or something else?
Is it incorrect to assume a single code location can dynamically discover definitions?
p
You can specify a specific file or a specific module (using the code_source attribute) for each location config, but I believe each location will by default look for a pyproject.toml file.
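As a rough illustration of the two code_source forms mentioned above, a dagster_cloud.yaml could look something like the sketch below. The location names and the definitions.py path are hypothetical, and it assumes the example_project package is importable from wherever the location is built; it is not taken from the poster's working setup.

```yaml
# dagster_cloud.yaml (illustrative sketch only; names are assumptions)
locations:
  # Form 1: load definitions from an importable Python module
  - location_name: example_project
    code_source:
      module_name: example_project

  # Form 2: load definitions from a single Python file
  # (definitions.py is a hypothetical file name)
  - location_name: example_project_from_file
    code_source:
      python_file: example_project/definitions.py
```

Either form only tells Dagster where to import definitions from; it does not by itself say where in the repository the project is built from.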
r
Perfect, thank you
Still can't get this to work. I believe the dagster_cloud.yaml is not correct, but I cannot find any resource that configures a subdirectory. I've been over https://docs.dagster.io/concepts/code-locations#cloud-deployment many times. The directory structure of our git repo is:
.
├── README.md
├── dagster                          <-- sub directory
    ├── README.md
    └── data_orchestration           <-- code location root
        ├── README.md
        ├── data_orchestration
        │   ├── __init__.py
        │   │   └── __init__.cpython-38.pyc
        │   └── assets
        │       ├── README.md
        │       ├── __init__.py
        │       │   └── __init__.cpython-38.pyc
        │       ├── example
        │       │   ├── README.md
        │       │   ├── __init__.py
        │       │   └── hackernews_assets.py
        │       └── trade
        ├── data_orchestration_tests
        │   ├── __init__.py
        │   └── example_tests
        │       └── test_example.py
        ├── orchestration_utils
        │   └── orchestration_utils
        ├── pyproject.toml
        ├── requirements.txt          <-- requirements for code location
        ├── setup.cfg
        └── setup.py                  <-- module definition
├── dagster_cloud.yaml
└── util
dagster_cloud.yaml
locations:
- location_name: data_orchestration
  code_source:
    working_directory: dagster
    module_name: data_orchestration.data_orchestration
The following, run from the root of the git repo, works great:
dagster dev -m data_orchestration.data_orchestration -d dagster
When executing the branch deploy GitHub Action (lifted from project_feature_complete), it reports:
Building Python executable for data_orchestration from directory /home/runner/work/neuroedge/neuroedge/source_root/. and Python 3.8.
I expect the path to be:
/home/runner/work/neuroedge/neuroedge/source_root/dagster/.
How can I configure dagster_cloud.yaml so the code location loads the requirements.txt file from a subdirectory (dagster) of the repo source?
p
Apologies for the confusion… code_source references where to find the definitions within the image that is deployed. The build property determines how that image is built and deployed, relative to the git repository. You might find this guide (specific to GitHub) to be helpful: https://docs.dagster.io/dagster-cloud/developing-testing/branch-deployments/using-branch-deployments-with-github#step-3-add-githu[…]-your-repository
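To make the code_source versus build distinction concrete, here is a minimal sketch for the directory layout posted earlier in the thread. It assumes the inner data_orchestration package is importable as data_orchestration once the project under dagster/data_orchestration is installed; that module path is an assumption, not something confirmed in this thread.

```yaml
# dagster_cloud.yaml at the repository root (sketch under stated assumptions)
locations:
  - location_name: data_orchestration
    code_source:
      # Where to import definitions from inside the deployed image
      # (assumed module path)
      module_name: data_orchestration
    build:
      # Directory that holds setup.py / requirements.txt,
      # relative to the git repository
      directory: ./dagster/data_orchestration
```

With this split, build.directory controls which requirements.txt is picked up at build time, while code_source only affects what gets imported afterwards.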
r
I thought I'd post a follow-up for anyone who comes across this. I'd recommend adding this detail to the documentation, as it was not intuitive. Dagster doesn't care where in the source control repository the project lives. The Dagster integration with GitHub does not pull the source code and inspect configuration. Rather, the automatic GitHub integration (from Dagster Cloud) creates GitHub workflows and actions in the GitHub repo that use Dagster tools to deploy Dagster code to the connected account. So GitHub workflows push the deployment to Dagster via the dagster-io/dagster-cloud-action GitHub Action (https://github.com/dagster-io/dagster-cloud-action) upon source control triggers defined in the GitHub workflows. It's essentially automating the deployment described for the Dagster CLI tool, using GitHub CI Actions.
To host Dagster in a sub-directory of a monorepo, simply place the Dagster project code, including dagster_cloud.yaml, in whatever subdirectory fits the directory structure strategy, and configure the GitHub Action to point to the config file in that directory. Specifically, configure dagster_cloud_file with the correct sub-directory path to the dagster_cloud.yaml file. For example, if the Dagster project root is in the sub-directory tools/dagster_orchestration, configure the following in both:
• .github/workflows/dagster_branch_deployments.yml
• .github/workflows/dagster_deploy.yml
dagster_cloud_file: "$GITHUB_WORKSPACE/project-repo/tools/dagster_orchestration/dagster_cloud.yaml"
Note that the "project-repo" sub-directory is established by the checkout task with:
- name: Checkout
       ...
          path: project-repo # <-- places source code in the sub-directory project-repo
Avoid naming a sub-directory dagster, as that may conflict with the dagster library package.
...and as luck would have it, I found the example I've been searching for, which clearly demonstrates what I've been trying to do through trial and error for several weeks. @Sean Lopp How can we get https://github.com/slopp/dagteam officially included with the Dagster documentation and examples? It's fabulous and excellent and would have saved me a ton of time.
s
Rusty, I'm both glad the example was helpful and saddened that it took so long to find. We are working on adding more examples of these cloud deployment strategies and will include them in our official docs/examples to make them more discoverable.
p
thanks for going through this @Rusty Zarse 🙂 saved me a few headaches