# ask-community

Oliver

07/06/2022, 1:02 AM
Hi all, I have large dependencies installed in my execution environment that I don't want to carry around in the user code deployment (e.g. torch), but at the moment, when I try to load the user code deployment without those dependencies, I get a crash. Has anyone else had a similar use case? Any ideas for approaches I could use to solve this?

prha

07/06/2022, 1:41 AM
Do you have a trace of what that crash looks like? What run launcher are you using? The user code deployment has to be able to evaluate schedules/sensors, and load job code at definition time in order for dagit to load and for the daemon to run. However, the bodies of individual assets/ops aren’t actually executed until a run is launched. Depending on what run launcher you’re using, you might be able to defer the loading of torch until you actually need it.
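A minimal sketch of that load-time vs. run-time distinction (assuming a 0.15-era Dagster API; the job and schedule names here are illustrative): the job and schedule objects are constructed when dagit or the daemon load the repository, but the asset body, and therefore the torch import inside it, only runs once a run is launched.
Copy code
from dagster import ScheduleDefinition, asset, define_asset_job, repository

@asset
def torch_asset():
    # heavy import deferred: it only happens when a run materializes this asset
    import torch
    return torch.randn(10, 10)

# constructed at repository load time (dagit / daemon), without executing any asset body
all_assets_job = define_asset_job("all_assets_job")
daily_schedule = ScheduleDefinition(job=all_assets_job, cron_schedule="0 0 * * *")

@repository
def repo():
    return [torch_asset, all_assets_job, daily_schedule]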

Oliver

07/06/2022, 2:13 AM
Not my actual code but should be fine as an example.
Copy code
from dagster import asset, repository
import torch

@asset
def torch_asset():
    return torch.randn(10, 10)

@repository
def repo():
    return [torch_asset]
The crash is just saying module not found because I didn't include torch in the image. I guess one option would be to import within the solid, but I don't like that as a general solution, e.g.
Copy code
from dagster import asset, repository

@asset
def torch_asset():
    # deferred import: torch is only needed once the asset body actually runs
    import torch
    return torch.randn(10, 10)

@repository
def repo():
    return [torch_asset]
My actual folder structure looks like this. All the files under sepsis.data are dagster assets; train.py is the one that has the heavy imports. sepsis.experiment is where all the business logic for the train asset lives, and there is some minor setup code in the train asset itself. sepsis.data.__init__ is where the repositories are defined, and the workspace yaml is pointed at the sepsis.data module.
Copy code
sepsis
├── __init__.py
├── config.yaml
├── data
│   ├── __init__.py
│   ├── classification_dataset.py
│   ├── cohort.py
│   ├── dev.yaml
│   ├── diagnoses.py
│   ├── executor_mapping.yaml
│   ├── normalisation_constants.py
│   ├── ray_dev.yaml
│   ├── resources.py
│   ├── staging.yaml
│   ├── train.py
│   ├── utils.py
│   └── vitals.py
└── experiment
    ├── __init__.py
    ├── architecture
    │   ├── __init__.py
    │   ├── basic_ff.py
    │   ├── basic_ff.yaml
    │   ├── components
    │   │   ├── __init__.py
    │   │   ├── conv_1d_res.py
    │   │   ├── fourier_features.py
    │   │   ├── positional_embeddings.py
    │   │   ├── receptive_field.py
    │   ├── linear.py
    │   ├── linear_res.yaml
    │   ├── resnet1d.py
    │   ├── resnet1d.yaml
    │   ├── rnn.py
    │   ├── rnn.yaml
    │   ├── wavenet.py
    │   └── wavenet.yaml
    ├── dataset
    │   ├── __init__.py
    │   ├── mnist.py
    │   ├── mnist.yaml
    │   ├── time_series_lmdb.py
    │   └── time_series_lmdb.yaml
    ├── ff_sepsis_classifier.yaml
    └── model
        ├── __init__.py
        ├── binary_classifier.py
        ├── binary_classifier.yaml
        ├── classifier.py
        └── classifier.yaml
I am using the k8s run launcher and a custom executor (https://github.com/dagster-io/dagster/issues/2830#issuecomment-1165156021).

prha

07/06/2022, 2:29 AM
Yeah, we have to be able to load the repository (with all of its defined assets) in order to get the shape of the asset dependencies. This means that it will hit the import statement, which requires the package to be there, regardless of whether the body of the asset is evaluated or not. Deferring the import into the body of the asset is the main way I can think of to do that.
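One way to make the deferred import feel less ad hoc is to centralize it in a small lazy-import helper, so individual asset bodies don't each need their own local import statement. A sketch of that idea (not an official Dagster pattern; the helper name is made up):
Copy code
import importlib
from functools import lru_cache

from dagster import asset

@lru_cache(maxsize=None)
def _torch():
    # resolved (and cached) the first time an asset body needs it,
    # not when dagit or the daemon load the repository
    return importlib.import_module("torch")

@asset
def torch_asset():
    return _torch().randn(10, 10)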

Oliver

07/06/2022, 2:47 AM
Hm ok, yeah, I think I am framing this wrong. I guess another potential solution would be to base my user code image on the execution env image, so that only the lighter dependencies would need to be built/pushed. -- The overarching goal I am trying to achieve here is for users to be able to run pipelines locally as if they were on prod (maybe more like staging?) infrastructure. The Ray executor goes a long way towards solving this by allowing users to run dagit locally and connect to a Ray cluster in a cloud environment, but if dagit is local and Ray is remote then I lose dagit UI updates, since the Ray cluster doesn't have access to the local sqlite db.