#ask-community

Akira Renbokoji  02/29/2024, 6:10 PM
Hello, I'm trying to replace `@repository` with `Definitions` to future-proof against `@repository` being deprecated some day. Well, really I guess it's because I can't get hooks and resource_defs to work with `define_asset_job` and `@repository`, but I can get them to work with `Definitions`. Can someone check my understanding of the transition? Currently I have a `repository.py` file with multiple `@repository` functions.
Copy code
@repository
def foo():
    ...

@repository
def bar():
    ...
workspace.yaml
Copy code
load_from:
  - grpc_server:
      host: grpc_host
      port: grpc_port
      location_name: "grpc_server"
Unfortunately for me, `Definitions` and `@repository` can't coexist, so I have to replace `@repository`. It's my understanding that I shouldn't have multiple `Definitions` in one file; instead I should split them up into different folders and declare the `Definitions` in the `__init__.py` file, which is great.
foo_project/__init__.py
Copy code
definitions = Definitions(
    ...
)
bar_project/__init__.py
Copy code
definitions = Definitions(
    ...
)
workspace.yaml
Copy code
load_from:
  - grpc_server:
      host: grpc_host
      port: grpc_port
      location_name: "grpc_server"
How do I point to the multiple Definitions with a gRPC server? With `@repository` I have a file with multiple repository functions. Can I do the same with `Definitions`?
Copy code
foo_definition = Definitions(
    ...
)

bar_definition = Definitions(
    ...
)
I would like to be able to pass the same resource to various `Definitions` from a top-level file instead of declaring it in each `__init__.py`
Copy code
Cannot have more than one Definitions object defined at module scope	

Dagster found multiple Definitions objects in a single Python module. 

Only one Definitions object may be in a single code location.
Ok, so I have to set up multiple code locations?

Zach  02/29/2024, 7:48 PM
Yeah, Definitions are a little different from repositories in that they specifically define a user code location. So you either have to combine your repositories into a single Definitions object, or set up multiple code locations

Akira Renbokoji  02/29/2024, 7:48 PM
oh, I can combine my repositories into a single Definition!?

Zach  02/29/2024, 7:48 PM
A bit of a shame, I found the repository concept really helpful for organizing jobs within a code location
Yeah why not?

Akira Renbokoji  02/29/2024, 7:49 PM
hmmm, wait I don't think I grasp the idea yet
i have multiple repositories with different assets
do I just assign it to Definitions as a list?
Copy code
defs = Definitions(
    assets=[foo_asset, bar_asset]
)

Zach  02/29/2024, 7:49 PM
Yup, `Definitions(assets=[asset1, asset2, asset3])`

Akira Renbokoji  02/29/2024, 7:50 PM
and jobs the same way?
will asset 1 and 2 be available to job 3?

Zach  02/29/2024, 7:50 PM
Yes

Akira Renbokoji  02/29/2024, 7:50 PM
oh ok, so everything is just available to all
so same with resources.
Thank you. That helps.

Zach  02/29/2024, 7:51 PM
Yup, things within the same Definitions object can all access each other

Akira Renbokoji  02/29/2024, 7:52 PM
Do you think `@repository` will eventually be deprecated?

Zach  02/29/2024, 7:55 PM
Yes, but I think it will be a while
I'm still using `@repository` because I don't think the deprecation will come anytime soon, like possibly not within the next 12 months

Akira Renbokoji  02/29/2024, 8:10 PM
Do you know how to go about setting up different code locations with gRPC? Would I have to set up multiple gRPC servers?
I'm not sure if I should combine my 20+ repositories into 1 Definition.

Zach  02/29/2024, 8:31 PM
It very much depends on how you have Dagster deployed. But I don't think there are any technical reasons why you can't combine all your repositories into one Definitions object. If you only have one user code location it'll be the same as your repository setup, except the jobs unfortunately won't be in subfolders
If you want to have multiple Definitions, you'll need to have multiple code location servers. How that happens is highly dependent on how Dagster is deployed
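For the multiple-code-location route, the workspace.yaml would simply list one entry per code location server. A rough sketch, with invented hosts, ports, and location names:

```yaml
load_from:
  - grpc_server:
      host: foo_grpc_host
      port: 4266
      location_name: "foo_location"
  - grpc_server:
      host: bar_grpc_host
      port: 4267
      location_name: "bar_location"
```

Each entry points at its own running gRPC server, which is why this route means running one server per code location.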

Akira Renbokoji  02/29/2024, 9:52 PM
So I have different Slack groups being alerted with `@repository`. If I make a single Definitions object with all my repositories, then I will end up with only one Slack group, right?
The gRPC server is in a Docker container. When it launches it runs `repository.py`:
Copy code
# repository.py

import project_foo
import project_bar

@repository
def foo():
    return [
        project_foo.to_job(
            ...,
            resource_defs={"foo_slack": foo_slack},
        )
    ]

@repository
def bar():
    return [
        project_bar.to_job(
            ...,
            resource_defs={"bar_slack": bar_slack},
        )
    ]
yea, I think I have to go the multiple code location route but I don't know how to set up gRPC for it.
Maybe I can do something like: gRPC launches definitions.py
Copy code
# definitions.py

import project_foo
import project_bar

defs = Definitions(
    assets=[project_foo.assets()],
    jobs=[project_foo.jobs(), project_bar.jobs()],
)

Zach  02/29/2024, 10:03 PM
If the resource keys are different, then you can have different slack resources for different jobs in the same definitions object

Akira Renbokoji  02/29/2024, 10:03 PM
hmm, how do I associate each resource def with a specific job?
I was under the impression that each resource is available to all jobs. Oh maybe, I just name them.

Zach  02/29/2024, 10:04 PM
where is "bar_slack" being referenced?
Or another way to ask would be how are you referencing resources in your jobs?

Akira Renbokoji  02/29/2024, 10:05 PM
sorry, it looks like bar_slack is a local variable for that `@repository` function
so we can replace: foo_slack = ["@foo-stakeholder"] bar_slack = ["@bar-stakeholder"]

Zach  02/29/2024, 10:06 PM
I'm not sure what you're referencing there

Akira Renbokoji  02/29/2024, 10:07 PM
It's the names that the Slack hook should use when alerting (success/failure).
Copy code
definitions = Definitions(
    assets=[foo_asset],
    jobs=[define_asset_job(name="foo_assets_materialization", hooks={slack_message_on_failure})],
    resources={
        "foo": foo_config,
        "slack": foo_slack_credentials(),
        "slack_names": "@foo-stakeholder",
    },
)

Zach  02/29/2024, 10:09 PM
I'm not super familiar with the slack failure hook, it's unclear to me how it uses resources but maybe that'll work

Akira Renbokoji  02/29/2024, 10:10 PM
Now I want to add my:
Copy code
@repository
def bar():
    run = bar_graph.to_job(
        ...,
        resource_defs={
            "slack_names": "@bar-stakeholder",
        },
    )
    return [run]
so with the way I set up Definitions, I can only see both `@foo-stakeholder` and `@bar-stakeholder` getting alerted.
I guess, it's not a problem if I can set up the hooks somehow at the job level
I could just have different hooks created. {slack_message_on_failure} would instead be {slack_foo_stakeholder_on_failure} or something

Zach  02/29/2024, 10:12 PM
Is slack_names the required resource key for the `slack_on_failure` hook? It also looks like you could just set hooks at the job level like this:
Copy code
@slack_on_failure("#foo", webserver_base_url="http://localhost:3000")
@job(...)
def my_job():
    pass
I'm struggling to see how "slack_names" works with the built-in `slack_on_failure` hook, but I get the feeling there's a lot I'm missing about your code

Akira Renbokoji  02/29/2024, 10:13 PM
yep, but the job level part helps a lot since I was thinking that
I just can't imagine this not looking insane
having 50+ jobs in a definition

Zach  02/29/2024, 10:14 PM
I don't really see the problem. You already had 50+ jobs in your code location. You'll still have 50+ jobs in your code location

Akira Renbokoji  02/29/2024, 10:14 PM
maybe it's not so bad if I'm just passing in functions to the definition

Zach  02/29/2024, 10:14 PM
But if you want to split it up, split it up

Akira Renbokoji  02/29/2024, 10:14 PM
well, it's still split up into different functions with `@repository`
Copy code
@repository
def a():
    ...
    return [a]

...

@repository
def z():
    return [z]
Copy code
@job
def a():
    ...

@job
def z():
    ...

defs = Definitions(
    assets=[a_asset, ... z_asset],
    jobs=[a, ... z],
)
I guess it'll be the same thing.

Zach  02/29/2024, 10:17 PM
yup

Akira Renbokoji  02/29/2024, 10:17 PM
Thank you. This should be fun. 😅

Zach  02/29/2024, 10:18 PM
Good luck! I'm happy to help try to walk you through doing multiple code locations if you feel you want to go that route too!

Akira Renbokoji  02/29/2024, 10:18 PM
I feel like multiple code locations is more of a "best practice" but I'm not sure.
I am curious as to how it works even if it's not.

Zach  02/29/2024, 10:19 PM
That just carries a lot more unnecessary overhead if you don't have requirements around separated environments or separated projects
How is your Dagster instance deployed right now? Like I've said a couple times, it is very dependent on that

Akira Renbokoji  02/29/2024, 10:19 PM
Oh ok, so if I just want something simple then single definition would work for me
I think dagit, daemon, and grpc all live in their own docker containers

Zach  02/29/2024, 10:19 PM
yes. the only reasons to separate code locations are if you want separated environments or isolated projects

Akira Renbokoji  02/29/2024, 10:20 PM
ah ok, I don't think i'm looking for that yet
I just want to be as organized as I can be
the multiple-folder setup sounded cool but I think I can still attempt to do that
well, I guess the definitions being in the `__init__.py` file was what I was attracted to lol
I don't think I can do that but it's ok

Zach  02/29/2024, 10:21 PM
Why can't you put the definitions in the init.py file?

Akira Renbokoji  02/29/2024, 10:22 PM
oh can I!?
since I have a single definition that gRPC is calling, I figured I couldn't really import projects with their own definitions
Copy code
grpc -> definitions.py
Copy code
# definitions.py

@job
def a():
    ...

@job
def b():
    ...

defs = Definitions(
    ...,
    jobs=[a, b]
)
but if I have the a and b projects in folders with their own definitions..
Copy code
# definitions.py

import a
import b
?

Zach  02/29/2024, 10:24 PM
you can do
Copy code
# __init__.py
from project_a import a
from project_b import b

defs = Definitions(jobs=[a, b])

Akira Renbokoji  02/29/2024, 10:25 PM
what would `definitions.py` look like, the file that grpc is calling?

Zach  02/29/2024, 10:26 PM
You can point the GRPC server to your init.py file. Basically you can put your Definitions object in any file you'd like, and just point the GRPC server to that file in the workspace.yaml

Akira Renbokoji  02/29/2024, 10:27 PM
we will still have a single Definitions() object?

Zach  02/29/2024, 10:27 PM
If you wanted to put it in a init.py file you can do something like this in your workspace.yaml:
Copy code
load_from:
  - python_module: path.to.package
if the init file was at /path/to/package/__init__.py
Yes one Definitions object, just like in the example I showed above

Akira Renbokoji  02/29/2024, 10:27 PM
sorry, I misunderstood. I thought we were setting up a unique Definitions() object in each projects init file.

Zach  02/29/2024, 10:27 PM
If you're doing multiple code locations, yes
Sorry I think I confused you because I didn't quite understand
"what would `definitions.py` look like, the file that grpc is calling?"
you were asking about what the definitions.py file would look like in a multi code location setup
which I misunderstood

Akira Renbokoji  02/29/2024, 10:29 PM
hmm, let me see if I can find the command. It's when we launch gRPC that we pass a file or module as an argument

Zach  02/29/2024, 10:29 PM
Often when you have multiple code locations they're in separate git repos entirely

Akira Renbokoji  02/29/2024, 10:29 PM
I'm passing a file with a bunch of `@repository` in it
yah, I don't think multiple code locations is for me at this time
I'll keep it in my back pocket for the day I do need it
Copy code
dagster api grpc --python-file definitions.py --host 0.0.0.0 --port 4266
Copy code
# definitions.py

from project_a import a
from project_b import b

defs = Definitions(
    ...
    jobs=[a, b]
)
Copy code
# project_a.py

@job
def a():
    ...

Zach  02/29/2024, 10:34 PM
That looks like it would work

Akira Renbokoji  02/29/2024, 10:36 PM
I think so, I'm just wary about `@op` and `@graph` now. Mainly, I don't remember the difference between `@op` and `@job`. I'm going to take a look at the docs now to clear up my confusion. I think my understanding was that `@graph` has a collection of `@job` and `@op` is a function in `@job`.
So I guess my question should be changed to: would Definitions() work with `@graph`?
Ok, a graph contains ops

Zach  02/29/2024, 10:38 PM
graphs, jobs, ops all stay the same between `@repository` vs. `Definitions`

Akira Renbokoji  02/29/2024, 10:38 PM
ah ok, so I can use `.to_job` and pass in the graph as a job to Definitions
I think I have everything I need to do this.

Zach  02/29/2024, 10:39 PM
You got this!

Akira Renbokoji  02/29/2024, 10:39 PM
Thank you!