Hello, I'm trying to replace `@repository` with `D...
# ask-community
a
Hello, I'm trying to replace `@repository` with `Definitions` to future-proof against `@repository` being deprecated some day. Well, really I guess it's because I can't get hooks and resource_defs to work with `define_asset_job` and `@repository`, but I can get them to work with `Definitions`. Can someone check my understanding of the transition? Currently I have a `repository.py` file with multiple `@repository` functions.
```python
@repository
def foo():
    ...

@repository
def bar():
    ...
```
`workspace.yaml`
```yaml
load_from:
  - grpc_server:
      host: grpc_host
      port: grpc_port
      location_name: "grpc_server"
```
Unfortunately for me, `Definitions` and `@repository` can't coexist, so I have to replace `@repository`. It's my understanding that I shouldn't have multiple `Definitions` in one file; instead I should split them up into different folders and declare the `Definitions` in the `__init__.py` file, which is great.
`foo_project/__init__.py`
```python
definitions = Definitions(
    ...
)
```
`bar_project/__init__.py`
```python
definitions = Definitions(
    ...
)
```
`workspace.yaml`
```yaml
load_from:
  - grpc_server:
      host: grpc_host
      port: grpc_port
      location_name: "grpc_server"
```
How do I point to the multiple Definitions with a gRPC server? With `@repository` I have a file with multiple repository functions. Can I do the same with `Definitions`?
```python
foo_definition = Definitions(
    ...
)

bar_definition = Definitions(
    ...
)
```
I would like to be able to pass the same resource to various Definitions from a top-level file instead of declaring it in each `__init__.py`.
```
Cannot have more than one Definitions object defined at module scope

Dagster found multiple Definitions objects in a single Python module.

Only one Definitions object may be in a single code location.
```
Ok, so I have to set up multiple code locations?
z
Yeah, Definitions are a little different from repositories in that they specifically define a user code location. So you either have to combine your repositories into a single Definitions object, or set up multiple code locations
a
oh, I can combine my repositories into a single Definition!?
z
A bit of a shame, I found the repository concept really helpful for organizing jobs within a code location
Yeah why not?
a
hmmm, wait I don't think I grasp the idea yet. I have multiple repositories with different assets. Do I just assign them to Definitions as a list?
```python
defs = Definitions(
    assets=[foo_asset, bar_asset]
)
```
z
Yup, `Definitions(assets=[asset1, asset2, asset3])`
a
and jobs the same way?
will asset 1 and 2 be available to job 3?
z
Yes
a
oh ok, so everything is just available to all
so same with resources.
Thank you. That helps.
🎉 1
z
Yup, things within the same Definitions object can all access each other
a
Do you think `@repository` will eventually be deprecated?
z
Yes, but I think it will be a while
I'm still using `@repository` because I don't think the deprecation will come anytime soon, like possibly not within the next 12 months
a
Do you know how to go about setting up different code locations with GRPC? Would I have to set up multiple GRPC servers?
I'm not sure if I should combine my 20+ repositories into 1 Definition.
z
It very much depends on how you have Dagster deployed. But I don't think there's any technical reasons why you can't combine all your repositories into 1 Definition. If you only have one user code location it'll be the same as your repository set up, except the jobs unfortunately won't be in sub folders
If you want to have multiple Definitions, you'll need to have multiple code location servers. How that happens is highly dependent on how Dagster is deployed
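In the grpc_server deployment style from earlier in the thread, multiple code locations would mean one entry per Definitions object in workspace.yaml, something like this (hosts, ports, and location names here are made up):

```yaml
load_from:
  - grpc_server:
      host: foo_grpc_host        # serves foo_project's Definitions
      port: 4266
      location_name: "foo_location"
  - grpc_server:
      host: bar_grpc_host        # serves bar_project's Definitions
      port: 4267
      location_name: "bar_location"
```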
a
So I have different Slack groups being alerted with `@repository`. If I make a single Definitions object with all my repositories, then I will end up with only one Slack group, right?
The gRPC server is in a Docker container. When it launches it runs `repository.py`:
```python
# repository.py

import project_foo
import project_bar

@repository
def foo():
    return [
        project_foo.to_job(
            ...,
            resource_defs={"foo_slack": foo_slack},
        )
    ]

@repository
def bar():
    return [
        project_bar.to_job(
            ...,
            resource_defs={"bar_slack": bar_slack},
        )
    ]
```
yea, I think I have to go the multiple code location route but I don't know how to set up gRPC for it.
Maybe I can do something like: gRPC launches `definitions.py`
```python
# definitions.py

import project_foo
import project_bar

defs = Definitions(
    assets=[project_foo.assets()],
    jobs=[project_foo.jobs(), project_bar.jobs()],
)
```
z
If the resource keys are different, then you can have different slack resources for different jobs in the same definitions object
a
hmm, how do I associate each resource def with a specific job?
I was under the impression that each resource is available to all jobs. Oh maybe, I just name them.
z
where is "bar_slack" being referenced?
Or another way to ask would be how are you referencing resources in your jobs?
a
sorry, it looks like bar_slack is a local variable for that
@repository
function
so we can replace: `foo_slack = ["@foo-stakeholder"]` and `bar_slack = ["@bar-stakeholder"]`
z
I'm not sure what you're referencing there
a
It's the names that the Slack hook should use when alerting (success/failure).
```python
definitions = Definitions(
    assets=[foo_asset],
    jobs=[define_asset_job(name="foo_assets_materialization", hooks={slack_message_on_failure})],
    resources={
        "foo": foo_config,
        "slack": foo_slack_credentials(),
        "slack_names": "@foo-stakeholder",
    },
)
```
z
I'm not super familiar with the slack failure hook, it's unclear to me how it uses resources but maybe that'll work
a
Now I want to add my:
```python
@repository
def bar():
    run = bar_graph.to_job(
        ...,
        resource_defs={
            "slack_names": "@bar-stakeholder"
        }
    )
    return [run]
```
so with the way I set up Definitions, I can only see both `@foo-stakeholder` and `@bar-stakeholder` getting alerted.
I guess it's not a problem if I can set up the hooks somehow at the job level.
I could just have different hooks created: `{slack_message_on_failure}` would instead be `{slack_foo_stakeholder_on_failure}` or something
z
Is slack_names the required resource key for the `slack_on_failure` hook? It also looks like you could just set hooks at the job level like this:
```python
@slack_on_failure("#foo", webserver_base_url="http://localhost:3000")
@job(...)
def my_job():
    pass
```
I'm struggling to see how "slack_names" works with the built-in `slack_on_failure` hook, but I get the feeling there's a lot I'm missing about your code
a
yep, but the job level part helps a lot since I was thinking that
I just can't imagine this not looking insane
having 50+ jobs in a definition
z
I don't really see the problem. You already had 50+ jobs in your code location. You'll still have 50+ jobs in your code location
a
maybe it's not so bad if I'm just passing in functions to the definition
z
But if you want to split it up, split it up
a
well, it's still split up into different functions with `@repository`:
```python
@repository
def a():
    ...
    return [a]

...

@repository
def z():
    return [z]
```
```python
@job
def a():
    ...

@job
def z():
    ...

defs = Definitions(
    assets=[a_asset, ..., z_asset],
    jobs=[a, ..., z],
)
```
I guess it'll be the same thing.
z
yup
a
Thank you. This should be fun. 😅
z
Good luck! I'm happy to help try to walk you through doing multiple code locations if you feel you want to go that route too!
a
I feel like multiple code locations is more of a "best practice" but i'm not sure.
I am curious as to how it works even if it's not.
z
That just carries a lot more unnecessary overhead if you don't have requirements around separated environments or separated projects
How is your Dagster instance deployed right now? Like I've said a couple times, it is very dependent on that
a
Oh ok, so if I just want something simple then single definition would work for me
I think dagit, daemon, and grpc all live in their own docker containers
z
yes. the only reasons to separate code locations are if you want separated environments or isolated projects
a
ah ok, I don't think i'm looking for that yet
I just want to be as organized as I can be
the multiple-folder setup sounded cool but I think I can still attempt that
well, I guess the definitions being in the `__init__.py` file was what I was attracted to lol
I don't think I can do that but it's ok
z
Why can't you put the definitions in the init.py file?
a
oh can i!?
since I have a single definition that gRPC is calling, I couldn't really import projects with their own definitions
```
grpc -> definitions.py
```
```python
# definitions.py

@job
def a():
    ...

@job
def b():
    ...

defs = Definitions(
    ...,
    jobs=[a, b]
)
```
but if I have a and b projects in folders with their own definitions..
```python
# definitions.py

import a
import b
```
?
z
you can do
```python
# __init__.py
from project_a import a
from project_b import b

defs = Definitions(jobs=[a, b])
```
a
what would `definitions.py` look like, the file that grpc is calling?
z
You can point the GRPC server to your init.py file. Basically you can put your Definitions object in any file you'd like, and just point the GRPC server to that file in the workspace.yaml
a
we will still have a single Definitions() object?
z
If you wanted to put it in an `__init__.py` file you can do something like this in your workspace.yaml:
```yaml
load_from:
  - python_module: path.to.package
```
if the init file was at `/path/to/package/__init__.py`
Yes, one Definitions object, just like in the example I showed above
a
sorry, I misunderstood. I thought we were setting up a unique Definitions() object in each projects init file.
z
If you're doing multiple code locations, yes
Sorry I think I confused you because I didn't quite understand
> what would `definitions.py` look like, the file that grpc is calling?
you were asking about what the definitions.py file would look like in a multi code location setup
which I misunderstood
a
hmm, let me see if i can find the command. it's when we launch grpc we pass a file or module as an argument
z
Often when you have multiple code locations they're in separate git repos entirely
a
I'm passing a file with a bunch of `@repository` in it
yah, i dont think multiple code location is for me at this time
i'll keep it in my back pocket for the day I do need it
🎉 1
```shell
dagster api grpc --python-file definitions.py --host 0.0.0.0 --port 4266
```
```python
# definitions.py

from project_a import a
from project_b import b

defs = Definitions(
    ...,
    jobs=[a, b]
)
```
```python
# project_a.py

@job
def a():
    ...
```
z
That looks like it would work
a
I think so, I'm just wary about `@op` and `@graph` now. Mainly, I don't remember the difference between `@op` and `@job`. I'm going to take a look at the docs now to clear up my confusion. I think my understanding was that `@graph` has a collection of `@job`, and `@op` is a function in a `@job`. So I guess my question should be changed to: would Definitions() work with `@graph`?
Ok, a graph contains ops
z
graphs, jobs, ops all stay the same between `@repository` vs. `Definitions`
🙌 1
a
ah ok, so i can use `.to_job` and pass in the graph as a job to Definitions
and pass in the graph as a job to Definitions
I think I have everything I need to do this.
🎉 1
z
You got this!
a
Thank you!