Akira Renbokoji
02/29/2024, 6:10 PM
I'm replacing @repository with Definitions to future proof against @repository being deprecated some day.
Well, really I guess it's because I can't get hooks and resource_defs to work with define_asset_job and @repository, but I can get it to work with Definitions.
Can someone check my understanding on the transition?
Currently I have a repository.py file with multiple @repository functions.
@repository
def foo():
    ...
@repository
def bar():
    ...
workspace.yaml
load_from:
  - grpc_server:
      host: grpc_host
      port: grpc_port
      location_name: "grpc_server"
Unfortunately for me, Definitions and @repository can't coexist so I have to replace @repository.
It's my understanding that I shouldn't have multiple Definitions in one file; instead I should split them up into different folders and declare the Definitions in the __init__.py file, which is great.
foo_project/__init__.py
definitions = Definitions(
    ...
)
bar_project/__init__.py
definitions = Definitions(
    ...
)
workspace.yaml
load_from:
  - grpc_server:
      host: grpc_host
      port: grpc_port
      location_name: "grpc_server"
How do I point to the multiple Definitions with a GRPC server?
With @repository I have a file with multiple repository functions. Can I do the same with Definitions?
foo_definition = Definitions(
    ...
)
bar_definition = Definitions(
    ...
)
I would like to be able to pass the same resource to various Definitions from a top file instead of declaring it in each __init__.py
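On the gRPC question above: as far as I understand it, one `dagster api grpc` process serves one code location, so two Definitions objects would normally mean two servers and two `grpc_server` entries in workspace.yaml. Below is a minimal sketch that renders such a file. The location names and the second port (4267) are made up for illustration, and `render_workspace` is a hypothetical helper, not part of Dagster.

```python
def render_workspace(locations):
    """Render a workspace.yaml body with one grpc_server entry per code location.

    `locations` is a list of (location_name, host, port) tuples.
    """
    lines = ["load_from:"]
    for name, host, port in locations:
        lines += [
            "  - grpc_server:",
            f"      host: {host}",
            f"      port: {port}",
            f'      location_name: "{name}"',
        ]
    return "\n".join(lines) + "\n"


# Assumed layout: one gRPC server per project, each on its own port.
locations = [
    ("foo_project", "grpc_host", 4266),
    ("bar_project", "grpc_host", 4267),
]
print(render_workspace(locations))
```

Each entry would then point at a separately started server, one for foo_project and one for bar_project.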
Akira Renbokoji
02/29/2024, 6:40 PM
Cannot have more than one Definitions object defined at module scope
Dagster found multiple Definitions objects in a single Python module.
Only one Definitions object may be in a single code location.
Akira Renbokoji
02/29/2024, 6:40 PM
Zach
02/29/2024, 7:48 PM
Akira Renbokoji
02/29/2024, 7:48 PM
Zach
02/29/2024, 7:48 PM
Zach
02/29/2024, 7:49 PM
Akira Renbokoji
02/29/2024, 7:49 PM
Akira Renbokoji
02/29/2024, 7:49 PM
Akira Renbokoji
02/29/2024, 7:49 PM
Akira Renbokoji
02/29/2024, 7:49 PM
defs = Definitions(
    assets=[foo_asset, bar_asset]
)
Zach
02/29/2024, 7:49 PM
Definitions(assets=[asset1, asset2, asset3])
Akira Renbokoji
02/29/2024, 7:50 PM
Akira Renbokoji
02/29/2024, 7:50 PM
Zach
02/29/2024, 7:50 PM
Akira Renbokoji
02/29/2024, 7:50 PM
Akira Renbokoji
02/29/2024, 7:51 PM
Akira Renbokoji
02/29/2024, 7:51 PM
Zach
02/29/2024, 7:51 PM
Akira Renbokoji
02/29/2024, 7:52 PM
@repository will eventually be deprecated?
Zach
02/29/2024, 7:55 PM
Zach
02/29/2024, 7:55 PM
@repository because I don't think the deprecation will come anytime soon, like possibly not within the next 12 months
Akira Renbokoji
02/29/2024, 8:10 PM
Akira Renbokoji
02/29/2024, 8:10 PM
Zach
02/29/2024, 8:31 PM
Zach
02/29/2024, 8:31 PM
Akira Renbokoji
02/29/2024, 9:52 PM
@repository.
If I make a single Definitions with all my repositories, then I will end up with only one Slack group, right?
Akira Renbokoji
02/29/2024, 10:01 PM
repository.py
import project_foo
import project_bar
@repository
def foo():
    project_foo.to_job(
        ...
        resource_defs={
            "foo_slack": foo_slack
        }
    )
@repository
def bar():
    project_bar.to_job(
        ...
        resource_defs={
            "bar_slack": bar_slack
        }
    )
yea, I think I have to go the multiple code location route but I don't know how to set up GRPC for it.
Akira Renbokoji
02/29/2024, 10:03 PM
definitions.py
import project_foo
import project_bar
defs = Definitions(
    assets=[project_foo.assets()],
    jobs=[project_foo.jobs(), project_bar.jobs()]
)
Zach
02/29/2024, 10:03 PM
Akira Renbokoji
02/29/2024, 10:03 PM
Akira Renbokoji
02/29/2024, 10:04 PM
Zach
02/29/2024, 10:04 PM
Zach
02/29/2024, 10:04 PM
Akira Renbokoji
02/29/2024, 10:05 PM
@repository function
Akira Renbokoji
02/29/2024, 10:06 PM
Zach
02/29/2024, 10:06 PM
Akira Renbokoji
02/29/2024, 10:07 PM
Akira Renbokoji
02/29/2024, 10:08 PM
definitions = Definitions(
    assets=foo_asset,
    jobs=[define_asset_job(name="foo_assets_materialization", hooks={slack_message_on_failure})],
    resources={
        "foo": foo_config,
        'slack': foo_slack_credentials(),
        'slack_names': "@foo-stakeholder"
    },
)
Zach
02/29/2024, 10:09 PM
Akira Renbokoji
02/29/2024, 10:10 PM
@repository
def bar():
    run = bar_graph.to_job(
        ...
        resource_defs={
            'slack_names' : "@bar-stakeholder"
        }
    )
    return [run]
Akira Renbokoji
02/29/2024, 10:11 PM
@foo-stakeholder and @bar-stakeholder getting alerted.
Akira Renbokoji
02/29/2024, 10:11 PM
Akira Renbokoji
02/29/2024, 10:12 PM
Zach
02/29/2024, 10:12 PM
slack_on_failure hook? It also looks like you could just set hooks at a job level like this:
@slack_on_failure("#foo", webserver_base_url="http://localhost:3000")
@job(...)
def my_job():
    pass
Zach
02/29/2024, 10:13 PM
slack_on_failure hook but I get the feeling there's a lot I'm missing about your code
Akira Renbokoji
02/29/2024, 10:13 PM
Akira Renbokoji
02/29/2024, 10:13 PM
Akira Renbokoji
02/29/2024, 10:14 PM
Zach
02/29/2024, 10:14 PM
Akira Renbokoji
02/29/2024, 10:14 PM
Zach
02/29/2024, 10:14 PM
Akira Renbokoji
02/29/2024, 10:14 PM
@repository
Akira Renbokoji
02/29/2024, 10:17 PM
@repository
def a():
    ...
    return [a]
...
@repository
def z():
    return [z]
@job
def a():
    ...
@job
def z():
    ...
defs = Definitions(
    assets=[a_asset, ... z_asset],
    jobs=[a, ... z],
)
Akira Renbokoji
02/29/2024, 10:17 PM
Zach
02/29/2024, 10:17 PM
Akira Renbokoji
02/29/2024, 10:17 PM
Zach
02/29/2024, 10:18 PM
Akira Renbokoji
02/29/2024, 10:18 PM
Akira Renbokoji
02/29/2024, 10:18 PM
Zach
02/29/2024, 10:19 PM
Zach
02/29/2024, 10:19 PM
Akira Renbokoji
02/29/2024, 10:19 PM
Akira Renbokoji
02/29/2024, 10:19 PM
Zach
02/29/2024, 10:19 PM
Akira Renbokoji
02/29/2024, 10:20 PM
Akira Renbokoji
02/29/2024, 10:20 PM
Akira Renbokoji
02/29/2024, 10:20 PM
Akira Renbokoji
02/29/2024, 10:21 PM
Akira Renbokoji
02/29/2024, 10:21 PM
Zach
02/29/2024, 10:21 PM
Akira Renbokoji
02/29/2024, 10:22 PM
Akira Renbokoji
02/29/2024, 10:22 PM
Akira Renbokoji
02/29/2024, 10:24 PM
grpc -> definitions.py
definitions.py
@job
def a():
    ...
@job
def b():
    ...
defs = Definitions(
    ...
    jobs=[a,b]
)
but if I have a and b project in folders with their own definitions..
Akira Renbokoji
02/29/2024, 10:24 PM
definitions.py
import a
import b
Akira Renbokoji
02/29/2024, 10:24 PM
Zach
02/29/2024, 10:24 PM
# __init__.py
from project_a import a
from project_b import b
defs = Definitions(jobs=[a, b])
Akira Renbokoji
02/29/2024, 10:25 PM
what would definitions.py look like, the file that grpc is calling?
Zach
02/29/2024, 10:26 PM
Akira Renbokoji
02/29/2024, 10:27 PM
Zach
02/29/2024, 10:27 PM
load_from:
  - python_module: path.to.package
if the init file was in /path/to/package/__init__.py
Zach
02/29/2024, 10:27 PM
Akira Renbokoji
02/29/2024, 10:27 PM
Zach
02/29/2024, 10:27 PM
Zach
02/29/2024, 10:28 PM
what would definitions.py look like, the file that grpc is calling?
Zach
02/29/2024, 10:28 PM
Zach
02/29/2024, 10:28 PM
Akira Renbokoji
02/29/2024, 10:29 PM
Zach
02/29/2024, 10:29 PM
Akira Renbokoji
02/29/2024, 10:29 PM
@repository in it
Akira Renbokoji
02/29/2024, 10:29 PM
Akira Renbokoji
02/29/2024, 10:29 PM
Akira Renbokoji
02/29/2024, 10:32 PM
dagster api grpc --python-file definitions.py --host 0.0.0.0 --port 4266
# definitions.py
from project_a import a
from project_b import b
defs = Definitions(
    ...
    jobs=[a, b]
)
# project_a.py
@job
def a():
    ...
Zach
02/29/2024, 10:34 PM
Akira Renbokoji
02/29/2024, 10:36 PM
@op and @graph now. Mainly, that I don't remember the difference between @op and @job. I'm going to take a look at the docs now to clear up my confusion.
I think my understanding was that @graph has a collection of @job and @op is a function in @job
Akira Renbokoji
02/29/2024, 10:37 PM
@graph
Akira Renbokoji
02/29/2024, 10:38 PM
Zach
02/29/2024, 10:38 PM
@repository vs. Definitions
Akira Renbokoji
02/29/2024, 10:38 PM
.to_job and pass in the graph as a job to Definitions
Akira Renbokoji
02/29/2024, 10:38 PM
Zach
02/29/2024, 10:39 PM
Akira Renbokoji
02/29/2024, 10:39 PM