Ricky Kim
01/04/2023, 4:21 PM
dbt_resource_key argument with load_assets_from_dbt_project.
But I am having a problem including both dbt asset groups in the repository.
I can include either one of them in the return statement of the repository.
But when I include both of them, I get the error below.
dagster._core.errors.DagsterInvalidDefinitionError: Invalid dependencies: op "run_dbt_b7d19" does not have input "source_moo_moocards_moocards_moo_user". Available inputs: ['source_moo_recurly_billing_info', 'source_moo_recurly_invoices', 'source_moo_recurly_transactions']
The dbt assets are defined as below.
def get_dbt_group_name(node_info: Mapping[str, Any]):
    name = _get_node_group_name(node_info)
    return "dbt_" + name if name else None

# setup an `asset` with all dbt models
all_models = with_resources(
    # when loading assets from dbt, we invoke the `node_info_to_group_fn` argument
    # to label them all under a unified `group_name`
    load_assets_from_dbt_project(
        project_dir=DBT_PROJECT_DIR, profiles_dir=DBT_PROFILES_DIR, node_info_to_group_fn=get_dbt_group_name
    ),
    {
        "dbt": dbt_cli_resource.configured(
            {"project_dir": DBT_PROJECT_DIR, "profiles_dir": DBT_PROFILES_DIR, "target": DBT_TARGET}
        )
    },
)

# setup an `asset` with dbt prep models
prep_models = with_resources(
    load_assets_from_dbt_project(
        project_dir=DBT_PREP_DIR,
        profiles_dir=DBT_PROFILES_DIR,
        dbt_resource_key="dbt_prep",
        node_info_to_group_fn=get_dbt_group_name,
    ),
    {
        "dbt_prep": dbt_cli_resource.configured(
            {"project_dir": DBT_PREP_DIR, "profiles_dir": DBT_PROFILES_DIR, "target": DBT_TARGET}
        )
    },
)
Could I get some insight on how to load both dbt assets into the repository?
Thank you!
Sylvain Lesage
01/04/2023, 4:30 PM
Jing Zhang
01/04/2023, 6:24 PM
daniel
01/04/2023, 6:56 PM
Hebo Yang
01/04/2023, 7:46 PM
geoHeil
01/04/2023, 7:59 PM
# old repository:
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-f", "FOO/repository.py", "--attribute", "myrepo"]
# ==> works fine
# new definitions:
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-m", "package_name"]
# ==> fails with:
DagsterInvariantViolationError: No repositories, jobs, pipelines, graphs, asset groups, or asset definitions found in "package_name"
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-m", "package_name.package_name"]
# ==> fails with: No module named 'package_name/package_name' while importing module package_name/package_name
Similarly to https://github.com/slopp/dagteam/blob/main/workspace.yaml, my workspace.yaml inside docker looks like:
load_from:
  - grpc_server:
      host: package_name
      port: 4000
      location_name: "package_name"
But outside docker (during local development):
load_from:
  - python_package:
      package_name: package_name.package_name
Outside docker, pip install -e . has been executed and the modules are resolvable. Inside docker, it would be convenient not to have to first volume-map/copy the file and then additionally pip install -e . it.
It would be useful (I think also for speedier reloading) to be able to pass a file like before: "-f", "FOO/repository.py".
Is there any chance I can get this convenient behavior back when using definitions?
My package (scaffolded from dagster cli as package_name/package_name/assets.py and package_name/package_name/__init__.py) contains in the init:
from . import assets
defs = Definitions(assets=(load_assets_from_modules([assets])))
However, I do not find any good way to specify the init file in the -f parameter. There, dagster also fails to resolve it due to missing modules.
Daya
01/04/2023, 8:45 PM
Zachary Bluhm
01/04/2023, 8:48 PM
"dagit": {
  "enableReadOnly": false, <---
  ...
}
Shaounak Nasikkar
01/05/2023, 2:36 AM
docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.41/images/create?tag=5.2.144&fromImage=<jfrog_url>%2F<project>%2Fdev%2F<container_image>: Internal Server Error ("Head https://<jfrog_url>/v2/<project>/dev/<container_image>/manifests/5.2.144: unknown: Authentication is required")
  File "/usr/local/lib/python3.8/dist-packages/dagster/core/instance/__init__.py", line 1732, in launch_run
    self._run_launcher.launch_run(LaunchRunContext(pipeline_run=run, workspace=workspace))
  File "/usr/local/lib/python3.8/dist-packages/dagster_docker/docker_run_launcher.py", line 152, in launch_run
    self._launch_container_with_command(run, docker_image, command)
  File "/usr/local/lib/python3.8/dist-packages/dagster_docker/docker_run_launcher.py", line 110, in _launch_container_with_command
    client.images.pull(docker_image)
  File "/usr/local/lib/python3.8/dist-packages/docker/models/images.py", line 465, in pull
    pull_log = self.client.api.pull(
  File "/usr/local/lib/python3.8/dist-packages/docker/api/image.py", line 429, in pull
    self._raise_for_status(response)
  File "/usr/local/lib/python3.8/dist-packages/docker/api/client.py", line 270, in _raise_for_status
    raise create_api_error_from_http_exception(e) from e
  File "/usr/local/lib/python3.8/dist-packages/docker/errors.py", line 39, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation) from e
The above exception was caused by the following exception:
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.41/images/create?tag=5.2.144&fromImage=<jfrog_url>%2Fdatamax%2Fdev%2F<container_image>
  File "/usr/local/lib/python3.8/dist-packages/docker/api/client.py", line 268, in _raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.8/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
We are using two celery workers and rabbitmq. In the dagster.yaml file we tried specifying the jfrog credentials using the following syntax:
registry:
  url: "<<https://registry.gitlab.com/v2>>"
  username: "myusername"
  password:
    env: DAGSTER_CONT_REGISTRY_DEPLOY_TOKEN
This issue is happening intermittently.
• We have tried restarting the docker services, but in vain.
• We tried to perform the docker login from inside the container and it worked, but the pipeline was still failing intermittently.
• We tried restarting all the containers, but the same issue was repeated.
• We have a similar setup in another environment with the same container images, and those are executing just fine. The docker version is the same in the environment where it runs and in this environment where it fails intermittently.
geoHeil
01/05/2023, 7:16 AM
from utilities.utilities import common_function
Dagster is only happy with this version above.
from utilities import common_function
Pytest only with this version here.
In either case, exactly one of them (Dagster or pytest) throws a module-not-found error or ImportError: cannot import name 'common_function' from 'utilities' (unknown location).
NOTICE: all 3 packages have been installed with pip install -e . and should be natively callable.
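One possible explanation, offered as a sketch rather than a diagnosis: the "(unknown location)" in the ImportError is what Python reports for a namespace package, that is, a directory without an __init__.py. If the project uses a nested utilities/utilities layout (an assumption, mirroring the package_name/package_name scaffold mentioned earlier in the channel), then which import works depends on which directory ends up on sys.path. The script below rebuilds that layout in a temporary directory; demo_utilities is a hypothetical stand-in name chosen to avoid clashing with any installed package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PKG = "demo_utilities"  # hypothetical stand-in for the `utilities` package


def build_layout(root: Path) -> None:
    """Create root/PKG/PKG/__init__.py; the outer PKG directory has no __init__.py."""
    inner = root / PKG / PKG
    inner.mkdir(parents=True)
    (inner / "__init__.py").write_text("def common_function():\n    return 'ok'\n")


def try_import(path_entry: Path, module_name: str) -> str:
    """Import module_name with path_entry first on sys.path and report the outcome."""
    # Forget earlier imports of the demo package so each attempt starts clean.
    for name in [n for n in sys.modules if n == PKG or n.startswith(PKG + ".")]:
        del sys.modules[name]
    importlib.invalidate_caches()
    sys.path.insert(0, str(path_entry))
    try:
        module = importlib.import_module(module_name)
        return "ok" if hasattr(module, "common_function") else "no common_function"
    except ImportError as err:
        return type(err).__name__
    finally:
        sys.path.remove(str(path_entry))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_layout(root)
    results = {
        # sys.path points at the directory that contains the outer folder
        ("repo_root", f"{PKG}.{PKG}"): try_import(root, f"{PKG}.{PKG}"),
        ("repo_root", PKG): try_import(root, PKG),
        # sys.path points at the outer folder itself
        ("outer_dir", PKG): try_import(root / PKG, PKG),
        ("outer_dir", f"{PKG}.{PKG}"): try_import(root / PKG, f"{PKG}.{PKG}"),
    }

for (path_kind, name), outcome in results.items():
    print(f"{path_kind} on sys.path, import {name}: {outcome}")
```

With the repository root on sys.path, only the doubled name resolves, and the single name is a namespace package without common_function (so `from demo_utilities import common_function` would raise the ImportError quoted above). With the outer folder on sys.path, it is the other way round. If Dagster and pytest start from different directories, each would accept only one of the two spellings.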
Yevhen Samoilenko
01/05/2023, 9:46 AM
Vinnie
01/05/2023, 9:48 AM
Daniel Galea
01/05/2023, 9:57 AM
Daniel Galea
01/05/2023, 12:29 PM
Rafael Gomes
01/05/2023, 12:46 PM
[A, B, C], and each gets data from different tables in a database and makes some transformations. The result of these assets is a dataframe that will be passed to another asset X responsible for sending the data to BigQuery. I don't think this is the proper way to do that: I keep replicating the asset X because the table names are different. Is it possible to have a common asset that receives data from multiple assets and extra parameters (table name for example)? Also, should X be an asset?
Ismael Rodrigues
01/05/2023, 2:52 PM
Zachary Bluhm
01/05/2023, 3:21 PM
build_op_context
How do I specify tags in this function? Thanks!
AJ Floersch
01/05/2023, 4:12 PM
graph_assets = AssetsDefinition.from_graph(the_graph, partitions_def=DailyPartitionsDefinition(...), can_subset=True)
Rohil Badkundri
01/05/2023, 6:00 PM
my_asset = AssetsDefinition.from_graph(my_graph, ...)
# alternatively
another_asset = AssetsDefinition.from_op(my_op, ...)
Allowing you to re-use my_graph and my_op across different assets
Rafael Gomes
01/05/2023, 6:37 PM
sar
01/05/2023, 7:19 PM
Anthony Reksoatmodjo
01/05/2023, 8:04 PM
Rohil Badkundri
01/05/2023, 8:12 PM
op_config within an IOManager? E.g.
from dagster import IOManager, io_manager, job, op

class MyIOManager(IOManager):
    def handle_output(self, context, obj):
        # I want to access the uuid here to form the filepath to save the object
        pass

    def load_input(self, context):
        # I want to access the uuid here to form the filepath to load the object
        pass

@io_manager
def my_io_manager():
    return MyIOManager()

@op(config_schema={"uuid": str})
def op1(context) -> int:
    return 1

@op(config_schema={"uuid": str})
def op2(context, a: int):
    return a * 2

# defined last so that my_io_manager already exists when the decorator runs
@job(resource_defs={"io_manager": my_io_manager})
def my_job():
    op2(op1())
Anton Peniaziev
01/05/2023, 8:25 PM
Zach P
01/05/2023, 9:13 PM
Mandi Alexander
01/05/2023, 9:13 PM
% helm install dagster dagster/dagster
Error: INSTALLATION FAILED: values don't meet the specifications of the schema(s) in the following chart(s):
dagster:
invalid character '<' looking for beginning of value
dagster-user-deployments:
invalid character '<' looking for beginning of value
I've tried using a local copy of values.yaml. I've also tried different versions of the helm chart, but I get the same error. My best guess is that it's reaching out and getting a bad http response. Does anybody know what's causing this?
Mayank Jariwala
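The wording of that error is consistent with the guess above: "invalid character '<' looking for beginning of value" is the kind of message a JSON parser produces when it is handed an HTML page instead of JSON. Helm is written in Go, so the sketch below is only an analogous illustration in Python, and both response bodies are hypothetical. It shows that parsing fails on the very first character of an HTML body.

```python
import json

# Hypothetical response bodies: a JSON schema fragment vs. an HTML error page.
json_body = '{"type": "object"}'
html_body = "<!DOCTYPE html><html><body>Access denied</body></html>"


def describe_parse(body: str) -> str:
    """Try to parse body as JSON and describe where parsing stopped, if it failed."""
    try:
        json.loads(body)
        return "parsed"
    except json.JSONDecodeError as err:
        # err.pos is the character offset at which the parser gave up.
        return f"failed at offset {err.pos} on {body[err.pos]!r}"


print(describe_parse(json_body))  # parsed
print(describe_parse(html_body))  # failed at offset 0 on '<'
```

If this is what is happening, checking what any remote schema URL referenced by the chart returns from the affected machine (for example through a proxy or firewall) would be a reasonable next step.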
01/06/2023, 4:47 AM
Đinh Đức Dương
01/06/2023, 5:05 AM
YH
01/06/2023, 6:59 AM
Andras Somi
01/06/2023, 9:46 AM
AssetsDefinition.from_graph? Given that the @asset decorator picks up the docstring from the decorated function, I expected the docstring of the graph to be “propagated” as the description of the created asset, but it doesn’t seem to be the case.