Xavier BALESI
08/03/2022, 1:58 PMMatt Menzenski
08/03/2022, 2:28 PMdagster new-project
in the CLI reference but it’s not there. Is there another place in the documentation to find it?
Specifically I was hoping to see whether I can pass a custom destination path (to run dagster new-project .
or similar)Scott Hood
08/03/2022, 2:33 PMDaniel Mosesson
08/03/2022, 4:09 PMAlan
08/03/2022, 5:06 PMAnoop Sharma
08/03/2022, 5:19 PMexecution:
  config:
    multiprocess:
      max_concurrent: 2
but where do I put this config? Does it go in dagster.yaml?
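(For reference, a minimal sketch of where this executor config is usually supplied: as run config on the job itself, e.g. via the config argument of @job or pasted into the Launchpad, rather than in the instance-level dagster.yaml. The op and job names below are hypothetical.)
from dagster import job, op

@op
def my_op():
    ...

@job(
    config={
        "execution": {"config": {"multiprocess": {"max_concurrent": 2}}},
    }
)
def my_job():
    my_op()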
PS: I have tried putting it in dagster.yaml but it doesn't seem to be working.Amit Arie
08/03/2022, 6:02 PMStepExecutionContext
or HookContext
?Matt Menzenski
08/03/2022, 6:45 PMChris Hansen
08/03/2022, 8:42 PMdagster-io/dagster-cloud-branch-deployments-quickstart
and i’m running into issues in the Deploy to Dagster Cloud
step.
│ /tmp/dagster-cloud-cli/dagster_cloud_cli/gql.py:578 in │
│ create_or_update_branch_deployment │
│ │
│ 575 │ │
│ 576 │ name = result.get("data", ***).get("createOrUpdateBranchDeployment" │
│ 577 │ if name is None: │
│ ❱ 578 │ │ raise Exception(f"Unable to create or update branch deployment │
│ 579 │ │
│ 580 │ return cast(str, name) │
│ 581
what is the name
field that i’m missing here?Ugochukwu Onyeka
08/03/2022, 9:41 PMSaul Burgos
08/03/2022, 10:13 PMSaul Burgos
08/03/2022, 10:17 PMSterling Paramore
08/03/2022, 10:59 PMVinnie
08/04/2022, 9:36 AMWarning: Invalid HTTP request
whenever I try to open the website. Am I missing some more configuration or is it just not supported? Deploying it under dagster.example.com works fine.
And in case it’s not supported, are you planning on supporting it in the future?Stefan Samba
08/04/2022, 9:37 AM@op
elements it’s easy to get serial execution (link).
In my specific case I’m looking for:
• serial execution when running .ipynb files
For example:
import dagstermill as dm
from dagster import job
download_data = dm.define_dagstermill_op(
    "download_data",
    notebook_path="download_data.ipynb",
    output_notebook_name="download_data_output",
)
prepare_data = dm.define_dagstermill_op(
    "prepare_data",
    notebook_path="prepare_data.ipynb",
    output_notebook_name="prepare_data_output",
)
@job(
    resource_defs={
        "output_notebook_io_manager": dm.local_output_notebook_io_manager,
    }
)
def dagster_main():
    download_data()
    prepare_data(download_data)
This is a small example for illustration purposes. The last line will not work because prepare_data can’t take any arguments; it’s just meant to show that prepare_data should depend on download_data.
Would it be possible to make this serial in some way?
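(One possible direction, sketched on the assumption that define_dagstermill_op accepts an ins argument like a regular op: give prepare_data a Nothing input used purely for ordering and wire download_data’s output notebook into it. The input name "start" is arbitrary.)
import dagstermill as dm
from dagster import In, Nothing, job

download_data = dm.define_dagstermill_op(
    "download_data",
    notebook_path="download_data.ipynb",
    output_notebook_name="download_data_output",
)
prepare_data = dm.define_dagstermill_op(
    "prepare_data",
    notebook_path="prepare_data.ipynb",
    output_notebook_name="prepare_data_output",
    # Nothing input used only to express ordering; no data is passed
    ins={"start": In(Nothing)},
)

@job(
    resource_defs={
        "output_notebook_io_manager": dm.local_output_notebook_io_manager,
    }
)
def dagster_main():
    # wiring download_data's output into the Nothing input makes prepare_data
    # wait for download_data without consuming its value
    prepare_data(start=download_data())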
Q2
For ops I can see it’s possible to get a dataflow going from one step to another. Would that dataflow be possible when working with ipynb files? I can imagine this is a challenge as an ipynb file does not return a value. Any ideas here?Indra
08/04/2022, 10:07 AMBen
08/04/2022, 12:55 PMBen
08/04/2022, 12:57 PMTom Reilly
08/04/2022, 3:03 PMdefine_asset_job()
is there a way to stop execution mid job and have the job still marked as a successful run? For example, if the asset job looked like
get_new_files --> process_new_files
and the get_new_files
asset doesn't find new files, I'd like the job to finish as successful without initiating any downstream assets (in this case process_new_files).
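(For comparison, with plain ops this kind of early stop is usually modeled with an optional output, as in the sketch below; names are hypothetical, and whether the same pattern carries over cleanly to define_asset_job() is exactly the open question.)
from dagster import Out, Output, job, op

@op(out=Out(is_required=False))
def get_new_files():
    new_files = []  # e.g. the result of scanning some source (hypothetical)
    if new_files:
        # only emit the output when there is something to process
        yield Output(new_files)
    # if no Output is emitted, downstream steps are skipped and the run succeeds

@op
def process_new_files(files):
    ...

@job
def files_job():
    process_new_files(get_new_files())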
Martin O'Leary
08/04/2022, 6:09 PMChris Hansen
08/04/2022, 7:12 PMbq_op_for_queries
is what i want to use, but the docs lost me at "Expects a BQ client to be provisioned in resources as context.resources.bigquery."
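(For reference, a sketch of what that sentence appears to mean: attach dagster-gcp's bigquery_resource under the resource key "bigquery" on the job that uses the generated op. Exact import locations may vary by version, and the query below is a placeholder.)
from dagster import job
from dagster_gcp import bigquery_resource, bq_op_for_queries

# the generated op runs its queries through context.resources.bigquery,
# so the job has to provide that resource
run_queries = bq_op_for_queries(["SELECT 1"])

@job(resource_defs={"bigquery": bigquery_resource})
def bq_job():
    run_queries()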
Maksym Domariev
08/04/2022, 10:45 PMcontext.py:555: UserWarning: Error loading repository location hello_flow: dagster._core.errors.DagsterInvalidDefinitionError: @graph 'test_graph' returned problematic value of type <class 'dagster._core.definitions.composition.DynamicFanIn'>. Expected return value from invoked solid or dict mapping output name to return values from invoked solid
If I remove the return from the graph I get this instead:
dagster._check.CheckError: Invariant failed. Description: All leaf nodes within graph 'test_graph' must generate outputs which are mapped to outputs of the graph, and produce assets. The following leaf node(s) are non-asset producing ops: {'load_something'}. This behavior is not currently supported because these ops are not required for the creation of the associated asset(s).
That fails for obvious reasons. Not sure how to fix it. I'm calling collect() as the manual describes.
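(For reference, the usual shape of a dynamic-output graph, sketched with hypothetical ops: the value from .collect() is passed into a downstream op instead of being returned directly from the graph.)
from dagster import DynamicOut, DynamicOutput, graph, op

@op(out=DynamicOut())
def fan_out():
    for i in range(3):
        yield DynamicOutput(i, mapping_key=str(i))

@op
def load_something(x):
    return x

@op
def merge(results):
    # receives the list assembled by .collect()
    return results

@graph
def test_graph():
    # returning fan_out().map(load_something).collect() directly from the graph
    # triggers the DynamicFanIn error; routing it through an op avoids it
    return merge(fan_out().map(load_something).collect())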
Sean Lindo
08/04/2022, 11:21 PMVxD
08/05/2022, 1:00 AMminimum_interval_seconds=10
.
We noticed that each time the sensor runs, it only processes one succeeded graph, even if 50 have succeeded over the past 10s.
This is heavily problematic because the sensor requires 10 minutes to process 60 completed jobs, which doesn't scale when we need to handle hundreds.
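What we are after looks roughly like the sketch below: a custom sensor that queries all recently succeeded runs in one tick and yields one RunRequest per run (the job and op names are hypothetical, and the exact instance query APIs may differ by version).
from dagster import DagsterRunStatus, RunRequest, RunsFilter, job, op, sensor

@op
def handle_success():
    ...

@job
def downstream_job():
    handle_success()

@sensor(job=downstream_job, minimum_interval_seconds=10)
def batch_success_sensor(context):
    # fetch a batch of succeeded runs instead of handling a single one per tick
    runs = context.instance.get_runs(
        filters=RunsFilter(statuses=[DagsterRunStatus.SUCCESS]),
        limit=100,
    )
    for run in runs:
        # run_key de-duplicates runs already handled on earlier ticks
        yield RunRequest(run_key=run.run_id, run_config={})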
Is there a way we can get the sensor to process more than one pipeline in one go?Maksym Domariev
08/05/2022, 1:40 AMBolin Zhu
08/05/2022, 2:51 AMgeoHeil
08/05/2022, 6:50 AMTaqi
08/05/2022, 6:53 AMKautsar Aqsa
08/05/2022, 8:36 AMshell_op()
, but unfortunately I got this error
dagster._core.errors.DagsterInvalidInvocationError: Compute function of op 'shell_op' has context argument, but no context was provided when invoking.
and I still don't understand what dagster means by context here. Here is my .py code:
@op
def Load_To_Elastic(context, table):
    """Loading jobs from warehouse to elasticsearch"""
    bash_command = "/Users/kautsaraqsa/ELK/logstash-7.15.2/bin/logstash -f /Users/kautsaraqsa/code/AIHSP/badung-case.conf"
    context.log.info("Indexing table:" + table)
    shell_op(context, bash_command)
@graph
def Pipeline():
    """Load table to warehouse then transfer it to Elasticsearch with Logstash"""
    table = Ingest_From_Synchro()
    Load_To_Elastic(table)
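(For reference, one variant worth trying, sketched below: since shell_op is itself an op, wire it into the graph as its own step and feed it the command string from an upstream op, instead of calling it inside another op's body. The stub op stands in for the existing Ingest_From_Synchro; whether this matches your dagster-shell version's signature is worth double-checking.)
from dagster import graph, op
from dagster_shell import shell_op

@op
def ingest_from_synchro():
    # stub standing in for the existing Ingest_From_Synchro op
    return "some_table"

@op
def load_to_elastic_cmd(context, table):
    """Build the logstash command for the table loaded upstream."""
    context.log.info("Indexing table: " + table)
    return "/Users/kautsaraqsa/ELK/logstash-7.15.2/bin/logstash -f /Users/kautsaraqsa/code/AIHSP/badung-case.conf"

@graph
def pipeline():
    table = ingest_from_synchro()
    # shell_op runs the command it receives as input, as its own step
    shell_op(load_to_elastic_cmd(table))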
Hope anyone can help me with this, thank you!Xavier BALESI
08/05/2022, 10:36 AM@asset(name="upstream")
def upstream(_: OpExecutionContext):
    return "upstream"
@asset(name="downstream", ins={"upstream": AssetIn("upstream")})
def downstream(_: OpExecutionContext, upstream):
    return upstream + "downstream"
assets_with_resources = with_resources([upstream, downstream], resource_defs={...})
materialize_all = define_asset_job(name="materialize_all", config=config_from_files([...]))
@repository
def repo():
    return [assets_with_resources, materialize_all]
# Note: my custom io manager reads and writes a file whose name comes from context.upstream_output.name in load_input and from context.op_def.name in handle_output
in dagit from the job view, when I launch upstream alone then downstream alone, all is right:
Loaded input "upstream" using input manager "my_io_manager"
but when I launch the whole job materialize_all it fails after:
Loaded input "upstream" using input manager "my_io_manager", from output "result" of step "upstream"
In the working case the output of upstream is named 'upstream' but in the breaking case it's named 'result'
I am expecting that when I launch the 2 assets separately and with materialize_all, I have the same behaviour.
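(For reference, a sketch of an io manager keyed on the asset key instead of op/output names; the asset key should stay the same whether the assets are materialized one at a time or together via materialize_all. The storage path is hypothetical.)
from dagster import IOManager, io_manager

class AssetKeyFileIOManager(IOManager):
    """Names files from context.asset_key rather than op/output names."""

    def _path(self, asset_key):
        return "/tmp/storage/" + "_".join(asset_key.path)

    def handle_output(self, context, obj):
        with open(self._path(context.asset_key), "w") as f:
            f.write(obj)

    def load_input(self, context):
        with open(self._path(context.asset_key)) as f:
            return f.read()

@io_manager
def asset_key_file_io_manager(_):
    return AssetKeyFileIOManager()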
owen
08/05/2022, 5:21 PMcontext.upstream_output.name
and context.op_def.name
can cause issues like what you're seeing, as these are not guaranteed to stay the same between different executions.Xavier BALESI
08/05/2022, 8:02 PM