Félix Tremblay
05/10/2023, 9:48 PM
Vinnie
05/11/2023, 12:23 PM
AutoMaterializePolicies are working as intended for multi-partitioned assets. With a multi-partitioned asset that combines a static and a dynamic partition set, creating a new dynamic partition and kicking off a run for all static partitions under the latest dynamic partition will only trigger materialization of downstream assets (which are set to eager) if all partitions of the upstream asset have been materialized. If one or more of the dynamic partitions haven't been materialized (and aren't queued, so Dagster can't know they will be materialized soon), the downstream run won't be launched.
Daniel Gafni
05/11/2023, 3:08 PM
Joel Olazagasti
05/11/2023, 3:29 PM
Félix Tremblay
05/11/2023, 5:08 PM
Stephen Bailey
05/11/2023, 7:12 PM
AutoMaterializePolicy.cron_schedule("0 0 * * *") and AutoMaterializePolicy.eager() could solve a ton of use cases without users ever having to learn about the sensor and schedule classes.
Brian Pohl
05/11/2023, 9:23 PM
os.environ works as long as the code is inside an op.
The series of hoops I've jumped through works, but I've finally run into a scenario where this doesn't cut it. I want to use an environment variable to construct a string, which gets set in my job config (it's the dbt target). If I didn't need to do any string transformations, I could set:
resources:
  dbt:
    config:
      target:
        env: MY_ENV_VARIABLE
but because I need to modify the value, I am forced to do this in my Python scripts. And my Python scripts can only access environment variables during op runtime…
So, if I had build args during the build, I could turn all my ARGs into ENVs in the Dockerfile. I could cut Pulumi out of this whole process, passing variables from CI/CD straight into Dagster, and simplify a lot of my Python that today is restricted by only having access to environment variables at runtime.
Simon Frid
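A sketch of the string transformation Brian describes, using a hypothetical DEPLOY_ENV variable; if Docker build ARGs were promoted to ENVs in the image, the value would be readable at definition time rather than only inside an op:

```python
import os

# Hypothetical variable, baked into the image as an ENV so it is visible
# to the code server at definition time, not just during op runtime.
deploy_env = os.getenv("DEPLOY_ENV", "dev")

# The transformation that plain `env: MY_ENV_VARIABLE` config can't express:
dbt_target = f"analytics_{deploy_env}"
```

The resulting string could then be placed into the dbt resource config directly, instead of routing the raw variable through `env:` config.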
05/12/2023, 12:37 AM
Operation name: RunsRootQuery
Message: Cannot return null for non-nullable field Run.mode.
Path: ["pipelineRunsOrError","results",5,"mode"]
Locations: [{"line":29,"column":3}]
Stack Trace:
File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 541, in execute_field
return_type, field_nodes, info, path, result
File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 621, in complete_value
"Cannot return null for non-nullable field"
Ivan Tsarev
05/12/2023, 1:02 AM
• EnvVar.int("VAR_NAME") - maybe I'm not so smart, but wasting half an hour figuring out how to work correctly with an int-based env var without pydantic errors is not so fun.
• Usage of RunConfig.to_config_dict(). Probably my use case is a bit marginal, but I have a bunch of jobs which use @config_mapping and are started manually from the dagit UI, without a Python invocation of execute_in_process(). Previously, returning a dict from @config_mapping was an okay-ish option, but with all the pydantic benefits it looks quite dumb. So this method becomes handy, allowing me to create a fully-pydantic config inside the mapping and then pass it to a "just" job with ease. And again, I spent quite some time before finding this method in the source code :)
Giovanni Paolo
05/12/2023, 5:51 PM
There's a behavior of dagster.EnvVar that I didn't expect: it silently fails when used for something that is not bound to a run (specifically the dagster-slack make_slack_on_run_failure_sensor).
Took me a while to figure out why!
Sebastien
05/14/2023, 9:10 AM
Simrun Basuita
05/14/2023, 1:35 PM
Cody Peterson
05/16/2023, 5:16 PM
Zach
05/16/2023, 6:55 PM
Pablo Beltran
05/16/2023, 10:06 PM
Chris Comeau
05/17/2023, 1:27 PM
Mark Fickett
05/17/2023, 5:31 PM
josh
05/17/2023, 5:38 PM
Adam Bloom
05/17/2023, 6:35 PM
Upgraded to 1.3.4 and noticed a dagit regression: on the run page, long tags used to display a tooltip containing the full tag (as they are truncated in the run table); that tooltip no longer seems to appear.
Francesco Piccoli
05/17/2023, 7:37 PM
Francesco Piccoli
05/17/2023, 8:40 PM
Daniel Kim
05/18/2023, 2:20 AM
R Lucas
05/19/2023, 8:52 AM
In the current version (1.3.4), search seems to be limited to asset/group/job names and does not seem to allow searching on descriptions or dagster types.
Is this feature in your backlog, or will we need to use a dedicated data catalog tool (such as DataHub) to manage exploration of data assets and searching in fields/descriptions?
Mark Fickett
05/19/2023, 1:01 PM
Spencer Nelson
05/19/2023, 4:34 PM
I'm migrating my assets to a new type, Table. The details aren't important - what matters is that dataframes and tables are not compatible.
So I have these old assets with keys like ztf_source_dataframe, which are pandas dataframes, but I want to convert them to `Table`s. Here are the options I see:
1. Reuse the ztf_source_dataframe asset key, but return a non-dataframe type. This is confusing and will break dependents, which can possibly be managed with asset versioning in some way, but the name confusion would be unfortunate.
2. Write a new ztf_source_table asset, change dependents to use it, and then delete ztf_source_dataframe. But this would destroy all history and orphan the materialized assets. Historical runs will be… broken? I don't know what will happen to the dagit UI for them.
3. Write the new asset, but keep ztf_source_dataframe around as a relic of a bygone era. But it will clutter the UI and the codebase forever. Is there a way to mark assets as "archived" or "deprecated" or "just kept around in the attic"?
Gradual migrations like this are really important. I think Dagster could provide tools to manage this, and they could be fantastically better than anything else out there, since Dagster knows so much about my computation graph. I don't have a concrete suggestion, but I think this is an important area for new features.
Leo Qin
05/19/2023, 6:39 PM
Félix Tremblay
05/23/2023, 10:43 PM
Cody Peterson
05/24/2023, 2:06 AM
Clement Emmanuel
05/24/2023, 4:01 PM
SELECT
event_logs.id,
event_logs.event
FROM
event_logs
WHERE
event_logs.dagster_event_type = $1
ORDER BY
event_logs.id DESC
This seems to be invoked by
context.latest_materialization_records_by_partition_and_asset()
(where context is a MultiAssetSensorEvaluationContext), and it becomes very expensive as the event_logs table grows (which is indefinite, I believe, as it's essentially a write-only table). It is expensive even with the appropriate indexes, which are leveraged by the query plan.
Has there been any throughput testing on this pattern, or are there any ideas on how to optimize it? Unless I'm missing something, this seems to make the canonical use of multi-asset sensors non-viable even at a fairly modest scale, as performance will only decay as materializations continue, until eventually (or, in the case that materializations already exist when turning on the sensor, immediately) it can't complete within the hard 60-second timeout.
Mark Fickett
05/24/2023, 5:56 PM