# dagster-plus
t
Dagster Cloud not updating from the main branch all of the time
I’ve had this happen 2 or 3 times now. Context:
• We are using Dagster Cloud Hybrid
• We have the GitHub Actions setup using the Dagster template provided
• We submit a PR, it’s reviewed and we then merge it into main
• 99% of the time, Dagster Cloud will then sync the changes and pull from the latest commit
• On occasion, on merge to main the dagster-cloud-deploy will fail. Attached reason.
I’ll be honest, I haven’t researched this any further. I was hoping someone might have come across this before and could point me in the right direction for fixing it.
It’s not the end of the world, as we just manually run the job (which is why I haven’t spent any time researching it yet). But it’s a little annoying, as it’s tripped us up a couple of times. I will add some alerting to this GH action so that we get a Slack alert when it fails. But I’m curious whether there is a proper fix or a known issue.
z
Do you have
concurrency:
  # Cancel in-progress runs on same branch
  group: ${{ github.ref }}-push
  cancel-in-progress: true
set in your workflow?
If you have
cancel-in-progress: true
then if you push while the branch is already being built then GH actions will cancel the original build
t
Ah that’s probably it!
Thanks for the reply, Zach.
Hmmmm, nope. That’s not it. We already have this set
Ahhhhh! I think I know what this is. Sometimes when we are pushing a PR through quickly, we only wait for the tests and lint to pass so that the Merge button is enabled. We don’t wait until the branch deployment has finished. So when we eagerly merge and the branch deployment hasn’t finished, there is a clash (although the error message doesn’t really indicate that). This is just a presumption, as I just bumped into this again and that was the exact workflow / situation.
z
Yeah what I was saying was that if you have that set in your workflow, and you push to the same PR / branch again while it's still building, it will cancel the in-progress run (that's what cancel-in-progress means)
You'd need
cancel-in-progress: false
to avoid that
But the eager merge thing might be an issue too, not sure
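For reference, a minimal sketch of the variant being suggested here, reusing the group name from the snippet earlier in the thread. As far as GitHub's concurrency documentation describes it, with cancel-in-progress set to false a newer run in the same group waits as pending instead of cancelling the run already in progress (and it replaces any run that was already pending in that group).

```yaml
concurrency:
  # Same group as above: runs on the same ref share one slot
  group: ${{ github.ref }}-push
  # Let the in-progress run finish; a newer run waits as pending
  cancel-in-progress: false
```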
t
No that’s not what I’m saying. I’m saying:
1. Created a PR
2. Lint and test steps pass, but branch deployment is still in progress
3. We press merge to main
That’s the example I saw today where this happened again.
z
Yeah that's what I meant by eager merge and saying that that might still be an issue. I'm just saying that the
cancel-in-progress: true
is what causes our workflows to cancel with the
Canceling because a higher priority request...
message
I think we're getting wound in circles here, your statement
Hmmmm, nope. That’s not it. We already have this set
seemed to indicate to me that you thought I was suggesting to set
cancel-in-progress: true
and that you already had it set, when what I meant was that having that set is what cancels in-progress builds and was in fact suggesting you turn that to
false
t
Yeah we are getting mixed up. My fault
I’m not convinced that setting this to false is the right solution though. In theory, the merge into main should cancel the other existing runs, not the other way around (unless I’m misunderstanding how this works in GHA).
Very well could be my misunderstanding of GHA though
z
Yeah we deal with the same thing for the cancel-in-progress setting. It's entirely possible you're seeing the merge behavior you mentioned too, I haven't personally observed that. Do you have your branch deployments / GHA set up to delete branch deployments after they get merged?
(my team does not, I'm just curious to see if that could be a confounding factor as well)
t
No, haven’t got that set at this stage, although I see how that would be useful, as it’s annoying to delete them manually in Dagster. Any reason why you guys haven’t implemented it? I’ll have a play next week and see if I can spot the problem, and let you know if I figure out what it is. A solution might be to disable cancel-in-progress only for the main branch. Thanks again for the help. Much appreciated
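A minimal sketch of that last idea, assuming the production deploy is triggered by pushes to a branch named main and the workflow uses the concurrency group shown earlier in the thread. cancel-in-progress accepts an expression, so it can evaluate to false for that ref only:

```yaml
concurrency:
  group: ${{ github.ref }}-push
  # true for every ref except main, so a deploy running from main is never cancelled mid-run
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```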
z
Not a problem! RE: automatic deletion of branch deployments - a number of our data engineers / analysts have bad habits around running relatively important jobs in branch deployments and often want to keep run history for those jobs. It's seen as too big of a hassle to have to go through merges and redeployment to prod to fix issues that come up during large processing jobs (these jobs take weeks to complete usually), so instead people run these big jobs on branch deployments and then just merge in when they're finished. It sucks, but I've currently got bigger fish to fry
t
Only 4 months later am I finally looking at this properly… The issue in our GH Actions YAML was that we needed to remove
closed
from types. It was essentially a race condition between GH actions deploying against the branch deployment vs main. Hence why it was so intermittent
pull_request:
  types: [opened, synchronize, reopened, closed]
Thought I’d post this here in case someone else stumbles across it
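For anyone comparing against their own workflow, a sketch of the trigger after the change, assuming the block sits under the top-level on: key of the branch deployment workflow (the surrounding keys are not shown in this thread):

```yaml
on:
  pull_request:
    # closed removed: merging a PR no longer starts this workflow alongside the deploy from main
    types: [opened, synchronize, reopened]
```

One trade-off: with closed removed, this workflow no longer runs at all when a PR is merged or closed, so any step that relied on that event will not run.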