Jonathan Lamiel
11/23/2022, 12:15 PMDagster deamon
which can’t (per the doc) handle replicas for the moment. As it’s running sensors / schedule etc… I was wondering:
1. If there was benchmarks of workload it can handle in //?
2. Any plan to make it scalable? (Didn’t found any issues mentioning it)prha
11/23/2022, 10:02 PMTomas H
12/12/2022, 2:03 PMPieter Custers
12/14/2022, 3:40 PMdaniel
12/14/2022, 4:45 PMTomas H
12/15/2022, 3:42 PMdef execute_if_prev_job_finished(context: ScheduleEvaluationContext,
job_name: str) -> Union[SkipReason, RunRequest]:
logger = context.log
logger.debug(f"Checking job run {job_name}")
start_timestamp = datetime.now()
run_records = context.instance.get_run_records(
RunsFilter(job_name=job_name,
statuses=[DagsterRunStatus.STARTED,
DagsterRunStatus.QUEUED,
DagsterRunStatus.STARTING,
DagsterRunStatus.NOT_STARTED])
)
checked_in = str(datetime.now() - start_timestamp)
logger.debug(f"Previous run of job {job_name} checked in {checked_in}")
if len(run_records) == 0:
logger.debug(f"Submitting run request for job {job_name}")
yield RunRequest(tags={"run_requested_at": str(datetime.utcnow())})
else:
yield SkipReason(f'The previous job {job_name} have not finished yet.')
sleep_schedules_perm = [ScheduleDefinition(
job=job,
cron_schedule="* * * * *",
execution_fn=functools.partial(execute_if_prev_job_finished, job_name=job.name))
for job in sleep_jobs_perm]
daniel
12/15/2022, 3:45 PMTomas H
12/15/2022, 4:18 PMSELECT runs.id, runs.run_body, runs.status, runs.create_timestamp, runs.update_timestamp, runs.start_time, runs.end_time
FROM runs
WHERE runs.run_id IN (SELECT run_tags.run_id
FROM run_tags
WHERE run_tags.key = 'dagster/schedule_name' AND run_tags.value = 'sleep_job_perm19_schedule' INTERSECT SELECT run_tags.run_id
FROM run_tags
WHERE run_tags.key = '.dagster/repository' AND run_tags.value = 'load_repo@code-load') ORDER BY runs.id DESC
LIMIT 1
daniel
12/15/2022, 4:20 PMTomas H
12/15/2022, 4:24 PMalex
12/15/2022, 5:52 PMinsufficiently scaling sensor processingDid you put all the sensors in the same code location / user code server? The daemon reaches out to this/these servers to evaluate the sensor functions. Splitting these out and/or tuning the
--max_workers
flag on dagster api grpc
would likely yield higher throughput.Tomas H
12/29/2022, 5:28 PMdaniel
12/29/2022, 5:42 PM