# integration-dbt
i
Hi. I'm using Dagster with DBT Core to materialise tables to a postgres database. I've got sensors polling for changes on source tables. I thought it would be efficient to let the sensors use a connection pool, as creating connections is expensive (time-wise). My approach is a singleton pattern to instantiate the connection pool, which I was hoping would be reused each time a sensor needs to check for changes. I've noticed that Dagster routinely starts and stops the code servers, e.g. "Started Dagster code server for module ...", "Shutting down Dagster code server for module ...". I've seen posts that suggest the stopping and starting of the code server is expected. However, this invalidates my approach of creating a singleton-style connection pool for the sensors. Is it best to abandon the connection pool and just have the sensors create a connection each time they need to poll? Or is there a good way to deal with this? My ELT pipeline has near real time (NRT) requirements, so I'm looking for ways to optimize the run time.
s
One thought is to have the sensor open up a connection at startup, poll continuously for a period of time (say, 45 seconds), then shut down. The next time the sensor starts up, it does the same thing. So you pay the startup cost of creating the connection every minute or so, but within that minute you can reuse the same connection. We've thought about doing this with Kafka but haven't quite got around to it yet.
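A rough sketch of that idea in plain Python, in case it helps. This is not Dagster API: `poll_window`, `connect`, `check`, and the injected `clock`/`sleep` are all hypothetical names made up for illustration. It just shows the shape of "open one connection, poll for a bounded window, close".

```python
import time
from typing import Callable, Protocol


class Connection(Protocol):
    """Anything with a close() method, e.g. a database connection (assumption)."""

    def close(self) -> None: ...


def poll_window(
    connect: Callable[[], Connection],
    check: Callable[[Connection], list],
    window_seconds: float = 45.0,
    pause_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Open one connection, poll until the window expires, then close it."""
    conn = connect()  # connection cost is paid once per window
    changes: list = []
    deadline = clock() + window_seconds
    try:
        while True:
            # Reuse the same connection for every check in this window.
            changes.extend(check(conn))
            # Stop if another pause would reach or pass the deadline.
            if clock() + pause_seconds >= deadline:
                break
            sleep(pause_seconds)
    finally:
        conn.close()  # always release the connection, even on errors
    return changes
```

Inside a sensor, the body would call something like this with a real `connect` (for example a function that opens a postgres connection) and turn the returned changes into run requests. The 45 second default is only there to stay under the ~60 second evaluation limit mentioned in this thread.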
i
@Stephen Bailey, is there a way to control how frequently the code server module is shut down and restarted? Also, in my scenario each sensor represents a pipeline run, so I'm not sure whether each pipeline is being initiated in its own code server instance. If each pipeline run is in its own instance of a code server which gets shut down after the run, then my approach with a singleton connection pool is futile. I may instead run the code that spins up the connection pool in a separate container, i.e. one not controlled by Dagster, and have my sensors make calls to this container to obtain a connection.
s
I think `minimum_interval_seconds` is the main configurable piece of the sensor, and a sensor evaluation itself can last up to ~60 seconds. So yeah, you might still be looking at N connections for N pipelines per minute.