I’ve been running dagster locally for a while now ...
# dagster-plus
p
I’ve been running Dagster locally for a while now and have some historical data that I’d like to use to estimate the cost of running in `dagster-cloud` instead. Is there a query or tool I can use to extract this data from my local instance and compute an approximate cost? It’d be great to have something like `dagster price -from 2023-01-01 -to 2023-05-01` that produces a number or some kind of estimated price breakdown (per day / month / job / asset / etc.). Tangentially, is there any mechanism to migrate the historical data from a PostgreSQL database into dagster-cloud?
z
I think the only real way to do this right now is fairly manual: use the GraphQL `runsOrError` field with a query like this:
```graphql
query RunsQuery($runIds: [String]) {
  runsOrError(filter: {runIds: $runIds}) {
    __typename
    ... on Runs {
      results {
        startTime
        endTime
        runId
      }
    }
  }
}
```
You could execute that from a Python GraphQL client. For example, you could inherit from `DagsterGraphQLClient` and add the query as another method, like this (note this may not be best practice in a production setting):
```python
from typing import List

from dagster_graphql import DagsterGraphQLClient


class CustomGraphQLClient(DagsterGraphQLClient):
    """
    Client for executing GraphQL queries against a Dagster instance, acting as a thin wrapper around the Dagster
    GraphQL API.
    """

    def get_run_stats(self, run_ids: List[str]):
        # Fetch the start and end time of each run in run_ids.
        query = """
            query RunsQuery($runIds: [String]) {
              runsOrError(filter: {runIds: $runIds}) {
                __typename
                ... on Runs {
                  results {
                    startTime
                    endTime
                    runId
                  }
                }
              }
            }
        """
        variables = {"runIds": run_ids}
        # _execute is a private method of DagsterGraphQLClient, so it may change between releases.
        return self._execute(query, variables)
```
Then just do the math on the results: subtract `startTime` from `endTime` for each run and add up the durations.
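As a rough sketch of that math: the snippet below assumes the response is a dict shaped like the query above, and that `startTime` and `endTime` are Unix timestamps in seconds (an assumption worth checking against your instance). The function name `run_minutes_by_day` is made up for illustration. It only totals run minutes per UTC day; turning minutes into a price depends on whatever rate applies to your plan.

```python
from collections import defaultdict
from datetime import datetime, timezone


def run_minutes_by_day(response: dict) -> dict:
    """Sum run durations, in minutes, per UTC calendar day of the run's start."""
    payload = response["runsOrError"]
    if payload.get("__typename") != "Runs":
        # The union can also resolve to an error type; surface it instead of guessing.
        raise ValueError(f"unexpected response type: {payload.get('__typename')}")

    totals = defaultdict(float)
    for run in payload["results"]:
        start, end = run.get("startTime"), run.get("endTime")
        if start is None or end is None:
            continue  # run never started, or has not finished yet
        day = datetime.fromtimestamp(start, tz=timezone.utc).date().isoformat()
        totals[day] += (end - start) / 60.0
    return dict(totals)
```

You would call it on whatever `get_run_stats` returns, e.g. `run_minutes_by_day(client.get_run_stats(run_ids))`.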
p
OK, and what about extracting `runIds` between two timestamps?
z
I don't think there's a great way to do that through the GraphQL API alone. I'd just pull the results of the query into a DataFrame and filter from there.
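A minimal sketch of that client-side filter, written with plain Python rather than a DataFrame to keep it dependency-free (with pandas the same condition would be a boolean mask on the `startTime` column). It assumes `results` is the list under `runsOrError.results` and that `startTime` is a Unix timestamp in seconds; `runs_between` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def runs_between(results, start: datetime, end: datetime):
    """Keep runs whose startTime falls in the half-open window [start, end).

    Pass timezone-aware datetimes; naive ones are interpreted in local time.
    """
    lo, hi = start.timestamp(), end.timestamp()
    return [
        run
        for run in results
        if run.get("startTime") is not None and lo <= run["startTime"] < hi
    ]
```

For the window in the original question that would be `runs_between(results, datetime(2023, 1, 1, tzinfo=timezone.utc), datetime(2023, 5, 1, tzinfo=timezone.utc))`.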
a
You can access this from the Python API as well via `DagsterInstance`. You can see `instance.get_runs` in use in this example: https://github.com/dagster-io/dagster/discussions/14164
That API is exposed via GraphQL as the `runsOrError` field, which takes a `filter` argument.
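Here is a hedged sketch of the `DagsterInstance` route. It uses `get_run_records` rather than `get_runs`, on the assumption that start and end times live on the run record rather than on the run object, and it assumes `RunsFilter` accepts `created_after` / `created_before` in your Dagster version. The helper names are made up for illustration, so check the details against the linked discussion before relying on it.

```python
from datetime import datetime


def load_run_records(start: datetime, end: datetime):
    """Fetch run records created between start and end from the local instance."""
    # Imported lazily so total_run_minutes below stays usable without Dagster installed.
    from dagster import DagsterInstance, RunsFilter

    instance = DagsterInstance.get()  # resolves the instance from DAGSTER_HOME
    # Assumption: created_after / created_before exist on RunsFilter in your version.
    return instance.get_run_records(
        filters=RunsFilter(created_after=start, created_before=end)
    )


def total_run_minutes(records) -> float:
    """Sum (end_time - start_time) in minutes, skipping runs without both timestamps."""
    return sum(
        (record.end_time - record.start_time) / 60.0
        for record in records
        if record.start_time is not None and record.end_time is not None
    )
```

Usage would look like `total_run_minutes(load_run_records(datetime(2023, 1, 1), datetime(2023, 5, 1)))`, run in an environment where `DAGSTER_HOME` points at the instance backed by your database.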