# integration-airbyte

Jason

01/30/2023, 4:35 PM
Hi everyone. This is actually an Airbyte setup question, which isn't exactly the right topic for this channel, but everyone here obviously has Airbyte running so I thought I'd ask 🙂 I'm attempting to set up Airbyte via kustomize, and we don't support persistent volumes, so I can't use Minio. Instead I'm configuring S3, but it doesn't "just work" as the docs imply. How are people here running Airbyte? I'm assuming most are using Minio and persistent volumes, but if I'm wrong, can you share any insights? Currently, I have all of the pods running in the cluster except minio (using external postgres). No errors in my pods. I'm able to log in and set up a snowflake destination and a source, but when I try to do a sync, all 3 attempts fail. No logs except the one below.
2023-01-30 16:20:32 - Additional Failure Information: message='io.temporal.serviceclient.CheckedExceptionWrapper: java.util.concurrent.ExecutionException: java.lang.RuntimeException: io.airbyte.workers.exception.WorkerException: Running the launcher replication-orchestrator failed', type='java.lang.RuntimeException', nonRetryable=false
It works fine locally, so I'm guessing my last hope is trying to get Infra to set me up with an EC2 server to run docker compose, but then I'll have to deal with Dagster in K8s getting access to EC2.
btw, yes, I've tried the Airbyte Slack channel and forum, which led me here 🙂

Adam Bloom

01/30/2023, 4:39 PM
Lol….I just wrestled with this last week (via helm charts) - there’s a bug that’s probably causing this. I can share more details in a bit when I’m at my laptop
🙏🏾 1
(I’m also running with S3 state storage and orchestrator now)
ok, skimmed the kustomize deployment - it looks like it has the same problems as helm. So I'll just share what I ended up having to do, and we'll see where that gets you. I don't recognize the error you're getting though (is there more available in the airbyte-worker logs?)

Jason

01/30/2023, 5:23 PM
No worker log errors, that info came directly from the UI when I attempt to do a sync. Looks like someone in the Airbyte channel found setting this from "true" to empty worked, so I'll test that out
CONTAINER_ORCHESTRATOR_ENABLED: ""

Adam Bloom

01/30/2023, 5:24 PM
that's disabling the container orchestrator, which would get you past some of this, but you probably do want it enabled
without that enabled, airbyte-worker remains the singular choke point in your deployment (all data between sources and destinations during syncs must be sent through it). With it enabled, each sync gets its own "replication orchestrator" that is responsible for only that sync.

Jason

01/30/2023, 5:27 PM
Yeah, that makes sense.

Adam Bloom

01/30/2023, 5:28 PM
anyways, you'll need to remove the following env var from the deployment (just edit it in kubernetes after applying kustomize) - this one is the airbyte bug. I'll try to get a PR in for it this week.
STATE_STORAGE_MINIO_ENDPOINT
- just totally delete that from the airbyte-worker deployment. Then, you need to add the following env vars to use S3 for state storage:
• STATE_STORAGE_S3_BUCKET_REGION and STATE_STORAGE_S3_REGION (yes, looks like a duplicate, but different parts of the airbyte code check different variables right now)
• STATE_STORAGE_S3_BUCKET_NAME
• STATE_STORAGE_S3_ACCESS_KEY
• STATE_STORAGE_S3_SECRET_ACCESS_KEY
the charts don't appear to support setting any of those yet, but airbyte-worker will use them if defined.
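The env var changes described above can be sketched in Python as a small helper that edits a container's env list (the name/value entries found in a Kubernetes Deployment spec). This is only an illustration: the function name `patch_worker_env`, the placeholder endpoint, and all example values are hypothetical; only the variable names come from the message above. How the result gets applied to the airbyte-worker deployment (manual edit, kustomize patch, and so on) is left out.

```python
# Variable the thread says to delete from the airbyte-worker deployment.
MINIO_VAR = "STATE_STORAGE_MINIO_ENDPOINT"


def patch_worker_env(env, region, bucket, access_key, secret_key):
    """Return a new env list without the MinIO endpoint and with S3 state-storage vars.

    `env` is a list of {"name": ..., "value": ...} dicts, the shape used
    for container env entries in a Kubernetes Deployment.
    """
    s3_vars = {
        # Both region variables are set to the same value, per the thread.
        "STATE_STORAGE_S3_BUCKET_REGION": region,
        "STATE_STORAGE_S3_REGION": region,
        "STATE_STORAGE_S3_BUCKET_NAME": bucket,
        "STATE_STORAGE_S3_ACCESS_KEY": access_key,
        "STATE_STORAGE_S3_SECRET_ACCESS_KEY": secret_key,
    }
    # Keep everything except the MinIO endpoint and any stale copies of the S3 vars.
    kept = [
        dict(entry)
        for entry in env
        if entry["name"] != MINIO_VAR and entry["name"] not in s3_vars
    ]
    kept.extend({"name": name, "value": value} for name, value in s3_vars.items())
    return kept


# Example with placeholder values (all hypothetical).
example_env = [
    {"name": MINIO_VAR, "value": "http://minio.example:9000"},
    {"name": "CONTAINER_ORCHESTRATOR_ENABLED", "value": "true"},
]
for entry in patch_worker_env(example_env, "us-east-1", "my-state-bucket", "AKIA-PLACEHOLDER", "SECRET-PLACEHOLDER"):
    print(entry["name"])
```

In a real cluster the access key and secret would more likely come from a Kubernetes Secret than from literal values; plain strings are used here only to keep the sketch short.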

Jason

01/30/2023, 5:31 PM
geez! Thanks a lot, Adam. So just btw, setting CONTAINER_ORCHESTRATOR_ENABLED: "" got me past the error, and syncing seems to work on the first attempt, but no S3 logs. I'll follow your instructions above (and re-enable the orchestrator).

Adam Bloom

01/30/2023, 5:32 PM
s3 state storage and log storage are totally different, fyi
👍🏾 1
yeah - I had a fun week last week working through all this
🙃 1

Jason

01/30/2023, 5:41 PM
So the worker logs (in the UI) should be represented in the "log storage" which is also reflected in the airbyte-worker pod logs, correct?

Adam Bloom

01/30/2023, 5:51 PM
the sync logs are technically separate from the worker logs (in the old setup, without the orchestrator, the logs would all flow through the worker too and show up there as well). but yes, the backing store for the airbyte UI logs is the "log storage" location.
👍🏾 1
🙏🏾 1