# tools
t
I know I've asked this before, though it was several months ago. How are folks taking advantage of Singer taps/targets with Dagster? Has anyone built an integration (however hacky) with Meltano?
j
I’m working on just such an implementation with meltano right now
Meltano should soon “integrate” with Dagster, but our use case requires running Meltano within Dagster rather than the other way around. I’m currently running Meltano commands using my own solid variation of the `dagster_shell.utils.execute` function
full transparency: I have not actually deployed anything to production yet. I’m still building our entire platform, but I’m about 75% of the way to a complete POC. I’ve got this working so far on a local docker-compose configuration. Will be deploying to AWS in the end.
👍 1
t
That's great to hear! My preference would actually be for Dagster to be the driving force for the Meltano execution, rather than having Meltano be the primary interface. I already have some Dagster pipelines in production so it would be a bit odd to swap to having Meltano as the main tool. Any chance that the integration pieces that you're building can or will be open sourced? At the very least a blog post with the details would be helpful.
👍 1
j
not likely for now because there really isn’t much to my modifications. I haven’t built a true “integration” like what dagster has for a lot of other 3rd party libraries. Below is the solid I use to run a cli command in the dagster worker:
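import os

from dagster import Failure, InputDefinition, Nothing, OutputDefinition, solid
from dagster_shell.utils import execute


@solid(
    input_defs=[
        # Nothing input: carries no data, only orders this solid after an upstream one.
        InputDefinition("start_after", Nothing),
        InputDefinition("shell_command", str),
        # Environment variables for the command, supplied at run time.
        InputDefinition("env_dict", dict),
    ],
    output_defs=[OutputDefinition(str, "result")],
)
def run_shell_command(context, shell_command, env_dict):
    # Stream the command's output into the Dagster log while it runs.
    output, return_code = execute(
        shell_command=shell_command,
        log=context.log,
        output_logging="STREAM",
        env=env_dict,
        # Working directory comes from the DAGSTER_APP environment variable.
        cwd=os.getenv("DAGSTER_APP"),
    )

    # Any non-zero exit code fails the solid.
    if return_code:
        raise Failure(
            description="Shell command execution failed with output: {output}".format(output=output)
        )

    return output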
which is really just a stripped down version of `dagster_shell.solids.shell_solid`. In fact I had to `from dagster_shell.utils import execute` just to get it to work.
one other note: the reason why I went this route with my own custom solid rather than invoking `dagster_shell` directly is because Meltano relies heavily on setting environment variables which can’t be passed into the native dagster_shell solids at run time. See this thread for a deeper explanation. https://dagster.slack.com/archives/C01U954MEER/p1623181007055200
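For anyone who wants to try the run-time environment idea outside Dagster first, here is a minimal sketch using only the Python standard library. It is a stand-in for illustration, not `dagster_shell`'s implementation. The helper name `run_with_env` and the variable `EXAMPLE_SETTING` are hypothetical, and merging the overrides on top of the worker's existing environment is an assumption about what is wanted.

```python
import os
import subprocess
from typing import Dict, Optional, Tuple


def run_with_env(
    shell_command: str,
    env_overrides: Dict[str, str],
    cwd: Optional[str] = None,
) -> Tuple[str, int]:
    """Run a shell command with extra environment variables chosen at call time.

    Hypothetical helper: returns (combined stdout/stderr text, exit code).
    """
    # Start from the current process environment so PATH and friends stay
    # available, then layer the per-run values on top.
    env = {**os.environ, **env_overrides}
    completed = subprocess.run(
        shell_command,
        shell=True,
        env=env,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout, one stream to log
        text=True,
    )
    return completed.stdout, completed.returncode


if __name__ == "__main__":
    # EXAMPLE_SETTING is a made-up variable, used only to show the mechanism.
    output, return_code = run_with_env(
        'echo "$EXAMPLE_SETTING"', {"EXAMPLE_SETTING": "hello"}
    )
    print(output.strip(), return_code)
```

The same shape carries over to the solid above: the command string and the dict of variables both arrive as inputs, so each run can point the command at different settings without changing the solid.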
getting meltano on the run worker with all its code was probably the bigger battle than running the cli commands
I’m still trying to figure out how to run the meltano ui from the same container
I think once I get something in production I plan to present my stuff in Meltano’s demo day. So that might be a good chance for you to see how I setup everything.
t
I also just created an issue on the Meltano project to see about creating a web API that would simplify the work of integrating with it from systems like Dagster https://gitlab.com/meltano/meltano/-/issues/2813
j
Interesting idea, I hadn’t thought about using it that way!!
t
Yeah, initially I had thought to create a lib based on the meltano core that could be directly integrated with Dagster, but that's a lot of effort and code to maintain. I'm also not exactly sure of how that would play out in terms of ergonomics with the Meltano project configuration, etc. Being able to let Meltano do what it's good at and just have Dagster manage the execution would be great.
j
Makes sense. Though, I think I have seen in their respective channels that it’s the intention of Dagster to do exactly that. I’ve mostly just been waiting for them to implement it. I guess I wouldn’t mind getting involved but I don’t think I’ve got the time right now. Gotta get this all working first.
m
Did y'all ever have any success getting Singer Taps/Targets running as part of a Dagster pipeline? I'm especially interested in how you managed Singer's state file across Dagster pipeline runs
j
That’s part of the reason why I’ve been using Meltano. Meltano uses a database (which I’ve made part of the dagster db) to manage state and Meltano uses a persistent yml file that manages config.
though if you’re not going to use Meltano, you’ll almost definitely need some sort of external storage (like S3) for your state file.
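As a rough sketch of the state-file handling without Meltano: by Singer convention a target prints state values as JSON lines on stdout, and the last one is what gets passed back to the tap via `--state` on the next run. The helper names below (`last_state`, `save_state`, `load_state`) are hypothetical, and a local path stands in for external storage. With S3 the read and write would go through an S3 client instead, which is not shown here.

```python
import json
from pathlib import Path
from typing import Iterable, Optional


def last_state(target_stdout: Iterable[str]) -> Optional[dict]:
    """Return the last JSON object found in the target's stdout lines, if any."""
    state = None
    for line in target_stdout:
        line = line.strip()
        if not line:
            continue
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not JSON
        if isinstance(parsed, dict):
            state = parsed  # later lines supersede earlier ones
    return state


def save_state(state: dict, path: Path) -> None:
    """Persist state by writing a temp file and then swapping it into place."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)  # avoids leaving a half-written state file behind


def load_state(path: Path) -> Optional[dict]:
    """Load the previous run's state, or None on the first run."""
    return json.loads(path.read_text()) if path.exists() else None
```

In a Dagster pipeline this would probably be one step that loads the state before the tap runs and another that saves it after the target finishes, so a failed run keeps the previous bookmark.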
To your first question, as of right now I have about 25% of our source data replicating to snowflake using Meltano and dagster in production. Hoping to have 100% of our sources by the end of the month.
👍 2
@Tobias Macey and anyone else who’s interested. I’ll be doing a demo of our Dagster + Meltano implementation this Thursday during Meltano’s Demo Day at 1600 UTC. In case you wanted to see and provide feedback on how we’ve done it.
t
Thanks for the heads up, I'll have to take a look 🙂
b
@Josh Lloyd @Tobias Macey did either of you make progress in this direction? I am at a stage where I too need to use meltano with dagster.
j
@Binoy Shah I’m still using this combination in production. It’s working well. I’ve been working with @Jules Huisman (Quantile) to create an actual module connecting Dagster and Meltano. It’s still in progress but I expect it will get its finishing touches in Q3 this year. This is the link to the GitHub repo
👀 1
👍 2