# announcements
t
I'm looking at using a composite solid to wrap a bash_command_solid so that I can use config arguments to parameterize the command. I was also trying to use a results directory resource that I wrote to handle the output data of the command, but it looks like the composite_solid doesn't accept a resources option. I'm guessing the blessed approach would be to write a small solid that just generates the command data and output path and passes those to the bash solid?
Here's the code in question:
@composite_solid(
    name='open_edx_forum_data_export',
    description='Export data for edX forums from Mongo database',
    config={
        'edx_mongodb_host': Field(
            String,
            is_required=True,
            description='Resolvable host address of MongoDB master'
        ),
        'edx_mongodb_port': Field(
            Int,
            is_required=False,
            default_value=27017,  # noqa WPS4232
            description='TCP port number used to connect to MongoDB server'
        ),
        'edx_mongodb_username': Field(
            String,
            is_required=False,
            default_value='',
            description='Username for account with permissions to read forum database'
        ),
        'edx_mongodb_password': Field(
            String,
            is_required=False,
            default_value='',
            description='Password for account with permissions to read forum database'
        ),
        'edx_mongodb_database_name': Field(
            String,
            is_required=True,
            description='Name of database that contains forum data for Open edX installation'
        ),
        'edx_forum_data_folder_name': Field(
            String,
            is_required=False,
            default_value='forum',
            description=('Name of the directory to create within the results directory for containing the exported '
                         'mongo database')
        )
    },
    required_resource_keys={'results_dir'},
    input_defs=[
        InputDefinition(
            name='edx_course_ids',
            dagster_type=List[String],
            description='List of course IDs active on Open edX installation'
        )
    ],
    output_defs=[
        OutputDefinition(
            name='edx_forum_data',
            dagster_type=String,
            description='Open edX forum data exported from Mongo database'
        )
    ]
)
def export_edx_forum_data(context: SolidExecutionContext) -> String:
    """Run mongodump for the database that contains Open edX forum submissions to be consumed by Institutional Research.

    :param context: Dagster execution context for propagating configuration data
    :type context: SolidExecutionContext

    :returns: Path to exported database contents

    :rtype: String
    """
    forum_data_path = context.resources.results_dir.joinpath(context.config['edx_forum_data_folder_name'])
    command_array = ['/usr/bin/mongodump',
                               '--host',
                               context.config['edx_mongodb_host'],
                               '--port',
                               str(context.config['edx_mongodb_port']),  # str.join needs strings; the port is an Int
                               '--db',
                               context.config['edx_mongodb_database_name'],
                               '--authenticationDatabase',
                               'admin',
                               '--out',
                               str(forum_data_path)]
    if password := context.config['edx_mongodb_password']:
        command_array.extend(['--password', password])
    if username := context.config['edx_mongodb_username']:
        command_array.extend(['--username', username])
    bash_command_solid(' '.join(command_array))
    yield Output(
        str(forum_data_path),
        'edx_forum_data'
    )
m
have you looked at the config mapping function, the config_fn arg to composite_solid?
this can transform composite solid config in whatever arbitrary way you like to generate config for the child solids
i'm not quite sure i follow the resource question -- which solid would you like to have handling the output data?
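To make the config mapping suggestion concrete, here is a minimal sketch of the kind of function that could be passed as config_fn. It is plain Python with no Dagster import. The child solid name `run_mongodump` and its `command` config key are hypothetical, and both the signature that config_fn expects and the return shape (assumed here to be a dict keyed by child solid name, each holding a `config` entry) should be checked against the installed Dagster version.

```python
import shlex
from typing import Any, Dict


def forum_export_config_fn(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Map composite-level config onto config for a hypothetical child solid."""
    command = [
        '/usr/bin/mongodump',
        '--host', cfg['edx_mongodb_host'],
        # Fall back to the default port used in the config schema above.
        '--port', str(cfg.get('edx_mongodb_port', 27017)),
        '--db', cfg['edx_mongodb_database_name'],
    ]
    # 'run_mongodump' and 'command' are assumed names, not taken from the thread.
    return {'run_mongodump': {'config': {'command': shlex.join(command)}}}


child_config = forum_export_config_fn(
    {'edx_mongodb_host': 'mongo.example.com', 'edx_mongodb_database_name': 'forum'}
)
```

One caveat worth verifying: a mapping like this works on config values alone, so the results_dir resource would presumably not be reachable from inside it.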
t
So, what I would like to do is generate the command line string and pass that to the bash solid. As part of that, I need to construct the output path, which gets interpolated into the command string and is also yielded as the final output of the composite solid
@solid(
    name='edx_forum_build_mongo_dump_command',
    description='Solid to build the command line string for executing mongodump against the Open edX forum database',
    required_resource_keys={'results_dir'},
    config={
        'edx_mongodb_host': Field(
            String,
            is_required=True,
            description='Resolvable host address of MongoDB master'
        ),
        'edx_mongodb_port': Field(
            Int,
            is_required=False,
            default_value=27017,  # noqa WPS4232
            description='TCP port number used to connect to MongoDB server'
        ),
        'edx_mongodb_username': Field(
            String,
            is_required=False,
            default_value='',
            description='Username for account with permissions to read forum database'
        ),
        'edx_mongodb_password': Field(
            String,
            is_required=False,
            default_value='',
            description='Password for account with permissions to read forum database'
        ),
        'edx_mongodb_database_name': Field(
            String,
            is_required=True,
            description='Name of database that contains forum data for Open edX installation'
        ),
        'edx_forum_data_folder_name': Field(
            String,
            is_required=False,
            default_value='forum',
            description=('Name of the directory to create within the results directory for containing the exported '
                         'mongo database')
        )
    },
    output_defs=[
        OutputDefinition(
            name='edx_forum_mongodump_command',
            dagster_type=String,
            description='Command line string for executing mongodump'
        ),
        OutputDefinition(
            name='edx_forum_data_directory',
            dagster_type=String,
            description='Path to exported forum data generated by mongodump command'
        )
    ]
)
def edx_forum_mongo_build_dump_command(context: SolidExecutionContext):
    forum_data_path = context.resources.results_dir.joinpath(context.config['edx_forum_data_folder_name'])
    command_array = ['/usr/bin/mongodump',
                               '--host',
                               context.config['edx_mongodb_host'],
                               '--port',
                               str(context.config['edx_mongodb_port']),  # str.join needs strings; the port is an Int
                               '--db',
                               context.config['edx_mongodb_database_name'],
                               '--authenticationDatabase',
                               'admin',
                               '--out',
                               str(forum_data_path)]
    if password := context.config['edx_mongodb_password']:
        command_array.extend(['--password', password])
    if username := context.config['edx_mongodb_username']:
        command_array.extend(['--username', username])
    yield Output(' '.join(command_array), 'edx_forum_mongodump_command')
    yield Output(str(forum_data_path), 'edx_forum_data_directory')


@composite_solid(
    name='open_edx_forum_data_export',
    description='Export data for edX forums from Mongo database',
    output_defs=[
        OutputDefinition(
            name='edx_forum_data_directory',
            dagster_type=String,
            description='Path to Open edX forum data exported from Mongo database'
        )
    ]
)
def export_edx_forum_data() -> String:
    """Run mongodump for the database that contains Open edX forum submissions to be consumed by Institutional Research.

    :param context: Dagster execution context for propagating configuration data
    :type context: SolidExecutionContext

    :returns: Path to exported database contents

    :rtype: String
    """
    edx_forum_mongo_build_dump_command()
    bash_command_solid(' '.join(command_string), input_defs=[])
    yield Output(
        str(forum_data_path),
        'edx_forum_data'
    )
Here's the current state of my rewrite (untested as of now)
That might give a better sense of what I'm looking to do
@schrockn do you have any thoughts on ^?
m
interesting. i feel like you might want to configure the command string into the bash_command_solid
and then perhaps have a little stub solid with a Nothing dependency on the bash command solid that yielded the path
t
The challenge is that I need the path information before the bash solid so that I can include it in the command string, and then I need to yield that as the output so I can consume it downstream to handle uploading the exported data.
And to generate the command string I need the config data that gets pulled from the solid schema, and the results_dir resource to construct the full path to the output.
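The requirement described here, deriving the output path first and then reusing it both in the command string and as the downstream output, can be isolated in a small helper that is independent of Dagster. The following is only a sketch: the function name `build_mongodump_command` is hypothetical, the config keys mirror the ones in the snippets above, and `results_dir` is assumed to behave like a pathlib path. `PurePosixPath` is used so the example does not depend on the host operating system.

```python
import shlex
from pathlib import PurePosixPath
from typing import Any, Dict, Tuple


def build_mongodump_command(config: Dict[str, Any], results_dir: PurePosixPath) -> Tuple[str, str]:
    """Return the mongodump command string and the directory it will write to."""
    # The output path is computed once and reused for both return values.
    forum_data_path = results_dir / config['edx_forum_data_folder_name']
    command = [
        '/usr/bin/mongodump',
        '--host', config['edx_mongodb_host'],
        '--port', str(config['edx_mongodb_port']),  # Int config value, converted for joining
        '--db', config['edx_mongodb_database_name'],
        '--authenticationDatabase', 'admin',
        '--out', str(forum_data_path),
    ]
    if config.get('edx_mongodb_username'):
        command.extend(['--username', config['edx_mongodb_username']])
    if config.get('edx_mongodb_password'):
        command.extend(['--password', config['edx_mongodb_password']])
    # shlex.join quotes each argument so spaces or shell metacharacters survive.
    return shlex.join(command), str(forum_data_path)


command, data_dir = build_mongodump_command(
    {
        'edx_mongodb_host': 'mongo.example.com',
        'edx_mongodb_port': 27017,
        'edx_mongodb_database_name': 'forum',
        'edx_forum_data_folder_name': 'forum',
        'edx_mongodb_username': 'reader',
        'edx_mongodb_password': 'p@ss word',
    },
    PurePosixPath('/tmp/results'),
)
```

Using `shlex.join` rather than `' '.join` is a design choice of this sketch: a password containing a space would otherwise be split into two arguments by the shell.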
m
yep, you can have a topology where the config gets fed to the bash solid and to the output solid
but the output solid also depends on the bash solid
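The dependency shape described here can be illustrated without Dagster at all. The sketch below is a plain-Python stand-in, not Dagster API: `run_command` plays the role of the bash solid, `emit_path` plays the stub solid whose only link to the bash step is ordering (what a Nothing dependency would express), and a child process that creates a directory is a harmless substitute for mongodump. All names are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List


def run_command(command: List[str]) -> None:
    """Stand-in for the bash solid: run the command and return no data."""
    subprocess.run(command, check=True)


def emit_path(output_path: Path) -> str:
    """Stand-in for the stub solid: pass the already-known path downstream."""
    return str(output_path)


def export(results_dir: Path, folder_name: str) -> str:
    # The path is derived from config and the results directory before anything runs.
    output_path = results_dir / folder_name
    # Substitute for mongodump: a child Python process creates the output directory.
    command = [
        sys.executable,
        '-c',
        'import sys, pathlib; pathlib.Path(sys.argv[1]).mkdir()',
        str(output_path),
    ]
    run_command(command)           # must finish first (the ordering-only dependency)
    return emit_path(output_path)  # reached only after the command has succeeded


with tempfile.TemporaryDirectory() as tmp:
    exported = export(Path(tmp), 'forum')
    exported_exists = Path(exported).is_dir()
```

In a real pipeline the sequencing expressed here by statement order would instead come from the dependency between the two solids.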
a
👍 1
b
@Tobias Macey I'm facing exactly the same problem. Did you manage to solve it with the suggested workaround?
t
This is the code I ended up with for executing a bash command as part of my pipeline https://github.com/mitodl/ol-data-pipelines/blob/main/src/ol_data_pipelines/edx/solids.py#L440-L478