April 10, 2024

Enhancing MLOps with GitHub Workflows: An event-based approach

Author: Kamil Abram, Data Scientist at Addepto


Getting MLOps right is crucial for businesses aiming to leverage the power of machine learning effectively. However, many companies are still stuck in the Excel era, with data infrastructure resembling a chaotic patchwork of mismatched puzzle pieces. Incorporating MLOps into such environments can be a hard task.

The main challenge in MLOps lies in bridging the gap between Data Scientists and MLOps Engineers. Data Scientists are all about experimenting, tweaking models, and diving deep into algorithms using tools like Jupyter notebooks. Meanwhile, MLOps Engineers have the responsibility of taking these models and making sure they run smoothly and efficiently in real-world production environments. The clash in work styles and tools often creates friction in the ML pipeline.

This clash is especially noticeable with the widespread use of notebooks by Data Scientists. While these tools are great for exploring data and fine-tuning models, they often fall short when it comes to scaling up for production.

In one of our projects, we faced this exact challenge head-on and came out with a fully-fledged MLOps platform. This platform seamlessly integrates with Databricks, combining the flexibility of notebooks with the robustness of MLOps practices.

Read the case study:
MLOps Platform aimed to facilitate a smooth transition of models from concept to deployment

In our project, GitHub became a crucial piece of the puzzle for managing code repositories. However, despite the unified environment provided by Databricks, handling GitHub repositories manually led to bottlenecks and inconsistencies in our deployment process.

To tackle this issue, we turned to GitHub Actions and its powerful feature: workflow_dispatch. By automating routine tasks through an event-driven approach, we aimed to streamline our deployment pipeline, boosting productivity and operational efficiency.

In this article, I’ll guide you through our journey step-by-step, showing you how we tackled these challenges and implemented workflow_dispatch for GitOps. So let’s get started and dive into the intricacies of making MLOps work seamlessly for your projects.

Leveraging workflow_dispatch for GitOps: An Event-Driven Approach to Managing GitHub Repositories

Automating the management of GitHub repositories through an event-driven approach significantly enhances productivity and operational efficiency. GitHub Actions offers a powerful tool in the form of workflow_dispatch, which allows developers to manually trigger workflows from GitHub’s UI, CLI, or via an API call. This capability is invaluable for implementing an event-driven architecture in managing repositories, enabling precise control over when and how actions are executed based on specific events or conditions.

Understanding workflow_dispatch

The workflow_dispatch event starts a GitHub Actions workflow on demand rather than in response to repository activity such as a push. The trigger can come from a person clicking a button or from an external system calling the API, which makes it useful when a workflow has to run in response to events outside GitHub or at a specific stage of the development cycle.

Using workflow_dispatch

To enable the trigger, add a workflow_dispatch key to the on section of a workflow file in the .github/workflows directory. The file must be present on the default branch; GitHub then shows an option in the Actions tab to start the workflow manually.

The code snippet below illustrates a sample workflow that appears in the editor once you select the Simple Workflow template and press the Configure button.

# This is a basic workflow to help you get started with Actions

name: CI

# Controls when the workflow will run
on:
  # Triggers the workflow on push or pull request events but only for the "main" branch
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v3

      # Runs a single command using the runners shell
      - name: Run a one-line script
        run: echo Hello, world!

      # Runs a set of commands using the runners shell
      - name: Run a multi-line script
        run: |
          echo Add other actions to build,
          echo test, and deploy your project.

Once the commit is finalized, the file becomes part of the repository’s codebase. If you navigate to the Actions tab at the top menu, you’ll find the workflow either in progress or already completed. This is because the commit to the main branch triggered the workflow, fulfilling the conditions outlined in the on section of the workflow file:

on:
  ...
  push:
    branches: [ "main" ]
  ...

Figure 1 displays the GitHub Actions UI, featuring a single workflow that has been executed once. On the UI’s left side, there’s a list of all the workflows linked to this repository. Selecting an item from this list filters the displayed workflow runs on the right side. By default, “All workflows” is chosen, showing runs from every workflow. However, choosing a specific workflow from the left-side list narrows down the display on the right, focusing exclusively on runs from the selected workflow.

Figure 1. GitHub Actions UI showing the example workflow run.

Selecting a workflow displays the message “This workflow has a workflow_dispatch event trigger,” accompanied by a “Run workflow” button. The button appears only if the workflow file on the default branch contains the following trigger in its on section, which enables manual execution from the Actions tab:

on:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
  ...

Figure 2. “CI” workflow run panel.

This scenario describes the workflow_dispatch trigger in action. A button is made available which, when clicked, initiates another manual execution of the workflow. This action brings up a dialog box, enabling you to choose a branch for the workflow run, along with any other defined options. Once triggered, the workflow runs, adding another entry to the execution history as depicted in Figure 3.

Figure 3. A manual workflow run

Utilizing this direct workflow invocation method is particularly beneficial for purposes such as prototyping, debugging, or other situations where triggering a run without a GitHub event is preferable. Figure 4 illustrates how this manual invocation appears.

Figure 4. Invoking “CI” workflow via a `workflow_dispatch` event.

Defining and Utilizing Workflow Inputs

In this context, “inputs” refer to specific values provided by either a user or an automated process directly to the workflow. This differs from values obtained through default environment variables or contexts.

Once these inputs are defined within a workflow, they are accessible using the notation ${{ github.event.inputs.<input_name> }}. The shorter ${{ inputs.<input_name> }} also works and preserves boolean values, whereas github.event.inputs exposes them as strings. Below is an illustration of how a job within a workflow_dispatch triggered workflow can use such inputs:

name: CI with Inputs

on:
  workflow_dispatch:
    inputs:
      log_level:
        description: 'Log level for the workflow'
        required: true
        default: 'info'
        type: string
      environment:
        description: 'Environment for the workflow'
        required: false
        default: 'production'
        type: string
      run_tests:
        description: 'Run tests in the workflow'
        required: true
        default: true
        type: boolean

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set log level, environment, and run tests
        run: |
          echo "Setting log level to ${{ github.event.inputs.log_level }}"
          echo "Setting environment to ${{ github.event.inputs.environment }}"
          echo "Run tests: ${{ github.event.inputs.run_tests }}"

      - name: Run a one-line script
        run: echo Hello, world!

      - name: Run a multi-line script
        run: |
          echo Add other actions to build,
          echo test, and deploy your project.

This example demonstrates a workflow tailored for the workflow_dispatch trigger, which incorporates user-defined inputs for log level, target environment, and whether to run tests. These inputs enable a high degree of customization for each workflow run, allowing for specific execution parameters to be set directly at the time of triggering.

Figure 5 illustrates an example of triggering a workflow with inputs specified, demonstrating how users can provide specific values at the time of invocation to tailor the workflow execution according to their needs.

Figure 5. Invoking “CI with Inputs” workflow via `workflow_dispatch`.
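
Because input values are supplied by whoever triggers the run, a step may want to check them before acting on them. The following is a minimal sketch of such a check, written as the shell script a run step could execute. It assumes the step passes the inputs in as the environment variables LOG_LEVEL and RUN_TESTS; those variable names, the helper functions, and the list of accepted log levels are all hypothetical and not taken from our platform.

```shell
#!/usr/bin/env bash
# Sketch: validate workflow_dispatch inputs inside a `run:` step.
# Assumption: the step maps the inputs to environment variables, for example
#   env:
#     LOG_LEVEL: ${{ github.event.inputs.log_level }}
#     RUN_TESTS: ${{ github.event.inputs.run_tests }}

# Succeeds only for the log levels this sketch chooses to accept.
validate_log_level() {
  case "$1" in
    debug|info|warning|error) return 0 ;;
    *) echo "Unsupported log level: $1" >&2; return 1 ;;
  esac
}

# Boolean inputs reach the shell as the text "true" or "false".
should_run_tests() {
  [ "$1" = "true" ]
}

# Validate the log level, then report whether tests will run.
check_inputs() {
  validate_log_level "$1" || return 1
  if should_run_tests "$2"; then
    echo "log_level=$1, tests enabled"
  else
    echo "log_level=$1, tests skipped"
  fi
}

# In the workflow step this would be called as:
#   check_inputs "$LOG_LEVEL" "$RUN_TESTS"
```

Passing inputs through environment variables, rather than interpolating them directly into the script text, also keeps an unexpected input value from being interpreted as shell code.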

Triggering Workflows via Other Means

The workflow_dispatch trigger offers three methods to initiate workflows: via the GitHub Actions tab, the GitHub CLI, or a REST API call. For scenarios requiring automated workflow execution from external systems or scripts, workflow_dispatch can be triggered via a curl command. This method involves making a POST request to the GitHub API, specifying the repository, workflow, and the branch on which the workflow should run.

Example curl command to trigger a workflow:

curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer YOUR_GITHUB_TOKEN" \
  https://api.github.com/repos/your-organization/your-repo/actions/workflows/your-workflow-id/dispatches \
  -d '{"ref":"your-branch-name"}'

This command sends a request to GitHub to run the specified workflow on the given branch, leveraging your GitHub token for authentication.
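
The same endpoint accepts an inputs object next to ref, which is how an external system can supply the values declared under workflow_dispatch. The sketch below reuses the input names from the "CI with Inputs" example. The helper names build_dispatch_payload and dispatch_workflow are hypothetical, and OWNER, REPO, WORKFLOW_FILE and GITHUB_TOKEN are placeholders the caller would have to set.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON body for a workflow_dispatch API call that carries inputs.
# Hypothetical helper; it performs no JSON escaping, so it only suits simple
# values that contain no quotes or backslashes.
build_dispatch_payload() {
  local ref="$1" log_level="$2" environment="$3" run_tests="$4"
  # Input values are sent as strings, including the boolean run_tests.
  printf '{"ref":"%s","inputs":{"log_level":"%s","environment":"%s","run_tests":"%s"}}' \
    "$ref" "$log_level" "$environment" "$run_tests"
}

# Send the request. OWNER, REPO, WORKFLOW_FILE and GITHUB_TOKEN are placeholders.
dispatch_workflow() {
  curl -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "https://api.github.com/repos/${OWNER}/${REPO}/actions/workflows/${WORKFLOW_FILE}/dispatches" \
    -d "$(build_dispatch_payload "$@")"
}

# Example call (not executed here):
#   dispatch_workflow main debug staging false
```

For values that may contain special characters, a JSON-aware tool would be a safer way to build the body than printf.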

For those who prefer working from the command line, GitHub CLI (gh) provides a convenient way to trigger workflows:

gh workflow run your-workflow.yml --ref your-branch-name

This command initiates the specified workflow on the designated branch, offering a seamless integration into command-line based workflows.
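
Workflow inputs can be passed from the CLI as well, using one -f key=value flag per input. The wrapper below is only a sketch: run_workflow and the DRY_RUN switch are hypothetical conveniences, added so the assembled command can be inspected before anything is triggered.

```shell
#!/usr/bin/env bash
# Sketch: assemble a `gh workflow run` call with a branch and any number of inputs.
# run_workflow and DRY_RUN are hypothetical names used for illustration.
run_workflow() {
  wf_file="$1"
  wf_ref="$2"
  shift 2

  # Turn each remaining key=value argument into "-f key=value".
  for pair in "$@"; do
    set -- "$@" -f "$pair"
    shift
  done

  # Prepend the subcommand, workflow file, and branch.
  set -- workflow run "$wf_file" --ref "$wf_ref" "$@"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "gh $*"
  else
    gh "$@"
  fi
}

# Example call (not executed here):
#   run_workflow your-workflow.yml your-branch-name log_level=debug run_tests=false
```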

Differentiating workflow_dispatch from Similar Trigger Events in GitHub Actions

GitHub Actions offers a variety of triggers for workflows, beyond the commonly utilized event-based triggers. A subset of these triggers is designed to initiate workflows without direct changes to the repository, accommodating external or manual interventions. This includes workflow_dispatch, repository_dispatch, workflow_call, and workflow_run events, each serving distinct purposes.

Events Ending with _dispatch

The events ending in _dispatch—namely, workflow_dispatch and repository_dispatch—are tailored for initiating workflows in response to actions happening outside of GitHub. Both events enable similar triggering mechanisms, yet they serve different purposes. The workflow_dispatch event is specifically for launching an individual workflow, providing direct control over its execution. On the other hand, repository_dispatch is aimed at activating multiple workflows within a single repository, usually reacting to external or custom events. A typical use case could be an external Continuous Integration (CI) system triggering a series of workflows for Continuous Deployment (CD) purposes.
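
To make the contrast concrete, here is a sketch of how an external system might raise a repository_dispatch event. The request goes to the repository's dispatches endpoint and carries an event_type plus an optional client_payload, and workflows that listen for that event type can read the payload from github.event.client_payload. The event type model-registered, the payload key model, and the helper names are hypothetical; OWNER, REPO and GITHUB_TOKEN are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: raise a repository_dispatch event from an external system.
# Hypothetical helper; it performs no JSON escaping, so it only suits simple
# values that contain no quotes or backslashes.
build_repository_dispatch_payload() {
  local event_type="$1" model_name="$2"
  printf '{"event_type":"%s","client_payload":{"model":"%s"}}' \
    "$event_type" "$model_name"
}

# Send the event. OWNER, REPO and GITHUB_TOKEN are placeholders.
send_repository_dispatch() {
  curl -X POST \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "https://api.github.com/repos/${OWNER}/${REPO}/dispatches" \
    -d "$(build_repository_dispatch_payload "$@")"
}

# Example call (not executed here):
#   send_repository_dispatch model-registered churn-classifier
```

Unlike the workflow_dispatch request, this one names no workflow and no branch: every workflow that subscribes to the event type reacts to it.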

Events Starting with workflow_

The workflow_ prefix in event names suggests a focus on workflow interactions, with each event offering unique capabilities:

workflow_run: This trigger runs a secondary workflow when another workflow is requested or completes. With the completed activity type, the second workflow starts regardless of whether the first succeeded or failed, so a job that should only follow a successful run needs to check github.event.workflow_run.conclusion. It enables sequential workflow execution, where the second workflow's start depends on the first.

workflow_call: This event renders a workflow callable from another workflow, promoting reusability across different projects or within the same repository. A workflow tagged with workflow_call becomes a reusable entity, inheriting the event payload from the calling workflow.

workflow_dispatch: Stands out as the manual trigger within this group, allowing workflows to be initiated manually through the GitHub API, GitHub CLI, or directly via the Actions tab in the GitHub UI. This event is crucial for workflows that require on-demand execution or are dependent on specific inputs at runtime.

Security Considerations

Because the GITHUB_TOKEN issued to a workflow run can carry permissions to act on the repository, there is a theoretical risk that a workflow could initiate further workflow runs and end up executing recursively. To mitigate this risk, GitHub does not create new workflow runs for events triggered with the repository's GITHUB_TOKEN. There are two notable exceptions: the workflow_dispatch and repository_dispatch events are exempt from this restriction. The exemption is logical, since these events exist precisely to trigger other workflows deliberately, which supports complex automation scenarios while keeping workflow initiation under explicit control.

Conclusion

workflow_dispatch boosts GitOps by providing manual control for precise, flexible repository management. Available through GitHub’s UI, CLI, and API, it lets teams customize automation, enhancing operational efficiency and adaptability.

References

For further details on utilizing workflow_dispatch within GitHub Actions and integrating it into your GitOps practices, consult the following official GitHub documentation resources:
GitHub Documentation on workflow_dispatch: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_dispatch
Webhook Events and Payloads for workflow_dispatch: https://docs.github.com/en/webhooks/webhook-events-and-payloads#workflow_dispatch
Creating a Repository Dispatch Event via GitHub’s REST API: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#create-a-repository-dispatch-event


