Author:
Data Scientist at Addepto
Getting MLOps right is crucial for businesses aiming to leverage the power of machine learning effectively. However, many companies are still stuck in the Excel era, with data infrastructure resembling a chaotic patchwork of mismatched puzzle pieces. Incorporating MLOps into such environments can be a hard task.
The main challenge in MLOps lies in bridging the gap between Data Scientists and MLOps Engineers. Data Scientists are all about experimenting, tweaking models, and diving deep into algorithms using tools like Jupyter notebooks. Meanwhile, MLOps Engineers have the responsibility of taking these models and making sure they run smoothly and efficiently in real-world production environments. The clash in work styles and tools often creates friction in the ML pipeline.
This clash is especially noticeable with the widespread use of notebooks by Data Scientists. While these tools are great for exploring data and fine-tuning models, they often fall short when it comes to scaling up for production.
In one of our projects, we faced this exact challenge head-on and came out with a fully-fledged MLOps platform. This platform seamlessly integrates with Databricks, combining the flexibility of notebooks with the robustness of MLOps practices.
Read the case study:
MLOps Platform aimed to facilitate a smooth transition of models from concept to deployment
In our project, GitHub became a crucial piece of the puzzle for managing code repositories. However, despite the unified environment provided by Databricks, handling GitHub repositories manually led to bottlenecks and inconsistencies in our deployment process.
To tackle this issue, we turned to GitHub Actions and its powerful feature: workflow_dispatch. By automating routine tasks through an event-driven approach, we aimed to streamline our deployment pipeline, boosting productivity and operational efficiency.
In this article, I’ll guide you through our journey step-by-step, showing you how we tackled these challenges and implemented workflow_dispatch for GitOps. So let’s get started and dive into the intricacies of making MLOps work seamlessly for your projects.
Automating the management of GitHub repositories through an event-driven approach significantly enhances productivity and operational efficiency. GitHub Actions offers a powerful tool in the form of workflow_dispatch, which allows developers to manually trigger workflows from GitHub’s UI, CLI, or via an API call. This capability is invaluable for implementing an event-driven architecture in managing repositories, enabling precise control over when and how actions are executed based on specific events or conditions.
workflow_dispatch
The workflow_dispatch event triggers a GitHub Actions workflow manually. It’s designed to provide developers with flexibility, allowing them to run workflows both on demand and automatically upon various custom events. This feature is particularly useful in scenarios where workflows need to be executed based on external events or at specific stages of the development cycle.
Activating workflow_dispatch
To activate workflow_dispatch, include the workflow_dispatch clause in the on section of the workflow file, located within the .github/workflows directory on the default branch. GitHub will then present an option within the Actions interface to manually initiate this workflow, integrating a manual trigger mechanism directly into the UI.
The code snippet below illustrates a sample workflow that appears in the editor once you select the Simple Workflow template and press the Configure button.
# This is a basic workflow to help you get started with Actions

name: CI

# Controls when the workflow will run
on:
  # Triggers the workflow on push or pull request events but only for the "main" branch
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - uses: actions/checkout@v3

      # Runs a single command using the runners shell
      - name: Run a one-line script
        run: echo Hello, world!

      # Runs a set of commands using the runners shell
      - name: Run a multi-line script
        run: |
          echo Add other actions to build,
          echo test, and deploy your project.
Once the commit is finalized, the file becomes part of the repository’s codebase. If you navigate to the Actions tab in the top menu, you’ll find the workflow either in progress or already completed. This is because the commit to the main branch triggered the workflow, fulfilling the conditions outlined in the on section of the workflow file:
on:
  ...
  push:
    branches: [ "main" ]
  ...
Figure 1 displays the GitHub Actions UI, featuring a single workflow that has been executed once. On the UI’s left side, there’s a list of all the workflows linked to this repository. Selecting an item from this list filters the displayed workflow runs on the right side. By default, “All workflows” is chosen, showing runs from every workflow. However, choosing a specific workflow from the left-side list narrows down the display on the right, focusing exclusively on runs from the selected workflow.
Selecting a workflow displays the message “This workflow has a workflow_dispatch event trigger,” accompanied by a “Run workflow” button. This option appears only if the workflow file on the default branch declares workflow_dispatch within its on section, enabling manual execution from the Actions tab:
on:
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:
  ...
This is the workflow_dispatch trigger in action. Clicking the “Run workflow” button opens a dialog box in which you choose a branch for the workflow run, along with any other defined options. Once triggered, the workflow runs, adding another entry to the execution history as depicted in Figure 3.
Utilizing this direct workflow invocation method is particularly beneficial for purposes such as prototyping, debugging, or other situations where triggering a run without a GitHub event is preferable. Figure 4 illustrates how this manual invocation appears.
In this context, “inputs” refer to specific values provided by either a user or an automated process directly to the workflow. This differs from values obtained through default environment variables or contexts.
Once these inputs are defined within a workflow, they are accessible using the notation ${{ github.event.inputs.<input_name> }}. Below is an illustration of how a job within a workflow triggered by workflow_dispatch can use such defined inputs:
name: CI with Inputs

on:
  workflow_dispatch:
    inputs:
      log_level:
        description: 'Log level for the workflow'
        required: true
        default: 'info'
        type: string
      environment:
        description: 'Environment for the workflow'
        required: false
        default: 'production'
        type: string
      run_tests:
        description: 'Run tests in the workflow'
        required: true
        default: true
        type: boolean

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Set log level, environment, and run tests
        run: |
          echo "Setting log level to ${{ github.event.inputs.log_level }}"
          echo "Setting environment to ${{ github.event.inputs.environment }}"
          echo "Run tests: ${{ github.event.inputs.run_tests }}"

      - name: Run a one-line script
        run: echo Hello, world!

      - name: Run a multi-line script
        run: |
          echo Add other actions to build,
          echo test, and deploy your project.
This example demonstrates a workflow tailored for the workflow_dispatch trigger, which incorporates user-defined inputs for log level, target environment, and whether to run tests. These inputs enable a high degree of customization for each workflow run, allowing specific execution parameters to be set directly at the time of triggering.
Figure 5 illustrates an example of triggering a workflow with inputs specified, demonstrating how users can provide specific values at the time of invocation to tailor the workflow execution according to their needs.
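As a complement to the example above, here is a minimal sketch of gating a step on the run_tests input. It assumes the inputs context, which GitHub Actions exposes for workflow_dispatch runs alongside github.event.inputs and which keeps boolean inputs as booleans rather than strings. The workflow name and the echo command standing in for a test runner are hypothetical.

```yaml
name: Conditional tests (sketch)

on:
  workflow_dispatch:
    inputs:
      run_tests:
        description: 'Run tests in the workflow'
        required: true
        default: true
        type: boolean

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      # The inputs context preserves run_tests as a boolean,
      # so it can be used directly as the step condition.
      - name: Run tests
        if: ${{ inputs.run_tests }}
        run: echo "Running tests"  # placeholder for a real test command
```

Unticking the checkbox in the “Run workflow” dialog should then skip the test step while the rest of the job still runs.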
The workflow_dispatch trigger offers three methods to initiate workflows: via the GitHub Actions tab, the GitHub CLI, or a REST API call. For scenarios requiring automated workflow execution from external systems or scripts, workflow_dispatch can be triggered via a curl command. This method involves making a POST request to the GitHub API, specifying the repository, workflow, and the branch on which the workflow should run.
Example curl command to trigger a workflow:
curl -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer YOUR_GITHUB_TOKEN" \
  https://api.github.com/repos/your-organization/your-repo/actions/workflows/your-workflow-id/dispatches \
  -d '{"ref":"your-branch-name"}'
This command sends a request to GitHub to run the specified workflow on the given branch, leveraging your GitHub token for authentication.
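The same request can also carry values for the inputs defined in the workflow file, and it can be issued from another workflow rather than by hand. The sketch below is one possible shape for that, not a definitive setup: it assumes a workflow in a different repository that dispatches the target workflow after a push, and a secret named DISPATCH_TOKEN (a hypothetical name) holding a token allowed to run workflows in the target repository. The repository path is the placeholder from the curl example, and the workflow is addressed by file name, which the API accepts in place of the numeric ID.

```yaml
name: Trigger downstream workflow (sketch)

on:
  push:
    branches: [ "main" ]

jobs:
  dispatch:
    runs-on: ubuntu-latest
    steps:
      - name: Dispatch the downstream workflow with inputs
        env:
          # Hypothetical secret; it must hold a token with access to the target repository
          DISPATCH_TOKEN: ${{ secrets.DISPATCH_TOKEN }}
        run: |
          # "inputs" maps input names from the target workflow file to string values
          curl -X POST \
            -H "Accept: application/vnd.github+json" \
            -H "Authorization: Bearer $DISPATCH_TOKEN" \
            https://api.github.com/repos/your-organization/your-repo/actions/workflows/your-workflow.yml/dispatches \
            -d '{"ref":"main","inputs":{"log_level":"debug","environment":"staging"}}'
```

Keeping the token in a secret and reading it through an environment variable avoids writing it into the workflow file.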
For those who prefer working from the command line, GitHub CLI (gh) provides a convenient way to trigger workflows:
gh workflow run your-workflow.yml --ref your-branch-name
This command initiates the specified workflow on the designated branch. The branch is selected with --ref; the -f and -F flags pass workflow inputs as key=value pairs (for example, -f log_level=debug) and do not set the branch.
Distinguishing workflow_dispatch from Similar Trigger Events in GitHub Actions
GitHub Actions offers a variety of triggers for workflows, beyond the commonly utilized event-based triggers. A subset of these triggers is designed to initiate workflows without direct changes to the repository, accommodating external or manual interventions. This includes the workflow_dispatch, repository_dispatch, workflow_call, and workflow_run events, each serving distinct purposes.
Events Ending in _dispatch
The events ending in _dispatch, namely workflow_dispatch and repository_dispatch, are tailored for initiating workflows in response to actions happening outside of GitHub. Both events enable similar triggering mechanisms, yet they serve different purposes. The workflow_dispatch event is specifically for launching an individual workflow, providing direct control over its execution. On the other hand, repository_dispatch is aimed at activating multiple workflows within a single repository, usually reacting to external or custom events. A typical use case could be an external Continuous Integration (CI) system triggering a series of workflows for Continuous Deployment (CD) purposes.
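To make the contrast concrete, here is a minimal sketch of a workflow that listens for repository_dispatch. The event type deploy-requested and the environment field in the payload are hypothetical names chosen for illustration. An external system would send a POST request to the repository's /dispatches endpoint with a matching event_type and an optional client_payload object, and every workflow in the repository that subscribes to that type would start.

```yaml
name: Deploy on external event (sketch)

on:
  repository_dispatch:
    # Hypothetical event type; it must match the event_type sent in the API request
    types: [deploy-requested]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Show the custom payload
        env:
          # client_payload carries the JSON object supplied by the external caller
          TARGET_ENV: ${{ github.event.client_payload.environment }}
        run: echo "Deploy requested for $TARGET_ENV"
```

Unlike workflow_dispatch, the request names an event type rather than a workflow file, which is what lets several workflows react to the same external signal.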
Events Prefixed with workflow_
The workflow_ prefix in event names suggests a focus on workflow interactions, with each event offering unique capabilities:
– workflow_run: This trigger is utilized to execute a secondary workflow upon the completion of a prerequisite workflow, regardless of the outcome (success or failure). It enables sequential workflow execution, where the second workflow’s initiation is contingent upon the first workflow’s completion.
– workflow_call: This event renders a workflow callable from another workflow, promoting reusability across different projects or within the same repository. A workflow tagged with workflow_call becomes a reusable entity, inheriting the event payload from the calling workflow.
– workflow_dispatch: Stands out as the manual trigger within this group, allowing workflows to be initiated manually through the GitHub API, GitHub CLI, or directly via the Actions tab in the GitHub UI. This event is crucial for workflows that require on-demand execution or are dependent on specific inputs at runtime.
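The first two events in this list can be sketched as follows. Both snippets are illustrative only, and their workflow names, job names, and file names are assumptions. The first runs after the workflow named CI (the name used in the sample workflow earlier in this article) completes, and filters on the conclusion because workflow_run fires on success and failure alike.

```yaml
name: Deploy after CI (sketch)

on:
  workflow_run:
    # Name of the prerequisite workflow
    workflows: ["CI"]
    types: [completed]

jobs:
  deploy:
    runs-on: ubuntu-latest
    # The trigger fires regardless of outcome, so continue only after a successful run
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    steps:
      - run: echo "CI succeeded, deploying"
```

The second declares a reusable workflow with a single input, read through the inputs context.

```yaml
# Reusable workflow; the file name .github/workflows/reusable-build.yml is hypothetical
name: Reusable build (sketch)

on:
  workflow_call:
    inputs:
      environment:
        description: 'Environment for the workflow'
        required: false
        default: 'production'
        type: string

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Report the target environment
        env:
          TARGET_ENV: ${{ inputs.environment }}
        run: echo "Building for $TARGET_ENV"
```

A caller in the same repository would reference it at the job level with uses: ./.github/workflows/reusable-build.yml and pass the value under with:.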
Given that a GitHub token can carry permissions enabling it to execute tasks within your workflows, there’s a theoretical risk it could initiate additional workflow runs, potentially leading to recursive workflow execution. To mitigate this risk, GitHub restricts the ability of events triggered by the repository’s token to spawn new workflow runs. However, there are two notable exceptions: the workflow_dispatch and repository_dispatch events are exempt from this restriction. This exemption is logical, as it supports the intended functionality of these events to trigger other workflows deliberately, facilitating complex automation scenarios while maintaining control over workflow initiation.
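As a rough illustration of this exemption, the sketch below has one workflow start another in the same repository using only the repository token. It assumes a GitHub-hosted runner, where the gh CLI is preinstalled, and grants the token actions: write so it is allowed to create the dispatch event. The workflow name and the target file your-workflow.yml are placeholders.

```yaml
name: Chain to another workflow (sketch)

on:
  push:
    branches: [ "main" ]

permissions:
  actions: write  # lets the repository token create a workflow_dispatch event

jobs:
  trigger:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger a second workflow in the same repository
        env:
          # gh reads its credentials from GH_TOKEN
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        # --repo is required because the job does not check out the repository
        run: gh workflow run your-workflow.yml --repo "$GITHUB_REPOSITORY" --ref main
```

The target workflow must itself declare workflow_dispatch in its on section. Because the dispatch is explicit, the chain stops wherever no further dispatch step exists, which keeps the recursion risk under the author's control.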
Conclusion
workflow_dispatch boosts GitOps by providing manual control for precise, flexible repository management. Available through GitHub’s UI, CLI, and API, it lets teams customize automation, enhancing operational efficiency and adaptability.
References
For further details on utilizing workflow_dispatch within GitHub Actions and integrating it into your GitOps practices, consult the following official GitHub documentation resources:
GitHub Documentation on workflow_dispatch: https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows#workflow_dispatch
Webhook Events and Payloads for workflow_dispatch: https://docs.github.com/en/webhooks/webhook-events-and-payloads#workflow_dispatch
Creating a Repository Dispatch Event via GitHub’s REST API: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#create-a-repository-dispatch-event