---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${NAME}-runner-work-dir
  labels:
    content: ${NAME}-runner-work-dir
provisioner: rancher.io/local-path
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${NAME}
# In kind environments, the provisioner writes to:
#   /var/lib/docker/volumes/KIND_NODE_CONTAINER_VOL_ID/_data/local-path-provisioner/PV_NAME
# This can grow to hundreds of gigabytes depending on what you cache in the test workflow. Beware of `no space left on device` errors!
# If you do encounter such errors, try:
#   docker system prune
#   docker buildx prune #=> frees up /var/lib/docker/volumes/buildx_buildkit_container-builder0_state
#   sudo rm -rf /var/lib/docker/volumes/KIND_NODE_CONTAINER_VOL_ID/_data/local-path-provisioner #=> frees up local-path-provisioner's data
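#
# A hedged sketch for checking disk usage before pruning. It assumes you run these commands on the
# host that runs the kind node container. KIND_NODE_CONTAINER_VOL_ID is the same placeholder as above.
#   docker system df
#   sudo du -sh /var/lib/docker/volumes/KIND_NODE_CONTAINER_VOL_ID/_data/local-path-provisioner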
provisioner: rancher.io/local-path
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${NAME}-var-lib-docker
  labels:
    content: ${NAME}-var-lib-docker
provisioner: rancher.io/local-path
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${NAME}-cache
  labels:
    content: ${NAME}-cache
provisioner: rancher.io/local-path
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${NAME}-runner-tool-cache
  labels:
    content: ${NAME}-runner-tool-cache
provisioner: rancher.io/local-path
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${NAME}-rootless-dind-work-dir
  labels:
    content: ${NAME}-rootless-dind-work-dir
provisioner: rancher.io/local-path
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: ${NAME}
spec:
  # MANDATORY because it is based on StatefulSet. Omitting it results in the following error:
  #   missing required field "selector" in dev.summerwind.actions.v1alpha1.RunnerSet.spec
  selector:
    matchLabels:
      app: ${NAME}

  # MANDATORY because it is based on StatefulSet. Omitting it results in the following error:
  #   missing required field "serviceName" in dev.summerwind.actions.v1alpha1.RunnerSet.spec]
  serviceName: ${NAME}

  #replicas: 1

  # From my limited testing, `ephemeral: true` is more reliable.
  # Sometimes, updating already deployed runners from `ephemeral: false` to `ephemeral: true` seems to
  # result in queued jobs hanging forever.
  ephemeral: ${TEST_EPHEMERAL}

  enterprise: ${TEST_ENTERPRISE}
  group: ${TEST_GROUP}
  organization: ${TEST_ORG}
  repository: ${TEST_REPO}

  #
  # Custom runner image
  #
  image: ${RUNNER_NAME}:${RUNNER_TAG}

  #
  # dockerd within runner container
  #
  ## Replace `mumoshu/actions-runner-dind:dev` with your dind image
  #dockerdWithinRunnerContainer: true
  dockerdWithinRunnerContainer: ${RUNNER_DOCKERD_WITHIN_RUNNER_CONTAINER}

  #
  # Set the MTU used by dockerd-managed network interfaces (including docker-build-ubuntu)
  #
  #dockerMTU: 1450
  #Runner group
  # labels:
  # - "mylabel 1"
  # - "mylabel 2"
  labels:
  - "${RUNNER_LABEL}"
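  # A minimal sketch of a workflow job these runners could pick up. The job name and step are
  # hypothetical.
  # It assumes ${RUNNER_LABEL} is the label the workflow requests. `self-hosted` is a default label
  # that self-hosted runners register with.
  # A job like this, once queued, is also what the workflowJob trigger of the
  # HorizontalRunnerAutoscaler below reacts to.
  #
  #   jobs:
  #     e2e:
  #       runs-on: ["self-hosted", "${RUNNER_LABEL}"]
  #       steps:
  #       - run: echo hello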

  #
  # Non-standard working directory
  #
  # workDir: "/"
  template:
    metadata:
      labels:
        app: ${NAME}
    spec:
      serviceAccountName: ${RUNNER_SERVICE_ACCOUNT_NAME}
      terminationGracePeriodSeconds: ${RUNNER_TERMINATION_GRACE_PERIOD_SECONDS}
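      # A hedged example of how the two grace-period inputs relate. The numbers are illustrative, not
      # requirements.
      # Keep RUNNER_GRACEFUL_STOP_TIMEOUT, which the runner container below receives, below
      # RUNNER_TERMINATION_GRACE_PERIOD_SECONDS. That way the runner's graceful stop can finish before
      # Kubernetes sends SIGKILL at the end of terminationGracePeriodSeconds. As envsubst inputs:
      #
      #   RUNNER_GRACEFUL_STOP_TIMEOUT=90 RUNNER_TERMINATION_GRACE_PERIOD_SECONDS=110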
      containers:
      # # Uncomment only when using a non-dind runner with a docker sidecar
      # - name: docker
      #   # Image is required for the dind sidecar definition within RunnerSet spec
      #   image: "docker:dind"
      #   env:
      #   - name: RUNNER_GRACEFUL_STOP_TIMEOUT
      #     value: "${RUNNER_GRACEFUL_STOP_TIMEOUT}"
      - name: runner
        imagePullPolicy: IfNotPresent
        env:
        - name: RUNNER_GRACEFUL_STOP_TIMEOUT
          value: "${RUNNER_GRACEFUL_STOP_TIMEOUT}"
        - name: RUNNER_FEATURE_FLAG_EPHEMERAL
          value: "${RUNNER_FEATURE_FLAG_EPHEMERAL}"
        - name: GOMODCACHE
          value: "/home/runner/.cache/go-mod"
        - name: ROLLING_UPDATE_PHASE
          value: "${ROLLING_UPDATE_PHASE}"
        # PV-backed runner work dir
        volumeMounts:
        # Comment out the ephemeral work volume if you're going to test the kubernetes container mode.
        # The volume and mount with the same names will be created by workVolumeClaimTemplate and the kubernetes container mode support.
        # - name: work
        #   mountPath: /runner/_work
        # Cache docker image layers, in case dockerdWithinRunnerContainer=true
        - name: var-lib-docker
          mountPath: /var/lib/docker
        # Cache go modules and builds
        # - name: gocache
        #   # Run `go env GOCACHE` to verify the path is correct for your env
        #   mountPath: /home/runner/.cache/go-build
        # - name: gomodcache
        #   # Run `go env GOMODCACHE` to verify the path is correct for your env
        #   # mountPath: /home/runner/go/pkg/mod
        - name: cache
          # go: could not create module cache: stat /home/runner/.cache/go-mod: permission denied
          mountPath: "/home/runner/.cache"
        - name: runner-tool-cache
          # This corresponds to our runner image's default setting of RUNNER_TOOL_CACHE=/opt/hostedtoolcache.
          #
          # If you customize the envvar in both the runner and docker containers of the runner pod spec,
          # you'd need to change this mountPath accordingly.
          #
          # The tool cache directory is defined in actions/toolkit's tool-cache module:
          #   https://github.com/actions/toolkit/blob/2f164000dcd42fb08287824a3bc3030dbed33687/packages/tool-cache/src/tool-cache.ts#L621-L638
          #
          # Many setup-* actions like setup-go utilize the tool-cache module to download and cache installed binaries:
          #   https://github.com/actions/setup-go/blob/56a61c9834b4a4950dbbf4740af0b8a98c73b768/src/installer.ts#L144
          mountPath: "/opt/hostedtoolcache"
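          # A hedged sketch of the customization described above. /home/runner/tool-cache is a
          # hypothetical path.
          # RUNNER_TOOL_CACHE needs the same value in every container that runs workflow steps: the
          # runner, and docker if you use the sidecar. The mountPath must follow that value.
          #
          #   env:
          #   - name: RUNNER_TOOL_CACHE
          #     value: /home/runner/tool-cache
          #   volumeMounts:
          #   - name: runner-tool-cache
          #     mountPath: /home/runner/tool-cache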
      # Valid only when dockerdWithinRunnerContainer=false
      # - name: docker
      #   # PV-backed runner work dir
      #   volumeMounts:
      #   - name: work
      #     mountPath: /runner/_work
      #   # Cache docker image layers, in case dockerdWithinRunnerContainer=false
      #   - name: var-lib-docker
      #     mountPath: /var/lib/docker
      #   # image: mumoshu/actions-runner-dind:dev
      #
      #   # For buildx cache
      #   - name: cache
      #     mountPath: "/home/runner/.cache"
        # To fix `no space left on device` errors on the rootless dind runner
        - name: rootless-dind-work-dir
          # Mount /home/runner/.local rather than /home/runner/.local/share/docker,
          # because dockerd creates the share/docker part itself.
          mountPath: /home/runner/.local
          readOnly: false
      # Comment out the ephemeral work volume if you're going to test the kubernetes container mode
      # volumes:
      # - name: work
      #   ephemeral:
      #     volumeClaimTemplate:
      #       spec:
      #         accessModes:
      #         - ReadWriteOnce
      #         storageClassName: "${NAME}-runner-work-dir"
      #         resources:
      #           requests:
      #             storage: 10Gi

      # Fix the following no space left errors with rootless-dind runners that can happen while running buildx build:
      #   ------
      #    > [4/5] RUN go mod download:
      #   ------
      #   ERROR: failed to solve: failed to prepare yxsw8lv9hqnuafzlfta244l0z: mkdir /home/runner/.local/share/docker/vfs/dir/yxsw8lv9hqnuafzlfta244l0z/usr/local/go/src/cmd/compile/internal/types2/testdata: no space left on device
      #   Error: Process completed with exit code 1.
      #
      volumes:
      - name: rootless-dind-work-dir
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: [ "ReadWriteOnce" ]
              storageClassName: "${NAME}-rootless-dind-work-dir"
              resources:
                requests:
                  storage: 3Gi
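      # This is a generic Kubernetes ephemeral volume. Kubernetes creates one PVC per runner pod, named
      # <pod name>-rootless-dind-work-dir, and deletes it together with the pod.
      # Because the StorageClass uses reclaimPolicy: Delete, the backing local-path directory should be
      # removed as well.
      # A hedged way to inspect the PVC while a runner is up (RUNNER_POD_NAME is a hypothetical
      # placeholder for a runner pod's name):
      #
      #   kubectl get pvc RUNNER_POD_NAME-rootless-dind-work-dir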
  volumeClaimTemplates:
  - metadata:
      name: vol1
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Mi
      storageClassName: ${NAME}
      ## Not sure which provisioner supports auto-provisioning with a selector.
      ## At least the rancher local path provisioner got stuck with:
      ##   waiting for a volume to be created, either by external provisioner "rancher.io/local-path" or manually created by system administrator
      # selector:
      #   matchLabels:
      #     runnerset-volume-id: ${NAME}-vol1
  - metadata:
      name: vol2
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Mi
      storageClassName: ${NAME}
      # selector:
      #   matchLabels:
      #     runnerset-volume-id: ${NAME}-vol2
  - metadata:
      name: var-lib-docker
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Mi
      storageClassName: ${NAME}-var-lib-docker
  - metadata:
      name: cache
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Mi
      storageClassName: ${NAME}-cache
  - metadata:
      name: runner-tool-cache
      # It turns out labels don't distinguish PVs across PVCs, and the
      # end result is that PVs get reused by the wrong PVCs.
      # The correct way seems to be to use a separate storage class per PVC template.
      # labels:
      #   id: runner-tool-cache
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Mi
      storageClassName: ${NAME}-runner-tool-cache
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: ${NAME}
spec:
  scaleTargetRef:
    kind: RunnerSet
    name: ${NAME}
  scaleUpTriggers:
  - githubEvent:
      workflowJob: {}
    amount: 1
    duration: "10m"
  minReplicas: ${RUNNER_MIN_REPLICAS}
  maxReplicas: 10
  scaleDownDelaySecondsAfterScaleOut: ${RUNNER_SCALE_DOWN_DELAY_SECONDS_AFTER_SCALE_OUT}
  # Comment out the whole metrics section if you'd like to test webhook-based scaling alone
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.75'
    scaleDownThreshold: '0.25'
    scaleUpFactor: '2'
    scaleDownFactor: '0.5'
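  # A rough worked example of how PercentageRunnersBusy might act on the values above. This sketches
  # the multiplier behaviour as we understand it and is not a guarantee.
  # Results are also clamped to minReplicas/maxReplicas, and scale-downs are delayed by
  # scaleDownDelaySecondsAfterScaleOut.
  #
  #   4 runners, 4 busy: 4/4 = 1.00  >= 0.75 -> ceil(4 * 2)    = 8 runners
  #   8 runners, 8 busy: 8/8 = 1.00  >= 0.75 -> ceil(8 * 2)    = 16, capped at maxReplicas = 10
  #   8 runners, 1 busy: 1/8 = 0.125 <  0.25 -> floor(8 * 0.5) = 4 runners
  #   4 runners, 2 busy: 2/4 = 0.50, between the thresholds    -> unchanged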