Introduction:
Ah, the CI/CD pipeline. Are you tired of hitting those pesky Docker Hub rate limits? Well, you're in luck, because today we're going to spill the beans on how to avoid them and keep your pipeline running smoothly.
Here at Devtron, we've faced this struggle head-on, and let me tell you, it wasn't pretty. Whenever we tried to deploy an urgent fix to production, rate limits would pop up in the build logs. We were like a bakery with an empty flour sack – progress halted.
Understanding the Problem:
Docker Hub, a widely used container registry, has become an integral part of modern software development. However, pulling images beyond its published limits triggers rate limiting, particularly for organizations with high-traffic CI/CD pipelines. These limits can slow development velocity, increase operational costs, and disrupt software delivery.
So, let's talk about the problem. Docker Hub has updated its rate limits, and this started affecting our CI/CD pipeline. What's the solution? Let's break the pipeline down into parts and tackle each one:
CI Pipeline:
According to the official documentation, the current limits are 100 pulls per 6 hours for unauthenticated requests and 200 pulls per 6 hours for authenticated requests. After December 10, 2024, the limits become 10 pulls per hour (60 pulls per 6 hours) for unauthenticated requests and 40 pulls per hour (240 pulls per 6 hours) for authenticated requests. On top of that, every image pull counts as an API call, and unauthenticated limits are tracked per IP address. If you run all your builds on a single hosted Jenkins instance, or on multiple builder instances in a private network sharing a single NAT gateway for outbound requests, you are likely to hit the pull limit sooner than you expect.
Our CI pipeline has multiple steps where Docker Hub comes into the picture. Let's list every step first, then walk through the solution we implemented at each one:
- Create docker buildx builder
- Pull base image specified in dockerfile
- Image scanning with Trivy
Current Available Solutions:
Now, if you hit the limits, you'll most likely get an error with status code 429 which says - ERROR: toomanyrequests: Too Many Requests.
OR
You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limits.
If you want to avoid these rate limits, you have the following options:
- Authenticate with your Docker login credentials, which increases your limit to 240 pulls every 6 hours.
- Buy a Docker Hub paid subscription, which raises your pull limit to up to 1 million pulls per month, depending on the plan you choose.
- Maintain all the images in your own private repository and pull them from there. In this case, you'll have to keep every image updated with security patches and new versions, and rewrite all your Dockerfiles to point at the private registry.
Let's look at the simple solution we implemented for builds that run on Devtron, for our internal team as well as all the clients who depend on Devtron for their CI/CD, without any interruptions.
Docker buildx builder:
If you build container images using docker buildx, or build for multiple architectures (which uses buildx automatically), the first thing you have to do is install emulators for the target architectures using binfmt.
Devtron uses Kubernetes pods as build agents, so you can run a practically unlimited number of parallel builds at low cost if you use spot nodes with autoscaling enabled, dedicated to build infrastructure. Because every build starts completely fresh (carrying only the build cache of the previous build for the same CI pipeline), binfmt installs the emulators on every build, and the very first API call to Docker Hub happens right here.
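For reference, that emulator installation is a single docker run of the binfmt image, and the image reference can itself point at the mirror we'll introduce below (this sketch assumes mirror.gcr.io has tonistiigi/binfmt cached, which it does for popular images):

```shell
# Install QEMU emulators for all supported architectures.
# tonistiigi/binfmt is the image buildx relies on; pulling it
# through mirror.gcr.io avoids the Docker Hub pull that would
# otherwise happen on every fresh build agent.
docker run --privileged --rm mirror.gcr.io/tonistiigi/binfmt --install all
```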
Then, when you create a builder instance with the docker buildx create command, buildx pulls the buildkit image internally to set up the builder, because buildx uses BuildKit under the hood. There are different drivers for docker buildx, with the docker driver as the default. The docker driver doesn't even let you change the registry or image tag for the buildkit image, but other drivers like docker-container and kubernetes let you customize that. So far we have exhausted 2 of our 10 hourly pulls, and the build hasn't even started.
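With the docker-container driver, for instance, the buildkit image reference can be overridden at builder creation time, so it can be pulled through a mirror. A sketch (assumes mirror.gcr.io has moby/buildkit cached; the builder name is arbitrary):

```shell
# Create a builder whose buildkit image comes through the mirror
# instead of Docker Hub. --driver-opt image= overrides the default
# moby/buildkit:buildx-stable-1 reference.
docker buildx create --name mirrored-builder \
  --driver docker-container \
  --driver-opt image=mirror.gcr.io/moby/buildkit:buildx-stable-1 \
  --use
```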
Now that we're all on the same page about the problem, let's discuss the solution we implemented. It's simple: it doesn't require authentication, you don't need to manage images in your own repository, you don't need to change anything in your Dockerfiles, and, best of all, you don't need to buy a Docker Hub subscription. We implemented a Docker registry mirror using mirror.gcr.io.
What is mirror.gcr.io?
Remember the third alternative we talked about, maintaining images in your own repository so you never face Docker Hub rate limits? That's essentially what mirror.gcr.io does for you. It caches most of the popular Docker Hub images in its own publicly accessible container registry, hosted on servers managed by Google Cloud, and it keeps pulling newer tags of those images at regular intervals. The question, then, is how to use it without changing anything in our existing system. That's where Docker's registry mirror configuration comes into the picture.
Pull base image specified in dockerfile:
For normal docker builds (without buildx) and docker run commands that pull images from Docker Hub, simply add the line of JSON below to the Docker daemon.json file located at /etc/docker/daemon.json:
{"registry-mirrors": ["https://mirror.gcr.io"]}
Once the mirror is added, restart the Docker daemon with sudo systemctl restart docker. Here's how the mirror works: for every image pull request, Docker checks the mirrors first, one by one in order. You can configure multiple mirrors, for example maintaining your own mirror for images that aren't available on mirror.gcr.io. Only if the image is not found on any of the mirrors does the request fall through to Docker Hub, and only then does it count towards your hourly pull limit.
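For example, if you also run a private mirror for images that mirror.gcr.io doesn't carry, a daemon.json listing both, tried in order, would look like this (my-mirror.example.com is a placeholder for your own registry):

```json
{
  "registry-mirrors": [
    "https://my-mirror.example.com",
    "https://mirror.gcr.io"
  ]
}
```

Putting your own mirror first means it absorbs the traffic for your internal images, with mirror.gcr.io and finally Docker Hub as fallbacks.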
Adding mirror.gcr.io to Docker's daemon.json file resolves the pull rate limit in the CI step for most users, including the images mentioned in Dockerfiles. It also resolves the first two things docker does when using buildx, i.e. pulling the binfmt image to install emulators for different architectures and pulling the buildkit image, irrespective of the driver you use to initialize the builder, because both of those come from Docker Hub by default.
But if you are using buildx for your builds, the images mentioned in your Dockerfile won't use the mirrors added to the Docker daemon config file: BuildKit has its own configuration file where you'll need to specify the same mirror configuration again. Add the configuration below to buildkitd.default.toml. The default location of this file is $HOME/.docker/buildx/buildkitd.default.toml; in our case it is /root/.docker/buildx/buildkitd.default.toml:
[registry."docker.io"]
mirrors = ["mirror.gcr.io"]
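If you'd rather keep the BuildKit config somewhere other than the default path, the same file can be passed explicitly when creating a builder via the --config flag (the path and builder name below are just examples):

```shell
# Point the new builder at an explicit buildkitd config file
# instead of relying on $HOME/.docker/buildx/buildkitd.default.toml.
docker buildx create --name configured-builder \
  --config /etc/buildkit/buildkitd.toml \
  --use
```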
After this, any new builder instance initialized with BuildKit will use the mirror configuration from the default BuildKit daemon config file. At this point we have successfully avoided Docker Hub rate limits across the entire build process: installing the QEMU emulators for multiple architectures, initializing BuildKit for docker buildx, and pulling the base images used in Dockerfiles. In short, the complete build, with or without buildx, can now run and push images to your container registry without worrying about Docker Hub pull rate limit hurdles, and all your docker run commands avoid the limits too. As a bonus, even if Docker Hub itself experiences an outage, you may still have access to the images through the mirror.
Image scanning with Trivy:
For most people, the build process is complete as soon as the image is pushed to the container registry, but Devtron also has an option to enable image scanning after your build completes, and Devtron uses Trivy for that. Whether or not you use Devtron, if you use Trivy you might have recently seen image scans fail because the Trivy CLI could not pull its vulnerability databases (trivy-db or trivy-java-db) from ghcr.io (GitHub Container Registry):
2024-11-20T06:42:17.572Z FATAL init error: DB error: failed to download vulnerability DB: database download error: OCI repository error: 2 errors occurred:
* GET https://ghcr.io/token?scope=repository%3Aaquasecurity%2Ftrivy-db%3Apull&service=ghcr.io: DENIED: denied
* GET https://ghcr.io/v2/aquasecurity/trivy-db/manifests/2: TOOMANYREQUESTS: retry-after: 181.317µs, allowed: 44000/minute
This comes from ghcr.io rate limits, but it can also be resolved with the same mirror.gcr.io we used above: Trivy keeps its vulnerability databases updated on Docker Hub (as well as on public ECR), and we know that mirror.gcr.io keeps popular Docker Hub repositories cached on its servers. So we can use the same trick here; simply set the environment variables below wherever you run the Trivy CLI:
For Trivy versions below 0.56.2, which accept a single repository as a replacement:
export TRIVY_DB_REPOSITORY=mirror.gcr.io/aquasec/trivy-db
export TRIVY_JAVA_DB_REPOSITORY=mirror.gcr.io/aquasec/trivy-java-db
For Trivy 0.56.2 and above, the same variables accept multiple repositories as fallbacks, and from 0.57.1 onwards mirror.gcr.io is already included in the fallback registries by default.
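For those newer versions (0.56.2 up to 0.57.0, where the default fallback isn't in place yet), the variables take a comma-separated list that Trivy tries in order. A sketch that prefers the mirror and keeps the original ghcr.io repositories as a safety net:

```shell
# Trivy >= 0.56.2: list the GCR mirror first and the original
# ghcr.io repositories as fallbacks, tried in order.
export TRIVY_DB_REPOSITORY="mirror.gcr.io/aquasec/trivy-db,ghcr.io/aquasecurity/trivy-db"
export TRIVY_JAVA_DB_REPOSITORY="mirror.gcr.io/aquasec/trivy-java-db,ghcr.io/aquasecurity/trivy-java-db"
echo "$TRIVY_DB_REPOSITORY"
```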
That wraps up all the issues we had to resolve for CI, and since then our build pipelines have never failed due to pull rate limit errors. Just FYI: if you deploy on Kubernetes and use Devtron v1.0.0 or above, all these mirrors are built in and you don't have to configure anything yourself. With the latest version of Devtron you only need to focus on your application, not on builds failing due to pull rate limits, plus you get the added advantage of running as many builds in parallel as you like, each with the same dedicated resources.
CD Pipeline:
If you deploy on Kubernetes rather than plain Docker, you might face this issue in deployments as well. The problem may not be in your own applications, since you are probably using private registries other than Docker Hub, but third-party Helm charts and Kubernetes operators often use images from Docker Hub. That can include your monitoring and logging stacks, which are critical for every system and can cause business loss if they are down even for a few seconds. So, how do we apply the mirror here?
We'll use a MutatingWebhookConfiguration for this: it intercepts any pod request that references an image from Docker Hub and rewrites the registry from docker.io to mirror.gcr.io at the API server level itself. Take a look at the docker proxy webhook; you can map any number of registries from one URL to another, for example k8s.gcr.io to registry.k8s.io. You'll need cert-manager set up in your cluster as a prerequisite.
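Conceptually, the mutation is just a rewrite of the image field on pods as they are admitted. For a pod that references nginx from Docker Hub (the image and tag here are only an example):

```yaml
# Pod spec as submitted by a user or controller:
spec:
  containers:
    - name: web
      image: docker.io/library/nginx:1.27
# After the webhook applies the domainMap, the spec that is
# actually persisted uses the mirror:
#     image: mirror.gcr.io/library/nginx:1.27
```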
Update configMap as needed:
apiVersion: v1
kind: ConfigMap
metadata:
  name: docker-proxy-config
  namespace: docker-proxy
data:
  docker-proxy-config.yaml: |
    ignoreList:
      - "123456789012.dkr.ecr.us-east-1.amazonaws.com"
    domainMap:
      docker.io: mirror.gcr.io
Set up ignoreList so the webhook never intercepts API server requests for the registries you use for your own applications, like a private ECR. Apply the complete YAML file with the ConfigMap changes suggested above, and the pods in your Kubernetes cluster will start pulling container images from the GCR mirror instead of Docker Hub.
You can add the label below to any namespace for which you don't want the docker proxy controller to intercept requests and rewrite registries; it's a good idea to do this for official Kubernetes namespaces like kube-system. Also, do not remove this label from the namespace in which the proxy controller itself is deployed, to prevent a deadlock.
kubectl label namespace kube-system docker-proxy-webhook=disabled
Conclusion:
And there you have it, folks! With these simple configurations, you can avoid Docker Hub rate limits in your CI/CD pipeline. Remember, a smooth pipeline is a happy pipeline. So, go ahead and give these configurations a try. Your pipeline (and your sanity) will thank you.