Why do we still need to slim down Docker images today when storage is so cheap?
Advantages of Small Images#
- Accelerate Build/Deployment
  Although storage is relatively cheap, network IO is limited. With limited bandwidth, deploying a `1G` image versus a `10M` image can be the difference between minutes and seconds. This time is especially precious when a failure occurs and services are rescheduled to other nodes.
- Improve Security, Reduce Attack Surface
  Smaller images contain fewer unnecessary programs, which greatly reduces the targets available to an attacker.
- Reduce Storage Overhead
Principles for Creating Small Images#
- Choose the smallest base image.
- Reduce layers and remove unnecessary files.
  In the actual process of creating images, blindly merging layers is not advisable. It is important to make full use of Docker's caching mechanism, extract common layers, and accelerate builds.
  - Separate dependency files and actual code files into different layers.
  - Teams/companies should adopt common base images, etc.
- Use multi-stage builds.
  Often, the environment needed during the build phase differs from the one needed at runtime. For example, a program written in Golang only needs a binary file to run, while for Node.js, the final runtime may only require some packaged JS files, without the thousands of dependencies in `node_modules`.
Base Images#
- distroless
  > "Distroless" images contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs you would expect to find in a standard Linux distribution.

  `distroless` is a project launched by Google that contains only the runtime environment and does not include package managers, a shell, or other programs. If your program has no other dependencies, this is a good choice.
- alpine
  > Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.

  `alpine` is a security-oriented Linux distribution based on `musl` and `busybox`. Although it is less than 10MB, it includes a package manager and a shell environment, which are very useful during actual usage and debugging. However, please note that because `alpine` uses the smaller `musl` libc instead of `glibc`, some applications may not run and may need to be recompiled.
- scratch
  `scratch` is a blank image, generally used for building base images. For example, the `Dockerfile` for the `alpine` image starts from `scratch`:

      FROM scratch
      ADD alpine-minirootfs-20190228-x86_64.tar.gz /
      CMD ["/bin/sh"]

In general, distroless is relatively more secure, but in practice you may run into difficulties adding dependencies and debugging. alpine is smaller, comes with a package manager, and better matches everyday usage habits, but musl libc may cause compatibility issues. Generally, I would choose alpine as the base image. In addition, Docker Hub offers slim variants of the commonly used debian images that contain only basic functionality.
Base Image Comparison#
Here, we pull the base images directly and compare their sizes. alpine is only about 5MB, roughly one-twentieth the size of debian.
    alpine   latest   5cb3aa00f899   3 weeks ago    5.53MB
    debian   latest   0af60a5c6dd0   3 weeks ago    101MB
    ubuntu   18.04    47b19964fb50   7 weeks ago    88.1MB
    ubuntu   latest   47b19964fb50   7 weeks ago    88.1MB
    alpine   3.8      3f53bb00af94   3 months ago   4.41MB
An absolute difference of less than 100MB may not look significant at first glance, but in practice the official images for each language publish tags built on different base images, and there the gap widens. Below, we take the ruby image as an example. The default latest image is 881MB, while the alpine variant is under 50MB, which is a considerable difference.
    ruby   latest   a5d26127d8d0   4 weeks ago   881MB
    ruby   alpine   8d8f7d19d1fa   4 weeks ago   47.8MB
    ruby   slim     58dd4d3c99da   4 weeks ago   125MB
Reduce Layers, Remove Unnecessary Files#
- Delete files in the same instruction that created them.

      # dockerfile 1
      FROM alpine
      RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip

      # dockerfile 2
      FROM alpine
      RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip
      RUN rm 1.0.0.zip

      # dockerfile 3
      FROM alpine
      RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip && rm 1.0.0.zip

  The resulting image sizes:

      test   3   351a80e99c22   5 seconds ago        5.53MB
      test   2   ad27e625b8e5   49 seconds ago       6.1MB
      test   1   165e2e0df1d3   About a minute ago   6.1MB

  Images 1 and 2 are the same size, while 3 is about 0.6MB smaller. This is because Docker generates a layer for almost every instruction. The layers below are read-only, so when a file in one of them needs to be deleted, AUFS uses a whiteout mechanism: it creates a corresponding hidden whiteout file in the writable directory of the upper layer. Deleting a file from a previous layer in the current layer therefore only hides the file; it still takes up space in the image.
- Use single-line commands.
  Besides keeping delete statements on the same line, it is best to combine related statements, such as those that install dependencies, into a single `RUN` command to reduce the final number of layers.
- Separate dependency packages from source code to make full use of layer caching.
  This is a best practice. In actual development, our dependency packages rarely change, but the source code we are developing changes frequently. If our actual code is only `10M` but the dependencies are `1G`, copying everything with a single `COPY` instruction invalidates that layer's cache every time we modify the code, wasting time on copying and on pushing to the image registry. By splitting the `COPY` statements, each push only changes the frequently modified code layer, not the dependencies.
- Use `.dockerignore`.
  When using Git, we can ignore files with `.gitignore`. Similarly, during `docker build` we can use `.dockerignore` to exclude files from the Docker build context. This not only avoids importing unnecessary files but also improves security by keeping some configuration files out of the image.
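As a rough illustration of the layer-caching advice above, the following Dockerfile sketch copies the dependency manifests and installs packages before copying the source code. It assumes a Node.js project with `package.json` and `package-lock.json` at its root and an entry point named `index.js`; the image tag and file names are illustrative assumptions, not taken from any project mentioned in this article.

```dockerfile
# Sketch only: image tag, file names, and entry point are assumptions.
FROM node:20-alpine
WORKDIR /app

# Dependency manifests change rarely, so this layer and the install
# layer below are reused from cache when only source code changes.
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Source code changes often; only the layers from here on are rebuilt.
COPY . .

CMD ["node", "index.js"]
```

For this to work as intended, a `.dockerignore` that excludes the local `node_modules` directory is advisable, otherwise `COPY . .` would copy it over the freshly installed dependencies.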
Multi-Stage Builds#
Multi-stage builds are also a way to reduce layers. Through multi-stage builds, the final image can contain only the executable file generated at the end and the necessary runtime dependencies, greatly reducing the image size.
Taking Go as an example, only the final compiled binary is needed at runtime, while the Go toolchain, its extension packages, and the source files are unnecessary. However, these dependencies are essential during compilation, which is where multi-stage builds can be used to reduce the final image size.
# Use golang image as builder image
FROM golang:1.12 as builder
WORKDIR /go/src/github.com/go/helloworld/
COPY app.go .
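# Disable cgo so the binary is statically linked and can run on alpine (musl)
RUN CGO_ENABLED=0 GOOS=linux go build -o app .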
# After compilation, use alpine image as the final base image
FROM alpine:latest as prod
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Copy the compiled binary file from the builder
COPY --from=builder /go/src/github.com/go/helloworld/app .
CMD ["./app"]
Due to the length of this article, I will not elaborate on multi-stage builds here. For details, please refer to: Multi-Stage Builds
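The same idea applies to the Node.js case mentioned earlier, where the runtime only needs the packaged files and not `node_modules`. The sketch below is one possible shape, assuming the project defines an `npm run build` script that writes static files to `dist/`, and that those files are served by nginx; both are assumptions made for illustration.

```dockerfile
# Sketch only: assumes `npm run build` writes static files to /app/dist.
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
# Install all dependencies, including the build tooling
RUN npm ci
COPY . .
RUN npm run build

# The final image contains only the web server and the built files;
# node_modules and the Node.js toolchain stay in the builder stage.
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
```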
Tips and Tricks#
- Use `dive` to view the layers of a Docker image, which can help you analyze and reduce image size.
- Use `docker-slim` to automatically reduce image size, which is particularly useful for web applications.
- Avoid pulling in unnecessary dependencies when installing software, and clean up afterwards.

      # ubuntu
      apt-get install -y --no-install-recommends
      # alpine
      apk add --no-cache && apk del build-dependencies
      # centos
      yum install -y ... && yum clean all

- Use the `--flatten` parameter to reduce layers (not recommended).
- Use `docker-squash` to compress layers.
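To make the alpine cleanup tip above more concrete, here is a minimal sketch that installs compile-time packages as a named virtual group, builds a native extension, and removes the group again within the same `RUN` instruction. The group name `.build-deps` and the `bcrypt` gem are arbitrary choices for illustration only.

```dockerfile
# Sketch only: the virtual group name and the gem are illustrative.
FROM ruby:alpine

# Install the compiler toolchain, build the native extension, and remove
# the toolchain again in ONE layer, so it never ends up in the image.
RUN apk add --no-cache --virtual .build-deps build-base \
    && gem install --no-document bcrypt \
    && apk del .build-deps
```

Splitting the `apk del` into its own `RUN` would defeat the purpose, because the toolchain would already be stored in the earlier layer.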
Examples from Different Languages#
Ruby (Rails)#
- Only install the dependencies needed for production.
- Remove unnecessary dependency files.

      # Install gems, then remove unneeded files (cached *.gem, *.o, *.c)
      bundle install --without development:test:assets -j4 --retry 3 --path=vendor/bundle \
        && rm -rf vendor/bundle/ruby/2.5.0/cache/*.gem \
        && find vendor/bundle/ruby/2.5.0/gems/ -name "*.c" -delete \
        && find vendor/bundle/ruby/2.5.0/gems/ -name "*.o" -delete

- Remove frontend `node_modules` and cache files.

      rm -rf node_modules tmp/cache app/assets vendor/assets spec

The above content can be combined with multi-stage builds.
Golang#
After using multi-stage builds, Golang only has one binary file left. At this point, further optimization can only be done by using tools like upx to compress the size of the binary file.
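Before reaching for a binary packer, two further reductions are sometimes possible: stripping debug information at link time and using `scratch` as the final base. The sketch below reuses the paths from the multi-stage example in this article; the `-ldflags="-s -w"` flags and the `scratch` base are suggestions added here, not part of the original example, and they assume the program needs no shell, CA certificates, or other files at runtime.

```dockerfile
# Sketch only: assumes a self-contained binary with no runtime file dependencies.
FROM golang:1.12 AS builder
WORKDIR /go/src/github.com/go/helloworld/
COPY app.go .
# CGO_ENABLED=0 produces a static binary; -s -w omit the symbol table
# and DWARF debug information to shrink it.
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o app .

# scratch is empty, so the image holds nothing but the binary.
FROM scratch
COPY --from=builder /go/src/github.com/go/helloworld/app /app
# Exec form is required: scratch has no shell to run a shell-form command.
ENTRYPOINT ["/app"]
```

If the program makes HTTPS requests, the CA certificates would also have to be copied in, or a base such as alpine or distroless used instead.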
Author of this article: mohuishou
Original link: https://lailin.xyz/post/51252.html
Copyright statement: Copyright belongs to the author, please indicate the source when reprinting!