Why do we still need to slim down Docker images today when storage is so cheap?
Advantages of Small Images
- Accelerate build and deployment
Although storage is relatively cheap, network I/O is limited. When bandwidth is constrained, deploying a 1 GB image can take minutes while a 10 MB image takes seconds. This time is especially precious when a failure occurs and services have to be rescheduled onto other nodes.
- Improve security and reduce the attack surface
Smaller images contain fewer unnecessary programs, which greatly reduces the number of potential targets for attack.
- Reduce storage overhead
Principles for Creating Small Images
- Choose the smallest base image.
- Reduce layers and remove unnecessary files.
In practice, blindly merging layers is not advisable. It is important to make full use of Docker's build cache, extract common layers, and speed up builds:
  - Separate dependency files and application code into different layers.
  - Have teams or companies adopt common base images.
- Use multi-stage builds.
Often, the build phase and the runtime phase need different environments. For example, a program written in Golang needs only a binary to run, while for Node.js the final runtime may need only the bundled js files, not the thousands of dependencies in node_modules.
Base Images
- distroless
"Distroless" images contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs you would expect to find in a standard Linux distribution.
distroless is a project launched by Google. Its images contain only the runtime environment, with no package manager, shell, or other programs. If your program has no other dependencies, this is a good choice.
- alpine
Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.
alpine is a security-oriented Linux distribution based on musl and busybox. Although it is smaller than 10 MB, it includes a package manager and a shell environment, which are very useful in day-to-day use and debugging. Note, however, that because alpine uses the smaller musl libc instead of glibc, some applications may not work and may need to be recompiled.
- scratch
scratch is a blank image, generally used for building base images. For example, the Dockerfile for the alpine image starts from scratch:
FROM scratch
ADD alpine-minirootfs-20190228-x86_64.tar.gz /
CMD ["/bin/sh"]
In general, distroless is relatively more secure, but in practice you may run into trouble adding dependencies and debugging. alpine is smaller, comes with a package manager, and fits common usage habits better, but musl libc may bring compatibility issues. Generally, I would choose alpine as the base image. In addition, the commonly used debian images on Docker Hub also provide small images containing only basic functionality.
Base Image Comparison
Here we pull the base images directly and compare their sizes. alpine is only about 5 MB, roughly one-twentieth the size of debian.
alpine   latest   5cb3aa00f899   3 weeks ago    5.53MB
debian   latest   0af60a5c6dd0   3 weeks ago    101MB
ubuntu   18.04    47b19964fb50   7 weeks ago    88.1MB
ubuntu   latest   47b19964fb50   7 weeks ago    88.1MB
alpine   3.8      3f53bb00af94   3 months ago   4.41MB
At first glance the difference may not seem significant, but in practice, language images provide tags built on different base images. Below we take the ruby image as an example. The default latest image is 881 MB, while the alpine variant is under 50 MB, which is a considerable difference.
ruby   latest   a5d26127d8d0   4 weeks ago   881MB
ruby   alpine   8d8f7d19d1fa   4 weeks ago   47.8MB
ruby   slim     58dd4d3c99da   4 weeks ago   125MB
Reduce Layers, Remove Unnecessary Files
- Delete files in the same instruction that creates them.
# dockerfile 1
FROM alpine
RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip
# dockerfile 2
FROM alpine
RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip
RUN rm 1.0.0.zip
# dockerfile 3
FROM alpine
RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip && rm 1.0.0.zip
test   3   351a80e99c22   5 seconds ago        5.53MB
test   2   ad27e625b8e5   49 seconds ago       6.1MB
test   1   165e2e0df1d3   About a minute ago   6.1MB
Images 1 and 2 are the same size, but image 3 is about 0.6 MB smaller. This is because Docker generates a layer for almost every instruction. Lower layers are read-only, so when a file in one of them has to be deleted, AUFS uses a whiteout mechanism: it creates a corresponding hidden whiteout file in the writable upper layer. Deleting a file from a previous layer in the current layer therefore only hides the file; it does not free the space.
- Use single-line commands.
Besides keeping deletions on the same line, it is also best to combine related statements, such as those that install dependencies, into a single RUN instruction to reduce the final number of layers.
- Separate dependency packages from source code and make full use of layer caching.
This is a best practice. In real development, dependencies rarely change, while the source code changes frequently. If the code is only 10 MB but the dependencies are 1 GB, copying everything with a single COPY instruction invalidates that layer's cache every time the code is modified, wasting time on copying and on pushing to the image registry. By splitting the COPY statements, each push only has to upload the frequently modified code layer, not the dependencies.
- Use .dockerignore.
When using Git, we can ignore files with .gitignore. During docker build, we can likewise use .dockerignore to exclude files from the Docker build context. This not only keeps unnecessary files out of the build but also improves security by keeping configuration files from being packaged into the image.
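The points in this section can be sketched together in one Dockerfile. This is a minimal illustration rather than a recommended template: the Ruby app, the `app.rb` entry point, and the `.build-deps` group name are assumptions made for the example, and it presumes a `.dockerignore` next to the Dockerfile that excludes things like `.git` and local configuration files.

```dockerfile
# Hypothetical Ruby app; file names are placeholders, not from the article.
FROM ruby:alpine
WORKDIR /app

# Dependency manifests change rarely, so copy them first.
# This layer and the install layer below stay cached until they change.
COPY Gemfile Gemfile.lock ./

# Install build tools, install gems, and remove the build tools in ONE RUN,
# so the compilers never end up in a committed layer.
# Gems that need shared libraries at runtime would need those packages kept.
RUN apk add --no-cache --virtual .build-deps build-base \
    && bundle install \
    && apk del .build-deps

# Source code changes often, so it goes in its own, later layer.
# Files matched by .dockerignore are never sent to the build context.
COPY . .

CMD ["bundle", "exec", "ruby", "app.rb"]
```

With this ordering, editing only application code invalidates just the final COPY layer; the gem layer is reused from the cache and does not need to be pushed again.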
Multi-Stage Builds
Multi-stage builds are also a way to reduce layers. With a multi-stage build, the final image can contain only the executable produced at the end and its necessary runtime dependencies, greatly reducing the image size.
Taking the Go language as an example, only the final compiled binary is needed at runtime; the Go toolchain itself, its extension packages, and the source files are unnecessary. However, these dependencies are essential during compilation, which is where multi-stage builds can be used to reduce the final image size.
# Use golang image as builder image
FROM golang:1.12 as builder
WORKDIR /go/src/github.com/go/helloworld/
COPY app.go .
RUN go build -o app .
# After compilation, use alpine image as the final base image
FROM alpine:latest as prod
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Copy the compiled binary file from the builder
COPY --from=builder /go/src/github.com/go/helloworld/app .
CMD ["./app"]
Due to the length of this article, I will not elaborate on multi-stage builds here. For details, please refer to: Multi-Stage Builds
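The same idea applies to the Node.js case mentioned earlier. The sketch below assumes a hypothetical project whose `npm run build` script bundles everything into `dist/`, with `dist/server.js` as the entry point; neither name comes from the original article.

```dockerfile
# Build stage: full dependency tree, including devDependencies.
FROM node:alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Assumed script that writes the bundled output to /app/dist.
RUN npm run build

# Final stage: only the Node.js runtime and the bundled files.
FROM node:alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
```

Here node_modules never reaches the final image. That only works if the bundle is self-contained; otherwise the final stage would also need a production-only install.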
Tips and Tricks
- Use dive to view the layers of a Docker image, which helps you analyze and reduce image size.
- Use docker-slim to reduce image size automatically, which is particularly useful for web applications.
- Remove dependencies when installing software.
# ubuntu
apt-get install -y --no-install-recommends
# alpine
apk add --no-cache && apk del build-dependencies
# centos
yum install -y ... && yum clean all
- Use the --flatten parameter to reduce layers (not recommended).
- Use docker-squash to compress layers.
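As a fuller illustration of the package-manager tip for Debian or Ubuntu based images, the fragment below installs without recommended packages and clears the apt package lists in the same layer. `curl` and `ca-certificates` are placeholder packages, and removing `/var/lib/apt/lists` is a common convention rather than something the list above prescribes.

```dockerfile
FROM debian:stable-slim
# Update, install, and clean up in a single RUN so the package index
# downloaded by `apt-get update` is not stored in any layer.
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```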
Examples from Different Languages
Ruby (Rails)
- Only install the dependencies needed for production.
- Remove unnecessary dependency files.
# Install production gems, then remove unneeded files (cached *.gem, *.o, *.c)
bundle install --without development:test:assets -j4 --retry 3 --path=vendor/bundle \
    && rm -rf vendor/bundle/ruby/2.5.0/cache/*.gem \
    && find vendor/bundle/ruby/2.5.0/gems/ -name "*.c" -delete \
    && find vendor/bundle/ruby/2.5.0/gems/ -name "*.o" -delete
- Remove frontend node_modules and cache files.
rm -rf node_modules tmp/cache app/assets vendor/assets spec
The above can be combined with multi-stage builds.
Golang
After using multi-stage builds, a Golang image contains only a single binary. At that point, further optimization is only possible by compressing the binary itself with tools such as upx.
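Before compressing the binary, two other steps are often tried: stripping debug information at link time and shipping the binary on `scratch`. The sketch below reuses the `app.go` layout from the multi-stage example above; it is not part of the original article and assumes the program does not need cgo.

```dockerfile
FROM golang:1.12 AS builder
WORKDIR /go/src/github.com/go/helloworld/
COPY app.go .
# CGO_ENABLED=0 produces a statically linked binary that needs no libc;
# -ldflags="-s -w" omits the symbol table and DWARF debug information.
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o app .

# scratch is empty: no shell, no package manager, no CA certificates.
FROM scratch
# Only needed if the program makes TLS connections.
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/src/github.com/go/helloworld/app /app
ENTRYPOINT ["/app"]
```

The trade-off is the one described for distroless: with no shell in the image, debugging inside the container becomes harder.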
Author of this article: mohuishou
Original link: https://lailin.xyz/post/51252.html
Copyright statement: Copyright belongs to the author, please indicate the source when reprinting!