y0ngb1n

Aben Blog

Reprint / Docker Image Slimming & Optimization

Why do we still need to slim down Docker images today when storage is so cheap?

Advantages of Small Images#

  1. Accelerate Build/Deployment
    Although storage is relatively cheap, network IO is limited. With limited bandwidth, deploying a 1GB image can take minutes while a 10MB image takes seconds. This time is especially precious when a failure occurs and services are rescheduled to other nodes.
  2. Improve Security, Reduce Attack Surface
    Smaller images contain fewer unnecessary programs, which greatly reduces the attack surface.
  3. Reduce Storage Overhead

Principles for Creating Small Images#

  1. Choose the smallest base image.
  2. Reduce layers and remove unnecessary files.
    When building real images, blindly merging layers is not advisable. Make full use of Docker's build cache and extract common layers to speed up builds.
    • Separate dependency files and actual code files into different layers.
    • Teams and companies should share common base images.
  3. Use multi-stage builds.
    The environment needed at build time often differs from the one needed at runtime. For example, a program written in Go only needs its binary to run, while a Node.js application may only need a few bundled JS files at runtime, without the thousands of dependencies in node_modules.

Base Images#

  • distroless

    "Distroless" images contain only your application and its runtime dependencies. They do not contain package managers, shells or any other programs you would expect to find in a standard Linux distribution.

    distroless is a project launched by Google. Its images contain only the runtime environment, with no package manager, shell, or other programs. If your program has no other dependencies, this is a good choice.

  • alpine

    Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.

    alpine is a security-oriented Linux distribution based on musl libc and busybox. Although it is under 10MB, it includes a package manager and a shell, which are very useful for everyday use and debugging. Note, however, that because alpine uses the smaller musl libc instead of glibc, some applications may not run on it without being recompiled.

  • scratch
    scratch is a blank image, generally used for building base images. For example, the Dockerfile for the alpine image starts from scratch.

    FROM scratch
    ADD alpine-minirootfs-20190228-x86_64.tar.gz /
    CMD ["/bin/sh"]
    
  • busybox

In general, distroless is more secure, but in practice adding dependencies and debugging can be difficult. alpine is small, comes with a package manager, and matches common usage habits, but musl libc may cause compatibility issues. I generally choose alpine as the base image. In addition, commonly used Debian-based images on Docker Hub also offer slim variants that contain only basic functionality.
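
As a minimal sketch of the distroless option, the Dockerfile below assumes a Go program in a hypothetical app.go that can be built as a static binary. The tag gcr.io/distroless/static-debian12 is an assumption taken from the distroless project's published images and may differ by release.

```dockerfile
# Build stage: compile a static binary (app.go is a hypothetical source file)
FROM golang:1.12 AS builder
WORKDIR /go/src/github.com/go/helloworld/
COPY app.go .
# CGO_ENABLED=0 avoids linking against glibc, which the static distroless image does not ship
RUN CGO_ENABLED=0 go build -o /app .

# Final stage: no shell and no package manager, only the binary
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
```

Because the final image has no shell, docker exec cannot open an interactive session in it. This is the debugging trade-off described above.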

Base Image Comparison#

Here we pull the base images directly and compare their sizes. alpine is only about 5MB, roughly one-twentieth the size of debian.

REPOSITORY  TAG       IMAGE ID            CREATED             SIZE
alpine      latest    5cb3aa00f899        3 weeks ago         5.53MB
debian      latest    0af60a5c6dd0        3 weeks ago         101MB
ubuntu      18.04     47b19964fb50        7 weeks ago         88.1MB
ubuntu      latest    47b19964fb50        7 weeks ago         88.1MB
alpine      3.8       3f53bb00af94        3 months ago        4.41MB

The difference may not look significant at first glance, but in practice the official language images publish tags built on different base images. Taking the ruby image as an example, the default latest image is 881MB, while the alpine variant is under 50MB, which is a considerable difference.

REPOSITORY  TAG      IMAGE ID            CREATED             SIZE
ruby        latest   a5d26127d8d0        4 weeks ago         881MB
ruby        alpine   8d8f7d19d1fa        4 weeks ago         47.8MB
ruby        slim     58dd4d3c99da        4 weeks ago         125MB

Reduce Layers, Remove Unnecessary Files#

  1. Delete files in the same RUN instruction (the same layer) that created them.

    # dockerfile 1
    FROM alpine
    RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip
    
    # dockerfile 2
    FROM alpine
    RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip
    RUN rm 1.0.0.zip
    
    # dockerfile 3
    FROM alpine
    RUN wget https://github.com/mohuishou/scuplus-wechat/archive/1.0.0.zip && rm 1.0.0.zip
    
    REPOSITORY  TAG  IMAGE ID            CREATED              SIZE
    test        3    351a80e99c22        5 seconds ago        5.53MB
    test        2    ad27e625b8e5        49 seconds ago       6.1MB
    test        1    165e2e0df1d3        About a minute ago   6.1MB
    

    We can see that images 1 and 2 are the same size, while image 3 is about 0.6MB smaller. This is because Docker creates a layer for almost every instruction, and the lower layers are read-only. When a file in a lower layer is deleted, AUFS uses a whiteout mechanism: it creates a hidden whiteout file in the writable directory of the upper layer. Deleting a file from a previous layer therefore only hides it, and the data still remains in the image.

  2. Use single-line commands.
    Besides keeping deletions in the same instruction, it is best to chain related commands, such as dependency installation, into a single RUN instruction to reduce the final number of layers.

  3. Separate dependency packages and source code programs, fully utilize layer caching.
    This is a best practice. In real development, dependencies rarely change, while the source code changes frequently. Suppose the code is only 10MB but the dependencies are 1GB: if a single COPY instruction copies everything, every code change invalidates that layer's cache, and time is wasted rebuilding the layer and pushing it to the image registry. With separate COPY instructions, each push only contains the frequently modified code layer, not the dependencies.

  4. Use .dockerignore.
    When using Git, we can ignore files with .gitignore. Likewise, during docker build we can use .dockerignore to exclude files from the build context. This keeps unnecessary files out of the image and also improves security by preventing configuration files from being packaged into it.
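
The points above can be sketched in one Dockerfile. This is only an illustration for a hypothetical Node.js project: the file names package.json, package-lock.json, and index.js are assumptions, and it also assumes that a .dockerignore excludes the local node_modules directory from the build context.

```dockerfile
FROM node:lts-alpine
WORKDIR /app

# Dependency manifests change rarely, so this layer and the install below stay cached
COPY package.json package-lock.json ./

# Install and clean up in a single RUN so the npm cache never lands in a layer
RUN npm ci --omit=dev && npm cache clean --force

# Source code changes often; only the layers from here on are rebuilt and pushed
COPY . .
CMD ["node", "index.js"]
```

With this ordering, a code-only change reuses the cached dependency layer, so the rebuild and the push only involve the final COPY layer.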

Multi-Stage Builds#

Multi-stage builds are also a way to reduce layers. Through multi-stage builds, the final image can contain only the executable file generated at the end and the necessary runtime dependencies, greatly reducing the image size.

Taking Go as an example, only the compiled binary is needed at runtime. The Go toolchain, its packages, and the source files are unnecessary in the final image, yet they are essential during compilation. This is where multi-stage builds help reduce the final image size.

# Use golang image as builder image
FROM golang:1.12 AS builder
WORKDIR /go/src/github.com/go/helloworld/
COPY app.go .
# Disable cgo so the binary is statically linked and runs on musl-based alpine
RUN CGO_ENABLED=0 go build -o app .

# After compilation, use alpine image as the final base image
FROM alpine:latest AS prod
RUN apk --no-cache add ca-certificates
WORKDIR /root/

# Copy the compiled binary file from the builder
COPY --from=builder /go/src/github.com/go/helloworld/app .
CMD ["./app"]
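
The same pattern covers the Node.js case mentioned earlier, where the runtime only needs the bundled files. The following is a rough sketch for a hypothetical frontend project: it assumes that package.json defines a build script which writes its output to dist/, and that nginx serves the result. None of these names come from the original article.

```dockerfile
# Build stage: full toolchain and node_modules
# (assumes a "build" script that writes its output to dist/)
FROM node:lts-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Final stage: only the bundled static files, served by nginx
FROM nginx:alpine AS prod
COPY --from=builder /app/dist /usr/share/nginx/html
```

The node_modules directory exists only in the builder stage and never reaches the final image.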

Due to the length of this article, I will not elaborate on multi-stage builds here. For details, please refer to: Multi-Stage Builds

Tips and Tricks#

  1. Use dive to view the layers of Docker images, which can help you analyze and reduce image size.

  2. Use docker-slim to automatically help you reduce image size, which is particularly useful for web applications.

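  3. Skip or remove unneeded dependencies when installing software.

    # ubuntu: skip recommended packages
    apt-get install -y --no-install-recommends ...
    
    # alpine: group build-only packages under a virtual name, then delete them
    apk add --no-cache --virtual build-dependencies ... && ... && apk del build-dependencies
    
    # centos: clean the package cache in the same command
    yum install -y ... && yum clean all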
    
  4. Use the experimental --squash option of docker build to merge layers (not recommended).

  5. Use docker-squash to compress layers.
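
For tip 3, a complete RUN instruction on a Debian-based image might look like the sketch below. The packages curl and ca-certificates are placeholders chosen only for illustration.

```dockerfile
FROM debian:stable-slim

# Update, install without recommended extras, and drop the apt package lists,
# all in one RUN so the lists are never stored in a layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl ca-certificates \
    && rm -rf /var/lib/apt/lists/*
```

Removing /var/lib/apt/lists in a later RUN would not shrink the image, for the same whiteout reason described in the section on removing files.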

Examples from Different Languages#

Ruby (Rails)#

  1. Only install the dependencies needed for production.

  2. Remove unnecessary dependency files.

    # Install production gems, then remove unneeded files (cached *.gem, *.o, *.c)
    bundle install --without development:test:assets -j4 --retry 3 --path=vendor/bundle \
        && rm -rf vendor/bundle/ruby/2.5.0/cache/*.gem \
        && find vendor/bundle/ruby/2.5.0/gems/ -name "*.c" -delete \
        && find vendor/bundle/ruby/2.5.0/gems/ -name "*.o" -delete
    
  3. Remove frontend node_modules and cache files.

    rm -rf node_modules tmp/cache app/assets vendor/assets spec
    

The above content can be combined with multi-stage builds.

Golang#

With multi-stage builds, the final Go image contains only a single binary. Further optimization is then limited to shrinking that binary, for example by compressing it with a tool such as upx.
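
A possible way to add upx to the multi-stage build is sketched below. It assumes that the upx package is available in the Alpine release underlying the builder image, and it reuses the hypothetical app.go from the multi-stage example.

```dockerfile
FROM golang:1.12-alpine AS builder
# Assumption: the upx package exists in this Alpine release's repositories
RUN apk add --no-cache upx
WORKDIR /go/src/github.com/go/helloworld/
COPY app.go .
# Build a static binary, then compress it in place within the same layer
RUN CGO_ENABLED=0 go build -o app . && upx app

FROM alpine:latest AS prod
WORKDIR /root/
COPY --from=builder /go/src/github.com/go/helloworld/app .
CMD ["./app"]
```

A upx-packed binary unpacks itself at startup, so the smaller image comes at the cost of a slightly slower start.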

Author of this article: mohuishou
Original link: https://lailin.xyz/post/51252.html
Copyright statement: Copyright belongs to the author, please indicate the source when reprinting!
