Eliminating Code Duplication in Dockerfiles with Stage Name Aliasing
====================================================================

Dockerfiles offer few traditional control flow constructs, which encourages code duplication. Duplication increases the odds of errors in later updates, when a developer changes one copy but forgets to update the others. Build arguments offer an alternative: in a multi-stage build, they can select which stage serves as the base of another stage. This lets you design Dockerfiles with built-in conditionals in which each branch is defined as its own stage, so shared code lives once at the point where the branches meet instead of being copied into completely independent stages.
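The selection mechanism on its own can be sketched in a few lines. This is a minimal illustration rather than part of the jq example: the VARIANT argument, the variant-a and variant-b stage names, and the busybox base image are all assumptions chosen to keep the build fast.

```dockerfile
# syntax=docker/dockerfile:1

# Hypothetical switch. It is declared before the first FROM so that
# FROM lines are allowed to reference it.
ARG VARIANT=a

# Branch A (stage name and file contents are illustrative only).
FROM busybox AS variant-a
RUN echo "built via branch a" > /result.txt

# Branch B.
FROM busybox AS variant-b
RUN echo "built via branch b" > /result.txt

# The build argument decides which branch this stage continues from.
FROM variant-${VARIANT} AS final
CMD ["cat", "/result.txt"]
```

Building with --build-arg VARIANT=b should make the final stage continue from variant-b, and BuildKit skips the branch that nothing depends on, so the unselected stage costs no build time.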

One application of this is choosing between compiling a dependency and downloading a pre-built binary. Downloading a pre-built binary is useful for testing that binary prior to release, while compiling is useful for debugging the dependency on a local development machine. Keeping a stage that can compile the dependency alongside its runtime stage gives developers a standardized, declarative build environment and toolchain, making results reproducible from one developer to the next.

However, the lack of support for build argument evaluation in COPY --from directives makes it hard to consolidate equivalent logic from these branched stages into a single downstream stage. Here’s an example Dockerfile for jq, which is either downloaded or built from source depending on a build argument; the binary is then copied into a runtime layer containing only the dependencies required for execution.

# syntax=docker/dockerfile:1.5

ARG JQ_SOURCE=download

FROM debian:bullseye AS jq-build
RUN apt-get update && apt-get install -y build-essential libonig-dev
ADD https://github.com/stedolan/jq/releases/download/jq-1.6/jq-1.6.tar.gz \
  jq-1.6.tar.gz
RUN <<EOF
tar xf jq-1.6.tar.gz
cd jq-1.6
./configure
make
make install
EOF

FROM debian:bullseye-slim AS jq-build-run
RUN apt-get update && apt-get install -y libonig5
COPY --from=jq-build /usr/local/bin/jq /usr/local/bin/jq

FROM debian:bullseye AS jq-download
ADD https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64 \
  /usr/local/bin/jq
RUN chmod +x /usr/local/bin/jq

FROM debian:bullseye-slim AS jq-download-run
RUN apt-get update && apt-get install -y libonig5
COPY --from=jq-download /usr/local/bin/jq /usr/local/bin/jq

FROM jq-${JQ_SOURCE}-run AS jq-run
ENTRYPOINT [ "/usr/local/bin/jq" ]

Both the jq-build and jq-download stages produce an executable at /usr/local/bin/jq that can be copied into dependent stages. The jq-*-run stages that depend on them are duplicates of each other; the only difference is the stage from which the binary is copied.

graph LR
  jq-build -. COPY --from .-> jq-build-run
  jq-download -. COPY --from .-> jq-download-run
  jq-build-run -- FROM --> jq-run
  jq-download-run -- FROM --> jq-run

As documented in moby/moby#34482, variables are not evaluated when they appear in a COPY --from= value. An attempt to consolidate the jq-build-run, jq-download-run, and jq-run stages using a build argument directly would not work:

FROM debian:bullseye-slim AS jq-run
RUN apt-get update && apt-get install -y libonig5
COPY --from=jq-${JQ_SOURCE} /usr/local/bin/jq /usr/local/bin/jq
ENTRYPOINT [ "/usr/local/bin/jq" ]

However, because build arguments are evaluated in FROM statements, a stage can be conditionally aliased to a common name, which can be used as a dependency in both subsequent FROM and COPY --from= statements. jq-source is used as an alias stage in this modified version.

# syntax=docker/dockerfile:1.5

ARG JQ_SOURCE=download

FROM debian:bullseye AS jq-build
RUN apt-get update && apt-get install -y build-essential libonig-dev
ADD https://github.com/stedolan/jq/releases/download/jq-1.6/jq-1.6.tar.gz \
  jq-1.6.tar.gz
RUN <<EOF
tar xf jq-1.6.tar.gz
cd jq-1.6
./configure
make
make install
EOF

FROM debian:bullseye AS jq-download
ADD https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64 \
  /usr/local/bin/jq
RUN chmod +x /usr/local/bin/jq

FROM jq-${JQ_SOURCE} AS jq-source

FROM debian:bullseye-slim AS final
RUN apt-get update && apt-get install -y libonig5
COPY --from=jq-source /usr/local/bin/jq /usr/local/bin/jq
ENTRYPOINT [ "/usr/local/bin/jq" ]

This produces a dependency graph with fewer stages and no code duplication.

graph LR
  jq-build -- FROM --> jq-source
  jq-download -- FROM --> jq-source
  jq-source -. COPY --from .-> final

Using the JQ_SOURCE argument as a conditional switch, this image can be built in two different ways:

docker buildx build --build-arg JQ_SOURCE=download - < Dockerfile
docker buildx build --build-arg JQ_SOURCE=build - < Dockerfile

In the first docker buildx build command, jq-source resolves to jq-download, so the COPY statement in the final stage behaves as if it were written:

COPY --from=jq-download /usr/local/bin/jq /usr/local/bin/jq

And likewise for the second build command:

COPY --from=jq-build /usr/local/bin/jq /usr/local/bin/jq

This demonstrates how aliasing stage names computed from build arguments can implement sophisticated conditional logic about how stages depend on each other, avoiding the need to write multiple Dockerfiles or to duplicate code within one.
