Pair Programming with ChatGPT: Bash Retry with STDIN Replay
===========================================================

ChatGPT can be a very useful programming assistant. I’ve used it for debugging, as a documentation reference, and more recently for trying out random ideas I think I might find useful. Its conversational nature and correctability make it conducive to problem solving.

Unreliable services outside of your control are disruptive to automation. Sometimes the best you can do is retry while praying that the problem goes away after a few attempts. It would be useful to be able to wrap routines that depend on such services in a retry function.

A basic formulation of the problem is relatively straightforward to implement, but a trickier aspect to handle is retrying commands that accept input on STDIN. Properly retrying such a command requires capturing and replaying what is received on STDIN. This added complication made it feel like a fun challenge to try using ChatGPT as a pair-programming assistant.
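Ignoring the STDIN complication, the basic formulation might look something like the following sketch. This is illustrative only and not the script developed below; the function name is made up.

```shell
# retry_simple ATTEMPTS COMMAND [ARGS...] -- illustrative sketch only.
retry_simple() {
  local attempts="$1"
  shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    echo "attempt ${i}/${attempts} failed" >&2
  done
  return 1
}

retry_simple 3 false || echo "gave up with status $?"
```

The hard part is entirely absent here: each invocation of `"$@"` inherits the function's own STDIN, so a command that consumes its input on the first attempt leaves nothing for the second.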

Before highlighting the progression of the effort, here is a direct link to the final implementation: retry.sh

Here’s the original problem statement I gave it:

Hello! I need a bash script that provides a function called retry. Its first argument is the number of times to retry a command, with 0 meaning to retry forever. The remaining arguments are a command and its arguments to retry. The tricky part about this retry function is that I need it to be able to have buffered what it receives on STDIN, such that it can replay it to the command on each retry. For example, if I write retry 4 bash -e < <(echo "date && false"), the inner command (bash -e) will receive the text date && false every time it’s retried, even though the retry function only received that text on its STDIN once.

It cheerfully produced the following output:

#!/bin/bash

retry() {
  local retries="$1"
  shift
  local cmd=("$@")

  local tmp_input_file
  tmp_input_file="$(mktemp)"
  cat >"${tmp_input_file}"

  local count=0
  local exit_code

  while [[ "${retries}" -eq 0 || "${count}" -lt "${retries}" ]]; do
    "${cmd[@]}" <"${tmp_input_file}"
    exit_code="$?"

    if [[ "${exit_code}" -eq 0 ]]; then
      break
    else
      count=$((count + 1))
    fi
  done

  rm -f "${tmp_input_file}"

  return "${exit_code}"
}

I was pleasantly surprised to find that it behaved correctly for the example I provided!

➜  ~ retry 4 bash -e < <(echo "date && false") 
Wed Apr  5 07:20:24 PM EDT 2023
Wed Apr  5 07:20:24 PM EDT 2023
Wed Apr  5 07:20:24 PM EDT 2023
Wed Apr  5 07:20:24 PM EDT 2023
➜  ~ echo $?
1

At this point I started wondering about a handful of corner cases, and how many of them it could help me address. The first one I pursued was not needing to buffer all of STDIN before starting the first attempt.

Is it possible to modify this to not need to wait for all of stdin to be received before running the command?

Its idea was to replace the temporary file with a named pipe fed by a background job, which did implement the behavior I specified. However, in doing so it broke replay: reattempts could no longer receive the same STDIN data as the first attempt, because data in a named pipe is consumed as it is read.

@@ -5,25 +5,28 @@ retry() {
   shift
   local cmd=("$@")
 
-  local tmp_input_file
-  tmp_input_file="$(mktemp)"
-  cat >"${tmp_input_file}"
+  local fifo
+  fifo="$(mktemp -u)"
+  mkfifo "${fifo}"
+
+  cat >"${fifo}" &
 
   local count=0
   local exit_code
 
   while [[ "${retries}" -eq 0 || "${count}" -lt "${retries}" ]]; do
-    "${cmd[@]}" <"${tmp_input_file}"
+    "${cmd[@]}" <"${fifo}"
     exit_code="$?"
 
     if [[ "${exit_code}" -eq 0 ]]; then
       break
     else
       count=$((count + 1))
+      echo >&2 "Retrying... (${count}/${retries})"
     fi
   done
 
-  rm -f "${tmp_input_file}"
+  rm -f "${fifo}"
 
   return "${exit_code}"
 }
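The flaw is easy to demonstrate in isolation: bytes read from a named pipe are consumed, so only the first reader ever sees them. A minimal sketch:

```shell
fifo="$(mktemp -u)"
mkfifo "${fifo}"

echo "date && false" > "${fifo}" &  # writer blocks until a reader opens
first="$(cat "${fifo}")"            # the first read drains the pipe
wait                                # the background writer has now exited

echo "first read: ${first}"
# A second cat of the fifo would block forever waiting on a new writer;
# the data consumed above is gone, so a retry would see nothing.
rm -f "${fifo}"
```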

When I challenged it on this, it acknowledged the error and switched back from a named pipe to a temporary file (retaining the background job to feed it), and used tail to feed the contents of the file into the command.

Will this be able to replay everything that was received on stdin? I forgot if a named pipe can be read from multiple times without the data that has been read becoming consumed.

@@ -5,28 +5,30 @@ retry() {
   shift
   local cmd=("$@")
 
-  local fifo
-  fifo="$(mktemp -u)"
-  mkfifo "${fifo}"
-
-  cat >"${fifo}" &
+  local tmp_input_file
+  tmp_input_file="$(mktemp)"
+  
+  # Background process to write stdin to temporary file
+  cat > "${tmp_input_file}" &
 
   local count=0
   local exit_code
 
   while [[ "${retries}" -eq 0 || "${count}" -lt "${retries}" ]]; do
-    "${cmd[@]}" <"${fifo}"
+    # Read from the temporary file and feed it to the command
+    "${cmd[@]}" < <(tail -n +1 -f "${tmp_input_file}")
     exit_code="$?"
 
     if [[ "${exit_code}" -eq 0 ]]; then
       break
     else
       count=$((count + 1))
-      echo >&2 "Retrying... (${count}/${retries})"
     fi
   done
 
-  rm -f "${fifo}"
+  # Clean up background process and temporary file
+  wait
+  rm -f "${tmp_input_file}"
 
   return "${exit_code}"
 }

I also asked it about correctly handling interrupts such as Ctrl-C. It reasonably added a cleanup function to a trap.

Will this clean up the background cat process if interrupted with ctrl-c? Will the retry function itself exit on ctrl-c or a termination signal? It should.

@@ -10,6 +10,15 @@ retry() {
 
   # Background process to write stdin to temporary file
   cat > "${tmp_input_file}" &
+  bg_pid=$!
+
+  cleanup() {
+    kill "${bg_pid}" 2>/dev/null
+    rm -f "${tmp_input_file}"
+  }
+
+  # Trap to handle signals and clean up
+  trap cleanup EXIT INT TERM
 
   local count=0
   local exit_code
@@ -27,8 +36,7 @@ retry() {
   done
 
   # Clean up background process and temporary file
-  wait
-  rm -f "${tmp_input_file}"
+  cleanup
 
   return "${exit_code}"
 }
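The trap pattern is worth isolating: registering cleanup for EXIT as well as INT and TERM means the temporary file is removed on normal completion, on ctrl-c, and on kill alike. A minimal sketch:

```shell
tmp_file="$(mktemp)"
export tmp_file

# Run the pattern in a child bash so we can observe, from outside,
# that the EXIT trap fired and removed the file.
bash -c '
  cleanup() { rm -f "${tmp_file}"; }
  trap cleanup EXIT INT TERM
  echo "pretend work" > "${tmp_file}"
'

if [[ -e "${tmp_file}" ]]; then echo "leaked"; else echo "cleaned up"; fi
```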

When I tested this version, I found that the script did not exit when the command succeeded. I suspected the use of tail was responsible.
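The suspicion is easy to confirm: cat exits at end of file, while tail -f keeps waiting for the file to grow and never reports EOF. A quick reproduction (the two-second timeout is only for the demo):

```shell
tmp="$(mktemp)"
echo "hello" > "${tmp}"

timeout 2 cat "${tmp}" > /dev/null
cat_status=$?   # 0: cat saw EOF and exited on its own

tail_status=0
timeout 2 tail -n +1 -f "${tmp}" > /dev/null || tail_status=$?
# 124: timeout had to kill tail, which would otherwise follow forever

rm -f "${tmp}"
echo "cat: ${cat_status}, tail: ${tail_status}"
```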

This works when the command to retry fails. When the command succeeds, the script hangs and never exits, which is undesirable. I think this is because the tail command is not terminating. I’m not sure why it terminates when the command fails, but not when the command succeeds.

Its next idea was to switch to using cat instead of tail, and to nest it in a loop to continuously read data from the temporary file.

@@ -25,14 +25,18 @@ retry() {
 
   while [[ "${retries}" -eq 0 || "${count}" -lt "${retries}" ]]; do
     # Read from the temporary file and feed it to the command
-    "${cmd[@]}" < <(tail -n +1 -f "${tmp_input_file}")
-    exit_code="$?"
-
-    if [[ "${exit_code}" -eq 0 ]]; then
-      break
-    else
-      count=$((count + 1))
-    fi
+    while ! "${cmd[@]}" < <(cat "${tmp_input_file}"); do
+      exit_code="$?"
+
+      if [[ "${exit_code}" -eq 0 ]]; then
+        break 2
+      else
+        count=$((count + 1))
+        if [[ "${retries}" -ne 0 && "${count}" -ge "${retries}" ]]; then
+          break 2
+        fi
+      fi
+    done
   done
 
   # Clean up background process and temporary file

This is a confusing change, as the value of having two nested loops is unclear. It also inverts the logic: the inner loop body only runs when the command fails, at which point $? holds the status of the negated while ! condition, so exit_code is captured as 0; when the command succeeds, the body never runs and the outer loop simply restarts. Accordingly, I observed that it never exits when the command succeeds, and reports success when a command fails.

This updated version results in strange behavior. echo "date && false" | retry 4 bash -e produces one line of output before exiting with status 0, when I expected four lines of output (for four attempts) and an exit status of 1. When I run echo "date && true" | retry 4 bash -e, it retries indefinitely and never returns.

The next attempt reverted back to using tail, depending on the cleanup function to kill the background cat process, which was supposed to cause tail to exit.

@@ -25,18 +25,14 @@ retry() {
 
   while [[ "${retries}" -eq 0 || "${count}" -lt "${retries}" ]]; do
     # Read from the temporary file and feed it to the command
-    while ! "${cmd[@]}" < <(cat "${tmp_input_file}"); do
-      exit_code="$?"
-
-      if [[ "${exit_code}" -eq 0 ]]; then
-        break 2
-      else
-        count=$((count + 1))
-        if [[ "${retries}" -ne 0 && "${count}" -ge "${retries}" ]]; then
-          break 2
-        fi
-      fi
-    done
+    "${cmd[@]}" < <(tail -n +1 -f "${tmp_input_file}")
+    exit_code="$?"
+
+    if [[ "${exit_code}" -eq 0 ]]; then
+      break
+    else
+      count=$((count + 1))
+    fi
   done
 
   # Clean up background process and temporary file

This change also makes no sense: because the command blocks in the foreground, cleanup never runs, and so the background cat process is never killed. I decided that I needed to help it out a bit, and proposed that the background process stream STDIN to the command line by line through a named pipe, exiting if it detects that the command running in the background has exited.
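The liveness check I had in mind is the kill -0 idiom: signal 0 is never actually delivered, but the exit status of kill reports whether the process still exists. A minimal sketch:

```shell
sleep 5 &
pid=$!

# Signal 0 is not sent; kill only checks that the process exists.
kill -0 "${pid}" 2>/dev/null && alive_before=yes || alive_before=no

kill "${pid}"
wait "${pid}" 2>/dev/null || true  # reap it; wait reports the TERM

kill -0 "${pid}" 2>/dev/null && alive_after=yes || alive_after=no
echo "before: ${alive_before}, after: ${alive_after}"
```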

The false test works correctly, but the true test hangs after printing the expected one line of output. It seems like the tail command is still not terminating, preventing progression of the script from reaching the cleanup function that would kill the background cat command. Instead of using tail, another function may need to be introduced that reads from tmp_input_file, checking if the command receiving input has terminated or not in between each line read and sent. Maybe a named pipe could be used by the function that reads from tmp_input_file to stream stdin to the command? I’m not sure if this would work though, or if it’s the best approach.

The implementation it generated served as the starting point for my own manual iterations on the program.

@@ -1,5 +1,20 @@
 #!/bin/bash
 
+stream_input() {
+  local tmp_input_file="$1"
+  local cmd_pid="$2"
+  local fifo="$3"
+
+  while kill -0 "${cmd_pid}" 2>/dev/null; do
+    while read -r line; do
+      echo "${line}" > "${fifo}"
+      if ! kill -0 "${cmd_pid}" 2>/dev/null; then
+        break
+      fi
+    done < "${tmp_input_file}"
+  done
+}
+
 retry() {
   local retries="$1"
   shift
@@ -12,9 +27,14 @@ retry() {
   cat > "${tmp_input_file}" &
   bg_pid=$!
 
+  local fifo
+  fifo="$(mktemp -u)"
+  mkfifo "${fifo}"
+
   cleanup() {
     kill "${bg_pid}" 2>/dev/null
     rm -f "${tmp_input_file}"
+    rm -f "${fifo}"
   }
 
   # Trap to handle signals and clean up
@@ -24,8 +44,15 @@ retry() {
   local exit_code
 
   while [[ "${retries}" -eq 0 || "${count}" -lt "${retries}" ]]; do
-    # Read from the temporary file and feed it to the command
-    "${cmd[@]}" < <(tail -n +1 -f "${tmp_input_file}")
+    # Start the command
+    "${cmd[@]}" < "${fifo}" &
+    cmd_pid=$!
+
+    # Stream input to the command
+    stream_input "${tmp_input_file}" "${cmd_pid}" "${fifo}" &
+
+    # Wait for the command to finish
+    wait "${cmd_pid}"
     exit_code="$?"
 
     if [[ "${exit_code}" -eq 0 ]]; then

At this point I decided I had reached diminishing returns: the program was becoming more complex than ChatGPT was capable of usefully understanding. The fact that, even after being pointed in a solid direction, it generated a plainly broken implementation was good evidence of this (its stream_input will repeatedly dump the entire contents of tmp_input_file into the named pipe).

The final implementation ended up being significantly more complicated than I was originally expecting this problem to require. It is included below with explanations.

retry.sh

#! /usr/bin/env bash

# usage: retry ATTEMPTS COMMAND [...]
#
# This script provides a retry function capable of capturing and replaying data
# from stdin to the command being retried, so that each attempt receives the
# input.  The data from stdin is streamed to the command immediately, it is not
# pre-buffered, and thus will not lock up if the input from stdin never ends.
#
# If ATTEMPTS is 0, COMMAND is retried forever until it succeeds.

# Suppress some of the job control messages.  Unfortunately it is impossible to
# suppress all of them.
set +m

retry() {
    local retries="$1"
    shift
    local cmd=("$@")

    # Background process which reads stdin data from stdin_reader through
    # in_fifo, buffers it in replay_file, and writes it to the stdin of each
    # command attempt through out_fifo.
    stream_input() {
        local cmd_pid="$1"
        local stdin_reader_pid="$2"
        local in_fifo="$3"
        local out_fifo="$4"
        local replay_file="$5"

        # Hold out_fifo open for writing.  The command might otherwise exit
        # prematurely if the pipe is closed by a bash I/O redirect operator in
        # between individual line writes.
        exec 6>"${out_fifo}"

        while read -r line; do
            echo "${line}" >&6
            if ! kill -0 "${cmd_pid}" 2>/dev/null; then
                # Stop replaying from replay_file to out_fifo if the command
                # has exited.
                break
            fi
        done < "${replay_file}"

        # This is used to signal if data is done being sent over in_fifo by
        # stdin_reader.
        local final_read=false

        exec 5<"${in_fifo}"

        while true; do
            if read -t 0 -r -u 5; then
                read -r -u 5 line
                echo "${line}" >> "${replay_file}"
                echo "${line}" >&6
            elif ${final_read}; then
                break
            fi

            if ! kill -0 "${cmd_pid}" 2>/dev/null; then
                # Stop passing lines from in_fifo to out_fifo if the command
                # has exited.
                break
            elif ! kill -0 "${stdin_reader_pid}" 2>/dev/null; then
                # Since stdin_reader has exited, nothing new will come over
                # in_fifo.  Signal the next iteration of the loop to finish
                # reading whatever remains in the pipe and exit.
                final_read=true
            fi
        done

        # Close out_fifo and in_fifo.
        exec 5<&-
        exec 6>&-
    }

    # Create a named pipe to receive data from stdin asynchronously. Bind the
    # pipe to &3 to keep it open on both ends.  It needs to be held open for
    # writing because subsequent instances of stream_input will block on
    # opening it for reading if stdin_reader has exited, which would otherwise
    # have been the only thing holding it open.  It needs to be held open for
    # reading to prevent holding it open for writing from blocking.
    local in_fifo
    in_fifo="$(mktemp -u)"
    mkfifo "${in_fifo}"
    exec 3<>"${in_fifo}"
    cat > "${in_fifo}" &
    stdin_reader_pid=$!

    # The replay file buffers data received on stdin so that it can be passed
    # to reattempts of the command.
    local replay_file
    replay_file="$(mktemp)"

    cleanup() {
        # Kill all background jobs
        # https://unix.stackexchange.com/a/609300
        for j in $(jobs | awk '{gsub("[^0-9]","",$1);printf "%%%s\n", $1}'); do
            kill $j
        done
        [[ -e "${in_fifo}" ]] && rm -f "${in_fifo}"
        [[ -e "${replay_file}" ]] && rm -f "${replay_file}"
        [[ -e "${out_fifo}" ]] && rm -f "${out_fifo}"
        wait
    }

    trap cleanup EXIT INT TERM

    local count=0
    local exit_code

    while ((retries == 0 || count < retries)); do
        # Create a named pipe to receive buffered and replayed data from stdin
        # from stream_input.
        local out_fifo
        out_fifo="$(mktemp -u)"
        mkfifo "${out_fifo}"

        "${cmd[@]}" < "${out_fifo}" &
        cmd_pid=$!

        stream_input \
          "${cmd_pid}" \
          "${stdin_reader_pid}" \
          "${in_fifo}" \
          "${out_fifo}" \
          "${replay_file}" &
        stream_input_pid=$!

        wait "${cmd_pid}"
        exit_code="$?"

        rm -f "${out_fifo}"

        if ((exit_code == 0)); then
            break
        else
            count=$((count + 1))
        fi
    done

    cleanup

    return "${exit_code}"
}

Now that I have a functional implementation, it is easy to see in hindsight that bash was likely not the best choice for this. It is, if nothing else, an interesting example of the kinds of limits you can run into when using ChatGPT for programming problem-solving.

· bash, chatgpt