# Reconnect with Exponential Backoff
This tutorial builds a TCP client that connects to a server and automatically reconnects with exponential backoff when the connection fails. You’ll learn how to combine timers with sockets for retry logic and how to use stop tokens for graceful shutdown.
Code snippets assume:

```cpp
#include <boost/corosio/endpoint.hpp>
#include <boost/corosio/io_context.hpp>
#include <boost/corosio/tcp_socket.hpp>
#include <boost/corosio/timer.hpp>
#include <boost/capy/buffers.hpp>
#include <boost/capy/cond.hpp>
#include <boost/capy/ex/run_async.hpp>
#include <boost/capy/task.hpp>

// Standard headers used by the snippets below
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <stop_token>
#include <thread>

namespace corosio = boost::corosio;
namespace capy = boost::capy;
```
## Overview
Client applications often need to maintain a persistent connection to a server. When the server is temporarily unavailable — during a restart, a network blip, or a deployment — the client should retry rather than give up immediately. Retrying too aggressively wastes resources and can overwhelm a recovering server, so the delay between attempts should grow over time.
Exponential backoff solves this: start with a short delay, double it on each failure, and cap it at a maximum. This gives fast recovery when the outage is brief and backs off gracefully when it isn’t.
This tutorial demonstrates:

- Separating the backoff policy (pure state) from the mechanism (timer wait)
- Using `timer` for inter-attempt delays
- Graceful cancellation via stop tokens
- Why `io_context::stop()` alone is not sufficient for coroutine shutdown
## The Backoff Policy
The delay logic is pure computation — no I/O, no coroutines. A simple value type tracks the current delay, doubles it on each call, and caps it at a configured maximum:
```cpp
struct exponential_backoff
{
    using duration = std::chrono::milliseconds;

private:
    duration initial_;
    duration delay_;
    duration max_;

public:
    exponential_backoff(duration initial, duration max) noexcept
        : initial_(initial)
        , delay_(initial)
        , max_(max)
    {
    }

    /// Return the current delay and advance to the next.
    duration next() noexcept
    {
        auto current = (std::min)(delay_, max_);
        delay_ = (std::min)(delay_ * 2, max_);
        return current;
    }

    /// Restart the sequence from the initial delay.
    void reset() noexcept
    {
        delay_ = initial_;
    }
};
```
With an initial delay of 500ms and a 30s cap, successive calls to `next()` produce:

```
500, 1000, 2000, 4000, 8000, 16000, 30000, 30000, …
```
Keeping the policy separate from the timer means it can be reused in any context — synchronous retries, tests, or logging — without pulling in async machinery.
## Session Coroutine
Once connected, the client reads data until the peer disconnects:
```cpp
capy::task<>
do_session(corosio::tcp_socket& sock)
{
    char buf[4096];
    for (;;)
    {
        auto [ec, n] =
            co_await sock.read_some(capy::mutable_buffer(buf, sizeof buf));
        if (ec)
            break;
        std::cout.write(buf, static_cast<std::streamsize>(n));
        std::cout.flush();
    }
}
```
This is the same read loop you would find in any echo client. The interesting part is what happens after it returns — the caller reconnects.
## Reconnection Loop
The retry loop ties everything together. On each failed connection it asks the backoff policy for the next delay, waits on a timer, and tries again:
```cpp
capy::task<>
connect_with_backoff(
    corosio::io_context& ioc,
    corosio::endpoint ep,
    exponential_backoff backoff,
    int max_attempts)
{
    corosio::tcp_socket sock(ioc);
    corosio::timer delay(ioc);
    int attempt = 0;
    for (;;)
    {
        ++attempt;
        auto [ec] = co_await sock.connect(ep);
        if (!ec)
        {
            std::cout << "Connected on attempt " << attempt << std::endl;
            co_await do_session(sock);

            // Peer disconnected — restart the retry sequence
            sock.close();
            backoff.reset();
            attempt = 0;
            continue;
        }
        std::cout << "Attempt " << attempt << " failed: "
            << ec.message() << "\n";
        sock.close();
        if (max_attempts > 0 && attempt >= max_attempts)
            co_return;

        // next() returns the current delay and doubles it for next time
        auto wait_for = backoff.next();
        std::cout << "Retrying in " << wait_for.count() << "ms\n";
        delay.expires_after(wait_for);
        auto [timer_ec] = co_await delay.wait();
        if (timer_ec == capy::cond::canceled)
        {
            std::cout << "Retry cancelled\n";
            co_return;
        }
    }
}
```
There are two exit conditions:

- Max attempts exhausted — the coroutine gives up.
- Timer cancelled — someone signaled the stop token, requesting graceful shutdown. The coroutine unwinds through normal control flow.

After a successful connection and subsequent disconnect, `backoff.reset()` restarts the delay sequence from the initial value.
## Graceful Shutdown with Stop Tokens
The key insight of this tutorial: `io_context::stop()` does not cancel pending operations. It only stops the event loop. Suspended coroutines are left in place and destroyed during `~io_context` without ever observing an error. This is by design — `stop()` is a pause that preserves state for a potential `restart()`.
For graceful shutdown where coroutines unwind through their own control flow, use a stop token:
```cpp
std::stop_source stop_src;

capy::run_async(ioc.get_executor(), stop_src.get_token())(
    connect_with_backoff(ioc, ep, backoff, 10));

// Later, from any thread:
stop_src.request_stop();
```
When the stop source is signaled:

- The timer's `wait()` returns `cond::canceled`.
- The coroutine checks the error and executes `co_return`.
- Local variables (`sock`, `delay`) are destroyed through normal unwinding.
- With no more outstanding work, `run()` returns.
- `~io_context` finds an empty heap — nothing to clean up.
Contrast with calling `stop()` directly:

- `run()` exits immediately.
- The coroutine remains suspended — it never sees an error.
- `~io_context` calls `h.destroy()` on the coroutine frame, bypassing its error-handling logic.
Both paths are safe (no leaks or crashes), but only the stop token path executes the coroutine’s own cleanup code.
| Mechanism | Coroutine sees cancellation? | Use case |
|---|---|---|
| Stop token (`request_stop()`) | Yes — operations return `cond::canceled` | Graceful shutdown |
| `io_context::stop()` | No — coroutines stay suspended | Pause and resume the event loop |
| `~io_context` | No — frames destroyed via `h.destroy()` | Final cleanup (after `stop()`) |
## Main Function
```cpp
int main(int argc, char* argv[])
{
    if (argc != 3)
    {
        std::cerr << "Usage: reconnect <ip-address> <port>\n";
        return EXIT_FAILURE;
    }

    corosio::ipv4_address addr;
    if (auto ec = corosio::parse_ipv4_address(argv[1], addr); ec)
    {
        std::cerr << "Invalid IP address: " << argv[1] << "\n";
        return EXIT_FAILURE;
    }
    auto port = static_cast<std::uint16_t>(std::atoi(argv[2]));

    corosio::io_context ioc;
    using namespace std::chrono_literals;
    exponential_backoff backoff(500ms, 30s);

    std::stop_source stop_src;
    capy::run_async(ioc.get_executor(), stop_src.get_token())(
        connect_with_backoff(ioc, corosio::endpoint(addr, port), backoff, 10));

    // Run the event loop on a background thread so main
    // can signal cancellation after a timeout.
    auto worker = std::jthread([&ioc] { ioc.run(); });
    std::this_thread::sleep_for(5s);
    stop_src.request_stop();
}
```
The event loop runs on a background thread. After five seconds the main thread signals cancellation. The coroutine observes `cond::canceled`, unwinds, the work count reaches zero, and `run()` returns. The `std::jthread` destructor joins automatically.
## Testing
Start an echo server in one terminal:

```
$ ./echo_server 8080 10
Echo server listening on port 8080 with 10 workers
```
Run the reconnect client in another:

```
$ ./reconnect 127.0.0.1 8080
Connected on attempt 1
```
Stop the server and watch the client retry:

```
Attempt 1 failed: Connection refused
Retrying in 500ms
Attempt 2 failed: Connection refused
Retrying in 1000ms
Attempt 3 failed: Connection refused
Retrying in 2000ms
```
Restart the server — the client reconnects on the next attempt.
To test the no-server case, point the client at a port with nothing listening:

```
$ ./reconnect 127.0.0.1 19999
Attempt 1 failed: Connection refused
Retrying in 500ms
Attempt 2 failed: Connection refused
Retrying in 1000ms
...
Retry cancelled
```
After five seconds the stop token fires and the client exits cleanly.
## Next Steps

- Timers Guide — Timer operations in detail
- Sockets Guide — Socket operations and error handling
- I/O Context Guide — Event loop mechanics, `stop()`, and `restart()`
- Error Handling — Portable error conditions and `cond`