Degraded

Automatos Primary Worker Offline

Dec 05 at 10:40am UTC
Affected services
Automatos - Arbitrum One
Automatos - Base
Automatos - Blast
Automatos - Boba Ethereum
Automatos - BSC
Automatos - Ethereum
Automatos - Filecoin
Automatos - Gravity
Automatos - Linea
Automatos - Lisk
Automatos - Manta Pacific
Automatos - Mantle
Automatos - Mode
Automatos - Moonbeam
Automatos - Optimism
Automatos - Polygon
Automatos - Polygon ZkEVM
Automatos - Rootstock
Automatos - Scroll
Automatos - Sei
Automatos - Taiko
Automatos - ZkSync Era

Resolved
Dec 08 at 05:43am UTC

Post Mortem: Automatos Primary Worker Offline

What happened

  1. The primary worker node became unresponsive at about 10:40am UTC on Dec 05.
  2. Our team was notified at 11:08am UTC and responded immediately.
  3. After a brief diagnosis, the primary worker was manually rebooted and was back online and fully functional at 11:17am UTC.

Why this happened

  1. Our third-party logging service used for verbose debugging, BetterStack, went down briefly, resulting in network errors. We log network errors, but the volume of data being sent was more than BetterStack permits, so the failed log requests produced further errors to log, resulting in an infinite cycle of errors. Eventually, the server ran out of memory as its network buffers grew, causing it to crash.

Estimated costs

  1. The backup workers took over at the cost of short delays in submitting transactions.

How to prevent this in the future

  1. We dramatically reduced the verbosity of network error log reporting to prevent such issues.
  2. We have suspended BetterStack log reporting until we finish investigating how to avoid and handle HTTP 429 (Too Many Requests) errors. Note that our Datadog log system still records all vital information.
  3. We created an alert on network throughput changes so that we can monitor irregularities closely and be alerted before downtime occurs.
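
As an illustration of the kind of safeguard described above, here is a minimal Python sketch of a log shipper that cannot grow without bound. It is not the Automatos implementation. The class name BoundedLogShipper, its parameters, and the injected send function are all hypothetical, and no real BetterStack API is used. The sketch assumes that send returns an HTTP status code or raises OSError on a network failure. It caps the buffer size, pauses after any failed send (including a 429), and counts failures instead of logging them back into the same pipeline.

```python
import collections
import time
from typing import Callable, Deque, List


class BoundedLogShipper:
    """Hypothetical shipper: fixed-size buffer, batched sends, pause on failure."""

    def __init__(
        self,
        send: Callable[[List[str]], int],  # assumed: returns an HTTP status code
        max_buffered: int = 1000,
        batch_size: int = 100,
        backoff_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        # A deque with maxlen discards the oldest entry when full,
        # so memory use stays bounded even if the log service is down.
        self._buffer: Deque[str] = collections.deque(maxlen=max_buffered)
        self._batch_size = batch_size
        self._backoff_seconds = backoff_seconds
        self._clock = clock
        self._paused_until = 0.0
        self.dropped = 0        # lines discarded because the buffer was full
        self.failed_sends = 0   # failures are counted, never re-logged here

    def log(self, line: str) -> None:
        if len(self._buffer) == self._buffer.maxlen:
            self.dropped += 1
        self._buffer.append(line)

    def flush(self) -> int:
        """Try to ship one batch. Returns the number of lines shipped."""
        if self._clock() < self._paused_until or not self._buffer:
            return 0
        count = min(self._batch_size, len(self._buffer))
        batch = [self._buffer.popleft() for _ in range(count)]
        try:
            status = self._send(batch)
        except OSError:
            status = None  # treat network failures like a rejected request
        if status is not None and 200 <= status < 300:
            return len(batch)
        # Rejected (for example 429) or failed: put the batch back in its
        # original order and stop sending until the pause expires.
        self.failed_sends += 1
        self._buffer.extendleft(reversed(batch))
        self._paused_until = self._clock() + self._backoff_seconds
        return 0
```

The key design choice is that a failed send only increments a counter and sets a pause. It never creates a new log line, so an outage at the logging service cannot feed itself. The dropped and failed_sends counters could be exported to a separate monitoring system to drive an alert.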

Updated
Dec 05 at 11:17am UTC

The primary worker was manually rebooted and came back online.

Created
Dec 05 at 10:40am UTC

The primary Automatos worker node went offline.