Incidents | Adrastia
Incidents reported on the status page for Adrastia
https://status.adrastia.io/

adrastia.io recovered
https://status.adrastia.io/
Wed, 14 May 2025 00:15:24 +0000

adrastia.io went down
https://status.adrastia.io/
Wed, 14 May 2025 00:13:53 +0000

Automatos Primary Worker Offline
https://status.adrastia.io/incident/474190
Sun, 08 Dec 2024 05:43:00 -0000

# Post Mortem: Automatos Primary Worker Offline

## What happened

1. The primary worker node became unresponsive at about 10:40 UTC on December 5.
2. Our team was notified at 11:08 UTC and responded immediately.
3. After a brief diagnosis, the primary worker was manually rebooted; it came back online, fully functional, at 11:17 UTC.

## Why this happened

1. BetterStack, the third-party logging service we use for verbose debugging, went down briefly, causing network errors. We log network errors, but the volume of data being sent exceeded what BetterStack permits, creating an infinite cycle of errors. Eventually the server's network buffers grew until it ran out of memory and crashed.

## Estimated costs

1. The backup workers took over, at the cost of short delays in submitting transactions.

## How to prevent this in the future

1. We dramatically reduced the verbosity of network error log reporting to prevent such issues.
2. We have suspended BetterStack log reporting until our investigation into avoiding and handling 429 errors concludes; our Datadog log system still records all vital information. (A sketch of this kind of guardrail follows this entry.)
3. We created an alert on network throughput changes so that we are notified of irregularities before downtime occurs.
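The failure mode in this post mortem, a logging backend outage that feeds an error loop until the worker exhausts memory, is what prevention items 1 and 2 address. The TypeScript sketch below shows one way to build that guardrail; it is an illustration under assumed names and limits (the buffer cap, backoff window, and endpoint are hypothetical), not Adrastia's actual code.

```typescript
// Sketch of a log shipper that cannot amplify its own failures:
// errors raised while shipping logs are never logged back through the
// shipper, a bounded buffer caps memory growth, and HTTP 429 responses
// trigger a backoff instead of an immediate retry.

const MAX_BUFFERED = 10_000; // hypothetical cap on queued entries
const BACKOFF_MS = 30_000;   // hypothetical pause after a 429 or network error

const buffer: string[] = [];
let pausedUntil = 0;

export function log(entry: string): void {
  // Drop (rather than queue unboundedly) when the backend is unreachable:
  // losing verbose debug lines is cheaper than an out-of-memory crash.
  if (buffer.length >= MAX_BUFFERED) return;
  buffer.push(entry);
}

export async function flush(endpoint: string): Promise<void> {
  if (buffer.length === 0 || Date.now() < pausedUntil) return;
  const batch = buffer.splice(0, buffer.length);
  try {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(batch),
    });
    if (res.status === 429) {
      // Rate limited: back off instead of re-sending immediately, and
      // discard the batch so retries cannot snowball.
      pausedUntil = Date.now() + BACKOFF_MS;
    }
  } catch {
    // Network error while shipping logs: swallow it. Reporting it via
    // log() would recreate the feedback loop described above.
    pausedUntil = Date.now() + BACKOFF_MS;
  }
}
```

The key property is that a failure to ship logs is never reported back through the shipper itself, so an outage or a 429 from the logging backend can cost debug lines but cannot grow memory without bound.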
Automatos Primary Worker Offline
https://status.adrastia.io/incident/474190
Thu, 05 Dec 2024 11:17:00 -0000
The primary worker was manually rebooted and came back online.

Automatos Primary Worker Offline
https://status.adrastia.io/incident/474190
Thu, 05 Dec 2024 10:40:00 -0000
The primary Automatos worker node went offline.
Automatos Primary Worker Offline
https://status.adrastia.io/incident/447941
Mon, 21 Oct 2024 01:30:00 -0000
The primary worker was manually rebooted and came back online.

Automatos Primary Worker Offline
https://status.adrastia.io/incident/447941
Mon, 21 Oct 2024 01:10:00 -0000
The primary Automatos worker node went offline.
Automatos - Sei recovered
https://status.adrastia.io/
Thu, 19 Sep 2024 19:34:33 +0000

Automatos Sei Downtime Post Mortem
https://status.adrastia.io/incident/431932
Thu, 19 Sep 2024 19:34:00 -0000

## What happened

- There was a brief 22-minute period where Sei connectivity was down, between 12:12 PDT and 12:34 PDT on September 19, 2024.

## Why this happened

- All of our RPC servers for Sei went down.

## Estimated costs

- Automatos was down on Sei for about 22 minutes.
- The estimated cost of this downtime is zero.

## How to prevent this in the future

- We've added an additional RPC provider for Sei. (A minimal failover sketch follows this entry.)

Automatos - Sei went down
https://status.adrastia.io/
Thu, 19 Sep 2024 19:12:45 +0000
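A recurring root cause across these incidents is every configured RPC provider for a chain failing at once, and the recurring fix is adding providers and failing over between them. Below is a minimal TypeScript sketch of that pattern with placeholder endpoint URLs (not Adrastia's real provider list); libraries such as ethers also offer a ready-made FallbackProvider for the same purpose.

```typescript
// Minimal RPC failover: try each endpoint in order, moving on when a
// call fails or stalls, and throw only if every provider fails.

type RpcCall<T> = (endpoint: string) => Promise<T>;

const ENDPOINTS: string[] = [
  "https://rpc-primary.example.com",  // hypothetical primary provider
  "https://rpc-backup-1.example.com", // hypothetical backup
  "https://rpc-backup-2.example.com", // hypothetical second backup
];

// Reject if a provider stalls rather than failing fast.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    promise,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`RPC call timed out after ${ms} ms`)), ms)
    ),
  ]);
}

async function callWithFailover<T>(call: RpcCall<T>, timeoutMs = 5_000): Promise<T> {
  let lastError: unknown;
  for (const endpoint of ENDPOINTS) {
    try {
      return await withTimeout(call(endpoint), timeoutMs);
    } catch (err) {
      lastError = err; // note the failure, fall through to the next provider
    }
  }
  throw lastError instanceof Error ? lastError : new Error(String(lastError));
}

// Example usage: fetch the latest block number via raw JSON-RPC.
const blockNumber = await callWithFailover(async (endpoint) => {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_blockNumber", params: [] }),
  });
  const { result } = (await res.json()) as { result: string };
  return parseInt(result, 16);
});
```

With three independent providers, a full outage requires three correlated failures rather than one.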
Automatos Mode Downtime Post Mortem
https://status.adrastia.io/incident/403785
Fri, 26 Jul 2024 23:40:00 -0000

## What happened

- There was a brief 7-minute period where Mode transactions failed to be submitted, between 16:35 PST and 16:43 PST on July 25, 2024.

## Why this happened

- Some of our RPC providers went down.
- Other RPC providers lagged behind the latest block.

## Estimated costs

- Automatos was down on Mode for about 7 minutes.
- Adrastia Oracles suffered minor staleness: Ionic's utilization and error oracle had one update that was delayed by up to 7 minutes.

## How to prevent this in the future

- We'll explore additional RPC providers, if available.

Automatos - Mode recovered
https://status.adrastia.io/
Thu, 25 Jul 2024 23:43:31 +0000

Automatos - Mode went down
https://status.adrastia.io/
Thu, 25 Jul 2024 23:35:43 +0000

Adrastia App - Rootstock Connectivity recovered
https://status.adrastia.io/
Sun, 14 Jul 2024 06:47:15 +0000

Adrastia App - Rootstock Connectivity went down
https://status.adrastia.io/
Sun, 14 Jul 2024 06:32:27 +0000

Web App Rootstock Connectivity Downtime Post Mortem
https://status.adrastia.io/incident/397934
Sat, 13 Jul 2024 23:59:00 -0000

## What happened

- At 11:32 PM PST, all of our web app's Rootstock RPC providers became unavailable.
- At 11:37 PM PST, we were notified of the downtime and responded immediately.
- At 11:47 PM PST, our fix went live.

## Why this happened

- Our web app was using only 2 RPC providers for Rootstock, and they both went down.

## Estimated costs

- Users were unable to view Automatos worker balances for Rootstock via our web app for about 15 minutes. This did not negatively affect anyone.

## How to prevent this in the future

- We've already added another RPC provider, DRPC, for Rootstock to our web app.

Automatos Arbitrum One Downtime Post Mortem
https://status.adrastia.io/incident/387215
Thu, 20 Jun 2024 21:52:00 -0000

## What happened

- There were two periods where many Arbitrum One transactions failed to be submitted:
  1. Between 04:27 PST and 06:00 PST.
  2. Between 09:59 PST and 11:04 PST.

## Why this happened

- Severe network congestion led to extreme gas costs, about 2,000 times higher than usual.
- Our workers used hardcoded gas limits of 20M, while average transactions were consuming up to 1B gas, since the network uses dynamic gas accounting. As a result, our worker transactions were rejected for insufficient gas.

## Estimated costs

- Automatos was down on Arbitrum One for about 2 hours and 38 minutes.
- Adrastia Oracles suffered minor staleness:
  1. Between 04:27 PST and 06:00 PST: up to about 0.5% price inaccuracy.
  2. Between 09:59 PST and 11:04 PST: up to about 0.1% price inaccuracy.

## How to prevent this in the future

- We'll use dynamic gas estimation on Arbitrum One. (A sketch follows this entry.)
- We'll improve our error detection and escalation processes to ensure fast responses to incidents like these.
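On Arbitrum, a transaction's gas usage includes a variable component under the network's dynamic gas accounting, so the hardcoded 20M limit became far too small once congestion pushed usage toward 1B. Below is a hedged sketch of the dynamic-estimation fix using ethers v6; the RPC URL, safety margin, and transaction fields are illustrative assumptions, not Adrastia's actual configuration.

```typescript
import { JsonRpcProvider, type TransactionRequest } from "ethers";

const provider = new JsonRpcProvider("https://arbitrum.example-rpc.com"); // placeholder URL

async function gasLimitFor(tx: TransactionRequest): Promise<bigint> {
  // estimateGas uses the chain's own accounting, so on Arbitrum it
  // includes the variable component that made a fixed 20M limit fail.
  const estimate = await provider.estimateGas(tx);
  // 25% headroom so conditions shifting between estimation and
  // inclusion don't cause an insufficient-gas rejection.
  return (estimate * 125n) / 100n;
}

// Example usage (hypothetical target and calldata):
const tx: TransactionRequest = {
  to: "0x0000000000000000000000000000000000000001",
  data: "0x",
};
const gasLimit = await gasLimitFor(tx);
console.log(`dynamic gas limit: ${gasLimit}`);
```

Because the estimate is recomputed per transaction, the limit tracks current network conditions instead of a value chosen at deploy time.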
Automatos Blast Downtime Post Mortem
https://status.adrastia.io/incident/375719
Tue, 28 May 2024 05:03:00 -0000

## What happened

1. At 9:50 PM PST, all of our worker nodes lost blockchain connectivity.
2. At 9:54 PM PST, we were notified of the downtime and responded immediately.
3. At 10:03 PM PST, we changed RPC configurations to bring Automatos back online. Incident resolved.

## Why this happened

1. All of the RPC providers we use with our Automatos workers went offline.

## Estimated costs

1. Automatos was down on Blast for about 13 minutes.
2. Automatos did not miss any work that needed to be performed.

## How to prevent this in the future

1. We'll add more backup RPC providers.