I’ve been using an ActiveRecord (AR) multi-database configuration for quite a long time, but recently I ran into an unpleasant situation: one of the configured remote databases, which I don’t control, becomes unreachable during maintenance, and the request hangs forever because the database never responds.
So my question is: how can I handle this kind of situation gracefully? I only connect to run read-only queries, so an outage there shouldn’t affect the app’s performance. After a certain timeout, or when there is no response at the point where the queries run, the app should rescue the error and continue with the rest of the logic.
To add a bit more context, I set up three databases: the main one is a local PostgreSQL database, another is a remote MSSQL database, and the troublesome one is a remote MySQL database. All of them work fine except during downtime, when the connection freezes and never responds.
One possible solution is to implement a circuit breaker, a design pattern that detects failures and stops a failing dependency from being called over and over. Applied to your remote databases, a circuit breaker “opens” after a configured number of failures, and while it is open, requests to that database fail immediately instead of waiting on a server that will not answer. After a specific amount of time, it moves to a “half-open” state and lets a limited number of test requests through: if these succeed, the circuit “closes” and normal operation resumes; if they fail, the circuit opens again and keeps failing fast until the next trial.
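To make those states concrete, here is a minimal, single-threaded sketch in plain Ruby. It is not Semian’s implementation; the class name, `error_threshold`, `reset_timeout` and the injectable `clock` are illustrative choices of mine. It counts consecutive failures and lets a single trial request through in the half-open state.

```ruby
# Minimal circuit breaker sketch (illustrative, not Semian's code; not thread-safe).
class CircuitBreaker
  # Raised instead of calling the protected resource while the circuit is open.
  class OpenError < StandardError; end

  attr_reader :state

  # error_threshold: consecutive failures that open the circuit
  # reset_timeout:   seconds to stay open before allowing a trial request
  # clock:           callable returning the current time in seconds (injectable for tests)
  def initialize(error_threshold: 3, reset_timeout: 30,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @error_threshold = error_threshold
    @reset_timeout = reset_timeout
    @clock = clock
    @state = :closed
    @failures = 0
    @opened_at = nil
  end

  # Runs the block unless the circuit is open; records success or failure.
  def call
    if @state == :open
      # Fail fast until reset_timeout has elapsed since the circuit opened.
      raise OpenError, "circuit open" if @clock.call - @opened_at < @reset_timeout

      @state = :half_open # let one trial request through
    end

    begin
      result = yield
    rescue StandardError
      record_failure
      raise # re-raise so the caller can rescue and fall back
    end
    record_success
    result
  end

  private

  # Any success closes the circuit and resets the failure count.
  def record_success
    @failures = 0
    @state = :closed
  end

  # A failed trial reopens the circuit; otherwise open once the threshold is hit.
  def record_failure
    @failures += 1
    return unless @state == :half_open || @failures >= @error_threshold

    @state = :open
    @opened_at = @clock.call
  end
end
```

In the app you would keep one breaker per remote database and wrap each read-only query in `breaker.call { ... }`, rescuing `CircuitBreaker::OpenError` together with the database’s own errors so the request can fall back and continue. A breaker only reacts to errors, so the query itself still needs a timeout for a hung server to count as a failure, and because this sketch is not thread-safe, a multi-threaded server would need a mutex or a library such as the one below.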
A great library for implementing this pattern in Ruby is Semian, developed by Shopify. Semian provides resiliency for third-party services and databases in a centralized manner with minimal code changes. You can find it on GitHub as Shopify/semian (“Resiliency toolkit for Ruby for failing fast”).
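By integrating Semian into your application, you can handle the unresponsive remote MySQL database more gracefully. Semian wraps your database calls in a circuit breaker, and you configure how many failures open the circuit and how long it stays open before moving to the half-open state. Keep in mind that a circuit breaker can only count failures that actually happen: a query that hangs with no timeout never fails. You should therefore also set connection and read timeouts on the MySQL connection (the mysql2 gem supports `connect_timeout`, `read_timeout` and `write_timeout`), so a hung server produces an error that the breaker can count and your code can rescue.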
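As a sketch, the options for the remote MySQL connection could look like the helper below. It assumes the mysql2 gem plus Semian’s Mysql2 adapter (`require "semian"` and `require "semian/mysql2"`), which, per my reading of the Semian README, picks up a `semian:` hash from the client options. The Semian parameter names (`name`, `tickets`, `error_threshold`, `error_timeout`, `success_threshold`) come from that README, so check them against the version you install; the helper name, resource name and all values are hypothetical.

```ruby
# Hypothetical helper: options for the remote, read-only MySQL connection.
# Assumes require "semian" and require "semian/mysql2" run before a
# Mysql2::Client is created with these options.
def remote_mysql_options(host:, database:, username:, password:)
  {
    host: host,
    database: database,
    username: username,
    password: password,
    # mysql2 timeouts in seconds: turn a silent hang into an error
    # that the circuit breaker can count (values are illustrative).
    connect_timeout: 2,
    read_timeout: 5,
    write_timeout: 5,
    # Semian settings (names per the Semian README; values are illustrative).
    semian: {
      name: "remote_mysql",  # hypothetical unique name for this resource
      tickets: 1,            # bulkhead: workers allowed to use it concurrently
      error_threshold: 3,    # errors within error_timeout that open the circuit
      error_timeout: 10,     # seconds before the resource is tried again
      success_threshold: 1   # successes needed to close the circuit again
    }
  }
end
```

You could pass this hash to `Mysql2::Client.new` directly. With ActiveRecord, the mysql2 adapter hands a connection’s database.yml settings to `Mysql2::Client.new`, so the same keys can in principle go under the MySQL entry there; verify that against your Rails and Semian versions. At the call site, rescue the resulting errors (`Mysql2::Error` from the raw client, usually wrapped in an `ActiveRecord::ActiveRecordError` subclass when going through ActiveRecord) and return a default so the rest of the request can continue.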