Problems with Kamal and ActionCable connection

mbajur · March 23, 2024, 2:49pm

Hello,

I recently moved my production app from Dokku to Kamal and I'm loving it so far, but one Rails feature I've been using broke: ActionCable support.

Basically, after the migration, my frontend can no longer connect to the /cable endpoint. The message in the browser log says: WebSocket connection to 'wss://domain.com/cable' failed: with nothing after the colon to explain why it failed. Also, the Rails log shows nothing for the /cable route. The only suspicious thing I can see in the Traefik logs is lines like this:

2024-03-22T18:56:17.441214126Z time="2024-03-22T18:56:17Z" level=debug msg="'499 Client Closed Request' caused by: context canceled"

But I'm not sure that's related: the app is already in production with decent traffic, so I can't tell whether these lines belong to the failing wss connection. The number of these log lines also doesn't match the number of failed wss connection retries, so I'd say that's not it.

My kamal config looks as follows:

service: myapp

image: mbajur/myapp

volumes:
  - "/home/app/myapp-cache:/app/tmp/cache"
  - "/home/app/myapp-shared:/app/shared"
  - "/home/app/myapp-storage:/app/storage"

servers:
  web:
    hosts:
      - x.x.x.x
    labels:
      traefik.http.routers.myapp.entrypoints: websecure
      traefik.http.routers.myapp.rule: Host(`domain.com`)
      traefik.http.routers.myapp.tls.certresolver: letsencrypt
    options:
      network: "private"
  job:
    hosts:
      - x.x.x.x
    cmd: bundle exec rake solid_queue:start
    options:
      network: "private"
  clock:
    hosts:
      - x.x.x.x
    cmd: bundle exec clockwork clock.rb
    options:
      network: "private"

registry:
  server: ghcr.io
  username: mbajur

  password:
    - KAMAL_REGISTRY_PASSWORD

# Inject ENV variables into containers (secrets come from .env).
# Remember to run `kamal env push` after making changes!
env:
  clear:
    HOSTNAME: domain.com
    APP_DOMAIN: domain.com
    DB_HOST: x.x.x.x
    RAILS_SERVE_STATIC_FILES: true
    RAILS_LOG_TO_STDOUT: true
    ARTISTS_TAXONOMY_ID: 9
    CATEGORIES_TAXONOMY_ID: 8
    PATTERNS_TAXONOMY_ID: 10
    FLIPPER_PSTORE_PATH: shared/flipper.pstore
  secret:
    - POSTGRES_PASSWORD
    - RAILS_MASTER_KEY

ssh:
  user: app

builder:
  dockerfile: Dockerfile.production
  multiarch: false
  cache:
    type: registry

accessories:
  db:
    image: postgres:15
    host: x.x.x.x
    port: 5432
    env:
      clear:
        POSTGRES_USER: "myapp"
        POSTGRES_DB: 'myapp_production'
      secret:
        - POSTGRES_PASSWORD
    files:
      - config/init.sql:/docker-entrypoint-initdb.d/setup.sql
    directories:
      - data:/var/lib/postgresql/data
    options:
      network: "private"

traefik:
  options:
    network: "private"
    publish:
      - "443:443"
    volume:
      - "/letsencrypt/acme.json:/letsencrypt/acme.json"
  args:
    accesslog: true
    entryPoints.web.address: ":80"
    entryPoints.websecure.address: ":443"
    entryPoints.web.http.redirections.entryPoint.to: websecure # We want to force https
    entryPoints.web.http.redirections.entryPoint.scheme: https
    entryPoints.web.http.redirections.entrypoint.permanent: true
    certificatesResolvers.letsencrypt.acme.email: "email@example.com"
    certificatesResolvers.letsencrypt.acme.storage: "/letsencrypt/acme.json" # Must match the path in `volume`
    certificatesResolvers.letsencrypt.acme.httpchallenge: true
    certificatesResolvers.letsencrypt.acme.httpchallenge.entrypoint: web

healthcheck:
  path: /health/ready
  port: 4000
  max_attempts: 15

# Bridge fingerprinted assets, like JS and CSS, between versions to avoid
# hitting 404 on in-flight requests. Combines all files from new and old
# version inside the asset_path.
# asset_path: /rails/public/assets

# Configure rolling deploys by setting a wait time between batches of restarts.
# boot:
#   limit: 10 # Can also specify as a percentage of total hosts, such as "25%"
#   wait: 2

# Configure the role used to determine the primary_host. This host takes
# deploy locks, runs health checks during the deploy, and follow logs, etc.
#
# Caution: there's no support for role renaming yet, so be careful to cleanup
#          the previous role on the deployed hosts.
# primary_role: web

# Controls if we abort when see a role with no hosts. Disabling this may be
# useful for more complex deploy configurations.
#
# allow_empty_roles: false

Thank you in advance for any clues!

Josh_Marchello · March 31, 2024, 4:13pm

What adapter are you using for ActionCable?

mbajur · March 31, 2024, 4:42pm

I'm using postgresql as the adapter in both dev and prod.

Josh_Marchello · March 31, 2024, 4:49pm

Is there a firewall in front of your servers?

Josh_Marchello · March 31, 2024, 4:59pm

This SO thread provides some good context to explore.

https://stackoverflow.com/questions/12973304/possible-reason-for-nginx-499-error-codes

I'd check whether any load balancers or firewalls are not properly configured to support WebSockets. Or maybe your server instance is underpowered and responding too slowly.

mbajur · March 31, 2024, 5:41pm

The weirdest thing is that no error is raised anywhere. No specific errors in the JS console, no errors in the network tab or on the nginx server. Is there any way to at least get an error code using some other tool, like curl?

My server has plenty of free resources, so I don't think that's it.

Josh_Marchello · March 31, 2024, 6:32pm

I'm not an expert in these things, but I think the reason you're not seeing any errors in the browser is that it sends the request to the proxy (Traefik), which accepts the connection and then acts as a client, forwarding the request to your server. So Traefik is reporting the error because the breakdown is happening between it and the Rails app, not between the browser and it.

And the server isn’t throwing anything because it’s none the wiser that there’s even a problem. It’s either not ever getting the request (because of a firewall or something) or it’s simply replying but doing so too slowly.

I’d start looking at network configurations and seeing what could be blocking or slowing down the request.
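
On the earlier question about getting an error code from another tool: one option is to send the WebSocket upgrade request by hand and read the status line that comes back. The script below is only a sketch of that idea. The helper names (`websocket_handshake_request`, `status_code`, `probe`) are made up for illustration, the host is whatever you pass on the command line, and no certificate verification is configured, so treat it as a diagnostic aid only. A 101 would mean the upgrade went through; as far as I know Action Cable answers a rejected upgrade with a 404, but check that against your own logs.

```ruby
require "socket"
require "openssl"
require "securerandom"

# Build the HTTP/1.1 upgrade request a browser would send for a WebSocket.
def websocket_handshake_request(host:, path:, origin:)
  [
    "GET #{path} HTTP/1.1",
    "Host: #{host}",
    "Upgrade: websocket",
    "Connection: Upgrade",
    "Sec-WebSocket-Key: #{SecureRandom.base64(16)}",
    "Sec-WebSocket-Version: 13",
    "Origin: #{origin}",
    "", ""
  ].join("\r\n")
end

# Extract the numeric status code from a status line such as
# "HTTP/1.1 101 Switching Protocols". Returns nil if the line is not HTTP.
def status_code(status_line)
  status_line.to_s[%r{\AHTTP/\d\.\d (\d{3})}, 1]&.to_i
end

# Open a TLS connection, send the handshake, and return the status code.
# Certificate verification is NOT configured here (diagnostic sketch only).
def probe(host:, path: "/cable", origin: "https://#{host}", port: 443)
  tcp = TCPSocket.new(host, port)
  tls = OpenSSL::SSL::SSLSocket.new(tcp, OpenSSL::SSL::SSLContext.new)
  tls.hostname = host   # SNI, so the proxy can pick the right certificate
  tls.sync_close = true # closing the TLS socket also closes the TCP socket
  tls.connect
  tls.write(websocket_handshake_request(host: host, path: path, origin: origin))
  status_code(tls.gets)
ensure
  tls&.close
  tcp&.close unless tls
end

# Only touches the network when run directly with a host argument.
puts probe(host: ARGV[0]) if $PROGRAM_NAME == __FILE__ && ARGV[0]
```

Run it as `ruby probe.rb domain.com` (the file name is arbitrary). Changing the `origin:` argument is a quick way to see whether the response depends on the Origin header.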

mbajur · April 5, 2024, 8:07pm

Thank you for all the effort. I will try to speak with the Traefik people and see where it takes me. I will write here in case I find a solution.

mbajur · May 25, 2024, 5:37am

It turned out to be an ActionCable config issue. All I had to do was configure config.action_cable.allowed_request_origins.

See the related discussion in the Kamal repo:
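
A minimal sketch of what that setting expresses, assuming the domain.com placeholder from the config above. In config/environments/production.rb the assignment would be `config.action_cable.allowed_request_origins = ALLOWED_ORIGINS`. The `ALLOWED_ORIGINS` constant and the `origin_allowed?` helper are made up here so the matching can be run outside Rails. To my understanding, Action Cable compares each entry against the request's Origin header with `===`, so strings have to match exactly while regexps can cover a family of hosts.

```ruby
# Hypothetical allow-list: the bare domain as an exact string, plus a
# regexp for any single-label subdomain. Adjust to your real hosts.
ALLOWED_ORIGINS = [
  "https://domain.com",
  %r{\Ahttps://[a-z0-9-]+\.domain\.com\z}
].freeze

# Illustrative stand-in for the origin check: an origin is accepted when
# any entry matches it via ===. Note that the scheme is part of the match.
def origin_allowed?(origin, allowed = ALLOWED_ORIGINS)
  allowed.any? { |pattern| pattern === origin }
end

if $PROGRAM_NAME == __FILE__
  ["https://domain.com", "https://www.domain.com", "http://domain.com"].each do |origin|
    puts "#{origin}: #{origin_allowed?(origin)}"
  end
end
```

The scheme matters: a page served over https sends an https Origin, so an entry written with http:// would not match it.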

github.com/basecamp/kamal

Rails WSS connection broken

opened 07:00PM - 22 Mar 24 UTC

closed 05:33AM - 25 May 24 UTC

mbajur


jcsmith · October 12, 2024, 10:55pm

I’m having a similar issue. I am using Rails 8.0.0.beta1. I configured the allowed request origins and even tried disabling force ssl.

This is what I see in the app logs:

ActionController::RoutingError (No route matches [GET] "/cable"):

I verified the action cable route using Rails.application.config.action_cable and got this result:

{:mount_path=>"/cable", :precompile_assets=>true, :allowed_request_origins=>["https://example.app"]}

Any advice would be helpful.

sorenmalling · October 13, 2024, 6:30am

Do you happen to have devise installed with rails 8?

github.com/heartcombo/devise

Rails 8: route initialization messed up

opened 11:46AM - 27 Sep 24 UTC

miharekar

## Environment

- Ruby **3.3.5**
- Rails **Rails 8.0.0.beta1**
- Devise **4.9.4**

## Current behavior

A bit of story/context to help with discoverability of this issue if anyone is googling with similar symptoms.

When I upgraded my app to Rails 8 beta ActionCable stopped working with a peculiar message in web console:

`connection.js:39 WebSocket connection to 'wss://visualizer.coffee/cable' failed`

And indeed I saw a lot of `ActionController::RoutingError (No route matches [GET] "/cable")` in the logs.

At first I thought it's an issue with my hosting provider Fly, but then I was able to replicate this locally. Not in `development` environment though, but with `RAILS_ENV=production PORT=3001 rails s` I got those same errors.

I assumed something in Rails broke (it is beta after all) so I made a quick `rails new` app, and, _damn it_, it worked just fine in production env.

A couple of `bundle open`s later and pokings around I found [this block](https://github.com/rails/rails/blob/15ddce90583bdf169ae69449b42db10be9f714c9/actioncable/lib/action_cable/engine.rb#L66-L68) which prepends the mounting of "/cable" or whatever the ActionCable `mount_path` is set to.

With some further puts debugging I found that in the brand new app the block registers and executes while in my app it registers but **never executes**.

I added some puts debugs inside [ActionDispatch#clear!](https://github.com/rails/rails/blob/15ddce90583bdf169ae69449b42db10be9f714c9/actionpack/lib/action_dispatch/routing/route_set.rb#L490-L497) and found that in the new app the ActionCable initializer is registered *before* first `clear` call, but in my app it happened `after`. So it has no chance to run the block. **Culprit found**.

Now I needed to know why this happens and I put some `puts caller` in there. The only diff was that one included `devise-4.9.4/lib/devise/rails.rb:17` which is [this piece of code with comment _# Force routes to be loaded if we are doing any eager load_](https://github.com/heartcombo/devise/blob/72884642f5700439cc96ac560ee19a44af5a2d45/lib/devise/rails.rb#L15-L18).

I lack the deep knowledge of Devise to know how to proceed or what to do now, but here's how you can replicate it:

1. `rails new name_of_app` with Rails version 8.0.0.beta1
2. add devise to Gemfile and `bundle`
3. run Rails in production with `RAILS_ENV=production PORT=3001 rails s`
4. `curl localhost:3001/cable`

Without devise you get `Page not found%` response with these logs:

```
[0812aa3e-bfcd-41a5-958c-c27ab4b2a982] Started GET "/cable" for 127.0.0.1 at 2024-09-27 13:45:09 +0200
[0812aa3e-bfcd-41a5-958c-c27ab4b2a982] Started GET "/cable"[non-WebSocket] for 127.0.0.1 at 2024-09-27 13:45:09 +0200
[0812aa3e-bfcd-41a5-958c-c27ab4b2a982] Failed to upgrade to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: , HTTP_UPGRADE: )
[0812aa3e-bfcd-41a5-958c-c27ab4b2a982] Finished "/cable"[non-WebSocket] for 127.0.0.1 at 2024-09-27 13:45:09 +0200
```

as expected.

With devise you get 404 page with these logs:

```
[6795276c-a698-4bf8-bf9f-d4394a41e648] Started GET "/cable" for 127.0.0.1 at 2024-09-27 13:44:46 +0200
[6795276c-a698-4bf8-bf9f-d4394a41e648]
[6795276c-a698-4bf8-bf9f-d4394a41e648] ActionController::RoutingError (No route matches [GET] "/cable"):
[6795276c-a698-4bf8-bf9f-d4394a41e648]
```

With devise and `reload_routes` set to `false` in the initializer it also works, but probably it should work out of the box? 😅

Let me know if something isn't clear and/or how I can help out.

jcsmith · October 13, 2024, 12:45pm

I do have devise added, I appreciate the help!

sorenmalling · October 13, 2024, 6:21pm

The GitHub issue contains the setting that needs to be set.
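
For anyone landing here later: the setting in question is Devise's `reload_routes` option, which the issue reports as a workaround on Rails 8.0.0.beta1 with Devise 4.9.4. A minimal sketch of the initializer change, assuming the default config/initializers/devise.rb location and leaving out the rest of the generated file. Whether it is still needed on later Rails or Devise versions is not something this thread confirms.

```ruby
# config/initializers/devise.rb (default location, assumed here)
Devise.setup do |config|
  # Workaround described in the Devise issue: do not let Devise force an
  # early route reload, which per the issue happens before Action Cable
  # has registered its /cable mount.
  config.reload_routes = false
end
```

After redeploying, a request to /cable should no longer raise ActionController::RoutingError if this was the cause.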


I saw it mentioned in the gorails discord chat the other day
