Native support for OpenTracing

With the advent of OpenTracing (https://opentracing.io), an official Ruby library (opentracing/opentracing-ruby on GitHub), and growing industry support (e.g. Datadog's "Datadog + OpenTracing: Embracing the Open Standard for APM" and the Scout APM blog's "Tutorial: Distributed Tracing in Ruby with OpenTracing"), would it make sense to provide native tracing of Rails using the opentracing-ruby library?

At Zendesk we’re using Datadog’s proprietary tracing API, which monkey patches Rails and other libraries in order to trace key interactions. I think a more sustainable approach would be for libraries to include tracing support out of the box using the standardized OpenTracing APIs. It is then merely a matter of hooking up e.g. the Datadog trace collector in order to get a working tracing setup.

If there’s interest in this I’d be willing to contribute code. I’ve done a bunch of work to trace various aspects of Rails, including Rack middleware and before/after filters, but without native support these implementations are brittle and prone to breakage when internal Rails APIs change.

I’d love to hear your thoughts on this.

Cheers,

Daniel Schierbeck

Hey Daniel,

Absolutely! We’re looking at OpenCensus (https://opencensus.io) integration, which seems to be leapfrogging OpenTracing in standardization and adoption.

Current Ruby integration, including early Rack and Rails support: https://github.com/census-instrumentation/opencensus-ruby

Datadog exporter: https://github.com/DataDog/opencensus-go-exporter-datadog

Stackdriver exporter: https://github.com/census-ecosystem/opencensus-ruby-exporter-stackdriver

Zipkin/Jaeger exporter: https://github.com/census-ecosystem/opencensus-ruby-exporter-zipkin

Or use a local collector/relay: https://github.com/census-instrumentation/opencensus-service

Production use is fantastic, but I’d particularly love to see a collector and built-in visualization for local app development and tests.

We have an existing ActiveSupport::Notifications API which works much like typical parent-span instrumentation, but it doesn’t propagate or report trace context. For deeper Rails integration, we could adapt the AS::N design to more directly map to OpenCensus, or introduce ActiveSupport::Tracing if there’s too much mismatch or compatibility concern. That’d allow these libraries to plug in directly without needing to carefully instrument Rails on their own. Rails should be able to participate in distributed tracing out of the box, report stats out of the box, show traces and stats in development mode, and flip between production APM vendors without specialized integration.
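To make the gap concrete, here is a minimal sketch of block-style instrumentation (in the spirit of AS::N's `instrument`) that also carries trace context from parent to child. `TinyTracing` and everything inside it are hypothetical names for illustration; this is not the Rails or OpenCensus API, just one way the idea could look.

```ruby
require "securerandom"

# Hypothetical stand-in for an AS::N-like instrumenter that also tracks
# trace context. None of these names exist in Rails or OpenCensus.
module TinyTracing
  Span = Struct.new(:name, :trace_id, :span_id, :parent_id, :payload, :duration)

  # Spans that have finished, in completion order.
  def self.finished
    @finished ||= []
  end

  # Per-thread stack of currently open spans.
  def self.stack
    Thread.current[:tiny_tracing_stack] ||= []
  end

  # Wraps a block, links the new span to the enclosing one, and reuses
  # the enclosing trace id so the whole request forms a single trace.
  def self.instrument(name, payload = {})
    parent = stack.last
    span = Span.new(
      name,
      parent ? parent.trace_id : SecureRandom.hex(16),
      SecureRandom.hex(8),
      parent&.span_id,
      payload
    )
    stack.push(span)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    begin
      yield span
    ensure
      span.duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
      stack.pop
      finished << span
    end
  end
end

result = TinyTracing.instrument("process_action.action_controller", controller: "ArticlesController") do
  TinyTracing.instrument("sql.active_record", name: "Article Load") { :rows }
end
```

The inner span finishes first and points at the outer span through `parent_id`, which is the piece a plain pub/sub notification does not give you.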

At Basecamp, we have a home-grown StatsD setup, similar to Datadog, that hooks Active Support notifications (https://signalvnoise.com/posts/3091-pssst-your-rails-application-has-a-secret-to-tell-you). We also parse logs from Kafka to reconstruct some traces. We’d love to extract this and rely on Rails to natively export traces and stats.

I’d love to hear what you’re doing at Zendesk, where you’re headed, and whether this sketch aligns well. And anyone else who’s working in this area!

Best,

Jeremy

Hey Daniel,

Absolutely! We’re looking at OpenCensus (https://opencensus.io) integration, which seems to be leapfrogging OpenTracing in standardization and adoption.

So now there are two standards? :grimacing:

Is there clarity on where things are going? The point of a standard would be that we’d only need to support one, and not have an extra layer of abstraction.

Current Ruby integration, including early Rack and Rails support: https://github.com/census-instrumentation/opencensus-ruby

Datadog exporter: https://github.com/DataDog/opencensus-go-exporter-datadog

Stackdriver exporter: https://github.com/census-ecosystem/opencensus-ruby-exporter-stackdriver

Zipkin/Jaeger exporter: https://github.com/census-ecosystem/opencensus-ruby-exporter-zipkin

Or use a local collector/relay: https://github.com/census-instrumentation/opencensus-service

Compared to OpenTracing, is the ecosystem mature enough to warrant us going all-in on this? I definitely see the theoretical point of a unified stats and trace standard, especially seeing as Statsd has fragmented somewhat, but is it a horse we want to bet on? I’m fine with either, as long as there are working, scalable solutions today for getting things working in a variety of languages and without duct tape. For instance, it seems like the Datadog exporter only supports Go?

Production use is fantastic, but I’d particularly love to see a collector and built-in visualization for local app development and tests.

Me too :smile: but that’s probably not going to be my initial focus; I’m mainly looking at the instrumentation side of things.

We have an existing ActiveSupport::Notifications API which works much like typical parent-span instrumentation, but it doesn’t propagate or report trace context.

For deeper Rails integration, we could adapt the AS::N design to more directly map to OpenCensus, or introduce ActiveSupport::Tracing if there’s too much mismatch or compatibility concern.

I’ve done a bunch of work on AS::N in the past (I’m the one who created the original ActiveSupport::Subscriber base class) and feel pretty confident that it can form the basis for this work. We’d probably want to instrument more places, e.g. each middleware invocation and maybe filters in controllers, but otherwise it’s a good starting point.

I do think we need to have a specific mapping from AS::N to the tracing backend, selecting which payload keys should be propagated and maybe formatting some of the values, so it’s probably not just a matter of copying everything verbatim. It sounds like you’re doing the verbatim thing at Basecamp though – how is that working out? Would you be in favor of that?

We’ve seen issues when tracing gets too granular or too much data is captured, so I’d like to be a bit conservative.
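A rough sketch of what such a conservative mapping could look like: an allowlist from event name to the payload keys worth exporting, with values stringified and truncated. The constant and method names are hypothetical, and the choice of keys and the length limit are assumptions for illustration only.

```ruby
# Hypothetical allowlist: event name => payload keys worth exporting.
SPAN_ATTRIBUTES = {
  "process_action.action_controller" => %i[controller action status],
  "sql.active_record" => %i[name]
}.freeze

# Assumed cap so a single attribute cannot bloat a trace.
MAX_VALUE_LENGTH = 64

# Returns the attributes to attach to a span, or nil when the event
# is not on the allowlist and should not be traced at all.
def span_attributes_for(event_name, payload)
  keys = SPAN_ATTRIBUTES[event_name]
  return nil unless keys

  payload.slice(*keys).to_h do |key, value|
    [key.to_s, value.to_s[0, MAX_VALUE_LENGTH]]
  end
end

sql_attrs = span_attributes_for(
  "sql.active_record",
  name: "Article Load", sql: "SELECT * FROM articles", binds: []
)
```

Here the raw SQL and bind values are dropped and only the query name survives, which keeps the exported span small.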

That’d allow these libraries to plug in directly without needing to carefully instrument Rails on their own. Rails should be able to participate in distributed tracing out of the box, report stats out of the box, show traces and stats in development mode, and flip between production APM vendors without specialized integration.

Yup, that’s my goal as well. APM vendors should not compete on their quality of instrumentation, but on the quality of their product. One thing I want to emphasize though is that I think we need to push for standardization beyond Rails. AS::N could have been a great standard if it wasn’t tied to AS – it hasn’t seen widespread adoption because gem authors are unwilling to add a dependency on AS, I think. So we should think holistically about the entire ecosystem and what would make sense for Ruby as a whole.

At Basecamp, we have a home-grown StatsD setup, similar to Datadog, that hooks Active Support notifications (https://signalvnoise.com/posts/3091-pssst-your-rails-application-has-a-secret-to-tell-you). We also parse logs from Kafka to reconstruct some traces. We’d love to extract this and rely on Rails to natively export traces and stats.

I’d love to hear what you’re doing at Zendesk, where you’re headed, and whether this sketch aligns well. And anyone else who’s working in this area!

We’re currently all-in on Datadog, and I’ve helped improve their instrumentation. However, I keep running into ad-hoc instrumentation being brittle, which is why I’m interested in first-class support. I think the only sustainable path forward is that gems natively support some form of tracing, either through AS::N (which would need to be extracted) or directly with a standardized tracing gem.

How would you feel about extracting AS::N, actually? Then gems could adopt it for pub/sub and it would be a lot simpler to plug in a tracing subscriber.

Looks like OpenCensus already has support for development mode UIs, currently only for Java and Go though: https://opencensus.io/core-concepts/z-pages/

Have you deployed an OpenCensus integration to production? At least the metrics part looks pretty advanced, maybe too much so.

Cheers,

Daniel

Hey Daniel,

Absolutely! We’re looking at OpenCensus (https://opencensus.io) integration, which seems to be leapfrogging OpenTracing in standardization and adoption.

So now there are two standards? :grimacing:

Is there clarity on where things are going? The point of a standard would be that we’d only need to support one, and not have an extra layer of abstraction.

Right? :joy:

Some background: https://github.com/census-instrumentation/opencensus-java/issues/482

It’s still not all that clear and I haven’t seen a great discussion of the hows and whys, but it seems that OpenTracing is a common-API effort whereas OpenCensus is an umbrella effort covering common formats / wire protocol and library/exporter implementations. OpenCensus clients could talk to OpenTracing services.

The fact that neither project seems to directly address the other in their FAQs suggests that there are deeper organizational or community roots to their kinda-overlapping-but-kinda-distinct disposition.

We’re running with OpenCensus because it standardizes protocols, exporters, and implementations, which means we don’t end up with a common API but saddled with vendor-specific “leakage” that makes it hard to switch APMs in practice.

Current Ruby integration, including early Rack and Rails support: https://github.com/census-instrumentation/opencensus-ruby

Datadog exporter: https://github.com/DataDog/opencensus-go-exporter-datadog

Stackdriver exporter: https://github.com/census-ecosystem/opencensus-ruby-exporter-stackdriver

Zipkin/Jaeger exporter: https://github.com/census-ecosystem/opencensus-ruby-exporter-zipkin

Or use a local collector/relay: https://github.com/census-instrumentation/opencensus-service

Compared to OpenTracing, is the ecosystem mature enough to warrant us going all-in on this? I definitely see the theoretical point of a unified stats and trace standard, especially seeing as Statsd has fragmented somewhat, but is it a horse we want to bet on? I’m fine with either, as long as there are working, scalable solutions today for getting things working in a variety of languages and without duct tape. For instance, it seems like the Datadog exporter only supports Go?

The ecosystem is not mature enough, but we’re skating to that puck, so to speak, with Rails 6. We can drive maturity on the Ruby end by holding the local exporters, client, and app integration to our “just works” standard.

Production use is fantastic, but I’d particularly love to see a collector and built-in visualization for local app development and tests.

Me too :smile: but that’s probably not going to be my initial focus; I’m mainly looking at the instrumentation side of things.

We have an existing ActiveSupport::Notifications API which works much like typical parent-span instrumentation, but it doesn’t propagate or report trace context.

For deeper Rails integration, we could adapt the AS::N design to more directly map to OpenCensus, or introduce ActiveSupport::Tracing if there’s too much mismatch or compatibility concern.

I’ve done a bunch of work on AS::N in the past (I’m the one who created the original ActiveSupport::Subscriber base class) and feel pretty confident that it can form the basis for this work. We’d probably want to instrument more places, e.g. each middleware invocation and maybe filters in controllers, but otherwise it’s a good starting point.

Sweet! Yes.

I do think we need to have a specific mapping from AS::N to the tracing backend, selecting which payload keys should be propagated and maybe formatting some of the values, so it’s probably not just a matter of copying everything verbatim. It sounds like you’re doing the verbatim thing at Basecamp though – how is that working out? Would you be in favor of that?

We’re doing a lot of mapping/translation/filtering, too. Particularly since we are using StatsD without tagging support.

We’ve seen issues when tracing gets too granular or too much data is captured, so I’d like to be a bit conservative.

Ditto. Traces should be introduced where meaningful and actionable, not just because we have the data.

That’d allow these libraries to plug in directly without needing to carefully instrument Rails on their own. Rails should be able to participate in distributed tracing out of the box, report stats out of the box, show traces and stats in development mode, and flip between production APM vendors without specialized integration.

Yup, that’s my goal as well. APM vendors should not compete on their quality of instrumentation, but on the quality of their product. One thing I want to emphasize though is that I think we need to push for standardization beyond Rails. AS::N could have been a great standard if it wasn’t tied to AS – it hasn’t seen widespread adoption because gem authors are unwilling to add a dependency on AS, I think. So we should think holistically about the entire ecosystem and what would make sense for Ruby as a whole.

Agreed. A higher-level AS::N-like DSL in OpenCensus would be welcome. Using it directly feels pretty bare-metal today.

At Basecamp, we have a home-grown StatsD setup, similar to Datadog, that hooks Active Support notifications (https://signalvnoise.com/posts/3091-pssst-your-rails-application-has-a-secret-to-tell-you). We also parse logs from Kafka to reconstruct some traces. We’d love to extract this and rely on Rails to natively export traces and stats.

I’d love to hear what you’re doing at Zendesk, where you’re headed, and whether this sketch aligns well. And anyone else who’s working in this area!

We’re currently all-in on Datadog, and I’ve helped improve their instrumentation. However, I keep running into ad-hoc instrumentation being brittle, which is why I’m interested in first-class support. I think the only sustainable path forward is that gems natively support some form of tracing, either through AS::N (which would need to be extracted) or directly with a standardized tracing gem.

How would you feel about extracting AS::N, actually? Then gems could adopt it for pub/sub and it would be a lot simpler to plug in a tracing subscriber.

I’d be concerned about being able to evolve AS::N in step with Rails. But I think there’s definitely room for extracting the underlying setup to OpenCensus, if they’re open to that, or to a higher-level gem that wraps OpenCensus. Then AS::N could start to rely on that directly, rather than bridging to it.

Looks like OpenCensus already has support for development mode UIs, currently only for Java and Go though: https://opencensus.io/core-concepts/z-pages/

This is a great starting point. Rails dev can level up from there.

Have you deployed an OpenCensus integration to production? At least the metrics part looks pretty advanced, maybe too much so.

Not in production. We have a branch of Basecamp that exports to Stackdriver. In production, we’d prefer to use local agents and collectors rather than export directly to a vendor: https://github.com/census-instrumentation/opencensus-service

Sounds like you are farther ahead than us then – we’re pushing stuff directly to Datadog right now.

How about this as a starting point: I try to add proper AS::N instrumentation to the places where monkey patches are currently used, e.g. middleware execution. I’ll CC you on the PRs. Maybe we can stay in touch regarding your experience with OpenCensus in production, what, if anything, would be needed in order to “natively” support it, and anything else you might think relevant? It sounds like it’s too premature to add a dependency on the opencensus gem and push traces from Rails itself, unless you think we’re ready?
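As a sketch of the middleware case: each Rack middleware could be wrapped so that every call is reported as an event, with no monkey patching of the middleware itself. `FakeNotifications` below is a stand-in for AS::N so the example runs on its own, and the event name and wrapper class are hypothetical, not what any PR actually adds.

```ruby
# Stand-in for ActiveSupport::Notifications so the sketch runs on its own.
module FakeNotifications
  def self.events
    @events ||= []
  end

  # Runs the block and records the event even if the block raises.
  def self.instrument(name, payload = {})
    yield
  ensure
    events << [name, payload]
  end
end

# Wraps one Rack middleware and reports every call as an event.
class InstrumentedMiddleware
  EVENT_NAME = "process_middleware.example" # hypothetical event name

  def initialize(middleware)
    @middleware = middleware
  end

  def call(env)
    FakeNotifications.instrument(EVENT_NAME, middleware: @middleware.class.name) do
      @middleware.call(env)
    end
  end
end

# Minimal Rack-style endpoint used only to exercise the wrapper.
class HelloApp
  def call(_env)
    [200, { "content-type" => "text/plain" }, ["hello"]]
  end
end

response = InstrumentedMiddleware.new(HelloApp.new).call({})
```

A tracing subscriber could then turn each of these events into a span without knowing anything about the middleware stack's internals.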

One problem is that we’re unlikely to be able to run on Rails master in production, so there’s little production feedback I can give.

Looks like OpenCensus already has support for development mode UIs, currently only for Java and Go though: https://opencensus.io/core-concepts/z-pages/

This is a great starting point. Rails dev can level up from there.

Have you deployed an OpenCensus integration to production? At least the metrics part looks pretty advanced, maybe too much so.

Not in production. We have a branch of Basecamp that exports to Stackdriver. In production, we’d prefer to use local agents and collectors rather than export directly to a vendor: https://github.com/census-instrumentation/opencensus-service

Sounds like you are farther ahead than us then – we’re pushing stuff directly to Datadog right now.

Your instrumentation is likely further along since Datadog supports tagging and ingests traces :slight_smile:

How about this as a starting point: I try to add proper AS::N instrumentation to the places where monkey patches are currently used, e.g. middleware execution. I’ll CC you on the PRs. Maybe we can stay in touch regarding your experience with OpenCensus in production, what, if anything, would be needed in order to “natively” support it, and anything else you might think relevant? It sounds like it’s too premature to add a dependency on the opencensus gem and push traces from Rails itself, unless you think we’re ready?

Great path; agreed.

One problem is that we’re unlikely to be able to run on Rails master in production, so there’s little production feedback I can give.

A “rails-canary” branch could be enough!

Hi folks,

Thought I’d jump in here as the engineer who has done most of the implementation on the opencensus gem so far. Ruby support in OpenCensus is currently a bit behind other languages—we don’t yet have support for stats, z-pages, and some other things. So we’re starting a push to get it up to date; I’ve been doing some updates myself, and it looks like Google will be donating another engineer for a period of time.

I’d love to help get OpenCensus’s instrumentation fleshed out for people’s use cases. The current gem does have basic integration with AS::N to collect trace information for events that are instrumented, but I’m trying very very hard not to introduce monkey patches. If you’re using the opencensus gem and have particular instrumentation needs, I’ll be happy to help with PRs and get them committed upstream. Please don’t hesitate to reach out to me.

There also isn’t a Datadog exporter yet for Ruby (that I know of), but I’d love to help get one started up.

Daniel Azuma

Hi folks,

Thought I’d jump in here as the engineer who has done most of the implementation on the opencensus gem so far. Ruby support in OpenCensus is currently a bit behind other languages—we don’t yet have support for stats, z-pages, and some other things. So we’re starting a push to get it up to date; I’ve been doing some updates myself, and it looks like Google will be donating another engineer for a period of time.

Sounds great!

I’d love to help get OpenCensus’s instrumentation fleshed out for people’s use cases. The current gem does have basic integration with AS::N to collect trace information for events that are instrumented, but I’m trying very very hard not to introduce monkey patches. If you’re using the opencensus gem and have particular instrumentation needs, I’ll be happy to help with PRs and get them committed upstream. Please don’t hesitate to reach out to me.

One thing I’m unsure of is naming conventions – with Datadog, we’ll have a “span name” that’s close to e.g. the AS::N event names, such as rack.request. In addition to that, there’s the notion of a “resource”, typically the name of an endpoint, e.g. ArticlesController#show. That part seems to be missing from OpenCensus, and the span names are overloaded with both span type info and “endpoint” names. Is there a standardized way to capture both? This is important because it’s nice to have a small set of span types, but the resources can number in the thousands and you’ll typically filter those.

There also isn’t a Datadog exporter yet for Ruby (that I know of), but I’d love to help get one started up.

I don’t really get this part of OC – since there’s a standard wire format, would you not want an external process doing the exporting?

We do. See https://github.com/census-instrumentation/opencensus-service for the upcoming work: the OpenCensus service lets OpenCensus libraries export to an exporter service rather than having to link vendor-specific exporters.

I’d love to help get OpenCensus’s instrumentation fleshed out for people’s use cases. The current gem does have basic integration with AS::N to collect trace information for events that are instrumented, but I’m trying very very hard not to introduce monkey patches. If you’re using the opencensus gem and have particular instrumentation needs, I’ll be happy to help with PRs and get them committed upstream. Please don’t hesitate to reach out to me.

One thing I’m unsure of is naming conventions – with Datadog, we’ll have a “span name” that’s close to e.g. the AS::N event names, such as rack.request. In addition to that, there’s the notion of a “resource”, typically the name of an endpoint, e.g. ArticlesController#show. That part seems to be missing from OpenCensus, and the span names are overloaded with both span type info and “endpoint” names. Is there a standardized way to capture both? This is important because it’s nice to have a small set of span types, but the resources can number in the thousands and you’ll typically filter those.

Not sure! Would check out the Datadog Go exporter to start, since it’s mapping from one span to another. Looks like it’s just using the span name as the resource name rather than pulling it from an annotated attribute.

I’d naively expect to see spans annotated with the controller action (picked up from the active trace context) and have that exported as Datadog resource.
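Roughly what I have in mind, as a sketch: the exporter reads the resource from a span attribute and falls back to the span name when the attribute is absent. The struct, the attribute key, and the method name are all hypothetical; this is neither the OpenCensus span model nor the Datadog exporter's actual code.

```ruby
# Hypothetical exported shape; not the OpenCensus or Datadog data model.
ExportedSpan = Struct.new(:name, :resource, :meta, keyword_init: true)

# Assumed attribute key under which the controller action is annotated.
RESOURCE_ATTRIBUTE = "rails.controller_action"

# Keeps the span name as the small, stable "type" and moves the
# high-cardinality endpoint name into the resource field.
def to_datadog_like(span_name, attributes)
  attrs = attributes.dup
  resource = attrs.delete(RESOURCE_ATTRIBUTE) || span_name
  ExportedSpan.new(name: span_name, resource: resource, meta: attrs)
end

annotated = to_datadog_like(
  "rack.request",
  "rails.controller_action" => "ArticlesController#show",
  "http.status" => "200"
)
plain = to_datadog_like("rack.request", {})
```

That keeps the set of span names small while still letting you filter on thousands of resources.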

I don’t really get this part of OC – since there’s a standard wire format, would you not want an external process doing the exporting?

Direct export can be appealing for easy-setup or quick-deploy scenarios like dev/test, one-click Heroku apps, or short-lived services like one-off jobs run outside the main cluster.

First PR is up: https://github.com/rails/rails/pull/34305

Nice! :+1:

And merged (targeting Rails 6.0.0.beta4) :blush:

Awesome! Are you working on other OC integrations?