Understanding OpenTelemetry

Today, most companies are trying to leverage the features and benefits of designing applications in a cloud-native form. While this brings the latest and greatest capabilities, it also introduces some problems. Because cloud-native applications are built as distributed microservices, tracing and monitoring all of those microservices becomes a challenge.
A single request might cut across a number of microservices to complete its job, so it is very important to have deep analysis and metrics available to understand what is happening inside those microservices when troubleshooting issues.
In this post we will look at a solution to this problem using the open-source OpenTelemetry project.
What is OpenTelemetry?
OpenTelemetry is a set of APIs, SDKs, tooling and integrations that are designed for the creation and management of telemetry data such as traces, metrics, and logs. This helps you analyze your software’s performance and behavior.
OpenTelemetry provides a vendor-agnostic implementation: irrespective of the technology you use, it can be configured to send telemetry data to the backend(s) of your choice. It supports a variety of popular open-source projects, including Jaeger and Prometheus.
Why you need OpenTelemetry and what it can do
So, why do we need tools like OpenTelemetry? You might have guessed it already: in cloud-native technology stacks, distributed and polyglot architectures are the norm. Distributed architectures introduce a variety of operational challenges, including how to solve availability and performance issues quickly. These challenges have led to the rise of observability.
Telemetry data is needed to power observability products. Traditionally, telemetry data has been provided by either open-source projects or commercial vendors. With a lack of standardization, the net result is a lack of data portability and a burden on the user to maintain the instrumentation.
The OpenTelemetry project solves these problems by providing a single, vendor-agnostic solution. The project has broad industry support and adoption from cloud providers, vendors and end users.
OpenTelemetry provides you with:
- A single, vendor-agnostic instrumentation library per language with support for both automatic and manual instrumentation.
- A single collector tool that can be deployed in a variety of ways including as an agent or gateway.
- An end-to-end implementation to generate, emit, collect, process and export telemetry data.
- Full control of your data with the ability to send data to multiple destinations in parallel through configuration.
- Open-standard semantic conventions to ensure vendor-agnostic data collection.
- The ability to support multiple context propagation formats in parallel to assist with migrating as standards evolve.
Architecture
Here is a high-level architecture of OpenTelemetry, showing the main components of the system. I have tried to keep the diagram simple and self-explanatory so that it conveys the bigger picture of the design. We will go through each of these components.
Components of OpenTelemetry
- Data Collector
- SDKs
- Instrumentation
- Data Sources
Data Collector
The OpenTelemetry project makes telemetry data collection easier via the OpenTelemetry Collector. The OpenTelemetry Collector is a vendor-independent solution for receiving, processing, and exporting telemetry data. It eliminates the need to run, administer, and maintain multiple agents/collectors in order to handle open-source observability data formats (such as Jaeger, Prometheus, and others) that are sent to one or more open-source or commercial back-ends. The Collector also allows end-users control over their data. The Collector is where instrumentation libraries send their telemetry data by default.
There are two ways to use the OpenTelemetry Collector:
- Agent: A Collector instance running with the application or on the same host as the application (e.g. binary, sidecar, or daemonset).
- Gateway: One or more Collector instances running as a standalone service (e.g. container or deployment) typically per cluster, datacenter or region.
Agent
It is recommended to run the Agent on every host in an environment. The Agent can then receive telemetry data (pushed or pulled) and enhance it with metadata such as custom tags and infrastructure information. Furthermore, the Agent can take over functions that would otherwise be handled by client instrumentation, such as batching, retry, encryption, compression, and more. OpenTelemetry instrumentation libraries export their data to a locally running Collector by default.
Gateway
A Gateway cluster can also be set up per cluster, datacenter, or region. Running as a stand-alone service, a Gateway cluster can provide capabilities beyond the Agent, such as tail-based sampling. It can also reduce the number of egress points needed to deliver data and centralize API token management. Because each Collector instance in a Gateway cluster operates independently, the architecture can easily be scaled with a simple load balancer based on performance needs. When a Gateway cluster is deployed, it typically receives data from Agents running in the environment.
The Collector consists of three components that access telemetry data:
- Receivers:
A receiver, which can be push- or pull-based, is how data gets into the Collector. Receivers may support one or more data sources.
- Processors:
Processors run on data between reception and export. Processors are optional, though some are recommended.
- Exporters:
An exporter, which can be push- or pull-based, is how you send data to one or more backends/destinations. Exporters may support one or more data sources.
SDKs
The SDKs implement the API and add processing and exporting capabilities. They are defined per data source, as well as for other aspects such as resources and configuration.
Instrumentation
The OpenTelemetry project simplifies the process of instrumenting applications. For each language, a core instrumentation library serves as the central repository; additional repositories for non-core components and automatic instrumentation may or may not be provided. For example, the Java instrumentation libraries provide the following repositories:
- Core: Provides an implementation of the OpenTelemetry API and SDK and can be used to manually instrument an application.
- Instrumentation: All the core functionality plus automatic instrumentation for a variety of libraries and frameworks.
- Contrib: Optional components such as JMX metric gatherers.
Automatic Instrumentation
To enable automatic instrumentation, one or more dependencies must be added; how they are added varies by language. At the very least, these dependencies add OpenTelemetry API and SDK functionality. Some languages also require per-instrumentation dependencies, and exporter dependencies may be needed as well. See the specification repository for additional information on the OpenTelemetry API and SDK.
Configure OpenTelemetry Instrumentation
Configuration is available via environment variables and, in some languages, language-specific mechanisms such as system properties in Java. At a minimum, a service name must be defined to identify the service being instrumented.
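In Java, for example, here is a minimal sketch that assumes the optional opentelemetry-sdk-extension-autoconfigure artifact is on the classpath; it builds the SDK entirely from environment variables (such as OTEL_SERVICE_NAME) or system properties (such as otel.service.name):

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;

public class AutoConfiguredBootstrap {
    public static void main(String[] args) {
        // Reads configuration such as the service name from environment variables
        // (OTEL_SERVICE_NAME) or system properties (e.g. -Dotel.service.name=my-service).
        OpenTelemetry openTelemetry =
                AutoConfiguredOpenTelemetrySdk.initialize().getOpenTelemetrySdk();
        System.out.println("OpenTelemetry configured: " + openTelemetry);
    }
}
```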
Manual Instrumentation
- Import the OpenTelemetry API and SDK
To begin, import OpenTelemetry into your service code. If you’re writing a library or other component that is meant to be consumed by a runnable binary, you only need the API as a dependency. If your artifact is a stand-alone process or service, you need to depend on both the API and the SDK. (A combined Java sketch of all of these steps follows this list.)
- Configure the OpenTelemetry API
- You must first construct a tracer and/or meter provider before you can create traces or metrics.
- The provider will then give you a tracer or meter instance, which you name and version.
- If you’re developing a library, for example, name it after your library, because this name will namespace all spans or metric events it produces.
- It’s also a good idea to include a version string that corresponds to your library’s or service’s current version.
- Configure the OpenTelemetry SDK
If you’re creating a service process, you also need to configure the SDK so that you can export your telemetry data to a backend for analysis. This configuration can be handled programmatically, with a configuration file, or through another mechanism. There are also additional per-language customization options that you can use.
- Create Telemetry Data
After you’ve configured the API and SDK, you can use the tracer and meter objects you obtained from the provider to create traces and metric events. You can also use a plugin or integration to create traces and metric events for you.
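To tie these steps together, here is a minimal, self-contained Java sketch. It assumes the opentelemetry-api, opentelemetry-sdk and opentelemetry-exporter-otlp artifacts are on the classpath; the service name, tracer name and endpoint are placeholder values, not prescribed by OpenTelemetry.

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class ManualInstrumentationExample {

    public static void main(String[] args) {
        // Configure the SDK: identify the service and wire up an exporter.
        Resource resource = Resource.getDefault().merge(
                Resource.create(Attributes.of(
                        AttributeKey.stringKey("service.name"), "my-service")));

        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                .setResource(resource)
                .addSpanProcessor(BatchSpanProcessor.builder(
                        OtlpGrpcSpanExporter.builder()
                                // Placeholder endpoint: a locally running Collector agent.
                                .setEndpoint("http://localhost:4317")
                                .build())
                        .build())
                .build();

        OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
                .setTracerProvider(tracerProvider)
                .buildAndRegisterGlobal();

        // Configure the API: obtain a named, versioned tracer from the provider.
        Tracer tracer = openTelemetry.getTracer("my-library", "1.0.0");

        // Create telemetry data: a span around a unit of work.
        Span span = tracer.spanBuilder("do-work").startSpan();
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("work.items", 42);
            // ... business logic goes here ...
        } finally {
            span.end();
        }

        // Flush and shut down the provider so queued spans are exported.
        tracerProvider.shutdown();
    }
}
```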
Export Data
You’ll need to send telemetry data somewhere once you’ve created it. OpenTelemetry allows you to export data from your process to an analytical backend in one of two ways: directly from the process or via the OpenTelemetry Collector.
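As a rough illustration, the difference between the two paths usually comes down to where the exporter points: at a Collector running next to the application, or directly at a backend that accepts OTLP. Both endpoints below are placeholders.

```java
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;

public class ExporterTargets {
    // Option 1: send to a local Collector agent on the default OTLP gRPC port.
    static OtlpGrpcSpanExporter viaCollector() {
        return OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317")
                .build();
    }

    // Option 2: send straight to an analysis backend that accepts OTLP
    // (hypothetical endpoint shown).
    static OtlpGrpcSpanExporter directToBackend() {
        return OtlpGrpcSpanExporter.builder()
                .setEndpoint("https://otlp.example-backend.com:4317")
                .build();
    }
}
```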
Data sources
OpenTelemetry supports multiple data sources as defined below. More data sources may be added in the future.
Traces
Traces track the progression of a single request, called a trace, as it is handled by services that make up an application.
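As a small illustrative sketch (using the global tracer, which is a no-op unless an SDK has been registered as shown earlier), a trace is a tree of spans that share the same trace ID:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class TraceExample {
    public static void main(String[] args) {
        Tracer tracer = GlobalOpenTelemetry.getTracer("checkout-example");

        // Parent span for the incoming request.
        Span request = tracer.spanBuilder("HTTP GET /checkout").startSpan();
        try (Scope ignored = request.makeCurrent()) {
            // Child span picks up the current context, so both spans share a trace ID.
            Span db = tracer.spanBuilder("SELECT orders").startSpan();
            db.end();
        } finally {
            request.end();
        }
    }
}
```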
Metrics
A metric is a measurement about a service, captured at runtime. Logically, the moment of capturing one of these measurements is known as a metric event which consists not only of the measurement itself, but the time that it was captured and associated metadata.
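As a brief sketch in Java (the instrument name, unit and attributes are illustrative), recording such a metric event looks like this:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

public class MetricsExample {
    public static void main(String[] args) {
        Meter meter = GlobalOpenTelemetry.get().getMeter("checkout-example");

        LongCounter requests = meter.counterBuilder("app.requests")
                .setDescription("Number of requests handled")
                .setUnit("1")
                .build();

        // Each call records a metric event: the measurement, the time of capture
        // and the associated attributes.
        requests.add(1, Attributes.of(AttributeKey.stringKey("route"), "/checkout"));
    }
}
```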
Logs
A log is a timestamped text record, either structured (recommended) or unstructured, with metadata.
Baggage
In addition to trace propagation, OpenTelemetry provides a simple mechanism for propagating name/value pairs, called baggage.
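A short sketch of the baggage API in Java (the key and value are arbitrary examples):

```java
import io.opentelemetry.api.baggage.Baggage;
import io.opentelemetry.context.Scope;

public class BaggageExample {
    public static void main(String[] args) {
        // Attach a name/value pair to the current context...
        Baggage baggage = Baggage.builder().put("user.id", "12345").build();
        try (Scope ignored = baggage.makeCurrent()) {
            // ...and read it back anywhere downstream in the same context.
            String userId = Baggage.current().getEntryValue("user.id");
            System.out.println("user.id from baggage: " + userId);
        }
    }
}
```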
Conclusion
I hope this post helps clarify things about OpenTelemetry. I will share more details, along with a step-by-step guide, in upcoming posts, so keep watching this space.