ELK Stack Step-by-Step Guide


The ELK stack is an open-source technology used for log aggregation and analytics. It comprises three different tools:

  1. Elasticsearch
  2. Logstash
  3. Kibana

We will go through each of these tools separately and see how the stack can be used for log aggregation and analytics.

What is Log Aggregation?

Log aggregation is a technique in which logs from different applications are aggregated or consolidated into a single place. This aggregated data helps organizations in multiple ways: they can find patterns in the logs, debug issues easily using simple queries, apply AI/ML techniques over the data, trigger alerts based on specific events, and so on.

Nowadays, everyone is moving towards a microservice architecture because of the several advantages it provides. In such a microservice environment:

The application consists of multiple services and service instances running on multiple machines, and requests often span multiple service instances. Each service instance writes information about what it is doing to a log file in a standardized format. The log file contains error, warning, info, and debug messages. In such a scenario it is difficult to understand the behaviour of the application and troubleshoot issues. Log aggregation techniques are very useful in these cases.

What is Log Analytics?

Log analytics provides different ways to analyse aggregated data. You can ask different questions of your data and try to predict things for the future. For example, an e-commerce company can query its data to find out product-wise sales counts, which categories of products sell the most, which regions or cities are lagging in orders, and so on.

It also provides ways to apply different aggregation techniques to your data, which again is very helpful to organizations.

ELK Stack

Now we will take a deep dive into each component of the ELK stack.

Elasticsearch

What is Elasticsearch?

Elasticsearch is a distributed search and analytics engine. It is the component of the ELK stack that handles the indexing, search, and analysis parts.

Elasticsearch provides near real-time search and analytics for all types of data. Whether you have structured or unstructured text, numerical data, or geospatial data, Elasticsearch can efficiently store and index it in a way that supports fast searches.

You can go far beyond simple data retrieval and aggregate information to discover trends and patterns in your data. And as your data and query volume grows, the distributed nature of Elasticsearch enables your deployment to grow seamlessly right along with it.

Elasticsearch offers speed and flexibility to handle data in a wide variety of use cases:

  • Add a search box to an app or website
  • Store and analyze logs, metrics, and security event data
  • Use machine learning to automatically model the behavior of your data in real time
  • Automate business workflows using Elasticsearch as a storage engine
  • Manage, integrate, and analyze spatial information using Elasticsearch as a geographic information system (GIS)
  • Store and process genetic data using Elasticsearch as a bioinformatics research tool

How Elasticsearch Works

Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents. When you have multiple Elasticsearch nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node.

When a document is stored, it is indexed and fully searchable in near real-time (within about one second). Elasticsearch uses a data structure called an inverted index that supports very fast full-text searches. In a full-text search, the engine examines all of the words in every stored document as it tries to match the search criteria.

An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure.
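
For instance, indexing a document is a single REST call. Below is a minimal sketch, assuming a hypothetical employee index (the same index used in the search example later in this guide); Elasticsearch will create the index and infer a mapping automatically on first use:

curl --location --request PUT 'http://localhost:9200/employee/_doc/1' \
--header 'Content-Type: application/json' \
--data-raw '{
    "name": "Jane Doe",
    "gender": "female",
    "age": 31,
    "hire_date": "2020-06-15"
}'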

Ultimately, however, you know more about your data and how you want to use it than Elasticsearch can. You can define rules to control dynamic mapping and explicitly define mappings to take full control of how fields are stored and indexed.

Defining your own mappings enables you to do the following (see the sketch after this list):

  • Distinguish between full-text string fields and exact value string fields
  • Perform language-specific text analysis
  • Optimize fields for partial matching
  • Use custom date formats
  • Use data types such as geo_point and geo_shape that cannot be automatically detected
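
As a sketch, an explicit mapping for the hypothetical employee index above could look like the one below. Note that an explicit mapping has to be supplied when the index is created; here the full-text name field is distinguished from the exact-value gender field, and hire_date uses a custom date format:

curl --location --request PUT 'http://localhost:9200/employee' \
--header 'Content-Type: application/json' \
--data-raw '{
    "mappings": {
        "properties": {
            "name":      { "type": "text" },
            "gender":    { "type": "keyword" },
            "age":       { "type": "integer" },
            "hire_date": { "type": "date", "format": "yyyy-MM-dd" }
        }
    }
}'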

How to Search Your Data

The Elasticsearch REST APIs support structured queries, full-text queries, and complex queries that combine the two. Structured queries are similar to the types of queries you can construct in SQL. For example, you could search the gender and age fields in your employee index and sort the matches by the hire_date field. Full-text queries find all documents that match the query string and return them sorted by relevance.

In addition to searching for individual terms, you can perform phrase searches, similarity searches, and prefix searches, and get autocomplete suggestions.

Have geospatial or other numerical data that you want to search? Elasticsearch indexes non-textual data in optimized data structures that support high-performance geo and numerical queries.

You can access all of these search capabilities using Elasticsearch’s comprehensive JSON-style query language (Query DSL). You can also construct SQL-style queries to search and aggregate data natively inside Elasticsearch, and JDBC and ODBC drivers enable a broad range of third-party applications to interact with Elasticsearch via SQL.
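
For example, the employee search described above could be expressed in the Query DSL roughly as follows. This is a sketch against the hypothetical employee index: a match clause for the full-text part, a range clause for the structured part, sorted by hire_date:

curl --location --request GET 'http://localhost:9200/employee/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": {
        "bool": {
            "must": [
                { "match": { "name": "jane" } },
                { "range": { "age": { "gte": 30 } } }
            ]
        }
    },
    "sort": [ { "hire_date": "desc" } ]
}'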

How to Analyze Your Data

Elasticsearch aggregations enable you to build complex summaries of your data and gain insight into key metrics, patterns, and trends. Aggregations help you answer questions such as:

  • How many products were sold in each region?
  • What is the count of cancelled orders?
  • How many users have added products to their cart but not yet bought them?

You can also use aggregations to answer more subtle questions, such as:

  • What are the most popular products?
  • Are there any unusual or defective products?

Because aggregations leverage the same data structures used for search, they are also very fast. This enables you to analyze and visualize your data in real time. Your reports and dashboards update as your data changes, so you can take action based on the latest information.
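
As a sketch, the "products sold by region" question above could be answered with a terms aggregation, assuming a hypothetical orders index whose region field is mapped as a keyword:

curl --location --request GET 'http://localhost:9200/orders/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
    "size": 0,
    "aggs": {
        "sales_by_region": {
            "terms": { "field": "region" }
        }
    }
}'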

We will cover more on the Elasticsearch APIs and how to invoke them in the installation guide. Please continue reading.

Logstash

What is Logstash?

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash can dynamically unify data from different sources and normalize the data into destinations of your choice, letting you cleanse and democratize all your data for diverse advanced downstream analytics and visualization use cases.

The Power of Logstash

(Logstash pipeline diagram, source: elastic.co)

Collect more, so you can know more. Logstash welcomes data of all shapes and sizes, for example:

  1. Logs and Metrics
  2. HTTP requests
  3. Data stores and streams

Kibana

What is Kibana?

Kibana is an open-source analytics and visualization platform. Use Kibana to explore your Elasticsearch data, and then build beautiful visualizations and dashboards.

Using Kibana you can manage your security settings, assign user roles, take snapshots, roll up your data, and more, all from the convenience of the Kibana UI.

(Screenshot: Kibana home page)

Ingest data

Kibana is designed to use Elasticsearch as a data source. Think of Elasticsearch as the engine that stores and processes the data, with Kibana sitting on top.

To start working with your data in Kibana, use one of the many ingest options available from the home page. You can collect data from an app or service, or upload a file that contains your data. If you’re not ready to use your own data, you can add a sample data set to give Kibana a test drive.

(Screenshot: built-in options for adding data to Kibana)

Explore & query

Ready to dive into your data? With Discover, you can explore your data and search for hidden insights and relationships. Ask your questions, and then narrow the results to just the data you want.

(Screenshot: the Discover UI)

Visualize & analyze

A visualization is worth a thousand log lines, and Kibana provides many options for showcasing your data. Use Lens to rapidly build charts, tables, metrics, and more. If there is a better visualization for your data, Lens suggests it, allowing you to quickly switch between visualization types.

Once your visualizations are just the way you want, use Dashboard to collect them in one place. A dashboard provides insights into your data from multiple perspectives.

(Screenshot: sample eCommerce data set dashboard)

ELK stack Installation

Now we will walk through different ways to install the ELK stack locally, step by step.

Normal Installation

Elasticsearch Installation

Follow the step-by-step guide below to install Elasticsearch.

  1. Download the open-source version of Elasticsearch from this link.
  2. Unzip it with your favourite unzip tool. This will create a folder called elasticsearch-7.10.2, which we will refer to as %ES_HOME%. In a terminal window, cd to the %ES_HOME%\bin directory, for instance:

cd c:\elasticsearch-7.10.2\bin

3. Run Elasticsearch using the command below:

C:\elasticsearch-7.10.2\bin>elasticsearch.bat -Ecluster.name=my_cluster -Enode.name=node1

4. Verify that Elasticsearch is running using the curl command below. By default, Elasticsearch runs on port 9200.

curl --location --request GET 'http://localhost:9200/'

You should get a response similar to the one below:

{
    "name": "node1",
    "cluster_name": "my_cluster",
    "cluster_uuid": "3MoIsqByQn2Ox9TcbPkd1w",
    "version": {
        "number": "7.10.2",
        "build_flavor": "oss",
        "build_type": "zip",
        "build_hash": "747e1cc71def077253878a59143c1f785afa92b9",
        "build_date": "2021-01-13T00:42:12.435326Z",
        "build_snapshot": false,
        "lucene_version": "8.7.0",
        "minimum_wire_compatibility_version": "6.8.0",
        "minimum_index_compatibility_version": "6.0.0-beta1"
    },
    "tagline": "You Know, for Search"
}

You can also use Postman to invoke the REST API:

(Screenshot: Elasticsearch GET API invoked from Postman)

You have successfully installed Elasticsearch on your local machine.

Logstash Installation

Follow the steps below to install Logstash.

  1. Download the open-source version of Logstash from this link.
  2. Unzip the archive and create a logstash-filter.conf file in the config folder. Copy the contents below into the file:
input { stdin { } }

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

The logstash-filter.conf file consists of three main sections:

  1. input
  2. filter
  3. output

The input section tells the Logstash engine where to take the input data from. In our example we provide it through the command line (stdin).

The filter section applies the given filters to the data before it is output. Here, the grok filter parses each Apache access-log line into structured fields, and the date filter sets the event timestamp from the parsed timestamp field.

The output section tells Logstash where to push the filtered data. As you can see, we are pushing the data to Elasticsearch and also printing it to the console using the rubydebug codec.
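
As a variation, the input section could read directly from a log file instead of stdin. A minimal sketch (the file path is just a placeholder for your environment):

input {
  file {
    # hypothetical path to an Apache access log; adjust as needed
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
  }
}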

3. Run the command below to start Logstash with our conf file:

D:\logstash-7.10.2\bin>logstash -f ..\config\logstash-filter.conf

4. Enter the sample log lines below on the Logstash command line:

134.76.249.10 - - [17/Jan/2021:11:05:57 +0000] "GET /style2.css HTTP/1.1" 200 4877 "http://www.semicomplete.com/projects/xdotool/" "Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0"
134.76.249.10 - - [17/Jan/2021:11:05:23 +0000] "GET /favicon.ico HTTP/1.1" 200 3638 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0"
134.76.249.10 - - [17/Jan/2021:11:05:40 +0000] "GET /images/jordan-80.png HTTP/1.1" 200 6146 "http://www.semicomplete.com/projects/xdotool/" "Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0"
134.76.249.10 - - [17/Jan/2021:11:05:50 +0000] "GET /images/web/2009/banner.png HTTP/1.1" 200 52315 "http://www.semicomplete.com/style2.css" "Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0"
134.76.249.10 - - [17/Jan/2021:11:05:47 +0000] "GET /projects/xdotool HTTP/1.1" 301 339 "http://tuxradar.com/content/xdotool-script-your-mouse" "Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0"
134.76.249.10 - - [17/Jan/2021:11:05:13 +0000] "GET /projects/xdotool/ HTTP/1.1" 200 12292 "http://tuxradar.com/content/xdotool-script-your-mouse" "Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0"
66.249.73.135 - - [17/Jan/2021:11:05:26 +0000] "GET /?flav=atom HTTP/1.1" 200 32352 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
207.241.237.220 - - [17/Jan/2021:11:05:24 +0000] "GET /blog/tags/C?page=2 HTTP/1.0" 200 16311 "http://www.semicomplete.com/blog/tags/C" "Mozilla/5.0 (compatible; archive.org_bot +

5. Logstash will process and index the data above and push it to our Elasticsearch instance. We can verify this using the Elasticsearch REST APIs.

First, get the list of indices:

curl --location --request GET 'http://localhost:9200/_cat/indices?v'

The response will look like this:

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2015.05.17 4NdMPN_0Sb2wA2znC8ZEOQ   1   1         10            0      109kb          109kb
yellow open   logstash-2013.12.11 nd-F5JxcS0isTmMTv90EFA   1   1          1            0     11.9kb         11.9kb
green  open   .kibana_1           UL2u6sXfShKm12aoLh95PQ   1   0         17            6     49.3kb         49.3kb
yellow open   logstash-2021.01.24 kR7bOMkYQTC2VA9E9A7I6A   1   1          1            0      5.1kb          5.1kb

Now copy an index name from the response above and invoke the API below:

curl --location --request GET 'http://localhost:9200/logstash-2013.12.11/_search'

The response will contain your Logstash data:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "logstash-2013.12.11",
                "_type": "_doc",
                "_id": "0wmLL3cBeWcpsmO9jqu8",
                "_score": 1.0,
                "_source": {
                    "auth": "-",
                    "verb": "GET",
                    "timestamp": "11/Dec/2013:00:01:45 -0800",
                    "@version": "1",
                    "host": "DESKTOP-RCB552F",
                    "ident": "-",
                    "clientip": "127.0.0.1",
                    "httpversion": "1.1",
                    "bytes": "3891",
                    "message": "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"\r",
                    "@timestamp": "2013-12-11T08:01:45.000Z",
                    "response": "200",
                    "request": "/xampp/status.php",
                    "agent": "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
                    "referrer": "\"http://cadenza/xampp/navi.php\""
                }
            }
        ]
    }
}
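
You can also query the parsed fields directly. For example, the following sketch returns only the events whose HTTP response code is 200, using the response field that the grok filter extracted:

curl --location --request GET 'http://localhost:9200/logstash-2013.12.11/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": {
        "match": { "response": "200" }
    }
}'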

Kibana Installation

Now we will see how to install Kibana. Follow the steps below.

  1. Download the open-source version of Kibana from this link.
  2. Unzip the zip folder.
  3. Run Kibana using the command below:
D:\kibana-7.10.2-windows-x86_64\bin>kibana.bat

4. Open a browser and go to http://localhost:5601

(Screenshot: Kibana Add data page)

5. Click on Add data

(Screenshot: Kibana Discover)

6. Click on Discover

7. As we have already pushed data through Logstash, some sample data is available. Now create an index pattern and you will be able to view the data in Kibana.

(Screenshot: Elasticsearch data in Kibana)

Docker Installation

To run a container using this image, you will need the following:

  • Docker: install Docker, either using a native package (Linux) or wrapped in a virtual machine (Windows, OS X, e.g. using Boot2Docker or Vagrant). Note: as the sebp/elk image is based on a Linux image, users of Docker for Windows will need to ensure that Docker is using Linux containers.
  • A minimum of 4 GB of RAM assigned to Docker: Elasticsearch alone needs at least 2 GB of RAM to run. With Docker for Mac, the amount of RAM dedicated to Docker can be set using the UI: see "How to increase docker-machine memory Mac" (Stack Overflow).
  • Access to TCP port 5044 from log-emitting clients. Other ports may need to be explicitly opened.

Installation

To pull this image from the Docker registry, open a shell prompt and enter:

$ sudo docker pull sebp/elk

Usage

Run a container from the image with the following command:

$ sudo docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it --name elk sebp/elk

With the command above, the whole ELK stack will be started.

The following environment variables may be used to selectively start a subset of the services:

  • ELASTICSEARCH_START: if set and set to anything other than 1, then Elasticsearch will not be started.
  • LOGSTASH_START: if set and set to anything other than 1, then Logstash will not be started.
  • KIBANA_START: if set and set to anything other than 1, then Kibana will not be started.

For example, the following command starts Elasticsearch only:

$ sudo docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 -it \
    -e LOGSTASH_START=0 -e KIBANA_START=0 --name elk sebp/elk

These commands publish the following ports, which are needed for proper operation of the ELK stack:

  • 5601 (Kibana web interface).
  • 9200 (Elasticsearch JSON interface).
  • 5044 (Logstash Beats interface, receives logs from Beats such as Filebeat).

The image exposes (but does not publish):

  • Elasticsearch’s transport interface on port 9300. Use the -p 9300:9300 option with the docker command above to publish it.
  • Logstash’s monitoring API on port 9600. Use the -p 9600:9600 option with the docker command above to publish it.

Access Kibana’s web interface by browsing to http://<your-host>:5601, where <your-host> is the hostname or IP address of the host Docker is running on, e.g. localhost if running a local native version of Docker, or the IP address of the virtual machine if running a VM-hosted version of Docker.

Recommendation

If you want to go in depth and learn more about the ELK stack, I would recommend picking up a dedicated book on the subject.

Conclusion

In this article we went through a complete guide to the ELK stack, digging deeper into each of its components. We also saw how to install the ELK stack using both a normal installation and Docker.

References and links

https://www.elastic.co/
