How to Install and Configure Elasticsearch on Ubuntu 24

Elasticsearch has become a cornerstone of modern data infrastructure, powering everything from application search features to log analytics and business intelligence dashboards. As a distributed, RESTful search and analytics engine built on Apache Lucene, it’s designed to handle large volumes of data in near real-time. This guide walks you through installing and configuring Elasticsearch on Ubuntu 24, covering everything from basic setup to production-ready configurations, common troubleshooting scenarios, and optimization techniques that’ll save you headaches down the road.

Understanding Elasticsearch Architecture

Before diving into installation, it’s worth understanding what makes Elasticsearch tick. At its core, Elasticsearch organizes data into indices (similar to databases), which contain documents (like database records) that are automatically distributed across shards for scalability and replicated for fault tolerance.

The beauty of Elasticsearch lies in its distributed nature. When you add more nodes to a cluster, Elasticsearch automatically redistributes data and handles failover scenarios. Each node can serve multiple roles: master nodes handle cluster coordination, data nodes store and process data, and coordinating nodes route requests and aggregate results.

Here’s how data flows through Elasticsearch:

  • Documents are indexed and stored in shards
  • Each shard is replicated across multiple nodes
  • Queries are distributed across relevant shards
  • Results are aggregated and returned to the client
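
The flow in the list above can be sketched as a toy model in Python. This is not how Elasticsearch is implemented: the real engine hashes the routing value (the document ID by default) with Murmur3, while this sketch substitutes zlib.crc32, and the function names route_to_shard, index_documents, and scatter_gather are made up for illustration.

```python
import zlib


def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document ID."""
    # Assumption: crc32 stands in for the Murmur3 hash Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def index_documents(docs: dict, num_primary_shards: int) -> list:
    """Distribute documents across in-memory 'shards' (plain dicts)."""
    shards = [{} for _ in range(num_primary_shards)]
    for doc_id, source in docs.items():
        shards[route_to_shard(doc_id, num_primary_shards)][doc_id] = source
    return shards


def scatter_gather(shards: list, term: str) -> list:
    """Run the query on every shard, then merge the partial results."""
    hits = []
    for shard in shards:
        hits.extend(
            doc_id
            for doc_id, source in shard.items()
            if term in source["message"].lower()
        )
    return sorted(hits)


docs = {
    "1": {"message": "Database connection failed"},
    "2": {"message": "User logged in"},
    "3": {"message": "Payment failed"},
}
shards = index_documents(docs, num_primary_shards=3)
print(scatter_gather(shards, "failed"))  # → ['1', '3']
```

Because the shard is derived from the hash modulo the number of primary shards, changing that number would send existing IDs to different shards, which is one way to see why the primary shard count is fixed when an index is created.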

Prerequisites and System Requirements

Ubuntu 24 provides an excellent foundation for Elasticsearch, but there are some specific requirements to consider:

Component | Minimum | Recommended | Notes
RAM | 2GB | 8GB+ | Elasticsearch heap should be ~50% of available RAM
CPU | 2 cores | 4+ cores | More cores help with concurrent queries
Storage | 10GB | SSD with 100GB+ | SSD dramatically improves query performance
Java | OpenJDK 17 | Bundled JDK | Elasticsearch 8.x packages ship their own JDK; a separately installed JVM must be Java 17+

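First, update the system. The Elasticsearch 8.x .deb package ships with its own bundled JDK, so a system-wide Java installation is optional; java -version simply shows whether one is already present:

sudo apt update
sudo apt upgrade -y
java -version

If you want Elasticsearch to run on your own JVM instead, install Java 17 and point ES_JAVA_HOME at it (Elasticsearch 8.x ignores JAVA_HOME). The export below only affects the current shell; for the systemd service, put the same assignment in /etc/default/elasticsearch:

sudo apt install openjdk-17-jdk -y
java -version
export ES_JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64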

Step-by-Step Installation Guide

There are several ways to install Elasticsearch on Ubuntu 24. This guide covers two: the official Elastic APT repository, which is the most reliable option because it gives you the latest stable version with proper package management, and a direct package download.

Method 1: Official Elastic Repository (Recommended)

First, import the Elasticsearch PGP key and add the repository:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg

echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

Update the package list and install Elasticsearch:

sudo apt update
sudo apt install elasticsearch -y

During installation, you’ll see security configuration details including the elastic user password. Make sure to save this information:

# Example output during installation
--------------------------- Security autoconfiguration information ------------------------------
Authentication and authorization are enabled.
TLS for the transport and HTTP layers is enabled and configured.

The generated password for the elastic built-in superuser is : xB4k2mN9vC7wD8fE

If this node should join an existing cluster, you can reconfigure this with
'/usr/share/elasticsearch/bin/elasticsearch-reconfigure-node'

Method 2: Direct Package Download

If you prefer downloading the package directly:

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.11.0-amd64.deb
sudo dpkg -i elasticsearch-8.11.0-amd64.deb

Initial Configuration

Elasticsearch’s main configuration file is located at /etc/elasticsearch/elasticsearch.yml. Out of the box, it’s configured for a single-node setup with security enabled. Here’s how to customize it for different scenarios:

Basic Single-Node Configuration

Edit the configuration file:

sudo nano /etc/elasticsearch/elasticsearch.yml

Here’s a basic configuration for development:

# Cluster name
cluster.name: my-application

# Node name
node.name: node-1

# Network settings
network.host: localhost
http.port: 9200

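# Discovery settings for single node
# (remove any cluster.initial_master_nodes line from this file; it conflicts with single-node discovery)
discovery.type: single-node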

# Path settings
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# Memory settings
bootstrap.memory_lock: true

# Security settings (for development)
xpack.security.enabled: false
xpack.security.enrollment.enabled: false
xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false

Important: Disabling security is only recommended for development environments. For production, always keep security enabled.

Production-Ready Configuration

For production environments, security should remain enabled. Here’s a more robust configuration:

# Cluster configuration
cluster.name: production-cluster
node.name: ${HOSTNAME}

# Network (0.0.0.0 binds every interface; restrict access with a firewall or bind a specific address)
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300

# Node roles
node.roles: [ master, data, ingest ]

# Memory
bootstrap.memory_lock: true

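# Discovery (add your node IPs)
discovery.seed_hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
# Only needed when bootstrapping a brand-new cluster; the values must match
# each node's node.name, and the line should be removed once the cluster has formed
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]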

# Security
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.http.ssl.enabled: true

# Monitoring (legacy self-collection; deprecated in favor of Elastic Agent or Metricbeat)
xpack.monitoring.collection.enabled: true
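
A quick sanity check can catch two mistakes that are easy to make when adapting these configurations: combining discovery.type: single-node with cluster.initial_master_nodes (Elasticsearch refuses to start with both), and disabling security while binding to a non-local address. The sketch below is a minimal illustration, not an official tool. It only understands flat key: value lines like the ones in this guide, not full YAML, and the function names are hypothetical.

```python
def parse_flat_settings(text: str) -> dict:
    """Parse flat 'key: value' lines; comments and blank lines are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def find_problems(settings: dict) -> list:
    """Return human-readable warnings for two known-bad combinations."""
    problems = []
    if (settings.get("discovery.type") == "single-node"
            and "cluster.initial_master_nodes" in settings):
        problems.append(
            "cluster.initial_master_nodes conflicts with discovery.type: single-node"
        )
    # When network.host is unset, Elasticsearch binds to loopback only.
    host = settings.get("network.host", "localhost")
    local_hosts = {"localhost", "127.0.0.1", "_local_"}
    if settings.get("xpack.security.enabled") == "false" and host not in local_hosts:
        problems.append(f"security is disabled while network.host is {host}")
    return problems


sample = """
cluster.name: production-cluster
network.host: 0.0.0.0
discovery.type: single-node
cluster.initial_master_nodes: ["node-1"]
xpack.security.enabled: false
"""
for problem in find_problems(parse_flat_settings(sample)):
    print("WARNING:", problem)
```

In practice you would read the text from /etc/elasticsearch/elasticsearch.yml (which requires root or membership in the elasticsearch group) instead of the inline sample.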

System Configuration and Optimization

Elasticsearch requires some system-level optimizations to perform well, especially in production environments.

Memory and File Descriptor Limits

Edit the systemd service file to configure memory locking:

sudo systemctl edit elasticsearch

Add the following override configuration:

[Service]
LimitMEMLOCK=infinity
LimitNOFILE=65535

Configure the JVM heap size by creating an override file in the jvm.options.d directory (this is preferred over editing jvm.options itself):

sudo nano /etc/elasticsearch/jvm.options.d/heap.options

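Set the minimum and maximum heap to the same value, roughly 50% of available RAM, and keep it below about 32GB so the JVM can continue to use compressed object pointers: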

# For a system with 8GB RAM
-Xms4g
-Xmx4g
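
The sizing rule can be written down as a small calculation. This is a rough sketch of the rule of thumb only: the 31GB cap is an assumed safe value (the exact compressed-pointer threshold varies by JVM and system), and recommended_heap_gb and heap_options are hypothetical helper names.

```python
def recommended_heap_gb(total_ram_gb: float, cap_gb: int = 31) -> int:
    """Half of RAM, rounded down, at least 1 GB, capped below 32 GB."""
    half = int(total_ram_gb // 2)
    return max(1, min(half, cap_gb))


def heap_options(total_ram_gb: float) -> str:
    """Render the two lines that go into jvm.options.d/heap.options."""
    heap = recommended_heap_gb(total_ram_gb)
    return f"-Xms{heap}g\n-Xmx{heap}g\n"


print(heap_options(8))
```

For an 8GB machine this prints the same -Xms4g and -Xmx4g lines shown above; for a 128GB machine the cap takes over and the result is 31g rather than 64g.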

Virtual Memory Configuration

Elasticsearch uses memory mapping extensively. Configure vm.max_map_count:

echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
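
To confirm the setting took effect, the current value can be read back from /proc. The snippet below is a small sketch for Linux hosts. The helper names are illustrative, and 262144 is simply the value used in this guide.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # value configured in this guide


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the kernel's current vm.max_map_count (Linux only)."""
    return int(Path(path).read_text().strip())


def map_count_ok(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True when the current limit meets or exceeds the requirement."""
    return current >= required


# Example values; on a real host call map_count_ok(read_max_map_count()).
print(map_count_ok(65530))   # → False
print(map_count_ok(262144))  # → True
```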

File System Optimization

Swapping badly hurts Elasticsearch performance, so disable swap and tell the kernel to avoid swapping:

# Disable swap now and on future boots
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Minimize swapping in case swap is ever re-enabled
echo 'vm.swappiness=1' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Starting and Managing Elasticsearch

Enable and start the Elasticsearch service:

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

Check the service status:

sudo systemctl status elasticsearch

Monitor the logs during startup:

sudo journalctl -u elasticsearch -f

Test the installation:

# For unsecured setup
curl -X GET "http://localhost:9200/"

# For secured setup (TLS is on by default in 8.x; curl prompts for the elastic password from installation)
sudo curl --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic "https://localhost:9200/"

A successful response looks like this:

{
  "name" : "node-1",
  "cluster_name" : "my-application",
  "cluster_uuid" : "abc123def456",
  "version" : {
    "number" : "8.11.0",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "d9ec3fa628c7b0ba3d25692e277ba26814820b20",
    "build_date" : "2023-11-04T10:04:57.184859352Z",
    "build_snapshot" : false,
    "lucene_version" : "9.8.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
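
A script that depends on a minimum Elasticsearch version can read it from this response. The sketch below parses the JSON with the standard library. The trimmed sample string and the parse_version helper are illustrative, and fetching the response over HTTP is left out.

```python
import json


def parse_version(root_response: str) -> tuple:
    """Extract (major, minor, patch) from the root endpoint's JSON body."""
    info = json.loads(root_response)
    # Drop any suffix such as "-SNAPSHOT" before splitting on dots.
    number = info["version"]["number"].split("-")[0]
    return tuple(int(part) for part in number.split(".")[:3])


sample = '{"name": "node-1", "version": {"number": "8.11.0"}, "tagline": "You Know, for Search"}'
version = parse_version(sample)
print(version)               # → (8, 11, 0)
print(version >= (8, 0, 0))  # → True
```

Comparing tuples rather than strings avoids the classic mistake of treating "8.9.0" as newer than "8.11.0".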

Security Configuration

If you disabled security for development, here’s how to re-enable and configure it properly:

Enabling Basic Security

Edit the elasticsearch.yml file to enable security:

xpack.security.enabled: true

Restart Elasticsearch and reset the password of the elastic superuser (the older elasticsearch-setup-passwords tool is deprecated in 8.x):

sudo systemctl restart elasticsearch
sudo /usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

This prints a new random password for the elastic user. Save it securely.

Setting Up TLS/SSL

For production environments, enable TLS encryption:

# Generate certificates
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil ca
sudo /usr/share/elasticsearch/bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12

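With relative output names, elasticsearch-certutil typically writes its files to the Elasticsearch home directory (/usr/share/elasticsearch for the .deb package). Move the certificate bundle to the Elasticsearch config directory:

sudo mv /usr/share/elasticsearch/elastic-certificates.p12 /etc/elasticsearch/
sudo chown elasticsearch:elasticsearch /etc/elasticsearch/elastic-certificates.p12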

Update elasticsearch.yml:

xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12

Real-World Use Cases and Examples

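Now that Elasticsearch is running, let’s explore some practical applications. The curl examples below assume the unsecured development setup; on a secured cluster, use https://localhost:9200 and add --cacert /etc/elasticsearch/certs/http_ca.crt -u elastic to each command.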

Log Analysis Setup

Create an index for application logs:

curl -X PUT "localhost:9200/application-logs" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "level": { "type": "keyword" },
      "message": { "type": "text" },
      "service": { "type": "keyword" },
      "host": { "type": "keyword" }
    }
  }
}'

Index some sample log data:

curl -X POST "localhost:9200/application-logs/_doc" -H 'Content-Type: application/json' -d'
{
  "timestamp": "2024-01-15T10:30:00",
  "level": "ERROR",
  "message": "Database connection failed",
  "service": "user-service",
  "host": "web-server-01"
}'
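
Indexing one document per request is fine for a test, but logs usually arrive in batches, and the _bulk endpoint accepts many documents in one newline-delimited JSON (NDJSON) body. The sketch below only builds that body. The build_bulk_body helper and the sample documents are illustrative, and sending the result (POST to /_bulk with Content-Type: application/x-ndjson) is left to whichever HTTP client you use.

```python
import json


def build_bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON body: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


sample_logs = [
    {"timestamp": "2024-01-15T10:30:00", "level": "ERROR",
     "message": "Database connection failed", "service": "user-service",
     "host": "web-server-01"},
    {"timestamp": "2024-01-15T10:30:05", "level": "INFO",
     "message": "Retrying database connection", "service": "user-service",
     "host": "web-server-01"},
]
print(build_bulk_body("application-logs", sample_logs), end="")
```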

Product Search Implementation

Create a product catalog index:

curl -X PUT "localhost:9200/products" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      },
      "description": { "type": "text" },
      "price": { "type": "double" },
      "category": { "type": "keyword" },
      "tags": { "type": "keyword" },
      "created_at": { "type": "date" }
    }
  }
}'

Perform a complex search query:

curl -X GET "localhost:9200/products/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "laptop" } }
      ],
      "filter": [
        { "range": { "price": { "gte": 500, "lte": 2000 } } },
        { "term": { "category": "electronics" } }
      ]
    }
  },
  "sort": [
    { "price": { "order": "asc" } }
  ],
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 1000 },
          { "from": 1000, "to": 2000 },
          { "from": 2000 }
        ]
      }
    }
  }
}'
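
One detail of the range aggregation is easy to get wrong: each range includes its from value and excludes its to value, so a product priced at exactly 1000 is counted in the 1000 to 2000 bucket, not the first one. The function below mimics that rule locally to make the boundaries concrete. It is an illustration of the semantics, not code that talks to Elasticsearch, and assign_bucket is a made-up name.

```python
def assign_bucket(price: float, ranges: list):
    """Return the first range containing price ('from' inclusive, 'to' exclusive)."""
    for bucket in ranges:
        lower = bucket.get("from", float("-inf"))
        upper = bucket.get("to", float("inf"))
        if lower <= price < upper:
            return bucket
    return None


price_ranges = [
    {"to": 1000},
    {"from": 1000, "to": 2000},
    {"from": 2000},
]

for price in (999.99, 1000, 2000):
    print(price, "->", assign_bucket(price, price_ranges))
```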

Performance Optimization and Monitoring

Here are some key metrics to monitor and optimization techniques:

Essential Monitoring Commands

# Cluster health
curl -X GET "localhost:9200/_cluster/health?pretty"

# Node statistics
curl -X GET "localhost:9200/_nodes/stats?pretty"

# Index statistics
curl -X GET "localhost:9200/_stats?pretty"

# Hot threads (useful for debugging performance issues)
curl -X GET "localhost:9200/_nodes/hot_threads"
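
The cluster health response is the one most worth automating. As a minimal sketch, the function below turns an already-parsed health response into a one-line summary; the status and unassigned_shards fields come from the _cluster/health API, while summarize_health and the sample dictionaries are illustrative.

```python
def summarize_health(health: dict) -> str:
    """Summarize a parsed _cluster/health response in one line."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return "green: all primary and replica shards are assigned"
    if status == "yellow":
        return f"yellow: primaries are assigned, {unassigned} replica shard(s) are not"
    return f"{status}: at least one primary shard is unassigned ({unassigned} unassigned in total)"


print(summarize_health({"status": "green", "unassigned_shards": 0}))
print(summarize_health({"status": "yellow", "unassigned_shards": 2}))
print(summarize_health({"status": "red", "unassigned_shards": 5}))
```

A single-node cluster commonly reports yellow as soon as an index with replicas exists, because a replica is never placed on the same node as its primary.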

Performance Tuning Tips

Area | Optimization | Impact
Indexing | Increase refresh interval during bulk operations | Faster indexing, often 2-3x depending on workload
Memory | Set heap size to 50% of RAM, staying below ~32GB | Efficient garbage collection and compressed object pointers
Storage | Use SSD storage | Faster query response, often 5-10x for disk-bound workloads
Sharding | 1 shard per 10-50GB of data | Better resource utilization
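
The sharding guideline translates into simple arithmetic when planning a new index. The sketch below assumes a target of 30GB per shard, an arbitrary midpoint of the 10-50GB range; primary_shard_count is a hypothetical helper name.

```python
import math


def primary_shard_count(expected_data_gb: float, target_shard_gb: float = 30) -> int:
    """Number of primary shards needed to keep each shard near the target size."""
    return max(1, math.ceil(expected_data_gb / target_shard_gb))


print(primary_shard_count(300))                      # → 10
print(primary_shard_count(5))                        # → 1
print(primary_shard_count(100, target_shard_gb=50))  # → 2
```

Since the number of primary shards is fixed at index creation, it is worth sizing for expected growth rather than for the data you have on day one.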

Common Issues and Troubleshooting

Here are the most frequent problems you’ll encounter and their solutions:

Service Won’t Start

Check the logs for specific error messages:

sudo journalctl -u elasticsearch --no-pager -l

Common causes and fixes:

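  • Memory lock issues: Ensure the systemd override (LimitMEMLOCK=infinity) is in place and that you ran sudo systemctl daemon-reload
  • Insufficient memory: Reduce the heap size in /etc/elasticsearch/jvm.options.d/heap.options
  • Port conflicts: Check if port 9200 is already in use with sudo ss -tlnp | grep 9200 (netstat is not installed by default on Ubuntu 24)
  • Permission issues: Verify the elasticsearch user owns the data and log directories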

High Memory Usage

Monitor heap usage:

curl -X GET "localhost:9200/_nodes/stats/jvm?pretty"

If heap usage is consistently above 85%, consider:

  • Reducing field data cache size
  • Implementing index lifecycle policies
  • Adding more nodes to distribute load
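
The 85% check is easy to script against the _nodes/stats/jvm response, where each node reports jvm.mem.heap_used_percent. The sketch below works on an already-parsed response. The sample data is made up, and nodes_over_heap_threshold is an illustrative name.

```python
def nodes_over_heap_threshold(stats: dict, threshold: int = 85) -> list:
    """Return the names of nodes whose heap usage exceeds the threshold."""
    return sorted(
        node["name"]
        for node in stats["nodes"].values()
        if node["jvm"]["mem"]["heap_used_percent"] > threshold
    )


# Made-up sample in the shape returned by GET _nodes/stats/jvm
sample_stats = {
    "nodes": {
        "a1": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 91}}},
        "b2": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 62}}},
        "c3": {"name": "node-3", "jvm": {"mem": {"heap_used_percent": 88}}},
    }
}
print(nodes_over_heap_threshold(sample_stats))  # → ['node-1', 'node-3']
```

A single reading above the threshold is not alarming on its own, since heap usage rises and falls with garbage collection; the word "consistently" matters, so sample several times before acting.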

Slow Query Performance

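Slow query logging is configured per index. These are index-level settings, so they cannot go in elasticsearch.yml; apply them through the index settings API instead:

curl -X PUT "localhost:9200/products/_settings" -H 'Content-Type: application/json' -d'
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "5s",
  "index.search.slowlog.threshold.query.debug": "2s",
  "index.search.slowlog.threshold.query.trace": "500ms"
}'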

Use the explain parameter to see how each hit’s relevance score was computed (for per-phase timing, add "profile": true to the request body instead):

curl -X GET "localhost:9200/products/_search?explain=true" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "name": "laptop" } }
}'

Comparison with Alternatives

Understanding how Elasticsearch stacks up against alternatives helps in making informed decisions:

Solution | Best For | Pros | Cons
Elasticsearch | Full-text search, analytics | Rich query DSL, excellent scaling, real-time | Memory intensive, complex operations
Apache Solr | Traditional search applications | Mature, extensive admin UI | Less modern architecture, slower adoption of new features
PostgreSQL Full-Text | Simple search in existing PostgreSQL apps | No additional infrastructure, ACID compliance | Limited search features, poor scaling for search workloads
Amazon OpenSearch | AWS-native applications | Managed service, AWS integration | Vendor lock-in, potentially higher costs

Best Practices and Security Considerations

Here are essential practices for running Elasticsearch in production:

Security Best Practices

  • Never disable security in production: Always keep xpack.security.enabled: true
  • Use strong passwords: Implement password policies and rotate credentials regularly
  • Enable audit logging: Track who accesses what data
  • Network security: Use firewalls and VPNs to restrict access
  • Regular updates: Keep Elasticsearch updated to patch security vulnerabilities

Operational Best Practices

  • Index lifecycle management: Automatically rotate and delete old indices
  • Backup strategy: Regular snapshots to external storage
  • Monitoring: Set up alerts for cluster health, disk usage, and performance metrics
  • Capacity planning: Monitor growth trends and plan for scaling

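Snapshots need a registered repository first. For a shared filesystem repository, the location must also be listed under path.repo in elasticsearch.yml. Once the repository exists, snapshots can be scheduled with snapshot lifecycle management. Register the repository: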

curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/var/lib/elasticsearch/backups"
  }
}'

Integration and Advanced Features

Elasticsearch shines when integrated with other tools in the Elastic Stack:

Kibana Integration

Install Kibana for data visualization:

sudo apt install kibana -y
sudo systemctl enable kibana
sudo systemctl start kibana

Logstash for Data Processing

Logstash can enrich and transform data before indexing:

sudo apt install logstash -y

Sample Logstash configuration for processing Apache logs:

input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  
  mutate {
    convert => { "bytes" => "integer" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "apache-logs-%{+YYYY.MM.dd}"
  }
}
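
To see what the grok, date, and mutate filters do to a single log line, here is a rough Python equivalent. It is a sketch, not a reimplementation of Logstash: the regular expression is a simplified take on the Apache combined log format, the field names loosely follow the legacy grok pattern, and parse_line and the sample log line are illustrative.

```python
import re
from datetime import datetime

# Simplified Apache combined log format (assumption: no escaped quotes in fields)
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Parse one access-log line into a dict, or return None if it does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Like the date filter: dd/MMM/yyyy:HH:mm:ss Z -> ISO 8601 timestamp
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = parsed.isoformat()
    # Like the mutate filter: convert bytes to an integer ("-" means no body)
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


sample_line = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
event = parse_line(sample_line)
print(event["clientip"], event["verb"], event["request"], event["bytes"])
print(event["@timestamp"])  # → 2000-10-10T13:55:36-07:00
```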

Application Integration Examples

Here’s a Python example using the official Elasticsearch client:

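from elasticsearch import Elasticsearch

# Connect to Elasticsearch (the 8.x client needs a full URL including the scheme).
# For a secured cluster, use "https://localhost:9200" and also pass
# ca_certs="/etc/elasticsearch/certs/http_ca.crt" and basic_auth=("elastic", "your_password").
es = Elasticsearch("http://localhost:9200")

# Index a document
doc = {
    'author': 'John Doe',
    'text': 'Elasticsearch is powerful',
    'timestamp': '2024-01-15T12:00:00'
}

# The 8.x client takes the document via document= (body= is deprecated)
es.index(index='posts', id='1', document=doc)

# Make the new document visible to search right away
es.indices.refresh(index='posts')

# Search for documents
response = es.search(
    index='posts',
    query={
        'match': {
            'text': 'elasticsearch'
        }
    }
)

print(f"Found {response['hits']['total']['value']} documents")
for hit in response['hits']['hits']:
    print(f"Score: {hit['_score']}, Source: {hit['_source']}")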

For more advanced implementations and cluster management, check out the official Elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html.

With this setup, you now have a solid foundation for building search-driven applications, implementing log analysis systems, or creating business intelligence dashboards. Remember that Elasticsearch is incredibly flexible – the key is starting simple and gradually adding complexity as your requirements evolve.


