
How to Install and Use ClickHouse on Ubuntu 24.04
ClickHouse is an open-source columnar database management system designed for OLAP (Online Analytical Processing) workloads, and it has become increasingly popular among developers and system administrators who need to handle massive datasets with lightning-fast query performance. Originally developed by Yandex, ClickHouse can process billions of rows and return complex analytical queries in seconds, making it an excellent choice for real-time analytics, data warehousing, and log analysis. In this guide, you'll learn how to install ClickHouse on Ubuntu 24.04, configure it properly, and start leveraging its powerful capabilities for your data-intensive applications.
Understanding ClickHouse Architecture
ClickHouse differs significantly from traditional row-based databases like MySQL or PostgreSQL. Its columnar storage format means data is stored column by column rather than row by row, which provides several advantages for analytical workloads:
- Better compression ratios since similar data types are stored together
- Faster analytical queries that only need to read specific columns
- Vectorized query execution that processes multiple values simultaneously
- Built-in support for parallel processing across multiple CPU cores
The system uses a MergeTree engine family as its primary storage engine, which automatically sorts and merges data parts in the background. This design allows ClickHouse to maintain high write throughput while keeping query performance optimal.
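Once a table holds data, you can see the effect of columnar storage for yourself. The query below is a small sketch that assumes the analytics.web_events table created later in this guide already contains rows; it reads per-column compression statistics from the built-in system.columns table:
-- Per-column storage and compression for a MergeTree table
-- (assumes analytics.web_events from later in this guide, populated with data)
SELECT
name,
type,
formatReadableSize(data_compressed_bytes) AS compressed,
formatReadableSize(data_uncompressed_bytes) AS uncompressed,
round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = 'analytics' AND table = 'web_events'
ORDER BY data_compressed_bytes DESC;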
Prerequisites and System Requirements
Before installing ClickHouse on Ubuntu 24.04, ensure your system meets the minimum requirements. ClickHouse performs best with adequate RAM and fast storage, especially for production workloads.
Component | Minimum | Recommended | Production |
---|---|---|---|
RAM | 4 GB | 8 GB | 32 GB+ |
CPU Cores | 2 | 4 | 16+ |
Storage | 20 GB | 100 GB SSD | 1 TB+ NVMe |
Network | 100 Mbps | 1 Gbps | 10 Gbps |
For serious production deployments, consider using dedicated servers with NVMe storage and high-speed networking to maximize ClickHouse performance.
Installing ClickHouse on Ubuntu 24.04
ClickHouse provides official packages for Ubuntu, making installation straightforward. The process involves adding the official repository and installing the server and client packages.
First, update your system and install required dependencies:
sudo apt update && sudo apt upgrade -y
sudo apt install -y apt-transport-https ca-certificates dirmngr
Add the ClickHouse repository key and source. The legacy apt-key tool is deprecated on Ubuntu 24.04, so store the key in a dedicated keyring and reference it from the sources entry:
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/clickhouse-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 8919F6BD2B48D754
sudo chmod +r /usr/share/keyrings/clickhouse-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
Update the package list and install ClickHouse:
sudo apt update
sudo apt install -y clickhouse-server clickhouse-client
During installation, you’ll be prompted to set a password for the default user. Choose a strong password and remember it for later use.
Start and enable the ClickHouse service:
sudo systemctl start clickhouse-server
sudo systemctl enable clickhouse-server
Verify the installation by checking the service status:
sudo systemctl status clickhouse-server
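As a final check, make sure the server actually answers queries. Connect with clickhouse-client (it will prompt for the password you set during installation) and run a quick smoke test:
-- Confirm the server responds; returns the server version and uptime in seconds
SELECT version(), uptime();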
Initial Configuration and Security Setup
The default ClickHouse configuration is suitable for development but requires adjustments for production use. The main configuration file is located at /etc/clickhouse-server/config.xml.
Create a backup of the original configuration:
sudo cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config.xml.backup
Edit the configuration so ClickHouse accepts remote connections (by default it listens only on localhost); per-client IP restrictions are handled later in users.xml:
sudo nano /etc/clickhouse-server/config.xml
Find the <listen_host> settings and uncomment or add the following lines:
<!-- Listen on all IPv4 interfaces -->
<listen_host>0.0.0.0</listen_host>
<!-- Or use the following instead to listen on all IPv4 and IPv6 interfaces -->
<!-- <listen_host>::</listen_host> -->
Configure user authentication by editing the users configuration:
sudo nano /etc/clickhouse-server/users.xml
Create a new user with specific permissions by adding it inside the existing <users> element:
<users>
<analytics_user>
<password_sha256_hex>your_password_hash_here</password_sha256_hex>
<networks>
<ip>your_client_ip/32</ip>
</networks>
<profile>default</profile>
<quota>default</quota>
<allow_databases>
<database>analytics</database>
</allow_databases>
</analytics_user>
</users>
Generate the SHA-256 hash for the password_sha256_hex value (copy only the hex digest from the output, not the trailing dash):
echo -n 'your_password' | sha256sum
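As an alternative to XML-based user management, recent ClickHouse releases also support SQL-driven access control. The sketch below assumes access_management is enabled for the account you connect as (for example <access_management>1</access_management> on the default user in users.xml):
-- Create the same analytics user and grant it access to the analytics database
CREATE USER analytics_user IDENTIFIED WITH sha256_password BY 'your_password';
GRANT SELECT, INSERT ON analytics.* TO analytics_user;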
Restart ClickHouse to apply the configuration changes:
sudo systemctl restart clickhouse-server
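Once the server is back up, reconnect with clickhouse-client and confirm the new account is registered; depending on your configuration, this may require the SHOW USERS privilege, which the administrative default user normally has:
-- The new analytics_user should appear alongside default
SELECT name FROM system.users;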
Basic Database Operations
Now that ClickHouse is installed and configured, let’s explore basic operations. Connect to the ClickHouse server using the client:
clickhouse-client --password
Create your first database and table:
CREATE DATABASE analytics;
USE analytics;
CREATE TABLE web_events (
timestamp DateTime,
user_id UInt32,
event_type String,
page_url String,
browser String,
country String
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id);
Insert some sample data:
INSERT INTO web_events VALUES
('2024-01-15 10:30:00', 12345, 'pageview', '/home', 'Chrome', 'US'),
('2024-01-15 10:35:00', 12345, 'click', '/products', 'Chrome', 'US'),
('2024-01-15 10:40:00', 67890, 'pageview', '/home', 'Firefox', 'UK'),
('2024-01-15 10:45:00', 67890, 'signup', '/register', 'Firefox', 'UK');
Run analytical queries to test performance:
-- Count events by type
SELECT event_type, count() as total
FROM web_events
GROUP BY event_type
ORDER BY total DESC;
-- Hourly event distribution
SELECT
toHour(timestamp) as hour,
count() as events
FROM web_events
GROUP BY hour
ORDER BY hour;
Performance Optimization and Best Practices
ClickHouse performance depends heavily on proper table design and configuration. Here are essential optimization strategies:
Choosing the Right ORDER BY Clause:
The ORDER BY clause in MergeTree tables determines how data is sorted and directly impacts query performance. Place the most frequently filtered columns first:
-- Good: frequently filtered columns first
CREATE TABLE events (
date Date,
user_id UInt32,
event_type String,
data String
) ENGINE = MergeTree()
ORDER BY (date, user_id, event_type);
-- Bad: rarely used columns first
CREATE TABLE events_bad (
date Date,
user_id UInt32,
event_type String,
data String
) ENGINE = MergeTree()
ORDER BY (data, event_type, user_id);
Using Appropriate Data Types:
ClickHouse offers specialized data types that can significantly improve performance and reduce storage:
CREATE TABLE optimized_events (
timestamp DateTime64(3), -- Millisecond precision
user_id UInt32, -- Instead of UInt64 if values fit
event_type LowCardinality(String), -- For columns with few distinct values
ip_address IPv4, -- Instead of String for IP addresses
amount Decimal(10,2), -- For financial data
tags Array(String), -- For array data
metadata Map(String, String) -- For key-value pairs
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id);
Memory and Storage Configuration:
Memory limits live in two places. The server-wide cap belongs in the main configuration file:
sudo nano /etc/clickhouse-server/config.xml
<max_server_memory_usage>16000000000</max_server_memory_usage> <!-- 16GB -->
Per-query limits such as max_memory_usage and the thresholds for spilling GROUP BY and ORDER BY to disk are profile settings, so add them to the default profile in /etc/clickhouse-server/users.xml instead:
<max_memory_usage>8000000000</max_memory_usage> <!-- 8GB per query -->
<max_bytes_before_external_group_by>4000000000</max_bytes_before_external_group_by>
<max_bytes_before_external_sort>4000000000</max_bytes_before_external_sort>
Adjust the values to match your server's RAM and restart ClickHouse afterwards.
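After restarting, you can confirm which per-query limits your session actually picked up by querying system.settings (the names match the settings used above):
-- Inspect the effective values of the memory-related settings
SELECT name, value
FROM system.settings
WHERE name IN ('max_memory_usage', 'max_bytes_before_external_group_by', 'max_bytes_before_external_sort');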
Real-World Use Cases and Examples
ClickHouse excels in several practical scenarios. Here are common implementations:
Web Analytics Dashboard:
Track website performance and user behavior with real-time queries:
CREATE TABLE page_analytics (
timestamp DateTime,
session_id String,
user_id Nullable(UInt32),
page_path String,
referrer String,
user_agent String,
load_time_ms UInt16,
country_code FixedString(2),
device_type LowCardinality(String),
bounce Boolean
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, session_id);
-- Real-time dashboard queries
SELECT
toHour(timestamp) as hour,
count() as pageviews,
uniqExact(session_id) as unique_sessions,
avg(load_time_ms) as avg_load_time
FROM page_analytics
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;
Log Analysis System:
Process application logs for monitoring and debugging:
CREATE TABLE application_logs (
timestamp DateTime64(3),
level LowCardinality(String),
service String,
message String,
trace_id String,
user_id Nullable(UInt32),
response_time_ms Nullable(UInt16),
status_code Nullable(UInt16),
error_details Nullable(String)
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (timestamp, service, level);
-- Error analysis query
SELECT
service,
count() as error_count,
uniqExact(trace_id) as affected_requests
FROM application_logs
WHERE level = 'ERROR'
AND timestamp >= now() - INTERVAL 1 HOUR
GROUP BY service
ORDER BY error_count DESC;
IoT Data Processing:
Handle sensor data from IoT devices:
CREATE TABLE sensor_data (
timestamp DateTime64(3),
device_id UInt32,
sensor_type LowCardinality(String),
value Float32,
unit String,
location_lat Float64,
location_lon Float64,
battery_level UInt8
) ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (device_id, timestamp, sensor_type);
-- Temperature anomaly detection
SELECT
device_id,
avg(value) as avg_temp,
max(value) as max_temp,
min(value) as min_temp,
count() as readings
FROM sensor_data
WHERE sensor_type = 'temperature'
AND timestamp >= now() - INTERVAL 1 DAY
GROUP BY device_id
HAVING max_temp > 35 OR min_temp < -10;
ClickHouse vs Alternatives Comparison
Understanding how ClickHouse compares to other database solutions helps determine when it's the right choice:
Database | Query Speed | Storage Compression | Learning Curve | Best Use Case |
---|---|---|---|---|
ClickHouse | Very Fast | Excellent (10:1) | Moderate | Real-time analytics |
PostgreSQL | Moderate | Good (3:1) | Easy | General purpose OLTP |
Elasticsearch | Fast | Good (4:1) | Hard | Full-text search |
BigQuery | Very Fast | Excellent | Easy | Cloud analytics |
MongoDB | Moderate | Fair (2:1) | Easy | Document storage |
ClickHouse performance benchmarks show significant advantages for analytical workloads: for aggregation queries over large datasets, published benchmarks commonly report ClickHouse running 100-1000x faster than traditional row-oriented databases.
Common Issues and Troubleshooting
Here are frequent problems and their solutions:
Memory Issues:
If you encounter "Memory limit exceeded" errors:
-- Check current memory usage
SELECT formatReadableSize(memory_usage) FROM system.processes;
-- Reduce memory usage with LIMIT
SELECT * FROM large_table
ORDER BY timestamp
LIMIT 100000;
-- Use external sorting for large GROUP BY operations
SET max_bytes_before_external_group_by = 1000000000;
Connection Problems:
If you can't connect to ClickHouse from remote hosts:
# Check that ClickHouse is listening on the expected interfaces and ports (9000 native TCP, 8123 HTTP)
sudo ss -tlnp | grep -E '9000|8123'
# Verify firewall settings
sudo ufw allow 9000/tcp
sudo ufw allow 8123/tcp
# Check ClickHouse logs
sudo tail -f /var/log/clickhouse-server/clickhouse-server.log
Performance Issues:
For slow queries, analyze the execution plan:
-- Explain query execution
EXPLAIN PLAN SELECT * FROM events WHERE date = '2024-01-15';
-- Check query statistics
SELECT query, query_duration_ms, memory_usage
FROM system.query_log
WHERE query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 10;
Disk Space Management:
Monitor and manage disk usage:
-- Check table sizes (count only active parts)
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) as size
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
-- Set up automatic data cleanup with a table-level TTL
ALTER TABLE web_events
MODIFY TTL timestamp + INTERVAL 90 DAY;
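TTL expressions are evaluated when parts are merged in the background, so expired rows can linger until a merge happens. If you need the cleanup to run right away, the following statements help, though they trigger heavy rewrites and should be used sparingly on large tables:
-- Re-evaluate the TTL expression for existing parts
ALTER TABLE web_events MATERIALIZE TTL;
-- Force a merge so expired rows are dropped immediately
OPTIMIZE TABLE web_events FINAL;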
Integration and Data Import Strategies
ClickHouse supports various data import methods for different scenarios:
CSV File Import:
# Import from CSV file
cat data.csv | clickhouse-client --query="INSERT INTO events FORMAT CSV"
# Import with specific format settings
clickhouse-client --query="INSERT INTO events FORMAT CSVWithNames" < data.csv
Real-time Data Streaming:
Use Kafka for streaming data into ClickHouse:
CREATE TABLE kafka_events (
timestamp DateTime,
user_id UInt32,
event_type String,
data String
) ENGINE = Kafka
SETTINGS
kafka_broker_list = 'localhost:9092',
kafka_topic_list = 'events',
kafka_group_name = 'clickhouse_group',
kafka_format = 'JSONEachRow';
-- Create materialized view to move data from Kafka to MergeTree
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM kafka_events;
Database Migration:
ClickHouse can read directly from an existing MySQL server through the mysql() table function, so no intermediate dump file is required (you can still export with mysqldump --tab and load the files via the CSV import shown above if you prefer file-based transfer):
# Pull a table straight from MySQL into a new ClickHouse table
clickhouse-client --query="CREATE TABLE imported_table ENGINE = MergeTree ORDER BY tuple() AS SELECT * FROM mysql('mysql-host:3306', 'database', 'table', 'user', 'password')"
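PostgreSQL migrations follow the same pattern through the postgresql() table function. Here is a short sketch with placeholder connection details; a CREATE TABLE ... AS SELECT or INSERT INTO ... SELECT works the same way as the MySQL example above:
-- Preview rows straight from a PostgreSQL table before importing
SELECT *
FROM postgresql('postgres-host:5432', 'database', 'table', 'user', 'password')
LIMIT 10;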
For production deployments requiring high availability and consistent performance, consider using VPS services with SSD storage and dedicated resources to ensure optimal ClickHouse performance.
ClickHouse continues to evolve rapidly, with new features and optimizations being added regularly. The official ClickHouse documentation provides comprehensive information about advanced features, configuration options, and best practices for specific use cases. As you become more comfortable with ClickHouse, explore advanced topics like replication, sharding, and custom table engines to build robust, scalable analytical systems.
