How to Install and Configure Neo4j on Ubuntu 24

Neo4j is a popular, Java-based NoSQL graph database built for data with complex relationships, which makes it a good fit for recommendation engines, fraud detection, network analysis, and social media platforms. Where traditional relational databases struggle with multi-level JOINs, Neo4j stores data as nodes, relationships, and properties, so traversals stay fast even on very large graphs. This guide walks you through installing Neo4j Community Edition on Ubuntu 24, configuring it for production use, and optimizing performance for real-world applications.

Understanding Neo4j Architecture and Components

Neo4j operates fundamentally differently from traditional databases. Instead of tables and rows, it stores data as graphs consisting of nodes (entities) connected by relationships (edges). Each node and relationship can have properties, creating a rich data model that mirrors real-world connections naturally.
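To make the model concrete, here is a minimal Python sketch of a property graph. It only illustrates the data model (nodes, typed relationships, properties on both); it says nothing about how Neo4j lays data out on disk, and the class and function names are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    labels: set
    properties: dict
    outgoing: list = field(default_factory=list)  # relationships that start at this node


@dataclass(eq=False)
class Relationship:
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def relate(start, rel_type, end, **properties):
    """Create a directed, typed relationship and attach it to its start node."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    return rel


def neighbours(node, rel_type):
    """Follow outgoing relationships of one type; no table scan or join involved."""
    return [rel.end for rel in node.outgoing if rel.type == rel_type]


alice = Node({"Person"}, {"name": "Alice"})
bob = Node({"Person"}, {"name": "Bob"})
relate(alice, "KNOWS", bob, since=2020)

print([n.properties["name"] for n in neighbours(alice, "KNOWS")])  # prints ['Bob']
```

The point of the sketch is that a traversal starts from a node and follows references directly, which is the access pattern graph databases are optimized for.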

The database engine includes several key components:

Store Files: Binary files containing nodes, relationships, and properties
Transaction Logs: Write-ahead logs ensuring ACID compliance
Page Cache: In-memory cache for frequently accessed data
Cypher Query Engine: Executes Cypher, a declarative query language with SQL-like syntax
Bolt Protocol: Binary protocol for client-server communication

Neo4j Community Edition provides the core graph database functionality, while Enterprise Edition adds clustering, advanced security, and monitoring features. For development and small to medium production workloads, Community Edition handles millions of nodes efficiently.

Prerequisites and System Requirements

Before diving into installation, ensure your Ubuntu 24 system meets Neo4j’s requirements. The database is surprisingly lightweight for basic operations but can consume significant resources under heavy loads.

Component	Minimum	Recommended	Production
RAM	2GB	8GB	16GB+
CPU	2 cores	4 cores	8+ cores
Storage	10GB	100GB SSD	500GB+ NVMe
Java	OpenJDK 17	OpenJDK 17	OpenJDK 17

First, update your system and install Java 17, which Neo4j 5.x requires:

sudo apt update && sudo apt upgrade -y
sudo apt install openjdk-17-jdk curl wget gnupg -y
java -version

Verify that the output reports OpenJDK 17. If several Java versions are installed, select the default:

sudo update-alternatives --config java
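If you want to script this check, for example in a provisioning step, the version can be read from the output of java -version. The sketch below is one way to do it in Python; the function names are hypothetical, and the parsing assumes the usual `version "..."` line that OpenJDK prints.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major version from `java -version` output, or None if absent."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not match:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 reports itself as "1.8.0_392"
    if major == 1 and match.group(2):
        return int(match.group(2))
    return major


def installed_java_major():
    """Run the default java binary; `java -version` writes to stderr, not stdout."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)


sample = 'openjdk version "17.0.10" 2024-01-16'
print(parse_java_major(sample))  # prints 17
```

Calling installed_java_major() on the server should return 17 once the default is set correctly.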

Step-by-Step Neo4j Installation

Neo4j provides official APT repositories for Ubuntu, making installation and updates straightforward. With the repository in place, security patches and new releases arrive through regular apt upgrades.

Add Neo4j’s official GPG key and repository:

wget -O - https://debian.neo4j.com/neotechnology.gpg.key | sudo gpg --dearmor -o /usr/share/keyrings/neo4j.gpg
echo "deb [signed-by=/usr/share/keyrings/neo4j.gpg] https://debian.neo4j.com stable 5" | sudo tee /etc/apt/sources.list.d/neo4j.list

Update package lists and install Neo4j:

sudo apt update
sudo apt install neo4j -y

The installation creates a dedicated neo4j user and installs files in standard Linux locations:

Installation: /var/lib/neo4j
Configuration: /etc/neo4j/neo4j.conf
Logs: /var/log/neo4j
Service: /lib/systemd/system/neo4j.service

Enable and start the Neo4j service:

sudo systemctl enable neo4j
sudo systemctl start neo4j
sudo systemctl status neo4j

Check that the service status shows “active (running)”. If it does not, follow the logs:

sudo journalctl -u neo4j -f

Essential Configuration Settings

Neo4j’s default configuration works for development but requires tuning for production environments. The main configuration file /etc/neo4j/neo4j.conf contains hundreds of settings, but several are critical for performance and security.

Edit the configuration file:

sudo nano /etc/neo4j/neo4j.conf

Key settings to modify:

# Enable remote connections
server.default_listen_address=0.0.0.0

# HTTP and HTTPS ports
server.http.listen_address=:7474
server.https.listen_address=:7473

# Bolt protocol port
server.bolt.listen_address=:7687

# Memory settings (adjust based on available RAM)
server.memory.heap.initial_size=1g
server.memory.heap.max_size=1g
server.memory.pagecache.size=512m

# Database location
server.directories.data=/var/lib/neo4j/data
server.directories.logs=/var/log/neo4j

# Security settings
dbms.security.auth_enabled=true
dbms.security.procedures.unrestricted=apoc.*

# Performance tuning
# (Neo4j 5.x names these settings db.checkpoint.*, not dbms.checkpoint.*)
db.checkpoint.interval.time=15m
db.checkpoint.interval.tx=100000

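For production deployments on a dedicated server, first reserve memory for the operating system, then divide the rest between the heap and the page cache. Giving roughly half of the remainder to the heap is a reasonable starting point, but keep the heap below about 31GB so the JVM can still use compressed object pointers, and leave the page cache enough room to hold as much of the store files as possible. The neo4j-admin server memory-recommendation command prints suggested values for the machine it runs on.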
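The sizing arithmetic is easy to sketch in Python. The 2GB operating system reserve, the 50% heap share, and the 31GB heap cap below are assumptions chosen for illustration, not official Neo4j guidance; treat the output as a starting point to tune against your own workload.

```python
def suggest_memory(total_gb, os_reserve_gb=2, heap_fraction=0.5, heap_cap_gb=31):
    """Rough starting values for heap and page cache, in whole gigabytes.

    All defaults are illustrative assumptions, not measured recommendations.
    """
    usable = int(total_gb) - int(os_reserve_gb)
    if usable < 2:
        raise ValueError("not enough RAM left after reserving memory for the OS")
    # Heap gets a share of the usable memory, capped, and never less than 1GB
    heap = max(1, min(int(usable * heap_fraction), heap_cap_gb))
    pagecache = usable - heap
    return {
        "server.memory.heap.initial_size": f"{heap}g",
        "server.memory.heap.max_size": f"{heap}g",
        "server.memory.pagecache.size": f"{pagecache}g",
    }


# Print neo4j.conf lines for a 16GB server
for key, value in suggest_memory(16).items():
    print(f"{key}={value}")
```

Setting the initial and maximum heap to the same value avoids heap resizing pauses, which is why the sketch emits identical numbers for both.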

Restart Neo4j to apply configuration changes:

sudo systemctl restart neo4j

Setting Up Authentication and Security

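Neo4j ships with the default credentials neo4j / neo4j and forces a password change on first login. Until the password has been changed, the server rejects every other operation.

Alternatively, set the initial password with the built-in admin tool. This only takes effect if it is run before the database is started for the first time; if Neo4j has already been started, change the password at first login instead: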

sudo neo4j-admin dbms set-initial-password your_secure_password_here
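For the password itself, a generated value is safer than anything memorable. A minimal sketch using Python's secrets module follows; the function name and the 24-character default are arbitrary choices for this example, and the alphabet is limited to letters and digits so the result needs no shell quoting.

```python
import secrets
import string


def generate_password(length=24):
    """Return a random alphanumeric password with at least one letter and one digit."""
    if length < 8:
        raise ValueError("use at least 8 characters")
    alphabet = string.ascii_letters + string.digits
    while True:
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry in the unlikely case the draw is all letters or all digits
        if any(c.isalpha() for c in password) and any(c.isdigit() for c in password):
            return password


print(generate_password())
```

Store the generated password in a secrets manager rather than in shell history or scripts.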

For production environments, consider additional security measures:

# Disable HTTP connector (use HTTPS only)
server.http.enabled=false
server.https.enabled=true

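# Enable SSL/TLS (certificates must be in place for each policy)
dbms.ssl.policy.bolt.enabled=true
dbms.ssl.policy.https.enabled=true

# Restrict which procedures and functions may be loaded
# (Neo4j 5.x uses "allowlist"; the old "whitelist" name is no longer accepted)
dbms.security.procedures.allowlist=apoc.coll.*,apoc.load.*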

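Configure firewall rules to restrict access. On a remote server, allow SSH before enabling the firewall so you do not lock yourself out:

sudo ufw allow ssh
sudo ufw allow from trusted_ip_range to any port 7474 proto tcp
sudo ufw allow from trusted_ip_range to any port 7687 proto tcp
sudo ufw enable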
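To confirm the rules behave as intended, try opening TCP connections to the Neo4j ports from a machine inside and outside the trusted range. The sketch below uses only Python's standard library; the helper names are hypothetical, and 127.0.0.1 is a stand-in you would replace with your server's address when testing from a client.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


def check_neo4j_ports(host, ports=(7474, 7687)):
    """Map each port to whether it accepted a connection."""
    return {port: port_reachable(host, port) for port in ports}


# Replace 127.0.0.1 with the server address when running from a client machine
for port, ok in check_neo4j_ports("127.0.0.1").items():
    print(f"port {port}: {'reachable' if ok else 'blocked or closed'}")
```

A successful connection only shows that the port is open to the machine running the check; it does not verify credentials or TLS.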

Testing Your Installation

With Neo4j running and configured, test connectivity using multiple methods. The web interface provides an intuitive way to explore the database and run queries.

Access the Neo4j Browser at http://your_server_ip:7474. Log in with username neo4j and your configured password.

Run basic Cypher queries to verify functionality:

// Create sample nodes and relationships
// (one statement, so the variables stay in scope across the CREATE clauses)
CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 25})
CREATE (company:Company {name: 'TechCorp'})
CREATE (alice)-[:WORKS_FOR]->(company)
CREATE (bob)-[:WORKS_FOR]->(company)
CREATE (alice)-[:KNOWS]->(bob);

// Query the graph
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN p.name, c.name;

// Find connections
MATCH (p1:Person)-[:KNOWS]-(p2:Person)
RETURN p1.name, p2.name;

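Test programmatic access using Python and the official driver. Ubuntu 24 blocks system-wide pip installs (PEP 668), so install the driver into a virtual environment:

sudo apt install python3-venv -y
python3 -m venv ~/neo4j-venv
source ~/neo4j-venv/bin/activate
pip install neo4j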

# Python test script
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"
AUTH = ("neo4j", "your_password")

def test_connection(tx):
    # Runs inside a managed read transaction
    result = tx.run("RETURN 'Hello, Neo4j!' AS message")
    return result.single()["message"]

# The context managers close the session and driver even if a query fails
with GraphDatabase.driver(URI, auth=AUTH) as driver:
    driver.verify_connectivity()
    with driver.session() as session:
        print(session.execute_read(test_connection))

Performance Optimization and Tuning

Neo4j performance depends heavily on proper memory configuration, indexing strategy, and query optimization. Unlike relational databases where row-based access is standard, graph databases excel at traversal operations but require different optimization approaches.

Memory allocation follows specific patterns:

Memory Type	Purpose	Recommendation
Heap Memory	Query processing, transactions	8-16GB for most workloads
Page Cache	Store file caching	Remaining available RAM
OS Memory	System operations	2-4GB minimum

Create indexes for frequently queried properties:

// Create indexes for better query performance
CREATE INDEX person_name FOR (p:Person) ON (p.name);
CREATE INDEX company_name FOR (c:Company) ON (c.name);

// Composite indexes for multiple properties
CREATE INDEX person_name_age FOR (p:Person) ON (p.name, p.age);

// Show existing indexes
SHOW INDEXES;
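When index definitions live in a deployment script, it helps to generate them in an idempotent form so the script can be rerun safely. The Python sketch below builds CREATE INDEX ... IF NOT EXISTS statements from a small list of label and property pairs. The helper name and the naming scheme (label and properties joined with underscores) are assumptions for this example.

```python
import re

# Plain identifiers only, so nothing needs backtick escaping in Cypher
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def index_statement(label, properties):
    """Build an idempotent CREATE INDEX statement for one label."""
    for name in (label, *properties):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    index_name = "_".join([label.lower(), *properties])
    columns = ", ".join(f"n.{prop}" for prop in properties)
    return f"CREATE INDEX {index_name} IF NOT EXISTS FOR (n:{label}) ON ({columns})"


wanted = [("Person", ["name"]), ("Company", ["name"]), ("Person", ["name", "age"])]
for label, props in wanted:
    print(index_statement(label, props) + ";")
```

The identifier check is there because labels and property names cannot be passed as query parameters, so they must be validated before being placed into the statement text.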

Monitor query performance using PROFILE and EXPLAIN:

// Run the query and show the execution plan with actual row counts
PROFILE MATCH (p:Person {name: 'Alice'})-[:WORKS_FOR]->(c:Company)
RETURN p, c;

// Show execution plan without running
EXPLAIN MATCH (p:Person)-[:KNOWS*2..4]-(friend)
WHERE p.name = 'Alice'
RETURN friend.name;

Common Installation Issues and Troubleshooting

Neo4j installations occasionally encounter issues, particularly around Java versions, memory allocation, and file permissions. Understanding common problems saves significant debugging time.

Java Version Conflicts:

Neo4j 5.x requires Java 17 (Java 21 is also supported from Neo4j 5.14). If the default java binary points to an older or unsupported version, startup fails:

# Check current Java version
java -version
javac -version

# List installed Java versions
sudo update-alternatives --list java

# Set Java 17 as default
sudo update-alternatives --config java

Memory Issues:

Insufficient memory allocation causes OutOfMemoryError exceptions:

# Monitor Neo4j memory usage
sudo ps aux | grep neo4j
sudo systemctl status neo4j

# Check available system memory
free -h
cat /proc/meminfo
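The same information can be read programmatically, which is handy for a pre-start check. This is a small Python sketch that parses the /proc/meminfo format; the function names are hypothetical, and the sample string stands in for the real file so the example runs anywhere.

```python
def parse_meminfo(text):
    """Map /proc/meminfo field names to their sizes in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def available_gib(meminfo_text):
    """Memory the kernel estimates is available for new workloads, in GiB."""
    return parse_meminfo(meminfo_text)["MemAvailable"] / (1024 * 1024)


# On the server, read the real file: open("/proc/meminfo").read()
sample = "MemTotal:       16303428 kB\nMemAvailable:    8388608 kB\n"
print(f"{available_gib(sample):.1f} GiB available")  # prints 8.0 GiB available
```

If the configured heap plus page cache exceeds the available figure, lower the memory settings before restarting Neo4j.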

Permission Problems:

File permission issues prevent database startup:

# Fix Neo4j file ownership
sudo chown -R neo4j:neo4j /var/lib/neo4j
sudo chown -R neo4j:neo4j /var/log/neo4j
sudo chmod -R u+rwX /var/lib/neo4j

Port Conflicts:

Other services using Neo4j’s default ports cause binding failures:

# Check port usage
sudo ss -tlnp | grep 7474
sudo lsof -i :7687

# Kill conflicting processes if necessary
sudo fuser -k 7474/tcp

Real-World Use Cases and Applications

Neo4j shines in scenarios where relationships between entities are as important as the entities themselves. Traditional relational databases struggle with complex JOIN operations that Neo4j handles effortlessly through graph traversals.

Social Network Analysis:

Model user connections, friend recommendations, and community detection:

// Find mutual friends
MATCH (user1:User {name: 'Alice'})-[:FRIEND]-(mutual)-[:FRIEND]-(user2:User {name: 'Bob'})
RETURN mutual.name;

// Recommend friends based on common interests
MATCH (user:User {name: 'Alice'})-[:INTERESTED_IN]->(interest)<-[:INTERESTED_IN]-(potential_friend)
WHERE NOT (user)-[:FRIEND]-(potential_friend)
RETURN potential_friend.name, count(interest) AS common_interests
ORDER BY common_interests DESC;

Fraud Detection:

Identify suspicious patterns and connected fraudulent accounts:

// Find accounts sharing suspicious patterns
MATCH (account1:Account)-[:SHARES_DEVICE|SHARES_IP|SHARES_ADDRESS]-(account2:Account)
WHERE account1.flagged = true
RETURN account2.id, account2.status;

// Detect ring patterns
MATCH path = (a:Account)-[:TRANSFER*3..6]->(a)
WHERE all(rel in relationships(path) WHERE rel.amount > 1000)
RETURN path;
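To see what the ring query is looking for, here is the same idea as a plain Python depth-first search over a list of transfers. It is a simplified sketch: it forbids revisiting an account within a ring, whereas Cypher only forbids reusing a relationship, and the function name and tuple layout are assumptions for this example.

```python
def find_rings(transfers, min_len=3, max_len=6, min_amount=1000):
    """Return cycles of min_len..max_len transfers, each above min_amount.

    transfers: iterable of (sender, receiver, amount) tuples.
    Like the Cypher query, each ring is reported once per starting account.
    """
    graph = {}
    for sender, receiver, amount in transfers:
        if amount > min_amount:  # mirrors rel.amount > 1000
            graph.setdefault(sender, []).append(receiver)

    rings = []

    def walk(start, current, path):
        for nxt in graph.get(current, []):
            if nxt == start and len(path) >= min_len:
                rings.append(path + [start])  # closing hop completes the ring
            elif nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for account in graph:
        walk(account, account, [account])
    return rings


transfers = [("A", "B", 5000), ("B", "C", 4000), ("C", "A", 3000), ("C", "D", 50)]
print(find_rings(transfers)[0])  # prints ['A', 'B', 'C', 'A']
```

On a real dataset this search grows quickly with ring length, which is the kind of traversal a graph database is built to run natively.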

Recommendation Engines:

Build sophisticated recommendation systems based on user behavior:

// Product recommendations based on similar users
MATCH (user:User {id: 123})-[:PURCHASED]->(product)<-[:PURCHASED]-(similar_user)
MATCH (similar_user)-[:PURCHASED]->(recommendation)
WHERE NOT (user)-[:PURCHASED]->(recommendation)
RETURN recommendation.name, count(*) as score
ORDER BY score DESC
LIMIT 10
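The scoring in this query is easier to reason about with a small in-memory equivalent. The Python sketch below reproduces the count(*) logic on a dictionary of purchases, assuming at most one PURCHASED relationship per user and product; the function name and data layout are made up for illustration.

```python
from collections import Counter


def recommend(purchases, user_id, limit=10):
    """Score products bought by users who share at least one purchase with user_id.

    purchases: dict mapping user id -> set of product names.
    """
    mine = purchases.get(user_id, set())
    scores = Counter()
    for other, theirs in purchases.items():
        if other == user_id:
            continue
        shared = len(mine & theirs)
        if shared == 0:
            continue  # not a "similar user" in the sense of the query
        # The query yields one row per (shared product, recommendation) pair,
        # so each candidate is counted once per shared product.
        for product in theirs - mine:
            scores[product] += shared
    return scores.most_common(limit)


purchases = {
    123: {"laptop", "mouse"},
    1: {"laptop", "mouse", "dock"},
    2: {"mouse", "keyboard"},
    3: {"desk"},
}
print(recommend(purchases, 123))  # prints [('dock', 2), ('keyboard', 1)]
```

Note how a product's score rises with the number of purchases the recommending user has in common with the target user, which is exactly what count(*) does in the Cypher version.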

Comparing Neo4j with Alternative Solutions

Graph databases occupy a specific niche, but several alternatives exist depending on use case requirements and existing infrastructure.

Database	Type	Strengths	Weaknesses	Best For
Neo4j	Native Graph	Mature, Cypher query language, excellent tooling	Memory intensive, licensing costs	Complex relationships, real-time queries
Amazon Neptune	Managed Graph	Fully managed, supports multiple query languages	Vendor lock-in, cost at scale	AWS environments, rapid deployment
PostgreSQL	Relational with graph extensions	Familiar SQL, ACID compliance, cost-effective	Limited graph capabilities, complex queries	Existing PostgreSQL infrastructure
ArangoDB	Multi-model	Document, graph, and key-value in one system	Jack-of-all-trades complexity	Varied data models in single application

Choose Neo4j when relationship traversal performance is critical and your application naturally models as a graph. For simpler relationship queries or budget constraints, PostgreSQL with recursive CTEs might suffice.

Production Deployment Best Practices

Moving Neo4j from development to production requires careful planning around backup strategies, monitoring, and scalability. A VPS works well for development and small production workloads, while larger applications benefit from dedicated hardware.

Backup Configuration:

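#!/bin/bash
# Offline backup script: run as root, for example from cron.
# Community Edition can only dump a database while it is stopped.
set -euo pipefail

BACKUP_DIR="/backup/neo4j"
DATE=$(date +%Y%m%d_%H%M%S)
TARGET="$BACKUP_DIR/$DATE"

mkdir -p "$TARGET"
chown neo4j:neo4j "$TARGET"

systemctl stop neo4j
trap 'systemctl start neo4j' EXIT   # restart Neo4j even if the dump fails

# Writes $TARGET/neo4j.dump; "neo4j" is the default database name
sudo -u neo4j neo4j-admin database dump neo4j --to-path="$TARGET"

# Remove dumps older than 7 days
find "$BACKUP_DIR" -name "*.dump" -mtime +7 -delete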
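If you prefer to review what a retention policy would delete before it runs, the cleanup step can be written with a dry-run mode. This Python sketch approximates the find command's seven-day cutoff using file modification times; the function names are hypothetical and /backup/neo4j is the same directory the script uses.

```python
import time
from pathlib import Path


def expired_dumps(backup_dir, max_age_days=7, now=None):
    """Return .dump files under backup_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        path for path in Path(backup_dir).rglob("*.dump")
        if path.stat().st_mtime < cutoff
    )


def prune(backup_dir, max_age_days=7, dry_run=True):
    """Delete expired dumps, or only list them when dry_run is True."""
    doomed = expired_dumps(backup_dir, max_age_days)
    for path in doomed:
        print(("would delete " if dry_run else "deleting ") + str(path))
        if not dry_run:
            path.unlink()
    return doomed


if __name__ == "__main__":
    backup_dir = Path("/backup/neo4j")
    if backup_dir.is_dir():
        prune(backup_dir, dry_run=True)
```

Run it with dry_run=True first, check the list, and only then switch to dry_run=False.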

Monitoring Setup:

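# Enable JMX monitoring (unauthenticated and unencrypted here: keep port 3637 firewalled)
server.jvm.additional=-Dcom.sun.management.jmxremote.port=3637
server.jvm.additional=-Dcom.sun.management.jmxremote.authenticate=false
server.jvm.additional=-Dcom.sun.management.jmxremote.ssl=false

# Log slow queries (Neo4j 5.x setting names; query logging is an
# Enterprise Edition feature, so leave these out on Community Edition)
db.logs.query.enabled=INFO
db.logs.query.threshold=1s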

Security Hardening:

Use strong passwords and consider certificate-based authentication
Enable SSL/TLS for all connections
Restrict network access using firewalls
Apply security updates regularly through the APT repository
Monitor query logs for suspicious activity

Neo4j Community Edition handles millions of nodes efficiently on modern hardware. For applications requiring clustering, advanced security features, or enterprise support, consider upgrading to Enterprise Edition. The Community Edition provides an excellent foundation for learning graph databases and building substantial applications.

Regular maintenance includes monitoring disk space, analyzing slow queries, and updating indexes based on query patterns. The Neo4j Browser provides built-in profiling tools, while external monitoring solutions offer deeper insights into performance trends.

For additional information, consult the official Neo4j documentation, which provides comprehensive guides on advanced topics like clustering, performance tuning, and integration patterns.
