How to Migrate MySQL Database to Postgres Using pgloader

Migrating from MySQL to PostgreSQL is a common challenge that many teams face when scaling applications or seeking better performance, advanced features, and more robust data integrity constraints. While manual migration can be tedious and error-prone, pgloader provides an automated solution that can handle the heavy lifting of data transfer, schema conversion, and type mapping between these two popular database systems. This guide will walk you through the complete process of using pgloader to migrate your MySQL database to PostgreSQL, covering everything from installation and configuration to troubleshooting common issues and optimizing performance.

Understanding pgloader and How It Works

pgloader is a powerful data loading tool specifically designed to migrate data into PostgreSQL from a variety of sources, including MySQL, SQLite, MS SQL Server, and CSV files. What makes pgloader particularly effective for MySQL-to-PostgreSQL migrations is its ability to automatically handle schema differences, data type conversions, and even rebuild indexes and constraints.

The tool works by connecting to both source and destination databases simultaneously, reading the MySQL schema structure, converting it to PostgreSQL-compatible DDL statements, and then streaming data in batches. It handles complex scenarios like foreign key relationships, triggers, and custom data types that would otherwise require manual intervention.

Key advantages of pgloader include:

  • Automatic schema conversion with intelligent type mapping
  • Parallel data loading for improved performance
  • Built-in error handling and data validation
  • Support for custom transformation rules
  • Comprehensive logging and progress reporting

Unlike simple dump-and-restore approaches, pgloader can transform data on-the-fly, making it ideal for migrations where direct schema compatibility isn’t possible.

Installation and Initial Setup

How you install pgloader depends on your operating system. Here are the most common installation methods:

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install pgloader

CentOS/RHEL:

sudo yum install epel-release
sudo yum install pgloader

macOS (using Homebrew):

brew install pgloader

Docker (recommended for consistent environments):

docker pull dimitri/pgloader:latest
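
Whichever method you choose, verify the binary is available before continuing:

pgloader --version

# Or, when using the Docker image:
docker run --rm dimitri/pgloader:latest pgloader --version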

Before starting the migration, ensure both MySQL and PostgreSQL servers are accessible and that you have the necessary permissions (a sketch of matching grant statements follows this list). You’ll need:

  • READ access to the MySQL source database
  • CREATE and INSERT permissions on the PostgreSQL target database
  • Network connectivity between pgloader and both database servers
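
If the migration accounts don’t exist yet, the grants might look like this minimal sketch (user names, passwords, and database names are placeholders):

-- On MySQL: a read-only account for pgloader
CREATE USER 'migrator'@'%' IDENTIFIED BY 'change_me';
GRANT SELECT ON source_database.* TO 'migrator'@'%';

-- On PostgreSQL: a role that owns the (empty) target database
CREATE ROLE migrator LOGIN PASSWORD 'change_me';
CREATE DATABASE target_database OWNER migrator;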

Test your connections first:

# Test MySQL connection
mysql -h source_host -u username -p source_database

# Test PostgreSQL connection
psql -h target_host -U username -d target_database

Step-by-Step Migration Process

The migration process with pgloader involves creating a command file that defines source and target connections, transformation rules, and migration parameters.

Basic Migration Command

For simple migrations, you can use pgloader’s command-line syntax:

pgloader mysql://user:password@mysql-host/source_db postgresql://user:password@pg-host/target_db

However, for production environments, using a configuration file provides better control:

Creating a Configuration File

Create a file named migration.load:

LOAD DATABASE
    FROM mysql://mysql_user:mysql_pass@mysql_host:3306/source_database
    INTO postgresql://pg_user:pg_pass@pg_host:5432/target_database

WITH include drop, create tables, create indexes, reset sequences,
     workers = 8, concurrency = 1,
     multiple readers per thread, rows per range = 50000

SET MySQL PARAMETERS
    net_read_timeout  = '120',
    net_write_timeout = '120'

-- max_wal_size replaces checkpoint_segments, which was removed in PostgreSQL 9.5
SET PostgreSQL PARAMETERS
    maintenance_work_mem to '1GB',
    max_wal_size to '4GB'

CAST type datetime to timestamptz
     drop default drop not null using zero-dates-to-null,
     type date drop not null drop default using zero-dates-to-null

ALTER SCHEMA 'source_database' RENAME TO 'public'

BEFORE LOAD DO
$$ CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; $$;

Running the Migration

Execute the migration using your configuration file:

pgloader migration.load

For Docker users:

docker run --rm -v $(pwd):/data dimitri/pgloader:latest pgloader /data/migration.load
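
Before a long run it is worth validating the command file and keeping a record of what happened; the log and summary paths below are arbitrary examples:

# Parse the command file and test both connections without loading any data
pgloader --dry-run migration.load

# Run with verbose logging and write a summary report for later review
pgloader --verbose --logfile ./migration.log --summary ./migration-summary.txt migration.load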

Advanced Configuration Options

For complex migrations you may need additional options, such as restricting the run to specific tables, renaming them on the way in, and applying column-level cast rules. Note that per-field options like fields terminated by apply only to file sources such as CSV, not to a MySQL source:

LOAD DATABASE
    FROM mysql://user:pass@mysql-host/source_db
    INTO postgresql://user:pass@pg-host/target_db

WITH include drop, create tables, create indexes, reset sequences,
     workers = 16, concurrency = 1,
     multiple readers per thread, rows per range = 100000,
     prefetch rows = 25000,
     batch rows = 10000, batch size = 20MB

SET MySQL PARAMETERS
    net_read_timeout = '300',
    net_write_timeout = '300'

-- Column-level transformations: cast rules are qualified as table.column
CAST column mysql_table.price to numeric drop typemod,
     column mysql_table.status to text,
     column mysql_table.created_date to timestamptz using zero-dates-to-null

-- Migrate only selected tables and rename them in PostgreSQL
INCLUDING ONLY TABLE NAMES MATCHING 'mysql_table'

ALTER TABLE NAMES MATCHING 'mysql_table' RENAME TO 'postgresql_table';

Real-World Migration Examples

E-commerce Platform Migration

Here’s a practical example of migrating an e-commerce database with complex relationships:

LOAD DATABASE
    FROM mysql://ecom_user:secure_pass@mysql.internal:3306/ecommerce_prod
    INTO postgresql://pg_user:pg_pass@postgres.internal:5432/ecommerce_new

WITH include drop, create tables, create indexes, reset sequences,
     workers = 12, concurrency = 1,
     multiple readers per thread, rows per range = 25000,
     batch rows = 10000, batch size = 20MB

-- Handle decimal prices (precision and scale, e.g. 10,2, are preserved),
-- tinyint(1) booleans, and MySQL zero dates
CAST type decimal to numeric,
     type tinyint when (= precision 1) to boolean drop typemod using tinyint-to-boolean,
     type datetime to timestamptz drop default drop not null using zero-dates-to-null

-- Skip scratch tables; column-level tweaks use CAST column rules (see above),
-- and batch sizing is set globally in the WITH clause
EXCLUDING TABLE NAMES MATCHING ~/^temp_/, ~/^backup_/

-- Pre-migration setup
BEFORE LOAD DO
$$ CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; $$,
$$ CREATE EXTENSION IF NOT EXISTS "pg_trgm"; $$

-- Post-migration tasks: build a full-text search column and index
-- (CONCURRENTLY is unnecessary here since the new database is not serving traffic yet)
AFTER LOAD DO
$$ ALTER TABLE products ADD COLUMN IF NOT EXISTS search_vector tsvector; $$,
$$ UPDATE products SET search_vector = to_tsvector('english', name || ' ' || description); $$,
$$ CREATE INDEX idx_products_search ON products USING gin(search_vector); $$;

Multi-Tenant Application Migration

For applications with multiple schemas or tenant separation:

-- migration_template.load: TENANT_NAME is a placeholder replaced per tenant
LOAD DATABASE
    FROM mysql://app_user:password@mysql-host/TENANT_NAME
    INTO postgresql://pg_user:password@pg-host/multi_tenant_app

WITH include drop, create tables, create indexes,
     workers = 8;

-- pgloader creates a target schema named after the MySQL database, so each
-- tenant lands in its own PostgreSQL schema (tenant_001, tenant_002, ...).
-- Drive the per-tenant runs with a shell loop:
--   for tenant in tenant_001 tenant_002 tenant_003; do
--     sed "s/TENANT_NAME/$tenant/g" migration_template.load > ${tenant}_migration.load
--     pgloader ${tenant}_migration.load
--   done

Performance Optimization and Tuning

Migration performance can vary significantly based on data volume, network latency, and hardware resources. Here are key optimization strategies:

Parameter                     Default   Recommended for Large DBs   Impact
workers                       4         8-16                        Parallel processing threads
rows per range                25000     50000-100000                Batch size for data transfer
prefetch rows                 10000     25000-50000                 Read-ahead buffer size
multiple readers per thread   false     true                        Concurrent reading capability

PostgreSQL-side tuning also matters. Session-level settings can go straight into the pgloader command file, while server-wide settings such as shared_buffers and wal_buffers must be configured in postgresql.conf (and the server restarted) before the migration:

SET PostgreSQL PARAMETERS
    maintenance_work_mem to '2GB',
    effective_cache_size to '12GB'

-- In postgresql.conf before the run (shared_buffers and wal_buffers need a restart):
--   shared_buffers = 4GB
--   wal_buffers = 16MB
--   max_wal_size = 8GB    -- modern replacement for checkpoint_segments

Performance monitoring during migration:

# Monitor pgloader progress from the shell
tail -f /tmp/pgloader.log

From a psql session on the target database:

-- Check PostgreSQL activity
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE datname = 'target_database';

-- Monitor table sizes during migration
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
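
On PostgreSQL 14 or newer you can also watch the COPY streams pgloader opens, which gives per-table progress in real time:

-- Requires PostgreSQL 14+: progress of the currently running COPY commands
SELECT relid::regclass AS table_name,
       command,
       tuples_processed,
       bytes_processed
FROM pg_stat_progress_copy;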

Common Issues and Troubleshooting

Data Type Conversion Problems

MySQL and PostgreSQL handle certain data types differently. Common issues include:

Zero dates and timestamps:

-- MySQL allows '0000-00-00' dates; PostgreSQL doesn't
CAST type datetime to timestamptz
     drop default drop not null using zero-dates-to-null,
     type date drop not null drop default using zero-dates-to-null

-- Mapping zero dates to a sentinel value instead would require a custom
-- transformation function (pgloader transforms are Common Lisp functions,
-- loaded with --load-lisp-file); the built-in option is zero-dates-to-null.

Boolean values:

-- Convert MySQL tinyint(1) columns to PostgreSQL boolean
CAST type tinyint when (= precision 1) to boolean drop typemod using tinyint-to-boolean

Text encoding issues:

-- Handle character set conversion
SET MySQL PARAMETERS
    character_set_client = 'utf8mb4',
    character_set_connection = 'utf8mb4'

Memory and Performance Issues

Large tables can cause memory issues. Solutions include:

-- Reduce batch sizes for memory-constrained environments
WITH workers = 4, concurrency = 1,
     rows per range = 10000,
     prefetch rows = 5000

-- Process large tables separately
EXCLUDING TABLE NAMES MATCHING ~/^large_table_/

-- Then process large tables individually with custom settings
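
A second pass can then pick up just the big tables with smaller, memory-friendly settings (table pattern and values are illustrative):

LOAD DATABASE
    FROM mysql://user:pass@mysql-host/source_db
    INTO postgresql://user:pass@pg-host/target_db

WITH create tables, create indexes, reset sequences,
     workers = 2, concurrency = 1,
     prefetch rows = 2500

INCLUDING ONLY TABLE NAMES MATCHING ~/^large_table_/;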

Connection and Network Problems

Timeout issues are common with large migrations:

SET MySQL PARAMETERS
    net_read_timeout = '600',
    net_write_timeout = '600',
    interactive_timeout = '28800',
    wait_timeout = '28800'

SET PostgreSQL PARAMETERS
    statement_timeout to '0',
    lock_timeout to '0'

Foreign Key and Constraint Issues

Handle referential integrity carefully. By default pgloader creates foreign keys only after the data has been copied, so they do not interfere with the load itself. If you load into a pre-existing schema that keeps its constraints and triggers, use pgloader's built-in option rather than fiddling with session settings:

WITH create no tables, include no drop, truncate, disable triggers

After the load, validate every foreign key constraint from a psql session:

-- Validate all foreign key constraints
DO $$
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT constraint_name, table_name
             FROM information_schema.table_constraints
             WHERE constraint_type = 'FOREIGN KEY'
    LOOP
        EXECUTE 'ALTER TABLE ' || quote_ident(r.table_name) ||
                ' VALIDATE CONSTRAINT ' || quote_ident(r.constraint_name);
    END LOOP;
END
$$;
Comparing pgloader with Alternative Migration Tools

pgloader
  Pros: Automatic schema conversion, parallel processing, built-in error handling
  Cons: Limited customization for complex transformations
  Best use case: Straightforward MySQL to PostgreSQL migrations

AWS DMS
  Pros: Cloud-native, continuous replication, minimal downtime
  Cons: AWS ecosystem lock-in, complex pricing
  Best use case: Cloud migrations with ongoing replication needs

mysqldump + custom scripts
  Pros: Full control, free, works everywhere
  Cons: Manual schema conversion, time-consuming, error-prone
  Best use case: Small databases or highly customized migrations

Pentaho/Talend
  Pros: GUI interface, complex transformations, enterprise features
  Cons: Steep learning curve, licensing costs
  Best use case: Enterprise environments with complex data transformations

Best Practices and Production Considerations

Pre-Migration Checklist

  • Perform a complete backup of source MySQL database
  • Test the migration on a subset of data first (a sketch follows this list)
  • Document all custom functions, procedures, and triggers
  • Identify and plan for application-specific SQL query changes
  • Set up monitoring for both source and target databases
  • Plan for downtime and coordinate with stakeholders
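
For the subset rehearsal, one option is to point a throwaway target database at a handful of representative tables (database name and table patterns are examples):

LOAD DATABASE
    FROM mysql://user:pass@mysql-host/source_db
    INTO postgresql://user:pass@pg-host/migration_rehearsal

WITH include drop, create tables, create indexes, reset sequences

INCLUDING ONLY TABLE NAMES MATCHING ~/^users/, ~/^orders/;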

Security Considerations

Never expose database credentials in plain text:

# Use environment variables instead of hard-coding credentials in command files
export MYSQL_URL="mysql://user:pass@host/db"
export PG_URL="postgresql://user:pass@host/db"

# Let the shell expand them on the pgloader command line
pgloader "$MYSQL_URL" "$PG_URL"

Or use PostgreSQL’s password file:

# Create ~/.pgpass file
hostname:port:database:username:password

# Set restrictive permissions
chmod 600 ~/.pgpass

Post-Migration Validation

Always validate your migration results:

-- Compare row counts
SELECT 'users' as table_name, COUNT(*) as row_count FROM users
UNION ALL
SELECT 'orders' as table_name, COUNT(*) as row_count FROM orders;

-- Validate data integrity
SELECT COUNT(*) as orphaned_orders 
FROM orders o 
LEFT JOIN users u ON o.user_id = u.id 
WHERE u.id IS NULL;

-- Check for null values in required fields
SELECT COUNT(*) as null_emails 
FROM users 
WHERE email IS NULL;

-- Verify indexes are created
SELECT schemaname, tablename, indexname, indexdef 
FROM pg_indexes 
WHERE schemaname = 'public' 
ORDER BY tablename, indexname;
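
Row counts can also be checked against the MySQL source directly from the shell; a small sketch with placeholder hosts, credentials, and table names (the PostgreSQL password is assumed to come from ~/.pgpass):

# Compare row counts between source and target for a few tables
for t in users orders products; do
  m=$(mysql -N -h mysql-host -u mysql_user -p'mysql_pass' source_db -e "SELECT COUNT(*) FROM $t")
  p=$(psql -At -h pg-host -U pg_user -d target_db -c "SELECT COUNT(*) FROM $t")
  echo "$t: mysql=$m postgres=$p"
done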

Application Migration Considerations

Remember that migrating data is only part of the process. Applications may need updates for:

  • SQL syntax differences, e.g. MySQL’s LIMIT offset, count versus PostgreSQL’s LIMIT count OFFSET offset (see the examples after this list)
  • Date/time handling differences
  • Case sensitivity in column names
  • Boolean value representation
  • Auto-increment vs sequences

For more detailed information about pgloader’s capabilities and advanced configuration options, refer to the official pgloader documentation. The PostgreSQL wiki also provides excellent resources on database migration best practices.

Migration projects can be complex, but pgloader significantly simplifies the process of moving from MySQL to PostgreSQL. With proper planning, testing, and the techniques outlined in this guide, you can achieve a successful migration with minimal downtime and data integrity issues. Remember to always test thoroughly in a staging environment before proceeding with production migrations.



