How to Share Data Between Docker Containers

Sharing data between Docker containers is one of those fundamental skills that separates Docker beginners from intermediate users. Whether you’re building a microservices architecture, setting up a development environment, or deploying a multi-container application, you’ll inevitably need containers to communicate and share information. This post covers the main approaches for data sharing between containers, from Docker volumes and bind mounts to networking solutions, complete with practical examples and troubleshooting tips that’ll save you hours of debugging.

How Docker Container Data Sharing Works

Docker containers are designed to be isolated by default, which is great for security and consistency but creates challenges when you need containers to share data. Docker provides several mechanisms to break this isolation in controlled ways:

  • Volumes: Docker-managed storage that persists outside container lifecycles
  • Bind mounts: Direct mapping of host filesystem paths into containers
  • tmpfs mounts: In-memory storage for temporary data
  • Named pipes and sockets: For process communication (see the socket sketch below)
  • Network-based sharing: Using databases, message queues, or APIs
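
Most of these get detailed treatment below; Unix sockets are the exception, but the pattern is simple: put the socket file on a volume that both containers mount. A minimal sketch, where myserver:latest and myclient:latest are hypothetical images standing in for any two processes that speak over /sock/app.sock:

# Create a volume to hold the socket file
docker volume create sock-vol

# The server creates /sock/app.sock; the client connects to it
docker run -d --name server -v sock-vol:/sock myserver:latest
docker run -d --name client -v sock-vol:/sock myclient:latest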

The key difference between these approaches lies in where the data lives and who manages it. Volumes are managed by Docker and stored in /var/lib/docker/volumes/ on Linux systems, while bind mounts link directly to host filesystem paths. This affects portability, performance, and management complexity.
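
You can see exactly where a volume lives with docker volume inspect (my-volume below is a placeholder for any existing volume). On a typical Linux host the Mountpoint field looks roughly like this, though exact output varies by Docker version:

# Inspect a volume's backing location on the host
docker volume inspect my-volume
# ...
#   "Mountpoint": "/var/lib/docker/volumes/my-volume/_data",
# ...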

Docker Volumes: The Recommended Approach

Docker volumes are the most flexible and widely used method for sharing data between containers. They're managed entirely by Docker, making them portable across different host systems and easier to back up or migrate.

Creating and Using Named Volumes

Here’s how to create a named volume and share it between multiple containers:

# Create a named volume
docker volume create shared-data

# Run first container with the volume
docker run -d --name app1 -v shared-data:/app/data nginx:alpine

# Run second container sharing the same volume (a long-running command keeps it alive)
docker run -d --name app2 -v shared-data:/var/www/html ubuntu:20.04 sleep infinity

# List volumes to verify
docker volume ls

You can also create volumes on-the-fly when running containers:

# Volume gets created automatically if it doesn't exist
docker run -d --name database -v postgres-data:/var/lib/postgresql/data postgres:13

# Share the same volume with a backup container
docker run --rm -v postgres-data:/data -v $(pwd):/backup ubuntu tar czf /backup/postgres-backup.tar.gz /data
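
Restoring is the same trick in reverse. A minimal sketch, assuming the archive was created with the command above (tar strips the leading slash when archiving, so extracting at / puts the files back under /data):

# Restore the backup into the volume
docker run --rm \
  -v postgres-data:/data \
  -v $(pwd):/backup \
  ubuntu tar xzf /backup/postgres-backup.tar.gz -C /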

Docker Compose Volume Sharing

Docker Compose makes volume management much cleaner for multi-container applications:

version: '3.8'
services:
  web:
    image: nginx:alpine
    volumes:
      - shared-content:/usr/share/nginx/html
      - app-logs:/var/log/nginx
  
  app:
    image: node:16-alpine
    volumes:
      - shared-content:/app/public
      - app-logs:/app/logs
    working_dir: /app
    command: npm start
  
  log-analyzer:
    image: fluent/fluent-bit
    volumes:
      - app-logs:/fluent-bit/logs:ro  # Read-only access

volumes:
  shared-content:
  app-logs:

Bind Mounts for Development and Host Integration

Bind mounts directly map host filesystem paths into containers, making them perfect for development workflows where you want live code reloading or need to access host-specific resources.

# Basic bind mount syntax
docker run -d --name dev-server -v /host/path:/container/path node:16-alpine

# Real-world development example: mount the project, mask node_modules with a named volume
docker run -d \
  --name react-dev \
  -p 3000:3000 \
  -w /app \
  -v $(pwd):/app \
  -v node_modules:/app/node_modules \
  node:16-alpine npm start

# Multiple containers sharing host directory
docker run -d --name web-server -v /var/www:/usr/share/nginx/html nginx:alpine
docker run -d --name file-processor -v /var/www:/app/input python:3.9-alpine tail -f /dev/null  # keep the container running

Bind Mount Permissions and Security

One of the biggest gotchas with bind mounts is file permissions. Here’s how to handle common scenarios:

# Run container with specific user ID to match host permissions
docker run -d \
  --name app \
  --user $(id -u):$(id -g) \
  -v $(pwd)/data:/app/data \
  ubuntu:20.04

# On SELinux hosts (Fedora, RHEL, etc.), add :Z so Docker relabels the mount for the container
docker run -d \
  --name app \
  --user 1000:1000 \
  -v $(pwd)/data:/app/data:Z \
  ubuntu:20.04

Advanced Data Sharing Patterns

Init Containers for Data Preparation

Sometimes you need to prepare or populate data before your main containers start:

version: '3.8'
services:
  data-init:
    image: busybox
    volumes:
      - app-data:/data
    command: >
      sh -c "
        echo 'Initializing data...' &&
        mkdir -p /data/config /data/logs &&
        echo 'app_version=1.0' > /data/config/app.conf &&
        echo 'Data initialization complete'
      "
  
  web-app:
    image: nginx:alpine
    depends_on:
      data-init:
        condition: service_completed_successfully  # wait for init to finish (Compose v2)
    volumes:
      - app-data:/app/data
    ports:
      - "80:80"

volumes:
  app-data:

Sidecar Pattern for Shared Resources

The sidecar pattern is excellent for sharing processed data or providing shared services:

version: '3.8'
services:
  log-collector:
    image: fluent/fluent-bit
    volumes:
      - log-data:/logs
    command: ["/fluent-bit/bin/fluent-bit", "--config=/fluent-bit/etc/fluent-bit.conf"]
  
  main-app:
    image: myapp:latest
    volumes:
      - log-data:/app/logs
    depends_on:
      - log-collector
  
  log-analyzer:
    image: elasticsearch:7.14.0
    volumes:
      - log-data:/logs:ro  # Read the shared logs; Elasticsearch keeps its own data internally
    environment:
      - discovery.type=single-node

volumes:
  log-data:

Network-Based Data Sharing

For more complex scenarios, network-based sharing often provides better scalability and flexibility than filesystem-based approaches.

Database Sharing Pattern

version: '3.8'
services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_DB: shared_db
      POSTGRES_USER: app_user
      POSTGRES_PASSWORD: secure_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - app-network
  
  api-service:
    image: myapi:latest
    environment:
      DATABASE_URL: postgresql://app_user:secure_password@postgres:5432/shared_db
    networks:
      - app-network
    depends_on:
      - postgres
  
  worker-service:
    image: myworker:latest
    environment:
      DATABASE_URL: postgresql://app_user:secure_password@postgres:5432/shared_db
    networks:
      - app-network
    depends_on:
      - postgres

volumes:
  postgres_data:

networks:
  app-network:
    driver: bridge
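
Both services reach the database simply as postgres because Compose registers every container in the network's built-in DNS. A quick way to verify name resolution from inside a running service (this assumes the image ships a shell and getent, which the placeholder myapi:latest may not):

# Resolve the postgres service name from inside api-service
docker compose exec api-service getent hosts postgres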

Redis for High-Performance Data Sharing

# Redis as shared cache and message broker
version: '3.8'
services:
  redis:
    image: redis:6-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    networks:
      - shared-network
  
  producer:
    image: python:3.9-alpine
    command:
      - sh
      - -c
      - |
        pip install redis &&
        python - <<'EOF'
        import redis, time, json
        r = redis.Redis(host='redis', port=6379, decode_responses=True)
        # Push a task onto the shared queue every five seconds
        while True:
            data = {'timestamp': time.time(), 'message': 'Hello from producer'}
            r.lpush('task_queue', json.dumps(data))
            time.sleep(5)
        EOF
    networks:
      - shared-network
    depends_on:
      - redis
  
  consumer:
    image: python:3.9-alpine
    command:
      - sh
      - -c
      - |
        pip install redis &&
        python - <<'EOF'
        import redis, json
        r = redis.Redis(host='redis', port=6379, decode_responses=True)
        # Block until a task arrives, then process it
        while True:
            task = r.brpop('task_queue', timeout=10)
            if task:
                data = json.loads(task[1])
                print(f'Processed: {data}', flush=True)
        EOF
    networks:
      - shared-network
    depends_on:
      - redis

volumes:
  redis_data:

networks:
  shared-network:
    driver: bridge
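
To see the queue in action, bring the stack up and follow the consumer's output:

# Start everything and watch tasks being processed
docker compose up -d
docker compose logs -f consumer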

Performance Comparison and Best Practices

Different data sharing methods have varying performance characteristics depending on your use case:

Method            Read Performance  Write Performance  Portability  Management Complexity  Best Use Case
Docker Volumes    High              High               High         Low                    Production persistent data
Bind Mounts       Highest           Highest            Low          Medium                 Development, host integration
tmpfs Mounts      Highest           Highest            High         Low                    Temporary data, caches
Network (DB)      Medium            Medium             Highest      High                   Structured data, transactions
Network (Redis)   High              High               High         Medium                 Caching, pub/sub, sessions
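
tmpfs mounts top the table for raw speed, but note the trade-off: they are per-container and live in memory, so the data disappears when the container stops and cannot be shared with other containers. A minimal sketch:

# Mount an in-memory filesystem at /app/cache, capped at 256 MB
docker run -d \
  --name cache-demo \
  --tmpfs /app/cache:rw,size=256m \
  nginx:alpine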

Volume Performance Optimization

For high-performance applications, consider these volume optimization techniques:

# Use local driver with specific options for better performance
docker volume create \
  --driver local \
  --opt type=tmpfs \
  --opt device=tmpfs \
  --opt o=size=1000m,uid=1000 \
  fast-cache

# For production databases, use local SSDs
docker volume create \
  --driver local \
  --opt type=none \
  --opt o=bind \
  --opt device=/fast-ssd/postgres \
  postgres-data

# Example usage with performance considerations
docker run -d \
  --name high-perf-db \
  -v postgres-data:/var/lib/postgresql/data \
  --shm-size=256m \
  postgres:13

Common Pitfalls and Troubleshooting

Permission Issues

The most common problem when sharing data between containers is file permission mismatches:

# Debug permission issues
docker run --rm -v shared-data:/data alpine ls -la /data

# Fix ownership in a utility container
docker run --rm -v shared-data:/data alpine chown -R 1000:1000 /data

# Run containers with consistent user IDs
docker run -d --name app1 --user 1000:1000 -v shared-data:/app/data myapp:latest
docker run -d --name app2 --user 1000:1000 -v shared-data:/app/data myapp:latest

Volume Cleanup and Management

Docker volumes can accumulate over time, consuming disk space:

# List all volumes with size information
docker system df -v

# Remove unused volumes
docker volume prune

# Remove specific volume (be careful!)
docker volume rm volume-name

# Backup volume data
docker run --rm \
  -v volume-name:/data \
  -v $(pwd):/backup \
  ubuntu tar czf /backup/volume-backup.tar.gz /data

Container Startup Order Issues

When containers depend on shared data, startup order matters:

version: '3.8'
services:
  data-container:
    image: busybox
    volumes:
      - shared-data:/data
    command: >
      sh -c "
        echo 'Setting up shared data...' &&
        touch /data/ready &&
        echo 'Data setup complete'
      "
  
  app-container:
    image: myapp:latest
    volumes:
      - shared-data:/app/data
    depends_on:
      data-container:
        condition: service_completed_successfully  # wait for setup to complete (Compose v2)
    healthcheck:
      test: ["CMD", "test", "-f", "/app/data/ready"]
      interval: 30s
      timeout: 10s
      retries: 3

volumes:
  shared-data:

Real-World Use Cases and Examples

Multi-Stage Data Processing Pipeline

Here’s a practical example of a data processing pipeline where containers share data at different stages:

version: '3.8'
services:
  data-ingestion:
    image: python:3.9-alpine
    volumes:
      - raw-data:/app/input
      - processed-data:/app/output
    command:
      - sh
      - -c
      - |
        pip install pandas &&
        python - <<'EOF'
        import pandas as pd
        import time
        # Simulate data ingestion: append one row per minute
        while True:
            df = pd.DataFrame({'timestamp': [time.time()], 'value': [42]})
            df.to_csv('/app/output/data.csv', mode='a', header=False, index=False)
            time.sleep(60)
        EOF
  
  data-processor:
    image: python:3.9-alpine
    volumes:
      - processed-data:/app/input
      - analytics-data:/app/output
    command:
      - sh
      - -c
      - |
        pip install pandas &&
        python - <<'EOF'
        import pandas as pd, time
        while True:
            try:
                df = pd.read_csv('/app/input/data.csv', header=None, names=['timestamp', 'value'])
                # Process data: derive a new column from the raw values
                df['processed'] = df['value'] * 2
                df.to_json('/app/output/processed.json')
            except (FileNotFoundError, pd.errors.EmptyDataError):
                pass  # Ingestion hasn't produced usable data yet
            time.sleep(30)
        EOF
    depends_on:
      - data-ingestion
  
  web-dashboard:
    image: nginx:alpine
    volumes:
      - analytics-data:/usr/share/nginx/html:ro
    ports:
      - "8080:80"
    depends_on:
      - data-processor

volumes:
  raw-data:
  processed-data:
  analytics-data:

Development Environment with Live Reload

For development workflows, bind mounts provide the best experience:

version: '3.8'
services:
  frontend:
    image: node:16-alpine
    working_dir: /app
    volumes:
      - ./frontend:/app
      - frontend_modules:/app/node_modules
    ports:
      - "3000:3000"
    command: npm run dev
    environment:
      - CHOKIDAR_USEPOLLING=true
  
  backend:
    image: python:3.9-alpine
    working_dir: /app
    volumes:
      - ./backend:/app
      - shared-uploads:/app/uploads
    ports:
      - "8000:8000"
    command: >
      sh -c "
        pip install flask flask-cors watchdog &&
        python app.py
      "
    environment:
      - FLASK_ENV=development
  
  file-processor:
    image: python:3.9-alpine
    volumes:
      - shared-uploads:/app/input
      - ./backend/processed:/app/output
    command:
      - sh
      - -c
      - |
        pip install pillow &&
        python - <<'EOF'
        import time, os
        from PIL import Image
        # Watch the shared uploads volume and thumbnail any JPEGs that appear
        while True:
            for f in os.listdir('/app/input'):
                if f.endswith('.jpg'):
                    img = Image.open(f'/app/input/{f}')
                    img.thumbnail((200, 200))
                    img.save(f'/app/output/thumb_{f}')
            time.sleep(5)
        EOF

volumes:
  frontend_modules:
  shared-uploads:

Security Considerations

When sharing data between containers, security should be a primary concern:

  • Use read-only mounts when containers only need read access
  • Limit volume scope to only the necessary directories
  • Avoid sharing sensitive host paths like /var/run/docker.sock
  • Use Docker secrets for sensitive configuration data (a sketch follows the example below)
  • Implement proper user mapping to prevent privilege escalation

# Example of security-conscious volume sharing
version: '3.8'
services:
  web-app:
    image: nginx:alpine
    user: "nginx"
    volumes:
      - web-content:/usr/share/nginx/html:ro  # Read-only
      - web-logs:/var/log/nginx:rw            # Read-write only for logs
    ports:
      - "80:80"
    read_only: true
    tmpfs:
      - /tmp
      - /var/run          # nginx needs a writable location for its pid file
      - /var/cache/nginx
  
  content-updater:
    image: alpine:latest
    user: "1000:1000"
    volumes:
      - web-content:/app/content:rw
      - ./scripts:/app/scripts:ro
    command: /app/scripts/update-content.sh

volumes:
  web-content:
  web-logs:
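
The Docker secrets bullet above deserves a quick sketch of its own. With Compose, a file-backed secret is mounted read-only at /run/secrets/<name> inside the container; here ./secrets/db_password.txt and myapi:latest are placeholders:

version: '3.8'
services:
  api:
    image: myapi:latest
    secrets:
      - db_password   # Appears in the container at /run/secrets/db_password

secrets:
  db_password:
    file: ./secrets/db_password.txt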

For more advanced Docker networking and security best practices, check out the official Docker documentation at docs.docker.com/storage/ and the Docker security guide at docs.docker.com/engine/security/. These resources provide comprehensive coverage of Docker’s storage drivers, security models, and production deployment considerations that complement the data sharing techniques covered in this post.


