How to Use wget to Download Files and Interact with REST APIs

wget is a powerful command-line tool that serves as a Swiss Army knife for downloading files and interacting with web services from Unix-like systems. Whether you're a sysadmin automating server maintenance, a developer testing REST APIs, or someone who just needs to grab files from remote servers efficiently, wget provides robust functionality that goes far beyond simple downloads. This guide walks through wget's core capabilities, from basic file retrieval to advanced REST API interactions, complete with real-world examples and troubleshooting tips that'll save you hours of debugging.

How wget Works Under the Hood

wget operates as an HTTP/HTTPS client that also handles FTP and FTPS. Unlike curl, which is designed for data transfer with maximum flexibility, wget specializes in recursive downloads and robust file retrieval with automatic retry mechanisms. It can reuse persistent HTTP connections when fetching multiple documents, handles cookies, supports several authentication methods, and can resume interrupted downloads.

The tool sends standard HTTP requests with customizable headers, user agents, and request methods. For REST API interactions, wget constructs requests with proper HTTP verbs (GET, POST, PUT, DELETE) and can handle JSON payloads, authentication tokens, and response processing.
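If you want to see exactly what wget puts on the wire, point it at a local listener. Here's a quick sketch, assuming netcat is available and port 8080 is free (on some systems the listen syntax is nc -l -p 8080):

# Terminal 1: listen on port 8080 and print the incoming request
nc -l 8080

# Terminal 2: send a request with custom headers to the listener
wget --header="X-Demo: 1" \
     --user-agent="MyAgent/1.0" \
     --timeout=5 \
     -O /dev/null \
     http://localhost:8080/test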

| Feature              | wget                   | curl                        | HTTP           |
|----------------------|------------------------|-----------------------------|----------------|
| Recursive Downloads  | Native support         | Manual scripting required   | Not applicable |
| Resume Downloads     | Automatic (-c flag)    | Manual range headers        | Range requests |
| Background Operation | Built-in (-b flag)     | External process management | Not applicable |
| Rate Limiting        | Native (--limit-rate)  | External tools needed       | Not applicable |

Step-by-Step Implementation Guide

Basic File Downloads

Start with simple file downloads to understand wget’s basic syntax:

# Download a single file
wget https://example.com/file.zip

# Download with custom filename
wget -O myfile.zip https://example.com/file.zip

# Resume interrupted download
wget -c https://example.com/largefile.iso

# Download in background with logging
wget -b -o download.log https://example.com/file.zip

# Rate-limited download (useful for production servers)
wget --limit-rate=200k https://example.com/file.zip
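wget can also chew through a list of targets: the -i flag reads URLs from a file, one per line, and combines with any of the options above:

# Download every URL listed in urls.txt (one per line), resuming as needed
wget -c --limit-rate=200k -i urls.txt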

REST API Interactions

wget can handle most REST API operations, though the syntax differs from curl:

# GET request with headers
wget --header="Authorization: Bearer your-token-here" \
     --header="Content-Type: application/json" \
     -O response.json \
     https://api.example.com/users

# POST request with JSON data
wget --header="Content-Type: application/json" \
     --post-data='{"name":"John","email":"john@example.com"}' \
     -O response.json \
     https://api.example.com/users

# PUT request (requires wget 1.15+)
wget --method=PUT \
     --header="Content-Type: application/json" \
     --body-data='{"id":123,"name":"Updated Name"}' \
     -O response.json \
     https://api.example.com/users/123

# DELETE request
wget --method=DELETE \
     --header="Authorization: Bearer your-token" \
     https://api.example.com/users/123

Advanced Authentication

Handle various authentication methods commonly used in APIs:

# Basic HTTP authentication
wget --user=username --password=password https://api.example.com/data

# Using .netrc file for credentials (safer)
echo "machine api.example.com login username password secret" >> ~/.netrc
chmod 600 ~/.netrc
wget https://api.example.com/data

# Client certificate authentication
wget --certificate=client.crt --private-key=client.key https://api.example.com/secure

# Custom authentication headers
wget --header="X-API-Key: your-api-key" \
     --header="X-Client-ID: your-client-id" \
     https://api.example.com/data
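Whichever method you pick, keep secrets out of the script itself. A minimal sketch, assuming the token has been exported as API_TOKEN in the environment:

# Fail fast if API_TOKEN is unset instead of sending an empty header
wget --header="Authorization: Bearer ${API_TOKEN:?API_TOKEN is not set}" \
     -O response.json \
     https://api.example.com/data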

Real-World Examples and Use Cases

Automated Backup Downloads

Create a robust backup download script for your VPS or dedicated server:

#!/bin/bash
# backup-downloader.sh

BACKUP_URL="https://backups.example.com"
API_KEY="your-api-key"
BACKUP_DIR="/var/backups/remote"

# Get list of available backups
wget --header="Authorization: Bearer $API_KEY" \
     -O backup-list.json \
     "$BACKUP_URL/api/backups"

# Parse JSON and download latest backup
LATEST_BACKUP=$(jq -r '.backups[0].download_url' backup-list.json)

wget --header="Authorization: Bearer $API_KEY" \
     --progress=bar:force \
     --limit-rate=1m \
     -P "$BACKUP_DIR" \
     "$LATEST_BACKUP"

# Verify download integrity
if [ $? -eq 0 ]; then
    echo "Backup downloaded successfully"
    # Add checksum verification
    wget --header="Authorization: Bearer $API_KEY" \
         -O backup.sha256 \
         "$BACKUP_URL/api/backups/latest/checksum"
else
    echo "Download failed, check logs"
    exit 1
fi
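The script fetches backup.sha256 but stops short of actually checking it. A hedged completion, assuming the API returns a standard sha256sum-style line (hash, two spaces, filename) that matches the file saved in $BACKUP_DIR:

# Verify the archive against the published checksum
# (assumes backup.sha256 contains "HASH  FILENAME" matching the downloaded file)
mv backup.sha256 "$BACKUP_DIR/"
cd "$BACKUP_DIR" || exit 1
if sha256sum -c backup.sha256; then
    echo "Checksum verified"
else
    echo "Checksum mismatch, discarding download"
    exit 1
fi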

API Health Monitoring

Monitor REST API endpoints and log response times:

#!/bin/bash
# api-monitor.sh

ENDPOINTS=(
    "https://api.example.com/health"
    "https://api.example.com/status"
    "https://api.example.com/metrics"
)

for endpoint in "${ENDPOINTS[@]}"; do
    start_time=$(date +%s.%N)
    
    wget --quiet \
         --timeout=10 \
         --tries=3 \
         --header="User-Agent: HealthCheck/1.0" \
         -O /tmp/health_check.json \
         "$endpoint"
    
    exit_code=$?
    end_time=$(date +%s.%N)
    response_time=$(echo "$end_time - $start_time" | bc)
    
    if [ $exit_code -eq 0 ]; then
        echo "$(date): $endpoint - OK (${response_time}s)"
    else
        echo "$(date): $endpoint - FAILED (${response_time}s)"
    fi
done
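To run the monitor on a schedule, a crontab entry like the following works (the script path and log location are assumptions for your setup):

# Check every 5 minutes and append results to a log
*/5 * * * * /usr/local/bin/api-monitor.sh >> /var/log/api-monitor.log 2>&1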

Bulk File Processing

Download and process multiple files from a REST API:

#!/bin/bash
# bulk-processor.sh

API_BASE="https://api.example.com"
TOKEN="your-jwt-token"

# Get list of files to process
wget --header="Authorization: Bearer $TOKEN" \
     -O file-list.json \
     "$API_BASE/files?status=pending"

# Process each file
jq -r '.files[].id' file-list.json | while read file_id; do
    echo "Processing file ID: $file_id"
    
    # Download file
    wget --header="Authorization: Bearer $TOKEN" \
         -O "file_${file_id}.dat" \
         "$API_BASE/files/$file_id/download"
    
    # Process file (example: compress)
    gzip "file_${file_id}.dat"
    
    # Update status via API (discard the response body so it doesn't
    # land in a stray file named "status" in the working directory)
    wget --header="Authorization: Bearer $TOKEN" \
         --header="Content-Type: application/json" \
         --method=PUT \
         --body-data='{"status":"processed"}' \
         -O /dev/null \
         "$API_BASE/files/$file_id/status"
    
    echo "Completed file ID: $file_id"
done

Performance Optimization and Best Practices

Connection Management

Optimize wget for better performance in production environments:

# Increase timeout for slow servers
wget --timeout=30 --dns-timeout=10 --connect-timeout=10 https://slow-api.example.com/data

# Use HTTP/1.1 keep-alive
wget --header="Connection: keep-alive" https://api.example.com/endpoint

# Parallel downloads (be careful with rate limits)
# Note: don't combine -b/--background with `&`; wget would fork itself
# out of the shell's job table and `wait` would return immediately
wget --output-file=download1.log \
     https://api.example.com/file1 &
wget --output-file=download2.log \
     https://api.example.com/file2 &

# Wait for all background jobs
wait
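For more than a handful of URLs, xargs gives you bounded parallelism without hand-managing shell jobs. A sketch, with the -P concurrency value as a tuning assumption:

# Run at most 2 wget processes at a time across a list of URLs
printf '%s\n' \
    https://api.example.com/file1 \
    https://api.example.com/file2 \
    https://api.example.com/file3 |
xargs -n1 -P2 wget -q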

Error Handling and Retry Logic

# Robust retry configuration
wget --tries=5 \
     --retry-connrefused \
     --waitretry=10 \
     --timeout=30 \
     https://unreliable-api.example.com/data

# Custom retry script with exponential backoff
#!/bin/bash
retry_download() {
    local url=$1
    local max_attempts=5
    local delay=1
    
    for ((i=1; i<=max_attempts; i++)); do
        echo "Attempt $i/$max_attempts"
        
        if wget --timeout=30 "$url"; then
            echo "Download successful"
            return 0
        fi
        
        if [ $i -lt $max_attempts ]; then
            echo "Failed, retrying in ${delay}s"
            sleep $delay
            delay=$((delay * 2))  # Exponential backoff
        fi
    done
    
    echo "Download failed after $max_attempts attempts"
    return 1
}

retry_download "https://api.example.com/large-file.zip"

Common Pitfalls and Troubleshooting

SSL/TLS Issues

Handle certificate problems that frequently occur in development environments:

# Skip certificate verification (development only)
wget --no-check-certificate https://self-signed.example.com/api

# Use custom CA bundle
wget --ca-certificate=/path/to/custom-ca.pem https://internal-api.company.com

# Debug SSL handshake
wget --debug --verbose https://problematic-ssl.example.com 2>&1 | grep -i ssl

HTTP Response Handling

Process different HTTP status codes appropriately:

#!/bin/bash
# smart-wget.sh

url="https://api.example.com/data"
response_file="response.json"
headers_file="headers.txt"

# Download with server response headers
wget --server-response \
     --output-document="$response_file" \
     "$url" 2>&1 | tee "$headers_file"

# Extract HTTP status code
status_code=$(grep "HTTP/" "$headers_file" | tail -1 | awk '{print $2}')

case $status_code in
    200)
        echo "Success: Data downloaded"
        ;;
    401)
        echo "Error: Authentication required"
        exit 1
        ;;
    429)
        echo "Error: Rate limited, waiting..."
        sleep 60
        exec "$0"  # Retry script
        ;;
    5*)
        echo "Server error: $status_code"
        exit 1
        ;;
    *)
        echo "Unexpected status: $status_code"
        cat "$headers_file"
        ;;
esac
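One related gotcha: by default wget discards the response body when the server returns an error status, which is unhelpful for APIs that put the interesting detail in a JSON error payload. GNU wget 1.14+ can keep it:

# Save the response body even on 4xx/5xx so API error details aren't lost
wget --content-on-error -O "$response_file" "$url"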

Memory and Disk Space Management

# Stream large responses without saving to disk
wget -O - https://api.example.com/large-dataset.json | jq '.data[] | select(.status=="active")'

# Check available space before download
check_space_and_download() {
    local url=$1
    local required_space_mb=$2
    local available_space=$(df -Pk . | awk 'NR==2{print int($4/1024)}')
    
    if [ $available_space -lt $required_space_mb ]; then
        echo "Insufficient space: ${available_space}MB available, ${required_space_mb}MB required"
        exit 1
    fi
    
    wget --progress=dot:giga "$url"
}

check_space_and_download "https://example.com/big-file.zip" 1000

Integration with Modern Development Workflows

Docker Container Usage

Use wget in containerized environments:

# Dockerfile example
FROM alpine:latest
RUN apk add --no-cache wget ca-certificates jq

WORKDIR /app
COPY download-script.sh .
RUN chmod +x download-script.sh

# Health check using wget
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
    CMD wget --quiet --tries=1 --spider http://localhost:8080/health || exit 1

CMD ["./download-script.sh"]
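Building and running the image is the usual two-step (the image name is a placeholder):

# Build the image and run the download script inside it
docker build -t wget-downloader .
docker run --rm wget-downloader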

CI/CD Pipeline Integration

# GitHub Actions example
name: Download and Process Data
on:
  schedule:
    - cron: '0 2 * * *'  # Daily at 2 AM

jobs:
  download:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      
      - name: Download API data
        run: |
          wget --header="Authorization: Bearer ${{ secrets.API_TOKEN }}" \
               --output-document=data.json \
               https://api.example.com/daily-export
      
      - name: Validate download
        run: |
          if [ ! -s data.json ]; then
            echo "Download failed or empty file"
            exit 1
          fi
          
          # Validate JSON structure
          jq empty data.json
      
      - name: Upload to storage
        run: |
          aws s3 cp data.json s3://your-bucket/daily-exports/$(date +%Y-%m-%d).json

wget remains an essential tool for system administrators and developers working with file downloads and API interactions. Its robust retry mechanisms, built-in authentication support, and reliable performance make it particularly valuable in production environments where stability matters more than cutting-edge features. While curl offers more flexibility for complex API interactions, wget excels in scenarios requiring dependable, automated file retrieval and straightforward REST operations.

For comprehensive documentation and advanced usage examples, consult the official GNU wget manual. When deploying these techniques on production servers, consider the network and storage implications, especially when working with high-volume data transfers on your hosting infrastructure.


