
How to Use wget to Download Files and Interact with REST APIs
wget is a powerful command-line tool for downloading files and interacting with web services from Unix-like systems. Whether you’re a sysadmin automating server maintenance, a developer testing REST APIs, or someone who just needs to grab files from remote servers, wget provides robust functionality that goes far beyond simple file downloads. This guide walks you through wget’s core capabilities, from basic file retrieval to advanced REST API interactions, complete with real-world examples and troubleshooting tips that’ll save you hours of debugging.
How wget Works Under the Hood
wget operates as an HTTP/HTTPS client that also handles FTP and FTPS. Unlike curl, which is designed for maximum flexibility in data transfer, wget specializes in recursive downloads and robust file retrieval with automatic retry mechanisms. It reuses persistent HTTP connections, handles cookies, supports several authentication methods, and can resume interrupted downloads.
The tool sends standard HTTP requests with customizable headers, user agents, and request methods. For REST API interactions, wget constructs requests with proper HTTP verbs (GET, POST, PUT, DELETE) and can handle JSON payloads, authentication tokens, and response processing.
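To see exactly what goes on the wire, wget’s --debug flag prints the outgoing request between request begin/end markers. A minimal sketch (the user agent and URL are placeholders):
# Print the raw HTTP request wget constructs (method, headers, user agent)
wget --debug --user-agent="MyClient/1.0" -O /dev/null https://example.com 2>&1 | \
  sed -n '/---request begin---/,/---request end---/p'
The table below contrasts wget’s built-in conveniences with curl and the underlying HTTP features.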
Feature | wget | curl | HTTP mechanism
---|---|---|---
Recursive downloads | Native support | Manual scripting required | Not applicable
Resume downloads | Automatic (-c flag) | Built-in (-C - flag) | Range requests
Background operation | Built-in (-b flag) | External process management | Not applicable
Rate limiting | Native (--limit-rate) | Native (--limit-rate) | Not applicable
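For comparison, a quick sketch of the rough curl equivalents for the rows above (URLs are placeholders):
# Rough curl equivalents of the wget conveniences above
curl -C - -O https://example.com/largefile.iso          # resume via range request
curl --limit-rate 200k -O https://example.com/file.zip  # throttled download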
Step-by-Step Implementation Guide
Basic File Downloads
Start with simple file downloads to understand wget’s basic syntax:
# Download a single file
wget https://example.com/file.zip
# Download with custom filename
wget -O myfile.zip https://example.com/file.zip
# Resume interrupted download
wget -c https://example.com/largefile.iso
# Download in background with logging
wget -b -o download.log https://example.com/file.zip
# Rate-limited download (useful for production servers)
wget --limit-rate=200k https://example.com/file.zip
REST API Interactions
wget can handle most REST API operations, though the syntax differs from curl:
# GET request with headers
wget --header="Authorization: Bearer your-token-here" \
--header="Content-Type: application/json" \
-O response.json \
https://api.example.com/users
# POST request with JSON data
wget --header="Content-Type: application/json" \
--post-data='{"name":"John","email":"john@example.com"}' \
-O response.json \
https://api.example.com/users
# PUT request (requires wget 1.15+)
wget --method=PUT \
--header="Content-Type: application/json" \
--body-data='{"id":123,"name":"Updated Name"}' \
-O response.json \
https://api.example.com/users/123
# DELETE request
wget --method=DELETE \
--header="Authorization: Bearer your-token" \
https://api.example.com/users/123
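The --method flag accepts arbitrary verbs, so PATCH requests follow the same pattern (again, wget 1.15+); the endpoint and payload here are illustrative:
# PATCH request for a partial update
wget --method=PATCH \
  --header="Content-Type: application/json" \
  --body-data='{"name":"Partial Update"}' \
  -O response.json \
  https://api.example.com/users/123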
Advanced Authentication
Handle various authentication methods commonly used in APIs:
# Basic HTTP authentication
wget --user=username --password=password https://api.example.com/data
# Using a .netrc file for credentials (keeps them out of shell history and process listings)
echo "machine api.example.com login username password secret" >> ~/.netrc
chmod 600 ~/.netrc
wget https://api.example.com/data
# Client certificate authentication
wget --certificate=client.crt --private-key=client.key https://api.example.com/secure
# Custom authentication headers
wget --header="X-API-Key: your-api-key" \
--header="X-Client-ID: your-client-id" \
https://api.example.com/data
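To keep secrets off the command line entirely, wget can also read header lines from its startup file; a sketch assuming a per-user ~/.wgetrc:
# Store the header in ~/.wgetrc so it never appears on the command line
echo 'header = X-API-Key: your-api-key' >> ~/.wgetrc
chmod 600 ~/.wgetrc
wget https://api.example.com/data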
Real-World Examples and Use Cases
Automated Backup Downloads
Create a robust backup download script for your VPS or dedicated server:
#!/bin/bash
# backup-downloader.sh
BACKUP_URL="https://backups.example.com"
API_KEY="your-api-key"
BACKUP_DIR="/var/backups/remote"
# Get list of available backups
wget --header="Authorization: Bearer $API_KEY" \
-O backup-list.json \
"$BACKUP_URL/api/backups"
# Parse JSON and download latest backup
LATEST_BACKUP=$(jq -r '.backups[0].download_url' backup-list.json)
wget --header="Authorization: Bearer $API_KEY" \
--progress=bar:force \
--limit-rate=1m \
-P "$BACKUP_DIR" \
"$LATEST_BACKUP"
# Verify download integrity
if [ $? -eq 0 ]; then
echo "Backup downloaded successfully"
# Add checksum verification
wget --header="Authorization: Bearer $API_KEY" \
-O backup.sha256 \
"$BACKUP_URL/api/backups/latest/checksum"
else
echo "Download failed, check logs"
exit 1
fi
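The script above fetches the checksum file but never compares it. A hypothetical verification step, assuming backup.sha256 contains standard "hash  filename" lines that name the downloaded file:
# Verify the backup against the fetched checksum (assumed sha256sum format)
(cd "$BACKUP_DIR" && sha256sum -c -) < backup.sha256 || {
  echo "Checksum verification failed"
  exit 1
}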
API Health Monitoring
Monitor REST API endpoints and log response times:
#!/bin/bash
# api-monitor.sh
ENDPOINTS=(
"https://api.example.com/health"
"https://api.example.com/status"
"https://api.example.com/metrics"
)
for endpoint in "${ENDPOINTS[@]}"; do
start_time=$(date +%s.%N)
wget --quiet \
--timeout=10 \
--tries=3 \
--header="User-Agent: HealthCheck/1.0" \
-O /tmp/health_check.json \
"$endpoint"
exit_code=$?
end_time=$(date +%s.%N)
response_time=$(echo "$end_time - $start_time" | bc)
if [ $exit_code -eq 0 ]; then
echo "$(date): $endpoint - OK (${response_time}s)"
else
echo "$(date): $endpoint - FAILED (${response_time}s)"
fi
done
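To run the monitor on a schedule, a typical crontab entry might look like this (the install path and log location are assumptions):
# Run the health check every 5 minutes and append results to a log
*/5 * * * * /usr/local/bin/api-monitor.sh >> /var/log/api-monitor.log 2>&1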
Bulk File Processing
Download and process multiple files from a REST API:
#!/bin/bash
# bulk-processor.sh
API_BASE="https://api.example.com"
TOKEN="your-jwt-token"
# Get list of files to process
wget --header="Authorization: Bearer $TOKEN" \
-O file-list.json \
"$API_BASE/files?status=pending"
# Process each file
jq -r '.files[].id' file-list.json | while read -r file_id; do
echo "Processing file ID: $file_id"
# Download file
wget --header="Authorization: Bearer $TOKEN" \
-O "file_${file_id}.dat" \
"$API_BASE/files/$file_id/download"
# Process file (example: compress)
gzip "file_${file_id}.dat"
# Update status via API
wget --header="Authorization: Bearer $TOKEN" \
--header="Content-Type: application/json" \
--method=PUT \
--body-data='{"status":"processed"}' \
"$API_BASE/files/$file_id/status"
echo "Completed file ID: $file_id"
done
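If the files are independent and the API can handle concurrent requests, a hypothetical parallel variant of the download step uses xargs; four workers is an arbitrary choice:
# Download pending files with 4 parallel workers (mind API rate limits)
jq -r '.files[].id' file-list.json | \
  xargs -P 4 -I{} wget --header="Authorization: Bearer $TOKEN" \
    -O "file_{}.dat" "$API_BASE/files/{}/download"
Note this parallelizes only the downloads; the compression and status-update steps would still run per file afterward.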
Performance Optimization and Best Practices
Connection Management
Optimize wget for better performance in production environments:
# Increase timeout for slow servers
wget --timeout=30 --dns-timeout=10 --connect-timeout=10 https://slow-api.example.com/data
# wget uses HTTP/1.1 keep-alive by default; disable it if a server misbehaves
wget --no-http-keep-alive https://api.example.com/endpoint
# Parallel downloads (be careful with rate limits); use plain & rather than
# --background so the shell's wait can track the jobs
wget --output-file=download1.log \
  https://api.example.com/file1 &
wget --output-file=download2.log \
  https://api.example.com/file2 &
# Wait for all background jobs
wait
Error Handling and Retry Logic
# Robust retry configuration
wget --tries=5 \
--retry-connrefused \
--waitretry=10 \
--timeout=30 \
https://unreliable-api.example.com/data
#!/bin/bash
# Custom retry script with exponential backoff
retry_download() {
local url=$1
local max_attempts=5
local delay=1
for ((i=1; i<=max_attempts; i++)); do
echo "Attempt $i/$max_attempts"
if wget --timeout=30 "$url"; then
echo "Download successful"
return 0
fi
if [ $i -lt $max_attempts ]; then
echo "Failed, retrying in ${delay}s"
sleep $delay
delay=$((delay * 2)) # Exponential backoff
fi
done
echo "Download failed after $max_attempts attempts"
return 1
}
retry_download "https://api.example.com/large-file.zip"
Common Pitfalls and Troubleshooting
SSL/TLS Issues
Handle certificate problems that frequently occur in development environments:
# Skip certificate verification (development only)
wget --no-check-certificate https://self-signed.example.com/api
# Use custom CA bundle
wget --ca-certificate=/path/to/custom-ca.pem https://internal-api.company.com
# Debug SSL handshake
wget --debug --verbose https://problematic-ssl.example.com 2>&1 | grep -i ssl
HTTP Response Handling
Process different HTTP status codes appropriately:
#!/bin/bash
# smart-wget.sh
url="https://api.example.com/data"
response_file="response.json"
headers_file="headers.txt"
# Download with server response headers
wget --server-response \
--output-document="$response_file" \
"$url" 2>&1 | tee "$headers_file"
# Extract HTTP status code
status_code=$(grep "HTTP/" "$headers_file" | tail -1 | awk '{print $2}')
case $status_code in
200)
echo "Success: Data downloaded"
;;
401)
echo "Error: Authentication required"
exit 1
;;
429)
echo "Error: Rate limited, waiting..."
sleep 60
exec "$0" # Retry script
;;
5*)
echo "Server error: $status_code"
exit 1
;;
*)
echo "Unexpected status: $status_code"
cat "$headers_file"
;;
esac
Memory and Disk Space Management
# Stream large responses without saving to disk
wget -O - https://api.example.com/large-dataset.json | jq '.data[] | select(.status=="active")'
# Check available space before download
check_space_and_download() {
local url=$1
local required_space_mb=$2
local available_space=$(df -k . | awk 'NR==2{print int($4/1024)}')
if [ $available_space -lt $required_space_mb ]; then
echo "Insufficient space: ${available_space}MB available, ${required_space_mb}MB required"
exit 1
fi
wget --progress=dot:giga "$url"
}
check_space_and_download "https://example.com/big-file.zip" 1000
Integration with Modern Development Workflows
Docker Container Usage
Use wget in containerized environments:
# Dockerfile example
FROM alpine:latest
RUN apk add --no-cache wget ca-certificates jq
WORKDIR /app
COPY download-script.sh .
RUN chmod +x download-script.sh
# Health check using wget
HEALTHCHECK --interval=30s --timeout=10s --retries=3 \
CMD wget --quiet --tries=1 --spider http://localhost:8080/health || exit 1
CMD ["./download-script.sh"]
CI/CD Pipeline Integration
# GitHub Actions example
name: Download and Process Data
on:
schedule:
- cron: '0 2 * * *' # Daily at 2 AM
jobs:
download:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Download API data
run: |
wget --header="Authorization: Bearer ${{ secrets.API_TOKEN }}" \
--output-document=data.json \
https://api.example.com/daily-export
- name: Validate download
run: |
if [ ! -s data.json ]; then
echo "Download failed or empty file"
exit 1
fi
# Validate JSON structure
jq empty data.json
- name: Upload to storage
run: |
aws s3 cp data.json s3://your-bucket/daily-exports/$(date +%Y-%m-%d).json
wget remains an essential tool for system administrators and developers working with file downloads and API interactions. Its robust retry mechanisms, built-in authentication support, and reliable performance make it particularly valuable for production environments where stability matters more than cutting-edge features. While curl offers more flexibility for complex API interactions, wget excels in scenarios requiring dependable, automated file retrieval and basic REST operations.
For comprehensive documentation and advanced usage examples, consult the official GNU wget manual. When deploying these techniques on production servers, consider the network and storage implications, especially when working with high-volume data transfers on your hosting infrastructure.
