
Python struct Pack and Unpack – Working with Binary Data
Python’s struct module packs Python values into bytes objects and unpacks them back into Python values. It is the standard tool for implementing network protocols, reading and writing binary file formats, and exchanging data with C libraries. This guide covers the basics, common pitfalls, advanced techniques, and real-world applications of binary data handling in Python.
How Python struct Works
The struct module bridges the gap between Python’s high-level data types and low-level binary representations. It uses format strings to define how data should be packed or unpacked, similar to C’s printf-style formatting but for binary data.
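Format strings consist of an optional prefix character that sets byte order, size, and alignment, followed by format characters that specify data types. The most common format characters are listed below. The sizes shown are the standard sizes that apply with the =, <, >, and ! prefixes; in native mode (@, the default) sizes follow the platform’s C compiler: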
Format | C Type | Python Type | Size (bytes) |
---|---|---|---|
c | char | bytes of length 1 | 1 |
b | signed char | integer | 1 |
B | unsigned char | integer | 1 |
h | short | integer | 2 |
H | unsigned short | integer | 2 |
i | int | integer | 4 |
I | unsigned int | integer | 4 |
f | float | float | 4 |
d | double | float | 8 |
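As a quick check of the table, the sketch below asks struct.calcsize for the size of each format character. It uses the < prefix so that the standard sizes apply on any platform, and it also shows how repeat counts behave. The variable names are illustrative only.

```python
import struct

# Standard sizes apply when the format starts with '<', '>', '!' or '='.
codes = ['c', 'b', 'B', 'h', 'H', 'i', 'I', 'f', 'd']
sizes = {code: struct.calcsize('<' + code) for code in codes}
for code, size in sizes.items():
    print(f"{code}: {size} byte(s)")

# A repeat count multiplies a format character: '3H' means 'HHH'.
three_shorts = struct.calcsize('<3H')

# For 's' the count is the length of one bytes field, not a repetition,
# so '4s' unpacks to a single 4-byte value.
one_string = struct.unpack('<4s', b'abcd')
print(three_shorts, one_string)
```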
Byte order matters when working with binary data across different systems. The struct module provides several options:
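@ – Native byte order, native size and alignment (default)
= – Native byte order, standard size, no alignment
< – Little-endian, standard size, no alignment
> – Big-endian, standard size, no alignment
! – Network (big-endian) order, standard size, no alignment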
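The byte order prefixes can be compared directly by packing one value with each of them. This is a minimal sketch; the value 0x0102 is chosen only because its two bytes are easy to tell apart.

```python
import struct
import sys

value = 0x0102

little = struct.pack('<H', value)   # least significant byte first
big = struct.pack('>H', value)      # most significant byte first
network = struct.pack('!H', value)  # same layout as big-endian
native = struct.pack('=H', value)   # follows the machine running the code

print(little.hex(), big.hex(), network.hex(), native.hex())
print(f"This machine is {sys.byteorder}-endian")
```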
Basic Pack and Unpack Operations
Let’s start with simple examples to understand the core functionality:
import struct
# Packing data
data = struct.pack('i', 42)
print(f"Packed integer: {data}") # b'*\x00\x00\x00'
print(f"Length: {len(data)} bytes") # 4 bytes
# Unpacking data
unpacked = struct.unpack('i', data)
print(f"Unpacked: {unpacked[0]}") # 42
# Multiple values
packed_multi = struct.pack('iif', 10, 20, 3.14)
unpacked_multi = struct.unpack('iif', packed_multi)
print(f"Multiple values: {unpacked_multi}") # (10, 20, 3.140000104904175)
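The last value in the example comes back as 3.140000104904175 rather than 3.14 because the f format stores a 32-bit float, while Python floats are 64-bit. The sketch below compares a round trip through f with one through d; the tolerance passed to math.isclose is an arbitrary choice for illustration.

```python
import math
import struct

original = 3.14
as_float32 = struct.unpack('<f', struct.pack('<f', original))[0]
as_float64 = struct.unpack('<d', struct.pack('<d', original))[0]

print(as_float32)  # close to 3.14 but not equal to it
print(as_float64)  # survives the round trip unchanged

# Compare 32-bit round trips with a tolerance instead of ==.
close_enough = math.isclose(as_float32, original, rel_tol=1e-6)
print(close_enough)
```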
Working with strings requires special attention since they need explicit length specification:
# Fixed-length strings
text = "Hello"
packed_str = struct.pack('5s', text.encode('utf-8'))
unpacked_str = struct.unpack('5s', packed_str)[0].decode('utf-8')
print(f"String: {unpacked_str}")
# Variable-length strings with length prefix
def pack_string(s):
    encoded = s.encode('utf-8')
    return struct.pack('I', len(encoded)) + encoded
def unpack_string(data):
    length = struct.unpack('I', data[:4])[0]
    return data[4:4+length].decode('utf-8')
original = "Hello, World!"
packed = pack_string(original)
result = unpack_string(packed)
print(f"Variable string: {result}")
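The length-prefix idea extends to several strings stored back to back. The following is a sketch, not a standard format: pack_strings and unpack_strings are hypothetical helpers, and the prefix is fixed to little-endian here so the bytes are the same on every platform. It uses unpack_from to read each prefix at an offset without slicing.

```python
import struct

LENGTH_PREFIX = struct.Struct('<I')  # explicit little-endian 4-byte length

def pack_strings(strings):
    """Concatenate strings, each preceded by its encoded byte length."""
    chunks = []
    for s in strings:
        encoded = s.encode('utf-8')
        chunks.append(LENGTH_PREFIX.pack(len(encoded)) + encoded)
    return b''.join(chunks)

def unpack_strings(data):
    """Walk the buffer and decode every length-prefixed string in it."""
    strings = []
    offset = 0
    while offset < len(data):
        (length,) = LENGTH_PREFIX.unpack_from(data, offset)
        offset += LENGTH_PREFIX.size
        end = offset + length
        if end > len(data):
            raise ValueError("buffer ends before the declared string length")
        strings.append(data[offset:end].decode('utf-8'))
        offset = end
    return strings

blob = pack_strings(["Hello", "", "世界"])
print(unpack_strings(blob))
```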
Real-World Examples and Use Cases
Here are practical scenarios where struct shines:
Network Protocol Implementation
Building a simple TCP header parser demonstrates struct’s power in network programming:
import struct
class TCPHeader:
    def __init__(self, src_port, dst_port, seq_num, ack_num, flags):
        self.src_port = src_port
        self.dst_port = dst_port
        self.seq_num = seq_num
        self.ack_num = ack_num
        self.flags = flags
    def pack(self):
        # Simplified layout for illustration: src_port(2), dst_port(2), seq(4), ack(4), flags(2)
        # A real TCP header is at least 20 bytes and has more fields
        return struct.pack('!HHIIH',
                           self.src_port, self.dst_port,
                           self.seq_num, self.ack_num, self.flags)
    @classmethod
    def unpack(cls, data):
        unpacked = struct.unpack('!HHIIH', data[:14])
        return cls(*unpacked)
# Usage example
header = TCPHeader(8080, 80, 1000, 2000, 0x18)
packed_header = header.pack()
reconstructed = TCPHeader.unpack(packed_header)
print(f"Source port: {reconstructed.src_port}")
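The class above uses a shortened 14-byte layout. For comparison, here is a sketch of the fixed 20-byte part of a real TCP header, in which the 16-bit word after the acknowledgment number carries the data offset in its top 4 bits and the flag bits in its low bits. The function name parse_tcp_header and the dictionary keys are illustrative, and TCP options beyond the first 20 bytes are ignored.

```python
import struct

# Fixed part of a TCP header: ports, sequence and acknowledgment numbers,
# offset/flags word, window, checksum, urgent pointer (20 bytes total).
TCP_HEADER = struct.Struct('!HHIIHHHH')

# Flag bit masks within the low bits of the offset/flags word.
FLAGS = {'FIN': 0x01, 'SYN': 0x02, 'RST': 0x04,
         'PSH': 0x08, 'ACK': 0x10, 'URG': 0x20}

def parse_tcp_header(data):
    """Decode the first 20 bytes of a TCP segment into a dict."""
    (src_port, dst_port, seq_num, ack_num,
     offset_flags, window, checksum, urgent) = TCP_HEADER.unpack_from(data)
    return {
        'src_port': src_port,
        'dst_port': dst_port,
        'seq_num': seq_num,
        'ack_num': ack_num,
        # Data offset is counted in 32-bit words, so multiply by 4 for bytes.
        'header_length': (offset_flags >> 12) * 4,
        'flags': [name for name, mask in FLAGS.items() if offset_flags & mask],
        'window': window,
        'checksum': checksum,
        'urgent_pointer': urgent,
    }

# Build a sample header: data offset 5 (20 bytes), PSH and ACK set.
sample = TCP_HEADER.pack(8080, 80, 1000, 2000, (5 << 12) | 0x18, 65535, 0, 0)
parsed = parse_tcp_header(sample)
print(parsed['src_port'], parsed['header_length'], parsed['flags'])
```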
Binary File Format Processing
Reading custom binary file formats is another common use case:
import struct
class BinaryFileReader:
    def __init__(self, filename):
        self.file = open(filename, 'rb')
    def read_header(self):
        # Example: magic(4), version(2), record_count(4)
        header_data = self.file.read(10)
        magic, version, count = struct.unpack('!4sHI', header_data)
        return {
            'magic': magic,
            'version': version,
            'record_count': count
        }
    def read_record(self):
        # Example: id(4), timestamp(8), value(4)
        record_data = self.file.read(16)
        if len(record_data) < 16:
            return None
        record_id, timestamp, value = struct.unpack('!IQf', record_data)
        return {
            'id': record_id,
            'timestamp': timestamp,
            'value': value
        }
    def close(self):
        self.file.close()
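When the whole file fits in memory, the same header and record layout can be written with pack and read back with iter_unpack, which yields one tuple per fixed-size record. This is a sketch under assumptions: build_file_bytes and parse_file_bytes are hypothetical helpers, and the magic value b'DEMO' is made up for the example.

```python
import struct

HEADER = struct.Struct('!4sHI')  # magic, version, record_count
RECORD = struct.Struct('!IQf')   # id, timestamp, value

def build_file_bytes(records, magic=b'DEMO', version=1):
    """Serialize a header followed by fixed-size records."""
    body = b''.join(RECORD.pack(*rec) for rec in records)
    return HEADER.pack(magic, version, len(records)) + body

def parse_file_bytes(data):
    """Return (header_tuple, list_of_record_tuples) from a byte string."""
    header = HEADER.unpack_from(data, 0)
    count = header[2]
    body = data[HEADER.size:HEADER.size + count * RECORD.size]
    # iter_unpack requires the buffer length to be a multiple of RECORD.size
    # and raises struct.error otherwise.
    return header, list(RECORD.iter_unpack(body))

raw = build_file_bytes([(1, 1700000000, 0.5), (2, 1700000060, 1.5)])
header, records = parse_file_bytes(raw)
print(header, records)
```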
Embedded Systems Communication
When communicating with microcontrollers or embedded devices, struct helps maintain precise data formatting:
import struct
import serial  # Third-party pyserial package
class SensorProtocol:
    HEADER = b'\xAA\xBB'
    @staticmethod
    def create_command(cmd_id, payload=b''):
        # Header(2) + Command ID(1) + Length(1) + Payload + Checksum(1)
        length = len(payload)
        packet = SensorProtocol.HEADER + struct.pack('BB', cmd_id, length) + payload
        checksum = sum(packet) & 0xFF
        return packet + struct.pack('B', checksum)
    @staticmethod
    def parse_response(data):
        if len(data) < 5: # Minimum packet size
            return None
        header, cmd_id, length = struct.unpack('2sBB', data[:4])
        if header != SensorProtocol.HEADER:
            return None
        if len(data) < 5 + length: # Payload or checksum byte is missing
            return None
        payload = data[4:4+length]
        checksum = struct.unpack('B', data[4+length:5+length])[0]
        # Verify checksum
        calculated = sum(data[:4+length]) & 0xFF
        if calculated != checksum:
            return None
        return {'cmd_id': cmd_id, 'payload': payload}
# Usage with serial communication
def read_sensor_data(port):
    # The timeout keeps read() from blocking forever if fewer than 100 bytes arrive
    with serial.Serial(port, 9600, timeout=1) as ser:
        command = SensorProtocol.create_command(0x01) # Read sensor command
        ser.write(command)
        response = ser.read(100) # Read response
    return SensorProtocol.parse_response(response)
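Serial reads often return partial packets or stray bytes before a header. The sketch below shows one possible way to pull complete frames out of a byte buffer and keep the unfinished tail for the next read. It assumes the same packet layout as above; extract_frames and build_frame are hypothetical helpers, and the resynchronization strategy (skip one byte after a bad checksum) is an illustrative choice, not part of any real device protocol.

```python
import struct

HEADER = b'\xAA\xBB'

def build_frame(cmd_id, payload=b''):
    """Header(2) + Command ID(1) + Length(1) + Payload + Checksum(1)."""
    packet = HEADER + struct.pack('BB', cmd_id, len(payload)) + payload
    return packet + bytes([sum(packet) & 0xFF])

def extract_frames(buffer):
    """Return (frames, leftover); frames is a list of (cmd_id, payload)."""
    frames = []
    pos = 0
    while True:
        start = buffer.find(HEADER, pos)
        if start == -1:
            tail = buffer[pos:]
            # Keep a trailing 0xAA: it may be the first half of the next header.
            return frames, (tail[-1:] if tail.endswith(HEADER[:1]) else b'')
        if len(buffer) - start < 4:
            return frames, buffer[start:]   # command and length not yet received
        cmd_id, length = struct.unpack_from('BB', buffer, start + 2)
        end = start + 4 + length + 1        # include the checksum byte
        if len(buffer) < end:
            return frames, buffer[start:]   # wait for more bytes
        if sum(buffer[start:end - 1]) & 0xFF == buffer[end - 1]:
            frames.append((cmd_id, buffer[start + 4:end - 1]))
            pos = end
        else:
            pos = start + 1                 # bad checksum: resynchronize

# Two noise bytes, one complete frame, then the first 3 bytes of another.
stream = b'\x00\x13' + build_frame(0x01, b'\x10\x20') + build_frame(0x02)[:3]
frames, leftover = extract_frames(stream)
print(frames, leftover)
```

The leftover bytes would be prepended to the next chunk read from the port before calling extract_frames again.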
Performance Considerations and Alternatives
While struct is efficient for most use cases, performance can vary based on usage patterns:
Method | Use Case | Performance | Memory Usage |
---|---|---|---|
struct.pack/unpack | Occasional conversions | Good | Low |
struct.Struct | Repeated operations | Excellent | Low |
array module | Homogeneous data | Very Good | Very Low |
numpy | Numerical arrays | Excellent | Low |
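For the homogeneous-data row in the table, the standard library array module produces the same bytes as struct when every value has the same C type. A minimal sketch, using native byte order on both sides:

```python
import array
import struct

values = list(range(5))

# array stores one C type per array; 'i' here matches struct's native 'i'.
as_array = array.array('i', values)
array_bytes = as_array.tobytes()

# The struct equivalent needs a repeat count and one argument per value.
struct_bytes = struct.pack(f'{len(values)}i', *values)

print(array_bytes == struct_bytes)

# Reading back: frombytes() appends the decoded items to an existing array.
decoded = array.array('i')
decoded.frombytes(array_bytes)
print(decoded.tolist())
```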
For repeated operations, pre-compile format strings using struct.Struct:
import struct
import time
# Slower approach: the format string is looked up on every call
# (struct caches recently used formats, so the difference may be small)
def slow_packing(data_list):
    result = []
    for item in data_list:
        result.append(struct.pack('if', item[0], item[1]))
    return result
# Faster approach: compile the format once with struct.Struct
def fast_packing(data_list):
    packer = struct.Struct('if')
    return [packer.pack(item[0], item[1]) for item in data_list]
# Performance test (perf_counter is the clock intended for timing short intervals)
test_data = [(i, float(i)) for i in range(10000)]
start = time.perf_counter()
slow_result = slow_packing(test_data)
slow_time = time.perf_counter() - start
start = time.perf_counter()
fast_result = fast_packing(test_data)
fast_time = time.perf_counter() - start
print(f"Slow method: {slow_time:.4f}s")
print(f"Fast method: {fast_time:.4f}s")
print(f"Speedup: {slow_time/fast_time:.2f}x")
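A compiled Struct can also write straight into a preallocated buffer with pack_into, which avoids creating one bytes object per record. This is a sketch of that pattern; the record layout and item count are arbitrary choices for the example.

```python
import struct

record = struct.Struct('<if')          # int32 followed by float32
items = [(i, float(i)) for i in range(1000)]

# One buffer for all records instead of one bytes object per record.
buffer = bytearray(record.size * len(items))
for index, (number, value) in enumerate(items):
    record.pack_into(buffer, index * record.size, number, value)

# unpack_from reads a record at a given offset without slicing the buffer.
tenth = record.unpack_from(buffer, 10 * record.size)
print(len(buffer), tenth)
```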
Common Pitfalls and Troubleshooting
Even experienced developers run into these struct-related issues:
Endianness Problems
The most common issue is endianness mismatches between systems:
import struct
# Problem: Different results on different systems
value = 0x12345678
native_packed = struct.pack('I', value)
print(f"Native: {native_packed.hex()}")
# Solution: Always specify endianness for portable code
little_endian = struct.pack('<I', value)
big_endian = struct.pack('>I', value)
print(f"Little endian: {little_endian.hex()}") # 78563412
print(f"Big endian: {big_endian.hex()}") # 12345678
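To see what a mismatch looks like in practice, the sketch below packs a value as little-endian and then reads it back with the wrong byte order. No exception is raised; the reader simply gets a different number, which is why these bugs are easy to miss.

```python
import struct

value = 0x12345678

sent = struct.pack('<I', value)            # writer uses little-endian
(misread,) = struct.unpack('>I', sent)     # reader assumes big-endian
(correct,) = struct.unpack('<I', sent)     # reader matches the writer

print(f"Sent:    {value:#010x}")
print(f"Misread: {misread:#010x}")
print(f"Correct: {correct:#010x}")
```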
Padding and Alignment Issues
C struct padding can cause unexpected results:
import struct
# Native alignment includes padding
native_size = struct.calcsize('cI') # Usually 8 bytes due to padding
packed_size = struct.calcsize('=cI') # Usually 5 bytes, no padding
print(f"Native size: {native_size}")
print(f"Packed size: {packed_size}")
# Explicit padding control
data = struct.pack('=cxxxI', b'A', 42) # Manual padding with 'xxx'
unpacked = struct.unpack('=cxxxI', data)
print(f"With manual padding: {unpacked}")
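Native alignment also depends on field order. In native mode struct pads before a field that needs alignment but adds nothing after the last field, unless the format ends with a zero-count code. The sketch below compares the sizes; the exact native numbers depend on the platform, so none are asserted in the comments.

```python
import struct

# Native mode pads before a field, but never after the last one.
char_then_int = struct.calcsize('@ci')   # padding inserted before the int
int_then_char = struct.calcsize('@ic')   # int comes first, no padding needed
# A trailing zero-count code pads the end to that type's alignment.
padded_end = struct.calcsize('@ic0i')

# With '=', '<', '>' or '!' there is no padding in either order.
packed_either_way = (struct.calcsize('=ci'), struct.calcsize('=ic'))

print(char_then_int, int_then_char, padded_end, packed_either_way)
```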
String Encoding Gotchas
String handling requires careful attention to encoding:
import struct
# Wrong: 's' silently truncates values longer than the field; no struct.error is raised
text = "Hello 世界"
encoded = text.encode('utf-8') # 12 bytes: each of the two CJK characters takes 3
packed = struct.pack('10s', encoded) # Keeps only the first 10 bytes
try:
    packed.decode('utf-8')
except UnicodeDecodeError as e:
    print(f"Truncated in the middle of a character: {e}")
# Right: Check encoded length first
text = "Hello 世界"
encoded = text.encode('utf-8')
if len(encoded) <= 20:
    packed = struct.pack('20s', encoded)
    unpacked = struct.unpack('20s', packed)[0].rstrip(b'\x00').decode('utf-8')
    print(f"Properly handled: {unpacked}")
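When a value may be longer than its field, one option is to shorten it on a character boundary before packing. The helpers below are a hypothetical sketch of that idea: pack_fixed_text and unpack_fixed_text are not part of struct, and dropping the overflow silently is a design choice that may not suit every format.

```python
import struct

def pack_fixed_text(text, size):
    """Pack text into a NUL-padded field of `size` bytes without splitting a character."""
    encoded = text.encode('utf-8')
    if len(encoded) > size:
        # Cutting can leave a partial character at the end; decoding with
        # errors='ignore' drops those bytes before re-encoding.
        encoded = encoded[:size].decode('utf-8', errors='ignore').encode('utf-8')
    return struct.pack(f'{size}s', encoded)

def unpack_fixed_text(data, size):
    """Reverse pack_fixed_text, stripping the NUL padding."""
    (raw,) = struct.unpack(f'{size}s', data)
    return raw.rstrip(b'\x00').decode('utf-8')

field = pack_fixed_text("Hello 世界", 10)
print(len(field), unpack_fixed_text(field, 10))
```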
Advanced Techniques and Best Practices
For production code, consider these advanced patterns:
Context Managers for Binary Files
import struct
from contextlib import contextmanager
@contextmanager
def binary_file_reader(filename):
    # Open before the try block so a failed open() never reaches file.close()
    file = open(filename, 'rb')
    try:
        yield BinaryReader(file)
    finally:
        file.close()
class BinaryReader:
    def __init__(self, file):
        self.file = file
        self.position = 0
    def read_struct(self, format_string):
        size = struct.calcsize(format_string)
        data = self.file.read(size)
        if len(data) < size:
            raise EOFError(f"Expected {size} bytes, got {len(data)}")
        self.position += size
        return struct.unpack(format_string, data)
    def seek(self, position):
        self.file.seek(position)
        self.position = position
# Usage
with binary_file_reader('data.bin') as reader:
    header = reader.read_struct('!4sHH')
    records = []
    while True:
        try:
            record = reader.read_struct('!IQf')
            records.append(record)
        except EOFError:
            break
Schema-Based Binary Serialization
import struct
from typing import Dict, Any, List
class BinarySchema:
    def __init__(self, fields: List[tuple]):
        self.fields = fields
        self.format_string = '!' + ''.join(field[1] for field in fields)
        self.struct = struct.Struct(self.format_string)
    def pack(self, data: Dict[str, Any]) -> bytes:
        values = []
        for field_name, field_format in self.fields:
            value = data[field_name]
            if 's' in field_format: # String field
                if isinstance(value, str):
                    value = value.encode('utf-8')
            values.append(value)
        return self.struct.pack(*values)
    def unpack(self, data: bytes) -> Dict[str, Any]:
        values = self.struct.unpack(data)
        result = {}
        for i, (field_name, field_format) in enumerate(self.fields):
            value = values[i]
            if 's' in field_format: # String field
                value = value.rstrip(b'\x00').decode('utf-8')
            result[field_name] = value
        return result
# Define a user record schema
user_schema = BinarySchema([
    ('user_id', 'I'),
    ('username', '20s'),
    ('email', '50s'),
    ('age', 'H'),
    ('balance', 'f')
])
# Usage
user_data = {
    'user_id': 12345,
    'username': 'john_doe',
    'email': 'john@example.com',
    'age': 30,
    'balance': 1234.56
}
packed = user_schema.pack(user_data)
unpacked = user_schema.unpack(packed)
print(f"Roundtrip successful: {unpacked['username']}")
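A lighter alternative to a schema class is to pair a compiled Struct with a namedtuple, so unpacked fields can be accessed by name. The sketch below uses a reduced three-field record; User, USER_STRUCT, pack_user, and unpack_user are hypothetical names chosen for illustration.

```python
import struct
from collections import namedtuple

USER_STRUCT = struct.Struct('!I20sH')  # user_id, username, age
User = namedtuple('User', ['user_id', 'username', 'age'])

def pack_user(user):
    """Serialize a User; the username is encoded and NUL-padded to 20 bytes."""
    return USER_STRUCT.pack(user.user_id, user.username.encode('utf-8'), user.age)

def unpack_user(data):
    """Rebuild a User from bytes produced by pack_user."""
    user = User._make(USER_STRUCT.unpack(data))
    # The 20s field comes back as padded bytes, so strip and decode it.
    return user._replace(username=user.username.rstrip(b'\x00').decode('utf-8'))

packed_user = pack_user(User(12345, 'john_doe', 30))
restored = unpack_user(packed_user)
print(restored)
```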
For more detailed information about Python's struct module, check the official Python documentation. The struct module is part of Python's standard library and provides comprehensive format string specifications and usage examples that complement the practical applications covered in this guide.
