Cowrie decoders enforce security limits to prevent denial-of-service attacks, memory exhaustion, and CPU spin attacks. These limits provide defense-in-depth protection beyond basic sanity checks.

Overview

Security limits are enforced at decode time and can be customized via DecodeOptions. Default limits are designed to support large ML workloads while preventing extreme allocations.

Two-Layer Protection

  1. Sanity Checks (always enforced): Length cannot exceed remaining data
  2. Security Limits (configurable): Absolute maximums even for well-formed data
// Example: Decoding a string
length := readVarint()  // Attacker claims 1GB

// Layer 1: Sanity check
if length > remaining_bytes {
    return ErrMalformedLength  // Fail fast
}

// Layer 2: Security limit
if length > MaxStringLen {
    return ErrStringTooLarge  // Prevent legitimate but huge allocation
}

data := read(length)  // Safe to allocate

Default Limits

const (
    DefaultMaxDepth     = 1000          // Maximum nesting depth
    DefaultMaxArrayLen  = 100_000_000   // 100M elements
    DefaultMaxObjectLen = 10_000_000    // 10M fields
    DefaultMaxStringLen = 500_000_000   // 500MB strings
    DefaultMaxBytesLen  = 1_000_000_000 // 1GB bytes (tensors, images, audio)
    DefaultMaxExtLen    = 100_000_000   // 100MB max extension payload
    DefaultMaxDictLen   = 10_000_000    // 10M dictionary entries
    DefaultMaxHintCount = 10_000        // 10K column hints
    DefaultMaxRank      = 32            // Maximum tensor rank
)
These defaults support real ML workloads:
  • 768-dim embeddings: ~3KB per embedding → ~325K embeddings fit in a single MaxBytesLen blob
  • Large language model responses: Multi-paragraph text fits in MaxStringLen
  • Graph databases: Millions of nodes/edges fit in MaxArrayLen
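These headroom figures can be sanity-checked with plain arithmetic. A quick sketch (the constant mirrors DefaultMaxBytesLen above; the helper is illustrative, not part of the cowrie API):

```go
package main

import "fmt"

const defaultMaxBytesLen = 1_000_000_000 // mirrors DefaultMaxBytesLen (1GB)

// embeddingBytes returns the payload size of one float32 embedding.
func embeddingBytes(dim int) int {
	return dim * 4 // 4 bytes per float32
}

func main() {
	per := embeddingBytes(768)
	fmt.Println(per)                      // 3072 bytes ≈ 3KB per embedding
	fmt.Println(defaultMaxBytesLen / per) // 325520 embeddings fit in one 1GB blob
}
```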

DecodeOptions

Configure limits for your use case:
import "github.com/Neumenon/cowrie"

// Use defaults
val, err := cowrie.Decode(data)

// Custom limits
opts := cowrie.DecodeOptions{
    MaxDepth:     500,              // Limit nesting (JSON bomb protection)
    MaxArrayLen:  1_000_000,        // Limit array size
    MaxObjectLen: 100_000,          // Limit object fields
    MaxStringLen: 10_000_000,       // 10MB strings
    MaxBytesLen:  100_000_000,      // 100MB binary data
    MaxExtLen:    50_000_000,       // 50MB extensions
    MaxDictLen:   1_000_000,        // 1M dictionary keys
    MaxHintCount: 1_000,            // 1K column hints
    MaxRank:      16,               // 16D tensors max
}
val, err = cowrie.DecodeWithOptions(data, opts)

Zero Values Use Defaults

opts := cowrie.DecodeOptions{
    MaxDepth: 100,  // Override
    // MaxArrayLen: 0 → Uses DefaultMaxArrayLen (100M)
}
opts := cowrie.DecodeOptions{
    MaxDepth: -1,  // Unlimited (DANGEROUS!)
}
Only use unlimited for trusted input (e.g., internal files).
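A minimal sketch of how zero-value resolution can work, assuming the convention above (zero → default, negative → unlimited); the helper name is illustrative, not cowrie's internals:

```go
package main

import "fmt"

const defaultMaxDepth = 1000 // mirrors DefaultMaxDepth

// effectiveMaxDepth resolves a configured MaxDepth: 0 falls back to the
// default, a negative value disables the check entirely.
func effectiveMaxDepth(configured int) (limit int, unlimited bool) {
	switch {
	case configured < 0:
		return 0, true // DANGEROUS: only for trusted input
	case configured == 0:
		return defaultMaxDepth, false
	default:
		return configured, false
	}
}

func main() {
	fmt.Println(effectiveMaxDepth(0))   // 1000 false (zero uses the default)
	fmt.Println(effectiveMaxDepth(100)) // 100 false  (override passes through)
	fmt.Println(effectiveMaxDepth(-1))  // 0 true     (unlimited)
}
```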

Limit Descriptions

MaxDepth

Protects against: Nested structure attacks (stack overflow, CPU spin)
// Attack: 1000 levels deep
{"a": {"a": {"a": {"a": ...}}}}
Default: 1000 levels (enough for legitimate data)
Typical values:
  • APIs: 50-100 (shallow documents)
  • Databases: 500-1000 (complex objects)
  • File processing: 1000+ (deeply nested config)
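To choose a MaxDepth that fits your own data, it helps to measure how deep typical documents actually nest. A throwaway sketch over decoded Go values (not a cowrie API):

```go
package main

import "fmt"

// nestingDepth reports the maximum nesting depth of maps and slices,
// counting the top-level container as depth 1; scalars add no depth.
func nestingDepth(v any) int {
	switch t := v.(type) {
	case map[string]any:
		deepest := 0
		for _, child := range t {
			if d := nestingDepth(child); d > deepest {
				deepest = d
			}
		}
		return deepest + 1
	case []any:
		deepest := 0
		for _, child := range t {
			if d := nestingDepth(child); d > deepest {
				deepest = d
			}
		}
		return deepest + 1
	default:
		return 0
	}
}

func main() {
	doc := map[string]any{"a": map[string]any{"b": []any{1, 2}}}
	fmt.Println(nestingDepth(doc)) // 3
}
```

Run this over a sample of real payloads and set MaxDepth comfortably above the observed maximum.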

MaxArrayLen

Protects against: Memory exhaustion via huge arrays
// Attack: Claim 1B elements (8GB+ allocation)
Tag(0x06) | count:varint(1000000000) | ...
Default: 100M elements
Typical values:
  • APIs: 1M-10M (paginated responses)
  • ML workloads: 100M+ (large embedding batches)
  • Graphs: 100M+ (large node/edge batches)
Memory impact:
  • 100M int64: ~800MB
  • 100M float32: ~400MB
  • 100M strings: Variable (depends on content)

MaxObjectLen

Protects against: Memory exhaustion via huge objects
// Attack: 10M fields (massive dictionary + object overhead)
Tag(0x07) | count:varint(10000000) | ...
Default: 10M fields
Typical values:
  • APIs: 1K-10K fields (reasonable documents)
  • Databases: 100K-1M fields (wide tables)
  • Analytics: 10M+ fields (event aggregations)
Memory impact:
  • 10M fields × 32 bytes/field ≈ 320MB overhead
  • Plus dictionary keys (encoded once)
  • Plus field values (varies)

MaxStringLen

Protects against: Memory exhaustion via huge strings
// Attack: 1GB string
Tag(0x05) | len:varint(1000000000) | ...
Default: 500MB
Typical values:
  • APIs: 1MB-10MB (documents, logs)
  • LLM responses: 100MB-500MB (long-form generation)
  • Files: 500MB+ (processing large text)
Why 500MB? Even GPT-4-scale contexts (~200K tokens × ~4 bytes/token ≈ 800KB of UTF-8) fit with ample headroom; long-form generation and large text files can still run to multiple MB.

MaxBytesLen

Protects against: Memory exhaustion via binary data (tensors, images, audio)
// Attack: 10GB tensor
Tag(0x20) | ... | dataLen:varint(10000000000) | ...
Default: 1GB
Typical values:
  • APIs: 10MB-100MB (small images, embeddings)
  • ML workloads: 1GB+ (large tensors, batches)
  • Media: 100MB-1GB+ (high-res images, audio)
Examples:
  • 768-dim float32 embedding: 3KB
  • 10K embeddings: 30MB
  • 1M embeddings: 3GB (exceeds default!)
  • 1920×1080 JPEG: ~1MB
  • 4K raw RGB: 24MB
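The media figures above follow from simple width × height × channels arithmetic; a quick sketch:

```go
package main

import "fmt"

// rawImageBytes returns the uncompressed size of an 8-bit-per-channel image.
func rawImageBytes(width, height, channels int) int {
	return width * height * channels
}

func main() {
	fmt.Println(rawImageBytes(3840, 2160, 3)) // 24883200 ≈ 24MB of raw 4K RGB
	fmt.Println(rawImageBytes(1920, 1080, 3)) // 6220800 ≈ 6MB raw (JPEG compresses to ~1MB)
}
```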

MaxExtLen

Protects against: Unknown extension payload attacks
// Attack: 1GB unknown extension
Tag(0x0E) | extType:varint | len:varint(1000000000) | ...
Default: 100MB
Typical values:
  • Standard: 10MB-100MB (forward compatibility)
  • Strict: 1MB (reject large unknown data)

MaxDictLen

Protects against: Dictionary explosion (CPU spin, memory)
// Attack: 10M dictionary keys
DictLen:varint(10000000) | (len:varint | bytes)* | ...
Default: 10M entries (same as MaxObjectLen)
Typical values:
  • APIs: 1K-10K keys (typical schemas)
  • Large objects: 1M-10M keys (wide tables, many graphs)
Memory impact:
  • 10M keys × 20 bytes avg ≈ 200MB dictionary
  • Plus hash map overhead

MaxHintCount

Protects against: Column hints CPU spin attack
// Attack: 1M column hints (causes long parsing time)
HintCount:varint(1000000) | (field + type + shape + flags)* | ...
Default: 10K hints
Typical values:
  • Standard: 100-1000 columns (wide tables)
  • Large: 10K+ columns (ultra-wide analytics)

MaxRank

Protects against: Tensor dimension explosion
// Attack: 255 dimensions (causes huge offset calculations)
Tag(0x20) | dtype | rank:u8(255) | dims:varint*255 | ...
Default: 32 dimensions
Typical values:
  • Standard ML: 4-8 dimensions (batches, channels, height, width, etc.)
  • Advanced: 16-32 dimensions (attention heads, multiple batches)
Why 32? Enough for complex architectures:
  • 4D: [batch, channels, height, width]
  • 6D: [batch, time, layers, heads, seq, hidden]
  • 32D: Extreme multi-dimensional tensors
Wire limit: u8 max = 255 dimensions (but decoder rejects > MaxRank)

Attack Scenarios

1. Nested Object Bomb

Attack: Deeply nested objects to exhaust stack or spin CPU
{"a":{"a":{"a":{"a": ... 10000 levels}}}}
Protection: MaxDepth limit
opts := cowrie.DecodeOptions{MaxDepth: 100}
_, err := cowrie.DecodeWithOptions(malicious, opts)
// err == cowrie.ErrDepthExceeded

2. Array Length Bomb

Attack: Claim huge array to allocate gigabytes
Tag(0x06) | count:varint(1000000000) | ...
Protection: MaxArrayLen + sanity check
// Decoder checks:
if count > MaxArrayLen {
    return ErrArrayTooLarge  // Security limit
}
if count > remaining_bytes {
    return ErrMalformedLength  // Sanity check
}

3. Dictionary Explosion

Attack: 10M dictionary keys to exhaust memory + CPU
DictLen:varint(10000000) | key1 | key2 | ... | key10M | ...
Protection: MaxDictLen + sanity check
if dictLen > MaxDictLen {
    return ErrDictTooLarge
}
if dictLen > remaining_bytes {
    return ErrMalformedLength
}

4. Decompression Bomb

Attack: 1KB compressed → 10GB decompressed
Flags:0x03 (compressed gzip) | OrigLen:varint(10000000000) | [1KB of compressed data]
Protection: MaxDecompressedSize limit
const MaxDecompressedSize = 256 * 1024 * 1024  // 256MB

// Read one byte past the limit so an oversized stream is detectable.
limited := io.LimitReader(gzipReader, MaxDecompressedSize+1)
out, err := io.ReadAll(limited)
if err != nil {
    return err
}
if len(out) > MaxDecompressedSize {
    return cowrie.ErrDecompressedTooLarge
}
See Compression for details.

5. Tensor Rank Bomb

Attack: 255-dimensional tensor to cause overflow in size calculations
Tag(0x20) | dtype | rank:u8(255) | dims:[1,1,1,...,1] | dataLen:varint(1) | [1 byte]
Protection: MaxRank limit
if rank > MaxRank {
    return ErrMalformedLength
}

6. Column Hints CPU Spin

Attack: 1M column hints to slow down header parsing
FlagHasColumnHints | HintCount:varint(1000000) | (field + type + shape + flags)*1M | ...
Protection: MaxHintCount limit
if hintCount > MaxHintCount {
    return ErrTooManyHints
}

Error Handling

All limit violations return specific errors:
val, err := cowrie.DecodeWithOptions(data, opts)
switch err {
case cowrie.ErrDepthExceeded:
    log.Println("Nested too deep")
case cowrie.ErrArrayTooLarge:
    log.Println("Array too large")
case cowrie.ErrObjectTooLarge:
    log.Println("Object too large")
case cowrie.ErrStringTooLarge:
    log.Println("String too large")
case cowrie.ErrBytesTooLarge:
    log.Println("Bytes/tensor too large")
case cowrie.ErrExtTooLarge:
    log.Println("Extension too large")
case cowrie.ErrDictTooLarge:
    log.Println("Dictionary too large")
case cowrie.ErrTooManyHints:
    log.Println("Too many column hints")
case cowrie.ErrMalformedLength:
    log.Println("Length exceeds remaining data (malicious)")
default:
    log.Println("Other error:", err)
}

Public API (Untrusted Input)

opts := cowrie.DecodeOptions{
    MaxDepth:     100,              // Shallow documents
    MaxArrayLen:  1_000_000,        // 1M elements max
    MaxObjectLen: 10_000,           // 10K fields max
    MaxStringLen: 10_000_000,       // 10MB strings
    MaxBytesLen:  100_000_000,      // 100MB binary
    MaxExtLen:    10_000_000,       // 10MB extensions
    MaxDictLen:   10_000,           // 10K keys
    MaxHintCount: 100,              // 100 column hints
    MaxRank:      8,                // 8D tensors max
    OnUnknownExt: cowrie.UnknownExtError,  // Reject unknown extensions
}
Profile: Conservative, protects against abuse, suitable for user-facing APIs.

Internal Service (Semi-Trusted)

opts := cowrie.DecodeOptions{
    MaxDepth:     500,
    MaxArrayLen:  10_000_000,
    MaxObjectLen: 100_000,
    MaxStringLen: 100_000_000,
    MaxBytesLen:  500_000_000,
    MaxExtLen:    50_000_000,
    MaxDictLen:   100_000,
    MaxHintCount: 1_000,
    MaxRank:      16,
}
Profile: Moderate, allows larger payloads, suitable for service-to-service communication.

ML Workload (Trusted)

opts := cowrie.DefaultDecodeOptions()  // Use generous defaults
// or
opts := cowrie.DecodeOptions{
    MaxDepth:     1000,
    MaxArrayLen:  100_000_000,
    MaxObjectLen: 10_000_000,
    MaxStringLen: 500_000_000,
    MaxBytesLen:  2_000_000_000,     // 2GB for large tensors
    MaxExtLen:    100_000_000,
    MaxDictLen:   10_000_000,
    MaxHintCount: 10_000,
    MaxRank:      32,
}
Profile: Permissive, supports large ML payloads, suitable for trusted data pipelines.

Strict Mode (Maximum Security)

opts := cowrie.DecodeOptions{
    MaxDepth:     50,
    MaxArrayLen:  10_000,
    MaxObjectLen: 1_000,
    MaxStringLen: 1_000_000,         // 1MB
    MaxBytesLen:  10_000_000,        // 10MB
    MaxExtLen:    1_000_000,         // 1MB
    MaxDictLen:   1_000,
    MaxHintCount: 50,
    MaxRank:      4,
    OnUnknownExt: cowrie.UnknownExtError,
}
Profile: Paranoid, rejects anything unusual, suitable for high-security environments.

Performance Impact

Limit checks add negligible overhead (less than 1% CPU) because each is a single fail-fast comparison:
// Fast: Single comparison
if count > MaxArrayLen {
    return ErrArrayTooLarge
}

// No allocation until after limit check
items := make([]*Value, count)  // Only if count <= MaxArrayLen
Benchmark (100KB payload):
  • No limits: 1.2ms decode
  • With limits: 1.21ms decode (~1% overhead)
Limits save time by rejecting malicious payloads early.

Monitoring

Track limit violations to detect attacks:
func DecodeWithMetrics(data []byte) (*cowrie.Value, error) {
    val, err := cowrie.Decode(data)
    
    switch err {
    case cowrie.ErrArrayTooLarge, 
         cowrie.ErrObjectTooLarge,
         cowrie.ErrStringTooLarge,
         cowrie.ErrBytesTooLarge,
         cowrie.ErrDictTooLarge:
        metrics.Increment("cowrie.limit_exceeded", map[string]string{
            "error": err.Error(),
        })
        log.Warn("Limit exceeded", "error", err, "size", len(data))
    }
    
    return val, err
}

Best Practices

  1. Use Defaults for ML: Default limits support real ML workloads
  2. Tighten for APIs: Reduce limits for user-facing endpoints
  3. Monitor Violations: Track ErrXxxTooLarge errors
  4. Reject Unknown Extensions: Set OnUnknownExt to Error for strict mode
  5. Combine with Rate Limiting: Limit violations may indicate attack
  6. Test Edge Cases: Verify your limits with real data

Unknown Extension Behavior

Control how the decoder handles unknown TagExt extensions:
type UnknownExtBehavior int

const (
    UnknownExtKeep       UnknownExtBehavior = iota  // Preserve (default)
    UnknownExtSkipAsNull                            // Skip, return null
    UnknownExtError                                 // Error (strict mode)
)

opts := cowrie.DecodeOptions{
    OnUnknownExt: cowrie.UnknownExtError,  // Reject unknown data
}
Use cases:
  • Keep: Forward compatibility, round-trip preservation
  • Skip: Ignore unknown extensions silently
  • Error: Strict validation, reject unknown data