Agent Architecture

The DeepTrace Agent is a lightweight, high-performance Rust-based component that collects distributed tracing data from applications without requiring any code modifications. This document provides a detailed overview of the agent's architecture, components, and operational principles, as reflected in the actual implementation.

Overview

The DeepTrace Agent operates as a system-level service that uses eBPF (Extended Berkeley Packet Filter) technology to transparently capture network communications and system calls. It processes this raw data into structured spans and transmits them directly to Elasticsearch for storage and later processing by the DeepTrace Server.

Architecture Diagram

graph TB
    subgraph "Application Layer"
        APP1[Application 1]
        APP2[Application 2]
        APP3[Application 3]
    end
    
    subgraph "DeepTrace Agent"
        subgraph "eBPF Layer"
            TRACE_MODULE[TraceModule/TraceCollector]
            EBPF_PROGS[eBPF Programs]
            SYSCALLS[System Call Hooks]
        end
        
        subgraph "Processing Layer"
            SPAN_CONSTRUCTOR[SpanConstructor]
            MESSAGE_QUEUE[Message Queue]
        end
        
        subgraph "Sender Layer"
            SENDER_PROCESS[SenderProcess]
            ELASTIC_SENDER[ElasticSender]
            FILE_SENDER[FlatFile]
        end
        
        subgraph "Management Layer"
            CONFIGURATOR[Configurator]
            SYNCHRONIZER[Synchronizer]
            METRIC_COLLECTOR[MetricCollector]
            API_SERVER[Rocket API Server]
        end
    end
    
    subgraph "External Systems"
        ES[(Elasticsearch)]
        CONFIG_API[Configuration API]
    end
    
    APP1 --> SYSCALLS
    APP2 --> SYSCALLS
    APP3 --> SYSCALLS
    
    SYSCALLS --> EBPF_PROGS
    EBPF_PROGS --> TRACE_MODULE
    TRACE_MODULE --> MESSAGE_QUEUE
    MESSAGE_QUEUE --> SPAN_CONSTRUCTOR
    SPAN_CONSTRUCTOR --> SENDER_PROCESS
    SENDER_PROCESS --> ELASTIC_SENDER
    SENDER_PROCESS --> FILE_SENDER
    ELASTIC_SENDER --> ES
    
    CONFIGURATOR --> SPAN_CONSTRUCTOR
    CONFIGURATOR --> SENDER_PROCESS
    SYNCHRONIZER --> API_SERVER
    API_SERVER --> CONFIG_API
    METRIC_COLLECTOR --> FILE_SENDER

Core Components

1. eBPF Layer

The eBPF layer provides the foundation for non-intrusive data collection:

TraceModule/TraceCollector

  • Purpose: Main eBPF program management and data collection
  • Implementation: Rust-based eBPF program loader and manager
  • Target Processes: Configurable via PIDs in configuration
  • Data Collection: Network system calls and socket operations

System Call Hooks

  • Monitored Calls:
    • Read Operations: sys_enter_read, sys_exit_read, sys_enter_readv, sys_exit_readv
    • Receive Operations: sys_enter_recvfrom, sys_exit_recvfrom, sys_enter_recvmsg, sys_exit_recvmsg, sys_enter_recvmmsg, sys_exit_recvmmsg
    • Write Operations: sys_enter_write, sys_exit_write, sys_enter_writev, sys_exit_writev
    • Send Operations: sys_enter_sendto, sys_exit_sendto, sys_enter_sendmsg, sys_exit_sendmsg, sys_enter_sendmmsg, sys_exit_sendmmsg
    • Socket Operations: sys_exit_socket, sys_enter_close
  • Configuration: Enabled probes are configurable via enabled_probes array
  • Logging: Configurable log levels (0=off, 1=debug, 3=verbose, 4=stats)

eBPF Configuration

  • Buffer Management: max_buffered_events (default: 128)
  • Process Filtering: Target specific PIDs for monitoring
  • Probe Selection: Granular control over which system calls to monitor
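
As a concrete illustration, here is a minimal sketch of how this section of the configuration could be deserialized in Rust. The struct shape and the use of the serde and toml crates are assumptions for illustration only; the field names are taken from the [ebpf.trace] TOML example later on this page.

use serde::Deserialize;

// Field names mirror the [ebpf.trace] TOML section shown later on this page.
#[derive(Debug, Deserialize)]
struct EbpfTraceConfig {
    log_level: u8,               // 0=off, 1=debug, 3=verbose, 4=stats
    pids: Vec<u32>,              // target process IDs
    max_buffered_events: usize,  // event buffer size (default: 128)
    enabled_probes: Vec<String>, // e.g. "sys_enter_read"
}

fn main() {
    let src = r#"
        log_level = 1
        pids = [523094]
        max_buffered_events = 128
        enabled_probes = ["sys_enter_read", "sys_exit_read"]
    "#;
    let cfg: EbpfTraceConfig = toml::from_str(src).expect("valid eBPF config");
    println!("{cfg:?}");
}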

2. Processing Layer

The processing layer transforms raw eBPF events into structured spans:

SpanConstructor

  • Purpose: Converts raw eBPF messages into structured spans
  • Input: Receives messages from TraceModule via crossbeam channels
  • Output: Sends constructed spans to SenderProcess
  • Implementation: Rust-based message processing with configurable buffering
  • Configuration:
    • cleanup_interval: Span cleanup timing (default: 30 seconds)
    • max_sockets: Maximum tracked sockets (default: 1024)

Message Queue System

  • Channel Type: Crossbeam unbounded/bounded channels
  • Message Flow: TraceModule → SpanConstructor → SenderProcess
  • Buffer Sizes: Configurable bounded channels (default: 1024)
  • Backpressure: Automatic handling via channel capacity
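
The sketch below shows the general shape of this two-stage channel pipeline, using crossbeam bounded channels as described above. The RawEvent and Span types are simplified stand-ins for the agent's real message types.

use crossbeam_channel::bounded;
use std::thread;

// Simplified stand-ins for the agent's real message and span types.
struct RawEvent { pid: u32, bytes: usize }
struct Span { pid: u32, total_bytes: usize }

fn main() {
    // Bounded channels give automatic backpressure: sends block when the
    // buffer (default: 1024) is full.
    let (raw_tx, raw_rx) = bounded::<RawEvent>(1024);
    let (span_tx, span_rx) = bounded::<Span>(1024);

    // SpanConstructor stage: drain raw events, emit constructed spans.
    let constructor = thread::spawn(move || {
        for ev in raw_rx {
            let _ = span_tx.send(Span { pid: ev.pid, total_bytes: ev.bytes });
        }
        // span_tx is dropped here, closing the downstream channel.
    });

    // SenderProcess stage: consume spans and forward them to a backend.
    let sender = thread::spawn(move || {
        for span in span_rx {
            println!("span: pid={} bytes={}", span.pid, span.total_bytes);
        }
    });

    raw_tx.send(RawEvent { pid: 523094, bytes: 42 }).unwrap();
    drop(raw_tx); // closing the input lets the whole pipeline drain and exit
    constructor.join().unwrap();
    sender.join().unwrap();
}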

Data Processing Features

  • Socket Tracking: Maintains socket state across system calls
  • Request/Response Correlation: Matches network I/O operations
  • Span Correlation: Correlates related spans using transaction semantics
  • Metadata Extraction: Process IDs, timestamps, connection details
  • Span Lifecycle Management: Automatic cleanup of completed spans
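
To illustrate the socket tracking and lifecycle cleanup just described, here is a minimal sketch; the PendingSpan type and the (pid, fd) key are hypothetical simplifications of the SpanConstructor's real socket table.

use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical in-flight span state, keyed by (pid, socket fd).
struct PendingSpan {
    started: Instant,
    request_bytes: usize,
}

// Drop sockets whose spans never completed within the allowed age.
fn cleanup(table: &mut HashMap<(u32, i32), PendingSpan>, max_age: Duration) {
    table.retain(|_, span| span.started.elapsed() < max_age);
}

fn main() {
    let mut table: HashMap<(u32, i32), PendingSpan> = HashMap::new();
    table.insert(
        (523094, 7),
        PendingSpan { started: Instant::now(), request_bytes: 42 },
    );

    // Run periodically, e.g. every cleanup_interval (default: 30 seconds).
    cleanup(&mut table, Duration::from_secs(30));
    for ((pid, fd), span) in &table {
        println!("pid={pid} fd={fd}: {} request bytes pending", span.request_bytes);
    }
}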

3. Sender Layer

The sender layer handles data output to various destinations:

SenderProcess

  • Purpose: Generic sender framework for different output types
  • Implementation: Configurable sender that can use different backends
  • Channel Integration: Receives spans from SpanConstructor via channels
  • Supported Backends: Elasticsearch and File output
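
A minimal sketch of the pluggable-backend idea follows; the SpanSink trait and both sink types are hypothetical stand-ins, not the agent's actual interfaces. The two real backends are detailed in the next subsections.

use std::io::Write;

// Hypothetical trait a SenderProcess could dispatch through.
trait SpanSink {
    fn send(&mut self, span: &str) -> std::io::Result<()>;
}

struct ElasticSink { node_url: String }
struct FileSink<W: Write> { out: W }

impl SpanSink for ElasticSink {
    fn send(&mut self, span: &str) -> std::io::Result<()> {
        // The real backend buffers and bulk-indexes; see ElasticSender below.
        println!("index to {}: {span}", self.node_url);
        Ok(())
    }
}

impl<W: Write> SpanSink for FileSink<W> {
    fn send(&mut self, span: &str) -> std::io::Result<()> {
        writeln!(self.out, "{span}")
    }
}

fn main() -> std::io::Result<()> {
    // Backend selection is driven by the [sender.*] configuration sections.
    let mut sinks: Vec<Box<dyn SpanSink>> = vec![
        Box::new(ElasticSink { node_url: "http://localhost:9200".into() }),
        Box::new(FileSink { out: std::io::stdout() }),
    ];
    for sink in &mut sinks {
        sink.send(r#"{"span":"example"}"#)?;
    }
    Ok(())
}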

ElasticSender

  • Purpose: Direct Elasticsearch integration for span storage
  • Configuration:
    • node_url: Elasticsearch endpoint (e.g., "http://localhost:9200")
    • username/password: Authentication credentials
    • index_name: Target index for spans
    • bulk_size: Batch size for bulk operations (default: 64)
    • request_timeout: HTTP timeout (default: 10 seconds)
  • Features: Bulk indexing, connection management, error handling
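
To make the bulk-indexing behavior concrete, the sketch below buffers serialized spans and flushes them to Elasticsearch's standard _bulk NDJSON endpoint once bulk_size is reached. The BulkBuffer type is hypothetical, the ureq crate is an assumed HTTP client, and authentication, timeouts, and retries are elided.

// Assumes spans arrive already serialized as JSON strings.
struct BulkBuffer {
    node_url: String,
    index_name: String,
    bulk_size: usize,
    pending: Vec<String>,
}

impl BulkBuffer {
    fn push(&mut self, span_json: String) {
        self.pending.push(span_json);
        if self.pending.len() >= self.bulk_size {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if self.pending.is_empty() { return; }
        // Each document is preceded by an {"index":{}} action line (NDJSON).
        let mut body = String::new();
        for doc in self.pending.drain(..) {
            body.push_str("{\"index\":{}}\n");
            body.push_str(&doc);
            body.push('\n');
        }
        let url = format!("{}/{}/_bulk", self.node_url, self.index_name);
        // Error handling elided; the real sender retries and logs failures.
        let _ = ureq::post(&url)
            .set("Content-Type", "application/x-ndjson")
            .send_string(&body);
    }
}

fn main() {
    let mut buf = BulkBuffer {
        node_url: "http://localhost:9200".into(),
        index_name: "agent1".into(),
        bulk_size: 64,
        pending: Vec::new(),
    };
    buf.push(r#"{"span":"example"}"#.to_string());
    buf.flush(); // flush any remainder on shutdown
}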

FlatFile Sender

  • Purpose: File-based output for debugging and backup
  • Configuration:
    • path: Output file path
    • rotate: Enable log rotation
    • max_size: Maximum file size before rotation (MB)
    • max_age: Retention period (days)
    • rotate_time: Rotation interval (days)
    • data_format: Date format for file naming
  • Features: Automatic rotation, compression, structured output
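
Here is a minimal sketch of size-based rotation, assuming the chrono crate for the date suffix (mirroring data_format = "%Y%m%d" from the configuration section below); the maybe_rotate helper is hypothetical.

use std::fs;
use std::path::Path;

// Hypothetical rotation check; renames the file once it exceeds max_size.
fn maybe_rotate(path: &Path, max_size_mb: u64) -> std::io::Result<()> {
    if fs::metadata(path)?.len() >= max_size_mb * 1024 * 1024 {
        // Date suffix mirroring data_format = "%Y%m%d".
        let stamp = chrono::Local::now().format("%Y%m%d");
        fs::rename(path, path.with_extension(format!("{stamp}.csv")))?;
        // The next write re-creates a fresh file at the original path.
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    let path = Path::new("metrics.csv");
    fs::write(path, "cpu,mem\n")?;
    maybe_rotate(path, 512) // max_size = 512 MB
}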

4. Management Layer

The management layer provides operational capabilities:

Configurator

  • Purpose: Dynamic configuration management with file watching
  • Features:
    • File system watching for configuration changes
    • Automatic reload on configuration file modifications
    • Retry logic for handling file write delays
    • Configuration validation and error handling
  • Implementation: Uses notify crate for file system events
  • Configuration Path: Configurable via command line (-c flag)
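
Since the Configurator is stated to use the notify crate, here is a minimal sketch of that file-watching setup; the reload, retry, and validation logic are only indicated in comments.

use notify::{recommended_watcher, RecursiveMode, Watcher};
use std::path::Path;

fn main() -> notify::Result<()> {
    // The callback fires on every file-system event for the watched path.
    let mut watcher = recommended_watcher(|res: notify::Result<notify::Event>| match res {
        // The real Configurator retries the read here (writes may still be
        // in progress), then validates and hot-swaps the parsed config.
        Ok(event) => println!("config changed: {event:?}"),
        Err(e) => eprintln!("watch error: {e}"),
    })?;
    watcher.watch(Path::new("config/deeptrace.toml"), RecursiveMode::NonRecursive)?;
    std::thread::park(); // keep the process alive; the real agent runs its modules
    Ok(())
}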

Synchronizer

  • Purpose: Agent state synchronization and API management
  • Features: Rocket-based HTTP API server for configuration updates
  • API Endpoints: /api/config/update for dynamic configuration
  • Configuration:
    • address: API server bind address
    • port: API server port
    • workers: Number of worker threads
    • ident: Server identification string
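
For illustration, here is what the update endpoint could look like with the Rocket 0.5 API; only the route path comes from the agent's documented API, while the handler body and response shape are assumptions.

#[macro_use] extern crate rocket;
use rocket::serde::json::{Json, Value};

// Route path from the agent's documented API; handler body is illustrative.
#[post("/api/config/update", format = "json", data = "<body>")]
fn update(body: Json<Value>) -> &'static str {
    // The real handler would validate the payload and swap the shared config.
    println!("received config update: {}", body.0);
    "ok"
}

#[launch]
fn rocket() -> _ {
    rocket::build().mount("/", routes![update])
}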

MetricCollector

  • Purpose: System and application metrics collection
  • Configuration:
    • interval: Collection interval in seconds
    • sender: Target sender for metrics (references sender configuration)
  • Output: Sends metrics to configured sender (typically file-based)
  • Metrics: CPU usage, memory usage, span counts, system statistics
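
A sketch of interval-driven collection follows; the shared span counter is hypothetical, and the real collector would also sample CPU and memory and hand each record to the configured sender.

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    // Hypothetical counter the processing pipeline would increment per span.
    let span_count = Arc::new(AtomicU64::new(0));
    span_count.fetch_add(1, Ordering::Relaxed);

    let counter = Arc::clone(&span_count);
    let collector = thread::spawn(move || {
        for _ in 0..3 {
            thread::sleep(Duration::from_secs(10)); // [metric] interval = 10
            // The real collector also samples CPU and memory here and hands
            // a record to the sender referenced by the `sender` key.
            println!("spans processed: {}", counter.load(Ordering::Relaxed));
        }
    });
    collector.join().unwrap();
}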

Data Flow

1. Event Capture

Application → System Call → eBPF Hook → TraceModule → Message Channel

2. Span Construction

Message Channel → SpanConstructor → Span Building → Span Channel

3. Data Output

Span Channel → SenderProcess → ElasticSender → Elasticsearch
                            → FlatFile → Local Files

4. Configuration Management

Config File → Configurator → Dynamic Reload → Component Updates

Configuration Structure

The agent uses a TOML-based configuration system with the following structure:

Core Configuration Sections

Agent Configuration

[agent]
name = "deeptrace"  # Agent identifier

eBPF Configuration

[ebpf.trace]
log_level = 1  # 0=off, 1=debug, 3=verbose, 4=stats
pids = [523094]  # Target process IDs
max_buffered_events = 128
enabled_probes = [
    "sys_enter_read", "sys_exit_read",
    "sys_enter_write", "sys_exit_write",
    # ... additional system call hooks
]

Trace Configuration

[trace]
ebpf = "trace"  # References ebpf configuration
sender = "trace"  # References sender configuration

[trace.span]
cleanup_interval = 30  # Span cleanup interval (seconds)
max_sockets = 1024     # Maximum tracked sockets

Sender Configuration

# Elasticsearch sender
[sender.elastic.trace]
node_url = "http://localhost:9200"
username = "elastic"
password = "***"
request_timeout = 10
index_name = "agent1"
bulk_size = 64

# File sender
[sender.file.metric]
path = "metrics.csv"
rotate = true
max_size = 512  # MB
max_age = 6     # days
rotate_time = 11  # days
data_format = "%Y%m%d"

Metrics Configuration

[metric]
interval = 10    # Collection interval (seconds)
sender = "metric" # References sender configuration

Security Considerations

Privilege Requirements

  • CAP_BPF: Required for eBPF program loading (kernel 5.8+)
  • CAP_SYS_ADMIN: Required for older kernels
  • Root Access: Alternative to capabilities (not recommended)

Data Protection

  • Payload Filtering: Configurable content-type exclusions
  • Sensitive Data Masking: Automatic detection and redaction
  • Encryption in Transit: TLS support for server communication
  • Local Storage: Optional encryption for disk buffers

Attack Surface

  • eBPF Verifier: Kernel-level safety guarantees
  • User Space: Standard application security practices
  • Network Communication: Standard HTTPS security
  • Configuration: File system permissions and validation

Deployment and Usage

Command Line Usage

# Basic usage with default configuration
cargo run --release

# Specify custom configuration file
cargo run --release -- -c /path/to/config.toml

# With sudo privileges (required for eBPF)
sudo cargo run --release -- -c config/deeptrace.toml

Configuration File Location

  • Default Path: config/deeptrace.toml
  • Custom Path: Specified via -c command line argument
  • Example Configuration: config/deeptrace.toml.example

Runtime Requirements

  • Privileges: Root or CAP_BPF capability for eBPF program loading
  • Kernel Version: Linux kernel with eBPF support
  • Dependencies: Rust toolchain (for building), libbpf, Elasticsearch (for data storage)

Process Management

  • Startup: Agent initializes all modules sequentially
  • Shutdown: Graceful shutdown on SIGINT (Ctrl+C)
  • State Management: Shutdown state is tracked atomically for a clean exit
  • Error Handling: Comprehensive error handling with logging
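
The sketch below shows one way to wire SIGINT to an atomic shutdown flag, matching the graceful-shutdown behavior described above; the ctrlc crate is an assumption standing in for the agent's actual signal handling.

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::Duration;

fn main() {
    let running = Arc::new(AtomicBool::new(true));
    let flag = Arc::clone(&running);
    // Install a SIGINT (Ctrl+C) handler that flips the shared flag.
    ctrlc::set_handler(move || flag.store(false, Ordering::SeqCst))
        .expect("failed to install SIGINT handler");

    while running.load(Ordering::SeqCst) {
        // Main work loop: poll channels, flush senders, etc.
        std::thread::sleep(Duration::from_millis(100));
    }
    // Graceful path: drain channels, flush senders, detach eBPF programs.
    println!("shutting down cleanly");
}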

API Endpoints

The agent provides a REST API for configuration management:

Configuration Update

POST /api/config/update
Content-Type: application/json

{
  "agent": {
    "name": "deeptrace",
    "workers": 4
  },
  "sender": {
    "elastic": {
      "node_url": "http://localhost:9200",
      "username": "elastic",
      "password": "password",
      "index_name": "spans",
      "bulk_size": 64
    }
  },
  "trace": {
    "pids": [1234, 5678]
  }
}

API Configuration

# API server settings (part of synchronizer)
address = "0.0.0.0"  # Bind address
port = 8080          # API port
workers = 1          # Worker threads
ident = "deeptrace"  # Server identification

Module Architecture

The agent follows a modular architecture with the following key modules:

Core Modules

  1. TraceModule/TraceCollector: eBPF program management and data collection
  2. SpanConstructor: Raw event processing and span construction
  3. SenderProcess: Data output management with pluggable backends
  4. MetricCollector: System metrics collection and reporting
  5. Configurator: Dynamic configuration management
  6. Synchronizer: API server and state synchronization

Module Lifecycle

  • Initialization: Sequential module startup with dependency management
  • Runtime: Asynchronous operation with channel-based communication
  • Shutdown: Graceful shutdown with proper resource cleanup
  • Error Handling: Per-module error handling with system-wide error propagation

Inter-Module Communication

  • Channels: Crossbeam channels for high-performance message passing
  • Configuration: Shared configuration via Arc<ArcSwap>
  • State Management: Atomic state shared across modules for coordination
  • Error Propagation: Structured error handling across module boundaries
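
To make the Arc<ArcSwap> pattern concrete, here is a minimal sketch using the arc_swap crate; the Config type is a placeholder for the agent's real configuration.

use arc_swap::ArcSwap;
use std::sync::Arc;

// Placeholder standing in for the agent's real configuration type.
#[derive(Debug)]
struct Config { bulk_size: usize }

fn main() {
    // Shared, hot-swappable configuration: readers take lock-free snapshots.
    let shared: Arc<ArcSwap<Config>> =
        Arc::new(ArcSwap::from_pointee(Config { bulk_size: 64 }));

    // Any module reads the current snapshot without blocking.
    println!("bulk_size = {}", shared.load().bulk_size);

    // On reload, the Configurator swaps in a new config atomically; existing
    // readers keep their old snapshot until they call load() again.
    shared.store(Arc::new(Config { bulk_size: 128 }));
    println!("bulk_size = {}", shared.load().bulk_size);
}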