
IO Stress/Latency Fault Workflow

This topic describes the flow of control when you execute an IO stress or latency chaos experiment in Harness Chaos Engineering.

The diagram below describes the flow of control for an IO stress or latency experiment.

Figure: IO stress/latency fault flow

IO stress consumes the disk resources of the target application by injecting a high load on disk IO.

IO latency delays the target application's file operations by slowing down its read and write calls.

Step 1: Fetch Target Container Info

The chaos helper pod retrieves the pod specification and identifies the containerID of the target application pod.
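As a minimal sketch of this lookup, the snippet below pulls the runtime name and container ID out of a pod object. It assumes the pod has already been fetched and decoded into a Python dict, and that the `containerID` field has the usual Kubernetes `<runtime>://<id>` form. The function name `container_id` is hypothetical and is not taken from the helper pod's code.

```python
def container_id(pod: dict, container_name: str) -> tuple[str, str]:
    """Return (runtime, container_id) for the named container of a pod.

    `pod` is a decoded Pod object; this helper is illustrative only.
    """
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        if status.get("name") != container_name:
            continue
        # containerID looks like "containerd://<id>" or "docker://<id>".
        runtime, sep, cid = status.get("containerID", "").partition("://")
        if not sep or not cid:
            raise ValueError(f"container {container_name!r} has no usable containerID")
        return runtime, cid
    raise LookupError(f"container {container_name!r} not found in pod status")
```

Returning the runtime name alongside the ID is a design choice for the sketch: later steps need to know which runtime to query.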

Step 2: Inspect Container Metadata

The helper pod inspects the container runtime to obtain metadata, including the cgroup details of the target container. This requires elevated permissions, such as root (sudo) access, and a host path mount for the container runtime socket.

Step 3: Derive PID of the Target App Container

The helper pod extracts the process ID (PID) of the main process running inside the application container.
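One way to picture this step is parsing the JSON that a runtime inspection returns. The sketch below assumes the inspection output is already available as a string, and that it has the shape produced by `docker inspect` (a list whose first element has `State.Pid`) or by `crictl inspect` against containerd (an object with `info.pid`). Both shapes are assumptions about the environment, other runtimes may lay out the data differently, and `pid_from_inspect` is a hypothetical helper name.

```python
import json


def pid_from_inspect(inspect_json: str, runtime: str) -> int:
    """Extract the main process PID from container inspect output.

    Assumed shapes (not confirmed by the source document):
      docker     -> [{"State": {"Pid": <int>}}]
      containerd -> {"info": {"pid": <int>}}
    """
    data = json.loads(inspect_json)
    if runtime == "docker":
        pid = data[0]["State"]["Pid"]
    else:
        pid = data["info"]["pid"]
    if not isinstance(pid, int) or pid <= 0:
        # A PID of 0 typically means the container is not running.
        raise ValueError(f"no running process found (pid={pid!r})")
    return pid
```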

Step 4: Prepare Stress Process

IO Stress: The PID derived earlier is used to inject a stress process into the target application. The stress process is loaded into memory but kept in a paused state.

Latency: The helper pod joins the PID namespace (pid_ns) and mount namespace (mnt_ns) of the target container.
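The "loaded but paused" state can be illustrated with plain POSIX signals. The sketch below starts a child process and stops it with `SIGSTOP`, then polls `/proc/<pid>/stat` until the kernel reports the stopped state (`T`). It is an assumption that the helper pod uses this mechanism; the function names are hypothetical, and the child may run briefly before the signal arrives. It requires Linux.

```python
import os
import signal
import subprocess
import time


def process_state(pid: int) -> str:
    """Return the one-letter process state from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The state field comes right after the parenthesised command name.
    return stat.rsplit(")", 1)[1].split()[0]


def start_paused(argv: list[str], timeout_s: float = 2.0) -> subprocess.Popen:
    """Start `argv` and stop it with SIGSTOP so it can be resumed later."""
    proc = subprocess.Popen(argv)
    os.kill(proc.pid, signal.SIGSTOP)
    # Poll until the kernel reports the stopped state, or give up.
    deadline = time.monotonic() + timeout_s
    while process_state(proc.pid) != "T" and time.monotonic() < deadline:
        time.sleep(0.01)
    return proc
```

A stopped process keeps its memory and file descriptors, so it can be moved or inspected before any load is generated.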

Step 5: Transfer IO Stress / Inject Stress Process

IO Stress: Transfer the IO stress process into the target container cgroup.
  • Using Linux namespaces (pid_ns, mnt_ns), the stress process is mapped into the target container’s namespace.
  • This ensures that the stress process runs inside the application container.

Latency: Inject latency faults using system-level tools.
  • FUSE (Filesystem in Userspace) is leveraged to add delays in file system operations.
  • ptrace (Process Tracing) is used to attach to and detach from processes, simulating high-latency operations.
  • Files are mounted, backed up, and restored with delays to introduce artificial latency.
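For the IO stress path, the transfer can be sketched with two standard Linux mechanisms: writing a PID to a cgroup's `cgroup.procs` file, and running a command inside another process's PID and mount namespaces with `nsenter`. Whether the helper pod uses exactly these mechanisms is an assumption. The helper names, the cgroup v2 layout, and the `cgroup_root` parameter (included so the function can be tried against a scratch directory) are all hypothetical.

```python
from pathlib import Path


def move_to_cgroup(pid: int, cgroup_path: str,
                   cgroup_root: str = "/sys/fs/cgroup") -> Path:
    """Move `pid` into a cgroup by writing it to that cgroup's cgroup.procs.

    Assumes a cgroup v2 hierarchy mounted at `cgroup_root`; writing to the
    real hierarchy needs root privileges.
    """
    procs_file = Path(cgroup_root) / cgroup_path.lstrip("/") / "cgroup.procs"
    procs_file.write_text(f"{pid}\n")
    return procs_file


def nsenter_argv(target_pid: int, command: list[str]) -> list[str]:
    """Build an argv that runs `command` in the PID and mount namespaces
    of `target_pid` using the util-linux `nsenter` tool."""
    return ["nsenter", "--target", str(target_pid),
            "--pid", "--mount", "--", *command]
```

The argv is only constructed here, not executed, because entering another process's namespaces requires elevated privileges.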

Step 6: Resume Stress Process / Apply Network-Level Constraints

IO Stress: Resume the IO stress process.
  • The stressor starts intensive disk read/write operations, increasing IO utilization.
  • This affects the target application’s performance by making disk access slow or unresponsive.

Latency: Apply network-level constraints. If latency is injected into network-based file operations, additional constraints may be applied using:
  • net_admin capabilities.
  • sys_admin privileges for file system modifications.

For IO stress chaos, once the chaos duration is complete, the helper pod stops the stressor process and cleans up resources.

For IO latency chaos, once the chaos duration is complete, the helper pod removes the latency injection rules and restores normal file operations.
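For the IO stress path, the resume-then-stop lifecycle described above can be sketched as follows. The sketch assumes the stressor was previously stopped with `SIGSTOP` and is held as a `subprocess.Popen` handle. The function name `resume_for_duration` and the use of `SIGCONT` and `SIGTERM` are illustrative assumptions, not the helper pod's confirmed implementation.

```python
import os
import signal
import subprocess
import time


def resume_for_duration(proc: subprocess.Popen, duration_s: float) -> int:
    """Resume a stopped stressor, let it run for `duration_s`, then stop it.

    Returns the stressor's exit code (negative if killed by a signal).
    """
    os.kill(proc.pid, signal.SIGCONT)  # wake the paused stressor
    try:
        time.sleep(duration_s)         # chaos duration
    finally:
        proc.terminate()               # ask the stressor to exit (SIGTERM)
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()                # force cleanup if it ignores SIGTERM
            proc.wait()
    return proc.returncode
```

Putting the termination in a `finally` block means the stressor is cleaned up even if the wait is interrupted, which mirrors the cleanup guarantee the workflow describes.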