Understand the TCP Stream
Today’s research moved from reading source code to actively interacting with the llama.cpp RPC server. I focused on building the communication backbone required to trigger the identified vulnerability.
1. Reverse Engineering the Server Handshake
By analyzing the server’s message-handling loop, I mapped out the exact sequence of bytes expected by the RPC backend. The server doesn’t just wait for data; it expects a strict “protocol-legal” handshake before it even considers processing complex commands.
2. Scripting the Raw Socket Interaction
I developed a custom Python script using the socket library to handle the raw TCP stream. Unlike high-level APIs, interacting directly with the socket allows for:
- Byte-level Precision: Crucial for satisfying the #pragma pack(push, 1) requirement of the RPC structs
- Timing Control: Managing how the server receives chunks of metadata to ensure the deserialize_tensor function is triggered under the right conditions
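To illustrate the byte-level point, here is a minimal sketch of packing a header without padding using Python's struct module. The header layout (a 1-byte command followed by an 8-byte little-endian length) is an assumption for illustration only; the real field layout has to be taken from the RPC structs in the server source. The name pack_header is hypothetical.

```python
import struct

# ASSUMED layout, for illustration only: 1-byte command id followed by an
# 8-byte little-endian payload length. Check the server source for the
# real struct definitions.
PACKED_HEADER = struct.Struct("<BQ")  # "<": little-endian, standard sizes, no padding
NATIVE_HEADER = struct.Struct("@BQ")  # "@": native alignment, may insert padding


def pack_header(cmd: int, payload_len: int) -> bytes:
    """Serialize the header the way a pack(1) C struct is laid out in memory."""
    return PACKED_HEADER.pack(cmd, payload_len)


header = pack_header(1, 32)
print("packed size:", len(header), "bytes:", header.hex())
print("native size:", NATIVE_HEADER.size)  # platform-dependent, often larger
```

The "<" prefix is what matters here: with native alignment, Python may insert padding after the first byte, and every following field would then land at the wrong offset on the C++ side.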
3. Logic Flow Analysis
I’ve traced the server’s logic from the moment a packet arrives at the socket until it hits the vulnerable rpc_server layer:
- Socket Listen: Server accepts the connection
- Command Dispatch: The first few bytes define the rpc_cmd
- Metadata Ingestion: The server reads the remaining bytes directly into an rpc_tensor buffer
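The flow above can be modeled in a few lines of Python to make the framing explicit. This is a simplified sketch, not the server's actual code: the header layout (1-byte command id, 8-byte little-endian length) and the helper names read_exact and read_frame are assumptions made for illustration.

```python
import io
import struct
from typing import BinaryIO, Tuple

# ASSUMED header: 1-byte command id + 8-byte little-endian payload length.
HEADER = struct.Struct("<BQ")


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail; a short read means a truncated frame."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_frame(stream: BinaryIO) -> Tuple[int, bytes]:
    """Command dispatch step: parse the header, then ingest the payload."""
    cmd, length = HEADER.unpack(read_exact(stream, HEADER.size))
    payload = read_exact(stream, length)
    return cmd, payload


# Demo on an in-memory stream with made-up values.
wire = HEADER.pack(7, 4) + b"\xde\xad\xbe\xef"
cmd, payload = read_frame(io.BytesIO(wire))
print(cmd, payload.hex())
```

Modeling the receive path this way makes it easier to reason about which bytes the server treats as command selection and which it treats as struct contents.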
The Exploit Backbone
The script currently implements the following milestones:
- TCP Connection Establishment: Successful handshake with the RPC port
- Packet Serialization: Correctly converting Python objects into the binary format expected by the C++ backend
- Basic Command Execution: Sending RPC_CMD_HELLO and receiving a valid version response
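A minimal sketch of the client-side plumbing behind these milestones, under stated assumptions: the framing (1-byte command, 8-byte little-endian length, length-prefixed reply), the command id 0x01, and the three-byte reply are placeholders, not the real RPC_CMD_HELLO value or version response. The demo talks to a local stub over socket.socketpair() rather than a real server, and the helper names are hypothetical.

```python
import socket
import struct

LENGTH = struct.Struct("<Q")  # ASSUMED: 8-byte little-endian length prefix


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Loop over recv() until n bytes arrive; TCP may deliver partial chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def send_command(sock: socket.socket, cmd: int, payload: bytes = b"") -> None:
    """Send one frame: command byte, payload length, payload."""
    sock.sendall(bytes([cmd]) + LENGTH.pack(len(payload)) + payload)


def recv_response(sock: socket.socket) -> bytes:
    """Read a length-prefixed reply."""
    (length,) = LENGTH.unpack(recv_exact(sock, LENGTH.size))
    return recv_exact(sock, length)


# Local demo: a connected socket pair stands in for client and server.
client, stub = socket.socketpair()
send_command(client, 0x01)                       # placeholder command id
request = recv_exact(stub, 1 + LENGTH.size)      # stub reads the empty-payload frame
stub.sendall(LENGTH.pack(3) + bytes([1, 2, 3]))  # made-up three-byte reply
reply = recv_response(client)
client.close()
stub.close()
print(request.hex(), reply.hex())
```

The recv_exact loop is the important part: a single recv() call is not guaranteed to return a whole message, so any fixed-size read has to be retried until the expected byte count is reached.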