embedding bpf objects
Too many open files
I had trouble compiling. The nix builder failed with “too many open files”.
find /proc -maxdepth 1 -type d -name '[0-9]*' \
-exec bash -c "ls {}/fd/ | wc -l | tr '\n' ' '" \; \
-printf "fds (PID = %P), command: " \
-exec bash -c "tr '\0' ' ' < {}/cmdline" \; \
-exec echo \; | sort -rn | head
showed systemd consumed more than 700 file descriptors on startup.
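The pipeline above just counts the entries of each /proc/PID/fd directory. As a minimal sketch of the same idea for a single directory, the helper below (count_fds is a name made up for this illustration) does the counting in C.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Count the entries of a /proc/<pid>/fd style directory.
 * Returns -1 if the directory cannot be opened (e.g. missing permission). */
static int count_fds(const char *fd_dir) {
    assert(fd_dir != NULL);
    DIR *dir = opendir(fd_dir);
    if (dir == NULL)
        return -1;
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip the "." and ".." pseudo entries. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_fds("/proc/1/fd") as root should report roughly the number that the find pipeline prints for PID 1. When a process inspects its own /proc/self/fd, the descriptor opened by opendir is counted as well.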
I had a look at sudo lsof -p 1, which told me there were hundreds of bpf-map and bpf-prog entries.
sudo bpftool prog list showed that the same bpf program (tag 7dc8126e8768ea37) was loaded over and over.
5: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 4
6: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 3
9: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 8
10: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 7
11: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 10
12: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 9
15: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 14
...
48: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 2
49: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 1
50: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 6
51: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 5
52: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 46
53: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 45
54: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 48
55: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 47
56: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 50
57: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 49
58: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 52
59: cgroup_skb tag 7dc8126e8768ea37 gpl
loaded_at 2022-03-17T17:22:00+0800 uid 0
xlated 312B jited 192B memlock 4096B map_ids 51
...
In fact, this program was attached to every cgroup under the sun. sudo bpftool cgroup tree returned
CgroupPath
ID AttachType AttachFlags Name
/sys/fs/cgroup
49 ingress multi
48 egress multi
/sys/fs/cgroup/sys-fs-fuse-connections.mount
55 ingress multi
54 egress multi
/sys/fs/cgroup/sys-kernel-config.mount
57 ingress multi
56 egress multi
/sys/fs/cgroup/sys-kernel-debug.mount
24 ingress multi
23 egress multi
/sys/fs/cgroup/dev-mqueue.mount
20 ingress multi
19 egress multi
/sys/fs/cgroup/user.slice
190 ingress multi
189 egress multi
...
Moreover,
find /nix/store/24ljibki63lxk0m11qnw8fh9smh64g3x-systemd-249.7 -name '*bpf*'
returned nothing. So, where did systemd’s bpf object files go?
Hunting for systemd’s lost bpf object files
Hints from systemd
The only reference to cgroup_skb in systemd is restrict-ifaces.bpf.c, whose caller restrict-ifaces.c never explicitly loads any bpf programs.
However, sd_restrictif_i, the bpf entry point that restricts cgroup ingress access, is referenced in the line ingress_link = sym_bpf_program__attach_cgroup(obj->progs.sd_restrictif_i, cgroup_fd);.
The definition of sym_bpf_program__attach_cgroup is nowhere to be found, but restrict-ifaces.c includes a file that does not exist in the source tree, bpf/restrict_ifaces/restrict-ifaces-skel.h.
With some further digging, I found out that it is generated by the script tools/build-bpf-skel.py.
Under the hood, it uses the code generation support of bpftool and libbpf.
The generated code can be used directly in normal C programs. It also exposes functions to load the bpf programs.
The code generation was added in the commit Add code-generated BPF object skeleton support.
Before we dive into how libbpf and bpftool generate skeleton code, how they embed the bpf programs into the ELF binary, and how the bpf programs are loaded on demand, let’s inspect a simpler program: uprobe from libbpf-bootstrap. It probes userspace function calls and returns, and it also embeds its bpf programs with libbpf and bpftool.
Where are the bpf programs located in the memory?
One possibility is that bpf programs, like dynamic libraries, are mmapped into uprobe’s memory space.
If this is the case, we need to find out the memory region of the bpf programs and which file they are mapped from.
Let’s use bpftrace to trace the instructions passed to bpf(2).
We run sudo bpftrace bpf_prog_load.bt, where bpf_prog_load.bt has the following contents.
// The struct fields are copied from
// https://github.com/torvalds/linux/blob/ed4643521e6af8ab8ed1e467630a85884d2696cf/include/uapi/linux/bpf.h#L1314-L1349
// __aligned_u64 is changed to __u64.
struct BpfProgAttr { /* anonymous struct used by BPF_PROG_LOAD command */
__u32 prog_type; /* one of enum bpf_prog_type */
__u32 insn_cnt;
__u64 insns;
__u64 license;
__u32 log_level; /* verbosity level of verifier */
__u32 log_size; /* size of user buffer */
__u64 log_buf; /* user supplied buffer */
__u32 kern_version; /* not used */
__u32 prog_flags;
char prog_name[16u];
__u32 prog_ifindex; /* ifindex of netdev */
};
// bpf_prog_load's signature is
// static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
kprobe:bpf_prog_load {
printf("bpf_prog_load\n");
$ptr = (struct BpfProgAttr *) arg0;
printf("pid: %d\n", pid);
printf("comm: %s\n", comm);
printf("attr address: %p\n", $ptr);
printf("instruction size: %d\n", $ptr->insn_cnt);
printf("instruction address: %p\n", $ptr->insns);
printf("prog name: %s\n", $ptr->prog_name);
printf("prog type: %d\n", $ptr->prog_type);
printf("prog index: %d\n", $ptr->prog_ifindex);
printf("\n")
}
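// This probe does not seem to fire for loads issued from userspace.
// bpf_sys_bpf is not the entry point of the bpf(2) syscall but a helper
// for BPF_PROG_TYPE_SYSCALL programs, which is probably why. Its signature is
// BPF_CALL_3(bpf_sys_bpf, int, cmd, void *, attr, u32, attr_size)
kprobe:bpf_sys_bpf {
// 5 is BPF_PROG_LOAD in enum bpf_cmd
if (arg0 == 5) {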
printf("bpf_sys_bpf\n");
$ptr = (struct BpfProgAttr *) arg1;
printf("pid: %d\n", pid);
printf("comm: %s\n", comm);
printf("attr address: %p\n", $ptr);
printf("instruction size: %d\n", $ptr->insn_cnt);
printf("instruction address: %p\n", $ptr->insns);
printf("prog name: %s\n", $ptr->prog_name);
printf("prog type: %d\n", $ptr->prog_type);
printf("prog index: %d\n", $ptr->prog_ifindex);
printf("\n")
}
}
After starting a new uprobe process, the following results are printed on the screen.
bpf_prog_load
pid: 2509841
comm: uprobe
attr address: 0xffffb98bc0ec3e68
instruction size: 2
instruction address: 0x7fff753d7bc0
prog name: test
prog type: 1
prog index: 0
bpf_prog_load
pid: 2509841
comm: uprobe
attr address: 0xffffb98bc0ec3e68
instruction size: 8
instruction address: 0x1bcac40
prog name: uprobe
prog type: 2
prog index: 0
bpf_prog_load
pid: 2509841
comm: uprobe
attr address: 0xffffb98bc0ec3e68
instruction size: 2
instruction address: 0x7fff753d7b90
prog name:
prog type: 1
prog index: 0
bpf_prog_load
pid: 2509841
comm: uprobe
attr address: 0xffffb98bc0ec3e68
instruction size: 7
instruction address: 0x1bcaf20
prog name: uretprobe
prog type: 2
prog index: 0
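where 0x1bcac40 and 0x1bcaf20 are the userspace addresses of the instructions of the two programs, uprobe and uretprobe, that are loaded into the bpf vm. The other two loads, two instructions each at what look like stack addresses, appear to be feature probes issued by libbpf.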
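The numbers in this trace can be decoded by hand. The sketch below is only an illustration: prog_type_name and insn_bytes are made-up helper names, and the values are copied from enum bpf_prog_type in include/uapi/linux/bpf.h, with only a few entries listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A few values of enum bpf_prog_type from include/uapi/linux/bpf.h. */
static const char *prog_type_name(unsigned int prog_type) {
    switch (prog_type) {
    case 0: return "BPF_PROG_TYPE_UNSPEC";
    case 1: return "BPF_PROG_TYPE_SOCKET_FILTER";
    case 2: return "BPF_PROG_TYPE_KPROBE";
    case 8: return "BPF_PROG_TYPE_CGROUP_SKB";
    default: return "unknown (not listed in this sketch)";
    }
}

/* Every eBPF instruction (struct bpf_insn) is 8 bytes wide, so the
 * buffer that attr->insns points to holds insn_cnt * 8 bytes. */
static size_t insn_bytes(unsigned int insn_cnt) {
    return (size_t)insn_cnt * 8u;
}
```

So "prog type: 2" means kprobe, which is also the type used for uprobes, and the 8 and 7 instructions of uprobe and uretprobe occupy 64 and 56 bytes. These match the xlated 64B and xlated 56B that bpftool prog list reports for the two programs further down, although in general the verifier may rewrite a program so that the translated size differs.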
Let’s check out where those programs came from.
We attach gdb to the running process with sudo gdb -p $(pgrep '^uprobe$').
We then run info proc mappings to view the memory layout of uprobe.
0x1bc9000 0x1bea000 0x21000 0x0 [heap]
Much to my disappointment, these two bpf programs are in the heap, not in some mmapped file.
Indeed, carefully inspecting sudo cat /proc/$(pgrep '^uprobe$')/maps shows that there are no extra mapped files which could contain the bpf programs.
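Checking which mapping an address belongs to is a simple range test. Here is a minimal sketch that does it for one line of /proc/PID/maps; addr_in_maps_line is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if addr falls inside the range described by one line of
 * /proc/<pid>/maps ("start-end perms offset dev inode path"),
 * 0 if it does not, and -1 if the line cannot be parsed. */
static int addr_in_maps_line(const char *line, unsigned long addr) {
    unsigned long start, end;
    assert(line != NULL);
    if (sscanf(line, "%lx-%lx", &start, &end) != 2)
        return -1;
    /* The end address of a mapping is exclusive. */
    return addr >= start && addr < end;
}
```

With a heap line reconstructed from the gdb output above, for example "01bc9000-01bea000 rw-p 00000000 00:00 0 [heap]" (the permission and device fields are assumed here), both 0x1bcac40 and 0x1bcaf20 fall inside the range, while the address 0x7fff753d7bc0 does not.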
The embedded bpf object
Following libbpf-bootstrap’s instructions, we build uprobe. As a side effect, this generates a file uprobe.skel.h, which contains the following snippet.
static inline const void *uprobe_bpf__elf_bytes(size_t *sz)
{
*sz = 3304;
return (const void *)"\
\x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\xe8\x08\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x10\0\
\x0f\0\x79\x14\x68\0\0\0\0\0\x79\x13\x70\0\0\0\0\0\x18\x01\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\xb7\x02\0\0\x1e\0\0\0\x85\0\0\0\x06\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\
\0\0\0\0\x79\x13\x50\0\0\0\0\0\x18\x01\0\0\x20\0\0\0\0\0\0\0\0\0\0\0\xb7\x02\0\
\0\x1a\0\0\0\x85\0\0\0\x06\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\x44\x75\
\x61\x6c\x20\x42\x53\x44\x2f\x47\x50\x4c\0\0\0\0\0\0\0\0\0\0\0\0\x55\x50\x52\
\x4f\x42\x45\x20\x45\x4e\x54\x52\x59\x3a\x20\x61\x20\x3d\x20\x25\x64\x2c\x20\
\x62\x20\x3d\x20\x25\x64\x0a\0\0\0\x55\x50\x52\x4f\x42\x45\x20\x45\x58\x49\x54\
\x3a\x20\x72\x65\x74\x75\x72\x6e\x20\x3d\x20\x25\x64\x0a\0\x9f\xeb\x01\0\x18\0\
\0\0\0\0\0\0\x54\x02\0\0\x54\x02\0\0\xec\x01\0\0\0\0\0\0\0\0\0\x02\x02\0\0\0\
\x01\0\0\0\x15\0\0\x04\xa8\0\0\0\x09\0\0\0\x03\0\0\0\0\0\0\0\x0d\0\0\0\x03\0\0\
\0\x40\0\0\0\x11\0\0\0\x03\0\0\0\x80\0\0\0\x15\0\0\0\x03\0\0\0\xc0\0\0\0\x19\0\
\0\0\x03\0\0\0\0\x01\0\0\x1d\0\0\0\x03\0\0\0\x40\x01\0\0\x21\0\0\0\x03\0\0\0\
\x80\x01\0\0\x25\0\0\0\x03\0\0\0\xc0\x01\0\0\x29\0\0\0\x03\0\0\0\0\x02\0\0\x2c\
\0\0\0\x03\0\0\0\x40\x02\0\0\x2f\0\0\0\x03\0\0\0\x80\x02\0\0\x33\0\0\0\x03\0\0\
\0\xc0\x02\0\0\x37\0\0\0\x03\0\0\0\0\x03\0\0\x3b\0\0\0\x03\0\0\0\x40\x03\0\0\
\x3f\0\0\0\x03\0\0\0\x80\x03\0\0\x43\0\0\0\x03\0\0\0\xc0\x03\0\0\x4c\0\0\0\x03\
\0\0\0\0\x04\0\0\x50\0\0\0\x03\0\0\0\x40\x04\0\0\x53\0\0\0\x03\0\0\0\x80\x04\0\
\0\x5a\0\0\0\x03\0\0\0\xc0\x04\0\0\x5e\0\0\0\x03\0\0\0\0\x05\0\0\x61\0\0\0\0\0\
\0\x01\x08\0\0\0\x40\0\0\0\0\0\0\0\x01\0\0\x0d\x05\0\0\0\x73\0\0\0\x01\0\0\0\
\x77\0\0\0\0\0\0\x01\x04\0\0\0\x20\0\0\x01\x7b\0\0\0\x01\0\0\x0c\x04\0\0\0\0\0\
\0\0\x01\0\0\x0d\x05\0\0\0\x73\0\0\0\x01\0\0\0\x23\x01\0\0\x01\0\0\x0c\x07\0\0\
\0\x92\x01\0\0\0\0\0\x01\x01\0\0\0\x08\0\0\x01\0\0\0\0\0\0\0\x03\0\0\0\0\x09\0\
\0\0\x0b\0\0\0\x0d\0\0\0\x97\x01\0\0\0\0\0\x01\x04\0\0\0\x20\0\0\0\xab\x01\0\0\
\0\0\0\x0e\x0a\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\x0a\x09\0\0\0\0\0\0\0\0\0\0\x03\0\
\0\0\0\x0d\0\0\0\x0b\0\0\0\x1e\0\0\0\xb3\x01\0\0\0\0\0\x0e\x0e\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\x03\0\0\0\0\x0d\0\0\0\x0b\0\0\0\x1a\0\0\0\xc6\x01\0\0\0\0\0\x0e\
\x10\0\0\0\0\0\0\0\xdc\x01\0\0\x02\0\0\x0f\0\0\0\0\x0f\0\0\0\0\0\0\0\x1e\0\0\0\
\x11\0\0\0\x20\0\0\0\x1a\0\0\0\xe4\x01\0\0\x01\0\0\x0f\0\0\0\0\x0c\0\0\0\0\0\0\
\0\x0d\0\0\0\0\x70\x74\x5f\x72\x65\x67\x73\0\x72\x31\x35\0\x72\x31\x34\0\x72\
\x31\x33\0\x72\x31\x32\0\x72\x62\x70\0\x72\x62\x78\0\x72\x31\x31\0\x72\x31\x30\
\0\x72\x39\0\x72\x38\0\x72\x61\x78\0\x72\x63\x78\0\x72\x64\x78\0\x72\x73\x69\0\
\x72\x64\x69\0\x6f\x72\x69\x67\x5f\x72\x61\x78\0\x72\x69\x70\0\x63\x73\0\x65\
\x66\x6c\x61\x67\x73\0\x72\x73\x70\0\x73\x73\0\x6c\x6f\x6e\x67\x20\x75\x6e\x73\
\x69\x67\x6e\x65\x64\x20\x69\x6e\x74\0\x63\x74\x78\0\x69\x6e\x74\0\x75\x70\x72\
\x6f\x62\x65\0\x75\x70\x72\x6f\x62\x65\x2f\x66\x75\x6e\x63\0\x2f\x68\x6f\x6d\
\x65\x2f\x65\x2f\x57\x6f\x72\x6b\x73\x70\x61\x63\x65\x2f\x6c\x69\x62\x62\x70\
\x66\x2d\x62\x6f\x6f\x74\x73\x74\x72\x61\x70\x2f\x65\x78\x61\x6d\x70\x6c\x65\
\x73\x2f\x63\x2f\x75\x70\x72\x6f\x62\x65\x2e\x62\x70\x66\x2e\x63\0\x69\x6e\x74\
\x20\x42\x50\x46\x5f\x4b\x50\x52\x4f\x42\x45\x28\x75\x70\x72\x6f\x62\x65\x2c\
\x20\x69\x6e\x74\x20\x61\x2c\x20\x69\x6e\x74\x20\x62\x29\0\x09\x62\x70\x66\x5f\
\x70\x72\x69\x6e\x74\x6b\x28\x22\x55\x50\x52\x4f\x42\x45\x20\x45\x4e\x54\x52\
\x59\x3a\x20\x61\x20\x3d\x20\x25\x64\x2c\x20\x62\x20\x3d\x20\x25\x64\x5c\x6e\
\x22\x2c\x20\x61\x2c\x20\x62\x29\x3b\0\x75\x72\x65\x74\x70\x72\x6f\x62\x65\0\
\x75\x72\x65\x74\x70\x72\x6f\x62\x65\x2f\x66\x75\x6e\x63\0\x69\x6e\x74\x20\x42\
\x50\x46\x5f\x4b\x52\x45\x54\x50\x52\x4f\x42\x45\x28\x75\x72\x65\x74\x70\x72\
\x6f\x62\x65\x2c\x20\x69\x6e\x74\x20\x72\x65\x74\x29\0\x09\x62\x70\x66\x5f\x70\
\x72\x69\x6e\x74\x6b\x28\x22\x55\x50\x52\x4f\x42\x45\x20\x45\x58\x49\x54\x3a\
\x20\x72\x65\x74\x75\x72\x6e\x20\x3d\x20\x25\x64\x5c\x6e\x22\x2c\x20\x72\x65\
\x74\x29\x3b\0\x63\x68\x61\x72\0\x5f\x5f\x41\x52\x52\x41\x59\x5f\x53\x49\x5a\
\x45\x5f\x54\x59\x50\x45\x5f\x5f\0\x4c\x49\x43\x45\x4e\x53\x45\0\x5f\x5f\x5f\
\x5f\x75\x70\x72\x6f\x62\x65\x2e\x5f\x5f\x5f\x5f\x66\x6d\x74\0\x5f\x5f\x5f\x5f\
\x75\x72\x65\x74\x70\x72\x6f\x62\x65\x2e\x5f\x5f\x5f\x5f\x66\x6d\x74\0\x2e\x72\
\x6f\x64\x61\x74\x61\0\x6c\x69\x63\x65\x6e\x73\x65\0\x9f\xeb\x01\0\x20\0\0\0\0\
\0\0\0\x24\0\0\0\x24\0\0\0\x74\0\0\0\x98\0\0\0\0\0\0\0\x08\0\0\0\x82\0\0\0\x01\
\0\0\0\0\0\0\0\x06\0\0\0\x2d\x01\0\0\x01\0\0\0\0\0\0\0\x08\0\0\0\x10\0\0\0\x82\
\0\0\0\x03\0\0\0\0\0\0\0\x8e\0\0\0\xc9\0\0\0\x05\x2c\0\0\x10\0\0\0\x8e\0\0\0\
\xee\0\0\0\x02\x34\0\0\x30\0\0\0\x8e\0\0\0\xc9\0\0\0\x05\x2c\0\0\x2d\x01\0\0\
\x03\0\0\0\0\0\0\0\x8e\0\0\0\x3c\x01\0\0\x05\x48\0\0\x08\0\0\0\x8e\0\0\0\x62\
\x01\0\0\x02\x50\0\0\x28\0\0\0\x8e\0\0\0\x3c\x01\0\0\x05\x48\0\0\0\0\0\0\0\0\
\x10\0\0\0\0\0\0\0\x01\x7a\x52\0\x08\x7c\x0b\x01\x0c\0\0\0\x18\0\0\0\x18\0\0\0\
\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\x1c\0\0\0\x34\0\0\0\0\0\0\0\0\0\0\0\
\x38\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\x03\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x14\0\0\0\x01\0\x05\0\
\0\0\0\0\0\0\0\0\x1e\0\0\0\0\0\0\0\0\0\0\0\x03\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\x27\0\0\0\x01\0\x05\0\x20\0\0\0\0\0\0\0\x1a\0\0\0\0\0\0\0\0\0\0\0\x03\
\0\x05\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x53\0\0\0\x12\0\x02\0\0\0\0\0\0\0\0\0\
\x40\0\0\0\0\0\0\0\x5a\0\0\0\x12\0\x03\0\0\0\0\0\0\0\0\0\x38\0\0\0\0\0\0\0\xa8\
\0\0\0\x11\0\x04\0\0\0\0\0\0\0\0\0\x0d\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x01\0\0\
\0\x05\0\0\0\x08\0\0\0\0\0\0\0\x01\0\0\0\x05\0\0\0\x40\x02\0\0\0\0\0\0\x03\0\0\
\0\x05\0\0\0\x4c\x02\0\0\0\0\0\0\x03\0\0\0\x05\0\0\0\x64\x02\0\0\0\0\0\0\x04\0\
\0\0\x08\0\0\0\x2c\0\0\0\0\0\0\0\x04\0\0\0\x01\0\0\0\x3c\0\0\0\0\0\0\0\x04\0\0\
\0\x03\0\0\0\x50\0\0\0\0\0\0\0\x04\0\0\0\x01\0\0\0\x60\0\0\0\0\0\0\0\x04\0\0\0\
\x01\0\0\0\x70\0\0\0\0\0\0\0\x04\0\0\0\x01\0\0\0\x88\0\0\0\0\0\0\0\x04\0\0\0\
\x03\0\0\0\x98\0\0\0\0\0\0\0\x04\0\0\0\x03\0\0\0\xa8\0\0\0\0\0\0\0\x04\0\0\0\
\x03\0\0\0\x1c\0\0\0\0\0\0\0\x02\0\0\0\x01\0\0\0\x38\0\0\0\0\0\0\0\x02\0\0\0\
\x03\0\0\0\0\x2e\x74\x65\x78\x74\0\x2e\x72\x65\x6c\x2e\x42\x54\x46\x2e\x65\x78\
\x74\0\x5f\x5f\x5f\x5f\x75\x70\x72\x6f\x62\x65\x2e\x5f\x5f\x5f\x5f\x66\x6d\x74\
\0\x5f\x5f\x5f\x5f\x75\x72\x65\x74\x70\x72\x6f\x62\x65\x2e\x5f\x5f\x5f\x5f\x66\
\x6d\x74\0\x6c\x69\x63\x65\x6e\x73\x65\0\x2e\x72\x65\x6c\x2e\x65\x68\x5f\x66\
\x72\x61\x6d\x65\0\x75\x70\x72\x6f\x62\x65\0\x75\x72\x65\x74\x70\x72\x6f\x62\
\x65\0\x2e\x72\x65\x6c\x75\x70\x72\x6f\x62\x65\x2f\x66\x75\x6e\x63\0\x2e\x72\
\x65\x6c\x75\x72\x65\x74\x70\x72\x6f\x62\x65\x2f\x66\x75\x6e\x63\0\x2e\x73\x74\
\x72\x74\x61\x62\0\x2e\x73\x79\x6d\x74\x61\x62\0\x2e\x72\x6f\x64\x61\x74\x61\0\
\x2e\x72\x65\x6c\x2e\x42\x54\x46\0\x4c\x49\x43\x45\x4e\x53\x45\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x04\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\x68\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x40\0\
\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\x78\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x80\0\0\0\0\0\0\0\x38\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x3d\0\0\0\x01\
\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xb8\0\0\0\0\0\0\0\x0d\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x97\0\0\0\x01\0\0\0\x02\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\xd0\0\0\0\0\0\0\0\x3a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xa3\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\x0a\x01\0\0\0\0\0\0\x58\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x62\
\x05\0\0\0\0\0\0\xb8\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\x49\0\0\0\x01\0\0\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x20\x06\0\0\0\0\0\
\0\x50\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x8f\0\0\
\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x70\x06\0\0\0\0\0\0\xd8\0\0\0\0\0\
\0\0\x0f\0\0\0\x06\0\0\0\x08\0\0\0\0\0\0\0\x18\0\0\0\0\0\0\0\x64\0\0\0\x09\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x48\x07\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x09\0\
\0\0\x02\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x74\0\0\0\x09\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\x58\x07\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x09\0\0\0\x03\0\
\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x9f\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\x68\x07\0\0\0\0\0\0\x30\0\0\0\0\0\0\0\x09\0\0\0\x06\0\0\0\x08\0\
\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x07\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\x98\x07\0\0\0\0\0\0\x80\0\0\0\0\0\0\0\x09\0\0\0\x07\0\0\0\x08\0\0\0\0\0\0\
\0\x10\0\0\0\0\0\0\0\x45\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x18\
\x08\0\0\0\0\0\0\x20\0\0\0\0\0\0\0\x09\0\0\0\x08\0\0\0\x08\0\0\0\0\0\0\0\x10\0\
\0\0\0\0\0\0\x87\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x38\x08\0\0\0\
\0\0\0\xb0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
}
Eureka! This is the bpf program. Look at the head of this array. \x7f\x45\x4c\x46 is the magic number for ELF files, while \xf7 is the machine number for bpf (EM_BPF, 247).
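The same check can be written down in a few lines of C. This is a sketch rather than a full ELF parser: is_bpf_elf and EM_BPF_VALUE are names chosen for this illustration, and only the magic number, the byte order flag and e_machine are looked at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EM_BPF_VALUE 247 /* 0xf7, called EM_BPF in <elf.h> */

/* Return 1 if buf starts with an ELF header whose e_machine is BPF. */
static int is_bpf_elf(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    if (len < 20)
        return 0;
    /* The literal is split so that 'E' is not parsed as a hex digit. */
    if (memcmp(buf, "\x7f" "ELF", 4) != 0)
        return 0;
    /* e_ident[EI_DATA] is byte 5: 1 = little endian, 2 = big endian.
     * e_machine is the 16-bit field at offset 18. */
    unsigned int machine;
    if (buf[5] == 1)
        machine = buf[18] | ((unsigned int)buf[19] << 8);
    else if (buf[5] == 2)
        machine = ((unsigned int)buf[18] << 8) | buf[19];
    else
        return 0;
    return machine == EM_BPF_VALUE;
}
```

Fed with the first 20 bytes of the array above, the function returns 1. For an ordinary x86-64 executable, whose e_machine is 0x3e, it returns 0.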
To better understand what this array contains, we dump it to a file with the following C program.
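#include "uprobe.skel.h"
#include <stdio.h>

/* Write the bpf object embedded in the skeleton to filename. Returns 0 on success. */
static int writeBpfObjectToFile(const char *filename) {
    size_t size = 0;
    const void *p = uprobe_bpf__elf_bytes(&size);
    FILE *fp = fopen(filename, "wb");
    if (fp == NULL) {
        perror(filename);
        return 1;
    }
    size_t written = fwrite(p, 1, size, fp);
    /* A short write or a failed close both mean the dump is incomplete. */
    if (fclose(fp) != 0 || written != size)
        return 1;
    return 0;
}

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s OUTPUT_FILE\n", argv[0]);
        return 1;
    }
    return writeBpfObjectToFile(argv[1]);
}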
I compile it with
clang -g -I.output -I../../libbpf/include/uapi -I../../vmlinux/x86/ -idirafter /nix/store/zhykg9kkhyb6mb47p1mw7pyz847ll5b4-libelf-0.8.13/include -idirafter /nix/store/1my9xr1s1nfjmqwyi46pzdrvny7hm66x-zlib-1.2.11-dev/include -idirafter /nix/store/0sk7aa616ihk43r8fmc770s5vr9nqwij-clang-wrapper-13.0.0/resource-root/include -idirafter /nix/store/vccvfa5bjb9dv4x6zq5gjf1yp58y4brg-glibc-2.33-108-dev/include -I /home/e/.nix-profile/include -I /run/current-system/sw/include -I /home/e/.nix-profile/include -I /run/current-system/sw/include -I /home/e/.nix-profile/include -I /run/current-system/sw/include -o uprobe_save uprobe_save.c
Your mileage may vary. I save the data to bpf_program.o with ./uprobe_save bpf_program.o. It is indeed a valid ELF file.
Moreover, we can load it with sudo bpftool prog load bpf_program.o /sys/fs/bpf/bpf_program.
A warning is printed while loading it.
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(14) .rel.eh_frame for section(8) .eh_frame
The warning looks harmless: libbpf only reports that it skips the .eh_frame section and its relocations, which it does not need in order to load the programs.
sudo bpftool prog list now lists a new program named uprobe (ID 372) with the same tag, 2a8c45c2f0e905b1, as the uprobe program that was already loaded (ID 330).
330: kprobe name uprobe tag 2a8c45c2f0e905b1 gpl
loaded_at 2022-03-24T14:13:16+0800 uid 0
xlated 64B jited 43B memlock 4096B map_ids 97
btf_id 464
332: kprobe name uretprobe tag 10e060f1f65ee396 gpl
loaded_at 2022-03-24T14:13:16+0800 uid 0
xlated 56B jited 39B memlock 4096B map_ids 97
btf_id 464
372: kprobe name uprobe tag 2a8c45c2f0e905b1 gpl
loaded_at 2022-03-24T17:03:12+0800 uid 0
xlated 64B jited 43B memlock 4096B map_ids 121
btf_id 497
We are now certain that the array returned from uprobe_bpf__elf_bytes is indeed the long-hunted bpf object.
Note that the program uretprobe is not loaded this time. This is expected: bpftool prog load pins only the first program of an object file, while bpftool prog loadall loads all of them and pins them under a directory.
Since the bpf object is hard-coded as a string literal that is returned as a const void *, we should be able to find it somewhere in the generated binary.
binwalk uprobe shows that there is an embedded ELF file at offset 172168:
172168 0x2A088 ELF, 64-bit LSB relocatable, version 1 (SYSV)
We can extract it to a separate file with binwalk --extract tmp ./uprobe.
file tmp/_uprobe.extracted/2A088.o shows it is an “ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), not stripped”. But tmp/_uprobe.extracted/2A088.o contains some extra bytes after the object. To obtain an identical object file, we can run dd if=uprobe of=bpf_program_extracted.o bs=1 skip=172168 count=3304, where 172168 is the start offset obtained from binwalk and 3304 is the size obtained from inspecting the function uprobe_bpf__elf_bytes in the skeleton code.
We can verify this by running sha512sum tmp/_uprobe.extracted/2A088.o bpf_program.o bpf_program_extracted.o.
853e4b8c5560ecd40d465792f4777c75c6c117a797b2a1e558ef83dbb36dccdaa62a192a96bc2bb03bd5501f0d5b0007609beb0ba177a076492b189c5bf80a03 tmp/_uprobe.extracted/2A088.o
79ee4b9a85cec9bda9351936cbae4f8a879f87a7b7afd7108ffae74fe95691fd1828d20380d71b0ae5b5cbb935548fa57ed4555143fa72b811f4ba70e92914eb bpf_program.o
79ee4b9a85cec9bda9351936cbae4f8a879f87a7b7afd7108ffae74fe95691fd1828d20380d71b0ae5b5cbb935548fa57ed4555143fa72b811f4ba70e92914eb bpf_program_extracted.o
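The count of 3304 passed to dd came from reading the skeleton. As a rough sketch, the same number can be derived from the ELF header alone, under the assumption that the section header table is the last thing in the object file. That holds for this object but is not guaranteed in general. The array uprobe_elf_header below is the first 64 bytes of the literal in uprobe.skel.h; the helper names read_le and elf64_le_size are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 64 bytes (the ELF64 header) of the array in uprobe.skel.h. */
static const unsigned char uprobe_elf_header[64] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0x01, 0, 0xf7, 0, 0x01, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0xe8, 0x08, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0x40, 0, 0, 0, 0, 0, 0x40, 0, 0x10, 0, 0x0f, 0,
};

/* Read an nbytes wide little-endian unsigned integer. */
static uint64_t read_le(const unsigned char *p, size_t nbytes) {
    uint64_t v = 0;
    for (size_t i = nbytes; i > 0; i--)
        v = (v << 8) | p[i - 1];
    return v;
}

/* Estimate the size of a little-endian ELF64 file as the end of its
 * section header table. Returns 0 if the header is not usable. */
static uint64_t elf64_le_size(const unsigned char *hdr, size_t len) {
    assert(hdr != NULL);
    if (len < 64 || memcmp(hdr, "\x7f" "ELF", 4) != 0)
        return 0;
    if (hdr[4] != 2 || hdr[5] != 1) /* ELFCLASS64, ELFDATA2LSB */
        return 0;
    uint64_t e_shoff = read_le(hdr + 40, 8);     /* section table offset */
    uint64_t e_shentsize = read_le(hdr + 58, 2); /* size of one entry */
    uint64_t e_shnum = read_le(hdr + 60, 2);     /* number of entries */
    return e_shoff + e_shentsize * e_shnum;
}
```

For the header above, e_shoff is 0x8e8 (2280), e_shentsize is 64 and e_shnum is 16, so the function returns 2280 + 16 * 64 = 3304, the size hard-coded in uprobe_bpf__elf_bytes.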
Now the conclusion is clear. bpftool gen skeleton generates skeleton code which contains a hard-coded bpf object. The compiler and linker store this string literal in the .rodata section of the final binary. From the point of view of an ordinary C function, the bpf object is just the target of an ordinary const void * pointer.
It is also clear that we have no reliable way to extract bpf objects from ELF binaries, as the embedding details depend on the implementation.
Different compilers and linkers may behave differently, e.g. the position within .rodata cannot be determined easily.
The best bet is to use binwalk. Fortunately, the extraneous bytes in the extracted file do not really matter. We can load the ELF anyway.
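To make the embedding pattern concrete, here is a toy stand-in for the generated accessor. demo__elf_bytes is hypothetical and carries only the first 8 bytes of an ELF header, but it has the same shape as uprobe_bpf__elf_bytes: a string literal plus a size that the generator has to hard-code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of the generated uprobe_bpf__elf_bytes().
 * The payload is a string literal, which compilers typically place
 * in a read-only data section of the final binary. */
static inline const void *demo__elf_bytes(size_t *sz) {
    assert(sz != NULL);
    /* The size must be hard-coded: the payload contains NUL bytes,
     * so strlen() would stop before the end of the data. */
    *sz = 8;
    return (const void *)"\x7f\x45\x4c\x46\x02\x01\x01\0";
}
```

Calling strlen on the returned pointer yields 7 here, one byte short of the real payload. The real object has a zero byte at the same position, which is why the skeleton reports the size through *sz instead.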
TODO How is the bpf object loaded and all the other things bpf programs need?
For now, see [PATCH v3 bpf-next 00/17] Add code-generated BPF object skeleton support.
Addendum
- The compilation failed because of the per-process open files limit.
- It was systemd’s IP accounting program that was attached to every cgroup.
- Both can be tuned in /etc/systemd/system.conf, see DefaultIPAccounting and DefaultLimitNOFILE in systemd-system.conf(5).
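A process can check the open files limit it actually runs under with getrlimit(2). The wrapper below is a small sketch; nofile_limits is a name made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/* Fetch the soft and hard RLIMIT_NOFILE of the calling process.
 * Returns 0 on success and -1 on failure. */
static int nofile_limits(rlim_t *soft, rlim_t *hard) {
    struct rlimit lim;
    assert(soft != NULL && hard != NULL);
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    *soft = lim.rlim_cur; /* the limit that triggers "too many open files" */
    *hard = lim.rlim_max; /* ceiling up to which the soft limit may be raised */
    return 0;
}
```

The soft limit is the one that matters for the error at the top of this post. An unprivileged process can raise it only up to the hard limit.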