
embedding bpf objects

Too many open files

I had trouble compiling. The nix builder failed with “too many open files”.

find /proc -maxdepth 1 -type d -name '[0-9]*' \
     -exec bash -c "ls {}/fd/ | wc -l | tr '\n' ' '" \; \
     -printf "fds (PID = %P), command: " \
     -exec bash -c "tr '\0' ' ' < {}/cmdline" \; \
     -exec echo \; | sort -rn | head

showed that systemd held more than 700 open file descriptors right after startup. I had a look at sudo lsof -p 1, which told me that hundreds of them were bpf-map and bpf-prog descriptors.

sudo bpftool prog list showed that the same bpf program, with tag 7dc8126e8768ea37, was loaded over and over.
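The per-process count that the find pipeline prints can also be obtained from C. This is only a minimal sketch: the function name count_open_fds is my own, and it assumes a Linux /proc filesystem and enough permissions to read the fd directory of the target process.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/* Count the entries in /proc/<pid>/fd, i.e. the open file descriptors of
 * a process. Returns -1 if the directory cannot be opened (no such
 * process, or insufficient permissions). */
static int count_open_fds(pid_t pid) {
  char path[64];
  snprintf(path, sizeof(path), "/proc/%ld/fd", (long)pid);

  DIR *dir = opendir(path);
  if (dir == NULL)
    return -1;

  int count = 0;
  struct dirent *entry;
  while ((entry = readdir(dir)) != NULL) {
    /* Skip "." and ".."; every other entry is a descriptor number. */
    if (entry->d_name[0] != '.')
      count++;
  }
  closedir(dir);
  return count;
}
```

Calling count_open_fds(1) as root should give roughly the number the pipeline reports for systemd. When a process counts its own descriptors, the result includes the descriptor that opendir itself holds.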

5: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 4
6: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 3
9: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 8
10: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 7
11: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 10
12: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 9
15: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 14
...
48: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 2
49: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 1
50: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 6
51: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 5
52: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 46
53: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 45
54: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 48
55: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 47
56: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 50
57: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 49
58: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 52
59: cgroup_skb  tag 7dc8126e8768ea37  gpl
        loaded_at 2022-03-17T17:22:00+0800  uid 0
        xlated 312B  jited 192B  memlock 4096B  map_ids 51
...

In fact, this program was attached to every cgroup under the sun. sudo bpftool cgroup tree returned

CgroupPath
ID       AttachType      AttachFlags     Name
/sys/fs/cgroup
49       ingress         multi
48       egress          multi
/sys/fs/cgroup/sys-fs-fuse-connections.mount
    55       ingress         multi
    54       egress          multi
/sys/fs/cgroup/sys-kernel-config.mount
    57       ingress         multi
    56       egress          multi
/sys/fs/cgroup/sys-kernel-debug.mount
    24       ingress         multi
    23       egress          multi
/sys/fs/cgroup/dev-mqueue.mount
    20       ingress         multi
    19       egress          multi
/sys/fs/cgroup/user.slice
    190      ingress         multi
    189      egress          multi
...

Moreover,

find /nix/store/24ljibki63lxk0m11qnw8fh9smh64g3x-systemd-249.7 -name '*bpf*'

returned nothing. So, where did systemd’s bpf object files go?

Hunting for systemd’s lost bpf object files

Hints from systemd

The only reference to cgroup_skb in systemd is restrict-ifaces.bpf.c, whose caller restrict-ifaces.c never explicitly loads any bpf programs.

However, the bpf entry point that restricts cgroup ingress access, sd_restrictif_i, is referenced in the line ingress_link = sym_bpf_program__attach_cgroup(obj->progs.sd_restrictif_i, cgroup_fd);. The definition of sym_bpf_program__attach_cgroup is nowhere to be found, but restrict-ifaces.c includes bpf/restrict_ifaces/restrict-ifaces-skel.h, a file that does not exist in the source tree. With some further digging, I found out that it is generated by the script tools/build-bpf-skel.py. Under the hood, the script uses the code generation support of bpftool and libbpf. The generated code can be used directly in ordinary C programs, and it also exposes functions to load bpf programs. This code generation was added in the commit Add code-generated BPF object skeleton support.

Before we dive into how libbpf and bpftool generate the skeleton code, how they embed bpf programs into the ELF binary, and how bpf programs are loaded on demand, let’s inspect a simpler program: uprobe from libbpf-bootstrap. It probes userspace function calls and returns, and it also embeds its bpf programs with libbpf and bpftool.

Where are the bpf programs located in the memory?

One possibility is that bpf programs, like dynamic libraries, are mmapped into uprobe’s memory space. If this is the case, we need to find the memory region that holds the bpf programs and the file it is mapped from.

Let’s use bpftrace to trace the instructions passed to bpf(2).

We run sudo bpftrace bpf_prog_load.bt where bpf_prog_load.bt has the following contents.

// The struct fields are copied from
// https://github.com/torvalds/linux/blob/ed4643521e6af8ab8ed1e467630a85884d2696cf/include/uapi/linux/bpf.h#L1314-L1349
// __aligned_u64 is changed to __u64.
struct BpfProgAttr { /* anonymous struct used by BPF_PROG_LOAD command */
  __u32   prog_type;  /* one of enum bpf_prog_type */
  __u32   insn_cnt;
  __u64   insns;
  __u64   license;
  __u32   log_level;  /* verbosity level of verifier */
  __u32   log_size; /* size of user buffer */
  __u64   log_buf;  /* user supplied buffer */
  __u32   kern_version; /* not used */
  __u32   prog_flags;
  char    prog_name[16u];
  __u32   prog_ifindex; /* ifindex of netdev */
};

// bpf_prog_load's signature is
// static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr)
kprobe:bpf_prog_load {
  printf("bpf_prog_load\n");
  $ptr = (struct BpfProgAttr *) arg0;
  printf("pid: %d\n", pid);
  printf("comm: %s\n", comm);
  printf("attr address: %p\n", $ptr);
  printf("instruction size: %d\n", $ptr->insn_cnt);
  printf("instruction address: %p\n", $ptr->insns);
  printf("prog name: %s\n", $ptr->prog_name);
  printf("prog type: %d\n", $ptr->prog_type);
  printf("prog index: %d\n", $ptr->prog_ifindex);
  printf("\n")
}

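// This bpftrace snippet does not seem to work. Most likely this is because
// bpf_sys_bpf is not the entry point of the bpf(2) syscall but a helper
// that only BPF_PROG_TYPE_SYSCALL programs call. Its signature is
// BPF_CALL_3(bpf_sys_bpf, int, cmd, void *, attr, u32, attr_size)
kprobe:bpf_sys_bpf {
  // 5 is BPF_PROG_LOAD in enum bpf_cmd (3 would be BPF_MAP_DELETE_ELEM)
  if (arg0 == 5) {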
    printf("bpf_sys_bpf\n");
    $ptr = (struct BpfProgAttr *) arg1;
    printf("pid: %d\n", pid);
    printf("comm: %s\n", comm);
    printf("attr address: %p\n", $ptr);
    printf("instruction size: %d\n", $ptr->insn_cnt);
    printf("instruction address: %p\n", $ptr->insns);
    printf("prog name: %s\n", $ptr->prog_name);
    printf("prog type: %d\n", $ptr->prog_type);
    printf("prog index: %d\n", $ptr->prog_ifindex);
    printf("\n")
  }
}

After starting a new uprobe process, the following results are printed on the screen:

bpf_prog_load
pid: 2509841
comm: uprobe
attr address: 0xffffb98bc0ec3e68
instruction size: 2
instruction address: 0x7fff753d7bc0
prog name: test
prog type: 1
prog index: 0

bpf_prog_load
pid: 2509841
comm: uprobe
attr address: 0xffffb98bc0ec3e68
instruction size: 8
instruction address: 0x1bcac40
prog name: uprobe
prog type: 2
prog index: 0

bpf_prog_load
pid: 2509841
comm: uprobe
attr address: 0xffffb98bc0ec3e68
instruction size: 2
instruction address: 0x7fff753d7b90
prog name:
prog type: 1
prog index: 0

bpf_prog_load
pid: 2509841
comm: uprobe
attr address: 0xffffb98bc0ec3e68
instruction size: 7
instruction address: 0x1bcaf20
prog name: uretprobe
prog type: 2
prog index: 0

where 0x1bcac40 and 0x1bcaf20 are the userspace addresses of the instructions that uprobe hands to the kernel for the programs uprobe and uretprobe. Let’s check where those instructions came from.

We attach gdb to the running process with sudo gdb -p $(pgrep '^uprobe$'). We then run info proc mappings to view the memory layout of uprobe.

0x1bc9000          0x1bea000    0x21000        0x0 [heap]

Much to my disappointment, these two bpf programs are in the heap, not in some mmapped file.

Actually, carefully inspecting sudo cat /proc/$(pgrep '^uprobe$')/maps will show that there are no extra mapped files which could include the bpf programs.
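The lookup done by eye above (which mapping contains a given address?) can be sketched in C. The function name mapping_contains is made up for this example, and the parsing assumes the usual maps line format of start-end perms offset dev inode [pathname]; it is a sketch, not a complete maps parser.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/maps and report whether addr lies in the
 * half-open range [start, end) of that mapping. On a match, the mapping's
 * name (a file path, "[heap]", or "" for an anonymous mapping) is copied
 * into name, which must have room for at least one byte.
 * Returns 1 on a match, 0 otherwise. */
static int mapping_contains(const char *line, unsigned long addr,
                            char *name, size_t name_len) {
  unsigned long start = 0, end = 0;
  int consumed = 0;

  /* Line format: start-end perms offset dev inode [pathname] */
  if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %n", &start, &end, &consumed) < 2)
    return 0;
  if (addr < start || addr >= end)
    return 0;

  /* consumed stays 0 if the line was too short to reach the pathname. */
  snprintf(name, name_len, "%s", consumed > 0 ? line + consumed : "");
  name[strcspn(name, "\n")] = '\0';
  return 1;
}
```

Fed each line of the maps file of the uprobe process, this should match both 0x1bcac40 and 0x1bcaf20 against the [heap] line and nothing else, which is the same answer gdb gave.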

The embedded bpf object

Following libbpf-bootstrap’s instructions, we build uprobe. As a side effect, the build generates a file uprobe.skel.h, which contains the following snippet.

static inline const void *uprobe_bpf__elf_bytes(size_t *sz)
{
        *sz = 3304;
        return (const void *)"\
\x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\xe8\x08\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x10\0\
\x0f\0\x79\x14\x68\0\0\0\0\0\x79\x13\x70\0\0\0\0\0\x18\x01\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\xb7\x02\0\0\x1e\0\0\0\x85\0\0\0\x06\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\
\0\0\0\0\x79\x13\x50\0\0\0\0\0\x18\x01\0\0\x20\0\0\0\0\0\0\0\0\0\0\0\xb7\x02\0\
\0\x1a\0\0\0\x85\0\0\0\x06\0\0\0\xb7\0\0\0\0\0\0\0\x95\0\0\0\0\0\0\0\x44\x75\
\x61\x6c\x20\x42\x53\x44\x2f\x47\x50\x4c\0\0\0\0\0\0\0\0\0\0\0\0\x55\x50\x52\
\x4f\x42\x45\x20\x45\x4e\x54\x52\x59\x3a\x20\x61\x20\x3d\x20\x25\x64\x2c\x20\
\x62\x20\x3d\x20\x25\x64\x0a\0\0\0\x55\x50\x52\x4f\x42\x45\x20\x45\x58\x49\x54\
\x3a\x20\x72\x65\x74\x75\x72\x6e\x20\x3d\x20\x25\x64\x0a\0\x9f\xeb\x01\0\x18\0\
\0\0\0\0\0\0\x54\x02\0\0\x54\x02\0\0\xec\x01\0\0\0\0\0\0\0\0\0\x02\x02\0\0\0\
\x01\0\0\0\x15\0\0\x04\xa8\0\0\0\x09\0\0\0\x03\0\0\0\0\0\0\0\x0d\0\0\0\x03\0\0\
\0\x40\0\0\0\x11\0\0\0\x03\0\0\0\x80\0\0\0\x15\0\0\0\x03\0\0\0\xc0\0\0\0\x19\0\
\0\0\x03\0\0\0\0\x01\0\0\x1d\0\0\0\x03\0\0\0\x40\x01\0\0\x21\0\0\0\x03\0\0\0\
\x80\x01\0\0\x25\0\0\0\x03\0\0\0\xc0\x01\0\0\x29\0\0\0\x03\0\0\0\0\x02\0\0\x2c\
\0\0\0\x03\0\0\0\x40\x02\0\0\x2f\0\0\0\x03\0\0\0\x80\x02\0\0\x33\0\0\0\x03\0\0\
\0\xc0\x02\0\0\x37\0\0\0\x03\0\0\0\0\x03\0\0\x3b\0\0\0\x03\0\0\0\x40\x03\0\0\
\x3f\0\0\0\x03\0\0\0\x80\x03\0\0\x43\0\0\0\x03\0\0\0\xc0\x03\0\0\x4c\0\0\0\x03\
\0\0\0\0\x04\0\0\x50\0\0\0\x03\0\0\0\x40\x04\0\0\x53\0\0\0\x03\0\0\0\x80\x04\0\
\0\x5a\0\0\0\x03\0\0\0\xc0\x04\0\0\x5e\0\0\0\x03\0\0\0\0\x05\0\0\x61\0\0\0\0\0\
\0\x01\x08\0\0\0\x40\0\0\0\0\0\0\0\x01\0\0\x0d\x05\0\0\0\x73\0\0\0\x01\0\0\0\
\x77\0\0\0\0\0\0\x01\x04\0\0\0\x20\0\0\x01\x7b\0\0\0\x01\0\0\x0c\x04\0\0\0\0\0\
\0\0\x01\0\0\x0d\x05\0\0\0\x73\0\0\0\x01\0\0\0\x23\x01\0\0\x01\0\0\x0c\x07\0\0\
\0\x92\x01\0\0\0\0\0\x01\x01\0\0\0\x08\0\0\x01\0\0\0\0\0\0\0\x03\0\0\0\0\x09\0\
\0\0\x0b\0\0\0\x0d\0\0\0\x97\x01\0\0\0\0\0\x01\x04\0\0\0\x20\0\0\0\xab\x01\0\0\
\0\0\0\x0e\x0a\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\x0a\x09\0\0\0\0\0\0\0\0\0\0\x03\0\
\0\0\0\x0d\0\0\0\x0b\0\0\0\x1e\0\0\0\xb3\x01\0\0\0\0\0\x0e\x0e\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\x03\0\0\0\0\x0d\0\0\0\x0b\0\0\0\x1a\0\0\0\xc6\x01\0\0\0\0\0\x0e\
\x10\0\0\0\0\0\0\0\xdc\x01\0\0\x02\0\0\x0f\0\0\0\0\x0f\0\0\0\0\0\0\0\x1e\0\0\0\
\x11\0\0\0\x20\0\0\0\x1a\0\0\0\xe4\x01\0\0\x01\0\0\x0f\0\0\0\0\x0c\0\0\0\0\0\0\
\0\x0d\0\0\0\0\x70\x74\x5f\x72\x65\x67\x73\0\x72\x31\x35\0\x72\x31\x34\0\x72\
\x31\x33\0\x72\x31\x32\0\x72\x62\x70\0\x72\x62\x78\0\x72\x31\x31\0\x72\x31\x30\
\0\x72\x39\0\x72\x38\0\x72\x61\x78\0\x72\x63\x78\0\x72\x64\x78\0\x72\x73\x69\0\
\x72\x64\x69\0\x6f\x72\x69\x67\x5f\x72\x61\x78\0\x72\x69\x70\0\x63\x73\0\x65\
\x66\x6c\x61\x67\x73\0\x72\x73\x70\0\x73\x73\0\x6c\x6f\x6e\x67\x20\x75\x6e\x73\
\x69\x67\x6e\x65\x64\x20\x69\x6e\x74\0\x63\x74\x78\0\x69\x6e\x74\0\x75\x70\x72\
\x6f\x62\x65\0\x75\x70\x72\x6f\x62\x65\x2f\x66\x75\x6e\x63\0\x2f\x68\x6f\x6d\
\x65\x2f\x65\x2f\x57\x6f\x72\x6b\x73\x70\x61\x63\x65\x2f\x6c\x69\x62\x62\x70\
\x66\x2d\x62\x6f\x6f\x74\x73\x74\x72\x61\x70\x2f\x65\x78\x61\x6d\x70\x6c\x65\
\x73\x2f\x63\x2f\x75\x70\x72\x6f\x62\x65\x2e\x62\x70\x66\x2e\x63\0\x69\x6e\x74\
\x20\x42\x50\x46\x5f\x4b\x50\x52\x4f\x42\x45\x28\x75\x70\x72\x6f\x62\x65\x2c\
\x20\x69\x6e\x74\x20\x61\x2c\x20\x69\x6e\x74\x20\x62\x29\0\x09\x62\x70\x66\x5f\
\x70\x72\x69\x6e\x74\x6b\x28\x22\x55\x50\x52\x4f\x42\x45\x20\x45\x4e\x54\x52\
\x59\x3a\x20\x61\x20\x3d\x20\x25\x64\x2c\x20\x62\x20\x3d\x20\x25\x64\x5c\x6e\
\x22\x2c\x20\x61\x2c\x20\x62\x29\x3b\0\x75\x72\x65\x74\x70\x72\x6f\x62\x65\0\
\x75\x72\x65\x74\x70\x72\x6f\x62\x65\x2f\x66\x75\x6e\x63\0\x69\x6e\x74\x20\x42\
\x50\x46\x5f\x4b\x52\x45\x54\x50\x52\x4f\x42\x45\x28\x75\x72\x65\x74\x70\x72\
\x6f\x62\x65\x2c\x20\x69\x6e\x74\x20\x72\x65\x74\x29\0\x09\x62\x70\x66\x5f\x70\
\x72\x69\x6e\x74\x6b\x28\x22\x55\x50\x52\x4f\x42\x45\x20\x45\x58\x49\x54\x3a\
\x20\x72\x65\x74\x75\x72\x6e\x20\x3d\x20\x25\x64\x5c\x6e\x22\x2c\x20\x72\x65\
\x74\x29\x3b\0\x63\x68\x61\x72\0\x5f\x5f\x41\x52\x52\x41\x59\x5f\x53\x49\x5a\
\x45\x5f\x54\x59\x50\x45\x5f\x5f\0\x4c\x49\x43\x45\x4e\x53\x45\0\x5f\x5f\x5f\
\x5f\x75\x70\x72\x6f\x62\x65\x2e\x5f\x5f\x5f\x5f\x66\x6d\x74\0\x5f\x5f\x5f\x5f\
\x75\x72\x65\x74\x70\x72\x6f\x62\x65\x2e\x5f\x5f\x5f\x5f\x66\x6d\x74\0\x2e\x72\
\x6f\x64\x61\x74\x61\0\x6c\x69\x63\x65\x6e\x73\x65\0\x9f\xeb\x01\0\x20\0\0\0\0\
\0\0\0\x24\0\0\0\x24\0\0\0\x74\0\0\0\x98\0\0\0\0\0\0\0\x08\0\0\0\x82\0\0\0\x01\
\0\0\0\0\0\0\0\x06\0\0\0\x2d\x01\0\0\x01\0\0\0\0\0\0\0\x08\0\0\0\x10\0\0\0\x82\
\0\0\0\x03\0\0\0\0\0\0\0\x8e\0\0\0\xc9\0\0\0\x05\x2c\0\0\x10\0\0\0\x8e\0\0\0\
\xee\0\0\0\x02\x34\0\0\x30\0\0\0\x8e\0\0\0\xc9\0\0\0\x05\x2c\0\0\x2d\x01\0\0\
\x03\0\0\0\0\0\0\0\x8e\0\0\0\x3c\x01\0\0\x05\x48\0\0\x08\0\0\0\x8e\0\0\0\x62\
\x01\0\0\x02\x50\0\0\x28\0\0\0\x8e\0\0\0\x3c\x01\0\0\x05\x48\0\0\0\0\0\0\0\0\
\x10\0\0\0\0\0\0\0\x01\x7a\x52\0\x08\x7c\x0b\x01\x0c\0\0\0\x18\0\0\0\x18\0\0\0\
\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\x1c\0\0\0\x34\0\0\0\0\0\0\0\0\0\0\0\
\x38\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\x03\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x14\0\0\0\x01\0\x05\0\
\0\0\0\0\0\0\0\0\x1e\0\0\0\0\0\0\0\0\0\0\0\x03\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\x27\0\0\0\x01\0\x05\0\x20\0\0\0\0\0\0\0\x1a\0\0\0\0\0\0\0\0\0\0\0\x03\
\0\x05\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x53\0\0\0\x12\0\x02\0\0\0\0\0\0\0\0\0\
\x40\0\0\0\0\0\0\0\x5a\0\0\0\x12\0\x03\0\0\0\0\0\0\0\0\0\x38\0\0\0\0\0\0\0\xa8\
\0\0\0\x11\0\x04\0\0\0\0\0\0\0\0\0\x0d\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x01\0\0\
\0\x05\0\0\0\x08\0\0\0\0\0\0\0\x01\0\0\0\x05\0\0\0\x40\x02\0\0\0\0\0\0\x03\0\0\
\0\x05\0\0\0\x4c\x02\0\0\0\0\0\0\x03\0\0\0\x05\0\0\0\x64\x02\0\0\0\0\0\0\x04\0\
\0\0\x08\0\0\0\x2c\0\0\0\0\0\0\0\x04\0\0\0\x01\0\0\0\x3c\0\0\0\0\0\0\0\x04\0\0\
\0\x03\0\0\0\x50\0\0\0\0\0\0\0\x04\0\0\0\x01\0\0\0\x60\0\0\0\0\0\0\0\x04\0\0\0\
\x01\0\0\0\x70\0\0\0\0\0\0\0\x04\0\0\0\x01\0\0\0\x88\0\0\0\0\0\0\0\x04\0\0\0\
\x03\0\0\0\x98\0\0\0\0\0\0\0\x04\0\0\0\x03\0\0\0\xa8\0\0\0\0\0\0\0\x04\0\0\0\
\x03\0\0\0\x1c\0\0\0\0\0\0\0\x02\0\0\0\x01\0\0\0\x38\0\0\0\0\0\0\0\x02\0\0\0\
\x03\0\0\0\0\x2e\x74\x65\x78\x74\0\x2e\x72\x65\x6c\x2e\x42\x54\x46\x2e\x65\x78\
\x74\0\x5f\x5f\x5f\x5f\x75\x70\x72\x6f\x62\x65\x2e\x5f\x5f\x5f\x5f\x66\x6d\x74\
\0\x5f\x5f\x5f\x5f\x75\x72\x65\x74\x70\x72\x6f\x62\x65\x2e\x5f\x5f\x5f\x5f\x66\
\x6d\x74\0\x6c\x69\x63\x65\x6e\x73\x65\0\x2e\x72\x65\x6c\x2e\x65\x68\x5f\x66\
\x72\x61\x6d\x65\0\x75\x70\x72\x6f\x62\x65\0\x75\x72\x65\x74\x70\x72\x6f\x62\
\x65\0\x2e\x72\x65\x6c\x75\x70\x72\x6f\x62\x65\x2f\x66\x75\x6e\x63\0\x2e\x72\
\x65\x6c\x75\x72\x65\x74\x70\x72\x6f\x62\x65\x2f\x66\x75\x6e\x63\0\x2e\x73\x74\
\x72\x74\x61\x62\0\x2e\x73\x79\x6d\x74\x61\x62\0\x2e\x72\x6f\x64\x61\x74\x61\0\
\x2e\x72\x65\x6c\x2e\x42\x54\x46\0\x4c\x49\x43\x45\x4e\x53\x45\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x04\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\x68\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x40\0\
\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\x78\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x80\0\0\0\0\0\0\0\x38\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x3d\0\0\0\x01\
\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xb8\0\0\0\0\0\0\0\x0d\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x97\0\0\0\x01\0\0\0\x02\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\xd0\0\0\0\0\0\0\0\x3a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xa3\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\x0a\x01\0\0\0\0\0\0\x58\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\x0b\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x62\
\x05\0\0\0\0\0\0\xb8\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\x49\0\0\0\x01\0\0\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x20\x06\0\0\0\0\0\
\0\x50\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x8f\0\0\
\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x70\x06\0\0\0\0\0\0\xd8\0\0\0\0\0\
\0\0\x0f\0\0\0\x06\0\0\0\x08\0\0\0\0\0\0\0\x18\0\0\0\0\0\0\0\x64\0\0\0\x09\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x48\x07\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x09\0\
\0\0\x02\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x74\0\0\0\x09\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\0\0\0\0\0\x58\x07\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x09\0\0\0\x03\0\
\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x9f\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\0\0\0\0\0\x68\x07\0\0\0\0\0\0\x30\0\0\0\0\0\0\0\x09\0\0\0\x06\0\0\0\x08\0\
\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x07\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
\0\0\x98\x07\0\0\0\0\0\0\x80\0\0\0\0\0\0\0\x09\0\0\0\x07\0\0\0\x08\0\0\0\0\0\0\
\0\x10\0\0\0\0\0\0\0\x45\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x18\
\x08\0\0\0\0\0\0\x20\0\0\0\0\0\0\0\x09\0\0\0\x08\0\0\0\x08\0\0\0\0\0\0\0\x10\0\
\0\0\0\0\0\0\x87\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x38\x08\0\0\0\
\0\0\0\xb0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0";
}

Eureka! This is the bpf object. Look at the head of the array: \x7f\x45\x4c\x46 is the magic number of ELF files, and \xf7 (247) in the e_machine field is the machine number for bpf (EM_BPF).

To better understand what this array contains, we dump it to a file with the following C program.
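The same check can be written down in a few lines of C. This is a sketch under two assumptions: the object is little-endian (as the \x01 in the sixth byte of the array says), and the names looks_like_bpf_object and EM_BPF_VALUE are my own; the constant mirrors EM_BPF from elf.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EM_BPF_VALUE 247 /* 0xf7, the value of EM_BPF in <elf.h> */

/* Returns 1 if buf starts with a little-endian ELF header whose e_machine
 * field is EM_BPF, 0 otherwise. */
static int looks_like_bpf_object(const unsigned char *buf, size_t len) {
  static const unsigned char magic[4] = {0x7f, 'E', 'L', 'F'};

  if (len < 20 || memcmp(buf, magic, sizeof(magic)) != 0)
    return 0;
  /* e_ident[5] is EI_DATA; 1 means little-endian (ELFDATA2LSB). */
  if (buf[5] != 1)
    return 0;
  /* e_machine is a 16-bit field at byte offset 18, after the 16-byte
   * e_ident array and the 16-bit e_type field. */
  uint16_t machine = (uint16_t)(buf[18] | (buf[19] << 8));
  return machine == EM_BPF_VALUE;
}
```

Passing the pointer and size returned by uprobe_bpf__elf_bytes to this function should return 1, since bytes 18 and 19 of the array are \xf7 and \0.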

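#include "uprobe.skel.h"
#include <stdio.h>

/* Write the embedded bpf object to filename. Returns 0 on success. */
static int writeBpfObjectToFile(const char *filename) {
  size_t size = 0;
  const void *p = uprobe_bpf__elf_bytes(&size);
  FILE *fp = fopen(filename, "wb");
  if (fp == NULL) {
    perror("fopen");
    return -1;
  }
  size_t written = fwrite(p, 1, size, fp);
  /* Check fclose as well, since buffered data is flushed there. */
  if (fclose(fp) != 0 || written != size) {
    fprintf(stderr, "failed to write %s\n", filename);
    return -1;
  }
  return 0;
}

int main(int argc, char **argv) {
  if (argc != 2) {
    fprintf(stderr, "usage: %s OUTPUT_FILE\n", argv[0]);
    return 1;
  }
  return writeBpfObjectToFile(argv[1]) == 0 ? 0 : 1;
}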

I compile it with

clang -g -I.output -I../../libbpf/include/uapi -I../../vmlinux/x86/ -idirafter /nix/store/zhykg9kkhyb6mb47p1mw7pyz847ll5b4-libelf-0.8.13/include -idirafter /nix/store/1my9xr1s1nfjmqwyi46pzdrvny7hm66x-zlib-1.2.11-dev/include -idirafter /nix/store/0sk7aa616ihk43r8fmc770s5vr9nqwij-clang-wrapper-13.0.0/resource-root/include -idirafter /nix/store/vccvfa5bjb9dv4x6zq5gjf1yp58y4brg-glibc-2.33-108-dev/include -I /home/e/.nix-profile/include -I /run/current-system/sw/include -I /home/e/.nix-profile/include -I /run/current-system/sw/include -I /home/e/.nix-profile/include -I /run/current-system/sw/include -o uprobe_save uprobe_save.c

Your mileage may vary. I save the data to bpf_program.o with ./uprobe_save bpf_program.o. It is indeed a valid ELF file. Moreover, we can load it with sudo bpftool prog load bpf_program.o /sys/fs/bpf/bpf_program. A warning is printed while loading it.

libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(14) .rel.eh_frame for section(8) .eh_frame

My guess is that this bpf object actually contains two bpf programs, and that the loader of bpftool may not be able to handle this properly.

sudo bpftool prog list now lists a new program uprobe (ID 372) with the same tag 2a8c45c2f0e905b1 as the uprobe program loaded earlier (ID 330).

330: kprobe  name uprobe  tag 2a8c45c2f0e905b1  gpl
        loaded_at 2022-03-24T14:13:16+0800  uid 0
        xlated 64B  jited 43B  memlock 4096B  map_ids 97
        btf_id 464
332: kprobe  name uretprobe  tag 10e060f1f65ee396  gpl
        loaded_at 2022-03-24T14:13:16+0800  uid 0
        xlated 56B  jited 39B  memlock 4096B  map_ids 97
        btf_id 464
372: kprobe  name uprobe  tag 2a8c45c2f0e905b1  gpl
        loaded_at 2022-03-24T17:03:12+0800  uid 0
        xlated 64B  jited 43B  memlock 4096B  map_ids 121
        btf_id 497

We are now certain that the array returned from uprobe_bpf__elf_bytes is indeed the long-hunted bpf object. Note that the program uretprobe is not loaded this time. This is most likely because bpftool prog load pins only the first program of an object file; bpftool prog loadall is the subcommand that handles all of them.

Since the bpf object is hard-coded in the skeleton as a string literal returned as const void *, we should be able to find it somewhere in the generated binary. binwalk uprobe shows that there is an embedded ELF file:
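One more cross-check ties the bpftrace output to the array. A bpf instruction occupies 8 bytes, so the 8 instructions reported for uprobe should take 64 bytes, which is the xlated 64B that bpftool prints. The sketch below copies the 64 bytes that directly follow the 64-byte ELF header in the array, which I take to be the uprobe program; the names insn_slot, slot_count and opcode_at are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of one 8-byte bpf instruction slot, mirroring struct bpf_insn
 * in <linux/bpf.h>. */
struct insn_slot {
  uint8_t code; /* opcode */
  uint8_t regs; /* dst_reg in the low 4 bits, src_reg in the high 4 bits */
  int16_t off;  /* signed offset */
  int32_t imm;  /* signed immediate */
};

/* Bytes 64..127 of the array returned by uprobe_bpf__elf_bytes. */
static const unsigned char uprobe_text[] = {
    0x79, 0x14, 0x68, 0, 0,    0, 0, 0, /* load a 64-bit field of the ctx */
    0x79, 0x13, 0x70, 0, 0,    0, 0, 0, /* load another 64-bit ctx field */
    0x18, 0x01, 0,    0, 0,    0, 0, 0, /* 64-bit immediate load: first slot */
    0,    0,    0,    0, 0,    0, 0, 0, /* ... and its second slot */
    0xb7, 0x02, 0,    0, 0x1e, 0, 0, 0, /* r2 = 30, the format string size */
    0x85, 0,    0,    0, 0x06, 0, 0, 0, /* call helper 6 (bpf_trace_printk) */
    0xb7, 0,    0,    0, 0,    0, 0, 0, /* r0 = 0 */
    0x95, 0,    0,    0, 0,    0, 0, 0, /* exit */
};

/* Number of instruction slots in nbytes of bpf code. */
static size_t slot_count(size_t nbytes) {
  return nbytes / sizeof(struct insn_slot);
}

/* Opcode byte of the given slot. */
static uint8_t opcode_at(const unsigned char *text, size_t slot) {
  return text[slot * sizeof(struct insn_slot)];
}
```

slot_count(sizeof(uprobe_text)) gives 8, matching the instruction count bpftrace printed for uprobe, and the last slot holds the exit opcode 0x95. The same arithmetic explains the 7 instructions and xlated 56B of uretprobe.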

172168        0x2A088         ELF, 64-bit LSB relocatable, version 1 (SYSV)

We can extract it to a separate file with binwalk --extract tmp ./uprobe. file tmp/_uprobe.extracted/2A088.o reports “ELF 64-bit LSB relocatable, eBPF, version 1 (SYSV), not stripped”. However, tmp/_uprobe.extracted/2A088.o contains some extra bytes. To obtain an identical object file, we can run dd if=uprobe of=bpf_program_extracted.o bs=1 skip=172168 count=3304, where 172168 is the start offset reported by binwalk and 3304 is the size found in the function uprobe_bpf__elf_bytes of the skeleton code. We can verify the result by running sha512sum tmp/_uprobe.extracted/2A088.o bpf_program.o bpf_program_extracted.o.

853e4b8c5560ecd40d465792f4777c75c6c117a797b2a1e558ef83dbb36dccdaa62a192a96bc2bb03bd5501f0d5b0007609beb0ba177a076492b189c5bf80a03  tmp/_uprobe.extracted/2A088.o
79ee4b9a85cec9bda9351936cbae4f8a879f87a7b7afd7108ffae74fe95691fd1828d20380d71b0ae5b5cbb935548fa57ed4555143fa72b811f4ba70e92914eb  bpf_program.o
79ee4b9a85cec9bda9351936cbae4f8a879f87a7b7afd7108ffae74fe95691fd1828d20380d71b0ae5b5cbb935548fa57ed4555143fa72b811f4ba70e92914eb  bpf_program_extracted.o
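binwalk finds the object by its ELF signature. When the exact bytes are known, as they are here through uprobe_bpf__elf_bytes, the offset can also be found by searching the binary for them directly. Below is a minimal sketch of such a search; find_blob is a name I made up, and it is a plain byte-by-byte scan standing in for glibc's memmem.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of needle in haystack, or -1
 * if it does not occur (or if needle is empty). */
static long find_blob(const unsigned char *haystack, size_t hay_len,
                      const unsigned char *needle, size_t needle_len) {
  if (needle_len == 0 || needle_len > hay_len)
    return -1;
  for (size_t i = 0; i + needle_len <= hay_len; i++) {
    if (memcmp(haystack + i, needle, needle_len) == 0)
      return (long)i;
  }
  return -1;
}
```

With the contents of the uprobe binary as the haystack and the 3304 bytes of the skeleton as the needle, the result should be the offset binwalk reported for my build, 172168. Another build may well place the object elsewhere.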

Now the conclusion is clear. bpftool gen skeleton generates skeleton code which contains a hard-coded bpf object. Because it is a constant string literal, the compiler and linker place it in the .rodata section of the final binary. From the point of view of an ordinary C function, this bpf object is just another pointer. It is also clear that we have no reliable way to extract bpf objects from ELF files, because the embedding details depend on the implementation. Different compilers and linkers may behave differently, e.g. the position within .rodata cannot be determined easily. The best bet is binwalk. Fortunately, the extraneous bytes in the extracted file do not really matter. We can load the ELF anyway.

TODO How is the bpf object loaded and all the other things bpf programs need?

For now, see [PATCH v3 bpf-next 00/17] Add code-generated BPF object skeleton support.

Addendum

  • The compilation failed because of the per-process limit on open files.
  • It was systemd’s IP accounting program that was attached to every cgroup.
  • Both options can be tuned in /etc/systemd/system.conf, see DefaultIPAccounting and DefaultLimitNOFILE in systemd-system.conf(5).