wiki

emacs server activation timeout

If you are also expirenecing the systemd emacs unit activating timeout, try

pkill -f -a emacs
rm -f /run/user/$UID/emacs/server

Maybe emacs is trying to connect to the server socket /run/user/$UID/emacs/server, but for some reason, the server socket is obsolete. In that case, emacs would hang up. To make things more complicated, manually running emacs --debug-init almost certainly works. It’s only when you are trying to start the daemon you will encounter problems (service launchers like systemd will think the emacs daemon service failed because no sub processes are spawn).

Symptoms

My emacs service unit is repeatedly activating.

Active: activating (start) since Thu 2022-05-05 21:03:22 CST; 47s ago

Sometime after the emacs process started, it would be killed by systemd. The strange thing is that it always works when I manually start emacs. I initially thought it is my configuration’s fault. elisp-bug-hunter didn’t help. I tried to pinpoint the failing code by bisection. It gradually became to me that my code may be is working correctly. I observed that both commenting out code above and below some point work. I also found that whenever I manually started emacs process is indeed unresponsive until I press a key (any key works).

Investigation

I decided to look into the details of emacs running status by gdb -p $(pgrep -f 'emacs.*debug-init').

(gdb) info threads
  Id   Target Id                                             Frame
    1    Thread 0x7fe398ace000 (LWP 1019020) ".emacs-29.0.50-" 0x00007fe39bdbabb7 in connect () from /nix/store/ayrsyv7npr0lcbann4k9lxr19x813f0z-glibc-2.34-115/lib/libc.so.6
    * 2    Thread 0x7fe397539640 (LWP 1019040) "gmain"           0x00007fe39bdadf89 in poll () from /nix/store/ayrsyv7npr0lcbann4k9lxr19x813f0z-glibc-2.34-115/lib/libc.so.6
    (gdb) thread 1
    [Switching to thread 1 (Thread 0x7fe398ace000 (LWP 1019020))]
#0  0x00007fe39bdbabb7 in connect () from /nix/store/ayrsyv7npr0lcbann4k9lxr19x813f0z-glibc-2.34-115/lib/libc.so.6
    (gdb) backtrace
#0  0x00007fe39bdbabb7 in connect () from /nix/store/ayrsyv7npr0lcbann4k9lxr19x813f0z-glibc-2.34-115/lib/libc.so.6
#1  0x000000000062f56c in connect_network_socket ()
#2  0x00000000006303cb in Fmake_network_process ()
#3  0x0000000000624dc1 in exec_byte_code ()
#4  0x00000000005ddd3b in apply_lambda ()
#5  0x00000000005de466 in eval_sub ()
#6  0x00000000005dea1d in Fif ()
#7  0x00000000005de552 in eval_sub ()
#8  0x0000000000611a8a in readevalloop ()
#9  0x0000000000612c91 in Feval_buffer ()
#10 0x0000000000624dc1 in exec_byte_code ()
#11 0x00000000005db2c7 in Ffuncall ()
#12 0x000000000061292b in Fload ()
#13 0x0000000000624dc1 in exec_byte_code ()
#14 0x00000000005ddd3b in apply_lambda ()
#15 0x00000000005de466 in eval_sub ()
#16 0x00000000005de552 in eval_sub ()
#17 0x00000000005de9dd in Fprogn ()
#18 0x00000000005dd90e in funcall_lambda ()
#19 0x00000000005ddd3b in apply_lambda ()
#20 0x00000000005de466 in eval_sub ()
#21 0x0000000000611a8a in readevalloop ()
#22 0x0000000000612c91 in Feval_buffer ()
#23 0x0000000000624dc1 in exec_byte_code ()
#24 0x00000000005db2c7 in Ffuncall ()
#25 0x000000000061292b in Fload ()
#26 0x0000000000624dc1 in exec_byte_code ()
#27 0x00000000005ddd3b in apply_lambda ()
#28 0x00000000005de466 in eval_sub ()
#29 0x00000000005e10aa in Feval ()
#30 0x00000000005d9c0f in internal_condition_case ()
#31 0x000000000054b92d in top_level_1 ()
#32 0x00000000005d9b73 in internal_catch ()
#33 0x000000000054adb3 in command_loop ()
#34 0x000000000054f96a in recursive_edit_1 ()
#35 0x000000000054fce7 in Frecursive_edit ()
#36 0x000000000042cea4 in main ()
    (gdb) thread 2
    [Switching to thread 2 (Thread 0x7fe397539640 (LWP 1019040))]
#0  0x00007fe39bdadf89 in poll () from /nix/store/ayrsyv7npr0lcbann4k9lxr19x813f0z-glibc-2.34-115/lib/libc.so.6
    (gdb) backtrace
#0  0x00007fe39bdadf89 in poll () from /nix/store/ayrsyv7npr0lcbann4k9lxr19x813f0z-glibc-2.34-115/lib/libc.so.6
#1  0x00007fe39d53801e in g_main_context_iterate.constprop () from /nix/store/abwcp8fsvrign29hrqrha5psz8kkc4rx-glib-2.72.0/lib/libglib-2.0.so.0
#2  0x00007fe39d53813f in g_main_context_iteration () from /nix/store/abwcp8fsvrign29hrqrha5psz8kkc4rx-glib-2.72.0/lib/libglib-2.0.so.0
#3  0x00007fe39d538191 in glib_worker_main () from /nix/store/abwcp8fsvrign29hrqrha5psz8kkc4rx-glib-2.72.0/lib/libglib-2.0.so.0
#4  0x00007fe39d5625ad in g_thread_proxy () from /nix/store/abwcp8fsvrign29hrqrha5psz8kkc4rx-glib-2.72.0/lib/libglib-2.0.so.0
#5  0x00007fe39bd37eb2 in start_thread () from /nix/store/ayrsyv7npr0lcbann4k9lxr19x813f0z-glibc-2.34-115/lib/libc.so.6
#6  0x00007fe39bdba31c in clone3 () from /nix/store/ayrsyv7npr0lcbann4k9lxr19x813f0z-glibc-2.34-115/lib/libc.so.6

There are only two threads running. One is blocked at polling, another is trying to connecting to something. Unfortunately I can’t figure out what emacs is trying to connect, lsof -p $(pgrep -a 'emacs.*debug-init') only shows

.emacs-29 1019020    e    0u      CHR             136,15      0t0      18 /dev/pts/15
.emacs-29 1019020    e    1u      CHR             136,15      0t0      18 /dev/pts/15
.emacs-29 1019020    e    2u      CHR             136,15      0t0      18 /dev/pts/15
.emacs-29 1019020    e    3u  a_inode               0,13        0    8493 [timerfd]
.emacs-29 1019020    e    4u  a_inode               0,13        0    8493 [eventfd:159]
.emacs-29 1019020    e    5u      CHR                5,0      0t0      11 /dev/tty
.emacs-29 1019020    e    6u     unix 0x000000004c491458      0t0 1745132 type=STREAM (UNCONNECTED)
.emacs-29 1019020    e   15u     unix 0x00000000f54422e9      0t0 1747036 type=STREAM (CONNECTED)
.emacs-29 1019020    e   16u  a_inode               0,13        0    8493 [eventfd:164]
.emacs-29 1019020    e   17u     unix 0x00000000938bed24      0t0 1747037 type=STREAM (CONNECTED)

I became to realize emacs maybe is trying to connect to the server socket, which for some reason, is un-connectable for now. I tried to verify my theory by inspect the argument emacs gives to the syscall connect.

(gdb) info registers
rax            0xfffffffffffffe00  -512
rbx            0x6                 6
rcx            0x7fe39bdbabb7      140615549168567
rdx            0x6e                110
rsi            0xc79e410           209314832
rdi            0x6                 6
rbp            0xc79e410           0xc79e410
rsp            0x7ffcf6ffed70      0x7ffcf6ffed70

man 2 connect shows the signature of connect

int connect(int sockfd, const struct sockaddr *addr,
            socklen_t addrlen);

while man 2 syscall shows the abi of linux syscall

Arch/ABI      arg1  arg2  arg3  arg4  arg5  arg6  arg7  Notes
──────────────────────────────────────────────────────────────
......
x86-64        rdi   rsi   rdx   r10   r8    r9    -
......

in particular, rdi value(6) is the socket fd, rsi (value 0xc79e410) is the sockaddr (of length $rdx) given to the kernel.

I tried to dump the content of sockaddr with

(gdb) x/110bx $rsi
0xc79e410:      0x01    0x00    0x2f    0x72    0x75    0x6e    0x2f    0x75
0xc79e418:      0x73    0x65    0x72    0x2f    0x31    0x30    0x30    0x30
0xc79e420:      0x2f    0x65    0x6d    0x61    0x63    0x73    0x2f    0x73
0xc79e428:      0x65    0x72    0x76    0x65    0x72    0x00    0x00    0x00
......
(gdb) x/s $rsi+2
0xc79e412:      "/run/user/1000/emacs/server"

Eureka. Emacs is indeed trying to connect to /run/user/1000/emacs/server. So who is listening to this?

lsof -E /run/user/$UID/emacs/server

lsof: WARNING: can't stat() zfs file system /var/lib/docker/zfs/graph/598f48f4aa007bc4092642e5997c8c86f8bd9d86df65a73e2ae0d38292c151c7
      Output information may be incomplete.
lsof: WARNING: can't stat() zfs file system /var/lib/docker/zfs/graph/69bad3e00b2eef638322f8cb1be1bc3f03c2f500f874e9719da680d64b64cb7d
      Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/ef24d71f2537
      Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/ef4c8c959af7
      Output information may be incomplete.
COMMAND      PID USER   FD   TYPE             DEVICE SIZE/OFF    NODE NAME
.emacs-28 708873    e    7u  unix 0x00000000518bee2e      0t0 1144673 /run/user/1000/emacs/server type=STREAM (LISTEN)

tr '\0' ' ' < /proc/708873/cmdline
/nix/store/zq0czk42sfa2ryd0d2g6cdd04ps05nln-emacs-28.1/bin/emacs -l cl-loaddefs -l nix-generated-autoload -q -l magit -f magit --eval (local-set-key "q" #'kill-emacs) -f delete-other-windows -nw

Oh, it’s some magit wrapper I wrote for ease of use of magit in the terminal.

emacs -q -l magit -f magit --eval "(local-set-key \"q\" #'kill-emacs)" -f delete-other-windows "${emacs_arguments[@]}"

All I have to do is pkill emacs; rm -f /run/user/$UID/emacs/server*; systemctl restart --user emacs, now everything works.

One more thing, why is emacs working correctly when I manually started it? It is because the other thread is polling fd 4

.emacs-29 1019020    e    4u  a_inode               0,13        0    8493 [eventfd:159]

When I pressed any key, this thread was woken up.