Darksun 发布的文章

使用 strace 查找 Emacs 启动阻塞的原因

Darksun 发布于 2019-09-29
另请参阅: 技术,strace, EMACS
1 条评论

之前就觉得我的 Emacs 启动好慢，查看启动日志会发现启动到一般的时候会有一个比较长时间的卡顿。之前一直没有理会它，今天花了点时间探索了一下，发现罪魁祸首居然是 exec-path-from-shell 这个包。

现将探索的过程记录如下：由于使用了 spacemacs 的配置，配置上比较复杂，不太想通过实验缩减配置的方式来摸索出问题的地方。刚好最近在学习使用 strace 工具，因此决定使用 strace 来看看 Emacs 到底卡在哪里。

strace emacs --fg-daemon

输出的内容特别多，这里只截取卡顿前的部分内容

readlinkat(AT_FDCWD, "/home", 0x7ffd1d3abb50, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972", 0x7ffd1d3abf00, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972/.emacs.d", 0x7ffd1d3ac2b0, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972/.emacs.d/elpa", 0x7ffd1d3ac660, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972/.emacs.d/elpa/exec-path-from-shell-20180323.1904", 0x7ffd1d3aca10, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972/.emacs.d/elpa/exec-path-from-shell-20180323.1904/exec-path-from-shell.elc", 0x7ffd1d3acdc0, 1024) = -1 EINVAL (无效的参数)
lseek(7, -2655, SEEK_CUR)               = 1441
read(7, "\n(defvar exec-path-from-shell-de"..., 4096) = 4096
lseek(7, 5537, SEEK_SET)                = 5537
lseek(7, 5537, SEEK_SET)                = 5537
lseek(7, 5537, SEEK_SET)                = 5537
lseek(7, 5537, SEEK_SET)                = 5537
lseek(7, 5537, SEEK_SET)                = 5537
lseek(7, 5537, SEEK_SET)                = 5537
brk(0x7507000)                          = 0x7507000
lseek(7, 5537, SEEK_SET)                = 5537
lseek(7, 5537, SEEK_SET)                = 5537
lseek(7, 5537, SEEK_SET)                = 5537
read(7, "230\\205\26\0\t\22\\307\\310\t!\vC\\\"\\211\24\\2"..., 4096) = 2430
lseek(7, 7967, SEEK_SET)                = 7967
lseek(7, 7967, SEEK_SET)                = 7967
lseek(7, 7967, SEEK_SET)                = 7967
lseek(7, 7967, SEEK_SET)                = 7967
read(7, "", 4096)                       = 0
close(7)                                = 0
getpid()                                = 10818
faccessat(AT_FDCWD, "/home/lujun9972/bin/printf", X_OK) = -1 ENOENT (没有那个文件或目录)
faccessat(AT_FDCWD, "/usr/local/sbin/printf", X_OK) = -1 ENOENT (没有那个文件或目录)
faccessat(AT_FDCWD, "/usr/local/bin/printf", X_OK) = -1 ENOENT (没有那个文件或目录)
faccessat(AT_FDCWD, "/usr/bin/printf", X_OK) = 0
stat("/usr/bin/printf", {st_mode=S_IFREG|0755, st_size=51176, ...}) = 0
openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 7
faccessat(AT_FDCWD, "/proc/5070/fd/.", F_OK) = 0
faccessat(AT_FDCWD, "/proc/5070/fd/.", F_OK) = 0
faccessat(AT_FDCWD, "/bin/bash", X_OK)  = 0
stat("/bin/bash", {st_mode=S_IFREG|0755, st_size=903440, ...}) = 0
pipe2([8, 9], O_CLOEXEC)                = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
vfork()                                 = 10949
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(9)                                = 0
close(7)                                = 0
read(8, "bash: \346\227\240\346\263\225\350\256\276\345\256\232\347\273\210\347\253\257\350\277\233\347\250\213\347\273"..., 16384) = 74
read(8, "bash: \346\255\244 shell \344\270\255\346\227\240\344\273\273\345\212\241\346\216\247\345"..., 16310) = 35
read(8, "setterm: \347\273\210\347\253\257 xterm-256color \344"..., 16275) = 51
read(8, "Couldn't get a file descriptor r"..., 16224) = 56
read(8, "bash: [: \357\274\232\351\234\200\350\246\201\346\225\264\346\225\260\350\241\250\350\276\276\345\274"..., 16168) = 34
read(8, "Your display number is 0\n", 16134) = 25
read(8, "Test whether fcitx is running co"..., 16109) = 53
read(8, "Fcitx is running correctly.\n\n==="..., 16056) = 87
read(8, "Launch fbterm...\n", 15969)    = 17
read(8, "stdin isn't a tty!\n", 15952)  = 19
read(8, "__RESULT\0/home/lujun9972/bin:/ho"..., 15933) = 298
read(8, 0x7ffd1d39ce9d, 15635)          = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=10949, si_uid=1000, si_status=0, si_utime=10, si_stime=7} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (被中断的系统调用)
read(8, "", 15635)                      = 0
wait4(10949, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 10949
close(8)                                = 0
getpid()                                = 10818
faccessat(AT_FDCWD, "/home/lujun9972/bin/printf", X_OK) = -1 ENOENT (没有那个文件或目录)
faccessat(AT_FDCWD, "/usr/local/sbin/printf", X_OK) = -1 ENOENT (没有那个文件或目录)
faccessat(AT_FDCWD, "/usr/local/bin/printf", X_OK) = -1 ENOENT (没有那个文件或目录)
faccessat(AT_FDCWD, "/usr/bin/printf", X_OK) = 0
stat("/usr/bin/printf", {st_mode=S_IFREG|0755, st_size=51176, ...}) = 0
openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 7
faccessat(AT_FDCWD, "/proc/5070/fd/.", F_OK) = 0
faccessat(AT_FDCWD, "/proc/5070/fd/.", F_OK) = 0
faccessat(AT_FDCWD, "/bin/bash", X_OK)  = 0
stat("/bin/bash", {st_mode=S_IFREG|0755, st_size=903440, ...}) = 0
pipe2([8, 9], O_CLOEXEC)                = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
vfork()                                 = 11679
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(9)                                = 0
close(7)                                = 0
read(8, "setterm: \347\273\210\347\253\257 xterm-256color \344"..., 16384) = 51
read(8, "Couldn't get a file descriptor r"..., 16333) = 56
read(8, "/home/lujun9972/.bash_profile: \347"..., 16277) = 72
read(8, "Your display number is 0\nTest wh"..., 16205) = 78
read(8, "Fcitx is running correctly.\n\n==="..., 16127) = 104
read(8, "stdin isn't a tty!\n", 16023)  = 19
read(8, "__RESULT\0b269cd09e7ec4e8a115188c"..., 16004) = 298
read(8, 0x7ffd1d39cba6, 15706)          = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11679, si_uid=1000, si_status=0, si_utime=1, si_stime=1} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (被中断的系统调用)
read(8,

很容易就可以看出，当 Emacs 卡顿时，它在尝试从 8 号文件句柄中读取内容。

那么 8 号文件句柄在哪里定义的呢？往前看可以看到：

pipe2([8, 9], O_CLOEXEC)                = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
vfork()                                 = 11679
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
close(9)                                = 0

可以推测出，Emacs 主进程 fork 出一个子进程（进程号为 11679），并通过管道读取子进程的内容。

然而，从

--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11679, si_uid=1000, si_status=0, si_utime=1, si_stime=1} ---
rt_sigreturn({mask=[]})                 = -1 EINTR (被中断的系统调用)
read(8,

可以看出，实际上子进程已经退出了（父进程收到 SIGCHLD 信号），父进程确依然在尝试从管道中读取内容，导致的阻塞。

而且从

read(8, "setterm: \347\273\210\347\253\257 xterm-256color \344"..., 16384) = 51
read(8, "Couldn't get a file descriptor r"..., 16333) = 56
read(8, "/home/lujun9972/.bash_profile: \347"..., 16277) = 72
read(8, "Your display number is 0\nTest wh"..., 16205) = 78
read(8, "Fcitx is running correctly.\n\n==="..., 16127) = 104
read(8, "stdin isn't a tty!\n", 16023)  = 19
read(8, "__RESULT\0b269cd09e7ec4e8a115188c"..., 16004) = 298
read(8, 0x7ffd1d39cba6, 15706)          = ? ERESTARTSYS (To be restarted if SA_RESTART is set)

看到，子进程的输出似乎是我的交互式登录 bash 启动时的输出（加载了 .bash_profile）

在往前翻发现这么一段信息：

readlinkat(AT_FDCWD, "/home", 0x7ffd1d3abb50, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972", 0x7ffd1d3abf00, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972/.emacs.d", 0x7ffd1d3ac2b0, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972/.emacs.d/elpa", 0x7ffd1d3ac660, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972/.emacs.d/elpa/exec-path-from-shell-20180323.1904", 0x7ffd1d3aca10, 1024) = -1 EINVAL (无效的参数)
readlinkat(AT_FDCWD, "/home/lujun9972/.emacs.d/elpa/exec-path-from-shell-20180323.1904/exec-path-from-shell.elc", 0x7ffd1d3acdc0, 1024) = -1 EINVAL (无效的参数)
lseek(7, -2655, SEEK_CUR)               = 1441
read(7, "\n(defvar exec-path-from-shell-de"..., 4096) = 4096

这很明显是跟 exec-path-from-shell 有关啊。

通过查看 exec-path-from-shell 的实现，发现 exec-path-from-shell 的实现原理是通过实际调启一个 shell，然后输出 PATH 和 MANPATH 的值的。而且对于 bash 来说，默认的启动参数为 -i -l（可以通过exec-path-from-shell-arguments来设置）。也就是说 bash 会作为交互式的登录shell来启动的，因此会加载 .bash_profile 和 .bashrc。

既然发现跟 exec-path-from-shell 这个包有关，而且据说这个包对 Linux 其实意义不大，那不如直接禁用掉好了。

dotspacemacs-excluded-packages '(exec-path-from-shell)

再次重启Emacs，发现这次启动速度明显快了许多了。

使用 Python 的 urllib.parse 库解析 URL

Darksun 发布于 2018-02-23
另请参阅: 软件开发,python, URL
评论

Python 中的 urllib.parse 模块提供了很多解析和组建 URL 的函数。

解析url

urlparse() 函数可以将 URL 解析成 ParseResult 对象。对象中包含了六个元素，分别为：

协议（scheme）
域名（netloc）
路径（path）
路径参数（params）
查询参数（query）
片段（fragment）

from urllib.parse import urlparse

url='http://user:pwd@domain:80/path;params?query=queryarg#fragment'

parsed_result=urlparse(url)

print('parsed_result 包含了',len(parsed_result),'个元素')
print(parsed_result)

结果为:

parsed_result 包含了 6 个元素
ParseResult(scheme='http', netloc='user:pwd@domain:80', path='/path', params='params', query='query=queryarg', fragment='fragment')

ParseResult 继承于 namedtuple，因此可以同时通过索引和命名属性来获取 URL 中各部分的值。

为了方便起见， ParseResult 还提供了 username、 password、 hostname、 port 对 netloc 进一步进行拆分。

print('scheme  :', parsed_result.scheme)
print('netloc  :', parsed_result.netloc)
print('path    :', parsed_result.path)
print('params  :', parsed_result.params)
print('query   :', parsed_result.query)
print('fragment:', parsed_result.fragment)
print('username:', parsed_result.username)
print('password:', parsed_result.password)
print('hostname:', parsed_result.hostname)
print('port    :', parsed_result.port)

结果为：

scheme  : http
netloc  : user:pwd@domain:80
path    : /path
params  : params
query   : query=queryarg
fragment: fragment
username: user
password: pwd
hostname: domain
port    : 80

除了 urlparse() 之外，还有一个类似的 urlsplit() 函数也能对 URL 进行拆分，所不同的是， urlsplit() 并不会把 路径参数(params) 从 路径(path) 中分离出来。

当 URL 中路径部分包含多个参数时，使用 urlparse() 解析是有问题的：

url='http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'

parsed_result=urlparse(url)

print(parsed_result)
print('parsed.path    :', parsed_result.path)
print('parsed.params  :', parsed_result.params)

结果为：

ParseResult(scheme='http', netloc='user:pwd@domain:80', path='/path1;params1/path2', params='params2', query='query=queryarg', fragment='fragment')
parsed.path    : /path1;params1/path2
parsed.params  : params2

这时可以使用 urlsplit() 来解析：

from urllib.parse import urlsplit
split_result=urlsplit(url)

print(split_result)
print('split.path    :', split_result.path)
# SplitResult 没有 params 属性

结果为：

SplitResult(scheme='http', netloc='user:pwd@domain:80', path='/path1;params1/path2;params2', query='query=queryarg', fragment='fragment')
split.path    : /path1;params1/path2;params2

若只是要将 URL 后的 fragment 标识拆分出来，可以使用 urldefrag() 函数：

from urllib.parse import urldefrag

url = 'http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'

d = urldefrag(url)
print(d)
print('url     :', d.url)
print('fragment:', d.fragment)

结果为：

DefragResult(url='http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg', fragment='fragment')
url     : http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg
fragment: fragment

组建URL

ParsedResult 对象和 SplitResult 对象都有一个 geturl() 方法，可以返回一个完整的 URL 字符串。

print(parsed_result.geturl())
print(split_result.geturl())

结果为：

http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment
http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment

但是 geturl() 只在 ParsedResult 和 SplitResult 对象中有，若想将一个普通的元组组成 URL，则需要使用 urlunparse() 函数：

from urllib.parse import urlunparse
url_compos = ('http', 'user:pwd@domain:80', '/path1;params1/path2', 'params2', 'query=queryarg', 'fragment')
print(urlunparse(url_compos))

结果为：

http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment

相对路径转换绝对路径

除此之外，urllib.parse 还提供了一个 urljoin() 函数，来将相对路径转换成绝对路径的 URL。

from urllib.parse import urljoin

print(urljoin('http://www.example.com/path/file.html', 'anotherfile.html'))
print(urljoin('http://www.example.com/path/', 'anotherfile.html'))
print(urljoin('http://www.example.com/path/file.html', '../anotherfile.html'))
print(urljoin('http://www.example.com/path/file.html', '/anotherfile.html'))

结果为：

http://www.example.com/path/anotherfile.html
http://www.example.com/path/anotherfile.html
http://www.example.com/anotherfile.html
http://www.example.com/anotherfile.html

查询参数的构造和解析

使用 urlencode() 函数可以将一个 dict 转换成合法的查询参数：

from urllib.parse import urlencode

query_args = {
    'name': 'dark sun',
    'country': '中国'
}

query_args = urlencode(query_args)
print(query_args)

结果为：

name=dark+sun&country=%E4%B8%AD%E5%9B%BD

可以看到特殊字符也被正确地转义了。

相对的，可以使用 parse_qs() 来将查询参数解析成 dict。

from urllib.parse import parse_qs
print(parse_qs(query_args))

结果为：

{'name': ['dark sun'], 'country': ['中国']}

如果只是希望对特殊字符进行转义，那么可以使用 quote 或 quote_plus 函数，其中 quote_plus 比 quote 更激进一些，会把 :、/ 一类的符号也给转义了。

from urllib.parse import quote, quote_plus, urlencode

url = 'http://localhost:1080/~hello!/'
print('urlencode :', urlencode({'url': url}))
print('quote     :', quote(url))
print('quote_plus:', quote_plus(url))

结果为：

urlencode : url=http%3A%2F%2Flocalhost%3A1080%2F%7Ehello%21%2F
quote     : http%3A//localhost%3A1080/%7Ehello%21/
quote_plus: http%3A%2F%2Flocalhost%3A1080%2F%7Ehello%21%2F

可以看到 urlencode 中应该是调用 quote_plus 来进行转义的。

逆向操作则使用 unquote 或 unquote_plus 函数：

from urllib.parse import unquote, unquote_plus

encoded_url = 'http%3A%2F%2Flocalhost%3A1080%2F%7Ehello%21%2F'
print(unquote(encoded_url))
print(unquote_plus(encoded_url))

结果为：

http://localhost:1080/~hello!/
http://localhost:1080/~hello!/

你会发现 unquote 函数居然能正确地将 quote_plus 的结果转换回来。