Go Performance Analysis - pprof

Collecting profile data

Enabling pprof

The first step in profiling is collecting data about how the application is running. Go provides two libraries for this: runtime/pprof and net/http/pprof.

Long-running services

If your application runs continuously, such as a web service, use the net/http/pprof package, which exposes the profiling data over HTTP.

If the HTTP service uses the default http.DefaultServeMux, a blank import of the pprof package is all that is needed:

```go
import _ "net/http/pprof"

go func() {
	http.ListenAndServe("0.0.0.0:8080", nil)
}()
```

If you serve your HTTP API on a mux other than http.DefaultServeMux, consult the pprof package's init function and register the helper handlers on your own mux (see the sketch below).

What happens behind import _ "net/http/pprof"?
The pprof package's init function performs a series of route registrations; specifically, it uses http.HandleFunc to register a set of helper handlers on http.DefaultServeMux:

```go
// package pprof
func init() {
	http.HandleFunc("/debug/pprof/", Index)
	http.HandleFunc("/debug/pprof/cmdline", Cmdline)
	http.HandleFunc("/debug/pprof/profile", Profile)
	http.HandleFunc("/debug/pprof/symbol", Symbol)
	http.HandleFunc("/debug/pprof/trace", Trace)
}
```

After that, visit http://localhost:8080/debug/pprof/ for an overview of the running service.
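For the non-default-mux case mentioned above, here is a minimal sketch that registers the same handlers manually; net/http/pprof exports Index, Cmdline, Profile, Symbol, and Trace precisely for this purpose:

```go
package main

import (
	"net/http"
	"net/http/pprof"
)

func main() {
	mux := http.NewServeMux()
	// Register the pprof helper handlers on our own mux instead of relying
	// on the package's init() side effect on http.DefaultServeMux.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	http.ListenAndServe("0.0.0.0:8080", mux)
}
```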

One-shot tools

If your application is a one-shot tool that runs for a while and then exits, the best approach is to write the profiling report to a file when the application quits and analyze it afterwards. In this case use runtime/pprof:

```go
// CPU profiling
f, err := os.Create(*cpuprofile)
...
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()

// memory profiling
f, err := os.Create(*memprofile)
pprof.WriteHeapProfile(f)
f.Close()
```
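The snippet above elides error handling and would not compile as one block (f is declared twice). A fuller sketch of the same idea, assuming cpuprofile/memprofile are command-line flags:

```go
package main

import (
	"flag"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

var (
	cpuprofile = flag.String("cpuprofile", "", "write cpu profile to this file")
	memprofile = flag.String("memprofile", "", "write heap profile to this file")
)

func main() {
	flag.Parse()

	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		pprof.StartCPUProfile(f)
		defer pprof.StopCPUProfile() // flush the CPU profile on exit
	}

	// ... the tool's actual work goes here ...

	if *memprofile != "" {
		f, err := os.Create(*memprofile)
		if err != nil {
			log.Fatal(err)
		}
		runtime.GC() // get up-to-date statistics before writing the heap profile
		pprof.WriteHeapProfile(f)
		f.Close()
	}
}
```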

Collecting the data

Visiting http://localhost:8080/debug/pprof/ yields the following:

```
Count Profile
46    allocs        // samples of memory allocations
0     block         // samples of blocking operations
0     cmdline       // the command line the program was started with
4     goroutine     // samples of goroutine stacks
46    heap          // samples of heap memory usage
0     mutex         // samples of lock contention
0     profile       // samples of CPU usage
16    threadcreate  // samples of OS thread creation
0     trace         // execution trace of the program
```

Block profiling is a rather unusual profile. It is somewhat like CPU profiling, but what it records is the time goroutines spend waiting on resources. It is very helpful for diagnosing concurrency bottlenecks, since it can reveal when large numbers of goroutines became blocked. Block profiling is a specialized tool, though, and should not be reached for until CPU and memory bottlenecks have been ruled out.
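Note that the block and mutex profiles show a count of 0 above because their sampling is off by default; a minimal sketch of enabling them:

```go
import "runtime"

func init() {
	// Both profiles are disabled by default, which is why their counts are 0.
	runtime.SetBlockProfileRate(1)     // record every blocking event; larger values sample less often
	runtime.SetMutexProfileFraction(1) // record every mutex contention event
}
```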

goroutine

Click goroutine to see the goroutine details: there are currently 4 goroutines running, along with their stack traces.

At this point the URL is xxx/debug/pprof/goroutine?debug=1

```
goroutine profile: total 4  // 4 goroutines are currently running
1 @ 0x1037d60 0x103142a 0x10309f5 0x10d2585 0x10d3481 0x10d3463 0x119a2df 0x11abece 0x12b72a8 0x1066ef1  // 1 goroutine has this stack (matching the leading 1)
# 0x10309f4 internal/poll.runtime_pollWait+0x54 /usr/local/Cellar/go/1.14/libexec/src/runtime/netpoll.go:203
# 0x10d2584 internal/poll.(*pollDesc).wait+0x44 /usr/local/Cellar/go/1.14/libexec/src/internal/poll/fd_poll_runtime.go:87
# 0x10d3480 internal/poll.(*pollDesc).waitRead+0x200 /usr/local/Cellar/go/1.14/libexec/src/internal/poll/fd_poll_runtime.go:92
# 0x10d3462 internal/poll.(*FD).Read+0x1e2 /usr/local/Cellar/go/1.14/libexec/src/internal/poll/fd_unix.go:169
# 0x119a2de net.(*netFD).Read+0x4e /usr/local/Cellar/go/1.14/libexec/src/net/fd_unix.go:202
# 0x11abecd net.(*conn).Read+0x8d /usr/local/Cellar/go/1.14/libexec/src/net/net.go:184
# 0x12b72a7 net/http.(*connReader).backgroundRead+0x57 /usr/local/Cellar/go/1.14/libexec/src/net/http/server.go:678  // where the goroutine was created
```

Setting the debug parameter to 2 (xxxx/debug/pprof/goroutine?debug=2) shows even more detail:

```
goroutine 20 [chan send, 2 minutes]:  // goroutine 20 has been blocked on a channel send for 2 minutes
main.alloc2.func1(0xc42015e060)
	/home/ubuntu/heap/leak_demo.go:53 +0xf9  // the blocking send is here
main.alloc2(0xc42015e060)
	/home/ubuntu/heap/leak_demo.go:54 +0x2b
created by main.alloc1
	/home/ubuntu/heap/leak_demo.go:42 +0x3f

goroutine 229 [sleep]:  // goroutine 229 is blocked in a sleep
time.Sleep(0x2540be400)
	/usr/local/Cellar/go/1.14/libexec/src/runtime/time.go:198 +0xba
main.main.func1.1()
	/Users/sunxiangke/project/go/src/local/task/advanced/pprof/server.go:13 +0x30
created by main.main.func1
	/Users/sunxiangke/project/go/src/local/task/advanced/pprof/server.go:12 +0x7f

goroutine 1 [IO wait, 1 minutes]:  // goroutine 1 has been blocked in IO wait for 1 minute
internal/poll.runtime_pollWait(0x92a4ee8, 0x72, 0x0)
	/usr/local/Cellar/go/1.14/libexec/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc000198098, 0x72, 0x0, 0x0, 0x1409973)
	/usr/local/Cellar/go/1.14/libexec/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
	/usr/local/Cellar/go/1.14/libexec/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc000198080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/usr/local/Cellar/go/1.14/libexec/src/internal/poll/fd_unix.go:384 +0x1d4
net.(*netFD).accept(0xc000198080, 0x8ecf6d6b0efc91be, 0x19fd0efc91be, 0x100000001)
	/usr/local/Cellar/go/1.14/libexec/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc0001900c0, 0x5fc4d942, 0xc0001c6db0, 0x1086356)
	/usr/local/Cellar/go/1.14/libexec/src/net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc0001900c0, 0xc0001c6e00, 0x18, 0xc000000180, 0x12c1afc)
	/usr/local/Cellar/go/1.14/libexec/src/net/tcpsock.go:261 +0x64
net/http.(*Server).Serve(0xc00019e000, 0x149d180, 0xc0001900c0, 0x0, 0x0)
	/usr/local/Cellar/go/1.14/libexec/src/net/http/server.go:2901 +0x25d
net/http.(*Server).ListenAndServe(0xc00019e000, 0xc00019e000, 0x5)
	/usr/local/Cellar/go/1.14/libexec/src/net/http/server.go:2830 +0xb7
net/http.ListenAndServe(...)
	/usr/local/Cellar/go/1.14/libexec/src/net/http/server.go:3086
main.main()
	/Users/sunxiangke/project/go/src/local/task/advanced/pprof/server.go:16 +0x96
```
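For context, a minimal hypothetical reconstruction of the kind of code behind the chan send entry above (the actual leak_demo.go is not shown in this article): goroutines that send on a channel nobody receives from block forever and leak.

```go
func alloc1(ch chan []byte) {
	for i := 0; i < 100; i++ {
		go alloc2(ch) // spawn senders; corresponds to "created by main.alloc1"
	}
}

func alloc2(ch chan []byte) {
	// With no receiver on ch, this send blocks forever: the goroutine
	// (and the 1 MB buffer it holds) can never be reclaimed.
	ch <- make([]byte, 1024*1024)
}
```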

heap information (two parts)

The heap endpoint shows memory-related information in two parts:

  • The first part is the list of runtime.MemProfileRecord entries obtained via runtime.MemProfile()
  • The second part is the runtime.MemStats information read via runtime.ReadMemStats()
```
heap profile: 1: 48 [9348: 234901392] @ heap/1048576  // 48 B of heap currently in use; ~224 MB allocated over the program's lifetime
1: 48 [4: 192] @ 0x105a836 0x1031cc0
# 0x105a835 runtime.systemstack+0x65 /usr/local/Cellar/go/1.11.1/libexec/src/runtime/asm_amd64.s:351
# 0x1031cc0 runtime.mstart+0x0 /usr/local/Cellar/go/1.11.1/libexec/src/runtime/proc.go:1229
0: 0 [1: 663552] @ 0x120f551 0x1219f2c 0x12f5c93 0x12f6128 0x12fa0ea 0x12f418d 0x12f2717 0x12f0e24 0x12fd060 0x12fd933 0x12778e4 0x1279587 0x127a52b 0x1276946 0x105c881
# 0x120f550 compress/flate.NewWriter+0x30 /usr/local/Cellar/go/1.11.1/libexec/src/compress/flate/deflate.go:667
# 0x1219f2b compress/gzip.(*Writer).Write+0x2fb /usr/local/Cellar/go/1.11.1/libexec/src/compress/gzip/gzip.go:191
# 0x12f5c92 runtime/pprof.(*profileBuilder).flush+0x72 /usr/local/Cellar/go/1.11.1/libexec/src/runtime/pprof/proto.go:147
# 0x12f6127 runtime/pprof.(*profileBuilder).pbSample+0xe7 /usr/local/Cellar/go/1.11.1/libexec/src/runtime/pprof/proto.go:169
# 0x12fa0e9 runtime/pprof.writeHeapProto+0x5e9 /usr/local/Cellar/go/1.11.1/libexec/src/runtime/pprof/protomem.go:62
# 0x12f418c runtime/pprof.writeHeapInternal+0x19cc /usr/local/Cellar/go/1.11.1/libexec/src/runtime/pprof/pprof.go:572
# 0x12f2716 runtime/pprof.writeHeap+0x46 /usr/local/Cellar/go/1.11.1/libexec/src/runtime/pprof/pprof.go:532
# 0x12f0e23 runtime/pprof.(*Profile).WriteTo+0x3e3 /usr/local/Cellar/go/1.11.1/libexec/src/runtime/pprof/pprof.go:328
# 0x12fd05f net/http/pprof.handler.ServeHTTP+0x20f /usr/local/Cellar/go/1.11.1/libexec/src/net/http/pprof/pprof.go:245
# 0x12fd932 net/http/pprof.Index+0x722 /usr/local/Cellar/go/1.11.1/libexec/src/net/http/pprof/pprof.go:268
# 0x12778e3 net/http.HandlerFunc.ServeHTTP+0x43 /usr/local/Cellar/go/1.11.1/libexec/src/net/http/server.go:1964
# 0x1279586 net/http.(*ServeMux).ServeHTTP+0x126 /usr/local/Cellar/go/1.11.1/libexec/src/net/http/server.go:2361
# 0x127a52a net/http.serverHandler.ServeHTTP+0xaa /usr/local/Cellar/go/1.11.1/libexec/src/net/http/server.go:2741
# 0x1276945 net/http.(*conn).serve+0x645 /usr/local/Cellar/go/1.11.1/libexec/src/net/http/server.go:1847
...
# runtime.MemStats
# Alloc = 760112
# TotalAlloc = 5073950616
# Sys = 73332984  // memory obtained from the OS (virtual address space)
# Lookups = 0
# Mallocs = 1367272
# Frees = 1364549
# HeapAlloc = 760112  // heap memory in use: objects allocated by the program, including those not yet GCed
# HeapSys = 65667072  // heap memory obtained from the OS; Go's tcmalloc-style allocator caches part of the heap, so this is virtual address space
# HeapIdle = 63733760
# HeapInuse = 1933312
# HeapReleased = 62070784
# HeapObjects = 2723
# Stack = 1441792 / 1441792
# MSpan = 53960 / 278528
# MCache = 13824 / 16384
# BuckHashSys = 1455511
# GCSys = 2396160
# OtherSys = 2077537
# NextGC = 4194304
# LastGC = 1540227932229352000
# PauseNs = [34924741 43928762 45939283 73930033 67085591 0 0 ...]  // per-GC pause times in nanoseconds; only the most recent 256 are kept
# NumGC = 1224  // number of GC cycles so far
# NumForcedGC = 0
# GCCPUFraction = 0.007017534570680994
# DebugGC = false
```
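These MemStats fields can also be read directly from code, which is handy for ad-hoc logging; a minimal sketch:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("HeapAlloc=%d HeapSys=%d NumGC=%d\n", m.HeapAlloc, m.HeapSys, m.NumGC)
}
```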

profile

Fetching the profile endpoint collects CPU statistics; the request blocks for the sampling window (30 seconds by default, so expect to wait tens of seconds).
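For example, a CPU profile can be pulled straight into the interactive pprof tool; the seconds parameter controls the sampling window:

```
go tool pprof http://127.0.0.1:8080/debug/pprof/profile?seconds=30
```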

The heap endpoint can be pulled and analyzed the same way:

```
# attach to the process and inspect memory currently in use
go tool pprof -inuse_space http://127.0.0.1:8080/debug/pprof/heap

# inspect how memory objects were allocated
go tool pprof -alloc_space http://127.0.0.1:8080/debug/pprof/heap

# inside the interactive pprof shell:
top10     # show the top 10 entries
top -cum  # sort by cumulative values: if A calls B, B's allocations are also attributed to A, which makes hot call chains easy to spot
web       # render an SVG call graph
```

Under -inuse_space, flat is the in-use memory attributed to the current frame itself, and cum is the cumulative in-use memory of the frame plus everything it calls. The sample-type options are -inuse_space/-alloc_space and -inuse_objects/-alloc_objects, with these common uses:

  • use -inuse_space to analyze the program's resident memory usage
  • use -alloc_objects to analyze how many objects the program allocated over its entire lifetime (cumulative allocations)

Prefer -inuse_space: analyzing the live scene that is actually causing the problem is far more direct than digging through historical data, and a function with a large alloc_space does not necessarily drive up the process's RSS.

Fetching and analyzing profiling data

After starting the demo service in a test environment, we use wrk to simulate heavy load on the service:

```
wrk -c 100 -t 4 -d 1m http://localhost:9876/test
```

cpu profiling

Next, open http://localhost:8080/debug/pprof/, click profile, and wait a few seconds for the profile file to download.
Finally, run go tool pprof -http=":8081" ./demo profile, which automatically opens the profile report page.

The report page offers a flame graph, a call graph, and per-function analysis.
Here demo is the service's binary executable.
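The same download-and-analyze steps from the command line (the file name profile here is illustrative):

```
curl -o profile http://localhost:8080/debug/pprof/profile
go tool pprof -http=":8081" ./demo profile
```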

Flame graph

Source

memory profiling

Next, open http://localhost:8080/debug/pprof/heap to download the heap file.
Finally, run go tool pprof -http=":8081" ./demo heap, which automatically opens the heap report page.

top

  • Flat: memory attributed to the function itself
  • Flat%: the function's share of total memory usage
  • Sum%: running total of Flat% for this row and the rows above it
  • Cum: cumulative memory usage (an outer function accumulates the usage of everything it calls)

For example, if A allocates 1 MB itself and calls B, which allocates 3 MB, then A's Flat is 1 MB and its Cum is 4 MB.

Flame graph

Peek

block profiling
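Assuming the block profile was enabled via runtime.SetBlockProfileRate as sketched earlier, it can be fetched and browsed the same way as the other profiles:

```
go tool pprof -http=":8081" http://localhost:8080/debug/pprof/block
```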

References

使用 pprof 和火焰图调试 golang 应用
golang 内存分析/动态追踪
go routine leak