Arthas Beginner's Guide

A simple getting-started guide to Arthas

Introduction

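Arthas is an open-source Java diagnostic tool.

Arthas can decompile classes in real time, modify classes in memory, inspect class loaders, and monitor specific classes or methods, which makes it effective for investigating and resolving all kinds of problems in production environments.

This article covers how to download and run the tool and how to use some common commands. For more details, please refer to the official documentation: Arthas User Documentation (https://arthas.aliyun.com/doc).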

Environment used in this article

  • OS: CentOS 7 (virtual machine)

  • JDK: Oracle JDK 1.8.0

  • Arthas: 3.5

Download & Run

  1. Download Arthas

    Download URL: https://arthas.aliyun.com/arthas-boot.jar

    With wget:

    wget https://arthas.aliyun.com/arthas-boot.jar

  2. Run Arthas

    java -jar arthas-boot.jar

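    If you run the command above directly, you may get the "Can not find java process" message shown below. It only means that no other Java process is currently running, so there is nothing to diagnose.

    However, if other Java programs are running and you still get this message, the installed Java does not meet Arthas's requirements. Possibly it is only a runtime (for example an OpenJDK JRE package) without the JDK tools Arthas needs; installing a full JDK, such as the OracleJDK used in this article, fixes this.
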
    # java -jar arthas-boot.jar
    [INFO] arthas-boot version: 3.5.3
    [INFO] Can not find java process. Try to run `jps` command lists the instrumented Java HotSpot VMs on the target system.
    Please select an available pid.

  3. (Optional) Download and run a test program

    The Arthas project provides a Spring Boot demo program for testing. You can start it first and then run Arthas.

    Download URL: https://github.com/hengyunabc/spring-boot-inside/raw/master/demo-arthas-spring-boot/demo-arthas-spring-boot.jar

    With wget:

    wget https://github.com/hengyunabc/spring-boot-inside/raw/master/demo-arthas-spring-boot/demo-arthas-spring-boot.jar

    Run it:

    java -jar demo-arthas-spring-boot.jar

    You can also use any other Java program as the diagnosis target; some of the examples below are based on this demo program.

  4. Run Arthas again

    Running arthas-boot.jar again produces output similar to the following:

    # java -jar arthas-boot.jar
    [INFO] arthas-boot version: 3.5.3
    [INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
    * [1]: 1312 demo-arthas-spring-boot.jar

  5. Select the process

    Type 1 and press Enter to select the running demo-arthas-spring-boot.jar process as the diagnosis target. This produces the following output:

    # java -jar arthas-boot.jar
    [INFO] arthas-boot version: 3.5.3
    [INFO] Found existing java process, please choose one and input the serial number of the process, eg : 1. Then hit ENTER.
    * [1]: 2135 demo-arthas-spring-boot.jar
    1
    [INFO] arthas home: /root/.arthas/lib/3.5.3/arthas
    [INFO] Try to attach process 2135
    [INFO] Attach process 2135 success.
    [INFO] arthas-client connect 127.0.0.1 3658
      ,---.  ,------. ,--------.,--.  ,--.  ,---.   ,---.
     /  O  \ |  .--. ''--.  .--'|  '--'  | /  O  \ '   .-'
    |  .-.  ||  '--'.'   |  |   |  .--.  ||  .-.  |`.  `-.
    |  | |  ||  |\  \    |  |   |  |  |  ||  | |  |.-'    |
    `--' `--'`--' '--'   `--'   `--'  `--'`--' `--'`-----'


    wiki https://arthas.aliyun.com/doc
    tutorials https://arthas.aliyun.com/doc/arthas-tutorials.html
    version 3.5.3
    main_class
    pid 2135
    time 2021-08-26 09:17:45

    [arthas@2135]$

    You are now inside Arthas: the ARTHAS logo is displayed and the prompt changes to [arthas@xxxx]$
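If you would rather not use the demo jar, any long-running Java program will do as a target. The class below is a hypothetical minimal example (the name DemoTarget is made up for this sketch and is not part of Arthas): it loops forever, printing a prime factorization once per second, so the process stays alive long enough for arthas-boot.jar to list it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal diagnosis target: stays alive so Arthas can attach to it.
final class DemoTarget {
    // Returns the prime factors of n in ascending order, using simple trial division.
    static List<Integer> primeFactors(int n) {
        List<Integer> factors = new ArrayList<>();
        int remaining = n;
        for (int p = 2; (long) p * p <= remaining; p++) {
            while (remaining % p == 0) {
                factors.add(p);
                remaining /= p;
            }
        }
        if (remaining > 1) {
            factors.add(remaining); // whatever is left is itself prime
        }
        return factors;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 2;
        // Loop forever so the process keeps showing up in arthas-boot's process list.
        while (true) {
            System.out.println(n + " = " + primeFactors(n));
            n++;
            Thread.sleep(1000);
        }
    }
}
```

Compile and run it (javac DemoTarget.java && java DemoTarget), then start arthas-boot.jar in another terminal and pick it from the list.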

Common Commands

Exit Arthas: exit | quit

$ quit
[root@localhost ~]#

$ exit
[root@localhost ~]#

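Exiting this way does not shut Arthas down: the Arthas server keeps running inside the target process.

To reconnect, simply run the launch command again.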

Shut down Arthas: stop

$ stop
Resetting all enhanced classes ...
Affect(class count: 0 , method count: 0) cost in 1 ms, listenerId: 0
Arthas Server is going to shutdown...
$ session (a0783afa-2aa7-4d61-8624-c978b261e2ab) is closed because server is going to shutdown.
[root@localhost ~]#

Viewing command help

List all commands with their descriptions: help

The output is long, so only part of it is shown here:

$ help
NAME DESCRIPTION
help Display Arthas Help
auth Authenticates the current session
...

Show details for a single command: -h

Every command also accepts -h to print detailed usage information. Taking dashboard as an example:

$ dashboard -h
USAGE:
dashboard [-h] [-i <value>] [-n <value>]

SUMMARY:
Overview of target jvm's thread, memory, gc, vm, tomcat info.

EXAMPLES:
dashboard
dashboard -n 10
dashboard -i 2000

WIKI:
https://arthas.aliyun.com/doc/dashboard

OPTIONS:
-h, --help this help
-i, --interval <value> The interval (in ms) between two executions, default is 5000 ms.
-n, --number-of-execution <value> The number of times this command will be executed.

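Real-time system dashboard: dashboard

dashboard keeps refreshing a real-time view of the system's threads, memory, GC and runtime information (every 5000 ms by default).

Press Q or Ctrl + C to exit the loop.

The output is long, so only part of it is shown here:
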
$ dashboard
ID NAME GROUP PRIORITY STATE %CPU DELTA_TI TIME INTERRUP DAEMON
-1 C2 CompilerThread0 - -1 - 0.0 0.000 0:3.727 false true
-1 VM Periodic Task Thread - -1 - 0.0 0.000 0:2.710 false tru
...

Memory used total max usage GC
heap 43M 79M 483M 9.02% gc.copy.count 79
eden_space 10M 22M 133M 7.96% gc.copy.time(ms) 287
...

Runtime
os.name Linux
os.version 3.10.0-1160.el7.x86_64
...
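The Memory, GC and Runtime panels show data that the JVM also exposes through the standard java.lang.management MXBeans. The sketch below reads similar values from inside its own JVM. It only approximates what dashboard displays (for instance, it assumes dashboard's "total" column corresponds to committed memory) and is not how Arthas itself collects the data; the class name MiniDashboard is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical sketch: print heap, GC and OS figures similar to dashboard's panels.
final class MiniDashboard {
    // Formats a byte count in whole megabytes, like dashboard's "43M"; "-" if undefined.
    static String mb(long bytes) {
        return bytes < 0 ? "-" : (bytes / (1024 * 1024)) + "M";
    }

    // Used heap as a percentage of max, or -1 when max is undefined.
    static double heapUsagePercent(MemoryUsage heap) {
        long max = heap.getMax();
        return max <= 0 ? -1 : 100.0 * heap.getUsed() / max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%s total=%s max=%s usage=%.2f%%%n",
                mb(heap.getUsed()), mb(heap.getCommitted()), mb(heap.getMax()),
                heapUsagePercent(heap));

        // One line per collector, e.g. "Copy" on the serial collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("gc.%s.count=%d time(ms)=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }

        System.out.println("os.name " + System.getProperty("os.name"));
        System.out.println("os.version " + System.getProperty("os.version"));
    }
}
```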

Thread information: thread

thread shows information about the current threads, such as their state and CPU usage.

The output is long, so only part of it is shown here:

$ thread
Threads Total: 35, NEW: 0, RUNNABLE: 11, BLOCKED: 0, WAITING: 14, TIMED_WAITING: 5, TERMINATED: 0, Internal threads: 5
ID NAME GROUP PRIORITY STATE %CPU DELTA_TI TIME INTERRUP DAEMON
-1 C1 CompilerThread1 - -1 - 0.48 0.000 0:1.490 false true
44 arthas-command-execute system 5 RUNNABLE 0.21 0.000 0:0.055 false true
...

Use -b to find the thread that is currently blocking other threads, and -n x to show the x threads using the most CPU.
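For comparison, the standard ThreadMXBean can rank threads by CPU time. The hypothetical TopThreads sketch below ranks the threads of its own JVM by cumulative CPU time; Arthas's %CPU column is measured over a sampling interval instead, so the numbers are not directly comparable. Treat it as an illustration of the idea behind thread -n, not a reimplementation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: list the n threads with the most cumulative CPU time.
final class TopThreads {
    static List<String> topByCpu(int n) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> samples = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        if (mx.isThreadCpuTimeSupported()) {
            for (long id : mx.getAllThreadIds()) {
                long cpu = mx.getThreadCpuTime(id); // -1 if the thread died or measuring is disabled
                if (cpu >= 0) {
                    samples.add(new long[] {id, cpu});
                }
            }
        }
        samples.sort(Comparator.comparingLong((long[] s) -> s[1]).reversed()); // busiest first
        List<String> result = new ArrayList<>();
        for (long[] s : samples) {
            if (result.size() >= n) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(s[0]);
            if (info == null) {
                continue; // the thread ended after it was sampled
            }
            result.add(info.getThreadName() + " (" + info.getThreadState() + "): "
                    + s[1] / 1_000_000 + " ms");
        }
        return result;
    }

    public static void main(String[] args) {
        topByCpu(3).forEach(System.out::println);
    }
}
```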

Classes loaded by the JVM: sc

Show detailed information (-d) about the UserController class loaded in the JVM:

$ sc com.example.demo.arthas.user.UserController -d
class-info com.example.demo.arthas.user.UserController
code-source file:/root/demo-arthas-spring-boot.jar!/BOOT-INF/classes!/
name com.example.demo.arthas.user.UserController
...
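The code-source and classLoaderHash fields of sc -d can be approximated with plain reflection. In the sketch below (the class name ClassInfo is hypothetical), the hash is formatted as Integer.toHexString(loader.hashCode()), which matches the LaunchedURLClassLoader@5674cd4d and classLoaderHash 5674cd4d values shown in this article; that Arthas formats it the same way is an assumption. Unlike sc, the sketch only sees classes in its own JVM.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical sketch of the code-source / classLoaderHash part of "sc -d".
final class ClassInfo {
    // Hex hash of the defining class loader, e.g. "5674cd4d";
    // "null" for classes defined by the bootstrap loader (such as java.lang.String).
    static String classLoaderHash(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        return loader == null ? "null" : Integer.toHexString(loader.hashCode());
    }

    // Location (jar or directory) the class was loaded from, or null if unknown.
    static URL codeSource(Class<?> clazz) {
        CodeSource cs = clazz.getProtectionDomain().getCodeSource();
        return cs == null ? null : cs.getLocation();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> clazz = Class.forName(args.length > 0 ? args[0] : ClassInfo.class.getName());
        System.out.println("class-info      " + clazz.getName());
        System.out.println("code-source     " + codeSource(clazz));
        System.out.println("classLoaderHash " + classLoaderHash(clazz));
    }
}
```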

Methods of loaded classes: sm

Show the methods of the UserController class loaded in the JVM:

$ sm com.example.demo.arthas.user.UserController -d
declaring-class com.example.demo.arthas.user.UserController
constructor-name <init>
modifier public
...

Or show only the findUserById method:

$ sm com.example.demo.arthas.user.UserController findUserById -d
declaring-class com.example.demo.arthas.user.UserController
method-name findUserById
modifier public
...
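What sm -d prints (declaring class, constructor or method name, modifiers, parameter types) is what Java reflection exposes. The hypothetical MethodLister sketch below prints one line per constructor and method of a class visible to its own JVM, optionally filtered by method name like sm &lt;class&gt; &lt;method&gt;. Its output format is made up and does not mirror sm exactly.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: list constructors/methods with their modifiers, in the spirit of "sm -d".
final class MethodLister {
    // methodFilter == null lists constructors and all methods; otherwise only methods with that name.
    static List<String> describe(Class<?> clazz, String methodFilter) {
        List<String> lines = new ArrayList<>();
        if (methodFilter == null) {
            for (Constructor<?> c : clazz.getDeclaredConstructors()) {
                lines.add(Modifier.toString(c.getModifiers()) + " <init>" + params(c.getParameterTypes()));
            }
        }
        Method[] methods = clazz.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName)); // reflection order is unspecified
        for (Method m : methods) {
            if (methodFilter == null || m.getName().equals(methodFilter)) {
                lines.add(Modifier.toString(m.getModifiers()) + " " + m.getName() + params(m.getParameterTypes()));
            }
        }
        return lines;
    }

    // Renders parameter types as "(java.lang.String, int)".
    private static String params(Class<?>[] types) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < types.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(types[i].getName());
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> clazz = Class.forName(args.length > 0 ? args[0] : "java.lang.Integer");
        describe(clazz, args.length > 1 ? args[1] : null).forEach(System.out::println);
    }
}
```

For example, java MethodLister java.lang.Integer parseInt lists the parseInt overloads.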

Decompile: jad

Decompile UserController:

$ jad com.example.demo.arthas.user.UserController

ClassLoader:
+-org.springframework.boot.loader.LaunchedURLClassLoader@5674cd4d
+-sun.misc.Launcher$AppClassLoader@70dea4e
+-sun.misc.Launcher$ExtClassLoader@52fb0a15

Location:
file:/root/demo-arthas-spring-boot.jar!/BOOT-INF/classes!/

/*
* Decompiled with CFR.
*
* Could not load the following classes:
* org.slf4j.Logger
* org.slf4j.LoggerFactory
* org.springframework.web.bind.annotation.GetMapping
* org.springframework.web.bind.annotation.PathVariable
* org.springframework.web.bind.annotation.RestController
*/
package com.example.demo.arthas.user;

import com.example.demo.arthas.user.User;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class UserController {
    private static final Logger logger = LoggerFactory.getLogger(UserController.class);

    @GetMapping(value={"/user/{id}"})
    public User findUserById(@PathVariable Integer id) {
/*15*/  logger.info("id: {}", (Object)id);
/*17*/  if (id != null && id < 1) {
            throw new IllegalArgumentException("id < 1");
        }
        return new User(id, "name" + id);
    }
}

Affect(row-cnt:1) cost in 183 ms.

Add --source-only to hide the ClassLoader and Location information.

You can redirect the output to a file with >:

$ jad com.example.demo.arthas.user.UserController > /tmp/UserController.java

You can also decompile a single method (--lineNumber false omits the line-number comments):

$ jad com.example.demo.arthas.user.UserController findUserById --source-only --lineNumber false
@GetMapping(value={"/user/{id}"})
public User findUserById(@PathVariable Integer id) {
    logger.info("id: {}", (Object)id);
    if (id != null && id < 1) {
        throw new IllegalArgumentException("id < 1");
    }
    return new User(id, "name" + id);
}

Watching method return values: watch

Watch the return value of UserController's findUserById method.

Press Q or Ctrl + C to stop watching.

$ watch com.example.demo.arthas.user.UserController findUserById -x 2
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 47 ms, listenerId: 2
method=com.example.demo.arthas.user.UserController.findUserById location=AtExit
ts=2021-09-08 16:36:13; [cost=0.289593ms] result=@ArrayList[
@Object[][
@Integer[1],
],
@UserController[
logger=@Logger[Logger[com.example.demo.arthas.user.UserController]],
],
@User[
id=@Integer[1],
name=@String[name1],
],
]

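Here -x 2 sets the object expansion depth to 2, which is why the fields of the returned User are visible. Since no observation expression was given, watch uses its default, {params, target, returnObj}, so the result lists the method arguments, the UserController instance, and the returned User, in that order.

Add -e to print only when the method throws an exception.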
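Conceptually, each watch record captures the arguments, the target object, and either the return value or the thrown exception of one call. Arthas gets this by bytecode enhancement of the class (hence the "Affect(class count: 1 , method count: 1)" line), so it also works on classes without interfaces. The sketch below only imitates the idea with a JDK dynamic proxy around a hypothetical UserService interface; onlyExceptions plays the role of -e. All names in it are made up for illustration.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: record {params, target, returnObj} (plus exceptions) for each call.
final class WatchSketch {
    interface UserService { // hypothetical interface standing in for UserController
        String findUserById(Integer id);
    }

    // One observation, roughly what a single watch record shows.
    static final class Observation {
        final Object[] params;
        final Object target;
        final Object returnObj;
        final Throwable throwExp;

        Observation(Object[] params, Object target, Object returnObj, Throwable throwExp) {
            this.params = params;
            this.target = target;
            this.returnObj = returnObj;
            this.throwExp = throwExp;
        }

        @Override
        public String toString() {
            return "params=" + Arrays.toString(params) + " returnObj=" + returnObj + " throwExp=" + throwExp;
        }
    }

    // Wraps target so every call is logged; with onlyExceptions, only failing calls are kept (like -e).
    static UserService watch(UserService target, boolean onlyExceptions, List<Observation> log) {
        return (UserService) Proxy.newProxyInstance(
                UserService.class.getClassLoader(),
                new Class<?>[] {UserService.class},
                (proxy, method, args) -> {
                    try {
                        Object result = method.invoke(target, args);
                        if (!onlyExceptions) {
                            log.add(new Observation(args, target, result, null));
                        }
                        return result;
                    } catch (InvocationTargetException e) {
                        log.add(new Observation(args, target, null, e.getCause()));
                        throw e.getCause(); // rethrow the original exception to the caller
                    }
                });
    }

    public static void main(String[] args) {
        UserService real = id -> {
            if (id != null && id < 1) {
                throw new IllegalArgumentException("id < 1");
            }
            return "name" + id;
        };
        List<Observation> log = new ArrayList<>();
        UserService watched = watch(real, false, log);
        watched.findUserById(1);
        try {
            watched.findUserById(0);
        } catch (IllegalArgumentException ignored) {
            // expected: recorded in the log as throwExp
        }
        log.forEach(System.out::println);
    }
}
```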

Tracing call timing: trace

Trace the time spent in UserController's findUserById method; the elapsed time of each call is shown at the start of its line:

$ trace com.example.demo.arthas.user.UserController findUserById
Press Q or Ctrl+C to abort.
Affect(class count: 1 , method count: 1) cost in 65 ms, listenerId: 4
`---ts=2021-09-08 17:19:26;thread_name=http-nio-80-exec-5;id=14;is_daemon=true;priority=5;TCCL=org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedWebappClassLoader@11858c60
`---[0.269505ms] com.example.demo.arthas.user.UserController:findUserById()
+---[0.198111ms] org.slf4j.Logger:info() #15
`---[0.007955ms] com.example.demo.arthas.user.User:<init>() #21

Add --skipJDKMethod false to include calls to JDK methods as well (they are skipped by default).
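To read trace output: each bracketed number is the wall-clock time of that call, and a parent's time includes the time of the calls nested under it (0.269505ms for findUserById versus 0.198111ms and 0.007955ms for its two sub-calls). The hypothetical sketch below produces a similar indented report by timing nested calls by hand with System.nanoTime. Arthas inserts equivalent timing code through bytecode enhancement, so this is only an analogy; the sketch also uses shared static state and is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical sketch: hand-written nested timing, producing a trace-like report.
final class TraceSketch {
    static final List<String> lines = new ArrayList<>(); // report lines, parent before children
    private static int depth = 0;

    // Runs body and records "[x.xxxxxxms] label", indented by nesting depth.
    static <T> T timed(String label, Supplier<T> body) {
        int slot = lines.size();
        lines.add(null); // reserve the slot so the parent is listed before its children
        depth++;
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsedNanos = System.nanoTime() - start;
            depth--;
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            line.append(String.format(Locale.ROOT, "[%.6fms] %s", elapsedNanos / 1e6, label));
            lines.set(slot, line.toString());
        }
    }

    // Hypothetical stand-in for UserController.findUserById and its two sub-calls.
    static String findUserById(int id) {
        return timed("UserController:findUserById()", () -> {
            timed("Logger:info()", () -> {
                System.out.println("id: " + id);
                return null;
            });
            return timed("User:<init>()", () -> "name" + id);
        });
    }

    public static void main(String[] args) {
        findUserById(1);
        lines.forEach(System.out::println);
    }
}
```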

Use Cases

Hot-updating code

At this point, requesting http://localhost/user/0 produces a java.lang.IllegalArgumentException:

# curl http://localhost/user/0
{"timestamp":1631170739465,"status":500,"error":"Internal Server Error","exception":"java.lang.IllegalArgumentException","message":"id < 1","path":"/user/0"}

The cause is in UserController.findUserById. Let's fix this error by hot-updating the code.

  1. Use jad to decompile the UserController class into a Java source file and save it locally

    $ jad --source-only com.example.demo.arthas.user.UserController --lineNumber false > /tmp/UserController.java

  2. Edit the file with vi or vim; after editing it looks like this:

    package com.example.demo.arthas.user;

    import com.example.demo.arthas.user.User;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class UserController {
        private static final Logger logger = LoggerFactory.getLogger(UserController.class);

        @GetMapping(value={"/user/{id}"})
        public User findUserById(@PathVariable Integer id) {
            logger.info("id: {}", (Object)id);
            if (id != null && id < 1) {
                // Fix: return a User for ids below 1 instead of throwing
                return new User(id, "name" + id);
                // throw new IllegalArgumentException("id < 1");
            }
            return new User(id, "name" + id);
        }
    }

  3. Use sc to find the hash of the class loader that loaded UserController

    $ sc -d *UserController | grep classLoaderHash
    classLoaderHash 5674cd4d

    * is a wildcard, used for fuzzy matching.

    grep filters the output; only the classLoaderHash value is needed here, which is 5674cd4d in this run.

  4. Compile UserController.java with mc

    $ mc -c 5674cd4d /tmp/UserController.java -d /tmp
    Memory compiler output:
    /tmp/com/example/demo/arthas/user/UserController.class
    Affect(row-cnt:1) cost in 466 ms.

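    The class loader must be specified with -c; its hash was obtained in step 3.

    -d sets the output directory for the compiled class file, also /tmp here.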

  5. Reload the compiled class with redefine

    $ redefine /tmp/com/example/demo/arthas/user/UserController.class
    redefine success, size: 1, classes:
    com.example.demo.arthas.user.UserController

Now requesting http://localhost/user/0 returns data normally:

# curl http://localhost/user/0
{"id":0,"name":"name0"}
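The mc step above is an in-memory compilation. A rough equivalent with the standard javax.tools API is sketched below: it compiles a source file into a directory laid out by package, just as mc -d /tmp produced /tmp/com/example/demo/arthas/user/UserController.class. Unlike mc -c, it cannot borrow the target application's class loader, so a source that imports Spring or slf4j classes would only compile if those jars were on the sketch's own classpath. The names CompileSketch and writeAndCompile are hypothetical, and a full JDK is required (ToolProvider.getSystemJavaCompiler() returns null on a JRE).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical sketch of the "mc ... -d <dir>" step using the JDK's compiler API.
final class CompileSketch {
    // Compiles one .java file into outDir. Returns true if javac exits with code 0.
    static boolean compile(Path source, Path outDir) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler found; a full JDK is required");
        }
        // null streams mean: use System.in, System.out and System.err
        int exitCode = javac.run(null, null, null, "-d", outDir.toString(), source.toString());
        return exitCode == 0;
    }

    // Writes source into a new temp directory, compiles it there, and returns the path of the
    // resulting .class file, or null if compilation failed. className must be fully qualified.
    static Path writeAndCompile(String className, String source) {
        try {
            Path dir = Files.createTempDirectory("mc-sketch");
            String simpleName = className.substring(className.lastIndexOf('.') + 1);
            Path sourceFile = dir.resolve(simpleName + ".java");
            Files.write(sourceFile, source.getBytes(StandardCharsets.UTF_8));
            if (!compile(sourceFile, dir)) {
                return null;
            }
            // javac creates one directory per package segment, e.g. demo/Hello.class
            return dir.resolve(className.replace('.', '/') + ".class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path classFile = writeAndCompile("demo.Hello", "package demo; public class Hello {}");
        System.out.println(classFile + " exists: " + (classFile != null && Files.exists(classFile)));
    }
}
```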

Diagnosing thread deadlocks

The example above has no deadlock, so the following example is used for testing:

package io.tomoto.controller;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

/**
 * <p>
 * Thread deadlock test controller
 * </p>
 *
 * @author Tomoto
 * @version 1.0
 * @since 1.0 2021/9/9 16:34
 */
@RestController
public class TestController {
    private final Object key1 = new Object();
    private final Object key2 = new Object();

    @GetMapping("/deadlock")
    public void deadlock() {
        System.out.println("start deadlock");

        new Thread(() -> {
            synchronized (key1) {
                try {
                    Thread.sleep(2000);
                } catch (InterruptedException ignored) {
                }

                synchronized (key2) {
                    System.out.println("Thread1 key2");
                }
            }
        }).start();

        new Thread(() -> {
            synchronized (key2) {
                try {
                    Thread.sleep(2000);
                } catch (InterruptedException ignored) {
                }

                synchronized (key1) {
                    System.out.println("Thread2 key1");
                }
            }
        }).start();
    }
}

Before this endpoint is requested, thread -b reports that there is no blocking thread:

$ thread -b
No most blocking thread found!

After it is requested, thread -b reports a blocking thread and points to the code location:

$ thread -b
"Thread-4" Id=41 BLOCKED on java.lang.Object@30cdfdb0 owned by "Thread-5" Id=42
at io.tomoto.controller.TestController.lambda$deadlock$0(TestController.java:32)
- blocked on java.lang.Object@30cdfdb0
- locked java.lang.Object@16271c06 <---- but blocks 1 other threads!
at io.tomoto.controller.TestController$$Lambda$536/391834385.run(Unknown Source)
at java.lang.Thread.run(Thread.java:748)

You can then fix the code and repackage and redeploy it, or fix it with a hot update as described above.
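The JDK itself can confirm the same kind of deadlock: ThreadMXBean.findDeadlockedThreads() returns the ids of threads deadlocked on monitors or ownable synchronizers, or null if there are none. The self-contained sketch below (class and thread names are hypothetical) reproduces the lock-ordering problem of TestController, using a CountDownLatch instead of Thread.sleep so that the deadlock is guaranteed, and then polls for it. It is a JDK-only illustration, not what thread -b runs internally.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: create a lock-ordering deadlock and detect it via ThreadMXBean.
final class DeadlockCheck {
    // Starts two daemon threads that take two locks in opposite order, like TestController.
    static void startDeadlock() {
        Object key1 = new Object();
        Object key2 = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread t1 = new Thread(() -> lockInOrder(key1, key2, bothHoldFirstLock), "worker-1");
        Thread t2 = new Thread(() -> lockInOrder(key2, key1, bothHoldFirstLock), "worker-2");
        t1.setDaemon(true); // daemon threads do not keep the JVM alive once main returns
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // proceed only once the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) { // blocks forever: the other thread holds this lock
                System.out.println("unreachable while deadlocked");
            }
        }
    }

    // Polls until the JVM reports deadlocked threads or timeoutMs passes; one line per thread.
    static List<String> waitForDeadlock(long timeoutMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (report.isEmpty() && System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info != null) {
                        report.add(info.getThreadName() + " BLOCKED on " + info.getLockName()
                                + " owned by " + info.getLockOwnerName());
                    }
                }
            } else {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return report;
    }

    public static void main(String[] args) {
        startDeadlock();
        waitForDeadlock(5000).forEach(System.out::println);
    }
}
```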

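Diagnosing Java programs running in Docker containers

Enter the container, then download and run Arthas (/bin/sh is used because Alpine-based images such as java:8-alpine, mentioned below, do not include bash):

docker exec -it ${containerId} /bin/sh -c "wget https://arthas.aliyun.com/arthas-boot.jar && java -jar arthas-boot.jar"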

Possible problems

  1. wget fails

    • Cause: arthas-boot.jar cannot be downloaded with wget, and the error message is
      wget: can't execute 'ssl_helper': No such file or directory
      This happens because the wget built into the container does not support HTTPS.

    • Solution: download the file from another HTTP address, or add it to the image in the Dockerfile when building it:

      ADD arthas-boot.jar arthas-boot.jar
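  2. Arthas cannot run

    • Cause: if Arthas cannot run, the container probably has a JRE instead of a JDK, or an old OpenJDK version that lacks the required tools; this cause was also mentioned above.

    • Solution: switch the Dockerfile to a JDK base image, for example with the following line, and rebuild the image:

      FROM java:8-alpine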
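  3. Arthas fails to start

    • Cause: if the container starts the Java process directly, that process gets pid 1, which breaks some features, including attaching Arthas to it. This is a problem on the Docker side.

      Related Arthas issue: "Error attaching to the pid 1 process in Docker"

    • Solution: with Docker 1.13 or later, add --init when starting the container so that the Java process does not run as pid 1:

      docker run --init -dp 80:80 demo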
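Arthas attaches to the target JVM through the JDK's Attach API (com.sun.tools.attach), which is why a bare JRE is not enough. The hypothetical JdkCheck sketch below is a quick heuristic you could run inside a container: on JDK 9+ it checks that the Attach API classes can be loaded, and on JDK 8 it looks for lib/tools.jar next to the jre directory. It is an assumption-based check, not something Arthas itself runs.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: does this JVM look like a full JDK with the Attach API?
final class JdkCheck {
    static boolean looksLikeFullJdk() {
        try {
            // JDK 9+: provided by the jdk.attach module, absent from JRE-style runtime images
            Class.forName("com.sun.tools.attach.VirtualMachine", false, ClassLoader.getSystemClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            // JDK 8: java.home is <jdk>/jre and the Attach API lives in <jdk>/lib/tools.jar
            Path javaHome = Paths.get(System.getProperty("java.home"));
            return Files.exists(javaHome.resolve("../lib/tools.jar").normalize());
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version: " + System.getProperty("java.version"));
        System.out.println("java.vendor:  " + System.getProperty("java.vendor"));
        System.out.println("java.home:    " + System.getProperty("java.home"));
        System.out.println("Attach API available (full JDK)? " + looksLikeFullJdk());
    }
}
```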