geekymv

Do one thing, do it well.


  • 首页

  • 分类

  • 归档

  • 标签

  • 关于

skywalking-java-plugin

发表于 2022-01-06
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
/**
* The main entrance of sky-walking agent, based on javaagent mechanism.
*/
public class SkyWalkingAgent {
private static ILog LOGGER = LogManager.getLogger(SkyWalkingAgent.class);

/**
* Main entrance. Use byte-buddy transform to enhance all classes, which define in plugins.
*/
public static void premain(String agentArgs, Instrumentation instrumentation) throws PluginException {
final PluginFinder pluginFinder;
try {
SnifferConfigInitializer.initializeCoreConfig(agentArgs);
} catch (Exception e) {
// try to resolve a new logger, and use the new logger to write the error log here
// 配置初始化过程可能会抛出异常(验证非空参数),这里为了使用新的 LogResolver 需要重新获取日志对象
LogManager.getLogger(SkyWalkingAgent.class)
.error(e, "SkyWalking agent initialized failure. Shutting down.");
return;
} finally {
// refresh logger again after initialization finishes
LOGGER = LogManager.getLogger(SkyWalkingAgent.class);
}

try {
pluginFinder = new PluginFinder(new PluginBootstrap().loadPlugins());
} catch (AgentPackageNotFoundException ape) {
LOGGER.error(ape, "Locate agent.jar failure. Shutting down.");
return;
} catch (Exception e) {
LOGGER.error(e, "SkyWalking agent initialized failure. Shutting down.");
return;
}

// 省略部分代码....
}


// 省略部分代码....
}

今天我们要分析的源码是插件加载部分

1
pluginFinder = new PluginFinder(new PluginBootstrap().loadPlugins());

PluginBootstrap 是一个插件查找器,使用 PluginResourcesResolver 查找所有插件,使用 PluginCfg 加载所有插件。

抽象类定义属性,并为属性提供 Setter 和 Getter 方法,子类通过构造方法内调用super(xx)将属性赋值给父类。

skywalking-java-config

发表于 2022-01-04

SkyWalking Java Agent 配置初始化流程分析
上一篇文章我们通过 SkyWalking Java Agent 日志组件分析一文详细介绍了日志相关的底层实现原理,今天我们要正式进入 premain 方法了,
premain 方法见名知义就是在我们 Java 程序的 main 方法之前运行的方法,一般我们通过 JVM 参数 -javaagent:/path/to/skywalking-agent.jar 的方式指定代理程序。
java agent 相关内容不是本文重点,更多内容请参考文末相关阅读连接。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
/**
* The main entrance of sky-walking agent, based on javaagent mechanism.
*/
public class SkyWalkingAgent {
private static ILog LOGGER = LogManager.getLogger(SkyWalkingAgent.class);

/**
* Main entrance. Use byte-buddy transform to enhance all classes, which define in plugins.
*/
public static void premain(String agentArgs, Instrumentation instrumentation) throws PluginException {
final PluginFinder pluginFinder;
try {
SnifferConfigInitializer.initializeCoreConfig(agentArgs);
} catch (Exception e) {
// try to resolve a new logger, and use the new logger to write the error log here
LogManager.getLogger(SkyWalkingAgent.class)
.error(e, "SkyWalking agent initialized failure. Shutting down.");
return;
} finally {
// refresh logger again after initialization finishes
LOGGER = LogManager.getLogger(SkyWalkingAgent.class);
}
// 省略部分代码....

}
// 省略部分代码....
}

今天我们要重点分析的就是这行代码的内部实现

1
SnifferConfigInitializer.initializeCoreConfig(agentArgs);

SnifferConfigInitializer 类使用多种方式初始化配置,内部实现有以下几个重要步骤:

loadConfig() 加载配置文件
replacePlaceholders() 解析 placeholder
overrideConfigBySystemProp() 读取 System.Properties 属性
overrideConfigByAgentOptions() 解析 agentArgs 参数配置
initializeConfig() 将以上读取到的配置信息映射到 Config 类的静态属性
configureLogger() 根据配置的 Config.Logging.RESOLVER 重配置 Log,更多关于日志参见文章
验证非空参数 agent.service_name 和 collector.servers。

  1. 从指定的配置文件路径读取配置文件内容,通过 -Dskywalking_config=/xxx/yyy 指定配置文件位置;
  2. 如果没有指定配置文件路径,则从默认配置文件 config/agent.config 读取;
  3. 将配置文件内容加载到 Properties;

  4. 从系统属性读取配置覆盖配置;

  5. 解析 agentArgs 覆盖配置;

  6. 将配置信息映射到 Config 类的静态属性;

  7. 必填参数验证。

定位 skywalking-agent.jar 所在目录

读取默认配置文件,以及后面加载插件都需要用到 SkyWalking agent.jar 所在目录
skywalking-agent 目录结构如下:

config 目录存放的是 agent.config 配置文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
/**
* Load the specified config file or default config file
*
* @return the config file {@link InputStream}, or null if not needEnhance.
*/
private static InputStreamReader loadConfig() throws AgentPackageNotFoundException, ConfigNotFoundException {
// System.getProperty() 读取 Java 虚拟机中的系统属性, Java 虚拟机中的系统属性在运行Java程序的时候通过 java -Dk1=v1 配置.
String specifiedConfigPath = System.getProperty(SPECIFIED_CONFIG_PATH);
// 使用指定的配置文件或默认的配置文件, AgentPackagePath.getPath() 获取 skywalking-agent.jar 所在目录
File configFile = StringUtil.isEmpty(specifiedConfigPath) ? new File(
AgentPackagePath.getPath(), DEFAULT_CONFIG_FILE_NAME) : new File(specifiedConfigPath);

if (configFile.exists() && configFile.isFile()) {
try {
LOGGER.info("Config file found in {}.", configFile);

return new InputStreamReader(new FileInputStream(configFile), StandardCharsets.UTF_8);
} catch (FileNotFoundException e) {
throw new ConfigNotFoundException("Failed to load agent.config", e);
}
}
throw new ConfigNotFoundException("Failed to load agent.config.");
}

new File(AgentPackagePath.getPath(), DEFAULT_CONFIG_FILE_NAME) 定位默认配置文件的位置,
AgentPackagePath.getPath() 方法用来获取 skywalking-agent.jar 所在目录,其中涉及到一些的字符串操作,大家对这部分感兴趣的可以看源码研究。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
/**
* AgentPackagePath is a flag and finder to locate the SkyWalking agent.jar. It gets the absolute path of the agent jar.
* The path is the required metadata for agent core looking up the plugins and toolkit activations. If the lookup
* mechanism fails, the agent will exit directly.
*/
public class AgentPackagePath {
private static final ILog LOGGER = LogManager.getLogger(AgentPackagePath.class);

private static File AGENT_PACKAGE_PATH;

public static File getPath() throws AgentPackageNotFoundException {
if (AGENT_PACKAGE_PATH == null) {
// 返回 skywalking-agent.jar 文件所在的目录 E:\develop\source\sample\source\skywalking-java\skywalking-agent
AGENT_PACKAGE_PATH = findPath();
}
return AGENT_PACKAGE_PATH;
}

public static boolean isPathFound() {
return AGENT_PACKAGE_PATH != null;
}

private static File findPath() throws AgentPackageNotFoundException {
// 将 AgentPackagePath 全类名中的.替换成 /
// org/apache/skywalking/apm/agent/core/boot/AgentPackagePath.class
String classResourcePath = AgentPackagePath.class.getName().replaceAll("\\.", "/") + ".class";
// 使用 AppClassLoader 加载资源,通常情况下 AgentPackagePath 类是被 AppClassLoader 加载的。
URL resource = ClassLoader.getSystemClassLoader().getResource(classResourcePath);
if (resource != null) {
String urlString = resource.toString();
//jar:file:/E:/source/skywalking-java/skywalking-agent/skywalking-agent.jar!/org/apache/skywalking/apm/agent/core/boot/AgentPackagePath.class
LOGGER.debug("The beacon class location is {}.", urlString);

// 判断 url 中是否包含!,如果包含则说明 AgentPackagePath.class 是包含在jar中。
int insidePathIndex = urlString.indexOf('!');
boolean isInJar = insidePathIndex > -1;

if (isInJar) {
// file:/E:/source/skywalking-java/skywalking-agent/skywalking-agent.jar
urlString = urlString.substring(urlString.indexOf("file:"), insidePathIndex);
File agentJarFile = null;
try {
// E:\source\skywalking-java\skywalking-agent\skywalking-agent.jar
agentJarFile = new File(new URL(urlString).toURI());
} catch (MalformedURLException | URISyntaxException e) {
LOGGER.error(e, "Can not locate agent jar file by url:" + urlString);
}
if (agentJarFile.exists()) {
// 返回 skywalking-agent.jar 文件所在的目录
return agentJarFile.getParentFile();
}
} else {
int prefixLength = "file:".length();
String classLocation = urlString.substring(
prefixLength, urlString.length() - classResourcePath.length());
return new File(classLocation);
}
}

LOGGER.error("Can not locate agent jar file.");
throw new AgentPackageNotFoundException("Can not locate agent jar file.");
}

}

配置优先级 Setting-override.md

Agent Options > System.Properties(-D) > System environment variables > Config file

System.getProperties() 和 System.getenv() 区别

System.getProperties() 获取 Java 虚拟机相关的系统属性(比如 java.version、 java.io.tmpdir 等),通过 java -D配置;
System.getenv() 获取系统环境变量(比如 JAVA_HOME、Path 等),通过操作系统配置。

请参考 https://www.cnblogs.com/clarke157/p/6609761.html

将配置信息映射到 Config 类

在我们的日常开发中一般是直接从 Properties 读取需要的配置项,SkyWalking Java Agent 并没有这么做,而是定义一个配置类 Config,将配置项映射到 Config 类的静态属性中,
其他地方需要配置项的时候,直接从类的静态属性获取就可以了,非常方便使用。
ConfigInitializer 就是负责将 Properties 中的 key/value 键值对映射到类(比如 Config 类)的静态属性,其中 key 对应类的静态属性,value 赋值给静态属性的值。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
/**
* This is the core config in sniffer agent.
*/
public class Config {

public static class Agent {
/**
* Namespace isolates headers in cross process propagation. The HEADER name will be `HeaderName:Namespace`.
*/
public static String NAMESPACE = "";

/**
* Service name is showed in skywalking-ui. Suggestion: set a unique name for each service, service instance
* nodes share the same code
*/
@Length(50)
public static String SERVICE_NAME = "";

// 省略部分代码....
}

public static class Collector {
/**
* Collector skywalking trace receiver service addresses.
*/
public static String BACKEND_SERVICE = "";

// 省略部分代码....
}

// 省略部分代码....

public static class Logging {
/**
* Log file name.
*/
public static String FILE_NAME = "skywalking-api.log";

/**
* Log files directory. Default is blank string, means, use "{theSkywalkingAgentJarDir}/logs " to output logs.
* {theSkywalkingAgentJarDir} is the directory where the skywalking agent jar file is located.
* <p>
* Ref to {@link WriterFactory#getLogWriter()}
*/
public static String DIR = "";
}

// 省略部分代码....

}

比如通过 agent.config 配置文件配置服务名称

1
2
# The service name in UI
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}

agent 对应 Config 类的静态内部类 Agent
service_name 对应静态内部类 Agent 的静态属性 SERVICE_NAME
SkyWalking Java Agent 在这里面使用了下划线而不是驼峰来命名配置项,将类的静态属性名称转换成下划线配置名称非常方便,直接转成小写就可以通过 Properties 获取对应的值了。

ConfigInitializer.initNextLevel 方法涉及到的技术点有反射、递归调用、栈等,更多实现细节参考 ConfigInitializer 类源码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
/**
* Init a class's static fields by a {@link Properties}, including static fields and static inner classes.
* <p>
*/
public class ConfigInitializer {
public static void initialize(Properties properties, Class<?> rootConfigType) throws IllegalAccessException {
initNextLevel(properties, rootConfigType, new ConfigDesc());
}

private static void initNextLevel(Properties properties, Class<?> recentConfigType,
ConfigDesc parentDesc) throws IllegalArgumentException, IllegalAccessException {
for (Field field : recentConfigType.getFields()) {
if (Modifier.isPublic(field.getModifiers()) && Modifier.isStatic(field.getModifiers())) {
String configKey = (parentDesc + "." + field.getName()).toLowerCase();
Class<?> type = field.getType();

if (type.equals(Map.class)) {
/*
* Map config format is, config_key[map_key]=map_value
* Such as plugin.opgroup.resttemplate.rule[abc]=/url/path
*/
// Deduct two generic types of the map
ParameterizedType genericType = (ParameterizedType) field.getGenericType();
Type[] argumentTypes = genericType.getActualTypeArguments();

Type keyType = null;
Type valueType = null;
if (argumentTypes != null && argumentTypes.length == 2) {
// Get key type and value type of the map
keyType = argumentTypes[0];
valueType = argumentTypes[1];
}
Map map = (Map) field.get(null);
// Set the map from config key and properties
setForMapType(configKey, map, properties, keyType, valueType);
} else {
/*
* Others typical field type
*/
String value = properties.getProperty(configKey);
// Convert the value into real type
final Length lengthDefine = field.getAnnotation(Length.class);
if (lengthDefine != null) {
if (value != null && value.length() > lengthDefine.value()) {
value = value.substring(0, lengthDefine.value());
}
}
Object convertedValue = convertToTypicalType(type, value);
if (convertedValue != null) {
// 通过反射给静态属性设置值
field.set(null, convertedValue);
}
}
}
}
// recentConfigType.getClasses() 获取 public 的 classes 和 interfaces
for (Class<?> innerConfiguration : recentConfigType.getClasses()) {
// parentDesc 将类(接口)名入栈
parentDesc.append(innerConfiguration.getSimpleName());
// 递归调用
initNextLevel(properties, innerConfiguration, parentDesc);
// parentDesc 将类(接口)名出栈
parentDesc.removeLastDesc();
}
}

// 省略部分代码....
}

class ConfigDesc {
private LinkedList<String> descs = new LinkedList<>();

void append(String currentDesc) {
if (StringUtil.isNotEmpty(currentDesc)) {
descs.addLast(currentDesc);
}
}

void removeLastDesc() {
descs.removeLast();
}

@Override
public String toString() {
return String.join(".", descs);
}
}

阅读源码过程对部分代码添加的注释已经提交到 GitHub 上了,我就不在此贴太多代码了,具体参见 https://github.com/geekymv/skywalking-java/tree/v8.8.0-annotated

skywalking-java

发表于 2021-12-29

基于 SkyWalking Java Agent 8.8.0 版本

SkyWalkingAgent 类是 SkyWalking Java Agent 的入口 premain 方法所在类,今天我们要分析的不是 premain 方法,而是任何一个应用程序都需要的日志框架,SkyWalking Java Agent 并没有依赖现有的日志框架如 log4j 之类的,而是自己实现了一套。

1
2
3
4
5
6
7
8
9
10
11
12
13
/**
* The main entrance of sky-walking agent, based on javaagent mechanism.
*/
public class SkyWalkingAgent {
private static ILog LOGGER = LogManager.getLogger(SkyWalkingAgent.class);

/**
* Main entrance. Use byte-buddy transform to enhance all classes, which define in plugins.
*/
public static void premain(String agentArgs, Instrumentation instrumentation) throws PluginException {
// 省略部分代码....
}
}

SkyWalkingAgent 类的第一行代码就是自己实现的日志组件,看起来和其他常用的日志组件的使用方式没什么区别。

1
private static ILog LOGGER = LogManager.getLogger(SkyWalkingAgent.class);

首先通过日志管理器 LogManager 获取 ILog 接口的一个具体实现,这是典型的Java多态 - 父类(接口)的引用指向子类的实现。

ILog 接口

ILog 接口提供了我们常用的打印日志的方法,定义了一套日志使用规范。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
/**
* The Log interface. It's very easy to understand, like any other log-component. Do just like log4j or log4j2 does.
* <p>
*/
public interface ILog {
void info(String format);

void info(String format, Object... arguments);

void info(Throwable t, String format, Object... arguments);

void debug(String format);

void debug(String format, Object... arguments);

// 省略部分代码....
}

LogManager

我们看下 LogManager 类的具体实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
/**
* LogManager is the {@link LogResolver} implementation manager. By using {@link LogResolver}, {@link
* LogManager#getLogger(Class)} returns a {@link ILog} implementation. This module use this class as the main entrance,
* and block the implementation detail about log-component. In different modules, like server or sniffer, it will use
* different implementations.
*
* <p> If no {@link LogResolver} is registered, return {@link NoopLogger#INSTANCE} to avoid
* {@link NullPointerException}. If {@link LogManager#setLogResolver(LogResolver)} is called twice, the second will
* override the first without any warning or exception.
*
* <p> Created by xin on 2016/11/10.
*/
public class LogManager {
private static LogResolver RESOLVER = new PatternLogResolver();

public static void setLogResolver(LogResolver resolver) {
LogManager.RESOLVER = resolver;
}

public static ILog getLogger(Class<?> clazz) {
if (RESOLVER == null) {
return NoopLogger.INSTANCE;
}
return LogManager.RESOLVER.getLogger(clazz);
}

public static ILog getLogger(String clazz) {
if (RESOLVER == null) {
return NoopLogger.INSTANCE;
}
return LogManager.RESOLVER.getLogger(clazz);
}
}

LogManager 类是 LogResolver 实现类的管理器,通过使用 LogResolver,LogManager.getLogger(Class) 返回了 ILog 接口的一个实现,LogManager 类是日志组件的主要入口,内部封装了日志组件的实现细节。

LogManager 内部提供了setLogResolver 方法用于注册指定的 LogResolver,如果设置 LogResolver 为 null,则返回 NoopLogger 实例。

LogResolver 只是返回 ILog 接口的实现

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
/**
* {@link LogResolver} just do only one thing: return the {@link ILog} implementation.
* <p>
*/
public interface LogResolver {
/**
* @param clazz the class is showed in log message.
* @return {@link ILog} implementation.
*/
ILog getLogger(Class<?> clazz);

/**
* @param clazz the class is showed in log message.
* @return {@link ILog} implementation.
*/
ILog getLogger(String clazz);
}

LogResolver 接口目前提供了2个实现类:PatternLogResolver 、JsonLogResolver 分别返回 PatternLogger 和 JsonLogger。

ILog 接口的实现类

  • NoopLogger 枚举
    NoopLogger 直接继承了 ILog,NoopLogger 只是实现了 ILog 接口,所有方法都是空实现,NoopLogger 存在的意义是为了防止 NullPointerException,因为调用者可以通过 LogManager 的 setLogResolver 方法设置不同的日志解析器 LogResolver,如果为null,则返回 ILog 接口的默认实现 NoopLogger。

  • AbstractLogger 抽象类

    • PatternLogger
    • JsonLogger

AbstractLogger

AbstractLogger 抽象类是为了简化 ILog 接口的具体实现,主要功能:

  1. 它持有logger类名 targetClass;
  2. 负责日志级别检查;
  3. 解析用户输入的 message,将{}替换为对应的参数值;
  4. 提供格式化日志内容的抽象方法,需要具体子类实现,目前支持 pattern 和 json 两种,默认为pattern。

输出的日志格式

1
%level %timestamp %thread %class : %msg %throwable

每一项代表的意义如下:

1
2
3
4
5
6
7
%level 日志级别.
%timestamp 时间戳 yyyy-MM-dd HH:mm:ss:SSS 格式.
%thread 线程名.
%msg 用户指定的日志信息.
%class 类名.
%throwable 异常信息.
%agent_name agent名称.

对于日志格式中的每一项, SkyWalking Java Agent 分别提供了对应的转换器解析,比如 ThreadConverter、LevelConverter 等。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/**
* An abstract class to simplify the real implementation of the loggers.
* It hold the class name of the logger, and is responsible for log level check,
* message interpolation, etc.
*/
public abstract class AbstractLogger implements ILog {
public static final Map<String, Class<? extends Converter>> DEFAULT_CONVERTER_MAP = new HashMap<>();
protected List<Converter> converters = new ArrayList<>();

static {
DEFAULT_CONVERTER_MAP.put("thread", ThreadConverter.class);
DEFAULT_CONVERTER_MAP.put("level", LevelConverter.class);
DEFAULT_CONVERTER_MAP.put("agent_name", AgentNameConverter.class);
DEFAULT_CONVERTER_MAP.put("timestamp", DateConverter.class);
DEFAULT_CONVERTER_MAP.put("msg", MessageConverter.class);
DEFAULT_CONVERTER_MAP.put("throwable", ThrowableConverter.class);
DEFAULT_CONVERTER_MAP.put("class", ClassConverter.class);
}

// 省略部分代码....
}

比如 ThreadConverter 类用于解析 %thread 输出当前线程的 name。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/**
* Just return the Thread.currentThread().getName()
*/
public class ThreadConverter implements Converter {
@Override
public String convert(LogEvent logEvent) {
return Thread.currentThread().getName();
}

@Override
public String getKey() {
return "thread";
}
}

日志格式化 format

AbstractLogger 类实现了 ILog 接口,实现了打印日志方法,最终都调用了下面这个通用的方法

1
2
3
protected void logger(LogLevel level, String message, Throwable e) {
WriterFactory.getLogWriter().write(this.format(level, message, e));
}

其中 logger 方法内部调用了 format 方法,format 是一个抽象方法,留给子类去实现日志输出的格式,返回字符串的字符串将输出到文件或标准输出。

1
2
3
4
5
6
7
8
9
10
11
 /**
* The abstract method left for real loggers.
* Any implementation MUST return string, which will be directly transferred to log destination,
* i.e. log files OR stdout
*
* @param level log level
* @param message log message, which has been interpolated with user-defined parameters.
* @param e throwable if exists
* @return string representation of the log, for example, raw json string for {@link JsonLogger}
*/
protected abstract String format(LogLevel level, String message, Throwable e);

抽象类对接口中的方法做了实现,每个实现中都调用了一个抽象方法,这个抽象方法让子类来实现具体的业务逻辑。

下面是 AbstractLogger 抽象类其中一个实现类 PatternLogger 类对 format 方法的实现,调用转换器相应的 Converter ,将日志拼接成字符串。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
@Override
protected String format(LogLevel level, String message, Throwable t) {
LogEvent logEvent = new LogEvent(level, message, t, targetClass);
StringBuilder stringBuilder = new StringBuilder();
for (Converter converter : this.converters) {
stringBuilder.append(converter.convert(logEvent));
}
return stringBuilder.toString();
}

public void setPattern(String pattern) {
if (StringUtil.isEmpty(pattern)) {
pattern = DEFAULT_PATTERN;
}
this.pattern = pattern;
this.converters = new Parser(pattern, DEFAULT_CONVERTER_MAP).parse();
}

PatternLogger 将日志格式 pattern 转换成对应的转换器 Converter 是在 PatternLogger.setPattern 方法实现的,内部调用了 Parser.parse 方法,具体实现参见 Parser 类源码。

WriterFactory

日志已经拼装完毕,接下来就该将日志内容输出到指定的位置,WriterFactory 工厂类就是负责创建 IWriter 接口的实现类,用于将日志信息写入到目的地。

1
2
3
public interface IWriter {
void write(String message);
}

IWriter 接口有2种实现 FileWriter 和 SystemOutWriter

  • FileWriter:使用一个阻塞队列 ArrayBlockingQueue 作为缓冲,线程池 ScheduledExecutorService 异步从队列中取出日志信息,使用 FileOutputStream 将日志信息写入日志文件中,典型的生产者-消费者模式;

  • SystemOutWriter:将日志信息输出到控制台;

FileWriter 工作原理

下面我们看下 FileWriter 是如何将日志写入日志文件的

  • FileWriter 负责将日志信息写入 ArrayBlockingQueue 队列中

  • ScheduledExecutorService 定时任务
    a)每秒从 ArrayBlockingQueue 取出所有日志信息,写入到文件中;
    b)判断文件大小是否超过最大值Config.Logging.MAX_FILE_SIZE;
    c)如果超过最大值将当前文件重命名。

1
2
3
4
5
6
7
8
9
/**
* Write log to the queue. W/ performance trade off.
*
* @param message to log
*/
@Override
public void write(String message) {
logBuffer.offer(message);
}

10_write

日志组件涉及到的设计模式:

  • 工厂模式
  • 单例模式(懒汉模式、枚举类)
  • 生产者-消费者模式

好了,今天的内容就到这里了,看完之后自己是否也能模仿写一个日志组件呢

SkyWalking Java Agent 源码

jvm-gc

发表于 2021-11-26

垃圾收集器

Java堆内存分为新生代和老年代,新生代主要存储短生命周期的对象,适合使用复制算法进行垃圾回收;老年代主要存储长生命周期的对象,适合使用标记整理算法进行垃圾回收。

JVM针对新生代和老年代分别提供了不同的垃圾收集器。新生代垃圾收集器有Serial、ParNew 和 Parallel Scavenge,老年代垃圾收集器有 Serial Old、Parallel Old、CMS,还有针对不同区域的G1分区收集算法。

新生代垃圾收集器

  • Serial
    单线程,复制算法,简单高效,没有线程交互开销。
    垃圾收集时,必须暂停其他所有用户线程(Stop The World),直到它收集结束。
    Java 虚拟机在 Client 模式下新生代的默认垃圾收集器。

  • ParNew
    多线程,复制算法。ParNew 垃圾收集器是 Serial 垃圾收集器的多线程实现,它采用多线程模式工作,其他和 Serial 几乎一样。
    除了 Serial 收集器外,目前只有它能与 CMS 收集器配合工作。

  • Parallel Scavenge
    多线程,复制算法,Java8 默认的垃圾收集器。
    吞吐量优先的垃圾收集器,主要适合在后台运算而不需要太多交互的分析任务。

    自适应调节策略(GC Ergonomics),不需要指定新生代大小(-Xmn)、Eden 与 Survivor 的比例(-XX:SurvivorRatio)、晋升老年代对象大小(-XX:PretenureSizeThreshold)等细节参数,虚拟机会动态调整这些参数以提供最合适的停顿时间或者最大吞吐量。

1
Stop The World (STW)由虚拟机在后台自动发起自动完成的,在用户不可知、不可控的情况下把用户的正常用户线程全部停掉,这对很多应用来说是不可接受的。

老年代垃圾收集器

  • Serial Old
    单线程,标记整理算法
    Java 虚拟机在 Client 模式下老年代的默认垃圾收集器。
  • Parallel Old
    多线程,标记整理算法,Parallel Scavenge 的老年代版本。
    Parallel Old 垃圾收集器优先考虑系统吞吐量,其次考虑停顿时间等因素;
    Parallel Old 在JDK6开始提供,
    新生代 Parallel Scavenge + 老年代 Parallel Old 配合使用。
  • CMS(Concurrent Mark Sweep)
    第一款实际意义上支持并发的垃圾收集器,它首次实现了让垃圾收集线程和用户线程(基本上)同时工作。

    标记清除算法,关注点是尽可能缩短用户线程的停顿时间,停顿时间越短越适合需要与用户交互或需要保证服务响应质量的程序,适合互联网应用的服务端。

    ①初始标记(CMS initial mark):只标记和 GC Roots 能直接关联到的对象,速度很快,需要暂停所有用户线程;

    ②并发标记(CMS concurrent mark):从 GC Roots 的直接关联对象开始遍历整个对象图的过程,耗时较长单不需要暂停用户线程;
    ③重新标记(CMS remark):在并发标记过程中用户线程继续运行,导致垃圾收集过程中部分对象的状态发生了变化,为了确保这部分对象的状态的正确性,需要重新标记并暂停用户线程。

    ④并发清除(CMS concurrent sweep):和用户线程一起工作,清理删除掉标记阶段判断的可以回收的对象,不需要暂停用户线程。

    初始标记和重新标记阶段会暂停用户线程,但速度很快;
    并发标记和并发清除不需要暂停用户线程,有效地缩短了垃圾回收时系统的停顿时间;
    从总体上来说,CMS收集器回收过程是与用户线程一起并发执行的。

    1
    2
    -XX:+UseConcMarkSweepGC
    老年代使用CMS,新生代使用ParNew(-XX:+/-UseParNewGC来强制指定或者禁用它),新生代也可以使用 Serial 收集器。

要是CMS运行

G1(Garbage First)
多线程标记整理算法。

G1 是一个面向全堆的收集器,不需要其他新生代收集器的配合工作,自JDK9开始,ParNew + CMS 收集器的组合不再是官方推荐的服务端模式下的收集器解决方案了,官方希望它能完全被 G1 所取代,甚至取消了 ParNew + Serial Old 以及 Serial + CMS 组合了的支持(很少有人这么用)。只剩下 ParNew + CMS 互相搭配使用。

相较于CMS垃圾收集器,G1收集器的两个突出的改进:

  • 基于标记整理算法,不产生内存碎片;
  • 可以精确地控制停顿时间,在不牺牲吐吞量的前提下,实现短停顿的垃圾回收。

垃圾收集器中的并行与并发

并行(Parallel):并行描述的是多条垃圾收集器线程之间的关系,说明同一时间有多条这样的线程在协同工作,通常默认此时用户线程是处理等待状态,程序无法响应服务请求。

并发(Concurrent):并发描述的是垃圾收集器线程与用户线程之间的关系,说明同一时间垃圾收集器线程与用户线程都在运行。用户线程仍在运行,所以程序仍然能响应服务请求,由于垃圾收集器线程占用了一部分系统资源,此时应用程序的处理的吞吐量将受到一定的影响。

perfect-code

发表于 2021-10-12

com.netflix.discovery.shared.resolver.ResolverUtils 类

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
/**
* Randomize server list.
* 随机化list
* @return a copy of the original list with elements in the random order
*/
public static <T extends EurekaEndpoint> List<T> randomize(List<T> list) {
// 新创建一个List是为了不改变原有的list
List<T> randomList = new ArrayList<>(list);
if (randomList.size() < 2) {
return randomList;
}
// 洗牌操作
Collections.shuffle(randomList,ThreadLocalRandom.current());
return randomList;
}

com.netflix.discovery.DiscoveryClient 类
创建线程池

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// default size of 2 - 1 each for heartbeat and cacheRefresh
scheduler = Executors.newScheduledThreadPool(2,
new ThreadFactoryBuilder()
.setNameFormat("DiscoveryClient-%d")
.setDaemon(true)
.build());

heartbeatExecutor = new ThreadPoolExecutor(
1, clientConfig.getHeartbeatExecutorThreadPoolSize(), 0, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>(),
new ThreadFactoryBuilder()
.setNameFormat("DiscoveryClient-HeartbeatExecutor-%d")
.setDaemon(true)
.build()
); // use direct handoff

cacheRefreshExecutor = new ThreadPoolExecutor(
1, clientConfig.getCacheRefreshExecutorThreadPoolSize(), 0, TimeUnit.SECONDS,
new SynchronousQueue<Runnable>(),
new ThreadFactoryBuilder()
.setNameFormat("DiscoveryClient-CacheRefreshExecutor-%d")
.setDaemon(true)
.build()
); // use direct handoff

Eureka 每30s 执行一次renew操作

EurekaHttpClient 接口

threadpoolexecutor-javadoc

发表于 2021-09-15

ThreadPoolExecutor Javadoc解读

An ExecutorService that executes each submitted task using one of possibly several pooled threads, normally configured using Executors factory methods.

ExecutorService 使用池线程执行提交的任务

Thread pools address two different problems:
they usually provide improved performance when executing large numbers of asynchronous tasks, due to reduced per-task invocation overhead, and they provide a means of bounding and managing the resources, including threads, consumed when executing a collection of tasks.
Each ThreadPoolExecutor also maintains some basic statistics, such as the number of completed tasks.
线程池解决了两个问题:
在执行大量异步任务时提供更好的性能,因为每个任务调用开销减少了;
提供了限制和管理资源的方法,包括执行任务消耗的线程。

To be useful across a wide range of contexts, this class provides many adjustable parameters and extensibility hooks. However, programmers are urged to use the more convenient Executors factory methods
Executors.newCachedThreadPool (unbounded thread pool, with automatic thread reclamation),
Executors.newFixedThreadPool (fixed size thread pool) and Executors.newSingleThreadExecutor (single background thread), that preconfigure settings for the most common usage scenarios.
Otherwise, use the following guide when manually configuring and tuning this class:

ThreadPoolExecutor 提供了许多可调整的参数和扩展的钩子方法。
Executors 提供了一些创建线程池的工厂方法:
Executors.newCachedThreadPool() 无界线程池,自动回收线程
Executors.newFixedThreadPool() 固定大小的线程池
Executors.newSingleThreadExecutor() 单线程池

Core and maximum pool sizes

A ThreadPoolExecutor will automatically adjust the pool size (see getPoolSize) according to
the bounds set by corePoolSize (see getCorePoolSize) and maximumPoolSize (see getMaximumPoolSize).
When a new task is submitted in method execute(Runnable), and fewer than corePoolSize threads are running, a new thread is created to handle the request, even if other worker threads are idle.

ThreadPoolExecutor 根据 corePoolSize和 maximumPoolSize参数自动调整线程池大小。
当通过execute(Runnable) 方法提交一个新的任务,并且少于corePoolSize数量的线程在运行,会创建一个新的线程处理请求任务,即使其他工作线程空闲。
(也就是说当线程池中的线程数小于核心线程数,当有新的任务要处理时都会创建新的线程)。

If there are more than corePoolSize but less than maximumPoolSize threads running, a new thread will be created only if the queue is full. By setting corePoolSize and maximumPoolSize the same, you create a fixed-size thread pool. By setting maximumPoolSize to an essentially unbounded value such as Integer.MAX_VALUE, you allow the pool to accommodate an arbitrary number of concurrent tasks.
Most typically,core and maximum pool sizes are set only upon construction, but they may also be changed dynamically using setCorePoolSize and setMaximumPoolSize.

如果线程池中有超过核心线程数但是小于最大线程数的线程在运行,只有当队列满了才会创建新的线程。
(也就是说当线程池中的线程数等于核心线程数,对于新提交的任务会进入队列排队,直到队列满了,才会创建新的线程处理新来的任务)
通过 setCorePoolSize 和 setMaximumPoolSize 方法可以动态调整核心线程和最大线程的数量。

On-demand construction

By default, even core threads are initially created and started only when new tasks arrive,
but this can be overridden dynamically using method prestartCoreThread or prestartAllCoreThreads.
You probably want to prestart threads if you construct the pool with a non-empty
默认情况下,当新的任务到达时创建并启动核心线程,也可以使用 prestartCoreThread 或 prestartAllCoreThreads 方法提前创建线程。

Creating new threads

New threads are created using a ThreadFactory. If not otherwise specified, a Executors.defaultThreadFactory is used, that creates threads to all be in the same ThreadGroup and with the same NORM_PRIORITY priority and non-daemon status. By supplying a different ThreadFactory, you can alter the thread’s name, thread group, priority, daemon status, etc. If a ThreadFactory fails to create a thread when asked by returning null from newThread, the executor will continue, but might not be able to execute any tasks. Threads should possess the “modifyThread” RuntimePermission.
If worker threads or other threads using the pool do not possess this permission, service may be degraded:
configuration changes may not take effect in a timely manner, and a shutdown pool may remain in a state in which termination is possible but not completed.

使用ThreadFactory创建线程,默认使用 Executors的 defaultThreadFactory方法创建 DefaultThreadFactory 对象。
通过自定义 ThreadFactory 可以指定 线程名称,线程组,优先级,是否守护线程 等等。

Keep-alive times

If the pool currently has more than corePoolSize threads, excess threads will be terminated
if they have been idle for more than the keepAliveTime (see getKeepAliveTime(TimeUnit)).
This provides a means of reducing resource consumption when the pool is not being actively used.
If the pool becomes more active later, new threads will be constructed. This parameter can also be changed dynamically using method setKeepAliveTime(long, TimeUnit).
Using a value of Long.MAX_VALUE TimeUnit.NANOSECONDS effectively disables idle threads from ever terminating prior to shut down. By default, the keep-alive policy applies only when there are more than corePoolSize threads. But method allowCoreThreadTimeOut(boolean) can be used to apply this time-out policy to core threads as well, so long as the keepAliveTime value is non-zero

如果线程池中的线程数超过核心线程大小,空闲时间超过 keepAliveTime 的多余线程会被终止,
keepAliveTime的值可以通过 setKeepAliveTime(long, TimeUnit) 方法动态调整。
默认情况下,keep-alive 策略只会作用于超过核心线程数的线程, allowCoreThreadTimeOut(boolean) 方法会改变time-out策略将空闲的核心线程也回收。

Queuing

Any BlockingQueue may be used to transfer and hold submitted tasks. The use of this queue interacts with pool sizing:

  • If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing.
  • If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread.
  • If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected.

BlockingQueue 用于转换和存储提交的任务
如果正在运行的线程数小于核心线程数,Executor 会创建一个线程执行新的请求任务;
如果正在运行的线程数大于等于核心线程数,Executor 会将请求任务放入队列;
如果队列满了请求无法入队,会创建新的线程执行任务,当线程数超过maximumPoolSize,将执行任务拒绝策略 RejectedExecutionHandler。

There are three general strategies for queuing:

  • Direct handoffs.
    A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed.
    This policy avoids lockups when handling sets of requests that might have internal dependencies.
    Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks.
    This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.

  • Unbounded queues.
    Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks
    to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created.
    (And the value of the maximumPoolSize therefore doesn’t have any effect.) This may be appropriate when each task is completely independent of others,
    so tasks cannot affect each others execution; for example, in a web page server.
    While this style of queuing can be useful in smoothing out transient bursts of requests,
    it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.

  • Bounded queues.
    A bounded queue (for example, an ArrayBlockingQueue) helps prevent resource exhaustion when used with finite maximumPoolSizes,
    but can be more difficult to tune and control.
    Queue sizes and maximum pool sizes may be traded off for each other:
    Using large queues and small pools minimizes CPU usage, OS resources, and context-switching overhead, but can lead to artificially low throughput.
    If tasks frequently block (for example if they are I/O bound), a system may be able to schedule time for more threads than you otherwise allow.
    Use of small queues generally requires larger pool sizes, which keeps CPUs busier but may encounter unacceptable scheduling overhead, which also decreases throughput.

常用的3种队列

  • SynchronousQueue:不存任务,直接将任务传递给线程,如果没有线程能够处理任务,任务入队将失败。
  • LinkedBlockingQueue:无界队列,如果没有指定容量(默认容量是Integer.MAX_VALUE,也可以指定容量),当核心线程繁忙的时候,任务入队等待,只有核心线程被创建,最大线程数不起作用。
  • ArrayBlockingQueue:有界队列,可以防止资源耗尽。

Rejected tasks

New tasks submitted in method execute(Runnable) will be rejected when the Executor has been shut down,
and also when the Executor uses finite bounds for both maximum threads and work queue capacity, and is saturated.
In either case, the execute method invokes the RejectedExecutionHandler.rejectedExecution(Runnable, ThreadPoolExecutor) method of its RejectedExecutionHandler.
Four predefined handler policies are provided:

  • In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
  • In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task.
    This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
  • In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
  • In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped, and then execution is retried (which can fail again, causing this to be repeated.)
    It is possible to define and use other kinds of RejectedExecutionHandler classes.
    Doing so requires some care especially when policies are designed to work only under particular capacity or queuing policies.

新的任务提交,当Executor已经关闭的时候 或 有界队列满了并且线程达到最大线程数,会执行拒绝策略 RejectedExecutionHandler.rejectedExecution(Runnable r, ThreadPoolExecutor executor)。

JDK提供了4个拒绝策略
ThreadPoolExecutor.AbortPolicy:默认拒绝策略,抛出RejectedExecutionException异常。
ThreadPoolExecutor.CallerRunsPolicy:调用者执行,调用execute方法的线程自己执行任务,会降低任务提交的速度。
ThreadPoolExecutor.DiscardPolicy:丢弃当前任务。
ThreadPoolExecutor.DiscardOldestPolicy:丢弃队列头的任务,再次执行当前任务。

可以自定义实现其他种类的 RejectedExecutionHandler的实现,其他实现参考 https://cloud.tencent.com/developer/article/1520860

Hook methods

This class provides protected overridable beforeExecute(Thread, Runnable) and afterExecute(Runnable, Throwable) methods that are called before and after execution of each task.
These can be used to manipulate the execution environment; for example, reinitializing ThreadLocals, gathering statistics, or adding log entries. Additionally, method terminated can be overridden to perform any special processing that needs to be done once the Executor has fully terminated.
If hook or callback methods throw exceptions, internal worker threads may in turn fail and abruptly terminate.
ThreadPoolExecutor 类提供了 beforeExecute 和 afterExecute 方法在任务执行前后会执行,子类可以重写这两个方法做一些统计、日志等。

Queue maintenance

Method getQueue() allows access to the work queue for purposes of monitoring and debugging.
Use of this method for any other purpose is strongly discouraged. Two supplied methods, remove(Runnable) and purge are available to assist in storage reclamation when large numbers of queued tasks become cancelled.
getQueue() 方法允许访问工作队列,可以用于监控和调试。

Finalization

A pool that is no longer referenced in a program AND has no remaining threads will be shutdown automatically. If you would like to ensure that unreferenced pools are reclaimed even if users forget to call shutdown, then you must arrange that unused threads eventually die, by setting appropriate keep-alive times, using a lower bound of zero core threads and/or setting allowCoreThreadTimeOut(boolean).

workerCount and runState

The main pool control state, ctl, is an atomic integer packing two conceptual fields

  • workerCount, indicating the effective number of threads
  • runState, indicating whether running, shutting down etc

In order to pack them into one int, we limit workerCount to (2^29)-1 (about 500 million) threads rather than (2^31)-1 (2 billion) otherwise representable. If this is ever an issue in the future, the variable can be changed to be an AtomicLong, and the shift/mask constants below adjusted. But until the need arises, this code is a bit faster and simpler using an int.

The workerCount is the number of workers that have been permitted to start and not permitted to stop. The value may be transiently different from the actual number of live threads, for example when a ThreadFactory fails to create a thread when asked, and when exiting threads are still performing bookkeeping before terminating. The user-visible pool size is reported as the current size of the workers set.

The runState provides the main lifecycle control, taking on values:

  • RUNNING: Accept new tasks and process queued tasks
  • SHUTDOWN: Don’t accept new tasks, but process queued tasks
  • STOP: Don’t accept new tasks, don’t process queued tasks, and interrupt in-progress tasks
  • TIDYING: All tasks have terminated, workerCount is zero, the thread transitioning to state TIDYING will run the terminated() hook method
  • TERMINATED: terminated() has completed

The numerical order among these values matters, to allow ordered comparisons. The runState monotonically increases over time, but need not hit each state. The transitions are:

  • RUNNING -> SHUTDOWN On invocation of shutdown(), perhaps implicitly in finalize()
  • (RUNNING or SHUTDOWN) -> STOP On invocation of shutdownNow()
  • SHUTDOWN -> TIDYING When both queue and pool are empty
  • STOP -> TIDYING When pool is empty
  • TIDYING -> TERMINATED When the terminated() hook method has completed

Threads waiting in awaitTermination() will return when the state reaches TERMINATED.

Detecting the transition from SHUTDOWN to TIDYING is less straightforward than you’d like because the queue may become empty after non-empty and vice versa during SHUTDOWN state, but we can only terminate if, after seeing that it is empty, we see that workerCount is 0 (which sometimes entails a recheck – see below).

ThreadPoolExecutor 中的成员变量ctl 包含两部分

1
private final AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0));

workerCount: 工作线程数量
runState:线程池运行状态

RUNNING:接收新任务并处理队列里的任务
SHUTDOWN:不接受新任务,处理队列任务
STOP:不接受新任务,不处理队列任务,中断正在执行的任务
TIDYING:所有任务已经终止,workerCount是0,线程状态转换为TIDYING,将运行terminated()钩子方法
TERMINATED:terminated() 钩子方法已经完成

RUNNING -> SHUTDOWN 调用shutdown()方法
RUNNING / SHUTDOWN -> STOP 调用shutdownNow()方法
SHUTDOWN -> TIDYING 队列和线程池为空
STOP -> TIDYING 线程池为空
TIDYING -> TERMINATED terminated()钩子方法已经完成。

更多详细信息参见 java.util.concurrent.ThreadPoolExecutor 的Javadoc。

redis-cluster-tutorial

发表于 2021-09-13

Redis cluster tutorial Redis官方文档解读

This document is a gentle introduction to Redis Cluster, that does not use difficult to understand concepts of distributed systems. It provides instructions about how to setup a cluster, test, and operate it, without going into the details that are covered in the Redis Cluster specification but just describing how the system behaves from the point of view of the user.

However this tutorial tries to provide information about the availability and consistency characteristics of Redis Cluster from the point of view of the final user, stated in a simple to understand way.

Note this tutorial requires Redis version 3.0 or higher.

If you plan to run a serious Redis Cluster deployment, the more formal specification is a suggested reading, even if not strictly required. However it is a good idea to start from this document, play with Redis Cluster some time, and only later read the specification.

Redis Cluster 101

Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes.

Redis Cluster also provides some degree of availability during partitions, that is in practical terms the ability to continue the operations when some nodes fail or are not able to communicate. However the cluster stops to operate in the event of larger failures (for example when the majority of masters are unavailable).

So in practical terms, what do you get with Redis Cluster?

  • The ability to automatically split your dataset among multiple nodes.
  • The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.
1
2
数据自动分片到多个节点
提供一定程度的可用性,部分节点不可用时,客户端可以继续操作

Redis Cluster TCP ports

Every Redis Cluster node requires two TCP connections open. The normal Redis TCP port used to serve clients, for example 6379, plus the port obtained by adding 10000 to the data port, so 16379 in the example.

This second high port is used for the Cluster bus, that is a node-to-node communication channel using a binary protocol. The Cluster bus is used by nodes for failure detection, configuration update, failover authorization and so forth. Clients should never try to communicate with the cluster bus port, but always with the normal Redis command port, however make sure you open both ports in your firewall, otherwise Redis cluster nodes will be not able to communicate.

The command port and cluster bus port offset is fixed and is always 10000.

Note that for a Redis Cluster to work properly you need, for each node:

  1. The normal client communication port (usually 6379) used to communicate with clients to be open to all the clients that need to reach the cluster, plus all the other cluster nodes (that use the client port for keys migrations).
  2. The cluster bus port (the client port + 10000) must be reachable from all the other cluster nodes.

If you don’t open both TCP ports, your cluster will not work as expected.

The cluster bus uses a different, binary protocol, for node to node data exchange, which is more suited to exchange information between nodes using little bandwidth and processing time.

1
每一个Redis Cluster节点有2个端口,命令端口(command port)比如6379,用于服务客户端,集群总线端口(cluster bus port) 等于命令端口加上10000,即16379,用于节点之间数据交换。

Redis Cluster and Docker

Currently Redis Cluster does not support NATted environments and in general environments where IP addresses or TCP ports are remapped.

Docker uses a technique called port mapping: programs running inside Docker containers may be exposed with a different port compared to the one the program believes to be using. This is useful in order to run multiple containers using the same ports, at the same time, in the same server.

In order to make Docker compatible with Redis Cluster you need to use the host networking mode of Docker. Please check the --net=host option in the Docker documentation for more information.

Redis Cluster data sharding

Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot.

There are 16384 hash slots in Redis Cluster, and to compute what is the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.

Every node in a Redis Cluster is responsible for a subset of the hash slots, so for example you may have a cluster with 3 nodes, where:

  • Node A contains hash slots from 0 to 5500.
  • Node B contains hash slots from 5501 to 11000.
  • Node C contains hash slots from 11001 to 16383.

This allows to add and remove nodes in the cluster easily. For example if I want to add a new node D, I need to move some hash slot from nodes A, B, C to D. Similarly if I want to remove node A from the cluster I can just move the hash slots served by A to B and C. When the node A will be empty I can remove it from the cluster completely.

Because moving hash slots from a node to another does not require to stop operations, adding and removing nodes, or changing the percentage of hash slots hold by nodes, does not require any downtime.

Redis Cluster supports multiple key operations as long as all the keys involved into a single command execution (or whole transaction, or Lua script execution) all belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a concept called hash tags.

Hash tags are documented in the Redis Cluster specification, but the gist is that if there is a substring between {} brackets in a key, only what is inside the string is hashed, so for example this{foo}key and another{foo}key are guaranteed to be in the same hash slot, and can be used together in a command with multiple keys as arguments.

1
2
3
4
5
6
7
8
9
10
11
12
Redis Cluster 没有使用一致性Hash算法,而是使用Hash slot 实现数据分片,一个集群共有16384个hash slots。
对于给定key的hash slot = CRC16(key) % 16384。

每个节点包含部分hash slots,可以添加和删除节点,比如集群有3个节点,其中
节点A包含的hash slots 从 0 到 5500
节点B包含的hash slots 从 5501 到 11000
节点C包含的hash slots 从 11001 到 16383

如果想要添加一个新的节点D,那么可以从A,B,C 节点移动一些hash slots到节点D。删除节点需要先将节点中的hash slots 移到其他的节点中。
从一个节点到另一个节点移动hash slots 无需停机。

Hash tags 可以将多个不同的key存储到同一个hash slot上。

Redis Cluster master-slave model

In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slaves nodes).

In our example cluster with nodes A, B, C, if node B fails the cluster is not able to continue, since we no longer have a way to serve hash slots in the range 5501-11000.

However when the cluster is created (or at a later time) we add a slave node to every master, so that the final cluster is composed of A, B, C that are master nodes, and A1, B1, C1 that are slave nodes. This way, the system is able to continue if node B fails.

Node B1 replicates B, and B fails, the cluster will promote node B1 as the new master and will continue to operate correctly.

However, note that if nodes B and B1 fail at the same time, Redis Cluster is not able to continue to operate.

1
2
Redis Cluster 使用主从模式,每个hash slot 有1~N个副本,其中1个master和 N-1个 slaves
当Master出现故障,集群会将其中一个从节点提升为新的Master,继续提供服务。

kafka-config-parse

发表于 2021-08-17

tcpdump排查线上接口请求问题

发表于 2021-08-13

是的,线上环境出问题了,调用第三方的接口出现服务端响应状态码401,于是赶紧查询HTTP Code 401代表啥意思,于是找到了这篇文章

http常见的状态码,400,401,403状态码分别代表什么?

1
401 unauthorized,表示发送的请求需要有通过 HTTP 认证的认证信息

401是服务端响应的状态码,根据接口文档在请求header中添加 X_API_KEY用于接口验证,代码中也确实这么实现的。

而且同样的接口本地发送请求没有问题,这里是使用Hutool(3.3.2版本)工具包发送HTTP请求,调用第三方接口抓取数据。

于是想到了抓包看看,项目代码是部署在Linux服务器(IP是192.168.0.211)上的,无法使用Wireshark之类图形化工具,于是使用tcpdump命令去抓包。

1
2
3
4
5
6
7
tcpdump -i eth0 tcp port 80 and host 192.168.67.206 -w /tmp/httptool-206.pcap

-i eth0 指定网卡
tcp 指定协议
port 80 指定端口
host 192.168.67.206 指定ip,表示抓取192.168.67.206的主机收到和发出的数据包
-w 将抓包信息写入文件

将抓的数据包传输到本地使用Wireshark打开如下

第4行就是发出的HTTP GET请求,注意下这里发出的请求header中携带了cookie信息,而代码中并没有去设置cookie,那么这个cookie是怎么来的呢?于是先将这个cookie在本地代码中显示设置,在本地调试下,果然出现了 401 Unauthorized 异常,可能就是这个cookie导致的问题。

决定看下Hutool工具包中HttpRequest类实现源码是如何自动设置cookie的。

我们的业务代码

1
2
3
4
String result = HttpRequest.get(url) // 设置请求url
.header(X_API_KEY, apiKey) // 设置header
.timeout(TIME_OUT) // 设置超时时间
.execute().body();

上面都是设置请求需要的参数,看下HttpRequest中的execute() 方法

1
2
3
4
5
6
7
8
/**
* 执行Reuqest请求
*
* @return this
*/
public HttpResponse execute() {
return this.execute(false);
}

继续跟踪

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/**
* 执行Reuqest请求
*
* @param isAsync 是否异步
* @return this
*/
public HttpResponse execute(boolean isAsync) {
//初始化URL
urlWithParamIfGet();
// 初始化 connection
initConnecton();

// 发送请求
send();

//手动实现重定向
HttpResponse httpResponse = sendRedirectIfPosible();

// 获取响应
if(null == httpResponse){
httpResponse = new HttpResponse(this.httpConnection, this.charset, isAsync, isIgnoreResponseBody());
}
return httpResponse;
}

进入到 initConnecton() 方法

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/**
* 初始化网络连接
*/
private void initConnecton(){
// 初始化 connection
this.httpConnection = HttpConnection
.create(this.url, this.method, this.hostnameVerifier, this.ssf, this.timeout, this.proxy)
.header(this.headers, true); // 覆盖默认Header

//自定义Cookie
if(null != this.cookie){
this.httpConnection.setCookie(this.cookie);
}

//是否禁用缓存
if(this.isDisableCache){
this.httpConnection.disableCache();
}

//定义转发
this.httpConnection.setInstanceFollowRedirects(maxRedirectCount > 0 ? true : false);
}

this.httpConnection.setCookie(this.cookie); 可以看到如果我们显示指定了cookie,这里会通过 HttpConnection 中的 setCookie 方法进行设置

1
2
3
4
5
6
7
8
9
10
11
12
13
/**
* 设置Cookie
*
* @param cookie Cookie
* @return this
*/
public HttpConnection setCookie(String cookie) {
if (cookie != null) {
log.debug("Cookie: {}", cookie);
header(Header.COOKIE, cookie, true);
}
return this;
}

这里我们在代码中并没有指定cookie,那么代码中是否在其他地方调动了这个方法呢。

于是在setCookie 方法中打个断点,运行代码调试下看看,从IDEA中的Frames窗口中可以定位到调用setCookie 方法的地方,果然在 HttpConnection 的 initConn 方法中会调用setCookie 方法,从 CookiePool 中根据url里的host获取cookie。

我们看下 CookiePool 这个类,该类内部为了一个静态的Map,key是host, value是cookies字符串,CookiePool 用于模拟浏览器的Cookie,当访问后站点,记录Cookie,下次再访问这个站点时,一并提交Cookie到站点。也就是说以后的请求都会携带这个cookie。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
package com.xiaoleilu.hutool.http;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
*Cookie池。此池针对所有HTTP请求可用。<br>
*此Cookie池用于模拟浏览器的Cookie,当访问后站点,记录Cookie,下次再访问这个站点时,一并提交Cookie到站点。
* @author Looly
*
*/
public class CookiePool {

//key: host, value: cookies字符串
private static Map<String, String> cookies = new ConcurrentHashMap<String, String>();

/**
* 获得某个网站的Cookie信息
* @param host 网站Host
* @return Cookie字符串
*/
public static String get(String host) {
return cookies.get(host);
}

/**
* 将某个网站的Cookie放入Cookie池
* @param host 网站Host
* @param cookie Cookie字符串
*/
public static void put(String host, String cookie) {
cookies.put(host, cookie);
}

/**
* 清空Cookie
* @since 3.0.7
*/
public static void clear(){
cookies.clear();
}
}

那么这个cookie是从哪里来的呢?继续看下 CookiePool 中的 put 方法在哪些地方被调用了,在 HttpConnection 类中找到了 storeCookie 方法

1
2
3
4
5
6
7
8
9
10
/**
* 存储服务器返回的Cookie到本地
*/
private void storeCookie() {
final String setCookie = header(Header.SET_COOKIE);
if (StrUtil.isBlank(setCookie) == false) {
log.debug("Set cookie: [{}]", setCookie);
CookiePool.put(url.getHost(), setCookie);
}
}

在HttpRequest中的 execute() 方法发送请求之后,获取响应数据的时候会调用 httpConnection.getInputStream(),获取服务端返回的信息时,从响应头中提取Set-Cookie字段的值,保存到CookiePool中。

原来是我们大部分的接口都是根据第三方接口,通过在请求header中添加 X_API_KEY用于接口验证,而有一个接口第三方并没有提供,于是我们通过模拟登录的方式登录到网站来抓取数据,就是在调动登录接口的时候,第三方服务端在响应中返回了 Set-Cookie 信息,而Hutool工具会从响应中提取 Set-Cookie信息保存在CookiePool 中,并在后续请求中携带这个cookie。

第三方网站登录接口返回的cookie

至此,问题已经定位到了,既然我们不想要这个cookie,那么可以在模拟登录调用第三方接口之后,调用CookiePool 中的put方法将host对应的cookie重置为null,这样同一个host的其他请求就不会携带cookie了。

1
2
// 清除cookie
CookiePool.put(ip, null);

总结,本文通过tcpdump抓包工具,查看完整的HTTP请求,分析了Hutool工具发送HTTP请求过程的源码,最终定位并解决了问题。

程序开多少线程合适

发表于 2021-07-26

CPU密集型应用

CPU密集型程序,一个完整请求的I/O操作可以在很短的时间内完成,CPU还有很多运算要处理,也就是说CPU运算的比例占很大一部分,线程等待时间接近0。
对于单核CPU处理CPU密集型程序,这种情况不太适合使用多线程,对于多核CPU处理CPU密集型程序,我们完全可以最大化的利用CPU核心数,应用并发编程提高效率。
CPU密集型程序的最佳线程数就是:理论上线程数量 = CPU核数(逻辑),但是实际上数量一般会设置为CPU核数(逻辑)+ 1(经验值),
CPU密集型的线程恰好在某时因为发生一个页错误或者因其他原因而暂停,刚好有一个“额外”的线程,可以确保在这种情况下CPU周期不会中断工作。

IO密集型应用

与CPU密集型程序相对,一个完整请求的CPU运算操作完成之后还有很多I/O操作要做,也就是说I/O操作占比很大部分,等待时间较长,线程等待时间所在比例越高,需要越多线程;
线程CPU时间所占比例越高,需要越少线程。
1、I/O密集型程序的最佳线程数就是:最佳线程数 = CPU核心数(1 / CPU 利用率) = CPU核心数(1 + (I/O耗时 / CPU耗时));
2、如果几乎全是I/O耗时, 那么CPU耗时就无限趋近于0,所以纯理论就可以说是2N(N=CPU核数),当然也有说2N+1的,1应该是backup。
3、一般我们说2N+1即可。

参考资料 https://www.bilibili.com/video/BV1aB4y1N7NR?p=139

12…11

geekymv

110 日志
8 分类
23 标签
© 2022 geekymv
由 Hexo 强力驱动
| 总访问量次 | 总访客人 |
主题 — NexT.Muse v5.1.4