# Elastic Stack

image-20240207173153219

The Elastic Stack consists of Elasticsearch, Kibana, Beats, and Logstash (also known as the ELK Stack).

Elasticsearch:
Often abbreviated to ES, Elasticsearch is an open-source, highly scalable, distributed full-text search engine and the core of the entire Elastic Stack. It stores and retrieves data in near real time, scales out well (to hundreds of servers), and can handle petabytes of data.
Kibana:
A free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack.
You can do anything from tracking query load to understanding how requests flow through your applications.
Beats:
A free and open platform for single-purpose data shippers. They send data from hundreds or thousands of machines and systems to Logstash or Elasticsearch.
Logstash:
A free and open server-side data processing pipeline that ingests data from multiple sources, transforms it, and then sends it to your favorite "stash".

image-20240207173840701

Data flow: source layer (nginx, tomcat) -> collection layer (Filebeat) -> storage layer (Elasticsearch)

image-20240207174025791

Data flow: source layer (nginx, tomcat) -> collection/transformation layer (Logstash) -> storage layer (Elasticsearch).

image-20240207174116591

Data flow: source layer (nginx, tomcat) -> collection (Filebeat) -> transformation layer (Logstash) -> storage layer (Elasticsearch).

image-20240207174356843

Data flow: source layer (nginx, tomcat) -> collection (Filebeat) -> buffering layer (Kafka) -> transformation layer (Logstash) -> storage layer (Elasticsearch).

image-20240207174651850

# Choosing between Elasticsearch and Solr

Pros and cons of Lucene:

Pros:

Arguably the most advanced, best-performing, and most feature-complete search engine library (framework) available.

Cons:

(1) It can only be used in Java projects and must be embedded directly as a jar.
(2) It is complex to use; you need a solid understanding of information retrieval to write indexing and search code.
(3) It does not support clustering, and index data is not synchronized across nodes (unsuitable for large projects).
(4) It scales poorly; the index lives on the same server as the application, and performance degrades as the index grows.
Notably, Elasticsearch addresses all of the Lucene shortcomings listed above.

Elasticsearch is a real-time distributed search and analytics engine that lets you work with large-scale data at unprecedented speed.
ES can be used for full-text search, structured search, and analytics, and the three can be combined.

https://www.elastic.co/cn/customers/success-stories

image-20240207195559794

Solr is the open-source enterprise search platform of the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich-document (e.g. Word, PDF) handling.
Solr is highly scalable and provides distributed search and index replication. Solr is the most popular enterprise search engine, and Solr 4 added NoSQL support.
Elasticsearch (hereafter "ES") compared with Solr:
(1) Solr relies on ZooKeeper for distributed coordination, while ES has built-in distributed coordination.
(2) Solr accepts more data formats (JSON, XML, CSV), while ES only accepts JSON.
(3) Solr offers more functionality out of the box, while ES focuses on core features and leaves advanced functionality to third-party plugins.
(4) Solr performs better than ES for "traditional search" over existing data, but is noticeably slower than ES for "real-time search" where indexes are built on the fly.
(5) Solr is a strong solution for traditional search applications, while Elasticsearch is better suited to emerging real-time search applications.

# Deployment

# Single-node ES deployment

hostnamectl set-hostname elk101

cat >> /etc/hosts << "EOF"
> 192.168.13.101 elk101
> 192.168.13.102 elk102
> 192.168.13.103 elk103
> EOF

cat << EOF >> ~/.bashrc
PS1="\[\e[37;40m\][\[\e[32;40m\]\u\[\e[37;40m\]@\h \[\e[36;40m\]\w\[\e[0m\]]\\$ "
EOF
source ~/.bashrc

vim /etc/ssh/sshd_config
GSSAPIAuthentication no
UseDNS no

sed -ri 's#^GSSAPIAuthentication yes#GSSAPIAuthentication no#g' /etc/ssh/sshd_config
sed -ri 's@^#UseDNS yes@UseDNS no@g' /etc/ssh/sshd_config
grep ^UseDNS /etc/ssh/sshd_config
grep ^GSSAPIAuthentication /etc/ssh/sshd_config


systemctl disable firewalld --now
systemctl is-enabled firewalld

getenforce
setenforce 0
sed -ri 's#^(SELINUX=)enforcing#\1disabled#' /etc/selinux/config

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa -q
ssh-copy-id 192.168.13.102
ssh-copy-id 192.168.13.103
ssh-copy-id 192.168.13.101

yum install rsync -y

// on elk101 only
vim /usr/local/sbin/data_rsync.sh
#!/bin/bash
if [ $# -ne 1 ];then
    echo "Usage: $0 /path/to/file"
    exit
fi

if [ ! -e $1 ];then
    echo "[ $1 ] dir or file not found"
    exit
fi

fullpath=`dirname $1`

basename=`basename $1`

cd $fullpath

for ((host_id=102;host_id<=103;host_id++))
do
    tput setaf 2
    echo ===== rsyncing elk${host_id}: $basename =====
    tput setaf 7
    rsync -az $basename `whoami`@elk${host_id}:$fullpath
    if [ $? -eq 0 ];then
        echo "命令执行成功"
    fi
done
chmod +x /usr/local/sbin/data_rsync.sh


[root@elk101 /tmp]$ data_rsync.sh /tmp/
===== rsyncing elk102: tmp =====
The authenticity of host 'elk102 (192.168.13.102)' can't be established.
ECDSA key fingerprint is SHA256:Amhw1/IzdvDQ+eSUVh6rwPbGbuOoSsni4RkaYT5RYNA.
ECDSA key fingerprint is MD5:00:b8:a1:41:cd:66:89:02:11:af:c7:64:fe:5d:25:b2.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'elk102' (ECDSA) to the list of known hosts.
命令执行成功
===== rsyncing elk103: tmp =====
命令执行成功

yum install vim net-tools.x86_64 ntpdate chrony.x86_64 -y

vim /etc/chrony.conf
server ntp.aliyun.com iburst
server ntp1.aliyun.com iburst

systemctl restart chronyd.service

systemctl enable chronyd --now

-------------------------------------------------------------------
// install Elasticsearch

yum install wget -y
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.3-linux-x86_64.tar.gz
wget "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.3-x86_64.rpm"
yum install -y elasticsearch-7.17.3-x86_64.rpm
systemctl start elasticsearch.service


[root@elk101 /usr/share/elasticsearch/jdk/bin]$ netstat -lntup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp6 0 0 127.0.0.1:9200 :::* LISTEN 11745/java
tcp6 0 0 ::1:9200 :::* LISTEN 11745/java
tcp6 0 0 127.0.0.1:9300 :::* LISTEN 11745/java
tcp6 0 0 ::1:9300 :::* LISTEN 11745/java

Port 9200 serves clients outside the cluster over HTTP.
Port 9300 is used for communication inside the cluster over TCP.

[root@elk101 ~]$ curl http://127.0.0.1:9200
{
"name" : "elk101",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "qW8JEryoSeqktwv8L4VOGA",
"version" : {
"number" : "7.17.3",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "5ad023604c8d7416c9eb6c0eadb62b14e766caff",
"build_date" : "2022-04-19T08:11:19.070913226Z",
"build_snapshot" : false,
"lucene_version" : "8.11.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}


vim /etc/elasticsearch/elasticsearch.yml
cluster.name: elk
node.name: elk101
network.host: 0.0.0.0
discovery.seed_hosts: ["elk101"]

ll /var/log/elasticsearch
-rw-r--r-- 1 elasticsearch elasticsearch 0 Feb 8 01:32 elk_index_indexing_slowlog.json
-rw-r--r-- 1 elasticsearch elasticsearch 0 Feb 8 01:32 elk_index_indexing_slowlog.log
-rw-r--r-- 1 elasticsearch elasticsearch 0 Feb 8 01:32 elk_index_search_slowlog.json
-rw-r--r-- 1 elasticsearch elasticsearch 0 Feb 8 01:32 elk_index_search_slowlog.log
-rw-r--r-- 1 elasticsearch elasticsearch 23194 Feb 8 01:32 elk.log


[root@elk101 /var/log/elasticsearch]$ curl http://elk101:9200
{
"name" : "elk101",
"cluster_name" : "elk",
"cluster_uuid" : "qW8JEryoSeqktwv8L4VOGA",
"version" : {
"number" : "7.17.3",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "5ad023604c8d7416c9eb6c0eadb62b14e766caff",
"build_date" : "2022-04-19T08:11:19.070913226Z",
"build_snapshot" : false,
"lucene_version" : "8.11.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}


# ES cluster deployment

yum install elasticsearch-7.17.3-x86_64.rpm -y
vim /etc/elasticsearch/elasticsearch.yml
discovery.seed_hosts: ["elk101","elk102","elk103"]
cluster.initial_master_nodes: ["elk101","elk102","elk103"]

// remove the temporary data
rm -rf /var/{lib,log}/elasticsearch/*

data_rsync.sh /etc/elasticsearch/elasticsearch.yml

systemctl restart elasticsearch

[root@elk102 ~]# curl http://192.168.13.103:9200/_cat/nodes
192.168.13.102 7 97 1 0.01 0.04 0.05 cdfhilmrstw - elk102
192.168.13.103 6 97 2 0.02 0.08 0.07 cdfhilmrstw - elk103
192.168.13.101 9 96 3 0.20 0.18 0.09 cdfhilmrstw * elk101

# Installing Kibana

wget "https://artifacts.elastic.co/downloads/kibana/kibana-7.17.3-x86_64.rpm"
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.17.3-linux-x86_64.tar.gz
yum install kibana-7.17.3-x86_64.rpm -y

vim /etc/kibana/kibana.yml
server.host: "elk101"
server.name: "elk"
elasticsearch.hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
i18n.locale: "zh-CN"

image-20240210151604168

# Installing Filebeat

wget "https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.17.3-x86_64.rpm"
wget "https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.17.3-linux-x86_64.tar.gz"
yum install filebeat-7.17.3-x86_64.rpm -y

vim /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: stdin

output.console:
pretty: true

filebeat -e -c /etc/filebeat/filebeat.yml
9999999999999999999999999999
{
"@timestamp": "2024-02-10T10:01:37.689Z",
"@metadata": {
"beat": "filebeat",
"type": "_doc",
"version": "7.17.3"
},
"host": {
"name": "elk101"
},
"agent": {
"ephemeral_id": "b92fbd99-5787-454b-a33b-8b3dc5a1bd88",
"id": "370ef800-c964-4c80-bef5-5f4b5756bdb2",
"name": "elk101",
"type": "filebeat",
"version": "7.17.3",
"hostname": "elk101"
},
"log": {
"offset": 0,
"file": {
"path": ""
}
},
"message": "9999999999999999999999999999",
"input": {
"type": "stdin"
},
"ecs": {
"version": "1.12.0"
}
}

image-20240210175557765

[root@elk101 ~]$ ll /var/lib/filebeat/registry/filebeat/
total 8
-rw------- 1 root root 902 Feb 10 20:20 log.json   # records the read timestamp and byte offset per file; editing the offset lets you control where reading resumes
-rw------- 1 root root 15 Feb 10 17:33 meta.json


// read input data from log files
vim /etc/filebeat/filebeat-log.yml
filebeat.inputs:
- type: log
paths:
- /var/log/*

output.console:
pretty: true

After deleting /var/lib/filebeat/*, the offsets are lost and the files are read again from the beginning.

filebeat.inputs:
- type: log
tags: ["shabi"]
enabled: true
paths:
- /var/log/*
- /tmp/1.json
fields: // adds a fields object (key-value pairs) to the output event
haha: "xixixi"

- type: log
tags: ["haha"] // 在输出的字段中增加tag字段
enabled: true
paths:
- /data/safeline/*
fields_under_root: true // custom fields are placed at the top level of the event instead of being nested under "fields"
output.elasticsearch:
hosts: ["http://elk101:9200"]

Sending Filebeat output to ES

filebeat.inputs:
- type: log
tags: ["shabi"]
enabled: true
paths:
- /var/log/*
- /tmp/1.json
fields:
haha: "xixixi"

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
#index: "%{[fields.log_type]}-%{[agent.version]}-%{+yyy.MM.dd}" // 默认索引名称filebeat-7.17.3-2024-02-12。不配置索引生命周期的时候才会生效
index: "hahaha-xixi-%{+yyy.MM.dd}"
setup.template.name: "hahaha"
setup.template.pattern: "hahaha-xixi"
setup.ilm.enabled: false // disable index lifecycle management

image-20240211205042678

Creating the index pattern

image-20240211205405692

image-20240211205523452

Filtering fields

image-20240211205716000

Creating a new index template

image-20240211213108434

filebeat-indices

filebeat.inputs:
- type: log
tags: ["shabi"]
enabled: true
paths:
- /var/log/*
fields:
haha: "xixixi"
- type: log
tags: ["zhizhang"]
enabled: true
paths:
- /tmp/*
fields:
haha: "lululu"

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
indices:
- index: "hahaha-xixi-uuu-%{+yyyy.MM.dd}"
when.contains:
tags: "shabi"
- index: "hahaha-xixi-ppp-%{+yyyy.MM.dd}"
when.contains:
tags: "zhizhang"

setup.template.name: "hahaha"
setup.template.pattern: "hahaha-xixi*"
setup.ilm.enabled: false

image-20240211214505454

egrep -v "^#|^$" /etc/kibana/kibana.yml

Shards: a shard lives on exactly one node and stores documents. Multiple shards are mainly useful in a cluster.

Every document is identified by an id.

Routing: hash(document id) % number of primary shards = shard number.

The number of primary shards cannot be changed after the index is created; the number of replicas can.
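As a quick illustration of the routing formula (with 3 primary shards, hash(id) % 3 selects shard 0, 1, or 2), you can index a document and then check shard allocation with the _cat/shards API. This is only a sketch; the index name routing-demo is made up:

curl -s -XPOST 'http://elk101:9200/routing-demo/_doc/10086?pretty' \
     -H 'Content-Type: application/json' -d '{"msg":"hello"}'
# show every shard of the index, which node holds it, and how many docs it contains
curl -s 'http://elk101:9200/_cat/shards/routing-demo?v'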

image-20240212192642807

Rack awareness: prevents a single rack failure from making all copies of a shard unavailable.

image-20240212193118761
filebeat.inputs:
- type: log
tags: ["shabi"]
enabled: true
paths:
- /var/log/*
fields:
haha: "xixixi"
- type: log
tags: ["zhizhang"]
enabled: true
paths:
- /tmp/*
fields:
haha: "lululu"

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
indices:
- index: "hahaha-xixi-uuu-%{+yyyy.MM.dd}"
when.contains:
tags: "shabi"
- index: "hahaha-xixi-ppp-%{+yyyy.MM.dd}"
when.contains:
tags: "zhizhang"

# name of the index template
setup.template.name: "hahaha"
# pattern the index template matches
setup.template.pattern: "hahaha-xixi*"
# disable index lifecycle management
setup.ilm.enabled: false
# overwrite an existing index template
setup.template.overwrite: false
setup.template.settings:
index.number_of_shards: 3 # number of primary shards
index.number_of_replicas: 0 # replicas; the replica count must be smaller than the number of nodes in the cluster

image-20240212194140225

A cluster has three health colors: red, yellow, and green.

Red: some primary shards are not working.

Yellow: some replica shards are not available.

Green: all primary and replica shards are reachable.

Difference between primary and replica shards:

Primary shards handle both reads and writes.

Replica shards only serve reads.

A common recommendation here: 10 shards with 2 replicas.
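The cluster color can be read directly from the health API on any node; a quick check:

curl -s 'http://elk101:9200/_cluster/health?pretty'       # "status" : "green" / "yellow" / "red", plus shard counts
curl -s 'http://elk101:9200/_cat/indices?v&health=yellow'  # list only the indices that are currently yellow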

image-20240212201413703

Index patterns are used to display data in Kibana Discover.

Index templates: used to create a series of indices from a common template. Deleting an index template does not delete the indices created from it.
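To see which index templates exist and which patterns they match, the _cat/templates API is handy. Depending on the Beats version, the template created via setup.template may live under the legacy /_template API or the newer /_index_template API, so both are shown; this is only a sketch:

curl -s 'http://elk101:9200/_cat/templates?v'
curl -s 'http://elk101:9200/_template/hahaha?pretty'        # legacy template API
curl -s 'http://elk101:9200/_index_template/hahaha?pretty'  # composable template API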

image-20240212204717096

Collecting nginx logs with Filebeat

yum install nginx -y 

filebeat.inputs:
- type: log
tags: ["access"]
enabled: true
paths:
- /var/log/nginx/access.log*

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
index: "hahaha-xixi-nginx-%{+yyyy.MM.dd}"

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0

# Extracting specific fields from logs

1. Have nginx write the access log directly in JSON format

    log_format log_json '{ "@timestamp": "$time_local", '
'"remote_addr": "$remote_addr", '
'"referer": "$http_referer", '
'"request": "$request", '
'"status": $status, '
'"bytes": $body_bytes_sent, '
'"agent": "$http_user_agent", '
'"x_forwarded": "$http_x_forwarded_for", '
# '"up_addr": "$upstream_addr",'
# '"up_host": "$upstream_http_host",'
# '"up_resp_time": "$upstream_response_time",'
'"request_time": "$request_time"'
' }';
access_log /var/log/nginx/access.log log_json;


{ "@timestamp": "13/Feb/2024:15:23:23 +0800", "remote_addr": "192.168.13.1", "referer": "-", "request": "GET / HTTP/1.1", "status": 304, "bytes": 0, "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0", "x_forwarded": "-", "request_time": "0.000" }



filebeat.inputs:
- type: log
tags: ["access"]
enabled: true
paths:
- /var/log/nginx/access.log*
json.keys_under_root: true

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
index: "hahaha-xixi-nginx-access-%{+yyyy.MM.dd}"

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0

image-20240213153307334

2. Use a Filebeat module

filebeat.config.modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
index: "hahaha-xixi-nginx-access-%{+yyyy.MM.dd}"

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0

[root@elk101 ~/es-filebeat]$ filebeat -e -c /root/es-filebeat/nginx-module.yaml modules list
2024-02-13T15:58:56.904+0800 INFO instance/beat.go:685 Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat] Hostfs Path: [/]
2024-02-13T15:58:56.904+0800 INFO instance/beat.go:693 Beat ID: 86cb8587-cdc1-47a0-8ed8-82b95547ea30
Enabled:

Disabled:
activemq
apache
auditd
aws
awsfargate
azure
barracuda
bluecoat
cef


[root@elk101 ~/es-filebeat]$ filebeat -e -c /root/es-filebeat/nginx-module.yaml modules enable nginx tomcat
2024-02-13T16:04:33.044+0800 INFO instance/beat.go:685 Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat] Hostfs Path: [/]
2024-02-13T16:04:33.044+0800 INFO instance/beat.go:693 Beat ID: 86cb8587-cdc1-47a0-8ed8-82b95547ea30
Enabled nginx
Enabled tomcat

Whether a module is enabled or disabled depends solely on the filename suffix: .yml means enabled, .disabled means disabled.

[root@elk101 ~/es-filebeat]$ cat /etc/filebeat/modules.d/nginx.yml
# Module: nginx
# Docs: https://www.elastic.co/guide/en/beats/filebeat/7.17/filebeat-module-nginx.html

- module: nginx
# Access logs
access:
enabled: true

# Set custom paths for the log files. If left empty,
# Filebeat will choose the paths depending on your OS.
var.paths: ["/var/log/nginx/access.log*"]

# Error logs
error:
enabled: true

# Set custom paths for the log files. If left empty,
# Filebeat will choose the paths depending on your OS.
var.paths: ["/var/log/nginx/error.log*"]

# Ingress-nginx controller logs. This is disabled by default. It could be used in Kubernetes environments to parse ingress-nginx logs
ingress_controller:
enabled: false

# Set custom paths for the log files. If left empty,
# Filebeat will choose the paths depending on your OS.
#var.paths:


image-20240213162820195

image-20240213164251436

3. Use Logstash

# Collecting Tomcat logs

 filebeat -e -c /root/es-filebeat/nginx-module.yaml modules enable tomcat 

vim tomcat-module.yaml
filebeat.config.modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
index: "hahaha-xixi-tomcat-access-%{+yyyy.MM.dd}"

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0


vim /etc/filebeat/modules.d/tomcat.yml
# Module: tomcat
# Docs: https://www.elastic.co/guide/en/beats/filebeat/7.17/filebeat-module-tomcat.html

- module: tomcat
log:
enabled: true

# Set which input to use between udp (default), tcp or file.
var.input: file
# var.syslog_host: localhost
# var.syslog_port: 9501

# Set paths for the log files when file input is used.
var.paths:
- /usr/local/tomcat/logs/*.txt

# Toggle output of non-ECS fields (default true).
# var.rsa_fields: true

# Set custom timezone offset.
# "local" (default) for system timezone.
# "+02:00" for GMT+02:00
# var.tz_offset: local



var.input supports three modes: tcp, udp, and file.
The first two are for receiving syslog input.


image-20240214150406383
filebeat.inputs:
- type: log
tags: ["access"]
enabled: true
paths:
- /usr/local/tomcat/logs/tomcat.haha.com_access_log.2024-02-14.txt

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
index: "hahaha-xixi-tomcat-%{+yyyy.MM.dd}"

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0


vim /usr/local/tomcat/conf/server.xml
161 <Host name="tomcat.haha.com" appBase="webapps"
162 unpackWARs="true" autoDeploy="true">
163
164 <!-- SingleSignOn valve, share authentication between web applications
165 Documentation at: /docs/config/valve.html -->
166 <!--
167 <Valve className="org.apache.catalina.authenticator.SingleSignOn" />
168 -->
169
170 <!-- Access log processes all example.
171 Documentation at: /docs/config/valve.html
172 Note: The pattern used is equivalent to using pattern="common" -->
173 <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
174 prefix="tomcat.haha.com_access_log" suffix=".txt"
175 pattern="{&quot;client&quot;:&quot;%h&quot;, &quot;client user&quot;:&quot;%l&quot;, &quot;authenticated&quot;:&quo t;%u&quot;, &quot;access time&quot;:&quot;%t&quot;, &quot;method&quot;:&quot;%r&quot;, &quot;status&quot;:&quot;%s&quot;, &q uot;send bytes&quot;:&quot;%b&quot;, &quot;Query?string&quot;:&quot;%q&quot;, &quot;partner&quot;:&quot;%{Referer}i&quot;, &quot;A gent version&quot;:&quot;%{User-Agent}i&quot;}"/>
176
177 </Host>
178 </Engine>
179 </Service>
180 </Server>


[root@elk101 /usr/local/tomcat/logs]$ cat tomcat.haha.com_access_log.2024-02-14.txt
{"client":"192.168.13.101", "client user":"-", "authenticated":"-", "access time":"[14/Feb/2024:15:55:51 +0800]", "method":"GET / HTTP/1.1", "status":"200", "send bytes":"11230", "Query?string":"", "partner":"-", "Agent version":"curl/7.29.0"}

image-20240214161223343

KQL: Kibana Query Language

image-20240214161753177

# Collecting Tomcat error logs

filebeat.inputs:
- type: log
tags: ["error"]
enabled: true
paths:
- /usr/local/tomcat/logs/catalina.out
# json.key_under_root: true
multiline.type: pattern
multiline.pattern: '^\d{2}'
multiline.negate: true
multiline.match: after

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
index: "hahaha-xixi-tomcat-error-%{+yyyy.MM.dd}"

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0

image-20240215140411494

# Collecting ES error logs

filebeat.inputs:
- type: log
tags: ["error"]
enabled: true
paths:
- /var/log/elasticsearch/elk-100.log
# json.key_under_root: true
multiline.type: pattern
multiline.pattern: '^\['
multiline.negate: true
multiline.match: after

output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
index: "hahaha-xixi-tomcat-error-%{+yyyy.MM.dd}"

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0

image-20240215141620731

# Log filtering

filebeat.inputs:
- type: log
tags: ["error"]
enabled: true
paths:
- /var/log/elasticsearch/elk-100.log
# json.key_under_root: true
multiline.type: pattern
multiline.pattern: '^\['
multiline.negate: true
multiline.match: after
// only lines containing the specified patterns are collected
include_lines: ["^ERR", "^WARN"]
exclude_lines: ["^ERR", "^WARN"]

For lines that are filtered out, the offset still advances past them, but they never appear in the output.

If a line matches both lists, the blacklist (exclude_lines) has higher priority.

# Collecting all nginx logs

filebeat.inputs:
- type: log
tags: ["access"]
enabled: true
paths:
- /var/log/nginx/access.log*
- type: log
tags: ["error"]
enabled: true
paths:
- /var/log/nginx/error.log*
include_lines: ['\[error\]']


output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
indices:
- index: "hahaha-xixi-nginx-access-%{+yyyy.MM.dd}"
when.contains:
tags: "access"
- index: "hahaha-xixi-nginx-error-%{+yyyy.MM.dd}"
when.contains:
tags: "error"

setup.template.name: "hahaha"
setup.template.pattern: "hahaha-xixi*"
setup.ilm.enabled: false // disable index lifecycle management

# Collecting logs with the filestream input

[root@elk101 ~/es-filebeat]$ cat /var/log/nginx/access.log
{ "@timestamp": "15/Feb/2024:16:28:00 +0800", "remote_addr": "127.0.0.1", "referer": "-", "request": "GET / HTTP/1.1", "status": 200, "bytes": 615, "agent": "curl/7.29.0", "x_forwarded": "-", "request_time": "0.000" }


filebeat.inputs:
- type: filestream
id: haxi
tags: ["access"]
enabled: true
paths:
- /var/log/nginx/access.log*
parsers:
- ndjson:
keys_under_root: true



output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
indices:
- index: "hahaha-xixi-nginx-access-%{+yyyy.MM.dd}"
when.contains:
tags: "access"
- index: "hahaha-xixi-nginx-error-%{+yyyy.MM.dd}"
when.contains:
tags: "error"

setup.template.name: "hahaha"
setup.template.pattern: "hahaha-xixi*"
setup.ilm.enabled: false

filebeat.inputs:
- type: filestream
id: haxi
tags: ["access"]
enabled: true
paths:
- /usr/local/tomcat/logs/tomcat.haha.com_access_log.*.txt
parsers:
- ndjson:
keys_under_root: true

- type: filestream
id: haxixi
tags: ["error"]
enabled: true
paths:
- /usr/local/tomcat/logs/*.out
parsers:
- multiline:
type: pattern
pattern: '^\d{2}'
negate: true
match: after


output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
indices:
- index: "hahaha-xixi-nginx-access-%{+yyyy.MM.dd}"
when.contains:
tags: "access"
- index: "hahaha-xixi-nginx-error-%{+yyyy.MM.dd}"
when.contains:
tags: "error"

setup.template.name: "hahaha"
setup.template.pattern: "hahaha-xixi*"
setup.ilm.enabled: false

image-20240216143534847

It is best to keep the number of input sources per Filebeat instance below 4.

If there are too many sources, either split them across multiple instances or aggregate the logs first.

Running multiple Filebeat instances requires manually specifying a data path per instance (--path.data=/tmp/filebeat). When an instance starts it creates a lock file in that directory, which prevents other Filebeat instances from using it.
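A minimal sketch of running two Filebeat instances on one host (the config file names are made up; the only hard requirement is a distinct --path.data per instance):

filebeat -e -c /etc/filebeat/nginx.yml  --path.data /tmp/filebeat-nginx  &
filebeat -e -c /etc/filebeat/tomcat.yml --path.data /tmp/filebeat-tomcat &
ls /tmp/filebeat-nginx/   # each data dir keeps that instance's registry and lock file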

# Log aggregation

Collecting system logs with rsyslog

yum install rsyslog -y

vim /etc/rsyslog.conf
$ModLoad imtcp
$InputTCPServerRun 514
*.* /var/log/kkkkk.log # @10.0.0.1:514

systemctl restart rsyslog.service

logger "xixix" # logger命令测试日志文件

Feb 16 15:21:46 elk101 root: xixix


// afterwards the filestream input only needs to collect kkkkk.log


# input.tcp

Used for routers, switches, and other devices where a Linux agent cannot be installed.

filebeat.inputs:
- type: tcp
host: "0.0.0.0:8888"
tags: "access"


output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
indices:
- index: "hahaha-xixi-nginx-access-%{+yyyy.MM.dd}"
when.contains:
tags: "access"

setup.template.name: "hahaha"
setup.template.pattern: "hahaha-xixi*"
setup.ilm.enabled: false

telnet 1.1.1.1 8888

nc 1.1.1.1 8888

# output.file

output.file:
path: "/tmp/haha"
filename: sss.txt

# output.redis

yum install epel* -y 
yum install redis -y

vim /etc/redis.conf
bind 0.0.0.0
requirepass hahahaha

systemctl enable redis --now

redis-cli -a hahahaha -h 192.168.13.101 -p 6379 --raw -n 5



filebeat.inputs:
- type: tcp
host: "192.168.13.101:9000"
tags: "access"


output.redis:
hosts: ["192.168.13.101:6379"]
password: "hahahaha"
key: "xixixixi"
db: 5
timeout: 3
enabled: true
indices:
- index: "hahaha-xixi-nginx-access-%{+yyyy.MM.dd}"
when.contains:
tags: "access"

setup.template.name: "hahaha"
setup.template.pattern: "hahaha-xixi*"
setup.ilm.enabled: false



[root@elk101 ~]$ telnet 192.168.13.101 9000
Trying 192.168.13.101...
Connected to 192.168.13.101.
Escape character is '^]'.
aaaa
dddd


[root@elk102 ~]# redis-cli -a hahahaha -h 192.168.13.101 -p 6379 --raw -n 5
192.168.13.101:6379[5]> KEYS *
xixixixi
192.168.13.101:6379[5]> LRANGE xixixixi 0 -1
{"@timestamp":"2024-02-16T08:34:52.573Z","@metadata":{"beat":"filebeat","type":"_doc","version":"7.17.3"},"log":{"source":{"address":"192.168.13.101:58184"}},"tags":["access"],"input":{"type":"tcp"},"ecs":{"version":"1.12.0"},"host":{"name":"elk101"},"agent":{"version":"7.17.3","hostname":"elk101","ephemeral_id":"0ff0daea-3a89-47bb-84c2-e76108f22046","id":"6531f582-c541-42cb-97f2-6a8ff960316b","name":"elk101","type":"filebeat"},"message":"aaaa"}
{"@timestamp":"2024-02-16T08:34:53.371Z","@metadata":{"beat":"filebeat","type":"_doc","version":"7.17.3"},"log":{"source":{"address":"192.168.13.101:58184"}},"tags":["access"],"input":{"type":"tcp"},"ecs":{"version":"1.12.0"},"host":{"name":"elk101"},"agent":{"id":"6531f582-c541-42cb-97f2-6a8ff960316b","name":"elk101","type":"filebeat","version":"7.17.3","hostname":"elk101","ephemeral_id":"0ff0daea-3a89-47bb-84c2-e76108f22046"},"message":"dddd"}


echo 11111 | nc 192.168.13.101 9000

# Logstash

image-20240217192958149

wget "https://artifacts.elastic.co/downloads/logstash/logstash-7.17.3-x86_64.rpm"
yum install logstash-7.17.3-x86_64.rpm -y

ln -sv /usr/share/logstash/bin/logstash /usr/local/bin/

image-20240217193501783

# stdin && stdout

input {
stdin {}
}

output {
stdout {}
}

logstash -tf 111.conf   # -t only tests/validates the config file
logstash -f 111.conf


111111
{
"@version" => "1",
"host" => "elk102",
"message" => "111111",
"@timestamp" => 2024-02-17T11:36:14.136Z
}

# input - file

input {
file {
path => ["/tmp/*.txt"]
# Logstash reads from the end of a file by default; start_position only takes effect when the file has no entry in the sincedb. The default is end
start_position => "beginning"

}
}

output {
stdout {}
}

cat /usr/share/logstash/data/plugins/inputs/file/.sincedb_820ddbbd098cfece4b56f4fcbf67a9bb
2363504 0 2050 12 1708170120.662962 /tmp/xxxx.txt

ll -i /tmp/xxxx.txt
2363504 -rw-r--r-- 1 root root 12 Feb 17 13:57 /tmp/xxxx.txt

To make all files read from the beginning, delete the sincedb and set start_position to beginning.

start_position chooses where Logstash initially starts reading a file: beginning or end. The default treats files as live streams and therefore starts at the end; if you have old data to import, set it to beginning.
This option only applies to the "first contact" case, i.e. a new file with no position recorded in Logstash's sincedb. If the file has been seen before, the option has no effect and the position stored in the sincedb is used.
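A common trick while testing is to point the file input at a throwaway sincedb so every run starts from the beginning again; a minimal sketch:

cat > /tmp/file-reread.conf <<'EOF'
input {
  file {
    path => ["/tmp/*.txt"]
    start_position => "beginning"
    sincedb_path => "/dev/null"   # do not persist read positions between runs
  }
}
output { stdout {} }
EOF
logstash -f /tmp/file-reread.conf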

# input - tcp

input {
tcp {
port => 8888
}
tcp {
port => 9999
}
}

output {
stdout {}
}



[root@elk101 ~]$ telnet 192.168.13.102 8888
eeeeee

{
"host" => "elk101",
"@timestamp" => 2024-02-17T11:58:05.831Z,
"port" => 60912,
"message" => "eeeeee\r",
"@version" => "1"
}

image-20240217200036913

# input - http

input {
http {
port => 8888
}
}

output {
stdout {}
}


{
"@timestamp" => 2024-02-17T12:15:58.521Z,
"@version" => "1",
"headers" => {
"upgrade_insecure_requests" => "1",
"http_accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"http_host" => "192.168.13.102:8888",
"connection" => "close",
"accept_language" => "zh-CN,zh;q=0.9",
"request_method" => "POST",
"accept_encoding" => "gzip, deflate, br",
"http_user_agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.71 Safari/537.36",
"request_path" => "/",
"http_version" => "HTTP/1.1",
"cache_control" => "max-age=0",
"content_length" => "6"
},
"message" => "111111",
"host" => "192.168.13.1"
}

Send data with Postman or Burp; Logstash reads the request body.
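Plain curl works just as well for feeding the http input; for example (the JSON body below is arbitrary test data):

curl -XPOST 'http://192.168.13.102:8888' \
     -H 'Content-Type: application/json' -d '{"user":"test","action":"login"}'
# with a JSON content type the body is decoded into event fields; plain text ends up in the message field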

image-20240217200636381

# input - redis

Using Filebeat together with Logstash.

image-20240217203344911

filebeat.inputs:
- type: tcp
host: "192.168.13.101:9000"
tags: "access"


output.redis:
hosts: ["192.168.13.101:6379"]
password: "hahahaha"
key: "xixixixi"
db: 5
timeout: 3
enabled: true
indices:
- index: "hahaha-xixi-nginx-access-%{+yyyy.MM.dd}"
when.contains:
tags: "access"

setup.template.name: "hahaha"
setup.template.pattern: "hahaha-xixi*"
setup.ilm.enabled: false


----------------------------------------------------

input {
redis {
data_type => "list"
db => 5
host => "192.168.13.101"
port => 6379
password => "hahahaha"
key => "xixixixi"

}
}

output {
stdout {}
}



{
"tags" => [
[0] "access"
],
"input" => {
"type" => "tcp"
},
"@timestamp" => 2024-02-16T08:34:52.573Z,
"log" => {
"source" => {
"address" => "192.168.13.101:58184"
}
},
"message" => "aaaa",
"ecs" => {
"version" => "1.12.0"
},
"agent" => {
"version" => "7.17.3",
"hostname" => "elk101",
"id" => "6531f582-c541-42cb-97f2-6a8ff960316b",
"ephemeral_id" => "0ff0daea-3a89-47bb-84c2-e76108f22046",
"name" => "elk101",
"type" => "filebeat"
},
"@version" => "1",
"host" => {
"name" => "elk101"
}
}

Logstash pops entries from the Redis list, so data is deleted as soon as it has been read.
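You can watch the list drain from another terminal to confirm this; the key name matches the Filebeat output above:

redis-cli -a hahahaha -h 192.168.13.101 -n 5 LLEN xixixixi   # drops back to 0 once Logstash has consumed the entries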

# filebeat — logstash

logstash:
input {
beats {
port => 8809
}
}

output {
stdout {}
}


filebeat:
filebeat.inputs:
- type: tcp
host: "192.168.13.101:9000"
tags: "access"


output.logstash:
hosts: ["192.168.13.102:8809"]



{
"agent" => {
"name" => "elk101",
"version" => "7.17.3",
"ephemeral_id" => "76001a08-5496-418b-8f67-cc021e950a5d",
"id" => "3c315a09-5270-4638-997e-829692eb5eb9",
"hostname" => "elk101",
"type" => "filebeat"
},
"@timestamp" => 2024-02-17T12:42:08.409Z,
"@version" => "1",
"log" => {
"source" => {
"address" => "192.168.13.103:58750"
}
},
"tags" => [
[0] "access",
[1] "beats_input_codec_plain_applied"
],
"input" => {
"type" => "tcp"
},
"ecs" => {
"version" => "1.12.0"
},
"message" => "1111111111111",
"host" => {
"name" => "elk101"
}
}

# output - redis

input {
tcp {
port => 9999

}
}

output {
stdout {}
redis {
host => "192.168.13.101"
port => 6379
db => 10
password => "hahahaha"
data_type => "list"
key => "aaaaa"

}
}


{
"host" => "elk103",
"@version" => "1",
"@timestamp" => 2024-02-17T12:51:25.744Z,
"message" => "eeeeeeee\r",
"port" => 56288
}

[root@elk102 ~/logst-config]# redis-cli -h 192.168.13.101 -a hahahaha -n 10 --raw
192.168.13.101:6379[10]> LRANGE aaaaa 0 -1
1) "{\"host\":\"elk103\",\"@version\":\"1\",\"@timestamp\":\"2024-02-17T12:51:25.744Z\",\"message\":\"eeeeeeee\\r\",\"port\":56288}"

# output - file

input {
tcp {
port => 9999

}
}

output {
stdout {}
file {
path => "/tmp/kkkkk.txt"
}
}

cat /tmp/kkkkk.txt
{"port":56292,"@timestamp":"2024-02-17T12:56:36.976Z","host":"elk103","message":"ttttttt\r","@version":"1"}

# output - es

input {
tcp {
port => 9999
}
}

output {
stdout {}
elasticsearch {
}
}




input {
tcp {
port => 9999
}
}

output {
stdout {}
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash-%{+yyyy.MM.dd}"
}
}

image-20240218204850593

The index template can be customized.
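One way to customize the template for these indices is to create it in ES yourself; a minimal sketch (the settings are only illustrative):

curl -s -XPUT 'http://elk101:9200/_index_template/uuu-logstash?pretty' \
     -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["uuu-logstash-*"],
  "template": {
    "settings": { "number_of_shards": 3, "number_of_replicas": 0 }
  }
}'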

# if

input {
tcp {
port => 8888
type => "shabi"
}
}

output {
if [tpye] == "haha" {
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash--haha-%{+yyyy.MM.dd}"
}
} else if [type] == "shabi" {
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash--shabi-%{+yyyy.MM.dd}"
}
} else {
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash--ppp-%{+yyyy.MM.dd}"
}
}
}

image-20240218211300883

# Running multiple Logstash instances

logstash -f xxx.conf --path.data /tmp/haha

# filter

# grok

Grok is a great way to parse unstructured log data into something structured and queryable.
It matches arbitrary text with regular expressions and is especially good at parsing arbitrary text and turning it into structured fields.

input {
beats {
port => 8888
}

}

filter {
grok {
match => {
# "message" => "%{HTTPD_COMMONLOG}"
"message" => "%{COMBINEDAPACHELOG}"
}
}
}

output {
stdout {}
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash-%{+yyyy.MM.dd}"
}
}
input {
stdin {}
}

filter {
grok {
match => {
# patterns_dir => ["./patterns"] # patterns 目录下面有个任意文件定义了变量,如果使用非官方定义的变量
"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
}
}

output {
stdout {}
}


55.3.244.1 GET /index.html 15824 0.043
{
"bytes" => "15824",
"@version" => "1",
"host" => "elk102",
"method" => "GET",
"duration" => "0.043",
"@timestamp" => 2024-02-19T12:56:29.945Z,
"client" => "55.3.244.1",
"message" => " 55.3.244.1 GET /index.html 15824 0.043",
"request" => "/index.html"
}

remove_field removes the listed fields from the event

input {
stdin {}
}

filter {
grok {
match => {
# patterns_dir => ["./patterns"] # patterns 目录下面有个任意文件定义了变量,如果使用非官方定义的变量
"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
remove_field => ["@version","tags"]
}
}

output {
stdout {}
}

add_field

input {
stdin {}
}

filter {
grok {
match => {
# patterns_dir => ["./patterns"] # patterns 目录下面有个任意文件定义了变量,如果使用非官方定义的变量
"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"
}
remove_field => ["@version","tags"]
add_field => {
"xixi" => "xixixi"
"haha-ip" => %{clientip}
}
add_tag => ["ssss"]
}
}

output {
stdout {}
}

image-20240220131110249

image-20240220132930278

# date

filebeat.inputs:
- type: log
tags: ["access"]
enabled: true
paths:
- /var/log/nginx/access.log*

output.logstash:
hosts: ["192.168.13.102:8888"]

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0


input {
beats {
port => 8888
}
}

filter {
grok {
match => {
"message" => "%{COMBINEDAPACHELOG}"
}
}
date {
match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
timezone => "Asia/Shanghai"
# store the parsed timestamp in this target field (default: @timestamp)
target => "ooooo"
}
}

output {
stdout {}
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash-%{+yyyy.MM.dd}"
}
}



{
"host" => {
"name" => "elk101"
},
"auth" => "-",
"log" => {
"offset" => 87,
"file" => {
"path" => "/var/log/nginx/access.log"
}
},
"verb" => "GET",
"httpversion" => "1.1",
"response" => "200",
"referrer" => "\"-\"",
"ooooo" => 2024-02-20T05:55:12.000Z,
"@timestamp" => 2024-02-20T07:29:00.018Z,
"timestamp" => "20/Feb/2024:13:55:12 +0800",

image-20240220153211206

# geoip

input {
beats {
port => 8888
}
}

filter {
grok {
match => {
"message" => "%{COMBINEDAPACHELOG}"
}
}
date {
match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
timezone => "Asia/Shanghai"
# store the parsed timestamp in this target field (default: @timestamp)
target => "ooooo"
}
geoip {
source => "clientip"
fields => ["city_name", "country_name"]
}
}

output {
stdout {}
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash-%{+yyyy.MM.dd}"
}
}


"geoip" => {
"timezone" => "Asia/Shanghai",
"country_code3" => "CN",
"ip" => "223.5.5.5",
"city_name" => "Hangzhou",
"latitude" => 30.294,
"longitude" => 120.1619,
"region_code" => "ZJ",
"country_code2" => "CN",
"country_name" => "China",
"continent_code" => "AS",
"location" => {
"lat" => 30.294,
"lon" => 120.1619
},

# useragent

{ "@timestamp": "20/Feb/2024:17:01:44 +0800", "remote_addr": "192.168.13.1", "referer": "-", "request": "GET /jajaj HTTP/1.1", "status": 404, "bytes": 153, "http_user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0", "x_forwarded": "-", "request_time": "0.000" }

filebeat.inputs:
- type: log
tags: ["access"]
enabled: true
paths:
- /var/log/nginx/access.log*
json.keys_under_root: true

output.logstash:
hosts: ["192.168.13.102:8888"]

# 设置索引模板的名称
setup.template.name: "hahaha"
# 设置索引模板的匹配模式
setup.template.pattern: "hahaha-xixi*"
# 禁用索引生命周期管理
setup.ilm.enabled: false
# 覆盖已有的索引模板
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0



input {
beats {
port => 8888
}
}

filter {
date {
match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
timezone => "Asia/Shanghai"
# store the parsed timestamp in this target field (default: @timestamp)
target => "ooooo"
}
geoip {
source => "clientip"
fields => ["city_name", "country_name"]
}
useragent {
source => "http_user_agent"
}
}

output {
stdout {}
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash-%{+yyyy.MM.dd}"
}
}


{
"request_time" => "0.000",
"status" => 404,
"@timestamp" => 2024-02-20T09:08:53.104Z,
"input" => {
"type" => "log"
},
"referer" => "-",
"x_forwarded" => "-",
"ecs" => {
"version" => "1.12.0"
},
"request" => "GET /jajaj HTTP/1.1",
"log" => {
"file" => {
"path" => "/var/log/nginx/access.log"
},
"offset" => 306
},
"host" => {
"name" => "elk101"
},
"agentttttt" => {
"os_major" => "10",
"version" => "122.0",
"name" => "Firefox",
"os_full" => "Windows 10",
"os_name" => "Windows",
"os_version" => "10",
"major" => "122",
"minor" => "0",
"device" => "Other",
"os" => "Windows"
},
"tags" => [
[0] "access",
[1] "beats_input_raw_event",
[2] "_geoip_lookup_failure"
],
"bytes" => 153,
"@version" => "1",
"agent" => {
"ephemeral_id" => "cb7c47b6-8857-42bb-a298-5e2c43a57980",
"version" => "7.17.3",
"name" => "elk101",
"type" => "filebeat",
"hostname" => "elk101",
"id" => "27c000fd-4c27-4a68-bd48-fc24ea1959aa"
},
"http_user_agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0",
"remote_addr" => "192.168.13.1"
}

# mutate

input {
beats {
port => 8888
}
}

filter {
  mutate {
    split => { "message" => "|" }
  }
  mutate {
    add_field => {
      "user_id" => "%{[message][0]}"
    }
  }
  mutate {
    convert => {
      "filenum" => "integer"
    }
  }
  mutate {
    strip => ["filenum"]
  }
}

output {
stdout {}
elasticsearch {
hosts => ["elk101:9200", "elk102:9200", "elk103:9200"]
index => "uuu-logstash-%{+yyyy.MM.dd}"
}
}
input {
beats {
type => "oldboyedu-beats"
port => 8888
}
tcp {
type => "oldboyedu-tcp"
port => 9999
}
tcp {
type => "oldboyedu-tcp-new"
port => 7777
}
http {
type => "oldboyedu-http"
port => 6666
}
file {
type => "oldboyedu-file"
path => "/tmp/apps.log"
}
}
filter {
mutate {
add_field => {
"school" => "北京市昌平区沙河镇⽼男孩IT教育"
}
}
if [type] == ["oldboyedu-beats","oldboyedu-tcp-new","oldboyedu-http"]
{
mutate {
remove_field => [ "agent", "host", "@version", "ecs",
"tags","input", "log" ]
}
geoip {
source => "clientip"
target => "oldboyedu-linux80-geoip"
}
useragent {
source => "http_user_agent"
target => "oldboyedu-linux80-useragent"
}
} else if [type] == "oldboyedu-file" {
mutate {
add_field => {
"class" => "oldboyedu-linux80"
"address" => "北京昌平区沙河镇⽼男孩IT教育"
"hobby" => ["LOL","王者荣耀"]
}
remove_field => ["host","@version","school"]
}
} else {
mutate {
remove_field => ["port","@version","host"]
}
mutate {
split => {
"message" => "|"
}
add_field => {
"user_id" => "%{[message][1]}"
"action" => "%{[message][2]}"
"svip" => "%{[message][3]}"
"price" => "%{[message][4]}"
}
# the message field can only be removed after it has been used; mind the execution order!
remove_field => ["message"]
strip => ["svip"]
}
mutate {
convert => {
"user_id" => "integer"
"svip" => "boolean"
"price" => "float"
}
}
}
}
output {
stdout {}
if [type] == "oldboyedu-beats" {
elasticsearch {
hosts =>
["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
index => "oldboyedu-linux80-logstash-beats"
}
} else {
elasticsearch {
hosts =>
["10.0.0.101:9200","10.0.0.102:9200","10.0.0.103:9200"]
index => "oldboyedu-linux80-logstash-tcp"
}
}
}

image-20240221154547427

# kibana

Try out the Kibana sample data.

image-20240221155924632

# dashboard

# pv

PV (page views): within a given statistics period, every page refresh by a user is counted.

1. Create a Visualize library

Create an aggregation-based visualization -> new metric -> select the index

image-20240224212012635

# IP

image-20240224214353415

# Bandwidth

image-20240224214831117

# Most-visited pages

image-20240224215436733

# dashboard

image-20240224220417920

image-20240224220707733

image-20240224221343389

image-20240224221458172

# Binary (tarball) deployment

# Single-node Elasticsearch

yum install java-1.8.0-openjdk  java-1.8.0-openjdk-devel.x86_64 -y
java -version
openjdk version "1.8.0_402"
OpenJDK Runtime Environment (build 1.8.0_402-b06)
OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)

echo "export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.402.b06-1.el7_9.x86_64" >> /etc/profile
echo "export PATH=$JAVA_HOME/bin:$PATH " >> /etc/profile
echo "export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar" >> /etc/profile
source /etc/profile

yum install wget vim lrzsz systemd unzip -y
wget "https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.3-linux-x86_64.tar.gz"
tar xzvf elasticsearch-7.17.3-linux-x86_64.tar.gz

echo "# elasticsearch" >> /etc/profile
echo "export ES_HOME=/root/elasticsearch-7.17.3" >> /etc/profile
echo "export PATH=$PATH:$ES_HOME/bin" >> /etc/profile
source /etc/profile

useradd elasticsearch
chown elasticsearch:elasticsearch -R elasticsearch-7.17.3

egrep -v "^#|^$" /root/elasticsearch-7.17.3/config/elasticsearch.yml
cluster.name: elk
network.host: 0.0.0.0
discovery.seed_hosts: ["192.168.13.101"]
cluster.initial_master_nodes: ["192.168.13.101"]

ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 15001
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 15001
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

vim /etc/security/limits.conf
* soft nofile 65535
* hard nofile 100000

logout

ulimit -Sn
65535
ulimit -Hn
100000

vim /etc/sysctl.conf
vm.max_map_count = 262144

sysctl -p

sysctl -q vm.max_map_count
vm.max_map_count = 262144

usermod -g root elasticsearch
su -c "elasticsearch" elasticsearch # -d后台启动,或者切到用户在启动


curl 192.168.13.101:9200/_cat/nodes
192.168.13.101 17 97 8 0.22 0.12 0.08 cdfhilmrstw * elk101
jps # list Java processes
31139 Jps
30872 Elasticsearch


jmap -heap 30872 # show JVM heap configuration and usage
Attaching to process ID 30872, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.402-b06

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 1073741824 (1024.0MB)
NewSize = 174456832 (166.375MB)
MaxNewSize = 174456832 (166.375MB)
OldSize = 899284992 (857.625MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 157024256 (149.75MB)
used = 119552712 (114.01435089111328MB)
free = 37471544 (35.73564910888672MB)
76.13646136301388% used
Eden Space:
capacity = 139591680 (133.125MB)
used = 113230224 (107.98475646972656MB)
free = 26361456 (25.140243530273438MB)
81.11531002420774% used
From Space:
capacity = 17432576 (16.625MB)
used = 6322488 (6.029594421386719MB)
free = 11110088 (10.595405578613281MB)
36.26823712112312% used
To Space:
capacity = 17432576 (16.625MB)
used = 0 (0.0MB)
free = 17432576 (16.625MB)
0.0% used
concurrent mark-sweep generation:
capacity = 899284992 (857.625MB)
used = 73758768 (70.34184265136719MB)
free = 825526224 (787.2831573486328MB)
8.201934721045584% used

40897 interned Strings occupying 4927424 bytes.

vim /root/elasticsearch-7.17.3/config/jvm.options
-Xms512m
-Xmx512m

When the machine has more than 64 GB of RAM, cap the heap at 32 GB; otherwise set it to half of the RAM.

Restart the service.

If the ES files live under /root, the unprivileged user must be granted permission on that directory (or the directory must be chown'd to it).
The user's primary group also needs to be changed to root: usermod -g root elasticsearch
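After editing jvm.options and restarting, the effective heap can be verified via the _cat/nodes API:

curl -s 'http://192.168.13.101:9200/_cat/nodes?v&h=name,heap.current,heap.percent,heap.max'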
[Unit]
Description=Elasticsearch
After=network.target

[Service]
Type=simple
User=elasticsearch
LimitMEMLOCK=infinity
LimitNOFILE=65535
WorkingDirectory=/root/elasticsearch-7.17.3/
ExecStart=/root/elasticsearch-7.17.3/bin/elasticsearch
Restart=on-failure

[Install]
WantedBy=multi-user.target


systemctl daemon-reload

# ES cluster

vim /root/elasticsearch-7.17.3/config/elasticsearch.yml 
cluster.name: elk
network.host: 0.0.0.0
discovery.seed_hosts: ["elk101", "elk102", "elk103"]
cluster.initial_master_nodes: ["elk101", "elk102", "elk103"]

# kibana

wget "https://artifacts.elastic.co/downloads/kibana/kibana-7.17.3-linux-x86_64.tar.gz"
tar xzvf kibana-7.17.3-linux-x86_64.tar.gz

vim /etc/profile
# kibana
export KIBANA_HOME=/root/kibana-7.17.3-linux-x86_64/
export PATH=$PATH:$KIBANA_HOME/bin

source /etc/profile

chown elasticsearch:elasticsearch /root/kibana-7.17.3-linux-x86_64 -R

vim /root/kibana-7.17.3-linux-x86_64/config/kibana.yml
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
server.name: "kibana-server"
i18n.locale: "zh-CN"

su -c "kibana" elasticsearch
cat /root/kibana-7.17.3-linux-x86_64/bin/kibana.sh 
Kibana=/root/kibana-7.17.3-linux-x86_64
PID=""

if [ "$1" = "" ];
then
echo -e "\033[0;31m 未输入操作名 \033[0m \033[0;34m {start|stop|restart|status} \033[0m"
exit 1
fi

function query()
{
PID=`ps aux |grep elk|grep kibana|grep -v $0 | grep -v grep | awk '{print $2}'`
}

function start()
{
query

if [ x"$PID" != x"" ]; then
echo "kibana is running..."
else
#su elk<<!
nohup su -c $Kibana/bin/kibana "elasticsearch" >/root/kibana-7.17.3-linux-x86_64/logs/start.log 2>&1&
#!
echo "Start kibana success..."
fi
}

function stop()
{
echo "Stop kibana"
query
echo "WO $PID"
if [ x"$PID" != x"" ]; then
kill -TERM $PID
echo "kibana (pid:$PID) exiting..."
while [ x"$PID" != x"" ]
do
sleep 1
query
done
echo "kibana exited."
else
echo "kibana already stopped."
fi
}

function restart()
{
stop
sleep 2
start
}

case $1 in
start)
start;;
stop)
stop;;
restart)
restart;;
*)

esac


mkdir /root/kibana-7.17.3-linux-x86_64/logs
chown elasticsearch:elasticsearch -R logs/


[Unit]
Description=kibana.service
After=network.target

[Service]
Type=forking
User=elasticsearch
LimitCORE=infinity
LimitMEMLOCK=infinity
LimitNOFILE=65536
LimitNPROC=65536
ExecStart=/root/kibana-7.17.3-linux-x86_64/bin/kibana.sh start
ExecReload=/root/kibana-7.17.3-linux-x86_64/bin/kibana.sh restart
ExecStop=/root/kibana-7.17.3-linux-x86_64/bin/kibana.sh stop
KillMode=process
Restart=always

[Install]
WantedBy=multi-user.target
[Unit]
Description=kibana
After=network.target

[Service]
Type=simple
User=elasticsearch
ExecStart=/root/kibana-7.17.3-linux-x86_64/bin/kibana
PrivateTmp=true
User=elasticsearch

[Install]
WantedBy=multi-user.target

# logstash

wget https://artifacts.elastic.co/downloads/logstash/logstash-7.17.3-linux-x86_64.tar.gz
tar xvzf logstash-7.17.3-linux-x86_64.tar.gz

vim /etc/profile
export LOGSTASH_HOME=/root/logstash-7.17.3
export PATH=$PATH:$LOGSTASH_HOME/bin


input {
stdin{}
}

output {
stdout{}
}

logstash -rf 1.conf

# filebeat

wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.17.3-linux-x86_64.tar.gz
tar xzvf filebeat-7.17.3-linux-x86_64.tar.gz


filebeat.inputs:
- type: stdin
enabled: true

output.console:
pretty: true

/root/filebeat-7.17.3-linux-x86_64/filebeat -e -c /root/filebeat_config/1.yaml

# es-head


wget https://github.com/mobz/elasticsearch-head/archive/refs/tags/v5.0.0.tar.gz
tar xzvf v5.0.0.tar.gz

# ES REST-style API

JSON syntax


Basic data types
string
number
boolean
null

Advanced data types
array []
object {"1":"12"}
es
document: the data a user stores in ES; the smallest unit of storage. Documents are stored as JSON objects
field: analogous to a column in a database table; classifies the document data by attribute
index: an index is a collection of documents with similar characteristics
shard: an index has one or more shards; shards are where the data actually lives, each backed by a Lucene library
replica: a shard can have 0 or more replicas. When the replica count is non-zero there are primary shards and replica shards
primary shard: handles both reads and writes
replica shard: serves reads and writes, syncs its data from the primary, and is promoted to primary when the primary fails
Allocation: the process of assigning shards (primary and replica) to nodes; for replicas it also includes copying data from the primary. Allocation is scheduled by the master node
Type: in 5.x and earlier an index could define one or more types; ES 7 only supports the _doc type

# Indices

image-20240302134655477

image-20240302134913428

image-20240302135039214

image-20240302135140817

# List indices
GET http://192.168.13.101:9200/_cat/indices
GET http://192.168.13.101:9200/_cat/indices?v
GET http://192.168.13.101:9200/_cat/indices/.geoip_databases?v
GET http://192.168.13.101:9200/.kibana_7.17.3_001

# Create indices
PUT http://192.168.13.101:9200/hahaha-new # create an index
PUT http://192.168.13.101:9200/hahaha-old
{
"settings": {
"index": {
"number_of_shards": "3",
"number_of_replicas": "3"
}
}
}

# Modify an index; only the replica count can be changed, never the shard count
PUT http://192.168.13.101:9200/hahaha-old/_settings
{
"number_of_replicas": "2"
}

# Delete an index
DELETE http://192.168.13.101:9200/hahaha-old

# Index aliases
POST http://192.168.13.101:9200/_aliases # create aliases
{
"actions": [
{
"add": {
"index": "hahaha-new",
"alias": "ahahahah"
}
},
{
"add": {
"index": "hahaha-new",
"alias": "pppp"
}
}
]
}
GET http://192.168.13.101:9200/_aliases # list all index aliases
POST http://192.168.13.101:9200/_aliases # delete an index alias
{
"actions": [
{
"remove": {
"index": "hahaha-new",
"alias": "ahahahah"
}
}
]
}
POST http://192.168.13.101:9200/_aliases # modify an alias (remove + add)
{
"actions": [
{
"remove": {
"index": "hahaha-new",
"alias": "ahahahah"
}
},
{
"add": {
"index": "hahaha-new",
"alias": "segemnt"
}
}
]
}

# Close an index: the data stays on disk and is not deleted, but the index can no longer be read or written; wildcards are supported
POST http://192.168.13.101:9200/hahaha-new/_close
POST http://192.168.13.101:9200/hahaha-new/_open


# Create documents. ES 7 only supports the _doc type; every document in an index must use the same type.
POST http://192.168.13.101:9200/teacher/_doc # create a document; the index is created automatically
{
"name":"shabi",
"age":"123"
}
POST http://192.168.13.101:9200/teacher/_doc/10010 # 10010 is the document ID
{
"name":"shabi",
"age":"123"
}
POST http://192.168.13.101:9200/teacher/linux
{
"name":"shabi",
"age":"123"
}
POST http://192.168.13.101:9200/teacher/_doc
{
"name":"shabi",
"age":"123"
}

# View documents
GET http://192.168.13.101:9200/teacher/_search
Source data: the data written by the user
Metadata: describes the user's data and is maintained by ES
GET http://192.168.13.101:9200/teacher/_doc/10010 # get the document with id 10010
HEAD http://192.168.13.101:9200/teacher/_doc/10010 # check whether the document exists

# Update documents
POST http://192.168.13.101:9200/teacher/_doc/10010 # full replacement
{
"name":"hahahahah",
"age":"123"
}
POST http://192.168.13.101:9200/teacher/_doc/10010/_update # partial update, only changes the given fields
{
"doc": {
"name": "hahahahahxixix"
}
}

# Delete documents
DELETE http://192.168.13.101:9200/teacher/_doc/_psT_o0BSW4RfxxQMeLl # delete the document


bulk is strict about the JSON format: each JSON object must sit on a single line (no line breaks inside it), adjacent JSON objects must be separated by a newline (\n on Linux, \r\n on Windows), and every bulk operation needs a pair of JSON lines (except delete).
# Bulk create documents
POST http://192.168.13.101:9200/_bulk
{"create":{"_index":"hahaha"}}
{"name":"xixi","hobby":"luguan"}
{"create":{"_index":"hahaha"}}
{"name":"haha","hobby":"111"}
\n # replace with an actual newline

# Bulk delete
POST http://192.168.13.101:9200/_bulk
{"delete":{"_index":"hahaha","_id": "AZscBI4BSW4RfxxQpuOP"}}
{"delete":{"_index":"hahaha","_id": "AZscBI4BSW4RfxxQpuOP"}}
\n{
"docs":[
{
"_index":"hahaha",
"_id": "ApsiBI4BSW4RfxxQnuNB"
}
]
}


# Bulk update
POST http://192.168.13.101:9200/_bulk
{"update":{"_index":"hahaha","_id": "ApsiBI4BSW4RfxxQnuNB"}}
{"doc":{"name":"xixi","hobby":"dapao"}}
\n

# Bulk get (_mget)
POST http://192.168.13.101:9200/_mget
{
"docs":[
{
"_index":"hahaha",
"_id": "ApsiBI4BSW4RfxxQnuNB"
}
]
}
# Create a mapping, giving a field an explicit custom type
PUT http://192.168.13.101:9200/teacher222/
{
"mappings" :{
"properties":{
"ip_addr":{
"type":"ip"
}
}
}
}

# View the mapping
GET http://192.168.13.101:9200/teacher222/

# Add/modify field mappings
PUT http://192.168.13.101:9200/teacher222/_mapping
{
"properties":{
"name":{
"type":"text",
"index":"true"
},
"gender":{
"type":"text",
"index":"true"
}
}
}

keyword fields must be matched exactly when searching
text fields are analyzed, so partial (fuzzy) matches are possible
if a field should not be searchable, set index: false
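
A minimal sketch of the difference, assuming the teacher index from earlier was created by dynamic mapping (which gives the text field name a .keyword sub-field; the field names are only illustrative):
# term on the keyword sub-field: only an exact, unanalyzed value matches
curl -s -X POST 'http://192.168.13.101:9200/teacher/_search' -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"name.keyword": "shabi"}}}'
# match on the text field: the query string is analyzed, so individual tokens can match
curl -s -X POST 'http://192.168.13.101:9200/teacher/_search' -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"name": "shabi"}}}'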

# Analyzers

GET http://192.168.13.101:9200/_analyze
{
"analyzer": "standard",
"text": "my name is sb"
}

{
"tokens": [
{
"token": "my",
"start_offset": 0,
"end_offset": 2,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "name",
"start_offset": 3,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "is",
"start_offset": 8,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "sb",
"start_offset": 11,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 3
}
]
}

The standard analyzer splits text on whitespace and punctuation.
For Chinese text, the standard analyzer splits on individual characters.

wget https://github.com/infinilabs/analysis-ik/releases/download/v7.17.3/elasticsearch-analysis-ik-7.17.3.zip
unzip elasticsearch-analysis-ik-7.17.3.zip
chown -R elasticsearch:elasticsearch /root/elasticsearch-7.17.3/plugins/ik/
systemctl restart elasticsearch.service

GET http://192.168.13.101:9200/_analyze
{
"analyzer": "ik_max_word", # 细粒度分词
"text": "我是警察"
}

GET http://192.168.13.101:9200/_analyze
{
"analyzer": "ik_smart", # 粗粒度分词
"text": "我是警察"
}


# Custom dictionary
Create a .dic file under the plugin's config directory and put your own terms in it.

Load the dictionary
vim IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- configure your own extension dictionaries here -->
<entry key="ext_dict">hahaha.dic</entry>
<!-- configure your own extension stop-word dictionaries here -->
<entry key="ext_stopwords"></entry>
<!-- configure remote extension dictionaries here -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- configure remote extension stop-word dictionaries here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

Restart ES.
# Custom analyzer
PUT http://192.168.13.101:9200/hahaha


# Verify
GET http://192.168.13.101:9200/_analyze

image-20240305121419839

The analyzer plugin must be installed on every node in the cluster.
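
The body of the PUT request above is only shown in the screenshot; a minimal sketch of what a custom analyzer definition can look like (the analyzer name my_analyzer is illustrative, and the IK plugin is assumed to be installed so its tokenizer is available):
curl -s -X PUT 'http://192.168.13.101:9200/hahaha' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["lowercase"]
        }
      }
    }
  }
}'
# verify it
curl -s -X GET 'http://192.168.13.101:9200/hahaha/_analyze' -H 'Content-Type: application/json' \
  -d '{"analyzer": "my_analyzer", "text": "我是警察"}'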

{
"title": "戴尔(DELL)31.5英⼨ 4K 曲⾯ 内置⾳箱 低蓝光 影院级⾊彩 FreeSync技术 可壁挂 1800R 电脑显示器 S3221QS",
"price": 3399.00,
"brand": "Dell",
"weight": "15.25kg",
"item": "https://item.jd.com/100014940686.html"
},
{
"title": "三星(SAMSUNG)28英⼨ 4K IPS 10.7亿⾊ 90%DCI-P3 Eyecomfort2.0认证 专业设计制图显示器(U28R550UQC)",
"price": 2099.00,
"brand": "SAMSUNG",
"weight": "7.55kg",
"item": "https://item.jd.com/100009558656.html"
},
{
"title": "ALIENWARE外星⼈新品外设⾼端键⿏套装AW510K机械键盘cherry轴RGB/AW610M 610M ⽆线⿏标+510K机械键盘+510H⽿机",
"price": 6000.00,
"brand": "ALIENWARE外星⼈",
"weight": "1.0kg",
"item": "https://item.jd.com/10030370257612.html"
},
{
"title":"樱桃CHERRY MX8.0彩光87键游戏机械键盘合⾦⼥⽣樱粉⾊版 彩光-粉⾊红轴粉⾊箱 官⽅标配",
"price": 4066.00,
"brand": "樱桃CHERRY",
"weight": "1.0kg",
"item": "https://item.jd.com/10024385308012.html"
},
{
"title": "罗技(G)G610机械键盘 有线机械键盘 游戏机械键盘 全尺⼨背光机械键盘 吃鸡键盘 Cherry红轴",
"price": 429.00,
"brand": "罗技",
"weight": "1.627kg",
"item": "https://item.jd.com/3378484.html"
},
{
"title": "美商海盗船(USCORSAIR)K68机械键盘⿊⾊ 防⽔防尘樱桃轴体 炫彩背光游戏有线 红光红轴",
"price": 499.00,
"brand": "美商海盗船",
"weight": "1.41kg",
"item": "https://item.jd.com/43580479783.html"
},
{
"title": "雷蛇(Razer) 蝰蛇标准版 ⿏标 有线⿏标 游戏⿏标 ⼈体⼯程学 电竞 ⿊⾊6400DPI lol吃鸡神器cf",
"price": 109.00,
"brand": "雷蛇",
"weight": "185.00g",
"item": "https://item.jd.com/8141909.html"
},
{
"title": "罗技(G)G502 HERO主宰者有线⿏标 游戏⿏标 HERO引擎 RGB⿏标 电竞⿏标 25600DPI",
"price": 299.00,
"brand": "罗技",
"weight": "250.00g",
"item": "https://item.jd.com/100001691967.html"
},
{
"title": "武极 i5 10400F/GTX1050Ti/256G游戏台式办公电脑主机DIY组装机",
"price": 4099.00,
"brand": "武极",
"weight": "5.0kg",
"item": "https://item.jd.com/1239166056.html"
},
{
"title": "变异者 组装电脑主机DIY台式游戏 i5 9400F/16G/GTX1050Ti 战胜G1",
"price": 4299.00,
"brand": "变异者",
"weight": "9.61kg",
"item": "https://item.jd.com/41842373306.html"
},
{
"title": "宏碁(Acer) 暗影骑⼠·威N50-N92 英特尔酷睿i5游戏台机 吃鸡电脑主机(⼗⼀代i5-11400F 16G 256G+1T GTX1650)",
"price": 5299.00,
"brand": "宏碁",
"weight": "7.25kg",
"item": "https://item.jd.com/100020726324.html"
},
{
"title": "京天 酷睿i7 10700F/RTX2060/16G内存 吃鸡游戏台式电脑主机DIY组装机",
"price": 7999.00,
"brand": "京天",
"weight": "10.0kg",
"item": "https://item.jd.com/40808512828.html"
},
{
"title": "戴尔(DELL)OptiPlex 3070MFF/3080MFF微型台式机电脑迷你⼩主机客厅HTPC 标配 i5-10500T/8G/1T+256G 内置WiFi+蓝⽛ 全国联保 三年上⻔",
"price": 3999.00,
"brand": "DELL",
"weight": "2.85kg",
"item": "https://item.jd.com/10025304273651.html"
},
{
"title": "伊萌纯种英短蓝⽩猫活体猫咪幼猫活体英国短⽑猫矮脚猫英短蓝猫幼体银渐层蓝⽩活体宠物蓝猫幼崽猫咪宠物猫短 双⾎统A级 ⺟",
"price": 4000.00,
"brand": "英短",
"weight": "1.0kg",
"item": "https://item.jd.com/10027188382742.html"
},
{
"title": "柴墨 ⾦渐层幼猫英短猫宠物猫英短⾦渐层猫咪活体猫活体纯种⼩猫银渐层 双⾎统",
"price": 12000.00,
"brand": "英短",
"weight": "3.0kg",
"item": "https://item.jd.com/10029312412476.html"
},
{
"title": "Redmi Note10 Pro 游戏智能5G⼿机 ⼩⽶ 红⽶",
"price": 9999.00,
"brand": "⼩⽶",
"weight": "10.00g",
"item": "https://item.jd.com/100021970002.html"
},
{
"title": "【⼆⼿99新】⼩⽶Max3⼿机⼆⼿⼿机 ⼤屏安卓 曜⽯⿊ 6G+128G 全⽹通",
"price": 1046.00,
"brand": "⼩⽶",
"weight": "0.75kg",
"item": "https://item.jd.com/35569092038.html"
},
{
"title": "现货速发(10天价保)⼩⽶11 5G⼿机 骁⻰888 游戏智能⼿机 PRO店内可选⿊⾊ 套装版 12GB+256GB",
"price": 4699.00,
"brand": "⼩⽶",
"weight": "0.7kg",
"item": "https://item.jd.com/10025836790851.html"
},
{
"title": "⼩⽶⼿环6 NFC版 全⾯彩屏 30种运动模式 24h⼼率检测 50⽶防⽔ 智能⼿环",
"price": 279.00,
"brand": "⼩⽶",
"weight": "65.00g",
"item": "https://item.jd.com/100019867468.html"
},
{
"title": "HUAWEI MateView⽆线原⾊显示器⽆线版 28.2英⼨ 4K+ IPS 98% DCI-P3 10.7亿⾊ HDR400 TypeC 双扬声器 双MIC",
"price": 4699.00,
"brand": "华为",
"weight": "9.8kg",
"item": "https://item.jd.com/100021420806.html"
},
{
"title": "华为nova7se/nova7 se 5G⼿机( 12期免息可选 )下单享好礼 绮境森林乐活版 8G+128G(1年碎屏险)",
"price": 2999.00,
"brand": "华为",
"weight": "500.00g",
"item": "https://item.jd.com/10029312412476.html"
},
{
"title": "华为HUAWEI FreeBuds 4i主动降噪 ⼊⽿式真⽆线蓝⽛⽿机/通话降噪/⻓续航/⼩巧舒适 Android&ios通⽤ 陶瓷⽩",
"price": 479.00,
"brand": "华为",
"weight": "137.00g",
"item": "https://item.jd.com/100018510746.html"
},
{
"title": "HUAWEI WATCH GT2 华为⼿表 运动智能⼿表 两周⻓续航/蓝⽛通话/⾎氧检测/麒麟芯⽚ 华为gt2 46mm 曜⽯⿊",
"price": 1488.00,
"brand": "华为",
"weight": "335.00g",
"item": "https://item.jd.com/100008492922.html"
},
{
"title": "Apple苹果12 mini iPhone 12 mini 5G ⼿机(现货速发 12期免息可选)蓝⾊ 5G版 64G",
"price": 4699.00,
"brand": "苹果",
"weight": "280.00g",
"item": "https://item.jd.com/10026100075337.html"
},
{
"title": "Apple iPhone 12 (A2404) 128GB 紫⾊ ⽀持移动联通电信5G 双卡双待⼿机",
"price": 6799.00,
"brand": "苹果",
"weight": "330.00g",
"item": "https://item.jd.com/100011203359.html"
},
{
"title": "华硕ROG冰刃双屏 ⼗代英特尔酷睿 15.6英⼨液⾦导热300Hz电竞游戏笔记本电脑 i9-10980H 32G 2T RTX2080S",
"price": 48999.00,
"brand": "华硕",
"weight": "2.5kg",
"item": "https://item.jd.com/10021558215658.html"
},
{
"title": "联想⼩新Air15 2021超轻薄笔记本电脑 ⾼⾊域学⽣办公设计师游戏本 ⼋核锐⻰R7-5700U 16G内存 512G固态 升级15.6英⼨IPS全⾯屏【DC调光护眼⽆闪烁",
"price": 5499.00,
"brand": "苹果",
"weight": "10.0kg",
"item": "https://item.jd.com/33950552707.html"
},
{
"title": "苹果(Apple)MacBook Air 13.3英⼨ 笔记本电脑 【2020款商务灰】⼗代i7 16G 512G 官⽅标配 19点前付款当天发货",
"price": 10498.00,
"brand": "苹果",
"weight": "1.29kg",
"item": "https://item.jd.com/10021130510120.html"
},
{
"title": "科⼤讯⻜机器⼈ 阿尔法蛋A10智能机器⼈ 专业教育⼈⼯智能编程机器⼈学习机智能可编程 ⽩⾊",
"price": 1099.00,
"brand": "科⼤讯⻜",
"weight": "1.7kg",
"item": "https://item.jd.com/100005324258.html"
},
{
"title": "robosen乐森机器⼈六⼀⼉童节礼物⾃营孩⼦玩具星际特⼯智能编程机器⼈⼉童语⾳控制陪伴益智变形机器⼈",
"price": 2499.00,
"brand": "senpowerT9-X",
"weight": "3.01kg",
"item": "https://item.jd.com/100006740372.html"
},
{
"title": "优必选(UBTECH)悟空智能语⾳监控对话⼈形机器⼈⼉童教育陪伴早教学习机玩具",
"price": 4999.00,
"brand": "优必选悟空",
"weigth": "1.21kg",
"item": "https://item.jd.com/100000722348.html"
}
# List index templates
GET http://192.168.13.101:9200/_template

# View a single index template
GET http://192.168.13.101:9200/_template/.monitoring-es

# Create or update an index template
POST http://192.168.13.101:9200/_template/uuuuuu
{
"index_patterns":[
"hahaha*"
],
"settings": {
"index": {
"number_of_shards": "3",
"number_of_replicas": "1"
}
},
"mappings":{
"properties" :{
"ip_addr":{
"type":"ip"
}
}
}
}

Index templates only apply to newly created indices, not to existing ones.

# Delete an index template
DELETE http://192.168.13.101:9200/_template/uuuuuu

# DSL

Elasticsearch provides a complete JSON-based Query DSL (Domain Specific Language) for defining queries.

# Full-text queries

# match query
POST http://192.168.13.101:9200/shopping/_search
{
"query":{
"match":{
"brand":"优必选悟空"
}
}
}

A match query analyzes the query string, so Chinese text is segmented into tokens before matching.

# Phrase match (exact phrase)


POST http://192.168.13.101:9200/shopping/_search
{
"query":{
"match_phrase":{
"brand":"优必选悟空"
}
}
}

# Match all

POST http://10.0.0.103:9200/oldboyedu-shopping/_search
{
"query": {
"match_all" : {}
}
}

# Paging


POST http://192.168.13.101:9200/shopping/_search
{
"query":{
"match_all":{}
},
"size:7,
"from":28
}

size:
how many documents each page returns; the default is 10.
from:
how many documents to skip; the default is 0, i.e. the first page. For a given page number, from = (page - 1) * size.
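
A quick worked instance of that formula: page 5 with 7 documents per page gives from = (5 - 1) * 7 = 28, which is exactly the request shown here:
curl -s -X POST 'http://192.168.13.101:9200/shopping/_search' -H 'Content-Type: application/json' \
  -d '{"query": {"match_all": {}}, "size": 7, "from": 28}'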
POST http://192.168.13.101:9200/shopping/_search
{
"query":{
"match_all":{}
},
"size:7,
"from":28,
"_source":["brand","price"]
}


_source: restrict which fields are returned

# Query for documents that contain a given field

POST http://192.168.13.101:9200/shopping/_search
{
"query":{
"exists":{
"field":"hobby"
}
}
}

Returns the documents that contain the specified field.

# Highlighting


POST http://192.168.13.101:9200/shopping/_search
{
"query":{
"match":{
"brand":"优必选悟空"
}
},
"highlight":{
"pre_tags":["<h1>"],
"post_tags":["</h1>"]
"fields":{
"brand":{}
}
}
}

# Sorting

POST http://192.168.13.101:9200/shopping/_search
{
"query":{
"match":{
"brand":"优必选悟空"
}
},
"sort":{
"price":{
"order":asc""
}
}

asc
desc

# 多条件查询

POST http://10.0.0.103:9200/oldboyedu-shopping/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"brand": " 苹果"
}
},
{
"match": {
"price": 5499
}
}
]
}
}
}
must, must_not, should
"minimum_should_match": 2 sets the minimum number of should clauses that have to match

POST http://10.0.0.103:9200/oldboyedu-shopping/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"brand": " 苹果"
}
},
{
"match": {
"price": 5499
}
}
],
"minimum_should_match": 2
}
}
}

bool:
combines multiple query clauses; the clause types are "must", "must_not" and "should".
"must"
clauses that must match.
"must_not"
clauses that must not match (the opposite of must).
"should"
optional clauses: matching any one of them is enough, and "minimum_should_match" can be used
to require a minimum number of matching should clauses.
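
must and should are demonstrated above; a minimal sketch of must_not, reusing the same sample data (Apple-branded items whose price is not 5499):
curl -s -X POST 'http://192.168.13.101:9200/shopping/_search' -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must":     [ { "match_phrase": { "brand": "苹果" } } ],
      "must_not": [ { "match": { "price": 5499 } } ]
    }
  }
}'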

# Range queries

POST http://10.0.0.103:9200/oldboyedu-shopping/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"brand": " 苹果"
}
}
],
"filter": {
"range": {
"price": {
"gt": 5000,
"lt": 8000
}
}
}
}
}
}

# Exact match (term-level queries)

POST http://10.0.0.103:9200/oldboyedu-shopping/_search
{
"query": {
"terms": {
"price": [
4699,
299,
4066
]
}
}
}


term matches a single exact value
terms matches any one of several exact values
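
A minimal sketch of a single-value term query against the same index:
curl -s -X POST 'http://10.0.0.103:9200/oldboyedu-shopping/_search' -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"price": 4699}}}'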

# Multi-word search

POST http://10.0.0.103:9200/oldboyedu-shopping/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "显示器曲⾯",
"operator": "and"
}
}
}
]
}
},
"highlight": {
"pre_tags": [
"<h1>"
],
"post_tags": [
"</h1>"
],
"fields": {
"title": {}
}
}
}
"operator"设置为"and"则⽂档必须包含"query"中的所有词汇,"operator"的默认值为"or"。

# Boosting (weights)

POST http://10.0.0.103:9200/oldboyedu-shopping/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"brand": {
"query": "⼩苹华"
}
}
}
],
"should": [
{
"match_phrase": {
"title": {
"query": "防⽔",
"boost": 2
}
}
},
{
"match_phrase": {
"title": {
"query": "⿊⾊",
"boost": 10
}
}
}
]
}
},
"highlight": {
"fields": {
"title": {},
"brand": {}
}
},
"_source": ""
}

# Aggregations

POST http://10.0.0.103:9200/oldboyedu-shopping/_search # count the number of items per brand
{
"aggs": {
"oldboyedu_brand_group": {
"terms": {
"field": "brand.keyword"
}
}
},
"size": 0
}
POST http://10.0.0.103:9200/oldboyedu-shopping/_search # find the most expensive Apple item
{
"query": {
"match_phrase": {
"brand": "苹果"
}
},
"aggs": {
"oldboyedu_max_shopping": {
"max": {
"field": "price"
}
}
},
"size": 0
}
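
Aggregations can also be nested; a minimal sketch that groups by brand and computes the average price inside each bucket (the aggregation names are arbitrary):
curl -s -X POST 'http://10.0.0.103:9200/oldboyedu-shopping/_search' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "brand_group": {
      "terms": { "field": "brand.keyword" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}'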

# ES cluster migration

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.tar.gz
tar xzvf elasticsearch-6.8.23.tar.gz


cluster.name: es-6
node.name: elk-101
path.data: /data/data
path.logs: /data/logs
network.host: 0.0.0.0
http.port: 19200
transport.tcp.port: 19300
discovery.zen.ping.unicast.hosts: ["elk101", "elk102", "elk103"]
discovery.zen.minimum_master_nodes: 2

chown -R elasticsearch:elasticsearch /data/
chown elasticsearch:elasticsearch /root/elasticsearch-6.8.23 -R

# sync the directory to the other nodes
data_rsync.sh /root/elasticsearch-6.8.23

cp /etc/systemd/system/elasticsearch.service /etc/systemd/system/elasticsearch7.service
cp /etc/systemd/system/elasticsearch7.service /etc/systemd/system/elasticsearch6.service

[Unit]
Description=Elasticsearch
After=network.target

[Service]
Type=simple
User=elasticsearch
LimitMEMLOCK=infinity
LimitNOFILE=65535
WorkingDirectory=/root/elasticsearch-6.8.23
ExecStart=/root/elasticsearch-6.8.23/bin/elasticsearch
Restart=on-failure

[Install]
WantedBy=multi-user.target


systemctl daemon-reload
systemctl restart elasticsearch6.service

# Migration within the same cluster
POST http://192.168.13.101:9200/_reindex
{
"source":{
"index":"shopping"
},
"dest":{
"index":"shopping-new"
}
}


# Cross-cluster migration
# Whitelist the remote source in the configuration of the cluster that runs the reindex (the destination)
echo reindex.remote.whitelist: \"*:*\" >> elasticsearch.yml
systemctl restart elasticsearch
POST http://192.168.13.101:9200/_reindex
{
"source":{
"index":"shopping",
"remote":{
"host":"http://192.168.13.101:19200"
},
"query":{
"match":{
"brand":"haha"
}
}
},
"dest":{
"index":"shopping-new"
}
}
This migrates the data from the cluster on 19200 into the cluster on 9200; the reindex.remote.whitelist setting shown above is what allows the remote source to be read.

A query can be supplied in the source block so that only matching documents are migrated.
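
For large migrations it can help to run the reindex asynchronously and poll the task API instead of waiting for the response; a minimal sketch (the task id in the second URL is a placeholder taken from the first response):
curl -s -X POST 'http://192.168.13.101:9200/_reindex?wait_for_completion=false' -H 'Content-Type: application/json' \
  -d '{"source": {"index": "shopping"}, "dest": {"index": "shopping-new"}}'
# the response contains a task id, e.g. {"task":"NODE:123"}; check its progress with:
curl -s 'http://192.168.13.101:9200/_tasks/<task_id>'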

# Cluster migration with Logstash

input {
  elasticsearch {
    index => "shopping"
    hosts => "192.168.13.101:19200"
    query => '{"query": {"match_phrase":{"brand":"dell"}},"sort":["_doc"]}'
  }
}

output {
  elasticsearch {
    index => "ooo"
    hosts => "192.168.13.101:9200"
  }
}

# ES health APIs

# Cluster health
http://192.168.13.101:9200/_cluster/health
{
"cluster_name": "elk",
"status": "green",
"timed_out": false,
"number_of_nodes": 3,
"number_of_data_nodes": 3,
"active_primary_shards": 16,
"active_shards": 32,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100.0
}

green
all shards are allocated.
yellow
all primary shards are allocated, but one or more replica shards are not. If a node in the cluster fails, some data may be unavailable until that node is repaired.
red
one or more primary shards are unassigned, so some data is unavailable. This can happen briefly during cluster startup while primary shards are being allocated.

View detailed per-shard status
curl http://192.168.13.101:9200/_cluster/health?level=shards | jq



# Cluster settings details
http://192.168.13.101:9200/_cluster/settings?include_defaults
http://192.168.13.101:9200/_cluster/settings?include_defaults=true&flat_settings=true

# Updating cluster settings

If the same setting is configured in more than one place, Elasticsearch applies it in the following order of precedence:
Transient setting
Persistent setting
elasticsearch.yml setting
Default setting value

PUT http://192.168.13.101:9200/_cluster/settings
{
"persistent":{
"indices.recovery.max_bytes_per_sec":"50mb"
}
}

"cluster.routing.allocation.enable":
"all":
allow allocation for all kinds of shards.
"primaries"
only allow primary shards to be allocated.
"new_primaries"
only allow primary shards of newly created indices to be allocated.
"none"
do not allow any shard allocation.
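
A minimal sketch of the usual rolling-restart pattern: allow only primaries before stopping a node, then set the value back to null so the default ("all") applies again:
curl -s -X PUT 'http://192.168.13.101:9200/_cluster/settings' -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.enable": "primaries"}}'
curl -s -X PUT 'http://192.168.13.101:9200/_cluster/settings' -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.enable": null}}'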

# Cluster state

GET http://192.168.13.101:9200/_cluster/state
GET http://192.168.13.101:9200/_cluster/state/nodes # return only the nodes part
GET http://192.168.13.101:9200/_cluster/state/nodes,version/shopping* # nodes and version information, filtered to indices matching shopping*

# Cluster stats

GET http://192.168.13.101:9200/_cluster/stats
GET http://192.168.13.101:9200/_cluster/stats/nodes/elk101

# Shard allocation explain

The purpose of the cluster allocation explain API is to explain shard allocation in the cluster.
For an unassigned shard, the explain API gives the reason why the shard is unassigned.
For an assigned shard, the explain API explains why the shard remains on its current node and has not been moved or rebalanced to another node.

GET http://192.168.13.101:9200/_cluster/allocation/explain
{
"index":"hahaha-new",
"shard":0,
"primary":true
}

# Reroute API

The reroute command allows the allocation of individual shards to be changed manually. For example, a shard can be explicitly moved from one node to another, an allocation can be cancelled, and an unassigned shard can be explicitly assigned to a specific node.

POST http://192.168.13.101:9200/_cluster/reroute
{
"commands": [
{
"move": {
"index": "teacher",
"shard": 0,
"from_node": "elk102",
"to_node": "elk101"
}
}
]
}

# Cluster node roles

node.role  
cdfhilmrstw

Role letters:
c :
cold data node
d :
data node
f :
frozen node
h :
hot data node
i :
ingest node
l :
machine learning node
m :
master-eligible node
r :
remote cluster client node
s :
content node
t :
transform node
v :
voting-only node
w :
warm data node
- :
coordinating node only

Commonly used roles:
data node:
a node that stores data.
node.data: true
master node:
controls the ES cluster and maintains the cluster state (node information, index information, etc.; every node in the cluster keeps a copy).
node.master: true
coordinating:
a coordinating node can handle requests; every node in an ES cluster is a coordinating node, and this role cannot be removed.

image-20240310164635140

image-20240310165658810

image-20240310174634804

image-20240312223812350

image-20240312224711763

# Optimistic locking

POST http://1.1.1.1:9200/index_exp/_doc/10001/?version=3&version_type=external

POST http://1.1.1.1:9200/index_exp/_doc/10001/_update?if_seq_no=0&if_primary_term=1
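
A minimal sketch of the if_seq_no / if_primary_term flow (the document id and values are illustrative; if another write happened in between, the conditional update returns HTTP 409 Conflict):
# read the document and note _seq_no and _primary_term in the response
curl -s 'http://1.1.1.1:9200/index_exp/_doc/10001'
# write only if the sequence number and primary term still match
curl -s -X POST 'http://1.1.1.1:9200/index_exp/_doc/10001?if_seq_no=0&if_primary_term=1' \
  -H 'Content-Type: application/json' -d '{"name": "new value"}'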

# TLS

bin/elasticsearch-certutil ca
# press Enter at every prompt
bin/elasticsearch-certutil cert --ca elastic-stack-ca.p12
# press Enter at every prompt

chown elasticsearch:elasticsearch elastic.p12

vim elasticsearch.yml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: /root/elasticsearch-7.17.3/config/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: /root/elasticsearch-7.17.3/config/elastic-certificates.p12
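
# The passwords below were presumably generated with the built-in tool (an assumption based
# on the output format; it is run once after security has been enabled):
bin/elasticsearch-setup-passwords auto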

Changed password for user apm_system
PASSWORD apm_system = 9sWQlhJehyYKrueHVD6h

Changed password for user kibana_system
PASSWORD kibana_system = E6evsbrrupM1ub90Uq58

Changed password for user kibana
PASSWORD kibana = E6evsbrrupM1ub90Uq58

Changed password for user logstash_system
PASSWORD logstash_system = aZlM0Wx3EgdD7Cn0oIyQ

Changed password for user beats_system
PASSWORD beats_system = sPUcXLl62KpC9apCDnLO

Changed password for user remote_monitoring_user
PASSWORD remote_monitoring_user = 82axLeZr4vvSMFhfZAqh

Changed password for user elastic
PASSWORD elastic = lPXaVbZBHvVdUcpI1bCQ

https://www.jianshu.com/p/d13b6074b545

Note the file owner and group; the certificate must be placed under the config directory.

image-20240313223859472

[root@elk101 ~/elasticsearch-7.17.3]$ vim /root/kibana-7.17.3-linux-x86_64/config/kibana.yml 

elasticsearch.username: "kibana_system"
elasticsearch.password: "E6evsbrrupM1ub90Uq58"

image-20240313224316040

# RBAC

image-20240316134801880

image-20240316135708419

input {
stdin{}
}

output {
elasticsearch {
index => "1111"
hosts => "10.0.0.101:9200"
user => "elastic"
password => "xxxx"

}
}
Create a low-privilege user for this instead of the superuser.
filebeat.inputs:
- type: stdin
tags: ["stdin"]
enabled: true


output.elasticsearch:
hosts: ["http://elk101:9200", "http://elk102:9200", "http://elk103:9200"]
enabled: true
index: "hahaha-xixi-tomcat-error-%{+yyyy.MM.dd}"
username: "elastic"
password: "zzz"

# name of the index template
setup.template.name: "hahaha"
# pattern the index template matches
setup.template.pattern: "hahaha-xixi*"
# disable index lifecycle management
setup.ilm.enabled: false
# overwrite an existing index template
setup.template.overwrite: true
setup.template.settings:
index.number_of_shards: 3
index.number_of_replicas: 0

Using the elastic superuser here is not recommended.
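
A minimal sketch of creating a restricted role and user with the security API so that Beats/Logstash do not need the elastic superuser (role, user and password names are illustrative):
curl -s -u elastic:lPXaVbZBHvVdUcpI1bCQ -X POST 'http://192.168.13.101:9200/_security/role/hahaha_writer' \
  -H 'Content-Type: application/json' \
  -d '{"indices": [{"names": ["hahaha-*"], "privileges": ["create_index", "write"]}]}'
curl -s -u elastic:lPXaVbZBHvVdUcpI1bCQ -X POST 'http://192.168.13.101:9200/_security/user/beats_writer' \
  -H 'Content-Type: application/json' \
  -d '{"password": "zzz", "roles": ["hahaha_writer"]}'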

# Kafka

Filebeat generally runs close to the data source and is easy to scale out.

There are three ways to deal with the Logstash performance bottleneck:

Vertical scaling - upgrade the hardware of a single node; expensive, still a single point of failure, and the hardware ceiling is reached quickly.

Horizontal scaling - add more machines; drawback: servers sized for peak traffic are under-utilised the rest of the time.

Add a message queue (MQ).

Message queue:
a message queue decouples producers from consumers, buffers data and absorbs traffic peaks.

A message queue is a container that holds messages while they are in transit; it is widely used for communication between distributed systems.

Drawbacks:

lower reliability

higher system complexity

image-20240317175359357

image-20240317175522234

# Messaging models

Point-to-point:
one-to-one; consumers actively pull data, and a message is deleted once it has been received.

Publish/subscribe:
one-to-many; messages are not removed after being consumed.
Consumers can pull data from the broker, or the broker can push data to consumers.

Pull:
advantages:
consumers can consume from the broker at a rate that matches their own hardware
disadvantages:
consumers must keep a long-running process that polls the broker for new data

Push:
advantages:
clients do not need to pull; the server actively sends data
disadvantages:
(1) if the broker pushes faster than a consumer can process, the client program may crash
(2) the broker has to maintain a subscriber list, which can use a lot of memory when there are many subscribers

Kafka brokers only support the pull model.

image-20240317180554895

image-20240317181336827

# Deploying Kafka

# Deploy the ZooKeeper cluster
wget "https://dlcdn.apache.org/zookeeper/zookeeper-3.9.2/apache-zookeeper-3.9.2-bin.tar.gz" --no-check-certificate
tar zvxf apache-zookeeper-3.9.2-bin.tar.gz
cd apache-zookeeper-3.9.2-bin/

vim /etc/profile.d/kafka.sh
#! /bin/bash
export ZK_HOME=/root/apache-zookeeper-3.9.2-bin
export PATH=$PATH:$ZK_HOME/bin

source /etc/profile.d/kafka.sh
cp zoo_sample.cfg zoo.cfg

zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /root/apache-zookeeper-3.9.2-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

zkServer.sh status

zkCli.sh


# Deploy Kafka
wget https://downloads.apache.org/kafka/3.7.0/kafka_2.13-3.7.0.tgz --no-check-certificate
tar xzvf kafka_2.13-3.7.0.tgz
cd kafka_2.13-3.7.0/

vim /etc/profile.d/kafka.sh
export KAFKA_HOME=/root/kafka_2.13-3.7.0
export PATH=$PATH:$KAFKA_HOME/bin
source /etc/profile.d/kafka.sh

vim server.properties
broker.id=99
zookeeper.connect=192.168.13.101:2181/kafka_3.7

kafka-server-start.sh -daemon /root/kafka_2.13-3.7.0/config/server.properties


tcp6 0 0 :::9092 :::* LISTEN 40507/java
tcp6 0 0 :::2181 :::* LISTEN 36697/java

jps
40803 Jps
40739 ZooKeeperMain
36697 QuorumPeerMain
48059 Elasticsearch
40507 Kafka
127132 Elasticsearch


# Verify
kafka-console-producer.sh --topic nginx --bootstrap-server 192.168.13.101:9092 # a topic that does not exist is created automatically

kafka-console-consumer.sh --topic nginx --bootstrap-server 192.168.13.101:9092


By default the consumer only reads new messages; add --from-beginning to read from the start of the topic.

# Core concepts

broker server:
one Kafka server is one broker; the "broker list" is essentially the Kafka cluster.
topic:
the logical storage unit exposed to clients, similar to an index in ES.
partition:
the unit that actually stores data; a topic has one or more partitions, similar to shards in ES.
replica:
each partition has one or more replicas; with two or more replicas, they are divided into a leader and followers.
leader:
the leader partition handles reads and writes and interacts with clients.
follower:
a follower partition only replicates data from the leader and does not interact with clients.

client:
consumer API:
consumers pull data from the broker.
Each consumer belongs to a consumer group, and a group can contain multiple consumers.

producer API:
producers write data to the broker.

admin API:
cluster-management APIs for topics, partitions, replicas, etc.

streams API:
stream-processing APIs that feed data pipelines for frameworks such as Spark, Flink and Storm.

connect API:
APIs for connecting to external systems, e.g. importing data from MySQL into Kafka.

image-20240317195348018

# Kafka cluster

mkdir /data/kafka
chown -R elasticsearch:elasticsearch /data/

vim server.properties
log.dirs=/data/kafka
broker.id=99
zookeeper.connect=192.168.13.101:2181/kafka_3.7

kafka-server-start.sh -daemon /root/kafka_2.13-3.7.0/config/server.properties


[zk: localhost:2181(CONNECTED) 7] ls /kafka_3.7/brokers/ids
[102, 103, 99]

# Kafka topics

kafka-topics.sh --bootstrap-server 192.168.13.101:9092 --list 
__consumer_offsets
nginx

kafka-topics.sh --bootstrap-server 192.168.13.101:9092 --describe
Topic: nginx TopicId: THT6uHQmS3S8NzU12s5uIQ PartitionCount: 1 ReplicationFactor: 1 Configs:
Topic: nginx Partition: 0 Leader: 99 Replicas: 99 Isr: 99
Topic: __consumer_offsets TopicId: LYq2xfA9ShKZy0fgPLd7zg PartitionCount: 50 ReplicationFactor: 1 Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
Topic: __consumer_offsets Partition: 0 Leader: 99 Replicas: 99 Isr: 99
Topic: __consumer_offsets Partition: 1 Leader: 99 Replicas: 99 Isr: 99

kafka-topics.sh --bootstrap-server 192.168.13.101:9092 --describe --topic nginx
Topic: nginx TopicId: THT6uHQmS3S8NzU12s5uIQ PartitionCount: 1 ReplicationFactor: 1 Configs:
Topic: nginx Partition: 0 Leader: 99 Replicas: 99 Isr: 99

kafka-topics.sh --bootstrap-server 192.168.13.101:9092 --create --topic hahaha --partitions 10 --replication-factor 1
Created topic hahaha.

Topic names must be unique.
The replication factor cannot be larger than the number of brokers; the valid range is 1-32767.


The partition count can only be increased, never decreased:
kafka-topics.sh --bootstrap-server 192.168.13.101:9092 --topic hahaha --alter --partitions 13
--alter cannot change the replication factor.

kafka-topics.sh --bootstrap-server 192.168.13.101:9092 --delete --topic nginx
To delete several topics at once, separate their names with commas.


# Change the replication factor
vim resign.json
{
"topics": [{"topic": "hahaha"}],
"version": 1
}
kafka-reassign-partitions.sh --bootstrap-server 192.168.13.101:9092 --broker-list 99,102,103 --topics-to-move-json-file resign.json --generate
Current partition replica assignment
{"version":1,"partitions":[{"topic":"hahaha","partition":0,"replicas":[99],"log_dirs":["any"]},{"topic":"hahaha","partition":1,"replicas":[103],"log_dirs":["any"]},{"topic":"hahaha","partition":2,"replicas":[102],"log_dirs":["any"]},{"topic":"hahaha","partition":3,"replicas":[99],"log_dirs":["any"]},{"topic":"hahaha","partition":4,"replicas":[103],"log_dirs":["any"]},{"topic":"hahaha","partition":5,"replicas":[102],"log_dirs":["any"]},{"topic":"hahaha","partition":6,"replicas":[99],"log_dirs":["any"]},{"topic":"hahaha","partition":7,"replicas":[103],"log_dirs":["any"]},{"topic":"hahaha","partition":8,"replicas":[102],"log_dirs":["any"]},{"topic":"hahaha","partition":9,"replicas":[99],"log_dirs":["any"]},{"topic":"hahaha","partition":10,"replicas":[102],"log_dirs":["any"]},{"topic":"hahaha","partition":11,"replicas":[103],"log_dirs":["any"]},{"topic":"hahaha","partition":12,"replicas":[99],"log_dirs":["any"]}]}

Proposed partition reassignment configuration # the proposed reassignment plan; it has not been executed yet
{"version":1,"partitions":[{"topic":"hahaha","partition":0,"replicas":[99],"log_dirs":["any"]},{"topic":"hahaha","partition":1,"replicas":[102],"log_dirs":["any"]},{"topic":"hahaha","partition":2,"replicas":[103],"log_dirs":["any"]},{"topic":"hahaha","partition":3,"replicas":[99],"log_dirs":["any"]},{"topic":"hahaha","partition":4,"replicas":[102],"log_dirs":["any"]},{"topic":"hahaha","partition":5,"replicas":[103],"log_dirs":["any"]},{"topic":"hahaha","partition":6,"replicas":[99],"log_dirs":["any"]},{"topic":"hahaha","partition":7,"replicas":[102],"log_dirs":["any"]},{"topic":"hahaha","partition":8,"replicas":[103],"log_dirs":["any"]},{"topic":"hahaha","partition":9,"replicas":[99],"log_dirs":["any"]},{"topic":"hahaha","partition":10,"replicas":[102],"log_dirs":["any"]},{"topic":"hahaha","partition":11,"replicas":[103],"log_dirs":["any"]},{"topic":"hahaha","partition":12,"replicas":[99],"log_dirs":["any"]}]}



vim rebalance.json
{
"version": 1,
"partitions": [
{
"topic": "hahaha",
"partition": 0,
"replicas": [
99,
102
],
"log_dirs": [
"any",
"any"
]
},
{
"topic": "hahaha",
"partition": 1,
"replicas": [
102
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 2,
"replicas": [
103
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 3,
"replicas": [
99
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 4,
"replicas": [
102
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 5,
"replicas": [
103
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 6,
"replicas": [
99
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 7,
"replicas": [
102
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 8,
"replicas": [
103
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 9,
"replicas": [
99
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 10,
"replicas": [
102
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 11,
"replicas": [
103
],
"log_dirs": [
"any"
]
},
{
"topic": "hahaha",
"partition": 12,
"replicas": [
99
],
"log_dirs": [
"any"
]
}
]
}


# Verify the reassignment plan
kafka-reassign-partitions.sh --bootstrap-server 192.168.13.101:9092 --reassignment-json-file rebalance.json --verify
Status of partition reassignment:
There is no active reassignment of partition hahaha-0, but replica set is 99 rather than 99,102.
There is no active reassignment of partition hahaha-1, but replica set is 103 rather than 102.
There is no active reassignment of partition hahaha-2, but replica set is 102 rather than 103.
Reassignment of partition hahaha-3 is completed.
There is no active reassignment of partition hahaha-4, but replica set is 103 rather than 102.
There is no active reassignment of partition hahaha-5, but replica set is 102 rather than 103.
Reassignment of partition hahaha-6 is completed.
There is no active reassignment of partition hahaha-7, but replica set is 103 rather than 102.
There is no active reassignment of partition hahaha-8, but replica set is 102 rather than 103.
Reassignment of partition hahaha-9 is completed.
Reassignment of partition hahaha-10 is completed.
Reassignment of partition hahaha-11 is completed.
Reassignment of partition hahaha-12 is completed.

Clearing broker-level throttles on brokers 99,102,103


# Execute the reassignment plan
kafka-reassign-partitions.sh --bootstrap-server 192.168.13.101:9092 --reassignment-json-file rebalance.json --execute

kafka-topics.sh --bootstrap-server 192.168.13.101:9092 --describe --topic hahaha
Topic: hahaha TopicId: 0TDZliscTFaZDRM9QL_mtA PartitionCount: 13 ReplicationFactor: 2 Configs:
Topic: hahaha Partition: 0 Leader: 99 Replicas: 99,102 Isr: 99,102

# Consumer groups

kafka-consumer-groups.sh --bootstrap-server 192.168.13.101:9092 --list 
console-consumer-59612


kafka-consumer-groups.sh --bootstrap-server 192.168.13.101:9092 --describe --group console-consumer-59612

GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
console-consumer-59612 nginx 0 - 0 - console-consumer-71084746-0f0c-4438-873c-d30fb954f83e /192.168.13.102 console-consumer


kafka-consumer-groups.sh --bootstrap-server 192.168.13.101:9092 --describe --all-groups



kafka-console-consumer.sh --bootstrap-server 192.168.13.101:9092 --topic hahaha --from-beginning --consumer-property group.id="111"


Consumers in the same consumer group never read the same partition at the same time, which prevents a message from being consumed twice within the group.
When the number of partitions of a topic increases, the partitions are redistributed among the consumers of the group, i.e. partition ownership is reassigned.
When the number of consumers in a group changes, a rebalance is also triggered and partition ownership is reassigned.
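
A minimal sketch that makes this visible: two consumers in one group split the partitions of the hahaha topic, and the group description shows which consumer owns which partitions:
kafka-console-consumer.sh --bootstrap-server 192.168.13.101:9092 --topic hahaha --consumer-property group.id=demo-group &
kafka-console-consumer.sh --bootstrap-server 192.168.13.101:9092 --topic hahaha --consumer-property group.id=demo-group &
kafka-console-producer.sh --bootstrap-server 192.168.13.101:9092 --topic hahaha
# inspect the partition assignment of the group
kafka-consumer-groups.sh --bootstrap-server 192.168.13.101:9092 --describe --group demo-group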

# Deploying a different Kafka version in the same cluster

wget "https://archive.apache.org/dist/kafka/0.8.0/kafka_2.8.0-0.8.0.tar.gz" --no-check-certificate
tar xzvf kafka_2.8.0-0.8.0.tar.gz

vim server.properties
broker.id=99
zookeeper.connect=192.168.13.101:2181/kafka_3.7
port=19092

nohup ./kafka-server-start.sh /root/kafka_2.8.0-0.8.0/config/server.properties &

# Basic znode operations

# List all child znodes under /
ls /
[kafka_0.8, kafka_3.7, zookeeper]

[zk: localhost:2181(CONNECTED) 3] stat /
cZxid = 0x0
ctime = Thu Jan 01 08:00:00 CST 1970
mZxid = 0x0
mtime = Thu Jan 01 08:00:00 CST 1970
pZxid = 0x136
cversion = 1
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 3

[zk: localhost:2181(CONNECTED) 4] ls -R /
/
/kafka_0.8
/kafka_3.7
/zookeeper
/kafka_0.8/admin
/kafka_0.8/brokers
/kafka_0.8/cluster
/kafka_0.8/config

# View a znode's data
[zk: localhost:2181(CONNECTED) 5] get /zookeeper/config

# Create a znode; when creating nested znodes, the parent znode must already exist
[zk: localhost:2181(CONNECTED) 6] create /hahaha
Created /hahaha


[zk: localhost:2181(CONNECTED) 8] create /hahaha/haha 444
Created /hahaha/haha
[zk: localhost:2181(CONNECTED) 9] get /hahaha/haha
444

[zk: localhost:2181(CONNECTED) 10] set /hahaha/haha 888
[zk: localhost:2181(CONNECTED) 11] get /hahaha/haha
888

[zk: localhost:2181(CONNECTED) 12] delete /hahaha/haha
[zk: localhost:2181(CONNECTED) 13] deleteall /hahaha # recursive delete

# Offsets

Kafka originally stored consumer offsets in ZooKeeper; later they were moved onto the brokers, into an internal topic called __consumer_offsets.

Before version 0.9 offsets were kept in ZooKeeper; from 0.9 onwards they are kept in Kafka itself.

When the number of consumer groups grows, storing offsets in ZooKeeper puts heavy write pressure on it.

< 0.9
get /oldboyedu-linux80-kafka-0_8_0/consumers/oldboyedu/offsets/oldboyedu-linux80/0

>0.9
kafka-consumer-groups.sh --bootstrap-server 10.0.0.101:9092 --describe --group oldboyedu-linux



View the data in the built-in __consumer_offsets topic
kafka-console-consumer.sh --bootstrap-server 10.0.0.101:9092 --topic __consumer_offsets --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" --from-beginning | grep oldboyedu-linux

# Kafka monitoring with EFAK (Eagle)

kafka-server-stop.sh

vim kafka-server-start.sh
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
#export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
export KAFKA_HEAP_OPTS="-server -Xmx256M -Xms256M -XX:PermSize=128m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=5 -XX:InitiatingHeapOccupancyPercent=70"
export JMX_PORT="8888"
fi

# Sync the configuration to the remaining nodes and start Kafka

yum install mariadb-server -y
systemctl enable mariadb.service --now

mysql
CREATE DATABASE kafka DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
CREATE USER admin IDENTIFIED BY 'kafka';

GRANT ALL ON kafka.* TO admin;

show GRANTS FOR admin;
+------------------------------------------------------------------------------------------------------+
| Grants for admin@% |
+------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'admin'@'%' IDENTIFIED BY PASSWORD '*DCB9E0CF558C6CC2E574E66EF0C9CA18276BDDB1' |
| GRANT ALL PRIVILEGES ON `kafka`.* TO 'admin'@'%' |
+------------------------------------------------------------------------------------------------------+
2 rows in set


wget "https://github.com/smartloli/kafka-eagle-bin/archive/v3.0.1.tar.gz"
tar xzvf v3.0.1.tar.gz
cd kafka-eagle-bin-3.0.1/
tar xzvf efak-web-3.0.1-bin.tar.gz

vim ke.sh
export KE_JAVA_OPTS="-server -Xmx512M -Xms512M -XX:MaxGCPauseMillis=20 -XX:+UseG1GC -XX:MetaspaceSize=128m -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80"

vim system-config.properties
efak.zk.cluster.alias=cluster1,cluster2
cluster1.zk.list=192.168.13.101:2181/kafka_0.8
cluster2.zk.list=192.168.13.101:2181/kafka_3.7
cluster1.efak.offset.storage=zk
cluster2.efak.offset.storage=kafka
efak.topic.token=keadmin
efak.driver=com.mysql.cj.jdbc.Driver
efak.url=jdbc:mysql://127.0.0.1:3306/kafka?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull
efak.username=admin
efak.password=kafka

vim /etc/profile.d/kafka.sh
export KE_HOME=/root/kafka-eagle-bin-3.0.1/efak-web-3.0.1/
export PATH=$PATH:$KE_HOME/bin

source /etc/profile.d/kafka.sh

ke.sh start


# Kafka benchmarking

[root@elk101 /tmp/kafka-logs]$ mkdir /tmp/kafka-test 


cat > oldboyedu-kafka-test.sh <<'EOF'
# create the topic
kafka-topics.sh --bootstrap-server 192.168.13.101:9092,192.168.13.102:9092,192.168.13.103:9092 --topic oldboyedu-kafka --replication-factor 1 --partitions 10 --create
# start a consumer to read the data
nohup kafka-consumer-perf-test.sh --broker-list 192.168.13.101:9092,192.168.13.102:9092,192.168.13.103:9092 --topic oldboyedu-kafka --messages 100000000 &>/tmp/kafka-test/oldboyedu-kafka-consumer.log &
# start a producer to write data
nohup kafka-producer-perf-test.sh --num-records 100000000 --record-size 1000 --topic oldboyedu-kafka --throughput 1000000 --producer-props bootstrap.servers=192.168.13.101:9092,192.168.13.102:9092,192.168.13.103:9092 &> /tmp/kafka-test/oldboyedu-kafka-producer.log &
EOF