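Concepts

node: a single running server instance
cluster: the set of nodes that share the same cluster name
index: analogous to a database
type: analogous to a table
shard:
- Each index is split into multiple shards. A shard is either a primary shard or a replica shard; "shard" on its own usually means a primary shard.
- Primary shards store the index data. Replica shards are copies of primary shards, so they provide redundancy and extra read capacity (similar to master/slave replication).
- A primary shard and its replica are never placed on the same node. By default (in versions before 7.0) an index has 5 primary shards and 1 replica of each. On a single-node cluster the replicas stay unassigned, because a copy on the same node would add no redundancy; with more than one node, each primary shard gets its replica on another node.
- When a document is indexed, ES automatically decides which primary shard it is written to.
- The shards of one index are spread across the nodes of the cluster.
document: analogous to a row; a document is made up of multiple fields
document type: analogous to the kind of record. One index may contain several document types.
field: analogous to a column
mapping: stores the type information for each field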
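
How ES "automatically decides" the primary shard can be sketched roughly as follows. Elasticsearch documents the rule as approximately `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document id. The sketch below is only an illustration: `pick_shard` is a hypothetical name, and `zlib.crc32` is a stand-in hash (Elasticsearch uses a Murmur3 hash, so real shard numbers will differ).

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document id) to a primary shard number.

    Stand-in hash: zlib.crc32. Elasticsearch uses Murmur3, so only the
    hash-then-modulo idea carries over, not the concrete shard numbers.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same id always lands on the same shard of a 5-shard index.
for doc_id in ("doc-1", "doc-2", "doc-3"):
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, changing that number would send existing ids to different shards, which is one reason the primary shard count is fixed when the index is created.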

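Example: primary shards (1, 2, ..., 5) and replicas (1R, 2R, ..., 5R)
======Initial state=============
Node1: 1 2 3 4R 5R
Node2: 1R 2R 3R 4 5
======Node2 goes down=========>
//Even with Node2 down, Node1 still holds all of the index data; the replicas on Node1 are automatically promoted to primaries
Node1: 1 2 3 4 5
~~Node2: 1R 2R 3R 4 5~~
======Node2 recovers==========>
//Once Node2 is back, the cluster reallocates the shards
Node1: 1R 2R 3 4 5R
Node2: 1 2 3R 4R 5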
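
The "Node2 goes down" step of the example above can be modelled with a few lines of Python. This is a toy model under simple assumptions (one replica per primary, shard copies written as strings such as "3" and "3R"); `fail_node` is a hypothetical helper, not an Elasticsearch API.

```python
def fail_node(layout: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Return the shard layout after the node `failed` goes down.

    `layout` maps node name -> shard copies, where "3" is primary shard 3
    and "3R" is its replica. A surviving replica is promoted to primary
    when its primary lived on the failed node.
    """
    lost_primaries = {s for s in layout[failed] if not s.endswith("R")}
    survivors = {}
    for node, shards in layout.items():
        if node == failed:
            continue
        survivors[node] = [
            s[:-1] if s.endswith("R") and s[:-1] in lost_primaries else s
            for s in shards
        ]
    return survivors


initial = {
    "Node1": ["1", "2", "3", "4R", "5R"],
    "Node2": ["1R", "2R", "3R", "4", "5"],
}
print(fail_node(initial, "Node2"))  # {'Node1': ['1', '2', '3', '4', '5']}
```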

Reference:
Shards and replicas in Elasticsearch

Common commands

## Run these from Kibana Dev Tools
# List all indices
GET _all
# Create an index with an explicit mapping
PUT myindex
{
  "mappings": {
    "doc": {
      "properties": {
        "name":{
          "type":"keyword"
        }
      }
    }
  }
}
# Index a document (a type must be specified; here it is doc)
POST myindex/doc
{
    "name":"xiaohai"
}
# Search the whole myindex index
GET myindex/_search
# Search only documents of type doc in myindex
GET myindex/doc/_search
# Delete the myindex index
DELETE myindex

elasticsearch

Indices

Settings

# View index settings
http://localhost:9200/myindex/_settings

Mapping

# View the index's mapping
http://localhost:9200/myindex/_mapping
# View templates
http://10.75.30.49:9200/_template

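Types

Complex types (official documentation)

array
object
ES has no dedicated array type. Any field can hold multiple values of the same type, for example:
["one","two"], [1,2,3]
[{"name":"xiaoming","age":18},{"name":"xiaohong","age":20}]

Once an index has been created, you can only add new field mappings or change a few search-related mapping parameters; the type of an existing field cannot be changed.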

Example
Modifying mapping template information

//http://localhost:9200/myindex
{
  "my_index": {
    "mappings": {                           //mapping information
      "_doc": {
        "dynamic_templates": [              //dynamic mapping templates    https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html#dynamic-templates
          {
            "full_name": {                  //dynamic template name; any name will do
              "path_match": "name.*",       //match sub-fields of the name field
              "path_unmatch": "*.middle",   //but not sub-fields matching *.middle
              "mapping": {
                "copy_to": "full_name",     //copy the value into the top-level full_name field
                "type": "text"              //map as type text
              }
            }
          },
          {
            "longs_as_strings": {           //fields that default detection treats as string, whose name starts with long_ and does not end with _text, are mapped as long
                "match_mapping_type": "string",
                "match":   "long_*",
                "unmatch": "*_text",
                "mapping": {
                  "type": "long"
                }
              }
            }
        ],
        "properties": {                     //explicitly specified mappings
          "full_name": {                    //field name
            "type": "text",                 //field type
            "fields": {                     //map the same field in more than one way (multi-fields)
              "keyword": {                  //the field full_name.keyword       https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html#multi-fields
                "type": "keyword",          //map full_name.keyword as type keyword
                "ignore_above": 256
              }
            }
          },
          "name": {
            "properties": {
              "first": {
                "type": "text",
                "copy_to": [
                  "full_name"
                ]
              },
              "last": {
                "type": "text",
                "copy_to": [
                  "full_name"
                ]
              },
              "middle": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
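
To get a feel for which field names the `longs_as_strings` template above would pick up, here is a simplified Python model of its three conditions. It is an approximation only: `template_applies` is a hypothetical helper, and `fnmatchcase` accepts more wildcard syntax (`?`, `[...]`) than the plain `*` patterns used in the template.

```python
from fnmatch import fnmatchcase


def template_applies(field_name: str, detected_type: str) -> bool:
    """Simplified check for the longs_as_strings dynamic template.

    The template applies when the value was detected as a string, the
    field name matches "long_*", and it does not match "*_text".
    """
    return (
        detected_type == "string"
        and fnmatchcase(field_name, "long_*")
        and not fnmatchcase(field_name, "*_text")
    )


for name in ("long_count", "long_text", "count"):
    print(name, template_applies(name, "string"))
```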

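Modifying settings templates

# Create or update the zeroreplicas template so that every index created afterwards gets 0 replica shards
# (ES 6.0 and later require the Content-Type header)
curl -XPUT 10.75.30.40:9200/_template/zeroreplicas -H 'Content-Type: application/json' -d '{"template" : "*","settings" : {"number_of_replicas" : 0}}'

ES templates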

Cluster

Cluster configuration

ES configuration

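# node-1
cluster.name: mycluster         # all nodes of one cluster use the same cluster name
node.name: node-1               # node name
network.host: 10.0.0.1          # address to bind to; defaults to 127.0.0.1, and must be a non-loopback address to serve other machines
http.port: 9200                 # HTTP port used by clients for CRUD requests
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9301", "127.0.0.1:9302"]   # unicast: list of hosts this node should try to contact
transport.tcp.port: 9300                                            # port used for communication between cluster nodes
discovery.zen.minimum_master_nodes: 2                               # usually n/2+1, where n is the number of master-eligible nodes
thread_pool.bulk.queue_size: 1000                                   # size of the bulk request queue
thread_pool.index.queue_size: 500                                   # size of the index request queue
cluster.routing.allocation.node_initial_primaries_recoveries: 8     # number of concurrent primary shard recoveries per node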
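# node-2
cluster.name: mycluster         # all nodes of one cluster use the same cluster name
node.name: node-2               # node name
network.host: 10.0.0.2
http.port: 9201                 # HTTP port used by clients for CRUD requests
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302"]   # unicast: list of hosts this node should try to contact
transport.tcp.port: 9301                                            # port used for communication between cluster nodes
discovery.zen.minimum_master_nodes: 2                               # usually n/2+1, where n is the number of master-eligible nodes
thread_pool.bulk.queue_size: 1000                                   # size of the bulk request queue
thread_pool.index.queue_size: 500                                   # size of the index request queue
cluster.routing.allocation.node_initial_primaries_recoveries: 8     # number of concurrent primary shard recoveries per node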
# node-3
...
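
The n/2+1 rule for `discovery.zen.minimum_master_nodes` uses integer division. A minimal sketch, with `minimum_master_nodes` as a hypothetical helper name, shows why a three-node cluster is configured with 2:

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum size for the given number of master-eligible nodes: n // 2 + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "nodes ->", minimum_master_nodes(n))
```

With 3 master-eligible nodes the result is 2, so the cluster can lose one node and still elect a master, while two isolated halves can never both reach a quorum.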

JVM configuration


# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
# Usually set to half of the machine's physical memory
-Xms15g
-Xmx15g
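
As a rough illustration of the "half of physical memory" rule, the sketch below derives the two flags from a RAM size. `recommended_heap_gb` is a hypothetical helper, and the 31 GB ceiling is an assumption taken from the common guidance to keep the heap below roughly 32 GB (so the JVM can keep using compressed object pointers); it does not come from these notes.

```python
def recommended_heap_gb(physical_ram_gb: int, cap_gb: int = 31) -> int:
    """Half of physical RAM, capped at cap_gb (an assumed ceiling, see above)."""
    if physical_ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM for this rule of thumb")
    return min(physical_ram_gb // 2, cap_gb)


heap = recommended_heap_gb(30)
print(f"-Xms{heap}g")  # -Xms15g
print(f"-Xmx{heap}g")  # -Xmx15g
```

Setting Xms and Xmx to the same value, as the configuration above does, avoids heap resizing at runtime.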

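Commands

http://localhost:9200/_cluster/settings?pretty        # cluster settings
http://localhost:9200/_cluster/health?pretty          # cluster health
http://localhost:9200/_cluster/state?pretty           # cluster state
http://localhost:9200/_cluster/stats?human&pretty     # cluster statistics
http://localhost:9200/_cluster/pending_tasks?pretty   # cluster tasks waiting to be executed
# Cluster reroute
# Typically used to manually move or allocate shards within the cluster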
POST /_cluster/reroute
{
    "commands" : [
        {
            "move" : {
                "index" : "test", "shard" : 0,
                "from_node" : "node1", "to_node" : "node2"
            }
        },
        {
          "allocate_replica" : {
                "index" : "test", "shard" : 1,
                "node" : "node3"
          }
        }
    ]
}
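
When the reroute call has to be scripted rather than typed into Dev Tools, the request can be assembled with the Python standard library. This is only a sketch: `build_reroute_request` is a hypothetical helper, the base URL is an assumption, and the request is built but deliberately not sent.

```python
import json
from urllib.request import Request


def build_reroute_request(base_url: str, commands: list[dict]) -> Request:
    """Build (but do not send) a POST /_cluster/reroute request."""
    body = json.dumps({"commands": commands}).encode("utf-8")
    return Request(
        url=f"{base_url}/_cluster/reroute",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


move = {
    "move": {
        "index": "test", "shard": 0,
        "from_node": "node1", "to_node": "node2",
    }
}
req = build_reroute_request("http://localhost:9200", [move])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```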

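Performance tuning

Indexing speed tuning

Startup

ES_PATH_CONF=/Users/sunxiangke/project/elk/es/cluster/n2/etc/ /usr/local/opt/elasticsearch/bin/elasticsearch

Shards

This API is useful when investigating why a shard has not been allocated, or has not been moved or rebalanced to another node.
- For an unassigned shard, it explains why the shard has not been allocated.
- For an assigned shard, it explains why the shard remains on its current node instead of being moved or rebalanced to another node.

Use the allocation explain API to get an explanation of a shard's allocation: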

GET /_cluster/allocation/explain

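7.0 scratch notes

elk7.0
    Discovery
        Discovery is the process by which the cluster formation module finds the other nodes with which to form a cluster
        seed hosts provider:
            Two are built in: the settings-based and the file-based seed hosts providers
        Settings-based seed hosts provider

        discovery.seed_hosts:
            - 192.168.1.10:9300
            - 192.168.1.11:9300
        File-based seed hosts provider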
        https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-discovery-hosts-providers.html#built-in-hosts-providers
discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes
docker run -d --name elasticsearch --net aa -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.2.0
//cat commands
//List all available cat endpoints
http://10.75.30.125:9200/_cat

=^.^=
/_cat/repositories
/_cat/plugins
/_cat/allocation
/_cat/count
/_cat/count/{index}
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/health
/_cat/nodeattrs
/_cat/recovery
/_cat/recovery/{index}
/_cat/indices
/_cat/indices/{index}
/_cat/snapshots/{repository}
/_cat/tasks
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/shards
/_cat/shards/{index}
/_cat/nodes
/_cat/master
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/segments
/_cat/segments/{index}
/_cat/templates


# List the cluster nodes (?v enables verbose output, which adds the column headers)
http://127.0.0.1:9200/_cat/nodes?v

ip           heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.75.30.40            23         100  28   12.91   14.71    17.65 mdi       *      node-10.75.30.40
10.75.32.232           70         100   2    0.89    0.83     0.58 mdi       -      node-10.75.32.232
10.75.30.125           51         100  32    2.25    2.46     2.25 mdi       -      mynode-10.75.30.125


# Check cluster health
http://10.75.30.125:9200/_cat/health?v

epoch      timestamp cluster       status node.total node.data shards  pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1561710195 08:23:15  elasticsearch yellow          3         3   2216 1176    0    3      118             0                  -                 94.8%
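
The `?v` output is plain whitespace-separated text, so it is easy to turn into dictionaries for scripting. A minimal sketch, using the health output above as sample input (`parse_cat_table` is a hypothetical helper):

```python
def parse_cat_table(text: str) -> list[dict[str, str]]:
    """Parse `_cat/...?v` output: first non-empty line is the header row."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1561710195 08:23:15 elasticsearch yellow 3 3 2216 1176 0 3 118 0 - 94.8%
"""
health = parse_cat_table(sample)[0]
print(health["status"], health["unassign"])  # yellow 118
```

Splitting on whitespace breaks if a cell is empty or contains spaces; for anything beyond quick checks, requesting `format=json` from the cat API is the more robust choice.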


docker run -d --name kibana --net aa -p 5601:5601 kibana:7.2.0
Indices, documents and the REST API
    mapping: defines the types of a document's fields
    settings: define how the data is distributed
    ![20190628156166153245402.png](http://pic.aipp.vip/20190628156166153245402.png)

FAQ

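1. Logstash reports a `number_format_exception` error

Checking the index mapping showed that the field is mapped as long, but the value being inserted is a string, so the insert fails.

There are two ways to fix this:
1. Convert the field value to a string at insert time (the field must then be mapped as a string type as well)
2. Use a mapping template to specify the field type

2. One machine in the ES cluster is running normally but has no shards allocated to it

cluster.routing.allocation.disk.watermark.low
Controls the low watermark for disk usage. It defaults to 85%, which means ES will not allocate new shards to a node whose disk usage is above 85%. It can also be set to an absolute byte value (such as 500mb), in which case ES stops allocating shards to a node once its free space drops below that value.
cluster.routing.allocation.disk.watermark.high
Controls the high watermark. It defaults to 90%, which means ES will try to relocate shards away from a node whose disk usage is above 90%. Like the low watermark, it can also be set to an absolute byte value, in which case shards are relocated away from a node once its free space drops below that value.

The cluster returned to normal after adjusting the watermark settings. Absolute values refer to free space, so the low watermark is the larger number:
cluster.routing.allocation.disk.watermark.low: 100gb
cluster.routing.allocation.disk.watermark.high: 50gb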
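
The difference between percentage and absolute watermarks can be checked with a small model. This is a simplified sketch of the rule described above, not Elasticsearch's allocator: `exceeds_watermark` is a hypothetical helper, it understands only `%`, `kb`, `mb`, `gb` and `tb` values, and the disk sizes in the example are made up.

```python
def exceeds_watermark(watermark: str, total_bytes: int, free_bytes: int) -> bool:
    """Return True when a node has crossed the given disk watermark.

    Percentages are compared against used space; absolute values are
    compared against free space.
    """
    units = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}
    value = watermark.strip().lower()
    if value.endswith("%"):
        used_percent = 100.0 * (total_bytes - free_bytes) / total_bytes
        return used_percent > float(value[:-1])
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return free_bytes < float(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported watermark value: {watermark!r}")


GB = 1024**3
total, free = 1024 * GB, 120 * GB          # hypothetical 1 TB disk with 120 GB free
print(exceeds_watermark("85%", total, free))    # True: about 88% used, no new shards
print(exceeds_watermark("100gb", total, free))  # False: more than 100 GB still free
```

On a large disk the default 85% can block allocation while plenty of space is still free, which is consistent with the symptom in this FAQ entry and with why switching to absolute values helped.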

References

Shards and replicas in Elasticsearch
Configuring two ElasticSearch instances on a single machine (ElasticSearch单机双实例的配置方法)

Bootstrap check errors

Elastic search max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Solution:
https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html

sysctl -w vm.max_map_count=262144

To keep the setting across reboots, also set vm.max_map_count in /etc/sysctl.conf.