
Field Notes: Optimizing ES Write Performance

运维开发网 (https://www.qedev.com) · 2020-10-22 12:20 · Source: 51CTO · Author: 我的二狗呢

Background:

Our company's microservices are gradually being onboarded onto the ES APM monitoring stack, but the metrics write volume is heavy (each metric document is tiny, but the write frequency is very high). When shipping data to ES through Logstash, we frequently hit "write queue full" errors and write rejections, so the ops team needed to tune ES for write throughput.

Optimizations

1. Tune the durability-related index settings in ES

We mainly adjusted the following four settings:

"index.translog.durability" : "async",  
"index.translog.flush_threshold_size" : "512mb",
"index.translog.sync_interval" : "120s",
"index.refresh_interval" : "120s"

Pay special attention to index.refresh_interval: if your use case needs near-real-time search, do not set it as aggressively as above. The default is 1s; if you can tolerate the delay, raise it. For a logging system, 60s is a reasonable choice and can substantially improve indexing throughput.

I won't repeat what each setting means here; see the official documentation.

Note that these are index-level settings, so they must be applied again every time a new index is created; otherwise new indices will not pick them up.

That is why we also need an update_settings.sh script to apply them periodically.

# Tune the index write settings, trading some durability for higher write throughput
curl -s -H 'Content-Type: application/json' --user elastic:'xxxxxx' \
  -XPUT 'http://1.2.3.4:9200/_all/_settings?preserve_existing=true' -d '
{
  "index.translog.durability" : "async",
  "index.translog.flush_threshold_size" : "512mb",
  "index.translog.sync_interval" : "120s",
  "index.refresh_interval" : "120s"
}
' | jq .
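The update_settings.sh script above can be run on a schedule; a minimal sketch, assuming a cron-capable host — the path, interval, and log file below are illustrative, not from the article:

```shell
# /etc/cron.d/es-update-settings -- hypothetical schedule: re-apply the
# write-tuning settings every 10 minutes so newly created indices pick
# them up. preserve_existing=true in the script means indices that
# already have these keys set are left untouched.
*/10 * * * * root /opt/scripts/update_settings.sh >> /var/log/es_update_settings.log 2>&1
```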

2. Tune the Logstash runtime settings

We mainly adjusted the following three settings:

pipeline.workers: 8
pipeline.batch.size: 4000
pipeline.batch.delay: 50
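As a rough sanity check on memory and queue pressure, these batch settings imply the following in-flight event counts. This is back-of-the-envelope arithmetic only; the replica count of 6 comes from the Deployment later in the article:

```shell
# Each pipeline worker processes one batch at a time, so the number of
# events in flight per Logstash instance is workers * batch size.
WORKERS=8
BATCH_SIZE=4000
REPLICAS=6   # from the Deployment manifest

PER_INSTANCE=$((WORKERS * BATCH_SIZE))
CLUSTER_WIDE=$((PER_INSTANCE * REPLICAS))
echo "in-flight events per instance: $PER_INSTANCE"   # 32000
echo "in-flight events cluster-wide: $CLUSTER_WIDE"   # 192000
```

Larger batches mean bigger bulk requests to ES (fewer round trips, but more heap used per worker), which is the trade-off these three settings control.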

The Logstash configuration lives in a ConfigMap (cat logstash-configmap.yaml):

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-configmap
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline
    http.port: 9600-9700
    log.level: info
    path.logs: /var/log/logstash
    pipeline.workers: 8
    pipeline.batch.size: 4000
    pipeline.batch.delay: 50
  logstash.conf: |
    input {
      kafka {
        group_id => "logstash-kafka-apm-new"
        bootstrap_servers => "10.10.1.14:9092,10.10.1.13:9092,10.10.1.12:9092"
        topics => ["elastic-apm"]
        auto_offset_reset => "latest"
        max_partition_fetch_bytes => "10485760"
        codec => "json"
      }
    }
    output {
      elasticsearch {
        hosts => ["1.2.3.4:9200"]
        manage_template => false
        index => "apm-7.4.0-%{[processor][event]}-%{+YYYY.MM.dd}"
        user => elastic
        password => "xxxxxxxxxxxx"
      }
    }

The Deployment is as follows (cat logstash-7.4-apm-deployment.yaml):

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: logstash-apm-prod
  name: logstash-apm-prod
  namespace: logging
spec:
  replicas: 6
  selector:
    matchLabels:
      app: logstash-apm-prod
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: logstash-apm-prod
    spec:
      containers:
      - command:
        - /usr/share/logstash/bin/logstash
        image: logstash:7.4.0
        imagePullPolicy: IfNotPresent
        name: logstash
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
         - name: config-volume
           mountPath: /usr/share/logstash/config
         - name: logstash-pipeline-volume
           mountPath: /usr/share/logstash/pipeline
      hostAliases:
      - ip: "10.10.1.12"
        hostnames:
        - "kafka-01"
      - ip: "10.10.1.13"
        hostnames:
        - "kafka-02"
      - ip: "10.10.1.14"
        hostnames:
        - "kafka-03"
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.yml
              path: logstash.yml
      - name: logstash-pipeline-volume
        configMap:
          name: logstash-configmap
          items:
           - key: logstash.conf
             path: logstash.conf

Performance

Hardware:

5 × ES nodes, 8 CPU cores / 32 GB RAM each, ordinary SSD disks

After the tuning, ES write throughput improved substantially.

Day-to-day consumption: ES sustains roughly 1.1 million docs per minute.

Stress test: with 12 Logstash instances consuming, peak ES write throughput reached roughly 2.2 million docs per minute, at which point Logstash started reporting bulk write errors with "ES write queue full".
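Converting those per-minute figures to per-second rates makes them easier to compare against other benchmarks (simple integer arithmetic on the numbers above):

```shell
# Per-second write rates implied by the per-minute figures (integer division).
STEADY_PER_MIN=1100000   # ~1.1M docs/min, day-to-day consumption
PEAK_PER_MIN=2200000     # ~2.2M docs/min at peak, with 12 Logstash instances

echo "steady: $((STEADY_PER_MIN / 60)) docs/s"   # 18333
echo "peak:   $((PEAK_PER_MIN / 60)) docs/s"     # 36666
```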
