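Preface
Our company currently uses an ELK stack for log collection, display, and analysis, so we wanted keyword-based alerts on certain logs: MQTT, Nginx 40x and 50x responses, and severe ERROR logs from backend services. The approach is to monitor the log data in ES and then use Python to call the DingTalk API to deliver the alerts.
- Since version 5.0, Elastic has bundled several important plugins into X-Pack, which requires a paid license
- So we use the open-source ElastAlert instead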
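The approach described above (watch the log data in ES for a keyword within a recent time window) can be sketched in plain Python. This is only an illustration under assumptions: the function name `build_error_query`, the `message` field, and the one-minute window are hypothetical choices, and ElastAlert builds its own queries internally.

```python
from datetime import datetime, timedelta, timezone


def build_error_query(keyword, now, window=timedelta(minutes=1), ts_field="@timestamp"):
    """Build an Elasticsearch search body that matches `keyword` in `message`
    for documents whose timestamp falls in (now - window, now]."""
    start = now - window
    return {
        "query": {
            "bool": {
                "filter": [
                    # Same kind of clause the rules in this post use in `filter:`.
                    {"query_string": {"query": f"message: {keyword}"}},
                    {"range": {ts_field: {"gt": start.isoformat(), "lte": now.isoformat()}}},
                ]
            }
        },
        "sort": [{ts_field: {"order": "asc"}}],
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
body = build_error_query("ERROR", now)
print(body["query"]["bool"]["filter"][1])
```

The resulting dictionary could be passed as the body of a search request against an index pattern such as `prd-service-*`; sending the hits on to DingTalk is a separate step.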
ElastAlert2
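1. ElastAlert is an alerting framework written in Python and open-sourced by Yelp. It fills the same role as Watcher, which Elastic.co released later. ElastAlert2 is the continuation of the original Yelp/elastalert project.
2. It supports multiple rule types and alerting channels
- Email: send alerts by email
- Slack: send alerts to a Slack channel
- Microsoft Teams: send alerts to a Microsoft Teams channel
- JIRA: create an issue in JIRA
- PagerDuty: trigger incidents through PagerDuty
- Amazon SNS: send alerts to an Amazon Simple Notification Service (SNS) topic
- Webhook: send alerts through a custom HTTP POST request
- Telegram: send alerts to a Telegram group or user
- Google Chat: send alerts to a Google Chat room
- Twilio: send alerts by phone call or SMS
- Alerta: send alerts to the Alerta API
- Datadog: send alerts to the Datadog event stream
- Gitter: send alerts to a Gitter chat room
- VictorOps: trigger alerts through VictorOps
3. DingTalk is not in the list above, so this walkthrough uses a third-party DingTalk plugin written in Python, available on GitHub: https://github.com/xuyaoqiang/elastalert-dingtalk-plugin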
Final result
Backend service
Frontend web
- Bytespider is the crawler for Toutiao, a ByteDance product...
Deployment
1. First, prepare a DingTalk robot
- Copy its webhook URL
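Before wiring the webhook into ElastAlert2, it can help to confirm that the robot accepts messages. The sketch below uses only the standard library and the plain-text message shape (`msgtype: text`) that the rules later in this post also use. The helper names `build_dingtalk_request` and `send` are hypothetical, the token is a placeholder, and the robot's security settings (for example a required keyword) are assumed to allow the test message.

```python
import json
import urllib.request


def build_dingtalk_request(webhook, content):
    """Return a urllib Request that posts a plain-text message to a DingTalk robot."""
    payload = {"msgtype": "text", "text": {"content": content}}
    return urllib.request.Request(
        webhook,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(webhook, content, timeout=5):
    """Send the message and return the decoded JSON reply from DingTalk."""
    request = build_dingtalk_request(webhook, content)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder URL: substitute the webhook copied from the robot settings page.
req = build_dingtalk_request(
    "https://oapi.dingtalk.com/robot/send?access_token=***",
    "test alert from the ElastAlert2 setup",
)
print(req.get_method(), req.full_url)
```

Calling `send(...)` with the real webhook performs the actual request; the reply's `errcode` and `errmsg` fields indicate whether the robot accepted the message.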
2. Clone the elastalert2 project from GitHub
git clone https://github.com/jertel/elastalert2.git
cd elastalert2
3. Clone the DingTalk alert plugin project
git clone https://github.com/xuyaoqiang/elastalert-dingtalk-plugin.git
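4. Create a config.yaml file in the root directory of the ElastAlert2 project
es_host: 10.10.10.179
es_port: 9200
rules_folder: rules  # Path to the ElastAlert2 rules folder
run_every:  # How often ElastAlert2 queries Elasticsearch
  minutes: 1
buffer_time:  # Size of the query window looking back from each run, so late-arriving logs are still picked up
  minutes: 15
writeback_index: elastalert_status  # Index where ElastAlert2 stores its state and alert records
alert_time_limit:  # How long failed alerts keep being retried
  days: 2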
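To make the relationship between `run_every` and `buffer_time` concrete, here is a simplified sketch of the time windows queried on successive runs. It assumes the basic behaviour in which each run looks back `buffer_time` from the current time; the function name `query_windows` is hypothetical, and the real scheduler also tracks previous query end times and deduplicates hits.

```python
from datetime import datetime, timedelta

# Values mirrored from the config.yaml in this step.
config = {"run_every": {"minutes": 1}, "buffer_time": {"minutes": 15}}


def query_windows(first_run, runs, config):
    """Return the (start, end) window queried on each of the first `runs` runs."""
    run_every = timedelta(**config["run_every"])
    buffer_time = timedelta(**config["buffer_time"])
    windows = []
    for i in range(runs):
        end = first_run + i * run_every
        windows.append((end - buffer_time, end))
    return windows


windows = query_windows(datetime(2024, 1, 1, 12, 0), 3, config)
for start, end in windows:
    print(start.time(), "->", end.time())
```

Consecutive windows overlap by 14 minutes with these values, which is what lets a log line that reaches Elasticsearch a few minutes late still be seen by a later run.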
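5. Copy the DingTalk alert plugin into the elastalert directory of the ElastAlert2 project (run this from the project root, where the plugin was cloned in step 3)
cp elastalert-dingtalk-plugin/elastalert_modules/dingtalk_alert.py elastalert/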
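Placing the file inside the `elastalert` package makes it importable as `elastalert.dingtalk_alert`, which is the dotted path the rule files reference. The sketch below shows the general shape of such an alerter class. It is not the plugin's actual code: `Alerter` here is a stand-in for `elastalert.alerts.Alerter` so the example runs on its own, and `SketchDingTalkAlerter` and the injected `post` callable are hypothetical.

```python
import json


class Alerter:
    """Stand-in for elastalert.alerts.Alerter so this sketch is self-contained."""
    required_options = frozenset()

    def __init__(self, rule):
        self.rule = rule


class SketchDingTalkAlerter(Alerter):
    """Hypothetical, simplified alerter; the real plugin differs in detail."""
    required_options = frozenset(["dingtalk_webhook", "dingtalk_msgtype"])

    def __init__(self, rule, post=None):
        super().__init__(rule)
        self.webhook = rule["dingtalk_webhook"]
        self.msgtype = rule.get("dingtalk_msgtype", "text")
        # `post` is injected so the sketch can run without network access.
        self.post = post or (lambda url, body: None)

    def alert(self, matches):
        # Receives the list of matching documents and sends one message.
        content = "\n".join(str(m.get("message", "")) for m in matches)
        body = json.dumps({"msgtype": self.msgtype, "text": {"content": content}})
        self.post(self.webhook, body)

    def get_info(self):
        # Metadata describing this alerter.
        return {"type": "dingtalk", "dingtalk_webhook": self.webhook}


sent = []
alerter = SketchDingTalkAlerter(
    {"dingtalk_webhook": "https://oapi.dingtalk.com/robot/send?access_token=***",
     "dingtalk_msgtype": "text"},
    post=lambda url, body: sent.append((url, json.loads(body))),
)
alerter.alert([{"message": "ERROR something failed"}])
print(sent[0][1])
```

The two rule options, `dingtalk_webhook` and `dingtalk_msgtype`, are the same keys that appear at the top level of the rule files later in this post.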
6. Create a rules directory in the ElastAlert2 project root
mkdir rules
7. Create a rule file named example_rule.yaml with the following content. Be sure to replace <your_dingtalk_webhook> with your DingTalk webhook URL
name: Example rule
type: frequency  # num_events and timeframe below apply to frequency rules
index: logstash-*
num_events: 50
timeframe:
  hours: 4
filter:
- term:
    some_field: "some_value"
alert:
- "elastalert.dingtalk_alert.DingTalkAlerter"
dingtalk_webhook: '<your_dingtalk_webhook>'
dingtalk_msgtype: 'text'
8. Modify the Dockerfile
FROM python:3-slim-buster as builder
LABEL description="ElastAlert 2 Official Image"
LABEL maintainer="Jason Ertel"
COPY . /tmp/elastalert
RUN mkdir -p /opt/elastalert && \
cd /tmp/elastalert && \
pip install setuptools wheel && \
python setup.py sdist bdist_wheel
FROM python:3-slim-buster
ARG GID=1000
ARG UID=1000
ARG USERNAME=elastalert
COPY --from=builder /tmp/elastalert/dist/*.tar.gz /tmp/
# Added for this setup: ship the DingTalk plugin and install requests from the Aliyun PyPI mirror
COPY elastalert/dingtalk_alert.py /opt/elastalert/elastalert/
RUN pip install requests -i https://mirrors.aliyun.com/pypi/simple/
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
RUN apt update && apt -y upgrade && \
apt -y install jq curl gcc libffi-dev && \
rm -rf /var/lib/apt/lists/* && \
pip install /tmp/*.tar.gz && \
rm -rf /tmp/* && \
apt -y remove gcc libffi-dev && \
apt -y autoremove && \
mkdir -p /opt/elastalert && \
echo "#!/bin/sh" >> /opt/elastalert/run.sh && \
echo "set -e" >> /opt/elastalert/run.sh && \
echo "elastalert-create-index --config /opt/elastalert/config.yaml" \
>> /opt/elastalert/run.sh && \
echo "elastalert --config /opt/elastalert/config.yaml \"\$@\"" \
>> /opt/elastalert/run.sh && \
chmod +x /opt/elastalert/run.sh && \
groupadd -g ${GID} ${USERNAME} && \
useradd -u ${UID} -g ${GID} -M -b /opt -s /sbin/nologin \
-c "ElastAlert 2 User" ${USERNAME}
USER ${USERNAME}
ENV TZ "UTC"
WORKDIR /opt/elastalert
ENTRYPOINT ["/opt/elastalert/run.sh"]
9. Optionally, switch apt to the Aliyun mirror (full Dockerfile with the mirror configured)
FROM python:3-slim-buster as builder
LABEL description="ElastAlert 2 Official Image"
LABEL maintainer="Jason Ertel"
COPY . /tmp/elastalert
RUN mkdir -p /opt/elastalert && \
cd /tmp/elastalert && \
pip install setuptools wheel && \
python setup.py sdist bdist_wheel
FROM python:3-slim-buster
ARG GID=1000
ARG UID=1000
ARG USERNAME=elastalert
COPY --from=builder /tmp/elastalert/dist/*.tar.gz /tmp/
# Added for this setup: ship the DingTalk plugin and install requests from the Aliyun PyPI mirror
COPY elastalert/dingtalk_alert.py /opt/elastalert/elastalert/
RUN pip install requests -i https://mirrors.aliyun.com/pypi/simple/
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
RUN echo 'deb http://mirrors.aliyun.com/debian/ buster main non-free contrib\n\
deb-src http://mirrors.aliyun.com/debian/ buster main non-free contrib\n\
deb http://mirrors.aliyun.com/debian-security buster/updates main\n\
deb-src http://mirrors.aliyun.com/debian-security buster/updates main\n\
deb http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib\n\
deb-src http://mirrors.aliyun.com/debian/ buster-updates main non-free contrib\n\
deb http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib\n\
deb-src http://mirrors.aliyun.com/debian/ buster-backports main non-free contrib' > /etc/apt/sources.list && \
apt update && apt -y upgrade && \
apt -y install jq curl gcc libffi-dev && \
rm -rf /var/lib/apt/lists/* && \
pip install /tmp/*.tar.gz && \
rm -rf /tmp/* && \
apt -y remove gcc libffi-dev && \
apt -y autoremove && \
mkdir -p /opt/elastalert && \
echo "#!/bin/sh" >> /opt/elastalert/run.sh && \
echo "set -e" >> /opt/elastalert/run.sh && \
echo "elastalert-create-index --config /opt/elastalert/config.yaml" \
>> /opt/elastalert/run.sh && \
echo "elastalert --config /opt/elastalert/config.yaml \"\$@\"" \
>> /opt/elastalert/run.sh && \
chmod +x /opt/elastalert/run.sh && \
groupadd -g ${GID} ${USERNAME} && \
useradd -u ${UID} -g ${GID} -M -b /opt -s /sbin/nologin \
-c "ElastAlert 2 User" ${USERNAME}
USER ${USERNAME}
ENV TZ "UTC"
WORKDIR /opt/elastalert
ENTRYPOINT ["/opt/elastalert/run.sh"]
10. Build the ElastAlert2 image with Docker
docker build -t elastalert2 .
11. Run the ElastAlert2 container with Docker
docker run -d --name elastalert2 -v $(pwd)/config.yaml:/opt/elastalert/config.yaml -v $(pwd)/rules:/opt/elastalert/rules elastalert2
Customizing the rules
1. Backend service rules
name: "prd-service-***"
type: "frequency"
index: "prd-service-*"
is_enabled: true
num_events: 1
timeframe:
  minutes: 1
realert:
  minutes: 1
timestamp_field: "@timestamp"
timestamp_type: "iso"
use_strftime_index: false
alert_text_type: alert_text_only
alert_text: |
  🚨「prd-service alert」
  ────────────────
  Time: {0}
  Severity: High
  Service: {1}
  Index: {2}
  Log path: {3}
  Hit count: {4}
  Log message: {5}
  For details, log in to Kibana: http://10.10.10.171:5000
  Questions? [Contact the ops team]
alert_text_args:
- "@timestamp"
- "fields.source"
- "_index"
- "log.file.path"
- "num_hits"
- "message"
filter:
- query:
    query_string:
      query: "message: ERROR"
alert:
- "elastalert.dingtalk_alert.DingTalkAlerter"
dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=***"
dingtalk_msgtype: "text"
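The numbered placeholders in `alert_text` are filled, in order, from the fields named in `alert_text_args`, where a dotted name such as `fields.source` refers to a nested field. A minimal sketch of that substitution follows. The helpers `lookup` and `render`, the sample document, and the `<MISSING VALUE>` placeholder for absent fields are assumptions made for illustration, not ElastAlert2's own code.

```python
def lookup(doc, key):
    """Resolve a key that is either a flat name or a dotted path into nested dicts."""
    if key in doc:
        return doc[key]
    node = doc
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def render(alert_text, alert_text_args, match, missing="<MISSING VALUE>"):
    """Fill {0}, {1}, ... in alert_text with the values of alert_text_args."""
    values = [lookup(match, key) for key in alert_text_args]
    values = [missing if value is None else value for value in values]
    return alert_text.format(*values)


# Made-up sample document shaped like the fields used in the rule.
match = {
    "@timestamp": "2024-01-01T12:00:00Z",
    "fields": {"source": "order-service"},
    "_index": "prd-service-order-2024.01.01",
    "log": {"file": {"path": "/var/log/app/error.log"}},
    "num_hits": 3,
    "message": "ERROR timeout",
}
text = render(
    "Time: {0}\nService: {1}\nHits: {2}\nHost: {3}",
    ["@timestamp", "fields.source", "num_hits", "host.name"],
    match,
)
print(text)
```

Because the order of `alert_text_args` is what ties a field to a placeholder, adding or removing an argument means renumbering the template, as the second rule in this section does when it drops `num_hits`.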
2. Filtering out unwanted logs
name: "prd-service-gedai"
type: "frequency"
index: "prd-service-gedai*"
is_enabled: true
num_events: 1
timeframe:
  minutes: 1
realert:
  minutes: 1
timestamp_field: "@timestamp"
timestamp_type: "iso"
use_strftime_index: false
alert_text_type: alert_text_only
alert_text: |
  🚨「Personal loan (gedai) alert」
  ────────────────
  Time: {0}
  Severity: High
  Service: {1}
  Index: {2}
  Log path: {3}
  Log message: {4}
  For details, log in to Kibana: http://10.10.10.171:5000
  Questions? [Contact the ops team]
alert_text_args:
- "@timestamp"
- "fields.source"
- "_index"
- "log.file.path"
- "message"
filter:
- query:
    query_string:
      query: "(message: ERROR* OR message: *Exception*) AND NOT message: *MethodArgumentNotValidException* AND NOT message: *serviceExceptionHandler* AND NOT message: *DEBUG*"
alert:
- "elastalert.dingtalk_alert.DingTalkAlerter"
dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=***"
dingtalk_msgtype: "text"
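The include/exclude logic of the `query_string` filter above can be checked against sample log lines with a rough local approximation. The sketch assumes `message` is an analyzed text field whose tokens are lowercased words; actual matching depends on the index mapping and analyzer, so treat this as a way to reason about the boolean structure rather than a faithful emulation. The function names and sample messages are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def tokens(message):
    """Split into lowercase word tokens, roughly like a standard text analyzer."""
    return [t.lower() for t in re.findall(r"\w+", message)]


def any_token(message, pattern):
    """True if any token matches the wildcard pattern, ignoring case."""
    return any(fnmatchcase(t, pattern.lower()) for t in tokens(message))


def should_alert(message):
    # (message: ERROR* OR message: *Exception*)
    wanted = any_token(message, "ERROR*") or any_token(message, "*Exception*")
    # AND NOT ... for each excluded pattern
    excluded = any(
        any_token(message, p)
        for p in ("*MethodArgumentNotValidException*", "*serviceExceptionHandler*", "*DEBUG*")
    )
    return wanted and not excluded


samples = [
    "2024-01-01 ERROR c.x.OrderService - NullPointerException at line 42",
    "WARN MethodArgumentNotValidException: field invalid",
    "INFO request ok",
]
for line in samples:
    print(should_alert(line), line)
```

Running candidate exclusions through a check like this before editing the rule makes it easier to see which noisy messages a new `AND NOT` clause would suppress.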
3. Frontend web rules
name: "prd-web-***"  # Rule name, used for identification and logging
type: "frequency"  # Rule type; "frequency" alerts when the number of matching events within the timeframe reaches a threshold
index: "prd-web-*"  # Elasticsearch index pattern to read from
is_enabled: true  # Whether the rule is enabled
num_events: 1  # Number of matching events within the timeframe needed to trigger an alert
timeframe:  # Time range over which events are counted
  minutes: 1
realert:  # After an alert fires, how long to suppress further alerts from this rule
  minutes: 1
timestamp_field: "@timestamp"  # Name of the event timestamp field
timestamp_type: "iso"  # Format of the event timestamp field
use_strftime_index: false  # Whether to build the index name dynamically with strftime formatting
alert_text_type: alert_text_only  # Format of the alert text
# Alert text template; the placeholders are replaced with actual values
alert_text: |
  🚨「prd-web alert」
  ────────────────
  Time: {0}
  Severity: High
  Service: {1}
  Index: {2}
  Log path: {3}
  Hit count: {4}
  Client IP: {5}
  Status: {6}
  Request URL: {7}
  Log message: {8}
  For details, log in to Kibana: http://10.10.10.171:5000
  Questions? [Contact the ops team]
alert_text_args:  # Fields that fill the placeholders in the template (look up the field names in Kibana)
- "@timestamp"
- "fields.source"
- "_index"
- "log.file.path"
- "num_hits"
- "nginx.real_ip"
- "nginx.status"
- "nginx.url"
- "message"
filter:  # Query that selects the matching events
- query:
    query_string:
      query: "nginx.status: [400 TO 499] OR nginx.status: [500 TO 599]"
alert:  # Alerter to use; here, the DingTalk robot plugin
- "elastalert.dingtalk_alert.DingTalkAlerter"
dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=***"  # Webhook URL of the DingTalk robot
dingtalk_msgtype: "text"  # Message type sent by the DingTalk robot
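The interplay of `num_events`, `timeframe`, and `realert` described in the comments above can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not ElastAlert2's implementation: the class `FrequencySketch` is hypothetical, events are fed in timestamp order, and the event window is reset each time the threshold is reached.

```python
from collections import deque
from datetime import datetime, timedelta


class FrequencySketch:
    def __init__(self, num_events, timeframe, realert):
        self.num_events = num_events
        self.timeframe = timeframe
        self.realert = realert
        self.window = deque()
        self.silenced_until = None

    def add_event(self, ts):
        """Return True when this event should raise an alert."""
        self.window.append(ts)
        # Drop events that fall outside the timeframe.
        while self.window and ts - self.window[0] > self.timeframe:
            self.window.popleft()
        if len(self.window) < self.num_events:
            return False
        # Threshold reached: reset the window, then apply realert suppression.
        self.window.clear()
        if self.silenced_until is not None and ts < self.silenced_until:
            return False
        self.silenced_until = ts + self.realert
        return True


t0 = datetime(2024, 1, 1, 12, 0, 0)

# Settings from the rule above: num_events 1, timeframe 1 minute, realert 1 minute.
rule = FrequencySketch(1, timedelta(minutes=1), timedelta(minutes=1))
results = [rule.add_event(t0 + timedelta(seconds=s)) for s in (0, 20, 40, 60, 90)]
print(results)

# A stricter variant: three events within one minute, no realert suppression.
strict = FrequencySketch(3, timedelta(minutes=1), timedelta(0))
strict_results = [strict.add_event(t0 + timedelta(seconds=s)) for s in (0, 10, 20)]
print(strict_results)
```

With `num_events: 1` every matching log line reaches the threshold, so `realert` is the only thing limiting how often DingTalk is notified; raising `num_events` or lengthening `realert` are the two knobs for reducing noise.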