Blackbox: black-box probing
This article was last updated 779 days ago; some of the information in it may have since evolved or changed.

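Introduction

  • Black-box probing: focuses on symptoms, that is, on what is happening right now, such as an alert firing or a business API misbehaving. It monitors the system from the user's point of view, and its purpose is to alert on failures that are in progress.
  • White-box probing: focuses on causes, that is, on metrics exposed from inside the system. For example, the Redis info output may report "redis slave down", which is an internal metric. Black-box monitoring might show only that Redis is down, while the internal information shows why, for instance "redis port is refused connection".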

Blackbox Exporter

  • Blackbox Exporter is the official black-box monitoring solution provided by the Prometheus community. It lets users probe endpoints over HTTP, HTTPS, DNS, TCP, and ICMP.

1. HTTP probes

  • Define the request headers to send
  • Check the HTTP status, HTTP response headers, and HTTP body content
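
As a rough sketch of how these checks map onto a Blackbox Exporter module (module files are covered under Installation), the fragment below sends a request header and fails the probe on an unexpected status code, a non-matching response header, or a body that lacks an expected string. The module name `http_checks`, the header values, and the regular expressions are illustrative assumptions and are not part of the original setup.

```yaml
modules:
  http_checks:                        # hypothetical module name
    prober: http
    timeout: 10s
    http:
      method: GET
      headers:                        # request headers sent with the probe
        Accept: application/json      # assumed value, for illustration only
      valid_status_codes: [200]       # any other status code fails the probe
      fail_if_not_ssl: true           # fail if the connection does not use SSL/TLS
      fail_if_header_not_matches:     # response header check
        - header: Content-Type
          regexp: "application/json"
          allow_missing: false        # a missing header also fails the probe
      fail_if_body_not_matches_regexp:    # response body check
        - '"status"\s*:\s*"ok"'       # assumed response shape
```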

2. TCP probes

  • Monitor the port status of business components
  • Define and check application-layer protocol exchanges
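
An application-layer check over TCP can be sketched with the `query_response` option of the TCP prober, which runs a scripted exchange after the connection opens. The fragment below waits for an SSH banner and answers with a client banner; the module name `ssh_banner` and the choice of SSH as the protocol are assumptions made for illustration.

```yaml
modules:
  ssh_banner:                  # hypothetical module name
    prober: tcp
    timeout: 10s
    tcp:
      query_response:          # steps run in order once the TCP connection is open
        - expect: "^SSH-2.0-"                   # wait for the server's SSH banner
        - send: "SSH-2.0-blackbox-ssh-check"    # reply with a client banner
```

If the expected banner never arrives within the timeout, the probe fails, so this catches a port that accepts connections but is not actually speaking the protocol.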

3. ICMP probes

  • Host liveness checks

4. POST probes

  • API connectivity

5. SSL certificate expiry time

Installation

1. Deployment YAML (Docker Compose)

version: '3'
services:
  blackbox:
    image: prom/blackbox-exporter:latest
    container_name: blackbox
    restart: always
    user: root
    ports:
      - 9115:9115
    volumes:
      - /mydata/blackbox/conf:/config
      - /etc/localtime:/etc/localtime
    environment:
      - TZ=Asia/Shanghai
    command:
      - '--config.file=/config/blackbox.yml'
      - '--log.level=debug' # enable debug logging for troubleshooting
networks:
  default:
    external:
      name: prom_net

2. Configuration file

modules:
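  http_2xx:   # the module name is arbitrary, but it must match the module referenced in prometheus.yml
    prober: http  # probe protocol: http, tcp, dns, or icmp; every probe is configured as a module
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"] # HTTP versions this probe accepts
      valid_status_codes: [200,302]  # list the accepted status codes explicitly; this keeps Grafana panels unambiguous
      method: GET # HTTP method the probe uses
      preferred_ip_protocol: "ip4" # IP protocol for the HTTP probe (ip4, ip6)
      ip_protocol_fallback: false # default = true
      follow_redirects: false  # do not follow redirects, so a 302 is reported as-is (older releases spelled this no_follow_redirects: true)
      # Whether an HTTP service must use SSL can be enforced with fail_if_ssl and fail_if_not_ssl.
      # With fail_if_ssl: true the probe fails if the site uses SSL and succeeds otherwise; fail_if_not_ssl is the exact opposite.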
  http_post_2xx: # HTTP POST probe module
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]
      method: POST
      preferred_ip_protocol: "ip4"
  tcp_connect: # TCP probe module
    prober: tcp
    timeout: 10s
  ping: # ICMP probe module
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
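
The POST module above sends an empty request. For an API that expects a payload, a module along the following lines may be closer to what is needed. This is only a sketch: the module name `http_post_json`, the JSON body, and the `Content-Type` value are assumptions, and the real payload depends on the API being probed.

```yaml
modules:
  http_post_json:                     # hypothetical module name
    prober: http
    timeout: 10s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{"ping": "health"}'      # assumed payload, replace with what the API expects
      valid_status_codes: [200]
      preferred_ip_protocol: "ip4"
```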

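3. HTTP probing checks whether an application is healthy by sending GET or POST requests. The matching scrape job in prometheus.yml: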

- job_name: "prd站点信息"
  metrics_path: /probe
  scrape_interval: 30s
  params:
    module: [http_2xx]
  static_configs:
    - targets:
      - www.***.com
      labels:
        group: demo1
    - targets:
      - www.***.com
      labels:
        group: demo2
    - targets:
      - www.***.com
      labels:
        group: demo3
    - targets:
      - www.***.com
      labels:
        group: demo4
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 10.10.10.179:9115
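
The `tcp_connect` and `ping` modules from the configuration file can be wired up in the same way. The jobs below are a sketch that reuses the relabeling pattern and the exporter address from the HTTP job; the job names and the targets are hypothetical placeholders.

```yaml
- job_name: "blackbox-tcp"            # hypothetical job name
  metrics_path: /probe
  scrape_interval: 30s
  params:
    module: [tcp_connect]             # module defined in blackbox.yml
  static_configs:
    - targets:
      - db.example.internal:3306      # hypothetical host:port to probe
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target    # pass the target as ?target=...
    - source_labels: [__param_target]
      target_label: instance          # keep the probed target as the instance label
    - target_label: __address__
      replacement: 10.10.10.179:9115  # scrape the exporter, not the target

- job_name: "blackbox-icmp"           # hypothetical job name
  metrics_path: /probe
  scrape_interval: 30s
  params:
    module: [ping]                    # module defined in blackbox.yml
  static_configs:
    - targets:
      - 10.10.10.1                    # hypothetical host, no port for ICMP
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 10.10.10.179:9115
```

With this relabeling, Prometheus ends up requesting a URL of the form `http://10.10.10.179:9115/probe?module=tcp_connect&target=db.example.internal:3306`, which can also be opened by hand to check a single probe.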

4. After reloading the configuration, the monitoring looks like this:

5. Probe endpoint data

6. Alerting rules

groups:
- name: blackbox-web端点
  rules:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe failed (instance {{ $labels.instance }})
      description: "Probe failed\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: BlackboxConfigurationReloadFailure
    expr: blackbox_exporter_config_last_reload_successful != 1
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Blackbox configuration reload failure (instance {{ $labels.instance }})
      description: "Blackbox configuration reload failure\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: BlackboxSlowProbe
    expr: avg_over_time(probe_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox slow probe (instance {{ $labels.instance }})
      description: "Blackbox probe took more than 1s to complete\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: BlackboxProbeHttpFailure
    expr: probe_http_status_code <= 199 OR probe_http_status_code >= 400
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox probe HTTP failure (instance {{ $labels.instance }})
      description: "HTTP status code is not 200-399\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: BlackboxSslCertificateWillExpireSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Blackbox SSL certificate will expire soon (instance {{ $labels.instance }})
      description: "SSL certificate expires in 30 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: BlackboxSslCertificateWillExpireSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox SSL certificate will expire soon (instance {{ $labels.instance }})
      description: "SSL certificate expires in 3 days\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: BlackboxSslCertificateExpired
    expr: probe_ssl_earliest_cert_expiry - time() <= 0
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Blackbox SSL certificate expired (instance {{ $labels.instance }})
      description: "SSL certificate has expired already\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: BlackboxProbeSlowHttp
    expr: avg_over_time(probe_http_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox probe slow HTTP (instance {{ $labels.instance }})
      description: "HTTP request took more than 1s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

  - alert: BlackboxProbeSlowPing
    expr: avg_over_time(probe_icmp_duration_seconds[1m]) > 1
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: Blackbox probe slow ping (instance {{ $labels.instance }})
      description: "Blackbox ping took more than 1s\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
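
For the rules to take effect, Prometheus has to load the file that contains them. A minimal sketch of the relevant prometheus.yml fragment follows; the path is a hypothetical placeholder and should point to wherever the rules file is actually mounted.

```yaml
# prometheus.yml (fragment)
rule_files:
  - /etc/prometheus/rules/blackbox.yml   # hypothetical path to the rules file shown above
```

Running `promtool check rules` against the rules file before reloading is a cheap way to catch syntax errors.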

Grafana dashboard

1. Import dashboard template 9965.
