k8s and Monitoring: Prometheus Remote Storage


Prometheus Remote Storage

Preface

Prometheus's strength in the container-cloud space is beyond doubt: more and more cloud-native components expose a Prometheus metrics endpoint directly, with no extra exporter required, so adopting Prometheus as the cluster-wide monitoring solution is a sound choice. For storing those metrics, however, Prometheus ships with local storage only, i.e. its built-in TSDB time-series database. The advantage of local storage is operational simplicity: a single command starts Prometheus, and the following two flags set the data path and the retention period.

  • --storage.tsdb.path: TSDB database path, default data/
  • --storage.tsdb.retention: data retention period, default 15 days
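As a minimal illustration of those two flags (the data path and 30-day retention below are example values, not recommendations), a local-storage Prometheus can be started with:

```shell
# Start Prometheus with an explicit data directory and a 30-day retention window.
prometheus \
  --config.file=prometheus.yml \
  --storage.tsdb.path=/data/prometheus \
  --storage.tsdb.retention=30d
```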

The drawback is that large volumes of metrics cannot be persisted long term, although data compression has improved considerably since Prometheus 2.0.
To get around the limits of single-node storage, Prometheus does not implement clustered storage itself; instead it exposes remote read/write interfaces, letting users choose a suitable time-series database to scale Prometheus out.
Prometheus integrates with external remote storage systems in the following two ways:

  • Prometheus writes metrics to remote storage in a standard format
  • Prometheus reads metrics back from a remote URL in a standard format
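Assuming a generic remote-storage adapter listening on port 9201 (the service name below is hypothetical), wiring both directions into prometheus.yml looks like:

```yaml
# prometheus.yml (fragment) -- hypothetical adapter endpoints
remote_write:
  - url: "http://remote-storage-adapter:9201/write"

remote_read:
  - url: "http://remote-storage-adapter:9201/read"
```

Prometheus will then mirror every scraped sample to the write endpoint and, in addition to its local TSDB, fan queries out to the read endpoint.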


The remote storage options are analyzed in detail below.

Remote storage options

Configuration

Remote write

# The URL of the endpoint to send samples to.
url: <string>

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote write request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote write request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we start dropping them.
  [ capacity: <int> | default = 100000 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 1000 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 100 ]
  # Maximum time a sample will wait in buffer.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Maximum number of times to retry a batch on recoverable errors.
  [ max_retries: <int> | default = 10 ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 100ms ]

Remote read

# The URL of the endpoint to query from.
url: <string>

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 1m ]

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]

# Sets the `Authorization` header on every remote read request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <string> ]
  [ password_file: <string> ]

# Sets the `Authorization` header on every remote read request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <string> ]

# Sets the `Authorization` header on every remote read request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

# Configures the remote read request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

PS

  • The write_relabel_configs option in the remote-write configuration makes full use of Prometheus's powerful relabeling capability: it can filter which metrics are written to remote storage.

For example, to keep only selected metrics:

remote_write:
  - url: "http://prometheus-remote-storage-adapter-svc:9201/write"
    write_relabel_configs:
    - action: keep
      source_labels: [__name__]
      regex: container_network_receive_bytes_total|container_network_receive_packets_dropped_total
  • Consider setting external_labels in the global configuration when using federation or remote read/write, so that series from different clusters can be told apart.

global:
  scrape_interval: 20s
  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    cid: '9'

Existing remote storage options

The community has already implemented the following remote storage integrations:

  • AppOptics: write
  • Chronix: write
  • Cortex: read and write
  • CrateDB: read and write
  • Elasticsearch: write
  • Gnocchi: write
  • Graphite: write
  • InfluxDB: read and write
  • OpenTSDB: write
  • PostgreSQL/TimescaleDB: read and write
  • SignalFx: write

Some of the stores above are write-only. Reading the source code shows that whether a store can support remote read depends on whether it supports regex query matching. The next article will walk through prometheus-postgresql-adapter and how to write an adapter of your own.
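The regex point can be made concrete: a read adapter must translate the four Prometheus matcher types (=, !=, =~, !~) into the backend's query language, and a backend with no regex operator simply cannot express the last two. Below is a minimal, hypothetical sketch of that translation for a PostgreSQL-style backend; the function names and the JSONB labels schema are illustrative and are not taken from prometheus-postgresql-adapter.

```python
# Hypothetical sketch: translating Prometheus label matchers into SQL
# WHERE clauses. A backend that cannot express the =~ / !~ cases
# (regex match) cannot fully implement the remote-read protocol.

def matcher_to_sql(label: str, op: str, value: str) -> str:
    """Map one Prometheus matcher to a SQL condition on a labels column.

    The labels->>'name' syntax assumes labels stored as a JSONB column,
    an illustrative schema choice rather than any real adapter's schema.
    """
    column = f"labels->>'{label}'"
    if op == "=":          # exact match
        return f"{column} = '{value}'"
    if op == "!=":         # exact non-match
        return f"{column} != '{value}'"
    if op == "=~":         # regex match -- needs regex support in the store
        return f"{column} ~ '^(?:{value})$'"
    if op == "!~":         # negated regex match
        return f"{column} !~ '^(?:{value})$'"
    raise ValueError(f"unknown matcher op: {op}")


def matchers_to_where(matchers) -> str:
    """AND together all matchers of one remote-read query."""
    return " AND ".join(matcher_to_sql(l, op, v) for l, op, v in matchers)
```

For instance, matchers_to_where([("__name__", "=~", "container_network_.*"), ("cid", "=", "9")]) yields one WHERE clause combining a regex condition with an equality condition; a write-only store is typically one where the first of those two cannot be expressed.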
Among the options that support both remote read and write:

  • Cortex, from Weaveworks, wraps a whole architecture around Prometheus and involves quite a few components; it is somewhat complex.
  • The open-source edition of InfluxDB does not support clustering. With a large metrics volume, write pressure is high, and the influxdb-relay approach is not truly highly available. Ele.me has open-sourced influxdb-proxy, which may be worth trying.
  • CrateDB is based on Elasticsearch; I have not looked into it in depth.
  • TimescaleDB is my personal favorite. Traditional ops teams know PostgreSQL well, so operations are dependable, and it currently supports high availability via streaming replication.

Afterword

If the collected metrics are also used for data analysis, the ClickHouse database is worth considering for its clustering story, write performance, and remote read/write support. I am still evaluating it and will write a dedicated article once there are concrete results. For now, our persistence plan is TimescaleDB.
