Metrics#
Mars has a unified metrics API and three different backends.
A Unified Metrics API#
Mars metrics API are in mars/metrics/api.py
and there are four metric types:
Counter
is a cumulative type of data which represents a monotonically increasing number.Gauge
is a single numerical value.Meter
is the rate at which a set of events occur. we can use it as qps or tps.Histogram
is a type of statistics which records the average value of a window data.
And we can use these types as follows:
# Four metrics have a unified parameter list:
# 1. Declarative method: Metrics.counter(name: str, description: str = "", tag_keys: Optional[Tuple[str]] = None)
# 2. Record method: record(value=1, tags: Optional[Dict[str, str]] = None)
c1 = Metrics.counter('counter1', 'A counter')
c1.record(1)
c2 = Metrics.counter('counter2', 'A counter', ('service', 'tenant'))
c2.record(1, {'service': 'mars', 'tenant': 'test'})
g1 = Metrics.gauge('gauge1')
g1.record(1)
g2 = Metrics.gauge('gauge2', 'A gauge', ('service', 'tenant'))
g2.record(1, {'service': 'mars', 'tenant': 'test'})
m1 = Metrics.meter('meter1')
m1.record(1)
m2 = Metrics.meter('meter1', 'A meter', ('service', 'tenant'))
m2.record(1, {'service': 'mars', 'tenant': 'test'})
h1 = Metrics.histogram('histogram1')
h1.record(1)
h2 = Metrics.histogram('histogram1', 'A histogram', ('service', 'tenant')))
h2.record(1, {'service': 'mars', 'tenant': 'test'})
Note: If tag_keys
is declared, tags
must be specified when invoking
record
method and tags’ keys must be consistent with tag_keys
.
Three different Backends#
Mars metrics support three different backends:
console
is used for debug and it just prints the value.prometheus
is an open-source systems monitoring and alerting toolkit.ray
is a metric backend which just runs on ray engine.
We can choose a metric backend by configuring metrics.backend
in
mars/deploy/oscar/base_config.yml
or its descendant files.
Metrics Naming Convention#
We propose a naming convention for metrics as follows:
namespace.[component].metric_name[_units]
namespace
could bemars
.component
could be supervisor, worker or band etc, and can be omitted.units
is the metric unit which may be seconds when recording time, or_count
when metric type isCounter
,_number
when metric type isGauge
if there is no suitable unit.