tgrabiec edited this page Dec 5, 2014 · 10 revisions

Info

Collectd is a daemon with a set of plug-ins that collects and aggregates counter metrics from various sources, for either (semi-)realtime or later viewing and analysis.

Installing

On Fedora (and probably other Red Hat derivatives): sudo yum install collectd

Config

A vanilla installation will neither listen for nor record any data. On Fedora-like installs, create or edit a file /etc/collectd.d/<myconfig>.conf. According to the manual, a minimal setup that gets data collected and stored would be:

LoadPlugin network
LoadPlugin logfile
LoadPlugin rrdtool

<Plugin "network">
    Listen "<ip-address>" "<port>"
</Plugin>
<Plugin "rrdtool">
    DataDir "/var/lib/collectd/rrd"
    CacheFlush 120
    WritesPerSecond 50
</Plugin>

The network plugin section tells the daemon to listen for data packets on <ip-address>:<port>, where the address can be either unicast or multicast. Multiple "Listen" entries can be added to gather data from more than one IP address or interface.

The rrdtool plugin section enables writing the data as RRD files, organized by host, plugin, plugin-instance, type and type-instance. The files can later be plotted and analyzed.
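As a rough illustration of that organization, here is a minimal sketch of how the five parts map to a file path. It assumes the usual rrdtool plugin layout of `<DataDir>/<host>/<plugin>[-<plugin-instance>]/<type>[-<type-instance>].rrd`. The function `rrd_path` is a hypothetical helper written for this page, not part of collectd.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a collectd API): composes the RRD file path for
// one metric, assuming the layout
//   <DataDir>/<host>/<plugin>[-<plugin_instance>]/<type>[-<type_instance>].rrd
std::string rrd_path(const std::string& data_dir,
                     const std::string& host,
                     const std::string& plugin,
                     const std::string& plugin_instance,
                     const std::string& type,
                     const std::string& type_instance) {
    // host, plugin and type are the mandatory parts of an identifier.
    assert(!host.empty() && !plugin.empty() && !type.empty());

    std::string path = data_dir + "/" + host + "/" + plugin;
    if (!plugin_instance.empty()) {
        path += "-" + plugin_instance;   // optional, e.g. "0" for cpu 0
    }
    path += "/" + type;
    if (!type_instance.empty()) {
        path += "-" + type_instance;     // optional, e.g. "idle"
    }
    return path + ".rrd";
}
```

Under these assumptions, `rrd_path("/var/lib/collectd/rrd", "myhost", "cpu", "0", "cpu", "idle")` yields `/var/lib/collectd/rrd/myhost/cpu-0/cpu-idle.rrd`, where `myhost` is a placeholder host name.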

(Note: the RRD files are written incrementally, with caching and periodic flushing, so "realtime" graphing from them will lag noticeably. The lag can probably be tuned with the cache and write-rate parameters above, presumably at the price of performance.)

Taken from Collectd networking and Collectd rrd plugin

Generate graphs

See related projects for a list of visualizers. Collectd also comes with a sample visualizer package that uses Perl and a web server to graph collected data. On Fedora-like systems it can be installed with sudo yum install collectd-web. Once it and an Apache server have been installed and started, you can browse counter sets and view graphs at http://<your-host>/collectd/.

Adding counters

Metric data is recorded through primitive counters which are registered under an ID comprised of:

<plugin> The component or subsystem being collected for. For example "cpu", or "interface" (network).
<plugin-instance> (Optional) The individual instance of a plugin being collected. For example, for the cpu case, this would simply be 0, 1, 2... etc. For network interfaces this would be the interface name (eth0).
<type> The [data type](http://collectd.org/documentation/manpages/types.db.5.shtml) being collected. New data types can be defined, and strictly speaking you do not need to use a defined type at all, but the RRD plugin, for example, will not work with types that are not pre-defined. Either select one of the built-in types from `/usr/share/collectd/types.db`, or create a new database. The latter requires all clients to use this db and enable it in their config, so using existing types is highly recommended.

For cpu, the cpu type is highly appropriate. It defines that the counter consists of a single value which is an absolute (instant) value.

For a network interface, several types exist, for example if_packets, which is defined as two values, RX and TX. Both are DERIVE values, which basically means that when looking at the value we are interested in how much it has changed since the previous reading, i.e. the derivative.

<type-instance> (Optional) For the cpu example: `idle`, `user`, `kernel` etc.
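The DERIVE semantics described for if_packets can be sketched as a rate computed from two consecutive raw readings. This is only an illustration of the idea, not collectd's actual code. The names `sample` and `derive_rate` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// One raw reading of a growing counter (e.g. packets received so far).
// Hypothetical type for illustration only.
struct sample {
    double time_s;   // when the reading was taken, in seconds
    int64_t value;   // raw counter value at that time
};

// Rate of change between two readings: (new - old) / elapsed seconds.
// A GAUGE, by contrast, would be stored as read, with no differencing.
double derive_rate(const sample& prev, const sample& cur) {
    assert(cur.time_s > prev.time_s);  // readings must be in time order
    return static_cast<double>(cur.value - prev.value)
         / (cur.time_s - prev.time_s);
}
```

For instance, readings of 1000 packets at t = 0 s and 1500 packets at t = 10 s give a rate of 50 packets per second, which is what ends up being graphed for a DERIVE value.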

For a usage example, you can check out reactor.cc:

    uint64_t tasks_processed = 0;
    scollectd::registration regs[] = {
            // queue_length     value:GAUGE:0:U
            // Absolute value of num tasks in queue.
            scollectd::add_polled_metric(scollectd::type_instance_id("reactor"
                    , scollectd::per_cpu_plugin_instance
                    , "queue_length", "tasks-pending")
                    , scollectd::make_typed(scollectd::data_type::GAUGE
                            , std::bind(&decltype(_pending_tasks)::size, &_pending_tasks))
            ),
            // total_operations value:DERIVE:0:U
            scollectd::add_polled_metric(scollectd::type_instance_id("reactor"
                    , scollectd::per_cpu_plugin_instance
                    , "total_operations", "tasks-processed")
                    , scollectd::make_typed(scollectd::data_type::DERIVE, tasks_processed)
            ),
            // queue_length     value:GAUGE:0:U
            // Absolute value of num timers in queue.
            scollectd::add_polled_metric(scollectd::type_instance_id("reactor"
                    , scollectd::per_cpu_plugin_instance
                    , "queue_length", "timers-pending")
                    , scollectd::make_typed(scollectd::data_type::GAUGE
                            , std::bind(&decltype(_timers)::size, &_timers))
            ),
    };

scollectd::registration is an anchor type that ensures the counter is removed once the anchor goes out of scope. In the reactor loop case, the counters only exist within the run() function. In most other cases the counters and their anchors would be members of the owning type.
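The anchor idea can be illustrated with a small, self-contained sketch. This is not the scollectd implementation: the `registry` and `registration` classes below are hypothetical stand-ins that only mimic the lifetime rule described above, and the metric ID string is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a metric registry (NOT scollectd).
class registry {
    std::map<std::string, std::function<double()>> _metrics;
public:
    void add(const std::string& id, std::function<double()> read) {
        _metrics[id] = std::move(read);
    }
    void remove(const std::string& id) { _metrics.erase(id); }
    std::size_t size() const { return _metrics.size(); }
};

// Move-only anchor: registers a metric on construction and removes it
// again when the anchor is destroyed.
class registration {
    registry* _reg;
    std::string _id;
public:
    registration(registry& r, std::string id, std::function<double()> read)
        : _reg(&r), _id(std::move(id)) {
        _reg->add(_id, std::move(read));
    }
    registration(registration&& o) noexcept
        : _reg(std::exchange(o._reg, nullptr)), _id(std::move(o._id)) {}
    registration(const registration&) = delete;
    registration& operator=(const registration&) = delete;
    registration& operator=(registration&&) = delete;
    ~registration() {
        if (_reg) {            // a moved-from anchor owns nothing
            _reg->remove(_id);
        }
    }
};

// The metric exists only while its anchor is alive.
void demo() {
    registry r;
    {
        registration anchor(r, "reactor-0/queue_length-tasks-pending",
                            [] { return 0.0; });
        assert(r.size() == 1);
    }
    assert(r.size() == 0);     // anchor left scope, metric was removed
}
```

Keeping the anchor as a member of the object that owns the counter ties the metric's lifetime to that object, so no explicit unregister call is needed.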
