You can find the answers to these questions on the FAQ page.
  • It doesn't work. Where can I find diagnostic output?
    Version 3.* writes warnings and error messages using the syslog(3) facility. Depending on your system, the syslog daemon writes these messages to files and/or sends them to another host. On most GNU/Linux distributions the place to look is either /var/log/syslog or /var/log/messages.
    Version 4.0 and later come with the logfile and syslog plugins, which can be used to write status messages to a file or send them to the syslog daemon.
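    As a rough sketch of how to look through those files, the shell function below prints the most recent lines mentioning collectd. The function name collectd_log_tail is made up for this example, and it assumes that your syslog daemon includes the program name "collectd" in each line.

```shell
# Print the last COUNT lines that mention collectd from the given log files.
# Files that do not exist or are not readable are skipped.
# (collectd_log_tail is a hypothetical helper, not part of collectd.)
collectd_log_tail() {
    count=$1
    shift
    for logfile in "$@"; do
        if [ -r "$logfile" ]; then
            # grep exits non-zero when nothing matches; that is not an error here.
            grep 'collectd' "$logfile" || true
        fi
    done | tail -n "$count"
}
```

    For example: collectd_log_tail 20 /var/log/syslog /var/log/messages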
  • Some lines of the config seem to be ignored..?
    Yes, that's a known bug. You probably have one or more whitespace characters at the end of the lines being ignored.
    This is a bug in the library used by collectd 3.* to parse the configfile. Versions 4.0 and later use a different library and don't have this problem.
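    If you are stuck on 3.*, a sketch like the following may help to find and remove the offending whitespace. Both function names are invented for this example; the path to your configfile is whatever you pass in.

```shell
# List lines that end in whitespace, prefixed with their line number.
# (Hypothetical helper names; plain grep and sed do the work.)
find_trailing_whitespace() {
    grep -n '[[:space:]]$' "$1" || true
}

# Print the file to stdout with trailing whitespace removed from every line.
strip_trailing_whitespace() {
    sed 's/[[:space:]]*$//' "$1"
}
```

    Write the cleaned output to a new file and compare it with the original before replacing your configfile.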
  • Can I adjust the interval in which data is collected?
    Yes, since version 3.9.0 this can be set at compile-time. Keep in mind, though, that this will change the layout of the generated RRD-files. Also, clients and servers should have the same setting here to avoid interesting results.
    Version 4.0 allows this setting to be adjusted in the configfile.
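    A minimal sketch of such a configfile entry, assuming the global Interval option of collectd 4.*, which takes a value in seconds (the value 10 is only an example):

```
# collectd.conf (version 4.0 or later)
Interval 10
```

    As with the compile-time setting, changing this value affects the layout of newly created RRD-files.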
  • I try to use the ping-plugin, but keep getting the message "`ping_host_add' failed.". What's the matter?
    In order to generate ICMP packets one needs to open a so-called "RAW socket". On most UNIX systems only the superuser (root) may open such sockets.
    In addition, some virtualization environments, such as VServer and Solaris Zones, have been reported to cause trouble.
  • Who receives the multicast traffic?
    I don't know. That entirely depends on your network setup. By default collectd uses "site local" addresses, which should not be routed outside your AS. Whether that's really the case is up to you.
  • What does "Invalid value for config option `Mode': `Local'" mean?
    It means that the mode "Local" is not available. Most likely the "librrd" library wasn't found. If you want to write to RRD-files, install "librrd" or, if you already did that, use the --with-rrdtool option of the ./configure-script to point it in the right direction.
  • How do I use --with-rrdtool?
    If you installed libraries in a non-standard (or non-system) path you need to specify them when running the configure script. Otherwise it will not find them and build the binaries without linking against the library.
    You need to set the PATH as given to the --prefix option when compiling the library. The script actually looks for the two subdirectories PATH/include and PATH/lib, so check for their existence if things don't work. If, for example, you installed RRDTool in /opt/rrdtool-x.y.z you need to run configure like this:
    $ ./configure --with-rrdtool=/opt/rrdtool-x.y.z
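    The check described above can be sketched as a small shell function. The name check_rrdtool_prefix is made up for this example; it only tests for the two subdirectories mentioned above and says nothing about whether the library inside them is usable.

```shell
# Return success only if PREFIX contains the include/ and lib/
# subdirectories that the configure script looks for.
# (check_rrdtool_prefix is a hypothetical helper, not part of collectd.)
check_rrdtool_prefix() {
    prefix=$1
    for sub in include lib; do
        if [ ! -d "$prefix/$sub" ]; then
            echo "missing: $prefix/$sub" >&2
            return 1
        fi
    done
    return 0
}
```

    For example: check_rrdtool_prefix /opt/rrdtool-x.y.z && ./configure --with-rrdtool=/opt/rrdtool-x.y.z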
  • The apache-plugin reports the following error: apache: curl_easy_perform failed: Failed writing body. What's wrong?
    The response received was too big and didn't fit into the buffer. Check the URL-option in the configfile. Especially check that the URL ends in "?auto": collectd requires the machine readable output generated by the Apache-plugin mod_status and will not work with anything else.
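    A quick way to check the URL before blaming the plugin might look like this. The function name and the example URL are assumptions made for illustration; use the URL from your own configfile.

```shell
# Succeed only if the given URL ends in "?auto", i.e. asks mod_status
# for its machine readable output.
# (url_requests_auto is a hypothetical helper, not part of collectd.)
url_requests_auto() {
    case "$1" in
        *\?auto) return 0 ;;
        *)       return 1 ;;
    esac
}
```

    For example: url_requests_auto 'http://localhost/server-status?auto' || echo "URL does not end in ?auto"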
  • What do the version numbers mean?
    The version numbers consist of three numbers: The major- and minor-number and the patchlevel.
    • Versions with different major-numbers are basically not compatible. This means that the definitions of RRD-files or config-options have been changed or, in general, that the user has to do something in addition to installing the new version. This is not nice and is avoided when possible, but it is sometimes necessary to prevent old mistakes from becoming ancient mistakes. We try to provide migration scripts, though, to make a switch as easy as possible. See the v3 to v4 migration guide for details.
    • Versions with differing minor-numbers are backwards compatible, i. e. you can replace the lower version with the higher one and everything should still work. This means that features are added, but not removed or changed and that the default behavior does not change.
    • Versions with different patchlevels are both forward- and backwards-compatible, because no new features have been introduced. The only difference between the two versions is one or more bugfixes, so you should generally install the higher version of the two.
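    The three rules above can be sketched as a small shell function. The name classify_upgrade and the wording of its output are invented for this example; it assumes both arguments have the form MAJOR.MINOR.PATCH with numeric parts.

```shell
# Classify a switch from version $1 to version $2 according to the
# major / minor / patchlevel rules.
# (classify_upgrade is a hypothetical helper, not part of collectd.)
classify_upgrade() {
    old_major=${1%%.*}; old_rest=${1#*.}; old_minor=${old_rest%%.*}
    new_major=${2%%.*}; new_rest=${2#*.}; new_minor=${new_rest%%.*}
    if [ "$old_major" -ne "$new_major" ]; then
        echo "incompatible: migration needed"
    elif [ "$new_minor" -gt "$old_minor" ]; then
        echo "backwards-compatible: features added"
    elif [ "$new_minor" -lt "$old_minor" ]; then
        echo "downgrade: features may be missing"
    else
        echo "compatible: bugfixes only"
    fi
}
```

    For example, classify_upgrade 4.2.1 4.3.0 reports a backwards-compatible switch, while classify_upgrade 3.11.5 4.0.0 reports that a migration is needed.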
  • I enabled the foo plugin using --enable-foo but now the build process fails. What's wrong?
    Since version 4.0.0 a server process doesn't need to load the plugins from which data should be received - in contrast to versions 3.*. This means that plugins with unmet dependencies no longer have any purpose. So, we moved dependency checking into the configure script, starting with version 4.1.0. I. e. the configure script now automatically disables all plugins with unmet dependencies and enables all plugins whose dependencies are met.
    So, if a plugin is displayed as disabled, its dependencies are not met. The normal way to get a plugin compiled is to install the missing dependencies and re-run the configure script.
    You can force it to be built using --enable-foo, but you need to know exactly what you are doing. If you do this you're out in the dark, cold woods and totally on your own!
  • The build process fails with "relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC". What's wrong?
    Many plugins have to be linked against libraries. A few of them (currently iptables, netlink and nut are known to be affected) link against libraries that are only available as "static libraries" in many distributions. Most distributions (e. g. Debian and SuSE GNU/Linux) do not compile static libraries with the "-fPIC" option. Thus they cannot be linked with shared objects compiled with "-fPIC". Some architectures (among them i386) do not seem to care about that and handle it in some (probably magic) way. However, other architectures (mostly 64bit like amd64 or hppa) cannot handle that and thus the compiler aborts with the error message mentioned above.
    To fix this issue, you need a version of the static library compiled with "-fPIC" (or a shared library). Ask your distributor to provide a suitable version of the library or compile it yourself.
    For more detailed information please refer to:
  • Solaris support is broken! The build aborts! Help!
    There are two known issues with Solaris, but both can be fixed relatively easily:
    If you build a 32bit binary, the configure script will (try to) enable LFS. This will result in an error which looks something like this:
    config.h:832:1: error: "_FILE_OFFSET_BITS" redefined
    Also, the swap-plugin has some problems of its own with this:
    swap.c:197: warning: implicit declaration of function 'swapctl'
    swap.c:197: error: 'SC_AINFO' undeclared (first use in this function)
    The solution is to build a 64bit binary! If you build a 64bit binary, LFS is not needed and the swap plugin works as intended. To do this, pass the -m64 flag to the compiler (assuming you're using the Sun C compiler).
    Another problem is that by default Sun defines a version of getgrnam_r that isn't POSIX-compatible. To enable POSIX-compatibility pass the _POSIX_PTHREAD_SEMANTICS define to the compiler.
    Putting it all together, you need to pass the following flags to the configure-script:
    # Sun CC
    $ ./configure CFLAGS="-m64 -mt -D_POSIX_PTHREAD_SEMANTICS"
    Please note that we only test the Sun C compiler ourselves, but GCC may work, too. When using GCC you need to substitute the -mt flag with the -pthreads flag. So if you use GCC the above invocation of ./configure becomes:
    # GCC
    $ ./configure CFLAGS="-m64 -pthreads -D_POSIX_PTHREAD_SEMANTICS"
    Thanks to Christophe Kalt for sharing his insights :)
  • Why is the CPU usage split up in so many files? Can I change that?
    The short answer is: That is because otherwise backwards compatibility would be impossible and you would have to re-create your files from scratch regularly. And, "no".
    The long answer and explanation of the short answer is: collectd runs on a variety of operating systems. Each operating system has its own method for accounting CPU states, memory consumption, swap usage, and so on. If all these data sources were in one data set, every newly supported operating system or any addition to an already supported operating system would mean that we need to modify the data set. This cannot be done without breaking backwards compatibility.
    To give you a few examples: Sometime in mid-2.6 the Linux kernel added some Xen-patches which provided a new CPU state: "steal time". When adding support for BSD systems we had to add "wired" memory. NFSv4 added some new procedures that NFSv3 didn't have, etc pp.
    That interface traffic has two data sources is different, because every operating system will account received and transmitted bytes. Likewise for the system load: The 1, 5, and 15 minute averages have been like that for ages and it's very unlikely that any weird UNIX does this differently.
    Changing the layout of the data is not just a matter of changing the types.db file. That file describes the layout of the data submitted by plugins. The plugins don't need it - they know what data they submit. It's needed by the daemon and the writing plugins to know how to store the data. If you mess with the file without knowing what you are doing, you will most likely end up with the data not being collected at all anymore.
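    To look at a layout without touching the file, a read-only sketch like the following may be useful. It assumes the types.db line format "type name, whitespace, then comma-separated data sources written as name:type:min:max", for example "load shortterm:GAUGE:0:100, midterm:GAUGE:0:100, longterm:GAUGE:0:100"; check the file shipped with your version. The function name list_data_sources is made up for this example.

```shell
# Print the names of the data sources defined for one type in a types.db file.
#   $1 = path to types.db, $2 = type name (e.g. "load")
# (list_data_sources is a hypothetical helper; it only reads the file.)
list_data_sources() {
    awk -v wanted="$2" '
        $1 == wanted {
            # Fields 2..NF look like "name:TYPE:min:max," (comma optional).
            for (i = 2; i <= NF; i++) {
                split($i, parts, ":")
                print parts[1]
            }
        }
    ' "$1"
}
```

    For example: list_data_sources /path/to/types.db load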
  • Why doesn't collection.cgi draw foo graphs correctly?
    That script is meant as a starting point for own developments, not as a ready to use web frontend for RRD files written by collectd.
    It is just an example, because it's not really usable as it is. And it's not really usable, because we are UNIX developers and don't enjoy doing web stuff much. Working on the daemon is just so much more fun.. ;) So in the best of free / open source traditions: Patches welcome!
    There are alternatives, though. We've heard from various people using Cacti to render the graphs. Sergiusz Pawlowicz of the BBC has written CollectGraph, a macro for the MoinMoin wiki. And of course there's drraw.

Notifications and thresholds

Starting with version 4.3 collectd introduces the concept of "notifications". This document describes the concept and the related "thresholds" as they are implemented in the current version 4.3.

Notifications

Notifications are generic text messages with an associated severity and a time. Their use is to inform the user about a notable condition, such as an unusually high CPU load or a failed health check. In addition to the severity and time, the messages may be associated with performance data using the usual host/plugin/type tuple. The text doesn't follow any protocol or other specification, and the text of notifications generated by collectd may change without notice between versions. If interpretation of the text should become necessary we will add a computer understandable field or flag for that purpose. The severity can be one of OKAY, WARNING, and FAILURE with the usual meaning. The time hopefully is self-explanatory.

Notifications are dispatched in the same manner in which performance data is dispatched: There are "producers", i. e. plugins that create notifications, and "consumers", i. e. plugins that receive notifications and do something with them. Plugins that can either create or receive notifications are right now:

Thresholds

One of the central sources of notifications is to check whether performance values are in an acceptable range. This is done using "Thresholds", the second big change in 4.3. You can define thresholds for any value or group of values which will then be checked.
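As a rough sketch of what such a definition may look like in the configfile: the block and option names below (Threshold, Host, Plugin, Type, Instance, WarningMin, FailureMin) are given as recalled from the collectd.conf(5) manual page and should be checked against the version you run. The host name and the limits are purely illustrative.

```
<Threshold>
  <Host "example-host">
    <Plugin "cpu">
      <Type "cpu">
        Instance "idle"
        # Warn when idle time drops below 20, fail below 10.
        WarningMin 20
        FailureMin 10
      </Type>
    </Plugin>
  </Host>
</Threshold>
```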

But besides range checking, a possibly less obvious mechanism is enabled if thresholds are configured for a value. Because one appears to be interested in the value, a notification will be created when it hasn't been received for an unusually long time. This way you will get a notification for missing values, too, which would otherwise go unnoticed.

When configuring thresholds you can define if the threshold is supposed to be "persistent". With persistent notifications a notification will be created for each value that is out-of-range. This may result in a high number of notifications, basically one notification each interval. If a threshold is configured to be non-persistent a notification is created for each state change, i. e. when the status changes from "okay" to "out of range" and a second one when it changes back to "okay".
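The difference between the two modes can be sketched in a few lines of shell. This is a toy model, not collectd's implementation: it knows only one upper limit and the states OKAY and FAILURE, works on integers, and assumes that a persistent threshold also reports the change back to OKAY. The function name threshold_notifications is made up for this example.

```shell
# Print one line per notification for a series of integer values.
#   $1 = persist (true|false), $2 = upper limit, remaining args = values
# (Toy model with a hypothetical name; not how collectd is implemented.)
threshold_notifications() {
    persist=$1
    limit=$2
    shift 2
    state=OKAY
    for value in "$@"; do
        if [ "$value" -gt "$limit" ]; then
            new_state=FAILURE
        else
            new_state=OKAY
        fi
        if [ "$new_state" != "$state" ]; then
            # A state change is always reported.
            echo "$new_state $value"
        elif [ "$persist" = true ] && [ "$new_state" != OKAY ]; then
            # Persistent: repeat the notification for every out-of-range value.
            echo "$new_state $value"
        fi
        state=$new_state
    done
}
```

With a limit of 70 and the values 50 80 90 60, the non-persistent mode reports the change at 80 and the change back at 60; the persistent mode additionally reports the value 90.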

Prospects

There are no detailed plans for what we're going to build on top of that infrastructure, but we have some interesting ideas. A plugin which sends notifications to a user using email is a must-have. A plugin which makes a (VoIP)-phonecall would be nice, too. Something using Festival comes to mind. We're always open for new ideas, of course ;)

Design considerations

When thinking about the concept of monitoring functionality in collectd, we tried to learn from the design problems of other solutions. For example, other projects have a "check" which has a certain status. Although health checks and availability checks are important, there are a lot of situations where some performance data needs to be in some range. One example would be the free space in /var, which should never be less than 100 MByte (or something like that).

Of course, other solutions can do that kind of check, too. But the way to do this is a disaster: One defines a script to be executed (including arguments) and within the arguments the threshold values are coded. This is not only unreadable but also annoying as hell if every plugin uses its own (or a slightly different) syntax for these thresholds. In our opinion the best solution is to have the plugin report the values and let the daemon decide whether they are good or not. The user gets a uniform interface for defining these threshold limits.

So with collectd's notifications you're very flexible: A notification can inform the user that a host is unreachable, that the harddisk is dead or that the moon was destroyed. But, using the threshold values, a notification can be created when the system temperature exceeds 70 degrees (Celsius), too. So as much functionality as appropriate has been pulled into the daemon so that the plugins can focus on what they were meant to do.