A core file or core dump is a dump of the application state, including all memory contents, to the disk for further inspection. If collectd crashes due to a bug, the best way to debug this is via such a file. Using a debugger it is possible to find out where exactly the problem occurred and under which circumstances.
In order to create and use a core file, you need to take the following steps:
- Create an executable with debugging symbols. You can either re-compile with special flags or install a debugging package if one is available.
- Enable creation of core files.
- Wait until the daemon crashes again.
- Inspect the core file to find the source of the problem.
In order to get meaningful information from a core file, the executable must be built with debugging symbols.
From a package
The easiest way to obtain debugging symbols is by installing an appropriate debugging package. If you're using Debian or Ubuntu, you can install such a debugging package using:
$ apt-get install collectd-dbg
Other distributions may provide debugging packages, too. If you know of such a distribution, please add the information here.
If you installed collectd from source or your distribution doesn't provide a debugging package, you can recompile with the appropriate compiler flags.
The exact flags required depend on the compiler used. If you use the C compiler from the GNU Compiler Collection (GCC), the flag -g enables the inclusion of debugging symbols. While you're at it, disable optimization using -O0. This makes it easier to interpret the output of the debugger.
Pass the flags to the configure script using:
$ ./configure $OTHER_FLAGS CFLAGS="-g -O0"
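After rebuilding, it can be worth checking that the resulting binary really carries debugging symbols. The following is a minimal sketch, assuming an ELF binary and the readelf tool from GNU binutils; has_debug_info is a hypothetical helper name, and the example path is an assumption. It does not apply to debugging packages, which typically ship the symbols in separate files.

```shell
# has_debug_info is a hypothetical helper name, not part of collectd.
# Succeeds if the ELF file given as $1 contains a .debug_info section,
# which GCC emits when compiling with -g.
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

# Example use; the path is an assumption, adjust it to your install prefix:
# has_debug_info /opt/collectd/sbin/collectd && echo "debugging symbols found"
```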
Other compilers may differ. If you know how, please add the information here.
Enabling core files
Many distributions disable the creation of core files by default, so that the disk isn't filled with useless files if the user doesn't know how to use them.
The size of the biggest core file allowed to be written to disk is controlled via ulimit -c. If creation of core files is disabled, you will get the following output:
$ ulimit -c
0
This means that core dumps are only written if their size is smaller than 0 blocks, i.e. never. You can increase this limit to a reasonable size or use unlimited to force the creation of a core file regardless of its size:
$ ulimit -c unlimited
$ ulimit -c
unlimited
Please note that this change only affects the shell it was issued from and the programs started from it. The setting is not global. You need to restart the daemon from this shell or add the ulimit line to the init script for the changes to take effect.
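Because the limit is inherited per process, it can help to check what limit the running daemon actually has, rather than what the current shell reports. The following is a minimal sketch for Linux only, since it reads /proc; core_limit_of is a hypothetical helper name, and the example assumes a single collectd process and that pidof is available.

```shell
# core_limit_of is a hypothetical helper name, not part of collectd.
# Linux only: prints the soft and hard core file size limits of the
# process whose PID is given as $1.
core_limit_of() {
    grep 'Max core file size' "/proc/$1/limits"
}

# Example use, assuming exactly one running collectd process:
# core_limit_of "$(pidof collectd)"
```

If the soft limit in the first value column is still 0, the daemon was not started from a shell or init script with the raised limit.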
Under Debian GNU/Linux and Ubuntu, creation of core files is controlled via the file /etc/default/collectd. You can enable the creation of core dumps by setting:
Waiting for a crash
After restarting the daemon with core files enabled, all you have to do is wait for the daemon to crash again. Until this happens, enabling core file creation doesn't affect the performance of the daemon. (If you recompiled with -O0, the daemon may use a bit more CPU time because the code is less efficient, but this has nothing to do with the ulimit setting.)
Locating the core file
Once the daemon has crashed, a file called core.$PID will be created in its current working directory. This directory can be set using the BaseDir setting. By default, this is $prefix/var/lib/collectd. If you installed a package, this directory is most likely
Inspecting the core file
Once the core file has been created, a debugger is used to examine the file in order to find the problem. We'll talk about the standard debugger under GNU/Linux, the GNU Debugger (gdb) here. On other systems, you may use a different debugger, such as dbx under Solaris.
Start the debugger with:
$ gdb $path_to/collectd $path_to/core
This tells the debugger that we want to inspect the collectd executable and that the state should be read from the core file that was created. Please make sure to use the collectd executable that actually created the core file; otherwise, the information provided by the debugger will be bogus. In other words, after recompiling collectd, you'll have to recreate the core file.
If you know how to work with a debugger, knock yourself out and submit a patch. If not, just collect the most useful information for a start and send it to the mailing list. The most useful information is, at first, a stack backtrace. When using gdb, a stack backtrace is printed with:
(gdb) backtrace full
Just copy that output to a file and send us an email.
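Instead of copying the output by hand, gdb can be run non-interactively so that the backtrace goes straight into a file. The following is a minimal sketch using gdb's -batch and -ex options; save_backtrace is a hypothetical helper name, and the GDB variable for overriding the debugger command is an assumption of this sketch, not something collectd or gdb defines.

```shell
# save_backtrace is a hypothetical helper name, not part of collectd.
# $1: collectd executable that produced the core file
# $2: core file
# $3: file to write the backtrace to
# The GDB variable (an assumption of this sketch) overrides the debugger command.
save_backtrace() {
    "${GDB:-gdb}" -batch -ex "backtrace full" "$1" "$2" > "$3" 2>&1
}

# Example use, with the same placeholder paths as above:
# save_backtrace $path_to/collectd $path_to/core backtrace.txt
```

The resulting file can then be attached to the email as is.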