Where does my code crash?
When the code you are running "crashes", you are usually presented with a backtrace (or stack trace) showing the chain of nested function calls that were active, plus some more or less "obscure" information. Such a backtrace is more detailed if you have compiled your code with debug symbols.
Consider the following backtrace:
*** Break *** segmentation violation
Generating stack trace...
0x00000001191d3bdd in AliAnalysisTaskSE::Exec(char const*) (in libANALYSISalice.so) (AliAnalysisTask.h:118)
0x0000000111df577f in TTask::ExecuteTask(char const*) (in libCore.5.so) + 383
0x00000001110518ed in AliAnalysisManager::ExecAnalysis(char const*) (in libANALYSIS.so) (AliAnalysisManager.cxx:2323)
0x0000000111061be8 in AliAnalysisSelector::Process(long long) (in libANALYSIS.so) (AliAnalysisSelector.cxx:164)
0x0000000114aa824f in TTreePlayer::Process(TSelector*, char const*, long long, long long) (in libTreePlayer.5.so) + 895
0x000000011105afd6 in AliAnalysisManager::StartAnalysis(char const*, TTree*, long long, long long) (in libANALYSIS.so) (AliAnalysisManager.cxx:1950)
0x0000000111086153 in G__G__ANALYSIS_208_0_14(G__value*, char const*, G__param*, int) (in libANALYSIS.so) (G__ANALYSIS.cxx:4560)
0x0000000112578881 in Cint::G__ExceptionWrapper(int (*)(G__value*, char const*, G__param*, int), G__value*, char*, G__param*, int) (in libCint.5.so) + 49
0x000000011262185b in G__execute_call (in libCint.5.so) + 75
0x0000000112621cbc in G__call_cppfunc (in libCint.5.so) + 860
0x00000001125f563e in G__interpret_func (in libCint.5.so) + 5198
0x00000001125e3a67 in G__getfunction (in libCint.5.so) + 5655
0x00000001126e45db in G__getstructmem(int, G__FastAllocString&, char*, int, char*, int*, G__var_array*, int) (in libCint.5.so) + 4187
0x00000001126db0cd in G__getvariable (in libCint.5.so) + 7341
0x00000001125d82c2 in G__getitem (in libCint.5.so) + 402
0x00000001125d3e92 in G__getexpr (in libCint.5.so) + 31458
0x000000011265517c in G__exec_statement (in libCint.5.so) + 34988
0x000000011265348d in G__exec_statement (in libCint.5.so) + 27581
0x00000001125f8282 in G__interpret_func (in libCint.5.so) + 16530
0x00000001125e3ab4 in G__getfunction (in libCint.5.so) + 5732
0x00000001125d832f in G__getitem (in libCint.5.so) + 511
0x00000001125d3e92 in G__getexpr (in libCint.5.so) + 31458
0x000000011265517c in G__exec_statement (in libCint.5.so) + 34988
0x00000001125ba885 in G__exec_tempfile_core(char const*, __sFILE*) (in libCint.5.so) + 1125
0x00000001125ba416 in G__exec_tempfile_fp (in libCint.5.so) + 22
0x000000011265e4ab in G__process_cmd (in libCint.5.so) + 9339
0x0000000111e27674 in TCint::ProcessLine(char const*, TInterpreter::EErrorCode*) (in libCore.5.so) + 884
0x0000000111d86fbd in TApplication::ProcessLine(char const*, bool, int*) (in libCore.5.so) + 2141
0x0000000113d81be4 in TRint::HandleTermInput() (in libRint.5.so) + 676
0x0000000111e5f11d in TUnixSystem::CheckDescriptors() (in libCore.5.so) + 317
0x0000000111e68053 in TMacOSXSystem::DispatchOneEvent(bool) (in libCore.5.so) + 387
0x0000000111de4d5a in TSystem::InnerLoop() (in libCore.5.so) + 26
0x0000000111de4c58 in TSystem::Run() (in libCore.5.so) + 392
0x0000000111d87cc4 in TApplication::Run(bool) (in libCore.5.so) + 36
0x0000000113d8150c in TRint::Run(bool) (in libRint.5.so) + 1420
0x000000010546c5a8 in main (in aliroot) (aliroot.cxx:113)
0x00007fff8c4d75fd in start (in libdyld.dylib) + 1
0x0000000000000001 in <unknown function>
This means that the program crashed while running the function AliAnalysisTaskSE::Exec, and in particular while executing the code at line 118 of AliAnalysisTask.h:
0x00000001191d3bdd in AliAnalysisTaskSE::Exec(char const*) (in libANALYSISalice.so) (AliAnalysisTask.h:118)
The function was in turn called by TTask::ExecuteTask(), and so on, until we reach the first function called by the program.
Other lines carry obscure information:
0x0000000111e5f11d in TUnixSystem::CheckDescriptors() (in libCore.5.so) + 317
In this case, we only know that the CheckDescriptors() function is defined in libCore.5.so, and that execution had reached byte offset 317 from the start of the compiled function inside the binary library.
The difference in the level of detail is due to the fact that:
- AliRoot has been compiled with debug information: all AliRoot functions report the corresponding source file and line number
- ROOT has been compiled without debug information: all ROOT functions only report obscure offsets
It is possible to mix non-debug and debug code, but we will obtain comprehensible information only for the latter.
In the following sections we will present two commonly used debugging techniques: one very simple based on printouts, and another one based on using a debugger (gdb).
Debugging with "printf/cout"
The simplest way to understand what goes on in your code is to introduce periodic printouts stating the value of some variables, or simply indicating what the program is currently executing.
Please note that, while printouts may be useful, they clutter your code if overused: if you end up adding one printout per line of code, you'd be better off with a debugger, as explained in the "Using a debugger" section.
Apart from code cleanliness, printouts have a computational cost, which might be non-negligible depending on how often you print: we are about to learn an appropriate way to add discreet debug printouts to your code, so that:
- we can turn them off when debug is over without additional computational costs
- we don't have to write our code twice, one debug and one production version
The generic way
The simplest (and naive) way to do it in C++ is the following:
#include <iostream>
// global variable
bool gPrintDebug = true;
void PrintDebug(const char *message) {
if (gPrintDebug == false)
return;
std::cout << message << std::endl;
}
int main(int argn, char *argv[]) {
PrintDebug("We are here");
return 0;
}
The above snippet has a convenient PrintDebug() function, whose output can be suppressed by simply setting the global variable gPrintDebug = false.
However, the PrintDebug() function is called in any case, even if no output is produced, and this function call has a cost that can become non-negligible when it is repeated many times; moreover, the condition if (!gPrintDebug) must be tested every time, which adds a further computational cost.
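The cost is not limited to the call itself: the arguments passed to PrintDebug() are evaluated before the function is entered, even when the output is switched off. The following is a minimal sketch of this effect; the helper DescribeState(), the counter gMessagesBuilt and the function RunLoop() are hypothetical names introduced only for illustration.

```cpp
#include <iostream>
#include <string>

// Global switch, as in the naive example: output is disabled here.
bool gPrintDebug = false;

// Hypothetical counter: how many times a debug message was built.
int gMessagesBuilt = 0;

// Hypothetical helper standing in for an expensive message construction.
std::string DescribeState() {
  ++gMessagesBuilt;
  return "state: ok";
}

void PrintDebug(const std::string &message) {
  if (!gPrintDebug)
    return;
  std::cout << message << std::endl;
}

// Calls PrintDebug() n times and returns how many messages were built.
int RunLoop(int n) {
  gMessagesBuilt = 0;
  for (int i = 0; i < n; i++) {
    // The argument is evaluated even though nothing will be printed.
    PrintDebug(DescribeState());
  }
  return gMessagesBuilt;
}
```

With gPrintDebug set to false, calling RunLoop(1000) prints nothing, yet DescribeState() is still executed at every iteration.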
The correct way is to use C/C++ preprocessor macros. The simplest way is to move the condition if (!gPrintDebug) from C++ to the preprocessor:
#include <iostream>
//#define PRINT_DEBUG
#ifdef PRINT_DEBUG
void PrintDebug(const char *message) {
std::cout << message << std::endl;
}
#else
// expands to a no-op expression; the call site supplies the semicolon
#define PrintDebug(...) ((void)0)
#endif
int main(int argn, char *argv[]) {
PrintDebug("We are here");
return 0;
}
If we compile the above snippet, no call to the print function will ever be generated in the compiled code, because the condition is evaluated at compile time by the code preprocessor (to be precise, this happens just before compiling the code).
To enable print debug, you can either uncomment the #define line on top, or compile the code with:
g++ -o prog -D PRINT_DEBUG prog.cxx
i.e. you can define the PRINT_DEBUG preprocessor symbol on the command line without modifying your code.
There are fancier ways to use preprocessor macros for debug: this one is trivial and provided as an example.
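As an illustration of a slightly fancier variant, the sketch below prefixes each message with the source file and line of the call site, using the standard __FILE__ and __LINE__ macros. The names DEBUG_MSG, CountedValue(), gEvaluations and Work() are hypothetical and chosen only for this example; it is one possible approach, not the only one.

```cpp
#include <iostream>

// Uncomment, or compile with -D PRINT_DEBUG, to enable the printouts.
//#define PRINT_DEBUG

#ifdef PRINT_DEBUG
// Prints "file:line: message" on stderr; msg may contain << operators.
#define DEBUG_MSG(msg) \
  do { std::cerr << __FILE__ << ":" << __LINE__ << ": " << msg << std::endl; } while (0)
#else
// Expands to an empty statement: msg is discarded and never evaluated.
#define DEBUG_MSG(msg) do { } while (0)
#endif

// Hypothetical counter: how many times CountedValue() was called.
int gEvaluations = 0;

int CountedValue() {
  ++gEvaluations;
  return 42;
}

// Returns the number of evaluations triggered by the debug message.
int Work() {
  gEvaluations = 0;
  DEBUG_MSG("value is " << CountedValue());
  return gEvaluations;
}
```

When PRINT_DEBUG is not defined, Work() returns 0, because the whole message expression disappears before compilation. When it is defined, the message is printed together with its location in the source.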
Using a debugger
gdb is the GNU debugger. To install it on Debian-based distributions (like Ubuntu):
apt update
apt install gdb
gdb and OS X
For some time now, OS X has used LLVM as its default toolchain. In general, code compiled with LLVM compilers cannot be debugged easily with gdb, although it is claimed now and then to be compatible.
LLVM comes with its own debugger called lldb: for basic operations, lldb's commands and its way of functioning are very similar, if not identical in some cases, to gdb's.
The debugging example that follows contains examples and references to both gdb and lldb, and enough information on how to use the latter if you come from gdb.
Just for reference, here's how to install gdb on OS X properly.
It is recommended that you get accustomed to lldb if you use a Mac, and refrain from using gdb, as it lacks official support.
Test case: a program that crashes
Consider the following snippet:
#include <iostream>
#define FLOAT_ARY 12
// global variable, inspected from the debugger later on
int global_int = 789;
void crash_function() {
unsigned int answer_to_everything = 42;
int another_number = -answer_to_everything;
float many_floats[FLOAT_ARY];
int counter_max = 10000;
const char *test_string = "oh no, not again!";
std::cout << test_string << std::endl;
for (unsigned int i=0; i<counter_max; i++) {
std::cout << "setting index " << i << " out of " << counter_max << ". string is: " << test_string << std::endl;
many_floats[i] = 0.12345;
}
std::cout << another_number << std::endl;
}
void func2() {
float pi = 3.14;
crash_function();
}
void func1() {
int a_variable = 456;
a_variable += 1;
func2();
}
int main(int argn, char *argv[]) {
func1();
crash_function();
return 0;
}
Name it crash.cxx and compile it with debug symbols. On Linux:
g++ -g -O0 -o crash crash.cxx
On OS X:
clang++ -g -O0 -o crash crash.cxx
This code contains an array of floats, allocated "on the stack" (i.e. in the memory reserved for the function's local variables), which has space for 12 elements:
float many_floats[12];
However, in the for loop we are writing way beyond its length in order to artificially generate a crash. Let's run it:
$> ./crash
...
setting index 811 out of 10000. string is: oh no, not again!
setting index 812 out of 10000. string is: oh no, not again!
setting index 813 out of 10000. string is: oh no, not again!
setting index 814 out of 10000. string is: oh no, not again!
setting index 815 out of 10000. string is: oh no, not again!
setting index 816 out of 10000. string is: oh no, not again!
Segmentation fault: 11
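The program crashes at an unpredictable point, as expected: writing past the end of the array is undefined behavior, and the crash only occurs once a write reaches memory the process is not allowed to access.
We did not obtain any stack trace automatically: printing one when a crash occurs is a peculiarity of ROOT programs.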
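A plain program can still print a rough stack trace by itself. The sketch below assumes a platform that provides backtrace() and backtrace_symbols_fd() in <execinfo.h> (the GNU C library and OS X do; this is not standard C++). The function names PrintStackTrace(), Inner() and Outer() are hypothetical. This is only meant to show where such a trace comes from, and it is not how ROOT implements its own.

```cpp
#include <execinfo.h>  // backtrace(), backtrace_symbols_fd(): non-standard
#include <unistd.h>    // STDERR_FILENO

// Writes the current call stack to stderr, one frame per line,
// and returns the number of frames that were collected.
int PrintStackTrace() {
  const int kMaxFrames = 64;
  void *frames[kMaxFrames];
  int n = backtrace(frames, kMaxFrames);
  backtrace_symbols_fd(frames, n, STDERR_FILENO);
  return n;
}

// Two nested calls, so that the trace has something to show.
int Inner() { return PrintStackTrace(); }
int Outer() { return Inner(); }
```

Calling Outer() prints one line per active frame. On Linux, function names typically appear only if the program is linked with -rdynamic; otherwise only addresses are shown. A debugger remains the more informative tool.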
We want to run the program under a debugger in order to:
- obtain a stack trace
- see the value of the variables at the moment of the crash
Load the program under gdb:
gdb --args ./crash
On OS X you might use lldb:
lldb -- ./crash
From the gdb or lldb prompt, run the program by typing run: when the program crashes, we are back at the debugger's prompt.
Here is the sample output from lldb (gdb's output is very similar):
* thread #1: tid = 0x13c1d0, 0x0000000100000ee8 crash`crash_function() + 280 at crash.cxx:14, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7fff5fc00000)
frame #0: 0x0000000100000ee8 crash`crash_function() + 280 at crash.cxx:14
11 std::cout << test_string << std::endl;
12 for (unsigned int i=0; i<counter_max; i++) {
13 std::cout << "setting index " << i << " out of " << counter_max << ". string is: " << test_string << std::endl;
-> 14 many_floats[i] = 0.12345;
15 }
16 std::cout << another_number << std::endl;
17 }
The debugger is pointing us exactly to the line where the execution has stopped: since we have compiled the program with debug symbols, we have this precise reference and the debugger can even show us the code snippet.
To manually show the code snippet, type list (both gdb and lldb). By pressing Enter on the empty prompt you repeat the last list command, which advances in the code until the end of the file is reached.
To reset the code listing to the beginning, type list 1.
Print a backtrace by typing bt:
* thread #1: tid = 0x13d53e, 0x0000000100000ee8 crash`crash_function() + 280 at crash.cxx:14, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7fff5fc00000)
* frame #0: 0x0000000100000ee8 crash`crash_function() + 280 at crash.cxx:14
Print the list of local variables available in the current program context with frame variable (lldb; the gdb equivalent is info locals):
$> frame variable
(unsigned int) answer_to_everything = 42
(int) another_number = -42
(float [12]) many_floats = ([0] = 0.123450004, [1] = 0.123450004, [2] = 0.123450004, [3] = 0.123450004, [4] = 0.123450004, [5] = 0.123450004, [6] = 0.123450004, [7] = 0.123450004, [8] = 0.123450004, [9] = 0.123450004, [10] = 0.123450004, [11] = 0.123450004)
(int) counter_max = 10000
(const char *) test_string = 0x0000000100001f14 "oh no, not again!"
(unsigned int) i = 436
Print the value of a variable at this point in the program (both lldb and gdb):
$> print global_int
(int) $0 = 789
$> print test_string
(const char *) $1 = 0x0000000100001f14 "oh no, not again!"
We can manually set a "breakpoint" in the program to stop before executing a certain line of code, then we can advance manually from there.
Let's start by killing the current running process:
kill
Set a breakpoint at the entry of the function func1(). With lldb:
breakpoint set -b func1
With gdb:
break func1
Or, as an alternative, we can set the breakpoint to a certain line of code. With lldb:
$> breakpoint set -l 28
Breakpoint 1: where = crash`func1() + 15 at crash.cxx:28, address = 0x0000000100000f9f
With gdb:
$> break 28
Breakpoint 1 at 0x100000f9f: file crash.cxx, line 28.
Now run the program with run. It will stop where we told it to stop, even if it did not crash. Since the execution is stopped, we can do many things:
- examine the variables
- advance line by line
- resume the normal execution
- delete the breakpoint
For instance, we set the breakpoint just before the variable a_variable is incremented by one: let's print its value:
$> print a_variable
$0 = 456
Process the next line with next, or simply n, then examine the variable again:
$> n
$> print a_variable
$1 = 457
The variable was correctly incremented by one, as expected. To resume normal execution just type continue (our demo program will continue until the crash).
Current breakpoints can be listed. On lldb:
$> breakpoint list
Current breakpoints:
1: file = '/tmp/crash.cxx', line = 28, locations = 1, resolved = 1, hit count = 1
1.1: where = crash`func1() + 15 at crash.cxx:28, address = 0x0000000100000f9f, resolved, hit count = 1
On gdb:
$> info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y 0x0000000100000f9f in func1() at crash.cxx:28
breakpoint already hit 1 time
We can delete a breakpoint, or just disable it if we want it to have no effect for now but might restore it later. With lldb:
$> breakpoint delete 1
$> breakpoint disable 2
$> breakpoint enable 2
With gdb:
$> delete 1
$> disable 2
$> enable 2
We have now learned how to perform the basic operations with two different debuggers: stopping the program's execution and checking on the variable values.
Without a debugger those operations are commonly done by adding printouts, input requests, etc.: we now know how to obtain the same results without modifying our code.
Commands summary
gdb | lldb | description |
---|---|---|
run | run | start execution |
quit | quit | exit debugger |
break | breakpoint set [-l/-f...] | set a breakpoint |
delete | breakpoint delete | delete a breakpoint |
info breakpoints | breakpoint list | list current breakpoints |
enable | breakpoint enable | enable a breakpoint |
disable | breakpoint disable | disable a breakpoint |
continue | continue | resume a stopped execution |
kill | kill | terminate current execution |
list | list | show source code |
print | print | print value of a variable |
A complete list of gdb vs. lldb commands can be found in the official lldb documentation.