More info on the crash "glibc detected"

JPB
Voice
Posts: 2
Joined: Mon Jul 04, 2011 1:41 pm

Post by JPB »

Folks -

Here's more info on the crash in Eggdrop. What you are seeing is 'glibc detected', meaning that the C library you use is detecting a memory corruption issue. This is a good thing, because memory should not be corrupted.

I have a system built from scratch, and I can recreate the crashing bots at will by using Tcl/Tk 8.5.10. If I back out to Tcl/Tk 8.5.9, everything works fine and dandy. It takes me only minutes to swap back and forth, compiling and installing Tcl/Tk, then rebuilding Eggdrop. I do know that my header files are updating properly, etc. There appears to be some sort of issue between Eggdrop and the latest Tcl; what it is, I do not know.

But this is why some people see a problem, and some don't. Many haven't upgraded to Tcl 8.5.10 yet.
nml375
Revered One
Posts: 2860
Joined: Fri Aug 04, 2006 2:09 pm

Post by nml375 »

Hi JPB,
Could you try to get a coredump and do a backtrace on the crash?
See whether the crash occurs within the add_builtins function in tclhash.c. If it does, something must have broken in the Tcl_ScanElement/Tcl_ConvertElement function pair, which seems to have been heavily rewritten in 8.5.10.
NML_375
nml375
Revered One
Posts: 2860
Joined: Fri Aug 04, 2006 2:09 pm

Post by nml375 »

One more thing,
Could you try editing the add_builtins function (tclhash.c) as shown below, and see if that sorts out the issue with Tcl 8.5.10?

void add_builtins(tcl_bind_list_t *tl, cmd_t *cc)
{
  int k, i;
  char p[1024], *l;
  cd_tcl_cmd table[2];

  table[0].name = p;
  table[0].callback = tl->func;
  table[1].name = NULL;
  for (i = 0; cc[i].name; i++) {
    egg_snprintf(p, sizeof p, "*%s:%s", tl->name,
                 cc[i].funcname ? cc[i].funcname : cc[i].name);
    k = TCL_DONT_USE_BRACES;
    l = nmalloc(Tcl_ScanElement(p, &k));
    Tcl_ConvertElement(p, l, k | TCL_DONT_USE_BRACES);
    table[0].cdata = (void *) cc[i].func;
    add_cd_tcl_cmds(table);
    bind_bind_entry(tl, cc[i].flags, cc[i].name, l);
    nfree(l);
  }
}
JPB
Voice
Posts: 2
Joined: Mon Jul 04, 2011 1:41 pm

I had the same stack trace....

Post by JPB »

in tclHash as the other users have reported.

I tried your change; it did not help. It still crashes in the call to nfree in add_builtins, which you already know about.

Want any more data?
nml375
Revered One
Posts: 2860
Joined: Fri Aug 04, 2006 2:09 pm

Post by nml375 »

Unfortunately, I still can't reproduce this with tcl8.5.10.
Could you once again modify the add_builtins function as below, and then post the added debug output here?

void add_builtins(tcl_bind_list_t *tl, cmd_t *cc)
{
  int k, i, size;
  char p[1024], *l;
  cd_tcl_cmd table[2];

  table[0].name = p;
  table[0].callback = tl->func;
  table[1].name = NULL;
  for (i = 0; cc[i].name; i++) {
    egg_snprintf(p, sizeof p, "*%s:%s", tl->name,
                 cc[i].funcname ? cc[i].funcname : cc[i].name);
    size = Tcl_ScanElement(p, &k);
    putlog(LOG_MISC, "*", "Allocating %u bytes for builtin \"%s\", flags: %u", size, p, k);
    l = nmalloc(size);
    Tcl_ConvertElement(p, l, k | TCL_DONT_USE_BRACES);
    table[0].cdata = (void *) cc[i].func;
    add_cd_tcl_cmds(table);
    bind_bind_entry(tl, cc[i].flags, cc[i].name, l);
    nfree(l);
  }
}
thommey
Halfop
Posts: 76
Joined: Tue Apr 01, 2008 2:59 pm

Post by thommey »

Hey,

I tracked down the bug. It is caused by a behavioural change between Tcl 8.5.9 and Tcl 8.5.10.
(If you care for the details: Tcl_ScanElement used to overestimate the required space. It was rewritten and apparently no longer always does so. The issue is whether the terminating '\0' of the string is included in the estimate: Eggdrop's code assumes it is, while the actual return values of Tcl_ScanElement indicate otherwise.)
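To illustrate the off-by-one described above, here is a minimal sketch in plain C. It does not call Tcl at all: scan_len, convert and convert_alloc are hypothetical stand-ins invented for this example, and they only assume what is stated above, namely that the "scan" step reports a length that excludes the terminating '\0' while the "convert" step still writes one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a "scan" step (NOT the Tcl API): reports how many
 * characters the converted string needs, not counting the terminating '\0'. */
size_t scan_len(const char *src)
{
  return strlen(src);
}

/* Hypothetical stand-in for a "convert" step (NOT the Tcl API): copies src
 * into dst and appends '\0'. Returns the characters written, excluding '\0'. */
size_t convert(const char *src, char *dst)
{
  size_t n = strlen(src);

  memcpy(dst, src, n);
  dst[n] = '\0';                /* this byte is what an exact-size buffer lacks */
  return n;
}

/* Allocates room for the scanned length plus one byte for the terminator.
 * With malloc(need) instead, convert() would write one byte past the end of
 * the buffer, which is the kind of heap corruption glibc reports on free.
 * The caller owns the returned buffer and must free() it. */
char *convert_alloc(const char *src)
{
  size_t need = scan_len(src);
  char *dst = malloc(need + 1); /* +1 for the terminating '\0' */
  size_t written;

  if (dst == NULL)
    return NULL;
  written = convert(src, dst);
  assert(written <= need);      /* payload fits; the terminator uses the +1 */
  return dst;
}
```

A caller would use it as `char *l = convert_alloc("*dcc:help");` followed by `free(l);`. The string "*dcc:help" is only an illustrative name in the "*%s:%s" shape that add_builtins builds.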

Here's a patch (patch -p1 < this.patch) to fix the issue:

diff -urN eggdrop1.6.20/src/tclhash.c eggdrop1.6.20.fix/src/tclhash.c
--- eggdrop1.6.20/src/tclhash.c	2010-06-29 17:52:24.000000000 +0200
+++ eggdrop1.6.20.fix/src/tclhash.c	2011-07-08 23:45:37.000000000 +0200
@@ -1264,7 +1264,7 @@
   for (i = 0; cc[i].name; i++) {
     egg_snprintf(p, sizeof p, "*%s:%s", tl->name,
                  cc[i].funcname ? cc[i].funcname : cc[i].name);
-    l = nmalloc(Tcl_ScanElement(p, &k));
+    l = nmalloc(Tcl_ScanElement(p, &k)+1);
     Tcl_ConvertElement(p, l, k | TCL_DONT_USE_BRACES);
     table[0].cdata = (void *) cc[i].func;
     add_cd_tcl_cmds(table);
@@ -1282,7 +1282,7 @@
   for (i = 0; cc[i].name; i++) {
     egg_snprintf(p, sizeof p, "*%s:%s", table->name,
                  cc[i].funcname ? cc[i].funcname : cc[i].name);
-    l = nmalloc(Tcl_ScanElement(p, &k));
+    l = nmalloc(Tcl_ScanElement(p, &k)+1);
     Tcl_ConvertElement(p, l, k | TCL_DONT_USE_BRACES);
     Tcl_DeleteCommand(interp, p);
     unbind_bind_entry(table, cc[i].flags, cc[i].name, l);

This has been fixed in Eggdrop 1.6.21; please upgrade instead of patching.
Last edited by thommey on Mon Nov 07, 2011 8:20 pm, edited 1 time in total.
LadyCuddles
Voice
Posts: 3
Joined: Tue Jul 12, 2011 5:50 pm
Location: SLC, Utah, USA

Post by LadyCuddles »

Can someone post the deb package with the patch already in it? That way those of us who prefer not to make/make install, and download the -dev packages, can just get the updated bot as we usually do...

Thanks for any, and all, help :)
thommey
Halfop
Posts: 76
Joined: Tue Apr 01, 2008 2:59 pm

Post by thommey »

Both Debian (unstable) and Ubuntu (oneiric) are still serving an Eggdrop version that was superseded by a new release about a year ago. The packages seem to be unmaintained, or at least the maintainers are not interested in keeping them up to date. Are there unofficial sources of .deb packages for the latest stable Eggdrop release (1.6.20)? Otherwise you'd just be patching an old Eggdrop version, and you would probably have to keep doing so for bugs that are already fixed in Eggdrop releases. Maybe a better place to ask for an updated and/or patched version is the Debian bug tracker? (There is a higher chance of success and a greater benefit, since everyone using their package will hit this bug with the new Tcl release.)
LadyCuddles
Voice
Posts: 3
Joined: Tue Jul 12, 2011 5:50 pm
Location: SLC, Utah, USA

Post by LadyCuddles »

thommey, I don't profess to be a guru when it comes to compiling, nor a C coder, nor, for that matter, a master of deciphering diff output. But your fix is made in the rem_ routine and not the add_ routine, am I correct? And from what I can tell, the only change is made in the rem_ routine, by adding the "+1", right?

I am trying out going for the source tarball, since as you said, the deb packages are almost a full minor version behind.
thommey
Halfop
Posts: 76
Joined: Tue Apr 01, 2008 2:59 pm

Post by thommey »

The fix has to be applied to both functions, add_builtins and rem_builtins. The full diff, with enough context to see the function names, is visible here (it is against eggdrop1.8cvs, so don't worry about slight differences and line numbers being off):

http://cvs.eggheads.org/viewvc/eggdrop1 ... f_format=l

The fix is indeed just adding +1 in those two spots.
fatalerror
Voice
Posts: 1
Joined: Sun Jul 24, 2011 10:54 am

Debian packages

Post by fatalerror »

Hi there!

I am the Debian developer in charge of the eggdrop Debian package. I'm terribly sorry it took me so long to package eggdrop 1.6.20 and to notice this bug in particular. Please do not take this as lack of interest; life hasn't been easy for the last couple of years, but I intend to keep closer contact from now on.

The x86 .deb files for eggdrop 1.6.20 can already be downloaded from http://people.debian.org/~gpastore

These packages have just been uploaded to the Debian Archive and should land on unstable/sid shortly. They've also been uploaded with the urgency attribute set to 'high', so that the fix reaches testing/wheezy soon enough.
Rynet
Voice
Posts: 4
Joined: Tue Jun 12, 2007 3:24 pm

Post by Rynet »

Run "export MALLOC_CHECK_=4" and it will work.
nml375
Revered One
Posts: 2860
Joined: Fri Aug 04, 2006 2:09 pm

Post by nml375 »

Actually, that's just sweeping the problem under the rug, and hoping things won't break later on. The bug is well known, and it has been patched/fixed.
Telling malloc/free to ignore the issue will cause crashes further down the execution on certain system setups (see http://forum.egghelp.org/viewtopic.php?t=18528#97131).

Just to emphasize what thommey already pointed out: the Eggdrop package provided by Ubuntu is still 1.6.19 (I haven't checked the status of Debian, though), and there have been quite a few other bugfixes since then. If a patched 1.6.20 package is not available for your distribution, you'd almost always be better off compiling the bot yourself (with the patch applied).
neofutur
Voice
Posts: 6
Joined: Fri Oct 02, 2009 9:38 pm
Location: irc://chat.freenode.net#bitcoin-hosting

bugreport on gentoo

Post by neofutur »

I just hit this bug and filed a bug report in the Gentoo bug tracker:

eggdrop crash after upgrading to tcl-8.5.10-r1

1.6.21 is already available on Gentoo:
http://packages.gentoo.org/package/net-irc/eggdrop
but it is still marked as unstable.

If anyone hits the problem on Gentoo, there are two solutions:

* downgrade Tcl to dev-lang/tcl-8.5.9 for eggdrop-1.6.19
* unmask the "unstable" eggdrop-1.6.21

Feel free to post on the Gentoo bug report to have the package maintainers bump the stable version to eggdrop-1.6.21 ;)