Core Dump Bug In ART Index: ShannonBase Troubleshooting

Aug 12, 2025 by RICHARD

Bug Report: Core Dump in the ART Index

Hey guys,

We've got a bug report here about a core dump in ShannonBase's ART (Adaptive Radix Tree) index code. Let's dive into the details and see what's going on.

Search Before Asking

The reporter has already searched the existing issues and didn't find anything similar. That's a good first step!

Version

The reporter hit this bug on the latest version, so it has not already been fixed by an earlier release.

What's Wrong?

Here's the error message we're getting:

2025-08-12T13:40:28Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=346297f9fcbecd6ca0c8ce417322539c580833a0
Thread pointer: 0x7fcba4000fd0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fcc443a6fe0 thread_stack 0x100000
 #0 0x55e8aa11f250 _ZL18print_fatal_signaliP9siginfo_t
 #1 0x55e8aa11f74c _Z19handle_fatal_signaliP9siginfo_tPv
 #2 0x7fcc4e24532f <unknown>
 #3 0x7fcc4e29eb2c pthread_kill
 #4 0x7fcc4e24527d gsignal
 #5 0x7fcc4e2288fe abort
 #6 0x7fcc4e2297b5 <unknown>
 #7 0x7fcc4e2a8ff4 <unknown>
 #8 0x7fcc4e2ab329 <unknown>
 #9 0x7fcc4e2addad cfree
 #10 0x55e8abb8937c _ZN11ShannonBase4Imcs5Index3ART12Destroy_nodeEPNS2_8Art_nodeE
 #11 0x55e8abbc5b38 _ZNSt10_HashtableINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_St10unique_ptrIN11ShannonBase4Imcs5Index5IndexIhmEESt14default_deleteISD_EEESaISH_ENSt8__detail10_Select1stESt8equal_toIS5_ESt4hashIS5_ENSJ_18_Mod_range_hashingENSJ_20_Default_ranged_hashENSJ_20_Prime_rehash_policyENSJ_17_Hashtable_traitsILb1ELb0ELb1EEEE5clearEv
 #12 0x55e8abbc6227 _ZN11ShannonBase4Imcs5TableD0Ev
 #13 0x55e8abb7d893 _ZNSt10_HashtableINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_St10unique_ptrIN11ShannonBase4Imcs10RapidTableESt14default_deleteISB_EEESaISF_ENSt8__detail10_Select1stESt8equal_toIS5_ESt4hashIS5_ENSH_18_Mod_range_hashingENSH_20_Default_ranged_hashENSH_20_Prime_rehash_policyENSH_17_Hashtable_traitsILb1ELb0ELb1EEEE8_M_eraseESt17integral_constantIbLb1EERS7_.isra.0
 #14 0x55e8abb823c5 _ZN11ShannonBase4Imcs4Imcs13unload_innodbEPKNS_18Rapid_load_contextEPKcS6_b
 #15 0x55e8abb6c8cd _ZN11ShannonBase8ha_rapid12unload_tableEPKcS2_b
 #16 0x55e8aa02099c _ZL29secondary_engine_unload_tableP3THDPKcS2_RKN2dd5TableEb
 #17 0x55e8aa02ec34 _ZL15drop_base_tableP3THDRK15Drop_tables_ctxP9Table_refbPSt3setIP10handlertonSt4lessIS8_ESaIS8_EEP31Foreign_key_parents_invalidatorPSt6vectorIP10MDL_ticketSaISI_EEP8MEM_ROOT
 #18 0x55e8aa0347a4 _Z23mysql_rm_table_no_locksP3THDP9Table_refbbbPbPSt3setIP10handlertonSt4lessIS6_ESaIS6_EEP31Foreign_key_parents_invalidatorPSt6vectorIP10MDL_ticketSaISG_EE
 #19 0x55e8a9f1f209 _Z11mysql_rm_dbP3THDRK17MYSQL_LEX_CSTRINGb
 #20 0x55e8a9f77806 _Z21mysql_execute_commandP3THDb
 #21 0x55e8a9f7a522 _Z20dispatch_sql_commandP3THDP12Parser_state
 #22 0x55e8a9f7cdbd _Z16dispatch_commandP3THDPK8COM_DATA19enum_server_command
 #23 0x55e8a9f7da1d _Z10do_commandP3THD
 #24 0x55e8aa10e1cf handle_connection
 #25 0x55e8abf3c273 pfs_spawn_thread
 #26 0x7fcc4e29caa3 <unknown>
 #27 0x7fcc4e329c3b <unknown>
 #28 0xffffffffffffffff <unknown>

This looks like a classic core dump, guys. Signal 6 is SIGABRT: frames #3 to #5 (pthread_kill, gsignal, abort) show the process aborting itself, and frame #9 shows that the abort was raised from inside glibc's free() (listed as cfree). That pattern usually means the allocator detected heap corruption, a double free, or an invalid pointer and deliberately killed the process. The backtrace is super helpful because it shows the chain of calls that led there. Frame #10, which demangles to ShannonBase::Imcs::Index::ART::Destroy_node(ShannonBase::Imcs::Index::ART::Art_node*), is the last ShannonBase function before free(). This suggests a problem with how ART index nodes are being destroyed or deallocated.
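To read the frames without decoding the mangled names by hand, a few lines of C++ are enough. This is a minimal sketch assuming a GCC or Clang toolchain, where abi::__cxa_demangle from <cxxabi.h> implements the Itanium C++ ABI demangler; the helper name demangle is ours. Running c++filt on the same strings should give the same result.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <memory>
#include <string>

// __cxa_demangle returns a malloc'ed buffer, so release it with free().
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Demangle one symbol as it appears in the mysqld backtrace.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}
```

Feeding it frame #12's symbol, for instance, yields ShannonBase::Imcs::Table::~Table().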

ART indexes (Adaptive Radix Trees) are often used for efficient indexing in databases and other systems. They are tree-like data structures that locate data by walking the key one byte per level, using inner nodes whose size adapts to the number of children. However, like any complex pointer-based data structure, they need to be managed carefully. Double frees, frees of invalid pointers, use-after-free, and buffer overruns that corrupt allocator metadata can all end in an abort inside free() like this one; a plain memory leak, by contrast, would not crash here. The Destroy_node function is a critical part of this management, so it's a good place to focus our attention.
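For readers less familiar with ART teardown, here is a deliberately simplified sketch of a post-order destroy, in which children are freed before their parent and every node is freed exactly once. Node, NodeType, make_node, add_child, and destroy_node are hypothetical and are not ShannonBase's Art_node or Destroy_node; real ART implementations use Node4/Node16/Node48/Node256 inner nodes and separate leaves. The sketch relies on each node being reachable through exactly one child pointer. One way to break that invariant, offered only as a hypothesis: when an inner node grows (say from Node4 to Node16), its children are copied into the new node, and if the old node were then destroyed recursively rather than simply freed, those children would be released twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical, simplified node layout for illustration only.
enum class NodeType : std::uint8_t { Leaf, Inner };

struct Node {
    NodeType type;
    std::size_t num_children;  // used only by inner nodes
    Node* children[4];         // fixed fan-out stand-in for Node4..Node256
};

static std::size_t g_frees = 0;  // counts free() calls made by destroy_node

Node* make_node(NodeType t) {
    void* mem = std::malloc(sizeof(Node));
    if (mem == nullptr) throw std::bad_alloc();
    return new (mem) Node{t, 0, {}};  // children start as nullptr
}

void add_child(Node* parent, Node* child) {
    assert(parent->type == NodeType::Inner && parent->num_children < 4);
    parent->children[parent->num_children++] = child;
}

// Post-order: release every child subtree first, then the node itself.
void destroy_node(Node* n) {
    if (n == nullptr) return;
    if (n->type == NodeType::Inner) {
        for (std::size_t i = 0; i < n->num_children; ++i) {
            destroy_node(n->children[i]);
            n->children[i] = nullptr;
        }
    }
    std::free(n);  // Node is trivially destructible, so free() is enough
    ++g_frees;
}
```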

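Further down the stack trace, frame #12 (_ZN11ShannonBase4Imcs5TableD0Ev) is the deleting destructor ShannonBase::Imcs::Table::~Table(), and frame #11 is std::unordered_map::clear() on a map from std::string to std::unique_ptr<ShannonBase::Imcs::Index::Index<unsigned char, unsigned long>>, i.e. the table's collection of indexes. Frame #13 is std::unordered_map::erase() on a map from std::string to std::unique_ptr<ShannonBase::Imcs::RapidTable>, called from ShannonBase::Imcs::Imcs::unload_innodb (frame #14) and ShannonBase::ha_rapid::unload_table (frame #15). So the crash happens while a table is being unloaded from the Rapid secondary engine and its ART indexes are being torn down. Because a unique_ptr<RapidTable> ends up running Table's deleting destructor, Table is probably a subclass of RapidTable destroyed through a virtual destructor. Finally, frames #16 to #19 show that the unload was triggered by secondary_engine_unload_table from drop_base_table inside mysql_rm_db, which is the server's DROP DATABASE path.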
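The ownership chain implied by frames #11 to #14 can be modelled in a few lines. This sketch uses hypothetical stand-ins named after the demangled frames (Imcs, RapidTable, Table, Index); the members, the map key format, and the event log are our assumptions, not ShannonBase code. It shows how a single erase() on the table map cascades into the Table destructor, then into clearing the index map, and finally into index (and therefore ART node) destruction.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

static std::vector<std::string> g_events;  // records teardown order

struct Index {  // stand-in for Index<unsigned char, unsigned long>
    ~Index() { g_events.push_back("Index::~Index -> ART::Destroy_node"); }
};

struct RapidTable {  // base type held by the table map (frame #13)
    virtual ~RapidTable() = default;
};

struct Table : RapidTable {  // frame #12: Table's deleting destructor
    std::unordered_map<std::string, std::unique_ptr<Index>> indexes;
    ~Table() override {
        g_events.push_back("Table::~Table");
        // frame #11: unordered_map::clear(); the real code may reach clear()
        // through the map's own destructor instead of an explicit call.
        indexes.clear();
    }
};

struct Imcs {
    std::unordered_map<std::string, std::unique_ptr<RapidTable>> tables;
    void unload(const std::string& key) {  // frame #14, heavily simplified
        tables.erase(key);                 // frame #13: erase by key
    }
};
```

In the real code, the last step is where Destroy_node runs; any node still referenced elsewhere at that point is a candidate for the double free.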

Another interesting point is the call to cfree (frame #9 in the backtrace). In glibc, cfree is a legacy alias for free, so this frame is simply the free() call that releases a node's memory; C++ operator delete typically ends up in free() as well. If free is called on an invalid address (e.g., one that has already been freed or was never returned by malloc), glibc can detect it and abort, which produces exactly this signal 6. This strengthens the hypothesis that the crash is related to memory management in the ART index destruction path: a node may be freed twice, or a pointer may have been corrupted so that free receives a bad address. To get a clearer picture, we will need to examine the code for Destroy_node and the surrounding functions to understand how this memory is allocated and released.
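While hunting for the double free, a debug-only wrapper around malloc/free can turn a silent heap corruption into an immediate, attributable failure. The sketch below is a hypothetical helper (AllocTracker is not part of ShannonBase) that remembers which pointers it handed out and refuses to free anything else. AddressSanitizer does the same job far more thoroughly, so treat this as an illustration of the idea rather than a replacement.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <unordered_set>

// Debug-only allocation tracker (hypothetical helper).
class AllocTracker {
public:
    void* allocate(std::size_t n) {
        void* p = std::malloc(n);
        if (p == nullptr) throw std::bad_alloc();
        std::lock_guard<std::mutex> lock(mu_);
        live_.insert(p);
        return p;
    }

    // Returns false instead of calling free() on a pointer it never handed
    // out or already released: the cases in which glibc would abort.
    bool release(void* p) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            if (live_.erase(p) == 0) return false;  // double free or foreign pointer
        }
        std::free(p);
        return true;
    }

    std::size_t live_count() const {
        std::lock_guard<std::mutex> lock(mu_);
        return live_.size();
    }

private:
    mutable std::mutex mu_;
    std::unordered_set<void*> live_;
};
```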

Finally, the initial lines of the error message mention the possibility of malfunctioning hardware. While this is possible, it's less likely than a software bug, especially given a stack trace that points so clearly at a specific area of code. It's good practice to rule out software issues before considering hardware problems, but hardware is worth keeping in mind if the bug proves difficult to reproduce or if other symptoms suggest hardware instability.

In summary, the core dump appears to be caused by a memory management error in the ART index destruction code, hit while DROP DATABASE unloads a table from the Rapid engine. Further investigation should focus on Destroy_node and the code paths that destroy ART index nodes when tables are dropped or unloaded. We need to make sure that no node is freed more than once, that every pointer passed to free is valid, and that ownership of each node is clear.

How to Reproduce?

Unfortunately, the reporter didn't provide steps to reproduce the issue, which makes it harder to debug. Knowing the specific operations performed before the crash (e.g., creating tables, loading them into Rapid, inserting or deleting data, dropping tables or databases) would be extremely helpful. If the reporter can provide a minimal test case or a sequence of operations that reliably triggers the core dump, it will greatly speed up debugging.

Without specific steps, we can try to reproduce the issue by creating, using, and then dropping tables with ART indexes. This might include:

Creating a table with an ART index and loading it into the Rapid secondary engine.
Inserting and deleting data in the table.
Unloading the table.
Dropping the table, and dropping the whole database that contains it (the backtrace goes through mysql_rm_db, i.e. DROP DATABASE).
Performing these operations in a loop or under heavy load to try to expose race conditions or other timing-related issues.
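The scenarios above can be turned into a repeatable stress script. The C++ sketch below only generates SQL statements; it does not talk to the server. It assumes that the secondary engine is named Rapid (after the ha_rapid handler in the trace), that ShannonBase accepts MySQL's SECONDARY_ENGINE, SECONDARY_LOAD, and SECONDARY_UNLOAD syntax, and that loading this table builds ART indexes for its keys; the database names, schema, and row counts are arbitrary choices.

```cpp
#include <string>
#include <vector>

// Builds a SQL script that repeatedly loads, modifies, unloads, reloads, and
// finally drops a Rapid-backed table (hypothetical schema and names).
std::vector<std::string> build_repro_script(int iterations, int rows) {
    std::vector<std::string> sql;
    for (int i = 0; i < iterations; ++i) {
        const std::string db = "art_repro_" + std::to_string(i);
        const std::string t = db + ".t";
        sql.push_back("CREATE DATABASE " + db);
        sql.push_back("CREATE TABLE " + t +
                      " (id INT PRIMARY KEY, k VARCHAR(32), KEY idx_k (k))"
                      " SECONDARY_ENGINE=Rapid");
        for (int r = 0; r < rows; ++r) {
            sql.push_back("INSERT INTO " + t + " VALUES (" + std::to_string(r) +
                          ", 'key" + std::to_string(r) + "')");
        }
        sql.push_back("ALTER TABLE " + t + " SECONDARY_LOAD");
        sql.push_back("DELETE FROM " + t + " WHERE id % 2 = 0");
        sql.push_back("ALTER TABLE " + t + " SECONDARY_UNLOAD");
        sql.push_back("ALTER TABLE " + t + " SECONDARY_LOAD");
        // The backtrace goes through mysql_rm_db (DROP DATABASE), so each round
        // ends by dropping the whole database while the table is still loaded.
        sql.push_back("DROP DATABASE " + db);
    }
    return sql;
}
```

Joining the statements with ";\n" and piping the result into the mysql client against a debug or ASan build runs the sequence; raising iterations increases the chance of hitting a timing-dependent path.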

We can also try different configurations and data sets to see if the bug is triggered under specific circumstances. It is important to document all attempts to reproduce the bug, including the steps taken, the configuration used, and whether the crash occurred.

Are you willing to submit PR?

The reporter isn't sure if they can submit a PR. That's okay! We appreciate the bug report either way. If someone else can fix it, that's great. If the original reporter is willing to test the fix, that's also super helpful.

Next Steps

Okay, so what's next? Given the information we have, here's a plan:

Deep Dive into the Code: Let's examine the Destroy_node function and any code related to ART index destruction. We'll be looking for double frees, frees of pointers that were not heap-allocated, use-after-free, or other memory management issues.
Try to Reproduce: We need to figure out how to make this bug happen consistently. We'll try the scenarios mentioned above and see if we can trigger the core dump.
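Debugging: If the reporter still has the core file, loading it into gdb lets us inspect the node being freed. If we can reproduce the bug, an AddressSanitizer build (MySQL's CMake supports -DWITH_ASAN=ON, which ShannonBase presumably inherits) will report where a double-freed block was allocated and first freed. This will show exactly where the crash originates.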
Fix the Bug: Once we understand the root cause, we can implement a fix and test it thoroughly.

The ShannonBase::Imcs::Index::ART::Destroy_node Function: The backtrace pinpoints _ZN11ShannonBase4Imcs5Index3ART12Destroy_nodeEPNS2_8Art_nodeE, i.e. ShannonBase::Imcs::Index::ART::Destroy_node(Art_node*), as the last ShannonBase frame before free(). This function is responsible for destroying nodes within the ART index, so a deep dive into it and its call graph is essential. Are nodes being deallocated correctly? Could a node be freed twice, or could a pointer that was never heap-allocated reach free()? Understanding the algorithm and the specific implementation details is crucial here.
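If the double free comes from two teardown paths reaching the same root (for example, an explicit cleanup during unload followed by the destructor), the usual defensive pattern is an idempotent destroy that detaches the root before freeing it, plus a non-copyable owner so an accidental copy cannot free the same nodes a second time. The sketch is hypothetical: ArtTree, ArtNode, insert_front, and destroy_subtree are our names, and the "tree" is reduced to a single chain of nodes to keep it short.

```cpp
#include <cstddef>

struct ArtNode {
    ArtNode* child = nullptr;
};

static std::size_t g_deleted = 0;  // counts nodes actually deleted

// Frees a chain of nodes (a degenerate tree) iteratively.
void destroy_subtree(ArtNode* n) {
    while (n != nullptr) {
        ArtNode* next = n->child;
        delete n;
        ++g_deleted;
        n = next;
    }
}

class ArtTree {
public:
    ArtTree() = default;
    ArtTree(const ArtTree&) = delete;             // a copy would share root_
    ArtTree& operator=(const ArtTree&) = delete;  // and free it twice
    ~ArtTree() { destroy(); }

    void insert_front() {
        ArtNode* n = new ArtNode;
        n->child = root_;
        root_ = n;
    }

    // Safe to call more than once: root_ is detached before freeing, so a
    // later call (or the destructor) sees nullptr and frees nothing.
    void destroy() {
        ArtNode* r = root_;
        root_ = nullptr;
        destroy_subtree(r);
    }

private:
    ArtNode* root_ = nullptr;
};
```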

Hashtable Clear and Table Destruction: The stack trace also shows std::unordered_map::clear() (_ZNSt10_Hashtable...5clearEv) being called from the Table destructor (_ZN11ShannonBase4Imcs5TableD0Ev) to release the table's indexes. We need to investigate the order of operations during table destruction. Is the index being properly detached from the table before its nodes are destroyed? Could another thread still be using the index while it is torn down? The interactions between these components should be carefully examined to identify potential issues.
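On the race-condition question, one common way to make teardown safe is to detach the index from the lookup map under a lock and let shared ownership decide when it is actually freed. This is a minimal sketch under our own assumptions: IndexRegistry and ArtIndex are hypothetical, and according to frame #11 ShannonBase currently holds indexes in std::unique_ptr. It illustrates a design option, not a proposed patch.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

struct ArtIndex {  // hypothetical index; only its lifetime matters here
    explicit ArtIndex(bool* destroyed_flag) : destroyed(destroyed_flag) {}
    ~ArtIndex() { *destroyed = true; }
    bool* destroyed;
};

// Unloading removes the index from lookup immediately, but the index is
// freed only when the last holder (e.g. a running query) releases it.
class IndexRegistry {
public:
    void add(const std::string& name, std::shared_ptr<ArtIndex> idx) {
        std::lock_guard<std::mutex> lock(mu_);
        map_[name] = std::move(idx);
    }

    std::shared_ptr<ArtIndex> acquire(const std::string& name) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = map_.find(name);
        if (it == map_.end()) return nullptr;
        return it->second;
    }

    void unload(const std::string& name) {
        std::shared_ptr<ArtIndex> victim;
        {
            std::lock_guard<std::mutex> lock(mu_);
            auto it = map_.find(name);
            if (it == map_.end()) return;
            victim = std::move(it->second);  // detach under the lock
            map_.erase(it);
        }
        // victim is released here, outside the lock; the index is destroyed
        // now only if no reader still holds a reference to it.
    }

private:
    std::mutex mu_;
    std::unordered_map<std::string, std::shared_ptr<ArtIndex>> map_;
};
```

The trade-off is that memory is reclaimed slightly later and every reader pays for an atomic reference-count update; whether that is acceptable on ShannonBase's query path is an open question.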

RapidTable Unload and InnoDB Interaction: The presence of RapidTable and unload_innodb suggests that the bug is triggered when a table, most likely one whose primary engine is InnoDB, is unloaded from the Rapid in-memory store. We need to understand how RapidTable interacts with ART indexes and how it coordinates with InnoDB during unloading. Is any data shared between the two engines? If so, could that data be released or modified while the ART index still references it? Scenarios involving RapidTable unload and InnoDB interaction should be included when attempting to reproduce the bug.

The Role of cfree: The appearance of cfree in the stack trace is a strong indicator of a memory-related issue. It is glibc's alias for the standard C library's free function. If free is called with an invalid pointer (e.g., a pointer that has already been freed or one that was not returned by malloc), glibc may abort the process. It usually prints a diagnostic such as "free(): invalid pointer" or "free(): double free detected in tcache 2" to stderr just before the crash, so the reporter's error log may contain a line telling us which case we are in. Identifying the specific code that calls free and understanding the state of the memory being freed is critical for solving this bug.
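One concrete way to pass "a pointer that was not returned by malloc" to free() is mixing allocation strategies, for example with some nodes carved out of a bulk arena and others allocated individually. We don't know whether ShannonBase's ART does this; the sketch below (TaggedNode, Arena, heap_node, and release_node are all hypothetical) only shows the defensive pattern of recording each node's origin, so that teardown frees heap nodes alone and leaves arena nodes to be reclaimed with the arena.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

enum class Origin { Heap, Arena };

struct TaggedNode {
    Origin origin;  // where this node's memory came from
    int payload;
};

// Bulk storage: nodes taken from here are freed all at once with the arena.
class Arena {
public:
    explicit Arena(std::size_t capacity) : slots_(capacity) {}
    TaggedNode* take() {
        if (used_ == slots_.size()) return nullptr;
        TaggedNode* n = &slots_[used_++];
        n->origin = Origin::Arena;
        return n;
    }

private:
    std::vector<TaggedNode> slots_;  // never resized, so pointers stay valid
    std::size_t used_ = 0;
};

static std::size_t g_heap_frees = 0;  // counts free() calls by release_node

TaggedNode* heap_node() {
    void* mem = std::malloc(sizeof(TaggedNode));
    if (mem == nullptr) throw std::bad_alloc();
    return new (mem) TaggedNode{Origin::Heap, 0};
}

// Frees only what malloc handed out; passing an arena node to free() would
// be the "invalid pointer" case that makes glibc abort.
void release_node(TaggedNode* n) {
    if (n == nullptr || n->origin != Origin::Heap) return;
    std::free(n);
    ++g_heap_frees;
}
```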

Guys, let's get to the bottom of this! Any help is appreciated.