Skip to content

[CRASH] Server with Flash storage crashes after joining the cluster. #574

@skolosov-snap

Description

@skolosov-snap

Crash report

Paste the complete crash log between the quotes below. Please include a few lines from the log preceding the crash report to provide some context.

38:79:S 17 Feb 2023 23:37:20.847 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
38:79:S 17 Feb 2023 23:37:20.847 * Connecting to MASTER 10.110.44.242:17379
38:79:S 17 Feb 2023 23:37:20.847 * MASTER <-> REPLICA sync started
38:79:S 17 Feb 2023 23:37:20.849 * Non blocking connect for SYNC fired the event.
38:79:S 17 Feb 2023 23:37:20.850 * Master replied to PING, replication can continue...
38:79:S 17 Feb 2023 23:37:20.850 * Trying a partial resynchronization (request a46dad8e4ed4d1226c831446287fa735e1e9e010:1).
38:79:S 17 Feb 2023 23:37:20.894 * Full resync from master: a2478128fd4716e7878614a2df4bc1d6a0fc5631:48602803
38:79:S 17 Feb 2023 23:37:20.894 * Discarding previously cached master state.
38:79:S 17 Feb 2023 23:37:20.894 * MASTER <-> REPLICA sync: Flushing old data
38:79:S 17 Feb 2023 23:37:21.092 * Fast sync complete
38:79:S 17 Feb 2023 23:37:21.092 * MASTER <-> REPLICA sync: Finished with success


=== KEYDB BUG REPORT START: Cut & paste starting from here ===
38:79:S 17 Feb 2023 23:37:21.482 # === ASSERTION FAILED ===
38:79:S 17 Feb 2023 23:37:21.482 # ==> db.cpp:2499 'fModified || g_pserver->db[0]->snapshot_depth() > 0' is not true

------ STACK TRACE ------

Backtrace:
keydb-server 0.0.0.0:17379 [cluster](slotToKeyUpdateKeyCore(char const*, unsigned long, int)+0x1d5) [0x55b4ed2649d5]
keydb-server 0.0.0.0:17379 [cluster](redisDbPersistentData::syncDelete(redisObject*)+0x1cf) [0x55b4ed26962f]
keydb-server 0.0.0.0:17379 [cluster](delGenericCommand(client*, int)+0x8e) [0x55b4ed26adde]
keydb-server 0.0.0.0:17379 [cluster](call(client*, int)+0xae) [0x55b4ed296bce]
keydb-server 0.0.0.0:17379 [cluster](processCommand(client*, int)+0x70c) [0x55b4ed297b0c]
keydb-server 0.0.0.0:17379 [cluster](processCommandAndResetClient(client*, int)+0x6f) [0x55b4ed27ab5f]
keydb-server 0.0.0.0:17379 [cluster](processInputBuffer(client*, bool, int)+0x172) [0x55b4ed27ef52]
keydb-server 0.0.0.0:17379 [cluster](processClients()+0x118) [0x55b4ed27f138]
keydb-server 0.0.0.0:17379 [cluster](+0x25b31e) [0x55b4ed28f31e]
keydb-server 0.0.0.0:17379 [cluster](beforeSleep(aeEventLoop*)+0x15a) [0x55b4ed298a1a]
keydb-server 0.0.0.0:17379 [cluster](aeProcessEvents+0x3d0) [0x55b4ed2a4030]
keydb-server 0.0.0.0:17379 [cluster](aeMain+0x3a) [0x55b4ed2a77da]
keydb-server 0.0.0.0:17379 [cluster](workerThreadMain(void*)+0x88) [0x55b4ed28fd98]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7f8255346609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7f8255269133]

------ INFO OUTPUT ------
# Server
redis_version:255.255.255
redis_git_sha1:687850a6
redis_git_dirty:1
redis_build_id:b535e138518f077a
redis_mode:cluster
os:Linux 5.10.147+ x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:9.4.0
process_id:38
process_supervised:no
run_id:d0fd37f19ab4db4055938f6a559205812a27301f
tcp_port:17379
server_time_usec:1676677041484183
uptime_in_seconds:18
uptime_in_days:0
hz:40
configured_hz:40
lru_clock:15732657
executable:/data/keydb-server
config_file:/redis-conf/redis.conf

# Clients
connected_clients:5
cluster_connections:22
maxclients:60000
client_recent_max_input_buffer:16084
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
current_client_thread:1
thread_0_clients:1
thread_1_clients:3
thread_2_clients:0
thread_3_clients:0
thread_4_clients:0
thread_5_clients:1

# Memory
used_memory:2184921720
used_memory_human:2.03G
used_memory_rss:90320896
used_memory_rss_human:86.14M
used_memory_peak:2184921720
used_memory_peak_human:2.03G
used_memory_peak_perc:100.00%
used_memory_overhead:2174948488
used_memory_startup:25837864
used_memory_dataset:9973232
used_memory_dataset_perc:0.46%
allocator_allocated:2187168784
allocator_active:2189295616
allocator_resident:2238046208
total_system_memory:33671958528
total_system_memory_human:31.36G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:9875228262
maxmemory_human:9.20G
maxmemory_policy:allkeys-lfu
allocator_frag_ratio:1.00
allocator_frag_bytes:2126832
allocator_rss_ratio:1.02
allocator_rss_bytes:48750592
rss_overhead_ratio:0.04
rss_overhead_bytes:-2147725312
mem_fragmentation_ratio:0.04
mem_fragmentation_bytes:-2094291264
mem_not_counted_for_evict:2000000000
mem_replication_backlog:2147483648
mem_clients_slaves:0
mem_clients_normal:102552
mem_aof_buffer:0
mem_allocator:jemalloc-5.2.1
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
storage_provider:flash
flash_memory:0

# Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:919
rdb_bgsave_in_progress:0
rdb_last_save_time:1676677023
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:15
total_commands_processed:945
instantaneous_ops_per_sec:450
total_net_input_bytes:1153890
total_net_output_bytes:21603
instantaneous_input_kbps:547.83
instantaneous_output_kbps:5.54
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:9
dump_payload_sanitizations:0
total_reads_processed:133
total_writes_processed:18
instantaneous_lock_contention:2
avg_lock_contention:0.140625
storage_provider_read_hits:12
storage_provider_read_misses:1801

# Replication
role:slave
master_global_link_status:up
connected_masters:1
master_host:10.110.44.242
master_port:17379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:49755376
slave_repl_offset:49755289
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:a2478128fd4716e7878614a2df4bc1d6a0fc5631
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:49755289
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:2000000000
repl_backlog_first_byte_offset:48602804
repl_backlog_histlen:1152486

# CPU
used_cpu_sys:0.205142
used_cpu_user:0.505217
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
server_threads:6
long_lock_waits:0
used_cpu_sys_main_thread:0.053791
used_cpu_user_main_thread:0.205014

# Modules
module:name=statsd,ver=1,api=1,filters=0,usedby=[],using=[],options=[]

# Commandstats
cmdstat_set:calls=19,usec=3405,usec_per_call=179.21,rejected_calls=0,failed_calls=0
cmdstat_readonly:calls=4,usec=5,usec_per_call=1.25,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_cluster:calls=10,usec=783,usec_per_call=78.30,rejected_calls=0,failed_calls=0
cmdstat_select:calls=1,usec=1,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_info:calls=5,usec=204,usec_per_call=40.80,rejected_calls=0,failed_calls=0
cmdstat_client:calls=1,usec=2,usec_per_call=2.00,rejected_calls=0,failed_calls=0
cmdstat_restore-asking:calls=900,usec=33867,usec_per_call=37.63,rejected_calls=0,failed_calls=0
cmdstat_get:calls=0,usec=0,usec_per_call=0.00,rejected_calls=9,failed_calls=0
cmdstat_config:calls=2,usec=135,usec_per_call=67.50,rejected_calls=0,failed_calls=0
cmdstat_dbsize:calls=2,usec=1,usec_per_call=0.50,rejected_calls=0,failed_calls=0

# Errorstats
errorstat_CLUSTERDOWN:count=1
errorstat_MOVED:count=8

# Cluster
cluster_enabled:1

# Keyspace
db0:keys=37256,expires=912,avg_ttl=33322509,cached_keys=912

# KeyDB
mvcc_depth:0

------ CLIENT LIST OUTPUT ------
id=16 addr=10.183.0.45:56036 laddr=10.110.44.240:17379 fd=80 name= age=13 idle=13 flags=r db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=get user=default redir=-1
id=17 addr=10.183.0.45:56038 laddr=10.110.44.240:17379 fd=81 name= age=12 idle=12 flags=r db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=get user=default redir=-1
id=18 addr=10.183.0.45:56044 laddr=10.110.44.240:17379 fd=82 name= age=10 idle=10 flags=r db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=get user=default redir=-1
id=23 addr=10.136.152.37:57378 laddr=10.110.44.240:17379 fd=83 name= age=1 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20512 events=r cmd=cluster user=default redir=-1
id=25 addr=10.110.44.242:17379 laddr=10.110.44.240:36741 fd=84 name= age=0 idle=0 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=70 obl=0 oll=0 omem=0 tot-mem=61526 events=r cmd=del user=(superuser) redir=-1

------ CURRENT CLIENT INFO ------
id=25 addr=10.110.44.242:17379 laddr=10.110.44.240:36741 fd=84 name= age=0 idle=0 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=40954 argv-mem=70 obl=0 oll=0 omem=0 tot-mem=61526 events=r cmd=del user=(superuser) redir=-1
argv[0]: 'DEL'
argv[1]: 'test'

------ MODULES INFO OUTPUT ------

------ FAST MEMORY TEST ------
38:79:S 17 Feb 2023 23:37:21.484 # main thread terminated
38:79:S 17 Feb 2023 23:37:21.484 # Bio thread for job type #0 terminated
38:79:S 17 Feb 2023 23:37:21.484 # Bio thread for job type #1 terminated
38:79:S 17 Feb 2023 23:37:21.484 # Bio thread for job type #2 terminated

Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

=== KEYDB BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           https://github.com/JohnSully/KeyDB/issues

  Suspect RAM error? Use keydb-server --test-memory to verify it.


Aditional information

  1. OS distribution and version
  2. Steps to reproduce (if any)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions