Cassandra 1.2 huge read latency
I'm working on a 4-node Cassandra 1.2.6 cluster with a single keyspace, a replication factor of 2 (originally 3, dropped to 2), and 10 or so column families. It is running on the Oracle 1.7 JVM. The workload has a mix of reads and writes, with roughly 2 to 3 times as many writes as reads.
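(For completeness, the drop from replication factor 3 to 2 was an ALTER KEYSPACE run through the Java driver along these lines. This is only a sketch; the keyspace name, contact point, and SimpleStrategy are placeholders of mine, not necessarily what the cluster actually uses.)

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class DropReplicationFactor {
        public static void main(String[] args) {
            // Placeholder contact point and keyspace name for this sketch.
            Cluster cluster = Cluster.builder().addContactPoint("100.69.176.51").build();
            Session session = cluster.connect();

            // Drop the keyspace's replication factor from 3 to 2.
            // SimpleStrategy is assumed here; a repair/cleanup is normally run afterwards.
            session.execute("ALTER KEYSPACE my_keyspace WITH replication = "
                    + "{ 'class' : 'SimpleStrategy', 'replication_factor' : 2 }");

            cluster.shutdown();
        }
    }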
Even under a small amount of load, I'm seeing very large read latencies and quite a few read timeouts (using the DataStax Java driver). Here is example output of nodetool cfstats for one of the column families:
  Column Family: user_scores
  SSTable count: 1
  SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
  Space used (live): 7539098
  Space used (total): 7549091
  Number of Keys (estimate): 42112
  Memtable Columns Count: 2267
  Memtable Data Size: 1048576
  Memtable Switch Count: 2
  Read Count: 2101
  **Read Latency: 272334.202 ms.**
  Write Count: 24947
  Write Latency: NaN ms.
  Pending Tasks: 0
  Bloom Filter False Positives: 0
  Bloom Filter False Ratio: 0.00000
  Bloom Filter Space Used: 55376
  Compacted row minimum size: 447
  Compacted row maximum size: 219342
  Compacted row mean size: 1051
As you can see, I tried using the leveled compaction strategy to try and improve the read latency, but as you can also see, the latency is still huge. I'm a bit stumped. I had a Cassandra 1.1.6 cluster working beautifully, but no luck so far with 1.2.
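(For reference, the switch to leveled compaction was an ALTER TABLE along these lines, shown here through the Java driver. Again just a sketch; the contact point and keyspace name are placeholders.)

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class SwitchToLeveledCompaction {
        public static void main(String[] args) {
            // Placeholder contact point and keyspace name for this sketch.
            Cluster cluster = Cluster.builder().addContactPoint("100.69.176.51").build();
            Session session = cluster.connect("my_keyspace");

            // Cassandra 1.2 CQL3 syntax for switching a column family to
            // the leveled compaction strategy.
            session.execute("ALTER TABLE user_scores WITH compaction = "
                    + "{ 'class' : 'LeveledCompactionStrategy' }");

            cluster.shutdown();
        }
    }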
The cluster is running on VMs with 4 CPUs and 7 GB of RAM each. The data drive is set up as a striped RAID across 4 disks. The machines don't seem to be IO bound.
I'm running a pretty vanilla configuration, mostly defaults.
I do see some strange CPU behavior, with the CPU spiking even under smaller load. I see compactions running, but they are niced, so I don't think they are the culprit.
I'm trying to figure out where to go next. Any help is appreciated!
[Update: rpc_timeout trace] Still playing with this. Here is an example trace. It looks like the merge step is taking way too long: the request doesn't complete until roughly 20 seconds (20,009,870 µs) after the "merging data from memtables and 1 sstables" step at 4,211 µs.
 activity | timestamp | source | source_elapsed
----------+-----------+--------+----------------
 execute_cql3_query | 04:57:18,882 | 100.69.176.51 | 0
 Parsing select * from user_scores where user_id='26257166' limit 10000; | 04:57:18,884 | 100.69.176.51 | 1981
 Peparing statement | 04:57:18,885 | 100.69.176.51 | 2997
 Executing single-partition query on user_scores | 04:57:18,885 | 100.69.176.51 | 3657
 Acquiring sstable references | 04:57:18,885 | 100.69.176.51 | 3724
 Merging memtable tombstones | 04:57:18,885 | 100.69.176.51 | 3779
 Key cache hit for sstable 32 | 04:57:18,886 | 100.69.176.51 | 3910
 Seeking to partition beginning in data file | 04:57:18,886 | 100.69.176.51 | 3930
 Merging data from memtables and 1 sstables | 04:57:18,886 | 100.69.176.51 | 4211
 Request complete | 04:57:38,891 | 100.69.176.51 | 20009870
Older traces below:
[Newer trace] After addressing the problem noted in the logs by rebuilding the cluster data repository, I still ran into the problem, although it took quite a bit longer. Here is a trace I grabbed when the cluster was in the bad state:
Tracing session: a6dbefc0-ea49-11e2-84bb-ef447a7d9a48

 activity | timestamp | source | source_elapsed
----------+-----------+--------+----------------
 execute_cql3_query | 16:48:02,755 | 100.69.196.124 | 0
 Parsing select * from user_scores limit 1; | 16:48:02,756 | 100.69.196.124 | 1774
 Peparing statement | 16:48:02,759 | 100.69.196.124 | 4006
 Determining replicas for query | 16:48:02,759 | 100.69.196.124 | 4286
 Enqueuing request to /100.69.176.51 | 16:48:02,763 | 100.69.196.124 | 8849
 Sending message to cdb002/100.69.176.51 | 16:48:02,764 | 100.69.196.124 | 9456
 Message received from /100.69.196.124 | 16:48:03,449 | 100.69.176.51 | 160
 Message received from /100.69.176.51 | 16:48:09,646 | 100.69.196.124 | 6891860
 Processing response from /100.69.176.51 | 16:48:09,647 | 100.69.196.124 | 6892426
 Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)] | 16:48:10,288 | 100.69.176.51 | 6838754
 Seeking to partition beginning in data file | 16:48:10,289 | 100.69.176.51 | 6839689
 Read 1 live and 0 tombstoned cells | 16:48:10,289 | 100.69.176.51 | 6839927
 Seeking to partition beginning in data file | 16:48:10,289 | 100.69.176.51 | 6839998
 Read 1 live and 0 tombstoned cells | 16:48:10,289 | 100.69.176.51 | 6840082
 Scanned 1 rows and matched 1 | 16:48:10,289 | 100.69.176.51 | 6840162
 Enqueuing response to /100.69.196.124 | 16:48:10,289 | 100.69.176.51 | 6840229
 Sending message to /100.69.196.124 | 16:48:10,299 | 100.69.176.51 | 6850072
 Request complete | 16:48:09,648 | 100.69.196.124 | 6893029
[Update] I should add that things work just dandy with a solo Cassandra instance on my MacBook Pro. AKA, it works on my machine... :)
[Update: trace data]
Here is some trace data, taken with the Java driver. The downside is that I can only trace queries that succeed. I make it through a total of 67 queries before every query starts timing out. The weird thing is that the traces themselves don't look bad. At query 68 I no longer get a response, and 2 of the servers are running hot.
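(This is roughly how the traces below were pulled through the Java driver; a sketch only, assuming the 1.0.x driver API, with the contact point, keyspace, and query as placeholders.)

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.QueryTrace;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class TraceReads {
        public static void main(String[] args) {
            // Placeholder contact point and keyspace name for this sketch.
            Cluster cluster = Cluster.builder().addContactPoint("100.69.176.51").build();
            Session session = cluster.connect("my_keyspace");

            // Enable tracing on the statement so the server records a trace session.
            SimpleStatement stmt = new SimpleStatement(
                    "SELECT * FROM user_scores WHERE user_id = '26257166' LIMIT 10000");
            stmt.enableTracing();

            ResultSet rs = session.execute(stmt);
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();

            // Only successful queries get this far, so timed-out reads can't be traced this way.
            System.out.println("trace id: " + trace.getTraceId());
            for (QueryTrace.Event e : trace.getEvents()) {
                System.out.printf("%s | %s | %d%n",
                        e.getDescription(), e.getSource(), e.getSourceElapsedMicros());
            }

            cluster.shutdown();
        }
    }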
2013-07-11 02:15:45 stdio [info] ***************************************
66:host (queried): cdb003/100.69.198.47
66:host (tried): cdb003/100.69.198.47
66:trace id: c95e51c0-e9cf-11e2-b9a9-5b3c0946787b
66:-----------------------------------------------------+--------------+-----------------+--------------
66: Enqueuing data request to /100.69.176.51 | 02:15:42.045 | /100.69.198.47 | 200
66: Enqueuing digest request to /100.69.176.51 | 02:15:42.045 | /100.69.198.47 | 265
66: Sending message to /100.69.196.124 | 02:15:42.045 | /100.69.198.47 | 570
66: Sending message to /100.69.176.51 | 02:15:42.045 | /100.69.198.47 | 574
66: Message received from /100.69.176.51 | 02:15:42.107 | /100.69.198.47 | 62492
66: Processing response from /100.69.176.51 | 02:15:42.107 | /100.69.198.47 | 62780
66: Message received from /100.69.198.47 | 02:15:42.508 | /100.69.196.124 | 31
66: Executing single-partition query on user_scores | 02:15:42.508 | /100.69.196.124 | 406
66: Acquiring sstable references | 02:15:42.508 | /100.69.196.124 | 473
66: Merging memtable tombstones | 02:15:42.508 | /100.69.196.124 | 577
66: Key cache hit for sstable 11 | 02:15:42.508 | /100.69.196.124 | 807
66: Seeking to partition beginning in data file | 02:15:42.508 | /100.69.196.124 | 849
66: Merging data from memtables and 1 sstables | 02:15:42.509 | /100.69.196.124 | 1500
66: Message received from /100.69.198.47 | 02:15:43.379 | /100.69.176.51 | 60
66: Executing single-partition query on user_scores | 02:15:43.379 | /100.69.176.51 | 399
66: Acquiring sstable references | 02:15:43.379 | /100.69.176.51 | 490
66: Merging memtable tombstones | 02:15:43.379 | /100.69.176.51 | 593
66: Key cache hit for sstable 7 | 02:15:43.380 | /100.69.176.51 | 1098
66: Seeking to partition beginning in data file | 02:15:43.380 | /100.69.176.51 | 1141
66: Merging data from memtables and 1 sstables | 02:15:43.380 | /100.69.176.51 | 1912
66: Read 1 live and 0 tombstoned cells | 02:15:43.438 | /100.69.176.51 | 59094
66: Enqueuing response to /100.69.198.47 | 02:15:43.438 | /100.69.176.51 | 59225
66: Sending message to /100.69.198.47 | 02:15:43.438 | /100.69.176.51 | 59373
66:started at: 02:15:42.044
66:elapsed time in micros: 63105

2013-07-11 02:15:45 stdio [info] ***************************************
67:host (queried): cdb004/100.69.184.134
67:host (tried): cdb004/100.69.184.134
67:trace id: c9f365d0-e9cf-11e2-a4e5-7f3170333ff5
67:-----------------------------------------------------+--------------+-----------------+--------------
67: Message received from /100.69.184.134 | 02:15:42.536 | /100.69.198.47 | 36
67: Executing single-partition query on user_scores | 02:15:42.536 | /100.69.198.47 | 273
67: Acquiring sstable references | 02:15:42.536 | /100.69.198.47 | 311
67: Merging memtable tombstones | 02:15:42.536 | /100.69.198.47 | 353
67: Key cache hit for sstable 8 | 02:15:42.536 | /100.69.198.47 | 436
67: Seeking to partition beginning in data file | 02:15:42.536 | /100.69.198.47 | 455
67: Merging data from memtables and 1 sstables | 02:15:42.537 | /100.69.198.47 | 811
67: Read 1 live and 0 tombstoned cells | 02:15:42.550 | /100.69.198.47 | 14242
67: Enqueuing response to /100.69.184.134 | 02:15:42.550 | /100.69.198.47 | 14456
67: Sending message to /100.69.184.134 | 02:15:42.551 | /100.69.198.47 | 14694
67: Enqueuing data request to /100.69.198.47 | 02:15:43.021 | /100.69.184.134 | 323
67: Sending message to /100.69.198.47 | 02:15:43.021 | /100.69.184.134 | 565
67: Message received from /100.69.198.47 | 02:15:43.038 | /100.69.184.134 | 17029
67: Processing response from /100.69.198.47 | 02:15:43.038 | /100.69.184.134 | 17230
67:started at: 02:15:43.021
67:elapsed time in micros: 17622
And here is a trace using cqlsh:
Tracing session: d0f845d0-e9cf-11e2-8882-ef447a7d9a48

 activity | timestamp | source | source_elapsed
----------+-----------+--------+----------------
 execute_cql3_query | 19:15:54,833 | 100.69.196.124 | 0
 Parsing select * from user_scores where user_id='39333433' limit 10000; | 19:15:54,833 | 100.69.196.124 | 103
 Peparing statement | 19:15:54,833 | 100.69.196.124 | 455
 Executing single-partition query on user_scores | 19:15:54,834 | 100.69.196.124 | 1400
 Acquiring sstable references | 19:15:54,834 | 100.69.196.124 | 1468
 Merging memtable tombstones | 19:15:54,835 | 100.69.196.124 | 1575
 Key cache hit for sstable 11 | 19:15:54,835 | 100.69.196.124 | 1780
 Seeking to partition beginning in data file | 19:15:54,835 | 100.69.196.124 | 1822
 Merging data from memtables and 1 sstables | 19:15:54,836 | 100.69.196.124 | 2562
 Read 1 live and 0 tombstoned cells | 19:15:54,838 | 100.69.196.124 | 4808
 Request complete | 19:15:54,838 | 100.69.196.124 | 5810
The trace seems to show that most of the time is spent doing or waiting on network operations. Perhaps your network has problems?
If only some operations fail, perhaps you have a problem with one of the nodes. When that node is not needed things work, but when it is needed things go badly. It might be worth looking at the log files on the other nodes.