Cassandra 1.2 huge read latency -


i'm working on 4 node cassandra 1.2.6 cluster single keyspace, replication factor of 2 (3 originally, dropped 2) , 10 or column families. running oracle 1.7 jvm. has mix of reads , writes, 2 3 times many writes reads.

even under small amount of load, seeing large read latencies, , quite few read timeouts (using datastax java driver). here example output of nodetool cfstats 1 of column families:

    column family: user_scores     sstable count: 1     sstables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]     space used (live): 7539098     space used (total): 7549091     number of keys (estimate): 42112     memtable columns count: 2267     memtable data size: 1048576     memtable switch count: 2     read count: 2101     **read latency: 272334.202 ms.**     write count: 24947     write latency: nan ms.     pending tasks: 0     bloom filter false positives: 0     bloom filter false ratio: 0.00000     bloom filter space used: 55376     compacted row minimum size: 447     compacted row maximum size: 219342     compacted row mean size: 1051 

as can see, tried using level base compaction strategy try , improve read latency, can see latency huge. i'm bit stumped. had cassandra 1.1.6 cluster working beautifully, no luck far 1.2.

the cluster running on vm's 4 cpu's , 7 gb of ram. data drive setup striped raid across 4 disks. machine doesn't seem io bound.

i'm running pretty vanilla configuration, defaults.

i see strange cpu behavior cpu spiking under smaller load. see compactions running, niced don't think culprit.

i'm trying figure out go next. appreciated!

[update rpc_timeout trace] still playing this. here example trace. looks merge step taking way long.

 activity                                                                | timestamp    | source        | source_elapsed -------------------------------------------------------------------------+--------------+---------------+----------------                                                       execute_cql3_query | 04:57:18,882 | 100.69.176.51 |              0  parsing select * user_scores user_id='26257166' limit 10000; | 04:57:18,884 | 100.69.176.51 |           1981                                                       peparing statement | 04:57:18,885 | 100.69.176.51 |           2997                          executing single-partition query on user_scores | 04:57:18,885 | 100.69.176.51 |           3657                                             acquiring sstable references | 04:57:18,885 | 100.69.176.51 |           3724                                              merging memtable tombstones | 04:57:18,885 | 100.69.176.51 |           3779                                             key cache hit sstable 32 | 04:57:18,886 | 100.69.176.51 |           3910                              seeking partition beginning in data file | 04:57:18,886 | 100.69.176.51 |           3930                               merging data memtables , 1 sstables | 04:57:18,886 | 100.69.176.51 |           4211                                                         request complete | 04:57:38,891 | 100.69.176.51 |       20009870 

older traces below:

[newer trace] after addressing problem noted in logs rebuilding cluster data repository, still ran problem, although took quite bit longer. here trace grabbed when in bad state:

tracing session: a6dbefc0-ea49-11e2-84bb-ef447a7d9a48

 activity                                                                                        | timestamp    | source         | source_elapsed -------------------------------------------------------------------------------------------------+--------------+----------------+----------------                                                                               execute_cql3_query | 16:48:02,755 | 100.69.196.124 |              0                                                       parsing select * user_scores limit 1; | 16:48:02,756 | 100.69.196.124 |           1774                                                                               peparing statement | 16:48:02,759 | 100.69.196.124 |           4006                                                                    determining replicas query | 16:48:02,759 | 100.69.196.124 |           4286                                                              enqueuing request /100.69.176.51 | 16:48:02,763 | 100.69.196.124 |           8849                                                          sending message cdb002/100.69.176.51 | 16:48:02,764 | 100.69.196.124 |           9456                                                            message received /100.69.196.124 | 16:48:03,449 |  100.69.176.51 |            160                                                             message received /100.69.176.51 | 16:48:09,646 | 100.69.196.124 |        6891860                                                          processing response /100.69.176.51 | 16:48:09,647 | 100.69.196.124 |        6892426  executing seq scan across 1 sstables [min(-9223372036854775808), min(-9223372036854775808)] | 16:48:10,288 |  100.69.176.51 |        6838754                                                      seeking partition beginning in data file | 16:48:10,289 |  100.69.176.51 |        6839689                                                               read 1 live , 0 tombstoned cells | 16:48:10,289 |  100.69.176.51 |        6839927                                                      seeking partition beginning in data file | 16:48:10,289 |  100.69.176.51 |        6839998                                                               read 1 live , 0 tombstoned cells | 16:48:10,289 |  100.69.176.51 |        6840082                                                                     scanned 1 rows , matched 1 | 16:48:10,289 |  100.69.176.51 |        6840162                                                            enqueuing response /100.69.196.124 | 16:48:10,289 |  100.69.176.51 |        6840229                                                               sending message /100.69.196.124 | 16:48:10,299 |  100.69.176.51 |        6850072                                                                                 request complete | 16:48:09,648 | 100.69.196.124 |        6893029 

[update] should add things work dandy solo cassandra instance on macbook pro. aka works on machine...:)

[update trace data]

here trace data. java driver. downside can trace queries succeed. make total of 67 queries before every query starts timing out. weird doesn't bad. @ query 68, no longer response, , 2 of servers running hot.

2013-07-11 02:15:45 stdio [info] *************************************** 66:host (queried): cdb003/100.69.198.47 66:host (tried): cdb003/100.69.198.47 66:trace id: c95e51c0-e9cf-11e2-b9a9-5b3c0946787b  66:-----------------------------------------------------+--------------+-----------------+-------------- 66:            enqueuing data request /100.69.176.51 | 02:15:42.045 |  /100.69.198.47 |          200 66:          enqueuing digest request /100.69.176.51 | 02:15:42.045 |  /100.69.198.47 |          265 66:                  sending message /100.69.196.124 | 02:15:42.045 |  /100.69.198.47 |          570 66:                   sending message /100.69.176.51 | 02:15:42.045 |  /100.69.198.47 |          574 66:                message received /100.69.176.51 | 02:15:42.107 |  /100.69.198.47 |        62492 66:             processing response /100.69.176.51 | 02:15:42.107 |  /100.69.198.47 |        62780 66:                message received /100.69.198.47 | 02:15:42.508 | /100.69.196.124 |           31 66:     executing single-partition query on user_scores | 02:15:42.508 | /100.69.196.124 |          406 66:                        acquiring sstable references | 02:15:42.508 | /100.69.196.124 |          473 66:                         merging memtable tombstones | 02:15:42.508 | /100.69.196.124 |          577 66:                        key cache hit sstable 11 | 02:15:42.508 | /100.69.196.124 |          807 66:         seeking partition beginning in data file | 02:15:42.508 | /100.69.196.124 |          849 66:          merging data memtables , 1 sstables | 02:15:42.509 | /100.69.196.124 |         1500 66:                message received /100.69.198.47 | 02:15:43.379 |  /100.69.176.51 |           60 66:     executing single-partition query on user_scores | 02:15:43.379 |  /100.69.176.51 |          399 66:                        acquiring sstable references | 02:15:43.379 |  /100.69.176.51 |          490 66:                         merging memtable tombstones | 02:15:43.379 |  /100.69.176.51 |          593 66:                         key cache hit sstable 7 | 02:15:43.380 |  /100.69.176.51 |         1098 66:         seeking partition beginning in data file | 02:15:43.380 |  /100.69.176.51 |         1141 66:          merging data memtables , 1 sstables | 02:15:43.380 |  /100.69.176.51 |         1912 66:                  read 1 live , 0 tombstoned cells | 02:15:43.438 |  /100.69.176.51 |        59094 66:                enqueuing response /100.69.198.47 | 02:15:43.438 |  /100.69.176.51 |        59225 66:                   sending message /100.69.198.47 | 02:15:43.438 |  /100.69.176.51 |        59373 66:started at: 02:15:42.04466:elapsed time in micros: 63105 2013-07-11 02:15:45 stdio [info] *************************************** 67:host (queried): cdb004/100.69.184.134 67:host (tried): cdb004/100.69.184.134 67:trace id: c9f365d0-e9cf-11e2-a4e5-7f3170333ff5  67:-----------------------------------------------------+--------------+-----------------+-------------- 67:               message received /100.69.184.134 | 02:15:42.536 |  /100.69.198.47 |           36 67:     executing single-partition query on user_scores | 02:15:42.536 |  /100.69.198.47 |          273 67:                        acquiring sstable references | 02:15:42.536 |  /100.69.198.47 |          311 67:                         merging memtable tombstones | 02:15:42.536 |  /100.69.198.47 |          353 67:                         key cache hit sstable 8 | 02:15:42.536 |  /100.69.198.47 |          436 67:         seeking partition beginning in data file | 02:15:42.536 |  /100.69.198.47 |          455 67:          merging data memtables , 1 sstables | 02:15:42.537 |  /100.69.198.47 |          811 67:                  read 1 live , 0 tombstoned cells | 02:15:42.550 |  /100.69.198.47 |        14242 67:               enqueuing response /100.69.184.134 | 02:15:42.550 |  /100.69.198.47 |        14456 67:                  sending message /100.69.184.134 | 02:15:42.551 |  /100.69.198.47 |        14694 67:            enqueuing data request /100.69.198.47 | 02:15:43.021 | /100.69.184.134 |          323 67:                   sending message /100.69.198.47 | 02:15:43.021 | /100.69.184.134 |          565 67:                message received /100.69.198.47 | 02:15:43.038 | /100.69.184.134 |        17029 67:             processing response /100.69.198.47 | 02:15:43.038 | /100.69.184.134 |        17230 67:started at: 02:15:43.021 67:elapsed time in micros: 17622 

and here trace using cqlsh:

tracing session: d0f845d0-e9cf-11e2-8882-ef447a7d9a48   activity                                                                | timestamp    | source         | source_elapsed -------------------------------------------------------------------------+--------------+----------------+----------------                                                       execute_cql3_query | 19:15:54,833 | 100.69.196.124 |              0  parsing select * user_scores user_id='39333433' limit 10000; | 19:15:54,833 | 100.69.196.124 |            103                                                       peparing statement | 19:15:54,833 | 100.69.196.124 |            455                          executing single-partition query on user_scores | 19:15:54,834 | 100.69.196.124 |           1400                                             acquiring sstable references | 19:15:54,834 | 100.69.196.124 |           1468                                              merging memtable tombstones | 19:15:54,835 | 100.69.196.124 |           1575                                             key cache hit sstable 11 | 19:15:54,835 | 100.69.196.124 |           1780                              seeking partition beginning in data file | 19:15:54,835 | 100.69.196.124 |           1822                               merging data memtables , 1 sstables | 19:15:54,836 | 100.69.196.124 |           2562                                       read 1 live , 0 tombstoned cells | 19:15:54,838 | 100.69.196.124 |           4808                                                         request complete | 19:15:54,838 | 100.69.196.124 |           5810 

the trace seems show of time doing or waiting network operations. perhaps network has problems?

if operations fail, perhaps have problem 1 of nodes. when node not needed, things work, when needed things go badly. might worth looking @ log files on other nodes.


Comments

Popular posts from this blog

javascript - Count length of each class -

What design pattern is this code in Javascript? -

hadoop - Restrict secondarynamenode to be installed and run on any other node in the cluster -