PHP - fastest way to read multiple documents from Riak
I have tried reading documents in different ways. I have an array of keys to read, processed in batches of 1000 keys each. I have 6 Riak nodes, reads use r=1, and I connect to the same Riak node each time. The documents hold profile fields, so they are not big. I've checked CPU and disk usage on the nodes and can see slight movement, but nowhere near 20% CPU overall.
Method 1: multiget
(code section linked here):
$data = $riak->multiget($ids); /* execution time: 80 seconds */
Method 2: key_filter
(code section linked here):
$data = $riak->multigetbetween($id1, $id2); /* gives a Riak internal timeout */
Method 3: one-by-one get
foreach ($ids as $key) {
    $riak->get($key);
    $data[$key] = $riak->document->data;
}
/* execution time: 20 seconds */
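As you can see, method 3 is the best, but the problem I have with all of them is that I cannot run more than 2 threads. If I try to run more, I get a socket timeout on the connection. I checked the Linux open-files limit and it's 240k. I've run out of options to try, so I'm asking here. Any ideas?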
The recommended approach for retrieving multiple objects is to use multiple connections in order to parallelise the work, and to connect to all available nodes in order to spread out the load. This has the benefit that it returns both object data and metadata, and results in a quorum read with read-repair being performed. The load can also be spread out across the cluster. This works best for clients that have support for concurrency and/or threading.
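PHP has no threads, but a single process can still keep many requests in flight with curl_multi against Riak's HTTP interface, bypassing the client library used in the question. The sketch below is one possible way to do that, not the answer author's code. It assumes the HTTP API is reachable on every node (8098 is Riak's default HTTP port) and uses the /buckets/<bucket>/keys/<key> URL form; buildRequestUrls, fetchConcurrently, the bucket name and node addresses are hypothetical. Only the response body of HTTP 200 responses is kept, so metadata headers are dropped and missing keys (404) or keys with siblings (300) are skipped.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: spread keys round-robin over the given nodes and build
 * one GET URL per key, e.g. http://<node>/buckets/<bucket>/keys/<key>?r=1.
 * Returns [key => url].
 */
function buildRequestUrls(array $keys, array $nodes, string $bucket, int $r = 1): array
{
    if ($nodes === []) {
        throw new InvalidArgumentException('at least one node is required');
    }
    $nodes = array_values($nodes);
    $urls = [];
    foreach (array_values($keys) as $i => $key) {
        $node = $nodes[$i % count($nodes)]; // round-robin node choice
        $urls[$key] = sprintf(
            'http://%s/buckets/%s/keys/%s?r=%d',
            $node,
            rawurlencode($bucket),
            rawurlencode((string) $key),
            $r
        );
    }
    return $urls;
}

/**
 * Hypothetical helper: fetch the URLs with curl_multi, in batches of
 * $concurrency parallel requests. Returns [key => body] for HTTP 200 only.
 */
function fetchConcurrently(array $urls, int $concurrency = 50): array
{
    $results = [];
    foreach (array_chunk($urls, $concurrency, true) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $key => $url) {
            $ch = curl_init($url);
            if ($ch === false) {
                continue;
            }
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_TIMEOUT, 10);
            curl_multi_add_handle($mh, $ch);
            $handles[$key] = $ch;
        }
        do {
            $status = curl_multi_exec($mh, $running);
            if ($status !== CURLM_OK) {
                break; // give up on this batch on a multi-handle error
            }
            if ($running > 0) {
                curl_multi_select($mh, 1.0); // wait for socket activity instead of spinning
            }
        } while ($running > 0);
        foreach ($handles as $key => $ch) {
            if (curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200) {
                $results[$key] = curl_multi_getcontent($ch);
            }
            curl_multi_remove_handle($mh, $ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}
```

For example, fetchConcurrently(buildRequestUrls($ids, ['10.0.0.1:8098', '10.0.0.2:8098'], 'profiles')) would issue one 1000-key batch 50 requests at a time across two nodes. The value 50 is an arbitrary starting point to tune against the cluster.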
For client libraries that do not, a common approach is to perform multi-gets through a MapReduce job. This is a reasonably heavyweight way to query data, as it requires Riak to set up and execute a MapReduce job. Running large numbers of concurrent MapReduce jobs can therefore put a lot of load on the system. It also does not result in a quorum read, and read-repair is not triggered.
This is what you are doing in your method 2 example. If you know the keys you wish to retrieve, it is more efficient to specify these directly rather than use a key filter, as Riak then has to scan far fewer objects. If you are using the LevelDB backend, you could also base the query on a secondary index lookup. In your example you note that you are using a JavaScript map function. This is considerably slower than using Erlang functions, and it uses a pool of JavaScript VMs specified in the app.config file. There is an Erlang function available that returns the object value, and I believe the map phase should be specified as map(array("riak_kv_mapreduce", "map_object_value")).
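To make the difference from a key filter concrete, here is a minimal sketch of the JSON body one might POST to Riak's /mapred HTTP endpoint, with the known keys listed explicitly as [bucket, key] inputs and the Erlang function riak_kv_mapreduce:map_object_value as the only map phase. A second variant feeds the job from an exact-match secondary index query instead (LevelDB backend only). The function names, bucket, index name and value are hypothetical, and how the question's PHP client would submit these bodies is not assumed here.

```php
<?php
declare(strict_types=1);

/** The single Erlang map phase that returns each object's value. */
function mapObjectValuePhase(): array
{
    return [
        'map' => [
            'language' => 'erlang',
            'module'   => 'riak_kv_mapreduce',
            'function' => 'map_object_value',
        ],
    ];
}

/** Hypothetical helper: MapReduce body with explicit [bucket, key] inputs. */
function buildKeyListMapReduce(string $bucket, array $keys): string
{
    $inputs = [];
    foreach ($keys as $key) {
        $inputs[] = [$bucket, (string) $key]; // no scan: Riak reads only these keys
    }
    $job = ['inputs' => $inputs, 'query' => [mapObjectValuePhase()]];
    return json_encode($job, JSON_THROW_ON_ERROR);
}

/** Hypothetical helper: MapReduce body fed by an exact-match 2i lookup. */
function buildIndexMapReduce(string $bucket, string $index, string $value): string
{
    $job = [
        'inputs' => ['bucket' => $bucket, 'index' => $index, 'key' => $value],
        'query'  => [mapObjectValuePhase()],
    ];
    return json_encode($job, JSON_THROW_ON_ERROR);
}
```

The first helper keeps the MapReduce route the question already uses but removes both the key scan and the JavaScript VM pool from the job; the parallel GET approach is still preferable where the client allows it.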
Some time ago I experimented a little with creating MapReduce functions that return the important data of Riak objects, e.g. indexes, metadata and the vector clock. The results are encoded as JSON, which means these functions are limited to data that is valid JSON. The functions, some simple examples and documentation can be found in my GitHub repository. Please note that this has not been tested extensively. I have so far also not gotten around to turning the resulting output into Riak objects so that client libraries can benefit from it.
Another way around the issue of having to retrieve a large number of objects from Riak is to de-normalise your data model in order to ensure that common queries can be served through a smaller number of requests. This is the approach I would recommend, as it scales well. If your data is read-heavy, it makes sense to do a bit more work when inserting or updating data in order to ensure that the data can be read efficiently. How this can be done will depend a lot on your data and access patterns.
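As one illustration of that trade-off, and assuming a common query needs only a few profile fields for a known group of users, the write path could maintain a single summary object per group alongside the individual profiles. The pure function below shows only the merge step; mergeIntoSummary, the group summary object and the field names are hypothetical, and storing the result back to Riak (including handling siblings from concurrent updates) is left out.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical write-path helper: when a profile is inserted or updated, copy
 * only the fields a frequent query needs into a per-group summary document.
 * Reading that summary later takes one GET instead of one GET per profile.
 */
function mergeIntoSummary(array $summary, string $profileId, array $profile, array $fields): array
{
    // Keep just the requested fields so the summary object stays small.
    $summary[$profileId] = array_intersect_key($profile, array_flip($fields));
    return $summary;
}
```

The extra cost is a read-modify-write of the summary on every profile change, which is usually acceptable when reads far outnumber writes.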