python - Huge JSON lists without eating all the memory
I have a queue endpoint (Celery) that consumes a batch of messages before working on them and writes them to a temporary file for another process (Spark clustering) to consume. The payload is a huge list of dicts, encoded in JSON:
[{'id': 1, 'content': ...}, {'id': 2, 'content': ...}, {'id': 3, 'content': ...}, ...]
But this keeps all the messages in memory, and json.dumps then generates one more big string in memory on top of them. Is there a better approach than storing everything in memory? Can I dump the messages to the file as they arrive, so they never pile up in memory?
You could write your own JSON encoder for efficient JSON encoding, or simply use json.dump, passing in the file pointer object, so the encoder writes its output to the file in chunks instead of building one big string.
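
A minimal sketch of the json.dump route, assuming the batch has already been collected in a list (the path is illustrative):

    import json

    def dump_batch(messages, path):
        # json.dump feeds the encoder's output to fp.write() chunk by chunk,
        # so the complete JSON string is never materialized in memory.
        with open(path, 'w') as fp:
            json.dump(messages, fp)

    # Example with the structure from the question:
    dump_batch([{'id': 1, 'content': 'a'}, {'id': 2, 'content': 'b'}], '/tmp/batch.json')

If the messages should hit disk as they arrive rather than after the whole batch is buffered, the same file handle can be kept open and each element written with json.dumps as it comes in, though the surrounding brackets and commas of the array then have to be managed by hand.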
Also, don't read the whole JSON file into memory when consuming the data: use json.load instead of json.loads, and walk through the messages with the standard Python iterator interface.
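
Note that for a single top-level array, the standard library's json.load still parses the entire document at once; to actually get an iterator that reads the file incrementally, a streaming parser such as the third-party ijson package is one option (it is not named above, so treat it as an assumption). A sketch for the layout from the question:

    import ijson  # third-party streaming JSON parser: pip install ijson

    def iter_messages(path):
        # Yields one message dict at a time; the parser consumes the file
        # incrementally, so the full list never sits in memory.
        with open(path, 'rb') as fp:
            yield from ijson.items(fp, 'item')  # 'item' matches each element of the top-level array

    for message in iter_messages('/tmp/batch.json'):
        print(message['id'])  # stand-in for the real processing step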