C# - Producing a list of distinct points but running into OutOfMemoryException


I'm trying to improve the performance of a method I have that reprojects points from one coordinate system to another.

    List<Point> Reproject(List<Point> points, string sourceProjection, string destinationProjection)

To do the coordinate transformation, I pass the points to a third-party library (FME). What I'm trying to achieve is to take the input list of points, select only the distinct points from that list, pass just those to the transformation engine, then reconstruct the original list and return it to the client.

My basic plan is to use a Dictionary<Point, int> to collect the distinct points and assign an index to each of them, then reconstruct the original list from those indexes. Here's some rough code I wrote to test out this behaviour:

    var distinctPoints = new Dictionary<Point, int>();
    var distinctPointsMapping = new Dictionary<int, int>();
    var pointNumber = 0;
    var distinctPointNumber = 0;

    foreach (var point in points)
    {
        if (distinctPoints.ContainsKey(point))
        {
            distinctPointsMapping.Add(pointNumber, distinctPoints[point]);
        }
        else
        {
            distinctPoints.Add(point, distinctPointNumber);
            distinctPointsMapping.Add(pointNumber, distinctPointNumber);
            distinctPointNumber++;
        }

        pointNumber++;
    }

    Console.WriteLine("From an input of {0} points, found {1} distinct points.", points.Count, distinctPointNumber);
    var transformedPoints = new Point[distinctPointNumber]; // Replace with the call to the FME transformer

    var returnVal = new List<Point>(points.Count);
    pointNumber = 0;
    foreach (var untransformedPoint in points)
    {
        var transformedPoint = transformedPoints[distinctPointsMapping[pointNumber]];
        returnVal.Add(transformedPoint);
        pointNumber++;
    }

    return returnVal;

The problem I'm running into is an OutOfMemoryException when doing this with more than 8M points. I'm wondering if there's a better way to do this?

1. Dictionaries use a lot of memory. Automatic resizing of large dictionaries is particularly trouble-prone (memory fragmentation => OOM long before you expect it).

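Replace:

    var distinctPointsMapping = new Dictionary<int, int>();
    ...
    distinctPointsMapping.Add(pointNumber, distinctPoints[point]);
    ...
    distinctPointsMapping.Add(pointNumber, distinctPointNumber);

With:

    var distinctPointsMapping = new List<int>(points.Count);
    ...
    distinctPointsMapping.Add(distinctPoints[point]);
    ...
    distinctPointsMapping.Add(distinctPointNumber);

Use Add here rather than the indexer: the constructor argument only reserves capacity, so the list starts out empty and assigning by index would throw ArgumentOutOfRangeException. Entries are appended in input order, so position i in the list still corresponds to point i.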
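Putting point 1 together, a minimal sketch of the de-duplication step might look like the following. It assumes a hypothetical Point struct with X and Y fields (the question does not show the real type), and the names PointDeduplicator, Deduplicate and Expand are made up for illustration. It uses a plain int[] for the mapping, which is one way to get the same effect as the pre-sized list, and TryGetValue so each point is hashed once per lookup instead of twice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Point type, which is not shown.
public readonly struct Point : IEquatable<Point>
{
    public Point(double x, double y) { X = x; Y = y; }
    public double X { get; }
    public double Y { get; }

    public bool Equals(Point other) => X.Equals(other.X) && Y.Equals(other.Y);
    public override bool Equals(object obj) => obj is Point p && Equals(p);
    public override int GetHashCode()
    {
        unchecked { return (X.GetHashCode() * 397) ^ Y.GetHashCode(); }
    }
}

public static class PointDeduplicator
{
    // Returns the distinct points in first-seen order and fills 'mapping'
    // so that mapping[i] is the index of points[i] in the returned list.
    public static List<Point> Deduplicate(IReadOnlyList<Point> points, out int[] mapping)
    {
        var indexByPoint = new Dictionary<Point, int>();
        var distinct = new List<Point>();
        mapping = new int[points.Count]; // one int per input point, no hashing overhead

        for (var i = 0; i < points.Count; i++)
        {
            if (!indexByPoint.TryGetValue(points[i], out var index))
            {
                index = distinct.Count;
                indexByPoint.Add(points[i], index);
                distinct.Add(points[i]);
            }
            mapping[i] = index;
        }

        return distinct;
    }

    // Rebuilds the full-length list from the transformed distinct points.
    public static List<Point> Expand(IReadOnlyList<Point> transformed, int[] mapping)
    {
        var result = new List<Point>(mapping.Length);
        foreach (var index in mapping)
        {
            result.Add(transformed[index]);
        }
        return result;
    }
}
```

The distinct list would be handed to the transformation engine, and Expand would then be called with whatever comes back, provided the engine returns the points in the same order.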

2. To reduce memory fragmentation, consider setting an appropriate initial size for distinctPoints (which does need to be a dictionary, for fast lookup). The ideal size is a prime number somewhat larger than points.Count. (I failed to find a reference suggesting how much larger -- maybe 25%?)

    // You have to write "CalcDictionarySize". See the text above.
    int goodSize = CalcDictionarySize(points.Count);
    var distinctPoints = new Dictionary<Point, int>(goodSize);
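One possible way to write CalcDictionarySize is sketched below. The answer only names the helper, so everything here is an assumption: the DictionarySizing class name, the simple trial-division primality test, and the 25% headroom, which is the guess from the text above rather than a documented figure.

```csharp
using System;

public static class DictionarySizing
{
    // Hypothetical helper: returns the smallest prime >= itemCount * 1.25.
    // The 1.25 factor is the guess from the answer, not a documented value.
    public static int CalcDictionarySize(int itemCount)
    {
        if (itemCount < 0) throw new ArgumentOutOfRangeException(nameof(itemCount));

        // Work in long so the 25% headroom cannot overflow int.
        long candidate = Math.Max(3L, (long)Math.Ceiling(itemCount * 1.25));
        if (candidate % 2 == 0) candidate++; // only odd candidates can be prime here

        while (candidate < int.MaxValue)
        {
            if (IsPrime(candidate)) return (int)candidate;
            candidate += 2;
        }

        return int.MaxValue; // 2^31 - 1 is itself prime
    }

    // Trial division; fast enough for a single call on a value below 2^31.
    private static bool IsPrime(long n)
    {
        if (n < 2) return false;
        if (n % 2 == 0) return n == 2;
        for (long d = 3; d * d <= n; d += 2)
        {
            if (n % d == 0) return false;
        }
        return true;
    }
}
```

If most input points are duplicates, sizing from points.Count will over-allocate, so it may be worth sizing from an estimate of the distinct count where one is available.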

3. In an extreme case, request a GC prior to this code running. (This advice might be debatable. However, I have used it myself, when I was unable to find any other way to avoid OOM.)

    public void GarbageCollect_Major()
    {
        // Force GC of two generations, to get rid of recent unneeded objects that have finalizers.
        GC.Collect(1, GCCollectionMode.Forced);
        GC.WaitForPendingFinalizers();
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

        // This may be dubious. It seemed to maintain a more responsive system.
        // (Perhaps 5-20 ms.) Because a full GC stalls .NET, give time to other threads (related to the GUI?).
        System.Threading.Thread.Sleep(10);
    }

Then at the start of your method:

    GarbageCollect_Major();

Caveat: explicitly calling the GC is not something to do lightly, nor often. A GC done "too often" might merely push objects from Gen 1 to Gen 2, where they won't be collected until a full GC is done. I only call it when the user requests an operation that will take more than 5 seconds to complete, and that has been shown to be prone to OOM.

