Using and abusing memory with LMDB in Kube
When synchronizing a larger folder in Kube, you’ll notice that the memory usage is growing at an alarming rate.
Surely something must be wrong? Let’s dig into that.
Here’s the output of ‘top’ after pressing “g3”, which gets you into the memory view.
 PID %MEM   VIRT    RES CODE   DATA    SHR nMaj nDRT %CPU COMMAND
3202 24.8 0.955t 1.907g   32 298064 1.744g   47    0  0.0 kube
3365 22.3 4.769t 1.713g   48  45188 1.693g   42    0  0.0 sink_synchroniz
RES and SHR are huge at around 1.9 gigabytes, so either we are leaking memory without end, or something else is going on.
But what do those various fields mean?
- VIRT is the amount of virtual memory used by the process. Virtual memory does not correlate with physical memory at all; it just indicates how much memory the process could address at the moment. (You’ll notice that this is in the terabytes for kube and the synchronizer process.) The sketch after this list shows how VIRT and RES can diverge.
- RES (Resident Set Size) represents the portion of physical memory that is currently in use. This includes portions of libraries that are used by the process, and also the parts of memory mapped files that were loaded into memory because the process accessed them.
- SHR, finally, is the amount of memory that is shareable between processes. This again includes shared libraries, as well as memory mapped files.
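To make the difference between VIRT and RES concrete, here is a minimal sketch (not taken from Kube; the sizes are arbitrary) that reserves a large anonymous mapping and then touches only a small part of it. VIRT grows by the full mapping size right away, while RES only grows by what was actually touched:

```c
/* Minimal sketch: reserve a large mapping, then touch a small part of it.
 * Watch VIRT and RES in top while it runs. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    const size_t size = 8UL * 1024 * 1024 * 1024; /* 8 GiB of address space */

    /* This only reserves virtual address space: VIRT jumps by 8 GiB,
     * RES stays almost unchanged. */
    char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    printf("Mapped 8 GiB, check VIRT/RES of pid %d in top...\n", getpid());
    sleep(30);

    /* Touching the first 100 MiB faults those pages in: RES grows by
     * roughly 100 MiB, VIRT stays the same. */
    memset(p, 1, 100UL * 1024 * 1024);

    printf("Touched 100 MiB, check RES again...\n");
    sleep(30);

    munmap(p, size);
    return 0;
}
```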
For an in-depth explanation, head over to techtalk.intersec.com; they have an excellent write-up (linked below).
So, a likely culprit for our high memory consumption is, of course, the memory mapped database. We’re using LMDB, a memory mapped key-value store, as the database that holds our data. Memory mapped files are loaded into memory from disk as we access that memory, so if we access the complete file, eventually the whole file should end up in memory (given we have enough memory).
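For illustration, this is roughly how an LMDB environment is set up; it is a simplified sketch rather than Sink’s actual code, and the path and map size are made up. The map size passed to mdb_env_set_mapsize() is reserved as virtual address space up front, which is why VIRT for kube and the synchronizer is in the terabytes even though hardly any physical memory is in use:

```c
/* Simplified sketch of opening an LMDB environment (not Sink's actual code;
 * the path and map size are made up). */
#include <stdio.h>
#include <lmdb.h>

int main(void)
{
    MDB_env *env;
    int rc;

    if ((rc = mdb_env_create(&env)) != 0) {
        fprintf(stderr, "mdb_env_create: %s\n", mdb_strerror(rc));
        return 1;
    }

    /* Reserve 1 TiB of address space for the memory map. This shows up in
     * VIRT immediately, but no physical memory is used until pages of
     * data.mdb are actually accessed. */
    mdb_env_set_mapsize(env, 1UL << 40);

    /* Map data.mdb from the given directory into the process, read-only. */
    rc = mdb_env_open(env, "/path/to/sink/storage", MDB_RDONLY, 0644);
    if (rc != 0) {
        fprintf(stderr, "mdb_env_open: %s\n", mdb_strerror(rc));
        mdb_env_close(env);
        return 1;
    }

    mdb_env_close(env);
    return 0;
}
```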
So let’s first check our assumption that the database is indeed the problem.
The file size on disk closely matches what we see being used as SHR in Kube, and that memory is also part of RES:
6815891 1742848 -rw-r--r-- 1 developer developer 1784672256 Jan 24 11:42 /home/developer/.local/share/sink/storage/{5419f028-3b00-4c67-a281-4f166d37c7a9}/data.mdb
To further inspect this we can use ‘pmap -x’ to show us what the memory map is made up of. Among other lines we’ll find:
Address           Kbytes      RSS      Dirty Mode  Mapping
00007e6a44000000  1024000000  1742084      0 r--s- data.mdb
which clearly shows that the LMDB database occupies 1.7 GB of resident memory.
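The same information is also available programmatically. The following sketch (the file path is just an example) maps a file read-only and uses mincore(2) to count how many of its pages are currently resident, which is essentially the per-mapping RSS column that ‘pmap -x’ reports:

```c
/* Sketch: map a file read-only and count how many of its pages are
 * currently resident, similar to the RSS column of 'pmap -x'.
 * Usage: ./resident /path/to/data.mdb */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "data.mdb"; /* example path */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long page = sysconf(_SC_PAGESIZE);
    size_t pages = (st.st_size + page - 1) / page;
    unsigned char *vec = malloc(pages);

    /* mincore() fills one byte per page; bit 0 tells us whether the page
     * is resident in memory right now. */
    if (mincore(p, st.st_size, vec) == 0) {
        size_t resident = 0;
        for (size_t i = 0; i < pages; i++)
            resident += vec[i] & 1;
        printf("%zu of %zu pages resident (%zu KiB)\n",
               resident, pages, resident * page / 1024);
    }

    free(vec);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```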
The value of RES is made up of anonymous resident pages (memory private to the process) and file-backed resident pages. But only file-backed resident pages also show up in SHR, which is why both SHR and RES are larger than our database file. From the output of ‘pmap -x’ above we’ll notice another crucial detail though: there is 0 “Dirty” memory. “Dirty” memory is memory that has been written to, and that would thus have to be swapped out (spilled to disk) should the system run out of memory. Clean, file-backed memory, on the other hand, can simply be reclaimed by the operating system, as it can be reloaded from the file when it is required again.
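A tiny sketch makes the clean/dirty distinction tangible (again just an illustration; the file name is arbitrary): reading from a shared, read-only mapping, as LMDB does, faults pages in clean, while writing to a private mapping dirties them. You can watch the Dirty fields of the two mappings in /proc/<pid>/smaps while the program sleeps:

```c
/* Sketch: fault one page in clean via a read-only shared mapping, and
 * dirty one page via a private writable mapping. Inspect the Dirty:
 * fields of both mappings in /proc/<pid>/smaps while this sleeps. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("data.mdb", O_RDONLY); /* any large file will do */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    /* Read-only shared mapping, like LMDB's: accesses stay clean. */
    volatile char *ro = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    char c = ro[0];                  /* faults one clean page in */

    /* Private writable mapping: a write copies the page and dirties it. */
    char *rw = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE, fd, 0);
    rw[0] = c;                       /* one dirty page now */

    printf("pid %d -- inspect /proc/%d/smaps\n", getpid(), getpid());
    sleep(60);
    return 0;
}
```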
So what we see in action is the operating system’s disk cache, which loads portions of the file into memory as we access them. It is good that it uses as much memory for this as possible, because that means subsequent accesses will be fast. However, it does not mean that this memory is unavailable for other purposes, which we can demonstrate too.
First, let’s restart Kube.
PID %MEM   VIRT    RES CODE   DATA   SHR nMaj nDRT %CPU COMMAND
 83  1.2 1.908t  99656   32 108684 65808  210    0  0.0 kube
 89  0.4 4.769t  29356   48  27908 24932   11    0  0.0 sink_synchroniz
 91  0.3 4.769t  22700   48  17236 20380    4    0  0.0 sink_synchroniz
 87  0.3 4.769t  22408   48  16824 20240    6    0  0.0 sink_synchroniz
Initially there is little memory in use, because we haven’t accessed much of the database yet.
Scrolling down in our INBOX slowly grows that amount as we traverse the database:
PID %MEM    VIRT    RES CODE   DATA    SHR nMaj nDRT %CPU COMMAND
 83  4.6  1.910t 367280   32 286332 313452  479    0  0.0 kube
127  0.7 1164560  57888    4  77684  46612  258    0  0.0 QtWebEngineProc
106  0.4  347004  35524    4   3100  29968   26    0  0.0 QtWebEngineProc
 89  0.4  4.769t  29816   48  28040  25168   11    0  0.0 sink_synchroniz
 91  0.3  4.769t  22700   48  17236  20380    4    0  0.0 sink_synchroniz
 87  0.3  4.769t  22408   48  16824  20240    6    0  0.0 sink_synchroniz
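Under the hood, this is roughly what happens; the following is a sketch and not Sink’s actual code (the path, map size and database name are made up), but it shows how iterating over a database with an LMDB cursor touches more and more pages of data.mdb, and the disk cache keeps them resident:

```c
/* Sketch: traverse an LMDB database with a cursor (not Sink's actual code;
 * the path and map size are made up). Every page the cursor visits is
 * faulted in and stays cached, so RES/SHR grow while this runs. */
#include <stdio.h>
#include <lmdb.h>

int main(void)
{
    MDB_env *env;
    MDB_txn *txn;
    MDB_dbi dbi;
    MDB_cursor *cursor;
    MDB_val key, data;
    size_t entries = 0;

    mdb_env_create(&env);
    mdb_env_set_mapsize(env, 1UL << 40);            /* 1 TiB of address space */
    mdb_env_open(env, "/path/to/sink/storage", MDB_RDONLY, 0644);

    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    mdb_dbi_open(txn, NULL, 0, &dbi);               /* the default database */
    mdb_cursor_open(txn, dbi, &cursor);

    /* Each step faults in whatever B-tree pages it needs; the disk cache
     * keeps them resident until the memory is needed elsewhere. */
    while (mdb_cursor_get(cursor, &key, &data, MDB_NEXT) == 0)
        entries++;

    printf("traversed %zu entries -- compare RES/SHR in top before and after\n",
           entries);

    mdb_cursor_close(cursor);
    mdb_txn_abort(txn);
    mdb_env_close(env);
    return 0;
}
```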
Finally, we can force the operating system to reclaim that memory by exhausting the system’s memory.
Running ‘tail /dev/zero’ to consume unbounded memory while watching top will show that Kube releases its RES all the way down to 11k as soon as we start to exhaust the system’s memory.
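If you want something a bit more controllable than ‘tail /dev/zero’, a small allocator like this sketch has the same effect: it grabs and touches anonymous memory chunk by chunk until the system runs low, at which point the kernel starts dropping Kube’s clean, file-backed pages (careful, the OOM killer will eventually step in):

```c
/* Sketch: consume memory in 100 MiB steps, touching every page so it
 * actually counts towards RES. Watch Kube's RES shrink in top as the
 * kernel reclaims the clean pages of data.mdb. Will eventually trigger
 * the OOM killer -- run at your own risk. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const size_t chunk = 100UL * 1024 * 1024;

    for (;;) {
        volatile char *p = malloc(chunk);
        if (!p) {
            fprintf(stderr, "allocation failed, stopping\n");
            break;
        }
        for (size_t i = 0; i < chunk; i += 4096)
            p[i] = 1;           /* touch every page */
        sleep(1);               /* slow down so we can watch top */
    }
    return 0;
}
```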
So, while it’s certainly worthwhile for the quality of a product to keep a close eye on how much memory is used and what it is used for, high usage is not necessarily bad.
After all, you’ve paid for all the memory in your system, so let’s put it to (good) use! What’s important is that the resources are freed up again for all the other things you’re doing.
Further reading:
- https://techtalk.intersec.com/2013/07/memory-part-2-understanding-process-memory/
- http://lmdb.readthedocs.io/en/release/#memory-usage
- https://symas.com/understanding-lmdb-database-file-sizes-and-memory-utilization/
For more info about Kube, please head over to About Kube.