Using and abusing memory with LMDB in Kube

When synchronizing a large folder in Kube, you’ll notice memory usage growing at an alarming rate.

Surely something must be wrong? Let’s dig into that.

Here’s the output of ‘top’ after pressing “g3”, which gets you into the memory view.

 PID %MEM    VIRT    RES  CODE    DATA    SHR  nMaj nDRT  %CPU COMMAND
3202 24.8  0.955t 1.907g    32  298064 1.744g    47    0   0.0 kube
3365 22.3  4.769t 1.713g    48   45188 1.693g    42    0   0.0 sink_synchroniz

RES and SHR are huge, at 1.9 and 1.7 gigabytes respectively, so either we are leaking memory without end or something else is going on.

But what do those various fields mean?

  • VIRT is the amount of virtual memory used by the process. Virtual memory does not correlate with physical memory at all; it only indicates how much memory the process can address at the moment. (You’ll notice that this is in the terabytes for kube and the synchronizer process.)
  • RES (Resident Set Size) represents the portion of physical memory that is currently in use. This includes the portions of libraries used by the process, and also the parts of memory-mapped files that were loaded into memory because the process accessed them.
  • Finally, SHR is the amount of memory that is shareable between processes. This again includes shared libraries as well as memory-mapped files.
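On Linux the same three numbers can be read from /proc, which makes the relationship between them explicit. The following Python sketch assumes a kernel that exposes RssAnon, RssFile and RssShmem in /proc/<pid>/status (Linux 4.5 or later). The helper names are made up for this example, and the sample values are illustrative rather than measured.

```python
import re

# Fields of /proc/<pid>/status that correspond to top's memory columns.
_FIELD = re.compile(r"^(VmSize|VmRSS|RssAnon|RssFile|RssShmem):\s+(\d+)\s+kB$")

def parse_status(text):
    """Extract the memory fields (in KiB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        match = _FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def top_columns(fields):
    """Map the status fields onto the VIRT, RES and SHR columns shown by top."""
    return {
        "VIRT": fields["VmSize"],
        "RES": fields["VmRSS"],
        # SHR counts resident pages that are file-backed or shared memory.
        "SHR": fields["RssFile"] + fields["RssShmem"],
    }

# Illustrative sample (assumption): not captured from a real process.
SAMPLE = """\
VmSize:  2048000 kB
VmRSS:    367280 kB
RssAnon:   53828 kB
RssFile:  313452 kB
RssShmem:      0 kB
"""

if __name__ == "__main__":
    print("sample:", top_columns(parse_status(SAMPLE)))
    try:
        with open("/proc/self/status") as f:
            print("self:  ", top_columns(parse_status(f.read())))
    except (OSError, KeyError):
        print("self:   /proc/self/status is unavailable or lacks the Rss* fields")
```

In the sample, RES equals RssAnon + RssFile + RssShmem, so a process whose RES is dominated by RssFile is mostly holding file-backed pages rather than private allocations.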

For an in-depth explanation, head over to techtalk.intersec.com; they have an excellent write-up.

So a likely culprit for our high memory consumption is, of course, the memory-mapped database. We use LMDB, a memory-mapped key-value store, to hold our data. Memory-mapped files are loaded from disk into memory as we access them, so if we access the complete file, eventually the whole file should end up in memory (given we have enough memory).
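The effect is easy to reproduce outside of Kube. The sketch below is not LMDB's API; it maps a scratch file with Python's mmap module and reads one byte per page, reporting the resident set size before and after. It assumes Linux for the /proc/self/statm readout, and the function names and the 16 MiB file size are arbitrary choices for the example.

```python
import mmap
import os
import tempfile

def resident_kib():
    """Resident set size of this process in KiB, or None where /proc is missing."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])  # second field: resident pages
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * mmap.PAGESIZE // 1024

def touch_pages(mapping):
    """Read one byte from every page of the mapping; return the pages touched."""
    touched = 0
    for offset in range(0, len(mapping), mmap.PAGESIZE):
        _ = mapping[offset]  # the read faults this page into our address space
        touched += 1
    return touched

def demo(path):
    """Map `path` read-only and return (pages touched, RSS before, RSS after)."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
        before = resident_kib()
        pages = touch_pages(mapping)
        after = resident_kib()  # measured while the file is still mapped
    return pages, before, after

def make_scratch_file(size):
    """Create a temporary file of `size` zero bytes and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * size)
        return f.name

if __name__ == "__main__":
    path = make_scratch_file(16 * 1024 * 1024)
    try:
        pages, before, after = demo(path)
        print(f"touched {pages} pages, RSS before={before} KiB, after={after} KiB")
    finally:
        os.unlink(path)
```

Nothing in this code allocates memory for the file contents; the growth in RSS comes only from pages of the mapping being accessed, which is the same mechanism at work with the database.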

So let’s first check our assumption that the database is indeed the problem.

The file size on disk closely matches what top reports for Kube as SHR, which is also part of RES:

6815891 1742848 -rw-r--r-- 1 developer developer 1784672256 Jan 24 11:42 /home/developer/.local/share/sink/storage/{5419f028-3b00-4c67-a281-4f166d37c7a9}/data.mdb

To further inspect this we can use ‘pmap -x’ to show us what the memory map is made up of. Among other lines we’ll find:

Address          Kbytes     RSS       Dirty Mode  Mapping
00007e6a44000000 1024000000 1742084       0 r--s- data.mdb

which clearly shows that the LMDB database occupies 1.7 GB of resident memory.
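To make the comparison concrete, the numbers from the ‘pmap -x’ row and the directory listing can be put side by side. This is a small Python sketch; the parsing helper is made up for the example and only handles a single data row in the column layout shown above.

```python
def parse_pmap_row(row):
    """Split one data row of `pmap -x` output into named fields (sizes in KiB)."""
    address, kbytes, rss, dirty, mode, mapping = row.split(None, 5)
    return {
        "address": int(address, 16),
        "kbytes": int(kbytes),
        "rss": int(rss),
        "dirty": int(dirty),
        "mode": mode,
        "mapping": mapping.strip(),
    }

# Values copied from the outputs quoted in the text.
ROW = "00007e6a44000000 1024000000 1742084       0 r--s- data.mdb"
FILE_SIZE_BYTES = 1784672256  # size of data.mdb from the directory listing

row = parse_pmap_row(ROW)
file_kib = FILE_SIZE_BYTES // 1024

print(f"mapped:   {row['kbytes'] / 2**30:.3f} TiB of address space")
print(f"resident: {row['rss']} of {file_kib} KiB ({row['rss'] / file_kib:.2%})")
print(f"dirty:    {row['dirty']} KiB")
```

The mapping reserves roughly 0.95 TiB of address space, which accounts for almost all of the 0.955t that top shows as VIRT for kube, while the resident part is just under the 1742844 KiB size of the file itself.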

The value of RES is made up of anonymous resident pages (memory private to the process) and file-backed resident pages. Only the file-backed resident pages also show up in SHR, which is why RES is larger than SHR. Because SHR also includes shared libraries, both values are somewhat larger than our database file. The output of ‘pmap -x’ above reveals another crucial detail: there is 0 “Dirty” memory. “Dirty” memory is memory that has been written to and would therefore have to be “swapped” (spilled to disk) should the system run out of memory. Clean, file-backed memory, on the other hand, can simply be reclaimed by the operating system, because it can be reloaded from the file when it is needed again.

So what we see in action is the operating system’s disk cache, which loads portions of the file into memory as we access it. It is good that it uses as much memory for that as possible, because that means subsequent accesses will be fast. However, it does not mean that this memory is unavailable for other purposes, which we can demonstrate too.

First, let’s restart Kube.

   PID %MEM    VIRT   RES   CODE    DATA    SHR nMaj nDRT  %CPU COMMAND
   83  1.2  1.908t  99656     32  108684  65808  210    0   0.0 kube
   89  0.4  4.769t  29356     48   27908  24932   11    0   0.0 sink_synchroniz
   91  0.3  4.769t  22700     48   17236  20380    4    0   0.0 sink_synchroniz
   87  0.3  4.769t  22408     48   16824  20240    6    0   0.0 sink_synchroniz

Initially there is little memory use, because we haven’t accessed much of the database yet.

Scrolling down in our INBOX slowly grows that amount as we traverse the database:

   PID %MEM    VIRT   RES   CODE    DATA    SHR nMaj nDRT  %CPU COMMAND
   83  4.6  1.910t 367280     32  286332 313452  479    0   0.0 kube
  127  0.7 1164560  57888      4   77684  46612  258    0   0.0 QtWebEngineProc
  106  0.4  347004  35524      4    3100  29968   26    0   0.0 QtWebEngineProc
   89  0.4  4.769t  29816     48   28040  25168   11    0   0.0 sink_synchroniz
   91  0.3  4.769t  22700     48   17236  20380    4    0   0.0 sink_synchroniz
   87  0.3  4.769t  22408     48   16824  20240    6    0   0.0 sink_synchroniz

Finally, we can force the operating system to reclaim that memory by exhausting the system’s memory.

Running ‘tail /dev/zero’ to consume unbounded memory while watching top shows that kube’s RES drops to 11k as soon as we start to exhaust the system’s memory.

So, while it’s certainly worthwhile for the quality of a product to keep a close eye on how much memory is used and what it is used for, high usage is not necessarily bad.

After all, you’ve paid for all the memory in your system, so let’s put it to (good) use! What’s important is that we also free up those resources for all the other things you’re doing.

Further reading:

For more info about Kube, please head over to About Kube.