Analyzing a virtual memory leak in an Android app

Android has many quirks, and you will often run into strange memory leaks. I recently came across a relatively rare kind of app leak and am writing it up here.

The first tool I reached for was the Android Studio Profiler.


The application ran for about two hours. The Profiler graph (not reproduced here) showed no apparent increase in memory, yet the following crash occurred:

02-04 13:15:03.661  1070  1087 W libc    : pthread_create failed: couldn't allocate 1069056-bytes mapped space: Out of memory
02-04 13:15:03.661  1070  1087 W art     : Throwing OutOfMemoryError "pthread_create (1040KB stack) failed: Try again"

E mali_so : encounter the first mali_error : 0x0002 : failed to allocate CPU memory (gles_texturep_upload_3d_internal at hardware/arm/maliT760/driver/product/gles/src/texture/mali_gles_texture_upload.c:1030)
02-04 13:15:18.664  1070  1627 E OpenGLRenderer: GL error:  Out of memory!
02-04 13:15:18.665  1070  1627 F OpenGLRenderer: GL errors! frameworks/base/libs/hwui/renderthread/CanvasContext.cpp:550

It cannot allocate even about 1 MB of memory? Yet the following command shows that the system still has more than 500 MB free.

cat /proc/meminfo

MemTotal:        2045160 kB
MemFree:          529064 kB
MemAvailable:    1250020 kB
Buffers:            1300 kB
Cached:           891916 kB
SwapCached:            0 kB
Active:           556296 kB
Inactive:         674200 kB
Active(anon):     235800 kB
Inactive(anon):   224668 kB
Active(file):     320496 kB
Inactive(file):   449532 kB
Unevictable:         256 kB
Mlocked:             256 kB

That is odd. I wrote two scripts: one automates the click-through test, and the other records system memory, application memory, and the related open file handles while the test runs.

The test script:

# Usage: <script> <process name>
echo "Auto test start:"
if [ -z "$1" ]; then
    echo "Usage: $0 <process name>"
    exit 1
fi
# The variable setup and the loop were missing from the listing and are reconstructed here.
PROCESS=$1
PID=$(pidof "$PROCESS")
COUNT=1
echo "start test $PROCESS, pid is $PID:"
echo "======================================================="

while true; do
    echo "$COUNT : $(date)"
    # Optional login steps, disabled:
    # input tap 1800 900
    # input tap 800 500
    # input text 1599828
    # input tap 1600 500
    # input keyevent 66
    # switch to face login
    # input tap 800 330

    procrank | grep -E "$PROCESS|RAM"
    grep -A 2 MemTotal: /proc/meminfo
    echo "------------------------------------------------------"
    sleep 2
    input tap 1000 700
    sleep 6
    input tap 1900 80
    sleep 2
    # confirm button ok
    input tap 1150 750

    # Stop and save a bugreport once the process has died.
    if [ ! -d "/proc/$PID" ]; then
        TIME=$(date "+%Y%m%d_%H%M%S")
        echo "$PROCESS died at: $(date)"
        # Assumed path; the listing does not show how BUG_REPORT is set.
        BUG_REPORT=/sdcard/bugreport_$TIME.txt
        echo "save bugreport to: $BUG_REPORT"
        bugreport > "$BUG_REPORT"
        break
    fi
    COUNT=$((COUNT + 1))
done

Part of the recording script is as follows:

TIME=$(date "+%Y%m%d-%H%M")
PID=$1
# This excerpt omits the definitions of WORK_DIR and the per-category
# directories below, as well as INFO, PS, INTERVAL, TIME_LABEL and BUG_REPORT.

mkdir -p $WORK_DIR
mkdir -p $MEMINFO_DIR
mkdir -p $LSOF_DIR
mkdir -p $LOG_DIR
mkdir -p $SYSTEM_MEM_DIR
mkdir -p $STATUS_DIR
mkdir -p $THREAD_DIR
mkdir -p $PROCRANK_DIR
mkdir -p $FD_DIR

# echo `date >> $LOG`
# echo `date >> $SLAB`
PROCESS_NAME=$(cat /proc/$1/cmdline)

# set -x
if [ -n "$1" ]; then
    echo "================================================"
    echo "Track process: $PROCESS_NAME, pid: $1"
    echo "Start at: $(date)"
    PID_EXIST=$(ps | grep -w "$PID" | wc -l)
    if [ "$PID_EXIST" -lt 1 ]; then
        echo "Pid: $1 does not exist!"
        exit 1
    fi
    if [ ! -r /proc/$PID/fd ]; then
        echo "You should run as root."
        exit 2
    fi
else
    echo "You should run with the pid of the process to track! ($0 pid)"
    exit 4
fi

echo "Update logcat buffer size to 2M."
logcat -G 2M
echo "Save record in: $WORK_DIR"
echo "Record start at: $(date)" >> $INFO
echo "$PROCESS_NAME, pid is: $1" >> $INFO
echo "--------------------------------------------------" >> $INFO

echo "Current system info:" >> $INFO
echo "/proc/sys/kernel/threads-max:" >> $INFO
cat /proc/sys/kernel/threads-max >> $INFO
echo "/proc/$1/limits:" >> $INFO
cat /proc/$1/limits >> $INFO

# The loop and the COUNT initialisation were missing from the excerpt and are reconstructed.
COUNT=1
while true; do
    # When the process dies, save the last logs and stop recording.
    if [ ! -d "/proc/$PID" ]; then
        echo "$PROCESS_NAME died, exit proc info record!"
        echo "--------------------------------------------------" >> $INFO
        echo "Record total $COUNT times." >> $INFO
        logcat -d >> $LOG_DIR/last_log.txt
        cp -rf /data/tombstones $WORK_DIR/
        TIME=$(date "+%Y%m%d_%H%M%S")
        echo "save bugreport to: $BUG_REPORT"
        bugreport > $BUG_REPORT
        exit 0
    fi
    # Number of open file descriptors (the count includes the "total" line of ls -l).
    NUM=$(ls -l /proc/$PID/fd | wc -l)
    echo -e $TIME_LABEL >> $PS
    ps >> $PS

    echo -e $TIME_LABEL >> $MEMINFO_DIR/${COUNT}_meminfo.txt
    dumpsys meminfo $1 >> $MEMINFO_DIR/${COUNT}_meminfo.txt
    echo -e $TIME_LABEL >> $SYSTEM_MEM_DIR/${COUNT}_sys_meminfo.txt
    cat /proc/meminfo >> $SYSTEM_MEM_DIR/${COUNT}_sys_meminfo.txt
    echo -e $TIME_LABEL >> $PROCRANK_DIR/${COUNT}_procrank.txt
    procrank >> $PROCRANK_DIR/${COUNT}_procrank.txt

    COUNT=$((COUNT + 1))
    sleep $INTERVAL
done

The problem reproduced after about two hours of testing. Comparing the information recorded by the script above, I found no file handle leak. The output of cat /proc/meminfo showed that the system's available memory had not dropped significantly. Comparing the ps output from different time periods, however, gave the following:

u0_a9     1070  217   2057984 379264 SyS_epoll_ b5fd37a4 S com.cnsg.card
u0_a9     1070  217   2663308 581404 binder_thr b5fd38e8 S com.cnsg.card
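
The growth is easier to judge as numbers. The following is a minimal sketch, assuming the legacy Android ps column layout seen above (field 4 is VSIZE and field 5 is RSS, both in kB); the function name vsize_rss_delta is made up for illustration.

```shell
#!/bin/sh
# Print how much VSIZE and RSS grew between two ps samples of one process.
# Assumption: field 4 is VSIZE and field 5 is RSS, both in kB (legacy Android ps).
vsize_rss_delta() {
    # $1: earlier ps line, $2: later ps line
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { vsize = $4; rss = $5 }
        NR == 2 { printf "VSIZE +%d kB, RSS +%d kB\n", $4 - vsize, $5 - rss }'
}

# The two samples recorded above.
vsize_rss_delta \
    "u0_a9     1070  217   2057984 379264 SyS_epoll_ b5fd37a4 S com.cnsg.card" \
    "u0_a9     1070  217   2663308 581404 binder_thr b5fd38e8 S com.cnsg.card"
```

For these two samples the virtual size grew by 605324 kB while the resident size grew by 202140 kB, so most of the growth is address space that is reserved but not resident.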

Comparing the procrank output over the same time periods, I found that:

The VSS of the process above keeps growing. Searching for information on this turned up the following:

VmRSS cannot be used to judge a memory leak; VmSize can.

A typical memory leak increases VmSize and VmRSS at the same time, so a leak can often be spotted by watching VmRSS (although it may miss memory that was only allocated, for example with malloc, and never touched). But growth in VmRSS alone does not mean there is a memory leak.

In fact, monitoring VmSize is the reasonable choice. The reasons are as follows:

VmSize is all of the process's memory (file mappings, shared memory, heap, and everything else, including what VmRSS counts). It does not normally change quickly: CH ___ MGR settled at 58824 K long ago, because without malloc/free activity VmSize does not go up.

VmRSS is the physical memory that is actually in use. It changes with what the application is doing, and that is reasonable.

As an analogy, VmSize is the total assets an official owns (the sum of all major holdings: bank deposits, the money kept at home, and so on).

VmRSS is the money that the official keeps at home.

If we want to find out whether this official has been embezzling money, we should track his total assets, that is, VmSize.

Obviously, it would be ridiculous to treat an official taking money out of the bank and putting it in the house as corruption.

So it is confirmed that a virtual memory leak has occurred. The next step is to find out what in this process is causing it. We did not have the application's source code, so the analysis reached an impasse. I kept looking for clues under /proc/1070/.

Under /proc/1070/, the status file records the process state as follows:

***/proc/1070 # cat status        
Name:   com.cnsg.card                    
State:  S (sleeping)                       
Tgid:   1070                              
Ngid:   0                                  
Pid:    1070                              
PPid:   343                                
TracerPid:      0                          
Uid:    10062   10062   10062   10062      
Gid:    10062   10062   10062   10062      
FDSize: 128                                
Groups: 3003 9997 50062                    
VmPeak:  3619748 kB                        
VmSize:  3531764 kB                        
VmLck:         0 kB                        
VmPin:         0 kB                        
VmHWM:    878692 kB                        
VmRSS:    487580 kB                        
VmData:  1317136 kB                        
VmStk:      8192 kB                        
VmExe:        16 kB                        
VmLib:    177524 kB                        
VmPTE:      3488 kB                        
VmPMD:        32 kB                        
VmSwap:        0 kB                        
Threads:        48                         
SigQ:   0/15299                            
SigPnd: 0000000000000000                   
ShdPnd: 0000000000000000                   
SigBlk: 0000000000001204                   
SigIgn: 0000000000000000                   
SigCgt: 20000002000084f8                   
CapInh: 0000000000000000                   
CapPrm: 0000000000000000                   
CapEff: 0000000000000000                   
CapBnd: 0000000000000000                   
CapAmb: 0000000000000000                   
Seccomp:        0                          
Cpus_allowed:   30                         
Cpus_allowed_list:      4-5                
Mems_allowed:   1                          
Mems_allowed_list:      0                  
voluntary_ctxt_switches:        619990     
nonvoluntary_ctxt_switches:     88065      

Explanation of the fields above:

The actual memory usage of a user process is recorded in the file /proc/{pid}/status.
    * VmSize:
             Virtual memory size.
             The total virtual memory used by the whole process. It includes VmLib, VmExe, VmData and VmStk plus other mappings (in the output above these four add up to less than VmSize).
    * VmLck:
             Locked virtual memory.
             The amount of the process's memory that is currently locked.
    * VmRSS:
             Resident set size.
             The part that is in physical memory and has not been swapped out to disk. It includes code, data and stack.
    * VmData:
             Virtual memory data.
             Virtual memory used by the heap.
    * VmStk:
             Virtual memory stack.
             Virtual memory used by the stack.
    * VmExe:
             Executable virtual memory.
             Virtual memory used by the executable and statically linked libraries.
    * VmLib:
             Virtual memory libraries.
             Virtual memory used by dynamically linked libraries.
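
These fields can also be pulled out of the status file one at a time and logged on every sample. A minimal sketch follows; the helper names status_field and status_summary are invented for illustration, and it assumes an awk is available on the device (for example through busybox).

```shell
#!/bin/sh
# Print the value of one field from a /proc/<pid>/status style file.
# $1: field name without the colon (for example VmSize), $2: path to the file.
status_field() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "$2"
}

# Print a one-line summary that can be appended to a log on every sample.
# $1: path to the status file.
status_summary() {
    printf '%s VmSize=%s kB VmRSS=%s kB Threads=%s\n' \
        "$(date '+%H:%M:%S')" \
        "$(status_field VmSize "$1")" \
        "$(status_field VmRSS "$1")" \
        "$(status_field Threads "$1")"
}

# Demo on the current shell's own status file, when /proc is available.
if [ -r "/proc/$$/status" ]; then
    status_summary "/proc/$$/status"
fi
```

On the device, the same function could be called in a loop with /proc/1070/status as its argument, so that VmSize, VmRSS and Threads end up side by side in one log.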

I then kept the test script running and watched this process's status at the same time. After a while I found that, besides the Vm fields, the Threads count kept growing, and it did not come down even when the test was paused. So it seems that the process keeps creating threads that are never destroyed. Next I wanted to know which threads were being created continuously. For that I used the pstree command:

busybox pstree 1070
                |-{Jit thread pool}
                |-{Profile Saver}
                |-{Signal Catcher}

Comparing the output from different periods shows that threads of one type, such as rxcachedthread, keep increasing. At this point the cause is clear: some code in the process keeps creating threads. Creating a thread requires memory, both virtual and physical (the log above shows a 1040 KB stack per thread). Here the virtual address space filled up, so pthread_create failed with the error shown at the start.
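
The comparison between periods can also be done on plain lists of thread names. The sketch below is a hedged example: it assumes the kernel exposes /proc/<pid>/task/<tid>/comm, that one snapshot is taken per period with something like cat /proc/1070/task/*/comm > snapshot.txt, and the function names and the sample counts in the demo are made up.

```shell
#!/bin/sh
# Read one thread name per line on stdin; print "count name", most frequent first.
count_thread_names() {
    sort | uniq -c | sort -rn | sed 's/^ *//'
}

# $1: earlier snapshot file, $2: later snapshot file (one thread name per line).
# Print every name whose count increased, with the old and new counts.
growing_thread_names() {
    awk '
        FILENAME == ARGV[1] { before[$0]++; next }
        { after[$0]++ }
        END {
            for (name in after)
                if (after[name] > before[name])
                    printf "%s: %d -> %d\n", name, before[name], after[name]
        }' "$1" "$2" | sort
}

# Demo with made-up snapshots; on the device they would come from
# /proc/<pid>/task/*/comm at two different times.
tmp=$(mktemp -d)
printf 'rxcachedthread\nSignal Catcher\n' > "$tmp/before.txt"
printf 'rxcachedthread\nrxcachedthread\nrxcachedthread\nSignal Catcher\n' > "$tmp/after.txt"
count_thread_names < "$tmp/after.txt"
growing_thread_names "$tmp/before.txt" "$tmp/after.txt"
rm -rf "$tmp"
```

Thread names that stay constant drop out of the second listing, which leaves only the names that are accumulating.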
