Contents
Background
Steps
Finding the PID that writes a file
Which process is using a file and writing data to it? lsof ought to be able to tell you, but in practice lsof came up empty T_T
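Background
On a CentOS 7 server, monitoring raised an alert that disk utilization had reached 99% during a certain period. The monitoring only provides summary information, with no snapshots that would reveal the I/O and CPU consumption of individual processes, so the statistics commands have to be run on the server at a scheduled time to capture snapshots. The relevant commands are iostat -dx -k for avgqu-sz, await, svctm and %util; sar -u for %iowait and %user; and pidstat -d for a per-process snapshot of I/O reads and writes.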
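Steps
Generate the statistics files
# The quoted 'EOF' keeps the backticks literal, so date runs when the script executes rather than when the file is written
cat>/tmp/at_task.sh<<'EOF'
pidstat -d 2 >/tmp/pidstat_`date +%F_%T`.log 2>&1 &
sar -u 2 >/tmp/sar_`date +%F_%T`.log 2>&1 &
while [ 1 ];do echo -n `date +%T` >>/tmp/iostat_`date +%F` 2>&1 && iostat -dx -k 1 1 >>/tmp/iostat_`date +%F` 2>&1; sleep 2; done &
EOF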
iostat runs inside a while loop so that each sample can be prefixed with the time from date +%T; data without timestamps would not be of much use.
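One alternative to the echo -n trick is to pipe the sampler's output through a small filter that stamps every line. The sketch below is only an illustration: stamp_lines is a hypothetical helper name, not part of sysstat, and output arriving through a pipe may be buffered, so the stamp reflects when a line reaches the filter rather than when it was produced. sysstat's iostat also accepts -t, which prints the time with each report and may be simpler.

```shell
# stamp_lines: hypothetical helper that prefixes each line read from
# stdin with the current wall-clock time (HH:MM:SS).
stamp_lines() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date +%T)" "$line"
    done
}

# Assumed usage, mirroring the loop above:
#   iostat -dx -k 1 1 | stamp_lines >> "/tmp/iostat_$(date +%F)"
```

Every line then carries its own time, which makes the file easier to grep than one time string glued onto the first line of each report.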
Schedule it with the at command
at 15:14 today -f /tmp/at_task.sh
This failed with an error
Can't open /var/run/atd.pid to signal atd. No atd running?
Restart the atd service
service atd restart
Schedule the at job again
at 15:14 today -f /tmp/at_task.sh
job 2 at Wed Mar 13 15:14:00 2019
This produced the following snapshots
iostat
1535Linux 3.10.0-862.14.4.el7.x86_64 (ip-xxxxx) 03/13/2019 _x86_64_ (4 CPU)

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s  wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
vda        0.12    0.07  17.31  19.41  580.79  90.52    36.57     0.09   2.39    4.42    0.57   0.72   2.63
scd0       0.00    0.00   0.00   0.00    0.00   0.00     6.00     0.00   0.28    0.28    0.00   0.25   0.00
sar
0300 PM   CPU   %user   %nice  %system  %iowait  %steal   %idle
0302 PM   all    0.25    0.00     0.38     0.00    0.00   99.37
0304 PM   all    1.25    0.13     0.63     0.00    0.00   97.99
0306 PM   all    0.25    0.13     0.50     0.00    0.00   99.12
0308 PM   all    0.50    0.00     0.50     0.63    0.00   98.37
pidstat
0300 PM    UID     PID  kB_rd/s  kB_wr/s  kB_ccwr/s  Command
0302 PM   5700    9089     0.00     6.00       0.00  uxxx
0302 PM   5700    9140     0.00     6.00       0.00  uxxx
0302 PM   5700    9292     0.00    10.00       0.00  uxxx
0302 PM      0   18084     0.00     2.00       0.00  bash
Kill the collection commands
ps -ef | egrep 'iostat|sar|pidstat|while' | grep -v grep | awk '{print $2}' | xargs -l kill
However, ps -ef | egrep did not pick up the PID of the while loop, and unless that loop is killed it keeps writing to /tmp/iostat_2019-03-13 forever -_-
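A likely reason the egrep pattern misses the loop: a backgrounded while loop runs in a forked copy of the shell, so ps lists it under the shell's name (for an at job, typically sh) and the word while never appears in the process list. One way to avoid hunting for it later is to record the loop's PID when it starts. A minimal sketch, in which start_loop, stop_loop and the PID file are hypothetical names chosen for illustration:

```shell
# start_loop: run a placeholder loop in the background and save its PID.
# The real sampler would go where `sleep 1` is.
start_loop() {
    local pid_file="$1"
    while true; do sleep 1; done >/dev/null 2>&1 &
    echo "$!" > "$pid_file"          # $! is the PID of the backgrounded loop
}

# stop_loop: kill the loop recorded in the PID file and clean up.
stop_loop() {
    local pid_file="$1" pid
    pid=$(cat "$pid_file") || return 1
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true  # reap it if it is a child of this shell
    rm -f "$pid_file"
}

# Assumed usage:
#   start_loop /tmp/iostat_loop.pid
#   ...
#   stop_loop /tmp/iostat_loop.pid
```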
lsof did not find any process holding the file open
lsof /tmp/iostat_2019-03-13
[root@ip-10-186-60-117 ~]#
[root@ip-10-186-60-117 ~]#
lsof does find the process holding mysql-error.log open
lsof /opt/mysql/data/5690/mysql-error.log
COMMAND   PID                USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
mysqld  12858 actiontech-universe   1w    REG  253,1     6345 20083533 /opt/mysql/data/5690/mysql-error.log
mysqld  12858 actiontech-universe   2w    REG  253,1     6345 20083533 /opt/mysql/data/5690/mysql-error.log
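So lsof can only show which processes are using a file if some process keeps that file open the whole time. The loop here opens the file only for the brief moment of each >> write, so lsof usually finds nothing.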
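On Linux the open files of each process are visible as symlinks under /proc/<pid>/fd, which gives a rough stand-in for this check. The sketch below assumes a Linux /proc and enough privileges to read other processes' fd directories; holders_of is a hypothetical helper name, not an lsof feature.

```shell
# holders_of: print the PIDs that currently have the given file open,
# by scanning the fd symlinks under /proc (Linux only).
holders_of() {
    local target fd pid
    target=$(readlink -f "$1") || return 1
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            pid=${fd#/proc/}         # strip the leading "/proc/"
            echo "${pid%%/*}"        # keep only the PID component
        fi
    done | sort -un
}

# Assumed usage:
#   holders_of /opt/mysql/data/5690/mysql-error.log
```

A file that is opened, appended to and closed within a few milliseconds will almost never be caught by a scan like this, which matches what lsof showed for the iostat log.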
Finding the PID that writes the file
Install SystemTap
yum -y install systemtap
SystemTap is a tool for monitoring and tracing the Linux kernel.
Use the inodewatch.stp example script that ships with SystemTap to find the PID writing the file.
Get the file's inode
stat -c '%i' /tmp/iostat_2019-03-13
4210339
Get the major and minor numbers of the device that holds the file
ls -al /dev/vda1
brw-rw---- 1 root disk 253, 1 Jan 30 13:57 /dev/vda1
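The same pair can be derived from the file itself, without knowing which block device it lives on: stat -c '%d' prints the device number in decimal (64769, i.e. fd01h, for this file). The sketch below decodes that number and also shows how the pair maps to the 0xfd00001 value that inodewatch.stp prints. Both function names are hypothetical, and the decoding assumes the glibc device-number layout with a major number below 4096.

```shell
# dev_major_minor: split a device number, as printed by `stat -c '%d'`,
# into "major minor".
dev_major_minor() {
    local dev="$1"
    echo "$(( (dev >> 8) & 0xfff )) $(( (dev & 0xff) | ((dev >> 12) & 0xfff00) ))"
}

# kernel_dev_hex: inside the kernel the pair is packed as
# (major << 20) | minor, which is the form the stap script reports.
kernel_dev_hex() {
    printf '0x%x\n' "$(( ($1 << 20) | $2 ))"
}

# Assumed usage:
#   dev_major_minor "$(stat -c '%d' /tmp/iostat_2019-03-13)"
#   kernel_dev_hex 253 1
```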
Get the PID writing the file
stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
Checking "/lib/modules/3.10.0-862.14.4.el7.x86_64/build/.config" failed with error: No such file or directory
Incorrect version or missing kernel-devel package, use: yum install kernel-devel-3.10.0-862.14.4.el7.x86_64
Download the kernel-devel package that matches the running kernel version from the "kernel-devel rpm build for : Scientific Linux 7" page
wget ftp://ftp.pbone.net/mirror/ftp.scientificlinux.org/linux/scientific/7.2/x86_64/updates/security/kernel-devel-3.10.0-862.14.4.el7.x86_64.rpm
rpm -ivh kernel-devel-3.10.0-862.14.4.el7.x86_64.rpm
Run stap again
stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
......
Missing separate debuginfos, use: debuginfo-install kernel-3.10.0-862.14.4.el7.x86_64
Pass 2: analysis failed. [man error::pass2]
Number of similar error messages suppressed: 2.
Install the kernel debuginfo packages
debuginfo-install kernel-3.10.0-862.14.4.el7.x86_64
Verifying : kernel-debuginfo-common-x86_64-3.10.0-862.14.4.el7.x86_64 1/3
Verifying : yum-plugin-auto-update-debug-info-1.1.31-50.el7.noarch 2/3
Verifying : kernel-debuginfo-3.10.0-862.14.4.el7.x86_64 3/3
Installed:
kernel-debuginfo.x86_64 0:3.10.0-862.14.4.el7 yum-plugin-auto-update-debug-info.noarch 0:1.1.31-50.el7
Dependency Installed:
kernel-debuginfo-common-x86_64.x86_64 0:3.10.0-862.14.4.el7
Run stap again
stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
ERROR: module version mismatch (#1 SMP Tue Sep 25 1452 CDT 2018 vs #1 SMP Wed Sep 26 1511 UTC 2018), release 3.10.0-862.14.4.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Add -v to see the detailed error
stap -v /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
Pass 1: parsed user script and 471 library scripts using 240276virt/41896res/3368shr/38600data kb, in 300usr/20sys/320real ms.
Pass 2: analyzed script: 2 probes, 12 functions, 8 embeds, 0 globals using 399436virt/196284res/4744shr/197760data kb, in 1540usr/560sys/2106real ms.
Pass 3: using cached /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030.c
Pass 4: using cached /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030.ko
Pass 5: starting run.
ERROR: module version mismatch (#1 SMP Tue Sep 25 1452 CDT 2018 vs #1 SMP Wed Sep 26 1511 UTC 2018), release 3.10.0-862.14.4.el7.x86_64
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run completed in 0usr/10sys/38real ms.
Pass 5: run failed. [man error::pass5]
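Fix: the downloaded kernel-devel package evidently comes from a different build than the running kernel, so its recorded build string differs. Edit the header so the string matches the one the running kernel reports, then delete the cached module
vim /usr/src/kernels/3.10.0-862.14.4.el7.x86_64/include/generated/compile.h
#define UTS_VERSION "#1 SMP Tue Sep 25 1452 CDT 2018"
change to
#define UTS_VERSION "#1 SMP Wed Sep 26 1511 UTC 2018"
rm -rf /root/.systemtap/cache/f5/stap_f5c0cd780e8a2cac973c9e3ee69fba0c_7030*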
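Before editing the header by hand, it may be worth confirming that the build string really is what differs. uname -v prints the running kernel's version string, which is what stap compares against UTS_VERSION. A small sketch, with uts_version_of and check_uts_version as hypothetical helper names:

```shell
# uts_version_of: extract the UTS_VERSION string from a compile.h file.
uts_version_of() {
    sed -n 's/^#define UTS_VERSION "\(.*\)"$/\1/p' "$1"
}

# check_uts_version: compare the header's string with the running kernel
# (or with an explicit string passed as the second argument).
check_uts_version() {
    local header running built
    header="$1"
    running="${2:-$(uname -v)}"
    built=$(uts_version_of "$header")
    if [ "$built" = "$running" ]; then
        echo "match"
    else
        echo "mismatch: header has '$built', running kernel reports '$running'"
        return 1
    fi
}

# Assumed usage:
#   check_uts_version "/usr/src/kernels/$(uname -r)/include/generated/compile.h"
```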
Run it again
stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4671) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4677) vfs_write 0xfd00001/4210339
iostat(4683) vfs_write 0xfd00001/4210339
............
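With this much repetition it is easier to read the trace as a count per process. The following is a sketch that assumes each trace line starts with name(pid) as in the output above; summarize_writers is a hypothetical helper name.

```shell
# summarize_writers: read inodewatch.stp output on stdin and print
# "name pid count" for each distinct writer, ordered by PID.
summarize_writers() {
    awk '$1 ~ /^[^()]+\([0-9]+\)$/ {
        split($1, part, /[()]/)      # "iostat(4671)" -> "iostat", "4671"
        count[part[1] " " part[2]]++
    }
    END { for (key in count) print key, count[key] }' | sort -k2,2n
}

# Assumed usage (stop stap with Ctrl-C when enough has been captured):
#   stap /usr/share/systemtap/examples/io/inodewatch.stp 253 1 4210339 | summarize_writers
```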
This gives us the PID writing to /tmp/iostat_`date +%F`, but new PIDs keep being printed: the background iostat -dx -k runs inside the while loop, so after every sleep 2 a fresh iostat is started with a new PID.
So how can iostat -dx -k be stopped from writing to /tmp/iostat_`date +%F`, short of the trusty reboot? $_$
rm -rf does not stop the background while/iostat loop either: once the file is deleted, the loop simply creates a new one.
rm -rf /tmp/iostat_2019-03-1*
stat /tmp/iostat_2019-03-1*
  File: ‘/tmp/iostat_2019-03-13’
  Size: 146700     Blocks: 512     IO Block: 4096     regular file
Device: fd01h/64769d     Inode: 4210339     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 0/ root)  Gid: ( 0/ root)
Access: 2019-03-14 1626.211888899 +0800
Modify: 2019-03-14 1617.854019793 +0800
Change: 2019-03-14 1617.854019793 +0800
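Because the loop spends nearly all of its time in sleep 2, the parent PID of that sleep process should be the shell running the loop, and killing that shell stops the writes. A sketch under that assumption: find_loop_shells is a hypothetical helper, and any unrelated sleep 2 on the machine would match as well, so the result should be checked before anything is killed.

```shell
# find_loop_shells: read `ps -eo pid,ppid,args` output on stdin and print
# the parent PIDs of processes whose full command line equals $1.
find_loop_shells() {
    awk -v want="$1" 'NR > 1 {
        args = $3
        for (i = 4; i <= NF; i++) args = args " " $i
        if (args == want) print $2   # column 2 is the parent PID
    }' | sort -un
}

# Assumed usage:
#   ps -eo pid,ppid,args | find_loop_shells "sleep 2"
#   kill <pid printed above>
```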
The right way
cat>/tmp/iostat.sh<<'EOF'
while [ 1 ];do echo -n `date +%T` >>/tmp/iostat_`date +%F` 2>&1 && iostat -dx -m 1 1 >>/tmp/iostat_`date +%F` 2>&1; sleep 2; done &
EOF
at now + 1 minute today
bash /tmp/iostat.sh
# This way the PID is easy to find
ps -ef | grep iostat
root 8593 1 0 16:16 pts/2 00:00:00 bash /tmp/iostat.sh
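Another option, sketched here as an assumption rather than something the original setup used, is to give the loop an explicit off switch: it checks for a stop file on every pass and exits once the file appears. run_until_stopped and the stop file path are hypothetical names.

```shell
# run_until_stopped: run the given command repeatedly, pausing `interval`
# seconds between runs, until `stop_file` exists.
run_until_stopped() {
    local stop_file="$1" interval="$2"
    shift 2
    while [ ! -e "$stop_file" ]; do
        "$@"
        sleep "$interval"
    done
}

# Assumed usage:
#   run_until_stopped /tmp/iostat.stop 2 \
#       sh -c 'iostat -dx -m 1 1 >> "/tmp/iostat_$(date +%F)"' &
#   touch /tmp/iostat.stop     # the loop exits after its current pass
```

With this shape, stopping the collection no longer depends on finding a PID at all.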
Reviewed and edited by: 湯梓紅