Issue: summarize_jobs.py Command Not Processing Jobs #231

Open
mauw10 opened this issue Jun 26, 2024 · 1 comment

mauw10 commented Jun 26, 2024

Hello Support Team,

I am encountering an issue with the summarize_jobs.py command on my CentOS 7 system. When I run the command:

[root@centos7 bin]# summarize_jobs.py -d

I receive the following output:

2024-06-26T14:14:13.600 [DEBUG] Using config file /usr/lib64/python2.7/site-packages/supremm-1.4.1-py2.7-linux-x86_64.egg/etc/supremm/config.json
2024-06-26T14:14:13.602 [DEBUG] Loaded 3 preprocessors
2024-06-26T14:14:13.605 [WARNING] Autoperiod library not found, TimeseriesPatterns plugins will not do period analysis
2024-06-26T14:14:13.606 [DEBUG] Loaded 35 plugins
2024-06-26T14:14:13.606 [INFO] Processing resource clusterbioproves
2024-06-26T14:14:13.606 [DEBUG] Using 3 preprocessors
2024-06-26T14:14:13.606 [DEBUG] Using 35 plugins
2024-06-26T14:14:13.612 [WARNING] /usr/lib64/python2.7/site-packages/pymongo/mongo_client.py:343: UserWarning: database name or authSource in URI is being ignored. If you wish to authenticate to supremm, you must provide a username and password.
"must provide a username and password." % (db_name,))
2024-06-26T14:14:13.639 [INFO] Processing 0 jobs
[root@centos7 bin]#

As you can see, it is not processing any jobs. However, when I run the indexarchives.py command:

[root@centos7 bin]# indexarchives.py -a -d

It processes the archives correctly, as shown below:

2024-06-26T14:16:39.331 [DEBUG] Using config file /usr/lib64/python2.7/site-packages/supremm-1.4.1-py2.7-linux-x86_64.egg/etc/supremm/config.json
2024-06-26T14:16:39.332 [INFO] archive indexer starting
2024-06-26T14:16:39.338 [DEBUG] processed archive /data/clusterbioproves/pmlogger/2024/06/mdrvpremst01/2024-06-25/20240625.10.55.index (fileio 0.00240302085876, dbacins 4.29153442383e-05)
2024-06-26T14:16:39.343 [DEBUG] processed archive /data/clusterbioproves/pmlogger/2024/06/mdrvpremst01/2024-06-25/20240625.11.13.index (fileio 0.00458288192749, dbacins 1.50203704834e-05)
2024-06-26T14:16:39.344 [DEBUG] processed archive /data/clusterbioproves/pmlogger/2024/06/mdrvpremst01/2024-06-25/job--begin-20240625.13.56.39.index (fileio 0.000778913497925, dbacins 8.89301300049e-05)
2024-06-26T14:16:39.346 [DEBUG] processed archive /data/clusterbioproves/pmlogger/2024/06/mdrvpremst01/2024-06-25/job--end-20240625.13.56.37.index (fileio 0.00105690956116, dbacins 1.31130218506e-05)
2024-06-26T14:16:39.346 [DEBUG] processed archive /data/clusterbioproves/pmlogger/2024/06/mdrvpremst01/2024-06-26/20240626.00.10.index (fileio 0.000596046447754, dbacins 8.82148742676e-06)
2024-06-26T14:16:39.379 [INFO] archive indexer complete
[root@centos7 bin]#

The directory contains the start and end job files:

[root@centos7 2024-06-25]# ls -l
total 2696
-rw-rw-r--. 1 centos centos 4492 jun 25 11:13 20240625.10.55.0.xz
-rw-rw-r--. 1 centos centos 252 jun 25 11:13 20240625.10.55.index
-rw-rw-r--. 1 centos centos 13584 jun 25 11:11 20240625.10.55.meta.xz
-rw-rw-r--. 1 centos centos 2336104 jun 26 00:10 20240625.11.13.0
-rw-rw-r--. 1 centos centos 792 jun 26 00:10 20240625.11.13.index
-rw-rw-r--. 1 centos centos 116479 jun 25 18:53 20240625.11.13.meta
-rw-rw-r--. 1 centos centos 29200 jun 25 13:56 job--begin-20240625.13.56.39.0
-rw-rw-r--. 1 centos centos 252 jun 25 13:56 job--begin-20240625.13.56.39.index
-rw-rw-r--. 1 centos centos 76596 jun 25 13:56 job--begin-20240625.13.56.39.meta
-rw-rw-r--. 1 centos centos 23080 jun 25 13:56 job--end-20240625.13.56.37.0
-rw-rw-r--. 1 centos centos 232 jun 25 13:56 job--end-20240625.13.56.37.index
-rw-rw-r--. 1 centos centos 76596 jun 25 13:56 job--end-20240625.13.56.37.meta
-rw-rw-r--. 1 centos centos 29167 jun 26 00:10 pmlogger.log
-rw-rw-r--. 1 centos centos 15565 jun 25 11:13 pmlogger.log.prior
[root@centos7 2024-06-25]# pwd
/data/clusterbioproves/pmlogger/2024/06/mdrvpremst01/2024-06-25
[root@centos7 2024-06-25]#
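A PCP archive is made up of an .index file, a .meta file, and one or more numbered data volumes (.0, .1, ...), any of which may be compressed. As a quick sanity check on a listing like the one above, here is a minimal sketch, assuming Python 3 (the SUPReMM install in this report runs on Python 2.7). The function names are hypothetical helpers written for this example, not part of SUPReMM or PCP, and only the .xz suffix seen in the listing is handled.

```python
from collections import defaultdict

# Assumption: only the compression suffix that appears in the listing above.
COMPRESSION_SUFFIXES = (".xz",)
REQUIRED_PARTS = {"index", "meta", "volume"}


def group_archive_files(filenames):
    """Map each archive base name to the set of parts found for it."""
    archives = defaultdict(set)
    for name in filenames:
        # Strip a compression suffix so "x.meta.xz" is treated like "x.meta".
        for suffix in COMPRESSION_SUFFIXES:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
        base, _, part = name.rpartition(".")
        if not base:
            continue
        if part.isdigit():
            archives[base].add("volume")
        elif part in ("index", "meta"):
            archives[base].add(part)
        # Anything else (for example pmlogger.log) is not an archive file.
    return dict(archives)


def incomplete_archives(filenames):
    """Return the base names that lack an index, meta, or data volume."""
    groups = group_archive_files(filenames)
    return sorted(b for b, parts in groups.items() if parts != REQUIRED_PARTS)


listing = [
    "20240625.10.55.0.xz", "20240625.10.55.index", "20240625.10.55.meta.xz",
    "20240625.11.13.0", "20240625.11.13.index", "20240625.11.13.meta",
    "job--begin-20240625.13.56.39.0", "job--begin-20240625.13.56.39.index",
    "job--begin-20240625.13.56.39.meta",
    "job--end-20240625.13.56.37.0", "job--end-20240625.13.56.37.index",
    "job--end-20240625.13.56.37.meta",
    "pmlogger.log", "pmlogger.log.prior",
]
print(incomplete_archives(listing))
```

This only checks that the files are present on disk. It says nothing about whether the indexer matched them to a job.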

After the initial job ingestion, I ran indexarchives.py -a -d followed by summarize_jobs.py -d, and data was added to the supremm database in MongoDB. The output of summarize_jobs.py was:

[root@centos7 shm]# summarize_jobs.py -d
2024-06-26T14:00:20.480 [DEBUG] Using config file /usr/lib64/python2.7/site-packages/supremm-1.4.1-py2.7-linux-x86_64.egg/etc/supremm/config.json
2024-06-26T14:00:20.482 [DEBUG] Loaded 3 preprocessors
2024-06-26T14:00:20.494 [WARNING] Autoperiod library not found, TimeseriesPatterns plugins will not do period analysis
2024-06-26T14:00:20.495 [DEBUG] Loaded 35 plugins
2024-06-26T14:00:20.496 [INFO] Processing resource clusterbioproves
2024-06-26T14:00:20.496 [DEBUG] Using 3 preprocessors
2024-06-26T14:00:20.496 [DEBUG] Using 35 plugins
2024-06-26T14:00:20.507 [WARNING] /usr/lib64/python2.7/site-packages/pymongo/mongo_client.py:343: UserWarning: database name or authSource in URI is being ignored. If you wish to authenticate to supremm, you must provide a username and password.
"must provide a username and password." % (db_name,))
2024-06-26T14:00:20.544 [INFO] Processing 7 jobs
2024-06-26T14:00:20.549 [INFO] Skipping 1, skipped_noarchives
2024-06-26T14:00:20.623 [INFO] Skipping 2, skipped_noarchives
2024-06-26T14:00:20.644 [INFO] Skipping 3, skipped_noarchives
2024-06-26T14:00:20.650 [INFO] Skipping 4, skipped_noarchives
2024-06-26T14:00:20.655 [INFO] Skipping 5, skipped_noarchives
2024-06-26T14:00:20.660 [INFO] Skipping 6, skipped_noarchives
2024-06-26T14:00:20.664 [INFO] Skipping 7, skipped_noarchives
[root@centos7 shm]#
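To see at a glance why jobs were skipped in a longer run, the skip reasons in output like the above can be tallied. This is a minimal sketch, assuming Python 3 and the exact "Skipping <job>, <reason>" line format shown in this log. The function name is a hypothetical helper, not part of SUPReMM.

```python
import re
from collections import Counter

# Matches lines such as:
# 2024-06-26T14:00:20.549 [INFO] Skipping 1, skipped_noarchives
SKIP_LINE = re.compile(r"\[INFO\] Skipping (?P<job>\S+), (?P<reason>\w+)\s*$")


def tally_skip_reasons(log_text):
    """Count how many jobs were skipped for each reason in the log text."""
    reasons = Counter()
    for line in log_text.splitlines():
        match = SKIP_LINE.search(line)
        if match:
            reasons[match.group("reason")] += 1
    return reasons


sample = """\
2024-06-26T14:00:20.544 [INFO] Processing 7 jobs
2024-06-26T14:00:20.549 [INFO] Skipping 1, skipped_noarchives
2024-06-26T14:00:20.623 [INFO] Skipping 2, skipped_noarchives
"""
print(tally_skip_reasons(sample))
```

In the run above, all 7 jobs fall under skipped_noarchives.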

The problem is that most metrics are not displayed. For example, Avg: Total Memory: Per Core weighted by core-hour and Avg CPU %: System: weighted by core-hour do not appear; only a few metrics are shown.
The initial ingestion has already been performed, and jobs are displayed in the XDMoD interface.

Please advise on why summarize_jobs.py is not processing the jobs and how to resolve this issue.

mauw10 commented Jun 26, 2024

Additionally, the PCP files are created and contain information. For instance, querying the PCP archive job--end-20240625.13.56.37.0 with pmdumplog gives:

sysadmin@mdrvpremst01:/data/clusterbioproves/pmlogger/2024/06/mdrvpremst01/2024-06-25$ pmdumplog -a job--end-20240625.13.56.37.0 :
60.1.15 (mem.util.inactive): value 1588336
60.1.14 (mem.util.active): value 928364
60.1.13 (mem.util.swapCached): value 0
60.1.12 (mem.util.other): value 1477032
60.1.11 (hinv.pagesize): value 4096
60.1.10 (mem.freemem): value 360708
60.1.9 (hinv.physmem): value 3906
60.1.8 (swap.free): value 4088393728
60.1.5 (mem.util.cached): value 1800244
60.1.4 (mem.util.bufmem): value 362284
60.1.3 (mem.util.shared): No values returned!
60.1.2 (mem.util.free): value 360708
60.1.1 (mem.util.used): value 3639560
60.1.0 (mem.physmem): value 4000268
60.0.75 (disk.all.write_rawactive): value 117145270
60.0.74 (disk.all.read_rawactive): value 45351
60.0.73 (disk.dev.write_rawactive): inst [0 or "sda"] value 117145270
60.0.72 (disk.dev.read_rawactive): inst [0 or "sda"] value 45351
60.0.57 (kernel.percpu.cpu.irq.hard): inst [0 or "cpu0"] value 0
60.0.56 (kernel.percpu.cpu.irq.soft): inst [0 or "cpu0"] value 5055140
60.0.54 (kernel.all.cpu.irq.hard): value 0
60.0.53 (kernel.all.cpu.irq.soft): value 5055140
60.0.52 (disk.all.write_merge): value 954208
60.0.51 (disk.all.read_merge): value 8792
60.0.50 (disk.dev.write_merge): inst [0 or "sda"] value 954208
60.0.49 (disk.dev.read_merge): inst [0 or "sda"] value 8792
60.0.47 (disk.dev.aveq): inst [0 or "sda"] value 119078932
60.0.46 (disk.dev.avactive): inst [0 or "sda"] value 7795644
60.0.45 (disk.all.aveq): value 119078932
60.0.44 (disk.all.avactive): value 7795644
60.0.42 (disk.all.write_bytes): value 17907401
60.0.41 (disk.all.read_bytes): value 812463
60.0.39 (disk.dev.write_bytes): inst [0 or "sda"] value 17907401
60.0.38 (disk.dev.read_bytes): inst [0 or "sda"] value 812463
60.0.35 (kernel.all.cpu.wait.total): value 5538350
60.0.34 (kernel.all.cpu.intr): value 5055140
60.0.33 (hinv.ndisk): value 1
60.0.32 (hinv.ncpu): value 1
60.0.31 (kernel.percpu.cpu.intr): inst [0 or "cpu0"] value 5055140
60.0.30 (kernel.percpu.cpu.wait.total): inst [0 or "cpu0"] value 5538350
60.0.29 (disk.all.total): value 2046929
60.0.28 (disk.dev.total): inst [0 or "sda"] value 2046929
60.0.25 (disk.all.write): value 2012993
60.0.24 (disk.all.read): value 33936
60.0.23 (kernel.all.cpu.idle): value 1032698290
60.0.22 (kernel.all.cpu.sys): value 2384580
60.0.21 (kernel.all.cpu.nice): value 71920
60.0.20 (kernel.all.cpu.user): value 3572700
60.0.9 (swap.pagesout): value 0
60.0.8 (swap.pagesin): value 0
60.0.5 (disk.dev.write): inst [0 or "sda"] value 2012993
60.0.4 (disk.dev.read): inst [0 or "sda"] value 33936
60.0.3 (kernel.percpu.cpu.idle): inst [0 or "cpu0"] value 1032698290
60.0.2 (kernel.percpu.cpu.sys): inst [0 or "cpu0"] value 2384580
60.0.1 (kernel.percpu.cpu.nice): inst [0 or "cpu0"] value 71920
60.0.0 (kernel.percpu.cpu.user): inst [0 or "cpu0"] value 3572700

[256 bytes]
13:56:37.156456 5 metrics
2.3.3 (pmcd.pmlogger.host): inst [483049 or "483049"] value "mdrvpremst01.vhio.org"
2.3.0 (pmcd.pmlogger.port): inst [483049 or "483049"] value 4331
2.3.2 (pmcd.pmlogger.archive): inst [483049 or "483049"] value "/data/clusterbioproves/pmlogger/2024/06/mdrvpremst01/2024-06-25/job--end-20240625.13.56.37"
2.0.23 (pmcd.pid): value 474114
2.0.24 (pmcd.seqnum): value 14
This PCP log contains detailed system performance and resource utilization metrics recorded at the job end time.
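To confirm that the dumped values are usable, lines in the format shown above can be parsed and spot-checked. This is a minimal sketch, assuming Python 3 and that each metric line follows the "<pmid> (<name>): [inst [...]] value <v>" layout seen in this dump. The function name is a hypothetical helper, and the regular expression was written against this output only.

```python
import re

# Matches e.g.:
#   60.1.1 (mem.util.used): value 3639560
#   60.0.4 (disk.dev.read): inst [0 or "sda"] value 33936
# Lines such as "No values returned!" do not match and are skipped.
METRIC_LINE = re.compile(
    r'^\s*\d+\.\d+\.\d+ \((?P<metric>[^)]+)\): '
    r'(?:inst \[\d+ or "(?P<inst>[^"]*)"\] )?value (?P<value>.+?)\s*$'
)


def parse_metric_lines(text):
    """Return {(metric_name, instance_or_None): value_string}."""
    values = {}
    for line in text.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            key = (match.group("metric"), match.group("inst"))
            values[key] = match.group("value")
    return values


sample = '''\
60.1.3 (mem.util.shared): No values returned!
60.1.1 (mem.util.used): value 3639560
60.1.0 (mem.physmem): value 4000268
60.0.4 (disk.dev.read): inst [0 or "sda"] value 33936
'''
metrics = parse_metric_lines(sample)
used = int(metrics[("mem.util.used", None)])
total = int(metrics[("mem.physmem", None)])
used_fraction = used / total
print(f"memory used fraction: {used_fraction:.3f}")
```

The sample lines are copied from the dump above. The ratio of mem.util.used to mem.physmem is only an illustrative check that the archive holds plausible memory data.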
