JobMon
Interactive GRID Job Monitoring

Authors:

Shih-Chieh Hsu
Elliot Lipeles
Conrad Steenberg
Frank Wuerthwein

Introduction

When a user submits a GRID job, many things can go wrong even after the job starts. For example, resources such as data handling systems and databases can be broken or inaccessible, or the user can make simple errors in assembling the job. To detect and diagnose problems quickly and efficiently, users need to be able to access the running job before it completes. Simple tasks, such as reading log files in real time or seeing when the job is consuming CPU, can be of great value. The JobMon system implements a secure, authenticated method for users to access running GRID jobs. It is a generalization of the tools originally constructed for the CDF experiment. The general design of the JobMon system is driven by two major factors: locating and establishing communication with the job, and authenticating this connection. Authentication and security are particularly important because the user will be able to execute commands and access data on the job's computing resources.

System Design

Code runs in three separate places during an individual transaction. First, there is the persistent Clarens web service, generally located at the execution site. Second, the job runs a jobmond process, which handles the communication and executes the commands on the worker node; the jobmond persists for as long as the job is executing. Finally, there is the client code, which a user executes when they want to interact with the remote job; it persists only until the transaction has completed.
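The division of roles described above can be illustrated with a small in-process sketch. This is not the JobMon or Clarens API: the names SiteService, JobMond, client_transaction, and the tail_log command are hypothetical, and the ownership check is only a placeholder for real authentication. It shows the different lifetimes of the three pieces and the two design concerns (locating the job, and checking who is asking).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class JobMond:
    """Hypothetical stand-in for the per-job daemon on the worker node."""
    job_id: str
    # Illustrative command table; the real command set is an assumption here.
    commands: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def execute(self, command: str) -> str:
        if command not in self.commands:
            raise ValueError(f"unsupported command: {command}")
        return self.commands[command]()


class SiteService:
    """Hypothetical stand-in for the persistent service at the execution site."""

    def __init__(self) -> None:
        self._daemons: Dict[str, JobMond] = {}
        self._owners: Dict[str, str] = {}

    def register(self, daemon: JobMond, owner: str) -> None:
        # Done when the job starts, so the job can be located later.
        self._daemons[daemon.job_id] = daemon
        self._owners[daemon.job_id] = owner

    def unregister(self, job_id: str) -> None:
        # Done when the job ends; the daemon lives only as long as the job.
        self._daemons.pop(job_id, None)
        self._owners.pop(job_id, None)

    def forward(self, user: str, job_id: str, command: str) -> str:
        if job_id not in self._daemons:
            raise LookupError(f"no running job: {job_id}")
        # Placeholder check only; a real system would verify credentials.
        if self._owners[job_id] != user:
            raise PermissionError(f"{user} may not access {job_id}")
        return self._daemons[job_id].execute(command)


def client_transaction(service: SiteService, user: str,
                       job_id: str, command: str) -> str:
    """Short-lived client call: one request, one reply, then done."""
    return service.forward(user, job_id, command)


if __name__ == "__main__":
    service = SiteService()
    daemon = JobMond("job-1", {"tail_log": lambda: "last log line"})
    service.register(daemon, owner="alice")
    print(client_transaction(service, "alice", "job-1", "tail_log"))
    service.unregister("job-1")
```

In this sketch the service outlives every job, each daemon is registered only while its job runs, and the client holds no state between calls, which mirrors the three lifetimes described in the paragraph.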

Documentation

Design Overview [ps] [pdf] [html]

Software

Installation [html]
Usage [html]
Download
- JobMon can be obtained from SourceForge CVS or through a VDT-1.3.7 installation
- Source tarball for the JobMon server code: tar.gz
- Source tarball for the JobMon client code: tar.gz

News

Nov 2, 2005

OSG Readiness Plan JobMon ReadinessPlan

Oct, 2005

VDT 1.3.7 Integration

July 5th, 2005

Now hosted on SourceForge

Modified on Tue Jul 5 10:56:07 CDT 2005 by E. Lipeles