Introduction
When a user submits a GRID job, a large number of things can go wrong even
after the job starts. For example, resources such as data handling
systems and databases can be broken or inaccessible, or the user can
make simple errors in the assembly of the job. In order to quickly
and efficiently detect and diagnose problems, users need to be able
to access the running job before completion. Simple tasks, such as
being able to read log files in real time or to see when the job is
consuming CPU, can be of great value. The JobMon system implements a
secure and authenticated method for users to access running GRID jobs.
It is a generalization of the tools originally constructed for the
CDF experiment.
The general design of the JobMon system is driven by two
major factors: locating and establishing communication
with the job, and authenticating this connection.
Authentication and security are particularly important because the
user will be able to execute commands and access data on the
job's computing resources.
System Design
Code runs in three separate places during an individual
transaction. First, there is the persistent Clarens web service,
generally located at the execution site. Second, the job runs a
jobmond process, which handles the communication and executes the
commands on the worker node. The jobmond persists for as long as the
job is executing. Finally, there is the client code, which a user
executes when they want to interact with the remote job. The client
persists only until the transaction has completed.
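
The division of roles described above can be illustrated with a small in-process sketch. It is not the JobMon implementation: the class and function names (RelayService, JobDaemon, client_request), the "tail" command, and the owner-string check are all hypothetical, and plain Python method calls stand in for the authenticated web-service communication. It only shows which component lives how long and who talks to whom.

```python
from typing import Callable, Dict, List, Tuple

# A handler takes a command name and its arguments and returns text output.
Handler = Callable[[str, List[str]], str]


class RelayService:
    """Hypothetical stand-in for the persistent service at the execution site."""

    def __init__(self) -> None:
        # job id -> (owner, handler exposed by the job-side daemon)
        self._jobs: Dict[str, Tuple[str, Handler]] = {}

    def register(self, job_id: str, owner: str, handler: Handler) -> None:
        self._jobs[job_id] = (owner, handler)

    def unregister(self, job_id: str) -> None:
        self._jobs.pop(job_id, None)

    def forward(self, user: str, job_id: str, command: str, args: List[str]) -> str:
        if job_id not in self._jobs:
            raise LookupError(f"no running job {job_id!r}")
        owner, handler = self._jobs[job_id]
        # Assumption: a bare name comparison replaces real authentication here.
        if user != owner:
            raise PermissionError(f"{user!r} may not access job {job_id!r}")
        return handler(command, args)


class JobDaemon:
    """Hypothetical stand-in for jobmond: exists only while the job runs."""

    def __init__(self, job_id: str, owner: str, relay: RelayService) -> None:
        self.job_id = job_id
        self.relay = relay
        self.log_lines: List[str] = []
        relay.register(job_id, owner, self.handle)

    def handle(self, command: str, args: List[str]) -> str:
        # Only one illustrative command: return the last n lines of the job log.
        if command == "tail":
            n = int(args[0]) if args else 10
            return "\n".join(self.log_lines[-n:] if n > 0 else [])
        raise ValueError(f"unsupported command: {command}")

    def stop(self) -> None:
        # Called when the job finishes; the job is no longer reachable afterwards.
        self.relay.unregister(self.job_id)


def client_request(relay: RelayService, user: str, job_id: str,
                   command: str, args: List[str]) -> str:
    """Short-lived client: one request, one reply, then it is done."""
    return relay.forward(user, job_id, command, args)


if __name__ == "__main__":
    relay = RelayService()                      # persists across jobs
    daemon = JobDaemon("job-1", "alice", relay)  # persists while the job runs
    daemon.log_lines += ["line 1", "line 2", "line 3"]
    print(client_request(relay, "alice", "job-1", "tail", ["2"]))
    daemon.stop()
```

In this sketch the relay is the only object the client ever touches, which mirrors the need to locate the job and check the user's identity before any command reaches the worker node.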
Documentation
Software
- Installation [html]
- Usage [html]
- Download
- One can use the SourceForge CVS download or the VDT-1.3.7 installation
- Source tarball for the JobMon Server code: tar.gz
- Source tarball for the JobMon Client code: tar.gz
News
Nov 2, 2005
OSG Readiness Plan JobMon ReadinessPlan
Oct, 2005
VDT 1.3.7 Integration
July 5th, 2005
Now hosted on SourceForge