P-GRADE Portal
An introduction without tears
Content
0 Preface
Release notes to Version 2.4.1:
- Possibility to store end-user data in a reliable database: the default Hibernate persistence (HSQL) supported by GridSphere has turned out to be error prone, and in some cases the users' logging data were lost after a restart of the Portal. From this release on, the Portal administrator can therefore define and set up an external database for storing the log information.
- The data transfer load of the information system has been substantially reduced: the BDII server is queried only on user request.
- A new job submission strategy has been introduced that observes the current load of the portal server and thereby ensures a tolerable response time for the user.
- The portal server's handling of its own storage has been reconsidered. In connection with this, the redundant storage of workflow results has been removed and more accurate quota handling has been implemented.
- A bug has been fixed that occurred during concurrent upload and download of proxy certificates. The failure typically occurred in supervised training sessions, when many users execute the same command within a short time.
- Automatic VOMS extension of certificates has been introduced.
- Jobs can also be submitted to VOs via the gLite infrastructure.
Release notes to Version 2.4:
New features and improvement of services:
- Revision of remote file handling: user option for non-automatic copy to the worker node. (See managed copy)
- Revision of rescue handling: the new functionality covers all types of resources, including submissions to a Broker.
- Enhanced verbosity level, localization and accuracy in forwarding any errors that occur in the grid infrastructure.
- Protection of the Portal server by a configurable limit on the number of jobs submitted and observed at one time.
- Revision of MPI job handling: a totally new middleware ensures (and, in defined circumstances, guarantees) the success of MPI job submissions.
Bug fixes:
- Total revision of the low-level script layer.
- The memory leak problem of the visualization has been solved.
Known bugs:
B.1 The LDAP server sometimes delivers hosts to the information system that reference a common cluster under different hostnames within a given site. As the information system has no additional knowledge with which to unify these clusters, the aggregated data gained from the component CEs sometimes show a multiple of the real values.
B.2 For the sites of the selected VO, the overview window of the Information System also displays jobs that do not belong to the selected VO.
B.3 If several Workflow Editor windows exist on the user's desktop, the "old" windows tend to become zombies (insensitive to user commands and losing connection to the server).
Release notes to Version 2.3:
New features:
- Extended, per-user quota handling
- Full archive facility for generated workflows (See chapter XIII_Workflow_archive service)
Release notes to Version 2.2:
New features:
- Separation of external and internal file name references in the input/output ports of the jobs (See No more restriction on file references)
- Connecting the Portal to the EGEE Grid and, in this case, exploiting the Broker service of the EGEE Grid for the jobs of a workflow directed to this grid. (See chapter X_Connection_to_the_EGEE_Grids)
- Fault tolerant behaviour of workflows (See chapter XI_Rescuing_the_workflow).
- Welcome menu to change the default settings of personal user data (See chapter XII._Welcome_Menu)
Release notes to Version 2.1:
New features:
This documentation includes the new features of Version 2.1, highlighted in the chapters VI_Multi-GRID_support, VII Information System, VIII_Handling_of_remote_files, and IX User quotas.
Deleted features:
The Copy and Paste operations of the Workflow Editor, considered unimportant and error prone, have been removed.
Bug fixes:
Edited workflows in a transient (incomplete) state can be stored in the PORTAL and retrieved for further editing.
ii. Introduction
The P-GRADE Portal's mission is to give user-friendly access to Grid resources, a technology in rapid evolution. This evolution is "mapped" in the Portal, which offers general low-level solutions for simple Globus Grids and high-level solutions for modern, sophisticated Grids like EGEE. Throughout this paper you will find descriptions of general low-level solutions alongside special considerations that refer only to the EGEE Grid.
As the P-GRADE Portal is a multi-Grid portal, able to connect Grids of different kinds, substantial effort has been made to keep the functionalities of the Portal as orthogonal as possible. However, at some points the different aspects, conditions and possibilities of the EGEE Grid had to be mentioned within the general text.
I The aim
The P-GRADE Portal offers a comfortable method of handling workflows from any connection point of the World Wide Web.
The P-GRADE Portal covers several cluster and Grid related technologies (GLOBUS2, GLOBUS3, Condor, CondorG, CondorDAGMAN, PVM, MPI) to meet the needs of users who intend to access remote computational resources, and it hides the difficulties of activating them.
If you are impatient with details, or if you are a hardened GLOBUS professional with bad nerves, you can get a head start with chapter IV, where the usage of the Portal is explained by a comprehensive example.
A Workflow is a bundle of jobs you want to edit, launch and observe on remote computer resources to which access rights have been granted to you by so-called certificates.
Technically a Workflow is a directed acyclic graph (DAG) in which each node has a computing resource and a program (job) to be launched on that resource; the edges of the graph are the "information pipelines" (streams) that connect the input and output points (ports) of the individual jobs. (See Figure_1)
Jobs are executable (sequential or parallel) applications represented by their binary code.
A node is a wrapper of a job containing the references to its executable code, to its I/O connections and to its resource. (See Figure 16 for an outer, and Figure_18 for an inner view of a node)
The input connection points of the nodes that are not connected to an output point of any other node represent the input files of the whole Workflow. (We use the term port interchangeably with "point" when referring to input and output connection points.) The output points of the nodes that do not serve as inputs to any other node represent the output files of the Workflow.
(Note that any internal pipeline (stream) can be marked as either volatile or permanent; in the latter case the data flowing through it will be regarded and recorded as an output_file of the Workflow, see Figure_24)
The task of the Workflow is to generate OUTPUT files from the INPUT ones.
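The rule for finding the input and output files of a Workflow can be illustrated with a small sketch. It is not the Portal's own code: the port names, the edge representation and the helper functions are hypothetical and serve only to show the set logic described above.

```python
# Hypothetical workflow description; all node and port names are invented.
# A port is written as (node, port_name); an edge connects an output port
# to an input port.
input_ports = {("Budapest", "in0"), ("Paris", "in0"),
               ("London", "in0"), ("London", "in1")}
output_ports = {("Budapest", "out0"), ("Paris", "out0"), ("London", "out0")}
edges = {
    (("Budapest", "out0"), ("London", "in0")),
    (("Paris", "out0"), ("London", "in1")),
}

def workflow_inputs(input_ports, edges):
    """Input ports that no edge feeds: the input files of the Workflow."""
    fed = {dst for _, dst in edges}
    return input_ports - fed

def workflow_outputs(output_ports, edges, permanent=frozenset()):
    """Output ports that feed no other node, plus any output ports whose
    stream has been marked permanent."""
    consumed = {src for src, _ in edges}
    return (output_ports - consumed) | set(permanent)

wf_inputs = workflow_inputs(input_ports, edges)
wf_outputs = workflow_outputs(output_ports, edges)
wf_outputs_with_permanent = workflow_outputs(
    output_ports, edges, permanent={("Paris", "out0")})
```

In this example the inputs of "Budapest" and "Paris" are Workflow inputs and the output of "London" is the Workflow output; marking the stream leaving "Paris" as permanent adds that port to the outputs as well.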
There are several subtle points to emphasize:
- Two jobs of separate nodes can be executed in parallel if they are independent, i.e. neither has a role in generating the inputs of the other, AND all their inputs (represented by the edges of the graph) are available (either prepared by the successful termination of the preceding jobs in the calling sequence defined by the directed graph, or defined as original input files of the whole Workflow). Simply put, the structure of the directed graph determines the order of execution.
- You must separately assign computing resources (from the list of available ones, see: Setting_the_resources) to the individual jobs, so you can instruct the workflow to execute at the same time, for example, the job "Budapest" at a site in Hungary and the job "Paris" at a site in France, and to gather their outputs in a third job "London" at a site in Britain.
- The resources must be part of a Grid implementing the GLOBUS2 middleware and must accept the credentials held by the users of the P-GRADE Portal.
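How the graph structure determines the execution order can be sketched with the topological sorter of the Python standard library (Python 3.9 or later). This is only an illustration under the assumption that every job of a "wave" finishes successfully; it is not the scheduler used by the Portal, and the dependency table is the hypothetical Budapest/Paris/London example from the list above.

```python
from graphlib import TopologicalSorter

# job -> set of jobs whose outputs it consumes (hypothetical example)
deps = {
    "Budapest": set(),
    "Paris": set(),
    "London": {"Budapest", "Paris"},
}

def execution_waves(deps):
    """Group jobs into waves; the jobs of one wave are independent of each
    other and all of their inputs are available, so they may run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                 # raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # jobs with all predecessors done
        waves.append(ready)
        sorter.done(*ready)          # assume every job of the wave succeeded
    return waves

waves = execution_waves(deps)
```

Here "Budapest" and "Paris" form the first wave and may run at the same time at different sites, while "London" can start only after both have terminated.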
Figure 1
II The Players of the PORTAL infrastructure and their identifications
1. The Players
Now let us summarize the main actors participating in the handling of the workflows (see Figure_2).
1.1 The user's desktop machine.
You need an Internet-connected desktop machine with a browser that can access the WWW.
Please note that the user works with two different user interfaces in parallel when using the P-GRADE Portal:
1.2 The Portal server
There is a remote Portal server which you can access with a browser. This server is used to store your code, program data (first of all local input_files), the graphs of the workflows, the list of the defined resources and the live short-term (proxy) certificates.
From here you can download your workflows for editing, launch your workflows, and download the results.
The data stored on the Portal server on behalf of a single user is restricted by the user quotas.
1.3 The set of remote resources (the GRID)
The most important part of the infrastructure is the set of remote computational resources (generally computer clusters) where the jobs may actually run.
The resources are grouped under Grids. See more details in paragraph Setting: Defining the resources
Complex Grids may subdivide the set of users and the resources accessible to them into virtual organizations (VOs). This mapping may overlap: a user and a resource may belong to more than one virtual organization of the Grid.
In these grids the access right represented by a user certificate may be associated with one (or more) virtual organization(s) and not with the whole Grid.
The EGEE Grid requires that the user be registered at one VO. There is a general rule that a user must belong to just one VO. The registration procedure and policy is VO dependent and not covered in this paper.
Resources are abstractions associated with sites, which perform the task of a given resource.
In the EGEE Grid a site may serve resources belonging to different virtual organizations. These are not only computational resources (see Computing Element) but storage resources (see Storage Element) as well. The resources of a site may be shared by different virtual organizations. However, a user can access a resource only with a valid VO membership reference.
Basically the default resources are set statically by the system administrator. These data may be inherited by common users, and can be extended or changed at will. Therefore these settings may not correspond to the actual state of the Grid. The portlet Information_System is used to obtain current data about the Grid.
For the time being there is only a restricted facility in the P-GRADE Portal allowing the automatic setting of resources found by the Information System. (See the button Load resources from MDS2 in Figure 12b)
1.4 The Certificate Server (MyProxy)
Finally we mention an administrative player, the Certificate server, which is a repository of "certificates".
A certificate is a virtual identity card granting access to a set of resources. Certificates must be signed by a trusted Certificate Authority (CA).
To understand the importance of the latter, here is a short remark: these players (the user, the Portal_server, the resources) are connected through an unreliable channel, the Internet; therefore they have to build secure connections to identify themselves and to have sufficient protection from unauthorized access. These rather complicated tasks are executed with the help of certificates, which act like identity cards, granting access to an expensive resource only for a limited amount of time.
Your previously obtained personal certificate, containing your personal data (distinguished_name, your public_key, the expiration date of the public key, the name of the CA) as unencrypted open data, must have been issued and "signed" by a trusted Certificate_Authority to identify you.
The distinguished name contains the family and given name, organisation unit and organisation of the user, introduced by standard prefixes (CN=, OU=, O=).
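As an illustration of the prefixes, the following sketch splits a distinguished name into its fields. The slash-separated notation and the sample name are assumptions made for the example (distinguished names may also be written comma-separated), and parse_dn is a hypothetical helper, not a Portal function.

```python
def parse_dn(dn):
    """Split a slash-separated distinguished name into {prefix: [values]}.
    Assumes that the values themselves contain no '/' character."""
    fields = {}
    for part in dn.strip("/").split("/"):
        key, _, value = part.partition("=")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields

# Invented sample name, for illustration only.
parsed = parse_dn("/O=Example Organisation/OU=Example Unit/CN=Jane Doe")
```

A prefix may occur more than once in a real name, which is why each field is collected into a list.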
The public key (PK) is the binary code with whose help a message previously encoded by the secret key (SK) can be decoded:
message = decode ( PK, encode( SK , message ) )
Each agent (the users and the Certificate_Authority) publishes its own public_key and hides its own secret_key.
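The relation message = decode(PK, encode(SK, message)) can be tried out with a toy example. The sketch below assumes textbook RSA with deliberately tiny numbers; the text above does not name the algorithm actually used by the certificates, and keys of this size give no security at all. Python 3.8 or later is needed for the modular inverse.

```python
# Toy textbook RSA key pair, for illustration only (assumed algorithm).
p, q = 61, 53                        # two small primes, kept secret
n = p * q                            # public modulus (3233)
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # secret exponent, modular inverse of e

PK = (e, n)   # published
SK = (d, n)   # hidden

def encode(sk, message):
    """Encode an integer message (0 <= message < n) with the secret key."""
    exponent, modulus = sk
    return pow(message, exponent, modulus)

def decode(pk, encoded):
    """Decode with the matching public key."""
    exponent, modulus = pk
    return pow(encoded, exponent, modulus)

message = 65
recovered = decode(PK, encode(SK, message))
```

Anyone who knows PK can recover the message, but only the holder of SK could have produced the encoded form; this is what makes a signature verifiable.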
Technically, the signature is additional text appended to your certificate file, produced by processing the open_data in three steps:
The MyProxy Certificate Server stores the public key of the Certificate_Authority, in the form of a special certificate, and therefore this server is able to decipher your public key, vouch for you and represent you towards a third party, which in our case is a remote resource.
The representation happens by issuing a short-term, so-called proxy certificate signed by the "MyProxy" Certificate server. This representation is needed because the resources do not directly accept personal certificates.
This delegation method has four advantages over the direct use of personal certificates:
- The user needs to execute the rather complicated process of handling the personal certificate very seldom: generally the first time he/she uses the P-GRADE Portal, when the long-term certificate expires, or when he/she obtains other certificates granting permissions to additional resources. (See more at Uploading_a_personal_certificate)
- The proxy certificate is a short-term one, protecting both the user and the administrator of the used resources against the consequences of infinite loops resulting from program errors, and against unwanted tampering or intrusion by third parties. (See how to obtain a short term proxy)
- The resources need not know the CAs of the users' personal certificates, just the single CA of the proxy server. The duty of maintaining the list of acceptable CAs can be delegated to the administrator of the MyProxy server.
- The generation of a proxy certificate, issued by a secure machine, does not need the "pass phrase" by which a person proves his or her right to the private key file, which is needed during the personal certificate upload process.
2. The identifications
To handle the agents of the P-GRADE Portal environment there are four different kinds of identification of interest to users:
2.1 User against the Portal Server
2.2 User against own "userkey" file
2.3 User against the Certificate Server
The third kind of identification is associated with a certificate account of your personal certificate on the certificate server MyProxy.
You use this identification if
2.4 User against the Virtual Organization
III. Overview of the operation of the PORTAL
A possible full operation cycle is the following scenario:
0. Preparation:
Notes:
- Users may receive an account from the Administrator of the Portal. (See the separate document "Short Administrator's Manual of the P-GRADE Portal") Since Release 2 of the P-GRADE Portal the account includes a user quota.
- Before starting a P-GRADE Portal session the user must have already compiled and, ideally, tested all the executable jobs intended to form the workflow. In addition the user must also have the references to the necessary input files.
The executables of the jobs must be in the local file system of the user of the portal. Here is a very important point to be emphasized: in earlier versions of the P-GRADE Portal the jobs of a workflow could contain only local file references, with the consequence that the Portal_server had to copy and store these files on the host where the Portal_server runs. Now this limitation is lifted for the input files, and the user can define an input_file by its proper URL when the referenced resource infrastructure supports this feature. This enables late binding, i.e. the Portal_server does not need to store and relay the files defined this way, just their addresses.
See more details in VIII_Handling_of_remote_files
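The difference between the two kinds of input file references can be sketched as follows. The function names, the sample paths and the "gsiftp" URL are assumptions chosen for the illustration; which URL schemes are actually accepted depends on the resource infrastructure and is not specified here.

```python
from urllib.parse import urlparse

def is_remote_reference(ref):
    """True if the reference looks like a URL (scheme://...), so that only
    its address has to be kept; False for a local file path."""
    return "://" in ref and urlparse(ref).scheme != ""

def files_to_store_on_portal(refs):
    """Local references are the only ones the Portal_server must copy and
    store; remote ones are bound late, on the resource."""
    return [ref for ref in refs if not is_remote_reference(ref)]

# Invented sample references.
refs = [
    "/home/user/input/params.dat",
    "gsiftp://storage.example.org/data/big_input.dat",
]
to_store = files_to_store_on_portal(refs)
```

Only the local file counts against the storage kept on the Portal server; for the remote file the server needs to remember just the address.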
0.2 Users of the EGEE grid
Beyond everything described in the previous point, EGEE users must be members of virtual organisations. There is a general rule that a user must belong to just one VO.
Generally a user certificate is required for the VO membership registration. This certificate must be trusted by the Grid the VO belongs to.
SPECIAL WARNING to the users of the virtual organisation Gilda, and to the users of other VOs requiring certificates with VOMS extension:
The EGEE Grid community is in transition from using the simple Grid certificate to using certificates that include VO specific extensions (VOMS). This enables more reliable and secure access to more than one VO with one certificate.
However, the VOMS related extension of the MyProxy service has not been finished yet, and the API of the MyProxy service is error prone.
The interim consequence is that the Certificate/upload functionality (See Figure 2 and Section 1 Uploading_a_personal_certificate) cannot be executed within the Portal for the time being.
The suggested workaround is to issue the following command on a UIF machine belonging to the given VO where the valid certificate of the user is already installed:
myproxy-init --voms <VO> -s <Host_of_MyProxy_Server> -p <Port_of_MyProxy_Server> -l <Proposed_user_account_name_on_MyProxyServer>
Example:
./myproxy-init --voms gilda -s grid001.ct.infn.it -p 7512 -l myGildaCert
where "myproxy-init" must be the special updated command (written by the Gilda people) that does not complain about the "--voms" parameter.
Please note that the command prompts:
- the first time for the passphrase of the certificate (belonging to the secret key file, see chapter 2.2)
- the next time for a password of the new certificate account <Proposed_user_account_name_on_MyProxyServer> (see chapter 2.3)
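For users who prefer to script the command, the argument list can be assembled as sketched below. The helper name is hypothetical; it only rebuilds the command line shown in the text, uses no options other than those given there, and does not run anything, because the command prompts interactively for the passphrase and the password.

```python
import shlex

def myproxy_init_argv(vo, host, port, account, executable="myproxy-init"):
    """Build the argument list of the VOMS-aware myproxy-init call,
    using only the options that appear in the text."""
    return [executable, "--voms", vo, "-s", host, "-p", str(port), "-l", account]

argv = myproxy_init_argv("gilda", "grid001.ct.infn.it", 7512, "myGildaCert",
                         executable="./myproxy-init")
command_line = shlex.join(argv)   # shell-safe string form (Python 3.8+)
```

The resulting command_line is identical to the example above and can be pasted into a terminal on the machine where the certificate is installed.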
Figure 2
1. Uploading a personal certificate
By "Certificate/upload" (Figure_2) the user sends a personal certificate to the Certificate_server and establishes a certificate account. This step happens rather seldom, because the expiration time of personal certificates is fairly long.
The uploading process is a rather complicated transaction, started from Figure 5 and explained in detail in Chapter IV 2.1.
The upload creates a certificate account for the certificate, and the user must remember its name and password for the subsequent proxy generations. (See also Chapter II 2.3)
2. Receiving a short term (proxy) certificate.
3. Setting: Defining the resources
By filling in a simple table on the Portal_server, the user can define the URL and the access method of the basic services of the remote resources where the jobs may run. (See also Figure 12a and Figure 12b) See the detailed steps of the definition at resource definition.
If the selected GRID has an information system, the information system may automatically explore the possible sites and services. See: VII_Information_System
The user need not bother with defining (finding) resources and connecting them to the jobs in the special case that she or he has access to an EGEE-like Grid, because in this case the Broker service does this task. See more details in Chapter X_Connection_to_the_EGEE_Grids_and_the usage of the Broker
4. Defining a workflow
The user can create new workflows, and load and archive existing ones.
Please note that the creation process is done with a SEPARATE program, in a different window (the Workflow_Editor), which is downloaded from the Portal_server and runs on your desktop. This has two consequences:
- If you have created or modified a Workflow application you must use the "EDITOR/save|upload" (Figure_2) command to send it from the temporary storage of your desktop to its final storage on the Portal_server. Similarly you may use the EDITOR/Upload (Figure_2) command to send local input_files and the code of the executables to the Portal_server. There is only a slight difference between save and upload: save refers only to the skeleton file defining the Workflow, whereas upload refers to the local files mentioned in a Workflow description. You need not bother with this: the system prompts you to upload the referenced files when there is no copy of them on the Portal_server. In the other direction you may use the EDITOR/open command to download an existing Workflow from the Portal_server in order to view or modify it.
- As the Portal_server (connected to your desktop by HTTP) and the involved portlet Workflow/Workflow_Manager work asynchronously with the Workflow_Editor running on your desktop, you need to hit the Refresh button (visible for example in Figure 35) whenever you want to see the current state of the Portal_server. (New users may find it a little confusing that a newly saved workflow does not appear in the list of executable workflows until "Refresh" is clicked.)
- A previously uploaded Workflow can be saved from the Portal_server to the user's desktop machine by the Workflow/Storage/Download command. (Figure_2) See chapter XIII_Workflow_archivation for details.
- From an archive on the user's desktop machine the user can reload a workflow into the Portal_server by the Workflow/Upload command. (Figure_2) See chapter XIII_Workflow_archive for details.
There is another important and recommended way of defining workflows: they can be imported from the P-GRADE development tool (P-GRADE). This way has some advantages over manually editing the Workflow in the P-GRADE Portal:
- The code of the workflow (the component nodes with all input_file mappings and the graph of the workflow) can be developed and tested with the debugging and graphic monitoring facilities of P-GRADE.
- To facilitate graphic monitoring of the running application, the source code can be instrumented automatically in P-GRADE, a process that inserts trace-sending code conforming to the common graphic monitoring standards of P-GRADE and of the current P-GRADE Portal.
In the Workflow EDITOR program you use the menu item Workflow/Import workflow (See 4.1.1.2_Import_process) to open the file browser "Import Workflow", in which you can search for the needed workflow files, distinguished by the name extension ".wrk".
To learn more about P-GRADE please consult P-GRADE
4.1 Short introduction to the Workflow Editor
You have already learned that the Workflow_Editor is a separate graphic program which can be started from the Workflow/Workflow_Manager portlet by the button Workflow Editor of the Workflow tab, and that it runs on the desktop of the user.
In short, the Workflow_Editor can create, modify and save a workflow. You will find a rather long introduction to the use of the Workflow_Editor in Chapter IV. Here is only a short summary of its most important menu items.
4.1.1 Workflow creation
Workflow creation is the process by which we define a new workflow on the P-GRADE Portal. The creation may be an interactive building process or an import process:
4.1.1.1 Interactive building process
A new workflow can be created within a newly opened window (as you see in Figure_14) or within an existing copy (see Figure 32) of the Workflow Editor program. With the menu item Workflow/New you may create a new empty workflow. By subsequently applying Workflow/New job and Workflow/New port you can build the proper parts of the graph of the workflow. (See Figure_15)
4.1.1.2 Import process
A whole workflow previously built and tested with the application P-GRADE can be imported from the desktop machine, with all of its dependent parts, by the menu item Workflow/Import Workflow. (See Figure32) The selected menu item opens a file browser in which you can select a workflow file with the file type extension wrk, which will be uploaded to the Portal_server.
This workflow will behave in just the same way as the workflows you have manipulated manually. In most cases you need only check the destination resources of the component jobs.
4.1.2 Workflow saving
A newly created workflow has no name. It must be saved by the menu item Workflow/Save as. (See EDITOR/Save|Upload on Figure_2) This command has two effects: it uploads the workflow with its user-defined name to the workflow repository of the Portal_server, and it puts the workflow in the launch list of the portlet Workflow Manager. After any modification of the workflow (see 4.1.3_Workflow_modification_) the menu item Workflow/Save has the same effects. If the saving process finds that any of the referenced files mentioned in the description of the saved workflow have not yet been uploaded to the Portal_server (or are not valid, see later), it prompts the user to enable the start of the automatic upload process. Therefore the menu item Workflow/Upload seldom needs to be issued manually. (See Figure_32)
4.1.3 Workflow modification
Any part of a saved (or recently created) workflow can be modified:
of the ports (See details at Figure 21 and at the subsequent Learning Notes on Port Properties).
To handle these changes the user needs access to the workflow (i.e. has to download it from the Portal_server to the desktop) by the menu item Workflow/Open. The user selects the needed Workflow from the Workflow Repository of the Portal_server.
If the user changes any file reference during the modification process (even if he/she restores the previous text), a hidden marking records the event and the previous file reference is invalidated, with the consequence that after the subsequent Workflow/Save command the user is prompted to enable the needed upload. In short, the system automatically maintains data consistency between the definition environment (desktop machine) and the Portal_server, and the user is exempted from the duty of deleting obsolete files from the Portal_server. (See details at Figure 33 and the subsequent Learning_Notes_on_Upload_File)
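The behaviour of the hidden marking can be pictured with a small sketch. The class and attribute names are hypothetical and the Workflow Editor's real data structures are not documented here; the point is only that the mark is set by the act of editing, not by comparing the old and new text.

```python
class FileReference:
    """Hypothetical model of a file reference in the Workflow Editor."""

    def __init__(self, path):
        self.path = path
        self.needs_upload = False    # the "hidden marking"

    def edit(self, new_path):
        # Any edit invalidates the copy on the Portal_server,
        # even when the user types the previous text again.
        self.path = new_path
        self.needs_upload = True

    def mark_uploaded(self):
        # Called after the upload prompted by Workflow/Save has succeeded.
        self.needs_upload = False

ref = FileReference("/home/user/input/params.dat")
ref.edit("/home/user/input/other.dat")
ref.edit("/home/user/input/params.dat")   # previous text restored
marked_after_restore = ref.needs_upload
ref.mark_uploaded()
marked_after_upload = ref.needs_upload
```

After the two edits the reference points to the original file again, yet it is still marked, so the next save would prompt for an upload.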
The actual modification steps are discussed in detail via examples in Chapter IV in paragraph 4._Building_your_workflow_
4.2 Workflow deletion
5. Starting a workflow
Using the Workflow/Workflow Manager/Submit (Figure_2) command you can submit the prepared workflow to the GRID, i.e. you may let it run. Naturally the following conditions must be fulfilled; the system checks them partly at creation time and partly at load time:
- The input_files and the executable codes must be available. See the note in point 0._Preparation:
- The resources (processors, clusters where the jobs belonging to the nodes should run) must be defined.
- The selected short-term (proxy) certificate must allow enough time, and all of the requested resources must accept this certificate.
See more details in paragraph Run_time_user_actions
6. Observing the progress of a workflow
If the submission was successful and the jobs begin running, you can follow the progress in three different ways:
6.1 Progress info from the Workflow Manager
First of all, the elements of the Workflow list of the Workflow Manager (Figure_38) inform you about the state of the whole workflow (column Status) and about any results (column Output). The elements of the column View have a tree structure, and their roots are the buttons Details (Figure_39). In the detailed mode a sub-list describes the state of each job composing the selected workflow.
Size shows the size of the storage needed by the Workflow on the host of the server.
Quota shows the percentage of the quota permitted for the user. The label of the column Quota includes the full size of the quota (in the case of Figure 38 it is 1 MB), and the last line of the Workflow list summarizes the percentage and the size of the occupied storage.
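The arithmetic behind the Size and Quota columns can be sketched as follows. The workflow names and sizes are invented, and it is an assumption of this example that 1 MB means 1024 x 1024 bytes.

```python
QUOTA_BYTES = 1 * 1024 * 1024   # 1 MB quota as in Figure 38 (assumed binary MB)

def quota_percent(size_bytes, quota_bytes=QUOTA_BYTES):
    """Percentage of the user quota occupied by the given storage size."""
    return 100.0 * size_bytes / quota_bytes

# Invented storage sizes of two workflows, in bytes.
sizes = {"workflow_a": 262144, "workflow_b": 131072}

per_workflow = {name: quota_percent(size) for name, size in sizes.items()}
total_size = sum(sizes.values())
total_percent = quota_percent(total_size)   # the summary line of the list
```

In this example the two workflows occupy 25 % and 12.5 % of the quota, and the summary line would show 37.5 %.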
6.1.1 Detailed view (Figure_39)
In this list each line corresponds to a component job. The line contains the following fields:
- Workflow: name of the workflow, inherited from the root menu
- Gridname: name of the Grid (or of the virtual organization) where the job runs
- Job: name of the current component, as the user defined it in the "Name" text field of the job definition window <jobname>properties. (See Figure 17, Figure 18)
- Hostname: host where the job runs
Status
Status information must be distinguished between Workflow status and
Job
level status.
The possible Job states with proper coloring and in the natural
sequence (when applicable) are:
init
(white)
submitted (orange) only in
case of
brokering
(Since Release 2.2)
wait
(blue) only in case of
brokering (Since Release 2.2)
scheduled (magenta) only in case of brokering
(Since Release 2.2)
running (Red)
finished (green)
error (blue)
The possible Workflow states are
init
(white in overview window, green in detailed view)
The workflow is uploaded in
the Server
submitted (orange in overview
window, white in detailed view)
On user action and when no job is Run state
running
(red in overview window, white in
detailed view) On first
job enters Running state
finished
(green in overview window, white in detailed view
)
When the last job terminated successfully
error
(blue in overview window, white in
detailed
view)
On error in one job and with no possible jobs to run
rescue
(blue in overview window, white in
detailed
view)
On error in one job and with no possible jobs to run (Since
Release 2.2 See
rescue)
aborted
(red in overview window, white in
detailed
view)
On user action
- Logs: buttons Out and/or Error to read any files written by the system on "stdout" and "stderr" respectively
- Output: a green button indicates that the application terminated successfully and the result can be downloaded
- Visualization: optional buttons Visualize, All to call the graphic monitoring for the whole workflow, for the given job, or for each possible part
- Action: this array of buttons is inherited from the root menu and will be discussed in paragraph 8_Run_time_user_actions_
The user should return to the root menu by hitting the button Back
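For readers who process the detailed view with their own scripts, the Job states listed under Status can be written as a lookup table. The sketch below merely transcribes the state names and colors of that list; the variable and function names are hypothetical, and the brokering-only states are those marked "only in case of brokering".

```python
# Job states of the detailed view in their natural sequence, with colors.
JOB_STATE_COLORS = {
    "init": "white",
    "submitted": "orange",
    "wait": "blue",
    "scheduled": "magenta",
    "running": "red",
    "finished": "green",
    "error": "blue",
}

# States that appear only when the job is handled by a Broker.
BROKER_ONLY_STATES = {"submitted", "wait", "scheduled"}

def applicable_states(brokered):
    """Job states that can occur, in order, with or without brokering."""
    return [state for state in JOB_STATE_COLORS
            if brokered or state not in BROKER_ONLY_STATES]

direct_states = applicable_states(brokered=False)
brokered_states = applicable_states(brokered=True)
```

Note that "wait" and "error" share the color blue in the list, so the color alone does not identify the state.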
6.2 Progress info from the Workflow Editor
If the graph of the running workflow is selected and visible in the Workflow Editor, you can see the progress through the changing colors of the nodes corresponding to the jobs that are being executed.
You can start the Workflow Editor in two different ways to see the progress:
- either by the Attach button of that workflow in the Workflow list of the Workflow Manager
- or by hitting the button Workflow Editor of the Workflow Manager and then using Open on the Workflow to be observed.
In accordance with the convention discussed in 6.1.1_Detailed_view_ the following colors are used:
orange: the job waits for user submission
white: the job is submitted and waits to run
red: the job is running
green: the job is finished
blue: the job has been aborted either by the system or by the user
Note that since release 2.2 there is an additional state of a job when it is running under the control of a Broker of the EGEE Grid:
magenta: the job is scheduled by the broker
Note that since release 2.2 a new color indicates that the workflow failed but can be restarted from a natural checkpoint composed of the jobs that finished successfully:
saddle brown: the job is in "rescue" state
6.3 Progress info by Monitoring and Visualization
7. Fetching the result
The last step is fetching the result. The Portal_server puts the results in a zip file. This is a compressed directory hierarchy which follows the structure of the workflow graph. A subdirectory is generated for each node, where the associated permanent local output_files are stored. The user can fetch the zip file with the standard download manager of the browser used. This step is marked in Figure_2 as "Workflow/Workflow Manager/output".
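After the download, the contents of the zip file can be grouped by node with a few lines of script. The sketch assumes that the node subdirectories sit at the top level of the archive, which the text above does not state explicitly; the sample archive, its file names and the helper function are invented for the illustration.

```python
import io
import zipfile
from collections import defaultdict

def outputs_by_node(zip_bytes):
    """Group the files of a result archive by their top-level directory,
    assumed here to be the node name."""
    grouped = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue                      # skip pure directory entries
            node, _, rest = name.partition("/")
            if rest:                          # ignore files outside any node
                grouped[node].append(rest)
    return dict(grouped)

# Build a small in-memory archive that stands in for a downloaded result.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Budapest/out0.dat", "sample data")
    archive.writestr("London/result.txt", "sample result")

sample = outputs_by_node(buffer.getvalue())
```

With a real download the bytes would be read from the saved zip file instead of the in-memory buffer.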
Please note, that the remote output files will not be retrieved to the
portal server and can be accessed by methods beyond the control
of the Portal. See
Figure 8.1
Please note, that the download manager
is part of the browser, and it is the responsibility of the
administrator of the user's web site to set it up properly.
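The structure of the result archive (one subdirectory per node of the workflow graph) can be inspected with a few lines of script after the download. The following Python sketch only illustrates the idea; the archive layout used in make_demo_archive, the job names JobA and JobB, and both function names are assumptions made for this example and are not defined by the Portal.

```python
import io
import zipfile
from collections import defaultdict


def outputs_per_job(zip_bytes):
    """Group the files of a result archive by their top-level directory (one per job)."""
    grouped = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip pure directory entries
            job, _, rest = name.partition("/")
            grouped[job].append(rest)
    return dict(grouped)


def make_demo_archive():
    """Build a small in-memory archive with a hypothetical layout (assumption)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("JobA/OUTPUT", "6\n")
        archive.writestr("JobB/OUTPUT", "18\n")
    return buffer.getvalue()


if __name__ == "__main__":
    for job, files in outputs_per_job(make_demo_archive()).items():
        print(job, files)
```

For a real download, the bytes of the fetched zip file would be passed to outputs_per_job instead of the demo archive.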
8. Run time user actions
The recent version of the Portal Program maintains two different lists of workflows:
- The tab Workflow/Storage lists all workflows of the user. See chapter XIII_Workflow_archive
- The tab Workflow/Workflow Manager lists only the active workflows of the user.
The reason for distinguishing between active and inactive workflows is the following:
Polling the state of active workflows is expensive because of the heavy network traffic. Therefore the user may get much slower responses if the number (and the complexity) of active workflows is high.
Both lists are updated as a consequence of the Save or Save as command of the Workflow Editor, but the Delete command may have different consequences:
- The Delete command of Workflow/Storage irrevocably deletes the file system representation of the definition of the selected workflow from the Portal server. See chapter XIII_Workflow_archive.
- The Delete command of the Workflow/Workflow Manager has the option to delete the selected Workflow only from the list of active Workflows, i.e. from the Workflow Manager.
In the row belonging to each element of the Workflow/Workflow Manager list, the user can use the buttons of the Actions column (Figure_38)
- to start or stop the application by the toggle button Submit/Abort,
- to Delete the whole Workflow from the Portal_server (each file belonging to the definition and to the running state of that workflow) or only from the list of active workflows,
- to show the workflow via the Workflow_Editor by hitting the button Attach.
Delete all deletes all workflows of the user from the Portal_server or from the list of active workflows.
IV The detailed operation of the PORTAL by an example
During the tour you will build, start and observe a little test workflow.
After the general preparation you will find the description of the workflow to build and submit after Figure 14
1 Login
The user can reach the PORTAL through a proper URL. For example:
There you should find - depending on the browser - something like this:
Figure 3
Figure 4
From here you can launch the activities shown in Figure_2
Let us begin with the Certificate Manager by selecting the tab Certificates.
2. Certificates: Setting access rights to resources
Figure 5
The user can Upload a personal certificate to, or Download a temporary proxy_certificate from, the MyProxy certificate server with the help of the Certificate Manager triggered by the Certificates tab.
The very first action can be to fill the Certificate Server (the so-called MyProxy server) with the existing personal certificates of the user.
Please select Upload:
2.1 Upload detailed
In this process the user creates a certificate account.
Please remember that the Upload step - for the time being - must be skipped and done in a different way (outside of the scope of P-GRADE Portal) when your Virtual Organisation uses the VOMS extension to the certificates. See WOMS_Warning
Figure 6
The first screen of the upload process requires your file named userkey.pem containing your secret_key (see Figure_6). You can search for it in your local directory system using the Browse tool. Fill the input field and accept it with OK. The next panel requires a password for your secret file as Figure_7 shows (see also User identification against_own_userkey_file):
Figure 7
Upon OK the certificate file will be requested. This certificate entitles you to use certain resources for a limited amount of time. (See Figure_8)
Figure 8
Upon OK you will see the window depicted in Figure_9 where you must select an existing certificate_server (MyProxy) by hostname and port, and must define an account (login name and password) on it where your certificate will be stored. (See also User identification_against_the_Certificate_Server)
This Certificate account stores just one certificate, so this ¨login¨ is actually the user name of the given certificate. The default host and port of the Certificate Server are given by the system. You see here an additional input field, the lifetime - this will be an upper limit for the short term proxy certificates you may request by a subsequent download. You must hit Upload to perform the operation.
Figure 9
The system acknowledges the end of the successful upload process:
Figure 9a.
Next you may generate a short term proxy_certificate. To get it you can use the Download button of the Certificate List menu open in this state (see Figure 9a). You will get the Download menu:
2.2 Download detailed:
Figure 10
The proxy certificate will be generated from the personal certificate by filling a form and will be downloaded to the Portal_server upon hitting the button Download.
The parameters of the form are the following:
The fields login name and password refer to the account of the previously uploaded certificate.
You can overwrite the default value of lifetime to set the required expiration time of the short term proxy certificate. However, the actual lifetime of the downloaded proxy will not exceed the value you have defined during Upload of your certificate.
If you find this limit too short, please repeat the upload process.
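The relation between the two lifetime values can be summarised as a simple upper bound. The sketch below is only an illustration of that rule; the function name and the choice of hours as unit are assumptions, since the form itself determines how the value is entered.

```python
def effective_lifetime(requested_hours, upload_limit_hours):
    """Lifetime actually granted to a downloaded proxy.

    requested_hours: value typed into the Download form (assumed unit: hours)
    upload_limit_hours: lifetime given during Upload, acting as an upper limit
    """
    if requested_hours <= 0 or upload_limit_hours <= 0:
        raise ValueError("lifetimes must be positive")
    # The proxy never outlives the limit set at upload time.
    return min(requested_hours, upload_limit_hours)


if __name__ == "__main__":
    print(effective_lifetime(100, 24))
    print(effective_lifetime(12, 24))
```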
Upon Download in the new (2) release of the Portal the user gets a message (Figure 10a) indicating that the downloaded short term certificate can be associated to a GRID.
Figure 10a
A GRID is an administrative community of certain resources. With the same certificate the user can reach all resources of a certain simple GRID or a certain virtual_organization of a complex grid. Resources are subordinated to grids, i.e. each resource must belong to a given GRID (or to a virtual organization of a complex Grid).
In the new release of the Portal's infrastructure the different jobs of the same workflow can submit programs to resources of different available grids. Therefore the user must have methods to maintain different certificates for different Grids (or virtual_organizations). See more details in VI_Multi-GRID_support.
This association is introduced by hitting the button Set for Grid. The actual selection can be performed on the subsequent frame (Figure 10b).
Upon selecting one available name (which may refer either to a simple grid, to a virtual_organization, or to a virtual organization with broker support) from the check list labelled Select GRID, hitting the button OK (Figure 10b) closes the frame and returns to the panel showing the downloaded short term (proxy) certificates:
Figure 11
As you can see - comparing it with Figure 5 - there is just one certificate on the list in our case, and the time of the usage is restricted by the value you have used for the selection.
Important notice:
In the [Actions] column there may be a fourth button, Use this, at each unselected proxy if the list has more than one element belonging to the same GRID (or virtual organization). With this button you can select the actual certificate you want to apply for your subsequent job submissions directed to the respective GRID (or virtual organization).
With the help of the button Set for Grid the GRID (or virtual organization) association can be changed at any time (See Figure 10b)
With the help of the button Details the user can go back to a frame similar to that of Figure 10b but without the possibility to set a GRID (or virtual organization).
Having defined the certificates you need resource(s) where your jobs may run. The main menu button Settings helps you to define them. Please note that these data are stored on your Portal_server and not on the Certificate Server.
3. Settings: Defining the resources
However resources belonging to different virtual organizations (or even to different grids) may be used within the same workflow. See the management of the grids in the Chapter VI_Multi-GRID_support.
Please note that the Name in the table Settings (See Figure 12a) means virtual_organization even in cases when the grid consists of just one virtual_organization.
Hitting the tab Settings, the list of virtual organizations available for the user will be displayed:
Figure 12a
To see the details of a line, use the corresponding Resources button, which opens a new frame listing the resources belonging to the selected virtual_organization.
Figure 12b
New elements can be added to the resource listing of Figure 12b in three different ways:
- By manual definition, filling the two component input fields of the Contact_string, the URL and Job_manager, acknowledged by the button Add
- By inheriting the manual settings of the Administrator of the P-GRADE Portal, hitting the button Load default
- By letting the resources be discovered automatically by the Information system, hitting the button Load resources from MDS2
Note that the usage of this automatic method is GRID dependent: it requires a running and accessible information system of MDS2 kind.
The Contact_string defines the entry point of a resource (cluster) in the form of a URL of the leading Host of the cluster, appended by the symbolic scheduler name Job manager, which classifies the demands of the job against the cluster. More precisely, the contact string defines a program queue belonging generally to a cluster of hosts, whose elements may execute the job added to the queue.
In the EGEE-like grids the resource identified by a contact string is called Computing Element (CE). The same cluster - identified by the leading Host - can serve several CEs.
You may remove a resource from your list at any time by Delete.
If the NAME of Figure 12a is of the kind virtual organization with broker support - for example "hungrid_LCG_2_BROKER" - then the window opened by hitting the button "Resources" is not modifiable, i.e. it has no significance, and is there just for historical reasons.
The contact string is a general term, which has been slightly extended by the EGEE. Therefore the usage of EGEE resources needs special consideration:
3.1 Direct use of resources in the EGEE.
The user can explore the available Computing Elements in two ways:
- either by using the LCG-2_information_system (See Figures 7.9 7.10)
- or by issuing the following command from a remote terminal connected to a User Interface Machine of the EGEE:
lcg-infosites --vo <virtual_organization> ce
For example in case of the virtual_organization "voce":
skurut4.cesnet.cz$ lcg-infosites --vo voce ce
****************************************************************
These are the related data for voce: (in terms of queues and CPUs)
****************************************************************
#CPU    Free    Total Jobs    Running    Waiting    ComputingElement
----------------------------------------------------------
   9       9             0          0          0    ce.grid.tuke.sk:2119/jobmanager-pbs-voce
 166     166             0          0          0    ce.polgrid.pl:2119/jobmanager-lcgpbs-voce
  94      14            95         80         15    grid109.kfki.hu:2119/jobmanager-lcgcondor-long
  36      32             0          0          0    ares02.cyf-kr.edu.pl:2119/jobmanager-lcgpbs-voce
  78      65             0          0          0    zeus02.cyf-kr.edu.pl:2119/jobmanager-lcgpbs-voce
  46      41             0          0          0    skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
 176     176             0          0          0    ce.egee.man.poznan.pl:2119/jobmanager-lcgpbs-voce
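If such a listing has to be processed by a script (for example to find Computing Elements with free CPUs), the rows can be read as five integers followed by a contact string. The Python sketch below is a hypothetical helper written for this manual, not part of the Portal or of lcg-infosites; it works on whitespace separated tokens, so it does not depend on how the columns are wrapped.

```python
from typing import NamedTuple


class ComputingElement(NamedTuple):
    cpus: int
    free: int
    total_jobs: int
    running: int
    waiting: int
    contact_string: str


def parse_ce_listing(text):
    """Collect rows made of five integers followed by a contact string."""
    tokens = text.split()
    rows = []
    i = 0
    while i + 5 < len(tokens):
        window = tokens[i:i + 6]
        if all(t.isdigit() for t in window[:5]) and "/" in window[5]:
            rows.append(ComputingElement(*map(int, window[:5]), window[5]))
            i += 6
        else:
            i += 1  # header words, separators and similar tokens are skipped
    return rows


SAMPLE = """
#CPU Free Total Jobs Running Waiting ComputingElement
9 9 0 0 0 ce.grid.tuke.sk:2119/jobmanager-pbs-voce
94 14 95 80 15 grid109.kfki.hu:2119/jobmanager-lcgcondor-long
"""

if __name__ == "__main__":
    for ce in parse_ce_listing(SAMPLE):
        print(ce.contact_string, "free CPUs:", ce.free)
```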
Two new features can be observed compared with the traditional Globus resources:
- The newest type of job_managers contains a subordinated queue as an extension: here the last "-" separated part of the job manager (in the example "voce" and "long") refers to the subordinated queue.
- At some places the traditional job managers are substituted by "lcg" proprietary ones.
Warning:
When the job manager is an "lcg" proprietary one - for example "lcgpbs" instead of the common "pbs" - we strongly discourage the direct usage of resources of this kind in the Job properties window (See Figure_18, where the job manager is "fork" and therefore the usage is correct).
Instead of the direct use, please select a virtual organization with Broker support as Grid (See Figure 10.1) and apply the following constraint in the tab Rank & Requirement of the JDL description:
other.GlueCEInfoHostname == <Host_of_Computing_Element>.
Example:
Selecting from the table above for example the Computing Element "ce.polgrid.pl:2119/jobmanager-lcgpbs-voce", the host is "ce.polgrid.pl".
And the setting can be seen at Figure 10.9
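The rules of this section (structure of the contact string, recognising an "lcg" job manager, deriving the host for the JDL constraint) can be sketched in Python as follows. All function names are hypothetical, and writing the host as a quoted string in the requirement is an assumption about how the value is entered in the JDL Editor; Figure 10.9 shows the authoritative form.

```python
def split_contact_string(contact):
    """Split 'host:port/jobmanager-<scheduler>[-<queue>]' into its parts."""
    endpoint, _, manager = contact.partition("/")
    host, _, port = endpoint.partition(":")
    parts = manager.split("-")
    if parts[0] != "jobmanager" or len(parts) < 2:
        raise ValueError("unexpected job manager: %r" % manager)
    return {
        "host": host,
        "port": int(port) if port else None,
        "scheduler": parts[1],
        # everything after the scheduler name is the subordinated queue
        "queue": "-".join(parts[2:]) or None,
    }


def needs_broker(contact):
    """True if the job manager is an 'lcg' one, i.e. direct use is discouraged."""
    return split_contact_string(contact)["scheduler"].startswith("lcg")


def jdl_requirement(contact):
    """Constraint restricting the broker to the host of the given CE."""
    host = split_contact_string(contact)["host"]
    return 'other.GlueCEInfoHostname == "%s"' % host


if __name__ == "__main__":
    ce = "ce.polgrid.pl:2119/jobmanager-lcgpbs-voce"
    print(split_contact_string(ce))
    print(needs_broker(ce), jdl_requirement(ce))
```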
4. Workflow Editor: Building your workflow
Now it is time to make our own workflows. We select the tab Workflow of the main menu:
Figure 13
By selecting the button Workflow Editor, an independent Java program, the Workflow_Editor, will start.
Note: Your browser must have the JRE 1.5.2 Java plug-in (or higher) in order to let this program start.
The first time the Workflow Editor is loaded during the portal session, some messages regarding the possible risk of using the program will be displayed. The Workflow Editor, however, is harmless and should be allowed to run. (The involved Webstart technology notifies the user that the downloaded program may access the local file system and prompts the user to trust or dismiss the source of the certificate.)
In the positive case the following window will appear:
Figure 14
Next you will build a simple workflow containing two jobs described as follows:
Example definition:
For simplicity both jobs are of identical structure and use the same executable program (in real life the executables are usually different):
This executable is a simple sequential program written in C - in our example ¨Cell.c¨ - that reads two integer numbers from two different text files, which the program opens as ¨INPUT1¨ and ¨INPUT2¨ respectively, and writes the product of the two numbers into an output text file opened as ¨OUTPUT¨.
You will build the connections to your first job ¨Cascade1¨ in such a way that it receives both of its input_files from the local file system as <path1>/I1 and <path2>/I2 respectively.
Its output ¨OUTPUT¨ will be generated somewhere in the GRID and will serve as the ¨INPUT1¨ of the second job - ¨Cascade1.2¨. The local file <path2>/I2 will be used as the ¨INPUT2¨ of this job.
Finally the result of the whole workflow - ¨OUTPUT¨ of ¨Cascade1.2¨ - will be generated in the GRID and replicated to the Portal_server, downloadable for the user.
As a preparation, you must have the executable code (¨Cell.exe¨) and the input_files in the form <path1>/I1 and <path2>/I2 stored in your local file system.
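To make the data flow of the example easier to follow, the two-job chain can be imitated locally. The source of Cell.c is not reproduced in this manual, so the Python sketch below only mirrors the behaviour described above (read two integers, write their product); the function names and the temporary file names are assumptions made for the illustration.

```python
import os
import tempfile


def cell(input1_path, input2_path, output_path):
    """Read one integer from each input file and write their product."""
    with open(input1_path) as f1, open(input2_path) as f2:
        a = int(f1.read().split()[0])
        b = int(f2.read().split()[0])
    with open(output_path, "w") as out:
        out.write("%d\n" % (a * b))


def run_cascade(i1_value, i2_value):
    """Run the two-job chain in a temporary directory and return the final result."""
    with tempfile.TemporaryDirectory() as workdir:
        i1 = os.path.join(workdir, "I1")
        i2 = os.path.join(workdir, "I2")
        out1 = os.path.join(workdir, "job1_OUTPUT")
        out2 = os.path.join(workdir, "job2_OUTPUT")
        for path, value in ((i1, i1_value), (i2, i2_value)):
            with open(path, "w") as f:
                f.write("%d\n" % value)
        cell(i1, i2, out1)    # first job: INPUT1 = I1, INPUT2 = I2
        cell(out1, i2, out2)  # second job: INPUT1 = OUTPUT of the first job, INPUT2 = I2
        with open(out2) as f:
            return int(f.read())


if __name__ == "__main__":
    print(run_cascade(2, 3))
```

With I1 = 2 and I2 = 3 the first job produces 6 and the second job multiplies this again by I2.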
This knowledge is enough to build the workflow as follows. Let us define the first job first:
Figure 15
Hitting the marked icon New job a new job will appear:
Figure 16
Double clicking on the job will allow its properties to be edited. (An alternative to double clicking is the RIGHT mouse click, which on all graphic elements triggers a popup menu of possible operations.)
Figure 17
Learning Notes on Job Properties:
In this menu the user defines the code of the job to run (Job Executable) and the call conditions of the job - where to let it run (Grid, Resource), with what kind of arguments and under what conditions of the resources.
Arguments can be command line Attributes, elaborated by the code internally and eventually influencing the running of the job, the Monitor flag to indicate graphical observation of the running of the job, and some hints about the kind (JobType) and size (ProcessNumber) of the resources needed to perform the job.
Details:
Name is given by the system as default. The user can change it, but the name of a job must be different from all other job names.
- The value of "Name" will be used to generate different file names in the internal representation, therefore the value is restricted to alphanumerical characters and the underscore character.
- Changing the Name after a closed editing session may have a disturbing consequence: the path to the Job Executable uploaded in the previous session will be lost and therefore the system prompts the user to redefine the path.
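The two restrictions on the job name (allowed characters, uniqueness within the workflow) can be checked before typing the name into the editor. The following Python sketch is a hypothetical helper; it assumes that "alphanumerical" means the ASCII letters and digits, which the manual does not state explicitly.

```python
import re

# Assumption: letters A-Z, a-z, digits and the underscore are the allowed characters.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def check_job_name(name, existing_names):
    """Return None if the name is acceptable, otherwise a short explanation."""
    if not _NAME_PATTERN.fullmatch(name):
        return "only letters, digits and '_' are allowed"
    if name in existing_names:
        return "name already used by another job"
    return None


if __name__ == "__main__":
    print(check_job_name("Cascade1", set()))
    print(check_job_name("my job", set()))
    print(check_job_name("Cascade1", {"Cascade1"}))
```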
JobType is the kind of the code referenced by Job Executable to be started on the resource. It can be traditional sequential (SEQ) or parallel (MPI, PVM). In case of MPI the resource must be informed about the number of hosts needed by the program (Process Number).
Important notice: If the user wants to submit MPI jobs with Broker support, a special JDL requirement must be entered (See Chapter X 2.9_Important_notice_to_MPI_submission)
Job Executable defines the path of the executable code to be uploaded from the local file system. Upon a successful upload and a subsequent download of the Workflow the input field will show only the name of the executable instead of the whole path. Any change of this input field instructs the system to upload a new executable - defined by an absolute path - from the local file system. The search for such a file is supported by the File Browser.
Instrument is a message field set by the system only in the case when the executable code contains special message sending instructions for the real time monitoring, i.e. the code is instrumented.
Process Number has significance only if the Job Executable is of MPI type. It notifies the resource about the number of needed hosts.
Attributes may be filled in just as the eventual command line parameters of traditional C programs.
Grid defines a GRID or a virtual organization, i.e. the high level administrative domain where the job must be submitted. Changing the GRID changes the subordinated list of selectable Resources as well.
Monitor flag can be set by the user only in the case when the Instrument is set. The setting enables job level monitoring.
Please note that setting (changing) the monitor flag may have an unexpected effect on the selected Resource (and on the selectable resources): If the previous state of Monitor was "not set" and the selected Resource could not be monitored, then the new monitoring request will hide all the resources lacking the monitoring infrastructure, and the current value in the field Resource will be replaced by the first monitorable element of the list of resources.
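The side effect of the Monitor flag on the Resource field can be expressed as a small filtering rule. The Python sketch below is only a model of the behaviour described above; the function name and the representation of the resource list as (name, monitorable) pairs are assumptions.

```python
def apply_monitor_flag(resources, selected, monitor_on):
    """Model of the Resource field after the Monitor flag has changed.

    resources: list of (name, monitorable) pairs in display order
    selected: name currently shown in the Resource field
    Returns the list of selectable names and the resulting selection.
    """
    if not monitor_on:
        return [name for name, _ in resources], selected
    # Resources lacking the monitoring infrastructure are hidden.
    visible = [name for name, monitorable in resources if monitorable]
    if selected not in visible:
        # The selection silently jumps to the first monitorable resource.
        selected = visible[0] if visible else None
    return visible, selected


if __name__ == "__main__":
    demo = [("clusterA", False), ("clusterB", True), ("clusterC", True)]
    print(apply_monitor_flag(demo, "clusterA", True))
```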
Resource defines the resource where, and under what kind of assumptions, the Job Executable will run within the defined Grid. The selected resource may change implicitly as a consequence of changing the Grid and Monitor settings.
If the Grid is of the kind virtual organization with broker support - for example hungrid_LCG_2_BROKER - then the value of "Resource" has no significance, and is present just for historical reasons.
Warning:
If the virtual_organization defined in the Grid is EGEE-like and the Job_manager of the Contact_string defined as Resource is an LCG proprietary one, please consider the warning in chapter 3.1_Direct_use_of_resources_in_the_EGEE and use a virtual organization with broker support as Grid with a resource constraint in the JDL.
JDL: Is a new feature available from Release 2.2. It indicates the JDL editor. This editor is applicable only if the virtual_organization selected in Grid is of EGEE compliant type, i.e. the Resource will be determined by the system upon clues of matching characteristics (set by the user with the help of the JDL Editor) instead of direct assignment. The usage and effect of the JDL Editor will be discussed in Chapter X_Connection_to_the_EGEE_Grids.
There is an alternate method to set the resource dependent features of the job. It can be handled centrally starting from the main menu Workflow Properties.
After filling the needed fields the window may look like this:
Figure 18
Hitting Ok you will return to the main window:
Figure 19
Now you can define the I/O ports of this job. Hitting the port icon you may add a new port to the selected job. An alternate method to define a new port is the menu item New port in the main menu Workflow (See Figure 32)
The selected state of the job is visible by the red frame around the job:
Figure 20
Double clicking on the port icon (or selecting the Properties item of the popup menu triggered by the right mouse click as Figure23 shows) opens a pop-up window where the port properties can be defined:
Figure 21
With the help of the port properties window the user defines the direction, kind, name and file association of the port.
Learning notes on Port Properties:
A port connects an input or an output file opened by the job with the environment. From the point of view of the given port this environment can be an external file or another port of a different job.
The external file reference (defined in the field File) will be mapped to the name (defined in the field Internal File Name) used by the author of the executable to open the given file.
Notice that there is no longer any restriction on the usage of filenames: In the versions preceding Release 2.2 of the P-GRADE Portal the Port property field Internal File Name (see Figure 21) was not defined, and hence there was the additional restriction imposed on the user to apply external file references "ending" as the job executable expects them, i.e. the value corresponding to this field was generated from the "/" separated "tail" of the File field, which used to have the form
[[protocol]<directory>]<FileName_applicable_as_InternalFileName_as_well>
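The old (pre 2.2) derivation of the internal name from the File field amounts to taking the last "/" separated component. A minimal Python sketch of this rule follows; the function name and the example references (including the gsiftp protocol prefix and host name) are assumptions chosen only for illustration.

```python
def legacy_internal_name(file_reference):
    """Last '/' separated component of a File reference (pre Release 2.2 rule)."""
    return file_reference.rsplit("/", 1)[-1]


if __name__ == "__main__":
    print(legacy_internal_name("/home/user/data/INPUT1"))
    print(legacy_internal_name("gsiftp://host.example.org/store/INPUT1"))
```

Since Release 2.2 the Internal File Name field replaces this derivation, so the file may carry any name on the local or remote file system.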
A port can be either an Input (In) or an Output (Out) port.
Input:
If the Type of the port is Input AND the port will NOT be connected to any Output port of other jobs, then the port must refer to a genuine input file. The genuine input file must be defined as a full path in the File field in the form of [<protocol>]<path>, where <protocol> can be defined only in case of Remote files (see VIII_Handling_of_remote_files).
If the Input port will not be connected to a genuine input file, i.e. it will be connected to the output port of a different job, then the File field must not (and cannot) be filled.
In both (local and remote) cases the user defined input field <name> after the label Internal File Name must correspond to the "fopen(<name>, "r")" instruction within the code of the job.
The set flag managed copy means that the system automatically delivers the input file to the working directory where the associated job will run. This is the default case. However, in some cases when the user wants to handle remote files and the location of the input file is a Storage Element, the file may be too big to be copied. In such cases the user may decide (by clearing the flag) to take over the responsibility of reading the Grid file. Note that in this case the executable of the job must be prepared by the user to open and read the Grid files using the GFAL API of the EGEE.
Output:
If the Type of the port is Output AND File Type is NOT Remote, then the File field must not (and cannot) be written.
Please note that there is NO symmetry between genuine input and output files: Genuine local output files (i.e. those referenced by Output ports not connected to any Input ports of different jobs) are stored in the PORTAL Server and will be downloaded to the local environment of the user by an interactive command after the completion of the run of the workflow.
The other case is when the Output File is Remote. In this case the file referenced by the string after the label File will be stored according to the full path of the form [<protocol>]<path> (see VIII_Handling_of_remote_files).
In both (local and remote) cases the user defined input field <name> after the label Internal File Name must correspond to the "fopen(<name>, "w")" instruction within the code of the job.
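The rules for the File field of Input and Output ports can be collected in one consistency check. The Python sketch below is a hypothetical model of these rules, not the validation code of the Workflow Editor; the parameter names are assumptions, and treating an empty Internal File Name as an error is also an assumption derived from the requirement that it must match the fopen call of the job.

```python
def check_port(port_type, connected, file_type, file_ref, internal_name):
    """Return a list of rule violations for one port (empty list: consistent).

    port_type: "In" or "Out"
    connected: True if an Input port is fed by an output port of another job
    file_type: "Local" or "Remote"
    file_ref: content of the File field ("" if empty)
    internal_name: content of the Internal File Name field
    """
    problems = []
    if not internal_name:
        problems.append("Internal File Name is required")
    if port_type == "In":
        if connected and file_ref:
            problems.append("File must stay empty on a connected input port")
        if not connected and not file_ref:
            problems.append("a genuine input file needs a File reference")
    elif port_type == "Out":
        if file_type == "Remote" and not file_ref:
            problems.append("a Remote output needs a File reference")
        if file_type != "Remote" and file_ref:
            problems.append("File must stay empty on a Local output port")
    else:
        problems.append("Type must be 'In' or 'Out'")
    return problems


if __name__ == "__main__":
    print(check_port("In", False, "Local", "/path1/I1", "INPUT1"))
    print(check_port("Out", False, "Local", "result.txt", "OUTPUT"))
```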
Details:
Port name: Given automatically by the system. There is not much sense in changing it. The field will be used internally to generate subdirectories. Hence it must contain only alphanumerical characters.
Type: Selector to indicate either a reading or a writing access, corresponding to the proper "fopen(<name>, "{r/w}")" instruction in the code of the job.
File Type: Has significance only in case of genuine files. The default setting is Local. If the setting is Remote then the Input file will not be uploaded to the Portal Server during the Save/Upload phase terminating the definition of the Workflow in the Workflow Editor. Instead, just the reference to the remote file will be stored and the file transfers will be organized by the run time system. A file defined to be Remote on an output port forces all connected input ports to be Remote with identical File names.
File: Reference to a genuine local or remote file [<protocol>]<path>. Please note that this field is not definable if the port is an input one and connected to the output port of another job, or the port is designed to be a local output port. The search for such a file in the case FileType = Local is supported by the File Browser.
Internal File Name: Internal reference to a file used by the author of the corresponding job in a "fopen(...)" instruction.
File storage type: This selector can be activated only in case of Output files. Its default setting is Permanent for the genuine Output files and Volatile for the "channel" files connected to other ports. If a channel file is reset to Permanent, its data will not be discarded after each connected job has read its content but "added to the output of the workflow". In the case when the setting is FileType = Local, this means that the file will be preserved for downloading as if it were a genuine Output file. An eventual resetting to Volatile forces the change of the File Type to Local because a temporary file in a dedicated Remote storage device is undesirable.
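The defaults and the coupling between File storage type and File Type can be modelled in a few lines. This Python sketch is a simplified illustration under stated assumptions: a port is represented by a plain dictionary with the hypothetical keys "file_type" and "storage", and the function names are invented for the example.

```python
def default_storage_type(connected_to_input):
    """Default of the selector: Volatile for channel files, Permanent for genuine outputs."""
    return "Volatile" if connected_to_input else "Permanent"


def set_storage_type(port, storage_type):
    """Return a copy of the port description with the storage type applied."""
    updated = dict(port, storage=storage_type)
    if storage_type == "Volatile":
        # A temporary file on dedicated Remote storage is undesirable,
        # so the File Type falls back to Local.
        updated["file_type"] = "Local"
    return updated


if __name__ == "__main__":
    port = {"file_type": "Remote", "storage": "Permanent"}
    print(default_storage_type(connected_to_input=True))
    print(set_storage_type(port, "Volatile"))
```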
Please select a proper input file with the help of the File Browser and fill the Internal File Name according to the convention required by the example program Cell.c:
Figure 22
The Port name is set automatically, however the user may redefine it. Hitting OK you return to the original editor to define other ports.
Repeating the proper steps (basically to define the location of INPUT2 for port ¨1¨ - a similar window is shown in Figure 30) you arrive at changing the properties of port ¨2¨
Learning Notes on Port Editing:
In this case hitting the right mouse button (seen in Figure23) offers three possibilities:
Properties - to define the Port properties
Delete - to delete the port
Fix (Unfix) - toggle to glue (or release) the relative graphic representation of the port along the sides of the square representing the job (This operation has no significance from the point of view of the semantics of the workflow)
Figure 23
Selecting Properties by left click you get the Port properties popup window where you may select "Out" as Type and enter "OUTPUT" as Internal File Name:
Figure 24
Please remember that in this (Local) case of Output you must not define the File even if this port is intended as the source of the genuine output of the whole workflow. The reason is that the Workflow, upon successful termination of the submitted tasks, will not return individual files. Instead it packs the Permanent Local output_files into a compressed file tree reflecting the structure of the workflow, and you can download it by the standard method of your browser as you will see in Figure 39. Please note that you will be able to identify this file using its Internal File Name.
If the user wants to reduce the storage load of the files produced by the job, then the eventual unneeded files can be marked as Volatile instead of the default Permanent.
Hitting Ok completes the definition of our first job and the icon New job can be selected.
Figure 25
Let us define the job properties as previously and let us create the ports for the new job similar to Figure19, and let us select the first (0) one. Here the File will not be defined because this port will be connected to the output port of the other job:
Figure 26
Hitting Ok we will receive the following Warning message:
Figure 27
The simplest way is to answer it with No and proceed to perform the port connection.
After closing the Port property window the Editor looks like this:
Figure 28
Now we connect the output (port ¨2¨) of ¨Cascade1¨ with the input port ¨0¨ of the second job. Pressing the middle mouse button on the output port, holding it down, dragging it up to the proper input port and releasing it will define the desired connection.
Editing Notes:
No rubber line will be seen during dragging.
Clicking on the arrow connecting the ports changes its color from blue to red and the connection becomes selected.
The selected connector can be deleted by the proper icons of the menu bar (cut or delete), or it can be given a graphical attribute that influences the color of the connection in the simulation regime, when the Workflow Editor is used to display the runtime state of the submitted workflow. A right click on the arrow opens a popup menu where the toggle item Switch to {ONLINE|OFFLINE} can be selected. This minor coloring feature does not change the semantics of the workflow to be defined.
Figure 29
Now we edit the second input port (Port name 1) of the new job:
Figure 30
And let us define the output port for the second Job:
Figure 31
Confirming the change by Ok, the editing phase is complete. We just need to save our product on the Portal_server: In the main menu let us select the operation Save as:
Figure 32
In this state the Workflow Editor checks the correctness of the workflow.
Learning Note
In case of an eventual error (mostly bad references to the local files to upload, missing resources) a warning message appears about the errors found. Even in this case the user may decide to save the workflow. However, in this case the workflow is marked as incomplete for the Workflow Manager and cannot be submitted; it is only stored for a later modification.
This modification is initiated by the Open menu command (see Figure32), which supplies a list of the workflows of the user stored in the PORTAL Server. Selecting one workflow, it will be downloaded and the editing can be completed.
When a new workflow is saved (in case of Save as or at the first use of Save) a popup dialog (Figure 32a) prompts the user for a name of the Workflow. This must consist of alphanumerical characters and must be different from the workflow names already stored in the PORTAL Server.
Figure 32a
Let's define the workflow as ¨WF1¨.
In a subsequent step the system automatically proposes to issue the Upload command to transfer the referenced executable code(s) and the input_file(s) from the client's desktop to the Portal_server:
Learning notes on upload file:
The Upload proposal happens in the following cases:
- New local input file or code file references have been detected
- The user has modified a field of kind Job Properties/Job Executable (Figure 18) or a field of kind Port Properties/File (Figure 22).
If the user refuses the suggestion the workflow remains incomplete. The Upload command can be issued later at any time, even manually. See Menu Upload Files... of Figure 32
Figure 33
You select Yes and then the system starts the uploading process, which is indicated in a pop up window Upload containing a progress bar. Upon termination the message Finished will be visible and the system will wait for the user to press the Close button:
Figure 34
Executing the editing steps above we have finished the creation of our new workflow WF1 and can leave the Workflow Editor to return to our Workflow Manager. Its page (See Figure 35) must be Refresh-ed to show our new workflow WF1, which is now ready to run.
Before doing it let us check the associations of jobs and resources. It can be done either step by step, visiting the jobs for properties, or centrally by a new menu command of Release 2, Workflow Properties (See Figure 32). It opens the following table:
Figure 34a.
If a change is needed it can be performed in a 6 step process:
- Select a proper Grid
- Select the required resource from the list of the loaded Resources. (Remember the resources are Grid dependent)
- Mark the left of the line(s) belonging to the required Job(s).
- Confirm the changes with the button Set selected
- Leave the window by Ok
- Save the Workflow
This table can be used in a similar way to control the monitoring of the jobs. If the code belonging to the job is not instrumented then the association will be refused.
5 Submitting the workflow
Figure 35
With the Submit command we can
activate the workflow:
Figure 36
6 Observing the progress of the workflow
A side effect of submitting is that the Submit button changes to
Abort. A subsequent Attach command reopens the Workflow Editor, but in
a new role: the progress of the workflow can be followed through the
changing colors:
Figure 37
In this state the first job has received control and the second is
waiting for the termination of the first.
A click on the Refresh button of the Workflow Manager window may
indicate the successful termination of the workflow.
The Portal_server then collects the referenced files and makes one
compressed downloadable file out of them:
Figure 38
7 Fetching the result
A click on the green button in the [ Output ] column starts the
download manager of the browser, which copies the result file to the
desktop of the user. Please note that the user chooses the destination
directory of the workflow result with the tools of the download
manager, in a browser-dependent way.
Pressing the Details button of the [ View ] column opens a window that
presents important information:

Figure 39
Besides the verbose state of the constituent jobs in the Status
column, you can get the graphical rendering of the two-stage Time -
Process communication diagrams (by pressing the buttons under the
column [ Visualization ]). You can also see the messages, if any, that
the jobs sent to the standard output (by pressing the button Out)
and/or to the standard error (by pressing the button Err, not visible
in Figure 40 as the jobs did not produce error messages). These
buttons are placed in the column [ Logs ].
When the Visualize button is hit, the visualization is performed by an
independent program called Prove, which works on a proper trace file
of the workflow. The availability of job level visualization depends
on two necessary conditions:
- The original executables have been built with the necessary
instrumentation library, which adds the trace-sending instructions.
- At the resource where the job was running, the Mercury_monitor
monitoring infrastructure has been installed to receive and collect
these traces.
As you see, this was not the case in our simple example.
You will get a more comprehensive view of the monitoring possibilities
in Chapter V_Monitoring_and_Visualization.
Finally we show the window returning the content of the standard output
upon hitting the Out button of Cascade 1.2
Figure 40
V Monitoring and Visualization
1 Introduction
Graphic monitoring means the generation, collection and graphic
rendering of runtime data that inform the user about the state and the
progress of the submitted workflow. In a parallel environment the
dynamic conditions that trigger the run of distinct program parts are
of special importance: they help the user to pinpoint design flaws and
temporarily missing resources. Therefore the “time space” diagram has
been selected as the base tool to render graphically the behaviour of
the interacting program parts. This is discussed in Section 3, which
details the work with the graphic tool Prove running on the desktop of
the user.
We use the common term “program parts” in graphic monitoring in two
totally different contexts:
- At the level of the whole workflow, to distinguish the
participating jobs;
- At the level of individual jobs, to distinguish the parallel running
processes, if any.
For example the upper part of Figure_43 shows the monitoring of the
whole workflow, while the lower part is a detailed view of the
progress of its job “cummu”.
1.1 Availability of monitoring
On the one hand, high level (workflow) monitoring is a generic
property of the job submission technique used, “Globus/Dagman”.
On the other hand, job level monitoring, which is needed above all
when the job includes parallel processes, can only be performed if all
of the following conditions hold:
- The source code of the respective processes has been extended by
special instructions at proper places to send monitoring messages.
(This preparation is generally referred to as “instrumentation” of the
source code.)
The code can be instrumented either with the application P-GRADE (see
Import_process) or by the "manual" use of the grm library. Separate
documentation on the use of the grm library will be available soon.
- A special infrastructure (the Mercury_monitoring service [1]) is
deployed on the remote resource where the submitted job runs, to
listen for and gather these monitoring messages.
- The user has enabled the monitoring by setting the Monitor flag in
the Job properties window of the Workflow_Editor (Figure_18).
2. Life cycle of monitoring data
2.1 The source of data
As you see in Figure_2, the workflow results, including the monitoring
data (in our terminology “the trace file”), primarily arrive from the
remote resources at the Portal_server.
The instrumentation may produce a huge amount of data.
As each job is associated with a dedicated resource, there is a
separate trace_file for each job.
2.2 The transport
The trace_file is collected autonomously and incrementally, in
packages, as a result of two possible events which are basically
independent of the activity of the user:
- The local temporary buffer for the current portion of the
trace_file on a host of the remote resource is full.
- The respective job has terminated.
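The two triggering events can be illustrated with a small model. This is a sketch only: the class, its capacity parameter and the list standing in for the Portal server are hypothetical and say nothing about how the monitoring infrastructure is actually implemented.

```python
class TraceBuffer:
    """Hypothetical model of the local buffer on a host of the remote resource."""

    def __init__(self, capacity):
        self.capacity = capacity   # number of events the buffer can hold
        self.events = []           # current portion of the trace
        self.sent_packages = []    # stands in for packages sent to the Portal server

    def _send_package(self):
        # Forward the buffered events as one package and empty the buffer.
        if self.events:
            self.sent_packages.append(list(self.events))
            self.events.clear()

    def record(self, event):
        # Event 1: the buffer becomes full.
        self.events.append(event)
        if len(self.events) >= self.capacity:
            self._send_package()

    def job_terminated(self):
        # Event 2: the job terminates, so the remainder is sent.
        self._send_package()


buf = TraceBuffer(capacity=2)
for event in ["start", "send", "recv"]:
    buf.record(event)
buf.job_terminated()
print(buf.sent_packages)  # [['start', 'send'], ['recv']]
```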
2.3 The elaboration
These data need to be stored, filtered and processed. It is the
Portal_server which does the bulk of this work. It prepares the “image
file” on user demand. This “image file”, only a few bytes compared to
the trace_file, is forwarded to the application program Prove running
on the desktop of the user.
Why should the user know all these technical details?
First of all, to understand the cause of the delays that the double
buffering imposes on the graphic rendering system. Almost as important
is to understand that in certain cases the user should help to reduce
the load of the Portal_server by issuing the “forget events” command
of Prove, which instructs the Portal_server to truncate the
corresponding trace_file, releasing the data about events that arrived
before a certain time. (There is only a limited storage quota for each
user on the Portal_server, which is a precious shared resource.)
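The effect of such a truncation can be pictured as follows. The sketch assumes, purely for illustration, that a trace is a list of (timestamp, event) pairs; the real trace_file format is not described in this manual.

```python
def forget_events(trace, cutoff):
    """Return the trace without the events that arrived before `cutoff`."""
    # Assumption: each trace entry is a (timestamp, event name) pair.
    return [(t, name) for (t, name) in trace if t >= cutoff]


trace = [(0.5, "start"), (1.2, "send"), (3.0, "recv"), (4.7, "exit")]
print(forget_events(trace, cutoff=3.0))  # [(3.0, 'recv'), (4.7, 'exit')]
```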
2.4 The frame of destination: The visualization interface
The program Prove can be started from the “detailed” view of the
“Workflow Manager”, as Figure_39 indicates.
Note: In the following a new workflow application is used as an
example to demonstrate the full palette of monitoring options. This
fairly complicated workflow has been prepared in such a way that all
of its component jobs contain instrumented code.
It is called ForecastWmin and performs a weather forecast (see
Figure_41).
Figure 41
In this case the detailed view of the workflow in the Workflow Manager
indicates the availability of job level monitoring by showing the
proper buttons (Figure_42).
You can compare it with Figure_39 of the workflow application WF1,
where the buttons for job level monitoring are missing because the
jobs of that application have not been prepared for monitoring.
Figure 42
Figure 42 shows the detailed view of the workflow ForecastWmin in
an intermediate state. The jobs which are running and/or finished can
be visualized by the program Prove, which opens independent windows
when the respective Visualize buttons are hit. Prove can be opened for
the high level view of the workflow as well (button Visualize in the
first line of the workflow, which contains its name).
The button All “packs” all visualization windows together, starting
from the high level view, as Figure_43 indicates:
Figure 43
Warning:
If the number of elements along the vertical axis (hosts / jobs) is
high, then certain alphanumeric texts may not be displayed due to the
low resolution.
In that case please increase the size (especially the Height) of the
applet.
3 The Prove program
As previously indicated, the program Prove visualizes time-space
diagrams.
The program_parts are represented by colored bars placed as rows of a
coordinate system where the horizontal axis denotes the common time,
and the discrete vertical axis is labeled with the names of the
program parts, which may be jobs or processes depending on the context
in which the current instance of Prove was called.
- Green color indicates the state of a program part waiting for an
event to read.
- Black color indicates the “working” state.
- Gray color indicates a state where the program is blocked, wanting
to send an event to another program that is not yet ready to accept it.
The endpoints of the arrows between bars indicate the times of sending
and receiving of events respectively. These arrows are normally blue.
Exceptional red lines indicate bad trace_files, unsynchronized clocks
or lost monitor information. You are kindly encouraged to report them
to our Portal maintenance team.
3.1 User activities
User activities may affect the generation of the trace_file and its
graphical rendering.
3.1.1 Truncate trace files
The only activity affecting the trace_file generation is the menu
command Trace/Forget events (see Section 2 for more detail).
Figure 44
The menu command Trace/Collect is not used at present; it is reserved
for forcing the remote resource to update the trace_file.
3.1.2 Visualization activities
Visual rendering activities include the filtering, attributing, and
time scale zooming of the program parts.
3.1.2.1 Filtering
The menu item View/Filter serves to reduce the set of program parts
shown.
You can select the interesting program parts by the associated toggle
marks. The selection is applied by selecting the “Show changes” item,
as Figure_45 shows. Please note that “delta_m”, not visible in
Figure_45, has been selected too.
The Filtering operation can be regarded as a kind of “vertical
zooming”.
Figure 45
The result of selection is shown in Figure_46:
Figure 46
3.1.2.2 Change state/statistics
When the statistical mode is selected instead of the default setting,
which shows the time dependent states of the program parts, a color
coded statistic of the occurrence frequency of distinct event types is
displayed.
This operation can be started by selecting the menu item
Info/Statistics/Event. See Figure_47:
Figure 47
The result can be seen in Figure_48:
Figure 48
You can restore the original settings by selecting the menu
item Info/Statistics/Communication (Figure_49).
3.1.2.3 Sorting the program parts vertically
You can change the order in which the program parts appear along the
vertical axis. Figure_49 shows the path to the proper menu item in the
list
Info/Sort/{Sort by communication | Sort by name | Sort by hostname}
Figure 49
Figure_50 shows the new image:
Figure 50
3.1.2.4 Zooming in the time scale
One of the most important ways of investigating events is the zooming
facility of the time scale. The zooming works in a stack-like way and
uses no special buttons of the window, just the mouse buttons. The
rules of selection are very simple:
- Pressing the left mouse button near the horizontal time line defines
one range delimiter. Releasing the dragged mouse in a new position
defines the other time range delimiter. The window is refreshed
automatically, magnifying the whole image by the ratio of the length
of the original range to the length of the new one.
- This activity can be repeated any number of times, or the previous
range selection can be revoked by clicking the right mouse button.
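The stack-like behaviour and the magnification ratio can be modelled in a few lines. This is an illustrative sketch, not Prove's code: the class and method names are hypothetical, and "original range" is assumed to mean the range displayed just before the new selection.

```python
class ZoomStack:
    """Hypothetical model of the stack of time ranges kept while zooming."""

    def __init__(self, start, end):
        self.ranges = [(start, end)]  # the bottom entry is the full time range

    def zoom(self, a, b):
        """Select a new range (left button drag); return the magnification."""
        lo, hi = sorted((a, b))
        if lo == hi:
            raise ValueError("the two range delimiters must differ")
        cur_lo, cur_hi = self.ranges[-1]
        self.ranges.append((lo, hi))
        # Ratio of the previous range length to the new range length.
        return (cur_hi - cur_lo) / (hi - lo)

    def undo(self):
        """Revoke the last selection (right button click); return the range shown."""
        if len(self.ranges) > 1:
            self.ranges.pop()
        return self.ranges[-1]


z = ZoomStack(0, 100)
print(z.zoom(20, 30))  # 10.0
print(z.zoom(22, 27))  # 2.0
print(z.undo())        # (20, 30)
print(z.undo())        # (0, 100)
```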
Figure_51 shows the state immediately after the range selection (the
little horizontal line toward the right side of the calibrated time
scale), and Figure_52 the state after the execution of the zoom
instruction.
Figure 51
Figure 52
Any zoomed image (Figure_52, Figure_53) contains an active ruler. With
its help the whole original time range can be swept over. However this
operation can be prohibitively slow: as discussed in 2.3, the desktop
part of the Prove program must send a request to the Portal_server for
a new image, which is downloaded with a delay depending on the
network. Therefore the sweeping will not be as smooth as it would be
with a traditional local program.
Figure_53 shows the image after a repeated zoom.
Figure 53
VI Multi-GRID support
From version 2.1 of the P-GRADE Portal, users can execute their
applications in several Grids, each of which may consist of one or
more Virtual Organizations (VOs).
If a Grid consists of several VOs, the user should have a certificate
for the Grid, and this certificate should be registered with those VOs
the user would like to access.
For each of these VOs the user has to have a valid certificate, which
will be used for authenticating the user at the resources of that
particular VO.
To use this multi-GRID support the following steps have to be taken:
- The portal administrator has to set up the list of VOs, and may
define a set of default resources. These resources appear on demand
in the resource list of every common user.
- Each user can then set up his own resource list for each VO.
- The jobs of the workflow can then be allocated to any resource of
any VO, so different jobs of the very same workflow can be executed on
different resources belonging to different VOs of even different
Grids.
- Before execution the user has to download a short term proxy
certificate for each VO involved in the workflow.
Important notice for EGEE users:
The Portal ensures multi-Grid, multi-VO support independently of the
underlying infrastructures.
However certain grids may impose restrictions:
EGEE restriction:
The VO that the user specifies by selecting a Virtual Organization
with Broker support may contradict the VO permission of the resource
selected by that Broker.
This unpleasant situation can only occur if two conditions are both
fulfilled:
- The user has registered with more than one VO.
(EGEE states that a user should be a member of just one single VO.
However EGEE does not prohibit multiple membership.)
- The sets of resources
belonging to the mentioned different VO-s include a common site
"S" and the broker selects just this site to execute the user's job.
Let's see the situation in detail:
- The user, already a member of VO1, registers with VO2. As the site
"S" also belongs to VO2, the user will be mapped as a VO2 member in
the Grid map File of site "S".
- The user submits a job to a VO1 broker, which accepts him/her as a
VO1 user and makes the proper VO1 setting in the JDL description
(Figure 10.2).
- The local security system on site "S" finds a VO1 job and, from the
delivered proxy_certificate (which includes the distinguished_name of
the user), determines a contradicting VO2 membership from the
mentioned Grid map File.
Let's now see the steps of the multi-GRID support in a bit more
detail.
1. Setting up VOs of Grids and default resources (by portal administrator)
The Grids, the VOs and the resource lists of the VOs can be edited in
the Settings tab of the portal.
Only the root user has the privilege to set up and modify the list of
Grids and VOs.
This means that he/she has to set up at least one Grid (or VO) and,
advisably, one default resource for it.
Figure 6.1 shows the Grid configurations window as edited by the root
user.
The root user adds a new VO with ‘Add new’, and deletes existing ones
with ‘Delete’.
Note that in the case of Grids composed of several VOs the input field
"Name" refers to the VO and the input field "Grid" refers to the Grid
as a hub over the several VOs.
The distinction is necessary because the resources belong to the VO,
but the information system access defined here refers to the
superimposed Grid.
In short, the string defined as "Grid" appears only at the top of the
hierarchy, when the user selects a Grid as the root for information
retrieval (see The Information system).
If we want to define a VO, for example "HUNGRID", of the "EGEE" Grid,
then together with the VO "HUNGRID" we may define the access to the
whole "EGEE" Grid.
Once HUNGRID has been defined as part of the EGEE grid, the whole
information system of the EGEE Grid becomes visible (see Figure 7.6).
When the Grid is not really subdivided into VOs, the Grid is regarded
as consisting of one VO and, as in the multi-VO case, the name of this
VO is required as "Name".
Filling the field "Grid" is not obligatory; if it is left empty, its
value is inherited from the value of "Name". This suits the needs of
user groups who use simple Grids and do not want to distinguish
between the notions of VO and Grid.
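The inheritance rule for the two input fields can be written down as a tiny function. It is a sketch of the rule as described here; the function name and the sample name "SimpleGrid" are hypothetical.

```python
def effective_grid(name, grid=""):
    """Return the Grid value in effect for a configuration entry."""
    if not name:
        raise ValueError('the field "Name" is required')
    # An empty "Grid" field inherits the value of "Name".
    return grid if grid else name


print(effective_grid("HUNGRID", "EGEE"))  # EGEE
print(effective_grid("SimpleGrid"))       # SimpleGrid
```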

Figure 6.1 Grid configurations list window
The administrator can also set up an information system for the Grid
of the VO if one is available. Currently information systems of types
MDS2 and LCG2 are supported. The configuration of the Information
System will then be used by the Information System portlet. If there
is no information system, just choose ‘N/A’.
Please note that in the case of LCG2 the information system refers to
the whole Grid and not to just one virtual organization.
Both for MDS2 and LCG2 the host, port and base-dn have to be defined
for contacting the Information System. You can see this in Figure 6.2.
For the MDS2 type you also have to refer to an existing MyProxy server
account (see the "login" and "password" of Figure 10, where "login" of
Figure 10 corresponds to "Username" of Figure 6.2).
The other fields of the MyProxy Server account ("hostname" and "port")
refer to the MyProxy Server itself and are defined during the
installation of the P-GRADE Portal in the configuration file
"PGradePortal.properties".
The system will automatically download a proxy certificate from this
account, and will use it to authenticate itself against the source of
the information when querying the job-manager list for the Grid.

Figure 6.2 Defining Information System for the Grid (MDS2)
A default resource list can also be set up by the portal
administrator. This user interface can be reached by clicking the
‘Resources’ button in the Grid Configuration Window (Figure 6.1).
The resource list window can be seen in Figure 6.3.
Figure 6.3 Defining the DEFAULT resource list for the Grids
The portal administrator defines a default list, which will then be
available to all users for setting up their own resource lists.
Resources can be added with ‘Add’ and deleted with ‘Delete’. For each
definition the URL (for example "n99.hpcc.sztaki.hu") and a Job
manager (for example "jobmanager-fork") have to be provided.
A special case of VO definition is when we define a
virtual_organization with broker support, for example
"hungrid_LCG_2_BROKER" in Figure 6.1.
In this case no information system will be defined. For historical
reasons the window Resources contains in this case just one list
element, mostly "default.jobmanager". It is set by the administrator,
and it may not be altered by a common user.
This value is not used in Release 2.2.
2. Setting up the resource list for a VO (any regular user)
Any regular user can define his own resource list for each of the
available Grids.
Let us compare Figure 6.1 and Figure 12a. As you can see, the users
cannot edit the Grid list itself; they can only edit the resource
lists by clicking the ‘Resources’ button of each Grid.
The resource list window of a user for a particular VO can be seen in
Figure 12b. The user can add and delete resources with ‘Add’ and
‘Delete’, just like the portal administrator. The default resources
defined by the administrator can be loaded with ‘Load default’.
If an MDS2 type information system is defined for the Grid, then it
can also provide some resource configurations; these can be loaded
with the ‘Load resources from MDS2’ button.
3. Allocating the workflow (any regular user)
The workflow and its jobs can be allocated in the Workflow Editor
(WE). For any job, any VO and any resource in that VO can be set. In
Figure 34a you can see the window Workflow properties of the WE, which
can be opened from the Workflow menu or with the Ctrl+W hotkey.
A resource for a job can also be set in the job properties window,
which opens by clicking on the job.
In the job properties window the VO (Grid) is selected in the field
marked by the label Grid. This window can be seen in Figure 18.
4 Supplying certificate for each virtual organization before execution (any regular user)
In the multi-GRID environment users have to provide a certificate for
each virtual organization; this means that they have to map a valid
certificate to every virtual_organization on whose resources they want
to execute their application. The whole certificate management takes
place in the Certificate tab of the portal, just like before. Right
after download, users are offered to map the certificate to any of the
Grids. This can be seen in Figure 10a.
A click on ‘Set for Grid’ leads to the interface in Figure 10b. The
details of the certificate, such as the issuer, subject and timeleft,
are displayed, and the desired Grid can be selected.
By clicking ‘OK’ in this window the user gets back to the certificate
list, which can be seen in Figure 6.4.
The column named ‘Set for Grids’ lists the names of all valid virtual
organizations that have been associated with the respective
certificate.
Each certificate can be assigned to any number of virtual
organizations, but only one certificate can be set for a given virtual
organization at any time.
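The mapping rule (many VOs per certificate, one certificate per VO) corresponds to a dictionary keyed by the VO name. The sketch below is an illustration only: the class, the certificate labels and the assumption that a new mapping silently replaces the previous certificate of a VO are not taken from the Portal.

```python
class CertificateMappings:
    """Hypothetical model of the certificate-to-VO mappings of one user."""

    def __init__(self):
        self._certificate_of = {}  # VO name -> certificate label

    def set_for_grid(self, certificate, vo):
        # Assumption: a new mapping replaces the previous certificate of the VO,
        # so a VO never has more than one certificate at a time.
        self._certificate_of[vo] = certificate

    def set_for_grids(self, certificate):
        """VOs associated with the certificate (the 'Set for Grids' column)."""
        return sorted(vo for vo, c in self._certificate_of.items() if c == certificate)


m = CertificateMappings()
m.set_for_grid("cert_A", "VO1")
m.set_for_grid("cert_A", "VO2")   # one certificate, two VOs
m.set_for_grid("cert_B", "VO2")   # VO2 now uses cert_B only
print(m.set_for_grids("cert_A"))  # ['VO1']
print(m.set_for_grids("cert_B"))  # ['VO2']
```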
Figure 6.4 The certificate mapping window with the Grid mappings
In this window you can also modify mappings with the ‘Set for Grid’
function, which leads to the certificate-mapping window already seen
in Figure 10b.
VII Information System
The P-GRADE Portal can handle the available Grid dependent information
systems.
Two kinds of information systems are recognized by the P-GRADE Portal:
the MDS-2 and the LCG-2 information system.
Configuring Grid access (including specifying an information system
for a grid) is a task of the administrator of the portal. See Setting
up VOs of Grids and default resources.
1. MDS-2 information system
The MDS-2 information system of the portal has two functions: one is
getting the list of resources available in the Grid; the other is
getting detailed information about individual resources.
1.1 View of available resources in the Grid
There are two modules under the tab "Information System": the MDS
Monitor and the LCG Monitor. When the user first clicks on the
Information system tab, the MDS Monitor module of the portal is
activated by default. On a subsequent selection of Information system
the last visited module is activated.
If the administrator of the portal has not yet specified a grid with
an MDS-2 information system, the following message can be seen in the
portal window (see Figure 7.1).
Figure 7.1
If one or more Grids with MDS-2 information systems have already been
defined in the portal, the following screen (Figure 7.2) can be seen
after the selection of the MDS Monitor label.
Figure 7.2
The user can select a Grid to see the available resources using the
combo box in the upper left part of the portal window.
Having selected a grid, the user must click on the View button right
next to the grid combo box to see the available resources.
If the server (called the GIIS server), or the service running on that
server from which the portal gets this information, is not available,
the following message can be seen (see Figure 7.3).
Figure 7.3
1.2 View of detailed information about a resource
If the user would like to get detailed information about a resource,
he should click on the appropriate resource in the resource list (see
Figure 7.2). The page with the detailed information about a resource
can be seen in Figure 7.4.
Figure 7.4
Figure 7.4 shows that the detailed information provided by MDS-2 on
every resource can be divided into a static and a dynamic part.
If any information (e.g. CPU Model in Figure 7.4) is not available
from the MDS at that moment, the text Not Available (N/A) is displayed
for that attribute.
2. LCG-2 information system
The LCG-2 information system of the portal has two functions: one is
getting the list of sites available in the Grid; the other is getting
detailed information about an individual site.
Sites are associated with one or more virtual organizations (VOs) as
well.
2.1 View of available sites in a Grid
When the user clicks on the label LCG Monitor of the Information
system tab, the LCG Monitor module is activated.
If the administrator of the portal has not yet specified a grid with
an LCG information system, the following message can be seen in the
portal window (see Figure 7.5).

Figure 7.5
If one or more grids with LCG information systems have already been
defined in the portal, the following screen (Figure 7.6) can be seen
after the user clicks on the LCG Monitor label.

Figure 7.6
The user can select a grid for the available sites using the combo box
which can be found toward the upper part of the portal window. After
selecting a grid the user must click on the View button right next to
the grid combo box to see the available sites. By default the sites
belonging to the first grid in the grid list are displayed on this
page.
Each site in an LCG type grid is built up from Computing_Elements (CE)
and Storage Elements (SE).
More precisely, the site is a rather geographic notion.
There can be one or more clusters inside a site.
A cluster can be fed by one or more queues, each called a
Computing_Element.
In the site list page the basic information about CEs and SEs can be
seen. The information for each site is by default the aggregation of
all the CE and SE resources found at the respective site.
If the server (called the BDII server), or the service running on that
server from which the portal gets this information, is not available,
the message "Cannot contact the BDII server" can be seen.
2.1.1 Selecting a Virtual Organization
The users of LCG type grids must belong to one or more virtual
organizations (VOs). The CEs and the SEs are associated with VOs as
well. A CE may belong to more than one VO. This means that if a CE or
SE is associated with a VO, only those users who belong to the
corresponding VO can access it.
The user can filter the sites associated with a specified VO using the
combo box found under the grid combo box in the upper part of the
portal window (Figure 7.7). See bug report

Figure 7.7
After clicking the View button right next to the combo box, the sites
that belong to the selected VO can be seen (Figure 7.8).

Figure 7.8
Selecting a specified VO means the following:
- The user can see the list of those sites which belong to the
selected VO.
- When the user clicks on a site name, the detailed information will
display only those CEs and SEs which belong to the selected VO.
Important remark: see bug report B.1 when interpreting the values of
the columns
Total Free Running Waiting
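The effect of the VO selection on the site list and on the detailed view can be sketched as a filter. The data layout, the function name and the sample site and element names are hypothetical; only the value "All" and the filtering rule come from the description in this chapter.

```python
def filter_sites(sites, vo="All"):
    """Keep only the CEs and SEs of the selected VO, and the sites that have any."""
    # Assumption: `sites` maps a site name to a list of
    # (element name, kind, set of VO names) tuples.
    if vo == "All":
        return {site: list(elements) for site, elements in sites.items()}
    filtered = {}
    for site, elements in sites.items():
        kept = [e for e in elements if vo in e[2]]
        if kept:  # a site without matching elements is not listed
            filtered[site] = kept
    return filtered


sites = {
    "SITE-A": [("ce1", "CE", {"dteam", "voce"}), ("se1", "SE", {"voce"})],
    "SITE-B": [("ce2", "CE", {"hungrid"})],
}
selected = filter_sites(sites, "dteam")
print(sorted(selected))                      # ['SITE-A']
print([e[0] for e in selected["SITE-A"]])    # ['ce1']
print(sorted(filter_sites(sites)))           # ['SITE-A', 'SITE-B']
```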
2.2 View of detailed information about a site of a Grid
If the user would like to get detailed information about a site, he
should click on the appropriate name of the site in the site list (see
Figure 7.6). The page with the detailed information about a site can
be seen in Figure 7.9.

Figure 7.9
As can be seen in this figure, the selected VO is All. This means that
all CEs and SEs found at that site are displayed.
If the user selects a VO in the site list page, only those CEs and SEs
which belong to the selected VO will be displayed in the detailed
view.
As can be seen in Figure 7.10, which shows the site IFCA-LCG-2 with
the VO dteam, only a limited number of CEs and SEs is displayed.

Figure 7.10
VIII Handling of remote files
1 General aspects of remote files
The P-GRADE Portal supports the handling of remote files.
A remote place is a location within a given virtual_organization which
is different from the local file system of the user's desktop and
whose access is controlled by grid certificates.
Since version 2.1 of the P-GRADE Portal, input files can be sent to a
job not only from the local file system of the user's desktop but from
trusted remote places as well.
In a similar way the output files of a job can be sent to remote
storage places as well.
The next figure explains the differences between the handling of local
and remote files:
Figure 8.1 Life cycles of local and remote files
- Please note that the remote input_files referenced in the graph
description of a workflow are not uploaded together with the local
input files when the user, after the editing phase, uploads the
workflow to the P-GRADE Portal Server (in Step 1 of the lifecycle of
the workflow).
- When an input file is needed at the site of the Executing Resource
for the submission of Job i (Step i2), the file will be copied from
the P-GRADE Portal Server if it originated as a local file of the User
Desktop machine, and it will be copied from the remote Location if it
has been defined as a remote file.
- In a similar way, after termination of Job i (Step i3) a generated
output file will be copied to the P-GRADE Portal Server if it has been
defined to be local (i.e. downloadable to the User Desktop), and it
will be copied to the respective remote site otherwise.
- After termination of the whole workflow (Step 4) the compressed
bundle of the local output files can be downloaded to the User Desktop
machine.
On the other hand, any communication between the remote site(s) and
the desktop machine is outside the scope of the P-GRADE Portal.
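The copy rules of Steps i2 and i3 can be condensed into two small functions. This is a sketch of the decision only; the function names, the "local"/"remote" labels and the sample URL are hypothetical and do not reflect the Portal's internal interfaces.

```python
PORTAL_SERVER = "P-GRADE Portal Server"


def _endpoint(kind, remote_location):
    # Local files travel via the Portal server, remote files via their own location.
    if kind == "local":
        return PORTAL_SERVER
    if kind == "remote":
        if remote_location is None:
            raise ValueError("a remote file needs a remote location")
        return remote_location
    raise ValueError(f"unknown file kind: {kind!r}")


def input_source(kind, remote_location=None):
    """Where an input file is copied from before Job i is submitted (Step i2)."""
    return _endpoint(kind, remote_location)


def output_target(kind, remote_location=None):
    """Where an output file is copied to after Job i terminates (Step i3)."""
    return _endpoint(kind, remote_location)


remote = "gsiftp://host.example.org/data/in.dat"  # hypothetical remote location
print(input_source("local"))           # P-GRADE Portal Server
print(input_source("remote", remote))  # gsiftp://host.example.org/data/in.dat
print(output_target("local"))          # P-GRADE Portal Server
```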
2. Different kinds of remote file usage
Remote files can be handled by several protocols, stored by different
means and referenced at several levels in a Grid (and VO) dependent
way.
From the point of view of the user there are two basically different
ways to use remote files:
1. Low level usage supported by the Globus middleware.
2. High level usage generally supported by the EGEE infrastructure. [4]
2.1 Low level usage (Globus)
2.1.1 Protocol
To access a file in a remote place a transfer protocol is needed,
which is explicitly or implicitly part of the URL describing the
location of the file.
Mostly the protocol gsiftp is used, i.e. in this case the user is
identified to the remote host by the actual certificate.
2.1.2 File reference
The file is referenced by a URL formed by concatenating the host name
and the storage path of the file on that host.
2.1.3 File Storage
The remote files are stored as common files of a host. A special file
on that host, the GridMap file, contains entries that associate the
so-called distinguished name part of the user certificates with a user
account known on that system, so the system can control the access
permission of file operations. The GridMap file is maintained by the
local administrator of that host.
2.1.4 Example
The system will use this information in the arguments of the
automatically generated globus-url-copy instructions.
Figure 8.2 Low level access to a remote input file
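As an illustration of the file reference described in 2.1.2, the following sketch builds a gsiftp URL from a host name and a path and assembles the argument list of a globus-url-copy call (source URL followed by destination URL). The helper names, the file path and the destination are hypothetical, and the command line the Portal actually generates may differ; the sketch only builds the arguments and does not run the command.

```python
def gsiftp_url(host, path):
    """Concatenate the host name and the storage path into a gsiftp URL."""
    if not path.startswith("/"):
        path = "/" + path
    return f"gsiftp://{host}{path}"


def copy_arguments(source_url, destination_url):
    """Argument list for a copy from source to destination (not executed here)."""
    return ["globus-url-copy", source_url, destination_url]


# Host taken from the example in chapter VI; the path is hypothetical.
source = gsiftp_url("n99.hpcc.sztaki.hu", "/home/user/input.dat")
print(source)  # gsiftp://n99.hpcc.sztaki.hu/home/user/input.dat
print(copy_arguments(source, "file:///tmp/input.dat"))
```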
2.2 High level usage (only within the EGEE with broker support)
In this chapter only the most important remote file related
features of the LCG like grids (for example EGEE) are covered.
2.2.1 Protocol
The protocol is of low importance, as the JDL job submission system
and the associated internal services of the P-GRADE Portal hide the
protocol from the user.
In that case the job submission is performed by the Broker support.
(See Connection to the EGEE Grids and the usage of the Broker)
2.2.2 File reference
High level remote files can be referenced within the P-GRADE Portal
by symbolic names that point to File Catalogues.
File Catalogues map the symbolic names to Grid files.
Grid files cannot be modified after creation and may exist in several
replicas connected by a common grid wide unique identifier "guuid".
The replicas are stored in Storage Elements.
There are several File Catalogue standards. The actual type of the
File Catalogue is defined by the administrator of the respective
virtual_organization.
A reference to a file catalogue, i.e. a symbolic name, begins with
the prefix "lfn:" (abbreviation of logical file name), but the syntax
following this prefix depends on the type of the File Catalogue.
Two types of File Catalogue have been tested:
- LFC, used for example in the VOs hungrid, seegrid, gilda and voce
- RMC, formerly used in the VO voce.
In both cases the user is strongly advised to define the environment
variable "LCG_GFAL_INFOSYS", as the catalogues are accessible via the
information system.
This environment variable is usually defined by the system
administrator of the UIF machine, i.e. on the same machine where the
P-GRADE Portal server runs.
However, it is possible that the working nodes (CE-s) where the
actual jobs run lack this setting. In that case the operations
related to remote files will fail.
The user should therefore put this setting manually in the JDL part
of the Job Properties window. See Figure 10.8
The value of this setting may differ between VO-s. Please check it
on the UIF machine with the instruction
set | grep LCG_GFAL_INFOSYS
Typical values at the time of writing of this manual are:
lcg-bdii.cern.ch:2170 for the VO voce
bdii.phy.bg.ac.yu:2170 for the VO seegrid
2.2.2.1 LFC file catalogue
The file name here has a fixed hierarchical form:
/grid/<VO>/<Username>/[<LFC_Catalog_Directory_Name>/]...<fileName>
where the <LFC_Catalog_Directory_Name>-s must refer to existing
catalogue directories that have been defined by proper LFC
commands [4].
See Figure 8.3 as an example.
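The hierarchical form can be assembled mechanically. The short Python sketch below composes such a logical file name from its components; the function name and the sample values are hypothetical, and the sketch does not check whether the catalogue directories really exist.

```python
def make_lfn(vo, username, filename, directories=()):
    """Compose lfn:/grid/<VO>/<Username>/[<dir>/]...<fileName>."""
    parts = ["grid", vo, username, *directories, filename]
    for part in parts:
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "lfn:/" + "/".join(parts)


# Hypothetical user, directory and file names, for illustration only.
print(make_lfn("seegrid", "jdoe", "input.dat"))
print(make_lfn("seegrid", "jdoe", "input.dat", ["tests", "run1"]))
```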
The usage of the LFC catalogue requires the special setting of two
environment variables:
- LCG_CATALOG_TYPE must be set to "lfc" and
- LFC_HOST defines the URL of the LFC catalogue
Please note that it is the responsibility of the user to set these
environment variables properly in the JDL description. (See
Figure 10.8)
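The sketch below shows how such settings might look when rendered as a JDL attribute. It assumes the usual JDL form Environment = {"NAME=value", ...}; and uses hypothetical host names. In the portal the values are entered in the JDL Editor, so the helper is only an illustration.

```python
def jdl_environment(variables):
    """Render a dict of environment variables as a JDL Environment attribute."""
    items = ", ".join(f'"{name}={value}"' for name, value in variables.items())
    return f"Environment = {{{items}}};"


# Hypothetical hosts; take the real values from the VO administrator.
settings = {
    "LCG_CATALOG_TYPE": "lfc",
    "LFC_HOST": "lfc.example.org",
    "LCG_GFAL_INFOSYS": "bdii.example.org:2170",
}
print(jdl_environment(settings))
```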
2.2.2.2 RMC file catalogue
In this case the name is not hierarchical, but a plain string.
For example:
MyTestFile_25_Nov_2005
No user setting of environment variables is required.
2.2.3 File Storage
In the EGEE the remote (grid) files are stored in so called Storage
Elements. Local administrators of the sites belonging to a common
virtual organization may have different policies about the usage of
the local Storage Element.
Within the P-GRADE Portal the user can instruct the system to store
the generated output file on a certain Storage Element.
This is an option of the JDL description that can be modified in the
Workflow_Editor. See the input field Output SE in Figure 10.6
The user can explore the available Storage Elements in two ways:
2.2.4 Example
Figure 8.3 High level file definition used by the LFC catalogue
IX User quotas
For the safety of the overall operation of the Portal_server,
Release 2 of the P-GRADE Portal introduces the notion of user quotas
and manages their administration.
A user quota is a predefined amount of the storage resources
available to a user on the host machine acting as the server of the
P-GRADE Portal (See "Portal server" on Figure 2).
The amount of the user quota (defined in MB) is set centrally by the
system administrator of the P-GRADE Portal.
The administrator can set a different amount of storage for each user
and can reset it at any time.
See the pane Quota per portal user on the tab Settings, which defines
a common default value, and the pane User Quota, which lists the
users with their quota limits and where the administrator can define
individual values.
Please remember that this pane is visible and editable only by the
administrator (user root).
Figure 9.1
Note:
In the unlikely case that the user quota becomes exhausted because
the administrator has decreased it, the user will get the same
warning messages as if he/she had exceeded the limit.
No user data will be lost, but the user will be forced to take
measures to free enough space.
The quota is the highest amount of the valuable common storage
resource that can be allocated by a user directly or indirectly:
- Direct usage, where the user has direct control over the amount,
may involve the files (input files, code of executables and graph
description) uploaded when workflows are saved, and the output files
generated by the runs of the workflows.
- Indirect usage may involve the trace files generated during job
level monitoring.
The quota management does not guarantee the availability of the
defined amount.
Its only purpose is to prevent excessive usage and/or malevolent
exhaustion of the common storage resources.
In short, it defends first of all the system against the user, but
not the user against the system.
If the quota is exhausted the user receives a proper warning
message.
Suggested user actions:
- The simplest thing the user can do is to delete the obsolete
workflows from the server (See button Delete on Figure 38), or to
use the Workflow Archive Service to save the workflows on the
desktop machine or to clear parts of them.
- If the workflow is important the user may experiment by
substituting the referenced input_files by shorter ones and/or by
performing runs that produce shorter or no output files.
- A tricky method is to start and immediately abort a workflow: in
this case the residues of the previous runs are deleted by
the system.
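The quota rule described in this chapter can be summarized by a small Python sketch. It is not the portal's implementation: the function names are hypothetical, and the assumption is only that usage is measured as the total size of the files under a user's storage directory and compared with a limit given in MB.

```python
import os

MB = 1024 * 1024


def directory_usage_bytes(root):
    """Sum the sizes of all regular files below root (symbolic links skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def quota_status(used_bytes, quota_mb):
    """Compare the used storage with the quota defined in MB."""
    limit = quota_mb * MB
    return {"used": used_bytes, "limit": limit, "exhausted": used_bytes >= limit}


# A user within the limit, and the same user after the administrator
# has lowered the quota: no data is lost, only the status changes.
print(quota_status(8 * MB, quota_mb=10))
print(quota_status(8 * MB, quota_mb=5))
```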
X Connection to the EGEE Grids and the usage of the Broker
1. General rules to submit individual jobs of a workflow by the
Broker of the EGEE
Since Release 2.2 of the P-GRADE portal the user can submit one or
more jobs of a workflow with broker support into an EGEE like
Grid [4].
However, this freedom comes with the installation restriction that
the Portal Server (see Figure 2) must be set up on a so called
"UIF machine" belonging to the EGEE like Grid to be reached.
The main differences in usage between a traditional low level Globus
Grid and an EGEE like Grid from the point of view of the user are the
following:
- The user lets the Broker service of the EGEE like Grid choose the
optimal resource, in this case the Computing Element to which the
given job should be submitted.
- The input of the Broker is the parameter set corresponding to the
rules of the Job Description Language (JDL) [3].
The JDL description defines the job and may store user hints for
selecting an optimal resource. (See Figure 10.9 showing a
restriction on the host of the required resource)
The JDL description can be edited by the user. However, in the
current implementation of the P-GRADE Portal large parts of the JDL
description are inherited from the corresponding parts of the job
description, and these entries can be altered only by changing the
corresponding parts with the Workflow_Editor.
See Figures 10.3, 10.5 and 10.6, where the corresponding windows are
shown together: on the left hand side you see the primarily editable
windows of Port properties, on the right hand side you find their
mappings in the JDL description.
In a similar way the editable values of Job Properties (Figure_10.1)
are mapped into the JDL tabs Job (Figure 10.2) and Sandbox
(Figure_10.3)
- The user can use the services of the classical Storage Elements to
use remote files.
The remote files must be referenced by logical names (See Chapter
VIII 2.2.2_File_reference)
The remote files in this case are Grid files, and the user is not
expected to provide special executable code that is able to read
Grid files directly. Therefore in the default case, when the flag
managed_copy is set, the connector infrastructure of the P-GRADE
Portal copies the referenced remote input files as temporary local
files to the working node where the job actually runs.
With this method the executables are much more portable and it is
easy to create and test them in a local environment.
However, by unsetting the flag managed copy the user can indicate
that the supplied code will read the Grid files directly, so the
resource consuming copy step can be skipped in such cases.
The port properties part of Figures 10.5 and 10.6 demonstrates the
usage of Grid files. Please note that the logical file names are
marked by the prefix lfn:
The system recognizes a virtual organization with broker support if
two conditions hold for the Name defined in the window "GRID
configurations" (See Figure 6.1) and selected as Grid in the window
"Job properties" (Figure 10.1):
- The prefix of the name must be the name of a virtual_organization.
This value will be copied to the JDL description automatically.
See Figure 10.2
- The postfix of the name should be "_LCG2_BROKER"
In this case the button JDL Editor... of the Job Properties window
becomes sensitive (See Figure 10.1) and the Resource information has
no significance.
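The two naming conditions can be expressed as a short check. The Python sketch below is only an illustration of the rule; the function name and the set of known virtual organizations are hypothetical and do not reflect the portal's internal code.

```python
BROKER_POSTFIX = "_LCG2_BROKER"


def broker_vo(grid_name, known_vos):
    """Return the VO name if grid_name denotes a broker supported grid, else None."""
    if not grid_name.endswith(BROKER_POSTFIX):
        return None
    prefix = grid_name[: -len(BROKER_POSTFIX)]
    return prefix if prefix in known_vos else None


# Hypothetical list of virtual organizations known to the portal.
vos = {"seegrid", "voce", "hungrid"}
print(broker_vo("seegrid_LCG2_BROKER", vos))
print(broker_vo("seegrid", vos))
```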
For more details on the usage of the JDL language please consult [3]
2. JDL Editor details
2.1 Opening the JDL Editor
Figure 10.1
2.2 Setting retry count
Figure 10.2
In this window only Retry count (the highest number of repetitions
in case of errors) can be defined.
In this and in all subsequent tabs of the JDL Editor the button View
opens a separate window to show the whole JDL file to be generated.
2.3 Checking the Sandbox
Figure 10.3
Local files of the ports and the executable of the job are copied
into the proper Sandboxes.
Please observe the mapping of Internal File Name from the left hand
side of Figure 10.3 and of Executable from the Job Properties window
(Figure_10.1) to the right hand side of Figure 10.3
Several system files (an envelope shell, info.tar.gz, x509up... )
are needed to copy the remote input files, if any, to the executing
machine and to start the executable of the job.
Please remember that brokering and the mentioning of the remote
input files in the tab Input Data of JDL (See Figure 10.5) do not in
themselves ensure access to the remote input files from the
executable program on the working node of the CE. Therefore the
automatic copy mechanism of the P-GRADE Portal infrastructure is
used (See Remote_input_file_handling)
2.4 Setting Ranks&Requirements
Figure 10.4
The fields Rank and Requirements can be filled in according to the
rules of the JDL. From the point of view of the portal server they
are free text: the syntax is checked by the broker, and any errors
are returned at run time in the standard Error Output channel.
2.5 Checking Input Data
Figure 10.5
2.6 Setting optional Storage Element in Output Data
Figure 10.6
If the job has a proper remote output reference then the system will
deliver it automatically to the proper destination.
The user can define a destination Storage_Elements in the text field
Output SE. In the absence of this definition a default "near" one
will be used.
2.7 Setting the Environment Variables eventually needed on the
Working Nodes of the Computing Element
Figure 10.7
The next window shows a typical setting to reach the LFC catalogue
on the worker node:
Figure 10.8
2.8 Example of "misuse": Direct a job to a dedicated site
Figure 10.9
2.9 Important notice on MPI submission
Because of a well known problem of the LCG information system, MPI
submission for the time being needs the following user entered
extension of the requirements in the tab Rank&Requirement of the
JDL:
(other.GlueCEInfoLRMSType == "PBS") || (other.GlueCEInfoLRMSType == "LSF")
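When the user already has a requirement of his or her own, the extension has to be combined with it. The Python sketch below assumes that the two expressions are joined with the JDL logical AND operator (&&); the manual only gives the extension itself, and the sample user requirement is hypothetical.

```python
MPI_LRMS_EXTENSION = (
    '(other.GlueCEInfoLRMSType == "PBS") || (other.GlueCEInfoLRMSType == "LSF")'
)


def with_mpi_extension(user_requirements=""):
    """Return a Requirements expression that includes the MPI extension."""
    user_requirements = user_requirements.strip()
    if not user_requirements:
        return MPI_LRMS_EXTENSION
    # Assumption: own requirement and extension must both hold.
    return f"({user_requirements}) && ({MPI_LRMS_EXTENSION})"


print(with_mpi_extension())
# Hypothetical user requirement, for illustration only.
print(with_mpi_extension("other.GlueCEInfoTotalCPUs > 4"))
```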
XI Rescuing the workflow
The execution of a workflow may fail for many reasons. In general,
however, this means that some part of the workflow has already
completed and only the remaining part has to be executed to complete
the workflow. In such cases it saves time and CPU time if the user
can examine what might have gone wrong, make modifications, such as
reallocating the failed job to a proper resource, and then resubmit
the non-finished jobs of the workflow. This mechanism is supported
in P-GRADE Portal from Release 2.2 and is called rescuing.
Currently, before rescuing a workflow the user can modify the
resources of a job in the Workflow Editor or can adjust the
certificate belonging to the resource in the Certificates tab of the
portal.
The general assumption is that the code of the workflow is tested,
and that the genuine input files, and especially the remote input
files, do not change between the time the error is detected and the
time the failed jobs are restarted. In short, rescuing may help to
overcome difficulties that have arisen due to broken resources and
invalid certificates.
Please read the following step-by-step guide to get familiar with
the Rescue function as a portal user.
- Workflow status: rescue
Figure 11.1
The submitted job "Count3" of the workflow "demo-RESCUE" has failed
for some reason, and the workflow status has changed to rescue,
which means that the user may modify the workflow and may then
attempt to let it run further by pressing the button Rescue.
Please note that the execution of the workflow stops only when there
are no more independent jobs to be executed.
- Read the log for possible reasons
Figure 11.2
The user reads the error log belonging to the failed job and
identifies the authentication problem at the given resource. He
decides to launch the Workflow Editor, in which he can reallocate
his job to a working resource, as shown in the following step.
- Modify the workflow: reallocating the failed job
Figure 11.3
The user reaches the workflow (by button Attach, Figure 11.1), which
is now in Rescue mode (stopped job painted blue). He opens the job
properties window for the problematic Count3 job:
Figure 11.4
Then the user changes the resource in the job properties window to a
properly working one.
Figure 11.5
Finally, in the Workflow menu the user saves his modification with
the menu item Save resources, which stores the modification on the
server side.
- Rescuing the workflow
Figure 11.6
In the window Workflow Manager the "continue button" Rescue appears
in this state. When the user clicks the button Rescue, the
previously failed job "Count3" starts running on the new resource.
The already finished jobs Count1 and Count2 will not be resubmitted!
Figure 11.7
- Workflow finished
Figure 11.8
By modifying the resource the user could rescue his workflow, which
then completed successfully by executing only the non-finished jobs
and preserving the results of the finished jobs from the first
attempt.
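The essence of the walkthrough, namely that only the non-finished jobs are resubmitted, possibly on a changed resource, can be sketched in a few lines of Python. The status strings and resource names are hypothetical and the sketch does not describe the portal's internal data model.

```python
def plan_rescue(job_status, resources):
    """Return the (job, resource) pairs to resubmit; finished jobs are skipped."""
    return [
        (job, resources[job])
        for job, status in job_status.items()
        if status != "finished"
    ]


# State of the workflow "demo-RESCUE" after the failure of Count3
# (status strings and site names are illustrative assumptions).
job_status = {"Count1": "finished", "Count2": "finished", "Count3": "failed"}
resources = {"Count1": "siteA", "Count2": "siteA", "Count3": "siteA"}

# The user reallocates the failed job to a working resource ...
resources["Count3"] = "siteB"

# ... and only that job is submitted again.
print(plan_rescue(job_status, resources))
```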
XII. Welcome Menu
Since Release 2.2 of the P-GRADE Portal a new Welcome portlet greets
the user after login.
In this menu the user can customize the portal and can alter his or
her own role, personal data and, first of all, the original password
received from the system administrator.
Figure 12.1 Welcome menu
XIII Workflow archive service
An existing workflow can be saved from the Workflow Manager list of
the Portal Server, stored in the local file system of the user's
Desktop Machine, and subsequently uploaded from there in the reverse
order. See Figure 2 (arrows Workflow/Storage/Download
Workflow/Upload) for an overview and Figure 13.1 for the actual
usage:
Figure 13.1
1 Saving the definition of a workflow and clearing the temporary
parts:
Clicking on the operation Storage (Figure 13.1) opens the storage
list showing the workflows that can be saved:
Figure 13.2
Three parts of a workflow can be handled independently:
- Under column Workflow the definition part of a workflow is
accessible.
Download selects a workflow and opens the Download Manager of the
browser, by which the user can define a destination in the local
file system in order to download the definition of the selected
workflow in the form of a compressed file.
The saved workflow can be retrieved later from the local file system
(See paragraph 2. Uploading the definition of a workflow to modify /
resubmit or uploading the content of a trace file for
visualization:)
Please note that the workflow is saved in its current state, i.e.
with its temporary files, if any.
If you do not need them, please apply set init:
set init is an auxiliary operation to discard the temporary files
that have been generated during previous workflow submissions.
Both cases, with and without set init, have their own merits:
Saving the workflow in the state as it was facilitates the
subsequent investigation of a failed run by an expert (for example
to discriminate user, portal and Grid related errors in complicated
cases).
Saving the workflow after bringing it to the init state minimizes
the information needed to save the definition of the workflow. This
option is recommended if the user wants to migrate the workflow to a
different user or to a different portal, or wants to save it
intending to resubmit or edit it in the future.
- The operations under column Trace are optional and depend on the
existence of the trace file.
As trace files may be of substantial size they can be Downloaded or
Deleted separately.
- Under the column Output there is no Download option as this
functionality is available under the Workflow/Workflow Manager tab.
Here the output of a workflow can only be Deleted from the Portal
server machine.
Please note that in the fourth column ALL the button Delete is
visible only if the workflow is inactive, i.e. the workflow is not
in the Workflow / Workflow Manager list
2. Uploading the definition of a workflow to modify / resubmit or
uploading the content of a trace file for visualization:
Clicking on the operation Upload (Figure 13.1) opens the set of file
browsers to define the paths of the saved files in the user's
desktop environment to be uploaded to the Portal Server:
Figure 13.3
The input field of Workflow archive must refer to one of the
compressed files that have been previously stored by the Storage /
Workflow Download operation. (See paragraph
1_Saving_the_definition_of_a_workflow)
Demo Workflows are prefabricated example/test applications to be
uploaded. See more details in the next section
Important notice:
The result of a successful Upload from a Workflow archive operation
will not be visible immediately in the Workflow / Workflow Manager
list.
However, it appears both in the Storage list and in the Open list of
the Workflow_Editor.
Therefore, following the successful Upload the user should
- enter the Workflow Editor in tab Workflow / Workflow Manager
(Figure 13),
- Open the workflow list in the Workflow Editor and select the
requested Workflow
- Save it on the server (Figure 32)
- (and hit the Refresh button on the Workflow / Workflow Manager
tab)
(See the arrows EDITOR/Open, EDITOR/Save|Upload of Figure 2 for an
overview)
3. Uploading of the demo applications
The Demo Workflows section of Figure 13.3 shows the available
prefabricated demo applications.
These generally test the P-GRADE Portal and the current environment
(certificates, settings and the Grid).
The names and numbers of the displayed test applications may be
different from those shown in Figure 13.3, and they may be reset by
the portal Administrator.
The user can either select one application (by the radio button,
confirmed by the OK button) or all the available Demo Workflow
applications (by the Upload all button).
The selected applications will appear in the Workflow Manager list
only after the user has manually modified them with the Workflow
Editor.
However, it is not guaranteed that the application will be
associated with the proper resources and can be submitted
immediately.
The inexperienced Portal user is advised to follow the next steps:
- Select an application by the radio button and confirm the upload
with OK.
- Check the success of the Upload by reading the Message line
- Check the existence of a valid proxy certificate in the Certificate
tab
- Check the existence of the required Grids/resources in the Setting
tab
- Check the association of Grids to the selected valid proxy
certificate in the Certificate tab
- Select the tab Workflow / Workflow Manager
- Select the button Workflow Editor
- Use the menu item open in the WE window to access and download the
demo application.
- In the appearing WE graph open each job of the application:
select one of the resources that has been defined/checked in (4),
and confirm the changes by OK
- Save the workflow by SaveAs..
- Submit the workflow with the proper button of the tab Workflow /
Workflow Manager
3.1 The Equation Solver application
This application solves the n (in our example 5) dimensional
equation system
A * x = B
See details here [5]
Figure 13.3 contains four versions of the common workflow, prepared
for two different virtual organisations and distinguishing in each
the direct (static) and dynamic (Broker associated) resource
reservations.
The expected results of x (approximations of the vector
[1,2,3,4,5]) can be read most simply by hitting the Out button of
the column Logs belonging to the line of Job Multip_B in the
detailed view of the submitted workflow within the Workflow Manager
portlet.
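To make the expected result concrete, the following Python sketch solves a 5 dimensional system A * x = B whose solution is [1,2,3,4,5]. The matrix A is a hypothetical example and plain Gaussian elimination is used; neither is taken from the demo workflow, whose actual data and method are described in [5].

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# Hypothetical, diagonally dominant 5x5 matrix (not the demo's data).
A = [[10.0 if i == j else 1.0 for j in range(5)] for i in range(5)]
expected = [1.0, 2.0, 3.0, 4.0, 5.0]
# B is constructed so that the exact solution is the expected vector.
B = [sum(A[i][j] * expected[j] for j in range(5)) for i in range(5)]

x = solve(A, B)
print([round(value, 6) for value in x])
```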
XIV References
[1] Mercury monitor:
[2] P-GRADE:
[5] Equation Solver application
http://www.lpds.sztaki.hu/pgportal/v23/includes/Equation_Solver.html