P-GRADE Portal

Version 2.4.1, Dec 15, 2006

An introduction without tears
 
 

Content

0 Preface
i. Release notes
ii. Introduction

I The aim

II The Players of the PORTAL infrastructure and their identifications
   1 The Players
      1.1 The user's desktop machine
      1.2 The Portal server
      1.3 The set of remote resources (the GRID)
      1.4 The Certificate Server (MyProxy)
   2 The identifications
      2.1 User against the Portal Server
      2.2 User against own userkey file
      2.3 User against the Certificate Server
      2.4 User against the Virtual Organization

III Overview of the operation of the PORTAL
   0 Preparation
   1 Uploading a personal certificate
   2 Receiving a short term (proxy) certificate
   3 Settings: Defining the resources
   4 Defining a workflow
      4.1 Short introduction to the Workflow Editor
         4.1.1 Workflow creation
            4.1.1.1 Interactive building process
            4.1.1.2 Import process
         4.1.2 Workflow saving
         4.1.3 Workflow modification
      4.2 Workflow deletion
   5 Starting a workflow
   6 Observing the progress of a workflow
      6.1 Progress info from the Workflow Manager
         6.1.1 Detailed view
      6.2 Progress info from the Workflow Editor
      6.3 Progress info by Monitoring and Visualization
   7 Fetching the result
   8 Run time user actions

IV The detailed operation of the PORTAL by an example
   1 Login
   2 Certificates: Setting access rights to resources
   3 Settings: Defining the resources
      3.1 Direct use of resources in the EGEE
   4 Workflow Editor: Building your workflow
   5 Submitting the workflow
   6 Observing the progress of the workflow
   7 Fetching the result

V Monitoring and Visualization
   1 Introduction
      1.1 Availability of monitoring
   2 Life cycle of monitoring data
      2.1 The source of data
      2.2 The transport
      2.3 The elaboration
      2.4 The frame of destination: The visualization interface
   3 The Prove program
      3.1 User activities
         3.1.1 Truncate trace files
         3.1.2 Visualization activities
            3.1.2.1 Filtering
            3.1.2.2 Change state statistics
            3.1.2.3 Sorting the program parts
            3.1.2.4 Zooming in the time scale

VI Multi-GRID support
   1 Setting up VOs of Grids and default resources (by portal administrator)
   2 Setting up the resource list for a VO (any regular user)
   3 Allocating the workflow (any regular user)
   4 Supplying certificate for each VO before execution (any regular user)

VII Information System
   1 MDS-2 information system
      1.1 View of available resources in the Grid
      1.2 View of detailed information about a resource
   2 LCG-2 information system
      2.1 View of available sites in the Grid
         2.1.1 Selecting a Virtual Organization
      2.2 View of detailed information about a site of a Grid

VIII Handling of remote files
   1 General aspects of remote files
   2 Different kinds of remote files
      2.1 Low level usage (Globus)
         2.1.1 Protocol
         2.1.2 File reference
         2.1.3 File Storage
         2.1.4 Example
      2.2 High level usage (only within the EGEE)
         2.2.1 Protocol
         2.2.2 File reference
            2.2.2.1 LFC file catalogue
            2.2.2.2 RMC file catalogue
         2.2.3 File Storage (Storage Element)
         2.2.4 Example

IX User Quotas

X Connection to the EGEE Grids and the usage of the Broker
   1 General rules to submit individual jobs by the broker of EGEE
   2 JDL Editor details
      2.1 Opening the JDL Editor
      2.2 Setting retry count
      2.3 Checking the Sandbox
      2.4 Setting Ranks & Requirements
      2.5 Checking Input Data
      2.6 Setting optional Storage Element in Output Data
      2.7 Setting the Environment Variables
      2.8 Example of "misuse": Direct a job to a dedicated resource
      2.9 Important notice to MPI submission

XI Rescuing the workflow

XII Welcome Menu

XIII Workflow archive service
   1 Saving the definition of a workflow and clearing the temporary parts
   2 Uploading the definition of a workflow to modify / resubmit, or uploading the content of a trace file for visualization
   3 Uploading of the demo applications
      3.1 The Equation Solver application

XIV References

0 Preface

Release notes to Version 2.4.1

  1. Possibility to store the data of the end users in reliable databases: the default Hibernate persistence (HSQL) supported by GridSphere has turned out to be error prone, and in some cases the logging data of the users were lost after the Portal was restarted. From this release on, the Portal administrator can therefore define and set up an external database for storing the log information.
  2. The data transfer load of the information system has been substantially reduced: the BDII server is now queried for data on user request.
  3. A new job submission strategy has been introduced that observes the current load of the portal server and thereby ensures a tolerable response time for the user.
  4. The portal server's own handling of data resources has been reconsidered. In connection with this, the redundant storage of workflow results has been eliminated, and a more accurate quota handling has been implemented.
  5. A bug has been fixed that occurred during concurrent uploading and downloading of proxy certificates. This failure typically occurred during supervised practice sessions, when many users executed the same command within a short time.
  6. Automatic VOMS extension of certificates has been introduced.
  7. Jobs can also be submitted to VOs via the gLite infrastructure.

 



Release notes to Version 2.4:

New features and improvement of services:

Revision of remote file handling: user option for non-automatic copy to the worker node (see managed copy).

Revision of rescue handling: the new functionality covers all types of resources, including submissions to a Broker.

Enhanced verbosity, localization and accuracy in forwarding any errors occurring in the grid infrastructure.

Protection of the Portal server by introducing a changeable limit on the number of jobs being submitted and observed at one time.

Revision of MPI job handling: a totally new middleware ensures (and, in defined circumstances, guarantees) the success of MPI job submissions.

Bug fix:


Total revision of low level script layer

Solving the memory leak problem of the visualization

Known bugs:

B.1
The LDAP server sometimes delivers hosts to the information system that reference a common cluster under different hostnames within a given site. As the information system has no additional knowledge to unify these clusters, the aggregated data gained from the component CEs sometimes show a multiple of the real values.
B.2
The sites of the selected VO in the overview window of the Information System also display jobs that do not belong to the selected VO.
B.3
If several Workflow Editor windows are open on the user's desktop, the "old" windows tend to become zombies (insensitive to user commands and losing their connection to the server).

Release notes to Version 2.3:

New features:

Extended, user-individual quota handling

Full archive facility for generated workflows (see chapter XIII Workflow archive service)


Release notes to Version 2.2:

New features:

Separation of external and internal file name references in the input/output ports of the jobs (see No more restriction on file references)

Connecting the Portal to the EGEE Grid and, in this case, exploiting the Broker service of the EGEE Grid for the jobs of a workflow directed to this grid.
(See chapter X Connection to the EGEE Grids)


Fault tolerant behaviour of workflows (see chapter XI Rescuing the workflow).

Welcome menu to change the default settings of personal user data (see chapter XII Welcome Menu)

Release notes to Version 2.1:

New features:

This documentation includes the new features of Version 2.1, highlighted in chapters VI Multi-GRID support, VII Information System, VIII Handling of remote files, and IX User Quotas.

Deleted features:

The Copy and Paste operations of the Workflow Editor, considered unimportant and error prone, have been deleted.

Bug fixes:

Edited workflows in transient (incomplete) state can be stored in the PORTAL and retrieved for further editing.




  ii. Introduction

The P-GRADE Portal's mission is to give user friendly access to Grid resources, a technology in rapid evolution.
This evolution is "mapped" in the Portal, which offers general low level solutions for simple Globus Grids and high level solutions for modern sophisticated Grids such as EGEE.
Throughout this paper you will find descriptions of general low level solutions as well as special considerations referring only to the EGEE Grid.
As the P-GRADE Portal is a multi-Grid portal able to connect Grids of different kinds, a substantial effort has been made to keep the functionalities of the Portal as orthogonal as possible.
However, at some points the different aspects, conditions and possibilities of the EGEE grid had to be mentioned within the general text.
 

I  The aim

 
The  P-GRADE Portal  offers a comfortable method of handling workflows from any connection point of the World Wide Web. 

The P-GRADE Portal covers several cluster and GRID related technologies (GLOBUS2, GLOBUS3, Condor, Condor-G, Condor DAGMan, PVM, MPI) to meet the needs of users who intend to access remote computational resources, and it hides the difficulties of activating them.

 
If you are impatient with details, or if you are a hardened GLOBUS professional with bad nerves, you can get a head start with
chapter IV, where the usage of the Portal is explained by a comprehensive example.
 
 
 
A Workflow is a bundle of jobs you want to edit, launch and observe on remote computer resources where access rights have been granted to you by so-called certificates.
Technically, a Workflow is a directed acyclic graph (DAG) in which each node has a computing resource and a program (job) to be launched on that resource; the edges of the graph are the "information pipelines" (streams) that connect the input and output points (ports) of the individual jobs (see Figure 1).

Jobs are executable (sequential or parallel)  applications represented by their binary code.
 


A node is a wrapper of a job containing the references to its executable code, to its I/O connections and to its resource.
(See Figure 16 for an outer and Figure 18 for an inner look of a node.)

The input connection points of the nodes (we will use the term port interchangeably with the term "point" when referring to input and output connection points) that are not connected to any output point of any other node represent the input files of the whole Workflow. The output points of the nodes that do not serve as inputs to any other node represent the output files of the Workflow.
(Note that any internal pipeline (stream) can be marked as either volatile or permanent; in the latter case the data flowing through it will be regarded and recorded as an output file of the Workflow, see Figure 24.)
 
The task of the Workflow is to generate OUTPUT files from the INPUT ones.
 
There are several subtle points to emphasize:
 
In Figure 1 you see the input and output connection points of a node as little green and gray squares. Green indicates input ports, gray indicates output ports.
A port maps the external references (input file, output file, pipeline) to the internal I/O representation of the job.
(Port usage, for both input and output ports, will be detailed in Chapter IV The detailed operation of the PORTAL by an example.)
At present there is a limitation: no more than 16 ports can be associated with a node.
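The workflow model described above (jobs as nodes, pipelines as edges between ports, unconnected ports as the workflow's input and output files, at most 16 ports per node) can be illustrated with a small Python sketch. It is a hypothetical model for explanation only, not the Portal's internal data structure: the names `Job` and `analyse` are invented here, and permanent internal pipelines (which also yield output files) are not modelled. The launch order comes from a standard topological sort (Kahn's algorithm), which is possible precisely because the graph is acyclic.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

MAX_PORTS_PER_NODE = 16  # limit stated in the text


@dataclass
class Job:
    """A node of the workflow: a job, its resource and its ports (hypothetical model)."""
    name: str
    resource: str
    in_ports: list = field(default_factory=list)
    out_ports: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.in_ports) + len(self.out_ports) > MAX_PORTS_PER_NODE:
            raise ValueError(f"{self.name}: more than {MAX_PORTS_PER_NODE} ports")


def analyse(jobs, pipelines):
    """pipelines: list of ((src_job, out_port), (dst_job, in_port)) edges.

    Returns (launch order, workflow input ports, workflow output ports).
    Ports not touched by any pipeline are the workflow-level inputs/outputs.
    """
    names = [j.name for j in jobs]
    fed = {dst for _, dst in pipelines}
    feeding = {src for src, _ in pipelines}
    inputs = [(j.name, p) for j in jobs for p in j.in_ports if (j.name, p) not in fed]
    outputs = [(j.name, p) for j in jobs for p in j.out_ports if (j.name, p) not in feeding]

    # Kahn's topological sort: a job can start once every job feeding it is done.
    succ = defaultdict(set)
    indeg = {n: 0 for n in names}
    for (src, _), (dst, _) in pipelines:
        if dst not in succ[src]:
            succ[src].add(dst)
            indeg[dst] += 1
    ready = deque(n for n in names if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(succ[n]):
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(names):
        raise ValueError("not a DAG: the pipelines form a cycle")
    return order, inputs, outputs


# Example: A feeds B and C, and B also feeds C.
jobs = [
    Job("A", "cluster1", in_ports=["0"], out_ports=["1"]),
    Job("B", "cluster2", in_ports=["0"], out_ports=["1"]),
    Job("C", "cluster1", in_ports=["0", "1"], out_ports=["2"]),
]
pipes = [(("A", "1"), ("B", "0")), (("A", "1"), ("C", "0")), (("B", "1"), ("C", "1"))]
order, ins, outs = analyse(jobs, pipes)
print(order, ins, outs)  # ['A', 'B', 'C'] [('A', '0')] [('C', '2')]
```

Here port A/0 is the only workflow input file and port C/2 the only workflow output file, because every other port is attached to a pipeline.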

 

Figure 1
 

II The Players of the PORTAL infrastructure and their identifications

 

1. The Players

 
Now let us summarize the main actors participating in the handling of the workflows (see Figure 2).

1.1 The user's desktop machine.

You need an Internet-connected desktop machine with a browser which is able to access the WWW.
Please note that the user works with two different user interfaces in parallel when using the P-GRADE Portal:
 

1.2  The Portal server

There is a remote Portal server which you can access by a browser.   
This server stores your code, program data (first of all local input files), the graphs of the workflows, the list of defined resources and the living short term (proxy) certificates. From here you can download your workflows to edit them, from here you can launch your workflows, and the results can be downloaded from here as well.
The data stored in the Portal server on behalf of a single user is restricted by the user quotas.
 

1.3 The set of remote resources (the GRID)

The most important part of the infrastructure is the set of remote computational resources (generally computer clusters) where the jobs actually run.
The resources are subordinated to Grids. See more details in paragraph Settings: Defining the resources.

Complex Grids may subdivide the set of users and the resources accessible to them into virtual organizations (VOs). However, this mapping may be overlapping:
a user and a resource may belong to more than one virtual organization of the Grid.
In these grids the access right represented by a user certificate may be associated with one (or more) virtual organization(s) and not with the whole Grid.

The EGEE Grid requires that the user be registered in a VO.
There is a general rule that a user must belong to  just one  VO.

The registration procedure and policy is VO dependent and not covered in this paper.
      
Resources are abstractions associated with sites, which perform the task of a given resource.

In the EGEE Grid a site may serve resources belonging to different virtual organizations. These are not only computational resources (see Computing Element) but storage resources (see Storage Element) as well. The resources of a site may be shared by different virtual organizations. However, user access to a resource is granted only with a valid VO membership reference.
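The access rule in the last sentence can be sketched as a set intersection: a resource, even one shared by several VOs, may be used only through a VO that both the user and the resource belong to. This is a hypothetical Python illustration; the class `Resource`, the function `access_vo` and the VO name `other-vo` are invented for the example and are not part of the Portal or of EGEE middleware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    site: str
    kind: str        # "CE" (Computing Element) or "SE" (Storage Element)
    vos: frozenset   # VOs the site offers this resource to


def access_vo(user_vos, resource):
    """Return a VO through which the user may use the resource, else None.

    The only rule modelled: access requires a valid VO membership that the
    resource also serves. Picking the alphabetically first common VO is an
    arbitrary choice made for this sketch.
    """
    common = sorted(set(user_vos) & resource.vos)
    return common[0] if common else None


ce = Resource("site-a", "CE", frozenset({"gilda", "other-vo"}))
print(access_vo({"gilda"}, ce))     # gilda
print(access_vo({"third-vo"}, ce))  # None
```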

Basically, the default resources are set by the system administrator in a static way. These data may be inherited by common users, and can be extended or changed at will.
Therefore these settings may not correspond to the actual state of the Grid. The portlet Information System is used to gain actual data about the Grid.
For the time being there is only a restricted facility in the P-GRADE Portal allowing the automatic setting of resources found by the Information System (see the button Load resources from MDS2 in Figure 12b).

1.4 The Certificate Server (MyProxy)

Finally, we mention an administrative player, the Certificate server, which is a repository of "certificates".
A certificate is a virtual identity card granting access to a set of resources. Certificates must be signed by a trusted Certificate Authority (CA).
 
To understand the importance of this last player, here is a short note:
 
These players (the user, the Portal server, the resources) are connected through an unreliable channel, the Internet; therefore they have to build secure connections to identify themselves and to be sufficiently protected from unjustified access. These rather complicated tasks are executed with the help of certificates, which act as identity cards granting access to an expensive resource only for a limited amount of time.

Your previously obtained personal certificate, which contains your personal data (distinguished name, your public key, the expiration date of the public key, the name of the CA) as unencoded open data, must have been issued and "signed" by a trusted Certificate Authority to identify you.

The distinguished name contains the family and given name, organisation unit, organisation of the user introduced by standard prefixes ( CN=,OU=,O=).
The public key (PK) is the binary code with which a message previously encoded by the secret key (SK) can be decoded:
                               message = decode ( PK, encode( SK , message ) )
Each agent (the users and the Certificate Authority) publishes its own public key and hides its own secret key.
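The relation message = decode(PK, encode(SK, message)) can be checked numerically with textbook RSA, shown here with tiny, insecure numbers. RSA is chosen only as a familiar example of a public-key scheme with this property; the text above does not say which algorithm the certificates use, and real keys are thousands of bits long and use padding. `pow(e, -1, phi)` (modular inverse) requires Python 3.8 or later.

```python
# Textbook RSA with tiny, insecure numbers: for illustration only.
p, q = 61, 53
n = p * q                  # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # secret exponent: 2753, the inverse of e modulo phi

PK = (e, n)  # published
SK = (d, n)  # kept secret


def encode(key, message):
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decode(key, code):
    # In RSA, decoding is the same modular exponentiation, done with the other key.
    exponent, modulus = key
    return pow(code, exponent, modulus)


message = 65  # any integer with 0 <= message < n
assert decode(PK, encode(SK, message)) == message
```

Because anyone can apply PK but only the owner holds SK, a value that decodes correctly with a published PK shows that it was produced by the holder of the matching SK; this is the idea behind the CA's signature described next.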

Technically, the signature is additional text appended to your certificate file, containing the open data processed in three steps:
  1. A control sum is generated from the open data by a well known hash function.
  2. The result is encoded by your public key.
  3. The result is encoded by the secret key of the Certificate Authority.

The MyProxy Certificate Server stores the public key of the Certificate Authority (in the form of a special certificate) and is therefore able to decipher your public key, vouch for you and represent you towards a third party, which in our case is a remote resource.
The representation happens by issuing a short term, so-called proxy certificate signed by the "MyProxy" Certificate server.
This representation is needed because the resources do not accept directly the personal certificates.
 
This delegation method has four advantages over the direct use of personal certificates:
 
 

2. The identifications

 
To handle the agents of the P-GRADE Portal environment, there are four different kinds of identification of interest from the users' viewpoint:
 

2.1 User against the Portal Server

This first kind of identification is required when you want to access the Portal server via the Internet, i.e. it is your account on the Portal server (see Figure 3).

See Chapter III 0 Preparation for more details on how to obtain a user account.

2.2 User against own "userkey" file

This second kind of identification is associated with the secret key file (userkey.pem) belonging to your personal certificate.
It is needed when you upload your long term personal certificate to the MyProxy server (see Figure 7).

2.3 User against the Certificate Server 

The third kind of identification is associated with a certificate account of your personal certificate on the MyProxy certificate server.
You use this identification if

  2.4 User against the Virtual Organization

Users of the EGEE grid must be members of a virtual organization (VO).
Registration to a VO (a one-time administrative issue) requires a valid user certificate.
The registration procedure and policy is VO dependent and not covered in this paper.

 

III. Overview of the operation of the PORTAL

 
A possible full operation cycle is the following scenario:
 

0. Preparation:

 0.1 Users of general simple grids


Using the portal login name and password of the account, and a valid personal certificate (consisting of two files, see Figure 6 and Figure 8 for more details), the user enters the Portal server.
See Chapter IV 1 Login for more details.
Notes:




0.2 Users of the EGEE grid

Beyond everything described in the previous point, EGEE users must be members of virtual organisations.
There is a general rule that a user must belong to just one VO.
Generally a user certificate is required for the VO membership registration. This certificate must be trusted by the Grid the VO belongs to.


SPECIAL WARNING to the users of the virtual organisation Gilda, and to the users of other VOs requiring certificates with VOMS extension:
The EGEE Grid community is in transition from using simple Grid certificates to using certificates that include VO specific extensions (VOMS).
This enables more reliable and secure access to more than one VO with one certificate.
However, the VOMS related extension of the MyProxy service has not been finished yet, and the API interface to the MyProxy service is error prone.
The intermediate consequence is that the Certificate/upload functionality (see Figure 2 and Section 1 Uploading a personal certificate) cannot be executed within the Portal for the time being.
The suggested workaround is to issue the following command on a UI machine belonging to the given VO, where the valid certificate of the user is already installed:

myproxy-init --voms <VO> -s <Host_of_MyProxy_Server> -p <Port_of_MyProxy_Server> -l <Proposed_user_account_name_on_MyProxyServer>

Example:

./myproxy-init --voms gilda -s grid001.ct.infn.it -p 7512 -l myGildaCert

where "myproxy-init" must be the specially updated command (written by the Gilda people), which does not complain about the "--voms" parameter.

          Please note  that the command  prompts:

After this one-time workaround, the Certificate/download functionality (see Figure 2 and Section 2 Receiving a short term (proxy) certificate) can be executed the traditional way within the Portal framework.
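If the workaround has to be repeated, for example for several VOs, the command line above can be assembled programmatically. The following Python sketch only builds and prints the argument list from the options shown in the text (`--voms`, `-s`, `-p`, `-l`); the helper name `myproxy_init_voms_args` is hypothetical, and `--voms` is accepted only by the specially updated `myproxy-init` mentioned above, not necessarily by a stock installation.

```python
import shlex


def myproxy_init_voms_args(vo, host, port, account, program="myproxy-init"):
    """Argument list for the VOMS-aware myproxy-init workaround (hypothetical helper).

    Uses only the options shown in the text; "--voms" is accepted only by
    the specially updated myproxy-init, not necessarily by a stock one.
    """
    return [program, "--voms", vo, "-s", host, "-p", str(port), "-l", account]


args = myproxy_init_voms_args("gilda", "grid001.ct.infn.it", 7512, "myGildaCert",
                              program="./myproxy-init")
print(shlex.join(args))
# ./myproxy-init --voms gilda -s grid001.ct.infn.it -p 7512 -l myGildaCert

# On the UI machine the command would then be run interactively, since it prompts:
# import subprocess; subprocess.run(args, check=True)
```

Passing a list instead of a single string avoids shell quoting problems if an account name contains unusual characters; `shlex.join` (Python 3.8+) is used only to display the equivalent command line.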



Figure 2 indicates the possible activities that the P-GRADE Portal permits you:
 





Figure 2
 

1. Uploading a personal certificate

With "Certificate/upload" (Figure 2) the user sends a personal certificate to the Certificate server and establishes a certificate account.

This step happens rather seldom, because the expiration time of personal certificates is fairly long. 
The uploading process is a rather complicated transaction started from Figure 5 and explained in detail in Chapter IV 2.1.
The upload creates a certificate account for the certificate, and the user must remember its name and password for the subsequent proxy generations.
(See also Chapter II 2.3)

2. Receiving a short term - proxy - certificate.

With the operation "Certificate/download" (Figure 2) the user accesses the Certificate server and reads the certificate account to load a short living proxy of a valid personal certificate into the Portal server. (The user must do this every time he or she intends to submit a job and there is no time left on the current proxy certificate.) For security and economic reasons the expiration time of this type of certificate is generally limited to one week. Please note that the resources where you want to submit your jobs "see" and accept only the proxy certificate that you have downloaded from the "MyProxy" Certificate server and selected for the subsequent submission.

The downloading process is started from Figure 5 and detailed in Chapter IV 2.2.
The user must reference the certificate account of the uploaded personal certificate (see also Chapter II 2.3).
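The rule "download a new proxy when there is no time left on the current one" can be made concrete with a small Python sketch. It assumes that the user can estimate how long the workflow will run, and it models the one-week limit as a hard cap on the lifetime of a downloaded proxy; both the cap and the helper names `proxy_expiry` and `needs_new_proxy` are illustrative assumptions, not Portal behaviour.

```python
from datetime import datetime, timedelta

MAX_PROXY_LIFETIME = timedelta(weeks=1)  # "generally limited to one week" (assumed hard cap)


def proxy_expiry(issued_at, requested_lifetime):
    """Expiry of a freshly downloaded proxy, capped at the maximum lifetime."""
    return issued_at + min(requested_lifetime, MAX_PROXY_LIFETIME)


def needs_new_proxy(expires_at, now, expected_run_time):
    """True if the current proxy would run out before the workflow finishes,
    i.e. a new Certificate/download is advisable before submitting."""
    return expires_at - now < expected_run_time


issued = datetime(2006, 12, 15, 12, 0)
expires = proxy_expiry(issued, timedelta(days=10))           # capped to 7 days
print(expires)                                                # 2006-12-22 12:00:00
print(needs_new_proxy(expires, issued, timedelta(hours=6)))   # False
print(needs_new_proxy(expires, issued + timedelta(days=6, hours=20),
                      timedelta(hours=6)))                    # True
```

In the last call only 4 hours of validity remain, fewer than the 6 hours the workflow is expected to need, so a fresh proxy should be downloaded first.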


 

3. Settings: Defining the resources

By filling in a simple table on the Portal server, the user can define the URL and the access method of the basic services of the remote resources where the jobs may run (see also Figure 12a and Figure 12b). See the detailed steps of the definition at resource definition.
If the selected GRID has an information system, the information system may automatically explore the possible sites and services. See VII Information System.
The user need not bother with defining (finding) resources and connecting them to the jobs in the special case of having access to an EGEE-like Grid, because there the Broker service does this task. See Chapter X Connection to the EGEE Grids and the usage of the Broker for more details.

Notice:
In connection with the direct use of resources of EGEE Grids please read Chapter IV: 3.1 Direct use of resources in the EGEE.

4. Defining a workflow

The user can create new workflows, and  load and archive existing ones.
Please note that the creation process is done with a SEPARATE program in a different window (the Workflow Editor), which is downloaded from the Portal server and runs on your desktop. This has two consequences:

There is an important and recommended alternative way of defining workflows:

They can be imported from the P-GRADE development tool (P-GRADE). This way has some advantages over manually editing the workflow in the P-GRADE Portal:

In the Workflow Editor program you use the menu item Workflow/Import workflow (see 4.1.1.2 Import process) to open the file browser "Import Workflow", in which you can search for the needed workflow files, distinguished by the file name extension ".wrk".
To learn more about P-GRADE please consult P-GRADE.
 

4.1 Short introduction to the Workflow Editor

You have already learned that the Workflow Editor is a separate graphical program, which can be started from the Workflow/Workflow Manager portlet with the button Workflow Editor of the Workflow tab, and which runs on the user's desktop.
In short, the Workflow Editor can create, modify and save a workflow. You will find a rather long introduction to the use of the Workflow Editor in Chapter IV. Here is only a short summary of its most important menu items.
 
 4.1.1  Workflow creation 
 
Workflow creation is the process of defining a new workflow in the P-GRADE Portal. The creation may be an interactive building process or an import process:

4.1.1.1 Interactive building process
A new workflow can be created within a newly opened window (as you see in Figure 14) or within an existing copy (see Figure 32) of the Workflow Editor program.
With the menu item Workflow/New you can create a new empty workflow.
By subsequently applying Workflow/New job and Workflow/New port you can build the parts of the workflow graph (see Figure 15).
 
4.1.1.2 Import process
 
A whole workflow previously built and tested with the application P-GRADE can be imported from the desktop machine, with all of its dependent parts, by the menu item Workflow/Import Workflow (see Figure 32). The selected menu item opens a file browser for selecting a workflow file with the extension .wrk, which will be uploaded to the Portal server. This workflow will behave just like the workflows you have built manually. However, in most cases you only need to check the destination resources of the component jobs.
 
 4.1.2 Workflow saving
 
A newly created workflow has no name. It must be saved with the menu item Workflow/Save as (see EDITOR/Save|Upload in Figure 2). This command has two effects: it uploads the workflow with its user defined name to the workflow repository of the Portal server, and it puts the workflow in the launch list of the Workflow Manager portlet. After any modification of the workflow (see 4.1.3 Workflow modification) the menu item Workflow/Save has the same effects. If the saving process finds that any of the files referenced in the description of the saved workflow have not yet been uploaded to the Portal server (or are not valid, see later), it prompts the user to allow the automatic upload process to start. Therefore the menu item Workflow/Upload is seldom issued manually (see Figure 32).
 
 4.1.3 Workflow modification
 
Any part of a saved (or recently created) workflow can be modified:
add or delete a job (see details at Figure 15),
add or delete a port (see details at Figure 19 and at Figure 23),
add or delete a connection between two ports (method described between Figure 28 and Figure 29),
change any attribute of the jobs (see details at Figure 17 and at the subsequent Learning Notes on Job Properties) and of the ports (see details at Figure 21 and at the subsequent Learning Notes on Port Properties).
To make these changes, the user needs to access the workflow (i.e. download it from the Portal server to the desktop) with the menu item Workflow/Open. The user selects the needed workflow from the Workflow Repository of the Portal server.
If the user changes any file reference during the modification process (even if he/she restores the previous text), a hidden marking records the event and the previous file reference is invalidated; as a consequence, after the subsequent Workflow/Save command the user will be prompted to allow the needed upload. In short, the system automatically maintains data consistency between the definition environment (desktop machine) and the Portal server, and the user is exempted from the duty of deleting obsolete files from the Portal server. (See details at Figure 33 and the subsequent Learning Notes on Upload File.)
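The "hidden marking" behaviour can be modelled as a dirty flag per file reference: any edit clears the reference's valid state, even when the same text is typed again, and saving lists every reference that is not valid. This Python sketch is a hypothetical illustration of that rule; the class `FileReferences`, its methods and the port identifiers are invented here and do not describe the Workflow Editor's actual implementation.

```python
class FileReferences:
    """Hypothetical model of the hidden marking of changed file references."""

    def __init__(self):
        self.refs = {}        # port id -> local file path on the desktop
        self.valid = set()    # port ids whose uploaded copy is still valid

    def set_reference(self, port, path):
        # Any edit invalidates the uploaded copy, even if the same text is restored.
        self.refs[port] = path
        self.valid.discard(port)

    def files_to_upload(self):
        """What Workflow/Save would prompt the user to upload."""
        return sorted(port for port in self.refs if port not in self.valid)

    def mark_uploaded(self, ports):
        # The new upload replaces the obsolete copy on the server.
        self.valid.update(ports)


refs = FileReferences()
refs.set_reference("job1/in0", "data/input.txt")
refs.set_reference("job1/exe", "bin/solver")
print(refs.files_to_upload())   # ['job1/exe', 'job1/in0']
refs.mark_uploaded(refs.files_to_upload())
print(refs.files_to_upload())   # []
refs.set_reference("job1/in0", "data/input.txt")  # same path typed again
print(refs.files_to_upload())   # ['job1/in0']
```

Deliberately not comparing old and new paths keeps the rule simple and safe: the local file may have changed on disk even if its name did not, so re-uploading is the conservative choice.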
 
The actual modification steps are discussed in detail via examples in Chapter IV, paragraph 4 Building your workflow.
 

4.2 Workflow deletion

 
There is no way to delete a workflow directly with Workflow Editor commands.
The reason is that the Workflow Editor runs on the desktop, while the workflow is stored in the Workflow Repository of the remote Portal server under the control of

 
You will find more useful notes about Delete in paragraph 8 Run time user actions.


 
 

5. Starting a workflow

Using the Workflow/Workflow Manager/Submit command (Figure 2) you can submit the prepared workflow to the GRID, i.e. let it run. Of course, the following conditions must be fulfilled; they are checked by the system partly at creation time and partly at load time:
See more details in paragraph Run time user actions.
 

6. Observing the progress of a workflow

If the submission was successful and the jobs begin running, you can follow the progress in three different ways:
 

6.1 Progress info from the  Workflow Manager

First of all, the elements of the Workflow list of the Workflow Manager (Figure 38) inform you about the state of the whole workflow (column Status) and about any results (column Output). The elements of the column View have a tree structure, and their roots are the Details buttons (Figure 39). In the detailed mode a sub-list describes the state of each job composing the selected workflow.
Size shows the size of the storage needed by the Workflow on the server host.
Quota shows the percentage of the quota permitted for the user.
The label of the column Quota includes the full size of the quota (in the case of Figure 38 it is 1 MB),
and the last line of the Workflow list summarizes the percentage and the size of the occupied storage.
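The arithmetic behind the Size and Quota columns and the summary line is a simple ratio. The sketch below assumes that "1 MB" means 2^20 bytes and that sizes are given in bytes; the Portal's exact units and rounding are not stated in the text, and the function `quota_report` and the workflow names are hypothetical.

```python
QUOTA_BYTES = 1024 * 1024  # "1 MB", assumed here to mean 2**20 bytes


def quota_report(sizes, quota=QUOTA_BYTES):
    """Per-workflow (size, % of quota) rows plus the summary line (total size, %)."""
    rows = {name: (size, 100.0 * size / quota) for name, size in sizes.items()}
    total = sum(sizes.values())
    return rows, (total, 100.0 * total / quota)


rows, summary = quota_report({"wf1": 262144, "wf2": 131072})
print(rows)     # {'wf1': (262144, 25.0), 'wf2': (131072, 12.5)}
print(summary)  # (393216, 37.5)
```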
 
 
6.1.1 Detailed view (Figure_39)
In this list each line corresponds to a component job. The line contains the following fields:
    - Workflow         name of workflow inherited from the root menu
 
    - Gridname         name of the Grid (or of the virtual organization) where the job runs.
                      Gridname  is a new feature of Version 2.1 See more detailed in  multi-GRID_support.
    - Job                    name of current component,  as the user defined it  in the  text field of "Name" of the job definition window <jobname>properties.
                                 (See  Figure 17  , Figure 18 )

    - Hostname         host where the  job  runs
    - Status           Workflow-level status must be distinguished from job-level status.
                       The possible job states, with their colors and in their natural sequence (where applicable), are:

                          init        (white)
                          submitted   (orange)    only in case of brokering (since Release 2.2)
                          wait        (blue)      only in case of brokering (since Release 2.2)
                          scheduled   (magenta)   only in case of brokering (since Release 2.2)
                          running     (red)
                          finished    (green)
                          error       (blue)

                       The possible workflow states are:

                          init        (white  in overview window, green in detailed view)   The workflow is uploaded to the server
                          submitted   (orange in overview window, white in detailed view)   On user action, while no job is in the running state
                          running     (red    in overview window, white in detailed view)   When the first job enters the running state
                          finished    (green  in overview window, white in detailed view)   When the last job has terminated successfully
                          error       (blue   in overview window, white in detailed view)   On an error in one job, with no further jobs able to run
                          rescue      (blue   in overview window, white in detailed view)   On an error in one job, with no further jobs able to run (since Release 2.2; see rescue)
                          aborted     (red    in overview window, white in detailed view)   On user action
    - Logs             Out and/or Error buttons to read any files
                       written by the system to "stdout" and "stderr" respectively
    - Output           A green button indicates that the application terminated successfully and the result can be downloaded
    - Visualization    Visualize and All buttons, where present, call the graphical
                       monitoring for the whole workflow, for the given job, or
                       for each of its parts.
    - Action           This array of buttons is inherited from the root menu and
                       is discussed in paragraph 8_Run_time_user_actions_
 
The user returns to the root menu by hitting the Back button.
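The job-state table of the detailed view can be captured as a small lookup table. The following C sketch is illustrative only: the enum and function names are hypothetical, not taken from the Portal sources, and it encodes only the job-level colors listed above.

```c
#include <string.h>

/* Job states of the detailed view, in their natural order. */
typedef enum {
    JOB_INIT,
    JOB_SUBMITTED,   /* brokered jobs only (since Release 2.2) */
    JOB_WAIT,        /* brokered jobs only (since Release 2.2) */
    JOB_SCHEDULED,   /* brokered jobs only (since Release 2.2) */
    JOB_RUNNING,
    JOB_FINISHED,
    JOB_ERROR
} job_state;

/* Color used for each job state in the detailed view. */
static const char *job_state_color(job_state s)
{
    switch (s) {
    case JOB_INIT:      return "white";
    case JOB_SUBMITTED: return "orange";
    case JOB_WAIT:      return "blue";
    case JOB_SCHEDULED: return "magenta";
    case JOB_RUNNING:   return "red";
    case JOB_FINISHED:  return "green";
    case JOB_ERROR:     return "blue";
    }
    return "unknown";
}

/* Brokered jobs pass through three extra states between init and running. */
static int is_broker_only_state(job_state s)
{
    return s == JOB_SUBMITTED || s == JOB_WAIT || s == JOB_SCHEDULED;
}

/* Number of job states drawn in 'color'; more than one means the color
   alone is ambiguous and the Status text has to be read. */
static int states_with_color(const char *color)
{
    int count = 0;
    for (int s = JOB_INIT; s <= JOB_ERROR; s++)
        if (strcmp(job_state_color((job_state)s), color) == 0)
            count++;
    return count;
}
```

Because wait and error share blue, states_with_color("blue") returns 2: the color alone does not identify the state.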
 
 

6.2 Progress info from the Workflow Editor

If the graph of the running workflow is selected and visible in the Workflow Editor, you can follow the progress through the changing colors of the nodes of the jobs being executed.
You can start the Workflow Editor in two different ways to see the progress:
- either by the Attach button of that workflow in the Workflow list of the
  Workflow Manager
- or by hitting the Workflow Editor button of the Workflow Manager and opening
  the workflow to be observed.
 In accordance with the convention discussed in 6.1.1_Detailed_view_ the following colors are
 used:
                                  orange              The job  waits for user submission
                                  white                The job  is submitted and waits to run         
                                  red                   The job  is running 
                                  green                The job  is  finished
                                  blue                  The job  has been aborted either by the system or by the user

Note that since Release 2.2 there is an additional job state for jobs running under the control of an EGEE Grid broker:

                                  magenta    The job is scheduled by the broker

Note that since Release 2.2 a new color indicates that the workflow failed but can be restarted from a natural checkpoint formed by the jobs that finished successfully:

                                  saddle brown    The job is in "rescue" state

 


                                  
 
 

6.3 Progress info by Monitoring and Visualization

The third method is graphical monitoring, which is discussed in detail in chapter V Monitoring_and_Visualization.
 
 
 

7. Fetching the result

The last step is fetching the result. The Portal_server packs the results into a zip file: a compressed directory hierarchy that follows the structure of the workflow graph. A subdirectory is generated for each node, where the associated permanent local output_files are stored. The user can fetch the zip file with the standard download manager of the browser.
This step is marked in Figure_2 as "Workflow/Workflow Manager/output".

Please note that remote output files are not retrieved to the Portal server; they must be accessed by methods beyond the control of the Portal. See Figure 8.1.

Please note, that the download manager is  part of the browser, and it is the responsibility of the administrator of the user's web site to set it up properly.
 

8. Run time user actions

The current version of the Portal program maintains two different lists of workflows:
The distinction between active and inactive workflows has the following meaning:
Polling the state of active workflows is expensive because of heavy network traffic. Therefore the user may get much slower responses if the number (and the complexity) of active workflows is high.
Both lists are updated as a consequence of the Save or Save as command of the Workflow Editor, but the Delete command may have different consequences:

The user can use the buttons of the Actions column (Figure_38) in the row belonging to each element of the Workflow/Workflow Manager list.
 

 
Delete all deletes all workflows of the user from the Portal_server or from the list of active workflows.
 
 
  
 
 
 

IV The detailed operation of the PORTAL by an example

 
During the tour you will build, start and observe a small test workflow.
After the general preparation, the description of the workflow to build and submit follows Figure 14.
 

1 Login

 
 
The user can reach the PORTAL through its URL, for example:

 
 http://fn1.hpcc.sztaki.hu:9080/gridsphere/gridsphere

There you should find - depending on the browser - something like this:

 
Figure 3
 
At this point you log on to the Portal_server (see also User identification against_the_Portal_Server_).

After a successful login, the user is automatically directed to the new Welcome menu, which offers the possibility to change personal data (see Figure 12.1).
By selecting the Workflow tab the user can reach the basic services of the Workflow Manager:
 



Figure 4
 
From here you can launch the activities shown in Figure_2.
Let us begin with the Certificate Manager by selecting the Certificates tab.
 
 

2. Certificates: Setting access rights to resources

 
 
Figure 5
 
With the Certificate Manager, opened by the Certificates tab, the user can Upload a personal certificate to the MyProxy certificate server, or Download a temporary proxy_certificate from it.
The very first action is usually to fill the Certificate Server (the so-called MyProxy server) with the user's existing personal certificates.
Please select Upload:

2.1 Upload detailed

In this process the user creates a certificate account.

Please remember that, for the time being, the Upload step must be skipped and done in a different way (outside the scope of the P-GRADE Portal) when your Virtual Organisation uses the VOMS extension to the certificates. See WOMS_Warning.

 
 

Figure 6
 
 
 
The first screen of the upload process asks for your file userkey.pem, which contains your secret_key (see Figure_6).
You can search for it in your local file system using the Browse tool.
Fill in the input field and accept it with OK. The next panel asks for the password of your secret key file, as Figure_7
shows (see also User identification against_own_userkey_file):
 
 
 
Figure 7
 
Upon OK the certificate file is requested. This certificate entitles you to use certain resources for a limited amount of time
(see Figure_8).
 


Figure 8
 
 
Upon OK you will see the window depicted in Figure_9, where you must select an existing certificate_server (MyProxy) by hostname and port, and define an account (login name and password) on it where your certificate will be stored.
(See also User identification_against_the_Certificate_Server.)
This certificate account stores just one certificate, so this "login" is actually the user name of the given certificate. The default host and port of the Certificate Server are supplied by the system. There is also an additional input field, the lifetime: it sets an upper limit for the short-term proxy certificates you may request by a subsequent download. You must hit Upload to perform the operation.
 
                                                                                                                              


                                                                                                                                       
  
Figure 9
 
  The system acknowledges the end of the successful upload process:



Figure 9a.
 
 
Next you may generate a short-term proxy_certificate. To get it, use the Download button of the Certificate List menu that is open in this state (see Figure 9a).
You will get the Download menu:

2.2 Download detailed:

 




Figure 10

 
The proxy certificate is generated from the personal certificate by filling in a form, and is downloaded to the Portal_server upon hitting the Download button.
The parameters of the form are the following:

The fields login name and password refer to the account of the previously uploaded certificate.
You can overwrite the default lifetime value with the required expiration time of the short-term proxy certificate. However, the actual lifetime of the downloaded proxy will not
exceed the value you defined during the Upload of your certificate.
If you find this limit too short, please repeat the upload process.
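The lifetime rule amounts to taking the minimum of the requested lifetime and the limit fixed at Upload time. A minimal C sketch; the function names are hypothetical and lifetimes are assumed to be whole hours (the text does not state the unit).

```c
/* Hypothetical helpers; lifetimes are assumed to be in whole hours. */

/* Lifetime actually granted at Download: the upload-time limit is an
   upper bound on every later request. */
static int effective_proxy_lifetime(int requested_hours, int upload_limit_hours)
{
    return requested_hours < upload_limit_hours ? requested_hours
                                                : upload_limit_hours;
}

/* Non-zero when the request exceeds the limit, i.e. the Upload would have
   to be repeated with a longer lifetime to obtain the full request. */
static int needs_reupload(int requested_hours, int upload_limit_hours)
{
    return requested_hours > upload_limit_hours;
}
```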



Upon Download, in the new release (2) of the Portal the user gets a message (Figure 10a) indicating that the downloaded short-term certificate can be associated with a GRID.




Figure 10a

A GRID is an administrative community of certain resources. With the same certificate the user can reach all resources of a given simple GRID, or of a given virtual_organization of a complex grid.
Resources are subordinated to grids, i.e. each resource must belong to a given GRID (or to a virtual organization of a complex Grid).
In the new release of the Portal infrastructure the different jobs of the same workflow can submit programs to resources of different available grids.
Therefore the user must have a way to maintain different certificates for different Grids (or virtual_organizations).
See VI_Multi-GRID_support for details.
This association is introduced by hitting the Set for Grid button.

The actual selection can be performed in the subsequent frame (Figure 10b).


 



Figure 10b.

                 After selecting one available name (which may refer to a simple grid, to a virtual_organization, or to a virtual organization with broker support) from the check list
                 labelled Select GRID, hitting the OK button (Figure 10b) closes the frame and returns to the panel showing the downloaded short-term (proxy) certificates:


 
Figure 11
 
        As you can see by comparing it with Figure 5, there is just one certificate on the list in our case, and its time of usage is restricted by the lifetime value you selected.

          Important notice:
In the [Actions] column there may be a fourth button, Use this, at each unselected proxy if the list has more than one element belonging to the same GRID (or virtual organization).
With this button you can select the certificate you want to apply for your subsequent job submissions directed to the respective GRID (or virtual organization).
With the help of the Set for Grid button the GRID (or virtual organization) association can be changed at any time (see Figure 10b).
With the help of the Details button the user can go back to a frame similar to Figure 10b, but without the possibility to set a GRID (or virtual organization).

Having defined the certificates, you need resource(s) where your jobs may run.
The main menu button Settings helps you define them. Please note that these data are stored on your Portal_server and not on the Certificate Server.
 
 

3. Settings: Defining the resources

          
In the new version of the P-GRADE Portal infrastructure the resources are subordinated to virtual organizations, which are disjoint administrative communities of the grids.

          However, resources belonging to different virtual organizations (or even to different grids) may be used within the same workflow.
          See the management of the grids in Chapter VI_Multi-GRID_support.
Please note that Name in the Settings table (see Figure 12a) means virtual_organization even when the grid consists of just one virtual_organization.


Hitting the Settings tab displays the list of virtual organizations available to the user:

 

Figure 12a

To see the details of a line, its Resources button opens a new frame listing the resources belonging to the selected virtual_organization.



 


 
Figure 12b
 
            New elements can be added to the resource listing of Figure 12b in three different ways:
          The Contact_string defines the entry point of a resource (cluster) in the form of the URL of the leading host of the cluster, followed by the symbolic
          name of the scheduler, the Job manager, which classifies the demands of the job against the cluster.
          More precisely, the contact string defines a program queue, generally belonging to a cluster of hosts, any of which may execute the jobs added to the queue.
          In EGEE-like grids a resource identified by a contact string is called a Computing Element (CE).
          The same cluster, identified by its leading host, can serve several CEs.
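As an illustration of the contact string layout (leading host, optional port, then the job manager name), the following C sketch splits a string such as ce.grid.tuke.sk:2119/jobmanager-pbs-voce into its parts. It assumes the Globus-style form host[:port]/jobmanager; the type and function names are hypothetical, and the Portal does not necessarily parse contact strings this way.

```c
#include <stdlib.h>
#include <string.h>

/* Parts of a contact string of the assumed form  host[:port]/jobmanager */
typedef struct {
    char host[256];
    int  port;              /* 0 when the string carries no port */
    char jobmanager[128];
} contact_string;

/* Splits 's'; returns 0 on success, -1 on malformed input. */
static int parse_contact_string(const char *s, contact_string *out)
{
    const char *slash = strchr(s, '/');
    if (slash == NULL || slash == s || slash[1] == '\0')
        return -1;                       /* host and job manager both required */

    size_t host_len = (size_t)(slash - s);
    const char *colon = memchr(s, ':', host_len);
    out->port = 0;
    if (colon != NULL) {
        char *end;
        long port = strtol(colon + 1, &end, 10);
        if (end != slash || port <= 0 || port > 65535)
            return -1;                   /* port must be numeric and end at '/' */
        out->port = (int)port;
        host_len = (size_t)(colon - s);
    }
    if (host_len == 0 || host_len >= sizeof out->host ||
        strlen(slash + 1) >= sizeof out->jobmanager)
        return -1;                       /* empty host or buffer too small */

    memcpy(out->host, s, host_len);
    out->host[host_len] = '\0';
    strcpy(out->jobmanager, slash + 1);  /* length checked above */
    return 0;
}
```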
 
          You may remove a resource from your list at any time by Delete.

           If the NAME of Figure 12a is of the virtual-organization-with-broker-support kind (for example "hungrid_LCG_2_BROKER"), then the window
           opened by hitting the "Resources" button is not modifiable: it has no significance and is there just for historical reasons.

           The contact string is a general term, which has been slightly extended by the EGEE. Therefore the usage of EGEE resources needs special consideration:

3.1 Direct use of resources in the EGEE


The user can explore the available Computing Elements in two ways:
  For example in case of the virtual_organization "voce":

skurut4.cesnet.cz$ lcg-infosites --vo voce ce

****************************************************************
These are the related data for voce: (in terms of queues and CPUs)
****************************************************************

#CPU    Free    Total Jobs      Running Waiting ComputingElement
----------------------------------------------------------
   9       9       0              0        0    ce.grid.tuke.sk:2119/jobmanager-pbs-voce
 166     166       0              0        0    ce.polgrid.pl:2119/jobmanager-lcgpbs-voce
  94      14      95             80       15    grid109.kfki.hu:2119/jobmanager-lcgcondor-long
  36      32       0              0        0    ares02.cyf-kr.edu.pl:2119/jobmanager-lcgpbs-voce
  78      65       0              0        0    zeus02.cyf-kr.edu.pl:2119/jobmanager-lcgpbs-voce
  46      41       0              0        0    skurut17.cesnet.cz:2119/jobmanager-lcgpbs-voce
 176     176       0              0        0    ce.egee.man.poznan.pl:2119/jobmanager-lcgpbs-voce
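When choosing a Computing Element by hand, the listing above can also be processed mechanically. This C sketch assumes the column layout shown (five integer columns followed by the CE contact string, with header and separator lines mixed in); the type and function names are hypothetical, and lcg-infosites output may look different in other middleware versions.

```c
#include <stdio.h>

/* One data row of the listing: #CPU, Free, Total Jobs, Running, Waiting, CE. */
typedef struct {
    int  cpus, free_cpus, total_jobs, running, waiting;
    char ce[256];
} ce_row;

/* Returns 1 for a data row, 0 for headers, separators and blank lines. */
static int parse_ce_row(const char *line, ce_row *r)
{
    return sscanf(line, "%d %d %d %d %d %255s",
                  &r->cpus, &r->free_cpus, &r->total_jobs,
                  &r->running, &r->waiting, r->ce) == 6;
}

/* Index of the data row with the most free CPUs, or -1 if no row parses. */
static int best_free_ce(const char *lines[], int n)
{
    int best = -1, best_free = -1;
    ce_row r;
    for (int i = 0; i < n; i++)
        if (parse_ce_row(lines[i], &r) && r.free_cpus > best_free) {
            best_free = r.free_cpus;
            best = i;
        }
    return best;
}
```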

Compared with the traditional Globus resources, two new features can be observed:

4. Workflow Editor: Building your workflow

 
Now it is time to build our own workflow. Select the Workflow tab of the main menu:
  
 
 

Figure 13
 
Selecting the Workflow Editor button starts the Workflow_Editor, an independent Java program.
 
Note: Your browser must have the JRE 1.5.2 Java plug-in (or higher) for this program to start.
The first time the Workflow Editor is loaded during the portal session, some messages about the possible risks of using the program are displayed.
The Workflow Editor, however, is harmless and should be allowed to run.

(The Java Web Start technology involved notifies the user that the downloaded program may access the local file system, and prompts the user to trust or dismiss the source of the certificate.)

 
If you accept, the following window appears:
 
 
 
Figure 14
 
                
 Next you will build a simple workflow containing two jobs, described as follows:
       Example definition:
 
For simplicity both jobs have an identical structure and use the same executable program (in real life the executables are usually different).
This executable is a simple sequential C program (in our example "Cell.c") that reads two integers from two different text files, which the program opens as "INPUT1" and "INPUT2" respectively, and writes the product of the two numbers to an output text file opened as "OUTPUT".
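The document does not reproduce Cell.c, so the following is only a sketch of a program with the behaviour described, not the original source. It shows how the names the code passes to fopen must match the Internal File Names of the ports (INPUT1, INPUT2, OUTPUT); the function name cell_multiply is hypothetical.

```c
#include <stdio.h>

/* Reads one integer from each of two input files and writes their product
   to the output file. Returns 0 on success, 1 on any I/O or parse error. */
static int cell_multiply(const char *in1, const char *in2, const char *out)
{
    int a = 0, b = 0;
    FILE *f1 = fopen(in1, "r");          /* Internal File Name of port 0 */
    FILE *f2 = fopen(in2, "r");          /* Internal File Name of port 1 */
    int ok = f1 && f2 && fscanf(f1, "%d", &a) == 1 && fscanf(f2, "%d", &b) == 1;
    if (f1) fclose(f1);
    if (f2) fclose(f2);
    if (!ok)
        return 1;

    FILE *fo = fopen(out, "w");          /* Internal File Name of the output port */
    if (fo == NULL)
        return 1;
    fprintf(fo, "%lld\n", (long long)a * b);   /* widened to avoid int overflow */
    return fclose(fo) == 0 ? 0 : 1;
}
```

A job executable built from this would have a main that simply returns cell_multiply("INPUT1", "INPUT2", "OUTPUT"); with input files containing 6 and 7, OUTPUT would contain 42.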
 
You will build the connections of your first job, "Cascade1", in such a way that it receives both of its input_files from the local file system as <path1>/I1 and <path2>/I2 respectively.
 Its output "OUTPUT" will be generated somewhere in the GRID and will serve as the "INPUT1" of the second job, "Cascade1.2".
The local file <path2>/I2 will be used as the "INPUT2" of this job.
Finally, the result of the whole workflow (the "OUTPUT" of "Cascade1.2") will be generated in the GRID and replicated to the Portal_server, where the user can download it.
 
 
As a preparation, you must have the executable code ("Cell.exe") and the input_files <path1>/I1 and <path2>/I2 stored in your local file system.
 
This knowledge is enough to build the workflow as follows. Let us define the first job:
 

 
Figure 15
 
Hitting the marked New job icon creates a new job:
 
 
 
Figure 16
 
Double clicking on the job allows its properties to be edited.
(An alternative to double clicking is a RIGHT mouse click on any graphic element, which triggers a popup menu of possible operations.)
 
  
 
 
Figure 17
         Learning Notes on Job Properties:

          In this menu the user defines the code of the job to run (Job Executable) and the call conditions of the job: where to run it (Grid, Resource),
          with what kind of arguments, and under which resource conditions.
          Arguments can be
                        command line Attributes, processed internally by the code and possibly influencing the run of the job,
                        the Monitor flag, which requests graphical observation of the running job,
                        and some hints about the kind (JobType) and size (ProcessNumber) of the resources needed to perform the job

          Details:
                    
Name is given by the system as a default. The user can change it, but the name of the job must differ from all other job names.
JobType is the kind of code referenced by Job Executable to be started on the resource.
It can be traditional sequential (SEQ) or parallel (MPI, PVM). In the case of MPI the resource must be
informed about the number of hosts needed by the program (Process Number).

Important notice: If the user wants to submit MPI jobs with broker support, a special JDL requirement must be entered
(see Chapter X 2.9_Important_notice_to_MPI_submission).

Job Executable defines the path of the executable code to be uploaded from the local file system.
Upon a successful upload and a subsequent download of the workflow, the input field shows
only the name of the executable instead of the whole path. Any change of this input field instructs the
system to upload a new executable, defined by an absolute path, from the local file system.
The search for such a file is supported by the File Browser.

Instrument is a message field set by the system only when the executable code contains
special message-sending instructions for real-time monitoring, i.e. the code is instrumented.

Process Number has significance only if the Job Executable is of MPI type.
It tells the resource the number of hosts needed.


Attributes may be filled in just like the command line parameters of traditional C programs.

Grid defines a GRID or a virtual organization, i.e. the high-level administrative domain where the job must be submitted.
Changing the GRID changes the subordinated list of selectable Resources as well.

Monitor flag can be set by the user only when Instrument is set.
Setting it enables job-level monitoring.

Please note that setting (changing) the Monitor flag may have an unexpected effect on the selected Resource (and on the selectable resources):

If Monitor was previously not set and the selected Resource cannot be monitored, then the new monitoring request hides all resources lacking the monitoring infrastructure, and the current value of the Resource field is replaced by the first monitorable element of the list of resources.
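The Monitor rule can be read as a small selection algorithm. A sketch in C, assuming each resource carries a flag telling whether it is monitorable; the names are hypothetical, and returning -1 when no resource of the Grid is monitorable is an assumption, since the text does not cover that case.

```c
#include <stddef.h>

/* A selectable resource of the chosen Grid (illustrative type only). */
typedef struct {
    const char *contact;    /* contact string of the resource */
    int monitorable;        /* non-zero if monitoring infrastructure exists */
} resource;

/* Index of the Resource selected after Monitor is switched on: the current
   one if it is monitorable, otherwise the first monitorable element.
   Returns -1 (assumption) when no resource can be monitored. */
static int resource_after_monitor_on(const resource *list, size_t n, int current)
{
    if (current >= 0 && (size_t)current < n && list[current].monitorable)
        return current;
    for (size_t i = 0; i < n; i++)
        if (list[i].monitorable)
            return (int)i;
    return -1;
}

/* Number of resources that remain visible while Monitor is set. */
static size_t visible_when_monitored(const resource *list, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += list[i].monitorable != 0;
    return count;
}
```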

Resource defines the resource where, and under what kind of assumptions, the Job Executable will run within the defined Grid.
The selected resource may change implicitly as a consequence of changing the Grid and Monitor settings.


If the Grid is of the virtual-organization-with-broker-support kind (for example hungrid_LCG_2_BROKER), then the value of
  "Resource" has no significance and is kept just for historical reasons.

Warning:
          If the virtual_organization defined in Grid is EGEE-like and the Job_manager of the Contact_string defined as Resource is an LCG proprietary one,
          please consider the warning in chapter 3.1_Direct_use_of_resources_in_the_EGEE and use a virtual organization with broker support as Grid, with the resource constraint in the
           JDL.
 
JDL: a new feature, available from Release 2.2.
 It opens the JDL editor. This editor is applicable only if the virtual_organization selected in Grid is of an EGEE-compliant type, i.e. the Resource is determined by the system from matching characteristics (set by the user with the help of the JDL Editor) instead of by direct assignment. The usage and effect of the JDL Editor are discussed in Chapter X_Connection_to_the_EGEE_Grids.


There is an alternative method to set the resource-dependent features of the job: they can be handled centrally, starting from the
main menu Workflow Properties.

    
After filling in the needed fields the window may look like this:
 
 
 
Figure 18
 
 
Hitting Ok returns you to the main window:

 
 
 
Figure 19
 
Now you can define the I/O ports of this job. Hitting the port icon adds a new port to the selected job.
An alternative method to define a new port is the New port item of the Workflow main menu (see Figure 32).

The selected state of the job is shown by the red frame around the job:
 
 
 
Figure 20
 
    Double clicking on the port icon (or selecting the Properties item of the popup menu triggered by the right mouse click, as Figure 23 shows)
    makes the port properties definable in a pop-up window:
 
 
 
Figure 21
 

With the help of the  port properties window the user defines the direction, kind, name and file association of the port.

Learning notes on Port Properties:

A port connects an input or an output file opened by the job with the environment.
From the point of view of the port, this environment can be an external file or another port of a different job.
The external file reference (defined in the File field) is mapped to the name (defined in the Internal File Name field) that the author of the executable uses to open the given file.

Notice that there is no longer any restriction on the usage of file names:
In the versions preceding Release 2.2 of the P-GRADE Portal the Port property field Internal File Name (see Figure 21)
did not exist, and hence the user had the additional restriction of having to apply external file references
"ending" as the job executable expects them, i.e. the value corresponding to this field was generated from the
"/"-separated "tail" of the File field, which used to have the form
                               [[protocol]<directory>]<FileName_applicable_as_InternalFileName_as_well>
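The pre-2.2 rule derived the internal name from the last "/"-separated component of the File field. A minimal C sketch (hypothetical function names) shows why a file such as <path1>/I1 could not previously be fed to a job that opens "INPUT1".

```c
#include <string.h>

/* Internal name implied by a File value before Release 2.2: the part after
   the last '/', or the whole string if it contains no '/'. */
static const char *implicit_internal_name(const char *file_field)
{
    const char *slash = strrchr(file_field, '/');
    return slash ? slash + 1 : file_field;
}

/* Non-zero when a File value would have worked before Release 2.2 for a
   job that opens 'fopen_name', i.e. its tail equals that name. */
static int usable_before_2_2(const char *file_field, const char *fopen_name)
{
    return strcmp(implicit_internal_name(file_field), fopen_name) == 0;
}
```

With the separate Internal File Name field, the mapping from I1 to INPUT1 is stated explicitly instead.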

A port can be either an Input (In) or an Output (Out) port.

Input:
If the Type of the port is Input AND the port will NOT be connected to any Output port of other jobs, then the port must refer to a genuine input file.
The genuine input file must be defined as a full path in the File field, in the form [<protocol>]<path>, where
<protocol> can be defined only in the case of remote files (see VIII_Handling_of_remote_files).

If the Input port is not connected to a genuine input file, i.e. it is connected to the output port of a different job, then the File field must not (and cannot) be filled.

In both (local and remote) cases the name <name> entered after the label Internal File Name must correspond to the "fopen(<name>, "r")" instruction within the code of the job.

The set managed copy flag means that the system automatically delivers the input file to the working directory where the associated job will run.
This is the default.
However, in some cases, when the user handles remote files and the input file is located on a Storage Element, the file may be too big to be copied.
In such cases the user may decide (by clearing the flag) to take over the responsibility of reading the Grid file. Note that in this case the user must prepare the executable of the job to open and read Grid files using the GFAL API of the EGEE.

Output:
If the Type of the port is Output AND the File Type is NOT Remote, then the File field must not (and cannot) be written.
Please note that there is NO symmetry between genuine input and output files: genuine local output files
(i.e. those referenced by Output ports not connected to any Input ports of other jobs) are stored on the PORTAL Server and
 are downloaded to the local environment of the user by an interactive command after the workflow has completed.
The other case is when the Output File is Remote. In this case the file named after the label File is stored according to the full path of the form [<protocol>]<path>
(see VIII_Handling_of_remote_files).

In both (local and remote) cases the name <name> entered after the label Internal File Name must correspond to the "fopen(<name>, "w")" instruction within the code of the job.



Details:

Port name:
                  Given automatically by the system; there is little reason for the user to change it.
         The field is used internally to generate subdirectories, hence it must contain only alphanumeric characters.
Type:
                    Selector indicating either read or write access by the corresponding "fopen(<name>, "{r|w}")" instruction in the code of the job.
File Type:
                    Has significance only in the case of genuine files. The default setting is Local.
If the setting is Remote, then the Input file is not uploaded to the Portal Server during the Save/Upload phase
that concludes the definition of the workflow in the Workflow Editor. Instead, just the reference to the remote file is stored, and
the file transfers are organized by the run-time system.
A file defined as Remote on an output port forces all connected input ports to be Remote with identical File names.
File:
                     Reference to a genuine local or remote file [<protocol>]<path>.
                     Please note that this field is not definable if the port is an input port connected to the output of another port, or if the port is designed to be
                     a local output port.

The search for such a file in the case File Type = Local is supported by the File Browser.


Internal File Name:

Internal reference to a file, used by the author of the corresponding job in an "fopen(...)" instruction.
File storage type:

This selector can be activated only for Output files. Its default setting is Permanent for genuine Output files and Volatile for
"channel" files connected to other ports.
If a channel file is reset to Permanent, its data are not discarded after each connected job has read its
content but are "added to the output of the workflow". When the File Type is Local, this means that the file is preserved for downloading as if it were a genuine Output file.
Resetting to Volatile forces the File Type to change to Local, because a temporary file on a dedicated Remote storage device is undesirable.
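The constraints listed in the port notes can be gathered into one validity check. The following C sketch is an illustrative model only: the field and function names are hypothetical, an empty or NULL string stands for an empty File field, and only some of the rules quoted above are checked (not, for instance, the identical-name rule for Remote channel files).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of the port properties; not the Editor's data format. */
typedef enum { PORT_IN, PORT_OUT } port_dir;
typedef enum { FILE_LOCAL, FILE_REMOTE } file_type;
typedef enum { STORE_PERMANENT, STORE_VOLATILE } storage_type;

typedef struct {
    const char  *name;        /* Port name: alphanumeric only          */
    port_dir     dir;
    file_type    ftype;
    const char  *file;        /* File field; NULL or "" when empty      */
    int          connected;   /* linked to a port of another job        */
    storage_type storage;     /* meaningful for output ports only       */
} port;

static int is_blank(const char *s) { return s == NULL || *s == '\0'; }

/* Returns NULL if the port satisfies the checked rules, else a reason. */
static const char *check_port(const port *p)
{
    if (is_blank(p->name))
        return "port name is empty";
    for (const char *c = p->name; *c; c++)
        if (!isalnum((unsigned char)*c))
            return "port name must be alphanumeric";

    if (p->dir == PORT_IN) {
        if (!p->connected && is_blank(p->file))
            return "genuine input port needs a File";
        if (p->connected && p->ftype == FILE_LOCAL && !is_blank(p->file))
            return "connected local input port must not have a File";
    } else {
        if (p->ftype == FILE_LOCAL && !is_blank(p->file))
            return "local output port must not have a File";
        if (p->ftype == FILE_REMOTE && is_blank(p->file))
            return "remote output port needs a File";
    }
    return NULL;
}

/* Default storage: Permanent for genuine outputs, Volatile for channels. */
static storage_type default_storage(const port *p)
{
    return p->connected ? STORE_VOLATILE : STORE_PERMANENT;
}
```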


  Please select a proper input file with the help of the File Browser and fill in the Internal File Name according to the convention required by the example program Cell.c:

 

 
 
 
Figure 22
 
The Port name is set automatically, but the user may redefine it.
Hitting OK returns you to the original editor to define the other ports.

By repeating the corresponding steps (basically to define the location of INPUT2 for port "1"; a similar window is shown in Figure 30)
you arrive at changing the properties of port "2".

Learning Notes on Port Editing:

In this case hitting the right mouse button (see Figure 23) offers three possibilities:
Properties   - to define the Port properties
Delete       - to delete the port
Fix (Unfix)  - toggle to glue (or release) the graphic representation of the port at its position along the sides of the square representing the job
               (This operation has no significance for the semantics of the workflow)
 
 
 
 
 
Figure 23
 
Selecting Properties by left click opens the Port properties popup window, where you may select "Out" as Type and enter "OUTPUT" as Internal File Name:
 
 


Figure 24
 
 
Please remember that in this (Local) Output case you must not define the File, even if this port is intended as the source of the genuine output of the whole workflow. The reason is that upon successful termination of the submitted tasks the workflow does not return individual files. Instead, the Permanent Local output_files are packed into a compressed file tree reflecting the structure of the workflow, which you can download by the standard method of your browser, as you will see in Figure 39.
Please note that you will be able to identify this file by its Internal File Name.
If the user wants to reduce the storage load of the files produced by the job, any unneeded files can be marked as Volatile instead of the default Permanent.
Hitting Ok completes the definition of our first job, and the New job icon can be selected.
 
 
 
 
 
Figure 25
Let us define the job properties as before, create the ports for the new job similarly to Figure 19, and select the first (0) one.
Here the File will not be defined, because this port will be connected to the output port of the other job:


 
Figure 26
  Hitting Ok produces the following warning message (Figure 27):
 
 



 
 
 
Figure 27
  The simplest way is to answer it with No and proceed with the port connection.
  After closing the Port property window the Editor looks like this:
                                                                                                                                            
 
 
 
Figure 28
 
Now we connect the output port "2" of "Cascade1" to the input port "0" of the second job.
To define the connection, press the middle mouse button on the output port, hold it down, drag it to the target input port and release it.
Editing Notes:

No rubber-band line is shown while dragging.

Clicking on the arrow that connects two ports selects the connection, and its color changes from blue to red.
A selected connection can be deleted with the corresponding icons of the menu bar (cut or delete). It can also be given a graphic attribute
that determines its color when the Workflow Editor is used to display the runtime state
of the submitted workflow: right-clicking the arrow opens a popup menu where the toggle item Switch to {ONLINE|OFFLINE}
can be selected. This minor coloring feature does not change the semantics of the workflow being defined.
 

 
 
 
Figure 29
Now we edit the second input port (port name 1) of the new job:
 

 
Figure 30
 
Next, define the output port of the second job:
 
 
 
Figure 31


Confirming the change with Ok completes the editing phase. Now the workflow only needs to be saved to the Portal_server:
Select Save as in the main menu:
 

 
Figure 32


At this point the Workflow Editor checks the correctness of the workflow.

Learning Note
If errors are found (mostly bad references to the local files to be uploaded, or missing resources), a warning message lists them.
The user may still decide to save the workflow. In that case, however, the workflow is marked as incomplete for the Workflow Manager: it cannot be submitted, only stored for later modification.
The modification is started with the Open menu command (see Figure 32), which lists the user's workflows stored on the PORTAL Server.
The selected workflow is downloaded, and the editing can be completed.
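The correctness check performed on saving can be pictured with the following minimal Python sketch. It models only the two error classes named above (missing resources and bad references to local files to upload); the `Job` record and its fields are hypothetical, and the Workflow Editor's real checks may cover more cases. A non-empty error list corresponds to a workflow that can be saved but is marked incomplete.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional

@dataclass
class Job:
    """Hypothetical, simplified view of a job as seen by the check."""
    name: str
    resource: Optional[str] = None                          # assigned resource, if any
    local_files: List[str] = field(default_factory=list)  # desktop files to upload

def check_workflow(jobs):
    """Return error messages; an empty list means the workflow is complete."""
    errors = []
    for job in jobs:
        if not job.resource:
            errors.append(f"{job.name}: no resource assigned")
        for path in job.local_files:
            if not Path(path).is_file():
                errors.append(f"{job.name}: local file not found: {path}")
    return errors

jobs = [
    Job("Cascade1", "n99.hpcc.sztaki.hu", ["/no/such/dir/input.txt"]),
    Job("Job2"),  # hypothetical second job without a resource
]
errors = check_workflow(jobs)
incomplete = bool(errors)  # can be saved, but not submitted
```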

When a new workflow is saved (with Save as, or at the first use of Save), a popup dialog (Figure 32a) prompts the user for the name of the workflow.
The name must consist of alphanumeric characters and must differ from the names of the workflows already stored on the PORTAL Server.
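The two naming rules can be expressed as a small Python check. This is a sketch under stated assumptions: "alphanumeric" is taken to mean ASCII letters and digits only, and the comparison with stored names is assumed to be case-sensitive; the Portal's actual rules on these points are not documented here.

```python
def is_valid_workflow_name(name, stored_names):
    """Check a new workflow name against the two naming rules.

    Assumptions: ASCII letters and digits only; case-sensitive comparison
    with the names already stored on the PORTAL Server.
    """
    return name.isascii() and name.isalnum() and name not in stored_names

stored = {"WF0", "ForecastWmin"}
print(is_valid_workflow_name("WF1", stored))   # True
print(is_valid_workflow_name("WF 1", stored))  # False: contains a space
print(is_valid_workflow_name("WF0", stored))   # False: name already stored
```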



Figure 32a


 
Let us name the workflow "WF1".
Next, the system automatically offers to issue the Upload command, which transfers the referenced executable code(s) and input file(s) from the client's desktop to the Portal_server:

Learning notes on file upload:
The Upload proposal is made in the following cases:
If the user declines the suggestion, the workflow remains incomplete.
The Upload command can also be issued manually at any later time (see the menu item Upload Files... in Figure 32).

 
 
 
Figure 33
 
Select Yes: the system starts the upload, and its progress is shown by a progress bar in the Upload popup window.
When the upload is complete, the message Finished appears and the system waits for the user to press the Close button:
 
 
 
Figure 34
 
 
 
With the editing steps above we have finished creating our new workflow WF1, and we can leave the Workflow Editor and return to the Workflow Manager.
Its page (see Figure 35) must be refreshed with Refresh to show the new workflow WF1, which is now ready to run.

Before doing so, let us check the association of jobs with resources.
This can be done either step by step, by opening the properties of each job, or centrally with the Workflow Properties menu command introduced in Release 2 (see Figure 32).
It opens the following table:



Figure 34a.

  If a change is needed, it can be made in six steps:
  1. Select the appropriate Grid.
  2. Select the required resource from the list of loaded resources. (Remember that resources are Grid dependent.)
  3. Mark the left side of the line(s) belonging to the required job(s).
  4. Confirm the changes with the Set selected button.
  5. Leave the window with Ok.
  6. Save the workflow.
The table can be used in a similar way to set the monitoring of the jobs. If the code belonging to a job is not instrumented,
the association is refused.
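The bulk assignment done with Set selected, and the refusal of monitoring for uninstrumented code, can be sketched in Python as follows. The `JobRow` record, its `instrumented` flag and the `grid_resources` mapping are hypothetical stand-ins for the table in Figure 34a; only the rules (resources are Grid dependent, monitoring requires instrumented code) come from the text above.

```python
from dataclasses import dataclass

@dataclass
class JobRow:
    """One line of the Workflow Properties table (hypothetical shape)."""
    job: str
    grid: str = ""
    resource: str = ""
    monitored: bool = False
    instrumented: bool = False  # whether the job's code is instrumented

def set_selected(rows, selected, grid, resource, grid_resources):
    """Steps 1-4: give every marked row the chosen Grid and resource."""
    if resource not in grid_resources.get(grid, []):
        raise ValueError(f"{resource} is not a loaded resource of Grid {grid}")
    for row in rows:
        if row.job in selected:
            row.grid, row.resource = grid, resource

def enable_monitoring(rows, selected):
    """Turn monitoring on for the marked rows; return the refused jobs."""
    refused = []
    for row in rows:
        if row.job in selected:
            if row.instrumented:
                row.monitored = True
            else:
                refused.append(row.job)  # uninstrumented code is refused
    return refused

rows = [JobRow("Cascade1"), JobRow("Job2", instrumented=True)]
grids = {"HUNGRID": ["n99.hpcc.sztaki.hu"]}
set_selected(rows, {"Cascade1", "Job2"}, "HUNGRID", "n99.hpcc.sztaki.hu", grids)
refused = enable_monitoring(rows, {"Cascade1", "Job2"})
```

After these calls both rows point to the same resource, Job2 is monitored, and Cascade1 is reported as refused.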

 
 

5 Submitting the workflow

 
 
 

Figure 35
 
The Submit command starts the workflow:
 
 
 
Figure 36
 
 

6 Observing the progress of the workflow

 
As a side effect of submission, the Submit button changes to Abort. A subsequent Attach command reopens the Workflow Editor in a different role: the progress of the workflow can be followed by the changing colors:
 
  
 
Figure 37
 
In this state the first job has received control, and the second is waiting for the first to terminate.
Clicking the Refresh button of the Workflow Manager window may show the successful termination of the workflow.
The Portal_server then collects the referenced files and packs them into one compressed, downloadable file:
 
  
 

 
 Figure 38
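The waiting behaviour seen in Figure 37, where the second job starts only after the first has terminated, follows from the port connections: they define a directed acyclic graph of jobs. The Python sketch below uses the standard `graphlib.TopologicalSorter` (Python 3.9+) to release each job once all jobs feeding its input ports are done. It runs ready jobs one after another for simplicity, whereas the real submission system (Globus/Dagman) may run independent jobs in parallel; the job name `Job2` is hypothetical.

```python
from graphlib import TopologicalSorter

def run_workflow(edges, jobs, run_job):
    """Start each job only after all jobs feeding its input ports are done.

    edges: (producer_job, consumer_job) pairs derived from port connections.
    run_job: callable that executes one job (here synchronously).
    Returns the order in which the jobs were started.
    """
    graph = {job: set() for job in jobs}
    for producer, consumer in edges:
        graph[consumer].add(producer)  # consumer depends on producer
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # also detects cycles
    order = []
    while sorter.is_active():
        for job in sorter.get_ready():  # every predecessor has terminated
            run_job(job)
            order.append(job)
            sorter.done(job)
    return order

started = []
order = run_workflow([("Cascade1", "Job2")], ["Cascade1", "Job2"], started.append)
```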
 

7 Fetching the result

 
Clicking the green button in the [Output] column starts the browser's download manager, which copies the result file to the user's desktop. Note that the destination directory of the workflow result is chosen with the tools of the download manager, in a browser-dependent way.
 
Pressing the Details button in the [View] column opens a window that provides important information:
 
 
  

Figure 39

 
Besides the verbose state of the constituent jobs in the Status column, you can get a graphical rendering of the two-level Time-Process communication diagrams by pressing the buttons in the [Visualization] column. You can also see any messages the jobs sent to the standard output (button Out) and/or to the standard error channel (button Err, not visible in Figure 40 because the jobs did not produce error messages). These buttons are placed in the [Logs] column.
 
When the Visualize button is pressed, the visualization is performed by an independent program called Prove, which works on the trace file of the workflow. Job level visualization is available only if two necessary conditions hold:
 
As you can see, this was not the case in our simple example.
A more comprehensive view of the monitoring possibilities is given in Chapter V Monitoring and Visualization.
 
Finally, we show the window displaying the content of the standard output, opened by pressing the Out button of Cascade 1.2:
 
 
 
 
Figure 40
 
 
 
 

V Monitoring and Visualization

 

1 Introduction

Graphic monitoring means generating, collecting and graphically rendering runtime data that inform the user about the state and the progress of the submitted workflow. In a parallel environment the dynamic conditions that trigger the execution of particular program parts are especially important: they help the user pinpoint design flaws and temporarily missing resources. Therefore the "time-space" diagram has been chosen as the basic tool for rendering the behaviour of the interacting program parts. It is discussed in Section 3, which details the work with the graphic tool Prove running on the user's desktop.
 
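In graphic monitoring the common term "program parts" is used in two quite different contexts:
- at workflow level, the program parts are the jobs of the workflow;
- at job level, the program parts are the processes of a single job.
For example, the upper part of Figure 43 shows the monitoring of the whole workflow, while the lower part is a detailed view of the progress of its job "cummu".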
 

1.1  Availability of monitoring

High level (workflow level) monitoring is always possible, since it is a generic property of the job submission technique used ("Globus/Dagman").
Job level monitoring, on the other hand, which is needed above all when a job includes parallel processes, can only be performed if all of the following conditions hold:
  
 
 
 

2. Life cycle of monitoring data

 

2.1 The source of data

As Figure 2 shows, the workflow results, including the monitoring data (in our terminology the "trace file"), first arrive from the remote resources at the Portal_server. The instrumentation may produce a huge amount of data.
As each job is associated with a dedicated resource, there is a separate trace_file for each job.
 

2.2 The transport

The trace_file is collected autonomously and incrementally, in packages. A package is sent on either of two events, both essentially independent of the user's activity:
 
- the local temporary buffer holding the current portion of the trace_file on a host of the remote resource is full;
- the corresponding job has terminated.
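The two events that ship a package can be modelled with a small Python buffer class. This is a sketch only: the buffer capacity is counted in events here, and the `send` callable stands in for the transport to the Portal_server; the real buffer size, unit and transport are not documented in this guide.

```python
class TraceBuffer:
    """Local buffer for the current portion of one job's trace_file.

    A package is shipped when the buffer becomes full or when the job
    terminates. Capacity is counted in events (an assumption).
    """

    def __init__(self, capacity, send):
        self.capacity = capacity
        self.send = send   # callable that transports one package
        self.events = []

    def record(self, event):
        self.events.append(event)
        if len(self.events) >= self.capacity:  # event 1: buffer is full
            self.flush()

    def job_terminated(self):                  # event 2: job has terminated
        self.flush()

    def flush(self):
        if self.events:
            self.send(list(self.events))
            self.events.clear()

packages = []
buf = TraceBuffer(capacity=3, send=packages.append)
for i in range(7):
    buf.record(f"event{i}")
buf.job_terminated()
print([len(p) for p in packages])  # [3, 3, 1]
```

This is also why the picture in Prove may lag behind the running job: events wait in the remote buffer until one of the two events occurs.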
 

2.3 The elaboration

These data must be stored, filtered and processed, and the Portal_server does the bulk of this work. On user demand it prepares an "image file", which is very small compared to the trace_file, and forwards it to the application program Prove running on the user's desktop.
  
  
Why should the user know about all these technical details?
First, to understand the cause of the delays that this double buffering imposes on the graphic rendering system. Almost as important, in certain cases the user should help reduce the load on the Portal_server by issuing the "forget events" command of Prove. It instructs the Portal_server to truncate the corresponding trace_file, discarding the data about events that arrived before a certain time. (Each user has only a limited storage quota on the Portal_server, which is a precious shared resource.)
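The effect of "forget events" on a trace_file can be sketched in Python as a truncation at a cutoff time. The event representation (a time-sorted list of `(timestamp, description)` tuples) is an assumption, and so is the choice to keep events exactly at the cutoff; the Portal_server's actual storage format is not described here.

```python
import bisect

def forget_events(events, cutoff):
    """Drop trace events that arrived before `cutoff`.

    events: list of (timestamp, description) tuples sorted by timestamp.
    Events exactly at the cutoff are kept (an assumption).
    """
    times = [t for t, _ in events]
    start = bisect.bisect_left(times, cutoff)  # first event at or after cutoff
    return events[start:]

trace = [(1.0, "send"), (2.5, "recv"), (4.0, "send"), (7.2, "recv")]
print(forget_events(trace, 3.0))  # [(4.0, 'send'), (7.2, 'recv')]
```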
 

2.4 The frame of destination: The visualization interface

The program Prove can be started from the "detailed" view of the Workflow Manager, as Figure 39 indicates.
 
Note: In the following, a new workflow application is used as an example to demonstrate the full range of monitoring options. This fairly complex workflow has been prepared so that all of its component jobs contain instrumented code.
It is called ForecastWmin and runs a weather forecast program (see Figure 41).
 
    

Figure 41
 
In this case the detailed view of the workflow in the Workflow Manager shows the buttons for job level monitoring (Figure 42). Compare it with Figure 39 of the workflow application WF1, where these buttons are missing because the jobs of that application have not been prepared for monitoring.
 
   
 
 
Figure 42

 
Figure 42 shows the detailed view of the workflow ForecastWmin in an intermediate state. Jobs that are running and/or finished can be visualized by Prove, which opens a separate window when the corresponding Visualize button is pressed. Prove can be opened for the high level view of the workflow as well (button Visualize in the first line, which contains the name of the workflow).
The button All "packs" all visualization windows together, starting from the high level view, as Figure 43 indicates:
 
 
 
  Figure 43
 
 
  Warning:
   If the number of elements along the vertical axis (hosts/jobs) is high, some text labels may not be displayed because of the low resolution.
   In that case, increase the size (especially the Height) of the applet.
 
 

3 The Prove program

 
As indicated previously, Prove visualizes time-space diagrams.
The program parts are represented by colored bars placed as rows of a coordinate system, where the horizontal axis denotes the common time and the discrete vertical axis is labeled with the names of the program parts. These may be jobs or processes, depending on the context in which the current Prove window was called.
 
 
The endpoints of an arrow between two bars indicate the sending and receiving time of an event, respectively. These arrows should normally be blue. Red lines indicate exceptions: bad trace_files, unsynchronized clocks or lost monitoring information. Please report them to our Portal maintenance team.
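One way such anomalies can be detected is sketched below in Python. The rule is a hypothetical heuristic, not Prove's documented logic: an arrow is drawn red when its receive event is missing (lost monitoring information) or precedes its send event (a typical symptom of unsynchronized clocks).

```python
def arrow_color(send_time, recv_time):
    """Return the color of a message arrow in a time-space diagram.

    Hypothetical heuristic: red if the receive event is missing or comes
    before the send event, blue otherwise.
    """
    if recv_time is None or recv_time < send_time:
        return "red"
    return "blue"

print(arrow_color(1.0, 1.5))   # blue
print(arrow_color(2.0, 1.9))   # red: received before sent
print(arrow_color(2.0, None))  # red: receive event lost
```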
 

3.1 User activities

User activities may affect both the generation of the trace_file and its graphical rendering.
 
3.1.1 Truncate trace files
The only activity affecting trace_file generation is the menu command Trace/Forget events (described in more detail in Section 2.3).
 
 
 
Figure 44
 
The menu command Trace/Collect is not used at present; it is reserved for forcing the remote resource to update the trace_file.
 
3.1.2 Visualization activities
Visual rendering activities include the filtering, attributing, and time scale zooming of  the program parts. 
 
3.1.2.1 Filtering
The menu item View/Filter reduces the set of program parts to be shown.
You select the program parts of interest with the associated toggle marks. The selection takes effect when the "Show changes" item is selected, as Figure 45 shows. Note that "delta_m", not visible in Figure 45, has been selected too.
Filtering can be regarded as a kind of "vertical zooming".
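Filtering as "vertical zooming" amounts to keeping only the toggled rows of the diagram, in their original order. The Python sketch below assumes rows are `(part_name, bar_data)` pairs; that shape and the part name `visib` are hypothetical.

```python
def filter_parts(rows, selected):
    """Keep only the program parts whose toggle mark is set, in order.

    rows: list of (part_name, bar_data) pairs along the vertical axis.
    selected: set of part names marked in the View/Filter dialog.
    """
    return [(name, data) for name, data in rows if name in selected]

rows = [("cummu", "..."), ("delta_m", "..."), ("visib", "...")]
shown = filter_parts(rows, {"cummu", "delta_m"})
print([name for name, _ in shown])  # ['cummu', 'delta_m']
```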
 
 
 
Figure 45
 
The result of selection is shown in Figure_46:
 
 
 
 
 
Figure 46
 
 
 
3.1.2.2 Change state/statistics
Selecting the statistical mode, instead of the default mode that shows the time-dependent states of the program parts, retrieves color-coded statistics of how often the different event types occur:


 
  
This mode is selected with the menu item Info/Statistics/Event (see Figure 47):
 
 
 
Figure 47
The result can be seen in Figure 48:
 
 
 
 
Figure  48
 
You can restore the original mode by selecting the menu item Info/Statistics/Communication (Figure 49).
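The event statistics are essentially occurrence counts per program part and event type, which Prove then maps to colors. The Python sketch below shows the counting step with `collections.Counter`; the `(part_name, event_type)` input shape and the event type names are assumptions, and the color coding itself is not reproduced.

```python
from collections import Counter, defaultdict

def event_statistics(events):
    """Count how often each event type occurs in each program part.

    events: iterable of (part_name, event_type) pairs.
    Returns {part_name: Counter({event_type: count})}.
    """
    stats = defaultdict(Counter)
    for part, kind in events:
        stats[part][kind] += 1
    return stats

stats = event_statistics([
    ("cummu", "send"), ("cummu", "send"), ("cummu", "recv"),
    ("delta_m", "recv"),
])
print(stats["cummu"]["send"])  # 2
```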
 
 
3.1.2.3 Sorting the program parts vertically
You can change the order in which the program parts appear along the vertical axis. Figure 49 shows the path to the corresponding menu items:
 Info/Sort/{Sort by communication | Sort by name | Sort by hostname}
 
 
Figure 49
 
Figure 50 shows the new image:
 
 
Figure 50
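Two of the three sort orders can be sketched directly in Python. The part records (dicts with `name` and `host` keys) are hypothetical, and breaking ties on the same host by name is an assumption; the ordering applied by Sort by communication is not described in this guide, so it is deliberately left out.

```python
def sort_parts(parts, key):
    """Reorder program parts along the vertical axis.

    parts: list of dicts with "name" and "host" keys (hypothetical shape).
    Only "name" and "hostname" are sketched here.
    """
    if key == "name":
        return sorted(parts, key=lambda p: p["name"])
    if key == "hostname":
        # Parts on the same host are ordered by name (an assumption).
        return sorted(parts, key=lambda p: (p["host"], p["name"]))
    raise ValueError(f"unsupported sort key: {key}")

parts = [
    {"name": "delta_m", "host": "n2"},
    {"name": "cummu", "host": "n2"},
    {"name": "visib", "host": "n1"},
]
print([p["name"] for p in sort_parts(parts, "hostname")])  # ['visib', 'cummu', 'delta_m']
```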
3.1.2.4 Zooming in the time scale
One of the most important ways of investigating events is zooming in the time scale. Zooming works in a stack-like way and uses no special buttons of the window, just the mouse buttons. The rules of selection are very simple:
 
Figure 51 shows the state immediately after the range selection (the short horizontal line toward the right side of the calibrated time scale), and Figure 52 the state after the zoom instruction has been executed.
 
 
 
Figure 51
 
 
Figure 52
 
Every zoomed image (Figure 52, Figure 53) contains an active ruler, with which the whole original time range can be swept over. This operation can, however, be prohibitively slow: as discussed in Section 2.3, the desktop part of Prove must send a request to the Portal_server for a new image, which is downloaded with a delay that depends on the network. Therefore sweeping is not as smooth as it would be with a traditional local program.
 
Figure 53 shows the image after a repeated zoom.
 
 
Figure 53
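The stack-like behaviour of time zooming can be illustrated with a small Python class: each zoom pushes the selected range, and zooming out pops back to the previous one, never below the full range. How the mouse buttons map to these operations is not reproduced here, and the requirement that a new range lie inside the current view is an assumption.

```python
class TimeZoom:
    """Stack-like zooming on the time axis of a time-space diagram."""

    def __init__(self, start, end):
        self.stack = [(start, end)]  # bottom entry: the full time range

    @property
    def current(self):
        return self.stack[-1]

    def zoom_in(self, start, end):
        lo, hi = self.current
        if not lo <= start < end <= hi:
            raise ValueError("selected range must lie inside the current view")
        self.stack.append((start, end))

    def zoom_out(self):
        if len(self.stack) > 1:  # the full range is never popped
            self.stack.pop()
        return self.current

zoom = TimeZoom(0.0, 100.0)
zoom.zoom_in(60.0, 80.0)   # first zoom
zoom.zoom_in(65.0, 70.0)   # repeated zoom
print(zoom.zoom_out())     # (60.0, 80.0)
```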
 

VI Multi-GRID support


From version 2.1 of the P-GRADE Portal, users can execute their applications in several Grids, each of which may consist of one or more Virtual Organizations (VOs).
If a Grid consists of several VOs, the user should have a certificate for the Grid, and this certificate should be registered to those VOs the user would like to access.
For each of these VOs the user must have a valid certificate, which is used to authenticate the user at the resources of that particular VO.
To use this multi-GRID support, the following steps have to be taken:

- The portal administrator has to set up the list of VOs and may define a set of default resources. On demand, these resources appear
   in the resource list of every common user.

- Each user can then set up his own resource list for this VO.
- The jobs of the workflow can then be allocated to any resource of any VO, so different jobs of the very same workflow can be executed on
    different resources belonging to different VOs or even to different Grids.

- Before execution, the user has to download a short term proxy certificate for each VO involved in the workflow.

Important notice for EGEE users:

The Portal provides multi-Grid, multi-VO support independently of the underlying infrastructures.
However, certain grids may impose restrictions:

EGEE restriction:
The VO defined by the user when selecting a Virtual Organization with Broker support may contradict the VO permission of a resource that the Broker can select.
This unpleasant situation can only occur if two conditions are fulfilled:

The situation in detail:

  1. A user who is already a member of VO1 registers to VO2. As site "S" also belongs to VO2, the Grid map file of site "S" maps the user as a VO2 member.
  2. The user submits a job to a VO1 broker, which accepts him/her as a VO1 user and makes the corresponding VO1 setting in the JDL description (Figure 10.2).
  3. The local security system on site "S" finds a VO1 job and, from the delivered proxy_certificate (which includes the distinguished_name of the user), determines a contradicting VO2
    membership from the Grid map file.
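The contradiction in step 3 can be expressed as a simple comparison, sketched here in Python. The model is deliberately simplified and hypothetical: a real Grid map file maps distinguished names to local accounts, and here that mapping is collapsed to a direct `{distinguished_name: vo}` dictionary; the DN string is invented for illustration.

```python
def vo_conflict(jdl_vo, user_dn, grid_map):
    """Return True if the site sees contradicting VO memberships.

    jdl_vo:   VO written into the JDL by the broker (VO1 in the example)
    user_dn:  distinguished_name taken from the proxy_certificate
    grid_map: simplified Grid map file of site "S", modelled as {dn: vo}
    """
    mapped_vo = grid_map.get(user_dn)
    return mapped_vo is not None and mapped_vo != jdl_vo

dn = "/C=HU/O=Example/CN=Some User"  # hypothetical distinguished name
site_s_map = {dn: "VO2"}             # step 1: the user is mapped as VO2 member
print(vo_conflict("VO1", dn, site_s_map))  # True: step 3 contradiction
```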





Let's see all this in a bit more detail.


1.  Setting up VOs of Grids and default resources (by portal administrator)

 

The Grids, the VOs and the resource lists of the VOs can be edited in the Settings tab of the portal.
Only the root user has the privilege to set up and modify the list of Grids and VOs.
This means that he/she has to set up at least one Grid (or VO) and, advisably, one default resource for it.
Figure 6.1 shows the Grid configurations window as edited by the root user.

The root user adds a new VO with ‘Add new’ and deletes existing ones with ‘Delete’.

Note that for Grids composed of several VOs, the input field "Name" refers to the VO and the input field "Grid" refers to the Grid as the hub over the several VOs.
The distinction is necessary because the resources belong to the VO, while the information system access defined here refers to the superimposed Grid.
In short, the string defined as "Grid" appears only at the top of the hierarchy, when the user selects a Grid as the root for information retrieval (see The Information system).

If we define a VO of the "EGEE" Grid, for example "HUNGRID", then together with the VO "HUNGRID" we may define access to the whole "EGEE" Grid.
Once HUNGRID has been defined as part of the EGEE Grid, the whole information system of the EGEE Grid becomes visible (see Figure 7.6).

When the Grid is not really subdivided into VOs, the Grid is regarded as consisting of one VO and, as in the multi-VO case, the name of this VO is required
as "Name".
Filling in the field "Grid" is not obligatory: if it is left empty, its value is inherited from the value of "Name". This suits user groups
who use simple Grids and do not want to distinguish between the notions of VO and Grid.
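The rule for the two fields can be captured in a small Python dataclass: "Name" is required, and an empty "Grid" inherits the value of "Name". The class is a hypothetical illustration of the rule, not the Portal's internal data model, and the name `SZTAKI` is an invented example of a single-VO Grid.

```python
from dataclasses import dataclass

@dataclass
class GridEntry:
    """One line of the Grid configurations list (hypothetical model)."""
    name: str       # the VO ("Name" field, required)
    grid: str = ""  # the Grid hub over the VOs ("Grid" field, optional)

    def __post_init__(self):
        if not self.name:
            raise ValueError('"Name" is required')
        if not self.grid:
            self.grid = self.name  # empty "Grid" inherits the value of "Name"

multi_vo = GridEntry("HUNGRID", "EGEE")  # a VO of a multi-VO Grid
single_vo = GridEntry("SZTAKI")          # hypothetical single-VO Grid
print(single_vo.grid)  # SZTAKI
```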





 




Figure 6.1 Grid configurations list window


If an information system is available for the Grid of the VO, the administrator can also set it up. Currently the information systems of types MDS2 and LCG2 are supported. The configuration of the Information System is then used by the Information System portlet. If there is no information system, just choose ‘N/A’.
Please note that in the case of LCG2 the information system refers to the whole Grid, not just to one virtual organization.

For both MDS2 and LCG2, the host, port and base-dn have to be defined for contacting the Information System, as shown in Figure 6.2.
For the MDS2 type you also have to refer to an existing MyProxy server account (see "login" and "password" in Figure 10, where "login" in Figure 10 corresponds to "Username" in Figure 6.2).
The other fields of the MyProxy Server account ("hostname" and "port") refer to the MyProxy Server itself; they are defined during the installation of the P-GRADE Portal in the configuration file "PGradePortal.properties".
The system automatically downloads a proxy certificate from this account and uses it to authenticate itself against the information source when querying the job-manager list of the Grid.
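The required fields can be summarized as a small validated record, sketched in Python below. The class and field names are hypothetical; the rules (host, port and base-dn for both types, a MyProxy Username for MDS2 only, ‘N/A’ for no information system) come from the text above. Both MDS2 and the LCG2 BDII are LDAP services, so the sketch also assembles an LDAP URL of the form `ldap://host:port/base-dn` without any escaping; the host, port and base-dn values in the example are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InfoSystem:
    """Information System settings of a Grid (hypothetical model)."""
    kind: str                           # "MDS2", "LCG2" or "N/A"
    host: str = ""
    port: int = 0
    base_dn: str = ""
    myproxy_user: Optional[str] = None  # "Username", needed for MDS2 only

    def validate(self):
        if self.kind == "N/A":
            return
        if self.kind not in ("MDS2", "LCG2"):
            raise ValueError(f"unsupported information system type: {self.kind}")
        if not (self.host and self.port and self.base_dn):
            raise ValueError("host, port and base-dn are required")
        if self.kind == "MDS2" and not self.myproxy_user:
            raise ValueError("MDS2 needs a MyProxy server account (Username)")

    def ldap_url(self):
        # Simple ldap://host:port/base-dn form; no escaping is performed.
        return f"ldap://{self.host}:{self.port}/{self.base_dn}"

mds = InfoSystem("MDS2", "giis.example.org", 2135, "mds-vo-name=local,o=grid", "portaluser")
mds.validate()
print(mds.ldap_url())  # ldap://giis.example.org:2135/mds-vo-name=local,o=grid
```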

                                                                                                                     


Figure 6.2 Defining Information System for the Grid (MDS2)

The portal administrator can also set up a default resource list. This user interface is reached by clicking the ‘Resources’ button in the Grid Configuration window (Figure 6.1). The resource list window is shown in Figure 6.3.


                                                                                                                      



Figure 6.3 Defining the DEFAULT resource list for the Grids

 

The portal administrator defines a default list, which is then available to all users for setting up their own resource lists. Resources are added with ‘Add’ and deleted with ‘Delete’. When defining a resource, the URL (for example "n99.hpcc.sztaki.hu") and a Job manager (for example "jobmanager-fork") have to be provided.
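In Globus GRAM, a host name and a job manager name are commonly combined into a resource contact string of the form `host/jobmanager`. The Python sketch below shows that combination for the two example values; whether the Portal builds exactly this string internally is an assumption, and the function name is hypothetical.

```python
def gram_contact(host, jobmanager="jobmanager-fork"):
    """Combine a resource URL and a Job manager into a GRAM-style contact string."""
    if not host or "/" in host:
        raise ValueError("host must be a bare host name")
    return f"{host}/{jobmanager}"

print(gram_contact("n99.hpcc.sztaki.hu"))  # n99.hpcc.sztaki.hu/jobmanager-fork
```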


A special case of VO definition is a virtual_organization with broker support, for example "hungrid_LCG_2_BROKER" in Figure 6.1.
In this case no information system is defined. For historical reasons the
Resources window then contains just one list element, usually "default.jobmanager". It is set by the administrator and may not be altered by a common user. This value is not used in Release 2.2.

 

 

2.     Setting up the resource list for a VO (any regular user)

 

Any regular user can define his own resource list for each of the available Grids.
Compare Figure 6.1 and Figure 12a: users cannot edit the Grid list itself; they can only edit the resource list of each Grid by clicking its ‘Resources’ button.
The resource list window of a user for a particular VO is shown in Figure 12b. The user can add and delete resources with ‘Add’ and ‘Delete’, just like the portal administrator. The default resources defined by the administrator can be loaded with ‘Load default’. If an MDS2 type information system is defined for the Grid, it can also provide resource configurations, which can be loaded with the ‘Load resources from MDS2’ button.


                                                                                                                    


3.      Allocating the workflow (any regular user)

The workflow and its jobs are allocated in the Workflow Editor (WE). For any job, any VO and any resource of that VO can be set. Figure 34a shows the Workflow properties window of the WE, which can be opened from the Workflow menu or with the Ctrl+W hotkey.

A resource for a job can also be set in the job properties window, which opens when the job is clicked.
In that window the VO (Grid) is selected in the field labeled Grid.
This window can be seen in Figure 18.


4      Supplying certificate for each virtual organization before execution (any regular user)

 

In the multi-GRID environment users have to provide a certificate for each virtual organization: they have to map a valid certificate to every virtual_organization on whose resources they want to execute their application. All certificate management takes place in the Certificate tab of the portal, just as before. Right after a download, users are offered to map the certificate to any of the Grids, as shown in Figure 10a.

Clicking ‘Set for Grid’ leads to the interface in Figure 10b. The details of the certificate, such as the issuer, subject and time left, are displayed, and the desired Grid can be selected.


Clicking ‘OK’ in this window returns the user to the certificate list shown in Figure 6.4.
The column ‘Set for Grids’ lists the names of all valid virtual organizations associated with the respective certificate.
Each certificate can be assigned to any number of virtual organizations, but only one certificate can be set for a given virtual organization at any time.
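The rule "many VOs per certificate, one certificate per VO" maps naturally onto a dictionary keyed by VO, as the Python sketch below shows. The class, its methods and the certificate identifiers are hypothetical; the column ‘Set for Grids’ is reproduced by collecting the VOs mapped to a given certificate.

```python
class CertificateMapping:
    """Certificates mapped to virtual organizations (hypothetical model).

    Keying the mapping by VO guarantees at most one certificate per VO,
    while one certificate may appear under any number of VOs.
    """

    def __init__(self):
        self.by_vo = {}

    def set_for_grid(self, cert_id, vo):
        self.by_vo[vo] = cert_id  # replaces any earlier certificate for this VO

    def grids_of(self, cert_id):
        """Content of the 'Set for Grids' column for one certificate."""
        return sorted(vo for vo, cert in self.by_vo.items() if cert == cert_id)

m = CertificateMapping()
m.set_for_grid("cert1", "HUNGRID")
m.set_for_grid("cert1", "hungrid_LCG_2_BROKER")
m.set_for_grid("cert2", "HUNGRID")  # HUNGRID now uses cert2 instead of cert1
print(m.grids_of("cert1"))  # ['hungrid_LCG_2_BROKER']
```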




Figure 6.4 The certificate mapping window with the Grid mappings

 

In this window you can also modify mappings with the ‘Set for Grid’ function, which leads to the certificate-mapping window already shown in Figure 10b.



 

VII Information System


The P-GRADE Portal can handle the available Grid dependent information systems.
Two kinds of information systems are recognized in the P-GRADE Portal: the MDS-2 and the LCG-2 Information system.

 
Configuring Grid access (including specifying an information system for a Grid) is the task of the portal administrator. See Setting up VOs of Grids and default resources.

 

1.      MDS-2 information system

 

The MDS-2 information system of the portal has two functions: one is getting the list of resources available in the Grid; the other is getting detailed information about individual resources.

 

1.1 View of available resources in the Grid

 

There are two modules under the tab "Information System": the MDS Monitor and the LCG Monitor. When the user first clicks on the Information System tab, the MDS Monitor module (label MDS Monitor) is activated by default. On subsequent selections of the Information System tab, the last visited module is activated.

 

If the portal administrator has not yet specified a Grid with an MDS-2 information system, the following message is displayed in the portal window (see Figure 7.1).




Figure 7.1

If one or more Grids with MDS-2 information systems have already been defined in the portal, the screen in Figure 7.2 appears after the MDS Monitor label is selected.





Figure 7.2

The user can select a Grid with the combo box in the upper left part of the portal window.
After selecting a Grid, the user must click the View button right next to the Grid combo box to see the available resources.

If the server from which the portal gets this information (a so-called GIIS server), or the service running on that server, is not available, the following message is displayed (see Figure 7.3).




Figure 7.3

1.2 View of detailed information about a resource

 

To get detailed information about a resource, the user clicks on that resource in the resource list (see Figure 7.2). The page with the detailed information about a resource is shown in Figure 7.4.




Figure 7.4

Figure 7.4 shows that the detailed information on every resource provided by MDS-2 can be divided into a static and a dynamic part.

If any information (e.g. CPU Model in Figure 7.4) is not available from the MDS at that moment, the text N/A (Not Available) is displayed for that attribute.

 

 

2. LCG-2 information system

 

The LCG-2 information system of the portal has two functions: one is getting the list of sites available in the Grid; the other is getting detailed information about individual sites.
Sites are also associated with one or more virtual organizations (VOs).

 

2.1 View of available sites in a Grid

 

When the user clicks on the LCG Monitor label of the Information System tab, the LCG Monitor module is activated.

 

If the portal administrator has not yet specified a Grid with an LCG information system, the following message is displayed in the portal window (see Figure 7.5).







Figure 7.5

If one or more grids with LCG information systems have already been defined in the portal the following screen ( Figure 7.6 ) can be seen after the user clicks on the LCG Monitor label.




Figure 7.6

The user can select a Grid with the combo box toward the upper part of the portal window. After selecting a Grid, the user must click the View button right next to the Grid combo box to see the available sites. By default, the sites of the first Grid in the Grid list are displayed on this page.

 

Each site in an LCG type Grid is built up from Computing_Elements (CE) and Storage Elements (SE).
More precisely, a site is rather a geographic notion.
There can be one or more clusters inside a site.
A cluster can be fed by one or more queues, called Computing_Elements.


The site list page shows basic information about the CEs and SEs. By default, the information for each site is the aggregate of all CE and SE resources found at that site.

 

If the server from which the portal gets this information (a so-called BDII server), or the service running on that server, is not available, the message "Cannot contact the BDII server" is displayed.




2.1.1 Selecting a Virtual Organization

 

Users of LCG type Grids must belong to one or more virtual organizations (VOs). The CEs and SEs are also associated with VOs, and a CE may belong to more than one VO. This means that if a CE or SE is associated with a VO, only users who belong to that VO can access it.

The user can filter the sites associated with a specified VO with the combo box under the Grid combo box in the upper part of the portal window (Figure 7.7). See bug report






Figure 7.7


After the View button right next to the combo box is clicked, the sites that belong to the selected VO are listed (Figure 7.8).




Figure 7.8

Selecting a specified VO means the following:

- The user sees the list of those sites that belong to the selected VO.

- When the user clicks on a site name, the detailed information shows only those CEs and SEs that belong to the selected VO.

Important remark: see bug report B.1 when interpreting the values of the columns Total, Free, Running and Waiting.
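The default aggregation per site, and its restriction to a selected VO, can be sketched in Python. The `ComputingElement` record and its CPU counts are hypothetical, only CEs are shown (SEs would be aggregated the same way), and a CE shared by several VOs is counted in full for each of them, which is an assumption; see the remark on bug report B.1 for how the real columns should be read.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class ComputingElement:
    """Hypothetical summary of one CE of a site."""
    name: str
    vos: FrozenSet[str]
    total_cpus: int
    free_cpus: int

def site_summary(ces, vo=None):
    """Aggregate the CEs of one site; vo=None corresponds to "All"."""
    chosen = [ce for ce in ces if vo is None or vo in ce.vos]
    return {
        "CEs": len(chosen),
        "Total": sum(ce.total_cpus for ce in chosen),
        "Free": sum(ce.free_cpus for ce in chosen),
    }

ces = [
    ComputingElement("ce1", frozenset({"dteam", "hungrid"}), 40, 10),
    ComputingElement("ce2", frozenset({"hungrid"}), 20, 5),
]
print(site_summary(ces, "dteam"))  # {'CEs': 1, 'Total': 40, 'Free': 10}
```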

 

2.2 View of detailed information about a site of a Grid

 

To get detailed information about a site, the user clicks on the name of the site in the site list (see Figure 7.6). The page with the detailed information about a site is shown in Figure 7.9.




Figure 7.9

As this figure shows, the selected VO is All, which means that all CEs and SEs found at that site are displayed.
If the user selects a VO on the site list page, only those CEs and SEs that belong to the selected VO are displayed in the detailed view.
As Figure 7.10 shows for the site IFCA-LCG-2 with the VO dteam, only a limited number of CEs and SEs is displayed.






Figure 7.10




VIII Handling of remote  files

1 General aspects of remote files

The P-GRADE Portal supports the handling of remote files.
Remote means a place within a given virtual_organization that is different from the local file system of the user's desktop and whose access is controlled by grid certificates.

Since version 2.1 of the P-GRADE Portal, input files can be sent to a job not only from the local file system of the user's desktop but from trusted remote places as well.
Similarly, the output files of a job can be sent to remote storage places.

The next figure explains the differences between the handling of local and remote files:







Figure 8.1
Life cycles of local and remote files



2 Different kinds of remote file usage




Remote files can be accessed by several protocols, stored in different ways and referenced at several levels, depending on the Grid (and the VO).

From the user's point of view there are two fundamentally different ways to use remote files:
1. Low level usage supported by the Globus middleware.
2. High level usage, generally supported by the EGEE infrastructure [4].

2.1 Low level usage (Globus)

2.1.1 Protocol
To access a file at a remote place a transfer protocol is needed, which is explicitly or implicitly part of the URL describing the location of the file.
In most cases the gsiftp protocol is used, i.e. the user is identified to the remote host by his/her current certificate.

2.1.2 File reference
The file is referenced by a URL consisting of the concatenation of the host name and the storage path of the file on that host.
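A minimal sketch of how such a URL is composed, assuming the gsiftp scheme and an absolute storage path. The host name and the path are hypothetical placeholders, and the function name is illustrative only.

```python
def gsiftp_url(host: str, path: str) -> str:
    """Concatenate the gsiftp scheme, the host name and the storage path."""
    if not path.startswith("/"):
        # the storage path on the remote host is assumed to be absolute
        raise ValueError("storage path must be absolute: " + path)
    return "gsiftp://" + host + path


# hypothetical storage host and file path
print(gsiftp_url("se.example.org", "/home/jdoe/input.dat"))
# gsiftp://se.example.org/home/jdoe/input.dat
```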

2.1.3 File Storage
The remote files are stored as ordinary files of a host. A special file on that host, the GridMap file, contains entries that associate the distinguished name (DN) part of user certificates with user accounts known on that system, so the system can control the access permissions of file operations. The GridMap file is maintained by the local administrator of that host.
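To illustrate the mapping, the sketch below parses GridMap file entries. It assumes the common Globus layout of one quoted distinguished name followed by one or more comma-separated local account names per line; the entries themselves are hypothetical. It only shows the lookup table, not how the remote host enforces it.

```python
import shlex

# hypothetical GridMap file content
SAMPLE = '''
# DN of the certificate            local account(s)
"/O=Grid/OU=example.org/CN=John Doe" jdoe
"/O=Grid/OU=example.org/CN=Jane Roe" jroe,guest
'''


def parse_gridmap(text: str) -> dict[str, list[str]]:
    """Map each certificate DN to the local account(s) listed for it."""
    mapping: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex keeps the double-quoted DN (which contains spaces) in one token
        parts = shlex.split(line)
        if len(parts) != 2:
            raise ValueError("malformed GridMap line: " + line)
        dn, accounts = parts
        mapping[dn] = accounts.split(",")
    return mapping


print(parse_gridmap(SAMPLE)["/O=Grid/OU=example.org/CN=Jane Roe"])
```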


2.1.4 Example


 
The system uses this information in the arguments of the automatically generated globus-url-copy commands.
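The exact command generated by the portal is not shown in this manual. The sketch below only assumes the basic form "globus-url-copy <source URL> <destination URL>" and builds the argument list for fetching a remote input file into a hypothetical local working directory; it does not run the command.

```python
import shlex
from pathlib import PurePosixPath


def copy_command(remote_url: str, local_path: str) -> list[str]:
    """Build a globus-url-copy call that fetches a remote input file."""
    local = PurePosixPath(local_path)
    if not local.is_absolute():
        raise ValueError("local path must be absolute for a file:// URL")
    # source URL first, destination URL second
    return ["globus-url-copy", remote_url, "file://" + str(local)]


# hypothetical remote file and job working directory
cmd = copy_command("gsiftp://se.example.org/home/jdoe/input.dat",
                   "/tmp/job42/input.dat")
print(shlex.join(cmd))
```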

 






Figure 8.2
  Low level access to a remote input file

2.2 High level usage (only within the EGEE with broker support)


This section covers only the most important remote-file-related features of LCG-like Grids (for example EGEE).

2.2.1 Protocol
The protocol is of little importance, as the JDL job submission system and the associated internal services of the P-GRADE Portal hide it from the user.
In this case the job submission is performed with Broker support (see chapter X, Connection to the EGEE Grids and the usage of the Broker).
2.2.2 File reference
High level remote files can be referenced within the P-GRADE Portal by symbolic names that point to File Catalogues.
File Catalogues map the symbolic names to Grid files.
Grid files cannot be modified after creation; they may exist in several replicas linked by a common Grid-wide unique identifier (GUID), and the replicas are stored in Storage Elements.
 
There are several standards for File Catalogues. The actual type of the File Catalogue is defined by the administrator of the respective virtual organization.
A reference to a file catalogue (a symbolic name) begins with the prefix "lfn:" (short for logical file name), but the syntax following this prefix depends on the type of the File Catalogue.
Two types of File Catalogue have been tested, LFC and RMC (see 2.2.2.1 and 2.2.2.2).
In both cases the user is strongly advised to define the environment variable "LCG_GFAL_INFOSYS", as the catalogues are accessible via the information system.
This environment variable is usually defined by the system administrator of the UIF machine, i.e. the machine on which the P-GRADE Portal server runs.
However, the worker nodes (of the CEs) where the actual jobs run may lack this setting. In that case the operations on remote files will fail.

The user should therefore enter this setting manually in the JDL part of the Job Properties window (see Figure 10.8).
The value of this setting may differ between VOs. Please check it on the UIF machine with the command

set | grep LCG_GFAL_INFOSYS

 

Typical values at the time of writing of this manual:

lcg-bdii.cern.ch:2170      for the VO voce
bdii.phy.bg.ac.yu:2170     for the VO seegrid
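When entered in the JDL Editor, such a setting ends up in the Environment attribute of the generated JDL. The sketch below renders that attribute, assuming the usual JDL list-of-strings syntax Environment = {"NAME=value", ...}; the VO table repeats the values quoted above, which may since have changed, and the function name is illustrative.

```python
# Typical LCG_GFAL_INFOSYS values quoted in this manual (possibly outdated).
INFOSYS_BY_VO = {
    "voce": "lcg-bdii.cern.ch:2170",
    "seegrid": "bdii.phy.bg.ac.yu:2170",
}


def jdl_environment(variables: dict[str, str]) -> str:
    """Render a JDL Environment attribute from NAME -> value pairs."""
    items = ", ".join(f'"{name}={value}"' for name, value in variables.items())
    return f"Environment = {{{items}}};"


print(jdl_environment({"LCG_GFAL_INFOSYS": INFOSYS_BY_VO["voce"]}))
# Environment = {"LCG_GFAL_INFOSYS=lcg-bdii.cern.ch:2170"};
```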

2.2.2.1 LFC file catalogue
            The file name here has a fixed hierarchical form:

                  /grid/<VO>/<Username>/[<LFC_Catalog_Directory_Name>/]...<fileName>

            where each <LFC_Catalog_Directory_Name> must refer to an existing catalogue directory that has been created by the appropriate LFC commands [4].
              
 
            See Figure 8.3 for an example.
 
            Using the LFC catalogue requires the setting of two special environment variables:


            It is the user's responsibility to set these environment variables properly in the JDL description (see Figure 10.8).
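The fixed LFC naming rule can be sketched as a small helper. It assumes the "lfn:" prefix is written directly in front of the /grid/... path, uses hypothetical VO, user and file names, and cannot check that the intermediate catalogue directories actually exist.

```python
def lfc_lfn(vo: str, user: str, file_name: str, *dirs: str) -> str:
    """Compose lfn:/grid/<VO>/<Username>/[<dir>/]...<fileName>."""
    parts = [vo, user, *dirs, file_name]
    for part in parts:
        # every component must be a single, non-empty path element
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return "lfn:/grid/" + "/".join(parts)


# hypothetical VO, user, directory and file name
print(lfc_lfn("voce", "jdoe", "result.dat", "run1"))
# lfn:/grid/voce/jdoe/run1/result.dat
```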
           

2.2.2.2 RMC file catalogue
             In this case the name is not hierarchical but a plain string, for example: MyTestFile_25_Nov_2005
             No further environment variables need to be set by the user.
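Because only the part after "lfn:" differs, the two naming forms can be told apart syntactically. The sketch below is a heuristic based solely on the two forms described above; the actual catalogue type is set by the VO administrator, not derived from a name, and the function name is hypothetical.

```python
def catalogue_kind(reference: str) -> str:
    """Guess the File Catalogue type from the syntax after the lfn: prefix."""
    if not reference.startswith("lfn:"):
        raise ValueError("not a logical file name: " + reference)
    name = reference[len("lfn:"):]
    if name.startswith("/grid/"):
        return "LFC"      # fixed hierarchical form /grid/<VO>/<Username>/...
    if name and "/" not in name:
        return "RMC"      # plain, non-hierarchical string
    return "unknown"


print(catalogue_kind("lfn:MyTestFile_25_Nov_2005"))  # RMC
```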

        
2.2.3 File Storage
          In the EGEE the remote (Grid) files are stored in so-called Storage Elements. Local administrators of the sites belonging to a common virtual organization
          may have different policies on the usage of the local Storage Element.
          Within the P-GRADE Portal the user can instruct the system to store a generated output file on a certain Storage Element.
          This is an option of the JDL description, which can be modified in the Workflow Editor (see the input field Output SE in Figure 10.6).
          The user can explore the available Storage Elements in two ways:
         


2.2.4 Example





Figure 8.3
High level file definition used by the LFC catalogue



IX User quotas



To protect the overall operation of the Portal server, Release 2 of the P-GRADE Portal introduces user quotas and manages their administration.
A user quota is a predefined amount of the storage resources available to a user on the host machine acting as the server of the P-GRADE Portal
(see "Portal server" in Figure 2).

The user quota (defined in MB) is set centrally by the system administrator of the P-GRADE Portal:
the administrator can set a different amount of storage for each user and can reset it at any time.
See the pane Quota per portal user on the Settings tab, which defines a common default value,
and the pane User Quota, which lists the users with their quota limits and where the administrator can define individual values:

Please remember that these panes are visible and editable only by the administrator (user root).



Figure 9.1

Note:
In the (probably rare) case when a user quota becomes exhausted because the administrator has decreased it,
the user gets the same warning messages as if he/she had exceeded the limit.
No user data is lost, but the user must take measures to free enough space.


The quota is the maximum amount of the valuable shared storage resource that a user can allocate, directly or indirectly.
The user can compare the permitted and the used storage in the Workflow List window of the Workflow Manager (see Figure 36 and Figure 38).

Quota management does not guarantee the availability of the defined amount.

Its only purpose is to prevent excessive usage and/or malicious exhaustion of the shared storage resources.
In short, it protects first of all the system against the user, not the user against the system.
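The rules above can be summarised in a hypothetical check. The function names are illustrative, MB is assumed to mean 1,048,576 bytes, and the portal's real implementation is not described in this manual. Note that a quota lowered by the administrator below the current usage triggers the same warning as an ordinary overrun, and that passing the check is no guarantee that the space is actually free.

```python
from typing import Optional


def effective_quota_mb(default_mb: int, individual_mb: Optional[int]) -> int:
    """Individual value from the User Quota pane, else the common default."""
    return individual_mb if individual_mb is not None else default_mb


def check_allocation(used_bytes: int, request_bytes: int,
                     quota_mb: int) -> Optional[str]:
    """Return a warning if the allocation would exceed the quota, else None.

    The quota is only an upper limit; it does not reserve space on the server.
    """
    limit = quota_mb * 1024 * 1024
    total = used_bytes + request_bytes
    if total > limit:
        # warn only: existing user data is left untouched
        return f"quota exceeded: {total} of {limit} bytes; please free some space"
    return None


# administrator lowered the quota to 50 MB while the user already uses 80 MB
print(check_allocation(80 * 1024 * 1024, 1, effective_quota_mb(100, 50)))
```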


If the quota is exhausted, the user receives an appropriate warning message.

Suggested user actions:

 

X Connection to the EGEE Grids and the usage of the Broker

1. General rules for submitting individual jobs of a workflow via the Broker of the EGEE

Since Release 2.2 of the P-GRADE Portal the user can submit one or more jobs of a workflow with broker support into an EGEE-like Grid [4].
However, this freedom comes with an installation restriction: the Portal Server (see Figure 2) must be set up on a so-called "UIF machine" belonging to the EGEE-like Grid to be reached.

From the user's point of view, the main differences between a traditional low level Globus Grid and an EGEE-like Grid are the following:


The system recognizes a virtual organization with broker support if the Name defined in the window "GRID configurations" (see Figure 6.1) and selected as Grid in the window "Job properties" (Figure 10.1) fulfils two conditions:
In this case the JDL Editor... button of the Job Properties window becomes active (see Figure 10.1) and the Resource information has no effect.




For more detailed information on the JDL language please consult [3].

2. JDL Editor details

2.1 Opening the JDL Editor






Figure 10.1

2.2 Setting retry count





Figure 10.2

In this tab only the Retry count (the maximum number of retries in case of errors) can be defined.
In this and in all subsequent tabs of the JDL Editor, the View button opens a separate window showing the whole JDL file to be generated.
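For orientation, a JDL file consists of plain "Name = value;" assignments. The sketch below renders a few standard JDL attributes (Executable, StdOutput, StdError, OutputSandbox, RetryCount) in that form; the file names are hypothetical, the function is illustrative, and the file shown by the View button contains further entries added by the portal. Expressions such as Rank and Requirements are not quoted strings and are deliberately not handled here.

```python
def render_jdl(attributes: dict[str, object]) -> str:
    """Render simple JDL attributes, one 'Name = value;' line each."""
    lines = []
    for name, value in attributes.items():
        if isinstance(value, bool):          # check bool before int
            text = "true" if value else "false"
        elif isinstance(value, (int, float)):
            text = str(value)
        elif isinstance(value, (list, tuple)):
            text = "{" + ", ".join(f'"{v}"' for v in value) + "}"
        else:
            text = f'"{value}"'
        lines.append(f"{name} = {text};")
    return "\n".join(lines)


print(render_jdl({
    "Executable": "count.sh",                # hypothetical executable
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],
    "RetryCount": 3,                         # the value set in this tab
}))
```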

2.3 Checking the Sandbox





Figure 10.3

Local files of the ports and the executable of the job are copied into the appropriate Sandboxes.
Please observe how the Internal File Names on the left hand side of Figure 10.3 and the Executable of the Job Properties window (Figure 10.1) are mapped to the right hand side of Figure 10.3.
Several system files (an envelope shell script, info.tar.gz, x509up...) are needed to copy any remote input files to the executing machine and to start the executable of the job.

Please remember that brokering, and listing remote input files in the Input Data tab of the JDL (see Figure 10.5), do not by themselves give the executable running on the worker node of the CE access to those remote input files. Therefore the automatic copy mechanism implemented in the P-GRADE Portal infrastructure is used (see Remote input file handling).
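The division of files between the sandboxes can be sketched as below. The Port type and its fields are hypothetical simplifications of the Job Properties data; remote files are skipped because, as described above, the portal's own copy mechanism handles them, and the system files (envelope script, info.tar.gz, proxy) are left out because the portal adds them itself.

```python
from dataclasses import dataclass


@dataclass
class Port:
    internal_name: str   # file name the job sees in its working directory
    is_input: bool
    is_remote: bool      # remote files are not put into the sandboxes


def build_sandboxes(executable: str,
                    ports: list[Port]) -> tuple[list[str], list[str]]:
    """Split the job's local files into InputSandbox and OutputSandbox lists."""
    input_sandbox = [executable]
    output_sandbox: list[str] = []
    for port in ports:
        if port.is_remote:
            continue  # transferred by the portal's copy mechanism instead
        target = input_sandbox if port.is_input else output_sandbox
        target.append(port.internal_name)
    return input_sandbox, output_sandbox


# hypothetical job with one local input, one remote input and one local output
print(build_sandboxes("count.sh", [Port("in.txt", True, False),
                                   Port("remote.dat", True, True),
                                   Port("out.txt", False, False)]))
```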

2.4 Setting Ranks&Requirements





Figure 10.4

The Rank and Requirements fields can be filled in according to the rules of the JDL. For the portal server they are free text: the syntax is checked by the broker, and any errors are returned at run time on the standard error output channel.

2.5 Checking Input Data





Figure 10.5

2.6 Setting optional Storage Element in Output Data





Figure 10.6

If the job has a proper remote output reference, the system delivers the file automatically to the proper destination.
The user can define a destination Storage Element in the Output SE text field; in the absence of this definition a default "near" one is used.
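In JDL terms, an Output SE corresponds to a storage element named for an output file. The sketch below assumes the OutputData attribute of the EDG/LCG JDL with its OutputFile, LogicalFileName and StorageElement fields; how the portal itself writes this entry is not shown in this manual, and the file, LFN and SE names are hypothetical. Leaving the storage element empty mirrors the default "near" SE behaviour described above.

```python
def output_data(output_file: str, lfn: str, storage_element: str = "") -> str:
    """Render a JDL OutputData entry; without an SE a default one is used."""
    fields = [f'OutputFile = "{output_file}"', f'LogicalFileName = "{lfn}"']
    if storage_element:
        fields.append(f'StorageElement = "{storage_element}"')
    return "OutputData = {[" + "; ".join(fields) + "]};"


# hypothetical output file, logical file name and storage element
print(output_data("result.dat", "lfn:/grid/voce/jdoe/result.dat",
                  "se.example.org"))
```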

2.7 Setting the Environment Variables possibly needed on the Worker Nodes of the Computing Element






Figure 10.7
The next figure shows a typical setting for reaching the LFC catalogue from the worker node:
                                                                           


Figure 10.8

2.8 Example of "misuse": directing a job to a dedicated site




Figure 10.9

2.9 Important notice on MPI submission

Because of a well-known problem of the LCG information system, MPI submission currently requires the user to extend the requirements in the Rank&Requirements tab of the JDL Editor with the following expression:

  (other.GlueCEInfoLRMSType == "PBS") || (other.GlueCEInfoLRMSType == "LSF")
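If the job already has its own requirement, the MPI clause has to be combined with it. A minimal sketch, assuming the combination is a logical AND: the whole OR clause is wrapped in parentheses because && binds more tightly than || in JDL expressions. The user expression in the example (other.GlueCEStateFreeCPUs > 4) is only an illustration, and the function name is hypothetical.

```python
MPI_LRMS_CLAUSE = ('(other.GlueCEInfoLRMSType == "PBS") || '
                   '(other.GlueCEInfoLRMSType == "LSF")')


def mpi_requirements(user_expr: str = "") -> str:
    """Combine an optional user requirement with the MPI workaround clause."""
    clause = f"({MPI_LRMS_CLAUSE})"   # keep the || clause grouped
    if user_expr.strip():
        clause = f"({user_expr.strip()}) && {clause}"
    return f"Requirements = {clause};"


print(mpi_requirements("other.GlueCEStateFreeCPUs > 4"))
```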

XI Rescuing the workflow

The execution of a workflow may fail for many reasons. Often, however, some part of the workflow has already completed and only the remaining part has to be executed to complete the workflow. In such cases it saves time and CPU time if the user can examine what might have gone wrong, make modifications (such as reallocating the failed job to a proper resource), and then resubmit the unfinished jobs of the workflow. This mechanism is supported in the P-GRADE Portal from Release 2.2 and is called rescuing. Currently, before rescuing a workflow, the user can modify the resources of a job in the Workflow Editor or adjust the certificate belonging to the resource in the Certificates tab of the portal.

The general assumption is that the code of the workflow has been tested, and that the original input files, and especially any remote input files, do not change between the time the error is detected and the time the failed jobs are restarted. In short, rescuing may help to overcome difficulties caused by broken resources and invalid certificates.
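The idea of rescuing can be sketched as a graph computation: finished jobs keep their results, and only the unfinished jobs are rerun, in dependency order. This is a hypothetical illustration using Python's graphlib module, not the portal's implementation; the dependency structure assumed below for demo-RESCUE (Count3 depending on Count1 and Count2) is an assumption.

```python
from graphlib import TopologicalSorter


def rescue_plan(deps: dict[str, set[str]],
                finished: set[str]) -> list[list[str]]:
    """Order the unfinished jobs into waves; finished jobs are not rerun."""
    # drop finished jobs and treat their results as already available
    todo = {job: {d for d in preds if d not in finished}
            for job, preds in deps.items() if job not in finished}
    sorter = TopologicalSorter(todo)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # jobs whose inputs are all present
        waves.append(ready)
        sorter.done(*ready)
    return waves


# assumed structure: Count3 consumes the outputs of Count1 and Count2
deps = {"Count1": set(), "Count2": set(), "Count3": {"Count1", "Count2"}}
print(rescue_plan(deps, finished={"Count1", "Count2"}))  # [['Count3']]
```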


Please read the following step-by-step guide to get familiar with the Rescue function as a portal user.

  1. Workflow status: rescue



    Figure 11.1

    The submitted job "Count3" of the workflow "demo-RESCUE" has failed for some reason, and the workflow status has changed to rescue. This means that the user may modify the workflow and then let it continue by pressing the Rescue button.
    Please note that the execution of the workflow stops only when there are no more independent jobs to be executed.

  2. Read the log for possible reasons



    Figure 11.2

    The user reads the error log of the failed job and identifies an authentication problem at the given resource. He decides to launch the Workflow Editor, in which he can reallocate the job to a working resource (see the following step).


  3. Modify the workflow: reallocating the failed job


    Figure 11.3

    The user opens the workflow, which is now in Rescue mode (the stopped job is painted blue), with the Attach button (Figure 11.1). He then opens the Job Properties window of the problematic Count3 job:



    Figure 11.4

    Then the user changes the resource in the Job Properties window to one that works properly.



    Figure 11.5



    Finally, the user saves the modification with the Save resources item of the Workflow menu, which stores it on the server side.




  4. Rescuing the workflow



    Figure 11.6
    In this state the Rescue button (the "continue" button) appears in the Workflow Manager window. After clicking Rescue, the previously failed job "Count3" starts running on the new resource. The already finished jobs Count1 and Count2 are not resubmitted!


    Figure 11.7


     




  5. Workflow finished



    Figure 11.8


    By modifying the resource, the user was able to rescue the workflow, which then completed successfully by executing only the unfinished jobs and preserving the results of the jobs finished in the first attempt.




XII Welcome Menu

Since Release 2.2 of the P-GRADE Portal a new Welcome portlet greets the logged-in user.
In this menu the user can customize the portal and change his/her own role, personal data and, above all, the original password received from the system administrator.




Figure 12.1 Welcome menu

XIII Workflow archive service

An existing workflow can be saved from the Workflow Manager list of the Portal Server into the local file system of the user's Desktop Machine, and can later be uploaded from there in the reverse direction. See Figure 2 (arrows Workflow/Storage/Download and Workflow/Upload) for an overview and Figure 13.1 for the actual usage:


Figure 13.1

1. Saving the definition of a workflow and clearing the temporary parts:


Clicking on the Storage operation (Figure 13.1) opens the storage list showing the workflows that can be saved:




Figure 13.2


Three parts of a workflow can be handled independently:
  • Under the column Workflow the definition part of a workflow is accessible.

    Download selects a workflow and opens the Download Manager of the browser, with which the user can choose a destination in the local file system and download the definition of the selected workflow as a compressed file.
    The saved workflow can later be retrieved from the local file system
    (see paragraph 2. Uploading the definition of a workflow to modify / resubmit or uploading the content of a trace file for visualization).
    Please note that the workflow is saved in its current state, i.e. together with any temporary files.
    If you do not need them, please apply set init:

    set init is an auxiliary operation that discards the temporary files generated during previous workflow submissions.

    Both cases (with and without set init) have their own merits:
    Saving the workflow in its actual state facilitates the subsequent investigation of a failed run by an expert (for example, to distinguish user, portal and Grid related errors in complicated cases).
    Bringing the workflow to the init state before saving minimizes the information needed to save the definition of the workflow. This option is recommended if the user wants to migrate the workflow to a different user or a different portal, or wants to save it in order to resubmit or edit it in the future.

  • The operations under the column Trace are optional and depend on the existence of the trace file.
    As trace files may be of substantial size, they can be Downloaded or Deleted separately.

  • Under the column Output there is no Download option, as this functionality is available under the
    Workflow/Workflow Manager tab. Here the output of a workflow can only be Deleted from the Portal server machine.
Please note that in the fourth column, ALL, the Delete button is visible only if the workflow is inactive, i.e. the workflow is not in the Workflow / Workflow Manager list.

2. Uploading the definition of a workflow to modify / resubmit or uploading the content of a trace file for visualization:


Clicking on the Upload operation (Figure 13.1) opens a set of file browsers for specifying the paths of the saved files in the user's desktop environment
that are to be uploaded to the Portal Server:




Figure 13.3

The Workflow archive input field must refer to one of the compressed files previously saved by the Storage / Workflow Download
operation (see paragraph 1. Saving the definition of a workflow).

Demo Workflows are prefabricated example/test applications that can be uploaded. See the next section for more details.

Important notice:
 
The result of a successful Upload from a Workflow archive is not visible immediately in the Workflow / Workflow Manager list.
However, it appears both in the Storage list and in the Open list of the Workflow Editor. Therefore, after a successful Upload the user should

  1. enter the Workflow Editor in the tab Workflow / Workflow Manager (Figure 13),
  2. open the workflow list in the Workflow Editor and select the requested workflow,
  3. save it on the server (Figure 32),
  4. and hit the Refresh button on the Workflow / Workflow Manager tab.
(See the arrows EDITOR/Open and EDITOR/Save|Upload of Figure 2 for an overview.)


3. Uploading the demo applications

The Demo Workflows section of Figure 13.3 shows the available prefabricated demo applications.
These generally test the P-GRADE Portal and the current environment (certificates, settings and the Grid).
The names and number of the displayed test applications may differ from those shown in Figure 13.3, and they may be reset by the portal Administrator.
The user can select either one application (with the radio button, confirmed by the OK button) or all the available Demo Workflow applications (with the Upload all button).
The selected applications appear in the Workflow Manager list only after the user has manually modified them in the Workflow Editor.
However, it is not guaranteed that an application will be associated with the proper resources and can be submitted immediately.
Inexperienced Portal users are advised to follow these steps:
  1. Select an application with the radio button and confirm the upload with OK.
  2. Check the success of the Upload by reading the Message line.
  3. Check the existence of a valid proxy certificate in the Certificate tab.
  4. Check the existence of the required Grids/resources in the Settings tab.
  5. Check the association of the Grids with the selected valid proxy certificate in the Certificate tab.
  6. Select the tab Workflow / Workflow Manager.
  7. Select the button Workflow Editor.
  8. Use the menu item Open in the WE (Workflow Editor) window to access and download the demo application.
  9. In the WE graph that appears, open each job of the application:
    select one of the resources defined/checked in (4), and confirm the changes with OK.
  10. Save the workflow with SaveAs...
  11. Submit the workflow with the appropriate button of the tab Workflow / Workflow Manager.

3.1 The Equation Solver application

This application solves the n-dimensional (in our example 5-dimensional) system of linear equations A*x = B.
See details in [5].
Figure 13.3 contains four versions of the common workflow, prepared for two different virtual organisations, each in a direct (static) and a dynamic (Broker associated) resource reservation variant.
The expected result x (an approximation of the vector [1,2,3,4,5]) can be read most simply by hitting the Out button in the Logs column of the line of job Multip_B in the detailed view of the submitted workflow within the Workflow Manager portlet.
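To see why the printed result is only an approximation of [1,2,3,4,5], the sketch below builds a right-hand side B from that vector and solves the system with Gaussian elimination (partial pivoting) in floating point. The matrix A is hypothetical, and the method is not necessarily the one used by the Equation Solver application; see [5] for its actual algorithm.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve A*x = B by Gaussian elimination with partial pivoting."""
    n = len(a)
    # augmented matrix [A | B] as floats
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):           # eliminate below the pivot
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):            # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# hypothetical 5x5 matrix; B is chosen so that the exact solution is [1..5]
A = [[5 if i == j else 1 for j in range(5)] for i in range(5)]
B = [sum(A[i][j] * (j + 1) for j in range(5)) for i in range(5)]
print(solve(A, B))   # values close to 1, 2, 3, 4, 5
```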




XIV References   


    [1] Mercury monitor:
        http://www.lpds.sztaki.hu/mercury/

    [2] P-GRADE:
        http://www.lpds.sztaki.hu/pgrade/

    [3] Job Description Language How To, December 17th, 2001:
        http://server11.infn.it/workload-grid/docs/DataGrid-01-TEN-0102-0_2-Document.pdf

    [4] EGEE User Guide:
        https://emds.cern.ch/file/454439//LCG-2-UserGuide.html

    [5] Equation Solver application:
        http://www.lpds.sztaki.hu/pgportal/v23/includes/Equation_Solver.html