Community Shell

From TeraGrid Wiki

Background

The purpose of this document is to show how to install and configure the Community Shell for Science Gateways using Community User Accounts and Community Software Areas. The intended audience consists of both system administrators and community software developers. Instructions specific to each type of user are given in separate sections below. First, though, some background on why the Community Shell is needed.

A growing trend in supercomputing is the use of community gateways that provide supercomputing resources to a wide audience. These gateways maintain a user database so that each user can log in to an individual account at the gateway. However, the community of users may share a single community account credential which is utilized when performing computations on the supercomputing resources. Thus, while the gateway/portal may be able to track individuals, the supercomputing resources can only track the single community account user. Since all gateway users (theoretically) have access to the community account credential, a gateway user could conceivably execute unauthorized code on a supercomputing resource and do so with a high degree of anonymity, which may result in all the users of that gateway losing access to the resource. The goal of the Community Shell project is to mitigate this potential for abuse by placing restrictions on the applications which may be executed by a community account.

The proposed solution requires a collaborative effort between the system administrators of the supercomputing resources and the community developers of the gateway. In this scenario, a community developer is responsible for requesting resources used by the gateway, in particular a Community Software Area and a Community Account. The community developer also decides which binary applications will be executed on the supercomputing resources. These binaries are placed in a Community Software Area directory so as to be freely accessible by the community developers (e.g. to install a newer software version). The Community Account user's shell is set to commsh and configured to limit what can be executed, specifically only the binaries in the Community Software Area directory. This simple configuration scheme minimizes what can be done with the Community Account user's credential and also eases configuration for both system administrators and community developers.

In a more advanced/secure configuration, the binary files in the Community Software Area directory are not called directly by the gateway. Rather, a set of static (i.e. unchanging) scripts is written by the community developer. These scripts are placed in a protected location on the supercomputing resources with the restriction that they are run with security provided by the commsh Community Shell (i.e. commsh restricts the community account to run only the approved scripts). The scripts then call the binaries which are placed in a directory accessible by community developers. This allows for the binaries to be updated as needed (i.e., to upgrade to new software package versions) without involvement of the system administrator, while keeping the security restrictions placed on the scripts (so that only a small set of pre-approved binaries can be executed). The system administrator is responsible for approving the scripts and installing them in an appropriate directory.

Below you will find configuration details for both the simple and advanced configuration schemes.

Abbreviations

In the discussion that follows, we will use the following abbreviations for the sake of brevity when specifying directory/file paths.

  • SA = System Administrator - This user is responsible for the creation of the Community Software Area and the Community Account, as well as the installation and configuration of the Community Shell executable and associated configuration files.
  • CSA = Community Software Area - This is an allocation of disk space available at any TeraGrid site for the installation of executables and libraries that will be utilized by a community of users. A request must be made for the creation of a CSA.
  • CD = Community Account Developer - This user is responsible for requesting the creation of a CSA and associated Community Account for their community of gateway users. Additionally, the CD may design scripts which call binary executables that do the actual computation on the supercomputing resources. There may be more than one CD. Each CD has his or her own individual login to TeraGrid systems.
  • CU = Community Account User - When a CD requests the creation of a Community Account, a new user and associated credential is created. The CU's credential is utilized by the gateway to run programs on supercomputing resources on behalf of the gateway users. So while there may be many user accounts at a science gateway, there is a single account on the supercomputing resources for the CU.
  • CG = Community Account Group - When a CD requests the creation of a CSA, a Community Group is created, in the sense of a "Un*x group". The CG is named the same as the CSA. CG members consist of users who were listed as CDs when the creation of the CSA was requested. The CG allows CDs to update binary applications stored in the CSA directory.

Proposed Configuration

Directories / Files

This is the proposed layout for the various directories and files utilized by commsh, a CSA, and a CU. Details will be given in the sections that follow. Here we assume that the CSA request was made by "jzsmith", who is the primary CD. The CSA's name is "ntroport". The name of the CU is "ntrouser". While the CSA name can be different from the CU name (as shown here), we suggest that CDs try to make them the same. We have chosen them to be different in this example to illustrate the ownership of files and directories (i.e. uid and gid).

Ownership         Perm Directory / File       Usage
---------         ---- ----------------       -----
root:root         0755 /usr/local/bin/commsh  Location of commsh binary [1]
root:root         0644 /etc/commsh.conf       Configuration file for commsh [2]
root:root         0755 /etc/commsh.d          Directory for per-community configurations [3]
root:root         0755 /etc/commsh.d/ntroport Location of config file and optional scripts for CSA [4]
ntrouser:ntroport 2770 ~ntrouser              CU's home directory [5]
jzsmith:ntroport  2775 $TG_COMMUNITY/ntroport Location of CSA binaries and associated files [6]
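
The ownership and mode columns above can be spot-checked from a shell. The following is a minimal sketch, assuming GNU coreutils `stat` (its `-c` format option); the helper name `check_entry` is hypothetical and is not part of commsh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a path's owner:group and octal mode
# against the values expected from the layout table.
# Assumes GNU stat (-c); BSD stat uses a different syntax.
check_entry() {
    local expected_owner=$1 expected_mode=$2 path=$3
    local actual
    if ! actual=$(stat -c '%U:%G %a' "$path" 2>/dev/null); then
        echo "MISSING $path"
        return 1
    fi
    # stat prints the mode without a leading zero (e.g. 755, 2770),
    # so strip one from the expected value before comparing.
    if [ "$actual" = "$expected_owner ${expected_mode#0}" ]; then
        echo "OK      $path"
    else
        echo "DIFFERS $path (found $actual)"
        return 1
    fi
}

# Example calls for the root-owned entries in the table:
# check_entry root:root 0755 /usr/local/bin/commsh
# check_entry root:root 0644 /etc/commsh.conf
# check_entry root:root 0755 /etc/commsh.d/ntroport
```

The example calls are left commented out because the paths exist only on a machine where commsh has been installed.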

Explanatory Notes

  1. The default configuration of commsh installs the binary into /usr/local/bin. While this is what we will use in the proposed configuration, you can change this with the "--prefix=..." option when running configure.
  2. When commsh is first installed, a sample configuration file is installed in /etc/commsh.conf.sample. The SA can look at this file for some example uses of commsh. Ultimately, the SA must create /etc/commsh.conf either by copying the sample configuration file and editing it, or by creating /etc/commsh.conf from scratch. If you want commsh to read the configuration file from some place other than /etc/commsh.conf, you can set the "--sysconfdir=..." option when running configure.
  3. This directory is the storage location for all CSA configuration directories (as shown in [4] below). In other words, this directory should contain only directories, no files.
  4. The configuration files for commsh specific to the ntroport CSA/CU reside here. The /etc/commsh.conf file contains references to the various CSAs' configuration files. For example, the /etc/commsh.conf file will have entries like this, one for each CSA:
    # /etc/commsh.conf
    CheckUser ntrouser
    ReadUserConfig ntrouser /etc/commsh.d/ntroport/commsh.conf

    The per-CSA configuration file /etc/commsh.d/ntroport/commsh.conf lists what commsh will allow: binaries in the CSA directory (for a simple setup) or scripts in the /etc/commsh.d/ntroport directory (for an advanced setup). Here are two basic configuration files showing a simple configuration and an advanced configuration.

    The first configuration file shows a very simple setup where a CU is allowed to run any executable in the CSA bin directory ($TG_COMMUNITY/ntroport/bin), using any number of command line parameters.

    # /etc/commsh.d/ntroport/commsh.conf
    # Simple configuration - direct access to CSA binaries
    # Allow the CU to run any command in the CSA bin directory with any parameters
    DirectAccess $TG_COMMUNITY/ntroport/bin/* **

    The second example is more complex and involves a bit of indirection to deter unauthorized modification of allowable executables. In this setup, a CD has created two scripts with the intention that they not be modified. They are placed in the read-only /etc/commsh.d/ntroport directory by the SA. Within these two scripts are calls to executables stored in the CSA bin directory ($TG_COMMUNITY/ntroport/bin). (Note that the scripts are highly specific to a particular CSA and are not shown here.) The configuration file below allows a CU to call only these two scripts, each with two command line parameters (for input and output). The scripts pass the two parameters on to executables in the CSA bin directory. The idea here is to give a CU a limited number of commands that can be called, but still allow a CD to update the underlying executables stored in the CSA bin directory.

    # /etc/commsh.d/ntroport/commsh.conf
    # Advanced configuration - indirect access to CSA binaries via protected scripts
    # Allow the CU to run two scripts in the protected directory with two parameters
    # script1 and script2 in turn call executables in $TG_COMMUNITY/ntroport/bin
    DirectAccess /etc/commsh.d/ntroport/script1 -input * -output *
    DirectAccess /etc/commsh.d/ntroport/script2 -input * -output *
  5. When the request for a CU has been approved, the CU's home directory is automatically created. In this case, we deliberately chose the CSA name to be different from the CU name. You will probably want to choose them to be the same. The CU home directory will be utilized by the various binary files for input and output since the binaries will be executed with the CU's credential via GRAM. In other words, all input files should be transferred to the CU's home directory, for example via GridFTP. The CSA binaries, which are stored in $TG_COMMUNITY/ntroport, read the input files from the CU's home directory. Any output generated by the binaries will be written to the CU's home directory. This output can then be fetched using GridFTP. Note that the group permission (gid) on the CU's home directory allows for easy access by the CDs for debugging purposes.
  6. When the request for a CSA has been approved, the CD's CSA directory is automatically created. Note again that we deliberately chose the CSA name to be different from the CU name, but you should choose them to be the same. The directory $TG_COMMUNITY/ntroport is where the binaries (and other associated files) for the CSA will be stored. These binaries are called either directly or by the scripts located in /etc/commsh.d/ntroport/. This allows for the binaries to be updated frequently while keeping the scripts secure (since the scripts should not require frequent updating).
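
Because /etc/commsh.conf carries one CheckUser/ReadUserConfig pair per CSA (note 4), the pairs could be generated from a list instead of typed by hand. The sketch below only prints to standard output. The function names and the second CU/CSA pair are hypothetical; only ntrouser/ntroport comes from this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the main-config stanza for one CU/CSA pair,
# using only the two directives shown in note 4.
emit_csa_stanza() {
    local cu=$1 csa=$2
    printf 'CheckUser %s\n' "$cu"
    printf 'ReadUserConfig %s /etc/commsh.d/%s/commsh.conf\n' "$cu" "$csa"
}

# Each entry is "CU:CSA". The second pair is made up for illustration.
pairs="ntrouser:ntroport demouser:demoport"

generate_main_conf() {
    local pair
    echo '# /etc/commsh.conf'
    for pair in $pairs; do
        # Text before the colon is the CU, text after it is the CSA.
        emit_csa_stanza "${pair%%:*}" "${pair#*:}"
    done
}

generate_main_conf
```

A real /etc/commsh.conf would normally also contain site-wide directives, such as the AllowUser and DenyUser lines shown in the instructions for system administrators.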

Instructions For Community Developers

  1. Request the creation of a Community Software Area. See the image at the right for an example of a filled-in request form.
    Note: When the CSA is created, both the $TG_COMMUNITY/CSA directory and the CG group are automatically created.
  2. Request the addition of a Community Account. See the image at the right for an example of a filled-in request form.
    Note: When the Community Account is created, the ~CU directory is automatically created. Also, you can get an X.509 credential for CU to be used by your gateway.
  3. Decide on the application binaries (and optional scripts) to be used by your gateway. This step can be tricky and requires a CD to consider not only the software needed by gateway users, but also the security restrictions desired by the SA. In a simple configuration, the commsh configuration file allows for the execution of any binaries placed in the CSA directory. In an advanced configuration, the commsh configuration file allows for the execution of only a few scripts located in a protected directory. These scripts then call specific binary applications located in the CSA directory. In either case, the binary files can be updated by a CD. The advanced configuration is more secure since the protected scripts reference only particular files in the CSA directory. So, if a malicious user placed extra files in the CSA directory, they would not be of concern since they are not referenced by the protected scripts. Of course, this advanced configuration requires advance planning by a CD and approval by a SA since the protected scripts would not be updated very often.
  4. Simple Configuration: In a simple configuration setup, the CD places all binaries needed by the gateway in the CSA directory. The commsh configuration file is then written to allow any binaries in that directory to be executed with any number of command line parameters. The SA would install a commsh configuration file, like the one given here for a CSA named ntroport.
    # /etc/commsh.d/ntroport/commsh.conf - Simple Configuration
    # Allow ntrouser to run any binary in the CSA ntroport bin directory
    DirectAccess $TG_COMMUNITY/ntroport/bin/* **
    

    A single asterisk (*) matches any characters within a single argument. In general, this means it will not match a space unless the space is enclosed in quotation marks or escaped with a backslash. Additionally, an asterisk in the command itself will not match a forward slash (/), so name your binaries accordingly. In contrast, a double asterisk (**) should appear only at the end of a command restriction specification, and indicates that any additional parameters will be accepted. Remember that binaries placed in the $TG_COMMUNITY/CSA directory can be updated by any CD.

  5. Advanced Configuration: In an advanced configuration, a limited number of scripts can be executed by the CU via commsh. These scripts should be considered to be static (i.e. seldom require modification) since they will be put in a secure location accessible only by SAs. The scripts should reference binary executables which will be placed in the $TG_COMMUNITY/CSA directory by a CD. The binaries may be updated frequently. Thus it is the job of the CD to write the scripts appropriately. Since you have created the scripts, you should know the command line parameters. This is important since it is also the responsibility of the CD to write the configuration file referenced by commsh. For the syntax of the directives for the commsh.conf file, see the commsh.conf (5) man page. Your configuration file will be audited by a SA, but ultimately the onus is on you. Below is an example commsh configuration file for a specific CSA named ntroport.
    # /etc/commsh.d/ntroport/commsh.conf - Advanced configuration
    # Allow ntrouser to execute only two protected scripts, which in turn
    # call executables in ntroport's $TG_COMMUNITY/ntroport/bin directory
    DirectAccess /etc/commsh.d/ntroport/script1 -input * -output *
    DirectAccess /etc/commsh.d/ntroport/script2 -input * -output *
    

    Here, script1 and script2 are written by a CD and installed into a secure location by a SA. The scripts take two command line parameters, one for input and one for output. These scripts call the executables located in $TG_COMMUNITY/ntroport.

  6. Submit your commsh configuration file and (optional) scripts to a SA for the appropriate TG system. The SA will review your configuration file and scripts. If acceptable, they will be installed to a secure location such as /etc/commsh.d/CSA/.
  7. Install the binary executables (optionally referenced by your scripts) into the CSA directory $TG_COMMUNITY/CSA. Since this directory has group access permissions for CDs, you can easily update the files there. However, keep in mind that the scripts will be run with the CU's credential and thus all input/output files will be in the ~CU directory.
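
The protected scripts are specific to each CSA, so none are shown in this document. Purely as an illustration, a script1 that fits the `-input * -output *` rule from the advanced configuration might look like the sketch below. The binary name `analyze` and its `--in`/`--out` options are hypothetical, and the fallback value /usr/projects for $TG_COMMUNITY is taken from the use case example on this page.

```shell
#!/usr/bin/env bash
# Hypothetical script1: accepts exactly "-input FILE -output FILE" and
# hands both files to one fixed binary in the CSA bin directory.
# The binary name "analyze" and its options are invented for illustration.
script1_main() {
    local bin="${TG_COMMUNITY:-/usr/projects}/ntroport/bin/analyze"

    # Reject anything that is not the exact four-argument form.
    if [ "$#" -ne 4 ] || [ "$1" != "-input" ] || [ "$3" != "-output" ]; then
        echo "usage: script1 -input FILE -output FILE" >&2
        return 2
    fi
    local input=$2 output=$4

    # Refuse to run unless the input is readable and the binary is executable.
    [ -r "$input" ] || { echo "cannot read $input" >&2; return 3; }
    [ -x "$bin" ] || { echo "missing binary $bin" >&2; return 4; }

    # Call the binary by absolute path so PATH cannot redirect it.
    "$bin" --in "$input" --out "$output"
}

# When installed as a standalone script, the last line would be:
# script1_main "$@"
```

Keeping the script this small is deliberate: the less logic it contains, the less often it needs SA review, while the binary it calls can still be updated by a CD.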


Testing and Debugging

The restrictions of the community account require CDs to adjust their testing and debugging practices. CDs should first use their own individual TeraGrid accounts to test new applications for the gateway to run on TeraGrid systems. When the applications are well-tested, the CD can work with the SA to install/modify the community scripts to enable the applications in the CU account. CDs have access to read and write files in the CU account (via the CG's permissions) for further debugging purposes.
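
When a job is unexpectedly rejected, it can help to reason offline about the wildcard rules described in the configuration steps. The sketch below is a small bash model of those rules, not commsh itself: it assumes that `*` in the command word never crosses a `/`, that `*` in a parameter matches within that single parameter, and that a trailing `**` accepts any remaining parameters (including none, which is an assumption). The function names are hypothetical, and $TG_COMMUNITY must already be expanded in the rule passed in. Bash's own treatment of `?` and `[...]` is a side effect of the model.

```shell
#!/usr/bin/env bash
# Model of the DirectAccess wildcard rules; this is NOT commsh itself.

# Match one command word against a pattern, one path component at a
# time, so that * can never match across a "/".
word_matches_no_slash() {
    local pattern=$1 word=$2
    local -a pparts wparts
    IFS=/ read -r -a pparts <<< "$pattern"
    IFS=/ read -r -a wparts <<< "$word"
    [ "${#pparts[@]}" -eq "${#wparts[@]}" ] || return 1
    local i
    for i in "${!pparts[@]}"; do
        [[ ${wparts[$i]} == ${pparts[$i]} ]] || return 1
    done
}

# Usage: rule_allows "RULE" COMMAND [ARG...]
rule_allows() {
    local rule=$1; shift
    local -a pat
    read -r -a pat <<< "$rule"      # split the rule on whitespace
    local -a argv=("$@")
    local i
    for i in "${!pat[@]}"; do
        # A trailing ** accepts whatever parameters remain.
        if [ "${pat[$i]}" = "**" ] && [ "$i" -eq $(( ${#pat[@]} - 1 )) ]; then
            return 0
        fi
        # The command line ended before the rule was satisfied.
        [ "$i" -lt "${#argv[@]}" ] || return 1
        if [ "$i" -eq 0 ]; then
            word_matches_no_slash "${pat[0]}" "${argv[0]}" || return 1
        else
            # In a parameter, * matches any characters of that parameter.
            [[ ${argv[$i]} == ${pat[$i]} ]] || return 1
        fi
    done
    # Without a trailing **, extra parameters are not accepted.
    [ "${#argv[@]}" -eq "${#pat[@]}" ]
}
```

For example, `rule_allows '/usr/projects/ntroport/bin/* **' /usr/projects/ntroport/bin/sub/tool` returns non-zero under this model, because the `*` is not allowed to match `sub/tool`.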

Instructions For System Administrators

These instructions assume that you have installed GRAM4 or GRAM5 following the instructions provided by the CTSS 4 Science Gateway Capability Implementation.

Once you have GRAM4 or GRAM5 installed, you will need to install the commsh components. Instructions that tell you to install a tarball are old installation instructions. Other (older) background information can be found on the page describing how to install the Community Shell (commsh) application.

  1. Follow the README.install instructions for the latest version of commsh.
  2. When you are ready to use commsh with GRAM4 or GRAM5, be sure to uncomment the definition of $FILTER_COMMAND in $GLOBUS_LOCATION/libexec/globus-job-manager-script.pl. It is located at about line 20 in this file.
  3. GRAM versions prior to gram4-4.0.8-r2 and gram5-5.0.2-r2 have a known bug with the commsh code in globus-job-manager-script.pl where the fork, pbs or other LRM job will be run regardless of the return status from commsh.

    This has been fixed for gram5-5.0.2-r3; if you installed gram5-5.0.2-r3, you can skip down to step 4 below.
    If you have gram5-5.0.2-r2, you do not have to reinstall, rather you can apply a patch as shown:

    $ cd $TG_APPS_PREFIX/gram5-5.0.2-r2
    

    (or wherever you put your gram5-5.0.2-r2 install)

    $ wget http://software.teragrid.org/pacman/ctss4/globus/gt-5.0.2-r1/patches/commsh-insitu-patch
    $ patch -p0 < commsh-insitu-patch
    

    This patch may not apply against GRAM4 installations. If this is the case, you should be able to simply put an "exit;" after the "&fail(Globus::GRAM::Error::AUTHORIZATION_DENIED_EXECUTABLE)" line instead of applying the patch above. This will prevent the job from being run if commsh rejects it, but will return a less useful error than the patch against 5.0.2.

           if(defined $FILTER_COMMAND)
           {
               local $commandName = join(" ", $job_description->executable,
                                              $job_description->arguments);
               local @filterArgs = split(/\s+/, $FILTER_COMMAND);
               if(-x $filterArgs[0])  # Make sure program is executable
               {
                   local $rVal = (system(@filterArgs, $commandName)) >> 8;
                   if($rVal != 0)  # The filter command returned an error, so deny.
                   {
                       &fail(Globus::GRAM::Error::AUTHORIZATION_DENIED_EXECUTABLE);
                       exit;
                   }
    

  4. Create/Edit the /etc/commsh.conf configuration file and any other commsh config files in /etc/commsh.d/. A sample is provided in GRAM4/GRAM5 $GLOBUS_LOCATION/{commsh location}/etc/commsh.conf.sample. See commsh.conf for details. You need to edit these files for your particular setup. (An example can be seen on kraken.nics.teragrid.org:/etc/commsh.conf). Note: we suggest that you do not use the CheckVerbose option when integrating commsh with GRAM4 or GRAM5. It causes extra information to be passed from globus-job-manager-script.pl to the globus-job-manager, which gets confused and fails, and the commsh "ALLOW" or "DENY" output then shows up in the GRAM5 log file. You need to add an entry for every CSA on your system. The example below shows the configuration for a single CSA named ntroport to be used by a CU named ntrouser.
    # /etc/commsh.conf
    # Allow any user except root to run commands through commsh
    AllowUser *
    DenyUser root
    # Check GRAM job submissions by the ntrouser user
    CheckUser ntrouser
    # Load the external configuration file for the ntroport CSA
    ReadUserConfig ntrouser /etc/commsh.d/ntroport/commsh.conf 
    
  5. Create a directory to store configurations and scripts specific to each CSA. For every CSA that requires access to your machine, you will need to create a directory under /etc/commsh.d to store (a) configuration files for commsh and (b) (optionally) scripts for that CSA which will be referenced by the configuration files. For example, if the CSA is named "ntroport", you would run the following command.
    mkdir -p -m 755 /etc/commsh.d/ntroport
    
  6. Install the commsh.conf file specific to the CSA. Within the directory you created above, you need to install a configuration file which will refer to (a) (optional) scripts contained in the same directory and/or (b) executables stored in the CSA's bin directory ($TG_COMMUNITY/CSA/bin). This configuration file may be created by the CD, but must be verified by a SA. For the syntax of the directives for the commsh.conf file, see the commsh.conf (5) man page. Use the Science Gateways Administration page to view what the CU requester submitted in the Community Account Request; it gives additional information on how the account should be configured. Below we create a very simple configuration file for the ntroport CSA which allows a CU to run any executable in the $TG_COMMUNITY/ntroport/bin directory, with any number of command line parameters.
    echo 'DirectAccess $TG_COMMUNITY/ntroport/bin/* **' > /etc/commsh.d/ntroport/commsh.conf
    chown root:root /etc/commsh.d/ntroport/commsh.conf
    chmod 644 /etc/commsh.d/ntroport/commsh.conf
    
  7. Set the shell for the CU. In order to activate commsh parsing for a specific user, the CU's shell must be changed to commsh. This involves two steps.
    1. Edit the /etc/passwd file (or your site's equivalent) and find the line for the CU ntrouser. Set the shell (typically the last entry on the line) to /usr/local/bin/commsh.
    2. Edit the /etc/shells file and append /usr/local/bin/commsh to the list. Note that this second step need be done only once. You can do this with the following command.
    echo '/usr/local/bin/commsh' >> /etc/shells
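
Steps 4 through 7 can be rehearsed without touching /etc. The sketch below writes the same files under a staging directory so they can be reviewed before an SA copies them into place as root. The STAGE variable and both function names are hypothetical. The `add_shell_once` helper also shows one way to make the /etc/shells step safe to repeat, since a plain `>>` adds a duplicate line on every run.

```shell
#!/usr/bin/env bash
# Sketch: build the per-CSA commsh files under a staging root.
# STAGE defaults to a fresh temporary directory; nothing under /etc is touched.
STAGE=${STAGE:-$(mktemp -d)}

stage_csa() {
    local cu=$1 csa=$2
    local dir="$STAGE/etc/commsh.d/$csa"
    mkdir -p -m 755 "$dir"
    # The backslash keeps $TG_COMMUNITY literal in the written file.
    printf '%s\n' "DirectAccess \$TG_COMMUNITY/$csa/bin/* **" > "$dir/commsh.conf"
    chmod 644 "$dir/commsh.conf"
    # Append the main-config stanza for this CU.
    {
        echo "CheckUser $cu"
        echo "ReadUserConfig $cu /etc/commsh.d/$csa/commsh.conf"
    } >> "$STAGE/etc/commsh.conf"
}

# Append a shell to a shells file only if it is not already listed.
add_shell_once() {
    local shell=$1 file=$2
    grep -qxF -- "$shell" "$file" 2>/dev/null || echo "$shell" >> "$file"
}

stage_csa ntrouser ntroport
add_shell_once /usr/local/bin/commsh "$STAGE/etc/shells"
echo "Staged files are under $STAGE"
```

Changing the CU's login shell is left out of the sketch. On many Linux systems `usermod -s` or `chsh -s` performs that edit, but sites with a central account database will have their own procedure.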
    

Use Case Scenario

A CD will write the code for the gateway which calls the binaries stored on the supercomputing resources. We assume that the gateway has a copy of the CU's credential, and that the CD has placed the binaries called by the gateway in the appropriate location (i.e. $TG_COMMUNITY/CSA). We also assume that commsh has been configured using the "simple configuration" where binaries are called directly. A typical scenario is as follows:

  1. Copy one or more local files (where "local" means the files are accessible to the gateway) to the supercomputing resource in the CU's home directory.
  2. Run one or more executables stored in $TG_COMMUNITY/CSA, using the files just transferred as input, writing any output to the CU's home directory.
  3. Copy any generated output files from the CU's home directory back to the gateway.

A simple example can be done by a CD on the command line. Here we assume that the binary sort has been placed in the CSA directory $TG_COMMUNITY/ntroport, which expands to /usr/projects/ntroport. Run the following commands from the gateway machine. Be sure to substitute the appropriate values for your supercomputing resource and associated directories.

  1. Be sure you run all of the following commands using the CU's credential. If the credential is stored in a MyProxy server, you can fetch it with the myproxy-get-delegation command.
    # myproxy-get-delegation -l ntrouser -s myproxy.teragrid.org
    Enter MyProxy pass phrase: <password not echoed for security purposes>
    A credential has been received for user ntrouser in /tmp/x509up_u28289.
    
  2. Copy a local text file to the CU's home directory.
    # set JOBID=12345
    # globus-url-copy -v file:///full/path/to/file/input.txt \
                         gsiftp://your.server.com:2811/~/input.txt.$JOBID
    Source: file:///full/path/to/file/
    Dest:   gsiftp://your.server.com:2811/~/
      input.txt -> input.txt.12345
    
  3. Sort the input file and write the results to a new output file in the CU's home directory. Typically such commands would be written to a dynamically generated script file, but we show them here on the command line for the sake of simplicity.
    # globusrun-ws -F your.server.com -submit -streaming \
                   -c /usr/projects/ntroport/sort \
                      "--output=/home/ntrouser/output.txt.$JOBID" \
                      /home/ntrouser/input.txt.$JOBID
    

    For testing the older GRAM2 version, use the following command.

    # globusrun -o -r your.server.com/jobmanager \
                '&(executable=/usr/projects/ntroport/sort) \
                (arguments="--output=/home/ntrouser/output.txt.12345" \
                 /home/ntrouser/input.txt.12345)'
    
  4. Copy the resulting output file back to the local account.
    # globus-url-copy -v gsiftp://your.server.com:2811/~/output.txt.$JOBID \
                         file:///full/path/to/file/output.txt.$JOBID
    Source: gsiftp://your.server.com:2811/~/
    Dest:   file:///full/path/to/file/
      output.txt.12345
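
A gateway would normally generate these commands per job instead of typing them. The sketch below prints, without running, the three commands from the walkthrough for a given job ID, so they can be reviewed first. The variable names GRID_HOST, CU_HOME and CSA_DIR and the function name are hypothetical; the command-line options are the ones used in the steps above.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the stage-in, run, and stage-out commands
# of the walkthrough for one job.
# The defaults are the placeholder values used in the example above.
GRID_HOST=${GRID_HOST:-your.server.com}
CU_HOME=${CU_HOME:-/home/ntrouser}
CSA_DIR=${CSA_DIR:-/usr/projects/ntroport}

print_job_commands() {
    local jobid=$1 local_dir=$2
    # 1. Stage the input into the CU's home directory.
    echo "globus-url-copy -v file://$local_dir/input.txt gsiftp://$GRID_HOST:2811/~/input.txt.$jobid"
    # 2. Run the CSA binary on the staged file.
    echo "globusrun-ws -F $GRID_HOST -submit -streaming -c $CSA_DIR/sort --output=$CU_HOME/output.txt.$jobid $CU_HOME/input.txt.$jobid"
    # 3. Fetch the result back to the gateway.
    echo "globus-url-copy -v gsiftp://$GRID_HOST:2811/~/output.txt.$jobid file://$local_dir/output.txt.$jobid"
}

print_job_commands 12345 /full/path/to/file
```

Suffixing every file name with the job ID, as the walkthrough does, keeps jobs from different gateway users from overwriting each other's files in the single shared CU home directory.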
    
