Community Shell

From TeraGrid Wiki

Background

The purpose of this document is to show how to install and configure the Community Shell for Science Gateways using Community User Accounts and Community Software Areas. The intended audience consists of both system administrators and community software developers. Instructions specific to each type of user are given in separate sections below. But first, a little explanation as to the need for the Community Shell.

A growing trend in supercomputing is the use of community gateways that provide supercomputing resources to a wide audience. These gateways maintain a user database so that each user can log in to an individual account at the gateway. However, the community of users may share a single community account credential which is utilized when performing computations on the supercomputing resources. Thus, while the gateway/portal may be able to track individuals, the supercomputing resources can only track the single community account user. Since all gateway users (theoretically) have access to the community account credential, a gateway user could conceivably execute unauthorized code on a supercomputing resource and do so with a high degree of anonymity, which may result in all the users of that gateway losing access to the resource. The goal of the Community Shell project is to mitigate this potential for abuse by placing restrictions on the applications which may be executed by a community account.

The proposed solution requires a collaborative effort between the system administrators of the supercomputing resources and the community developers of the gateway. In this scenario, a community developer is responsible for requesting resources used by the gateway, in particular a Community Software Area and a Community Account. The community developer also decides which binary applications will be executed on the supercomputing resources. These binaries are placed in a Community Software Area directory so as to be freely accessible by the community developers (e.g. to install a newer software version). The Community Account user's shell is set to commsh and configured so that only the binaries in the Community Software Area directory can be executed. This simple configuration scheme limits what can be done with the Community Account user's credential and also eases configuration for both system administrators and community developers.

In a more advanced/secure configuration, the binary files in the Community Software Area directory are not called directly by the gateway. Rather, a set of static (i.e. unchanging) scripts is written by the community developer. These scripts are placed in a protected location on the supercomputing resources with the restriction that they are run with security provided by the commsh Community Shell (i.e. commsh restricts the community account to run only the approved scripts). The scripts then call the binaries which are placed in a directory accessible by community developers. This allows for the binaries to be updated as needed (i.e., to upgrade to new software package versions) without involvement of the system administrator, while keeping the security restrictions placed on the scripts (so that only a small set of pre-approved binaries can be executed). The system administrator is responsible for approving the scripts and installing them in an appropriate directory.

Below you will find configuration details for both the simple and advanced configuration schemes.

Abbreviations

In the discussion that follows, we will use the following abbreviations for the sake of brevity when specifying directory/file paths.

  • SA = System Administrator - This user is responsible for the creation of the Community Software Area and the Community Account, as well as the installation and configuration of the Community Shell executable and associated configuration files.
  • CSA = Community Software Area - This is an allocation of disk space available at any TeraGrid site for the installation of executables and libraries that will be utilized by a community of users. A request must be made for the creation of a CSA.
  • CD = Community Account Developer - This user is responsible for requesting the creation of a CSA and associated Community Account for their community of gateway users. Additionally, the CD may design scripts which call binary executables that do the actual computation on the supercomputing resources. There may be more than one CD. Each CD has his or her own individual login to TeraGrid systems.
  • CU = Community Account User - When a CD requests the creation of a Community Account, a new user and associated credential is created. The CU's credential is utilized by the gateway to run programs on supercomputing resources on behalf of the gateway users. So while there may be many user accounts at a science gateway, there is a single account on the supercomputing resources for the CU.
  • CG = Community Account Group - When a CD requests the creation of a CSA, a Community Group is created, in the sense of a "Un*x group". The CG is named the same as the CSA. CG members consist of users who were listed as CDs when the creation of the CSA was requested. The CG allows CDs to update binary applications stored in the CSA directory.

Proposed Configuration

Directories / Files

This is the proposed layout for the various directories and files utilized by commsh, a CSA, and a CU. Details will be given in the sections that follow. Here we assume that the CSA request was made by "jzsmith", who is the primary CD. The CSA's name is "ntroport". The name of the CU is "ntrouser". While the CSA name can be different from the CU name (as shown here), we suggest that CDs try to make them the same. We have chosen them to be different in this example to illustrate the ownership of files and directories (i.e. uid and gid).

Ownership         Perm Directory / File       Usage
---------         ---- ----------------       -----
root:root         0755 /usr/local/bin/commsh  Location of commsh binary [1]
root:root         0644 /etc/commsh.conf       Configuration file for commsh [2]
root:root         0755 /etc/commsh.d          Directory for per-community configurations [3]
root:root         0755 /etc/commsh.d/ntroport Location of config file and optional scripts for CSA [4]
ntrouser:ntroport 2770 ~ntrouser              CU's home directory [5]
jzsmith:ntroport  2775 $TG_COMMUNITY/ntroport Location of CSA binaries and associated files [6]

Explanatory Notes

  1. The default configuration of commsh installs the binary into /usr/local/bin. While this is what we will use in the proposed configuration, you can change this with the "--prefix=..." option when running configure.
  2. When commsh is first installed, a sample configuration file is installed in /etc/commsh.conf.sample. The SA can look at this file for some example uses of commsh. Ultimately, the SA must create /etc/commsh.conf either by copying the sample configuration file and editing it, or by creating /etc/commsh.conf from scratch. If you want commsh to read the configuration file from some place other than /etc/commsh.conf, you can set the "--sysconfdir=..." option when running configure.
  3. This directory is the storage location for all CSA configuration directories (as shown in [4] below). In other words, this directory should contain only directories, no files.
  4. The configuration files for commsh specific to the ntroport CSA/CU reside here. The /etc/commsh.conf file contains references to the various CSAs' configuration files. For example, the /etc/commsh.conf file will have entries like this, one for each CSA:
    # /etc/commsh.conf
    CheckUser ntrouser
    ReadUserConfig ntrouser /etc/commsh.d/ntroport/commsh.conf

    This new configuration file /etc/commsh.d/ntroport/commsh.conf has the entries for binaries (for a simple setup) in the CSA directory or scripts (for a complex setup) in the /etc/commsh.d/ntroport directory which will be called by commsh. Here are two basic configuration files showing a simple configuration and an advanced configuration.

    The first configuration file shows a very simple setup where a CU is allowed to run any executable in the CSA's bin directory ($TG_COMMUNITY/ntroport/bin), using any number of command line parameters.

    # /etc/commsh.d/ntroport/commsh.conf
    # Simple configuration - direct access to CSA binaries
    # Allow the CU to run any command in the CSA bin directory with any parameters
    DirectAccess $TG_COMMUNITY/ntroport/bin/* **

    The second example is more complex and involves a bit of indirection to deter unauthorized modification of allowable executables. In this setup, a CD has created two scripts with the intention that they not be modified. They are placed in the read-only /etc/commsh.d/ntroport directory by the SA. Within these two scripts are calls to executables stored in the CSA's bin directory ($TG_COMMUNITY/ntroport/bin). (Note that the scripts are highly specific to a particular CSA and are not shown here.) The configuration file below allows a CU to call only these two scripts, each with two command line parameters (one for input and one for output). The scripts pass these two parameters on to the executables in the CSA's bin directory. The idea here is to give a CU a limited number of commands that can be called, but still allow a CD to update the underlying executables stored in the CSA's bin directory.

    # /etc/commsh.d/ntroport/commsh.conf
    # Advanced configuration - indirect access to CSA binaries via protected scripts
    # Allow the CU to run two scripts in the protected directory with two parameters
    # script1 and script2 in turn call executables in $TG_COMMUNITY/ntroport/bin
    DirectAccess /etc/commsh.d/ntroport/script1 -input * -output *
    DirectAccess /etc/commsh.d/ntroport/script2 -input * -output *
  5. When the request for a CU has been approved, the CU's home directory is automatically created. In this case, we deliberately chose the CSA name to be different from the CU name. You will probably want to choose them to be the same. The CU home directory will be utilized by the various binary files for input and output since the binaries will be executed with the CU's credential via GRAM. In other words, all input files should be transferred to the CU's home directory, for example via GridFTP. The CSA binaries, which are stored in $TG_COMMUNITY/ntroport, read the input files from the CU's home directory. Any output generated by the binaries will be written to the CU's home directory. This output can then be fetched using GridFTP. Note that the group permission (gid) on the CU's home directory allows for easy access by the CDs for debugging purposes.
  6. When the request for a CSA has been approved, the CD's CSA directory is automatically created. Note again that we deliberately chose the CSA name to be different from the CU name, but you should choose them to be the same. The directory $TG_COMMUNITY/ntroport is where the binaries (and other associated files) for the CSA will be stored. These binaries are called either directly or by the scripts located in /etc/commsh.d/ntroport/. This allows for the binaries to be updated frequently while keeping the scripts secure (since the scripts should not require frequent updating).
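
Once everything is installed, the layout in the table above can be spot-checked from a shell. The following is a minimal sketch rather than part of commsh: the helper name check_entry is hypothetical, and it assumes GNU coreutils stat (the -c option is not POSIX).

```shell
#!/bin/sh
# Sketch: compare a path's owner:group and mode against the proposed layout.
# Assumes GNU stat (-c). check_entry is an illustrative helper, not part of commsh.

# check_entry OWNER:GROUP MODE PATH
check_entry() {
    want_owner=$1
    want_mode=${2#0}          # the table writes 0755; stat prints 755
    path=$3

    if [ ! -e "$path" ]; then
        echo "MISSING  $path"
        return 1
    fi
    got_owner=$(stat -c '%U:%G' "$path")
    got_mode=$(stat -c '%a' "$path")
    if [ "$got_owner" = "$want_owner" ] && [ "$got_mode" = "$want_mode" ]; then
        echo "OK       $path"
    else
        echo "MISMATCH $path (found $got_owner $got_mode, expected $want_owner $want_mode)"
        return 1
    fi
}

# Example calls, using the values from the table above:
#   check_entry root:root 0755 /usr/local/bin/commsh
#   check_entry root:root 0644 /etc/commsh.conf
#   check_entry root:root 0755 /etc/commsh.d/ntroport
#   check_entry ntrouser:ntroport 2770 ~ntrouser
```

Each call prints one status line and returns non-zero on a missing path or a mismatch, so the calls can be chained in a larger audit script.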

Instructions For Community Developers

  1. Request the creation of a Community Software Area. See the image at the right for an example of a filled-in request form.
    Note: When the CSA is created, both the $TG_COMMUNITY/CSA directory and the CG group are automatically created.
  2. Request the addition of a Community Account. See the image at the right for an example of a filled-in request form.
    Note: When the Community Account is created, the ~CU directory is automatically created. Also, you can get an X.509 credential for CU to be used by your gateway.
  3. Decide on the application binaries (and optional scripts) to be used by your gateway. This step can be tricky and requires a CD to consider not only the software needed by gateway users, but also the security restrictions desired by the SA. In a simple configuration, the commsh configuration file allows for the execution of any binaries placed in the CSA directory. In an advanced configuration, the commsh configuration file allows for the execution of only a few scripts located in a protected directory. These scripts then call specific binary applications located in the CSA directory. In either case, the binary files can be updated by a CD. The advanced configuration is more secure since the protected scripts reference only particular files in the CSA directory. So, if a malicious user placed extra files in the CSA directory, they would not be of concern since they are not referenced by the protected scripts. Of course, this advanced configuration requires advance planning by a CD and approval by a SA since the protected scripts would not be updated very often.
  4. Simple Configuration: In a simple configuration setup, the CD places all binaries needed by the gateway in the CSA directory. The commsh configuration file is then written to allow any binaries in that directory to be executed with any number of command line parameters. The SA would install a commsh configuration file, like the one given here for a CSA named ntroport.
    # /etc/commsh.d/ntroport/commsh.conf - Simple Configuration
    # Allow ntrouser to run any binary in the CSA ntroport bin directory
    DirectAccess $TG_COMMUNITY/ntroport/bin/* **
    

    A single asterisk (*) will match any characters within a single argument. In general, this means it will not match a space unless the space is enclosed in quotation marks or escaped with a backslash. Additionally, an asterisk in the command itself will not match a forward slash (/), so it cannot reach into subdirectories. Name your binaries accordingly. In contrast, a double asterisk (**) should only appear at the end of a command restriction specification, and indicates that any additional parameters will be accepted. Remember that binaries placed in the $TG_COMMUNITY/CSA directory can be updated by any CD.

  5. Advanced Configuration: In an advanced configuration, a limited number of scripts can be executed by the CU via commsh. These scripts should be considered to be static (i.e. seldom require modification) since they will be put in a secure location accessible only by SAs. The scripts should reference binary executables which will be placed in the $TG_COMMUNITY/CSA directory by a CD. The binaries may be updated frequently. Thus it is the job of the CD to write the scripts appropriately. Since you have created the scripts, you should know the command line parameters. This is important since it is also the responsibility of the CD to write the configuration file referenced by commsh. For the syntax of the directives for the commsh.conf file, see the commsh.conf (5) man page. Your configuration file will be audited by a SA, but ultimately the onus is on you. Below is an example commsh configuration file for a specific CSA named ntroport.
    # /etc/commsh.d/ntroport/commsh.conf - Advanced configuration
    # Allow ntrouser to execute only two protected scripts, which in turn
    # call executables in ntroport's $TG_COMMUNITY/ntroport/bin directory
    DirectAccess /etc/commsh.d/ntroport/script1 -input * -output *
    DirectAccess /etc/commsh.d/ntroport/script2 -input * -output *
    

    Here, script1 and script2 are written by a CD and installed into a secure location by a SA. The scripts take two command line parameters, one for input and one for output. These scripts call the executables located in $TG_COMMUNITY/ntroport.

  6. Submit your commsh configuration file and (optional) scripts to a SA for the appropriate TG system. The SA will review your configuration file and scripts. If acceptable, they will be installed to a secure location such as /etc/commsh.d/CSA/.
  7. Install the binary executables (optionally referenced by your scripts) into the CSA directory $TG_COMMUNITY/CSA. Since this directory has group access permissions for CDs, you can easily update the files there. However, keep in mind that the scripts will be run with the CU's credential and thus all input/output files will be in the ~CU directory.
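
Because the protected scripts are specific to each CSA, none are shown in this document. Purely as an illustration of the advanced configuration, the sketch below shows one shape such a script might take. Everything in it is an assumption: the binary name "analyze" is invented, and the rule that file names must be bare names inside the CU's home directory is one possible policy, not a commsh requirement.

```shell
#!/bin/sh
# Hypothetical sketch of a protected wrapper such as script1.
# "analyze" is an invented binary name; real scripts are CSA-specific.

script1_main() {
    # Accept exactly: -input NAME -output NAME
    if [ "$#" -ne 4 ] || [ "$1" != "-input" ] || [ "$3" != "-output" ]; then
        echo "usage: script1 -input FILE -output FILE" >&2
        return 2
    fi
    in_name=$2
    out_name=$4

    # Assumed policy: files are bare names inside the CU's home directory, so
    # reject anything empty, option-like, or containing a path separator.
    for name in "$in_name" "$out_name"; do
        case $name in
            ''|-*|*/*)
                echo "script1: invalid file name: $name" >&2
                return 2
                ;;
        esac
    done

    # Fixed absolute path: only this one CSA binary can be reached.
    bin="${TG_COMMUNITY:?TG_COMMUNITY is not set}/ntroport/bin/analyze"
    "$bin" "$HOME/$in_name" "$HOME/$out_name"
}

# A deployed script would end with:  script1_main "$@"
```

The script itself never changes when a CD installs a new version of the binary; only the file under $TG_COMMUNITY/ntroport/bin is replaced.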


Testing and Debugging

The restrictions of the community account require CDs to adjust their testing and debugging practices. CDs should first use their own individual TeraGrid accounts to test new applications for the gateway to run on TeraGrid systems. When the application(s) are well-tested, the CD can work with the SA to install/modify the community scripts to enable the application(s) in the CU account. CDs have access to read and write files in the CU account (via the CG's permissions) for further debugging purposes.
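
When drafting DirectAccess rules, it can help to check which command lines a rule would admit before handing the file to the SA. The sketch below models only the wildcard behaviour described in this document: a single asterisk stays within one argument and, in the command, does not cross a "/", while a trailing double asterisk accepts any further parameters. It is not commsh's parser. The function names are hypothetical, the rule text is given without the leading DirectAccess keyword and with variables such as $TG_COMMUNITY already expanded, and other shell pattern characters such as "?" and "[" are interpreted by the shell here, which commsh may treat differently.

```shell
#!/bin/sh
# Model of the wildcard rules described above; NOT commsh's real matcher.

# glob_no_slash PATTERN STRING
# Compare '/'-separated components one at a time so '*' never spans a '/'.
glob_no_slash() {
    p=$1
    s=$2
    while :; do
        pc=${p%%/*}               # first component of the pattern
        sc=${s%%/*}               # first component of the string
        case $sc in
            $pc) ;;               # unquoted: $pc is used as a shell pattern
            *) return 1 ;;
        esac
        case $p in */*) p_more=1 ;; *) p_more=0 ;; esac
        case $s in */*) s_more=1 ;; *) s_more=0 ;; esac
        [ "$p_more" = "$s_more" ] || return 1   # different directory depth
        [ "$p_more" = 1 ] || return 0           # both fully consumed
        p=${p#*/}
        s=${s#*/}
    done
}

# rule_matches "RULE" CMD [ARG...]
rule_matches() {
    rule=$1
    shift
    set -f                        # stop '*' in the rule expanding to file names
    first=1
    status=0
    for pat in $rule; do
        if [ "$pat" = '**' ]; then
            set +f
            return 0              # assumed to be the final word of the rule
        fi
        if [ "$#" -eq 0 ]; then status=1; break; fi
        if [ "$first" = 1 ]; then
            glob_no_slash "$pat" "$1" || { status=1; break; }
            first=0
        else
            case $1 in
                $pat) ;;
                *) status=1; break ;;
            esac
        fi
        shift
    done
    set +f
    # Without a trailing '**', leftover arguments mean no match.
    [ "$status" -eq 0 ] && [ "$#" -eq 0 ]
}
```

For example, rule_matches '/usr/projects/ntroport/bin/* **' /usr/projects/ntroport/bin/sort in.txt succeeds, while the same rule rejects /usr/projects/ntroport/bin/sub/tool because the asterisk does not cross the extra "/".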

Instructions For System Administrators

These instructions assume that you have a functioning Globus Toolkit 4 installation as provided by the CTSS 4 Remote Compute Capability Implementation.

Once you have GT4 installed and configured, you will need to install the Community Shell (commsh) application. While this too has been documented elsewhere, detailed installation instructions are provided here so that we can configure specific directory paths.

  1. Download the latest version of commsh and copy it to a suitable location, i.e. somewhere you have write access. Configuring and building the commsh code can be performed as any user. However you will need root access to do the actual installation of the program. Assuming you have wget installed, you can use the following commands to get the code.
    wget http://security.ncsa.uiuc.edu/research/commaccts/downloads/commsh-latest.tar.gz
    tar xvzf commsh-latest.tar.gz
    
  2. Change into the newly extracted code directory and configure/build the commsh code. By default, the commsh binary will be installed in /usr/local/bin and the associated configuration file will be read from /etc/commsh.conf. You can change these locations by using the "--prefix=/alternate/path/" and "--sysconfdir=/alternate/config" command line parameters when running configure. Run "./configure --help" for more information.
    ./configure 
    gmake
    gmake install    # Note that you MUST be root to do this
    
  3. Patch GRAM so that it will interact with commsh properly. You will need to download a patch file and apply it to the globus-job-manager-script.pl script. This will make it possible to use commsh to implement command-based restrictions on GRAM jobs. Depending on how you installed CTSS4 or Globus/GT4, you may have more than one globus-job-manager-script.pl file to patch. For example, in a basic Globus/GT4 installation, you can find this script at $GLOBUS_LOCATION/libexec/. For a CTSS4 installation, you may have two separate scripts to patch, one for a globus-wsrf install and one for a prews-gram install. You may find the scripts at $TG_APPS_PREFIX/globus-wsrf-4.0.5-r0/libexec/ and $TG_APPS_PREFIX/prews-gram-4.0.5-r1/libexec/ respectively. (Of course the exact location depends on the version numbers of the packages that were installed.) First, get the patch file. Then change to the appropriate directories and execute the patch command. The example here assumes a Globus/GT4 installation where $GLOBUS_LOCATION has been set.
    cd $GLOBUS_LOCATION/libexec
    wget http://security.ncsa.uiuc.edu/research/commaccts/downloads/globus-job-manager-script-pl.diff
    patch -p0 < globus-job-manager-script-pl.diff
    

    Don't panic if the patch complains about "fuzz". The patch is designed to work with multiple versions of GRAM. As long as both hunks of the patch succeed, the patch has been successfully applied. You may see output similar to the following.

    patch -p0 < globus-job-manager-script-pl.diff
      patching file globus-job-manager-script.pl
      Hunk #1 succeeded at 7 with fuzz 1 (offset 5 lines).
      Hunk #2 succeeded at 87 with fuzz 2 (offset 6 lines).
    

    NOTE: If you chose a different location for installing commsh by setting "--prefix=" to something other than /usr/local in the configuration step above, you MUST edit $GLOBUS_LOCATION/libexec/globus-job-manager-script.pl and change the $FILTER_COMMAND variable to point to the installed location of the commsh binary.

  4. Create/Edit the /etc/commsh.conf configuration file. When you installed commsh, a sample configuration file was created at /etc/commsh.conf.sample. This file gives many examples of the types of restrictions you can impose with commsh; only a few of these directives are needed here. Copy the sample to /etc/commsh.conf and edit it for your particular setup with your favorite text editor.
    cp /etc/commsh.conf.sample /etc/commsh.conf
    chown root:root /etc/commsh.conf
    chmod 644 /etc/commsh.conf
    vim /etc/commsh.conf    # or use your favorite editor
    

    You need to add an entry for every CSA on your system. The configuration below shows the configuration for a single CSA named ntroport to be used by a CU named ntrouser.

    # /etc/commsh.conf
    # Allow any user except root to run commands through commsh
    AllowUser *
    DenyUser root
    # Check GRAM job submissions by the ntrouser user
    CheckUser ntrouser
    # Load the external configuration file for the ntroport CSA
    ReadUserConfig ntrouser /etc/commsh.d/ntroport/commsh.conf 
    
  5. Create a directory to store configurations and scripts specific to each CSA. For every CSA that requires access to your machine, you will need to create a directory under /etc/commsh.d to store (a) configuration files for commsh and (b) (optionally) scripts for that CSA which will be referenced by the configuration files. For example, if the CSA is named "ntroport", you would do the following command.
    mkdir -p -m 755 /etc/commsh.d/ntroport
    
  6. Install the commsh.conf file specific to the CSA. Within the directory you created above, you need to install a configuration file which will refer to (a) (optional) scripts contained in the same directory and/or (b) executables stored in the CSA's bin directory ($TG_COMMUNITY/ntroport/bin in this example). This configuration file may be created by the CD, but must be verified by a SA. For the syntax of the directives for the commsh.conf file, see the commsh.conf (5) man page. For additional information on how the account should be configured, use the Science Gateways Administration page to view what the CU requester submitted in the Community Account Request. Below we create a very simple configuration file for the ntroport CSA which allows a CU to run any executable in the $TG_COMMUNITY/ntroport/bin directory, with any number of command line parameters.
    echo 'DirectAccess $TG_COMMUNITY/ntroport/bin/* **' > /etc/commsh.d/ntroport/commsh.conf
    chown root:root /etc/commsh.d/ntroport/commsh.conf
    chmod 644 /etc/commsh.d/ntroport/commsh.conf
    
  7. Set the shell for the CU. In order to activate commsh parsing for a specific user, the CU's shell must be changed to commsh. This involves two steps.
    1. Edit the /etc/passwd file and find the line for the CU ntrouser. Set the shell (typically the last entry on the line) to /usr/local/bin/commsh.
    2. Edit the /etc/shells file and append /usr/local/bin/commsh to the list. Note that this second step need be done only once. You can do this with the following command.
    echo '/usr/local/bin/commsh' >> /etc/shells
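
On a resource hosting several CSAs, steps 5 to 7 are repeated for each one, so it may be convenient to wrap them in small functions. The following is a sketch under stated assumptions: the function names and the optional ROOT prefix (which allows a trial run in a scratch directory) are illustrative and not part of commsh. The append to the shells file is guarded so that running it twice does not add a duplicate line.

```shell
#!/bin/sh
# Sketch of steps 5 to 7 as reusable functions. The function names and the
# optional ROOT prefix are illustrative assumptions, not part of commsh.

# install_csa_config CSA RULE [ROOT]
install_csa_config() {
    csa=$1
    rule=$2
    root=${3:-}
    dir="$root/etc/commsh.d/$csa"

    mkdir -p "$dir" || return 1
    chmod 755 "$dir" || return 1
    printf '%s\n' "$rule" > "$dir/commsh.conf" || return 1
    chmod 644 "$dir/commsh.conf" || return 1
    # Ownership can only be set by root; skip it during a trial run.
    if [ "$(id -u)" -eq 0 ]; then
        chown root:root "$dir" "$dir/commsh.conf"
    fi
}

# register_shell SHELL_PATH [SHELLS_FILE]
register_shell() {
    shell_path=$1
    shells_file=${2:-/etc/shells}
    # -x matches whole lines, -F treats the path as a fixed string:
    # append only if the shell is not already listed.
    grep -qxF "$shell_path" "$shells_file" 2>/dev/null ||
        printf '%s\n' "$shell_path" >> "$shells_file"
}

# Example, run as root on the resource:
#   install_csa_config ntroport 'DirectAccess $TG_COMMUNITY/ntroport/bin/* **'
#   register_shell /usr/local/bin/commsh
```

On systems that manage local accounts with the usual shadow utilities, the CU's shell can also be changed with "usermod -s /usr/local/bin/commsh ntrouser" instead of editing /etc/passwd by hand; whether that applies depends on how accounts are managed at your site.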
    

Use Case Scenario

A CD will write the code for the gateway which calls the binaries stored on the supercomputing resources. We assume that the gateway has a copy of the CU's credential, and that the CD has placed the binaries called by the gateway in the appropriate location (i.e. $TG_COMMUNITY/CSA). We also assume that commsh has been configured using the "simple configuration" where binaries are called directly. A typical scenario is as follows:

  1. Copy one or more local files (where "local" means the files are accessible to the gateway) to the supercomputing resource in the CU's home directory.
  2. Run one or more executables stored in $TG_COMMUNITY/CSA, using the files just transferred as input, writing any output to the CU's home directory.
  3. Copy any generated output files from the CU's home directory back to the gateway.

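A simple example can be done by a CD on the command line. Here we assume that the binary sort has been placed in the CSA directory $TG_COMMUNITY/ntroport, which expands to /usr/projects/ntroport, and that the commsh configuration for the CSA permits that path. (The simple configuration shown earlier permits only binaries under $TG_COMMUNITY/ntroport/bin, so with that exact rule the binary would have to be installed in the bin subdirectory and the paths below adjusted to match.) Run the following commands from the gateway machine. Be sure to substitute the appropriate values for your supercomputing resource and associated directories.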

  1. Be sure you run all of the following commands using the CU's credential. If the credential is stored in a MyProxy server, you can fetch it with the myproxy-get-delegation command.
    # myproxy-get-delegation -l ntrouser -s myproxy.teragrid.org
    Enter MyProxy pass phrase: <password not echoed for security purposes>
    A credential has been received for user ntrouser in /tmp/x509up_u28289.
    
  2. Copy a local text file to the CU's home directory.
    # set JOBID=12345        # csh/tcsh syntax; in sh/bash use: JOBID=12345
    # globus-url-copy -v file:///full/path/to/file/input.txt \
                         gsiftp://your.server.com:2811/~/input.txt.$JOBID
    Source: file:///full/path/to/file/
    Dest:   gsiftp://your.server.com:2811/~/
      input.txt -> input.txt.12345
    
  3. Sort the input file and write the results to a new output file in the CU's home directory. Typically such commands would be written to a dynamically generated script file, but we show them here on the command line for the sake of simplicity.
    # globusrun-ws -F your.server.com -submit -streaming \
                   -c /usr/projects/ntroport/sort \
                      "--output=/home/ntrouser/output.txt.$JOBID" \
                      /home/ntrouser/input.txt.$JOBID
    

    For testing the older GRAM2 version, use the following command.

    # globusrun -o -r your.server.com/jobmanager \
                '&(executable=/usr/projects/ntroport/sort) \
                (arguments="--output=/home/ntrouser/output.txt.12345" \
                 /home/ntrouser/input.txt.12345)'
    
  4. Copy the resulting output file back to the local account.
    # globus-url-copy -v gsiftp://your.server.com:2811/~/output.txt.$JOBID \
                         file:///full/path/to/file/output.txt.$JOBID
    Source: gsiftp://your.server.com:2811/~/
    Dest:   file:///full/path/to/file/
      output.txt.12345
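
A gateway would normally issue these three steps from a script rather than by hand. The sketch below strings together the same commands shown above (stage in, run via GRAM4, stage out). The function names and the DRY_RUN switch are assumptions added for illustration; with DRY_RUN=1 the commands are printed instead of executed, which allows the generated command lines to be inspected without touching the resource. It assumes the CU's credential has already been fetched.

```shell
#!/bin/sh
# Sketch: the stage-in / run / stage-out flow as one function.
# Host name, paths and the DRY_RUN switch are placeholders or assumptions.

# run CMD [ARG...]: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# gateway_job HOST JOBID LOCAL_DIR
gateway_job() {
    host=$1
    jobid=$2
    local_dir=$3                  # absolute path on the gateway machine
    remote_home=/home/ntrouser    # CU home directory, as in the example above

    # 1. Copy the input file to the CU's home directory.
    run globus-url-copy -v "file://$local_dir/input.txt" \
        "gsiftp://$host:2811/~/input.txt.$jobid" || return 1

    # 2. Run the CSA binary on the staged input.
    run globusrun-ws -F "$host" -submit -streaming \
        -c /usr/projects/ntroport/sort \
        "--output=$remote_home/output.txt.$jobid" \
        "$remote_home/input.txt.$jobid" || return 1

    # 3. Fetch the output file back to the gateway.
    run globus-url-copy -v "gsiftp://$host:2811/~/output.txt.$jobid" \
        "file://$local_dir/output.txt.$jobid"
}

# Example:  DRY_RUN=1; gateway_job your.server.com 12345 /full/path/to/file
```

Tagging every file name with the job ID, as the manual example does, keeps concurrent gateway users from overwriting each other's files in the shared CU home directory.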
    
