TeraGrid Client Software - Basic Client Toolkit
From TeraGrid Wiki
This page discusses the evolution of a prototype for a package of software clients that can be used by campus researchers from their laptops or workstations to perform basic interactions with TeraGrid resources.
To gain some initial traction while specific requirements for the TeraGrid Client Software effort were still being hammered out, we explored a preliminary prototype. This prototype was based on Option 1 as detailed in our proposal to the EOT Team. In short, the goal was to provide an initial implementation of a software distribution that campus IT groups could offer their users to give them access to basic TeraGrid services such as remote login.
The prototype was based on the Virtual Data Toolkit (VDT), which, like CTSS, uses Pacman for packaging and distribution. A Pacman cache was set up for this prototype that simply pointed to the VDT-Client package in the VDT's Pacman cache. OSG uses a similar technique to provide a layer of extra services on top of what the VDT provides.
The prototype was then used to install the client tools. The install succeeded, but the experience had a number of shortcomings:
- The installation footprint was huge (over 1GB, IIRC). This is clearly unacceptable for what is supposed to be a bundle of basic client tools.
- User-friendliness was lacking. The install asked several questions that a typical user would not find straightforward to answer. The output shown during installation could also be greatly reduced and improved to make for a better experience.
- The install required network connectivity. It is unclear at the moment whether this is acceptable for this type of toolkit; campus IT groups may well prefer a CD/DVD-based install.
Determining what would need to be done to address these issues requires further exploration.
To address the size and duration of the install, the TeraGrid-Client package from the previous effort was modified to specify, at a finer granularity, which VDT components are needed. Specifically, instead of using the VDT-Client package, which encompasses all client-oriented software in the VDT, it now uses only the CA-Certificates, MyProxy, and GSIOpenSSH packages. As a result, the install now takes about 2-3 minutes and occupies about 125MB.
In the short term, I'd like to continue work on the prototype as follows:
- The prototype currently supports only remote login. Packages will likely need to be added to support job submission and data movement.
- Most (all?) of the questions asked during the install can be scripted away, making for a much smoother experience.
- 125MB still seems big. Any easy way to shrink this further?
- I noticed that part of what the VDT installs is Perl. In addition, after a user runs the VDT's setup.sh script to set the environment variables that make the VDT's programs easy to access, the VDT's Perl interpreter becomes the first one in the user's PATH. I'll look for a way to prevent this, and will also check whether other software packages cause similar surprises.
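One possible fix for the Perl issue is to leave the VDT's directories in PATH but move them to the end after setup.sh has been sourced, so that the system's Perl is found first. The POSIX sh sketch below assumes the VDT installs under a single prefix directory; the function name demote_path_prefix is hypothetical and not part of the VDT. Note that it demotes every VDT directory, not just the one containing Perl, so any other VDT command that shares a name with a system command would also lose precedence.

```sh
# Hypothetical helper (not part of the VDT): move every PATH entry that is
# the given prefix, or lies under it, to the end of PATH. The relative order
# of entries within each group is preserved.
# POSIX sh has no local variables, so the helper uses _dp_-prefixed globals.
demote_path_prefix() {
    _dp_prefix=$1
    _dp_front=""
    _dp_back=""
    _dp_old_ifs=$IFS
    IFS=:                       # split PATH on colons
    for _dp_dir in $PATH; do
        case $_dp_dir in
            "$_dp_prefix"|"$_dp_prefix"/*)
                _dp_back="${_dp_back:+$_dp_back:}$_dp_dir" ;;
            *)
                _dp_front="${_dp_front:+$_dp_front:}$_dp_dir" ;;
        esac
    done
    IFS=$_dp_old_ifs
    # Join the two groups, adding a colon only if both are non-empty.
    PATH="$_dp_front${_dp_front:+${_dp_back:+:}}$_dp_back"
    export PATH
}
```

A user would source setup.sh first and then call, for example, `demote_path_prefix /path/to/vdt` (a placeholder for the actual install directory).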
Over the last month, development of the Basic Client Toolkit has progressed well. It now satisfies its basic requirements and is ready for more thorough, multi-platform testing. A basic description of the toolkit's implementation follows:
Installing the toolkit involves two steps: (1) downloading a tarball that contains some bootstrapping content and (2) running an installer from this tarball, which uses a Pacman cache available over the Internet to complete the installation. The installation scripts are implemented in Bourne shell, but the package also requires Python because Pacman depends on it. The current prototype's tarball is about 1MB, while the full installation is about 440MB. This is considerably larger than the 125MB mentioned previously because the prototype now includes GridFTP and GRAM client tools that it did not include before. During testing, installation over a wireless LAN connection took around 5 minutes.
The following illustrates a shell session in which a user unpacks the tarball, runs the install script, and then runs the uninstall script. Note that while output to the user's terminal is fairly terse during installation, more detailed information is written to an install.log file.
```
[greg@kegonsa ~]$ tar xzf teragrid-client-0.01.tgz; cd teragrid-client-0.01
[greg@kegonsa teragrid-client-0.01]$ date; ./install-teragrid-client; date
Wed Feb 6 11:24:50 CST 2008
Installing the TeraGrid Client Toolkit.
Searching for packages...
[***************************************************************************]
Installing packages...
[***************************************************************]
TeraGrid Client Toolkit successfully installed.
Wed Feb 6 11:29:25 CST 2008
[greg@kegonsa teragrid-client-0.01]$ ./uninstall-teragrid-client
Uninstalling the TeraGrid Client Toolkit.
 * Removed ant.
 * Removed berkeley-db.
 * Removed expat.
 * Removed fetch-crl.
 * Removed globus.
 * Removed gpt.
 * Removed install-marker.
 * Removed jdk1.5.
 * Removed licenses.
 * Removed logrotate.
 * Removed o..pacman..o.
 * Removed pacman-3.21.
 * Removed perl.
 * Removed post-install.
 * Removed post-setup.
 * Removed setup.csh.
 * Removed setup.sh.
 * Removed trusted.caches.
 * Removed vdt.
 * Removed vdt-app-data.
 * Removed vdt-install.log.
 * Removed vdt-questions.csh.
 * Removed vdt-questions.sh.
 * Removed tarball-file-list.
TeraGrid Client Toolkit successfully uninstalled.
```
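The split between terse terminal output and a detailed install.log could be implemented with a small wrapper around each installation phase. This is a minimal POSIX sh sketch, not the actual install-teragrid-client code (which draws progress bars); run_step and the LOG variable are hypothetical names.

```sh
# Hypothetical log file location; the session above shows install.log
# being created in the unpacked tarball directory.
LOG=${LOG:-install.log}

# run_step MESSAGE COMMAND [ARGS...]
# Print a one-line status to the terminal while appending the command's
# full stdout and stderr to $LOG. Returns the command's exit status.
run_step() {
    _rs_msg=$1
    shift
    printf '%s... ' "$_rs_msg"
    printf '=== %s: %s\n' "$_rs_msg" "$*" >>"$LOG"
    if "$@" >>"$LOG" 2>&1; then
        echo "done."
        return 0
    else
        _rs_rc=$?               # exit status of the failed command
        echo "failed (see $LOG)."
        return "$_rs_rc"
    fi
}
```

Each phase of the installer would then be wrapped as `run_step "Installing packages" <command>`, so a failure points the user at the log instead of flooding the terminal.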
Note that the uninstaller script returns the install directory to the state it was in before installation, except that it leaves the installer-generated install.log file in place. Also, if the installer fails for any reason, it undoes all the actions it performed before the point of failure.
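The uninstall and rollback behavior can be sketched by recording the install directory's original top-level contents before installing and later deleting everything that is not on that list. The sketch assumes, as the uninstall output in the session above suggests, that tarball-file-list holds that record and is itself removed last; all function names are hypothetical. The same removal routine doubles as the rollback when the installation command fails.

```sh
# Hypothetical record of the directory's original top-level entries.
LIST=tarball-file-list

# Run from the install directory before anything is installed.
record_original_contents() {
    _entries=$(ls -A) || return 1
    printf '%s\n' "$_entries" >"$LIST"
}

# Remove every top-level entry (including dotfiles) not in $LIST,
# keeping install.log, then remove $LIST itself.
remove_added_contents() {
    for _f in * .[!.]* ..?*; do
        [ -e "$_f" ] || [ -L "$_f" ] || continue   # unmatched glob pattern
        [ "$_f" = install.log ] && continue        # keep the installer log
        [ "$_f" = "$LIST" ] && continue            # removed last, below
        if ! grep -qxF -- "$_f" "$LIST"; then
            rm -rf -- "$_f" && echo " * Removed $_f."
        fi
    done
    rm -f -- "$LIST" && echo " * Removed $LIST."
}

# Run an install command; on failure, undo everything it added.
install_with_rollback() {
    record_original_contents || return 1
    if ! "$@"; then
        echo "Installation failed; undoing changes." >&2
        remove_added_contents >/dev/null
        return 1
    fi
}
```

Keying removal on a recorded list, rather than on a list of known package names, means the uninstaller does not need updating as packages are added to the toolkit.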
Outstanding issues concerning the Basic Client Toolkit are:
- The size is now even larger than before. Given the relatively small number of client tools that the toolkit needs to make available, 440MB seems excessive. One way of addressing this, being selective about which VDT packages are used, has already been explored. Perhaps splitting packages up on the server side, or even manually paring down the installation directory from the install script, could help further. A related issue is the time the installation takes, although given that our target audience is on campus networks, the 5-minute install time measured over a wireless LAN connection suggests that we are OK there.
- The toolkit does not support Windows. On the one hand, this seems bad because many campus users will likely be on Windows machines. On the other hand, it is unlikely that most of the toolkit's constituent programs support Windows anyway (this needs to be explored further). A Java version of a GSI-enabled SSH client does exist, but if it is the only tool that makes sense to deploy on a Windows client machine, then leaving Windows outside the scope of this toolkit seems reasonable.
- There may still be some client tools that make sense to deploy via the toolkit but are not yet included. For example, tgcp and one other program are client tools available on TeraGrid resources. Investigation shows that both are Perl scripts, so including them in the toolkit would be straightforward, although it is not yet known how much work would be needed to make them function correctly from campus resources. In addition, client-side SRB functionality may be useful to include in the toolkit. It is not currently included because SRB is not packaged as part of the VDT.
- So far, the toolkit has been tested on only a few Linux platforms. Cross-platform testing can easily be done at the University of Wisconsin using the NMI Build and Test laboratory, and will begin shortly. A list of important platforms needs to be drawn up to guide this testing.
- Documentation on using the toolkit still needs to be written.
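As a first step toward paring down the 440MB footprint, it helps to see which top-level components of the install directory take the most space. The helper below is a hypothetical POSIX sh sketch built on du and sort; it reports sizes in kilobytes, largest first.

```sh
# Hypothetical helper: print "size-in-KB  name" for each top-level entry
# of an install directory (dotfiles included), largest first.
# Errors from unmatched glob patterns are discarded.
report_usage() {
    _ru_dir=${1:-.}
    ( cd "$_ru_dir" && du -sk -- * .[!.]* 2>/dev/null ) | sort -rn
}
```

Running `report_usage /path/to/install/dir | head` (the path is a placeholder) would show which components, such as the bundled JDK or Perl, are the best candidates for trimming.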
At this point, the toolkit can provide a foundation for expanded functionality, such as the Scale-Up Client Toolkit that has been planned as a second phase of this effort.
Documentation has been written on installing and using the toolkit in its current form. A preliminary version of the toolkit, packaged as the TeraGrid Client Toolkit (TCT), has been made available for people to check out. A "release" page has been written that provides an introduction to the TCT and links to the package and documentation.