Distributed File Systems


There are currently three large distributed file systems running on Greenplanet: /DFS-L, /DFS-B, and /XXX-L.

/DFS-L
-- Main storage area. User data storage directory at /DFS-L/DATA/groupname/username
-- 1 petabyte Lustre 2.12 system with 8 Object Storage Targets (OSTs) spread over 2 Object Storage Servers (OSSs), plus a single Metadata Server (MDS).
-- NOT backed up, but hardware is fault tolerant.
-- $2/TB/month to store data (first TB per group is free)
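The recharge arithmetic above can be sketched in a few lines. This is an illustrative helper, not a billing tool; the function name and the assumption that the free TB is simply deducted before billing are mine:

```python
def monthly_storage_cost(tb_used: float, rate_per_tb: float = 2.0, free_tb: float = 1.0) -> float:
    """Monthly recharge for a group's /DFS-L usage: $2/TB/month, first TB free."""
    return max(0.0, tb_used - free_tb) * rate_per_tb

print(monthly_storage_cost(5.5))  # 4.5 billable TB -> 9.0 dollars/month
```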

/DFS-B
-- Auxiliary storage area. User data storage directory at /DFS-B/DATA/groupname/username
-- Approximately 140TB BeeGFS 7.2 system (shares a subset of object and metadata storage targets with /DFS-L)
-- NOT backed up, but hardware is fault tolerant.
-- Usage of DFS-B is added to DFS-L for recharge purposes
-- Deprecated storage (was initially used when Lustre was unable to handle many small files efficiently) 

/XXX-L
-- Scratch storage area. User scratch directory at /XXX-L/SCRATCH/groupname/username
-- 140TB Lustre 2.12 system with 12 OSTs on 2 OSSs, 1 MDS
-- NOT backed up, but hardware is fault tolerant.
-- Uses older, repurposed disks
-- Testing area for newer stable Lustre versions (e.g., /XXX-L will run 2.15 for several months before /DFS-L is upgraded)
-- NO CHARGE for usage


Previous Updates


  We are updating the system software on Greenplanet, but only a block of nodes at a time. Once the compute nodes are all updated, the old 360 TB distributed file systems (/beegfs and /lustre) will start being cleared, updated, and combined into the new 720 TB file systems (/DFS-B and /DFS-L). In preparation, existing data is being pre-copied to make the final transition quick. At the moment, /beegfs, /lustre, and /DFS-L are mounted on all nodes, but /DFS-B is only accessible from the new system (gplogin2).

  -- On the old system (gplogin1, gplogin3 and the atlas login nodes), please start to use:

  -- On the new system (gplogin2), please use:

There will be plenty of warning before /beegfs/DATA and /lustre/DATA are moved.

Data in /beegfs/SCRATCH and /lustre/SCRATCH may be moved earlier to relieve file system pressure, but not while affected users are logged in or running jobs.


Both Lustre and BeeGFS seem to be running fine. The only errors logged are due to nodes failing for other reasons. I (Nate Crawford) have been using /beegfs/SCRATCH for all operations, like compiling software, that I used to do on the NFS arrays (/data19, etc.), and have not noticed anything unusual. Most of the Modeling Facility Slurm submission scripts now try to use /lustre/SCRATCH for multi-node jobs, also without reported errors.

I encourage users who are running out of space on the /data NFS mounts to try the /beegfs/DATA/group/user and /lustre/DATA/group/user filesystems. Actually, I encourage you to go through your own data and remove old junk first :)


We are testing two new distributed filesystems on Greenplanet. For now, please use them only for temporary calculation data. Once we observe them under actual user load, we may find problems that require major changes (up to a total disk wipe) to fix. After thorough testing, one or both will be opened up for non-scratch usage.

Users will have individual scratch directories at /lustre/SCRATCH/$group/$USER and /beegfs/SCRATCH/$group/$USER
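On a typical Unix setup, that per-user scratch path can be derived from the running process's identity. A minimal sketch, assuming the Greenplanet group directory name matches the user's primary Unix group (which may not hold for every account):

```python
import getpass
import grp
import os

def scratch_dir(root: str = "/lustre/SCRATCH") -> str:
    """Build <root>/<group>/<user>, assuming the group directory name
    equals the primary Unix group of the current process."""
    group = grp.getgrgid(os.getgid()).gr_name
    return os.path.join(root, group, getpass.getuser())
```

Passing `root="/beegfs/SCRATCH"` gives the BeeGFS equivalent.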

Results from initial testing show that Lustre performs better with large (>1MB) files, while BeeGFS handles directories with hundreds of small (~1KB) files better. Please test your own calculations and let us know.
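One crude way to see which profile matches your workload is to time writing one large file against many small ones. The sketch below uses arbitrary sizes and counts; point the directory at your own scratch area rather than the temporary directory used here:

```python
import os
import tempfile
import time

def time_writes(paths_and_sizes):
    """Write each (path, nbytes) file and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    for path, nbytes in paths_and_sizes:
        with open(path, "wb") as f:
            f.write(os.urandom(nbytes))
    return time.perf_counter() - start

def compare_profiles(target_dir, big_mb=64, n_small=1000, small_bytes=1024):
    """Time one big_mb-MB file versus n_small files of small_bytes each."""
    big = [(os.path.join(target_dir, "big.bin"), big_mb * 1024 * 1024)]
    small = [(os.path.join(target_dir, f"small_{i}.bin"), small_bytes)
             for i in range(n_small)]
    return time_writes(big), time_writes(small)

if __name__ == "__main__":
    # Replace with a directory under /lustre/SCRATCH or /beegfs/SCRATCH to test there.
    with tempfile.TemporaryDirectory() as d:
        t_big, t_small = compare_profiles(d)
        print(f"1 x 64MB: {t_big:.3f}s   1000 x 1KB: {t_small:.3f}s")
```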

Both filesystems are running on top of the 240TB ZFS array on nas-50-7, but can be expanded to include additional disk arrays in the same namespace. As they use the same pool of disks, writing data to /beegfs will reduce the available space on /lustre (and vice versa).
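Because both filesystems draw from the same pool, the shared-capacity effect can be observed directly with Python's `shutil.disk_usage`. A small sketch (the commented mount paths are Greenplanet's, not portable):

```python
import shutil

def free_tb(mount: str) -> float:
    """Free space at a mount point, in decimal terabytes."""
    return shutil.disk_usage(mount).free / 1e12

# On Greenplanet, writing data to either filesystem should lower both
# numbers, since /beegfs and /lustre share the same ZFS pool:
# print(free_tb("/lustre"), free_tb("/beegfs"))
```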

Both filesystems are mounted on all nodes with Infiniband. The few special non-IB nodes only have access to /beegfs at this time.