Storage/Data

You can store your files in your home, work, and/or scratch directories. All three are mounted on the submit nodes and on every compute node. When you log in, each directory is assigned an environment variable that you can use for quicker access to that space and reference in your scripts.

NOT BACKED UP: No data, anywhere on the cluster, is backed up. We recommend using the Research File Storage service if you need backups.

To see any of the information below, run the command mystats.

  • Home Directory ($HOME): Your home directory is a space only you have access to. If you wish to share your data, see the $WORK directory below. Your quota is 100 GB of disk space and 100,000 files.

    The path to your home directory is (replace username with your KU Online ID):

    /home/username
    
  • Work Directory ($WORK): Your work directory is a shared space for collaborating with your group. The quota is based on how much storage the group's owner has purchased, either directly or indirectly through purchasing nodes.

    The path to your work directory is (replace groupname with your group's name and username with your KU Online ID):

    /panfs/pfs.local/work/groupname/username
    

    If you do not have a $WORK directory, your owner group either has not purchased storage or purchased nodes in 2013, which did not include storage.

  • Scratch Directory ($SCRATCH): Your scratch directory is a temporary space for your data processing. The scratch quota is finite, but it is set for the whole volume rather than per user, and CRC staff maintain it.

    Files 60 or more days old will be deleted (see the sketch below for how to find files at risk).

    The path to your scratch directory is:

    /panfs/pfs.local/scratch/groupname/username
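
    If you want to check which of your files are approaching the 60-day limit, a standard find command works. A quick sketch, assuming the purge is based on file modification time:

    # List files in your scratch directory not modified in the last 50 days
    find $SCRATCH -type f -mtime +50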
    

You can use $HOME, $WORK, and $SCRATCH in your submit scripts to make it easier to get around the file systems.

# In an interactive session or script, change to your work directory
cd $WORK
# In a Moab submit script, direct job output to a file in scratch
#MSUB -o $SCRATCH/job12/out.txt
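
Putting these together, a minimal Moab submit script might look like the following sketch; the job name, directory names, and program are hypothetical:

#!/bin/bash
# Hypothetical job name
#MSUB -N example_job
# Write job output and errors to scratch
#MSUB -o $SCRATCH/example/out.txt
#MSUB -e $SCRATCH/example/err.txt

# Run from your group's work directory
cd $WORK
./my_program                # hypothetical program
cp results.dat $HOME        # copy small results back to your home directory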

Quota

Each of the above locations ($HOME, $WORK, and $SCRATCH) has an enforced quota. To see how much of each quota you are using, log in to the cluster and run mystats.

This will produce output similar to the following:

------------------------------- Storage Variables ------------------------------
| Variable     Path                                                            |
| $HOME        /home/username                                                  |
| $WORK        /panfs/pfs.local/work/groupname/username                        |
| $SCRATCH     /panfs/pfs.local/scratch/groupname/username                     |
--------------------------------------------------------------------------------

--------------------------------- Disk Quotas ----------------------------------
| Disk         Usage (GB)     Limit    %Used   File Usage       Limit   %Used  |
| $HOME             33.51    100.00    33.51        99054      100000   99.05  |
| $WORK           6436.80  13969.84    46.08       296488           0       0  |
| $SCRATCH       39533.15  55879.35    70.75            1           0       0  |
--------------------------------------------------------------------------------
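
Beyond mystats, standard tools can help you track usage against these quotas. For example (the directory name myproject is hypothetical):

# Summarize the disk usage of a single directory
du -sh $WORK/myproject
# Approximate your file-count usage by counting regular files in your home directory
find $HOME -type f | wc -l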

Transferring your files

There are many ways to transfer files in and out of the cluster. CRC recommends using either Globus or rsync.

  • Globus: Globus works through your web browser, so it works on Windows, Linux, and macOS.

To start using Globus, navigate to https://www.globus.org/app/transfer in your browser. From there, choose University of Kansas as your institution. You will be redirected to log in with your KU ID.

You should then see a screen with two panels. If this is your first time using Globus, you'll need to set up your personal computer as a Globus endpoint.

To do this, click the link at the bottom right of the screen that says "Get Globus Connect Personal". Then follow the prompts: name your computer, generate a key, download the client, and enter the previously generated key into the client. Your computer is now an endpoint.

Back on the page with the two panels, you need to choose the two endpoints you'd like to copy between. The cluster endpoint is named KU CRC Data Transfer Node. The other endpoint will be whatever you named your computer in the previous step.

Now you should be able to see the files both on your computer and in your cluster home directory. From here, transferring is a simple click-and-drag from source to destination.

 

  • rsync: rsync is a command-line tool for Linux and macOS only. It is used to transfer files back and forth from the terminal and has no GUI.

You must be on KU's network or connected to KU Anywhere to access the Data Transfer Node (DTN).

The KU Community Cluster supports SCP, SFTP, and rsync for transferring files:

Host: transfer.hpc.crc.ku.edu
Port: 22

The general form is below; note that one side must be local, since rsync cannot copy directly between two remote hosts:

rsync -avP source_path username@transfer.hpc.crc.ku.edu:destination_path

For example, to copy a file from your home directory on your local computer (e.g., ~/foo.txt) to your home directory on the HPC, on the command line, enter (replace username with your KU Online ID username):

rsync -avP ~/foo.txt username@transfer.hpc.crc.ku.edu:~/foo.txt
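
rsync can also copy entire directories and resume interrupted transfers. A sketch, with hypothetical directory names:

# Push a local directory to your cluster home directory
rsync -avP ~/mydata/ username@transfer.hpc.crc.ku.edu:~/mydata/
# Pull results back from the cluster to your local machine
rsync -avP username@transfer.hpc.crc.ku.edu:~/results/ ~/results/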

 

  • SCP: Similar to rsync above, scp is a command run in the Linux or macOS terminal. This command-line utility is included with OpenSSH.

As with rsync, you must be on KU's network or connected to KU Anywhere, and you connect to the same Data Transfer Node (transfer.hpc.crc.ku.edu, port 22):

scp username@host1:~/file1 username@host2:~/file1_copy

For example, to copy a file from your home directory on your local computer (e.g., ~/foo.txt) to your home directory on the HPC, on the command line, enter (replace username with your KU Online ID username):

scp ~/foo.txt username@transfer.hpc.crc.ku.edu:~/foo.txt
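
scp can also copy directories with the -r flag. A sketch, with hypothetical names:

# Copy a local directory recursively to your cluster home directory
scp -r ~/mydata username@transfer.hpc.crc.ku.edu:~/
# Copy a single file from the cluster back to your local machine
scp username@transfer.hpc.crc.ku.edu:~/results.txt ~/results.txt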

Recovering your files

One of the features of our cluster filesystem is snapshots: a daily capture of the files in a given directory. All snapshots are user accessible, but only for volumes owned by a group the user is part of. Snapshots are read-only; if you accidentally delete a file, you can retrieve it from a snapshot for up to seven days afterward.

Snapshots are stored in the .snapshot directory at the root of your work or home directory. This directory is hidden and won't appear in listings (ls) of that directory. Snapshots are captured for $HOME and $WORK directories, but not for $SCRATCH.

For example, say you're working in your work directory (i.e., /panfs/pfs.local/work/groupname/username) and you accidentally delete a file named oops.txt. To restore that file from a previous snapshot, navigate to the .snapshot directory for your group's work volume; there you will find directories containing snapshots from the past seven days. Each of these directories contains a file structure mirroring /panfs/pfs.local/work/groupname as it was when that snapshot was taken. You can navigate into those directories and copy the file(s) you accidentally deleted back to your work directory.

# Go to the snapshot directory for your group's work volume
cd /panfs/pfs.local/work/groupname/.snapshot
# List the available snapshots (one per day for the past seven days)
ls
# Enter the snapshot for the date you want, then your own directory
cd date-of-snapshot.automatic
cd username
# Copy the deleted file back to your work directory
cp oops.txt /panfs/pfs.local/work/groupname/username

If a file was heavily modified after the snapshot was taken, the snapshot will not contain the most recent changes; it holds the files exactly as they were when that day's snapshot was captured.

Snapshots of home directories can also be found in

/home/.snapshot/date-of-snapshot.automatic/username

Due to the way that directory is set up, you cannot ls inside the date-of-snapshot.automatic directory; instead, you must go directly to your own home directory, as shown above.
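
For example, to restore a deleted file from a home snapshot (date-of-snapshot is the same placeholder used above):

# You cannot ls the date-of-snapshot.automatic directory itself,
# so cd straight into your own username directory
cd /home/.snapshot/date-of-snapshot.automatic/username
cp oops.txt $HOME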

Snapshots are on a rolling seven-day purge, so if you accidentally delete a file, you must restore it within seven days or it will be gone forever.


CRC Help

If you need any help with the cluster or have general questions related to the cluster, please contact crchelp@ku.edu.

In your email, please include your submission script, any relevant log files, and the steps you took to reproduce the problem.
