
Depending on the agreement between CNAF and experiments, data transfers can be performed with or without Storage Resource Manager (SRM), which at Tier-1 is StoRM. SRM is typically used when the experiment maintains a Virtual Organization (VO).

Other protocols commonly used at INFN-Tier-1 are POSIX, GridFTP, XrootD and WebDAV/HTTP.

Data transfers without SRM

To transfer a file without SRM, globus-url-copy is commonly used. It is a command-line program for file transfers that implements several protocols, including GridFTP, an extension of FTP. It supports parallel transfer streams and third-party copy.

A personal certificate is required in order to use gridFTP. Also, the user DN has to be enabled on the gridFTP server by the sysadmin. The DN can be obtained from the certificate using the command:

openssl x509 -noout -in $HOME/.globus/usercert.pem -subject

Then, it should be communicated to the User Support team in order to be enabled.
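The DN extraction can be wrapped in a small helper. This is a minimal sketch: strip_subject_prefix and print_user_dn are hypothetical names, and it assumes an OpenSSL build that accepts -nameopt compat, which requests the slash-separated form used by grid services instead of the comma-separated form printed by default on recent OpenSSL releases.

```shell
# Remove the leading "subject=" label (and any blank after it) from the
# line printed by `openssl x509 -subject`.
strip_subject_prefix() {
    sed -e 's/^subject= *//'
}

# Print the DN of a certificate in the slash-separated form.
# $1: certificate file (defaults to the usual location under ~/.globus).
print_user_dn() {
    openssl x509 -noout -in "${1:-$HOME/.globus/usercert.pem}" \
        -subject -nameopt compat | strip_subject_prefix
}

# Usage:
#   print_user_dn                      # default certificate location
#   print_user_dn /path/to/usercert.pem
```

The output of print_user_dn is the string to send to the User Support team.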

Before performing the actual file transfer, it is necessary to generate a proxy with the command:

grid-proxy-init

By default, the proxy lasts 12 hours. In order to extend proxy life time, the following options can be used:

    -valid HOURS:MINUTES
    -hours HOURS

For example:

-bash-4.2$ grid-proxy-init -hours 48
Your identity: /DC=org/DC=terena/DC=tcs/C=IT/O=Istituto Nazionale di Fisica Nucleare/CN=Andrea Rendina arendina@infn.it
Enter GRID pass phrase for this identity:
Creating proxy ...................................... Done
Your proxy is valid until: Sun Aug 2 17:47:32 2020
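In scripts it can be convenient to create a new proxy only when the current one is missing or close to expiry. The sketch below assumes that grid-proxy-info -timeleft prints the remaining lifetime in seconds; proxy_needs_renewal and ensure_proxy are hypothetical helper names, and the 48-hour lifetime is simply the value used in the example above.

```shell
# Succeed when fewer than $2 hours remain.
# $1: seconds left, as printed by `grid-proxy-info -timeleft`
# $2: minimum acceptable lifetime in hours
proxy_needs_renewal() {
    [ "$1" -lt $(( $2 * 3600 )) ]
}

# Create a new 48-hour proxy only when the current one is missing or
# about to expire. $1: minimum acceptable lifetime in hours (default 1).
ensure_proxy() {
    min_hours="${1:-1}"
    left=$(grid-proxy-info -timeleft 2>/dev/null) || left=0
    # Treat empty or non-numeric output as "no proxy".
    case "$left" in
        ''|*[!0-9-]*) left=0 ;;
    esac
    if proxy_needs_renewal "$left" "$min_hours"; then
        grid-proxy-init -hours 48
    fi
}

# Usage: ensure_proxy 2    # renew if less than 2 hours are left
```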

After that, the transfers can be performed. Whether a read or a write succeeds depends on the permissions and the access control lists set on the filesystem.
To write:

globus-url-copy <local_path>/file gsiftp://gridftp-plain-virgo.cr.cnaf.infn.it:2811/<remote_path>/file

To read, i.e. to get a local copy:

globus-url-copy gsiftp://gridftp-plain-virgo.cr.cnaf.infn.it:2811/<remote_path>/file local_copy

The <remote_path> (something like: /storage/gpfs_data/experiment) will be communicated to the user by the User Support team.
Note that the globus-url-copy command can perform a third-party copy of a file, i.e. a direct copy between two remote locations without keeping a local copy on your own device.
This is done by giving a gsiftp:// URL as both source and destination:

globus-url-copy gsiftp://gridftp-plain-virgo.cr.cnaf.infn.it:2811/<source_remote_path>/file gsiftp://gridftp-plain-virgo.cr.cnaf.infn.it:2811/<destination_remote_path>/new_file

You can also use the gfal tools, which are explained in the following paragraphs, for example to list the files of a directory or to remove a file.
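As an illustration, listing and removing files on a GridFTP endpoint with the gfal tools might look like the sketch below. The helper names are hypothetical, port 2811 is taken from the examples above, and the host and path in the usage comment are placeholders to be replaced with the values given by the User Support team.

```shell
# Build a gsiftp URL from a host name and an absolute remote path.
gsiftp_url() {
    printf 'gsiftp://%s:2811%s\n' "$1" "$2"
}

# Long listing of a remote directory. $1: host, $2: absolute path.
list_remote_dir() {
    gfal-ls -l "$(gsiftp_url "$1" "$2")"
}

# Remove a single remote file. $1: host, $2: absolute path.
remove_remote_file() {
    gfal-rm "$(gsiftp_url "$1" "$2")"
}

# Usage (a valid proxy is required):
#   list_remote_dir gridftp-plain-virgo.cr.cnaf.infn.it /<remote_path>
#   remove_remote_file gridftp-plain-virgo.cr.cnaf.infn.it /<remote_path>/file
```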

Data transfers with SRM

In this case, a voms-proxy is needed (see above for details on proxy generation).

In case of local to remote transfer, you have to request the storage space in the destination filesystem and this is done with the command clientSRM PTP, where PTP stands for Prepare To Put. In case of remote to local transfer the command is clientSRM PTG, where PTG stands for Prepare To Get.

clientSRM PTP -v NIG -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444 -s srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/test.mt.002
clientSRM PTG -v NIG -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444 -s srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/test.mt.002
where:
  • -v sets the verbosity level,
  • -e specifies the endpoint,
  • -s specifies the SURL (the destination for PTP, the source for PTG), which is composed of a space token (virgo4 in the example) and the file path. The space token will be communicated by the Experiment Support group.

The complete list of options is shown by the command clientSRM PTP -help or in [17]. The output should be something like this:

============================================================
Behavior:
poll: 0
verbose level: NIG
============================================================
Input data:
authorizationID=NULL
storageSystemInfo=NULL
arrayOfFileRequests (size=1)
[0] targetSURL="srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/test.mt.002"
[0] expectedFileSize=NULL
userRequestDescription=NULL
overwriteOption=NULL
desiredTotalRequestTime=NULL
desiredPinLifeTime=NULL
desiredFileLifeTime=NULL
desiredFileStorageType=NULL
targetSpaceToken=NULL
targetFileRetentionPolicyInfo=NULL
transferParameters=NULL
============================================================
Sending PtP request to: httpg://storm-fe-archive.cr.cnaf.infn.it:8444
Before execute:
Afer execute:
Request Status Code 17
Poll Flag 0
============================================================
Request status:
statusCode="SRM_REQUEST_QUEUED"(17)
explanation=NULL
============================================================
SRM Response:
requestToken="fd5256b6-5cb3-4fe2-a82d-3ba316f1f1f8"
remainingTotalRequestTime=NULL
arrayOfFileStatuses (size=1)
[0] SURL="srm://storm-fe-archive.cr.cnaf.infn.it:8444/virgo4/test.mt.002"
[0] status: statusCode="SRM_REQUEST_QUEUED"(17)
explanation=""
[0] fileSize=NULL
[0] estimatedWaitTime=NULL
[0] remainingPinLifetime=NULL
[0] remainingFileLifetime=NULL
[0] TURL=NULL
[0] transferProtocolInfo=NULL
============================================================

It is important to pay attention to the request token, which will be used later. Then it is necessary to check the status of the request with clientSRM SPTP (Status of Prepare To Put) or clientSRM SPTG (Status of Prepare To Get).

clientSRM SPTP -v -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444 -t fd5256b6-5cb3-4fe2-a82d-3ba316f1f1f8
clientSRM SPTG -v -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444 -t fd5256b6-5cb3-4fe2-a82d-3ba316f1f1f8

where -t provides the request token shown in the output of the clientSRM PTP (or PTG) command. The status field of the output shows whether the request was successful.

============================================================
Sending StatusPtP request to: httpg://storm-fe-archive.cr.cnaf.infn.it:8444
Before execute:
Afer execute:
Request Status Code 0
Poll Flag 0
============================================================
Request status:
statusCode="SRM_SUCCESS"(0)
explanation="All chunks successfully handled!"
============================================================
SRM Response:
  arrayOfFileStatuses (size=1)
      [0] SURL="srm://storm-fe-archive.cr.cnaf.infn.it:8444/virgo4/test.mt.002"
      [0] status: statusCode="SRM_SPACE_AVAILABLE"(24)
             explanation="srmPrepareToPut successfully handled!"
      [0] TURL="gsiftp://gridftp-storm-archive.cr.cnaf.infn.it:2811//storage/gpfs_virgo4/virgo4/test.mt.002"
============================================================

Take note of the TURL, which is used as the remote URL in the globus-url-copy transfer command. After that, the file transfer can be performed. In case of local to remote:

globus-url-copy file:///<local_path>/file gsiftp://gridftp-storm-archive.cr.cnaf.infn.it:2811//storage/gpfs_virgo4/virgo4/test.mt.002

or in case of remote to local:

globus-url-copy gsiftp://gridftp-storm-archive.cr.cnaf.infn.it:2811//storage/gpfs_virgo4/virgo4/test.mt.002 file:///<local_path>/file
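Copying the request token and the TURL by hand is error-prone. As a sketch, both can be filtered out of the clientSRM output, assuming the output has the layout shown above (requestToken="..." and TURL="..." fields). The function names extract_request_token and extract_turl are hypothetical.

```shell
# Print the value of the requestToken="..." field read from stdin.
extract_request_token() {
    sed -n 's/.*requestToken="\([^"]*\)".*/\1/p'
}

# Print the value of the TURL="..." field read from stdin.
# A line such as TURL=NULL (no quotes) produces no output.
extract_turl() {
    sed -n 's/.*TURL="\([^"]*\)".*/\1/p'
}

# Usage, with the endpoint and SURL of the examples above:
#   endpoint=httpg://storm-fe-archive.cr.cnaf.infn.it:8444
#   surl='srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/test.mt.002'
#   token=$(clientSRM PTP -v NIG -e "$endpoint" -s "$surl" | extract_request_token)
#   turl=$(clientSRM SPTP -v -e "$endpoint" -t "$token" | extract_turl)
#   globus-url-copy file:///<local_path>/file "$turl"
```

An empty turl means the request is not ready yet (or has failed), so the status field should be checked before starting the transfer.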

The full list of available options can be displayed with:

man globus-url-copy

Some useful options:

  • -f FILENAME : Read a list of URL pairs from filename. Each line should contain sourceURL destURL. Enclose URLs with spaces in double quotes ("). Blank lines and lines beginning with # will be ignored.
  • -df FILENAME, -dumpfile FILENAME : Path to a file where untransferred URLs will be saved for later restarting. Resulting file is the same format as the -f input file. If file exists, it will be read and all other URL input will be ignored.
  • -cd, -create-dest : Create destination directory if needed.
  • -r : Copy files in subdirectories
  • -v, -verbose : Display URLs being transferred
  • -p PARALLELISM, -parallel PARALLELISM : Specify the number of parallel data connections that should be used.
  • -list URL : List the files located at URL.
  • -sync : Only transfer files where the destination does not exist or differs from the source. -sync-level controls how to determine if files differ.
  • -sync-level number : Criteria for determining if files differ when performing a sync transfer. The default sync level is 2.
    The available levels are:
    • Level 0: will only transfer if the destination does not exist.
    • Level 1: will transfer if the size of the destination does not match the size of the source.
    • Level 2: will transfer if the time stamp of the destination is older than the time stamp of the source.
    • Level 3: will perform a checksum of the source and destination and transfer if the checksums do not match.
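For bulk transfers, the input file for the -f option can be generated rather than written by hand. The following is a minimal sketch: make_url_pairs is a hypothetical helper, the destination URL in the usage comment is a placeholder, and file names containing spaces are not handled (they would need the double quotes mentioned above).

```shell
# Print one "sourceURL destURL" line per local file, in the format read
# by `globus-url-copy -f`.
# $1: destination base URL; remaining arguments: local files.
make_url_pairs() {
    dest_base="${1%/}"          # drop one trailing slash, if present
    shift
    for f in "$@"; do
        # Resolve the directory to an absolute path for the file:// URL.
        dir=$(cd "$(dirname "$f")" && pwd) || return 1
        name=$(basename "$f")
        printf 'file://%s/%s %s/%s\n' "$dir" "$name" "$dest_base" "$name"
    done
}

# Usage:
#   make_url_pairs gsiftp://gridftp-plain-virgo.cr.cnaf.infn.it:2811/<remote_path> data/*.dat > transfer.list
#   globus-url-copy -cd -p 4 -f transfer.list
```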

To list the files in a directory, you can use the command clientSRM ls.

clientSRM ls -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444 -s srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/

More information on using SRM clients can be found here [18].

Lcg-utils



N.B. Since the Scientific Linux 7 distribution does not support lcg-utils, you can use the gfal tools instead.
As an alternative to the previously described StoRM clients, it is possible to use lcg-cp (from lcg-utils [19]) for file transfers; it is a wrapper of clientSRM (from StoRM [17]). Example:
lcg-cp --vo virgo -D SRMv2 -b -v file.tar.gz srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/virgo4/test.mt.006
where:



  • -b option. If this flag is present, it means that you don't want to make BDII calls to get SE type. So, you must provide the type of the SE for srm: arguments, and full endpoint in SURLs.
  • -v option. Verbose mode. You can specify it twice for extra verbose mode.
  • -D option specifies the default SE type you want to use. Possible values are none, se, srmv1, srmv2, for respectively no default type, classic SE, SRMv1, and SRMv2. But if according to the BDII the default type is not available for this SE, it will use another type.
  • --vo option for virtual organization.

More options can be found with the command man lcg-cp.
The most used commands are:

  • lcg-get-checksum : gets or computes the checksum value of given files.
  • lcg-la : lists the aliases for a given LFN, GUID or SURL.
  • lcg-ra : removes an alias in the RMC or the LFC for a given GUID.
  • lcg-bringonline : brings SURLs online.
  • lcg-getturls : gets the TURLs for given SURLs and transfer protocols.
  • lcg-lg : gets the GUID for a given LFN or SURL.
  • lcg-rep : copies a file from one Storage Element to another Storage Element and registers it in the LRC or the LFC.
  • lcg-cp : copies a Grid file to a local destination, or copies a local file to a SE (without registering it in a file catalog).
  • lcg-gt : gets the TURL for a given SURL and transfer protocol.
  • lcg-lr : lists the replicas for a given LFN, GUID or SURL.
  • lcg-del : deletes files (either one replica or all replicas).
  • lcg-infosites : provides a user-friendly way to query the EGI/WLCG information system for services that match given criteria.

Gfal



Documentation is available here [22].
These are the steps to install Gfal assuming the machine is CentOS7:



  1. Enable epel repo:
    curl http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm >/tmp/epel-release-latest-7.noarch.rpm
    sudo rpm -ivh /tmp/epel-release-latest-7.noarch.rpm
  2. Enable egi repo:
    printf '%s\n' '[EGI-trustanchors]' 'name=EGI-trustanchors' \
      'baseurl=http://repository.egi.eu/sw/production/cas/1/current/' \
      'gpgkey=http://repository.egi.eu/sw/production/cas/1/GPG-KEY-EUGridPMA-RPM-3' \
      'gpgcheck=1' 'enabled=1' | sudo tee /etc/yum.repos.d/egi.repo
  3. Install several tools:
    sudo yum install -y gfal2-util gfal2-all fetch-crl ca-policy-egi-core globus-proxy-utils
  4. Install personal certificate on the machine:
    cd $HOME
    mkdir -p .globus
    cd .globus
    openssl pkcs12 -clcerts -nokeys -in cert.p12 -out usercert.pem
    openssl pkcs12 -nocerts -in cert.p12 -out userkey.pem
    chmod 600 usercert.pem
    chmod 400 userkey.pem
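The proxy tools refuse a private key whose permissions are too open, so it can help to verify the mode of userkey.pem before creating a proxy. The sketch below assumes GNU stat (as found on CentOS 7); check_key_mode is a hypothetical helper name.

```shell
# Succeed only if the file is accessible by its owner alone
# (mode 400 or 600). $1: path of the private key.
check_key_mode() {
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        400|600) return 0 ;;
        *) echo "$1: mode $mode is too permissive, expected 400 or 600" >&2
           return 1 ;;
    esac
}

# Usage:
#   check_key_mode "$HOME/.globus/userkey.pem" && grid-proxy-init
```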

To check that everything works correctly:

grid-proxy-init -valid 168:00
gfal-copy --version

The last command should produce a list of the available protocols, which should include gridftp. If this is not the case, try running: yum update
Here is an example that copies a file:
gfal-copy /home/CTA/tenticta/file_1MB srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/ctadisk/vo.cta.in2p3.fr/user/t/tenticta/file_1MB.03
Gfal options are very similar to those of lcg-utils. The full list is available with the command: man gfal-copy
Most used commands are:

  • gfal-ls: List information about the file
  • gfal-xattr: Display attributes of a file or set them to a new value
  • gfal-cat: Concatenate file to standard output
  • gfal-mkdir: Create the DIRECTORY(ies), if they do not already exist
  • gfal-stat: Display extended information about a file or directory
  • gfal-chmod: Change the permissions of a file
  • gfal-rename: Renames SOURCE to DESTINATION
  • gfal-sum: Calculates the checksum of the specified file, using a specified checksum algorithm
  • gfal-rm: Removes each specified file or directory
  • gfal-save: Reads from stdin and writes to a file until it finds EOF

Data transfers using http endpoints (with both proxies and tokens)

At INFN-Tier-1, valid WebDAV endpoints for the experiments’ storage areas are provided with StoRM WebDAV (third-party-copy supported) or Apache.

Then, the most common WebDAV clients can be used to access the storage areas, namely browsers and command-line tools such as curl and davix.

When StoRM WebDAV is used, VOMS proxies are supported only by command-line tools; browsers can be used to navigate the storage area content if anonymous read-only access is enabled (HTTP endpoint) or if access by VO users through their X.509 certificate is enabled (HTTPS endpoint).

StoRM WebDAV also supports OpenID Connect authentication and authorization on storage areas, so tokens can be used instead of proxies [23]. Dedicated IAM (Identity and Access Management) instances can be configured for the experiments upon request (please contact the User Support team).

Since StoRM WebDAV does not currently support group-based authorization, for this use case we provide a dedicated Apache server and a catch-all IAM instance available at iam-computing.cloud.cnaf.infn.it, where registered users are assigned to specific groups.

Once registered within IAM, a user can retrieve an access token via browser or with a dedicated script from a registered IAM client. The access token, exported in an environment variable, can be used instead of the VOMS proxy to access the storage area with HTTP clients (see below for examples).

A few useful commands follow; more information is available in the wiki [25].
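A token-based transfer with curl could look like the sketch below. The variable name BEARER_TOKEN, the helper names webdav_get and webdav_put, and the URLs in the usage comment are assumptions made for illustration; the actual endpoint and storage area path are those provided for your experiment.

```shell
# Download a file. $1: source URL on the WebDAV endpoint, $2: local file.
webdav_get() {
    curl --fail --silent --show-error --location \
         -H "Authorization: Bearer ${BEARER_TOKEN}" -o "$2" "$1"
}

# Upload a file with HTTP PUT. $1: local file, $2: destination URL.
webdav_put() {
    curl --fail --silent --show-error \
         -H "Authorization: Bearer ${BEARER_TOKEN}" -T "$1" "$2"
}

# Usage (placeholder URLs):
#   export BEARER_TOKEN=<access token obtained from IAM>
#   webdav_get https://<webdav-endpoint>/<storage-area>/file local_copy
#   webdav_put local_file https://<webdav-endpoint>/<storage-area>/new_file
```

With --fail, curl returns a non-zero exit status on HTTP errors such as an expired token, which makes the helpers usable in scripts.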

Tape

  • The tape area path will be provided by CNAF.
  • Files located in the tape area are pre-migrated to tape.
  • Subsequently, the system can delete them from disk if disk space is needed (the files remain on tape).

Check if the file is on the disk (using local POSIX commands)

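  • To know whether a file is on disk, it is sufficient to check the space it occupies on disk (e.g. with ls -ls, where the first column is the allocated size).
  • If the allocated size is zero while the nominal file size is not, the file is not physically present on disk.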
-bash-4.2$ ls -ls /storage/gpfs_tsm_cms/cms/store/test/rucio/cms/store/mc/\
RunIIFall18wmLHEGS/SUSYGluGluToBBHToBB_M-600_TuneCP5_13TeV-amcatnlo-pythia8/\
GEN-SIM/102X_upgrade2018_realistic_v11-v1/280000/A61D92B2-C74A-6045-8325-869194181F9E.root

0 -rw-rwxr--+ 1 storm storm 1790274828 Jul 10 18:33 /storage/gpfs_tsm_cms/cms/store/test/rucio/cms/\
store/mc/RunIIFall18wmLHEGS/SUSYGluGluToBBHToBB_M-600_TuneCP5_13TeV-amcatnlo-pythia8/\
GEN-SIM/102X_upgrade2018_realistic_v11-v1/280000/A61D92B2-C74A-6045-8325-869194181F9E.root ## ON TAPE
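The same check can be applied to many files at once by filtering the ls -ls output. This sketch relies on the column layout of the example above (allocated blocks in the first column, size in the sixth, path last) and does not handle paths containing spaces; classify_ls_line is a hypothetical helper name.

```shell
# Read `ls -ls` lines on stdin and print "ON TAPE <path>" for files with
# no allocated blocks but a non-zero size, "ON DISK <path>" otherwise.
# Lines with fewer than 10 fields (e.g. "total N") are skipped.
classify_ls_line() {
    awk 'NF >= 10 {
        where = ($1 == 0 && $6 > 0) ? "ON TAPE" : "ON DISK"
        print where, $NF
    }'
}

# Usage:
#   ls -ls /storage/gpfs_tsm_cms/<path>/*.root | classify_ls_line
```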

Check if the file is on the disk (with Grid tools using VO based authentication)

  • To know whether a file is on disk, you can use the lcg-ls [19] command with the -l option. For example:
    lcg-ls -c 100 -v -l srm://storm-fe-archive.cr.cnaf.infn.it:8444/pamela/data/file

    The output will be something like this:
    SE type: SRMv2
    -rw-rw-rw- 1 2 2 681491712 ONLINE_AND_NEARLINE /pamela/data/file […]


    In the output of the command, the status is shown next to the file. ONLINE_AND_NEARLINE means the file is present both on disk and on tape, while NEARLINE means it is only on tape.
    NB: on SL7, lcg-utils (and therefore lcg-ls) is deprecated.

  • Another way to check where a file is located is to use the following command (a valid VOMS proxy is required):
    clientSRM ls -l -v NIG -e <endpoint> -s <file-SURL>

    Based on the information shown in the output, it is possible to locate the file:
           - "retentionPolicyInfo=(2,0)" : on tape
           - "retentionPolicyInfo=(0,0)" : only on disk

    Example:
    # file on TAPE:
    clientSRM ls -l -v NIG -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444/ -s srm://storm-fe-archive.cr.cnaf.infn.it:8444/icarus/test-srm

    [...]
    [0] retentionPolicyInfo=(2,0)
    [...]

    #file only on DISK:
    clientSRM ls -l -v NIG -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444/ -s srm://storm-fe-archive.cr.cnaf.infn.it:8444/icarusdata/std.err

    [...]
    [0] retentionPolicyInfo=(0,0)
    [...]

  • Using the gfal-xattr command as follows:
    gfal-xattr <file-SURL>

    NB: gfal-xattr ALWAYS recalls the file! This is a bug of the command that will be fixed by the gfal utils developers.
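The retentionPolicyInfo values described above for clientSRM ls can be translated into a readable answer with a small filter. This is a sketch that only covers the two values listed in this guide; classify_retention is a hypothetical helper name.

```shell
# Read clientSRM ls output on stdin and print the file location for each
# retentionPolicyInfo line: (2,0) means on tape, (0,0) means only on disk.
classify_retention() {
    sed -n -e 's/.*retentionPolicyInfo=(2,0).*/on tape/p' \
           -e 's/.*retentionPolicyInfo=(0,0).*/only on disk/p'
}

# Usage (a valid VOMS proxy is required):
#   clientSRM ls -l -v NIG -e <endpoint> -s <file-SURL> | classify_retention
```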

Recall files from tape (without Grid tools)

To recall files from tape, it is necessary to provide CNAF with the list of the files to be recalled. CNAF will then recall them.

Recall files from tape (using Grid tools with VO-based authentication)



To recall files from tape, you can use the clientSRM [17] command with the bol option (which stands for Bring On Line).
clientSRM bol -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444 -s srm://storm-fe-archive.cr.cnaf.infn.it:8444/srm/managerv2?SFN=/pamela/data/file
Here the -e option provides the endpoint to contact and the -s option provides the SURL of the file. Your recall request is now queued. N.B. Take note of the requestToken (for example: requestToken="ea8b525d-1b12-47a5-b8d5-6935ebc53003") that appears in the output of the previous command, because you can use it later to check the status of your request, i.e.:
clientSRM sbol -e httpg://storm-fe-archive.cr.cnaf.infn.it:8444 -t "ea8b525d-1b12-47a5-b8d5-6935ebc53003"
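Since a recall can take a while, the status check can be repeated in a loop. The sketch below is built on assumptions: it supposes that the sbol output contains a request-level statusCode="..." line like the one shown earlier for SPTP, that SRM_SUCCESS marks completion, and that SRM_REQUEST_QUEUED or SRM_REQUEST_INPROGRESS mean the recall is still pending. wait_until_online is a hypothetical helper name.

```shell
# Poll the status of a Bring On Line request until it completes.
# $1: endpoint, $2: request token,
# $3: seconds between checks (default 60), $4: maximum checks (default 60).
# Returns 0 on success, 1 on an unexpected status, 2 when checks run out.
wait_until_online() {
    endpoint="$1"; token="$2"; interval="${3:-60}"; tries="${4:-60}"
    while [ "$tries" -gt 0 ]; do
        out=$(clientSRM sbol -e "$endpoint" -t "$token")
        # The first statusCode line is taken as the request-level status.
        status=$(printf '%s\n' "$out" |
                 sed -n 's/.*statusCode="\([^"]*\)".*/\1/p' | head -n 1)
        case "$status" in
            SRM_SUCCESS) return 0 ;;
            SRM_REQUEST_QUEUED|SRM_REQUEST_INPROGRESS) sleep "$interval" ;;
            *) echo "unexpected status: ${status:-none}" >&2; return 1 ;;
        esac
        tries=$((tries - 1))
    done
    return 2
}

# Usage:
#   wait_until_online httpg://storm-fe-archive.cr.cnaf.infn.it:8444 \
#       "ea8b525d-1b12-47a5-b8d5-6935ebc53003" 300 48
```

A long interval (minutes rather than seconds) keeps the load on the SRM frontend low while the tape system serves the request.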









