Open Science Grid Operations: GridFTP-HDFS Corruption Issue Workaround

Tuesday, September 2, 2014

Some OSG sites have observed data corruption when transferring files with GridFTP-HDFS. The problem arises when pthreads is enabled (by setting GLOBUS_THREAD_MODEL="pthread") and GridFTP uses a *single-stream* transfer that spans multiple HDFS blocks. Under these conditions, blocks may be written to the destination file in the wrong order. Several sites using GridFTP-HDFS, including Fermilab and GLOW, have reported failures.

This issue affects the OSG gridftp-hdfs package, versions 0.5.4-14 and newer, because they have pthreads enabled by default.

DETAILS

Transfers that use parallelism (we tested 2 to 10 streams) and single-stream transfers that span only one HDFS block appear to be unaffected.
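
Whether a given file falls into the affected range depends on how many HDFS blocks it spans. The sketch below computes that count by ceiling division. The function name hdfs_block_count and the 128 MiB block size in the example are assumptions for illustration; the actual block size is whatever your HDFS cluster is configured with.

```shell
#!/bin/sh
# Number of HDFS blocks a file of a given size occupies (ceiling division).
# Both arguments are in bytes.
hdfs_block_count() {
    size=$1
    block_size=$2
    echo $(( (size + block_size - 1) / block_size ))
}

# Example with an ASSUMED block size of 128 MiB (134217728 bytes):
# a 500000000-byte file spans 4 blocks.
hdfs_block_count 500000000 134217728
```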

Transfers that use a single stream but span multiple (3 or more) HDFS blocks arrive with the correct size but usually the wrong checksum. The issue was originally reported by a remote user transferring via srm-copy, and OSG testing has since observed the same failures using globus-url-copy locally.
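
Because a corrupted file has the correct size, a size check will not catch the problem; the contents have to be compared. The following is a minimal sketch of such a check for two local files. The function name same_checksum is hypothetical, and SHA-256 (via the coreutils sha256sum tool) is an arbitrary choice here, not necessarily the checksum your transfer tools report.

```shell
#!/bin/sh
# Compare two local files by SHA-256 digest.
# Returns 0 if the digests match, 1 if they differ, 2 if a file is unreadable.
same_checksum() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Reading from stdin keeps the file name out of the sha256sum output.
    a=$(sha256sum < "$1" | cut -d ' ' -f 1)
    b=$(sha256sum < "$2" | cut -d ' ' -f 1)
    [ "$a" = "$b" ]
}

# Usage (paths are placeholders):
#   same_checksum /path/reference.copy /path/file.out || echo "checksum mismatch" >&2
```

One way to use this is to compare a transferred file against a reference copy that is known to be good.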

The issue has been reported to the Globus GridFTP developers, but there is no fix yet. See below for possible workarounds.

WORKAROUNDS

On the server side, the problem can be avoided by disabling pthreads. To do so, comment out the GLOBUS_THREAD_MODEL line in /etc/gridftp.d/gridftp-hdfs.conf so that it reads:

# $GLOBUS_THREAD_MODEL pthread

Until the bug is fixed, we recommend making this change on all gridftp-hdfs servers 0.5.4-14 and newer.
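
For sites that manage many servers, the edit could be scripted. The sketch below assumes the active line starts with $GLOBUS_THREAD_MODEL followed by whitespace and "pthread", as the commented form above suggests; the function name disable_pthread_model is hypothetical. It keeps a .bak copy of the original file, and lines that are already commented are left alone.

```shell
#!/bin/sh
# Comment out an active "$GLOBUS_THREAD_MODEL pthread" line in the given file.
# The original file is preserved alongside it with a .bak suffix.
disable_pthread_model() {
    conf=$1
    sed -i.bak 's/^[[:space:]]*\$GLOBUS_THREAD_MODEL[[:space:]][[:space:]]*pthread/# &/' "$conf"
}

# Usage (run as root; check the result before relying on it):
#   disable_pthread_model /etc/gridftp.d/gridftp-hdfs.conf
```

Depending on how the GridFTP server is launched at your site, it may need to be restarted before the change takes effect.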

Alternatively, if pthreads cannot be disabled for the GridFTP server, it is sufficient on the client side to run globus-url-copy with parallelism greater than 1. For example:

globus-url-copy -p 2 gsiftp://$host:2811/path/file.in file:///path/file.out
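
To make it harder to fall back to a single stream by accident, the command can be wrapped in a small shell function. This is only a sketch: guc_parallel and the GUC_STREAMS variable are hypothetical names, not part of Globus, and the only globus-url-copy option used is the -p flag shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: run globus-url-copy with at least 2 parallel streams.
# GUC_STREAMS is an illustrative variable name, not a Globus setting.
guc_parallel() {
    streams=${GUC_STREAMS:-2}
    # Reject anything that is not a plain non-negative integer.
    case $streams in
        ''|*[!0-9]*)
            echo "GUC_STREAMS must be an integer >= 2" >&2
            return 64
            ;;
    esac
    if [ "$streams" -lt 2 ]; then
        echo "refusing single-stream transfer: GUC_STREAMS must be >= 2" >&2
        return 64
    fi
    globus-url-copy -p "$streams" "$@"
}

# Usage:
#   guc_parallel gsiftp://$host:2811/path/file.in file:///path/file.out
```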

FOR MORE INFO

https://globus.atlassian.net/browse/GT-547
https://jira.opensciencegrid.org/browse/SOFTWARE-1495
https://ticket.grid.iu.edu/21825
https://ticket.grid.iu.edu/21157

Open Science Grid Operations Center

Based at Indiana University, the OSG Operations Group provides a single point of operational support for the Open Science Grid (OSG). Operations performs real time Grid monitoring and problem tracking, provides support to users, developers and systems administrators, maintains grid services, provides security incident response, and maintains information repositories.

Contact: Phone +1 317-278-9699; Email goc [at] opensciencegrid.org; Submit ticket.

Please visit the Operations Git Pages for more information on the everyday activities of OSG Operations.