[gpfsug-discuss] Kernel > 4.10, python >= 3.8 issue

Ray Coetzee coetzee.ray at gmail.com
Wed May 26 18:33:20 BST 2021


Hello all

I'd be interested to know if anyone else has experienced a problem with
kernel > 4.10, Python >= 3.8 and Spectrum Scale (5.0.5-2).

We noticed that Python's shutil.copy() is failing against a GPFS mount with:

BlockingIOError: [Errno 11] Resource temporarily unavailable: 'test.file' -> 'test2.file'

To reproduce the error:

```
[user@login01]$ module load python-3.8.9-gcc-9.3.0-soqwnzh

[user@login01]$ truncate --size 640MB test.file
[user@login01]$ python3 -c "import shutil; shutil.copy('test.file', 'test2.file')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 418, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 275, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 172, in _fastcopy_sendfile
    raise err
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 152, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
BlockingIOError: [Errno 11] Resource temporarily unavailable: 'test.file' -> 'test2.file'
```

Investigating why this is happening revealed that:

1. It is failing for python3.8 and above.
2. It happens only on a GPFS mount.
3. It happens with files whose size is a multiple of 4 KB (the OS page size).
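
The three observations above could be checked with a small probe along these
lines. This is only a sketch: the helper name probe_copy and the choice of
sizes are mine, not part of the original investigation, and it uses sparse
files in the same way as the truncate command in the reproducer.

```python
import mmap
import os
import shutil


def probe_copy(directory, sizes):
    """Try shutil.copy() on sparse files of the given sizes inside `directory`.

    Returns a dict mapping each size to None on success, or to the OSError
    that shutil.copy() raised. (probe_copy is a hypothetical helper name.)
    """
    results = {}
    for size in sizes:
        src = os.path.join(directory, f"probe_{size}.src")
        dst = os.path.join(directory, f"probe_{size}.dst")
        # Same idea as `truncate --size`: a sparse file of exactly `size` bytes.
        with open(src, "wb") as f:
            f.truncate(size)
        try:
            shutil.copy(src, dst)
            results[size] = None
        except OSError as exc:  # BlockingIOError is a subclass of OSError
            results[size] = exc
        finally:
            # Clean up the probe files whether or not the copy succeeded.
            for path in (src, dst):
                if os.path.exists(path):
                    os.remove(path)
    return results


# Sizes just below, exactly at, and just above one page.
PROBE_SIZES = [mmap.PAGESIZE - 1, mmap.PAGESIZE, mmap.PAGESIZE + 1]
```

Calling probe_copy() with a directory on the GPFS mount and PROBE_SIZES should,
if the findings hold, report a BlockingIOError only for the page-sized file on
Python 3.8 and above, and no errors on other filesystems.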

Relevant links:
https://bugs.python.org/issue43743
https://www.ibm.com/support/pages/apar/IJ28891


An strace shows that, at the lower level, the failure seems to come from the
Linux sendfile syscall: on GPFS the final call, made once the whole file has
already been sent, returns EAGAIN instead of 0.


Strace for a 4096 B file:

```
sendfile(4, 3, [0] => [4096], 8388608) = 4096
sendfile(4, 3, [4096], 8388608) = -1 EAGAIN (Resource temporarily unavailable)
```

The same file on another disk:
```
sendfile(4, 3, [0] => [4096], 8388608) = 4096
sendfile(4, 3, [4096], 8388608) = 0
```

IBM's "fix" for the problem, "Do not use a file size which that is a
multiple of the page size.", sounds really blasé.
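
The two traces above can be mirrored from Python with a loop around
os.sendfile(). The following is a rough sketch, not what shutil does
internally: the function name sendfile_copy is hypothetical, and treating
EAGAIN as end-of-file once the offset has reached the file size is my own
assumption about how one might tolerate the GPFS behaviour. It is Linux-only,
since it relies on file-to-file sendfile.

```python
import os


def sendfile_copy(src_path, dst_path, blocksize=8 * 1024 * 1024):
    """Copy src to dst with os.sendfile() and return an (offset, result) trace.

    blocksize matches the 8388608-byte count seen in the strace output.
    """
    trace = []
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        size = os.fstat(fsrc.fileno()).st_size
        offset = 0
        while True:
            try:
                sent = os.sendfile(fdst.fileno(), fsrc.fileno(), offset, blocksize)
            except BlockingIOError:
                # Assumption: EAGAIN after every byte has been sent is the
                # end-of-file case from the GPFS trace, so stop instead of failing.
                if offset >= size:
                    trace.append((offset, "EAGAIN"))
                    break
                raise
            trace.append((offset, sent))
            if sent == 0:
                # Normal end-of-file: sendfile reports 0 bytes sent.
                break
            offset += sent
    return trace
```

For a 4096-byte file, the trace would be expected to end in (4096, 0) on a
filesystem that behaves like the second strace, and in (4096, "EAGAIN") on an
affected GPFS mount, with the destination file complete in both cases.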


Kind regards

Ray Coetzee