[gpfsug-discuss] Kernel > 4.10, python >= 3.8 issue
Ray Coetzee
coetzee.ray at gmail.com
Wed May 26 18:33:20 BST 2021
Hello all
I'd be interested to know if anyone else has experienced a problem with
kernel > 4.10, Python >= 3.8 and Spectrum Scale (5.0.5-2).
We noticed that Python's shutil.copy() is failing against a GPFS mount with:
BlockingIOError: [Errno 11] Resource temporarily unavailable: 'test.file' -> 'test2.file'
To reproduce the error:
```
[user@login01]$ module load python-3.8.9-gcc-9.3.0-soqwnzh
[user@login01]$ truncate --size 640MB test.file
[user@login01]$ python3 -c "import shutil; shutil.copy('test.file', 'test2.file')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 418, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 275, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 172, in _fastcopy_sendfile
    raise err
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 152, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
BlockingIOError: [Errno 11] Resource temporarily unavailable: 'test.file' -> 'test2.file'
```
Investigating why this is happening revealed that:
1. It fails with Python 3.8 and above.
2. It happens only on a GPFS mount.
3. It happens with files whose size is a multiple of 4KB (the OS page size).
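The three observations above can be checked with a small script. This is a minimal sketch, not the script used for the investigation: `probe_copy` is a hypothetical helper, and the directory passed to it is assumed to sit on the filesystem under test (for example a GPFS mount). It creates sparse files of the requested sizes, as `truncate --size` does, and records whether shutil.copy() succeeds for each.

```python
import mmap
import os
import shutil

PAGE_SIZE = mmap.PAGESIZE  # typically 4096 on x86_64 Linux


def probe_copy(directory, sizes):
    """Try shutil.copy() on sparse files of the given sizes in `directory`.

    Returns a dict mapping each size to None on success, or to the OSError
    that was raised. Illustrative helper only, not part of any library.
    """
    results = {}
    for size in sizes:
        src = os.path.join(directory, f"probe_{size}.src")
        dst = os.path.join(directory, f"probe_{size}.dst")
        # Create a sparse file of exactly `size` bytes.
        with open(src, "wb") as f:
            f.truncate(size)
        try:
            shutil.copy(src, dst)
            results[size] = None
        except OSError as err:  # BlockingIOError is a subclass of OSError
            results[size] = err
        finally:
            # Remove the probe files whether or not the copy worked.
            for path in (src, dst):
                if os.path.exists(path):
                    os.remove(path)
    return results
```

Calling something like `probe_copy(some_dir, [PAGE_SIZE, PAGE_SIZE + 1, 10 * PAGE_SIZE])` on the affected mount and on a local disk should show whether only the page-aligned sizes fail there.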
Relevant links:
https://bugs.python.org/issue43743
https://www.ibm.com/support/pages/apar/IJ28891
Running strace showed that, at a lower level, the problem seems to be related
to the Linux sendfile system call, which seems to fail in these cases on GPFS.
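The same sequence of calls can be driven from Python without strace, using os.sendfile, the call that appears in the traceback. This is a minimal sketch under assumptions: `sendfile_trace` is a hypothetical helper, the 8 MiB block size simply mirrors the count visible in the strace output, and it only applies to Linux. It records the result of each call until end of file (a return value of 0) or an error.

```python
import os

BLOCKSIZE = 8 * 1024 * 1024  # 8388608, the count seen in the strace output


def sendfile_trace(src, dst, blocksize=BLOCKSIZE):
    """Copy src to dst with os.sendfile() and return the result of each call.

    Each entry is the number of bytes sent, or the OSError that was raised.
    Illustrative helper only: a simplified sendfile copy loop, not the exact
    code in shutil.
    """
    trace = []
    offset = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            try:
                sent = os.sendfile(fout.fileno(), fin.fileno(), offset, blocksize)
            except OSError as err:
                # Record the failure (for example EAGAIN) and stop.
                trace.append(err)
                break
            trace.append(sent)
            if sent == 0:  # end of file reached
                break
            offset += sent
    return trace
```

On a filesystem that behaves normally, a 4096-byte file should give two entries, 4096 and then 0. On an affected GPFS mount the second entry would presumably be a BlockingIOError instead.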
Strace for a 4096 B file:
```
sendfile(4, 3, [0] => [4096], 8388608) = 4096
sendfile(4, 3, [4096], 8388608) = -1 EAGAIN (Resource temporarily
unavailable)
```
The same file on another disk:
```
sendfile(4, 3, [0] => [4096], 8388608) = 4096
sendfile(4, 3, [4096], 8388608) = 0
```
IBM's "fix" for the problem, "Do not use a file size which is a multiple of
the page size.", sounds really blasé.
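Rather than avoiding page-aligned file sizes, one application-side option is to catch the error and repeat the copy without the sendfile fast path. This is only a sketch under assumptions, not a recommended or tested fix for GPFS: `copy_with_fallback` is a hypothetical helper, and unlike shutil.copy() it expects `dst` to be a file path rather than a directory.

```python
import shutil


def copy_with_fallback(src, dst):
    """Copy src to the file path dst, retrying without sendfile on EAGAIN.

    Hypothetical helper: shutil.copyfile() is tried first. If it raises
    BlockingIOError (EAGAIN), the data is copied again from the start with
    shutil.copyfileobj(), which uses ordinary read()/write() calls.
    """
    try:
        shutil.copyfile(src, dst)
    except BlockingIOError:
        # Reopening dst with "wb" truncates any partial output first.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    shutil.copymode(src, dst)  # shutil.copy() also copies permission bits
    return dst
```

The retry rewrites the whole file, so it costs an extra pass over the data in the failing case, but it avoids relying on the file size.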
Kind regards
Ray Coetzee