[gpfsug-discuss] Kernel > 4.10, python >= 3.8 issue
    Ray Coetzee 
    coetzee.ray at gmail.com
       
    Wed May 26 18:33:20 BST 2021
    
    
  
Hello all,
I'd be interested to know if anyone else has experienced a problem with kernel > 4.10, Python >= 3.8 and Spectrum Scale (5.0.5-2).
We noticed that Python's shutil.copy() fails against a GPFS mount with:
BlockingIOError: [Errno 11] Resource temporarily unavailable: 'test.file' -> 'test2.file'
To reproduce the error:
```
[user@login01]$ module load python-3.8.9-gcc-9.3.0-soqwnzh
[user@login01]$ truncate --size 640MB test.file
[user@login01]$ python3 -c "import shutil; shutil.copy('test.file', 'test2.file')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 418, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 275, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 172, in _fastcopy_sendfile
    raise err
  File "/hps/software/spack/opt/spack/linux-centos8-sandybridge/gcc-9.3.0/python-3.8.9-soqwnzhndvqpk3mly3w6z6zx6cdv45sn/lib/python3.8/shutil.py", line 152, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
BlockingIOError: [Errno 11] Resource temporarily unavailable: 'test.file' -> 'test2.file'
```
Investigating why this happens revealed that:
1. It fails for Python 3.8 and above.
2. It happens only on a GPFS mount.
3. It happens with files whose size is a multiple of 4 KB (the OS page size).
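To check the third point on a given mount, something like the following might help. It is only a sketch: `probe_copy` and the probe file names are made up for illustration, and it assumes the target directory is writable. Note that the 640MB file from the reproduction above is 640,000,000 bytes, i.e. 156,250 x 4096, so it is a multiple of the page size as well.

```python
import os
import shutil

# Page size as reported by the OS (typically 4096 on x86_64).
PAGE = os.sysconf("SC_PAGE_SIZE")


def probe_copy(directory, sizes):
    """Try shutil.copy() on a sparse file of each size in `directory`.

    Returns a dict mapping size -> None on success, or the OSError raised.
    `probe_copy` is a hypothetical helper, not part of any library.
    """
    results = {}
    for size in sizes:
        src = os.path.join(directory, f"probe_{size}.src")
        dst = os.path.join(directory, f"probe_{size}.dst")
        with open(src, "wb") as f:
            f.truncate(size)  # sparse file, similar to `truncate --size`
        try:
            shutil.copy(src, dst)
            results[size] = None
        except OSError as exc:  # BlockingIOError is a subclass of OSError
            results[size] = exc
        finally:
            # Clean up the probe files whether or not the copy worked.
            for path in (src, dst):
                if os.path.exists(path):
                    os.remove(path)
    return results


# Example (the directory is a placeholder for a path on the GPFS mount):
#   for size, err in probe_copy("/path/on/gpfs", [PAGE - 1, PAGE, PAGE + 1]).items():
#       print(size, "OK" if err is None else repr(err))
```

On an unaffected filesystem every size should report OK; on an affected GPFS mount one would expect the page-multiple sizes to show the BlockingIOError.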
Relevant links:
https://bugs.python.org/issue43743
https://www.ibm.com/support/pages/apar/IJ28891
An strace shows that, at a lower level, the problem appears to be related to
the Linux sendfile syscall, which seems to fail in these cases on GPFS.
Strace for a 4096 B file:
```
sendfile(4, 3, [0] => [4096], 8388608) = 4096
sendfile(4, 3, [4096], 8388608) = -1 EAGAIN (Resource temporarily unavailable)
```
The same file on another disk:
```
sendfile(4, 3, [0] => [4096], 8388608) = 4096
sendfile(4, 3, [4096], 8388608) = 0
```
IBM's "fix" for the problem, "Do not use a file size which that is a
multiple of the page size.", sounds really blasé.
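For anyone who needs to copy such files in the meantime, here is a rough sketch of a copy loop that tolerates the behaviour seen in the strace. `sendfile_copy` is a hypothetical helper, not something from shutil or from IBM. It assumes Linux, assumes the source file is not growing during the copy, and treats EAGAIN as end of file only once the offset has reached the file size.

```python
import os


def sendfile_copy(src, dst, blocksize=8 * 1024 * 1024):
    """Copy src to dst with os.sendfile(); return the number of bytes copied.

    Normally sendfile() returns 0 at end of file. If it raises EAGAIN
    (BlockingIOError) instead, and everything has already been copied,
    that is treated as end of file too. Assumption: src does not change
    size while the copy is running.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        infd, outfd = fsrc.fileno(), fdst.fileno()
        size = os.fstat(infd).st_size
        offset = 0
        while True:
            try:
                sent = os.sendfile(outfd, infd, offset, blocksize)
            except BlockingIOError:
                if offset >= size:
                    break  # all bytes copied; EAGAIN stands in for EOF
                raise  # EAGAIN before the end is a real error
            if sent == 0:
                break  # normal end of file
            offset += sent
        return offset
```

A simpler alternative is to open both files and pass them to shutil.copyfileobj(), which uses a plain read/write loop rather than sendfile, at the cost of the extra copies through user space.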
Kind regards
Ray Coetzee