[gpfsug-discuss] AFM weirdness

Simon Thompson (IT Research Support) S.J.Thompson at bham.ac.uk
Fri Aug 25 08:44:35 BST 2017


So as Venkat says, AFM doesn't support using fallocate() to preallocate space.

So why aren't other people seeing this ... Well ...

We use EasyBuild to build our HPC cluster software, including the compiler toolchains.
This enables the new linker ld.gold by default rather than the "old" ld.
Interestingly, we don't seem to have seen this when compiling C code, only Fortran.
We can work around it by using the options to gfortran I mention below.

This limitation is mentioned at:
https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.3/com.ibm.spectrum.scale.v4r23.doc/bl1ins_afmlimitations.htm

We aren't directly calling gpfs_prealloc, but I guess the linker is indirectly calling it by making a call to posix_fallocate.
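For anyone who wants to check a directory without involving a compiler, here is a minimal sketch that mimics the open-then-preallocate sequence the linker performs. It assumes Linux and Python 3; the function name probe_prealloc and the scratch file name are made up for illustration, and the default size is simply the one from the strace output in this thread. Python's os.posix_fallocate goes through the C library's posix_fallocate, so this is only an approximation of what ld.gold does.

```python
import errno
import os


def probe_prealloc(directory, size=248480):
    """Try to preallocate `size` bytes in a scratch file inside `directory`.

    Returns None if preallocation succeeded, otherwise the symbolic errno
    name (for example "ENOSPC").
    """
    # Hypothetical scratch file name; the flags match the linker's open() call.
    path = os.path.join(directory, ".prealloc-probe-%d" % os.getpid())
    flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC | os.O_CLOEXEC
    fd = os.open(path, flags, 0o777)
    try:
        os.posix_fallocate(fd, 0, size)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        # Always clean up the scratch file, whether or not the call failed.
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    import sys

    for directory in sys.argv[1:]:
        result = probe_prealloc(directory)
        print(directory, "OK" if result is None else "failed: " + result)
```

Running it against a directory inside the AFM cache fileset and another outside it should show whether the two behave differently.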

I do have a new problem with AFM where the data written to the cache differs from that replicated back to home ... I'm beginning to think I don't like the decision to use AFM! Given the data written back to home is corrupt, I think this is definitely PMR time. But ... if you have Abaqus on your system and are using AFM, I'd be interested to see if someone else sees the same issue as us!

Simon

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Simon Thompson <S.J.Thompson at bham.ac.uk>
Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Date: Wednesday, 23 August 2017 at 14:01
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] AFM weirdness

I've got a PMR open about this ... Will email you the number directly.

Looking at the man page for ld.gold, it appears that '--posix-fallocate' is set by default. In fact, testing with '-Xlinker -no-posix-fallocate' does indeed make the code link successfully.
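As a rough illustration of the workaround, here is a small sketch of a build wrapper that adds the flag to a gfortran link line. The helper name gfortran_link_command and the file names are hypothetical; the only parts taken from this thread are gfortran and the '-Xlinker -no-posix-fallocate' option.

```python
import shlex


def gfortran_link_command(sources, output, disable_posix_fallocate=True):
    """Build a gfortran command line, optionally disabling posix_fallocate."""
    cmd = ["gfortran", "-o", output]
    if disable_posix_fallocate:
        # -Xlinker hands the following argument to the linker unchanged.
        cmd += ["-Xlinker", "-no-posix-fallocate"]
    cmd += list(sources)
    return cmd


if __name__ == "__main__":
    # Hypothetical source and output names, printed rather than executed.
    print(shlex.join(gfortran_link_command(["program.f90"], "program")))
```

The resulting list can be passed to subprocess.run(..., check=True) on a node where gfortran is installed.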

Simon

From: "vpuvvada at in.ibm.com" <vpuvvada at in.ibm.com>
Date: Wednesday, 23 August 2017 at 13:36
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>, Simon Thompson <S.J.Thompson at bham.ac.uk>
Subject: Re: [gpfsug-discuss] AFM weirdness

I believe this error is the result of a preallocation failure, but traces are needed to confirm this. AFM caching modes do not support preallocation of blocks (e.g. using fallocate()). This feature is supported only in AFM DR.

~Venkat (vpuvvada at in.ibm.com)



From:        "Simon Thompson (IT Research Support)" <S.J.Thompson at bham.ac.uk>
To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:        08/23/2017 03:48 PM
Subject:        Re: [gpfsug-discuss] AFM weirdness
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



OK so I checked, and if I run directly on the "AFM" FS in a different "non
AFM" directory, it works fine, so it's something AFM related ...

Simon

On 23/08/2017, 11:11, "gpfsug-discuss-bounces at spectrumscale.org on behalf
of Simon Thompson (IT Research Support)"
<gpfsug-discuss-bounces at spectrumscale.org on behalf of
S.J.Thompson at bham.ac.uk> wrote:

>We're using an AFM cache from our HPC nodes to access data in another GPFS
>cluster. Mostly this seems to be working fine, but we've just come across
>an interesting problem with a user using gfortran from the GCC 5.2.0
>toolset.
>
>When linking their code, they get a "no space left on device" error back
>from the linker. If we do this on a node that mounts the file-system
>directly (i.e. not via the AFM cache), then it works fine.
>
>We tried with GCC 4.5 based tools and it works OK, but the difference
>there is that 4.x uses ld and 5.x uses ld.gold.
>
>If we strace ld.gold when using AFM, we see:
>
>stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
>unlink("program")                       = 0
>open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
>fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
>fallocate(30, 0, 0, 248480)             = -1 ENOSPC (No space left on device)
>
>
>
>Vs when running directly on the file-system:
>stat("program", {st_mode=S_IFREG|0775, st_size=248480, ...}) = 0
>unlink("program")                       = 0
>open("program", O_RDWR|O_CREAT|O_TRUNC|O_CLOEXEC, 0777) = 30
>fstat(30, {st_mode=S_IFREG|0775, st_size=0, ...}) = 0
>fallocate(30, 0, 0, 248480)             = 0
>
>
>
>Anyone seen anything like this before?
>
>... Actually I'm about to go off and see if it's a function of AFM, or
>maybe something to do with the FS in use (i.e. make a local directory on
>the "AFM" FS and see if that works ...)
>
>Thanks
>
>Simon
>
>_______________________________________________
>gpfsug-discuss mailing list
>gpfsug-discuss at spectrumscale.org
>https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=UqTzoU-bx454OgyeB4f0Nrruvs7yYAxFutzIe2eKmnc&s=8E5opHyyAwomLS8kdxpvKCvf6sdKBLlfZvx6wDdaZy4&e=
