[gpfsug-discuss] mounts taking longer in 4.2 vs 4.1?

Felipe Knop knop at us.ibm.com
Fri Feb 9 13:32:30 GMT 2018


All,

For at least one of the instances reported by this group, a PMR has been
opened, and a fix is being developed. If you are affected by the problem,
please contact the service team to confirm that your problem is the same as
the one previously reported and to ask about the expected availability of
the fix.

Thanks,

  Felipe

----
Felipe Knop                                     knop at us.ibm.com
GPFS Development and Security
IBM Systems
IBM Building 008
2455 South Rd, Poughkeepsie, NY 12601
(845) 433-9314  T/L 293-9314





From:	Bryan Banister <bbanister at jumptrading.com>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>,
            "Loic Tortay" <tortay at cc.in2p3.fr>
Date:	02/08/2018 04:11 PM
Subject:	Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1?
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



It may be related to an issue with the root-squash file system option.
Here are some edited comments from my colleague, who stumbled upon this
while chatting with a friend at a CUG:

" Something I learned last week:  apparently the libmount code from
util-linux (used by /bin/mount) will call utimensat() on new mountpoints if
access() fails (for example, on root-squashed filesystems).  This is done
"just to be sure" that the filesystem is really read-only.  This operation
can be quite expensive and (anecdotally) may cause huge slowdowns when
mounting root-squashed parallel filesystems on thousands of clients.

Here is the relevant code:

https://github.com/karelzak/util-linux/blame/1ea4e7bd8d9d0f0ef317558c627e6fa069950e8d/libmount/src/utils.c#L222

This code has been in util-linux for years.

It's not clear exactly what the impact is in our environment, but this
certainly can't be helping, especially since we've grown the size of the
cluster considerably. Mounting GPFS has recently really become a slow and
disruptive operation – if you try to mount many clients at once, the FS
will hang for a considerable period of time.

The timing varies, but here is one example from an isolated mounting
operation:

12:09:11.222513 mount("<gpfs_fs>", "<mount_point>", "gpfs", MS_MGC_VAL, "dev=<gpfs_cluster>"...) = 0 <1.590217>
12:09:12.812777 access("<mount_point>", W_OK) = -1 EACCES (Permission denied) <0.000022>
12:09:12.812841 utimensat(AT_FDCWD, "<mount_point>", {UTIME_NOW, {93824994378048, 1073741822}}, 0) = -1 EPERM (Operation not permitted) <2.993689>

Here, the utimensat() took ~3 seconds, almost twice as long as the mount
operation! I also suspect it will slow down other clients trying to mount
the filesystem since the sgmgr has to process this write attempt to the
mountpoint.

(Hilariously, it still returns the "wrong" answer, because this filesystem
is not read-only, just squashed.)

As of today, the person who originally brought the issue to my attention at
CUG has raised it for discussion on the util-linux mailing list.
https://marc.info/?l=util-linux-ng&m=151075932824688&w=2
"
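
To check whether mounts on your own nodes show the same pattern, one
option is to capture a timed strace of the mount and list the slow calls.
The following is only a sketch under stated assumptions: slow_syscalls is
a made-up helper name, the threshold is whatever you pick, and the log is
assumed to come from strace run with -tt -T, so that each call sits on one
line ending in <seconds>.

```shell
# Hypothetical helper: read an `strace -tt -T` log on stdin and print the
# name and duration of every syscall slower than the threshold in $1 (seconds).
slow_syscalls() {
  awk -v limit="$1" '
    # -T appends the time spent in the call as <seconds> at end of line
    match($0, /<[0-9]+\.[0-9]+>$/) {
      secs = substr($0, RSTART + 1, RLENGTH - 2)  # strip the angle brackets
      name = $2                                   # field 1 is the -tt timestamp
      sub(/\(.*/, "", name)                       # keep only the syscall name
      if (secs + 0 > limit + 0) print name, secs
    }'
}

# Intended use (as root, on a node where the file system is not mounted;
# the log path is an arbitrary choice):
#   strace -f -tt -T -e trace=mount,access,utimensat \
#       -o /tmp/mount.strace mount <mount_point>
#   slow_syscalls 1.0 < /tmp/mount.strace
```

If utimensat shows up next to mount in that list, the node is likely
paying for the libmount read-only check described above.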

We ended up putting facls on our mountpoints like so, which hacked around
this stupidity:
# 99 is the "nobody" uid to which root is mapped--see "mmauth" output
for fs in gpfs_mnt_point ; do
  chmod 1755 "$fs"
  setfacl -m u:99:rwx "$fs"
done
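
A quick way to confirm the ACL took effect is to look for the uid 99
entry in the getfacl output. This is a sketch, not something from our
production scripts: acl_grants_rwx is a hypothetical helper name, and uid
99 is simply the value from the loop above; substitute whatever uid root
is mapped to on your cluster.

```shell
# Hypothetical helper: read `getfacl -cn <dir>` output on stdin and succeed
# only if the numeric uid in $1 has a plain rwx entry. A line carrying an
# "#effective:" suffix (entry limited by the mask) does not match, because
# grep -x requires the whole line to be exactly user:<uid>:rwx.
acl_grants_rwx() {
  grep -qx "user:$1:rwx"
}

# Intended use on a node with the file system mounted:
#   getfacl -cn <mount_point> | acl_grants_rwx 99 && echo "ACL in place"
```

Here -c drops the getfacl comment header and -n prints numeric ids, so
the match does not depend on how uid 99 is named on a given node.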

Hope that helps,
-Bryan

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Aaron Knister
Sent: Thursday, February 08, 2018 2:23 PM
To: Loic Tortay <tortay at cc.in2p3.fr>
Cc: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] mounts taking longer in 4.2 vs 4.1?

Hi Loic,

Thank you for that information!

I have two follow-up questions:
1. Are you using CCR?
2. Do you happen to have mmsdrserv disabled in your environment? (e.g.
what's the output of "mmlsconfig mmsdrservPort" on your cluster?).

-Aaron

On Thu, 8 Feb 2018, Loic Tortay wrote:

> On 07/02/2018 22:28, Aaron Knister wrote:
>> I noticed something curious after migrating some nodes from 4.1 to 4.2
>> which is that mounts now can take foorrreeevverrr. It seems to boil down
>> to the point in the mount process where getEFOptions is called.
>>
>> To highlight the difference--
>>
> [...]
>>
> Hello,
> I have had this (or a very similar) issue after migrating from 4.1.1.8 to
> 4.2.3.  There are 37 filesystems in our main cluster, which made the
> problem really noticeable.
>
> A PMR has been opened.  I have tested that the fixes included in 4.2.3.7
> (which, I'm told, should be released today) actually resolve my problems
> (APARs IJ03192 & IJ03235).
>
>
> Loïc.
> --
> |   Loïc Tortay <tortay at cc.in2p3.fr>  -     IN2P3 Computing Centre     |
>



