[gpfsug-discuss] CCR troubles

McPheeters, Gordon gmcpheeters at anl.gov
Wed Jul 27 23:34:50 BST 2016


mmchcluster has an option:
--ccr-disable
         Reverts to the traditional primary or backup
         configuration server semantics and destroys the CCR
         environment. All nodes must be shut down before
         disabling CCR.
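
A rough sketch of how that sequence might be scripted. The run() helper and the DRY_RUN variable are illustrative assumptions only (they are not part of GPFS); mmshutdown -a is the usual way to stop the daemon on all nodes. Check the mmchcluster man page for your release before running anything like this, since it may also want the primary and secondary configuration servers named.

```shell
# Illustrative helper (not part of GPFS): print each command,
# and execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Shut GPFS down on every node, then revert to the traditional
# primary/backup configuration server semantics.
disable_ccr() {
    run mmshutdown -a || return 1
    run mmchcluster --ccr-disable || return 1
}

# Example: preview the commands without touching the cluster.
#   DRY_RUN=1 disable_ccr
```

With DRY_RUN set, the function only echoes the two commands, which is a cheap way to review the plan on a production cluster.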


-Gordon


On Jul 27, 2016, at 5:29 PM, Bryan Banister <bbanister at jumptrading.com> wrote:

Hi Marc,

I do understand the principle you describe.  The quorum nodes are accessible over TCP/IP but GPFS happens to be down.  I think that CCR should work regardless of whether GPFS is up or down, so that you can change the configuration on a down cluster.  I could even imagine a scenario where a config parameter was set incorrectly and prevents GPFS from starting at all.  If you have to have GPFS up to make config changes because of CCR, then how can you fix this issue?

Thanks for the response!
-Bryan

From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Marc A Kaplan
Sent: Wednesday, July 27, 2016 1:03 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] CCR troubles

I understand you are having problems with your cluster, but you do NOT need to have GPFS "started" to
display and/or change configuration parameters.  You do need at least a majority of the quorum nodes to be up and in communication (e.g. able to talk to each other over TCP/IP).

--ccr-enable
Enables the configuration server repository (CCR), which stores redundant copies of configuration data files on all quorum nodes. The advantage of CCR over the traditional primary or backup configuration server semantics is that when using CCR, all GPFS administration commands as well as file system mounts and daemon startups work normally as long as a majority of quorum nodes are accessible.

Think about how this must work (I have the advantage of actually NOT knowing the details, but one can reason...):
to maintain a consistent single configuration database, a majority of quorum nodes MUST agree on every bit of data in the configuration database.

Even to query the database and get a correct answer, you'd have to know that a majority agree on the answer.

(You could ask 1 guy, but then how would you know if he was telling you what the majority opinion is? The minority need not lie to mislead you,
I don't think CCR guards against Byzantine failures...
The minority guy could just be out of touch for a while...)
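
The majority rule reasoned out above can be sketched as a tiny shell check. This is only an illustration of the arithmetic, not how CCR is implemented; has_majority and its two arguments are hypothetical names.

```shell
# has_majority REACHABLE TOTAL
# Succeeds (exit status 0) only if strictly more than half of the
# TOTAL quorum nodes are reachable.
has_majority() {
    local reachable=$1 total=$2
    [ "$reachable" -gt $(( total / 2 )) ]
}

# Example: with 3 quorum nodes, 2 reachable is enough, 1 is not.
#   has_majority 2 3 && echo "config reads/writes can proceed"
#   has_majority 1 3 || echo "not enough quorum nodes"
```

Note that with an even number of quorum nodes exactly half is not a majority, so 2 of 4 fails the check.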

I advise that you do some testing on a test cluster (could be virtual)...



From:        Bryan Banister <bbanister at jumptrading.com>
To:        "gpfsug main discussion list" <gpfsug-discuss at spectrumscale.org>
Date:        07/27/2016 01:37 PM
Subject:        [gpfsug-discuss] CCR troubles
Sent by:        gpfsug-discuss-bounces at spectrumscale.org
________________________________



When I have the GPFS cluster down, some GPFS commands no longer work like they should, or at least they did work without CCR:

# mmgetstate -aL    # Which stalls for a really stupid amount of time and then spits out:
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmgetstate: Command failed. Examine previous error messages to determine cause.

And trying to change tuning parameters now also barfs when GPFS is down:
[root@fpia-gpfs-jcsdr01 ~]# mmlsconfig
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmlsconfig: Command failed. Examine previous error messages to determine cause.

# mmchconfig worker1Threads=128,prefetchThreads=128
mmchconfig: Unable to obtain the GPFS configuration file lock.
mmchconfig: GPFS was unable to obtain a lock from node fpia-gpfs-jcsdr01.grid.jumptrading.com.
mmchconfig: Command failed. Examine previous error messages to determine cause.

Which means I will have to start GPFS, change the parameter, shut GPFS down again, and start GPFS up again just to get the new setting.

Is this really the new mode of operation for CCR enabled clusters?

I searched for CCR in the Concepts, Planning, and Install Guide and also the Adv. Admin Guide, with no explanation.

If so, then maybe I'll go back to non-CCR,
-Bryan



________________________________

Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you are hereby notified that any review, dissemination or copying of this email is strictly prohibited, and to please notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request or solicitation of any kind to buy, sell, subscribe, redeem or perform any type of transaction of a financial product.
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



