From abeattie at au1.ibm.com Sun Sep 1 14:17:01 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sun, 1 Sep 2019 13:17:01 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... URL: From sandeep.patil at in.ibm.com Tue Sep 3 06:28:30 2019 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Tue, 3 Sep 2019 05:28:30 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q2 2019) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q2 2019). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. Redpaper : IBM Power Systems Enterprise AI Solutions (W/ SPECTRUM SCALE) http://www.redbooks.ibm.com/redpieces/abstracts/redp5556.html?Open IBM Spectrum Scale Erasure Code Edition (ECE): Installation Demonstration https://www.youtube.com/watch?v=6If50EvgP-U Blogs: Using IBM Spectrum Scale as platform storage for running containerized Hadoop/Spark workloads https://developer.ibm.com/storage/2019/08/27/using-ibm-spectrum-scale-as-platform-storage-for-running-containerized-hadoop-spark-workloads/ Useful Tools for Spectrum Scale CES NFS https://developer.ibm.com/storage/2019/07/22/useful-tools-for-spectrum-scale-ces-nfs/ How to ensure NFS uses strong encryption algorithms for secure data in motion ? https://developer.ibm.com/storage/2019/07/19/how-to-ensure-nfs-uses-strong-encryption-algorithms-for-secure-data-in-motion/ Introducing IBM Spectrum Scale Erasure Code Edition https://developer.ibm.com/storage/2019/07/07/introducing-ibm-spectrum-scale-erasure-code-edition/ Spectrum Scale: Which Filesystem Encryption Algo to Consider ? https://developer.ibm.com/storage/2019/07/01/spectrum-scale-which-filesystem-encryption-algo-to-consider/ IBM Spectrum Scale HDFS Transparency Apache Hadoop 3.1.x Support https://developer.ibm.com/storage/2019/06/24/ibm-spectrum-scale-hdfs-transparency-apache-hadoop-3-0-x-support/ Enhanced features in Elastic Storage Server (ESS) 5.3.4 https://developer.ibm.com/storage/2019/06/19/enhanced-features-in-elastic-storage-server-ess-5-3-4/ Upgrading IBM Spectrum Scale Erasure Code Edition using installation toolkit https://developer.ibm.com/storage/2019/06/09/upgrading-ibm-spectrum-scale-erasure-code-edition-using-installation-toolkit/ Upgrading IBM Spectrum Scale sync replication / stretch cluster setup in PureApp https://developer.ibm.com/storage/2019/06/06/upgrading-ibm-spectrum-scale-sync-replication-stretch-cluster-setup/ GPFS config remote access with multiple network definitions https://developer.ibm.com/storage/2019/05/30/gpfs-config-remote-access-with-multiple-network-definitions/ IBM Spectrum Scale Erasure Code Edition Fault Tolerance https://developer.ibm.com/storage/2019/05/30/ibm-spectrum-scale-erasure-code-edition-fault-tolerance/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.3 ? 
https://developer.ibm.com/storage/2019/05/02/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-3/ Understanding and Solving WBC_ERR_DOMAIN_NOT_FOUND error with Spectrum Scale https://crk10.wordpress.com/2019/07/21/solving-the-wbc-err-domain-not-found-nt-status-none-mapped-glitch-in-ibm-spectrum-scale/ Understanding and Solving NT_STATUS_INVALID_SID issue for SMB access with Spectrum Scale https://crk10.wordpress.com/2019/07/24/solving-nt_status_invalid_sid-for-smb-share-access-in-ibm-spectrum-scale/ mmadquery primer (apparatus to query Active Directory from IBM Spectrum Scale) https://crk10.wordpress.com/2019/07/27/mmadquery-primer-apparatus-to-query-active-directory-from-ibm-spectrum-scale/ How to configure RHEL host as Active Directory Client using SSSD https://crk10.wordpress.com/2019/07/28/configure-rhel-machine-as-active-directory-client-using-sssd/ How to configure RHEL host as LDAP client using nslcd https://crk10.wordpress.com/2019/07/28/configure-rhel-machine-as-ldap-client-using-nslcd/ Solving NFSv4 AUTH_SYS nobody ownership issue https://crk10.wordpress.com/2019/07/29/nfsv4-auth_sys-nobody-ownership-and-idmapd/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list of all blogs and collaterals. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 04/29/2019 12:12 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q1 2019) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q1 2019). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Spectrum Scale 5.0.3 https://developer.ibm.com/storage/2019/04/24/spectrum-scale-5-0-3/ IBM Spectrum Scale HDFS Transparency Ranger Support https://developer.ibm.com/storage/2019/04/01/ibm-spectrum-scale-hdfs-transparency-ranger-support/ Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing Files Globally, http://www.redbooks.ibm.com/abstracts/redp5527.html?Open Spectrum Scale user group in Singapore, 2019 https://developer.ibm.com/storage/2019/03/14/spectrum-scale-user-group-in-singapore-2019/ 7 traits to use Spectrum Scale to run container workload https://developer.ibm.com/storage/2019/02/26/7-traits-to-use-spectrum-scale-to-run-container-workload/ Health Monitoring of IBM Spectrum Scale Cluster via External Monitoring Framework https://developer.ibm.com/storage/2019/01/22/health-monitoring-of-ibm-spectrum-scale-cluster-via-external-monitoring-framework/ Migrating data from native HDFS to IBM Spectrum Scale based shared storage https://developer.ibm.com/storage/2019/01/18/migrating-data-from-native-hdfs-to-ibm-spectrum-scale-based-shared-storage/ Bulk File Creation useful for Test on Filesystems https://developer.ibm.com/storage/2019/01/16/bulk-file-creation-useful-for-test-on-filesystems/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 01/14/2019 06:24 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q4 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q4 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Redpaper: IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements http://www.redbooks.ibm.com/abstracts/redp5525.html?Open IBM Spectrum Scale Memory Usage https://www.slideshare.net/tomerperry/ibm-spectrum-scale-memory-usage?qid=50a1dfda-3102-484f-b9d0-14b69fc4800b&v=&b=&from_search=2 Spectrum Scale and Containers https://developer.ibm.com/storage/2018/12/20/spectrum-scale-and-containers/ IBM Elastic Storage Server Performance Graphical Visualization with Grafana https://developer.ibm.com/storage/2018/12/18/ibm-elastic-storage-server-performance-graphical-visualization-with-grafana/ Hadoop Performance for disaggregated compute and storage configurations based on IBM Spectrum Scale Storage https://developer.ibm.com/storage/2018/12/13/hadoop-performance-for-disaggregated-compute-and-storage-configurations-based-on-ibm-spectrum-scale-storage/ EMS HA in ESS LE (Little Endian) environment https://developer.ibm.com/storage/2018/12/07/ems-ha-in-ess-le-little-endian-environment/ What?s new in ESS 5.3.2 https://developer.ibm.com/storage/2018/12/04/whats-new-in-ess-5-3-2/ Administer your Spectrum Scale cluster easily https://developer.ibm.com/storage/2018/11/13/administer-your-spectrum-scale-cluster-easily/ Disaster Recovery using Spectrum Scale?s Active File Management https://developer.ibm.com/storage/2018/11/13/disaster-recovery-using-spectrum-scales-active-file-management/ Recovery Group Failover Procedure of IBM Elastic Storage Server (ESS) https://developer.ibm.com/storage/2018/10/08/recovery-group-failover-procedure-ibm-elastic-storage-server-ess/ Whats new in IBM Elastic Storage Server (ESS) Version 5.3.1 and 5.3.1.1 https://developer.ibm.com/storage/2018/10/04/whats-new-ibm-elastic-storage-server-ess-version-5-3-1-5-3-1-1/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 10/03/2018 08:48 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q3 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. 
https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Sep 3 14:07:44 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 3 Sep 2019 15:07:44 +0200 Subject: [gpfsug-discuss] Fileheat - does work! Complete test/example provided here. In-Reply-To: References: Message-ID: Thanks for this example, very userful, but I'm still struggeling a bit at a customer.. We're doing heat daily based rebalancing, with fileheatlosspercent=20 and fileheatperiodminutes=720: RULE "defineTiers" GROUP POOL 'Tiers' IS 'ssdpool' LIMIT(70) then 'saspool' RULE 'Rebalance' MIGRATE FROM POOL 'Tiers' TO POOL 'Tiers' WEIGHT(FILE_HEAT) WHERE FILE_SIZE<10000000000 but are seeing too many files moved down to the saspool and too few are staying in the ssdpool. Right now we ran a test of this policy, and saw that it wanted to move 130k files / 300 GB down to the saspool, and a single small file up to the ssdpool -- even though the ssdpool is only 50% utilized. Running your listing policy reveals lots of files with zero heat: <7> /gpfs/gpfs0/file1 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file2 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file3/HM_WVS_8P41017_1/HM_WVS_8P41017_1.S2206 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) and others with heat: <5> /gpfs/gpfs0/file4 RULE 'fh2' LIST 'fh' WEIGHT(0.004246) SHOW( 300401047 0 0 +4.24600492924153E-003 11E7C19700000000 720 25 server.locale) <5> /gpfs/gpfs0/file5 RULE 'fh2' LIST 'fh' WEIGHT(0.001717) SHOW( 120971793 1 0 +1.71725239616613E-003 0735E21100010000 720 25 server.locale) These are not new files -- so we're wondering if maybe the fileheat is reduced to zero/NULL after a while (how many times can it shrink by 25% before it's zero??). Would it make sense to increase fileheatperiodeminutes and/or decrease fileheatlosspercentage? What would be good values? (BTW: we have relatime enabled) Any other ideas for why it won't fill up our ssdpool to close to LIMIT(70) ? -jf On Tue, Aug 13, 2019 at 3:33 PM Marc A Kaplan wrote: > Yes, you are correct. It should only be necessary to set > fileHeatPeriodMinutes, since the loss percent does have a default value. > But IIRC (I implemented part of this!) you must restart the daemon to get > those fileheat parameter(s) "loaded"and initialized into the daemon > processes. > > Not fully trusting my memory... I will now "prove" this works today as > follows: > > To test, create and re-read a large file with dd... > > [root@/main/gpfs-git]$mmchconfig fileHeatPeriodMinutes=60 > mmchconfig: Command successfully completed > ... > [root@/main/gpfs-git]$mmlsconfig | grep -i heat > fileHeatPeriodMinutes 60 > > [root@/main/gpfs-git]$mmshutdown > ... > [root@/main/gpfs-git]$mmstartup > ... > [root@/main/gpfs-git]$mmmount c23 > ... > [root@/main/gpfs-git]$ls -l /c23/10g > -rw-r--r--. 
1 root root 10737418240 May 16 15:09 /c23/10g > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > > (NO fileheat attribute yet...) > > [root@/main/gpfs-git]$dd if=/c23/10g bs=1M of=/dev/null > ... > After the command finishes, you may need to wait a while for the metadata > to flush to the inode on disk ... or you can force that with an unmount or > a mmfsctl... > > Then the fileheat attribute will appear (I just waited by answering > another email... No need to do any explicit operations on the file system..) > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > gpfs.FileHeat > > To see its hex string value: > > [root@/main/gpfs-git]$mmlsattr -d -X -L /c23/10g > file name: /c23/10g > ... > security.selinux: > 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000 > gpfs.FileHeat: 0x000000EE42A40400 > > Which will be interpreted by mmapplypolicy... > > YES, the interpretation is relative to last access time and current time, > and done by a policy/sql function "computeFileHeat" > (You could find this using m4 directives in your policy file...) > > > define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)]) > > Well gone that far, might as well try mmapplypolicy too.... > > [root@/main/gpfs-git]$cat /gh/policies/fileheat.policy > define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) > END]) > > rule fh1 external list 'fh' exec '' > rule fh2 list 'fh' weight(FILE_HEAT) > show(DISPLAY_NULL(xattr_integer('gpfs.FileHeat',1,4,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' || > DISPLAY_NULL(FILE_HEAT) || ' ' || > DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' || > getmmconfig('fileHeatPeriodMinutes') || ' ' || > getmmconfig('fileHeatLossPercent') || ' ' || > getmmconfig('clusterName') ) > > > [root@/main/gpfs-git]$mmapplypolicy /c23 --maxdepth 1 -P > /gh/policies/fileheat.policy -I test -L 3 > ... > <1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > ... > WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > > > > > [image: Inactive hide details for Jan-Frode Myklebust ---08/13/2019 > 06:22:46 AM---What about filesystem atime updates. We recently chan]Jan-Frode > Myklebust ---08/13/2019 06:22:46 AM---What about filesystem atime updates. > We recently changed the default to ?relatime?. Could that maybe > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 08/13/2019 06:22 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > What about filesystem atime updates. We recently changed the default to > ?relatime?. Could that maybe influence heat tracking? > > > > -jf > > > tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < > *u.sibiller at science-computing.de* >: > > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By > default, the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must > first be enabled. 
To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature > dissipated over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. > The default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a > default. What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at > least on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar > over a directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Tue Sep 3 16:37:58 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 3 Sep 2019 15:37:58 +0000 Subject: [gpfsug-discuss] Easiest way to copy quota settings from one file system to another? Message-ID: <63C132C3-63AF-465B-8FD9-67AF9EA4887D@nuance.com> I?m migratinga file system from one cluster to another. I want to copy all user quotas from cluster1 filesystem ?A? to cluster2, filesystem ?fs1?, fileset ?A? What?s the easiest way to do that? I?m thinking mmsetquota with a stanza file, but is there a tool to generate the stanza file from the source? I could do a ?mmrepquota -u -Y? and process the output. Hoping for something easier :) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
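One way to script the "mmrepquota -u -Y and process the output" idea is sketched below. Treat it strictly as a starting point: the field names are looked up in the HEADER record at run time rather than hardcoded, but the exact names (quotaType, name, blockQuota, ...), the KiB unit assumption on block values, and the mmsetquota stanza keywords are all assumptions to verify against the man pages of your release before applying anything. The file system and fileset names are the ones from the question.

#!/bin/bash
# Sketch: convert "mmrepquota -u -Y" output from the source file system into an
# mmsetquota stanza file for the destination file system/fileset.
# Assumptions (verify before use): the -Y field names match those referenced below,
# block values are reported in KiB, and the stanza keywords are accepted by
# "mmsetquota -F" on your release.

SRC_FS="A"            # source file system on cluster1 (from the question)
DST_FS="fs1"          # destination file system on cluster2
DST_FILESET="A"       # destination fileset
STANZA_FILE=quota-stanzas.txt

/usr/lpp/mmfs/bin/mmrepquota -u -Y "$SRC_FS" | awk -F: \
    -v dst_fs="$DST_FS" -v dst_fset="$DST_FILESET" '
    /HEADER/ {
        # Map field names to column numbers so the script survives column reordering.
        for (i = 1; i <= NF; i++) col[$i] = i
        next
    }
    $col["quotaType"] == "USR" {
        if ($col["name"] == "root") next                      # leave root alone
        if ($col["blockQuota"] == 0 && $col["blockLimit"] == 0 &&
            $col["filesQuota"] == 0 && $col["filesLimit"] == 0) next   # no limits set
        printf "%%quota:\n"
        printf "  device=%s\n",      dst_fs
        printf "  command=setquota\n"
        printf "  type=USR\n"
        printf "  name=%s\n",        $col["name"]
        printf "  fileset=%s\n",     dst_fset
        printf "  blockQuota=%sK\n", $col["blockQuota"]       # soft block limit, KiB assumed
        printf "  blockLimit=%sK\n", $col["blockLimit"]       # hard block limit, KiB assumed
        printf "  filesQuota=%s\n",  $col["filesQuota"]
        printf "  filesLimit=%s\n",  $col["filesLimit"]
        printf "\n"
    }' > "$STANZA_FILE"

echo "Review $STANZA_FILE, then apply on cluster2 with: mmsetquota -F $STANZA_FILE"

Reading the column positions out of the HEADER record keeps the script from silently picking up the wrong field if the -Y layout differs between releases; if a named field is missing, the rule simply matches nothing, which is easy to spot when the stanza file comes out empty.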
URL: From andreas.mattsson at maxiv.lu.se Thu Sep 5 10:54:04 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 5 Sep 2019 09:54:04 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction Message-ID: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Thu Sep 5 14:28:00 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 5 Sep 2019 18:58:00 +0530 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: <3ed969d0d778446982a419067320f927@maxiv.lu.se> References: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Message-ID: Hi, AFM does not support inode eviction, only data blocks are evicted and the file's metadata will remain in the fileset. ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/05/2019 03:39 PM Subject: [EXTERNAL] [gpfsug-discuss] Inode reuse on AFM cache eviction Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=5omqUvEiiIKUhShJOBEgb3WwLU5uy-8o_4--y0TOuw0&s=ZFAcjvG5LrsnsCJgIf9f1320V866HKG6iJGteRQ7oac&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Thu Sep 5 19:37:47 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 18:37:47 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 92, Issue 4 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
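Following Venkat's point that eviction frees only data blocks and leaves the inodes in place, the practical sizing answer for the question above is that the cache fileset's inode space has to cover every file that exists at home, not just the files cached at any one time. A back-of-the-envelope sketch, with placeholder numbers and a hypothetical mmchfileset invocation to confirm against your release's documentation:

#!/bin/bash
# Back-of-the-envelope sizing for an AFM cache fileset's inode space.
# Assumption from the thread: evicted files keep their inodes, so plan for the
# total file count at home plus growth. Numbers and names are placeholders.

HOME_FILE_COUNT=50000000   # total files expected at home (placeholder)
HEADROOM_PCT=20            # growth margin

LIMIT=$(( HOME_FILE_COUNT + HOME_FILE_COUNT * HEADROOM_PCT / 100 ))
echo "Plan the cache fileset for at least $LIMIT inodes"

# Hypothetical invocation; confirm the option and syntax in the mmchfileset man page:
#   mmchfileset <device> <cache-fileset> --inode-limit $LIMIT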
URL: From sakkuma4 at in.ibm.com Thu Sep 5 20:06:17 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 19:06:17 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Fri Sep 6 10:48:56 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 09:48:56 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Message-ID: Hello, Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? I am failing with these errors: [root at host ~]# uname -a Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux [root at host ~]# rpm -qa | grep gpfs gpfs.base-4.2.3-7.x86_64 gpfs.gskit-8.0.50-75.x86_64 gpfs.ext-4.2.3-7.x86_64 gpfs.msg.en_US-4.2.3-7.noarch gpfs.docs-4.2.3-7.noarch gpfs.gpl-4.2.3-7.noarch [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl -------------------------------------------------------- mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. -------------------------------------------------------- Verifying Kernel Header... kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include Verifying Compiler... make is present at /bin/make cpp is present at /bin/cpp gcc is present at /bin/gcc g++ is present at /bin/g++ ld is present at /bin/ld Verifying Additional System Headers... Verifying kernel-headers is installed ... Command: /bin/rpm -q kernel-headers The required package kernel-headers is installed make World ... Verifying that tools to build the portability layer exist.... cpp present gcc present g++ present ld present cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver cleaning (/usr/lpp/mmfs/src/ibm-kxi) make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' rm -f trcid.h ibm_kxi.trclst [cut] Invoking Kbuild... /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ exit 1;\ fi make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' LD /usr/lpp/mmfs/src/gpl-linux/built-in.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'printInode': /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: 'struct inode' has no member named 'i_wb_list' _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); ^ /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro '_TRACE_MACRO' { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP [ cut ] ^ /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro 'TRACE6' TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'cxiInitInodeSecurity': /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of 'security_old_inode_init_security' from incompatible pointer type [enabled by default] rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, ^ In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: include/linux/security.h:1896:5: note: expected 'const char **' but argument is of type 'char **' int security_old_inode_init_security(struct inode *inode, struct inode *dir, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'cache_get_name': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function 'vfs_readdir' [-Werror=implicit-function-declaration] error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); ^ cc1: some warnings being treated as errors make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. Any help appreciated... Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... 
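The two hard errors in the log above (struct inode having no i_wb_list member, and the implicit declaration of vfs_readdir) point at kernel API changes in the 3.10.0-1062 headers rather than at anything wrong with the install. A quick way to confirm that from the header tree mmbuildgpl already found, assuming the declarations live in include/linux/fs.h as they do in stock 3.10 kernels (a sketch, not a fix):

KSRC=/usr/src/kernels/$(uname -r)
# Does this kernel still declare vfs_readdir, or has it moved to a different interface?
grep -n 'vfs_readdir\|iterate_dir' "$KSRC/include/linux/fs.h"
# Does struct inode still carry i_wb_list?
grep -n 'i_wb_list\|i_io_list' "$KSRC/include/linux/fs.h"

If the old symbols are gone, no amount of rebuilding 4.2.3-7 will help; the portability layer itself has to know about the newer kernel, which is what the replies below get at.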
URL: From david_johnson at brown.edu Fri Sep 6 11:24:51 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 6 Sep 2019 06:24:51 -0400 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: Message-ID: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non-fatal warnings at that version. It seems to run fine. The rest of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a reason to not go for the latest release on either the 4- or 5- line? [root at xxx ~]# ssh node1301 rpm -q gpfs.base gpfs.base-4.2.3-10.x86_64 -- ddj Dave Johnson > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > Hello, > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? > > I am failing with these errors: > > [root at host ~]# uname -a > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux > > [root at host ~]# rpm -qa | grep gpfs > gpfs.base-4.2.3-7.x86_64 > gpfs.gskit-8.0.50-75.x86_64 > gpfs.ext-4.2.3-7.x86_64 > gpfs.msg.en_US-4.2.3-7.noarch > gpfs.docs-4.2.3-7.noarch > gpfs.gpl-4.2.3-7.noarch > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > -------------------------------------------------------- > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > -------------------------------------------------------- > Verifying Kernel Header... > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include > Verifying Compiler... > make is present at /bin/make > cpp is present at /bin/cpp > gcc is present at /bin/gcc > g++ is present at /bin/g++ > ld is present at /bin/ld > Verifying Additional System Headers... > Verifying kernel-headers is installed ... > Command: /bin/rpm -q kernel-headers > The required package kernel-headers is installed > make World ... > Verifying that tools to build the portability layer exist.... > cpp present > gcc present > g++ present > ld present > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > rm -f trcid.h ibm_kxi.trclst > > [cut] > > Invoking Kbuild... > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > if [ $? 
-ne 0 ]; then \ > exit 1;\ > fi > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no member named ?i_wb_list? > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > ^ > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro ?_TRACE_MACRO? > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > [ cut ] > > ^ > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro ?TRACE6? > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of ?security_old_inode_init_security? from incompatible pointer type [enabled by default] > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > ^ > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > include/linux/security.h:1896:5: note: expected ?const char **? but argument is of type ?char **? > int security_old_inode_init_security(struct inode *inode, struct inode *dir, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > ^ > cc1: some warnings being treated as errors > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > > Any help appreciated? 
> Son > > Son V Truong - Senior Storage Administrator > Advanced Computing Research Centre > IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Fri Sep 6 12:41:32 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Fri, 6 Sep 2019 11:41:32 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609150.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609151.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609152.png Type: image/png Size: 1134 bytes Desc: not available URL: From Dugan.Witherick at warwick.ac.uk Fri Sep 6 13:25:22 2019 From: Dugan.Witherick at warwick.ac.uk (Witherick, Dugan) Date: Fri, 6 Sep 2019 12:25:22 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , Message-ID: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next PTFs on > both 4.2.3 and 5.0.3 streams. However we have seen issues in test that will > probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might work many > times but is quite a risky business. I highly recommend waiting for our > support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non- > > fatal warnings at that version. It seems to run fine. The rest of the > > cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a > > reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on > > > RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include > > > Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed > > > make World ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit > > > $? || exit 1 > > > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 > > > M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > > > if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no > > > member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), > > > (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro > > > _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro > > > ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of > > > ?security_old_inode_init_security? from incompatible pointer type [enabled > > > by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? but > > > argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct inode > > > *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration > > > of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. 
Examine previous error messages to determine > > > cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator > > > Advanced Computing Research Centre > > > IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From son.truong at bristol.ac.uk Fri Sep 6 15:15:04 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 14:15:04 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Message-ID: Thank you. Table 39 is most helpful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Witherick, Dugan Sent: 06 September 2019 13:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next > PTFs on both 4.2.3 and 5.0.3 streams. However we have seen issues in > test that will probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might > work many times but is quite a risky business. I highly recommend > waiting for our support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp Sitz der > Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL > > 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with > > non- fatal warnings at that version. It seems to run fine. The rest > > of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do > > you have a reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel > > > modules on RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC > > > 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, > > > 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed make World > > > ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > > > > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include > > > /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir > > > /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib rm -f > > > //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 > > > ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux > > > CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? > > > has no member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), > > > (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), > > > (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition > > > of macro _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of > > > macro ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing > > > argument 4 of ?security_old_inode_init_security? from incompatible > > > pointer type [enabled by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? > > > but argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct > > > inode *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit > > > declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. 
> > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. Examine previous error messages to > > > determine cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator Advanced Computing > > > Research Centre IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Sep 6 16:42:39 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 6 Sep 2019 15:42:39 +0000 Subject: [gpfsug-discuss] SSUG Meeting at SC19: Save the date and call for user talks! Message-ID: The Spectrum Scale User group will hold its annual meeting at SC19 on Sunday November 17th from 12:30PM -6PM In Denver, Co. We will be posting exact meeting location soon, but reserve this time. IBM will host a reception following the user group meeting. We?re also looking for user talks - these are short update (20 mins or so) on your use of Spectrum Scale - any topics are welcome. If you are interested, please contact myself or Kristy Kallback-Rose. Looking forward to seeing everyone in Denver! Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Mon Sep 9 21:29:28 2019 From: bipcuds at gmail.com (Keith Ball) Date: Mon, 9 Sep 2019 16:29:28 -0400 Subject: [gpfsug-discuss] Anyone have experience with changing NSD server node name in an ESS/DSS cluster? Message-ID: Hi All, We are thinking of attempting a non-destructive change of NSD server node names in a Lenovo DSS cluster (DSS level 1.2a, which has Scale 4.2.3.5). For a non-GNR cluster, changing a node name for an NSD server isn't a huge deal if you can have a backup server serve up disks; one can mmdelnode then mmaddnode, for instance. Has anyone tried to rename the NSD servers in a GNR cluster, however? I am not sure if it's as easy as failing over the recovery group, and deleting/adding the NSD server. It's easy enough to modify xcat. Perhaps mmchrecoverygroup can be used to change the RG names (since they are named after the NSD servers), but that might not be necessary. Or, it might not work - does anyone know if there is a special process to change NSD server names in an E( or D or G)SS cluster that does not run afoul of GNR or upgrade scripts? Best regards, Keith -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From TROPPENS at de.ibm.com Wed Sep 11 13:20:22 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 11 Sep 2019 14:20:22 +0200 Subject: [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjvilla at nccs.nasa.gov Wed Sep 11 20:14:12 2019 From: jjvilla at nccs.nasa.gov (John J. Villa) Date: Wed, 11 Sep 2019 15:14:12 -0400 (EDT) Subject: [gpfsug-discuss] Introduction - New Subscriber Message-ID: Hello, My name is John Villa. I work for NASA at the Nasa Center for Climate Simulation. We currently utilize GPFS as the primary filesystem on the discover cluster: https://www.nccs.nasa.gov/systems/discover I look forward to seeing everyone at SC19. Thank You, -- John J. Villa NASA Center for Climate Simulation Discover Systems Administrator From damir.krstic at gmail.com Thu Sep 12 15:16:03 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 12 Sep 2019 09:16:03 -0500 Subject: [gpfsug-discuss] VerbsReconnectThread waiters Message-ID: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Thu Sep 12 16:10:58 2019 From: george at markomanolis.com (George Markomanolis) Date: Thu, 12 Sep 2019 11:10:58 -0400 Subject: [gpfsug-discuss] Call for Submission for the IO500 List Message-ID: Call for Submission *Deadline*: 10 November 2019 AoE The IO500 is now accepting and encouraging submissions for the upcoming 5th IO500 list revealed at SC19 in Denver, Colorado. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our SC19 BoF [2]. We hope to see you, and your results, there. We have updated our submission rules [3]. This year, we will have a new list for the Student Cluster Competition as IO500 is used for extra points during this competition The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC19! Please note that submissions of all sizes are welcome; the site has customizable sorting so it is possible to submit on a small system and still get a very good per-client score for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. 
Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017, published its first list at SC17, and has grown exponentially since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: 1. Maximizing simplicity in running the benchmark suite 2. Encouraging complexity in tuning for performance 3. Allowing submitters to highlight their ?hero run? performance numbers 4. Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound. Finally, it includes a namespace search as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: 1. Gather historical data for the sake of analysis and to aid predictions of storage futures 2. Collect tuning information to share valuable performance optimizations across the community 3. Encourage vendors and designers to optimize for workloads beyond ?hero runs? 4. Establish bounded expectations for users, procurers, and administrators 10 Node I/O Challenge At SC, we will continue the 10 Node Challenge. This challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly *10 computes nodes* must be used to run the benchmark (one exception is the find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at io500.org. Birds-of-a-feather Once again, we encourage you to submit [1], to join our community, and to attend our BoF ?The IO500 and the Virtual Institute of I/O? at SC19, November 19th, 12:15-1:15pm, room 205-207, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. [1] http://io500.org/submission [2] *https://www.vi4io.org/io500/bofs/sc19/start * [3] https://www.vi4io.org/io500/rules/submission The IO500 committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 12 20:19:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 12 Sep 2019 12:19:20 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 - REGISTRATION CLOSING SOON In-Reply-To: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Message-ID: Reminder, registration closing on 9/16 EOB. That?s real soon now. Hope to see you there. Details below. 
> On Aug 29, 2019, at 7:30 PM, Kristy Kallback-Rose wrote: > > Hello, > > You will now find the nearly complete agenda here: > > https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > As noted before, the event is free, but please do register below to help with catering planning. > > You can find more information about the full HPCXXL event here: http://hpcxxl.org/ > > Any questions let us know. Hope to see you there! > > -Kristy > >> On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. >> >> We?ll have more details soon, mark your calendars. >> >> Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ >> >> Best, >> Kristy > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Fri Sep 13 09:48:58 2019 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Fri, 13 Sep 2019 08:48:58 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects Message-ID: Hi All, I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? Cheers, Greg Lehmann Senior High Performance Data Specialist Data Services | Scientific Computing Platforms Information Management and Technology | CSIRO Greg.Lehmann at csiro.au | +61 7 3327 4137 | 1 Technology Court, Pullenvale, QLD 4069 CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email. CSIRO Australia's National Science Agency | csiro.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Sep 13 10:14:06 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 13 Sep 2019 05:14:06 -0400 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: References: Message-ID: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Restarting subnet manager in general is fairly harmless. It will cause a heavy sweep of the fabric when it comes back up, but there should be no LID renumbering. Traffic may be held up during the scanning and rebuild of the routing tables. Losing a subnet manager for a period of time would prevent newly booted nodes from receiving a LID but existing nodes will continue to function. 
Adding or deleting inter-switch links should probably be avoided if the subnet manager is down. I would also avoid changing the routing algorithm while in production. Moving a non ha subnet manager from primary to backup and back again has worked for us without disruption, but I would try to do this in a maintenance window. -- ddj Dave Johnson > On Sep 13, 2019, at 4:48 AM, Lehmann, Greg (IM&T, Pullenvale) wrote: > > Hi All, > I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? > > Cheers, > > Greg Lehmann > Senior High Performance Data Specialist > Data Services | Scientific Computing Platforms > Information Management and Technology | CSIRO > Greg.Lehmann at csiro.au | +61 7 3327 4137 | > 1 Technology Court, Pullenvale, QLD 4069 > > CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. > > The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. > > Please consider the environment before printing this email. > > CSIRO Australia?s National Science Agency | csiro.au > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 13 10:48:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 13 Sep 2019 09:48:52 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> References: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Message-ID: On Fri, 2019-09-13 at 05:14 -0400, david_johnson at brown.edu wrote: [SNIP] > Moving a non ha subnet manager from primary to backup and back again > has worked for us without disruption, but I would try to do this in a > maintenance window. > Not on GPFS but in the past I have moved from one subnet manager to another with dozens of running MPI jobs, and Lustre running over the fabric and not missed a beat. My current cluster used 10 and 40Gbps ethernet for GPFS with Omnipath exclusively for MPI traffic. To be honest I just cannot wrap my head around the idea that you would not be running two subnet managers in the first place. Just fire up two subnet managers (whether on a switch or a node) and forget about it. They will automatically work together to give you a HA solution. It is the same with Omnipath too. I would also note that you can fire up more than two fabric managers and it all "just works". 
If it where me and I didn't have fabric managers running on at least two of my switches and I was doing GPFS over Infiniband, I would fire up fabric managers on all of my NSD servers. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From heinrich.billich at id.ethz.ch Fri Sep 13 15:56:07 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Fri, 13 Sep 2019 14:56:07 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Message-ID: Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* From ewahl at osc.edu Fri Sep 13 16:42:30 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 13 Sep 2019 15:42:30 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: I recall looking at this a year or two back. Ganesha is either v4 and v6 both (ie: the encapsulation you see), OR ipv4 ONLY. (ie: /etc/modprobe.d/ipv6.conf disable=1) Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Billich Heinrich Rainer (ID SD) Sent: Friday, September 13, 2019 10:56 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Fri Sep 13 17:07:01 2019 From: jam at ucar.edu (Joseph Mendoza) Date: Fri, 13 Sep 2019 10:07:01 -0600 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: References: Message-ID: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: > On my cluster I have seen couple of long waiters such as this: > > gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more > seconds, reason: delaying for next reconnect attempt > > I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. > > Is this something to pay attention to, and what does this waiter mean? > > Thank you. > Damir > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 16 08:12:09 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 16 Sep 2019 09:12:09 +0200 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Sep 16 10:33:58 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 16 Sep 2019 17:33:58 +0800 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> References: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> Message-ID: Damir, Joseph, > Is this something to pay attention to, and what does this waiter mean? This waiter means GPFS fails to reconnect broken verbs connection, which can cause performance degradation. 
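To check whether a node is affected, something along these lines can be run on the node that shows the waiter (a sketch; the grep pattern is only illustrative):

# Show current waiters, including any VerbsReconnectThread entries
mmdiag --waiters

# List the node's verbs/RDMA connections and their state
mmfsadm test verbs conn

# Look for RDMA being disabled "due to too many errors"
grep -i verbs /var/adm/ras/mmfs.log.latest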
> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again. > Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. This is a code bug which is fixed through internal defect 1090669. It will be backport to service releases after verification. There is a work-around which can fix this problem without a restart. - On nodes which have this waiter list, run command 'mmfsadm test breakconn all 744' 744 is E_RECONNECT, which triggers tcp reconnect and will not cause node leave/rejoin. Its side effect clears RDMA connections and their incorrect status. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Joseph Mendoza To: gpfsug-discuss at spectrumscale.org Date: 2019/09/14 12:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] VerbsReconnectThread waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WoT3TYlCvAM8RQxUISD9L6UzqY0I_ffCJTS-UHhw8z4&s=18A0j0Zmp8OwZ6Y6cc3HFe3OgFZRHIv8OeJcBpkaPwQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Mon Sep 16 13:58:03 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 16 Sep 2019 12:58:03 +0000 Subject: [gpfsug-discuss] Can 5-minutes frequent lsscsi command disrupt GPFS I/O on a Lenovo system ? Message-ID: <83A6EEB0EC738F459A39439733AE80452BEA85FE@MBX214.d.ethz.ch> Hello folks, recently I observed that calling every 5 minutes the command "lsscsi -g" on a Lenovo I/O node (a X3650 M5 connected to D3284 enclosures, part of a DSS-G220 system) can seriously compromise the GPFS I/O performance. (The motivation of running lsscsi every 5 minutes is a bit out of topic, but I can explain on request). What we observed is that there were several GPFS waiters telling that flushing caches to physical disk was impossible and they had to wait (possibly going in timeout). Is this something expected and/or observed by someone else in this community ? Thanks Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Sep 16 15:50:24 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 16 Sep 2019 14:50:24 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , Message-ID: What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Mon Sep 16 15:55:34 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 16 Sep 2019 14:55:34 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: On our recent ESS systems we do not see /etc/tuned/scale/tuned.conf (or script.sh) owned by any package (rpm -qif ?). I?ve attached what we have on our ESS 5.3.3 systems. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Monday, September 16, 2019 at 10:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tuned.conf Type: application/octet-stream Size: 2859 bytes Desc: tuned.conf URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: script.sh Type: application/octet-stream Size: 270 bytes Desc: script.sh URL: From heinrich.billich at id.ethz.ch Mon Sep 16 16:49:57 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 16 Sep 2019 15:49:57 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Hello Olaf, Thank you, so we?ll try to get rid of IPv6. Actually we do have this settings active but I may have to add them to the initrd file, too. (See https://access.redhat.com/solutions/8709#?rhel7disable) to prevent ganesha from opening an IPv6 socket. It?s probably no big issue if ganesha uses IPv4overIPv6 for all connections, but to keep things simple I would like to avoid it. @Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... 
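As a concrete sketch, disabling IPv6 completely on a CES node, including in the initramfs as per the Red Hat note referenced above, could look roughly like the following. The file names are illustrative and worth verifying for your release:

# /etc/sysctl.d/99-disable-ipv6.conf (the same settings the 'scale' tuned profile applies)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

# Or disable the module outright, as suggested earlier in the thread:
# /etc/modprobe.d/ipv6.conf
options ipv6 disable=1

# Apply now and rebuild the initramfs so the setting also holds at early boot
sysctl --system
dracut -f

# Ganesha should then offer IPv4 sockets only; verify with
ss -l -t -6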
From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 16 18:34:07 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 16 Sep 2019 17:34:07 +0000 Subject: [gpfsug-discuss] SSUG @ SC19 Update: Scheduling and Sponsorship Opportunities Message-ID: Two months until SC19 and the schedule is starting to come together, with a great mix of technical updates and user talks. I would like highlight a few items for you to be aware of: - Morning session: We?re currently trying to put together a morning ?new users? session for those new to Spectrum Scale. These talks would be focused on fundamentals and give an opportunity to ask questions. We?re tentatively thinking about starting around 9:30-10 AM on Sunday November 17th. Watch the mailing list for updates and on the http://spectrumscale.org site. - Sponsorships: We?re looking for sponsors. If your company is an IBM partner, uses/incorporates Spectrum Scale - please contact myself or Kristy Kallback-Rose. We are looking for sponsors to help with lunch (YES - we?d like to serve lunch this year!) and WiFi access during the user group meeting. Looking forward to seeing you all at SC19. Registration link coming soon, watch here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Sep 18 18:56:29 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 18 Sep 2019 17:56:29 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 Message-ID: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. (sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Sep 19 11:44:46 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 10:44:46 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From heinrich.billich at id.ethz.ch Thu Sep 19 15:20:53 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 19 Sep 2019 14:20:53 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Message-ID: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Hello, Is it usual to see 200'000-400'000 open files for a single ganesha process? Or does this indicate that something is wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200'000-400'000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1'000 - 10'000 open files by ganesha only and don't show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn't open/close a file, it's the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open files? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc/<pid of ganesha>/fd/ . With several 100k entries I failed to do a 'ls -ls' to list all the symbolic links, hence I can't relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Sep 19 15:30:45 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 19 Sep 2019 15:30:45 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200'000-400'000 open files for a single ganesha > process? Or does this indicate that something is wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200'000-400'000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1'000 - 10'000 open files by ganesha only and don't show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn't open/close a file, it's the server to decide when to > close it?
We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files ?by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From S.J.Thompson at bham.ac.uk Thu Sep 19 16:18:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Sep 2019 15:18:47 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: on behalf of "abeattie at au1.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Sep 19 19:38:53 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 19 Sep 2019 18:38:53 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is_this=09unusual=3F?= In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Sep 19 22:34:33 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 21:34:33 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Sep 19 23:41:08 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 19 Sep 2019 22:41:08 +0000 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade Message-ID: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Sep 20 09:08:01 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 20 Sep 2019 10:08:01 +0200 Subject: [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=I3TzCv5SKxKb51eAL_blo-XwctX64z70ayrZKERanWA&s=OSKGngwXAoOemFy3HkctexuIpBJQu8NPeTkC_MMQBks&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Fri Sep 20 10:14:58 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Fri, 20 Sep 2019 11:14:58 +0200 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade In-Reply-To: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> References: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> Message-ID: Hello Bob, this event is a "Notice": You can use the action "Mark Selected Notices as Read" or "Mark All Notices as Read" in the GUI Event Groups or Individual Events grid. Notice events are transient by nature and don't imply a permanent state change of an entity. It seems that during the upgrade, mmhealth had probed the pdisk and the disk hospital was diagnosing the pdisk at this time, but eventually disk hospital placed the pdisk back to normal state. Mit freundlichen Grüßen / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 162 4159920 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 20.09.2019 00:53 Subject: [EXTERNAL] [gpfsug-discuss] Leftover GUI events after ESS upgrade Sent by: gpfsug-discuss-bounces at spectrumscale.org I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don't appear in gnrhealthcheck or mmhealth. pdisk checks are clear. Any idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hLyf83U0otjISdpV5zl1cSCPVFFUF61ny3jWvv-5kNQ&s=ptMGcpNhnRTogPO2CN_l6jhC-vCN-VQAf53HmRLQDq8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Mon Sep 23 10:33:02 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 09:33:02 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Hello Frederik, Thank you. I now see a similar behavior: Ganesha has 500k open files while the node has been suspended for 2+ hours. I would expect that some cleanup job would remove most of the open FDs after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: ! maxFilesToCache 1048576 ! maxStatCache 2097152 Our ganesha version is 2.5.3 (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern.
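To put numbers on what Heiner describes, the daemon's descriptor limit and its current count can be compared as sketched below. This assumes the CES ganesha process name that appears in the trace lines later in the thread (gpfs.ganesha.nfsd); the last line is only a sanity check of the 80% rule Heiner mentions in a later mail.

    pid=$(pgrep -f gpfs.ganesha.nfsd)
    cat /etc/sysconfig/ganesha                  # per Heiner's later note, the derived open-file limit is visible here
    grep 'Max open files' /proc/$pid/limits     # the limit the running daemon actually got
    ls -U /proc/$pid/fd | wc -l                 # current count, without stat-ing every entry
    echo $((1048576 * 80 / 100))                # 838860 for maxFilesToCache=1048576, clamped to 2000/1M

Raising the ceiling goes through maxFilesToCache (mmchconfig) rather than by editing the sysconfig file directly, which is the route Frederik describes above.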
I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. Cheers, Heiner Daniel Gryniewicz So, it's not impossible, based on the workload, but it may also be a bug. For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know when the client closes the FD, and opening/closing all the time causes a large performance hit. So, we cache open FDs. All handles in MDCACHE live on the LRU. This LRU is divided into 2 levels. Level 1 is more active handles, and they can have open FDs. Various operation can demote a handle to level 2 of the LRU. As part of this transition, the global FD on that handle is closed. Handles that are actively in use (have a refcount taken on them) are not eligible for this transition, as the FD may be being used. We have a background thread that runs, and periodically does this demotion, closing the FDs. This thread runs more often when the number of open FDs is above FD_HwMark_Percent of the available number of FDs, and runs constantly when the open FD count is above FD_Limit_Percent of the available number of FDs. So, a heavily used server could definitely have large numbers of FDs open. However, there have also, in the past, been bugs that would either keep the FDs from being closed, or would break the accounting (so they were closed, but Ganesha still thought they were open). You didn't say what version of Ganesha you're using, so I can't tell if one of those bugs apply. Daniel ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. 
> > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Mon Sep 23 11:43:06 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 10:43:06 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <72079C31-1E3E-4F69-B428-480620466353@id.ethz.ch> Hello Malhal, Thank you. Actually I don?t see the parameter Cache_FDs in our ganesha config. But when I trace LRU processing I see that almost no FDs get released. And the number of FDs given in the log messages doesn?t match what I see in /proc//fd/. I see 512k open files while the logfile give 600k. Even 4hours since the I suspended the node and all i/o activity stopped I see 500k open files and LRU processing doesn?t close any of them. This looks like a bug in gpfs.nfs-ganesha-2.5.3-ibm036.10.el7. I?ll open a case with IBM. We did see gansha to fail to open new files and hence client requests to fail. I assume that 500K FDs compared to 10K FDs as before create some notable overhead for ganesha, spectrum scale and kernel and withdraw resources from samba. I?ll post to the list once we got some results. 
Cheers, Heiner Start of LRU processing 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51350 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1027 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51400 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1028 closing 0 descriptors End of log 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1029 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1029 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51500 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1030 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :After work, open_fd_count:607024 count:29503718 fdrate:1908874353 threadwait=9 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :currentopen=607024 futility=0 totalwork=51550 biggest_window=335544 extremis=0 lanes=1031 fds_lowat=167772 From: on behalf of Malahal R Naineni Reply to: gpfsug main discussion list Date: Thursday, 19 September 2019 at 20:39 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? NFSv3 doesn't have open/close requests, so nfs-ganesha opens a file for read/write when there is an NFSv3 read/write request. It does cache file descriptors, so its open count can be very large. If you have 'Cache_FDs = true" in your config, ganesha aggressively caches file descriptors. 
Taking traces with COMPONENT_CACHE_INODE_LRU level set to full debug should give us better insight on what is happening when the the open file descriptors count is very high. When the I/O failure happens or when the open fd count is high, you could do the following: 1. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU FULL_DEBUG 2. wait for 90 seconds, then run 3. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU EVENT Regards, Malahal. ----- Original message ----- From: "Billich Heinrich Rainer (ID SD)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Date: Thu, Sep 19, 2019 7:51 PM Hello, Is it usual to see 200?000-400?000 open files for a single ganesha process? Or does this indicate that something ist wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200?000-400?000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1?000 ? 10?000 open files by ganesha only and don?t show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn?t open/close a file, it?s the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open file? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc//fd/ . With several 100k entries I failed to do a ?ls -ls? to list all the symbolic links, hence I can?t relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Sep 24 09:52:34 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 24 Sep 2019 08:52:34 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Hello Frederik, Just some addition, maybe its of interest to someone: The number of max open files for Ganesha is based on maxFilesToCache. Its. 80%of maxFilesToCache up to an upper and lower limits of 2000/1M. The active setting is visible in /etc/sysconfig/ganesha. Cheers, Heiner ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. 
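Heiner notes in the question quoted above that 'ls -ls' was too slow to relate several hundred thousand open files to exports. A cheaper way to bucket the descriptors by path prefix is sketched below; the ganesha process name is taken from the trace lines in this thread, and the field count in the cut is arbitrary and needs adjusting to where the exports sit in the directory tree.

    pid=$(pgrep -f gpfs.ganesha.nfsd)
    find /proc/$pid/fd -type l -printf '%l\n' | cut -d/ -f1-4 | sort | uniq -c | sort -rn | head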
I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From valdis.kletnieks at vt.edu Tue Sep 24 21:41:07 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 24 Sep 2019 16:41:07 -0400 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? 
In-Reply-To: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: <269692.1569357667@turing-police> On Tue, 24 Sep 2019 08:52:34 -0000, "Billich Heinrich Rainer (ID SD)" said: > Just some addition, maybe its of interest to someone: The number of max open > files for Ganesha is based on maxFilesToCache. Its. 80%of maxFilesToCache up to > an upper and lower limits of 2000/1M. The active setting is visible in > /etc/sysconfig/ganesha. Note that strictly speaking, the values in /etc/sysconfig are in general the values that will be used at next restart - it's totally possible for the system to boot, the then-current values be picked up from /etc/sysconfig, and then any number of things, from configuration automation tools like Ansible, to a cow-orker sysadmin armed with nothing but /usr/bin/vi, to have changed the values without you knowing about it and the daemons not be restarted yet... (Let's just say that in 4 decades of doing this stuff, I've been surprised by that sort of thing a few times. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Sep 25 18:06:18 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 25 Sep 2019 17:06:18 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is=09this_unusual=3F?= In-Reply-To: <269692.1569357667@turing-police> References: <269692.1569357667@turing-police>, <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch><280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: att6j9ca.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From L.R.Sudbery at bham.ac.uk Thu Sep 26 10:38:09 2019 From: L.R.Sudbery at bham.ac.uk (Luke Sudbery) Date: Thu, 26 Sep 2019 09:38:09 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <3b15db460ac1459e9ca53bec00f30833@bham.ac.uk> We think our issue was down to numa settings actually - making mmfsd allocate GPU memory. Makes sense given the type of error. Tomer suggested to Simon we set numactlOptioni to "0 8", as per: https://www-01.ibm.com/support/docview.wss?uid=isg1IJ02794 Our tests are not crashing since setting then ? we need to roll it out on all nodes to confirm its fixed all our hangs/reboots. Cheers, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don?t work on Monday and work from home on Friday. 
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of abeattie at au1.ibm.com Sent: 19 September 2019 22:35 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, I have an open support call that required Redhat to create a kernel patch for RH 7.6 because of issues with the Intel x710 network adapter - I can't tell you if its related to your issue or not but it would cause the GPFS cluster to reboot and the affected node to reboot if we tried to do almost anything with that intel adapter regards, Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS and POWER9 Date: Fri, Sep 20, 2019 1:18 AM Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: > on behalf of "abeattie at au1.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Sep 26 10:55:45 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 26 Sep 2019 09:55:45 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions Message-ID: Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. 
The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 27 09:23:13 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Sep 2019 13:53:13 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=tjCOcTjZ_AjP3N1mpspwuLu5u2XOFb5LkZqVAwX3wk8&s=tD6X2XM1HPMqWxSg-IelnstWbneQ7On4xfEVkCajtPE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Fri Sep 27 11:31:42 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Fri, 27 Sep 2019 10:31:42 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Sun Sep 1 14:17:01 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sun, 1 Sep 2019 13:17:01 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... 
URL: From sandeep.patil at in.ibm.com Tue Sep 3 06:28:30 2019 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Tue, 3 Sep 2019 05:28:30 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q2 2019) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q2 2019). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. Redpaper : IBM Power Systems Enterprise AI Solutions (W/ SPECTRUM SCALE) http://www.redbooks.ibm.com/redpieces/abstracts/redp5556.html?Open IBM Spectrum Scale Erasure Code Edition (ECE): Installation Demonstration https://www.youtube.com/watch?v=6If50EvgP-U Blogs: Using IBM Spectrum Scale as platform storage for running containerized Hadoop/Spark workloads https://developer.ibm.com/storage/2019/08/27/using-ibm-spectrum-scale-as-platform-storage-for-running-containerized-hadoop-spark-workloads/ Useful Tools for Spectrum Scale CES NFS https://developer.ibm.com/storage/2019/07/22/useful-tools-for-spectrum-scale-ces-nfs/ How to ensure NFS uses strong encryption algorithms for secure data in motion ? https://developer.ibm.com/storage/2019/07/19/how-to-ensure-nfs-uses-strong-encryption-algorithms-for-secure-data-in-motion/ Introducing IBM Spectrum Scale Erasure Code Edition https://developer.ibm.com/storage/2019/07/07/introducing-ibm-spectrum-scale-erasure-code-edition/ Spectrum Scale: Which Filesystem Encryption Algo to Consider ? https://developer.ibm.com/storage/2019/07/01/spectrum-scale-which-filesystem-encryption-algo-to-consider/ IBM Spectrum Scale HDFS Transparency Apache Hadoop 3.1.x Support https://developer.ibm.com/storage/2019/06/24/ibm-spectrum-scale-hdfs-transparency-apache-hadoop-3-0-x-support/ Enhanced features in Elastic Storage Server (ESS) 5.3.4 https://developer.ibm.com/storage/2019/06/19/enhanced-features-in-elastic-storage-server-ess-5-3-4/ Upgrading IBM Spectrum Scale Erasure Code Edition using installation toolkit https://developer.ibm.com/storage/2019/06/09/upgrading-ibm-spectrum-scale-erasure-code-edition-using-installation-toolkit/ Upgrading IBM Spectrum Scale sync replication / stretch cluster setup in PureApp https://developer.ibm.com/storage/2019/06/06/upgrading-ibm-spectrum-scale-sync-replication-stretch-cluster-setup/ GPFS config remote access with multiple network definitions https://developer.ibm.com/storage/2019/05/30/gpfs-config-remote-access-with-multiple-network-definitions/ IBM Spectrum Scale Erasure Code Edition Fault Tolerance https://developer.ibm.com/storage/2019/05/30/ibm-spectrum-scale-erasure-code-edition-fault-tolerance/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.3 ? 
https://developer.ibm.com/storage/2019/05/02/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-3/ Understanding and Solving WBC_ERR_DOMAIN_NOT_FOUND error with Spectrum Scale https://crk10.wordpress.com/2019/07/21/solving-the-wbc-err-domain-not-found-nt-status-none-mapped-glitch-in-ibm-spectrum-scale/ Understanding and Solving NT_STATUS_INVALID_SID issue for SMB access with Spectrum Scale https://crk10.wordpress.com/2019/07/24/solving-nt_status_invalid_sid-for-smb-share-access-in-ibm-spectrum-scale/ mmadquery primer (apparatus to query Active Directory from IBM Spectrum Scale) https://crk10.wordpress.com/2019/07/27/mmadquery-primer-apparatus-to-query-active-directory-from-ibm-spectrum-scale/ How to configure RHEL host as Active Directory Client using SSSD https://crk10.wordpress.com/2019/07/28/configure-rhel-machine-as-active-directory-client-using-sssd/ How to configure RHEL host as LDAP client using nslcd https://crk10.wordpress.com/2019/07/28/configure-rhel-machine-as-ldap-client-using-nslcd/ Solving NFSv4 AUTH_SYS nobody ownership issue https://crk10.wordpress.com/2019/07/29/nfsv4-auth_sys-nobody-ownership-and-idmapd/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list of all blogs and collaterals. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 04/29/2019 12:12 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q1 2019) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q1 2019). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Spectrum Scale 5.0.3 https://developer.ibm.com/storage/2019/04/24/spectrum-scale-5-0-3/ IBM Spectrum Scale HDFS Transparency Ranger Support https://developer.ibm.com/storage/2019/04/01/ibm-spectrum-scale-hdfs-transparency-ranger-support/ Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing Files Globally, http://www.redbooks.ibm.com/abstracts/redp5527.html?Open Spectrum Scale user group in Singapore, 2019 https://developer.ibm.com/storage/2019/03/14/spectrum-scale-user-group-in-singapore-2019/ 7 traits to use Spectrum Scale to run container workload https://developer.ibm.com/storage/2019/02/26/7-traits-to-use-spectrum-scale-to-run-container-workload/ Health Monitoring of IBM Spectrum Scale Cluster via External Monitoring Framework https://developer.ibm.com/storage/2019/01/22/health-monitoring-of-ibm-spectrum-scale-cluster-via-external-monitoring-framework/ Migrating data from native HDFS to IBM Spectrum Scale based shared storage https://developer.ibm.com/storage/2019/01/18/migrating-data-from-native-hdfs-to-ibm-spectrum-scale-based-shared-storage/ Bulk File Creation useful for Test on Filesystems https://developer.ibm.com/storage/2019/01/16/bulk-file-creation-useful-for-test-on-filesystems/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 01/14/2019 06:24 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q4 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q4 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Redpaper: IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements http://www.redbooks.ibm.com/abstracts/redp5525.html?Open IBM Spectrum Scale Memory Usage https://www.slideshare.net/tomerperry/ibm-spectrum-scale-memory-usage?qid=50a1dfda-3102-484f-b9d0-14b69fc4800b&v=&b=&from_search=2 Spectrum Scale and Containers https://developer.ibm.com/storage/2018/12/20/spectrum-scale-and-containers/ IBM Elastic Storage Server Performance Graphical Visualization with Grafana https://developer.ibm.com/storage/2018/12/18/ibm-elastic-storage-server-performance-graphical-visualization-with-grafana/ Hadoop Performance for disaggregated compute and storage configurations based on IBM Spectrum Scale Storage https://developer.ibm.com/storage/2018/12/13/hadoop-performance-for-disaggregated-compute-and-storage-configurations-based-on-ibm-spectrum-scale-storage/ EMS HA in ESS LE (Little Endian) environment https://developer.ibm.com/storage/2018/12/07/ems-ha-in-ess-le-little-endian-environment/ What?s new in ESS 5.3.2 https://developer.ibm.com/storage/2018/12/04/whats-new-in-ess-5-3-2/ Administer your Spectrum Scale cluster easily https://developer.ibm.com/storage/2018/11/13/administer-your-spectrum-scale-cluster-easily/ Disaster Recovery using Spectrum Scale?s Active File Management https://developer.ibm.com/storage/2018/11/13/disaster-recovery-using-spectrum-scales-active-file-management/ Recovery Group Failover Procedure of IBM Elastic Storage Server (ESS) https://developer.ibm.com/storage/2018/10/08/recovery-group-failover-procedure-ibm-elastic-storage-server-ess/ Whats new in IBM Elastic Storage Server (ESS) Version 5.3.1 and 5.3.1.1 https://developer.ibm.com/storage/2018/10/04/whats-new-ibm-elastic-storage-server-ess-version-5-3-1-5-3-1-1/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 10/03/2018 08:48 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q3 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. 
https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Sep 3 14:07:44 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 3 Sep 2019 15:07:44 +0200 Subject: [gpfsug-discuss] Fileheat - does work! Complete test/example provided here. In-Reply-To: References: Message-ID: Thanks for this example, very userful, but I'm still struggeling a bit at a customer.. We're doing heat daily based rebalancing, with fileheatlosspercent=20 and fileheatperiodminutes=720: RULE "defineTiers" GROUP POOL 'Tiers' IS 'ssdpool' LIMIT(70) then 'saspool' RULE 'Rebalance' MIGRATE FROM POOL 'Tiers' TO POOL 'Tiers' WEIGHT(FILE_HEAT) WHERE FILE_SIZE<10000000000 but are seeing too many files moved down to the saspool and too few are staying in the ssdpool. Right now we ran a test of this policy, and saw that it wanted to move 130k files / 300 GB down to the saspool, and a single small file up to the ssdpool -- even though the ssdpool is only 50% utilized. Running your listing policy reveals lots of files with zero heat: <7> /gpfs/gpfs0/file1 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file2 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file3/HM_WVS_8P41017_1/HM_WVS_8P41017_1.S2206 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) and others with heat: <5> /gpfs/gpfs0/file4 RULE 'fh2' LIST 'fh' WEIGHT(0.004246) SHOW( 300401047 0 0 +4.24600492924153E-003 11E7C19700000000 720 25 server.locale) <5> /gpfs/gpfs0/file5 RULE 'fh2' LIST 'fh' WEIGHT(0.001717) SHOW( 120971793 1 0 +1.71725239616613E-003 0735E21100010000 720 25 server.locale) These are not new files -- so we're wondering if maybe the fileheat is reduced to zero/NULL after a while (how many times can it shrink by 25% before it's zero??). Would it make sense to increase fileheatperiodeminutes and/or decrease fileheatlosspercentage? What would be good values? (BTW: we have relatime enabled) Any other ideas for why it won't fill up our ssdpool to close to LIMIT(70) ? -jf On Tue, Aug 13, 2019 at 3:33 PM Marc A Kaplan wrote: > Yes, you are correct. It should only be necessary to set > fileHeatPeriodMinutes, since the loss percent does have a default value. > But IIRC (I implemented part of this!) you must restart the daemon to get > those fileheat parameter(s) "loaded"and initialized into the daemon > processes. > > Not fully trusting my memory... I will now "prove" this works today as > follows: > > To test, create and re-read a large file with dd... > > [root@/main/gpfs-git]$mmchconfig fileHeatPeriodMinutes=60 > mmchconfig: Command successfully completed > ... > [root@/main/gpfs-git]$mmlsconfig | grep -i heat > fileHeatPeriodMinutes 60 > > [root@/main/gpfs-git]$mmshutdown > ... > [root@/main/gpfs-git]$mmstartup > ... > [root@/main/gpfs-git]$mmmount c23 > ... > [root@/main/gpfs-git]$ls -l /c23/10g > -rw-r--r--. 
1 root root 10737418240 May 16 15:09 /c23/10g > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > > (NO fileheat attribute yet...) > > [root@/main/gpfs-git]$dd if=/c23/10g bs=1M of=/dev/null > ... > After the command finishes, you may need to wait a while for the metadata > to flush to the inode on disk ... or you can force that with an unmount or > a mmfsctl... > > Then the fileheat attribute will appear (I just waited by answering > another email... No need to do any explicit operations on the file system..) > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > gpfs.FileHeat > > To see its hex string value: > > [root@/main/gpfs-git]$mmlsattr -d -X -L /c23/10g > file name: /c23/10g > ... > security.selinux: > 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000 > gpfs.FileHeat: 0x000000EE42A40400 > > Which will be interpreted by mmapplypolicy... > > YES, the interpretation is relative to last access time and current time, > and done by a policy/sql function "computeFileHeat" > (You could find this using m4 directives in your policy file...) > > > define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)]) > > Well gone that far, might as well try mmapplypolicy too.... > > [root@/main/gpfs-git]$cat /gh/policies/fileheat.policy > define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) > END]) > > rule fh1 external list 'fh' exec '' > rule fh2 list 'fh' weight(FILE_HEAT) > show(DISPLAY_NULL(xattr_integer('gpfs.FileHeat',1,4,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' || > DISPLAY_NULL(FILE_HEAT) || ' ' || > DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' || > getmmconfig('fileHeatPeriodMinutes') || ' ' || > getmmconfig('fileHeatLossPercent') || ' ' || > getmmconfig('clusterName') ) > > > [root@/main/gpfs-git]$mmapplypolicy /c23 --maxdepth 1 -P > /gh/policies/fileheat.policy -I test -L 3 > ... > <1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > ... > WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > > > > > [image: Inactive hide details for Jan-Frode Myklebust ---08/13/2019 > 06:22:46 AM---What about filesystem atime updates. We recently chan]Jan-Frode > Myklebust ---08/13/2019 06:22:46 AM---What about filesystem atime updates. > We recently changed the default to ?relatime?. Could that maybe > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 08/13/2019 06:22 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > What about filesystem atime updates. We recently changed the default to > ?relatime?. Could that maybe influence heat tracking? > > > > -jf > > > tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < > *u.sibiller at science-computing.de* >: > > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By > default, the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must > first be enabled. 
To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature > dissipated over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. > The default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a > default. What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at > least on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar > over a directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Tue Sep 3 16:37:58 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 3 Sep 2019 15:37:58 +0000 Subject: [gpfsug-discuss] Easiest way to copy quota settings from one file system to another? Message-ID: <63C132C3-63AF-465B-8FD9-67AF9EA4887D@nuance.com> I?m migratinga file system from one cluster to another. I want to copy all user quotas from cluster1 filesystem ?A? to cluster2, filesystem ?fs1?, fileset ?A? What?s the easiest way to do that? I?m thinking mmsetquota with a stanza file, but is there a tool to generate the stanza file from the source? I could do a ?mmrepquota -u -Y? and process the output. Hoping for something easier :) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
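One way to follow up on the mmrepquota idea above is sketched below. This is a rough outline rather than a tested tool: the -Y field names (id, blockQuota, blockLimit, filesQuota, filesLimit) are looked up from the HEADER line at run time instead of being hard-coded, but those names, the type=USR assumption, and the stanza keywords fed to mmsetquota -F are all assumptions to confirm against the mmrepquota and mmsetquota man pages for the release in use. The block and file values are copied through verbatim, so the units also need checking with a single test user before the whole file is applied, and per-fileset user quotas require per-fileset quota enforcement to be enabled on the target file system.

#!/bin/bash
# Rough sketch: dump user quotas from the source file system and emit an
# mmsetquota stanza file for the target fileset. Field names and stanza
# keywords are assumptions - verify against the man pages before use.

SRC_FS=A            # source file system on cluster1
DST_FS=fs1          # target file system on cluster2
DST_FILESET=A       # target fileset on cluster2

/usr/lpp/mmfs/bin/mmrepquota -u -Y "$SRC_FS" | awk -F: -v fs="$DST_FS" -v fset="$DST_FILESET" '
  $3 == "HEADER" { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map field name -> column
  !("id" in col) { next }                                          # skip anything before the header
  {
    id = $col["id"]; bq = $col["blockQuota"]; bl = $col["blockLimit"]
    fq = $col["filesQuota"]; fl = $col["filesLimit"]
    if (bq == 0 && bl == 0 && fq == 0 && fl == 0) next             # no explicit quota set, skip
    printf "%%quota:\n  device=%s\n  command=setquota\n  type=USR\n  id=%s\n", fs, id
    printf "  blockQuota=%s\n  blockLimit=%s\n  filesQuota=%s\n  filesLimit=%s\n  fileset=%s\n",
           bq, bl, fq, fl, fset
  }' > quota_stanzas.txt

# Review quota_stanzas.txt (especially units), test with one user, then on cluster2:
#   /usr/lpp/mmfs/bin/mmsetquota -F quota_stanzas.txt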
URL: From andreas.mattsson at maxiv.lu.se Thu Sep 5 10:54:04 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 5 Sep 2019 09:54:04 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction Message-ID: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Thu Sep 5 14:28:00 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 5 Sep 2019 18:58:00 +0530 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: <3ed969d0d778446982a419067320f927@maxiv.lu.se> References: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Message-ID: Hi, AFM does not support inode eviction, only data blocks are evicted and the file's metadata will remain in the fileset. ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/05/2019 03:39 PM Subject: [EXTERNAL] [gpfsug-discuss] Inode reuse on AFM cache eviction Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=5omqUvEiiIKUhShJOBEgb3WwLU5uy-8o_4--y0TOuw0&s=ZFAcjvG5LrsnsCJgIf9f1320V866HKG6iJGteRQ7oac&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Thu Sep 5 19:37:47 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 18:37:47 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 92, Issue 4 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
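Following Venkat's answer above - eviction frees data blocks but leaves inodes and metadata in place - the cache fileset's inode space has to be sized for everything that exists at home, not just for whatever happens to be cached at a given time. A minimal sketch of checking and growing that space is below; it assumes the cache is an independent fileset with its own inode space, the names are examples, and the exact options should be confirmed against the mmlsfileset and mmchfileset man pages for the release in use.

FS=fs1              # file system holding the AFM cache (example name)
CACHE_FSET=cacheA   # AFM cache fileset (example name)

# Current inode limit and allocation for the fileset (independent filesets only)
/usr/lpp/mmfs/bin/mmlsfileset "$FS" "$CACHE_FSET" -L

# Inode usage; this variant can be slow on large filesets
/usr/lpp/mmfs/bin/mmlsfileset "$FS" "$CACHE_FSET" -i

# Raise the limit toward the expected total file count at home,
# e.g. 50M maximum with 10M preallocated - placeholder numbers
/usr/lpp/mmfs/bin/mmchfileset "$FS" "$CACHE_FSET" --inode-limit 50M:10M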
URL: From sakkuma4 at in.ibm.com Thu Sep 5 20:06:17 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 19:06:17 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Fri Sep 6 10:48:56 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 09:48:56 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Message-ID: Hello, Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? I am failing with these errors: [root at host ~]# uname -a Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux [root at host ~]# rpm -qa | grep gpfs gpfs.base-4.2.3-7.x86_64 gpfs.gskit-8.0.50-75.x86_64 gpfs.ext-4.2.3-7.x86_64 gpfs.msg.en_US-4.2.3-7.noarch gpfs.docs-4.2.3-7.noarch gpfs.gpl-4.2.3-7.noarch [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl -------------------------------------------------------- mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. -------------------------------------------------------- Verifying Kernel Header... kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include Verifying Compiler... make is present at /bin/make cpp is present at /bin/cpp gcc is present at /bin/gcc g++ is present at /bin/g++ ld is present at /bin/ld Verifying Additional System Headers... Verifying kernel-headers is installed ... Command: /bin/rpm -q kernel-headers The required package kernel-headers is installed make World ... Verifying that tools to build the portability layer exist.... cpp present gcc present g++ present ld present cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver cleaning (/usr/lpp/mmfs/src/ibm-kxi) make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' rm -f trcid.h ibm_kxi.trclst [cut] Invoking Kbuild... /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ exit 1;\ fi make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' LD /usr/lpp/mmfs/src/gpl-linux/built-in.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'printInode': /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: 'struct inode' has no member named 'i_wb_list' _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); ^ /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro '_TRACE_MACRO' { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP [ cut ] ^ /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro 'TRACE6' TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'cxiInitInodeSecurity': /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of 'security_old_inode_init_security' from incompatible pointer type [enabled by default] rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, ^ In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: include/linux/security.h:1896:5: note: expected 'const char **' but argument is of type 'char **' int security_old_inode_init_security(struct inode *inode, struct inode *dir, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'cache_get_name': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function 'vfs_readdir' [-Werror=implicit-function-declaration] error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); ^ cc1: some warnings being treated as errors make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. Any help appreciated... Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... 
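A general point on building against a freshly released RHEL minor update: a given PTF's portability layer is only tested against the kernel levels listed in the Spectrum Scale FAQ, so a newer minor-release kernel can fail to build exactly as shown above. Until the FAQ table catches up, one option is to hold the kernel back. The sketch below uses plain RHEL/yum mechanisms, nothing Scale-specific, and the kernel version shown is only an example.

# What is installed and running right now
uname -r
rpm -q gpfs.base kernel

# Option 1: leave kernel packages out of routine updates until support is announced
yum update --exclude='kernel*'

# Option 2: pin the last known-good kernel with the versionlock plugin
yum install -y yum-plugin-versionlock
yum versionlock add 'kernel-3.10.0-957*'    # example level - use the one the FAQ lists for your PTF

# Once the FAQ/PTF announces RHEL 7.7 support:
#   yum versionlock delete 'kernel*'
#   yum update kernel && reboot
#   /usr/lpp/mmfs/bin/mmbuildgpl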
URL: From david_johnson at brown.edu Fri Sep 6 11:24:51 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 6 Sep 2019 06:24:51 -0400 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: Message-ID: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non-fatal warnings at that version. It seems to run fine. The rest of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a reason to not go for the latest release on either the 4- or 5- line? [root at xxx ~]# ssh node1301 rpm -q gpfs.base gpfs.base-4.2.3-10.x86_64 -- ddj Dave Johnson > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > Hello, > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? > > I am failing with these errors: > > [root at host ~]# uname -a > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux > > [root at host ~]# rpm -qa | grep gpfs > gpfs.base-4.2.3-7.x86_64 > gpfs.gskit-8.0.50-75.x86_64 > gpfs.ext-4.2.3-7.x86_64 > gpfs.msg.en_US-4.2.3-7.noarch > gpfs.docs-4.2.3-7.noarch > gpfs.gpl-4.2.3-7.noarch > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > -------------------------------------------------------- > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > -------------------------------------------------------- > Verifying Kernel Header... > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include > Verifying Compiler... > make is present at /bin/make > cpp is present at /bin/cpp > gcc is present at /bin/gcc > g++ is present at /bin/g++ > ld is present at /bin/ld > Verifying Additional System Headers... > Verifying kernel-headers is installed ... > Command: /bin/rpm -q kernel-headers > The required package kernel-headers is installed > make World ... > Verifying that tools to build the portability layer exist.... > cpp present > gcc present > g++ present > ld present > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > rm -f trcid.h ibm_kxi.trclst > > [cut] > > Invoking Kbuild... > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > if [ $? 
-ne 0 ]; then \ > exit 1;\ > fi > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no member named ?i_wb_list? > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > ^ > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro ?_TRACE_MACRO? > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > [ cut ] > > ^ > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro ?TRACE6? > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of ?security_old_inode_init_security? from incompatible pointer type [enabled by default] > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > ^ > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > include/linux/security.h:1896:5: note: expected ?const char **? but argument is of type ?char **? > int security_old_inode_init_security(struct inode *inode, struct inode *dir, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > ^ > cc1: some warnings being treated as errors > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > > Any help appreciated? 
> Son > > Son V Truong - Senior Storage Administrator > Advanced Computing Research Centre > IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Fri Sep 6 12:41:32 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Fri, 6 Sep 2019 11:41:32 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609150.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609151.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609152.png Type: image/png Size: 1134 bytes Desc: not available URL: From Dugan.Witherick at warwick.ac.uk Fri Sep 6 13:25:22 2019 From: Dugan.Witherick at warwick.ac.uk (Witherick, Dugan) Date: Fri, 6 Sep 2019 12:25:22 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , Message-ID: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next PTFs on > both 4.2.3 and 5.0.3 streams. However we have seen issues in test that will > probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might work many > times but is quite a risky business. I highly recommend waiting for our > support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non- > > fatal warnings at that version. It seems to run fine. The rest of the > > cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a > > reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on > > > RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include > > > Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed > > > make World ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit > > > $? || exit 1 > > > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 > > > M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > > > if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no > > > member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), > > > (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro > > > _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro > > > ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of > > > ?security_old_inode_init_security? from incompatible pointer type [enabled > > > by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? but > > > argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct inode > > > *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration > > > of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. 
Examine previous error messages to determine > > > cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator > > > Advanced Computing Research Centre > > > IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From son.truong at bristol.ac.uk Fri Sep 6 15:15:04 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 14:15:04 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Message-ID: Thank you. Table 39 is most helpful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Witherick, Dugan Sent: 06 September 2019 13:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next > PTFs on both 4.2.3 and 5.0.3 streams. However we have seen issues in > test that will probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might > work many times but is quite a risky business. I highly recommend > waiting for our support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp Sitz der > Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL > > 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with > > non- fatal warnings at that version. It seems to run fine. The rest > > of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do > > you have a reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel > > > modules on RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC > > > 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, > > > 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed make World > > > ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > > > > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include > > > /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir > > > /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib rm -f > > > //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 > > > ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux > > > CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? > > > has no member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), > > > (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), > > > (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition > > > of macro _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of > > > macro ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing > > > argument 4 of ?security_old_inode_init_security? from incompatible > > > pointer type [enabled by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? > > > but argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct > > > inode *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit > > > declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. 
> > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. Examine previous error messages to > > > determine cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator Advanced Computing > > > Research Centre IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Sep 6 16:42:39 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 6 Sep 2019 15:42:39 +0000 Subject: [gpfsug-discuss] SSUG Meeting at SC19: Save the date and call for user talks! Message-ID: The Spectrum Scale User group will hold its annual meeting at SC19 on Sunday November 17th from 12:30PM -6PM In Denver, Co. We will be posting exact meeting location soon, but reserve this time. IBM will host a reception following the user group meeting. We?re also looking for user talks - these are short update (20 mins or so) on your use of Spectrum Scale - any topics are welcome. If you are interested, please contact myself or Kristy Kallback-Rose. Looking forward to seeing everyone in Denver! Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Mon Sep 9 21:29:28 2019 From: bipcuds at gmail.com (Keith Ball) Date: Mon, 9 Sep 2019 16:29:28 -0400 Subject: [gpfsug-discuss] Anyone have experience with changing NSD server node name in an ESS/DSS cluster? Message-ID: Hi All, We are thinking of attempting a non-destructive change of NSD server node names in a Lenovo DSS cluster (DSS level 1.2a, which has Scale 4.2.3.5). For a non-GNR cluster, changing a node name for an NSD server isn't a huge deal if you can have a backup server serve up disks; one can mmdelnode then mmaddnode, for instance. Has anyone tried to rename the NSD servers in a GNR cluster, however? I am not sure if it's as easy as failing over the recovery group, and deleting/adding the NSD server. It's easy enough to modify xcat. Perhaps mmchrecoverygroup can be used to change the RG names (since they are named after the NSD servers), but that might not be necessary. Or, it might not work - does anyone know if there is a special process to change NSD server names in an E( or D or G)SS cluster that does not run afoul of GNR or upgrade scripts? Best regards, Keith -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From TROPPENS at de.ibm.com Wed Sep 11 13:20:22 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 11 Sep 2019 14:20:22 +0200 Subject: [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjvilla at nccs.nasa.gov Wed Sep 11 20:14:12 2019 From: jjvilla at nccs.nasa.gov (John J. Villa) Date: Wed, 11 Sep 2019 15:14:12 -0400 (EDT) Subject: [gpfsug-discuss] Introduction - New Subscriber Message-ID: Hello, My name is John Villa. I work for NASA at the Nasa Center for Climate Simulation. We currently utilize GPFS as the primary filesystem on the discover cluster: https://www.nccs.nasa.gov/systems/discover I look forward to seeing everyone at SC19. Thank You, -- John J. Villa NASA Center for Climate Simulation Discover Systems Administrator From damir.krstic at gmail.com Thu Sep 12 15:16:03 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 12 Sep 2019 09:16:03 -0500 Subject: [gpfsug-discuss] VerbsReconnectThread waiters Message-ID: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Thu Sep 12 16:10:58 2019 From: george at markomanolis.com (George Markomanolis) Date: Thu, 12 Sep 2019 11:10:58 -0400 Subject: [gpfsug-discuss] Call for Submission for the IO500 List Message-ID: Call for Submission *Deadline*: 10 November 2019 AoE The IO500 is now accepting and encouraging submissions for the upcoming 5th IO500 list revealed at SC19 in Denver, Colorado. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our SC19 BoF [2]. We hope to see you, and your results, there. We have updated our submission rules [3]. This year, we will have a new list for the Student Cluster Competition as IO500 is used for extra points during this competition The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC19! Please note that submissions of all sizes are welcome; the site has customizable sorting so it is possible to submit on a small system and still get a very good per-client score for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. 
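For anyone who has not run the underlying benchmarks before: the suite is essentially a wrapper around IOR, mdtest and a parallel find. Purely as a hand-written sketch (the rank count, transfer sizes and the /gpfs/io500test path below are made-up placeholders, not part of the official harness, so always use the io500 scripts from io500.org for a real submission), the hero-run phases boil down to invocations of this shape:

  # IOR "easy" phase: one file per process, large sequential transfers
  mpirun -np 80 ior -w -r -F -t 1m -b 16g -o /gpfs/io500test/ior_easy/ior_file

  # mdtest "easy" phase: per-rank directories full of small files
  mpirun -np 80 mdtest -n 10000 -d /gpfs/io500test/mdt_easy

The prescribed "hard" phases then rerun both tools with fixed, deliberately awkward parameters chosen by the harness itself, which is what keeps the lower-bound numbers honest.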
Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017, published its first list at SC17, and has grown exponentially since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: 1. Maximizing simplicity in running the benchmark suite 2. Encouraging complexity in tuning for performance 3. Allowing submitters to highlight their ?hero run? performance numbers 4. Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound. Finally, it includes a namespace search as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: 1. Gather historical data for the sake of analysis and to aid predictions of storage futures 2. Collect tuning information to share valuable performance optimizations across the community 3. Encourage vendors and designers to optimize for workloads beyond ?hero runs? 4. Establish bounded expectations for users, procurers, and administrators 10 Node I/O Challenge At SC, we will continue the 10 Node Challenge. This challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly *10 computes nodes* must be used to run the benchmark (one exception is the find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at io500.org. Birds-of-a-feather Once again, we encourage you to submit [1], to join our community, and to attend our BoF ?The IO500 and the Virtual Institute of I/O? at SC19, November 19th, 12:15-1:15pm, room 205-207, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. [1] http://io500.org/submission [2] *https://www.vi4io.org/io500/bofs/sc19/start * [3] https://www.vi4io.org/io500/rules/submission The IO500 committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 12 20:19:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 12 Sep 2019 12:19:20 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 - REGISTRATION CLOSING SOON In-Reply-To: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Message-ID: Reminder, registration closing on 9/16 EOB. That?s real soon now. Hope to see you there. Details below. 
> On Aug 29, 2019, at 7:30 PM, Kristy Kallback-Rose wrote: > > Hello, > > You will now find the nearly complete agenda here: > > https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > As noted before, the event is free, but please do register below to help with catering planning. > > You can find more information about the full HPCXXL event here: http://hpcxxl.org/ > > Any questions let us know. Hope to see you there! > > -Kristy > >> On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. >> >> We?ll have more details soon, mark your calendars. >> >> Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ >> >> Best, >> Kristy > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Fri Sep 13 09:48:58 2019 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Fri, 13 Sep 2019 08:48:58 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects Message-ID: Hi All, I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? Cheers, Greg Lehmann Senior High Performance Data Specialist Data Services | Scientific Computing Platforms Information Management and Technology | CSIRO Greg.Lehmann at csiro.au | +61 7 3327 4137 | 1 Technology Court, Pullenvale, QLD 4069 CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email. CSIRO Australia's National Science Agency | csiro.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Sep 13 10:14:06 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 13 Sep 2019 05:14:06 -0400 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: References: Message-ID: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Restarting subnet manager in general is fairly harmless. It will cause a heavy sweep of the fabric when it comes back up, but there should be no LID renumbering. Traffic may be held up during the scanning and rebuild of the routing tables. Losing a subnet manager for a period of time would prevent newly booted nodes from receiving a LID but existing nodes will continue to function. 
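If it helps, a quick sanity check before and after bouncing the SM (assuming opensm plus the standard infiniband-diags tools are in play here; adjust for UFM or switch-embedded managers) would be something like:

  # which LID currently holds the master SM, and its priority/state
  sminfo

  # state of a node-hosted opensm instance
  systemctl status opensm

  # after the restart, confirm ports still see an SM and LIDs are unchanged
  ibstat | grep -E 'State|SM lid|Base lid'

If the master SM LID and the port LIDs are the same before and after, GPFS/RDMA traffic should not even notice the restart.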
Adding or deleting inter-switch links should probably be avoided if the subnet manager is down. I would also avoid changing the routing algorithm while in production. Moving a non ha subnet manager from primary to backup and back again has worked for us without disruption, but I would try to do this in a maintenance window. -- ddj Dave Johnson > On Sep 13, 2019, at 4:48 AM, Lehmann, Greg (IM&T, Pullenvale) wrote: > > Hi All, > I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? > > Cheers, > > Greg Lehmann > Senior High Performance Data Specialist > Data Services | Scientific Computing Platforms > Information Management and Technology | CSIRO > Greg.Lehmann at csiro.au | +61 7 3327 4137 | > 1 Technology Court, Pullenvale, QLD 4069 > > CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. > > The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. > > Please consider the environment before printing this email. > > CSIRO Australia?s National Science Agency | csiro.au > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 13 10:48:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 13 Sep 2019 09:48:52 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> References: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Message-ID: On Fri, 2019-09-13 at 05:14 -0400, david_johnson at brown.edu wrote: [SNIP] > Moving a non ha subnet manager from primary to backup and back again > has worked for us without disruption, but I would try to do this in a > maintenance window. > Not on GPFS but in the past I have moved from one subnet manager to another with dozens of running MPI jobs, and Lustre running over the fabric and not missed a beat. My current cluster used 10 and 40Gbps ethernet for GPFS with Omnipath exclusively for MPI traffic. To be honest I just cannot wrap my head around the idea that you would not be running two subnet managers in the first place. Just fire up two subnet managers (whether on a switch or a node) and forget about it. They will automatically work together to give you a HA solution. It is the same with Omnipath too. I would also note that you can fire up more than two fabric managers and it all "just works". 
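For what it is worth, a minimal sketch of that kind of redundant opensm setup (illustrative only; the priority values just follow the usual convention that the higher number wins, and the config lives in /etc/rdma/opensm.conf on RHEL or /etc/opensm/opensm.conf elsewhere):

  # on the node you want as preferred master
  grep sm_priority /etc/rdma/opensm.conf
      sm_priority 15
  systemctl enable --now opensm

  # on the standby node, a lower priority
  grep sm_priority /etc/rdma/opensm.conf
      sm_priority 10
  systemctl enable --now opensm

The instances negotiate master/standby among themselves, which is the "just works" behaviour described above.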
If it where me and I didn't have fabric managers running on at least two of my switches and I was doing GPFS over Infiniband, I would fire up fabric managers on all of my NSD servers. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From heinrich.billich at id.ethz.ch Fri Sep 13 15:56:07 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Fri, 13 Sep 2019 14:56:07 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Message-ID: Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* From ewahl at osc.edu Fri Sep 13 16:42:30 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 13 Sep 2019 15:42:30 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: I recall looking at this a year or two back. Ganesha is either v4 and v6 both (ie: the encapsulation you see), OR ipv4 ONLY. (ie: /etc/modprobe.d/ipv6.conf disable=1) Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Billich Heinrich Rainer (ID SD) Sent: Friday, September 13, 2019 10:56 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Fri Sep 13 17:07:01 2019 From: jam at ucar.edu (Joseph Mendoza) Date: Fri, 13 Sep 2019 10:07:01 -0600 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: References: Message-ID: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: > On my cluster I have seen couple of long waiters such as this: > > gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more > seconds, reason: delaying for next reconnect attempt > > I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. > > Is this something to pay attention to, and what does this waiter mean? > > Thank you. > Damir > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 16 08:12:09 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 16 Sep 2019 09:12:09 +0200 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Sep 16 10:33:58 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 16 Sep 2019 17:33:58 +0800 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> References: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> Message-ID: Damir, Joseph, > Is this something to pay attention to, and what does this waiter mean? This waiter means GPFS fails to reconnect broken verbs connection, which can cause performance degradation. 
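To gauge how widespread this is on a given node, a rough check using only commands already mentioned in this thread plus mmdiag (an informal sketch, not an official procedure) is:

  # are the reconnect waiters still present?
  mmdiag --waiters | grep -i VerbsReconnect

  # rough count of RDMA connections the daemon still holds
  mmfsadm test verbs conn | wc -l

  # has RDMA been disabled on this node due to errors?
  grep -i "VERBS RDMA" /var/adm/ras/mmfs.log.latest | tail -5

mmdiag is the supported interface; mmfsadm is a service tool, so keep its use read-only unless directed otherwise.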
> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again. > Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. This is a code bug which is fixed through internal defect 1090669. It will be backport to service releases after verification. There is a work-around which can fix this problem without a restart. - On nodes which have this waiter list, run command 'mmfsadm test breakconn all 744' 744 is E_RECONNECT, which triggers tcp reconnect and will not cause node leave/rejoin. Its side effect clears RDMA connections and their incorrect status. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Joseph Mendoza To: gpfsug-discuss at spectrumscale.org Date: 2019/09/14 12:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] VerbsReconnectThread waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WoT3TYlCvAM8RQxUISD9L6UzqY0I_ffCJTS-UHhw8z4&s=18A0j0Zmp8OwZ6Y6cc3HFe3OgFZRHIv8OeJcBpkaPwQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Mon Sep 16 13:58:03 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 16 Sep 2019 12:58:03 +0000 Subject: [gpfsug-discuss] Can 5-minutes frequent lsscsi command disrupt GPFS I/O on a Lenovo system ? Message-ID: <83A6EEB0EC738F459A39439733AE80452BEA85FE@MBX214.d.ethz.ch> Hello folks, recently I observed that calling every 5 minutes the command "lsscsi -g" on a Lenovo I/O node (a X3650 M5 connected to D3284 enclosures, part of a DSS-G220 system) can seriously compromise the GPFS I/O performance. (The motivation of running lsscsi every 5 minutes is a bit out of topic, but I can explain on request). What we observed is that there were several GPFS waiters telling that flushing caches to physical disk was impossible and they had to wait (possibly going in timeout). Is this something expected and/or observed by someone else in this community ? Thanks Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Sep 16 15:50:24 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 16 Sep 2019 14:50:24 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , Message-ID: What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Mon Sep 16 15:55:34 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 16 Sep 2019 14:55:34 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: On our recent ESS systems we do not see /etc/tuned/scale/tuned.conf (or script.sh) owned by any package (rpm -qif ?). I?ve attached what we have on our ESS 5.3.3 systems. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Monday, September 16, 2019 at 10:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tuned.conf Type: application/octet-stream Size: 2859 bytes Desc: tuned.conf URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: script.sh Type: application/octet-stream Size: 270 bytes Desc: script.sh URL: From heinrich.billich at id.ethz.ch Mon Sep 16 16:49:57 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 16 Sep 2019 15:49:57 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Hello Olaf, Thank you, so we?ll try to get rid of IPv6. Actually we do have this settings active but I may have to add them to the initrd file, too. (See https://access.redhat.com/solutions/8709#?rhel7disable) to prevent ganesha from opening an IPv6 socket. It?s probably no big issue if ganesha uses IPv4overIPv6 for all connections, but to keep things simple I would like to avoid it. @Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... 
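(As a side note, and strictly a sketch of the approach in the Red Hat solution referenced above rather than an official Scale procedure: the sysctl file name below is arbitrary, and either method should leave ganesha with IPv4 listeners only.)

  # persist the sysctls and rebuild the initramfs so they apply early in boot
  cat /etc/sysctl.d/90-noipv6.conf
      net.ipv6.conf.all.disable_ipv6 = 1
      net.ipv6.conf.default.disable_ipv6 = 1
  dracut -f

  # alternative: disable the IPv6 module entirely on the kernel command line
  grubby --update-kernel=ALL --args="ipv6.disable=1"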
From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 16 18:34:07 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 16 Sep 2019 17:34:07 +0000 Subject: [gpfsug-discuss] SSUG @ SC19 Update: Scheduling and Sponsorship Opportunities Message-ID: Two months until SC19 and the schedule is starting to come together, with a great mix of technical updates and user talks. I would like highlight a few items for you to be aware of: - Morning session: We?re currently trying to put together a morning ?new users? session for those new to Spectrum Scale. These talks would be focused on fundamentals and give an opportunity to ask questions. We?re tentatively thinking about starting around 9:30-10 AM on Sunday November 17th. Watch the mailing list for updates and on the http://spectrumscale.org site. - Sponsorships: We?re looking for sponsors. If your company is an IBM partner, uses/incorporates Spectrum Scale - please contact myself or Kristy Kallback-Rose. We are looking for sponsors to help with lunch (YES - we?d like to serve lunch this year!) and WiFi access during the user group meeting. Looking forward to seeing you all at SC19. Registration link coming soon, watch here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Sep 18 18:56:29 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 18 Sep 2019 17:56:29 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 Message-ID: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. (sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Sep 19 11:44:46 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 10:44:46 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From heinrich.billich at id.ethz.ch Thu Sep 19 15:20:53 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 19 Sep 2019 14:20:53 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Message-ID: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Hello, Is it usual to see 200?000-400?000 open files for a single ganesha process? Or does this indicate that something ist wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200?000-400?000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1?000 ? 10?000 open files by ganesha only and don?t show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn?t open/close a file, it?s the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open file? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc//fd/ . With several 100k entries I failed to do a ?ls -ls? to list all the symbolic links, hence I can?t relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Sep 19 15:30:45 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 19 Sep 2019 15:30:45 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > ?reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? 
We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files ?by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From S.J.Thompson at bham.ac.uk Thu Sep 19 16:18:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Sep 2019 15:18:47 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: on behalf of "abeattie at au1.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Sep 19 19:38:53 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 19 Sep 2019 18:38:53 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is_this=09unusual=3F?= In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Sep 19 22:34:33 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 21:34:33 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Sep 19 23:41:08 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 19 Sep 2019 22:41:08 +0000 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade Message-ID: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Sep 20 09:08:01 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 20 Sep 2019 10:08:01 +0200 Subject: [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=I3TzCv5SKxKb51eAL_blo-XwctX64z70ayrZKERanWA&s=OSKGngwXAoOemFy3HkctexuIpBJQu8NPeTkC_MMQBks&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Fri Sep 20 10:14:58 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Fri, 20 Sep 2019 11:14:58 +0200 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade In-Reply-To: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> References: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> Message-ID: Hello Bob, this event is a "Notice": You can use the action "Mark Selected Notices as Read" or "Mark All Notices as Read"in the GUI Event Groups or Individual Events grid. Notice events are transient by nature and don't imply a permanent state change of an entity. It seems that during the upgrade, mmhealth had probed the pdisk and the disk hospital was diagnosing the pdisk at this time, but eventually disk hospital placed the pdisk back to normal state, Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 162 4159920 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 20.09.2019 00:53 Subject: [EXTERNAL] [gpfsug-discuss] Leftover GUI events after ESS upgrade Sent by: gpfsug-discuss-bounces at spectrumscale.org I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hLyf83U0otjISdpV5zl1cSCPVFFUF61ny3jWvv-5kNQ&s=ptMGcpNhnRTogPO2CN_l6jhC-vCN-VQAf53HmRLQDq8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 14525383.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heinrich.billich at id.ethz.ch Mon Sep 23 10:33:02 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 09:33:02 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Hello Frederik, Thank you. I now see a similar behavior: Ganesha has 500k open files while the node is suspended since 2+hours. I would expect that some cleanup job does remove most of the open FD after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: ! maxFilesToCache 1048576 ! maxStatCache 2097152 Our ganesha version is 2.5.3. (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern. 
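For anyone who wants to check their own protocol nodes: a quick way to get the daemon's open-descriptor count is to count the entries in /proc directly instead of listing several hundred thousand symlinks. This is only a rough sketch -- the process name and the use of pgrep are assumptions, adjust them to match your installation:

  pid=$(pgrep -of ganesha.nfsd)                     # oldest matching process, matched on the full command line
  echo "open fds: $(ls /proc/${pid}/fd | wc -l)"    # count the fd entries rather than 'ls -ls' each symlink
  grep 'Max open files' /proc/${pid}/limits         # the limit the running daemon actually has
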
I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. Cheers, Heiner Daniel Gryniewicz So, it's not impossible, based on the workload, but it may also be a bug. For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know when the client closes the FD, and opening/closing all the time causes a large performance hit. So, we cache open FDs. All handles in MDCACHE live on the LRU. This LRU is divided into 2 levels. Level 1 is more active handles, and they can have open FDs. Various operation can demote a handle to level 2 of the LRU. As part of this transition, the global FD on that handle is closed. Handles that are actively in use (have a refcount taken on them) are not eligible for this transition, as the FD may be being used. We have a background thread that runs, and periodically does this demotion, closing the FDs. This thread runs more often when the number of open FDs is above FD_HwMark_Percent of the available number of FDs, and runs constantly when the open FD count is above FD_Limit_Percent of the available number of FDs. So, a heavily used server could definitely have large numbers of FDs open. However, there have also, in the past, been bugs that would either keep the FDs from being closed, or would break the accounting (so they were closed, but Ganesha still thought they were open). You didn't say what version of Ganesha you're using, so I can't tell if one of those bugs apply. Daniel ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. 
> > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Mon Sep 23 11:43:06 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 10:43:06 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <72079C31-1E3E-4F69-B428-480620466353@id.ethz.ch> Hello Malhal, Thank you. Actually I don?t see the parameter Cache_FDs in our ganesha config. But when I trace LRU processing I see that almost no FDs get released. And the number of FDs given in the log messages doesn?t match what I see in /proc//fd/. I see 512k open files while the logfile give 600k. Even 4hours since the I suspended the node and all i/o activity stopped I see 500k open files and LRU processing doesn?t close any of them. This looks like a bug in gpfs.nfs-ganesha-2.5.3-ibm036.10.el7. I?ll open a case with IBM. We did see gansha to fail to open new files and hence client requests to fail. I assume that 500K FDs compared to 10K FDs as before create some notable overhead for ganesha, spectrum scale and kernel and withdraw resources from samba. I?ll post to the list once we got some results. 
Cheers, Heiner Start of LRU processing 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51350 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1027 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51400 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1028 closing 0 descriptors End of log 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1029 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1029 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51500 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1030 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :After work, open_fd_count:607024 count:29503718 fdrate:1908874353 threadwait=9 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :currentopen=607024 futility=0 totalwork=51550 biggest_window=335544 extremis=0 lanes=1031 fds_lowat=167772 From: on behalf of Malahal R Naineni Reply to: gpfsug main discussion list Date: Thursday, 19 September 2019 at 20:39 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? NFSv3 doesn't have open/close requests, so nfs-ganesha opens a file for read/write when there is an NFSv3 read/write request. It does cache file descriptors, so its open count can be very large. If you have 'Cache_FDs = true" in your config, ganesha aggressively caches file descriptors. 
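To see whether that is the case you can look for the FD-related parameters in the active Ganesha configuration. A rough sketch only -- the config file location below is an assumption and will differ on CES nodes, where the configuration files are generated for you:

  grep -riE 'Cache_FDs|FD_HwMark_Percent|FD_Limit_Percent' /etc/ganesha/   # raw nfs-ganesha config files (path is an assumption)
  mmnfs config list                                                        # the CES view of the active NFS configuration
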
Taking traces with COMPONENT_CACHE_INODE_LRU level set to full debug should give us better insight on what is happening when the the open file descriptors count is very high. When the I/O failure happens or when the open fd count is high, you could do the following: 1. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU FULL_DEBUG 2. wait for 90 seconds, then run 3. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU EVENT Regards, Malahal. ----- Original message ----- From: "Billich Heinrich Rainer (ID SD)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Date: Thu, Sep 19, 2019 7:51 PM Hello, Is it usual to see 200?000-400?000 open files for a single ganesha process? Or does this indicate that something ist wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200?000-400?000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1?000 ? 10?000 open files by ganesha only and don?t show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn?t open/close a file, it?s the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open file? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc//fd/ . With several 100k entries I failed to do a ?ls -ls? to list all the symbolic links, hence I can?t relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Sep 24 09:52:34 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 24 Sep 2019 08:52:34 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Hello Frederik, Just some addition, maybe its of interest to someone: The number of max open files for Ganesha is based on maxFilesToCache. Its. 80%of maxFilesToCache up to an upper and lower limits of 2000/1M. The active setting is visible in /etc/sysconfig/ganesha. Cheers, Heiner ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. 
I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From valdis.kletnieks at vt.edu Tue Sep 24 21:41:07 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 24 Sep 2019 16:41:07 -0400 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? 
In-Reply-To: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: <269692.1569357667@turing-police> On Tue, 24 Sep 2019 08:52:34 -0000, "Billich Heinrich Rainer (ID SD)" said: > Just some addition, maybe its of interest to someone: The number of max open > files for Ganesha is based on maxFilesToCache. Its. 80%of maxFilesToCache up to > an upper and lower limits of 2000/1M. The active setting is visible in > /etc/sysconfig/ganesha. Note that strictly speaking, the values in /etc/sysconfig are in general the values that will be used at next restart - it's totally possible for the system to boot, the then-current values be picked up from /etc/sysconfig, and then any number of things, from configuration automation tools like Ansible, to a cow-orker sysadmin armed with nothing but /usr/bin/vi, to have changed the values without you knowing about it and the daemons not be restarted yet... (Let's just say that in 4 decades of doing this stuff, I've been surprised by that sort of thing a few times. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Sep 25 18:06:18 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 25 Sep 2019 17:06:18 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is=09this_unusual=3F?= In-Reply-To: <269692.1569357667@turing-police> References: <269692.1569357667@turing-police>, <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch><280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: att6j9ca.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From L.R.Sudbery at bham.ac.uk Thu Sep 26 10:38:09 2019 From: L.R.Sudbery at bham.ac.uk (Luke Sudbery) Date: Thu, 26 Sep 2019 09:38:09 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <3b15db460ac1459e9ca53bec00f30833@bham.ac.uk> We think our issue was down to numa settings actually - making mmfsd allocate GPU memory. Makes sense given the type of error. Tomer suggested to Simon we set numactlOptioni to "0 8", as per: https://www-01.ibm.com/support/docview.wss?uid=isg1IJ02794 Our tests are not crashing since setting then ? we need to roll it out on all nodes to confirm its fixed all our hangs/reboots. Cheers, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don?t work on Monday and work from home on Friday. 
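For anyone hitting the same thing, the change itself is just a cluster configuration option followed by a restart of mmfsd on the affected nodes. Treat the lines below as a sketch only -- the parameter name and value are taken from this thread and from the IJ02794 write-up, so verify them against that document for your code level before applying anything:

  mmchconfig numactlOption="0 8"                            # value as given in this thread; semantics per IBM APAR IJ02794
  mmshutdown -N power9nodes && mmstartup -N power9nodes     # "power9nodes" is a placeholder node class; mmfsd must restart to pick this up
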
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of abeattie at au1.ibm.com Sent: 19 September 2019 22:35 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, I have an open support call that required Redhat to create a kernel patch for RH 7.6 because of issues with the Intel x710 network adapter - I can't tell you if its related to your issue or not but it would cause the GPFS cluster to reboot and the affected node to reboot if we tried to do almost anything with that intel adapter regards, Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS and POWER9 Date: Fri, Sep 20, 2019 1:18 AM Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: > on behalf of "abeattie at au1.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Sep 26 10:55:45 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 26 Sep 2019 09:55:45 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions Message-ID: Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. 
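For reference, this is roughly how the option gets enabled and checked; the file system and fileset names below are placeholders and the exact syntax may differ between releases, so treat it as a sketch rather than the definitive procedure:

  mmlsconfig afmRefreshAsync                            # is it set cluster-wide?
  mmlsfileset fs0 cachefileset --afm -L                 # per-fileset AFM attributes, including the refresh intervals
  mmchfileset fs0 cachefileset -p afmRefreshAsync=yes   # fileset-level enablement (needs 5.0.3 or later)
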
The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 27 09:23:13 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Sep 2019 13:53:13 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=tjCOcTjZ_AjP3N1mpspwuLu5u2XOFb5LkZqVAwX3wk8&s=tD6X2XM1HPMqWxSg-IelnstWbneQ7On4xfEVkCajtPE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Fri Sep 27 11:31:42 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Fri, 27 Sep 2019 10:31:42 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Sun Sep 1 14:17:01 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sun, 1 Sep 2019 13:17:01 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Sep 3 14:07:44 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 3 Sep 2019 15:07:44 +0200 Subject: [gpfsug-discuss] Fileheat - does work! Complete test/example provided here. In-Reply-To: References: Message-ID: Thanks for this example, very userful, but I'm still struggeling a bit at a customer.. We're doing heat daily based rebalancing, with fileheatlosspercent=20 and fileheatperiodminutes=720: RULE "defineTiers" GROUP POOL 'Tiers' IS 'ssdpool' LIMIT(70) then 'saspool' RULE 'Rebalance' MIGRATE FROM POOL 'Tiers' TO POOL 'Tiers' WEIGHT(FILE_HEAT) WHERE FILE_SIZE<10000000000 but are seeing too many files moved down to the saspool and too few are staying in the ssdpool. Right now we ran a test of this policy, and saw that it wanted to move 130k files / 300 GB down to the saspool, and a single small file up to the ssdpool -- even though the ssdpool is only 50% utilized. Running your listing policy reveals lots of files with zero heat: <7> /gpfs/gpfs0/file1 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file2 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file3/HM_WVS_8P41017_1/HM_WVS_8P41017_1.S2206 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) and others with heat: <5> /gpfs/gpfs0/file4 RULE 'fh2' LIST 'fh' WEIGHT(0.004246) SHOW( 300401047 0 0 +4.24600492924153E-003 11E7C19700000000 720 25 server.locale) <5> /gpfs/gpfs0/file5 RULE 'fh2' LIST 'fh' WEIGHT(0.001717) SHOW( 120971793 1 0 +1.71725239616613E-003 0735E21100010000 720 25 server.locale) These are not new files -- so we're wondering if maybe the fileheat is reduced to zero/NULL after a while (how many times can it shrink by 25% before it's zero??). Would it make sense to increase fileheatperiodeminutes and/or decrease fileheatlosspercentage? What would be good values? (BTW: we have relatime enabled) Any other ideas for why it won't fill up our ssdpool to close to LIMIT(70) ? -jf On Tue, Aug 13, 2019 at 3:33 PM Marc A Kaplan wrote: > Yes, you are correct. It should only be necessary to set > fileHeatPeriodMinutes, since the loss percent does have a default value. > But IIRC (I implemented part of this!) you must restart the daemon to get > those fileheat parameter(s) "loaded"and initialized into the daemon > processes. > > Not fully trusting my memory... I will now "prove" this works today as > follows: > > To test, create and re-read a large file with dd... > > [root@/main/gpfs-git]$mmchconfig fileHeatPeriodMinutes=60 > mmchconfig: Command successfully completed > ... > [root@/main/gpfs-git]$mmlsconfig | grep -i heat > fileHeatPeriodMinutes 60 > > [root@/main/gpfs-git]$mmshutdown > ... > [root@/main/gpfs-git]$mmstartup > ... > [root@/main/gpfs-git]$mmmount c23 > ... > [root@/main/gpfs-git]$ls -l /c23/10g > -rw-r--r--. 
1 root root 10737418240 May 16 15:09 /c23/10g > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > > (NO fileheat attribute yet...) > > [root@/main/gpfs-git]$dd if=/c23/10g bs=1M of=/dev/null > ... > After the command finishes, you may need to wait a while for the metadata > to flush to the inode on disk ... or you can force that with an unmount or > a mmfsctl... > > Then the fileheat attribute will appear (I just waited by answering > another email... No need to do any explicit operations on the file system..) > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > gpfs.FileHeat > > To see its hex string value: > > [root@/main/gpfs-git]$mmlsattr -d -X -L /c23/10g > file name: /c23/10g > ... > security.selinux: > 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000 > gpfs.FileHeat: 0x000000EE42A40400 > > Which will be interpreted by mmapplypolicy... > > YES, the interpretation is relative to last access time and current time, > and done by a policy/sql function "computeFileHeat" > (You could find this using m4 directives in your policy file...) > > > define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)]) > > Well gone that far, might as well try mmapplypolicy too.... > > [root@/main/gpfs-git]$cat /gh/policies/fileheat.policy > define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) > END]) > > rule fh1 external list 'fh' exec '' > rule fh2 list 'fh' weight(FILE_HEAT) > show(DISPLAY_NULL(xattr_integer('gpfs.FileHeat',1,4,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' || > DISPLAY_NULL(FILE_HEAT) || ' ' || > DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' || > getmmconfig('fileHeatPeriodMinutes') || ' ' || > getmmconfig('fileHeatLossPercent') || ' ' || > getmmconfig('clusterName') ) > > > [root@/main/gpfs-git]$mmapplypolicy /c23 --maxdepth 1 -P > /gh/policies/fileheat.policy -I test -L 3 > ... > <1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > ... > WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > > > > > [image: Inactive hide details for Jan-Frode Myklebust ---08/13/2019 > 06:22:46 AM---What about filesystem atime updates. We recently chan]Jan-Frode > Myklebust ---08/13/2019 06:22:46 AM---What about filesystem atime updates. > We recently changed the default to ?relatime?. Could that maybe > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 08/13/2019 06:22 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > What about filesystem atime updates. We recently changed the default to > ?relatime?. Could that maybe influence heat tracking? > > > > -jf > > > tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < > *u.sibiller at science-computing.de* >: > > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By > default, the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must > first be enabled. 
To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature > dissipated over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. > The default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a > default. What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at > least on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar > over a directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Tue Sep 3 16:37:58 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 3 Sep 2019 15:37:58 +0000 Subject: [gpfsug-discuss] Easiest way to copy quota settings from one file system to another? Message-ID: <63C132C3-63AF-465B-8FD9-67AF9EA4887D@nuance.com> I?m migratinga file system from one cluster to another. I want to copy all user quotas from cluster1 filesystem ?A? to cluster2, filesystem ?fs1?, fileset ?A? What?s the easiest way to do that? I?m thinking mmsetquota with a stanza file, but is there a tool to generate the stanza file from the source? I could do a ?mmrepquota -u -Y? and process the output. Hoping for something easier :) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
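For the quota-migration question above: one way to avoid hand-editing is to let the machine-readable output drive plain mmsetquota commands. The sketch below is untested and makes two assumptions: that the HEADER record of "mmrepquota -u -Y" exposes fields named name, blockQuota, blockLimit, filesQuota and filesLimit (the script looks the columns up by name from the HEADER line rather than by position, so check those names on your release), and that the block values are reported in KiB, hence the trailing "K". Review the generated file before running anything against cluster2.

  #!/bin/bash
  # Sketch only: replay user quotas from cluster1 file system "A" onto
  # cluster2 file system "fs1", fileset "A".
  SRC_FS=A          # source file system on cluster1
  DST=fs1:A         # destination Device:Fileset on cluster2

  mmrepquota -u -Y "$SRC_FS" | awk -F: -v dst="$DST" '
    /:HEADER:/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
    !("name" in col) { next }                 # ignore anything before the header
    {
      name = $(col["name"])
      bq = $(col["blockQuota"]); bl = $(col["blockLimit"])   # soft/hard, assumed KiB
      fq = $(col["filesQuota"]); fl = $(col["filesLimit"])
      if (name == "root") next                               # usually left alone
      if (bq == 0 && bl == 0 && fq == 0 && fl == 0) next     # no explicit quota set
      printf "mmsetquota %s --user %s --block %sK:%sK --files %s:%s\n",
             dst, name, bq, bl, fq, fl
    }' > /tmp/replay_quotas.sh

  # eyeball /tmp/replay_quotas.sh first, then run it on a cluster2 node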
URL: From andreas.mattsson at maxiv.lu.se Thu Sep 5 10:54:04 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 5 Sep 2019 09:54:04 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction Message-ID: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Thu Sep 5 14:28:00 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 5 Sep 2019 18:58:00 +0530 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: <3ed969d0d778446982a419067320f927@maxiv.lu.se> References: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Message-ID: Hi, AFM does not support inode eviction, only data blocks are evicted and the file's metadata will remain in the fileset. ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/05/2019 03:39 PM Subject: [EXTERNAL] [gpfsug-discuss] Inode reuse on AFM cache eviction Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=5omqUvEiiIKUhShJOBEgb3WwLU5uy-8o_4--y0TOuw0&s=ZFAcjvG5LrsnsCJgIf9f1320V866HKG6iJGteRQ7oac&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Thu Sep 5 19:37:47 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 18:37:47 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 92, Issue 4 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From sakkuma4 at in.ibm.com Thu Sep 5 20:06:17 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 19:06:17 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Fri Sep 6 10:48:56 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 09:48:56 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Message-ID: Hello, Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? I am failing with these errors: [root at host ~]# uname -a Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux [root at host ~]# rpm -qa | grep gpfs gpfs.base-4.2.3-7.x86_64 gpfs.gskit-8.0.50-75.x86_64 gpfs.ext-4.2.3-7.x86_64 gpfs.msg.en_US-4.2.3-7.noarch gpfs.docs-4.2.3-7.noarch gpfs.gpl-4.2.3-7.noarch [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl -------------------------------------------------------- mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. -------------------------------------------------------- Verifying Kernel Header... kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include Verifying Compiler... make is present at /bin/make cpp is present at /bin/cpp gcc is present at /bin/gcc g++ is present at /bin/g++ ld is present at /bin/ld Verifying Additional System Headers... Verifying kernel-headers is installed ... Command: /bin/rpm -q kernel-headers The required package kernel-headers is installed make World ... Verifying that tools to build the portability layer exist.... cpp present gcc present g++ present ld present cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver cleaning (/usr/lpp/mmfs/src/ibm-kxi) make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' rm -f trcid.h ibm_kxi.trclst [cut] Invoking Kbuild... /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ exit 1;\ fi make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' LD /usr/lpp/mmfs/src/gpl-linux/built-in.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'printInode': /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: 'struct inode' has no member named 'i_wb_list' _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); ^ /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro '_TRACE_MACRO' { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP [ cut ] ^ /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro 'TRACE6' TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'cxiInitInodeSecurity': /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of 'security_old_inode_init_security' from incompatible pointer type [enabled by default] rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, ^ In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: include/linux/security.h:1896:5: note: expected 'const char **' but argument is of type 'char **' int security_old_inode_init_security(struct inode *inode, struct inode *dir, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'cache_get_name': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function 'vfs_readdir' [-Werror=implicit-function-declaration] error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); ^ cc1: some warnings being treated as errors make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. Any help appreciated... Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... 
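The build failure above is the classic symptom of a kernel newer than anything the installed Scale level ships a portability layer for (3.10.0-1062 is the RHEL 7.7 kernel; 4.2.3-7 predates it). Until the FAQ lists support, the usual workaround is to keep nodes on the last tested kernel rather than patching gpl-linux by hand. A rough sketch, assuming yum-managed nodes and that the versionlock plugin is acceptable in your environment:

  rpm -q gpfs.base      # installed Scale level, e.g. gpfs.base-4.2.3-7
  uname -r              # running kernel, e.g. 3.10.0-1062.el7.x86_64

  # option 1: on nodes still running the last supported kernel (3.10.0-957
  # for RHEL 7.6), lock the kernel packages at their installed versions
  yum install -y yum-plugin-versionlock
  yum versionlock add kernel kernel-devel kernel-headers

  # option 2: block kernel updates entirely until support is announced
  echo "exclude=kernel*" >> /etc/yum.conf

  # after any kernel change, rebuild and reload the portability layer
  /usr/lpp/mmfs/bin/mmbuildgpl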
URL: From david_johnson at brown.edu Fri Sep 6 11:24:51 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 6 Sep 2019 06:24:51 -0400 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: Message-ID: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non-fatal warnings at that version. It seems to run fine. The rest of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a reason to not go for the latest release on either the 4- or 5- line? [root at xxx ~]# ssh node1301 rpm -q gpfs.base gpfs.base-4.2.3-10.x86_64 -- ddj Dave Johnson > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > Hello, > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? > > I am failing with these errors: > > [root at host ~]# uname -a > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux > > [root at host ~]# rpm -qa | grep gpfs > gpfs.base-4.2.3-7.x86_64 > gpfs.gskit-8.0.50-75.x86_64 > gpfs.ext-4.2.3-7.x86_64 > gpfs.msg.en_US-4.2.3-7.noarch > gpfs.docs-4.2.3-7.noarch > gpfs.gpl-4.2.3-7.noarch > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > -------------------------------------------------------- > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > -------------------------------------------------------- > Verifying Kernel Header... > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include > Verifying Compiler... > make is present at /bin/make > cpp is present at /bin/cpp > gcc is present at /bin/gcc > g++ is present at /bin/g++ > ld is present at /bin/ld > Verifying Additional System Headers... > Verifying kernel-headers is installed ... > Command: /bin/rpm -q kernel-headers > The required package kernel-headers is installed > make World ... > Verifying that tools to build the portability layer exist.... > cpp present > gcc present > g++ present > ld present > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > rm -f trcid.h ibm_kxi.trclst > > [cut] > > Invoking Kbuild... > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > if [ $? 
-ne 0 ]; then \ > exit 1;\ > fi > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no member named ?i_wb_list? > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > ^ > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro ?_TRACE_MACRO? > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > [ cut ] > > ^ > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro ?TRACE6? > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of ?security_old_inode_init_security? from incompatible pointer type [enabled by default] > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > ^ > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > include/linux/security.h:1896:5: note: expected ?const char **? but argument is of type ?char **? > int security_old_inode_init_security(struct inode *inode, struct inode *dir, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > ^ > cc1: some warnings being treated as errors > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > > Any help appreciated? 
> Son > > Son V Truong - Senior Storage Administrator > Advanced Computing Research Centre > IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Fri Sep 6 12:41:32 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Fri, 6 Sep 2019 11:41:32 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609150.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609151.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609152.png Type: image/png Size: 1134 bytes Desc: not available URL: From Dugan.Witherick at warwick.ac.uk Fri Sep 6 13:25:22 2019 From: Dugan.Witherick at warwick.ac.uk (Witherick, Dugan) Date: Fri, 6 Sep 2019 12:25:22 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , Message-ID: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next PTFs on > both 4.2.3 and 5.0.3 streams. However we have seen issues in test that will > probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might work many > times but is quite a risky business. I highly recommend waiting for our > support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non- > > fatal warnings at that version. It seems to run fine. The rest of the > > cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a > > reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on > > > RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include > > > Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed > > > make World ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit > > > $? || exit 1 > > > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 > > > M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > > > if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no > > > member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), > > > (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro > > > _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro > > > ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of > > > ?security_old_inode_init_security? from incompatible pointer type [enabled > > > by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? but > > > argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct inode > > > *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration > > > of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. 
Examine previous error messages to determine > > > cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator > > > Advanced Computing Research Centre > > > IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From son.truong at bristol.ac.uk Fri Sep 6 15:15:04 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 14:15:04 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Message-ID: Thank you. Table 39 is most helpful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Witherick, Dugan Sent: 06 September 2019 13:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next > PTFs on both 4.2.3 and 5.0.3 streams. However we have seen issues in > test that will probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might > work many times but is quite a risky business. I highly recommend > waiting for our support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp Sitz der > Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL > > 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with > > non- fatal warnings at that version. It seems to run fine. The rest > > of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do > > you have a reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel > > > modules on RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC > > > 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, > > > 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed make World > > > ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > > > > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include > > > /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir > > > /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib rm -f > > > //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 > > > ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux > > > CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? > > > has no member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), > > > (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), > > > (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition > > > of macro _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of > > > macro ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing > > > argument 4 of ?security_old_inode_init_security? from incompatible > > > pointer type [enabled by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? > > > but argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct > > > inode *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit > > > declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. 
> > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. Examine previous error messages to > > > determine cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator Advanced Computing > > > Research Centre IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Sep 6 16:42:39 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 6 Sep 2019 15:42:39 +0000 Subject: [gpfsug-discuss] SSUG Meeting at SC19: Save the date and call for user talks! Message-ID: The Spectrum Scale User group will hold its annual meeting at SC19 on Sunday November 17th from 12:30PM -6PM In Denver, Co. We will be posting exact meeting location soon, but reserve this time. IBM will host a reception following the user group meeting. We?re also looking for user talks - these are short update (20 mins or so) on your use of Spectrum Scale - any topics are welcome. If you are interested, please contact myself or Kristy Kallback-Rose. Looking forward to seeing everyone in Denver! Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Mon Sep 9 21:29:28 2019 From: bipcuds at gmail.com (Keith Ball) Date: Mon, 9 Sep 2019 16:29:28 -0400 Subject: [gpfsug-discuss] Anyone have experience with changing NSD server node name in an ESS/DSS cluster? Message-ID: Hi All, We are thinking of attempting a non-destructive change of NSD server node names in a Lenovo DSS cluster (DSS level 1.2a, which has Scale 4.2.3.5). For a non-GNR cluster, changing a node name for an NSD server isn't a huge deal if you can have a backup server serve up disks; one can mmdelnode then mmaddnode, for instance. Has anyone tried to rename the NSD servers in a GNR cluster, however? I am not sure if it's as easy as failing over the recovery group, and deleting/adding the NSD server. It's easy enough to modify xcat. Perhaps mmchrecoverygroup can be used to change the RG names (since they are named after the NSD servers), but that might not be necessary. Or, it might not work - does anyone know if there is a special process to change NSD server names in an E( or D or G)SS cluster that does not run afoul of GNR or upgrade scripts? Best regards, Keith -------------- next part -------------- An HTML attachment was scrubbed... 
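No first-hand experience with renaming GNR NSD servers to offer here, but before attempting it (or opening a case), it is worth snapshotting the recovery-group-to-server mapping so any change can be verified afterwards. A read-only sketch; <rgname> is a placeholder for each recovery group reported by the first command:

  mmlscluster                     # current node names / admin node names
  mmlsrecoverygroup               # recovery groups and their current servers
  mmlsrecoverygroup <rgname> -L   # per-RG detail, repeat for each RG
  mmlsnsd -X                      # NSD-to-server and device mapping

  # Moving an RG between existing servers is done with
  #   mmchrecoverygroup <rgname> --servers <primary>,<backup>
  # but whether the server *name* itself can change without a delete/add
  # of the node is exactly the open question above, so treat this as
  # inspection only and confirm the procedure with IBM/Lenovo support.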
URL: From TROPPENS at de.ibm.com Wed Sep 11 13:20:22 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 11 Sep 2019 14:20:22 +0200 Subject: [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjvilla at nccs.nasa.gov Wed Sep 11 20:14:12 2019 From: jjvilla at nccs.nasa.gov (John J. Villa) Date: Wed, 11 Sep 2019 15:14:12 -0400 (EDT) Subject: [gpfsug-discuss] Introduction - New Subscriber Message-ID: Hello, My name is John Villa. I work for NASA at the Nasa Center for Climate Simulation. We currently utilize GPFS as the primary filesystem on the discover cluster: https://www.nccs.nasa.gov/systems/discover I look forward to seeing everyone at SC19. Thank You, -- John J. Villa NASA Center for Climate Simulation Discover Systems Administrator From damir.krstic at gmail.com Thu Sep 12 15:16:03 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 12 Sep 2019 09:16:03 -0500 Subject: [gpfsug-discuss] VerbsReconnectThread waiters Message-ID: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Thu Sep 12 16:10:58 2019 From: george at markomanolis.com (George Markomanolis) Date: Thu, 12 Sep 2019 11:10:58 -0400 Subject: [gpfsug-discuss] Call for Submission for the IO500 List Message-ID: Call for Submission *Deadline*: 10 November 2019 AoE The IO500 is now accepting and encouraging submissions for the upcoming 5th IO500 list revealed at SC19 in Denver, Colorado. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our SC19 BoF [2]. We hope to see you, and your results, there. We have updated our submission rules [3]. This year, we will have a new list for the Student Cluster Competition as IO500 is used for extra points during this competition The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC19! Please note that submissions of all sizes are welcome; the site has customizable sorting so it is possible to submit on a small system and still get a very good per-client score for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. 
Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017, published its first list at SC17, and has grown exponentially since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: 1. Maximizing simplicity in running the benchmark suite 2. Encouraging complexity in tuning for performance 3. Allowing submitters to highlight their ?hero run? performance numbers 4. Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound. Finally, it includes a namespace search as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: 1. Gather historical data for the sake of analysis and to aid predictions of storage futures 2. Collect tuning information to share valuable performance optimizations across the community 3. Encourage vendors and designers to optimize for workloads beyond ?hero runs? 4. Establish bounded expectations for users, procurers, and administrators 10 Node I/O Challenge At SC, we will continue the 10 Node Challenge. This challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly *10 computes nodes* must be used to run the benchmark (one exception is the find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at io500.org. Birds-of-a-feather Once again, we encourage you to submit [1], to join our community, and to attend our BoF ?The IO500 and the Virtual Institute of I/O? at SC19, November 19th, 12:15-1:15pm, room 205-207, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. [1] http://io500.org/submission [2] *https://www.vi4io.org/io500/bofs/sc19/start * [3] https://www.vi4io.org/io500/rules/submission The IO500 committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 12 20:19:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 12 Sep 2019 12:19:20 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 - REGISTRATION CLOSING SOON In-Reply-To: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Message-ID: Reminder, registration closing on 9/16 EOB. That?s real soon now. Hope to see you there. Details below. 
> On Aug 29, 2019, at 7:30 PM, Kristy Kallback-Rose wrote: > > Hello, > > You will now find the nearly complete agenda here: > > https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > As noted before, the event is free, but please do register below to help with catering planning. > > You can find more information about the full HPCXXL event here: http://hpcxxl.org/ > > Any questions let us know. Hope to see you there! > > -Kristy > >> On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. >> >> We?ll have more details soon, mark your calendars. >> >> Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ >> >> Best, >> Kristy > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Fri Sep 13 09:48:58 2019 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Fri, 13 Sep 2019 08:48:58 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects Message-ID: Hi All, I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? Cheers, Greg Lehmann Senior High Performance Data Specialist Data Services | Scientific Computing Platforms Information Management and Technology | CSIRO Greg.Lehmann at csiro.au | +61 7 3327 4137 | 1 Technology Court, Pullenvale, QLD 4069 CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email. CSIRO Australia's National Science Agency | csiro.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Sep 13 10:14:06 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 13 Sep 2019 05:14:06 -0400 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: References: Message-ID: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Restarting subnet manager in general is fairly harmless. It will cause a heavy sweep of the fabric when it comes back up, but there should be no LID renumbering. Traffic may be held up during the scanning and rebuild of the routing tables. Losing a subnet manager for a period of time would prevent newly booted nodes from receiving a LID but existing nodes will continue to function. 
Adding or deleting inter-switch links should probably be avoided if the subnet manager is down. I would also avoid changing the routing algorithm while in production. Moving a non ha subnet manager from primary to backup and back again has worked for us without disruption, but I would try to do this in a maintenance window. -- ddj Dave Johnson > On Sep 13, 2019, at 4:48 AM, Lehmann, Greg (IM&T, Pullenvale) wrote: > > Hi All, > I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? > > Cheers, > > Greg Lehmann > Senior High Performance Data Specialist > Data Services | Scientific Computing Platforms > Information Management and Technology | CSIRO > Greg.Lehmann at csiro.au | +61 7 3327 4137 | > 1 Technology Court, Pullenvale, QLD 4069 > > CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. > > The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. > > Please consider the environment before printing this email. > > CSIRO Australia?s National Science Agency | csiro.au > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 13 10:48:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 13 Sep 2019 09:48:52 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> References: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Message-ID: On Fri, 2019-09-13 at 05:14 -0400, david_johnson at brown.edu wrote: [SNIP] > Moving a non ha subnet manager from primary to backup and back again > has worked for us without disruption, but I would try to do this in a > maintenance window. > Not on GPFS but in the past I have moved from one subnet manager to another with dozens of running MPI jobs, and Lustre running over the fabric and not missed a beat. My current cluster used 10 and 40Gbps ethernet for GPFS with Omnipath exclusively for MPI traffic. To be honest I just cannot wrap my head around the idea that you would not be running two subnet managers in the first place. Just fire up two subnet managers (whether on a switch or a node) and forget about it. They will automatically work together to give you a HA solution. It is the same with Omnipath too. I would also note that you can fire up more than two fabric managers and it all "just works". 
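For completeness, a minimal sketch of bringing up an additional host-based subnet manager follows (it assumes the opensm package and its systemd unit are present; the election priority mechanism differs between host-based and switch-embedded SMs, so treat the config path as an example):

   # start a standby subnet manager on this node and keep it across reboots
   systemctl enable --now opensm

   # optionally check its election priority (sm_priority, 0-15, higher wins)
   grep sm_priority /etc/opensm/opensm.conf

   # confirm that exactly one SM on the fabric reports itself as master
   sminfo
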
If it where me and I didn't have fabric managers running on at least two of my switches and I was doing GPFS over Infiniband, I would fire up fabric managers on all of my NSD servers. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From heinrich.billich at id.ethz.ch Fri Sep 13 15:56:07 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Fri, 13 Sep 2019 14:56:07 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Message-ID: Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* From ewahl at osc.edu Fri Sep 13 16:42:30 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 13 Sep 2019 15:42:30 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: I recall looking at this a year or two back. Ganesha is either v4 and v6 both (ie: the encapsulation you see), OR ipv4 ONLY. (ie: /etc/modprobe.d/ipv6.conf disable=1) Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Billich Heinrich Rainer (ID SD) Sent: Friday, September 13, 2019 10:56 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Fri Sep 13 17:07:01 2019 From: jam at ucar.edu (Joseph Mendoza) Date: Fri, 13 Sep 2019 10:07:01 -0600 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: References: Message-ID: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: > On my cluster I have seen couple of long waiters such as this: > > gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more > seconds, reason: delaying for next reconnect attempt > > I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. > > Is this something to pay attention to, and what does this waiter mean? > > Thank you. > Damir > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 16 08:12:09 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 16 Sep 2019 09:12:09 +0200 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Sep 16 10:33:58 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 16 Sep 2019 17:33:58 +0800 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> References: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> Message-ID: Damir, Joseph, > Is this something to pay attention to, and what does this waiter mean? This waiter means GPFS fails to reconnect broken verbs connection, which can cause performance degradation. 
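For reference, a quick way to spot the condition from the command line is sketched below (mmfsadm is a service-level tool, so treat its output with the usual caution; "mmfsadm test verbs conn" is the same command mentioned earlier in this thread):

   # long-running reconnect waiters on the local node
   mmdiag --waiters | grep -i VerbsReconnect

   # current RDMA connections and their state
   mmfsadm test verbs conn
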
> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again. > Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. This is a code bug which is fixed through internal defect 1090669. It will be backport to service releases after verification. There is a work-around which can fix this problem without a restart. - On nodes which have this waiter list, run command 'mmfsadm test breakconn all 744' 744 is E_RECONNECT, which triggers tcp reconnect and will not cause node leave/rejoin. Its side effect clears RDMA connections and their incorrect status. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Joseph Mendoza To: gpfsug-discuss at spectrumscale.org Date: 2019/09/14 12:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] VerbsReconnectThread waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WoT3TYlCvAM8RQxUISD9L6UzqY0I_ffCJTS-UHhw8z4&s=18A0j0Zmp8OwZ6Y6cc3HFe3OgFZRHIv8OeJcBpkaPwQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Mon Sep 16 13:58:03 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 16 Sep 2019 12:58:03 +0000 Subject: [gpfsug-discuss] Can 5-minutes frequent lsscsi command disrupt GPFS I/O on a Lenovo system ? Message-ID: <83A6EEB0EC738F459A39439733AE80452BEA85FE@MBX214.d.ethz.ch> Hello folks, recently I observed that calling every 5 minutes the command "lsscsi -g" on a Lenovo I/O node (a X3650 M5 connected to D3284 enclosures, part of a DSS-G220 system) can seriously compromise the GPFS I/O performance. (The motivation of running lsscsi every 5 minutes is a bit out of topic, but I can explain on request). What we observed is that there were several GPFS waiters telling that flushing caches to physical disk was impossible and they had to wait (possibly going in timeout). Is this something expected and/or observed by someone else in this community ? Thanks Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Sep 16 15:50:24 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 16 Sep 2019 14:50:24 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , Message-ID: What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Mon Sep 16 15:55:34 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 16 Sep 2019 14:55:34 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: On our recent ESS systems we do not see /etc/tuned/scale/tuned.conf (or script.sh) owned by any package (rpm -qif ?). I?ve attached what we have on our ESS 5.3.3 systems. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Monday, September 16, 2019 at 10:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tuned.conf Type: application/octet-stream Size: 2859 bytes Desc: tuned.conf URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: script.sh Type: application/octet-stream Size: 270 bytes Desc: script.sh URL: From heinrich.billich at id.ethz.ch Mon Sep 16 16:49:57 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 16 Sep 2019 15:49:57 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Hello Olaf, Thank you, so we?ll try to get rid of IPv6. Actually we do have this settings active but I may have to add them to the initrd file, too. (See https://access.redhat.com/solutions/8709#?rhel7disable) to prevent ganesha from opening an IPv6 socket. It?s probably no big issue if ganesha uses IPv4overIPv6 for all connections, but to keep things simple I would like to avoid it. @Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... 
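A possible way to persist those sysctls, including for early boot as the Red Hat article referenced above discusses, is sketched here (the file name is illustrative; whether the initramfs rebuild is actually required depends on the distribution and on the approach chosen in that article):

   printf '%s\n' 'net.ipv6.conf.all.disable_ipv6 = 1' \
                 'net.ipv6.conf.default.disable_ipv6 = 1' > /etc/sysctl.d/90-disable-ipv6.conf
   sysctl --system    # apply immediately
   dracut -f          # rebuild the initramfs so early boot picks up the same setting
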
From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 16 18:34:07 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 16 Sep 2019 17:34:07 +0000 Subject: [gpfsug-discuss] SSUG @ SC19 Update: Scheduling and Sponsorship Opportunities Message-ID: Two months until SC19 and the schedule is starting to come together, with a great mix of technical updates and user talks. I would like highlight a few items for you to be aware of: - Morning session: We?re currently trying to put together a morning ?new users? session for those new to Spectrum Scale. These talks would be focused on fundamentals and give an opportunity to ask questions. We?re tentatively thinking about starting around 9:30-10 AM on Sunday November 17th. Watch the mailing list for updates and on the http://spectrumscale.org site. - Sponsorships: We?re looking for sponsors. If your company is an IBM partner, uses/incorporates Spectrum Scale - please contact myself or Kristy Kallback-Rose. We are looking for sponsors to help with lunch (YES - we?d like to serve lunch this year!) and WiFi access during the user group meeting. Looking forward to seeing you all at SC19. Registration link coming soon, watch here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Sep 18 18:56:29 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 18 Sep 2019 17:56:29 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 Message-ID: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. (sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Sep 19 11:44:46 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 10:44:46 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From heinrich.billich at id.ethz.ch Thu Sep 19 15:20:53 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 19 Sep 2019 14:20:53 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Message-ID: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Hello, Is it usual to see 200?000-400?000 open files for a single ganesha process? Or does this indicate that something ist wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200?000-400?000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1?000 ? 10?000 open files by ganesha only and don?t show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn?t open/close a file, it?s the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open file? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc//fd/ . With several 100k entries I failed to do a ?ls -ls? to list all the symbolic links, hence I can?t relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Sep 19 15:30:45 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 19 Sep 2019 15:30:45 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > ?reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? 
We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files ?by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From S.J.Thompson at bham.ac.uk Thu Sep 19 16:18:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Sep 2019 15:18:47 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: on behalf of "abeattie at au1.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Sep 19 19:38:53 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 19 Sep 2019 18:38:53 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is_this=09unusual=3F?= In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Sep 19 22:34:33 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 21:34:33 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Sep 19 23:41:08 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 19 Sep 2019 22:41:08 +0000 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade Message-ID: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Sep 20 09:08:01 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 20 Sep 2019 10:08:01 +0200 Subject: [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=I3TzCv5SKxKb51eAL_blo-XwctX64z70ayrZKERanWA&s=OSKGngwXAoOemFy3HkctexuIpBJQu8NPeTkC_MMQBks&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Fri Sep 20 10:14:58 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Fri, 20 Sep 2019 11:14:58 +0200 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade In-Reply-To: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> References: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> Message-ID: Hello Bob, this event is a "Notice": You can use the action "Mark Selected Notices as Read" or "Mark All Notices as Read"in the GUI Event Groups or Individual Events grid. Notice events are transient by nature and don't imply a permanent state change of an entity. It seems that during the upgrade, mmhealth had probed the pdisk and the disk hospital was diagnosing the pdisk at this time, but eventually disk hospital placed the pdisk back to normal state, Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 162 4159920 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 20.09.2019 00:53 Subject: [EXTERNAL] [gpfsug-discuss] Leftover GUI events after ESS upgrade Sent by: gpfsug-discuss-bounces at spectrumscale.org I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hLyf83U0otjISdpV5zl1cSCPVFFUF61ny3jWvv-5kNQ&s=ptMGcpNhnRTogPO2CN_l6jhC-vCN-VQAf53HmRLQDq8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 14525383.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heinrich.billich at id.ethz.ch Mon Sep 23 10:33:02 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 09:33:02 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Hello Frederik, Thank you. I now see a similar behavior: Ganesha has 500k open files while the node is suspended since 2+hours. I would expect that some cleanup job does remove most of the open FD after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: ! maxFilesToCache 1048576 ! maxStatCache 2097152 Our ganesha version is 2.5.3. (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern. 
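For anyone who wants to reproduce these counts, a cheap way to do it without listing several hundred thousand symlinks with "ls -l" is sketched below (the process match pattern is an example):

   pid=$(pgrep -f gpfs.ganesha.nfsd | head -n1)
   ls /proc/$pid/fd | wc -l                    # open descriptors held by ganesha
   grep 'Max open files' /proc/$pid/limits     # the limit the process actually runs with
   mmlsconfig maxFilesToCache                  # the related Scale caching setting
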
I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. Cheers, Heiner Daniel Gryniewicz So, it's not impossible, based on the workload, but it may also be a bug. For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know when the client closes the FD, and opening/closing all the time causes a large performance hit. So, we cache open FDs. All handles in MDCACHE live on the LRU. This LRU is divided into 2 levels. Level 1 is more active handles, and they can have open FDs. Various operation can demote a handle to level 2 of the LRU. As part of this transition, the global FD on that handle is closed. Handles that are actively in use (have a refcount taken on them) are not eligible for this transition, as the FD may be being used. We have a background thread that runs, and periodically does this demotion, closing the FDs. This thread runs more often when the number of open FDs is above FD_HwMark_Percent of the available number of FDs, and runs constantly when the open FD count is above FD_Limit_Percent of the available number of FDs. So, a heavily used server could definitely have large numbers of FDs open. However, there have also, in the past, been bugs that would either keep the FDs from being closed, or would break the accounting (so they were closed, but Ganesha still thought they were open). You didn't say what version of Ganesha you're using, so I can't tell if one of those bugs apply. Daniel ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. 
> > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Mon Sep 23 11:43:06 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 10:43:06 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <72079C31-1E3E-4F69-B428-480620466353@id.ethz.ch> Hello Malhal, Thank you. Actually I don?t see the parameter Cache_FDs in our ganesha config. But when I trace LRU processing I see that almost no FDs get released. And the number of FDs given in the log messages doesn?t match what I see in /proc//fd/. I see 512k open files while the logfile give 600k. Even 4hours since the I suspended the node and all i/o activity stopped I see 500k open files and LRU processing doesn?t close any of them. This looks like a bug in gpfs.nfs-ganesha-2.5.3-ibm036.10.el7. I?ll open a case with IBM. We did see gansha to fail to open new files and hence client requests to fail. I assume that 500K FDs compared to 10K FDs as before create some notable overhead for ganesha, spectrum scale and kernel and withdraw resources from samba. I?ll post to the list once we got some results. 
Cheers, Heiner Start of LRU processing 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51350 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1027 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51400 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1028 closing 0 descriptors End of log 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1029 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1029 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51500 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1030 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :After work, open_fd_count:607024 count:29503718 fdrate:1908874353 threadwait=9 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :currentopen=607024 futility=0 totalwork=51550 biggest_window=335544 extremis=0 lanes=1031 fds_lowat=167772 From: on behalf of Malahal R Naineni Reply to: gpfsug main discussion list Date: Thursday, 19 September 2019 at 20:39 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? NFSv3 doesn't have open/close requests, so nfs-ganesha opens a file for read/write when there is an NFSv3 read/write request. It does cache file descriptors, so its open count can be very large. If you have 'Cache_FDs = true" in your config, ganesha aggressively caches file descriptors. 
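One way to see what the CES-managed NFS configuration currently contains for these cache-related options is sketched below (parameter names vary between Ganesha releases, so the filter may need adjusting and may return nothing):

   # dump the active ganesha configuration managed by CES
   mmnfs config list

   # narrow it down to FD-cache related settings, if present
   mmnfs config list | egrep -i 'cache_fds|fd_hwmark|fd_limit'
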
Taking traces with COMPONENT_CACHE_INODE_LRU level set to full debug should give us better insight on what is happening when the open file descriptors count is very high. When the I/O failure happens or when the open fd count is high, you could do the following: 1. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU FULL_DEBUG 2. wait for 90 seconds, then run 3. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU EVENT Regards, Malahal. ----- Original message ----- From: "Billich Heinrich Rainer (ID SD)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Date: Thu, Sep 19, 2019 7:51 PM Hello, Is it usual to see 200'000-400'000 open files for a single ganesha process? Or does this indicate that something is wrong? We have some issues with ganesha (on Spectrum Scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200'000-400'000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1'000 - 10'000 open files by ganesha only and don't show the issue. If someone could explain how ganesha decides which files to keep open and which to close, that would help, too. As NFSv3 is stateless the client doesn't open/close a file, so it's up to the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open files? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc/<pid of ganesha>/fd/ . With several 100k entries I failed to do a 'ls -ls' to list all the symbolic links, hence I can't relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Sep 24 09:52:34 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 24 Sep 2019 08:52:34 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Hello Frederik, Just some addition, maybe it's of interest to someone: the maximum number of open files for Ganesha is derived from maxFilesToCache. It is 80% of maxFilesToCache, clamped to a lower limit of 2'000 and an upper limit of 1M. The active setting is visible in /etc/sysconfig/ganesha. Cheers, Heiner On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation.
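(Heiner's note above - the Ganesha descriptor limit being 80% of maxFilesToCache, clamped to between 2'000 and 1M - can be cross-checked on a CES protocol node with something like the sketch below. The clamp is as described in the mail; the NOFILE variable name in /etc/sysconfig/ganesha may differ between releases, so treat the grep as a hint rather than a guarantee:)

    # value configured in the cluster
    mmlsconfig maxFilesToCache

    # expected ganesha limit: 80% of maxFilesToCache, clamped to [2000, 1000000]
    mftc=1000000                        # substitute the value reported above
    exp=$(( mftc * 80 / 100 ))
    [ "$exp" -lt 2000 ] && exp=2000
    [ "$exp" -gt 1000000 ] && exp=1000000
    echo "expected ganesha fd limit: $exp"

    # what was written out for the service, and what the running daemon got
    grep -i nofile /etc/sysconfig/ganesha
    grep -i "open files" /proc/$(pgrep -o -f gpfs.ganesha.nfsd)/limits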
I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From valdis.kletnieks at vt.edu Tue Sep 24 21:41:07 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 24 Sep 2019 16:41:07 -0400 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? 
In-Reply-To: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: <269692.1569357667@turing-police> On Tue, 24 Sep 2019 08:52:34 -0000, "Billich Heinrich Rainer (ID SD)" said: > Just some addition, maybe it's of interest to someone: the maximum number of open > files for Ganesha is derived from maxFilesToCache. It is 80% of maxFilesToCache, > clamped to a lower limit of 2'000 and an upper limit of 1M. The active setting is visible in > /etc/sysconfig/ganesha. Note that strictly speaking, the values in /etc/sysconfig are in general the values that will be used at the next restart - it's totally possible for the system to boot, the then-current values to be picked up from /etc/sysconfig, and then any number of things, from configuration automation tools like Ansible, to a co-worker sysadmin armed with nothing but /usr/bin/vi, to have changed the values without you knowing about it and the daemons not to have been restarted yet... (Let's just say that in 4 decades of doing this stuff, I've been surprised by that sort of thing a few times. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Sep 25 18:06:18 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 25 Sep 2019 17:06:18 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <269692.1569357667@turing-police> References: <269692.1569357667@turing-police>, <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch><280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: att6j9ca.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From L.R.Sudbery at bham.ac.uk Thu Sep 26 10:38:09 2019 From: L.R.Sudbery at bham.ac.uk (Luke Sudbery) Date: Thu, 26 Sep 2019 09:38:09 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <3b15db460ac1459e9ca53bec00f30833@bham.ac.uk> We think our issue was down to NUMA settings actually - making mmfsd allocate GPU memory. Makes sense given the type of error. Tomer suggested to Simon we set numactlOption to "0 8", as per: https://www-01.ibm.com/support/docview.wss?uid=isg1IJ02794 Our tests are not crashing since setting them - we need to roll it out on all nodes to confirm it's fixed all our hangs/reboots. Cheers, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don't work on Monday and work from home on Friday.
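(For reference, a sketch of how that NUMA pinning can be inspected and applied. The "0 8" value is taken from Luke's mail and the linked APAR; the exact mmchconfig syntax, the node class name and the assumption that GPU memory appears as additional NUMA nodes on these AC922-style systems are all unverified here, so confirm against the APAR and your Scale level before using it:)

    # list NUMA nodes; on the POWER9 boxes discussed here the CPUs/DIMMs are
    # expected on nodes 0 and 8, with GPU memory appearing as extra nodes
    numactl -H

    # restrict mmfsd to the CPU/DIMM nodes only ('power9_nodes' is a
    # placeholder node class; option name/format per the APAR above)
    mmchconfig numactlOption="0 8" -N power9_nodes

    # takes effect when GPFS is restarted on those nodes; verify with
    mmdiag --config | grep -i numactl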
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of abeattie at au1.ibm.com Sent: 19 September 2019 22:35 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, I have an open support call that required Red Hat to create a kernel patch for RH 7.6 because of issues with the Intel x710 network adapter - I can't tell you if it's related to your issue or not, but it would cause the GPFS cluster to reboot and the affected node to reboot if we tried to do almost anything with that Intel adapter. Regards, Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS and POWER9 Date: Fri, Sep 20, 2019 1:18 AM Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: > on behalf of "abeattie at au1.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by any chance? Regards, Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we've been having some issues with some of our POWER9 systems. They are occasionally hanging or rebooting; in one case, we've found we can cause them to do it by running an MPI IOR workload against GPFS. Every instance we've seen which has logged something to syslog has had mmfsd referenced, but we don't know if that is a symptom or a cause.
(sometimes they just hang and we don't see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I've raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? It's multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs - (but maybe it's a symptom rather than a cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Sep 26 10:55:45 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 26 Sep 2019 09:55:45 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions Message-ID: Hi, We have a data analysis application that isn't running well at all in our AFM caches - it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system - so I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though.
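(Before drawing conclusions it may be worth confirming what the cache fileset is actually running with. A minimal sketch; 'fs1' and 'cachefset' are placeholder filesystem and fileset names:)

    # on the cache cluster: show the AFM attributes in effect for the fileset,
    # including the refresh intervals and afmRefreshAsync if set there
    mmlsfileset fs1 cachefset --afm -L

    # and whether afmRefreshAsync is set cluster-wide (only listed if non-default)
    mmlsconfig | grep -i afmrefreshasync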
The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 27 09:23:13 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Sep 2019 13:53:13 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals? You could also try increasing them. Is this config option set at fileset level or cluster level? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have a data analysis application that isn't running well at all in our AFM caches - it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system - so I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Fri Sep 27 11:31:42 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Fri, 27 Sep 2019 10:31:42 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL:
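(A sketch of the two knobs Venkat mentions above, at fileset and at cluster scope. The filesystem/fileset names are placeholders, the interval values are purely illustrative, and whether afmRefreshAsync is accepted at fileset level depends on the release, so check the documentation for your level first; per Venkat's note, both clusters need to be on 5.0.3.x for the async revalidation to help:)

    # relax the revalidation intervals for one cache fileset (values in seconds)
    mmchfileset fs1 cachefset -p afmFileLookupRefreshInterval=120
    mmchfileset fs1 cachefset -p afmDirLookupRefreshInterval=120

    # enable asynchronous revalidation for just this fileset, if supported
    mmchfileset fs1 cachefset -p afmRefreshAsync=yes

    # or set it cluster-wide instead
    mmchconfig afmRefreshAsync=yes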