From abeattie at au1.ibm.com Sun Sep 1 14:17:01 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sun, 1 Sep 2019 13:17:01 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... URL: From sandeep.patil at in.ibm.com Tue Sep 3 06:28:30 2019 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Tue, 3 Sep 2019 05:28:30 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q2 2019) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q2 2019). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. Redpaper : IBM Power Systems Enterprise AI Solutions (W/ SPECTRUM SCALE) http://www.redbooks.ibm.com/redpieces/abstracts/redp5556.html?Open IBM Spectrum Scale Erasure Code Edition (ECE): Installation Demonstration https://www.youtube.com/watch?v=6If50EvgP-U Blogs: Using IBM Spectrum Scale as platform storage for running containerized Hadoop/Spark workloads https://developer.ibm.com/storage/2019/08/27/using-ibm-spectrum-scale-as-platform-storage-for-running-containerized-hadoop-spark-workloads/ Useful Tools for Spectrum Scale CES NFS https://developer.ibm.com/storage/2019/07/22/useful-tools-for-spectrum-scale-ces-nfs/ How to ensure NFS uses strong encryption algorithms for secure data in motion ? https://developer.ibm.com/storage/2019/07/19/how-to-ensure-nfs-uses-strong-encryption-algorithms-for-secure-data-in-motion/ Introducing IBM Spectrum Scale Erasure Code Edition https://developer.ibm.com/storage/2019/07/07/introducing-ibm-spectrum-scale-erasure-code-edition/ Spectrum Scale: Which Filesystem Encryption Algo to Consider ? https://developer.ibm.com/storage/2019/07/01/spectrum-scale-which-filesystem-encryption-algo-to-consider/ IBM Spectrum Scale HDFS Transparency Apache Hadoop 3.1.x Support https://developer.ibm.com/storage/2019/06/24/ibm-spectrum-scale-hdfs-transparency-apache-hadoop-3-0-x-support/ Enhanced features in Elastic Storage Server (ESS) 5.3.4 https://developer.ibm.com/storage/2019/06/19/enhanced-features-in-elastic-storage-server-ess-5-3-4/ Upgrading IBM Spectrum Scale Erasure Code Edition using installation toolkit https://developer.ibm.com/storage/2019/06/09/upgrading-ibm-spectrum-scale-erasure-code-edition-using-installation-toolkit/ Upgrading IBM Spectrum Scale sync replication / stretch cluster setup in PureApp https://developer.ibm.com/storage/2019/06/06/upgrading-ibm-spectrum-scale-sync-replication-stretch-cluster-setup/ GPFS config remote access with multiple network definitions https://developer.ibm.com/storage/2019/05/30/gpfs-config-remote-access-with-multiple-network-definitions/ IBM Spectrum Scale Erasure Code Edition Fault Tolerance https://developer.ibm.com/storage/2019/05/30/ibm-spectrum-scale-erasure-code-edition-fault-tolerance/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.3 ? 
https://developer.ibm.com/storage/2019/05/02/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-3/ Understanding and Solving WBC_ERR_DOMAIN_NOT_FOUND error with Spectrum Scale https://crk10.wordpress.com/2019/07/21/solving-the-wbc-err-domain-not-found-nt-status-none-mapped-glitch-in-ibm-spectrum-scale/ Understanding and Solving NT_STATUS_INVALID_SID issue for SMB access with Spectrum Scale https://crk10.wordpress.com/2019/07/24/solving-nt_status_invalid_sid-for-smb-share-access-in-ibm-spectrum-scale/ mmadquery primer (apparatus to query Active Directory from IBM Spectrum Scale) https://crk10.wordpress.com/2019/07/27/mmadquery-primer-apparatus-to-query-active-directory-from-ibm-spectrum-scale/ How to configure RHEL host as Active Directory Client using SSSD https://crk10.wordpress.com/2019/07/28/configure-rhel-machine-as-active-directory-client-using-sssd/ How to configure RHEL host as LDAP client using nslcd https://crk10.wordpress.com/2019/07/28/configure-rhel-machine-as-ldap-client-using-nslcd/ Solving NFSv4 AUTH_SYS nobody ownership issue https://crk10.wordpress.com/2019/07/29/nfsv4-auth_sys-nobody-ownership-and-idmapd/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list of all blogs and collaterals. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 04/29/2019 12:12 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q1 2019) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q1 2019). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Spectrum Scale 5.0.3 https://developer.ibm.com/storage/2019/04/24/spectrum-scale-5-0-3/ IBM Spectrum Scale HDFS Transparency Ranger Support https://developer.ibm.com/storage/2019/04/01/ibm-spectrum-scale-hdfs-transparency-ranger-support/ Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing Files Globally, http://www.redbooks.ibm.com/abstracts/redp5527.html?Open Spectrum Scale user group in Singapore, 2019 https://developer.ibm.com/storage/2019/03/14/spectrum-scale-user-group-in-singapore-2019/ 7 traits to use Spectrum Scale to run container workload https://developer.ibm.com/storage/2019/02/26/7-traits-to-use-spectrum-scale-to-run-container-workload/ Health Monitoring of IBM Spectrum Scale Cluster via External Monitoring Framework https://developer.ibm.com/storage/2019/01/22/health-monitoring-of-ibm-spectrum-scale-cluster-via-external-monitoring-framework/ Migrating data from native HDFS to IBM Spectrum Scale based shared storage https://developer.ibm.com/storage/2019/01/18/migrating-data-from-native-hdfs-to-ibm-spectrum-scale-based-shared-storage/ Bulk File Creation useful for Test on Filesystems https://developer.ibm.com/storage/2019/01/16/bulk-file-creation-useful-for-test-on-filesystems/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 01/14/2019 06:24 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q4 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q4 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Redpaper: IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements http://www.redbooks.ibm.com/abstracts/redp5525.html?Open IBM Spectrum Scale Memory Usage https://www.slideshare.net/tomerperry/ibm-spectrum-scale-memory-usage?qid=50a1dfda-3102-484f-b9d0-14b69fc4800b&v=&b=&from_search=2 Spectrum Scale and Containers https://developer.ibm.com/storage/2018/12/20/spectrum-scale-and-containers/ IBM Elastic Storage Server Performance Graphical Visualization with Grafana https://developer.ibm.com/storage/2018/12/18/ibm-elastic-storage-server-performance-graphical-visualization-with-grafana/ Hadoop Performance for disaggregated compute and storage configurations based on IBM Spectrum Scale Storage https://developer.ibm.com/storage/2018/12/13/hadoop-performance-for-disaggregated-compute-and-storage-configurations-based-on-ibm-spectrum-scale-storage/ EMS HA in ESS LE (Little Endian) environment https://developer.ibm.com/storage/2018/12/07/ems-ha-in-ess-le-little-endian-environment/ What?s new in ESS 5.3.2 https://developer.ibm.com/storage/2018/12/04/whats-new-in-ess-5-3-2/ Administer your Spectrum Scale cluster easily https://developer.ibm.com/storage/2018/11/13/administer-your-spectrum-scale-cluster-easily/ Disaster Recovery using Spectrum Scale?s Active File Management https://developer.ibm.com/storage/2018/11/13/disaster-recovery-using-spectrum-scales-active-file-management/ Recovery Group Failover Procedure of IBM Elastic Storage Server (ESS) https://developer.ibm.com/storage/2018/10/08/recovery-group-failover-procedure-ibm-elastic-storage-server-ess/ Whats new in IBM Elastic Storage Server (ESS) Version 5.3.1 and 5.3.1.1 https://developer.ibm.com/storage/2018/10/04/whats-new-ibm-elastic-storage-server-ess-version-5-3-1-5-3-1-1/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 10/03/2018 08:48 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q3 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. 
https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Sep 3 14:07:44 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 3 Sep 2019 15:07:44 +0200 Subject: [gpfsug-discuss] Fileheat - does work! Complete test/example provided here. In-Reply-To: References: Message-ID: Thanks for this example, very userful, but I'm still struggeling a bit at a customer.. We're doing heat daily based rebalancing, with fileheatlosspercent=20 and fileheatperiodminutes=720: RULE "defineTiers" GROUP POOL 'Tiers' IS 'ssdpool' LIMIT(70) then 'saspool' RULE 'Rebalance' MIGRATE FROM POOL 'Tiers' TO POOL 'Tiers' WEIGHT(FILE_HEAT) WHERE FILE_SIZE<10000000000 but are seeing too many files moved down to the saspool and too few are staying in the ssdpool. Right now we ran a test of this policy, and saw that it wanted to move 130k files / 300 GB down to the saspool, and a single small file up to the ssdpool -- even though the ssdpool is only 50% utilized. Running your listing policy reveals lots of files with zero heat: <7> /gpfs/gpfs0/file1 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file2 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file3/HM_WVS_8P41017_1/HM_WVS_8P41017_1.S2206 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) and others with heat: <5> /gpfs/gpfs0/file4 RULE 'fh2' LIST 'fh' WEIGHT(0.004246) SHOW( 300401047 0 0 +4.24600492924153E-003 11E7C19700000000 720 25 server.locale) <5> /gpfs/gpfs0/file5 RULE 'fh2' LIST 'fh' WEIGHT(0.001717) SHOW( 120971793 1 0 +1.71725239616613E-003 0735E21100010000 720 25 server.locale) These are not new files -- so we're wondering if maybe the fileheat is reduced to zero/NULL after a while (how many times can it shrink by 25% before it's zero??). Would it make sense to increase fileheatperiodeminutes and/or decrease fileheatlosspercentage? What would be good values? (BTW: we have relatime enabled) Any other ideas for why it won't fill up our ssdpool to close to LIMIT(70) ? -jf On Tue, Aug 13, 2019 at 3:33 PM Marc A Kaplan wrote: > Yes, you are correct. It should only be necessary to set > fileHeatPeriodMinutes, since the loss percent does have a default value. > But IIRC (I implemented part of this!) you must restart the daemon to get > those fileheat parameter(s) "loaded"and initialized into the daemon > processes. > > Not fully trusting my memory... I will now "prove" this works today as > follows: > > To test, create and re-read a large file with dd... > > [root@/main/gpfs-git]$mmchconfig fileHeatPeriodMinutes=60 > mmchconfig: Command successfully completed > ... > [root@/main/gpfs-git]$mmlsconfig | grep -i heat > fileHeatPeriodMinutes 60 > > [root@/main/gpfs-git]$mmshutdown > ... > [root@/main/gpfs-git]$mmstartup > ... > [root@/main/gpfs-git]$mmmount c23 > ... > [root@/main/gpfs-git]$ls -l /c23/10g > -rw-r--r--. 
1 root root 10737418240 May 16 15:09 /c23/10g > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > > (NO fileheat attribute yet...) > > [root@/main/gpfs-git]$dd if=/c23/10g bs=1M of=/dev/null > ... > After the command finishes, you may need to wait a while for the metadata > to flush to the inode on disk ... or you can force that with an unmount or > a mmfsctl... > > Then the fileheat attribute will appear (I just waited by answering > another email... No need to do any explicit operations on the file system..) > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > gpfs.FileHeat > > To see its hex string value: > > [root@/main/gpfs-git]$mmlsattr -d -X -L /c23/10g > file name: /c23/10g > ... > security.selinux: > 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000 > gpfs.FileHeat: 0x000000EE42A40400 > > Which will be interpreted by mmapplypolicy... > > YES, the interpretation is relative to last access time and current time, > and done by a policy/sql function "computeFileHeat" > (You could find this using m4 directives in your policy file...) > > > define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)]) > > Well gone that far, might as well try mmapplypolicy too.... > > [root@/main/gpfs-git]$cat /gh/policies/fileheat.policy > define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) > END]) > > rule fh1 external list 'fh' exec '' > rule fh2 list 'fh' weight(FILE_HEAT) > show(DISPLAY_NULL(xattr_integer('gpfs.FileHeat',1,4,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' || > DISPLAY_NULL(FILE_HEAT) || ' ' || > DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' || > getmmconfig('fileHeatPeriodMinutes') || ' ' || > getmmconfig('fileHeatLossPercent') || ' ' || > getmmconfig('clusterName') ) > > > [root@/main/gpfs-git]$mmapplypolicy /c23 --maxdepth 1 -P > /gh/policies/fileheat.policy -I test -L 3 > ... > <1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > ... > WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > > > > > [image: Inactive hide details for Jan-Frode Myklebust ---08/13/2019 > 06:22:46 AM---What about filesystem atime updates. We recently chan]Jan-Frode > Myklebust ---08/13/2019 06:22:46 AM---What about filesystem atime updates. > We recently changed the default to ?relatime?. Could that maybe > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 08/13/2019 06:22 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > What about filesystem atime updates. We recently changed the default to > ?relatime?. Could that maybe influence heat tracking? > > > > -jf > > > tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < > *u.sibiller at science-computing.de* >: > > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By > default, the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must > first be enabled. 
To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature > dissipated over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. > The default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a > default. What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at > least on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar > over a directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Tue Sep 3 16:37:58 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 3 Sep 2019 15:37:58 +0000 Subject: [gpfsug-discuss] Easiest way to copy quota settings from one file system to another? Message-ID: <63C132C3-63AF-465B-8FD9-67AF9EA4887D@nuance.com> I?m migratinga file system from one cluster to another. I want to copy all user quotas from cluster1 filesystem ?A? to cluster2, filesystem ?fs1?, fileset ?A? What?s the easiest way to do that? I?m thinking mmsetquota with a stanza file, but is there a tool to generate the stanza file from the source? I could do a ?mmrepquota -u -Y? and process the output. Hoping for something easier :) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
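One way to script the "mmrepquota -u -Y and process the output" idea is sketched below. Treat it strictly as a starting point: the field names are looked up in the HEADER record at run time rather than hardcoded, but the exact names (quotaType, name, blockQuota, ...), the KiB unit assumption on block values, and the mmsetquota stanza keywords are all assumptions to verify against the man pages of your release before applying anything. The file system and fileset names are the ones from the question.

#!/bin/bash
# Sketch: convert "mmrepquota -u -Y" output from the source file system into an
# mmsetquota stanza file for the destination file system/fileset.
# Assumptions (verify before use): the -Y field names match those referenced below,
# block values are reported in KiB, and the stanza keywords are accepted by
# "mmsetquota -F" on your release.

SRC_FS="A"            # source file system on cluster1 (from the question)
DST_FS="fs1"          # destination file system on cluster2
DST_FILESET="A"       # destination fileset
STANZA_FILE=quota-stanzas.txt

/usr/lpp/mmfs/bin/mmrepquota -u -Y "$SRC_FS" | awk -F: \
    -v dst_fs="$DST_FS" -v dst_fset="$DST_FILESET" '
    /HEADER/ {
        # Map field names to column numbers so the script survives column reordering.
        for (i = 1; i <= NF; i++) col[$i] = i
        next
    }
    $col["quotaType"] == "USR" {
        if ($col["name"] == "root") next                      # leave root alone
        if ($col["blockQuota"] == 0 && $col["blockLimit"] == 0 &&
            $col["filesQuota"] == 0 && $col["filesLimit"] == 0) next   # no limits set
        printf "%%quota:\n"
        printf "  device=%s\n",      dst_fs
        printf "  command=setquota\n"
        printf "  type=USR\n"
        printf "  name=%s\n",        $col["name"]
        printf "  fileset=%s\n",     dst_fset
        printf "  blockQuota=%sK\n", $col["blockQuota"]       # soft block limit, KiB assumed
        printf "  blockLimit=%sK\n", $col["blockLimit"]       # hard block limit, KiB assumed
        printf "  filesQuota=%s\n",  $col["filesQuota"]
        printf "  filesLimit=%s\n",  $col["filesLimit"]
        printf "\n"
    }' > "$STANZA_FILE"

echo "Review $STANZA_FILE, then apply on cluster2 with: mmsetquota -F $STANZA_FILE"

Reading the column positions out of the HEADER record keeps the script from silently picking up the wrong field if the -Y layout differs between releases; if a named field is missing, the rule simply matches nothing, which is easy to spot when the stanza file comes out empty.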
URL: From andreas.mattsson at maxiv.lu.se Thu Sep 5 10:54:04 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 5 Sep 2019 09:54:04 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction Message-ID: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Thu Sep 5 14:28:00 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 5 Sep 2019 18:58:00 +0530 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: <3ed969d0d778446982a419067320f927@maxiv.lu.se> References: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Message-ID: Hi, AFM does not support inode eviction, only data blocks are evicted and the file's metadata will remain in the fileset. ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/05/2019 03:39 PM Subject: [EXTERNAL] [gpfsug-discuss] Inode reuse on AFM cache eviction Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=5omqUvEiiIKUhShJOBEgb3WwLU5uy-8o_4--y0TOuw0&s=ZFAcjvG5LrsnsCJgIf9f1320V866HKG6iJGteRQ7oac&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Thu Sep 5 19:37:47 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 18:37:47 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 92, Issue 4 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
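Following Venkat's point that eviction frees only data blocks and leaves the inodes in place, the practical sizing answer for the question above is that the cache fileset's inode space has to cover every file that exists at home, not just the files cached at any one time. A back-of-the-envelope sketch, with placeholder numbers and a hypothetical mmchfileset invocation to confirm against your release's documentation:

#!/bin/bash
# Back-of-the-envelope sizing for an AFM cache fileset's inode space.
# Assumption from the thread: evicted files keep their inodes, so plan for the
# total file count at home plus growth. Numbers and names are placeholders.

HOME_FILE_COUNT=50000000   # total files expected at home (placeholder)
HEADROOM_PCT=20            # growth margin

LIMIT=$(( HOME_FILE_COUNT + HOME_FILE_COUNT * HEADROOM_PCT / 100 ))
echo "Plan the cache fileset for at least $LIMIT inodes"

# Hypothetical invocation; confirm the option and syntax in the mmchfileset man page:
#   mmchfileset <device> <cache-fileset> --inode-limit $LIMIT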
URL: From sakkuma4 at in.ibm.com Thu Sep 5 20:06:17 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 19:06:17 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Fri Sep 6 10:48:56 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 09:48:56 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Message-ID: Hello, Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? I am failing with these errors: [root at host ~]# uname -a Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux [root at host ~]# rpm -qa | grep gpfs gpfs.base-4.2.3-7.x86_64 gpfs.gskit-8.0.50-75.x86_64 gpfs.ext-4.2.3-7.x86_64 gpfs.msg.en_US-4.2.3-7.noarch gpfs.docs-4.2.3-7.noarch gpfs.gpl-4.2.3-7.noarch [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl -------------------------------------------------------- mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. -------------------------------------------------------- Verifying Kernel Header... kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include Verifying Compiler... make is present at /bin/make cpp is present at /bin/cpp gcc is present at /bin/gcc g++ is present at /bin/g++ ld is present at /bin/ld Verifying Additional System Headers... Verifying kernel-headers is installed ... Command: /bin/rpm -q kernel-headers The required package kernel-headers is installed make World ... Verifying that tools to build the portability layer exist.... cpp present gcc present g++ present ld present cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver cleaning (/usr/lpp/mmfs/src/ibm-kxi) make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' rm -f trcid.h ibm_kxi.trclst [cut] Invoking Kbuild... /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ exit 1;\ fi make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' LD /usr/lpp/mmfs/src/gpl-linux/built-in.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'printInode': /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: 'struct inode' has no member named 'i_wb_list' _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); ^ /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro '_TRACE_MACRO' { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP [ cut ] ^ /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro 'TRACE6' TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'cxiInitInodeSecurity': /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of 'security_old_inode_init_security' from incompatible pointer type [enabled by default] rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, ^ In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: include/linux/security.h:1896:5: note: expected 'const char **' but argument is of type 'char **' int security_old_inode_init_security(struct inode *inode, struct inode *dir, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'cache_get_name': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function 'vfs_readdir' [-Werror=implicit-function-declaration] error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); ^ cc1: some warnings being treated as errors make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. Any help appreciated... Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... 
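The two hard errors in the log above (struct inode having no i_wb_list member, and the implicit declaration of vfs_readdir) point at kernel API changes in the 3.10.0-1062 headers rather than at anything wrong with the install. A quick way to confirm that from the header tree mmbuildgpl already found, assuming the declarations live in include/linux/fs.h as they do in stock 3.10 kernels (a sketch, not a fix):

KSRC=/usr/src/kernels/$(uname -r)
# Does this kernel still declare vfs_readdir, or has it moved to a different interface?
grep -n 'vfs_readdir\|iterate_dir' "$KSRC/include/linux/fs.h"
# Does struct inode still carry i_wb_list?
grep -n 'i_wb_list\|i_io_list' "$KSRC/include/linux/fs.h"

If the old symbols are gone, no amount of rebuilding 4.2.3-7 will help; the portability layer itself has to know about the newer kernel, which is what the replies below get at.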
URL: From david_johnson at brown.edu Fri Sep 6 11:24:51 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 6 Sep 2019 06:24:51 -0400 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: Message-ID: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non-fatal warnings at that version. It seems to run fine. The rest of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a reason to not go for the latest release on either the 4- or 5- line? [root at xxx ~]# ssh node1301 rpm -q gpfs.base gpfs.base-4.2.3-10.x86_64 -- ddj Dave Johnson > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > Hello, > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? > > I am failing with these errors: > > [root at host ~]# uname -a > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux > > [root at host ~]# rpm -qa | grep gpfs > gpfs.base-4.2.3-7.x86_64 > gpfs.gskit-8.0.50-75.x86_64 > gpfs.ext-4.2.3-7.x86_64 > gpfs.msg.en_US-4.2.3-7.noarch > gpfs.docs-4.2.3-7.noarch > gpfs.gpl-4.2.3-7.noarch > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > -------------------------------------------------------- > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > -------------------------------------------------------- > Verifying Kernel Header... > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include > Verifying Compiler... > make is present at /bin/make > cpp is present at /bin/cpp > gcc is present at /bin/gcc > g++ is present at /bin/g++ > ld is present at /bin/ld > Verifying Additional System Headers... > Verifying kernel-headers is installed ... > Command: /bin/rpm -q kernel-headers > The required package kernel-headers is installed > make World ... > Verifying that tools to build the portability layer exist.... > cpp present > gcc present > g++ present > ld present > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > rm -f trcid.h ibm_kxi.trclst > > [cut] > > Invoking Kbuild... > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > if [ $? 
-ne 0 ]; then \ > exit 1;\ > fi > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no member named ?i_wb_list? > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > ^ > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro ?_TRACE_MACRO? > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > [ cut ] > > ^ > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro ?TRACE6? > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of ?security_old_inode_init_security? from incompatible pointer type [enabled by default] > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > ^ > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > include/linux/security.h:1896:5: note: expected ?const char **? but argument is of type ?char **? > int security_old_inode_init_security(struct inode *inode, struct inode *dir, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > ^ > cc1: some warnings being treated as errors > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > > Any help appreciated? 
> Son > > Son V Truong - Senior Storage Administrator > Advanced Computing Research Centre > IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Fri Sep 6 12:41:32 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Fri, 6 Sep 2019 11:41:32 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609150.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609151.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609152.png Type: image/png Size: 1134 bytes Desc: not available URL: From Dugan.Witherick at warwick.ac.uk Fri Sep 6 13:25:22 2019 From: Dugan.Witherick at warwick.ac.uk (Witherick, Dugan) Date: Fri, 6 Sep 2019 12:25:22 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , Message-ID: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next PTFs on > both 4.2.3 and 5.0.3 streams. However we have seen issues in test that will > probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might work many > times but is quite a risky business. I highly recommend waiting for our > support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non- > > fatal warnings at that version. It seems to run fine. The rest of the > > cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a > > reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on > > > RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include > > > Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed > > > make World ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit > > > $? || exit 1 > > > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 > > > M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > > > if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no > > > member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), > > > (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro > > > _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro > > > ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of > > > ?security_old_inode_init_security? from incompatible pointer type [enabled > > > by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? but > > > argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct inode > > > *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration > > > of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. 
Examine previous error messages to determine > > > cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator > > > Advanced Computing Research Centre > > > IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From son.truong at bristol.ac.uk Fri Sep 6 15:15:04 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 14:15:04 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Message-ID: Thank you. Table 39 is most helpful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Witherick, Dugan Sent: 06 September 2019 13:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next > PTFs on both 4.2.3 and 5.0.3 streams. However we have seen issues in > test that will probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might > work many times but is quite a risky business. I highly recommend > waiting for our support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp Sitz der > Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL > > 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with > > non- fatal warnings at that version. It seems to run fine. The rest > > of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do > > you have a reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel > > > modules on RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC > > > 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, > > > 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed make World > > > ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > > > > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include > > > /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir > > > /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib rm -f > > > //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 > > > ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux > > > CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? > > > has no member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), > > > (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), > > > (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition > > > of macro _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of > > > macro ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing > > > argument 4 of ?security_old_inode_init_security? from incompatible > > > pointer type [enabled by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? > > > but argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct > > > inode *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit > > > declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. 
> > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. Examine previous error messages to > > > determine cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator Advanced Computing > > > Research Centre IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Sep 6 16:42:39 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 6 Sep 2019 15:42:39 +0000 Subject: [gpfsug-discuss] SSUG Meeting at SC19: Save the date and call for user talks! Message-ID: The Spectrum Scale User group will hold its annual meeting at SC19 on Sunday November 17th from 12:30PM -6PM In Denver, Co. We will be posting exact meeting location soon, but reserve this time. IBM will host a reception following the user group meeting. We?re also looking for user talks - these are short update (20 mins or so) on your use of Spectrum Scale - any topics are welcome. If you are interested, please contact myself or Kristy Kallback-Rose. Looking forward to seeing everyone in Denver! Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Mon Sep 9 21:29:28 2019 From: bipcuds at gmail.com (Keith Ball) Date: Mon, 9 Sep 2019 16:29:28 -0400 Subject: [gpfsug-discuss] Anyone have experience with changing NSD server node name in an ESS/DSS cluster? Message-ID: Hi All, We are thinking of attempting a non-destructive change of NSD server node names in a Lenovo DSS cluster (DSS level 1.2a, which has Scale 4.2.3.5). For a non-GNR cluster, changing a node name for an NSD server isn't a huge deal if you can have a backup server serve up disks; one can mmdelnode then mmaddnode, for instance. Has anyone tried to rename the NSD servers in a GNR cluster, however? I am not sure if it's as easy as failing over the recovery group, and deleting/adding the NSD server. It's easy enough to modify xcat. Perhaps mmchrecoverygroup can be used to change the RG names (since they are named after the NSD servers), but that might not be necessary. Or, it might not work - does anyone know if there is a special process to change NSD server names in an E( or D or G)SS cluster that does not run afoul of GNR or upgrade scripts? Best regards, Keith -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From TROPPENS at de.ibm.com Wed Sep 11 13:20:22 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 11 Sep 2019 14:20:22 +0200 Subject: [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjvilla at nccs.nasa.gov Wed Sep 11 20:14:12 2019 From: jjvilla at nccs.nasa.gov (John J. Villa) Date: Wed, 11 Sep 2019 15:14:12 -0400 (EDT) Subject: [gpfsug-discuss] Introduction - New Subscriber Message-ID: Hello, My name is John Villa. I work for NASA at the Nasa Center for Climate Simulation. We currently utilize GPFS as the primary filesystem on the discover cluster: https://www.nccs.nasa.gov/systems/discover I look forward to seeing everyone at SC19. Thank You, -- John J. Villa NASA Center for Climate Simulation Discover Systems Administrator From damir.krstic at gmail.com Thu Sep 12 15:16:03 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 12 Sep 2019 09:16:03 -0500 Subject: [gpfsug-discuss] VerbsReconnectThread waiters Message-ID: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Thu Sep 12 16:10:58 2019 From: george at markomanolis.com (George Markomanolis) Date: Thu, 12 Sep 2019 11:10:58 -0400 Subject: [gpfsug-discuss] Call for Submission for the IO500 List Message-ID: Call for Submission *Deadline*: 10 November 2019 AoE The IO500 is now accepting and encouraging submissions for the upcoming 5th IO500 list revealed at SC19 in Denver, Colorado. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our SC19 BoF [2]. We hope to see you, and your results, there. We have updated our submission rules [3]. This year, we will have a new list for the Student Cluster Competition as IO500 is used for extra points during this competition The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC19! Please note that submissions of all sizes are welcome; the site has customizable sorting so it is possible to submit on a small system and still get a very good per-client score for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. 
Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017, published its first list at SC17, and has grown exponentially since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: 1. Maximizing simplicity in running the benchmark suite 2. Encouraging complexity in tuning for performance 3. Allowing submitters to highlight their ?hero run? performance numbers 4. Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound. Finally, it includes a namespace search as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: 1. Gather historical data for the sake of analysis and to aid predictions of storage futures 2. Collect tuning information to share valuable performance optimizations across the community 3. Encourage vendors and designers to optimize for workloads beyond ?hero runs? 4. Establish bounded expectations for users, procurers, and administrators 10 Node I/O Challenge At SC, we will continue the 10 Node Challenge. This challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly *10 computes nodes* must be used to run the benchmark (one exception is the find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at io500.org. Birds-of-a-feather Once again, we encourage you to submit [1], to join our community, and to attend our BoF ?The IO500 and the Virtual Institute of I/O? at SC19, November 19th, 12:15-1:15pm, room 205-207, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. [1] http://io500.org/submission [2] *https://www.vi4io.org/io500/bofs/sc19/start * [3] https://www.vi4io.org/io500/rules/submission The IO500 committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 12 20:19:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 12 Sep 2019 12:19:20 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 - REGISTRATION CLOSING SOON In-Reply-To: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Message-ID: Reminder, registration closing on 9/16 EOB. That?s real soon now. Hope to see you there. Details below. 
> On Aug 29, 2019, at 7:30 PM, Kristy Kallback-Rose wrote: > > Hello, > > You will now find the nearly complete agenda here: > > https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > As noted before, the event is free, but please do register below to help with catering planning. > > You can find more information about the full HPCXXL event here: http://hpcxxl.org/ > > Any questions let us know. Hope to see you there! > > -Kristy > >> On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. >> >> We?ll have more details soon, mark your calendars. >> >> Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ >> >> Best, >> Kristy > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Fri Sep 13 09:48:58 2019 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Fri, 13 Sep 2019 08:48:58 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects Message-ID: Hi All, I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? Cheers, Greg Lehmann Senior High Performance Data Specialist Data Services | Scientific Computing Platforms Information Management and Technology | CSIRO Greg.Lehmann at csiro.au | +61 7 3327 4137 | 1 Technology Court, Pullenvale, QLD 4069 CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email. CSIRO Australia's National Science Agency | csiro.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Sep 13 10:14:06 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 13 Sep 2019 05:14:06 -0400 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: References: Message-ID: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Restarting subnet manager in general is fairly harmless. It will cause a heavy sweep of the fabric when it comes back up, but there should be no LID renumbering. Traffic may be held up during the scanning and rebuild of the routing tables. Losing a subnet manager for a period of time would prevent newly booted nodes from receiving a LID but existing nodes will continue to function. 
Adding or deleting inter-switch links should probably be avoided if the subnet manager is down. I would also avoid changing the routing algorithm while in production. Moving a non ha subnet manager from primary to backup and back again has worked for us without disruption, but I would try to do this in a maintenance window. -- ddj Dave Johnson > On Sep 13, 2019, at 4:48 AM, Lehmann, Greg (IM&T, Pullenvale) wrote: > > Hi All, > I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? > > Cheers, > > Greg Lehmann > Senior High Performance Data Specialist > Data Services | Scientific Computing Platforms > Information Management and Technology | CSIRO > Greg.Lehmann at csiro.au | +61 7 3327 4137 | > 1 Technology Court, Pullenvale, QLD 4069 > > CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. > > The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. > > Please consider the environment before printing this email. > > CSIRO Australia?s National Science Agency | csiro.au > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 13 10:48:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 13 Sep 2019 09:48:52 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> References: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Message-ID: On Fri, 2019-09-13 at 05:14 -0400, david_johnson at brown.edu wrote: [SNIP] > Moving a non ha subnet manager from primary to backup and back again > has worked for us without disruption, but I would try to do this in a > maintenance window. > Not on GPFS but in the past I have moved from one subnet manager to another with dozens of running MPI jobs, and Lustre running over the fabric and not missed a beat. My current cluster used 10 and 40Gbps ethernet for GPFS with Omnipath exclusively for MPI traffic. To be honest I just cannot wrap my head around the idea that you would not be running two subnet managers in the first place. Just fire up two subnet managers (whether on a switch or a node) and forget about it. They will automatically work together to give you a HA solution. It is the same with Omnipath too. I would also note that you can fire up more than two fabric managers and it all "just works". 
If it where me and I didn't have fabric managers running on at least two of my switches and I was doing GPFS over Infiniband, I would fire up fabric managers on all of my NSD servers. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From heinrich.billich at id.ethz.ch Fri Sep 13 15:56:07 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Fri, 13 Sep 2019 14:56:07 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Message-ID: Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* From ewahl at osc.edu Fri Sep 13 16:42:30 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 13 Sep 2019 15:42:30 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: I recall looking at this a year or two back. Ganesha is either v4 and v6 both (ie: the encapsulation you see), OR ipv4 ONLY. (ie: /etc/modprobe.d/ipv6.conf disable=1) Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Billich Heinrich Rainer (ID SD) Sent: Friday, September 13, 2019 10:56 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Fri Sep 13 17:07:01 2019 From: jam at ucar.edu (Joseph Mendoza) Date: Fri, 13 Sep 2019 10:07:01 -0600 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: References: Message-ID: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: > On my cluster I have seen couple of long waiters such as this: > > gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more > seconds, reason: delaying for next reconnect attempt > > I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. > > Is this something to pay attention to, and what does this waiter mean? > > Thank you. > Damir > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 16 08:12:09 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 16 Sep 2019 09:12:09 +0200 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Sep 16 10:33:58 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 16 Sep 2019 17:33:58 +0800 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> References: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> Message-ID: Damir, Joseph, > Is this something to pay attention to, and what does this waiter mean? This waiter means GPFS fails to reconnect broken verbs connection, which can cause performance degradation. 
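To check whether a node is affected, something along these lines can be run on the node that shows the waiter (a sketch; the grep pattern is only illustrative):

# Show current waiters, including any VerbsReconnectThread entries
mmdiag --waiters

# List the node's verbs/RDMA connections and their state
mmfsadm test verbs conn

# Look for RDMA being disabled "due to too many errors"
grep -i verbs /var/adm/ras/mmfs.log.latest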
> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again. > Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. This is a code bug which is fixed through internal defect 1090669. It will be backport to service releases after verification. There is a work-around which can fix this problem without a restart. - On nodes which have this waiter list, run command 'mmfsadm test breakconn all 744' 744 is E_RECONNECT, which triggers tcp reconnect and will not cause node leave/rejoin. Its side effect clears RDMA connections and their incorrect status. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Joseph Mendoza To: gpfsug-discuss at spectrumscale.org Date: 2019/09/14 12:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] VerbsReconnectThread waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WoT3TYlCvAM8RQxUISD9L6UzqY0I_ffCJTS-UHhw8z4&s=18A0j0Zmp8OwZ6Y6cc3HFe3OgFZRHIv8OeJcBpkaPwQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Mon Sep 16 13:58:03 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 16 Sep 2019 12:58:03 +0000 Subject: [gpfsug-discuss] Can 5-minutes frequent lsscsi command disrupt GPFS I/O on a Lenovo system ? Message-ID: <83A6EEB0EC738F459A39439733AE80452BEA85FE@MBX214.d.ethz.ch> Hello folks, recently I observed that calling every 5 minutes the command "lsscsi -g" on a Lenovo I/O node (a X3650 M5 connected to D3284 enclosures, part of a DSS-G220 system) can seriously compromise the GPFS I/O performance. (The motivation of running lsscsi every 5 minutes is a bit out of topic, but I can explain on request). What we observed is that there were several GPFS waiters telling that flushing caches to physical disk was impossible and they had to wait (possibly going in timeout). Is this something expected and/or observed by someone else in this community ? Thanks Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Sep 16 15:50:24 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 16 Sep 2019 14:50:24 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , Message-ID: What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Mon Sep 16 15:55:34 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 16 Sep 2019 14:55:34 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: On our recent ESS systems we do not see /etc/tuned/scale/tuned.conf (or script.sh) owned by any package (rpm -qif ?). I?ve attached what we have on our ESS 5.3.3 systems. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Monday, September 16, 2019 at 10:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tuned.conf Type: application/octet-stream Size: 2859 bytes Desc: tuned.conf URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: script.sh Type: application/octet-stream Size: 270 bytes Desc: script.sh URL: From heinrich.billich at id.ethz.ch Mon Sep 16 16:49:57 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 16 Sep 2019 15:49:57 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Hello Olaf, Thank you, so we?ll try to get rid of IPv6. Actually we do have this settings active but I may have to add them to the initrd file, too. (See https://access.redhat.com/solutions/8709#?rhel7disable) to prevent ganesha from opening an IPv6 socket. It?s probably no big issue if ganesha uses IPv4overIPv6 for all connections, but to keep things simple I would like to avoid it. @Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... 
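As a concrete sketch, disabling IPv6 completely on a CES node, including in the initramfs as per the Red Hat note referenced above, could look roughly like the following. The file names are illustrative and worth verifying for your release:

# /etc/sysctl.d/99-disable-ipv6.conf (the same settings the 'scale' tuned profile applies)
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1

# Or disable the module outright, as suggested earlier in the thread:
# /etc/modprobe.d/ipv6.conf
options ipv6 disable=1

# Apply now and rebuild the initramfs so the setting also holds at early boot
sysctl --system
dracut -f

# Ganesha should then offer IPv4 sockets only; verify with
ss -l -t -6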
From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 16 18:34:07 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 16 Sep 2019 17:34:07 +0000 Subject: [gpfsug-discuss] SSUG @ SC19 Update: Scheduling and Sponsorship Opportunities Message-ID: Two months until SC19 and the schedule is starting to come together, with a great mix of technical updates and user talks. I would like highlight a few items for you to be aware of: - Morning session: We?re currently trying to put together a morning ?new users? session for those new to Spectrum Scale. These talks would be focused on fundamentals and give an opportunity to ask questions. We?re tentatively thinking about starting around 9:30-10 AM on Sunday November 17th. Watch the mailing list for updates and on the http://spectrumscale.org site. - Sponsorships: We?re looking for sponsors. If your company is an IBM partner, uses/incorporates Spectrum Scale - please contact myself or Kristy Kallback-Rose. We are looking for sponsors to help with lunch (YES - we?d like to serve lunch this year!) and WiFi access during the user group meeting. Looking forward to seeing you all at SC19. Registration link coming soon, watch here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Sep 18 18:56:29 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 18 Sep 2019 17:56:29 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 Message-ID: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. (sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Sep 19 11:44:46 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 10:44:46 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From heinrich.billich at id.ethz.ch Thu Sep 19 15:20:53 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 19 Sep 2019 14:20:53 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Message-ID: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Hello, Is it usual to see 200'000-400'000 open files for a single ganesha process? Or does this indicate that something is wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200'000-400'000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1'000 - 10'000 open files by ganesha only and don't show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn't open/close a file, it's the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open files? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc/<pid of ganesha>/fd/ . With several 100k entries I failed to do a 'ls -ls' to list all the symbolic links, hence I can't relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Sep 19 15:30:45 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 19 Sep 2019 15:30:45 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200'000-400'000 open files for a single ganesha > process? Or does this indicate that something is wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200'000-400'000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1'000 - 10'000 open files by ganesha only and don't show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn't open/close a file, it's the server to decide when to > close it?
We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files ?by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From S.J.Thompson at bham.ac.uk Thu Sep 19 16:18:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Sep 2019 15:18:47 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: on behalf of "abeattie at au1.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Sep 19 19:38:53 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 19 Sep 2019 18:38:53 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is_this=09unusual=3F?= In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Sep 19 22:34:33 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 21:34:33 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Sep 19 23:41:08 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 19 Sep 2019 22:41:08 +0000 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade Message-ID: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Sep 20 09:08:01 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 20 Sep 2019 10:08:01 +0200 Subject: [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=I3TzCv5SKxKb51eAL_blo-XwctX64z70ayrZKERanWA&s=OSKGngwXAoOemFy3HkctexuIpBJQu8NPeTkC_MMQBks&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Fri Sep 20 10:14:58 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Fri, 20 Sep 2019 11:14:58 +0200 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade In-Reply-To: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> References: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> Message-ID: Hello Bob, this event is a "Notice": You can use the action "Mark Selected Notices as Read" or "Mark All Notices as Read" in the GUI Event Groups or Individual Events grid. Notice events are transient by nature and don't imply a permanent state change of an entity. It seems that during the upgrade, mmhealth had probed the pdisk and the disk hospital was diagnosing the pdisk at this time, but eventually disk hospital placed the pdisk back to normal state. Mit freundlichen Grüßen / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 162 4159920 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 20.09.2019 00:53 Subject: [EXTERNAL] [gpfsug-discuss] Leftover GUI events after ESS upgrade Sent by: gpfsug-discuss-bounces at spectrumscale.org I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don't appear in gnrhealthcheck or mmhealth. pdisk checks are clear. Any idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hLyf83U0otjISdpV5zl1cSCPVFFUF61ny3jWvv-5kNQ&s=ptMGcpNhnRTogPO2CN_l6jhC-vCN-VQAf53HmRLQDq8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Mon Sep 23 10:33:02 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 09:33:02 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Hello Frederik, Thank you. I now see a similar behavior: Ganesha has 500k open files while the node has been suspended for 2+ hours. I would expect that some cleanup job would remove most of the open FDs after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: ! maxFilesToCache 1048576 ! maxStatCache 2097152 Our ganesha version is 2.5.3 (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern.
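To put numbers on what Heiner describes, the daemon's descriptor limit and its current count can be compared as sketched below. This assumes the CES ganesha process name that appears in the trace lines later in the thread (gpfs.ganesha.nfsd); the last line is only a sanity check of the 80% rule Heiner mentions in a later mail.

    pid=$(pgrep -f gpfs.ganesha.nfsd)
    cat /etc/sysconfig/ganesha                  # per Heiner's later note, the derived open-file limit is visible here
    grep 'Max open files' /proc/$pid/limits     # the limit the running daemon actually got
    ls -U /proc/$pid/fd | wc -l                 # current count, without stat-ing every entry
    echo $((1048576 * 80 / 100))                # 838860 for maxFilesToCache=1048576, clamped to 2000/1M

Raising the ceiling goes through maxFilesToCache (mmchconfig) rather than by editing the sysconfig file directly, which is the route Frederik describes above.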
I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. Cheers, Heiner Daniel Gryniewicz So, it's not impossible, based on the workload, but it may also be a bug. For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know when the client closes the FD, and opening/closing all the time causes a large performance hit. So, we cache open FDs. All handles in MDCACHE live on the LRU. This LRU is divided into 2 levels. Level 1 is more active handles, and they can have open FDs. Various operation can demote a handle to level 2 of the LRU. As part of this transition, the global FD on that handle is closed. Handles that are actively in use (have a refcount taken on them) are not eligible for this transition, as the FD may be being used. We have a background thread that runs, and periodically does this demotion, closing the FDs. This thread runs more often when the number of open FDs is above FD_HwMark_Percent of the available number of FDs, and runs constantly when the open FD count is above FD_Limit_Percent of the available number of FDs. So, a heavily used server could definitely have large numbers of FDs open. However, there have also, in the past, been bugs that would either keep the FDs from being closed, or would break the accounting (so they were closed, but Ganesha still thought they were open). You didn't say what version of Ganesha you're using, so I can't tell if one of those bugs apply. Daniel ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. 
> > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Mon Sep 23 11:43:06 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 10:43:06 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <72079C31-1E3E-4F69-B428-480620466353@id.ethz.ch> Hello Malhal, Thank you. Actually I don?t see the parameter Cache_FDs in our ganesha config. But when I trace LRU processing I see that almost no FDs get released. And the number of FDs given in the log messages doesn?t match what I see in /proc//fd/. I see 512k open files while the logfile give 600k. Even 4hours since the I suspended the node and all i/o activity stopped I see 500k open files and LRU processing doesn?t close any of them. This looks like a bug in gpfs.nfs-ganesha-2.5.3-ibm036.10.el7. I?ll open a case with IBM. We did see gansha to fail to open new files and hence client requests to fail. I assume that 500K FDs compared to 10K FDs as before create some notable overhead for ganesha, spectrum scale and kernel and withdraw resources from samba. I?ll post to the list once we got some results. 
Cheers, Heiner Start of LRU processing 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51350 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1027 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51400 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1028 closing 0 descriptors End of log 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1029 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1029 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51500 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1030 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :After work, open_fd_count:607024 count:29503718 fdrate:1908874353 threadwait=9 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :currentopen=607024 futility=0 totalwork=51550 biggest_window=335544 extremis=0 lanes=1031 fds_lowat=167772 From: on behalf of Malahal R Naineni Reply to: gpfsug main discussion list Date: Thursday, 19 September 2019 at 20:39 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? NFSv3 doesn't have open/close requests, so nfs-ganesha opens a file for read/write when there is an NFSv3 read/write request. It does cache file descriptors, so its open count can be very large. If you have 'Cache_FDs = true" in your config, ganesha aggressively caches file descriptors. 
Taking traces with COMPONENT_CACHE_INODE_LRU level set to full debug should give us better insight on what is happening when the the open file descriptors count is very high. When the I/O failure happens or when the open fd count is high, you could do the following: 1. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU FULL_DEBUG 2. wait for 90 seconds, then run 3. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU EVENT Regards, Malahal. ----- Original message ----- From: "Billich Heinrich Rainer (ID SD)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Date: Thu, Sep 19, 2019 7:51 PM Hello, Is it usual to see 200?000-400?000 open files for a single ganesha process? Or does this indicate that something ist wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200?000-400?000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1?000 ? 10?000 open files by ganesha only and don?t show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn?t open/close a file, it?s the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open file? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc//fd/ . With several 100k entries I failed to do a ?ls -ls? to list all the symbolic links, hence I can?t relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Sep 24 09:52:34 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 24 Sep 2019 08:52:34 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Hello Frederik, Just some addition, maybe its of interest to someone: The number of max open files for Ganesha is based on maxFilesToCache. Its. 80%of maxFilesToCache up to an upper and lower limits of 2000/1M. The active setting is visible in /etc/sysconfig/ganesha. Cheers, Heiner ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. 
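Heiner notes in the question quoted above that 'ls -ls' was too slow to relate several hundred thousand open files to exports. A cheaper way to bucket the descriptors by path prefix is sketched below; the ganesha process name is taken from the trace lines in this thread, and the field count in the cut is arbitrary and needs adjusting to where the exports sit in the directory tree.

    pid=$(pgrep -f gpfs.ganesha.nfsd)
    find /proc/$pid/fd -type l -printf '%l\n' | cut -d/ -f1-4 | sort | uniq -c | sort -rn | head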
I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From valdis.kletnieks at vt.edu Tue Sep 24 21:41:07 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 24 Sep 2019 16:41:07 -0400 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? 
In-Reply-To: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: <269692.1569357667@turing-police> On Tue, 24 Sep 2019 08:52:34 -0000, "Billich Heinrich Rainer (ID SD)" said: > Just some addition, maybe its of interest to someone: The number of max open > files for Ganesha is based on maxFilesToCache. Its. 80%of maxFilesToCache up to > an upper and lower limits of 2000/1M. The active setting is visible in > /etc/sysconfig/ganesha. Note that strictly speaking, the values in /etc/sysconfig are in general the values that will be used at next restart - it's totally possible for the system to boot, the then-current values be picked up from /etc/sysconfig, and then any number of things, from configuration automation tools like Ansible, to a cow-orker sysadmin armed with nothing but /usr/bin/vi, to have changed the values without you knowing about it and the daemons not be restarted yet... (Let's just say that in 4 decades of doing this stuff, I've been surprised by that sort of thing a few times. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Sep 25 18:06:18 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 25 Sep 2019 17:06:18 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is=09this_unusual=3F?= In-Reply-To: <269692.1569357667@turing-police> References: <269692.1569357667@turing-police>, <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch><280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: att6j9ca.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From L.R.Sudbery at bham.ac.uk Thu Sep 26 10:38:09 2019 From: L.R.Sudbery at bham.ac.uk (Luke Sudbery) Date: Thu, 26 Sep 2019 09:38:09 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <3b15db460ac1459e9ca53bec00f30833@bham.ac.uk> We think our issue was down to numa settings actually - making mmfsd allocate GPU memory. Makes sense given the type of error. Tomer suggested to Simon we set numactlOptioni to "0 8", as per: https://www-01.ibm.com/support/docview.wss?uid=isg1IJ02794 Our tests are not crashing since setting then ? we need to roll it out on all nodes to confirm its fixed all our hangs/reboots. Cheers, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don?t work on Monday and work from home on Friday. 
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of abeattie at au1.ibm.com Sent: 19 September 2019 22:35 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, I have an open support call that required Redhat to create a kernel patch for RH 7.6 because of issues with the Intel x710 network adapter - I can't tell you if its related to your issue or not but it would cause the GPFS cluster to reboot and the affected node to reboot if we tried to do almost anything with that intel adapter regards, Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS and POWER9 Date: Fri, Sep 20, 2019 1:18 AM Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: > on behalf of "abeattie at au1.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Sep 26 10:55:45 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 26 Sep 2019 09:55:45 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions Message-ID: Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. 
The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 27 09:23:13 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Sep 2019 13:53:13 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=tjCOcTjZ_AjP3N1mpspwuLu5u2XOFb5LkZqVAwX3wk8&s=tD6X2XM1HPMqWxSg-IelnstWbneQ7On4xfEVkCajtPE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Fri Sep 27 11:31:42 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Fri, 27 Sep 2019 10:31:42 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Sun Sep 1 14:17:01 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sun, 1 Sep 2019 13:17:01 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... 
URL: From sandeep.patil at in.ibm.com Tue Sep 3 06:28:30 2019 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Tue, 3 Sep 2019 05:28:30 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q2 2019) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q2 2019). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. Redpaper : IBM Power Systems Enterprise AI Solutions (W/ SPECTRUM SCALE) http://www.redbooks.ibm.com/redpieces/abstracts/redp5556.html?Open IBM Spectrum Scale Erasure Code Edition (ECE): Installation Demonstration https://www.youtube.com/watch?v=6If50EvgP-U Blogs: Using IBM Spectrum Scale as platform storage for running containerized Hadoop/Spark workloads https://developer.ibm.com/storage/2019/08/27/using-ibm-spectrum-scale-as-platform-storage-for-running-containerized-hadoop-spark-workloads/ Useful Tools for Spectrum Scale CES NFS https://developer.ibm.com/storage/2019/07/22/useful-tools-for-spectrum-scale-ces-nfs/ How to ensure NFS uses strong encryption algorithms for secure data in motion ? https://developer.ibm.com/storage/2019/07/19/how-to-ensure-nfs-uses-strong-encryption-algorithms-for-secure-data-in-motion/ Introducing IBM Spectrum Scale Erasure Code Edition https://developer.ibm.com/storage/2019/07/07/introducing-ibm-spectrum-scale-erasure-code-edition/ Spectrum Scale: Which Filesystem Encryption Algo to Consider ? https://developer.ibm.com/storage/2019/07/01/spectrum-scale-which-filesystem-encryption-algo-to-consider/ IBM Spectrum Scale HDFS Transparency Apache Hadoop 3.1.x Support https://developer.ibm.com/storage/2019/06/24/ibm-spectrum-scale-hdfs-transparency-apache-hadoop-3-0-x-support/ Enhanced features in Elastic Storage Server (ESS) 5.3.4 https://developer.ibm.com/storage/2019/06/19/enhanced-features-in-elastic-storage-server-ess-5-3-4/ Upgrading IBM Spectrum Scale Erasure Code Edition using installation toolkit https://developer.ibm.com/storage/2019/06/09/upgrading-ibm-spectrum-scale-erasure-code-edition-using-installation-toolkit/ Upgrading IBM Spectrum Scale sync replication / stretch cluster setup in PureApp https://developer.ibm.com/storage/2019/06/06/upgrading-ibm-spectrum-scale-sync-replication-stretch-cluster-setup/ GPFS config remote access with multiple network definitions https://developer.ibm.com/storage/2019/05/30/gpfs-config-remote-access-with-multiple-network-definitions/ IBM Spectrum Scale Erasure Code Edition Fault Tolerance https://developer.ibm.com/storage/2019/05/30/ibm-spectrum-scale-erasure-code-edition-fault-tolerance/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.3 ? 
https://developer.ibm.com/storage/2019/05/02/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-3/ Understanding and Solving WBC_ERR_DOMAIN_NOT_FOUND error with Spectrum Scale https://crk10.wordpress.com/2019/07/21/solving-the-wbc-err-domain-not-found-nt-status-none-mapped-glitch-in-ibm-spectrum-scale/ Understanding and Solving NT_STATUS_INVALID_SID issue for SMB access with Spectrum Scale https://crk10.wordpress.com/2019/07/24/solving-nt_status_invalid_sid-for-smb-share-access-in-ibm-spectrum-scale/ mmadquery primer (apparatus to query Active Directory from IBM Spectrum Scale) https://crk10.wordpress.com/2019/07/27/mmadquery-primer-apparatus-to-query-active-directory-from-ibm-spectrum-scale/ How to configure RHEL host as Active Directory Client using SSSD https://crk10.wordpress.com/2019/07/28/configure-rhel-machine-as-active-directory-client-using-sssd/ How to configure RHEL host as LDAP client using nslcd https://crk10.wordpress.com/2019/07/28/configure-rhel-machine-as-ldap-client-using-nslcd/ Solving NFSv4 AUTH_SYS nobody ownership issue https://crk10.wordpress.com/2019/07/29/nfsv4-auth_sys-nobody-ownership-and-idmapd/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list of all blogs and collaterals. https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 04/29/2019 12:12 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q1 2019) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q1 2019). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Spectrum Scale 5.0.3 https://developer.ibm.com/storage/2019/04/24/spectrum-scale-5-0-3/ IBM Spectrum Scale HDFS Transparency Ranger Support https://developer.ibm.com/storage/2019/04/01/ibm-spectrum-scale-hdfs-transparency-ranger-support/ Integration of IBM Aspera Sync with IBM Spectrum Scale: Protecting and Sharing Files Globally, http://www.redbooks.ibm.com/abstracts/redp5527.html?Open Spectrum Scale user group in Singapore, 2019 https://developer.ibm.com/storage/2019/03/14/spectrum-scale-user-group-in-singapore-2019/ 7 traits to use Spectrum Scale to run container workload https://developer.ibm.com/storage/2019/02/26/7-traits-to-use-spectrum-scale-to-run-container-workload/ Health Monitoring of IBM Spectrum Scale Cluster via External Monitoring Framework https://developer.ibm.com/storage/2019/01/22/health-monitoring-of-ibm-spectrum-scale-cluster-via-external-monitoring-framework/ Migrating data from native HDFS to IBM Spectrum Scale based shared storage https://developer.ibm.com/storage/2019/01/18/migrating-data-from-native-hdfs-to-ibm-spectrum-scale-based-shared-storage/ Bulk File Creation useful for Test on Filesystems https://developer.ibm.com/storage/2019/01/16/bulk-file-creation-useful-for-test-on-filesystems/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 01/14/2019 06:24 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q4 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q4 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. 
Redpaper: IBM Spectrum Scale and IBM StoredIQ: Identifying and securing your business data to support regulatory requirements http://www.redbooks.ibm.com/abstracts/redp5525.html?Open IBM Spectrum Scale Memory Usage https://www.slideshare.net/tomerperry/ibm-spectrum-scale-memory-usage?qid=50a1dfda-3102-484f-b9d0-14b69fc4800b&v=&b=&from_search=2 Spectrum Scale and Containers https://developer.ibm.com/storage/2018/12/20/spectrum-scale-and-containers/ IBM Elastic Storage Server Performance Graphical Visualization with Grafana https://developer.ibm.com/storage/2018/12/18/ibm-elastic-storage-server-performance-graphical-visualization-with-grafana/ Hadoop Performance for disaggregated compute and storage configurations based on IBM Spectrum Scale Storage https://developer.ibm.com/storage/2018/12/13/hadoop-performance-for-disaggregated-compute-and-storage-configurations-based-on-ibm-spectrum-scale-storage/ EMS HA in ESS LE (Little Endian) environment https://developer.ibm.com/storage/2018/12/07/ems-ha-in-ess-le-little-endian-environment/ What?s new in ESS 5.3.2 https://developer.ibm.com/storage/2018/12/04/whats-new-in-ess-5-3-2/ Administer your Spectrum Scale cluster easily https://developer.ibm.com/storage/2018/11/13/administer-your-spectrum-scale-cluster-easily/ Disaster Recovery using Spectrum Scale?s Active File Management https://developer.ibm.com/storage/2018/11/13/disaster-recovery-using-spectrum-scales-active-file-management/ Recovery Group Failover Procedure of IBM Elastic Storage Server (ESS) https://developer.ibm.com/storage/2018/10/08/recovery-group-failover-procedure-ibm-elastic-storage-server-ess/ Whats new in IBM Elastic Storage Server (ESS) Version 5.3.1 and 5.3.1.1 https://developer.ibm.com/storage/2018/10/04/whats-new-ibm-elastic-storage-server-ess-version-5-3-1-5-3-1-1/ For more : Search /browse here: https://developer.ibm.com/storage/blog User Group Presentations: https://www.spectrumscale.org/presentations/ Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Blogs%2C%20White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 10/03/2018 08:48 PM Subject: Latest Technical Blogs on IBM Spectrum Scale (Q3 2018) Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. 
https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Sep 3 14:07:44 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 3 Sep 2019 15:07:44 +0200 Subject: [gpfsug-discuss] Fileheat - does work! Complete test/example provided here. In-Reply-To: References: Message-ID: Thanks for this example, very userful, but I'm still struggeling a bit at a customer.. We're doing heat daily based rebalancing, with fileheatlosspercent=20 and fileheatperiodminutes=720: RULE "defineTiers" GROUP POOL 'Tiers' IS 'ssdpool' LIMIT(70) then 'saspool' RULE 'Rebalance' MIGRATE FROM POOL 'Tiers' TO POOL 'Tiers' WEIGHT(FILE_HEAT) WHERE FILE_SIZE<10000000000 but are seeing too many files moved down to the saspool and too few are staying in the ssdpool. Right now we ran a test of this policy, and saw that it wanted to move 130k files / 300 GB down to the saspool, and a single small file up to the ssdpool -- even though the ssdpool is only 50% utilized. Running your listing policy reveals lots of files with zero heat: <7> /gpfs/gpfs0/file1 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file2 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file3/HM_WVS_8P41017_1/HM_WVS_8P41017_1.S2206 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) and others with heat: <5> /gpfs/gpfs0/file4 RULE 'fh2' LIST 'fh' WEIGHT(0.004246) SHOW( 300401047 0 0 +4.24600492924153E-003 11E7C19700000000 720 25 server.locale) <5> /gpfs/gpfs0/file5 RULE 'fh2' LIST 'fh' WEIGHT(0.001717) SHOW( 120971793 1 0 +1.71725239616613E-003 0735E21100010000 720 25 server.locale) These are not new files -- so we're wondering if maybe the fileheat is reduced to zero/NULL after a while (how many times can it shrink by 25% before it's zero??). Would it make sense to increase fileheatperiodeminutes and/or decrease fileheatlosspercentage? What would be good values? (BTW: we have relatime enabled) Any other ideas for why it won't fill up our ssdpool to close to LIMIT(70) ? -jf On Tue, Aug 13, 2019 at 3:33 PM Marc A Kaplan wrote: > Yes, you are correct. It should only be necessary to set > fileHeatPeriodMinutes, since the loss percent does have a default value. > But IIRC (I implemented part of this!) you must restart the daemon to get > those fileheat parameter(s) "loaded"and initialized into the daemon > processes. > > Not fully trusting my memory... I will now "prove" this works today as > follows: > > To test, create and re-read a large file with dd... > > [root@/main/gpfs-git]$mmchconfig fileHeatPeriodMinutes=60 > mmchconfig: Command successfully completed > ... > [root@/main/gpfs-git]$mmlsconfig | grep -i heat > fileHeatPeriodMinutes 60 > > [root@/main/gpfs-git]$mmshutdown > ... > [root@/main/gpfs-git]$mmstartup > ... > [root@/main/gpfs-git]$mmmount c23 > ... > [root@/main/gpfs-git]$ls -l /c23/10g > -rw-r--r--. 
1 root root 10737418240 May 16 15:09 /c23/10g > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > > (NO fileheat attribute yet...) > > [root@/main/gpfs-git]$dd if=/c23/10g bs=1M of=/dev/null > ... > After the command finishes, you may need to wait a while for the metadata > to flush to the inode on disk ... or you can force that with an unmount or > a mmfsctl... > > Then the fileheat attribute will appear (I just waited by answering > another email... No need to do any explicit operations on the file system..) > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > gpfs.FileHeat > > To see its hex string value: > > [root@/main/gpfs-git]$mmlsattr -d -X -L /c23/10g > file name: /c23/10g > ... > security.selinux: > 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000 > gpfs.FileHeat: 0x000000EE42A40400 > > Which will be interpreted by mmapplypolicy... > > YES, the interpretation is relative to last access time and current time, > and done by a policy/sql function "computeFileHeat" > (You could find this using m4 directives in your policy file...) > > > define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)]) > > Well gone that far, might as well try mmapplypolicy too.... > > [root@/main/gpfs-git]$cat /gh/policies/fileheat.policy > define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) > END]) > > rule fh1 external list 'fh' exec '' > rule fh2 list 'fh' weight(FILE_HEAT) > show(DISPLAY_NULL(xattr_integer('gpfs.FileHeat',1,4,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' || > DISPLAY_NULL(FILE_HEAT) || ' ' || > DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' || > getmmconfig('fileHeatPeriodMinutes') || ' ' || > getmmconfig('fileHeatLossPercent') || ' ' || > getmmconfig('clusterName') ) > > > [root@/main/gpfs-git]$mmapplypolicy /c23 --maxdepth 1 -P > /gh/policies/fileheat.policy -I test -L 3 > ... > <1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > ... > WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > > > > > [image: Inactive hide details for Jan-Frode Myklebust ---08/13/2019 > 06:22:46 AM---What about filesystem atime updates. We recently chan]Jan-Frode > Myklebust ---08/13/2019 06:22:46 AM---What about filesystem atime updates. > We recently changed the default to ?relatime?. Could that maybe > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 08/13/2019 06:22 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > What about filesystem atime updates. We recently changed the default to > ?relatime?. Could that maybe influence heat tracking? > > > > -jf > > > tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < > *u.sibiller at science-computing.de* >: > > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By > default, the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must > first be enabled. 
To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature > dissipated over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. > The default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a > default. What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at > least on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar > over a directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Tue Sep 3 16:37:58 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 3 Sep 2019 15:37:58 +0000 Subject: [gpfsug-discuss] Easiest way to copy quota settings from one file system to another? Message-ID: <63C132C3-63AF-465B-8FD9-67AF9EA4887D@nuance.com> I?m migratinga file system from one cluster to another. I want to copy all user quotas from cluster1 filesystem ?A? to cluster2, filesystem ?fs1?, fileset ?A? What?s the easiest way to do that? I?m thinking mmsetquota with a stanza file, but is there a tool to generate the stanza file from the source? I could do a ?mmrepquota -u -Y? and process the output. Hoping for something easier :) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
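One way to follow up on the mmrepquota idea above is sketched below. This is a rough outline rather than a tested tool: the -Y field names (id, blockQuota, blockLimit, filesQuota, filesLimit) are looked up from the HEADER line at run time instead of being hard-coded, but those names, the type=USR assumption, and the stanza keywords fed to mmsetquota -F are all assumptions to confirm against the mmrepquota and mmsetquota man pages for the release in use. The block and file values are copied through verbatim, so the units also need checking with a single test user before the whole file is applied, and per-fileset user quotas require per-fileset quota enforcement to be enabled on the target file system.

#!/bin/bash
# Rough sketch: dump user quotas from the source file system and emit an
# mmsetquota stanza file for the target fileset. Field names and stanza
# keywords are assumptions - verify against the man pages before use.

SRC_FS=A            # source file system on cluster1
DST_FS=fs1          # target file system on cluster2
DST_FILESET=A       # target fileset on cluster2

/usr/lpp/mmfs/bin/mmrepquota -u -Y "$SRC_FS" | awk -F: -v fs="$DST_FS" -v fset="$DST_FILESET" '
  $3 == "HEADER" { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map field name -> column
  !("id" in col) { next }                                          # skip anything before the header
  {
    id = $col["id"]; bq = $col["blockQuota"]; bl = $col["blockLimit"]
    fq = $col["filesQuota"]; fl = $col["filesLimit"]
    if (bq == 0 && bl == 0 && fq == 0 && fl == 0) next             # no explicit quota set, skip
    printf "%%quota:\n  device=%s\n  command=setquota\n  type=USR\n  id=%s\n", fs, id
    printf "  blockQuota=%s\n  blockLimit=%s\n  filesQuota=%s\n  filesLimit=%s\n  fileset=%s\n",
           bq, bl, fq, fl, fset
  }' > quota_stanzas.txt

# Review quota_stanzas.txt (especially units), test with one user, then on cluster2:
#   /usr/lpp/mmfs/bin/mmsetquota -F quota_stanzas.txt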
URL: From andreas.mattsson at maxiv.lu.se Thu Sep 5 10:54:04 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 5 Sep 2019 09:54:04 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction Message-ID: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Thu Sep 5 14:28:00 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 5 Sep 2019 18:58:00 +0530 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: <3ed969d0d778446982a419067320f927@maxiv.lu.se> References: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Message-ID: Hi, AFM does not support inode eviction, only data blocks are evicted and the file's metadata will remain in the fileset. ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/05/2019 03:39 PM Subject: [EXTERNAL] [gpfsug-discuss] Inode reuse on AFM cache eviction Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=5omqUvEiiIKUhShJOBEgb3WwLU5uy-8o_4--y0TOuw0&s=ZFAcjvG5LrsnsCJgIf9f1320V866HKG6iJGteRQ7oac&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Thu Sep 5 19:37:47 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 18:37:47 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 92, Issue 4 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
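Following Venkat's answer above - eviction frees data blocks but leaves inodes and metadata in place - the cache fileset's inode space has to be sized for everything that exists at home, not just for whatever happens to be cached at a given time. A minimal sketch of checking and growing that space is below; it assumes the cache is an independent fileset with its own inode space, the names are examples, and the exact options should be confirmed against the mmlsfileset and mmchfileset man pages for the release in use.

FS=fs1              # file system holding the AFM cache (example name)
CACHE_FSET=cacheA   # AFM cache fileset (example name)

# Current inode limit and allocation for the fileset (independent filesets only)
/usr/lpp/mmfs/bin/mmlsfileset "$FS" "$CACHE_FSET" -L

# Inode usage; this variant can be slow on large filesets
/usr/lpp/mmfs/bin/mmlsfileset "$FS" "$CACHE_FSET" -i

# Raise the limit toward the expected total file count at home,
# e.g. 50M maximum with 10M preallocated - placeholder numbers
/usr/lpp/mmfs/bin/mmchfileset "$FS" "$CACHE_FSET" --inode-limit 50M:10M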
URL: From sakkuma4 at in.ibm.com Thu Sep 5 20:06:17 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 19:06:17 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Fri Sep 6 10:48:56 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 09:48:56 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Message-ID: Hello, Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? I am failing with these errors: [root at host ~]# uname -a Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux [root at host ~]# rpm -qa | grep gpfs gpfs.base-4.2.3-7.x86_64 gpfs.gskit-8.0.50-75.x86_64 gpfs.ext-4.2.3-7.x86_64 gpfs.msg.en_US-4.2.3-7.noarch gpfs.docs-4.2.3-7.noarch gpfs.gpl-4.2.3-7.noarch [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl -------------------------------------------------------- mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. -------------------------------------------------------- Verifying Kernel Header... kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include Verifying Compiler... make is present at /bin/make cpp is present at /bin/cpp gcc is present at /bin/gcc g++ is present at /bin/g++ ld is present at /bin/ld Verifying Additional System Headers... Verifying kernel-headers is installed ... Command: /bin/rpm -q kernel-headers The required package kernel-headers is installed make World ... Verifying that tools to build the portability layer exist.... cpp present gcc present g++ present ld present cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver cleaning (/usr/lpp/mmfs/src/ibm-kxi) make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' rm -f trcid.h ibm_kxi.trclst [cut] Invoking Kbuild... /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ exit 1;\ fi make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' LD /usr/lpp/mmfs/src/gpl-linux/built-in.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'printInode': /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: 'struct inode' has no member named 'i_wb_list' _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); ^ /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro '_TRACE_MACRO' { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP [ cut ] ^ /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro 'TRACE6' TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'cxiInitInodeSecurity': /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of 'security_old_inode_init_security' from incompatible pointer type [enabled by default] rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, ^ In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: include/linux/security.h:1896:5: note: expected 'const char **' but argument is of type 'char **' int security_old_inode_init_security(struct inode *inode, struct inode *dir, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'cache_get_name': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function 'vfs_readdir' [-Werror=implicit-function-declaration] error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); ^ cc1: some warnings being treated as errors make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. Any help appreciated... Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... 
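A general point on building against a freshly released RHEL minor update: a given PTF's portability layer is only tested against the kernel levels listed in the Spectrum Scale FAQ, so a newer minor-release kernel can fail to build exactly as shown above. Until the FAQ table catches up, one option is to hold the kernel back. The sketch below uses plain RHEL/yum mechanisms, nothing Scale-specific, and the kernel version shown is only an example.

# What is installed and running right now
uname -r
rpm -q gpfs.base kernel

# Option 1: leave kernel packages out of routine updates until support is announced
yum update --exclude='kernel*'

# Option 2: pin the last known-good kernel with the versionlock plugin
yum install -y yum-plugin-versionlock
yum versionlock add 'kernel-3.10.0-957*'    # example level - use the one the FAQ lists for your PTF

# Once the FAQ/PTF announces RHEL 7.7 support:
#   yum versionlock delete 'kernel*'
#   yum update kernel && reboot
#   /usr/lpp/mmfs/bin/mmbuildgpl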
URL: From david_johnson at brown.edu Fri Sep 6 11:24:51 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 6 Sep 2019 06:24:51 -0400 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: Message-ID: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non-fatal warnings at that version. It seems to run fine. The rest of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a reason to not go for the latest release on either the 4- or 5- line? [root at xxx ~]# ssh node1301 rpm -q gpfs.base gpfs.base-4.2.3-10.x86_64 -- ddj Dave Johnson > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > Hello, > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? > > I am failing with these errors: > > [root at host ~]# uname -a > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux > > [root at host ~]# rpm -qa | grep gpfs > gpfs.base-4.2.3-7.x86_64 > gpfs.gskit-8.0.50-75.x86_64 > gpfs.ext-4.2.3-7.x86_64 > gpfs.msg.en_US-4.2.3-7.noarch > gpfs.docs-4.2.3-7.noarch > gpfs.gpl-4.2.3-7.noarch > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > -------------------------------------------------------- > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > -------------------------------------------------------- > Verifying Kernel Header... > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include > Verifying Compiler... > make is present at /bin/make > cpp is present at /bin/cpp > gcc is present at /bin/gcc > g++ is present at /bin/g++ > ld is present at /bin/ld > Verifying Additional System Headers... > Verifying kernel-headers is installed ... > Command: /bin/rpm -q kernel-headers > The required package kernel-headers is installed > make World ... > Verifying that tools to build the portability layer exist.... > cpp present > gcc present > g++ present > ld present > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > rm -f trcid.h ibm_kxi.trclst > > [cut] > > Invoking Kbuild... > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > if [ $? 
-ne 0 ]; then \ > exit 1;\ > fi > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no member named ?i_wb_list? > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > ^ > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro ?_TRACE_MACRO? > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > [ cut ] > > ^ > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro ?TRACE6? > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of ?security_old_inode_init_security? from incompatible pointer type [enabled by default] > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > ^ > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > include/linux/security.h:1896:5: note: expected ?const char **? but argument is of type ?char **? > int security_old_inode_init_security(struct inode *inode, struct inode *dir, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > ^ > cc1: some warnings being treated as errors > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > > Any help appreciated? 
> Son > > Son V Truong - Senior Storage Administrator > Advanced Computing Research Centre > IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Fri Sep 6 12:41:32 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Fri, 6 Sep 2019 11:41:32 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609150.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609151.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609152.png Type: image/png Size: 1134 bytes Desc: not available URL: From Dugan.Witherick at warwick.ac.uk Fri Sep 6 13:25:22 2019 From: Dugan.Witherick at warwick.ac.uk (Witherick, Dugan) Date: Fri, 6 Sep 2019 12:25:22 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , Message-ID: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next PTFs on > both 4.2.3 and 5.0.3 streams. However we have seen issues in test that will > probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might work many > times but is quite a risky business. I highly recommend waiting for our > support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non- > > fatal warnings at that version. It seems to run fine. The rest of the > > cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a > > reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on > > > RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include > > > Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed > > > make World ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit > > > $? || exit 1 > > > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 > > > M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > > > if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no > > > member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), > > > (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro > > > _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro > > > ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of > > > ?security_old_inode_init_security? from incompatible pointer type [enabled > > > by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? but > > > argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct inode > > > *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration > > > of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. 
Examine previous error messages to determine > > > cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator > > > Advanced Computing Research Centre > > > IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From son.truong at bristol.ac.uk Fri Sep 6 15:15:04 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 14:15:04 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Message-ID: Thank you. Table 39 is most helpful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Witherick, Dugan Sent: 06 September 2019 13:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next > PTFs on both 4.2.3 and 5.0.3 streams. However we have seen issues in > test that will probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might > work many times but is quite a risky business. I highly recommend > waiting for our support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp Sitz der > Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL > > 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with > > non- fatal warnings at that version. It seems to run fine. The rest > > of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do > > you have a reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel > > > modules on RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC > > > 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, > > > 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed make World > > > ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > > > > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include > > > /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir > > > /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib rm -f > > > //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 > > > ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux > > > CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? > > > has no member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), > > > (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), > > > (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition > > > of macro _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of > > > macro ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing > > > argument 4 of ?security_old_inode_init_security? from incompatible > > > pointer type [enabled by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? > > > but argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct > > > inode *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit > > > declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. 
> > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. Examine previous error messages to > > > determine cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator Advanced Computing > > > Research Centre IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Sep 6 16:42:39 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 6 Sep 2019 15:42:39 +0000 Subject: [gpfsug-discuss] SSUG Meeting at SC19: Save the date and call for user talks! Message-ID: The Spectrum Scale User group will hold its annual meeting at SC19 on Sunday November 17th from 12:30PM -6PM In Denver, Co. We will be posting exact meeting location soon, but reserve this time. IBM will host a reception following the user group meeting. We?re also looking for user talks - these are short update (20 mins or so) on your use of Spectrum Scale - any topics are welcome. If you are interested, please contact myself or Kristy Kallback-Rose. Looking forward to seeing everyone in Denver! Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Mon Sep 9 21:29:28 2019 From: bipcuds at gmail.com (Keith Ball) Date: Mon, 9 Sep 2019 16:29:28 -0400 Subject: [gpfsug-discuss] Anyone have experience with changing NSD server node name in an ESS/DSS cluster? Message-ID: Hi All, We are thinking of attempting a non-destructive change of NSD server node names in a Lenovo DSS cluster (DSS level 1.2a, which has Scale 4.2.3.5). For a non-GNR cluster, changing a node name for an NSD server isn't a huge deal if you can have a backup server serve up disks; one can mmdelnode then mmaddnode, for instance. Has anyone tried to rename the NSD servers in a GNR cluster, however? I am not sure if it's as easy as failing over the recovery group, and deleting/adding the NSD server. It's easy enough to modify xcat. Perhaps mmchrecoverygroup can be used to change the RG names (since they are named after the NSD servers), but that might not be necessary. Or, it might not work - does anyone know if there is a special process to change NSD server names in an E( or D or G)SS cluster that does not run afoul of GNR or upgrade scripts? Best regards, Keith -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From TROPPENS at de.ibm.com Wed Sep 11 13:20:22 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 11 Sep 2019 14:20:22 +0200 Subject: [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjvilla at nccs.nasa.gov Wed Sep 11 20:14:12 2019 From: jjvilla at nccs.nasa.gov (John J. Villa) Date: Wed, 11 Sep 2019 15:14:12 -0400 (EDT) Subject: [gpfsug-discuss] Introduction - New Subscriber Message-ID: Hello, My name is John Villa. I work for NASA at the Nasa Center for Climate Simulation. We currently utilize GPFS as the primary filesystem on the discover cluster: https://www.nccs.nasa.gov/systems/discover I look forward to seeing everyone at SC19. Thank You, -- John J. Villa NASA Center for Climate Simulation Discover Systems Administrator From damir.krstic at gmail.com Thu Sep 12 15:16:03 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 12 Sep 2019 09:16:03 -0500 Subject: [gpfsug-discuss] VerbsReconnectThread waiters Message-ID: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Thu Sep 12 16:10:58 2019 From: george at markomanolis.com (George Markomanolis) Date: Thu, 12 Sep 2019 11:10:58 -0400 Subject: [gpfsug-discuss] Call for Submission for the IO500 List Message-ID: Call for Submission *Deadline*: 10 November 2019 AoE The IO500 is now accepting and encouraging submissions for the upcoming 5th IO500 list revealed at SC19 in Denver, Colorado. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our SC19 BoF [2]. We hope to see you, and your results, there. We have updated our submission rules [3]. This year, we will have a new list for the Student Cluster Competition as IO500 is used for extra points during this competition The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC19! Please note that submissions of all sizes are welcome; the site has customizable sorting so it is possible to submit on a small system and still get a very good per-client score for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. 
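For anyone who has not run the underlying benchmarks before: the suite is essentially a wrapper around IOR, mdtest and a parallel find. Purely as a hand-written sketch (the rank count, transfer sizes and the /gpfs/io500test path below are made-up placeholders, not part of the official harness, so always use the io500 scripts from io500.org for a real submission), the hero-run phases boil down to invocations of this shape:

  # IOR "easy" phase: one file per process, large sequential transfers
  mpirun -np 80 ior -w -r -F -t 1m -b 16g -o /gpfs/io500test/ior_easy/ior_file

  # mdtest "easy" phase: per-rank directories full of small files
  mpirun -np 80 mdtest -n 10000 -d /gpfs/io500test/mdt_easy

The prescribed "hard" phases then rerun both tools with fixed, deliberately awkward parameters chosen by the harness itself, which is what keeps the lower-bound numbers honest.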
Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017, published its first list at SC17, and has grown exponentially since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: 1. Maximizing simplicity in running the benchmark suite 2. Encouraging complexity in tuning for performance 3. Allowing submitters to highlight their ?hero run? performance numbers 4. Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound. Finally, it includes a namespace search as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: 1. Gather historical data for the sake of analysis and to aid predictions of storage futures 2. Collect tuning information to share valuable performance optimizations across the community 3. Encourage vendors and designers to optimize for workloads beyond ?hero runs? 4. Establish bounded expectations for users, procurers, and administrators 10 Node I/O Challenge At SC, we will continue the 10 Node Challenge. This challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly *10 computes nodes* must be used to run the benchmark (one exception is the find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at io500.org. Birds-of-a-feather Once again, we encourage you to submit [1], to join our community, and to attend our BoF ?The IO500 and the Virtual Institute of I/O? at SC19, November 19th, 12:15-1:15pm, room 205-207, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. [1] http://io500.org/submission [2] *https://www.vi4io.org/io500/bofs/sc19/start * [3] https://www.vi4io.org/io500/rules/submission The IO500 committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 12 20:19:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 12 Sep 2019 12:19:20 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 - REGISTRATION CLOSING SOON In-Reply-To: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Message-ID: Reminder, registration closing on 9/16 EOB. That?s real soon now. Hope to see you there. Details below. 
> On Aug 29, 2019, at 7:30 PM, Kristy Kallback-Rose wrote: > > Hello, > > You will now find the nearly complete agenda here: > > https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > As noted before, the event is free, but please do register below to help with catering planning. > > You can find more information about the full HPCXXL event here: http://hpcxxl.org/ > > Any questions let us know. Hope to see you there! > > -Kristy > >> On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. >> >> We?ll have more details soon, mark your calendars. >> >> Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ >> >> Best, >> Kristy > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Fri Sep 13 09:48:58 2019 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Fri, 13 Sep 2019 08:48:58 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects Message-ID: Hi All, I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? Cheers, Greg Lehmann Senior High Performance Data Specialist Data Services | Scientific Computing Platforms Information Management and Technology | CSIRO Greg.Lehmann at csiro.au | +61 7 3327 4137 | 1 Technology Court, Pullenvale, QLD 4069 CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email. CSIRO Australia's National Science Agency | csiro.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Sep 13 10:14:06 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 13 Sep 2019 05:14:06 -0400 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: References: Message-ID: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Restarting subnet manager in general is fairly harmless. It will cause a heavy sweep of the fabric when it comes back up, but there should be no LID renumbering. Traffic may be held up during the scanning and rebuild of the routing tables. Losing a subnet manager for a period of time would prevent newly booted nodes from receiving a LID but existing nodes will continue to function. 
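If it helps, a quick sanity check before and after bouncing the SM (assuming opensm plus the standard infiniband-diags tools are in play here; adjust for UFM or switch-embedded managers) would be something like:

  # which LID currently holds the master SM, and its priority/state
  sminfo

  # state of a node-hosted opensm instance
  systemctl status opensm

  # after the restart, confirm ports still see an SM and LIDs are unchanged
  ibstat | grep -E 'State|SM lid|Base lid'

If the master SM LID and the port LIDs are the same before and after, GPFS/RDMA traffic should not even notice the restart.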
Adding or deleting inter-switch links should probably be avoided if the subnet manager is down. I would also avoid changing the routing algorithm while in production. Moving a non ha subnet manager from primary to backup and back again has worked for us without disruption, but I would try to do this in a maintenance window. -- ddj Dave Johnson > On Sep 13, 2019, at 4:48 AM, Lehmann, Greg (IM&T, Pullenvale) wrote: > > Hi All, > I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? > > Cheers, > > Greg Lehmann > Senior High Performance Data Specialist > Data Services | Scientific Computing Platforms > Information Management and Technology | CSIRO > Greg.Lehmann at csiro.au | +61 7 3327 4137 | > 1 Technology Court, Pullenvale, QLD 4069 > > CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. > > The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. > > Please consider the environment before printing this email. > > CSIRO Australia?s National Science Agency | csiro.au > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 13 10:48:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 13 Sep 2019 09:48:52 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> References: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Message-ID: On Fri, 2019-09-13 at 05:14 -0400, david_johnson at brown.edu wrote: [SNIP] > Moving a non ha subnet manager from primary to backup and back again > has worked for us without disruption, but I would try to do this in a > maintenance window. > Not on GPFS but in the past I have moved from one subnet manager to another with dozens of running MPI jobs, and Lustre running over the fabric and not missed a beat. My current cluster used 10 and 40Gbps ethernet for GPFS with Omnipath exclusively for MPI traffic. To be honest I just cannot wrap my head around the idea that you would not be running two subnet managers in the first place. Just fire up two subnet managers (whether on a switch or a node) and forget about it. They will automatically work together to give you a HA solution. It is the same with Omnipath too. I would also note that you can fire up more than two fabric managers and it all "just works". 
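For what it is worth, a minimal sketch of that kind of redundant opensm setup (illustrative only; the priority values just follow the usual convention that the higher number wins, and the config lives in /etc/rdma/opensm.conf on RHEL or /etc/opensm/opensm.conf elsewhere):

  # on the node you want as preferred master
  grep sm_priority /etc/rdma/opensm.conf
      sm_priority 15
  systemctl enable --now opensm

  # on the standby node, a lower priority
  grep sm_priority /etc/rdma/opensm.conf
      sm_priority 10
  systemctl enable --now opensm

The instances negotiate master/standby among themselves, which is the "just works" behaviour described above.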
If it where me and I didn't have fabric managers running on at least two of my switches and I was doing GPFS over Infiniband, I would fire up fabric managers on all of my NSD servers. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From heinrich.billich at id.ethz.ch Fri Sep 13 15:56:07 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Fri, 13 Sep 2019 14:56:07 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Message-ID: Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* From ewahl at osc.edu Fri Sep 13 16:42:30 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 13 Sep 2019 15:42:30 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: I recall looking at this a year or two back. Ganesha is either v4 and v6 both (ie: the encapsulation you see), OR ipv4 ONLY. (ie: /etc/modprobe.d/ipv6.conf disable=1) Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Billich Heinrich Rainer (ID SD) Sent: Friday, September 13, 2019 10:56 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Fri Sep 13 17:07:01 2019 From: jam at ucar.edu (Joseph Mendoza) Date: Fri, 13 Sep 2019 10:07:01 -0600 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: References: Message-ID: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: > On my cluster I have seen couple of long waiters such as this: > > gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more > seconds, reason: delaying for next reconnect attempt > > I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. > > Is this something to pay attention to, and what does this waiter mean? > > Thank you. > Damir > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 16 08:12:09 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 16 Sep 2019 09:12:09 +0200 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Sep 16 10:33:58 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 16 Sep 2019 17:33:58 +0800 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> References: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> Message-ID: Damir, Joseph, > Is this something to pay attention to, and what does this waiter mean? This waiter means GPFS fails to reconnect broken verbs connection, which can cause performance degradation. 
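To gauge how widespread this is on a given node, a rough check using only commands already mentioned in this thread plus mmdiag (an informal sketch, not an official procedure) is:

  # are the reconnect waiters still present?
  mmdiag --waiters | grep -i VerbsReconnect

  # rough count of RDMA connections the daemon still holds
  mmfsadm test verbs conn | wc -l

  # has RDMA been disabled on this node due to errors?
  grep -i "VERBS RDMA" /var/adm/ras/mmfs.log.latest | tail -5

mmdiag is the supported interface; mmfsadm is a service tool, so keep its use read-only unless directed otherwise.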
> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again. > Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. This is a code bug which is fixed through internal defect 1090669. It will be backport to service releases after verification. There is a work-around which can fix this problem without a restart. - On nodes which have this waiter list, run command 'mmfsadm test breakconn all 744' 744 is E_RECONNECT, which triggers tcp reconnect and will not cause node leave/rejoin. Its side effect clears RDMA connections and their incorrect status. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Joseph Mendoza To: gpfsug-discuss at spectrumscale.org Date: 2019/09/14 12:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] VerbsReconnectThread waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WoT3TYlCvAM8RQxUISD9L6UzqY0I_ffCJTS-UHhw8z4&s=18A0j0Zmp8OwZ6Y6cc3HFe3OgFZRHIv8OeJcBpkaPwQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Mon Sep 16 13:58:03 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 16 Sep 2019 12:58:03 +0000 Subject: [gpfsug-discuss] Can 5-minutes frequent lsscsi command disrupt GPFS I/O on a Lenovo system ? Message-ID: <83A6EEB0EC738F459A39439733AE80452BEA85FE@MBX214.d.ethz.ch> Hello folks, recently I observed that calling every 5 minutes the command "lsscsi -g" on a Lenovo I/O node (a X3650 M5 connected to D3284 enclosures, part of a DSS-G220 system) can seriously compromise the GPFS I/O performance. (The motivation of running lsscsi every 5 minutes is a bit out of topic, but I can explain on request). What we observed is that there were several GPFS waiters telling that flushing caches to physical disk was impossible and they had to wait (possibly going in timeout). Is this something expected and/or observed by someone else in this community ? Thanks Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Sep 16 15:50:24 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 16 Sep 2019 14:50:24 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , Message-ID: What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Mon Sep 16 15:55:34 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 16 Sep 2019 14:55:34 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: On our recent ESS systems we do not see /etc/tuned/scale/tuned.conf (or script.sh) owned by any package (rpm -qif ?). I?ve attached what we have on our ESS 5.3.3 systems. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Monday, September 16, 2019 at 10:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tuned.conf Type: application/octet-stream Size: 2859 bytes Desc: tuned.conf URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: script.sh Type: application/octet-stream Size: 270 bytes Desc: script.sh URL: From heinrich.billich at id.ethz.ch Mon Sep 16 16:49:57 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 16 Sep 2019 15:49:57 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Hello Olaf, Thank you, so we?ll try to get rid of IPv6. Actually we do have this settings active but I may have to add them to the initrd file, too. (See https://access.redhat.com/solutions/8709#?rhel7disable) to prevent ganesha from opening an IPv6 socket. It?s probably no big issue if ganesha uses IPv4overIPv6 for all connections, but to keep things simple I would like to avoid it. @Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... 
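(As a side note, and strictly a sketch of the approach in the Red Hat solution referenced above rather than an official Scale procedure: the sysctl file name below is arbitrary, and either method should leave ganesha with IPv4 listeners only.)

  # persist the sysctls and rebuild the initramfs so they apply early in boot
  cat /etc/sysctl.d/90-noipv6.conf
      net.ipv6.conf.all.disable_ipv6 = 1
      net.ipv6.conf.default.disable_ipv6 = 1
  dracut -f

  # alternative: disable the IPv6 module entirely on the kernel command line
  grubby --update-kernel=ALL --args="ipv6.disable=1"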
From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 16 18:34:07 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 16 Sep 2019 17:34:07 +0000 Subject: [gpfsug-discuss] SSUG @ SC19 Update: Scheduling and Sponsorship Opportunities Message-ID: Two months until SC19 and the schedule is starting to come together, with a great mix of technical updates and user talks. I would like highlight a few items for you to be aware of: - Morning session: We?re currently trying to put together a morning ?new users? session for those new to Spectrum Scale. These talks would be focused on fundamentals and give an opportunity to ask questions. We?re tentatively thinking about starting around 9:30-10 AM on Sunday November 17th. Watch the mailing list for updates and on the http://spectrumscale.org site. - Sponsorships: We?re looking for sponsors. If your company is an IBM partner, uses/incorporates Spectrum Scale - please contact myself or Kristy Kallback-Rose. We are looking for sponsors to help with lunch (YES - we?d like to serve lunch this year!) and WiFi access during the user group meeting. Looking forward to seeing you all at SC19. Registration link coming soon, watch here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Sep 18 18:56:29 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 18 Sep 2019 17:56:29 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 Message-ID: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. (sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Sep 19 11:44:46 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 10:44:46 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From heinrich.billich at id.ethz.ch Thu Sep 19 15:20:53 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 19 Sep 2019 14:20:53 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Message-ID: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Hello, Is it usual to see 200?000-400?000 open files for a single ganesha process? Or does this indicate that something ist wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200?000-400?000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1?000 ? 10?000 open files by ganesha only and don?t show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn?t open/close a file, it?s the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open file? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc//fd/ . With several 100k entries I failed to do a ?ls -ls? to list all the symbolic links, hence I can?t relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Sep 19 15:30:45 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 19 Sep 2019 15:30:45 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > ?reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? 
We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files ?by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From S.J.Thompson at bham.ac.uk Thu Sep 19 16:18:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Sep 2019 15:18:47 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: on behalf of "abeattie at au1.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Sep 19 19:38:53 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 19 Sep 2019 18:38:53 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is_this=09unusual=3F?= In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Sep 19 22:34:33 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 21:34:33 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Sep 19 23:41:08 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 19 Sep 2019 22:41:08 +0000 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade Message-ID: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Sep 20 09:08:01 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 20 Sep 2019 10:08:01 +0200 Subject: [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=I3TzCv5SKxKb51eAL_blo-XwctX64z70ayrZKERanWA&s=OSKGngwXAoOemFy3HkctexuIpBJQu8NPeTkC_MMQBks&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Fri Sep 20 10:14:58 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Fri, 20 Sep 2019 11:14:58 +0200 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade In-Reply-To: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> References: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> Message-ID: Hello Bob, this event is a "Notice": You can use the action "Mark Selected Notices as Read" or "Mark All Notices as Read"in the GUI Event Groups or Individual Events grid. Notice events are transient by nature and don't imply a permanent state change of an entity. It seems that during the upgrade, mmhealth had probed the pdisk and the disk hospital was diagnosing the pdisk at this time, but eventually disk hospital placed the pdisk back to normal state, Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 162 4159920 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 20.09.2019 00:53 Subject: [EXTERNAL] [gpfsug-discuss] Leftover GUI events after ESS upgrade Sent by: gpfsug-discuss-bounces at spectrumscale.org I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hLyf83U0otjISdpV5zl1cSCPVFFUF61ny3jWvv-5kNQ&s=ptMGcpNhnRTogPO2CN_l6jhC-vCN-VQAf53HmRLQDq8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 14525383.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heinrich.billich at id.ethz.ch Mon Sep 23 10:33:02 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 09:33:02 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Hello Frederik, Thank you. I now see a similar behavior: Ganesha has 500k open files while the node is suspended since 2+hours. I would expect that some cleanup job does remove most of the open FD after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: ! maxFilesToCache 1048576 ! maxStatCache 2097152 Our ganesha version is 2.5.3. (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern. 
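For anyone who wants to check their own protocol nodes: a quick way to get the daemon's open-descriptor count is to count the entries in /proc directly instead of listing several hundred thousand symlinks. This is only a rough sketch -- the process name and the use of pgrep are assumptions, adjust them to match your installation:

  pid=$(pgrep -of ganesha.nfsd)                     # oldest matching process, matched on the full command line
  echo "open fds: $(ls /proc/${pid}/fd | wc -l)"    # count the fd entries rather than 'ls -ls' each symlink
  grep 'Max open files' /proc/${pid}/limits         # the limit the running daemon actually has
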
I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. Cheers, Heiner Daniel Gryniewicz So, it's not impossible, based on the workload, but it may also be a bug. For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know when the client closes the FD, and opening/closing all the time causes a large performance hit. So, we cache open FDs. All handles in MDCACHE live on the LRU. This LRU is divided into 2 levels. Level 1 is more active handles, and they can have open FDs. Various operation can demote a handle to level 2 of the LRU. As part of this transition, the global FD on that handle is closed. Handles that are actively in use (have a refcount taken on them) are not eligible for this transition, as the FD may be being used. We have a background thread that runs, and periodically does this demotion, closing the FDs. This thread runs more often when the number of open FDs is above FD_HwMark_Percent of the available number of FDs, and runs constantly when the open FD count is above FD_Limit_Percent of the available number of FDs. So, a heavily used server could definitely have large numbers of FDs open. However, there have also, in the past, been bugs that would either keep the FDs from being closed, or would break the accounting (so they were closed, but Ganesha still thought they were open). You didn't say what version of Ganesha you're using, so I can't tell if one of those bugs apply. Daniel ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. 
> > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Mon Sep 23 11:43:06 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 10:43:06 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <72079C31-1E3E-4F69-B428-480620466353@id.ethz.ch> Hello Malhal, Thank you. Actually I don?t see the parameter Cache_FDs in our ganesha config. But when I trace LRU processing I see that almost no FDs get released. And the number of FDs given in the log messages doesn?t match what I see in /proc//fd/. I see 512k open files while the logfile give 600k. Even 4hours since the I suspended the node and all i/o activity stopped I see 500k open files and LRU processing doesn?t close any of them. This looks like a bug in gpfs.nfs-ganesha-2.5.3-ibm036.10.el7. I?ll open a case with IBM. We did see gansha to fail to open new files and hence client requests to fail. I assume that 500K FDs compared to 10K FDs as before create some notable overhead for ganesha, spectrum scale and kernel and withdraw resources from samba. I?ll post to the list once we got some results. 
Cheers, Heiner Start of LRU processing 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51350 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1027 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51400 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1028 closing 0 descriptors End of log 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1029 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1029 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51500 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1030 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :After work, open_fd_count:607024 count:29503718 fdrate:1908874353 threadwait=9 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :currentopen=607024 futility=0 totalwork=51550 biggest_window=335544 extremis=0 lanes=1031 fds_lowat=167772 From: on behalf of Malahal R Naineni Reply to: gpfsug main discussion list Date: Thursday, 19 September 2019 at 20:39 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? NFSv3 doesn't have open/close requests, so nfs-ganesha opens a file for read/write when there is an NFSv3 read/write request. It does cache file descriptors, so its open count can be very large. If you have 'Cache_FDs = true" in your config, ganesha aggressively caches file descriptors. 
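To see whether that is the case you can look for the FD-related parameters in the active Ganesha configuration. A rough sketch only -- the config file location below is an assumption and will differ on CES nodes, where the configuration files are generated for you:

  grep -riE 'Cache_FDs|FD_HwMark_Percent|FD_Limit_Percent' /etc/ganesha/   # raw nfs-ganesha config files (path is an assumption)
  mmnfs config list                                                        # the CES view of the active NFS configuration
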
Taking traces with COMPONENT_CACHE_INODE_LRU level set to full debug should give us better insight on what is happening when the the open file descriptors count is very high. When the I/O failure happens or when the open fd count is high, you could do the following: 1. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU FULL_DEBUG 2. wait for 90 seconds, then run 3. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU EVENT Regards, Malahal. ----- Original message ----- From: "Billich Heinrich Rainer (ID SD)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Date: Thu, Sep 19, 2019 7:51 PM Hello, Is it usual to see 200?000-400?000 open files for a single ganesha process? Or does this indicate that something ist wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200?000-400?000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1?000 ? 10?000 open files by ganesha only and don?t show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn?t open/close a file, it?s the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open file? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc//fd/ . With several 100k entries I failed to do a ?ls -ls? to list all the symbolic links, hence I can?t relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Sep 24 09:52:34 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 24 Sep 2019 08:52:34 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Hello Frederik, Just some addition, maybe its of interest to someone: The number of max open files for Ganesha is based on maxFilesToCache. Its. 80%of maxFilesToCache up to an upper and lower limits of 2000/1M. The active setting is visible in /etc/sysconfig/ganesha. Cheers, Heiner ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. 
I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From valdis.kletnieks at vt.edu Tue Sep 24 21:41:07 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 24 Sep 2019 16:41:07 -0400 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? 
In-Reply-To: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: <269692.1569357667@turing-police> On Tue, 24 Sep 2019 08:52:34 -0000, "Billich Heinrich Rainer (ID SD)" said: > Just some addition, maybe its of interest to someone: The number of max open > files for Ganesha is based on maxFilesToCache. Its. 80%of maxFilesToCache up to > an upper and lower limits of 2000/1M. The active setting is visible in > /etc/sysconfig/ganesha. Note that strictly speaking, the values in /etc/sysconfig are in general the values that will be used at next restart - it's totally possible for the system to boot, the then-current values be picked up from /etc/sysconfig, and then any number of things, from configuration automation tools like Ansible, to a cow-orker sysadmin armed with nothing but /usr/bin/vi, to have changed the values without you knowing about it and the daemons not be restarted yet... (Let's just say that in 4 decades of doing this stuff, I've been surprised by that sort of thing a few times. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Sep 25 18:06:18 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 25 Sep 2019 17:06:18 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is=09this_unusual=3F?= In-Reply-To: <269692.1569357667@turing-police> References: <269692.1569357667@turing-police>, <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch><280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: att6j9ca.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From L.R.Sudbery at bham.ac.uk Thu Sep 26 10:38:09 2019 From: L.R.Sudbery at bham.ac.uk (Luke Sudbery) Date: Thu, 26 Sep 2019 09:38:09 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <3b15db460ac1459e9ca53bec00f30833@bham.ac.uk> We think our issue was down to numa settings actually - making mmfsd allocate GPU memory. Makes sense given the type of error. Tomer suggested to Simon we set numactlOptioni to "0 8", as per: https://www-01.ibm.com/support/docview.wss?uid=isg1IJ02794 Our tests are not crashing since setting then ? we need to roll it out on all nodes to confirm its fixed all our hangs/reboots. Cheers, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don?t work on Monday and work from home on Friday. 
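For anyone hitting the same thing, the change itself is just a cluster configuration option followed by a restart of mmfsd on the affected nodes. Treat the lines below as a sketch only -- the parameter name and value are taken from this thread and from the IJ02794 write-up, so verify them against that document for your code level before applying anything:

  mmchconfig numactlOption="0 8"                            # value as given in this thread; semantics per IBM APAR IJ02794
  mmshutdown -N power9nodes && mmstartup -N power9nodes     # "power9nodes" is a placeholder node class; mmfsd must restart to pick this up
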
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of abeattie at au1.ibm.com Sent: 19 September 2019 22:35 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, I have an open support call that required Redhat to create a kernel patch for RH 7.6 because of issues with the Intel x710 network adapter - I can't tell you if its related to your issue or not but it would cause the GPFS cluster to reboot and the affected node to reboot if we tried to do almost anything with that intel adapter regards, Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS and POWER9 Date: Fri, Sep 20, 2019 1:18 AM Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: > on behalf of "abeattie at au1.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Sep 26 10:55:45 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 26 Sep 2019 09:55:45 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions Message-ID: Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. 
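For reference, this is roughly how the option gets enabled and checked; the file system and fileset names below are placeholders and the exact syntax may differ between releases, so treat it as a sketch rather than the definitive procedure:

  mmlsconfig afmRefreshAsync                            # is it set cluster-wide?
  mmlsfileset fs0 cachefileset --afm -L                 # per-fileset AFM attributes, including the refresh intervals
  mmchfileset fs0 cachefileset -p afmRefreshAsync=yes   # fileset-level enablement (needs 5.0.3 or later)
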
The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 27 09:23:13 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Sep 2019 13:53:13 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals ?, you could also try increasing them. Is this config option set at fileset level or cluster level ? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Due to having a data analysis software that isn't running well at all in our AFM caches, it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system, I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=tjCOcTjZ_AjP3N1mpspwuLu5u2XOFb5LkZqVAwX3wk8&s=tD6X2XM1HPMqWxSg-IelnstWbneQ7On4xfEVkCajtPE&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Fri Sep 27 11:31:42 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Fri, 27 Sep 2019 10:31:42 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Sun Sep 1 14:17:01 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Sun, 1 Sep 2019 13:17:01 +0000 Subject: [gpfsug-discuss] Backup question In-Reply-To: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> References: <41363a9ff37a4cf19245ba67d5f43077@gmfinancial.com> Message-ID: An HTML attachment was scrubbed... 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Tue Sep 3 14:07:44 2019 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Tue, 3 Sep 2019 15:07:44 +0200 Subject: [gpfsug-discuss] Fileheat - does work! Complete test/example provided here. In-Reply-To: References: Message-ID: Thanks for this example, very userful, but I'm still struggeling a bit at a customer.. We're doing heat daily based rebalancing, with fileheatlosspercent=20 and fileheatperiodminutes=720: RULE "defineTiers" GROUP POOL 'Tiers' IS 'ssdpool' LIMIT(70) then 'saspool' RULE 'Rebalance' MIGRATE FROM POOL 'Tiers' TO POOL 'Tiers' WEIGHT(FILE_HEAT) WHERE FILE_SIZE<10000000000 but are seeing too many files moved down to the saspool and too few are staying in the ssdpool. Right now we ran a test of this policy, and saw that it wanted to move 130k files / 300 GB down to the saspool, and a single small file up to the ssdpool -- even though the ssdpool is only 50% utilized. Running your listing policy reveals lots of files with zero heat: <7> /gpfs/gpfs0/file1 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file2 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) <7> /gpfs/gpfs0/file3/HM_WVS_8P41017_1/HM_WVS_8P41017_1.S2206 RULE 'fh2' LIST 'fh' WEIGHT(0.000000) SHOW( _NULL_ _NULL_ _NULL_ +0.00000000000000E+000 _NULL_ 720 25 server.locale) and others with heat: <5> /gpfs/gpfs0/file4 RULE 'fh2' LIST 'fh' WEIGHT(0.004246) SHOW( 300401047 0 0 +4.24600492924153E-003 11E7C19700000000 720 25 server.locale) <5> /gpfs/gpfs0/file5 RULE 'fh2' LIST 'fh' WEIGHT(0.001717) SHOW( 120971793 1 0 +1.71725239616613E-003 0735E21100010000 720 25 server.locale) These are not new files -- so we're wondering if maybe the fileheat is reduced to zero/NULL after a while (how many times can it shrink by 25% before it's zero??). Would it make sense to increase fileheatperiodeminutes and/or decrease fileheatlosspercentage? What would be good values? (BTW: we have relatime enabled) Any other ideas for why it won't fill up our ssdpool to close to LIMIT(70) ? -jf On Tue, Aug 13, 2019 at 3:33 PM Marc A Kaplan wrote: > Yes, you are correct. It should only be necessary to set > fileHeatPeriodMinutes, since the loss percent does have a default value. > But IIRC (I implemented part of this!) you must restart the daemon to get > those fileheat parameter(s) "loaded"and initialized into the daemon > processes. > > Not fully trusting my memory... I will now "prove" this works today as > follows: > > To test, create and re-read a large file with dd... > > [root@/main/gpfs-git]$mmchconfig fileHeatPeriodMinutes=60 > mmchconfig: Command successfully completed > ... > [root@/main/gpfs-git]$mmlsconfig | grep -i heat > fileHeatPeriodMinutes 60 > > [root@/main/gpfs-git]$mmshutdown > ... > [root@/main/gpfs-git]$mmstartup > ... > [root@/main/gpfs-git]$mmmount c23 > ... > [root@/main/gpfs-git]$ls -l /c23/10g > -rw-r--r--. 
1 root root 10737418240 May 16 15:09 /c23/10g > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > > (NO fileheat attribute yet...) > > [root@/main/gpfs-git]$dd if=/c23/10g bs=1M of=/dev/null > ... > After the command finishes, you may need to wait a while for the metadata > to flush to the inode on disk ... or you can force that with an unmount or > a mmfsctl... > > Then the fileheat attribute will appear (I just waited by answering > another email... No need to do any explicit operations on the file system..) > > [root@/main/gpfs-git]$mmlsattr -d -X /c23/10g > file name: /c23/10g > security.selinux > gpfs.FileHeat > > To see its hex string value: > > [root@/main/gpfs-git]$mmlsattr -d -X -L /c23/10g > file name: /c23/10g > ... > security.selinux: > 0x756E636F6E66696E65645F753A6F626A6563745F723A756E6C6162656C65645F743A733000 > gpfs.FileHeat: 0x000000EE42A40400 > > Which will be interpreted by mmapplypolicy... > > YES, the interpretation is relative to last access time and current time, > and done by a policy/sql function "computeFileHeat" > (You could find this using m4 directives in your policy file...) > > > define([FILE_HEAT],[computeFileHeat(CURRENT_TIMESTAMP-ACCESS_TIME,xattr('gpfs.FileHeat'),KB_ALLOCATED)]) > > Well gone that far, might as well try mmapplypolicy too.... > > [root@/main/gpfs-git]$cat /gh/policies/fileheat.policy > define(DISPLAY_NULL,[CASE WHEN ($1) IS NULL THEN '_NULL_' ELSE varchar($1) > END]) > > rule fh1 external list 'fh' exec '' > rule fh2 list 'fh' weight(FILE_HEAT) > show(DISPLAY_NULL(xattr_integer('gpfs.FileHeat',1,4,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',5,2,'B')) || ' ' || > DISPLAY_NULL(xattr_integer('gpfs.FileHeat',7,2,'B')) || ' ' || > DISPLAY_NULL(FILE_HEAT) || ' ' || > DISPLAY_NULL(hex(xattr('gpfs.FileHeat'))) || ' ' || > getmmconfig('fileHeatPeriodMinutes') || ' ' || > getmmconfig('fileHeatLossPercent') || ' ' || > getmmconfig('clusterName') ) > > > [root@/main/gpfs-git]$mmapplypolicy /c23 --maxdepth 1 -P > /gh/policies/fileheat.policy -I test -L 3 > ... > <1> /c23/10g RULE 'fh2' LIST 'fh' WEIGHT(0.022363) SHOW( 238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > ... > WEIGHT(0.022363) LIST 'fh' /c23/10g SHOW(238 17060 1024 > +2.23632812500000E-002 000000EE42A40400 60 10 makaplan.sl.cloud9.ibm.com) > > > > > [image: Inactive hide details for Jan-Frode Myklebust ---08/13/2019 > 06:22:46 AM---What about filesystem atime updates. We recently chan]Jan-Frode > Myklebust ---08/13/2019 06:22:46 AM---What about filesystem atime updates. > We recently changed the default to ?relatime?. Could that maybe > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 08/13/2019 06:22 AM > Subject: [EXTERNAL] Re: [gpfsug-discuss] Fileheat > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > > What about filesystem atime updates. We recently changed the default to > ?relatime?. Could that maybe influence heat tracking? > > > > -jf > > > tir. 13. aug. 2019 kl. 11:29 skrev Ulrich Sibiller < > *u.sibiller at science-computing.de* >: > > On 12.08.19 15:38, Marc A Kaplan wrote: > > My Admin guide says: > > > > The loss percentage and period are set via the configuration > > variables *fileHeatLossPercent *and *fileHeatPeriodMinutes*. By > default, the file access temperature > > is not > > tracked. To use access temperature in policy, the tracking must > first be enabled. 
To do this, set > > the two > > configuration variables as follows:* > > Yes, I am aware of that. > > > fileHeatLossPercent* > > The percentage (between 0 and 100) of file access temperature > dissipated over the* > > fileHeatPeriodMinutes *time. The default value is 10. > > Chapter 25. Information lifecycle management for IBM Spectrum Scale > *361** > > fileHeatPeriodMinutes* > > The number of minutes defined for the recalculation of file access > temperature. To turn on > > tracking, *fileHeatPeriodMinutes *must be set to a nonzero value. > The default value is 0 > > > > > > SO Try setting both! > > Well, I have not because the documentation explicitly mentions a > default. What's the point of a > default if I have to explicitly configure it? > > > ALSO to take effect you may have to mmshutdown and mmstartup, at > least on the (client gpfs) nodes > > that are accessing the files of interest. > > I have now configured both parameters and restarted GPFS. Ran a tar > over a directory - still no > change. I will wait for 720minutes and retry (tomorrow). > > Thanks > > Uli > > -- > Science + Computing AG > Vorstandsvorsitzender/Chairman of the board of management: > Dr. Martin Matzke > Vorstand/Board of Management: > Matthias Schempp, Sabine Hohenstein > Vorsitzender des Aufsichtsrats/ > Chairman of the Supervisory Board: > Philippe Miltin > Aufsichtsrat/Supervisory Board: > Martin Wibbe, Ursula Morgenstern > Sitz/Registered Office: Tuebingen > Registergericht/Registration Court: Stuttgart > Registernummer/Commercial Register No.: HRB 382196 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From Robert.Oesterlin at nuance.com Tue Sep 3 16:37:58 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Tue, 3 Sep 2019 15:37:58 +0000 Subject: [gpfsug-discuss] Easiest way to copy quota settings from one file system to another? Message-ID: <63C132C3-63AF-465B-8FD9-67AF9EA4887D@nuance.com> I?m migratinga file system from one cluster to another. I want to copy all user quotas from cluster1 filesystem ?A? to cluster2, filesystem ?fs1?, fileset ?A? What?s the easiest way to do that? I?m thinking mmsetquota with a stanza file, but is there a tool to generate the stanza file from the source? I could do a ?mmrepquota -u -Y? and process the output. Hoping for something easier :) Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
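For the quota-migration question above: one way to avoid hand-editing is to let the machine-readable output drive plain mmsetquota commands. The sketch below is untested and makes two assumptions: that the HEADER record of "mmrepquota -u -Y" exposes fields named name, blockQuota, blockLimit, filesQuota and filesLimit (the script looks the columns up by name from the HEADER line rather than by position, so check those names on your release), and that the block values are reported in KiB, hence the trailing "K". Review the generated file before running anything against cluster2.

  #!/bin/bash
  # Sketch only: replay user quotas from cluster1 file system "A" onto
  # cluster2 file system "fs1", fileset "A".
  SRC_FS=A          # source file system on cluster1
  DST=fs1:A         # destination Device:Fileset on cluster2

  mmrepquota -u -Y "$SRC_FS" | awk -F: -v dst="$DST" '
    /:HEADER:/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
    !("name" in col) { next }                 # ignore anything before the header
    {
      name = $(col["name"])
      bq = $(col["blockQuota"]); bl = $(col["blockLimit"])   # soft/hard, assumed KiB
      fq = $(col["filesQuota"]); fl = $(col["filesLimit"])
      if (name == "root") next                               # usually left alone
      if (bq == 0 && bl == 0 && fq == 0 && fl == 0) next     # no explicit quota set
      printf "mmsetquota %s --user %s --block %sK:%sK --files %s:%s\n",
             dst, name, bq, bl, fq, fl
    }' > /tmp/replay_quotas.sh

  # eyeball /tmp/replay_quotas.sh first, then run it on a cluster2 node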
URL: From andreas.mattsson at maxiv.lu.se Thu Sep 5 10:54:04 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 5 Sep 2019 09:54:04 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction Message-ID: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ [X] Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Thu Sep 5 14:28:00 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Thu, 5 Sep 2019 18:58:00 +0530 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: <3ed969d0d778446982a419067320f927@maxiv.lu.se> References: <3ed969d0d778446982a419067320f927@maxiv.lu.se> Message-ID: Hi, AFM does not support inode eviction, only data blocks are evicted and the file's metadata will remain in the fileset. ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/05/2019 03:39 PM Subject: [EXTERNAL] [gpfsug-discuss] Inode reuse on AFM cache eviction Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, Does anyone here know if cache eviction on a AFM cache also make the inodes used by the evicted files available for reuse? Basically, I'm trying to figure out if it is enough to have sufficient inode space in my cache filesets to keep the maximum expected simultaneously cached files, or if I need the same inode space as for the total amount of files that will reside in the home of the cache. Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=92LOlNh2yLzrrGTDA7HnfF8LFr55zGxghLZtvZcZD7A&m=5omqUvEiiIKUhShJOBEgb3WwLU5uy-8o_4--y0TOuw0&s=ZFAcjvG5LrsnsCJgIf9f1320V866HKG6iJGteRQ7oac&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Thu Sep 5 19:37:47 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 18:37:47 +0000 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 92, Issue 4 In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From sakkuma4 at in.ibm.com Thu Sep 5 20:06:17 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Thu, 5 Sep 2019 19:06:17 +0000 Subject: [gpfsug-discuss] Inode reuse on AFM cache eviction In-Reply-To: Message-ID: An HTML attachment was scrubbed... URL: From son.truong at bristol.ac.uk Fri Sep 6 10:48:56 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 09:48:56 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Message-ID: Hello, Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? I am failing with these errors: [root at host ~]# uname -a Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux [root at host ~]# rpm -qa | grep gpfs gpfs.base-4.2.3-7.x86_64 gpfs.gskit-8.0.50-75.x86_64 gpfs.ext-4.2.3-7.x86_64 gpfs.msg.en_US-4.2.3-7.noarch gpfs.docs-4.2.3-7.noarch gpfs.gpl-4.2.3-7.noarch [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl -------------------------------------------------------- mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. -------------------------------------------------------- Verifying Kernel Header... kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include Verifying Compiler... make is present at /bin/make cpp is present at /bin/cpp gcc is present at /bin/gcc g++ is present at /bin/g++ ld is present at /bin/ld Verifying Additional System Headers... Verifying kernel-headers is installed ... Command: /bin/rpm -q kernel-headers The required package kernel-headers is installed make World ... Verifying that tools to build the portability layer exist.... cpp present gcc present g++ present ld present cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver cleaning (/usr/lpp/mmfs/src/ibm-kxi) make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' rm -f trcid.h ibm_kxi.trclst [cut] Invoking Kbuild... /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ exit 1;\ fi make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' LD /usr/lpp/mmfs/src/gpl-linux/built-in.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'printInode': /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: 'struct inode' has no member named 'i_wb_list' _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); ^ /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro '_TRACE_MACRO' { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP [ cut ] ^ /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro 'TRACE6' TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/inode.c: In function 'cxiInitInodeSecurity': /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of 'security_old_inode_init_security' from incompatible pointer type [enabled by default] rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, ^ In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: include/linux/security.h:1896:5: note: expected 'const char **' but argument is of type 'char **' int security_old_inode_init_security(struct inode *inode, struct inode *dir, ^ In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function 'cache_get_name': /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function 'vfs_readdir' [-Werror=implicit-function-declaration] error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); ^ cc1: some warnings being treated as errors make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' make[1]: *** [modules] Error 1 make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' make: *** [Modules] Error 1 -------------------------------------------------------- mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. -------------------------------------------------------- mmbuildgpl: Command failed. Examine previous error messages to determine cause. Any help appreciated... Son Son V Truong - Senior Storage Administrator Advanced Computing Research Centre IT Services, University of Bristol Email: son.truong at bristol.ac.uk Tel: Mobile: +44 (0) 7732 257 232 Address: 31 Great George Street, Bristol, BS1 5QD -------------- next part -------------- An HTML attachment was scrubbed... 
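The build failure above is the classic symptom of a kernel newer than anything the installed Scale level ships a portability layer for (3.10.0-1062 is the RHEL 7.7 kernel; 4.2.3-7 predates it). Until the FAQ lists support, the usual workaround is to keep nodes on the last tested kernel rather than patching gpl-linux by hand. A rough sketch, assuming yum-managed nodes and that the versionlock plugin is acceptable in your environment:

  rpm -q gpfs.base      # installed Scale level, e.g. gpfs.base-4.2.3-7
  uname -r              # running kernel, e.g. 3.10.0-1062.el7.x86_64

  # option 1: on nodes still running the last supported kernel (3.10.0-957
  # for RHEL 7.6), lock the kernel packages at their installed versions
  yum install -y yum-plugin-versionlock
  yum versionlock add kernel kernel-devel kernel-headers

  # option 2: block kernel updates entirely until support is announced
  echo "exclude=kernel*" >> /etc/yum.conf

  # after any kernel change, rebuild and reload the portability layer
  /usr/lpp/mmfs/bin/mmbuildgpl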
URL: From david_johnson at brown.edu Fri Sep 6 11:24:51 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 6 Sep 2019 06:24:51 -0400 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: Message-ID: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non-fatal warnings at that version. It seems to run fine. The rest of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a reason to not go for the latest release on either the 4- or 5- line? [root at xxx ~]# ssh node1301 rpm -q gpfs.base gpfs.base-4.2.3-10.x86_64 -- ddj Dave Johnson > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > Hello, > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on RHEL 7.7? > > I am failing with these errors: > > [root at host ~]# uname -a > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux > > [root at host ~]# rpm -qa | grep gpfs > gpfs.base-4.2.3-7.x86_64 > gpfs.gskit-8.0.50-75.x86_64 > gpfs.ext-4.2.3-7.x86_64 > gpfs.msg.en_US-4.2.3-7.noarch > gpfs.docs-4.2.3-7.noarch > gpfs.gpl-4.2.3-7.noarch > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > -------------------------------------------------------- > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > -------------------------------------------------------- > Verifying Kernel Header... > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, 3.10.0-1062) > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > Found valid kernel header file under /usr/src/kernels/3.10.0-1062.el7.x86_64/include > Verifying Compiler... > make is present at /bin/make > cpp is present at /bin/cpp > gcc is present at /bin/gcc > g++ is present at /bin/g++ > ld is present at /bin/ld > Verifying Additional System Headers... > Verifying kernel-headers is installed ... > Command: /bin/rpm -q kernel-headers > The required package kernel-headers is installed > make World ... > Verifying that tools to build the portability layer exist.... > cpp present > gcc present > g++ present > ld present > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit $? || exit 1 > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > rm -f trcid.h ibm_kxi.trclst > > [cut] > > Invoking Kbuild... > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > if [ $? 
-ne 0 ]; then \ > exit 1;\ > fi > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no member named ?i_wb_list? > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP->i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > ^ > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro ?_TRACE_MACRO? > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > [ cut ] > > ^ > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro ?TRACE6? > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of ?security_old_inode_init_security? from incompatible pointer type [enabled by default] > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > ^ > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > include/linux/security.h:1896:5: note: expected ?const char **? but argument is of type ?char **? > int security_old_inode_init_security(struct inode *inode, struct inode *dir, > ^ > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > ^ > cc1: some warnings being treated as errors > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > make[1]: *** [modules] Error 1 > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > make: *** [Modules] Error 1 > -------------------------------------------------------- > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > -------------------------------------------------------- > mmbuildgpl: Command failed. Examine previous error messages to determine cause. > > Any help appreciated? 
> Son > > Son V Truong - Senior Storage Administrator > Advanced Computing Research Centre > IT Services, University of Bristol > Email: son.truong at bristol.ac.uk > Tel: Mobile: +44 (0) 7732 257 232 > Address: 31 Great George Street, Bristol, BS1 5QD > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From A.Wolf-Reber at de.ibm.com Fri Sep 6 12:41:32 2019 From: A.Wolf-Reber at de.ibm.com (Alexander Wolf) Date: Fri, 6 Sep 2019 11:41:32 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu>, Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609150.png Type: image/png Size: 1134 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609151.png Type: image/png Size: 6645 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Image.15677537609152.png Type: image/png Size: 1134 bytes Desc: not available URL: From Dugan.Witherick at warwick.ac.uk Fri Sep 6 13:25:22 2019 From: Dugan.Witherick at warwick.ac.uk (Witherick, Dugan) Date: Fri, 6 Sep 2019 12:25:22 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , Message-ID: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next PTFs on > both 4.2.3 and 5.0.3 streams. However we have seen issues in test that will > probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might work many > times but is quite a risky business. I highly recommend waiting for our > support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp > Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with non- > > fatal warnings at that version. It seems to run fine. The rest of the > > cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do you have a > > reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel modules on > > > RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include > > > Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed > > > make World ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > ./def.mk; exit > > > $? || exit 1 > > > rm -rf /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > mkdir /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib > > > rm -f //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 ARCH=x86_64 > > > M=/usr/lpp/mmfs/src/gpl-linux CONFIGDIR=/usr/lpp/mmfs/src/config ; \ > > > if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? has no > > > member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), (Int64)(&(iP->i_wb_list)), > > > (Int64)(iP->i_wb_list.next), (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition of macro > > > _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of macro > > > ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing argument 4 of > > > ?security_old_inode_init_security? from incompatible pointer type [enabled > > > by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? but > > > argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct inode > > > *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit declaration > > > of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. > > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. 
Examine previous error messages to determine > > > cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator > > > Advanced Computing Research Centre > > > IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From son.truong at bristol.ac.uk Fri Sep 6 15:15:04 2019 From: son.truong at bristol.ac.uk (Son Truong) Date: Fri, 6 Sep 2019 14:15:04 +0000 Subject: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 In-Reply-To: <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> References: <17B05E26-7F3B-4ADC-B1CA-5A37B7E16EFA@brown.edu> , <05bcf5cd48b9f5000a82f7440974275f98138661.camel@warwick.ac.uk> Message-ID: Thank you. Table 39 is most helpful. -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Witherick, Dugan Sent: 06 September 2019 13:25 To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Compiling gplbin on RHEL 7.7 Hi Son, You might also find Table 39 on https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html#fsm useful as it lists the minimum Spectrum Scale Level supported and tested against the RHEL Distribution/kernel version. Thanks, Dugan On Fri, 2019-09-06 at 11:41 +0000, Alexander Wolf wrote: > RHEL 7.7 is not supported by any Scale release at the moment. We are > qualifying it right now and would like to claim support with the next > PTFs on both 4.2.3 and 5.0.3 streams. However we have seen issues in > test that will probably cause delays. > > Picking up new minor RHEL updates before Scale claims support might > work many times but is quite a risky business. I highly recommend > waiting for our support statement. > > Mit freundlichen Gr??en / Kind regards > > > > > > Dr. Alexander Wolf-Reber > Spectrum Scale Release Lead Architect > Department M069 / Spectrum Scale Software Development > > +49-160-90540880 > a.wolf-reber at de.ibm.com > > IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: > Matthias Hartmann / Gesch?ftsf?hrung: Dirk Wittkopp Sitz der > Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB > 243294 > > > > > ----- Original message ----- > > From: david_johnson at brown.edu > > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > To: gpfsug main discussion list > > Cc: > > Subject: [EXTERNAL] Re: [gpfsug-discuss] Compiling gplbin on RHEL > > 7.7 > > Date: Fri, Sep 6, 2019 12:33 > > > > We are starting rolling upgrade to 5.0.3-x and gplbin compiles with > > non- fatal warnings at that version. It seems to run fine. The rest > > of the cluster is still at 4.2.3-10 but only at RHEL 7.6 kernel. Do > > you have a reason to not go for the latest release on either the 4- or 5- line? 
> > > > [root at xxx ~]# ssh node1301 rpm -q gpfs.base > > gpfs.base-4.2.3-10.x86_64 > > > > > > -- ddj > > Dave Johnson > > > > On Sep 6, 2019, at 5:48 AM, Son Truong wrote: > > > > > Hello, > > > > > > Has anyone successfully compiled the GPFS 4.2.3-7 gplbin kernel > > > modules on RHEL 7.7? > > > > > > I am failing with these errors: > > > > > > [root at host ~]# uname -a > > > Linux host 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC > > > 2019 > > > x86_64 x86_64 x86_64 GNU/Linux > > > > > > [root at host ~]# rpm -qa | grep gpfs > > > gpfs.base-4.2.3-7.x86_64 > > > gpfs.gskit-8.0.50-75.x86_64 > > > gpfs.ext-4.2.3-7.x86_64 > > > gpfs.msg.en_US-4.2.3-7.noarch > > > gpfs.docs-4.2.3-7.noarch > > > gpfs.gpl-4.2.3-7.noarch > > > > > > [root at host ~]# /usr/lpp/mmfs/bin/mmbuildgpl > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module begins at Fri Sep 6 09:30:20 UTC 2019. > > > -------------------------------------------------------- > > > Verifying Kernel Header... > > > kernel version = 31000999 (31000999000000, > > > 3.10.0-1062.el7.x86_64, > > > 3.10.0-1062) > > > module include dir = /lib/modules/3.10.0-1062.el7.x86_64/build/include > > > module build dir = /lib/modules/3.10.0-1062.el7.x86_64/build > > > kernel source dir = /usr/src/linux-3.10.0-1062.el7.x86_64/include > > > Found valid kernel header file under /usr/src/kernels/3.10.0- > > > 1062.el7.x86_64/include Verifying Compiler... > > > make is present at /bin/make > > > cpp is present at /bin/cpp > > > gcc is present at /bin/gcc > > > g++ is present at /bin/g++ > > > ld is present at /bin/ld > > > Verifying Additional System Headers... > > > Verifying kernel-headers is installed ... > > > Command: /bin/rpm -q kernel-headers > > > The required package kernel-headers is installed make World > > > ... > > > Verifying that tools to build the portability layer exist.... > > > cpp present > > > gcc present > > > g++ present > > > ld present > > > cd /usr/lpp/mmfs/src/config; /usr/bin/cpp -P def.mk.proto > > > > ./def.mk; exit $? || exit 1 rm -rf /usr/lpp/mmfs/src/include > > > /usr/lpp/mmfs/src/bin /usr/lpp/mmfs/src/lib mkdir > > > /usr/lpp/mmfs/src/include /usr/lpp/mmfs/src/bin > > > /usr/lpp/mmfs/src/lib rm -f > > > //usr/lpp/mmfs/src/gpl-linux/gpl_kernel.tmp.ver > > > cleaning (/usr/lpp/mmfs/src/ibm-kxi) > > > make[1]: Entering directory `/usr/lpp/mmfs/src/ibm-kxi' > > > rm -f trcid.h ibm_kxi.trclst > > > > > > [cut] > > > > > > Invoking Kbuild... > > > /usr/bin/make -C /usr/src/kernels/3.10.0-1062.el7.x86_64 > > > ARCH=x86_64 M=/usr/lpp/mmfs/src/gpl-linux > > > CONFIGDIR=/usr/lpp/mmfs/src/config ; \ if [ $? 
-ne 0 ]; then \ > > > exit 1;\ > > > fi > > > make[2]: Entering directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > LD /usr/lpp/mmfs/src/gpl-linux/built-in.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracelin.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/tracedev-ksyms.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/ktrccalls.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/relaytrc.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/tracedev.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/mmfsmod.o > > > LD [M] /usr/lpp/mmfs/src/gpl-linux/mmfs26.o > > > CC [M] /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o > > > In file included from /usr/lpp/mmfs/src/gpl-linux/dir.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?printInode?: > > > /usr/lpp/mmfs/src/gpl-linux/trcid.h:1212:57: error: ?struct inode? > > > has no member named ?i_wb_list? > > > _TRACE6D(_HOOKWORD(TRCID_PRINTINODE_8), > > > (Int64)(&(iP->i_wb_list)), (Int64)(iP->i_wb_list.next), > > > (Int64)(iP->i_wb_list.prev), (Int64)(&(iP- > > > >i_lru)), (Int64)(iP->i_lru.next), (Int64)(iP->i_lru.prev)); > > > ^ > > > /usr/lpp/mmfs/src/include/cxi/Trace.h:395:23: note: in definition > > > of macro _TRACE_MACRO? > > > { _TR_BEFORE; _ktrc; KTRCOPTCODE; _TR_AFTER; } else NOOP > > > > > > [ cut ] > > > > > > ^ > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:169:3: note: in expansion of > > > macro ?TRACE6? > > > TRACE6(TRACE_VNODE, 3, TRCID_PRINTINODE_8, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:63:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c: In function ?cxiInitInodeSecurity?: > > > /usr/lpp/mmfs/src/gpl-linux/inode.c:4358:3: warning: passing > > > argument 4 of ?security_old_inode_init_security? from incompatible > > > pointer type [enabled by default] > > > rc = SECURITY_INODE_INIT_SECURITY(iP, parentP, &dentryP->d_name, > > > ^ > > > In file included from /usr/lpp/mmfs/src/include/gpl-linux/verdep.h:50:0, > > > from /usr/lpp/mmfs/src/include/gpl-linux/linux2gpfs.h:61, > > > from /usr/lpp/mmfs/src/gpl-linux/dir.c:56, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:58, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > include/linux/security.h:1896:5: note: expected ?const char **? > > > but argument is of type ?char **? > > > int security_old_inode_init_security(struct inode *inode, struct > > > inode *dir, > > > ^ > > > In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:75:0, > > > from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:55: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c: In function ?cache_get_name?: > > > /usr/lpp/mmfs/src/gpl-linux/cxiCache.c:695:5: error: implicit > > > declaration of function ?vfs_readdir? [-Werror=implicit-function-declaration] > > > error = vfs_readdir(fileP, (filldir_t)filldir_one, &buffer); > > > ^ > > > cc1: some warnings being treated as errors > > > make[3]: *** [/usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1 > > > make[2]: *** [_module_/usr/lpp/mmfs/src/gpl-linux] Error 2 > > > make[2]: Leaving directory `/usr/src/kernels/3.10.0-1062.el7.x86_64' > > > make[1]: *** [modules] Error 1 > > > make[1]: Leaving directory `/usr/lpp/mmfs/src/gpl-linux' > > > make: *** [Modules] Error 1 > > > -------------------------------------------------------- > > > mmbuildgpl: Building GPL module failed at Fri Sep 6 09:30:28 UTC 2019. 
> > > -------------------------------------------------------- > > > mmbuildgpl: Command failed. Examine previous error messages to > > > determine cause. > > > > > > Any help appreciated? > > > Son > > > > > > Son V Truong - Senior Storage Administrator Advanced Computing > > > Research Centre IT Services, University of Bristol > > > Email: son.truong at bristol.ac.uk > > > Tel: Mobile: +44 (0) 7732 257 232 > > > Address: 31 Great George Street, Bristol, BS1 5QD > > > > > > > > _______________________________________________ > > > gpfsug-discuss mailing list > > > gpfsug-discuss at spectrumscale.org > > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Robert.Oesterlin at nuance.com Fri Sep 6 16:42:39 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 6 Sep 2019 15:42:39 +0000 Subject: [gpfsug-discuss] SSUG Meeting at SC19: Save the date and call for user talks! Message-ID: The Spectrum Scale User group will hold its annual meeting at SC19 on Sunday November 17th from 12:30PM -6PM In Denver, Co. We will be posting exact meeting location soon, but reserve this time. IBM will host a reception following the user group meeting. We?re also looking for user talks - these are short update (20 mins or so) on your use of Spectrum Scale - any topics are welcome. If you are interested, please contact myself or Kristy Kallback-Rose. Looking forward to seeing everyone in Denver! Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From bipcuds at gmail.com Mon Sep 9 21:29:28 2019 From: bipcuds at gmail.com (Keith Ball) Date: Mon, 9 Sep 2019 16:29:28 -0400 Subject: [gpfsug-discuss] Anyone have experience with changing NSD server node name in an ESS/DSS cluster? Message-ID: Hi All, We are thinking of attempting a non-destructive change of NSD server node names in a Lenovo DSS cluster (DSS level 1.2a, which has Scale 4.2.3.5). For a non-GNR cluster, changing a node name for an NSD server isn't a huge deal if you can have a backup server serve up disks; one can mmdelnode then mmaddnode, for instance. Has anyone tried to rename the NSD servers in a GNR cluster, however? I am not sure if it's as easy as failing over the recovery group, and deleting/adding the NSD server. It's easy enough to modify xcat. Perhaps mmchrecoverygroup can be used to change the RG names (since they are named after the NSD servers), but that might not be necessary. Or, it might not work - does anyone know if there is a special process to change NSD server names in an E( or D or G)SS cluster that does not run afoul of GNR or upgrade scripts? Best regards, Keith -------------- next part -------------- An HTML attachment was scrubbed... 
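No first-hand experience with renaming GNR NSD servers to offer here, but before attempting it (or opening a case), it is worth snapshotting the recovery-group-to-server mapping so any change can be verified afterwards. A read-only sketch; <rgname> is a placeholder for each recovery group reported by the first command:

  mmlscluster                     # current node names / admin node names
  mmlsrecoverygroup               # recovery groups and their current servers
  mmlsrecoverygroup <rgname> -L   # per-RG detail, repeat for each RG
  mmlsnsd -X                      # NSD-to-server and device mapping

  # Moving an RG between existing servers is done with
  #   mmchrecoverygroup <rgname> --servers <primary>,<backup>
  # but whether the server *name* itself can change without a delete/add
  # of the node is exactly the open question above, so treat this as
  # inspection only and confirm the procedure with IBM/Lenovo support.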
URL: From TROPPENS at de.ibm.com Wed Sep 11 13:20:22 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Wed, 11 Sep 2019 14:20:22 +0200 Subject: [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjvilla at nccs.nasa.gov Wed Sep 11 20:14:12 2019 From: jjvilla at nccs.nasa.gov (John J. Villa) Date: Wed, 11 Sep 2019 15:14:12 -0400 (EDT) Subject: [gpfsug-discuss] Introduction - New Subscriber Message-ID: Hello, My name is John Villa. I work for NASA at the Nasa Center for Climate Simulation. We currently utilize GPFS as the primary filesystem on the discover cluster: https://www.nccs.nasa.gov/systems/discover I look forward to seeing everyone at SC19. Thank You, -- John J. Villa NASA Center for Climate Simulation Discover Systems Administrator From damir.krstic at gmail.com Thu Sep 12 15:16:03 2019 From: damir.krstic at gmail.com (Damir Krstic) Date: Thu, 12 Sep 2019 09:16:03 -0500 Subject: [gpfsug-discuss] VerbsReconnectThread waiters Message-ID: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Thu Sep 12 16:10:58 2019 From: george at markomanolis.com (George Markomanolis) Date: Thu, 12 Sep 2019 11:10:58 -0400 Subject: [gpfsug-discuss] Call for Submission for the IO500 List Message-ID: Call for Submission *Deadline*: 10 November 2019 AoE The IO500 is now accepting and encouraging submissions for the upcoming 5th IO500 list revealed at SC19 in Denver, Colorado. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage submission of small scale results. The new ranked lists will be announced at our SC19 BoF [2]. We hope to see you, and your results, there. We have updated our submission rules [3]. This year, we will have a new list for the Student Cluster Competition as IO500 is used for extra points during this competition The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC19! Please note that submissions of all sizes are welcome; the site has customizable sorting so it is possible to submit on a small system and still get a very good per-client score for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. 
Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017, published its first list at SC17, and has grown exponentially since then. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: 1. Maximizing simplicity in running the benchmark suite 2. Encouraging complexity in tuning for performance 3. Allowing submitters to highlight their ?hero run? performance numbers 4. Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured however possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound. Finally, it includes a namespace search as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well-measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: 1. Gather historical data for the sake of analysis and to aid predictions of storage futures 2. Collect tuning information to share valuable performance optimizations across the community 3. Encourage vendors and designers to optimize for workloads beyond ?hero runs? 4. Establish bounded expectations for users, procurers, and administrators 10 Node I/O Challenge At SC, we will continue the 10 Node Challenge. This challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly *10 computes nodes* must be used to run the benchmark (one exception is the find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. We will announce the result in a separate derived list and in the full list but not on the ranked IO500 list at io500.org. Birds-of-a-feather Once again, we encourage you to submit [1], to join our community, and to attend our BoF ?The IO500 and the Virtual Institute of I/O? at SC19, November 19th, 12:15-1:15pm, room 205-207, where we will announce the new IO500 list, the 10 node challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have. [1] http://io500.org/submission [2] *https://www.vi4io.org/io500/bofs/sc19/start * [3] https://www.vi4io.org/io500/rules/submission The IO500 committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Sep 12 20:19:20 2019 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 12 Sep 2019 12:19:20 -0700 Subject: [gpfsug-discuss] Hold the Date - September 23 and 24 - REGISTRATION CLOSING SOON In-Reply-To: <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> References: <3F2B08E9-C6E3-412B-9308-D79E3480C5DA@lbl.gov> <938EC571-B900-42BC-8465-3E666912533F@lbl.gov> Message-ID: Reminder, registration closing on 9/16 EOB. That?s real soon now. Hope to see you there. Details below. 
> On Aug 29, 2019, at 7:30 PM, Kristy Kallback-Rose wrote: > > Hello, > > You will now find the nearly complete agenda here: > > https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ > > As noted before, the event is free, but please do register below to help with catering planning. > > You can find more information about the full HPCXXL event here: http://hpcxxl.org/ > > Any questions let us know. Hope to see you there! > > -Kristy > >> On Jul 2, 2019, at 10:45 AM, Kristy Kallback-Rose > wrote: >> >> Hello, >> >> HPCXXL will be hosted by NERSC (Berkeley, CA) this September. As part of this event, there will be approximately a day and a half on GPFS content. We have done this type of event in the past, and as before, the GPFS days will be free to attend, but you do need to register. >> >> We?ll have more details soon, mark your calendars. >> >> Initial details: https://www.spectrumscaleug.org/event/spectrum-scale-gpfs-days-part-of-hpcxxl/ >> >> Best, >> Kristy > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Greg.Lehmann at csiro.au Fri Sep 13 09:48:58 2019 From: Greg.Lehmann at csiro.au (Lehmann, Greg (IM&T, Pullenvale)) Date: Fri, 13 Sep 2019 08:48:58 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects Message-ID: Hi All, I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? Cheers, Greg Lehmann Senior High Performance Data Specialist Data Services | Scientific Computing Platforms Information Management and Technology | CSIRO Greg.Lehmann at csiro.au | +61 7 3327 4137 | 1 Technology Court, Pullenvale, QLD 4069 CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. Please consider the environment before printing this email. CSIRO Australia's National Science Agency | csiro.au -------------- next part -------------- An HTML attachment was scrubbed... URL: From david_johnson at brown.edu Fri Sep 13 10:14:06 2019 From: david_johnson at brown.edu (david_johnson at brown.edu) Date: Fri, 13 Sep 2019 05:14:06 -0400 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: References: Message-ID: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Restarting subnet manager in general is fairly harmless. It will cause a heavy sweep of the fabric when it comes back up, but there should be no LID renumbering. Traffic may be held up during the scanning and rebuild of the routing tables. Losing a subnet manager for a period of time would prevent newly booted nodes from receiving a LID but existing nodes will continue to function. 
Adding or deleting inter-switch links should probably be avoided if the subnet manager is down. I would also avoid changing the routing algorithm while in production. Moving a non ha subnet manager from primary to backup and back again has worked for us without disruption, but I would try to do this in a maintenance window. -- ddj Dave Johnson > On Sep 13, 2019, at 4:48 AM, Lehmann, Greg (IM&T, Pullenvale) wrote: > > Hi All, > I was wondering what effect restarting the subnet manager has on an active Spectrum Scale filesystem. Is there any scope for data loss or corruption? A 2nd similar scenario of slightly longer duration is failover to a secondary subnet manager because the primary has crashed. What effect would that have on the filesystem? > > Cheers, > > Greg Lehmann > Senior High Performance Data Specialist > Data Services | Scientific Computing Platforms > Information Management and Technology | CSIRO > Greg.Lehmann at csiro.au | +61 7 3327 4137 | > 1 Technology Court, Pullenvale, QLD 4069 > > CSIRO acknowledges the Traditional Owners of the land, sea and waters, of the area that we live and work on across Australia. We acknowledge their continuing connection to their culture and we pay our respects to their Elders past and present. > > The information contained in this email may be confidential or privileged. Any unauthorised use or disclosure is prohibited. If you have received this email in error, please delete it immediately and notify the sender by return email. Thank you. To the extent permitted by law, CSIRO does not represent, warrant and/or guarantee that the integrity of this communication has been maintained or that the communication is free of errors, virus, interception or interference. > > Please consider the environment before printing this email. > > CSIRO Australia?s National Science Agency | csiro.au > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Sep 13 10:48:52 2019 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 13 Sep 2019 09:48:52 +0000 Subject: [gpfsug-discuss] infiniband fabric instability effects In-Reply-To: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> References: <21DA886A-1CCA-4C88-B35E-624006C70534@brown.edu> Message-ID: On Fri, 2019-09-13 at 05:14 -0400, david_johnson at brown.edu wrote: [SNIP] > Moving a non ha subnet manager from primary to backup and back again > has worked for us without disruption, but I would try to do this in a > maintenance window. > Not on GPFS but in the past I have moved from one subnet manager to another with dozens of running MPI jobs, and Lustre running over the fabric and not missed a beat. My current cluster used 10 and 40Gbps ethernet for GPFS with Omnipath exclusively for MPI traffic. To be honest I just cannot wrap my head around the idea that you would not be running two subnet managers in the first place. Just fire up two subnet managers (whether on a switch or a node) and forget about it. They will automatically work together to give you a HA solution. It is the same with Omnipath too. I would also note that you can fire up more than two fabric managers and it all "just works". 
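For completeness, a minimal sketch of bringing up an additional host-based subnet manager follows (it assumes the opensm package and its systemd unit are present; the election priority mechanism differs between host-based and switch-embedded SMs, so treat the config path as an example):

   # start a standby subnet manager on this node and keep it across reboots
   systemctl enable --now opensm

   # optionally check its election priority (sm_priority, 0-15, higher wins)
   grep sm_priority /etc/opensm/opensm.conf

   # confirm that exactly one SM on the fabric reports itself as master
   sminfo
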
If it where me and I didn't have fabric managers running on at least two of my switches and I was doing GPFS over Infiniband, I would fire up fabric managers on all of my NSD servers. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From heinrich.billich at id.ethz.ch Fri Sep 13 15:56:07 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Fri, 13 Sep 2019 14:56:07 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Message-ID: Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* From ewahl at osc.edu Fri Sep 13 16:42:30 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Fri, 13 Sep 2019 15:42:30 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: I recall looking at this a year or two back. Ganesha is either v4 and v6 both (ie: the encapsulation you see), OR ipv4 ONLY. (ie: /etc/modprobe.d/ipv6.conf disable=1) Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Billich Heinrich Rainer (ID SD) Sent: Friday, September 13, 2019 10:56 AM To: gpfsug main discussion list Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level? Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From jam at ucar.edu Fri Sep 13 17:07:01 2019 From: jam at ucar.edu (Joseph Mendoza) Date: Fri, 13 Sep 2019 10:07:01 -0600 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: References: Message-ID: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: > On my cluster I have seen couple of long waiters such as this: > > gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more > seconds, reason: delaying for next reconnect attempt > > I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. > > Is this something to pay attention to, and what does this waiter mean? > > Thank you. > Damir > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Mon Sep 16 08:12:09 2019 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Mon, 16 Sep 2019 09:12:09 +0200 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From scale at us.ibm.com Mon Sep 16 10:33:58 2019 From: scale at us.ibm.com (IBM Spectrum Scale) Date: Mon, 16 Sep 2019 17:33:58 +0800 Subject: [gpfsug-discuss] VerbsReconnectThread waiters In-Reply-To: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> References: <0b4f23f6-e862-c45a-ce72-7ea3ee0f1067@ucar.edu> Message-ID: Damir, Joseph, > Is this something to pay attention to, and what does this waiter mean? This waiter means GPFS fails to reconnect broken verbs connection, which can cause performance degradation. 
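For reference, a quick way to spot the condition from the command line is sketched below (mmfsadm is a service-level tool, so treat its output with the usual caution; "mmfsadm test verbs conn" is the same command mentioned earlier in this thread):

   # long-running reconnect waiters on the local node
   mmdiag --waiters | grep -i VerbsReconnect

   # current RDMA connections and their state
   mmfsadm test verbs conn
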
> I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again. > Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. This is a code bug which is fixed through internal defect 1090669. It will be backport to service releases after verification. There is a work-around which can fix this problem without a restart. - On nodes which have this waiter list, run command 'mmfsadm test breakconn all 744' 744 is E_RECONNECT, which triggers tcp reconnect and will not cause node leave/rejoin. Its side effect clears RDMA connections and their incorrect status. Regards, The Spectrum Scale (GPFS) team ------------------------------------------------------------------------------------------------------------------ If you feel that your question can benefit other users of Spectrum Scale (GPFS), then please post it to the public IBM developerWroks Forum at https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479. If your query concerns a potential software error in Spectrum Scale (GPFS) and you have an IBM software maintenance contract please contact 1-800-237-5511 in the United States or your local IBM Service Center in other countries. The forum is informally monitored as time permits and should not be used for priority messages to the Spectrum Scale (GPFS) team. From: Joseph Mendoza To: gpfsug-discuss at spectrumscale.org Date: 2019/09/14 12:08 AM Subject: [EXTERNAL] Re: [gpfsug-discuss] VerbsReconnectThread waiters Sent by: gpfsug-discuss-bounces at spectrumscale.org I have seen these on our cluster after the IB network goes down (GPFS still runs over ethernet) and then comes back up.? They will retry forever it seems, even after the IB is healthy again.? The effect they seem to have is that verbs connections between some nodes breaks and GPFS uses ethernet/ipoib instead.? You may see messages in your mmfs.log.latest about verbs being disabled "due to too many errors".? You can also see fewer verbs connections between nodes in "mmfsadm test verbs conn" output. Restarting GPFS on the nodes with waiters has fixed the issue for me, I don't know if IBM has any other tricks to fix this without a restart. --Joey On 9/12/19 8:16 AM, Damir Krstic wrote: On my cluster I have seen couple of long waiters such as this: gss01: Waiting 16.8543 sec since 09:07:02, ignored, thread 46230 VerbsReconnectThread: delaying for 43.145624000 more seconds, reason: delaying for next reconnect attempt I tried searching on gpfs wiki for this type of waiter, but was unable to find anything of value. Is this something to pay attention to, and what does this waiter mean? Thank you. Damir _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=WoT3TYlCvAM8RQxUISD9L6UzqY0I_ffCJTS-UHhw8z4&s=18A0j0Zmp8OwZ6Y6cc3HFe3OgFZRHIv8OeJcBpkaPwQ&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From alvise.dorigo at psi.ch Mon Sep 16 13:58:03 2019 From: alvise.dorigo at psi.ch (Dorigo Alvise (PSI)) Date: Mon, 16 Sep 2019 12:58:03 +0000 Subject: [gpfsug-discuss] Can 5-minutes frequent lsscsi command disrupt GPFS I/O on a Lenovo system ? Message-ID: <83A6EEB0EC738F459A39439733AE80452BEA85FE@MBX214.d.ethz.ch> Hello folks, recently I observed that calling every 5 minutes the command "lsscsi -g" on a Lenovo I/O node (a X3650 M5 connected to D3284 enclosures, part of a DSS-G220 system) can seriously compromise the GPFS I/O performance. (The motivation of running lsscsi every 5 minutes is a bit out of topic, but I can explain on request). What we observed is that there were several GPFS waiters telling that flushing caches to physical disk was impossible and they had to wait (possibly going in timeout). Is this something expected and/or observed by someone else in this community ? Thanks Regards, Alvise Dorigo -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewahl at osc.edu Mon Sep 16 15:50:24 2019 From: ewahl at osc.edu (Wahl, Edward) Date: Mon, 16 Sep 2019 14:50:24 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: , Message-ID: What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From cblack at nygenome.org Mon Sep 16 15:55:34 2019 From: cblack at nygenome.org (Christopher Black) Date: Mon, 16 Sep 2019 14:55:34 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: On our recent ESS systems we do not see /etc/tuned/scale/tuned.conf (or script.sh) owned by any package (rpm -qif ?). I?ve attached what we have on our ESS 5.3.3 systems. Best, Chris From: on behalf of "Wahl, Edward" Reply-To: gpfsug main discussion list Date: Monday, September 16, 2019 at 10:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? What package provides this /usr/lib/tuned/ file? Ed ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Olaf Weiser Sent: Monday, September 16, 2019 3:12 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? 
[root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: tuned.conf Type: application/octet-stream Size: 2859 bytes Desc: tuned.conf URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: script.sh Type: application/octet-stream Size: 270 bytes Desc: script.sh URL: From heinrich.billich at id.ethz.ch Mon Sep 16 16:49:57 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 16 Sep 2019 15:49:57 +0000 Subject: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? In-Reply-To: References: Message-ID: <766AA5C3-46BD-4B91-9D1E-52BC5FAB90A8@id.ethz.ch> Hello Olaf, Thank you, so we?ll try to get rid of IPv6. Actually we do have this settings active but I may have to add them to the initrd file, too. (See https://access.redhat.com/solutions/8709#?rhel7disable) to prevent ganesha from opening an IPv6 socket. It?s probably no big issue if ganesha uses IPv4overIPv6 for all connections, but to keep things simple I would like to avoid it. @Edward We got /etc/tuned/scale/tuned.conf with GSS/xCAT. I?m not sure whether it?s part of any rpm. Cheers, Heiner From: on behalf of Olaf Weiser Reply to: gpfsug main discussion list Date: Monday, 16 September 2019 at 09:12 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Hallo Heiner, usually, Spectrum Scale comes with a tuned profile (named scale) .. [root at nsd01 ~]# tuned-adm active Current active profile: scale in there [root at nsd01 ~]# cat /etc/tuned/scale/tuned.conf | tail -3 # Disable IPv6 net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 [root at nsd01 ~]# depending on .... what you need to achieve .. one might be forced to changed that.. e.g. for RoCE .. you need IPv6 to be active ... but for all other scenarios with SpectrumScale (at least what I'm aware of right now) ... IPv6 can be disabled... 
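A possible way to persist those sysctls, including for early boot as the Red Hat article referenced above discusses, is sketched here (the file name is illustrative; whether the initramfs rebuild is actually required depends on the distribution and on the approach chosen in that article):

   printf '%s\n' 'net.ipv6.conf.all.disable_ipv6 = 1' \
                 'net.ipv6.conf.default.disable_ipv6 = 1' > /etc/sysctl.d/90-disable-ipv6.conf
   sysctl --system    # apply immediately
   dracut -f          # rebuild the initramfs so early boot picks up the same setting
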
From: "Billich Heinrich Rainer (ID SD)" To: gpfsug main discussion list Date: 09/13/2019 05:02 PM Subject: [EXTERNAL] [gpfsug-discuss] Ganesha all IPv6 sockets - ist this to be expected? Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hello, I just noted that our ganesha daemons offer IPv6 sockets only, IPv4 traffic gets encapsulated. But all traffic to samba is IPv4, smbd offers both IPv4 and IPv6 sockets. I just wonder whether this is to be expected? Protocols support IPv4 only, so why running on IPv6 sockets only for ganesha? Did we configure something wrong and should completely disable IPv6 on the kernel level Any comment is welcome Cheers, Heiner -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== I did check with ss -l -t -4 ss -l -t -6 add -p to get the process name, too. do you get the same results on your ces nodes? [root at nas22ces04-i config_samples]# ss -l -t -4 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 8192 *:gpfs *:* LISTEN 0 50 *:netbios-ssn *:* LISTEN 0 128 *:5355 *:* LISTEN 0 128 *:sunrpc *:* LISTEN 0 128 *:ssh *:* LISTEN 0 100 127.0.0.1:smtp *:* LISTEN 0 10 10.250.135.24:4379 *:* LISTEN 0 128 *:32765 *:* LISTEN 0 50 *:microsoft-ds *:* [root at nas22ces04-i config_samples]# ss -l -t -6 State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 :::32767 :::* LISTEN 0 128 :::32768 :::* LISTEN 0 128 :::32769 :::* LISTEN 0 128 :::2049 :::* LISTEN 0 128 :::5355 :::* LISTEN 0 50 :::netbios-ssn :::* LISTEN 0 128 :::sunrpc :::* LISTEN 0 128 :::ssh :::* LISTEN 0 128 :::32765 :::* LISTEN 0 50 :::microsoft-ds :::* _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Sep 16 18:34:07 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 16 Sep 2019 17:34:07 +0000 Subject: [gpfsug-discuss] SSUG @ SC19 Update: Scheduling and Sponsorship Opportunities Message-ID: Two months until SC19 and the schedule is starting to come together, with a great mix of technical updates and user talks. I would like highlight a few items for you to be aware of: - Morning session: We?re currently trying to put together a morning ?new users? session for those new to Spectrum Scale. These talks would be focused on fundamentals and give an opportunity to ask questions. We?re tentatively thinking about starting around 9:30-10 AM on Sunday November 17th. Watch the mailing list for updates and on the http://spectrumscale.org site. - Sponsorships: We?re looking for sponsors. If your company is an IBM partner, uses/incorporates Spectrum Scale - please contact myself or Kristy Kallback-Rose. We are looking for sponsors to help with lunch (YES - we?d like to serve lunch this year!) and WiFi access during the user group meeting. Looking forward to seeing you all at SC19. Registration link coming soon, watch here: https://www.spectrumscaleug.org/event/spectrum-scale-user-group-meeting-sc19/ Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From S.J.Thompson at bham.ac.uk Wed Sep 18 18:56:29 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 18 Sep 2019 17:56:29 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 Message-ID: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. (sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Thu Sep 19 11:44:46 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 10:44:46 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... 
URL: From heinrich.billich at id.ethz.ch Thu Sep 19 15:20:53 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Thu, 19 Sep 2019 14:20:53 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Message-ID: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Hello, Is it usual to see 200?000-400?000 open files for a single ganesha process? Or does this indicate that something ist wrong? We have some issues with ganesha (on spectrum scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200?000-400?000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1?000 ? 10?000 open files by ganesha only and don?t show the issue. If someone could explain how ganesha decides which files to keep open and which to close that would help, too. As NFSv3 is stateless the client doesn?t open/close a file, it?s the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open file? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc//fd/ . With several 100k entries I failed to do a ?ls -ls? to list all the symbolic links, hence I can?t relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Z?rich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From frederik.ferner at diamond.ac.uk Thu Sep 19 15:30:45 2019 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 19 Sep 2019 15:30:45 +0100 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > ?reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? 
We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files ?by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From S.J.Thompson at bham.ac.uk Thu Sep 19 16:18:47 2019 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 19 Sep 2019 15:18:47 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: on behalf of "abeattie at au1.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by anychance? regards Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we?ve been having some issues with some of our POWER9 systems. They are occasionally handing or rebooting, in one case, we?ve found we can cause them to do it by running some MPI IOR workload to GPFS. Every instance we?ve seen which has logged something to syslog has had mmfsd referenced, but we don?t know if that is a symptom or a cause. 
(sometimes they just hang and we don?t see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I?ve raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? Its multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs ? (but maybe it?s a symptom rather than cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaineni at in.ibm.com Thu Sep 19 19:38:53 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Thu, 19 Sep 2019 18:38:53 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Ganesha_daemon_has_400=27000_open_file?= =?utf-8?q?s_-_is_this=09unusual=3F?= In-Reply-To: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... 
URL: From abeattie at au1.ibm.com Thu Sep 19 22:34:33 2019 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Thu, 19 Sep 2019 21:34:33 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk> References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Sep 19 23:41:08 2019 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 19 Sep 2019 22:41:08 +0000 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade Message-ID: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From TROPPENS at de.ibm.com Fri Sep 20 09:08:01 2019 From: TROPPENS at de.ibm.com (Ulf Troppens) Date: Fri, 20 Sep 2019 10:08:01 +0200 Subject: [gpfsug-discuss] Agenda and registration link // Oct 10 - Spectrum Scale NYC User Meeting Message-ID: Draft agenda and registration link are now available: https://www.spectrumscaleug.org/event/spectrum-scale-nyc-user-meeting-2019/ -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Forwarded by Ulf Troppens/Germany/IBM on 20/09/2019 09:37 ----- From: "Ulf Troppens" To: gpfsug main discussion list Date: 11/09/2019 14:27 Subject: [EXTERNAL] [gpfsug-discuss] Save the date: Oct 10 - Spectrum Scale NYC User Meeting Sent by: gpfsug-discuss-bounces at spectrumscale.org Greetings, NYU Langone and IBM will host a Spectrum Scale User Meeting on October 10. Many senior engineers of our development lab in Poughkeepsie will attend and present. Details with agenda, exact location and registration link will follow. Best Ulf -- IBM Spectrum Scale Development - Client Engagements & Solutions Delivery Consulting IT Specialist Author "Storage Networks Explained" IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Matthias Hartmann Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=kZaabFheMr5-INuBtDMnDjxzZMuvvQ-K0cx1FAfh4lg&m=I3TzCv5SKxKb51eAL_blo-XwctX64z70ayrZKERanWA&s=OSKGngwXAoOemFy3HkctexuIpBJQu8NPeTkC_MMQBks&e= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rohwedder at de.ibm.com Fri Sep 20 10:14:58 2019 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Fri, 20 Sep 2019 11:14:58 +0200 Subject: [gpfsug-discuss] Leftover GUI events after ESS upgrade In-Reply-To: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> References: <777F74C6-2670-4030-92AF-A739B2514862@nuance.com> Message-ID: Hello Bob, this event is a "Notice": You can use the action "Mark Selected Notices as Read" or "Mark All Notices as Read"in the GUI Event Groups or Individual Events grid. Notice events are transient by nature and don't imply a permanent state change of an entity. It seems that during the upgrade, mmhealth had probed the pdisk and the disk hospital was diagnosing the pdisk at this time, but eventually disk hospital placed the pdisk back to normal state, Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 162 4159920 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Oesterlin, Robert" To: gpfsug main discussion list Date: 20.09.2019 00:53 Subject: [EXTERNAL] [gpfsug-discuss] Leftover GUI events after ESS upgrade Sent by: gpfsug-discuss-bounces at spectrumscale.org I just upgraded to ESS 5.3.4-1, and during the process these appeared. They only show up in the GUI. They don?t appear in gnrhelathcheck or mmhealth. pdisk checks are clearAny idea how to get rid of them? GSSIO1-HS GNR pdisk rg_gssio1-hs/n001v001 is diagnosing GSSIO1-HS GNR pdisk rg_gssio2-hs/n001v002 is diagnosing Bob Oesterlin Sr Principal Storage Engineer, Nuance _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwICAg&c=jf_iaSHvJObTbx-siA1ZOg&r=IbxtjdkPAM2Sbon4Lbbi4w&m=hLyf83U0otjISdpV5zl1cSCPVFFUF61ny3jWvv-5kNQ&s=ptMGcpNhnRTogPO2CN_l6jhC-vCN-VQAf53HmRLQDq8&e= -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 14525383.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From heinrich.billich at id.ethz.ch Mon Sep 23 10:33:02 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 09:33:02 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <9D53BE88-A5FC-469F-9362-F2EC67E393B7@id.ethz.ch> Hello Frederik, Thank you. I now see a similar behavior: Ganesha has 500k open files while the node is suspended since 2+hours. I would expect that some cleanup job does remove most of the open FD after a much shorter while. Our systems have an upper limit of 1M open files per process and these spectrum scale settings: ! maxFilesToCache 1048576 ! maxStatCache 2097152 Our ganesha version is 2.5.3. (gpfs.nfs-ganesha-2.5.3-ibm036.10.el7). I don't see the issue with gpfs.nfs-ganesha-2.5.3-ibm030.01.el7. But this second cluster also has a different load pattern. 
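For anyone who wants to reproduce these counts, a cheap way to do it without listing several hundred thousand symlinks with "ls -l" is sketched below (the process match pattern is an example):

   pid=$(pgrep -f gpfs.ganesha.nfsd | head -n1)
   ls /proc/$pid/fd | wc -l                    # open descriptors held by ganesha
   grep 'Max open files' /proc/$pid/limits     # the limit the process actually runs with
   mmlsconfig maxFilesToCache                  # the related Scale caching setting
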
I did also post my initial question to the ganesha mailing list and want to share the reply I've got from Daniel Gryniewicz. Cheers, Heiner Daniel Gryniewicz So, it's not impossible, based on the workload, but it may also be a bug. For global FDs (All NFSv3 and stateless NFSv4), we obviously cannot know when the client closes the FD, and opening/closing all the time causes a large performance hit. So, we cache open FDs. All handles in MDCACHE live on the LRU. This LRU is divided into 2 levels. Level 1 is more active handles, and they can have open FDs. Various operation can demote a handle to level 2 of the LRU. As part of this transition, the global FD on that handle is closed. Handles that are actively in use (have a refcount taken on them) are not eligible for this transition, as the FD may be being used. We have a background thread that runs, and periodically does this demotion, closing the FDs. This thread runs more often when the number of open FDs is above FD_HwMark_Percent of the available number of FDs, and runs constantly when the open FD count is above FD_Limit_Percent of the available number of FDs. So, a heavily used server could definitely have large numbers of FDs open. However, there have also, in the past, been bugs that would either keep the FDs from being closed, or would break the accounting (so they were closed, but Ganesha still thought they were open). You didn't say what version of Ganesha you're using, so I can't tell if one of those bugs apply. Daniel ?On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case it exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation. I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. 
> > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From heinrich.billich at id.ethz.ch Mon Sep 23 11:43:06 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Mon, 23 Sep 2019 10:43:06 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <72079C31-1E3E-4F69-B428-480620466353@id.ethz.ch> Hello Malhal, Thank you. Actually I don?t see the parameter Cache_FDs in our ganesha config. But when I trace LRU processing I see that almost no FDs get released. And the number of FDs given in the log messages doesn?t match what I see in /proc//fd/. I see 512k open files while the logfile give 600k. Even 4hours since the I suspended the node and all i/o activity stopped I see 500k open files and LRU processing doesn?t close any of them. This looks like a bug in gpfs.nfs-ganesha-2.5.3-ibm036.10.el7. I?ll open a case with IBM. We did see gansha to fail to open new files and hence client requests to fail. I assume that 500K FDs compared to 10K FDs as before create some notable overhead for ganesha, spectrum scale and kernel and withdraw resources from samba. I?ll post to the list once we got some results. 
Cheers, Heiner Start of LRU processing 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51350 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1027 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1027 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51400 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1028 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1028 closing 0 descriptors End of log 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1029 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1029 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :formeropen=607025 totalwork=0 workpass=51500 totalclosed:6 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Reaping up to 50 entries from lane 1030 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run_lane :INODE LRU :DEBUG :Actually processed 50 entries on lane 1030 closing 0 descriptors 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :DEBUG :After work, open_fd_count:607024 count:29503718 fdrate:1908874353 threadwait=9 2019-09-23 11:37:30 : epoch 00100524 : nas12ces01 : gpfs.ganesha.nfsd-100816[cache_lru] lru_run :INODE LRU :F_DBG :currentopen=607024 futility=0 totalwork=51550 biggest_window=335544 extremis=0 lanes=1031 fds_lowat=167772 From: on behalf of Malahal R Naineni Reply to: gpfsug main discussion list Date: Thursday, 19 September 2019 at 20:39 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? NFSv3 doesn't have open/close requests, so nfs-ganesha opens a file for read/write when there is an NFSv3 read/write request. It does cache file descriptors, so its open count can be very large. If you have 'Cache_FDs = true" in your config, ganesha aggressively caches file descriptors. 
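One way to see what the CES-managed NFS configuration currently contains for these cache-related options is sketched below (parameter names vary between Ganesha releases, so the filter may need adjusting and may return nothing):

   # dump the active ganesha configuration managed by CES
   mmnfs config list

   # narrow it down to FD-cache related settings, if present
   mmnfs config list | egrep -i 'cache_fds|fd_hwmark|fd_limit'
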
Taking traces with COMPONENT_CACHE_INODE_LRU level set to full debug should give us better insight on what is happening when the open file descriptors count is very high. When the I/O failure happens or when the open fd count is high, you could do the following: 1. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU FULL_DEBUG 2. wait for 90 seconds, then run 3. ganesha_mgr set_log COMPONENT_CACHE_INODE_LRU EVENT Regards, Malahal. ----- Original message ----- From: "Billich Heinrich Rainer (ID SD)" Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: [EXTERNAL] [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? Date: Thu, Sep 19, 2019 7:51 PM Hello, Is it usual to see 200'000-400'000 open files for a single ganesha process? Or does this indicate that something is wrong? We have some issues with ganesha (on Spectrum Scale protocol nodes) reporting NFS3ERR_IO in the log. I noticed that the affected nodes have a large number of open files, 200'000-400'000 open files per daemon (and 500 threads and about 250 client connections). Other nodes have 1'000 - 10'000 open files by ganesha only and don't show the issue. If someone could explain how ganesha decides which files to keep open and which to close, that would help, too. As NFSv3 is stateless the client doesn't open/close a file, so it's up to the server to decide when to close it? We do have a few NFSv4 clients, too. Are there certain access patterns that can trigger such a large number of open files? Maybe traversing and reading a large number of small files? Thank you, Heiner I did count the open files by counting the entries in /proc/<pid of ganesha>/fd/ . With several 100k entries I failed to do a 'ls -ls' to list all the symbolic links, hence I can't relate the open files to different exports easily. I did post this to the ganesha mailing list, too. -- ======================= Heinrich Billich ETH Zürich Informatikdienste Tel.: +41 44 632 72 56 heinrich.billich at id.ethz.ch ======================== _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From heinrich.billich at id.ethz.ch Tue Sep 24 09:52:34 2019 From: heinrich.billich at id.ethz.ch (Billich Heinrich Rainer (ID SD)) Date: Tue, 24 Sep 2019 08:52:34 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> Message-ID: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Hello Frederik, Just some addition, maybe it's of interest to someone: the maximum number of open files for Ganesha is derived from maxFilesToCache. It is 80% of maxFilesToCache, clamped to a lower limit of 2'000 and an upper limit of 1M. The active setting is visible in /etc/sysconfig/ganesha. Cheers, Heiner On 19.09.19, 16:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Frederik Ferner" wrote: Heiner, we are seeing similar issues with CES/ganesha NFS, in our case exclusively with NFSv3 clients. What is maxFilesToCache set to on your ganesha node(s)? In our case ganesha was running into the limit of open file descriptors because maxFilesToCache was set at a low default and for now we've increased it to 1M. It seemed that ganesha was never releasing files even after clients unmounted the file system. We've only recently made the change, so we'll see how much that improved the situation.
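(Heiner's note above - the Ganesha descriptor limit being 80% of maxFilesToCache, clamped to between 2'000 and 1M - can be cross-checked on a CES protocol node with something like the sketch below. The clamp is as described in the mail; the NOFILE variable name in /etc/sysconfig/ganesha may differ between releases, so treat the grep as a hint rather than a guarantee:)

    # value configured in the cluster
    mmlsconfig maxFilesToCache

    # expected ganesha limit: 80% of maxFilesToCache, clamped to [2000, 1000000]
    mftc=1000000                        # substitute the value reported above
    exp=$(( mftc * 80 / 100 ))
    [ "$exp" -lt 2000 ] && exp=2000
    [ "$exp" -gt 1000000 ] && exp=1000000
    echo "expected ganesha fd limit: $exp"

    # what was written out for the service, and what the running daemon got
    grep -i nofile /etc/sysconfig/ganesha
    grep -i "open files" /proc/$(pgrep -o -f gpfs.ganesha.nfsd)/limits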
I thought we had a reproducer but after our recent change, I can now no longer successfully reproduce the increase in open files not being released. Kind regards, Frederik On 19/09/2019 15:20, Billich Heinrich Rainer (ID SD) wrote: > Hello, > > Is it usual to see 200?000-400?000 open files for a single ganesha > process? Or does this indicate that something ist wrong? > > We have some issues with ganesha (on spectrum scale protocol nodes) > reporting NFS3ERR_IO in the log. I noticed that the affected nodes > have a large number of open files, 200?000-400?000 open files per daemon > (and 500 threads and about 250 client connections). Other nodes have > 1?000 ? 10?000 open files by ganesha only and don?t show the issue. > > If someone could explain how ganesha decides which files to keep open > and which to close that would help, too. As NFSv3 is stateless the > client doesn?t open/close a file, it?s the server to decide when to > close it? We do have a few NFSv4 clients, too. > > Are there certain access patterns that can trigger such a large number > of open file? Maybe traversing and reading a large number of small files? > > Thank you, > > Heiner > > I did count the open files by counting the entries in /proc/ ganesha>/fd/ . With several 100k entries I failed to do a ?ls -ls? to > list all the symbolic links, hence I can?t relate the open files to > different exports easily. > > I did post this to the ganesha mailing list, too. > > -- > > ======================= > > Heinrich Billich > > ETH Z?rich > > Informatikdienste > > Tel.: +41 44 632 72 56 > > heinrich.billich at id.ethz.ch > > ======================== > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From valdis.kletnieks at vt.edu Tue Sep 24 21:41:07 2019 From: valdis.kletnieks at vt.edu (Valdis Kl=?utf-8?Q?=c4=93?=tnieks) Date: Tue, 24 Sep 2019 16:41:07 -0400 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? 
In-Reply-To: <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> References: <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch> <280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: <269692.1569357667@turing-police> On Tue, 24 Sep 2019 08:52:34 -0000, "Billich Heinrich Rainer (ID SD)" said: > Just some addition, maybe it's of interest to someone: the maximum number of open > files for Ganesha is derived from maxFilesToCache. It is 80% of maxFilesToCache, > clamped to a lower limit of 2'000 and an upper limit of 1M. The active setting is visible in > /etc/sysconfig/ganesha. Note that strictly speaking, the values in /etc/sysconfig are in general the values that will be used at the next restart - it's totally possible for the system to boot, the then-current values to be picked up from /etc/sysconfig, and then any number of things, from configuration automation tools like Ansible, to a co-worker sysadmin armed with nothing but /usr/bin/vi, to have changed the values without you knowing about it and the daemons not to have been restarted yet... (Let's just say that in 4 decades of doing this stuff, I've been surprised by that sort of thing a few times. :) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 832 bytes Desc: not available URL: From mnaineni at in.ibm.com Wed Sep 25 18:06:18 2019 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Wed, 25 Sep 2019 17:06:18 +0000 Subject: [gpfsug-discuss] Ganesha daemon has 400'000 open files - is this unusual? In-Reply-To: <269692.1569357667@turing-police> References: <269692.1569357667@turing-police>, <819CAAD3-FB8B-4FF1-B017-45A4C48A0BCE@id.ethz.ch><280DF857-C2EA-4B1D-BBB4-4986C3DC1C93@id.ethz.ch> Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: att6j9ca.dat Type: application/octet-stream Size: 849 bytes Desc: not available URL: From L.R.Sudbery at bham.ac.uk Thu Sep 26 10:38:09 2019 From: L.R.Sudbery at bham.ac.uk (Luke Sudbery) Date: Thu, 26 Sep 2019 09:38:09 +0000 Subject: [gpfsug-discuss] GPFS and POWER9 In-Reply-To: References: <878CB977-1C05-4167-81D6-DED62790182C@bham.ac.uk>, <2271395E-1767-49D0-9EAE-5F8891682AA0@bham.ac.uk> Message-ID: <3b15db460ac1459e9ca53bec00f30833@bham.ac.uk> We think our issue was down to NUMA settings actually - making mmfsd allocate GPU memory. Makes sense given the type of error. Tomer suggested to Simon we set numactlOption to "0 8", as per: https://www-01.ibm.com/support/docview.wss?uid=isg1IJ02794 Our tests are not crashing since setting them - we need to roll it out on all nodes to confirm it's fixed all our hangs/reboots. Cheers, Luke -- Luke Sudbery Architecture, Infrastructure and Systems Advanced Research Computing, IT Services Room 132, Computer Centre G5, Elms Road Please note I don't work on Monday and work from home on Friday.
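(For reference, a sketch of how that NUMA pinning can be inspected and applied. The "0 8" value is taken from Luke's mail and the linked APAR; the exact mmchconfig syntax, the node class name and the assumption that GPU memory appears as additional NUMA nodes on these AC922-style systems are all unverified here, so confirm against the APAR and your Scale level before using it:)

    # list NUMA nodes; on the POWER9 boxes discussed here the CPUs/DIMMs are
    # expected on nodes 0 and 8, with GPU memory appearing as extra nodes
    numactl -H

    # restrict mmfsd to the CPU/DIMM nodes only ('power9_nodes' is a
    # placeholder node class; option name/format per the APAR above)
    mmchconfig numactlOption="0 8" -N power9_nodes

    # takes effect when GPFS is restarted on those nodes; verify with
    mmdiag --config | grep -i numactl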
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of abeattie at au1.ibm.com Sent: 19 September 2019 22:35 To: gpfsug-discuss at spectrumscale.org Cc: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, I have an open support call that required Red Hat to create a kernel patch for RH 7.6 because of issues with the Intel x710 network adapter - I can't tell you if it's related to your issue or not, but it would cause the GPFS cluster to reboot and the affected node to reboot if we tried to do almost anything with that Intel adapter. Regards, Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list > Cc: Subject: [EXTERNAL] Re: [gpfsug-discuss] GPFS and POWER9 Date: Fri, Sep 20, 2019 1:18 AM Hi Andrew, Yes, but not only. We use the two SFP+ ports from the Broadcom supplied card + the bifurcated Mellanox card in them. Simon From: > on behalf of "abeattie at au1.ibm.com" > Reply-To: "gpfsug-discuss at spectrumscale.org" > Date: Thursday, 19 September 2019 at 11:45 To: "gpfsug-discuss at spectrumscale.org" > Subject: Re: [gpfsug-discuss] GPFS and POWER9 Simon, are you using Intel 10Gb Network Adapters with RH 7.6 by any chance? Regards, Andrew Beattie File and Object Storage Technical Specialist - A/NZ IBM Systems - Storage Phone: 614-2133-7927 E-mail: abeattie at au1.ibm.com ----- Original message ----- From: Simon Thompson > Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" > Cc: Subject: [EXTERNAL] [gpfsug-discuss] GPFS and POWER9 Date: Thu, Sep 19, 2019 8:42 PM Recently we've been having some issues with some of our POWER9 systems. They are occasionally hanging or rebooting; in one case, we've found we can cause them to do it by running an MPI IOR workload against GPFS. Every instance we've seen which has logged something to syslog has had mmfsd referenced, but we don't know if that is a symptom or a cause.
(sometimes they just hang and we don't see such a message) We see the following in the kern log: Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: #011Unknown Malfunction Alert of type 3 Sep 18 18:45:14 bear-pg0306u11a kernel: Hypervisor Maintenance interrupt [Recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: Error detail: Malfunction Alert Sep 18 18:45:14 bear-pg0306u11a kernel: #011HMER: 8040000000000000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [00000000115a2478] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Load/Store] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000003002a2a8400 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c016590000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001150b160] PID: 141380 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001150b160 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c01fe80000 Sep 18 18:45:14 bear-pg0306u11a kernel: Severe Machine check interrupt [Not recovered] Sep 18 18:45:14 bear-pg0306u11a kernel: NIP: [000000001086a7f0] PID: 25926 Comm: mmfsd Sep 18 18:45:14 bear-pg0306u11a kernel: Initiator: CPU Sep 18 18:45:14 bear-pg0306u11a kernel: Error type: UE [Instruction fetch] Sep 18 18:45:14 bear-pg0306u11a kernel: Effective address: 000000001086a7f0 Sep 18 18:45:14 bear-pg0306u11a kernel: Physical address: 000003c00fe70000 Sep 18 18:45:14 bear-pg0306u11a kernel: mmfsd[25926]: unhandled signal 7 at 000000001086a7f0 nip 000000001086a7f0 lr 000000001086a7f0 code 4 I've raised a hardware ticket with IBM, as traditionally a machine check exception would likely be a hardware/firmware issue. Anyone else seen this sort of behaviour? It's multiple boxes doing this, but they do all have the same firmware/rhel/gpfs stack installed. Asking here as they always reference mmfsd PIDs - (but maybe it's a symptom rather than a cause)? Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.mattsson at maxiv.lu.se Thu Sep 26 10:55:45 2019 From: andreas.mattsson at maxiv.lu.se (Andreas Mattsson) Date: Thu, 26 Sep 2019 09:55:45 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions Message-ID: Hi, We have a data analysis application that isn't running well at all in our AFM caches - it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system - so I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though.
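(Before drawing conclusions it may be worth confirming what the cache fileset is actually running with. A minimal sketch; 'fs1' and 'cachefset' are placeholder filesystem and fileset names:)

    # on the cache cluster: show the AFM attributes in effect for the fileset,
    # including the refresh intervals and afmRefreshAsync if set there
    mmlsfileset fs1 cachefset --afm -L

    # and whether afmRefreshAsync is set cluster-wide (only listed if non-default)
    mmlsconfig | grep -i afmrefreshasync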
The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se -------------- next part -------------- An HTML attachment was scrubbed... URL: From vpuvvada at in.ibm.com Fri Sep 27 09:23:13 2019 From: vpuvvada at in.ibm.com (Venkateswara R Puvvada) Date: Fri, 27 Sep 2019 13:53:13 +0530 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: Hi, Both storage and client clusters have to be on 5.0.3.x to get the AFM revalidation performance with afmRefreshAsync. What are the refresh intervals? You could also try increasing them. Is this config option set at fileset level or cluster level? ~Venkat (vpuvvada at in.ibm.com) From: Andreas Mattsson To: GPFS User Group Date: 09/26/2019 03:26 PM Subject: [EXTERNAL] [gpfsug-discuss] afmRefreshAsync questions Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We have a data analysis application that isn't running well at all in our AFM caches - it runs 4-6 times slower on an AFM cache than on a non-AFM fileset on the same storage system - so I wanted to try out the afmRefreshAsync feature that came with 5.0.3 to see if it is the cache data refresh that is holding things up. Enabling this feature has had zero impact on performance of the software though. The storage cluster is running 5.0.3.x, and afmRefreshAsync has been set there, but at the moment the remote-mounting client cluster is still running 5.0.2.x. Would this feature still have any effect in this setup? Regards, Andreas Mattsson ____________________________________________ Andreas Mattsson Systems Engineer MAX IV Laboratory Lund University P.O. Box 118, SE-221 00 Lund, Sweden Visiting address: Fotongatan 2, 224 84 Lund Mobile: +46 706 64 95 44 andreas.mattsson at maxiv.lu.se www.maxiv.se _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 4232 bytes Desc: not available URL: From sakkuma4 at in.ibm.com Fri Sep 27 11:31:42 2019 From: sakkuma4 at in.ibm.com (Saket Kumar11) Date: Fri, 27 Sep 2019 10:31:42 +0000 Subject: [gpfsug-discuss] afmRefreshAsync questions In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL:
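(A sketch of the two knobs Venkat mentions above, at fileset and at cluster scope. The filesystem/fileset names are placeholders, the interval values are purely illustrative, and whether afmRefreshAsync is accepted at fileset level depends on the release, so check the documentation for your level first; per Venkat's note, both clusters need to be on 5.0.3.x for the async revalidation to help:)

    # relax the revalidation intervals for one cache fileset (values in seconds)
    mmchfileset fs1 cachefset -p afmFileLookupRefreshInterval=120
    mmchfileset fs1 cachefset -p afmDirLookupRefreshInterval=120

    # enable asynchronous revalidation for just this fileset, if supported
    mmchfileset fs1 cachefset -p afmRefreshAsync=yes

    # or set it cluster-wide instead
    mmchconfig afmRefreshAsync=yes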