From chris.schlipalius at pawsey.org.au Mon Oct 1 06:53:06 2018 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Mon, 01 Oct 2018 13:53:06 +0800 Subject: [gpfsug-discuss] Upcoming meeting: Australian Spectrum Scale Usergroup 15th October 2018 Melbourne Message-ID: <676180C3-1B36-4D25-8325-532AF15C6552@pawsey.org.au> Dear members, Please note the next Australian Usergroup is confirmed. If you plan to attend, please register: http://bit.ly/2wHGuhY Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 13 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 10709 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Tue Oct 2 09:12:28 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 2 Oct 2018 08:12:28 +0000 Subject: [gpfsug-discuss] LDAP in GUI / 5.0.2 Message-ID: Hi all, >From the release notes: "5.0.2: Added option to configure an external authentication method to manage the GUI user access in the Services > GUI page. " Does this mean I should be able to configure LDAP through the GUI because at the moment I'm not seeing any relevant options. Running 5.0.2 DME and minReleaseLevel=latest. Do I need to restart ALL nodes for this to take effect, or have I misunderstood the meaning of the above? Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Oct 2 09:27:02 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 2 Oct 2018 10:27:02 +0200 Subject: [gpfsug-discuss] LDAP in GUI / 5.0.2 In-Reply-To: References: Message-ID: Hello Richard, I am sorry, it seems that the release notes document were note refreshed with the latest information. The GUI pages to modify external user authentication for GUI users have not made it into the 5.0.2 release. The Knowledge center is correct in this respect: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1xx_soc.htm Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 02.10.2018 10:12 Subject: [gpfsug-discuss] LDAP in GUI / 5.0.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, >From the release notes: ?5.0.2: Added option to configure an external authentication method to manage the GUI user access in the Services > GUI page. ? Does this mean I should be able to configure LDAP through the GUI because at the moment I?m not seeing any relevant options. Running 5.0.2 DME and minReleaseLevel=latest. Do I need to restart ALL nodes for this to take effect, or have I misunderstood the meaning of the above? Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C467306.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Tue Oct 2 09:44:23 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 2 Oct 2018 08:44:23 +0000 Subject: [gpfsug-discuss] LDAP in GUI / 5.0.2 In-Reply-To: References: Message-ID: Alright, thanks for clearing that up. Cheers Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Markus Rohwedder Sent: 02 October 2018 09:27 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] LDAP in GUI / 5.0.2 Hello Richard, I am sorry, it seems that the release notes document were note refreshed with the latest information. The GUI pages to modify external user authentication for GUI users have not made it into the 5.0.2 release. The Knowledge center is correct in this respect: https://www.ibm.com/support/knowledgecenter/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1xx_soc.htm Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development ________________________________ Phone: +49 7034 6430190 IBM Deutschland Research & Development [cid:image002.png at 01D45A34.7F2A60F0] E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ [Inactive hide details for "Sobey, Richard A" ---02.10.2018 10:12:51---Hi all, From the release notes:]"Sobey, Richard A" ---02.10.2018 10:12:51---Hi all, From the release notes: From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 02.10.2018 10:12 Subject: [gpfsug-discuss] LDAP in GUI / 5.0.2 Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, From the release notes: ?5.0.2: Added option to configure an external authentication method to manage the GUI user access in the Services > GUI page. ? Does this mean I should be able to configure LDAP through the GUI because at the moment I?m not seeing any relevant options. Running 5.0.2 DME and minReleaseLevel=latest. Do I need to restart ALL nodes for this to take effect, or have I misunderstood the meaning of the above? Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 166 bytes Desc: image001.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image002.png Type: image/png Size: 4659 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 105 bytes Desc: image003.gif URL: From Renar.Grunenberg at huk-coburg.de Tue Oct 2 11:49:33 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Tue, 2 Oct 2018 10:49:33 +0000 Subject: [gpfsug-discuss] V5.0.2 and Maxblocksize Message-ID: <796971E1-7AC1-40E1-BB4E-879C704DA054@huk-coburg.de> Hallo Spectrumscale-team, We installed the new Version 5.0.2 and had the hope that the maxblocksize Parameter are online changeable. But dont. 
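For reference, a rough sketch of the offline procedure we would like to avoid (a sketch only; the 16M value is just an example, and the requirement that GPFS be stopped on all nodes is per the mmchconfig documentation):

    # The current value can be displayed at any time
    mmlsconfig maxblocksize

    # Changing it still requires the daemon to be down on every node in the cluster
    mmshutdown -a
    mmchconfig maxblocksize=16M
    mmstartup -a
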
Are there a timeframe when this 24/7 gap are fixed. The Problem here we can not shuting down the complete Cluster. Regards Renar Von meinem iPhone gesendet Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ======================================================================= HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ======================================================================= Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ======================================================================= From sandeep.patil at in.ibm.com Wed Oct 3 16:18:06 2018 From: sandeep.patil at in.ibm.com (Sandeep Ramesh) Date: Wed, 3 Oct 2018 15:18:06 +0000 Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q3 2018) In-Reply-To: References: Message-ID: Dear User Group Members, In continuation, here are list of development blogs in the this quarter (Q3 2018). We now have over 100+ developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the emailing list. How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 
5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? 
Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? 
https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Thu Oct 4 10:05:57 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 4 Oct 2018 09:05:57 +0000 Subject: [gpfsug-discuss] V5.0.2 and maxblocksize Message-ID: <3cc9ab310d6d42009f779ac0b1967a53@SMXRF105.msg.hukrf.de> Hallo All, i put a requirement for these gap. Link is here: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125603 Please Vote. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Oct 4 20:54:48 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 Oct 2018 19:54:48 +0000 Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) Message-ID: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Hi All, What does it mean if I have a few dozen very long I/O?s (50 - 75 seconds) on a gateway as reported by ?mmdiag ?iohist? and they all reference two of my eight NSD servers? ? but then I go to those 2 NSD servers and I don?t see any long I/O?s at all? In other words, if the problem (this time) were the backend storage, I should see long I/O?s on the NSD servers, right? I?m thinking this indicates that there is some sort of problem with either the client gateway itself or the network in between the gateway and the NSD server(s) ? thoughts??? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
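For completeness, this is roughly what is being compared on both sides (a sketch; mmdiag only reports on the node it runs on, so the same command is repeated on the gateway and on each suspect NSD server):

    # On the client gateway: recent I/O history with per-I/O latencies
    mmdiag --iohist

    # On each of the two NSD servers named in those entries: the same view from the server side
    mmdiag --iohist

    # Long outstanding threads/RPCs on either side often show where the wait really is
    mmdiag --waiters

If the server-side iohist shows normal disk times while the client sees 50 - 75 second I/Os, the delay is being added somewhere between the two, which points at the client, the network path, or RPC queuing rather than the backend storage.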
URL: From jjdoherty at yahoo.com Thu Oct 4 20:58:19 2018 From: jjdoherty at yahoo.com (Jim Doherty) Date: Thu, 4 Oct 2018 19:58:19 +0000 (UTC) Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: <2043390893.1272.1538683099673@mail.yahoo.com> It could mean a shortage of nsd server threads?? or a congested network.?? Jim On Thursday, October 4, 2018, 3:55:10 PM EDT, Buterbaugh, Kevin L wrote: Hi All, What does it mean if I have a few dozen very long I/O?s (50 - 75 seconds) on a gateway as reported by ?mmdiag ?iohist? and they all reference two of my eight NSD servers? ? but then I go to those 2 NSD servers and I don?t see any long I/O?s at all? In other words, if the problem (this time) were the backend storage, I should see long I/O?s on the NSD servers, right? I?m thinking this indicates that there is some sort of problem with either the client gateway itself or the network in between the gateway and the NSD server(s) ? thoughts??? Thanks in advance? ?Kevin Buterbaugh - Senior System AdministratorVanderbilt University - Advanced Computing Center for Research and EducationKevin.Buterbaugh at vanderbilt.edu?- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Oct 4 21:00:21 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 4 Oct 2018 16:00:21 -0400 Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: My first guess would be the network between the NSD client and NSD server. netstat and ethtool may help to determine where the cause may lie, if it is on the NSD client. Obviously a switch on the network could be another source of the problem. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 10/04/2018 03:55 PM Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, What does it mean if I have a few dozen very long I/O?s (50 - 75 seconds) on a gateway as reported by ?mmdiag ?iohist? and they all reference two of my eight NSD servers? ? but then I go to those 2 NSD servers and I don?t see any long I/O?s at all? In other words, if the problem (this time) were the backend storage, I should see long I/O?s on the NSD servers, right? I?m thinking this indicates that there is some sort of problem with either the client gateway itself or the network in between the gateway and the NSD server(s) ? thoughts??? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
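A minimal version of those netstat/ethtool checks on the NSD client (the interface name eth0 is a placeholder for the data NIC; the counters of interest are errors, drops and TCP retransmits):

    # Link-level health of the data interface
    ethtool -S eth0 | egrep -i 'err|drop|disc'
    ip -s link show eth0

    # TCP-level symptoms of a lossy or congested path
    netstat -s | egrep -i 'retrans|timeout'

Rising retransmit or drop counters while the long I/Os are occurring would support the network theory; clean counters push the suspicion back toward the client itself or the NSD server queues.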
URL: From martinsworkmachine at gmail.com Thu Oct 4 21:05:53 2018 From: martinsworkmachine at gmail.com (J Martin Rushton) Date: Thu, 4 Oct 2018 21:05:53 +0100 Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: <651fe07d-e745-e844-2f9b-44fd78ccee24@gmail.com> I saw something similar a good few years ago (ie on an older version of GPFS).? IIRC the issue was one of contention: one or two served nodes were streaming IOs to/from the NSD servers and as a result other nodes were exhibiting insane IO times.? Can't be more helpful though, I no longer have access to the system. Regards, J Martin Rushton MBCS On 04/10/18 20:54, Buterbaugh, Kevin L wrote: > Hi All, > > What does it mean if I have a few dozen very long I/O?s (50 - 75 > seconds) on a gateway as reported by ?mmdiag ?iohist? and they all > reference two of my eight NSD servers? > > ? but then I go to those 2 NSD servers and I don?t see any long I/O?s > at all? > > In other words, if the problem (this time) were the backend storage, I > should see long I/O?s on the NSD servers, right? > > I?m thinking this indicates that there is some sort of problem with > either the client gateway itself or the network in between the gateway > and the NSD server(s) ? thoughts??? > > Thanks in advance? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu > ?- (615)875-9633 > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 14:38:21 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 13:38:21 +0000 Subject: [gpfsug-discuss] Pmsensors and gui Message-ID: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Tue Oct 9 14:43:09 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 9 Oct 2018 09:43:09 -0400 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: Adding GUI personnel to respond. 
Lyle From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/09/2018 09:41 AM Subject: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler $1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Tue Oct 9 14:54:51 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 9 Oct 2018 13:54:51 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. --------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? 
First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.koeninger at de.ibm.com Tue Oct 9 15:03:41 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Tue, 9 Oct 2018 14:03:41 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Oct 9 15:56:14 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 9 Oct 2018 16:56:14 +0200 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: Hello Simon, the performance collector collects data from each node with the "hostname" as in /bin/hostname as key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set identical to be "hostname" on all nodes, the mapping will not succeed, So you will have to use unique hostnames on all cluster nodes. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Sobey, Richard A" To: gpfsug main discussion list Date: 09.10.2018 16:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. 
--------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler $1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17486462.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 15:56:24 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 14:56:24 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: <320AAE68-5F40-48B7-97CF-DA0029DB76C2@bham.ac.uk> Yes we do indeed have: 127.0.0.1 localhost.localdomain localhost I saw a post on the list, but never the answer ? (I don?t think!) Simon From: on behalf of "andreas.koeninger at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 October 2018 at 15:04 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Pmsensors and gui Hi Simon, For your fist issue regarding the PM_MONITOR task, you may have hit a known issue. Please check if the following applies to your environment. I will get back to you for the second issue. 
-------------------- Solution: For this to fix, the customer should change the /etc/hosts entry for the 127.0.0.1 as follows: from current: 127.0.0.1 localhost.localdomain localhost to this: 127.0.0.1 localhost localhost.localdomain -------------------- Mit freundlichen Gr??en / Kind regards Andreas Koeninger Scrum Master and Software Developer / Spectrum Scale GUI and REST API IBM Systems &Technology Group, Integrated Systems Development / M069 ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] Pmsensors and gui Date: Tue, Oct 9, 2018 3:42 PM Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 15:59:35 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 14:59:35 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> We do ? Its just the node is joined to the cluster as ?hostname1-data.cluster?, but it also has a primary (1GbE link) as ?hostname.cluster?? Simon From: on behalf of "rohwedder at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 October 2018 at 15:56 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Pmsensors and gui Hello Simon, the performance collector collects data from each node with the "hostname" as in /bin/hostname as key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set identical to be "hostname" on all nodes, the mapping will not succeed, So you will have to use unique hostnames on all cluster nodes. Mit freundlichen Gr??en / Kind regards Dr. 
Markus Rohwedder Spectrum Scale GUI Development ________________________________ Phone: +49 7034 6430190 IBM Deutschland Research & Development [cid:2__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@] E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ [Inactive hide details for "Sobey, Richard A" ---09.10.2018 16:00:32---I can help with the first one as I had the issue a few we]"Sobey, Richard A" ---09.10.2018 16:00:32---I can help with the first one as I had the issue a few weeks ago. The answer from support is below, From: "Sobey, Richard A" To: gpfsug main discussion list Date: 09.10.2018 16:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. --------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 46 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image002.png Type: image/png Size: 4660 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 106 bytes Desc: image003.gif URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 20:37:59 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 19:37:59 +0000 Subject: [gpfsug-discuss] Protocols protocols ... Message-ID: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> So we have both SMB and NFS enabled in our cluster. For various reasons we want to only run SMB on some nodes and only run NFS on other nodes? We have used mmchnode to set the nodes into different groups and then have IP addresses associated with those groups which we want to use for SMB and NFS. All seems OK so far ? Now comes the problem, I can?t see a way to tell CES that group1 should run NFS and group2 SMB. We thought we had this cracked by removing the gpfs.smb packages from NFS nodes and ganesha from SMB nodes. Seems to work OK, EXCEPT ? sometimes nodes go into failed state, and it looks like this is because the SMB state is failed on the NFS only nodes ? This looks to me like GPFS is expecting protocol packages to be installed for both NFS and SMB. I worked out I can clear the failed state by running mmces service stop SMB -N node. The docs mention attributes, but I don?t see that they are used other than when running object? Any thoughts/comments/links to a doc page I missed? Or is it expected that both smb and nfs packages are required to be installed on all protocol nodes even if not being used on that node? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 9 21:34:43 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Oct 2018 21:34:43 +0100 Subject: [gpfsug-discuss] Protocols protocols ... In-Reply-To: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> References: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> Message-ID: On 09/10/18 20:37, Simon Thompson wrote: [SNIP] > > Any thoughts/comments/links to a doc page I missed? Or is it expected > that both smb and nfs packages are required to be installed on all > protocol nodes even if not being used on that node? > As a last resort could you notionally let them do both and fix it with iptables so they only appear to the outside world to be running one or the other? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From kkr at lbl.gov Tue Oct 9 22:39:23 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 14:39:23 -0700 Subject: [gpfsug-discuss] TO BE RESCHEDULED [was] - Re: Request for Enhancements (RFE) Forum - Submission Deadline October 1 In-Reply-To: <841FA5CA-5C6B-4626-8137-BA5994C3A651@bham.ac.uk> References: <52220937-CE0A-4949-89A0-6EA41D5ECF93@lbl.gov> <263e53c18647421f8b3cd936da0075df@jumptrading.com> <0341213A-6CB7-434F-A575-9099C2D0D703@spectrumscale.org> <585b21e7-d437-380f-65d8-d24fa236ce3b@nasa.gov> <841FA5CA-5C6B-4626-8137-BA5994C3A651@bham.ac.uk> Message-ID: Due to scheduling conflicts we need to reschedule the RFE meeting that was to happen tomorrow, October 10th. We received RFEs from 2 sites (NASA and Sloan Kettering), if you sent one and it was somehow missed. Please respond here, and we?ll pick up privately as follow up. More soon. 
Best, Kristy > On Sep 28, 2018, at 6:44 AM, Simon Thompson wrote: > > There is a limit on votes, not submissions. i.e. your site gets three votes, so you can't have three votes and someone else from Goddard also have three. > > We have to review the submissions, so as you say 10 we'd think unreasonable and skip, but a sensible number is OK. > > Simon > > ?On 28/09/2018, 13:52, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister" wrote: > > Hi Kristy, > > At some point I thought I'd read there was a per-site limit of the > number of RFEs that could be submitted but I can't find it skimming > through email. I'd think submitting 10 would be unreasonable but would 2 > or 3 be OK? > > -Aaron > > On 9/27/18 4:35 PM, Kristy Kallback-Rose wrote: >> Reminder, the*October 1st* deadline is approaching. We?re looking for at >> least a few RFEs (Requests For Enhancements) for this first forum, so if >> you?re interesting in promoting your RFE please reach out to one of us, >> or even here on the list. >> >> Thanks, >> Kristy >> >>> On Sep 7, 2018, at 3:00 AM, Simon Thompson (Spectrum Scale User Group >>> Chair) > wrote: >>> >>> GPFS/Spectrum Scale Users, >>> Here?s a long-ish note about our plans to try and improve the RFE >>> process. We?ve tried to include a tl;dr version if you just read the >>> headers. You?ll find the details underneath ;-) and reading to the end >>> is ideal. >>> >>> IMPROVING THE RFE PROCESS >>> As you?ve heard on the list, and at some of the in-person User Group >>> events, we?ve been talking about ways we can improve the RFE process. >>> We?d like to begin having an RFE forum, and have it be de-coupled from >>> the in-person events because we know not everyone can travel. >>> LIGHTNING PRESENTATIONS ON-LINE >>> In general terms, we?d have regular on-line events, where RFEs could >>> be/very briefly/(5 minutes, lightning talk) presented by the >>> requester. There would then be time for brief follow-on discussion >>> and questions. The session would be recorded to deal with large time >>> zone differences. >>> The live meeting is planned for October 10^th 2018, at 4PM BST (that >>> should be 11am EST if we worked is out right!) >>> FOLLOW UP POLL >>> A poll, independent of current RFE voting, would be conducted a couple >>> days after the recording was available to gather votes and feedback >>> on the RFEs submitted ?we may collect site name, to see how many votes >>> are coming from a certain site. >>> >>> MAY NOT GET IT RIGHT THE FIRST TIME >>> We view this supplemental RFE process as organic, that is, we?ll learn >>> as we go and make modifications. The overall goal here is to highlight >>> the RFEs that matter the most to the largest number of UG members by >>> providing a venue for people to speak about their RFEs and collect >>> feedback from fellow community members. >>> >>> *RFE PRESENTERS WANTED, SUBMISSION DEADLINE OCTOBER 1ST >>> *We?d like to guide a small handful of RFE submitters through this >>> process the first time around, so if you?re interested in being a >>> presenter, let us know now. We?re planning on doing the online meeting >>> and poll for the first time in mid-October, so the submission deadline >>> for your RFE is October 1st. If it?s useful, when you?re drafting your >>> RFE feel free to use the list as a sounding board for feedback. Often >>> sites have similar needs and you may find someone to collaborate with >>> on your RFE to make it useful to more sites, and thereby get more >>> votes. 
Some guidelines are here: >>> https://drive.google.com/file/d/1o8nN39DTU32qj_EFia5wRhnvfvNfr3cI/view?usp=sharing >>> You can submit you RFE by email to:rfe at spectrumscaleug.org >>> >>> >>> *PARTICIPANTS (AKA YOU!!), VIEW AND VOTE >>> *We are seeking very good participation in the RFE on-line events >>> needed to make this an effective method of Spectrum Scale Community >>> and IBM Developer collaboration. * It is to your benefit to >>> participate and help set priorities on Spectrum Scale enhancements!! >>> *We want to make this process light lifting for you as a participant. >>> We will limit the duration of the meeting to 1 hour to minimize the >>> use of your valuable time. >>> >>> Please register for the online meeting via Eventbrite >>> (https://www.eventbrite.com/e/spectrum-scale-request-for-enhancements-voting-tickets-49979954389) >>> ? we?ll send details of how to join the online meeting nearer the time. >>> >>> Thanks! >>> >>> Simon, Kristy, Bob, Bryan and Carl! >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atspectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kkr at lbl.gov Wed Oct 10 03:08:16 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 19:08:16 -0700 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 Message-ID: Hello, Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. Thanks, Kristy From kkr at lbl.gov Wed Oct 10 03:13:36 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 19:13:36 -0700 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: References: Message-ID: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> PS - If you?ve already contacted me about talking can you please ping me again? I?m drowning in stuff-to-do sauce. Thanks, Kristy > On Oct 9, 2018, at 7:08 PM, Kristy Kallback-Rose wrote: > > Hello, > > Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. > > Thanks, > Kristy From rohwedder at de.ibm.com Wed Oct 10 09:24:58 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 10 Oct 2018 10:24:58 +0200 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> Message-ID: Hello Simon, not sure if the answer solved your question from the response, Even if nodes can be externally resolved by unique hostnames, applications that run on the host use the /bin/hostname binary or the hostname() call to identify the node they are running on. This is the case with the performance collection sensor. 
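For illustration only (a sketch, not from the original mail; it assumes mmdsh is usable from an admin node and that "all" is an acceptable node specification in your cluster), one quick way to spot hostname clashes is to compare what /bin/hostname returns on every node:

    # list the hostname each cluster node reports; a name that appears for more than one node is a clash
    /usr/lpp/mmfs/bin/mmdsh -N all /bin/hostname | sort
    # if mmdsh prefixes each line with "nodename:", this shows only the duplicated hostnames
    /usr/lpp/mmfs/bin/mmdsh -N all /bin/hostname | awk '{print $2}' | sort | uniq -d
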
So you need to set the hostname of the hosts using /bin/hostname in a way that provides unique responses to the "/bin/hostname" call within a cluster.

Mit freundlichen Grüßen / Kind regards

Dr. Markus Rohwedder
Spectrum Scale GUI Development

Phone: +49 7034 6430190
IBM Deutschland Research & Development
E-Mail: rohwedder at de.ibm.com
Am Weiher 24
65451 Kelsterbach
Germany

From: Simon Thompson
To: gpfsug main discussion list
Date: 09.10.2018 17:00
Subject: Re: [gpfsug-discuss] Pmsensors and gui
Sent by: gpfsug-discuss-bounces at spectrumscale.org

We do - it's just that the node is joined to the cluster as "hostname1-data.cluster", but it also has a primary (1GbE link) name "hostname.cluster"...

Simon

From: on behalf of "rohwedder at de.ibm.com"
Reply-To: "gpfsug-discuss at spectrumscale.org"
Date: Tuesday, 9 October 2018 at 15:56
To: "gpfsug-discuss at spectrumscale.org"
Subject: Re: [gpfsug-discuss] Pmsensors and gui

Hello Simon,

the performance collector collects data from each node with the "hostname", as returned by /bin/hostname, as its key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set to the identical value on all nodes, the mapping will not succeed, so you will have to use unique hostnames on all cluster nodes.

Mit freundlichen Grüßen / Kind regards

Dr. Markus Rohwedder
Spectrum Scale GUI Development
Phone: +49 7034 6430190, IBM Deutschland Research & Development, Am Weiher 24, 65451 Kelsterbach, Germany
E-Mail: rohwedder at de.ibm.com

From: "Sobey, Richard A"
To: gpfsug main discussion list
Date: 09.10.2018 16:00
Subject: Re: [gpfsug-discuss] Pmsensors and gui
Sent by: gpfsug-discuss-bounces at spectrumscale.org

I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim.

---------------------------------------------------------------------------------------------------------------------------------------------
When trying to resolve the IP address in the Java code, the first entry in the list is returned. Just localhost was expected for this. If the order is the other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. It seems that our code is not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle it accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successfully and the state should be OK again for the pm_collector.
---------------------------------------------------------------------------------------------------------------------------------------------

Checking the GUI node's /etc/hosts, it actually shows:

127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost

From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson
Sent: 09 October 2018 14:38
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] Pmsensors and gui

Hi,

I have a couple of problems with the GUI and the stats data in there...
First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19742873.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19933766.gif Type: image/gif Size: 46 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19033540.gif Type: image/gif Size: 4660 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19192281.gif Type: image/gif Size: 106 bytes Desc: not available URL: From robbyb at us.ibm.com Wed Oct 10 14:07:10 2018 From: robbyb at us.ibm.com (Rob Basham) Date: Wed, 10 Oct 2018 13:07:10 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov>, Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Oct 10 14:22:52 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Wed, 10 Oct 2018 13:22:52 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov>, , Message-ID: <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> If there?s interest I could do a short presentation on our 1k node virtual GPFS test cluster (with SR-IOV and real IB RDMA!) and some of the benefits we?ve found (including helping squash a nasty hard-to-reproduce bug) as well as how we use it to test upgrades. On October 10, 2018 at 09:07:24 EDT, Rob Basham wrote: Kristy, I'll be at SC18 for client presentations and could talk about TCT. We have a big release coming up in 1H18 with multi-site support and we've broken out of the gateway paradigm to where we work on every client node in the cluster for key data path work. If you have a slot I could talk about that. 
Regards, Rob Basham MCStore and IBM Ready Archive architecture 971-344-1999 ----- Original message ----- From: Kristy Kallback-Rose Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Still need a couple User Talks for SC18 Date: Tue, Oct 9, 2018 7:13 PM PS - If you?ve already contacted me about talking can you please ping me again? I?m drowning in stuff-to-do sauce. Thanks, Kristy > On Oct 9, 2018, at 7:08 PM, Kristy Kallback-Rose wrote: > > Hello, > > Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. > > Thanks, > Kristy _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Oct 10 14:58:24 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 Oct 2018 13:58:24 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> Message-ID: <0835F404-DF06-4237-A1AA-8553E28E1343@nuance.com> User talks - For those interested, please email Kristy and/or myself directly. Rob/other IBMers - work with Ulf Troppens on slots. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Oct 10 16:06:09 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 10 Oct 2018 11:06:09 -0400 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> Message-ID: <11037.1539183969@turing-police.cc.vt.edu> On Wed, 10 Oct 2018 10:24:58 +0200, "Markus Rohwedder" said: > Hello Simon, > > not sure if the answer solved your question from the response, > > Even if nodes can be externally resolved by unique hostnames, applications > that run on the host use the /bin/hostname binary or the hostname() call to > identify the node they are running on. > This is the case with the performance collection sensor. > So you need to set the hostname of the hosts using /bin/hostname in in a > way that provides unique responses of the "/bin/hostname" call within a > cluster. And we discovered that 'unique' applies to "only considering the leftmost part of the hostname". We set up a stretch cluster that had 3 NSD servers at each of two locations, and found that using FQDN names of the form: nsd1.something.loc1.internal nsd2.something.loc1.internal nsd1.something.loc2.internal nsd2.something.loc2.internal got things all sorts of upset in a very passive-agressive way. The cluster would come up, and serve data just fine. But things like 'nsdperf' would toss errors about not being able to resolve a NSD server name, or fail to connect, or complain that it was connecting to itself, or other similar "not talking to the node it thought" type confusion... We ended up renaming to: nsd1-loc1.something.internal nsd1-loc2.something.internal ... and all the userspace tools started working much better. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Wed Oct 10 16:43:45 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 10 Oct 2018 15:43:45 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity Message-ID: Hi all, Maybe I'm barking up the wrong tree but I'm debugging why I don't get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run "mmperfmon query GPFSFilesetQuota" and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I'm following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I'm running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 10 16:58:51 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Oct 2018 15:58:51 +0000 Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE@bham.ac.uk> OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? (I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Fabrice.Cantos at niwa.co.nz Wed Oct 10 22:57:04 2018 From: Fabrice.Cantos at niwa.co.nz (Fabrice Cantos) Date: Wed, 10 Oct 2018 21:57:04 +0000 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 Message-ID: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> I would be interested to know what you chose for your filesystems and user/project space directories: * Traditional Posix ACL * NFS V4 ACL What did motivate your choice? We are facing some issues to get the correct NFS ACL to keep correct attributes for new files created. Thanks Fabrice [cid:image4cef17.PNG at 18c66b76.4480e036] Fabrice Cantos HPC Systems Engineer Group Manager ? High Performance Computing T +64-4-386-0367 M +64-27-412-9693 National Institute of Water & Atmospheric Research Ltd (NIWA) 301 Evans Bay Parade, Greta Point, Wellington Connect with NIWA: niwa.co.nz Facebook Twitter LinkedIn Instagram To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image4cef17.PNG Type: image/png Size: 12288 bytes Desc: image4cef17.PNG URL: From truongv at us.ibm.com Thu Oct 11 04:14:24 2018 From: truongv at us.ibm.com (Truong Vu) Date: Wed, 10 Oct 2018 23:14:24 -0400 Subject: [gpfsug-discuss] Sudo wrappers In-Reply-To: References: Message-ID: Yes, you can use mmchconfig for that. eg: mmchconfig sudoUser=gpfsadmin Thanks, Tru. Message: 2 Date: Wed, 10 Oct 2018 15:58:51 +0000 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE at bham.ac.uk> Content-Type: text/plain; charset="utf-8" OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? (I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20181010/6317be26/attachment-0001.html > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Anna.Greim at de.ibm.com Thu Oct 11 07:41:25 2018 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 11 Oct 2018 08:41:25 +0200 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Maybe I?m barking up the wrong tree but I?m debugging why I don?t get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run ?mmperfmon query GPFSFilesetQuota? and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. 
Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I?m following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I?m running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu Oct 11 08:54:01 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 11 Oct 2018 07:54:01 +0000 Subject: [gpfsug-discuss] Sudo wrappers In-Reply-To: References: Message-ID: <39DC4B5E-CAFD-489C-9BE5-42B83B29A8F5@bham.ac.uk> Nope that one doesn?t work ? I found it in the docs: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_mmchconfig.htm ?Specifies a non-root admin user ID to be used when sudo wrappers are enabled and a root-level background process calls an administration command directly instead of through sudo.? So it reads like it still wants to be ?me? unless it?s a background process. Simon From: on behalf of "truongv at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 11 October 2018 at 04:14 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Sudo wrappers Yes, you can use mmchconfig for that. eg: mmchconfig sudoUser=gpfsadmin Thanks, Tru. Message: 2 Date: Wed, 10 Oct 2018 15:58:51 +0000 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE at bham.ac.uk> Content-Type: text/plain; charset="utf-8" OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? 
(I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Oct 11 13:10:00 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 11 Oct 2018 12:10:00 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Anna, Yes, that will be it! I was running the wrong command as you surmise. The GPFSFileSetQuota config appears to be correct: { name = "GPFSFilesetQuota" period = 3600 restrict = "icgpfsq1.cc.ic.ac.uk" }, However "mmperfmon query gpfs_rq_blk_current" just shows lots of null values, for example: Row Timestamp gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current 1 2018-10-11-13:07:31 null null null null null null null null 2 2018-10-11-13:07:32 null null null null null null null null 3 2018-10-11-13:07:33 null null null null null null null null 4 2018-10-11-13:07:34 null null null null null null null null 5 2018-10-11-13:07:35 null null null null null null null null 6 2018-10-11-13:07:36 null null null null null null null null 7 2018-10-11-13:07:37 null null null null null null null null 8 2018-10-11-13:07:38 null null null null null null null null 9 2018-10-11-13:07:39 null null null null null null null null 10 2018-10-11-13:07:40 null null null null null null null null Same with the metric gpfs_rq_file_current. I'll have a look at the PDF sent by Markus in the meantime. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Anna Greim Sent: 11 October 2018 07:41 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. 
Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems ________________________________ Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH [cid:image001.gif at 01D46163.B6B21E10] Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany ________________________________ IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, Maybe I'm barking up the wrong tree but I'm debugging why I don't get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run "mmperfmon query GPFSFilesetQuota" and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I'm following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I'm running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 1851 bytes Desc: image001.gif URL: From Anna.Greim at de.ibm.com Thu Oct 11 14:11:56 2018 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 11 Oct 2018 15:11:56 +0200 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hello Richard, the sensor is running once an hour and the default of mmperfmon returns the last 10 results in a bucket-size of 1 seconds. The sensor did not run in the time of 13:07:31-13:07:40. 
Please use the command again with the option -b 3600 or with --bucket-size=3600 and see if you've got any data for that time. If you get any data the question is, why the GUI isn't able to get the data. If you do not have any data (only null rows) the question is, why the collector does not get data or why the sensor does not collect data and sends them to the collector. Since you get data for the cpu_user metric it is more likely that the sensor is not collecting and sending anything. The guide from Markus should help you here. Otherwise just write again into the user group. Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: gpfsug main discussion list Date: 11/10/2018 14:10 Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Anna, Yes, that will be it! I was running the wrong command as you surmise. The GPFSFileSetQuota config appears to be correct: { name = "GPFSFilesetQuota" period = 3600 restrict = "icgpfsq1.cc.ic.ac.uk" }, However ?mmperfmon query gpfs_rq_blk_current? just shows lots of null values, for example: Row Timestamp gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current 1 2018-10-11-13:07:31 null null null null null null null null 2 2018-10-11-13:07:32 null null null null null null null null 3 2018-10-11-13:07:33 null null null null null null null null 4 2018-10-11-13:07:34 null null null null null null null null 5 2018-10-11-13:07:35 null null null null null null null null 6 2018-10-11-13:07:36 null null null null null null null null 7 2018-10-11-13:07:37 null null null null null null null null 8 2018-10-11-13:07:38 null null null null null null null null 9 2018-10-11-13:07:39 null null null null null null null null 10 2018-10-11-13:07:40 null null null null null null null null Same with the metric gpfs_rq_file_current. I?ll have a look at the PDF sent by Markus in the meantime. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Anna Greim Sent: 11 October 2018 07:41 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. 
You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" < gpfsug-discuss at spectrumscale.org> Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Maybe I?m barking up the wrong tree but I?m debugging why I don?t get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run ?mmperfmon query GPFSFilesetQuota? and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I?m following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I?m running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From spectrumscale at kiranghag.com Fri Oct 12 05:38:19 2018 From: spectrumscale at kiranghag.com (KG) Date: Fri, 12 Oct 2018 07:38:19 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS Message-ID: Hi Folks I am trying to compile IOR on a GPFS filesystem and running into following errors. Github forum says that "The configure script does not add -lgpfs to the CFLAGS when it detects GPFS support." Any help on how to get around this? mpicc -DHAVE_CONFIG_H -I. -g -O2 -MT aiori-MPIIO.o -MD -MP -MF .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c aiori-MPIIO.c: In function ?MPIIO_Xfer?: aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type [enabled by default] Access = MPI_File_write; ^ aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type [enabled by default] Access_at = MPI_File_write_at; ^ aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type [enabled by default] Access_all = MPI_File_write_all; ^ aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type [enabled by default] Access_at_all = MPI_File_write_at_all; ^ mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o aiori-MPIIO.o -lm aiori-POSIX.o: In function `gpfs_free_all_locks': /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to `gpfs_fcntl' aiori-POSIX.o: In function `gpfs_access_start': aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' aiori-POSIX.o: In function `gpfs_access_end': aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' collect2: error: ld returned 1 exit status make[2]: *** [ior] Error 1 make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' make: *** [all-recursive] Error 1 Kiran -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnbent at gmail.com Fri Oct 12 05:50:45 2018 From: johnbent at gmail.com (John Bent) Date: Thu, 11 Oct 2018 22:50:45 -0600 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Kiran, Are you using the latest version of IOR? https://github.com/hpc/ior Thanks, John On Thu, Oct 11, 2018 at 10:39 PM KG wrote: > Hi Folks > > I am trying to compile IOR on a GPFS filesystem and running into following > errors. > > Github forum says that "The configure script does not add -lgpfs to the > CFLAGS when it detects GPFS support." > > Any help on how to get around this? > > mpicc -DHAVE_CONFIG_H -I. 
-g -O2 -MT aiori-MPIIO.o -MD -MP -MF > .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c > aiori-MPIIO.c: In function ?MPIIO_Xfer?: > aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type > [enabled by default] > Access = MPI_File_write; > ^ > aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type > [enabled by default] > Access_at = MPI_File_write_at; > ^ > aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type > [enabled by default] > Access_all = MPI_File_write_all; > ^ > aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type > [enabled by default] > Access_at_all = MPI_File_write_at_all; > ^ > mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po > mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o > aiori-MPIIO.o -lm > aiori-POSIX.o: In function `gpfs_free_all_locks': > /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to > `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_start': > aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_end': > aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' > collect2: error: ld returned 1 exit status > make[2]: *** [ior] Error 1 > make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make: *** [all-recursive] Error 1 > > Kiran > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Oct 12 11:09:49 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 12 Oct 2018 10:09:49 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hi Anna, Markus It was the incorrect restrict clause referencing the FQDN of the server, and not the GPFS daemon node name, that was causing the problem. This has now been updated and we have nice graphs ? Many thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Fri Oct 12 11:39:12 2018 From: spectrumscale at kiranghag.com (KG) Date: Fri, 12 Oct 2018 13:39:12 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Hi John Yes, I am using latest version from this link. Do I have to use any additional switches for compilation? I used following sequence ./bootstrap ./configure ./make (fails) On Fri, Oct 12, 2018 at 7:51 AM John Bent wrote: > Kiran, > > Are you using the latest version of IOR? > https://github.com/hpc/ior > > Thanks, > > John > > On Thu, Oct 11, 2018 at 10:39 PM KG wrote: > >> Hi Folks >> >> I am trying to compile IOR on a GPFS filesystem and running into >> following errors. >> >> Github forum says that "The configure script does not add -lgpfs to the >> CFLAGS when it detects GPFS support." >> >> Any help on how to get around this? >> >> mpicc -DHAVE_CONFIG_H -I. 
-g -O2 -MT aiori-MPIIO.o -MD -MP -MF >> .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c >> aiori-MPIIO.c: In function ?MPIIO_Xfer?: >> aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type >> [enabled by default] >> Access = MPI_File_write; >> ^ >> aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_at = MPI_File_write_at; >> ^ >> aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_all = MPI_File_write_all; >> ^ >> aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_at_all = MPI_File_write_at_all; >> ^ >> mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po >> mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o >> aiori-MPIIO.o -lm >> aiori-POSIX.o: In function `gpfs_free_all_locks': >> /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to >> `gpfs_fcntl' >> aiori-POSIX.o: In function `gpfs_access_start': >> aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' >> aiori-POSIX.o: In function `gpfs_access_end': >> aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' >> collect2: error: ld returned 1 exit status >> make[2]: *** [ior] Error 1 >> make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' >> make[1]: *** [all] Error 2 >> make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' >> make: *** [all-recursive] Error 1 >> >> Kiran >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Oct 12 11:43:41 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 12 Oct 2018 12:43:41 +0200 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Oct 15 15:11:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 15 Oct 2018 14:11:34 +0000 Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously? Message-ID: Hi All, Is there a way to run mmfileid on two NSD?s simultaneously? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexander.Saupp at de.ibm.com Mon Oct 15 19:18:32 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Mon, 15 Oct 2018 20:18:32 +0200 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS Message-ID: Dear Spectrum Scale mailing list, I'm part of IBM Lab Services - currently i'm having multiple customers asking me for optimization of a similar workloads. The task is to tune a Spectrum Scale system (comprising ESS and CES protocol nodes) for the following workload: A single Linux NFS client mounts an NFS export, extracts a flat tar archive with lots of ~5KB files. I'm measuring the speed at which those 5KB files are written (`time tar xf archive.tar`). 
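Purely as an illustration of the workload shape (the directory name and file count below are arbitrary placeholders, not the actual test set), such an archive could be produced and timed roughly like this:

    mkdir smallfiles
    for i in $(seq 1 10000); do dd if=/dev/urandom of=smallfiles/file.$i bs=5k count=1 status=none; done
    tar cf archive.tar smallfiles
    # then, on the filesystem under test:
    time tar xf archive.tar
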
I do understand that Spectrum Scale is not designed for such a workload (single client, single thread, small files, single directory), and that such a benchmark is not appropriate for benchmarking the system.
Yet I find myself explaining the performance of this scenario (git clone..) quite frequently, as customers insist that optimizing it would benefit individual users, since it determines how long their tasks take.
I want to make sure that I have optimized the system as much as possible for the given workload, and that I have not overlooked something obvious.

When writing to GPFS directly I'm able to write ~1800 files / second in a test setup.
This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server).
When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second.
Writing to the NFS export from another node (now including network latency) gives me ~220 files / second.

There seems to be a huge performance degradation from adding NFS-Ganesha to the software stack alone. I wonder what can be done to minimize the impact.

- Ganesha doesn't seem to support 'async' or 'no_wdelay' options... is anything equivalent available?
- Is there an expected advantage of using the network-latency tuned profile, as opposed to the ESS default throughput-performance profile?
- Are there other relevant kernel params?
- Is there an expected advantage of raising the number of threads (NSD server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha (NB_WORKER)) for the given workload (single client, single thread, small files)?
- Are there other relevant GPFS params?
- The impact of sync replication, disk latency, etc. is understood.
- I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well.
I just want to ensure that I'm not missing something obvious before reiterating that message to customers.

Any help would be greatly appreciated - thanks much in advance!
Alexander Saupp
IBM Germany


Mit freundlichen Grüßen / Kind regards

Alexander Saupp

IBM Systems, Storage Platform, EMEA Storage Competence Center

Phone: +49 7034-643-1512
Mobile: +49-172 7251072
Email: alexander.saupp at de.ibm.com
IBM Deutschland GmbH
Am Weiher 24
65451 Kelsterbach
Germany

IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 54993307.gif Type: image/gif Size: 1851 bytes Desc: not available URL: 

From makaplan at us.ibm.com Mon Oct 15 19:44:52 2018
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Mon, 15 Oct 2018 14:44:52 -0400
Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously?
In-Reply-To: References: Message-ID: 

How about using the -F option?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From mutantllama at gmail.com Mon Oct 15 23:32:35 2018 From: mutantllama at gmail.com (Carl) Date: Tue, 16 Oct 2018 09:32:35 +1100 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Hi, We recently had a PMR open for Ganesha related performance issues, which was resolved with an eFix that updated Ganesha. If you are running GPFS v5 I would suggest contacting support. Cheers, Carl. On Tue, 16 Oct 2018 at 5:20 am, Alexander Saupp wrote: > Dear Spectrum Scale mailing list, > > I'm part of IBM Lab Services - currently i'm having multiple customers > asking me for optimization of a similar workloads. > > The task is to tune a Spectrum Scale system (comprising ESS and CES > protocol nodes) for the following workload: > A single Linux NFS client mounts an NFS export, extracts a flat tar > archive with lots of ~5KB files. > I'm measuring the speed at which those 5KB files are written (`time tar xf > archive.tar`). > > I do understand that Spectrum Scale is not designed for such workload > (single client, single thread, small files, single directory), and that > such benchmark in not appropriate to benmark the system. > Yet I find myself explaining the performance for such scenario (git > clone..) quite frequently, as customers insist that optimization of that > scenario would impact individual users as it shows task duration. > I want to make sure that I have optimized the system as much as possible > for the given workload, and that I have not overlooked something obvious. > > > When writing to GPFS directly I'm able to write ~1800 files / second in a > test setup. > This is roughly the same on the protocol nodes (NSD client), as well as on > the ESS IO nodes (NSD server). > When writing to the NFS export on the protocol node itself (to avoid any > network effects) I'm only able to write ~230 files / second. > Writing to the NFS export from another node (now including network > latency) gives me ~220 files / second. > > > There seems to be a huge performance degradation by adding NFS-Ganesha to > the software stack alone. I wonder what can be done to minimize the impact. > > > - Ganesha doesn't seem to support 'async' or 'no_wdelay' options... > anything equivalent available? > - Is there and expected advantage of using the network-latency tuned > profile, as opposed to the ESS default throughput-performance profile? > - Are there other relevant Kernel params? > - Is there an expected advantage of raising the number of threads (NSD > server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha > (NB_WORKER)) for the given workload (single client, single thread, small > files)? > - Are there other relevant GPFS params? > - Impact of Sync replication, disk latency, etc is understood. > - I'm aware that 'the real thing' would be to work with larger files in a > multithreaded manner from multiple nodes - and that this scenario will > scale quite well. > I just want to ensure that I'm not missing something obvious over > reiterating that massage to customers. > > Any help was greatly appreciated - thanks much in advance! 
> Alexander Saupp
> IBM Germany
>
> Mit freundlichen Grüßen / Kind regards
>
> *Alexander Saupp*
>
> IBM Systems, Storage Platform, EMEA Storage Competence Center
> ------------------------------
> Phone: +49 7034-643-1512 IBM Deutschland GmbH
> Mobile: +49-172 7251072 Am Weiher 24
> Email: alexander.saupp at de.ibm.com 65451 Kelsterbach
> Germany
> ------------------------------
> IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschäftsführung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ecblank.gif
Type: image/gif
Size: 45 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 54993307.gif
Type: image/gif
Size: 1851 bytes
Desc: not available
URL: 

From kums at us.ibm.com  Mon Oct 15 23:34:50 2018
From: kums at us.ibm.com (Kumaran Rajaram)
Date: Mon, 15 Oct 2018 18:34:50 -0400
Subject: Re: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS
In-Reply-To: 
References: 
Message-ID: 

Hi Alexander,

1. >> When writing to GPFS directly I'm able to write ~1800 files / second in a test setup.
>> This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server).

2. >> When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second.

IMHO #2, writing to the NFS export on the protocol node, should perform the same as #1. A protocol node is also an NSD client, and when you write from a protocol node it will use the NSD protocol to write to the ESS IO nodes. In #1 you cite ~1800 files/sec from the protocol node and in #2 you cite ~230 files/sec, which seem to contradict each other.

>> Writing to the NFS export from another node (now including network latency) gives me ~220 files / second.

IMHO, this workload "single client, single thread, small files, single directory - tar xf" is synchronous in nature and will result in a single outstanding file being sent from the NFS client to the CES node at a time. Hence, the performance will be limited by the network latency/capability between the NFS client and the CES node for this small IO size (~5KB file size).

Also, what is the network interconnect/interface between the NFS client and the CES node? Is it 10GigE or faster? Note that at 220 files/sec the aggregate data rate is tiny: 220 files/sec * 5 KiB file size ==> ~1.1 MB/s, nowhere near saturating even a single 10GigE link - so per-request latency rather than bandwidth is the limiting factor here.

>> I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well.

Yes, larger file sizes + multiple threads + multiple NFS client nodes will help to scale performance further by having more NFS I/O requests scheduled/pipelined over the network and processed on the CES nodes.

>> I just want to ensure that I'm not missing something obvious when reiterating that message to customers.

Adding the NFS experts/team for advice. My two cents.
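A quick sanity check of the latency budget (the CES node name below is a placeholder):

  # per-file time budget at ~220 files/s
  echo 'scale=2; 1000/220' | bc        # ~4.5 ms per file
  # compare with the round-trip time from the NFS client to the CES node
  ping -c 20 ces-node-1

If most of those ~4.5 ms is network round-trip plus the synchronous commit on the CES node, faster back-end storage alone will not change the picture much.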
Best Regards, -Kums From: "Alexander Saupp" To: gpfsug-discuss at spectrumscale.org Date: 10/15/2018 02:20 PM Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Spectrum Scale mailing list, I'm part of IBM Lab Services - currently i'm having multiple customers asking me for optimization of a similar workloads. The task is to tune a Spectrum Scale system (comprising ESS and CES protocol nodes) for the following workload: A single Linux NFS client mounts an NFS export, extracts a flat tar archive with lots of ~5KB files. I'm measuring the speed at which those 5KB files are written (`time tar xf archive.tar`). I do understand that Spectrum Scale is not designed for such workload (single client, single thread, small files, single directory), and that such benchmark in not appropriate to benmark the system. Yet I find myself explaining the performance for such scenario (git clone..) quite frequently, as customers insist that optimization of that scenario would impact individual users as it shows task duration. I want to make sure that I have optimized the system as much as possible for the given workload, and that I have not overlooked something obvious. When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server). When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second. Writing to the NFS export from another node (now including network latency) gives me ~220 files / second. There seems to be a huge performance degradation by adding NFS-Ganesha to the software stack alone. I wonder what can be done to minimize the impact. - Ganesha doesn't seem to support 'async' or 'no_wdelay' options... anything equivalent available? - Is there and expected advantage of using the network-latency tuned profile, as opposed to the ESS default throughput-performance profile? - Are there other relevant Kernel params? - Is there an expected advantage of raising the number of threads (NSD server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha (NB_WORKER)) for the given workload (single client, single thread, small files)? - Are there other relevant GPFS params? - Impact of Sync replication, disk latency, etc is understood. - I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well. I just want to ensure that I'm not missing something obvious over reiterating that massage to customers. Any help was greatly appreciated - thanks much in advance! Alexander Saupp IBM Germany Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. 
DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Oct 15 20:09:19 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 15 Oct 2018 19:09:19 +0000 Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously? In-Reply-To: References: Message-ID: <4C0E90D1-14DA-44A1-B037-95C17076193C@vanderbilt.edu> Marc, Ugh - sorry, completely overlooked that? Kevin On Oct 15, 2018, at 1:44 PM, Marc A Kaplan > wrote: How about using the -F option? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb6d9700cd6ff4bbed85808d632ce4ff2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636752259026486137&sdata=mBfANLkK8v2ZEahGumE4a7iVIAcVJXb1Dv2kgSrynrI%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Tue Oct 16 01:42:14 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 15 Oct 2018 20:42:14 -0400 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <5824.1539650534@turing-police.cc.vt.edu> On Mon, 15 Oct 2018 18:34:50 -0400, "Kumaran Rajaram" said: > 1. >>When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. > >>This is roughly the same on the protocol nodes (NSD client), as well as > on the ESS IO nodes (NSD server). > > 2. >> When writing to the NFS export on the protocol node itself (to avoid > any network effects) I'm only able to write ~230 files / second. > IMHO #2, writing to the NFS export on the protocol node should be same as #1. > Protocol node is also a NSD client and when you write from a protocol node, it > will use the NSD protocol to write to the ESS IO nodes. In #1, you cite seeing > ~1800 files from protocol node and in #2 you cite seeing ~230 file/sec which > seem to contradict each other. I think he means this: 1) ssh nsd_server 2) cd /gpfs/filesystem/testarea 3) (whomp out 1800 files/sec) 4) mount -t nfs localhost:/gpfs/filesystem/testarea /mnt/test 5) cd /mnt/test 6) Watch the same test struggle to hit 230. Indicating the issue is going from NFS to GPFS (For what it's worth, we've had issues with Ganesha as well...) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 486 bytes
Desc: not available
URL: 

From Achim.Rehor at de.ibm.com  Tue Oct 16 10:39:14 2018
From: Achim.Rehor at de.ibm.com (Achim Rehor)
Date: Tue, 16 Oct 2018 11:39:14 +0200
Subject: Re: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS
In-Reply-To: 
References: 
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From diederich at de.ibm.com  Tue Oct 16 13:31:20 2018
From: diederich at de.ibm.com (Michael Diederich)
Date: Tue, 16 Oct 2018 14:31:20 +0200
Subject: Re: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS
In-Reply-To: <5824.1539650534@turing-police.cc.vt.edu>
References: <5824.1539650534@turing-police.cc.vt.edu>
Message-ID: 

All NFS IO requires syncing. The client does send an explicit fsync (commit). If the NFS server does not sync, a server failure will cause data loss!
(For small files <1M it really does not matter if it is sync on write or sync on close/explicit commit.)

While that may be OK for a "git pull" or similar, in general it violates the NFS spec. The client can decide to cache, and usually NFSv4 does less caching (for better consistency).

So the observed slowdown (roughly a factor of 8 in this test) is realistic. Latencies will make matters worse, so the FS should be tuned for very small random IO (small blocksize - small subblock-size will not help).

If you were to put the Linux kernel NFS server into the picture, it will behave very much the same - although Ganesha could be a bit more efficient (by some percent - certainly less than 200%).

But hey - this is a GPFS cluster, not some NAS box.
Run "git pull" on the GPFS client. Enjoy the 1800 files/sec (or more).
Modify the files on your XY client mounting over NFS.
Use a wrapper script to automatically have your AD or LDAP user id SSH into the cluster to perform it.

Michael

Mit freundlichen Grüßen / with best regards

Michael Diederich

IBM Systems Group
Spectrum Scale Software Development

Contact Information
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

mail: michael.diederich at de.ibm.com
fon: +49-7034-274-4062
address: Am Weiher 24, D-65451 Kelsterbach

From: valdis.kletnieks at vt.edu
To: gpfsug main discussion list
Cc: Silvana De Gyves, Jay Vaddi, Michael Diederich
Date: 10/16/2018 02:42 AM
Subject: Re: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS
Sent by: Valdis Kletnieks

On Mon, 15 Oct 2018 18:34:50 -0400, "Kumaran Rajaram" said:
> 1. >>When writing to GPFS directly I'm able to write ~1800 files / second in a test setup.
> >>This is roughly the same on the protocol nodes (NSD client), as well as > on the ESS IO nodes (NSD server). > > 2. >> When writing to the NFS export on the protocol node itself (to avoid > any network effects) I'm only able to write ~230 files / second. > IMHO #2, writing to the NFS export on the protocol node should be same as #1. > Protocol node is also a NSD client and when you write from a protocol node, it > will use the NSD protocol to write to the ESS IO nodes. In #1, you cite seeing > ~1800 files from protocol node and in #2 you cite seeing ~230 file/sec which > seem to contradict each other. I think he means this: 1) ssh nsd_server 2) cd /gpfs/filesystem/testarea 3) (whomp out 1800 files/sec) 4) mount -t nfs localhost:/gpfs/filesystem/testarea /mnt/test 5) cd /mnt/test 6) Watch the same test struggle to hit 230. Indicating the issue is going from NFS to GPFS (For what it's worth, we've had issues with Ganesha as well...) [attachment "att4z9wh.dat" deleted by Michael Diederich/Germany/IBM] -------------- next part -------------- An HTML attachment was scrubbed... URL: From KKR at lbl.gov Tue Oct 16 14:20:08 2018 From: KKR at lbl.gov (Kristy Kallback-Rose) Date: Tue, 16 Oct 2018 14:20:08 +0100 Subject: [gpfsug-discuss] Presentations and SC18 Sign Up Message-ID: Quick message, more later. The presentation bundle (zip file) from the September UG meeting at ORNL is now here: https://www.spectrumscaleug.org/presentations/ I'll add more details there soon. If you haven't signed up for SC18's UG meeting yet, you can should do so here: https://ibm.co/2CjZyHG SC18 agenda is being discussed today. Hoping for more details about that soon. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Tue Oct 16 17:44:08 2018 From: spectrumscale at kiranghag.com (KG) Date: Tue, 16 Oct 2018 19:44:08 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Thanks Olaf It worked. On Fri, Oct 12, 2018, 13:43 Olaf Weiser wrote: > I think the step you are missing is this... > > > > > ./configure LIBS=/usr/lpp/mmfs/lib/libgpfs.so > make > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: KG > To: gpfsug main discussion list > Date: 10/12/2018 12:40 PM > Subject: Re: [gpfsug-discuss] error compiling IOR on GPFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi John > > Yes, I am using latest version from this link. > > Do I have to use any additional switches for compilation? 
I used following > sequence > ./bootstrap > ./configure > ./make (fails) > > > On Fri, Oct 12, 2018 at 7:51 AM John Bent <*johnbent at gmail.com* > > wrote: > Kiran, > > Are you using the latest version of IOR? > *https://github.com/hpc/ior* > > Thanks, > > John > > On Thu, Oct 11, 2018 at 10:39 PM KG <*spectrumscale at kiranghag.com* > > wrote: > Hi Folks > > I am trying to compile IOR on a GPFS filesystem and running into following > errors. > > Github forum says that "The configure script does not add -lgpfs to the > CFLAGS when it detects GPFS support." > > Any help on how to get around this? > > mpicc -DHAVE_CONFIG_H -I. -g -O2 -MT aiori-MPIIO.o -MD -MP -MF > .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c > aiori-MPIIO.c: In function ?MPIIO_Xfer?: > aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type > [enabled by default] > Access = MPI_File_write; > ^ > aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type > [enabled by default] > Access_at = MPI_File_write_at; > ^ > aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type > [enabled by default] > Access_all = MPI_File_write_all; > ^ > aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type > [enabled by default] > Access_at_all = MPI_File_write_at_all; > ^ > mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po > mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o > aiori-MPIIO.o -lm > aiori-POSIX.o: In function `gpfs_free_all_locks': > /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to > `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_start': > aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_end': > aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' > collect2: error: ld returned 1 exit status > make[2]: *** [ior] Error 1 > make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make: *** [all-recursive] Error 1 > > Kiran > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexander.Saupp at de.ibm.com Wed Oct 17 12:44:41 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Wed, 17 Oct 2018 13:44:41 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Message-ID: Dear Mailing List readers, I've come to a preliminary conclusion that explains the behavior in an appropriate manner, so I'm trying to summarize my current thinking with this audience. Problem statement: Big performance derivation between native GPFS (fast) and loopback NFS mount on the same node (way slower) for single client, single thread, small files workload. 
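For reproduction, the comparison can be set up roughly like this (the export path and mount options are examples only, not the exact configuration used here):

  # on a CES protocol node
  mount -t nfs -o vers=3 localhost:/gpfs/fs1/testdir /mnt/nfs_loop
  cd /gpfs/fs1/testdir && time tar xf /tmp/smallfiles.tar    # native GPFS: fast
  cd /mnt/nfs_loop && time tar xf /tmp/smallfiles.tar        # loopback NFS: much slower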
Current explanation:

tar seems to call close() on each file and does not fsync() it. That is an application choice and common behavior. The idea is to let OS write caching speed up the process run time.

When running locally on ext3 / xfs / GPFS / .. that allows async destaging of data down to disk, trading some data safety for better performance.
As we're talking about write caching on the same node that the application runs on, a crash is a misfortune but stays in the same failure domain.
E.g. if you run a compile job that includes extraction of a tar and the node crashes, you'll have to restart the entire job anyhow.

The NFSv2 spec defined that NFS IOs are to be 'sync', probably because the compile job on the NFS client would survive if the NFS server crashes, so the failure domain would be different.

NFSv3 in rfc1813 below acknowledged the performance impact and introduced the 'async' flag for NFS, which handles IOs similarly to local IOs, allowing destaging in the background.

Keep in mind - applications, independent of whether they run locally or via NFS, can always decide to fsync() before close(), which ensures that data is destaged to persistent storage right away.
But it is the application's choice whether that is really mandatory or whether performance has higher priority.

The linux 'sync' (man sync) tool allows syncing 'dirty' memory cache down to disk - very filesystem independent.

-> A single client, single thread, small files workload on GPFS can be destaged async, hiding latency and parallelizing disk IOs.
-> NFS client IOs are sync, so the second IO can only be started after the first one has hit non-volatile memory -> much higher latency.

The Spectrum Scale NFS implementation (based on Ganesha) does not support the 'async' export option, which is a bit of a pity. There might also be implementation differences compared to kernel NFS; I did not investigate in that direction.

However, the principles of the difference are explained for me by the above behavior.

One workaround that I saw working well for multiple customers was to replace the NFS client by a Spectrum Scale NSD client.
That has two advantages, but is certainly not suitable in all cases:
- Improved speed through the efficient NSD protocol and NSD client side write caching
- Write caching in the same failure domain as the application (on the NSD client), which seems more reasonable than NFS server side write caching.

References:

NFS sync vs async
https://tools.ietf.org/html/rfc1813
The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes.
Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way.

sync() vs fsync()
https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm
- An application program makes an fsync() call for a specified file. This causes all of the pages that contain modified data for that file to be written to disk. The writing is complete when the fsync() call returns to the program.
- An application program makes a sync() call. This causes all of the file pages in memory that contain modified data to be scheduled for writing to disk. The writing is not necessarily complete when the sync() call returns to the program.
- A user can enter the sync command, which in turn issues a sync() call.
Again, some of the writes may not be complete when the user is prompted for input (or the next command in a shell script is processed). close() vs fclose() A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.) Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19995626.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From janfrode at tanso.net Wed Oct 17 13:24:01 2018 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 17 Oct 2018 08:24:01 -0400 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Do you know if the slow throughput is caused by the network/nfs-protocol layer, or does it help to use faster storage (ssd)? If on storage, have you considered if HAWC can help? I?m thinking about adding an SSD pool as a first tier to hold the active dataset for a similar setup, but that?s mainly to solve the small file read workload (i.e. random I/O ). -jf ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < Alexander.Saupp at de.ibm.com>: > Dear Mailing List readers, > > I've come to a preliminary conclusion that explains the behavior in an > appropriate manner, so I'm trying to summarize my current thinking with > this audience. > > *Problem statement: * > > Big performance derivation between native GPFS (fast) and loopback NFS > mount on the same node (way slower) for single client, single thread, small > files workload. > > > > *Current explanation:* > > tar seems to use close() on files, not fclose(). That is an > application choice and common behavior. The ideas is to allow OS write > caching to speed up process run time. > > When running locally on ext3 / xfs / GPFS / .. that allows async > destaging of data down to disk, somewhat compromising data for better > performance. > As we're talking about write caching on the same node that the > application runs on - a crash is missfortune but in the same failure domain. > E.g. if you run a compile job that includes extraction of a tar and > the node crashes you'll have to restart the entire job, anyhow. 
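A simple way to verify this on the NFS client, rather than guessing (the archive name is an example):

  strace -f -c -e trace=openat,write,close,fsync,fdatasync tar xf smallfiles.tar
  # tar itself issues no fsync/fdatasync; the per-file cost comes from the NFS
  # client flushing dirty pages and committing them when each file is closed
  nfsstat -c        # WRITE/COMMIT counters grow as files are extracted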
> > The NFSv2 spec defined that NFS io's are to be 'sync', probably > because the compile job on the nfs client would survive if the NFS Server > crashes, so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and > introduced the 'async' flag for NFS, which would handle IO's similar to > local IOs, allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS > can always decided to use the fclose() option, which will ensure that data > is destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache > down to disk - very filesystem independent. > > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > > The Spectrum Scale NFS implementation (based on ganesha) does not > support the async mount option, which is a bit of a pitty. There might also > be implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write > caching > - Write Caching in the same failure domain as the application (on > NSD client) which seems to be more reasonable compared to NFS Server side > write caching. > > > *References:* > > NFS sync vs async > https://tools.ietf.org/html/rfc1813 > *The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support so > that the NFS server can do unsafe writes.* > Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > *sync() vs fsync()* > > https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted for > input (or the next command in a shell script is processed). > > > *close() vs fclose()* > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). 
(It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > *Alexander Saupp* > > IBM Systems, Storage Platform, EMEA Storage Competence Center > ------------------------------ > Phone: +49 7034-643-1512 IBM Deutschland GmbH > Mobile: +49-172 7251072 Am Weiher 24 > Email: alexander.saupp at de.ibm.com 65451 Kelsterbach > Germany > ------------------------------ > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19995626.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Wed Oct 17 14:15:12 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 17 Oct 2018 15:15:12 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Oct 17 14:26:52 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 17 Oct 2018 16:26:52 +0300 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Just to clarify ( from man exports): " async This option allows the NFS server to violate the NFS protocol and reply to requests before any changes made by that request have been committed to stable storage (e.g. disc drive). Using this option usually improves performance, but at the cost that an unclean server restart (i.e. a crash) can cause data to be lost or corrupted." With the Ganesha implementation in Spectrum Scale, it was decided not to allow this violation - so this async export options wasn't exposed. I believe that for those customers that agree to take the risk, using async mount option ( from the client) will achieve similar behavior. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Olaf Weiser" To: gpfsug main discussion list Date: 17/10/2018 16:16 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Jallo Jan, you can expect to get slightly improved numbers from the lower response times of the HAWC ... but the loss of performance comes from the fact, that GPFS or (async kNFS) writes with multiple parallel threads - in opposite to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. you'll never outperform e.g. 
128 (maybe slower), but, parallel threads (running write-behind) <---> with one single but fast threads, .... so as Alex suggest.. if possible.. take gpfs client of kNFS for those types of workloads.. From: Jan-Frode Myklebust To: gpfsug main discussion list Date: 10/17/2018 02:24 PM Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Do you know if the slow throughput is caused by the network/nfs-protocol layer, or does it help to use faster storage (ssd)? If on storage, have you considered if HAWC can help? I?m thinking about adding an SSD pool as a first tier to hold the active dataset for a similar setup, but that?s mainly to solve the small file read workload (i.e. random I/O ). -jf ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < Alexander.Saupp at de.ibm.com>: Dear Mailing List readers, I've come to a preliminary conclusion that explains the behavior in an appropriate manner, so I'm trying to summarize my current thinking with this audience. Problem statement: Big performance derivation between native GPFS (fast) and loopback NFS mount on the same node (way slower) for single client, single thread, small files workload. Current explanation: tar seems to use close() on files, not fclose(). That is an application choice and common behavior. The ideas is to allow OS write caching to speed up process run time. When running locally on ext3 / xfs / GPFS / .. that allows async destaging of data down to disk, somewhat compromising data for better performance. As we're talking about write caching on the same node that the application runs on - a crash is missfortune but in the same failure domain. E.g. if you run a compile job that includes extraction of a tar and the node crashes you'll have to restart the entire job, anyhow. The NFSv2 spec defined that NFS io's are to be 'sync', probably because the compile job on the nfs client would survive if the NFS Server crashes, so the failure domain would be different NFSv3 in rfc1813 below acknowledged the performance impact and introduced the 'async' flag for NFS, which would handle IO's similar to local IOs, allowing to destage in the background. Keep in mind - applications, independent if running locally or via NFS can always decided to use the fclose() option, which will ensure that data is destaged to persistent storage right away. But its an applications choice if that's really mandatory or whether performance has higher priority. The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down to disk - very filesystem independent. -> single client, single thread, small files workload on GPFS can be destaged async, allowing to hide latency and parallelizing disk IOs. -> NFS client IO's are sync, so the second IO can only be started after the first one hit non volatile memory -> much higher latency The Spectrum Scale NFS implementation (based on ganesha) does not support the async mount option, which is a bit of a pitty. There might also be implementation differences compared to kernel-nfs, I did not investigate into that direction. However, the principles of the difference are explained for my by the above behavior. One workaround that I saw working well for multiple customers was to replace the NFS client by a Spectrum Scale nsd client. 
That has two advantages, but is certainly not suitable in all cases: - Improved speed by efficent NSD protocol and NSD client side write caching - Write Caching in the same failure domain as the application (on NSD client) which seems to be more reasonable compared to NFS Server side write caching. References: NFS sync vs async https://tools.ietf.org/html/rfc1813 The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way. sync() vs fsync() https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm - An application program makes an fsync() call for a specified file. This causes all of the pages that contain modified data for that file to be written to disk. The writing is complete when the fsync() call returns to the program. - An application program makes a sync() call. This causes all of the file pages in memory that contain modified data to be scheduled for writing to disk. The writing is not necessarily complete when the sync() call returns to the program. - A user can enter the sync command, which in turn issues a sync() call. Again, some of the writes may not be complete when the user is prompted for input (or the next command in a shell script is processed). close() vs fclose() A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.) Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] [attachment "19995626.gif" deleted by Olaf Weiser/Germany/IBM] [attachment "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From MKEIGO at jp.ibm.com Wed Oct 17 14:34:55 2018 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Wed, 17 Oct 2018 22:34:55 +0900 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Oct 17 14:35:22 2018 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 17 Oct 2018 09:35:22 -0400 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: My thinking was mainly that single threaded 200 files/second == 5 ms/file. Where do these 5 ms go? Is it NFS protocol overhead, or is it waiting for I/O so that it can be fixed with a lower latency storage backend? -jf On Wed, Oct 17, 2018 at 9:15 AM Olaf Weiser wrote: > Jallo Jan, > you can expect to get slightly improved numbers from the lower response > times of the HAWC ... but the loss of performance comes from the fact, that > GPFS or (async kNFS) writes with multiple parallel threads - in opposite > to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. > > you'll never outperform e.g. 128 (maybe slower), but, parallel threads > (running write-behind) <---> with one single but fast threads, .... > > so as Alex suggest.. if possible.. take gpfs client of kNFS for those > types of workloads.. > > > > > > > > > > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 10/17/2018 02:24 PM > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Do you know if the slow throughput is caused by the network/nfs-protocol > layer, or does it help to use faster storage (ssd)? If on storage, have you > considered if HAWC can help? > > I?m thinking about adding an SSD pool as a first tier to hold the active > dataset for a similar setup, but that?s mainly to solve the small file read > workload (i.e. random I/O ). > > > -jf > ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < > *Alexander.Saupp at de.ibm.com* >: > Dear Mailing List readers, > > I've come to a preliminary conclusion that explains the behavior in an > appropriate manner, so I'm trying to summarize my current thinking with > this audience. > > *Problem statement: * > Big performance derivation between native GPFS (fast) and loopback NFS > mount on the same node (way slower) for single client, single thread, small > files workload. > > > *Current explanation:* > tar seems to use close() on files, not fclose(). 
That is an application > choice and common behavior. The ideas is to allow OS write caching to speed > up process run time. > > When running locally on ext3 / xfs / GPFS / .. that allows async destaging > of data down to disk, somewhat compromising data for better performance. > As we're talking about write caching on the same node that the application > runs on - a crash is missfortune but in the same failure domain. > E.g. if you run a compile job that includes extraction of a tar and the > node crashes you'll have to restart the entire job, anyhow. > > The NFSv2 spec defined that NFS io's are to be 'sync', probably because > the compile job on the nfs client would survive if the NFS Server crashes, > so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and introduced > the 'async' flag for NFS, which would handle IO's similar to local IOs, > allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS can > always decided to use the fclose() option, which will ensure that data is > destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down > to disk - very filesystem independent. > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > The Spectrum Scale NFS implementation (based on ganesha) does not support > the async mount option, which is a bit of a pitty. There might also be > implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write caching > - Write Caching in the same failure domain as the application (on NSD > client) which seems to be more reasonable compared to NFS Server side write > caching. > > *References:* > > NFS sync vs async > *https://tools.ietf.org/html/rfc1813* > > *The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support so > that the NFS server can do unsafe writes.* > Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > *sync() vs fsync()* > > *https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm* > > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. 
The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted for > input (or the next command in a shell script is processed). > > > *close() vs fclose()* > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). (It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > *Alexander Saupp* > > IBM Systems, Storage Platform, EMEA Storage Competence Center > ------------------------------ > Phone: +49 7034-643-1512 IBM Deutschland GmbH > Mobile: +49-172 7251072 Am Weiher 24 > Email: *alexander.saupp at de.ibm.com* 65451 > Kelsterbach > Germany > ------------------------------ > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > *[attachment > "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] [attachment > "19995626.gif" deleted by Olaf Weiser/Germany/IBM] [attachment > "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] * > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Oct 17 14:41:03 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 17 Oct 2018 16:41:03 +0300 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Hi, Without going into to much details, AFAIR, Ontap integrate NVRAM into the NFS write cache ( as it was developed as a NAS product). Ontap is using the STABLE bit which kind of tell the client "hey, I have no write cache at all, everything is written to stable storage - thus, don't bother with commits ( sync) commands - they are meaningless". Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Keigo Matsubara" To: gpfsug main discussion list Date: 17/10/2018 16:35 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. 
a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Wed Oct 17 14:42:02 2018 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 17 Oct 2018 15:42:02 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <5508e483-25ef-d318-0c68-4009cb5871cc@ugent.be> hi all, has anyone tried to use tools like eatmydata that allow the user to "ignore" the syncs (there's another tool that has less explicit name if it would make you feel better ;). stijn On 10/17/2018 03:26 PM, Tomer Perry wrote: > Just to clarify ( from man exports): > " async This option allows the NFS server to violate the NFS protocol > and reply to requests before any changes made by that request have been > committed to stable storage (e.g. > disc drive). > > Using this option usually improves performance, but at the > cost that an unclean server restart (i.e. a crash) can cause data to be > lost or corrupted." > > With the Ganesha implementation in Spectrum Scale, it was decided not to > allow this violation - so this async export options wasn't exposed. > I believe that for those customers that agree to take the risk, using > async mount option ( from the client) will achieve similar behavior. > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Olaf Weiser" > To: gpfsug main discussion list > Date: 17/10/2018 16:16 > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Jallo Jan, > you can expect to get slightly improved numbers from the lower response > times of the HAWC ... but the loss of performance comes from the fact, > that > GPFS or (async kNFS) writes with multiple parallel threads - in opposite > to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. > > > you'll never outperform e.g. 128 (maybe slower), but, parallel threads > (running write-behind) <---> with one single but fast threads, .... > > so as Alex suggest.. if possible.. take gpfs client of kNFS for those > types of workloads.. > > > > > > > > > > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 10/17/2018 02:24 PM > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Do you know if the slow throughput is caused by the network/nfs-protocol > layer, or does it help to use faster storage (ssd)? If on storage, have > you considered if HAWC can help? 
> > I?m thinking about adding an SSD pool as a first tier to hold the active > dataset for a similar setup, but that?s mainly to solve the small file > read workload (i.e. random I/O ). > > > -jf > ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < > Alexander.Saupp at de.ibm.com>: > Dear Mailing List readers, > > I've come to a preliminary conclusion that explains the behavior in an > appropriate manner, so I'm trying to summarize my current thinking with > this audience. > > Problem statement: > Big performance derivation between native GPFS (fast) and loopback NFS > mount on the same node (way slower) for single client, single thread, > small files workload. > > > Current explanation: > tar seems to use close() on files, not fclose(). That is an application > choice and common behavior. The ideas is to allow OS write caching to > speed up process run time. > > When running locally on ext3 / xfs / GPFS / .. that allows async destaging > of data down to disk, somewhat compromising data for better performance. > As we're talking about write caching on the same node that the application > runs on - a crash is missfortune but in the same failure domain. > E.g. if you run a compile job that includes extraction of a tar and the > node crashes you'll have to restart the entire job, anyhow. > > The NFSv2 spec defined that NFS io's are to be 'sync', probably because > the compile job on the nfs client would survive if the NFS Server crashes, > so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and introduced > the 'async' flag for NFS, which would handle IO's similar to local IOs, > allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS can > always decided to use the fclose() option, which will ensure that data is > destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down > to disk - very filesystem independent. > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > The Spectrum Scale NFS implementation (based on ganesha) does not support > the async mount option, which is a bit of a pitty. There might also be > implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write > caching > - Write Caching in the same failure domain as the application (on NSD > client) which seems to be more reasonable compared to NFS Server side > write caching. > > References: > > NFS sync vs async > https://tools.ietf.org/html/rfc1813 > The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support > so that the NFS server can do unsafe writes. 
> Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > sync() vs fsync() > https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm > > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted > for input (or the next command in a shell script is processed). > > > close() vs fclose() > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). (It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > Alexander Saupp > > IBM Systems, Storage Platform, EMEA Storage Competence Center > > > Phone: > +49 7034-643-1512 > IBM Deutschland GmbH > > Mobile: > +49-172 7251072 > Am Weiher 24 > Email: > alexander.saupp at de.ibm.com > 65451 Kelsterbach > > > Germany > > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "ecblank.gif" > deleted by Olaf Weiser/Germany/IBM] [attachment "19995626.gif" deleted by > Olaf Weiser/Germany/IBM] [attachment "ecblank.gif" deleted by Olaf > Weiser/Germany/IBM] _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From janfrode at tanso.net Wed Oct 17 14:50:38 2018 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 17 Oct 2018 09:50:38 -0400 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Also beware there are 2 different linux NFS "async" settings. A client side setting (mount -o async), which still cases sync on file close() -- and a server (knfs) side setting (/etc/exports) that violates NFS protocol and returns requests before data has hit stable storage. 
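To make the distinction concrete, a rough sketch of the two knobs (export path, hostname and mount point below are placeholders, not taken from this thread; option names as per exports(5) and nfs(5)):

# server side, kernel NFS only - Ganesha/CES does not expose an async export option
# /etc/exports - 'sync' is protocol-conformant, 'async' replies before data is stable
/gpfs/fs0/export  *(rw,async,no_root_squash)

# client side, fstab or command line - this does NOT remove the sync-on-close/COMMIT behaviour
mount -t nfs -o rw,async,vers=3 nfsserver:/gpfs/fs0/export /mnt/export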
-jf On Wed, Oct 17, 2018 at 9:41 AM Tomer Perry wrote: > Hi, > > Without going into to much details, AFAIR, Ontap integrate NVRAM into the > NFS write cache ( as it was developed as a NAS product). > Ontap is using the STABLE bit which kind of tell the client "hey, I have > no write cache at all, everything is written to stable storage - thus, > don't bother with commits ( sync) commands - they are meaningless". > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Keigo Matsubara" > To: gpfsug main discussion list > Date: 17/10/2018 16:35 > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I also wonder how many products actually exploit NFS async mode to improve > I/O performance by sacrificing the file system consistency risk: > > gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > > Using this option usually improves performance, but at > > the cost that an unclean server restart (i.e. a crash) can cause > > data to be lost or corrupted." > > For instance, NetApp, at the very least FAS 3220 running Data OnTap > 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to > sync mode. > Promoting means even if NFS client requests async mount mode, the NFS > server ignores and allows only sync mount mode. > > Best Regards, > --- > Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan > TEL: +81-50-3150-0595, T/L: 6205-0595 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Oct 17 17:22:05 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 17 Oct 2018 09:22:05 -0700 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <7E9A54A4-304E-42F7-BF4B-06EBC57503FE@gmail.com> while most said here is correct, it can?t explain the performance of 200 files /sec and I couldn?t resist jumping in here :-D lets assume for a second each operation is synchronous and its done by just 1 thread. 200 files / sec means 5 ms on average per file write. Lets be generous and say the network layer is 100 usec per roud-trip network hop (including code processing on protocol node or client) and for visualization lets assume the setup looks like this : ESS Node ---ethernet--- Protocol Node ?ethernet--- client Node . lets say the ESS write cache can absorb small io at a fixed cost of 300 usec if the heads are ethernet connected and not using IB (then it would be more in the 250 usec range). That?s 300 +100(net1) +100(net2) usec or 500 usec in total. So you are a factor 10 off from your number. 
So lets just assume a create + write is more than just 1 roundtrip worth or synchronization, lets say it needs to do 2 full roundtrips synchronously one for the create and one for the stable write that?s 1 ms, still 5x off of your 5 ms. So either there is a bug in the NFS Server, the NFS client or the storage is not behaving properly. To verify this, the best would be to run the following test : Create a file on the ESS node itself in the shared filesystem like : /usr/lpp/mmfs/samples/perf/gpfsperf create seq -nongpfs -r 4k -n 1m -th 1 -dio /sharedfs/test Now run the following command on one of the ESS nodes, then the protocol node and last the nfs client : /usr/lpp/mmfs/samples/perf/gpfsperf write seq -nongpfs -r 4k -n 1m -th 1 -dio /sharedfs/test This will create 256 stable 4k write i/os to the storage system, I picked the number just to get a statistical relevant number of i/os you can change 1m to 2m or 4m, just don?t make it too high or you might get variations due to de-staging or other side effects happening on the storage system, which you don?t care at this point you want to see the round trip time on each layer. The gpfsperf command will spit out a line like : Data rate was XYZ Kbytes/sec, Op Rate was XYZ Ops/sec, Avg Latency was 0.266 milliseconds, thread utilization 1.000, bytesTransferred 1048576 The only number here that matters is the average latency number , write it down. What I would expect to get back is something like : On ESS Node ? 300 usec average i/o On PN ? 400 usec average i/o On Client ? 500 usec average i/o If you get anything higher than the numbers above something fundamental is bad (in fact on fast system you may see from client no more than 200-300 usec response time) and it will be in the layer in between or below of where you test. If all the numbers are somewhere in line with my numbers above, it clearly points to a problem in NFS itself and the way it communicates with GPFS. Marc, myself and others have debugged numerous issues in this space in the past last one was fixed beginning of this year and ended up in some Scale 5.0.1.X release. To debug this is very hard and most of the time only possible with GPFS source code access which I no longer have. You would start with something like strace -Ttt -f -o tar-debug.out tar -xvf ?..? and check what exact system calls are made to nfs client and how long each takes. You would then run a similar strace on the NFS server to see how many individual system calls will be made to GPFS and how long each takes. This will allow you to narrow down where the issue really is. But I suggest to start with the simpler test above as this might already point to a much simpler problem. Btw. I will be also be speaking at the UG Meeting at SC18 in Dallas, in case somebody wants to catch up ? Sven From: on behalf of Jan-Frode Myklebust Reply-To: gpfsug main discussion list Date: Wednesday, October 17, 2018 at 6:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Also beware there are 2 different linux NFS "async" settings. A client side setting (mount -o async), which still cases sync on file close() -- and a server (knfs) side setting (/etc/exports) that violates NFS protocol and returns requests before data has hit stable storage. -jf On Wed, Oct 17, 2018 at 9:41 AM Tomer Perry wrote: Hi, Without going into to much details, AFAIR, Ontap integrate NVRAM into the NFS write cache ( as it was developed as a NAS product). 
Ontap is using the STABLE bit which kind of tell the client "hey, I have no write cache at all, everything is written to stable storage - thus, don't bother with commits ( sync) commands - they are meaningless". Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Keigo Matsubara" To: gpfsug main discussion list Date: 17/10/2018 16:35 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 17 22:02:30 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 17 Oct 2018 21:02:30 +0000 Subject: [gpfsug-discuss] Job vacancy @Birmingham Message-ID: We're looking for someone to join our systems team here at University of Birmingham. In case you didn't realise, we're pretty reliant on Spectrum Scale to deliver our storage systems. https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 Such a snappy URL :-) Feel free to email me *OFFLIST* if you have informal enquiries! Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Oct 18 10:14:51 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 18 Oct 2018 11:14:51 +0200 Subject: [gpfsug-discuss] Job vacancy @Birmingham In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From nathan.harper at cfms.org.uk Thu Oct 18 10:23:44 2018 From: nathan.harper at cfms.org.uk (Nathan Harper) Date: Thu, 18 Oct 2018 10:23:44 +0100 Subject: [gpfsug-discuss] Job vacancy @Birmingham In-Reply-To: References: Message-ID: Olaf - we don't need any reminders of Bre.. this morning On Thu, 18 Oct 2018 at 10:15, Olaf Weiser wrote: > Hi Simon .. > well - I would love to .. .but .. ;-) hey - what do you think, how long a > citizen from the EU can live (and work) in UK ;-) > don't take me too serious... see you soon, consider you invited for a > coffee for my rude comment .. ;-) > olaf > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 10/17/2018 11:02 PM > Subject: [gpfsug-discuss] Job vacancy @Birmingham > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We're looking for someone to join our systems team here at University of > Birmingham. In case you didn't realise, we're pretty reliant on Spectrum > Scale to deliver our storage systems. > > > https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 > *https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117* > > > Such a snappy URL :-) > > Feel free to email me *OFFLIST* if you have informal enquiries! > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Nathan Harper* // IT Systems Lead *e: *nathan.harper at cfms.org.uk *t*: 0117 906 1104 *m*: 0787 551 0891 *w: *www.cfms.org.uk CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1 4QP -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Thu Oct 18 16:32:43 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 Oct 2018 15:32:43 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. 
(New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London From alex at calicolabs.com Thu Oct 18 17:12:42 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Thu, 18 Oct 2018 09:12:42 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: References: Message-ID: The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: > We've just added 9 raid volumes to our main storage, (5 Raid6 arrays > for data and 4 Raid1 arrays for metadata) > > We are now attempting to rebalance and our data around all the volumes. > > We started with the meta-data doing a "mmrestripe -r" as we'd changed > the failure groups to on our meta-data disks and wanted to ensure we > had all our metadata on known good ssd. No issues, here we could take > snapshots and I even tested it. (New SSD on new failure group and move > all old SSD to the same failure group) > > We're now doing a "mmrestripe -b" to rebalance the data accross all 21 > Volumes however when we attempt to take a snapshot, as we do every > night at 11pm it fails with > > sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test > Flushing dirty data for snapshot :test... > Quiescing all file system operations. > Unable to quiesce all nodes; some processes are busy or holding > required resources. > mmcrsnapshot: Command failed. Examine previous error messages to > determine cause. > > Are you meant to be able to take snapshots while re-striping or not? > > I know a rebalance of the data is probably unnecessary, but we'd like > to get the best possible speed out of the system, and we also kind of > like balance. > > Thanks > > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From Kevin.Buterbaugh at Vanderbilt.Edu Thu Oct 18 17:13:52 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Thu, 18 Oct 2018 16:13:52 +0000
Subject: Re: [gpfsug-discuss] Job vacancy @Birmingham
In-Reply-To: 
References: 
Message-ID: <4B78CFBB-6B35-4914-A42D-5A66117DD588@vanderbilt.edu>

Hi Nathan,

Well, while I?m truly sorry for what you?re going thru, at least a majority of the voters in the UK did vote for it. Keep in mind that things could be worse. Some of us do happen to live in a country where a far worse thing has happened despite the fact that the majority of the voters were _against_ it?. ;-)

Kevin

?
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633

On Oct 18, 2018, at 4:23 AM, Nathan Harper > wrote:

Olaf - we don't need any reminders of Bre.. this morning

On Thu, 18 Oct 2018 at 10:15, Olaf Weiser > wrote:
Hi Simon ..
well - I would love to .. .but .. ;-) hey - what do you think, how long a citizen from the EU can live (and work) in UK ;-)
don't take me too serious... see you soon, consider you invited for a coffee for my rude comment ..
;-) olaf From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" > Date: 10/17/2018 11:02 PM Subject: [gpfsug-discuss] Job vacancy @Birmingham Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We're looking for someone to join our systems team here at University of Birmingham. In case you didn't realise, we're pretty reliant on Spectrum Scale to deliver our storage systems. https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 Such a snappy URL :-) Feel free to email me *OFFLIST* if you have informal enquiries! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Nathan Harper // IT Systems Lead e: nathan.harper at cfms.org.uk t: 0117 906 1104 m: 0787 551 0891 w: www.cfms.org.uk CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR [http://cfms.org.uk/images/logo.png] CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1 4QP _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ca552bcbb43b34c316b2808d634db7033%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754514425052428&sdata=tErG6k2dNdqz%2Ffnc8eYtpyR%2Ba1Cb4AZ8n7WA%2Buv3oCw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Oct 18 17:48:54 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Oct 2018 16:48:54 +0000 Subject: [gpfsug-discuss] Reminder: Please keep discussion focused on GPFS/Scale Message-ID: <2A1399B8-441D-48E3-AACC-0BD3B0780A60@nuance.com> A gentle reminder to not left the discussions drift off topic, thanks. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Oct 18 17:57:18 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 Oct 2018 16:57:18 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: Message-ID: And use QoS Less aggressive during peak, more on valleys. If your workload allows it. ? 
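A rough sketch of what that can look like (filesystem name and IOPS figures below are placeholders to adapt; check the mmchqos/mmlsqos man pages on your code level for the exact option spelling):

# throttle everything running in the maintenance class (restripe, rebalance, ...)
mmchqos home --enable pool=*,maintenance=10000IOPS,other=unlimited
# run the rebalance in that class
mmrestripefs home -b --qos maintenance
# watch what the class is actually consuming
mmlsqos home --seconds 60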
SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 18 Oct 2018, at 19.13, Alex Chekholko wrote: > > The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. > > One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. > >> On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: >> We've just added 9 raid volumes to our main storage, (5 Raid6 arrays >> for data and 4 Raid1 arrays for metadata) >> >> We are now attempting to rebalance and our data around all the volumes. >> >> We started with the meta-data doing a "mmrestripe -r" as we'd changed >> the failure groups to on our meta-data disks and wanted to ensure we >> had all our metadata on known good ssd. No issues, here we could take >> snapshots and I even tested it. (New SSD on new failure group and move >> all old SSD to the same failure group) >> >> We're now doing a "mmrestripe -b" to rebalance the data accross all 21 >> Volumes however when we attempt to take a snapshot, as we do every >> night at 11pm it fails with >> >> sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test >> Flushing dirty data for snapshot :test... >> Quiescing all file system operations. >> Unable to quiesce all nodes; some processes are busy or holding >> required resources. >> mmcrsnapshot: Command failed. Examine previous error messages to >> determine cause. >> >> Are you meant to be able to take snapshots while re-striping or not? >> >> I know a rebalance of the data is probably unnecessary, but we'd like >> to get the best possible speed out of the system, and we also kind of >> like balance. >> >> Thanks >> >> >> -- >> Peter Childs >> ITS Research Storage >> Queen Mary, University of London >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Oct 18 17:57:18 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 Oct 2018 16:57:18 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: Message-ID: And use QoS Less aggressive during peak, more on valleys. If your workload allows it. ? SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 18 Oct 2018, at 19.13, Alex Chekholko wrote: > > The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. 
> > One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. > >> On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: >> We've just added 9 raid volumes to our main storage, (5 Raid6 arrays >> for data and 4 Raid1 arrays for metadata) >> >> We are now attempting to rebalance and our data around all the volumes. >> >> We started with the meta-data doing a "mmrestripe -r" as we'd changed >> the failure groups to on our meta-data disks and wanted to ensure we >> had all our metadata on known good ssd. No issues, here we could take >> snapshots and I even tested it. (New SSD on new failure group and move >> all old SSD to the same failure group) >> >> We're now doing a "mmrestripe -b" to rebalance the data accross all 21 >> Volumes however when we attempt to take a snapshot, as we do every >> night at 11pm it fails with >> >> sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test >> Flushing dirty data for snapshot :test... >> Quiescing all file system operations. >> Unable to quiesce all nodes; some processes are busy or holding >> required resources. >> mmcrsnapshot: Command failed. Examine previous error messages to >> determine cause. >> >> Are you meant to be able to take snapshots while re-striping or not? >> >> I know a rebalance of the data is probably unnecessary, but we'd like >> to get the best possible speed out of the system, and we also kind of >> like balance. >> >> Thanks >> >> >> -- >> Peter Childs >> ITS Research Storage >> Queen Mary, University of London >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Thu Oct 18 18:19:21 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 18 Oct 2018 17:19:21 +0000 Subject: [gpfsug-discuss] Best way to migrate data Message-ID: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Hi, Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 From S.J.Thompson at bham.ac.uk Thu Oct 18 18:44:11 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 18 Oct 2018 17:44:11 +0000 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 In-Reply-To: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> References: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> Message-ID: Just following up this thread ... We use v4 ACLs, in part because we also export via SMB as well. 
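If it helps anyone comparing the two flavours, a rough sketch of how we poke at the ACLs (path below is a placeholder; see the mmgetacl/mmeditacl man pages for the exact -k values on your release):

# show a directory's ACL in NFSv4 form
mmgetacl -k nfs4 /gpfs/fs0/projects/example
# edit it - the inheritance flags on the parent directory decide what newly created files receive
mmeditacl -k nfs4 /gpfs/fs0/projects/example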
Note that we do also use the fileset option "chmodAndUpdateAcl" Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Fabrice.Cantos at niwa.co.nz [Fabrice.Cantos at niwa.co.nz] Sent: 10 October 2018 22:57 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 I would be interested to know what you chose for your filesystems and user/project space directories: * Traditional Posix ACL * NFS V4 ACL What did motivate your choice? We are facing some issues to get the correct NFS ACL to keep correct attributes for new files created. Thanks Fabrice [cid:image4cef17.PNG at 18c66b76.4480e036] Fabrice Cantos HPC Systems Engineer Group Manager ? High Performance Computing T +64-4-386-0367 M +64-27-412-9693 National Institute of Water & Atmospheric Research Ltd (NIWA) 301 Evans Bay Parade, Greta Point, Wellington Connect with NIWA: niwa.co.nz Facebook Twitter LinkedIn Instagram To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image4cef17.PNG Type: image/png Size: 12288 bytes Desc: image4cef17.PNG URL: From frederik.ferner at diamond.ac.uk Thu Oct 18 18:54:32 2018 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 18 Oct 2018 18:54:32 +0100 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 In-Reply-To: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> References: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> Message-ID: <595d0584-df41-a731-ac08-6bba81dbdb31@diamond.ac.uk> On 10/10/18 22:57, Fabrice Cantos wrote: > I would be interested to know what you chose for your filesystems and > user/project space directories: > > * Traditional Posix ACL > * NFS V4 ACL We use traditional Posix ACLs almost exclusively. The main exception is some directories on Spectrum Scale where Windows machines with native Spectrum Scale support create files and directories. There our scripts set Posix ACLs which are respected on Windows but automatically converted to NFS V4 ACLs on new files and directories by the file system. > What did motivate your choice? Mainly that our use of ACLs goes back way longer than our use of GPFS/Spectrum Scale and we also have other file systems which do not support NFSv4 ACLs. Keeping knowledge and script on one set of ACLs fresh within the team is easier. Additional headache comes because as we all know Posix ACLs and NFS V4 ACLs don't translate exactly. > We are facing some issues to get the correct NFS ACL to keep correct > attributes for new files created. Is this using kernel NFSd or Ganesha (CES)? Frederik -- Frederik Ferner Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 Duty Sys Admin can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. 
If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From oehmes at gmail.com Thu Oct 18 19:09:56 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 11:09:56 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. 
turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu Oct 18 19:09:56 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 11:09:56 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. 
crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? 
I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Thu Oct 18 19:26:10 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 18 Oct 2018 18:26:10 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ccca728d2d61f4be06bcd08d6351f3650%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754805507359478&sdata=2YAiqgqKl4CerlyCn3vJ9v9u%2FrGzbfa7aKxJ0PYV%2Fhc%3D&reserved=0 From p.childs at qmul.ac.uk Thu Oct 18 19:50:42 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 Oct 2018 18:50:42 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. 
I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. 
I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From Paul.Sanchez at deshaw.com Thu Oct 18 19:47:31 2018 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Thu, 18 Oct 2018 18:47:31 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that!
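A rough sketch of the sharding approach Paul describes above, simplified to shard only on the top-level directories (Paul's scheme enumerates every directory and uses non-recursive rsyncs; this variant runs one recursive rsync per top-level directory instead). The paths reuse the ones that appear later in this thread and the parallelism figure is a placeholder:

cd /gpfs/home/user
# one rsync per top-level directory, 8 at a time; top-level plain files need a separate pass
find . -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -P 8 -I{} rsync -a {}/ /research/project/user/{}/

Giving each client a different subset of the directories spreads the copy over many nodes instead of saturating a single client's IB interface.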
Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ccca728d2d61f4be06bcd08d6351f3650%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754805507359478&sdata=2YAiqgqKl4CerlyCn3vJ9v9u%2FrGzbfa7aKxJ0PYV%2Fhc%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu Oct 18 20:18:37 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 12:18:37 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: <47DF6EDF-CA0C-4EBB-851A-1D3603F8B0C5@gmail.com> I don't know which min FS version you need to make use of -N, but there is this Marc guy watching the mailing list who would know __ Sven ?On 10/18/18, 11:50 AM, "Peter Childs" wrote: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. 
It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. 
(New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From cblack at nygenome.org Thu Oct 18 20:13:29 2018 From: cblack at nygenome.org (Christopher Black) Date: Thu, 18 Oct 2018 19:13:29 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> Message-ID: <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Other tools and approaches that we've found helpful: msrsync: handles parallelizing rsync within a dir tree and can greatly speed up transfers on a single node with both filesystems mounted, especially when dealing with many small files Globus/GridFTP: set up one or more endpoints on each side, gridftp will auto parallelize and recover from disruptions msrsync is easier to get going but is limited to one parent dir per node. We've sometimes done an additional level of parallelization by running msrsync with different top level directories on different hpc nodes simultaneously. Best, Chris Refs: https://github.com/jbd/msrsync https://www.globus.org/ ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul" wrote: Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ?
um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. From makaplan at us.ibm.com Thu Oct 18 20:30:21 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 Oct 2018 15:30:21 -0400 Subject: [gpfsug-discuss] Can't take snapshots while re-striping - "mmchqos -N" In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: I believe `mmchqos ... -N ... ` is supported at 4.2.2 and later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Oct 18 20:30:21 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 Oct 2018 15:30:21 -0400 Subject: [gpfsug-discuss] Can't take snapshots while re-striping - "mmchqos -N" In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: I believe `mmchqos ... 
-N ... ` is supported at 4.2.2 and later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Thu Oct 18 21:05:50 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 18 Oct 2018 20:05:50 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Message-ID: Thank you all for the responses. I'm currently using msrsync and things appear to be going very well. The data transfer is contained inside our DC. I'm transferring a user's home directory content from one GPFS file system to another. Our IBM Spectrum Scale Solution consists of 12 IO nodes connected to IB and the client node that I'm transferring the data from one fs to another is also connected to IB with a possible maximum of 2 hops. [root at client-system]# /gpfs/home/dwayne/bin/msrsync -P --stats -p 32 /gpfs/home/user/ /research/project/user/ [64756/992397 entries] [30.1 T/239.6 T transferred] [81 entries/s] [39.0 G/s bw] [monq 0] [jq 62043] Best, Dwayne -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christopher Black Sent: Thursday, October 18, 2018 4:43 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Other tools and approaches that we've found helpful: msrsync: handles parallelizing rsync within a dir tree and can greatly speed up transfers on a single node with both filesystems mounted, especially when dealing with many small files Globus/GridFTP: set up one or more endpoints on each side, gridftp will auto parallelize and recover from disruptions msrsync is easier to get going but is limited to one parent dir per node. We've sometimes done an additional level of parallelization by running msrsync with different top level directories on different hpc nodes simultaneously. Best, Chris Refs: https://github.com/jbd/msrsync https://www.globus.org/ ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul" wrote: Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! 
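Since the original failure was "not supported by my filesystem version", a quick way to check what the cluster and file system are actually at; "home" is the file system name from the mmcrsnapshot example in this thread, and the last step is shown only as a reminder that it is one-way:

# code level the cluster is committed to
mmlsconfig minReleaseLevel
# on-disk file system format version
mmlsfs home -V
# raise the format version to match the installed code; irreversible, so only once all nodes are upgraded
mmchfs home -V full

The per-node mmchqos form appears to need both the 4.2.2+ code level Marc mentions and a sufficiently new file system format, which is what the "not supported by my filesystem version" message points at.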
Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mutantllama at gmail.com Thu Oct 18 21:54:42 2018 From: mutantllama at gmail.com (Carl) Date: Fri, 19 Oct 2018 07:54:42 +1100 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Message-ID: It may be overkill for your use case but MPI file utils is very good for large datasets. https://github.com/hpc/mpifileutils Cheers, Carl. On Fri, 19 Oct 2018 at 7:05 am, wrote: > Thank you all for the responses. I'm currently using msrsync and things > appear to be going very well. > > The data transfer is contained inside our DC. 
I'm transferring a user's > home directory content from one GPFS file system to another. Our IBM > Spectrum Scale Solution consists of 12 IO nodes connected to IB and the > client node that I'm transferring the data from one fs to another is also > connected to IB with a possible maximum of 2 hops. > > [root at client-system]# /gpfs/home/dwayne/bin/msrsync -P --stats -p 32 > /gpfs/home/user/ /research/project/user/ > [64756/992397 entries] [30.1 T/239.6 T transferred] [81 entries/s] [39.0 > G/s bw] [monq 0] [jq 62043] > > Best, > Dwayne > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christopher Black > Sent: Thursday, October 18, 2018 4:43 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Best way to migrate data > > Other tools and approaches that we've found helpful: > msrsync: handles parallelizing rsync within a dir tree and can greatly > speed up transfers on a single node with both filesystems mounted, > especially when dealing with many small files > Globus/GridFTP: set up one or more endpoints on each side, gridftp will > auto parallelize and recover from disruptions > > msrsync is easier to get going but is limited to one parent dir per node. > We've sometimes done an additional level of parallelization by running > msrsync with different top level directories on different hpc nodes > simultaneously. > > Best, > Chris > > Refs: > https://github.com/jbd/msrsync > https://www.globus.org/ > > ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Sanchez, Paul" behalf of Paul.Sanchez at deshaw.com> wrote: > > Sharding can also work, if you have a storage-connected compute grid > in your environment: If you enumerate all of the directories, then use a > non-recursive rsync for each one, you may be able to parallelize the > workload by using several clients simultaneously. It may still max out the > links of these clients (assuming your source read throughput and target > write throughput bottlenecks aren't encountered first) but it may run that > way for 1/100th of the time if you can use 100+ machines. > > -Paul > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Buterbaugh, Kevin L > Sent: Thursday, October 18, 2018 2:26 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Best way to migrate data > > Hi Dwayne, > > I?m assuming you can?t just let an rsync run, possibly throttled in > some way? If not, and if you?re just tapping out your network, then would > it be possible to go old school? We have parts of the Medical Center here > where their network connections are ? um, less than robust. So they tar > stuff up to a portable HD, sneaker net it to us, and we untar is from an > NSD server. > > HTH, and I really hope that someone has a better idea than that! > > Kevin > > > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > > > Hi, > > > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a larger > research GPFS file system? I?m currently using rsync and it has maxed out > the client system?s IB interface. > > > > Best, > > Dwayne > > ? > > Dwayne Hart | Systems Administrator IV > > > > CHIA, Faculty of Medicine > > Memorial University of Newfoundland > > 300 Prince Philip Drive > > St. 
John?s, Newfoundland | A1B 3V6 > > Craig L Dobbin Building | 4M409 > > T 709 864 6631 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= > > > ________________________________ > > This message is for the recipient?s use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Oct 19 10:09:13 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 19 Oct 2018 10:09:13 +0100 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: On 18/10/2018 18:19, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a > larger research GPFS file system? I?m currently using rsync and it > has maxed out the client system?s IB interface. > Be careful with rsync, it resets all your atimes which screws up any hope of doing ILM or HSM. My personal favourite is to do something along the lines of dsmc restore /gpfs/ Minimal impact on the user facing services, and seems to preserve atimes last time I checked. 
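A sketch of the restore-based copy described above; the paths reuse the ones from earlier in the thread, and -subdir=yes plus a target directory are standard dsmc restore arguments, but check them against your Spectrum Protect client level before relying on this:

# restore the backed-up home directory into the new file system, keeping the subtree layout
dsmc restore "/gpfs/home/user/*" /research/project/user/ -subdir=yes

Because the data comes off the backup server rather than being read from the source file system, source atimes are left alone.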
Sure it tanks your backup server a bit, but that is not user facing. What do users care if the backup takes longer than normal. Of course this presumes you have a backup :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From novosirj at rutgers.edu Thu Oct 18 21:04:36 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 18 Oct 2018 20:04:36 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 We use parsyncfp. Our target is not GPFS, though. I was really hoping to hear about something snazzier for GPFS-GPFS. Lenovo would probably tell you that HSM is the way to go (we asked something similar for a replacement for our current setup or for distributed storage). On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts > a larger research GPFS file system? I?m currently using rsync and > it has maxed out the client system?s IB interface. > > Best, Dwayne ? Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine Memorial University of Newfoundland 300 > Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L > Dobbin Building | 4M409 T 709 864 6631 > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk =dMDg -----END PGP SIGNATURE----- From Dwayne.Hart at med.mun.ca Fri Oct 19 11:15:15 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Fri, 19 Oct 2018 10:15:15 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: Hi JAB, We do not have either ILM or HSM. Thankfully, we have at minimum IBM Spectrum Protect (I recently updated the system to version 8.1.5). It would be an interesting exercise to see how long it would take IBM SP to restore a user's content fully to a different target. I have done some smaller recoveries so I know that the system is in a usable state ;) Best, Dwayne -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Friday, October 19, 2018 6:39 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Best way to migrate data On 18/10/2018 18:19, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a > larger research GPFS file system? I?m currently using rsync and it has > maxed out the client system?s IB interface. > Be careful with rsync, it resets all your atimes which screws up any hope of doing ILM or HSM. 
My personal favourite is to do something along the lines of dsmc restore /gpfs/ Minimal impact on the user facing services, and seems to preserve atimes last time I checked. Sure it tanks your backup server a bit, but that is not user facing. What do users care if the backup takes longer than normal. Of course this presumes you have a backup :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Dwayne.Hart at med.mun.ca Fri Oct 19 11:37:13 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Fri, 19 Oct 2018 10:37:13 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca>, <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> Message-ID: Thank you Ryan. I?ll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer. By copying it from GPFS fs to another GPFS fs. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 > On Oct 19, 2018, at 7:04 AM, Ryan Novosielski wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > We use parsyncfp. Our target is not GPFS, though. I was really hoping > to hear about something snazzier for GPFS-GPFS. Lenovo would probably > tell you that HSM is the way to go (we asked something similar for a > replacement for our current setup or for distributed storage). > >> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: >> Hi, >> >> Just wondering what the best recipe for migrating a user?s home >> directory content from one GFPS file system to another which hosts >> a larger research GPFS file system? I?m currently using rsync and >> it has maxed out the client system?s IB interface. >> >> Best, Dwayne ? Dwayne Hart | Systems Administrator IV >> >> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 >> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L >> Dobbin Building | 4M409 T 709 864 6631 >> _______________________________________________ gpfsug-discuss >> mailing list gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > - -- > ____ > || \\UTGERS, |----------------------*O*------------------------ > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Res. Comp. 
- MSB C630, Newark > `' > -----BEGIN PGP SIGNATURE----- > > iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG > p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk > =dMDg > -----END PGP SIGNATURE----- > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Fri Oct 19 11:41:15 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 19 Oct 2018 10:41:15 +0000 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Message-ID: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... 
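The three openings Simon lists (22, 1191 and 60000-61000) can be expressed with firewalld roughly as follows, assuming the NSD nodes use firewalld and the default zone; adjust to local policy:

# GPFS daemon port and the restricted ephemeral command range
firewall-cmd --permanent --add-port=1191/tcp
firewall-cmd --permanent --add-port=60000-61000/tcp
firewall-cmd --reload
# pin the command ports to the opened range so nothing outside it is chosen
mmchconfig tscCmdPortRange=60000-61000

Port 22 is normally already covered by the standard ssh service rule, and as noted below, protocol, GUI and perfmon nodes need further ports beyond these.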
URL: From knop at us.ibm.com Fri Oct 19 14:05:22 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 19 Oct 2018 09:05:22 -0400 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls In-Reply-To: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> References: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Message-ID: Simon, Depending on what functions are being used in Scale, other ports may also get used, as documented in https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewall.htm On the other hand, I'd initially speculate that you might be hitting a problem in mmnetverify itself. (perhaps some aspect in mmnetverify is not taking into account that ports other than 22, 1191, 60000-61000 may be getting blocked by the firewall) Could you open a PMR for this one? Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/19/2018 06:41 AM Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. 
So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Oct 19 14:39:25 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 19 Oct 2018 13:39:25 +0000 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls In-Reply-To: References: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Message-ID: Yeah we have the perfmon ports open, and GUI ports open on the GUI nodes. But basically this is just a storage cluster and everything else (protocols etc) run in remote clusters. I?ve just opened a ticket ? no longer a PMR in the new support centre for Scale Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 19 October 2018 at 14:05 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Spectrum Scale and Firewalls Simon, Depending on what functions are being used in Scale, other ports may also get used, as documented in https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewall.htm On the other hand, I'd initially speculate that you might be hitting a problem in mmnetverify itself. (perhaps some aspect in mmnetverify is not taking into account that ports other than 22, 1191, 60000-61000 may be getting blocked by the firewall) Could you open a PMR for this one? Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for Simon Thompson ---10/19/2018 06:41:27 AM---Hi, We?re having some issues bringing up firewalls on som]Simon Thompson ---10/19/2018 06:41:27 AM---Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actua From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/19/2018 06:41 AM Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. 
Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From Robert.Oesterlin at nuance.com Fri Oct 19 16:33:04 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 19 Oct 2018 15:33:04 +0000 Subject: [gpfsug-discuss] SC18 - User Group Meeting - Agenda and Registration Message-ID: <041D6114-8F12-463F-BFB5-ABF1A1834DA1@nuance.com> SC18 is only 3 weeks away! Here is the (more or less) final agenda for the user group meeting. SSUG @ SC18 Sunday, November 11th 12:30PM - 18:00 Omni Dallas Hotel 555 S Lamar Dallas, Texas Please register at the IBM site here: https://www-01.ibm.com/events/wwe/grp/grp305.nsf/Agenda.xsp?locale=en_US&openform=&seminar=2DQMNHES# Looking forward to seeing everyone in Dallas! Bob, Kristy, and Simon Start End Duration Title 12:30 12:45 15 Welcome 12:45 13:15 30 Spectrum Scale Update 13:15 13:30 15 ESS Update 13:30 13:45 15 Service Update 13:45 14:05 20 Lessons learned from a very unusual year (Kevin Buterbaugh, Vanderbilt) 14:05 14:25 20 Implementing a scratch filesystem with E8 Storage NVMe (Tom King, Queen Mary University of London) 14:25 14:45 20 Spectrum Scale and Containers (John Lewars, IBM) 14:45 15:10 25 Break 15:10 15:30 20 Best Practices for Protocol Nodes (Tomer Perry/Ulf Troppens, IBM) 15:30 15:50 20 Network Design Tomer Perry/Ulf Troppens, IBM/Mellanox) 15:50 16:10 20 AI Discussion 16:10 16:30 20 Improving Spark workload performance with Spectrum Conductor on Spectrum Scale (Chris Schlipalius, Pawsey Supercomputing Centre) 16:30 16:50 20 Spectrum Scale @ DDN ? Technical update (Sven Oehme, DDN) 16:50 17:10 20 Burst Buffer (Tom Goodings) 17:10 17:30 20 MetaData Management 17:30 17:45 15 Lenovo Update (Michael Hennecke, Lenovo) 17:45 18:00 15 Ask us anything 18:00 Social Event (at the hotel) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnaineni at in.ibm.com Mon Oct 22 01:25:50 2018 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Mon, 22 Oct 2018 00:25:50 +0000 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Oct 22 17:18:43 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 09:18:43 -0700 Subject: [gpfsug-discuss] GPFS, Pagepool and Block size -> Perfomance reduces with larger block size In-Reply-To: <243c5d36-f25e-4ebb-b9f3-6fc47bc6d93c@Spark> References: <6bb509b7-b7c5-422d-8e27-599333b6b7c4@Spark> <013aeb31-ebd2-4cc7-97d1-06883d9569f7@Spark> <243c5d36-f25e-4ebb-b9f3-6fc47bc6d93c@Spark> Message-ID: oops, somehow that slipped my inbox, i only saw that reply right now. its really hard to see from a trace snipped if the lock is the issue as the lower level locks don't show up in default traces. without having access to source code and a detailed trace you won't make much progress here. sven On Thu, Sep 27, 2018 at 12:31 PM wrote: > Thank you Sven, > > Turning of prefetching did not improve the performance, but it did degrade > a bit. > > I have made the prefetching default and took trace dump, for tracectl with > trace=io. Let me know if you want me to paste/attach it here. > > May i know, how could i confirm if the below is true? > > 1. this could be serialization around buffer locks. as larger your >>> blocksize gets as larger is the amount of data one of this pagepool buffers >>> will maintain, if there is a lot of concurrency on smaller amount of data >>> more threads potentially compete for the same buffer lock to copy stuff in >>> and out of a particular buffer, hence things go slower compared to the same >>> amount of data spread across more buffers, each of smaller size. >>> >>> > Will the above trace help in understanding if it is a serialization issue? > > I had been discussing the same with GPFS support for past few months, and > it seems to be that most of the time is being spent at cxiUXfer. They could > not understand on why it is taking spending so much of time in cxiUXfer. I > was seeing the same from perf top, and pagefaults. > > Below is snippet from what the support had said : > > ???????????????????????????? > > I searched all of the gpfsRead from trace and sort them by spending-time. > Except 2 reads which need fetch data from nsd server, the slowest read is > in the thread 72170. It took 112470.362 us. > > > trcrpt.2018-08-06_12.27.39.55538.lt15.trsum: 72165 6.860911319 > rdwr 141857.076 us + NSDIO > > trcrpt.2018-08-06_12.26.28.39794.lt15.trsum: 72170 1.483947593 > rdwr 112470.362 us + cxiUXfer > > trcrpt.2018-08-06_12.27.39.55538.lt15.trsum: 72165 6.949042593 > rdwr 88126.278 us + NSDIO > > trcrpt.2018-08-06_12.27.03.47706.lt15.trsum: 72156 2.919334474 > rdwr 81057.657 us + cxiUXfer > > trcrpt.2018-08-06_12.23.30.72745.lt15.trsum: 72154 1.167484466 > rdwr 76033.488 us + cxiUXfer > > trcrpt.2018-08-06_12.24.06.7508.lt15.trsum: 72187 0.685237501 > rdwr 70772.326 us + cxiUXfer > > trcrpt.2018-08-06_12.25.17.23989.lt15.trsum: 72193 4.757996530 > rdwr 70447.838 us + cxiUXfer > > > I check each of the slow IO as above, and find they all spend much time in > the function cxiUXfer. This function is used to copy data from kernel > buffer to user buffer. I am not sure why it took so much time. This should > be related to the pagefaults and pgfree you observed. Below is the trace > data for thread 72170. 
> > > 1.371477231 72170 TRACE_VNODE: gpfs_f_rdwr enter: fP > 0xFFFF882541649400 f_flags 0x8000 flags 0x8001 op 0 iovec > 0xFFFF881F2AFB3E70 count 1 offset 0x168F30D dentry 0xFFFF887C0CC298C0 > private 0xFFFF883F607175C0 iP 0xFFFF8823AA3CBFC0 name '410513.svs' > > .... > > 1.371483547 72170 TRACE_KSVFS: cachedReadFast exit: > uio_resid 16777216 code 1 err 11 > > .... > > 1.371498780 72170 TRACE_KSVFS: kSFSReadFast: oiP > 0xFFFFC90060B46740 offset 0x168F30D dataBufP FFFFC9003645A5A8 nDesc 64 buf > 200043C0000 valid words 64 dirty words 0 blkOff 0 > > 1.371499035 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate begin ul 0xFFFFC900333F1A40 holdCount 0 > ioType 0x2 inProg 0x15 > > 1.371500157 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate ul 0xFFFFC900333F1A40 holdCount 0 ioType 0x2 > inProg 0x16 err 0 > > 1.371500606 72170 TRACE_KSVFS: cxiUXfer: nDesc 64 1st > dataPtr 0x200043C0000 plP 0xFFFF887F7B90D600 toIOBuf 0 offset 6877965 len > 9899251 > > 1.371500793 72170 TRACE_KSVFS: cxiUXfer: ndesc 0 skip > dataAddrP 0x200043C0000 currOffset 0 currLen 262144 bufOffset 6877965 > > .... > > 1.371505949 72170 TRACE_KSVFS: cxiUXfer: ndesc 25 skip > dataAddrP 0x2001AF80000 currOffset 6553600 currLen 262144 bufOffset 6877965 > > 1.371506236 72170 TRACE_KSVFS: cxiUXfer: nDesc 26 > currOffset 6815744 tmpLen 262144 dataAddrP 0x2001AFCF30D currLen 199923 > pageOffset 781 pageLen 3315 plP 0xFFFF887F7B90D600 > > 1.373649823 72170 TRACE_KSVFS: cxiUXfer: nDesc 27 > currOffset 7077888 tmpLen 262144 dataAddrP 0x20027400000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.375158799 72170 TRACE_KSVFS: cxiUXfer: nDesc 28 > currOffset 7340032 tmpLen 262144 dataAddrP 0x20027440000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.376661566 72170 TRACE_KSVFS: cxiUXfer: nDesc 29 > currOffset 7602176 tmpLen 262144 dataAddrP 0x2002C180000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.377892653 72170 TRACE_KSVFS: cxiUXfer: nDesc 30 > currOffset 7864320 tmpLen 262144 dataAddrP 0x2002C1C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > .... 
> > 1.471389843 72170 TRACE_KSVFS: cxiUXfer: nDesc 62 > currOffset 16252928 tmpLen 262144 dataAddrP 0x2001D2C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.471845629 72170 TRACE_KSVFS: cxiUXfer: nDesc 63 > currOffset 16515072 tmpLen 262144 dataAddrP 0x2003EC80000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.472417149 72170 TRACE_KSVFS: cxiDetachIOBuffer: > dataPtr 0x200043C0000 plP 0xFFFF887F7B90D600 > > 1.472417775 72170 TRACE_LOCK: unlock_vfs: type Data, > key 0000000000000004:000000001B1F24BF:0000000000000001 lock_mode have ro > token xw lock_state old [ ro:27 ] new [ ro:26 ] holdCount now 27 > > 1.472418427 72170 TRACE_LOCK: hash tab lookup vfs: > found cP 0xFFFFC9005FC0CDE0 holdCount now 14 > > 1.472418592 72170 TRACE_LOCK: lock_vfs: type Data key > 0000000000000004:000000001B1F24BF:0000000000000002 lock_mode want ro status > valid token xw/xw lock_state [ ro:12 ] flags 0x0 holdCount 14 > > 1.472419842 72170 TRACE_KSVFS: kSFSReadFast: oiP > 0xFFFFC90060B46740 offset 0x2000000 dataBufP FFFFC9003643C908 nDesc 64 buf > 38033480000 valid words 64 dirty words 0 blkOff 0 > > 1.472420029 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate begin ul 0xFFFFC9005FC0CF98 holdCount 0 > ioType 0x2 inProg 0xC > > 1.472420187 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate ul 0xFFFFC9005FC0CF98 holdCount 0 ioType 0x2 > inProg 0xD err 0 > > 1.472420652 72170 TRACE_KSVFS: cxiUXfer: nDesc 64 1st > dataPtr 0x38033480000 plP 0xFFFF887F7B934320 toIOBuf 0 offset 0 len 6877965 > > 1.472420936 72170 TRACE_KSVFS: cxiUXfer: nDesc 0 > currOffset 0 tmpLen 262144 dataAddrP 0x38033480000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.472824790 72170 TRACE_KSVFS: cxiUXfer: nDesc 1 > currOffset 262144 tmpLen 262144 dataAddrP 0x380334C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.473243905 72170 TRACE_KSVFS: cxiUXfer: nDesc 2 > currOffset 524288 tmpLen 262144 dataAddrP 0x38024280000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > .... 
> > 1.482949347 72170 TRACE_KSVFS: cxiUXfer: nDesc 24 > currOffset 6291456 tmpLen 262144 dataAddrP 0x38025E80000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483354265 72170 TRACE_KSVFS: cxiUXfer: nDesc 25 > currOffset 6553600 tmpLen 262144 dataAddrP 0x38025EC0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483766631 72170 TRACE_KSVFS: cxiUXfer: nDesc 26 > currOffset 6815744 tmpLen 262144 dataAddrP 0x38003B00000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483943894 72170 TRACE_KSVFS: cxiDetachIOBuffer: > dataPtr 0x38033480000 plP 0xFFFF887F7B934320 > > 1.483944339 72170 TRACE_LOCK: unlock_vfs: type Data, > key 0000000000000004:000000001B1F24BF:0000000000000002 lock_mode have ro > token xw lock_state old [ ro:14 ] new [ ro:13 ] holdCount now 14 > > 1.483944683 72170 TRACE_BRL: brUnlockM: ofP > 0xFFFFC90069346B68 inode 455025855 snap 0 handle 0xFFFFC9003637D020 range > 0x168F30D-0x268F30C mode ro > > 1.483944985 72170 TRACE_KSVFS: kSFSReadFast exit: > uio_resid 0 err 0 > > 1.483945264 72170 TRACE_LOCK: unlock_vfs_m: type > Inode, key 305F105B9701E60A:000000001B1F24BF:0000000000000000 lock_mode > have ro status valid token rs lock_state old [ ro:25 ] new [ ro:24 ] > > 1.483945423 72170 TRACE_LOCK: unlock_vfs_m: cP > 0xFFFFC90069346B68 holdCount 25 > > 1.483945624 72170 TRACE_VNODE: gpfsRead exit: fast err > 0 > > 1.483946831 72170 TRACE_KSVFS: ReleSG: sli 38 sgP > 0xFFFFC90035E52F78 NotQuiesced vfsOp 2 > > 1.483946975 72170 TRACE_KSVFS: ReleSG: sli 38 sgP > 0xFFFFC90035E52F78 vfsOp 2 users 1-1 > > 1.483947116 72170 TRACE_KSVFS: ReleaseDaemonSegAndSG: > sli 38 count 2 needCleanup 0 > > 1.483947593 72170 TRACE_VNODE: gpfs_f_rdwr exit: fP > 0xFFFF882541649400 total_len 16777216 uio_resid 0 offset 0x268F30D rc 0 > > > ??????????????????????????????????????????? > > > > Regards, > Lohit > > On Sep 19, 2018, 3:11 PM -0400, Sven Oehme , wrote: > > the document primarily explains all performance specific knobs. general > advice would be to longer set anything beside workerthreads, pagepool and > filecache on 5.X systems as most other settings are no longer relevant > (thats a client side statement) . thats is true until you hit strange > workloads , which is why all the knobs are still there :-) > > sven > > > On Wed, Sep 19, 2018 at 11:17 AM wrote: > >> Thanks Sven. >> I will disable it completely and see how it behaves. >> >> Is this the presentation? >> >> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf >> >> I guess i read it, but it did not strike me at this situation. I will try >> to read it again and see if i could make use of it. >> >> Regards, >> Lohit >> >> On Sep 19, 2018, 2:12 PM -0400, Sven Oehme , wrote: >> >> seem like you never read my performance presentation from a few years ago >> ;-) >> >> you can control this on a per node basis , either for all i/o : >> >> prefetchAggressiveness = X >> >> or individual for reads or writes : >> >> prefetchAggressivenessRead = X >> prefetchAggressivenessWrite = X >> >> for a start i would turn it off completely via : >> >> mmchconfig prefetchAggressiveness=0 -I -N nodename >> >> that will turn it off only for that node and only until you restart the >> node. >> then see what happens >> >> sven >> >> >> On Wed, Sep 19, 2018 at 11:07 AM wrote: >> >>> Thank you Sven. >>> >>> I mostly think it could be 1. or some other issue. >>> I don?t think it could be 2. 
, because i can replicate this issue no >>> matter what is the size of the dataset. It happens for few files that could >>> easily fit in the page pool too. >>> >>> I do see a lot more page faults for 16M compared to 1M, so it could be >>> related to many threads trying to compete for the same buffer space. >>> >>> I will try to take the trace with trace=io option and see if can find >>> something. >>> >>> How do i turn of prefetching? Can i turn it off for a single >>> node/client? >>> >>> Regards, >>> Lohit >>> >>> On Sep 18, 2018, 5:23 PM -0400, Sven Oehme , wrote: >>> >>> Hi, >>> >>> taking a trace would tell for sure, but i suspect what you might be >>> hitting one or even multiple issues which have similar negative performance >>> impacts but different root causes. >>> >>> 1. this could be serialization around buffer locks. as larger your >>> blocksize gets as larger is the amount of data one of this pagepool buffers >>> will maintain, if there is a lot of concurrency on smaller amount of data >>> more threads potentially compete for the same buffer lock to copy stuff in >>> and out of a particular buffer, hence things go slower compared to the same >>> amount of data spread across more buffers, each of smaller size. >>> >>> 2. your data set is small'ish, lets say a couple of time bigger than the >>> pagepool and you random access it with multiple threads. what will happen >>> is that because it doesn't fit into the cache it will be read from the >>> backend. if multiple threads hit the same 16 mb block at once with multiple >>> 4k random reads, it will read the whole 16mb block because it thinks it >>> will benefit from it later on out of cache, but because it fully random the >>> same happens with the next block and the next and so on and before you get >>> back to this block it was pushed out of the cache because of lack of enough >>> pagepool. >>> >>> i could think of multiple other scenarios , which is why its so hard to >>> accurately benchmark an application because you will design a benchmark to >>> test an application, but it actually almost always behaves different then >>> you think it does :-) >>> >>> so best is to run the real application and see under which configuration >>> it works best. >>> >>> you could also take a trace with trace=io and then look at >>> >>> TRACE_VNOP: READ: >>> TRACE_VNOP: WRITE: >>> >>> and compare them to >>> >>> TRACE_IO: QIO: read >>> TRACE_IO: QIO: write >>> >>> and see if the numbers summed up for both are somewhat equal. if >>> TRACE_VNOP is significant smaller than TRACE_IO you most likely do more i/o >>> than you should and turning prefetching off might actually make things >>> faster . >>> >>> keep in mind i am no longer working for IBM so all i say might be >>> obsolete by now, i no longer have access to the one and only truth aka the >>> source code ... but if i am wrong i am sure somebody will point this out >>> soon ;-) >>> >>> sven >>> >>> >>> >>> >>> On Tue, Sep 18, 2018 at 10:31 AM wrote: >>> >>>> Hello All, >>>> >>>> This is a continuation to the previous discussion that i had with Sven. >>>> However against what i had mentioned previously - i realize that this >>>> is ?not? related to mmap, and i see it when doing random freads. >>>> >>>> I see that block-size of the filesystem matters when reading from Page >>>> pool. >>>> I see a major difference in performance when compared 1M to 16M, when >>>> doing lot of random small freads with all of the data in pagepool. >>>> >>>> Performance for 1M is a magnitude ?more? 
than the performance that i >>>> see for 16M. >>>> >>>> The GPFS that we have currently is : >>>> Version : 5.0.1-0.5 >>>> Filesystem version: 19.01 (5.0.1.0) >>>> Block-size : 16M >>>> >>>> I had made the filesystem block-size to be 16M, thinking that i would >>>> get the most performance for both random/sequential reads from 16M than the >>>> smaller block-sizes. >>>> With GPFS 5.0, i made use the 1024 sub-blocks instead of 32 and thus >>>> not loose lot of storage space even with 16M. >>>> I had run few benchmarks and i did see that 16M was performing better >>>> ?when hitting storage/disks? with respect to bandwidth for >>>> random/sequential on small/large reads. >>>> >>>> However, with this particular workload - where it freads a chunk of >>>> data randomly from hundreds of files -> I see that the number of >>>> page-faults increase with block-size and actually reduce the performance. >>>> 1M performs a lot better than 16M, and may be i will get better >>>> performance with less than 1M. >>>> It gives the best performance when reading from local disk, with 4K >>>> block size filesystem. >>>> >>>> What i mean by performance when it comes to this workload - is not the >>>> bandwidth but the amount of time that it takes to do each iteration/read >>>> batch of data. >>>> >>>> I figure what is happening is: >>>> fread is trying to read a full block size of 16M - which is good in a >>>> way, when it hits the hard disk. >>>> But the application could be using just a small part of that 16M. Thus >>>> when randomly reading(freads) lot of data of 16M chunk size - it is page >>>> faulting a lot more and causing the performance to drop . >>>> I could try to make the application do read instead of freads, but i >>>> fear that could be bad too since it might be hitting the disk with a very >>>> small block size and that is not good. >>>> >>>> With the way i see things now - >>>> I believe it could be best if the application does random reads of >>>> 4k/1M from pagepool but some how does 16M from rotating disks. >>>> >>>> I don?t see any way of doing the above other than following a different >>>> approach where i create a filesystem with a smaller block size ( 1M or less >>>> than 1M ), on SSDs as a tier. >>>> >>>> May i please ask for advise, if what i am understanding/seeing is right >>>> and the best solution possible for the above scenario. >>>> >>>> Regards, >>>> Lohit >>>> >>>> On Apr 11, 2018, 10:36 AM -0400, Lohit Valleru , >>>> wrote: >>>> >>>> Hey Sven, >>>> >>>> This is regarding mmap issues and GPFS. >>>> We had discussed previously of experimenting with GPFS 5. >>>> >>>> I now have upgraded all of compute nodes and NSD nodes to GPFS 5.0.0.2 >>>> >>>> I am yet to experiment with mmap performance, but before that - I am >>>> seeing weird hangs with GPFS 5 and I think it could be related to mmap. >>>> >>>> Have you seen GPFS ever hang on this syscall? >>>> [Tue Apr 10 04:20:13 2018] [] >>>> _ZN10gpfsNode_t8mmapLockEiiPKj+0xb5/0x140 [mmfs26] >>>> >>>> I see the above ,when kernel hangs and throws out a series of trace >>>> calls. >>>> >>>> I somehow think the above trace is related to processes hanging on GPFS >>>> forever. There are no errors in GPFS however. >>>> >>>> Also, I think the above happens only when the mmap threads go above a >>>> particular number. >>>> >>>> We had faced a similar issue in 4.2.3 and it was resolved in a patch to >>>> 4.2.3.2 . At that time , the issue happened when mmap threads go more than >>>> worker1threads. 
According to the ticket - it was a mmap race condition that >>>> GPFS was not handling well. >>>> >>>> I am not sure if this issue is a repeat and I am yet to isolate the >>>> incident and test with increasing number of mmap threads. >>>> >>>> I am not 100 percent sure if this is related to mmap yet but just >>>> wanted to ask you if you have seen anything like above. >>>> >>>> Thanks, >>>> >>>> Lohit >>>> >>>> On Feb 22, 2018, 3:59 PM -0500, Sven Oehme , wrote: >>>> >>>> Hi Lohit, >>>> >>>> i am working with ray on a mmap performance improvement right now, >>>> which most likely has the same root cause as yours , see --> >>>> http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html >>>> the thread above is silent after a couple of back and rorth, but ray >>>> and i have active communication in the background and will repost as soon >>>> as there is something new to share. >>>> i am happy to look at this issue after we finish with ray's workload if >>>> there is something missing, but first let's finish his, get you try the >>>> same fix and see if there is something missing. >>>> >>>> btw. if people would share their use of MMAP , what applications they >>>> use (home grown, just use lmdb which uses mmap under the cover, etc) please >>>> let me know so i get a better picture on how wide the usage is with GPFS. i >>>> know a lot of the ML/DL workloads are using it, but i would like to know >>>> what else is out there i might not think about. feel free to drop me a >>>> personal note, i might not reply to it right away, but eventually. >>>> >>>> thx. sven >>>> >>>> >>>> On Thu, Feb 22, 2018 at 12:33 PM wrote: >>>> >>>>> Hi all, >>>>> >>>>> I wanted to know, how does mmap interact with GPFS pagepool with >>>>> respect to filesystem block-size? >>>>> Does the efficiency depend on the mmap read size and the block-size of >>>>> the filesystem even if all the data is cached in pagepool? >>>>> >>>>> GPFS 4.2.3.2 and CentOS7. >>>>> >>>>> Here is what i observed: >>>>> >>>>> I was testing a user script that uses mmap to read from 100M to 500MB >>>>> files. >>>>> >>>>> The above files are stored on 3 different filesystems. >>>>> >>>>> Compute nodes - 10G pagepool and 5G seqdiscardthreshold. >>>>> >>>>> 1. 4M block size GPFS filesystem, with separate metadata and data. >>>>> Data on Near line and metadata on SSDs >>>>> 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the >>>>> required files fully cached" from the above GPFS cluster as home. Data and >>>>> Metadata together on SSDs >>>>> 3. 16M block size GPFS filesystem, with separate metadata and data. >>>>> Data on Near line and metadata on SSDs >>>>> >>>>> When i run the script first time for ?each" filesystem: >>>>> I see that GPFS reads from the files, and caches into the pagepool as >>>>> it reads, from mmdiag -- iohist >>>>> >>>>> When i run the second time, i see that there are no IO requests from >>>>> the compute node to GPFS NSD servers, which is expected since all the data >>>>> from the 3 filesystems is cached. >>>>> >>>>> However - the time taken for the script to run for the files in the 3 >>>>> different filesystems is different - although i know that they are just >>>>> "mmapping"/reading from pagepool/cache and not from disk. >>>>> >>>>> Here is the difference in time, for IO just from pagepool: >>>>> >>>>> 20s 4M block size >>>>> 15s 1M block size >>>>> 40S 16M block size. 
>>>>> >>>>> Why do i see a difference when trying to mmap reads from different >>>>> block-size filesystems, although i see that the IO requests are not hitting >>>>> disks and just the pagepool? >>>>> >>>>> I am willing to share the strace output and mmdiag outputs if needed. >>>>> >>>>> Thanks, >>>>> Lohit >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Oct 22 16:21:06 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 22 Oct 2018 15:21:06 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> Message-ID: <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> It seems like the primary way that this helps us is that we transfer user home directories and many of them have VERY large numbers of small files (in the millions), so running multiple simultaneous rsyncs allows the transfer to continue past that one slow area. I guess it balances the bandwidth constraint and the I/O constraints on generating a file list. There are unfortunately one or two known bugs that slow it down ? it keeps track of its rsync PIDs but sometimes a former rsync PID is reused by the system which it counts against the number of running rsyncs. It can also think rsync is still running at the end when it?s really something else now using the PID. I know the author is looking at that. For shorter transfers, you likely won?t run into this. I?m not sure I have the time or the programming ability to make this happen, but it seems to me that one could make some major gains by replacing fpart with mmfind in a GPFS environment. 
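As a rough illustration of that idea (untested sketch, paths invented, and the ' -- ' parsing assumes no file names contain that string): let the policy engine build the file list, then split it and feed the chunks to parallel rsyncs.

cat /tmp/list.pol
RULE 'xfer' LIST 'allfiles'

# write the list without executing anything against it
mmapplypolicy /gpfs/source -P /tmp/list.pol -I defer -f /tmp/xfer
# /tmp/xfer.list.allfiles holds 'inode gen snapid -- /full/path' records
sed -e 's/.* -- //' -e 's|^/gpfs/source/||' /tmp/xfer.list.allfiles | split -l 100000 - /tmp/chunk.
for c in /tmp/chunk.*; do
    rsync -a --files-from="$c" /gpfs/source/ /gpfs/target/ &
done
wait

In practice you would throttle the number of concurrent rsyncs rather than backgrounding one per chunk all at once.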
Generating lists of files takes a significant amount of time and mmfind can probably do it faster than anything else that does not have direct access to GPFS metadata. > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > Thank you Ryan. I?ll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski wrote: >> >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> We use parsyncfp. Our target is not GPFS, though. I was really hoping >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably >> tell you that HSM is the way to go (we asked something similar for a >> replacement for our current setup or for distributed storage). >> >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: >>> Hi, >>> >>> Just wondering what the best recipe for migrating a user?s home >>> directory content from one GFPS file system to another which hosts >>> a larger research GPFS file system? I?m currently using rsync and >>> it has maxed out the client system?s IB interface. >>> >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV >>> >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L >>> Dobbin Building | 4M409 T 709 864 6631 >>> _______________________________________________ gpfsug-discuss >>> mailing list gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> - -- >> ____ >> || \\UTGERS, |----------------------*O*------------------------ >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu >> || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus >> || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark >> `' >> -----BEGIN PGP SIGNATURE----- >> >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk >> =dMDg >> -----END PGP SIGNATURE----- >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Mon Oct 22 19:11:06 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 11:11:06 -0700 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: i am not sure if that was mentioned already but in some version of V5.0.X based on my suggestion a tool was added by mark on a AS-IS basis (thanks mark) to do what you want with one exception : /usr/lpp/mmfs/samples/ilm/mmxcp -h Usage: /usr/lpp/mmfs/samples/ilm/mmxcp -t target -p strip_count source_pathname1 source_pathname2 ... Run "cp" in a mmfind ... -xarg ... pipeline, e.g. 
mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight DIRECTORY_HASH -xargs mmxcp -t /target -p 2 Options: -t target_path : Copy files to this path. -p strip_count : Remove this many directory names from the pathnames of the source files. -a : pass -a to cp -v : pass -v to cp this is essentially a parallel copy tool using the policy with all its goddies. the one critical part thats missing is that it doesn't copy any GPFS specific metadata which unfortunate includes NFSV4 ACL's. the reason for that is that GPFS doesn't expose the NFSV4 ACl's via xattrs nor does any of the regular Linux tools uses the proprietary interface into GPFS to extract and apply them (this is what allows this magic unsupported version of rsync https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync to transfer the acls and other attributes). so a worth while RFE would be to either expose all special GPFS bits as xattrs or provide at least a maintained version of sync, cp or whatever which allows the transfer of this data. Sven On Mon, Oct 22, 2018 at 10:52 AM Ryan Novosielski wrote: > It seems like the primary way that this helps us is that we transfer user > home directories and many of them have VERY large numbers of small files > (in the millions), so running multiple simultaneous rsyncs allows the > transfer to continue past that one slow area. I guess it balances the > bandwidth constraint and the I/O constraints on generating a file list. > There are unfortunately one or two known bugs that slow it down ? it keeps > track of its rsync PIDs but sometimes a former rsync PID is reused by the > system which it counts against the number of running rsyncs. It can also > think rsync is still running at the end when it?s really something else now > using the PID. I know the author is looking at that. For shorter transfers, > you likely won?t run into this. > > I?m not sure I have the time or the programming ability to make this > happen, but it seems to me that one could make some major gains by > replacing fpart with mmfind in a GPFS environment. Generating lists of > files takes a significant amount of time and mmfind can probably do it > faster than anything else that does not have direct access to GPFS metadata. > > > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > > > Thank you Ryan. I?ll have a more in-depth look at this application later > today and see how it deals with some of the large genetic files that are > generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > > > Best, > > Dwayne > > ? > > Dwayne Hart | Systems Administrator IV > > > > CHIA, Faculty of Medicine > > Memorial University of Newfoundland > > 300 Prince Philip Drive > > St. John?s, Newfoundland | A1B 3V6 > > Craig L Dobbin Building | 4M409 > > T 709 864 6631 <(709)%20864-6631> > > > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski > wrote: > >> > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> We use parsyncfp. Our target is not GPFS, though. I was really hoping > >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably > >> tell you that HSM is the way to go (we asked something similar for a > >> replacement for our current setup or for distributed storage). > >> > >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: > >>> Hi, > >>> > >>> Just wondering what the best recipe for migrating a user?s home > >>> directory content from one GFPS file system to another which hosts > >>> a larger research GPFS file system? 
I?m currently using rsync and > >>> it has maxed out the client system?s IB interface. > >>> > >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV > >>> > >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 > >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L > >>> Dobbin Building | 4M409 T 709 864 6631 <(709)%20864-6631> > >>> _______________________________________________ gpfsug-discuss > >>> mailing list gpfsug-discuss at spectrumscale.org > >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >>> > >> > >> - -- > >> ____ > >> || \\UTGERS, |----------------------*O*------------------------ > >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > >> || \\ University | Sr. Technologist - 973/972.0922 <(973)%20972-0922> > ~*~ RBHS Campus > >> || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark > >> `' > >> -----BEGIN PGP SIGNATURE----- > >> > >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG > >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk > >> =dMDg > >> -----END PGP SIGNATURE----- > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Oct 22 21:08:49 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 22 Oct 2018 16:08:49 -0400 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca><92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu><3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Oct 22 21:15:52 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Oct 2018 20:15:52 +0000 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca><92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu><3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> , Message-ID: Can you use mmxcp with output from tsbuhelper? Becuase this would actually be a pretty good way of doing incrementals when deploying a new storage system (unless IBM wants to let us add new storage and change the block size.... Someday maybe...) Though until mmxcp supports ACLs, it's still not really a solution I guess. 
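In the meantime, one stopgap that might work is to replay the NFSv4 ACLs separately after the data copy with the stock mmgetacl/mmputacl commands. A rough, untested sketch (slow, since it forks two commands per file, and it only covers ACLs, not the other GPFS-specific attributes mentioned in this thread; source and target paths are invented):

cd /gpfs/source
find . -print | while IFS= read -r f; do
    mmgetacl -k nfs4 -o /tmp/acl.$$ "$f" && mmputacl -i /tmp/acl.$$ "/gpfs/target/$f"
done
rm -f /tmp/acl.$$

File names containing newlines will break the loop, so this is an illustration rather than a finished tool.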
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of makaplan at us.ibm.com [makaplan at us.ibm.com] Sent: 22 October 2018 21:08 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K From oehmes at gmail.com Mon Oct 22 21:33:17 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 13:33:17 -0700 Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: Marc, The issue with that is that you need multiple passes and things change in between; it also significantly increases migration times. You will always miss something or you need to manually correct.
The right thing is to have 1 tool that takes care of both, the bulk transfer and the additional attributes. Sven From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Monday, October 22, 2018 at 1:09 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Oct 23 00:45:05 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 22 Oct 2018 16:45:05 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> Message-ID: <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. The current agenda is: 8:45 AM 9:00 AM Coffee & Registration Presenter 9:00 AM 9:15 AM Welcome Amy Hirst & Chris Black 9:15 AM 9:45 AM What is new in IBM Spectrum Scale? Piyush Chaudhary 9:45 AM 10:00 AM What is new in ESS? John Sing 10:00 AM 10:20 AM How does CORAL help other workloads? Kevin Gildea 10:20 AM 10:40 AM Break 10:40 AM 11:00 AM Customer Talk ? The New York Genome Center Chris Black 11:00 AM 11:20 AM Spinning up a Hadoop cluster on demand Piyush Chaudhary 11:20 AM 11:40 AM Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione 11:40 AM 12:00 PM AI Reference Architecture Piyush Chaudhary 12:00 PM 12:50 PM Lunch 12:50 PM 1:30 PM Special Talk Joe Dain 1:30 PM 1:50 PM Multi-cloud Transparent Cloud Tiering Rob Basham 1:50 PM 2:10 PM Customer Talk ? Princeton University Curtis W. Hillegas 2:10 PM 2:30 PM Updates on Container Support John Lewars 2:30 PM 2:50 PM Customer Talk ? NYU Michael Costantino 2:50 PM 3:10 PM Spectrum Archive and TS1160 Carl Reasoner 3:10 PM 3:30 PM Break 3:30 PM 4:10 PM IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop 4:10 PM 4:40 PM Service Update Jim Doherty 4:40 PM 5:10 PM Open Forum 5:10 PM 5:30 PM Wrap-Up Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: > > For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. > > Spectrum Scale User Group ? 
NYC > October 24th, 2018 > The New York Genome Center > 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium > > Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 > > 08:45-09:00 Coffee & Registration > 09:00-09:15 Welcome > 09:15-09:45 What is new in IBM Spectrum Scale? > 09:45-10:00 What is new in ESS? > 10:00-10:20 How does CORAL help other workloads? > 10:20-10:40 --- Break --- > 10:40-11:00 Customer Talk ? The New York Genome Center > 11:00-11:20 Spinning up a Hadoop cluster on demand > 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine > 11:40-12:10 Spectrum Scale Network Flow > 12:10-13:00 --- Lunch --- > 13:00-13:40 Special Announcement and Demonstration > 13:40-14:00 Multi-cloud Transparent Cloud Tiering > 14:00-14:20 Customer Talk ? Princeton University > 14:20-14:40 AI Reference Architecture > 14:40-15:00 Updates on Container Support > 15:00-15:20 Customer Talk ? TBD > 15:20-15:40 --- Break --- > 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting > 16:10-16:40 Service Update > 16:40-17:10 Open Forum > 17:10-17:30 Wrap-Up > 17:30- Social Event > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Tue Oct 23 01:01:41 2018 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Tue, 23 Oct 2018 08:01:41 +0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 81, Issue 44 In-Reply-To: References: Message-ID: <8F05B8A3-B950-46E1-8711-2A5CC6D62BDA@pawsey.org.au> Hi So when we have migrated 1.6PB of data from one GPFS filesystems to another GPFS (over IB), we used dcp in github (with mmdsh). It just can be problematic to compile. I have used rsync with attrib and ACLs?s preserved in my previous job ? aka rsync -aAvz But DCP parallelises better, checksumming files and dirs. works and we used that to ensure nothing was lost. Worth a go! Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 13 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au On 23/10/18, 4:08 am, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Best way to migrate data (Ryan Novosielski) 2. Re: Best way to migrate data (Sven Oehme) 3. Re: Best way to migrate data : mmfind ... 
mmxcp (Marc A Kaplan) ---------------------------------------------------------------------- Message: 1 Date: Mon, 22 Oct 2018 15:21:06 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Message-ID: <3023B88F-D115-4C0B-90DC-6EF711D858E6 at rutgers.edu> Content-Type: text/plain; charset="utf-8" It seems like the primary way that this helps us is that we transfer user home directories and many of them have VERY large numbers of small files (in the millions), so running multiple simultaneous rsyncs allows the transfer to continue past that one slow area. I guess it balances the bandwidth constraint and the I/O constraints on generating a file list. There are unfortunately one or two known bugs that slow it down ? it keeps track of its rsync PIDs but sometimes a former rsync PID is reused by the system which it counts against the number of running rsyncs. It can also think rsync is still running at the end when it?s really something else now using the PID. I know the author is looking at that. For shorter transfers, you likely won?t run into this. I?m not sure I have the time or the programming ability to make this happen, but it seems to me that one could make some major gains by replacing fpart with mmfind in a GPFS environment. Generating lists of files takes a significant amount of time and mmfind can probably do it faster than anything else that does not have direct access to GPFS metadata. > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > Thank you Ryan. I?ll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski wrote: >> >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> We use parsyncfp. Our target is not GPFS, though. I was really hoping >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably >> tell you that HSM is the way to go (we asked something similar for a >> replacement for our current setup or for distributed storage). >> >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: >>> Hi, >>> >>> Just wondering what the best recipe for migrating a user?s home >>> directory content from one GFPS file system to another which hosts >>> a larger research GPFS file system? I?m currently using rsync and >>> it has maxed out the client system?s IB interface. >>> >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV >>> >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L >>> Dobbin Building | 4M409 T 709 864 6631 >>> _______________________________________________ gpfsug-discuss >>> mailing list gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> - -- >> ____ >> || \\UTGERS, |----------------------*O*------------------------ >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu >> || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus >> || \\ of NJ | Office of Advanced Res. Comp. 
- MSB C630, Newark >> `' >> -----BEGIN PGP SIGNATURE----- >> >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk >> =dMDg >> -----END PGP SIGNATURE----- >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Mon, 22 Oct 2018 11:11:06 -0700 From: Sven Oehme To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Message-ID: Content-Type: text/plain; charset="utf-8" i am not sure if that was mentioned already but in some version of V5.0.X based on my suggestion a tool was added by mark on a AS-IS basis (thanks mark) to do what you want with one exception : /usr/lpp/mmfs/samples/ilm/mmxcp -h Usage: /usr/lpp/mmfs/samples/ilm/mmxcp -t target -p strip_count source_pathname1 source_pathname2 ... Run "cp" in a mmfind ... -xarg ... pipeline, e.g. mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight DIRECTORY_HASH -xargs mmxcp -t /target -p 2 Options: -t target_path : Copy files to this path. -p strip_count : Remove this many directory names from the pathnames of the source files. -a : pass -a to cp -v : pass -v to cp this is essentially a parallel copy tool using the policy with all its goddies. the one critical part thats missing is that it doesn't copy any GPFS specific metadata which unfortunate includes NFSV4 ACL's. the reason for that is that GPFS doesn't expose the NFSV4 ACl's via xattrs nor does any of the regular Linux tools uses the proprietary interface into GPFS to extract and apply them (this is what allows this magic unsupported version of rsync https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync to transfer the acls and other attributes). so a worth while RFE would be to either expose all special GPFS bits as xattrs or provide at least a maintained version of sync, cp or whatever which allows the transfer of this data. Sven On Mon, Oct 22, 2018 at 10:52 AM Ryan Novosielski wrote: > It seems like the primary way that this helps us is that we transfer user > home directories and many of them have VERY large numbers of small files > (in the millions), so running multiple simultaneous rsyncs allows the > transfer to continue past that one slow area. I guess it balances the > bandwidth constraint and the I/O constraints on generating a file list. > There are unfortunately one or two known bugs that slow it down ? it keeps > track of its rsync PIDs but sometimes a former rsync PID is reused by the > system which it counts against the number of running rsyncs. It can also > think rsync is still running at the end when it?s really something else now > using the PID. I know the author is looking at that. For shorter transfers, > you likely won?t run into this. > > I?m not sure I have the time or the programming ability to make this > happen, but it seems to me that one could make some major gains by > replacing fpart with mmfind in a GPFS environment. Generating lists of > files takes a significant amount of time and mmfind can probably do it > faster than anything else that does not have direct access to GPFS metadata. > > > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > > > Thank you Ryan. 
I?ll have a more in-depth look at this application later > today and see how it deals with some of the large genetic files that are > generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > > > Best, > > Dwayne > > ? > > Dwayne Hart | Systems Administrator IV > > > > CHIA, Faculty of Medicine > > Memorial University of Newfoundland > > 300 Prince Philip Drive > > St. John?s, Newfoundland | A1B 3V6 > > Craig L Dobbin Building | 4M409 > > T 709 864 6631 <(709)%20864-6631> > > > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski > wrote: > >> > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> We use parsyncfp. Our target is not GPFS, though. I was really hoping > >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably > >> tell you that HSM is the way to go (we asked something similar for a > >> replacement for our current setup or for distributed storage). > >> > >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: > >>> Hi, > >>> > >>> Just wondering what the best recipe for migrating a user?s home > >>> directory content from one GFPS file system to another which hosts > >>> a larger research GPFS file system? I?m currently using rsync and > >>> it has maxed out the client system?s IB interface. > >>> > >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV > >>> > >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 > >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L > >>> Dobbin Building | 4M409 T 709 864 6631 <(709)%20864-6631> > >>> _______________________________________________ gpfsug-discuss > >>> mailing list gpfsug-discuss at spectrumscale.org > >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >>> > >> > >> - -- > >> ____ > >> || \\UTGERS, |----------------------*O*------------------------ > >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > >> || \\ University | Sr. Technologist - 973/972.0922 <(973)%20972-0922> > ~*~ RBHS Campus > >> || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark > >> `' > >> -----BEGIN PGP SIGNATURE----- > >> > >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG > >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk > >> =dMDg > >> -----END PGP SIGNATURE----- > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Mon, 22 Oct 2018 16:08:49 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Message-ID: Content-Type: text/plain; charset="us-ascii" Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... 
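Until someone writes that utility, a rough stopgap for the ACL part (an untested sketch using the existing mmgetacl/mmputacl commands rather than the backup/restore API proposed above, and ignoring other GPFS extended attributes) could be run after the data itself has been copied:

# copy_nfs4_acls.sh SRCDIR DSTDIR -- re-apply NFSv4 ACLs that cp -a / plain rsync drop
# Untested sketch: assumes both trees are GPFS and already contain the same relative paths
SRC="$1"; DST="$2"
cd "$SRC" || exit 1
find . -print | while read -r f; do
    tmp=$(mktemp)
    # dump the NFSv4 ACL of the source object ...
    mmgetacl -k nfs4 -o "$tmp" "$SRC/$f" && \
        mmputacl -i "$tmp" "$DST/$f"      # ... and apply it to the already-copied target
    rm -f "$tmp"
done

(No attempt is made here to handle filenames containing newlines, DMAPI attributes or user-defined extended attributes; the API-based utility described above would cover those properly.)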
Avoids boiling the ocean and reinventing or hacking rsync. -- marc K -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 81, Issue 44 ********************************************** From Alexander.Saupp at de.ibm.com Tue Oct 23 06:51:54 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Tue, 23 Oct 2018 07:51:54 +0200 Subject: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync Message-ID: Hi, I agree, a tool with proper wrapping delivered in samples would be the right approach. No warranty, no support - below a prototype I documented 2 years ago (prior to mmfind availability). The BP used an alternate approach, so its not tested at scale, but the principle was tested and works. Reading through it right now I'd re-test the 'deleted files on destination that were deleted on the source' scenario, that might now require some fixing. # Use 'GPFS patched' rsync on both ends to keep GPFS attributes https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync # Policy - initial & differential (add mod_time > .. for incremental runs. Use MOD_TIME < .. to have a defined start for the next incremental rsync, remove it for the 'final' rsync) # http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_usngfileattrbts.htm cat /tmp/policy.pol RULE 'mmfind' ??? LIST 'mmfindList' ??? DIRECTORIES_PLUS ??? SHOW( ????????? VARCHAR(MODE) || ' ' || ????????? VARCHAR(NLINK) || ' ' || ????????? VARCHAR(USER_ID) || ' ' || ????????? VARCHAR(GROUP_ID) || ' ' || ????????? VARCHAR(FILE_SIZE) || ' ' || ????????? VARCHAR(KB_ALLOCATED) || ' ' || ????????? VARCHAR(POOL_NAME) || ' ' || ????????? VARCHAR(MISC_ATTRIBUTES) || ' ' || ????????? VARCHAR(ACCESS_TIME) || ' ' || ????????? VARCHAR(CREATION_TIME) || ' ' || ????????? VARCHAR(MODIFICATION_TIME) ??????? ) # First run ??? WHERE MODIFICATION_TIME < TIMESTAMP('2016-08-10 00:00:00') # Incremental runs ??? WHERE MODIFICATION_TIME > TIMESTAMP('2016-08-10 00:00:00') and MODIFICATION_TIME < TIMESTAMP('2016-08-20 00:00:00') # Final run during maintenance, should also do deletes, ensure you to call rsync the proper way (--delete) ??? WHERE TRUE # Apply policy, defer will ensure the result file(s) are not deleted mmapplypolicy? group3fs -P /tmp/policy.pol? -f /ibm/group3fs/pol.txt -I defer # FYI only - look at results, ... not required # cat /ibm/group3fs/pol.txt.list.mmfindList 3 1 0? drwxr-xr-x 4 0 0 262144 512 system D2u 2016-08-25 08:30:35.053057 -- /ibm/group3fs 41472 1077291531 0? drwxr-xr-x 5 0 0 4096 0 system D2u 2016-08-18 21:07:36.996777 -- /ibm/group3fs/ces 60416 842873924 0? drwxr-xr-x 4 0 0 4096 0 system D2u 2016-08-18 21:07:45.947920 -- /ibm/group3fs/ces/ha 60417 2062486126 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-19 15:17:57.428922 -- /ibm/group3fs/ces/ha/.dummy 60418 436745294 0? drwxr-xr-x 4 0 0 4096 0 system D2u 2016-08-18 21:05:54.482094 -- /ibm/group3fs/ces/ces 60419 647668346 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-19 15:17:57.484923 -- /ibm/group3fs/ces/ces/.dummy 60420 1474765985 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-18 21:06:43.133640 -- /ibm/group3fs/ces/ces/addrs/1471554403-node0-9.155.118.69 60421 1020724013 0? 
drwxr-xr-x 2 0 0 4096 0 system D2um 2016-08-18 21:07:37.000695 -- /ibm/group3fs/ces/ganesha cat /ibm/group3fs/pol.txt.list.mmfindList? |awk ' { print $19}' /ibm/group3fs/ces/ha/.dummy /ibm/group3fs/ces/ces/.dummy /ibm/group3fs/ces/ha/nfs/ganesha/v4recov/node3 /ibm/group3fs/ces/ha/nfs/ganesha/v4old/node3 /ibm/group3fs/pol.txt.list.mmfindList /ibm/group3fs/ces/ces/connections /ibm/group3fs/ces/ha/nfs/ganesha/gpfs-epoch /ibm/group3fs/ces/ha/nfs/ganesha/v4recov /ibm/group3fs/ces/ha/nfs/ganesha/v4old # Start rsync - could split up single result file into multiple ones for parallel / multi node runs rsync -av --gpfs-attrs --progress --files-from $ ( cat /ibm/group3fs/pol.txt.list.mmfindList ) 10.10.10.10:/path Be sure you verify that extended attributes are properly replicated. I have in mind that you need to ensure the 'remote' rsync is not the default one, but the one with GPFS capabilities (rsync -e "remoteshell"). Kind regards, Alex Saupp Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C800025.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Oct 23 09:31:03 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 23 Oct 2018 08:31:03 +0000 Subject: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync In-Reply-To: References: Message-ID: I should note, there is a PR there which adds symlink support as well to the patched rsync version ? It is quite an old version of rsync now, and I don?t know if it?s been tested with a newer release. Simon From: on behalf of "Alexander.Saupp at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 23 October 2018 at 06:52 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync # Use 'GPFS patched' rsync on both ends to keep GPFS attributes https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Wed Oct 24 13:43:23 2018 From: george at markomanolis.com (George Markomanolis) Date: Wed, 24 Oct 2018 08:43:23 -0400 Subject: [gpfsug-discuss] IO500 - Call for Submission for SC18 Message-ID: Dear all, Please consider the submission of results to the new list. Deadline: 10 November 2018 AoE The IO500 is now accepting and encouraging submissions for the upcoming IO500 list revealed at Supercomputing 2018 in Dallas, Texas. We also announce the 10 compute node I/O challenge to encourage submission of small-scale results. The new ranked lists will be announced at our SC18 BOF on Wednesday, November 14th at 5:15pm. We hope to see you, and your results, there. 
The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC 2018! Please note that submissions of all size are welcome; the site has customizable sorting so it is possible to submit on a small system and still get a very good per-client score for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017 and published its first list at SC17. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: Maximizing simplicity in running the benchmark suite Encouraging complexity in tuning for performance Allowing submitters to highlight their ?hero run? performance numbers Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured, however, possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound. Finally, it includes a namespace search as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: Gather historical data for the sake of analysis and to aid predictions of storage futures Collect tuning information to share valuable performance optimizations across the community Encourage vendors and designers to optimize for workloads beyond ?hero runs? Establish bounded expectations for users, procurers, and administrators 10 Compute Node I/O Challenge At SC, we will announce another IO-500 award for the 10 Compute Node I/O Challenge. This challenge is conducted using the regular IO-500 benchmark, however, with the rule that exactly 10 computes nodes must be used to run the benchmark (one exception is find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. When submitting for the IO-500 list, you can opt-in for ?Participate in the 10 compute node challenge only?, then we won't include the results into the ranked list. Other 10 compute node submission will be included in the full list and in the ranked list. We will announce the result in a separate derived list and in the full list but not on the ranked IO-500 list at io500.org. Birds-of-a-feather Once again, we encourage you to submit [1], to join our community, and to attend our BoF ?The IO-500 and the Virtual Institute of I/O? at SC 2018 [2] where we will announce the third ever IO500 list. The current list includes results from BeeGPFS, DataWarp, IME, Lustre, and Spectrum Scale. We hope that the next list has even more. We look forward to answering any questions or concerns you might have. 
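For anyone who has not run the suite before, the hero-run phases reduce to IOR and mdtest invocations along these lines (illustrative only -- the process counts, sizes and paths below are invented, and a valid submission must be produced with the official io500 harness and its prescribed parameters):

# "hero" bandwidth run: file-per-process sequential write then read
mpirun -np 16 ior -a POSIX -w -r -F -t 1m -b 8g -o /gpfs/fs1/io500/ior_easy/testfile

# "hero" metadata run: each rank creates, stats and removes its own files
mpirun -np 16 mdtest -n 25000 -F -u -d /gpfs/fs1/io500/mdt_easy

The prescribed ("hard") IOR/mdtest phases use fixed, deliberately unfriendly parameters, and the namespace search phase is a parallel find across the files the other phases created.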
[1] http://io500.org/submission [2] https://sc18.supercomputing.org/presentation/?id=bof134&sess=sess390 -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 24 21:53:21 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 24 Oct 2018 20:53:21 +0000 Subject: [gpfsug-discuss] Spectrum Scale User Group@CIUK - call for user speakers Message-ID: Hi All, I know December is a little way off, but as usual we'll be holding a Spectrum Scale user group breakout session as part of CIUK here in the UK in December. As a breakout session its only a couple of hours... We're just looking at the agenda, I have a couple of IBM sessions in and Sven has agreed to give a talk as he'll be there as well. I'm looking for a couple of user talks to finish of the agenda. Whether you are a small deployment or large, we're interested in hearing from you! Note: you must be registered to attend CIUK to attend this user group. Registration is via the CIUK website: https://www.scd.stfc.ac.uk/Pages/CIUK2018.aspxhttps://www.scd.stfc.ac.uk/Pages/CIUK2018.aspx Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.dietrich at desy.de Thu Oct 25 13:12:07 2018 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Thu, 25 Oct 2018 14:12:07 +0200 (CEST) Subject: [gpfsug-discuss] Nested NFSv4 Exports Message-ID: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> Hi, I am currently fiddling around with some nested NFSv4 exports and the differing behaviour to NFSv3. The environment is a GPFS 5.0.1 with enabled CES, so Ganesha is used as the NFS server. Given the following (pseudo) directory structure: /gpfs/filesystem1/directory1 /gpfs/filesystem1/directory1/sub-directory1 /gpfs/filesystem1/directory1/sub-directory2 Now to the exports: /gpfs/filesystem1/directory1 is exported to client1 as read-only. /gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as read-write. client2 is not included in the export for /gpfs/filesystem1/directory1. Mounting /gpfs/filesystem1/directory1 on client1 works as expected. Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work and results in a permission denied. If I change the protocol from NFSv4 to NFSv3, it works. There is a section about nested NFS exports in the mmnfs doc: Creating nested exports (such as /path/to/folder and /path/to/folder/subfolder) is strongly discouraged since this might lead to serious issues in data consistency. Be very cautious when creating and using nested exports. If there is a need to have nested exports (such as /path/to/folder and /path/to/folder/inside/subfolder), NFSv4 client that mounts the parent (/path/to/folder) export will not be able to see the child export subtree (/path/to/folder/inside/subfolder) unless the same client is explicitly allowed to access the child export as well. This is okay as long as the client uses only NFSv4 mounts. The Linux kernel NFS server and other NFSv4 servers do not show this behaviour. Is there a way to bypass this with CES/Ganesha? Or is the only solution to add client2 to /gpfs/filesystem1/directory1? Regards, Stefan -- ------------------------------------------------------------------------ Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems) Ein Forschungszentrum der Helmholtz-Gemeinschaft Notkestr. 
85 phone: +49-40-8998-4696 22607 Hamburg e-mail: stefan.dietrich at desy.de Germany ------------------------------------------------------------------------ From dyoung at pixitmedia.com Thu Oct 25 17:59:08 2018 From: dyoung at pixitmedia.com (Dan Young) Date: Thu, 25 Oct 2018 12:59:08 -0400 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_C?= =?utf-8?q?enter?= In-Reply-To: <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want > to attend, use the link below. > > *The current agenda is:* > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > > 10:20 AM > 10:40 AM > Break > > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > > 12:00 PM > 12:50 PM > Lunch > > 12:50 PM > 1:30 PM > Special Talk Joe Dain > > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > > 2:30 PM > 2:50 PM > Customer Talk ? NYU Michael Costantino > > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > > 3:10 PM > 3:30 PM > Break > > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe > Knop > > 4:10 PM > 4:40 PM > Service Update Jim Doherty > > 4:40 PM > 5:10 PM > Open Forum > > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert < > Robert.Oesterlin at nuance.com> wrote: > > For those of you in the NE US or NYC area, here is the agenda for the NYC > meeting coming up on October 24th. Special thanks to Richard Rupp at IBM > for helping to organize this event. If you can make it, please register at > the Eventbrite link below. > > Spectrum Scale User Group ? NYC > October 24th, 2018 > The New York Genome Center > 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium > > Register Here: > https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 > > 08:45-09:00 Coffee & Registration > 09:00-09:15 Welcome > 09:15-09:45 What is new in IBM Spectrum Scale? > 09:45-10:00 What is new in ESS? > 10:00-10:20 How does CORAL help other workloads? > 10:20-10:40 --- Break --- > 10:40-11:00 Customer Talk ? The New York Genome Center > 11:00-11:20 Spinning up a Hadoop cluster on demand > 11:20-11:40 Customer Talk ? Mt. 
Sinai School of Medicine > 11:40-12:10 Spectrum Scale Network Flow > 12:10-13:00 --- Lunch --- > 13:00-13:40 Special Announcement and Demonstration > 13:40-14:00 Multi-cloud Transparent Cloud Tiering > 14:00-14:20 Customer Talk ? Princeton University > 14:20-14:40 AI Reference Architecture > 14:40-15:00 Updates on Container Support > 15:00-15:20 Customer Talk ? TBD > 15:20-15:40 --- Break --- > 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting > 16:10-16:40 Service Update > 16:40-17:10 Open Forum > 17:10-17:30 Wrap-Up > 17:30- Social Event > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Dan Young* Solutions Architect, Pixit Media +1-347-249-7413 | dyoung at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Oct 25 18:01:39 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 25 Oct 2018 10:01:39 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Checking? -Kristy > On Oct 25, 2018, at 9:59 AM, Dan Young wrote: > > Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. > > On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose > wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. > > The current agenda is: > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > 10:20 AM > 10:40 AM > Break > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > 12:00 PM > 12:50 PM > Lunch > 12:50 PM > 1:30 PM > Special Talk Joe Dain > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > 2:30 PM > 2:50 PM > Customer Talk ? 
NYU Michael Costantino > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > 3:10 PM > 3:30 PM > Break > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop > 4:10 PM > 4:40 PM > Service Update Jim Doherty > 4:40 PM > 5:10 PM > Open Forum > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > >> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert > wrote: >> >> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >> >> Spectrum Scale User Group ? NYC >> October 24th, 2018 >> The New York Genome Center >> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >> >> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >> >> 08:45-09:00 Coffee & Registration >> 09:00-09:15 Welcome >> 09:15-09:45 What is new in IBM Spectrum Scale? >> 09:45-10:00 What is new in ESS? >> 10:00-10:20 How does CORAL help other workloads? >> 10:20-10:40 --- Break --- >> 10:40-11:00 Customer Talk ? The New York Genome Center >> 11:00-11:20 Spinning up a Hadoop cluster on demand >> 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine >> 11:40-12:10 Spectrum Scale Network Flow >> 12:10-13:00 --- Lunch --- >> 13:00-13:40 Special Announcement and Demonstration >> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >> 14:00-14:20 Customer Talk ? Princeton University >> 14:20-14:40 AI Reference Architecture >> 14:40-15:00 Updates on Container Support >> 15:00-15:20 Customer Talk ? TBD >> 15:20-15:40 --- Break --- >> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >> 16:10-16:40 Service Update >> 16:40-17:10 Open Forum >> 17:10-17:30 Wrap-Up >> 17:30- Social Event >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Dan Young > Solutions Architect, Pixit Media > +1-347-249-7413 | dyoung at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia > > This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From novosirj at rutgers.edu Fri Oct 26 01:54:13 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 26 Oct 2018 00:54:13 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: What they said was ?spectrumscale.org?. I suspect they?ll wind up here: http://www.spectrumscaleug.org/presentations/ -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Oct 25, 2018, at 12:59 PM, Dan Young wrote: > > Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. > > On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. > > The current agenda is: > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > 10:20 AM > 10:40 AM > Break > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > 12:00 PM > 12:50 PM > Lunch > 12:50 PM > 1:30 PM > Special Talk Joe Dain > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > 2:30 PM > 2:50 PM > Customer Talk ? NYU Michael Costantino > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > 3:10 PM > 3:30 PM > Break > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop > 4:10 PM > 4:40 PM > Service Update Jim Doherty > 4:40 PM > 5:10 PM > Open Forum > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > >> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: >> >> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >> >> Spectrum Scale User Group ? NYC >> October 24th, 2018 >> The New York Genome Center >> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >> >> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >> >> 08:45-09:00 Coffee & Registration >> 09:00-09:15 Welcome >> 09:15-09:45 What is new in IBM Spectrum Scale? >> 09:45-10:00 What is new in ESS? >> 10:00-10:20 How does CORAL help other workloads? >> 10:20-10:40 --- Break --- >> 10:40-11:00 Customer Talk ? The New York Genome Center >> 11:00-11:20 Spinning up a Hadoop cluster on demand >> 11:20-11:40 Customer Talk ? 
Mt. Sinai School of Medicine >> 11:40-12:10 Spectrum Scale Network Flow >> 12:10-13:00 --- Lunch --- >> 13:00-13:40 Special Announcement and Demonstration >> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >> 14:00-14:20 Customer Talk ? Princeton University >> 14:20-14:40 AI Reference Architecture >> 14:40-15:00 Updates on Container Support >> 15:00-15:20 Customer Talk ? TBD >> 15:20-15:40 --- Break --- >> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >> 16:10-16:40 Service Update >> 16:40-17:10 Open Forum >> 17:10-17:30 Wrap-Up >> 17:30- Social Event >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Dan Young > Solutions Architect, Pixit Media > +1-347-249-7413 | dyoung at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia > > > This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kkr at lbl.gov Fri Oct 26 04:36:50 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 25 Oct 2018 20:36:50 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Yup. Richard is collecting them and we will upload afterwards. Sent from my iPhone > On Oct 25, 2018, at 5:54 PM, Ryan Novosielski wrote: > > What they said was ?spectrumscale.org?. I suspect they?ll wind up here: http://www.spectrumscaleug.org/presentations/ > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > >> On Oct 25, 2018, at 12:59 PM, Dan Young wrote: >> >> Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. >> >> On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: >> There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. >> >> The current agenda is: >> >> 8:45 AM >> 9:00 AM >> Coffee & Registration Presenter >> 9:00 AM >> 9:15 AM >> Welcome Amy Hirst & Chris Black >> 9:15 AM >> 9:45 AM >> What is new in IBM Spectrum Scale? Piyush Chaudhary >> 9:45 AM >> 10:00 AM >> What is new in ESS? 
John Sing >> 10:00 AM >> 10:20 AM >> How does CORAL help other workloads? Kevin Gildea >> 10:20 AM >> 10:40 AM >> Break >> 10:40 AM >> 11:00 AM >> Customer Talk ? The New York Genome Center Chris Black >> 11:00 AM >> 11:20 AM >> Spinning up a Hadoop cluster on demand Piyush Chaudhary >> 11:20 AM >> 11:40 AM >> Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione >> 11:40 AM >> 12:00 PM >> AI Reference Architecture Piyush Chaudhary >> 12:00 PM >> 12:50 PM >> Lunch >> 12:50 PM >> 1:30 PM >> Special Talk Joe Dain >> 1:30 PM >> 1:50 PM >> Multi-cloud Transparent Cloud Tiering Rob Basham >> 1:50 PM >> 2:10 PM >> Customer Talk ? Princeton University Curtis W. Hillegas >> 2:10 PM >> 2:30 PM >> Updates on Container Support John Lewars >> 2:30 PM >> 2:50 PM >> Customer Talk ? NYU Michael Costantino >> 2:50 PM >> 3:10 PM >> Spectrum Archive and TS1160 Carl Reasoner >> 3:10 PM >> 3:30 PM >> Break >> 3:30 PM >> 4:10 PM >> IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop >> 4:10 PM >> 4:40 PM >> Service Update Jim Doherty >> 4:40 PM >> 5:10 PM >> Open Forum >> 5:10 PM >> 5:30 PM >> Wrap-Up >> Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) >> >> >>> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: >>> >>> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >>> >>> Spectrum Scale User Group ? NYC >>> October 24th, 2018 >>> The New York Genome Center >>> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >>> >>> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >>> >>> 08:45-09:00 Coffee & Registration >>> 09:00-09:15 Welcome >>> 09:15-09:45 What is new in IBM Spectrum Scale? >>> 09:45-10:00 What is new in ESS? >>> 10:00-10:20 How does CORAL help other workloads? >>> 10:20-10:40 --- Break --- >>> 10:40-11:00 Customer Talk ? The New York Genome Center >>> 11:00-11:20 Spinning up a Hadoop cluster on demand >>> 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine >>> 11:40-12:10 Spectrum Scale Network Flow >>> 12:10-13:00 --- Lunch --- >>> 13:00-13:40 Special Announcement and Demonstration >>> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >>> 14:00-14:20 Customer Talk ? Princeton University >>> 14:20-14:40 AI Reference Architecture >>> 14:40-15:00 Updates on Container Support >>> 15:00-15:20 Customer Talk ? TBD >>> 15:20-15:40 --- Break --- >>> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >>> 16:10-16:40 Service Update >>> 16:40-17:10 Open Forum >>> 17:10-17:30 Wrap-Up >>> 17:30- Social Event >>> >>> >>> Bob Oesterlin >>> Sr Principal Storage Engineer, Nuance >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> -- >> >> Dan Young >> Solutions Architect, Pixit Media >> +1-347-249-7413 | dyoung at pixitmedia.com >> www.pixitmedia.com | Tw:@pixitmedia >> >> >> This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. 
If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mnaineni at in.ibm.com Fri Oct 26 06:09:45 2018 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 26 Oct 2018 05:09:45 +0000 Subject: [gpfsug-discuss] Nested NFSv4 Exports In-Reply-To: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> References: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> Message-ID: An HTML attachment was scrubbed... URL: From stefan.dietrich at desy.de Fri Oct 26 12:18:20 2018 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Fri, 26 Oct 2018 13:18:20 +0200 (CEST) Subject: [gpfsug-discuss] Nested NFSv4 Exports In-Reply-To: References: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> Message-ID: <2127020802.32763936.1540552700548.JavaMail.zimbra@desy.de> Hi Malhal, thanks for the input. I did already run Ganesha in debug mode, maybe this snippet I saved from that time might be helpful: 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :Check for address 192.168.142.92 for export id 3 fullpath /gpfs/exfel/d/proc 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] client_match :EXPORT :M_DBG :Match 0x941550, type = HOSTIF_CLIENT, options 0x42302050 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] LogClientListEntry :EXPORT :M_DBG : 0x941550 HOSTIF_CLIENT: 192.168.8.32 (root_squash , R-r-, 34-, ---, TCP, ----, M anage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] client_match :EXPORT :M_DBG :Match 0x940c90, type = HOSTIF_CLIENT, options 0x42302050 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] LogClientListEntry :EXPORT :M_DBG : 0x940c90 HOSTIF_CLIENT: 192.168.8.33 (root_squash , R-r-, 34-, ---, TCP, ----, M anage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :EXPORT ( , , , , , -- Dele g, , ) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (root_squash , ----, 34-, ---, TCP, ----, No Manage_Gids, , anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :default options (root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Dele g, anon_uid= -2, anon_gid= -2, none, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :Final options (root_squash , ----, 
34-, ---, TCP, ----, No Manage_Gids, -- Dele g, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] nfs4_export_check_access :NFS4 :INFO :NFS4: INFO: Access not allowed on Export_Id 3 /gpfs/exfel/d/proc for client ::fff f:192.168.142.92 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] nfs4_op_lookup :EXPORT :DEBUG :NFS4ERR_ACCESS Hiding Export_Id 3 Path /gpfs/exfel/d/proc with NFS4ERR_NOENT 192.168.142.92 would be the client2 from my pseudo example, /gpfs/exfel/d/proc resembles /gpfs/filesystem1/directory1 Ganesha never checks anything for /gpfs/filesystem1/directory1/sub-directory1...or rather a subdir of /gpfs/exfel/d/proc Is this what you meant by looking at the real export object? If you think this is a bug, I would open a case in order to get this analyzed. mmnfs does not show me any pseudo options, I think this has been included in 5.0.2. Regards, Stefan ----- Original Message ----- > From: "Malahal R Naineni" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Sent: Friday, October 26, 2018 7:09:45 AM > Subject: Re: [gpfsug-discuss] Nested NFSv4 Exports >>> /gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as >>> read-write. >>> client2 is not included in the export for /gpfs/filesystem1/directory1. >>> Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work >>> and results in a permission denied > Any NFSv4 implementation needs to traverse the pseudo path for being able to > mount an export. One would expect "client2" to traverse over > /gpfs/filesystem1/directory1/ but not list its content/other files. I strongly > think this is a bug in Ganesha implementation, it is probably looking at the > real-export object than the pseudo-object for permission checking. > One option is to change the Pseudo file system layout. For example, > "/gpfs/client2" as "Pseudo" option for export with path " > /gpfs/filesystem1/directory1/sub-directory1". This is directly not possible > with Spectrum CLI command mmnfs unless you are using the latest and greatest > ("mmnfs export add" usage would show if it supports Pseudo option). Of course, > you can manually do it (using CCR) as Ganesha itself allows it. > Yes, NFSv3 has no pseudo traversal, it should work. > Regards, Malahal. > > > ----- Original message ----- > From: "Dietrich, Stefan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [gpfsug-discuss] Nested NFSv4 Exports > Date: Thu, Oct 25, 2018 5:52 PM > Hi, > > I am currently fiddling around with some nested NFSv4 exports and the differing > behaviour to NFSv3. > The environment is a GPFS 5.0.1 with enabled CES, so Ganesha is used as the NFS > server. > > Given the following (pseudo) directory structure: > > /gpfs/filesystem1/directory1 > /gpfs/filesystem1/directory1/sub-directory1 > /gpfs/filesystem1/directory1/sub-directory2 > > Now to the exports: > /gpfs/filesystem1/directory1 is exported to client1 as read-only. > /gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as > read-write. > > client2 is not included in the export for /gpfs/filesystem1/directory1. > > Mounting /gpfs/filesystem1/directory1 on client1 works as expected. > Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work > and results in a permission denied. > If I change the protocol from NFSv4 to NFSv3, it works. 
> > There is a section about nested NFS exports in the mmnfs doc: > Creating nested exports (such as /path/to/folder and /path/to/folder/subfolder) > is strongly discouraged since this might lead to serious issues in data > consistency. Be very cautious when creating and using nested exports. > If there is a need to have nested exports (such as /path/to/folder and > /path/to/folder/inside/subfolder), NFSv4 client that mounts the parent > (/path/to/folder) export will not be able to see the child export subtree > (/path/to/folder/inside/subfolder) unless the same client is explicitly allowed > to access the child export as well. This is okay as long as the client uses > only NFSv4 mounts. > > The Linux kernel NFS server and other NFSv4 servers do not show this behaviour. > Is there a way to bypass this with CES/Ganesha? Or is the only solution to add > client2 to /gpfs/filesystem1/directory1? > > Regards, > Stefan > > -- > ------------------------------------------------------------------------ > Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems) > Ein Forschungszentrum der Helmholtz-Gemeinschaft > Notkestr. 85 > phone: +49-40-8998-4696 22607 Hamburg > e-mail: stefan.dietrich at desy.de Germany > ------------------------------------------------------------------------ > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [ http://gpfsug.org/mailman/listinfo/gpfsug-discuss | > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ] > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Fri Oct 26 15:24:38 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:24:38 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks Message-ID: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Hello, does anyone know whether there is a chance to use e.g., 10G ethernet together with IniniBand network for multihoming of GPFS nodes? I mean to setup two different type of networks to mitigate network failures. I read that you can have several networks configured in GPFS but it does not provide failover. Nothing changed in this as of GPFS version 5.x? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From S.J.Thompson at bham.ac.uk Fri Oct 26 15:48:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 26 Oct 2018 14:48:48 +0000 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: If IB is enabled and is setup with verbs, then this is the preferred network. GPFS will always fail-back to Ethernet afterwards, however what you can't do is have multiple "subnets" defined and have GPFS fail between different Ethernet networks. Simon ?On 26/10/2018, 15:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of xhejtman at ics.muni.cz" wrote: Hello, does anyone know whether there is a chance to use e.g., 10G ethernet together with IniniBand network for multihoming of GPFS nodes? I mean to setup two different type of networks to mitigate network failures. I read that you can have several networks configured in GPFS but it does not provide failover. Nothing changed in this as of GPFS version 5.x? -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Fri Oct 26 15:52:43 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:52:43 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however what you > can't do is have multiple "subnets" defined and have GPFS fail between > different Ethernet networks. Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back happen only during mmstartup? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From jonathan.buzzard at strath.ac.uk Fri Oct 26 15:52:43 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 26 Oct 2018 15:52:43 +0100 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> On 26/10/2018 15:48, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however > what you can't do is have multiple "subnets" defined and have GPFS > fail between different Ethernet networks. > If you want mitigate network failures then you need to mitigate it at layer 2. However it won't be cheap. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From xhejtman at ics.muni.cz Fri Oct 26 15:56:45 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:56:45 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> Message-ID: <20181026145645.qzn24jp26anxayub@ics.muni.cz> On Fri, Oct 26, 2018 at 03:52:43PM +0100, Jonathan Buzzard wrote: > On 26/10/2018 15:48, Simon Thompson wrote: > > If IB is enabled and is setup with verbs, then this is the preferred > > network. GPFS will always fail-back to Ethernet afterwards, however > > what you can't do is have multiple "subnets" defined and have GPFS > > fail between different Ethernet networks. > > > > If you want mitigate network failures then you need to mitigate it at layer > 2. However it won't be cheap. well, I believe this should be exactly what more 'subnets' are used for.. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From xhejtman at ics.muni.cz Fri Oct 26 15:57:53 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:57:53 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> Message-ID: <20181026145753.ijokzwbjh3aznxwr@ics.muni.cz> On Fri, Oct 26, 2018 at 04:52:43PM +0200, Lukas Hejtmanek wrote: > On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > > If IB is enabled and is setup with verbs, then this is the preferred > > network. GPFS will always fail-back to Ethernet afterwards, however what you > > can't do is have multiple "subnets" defined and have GPFS fail between > > different Ethernet networks. > > Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back > happen only during mmstartup? moreover, are verbs used also for cluster management? E.g., node keepalive messages. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From S.J.Thompson at bham.ac.uk Fri Oct 26 15:59:08 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 26 Oct 2018 14:59:08 +0000 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> Message-ID: Yes ... if the IB network goes down ... But it's not really fault tolerant, as you need the admin network for token management, so you could lose IB and have data fail to the Ethernet path, but not lose Ethernet. And it doesn't (or didn't) fail back to IB when IB come live again, though that might have changed with 5.0.2. Simon ?On 26/10/2018, 15:52, "gpfsug-discuss-bounces at spectrumscale.org on behalf of xhejtman at ics.muni.cz" wrote: On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however what you > can't do is have multiple "subnets" defined and have GPFS fail between > different Ethernet networks. Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back happen only during mmstartup? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From eric.wonderley at vt.edu Fri Oct 26 15:44:13 2018 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 26 Oct 2018 10:44:13 -0400 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: Multihoming is accomplished by using subnets...see mmchconfig. Failover networks on the other hand are not allowed. Bad network behavior is dealt with by expelling nodes. You must have decent/supported network gear...we have learned that lesson the hard way On Fri, Oct 26, 2018 at 10:37 AM Lukas Hejtmanek wrote: > Hello, > > does anyone know whether there is a chance to use e.g., 10G ethernet > together > with IniniBand network for multihoming of GPFS nodes? 
> > I mean to setup two different type of networks to mitigate network > failures. > I read that you can have several networks configured in GPFS but it does > not > provide failover. Nothing changed in this as of GPFS version 5.x? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vtarasov at us.ibm.com Fri Oct 26 23:58:16 2018 From: vtarasov at us.ibm.com (Vasily Tarasov) Date: Fri, 26 Oct 2018 22:58:16 +0000 Subject: [gpfsug-discuss] If you're attending KubeCon'18 Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Oct 29 00:29:51 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 29 Oct 2018 00:29:51 +0000 Subject: [gpfsug-discuss] Presentations from SSUG Meeting, Oct 24th - NY Genome Center Message-ID: <2CF4E6B3-B39E-4567-91A5-58C39A720362@nuance.com> These are now on the web site under ?Presentations? - single zip file has them all. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Oct 29 16:33:35 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 29 Oct 2018 12:33:35 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Message-ID: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From kums at us.ibm.com Mon Oct 29 19:56:09 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 29 Oct 2018 14:56:09 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Message-ID: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. 
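As a purely conceptual illustration of why that matters (ordinary userspace tools, nothing to do with GPFS internals -- the 16 MiB "block" and 2 MiB "strips" are invented numbers), compare one process checksumming a whole block with eight workers each checksumming one strip of it:

# serial: a single md5sum pass over the whole 16 MiB block
time md5sum /gpfs/fs1/bigblock

# parallel: eight 2 MiB strips checksummed concurrently, one per worker
time seq 0 7 | xargs -P 8 -I{} sh -c \
    'dd if=/gpfs/fs1/bigblock bs=2M skip={} count=1 2>/dev/null | md5sum'

The serial pass is bound by one core; the per-strip version spreads the arithmetic across cores, which is roughly what the GNR server can do because it already handles the block strip by strip.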
In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Mon Oct 29 20:47:24 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 29 Oct 2018 16:47:24 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Message-ID: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen > On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram > wrote: > > Hi, > > >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? > >>Why is there such a penalty for "traditional" environments? > > In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). 
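For anyone who wants to try the flag being discussed on a test cluster, a minimal sketch of setting and checking it (whether it can be applied to a running cluster without a daemon recycle is something to confirm against the documentation for your release):

    # The documented I/O performance and CPU warnings apply once this is on,
    # so experiment somewhere non-production first.
    mmchconfig nsdCksumTraditional=yes
    mmlsconfig nsdCksumTraditional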
This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. > > In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). > > My two cents. > > Regards, > -Kums > > > > > > From: Aaron Knister > > To: gpfsug main discussion list > > Date: 10/29/2018 12:34 PM > Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Flipping through the slides from the recent SSUG meeting I noticed that > in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. > Reading up on it it seems as though it comes with a warning about > significant I/O performance degradation and increase in CPU usage. I > also recall that data integrity checking is performed by default with > GNR. How can it be that the I/O performance degradation warning only > seems to accompany the nsdCksumTraditional setting and not GNR? As > someone who knows exactly 0 of the implementation details, I'm just > naively assuming that the checksum are being generated (in the same > way?) in both cases and transferred to the NSD server. Why is there such > a penalty for "traditional" environments? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Oct 29 21:27:41 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 29 Oct 2018 16:27:41 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: Stephen, ESS does perform checksums in the transfer between NSD clients and NSD servers. As Kums described below, the difference between the checksums performed by GNR and those performed with "nsdCksumTraditional" is that GNR checksums are computed in parallel on the server side, as a large FS block is broken into smaller pieces. On non-GNR environments (when nsdCksumTraditional is set), the checksum is computed sequentially on the server. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Stephen Ulmer To: gpfsug main discussion list Date: 10/29/2018 04:52 PM Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? 
It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? 
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kums at us.ibm.com Mon Oct 29 21:29:33 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 29 Oct 2018 16:29:33 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: In non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only. The ESS storage supports end-to-end checksum, NSD client to the ESS IO servers (at the network level) as well as from ESS IO servers to the disk/storage. This is further detailed in the docs (link below): https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm Best, -Kums From: Stephen Ulmer To: gpfsug main discussion list Date: 10/29/2018 04:52 PM Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). 
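On the GNR/ESS side that Kums describes, the vdisk and declustered-array layout that the parallel checksum threads work across can be inspected with the standard ESS commands, for example (GNR/ESS building blocks only; the recovery group name is a placeholder):

    mmlsrecoverygroup              # list the recovery groups on the building block
    mmlsrecoverygroup <rg> -L      # declustered arrays and vdisks inside one recovery group
    mmlsvdisk                      # the vdisks (NSDs) defined across the recovery groups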
This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Tue Oct 30 00:39:35 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 29 Oct 2018 20:39:35 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: The point of the original question was to discover why there is a warning about performance for nsdChksumTraditional=yes, but that warning doesn?t seem to apply to an ESS environment. Your reply was that checksums in an ESS environment are calculated in parallel on the NSD server based on the physical storage layout used underneath the NSD, and is thus faster. My point was that if there is never a checksum calculated by the NSD client, then how does the NSD server know that it got uncorrupted data? The link you referenced below (thank you!) indicates that, in fact, the NSD client DOES calculate a checksum and forward it with the data to the NSD server. The server validates the data (necessitating a re-calculation of the checksum), and then GNR stores the data, A CHECKSUM[1], and some block metadata to media. So this leaves us with a checksum calculated by the client and then validated (re-calculated) by the server ? IN BOTH CASES. 
For the GNR case, another checksum in calculated and stored with the data for another purpose, but that means that the nsdChksumTraditional=yes case is exactly like the first phase of the GNR case. So why is that case slower when it does less work? Slow enough to merit a warning, no less! I?m really not trying to be a pest, but I have a logic problem with either the question or the answer ? they aren?t consistent (or I can?t rationalize them to be so). -- Stephen [1] The document is vague (I believe intentionally, because it could have easily been made clear) as to whether this is the same checksum or a different one. Presumably the server-side-new-checksum is calculated in parallel and protects the chunklets or whatever they're called. This is all consistent with what you said! > On Oct 29, 2018, at 5:29 PM, Kumaran Rajaram > wrote: > > In non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only. > > The ESS storage supports end-to-end checksum, NSD client to the ESS IO servers (at the network level) as well as from ESS IO servers to the disk/storage. This is further detailed in the docs (link below): > > https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm > > Best, > -Kums > > > > > > From: Stephen Ulmer > > To: gpfsug main discussion list > > Date: 10/29/2018 04:52 PM > Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) > > I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. > > If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. > > -- > Stephen > > > > On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram > wrote: > > Hi, > > >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? > >>Why is there such a penalty for "traditional" environments? > > In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. 
> > In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). > > My two cents. > > Regards, > -Kums > > > > > > From: Aaron Knister > > To: gpfsug main discussion list > > Date: 10/29/2018 12:34 PM > Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Flipping through the slides from the recent SSUG meeting I noticed that > in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. > Reading up on it it seems as though it comes with a warning about > significant I/O performance degradation and increase in CPU usage. I > also recall that data integrity checking is performed by default with > GNR. How can it be that the I/O performance degradation warning only > seems to accompany the nsdCksumTraditional setting and not GNR? As > someone who knows exactly 0 of the implementation details, I'm just > naively assuming that the checksum are being generated (in the same > way?) in both cases and transferred to the NSD server. Why is there such > a penalty for "traditional" environments? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Tue Oct 30 00:53:06 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 30 Oct 2018 00:53:06 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: , <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov><326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 30 09:03:06 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 30 Oct 2018 09:03:06 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> On 29/10/2018 20:47, Stephen Ulmer wrote: [SNIP] > > If ESS checksumming doesn?t protect on the wire I?d say that marketing > has run amok, because that has *definitely* been implied in meetings for > which I?ve been present. In fact, when asked if?Spectrum Scale provides > checksumming for data in-flight, IBM sales has used it as an ESS up-sell > opportunity. 
> Noting that on a TCP/IP network anything passing over a TCP connection is checksummed at the network layer. Consequently any addition checksumming is basically superfluous. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at uk.ibm.com Tue Oct 30 10:56:09 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Tue, 30 Oct 2018 10:56:09 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: Message-ID: Remember too that in a traditional GPFS setup, the NSD servers are effectively merely data routers (since the clients know exactly where the block is going to be written) and as such NSD servers can be previous generation hardware. By contrast GNR needs cpu cycles and plenty of memory, so ESS nodes are naturally big and fast (as well as benefitting from parallel threads working together on the GNR). Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-(0)7818 522 266 daniel.kidger at uk.ibm.com > On 30 Oct 2018, at 00:53, Andrew Beattie wrote: > > Stephen, > > I think you also need to take into consideration that IBM does not control what infrastructure users may chose to deploy Spectrum scale on outside of ESS hardware. > > As such it is entirely possible that older or lower spec hardware, or even virtualised NSD Servers with even lower resources per virtual node, will have potential issues when running the nsdChksumTraditional=yes flag, As such IBM has a duty of care to provide a warning that you may experience issues if you turn the additional workload on. > > Beyond this i'm not seeing why there is an issue, if you turn the flag on in a non ESS scenario the process is Serialised, if you turn it on in an ESS Scenario you get to take advantage of the fact that Scale Native Raid does a significant amount of the work in a parallelised method, one is less resource intensive than the other, because the process is handled differently depending on the type of NSD Servers doing the work. > > > > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: Stephen Ulmer > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Date: Tue, Oct 30, 2018 10:39 AM > > The point of the original question was to discover why there is a warning about performance for nsdChksumTraditional=yes, but that warning doesn?t seem to apply to an ESS environment. > > Your reply was that checksums in an ESS environment are calculated in parallel on the NSD server based on the physical storage layout used underneath the NSD, and is thus faster. My point was that if there is never a checksum calculated by the NSD client, then how does the NSD server know that it got uncorrupted data? > > The link you referenced below (thank you!) indicates that, in fact, the NSD client DOES calculate a checksum and forward it with the data to the NSD server. The server validates the data (necessitating a re-calculation of the checksum), and then GNR stores the data, A CHECKSUM[1], and some block metadata to media. > > So this leaves us with a checksum calculated by the client and then validated (re-calculated) by the server ? IN BOTH CASES. 
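A quick way to see the data-router topology Daniel describes on a plain (non-GNR) cluster, using nothing beyond standard administration commands:

    mmlsnsd        # which NSD servers front which NSDs and file systems
    mmlsnsd -m     # map NSD names to the local disk devices on the server side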
For the GNR case, another checksum in calculated and stored with the data for another purpose, but that means that the nsdChksumTraditional=yes case is exactly like the first phase of the GNR case. So why is that case slower when it does less work? Slow enough to merit a warning, no less! > > I?m really not trying to be a pest, but I have a logic problem with either the question or the answer ? they aren?t consistent (or I can?t rationalize them to be so). > > -- > Stephen > > [1] The document is vague (I believe intentionally, because it could have easily been made clear) as to whether this is the same checksum or a different one. Presumably the server-side-new-checksum is calculated in parallel and protects the chunklets or whatever they're called. This is all consistent with what you said! > > > >> >> On Oct 29, 2018, at 5:29 PM, Kumaran Rajaram wrote: >> >> In non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only. >> >> The ESS storage supports end-to-end checksum, NSD client to the ESS IO servers (at the network level) as well as from ESS IO servers to the disk/storage. This is further detailed in the docs (link below): >> >> https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm >> >> Best, >> -Kums >> >> >> >> >> >> From: Stephen Ulmer >> To: gpfsug main discussion list >> Date: 10/29/2018 04:52 PM >> Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) >> >> I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. >> >> If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. >> >> -- >> Stephen >> >> >> >> On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote: >> >> Hi, >> >> >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >> >>Why is there such a penalty for "traditional" environments? >> >> In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. 
>> >> In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). >> >> My two cents. >> >> Regards, >> -Kums >> >> >> >> >> >> From: Aaron Knister >> To: gpfsug main discussion list >> Date: 10/29/2018 12:34 PM >> Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Flipping through the slides from the recent SSUG meeting I noticed that >> in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. >> Reading up on it it seems as though it comes with a warning about >> significant I/O performance degradation and increase in CPU usage. I >> also recall that data integrity checking is performed by default with >> GNR. How can it be that the I/O performance degradation warning only >> seems to accompany the nsdCksumTraditional setting and not GNR? As >> someone who knows exactly 0 of the implementation details, I'm just >> naively assuming that the checksum are being generated (in the same >> way?) in both cases and transferred to the NSD server. Why is there such >> a penalty for "traditional" environments? >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Oct 30 12:30:20 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Tue, 30 Oct 2018 12:30:20 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org>, <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> Message-ID: <0765E436-870B-430D-89D3-89CE60E94CCB@nasa.gov> I?m guessing IBM doesn?t generally spend huge amounts of money on things that are superfluous...although *cough*RedHat*cough*. 
TCP does of course perform checksumming, but I see the NSD checksums as being at a higher ?layer?, if you will. The layer at which I believe the NSD checksums operate sits above the complex spaghetti monster of queues, buffers, state machines, kernel/user space communication inside of GPFS as well as networking drivers that can suck (looking at you Intel, Mellanox), and high speed networking hardware all of which I?ve seen cause data corruption (even though the data on the wire was in some cases checksummed correctly). -Aaron On October 30, 2018 at 05:03:26 EDT, Jonathan Buzzard wrote: On 29/10/2018 20:47, Stephen Ulmer wrote: [SNIP] > > If ESS checksumming doesn?t protect on the wire I?d say that marketing > has run amok, because that has *definitely* been implied in meetings for > which I?ve been present. In fact, when asked if Spectrum Scale provides > checksumming for data in-flight, IBM sales has used it as an ESS up-sell > opportunity. > Noting that on a TCP/IP network anything passing over a TCP connection is checksummed at the network layer. Consequently any addition checksumming is basically superfluous. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Tue Oct 30 22:14:00 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 30 Oct 2018 18:14:00 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> Message-ID: <107111.1540937640@turing-police.cc.vt.edu> On Tue, 30 Oct 2018 09:03:06 -0000, Jonathan Buzzard said: > Noting that on a TCP/IP network anything passing over a TCP connection > is checksummed at the network layer. Consequently any addition > checksumming is basically superfluous. Note that the TCP checksum is relatively weak, and designed in a day when a 56K leased line was a high-speed long-haul link and 10mbit ethernet was the fastest thing on the planet. When 10 megabytes was a large transfer, it was a reasonable amount of protection. But when you get into moving petabytes of data around, the chances of an undetected error starts getting significant. Pop quiz time: When was the last time you (the reader) checked your network statistics to see what your bit error rate was? Do you even have the ability to do so? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL:
From bbanister at jumptrading.com Tue Oct 30 22:52:35 2018
From: bbanister at jumptrading.com (Bryan Banister)
Date: Tue, 30 Oct 2018 22:52:35 +0000
Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To: <107111.1540937640@turing-police.cc.vt.edu>
References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> <107111.1540937640@turing-police.cc.vt.edu>
Message-ID:
Valdis will also recall how much "fun" we had with network related corruption due to what we surmised was a TCP offload engine FW defect in a certain 10GbE HCA. Only happened sporadically every few weeks... what a nightmare that was!!
-B
-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of valdis.kletnieks at vt.edu
Sent: Tuesday, October 30, 2018 5:14 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
On Tue, 30 Oct 2018 09:03:06 -0000, Jonathan Buzzard said:
> Noting that on a TCP/IP network anything passing over a TCP connection
> is checksummed at the network layer. Consequently any addition
> checksumming is basically superfluous.
Note that the TCP checksum is relatively weak, and designed in a day when a 56K leased line was a high-speed long-haul link and 10mbit ethernet was the fastest thing on the planet. When 10 megabytes was a large transfer, it was a reasonable amount of protection. But when you get into moving petabytes of data around, the chances of an undetected error starts getting significant.
Pop quiz time: When was the last time you (the reader) checked your network statistics to see what your bit error rate was? Do you even have the ability to do so?
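For anyone taking Valdis up on the pop quiz, a short sketch of where those counters live on a typical Linux node, plus a look at whether the NIC is offloading checksums at all (the interface name is made up and counter names vary by driver):

    ip -s link show eth0                         # per-interface RX/TX error and drop counters
    ethtool -S eth0 | grep -iE 'err|drop|crc'    # driver/NIC counters; names differ between adapters
    ethtool -k eth0 | grep checksum              # is rx/tx checksum offload currently enabled?
    # On an InfiniBand fabric, the infiniband-diags tools (perfquery and friends)
    # expose per-port symbol and receive error counters.

None of this demonstrates end-to-end data integrity, of course, which is the gap the NSD-level and GNR checksums discussed in this thread are meant to close.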
From makaplan at us.ibm.com Tue Oct 30 23:15:38 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 30 Oct 2018 18:15:38 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov><326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org><72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk><107111.1540937640@turing-police.cc.vt.edu> Message-ID: I confess, I know what checksums are generally and how and why they are used, but I am not familiar with all the various checksums that have been discussed here. I'd like to see a list or a chart with the following information for each checksum: Computed on what data elements, of what (typical) length (e.g. packet, disk block, disk fragment, disk sector) Checksum function used, how many bits of checksum computed on each data element. Computed by what software or hardware entity at what nodes in the network. There may be such checksums on each NSD transfer. Lowest layers would be checking data coming off of the disk. Checking network packets coming off ethernet or IB adapters. Higher layer for NSD could be a checksum on a whole disk block and/or on NSD request and response, including message headers AND the disk data... -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Oct 31 01:09:40 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 30 Oct 2018 21:09:40 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> <107111.1540937640@turing-police.cc.vt.edu> Message-ID: <122689.1540948180@turing-police.cc.vt.edu> On Tue, 30 Oct 2018 22:52:35 -0000, Bryan Banister said: > Valdis will also recall how much "fun" we had with network related corruption > due to what we surmised was a TCP offload engine FW defect in a certain 10GbE > HCA. Only happened sporadically every few weeks... what a nightmare that was!! It makes for quite the bar story, as the symptoms pointed everywhere except the network adapter. For the purposes of this thread though, two points to note: 1) The card in question was a spectacularly good price/performer and totally rock solid in 4 NFS servers that we had - in 6 years of trying, I never managed to make them hiccup (the one suspected failure turned out to be a fiber cable that had gotten crimped when the rack door was closed on a loop). 2) Since the TCP offload engine was computing the checksum across the data, but it had gotten confused about which data it was about to transmit, every single packet went out with a perfectly correct checksum. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From rohwedder at de.ibm.com Wed Oct 31 15:33:54 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 31 Oct 2018 16:33:54 +0100 Subject: [gpfsug-discuss] Spectrum Scale Survey Message-ID: Hello Spectrum Scale Users, we have started a survey on how certain Spectrum Scale administrative tasks are performed. The survey focuses on use of tasks like snapshots or ILM including monitoring, scheduling and problem determination of these capabilities. It should take only a few minutes to complete the survey. 
Please take a look and let us know how you are using Spectrum Scale and what aspects are important for you.
Here is the survey link: https://www.surveygizmo.com/s3/4631738/IBM-Spectrum-Scale-Administrative-Management
Mit freundlichen Grüßen / Kind regards
Dr. Markus Rohwedder
Spectrum Scale GUI Development
Phone: +49 7034 6430190
IBM Deutschland Research & Development
E-Mail: rohwedder at de.ibm.com
Am Weiher 24 65451 Kelsterbach Germany
From kkr at lbl.gov Wed Oct 31 20:10:02 2018
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Wed, 31 Oct 2018 13:10:02 -0700
Subject: [gpfsug-discuss] V5 client limit?
Message-ID:
Hi, Can someone tell me the max # of GPFS native clients under 5.x? Everything I can find is dated.
Thanks Kristy
From sandeep.patil at in.ibm.com Wed Oct 3 16:18:06 2018
From: sandeep.patil at in.ibm.com (Sandeep Ramesh)
Date: Wed, 3 Oct 2018 15:18:06 +0000
Subject: [gpfsug-discuss] Latest Technical Blogs on IBM Spectrum Scale (Q3 2018)
In-Reply-To: References: Message-ID:
Dear User Group Members, in continuation, here is a list of development blogs from this quarter (Q3 2018). We now have over 100 developer blogs on Spectrum Scale/ESS. As discussed in User Groups, passing it along to the mailing list.
How NFS exports became more dynamic with Spectrum Scale 5.0.2 https://developer.ibm.com/storage/2018/10/02/nfs-exports-became-dynamic-spectrum-scale-5-0-2/ HPC storage on AWS (IBM Spectrum Scale) https://developer.ibm.com/storage/2018/10/02/hpc-storage-aws-ibm-spectrum-scale/ Upgrade with Excluding the node(s) using Install-toolkit https://developer.ibm.com/storage/2018/09/30/upgrade-excluding-nodes-using-install-toolkit/ Offline upgrade using Install-toolkit https://developer.ibm.com/storage/2018/09/30/offline-upgrade-using-install-toolkit/ IBM Spectrum Scale for Linux on IBM Z ? What?s new in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/21/ibm-spectrum-scale-for-linux-on-ibm-z-whats-new-in-ibm-spectrum-scale-5-0-2/ What?s New in IBM Spectrum Scale 5.0.2 ? https://developer.ibm.com/storage/2018/09/15/whats-new-ibm-spectrum-scale-5-0-2/ Starting IBM Spectrum Scale 5.0.2 release, the installation toolkit supports upgrade rerun if fresh upgrade fails. https://developer.ibm.com/storage/2018/09/15/starting-ibm-spectrum-scale-5-0-2-release-installation-toolkit-supports-upgrade-rerun-fresh-upgrade-fails/ IBM Spectrum Scale installation toolkit ? enhancements over releases ? 5.0.2.0 https://developer.ibm.com/storage/2018/09/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases-5-0-2-0/ Announcing HDP 3.0 support with IBM Spectrum Scale https://developer.ibm.com/storage/2018/08/31/announcing-hdp-3-0-support-ibm-spectrum-scale/ IBM Spectrum Scale Tuning Overview for Hadoop Workload https://developer.ibm.com/storage/2018/08/20/ibm-spectrum-scale-tuning-overview-hadoop-workload/ Making the Most of Multicloud Storage https://developer.ibm.com/storage/2018/08/13/making-multicloud-storage/ Disaster Recovery for Transparent Cloud Tiering using SOBAR https://developer.ibm.com/storage/2018/08/13/disaster-recovery-transparent-cloud-tiering-using-sobar/ Your Optimal Choice of AI Storage for Today and Tomorrow https://developer.ibm.com/storage/2018/08/10/spectrum-scale-ai-workloads/ Analyze IBM Spectrum Scale File Access Audit with ELK Stack https://developer.ibm.com/storage/2018/07/30/analyze-ibm-spectrum-scale-file-access-audit-elk-stack/ Mellanox SX1710 40G switch MLAG configuration for IBM ESS https://developer.ibm.com/storage/2018/07/12/mellanox-sx1710-40g-switcher-mlag-configuration/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? SMB and NFS Access issues https://developer.ibm.com/storage/2018/07/10/protocol-problem-determination-guide-ibm-spectrum-scale-smb-nfs-access-issues/ Access Control in IBM Spectrum Scale Object https://developer.ibm.com/storage/2018/07/06/access-control-ibm-spectrum-scale-object/ IBM Spectrum Scale HDFS Transparency Docker support https://developer.ibm.com/storage/2018/07/06/ibm-spectrum-scale-hdfs-transparency-docker-support/ Protocol Problem Determination Guide for IBM Spectrum Scale? ? Log Collection https://developer.ibm.com/storage/2018/07/04/protocol-problem-determination-guide-ibm-spectrum-scale-log-collection/ Redpapers IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases http://www.redbooks.ibm.com/abstracts/redp5507.html?Open Certifications Assessment of the immutability function of IBM Spectrum Scale Version 5.0 in accordance to US SEC17a-4f, EU GDPR Article 21 Section 1, German and Swiss laws and regulations in collaboration with KPMG. 
Certificate: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?DE968667B47544FF83F6CCDCF37E5FB5 Full assessment report: http://www.kpmg.de/bescheinigungen/RequestReport.aspx?B290411BE1224F5A9B4D24663BCD3C5D For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 07/03/2018 12:13 AM Subject: Re: Latest Technical Blogs on Spectrum Scale (Q2 2018) Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q2 2018). We now have over 100+ developer blogs. As discussed in User Groups, passing it along: IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ IBM Spectrum Scale ILM Policies https://developer.ibm.com/storage/2018/06/02/ibm-spectrum-scale-ilm-policies/ IBM Spectrum Scale 5.0.1 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2018/06/15/6494/ Management GUI enhancements in IBM Spectrum Scale release 5.0.1 https://developer.ibm.com/storage/2018/05/18/management-gui-enhancements-in-ibm-spectrum-scale-release-5-0-1/ Managing IBM Spectrum Scale services through GUI https://developer.ibm.com/storage/2018/05/18/managing-ibm-spectrum-scale-services-through-gui/ Use AWS CLI with IBM Spectrum Scale? object storage https://developer.ibm.com/storage/2018/05/16/use-awscli-with-ibm-spectrum-scale-object-storage/ Hadoop Storage Tiering with IBM Spectrum Scale https://developer.ibm.com/storage/2018/05/09/hadoop-storage-tiering-ibm-spectrum-scale/ How many Files on my Filesystem? 
https://developer.ibm.com/storage/2018/05/07/many-files-filesystem/ Recording Spectrum Scale Object Stats for Potential Billing like Purpose using Elasticsearch https://developer.ibm.com/storage/2018/05/04/spectrum-scale-object-stats-for-billing-using-elasticsearch/ New features in IBM Elastic Storage Server (ESS) Version 5.3 https://developer.ibm.com/storage/2018/04/09/new-features-ibm-elastic-storage-server-ess-version-5-3/ Using IBM Spectrum Scale for storage in IBM Cloud Private (Missed to send earlier) https://medium.com/ibm-cloud/ibm-spectrum-scale-with-ibm-cloud-private-8bf801796f19 Redpapers Hortonworks Data Platform with IBM Spectrum Scale: Reference Guide for Building an Integrated Solution http://www.redbooks.ibm.com/redpieces/abstracts/redp5448.html, Enabling Hybrid Cloud Storage for IBM Spectrum Scale Using Transparent Cloud Tiering http://www.redbooks.ibm.com/abstracts/redp5411.html?Open SAP HANA and ESS: A Winning Combination (Update) http://www.redbooks.ibm.com/abstracts/redp5436.html?Open Others IBM Spectrum Scale Software Version Recommendation Preventive Service Planning (Updated) http://www-01.ibm.com/support/docview.wss?uid=ssg1S1009703, IDC Infobrief: A Modular Approach to Genomics Infrastructure at Scale in HCLS https://www.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=37016937USEN& For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Date: 03/27/2018 05:23 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, In continuation , here are list of development blogs in the this quarter (Q1 2018). As discussed in User Groups, passing it along: GDPR Compliance and Unstructured Data Storage https://developer.ibm.com/storage/2018/03/27/gdpr-compliance-unstructure-data-storage/ IBM Spectrum Scale for Linux on IBM Z ? Release 5.0 features and highlights https://developer.ibm.com/storage/2018/03/09/ibm-spectrum-scale-linux-ibm-z-release-5-0-features-highlights/ Management GUI enhancements in IBM Spectrum Scale release 5.0.0 https://developer.ibm.com/storage/2018/01/18/gui-enhancements-in-spectrum-scale-release-5-0-0/ IBM Spectrum Scale 5.0.0 ? What?s new in NFS? https://developer.ibm.com/storage/2018/01/18/ibm-spectrum-scale-5-0-0-whats-new-nfs/ Benefits and implementation of Spectrum Scale sudo wrappers https://developer.ibm.com/storage/2018/01/15/benefits-implementation-spectrum-scale-sudo-wrappers/ IBM Spectrum Scale: Big Data and Analytics Solution Brief https://developer.ibm.com/storage/2018/01/15/ibm-spectrum-scale-big-data-analytics-solution-brief/ Variant Sub-blocks in Spectrum Scale 5.0 https://developer.ibm.com/storage/2018/01/11/spectrum-scale-variant-sub-blocks/ Compression support in Spectrum Scale 5.0.0 https://developer.ibm.com/storage/2018/01/11/compression-support-spectrum-scale-5-0-0/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale On AWS Cloud : This video explains how to deploy IBM Spectrum Scale on AWS. 
This solution helps the users who require highly available access to a shared name space across multiple instances with good performance, without requiring an in-depth knowledge of IBM Spectrum Scale. Detailed Demo : https://www.youtube.com/watch?v=6j5Xj_d0bh4 Brief Demo : https://www.youtube.com/watch?v=-aMQKPW_RfY. For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 01/10/2018 12:13 PM Subject: Re: Latest Technical Blogs on Spectrum Scale Dear User Group Members, Here are list of development blogs in the last quarter. Passing it to this email group as Doris had got a feedback in the UG meetings to notify the members with the latest updates periodically. Genomic Workloads ? How To Get it Right From Infrastructure Point Of View. https://developer.ibm.com/storage/2018/01/06/genomic-workloads-get-right-infrastructure-point-view/ IBM Spectrum Scale Versus Apache Hadoop HDFS https://developer.ibm.com/storage/2018/01/10/spectrumscale_vs_hdfs/ ESS Fault Tolerance https://developer.ibm.com/storage/2018/01/09/ess-fault-tolerance/ IBM Spectrum Scale MMFSCK ? Savvy Enhancements https://developer.ibm.com/storage/2018/01/05/ibm-spectrum-scale-mmfsck-savvy-enhancements/ ESS Disk Management https://developer.ibm.com/storage/2018/01/02/ess-disk-management/ IBM Spectrum Scale Object Protocol On Ubuntu https://developer.ibm.com/storage/2018/01/01/ibm-spectrum-scale-object-protocol-ubuntu/ IBM Spectrum Scale 5.0 ? Whats new in Unified File and Object https://developer.ibm.com/storage/2017/12/20/ibm-spectrum-scale-5-0-whats-new-object/ A Complete Guide to ? Protocol Problem Determination Guide for IBM Spectrum Scale? ? Part 1 https://developer.ibm.com/storage/2017/12/19/complete-guide-protocol-problem-determination-guide-ibm-spectrum-scale-1/ IBM Spectrum Scale installation toolkit ? 
enhancements over releases https://developer.ibm.com/storage/2017/12/15/ibm-spectrum-scale-installation-toolkit-enhancements-releases/ Network requirements in an Elastic Storage Server Setup https://developer.ibm.com/storage/2017/12/13/network-requirements-in-an-elastic-storage-server-setup/ Co-resident migration with Transparent cloud tierin https://developer.ibm.com/storage/2017/12/05/co-resident-migration-transparent-cloud-tierin/ IBM Spectrum Scale on Hortonworks HDP Hadoop clusters : A Complete Big Data Solution https://developer.ibm.com/storage/2017/12/05/ibm-spectrum-scale-hortonworks-hdp-hadoop-clusters-complete-big-data-solution/ Big data analytics with Spectrum Scale using remote cluster mount & multi-filesystem support https://developer.ibm.com/storage/2017/11/28/big-data-analytics-spectrum-scale-using-remote-cluster-mount-multi-filesystem-support/ IBM Spectrum Scale HDFS Transparency Short Circuit Write Support https://developer.ibm.com/storage/2017/11/28/ibm-spectrum-scale-hdfs-transparency-short-circuit-write-support/ IBM Spectrum Scale HDFS Transparency Federation Support https://developer.ibm.com/storage/2017/11/27/ibm-spectrum-scale-hdfs-transparency-federation-support/ How to configure and performance tuning different system workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-different-system-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Spark workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-spark-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning database workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/27/configure-performance-tuning-database-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ How to configure and performance tuning Hadoop workloads on IBM Spectrum Scale Sharing Nothing Cluster https://developer.ibm.com/storage/2017/11/24/configure-performance-tuning-hadoop-workloads-ibm-spectrum-scale-sharing-nothing-cluster/ IBM Spectrum Scale Sharing Nothing Cluster Performance Tuning https://developer.ibm.com/storage/2017/11/24/ibm-spectrum-scale-sharing-nothing-cluster-performance-tuning/ How to Configure IBM Spectrum Scale? with NIS based Authentication. https://developer.ibm.com/storage/2017/11/21/configure-ibm-spectrum-scale-nis-based-authentication/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media From: Sandeep Ramesh/India/IBM To: gpfsug-discuss at spectrumscale.org Cc: Doris Conti/Poughkeepsie/IBM at IBMUS Date: 11/16/2017 08:15 PM Subject: Latest Technical Blogs on Spectrum Scale Dear User Group members, Here are the Development Blogs in last 3 months on Spectrum Scale Technical Topics. Spectrum Scale Monitoring ? Know More ? https://developer.ibm.com/storage/2017/11/16/spectrum-scale-monitoring-know/ IBM Spectrum Scale 5.0 Release ? What?s coming ! https://developer.ibm.com/storage/2017/11/14/ibm-spectrum-scale-5-0-release-whats-coming/ Four Essentials things to know for managing data ACLs on IBM Spectrum Scale? 
from Windows https://developer.ibm.com/storage/2017/11/13/four-essentials-things-know-managing-data-acls-ibm-spectrum-scale-windows/ GSSUTILS: A new way of running SSR, Deploying or Upgrading ESS Server https://developer.ibm.com/storage/2017/11/13/gssutils/ IBM Spectrum Scale Object Authentication https://developer.ibm.com/storage/2017/11/02/spectrum-scale-object-authentication/ Video Surveillance ? Choosing the right storage https://developer.ibm.com/storage/2017/11/02/video-surveillance-choosing-right-storage/ IBM Spectrum scale object deep dive training with problem determination https://www.slideshare.net/SmitaRaut/ibm-spectrum-scale-object-deep-dive-training Spectrum Scale as preferred software defined storage for Ubuntu OpenStack https://developer.ibm.com/storage/2017/09/29/spectrum-scale-preferred-software-defined-storage-ubuntu-openstack/ IBM Elastic Storage Server 2U24 Storage ? an All-Flash offering, a performance workhorse https://developer.ibm.com/storage/2017/10/06/ess-5-2-flash-storage/ A Complete Guide to Configure LDAP-based authentication with IBM Spectrum Scale? for File Access https://developer.ibm.com/storage/2017/09/21/complete-guide-configure-ldap-based-authentication-ibm-spectrum-scale-file-access/ Deploying IBM Spectrum Scale on AWS Quick Start https://developer.ibm.com/storage/2017/09/18/deploy-ibm-spectrum-scale-on-aws-quick-start/ Monitoring Spectrum Scale Object metrics https://developer.ibm.com/storage/2017/09/14/monitoring-spectrum-scale-object-metrics/ Tier your data with ease to Spectrum Scale Private Cloud(s) using Moonwalk Universal https://developer.ibm.com/storage/2017/09/14/tier-data-ease-spectrum-scale-private-clouds-using-moonwalk-universal/ Why do I see owner as ?Nobody? for my export mounted using NFSV4 Protocol on IBM Spectrum Scale?? https://developer.ibm.com/storage/2017/09/08/see-owner-nobody-export-mounted-using-nfsv4-protocol-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory and LDAP https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-ldap/ IBM Spectrum Scale? Authentication using Active Directory and RFC2307 https://developer.ibm.com/storage/2017/09/01/ibm-spectrum-scale-authentication-using-active-directory-rfc2307/ High Availability Implementation with IBM Spectrum Virtualize and IBM Spectrum Scale https://developer.ibm.com/storage/2017/08/30/high-availability-implementation-ibm-spectrum-virtualize-ibm-spectrum-scale/ 10 Frequently asked Questions on configuring Authentication using AD + AUTO ID mapping on IBM Spectrum Scale?. https://developer.ibm.com/storage/2017/08/04/10-frequently-asked-questions-configuring-authentication-using-ad-auto-id-mapping-ibm-spectrum-scale/ IBM Spectrum Scale? Authentication using Active Directory https://developer.ibm.com/storage/2017/07/30/ibm-spectrum-scale-auth-using-active-directory/ Five cool things that you didn?t know Transparent Cloud Tiering on Spectrum Scale can do https://developer.ibm.com/storage/2017/07/29/five-cool-things-didnt-know-transparent-cloud-tiering-spectrum-scale-can/ IBM Spectrum Scale GUI videos https://developer.ibm.com/storage/2017/07/25/ibm-spectrum-scale-gui-videos/ IBM Spectrum Scale? Authentication ? 
Planning for NFS Access https://developer.ibm.com/storage/2017/07/24/ibm-spectrum-scale-planning-nfs-access/ For more : Search /browse here: https://developer.ibm.com/storage/blog Consolidation list: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/White%20Papers%20%26%20Media -------------- next part -------------- An HTML attachment was scrubbed... URL: From Renar.Grunenberg at huk-coburg.de Thu Oct 4 10:05:57 2018 From: Renar.Grunenberg at huk-coburg.de (Grunenberg, Renar) Date: Thu, 4 Oct 2018 09:05:57 +0000 Subject: [gpfsug-discuss] V5.0.2 and maxblocksize Message-ID: <3cc9ab310d6d42009f779ac0b1967a53@SMXRF105.msg.hukrf.de> Hallo All, i put a requirement for these gap. Link is here: http://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=125603 Please Vote. Regards Renar Renar Grunenberg Abteilung Informatik ? Betrieb HUK-COBURG Bahnhofsplatz 96444 Coburg Telefon: 09561 96-44110 Telefax: 09561 96-44104 E-Mail: Renar.Grunenberg at huk-coburg.de Internet: www.huk.de ________________________________ HUK-COBURG Haftpflicht-Unterst?tzungs-Kasse kraftfahrender Beamter Deutschlands a. G. in Coburg Reg.-Gericht Coburg HRB 100; St.-Nr. 9212/101/00021 Sitz der Gesellschaft: Bahnhofsplatz, 96444 Coburg Vorsitzender des Aufsichtsrats: Prof. Dr. Heinrich R. Schradin. Vorstand: Klaus-J?rgen Heitmann (Sprecher), Stefan Gronbach, Dr. Hans Olav Her?y, Dr. J?rg Rheinl?nder (stv.), Sarah R?ssler, Daniel Thomas. ________________________________ Diese Nachricht enth?lt vertrauliche und/oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese Nachricht irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Nachricht. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Nachricht ist nicht gestattet. This information may contain confidential and/or privileged information. If you are not the intended recipient (or have received this information in error) please notify the sender immediately and destroy this information. Any unauthorized copying, disclosure or distribution of the material in this information is strictly forbidden. ________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Oct 4 20:54:48 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 4 Oct 2018 19:54:48 +0000 Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) Message-ID: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Hi All, What does it mean if I have a few dozen very long I/O?s (50 - 75 seconds) on a gateway as reported by ?mmdiag ?iohist? and they all reference two of my eight NSD servers? ? but then I go to those 2 NSD servers and I don?t see any long I/O?s at all? In other words, if the problem (this time) were the backend storage, I should see long I/O?s on the NSD servers, right? I?m thinking this indicates that there is some sort of problem with either the client gateway itself or the network in between the gateway and the NSD server(s) ? thoughts??? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... 
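One way to line up the two views being compared here is to capture the I/O history on the client and on the NSD servers it points at over the same window. The sketch below is only a rough outline: it assumes the usual /usr/lpp/mmfs/bin path and uses nsd01/nsd02 as placeholder server names.

# On the NSD client (the gateway showing the 50 - 75 second entries):
/usr/lpp/mmfs/bin/mmdiag --iohist        # note which NSD servers the slow entries reference

# On each NSD server named in those entries (placeholder names):
ssh nsd01 /usr/lpp/mmfs/bin/mmdiag --iohist
ssh nsd02 /usr/lpp/mmfs/bin/mmdiag --iohist

# If the server-side histories show no comparably slow I/Os to the backend disks,
# the delay is being added on the client, on the network in between, or while the
# request waits in the NSD server queues; "mmfsadm dump nsd" on the servers shows
# queue and worker-thread usage (a service tool, so use it sparingly).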
URL: From jjdoherty at yahoo.com Thu Oct 4 20:58:19 2018 From: jjdoherty at yahoo.com (Jim Doherty) Date: Thu, 4 Oct 2018 19:58:19 +0000 (UTC) Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: <2043390893.1272.1538683099673@mail.yahoo.com> It could mean a shortage of nsd server threads?? or a congested network.?? Jim On Thursday, October 4, 2018, 3:55:10 PM EDT, Buterbaugh, Kevin L wrote: Hi All, What does it mean if I have a few dozen very long I/O?s (50 - 75 seconds) on a gateway as reported by ?mmdiag ?iohist? and they all reference two of my eight NSD servers? ? but then I go to those 2 NSD servers and I don?t see any long I/O?s at all? In other words, if the problem (this time) were the backend storage, I should see long I/O?s on the NSD servers, right? I?m thinking this indicates that there is some sort of problem with either the client gateway itself or the network in between the gateway and the NSD server(s) ? thoughts??? Thanks in advance? ?Kevin Buterbaugh - Senior System AdministratorVanderbilt University - Advanced Computing Center for Research and EducationKevin.Buterbaugh at vanderbilt.edu?- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Oct 4 21:00:21 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 4 Oct 2018 16:00:21 -0400 Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: My first guess would be the network between the NSD client and NSD server. netstat and ethtool may help to determine where the cause may lie, if it is on the NSD client. Obviously a switch on the network could be another source of the problem. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 10/04/2018 03:55 PM Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, What does it mean if I have a few dozen very long I/O?s (50 - 75 seconds) on a gateway as reported by ?mmdiag ?iohist? and they all reference two of my eight NSD servers? ? but then I go to those 2 NSD servers and I don?t see any long I/O?s at all? In other words, if the problem (this time) were the backend storage, I should see long I/O?s on the NSD servers, right? I?m thinking this indicates that there is some sort of problem with either the client gateway itself or the network in between the gateway and the NSD server(s) ? thoughts??? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
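A minimal sketch of the client-side network checks suggested above; the interface name is a placeholder and the exact ethtool counter names vary by driver.

# On the NSD client, sample the TCP counters twice, a minute or so apart, and
# see whether retransmits/timeouts grow while the slow I/Os are happening:
netstat -s | egrep -i 'retrans|timeout'

# Per-interface errors and drops on the data-network NIC (replace eth0):
ethtool -S eth0 | egrep -i 'err|drop|discard'
ip -s link show dev eth0

# GPFS's own view of its node-to-node connections:
/usr/lpp/mmfs/bin/mmdiag --network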
URL: From martinsworkmachine at gmail.com Thu Oct 4 21:05:53 2018 From: martinsworkmachine at gmail.com (J Martin Rushton) Date: Thu, 4 Oct 2018 21:05:53 +0100 Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: <651fe07d-e745-e844-2f9b-44fd78ccee24@gmail.com> I saw something similar a good few years ago (ie on an older version of GPFS).? IIRC the issue was one of contention: one or two served nodes were streaming IOs to/from the NSD servers and as a result other nodes were exhibiting insane IO times.? Can't be more helpful though, I no longer have access to the system. Regards, J Martin Rushton MBCS On 04/10/18 20:54, Buterbaugh, Kevin L wrote: > Hi All, > > What does it mean if I have a few dozen very long I/O?s (50 - 75 > seconds) on a gateway as reported by ?mmdiag ?iohist? and they all > reference two of my eight NSD servers? > > ? but then I go to those 2 NSD servers and I don?t see any long I/O?s > at all? > > In other words, if the problem (this time) were the backend storage, I > should see long I/O?s on the NSD servers, right? > > I?m thinking this indicates that there is some sort of problem with > either the client gateway itself or the network in between the gateway > and the NSD server(s) ? thoughts??? > > Thanks in advance? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu > ?- (615)875-9633 > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 14:38:21 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 13:38:21 +0000 Subject: [gpfsug-discuss] Pmsensors and gui Message-ID: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Tue Oct 9 14:43:09 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 9 Oct 2018 09:43:09 -0400 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: Adding GUI personnel to respond. 
Lyle From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/09/2018 09:41 AM Subject: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler $1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Tue Oct 9 14:54:51 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 9 Oct 2018 13:54:51 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. --------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? 
First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.koeninger at de.ibm.com Tue Oct 9 15:03:41 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Tue, 9 Oct 2018 14:03:41 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Oct 9 15:56:14 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 9 Oct 2018 16:56:14 +0200 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: Hello Simon, the performance collector collects data from each node with the "hostname" as in /bin/hostname as key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set identical to be "hostname" on all nodes, the mapping will not succeed, So you will have to use unique hostnames on all cluster nodes. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Sobey, Richard A" To: gpfsug main discussion list Date: 09.10.2018 16:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. 
---------------------------------------------------------------------------------------------------------------------------------------------
checking the GUI nodes /etc/hosts it shows actually
127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost

From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson
Sent: 09 October 2018 14:38
To: gpfsug-discuss at spectrumscale.org
Subject: [gpfsug-discuss] Pmsensors and gui

Hi, I have a couple of a problems with the GUI and the stats data in there ...
-------------------- Solution: For this to fix, the customer should change the /etc/hosts entry for the 127.0.0.1 as follows: from current: 127.0.0.1 localhost.localdomain localhost to this: 127.0.0.1 localhost localhost.localdomain -------------------- Mit freundlichen Gr??en / Kind regards Andreas Koeninger Scrum Master and Software Developer / Spectrum Scale GUI and REST API IBM Systems &Technology Group, Integrated Systems Development / M069 ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] Pmsensors and gui Date: Tue, Oct 9, 2018 3:42 PM Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 15:59:35 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 14:59:35 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> We do ? Its just the node is joined to the cluster as ?hostname1-data.cluster?, but it also has a primary (1GbE link) as ?hostname.cluster?? Simon From: on behalf of "rohwedder at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 October 2018 at 15:56 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Pmsensors and gui Hello Simon, the performance collector collects data from each node with the "hostname" as in /bin/hostname as key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set identical to be "hostname" on all nodes, the mapping will not succeed, So you will have to use unique hostnames on all cluster nodes. Mit freundlichen Gr??en / Kind regards Dr. 
Markus Rohwedder Spectrum Scale GUI Development ________________________________ Phone: +49 7034 6430190 IBM Deutschland Research & Development [cid:2__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@] E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ [Inactive hide details for "Sobey, Richard A" ---09.10.2018 16:00:32---I can help with the first one as I had the issue a few we]"Sobey, Richard A" ---09.10.2018 16:00:32---I can help with the first one as I had the issue a few weeks ago. The answer from support is below, From: "Sobey, Richard A" To: gpfsug main discussion list Date: 09.10.2018 16:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. --------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 46 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image002.png Type: image/png Size: 4660 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 106 bytes Desc: image003.gif URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 20:37:59 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 19:37:59 +0000 Subject: [gpfsug-discuss] Protocols protocols ... Message-ID: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> So we have both SMB and NFS enabled in our cluster. For various reasons we want to only run SMB on some nodes and only run NFS on other nodes? We have used mmchnode to set the nodes into different groups and then have IP addresses associated with those groups which we want to use for SMB and NFS. All seems OK so far ? Now comes the problem, I can?t see a way to tell CES that group1 should run NFS and group2 SMB. We thought we had this cracked by removing the gpfs.smb packages from NFS nodes and ganesha from SMB nodes. Seems to work OK, EXCEPT ? sometimes nodes go into failed state, and it looks like this is because the SMB state is failed on the NFS only nodes ? This looks to me like GPFS is expecting protocol packages to be installed for both NFS and SMB. I worked out I can clear the failed state by running mmces service stop SMB -N node. The docs mention attributes, but I don?t see that they are used other than when running object? Any thoughts/comments/links to a doc page I missed? Or is it expected that both smb and nfs packages are required to be installed on all protocol nodes even if not being used on that node? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 9 21:34:43 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Oct 2018 21:34:43 +0100 Subject: [gpfsug-discuss] Protocols protocols ... In-Reply-To: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> References: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> Message-ID: On 09/10/18 20:37, Simon Thompson wrote: [SNIP] > > Any thoughts/comments/links to a doc page I missed? Or is it expected > that both smb and nfs packages are required to be installed on all > protocol nodes even if not being used on that node? > As a last resort could you notionally let them do both and fix it with iptables so they only appear to the outside world to be running one or the other? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From kkr at lbl.gov Tue Oct 9 22:39:23 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 14:39:23 -0700 Subject: [gpfsug-discuss] TO BE RESCHEDULED [was] - Re: Request for Enhancements (RFE) Forum - Submission Deadline October 1 In-Reply-To: <841FA5CA-5C6B-4626-8137-BA5994C3A651@bham.ac.uk> References: <52220937-CE0A-4949-89A0-6EA41D5ECF93@lbl.gov> <263e53c18647421f8b3cd936da0075df@jumptrading.com> <0341213A-6CB7-434F-A575-9099C2D0D703@spectrumscale.org> <585b21e7-d437-380f-65d8-d24fa236ce3b@nasa.gov> <841FA5CA-5C6B-4626-8137-BA5994C3A651@bham.ac.uk> Message-ID: Due to scheduling conflicts we need to reschedule the RFE meeting that was to happen tomorrow, October 10th. We received RFEs from 2 sites (NASA and Sloan Kettering), if you sent one and it was somehow missed. Please respond here, and we?ll pick up privately as follow up. More soon. 
Best, Kristy > On Sep 28, 2018, at 6:44 AM, Simon Thompson wrote: > > There is a limit on votes, not submissions. i.e. your site gets three votes, so you can't have three votes and someone else from Goddard also have three. > > We have to review the submissions, so as you say 10 we'd think unreasonable and skip, but a sensible number is OK. > > Simon > > ?On 28/09/2018, 13:52, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister" wrote: > > Hi Kristy, > > At some point I thought I'd read there was a per-site limit of the > number of RFEs that could be submitted but I can't find it skimming > through email. I'd think submitting 10 would be unreasonable but would 2 > or 3 be OK? > > -Aaron > > On 9/27/18 4:35 PM, Kristy Kallback-Rose wrote: >> Reminder, the*October 1st* deadline is approaching. We?re looking for at >> least a few RFEs (Requests For Enhancements) for this first forum, so if >> you?re interesting in promoting your RFE please reach out to one of us, >> or even here on the list. >> >> Thanks, >> Kristy >> >>> On Sep 7, 2018, at 3:00 AM, Simon Thompson (Spectrum Scale User Group >>> Chair) > wrote: >>> >>> GPFS/Spectrum Scale Users, >>> Here?s a long-ish note about our plans to try and improve the RFE >>> process. We?ve tried to include a tl;dr version if you just read the >>> headers. You?ll find the details underneath ;-) and reading to the end >>> is ideal. >>> >>> IMPROVING THE RFE PROCESS >>> As you?ve heard on the list, and at some of the in-person User Group >>> events, we?ve been talking about ways we can improve the RFE process. >>> We?d like to begin having an RFE forum, and have it be de-coupled from >>> the in-person events because we know not everyone can travel. >>> LIGHTNING PRESENTATIONS ON-LINE >>> In general terms, we?d have regular on-line events, where RFEs could >>> be/very briefly/(5 minutes, lightning talk) presented by the >>> requester. There would then be time for brief follow-on discussion >>> and questions. The session would be recorded to deal with large time >>> zone differences. >>> The live meeting is planned for October 10^th 2018, at 4PM BST (that >>> should be 11am EST if we worked is out right!) >>> FOLLOW UP POLL >>> A poll, independent of current RFE voting, would be conducted a couple >>> days after the recording was available to gather votes and feedback >>> on the RFEs submitted ?we may collect site name, to see how many votes >>> are coming from a certain site. >>> >>> MAY NOT GET IT RIGHT THE FIRST TIME >>> We view this supplemental RFE process as organic, that is, we?ll learn >>> as we go and make modifications. The overall goal here is to highlight >>> the RFEs that matter the most to the largest number of UG members by >>> providing a venue for people to speak about their RFEs and collect >>> feedback from fellow community members. >>> >>> *RFE PRESENTERS WANTED, SUBMISSION DEADLINE OCTOBER 1ST >>> *We?d like to guide a small handful of RFE submitters through this >>> process the first time around, so if you?re interested in being a >>> presenter, let us know now. We?re planning on doing the online meeting >>> and poll for the first time in mid-October, so the submission deadline >>> for your RFE is October 1st. If it?s useful, when you?re drafting your >>> RFE feel free to use the list as a sounding board for feedback. Often >>> sites have similar needs and you may find someone to collaborate with >>> on your RFE to make it useful to more sites, and thereby get more >>> votes. 
Some guidelines are here: >>> https://drive.google.com/file/d/1o8nN39DTU32qj_EFia5wRhnvfvNfr3cI/view?usp=sharing >>> You can submit you RFE by email to:rfe at spectrumscaleug.org >>> >>> >>> *PARTICIPANTS (AKA YOU!!), VIEW AND VOTE >>> *We are seeking very good participation in the RFE on-line events >>> needed to make this an effective method of Spectrum Scale Community >>> and IBM Developer collaboration. * It is to your benefit to >>> participate and help set priorities on Spectrum Scale enhancements!! >>> *We want to make this process light lifting for you as a participant. >>> We will limit the duration of the meeting to 1 hour to minimize the >>> use of your valuable time. >>> >>> Please register for the online meeting via Eventbrite >>> (https://www.eventbrite.com/e/spectrum-scale-request-for-enhancements-voting-tickets-49979954389) >>> ? we?ll send details of how to join the online meeting nearer the time. >>> >>> Thanks! >>> >>> Simon, Kristy, Bob, Bryan and Carl! >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atspectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kkr at lbl.gov Wed Oct 10 03:08:16 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 19:08:16 -0700 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 Message-ID: Hello, Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. Thanks, Kristy From kkr at lbl.gov Wed Oct 10 03:13:36 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 19:13:36 -0700 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: References: Message-ID: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> PS - If you?ve already contacted me about talking can you please ping me again? I?m drowning in stuff-to-do sauce. Thanks, Kristy > On Oct 9, 2018, at 7:08 PM, Kristy Kallback-Rose wrote: > > Hello, > > Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. > > Thanks, > Kristy From rohwedder at de.ibm.com Wed Oct 10 09:24:58 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 10 Oct 2018 10:24:58 +0200 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> Message-ID: Hello Simon, not sure if the answer solved your question from the response, Even if nodes can be externally resolved by unique hostnames, applications that run on the host use the /bin/hostname binary or the hostname() call to identify the node they are running on. This is the case with the performance collection sensor. 
So you need to set the hostname of the hosts using /bin/hostname in in a way that provides unique responses of the "/bin/hostname" call within a cluster. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: Simon Thompson To: gpfsug main discussion list Date: 09.10.2018 17:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org We do ? Its just the node is joined to the cluster as ?hostname1-data.cluster?, but it also has a primary (1GbE link) as ?hostname.cluster?? Simon From: on behalf of "rohwedder at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 October 2018 at 15:56 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Pmsensors and gui Hello Simon, the performance collector collects data from each node with the "hostname" as in /bin/hostname as key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set identical to be "hostname" on all nodes, the mapping will not succeed, So you will have to use unique hostnames on all cluster nodes. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development |------------------------------------------------+------------------------------------------------+-------------------------------> | | | | |------------------------------------------------+------------------------------------------------+-------------------------------> >------------------------------------------------| | | >------------------------------------------------| |------------------------------------------------+------------------------------------------------+-------------------------------> |cid:1__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@| | | |------------------------------------------------+------------------------------------------------+-------------------------------> >------------------------------------------------| | | >------------------------------------------------| |------------------------------------------------+------------------------------------------------+-------------------------------> |Phone: |+49 7034 6430190 |IBM Deutschland Research & | | | |Development | |------------------------------------------------+------------------------------------------------+-------------------------------> >------------------------------------------------| |cid:2__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@| >------------------------------------------------| |------------------------------------------------+------------------------------------------------+-------------------------------> |E-Mail: |rohwedder at de.ibm.com |Am Weiher 24 | |------------------------------------------------+------------------------------------------------+-------------------------------> >------------------------------------------------| | | >------------------------------------------------| |------------------------------------------------+------------------------------------------------+-------------------------------> |cid:1__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@|cid:1__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@|65451 Kelsterbach | |------------------------------------------------+------------------------------------------------+-------------------------------> >------------------------------------------------| | | 
>------------------------------------------------| |------------------------------------------------+------------------------------------------------+-------------------------------> |cid:1__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@|cid:1__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@|Germany | |------------------------------------------------+------------------------------------------------+-------------------------------> >------------------------------------------------| | | >------------------------------------------------| |------------------------------------------------+------------------------------------------------+-------------------------------> | | | | |------------------------------------------------+------------------------------------------------+-------------------------------> >------------------------------------------------| | | >------------------------------------------------| |------------------------------------------------+------------------------------------------------+-------------------------------> |cid:1__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@| | | |------------------------------------------------+------------------------------------------------+-------------------------------> >------------------------------------------------| | | >------------------------------------------------| Inactive hide details for "Sobey, Richard A" ---09.10.2018 16:00:32---I can help with the first one as I had the issue a few we"Sobey, Richard A" ---09.10.2018 16:00:32---I can help with the first one as I had the issue a few weeks ago. The answer from support is below, From: "Sobey, Richard A" To: gpfsug main discussion list Date: 09.10.2018 16:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. --------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? 
First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19742873.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19933766.gif Type: image/gif Size: 46 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19033540.gif Type: image/gif Size: 4660 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19192281.gif Type: image/gif Size: 106 bytes Desc: not available URL: From robbyb at us.ibm.com Wed Oct 10 14:07:10 2018 From: robbyb at us.ibm.com (Rob Basham) Date: Wed, 10 Oct 2018 13:07:10 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov>, Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Oct 10 14:22:52 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Wed, 10 Oct 2018 13:22:52 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov>, , Message-ID: <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> If there?s interest I could do a short presentation on our 1k node virtual GPFS test cluster (with SR-IOV and real IB RDMA!) and some of the benefits we?ve found (including helping squash a nasty hard-to-reproduce bug) as well as how we use it to test upgrades. On October 10, 2018 at 09:07:24 EDT, Rob Basham wrote: Kristy, I'll be at SC18 for client presentations and could talk about TCT. We have a big release coming up in 1H18 with multi-site support and we've broken out of the gateway paradigm to where we work on every client node in the cluster for key data path work. If you have a slot I could talk about that. 
Regards, Rob Basham MCStore and IBM Ready Archive architecture 971-344-1999 ----- Original message ----- From: Kristy Kallback-Rose Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Still need a couple User Talks for SC18 Date: Tue, Oct 9, 2018 7:13 PM PS - If you?ve already contacted me about talking can you please ping me again? I?m drowning in stuff-to-do sauce. Thanks, Kristy > On Oct 9, 2018, at 7:08 PM, Kristy Kallback-Rose wrote: > > Hello, > > Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. > > Thanks, > Kristy _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Oct 10 14:58:24 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 Oct 2018 13:58:24 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> Message-ID: <0835F404-DF06-4237-A1AA-8553E28E1343@nuance.com> User talks - For those interested, please email Kristy and/or myself directly. Rob/other IBMers - work with Ulf Troppens on slots. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Oct 10 16:06:09 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 10 Oct 2018 11:06:09 -0400 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> Message-ID: <11037.1539183969@turing-police.cc.vt.edu> On Wed, 10 Oct 2018 10:24:58 +0200, "Markus Rohwedder" said: > Hello Simon, > > not sure if the answer solved your question from the response, > > Even if nodes can be externally resolved by unique hostnames, applications > that run on the host use the /bin/hostname binary or the hostname() call to > identify the node they are running on. > This is the case with the performance collection sensor. > So you need to set the hostname of the hosts using /bin/hostname in in a > way that provides unique responses of the "/bin/hostname" call within a > cluster. And we discovered that 'unique' applies to "only considering the leftmost part of the hostname". We set up a stretch cluster that had 3 NSD servers at each of two locations, and found that using FQDN names of the form: nsd1.something.loc1.internal nsd2.something.loc1.internal nsd1.something.loc2.internal nsd2.something.loc2.internal got things all sorts of upset in a very passive-agressive way. The cluster would come up, and serve data just fine. But things like 'nsdperf' would toss errors about not being able to resolve a NSD server name, or fail to connect, or complain that it was connecting to itself, or other similar "not talking to the node it thought" type confusion... We ended up renaming to: nsd1-loc1.something.internal nsd1-loc2.something.internal ... and all the userspace tools started working much better. -------------- next part -------------- A non-text attachment was scrubbed... 
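Pulling the name-resolution advice from this thread together, a quick check could look like the following sketch (assuming mmdsh and passwordless ssh work from an admin node; the awk parsing of mmdsh's "node: output" lines may need adjusting, and the /etc/hosts line is only an example mirroring the support answer quoted earlier):

# What does each node answer for /bin/hostname? (mmdsh prefixes each output
# line with the node it ran on)
mmdsh -N all /bin/hostname | sort
# Any hostname reported by more than one node will break the GUI's mapping
# of GPFS node names to local hostnames:
mmdsh -N all /bin/hostname | awk '{print $NF}' | sort | uniq -d
# On the GUI node, keep plain "localhost" first on the loopback line:
# 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4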
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Wed Oct 10 16:43:45 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 10 Oct 2018 15:43:45 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity Message-ID: Hi all, Maybe I'm barking up the wrong tree but I'm debugging why I don't get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run "mmperfmon query GPFSFilesetQuota" and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I'm following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I'm running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 10 16:58:51 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Oct 2018 15:58:51 +0000 Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE@bham.ac.uk> OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? (I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Fabrice.Cantos at niwa.co.nz Wed Oct 10 22:57:04 2018 From: Fabrice.Cantos at niwa.co.nz (Fabrice Cantos) Date: Wed, 10 Oct 2018 21:57:04 +0000 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 Message-ID: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> I would be interested to know what you chose for your filesystems and user/project space directories: * Traditional Posix ACL * NFS V4 ACL What did motivate your choice? We are facing some issues to get the correct NFS ACL to keep correct attributes for new files created. Thanks Fabrice [cid:image4cef17.PNG at 18c66b76.4480e036] Fabrice Cantos HPC Systems Engineer Group Manager ? High Performance Computing T +64-4-386-0367 M +64-27-412-9693 National Institute of Water & Atmospheric Research Ltd (NIWA) 301 Evans Bay Parade, Greta Point, Wellington Connect with NIWA: niwa.co.nz Facebook Twitter LinkedIn Instagram To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image4cef17.PNG Type: image/png Size: 12288 bytes Desc: image4cef17.PNG URL: From truongv at us.ibm.com Thu Oct 11 04:14:24 2018 From: truongv at us.ibm.com (Truong Vu) Date: Wed, 10 Oct 2018 23:14:24 -0400 Subject: [gpfsug-discuss] Sudo wrappers In-Reply-To: References: Message-ID: Yes, you can use mmchconfig for that. eg: mmchconfig sudoUser=gpfsadmin Thanks, Tru. Message: 2 Date: Wed, 10 Oct 2018 15:58:51 +0000 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE at bham.ac.uk> Content-Type: text/plain; charset="utf-8" OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? (I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20181010/6317be26/attachment-0001.html > -------------- next part -------------- An HTML attachment was scrubbed... 
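For reference, the sudo-wrapper pieces discussed in this exchange, side by side (a sketch reusing the gpfsadmin user from the thread; as the follow-up further down notes, sudoUser is only consulted when a root-level background process calls an administration command directly, so it does not change which user the interactive wrappers ssh as):

# Tell root-level background administration calls which non-root user to use:
mmchconfig sudoUser=gpfsadmin
# Interactive commands still run the wrapper as the invoking user...
sudo /usr/lpp/mmfs/bin/mmgetstate -a
# ...unless SUDO_USER is overridden explicitly on the command line:
sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a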
URL: From Anna.Greim at de.ibm.com Thu Oct 11 07:41:25 2018 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 11 Oct 2018 08:41:25 +0200 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Maybe I?m barking up the wrong tree but I?m debugging why I don?t get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run ?mmperfmon query GPFSFilesetQuota? and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. 
Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I?m following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I?m running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu Oct 11 08:54:01 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 11 Oct 2018 07:54:01 +0000 Subject: [gpfsug-discuss] Sudo wrappers In-Reply-To: References: Message-ID: <39DC4B5E-CAFD-489C-9BE5-42B83B29A8F5@bham.ac.uk> Nope that one doesn?t work ? I found it in the docs: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_mmchconfig.htm ?Specifies a non-root admin user ID to be used when sudo wrappers are enabled and a root-level background process calls an administration command directly instead of through sudo.? So it reads like it still wants to be ?me? unless it?s a background process. Simon From: on behalf of "truongv at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 11 October 2018 at 04:14 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Sudo wrappers Yes, you can use mmchconfig for that. eg: mmchconfig sudoUser=gpfsadmin Thanks, Tru. Message: 2 Date: Wed, 10 Oct 2018 15:58:51 +0000 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE at bham.ac.uk> Content-Type: text/plain; charset="utf-8" OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? 
(I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Oct 11 13:10:00 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 11 Oct 2018 12:10:00 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Anna, Yes, that will be it! I was running the wrong command as you surmise. The GPFSFileSetQuota config appears to be correct: { name = "GPFSFilesetQuota" period = 3600 restrict = "icgpfsq1.cc.ic.ac.uk" }, However "mmperfmon query gpfs_rq_blk_current" just shows lots of null values, for example: Row Timestamp gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current 1 2018-10-11-13:07:31 null null null null null null null null 2 2018-10-11-13:07:32 null null null null null null null null 3 2018-10-11-13:07:33 null null null null null null null null 4 2018-10-11-13:07:34 null null null null null null null null 5 2018-10-11-13:07:35 null null null null null null null null 6 2018-10-11-13:07:36 null null null null null null null null 7 2018-10-11-13:07:37 null null null null null null null null 8 2018-10-11-13:07:38 null null null null null null null null 9 2018-10-11-13:07:39 null null null null null null null null 10 2018-10-11-13:07:40 null null null null null null null null Same with the metric gpfs_rq_file_current. I'll have a look at the PDF sent by Markus in the meantime. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Anna Greim Sent: 11 October 2018 07:41 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. 
Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems ________________________________ Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH [cid:image001.gif at 01D46163.B6B21E10] Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany ________________________________ IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, Maybe I'm barking up the wrong tree but I'm debugging why I don't get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run "mmperfmon query GPFSFilesetQuota" and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I'm following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I'm running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 1851 bytes Desc: image001.gif URL: From Anna.Greim at de.ibm.com Thu Oct 11 14:11:56 2018 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 11 Oct 2018 15:11:56 +0200 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hello Richard, the sensor is running once an hour and the default of mmperfmon returns the last 10 results in a bucket-size of 1 seconds. The sensor did not run in the time of 13:07:31-13:07:40. 
Please use the command again with the option -b 3600 or with --bucket-size=3600 and see if you've got any data for that time. If you get any data the question is, why the GUI isn't able to get the data. If you do not have any data (only null rows) the question is, why the collector does not get data or why the sensor does not collect data and sends them to the collector. Since you get data for the cpu_user metric it is more likely that the sensor is not collecting and sending anything. The guide from Markus should help you here. Otherwise just write again into the user group. Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: gpfsug main discussion list Date: 11/10/2018 14:10 Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Anna, Yes, that will be it! I was running the wrong command as you surmise. The GPFSFileSetQuota config appears to be correct: { name = "GPFSFilesetQuota" period = 3600 restrict = "icgpfsq1.cc.ic.ac.uk" }, However ?mmperfmon query gpfs_rq_blk_current? just shows lots of null values, for example: Row Timestamp gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current 1 2018-10-11-13:07:31 null null null null null null null null 2 2018-10-11-13:07:32 null null null null null null null null 3 2018-10-11-13:07:33 null null null null null null null null 4 2018-10-11-13:07:34 null null null null null null null null 5 2018-10-11-13:07:35 null null null null null null null null 6 2018-10-11-13:07:36 null null null null null null null null 7 2018-10-11-13:07:37 null null null null null null null null 8 2018-10-11-13:07:38 null null null null null null null null 9 2018-10-11-13:07:39 null null null null null null null null 10 2018-10-11-13:07:40 null null null null null null null null Same with the metric gpfs_rq_file_current. I?ll have a look at the PDF sent by Markus in the meantime. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Anna Greim Sent: 11 October 2018 07:41 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. 
You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" < gpfsug-discuss at spectrumscale.org> Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Maybe I?m barking up the wrong tree but I?m debugging why I don?t get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run ?mmperfmon query GPFSFilesetQuota? and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I?m following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I?m running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
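To collect the capacity-sensor advice from this thread in one place, a minimal sketch (the node name is a placeholder; per the advice above it must be the GPFS daemon node name, e.g. as shown by mmlscluster, not the DNS/FQDN name):

# Run the fileset quota/capacity sensor on exactly one node, once per hour:
mmperfmon config update GPFSFilesetQuota.restrict=<gpfs-daemon-node-name>
mmperfmon config update GPFSFilesetQuota.period=3600
mmperfmon config show    # check that restrict/period were distributed as expected
# The sensor only fires hourly, so query with a matching bucket size:
mmperfmon query gpfs_rq_blk_current -b 3600
mmperfmon query gpfs_rq_file_current --bucket-size=3600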
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From spectrumscale at kiranghag.com Fri Oct 12 05:38:19 2018 From: spectrumscale at kiranghag.com (KG) Date: Fri, 12 Oct 2018 07:38:19 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS Message-ID: Hi Folks I am trying to compile IOR on a GPFS filesystem and running into following errors. Github forum says that "The configure script does not add -lgpfs to the CFLAGS when it detects GPFS support." Any help on how to get around this? mpicc -DHAVE_CONFIG_H -I. -g -O2 -MT aiori-MPIIO.o -MD -MP -MF .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c aiori-MPIIO.c: In function ?MPIIO_Xfer?: aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type [enabled by default] Access = MPI_File_write; ^ aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type [enabled by default] Access_at = MPI_File_write_at; ^ aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type [enabled by default] Access_all = MPI_File_write_all; ^ aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type [enabled by default] Access_at_all = MPI_File_write_at_all; ^ mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o aiori-MPIIO.o -lm aiori-POSIX.o: In function `gpfs_free_all_locks': /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to `gpfs_fcntl' aiori-POSIX.o: In function `gpfs_access_start': aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' aiori-POSIX.o: In function `gpfs_access_end': aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' collect2: error: ld returned 1 exit status make[2]: *** [ior] Error 1 make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' make: *** [all-recursive] Error 1 Kiran -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnbent at gmail.com Fri Oct 12 05:50:45 2018 From: johnbent at gmail.com (John Bent) Date: Thu, 11 Oct 2018 22:50:45 -0600 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Kiran, Are you using the latest version of IOR? https://github.com/hpc/ior Thanks, John On Thu, Oct 11, 2018 at 10:39 PM KG wrote: > Hi Folks > > I am trying to compile IOR on a GPFS filesystem and running into following > errors. > > Github forum says that "The configure script does not add -lgpfs to the > CFLAGS when it detects GPFS support." > > Any help on how to get around this? > > mpicc -DHAVE_CONFIG_H -I. 
-g -O2 -MT aiori-MPIIO.o -MD -MP -MF > .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c > aiori-MPIIO.c: In function ?MPIIO_Xfer?: > aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type > [enabled by default] > Access = MPI_File_write; > ^ > aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type > [enabled by default] > Access_at = MPI_File_write_at; > ^ > aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type > [enabled by default] > Access_all = MPI_File_write_all; > ^ > aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type > [enabled by default] > Access_at_all = MPI_File_write_at_all; > ^ > mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po > mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o > aiori-MPIIO.o -lm > aiori-POSIX.o: In function `gpfs_free_all_locks': > /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to > `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_start': > aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_end': > aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' > collect2: error: ld returned 1 exit status > make[2]: *** [ior] Error 1 > make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make: *** [all-recursive] Error 1 > > Kiran > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Oct 12 11:09:49 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 12 Oct 2018 10:09:49 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hi Anna, Markus It was the incorrect restrict clause referencing the FQDN of the server, and not the GPFS daemon node name, that was causing the problem. This has now been updated and we have nice graphs ? Many thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Fri Oct 12 11:39:12 2018 From: spectrumscale at kiranghag.com (KG) Date: Fri, 12 Oct 2018 13:39:12 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Hi John Yes, I am using latest version from this link. Do I have to use any additional switches for compilation? I used following sequence ./bootstrap ./configure ./make (fails) On Fri, Oct 12, 2018 at 7:51 AM John Bent wrote: > Kiran, > > Are you using the latest version of IOR? > https://github.com/hpc/ior > > Thanks, > > John > > On Thu, Oct 11, 2018 at 10:39 PM KG wrote: > >> Hi Folks >> >> I am trying to compile IOR on a GPFS filesystem and running into >> following errors. >> >> Github forum says that "The configure script does not add -lgpfs to the >> CFLAGS when it detects GPFS support." >> >> Any help on how to get around this? >> >> mpicc -DHAVE_CONFIG_H -I. 
-g -O2 -MT aiori-MPIIO.o -MD -MP -MF >> .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c >> aiori-MPIIO.c: In function ?MPIIO_Xfer?: >> aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type >> [enabled by default] >> Access = MPI_File_write; >> ^ >> aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_at = MPI_File_write_at; >> ^ >> aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_all = MPI_File_write_all; >> ^ >> aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_at_all = MPI_File_write_at_all; >> ^ >> mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po >> mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o >> aiori-MPIIO.o -lm >> aiori-POSIX.o: In function `gpfs_free_all_locks': >> /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to >> `gpfs_fcntl' >> aiori-POSIX.o: In function `gpfs_access_start': >> aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' >> aiori-POSIX.o: In function `gpfs_access_end': >> aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' >> collect2: error: ld returned 1 exit status >> make[2]: *** [ior] Error 1 >> make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' >> make[1]: *** [all] Error 2 >> make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' >> make: *** [all-recursive] Error 1 >> >> Kiran >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Oct 12 11:43:41 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 12 Oct 2018 12:43:41 +0200 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Oct 15 15:11:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 15 Oct 2018 14:11:34 +0000 Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously? Message-ID: Hi All, Is there a way to run mmfileid on two NSD?s simultaneously? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexander.Saupp at de.ibm.com Mon Oct 15 19:18:32 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Mon, 15 Oct 2018 20:18:32 +0200 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS Message-ID: Dear Spectrum Scale mailing list, I'm part of IBM Lab Services - currently i'm having multiple customers asking me for optimization of a similar workloads. The task is to tune a Spectrum Scale system (comprising ESS and CES protocol nodes) for the following workload: A single Linux NFS client mounts an NFS export, extracts a flat tar archive with lots of ~5KB files. I'm measuring the speed at which those 5KB files are written (`time tar xf archive.tar`). 
I do understand that Spectrum Scale is not designed for such workload (single client, single thread, small files, single directory), and that such benchmark in not appropriate to benmark the system. Yet I find myself explaining the performance for such scenario (git clone..) quite frequently, as customers insist that optimization of that scenario would impact individual users as it shows task duration. I want to make sure that I have optimized the system as much as possible for the given workload, and that I have not overlooked something obvious. When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server). When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second. Writing to the NFS export from another node (now including network latency) gives me ~220 files / second. There seems to be a huge performance degradation by adding NFS-Ganesha to the software stack alone. I wonder what can be done to minimize the impact. - Ganesha doesn't seem to support 'async' or 'no_wdelay' options... anything equivalent available? - Is there and expected advantage of using the network-latency tuned profile, as opposed to the ESS default throughput-performance profile? - Are there other relevant Kernel params? - Is there an expected advantage of raising the number of threads (NSD server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha (NB_WORKER)) for the given workload (single client, single thread, small files)? - Are there other relevant GPFS params? - Impact of Sync replication, disk latency, etc is understood. - I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well. I just want to ensure that I'm not missing something obvious over reiterating that massage to customers. Any help was greatly appreciated - thanks much in advance! Alexander Saupp IBM Germany Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 54993307.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From makaplan at us.ibm.com Mon Oct 15 19:44:52 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 15 Oct 2018 14:44:52 -0400 Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously? In-Reply-To: References: Message-ID: How about using the -F option? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mutantllama at gmail.com Mon Oct 15 23:32:35 2018 From: mutantllama at gmail.com (Carl) Date: Tue, 16 Oct 2018 09:32:35 +1100 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Hi, We recently had a PMR open for Ganesha related performance issues, which was resolved with an eFix that updated Ganesha. If you are running GPFS v5 I would suggest contacting support. Cheers, Carl. On Tue, 16 Oct 2018 at 5:20 am, Alexander Saupp wrote: > Dear Spectrum Scale mailing list, > > I'm part of IBM Lab Services - currently i'm having multiple customers > asking me for optimization of a similar workloads. > > The task is to tune a Spectrum Scale system (comprising ESS and CES > protocol nodes) for the following workload: > A single Linux NFS client mounts an NFS export, extracts a flat tar > archive with lots of ~5KB files. > I'm measuring the speed at which those 5KB files are written (`time tar xf > archive.tar`). > > I do understand that Spectrum Scale is not designed for such workload > (single client, single thread, small files, single directory), and that > such benchmark in not appropriate to benmark the system. > Yet I find myself explaining the performance for such scenario (git > clone..) quite frequently, as customers insist that optimization of that > scenario would impact individual users as it shows task duration. > I want to make sure that I have optimized the system as much as possible > for the given workload, and that I have not overlooked something obvious. > > > When writing to GPFS directly I'm able to write ~1800 files / second in a > test setup. > This is roughly the same on the protocol nodes (NSD client), as well as on > the ESS IO nodes (NSD server). > When writing to the NFS export on the protocol node itself (to avoid any > network effects) I'm only able to write ~230 files / second. > Writing to the NFS export from another node (now including network > latency) gives me ~220 files / second. > > > There seems to be a huge performance degradation by adding NFS-Ganesha to > the software stack alone. I wonder what can be done to minimize the impact. > > > - Ganesha doesn't seem to support 'async' or 'no_wdelay' options... > anything equivalent available? > - Is there and expected advantage of using the network-latency tuned > profile, as opposed to the ESS default throughput-performance profile? > - Are there other relevant Kernel params? > - Is there an expected advantage of raising the number of threads (NSD > server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha > (NB_WORKER)) for the given workload (single client, single thread, small > files)? > - Are there other relevant GPFS params? > - Impact of Sync replication, disk latency, etc is understood. > - I'm aware that 'the real thing' would be to work with larger files in a > multithreaded manner from multiple nodes - and that this scenario will > scale quite well. > I just want to ensure that I'm not missing something obvious over > reiterating that massage to customers. > > Any help was greatly appreciated - thanks much in advance! 
> Alexander Saupp > IBM Germany > > > Mit freundlichen Gr??en / Kind regards > > *Alexander Saupp* > > IBM Systems, Storage Platform, EMEA Storage Competence Center > ------------------------------ > Phone: +49 7034-643-1512 IBM Deutschland GmbH > Mobile: +49-172 7251072 Am Weiher 24 > Email: alexander.saupp at de.ibm.com 65451 Kelsterbach > Germany > ------------------------------ > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 54993307.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From kums at us.ibm.com Mon Oct 15 23:34:50 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 15 Oct 2018 18:34:50 -0400 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Hi Alexander, 1. >>When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. >>This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server). 2. >> When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second. IMHO #2, writing to the NFS export on the protocol node should be same as #1. Protocol node is also a NSD client and when you write from a protocol node, it will use the NSD protocol to write to the ESS IO nodes. In #1, you cite seeing ~1800 files from protocol node and in #2 you cite seeing ~230 file/sec which seem to contradict each other. >>Writing to the NFS export from another node (now including network latency) gives me ~220 files / second. IMHO, this workload "single client, single thread, small files, single directory - tar xf" is synchronous is nature and will result in single outstanding file to be sent from the NFS client to the CES node. Hence, the performance will be limited by network latency/capability between the NFS client and CES node for small IO size (~5KB file size). Also, what is the network interconnect/interface between the NFS client and CES node? Is the network 10GigE since @220 file/s for 5KiB file-size will saturate 1 x 10GigE link. 220 files/sec * 5KiB file size ==> ~1.126 GB/s. >> I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well. Yes, larger file-size + multiple threads + multiple NFS client nodes will help to scale performance further by having more NFS I/O requests scheduled/pipelined over the network and processed on the CES nodes. >> I just want to ensure that I'm not missing something obvious over reiterating that massage to customers. Adding NFS experts/team, for advise. My two cents. 
Best Regards, -Kums From: "Alexander Saupp" To: gpfsug-discuss at spectrumscale.org Date: 10/15/2018 02:20 PM Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Spectrum Scale mailing list, I'm part of IBM Lab Services - currently i'm having multiple customers asking me for optimization of a similar workloads. The task is to tune a Spectrum Scale system (comprising ESS and CES protocol nodes) for the following workload: A single Linux NFS client mounts an NFS export, extracts a flat tar archive with lots of ~5KB files. I'm measuring the speed at which those 5KB files are written (`time tar xf archive.tar`). I do understand that Spectrum Scale is not designed for such workload (single client, single thread, small files, single directory), and that such benchmark in not appropriate to benmark the system. Yet I find myself explaining the performance for such scenario (git clone..) quite frequently, as customers insist that optimization of that scenario would impact individual users as it shows task duration. I want to make sure that I have optimized the system as much as possible for the given workload, and that I have not overlooked something obvious. When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server). When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second. Writing to the NFS export from another node (now including network latency) gives me ~220 files / second. There seems to be a huge performance degradation by adding NFS-Ganesha to the software stack alone. I wonder what can be done to minimize the impact. - Ganesha doesn't seem to support 'async' or 'no_wdelay' options... anything equivalent available? - Is there and expected advantage of using the network-latency tuned profile, as opposed to the ESS default throughput-performance profile? - Are there other relevant Kernel params? - Is there an expected advantage of raising the number of threads (NSD server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha (NB_WORKER)) for the given workload (single client, single thread, small files)? - Are there other relevant GPFS params? - Impact of Sync replication, disk latency, etc is understood. - I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well. I just want to ensure that I'm not missing something obvious over reiterating that massage to customers. Any help was greatly appreciated - thanks much in advance! Alexander Saupp IBM Germany Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. 
DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Oct 15 20:09:19 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 15 Oct 2018 19:09:19 +0000 Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously? In-Reply-To: References: Message-ID: <4C0E90D1-14DA-44A1-B037-95C17076193C@vanderbilt.edu> Marc, Ugh - sorry, completely overlooked that? Kevin On Oct 15, 2018, at 1:44 PM, Marc A Kaplan > wrote: How about using the -F option? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb6d9700cd6ff4bbed85808d632ce4ff2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636752259026486137&sdata=mBfANLkK8v2ZEahGumE4a7iVIAcVJXb1Dv2kgSrynrI%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Tue Oct 16 01:42:14 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 15 Oct 2018 20:42:14 -0400 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <5824.1539650534@turing-police.cc.vt.edu> On Mon, 15 Oct 2018 18:34:50 -0400, "Kumaran Rajaram" said: > 1. >>When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. > >>This is roughly the same on the protocol nodes (NSD client), as well as > on the ESS IO nodes (NSD server). > > 2. >> When writing to the NFS export on the protocol node itself (to avoid > any network effects) I'm only able to write ~230 files / second. > IMHO #2, writing to the NFS export on the protocol node should be same as #1. > Protocol node is also a NSD client and when you write from a protocol node, it > will use the NSD protocol to write to the ESS IO nodes. In #1, you cite seeing > ~1800 files from protocol node and in #2 you cite seeing ~230 file/sec which > seem to contradict each other. I think he means this: 1) ssh nsd_server 2) cd /gpfs/filesystem/testarea 3) (whomp out 1800 files/sec) 4) mount -t nfs localhost:/gpfs/filesystem/testarea /mnt/test 5) cd /mnt/test 6) Watch the same test struggle to hit 230. Indicating the issue is going from NFS to GPFS (For what it's worth, we've had issues with Ganesha as well...) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available
Type: application/pgp-signature
Size: 486 bytes
Desc: not available
URL: 

From Achim.Rehor at de.ibm.com Tue Oct 16 10:39:14 2018
From: Achim.Rehor at de.ibm.com (Achim Rehor)
Date: Tue, 16 Oct 2018 11:39:14 +0200
Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS
In-Reply-To: 
References: 
Message-ID: 

An HTML attachment was scrubbed...
URL: 

From diederich at de.ibm.com Tue Oct 16 13:31:20 2018
From: diederich at de.ibm.com (Michael Diederich)
Date: Tue, 16 Oct 2018 14:31:20 +0200
Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS
In-Reply-To: <5824.1539650534@turing-police.cc.vt.edu>
References: <5824.1539650534@turing-police.cc.vt.edu>
Message-ID: 

All NFS IO requires syncing. The client does send an explicit fsync (commit). If the NFS server does not sync, a server failure will cause data loss! (For small files <1M it really does not matter whether it is sync-on-write or sync-on-close/explicit commit.) While that may be OK for a "git pull" or similar, in general it violates the NFS spec.

The client can decide to cache, and usually NFSv4 does less caching (for better consistency). So the observed slowdown is realistic. Latencies will make matters worse, so the FS should be tuned for very small random IO (small blocksize - a small subblock-size will not help).

If you were to put the Linux kernel NFS server into the picture, it would behave very much the same - although Ganesha could be a bit more efficient (by some percent - certainly less than 200%).

But hey - this is a GPFS cluster, not some NAS box. Run "git pull" on the GPFS client. Enjoy the 1800 files/sec (or more). Modify the files on your XY client mounting over NFS. Use a wrapper script to automatically have your AD or LDAP user id SSH into the cluster to perform it.

Michael

Mit freundlichen Grüßen / with best regards

Michael Diederich
IBM Systems Group
Spectrum Scale Software Development

Contact Information
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

mail: michael.diederich at de.ibm.com
fon: +49-7034-274-4062
address: Am Weiher 24, D-65451 Kelsterbach

From: valdis.kletnieks at vt.edu
To: gpfsug main discussion list 
Cc: Silvana De Gyves , Jay Vaddi , Michael Diederich 
Date: 10/16/2018 02:42 AM
Subject: Re: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS
Sent by: Valdis Kletnieks 

On Mon, 15 Oct 2018 18:34:50 -0400, "Kumaran Rajaram" said:
> 1. >>When writing to GPFS directly I'm able to write ~1800 files / second in a test setup.
> >>This is roughly the same on the protocol nodes (NSD client), as well as > on the ESS IO nodes (NSD server). > > 2. >> When writing to the NFS export on the protocol node itself (to avoid > any network effects) I'm only able to write ~230 files / second. > IMHO #2, writing to the NFS export on the protocol node should be same as #1. > Protocol node is also a NSD client and when you write from a protocol node, it > will use the NSD protocol to write to the ESS IO nodes. In #1, you cite seeing > ~1800 files from protocol node and in #2 you cite seeing ~230 file/sec which > seem to contradict each other. I think he means this: 1) ssh nsd_server 2) cd /gpfs/filesystem/testarea 3) (whomp out 1800 files/sec) 4) mount -t nfs localhost:/gpfs/filesystem/testarea /mnt/test 5) cd /mnt/test 6) Watch the same test struggle to hit 230. Indicating the issue is going from NFS to GPFS (For what it's worth, we've had issues with Ganesha as well...) [attachment "att4z9wh.dat" deleted by Michael Diederich/Germany/IBM] -------------- next part -------------- An HTML attachment was scrubbed... URL: From KKR at lbl.gov Tue Oct 16 14:20:08 2018 From: KKR at lbl.gov (Kristy Kallback-Rose) Date: Tue, 16 Oct 2018 14:20:08 +0100 Subject: [gpfsug-discuss] Presentations and SC18 Sign Up Message-ID: Quick message, more later. The presentation bundle (zip file) from the September UG meeting at ORNL is now here: https://www.spectrumscaleug.org/presentations/ I'll add more details there soon. If you haven't signed up for SC18's UG meeting yet, you can should do so here: https://ibm.co/2CjZyHG SC18 agenda is being discussed today. Hoping for more details about that soon. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Tue Oct 16 17:44:08 2018 From: spectrumscale at kiranghag.com (KG) Date: Tue, 16 Oct 2018 19:44:08 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Thanks Olaf It worked. On Fri, Oct 12, 2018, 13:43 Olaf Weiser wrote: > I think the step you are missing is this... > > > > > ./configure LIBS=/usr/lpp/mmfs/lib/libgpfs.so > make > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: KG > To: gpfsug main discussion list > Date: 10/12/2018 12:40 PM > Subject: Re: [gpfsug-discuss] error compiling IOR on GPFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi John > > Yes, I am using latest version from this link. > > Do I have to use any additional switches for compilation? 
I used following > sequence > ./bootstrap > ./configure > ./make (fails) > > > On Fri, Oct 12, 2018 at 7:51 AM John Bent <*johnbent at gmail.com* > > wrote: > Kiran, > > Are you using the latest version of IOR? > *https://github.com/hpc/ior* > > Thanks, > > John > > On Thu, Oct 11, 2018 at 10:39 PM KG <*spectrumscale at kiranghag.com* > > wrote: > Hi Folks > > I am trying to compile IOR on a GPFS filesystem and running into following > errors. > > Github forum says that "The configure script does not add -lgpfs to the > CFLAGS when it detects GPFS support." > > Any help on how to get around this? > > mpicc -DHAVE_CONFIG_H -I. -g -O2 -MT aiori-MPIIO.o -MD -MP -MF > .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c > aiori-MPIIO.c: In function ?MPIIO_Xfer?: > aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type > [enabled by default] > Access = MPI_File_write; > ^ > aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type > [enabled by default] > Access_at = MPI_File_write_at; > ^ > aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type > [enabled by default] > Access_all = MPI_File_write_all; > ^ > aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type > [enabled by default] > Access_at_all = MPI_File_write_at_all; > ^ > mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po > mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o > aiori-MPIIO.o -lm > aiori-POSIX.o: In function `gpfs_free_all_locks': > /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to > `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_start': > aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_end': > aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' > collect2: error: ld returned 1 exit status > make[2]: *** [ior] Error 1 > make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make: *** [all-recursive] Error 1 > > Kiran > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexander.Saupp at de.ibm.com Wed Oct 17 12:44:41 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Wed, 17 Oct 2018 13:44:41 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Message-ID: Dear Mailing List readers, I've come to a preliminary conclusion that explains the behavior in an appropriate manner, so I'm trying to summarize my current thinking with this audience. Problem statement: Big performance derivation between native GPFS (fast) and loopback NFS mount on the same node (way slower) for single client, single thread, small files workload. 
Current explanation: tar seems to use close() on files, not fclose(). That is an application choice and common behavior. The ideas is to allow OS write caching to speed up process run time. When running locally on ext3 / xfs / GPFS / .. that allows async destaging of data down to disk, somewhat compromising data for better performance. As we're talking about write caching on the same node that the application runs on - a crash is missfortune but in the same failure domain. E.g. if you run a compile job that includes extraction of a tar and the node crashes you'll have to restart the entire job, anyhow. The NFSv2 spec defined that NFS io's are to be 'sync', probably because the compile job on the nfs client would survive if the NFS Server crashes, so the failure domain would be different NFSv3 in rfc1813 below acknowledged the performance impact and introduced the 'async' flag for NFS, which would handle IO's similar to local IOs, allowing to destage in the background. Keep in mind - applications, independent if running locally or via NFS can always decided to use the fclose() option, which will ensure that data is destaged to persistent storage right away. But its an applications choice if that's really mandatory or whether performance has higher priority. The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down to disk - very filesystem independent. -> single client, single thread, small files workload on GPFS can be destaged async, allowing to hide latency and parallelizing disk IOs. -> NFS client IO's are sync, so the second IO can only be started after the first one hit non volatile memory -> much higher latency The Spectrum Scale NFS implementation (based on ganesha) does not support the async mount option, which is a bit of a pitty. There might also be implementation differences compared to kernel-nfs, I did not investigate into that direction. However, the principles of the difference are explained for my by the above behavior. One workaround that I saw working well for multiple customers was to replace the NFS client by a Spectrum Scale nsd client. That has two advantages, but is certainly not suitable in all cases: - Improved speed by efficent NSD protocol and NSD client side write caching - Write Caching in the same failure domain as the application (on NSD client) which seems to be more reasonable compared to NFS Server side write caching. References: NFS sync vs async https://tools.ietf.org/html/rfc1813 The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way. sync() vs fsync() https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm - An application program makes an fsync() call for a specified file. This causes all of the pages that contain modified data for that file to be written to disk. The writing is complete when the fsync() call returns to the program. - An application program makes a sync() call. This causes all of the file pages in memory that contain modified data to be scheduled for writing to disk. The writing is not necessarily complete when the sync() call returns to the program. - A user can enter the sync command, which in turn issues a sync() call. 
Mit freundlichen Grüßen / Kind regards

Alexander Saupp

IBM Systems, Storage Platform, EMEA Storage Competence Center

Phone: +49 7034-643-1512 IBM Deutschland GmbH
Mobile: +49-172 7251072 Am Weiher 24
Email: alexander.saupp at de.ibm.com 65451 Kelsterbach
Germany

IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt
Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From janfrode at tanso.net Wed Oct 17 13:24:01 2018
From: janfrode at tanso.net (Jan-Frode Myklebust)
Date: Wed, 17 Oct 2018 08:24:01 -0400
Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS
In-Reply-To: 
References: 
Message-ID: 

Do you know if the slow throughput is caused by the network/nfs-protocol layer, or does it help to use faster storage (ssd)? If on storage, have you considered if HAWC can help?

I'm thinking about adding an SSD pool as a first tier to hold the active dataset for a similar setup, but that's mainly to solve the small file read workload (i.e. random I/O).

-jf

ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp <Alexander.Saupp at de.ibm.com>:
> Dear Mailing List readers,
>
> I've come to a preliminary conclusion that explains the behavior in an
> appropriate manner, so I'm trying to summarize my current thinking with
> this audience.
>
> *Problem statement: *
>
> Big performance derivation between native GPFS (fast) and loopback NFS
> mount on the same node (way slower) for single client, single thread, small
> files workload.
>
> *Current explanation:*
>
> tar seems to use close() on files, not fclose(). That is an
> application choice and common behavior. The ideas is to allow OS write
> caching to speed up process run time.
>
> When running locally on ext3 / xfs / GPFS / .. that allows async
> destaging of data down to disk, somewhat compromising data for better
> performance.
> As we're talking about write caching on the same node that the
> application runs on - a crash is missfortune but in the same failure domain.
> E.g. if you run a compile job that includes extraction of a tar and
> the node crashes you'll have to restart the entire job, anyhow.
> > The NFSv2 spec defined that NFS io's are to be 'sync', probably > because the compile job on the nfs client would survive if the NFS Server > crashes, so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and > introduced the 'async' flag for NFS, which would handle IO's similar to > local IOs, allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS > can always decided to use the fclose() option, which will ensure that data > is destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache > down to disk - very filesystem independent. > > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > > The Spectrum Scale NFS implementation (based on ganesha) does not > support the async mount option, which is a bit of a pitty. There might also > be implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write > caching > - Write Caching in the same failure domain as the application (on > NSD client) which seems to be more reasonable compared to NFS Server side > write caching. > > > *References:* > > NFS sync vs async > https://tools.ietf.org/html/rfc1813 > *The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support so > that the NFS server can do unsafe writes.* > Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > *sync() vs fsync()* > > https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted for > input (or the next command in a shell script is processed). > > > *close() vs fclose()* > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). 
(It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > *Alexander Saupp* > > IBM Systems, Storage Platform, EMEA Storage Competence Center > ------------------------------ > Phone: +49 7034-643-1512 IBM Deutschland GmbH > Mobile: +49-172 7251072 Am Weiher 24 > Email: alexander.saupp at de.ibm.com 65451 Kelsterbach > Germany > ------------------------------ > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19995626.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Wed Oct 17 14:15:12 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 17 Oct 2018 15:15:12 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Oct 17 14:26:52 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 17 Oct 2018 16:26:52 +0300 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Just to clarify ( from man exports): " async This option allows the NFS server to violate the NFS protocol and reply to requests before any changes made by that request have been committed to stable storage (e.g. disc drive). Using this option usually improves performance, but at the cost that an unclean server restart (i.e. a crash) can cause data to be lost or corrupted." With the Ganesha implementation in Spectrum Scale, it was decided not to allow this violation - so this async export options wasn't exposed. I believe that for those customers that agree to take the risk, using async mount option ( from the client) will achieve similar behavior. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Olaf Weiser" To: gpfsug main discussion list Date: 17/10/2018 16:16 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Jallo Jan, you can expect to get slightly improved numbers from the lower response times of the HAWC ... but the loss of performance comes from the fact, that GPFS or (async kNFS) writes with multiple parallel threads - in opposite to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. you'll never outperform e.g. 
128 (maybe slower), but, parallel threads (running write-behind) <---> with one single but fast threads, .... so as Alex suggest.. if possible.. take gpfs client of kNFS for those types of workloads.. From: Jan-Frode Myklebust To: gpfsug main discussion list Date: 10/17/2018 02:24 PM Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Do you know if the slow throughput is caused by the network/nfs-protocol layer, or does it help to use faster storage (ssd)? If on storage, have you considered if HAWC can help? I?m thinking about adding an SSD pool as a first tier to hold the active dataset for a similar setup, but that?s mainly to solve the small file read workload (i.e. random I/O ). -jf ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < Alexander.Saupp at de.ibm.com>: Dear Mailing List readers, I've come to a preliminary conclusion that explains the behavior in an appropriate manner, so I'm trying to summarize my current thinking with this audience. Problem statement: Big performance derivation between native GPFS (fast) and loopback NFS mount on the same node (way slower) for single client, single thread, small files workload. Current explanation: tar seems to use close() on files, not fclose(). That is an application choice and common behavior. The ideas is to allow OS write caching to speed up process run time. When running locally on ext3 / xfs / GPFS / .. that allows async destaging of data down to disk, somewhat compromising data for better performance. As we're talking about write caching on the same node that the application runs on - a crash is missfortune but in the same failure domain. E.g. if you run a compile job that includes extraction of a tar and the node crashes you'll have to restart the entire job, anyhow. The NFSv2 spec defined that NFS io's are to be 'sync', probably because the compile job on the nfs client would survive if the NFS Server crashes, so the failure domain would be different NFSv3 in rfc1813 below acknowledged the performance impact and introduced the 'async' flag for NFS, which would handle IO's similar to local IOs, allowing to destage in the background. Keep in mind - applications, independent if running locally or via NFS can always decided to use the fclose() option, which will ensure that data is destaged to persistent storage right away. But its an applications choice if that's really mandatory or whether performance has higher priority. The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down to disk - very filesystem independent. -> single client, single thread, small files workload on GPFS can be destaged async, allowing to hide latency and parallelizing disk IOs. -> NFS client IO's are sync, so the second IO can only be started after the first one hit non volatile memory -> much higher latency The Spectrum Scale NFS implementation (based on ganesha) does not support the async mount option, which is a bit of a pitty. There might also be implementation differences compared to kernel-nfs, I did not investigate into that direction. However, the principles of the difference are explained for my by the above behavior. One workaround that I saw working well for multiple customers was to replace the NFS client by a Spectrum Scale nsd client. 
That has two advantages, but is certainly not suitable in all cases: - Improved speed by efficent NSD protocol and NSD client side write caching - Write Caching in the same failure domain as the application (on NSD client) which seems to be more reasonable compared to NFS Server side write caching. References: NFS sync vs async https://tools.ietf.org/html/rfc1813 The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way. sync() vs fsync() https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm - An application program makes an fsync() call for a specified file. This causes all of the pages that contain modified data for that file to be written to disk. The writing is complete when the fsync() call returns to the program. - An application program makes a sync() call. This causes all of the file pages in memory that contain modified data to be scheduled for writing to disk. The writing is not necessarily complete when the sync() call returns to the program. - A user can enter the sync command, which in turn issues a sync() call. Again, some of the writes may not be complete when the user is prompted for input (or the next command in a shell script is processed). close() vs fclose() A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.) Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] [attachment "19995626.gif" deleted by Olaf Weiser/Germany/IBM] [attachment "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From MKEIGO at jp.ibm.com Wed Oct 17 14:34:55 2018 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Wed, 17 Oct 2018 22:34:55 +0900 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Oct 17 14:35:22 2018 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 17 Oct 2018 09:35:22 -0400 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: My thinking was mainly that single threaded 200 files/second == 5 ms/file. Where do these 5 ms go? Is it NFS protocol overhead, or is it waiting for I/O so that it can be fixed with a lower latency storage backend? -jf On Wed, Oct 17, 2018 at 9:15 AM Olaf Weiser wrote: > Jallo Jan, > you can expect to get slightly improved numbers from the lower response > times of the HAWC ... but the loss of performance comes from the fact, that > GPFS or (async kNFS) writes with multiple parallel threads - in opposite > to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. > > you'll never outperform e.g. 128 (maybe slower), but, parallel threads > (running write-behind) <---> with one single but fast threads, .... > > so as Alex suggest.. if possible.. take gpfs client of kNFS for those > types of workloads.. > > > > > > > > > > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 10/17/2018 02:24 PM > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Do you know if the slow throughput is caused by the network/nfs-protocol > layer, or does it help to use faster storage (ssd)? If on storage, have you > considered if HAWC can help? > > I?m thinking about adding an SSD pool as a first tier to hold the active > dataset for a similar setup, but that?s mainly to solve the small file read > workload (i.e. random I/O ). > > > -jf > ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < > *Alexander.Saupp at de.ibm.com* >: > Dear Mailing List readers, > > I've come to a preliminary conclusion that explains the behavior in an > appropriate manner, so I'm trying to summarize my current thinking with > this audience. > > *Problem statement: * > Big performance derivation between native GPFS (fast) and loopback NFS > mount on the same node (way slower) for single client, single thread, small > files workload. > > > *Current explanation:* > tar seems to use close() on files, not fclose(). 
That is an application > choice and common behavior. The ideas is to allow OS write caching to speed > up process run time. > > When running locally on ext3 / xfs / GPFS / .. that allows async destaging > of data down to disk, somewhat compromising data for better performance. > As we're talking about write caching on the same node that the application > runs on - a crash is missfortune but in the same failure domain. > E.g. if you run a compile job that includes extraction of a tar and the > node crashes you'll have to restart the entire job, anyhow. > > The NFSv2 spec defined that NFS io's are to be 'sync', probably because > the compile job on the nfs client would survive if the NFS Server crashes, > so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and introduced > the 'async' flag for NFS, which would handle IO's similar to local IOs, > allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS can > always decided to use the fclose() option, which will ensure that data is > destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down > to disk - very filesystem independent. > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > The Spectrum Scale NFS implementation (based on ganesha) does not support > the async mount option, which is a bit of a pitty. There might also be > implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write caching > - Write Caching in the same failure domain as the application (on NSD > client) which seems to be more reasonable compared to NFS Server side write > caching. > > *References:* > > NFS sync vs async > *https://tools.ietf.org/html/rfc1813* > > *The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support so > that the NFS server can do unsafe writes.* > Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > *sync() vs fsync()* > > *https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm* > > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. 
The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted for > input (or the next command in a shell script is processed). > > > *close() vs fclose()* > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). (It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > *Alexander Saupp* > > IBM Systems, Storage Platform, EMEA Storage Competence Center > ------------------------------ > Phone: +49 7034-643-1512 IBM Deutschland GmbH > Mobile: +49-172 7251072 Am Weiher 24 > Email: *alexander.saupp at de.ibm.com* 65451 > Kelsterbach > Germany > ------------------------------ > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > *[attachment > "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] [attachment > "19995626.gif" deleted by Olaf Weiser/Germany/IBM] [attachment > "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] * > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Oct 17 14:41:03 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 17 Oct 2018 16:41:03 +0300 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Hi, Without going into to much details, AFAIR, Ontap integrate NVRAM into the NFS write cache ( as it was developed as a NAS product). Ontap is using the STABLE bit which kind of tell the client "hey, I have no write cache at all, everything is written to stable storage - thus, don't bother with commits ( sync) commands - they are meaningless". Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Keigo Matsubara" To: gpfsug main discussion list Date: 17/10/2018 16:35 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. 
a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Wed Oct 17 14:42:02 2018 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 17 Oct 2018 15:42:02 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <5508e483-25ef-d318-0c68-4009cb5871cc@ugent.be> hi all, has anyone tried to use tools like eatmydata that allow the user to "ignore" the syncs (there's another tool that has less explicit name if it would make you feel better ;). stijn On 10/17/2018 03:26 PM, Tomer Perry wrote: > Just to clarify ( from man exports): > " async This option allows the NFS server to violate the NFS protocol > and reply to requests before any changes made by that request have been > committed to stable storage (e.g. > disc drive). > > Using this option usually improves performance, but at the > cost that an unclean server restart (i.e. a crash) can cause data to be > lost or corrupted." > > With the Ganesha implementation in Spectrum Scale, it was decided not to > allow this violation - so this async export options wasn't exposed. > I believe that for those customers that agree to take the risk, using > async mount option ( from the client) will achieve similar behavior. > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Olaf Weiser" > To: gpfsug main discussion list > Date: 17/10/2018 16:16 > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Jallo Jan, > you can expect to get slightly improved numbers from the lower response > times of the HAWC ... but the loss of performance comes from the fact, > that > GPFS or (async kNFS) writes with multiple parallel threads - in opposite > to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. > > > you'll never outperform e.g. 128 (maybe slower), but, parallel threads > (running write-behind) <---> with one single but fast threads, .... > > so as Alex suggest.. if possible.. take gpfs client of kNFS for those > types of workloads.. > > > > > > > > > > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 10/17/2018 02:24 PM > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Do you know if the slow throughput is caused by the network/nfs-protocol > layer, or does it help to use faster storage (ssd)? If on storage, have > you considered if HAWC can help? 
> > I?m thinking about adding an SSD pool as a first tier to hold the active > dataset for a similar setup, but that?s mainly to solve the small file > read workload (i.e. random I/O ). > > > -jf > ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < > Alexander.Saupp at de.ibm.com>: > Dear Mailing List readers, > > I've come to a preliminary conclusion that explains the behavior in an > appropriate manner, so I'm trying to summarize my current thinking with > this audience. > > Problem statement: > Big performance derivation between native GPFS (fast) and loopback NFS > mount on the same node (way slower) for single client, single thread, > small files workload. > > > Current explanation: > tar seems to use close() on files, not fclose(). That is an application > choice and common behavior. The ideas is to allow OS write caching to > speed up process run time. > > When running locally on ext3 / xfs / GPFS / .. that allows async destaging > of data down to disk, somewhat compromising data for better performance. > As we're talking about write caching on the same node that the application > runs on - a crash is missfortune but in the same failure domain. > E.g. if you run a compile job that includes extraction of a tar and the > node crashes you'll have to restart the entire job, anyhow. > > The NFSv2 spec defined that NFS io's are to be 'sync', probably because > the compile job on the nfs client would survive if the NFS Server crashes, > so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and introduced > the 'async' flag for NFS, which would handle IO's similar to local IOs, > allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS can > always decided to use the fclose() option, which will ensure that data is > destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down > to disk - very filesystem independent. > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > The Spectrum Scale NFS implementation (based on ganesha) does not support > the async mount option, which is a bit of a pitty. There might also be > implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write > caching > - Write Caching in the same failure domain as the application (on NSD > client) which seems to be more reasonable compared to NFS Server side > write caching. > > References: > > NFS sync vs async > https://tools.ietf.org/html/rfc1813 > The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support > so that the NFS server can do unsafe writes. 
> Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > sync() vs fsync() > https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm > > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted > for input (or the next command in a shell script is processed). > > > close() vs fclose() > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). (It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > Alexander Saupp > > IBM Systems, Storage Platform, EMEA Storage Competence Center > > > Phone: > +49 7034-643-1512 > IBM Deutschland GmbH > > Mobile: > +49-172 7251072 > Am Weiher 24 > Email: > alexander.saupp at de.ibm.com > 65451 Kelsterbach > > > Germany > > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "ecblank.gif" > deleted by Olaf Weiser/Germany/IBM] [attachment "19995626.gif" deleted by > Olaf Weiser/Germany/IBM] [attachment "ecblank.gif" deleted by Olaf > Weiser/Germany/IBM] _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From janfrode at tanso.net Wed Oct 17 14:50:38 2018 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 17 Oct 2018 09:50:38 -0400 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Also beware there are 2 different linux NFS "async" settings. A client side setting (mount -o async), which still cases sync on file close() -- and a server (knfs) side setting (/etc/exports) that violates NFS protocol and returns requests before data has hit stable storage. 
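For reference, a sketch of where the two settings live (export path and server name are placeholders; as noted above, CES/Ganesha does not expose the server-side option):

  # client side: a mount option; close()/fsync() still force a COMMIT to the server
  mount -t nfs -o rw,async cesnode:/gpfs/fs1/export /mnt/export

  # server side (kernel nfsd only), an export option in /etc/exports; this lets
  # the server reply before data reaches stable storage
  /gpfs/fs1/export  *(rw,async)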
-jf On Wed, Oct 17, 2018 at 9:41 AM Tomer Perry wrote: > Hi, > > Without going into to much details, AFAIR, Ontap integrate NVRAM into the > NFS write cache ( as it was developed as a NAS product). > Ontap is using the STABLE bit which kind of tell the client "hey, I have > no write cache at all, everything is written to stable storage - thus, > don't bother with commits ( sync) commands - they are meaningless". > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Keigo Matsubara" > To: gpfsug main discussion list > Date: 17/10/2018 16:35 > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I also wonder how many products actually exploit NFS async mode to improve > I/O performance by sacrificing the file system consistency risk: > > gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > > Using this option usually improves performance, but at > > the cost that an unclean server restart (i.e. a crash) can cause > > data to be lost or corrupted." > > For instance, NetApp, at the very least FAS 3220 running Data OnTap > 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to > sync mode. > Promoting means even if NFS client requests async mount mode, the NFS > server ignores and allows only sync mount mode. > > Best Regards, > --- > Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan > TEL: +81-50-3150-0595, T/L: 6205-0595 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Oct 17 17:22:05 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 17 Oct 2018 09:22:05 -0700 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <7E9A54A4-304E-42F7-BF4B-06EBC57503FE@gmail.com> while most said here is correct, it can?t explain the performance of 200 files /sec and I couldn?t resist jumping in here :-D lets assume for a second each operation is synchronous and its done by just 1 thread. 200 files / sec means 5 ms on average per file write. Lets be generous and say the network layer is 100 usec per roud-trip network hop (including code processing on protocol node or client) and for visualization lets assume the setup looks like this : ESS Node ---ethernet--- Protocol Node ?ethernet--- client Node . lets say the ESS write cache can absorb small io at a fixed cost of 300 usec if the heads are ethernet connected and not using IB (then it would be more in the 250 usec range). That?s 300 +100(net1) +100(net2) usec or 500 usec in total. So you are a factor 10 off from your number. 
So lets just assume a create + write is more than just 1 roundtrip worth or synchronization, lets say it needs to do 2 full roundtrips synchronously one for the create and one for the stable write that?s 1 ms, still 5x off of your 5 ms. So either there is a bug in the NFS Server, the NFS client or the storage is not behaving properly. To verify this, the best would be to run the following test : Create a file on the ESS node itself in the shared filesystem like : /usr/lpp/mmfs/samples/perf/gpfsperf create seq -nongpfs -r 4k -n 1m -th 1 -dio /sharedfs/test Now run the following command on one of the ESS nodes, then the protocol node and last the nfs client : /usr/lpp/mmfs/samples/perf/gpfsperf write seq -nongpfs -r 4k -n 1m -th 1 -dio /sharedfs/test This will create 256 stable 4k write i/os to the storage system, I picked the number just to get a statistical relevant number of i/os you can change 1m to 2m or 4m, just don?t make it too high or you might get variations due to de-staging or other side effects happening on the storage system, which you don?t care at this point you want to see the round trip time on each layer. The gpfsperf command will spit out a line like : Data rate was XYZ Kbytes/sec, Op Rate was XYZ Ops/sec, Avg Latency was 0.266 milliseconds, thread utilization 1.000, bytesTransferred 1048576 The only number here that matters is the average latency number , write it down. What I would expect to get back is something like : On ESS Node ? 300 usec average i/o On PN ? 400 usec average i/o On Client ? 500 usec average i/o If you get anything higher than the numbers above something fundamental is bad (in fact on fast system you may see from client no more than 200-300 usec response time) and it will be in the layer in between or below of where you test. If all the numbers are somewhere in line with my numbers above, it clearly points to a problem in NFS itself and the way it communicates with GPFS. Marc, myself and others have debugged numerous issues in this space in the past last one was fixed beginning of this year and ended up in some Scale 5.0.1.X release. To debug this is very hard and most of the time only possible with GPFS source code access which I no longer have. You would start with something like strace -Ttt -f -o tar-debug.out tar -xvf ?..? and check what exact system calls are made to nfs client and how long each takes. You would then run a similar strace on the NFS server to see how many individual system calls will be made to GPFS and how long each takes. This will allow you to narrow down where the issue really is. But I suggest to start with the simpler test above as this might already point to a much simpler problem. Btw. I will be also be speaking at the UG Meeting at SC18 in Dallas, in case somebody wants to catch up ? Sven From: on behalf of Jan-Frode Myklebust Reply-To: gpfsug main discussion list Date: Wednesday, October 17, 2018 at 6:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Also beware there are 2 different linux NFS "async" settings. A client side setting (mount -o async), which still cases sync on file close() -- and a server (knfs) side setting (/etc/exports) that violates NFS protocol and returns requests before data has hit stable storage. -jf On Wed, Oct 17, 2018 at 9:41 AM Tomer Perry wrote: Hi, Without going into to much details, AFAIR, Ontap integrate NVRAM into the NFS write cache ( as it was developed as a NAS product). 
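A small wrapper around the layer-by-layer test described above can make the comparison easier to repeat; it assumes gpfsperf has been built under /usr/lpp/mmfs/samples/perf on every node you test from, and /sharedfs/test is a hypothetical path in the shared filesystem (on the client this is the NFS mount of it):

#!/bin/bash
# Run the create once on a node with the GPFS filesystem mounted, then run the
# write pass on the ESS/NSD node, the protocol node and the NFS client in turn
# and compare the "Avg Latency" figures each one prints.
GPFSPERF=/usr/lpp/mmfs/samples/perf/gpfsperf
TESTFILE=/sharedfs/test

$GPFSPERF create seq -nongpfs -r 4k -n 1m -th 1 -dio "$TESTFILE"
$GPFSPERF write  seq -nongpfs -r 4k -n 1m -th 1 -dio "$TESTFILE" | grep -i 'latency'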
Ontap is using the STABLE bit which kind of tell the client "hey, I have no write cache at all, everything is written to stable storage - thus, don't bother with commits ( sync) commands - they are meaningless". Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Keigo Matsubara" To: gpfsug main discussion list Date: 17/10/2018 16:35 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 17 22:02:30 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 17 Oct 2018 21:02:30 +0000 Subject: [gpfsug-discuss] Job vacancy @Birmingham Message-ID: We're looking for someone to join our systems team here at University of Birmingham. In case you didn't realise, we're pretty reliant on Spectrum Scale to deliver our storage systems. https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 Such a snappy URL :-) Feel free to email me *OFFLIST* if you have informal enquiries! Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Oct 18 10:14:51 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 18 Oct 2018 11:14:51 +0200 Subject: [gpfsug-discuss] Job vacancy @Birmingham In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From nathan.harper at cfms.org.uk Thu Oct 18 10:23:44 2018 From: nathan.harper at cfms.org.uk (Nathan Harper) Date: Thu, 18 Oct 2018 10:23:44 +0100 Subject: [gpfsug-discuss] Job vacancy @Birmingham In-Reply-To: References: Message-ID: Olaf - we don't need any reminders of Bre.. this morning On Thu, 18 Oct 2018 at 10:15, Olaf Weiser wrote: > Hi Simon .. > well - I would love to .. .but .. ;-) hey - what do you think, how long a > citizen from the EU can live (and work) in UK ;-) > don't take me too serious... see you soon, consider you invited for a > coffee for my rude comment .. ;-) > olaf > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 10/17/2018 11:02 PM > Subject: [gpfsug-discuss] Job vacancy @Birmingham > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We're looking for someone to join our systems team here at University of > Birmingham. In case you didn't realise, we're pretty reliant on Spectrum > Scale to deliver our storage systems. > > > https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 > *https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117* > > > Such a snappy URL :-) > > Feel free to email me *OFFLIST* if you have informal enquiries! > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Nathan Harper* // IT Systems Lead *e: *nathan.harper at cfms.org.uk *t*: 0117 906 1104 *m*: 0787 551 0891 *w: *www.cfms.org.uk CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1 4QP -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Thu Oct 18 16:32:43 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 Oct 2018 15:32:43 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. 
(New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London From alex at calicolabs.com Thu Oct 18 17:12:42 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Thu, 18 Oct 2018 09:12:42 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: References: Message-ID: The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: > We've just added 9 raid volumes to our main storage, (5 Raid6 arrays > for data and 4 Raid1 arrays for metadata) > > We are now attempting to rebalance and our data around all the volumes. > > We started with the meta-data doing a "mmrestripe -r" as we'd changed > the failure groups to on our meta-data disks and wanted to ensure we > had all our metadata on known good ssd. No issues, here we could take > snapshots and I even tested it. (New SSD on new failure group and move > all old SSD to the same failure group) > > We're now doing a "mmrestripe -b" to rebalance the data accross all 21 > Volumes however when we attempt to take a snapshot, as we do every > night at 11pm it fails with > > sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test > Flushing dirty data for snapshot :test... > Quiescing all file system operations. > Unable to quiesce all nodes; some processes are busy or holding > required resources. > mmcrsnapshot: Command failed. Examine previous error messages to > determine cause. > > Are you meant to be able to take snapshots while re-striping or not? > > I know a rebalance of the data is probably unnecessary, but we'd like > to get the best possible speed out of the system, and we also kind of > like balance. > > Thanks > > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alex at calicolabs.com Thu Oct 18 17:12:42 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Thu, 18 Oct 2018 09:12:42 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: References: Message-ID: The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: > We've just added 9 raid volumes to our main storage, (5 Raid6 arrays > for data and 4 Raid1 arrays for metadata) > > We are now attempting to rebalance and our data around all the volumes. > > We started with the meta-data doing a "mmrestripe -r" as we'd changed > the failure groups to on our meta-data disks and wanted to ensure we > had all our metadata on known good ssd. No issues, here we could take > snapshots and I even tested it. (New SSD on new failure group and move > all old SSD to the same failure group) > > We're now doing a "mmrestripe -b" to rebalance the data accross all 21 > Volumes however when we attempt to take a snapshot, as we do every > night at 11pm it fails with > > sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test > Flushing dirty data for snapshot :test... > Quiescing all file system operations. > Unable to quiesce all nodes; some processes are busy or holding > required resources. > mmcrsnapshot: Command failed. Examine previous error messages to > determine cause. > > Are you meant to be able to take snapshots while re-striping or not? > > I know a rebalance of the data is probably unnecessary, but we'd like > to get the best possible speed out of the system, and we also kind of > like balance. > > Thanks > > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Thu Oct 18 17:13:52 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 18 Oct 2018 16:13:52 +0000 Subject: [gpfsug-discuss] Job vacancy @Birmingham In-Reply-To: References: Message-ID: <4B78CFBB-6B35-4914-A42D-5A66117DD588@vanderbilt.edu> Hi Nathan, Well, while I?m truly sorry for what you?re going thru, at least a majority of the voters in the UK did vote for it. Keep in mind that things could be worse. Some of us do happen to live in a country where a far worse thing has happened despite the fact that the majority of the voters were _against_ it?. ;-) Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Oct 18, 2018, at 4:23 AM, Nathan Harper > wrote: Olaf - we don't need any reminders of Bre.. this morning On Thu, 18 Oct 2018 at 10:15, Olaf Weiser > wrote: Hi Simon .. well - I would love to .. .but .. ;-) hey - what do you think, how long a citizen from the EU can live (and work) in UK ;-) don't take me too serious... see you soon, consider you invited for a coffee for my rude comment .. 
;-) olaf From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" > Date: 10/17/2018 11:02 PM Subject: [gpfsug-discuss] Job vacancy @Birmingham Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We're looking for someone to join our systems team here at University of Birmingham. In case you didn't realise, we're pretty reliant on Spectrum Scale to deliver our storage systems. https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 Such a snappy URL :-) Feel free to email me *OFFLIST* if you have informal enquiries! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Nathan Harper // IT Systems Lead e: nathan.harper at cfms.org.uk t: 0117 906 1104 m: 0787 551 0891 w: www.cfms.org.uk CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR [http://cfms.org.uk/images/logo.png] CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1 4QP _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ca552bcbb43b34c316b2808d634db7033%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754514425052428&sdata=tErG6k2dNdqz%2Ffnc8eYtpyR%2Ba1Cb4AZ8n7WA%2Buv3oCw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Oct 18 17:48:54 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Oct 2018 16:48:54 +0000 Subject: [gpfsug-discuss] Reminder: Please keep discussion focused on GPFS/Scale Message-ID: <2A1399B8-441D-48E3-AACC-0BD3B0780A60@nuance.com> A gentle reminder to not left the discussions drift off topic, thanks. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Oct 18 17:57:18 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 Oct 2018 16:57:18 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: Message-ID: And use QoS Less aggressive during peak, more on valleys. If your workload allows it. ? 
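A minimal sketch of that QoS approach, assuming a filesystem called gpfs0 and purely illustrative limits; the mmchqos/mmlsqos syntax should be checked against the Scale release you run:

# enable QoS and cap maintenance traffic (restripe, rebalance, ...) while users are active
/usr/lpp/mmfs/bin/mmchqos gpfs0 --enable pool=*,maintenance=10000IOPS,other=unlimited

# relax the cap overnight and tighten it again for the working day, e.g. from cron:
#   0 0 * * *  /usr/lpp/mmfs/bin/mmchqos gpfs0 --enable pool=*,maintenance=unlimited,other=unlimited
#   0 6 * * *  /usr/lpp/mmfs/bin/mmchqos gpfs0 --enable pool=*,maintenance=10000IOPS,other=unlimited

# watch what the maintenance class is actually consuming
/usr/lpp/mmfs/bin/mmlsqos gpfs0 --seconds 60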
SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 18 Oct 2018, at 19.13, Alex Chekholko wrote: > > The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. > > One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. > >> On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: >> We've just added 9 raid volumes to our main storage, (5 Raid6 arrays >> for data and 4 Raid1 arrays for metadata) >> >> We are now attempting to rebalance and our data around all the volumes. >> >> We started with the meta-data doing a "mmrestripe -r" as we'd changed >> the failure groups to on our meta-data disks and wanted to ensure we >> had all our metadata on known good ssd. No issues, here we could take >> snapshots and I even tested it. (New SSD on new failure group and move >> all old SSD to the same failure group) >> >> We're now doing a "mmrestripe -b" to rebalance the data accross all 21 >> Volumes however when we attempt to take a snapshot, as we do every >> night at 11pm it fails with >> >> sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test >> Flushing dirty data for snapshot :test... >> Quiescing all file system operations. >> Unable to quiesce all nodes; some processes are busy or holding >> required resources. >> mmcrsnapshot: Command failed. Examine previous error messages to >> determine cause. >> >> Are you meant to be able to take snapshots while re-striping or not? >> >> I know a rebalance of the data is probably unnecessary, but we'd like >> to get the best possible speed out of the system, and we also kind of >> like balance. >> >> Thanks >> >> >> -- >> Peter Childs >> ITS Research Storage >> Queen Mary, University of London >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Oct 18 17:57:18 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 Oct 2018 16:57:18 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: Message-ID: And use QoS Less aggressive during peak, more on valleys. If your workload allows it. ? SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 18 Oct 2018, at 19.13, Alex Chekholko wrote: > > The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. 
> > One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. > >> On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: >> We've just added 9 raid volumes to our main storage, (5 Raid6 arrays >> for data and 4 Raid1 arrays for metadata) >> >> We are now attempting to rebalance and our data around all the volumes. >> >> We started with the meta-data doing a "mmrestripe -r" as we'd changed >> the failure groups to on our meta-data disks and wanted to ensure we >> had all our metadata on known good ssd. No issues, here we could take >> snapshots and I even tested it. (New SSD on new failure group and move >> all old SSD to the same failure group) >> >> We're now doing a "mmrestripe -b" to rebalance the data accross all 21 >> Volumes however when we attempt to take a snapshot, as we do every >> night at 11pm it fails with >> >> sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test >> Flushing dirty data for snapshot :test... >> Quiescing all file system operations. >> Unable to quiesce all nodes; some processes are busy or holding >> required resources. >> mmcrsnapshot: Command failed. Examine previous error messages to >> determine cause. >> >> Are you meant to be able to take snapshots while re-striping or not? >> >> I know a rebalance of the data is probably unnecessary, but we'd like >> to get the best possible speed out of the system, and we also kind of >> like balance. >> >> Thanks >> >> >> -- >> Peter Childs >> ITS Research Storage >> Queen Mary, University of London >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Thu Oct 18 18:19:21 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 18 Oct 2018 17:19:21 +0000 Subject: [gpfsug-discuss] Best way to migrate data Message-ID: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Hi, Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 From S.J.Thompson at bham.ac.uk Thu Oct 18 18:44:11 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 18 Oct 2018 17:44:11 +0000 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 In-Reply-To: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> References: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> Message-ID: Just following up this thread ... We use v4 ACLs, in part because we also export via SMB as well. 
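For anyone weighing up the two ACL flavours in this thread, a short sketch of the commands involved; the filesystem and path names are made up:

# which ACL flavour the filesystem accepts: posix, nfs4 or all
/usr/lpp/mmfs/bin/mmlsfs gpfs0 -k

# show the ACL on a directory, and the same ACL rendered in NFSv4 form
/usr/lpp/mmfs/bin/mmgetacl /gpfs/gpfs0/projects/demo
/usr/lpp/mmfs/bin/mmgetacl -k nfs4 /gpfs/gpfs0/projects/demo

# dump, edit and re-apply an ACL
/usr/lpp/mmfs/bin/mmgetacl -o /tmp/demo.acl /gpfs/gpfs0/projects/demo
/usr/lpp/mmfs/bin/mmputacl -i /tmp/demo.acl /gpfs/gpfs0/projects/demo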
Note that we do also use the fileset option "chmodAndUpdateAcl" Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Fabrice.Cantos at niwa.co.nz [Fabrice.Cantos at niwa.co.nz] Sent: 10 October 2018 22:57 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 I would be interested to know what you chose for your filesystems and user/project space directories: * Traditional Posix ACL * NFS V4 ACL What did motivate your choice? We are facing some issues to get the correct NFS ACL to keep correct attributes for new files created. Thanks Fabrice [cid:image4cef17.PNG at 18c66b76.4480e036] Fabrice Cantos HPC Systems Engineer Group Manager ? High Performance Computing T +64-4-386-0367 M +64-27-412-9693 National Institute of Water & Atmospheric Research Ltd (NIWA) 301 Evans Bay Parade, Greta Point, Wellington Connect with NIWA: niwa.co.nz Facebook Twitter LinkedIn Instagram To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image4cef17.PNG Type: image/png Size: 12288 bytes Desc: image4cef17.PNG URL: From frederik.ferner at diamond.ac.uk Thu Oct 18 18:54:32 2018 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 18 Oct 2018 18:54:32 +0100 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 In-Reply-To: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> References: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> Message-ID: <595d0584-df41-a731-ac08-6bba81dbdb31@diamond.ac.uk> On 10/10/18 22:57, Fabrice Cantos wrote: > I would be interested to know what you chose for your filesystems and > user/project space directories: > > * Traditional Posix ACL > * NFS V4 ACL We use traditional Posix ACLs almost exclusively. The main exception is some directories on Spectrum Scale where Windows machines with native Spectrum Scale support create files and directories. There our scripts set Posix ACLs which are respected on Windows but automatically converted to NFS V4 ACLs on new files and directories by the file system. > What did motivate your choice? Mainly that our use of ACLs goes back way longer than our use of GPFS/Spectrum Scale and we also have other file systems which do not support NFSv4 ACLs. Keeping knowledge and script on one set of ACLs fresh within the team is easier. Additional headache comes because as we all know Posix ACLs and NFS V4 ACLs don't translate exactly. > We are facing some issues to get the correct NFS ACL to keep correct > attributes for new files created. Is this using kernel NFSd or Ganesha (CES)? Frederik -- Frederik Ferner Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 Duty Sys Admin can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. 
If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From oehmes at gmail.com Thu Oct 18 19:09:56 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 11:09:56 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. 
turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu Oct 18 19:09:56 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 11:09:56 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. 
crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? 
I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Thu Oct 18 19:26:10 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 18 Oct 2018 18:26:10 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ccca728d2d61f4be06bcd08d6351f3650%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754805507359478&sdata=2YAiqgqKl4CerlyCn3vJ9v9u%2FrGzbfa7aKxJ0PYV%2Fhc%3D&reserved=0 From p.childs at qmul.ac.uk Thu Oct 18 19:50:42 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 Oct 2018 18:50:42 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. 
I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. 
I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From p.childs at qmul.ac.uk Thu Oct 18 19:50:42 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 Oct 2018 18:50:42 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. 
Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. 
I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Thu Oct 18 19:47:31 2018 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Thu, 18 Oct 2018 18:47:31 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! 
Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ccca728d2d61f4be06bcd08d6351f3650%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754805507359478&sdata=2YAiqgqKl4CerlyCn3vJ9v9u%2FrGzbfa7aKxJ0PYV%2Fhc%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu Oct 18 20:18:37 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 12:18:37 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: <47DF6EDF-CA0C-4EBB-851A-1D3603F8B0C5@gmail.com> I don't know which min FS version you need to make use of -N, but there is this Marc guy watching the mailing list who would know __ Sven ?On 10/18/18, 11:50 AM, "Peter Childs" wrote: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. 
It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. 
(New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
From cblack at nygenome.org Thu Oct 18 20:13:29 2018 From: cblack at nygenome.org (Christopher Black) Date: Thu, 18 Oct 2018 19:13:29 +0000 Subject: Re: [gpfsug-discuss] Best way to migrate data In-Reply-To: <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> Message-ID: <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Other tools and approaches that we've found helpful: msrsync: handles parallelizing rsync within a dir tree and can greatly speed up transfers on a single node with both filesystems mounted, especially when dealing with many small files Globus/GridFTP: set up one or more endpoints on each side, gridftp will auto parallelize and recover from disruptions msrsync is easier to get going but is limited to one parent dir per node. We've sometimes done an additional level of parallelization by running msrsync with different top level directories on different hpc nodes simultaneously. Best, Chris Refs: https://github.com/jbd/msrsync https://www.globus.org/ ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul" wrote: Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ?
um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. From makaplan at us.ibm.com Thu Oct 18 20:30:21 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 Oct 2018 15:30:21 -0400 Subject: [gpfsug-discuss] Can't take snapshots while re-striping - "mmchqos -N" In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: I believe `mmchqos ... -N ... ` is supported at 4.2.2 and later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Oct 18 20:30:21 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 Oct 2018 15:30:21 -0400 Subject: [gpfsug-discuss] Can't take snapshots while re-striping - "mmchqos -N" In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: I believe `mmchqos ... 
-N ... ` is supported at 4.2.2 and later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Thu Oct 18 21:05:50 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 18 Oct 2018 20:05:50 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Message-ID: Thank you all for the responses. I'm currently using msrsync and things appear to be going very well. The data transfer is contained inside our DC. I'm transferring a user's home directory content from one GPFS file system to another. Our IBM Spectrum Scale Solution consists of 12 IO nodes connected to IB and the client node that I'm transferring the data from one fs to another is also connected to IB with a possible maximum of 2 hops. [root at client-system]# /gpfs/home/dwayne/bin/msrsync -P --stats -p 32 /gpfs/home/user/ /research/project/user/ [64756/992397 entries] [30.1 T/239.6 T transferred] [81 entries/s] [39.0 G/s bw] [monq 0] [jq 62043] Best, Dwayne -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christopher Black Sent: Thursday, October 18, 2018 4:43 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Other tools and approaches that we've found helpful: msrsync: handles parallelizing rsync within a dir tree and can greatly speed up transfers on a single node with both filesystems mounted, especially when dealing with many small files Globus/GridFTP: set up one or more endpoints on each side, gridftp will auto parallelize and recover from disruptions msrsync is easier to get going but is limited to one parent dir per node. We've sometimes done an additional level of parallelization by running msrsync with different top level directories on different hpc nodes simultaneously. Best, Chris Refs: https://github.com/jbd/msrsync https://www.globus.org/ ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul" wrote: Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! 
Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mutantllama at gmail.com Thu Oct 18 21:54:42 2018 From: mutantllama at gmail.com (Carl) Date: Fri, 19 Oct 2018 07:54:42 +1100 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Message-ID: It may be overkill for your use case but MPI file utils is very good for large datasets. https://github.com/hpc/mpifileutils Cheers, Carl. On Fri, 19 Oct 2018 at 7:05 am, wrote: > Thank you all for the responses. I'm currently using msrsync and things > appear to be going very well. > > The data transfer is contained inside our DC. 
I'm transferring a user's > home directory content from one GPFS file system to another. Our IBM > Spectrum Scale Solution consists of 12 IO nodes connected to IB and the > client node that I'm transferring the data from one fs to another is also > connected to IB with a possible maximum of 2 hops. > > [root at client-system]# /gpfs/home/dwayne/bin/msrsync -P --stats -p 32 > /gpfs/home/user/ /research/project/user/ > [64756/992397 entries] [30.1 T/239.6 T transferred] [81 entries/s] [39.0 > G/s bw] [monq 0] [jq 62043] > > Best, > Dwayne > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christopher Black > Sent: Thursday, October 18, 2018 4:43 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Best way to migrate data > > Other tools and approaches that we've found helpful: > msrsync: handles parallelizing rsync within a dir tree and can greatly > speed up transfers on a single node with both filesystems mounted, > especially when dealing with many small files > Globus/GridFTP: set up one or more endpoints on each side, gridftp will > auto parallelize and recover from disruptions > > msrsync is easier to get going but is limited to one parent dir per node. > We've sometimes done an additional level of parallelization by running > msrsync with different top level directories on different hpc nodes > simultaneously. > > Best, > Chris > > Refs: > https://github.com/jbd/msrsync > https://www.globus.org/ > > ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Sanchez, Paul" behalf of Paul.Sanchez at deshaw.com> wrote: > > Sharding can also work, if you have a storage-connected compute grid > in your environment: If you enumerate all of the directories, then use a > non-recursive rsync for each one, you may be able to parallelize the > workload by using several clients simultaneously. It may still max out the > links of these clients (assuming your source read throughput and target > write throughput bottlenecks aren't encountered first) but it may run that > way for 1/100th of the time if you can use 100+ machines. > > -Paul > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Buterbaugh, Kevin L > Sent: Thursday, October 18, 2018 2:26 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Best way to migrate data > > Hi Dwayne, > > I?m assuming you can?t just let an rsync run, possibly throttled in > some way? If not, and if you?re just tapping out your network, then would > it be possible to go old school? We have parts of the Medical Center here > where their network connections are ? um, less than robust. So they tar > stuff up to a portable HD, sneaker net it to us, and we untar is from an > NSD server. > > HTH, and I really hope that someone has a better idea than that! > > Kevin > > > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > > > Hi, > > > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a larger > research GPFS file system? I?m currently using rsync and it has maxed out > the client system?s IB interface. > > > > Best, > > Dwayne > > ? > > Dwayne Hart | Systems Administrator IV > > > > CHIA, Faculty of Medicine > > Memorial University of Newfoundland > > 300 Prince Philip Drive > > St. 
John?s, Newfoundland | A1B 3V6 > > Craig L Dobbin Building | 4M409 > > T 709 864 6631 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= > > > ________________________________ > > This message is for the recipient?s use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Oct 19 10:09:13 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 19 Oct 2018 10:09:13 +0100 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: On 18/10/2018 18:19, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a > larger research GPFS file system? I?m currently using rsync and it > has maxed out the client system?s IB interface. > Be careful with rsync, it resets all your atimes which screws up any hope of doing ILM or HSM. My personal favourite is to do something along the lines of dsmc restore /gpfs/ Minimal impact on the user facing services, and seems to preserve atimes last time I checked. 
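For anyone who has not used that trick, a minimal sketch of the restore-based copy, assuming the source filesystem is already backed up to Spectrum Protect and using made-up paths, might be:

# pull the user's tree out of the backup straight into the new filesystem
dsmc restore -subdir=yes "/gpfs/home/user/*" /research/project/user/

(Check the dsmc documentation for the exact filespec and option syntax at your client level.)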
Sure it tanks your backup server a bit, but that is not user facing. What do users care if the backup takes longer than normal. Of course this presumes you have a backup :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From novosirj at rutgers.edu Thu Oct 18 21:04:36 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 18 Oct 2018 20:04:36 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 We use parsyncfp. Our target is not GPFS, though. I was really hoping to hear about something snazzier for GPFS-GPFS. Lenovo would probably tell you that HSM is the way to go (we asked something similar for a replacement for our current setup or for distributed storage). On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts > a larger research GPFS file system? I?m currently using rsync and > it has maxed out the client system?s IB interface. > > Best, Dwayne ? Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine Memorial University of Newfoundland 300 > Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L > Dobbin Building | 4M409 T 709 864 6631 > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk =dMDg -----END PGP SIGNATURE----- From Dwayne.Hart at med.mun.ca Fri Oct 19 11:15:15 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Fri, 19 Oct 2018 10:15:15 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: Hi JAB, We do not have either ILM or HSM. Thankfully, we have at minimum IBM Spectrum Protect (I recently updated the system to version 8.1.5). It would be an interesting exercise to see how long it would take IBM SP to restore a user's content fully to a different target. I have done some smaller recoveries so I know that the system is in a usable state ;) Best, Dwayne -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Friday, October 19, 2018 6:39 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Best way to migrate data On 18/10/2018 18:19, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a > larger research GPFS file system? I?m currently using rsync and it has > maxed out the client system?s IB interface. > Be careful with rsync, it resets all your atimes which screws up any hope of doing ILM or HSM. 
My personal favourite is to do something along the lines of dsmc restore /gpfs/ Minimal impact on the user facing services, and seems to preserve atimes last time I checked. Sure it tanks your backup server a bit, but that is not user facing. What do users care if the backup takes longer than normal. Of course this presumes you have a backup :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Dwayne.Hart at med.mun.ca Fri Oct 19 11:37:13 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Fri, 19 Oct 2018 10:37:13 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca>, <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> Message-ID: Thank you Ryan. I?ll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer. By copying it from GPFS fs to another GPFS fs. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 > On Oct 19, 2018, at 7:04 AM, Ryan Novosielski wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > We use parsyncfp. Our target is not GPFS, though. I was really hoping > to hear about something snazzier for GPFS-GPFS. Lenovo would probably > tell you that HSM is the way to go (we asked something similar for a > replacement for our current setup or for distributed storage). > >> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: >> Hi, >> >> Just wondering what the best recipe for migrating a user?s home >> directory content from one GFPS file system to another which hosts >> a larger research GPFS file system? I?m currently using rsync and >> it has maxed out the client system?s IB interface. >> >> Best, Dwayne ? Dwayne Hart | Systems Administrator IV >> >> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 >> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L >> Dobbin Building | 4M409 T 709 864 6631 >> _______________________________________________ gpfsug-discuss >> mailing list gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > - -- > ____ > || \\UTGERS, |----------------------*O*------------------------ > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Res. Comp. 
- MSB C630, Newark > `' > -----BEGIN PGP SIGNATURE----- > > iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG > p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk > =dMDg > -----END PGP SIGNATURE----- > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Fri Oct 19 11:41:15 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 19 Oct 2018 10:41:15 +0000 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Message-ID: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... 
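For reference, the host-side setup being described might be sketched roughly like this on a firewalld-based NSD server (the zone defaults, the 60000-61000 range and the node names are just the examples from this thread, and anything extra such as perfmon, GUI or protocol ports needs opening separately):

# pin the command port range so it can be opened explicitly (1191 is the daemon port)
mmchconfig tscCmdPortRange=60000-61000
# open ssh, the daemon port and the command port range
firewall-cmd --permanent --add-service=ssh
firewall-cmd --permanent --add-port=1191/tcp
firewall-cmd --permanent --add-port=60000-61000/tcp
firewall-cmd --reload
# then re-check node-to-node communication
mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01

As the replies below suggest, the bandwidth check in mmnetverify may still report a seemingly random port, so a failure there is not necessarily a real gap in the rules.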
URL: From knop at us.ibm.com Fri Oct 19 14:05:22 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 19 Oct 2018 09:05:22 -0400 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls In-Reply-To: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> References: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Message-ID: Simon, Depending on what functions are being used in Scale, other ports may also get used, as documented in https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewall.htm On the other hand, I'd initially speculate that you might be hitting a problem in mmnetverify itself. (perhaps some aspect in mmnetverify is not taking into account that ports other than 22, 1191, 60000-61000 may be getting blocked by the firewall) Could you open a PMR for this one? Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/19/2018 06:41 AM Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. 
So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Oct 19 14:39:25 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 19 Oct 2018 13:39:25 +0000 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls In-Reply-To: References: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Message-ID: Yeah we have the perfmon ports open, and GUI ports open on the GUI nodes. But basically this is just a storage cluster and everything else (protocols etc) run in remote clusters. I?ve just opened a ticket ? no longer a PMR in the new support centre for Scale Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 19 October 2018 at 14:05 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Spectrum Scale and Firewalls Simon, Depending on what functions are being used in Scale, other ports may also get used, as documented in https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewall.htm On the other hand, I'd initially speculate that you might be hitting a problem in mmnetverify itself. (perhaps some aspect in mmnetverify is not taking into account that ports other than 22, 1191, 60000-61000 may be getting blocked by the firewall) Could you open a PMR for this one? Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for Simon Thompson ---10/19/2018 06:41:27 AM---Hi, We?re having some issues bringing up firewalls on som]Simon Thompson ---10/19/2018 06:41:27 AM---Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actua From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/19/2018 06:41 AM Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. 
Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From Robert.Oesterlin at nuance.com Fri Oct 19 16:33:04 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 19 Oct 2018 15:33:04 +0000 Subject: [gpfsug-discuss] SC18 - User Group Meeting - Agenda and Registration Message-ID: <041D6114-8F12-463F-BFB5-ABF1A1834DA1@nuance.com> SC18 is only 3 weeks away! Here is the (more or less) final agenda for the user group meeting. SSUG @ SC18 Sunday, November 11th 12:30PM - 18:00 Omni Dallas Hotel 555 S Lamar Dallas, Texas Please register at the IBM site here: https://www-01.ibm.com/events/wwe/grp/grp305.nsf/Agenda.xsp?locale=en_US&openform=&seminar=2DQMNHES# Looking forward to seeing everyone in Dallas! Bob, Kristy, and Simon Start End Duration Title 12:30 12:45 15 Welcome 12:45 13:15 30 Spectrum Scale Update 13:15 13:30 15 ESS Update 13:30 13:45 15 Service Update 13:45 14:05 20 Lessons learned from a very unusual year (Kevin Buterbaugh, Vanderbilt) 14:05 14:25 20 Implementing a scratch filesystem with E8 Storage NVMe (Tom King, Queen Mary University of London) 14:25 14:45 20 Spectrum Scale and Containers (John Lewars, IBM) 14:45 15:10 25 Break 15:10 15:30 20 Best Practices for Protocol Nodes (Tomer Perry/Ulf Troppens, IBM) 15:30 15:50 20 Network Design Tomer Perry/Ulf Troppens, IBM/Mellanox) 15:50 16:10 20 AI Discussion 16:10 16:30 20 Improving Spark workload performance with Spectrum Conductor on Spectrum Scale (Chris Schlipalius, Pawsey Supercomputing Centre) 16:30 16:50 20 Spectrum Scale @ DDN ? Technical update (Sven Oehme, DDN) 16:50 17:10 20 Burst Buffer (Tom Goodings) 17:10 17:30 20 MetaData Management 17:30 17:45 15 Lenovo Update (Michael Hennecke, Lenovo) 17:45 18:00 15 Ask us anything 18:00 Social Event (at the hotel) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnaineni at in.ibm.com Mon Oct 22 01:25:50 2018 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Mon, 22 Oct 2018 00:25:50 +0000 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Oct 22 17:18:43 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 09:18:43 -0700 Subject: [gpfsug-discuss] GPFS, Pagepool and Block size -> Perfomance reduces with larger block size In-Reply-To: <243c5d36-f25e-4ebb-b9f3-6fc47bc6d93c@Spark> References: <6bb509b7-b7c5-422d-8e27-599333b6b7c4@Spark> <013aeb31-ebd2-4cc7-97d1-06883d9569f7@Spark> <243c5d36-f25e-4ebb-b9f3-6fc47bc6d93c@Spark> Message-ID: oops, somehow that slipped my inbox, i only saw that reply right now. its really hard to see from a trace snipped if the lock is the issue as the lower level locks don't show up in default traces. without having access to source code and a detailed trace you won't make much progress here. sven On Thu, Sep 27, 2018 at 12:31 PM wrote: > Thank you Sven, > > Turning of prefetching did not improve the performance, but it did degrade > a bit. > > I have made the prefetching default and took trace dump, for tracectl with > trace=io. Let me know if you want me to paste/attach it here. > > May i know, how could i confirm if the below is true? > > 1. this could be serialization around buffer locks. as larger your >>> blocksize gets as larger is the amount of data one of this pagepool buffers >>> will maintain, if there is a lot of concurrency on smaller amount of data >>> more threads potentially compete for the same buffer lock to copy stuff in >>> and out of a particular buffer, hence things go slower compared to the same >>> amount of data spread across more buffers, each of smaller size. >>> >>> > Will the above trace help in understanding if it is a serialization issue? > > I had been discussing the same with GPFS support for past few months, and > it seems to be that most of the time is being spent at cxiUXfer. They could > not understand on why it is taking spending so much of time in cxiUXfer. I > was seeing the same from perf top, and pagefaults. > > Below is snippet from what the support had said : > > ???????????????????????????? > > I searched all of the gpfsRead from trace and sort them by spending-time. > Except 2 reads which need fetch data from nsd server, the slowest read is > in the thread 72170. It took 112470.362 us. > > > trcrpt.2018-08-06_12.27.39.55538.lt15.trsum: 72165 6.860911319 > rdwr 141857.076 us + NSDIO > > trcrpt.2018-08-06_12.26.28.39794.lt15.trsum: 72170 1.483947593 > rdwr 112470.362 us + cxiUXfer > > trcrpt.2018-08-06_12.27.39.55538.lt15.trsum: 72165 6.949042593 > rdwr 88126.278 us + NSDIO > > trcrpt.2018-08-06_12.27.03.47706.lt15.trsum: 72156 2.919334474 > rdwr 81057.657 us + cxiUXfer > > trcrpt.2018-08-06_12.23.30.72745.lt15.trsum: 72154 1.167484466 > rdwr 76033.488 us + cxiUXfer > > trcrpt.2018-08-06_12.24.06.7508.lt15.trsum: 72187 0.685237501 > rdwr 70772.326 us + cxiUXfer > > trcrpt.2018-08-06_12.25.17.23989.lt15.trsum: 72193 4.757996530 > rdwr 70447.838 us + cxiUXfer > > > I check each of the slow IO as above, and find they all spend much time in > the function cxiUXfer. This function is used to copy data from kernel > buffer to user buffer. I am not sure why it took so much time. This should > be related to the pagefaults and pgfree you observed. Below is the trace > data for thread 72170. 
> > > 1.371477231 72170 TRACE_VNODE: gpfs_f_rdwr enter: fP > 0xFFFF882541649400 f_flags 0x8000 flags 0x8001 op 0 iovec > 0xFFFF881F2AFB3E70 count 1 offset 0x168F30D dentry 0xFFFF887C0CC298C0 > private 0xFFFF883F607175C0 iP 0xFFFF8823AA3CBFC0 name '410513.svs' > > .... > > 1.371483547 72170 TRACE_KSVFS: cachedReadFast exit: > uio_resid 16777216 code 1 err 11 > > .... > > 1.371498780 72170 TRACE_KSVFS: kSFSReadFast: oiP > 0xFFFFC90060B46740 offset 0x168F30D dataBufP FFFFC9003645A5A8 nDesc 64 buf > 200043C0000 valid words 64 dirty words 0 blkOff 0 > > 1.371499035 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate begin ul 0xFFFFC900333F1A40 holdCount 0 > ioType 0x2 inProg 0x15 > > 1.371500157 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate ul 0xFFFFC900333F1A40 holdCount 0 ioType 0x2 > inProg 0x16 err 0 > > 1.371500606 72170 TRACE_KSVFS: cxiUXfer: nDesc 64 1st > dataPtr 0x200043C0000 plP 0xFFFF887F7B90D600 toIOBuf 0 offset 6877965 len > 9899251 > > 1.371500793 72170 TRACE_KSVFS: cxiUXfer: ndesc 0 skip > dataAddrP 0x200043C0000 currOffset 0 currLen 262144 bufOffset 6877965 > > .... > > 1.371505949 72170 TRACE_KSVFS: cxiUXfer: ndesc 25 skip > dataAddrP 0x2001AF80000 currOffset 6553600 currLen 262144 bufOffset 6877965 > > 1.371506236 72170 TRACE_KSVFS: cxiUXfer: nDesc 26 > currOffset 6815744 tmpLen 262144 dataAddrP 0x2001AFCF30D currLen 199923 > pageOffset 781 pageLen 3315 plP 0xFFFF887F7B90D600 > > 1.373649823 72170 TRACE_KSVFS: cxiUXfer: nDesc 27 > currOffset 7077888 tmpLen 262144 dataAddrP 0x20027400000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.375158799 72170 TRACE_KSVFS: cxiUXfer: nDesc 28 > currOffset 7340032 tmpLen 262144 dataAddrP 0x20027440000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.376661566 72170 TRACE_KSVFS: cxiUXfer: nDesc 29 > currOffset 7602176 tmpLen 262144 dataAddrP 0x2002C180000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.377892653 72170 TRACE_KSVFS: cxiUXfer: nDesc 30 > currOffset 7864320 tmpLen 262144 dataAddrP 0x2002C1C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > .... 
> > 1.471389843 72170 TRACE_KSVFS: cxiUXfer: nDesc 62 > currOffset 16252928 tmpLen 262144 dataAddrP 0x2001D2C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.471845629 72170 TRACE_KSVFS: cxiUXfer: nDesc 63 > currOffset 16515072 tmpLen 262144 dataAddrP 0x2003EC80000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.472417149 72170 TRACE_KSVFS: cxiDetachIOBuffer: > dataPtr 0x200043C0000 plP 0xFFFF887F7B90D600 > > 1.472417775 72170 TRACE_LOCK: unlock_vfs: type Data, > key 0000000000000004:000000001B1F24BF:0000000000000001 lock_mode have ro > token xw lock_state old [ ro:27 ] new [ ro:26 ] holdCount now 27 > > 1.472418427 72170 TRACE_LOCK: hash tab lookup vfs: > found cP 0xFFFFC9005FC0CDE0 holdCount now 14 > > 1.472418592 72170 TRACE_LOCK: lock_vfs: type Data key > 0000000000000004:000000001B1F24BF:0000000000000002 lock_mode want ro status > valid token xw/xw lock_state [ ro:12 ] flags 0x0 holdCount 14 > > 1.472419842 72170 TRACE_KSVFS: kSFSReadFast: oiP > 0xFFFFC90060B46740 offset 0x2000000 dataBufP FFFFC9003643C908 nDesc 64 buf > 38033480000 valid words 64 dirty words 0 blkOff 0 > > 1.472420029 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate begin ul 0xFFFFC9005FC0CF98 holdCount 0 > ioType 0x2 inProg 0xC > > 1.472420187 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate ul 0xFFFFC9005FC0CF98 holdCount 0 ioType 0x2 > inProg 0xD err 0 > > 1.472420652 72170 TRACE_KSVFS: cxiUXfer: nDesc 64 1st > dataPtr 0x38033480000 plP 0xFFFF887F7B934320 toIOBuf 0 offset 0 len 6877965 > > 1.472420936 72170 TRACE_KSVFS: cxiUXfer: nDesc 0 > currOffset 0 tmpLen 262144 dataAddrP 0x38033480000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.472824790 72170 TRACE_KSVFS: cxiUXfer: nDesc 1 > currOffset 262144 tmpLen 262144 dataAddrP 0x380334C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.473243905 72170 TRACE_KSVFS: cxiUXfer: nDesc 2 > currOffset 524288 tmpLen 262144 dataAddrP 0x38024280000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > .... 
> > 1.482949347 72170 TRACE_KSVFS: cxiUXfer: nDesc 24 > currOffset 6291456 tmpLen 262144 dataAddrP 0x38025E80000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483354265 72170 TRACE_KSVFS: cxiUXfer: nDesc 25 > currOffset 6553600 tmpLen 262144 dataAddrP 0x38025EC0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483766631 72170 TRACE_KSVFS: cxiUXfer: nDesc 26 > currOffset 6815744 tmpLen 262144 dataAddrP 0x38003B00000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483943894 72170 TRACE_KSVFS: cxiDetachIOBuffer: > dataPtr 0x38033480000 plP 0xFFFF887F7B934320 > > 1.483944339 72170 TRACE_LOCK: unlock_vfs: type Data, > key 0000000000000004:000000001B1F24BF:0000000000000002 lock_mode have ro > token xw lock_state old [ ro:14 ] new [ ro:13 ] holdCount now 14 > > 1.483944683 72170 TRACE_BRL: brUnlockM: ofP > 0xFFFFC90069346B68 inode 455025855 snap 0 handle 0xFFFFC9003637D020 range > 0x168F30D-0x268F30C mode ro > > 1.483944985 72170 TRACE_KSVFS: kSFSReadFast exit: > uio_resid 0 err 0 > > 1.483945264 72170 TRACE_LOCK: unlock_vfs_m: type > Inode, key 305F105B9701E60A:000000001B1F24BF:0000000000000000 lock_mode > have ro status valid token rs lock_state old [ ro:25 ] new [ ro:24 ] > > 1.483945423 72170 TRACE_LOCK: unlock_vfs_m: cP > 0xFFFFC90069346B68 holdCount 25 > > 1.483945624 72170 TRACE_VNODE: gpfsRead exit: fast err > 0 > > 1.483946831 72170 TRACE_KSVFS: ReleSG: sli 38 sgP > 0xFFFFC90035E52F78 NotQuiesced vfsOp 2 > > 1.483946975 72170 TRACE_KSVFS: ReleSG: sli 38 sgP > 0xFFFFC90035E52F78 vfsOp 2 users 1-1 > > 1.483947116 72170 TRACE_KSVFS: ReleaseDaemonSegAndSG: > sli 38 count 2 needCleanup 0 > > 1.483947593 72170 TRACE_VNODE: gpfs_f_rdwr exit: fP > 0xFFFF882541649400 total_len 16777216 uio_resid 0 offset 0x268F30D rc 0 > > > ??????????????????????????????????????????? > > > > Regards, > Lohit > > On Sep 19, 2018, 3:11 PM -0400, Sven Oehme , wrote: > > the document primarily explains all performance specific knobs. general > advice would be to longer set anything beside workerthreads, pagepool and > filecache on 5.X systems as most other settings are no longer relevant > (thats a client side statement) . thats is true until you hit strange > workloads , which is why all the knobs are still there :-) > > sven > > > On Wed, Sep 19, 2018 at 11:17 AM wrote: > >> Thanks Sven. >> I will disable it completely and see how it behaves. >> >> Is this the presentation? >> >> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf >> >> I guess i read it, but it did not strike me at this situation. I will try >> to read it again and see if i could make use of it. >> >> Regards, >> Lohit >> >> On Sep 19, 2018, 2:12 PM -0400, Sven Oehme , wrote: >> >> seem like you never read my performance presentation from a few years ago >> ;-) >> >> you can control this on a per node basis , either for all i/o : >> >> prefetchAggressiveness = X >> >> or individual for reads or writes : >> >> prefetchAggressivenessRead = X >> prefetchAggressivenessWrite = X >> >> for a start i would turn it off completely via : >> >> mmchconfig prefetchAggressiveness=0 -I -N nodename >> >> that will turn it off only for that node and only until you restart the >> node. >> then see what happens >> >> sven >> >> >> On Wed, Sep 19, 2018 at 11:07 AM wrote: >> >>> Thank you Sven. >>> >>> I mostly think it could be 1. or some other issue. >>> I don?t think it could be 2. 
, because i can replicate this issue no >>> matter what is the size of the dataset. It happens for few files that could >>> easily fit in the page pool too. >>> >>> I do see a lot more page faults for 16M compared to 1M, so it could be >>> related to many threads trying to compete for the same buffer space. >>> >>> I will try to take the trace with trace=io option and see if can find >>> something. >>> >>> How do i turn of prefetching? Can i turn it off for a single >>> node/client? >>> >>> Regards, >>> Lohit >>> >>> On Sep 18, 2018, 5:23 PM -0400, Sven Oehme , wrote: >>> >>> Hi, >>> >>> taking a trace would tell for sure, but i suspect what you might be >>> hitting one or even multiple issues which have similar negative performance >>> impacts but different root causes. >>> >>> 1. this could be serialization around buffer locks. as larger your >>> blocksize gets as larger is the amount of data one of this pagepool buffers >>> will maintain, if there is a lot of concurrency on smaller amount of data >>> more threads potentially compete for the same buffer lock to copy stuff in >>> and out of a particular buffer, hence things go slower compared to the same >>> amount of data spread across more buffers, each of smaller size. >>> >>> 2. your data set is small'ish, lets say a couple of time bigger than the >>> pagepool and you random access it with multiple threads. what will happen >>> is that because it doesn't fit into the cache it will be read from the >>> backend. if multiple threads hit the same 16 mb block at once with multiple >>> 4k random reads, it will read the whole 16mb block because it thinks it >>> will benefit from it later on out of cache, but because it fully random the >>> same happens with the next block and the next and so on and before you get >>> back to this block it was pushed out of the cache because of lack of enough >>> pagepool. >>> >>> i could think of multiple other scenarios , which is why its so hard to >>> accurately benchmark an application because you will design a benchmark to >>> test an application, but it actually almost always behaves different then >>> you think it does :-) >>> >>> so best is to run the real application and see under which configuration >>> it works best. >>> >>> you could also take a trace with trace=io and then look at >>> >>> TRACE_VNOP: READ: >>> TRACE_VNOP: WRITE: >>> >>> and compare them to >>> >>> TRACE_IO: QIO: read >>> TRACE_IO: QIO: write >>> >>> and see if the numbers summed up for both are somewhat equal. if >>> TRACE_VNOP is significant smaller than TRACE_IO you most likely do more i/o >>> than you should and turning prefetching off might actually make things >>> faster . >>> >>> keep in mind i am no longer working for IBM so all i say might be >>> obsolete by now, i no longer have access to the one and only truth aka the >>> source code ... but if i am wrong i am sure somebody will point this out >>> soon ;-) >>> >>> sven >>> >>> >>> >>> >>> On Tue, Sep 18, 2018 at 10:31 AM wrote: >>> >>>> Hello All, >>>> >>>> This is a continuation to the previous discussion that i had with Sven. >>>> However against what i had mentioned previously - i realize that this >>>> is ?not? related to mmap, and i see it when doing random freads. >>>> >>>> I see that block-size of the filesystem matters when reading from Page >>>> pool. >>>> I see a major difference in performance when compared 1M to 16M, when >>>> doing lot of random small freads with all of the data in pagepool. >>>> >>>> Performance for 1M is a magnitude ?more? 
than the performance that i >>>> see for 16M. >>>> >>>> The GPFS that we have currently is : >>>> Version : 5.0.1-0.5 >>>> Filesystem version: 19.01 (5.0.1.0) >>>> Block-size : 16M >>>> >>>> I had made the filesystem block-size to be 16M, thinking that i would >>>> get the most performance for both random/sequential reads from 16M than the >>>> smaller block-sizes. >>>> With GPFS 5.0, i made use the 1024 sub-blocks instead of 32 and thus >>>> not loose lot of storage space even with 16M. >>>> I had run few benchmarks and i did see that 16M was performing better >>>> ?when hitting storage/disks? with respect to bandwidth for >>>> random/sequential on small/large reads. >>>> >>>> However, with this particular workload - where it freads a chunk of >>>> data randomly from hundreds of files -> I see that the number of >>>> page-faults increase with block-size and actually reduce the performance. >>>> 1M performs a lot better than 16M, and may be i will get better >>>> performance with less than 1M. >>>> It gives the best performance when reading from local disk, with 4K >>>> block size filesystem. >>>> >>>> What i mean by performance when it comes to this workload - is not the >>>> bandwidth but the amount of time that it takes to do each iteration/read >>>> batch of data. >>>> >>>> I figure what is happening is: >>>> fread is trying to read a full block size of 16M - which is good in a >>>> way, when it hits the hard disk. >>>> But the application could be using just a small part of that 16M. Thus >>>> when randomly reading(freads) lot of data of 16M chunk size - it is page >>>> faulting a lot more and causing the performance to drop . >>>> I could try to make the application do read instead of freads, but i >>>> fear that could be bad too since it might be hitting the disk with a very >>>> small block size and that is not good. >>>> >>>> With the way i see things now - >>>> I believe it could be best if the application does random reads of >>>> 4k/1M from pagepool but some how does 16M from rotating disks. >>>> >>>> I don?t see any way of doing the above other than following a different >>>> approach where i create a filesystem with a smaller block size ( 1M or less >>>> than 1M ), on SSDs as a tier. >>>> >>>> May i please ask for advise, if what i am understanding/seeing is right >>>> and the best solution possible for the above scenario. >>>> >>>> Regards, >>>> Lohit >>>> >>>> On Apr 11, 2018, 10:36 AM -0400, Lohit Valleru , >>>> wrote: >>>> >>>> Hey Sven, >>>> >>>> This is regarding mmap issues and GPFS. >>>> We had discussed previously of experimenting with GPFS 5. >>>> >>>> I now have upgraded all of compute nodes and NSD nodes to GPFS 5.0.0.2 >>>> >>>> I am yet to experiment with mmap performance, but before that - I am >>>> seeing weird hangs with GPFS 5 and I think it could be related to mmap. >>>> >>>> Have you seen GPFS ever hang on this syscall? >>>> [Tue Apr 10 04:20:13 2018] [] >>>> _ZN10gpfsNode_t8mmapLockEiiPKj+0xb5/0x140 [mmfs26] >>>> >>>> I see the above ,when kernel hangs and throws out a series of trace >>>> calls. >>>> >>>> I somehow think the above trace is related to processes hanging on GPFS >>>> forever. There are no errors in GPFS however. >>>> >>>> Also, I think the above happens only when the mmap threads go above a >>>> particular number. >>>> >>>> We had faced a similar issue in 4.2.3 and it was resolved in a patch to >>>> 4.2.3.2 . At that time , the issue happened when mmap threads go more than >>>> worker1threads. 
According to the ticket - it was a mmap race condition that >>>> GPFS was not handling well. >>>> >>>> I am not sure if this issue is a repeat and I am yet to isolate the >>>> incident and test with increasing number of mmap threads. >>>> >>>> I am not 100 percent sure if this is related to mmap yet but just >>>> wanted to ask you if you have seen anything like above. >>>> >>>> Thanks, >>>> >>>> Lohit >>>> >>>> On Feb 22, 2018, 3:59 PM -0500, Sven Oehme , wrote: >>>> >>>> Hi Lohit, >>>> >>>> i am working with ray on a mmap performance improvement right now, >>>> which most likely has the same root cause as yours , see --> >>>> http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html >>>> the thread above is silent after a couple of back and rorth, but ray >>>> and i have active communication in the background and will repost as soon >>>> as there is something new to share. >>>> i am happy to look at this issue after we finish with ray's workload if >>>> there is something missing, but first let's finish his, get you try the >>>> same fix and see if there is something missing. >>>> >>>> btw. if people would share their use of MMAP , what applications they >>>> use (home grown, just use lmdb which uses mmap under the cover, etc) please >>>> let me know so i get a better picture on how wide the usage is with GPFS. i >>>> know a lot of the ML/DL workloads are using it, but i would like to know >>>> what else is out there i might not think about. feel free to drop me a >>>> personal note, i might not reply to it right away, but eventually. >>>> >>>> thx. sven >>>> >>>> >>>> On Thu, Feb 22, 2018 at 12:33 PM wrote: >>>> >>>>> Hi all, >>>>> >>>>> I wanted to know, how does mmap interact with GPFS pagepool with >>>>> respect to filesystem block-size? >>>>> Does the efficiency depend on the mmap read size and the block-size of >>>>> the filesystem even if all the data is cached in pagepool? >>>>> >>>>> GPFS 4.2.3.2 and CentOS7. >>>>> >>>>> Here is what i observed: >>>>> >>>>> I was testing a user script that uses mmap to read from 100M to 500MB >>>>> files. >>>>> >>>>> The above files are stored on 3 different filesystems. >>>>> >>>>> Compute nodes - 10G pagepool and 5G seqdiscardthreshold. >>>>> >>>>> 1. 4M block size GPFS filesystem, with separate metadata and data. >>>>> Data on Near line and metadata on SSDs >>>>> 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the >>>>> required files fully cached" from the above GPFS cluster as home. Data and >>>>> Metadata together on SSDs >>>>> 3. 16M block size GPFS filesystem, with separate metadata and data. >>>>> Data on Near line and metadata on SSDs >>>>> >>>>> When i run the script first time for ?each" filesystem: >>>>> I see that GPFS reads from the files, and caches into the pagepool as >>>>> it reads, from mmdiag -- iohist >>>>> >>>>> When i run the second time, i see that there are no IO requests from >>>>> the compute node to GPFS NSD servers, which is expected since all the data >>>>> from the 3 filesystems is cached. >>>>> >>>>> However - the time taken for the script to run for the files in the 3 >>>>> different filesystems is different - although i know that they are just >>>>> "mmapping"/reading from pagepool/cache and not from disk. >>>>> >>>>> Here is the difference in time, for IO just from pagepool: >>>>> >>>>> 20s 4M block size >>>>> 15s 1M block size >>>>> 40S 16M block size. 
>>>>> >>>>> Why do i see a difference when trying to mmap reads from different >>>>> block-size filesystems, although i see that the IO requests are not hitting >>>>> disks and just the pagepool? >>>>> >>>>> I am willing to share the strace output and mmdiag outputs if needed. >>>>> >>>>> Thanks, >>>>> Lohit >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Oct 22 16:21:06 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 22 Oct 2018 15:21:06 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> Message-ID: <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> It seems like the primary way that this helps us is that we transfer user home directories and many of them have VERY large numbers of small files (in the millions), so running multiple simultaneous rsyncs allows the transfer to continue past that one slow area. I guess it balances the bandwidth constraint and the I/O constraints on generating a file list. There are unfortunately one or two known bugs that slow it down ? it keeps track of its rsync PIDs but sometimes a former rsync PID is reused by the system which it counts against the number of running rsyncs. It can also think rsync is still running at the end when it?s really something else now using the PID. I know the author is looking at that. For shorter transfers, you likely won?t run into this. I?m not sure I have the time or the programming ability to make this happen, but it seems to me that one could make some major gains by replacing fpart with mmfind in a GPFS environment. 
Generating lists of files takes a significant amount of time and mmfind can probably do it faster than anything else that does not have direct access to GPFS metadata. > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > Thank you Ryan. I?ll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski wrote: >> >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> We use parsyncfp. Our target is not GPFS, though. I was really hoping >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably >> tell you that HSM is the way to go (we asked something similar for a >> replacement for our current setup or for distributed storage). >> >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: >>> Hi, >>> >>> Just wondering what the best recipe for migrating a user?s home >>> directory content from one GFPS file system to another which hosts >>> a larger research GPFS file system? I?m currently using rsync and >>> it has maxed out the client system?s IB interface. >>> >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV >>> >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L >>> Dobbin Building | 4M409 T 709 864 6631 >>> _______________________________________________ gpfsug-discuss >>> mailing list gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> - -- >> ____ >> || \\UTGERS, |----------------------*O*------------------------ >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu >> || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus >> || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark >> `' >> -----BEGIN PGP SIGNATURE----- >> >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk >> =dMDg >> -----END PGP SIGNATURE----- >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Mon Oct 22 19:11:06 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 11:11:06 -0700 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: i am not sure if that was mentioned already but in some version of V5.0.X based on my suggestion a tool was added by mark on a AS-IS basis (thanks mark) to do what you want with one exception : /usr/lpp/mmfs/samples/ilm/mmxcp -h Usage: /usr/lpp/mmfs/samples/ilm/mmxcp -t target -p strip_count source_pathname1 source_pathname2 ... Run "cp" in a mmfind ... -xarg ... pipeline, e.g. 
mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight DIRECTORY_HASH -xargs mmxcp -t /target -p 2 Options: -t target_path : Copy files to this path. -p strip_count : Remove this many directory names from the pathnames of the source files. -a : pass -a to cp -v : pass -v to cp this is essentially a parallel copy tool using the policy with all its goddies. the one critical part thats missing is that it doesn't copy any GPFS specific metadata which unfortunate includes NFSV4 ACL's. the reason for that is that GPFS doesn't expose the NFSV4 ACl's via xattrs nor does any of the regular Linux tools uses the proprietary interface into GPFS to extract and apply them (this is what allows this magic unsupported version of rsync https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync to transfer the acls and other attributes). so a worth while RFE would be to either expose all special GPFS bits as xattrs or provide at least a maintained version of sync, cp or whatever which allows the transfer of this data. Sven On Mon, Oct 22, 2018 at 10:52 AM Ryan Novosielski wrote: > It seems like the primary way that this helps us is that we transfer user > home directories and many of them have VERY large numbers of small files > (in the millions), so running multiple simultaneous rsyncs allows the > transfer to continue past that one slow area. I guess it balances the > bandwidth constraint and the I/O constraints on generating a file list. > There are unfortunately one or two known bugs that slow it down ? it keeps > track of its rsync PIDs but sometimes a former rsync PID is reused by the > system which it counts against the number of running rsyncs. It can also > think rsync is still running at the end when it?s really something else now > using the PID. I know the author is looking at that. For shorter transfers, > you likely won?t run into this. > > I?m not sure I have the time or the programming ability to make this > happen, but it seems to me that one could make some major gains by > replacing fpart with mmfind in a GPFS environment. Generating lists of > files takes a significant amount of time and mmfind can probably do it > faster than anything else that does not have direct access to GPFS metadata. > > > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > > > Thank you Ryan. I?ll have a more in-depth look at this application later > today and see how it deals with some of the large genetic files that are > generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > > > Best, > > Dwayne > > ? > > Dwayne Hart | Systems Administrator IV > > > > CHIA, Faculty of Medicine > > Memorial University of Newfoundland > > 300 Prince Philip Drive > > St. John?s, Newfoundland | A1B 3V6 > > Craig L Dobbin Building | 4M409 > > T 709 864 6631 <(709)%20864-6631> > > > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski > wrote: > >> > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> We use parsyncfp. Our target is not GPFS, though. I was really hoping > >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably > >> tell you that HSM is the way to go (we asked something similar for a > >> replacement for our current setup or for distributed storage). > >> > >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: > >>> Hi, > >>> > >>> Just wondering what the best recipe for migrating a user?s home > >>> directory content from one GFPS file system to another which hosts > >>> a larger research GPFS file system? 
I?m currently using rsync and > >>> it has maxed out the client system?s IB interface. > >>> > >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV > >>> > >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 > >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L > >>> Dobbin Building | 4M409 T 709 864 6631 <(709)%20864-6631> > >>> _______________________________________________ gpfsug-discuss > >>> mailing list gpfsug-discuss at spectrumscale.org > >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >>> > >> > >> - -- > >> ____ > >> || \\UTGERS, |----------------------*O*------------------------ > >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > >> || \\ University | Sr. Technologist - 973/972.0922 <(973)%20972-0922> > ~*~ RBHS Campus > >> || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark > >> `' > >> -----BEGIN PGP SIGNATURE----- > >> > >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG > >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk > >> =dMDg > >> -----END PGP SIGNATURE----- > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Oct 22 21:08:49 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 22 Oct 2018 16:08:49 -0400 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca><92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu><3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Oct 22 21:15:52 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Oct 2018 20:15:52 +0000 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca><92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu><3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> , Message-ID: Can you use mmxcp with output from tsbuhelper? Becuase this would actually be a pretty good way of doing incrementals when deploying a new storage system (unless IBM wants to let us add new storage and change the block size.... Someday maybe...) Though until mmxcp supports ACLs, it's still not really a solution I guess. 
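For illustration only - a rough, untested sketch (paths made up) of a per-file workaround in the meantime, using mmgetacl/mmputacl to carry the NFSv4 ACL across after a plain cp:

# untested sketch: copy one file, then re-apply its NFSv4 ACL on the target
SRC=/gpfs/oldfs/home/user1/file1      # placeholder source path
DST=/gpfs/newfs/home/user1/file1      # placeholder destination path
tmpacl=$(mktemp)
cp -a "$SRC" "$DST"                   # data, owner, mode, timestamps
mmgetacl -k nfs4 -o "$tmpacl" "$SRC"  # dump the source ACL to a file
mmputacl -i "$tmpacl" "$DST"          # apply it to the copy
rm -f "$tmpacl"

Doing that per file is obviously slow compared to a proper option in mmxcp, but it at least keeps the ACLs intact.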
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of makaplan at us.ibm.com [makaplan at us.ibm.com] Sent: 22 October 2018 21:08 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K From oehmes at gmail.com Mon Oct 22 21:33:17 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 13:33:17 -0700 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: Marc, The issue with that is that you need multiple passes and things change in between, it also significant increases migration times. You will always miss something or you need to manually correct. The right thing is to have 1 tool that takes care of both, the bulk transfer and the additional attributes. Sven From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Monday, October 22, 2018 at 1:09 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Oct 22 22:15:17 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 22 Oct 2018 17:15:17 -0400 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca><92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu><3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: Just copy the extra attributes and ACL copy immediately after the cp. The window will be small, and if you think about it, the window of vulnerability is going to be there with a hacked rsync anyhow. There need not be any additional "passes". Once you put it into a single script, you have "one tool". From: Sven Oehme To: gpfsug main discussion list Date: 10/22/2018 04:33 PM Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Sent by: gpfsug-discuss-bounces at spectrumscale.org Marc, The issue with that is that you need multiple passes and things change in between, it also significant increases migration times. You will always miss something or you need to manually correct. 
The right thing is to have 1 tool that takes care of both, the bulk transfer and the additional attributes. Sven From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Monday, October 22, 2018 at 1:09 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Oct 23 00:45:05 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 22 Oct 2018 16:45:05 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> Message-ID: <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. The current agenda is: 8:45 AM 9:00 AM Coffee & Registration Presenter 9:00 AM 9:15 AM Welcome Amy Hirst & Chris Black 9:15 AM 9:45 AM What is new in IBM Spectrum Scale? Piyush Chaudhary 9:45 AM 10:00 AM What is new in ESS? John Sing 10:00 AM 10:20 AM How does CORAL help other workloads? Kevin Gildea 10:20 AM 10:40 AM Break 10:40 AM 11:00 AM Customer Talk ? The New York Genome Center Chris Black 11:00 AM 11:20 AM Spinning up a Hadoop cluster on demand Piyush Chaudhary 11:20 AM 11:40 AM Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione 11:40 AM 12:00 PM AI Reference Architecture Piyush Chaudhary 12:00 PM 12:50 PM Lunch 12:50 PM 1:30 PM Special Talk Joe Dain 1:30 PM 1:50 PM Multi-cloud Transparent Cloud Tiering Rob Basham 1:50 PM 2:10 PM Customer Talk ? Princeton University Curtis W. Hillegas 2:10 PM 2:30 PM Updates on Container Support John Lewars 2:30 PM 2:50 PM Customer Talk ? NYU Michael Costantino 2:50 PM 3:10 PM Spectrum Archive and TS1160 Carl Reasoner 3:10 PM 3:30 PM Break 3:30 PM 4:10 PM IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop 4:10 PM 4:40 PM Service Update Jim Doherty 4:40 PM 5:10 PM Open Forum 5:10 PM 5:30 PM Wrap-Up Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: > > For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. > > Spectrum Scale User Group ? 
NYC > October 24th, 2018 > The New York Genome Center > 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium > > Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 > > 08:45-09:00 Coffee & Registration > 09:00-09:15 Welcome > 09:15-09:45 What is new in IBM Spectrum Scale? > 09:45-10:00 What is new in ESS? > 10:00-10:20 How does CORAL help other workloads? > 10:20-10:40 --- Break --- > 10:40-11:00 Customer Talk ? The New York Genome Center > 11:00-11:20 Spinning up a Hadoop cluster on demand > 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine > 11:40-12:10 Spectrum Scale Network Flow > 12:10-13:00 --- Lunch --- > 13:00-13:40 Special Announcement and Demonstration > 13:40-14:00 Multi-cloud Transparent Cloud Tiering > 14:00-14:20 Customer Talk ? Princeton University > 14:20-14:40 AI Reference Architecture > 14:40-15:00 Updates on Container Support > 15:00-15:20 Customer Talk ? TBD > 15:20-15:40 --- Break --- > 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting > 16:10-16:40 Service Update > 16:40-17:10 Open Forum > 17:10-17:30 Wrap-Up > 17:30- Social Event > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Tue Oct 23 01:01:41 2018 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Tue, 23 Oct 2018 08:01:41 +0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 81, Issue 44 In-Reply-To: References: Message-ID: <8F05B8A3-B950-46E1-8711-2A5CC6D62BDA@pawsey.org.au> Hi So when we have migrated 1.6PB of data from one GPFS filesystems to another GPFS (over IB), we used dcp in github (with mmdsh). It just can be problematic to compile. I have used rsync with attrib and ACLs?s preserved in my previous job ? aka rsync -aAvz But DCP parallelises better, checksumming files and dirs. works and we used that to ensure nothing was lost. Worth a go! Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 13 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au On 23/10/18, 4:08 am, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Best way to migrate data (Ryan Novosielski) 2. Re: Best way to migrate data (Sven Oehme) 3. Re: Best way to migrate data : mmfind ... 
mmxcp (Marc A Kaplan)

----------------------------------------------------------------------

Message: 3
Date: Mon, 22 Oct 2018 16:08:49 -0400
From: "Marc A Kaplan"
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp
Message-ID:
Content-Type: text/plain; charset="us-ascii"

Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files.

SMOP and then add it as an option to samples/ilm/mmxcp

Sorry I haven't gotten around to doing this ... Seems like a modest sized project...
Avoids boiling the ocean and reinventing or hacking rsync. -- marc K -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 81, Issue 44 ********************************************** From Alexander.Saupp at de.ibm.com Tue Oct 23 06:51:54 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Tue, 23 Oct 2018 07:51:54 +0200 Subject: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync Message-ID: Hi, I agree, a tool with proper wrapping delivered in samples would be the right approach. No warranty, no support - below a prototype I documented 2 years ago (prior to mmfind availability). The BP used an alternate approach, so its not tested at scale, but the principle was tested and works. Reading through it right now I'd re-test the 'deleted files on destination that were deleted on the source' scenario, that might now require some fixing. # Use 'GPFS patched' rsync on both ends to keep GPFS attributes https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync # Policy - initial & differential (add mod_time > .. for incremental runs. Use MOD_TIME < .. to have a defined start for the next incremental rsync, remove it for the 'final' rsync) # http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_usngfileattrbts.htm cat /tmp/policy.pol RULE 'mmfind' ??? LIST 'mmfindList' ??? DIRECTORIES_PLUS ??? SHOW( ????????? VARCHAR(MODE) || ' ' || ????????? VARCHAR(NLINK) || ' ' || ????????? VARCHAR(USER_ID) || ' ' || ????????? VARCHAR(GROUP_ID) || ' ' || ????????? VARCHAR(FILE_SIZE) || ' ' || ????????? VARCHAR(KB_ALLOCATED) || ' ' || ????????? VARCHAR(POOL_NAME) || ' ' || ????????? VARCHAR(MISC_ATTRIBUTES) || ' ' || ????????? VARCHAR(ACCESS_TIME) || ' ' || ????????? VARCHAR(CREATION_TIME) || ' ' || ????????? VARCHAR(MODIFICATION_TIME) ??????? ) # First run ??? WHERE MODIFICATION_TIME < TIMESTAMP('2016-08-10 00:00:00') # Incremental runs ??? WHERE MODIFICATION_TIME > TIMESTAMP('2016-08-10 00:00:00') and MODIFICATION_TIME < TIMESTAMP('2016-08-20 00:00:00') # Final run during maintenance, should also do deletes, ensure you to call rsync the proper way (--delete) ??? WHERE TRUE # Apply policy, defer will ensure the result file(s) are not deleted mmapplypolicy? group3fs -P /tmp/policy.pol? -f /ibm/group3fs/pol.txt -I defer # FYI only - look at results, ... not required # cat /ibm/group3fs/pol.txt.list.mmfindList 3 1 0? drwxr-xr-x 4 0 0 262144 512 system D2u 2016-08-25 08:30:35.053057 -- /ibm/group3fs 41472 1077291531 0? drwxr-xr-x 5 0 0 4096 0 system D2u 2016-08-18 21:07:36.996777 -- /ibm/group3fs/ces 60416 842873924 0? drwxr-xr-x 4 0 0 4096 0 system D2u 2016-08-18 21:07:45.947920 -- /ibm/group3fs/ces/ha 60417 2062486126 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-19 15:17:57.428922 -- /ibm/group3fs/ces/ha/.dummy 60418 436745294 0? drwxr-xr-x 4 0 0 4096 0 system D2u 2016-08-18 21:05:54.482094 -- /ibm/group3fs/ces/ces 60419 647668346 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-19 15:17:57.484923 -- /ibm/group3fs/ces/ces/.dummy 60420 1474765985 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-18 21:06:43.133640 -- /ibm/group3fs/ces/ces/addrs/1471554403-node0-9.155.118.69 60421 1020724013 0? 
drwxr-xr-x 2 0 0 4096 0 system D2um 2016-08-18 21:07:37.000695 -- /ibm/group3fs/ces/ganesha cat /ibm/group3fs/pol.txt.list.mmfindList? |awk ' { print $19}' /ibm/group3fs/ces/ha/.dummy /ibm/group3fs/ces/ces/.dummy /ibm/group3fs/ces/ha/nfs/ganesha/v4recov/node3 /ibm/group3fs/ces/ha/nfs/ganesha/v4old/node3 /ibm/group3fs/pol.txt.list.mmfindList /ibm/group3fs/ces/ces/connections /ibm/group3fs/ces/ha/nfs/ganesha/gpfs-epoch /ibm/group3fs/ces/ha/nfs/ganesha/v4recov /ibm/group3fs/ces/ha/nfs/ganesha/v4old # Start rsync - could split up single result file into multiple ones for parallel / multi node runs rsync -av --gpfs-attrs --progress --files-from $ ( cat /ibm/group3fs/pol.txt.list.mmfindList ) 10.10.10.10:/path Be sure you verify that extended attributes are properly replicated. I have in mind that you need to ensure the 'remote' rsync is not the default one, but the one with GPFS capabilities (rsync -e "remoteshell"). Kind regards, Alex Saupp Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C800025.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Oct 23 09:31:03 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 23 Oct 2018 08:31:03 +0000 Subject: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync In-Reply-To: References: Message-ID: I should note, there is a PR there which adds symlink support as well to the patched rsync version ? It is quite an old version of rsync now, and I don?t know if it?s been tested with a newer release. Simon From: on behalf of "Alexander.Saupp at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 23 October 2018 at 06:52 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync # Use 'GPFS patched' rsync on both ends to keep GPFS attributes https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Wed Oct 24 13:43:23 2018 From: george at markomanolis.com (George Markomanolis) Date: Wed, 24 Oct 2018 08:43:23 -0400 Subject: [gpfsug-discuss] IO500 - Call for Submission for SC18 Message-ID: Dear all, Please consider the submission of results to the new list. Deadline: 10 November 2018 AoE The IO500 is now accepting and encouraging submissions for the upcoming IO500 list revealed at Supercomputing 2018 in Dallas, Texas. We also announce the 10 compute node I/O challenge to encourage submission of small-scale results. The new ranked lists will be announced at our SC18 BOF on Wednesday, November 14th at 5:15pm. We hope to see you, and your results, there. 
The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC 2018! Please note that submissions of all sizes are welcome; the site has customizable sorting, so it is possible to submit on a small system and still get a very good per-client score, for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below.

Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017 and published its first list at SC17. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking.

The multi-fold goals of the benchmark suite are as follows:
- Maximizing simplicity in running the benchmark suite
- Encouraging complexity in tuning for performance
- Allowing submitters to highlight their "hero run" performance numbers
- Forcing submitters to simultaneously report performance for challenging IO patterns

Specifically, the benchmark suite includes a hero run of both IOR and mdtest, configured however possible to maximize performance and establish an upper bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower bound. Finally, it includes a namespace search, as this has been determined to be a highly sought-after feature in HPC storage systems that has historically not been well measured. Submitters are encouraged to share their tuning insights for publication.

The goals of the community are also multi-fold:
- Gather historical data for the sake of analysis and to aid predictions of storage futures
- Collect tuning information to share valuable performance optimizations across the community
- Encourage vendors and designers to optimize for workloads beyond "hero runs"
- Establish bounded expectations for users, procurers, and administrators

10 Compute Node I/O Challenge

At SC, we will announce another IO-500 award for the 10 Compute Node I/O Challenge. This challenge is conducted using the regular IO-500 benchmark, however with the rule that exactly 10 compute nodes must be used to run the benchmark (one exception is find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. When submitting for the IO-500 list, you can opt in for "Participate in the 10 compute node challenge only", in which case we won't include the results in the ranked list. Other 10 compute node submissions will be included in the full list and in the ranked list. We will announce the result in a separate derived list and in the full list, but not on the ranked IO-500 list at io500.org.

Birds-of-a-feather

Once again, we encourage you to submit [1], to join our community, and to attend our BoF "The IO-500 and the Virtual Institute of I/O" at SC 2018 [2], where we will announce the third ever IO500 list. The current list includes results from BeeGFS, DataWarp, IME, Lustre, and Spectrum Scale. We hope that the next list has even more. We look forward to answering any questions or concerns you might have.
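For anyone who has not run the suite before, the heavy lifting is done by IOR and mdtest driven by the io500 harness. Purely as an illustration of the kind of commands the harness issues (these are not the prescribed parameters - the official scripts from the submission page [1] set those for you, and the process count, MPI launcher, and file system paths below are placeholders you would adapt to your site), a hero-style pair of runs looks roughly like:

# "easy" write/read phase: file-per-process IOR with a large block size
mpirun -np 160 ior -w -r -C -Q 1 -t 1m -b 8g -F -o /gpfs/fs0/io500/ior_easy/testfile
# "easy" metadata phase: mdtest creating files in per-task directories
mpirun -np 160 mdtest -n 10000 -u -L -d /gpfs/fs0/io500/mdt_easy

The harness wraps runs of this kind, adds the prescribed "hard" runs and the namespace find phase, and derives the final score from the individual results.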
[1] http://io500.org/submission
[2] https://sc18.supercomputing.org/presentation/?id=bof134&sess=sess390

From S.J.Thompson at bham.ac.uk Wed Oct 24 21:53:21 2018
From: S.J.Thompson at bham.ac.uk (Simon Thompson)
Date: Wed, 24 Oct 2018 20:53:21 +0000
Subject: [gpfsug-discuss] Spectrum Scale User Group@CIUK - call for user speakers
Message-ID:

Hi All,

I know December is a little way off, but as usual we'll be holding a Spectrum Scale user group breakout session as part of CIUK here in the UK in December. As a breakout session it's only a couple of hours...

We're just looking at the agenda. I have a couple of IBM sessions in, and Sven has agreed to give a talk as he'll be there as well. I'm looking for a couple of user talks to finish off the agenda. Whether you are a small deployment or large, we're interested in hearing from you!

Note: you must be registered to attend CIUK to attend this user group. Registration is via the CIUK website:
https://www.scd.stfc.ac.uk/Pages/CIUK2018.aspx

Simon

From stefan.dietrich at desy.de Thu Oct 25 13:12:07 2018
From: stefan.dietrich at desy.de (Dietrich, Stefan)
Date: Thu, 25 Oct 2018 14:12:07 +0200 (CEST)
Subject: [gpfsug-discuss] Nested NFSv4 Exports
Message-ID: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de>

Hi,

I am currently fiddling around with some nested NFSv4 exports and the differing behaviour compared to NFSv3. The environment is GPFS 5.0.1 with CES enabled, so Ganesha is used as the NFS server.

Given the following (pseudo) directory structure:

/gpfs/filesystem1/directory1
/gpfs/filesystem1/directory1/sub-directory1
/gpfs/filesystem1/directory1/sub-directory2

Now to the exports:
/gpfs/filesystem1/directory1 is exported to client1 as read-only.
/gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as read-write.
client2 is not included in the export for /gpfs/filesystem1/directory1.

Mounting /gpfs/filesystem1/directory1 on client1 works as expected.
Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work and results in a permission denied. If I change the protocol from NFSv4 to NFSv3, it works.

There is a section about nested NFS exports in the mmnfs doc:
Creating nested exports (such as /path/to/folder and /path/to/folder/subfolder) is strongly discouraged since this might lead to serious issues in data consistency. Be very cautious when creating and using nested exports. If there is a need to have nested exports (such as /path/to/folder and /path/to/folder/inside/subfolder), NFSv4 client that mounts the parent (/path/to/folder) export will not be able to see the child export subtree (/path/to/folder/inside/subfolder) unless the same client is explicitly allowed to access the child export as well. This is okay as long as the client uses only NFSv4 mounts.

The Linux kernel NFS server and other NFSv4 servers do not show this behaviour. Is there a way to bypass this with CES/Ganesha? Or is the only solution to add client2 to /gpfs/filesystem1/directory1?

Regards,
Stefan

--
------------------------------------------------------------------------
Stefan Dietrich    Deutsches Elektronen-Synchrotron (IT-Systems)
                   Ein Forschungszentrum der Helmholtz-Gemeinschaft
                   Notkestr.
85 phone: +49-40-8998-4696 22607 Hamburg e-mail: stefan.dietrich at desy.de Germany ------------------------------------------------------------------------ From dyoung at pixitmedia.com Thu Oct 25 17:59:08 2018 From: dyoung at pixitmedia.com (Dan Young) Date: Thu, 25 Oct 2018 12:59:08 -0400 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_C?= =?utf-8?q?enter?= In-Reply-To: <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want > to attend, use the link below. > > *The current agenda is:* > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > > 10:20 AM > 10:40 AM > Break > > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > > 12:00 PM > 12:50 PM > Lunch > > 12:50 PM > 1:30 PM > Special Talk Joe Dain > > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > > 2:30 PM > 2:50 PM > Customer Talk ? NYU Michael Costantino > > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > > 3:10 PM > 3:30 PM > Break > > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe > Knop > > 4:10 PM > 4:40 PM > Service Update Jim Doherty > > 4:40 PM > 5:10 PM > Open Forum > > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert < > Robert.Oesterlin at nuance.com> wrote: > > For those of you in the NE US or NYC area, here is the agenda for the NYC > meeting coming up on October 24th. Special thanks to Richard Rupp at IBM > for helping to organize this event. If you can make it, please register at > the Eventbrite link below. > > Spectrum Scale User Group ? NYC > October 24th, 2018 > The New York Genome Center > 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium > > Register Here: > https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 > > 08:45-09:00 Coffee & Registration > 09:00-09:15 Welcome > 09:15-09:45 What is new in IBM Spectrum Scale? > 09:45-10:00 What is new in ESS? > 10:00-10:20 How does CORAL help other workloads? > 10:20-10:40 --- Break --- > 10:40-11:00 Customer Talk ? The New York Genome Center > 11:00-11:20 Spinning up a Hadoop cluster on demand > 11:20-11:40 Customer Talk ? Mt. 
Sinai School of Medicine > 11:40-12:10 Spectrum Scale Network Flow > 12:10-13:00 --- Lunch --- > 13:00-13:40 Special Announcement and Demonstration > 13:40-14:00 Multi-cloud Transparent Cloud Tiering > 14:00-14:20 Customer Talk ? Princeton University > 14:20-14:40 AI Reference Architecture > 14:40-15:00 Updates on Container Support > 15:00-15:20 Customer Talk ? TBD > 15:20-15:40 --- Break --- > 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting > 16:10-16:40 Service Update > 16:40-17:10 Open Forum > 17:10-17:30 Wrap-Up > 17:30- Social Event > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Dan Young* Solutions Architect, Pixit Media +1-347-249-7413 | dyoung at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Oct 25 18:01:39 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 25 Oct 2018 10:01:39 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Checking? -Kristy > On Oct 25, 2018, at 9:59 AM, Dan Young wrote: > > Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. > > On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose > wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. > > The current agenda is: > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > 10:20 AM > 10:40 AM > Break > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > 12:00 PM > 12:50 PM > Lunch > 12:50 PM > 1:30 PM > Special Talk Joe Dain > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > 2:30 PM > 2:50 PM > Customer Talk ? 
NYU Michael Costantino > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > 3:10 PM > 3:30 PM > Break > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop > 4:10 PM > 4:40 PM > Service Update Jim Doherty > 4:40 PM > 5:10 PM > Open Forum > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > >> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert > wrote: >> >> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >> >> Spectrum Scale User Group ? NYC >> October 24th, 2018 >> The New York Genome Center >> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >> >> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >> >> 08:45-09:00 Coffee & Registration >> 09:00-09:15 Welcome >> 09:15-09:45 What is new in IBM Spectrum Scale? >> 09:45-10:00 What is new in ESS? >> 10:00-10:20 How does CORAL help other workloads? >> 10:20-10:40 --- Break --- >> 10:40-11:00 Customer Talk ? The New York Genome Center >> 11:00-11:20 Spinning up a Hadoop cluster on demand >> 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine >> 11:40-12:10 Spectrum Scale Network Flow >> 12:10-13:00 --- Lunch --- >> 13:00-13:40 Special Announcement and Demonstration >> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >> 14:00-14:20 Customer Talk ? Princeton University >> 14:20-14:40 AI Reference Architecture >> 14:40-15:00 Updates on Container Support >> 15:00-15:20 Customer Talk ? TBD >> 15:20-15:40 --- Break --- >> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >> 16:10-16:40 Service Update >> 16:40-17:10 Open Forum >> 17:10-17:30 Wrap-Up >> 17:30- Social Event >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Dan Young > Solutions Architect, Pixit Media > +1-347-249-7413 | dyoung at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia > > This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From novosirj at rutgers.edu Fri Oct 26 01:54:13 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 26 Oct 2018 00:54:13 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: What they said was ?spectrumscale.org?. I suspect they?ll wind up here: http://www.spectrumscaleug.org/presentations/ -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Oct 25, 2018, at 12:59 PM, Dan Young wrote: > > Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. > > On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. > > The current agenda is: > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > 10:20 AM > 10:40 AM > Break > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > 12:00 PM > 12:50 PM > Lunch > 12:50 PM > 1:30 PM > Special Talk Joe Dain > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > 2:30 PM > 2:50 PM > Customer Talk ? NYU Michael Costantino > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > 3:10 PM > 3:30 PM > Break > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop > 4:10 PM > 4:40 PM > Service Update Jim Doherty > 4:40 PM > 5:10 PM > Open Forum > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > >> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: >> >> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >> >> Spectrum Scale User Group ? NYC >> October 24th, 2018 >> The New York Genome Center >> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >> >> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >> >> 08:45-09:00 Coffee & Registration >> 09:00-09:15 Welcome >> 09:15-09:45 What is new in IBM Spectrum Scale? >> 09:45-10:00 What is new in ESS? >> 10:00-10:20 How does CORAL help other workloads? >> 10:20-10:40 --- Break --- >> 10:40-11:00 Customer Talk ? The New York Genome Center >> 11:00-11:20 Spinning up a Hadoop cluster on demand >> 11:20-11:40 Customer Talk ? 
Mt. Sinai School of Medicine >> 11:40-12:10 Spectrum Scale Network Flow >> 12:10-13:00 --- Lunch --- >> 13:00-13:40 Special Announcement and Demonstration >> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >> 14:00-14:20 Customer Talk ? Princeton University >> 14:20-14:40 AI Reference Architecture >> 14:40-15:00 Updates on Container Support >> 15:00-15:20 Customer Talk ? TBD >> 15:20-15:40 --- Break --- >> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >> 16:10-16:40 Service Update >> 16:40-17:10 Open Forum >> 17:10-17:30 Wrap-Up >> 17:30- Social Event >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Dan Young > Solutions Architect, Pixit Media > +1-347-249-7413 | dyoung at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia > > > This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kkr at lbl.gov Fri Oct 26 04:36:50 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 25 Oct 2018 20:36:50 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Yup. Richard is collecting them and we will upload afterwards. Sent from my iPhone > On Oct 25, 2018, at 5:54 PM, Ryan Novosielski wrote: > > What they said was ?spectrumscale.org?. I suspect they?ll wind up here: http://www.spectrumscaleug.org/presentations/ > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > >> On Oct 25, 2018, at 12:59 PM, Dan Young wrote: >> >> Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. >> >> On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: >> There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. >> >> The current agenda is: >> >> 8:45 AM >> 9:00 AM >> Coffee & Registration Presenter >> 9:00 AM >> 9:15 AM >> Welcome Amy Hirst & Chris Black >> 9:15 AM >> 9:45 AM >> What is new in IBM Spectrum Scale? Piyush Chaudhary >> 9:45 AM >> 10:00 AM >> What is new in ESS? 
John Sing >> 10:00 AM >> 10:20 AM >> How does CORAL help other workloads? Kevin Gildea >> 10:20 AM >> 10:40 AM >> Break >> 10:40 AM >> 11:00 AM >> Customer Talk ? The New York Genome Center Chris Black >> 11:00 AM >> 11:20 AM >> Spinning up a Hadoop cluster on demand Piyush Chaudhary >> 11:20 AM >> 11:40 AM >> Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione >> 11:40 AM >> 12:00 PM >> AI Reference Architecture Piyush Chaudhary >> 12:00 PM >> 12:50 PM >> Lunch >> 12:50 PM >> 1:30 PM >> Special Talk Joe Dain >> 1:30 PM >> 1:50 PM >> Multi-cloud Transparent Cloud Tiering Rob Basham >> 1:50 PM >> 2:10 PM >> Customer Talk ? Princeton University Curtis W. Hillegas >> 2:10 PM >> 2:30 PM >> Updates on Container Support John Lewars >> 2:30 PM >> 2:50 PM >> Customer Talk ? NYU Michael Costantino >> 2:50 PM >> 3:10 PM >> Spectrum Archive and TS1160 Carl Reasoner >> 3:10 PM >> 3:30 PM >> Break >> 3:30 PM >> 4:10 PM >> IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop >> 4:10 PM >> 4:40 PM >> Service Update Jim Doherty >> 4:40 PM >> 5:10 PM >> Open Forum >> 5:10 PM >> 5:30 PM >> Wrap-Up >> Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) >> >> >>> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: >>> >>> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >>> >>> Spectrum Scale User Group ? NYC >>> October 24th, 2018 >>> The New York Genome Center >>> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >>> >>> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >>> >>> 08:45-09:00 Coffee & Registration >>> 09:00-09:15 Welcome >>> 09:15-09:45 What is new in IBM Spectrum Scale? >>> 09:45-10:00 What is new in ESS? >>> 10:00-10:20 How does CORAL help other workloads? >>> 10:20-10:40 --- Break --- >>> 10:40-11:00 Customer Talk ? The New York Genome Center >>> 11:00-11:20 Spinning up a Hadoop cluster on demand >>> 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine >>> 11:40-12:10 Spectrum Scale Network Flow >>> 12:10-13:00 --- Lunch --- >>> 13:00-13:40 Special Announcement and Demonstration >>> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >>> 14:00-14:20 Customer Talk ? Princeton University >>> 14:20-14:40 AI Reference Architecture >>> 14:40-15:00 Updates on Container Support >>> 15:00-15:20 Customer Talk ? TBD >>> 15:20-15:40 --- Break --- >>> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >>> 16:10-16:40 Service Update >>> 16:40-17:10 Open Forum >>> 17:10-17:30 Wrap-Up >>> 17:30- Social Event >>> >>> >>> Bob Oesterlin >>> Sr Principal Storage Engineer, Nuance >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> -- >> >> Dan Young >> Solutions Architect, Pixit Media >> +1-347-249-7413 | dyoung at pixitmedia.com >> www.pixitmedia.com | Tw:@pixitmedia >> >> >> This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. 
If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mnaineni at in.ibm.com Fri Oct 26 06:09:45 2018 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 26 Oct 2018 05:09:45 +0000 Subject: [gpfsug-discuss] Nested NFSv4 Exports In-Reply-To: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> References: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> Message-ID: An HTML attachment was scrubbed... URL: From stefan.dietrich at desy.de Fri Oct 26 12:18:20 2018 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Fri, 26 Oct 2018 13:18:20 +0200 (CEST) Subject: [gpfsug-discuss] Nested NFSv4 Exports In-Reply-To: References: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> Message-ID: <2127020802.32763936.1540552700548.JavaMail.zimbra@desy.de> Hi Malhal, thanks for the input. I did already run Ganesha in debug mode, maybe this snippet I saved from that time might be helpful: 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :Check for address 192.168.142.92 for export id 3 fullpath /gpfs/exfel/d/proc 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] client_match :EXPORT :M_DBG :Match 0x941550, type = HOSTIF_CLIENT, options 0x42302050 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] LogClientListEntry :EXPORT :M_DBG : 0x941550 HOSTIF_CLIENT: 192.168.8.32 (root_squash , R-r-, 34-, ---, TCP, ----, M anage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] client_match :EXPORT :M_DBG :Match 0x940c90, type = HOSTIF_CLIENT, options 0x42302050 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] LogClientListEntry :EXPORT :M_DBG : 0x940c90 HOSTIF_CLIENT: 192.168.8.33 (root_squash , R-r-, 34-, ---, TCP, ----, M anage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :EXPORT ( , , , , , -- Dele g, , ) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (root_squash , ----, 34-, ---, TCP, ----, No Manage_Gids, , anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :default options (root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Dele g, anon_uid= -2, anon_gid= -2, none, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :Final options (root_squash , ----, 
34-, ---, TCP, ----, No Manage_Gids, -- Dele g, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] nfs4_export_check_access :NFS4 :INFO :NFS4: INFO: Access not allowed on Export_Id 3 /gpfs/exfel/d/proc for client ::fff f:192.168.142.92 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] nfs4_op_lookup :EXPORT :DEBUG :NFS4ERR_ACCESS Hiding Export_Id 3 Path /gpfs/exfel/d/proc with NFS4ERR_NOENT 192.168.142.92 would be the client2 from my pseudo example, /gpfs/exfel/d/proc resembles /gpfs/filesystem1/directory1 Ganesha never checks anything for /gpfs/filesystem1/directory1/sub-directory1...or rather a subdir of /gpfs/exfel/d/proc Is this what you meant by looking at the real export object? If you think this is a bug, I would open a case in order to get this analyzed. mmnfs does not show me any pseudo options, I think this has been included in 5.0.2. Regards, Stefan ----- Original Message ----- > From: "Malahal R Naineni" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Sent: Friday, October 26, 2018 7:09:45 AM > Subject: Re: [gpfsug-discuss] Nested NFSv4 Exports >>> /gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as >>> read-write. >>> client2 is not included in the export for /gpfs/filesystem1/directory1. >>> Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work >>> and results in a permission denied > Any NFSv4 implementation needs to traverse the pseudo path for being able to > mount an export. One would expect "client2" to traverse over > /gpfs/filesystem1/directory1/ but not list its content/other files. I strongly > think this is a bug in Ganesha implementation, it is probably looking at the > real-export object than the pseudo-object for permission checking. > One option is to change the Pseudo file system layout. For example, > "/gpfs/client2" as "Pseudo" option for export with path " > /gpfs/filesystem1/directory1/sub-directory1". This is directly not possible > with Spectrum CLI command mmnfs unless you are using the latest and greatest > ("mmnfs export add" usage would show if it supports Pseudo option). Of course, > you can manually do it (using CCR) as Ganesha itself allows it. > Yes, NFSv3 has no pseudo traversal, it should work. > Regards, Malahal. > > > ----- Original message ----- > From: "Dietrich, Stefan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [gpfsug-discuss] Nested NFSv4 Exports > Date: Thu, Oct 25, 2018 5:52 PM > Hi, > > I am currently fiddling around with some nested NFSv4 exports and the differing > behaviour to NFSv3. > The environment is a GPFS 5.0.1 with enabled CES, so Ganesha is used as the NFS > server. > > Given the following (pseudo) directory structure: > > /gpfs/filesystem1/directory1 > /gpfs/filesystem1/directory1/sub-directory1 > /gpfs/filesystem1/directory1/sub-directory2 > > Now to the exports: > /gpfs/filesystem1/directory1 is exported to client1 as read-only. > /gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as > read-write. > > client2 is not included in the export for /gpfs/filesystem1/directory1. > > Mounting /gpfs/filesystem1/directory1 on client1 works as expected. > Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work > and results in a permission denied. > If I change the protocol from NFSv4 to NFSv3, it works. 
> > There is a section about nested NFS exports in the mmnfs doc: > Creating nested exports (such as /path/to/folder and /path/to/folder/subfolder) > is strongly discouraged since this might lead to serious issues in data > consistency. Be very cautious when creating and using nested exports. > If there is a need to have nested exports (such as /path/to/folder and > /path/to/folder/inside/subfolder), NFSv4 client that mounts the parent > (/path/to/folder) export will not be able to see the child export subtree > (/path/to/folder/inside/subfolder) unless the same client is explicitly allowed > to access the child export as well. This is okay as long as the client uses > only NFSv4 mounts. > > The Linux kernel NFS server and other NFSv4 servers do not show this behaviour. > Is there a way to bypass this with CES/Ganesha? Or is the only solution to add > client2 to /gpfs/filesystem1/directory1? > > Regards, > Stefan > > -- > ------------------------------------------------------------------------ > Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems) > Ein Forschungszentrum der Helmholtz-Gemeinschaft > Notkestr. 85 > phone: +49-40-8998-4696 22607 Hamburg > e-mail: stefan.dietrich at desy.de Germany > ------------------------------------------------------------------------ > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [ http://gpfsug.org/mailman/listinfo/gpfsug-discuss | > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ] > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Fri Oct 26 15:24:38 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:24:38 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks Message-ID: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Hello, does anyone know whether there is a chance to use e.g., 10G ethernet together with IniniBand network for multihoming of GPFS nodes? I mean to setup two different type of networks to mitigate network failures. I read that you can have several networks configured in GPFS but it does not provide failover. Nothing changed in this as of GPFS version 5.x? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From S.J.Thompson at bham.ac.uk Fri Oct 26 15:48:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 26 Oct 2018 14:48:48 +0000 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: If IB is enabled and is setup with verbs, then this is the preferred network. GPFS will always fail-back to Ethernet afterwards, however what you can't do is have multiple "subnets" defined and have GPFS fail between different Ethernet networks. Simon ?On 26/10/2018, 15:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of xhejtman at ics.muni.cz" wrote: Hello, does anyone know whether there is a chance to use e.g., 10G ethernet together with IniniBand network for multihoming of GPFS nodes? I mean to setup two different type of networks to mitigate network failures. I read that you can have several networks configured in GPFS but it does not provide failover. Nothing changed in this as of GPFS version 5.x? -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Fri Oct 26 15:52:43 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:52:43 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however what you > can't do is have multiple "subnets" defined and have GPFS fail between > different Ethernet networks. Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back happen only during mmstartup? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From jonathan.buzzard at strath.ac.uk Fri Oct 26 15:52:43 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 26 Oct 2018 15:52:43 +0100 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> On 26/10/2018 15:48, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however > what you can't do is have multiple "subnets" defined and have GPFS > fail between different Ethernet networks. > If you want mitigate network failures then you need to mitigate it at layer 2. However it won't be cheap. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From xhejtman at ics.muni.cz Fri Oct 26 15:56:45 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:56:45 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> Message-ID: <20181026145645.qzn24jp26anxayub@ics.muni.cz> On Fri, Oct 26, 2018 at 03:52:43PM +0100, Jonathan Buzzard wrote: > On 26/10/2018 15:48, Simon Thompson wrote: > > If IB is enabled and is setup with verbs, then this is the preferred > > network. GPFS will always fail-back to Ethernet afterwards, however > > what you can't do is have multiple "subnets" defined and have GPFS > > fail between different Ethernet networks. > > > > If you want mitigate network failures then you need to mitigate it at layer > 2. However it won't be cheap. well, I believe this should be exactly what more 'subnets' are used for.. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From xhejtman at ics.muni.cz Fri Oct 26 15:57:53 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:57:53 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> Message-ID: <20181026145753.ijokzwbjh3aznxwr@ics.muni.cz> On Fri, Oct 26, 2018 at 04:52:43PM +0200, Lukas Hejtmanek wrote: > On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > > If IB is enabled and is setup with verbs, then this is the preferred > > network. GPFS will always fail-back to Ethernet afterwards, however what you > > can't do is have multiple "subnets" defined and have GPFS fail between > > different Ethernet networks. > > Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back > happen only during mmstartup? moreover, are verbs used also for cluster management? E.g., node keepalive messages. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From S.J.Thompson at bham.ac.uk Fri Oct 26 15:59:08 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 26 Oct 2018 14:59:08 +0000 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> Message-ID: Yes ... if the IB network goes down ... But it's not really fault tolerant, as you need the admin network for token management, so you could lose IB and have data fail to the Ethernet path, but not lose Ethernet. And it doesn't (or didn't) fail back to IB when IB come live again, though that might have changed with 5.0.2. Simon ?On 26/10/2018, 15:52, "gpfsug-discuss-bounces at spectrumscale.org on behalf of xhejtman at ics.muni.cz" wrote: On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however what you > can't do is have multiple "subnets" defined and have GPFS fail between > different Ethernet networks. Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back happen only during mmstartup? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From eric.wonderley at vt.edu Fri Oct 26 15:44:13 2018 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 26 Oct 2018 10:44:13 -0400 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: Multihoming is accomplished by using subnets...see mmchconfig. Failover networks on the other hand are not allowed. Bad network behavior is dealt with by expelling nodes. You must have decent/supported network gear...we have learned that lesson the hard way On Fri, Oct 26, 2018 at 10:37 AM Lukas Hejtmanek wrote: > Hello, > > does anyone know whether there is a chance to use e.g., 10G ethernet > together > with IniniBand network for multihoming of GPFS nodes? 
> > I mean to setup two different type of networks to mitigate network > failures. > I read that you can have several networks configured in GPFS but it does > not > provide failover. Nothing changed in this as of GPFS version 5.x? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vtarasov at us.ibm.com Fri Oct 26 23:58:16 2018 From: vtarasov at us.ibm.com (Vasily Tarasov) Date: Fri, 26 Oct 2018 22:58:16 +0000 Subject: [gpfsug-discuss] If you're attending KubeCon'18 Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Oct 29 00:29:51 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 29 Oct 2018 00:29:51 +0000 Subject: [gpfsug-discuss] Presentations from SSUG Meeting, Oct 24th - NY Genome Center Message-ID: <2CF4E6B3-B39E-4567-91A5-58C39A720362@nuance.com> These are now on the web site under ?Presentations? - single zip file has them all. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Oct 29 16:33:35 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 29 Oct 2018 12:33:35 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Message-ID: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From kums at us.ibm.com Mon Oct 29 19:56:09 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 29 Oct 2018 14:56:09 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Message-ID: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. 
In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Mon Oct 29 20:47:24 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 29 Oct 2018 16:47:24 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Message-ID: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen > On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram > wrote: > > Hi, > > >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? > >>Why is there such a penalty for "traditional" environments? > > In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). 
This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. > > In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). > > My two cents. > > Regards, > -Kums > > > > > > From: Aaron Knister > > To: gpfsug main discussion list > > Date: 10/29/2018 12:34 PM > Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Flipping through the slides from the recent SSUG meeting I noticed that > in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. > Reading up on it it seems as though it comes with a warning about > significant I/O performance degradation and increase in CPU usage. I > also recall that data integrity checking is performed by default with > GNR. How can it be that the I/O performance degradation warning only > seems to accompany the nsdCksumTraditional setting and not GNR? As > someone who knows exactly 0 of the implementation details, I'm just > naively assuming that the checksum are being generated (in the same > way?) in both cases and transferred to the NSD server. Why is there such > a penalty for "traditional" environments? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Oct 29 21:27:41 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 29 Oct 2018 16:27:41 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: Stephen, ESS does perform checksums in the transfer between NSD clients and NSD servers. As Kums described below, the difference between the checksums performed by GNR and those performed with "nsdCksumTraditional" is that GNR checksums are computed in parallel on the server side, as a large FS block is broken into smaller pieces. On non-GNR environments (when nsdCksumTraditional is set), the checksum is computed sequentially on the server. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Stephen Ulmer To: gpfsug main discussion list Date: 10/29/2018 04:52 PM Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? 
It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? 
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kums at us.ibm.com Mon Oct 29 21:29:33 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 29 Oct 2018 16:29:33 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: In non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only. The ESS storage supports end-to-end checksum, NSD client to the ESS IO servers (at the network level) as well as from ESS IO servers to the disk/storage. This is further detailed in the docs (link below): https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm Best, -Kums From: Stephen Ulmer To: gpfsug main discussion list Date: 10/29/2018 04:52 PM Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). 
This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Tue Oct 30 00:39:35 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 29 Oct 2018 20:39:35 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: The point of the original question was to discover why there is a warning about performance for nsdChksumTraditional=yes, but that warning doesn?t seem to apply to an ESS environment. Your reply was that checksums in an ESS environment are calculated in parallel on the NSD server based on the physical storage layout used underneath the NSD, and is thus faster. My point was that if there is never a checksum calculated by the NSD client, then how does the NSD server know that it got uncorrupted data? The link you referenced below (thank you!) indicates that, in fact, the NSD client DOES calculate a checksum and forward it with the data to the NSD server. The server validates the data (necessitating a re-calculation of the checksum), and then GNR stores the data, A CHECKSUM[1], and some block metadata to media. So this leaves us with a checksum calculated by the client and then validated (re-calculated) by the server ? IN BOTH CASES. 
For the GNR case, another checksum in calculated and stored with the data for another purpose, but that means that the nsdChksumTraditional=yes case is exactly like the first phase of the GNR case. So why is that case slower when it does less work? Slow enough to merit a warning, no less! I?m really not trying to be a pest, but I have a logic problem with either the question or the answer ? they aren?t consistent (or I can?t rationalize them to be so). -- Stephen [1] The document is vague (I believe intentionally, because it could have easily been made clear) as to whether this is the same checksum or a different one. Presumably the server-side-new-checksum is calculated in parallel and protects the chunklets or whatever they're called. This is all consistent with what you said! > On Oct 29, 2018, at 5:29 PM, Kumaran Rajaram > wrote: > > In non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only. > > The ESS storage supports end-to-end checksum, NSD client to the ESS IO servers (at the network level) as well as from ESS IO servers to the disk/storage. This is further detailed in the docs (link below): > > https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm > > Best, > -Kums > > > > > > From: Stephen Ulmer > > To: gpfsug main discussion list > > Date: 10/29/2018 04:52 PM > Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) > > I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. > > If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. > > -- > Stephen > > > > On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram > wrote: > > Hi, > > >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? > >>Why is there such a penalty for "traditional" environments? > > In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. 
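As a rough illustration of the parallel-strip point in the quoted explanation above -- a hypothetical sketch only, where hashlib.sha256 stands in for whatever checksum the real code uses and the 8 MiB block / 8 strip split is invented, not GPFS code -- the same checksum arithmetic can be done in one serial pass or fanned out per strip:

# Hypothetical illustration only: sha256 is a stand-in checksum, the strip
# count and block size are invented, and none of this is GPFS code.
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = bytes(8 * 1024 * 1024)   # pretend: one 8 MiB file system block
N_STRIPS = 8                     # pretend: the vdisk splits it across 8 strips

def checksum_serial(buf):
    # "Traditional" style: one thread walks the whole block.
    return hashlib.sha256(buf).digest()

def checksum_strips_parallel(buf, nstrips):
    # "GNR" style: each strip gets its own checksum, computed concurrently.
    # hashlib releases the GIL for large updates, so the threads overlap.
    strip = len(buf) // nstrips
    pieces = [buf[i * strip:(i + 1) * strip] for i in range(nstrips)]
    with ThreadPoolExecutor(max_workers=nstrips) as pool:
        return list(pool.map(lambda s: hashlib.sha256(s).digest(), pieces))

if __name__ == "__main__":
    t0 = time.perf_counter()
    checksum_serial(BLOCK)
    t1 = time.perf_counter()
    checksum_strips_parallel(BLOCK, N_STRIPS)
    t2 = time.perf_counter()
    print("serial whole-block checksum : %.4fs" % (t1 - t0))
    print("parallel per-strip checksums: %.4fs" % (t2 - t1))

Whether that maps exactly onto what the ESS code does internally is an assumption; the point is only that the same checksum arithmetic costs far less wall-clock time when it is spread across the threads already handling each strip, which a plain NSD server sitting in front of an external RAID controller has no natural way to do.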
> > In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). > > My two cents. > > Regards, > -Kums > > > > > > From: Aaron Knister > > To: gpfsug main discussion list > > Date: 10/29/2018 12:34 PM > Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Flipping through the slides from the recent SSUG meeting I noticed that > in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. > Reading up on it it seems as though it comes with a warning about > significant I/O performance degradation and increase in CPU usage. I > also recall that data integrity checking is performed by default with > GNR. How can it be that the I/O performance degradation warning only > seems to accompany the nsdCksumTraditional setting and not GNR? As > someone who knows exactly 0 of the implementation details, I'm just > naively assuming that the checksum are being generated (in the same > way?) in both cases and transferred to the NSD server. Why is there such > a penalty for "traditional" environments? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From abeattie at au1.ibm.com Tue Oct 30 00:53:06 2018 From: abeattie at au1.ibm.com (Andrew Beattie) Date: Tue, 30 Oct 2018 00:53:06 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: , <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov><326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 30 09:03:06 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 30 Oct 2018 09:03:06 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> On 29/10/2018 20:47, Stephen Ulmer wrote: [SNIP] > > If ESS checksumming doesn?t protect on the wire I?d say that marketing > has run amok, because that has *definitely* been implied in meetings for > which I?ve been present. In fact, when asked if?Spectrum Scale provides > checksumming for data in-flight, IBM sales has used it as an ESS up-sell > opportunity. 
> Noting that on a TCP/IP network anything passing over a TCP connection is checksummed at the network layer. Consequently any addition checksumming is basically superfluous. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From daniel.kidger at uk.ibm.com Tue Oct 30 10:56:09 2018 From: daniel.kidger at uk.ibm.com (Daniel Kidger) Date: Tue, 30 Oct 2018 10:56:09 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: Message-ID: Remember too that in a traditional GPFS setup, the NSD servers are effectively merely data routers (since the clients know exactly where the block is going to be written) and as such NSD servers can be previous generation hardware. By contrast GNR needs cpu cycles and plenty of memory, so ESS nodes are naturally big and fast (as well as benefitting from parallel threads working together on the GNR). Daniel Dr Daniel Kidger IBM Technical Sales Specialist Software Defined Solution Sales +44-(0)7818 522 266 daniel.kidger at uk.ibm.com > On 30 Oct 2018, at 00:53, Andrew Beattie wrote: > > Stephen, > > I think you also need to take into consideration that IBM does not control what infrastructure users may chose to deploy Spectrum scale on outside of ESS hardware. > > As such it is entirely possible that older or lower spec hardware, or even virtualised NSD Servers with even lower resources per virtual node, will have potential issues when running the nsdChksumTraditional=yes flag, As such IBM has a duty of care to provide a warning that you may experience issues if you turn the additional workload on. > > Beyond this i'm not seeing why there is an issue, if you turn the flag on in a non ESS scenario the process is Serialised, if you turn it on in an ESS Scenario you get to take advantage of the fact that Scale Native Raid does a significant amount of the work in a parallelised method, one is less resource intensive than the other, because the process is handled differently depending on the type of NSD Servers doing the work. > > > > Andrew Beattie > Software Defined Storage - IT Specialist > Phone: 614-2133-7927 > E-mail: abeattie at au1.ibm.com > > > ----- Original message ----- > From: Stephen Ulmer > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug main discussion list > Cc: > Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Date: Tue, Oct 30, 2018 10:39 AM > > The point of the original question was to discover why there is a warning about performance for nsdChksumTraditional=yes, but that warning doesn?t seem to apply to an ESS environment. > > Your reply was that checksums in an ESS environment are calculated in parallel on the NSD server based on the physical storage layout used underneath the NSD, and is thus faster. My point was that if there is never a checksum calculated by the NSD client, then how does the NSD server know that it got uncorrupted data? > > The link you referenced below (thank you!) indicates that, in fact, the NSD client DOES calculate a checksum and forward it with the data to the NSD server. The server validates the data (necessitating a re-calculation of the checksum), and then GNR stores the data, A CHECKSUM[1], and some block metadata to media. > > So this leaves us with a checksum calculated by the client and then validated (re-calculated) by the server ? IN BOTH CASES. 
For the GNR case, another checksum in calculated and stored with the data for another purpose, but that means that the nsdChksumTraditional=yes case is exactly like the first phase of the GNR case. So why is that case slower when it does less work? Slow enough to merit a warning, no less! > > I?m really not trying to be a pest, but I have a logic problem with either the question or the answer ? they aren?t consistent (or I can?t rationalize them to be so). > > -- > Stephen > > [1] The document is vague (I believe intentionally, because it could have easily been made clear) as to whether this is the same checksum or a different one. Presumably the server-side-new-checksum is calculated in parallel and protects the chunklets or whatever they're called. This is all consistent with what you said! > > > >> >> On Oct 29, 2018, at 5:29 PM, Kumaran Rajaram wrote: >> >> In non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only. >> >> The ESS storage supports end-to-end checksum, NSD client to the ESS IO servers (at the network level) as well as from ESS IO servers to the disk/storage. This is further detailed in the docs (link below): >> >> https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm >> >> Best, >> -Kums >> >> >> >> >> >> From: Stephen Ulmer >> To: gpfsug main discussion list >> Date: 10/29/2018 04:52 PM >> Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) >> >> I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. >> >> If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. >> >> -- >> Stephen >> >> >> >> On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote: >> >> Hi, >> >> >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >> >>Why is there such a penalty for "traditional" environments? >> >> In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. 
>> >> In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). >> >> My two cents. >> >> Regards, >> -Kums >> >> >> >> >> >> From: Aaron Knister >> To: gpfsug main discussion list >> Date: 10/29/2018 12:34 PM >> Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) >> Sent by: gpfsug-discuss-bounces at spectrumscale.org >> >> >> >> Flipping through the slides from the recent SSUG meeting I noticed that >> in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. >> Reading up on it it seems as though it comes with a warning about >> significant I/O performance degradation and increase in CPU usage. I >> also recall that data integrity checking is performed by default with >> GNR. How can it be that the I/O performance degradation warning only >> seems to accompany the nsdCksumTraditional setting and not GNR? As >> someone who knows exactly 0 of the implementation details, I'm just >> naively assuming that the checksum are being generated (in the same >> way?) in both cases and transferred to the NSD server. Why is there such >> a penalty for "traditional" environments? >> >> -Aaron >> >> -- >> Aaron Knister >> NASA Center for Climate Simulation (Code 606.2) >> Goddard Space Flight Center >> (301) 286-2776 >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Tue Oct 30 12:30:20 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Tue, 30 Oct 2018 12:30:20 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org>, <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> Message-ID: <0765E436-870B-430D-89D3-89CE60E94CCB@nasa.gov> I?m guessing IBM doesn?t generally spend huge amounts of money on things that are superfluous...although *cough*RedHat*cough*. 
TCP does of course perform checksumming, but I see the NSD checksums as being at a higher ?layer?, if you will. The layer at which I believe the NSD checksums operate sits above the complex spaghetti monster of queues, buffers, state machines, kernel/user space communication inside of GPFS as well as networking drivers that can suck (looking at you Intel, Mellanox), and high speed networking hardware all of which I?ve seen cause data corruption (even though the data on the wire was in some cases checksummed correctly). -Aaron On October 30, 2018 at 05:03:26 EDT, Jonathan Buzzard wrote: On 29/10/2018 20:47, Stephen Ulmer wrote: [SNIP] > > If ESS checksumming doesn?t protect on the wire I?d say that marketing > has run amok, because that has *definitely* been implied in meetings for > which I?ve been present. In fact, when asked if Spectrum Scale provides > checksumming for data in-flight, IBM sales has used it as an ESS up-sell > opportunity. > Noting that on a TCP/IP network anything passing over a TCP connection is checksummed at the network layer. Consequently any addition checksumming is basically superfluous. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Tue Oct 30 22:14:00 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 30 Oct 2018 18:14:00 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> Message-ID: <107111.1540937640@turing-police.cc.vt.edu> On Tue, 30 Oct 2018 09:03:06 -0000, Jonathan Buzzard said: > Noting that on a TCP/IP network anything passing over a TCP connection > is checksummed at the network layer. Consequently any addition > checksumming is basically superfluous. Note that the TCP checksum is relatively weak, and designed in a day when a 56K leased line was a high-speed long-haul link and 10mbit ethernet was the fastest thing on the planet. When 10 megabytes was a large transfer, it was a reasonable amount of protection. But when you get into moving petabytes of data around, the chances of an undetected error starts getting significant. Pop quiz time: When was the last time you (the reader) checked your network statistics to see what your bit error rate was? Do you even have the ability to do so? -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From bbanister at jumptrading.com Tue Oct 30 22:52:35 2018 From: bbanister at jumptrading.com (Bryan Banister) Date: Tue, 30 Oct 2018 22:52:35 +0000 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <107111.1540937640@turing-police.cc.vt.edu> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> <107111.1540937640@turing-police.cc.vt.edu> Message-ID: Valdis will also recall how much "fun" we had with network related corruption due to what we surmised was a TCP offload engine FW defect in a certain 10GbE HCA. Only happened sporadically every few weeks... what a nightmare that was!! -B -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of valdis.kletnieks at vt.edu Sent: Tuesday, October 30, 2018 5:14 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) [EXTERNAL EMAIL] On Tue, 30 Oct 2018 09:03:06 -0000, Jonathan Buzzard said: > Noting that on a TCP/IP network anything passing over a TCP connection > is checksummed at the network layer. Consequently any addition > checksumming is basically superfluous. Note that the TCP checksum is relatively weak, and designed in a day when a 56K leased line was a high-speed long-haul link and 10mbit ethernet was the fastest thing on the planet. When 10 megabytes was a large transfer, it was a reasonable amount of protection. But when you get into moving petabytes of data around, the chances of an undetected error starts getting significant. Pop quiz time: When was the last time you (the reader) checked your network statistics to see what your bit error rate was? Do you even have the ability to do so? ________________________________ Note: This email is for the confidential use of the named addressee(s) only and may contain proprietary, confidential, or privileged information and/or personal data. If you are not the intended recipient, you are hereby notified that any review, dissemination, or copying of this email is strictly prohibited, and requested to notify the sender immediately and destroy this email and any attachments. Email transmission cannot be guaranteed to be secure or error-free. The Company, therefore, does not make any guarantees as to the completeness or accuracy of this email or any attachments. This email is for informational purposes only and does not constitute a recommendation, offer, request, or solicitation of any kind to buy, sell, subscribe, redeem, or perform any type of transaction of a financial product. Personal data, as defined by applicable data privacy laws, contained in this email may be processed by the Company, and any of its affiliated or related companies, for potential ongoing compliance and/or business-related purposes. You may have rights regarding your personal data; for information on exercising these rights or the Company?s treatment of personal data, please email datarequests at jumptrading.com. 
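Valdis' point about the TCP checksum being weak, and Bryan's offload-engine story, are easy to make concrete: the Internet checksum is just a ones'-complement sum of 16-bit words, so corruption that merely reorders words is invisible to it, and random corruption slips through at roughly one chance in 2**16. A toy RFC 1071-style sketch (illustration only, not the kernel's implementation):

# Toy RFC 1071 ones'-complement checksum, showing two classic blind spots:
# (1) swapping 16-bit words never changes it, (2) random corruption is
# missed with probability on the order of 1 in 2**16.
import os
import random

def internet_checksum(data):
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

payload = b"\x12\x34\x56\x78" + os.urandom(60)     # 64-byte toy "packet"
ref = internet_checksum(payload)

# Blind spot 1: swap the first two 16-bit words -- checksum is unchanged.
swapped = payload[2:4] + payload[0:2] + payload[4:]
assert swapped != payload and internet_checksum(swapped) == ref

# Blind spot 2: flip two random bytes, many times, and count the misses.
trials, missed = 200_000, 0
for _ in range(trials):
    corrupt = bytearray(payload)
    for pos in random.sample(range(len(corrupt)), 2):
        corrupt[pos] ^= random.randrange(1, 256)
    if internet_checksum(bytes(corrupt)) == ref:
        missed += 1
print("undetected corruptions: %d of %d (~2**-16 expected)" % (missed, trials))

None of which makes the TCP checksum useless -- only that at petabyte volumes, and with offload hardware that can compute a perfectly correct checksum over the wrong data, an end-to-end check at the NSD layer is rather less superfluous than it first looks.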
From makaplan at us.ibm.com Tue Oct 30 23:15:38 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Tue, 30 Oct 2018 18:15:38 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov><326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org><72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk><107111.1540937640@turing-police.cc.vt.edu> Message-ID: I confess, I know what checksums are generally and how and why they are used, but I am not familiar with all the various checksums that have been discussed here. I'd like to see a list or a chart with the following information for each checksum: Computed on what data elements, of what (typical) length (e.g. packet, disk block, disk fragment, disk sector) Checksum function used, how many bits of checksum computed on each data element. Computed by what software or hardware entity at what nodes in the network. There may be such checksums on each NSD transfer. Lowest layers would be checking data coming off of the disk. Checking network packets coming off ethernet or IB adapters. Higher layer for NSD could be a checksum on a whole disk block and/or on NSD request and response, including message headers AND the disk data... -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Oct 31 01:09:40 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Tue, 30 Oct 2018 21:09:40 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> <107111.1540937640@turing-police.cc.vt.edu> Message-ID: <122689.1540948180@turing-police.cc.vt.edu> On Tue, 30 Oct 2018 22:52:35 -0000, Bryan Banister said: > Valdis will also recall how much "fun" we had with network related corruption > due to what we surmised was a TCP offload engine FW defect in a certain 10GbE > HCA. Only happened sporadically every few weeks... what a nightmare that was!! It makes for quite the bar story, as the symptoms pointed everywhere except the network adapter. For the purposes of this thread though, two points to note: 1) The card in question was a spectacularly good price/performer and totally rock solid in 4 NFS servers that we had - in 6 years of trying, I never managed to make them hiccup (the one suspected failure turned out to be a fiber cable that had gotten crimped when the rack door was closed on a loop). 2) Since the TCP offload engine was computing the checksum across the data, but it had gotten confused about which data it was about to transmit, every single packet went out with a perfectly correct checksum. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From rohwedder at de.ibm.com Wed Oct 31 15:33:54 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 31 Oct 2018 16:33:54 +0100 Subject: [gpfsug-discuss] Spectrum Scale Survey Message-ID: Hello Spectrum Scale Users, we have started a survey on how certain Spectrum Scale administrative tasks are performed. The survey focuses on use of tasks like snapshots or ILM including monitoring, scheduling and problem determination of these capabilities. It should take only a few minutes to complete the survey. 
Please take a look and let us know how you are using Spectrum Scale and what aspects are important for you. Here is the survey link: https://www.surveygizmo.com/s3/4631738/IBM-Spectrum-Scale-Administrative-Management Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 18977725.gif Type: image/gif Size: 4659 bytes Desc: not available URL: From kkr at lbl.gov Wed Oct 31 20:10:02 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Wed, 31 Oct 2018 13:10:02 -0700 Subject: [gpfsug-discuss] V5 client limit? Message-ID: Hi, Can someone tell me the max # of GPFS native clients under 5.x? Everything I can find is dated. Thanks Kristy
URL: From jjdoherty at yahoo.com Thu Oct 4 20:58:19 2018 From: jjdoherty at yahoo.com (Jim Doherty) Date: Thu, 4 Oct 2018 19:58:19 +0000 (UTC) Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: <2043390893.1272.1538683099673@mail.yahoo.com> It could mean a shortage of nsd server threads?? or a congested network.?? Jim On Thursday, October 4, 2018, 3:55:10 PM EDT, Buterbaugh, Kevin L wrote: Hi All, What does it mean if I have a few dozen very long I/O?s (50 - 75 seconds) on a gateway as reported by ?mmdiag ?iohist? and they all reference two of my eight NSD servers? ? but then I go to those 2 NSD servers and I don?t see any long I/O?s at all? In other words, if the problem (this time) were the backend storage, I should see long I/O?s on the NSD servers, right? I?m thinking this indicates that there is some sort of problem with either the client gateway itself or the network in between the gateway and the NSD server(s) ? thoughts??? Thanks in advance? ?Kevin Buterbaugh - Senior System AdministratorVanderbilt University - Advanced Computing Center for Research and EducationKevin.Buterbaugh at vanderbilt.edu?- (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stockf at us.ibm.com Thu Oct 4 21:00:21 2018 From: stockf at us.ibm.com (Frederick Stock) Date: Thu, 4 Oct 2018 16:00:21 -0400 Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: My first guess would be the network between the NSD client and NSD server. netstat and ethtool may help to determine where the cause may lie, if it is on the NSD client. Obviously a switch on the network could be another source of the problem. Fred __________________________________________________ Fred Stock | IBM Pittsburgh Lab | 720-430-8821 stockf at us.ibm.com From: "Buterbaugh, Kevin L" To: gpfsug main discussion list Date: 10/04/2018 03:55 PM Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi All, What does it mean if I have a few dozen very long I/O?s (50 - 75 seconds) on a gateway as reported by ?mmdiag ?iohist? and they all reference two of my eight NSD servers? ? but then I go to those 2 NSD servers and I don?t see any long I/O?s at all? In other words, if the problem (this time) were the backend storage, I should see long I/O?s on the NSD servers, right? I?m thinking this indicates that there is some sort of problem with either the client gateway itself or the network in between the gateway and the NSD server(s) ? thoughts??? Thanks in advance? ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
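A quick way to act on the netstat/ethtool suggestion is to watch the NIC and TCP counters on the NSD client (and on the two NSD servers named in the iohist output) while the long I/Os are occurring. A minimal sketch, assuming the GPFS daemon traffic runs over an interface called ens2f0 - substitute your own device name:

# per-NIC error/drop counters on the NSD client
ethtool -S ens2f0 | grep -iE 'err|drop|discard'
# TCP retransmit counters; run twice a few minutes apart to see whether they are climbing
netstat -s | grep -i retrans

Rising retransmits or drops on the client with clean counters on the NSD servers would point at the network in between rather than the back-end storage.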
URL: From martinsworkmachine at gmail.com Thu Oct 4 21:05:53 2018 From: martinsworkmachine at gmail.com (J Martin Rushton) Date: Thu, 4 Oct 2018 21:05:53 +0100 Subject: [gpfsug-discuss] Long I/O's on client but not on NSD server(s) In-Reply-To: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> References: <50FD239F-E4D7-45EB-9769-DE7A0F3C4FA9@vanderbilt.edu> Message-ID: <651fe07d-e745-e844-2f9b-44fd78ccee24@gmail.com> I saw something similar a good few years ago (ie on an older version of GPFS).? IIRC the issue was one of contention: one or two served nodes were streaming IOs to/from the NSD servers and as a result other nodes were exhibiting insane IO times.? Can't be more helpful though, I no longer have access to the system. Regards, J Martin Rushton MBCS On 04/10/18 20:54, Buterbaugh, Kevin L wrote: > Hi All, > > What does it mean if I have a few dozen very long I/O?s (50 - 75 > seconds) on a gateway as reported by ?mmdiag ?iohist? and they all > reference two of my eight NSD servers? > > ? but then I go to those 2 NSD servers and I don?t see any long I/O?s > at all? > > In other words, if the problem (this time) were the backend storage, I > should see long I/O?s on the NSD servers, right? > > I?m thinking this indicates that there is some sort of problem with > either the client gateway itself or the network in between the gateway > and the NSD server(s) ? thoughts??? > > Thanks in advance? > > ? > Kevin Buterbaugh - Senior System Administrator > Vanderbilt University - Advanced Computing Center for Research and > Education > Kevin.Buterbaugh at vanderbilt.edu > ?- (615)875-9633 > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 14:38:21 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 13:38:21 +0000 Subject: [gpfsug-discuss] Pmsensors and gui Message-ID: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From lgayne at us.ibm.com Tue Oct 9 14:43:09 2018 From: lgayne at us.ibm.com (Lyle Gayne) Date: Tue, 9 Oct 2018 09:43:09 -0400 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: Adding GUI personnel to respond. 
Lyle From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/09/2018 09:41 AM Subject: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler $1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Tue Oct 9 14:54:51 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Tue, 9 Oct 2018 13:54:51 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. --------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? 
First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas.koeninger at de.ibm.com Tue Oct 9 15:03:41 2018 From: andreas.koeninger at de.ibm.com (Andreas Koeninger) Date: Tue, 9 Oct 2018 14:03:41 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: An HTML attachment was scrubbed... URL: From rohwedder at de.ibm.com Tue Oct 9 15:56:14 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Tue, 9 Oct 2018 16:56:14 +0200 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: Hello Simon, the performance collector collects data from each node with the "hostname" as in /bin/hostname as key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set identical to be "hostname" on all nodes, the mapping will not succeed, So you will have to use unique hostnames on all cluster nodes. Mit freundlichen Gr??en / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany From: "Sobey, Richard A" To: gpfsug main discussion list Date: 09.10.2018 16:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. 
--------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler $1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 17486462.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 15:56:24 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 14:56:24 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: <320AAE68-5F40-48B7-97CF-DA0029DB76C2@bham.ac.uk> Yes we do indeed have: 127.0.0.1 localhost.localdomain localhost I saw a post on the list, but never the answer ? (I don?t think!) Simon From: on behalf of "andreas.koeninger at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 October 2018 at 15:04 To: "gpfsug-discuss at spectrumscale.org" Cc: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Pmsensors and gui Hi Simon, For your fist issue regarding the PM_MONITOR task, you may have hit a known issue. Please check if the following applies to your environment. I will get back to you for the second issue. 
-------------------- Solution: For this to fix, the customer should change the /etc/hosts entry for the 127.0.0.1 as follows: from current: 127.0.0.1 localhost.localdomain localhost to this: 127.0.0.1 localhost localhost.localdomain -------------------- Mit freundlichen Gr??en / Kind regards Andreas Koeninger Scrum Master and Software Developer / Spectrum Scale GUI and REST API IBM Systems &Technology Group, Integrated Systems Development / M069 ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Am Weiher 24 65451 Kelsterbach Phone: +49-7034-643-0867 Mobile: +49-7034-643-0867 E-Mail: andreas.koeninger at de.ibm.com ------------------------------------------------------------------------------------------------------------------------------------------- IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ----- Original message ----- From: Simon Thompson Sent by: gpfsug-discuss-bounces at spectrumscale.org To: "gpfsug-discuss at spectrumscale.org" Cc: Subject: [gpfsug-discuss] Pmsensors and gui Date: Tue, Oct 9, 2018 3:42 PM Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 15:59:35 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 14:59:35 +0000 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> Message-ID: <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> We do ? Its just the node is joined to the cluster as ?hostname1-data.cluster?, but it also has a primary (1GbE link) as ?hostname.cluster?? Simon From: on behalf of "rohwedder at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 October 2018 at 15:56 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Pmsensors and gui Hello Simon, the performance collector collects data from each node with the "hostname" as in /bin/hostname as key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set identical to be "hostname" on all nodes, the mapping will not succeed, So you will have to use unique hostnames on all cluster nodes. Mit freundlichen Gr??en / Kind regards Dr. 
Markus Rohwedder Spectrum Scale GUI Development ________________________________ Phone: +49 7034 6430190 IBM Deutschland Research & Development [cid:2__=8FBB09B2DFC235B78f9e8a93df938690918c8FB@] E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany ________________________________ [Inactive hide details for "Sobey, Richard A" ---09.10.2018 16:00:32---I can help with the first one as I had the issue a few we]"Sobey, Richard A" ---09.10.2018 16:00:32---I can help with the first one as I had the issue a few weeks ago. The answer from support is below, From: "Sobey, Richard A" To: gpfsug main discussion list Date: 09.10.2018 16:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim. --------------------------------------------------------------------------------------------------------------------------------------------- When trying to resolve the IP-Address in the JAVA code the first entry entry in the list is returned. Just localhost was expected for this. If the order is other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. it seems that our code it not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successful and the state should be OK again for the pm_collector. --------------------------------------------------------------------------------------------------------------------------------------------- checking the GUI nodes /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui Hi, I have a couple of a problems with the GUI and the stats data in there ? First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 46 bytes Desc: image001.gif URL: -------------- next part -------------- A non-text attachment was scrubbed... 
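For the localhost ordering problem described above, the change on the GUI node is just to put plain "localhost" first on the loopback line. A sketch of the corrected entry plus a way to re-drive the failed task (the runtask utility is assumed to be at its usual location under /usr/lpp/mmfs/gui/cli - adjust for your level):

# /etc/hosts on the GUI node - localhost before localhost.localdomain
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
# then re-run the failed refresh task instead of waiting for the next scheduled run
/usr/lpp/mmfs/gui/cli/runtask PM_MONITOR

If the task completes, the pm_collector state reported by the GUI should return to OK without restarting anything.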
Name: image002.png Type: image/png Size: 4660 bytes Desc: image002.png URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image003.gif Type: image/gif Size: 106 bytes Desc: image003.gif URL: From S.J.Thompson at bham.ac.uk Tue Oct 9 20:37:59 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 9 Oct 2018 19:37:59 +0000 Subject: [gpfsug-discuss] Protocols protocols ... Message-ID: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> So we have both SMB and NFS enabled in our cluster. For various reasons we want to only run SMB on some nodes and only run NFS on other nodes? We have used mmchnode to set the nodes into different groups and then have IP addresses associated with those groups which we want to use for SMB and NFS. All seems OK so far ? Now comes the problem, I can?t see a way to tell CES that group1 should run NFS and group2 SMB. We thought we had this cracked by removing the gpfs.smb packages from NFS nodes and ganesha from SMB nodes. Seems to work OK, EXCEPT ? sometimes nodes go into failed state, and it looks like this is because the SMB state is failed on the NFS only nodes ? This looks to me like GPFS is expecting protocol packages to be installed for both NFS and SMB. I worked out I can clear the failed state by running mmces service stop SMB -N node. The docs mention attributes, but I don?t see that they are used other than when running object? Any thoughts/comments/links to a doc page I missed? Or is it expected that both smb and nfs packages are required to be installed on all protocol nodes even if not being used on that node? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Tue Oct 9 21:34:43 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Tue, 9 Oct 2018 21:34:43 +0100 Subject: [gpfsug-discuss] Protocols protocols ... In-Reply-To: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> References: <0D334EB6-4F92-4D03-B19E-A8AEA2957232@bham.ac.uk> Message-ID: On 09/10/18 20:37, Simon Thompson wrote: [SNIP] > > Any thoughts/comments/links to a doc page I missed? Or is it expected > that both smb and nfs packages are required to be installed on all > protocol nodes even if not being used on that node? > As a last resort could you notionally let them do both and fix it with iptables so they only appear to the outside world to be running one or the other? JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From kkr at lbl.gov Tue Oct 9 22:39:23 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 14:39:23 -0700 Subject: [gpfsug-discuss] TO BE RESCHEDULED [was] - Re: Request for Enhancements (RFE) Forum - Submission Deadline October 1 In-Reply-To: <841FA5CA-5C6B-4626-8137-BA5994C3A651@bham.ac.uk> References: <52220937-CE0A-4949-89A0-6EA41D5ECF93@lbl.gov> <263e53c18647421f8b3cd936da0075df@jumptrading.com> <0341213A-6CB7-434F-A575-9099C2D0D703@spectrumscale.org> <585b21e7-d437-380f-65d8-d24fa236ce3b@nasa.gov> <841FA5CA-5C6B-4626-8137-BA5994C3A651@bham.ac.uk> Message-ID: Due to scheduling conflicts we need to reschedule the RFE meeting that was to happen tomorrow, October 10th. We received RFEs from 2 sites (NASA and Sloan Kettering), if you sent one and it was somehow missed. Please respond here, and we?ll pick up privately as follow up. More soon. 
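A rough sketch of the iptables approach Jonathan suggests for the SMB/NFS split, assuming the standard ports (139/445 for SMB, 2049 for NFSv4) and that the CES addresses are already pinned to the right node groups - the point is that both protocol stacks stay installed and running, so CES should not flag the nodes as failed; they simply stop being reachable for the unwanted protocol:

# on the nodes that should only serve NFS
iptables -A INPUT -p tcp -m multiport --dports 139,445 -j REJECT
# on the nodes that should only serve SMB
iptables -A INPUT -p tcp --dport 2049 -j REJECT

This is a workaround rather than a supported per-group protocol assignment, so treat it as a stopgap until CES can express "NFS here, SMB there" directly.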
Best, Kristy > On Sep 28, 2018, at 6:44 AM, Simon Thompson wrote: > > There is a limit on votes, not submissions. i.e. your site gets three votes, so you can't have three votes and someone else from Goddard also have three. > > We have to review the submissions, so as you say 10 we'd think unreasonable and skip, but a sensible number is OK. > > Simon > > ?On 28/09/2018, 13:52, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Aaron Knister" wrote: > > Hi Kristy, > > At some point I thought I'd read there was a per-site limit of the > number of RFEs that could be submitted but I can't find it skimming > through email. I'd think submitting 10 would be unreasonable but would 2 > or 3 be OK? > > -Aaron > > On 9/27/18 4:35 PM, Kristy Kallback-Rose wrote: >> Reminder, the*October 1st* deadline is approaching. We?re looking for at >> least a few RFEs (Requests For Enhancements) for this first forum, so if >> you?re interesting in promoting your RFE please reach out to one of us, >> or even here on the list. >> >> Thanks, >> Kristy >> >>> On Sep 7, 2018, at 3:00 AM, Simon Thompson (Spectrum Scale User Group >>> Chair) > wrote: >>> >>> GPFS/Spectrum Scale Users, >>> Here?s a long-ish note about our plans to try and improve the RFE >>> process. We?ve tried to include a tl;dr version if you just read the >>> headers. You?ll find the details underneath ;-) and reading to the end >>> is ideal. >>> >>> IMPROVING THE RFE PROCESS >>> As you?ve heard on the list, and at some of the in-person User Group >>> events, we?ve been talking about ways we can improve the RFE process. >>> We?d like to begin having an RFE forum, and have it be de-coupled from >>> the in-person events because we know not everyone can travel. >>> LIGHTNING PRESENTATIONS ON-LINE >>> In general terms, we?d have regular on-line events, where RFEs could >>> be/very briefly/(5 minutes, lightning talk) presented by the >>> requester. There would then be time for brief follow-on discussion >>> and questions. The session would be recorded to deal with large time >>> zone differences. >>> The live meeting is planned for October 10^th 2018, at 4PM BST (that >>> should be 11am EST if we worked is out right!) >>> FOLLOW UP POLL >>> A poll, independent of current RFE voting, would be conducted a couple >>> days after the recording was available to gather votes and feedback >>> on the RFEs submitted ?we may collect site name, to see how many votes >>> are coming from a certain site. >>> >>> MAY NOT GET IT RIGHT THE FIRST TIME >>> We view this supplemental RFE process as organic, that is, we?ll learn >>> as we go and make modifications. The overall goal here is to highlight >>> the RFEs that matter the most to the largest number of UG members by >>> providing a venue for people to speak about their RFEs and collect >>> feedback from fellow community members. >>> >>> *RFE PRESENTERS WANTED, SUBMISSION DEADLINE OCTOBER 1ST >>> *We?d like to guide a small handful of RFE submitters through this >>> process the first time around, so if you?re interested in being a >>> presenter, let us know now. We?re planning on doing the online meeting >>> and poll for the first time in mid-October, so the submission deadline >>> for your RFE is October 1st. If it?s useful, when you?re drafting your >>> RFE feel free to use the list as a sounding board for feedback. Often >>> sites have similar needs and you may find someone to collaborate with >>> on your RFE to make it useful to more sites, and thereby get more >>> votes. 
Some guidelines are here: >>> https://drive.google.com/file/d/1o8nN39DTU32qj_EFia5wRhnvfvNfr3cI/view?usp=sharing >>> You can submit you RFE by email to:rfe at spectrumscaleug.org >>> >>> >>> *PARTICIPANTS (AKA YOU!!), VIEW AND VOTE >>> *We are seeking very good participation in the RFE on-line events >>> needed to make this an effective method of Spectrum Scale Community >>> and IBM Developer collaboration. * It is to your benefit to >>> participate and help set priorities on Spectrum Scale enhancements!! >>> *We want to make this process light lifting for you as a participant. >>> We will limit the duration of the meeting to 1 hour to minimize the >>> use of your valuable time. >>> >>> Please register for the online meeting via Eventbrite >>> (https://www.eventbrite.com/e/spectrum-scale-request-for-enhancements-voting-tickets-49979954389) >>> ? we?ll send details of how to join the online meeting nearer the time. >>> >>> Thanks! >>> >>> Simon, Kristy, Bob, Bryan and Carl! >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss atspectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kkr at lbl.gov Wed Oct 10 03:08:16 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 19:08:16 -0700 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 Message-ID: Hello, Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. Thanks, Kristy From kkr at lbl.gov Wed Oct 10 03:13:36 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Tue, 9 Oct 2018 19:13:36 -0700 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: References: Message-ID: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> PS - If you?ve already contacted me about talking can you please ping me again? I?m drowning in stuff-to-do sauce. Thanks, Kristy > On Oct 9, 2018, at 7:08 PM, Kristy Kallback-Rose wrote: > > Hello, > > Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. > > Thanks, > Kristy From rohwedder at de.ibm.com Wed Oct 10 09:24:58 2018 From: rohwedder at de.ibm.com (Markus Rohwedder) Date: Wed, 10 Oct 2018 10:24:58 +0200 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> Message-ID: Hello Simon, not sure if the answer solved your question from the response, Even if nodes can be externally resolved by unique hostnames, applications that run on the host use the /bin/hostname binary or the hostname() call to identify the node they are running on. This is the case with the performance collection sensor. 
So you need to set the hostname of the hosts using /bin/hostname in a way that provides unique responses to the "/bin/hostname" call within a cluster. Mit freundlichen Grüßen / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development Phone: +49 7034 6430190 IBM Deutschland Research & Development E-Mail: rohwedder at de.ibm.com Am Weiher 24 65451 Kelsterbach Germany
From: Simon Thompson To: gpfsug main discussion list Date: 09.10.2018 17:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org
We do - it's just the node is joined to the cluster as "hostname1-data.cluster", but it also has a primary (1GbE link) as "hostname.cluster"... Simon
From: on behalf of "rohwedder at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 9 October 2018 at 15:56 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Pmsensors and gui
Hello Simon, the performance collector collects data from each node with the "hostname" as in /bin/hostname as key. The GUI reaches out to all nodes and tries to map the GPFS node name to the local hostname on that node. If the hostname is set identical to be "hostname" on all nodes, the mapping will not succeed, so you will have to use unique hostnames on all cluster nodes. Mit freundlichen Grüßen / Kind regards Dr. Markus Rohwedder Spectrum Scale GUI Development [signature table lost in extraction: Phone +49 7034 6430190, IBM Deutschland Research & Development, E-Mail rohwedder at de.ibm.com, Am Weiher 24, 65451 Kelsterbach, Germany]
From: "Sobey, Richard A" To: gpfsug main discussion list Date: 09.10.2018 16:00 Subject: Re: [gpfsug-discuss] Pmsensors and gui Sent by: gpfsug-discuss-bounces at spectrumscale.org
I can help with the first one as I had the issue a few weeks ago. The answer from support is below, verbatim.
---------------------------------------------------------------------------------------------------------------------------------------------
When trying to resolve the IP-Address in the JAVA code the first entry in the list is returned. Just localhost was expected for this. If the order is the other way around and the list starts with localhost.localdomain, the GUI unfortunately cannot resolve the real node name and will fail with the message seen in the log files. Thus I assume that this is the case for your customer. It seems that our code is not as tolerant as it should be for the localhost definitions in the /etc/hosts file on the GUI node. We need to change this in our code to handle it accordingly. Please let the customer adjust this entry and place localhost at the top of the list. After this the task should run successfully and the state should be OK again for the pm_collector.
---------------------------------------------------------------------------------------------------------------------------------------------
Checking the GUI node's /etc/hosts it shows actually 127.0.0.1 localhost.localdomain localhost4 localhost4.localdomain4 localhost
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Simon Thompson Sent: 09 October 2018 14:38 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Pmsensors and gui
Hi, I have a couple of problems with the GUI and the stats data in there:
First, on the gui node, I am getting ?The following GUI refresh task(s) failed: PM_MONITOR?, looking at the log for this: PM_MONITOR * 2018-10-09 14:35:31 15ms failed RefreshTaskScheduler$1.run com.ibm.fscc.common.exceptions.FsccException: No entity found for NODE: null/localhost.localdomain Suggestions? Second, a bunch of my hosts have multiple NICs on different networks, they are joined to the cluster with the name hostname1-data, however the ?primary? hostname of the host is ?hostname?. I see summary stats information in the GUI which references the shortname of the host, but when I click the host in the GUI, it claims no data ? I assume because the GPFS hostname is the -data nama and pmsensors is using the primary hostname. Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19742873.gif Type: image/gif Size: 4659 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19933766.gif Type: image/gif Size: 46 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19033540.gif Type: image/gif Size: 4660 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19192281.gif Type: image/gif Size: 106 bytes Desc: not available URL: From robbyb at us.ibm.com Wed Oct 10 14:07:10 2018 From: robbyb at us.ibm.com (Rob Basham) Date: Wed, 10 Oct 2018 13:07:10 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov>, Message-ID: An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Wed Oct 10 14:22:52 2018 From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC]) Date: Wed, 10 Oct 2018 13:22:52 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov>, , Message-ID: <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> If there?s interest I could do a short presentation on our 1k node virtual GPFS test cluster (with SR-IOV and real IB RDMA!) and some of the benefits we?ve found (including helping squash a nasty hard-to-reproduce bug) as well as how we use it to test upgrades. On October 10, 2018 at 09:07:24 EDT, Rob Basham wrote: Kristy, I'll be at SC18 for client presentations and could talk about TCT. We have a big release coming up in 1H18 with multi-site support and we've broken out of the gateway paradigm to where we work on every client node in the cluster for key data path work. If you have a slot I could talk about that. 
Regards, Rob Basham MCStore and IBM Ready Archive architecture 971-344-1999 ----- Original message ----- From: Kristy Kallback-Rose Sent by: gpfsug-discuss-bounces at spectrumscale.org To: gpfsug main discussion list Cc: Subject: Re: [gpfsug-discuss] Still need a couple User Talks for SC18 Date: Tue, Oct 9, 2018 7:13 PM PS - If you?ve already contacted me about talking can you please ping me again? I?m drowning in stuff-to-do sauce. Thanks, Kristy > On Oct 9, 2018, at 7:08 PM, Kristy Kallback-Rose wrote: > > Hello, > > Please reach out if you?re even a little bit interested, we really want to balance the agenda with user presentations. > > Thanks, > Kristy _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Wed Oct 10 14:58:24 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Wed, 10 Oct 2018 13:58:24 +0000 Subject: [gpfsug-discuss] Still need a couple User Talks for SC18 In-Reply-To: <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> References: <5EAF422E-FD80-4370-8267-959D4E89A0B5@lbl.gov> <9DF57532-9CF1-4288-AB75-6937F583953D@nasa.gov> Message-ID: <0835F404-DF06-4237-A1AA-8553E28E1343@nuance.com> User talks - For those interested, please email Kristy and/or myself directly. Rob/other IBMers - work with Ulf Troppens on slots. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Wed Oct 10 16:06:09 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Wed, 10 Oct 2018 11:06:09 -0400 Subject: [gpfsug-discuss] Pmsensors and gui In-Reply-To: References: <7C158C8E-A1CB-4460-854F-8439A831D8AD@bham.ac.uk> <57798FAD-4CD1-41DB-8F00-06F34E85D34F@bham.ac.uk> Message-ID: <11037.1539183969@turing-police.cc.vt.edu> On Wed, 10 Oct 2018 10:24:58 +0200, "Markus Rohwedder" said: > Hello Simon, > > not sure if the answer solved your question from the response, > > Even if nodes can be externally resolved by unique hostnames, applications > that run on the host use the /bin/hostname binary or the hostname() call to > identify the node they are running on. > This is the case with the performance collection sensor. > So you need to set the hostname of the hosts using /bin/hostname in in a > way that provides unique responses of the "/bin/hostname" call within a > cluster. And we discovered that 'unique' applies to "only considering the leftmost part of the hostname". We set up a stretch cluster that had 3 NSD servers at each of two locations, and found that using FQDN names of the form: nsd1.something.loc1.internal nsd2.something.loc1.internal nsd1.something.loc2.internal nsd2.something.loc2.internal got things all sorts of upset in a very passive-agressive way. The cluster would come up, and serve data just fine. But things like 'nsdperf' would toss errors about not being able to resolve a NSD server name, or fail to connect, or complain that it was connecting to itself, or other similar "not talking to the node it thought" type confusion... We ended up renaming to: nsd1-loc1.something.internal nsd1-loc2.something.internal ... and all the userspace tools started working much better. -------------- next part -------------- A non-text attachment was scrubbed... 
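Given Markus' point that the collector keys on /bin/hostname and Valdis' observation about the leftmost label, a quick sanity check is to gather the short hostname from every node and look for duplicates. A minimal sketch, assuming mmdsh is usable from an admin node (the awk split relies on mmdsh prefixing each output line with "nodename:"):

mmdsh -N all '/bin/hostname -s' | awk -F': *' '{print $2}' | sort | uniq -d

Anything this prints is a short hostname reported by more than one node, which is exactly the situation that confuses the collector mapping and tools like nsdperf.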
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From r.sobey at imperial.ac.uk Wed Oct 10 16:43:45 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Wed, 10 Oct 2018 15:43:45 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity Message-ID: Hi all, Maybe I'm barking up the wrong tree but I'm debugging why I don't get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run "mmperfmon query GPFSFilesetQuota" and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I'm following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I'm running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 10 16:58:51 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 10 Oct 2018 15:58:51 +0000 Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE@bham.ac.uk> OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? (I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Fabrice.Cantos at niwa.co.nz Wed Oct 10 22:57:04 2018 From: Fabrice.Cantos at niwa.co.nz (Fabrice Cantos) Date: Wed, 10 Oct 2018 21:57:04 +0000 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 Message-ID: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> I would be interested to know what you chose for your filesystems and user/project space directories: * Traditional Posix ACL * NFS V4 ACL What did motivate your choice? We are facing some issues to get the correct NFS ACL to keep correct attributes for new files created. Thanks Fabrice [cid:image4cef17.PNG at 18c66b76.4480e036] Fabrice Cantos HPC Systems Engineer Group Manager ? High Performance Computing T +64-4-386-0367 M +64-27-412-9693 National Institute of Water & Atmospheric Research Ltd (NIWA) 301 Evans Bay Parade, Greta Point, Wellington Connect with NIWA: niwa.co.nz Facebook Twitter LinkedIn Instagram To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image4cef17.PNG Type: image/png Size: 12288 bytes Desc: image4cef17.PNG URL: From truongv at us.ibm.com Thu Oct 11 04:14:24 2018 From: truongv at us.ibm.com (Truong Vu) Date: Wed, 10 Oct 2018 23:14:24 -0400 Subject: [gpfsug-discuss] Sudo wrappers In-Reply-To: References: Message-ID: Yes, you can use mmchconfig for that. eg: mmchconfig sudoUser=gpfsadmin Thanks, Tru. Message: 2 Date: Wed, 10 Oct 2018 15:58:51 +0000 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE at bham.ac.uk> Content-Type: text/plain; charset="utf-8" OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? (I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: < http://gpfsug.org/pipermail/gpfsug-discuss/attachments/20181010/6317be26/attachment-0001.html > -------------- next part -------------- An HTML attachment was scrubbed... 
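For reference, the two knobs that come up in this sudo wrapper thread, sketched in one place (gpfsadmin is the example admin user from Simon's mail; whether sudoUser helps for interactive use is discussed further down the thread):

# config option Tru mentions - used when root-level background processes call admin commands
mmchconfig sudoUser=gpfsadmin
mmlsconfig | grep -i sudouser          # confirm the value took

# interactive workaround from Simon's original mail - force the wrapper to ssh as gpfsadmin
sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a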
URL: From Anna.Greim at de.ibm.com Thu Oct 11 07:41:25 2018 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 11 Oct 2018 08:41:25 +0200 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Maybe I?m barking up the wrong tree but I?m debugging why I don?t get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run ?mmperfmon query GPFSFilesetQuota? and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. 
Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I?m following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I?m running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Thu Oct 11 08:54:01 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 11 Oct 2018 07:54:01 +0000 Subject: [gpfsug-discuss] Sudo wrappers In-Reply-To: References: Message-ID: <39DC4B5E-CAFD-489C-9BE5-42B83B29A8F5@bham.ac.uk> Nope that one doesn?t work ? I found it in the docs: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.2/com.ibm.spectrum.scale.v5r02.doc/bl1adm_mmchconfig.htm ?Specifies a non-root admin user ID to be used when sudo wrappers are enabled and a root-level background process calls an administration command directly instead of through sudo.? So it reads like it still wants to be ?me? unless it?s a background process. Simon From: on behalf of "truongv at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Thursday, 11 October 2018 at 04:14 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Sudo wrappers Yes, you can use mmchconfig for that. eg: mmchconfig sudoUser=gpfsadmin Thanks, Tru. Message: 2 Date: Wed, 10 Oct 2018 15:58:51 +0000 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Subject: [gpfsug-discuss] Sudo wrappers Message-ID: <88E47B96-DF0B-428A-92F6-1AEAEA4AA8EE at bham.ac.uk> Content-Type: text/plain; charset="utf-8" OK, so I finally got a few minutes to play with the sudo wrappers. I read the docs on the GPFS website, setup my gpfsadmin user and made it so that root can ssh as the gpfsadmin user to the host. Except of course I?ve clearly misunderstood things, because when I do: [myusername at bber-dssg02 bin]$ sudo /usr/lpp/mmfs/bin/mmgetstate -a myusername at bber-afmgw01.bb2.cluster's password: myusername at bber-dssg02.bb2.cluster's password: myusername at bber-dssg01.bb2.cluster's password: myusername at bber-afmgw02.bb2.cluster's password: Now ?myusername? is ? my username, not ?gpfsadmin?. What I really don?t want to do is permit root to ssh to all the hosts in the cluster as ?myusername?. I kinda thought the username it sshes as would be configurable, but apparently not? Annoyingly, I can do: [myusername at bber-dssg02 bin]$ sudo SUDO_USER=gpfsadmin /usr/lpp/mmfs/bin/mmgetstate -a And that works fine? So is it possibly to set in a config file the user that the sudo wrapper works as? 
(I get there are cases where you want to ssh as the original calling user) Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Thu Oct 11 13:10:00 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Thu, 11 Oct 2018 12:10:00 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Anna, Yes, that will be it! I was running the wrong command as you surmise. The GPFSFileSetQuota config appears to be correct: { name = "GPFSFilesetQuota" period = 3600 restrict = "icgpfsq1.cc.ic.ac.uk" }, However "mmperfmon query gpfs_rq_blk_current" just shows lots of null values, for example: Row Timestamp gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current 1 2018-10-11-13:07:31 null null null null null null null null 2 2018-10-11-13:07:32 null null null null null null null null 3 2018-10-11-13:07:33 null null null null null null null null 4 2018-10-11-13:07:34 null null null null null null null null 5 2018-10-11-13:07:35 null null null null null null null null 6 2018-10-11-13:07:36 null null null null null null null null 7 2018-10-11-13:07:37 null null null null null null null null 8 2018-10-11-13:07:38 null null null null null null null null 9 2018-10-11-13:07:39 null null null null null null null null 10 2018-10-11-13:07:40 null null null null null null null null Same with the metric gpfs_rq_file_current. I'll have a look at the PDF sent by Markus in the meantime. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Anna Greim Sent: 11 October 2018 07:41 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. 
Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems ________________________________ Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH [cid:image001.gif at 01D46163.B6B21E10] Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany ________________________________ IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" > To: "'gpfsug-discuss at spectrumscale.org'" > Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi all, Maybe I'm barking up the wrong tree but I'm debugging why I don't get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run "mmperfmon query GPFSFilesetQuota" and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I'm following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I'm running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 1851 bytes Desc: image001.gif URL: From Anna.Greim at de.ibm.com Thu Oct 11 14:11:56 2018 From: Anna.Greim at de.ibm.com (Anna Greim) Date: Thu, 11 Oct 2018 15:11:56 +0200 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hello Richard, the sensor is running once an hour and the default of mmperfmon returns the last 10 results in a bucket-size of 1 seconds. The sensor did not run in the time of 13:07:31-13:07:40. 
Please use the command again with the option -b 3600 or with --bucket-size=3600 and see if you've got any data for that time. If you get any data the question is, why the GUI isn't able to get the data. If you do not have any data (only null rows) the question is, why the collector does not get data or why the sensor does not collect data and sends them to the collector. Since you get data for the cpu_user metric it is more likely that the sensor is not collecting and sending anything. The guide from Markus should help you here. Otherwise just write again into the user group. Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: gpfsug main discussion list Date: 11/10/2018 14:10 Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Anna, Yes, that will be it! I was running the wrong command as you surmise. The GPFSFileSetQuota config appears to be correct: { name = "GPFSFilesetQuota" period = 3600 restrict = "icgpfsq1.cc.ic.ac.uk" }, However ?mmperfmon query gpfs_rq_blk_current? just shows lots of null values, for example: Row Timestamp gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current gpfs_rq_blk_current 1 2018-10-11-13:07:31 null null null null null null null null 2 2018-10-11-13:07:32 null null null null null null null null 3 2018-10-11-13:07:33 null null null null null null null null 4 2018-10-11-13:07:34 null null null null null null null null 5 2018-10-11-13:07:35 null null null null null null null null 6 2018-10-11-13:07:36 null null null null null null null null 7 2018-10-11-13:07:37 null null null null null null null null 8 2018-10-11-13:07:38 null null null null null null null null 9 2018-10-11-13:07:39 null null null null null null null null 10 2018-10-11-13:07:40 null null null null null null null null Same with the metric gpfs_rq_file_current. I?ll have a look at the PDF sent by Markus in the meantime. Thanks Richard From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Anna Greim Sent: 11 October 2018 07:41 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Performance collector no results for Capacity Hi Richard, one thing to note. You tried "mmperfmon query GPFSFilesetQuota" to get metric data. So you used the sensor's name instead of a metric name. And compared it to "mmperfmon query cpu_user" where you used the metric name. mmperfmon will not return data, if you use the sensor's name instead of a metric's name. I bet you got something like this returned: [root at test-51 ~]# mmperfmon query GPFSFilesetQuota Error: no data available for query . mmperfmon: Command failed. Examine previous error messages to determine cause. The log entries you found just tell you, that the collector does not know any metric named "GPFSFilesetQuota". Please try the query again with gpfs_rq_blk_current or gpfs_rq_file_current. If the collector never got any data for that metrics, it also does not know those metrics' names. But since you do not see any data in the GUI this might be the case. In this case please check with "mmperfmon config show" if the restrict field is set correctly. 
You should use the long gpfs name and not the hostname. You can check, if the configuration file was distributed correctly in checking the /opt/IBM/zimon/ZIMonSensors.cfg on the node that is supposed to start this monitor. If the mmperfmon command was able to identify the restrict value correctly, this node should have your configured period value instead of 0 in ZIMonSensors.cfg under the GPFSFilesetQuota sensor. All other nodes should include a period equal to 0. Furthermore, of course, the period for GPFSFilesetQuota should be higher than 0. Recommended is a value of 3600 (once per hour) since the underlying command is heavier on the system than other sensors. Change the values with the "mmperfmon config update" command, so that it is distributed in the system. E.g. "mmperfmon config update GPFSFilesetQuota.restrict=" and "mmperfmon config update GPFSFilesetQuota.period=3600" Mit freundlichen Gr??en / Kind regards Greim, Anna Software Engineer, Spectrum Scale Development IBM Systems Phone: +49-7034-2740981 IBM Deutschland Research & Development GmbH Mobil: +49-172-2646541 Am Weiher 24 Email: anna.greim at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland Research & Development GmbH / Vorsitzende des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 From: "Sobey, Richard A" To: "'gpfsug-discuss at spectrumscale.org'" < gpfsug-discuss at spectrumscale.org> Date: 10/10/2018 17:43 Subject: [gpfsug-discuss] Performance collector no results for Capacity Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi all, Maybe I?m barking up the wrong tree but I?m debugging why I don?t get a nice graph in the GUI for fileset capacity, even though the GUI does know about things such as capacity and inodes and usage. So off I go to the CLI to run ?mmperfmon query GPFSFilesetQuota? and I get this: Oct-10 16:33:28 [Info ] QueryEngine: (fd=64) query from 127.0.0.1: get metrics GPFSFilesetQuota from node=icgpfsq1 last 10 bucket_size 1 Oct-10 16:33:28 [Info ] QueryParser: metric: GPFSFilesetQuota Oct-10 16:33:28 [Warning] QueryEngine: searchForMetric: could not find metaKey for given metric GPFSFilesetQuota, returning. Oct-10 16:33:28 [Info ] QueryEngine: [fd=64] no data available for query Is this a golden ticket to my problem or should I be checking elsewhere? I?m following a troubleshooting guide here: https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.1/com.ibm.spectrum.scale.v5r01.doc/bl1pdg_guiperfmonissues.htm and from the page directly within the GUI server itself. Notably, other things work ok: [root at icgpfsq1 richard]# mmperfmon query cpu_user Legend: 1: icgpfsq1|CPU|cpu_user Row Timestamp cpu_user 1 2018-10-10-16:41:09 0.00 2 2018-10-10-16:41:10 0.25 3 2018-10-10-16:41:11 0.50 4 2018-10-10-16:41:12 0.50 5 2018-10-10-16:41:13 0.50 6 2018-10-10-16:41:14 0.25 7 2018-10-10-16:41:15 1.25 8 2018-10-10-16:41:16 2.51 9 2018-10-10-16:41:17 0.25 10 2018-10-10-16:41:18 0.25 I?m running 5.0.1-2 on all nodes except the NSD servers which still run 5.0.0.2. Thanks Richard_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: From spectrumscale at kiranghag.com Fri Oct 12 05:38:19 2018 From: spectrumscale at kiranghag.com (KG) Date: Fri, 12 Oct 2018 07:38:19 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS Message-ID: Hi Folks I am trying to compile IOR on a GPFS filesystem and running into following errors. Github forum says that "The configure script does not add -lgpfs to the CFLAGS when it detects GPFS support." Any help on how to get around this? mpicc -DHAVE_CONFIG_H -I. -g -O2 -MT aiori-MPIIO.o -MD -MP -MF .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c aiori-MPIIO.c: In function ?MPIIO_Xfer?: aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type [enabled by default] Access = MPI_File_write; ^ aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type [enabled by default] Access_at = MPI_File_write_at; ^ aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type [enabled by default] Access_all = MPI_File_write_all; ^ aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type [enabled by default] Access_at_all = MPI_File_write_at_all; ^ mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o aiori-MPIIO.o -lm aiori-POSIX.o: In function `gpfs_free_all_locks': /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to `gpfs_fcntl' aiori-POSIX.o: In function `gpfs_access_start': aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' aiori-POSIX.o: In function `gpfs_access_end': aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' collect2: error: ld returned 1 exit status make[2]: *** [ior] Error 1 make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' make[1]: *** [all] Error 2 make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' make: *** [all-recursive] Error 1 Kiran -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnbent at gmail.com Fri Oct 12 05:50:45 2018 From: johnbent at gmail.com (John Bent) Date: Thu, 11 Oct 2018 22:50:45 -0600 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Kiran, Are you using the latest version of IOR? https://github.com/hpc/ior Thanks, John On Thu, Oct 11, 2018 at 10:39 PM KG wrote: > Hi Folks > > I am trying to compile IOR on a GPFS filesystem and running into following > errors. > > Github forum says that "The configure script does not add -lgpfs to the > CFLAGS when it detects GPFS support." > > Any help on how to get around this? > > mpicc -DHAVE_CONFIG_H -I. 
-g -O2 -MT aiori-MPIIO.o -MD -MP -MF > .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c > aiori-MPIIO.c: In function ?MPIIO_Xfer?: > aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type > [enabled by default] > Access = MPI_File_write; > ^ > aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type > [enabled by default] > Access_at = MPI_File_write_at; > ^ > aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type > [enabled by default] > Access_all = MPI_File_write_all; > ^ > aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type > [enabled by default] > Access_at_all = MPI_File_write_at_all; > ^ > mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po > mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o > aiori-MPIIO.o -lm > aiori-POSIX.o: In function `gpfs_free_all_locks': > /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to > `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_start': > aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_end': > aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' > collect2: error: ld returned 1 exit status > make[2]: *** [ior] Error 1 > make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make: *** [all-recursive] Error 1 > > Kiran > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From r.sobey at imperial.ac.uk Fri Oct 12 11:09:49 2018 From: r.sobey at imperial.ac.uk (Sobey, Richard A) Date: Fri, 12 Oct 2018 10:09:49 +0000 Subject: [gpfsug-discuss] Performance collector no results for Capacity In-Reply-To: References: Message-ID: Hi Anna, Markus It was the incorrect restrict clause referencing the FQDN of the server, and not the GPFS daemon node name, that was causing the problem. This has now been updated and we have nice graphs ? Many thanks! Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Fri Oct 12 11:39:12 2018 From: spectrumscale at kiranghag.com (KG) Date: Fri, 12 Oct 2018 13:39:12 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Hi John Yes, I am using latest version from this link. Do I have to use any additional switches for compilation? I used following sequence ./bootstrap ./configure ./make (fails) On Fri, Oct 12, 2018 at 7:51 AM John Bent wrote: > Kiran, > > Are you using the latest version of IOR? > https://github.com/hpc/ior > > Thanks, > > John > > On Thu, Oct 11, 2018 at 10:39 PM KG wrote: > >> Hi Folks >> >> I am trying to compile IOR on a GPFS filesystem and running into >> following errors. >> >> Github forum says that "The configure script does not add -lgpfs to the >> CFLAGS when it detects GPFS support." >> >> Any help on how to get around this? >> >> mpicc -DHAVE_CONFIG_H -I. 
-g -O2 -MT aiori-MPIIO.o -MD -MP -MF >> .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c >> aiori-MPIIO.c: In function ?MPIIO_Xfer?: >> aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type >> [enabled by default] >> Access = MPI_File_write; >> ^ >> aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_at = MPI_File_write_at; >> ^ >> aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_all = MPI_File_write_all; >> ^ >> aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type >> [enabled by default] >> Access_at_all = MPI_File_write_at_all; >> ^ >> mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po >> mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o >> aiori-MPIIO.o -lm >> aiori-POSIX.o: In function `gpfs_free_all_locks': >> /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to >> `gpfs_fcntl' >> aiori-POSIX.o: In function `gpfs_access_start': >> aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' >> aiori-POSIX.o: In function `gpfs_access_end': >> aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' >> collect2: error: ld returned 1 exit status >> make[2]: *** [ior] Error 1 >> make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' >> make[1]: *** [all] Error 2 >> make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' >> make: *** [all-recursive] Error 1 >> >> Kiran >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Fri Oct 12 11:43:41 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Fri, 12 Oct 2018 12:43:41 +0200 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Oct 15 15:11:34 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 15 Oct 2018 14:11:34 +0000 Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously? Message-ID: Hi All, Is there a way to run mmfileid on two NSD?s simultaneously? Thanks? Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexander.Saupp at de.ibm.com Mon Oct 15 19:18:32 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Mon, 15 Oct 2018 20:18:32 +0200 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS Message-ID: Dear Spectrum Scale mailing list, I'm part of IBM Lab Services - currently i'm having multiple customers asking me for optimization of a similar workloads. The task is to tune a Spectrum Scale system (comprising ESS and CES protocol nodes) for the following workload: A single Linux NFS client mounts an NFS export, extracts a flat tar archive with lots of ~5KB files. I'm measuring the speed at which those 5KB files are written (`time tar xf archive.tar`). 
I do understand that Spectrum Scale is not designed for such workload (single client, single thread, small files, single directory), and that such benchmark in not appropriate to benmark the system. Yet I find myself explaining the performance for such scenario (git clone..) quite frequently, as customers insist that optimization of that scenario would impact individual users as it shows task duration. I want to make sure that I have optimized the system as much as possible for the given workload, and that I have not overlooked something obvious. When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server). When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second. Writing to the NFS export from another node (now including network latency) gives me ~220 files / second. There seems to be a huge performance degradation by adding NFS-Ganesha to the software stack alone. I wonder what can be done to minimize the impact. - Ganesha doesn't seem to support 'async' or 'no_wdelay' options... anything equivalent available? - Is there and expected advantage of using the network-latency tuned profile, as opposed to the ESS default throughput-performance profile? - Are there other relevant Kernel params? - Is there an expected advantage of raising the number of threads (NSD server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha (NB_WORKER)) for the given workload (single client, single thread, small files)? - Are there other relevant GPFS params? - Impact of Sync replication, disk latency, etc is understood. - I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well. I just want to ensure that I'm not missing something obvious over reiterating that massage to customers. Any help was greatly appreciated - thanks much in advance! Alexander Saupp IBM Germany Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 54993307.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From makaplan at us.ibm.com Mon Oct 15 19:44:52 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 15 Oct 2018 14:44:52 -0400 Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously? In-Reply-To: References: Message-ID: How about using the -F option? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mutantllama at gmail.com Mon Oct 15 23:32:35 2018 From: mutantllama at gmail.com (Carl) Date: Tue, 16 Oct 2018 09:32:35 +1100 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Hi, We recently had a PMR open for Ganesha related performance issues, which was resolved with an eFix that updated Ganesha. If you are running GPFS v5 I would suggest contacting support. Cheers, Carl. On Tue, 16 Oct 2018 at 5:20 am, Alexander Saupp wrote: > Dear Spectrum Scale mailing list, > > I'm part of IBM Lab Services - currently i'm having multiple customers > asking me for optimization of a similar workloads. > > The task is to tune a Spectrum Scale system (comprising ESS and CES > protocol nodes) for the following workload: > A single Linux NFS client mounts an NFS export, extracts a flat tar > archive with lots of ~5KB files. > I'm measuring the speed at which those 5KB files are written (`time tar xf > archive.tar`). > > I do understand that Spectrum Scale is not designed for such workload > (single client, single thread, small files, single directory), and that > such benchmark in not appropriate to benmark the system. > Yet I find myself explaining the performance for such scenario (git > clone..) quite frequently, as customers insist that optimization of that > scenario would impact individual users as it shows task duration. > I want to make sure that I have optimized the system as much as possible > for the given workload, and that I have not overlooked something obvious. > > > When writing to GPFS directly I'm able to write ~1800 files / second in a > test setup. > This is roughly the same on the protocol nodes (NSD client), as well as on > the ESS IO nodes (NSD server). > When writing to the NFS export on the protocol node itself (to avoid any > network effects) I'm only able to write ~230 files / second. > Writing to the NFS export from another node (now including network > latency) gives me ~220 files / second. > > > There seems to be a huge performance degradation by adding NFS-Ganesha to > the software stack alone. I wonder what can be done to minimize the impact. > > > - Ganesha doesn't seem to support 'async' or 'no_wdelay' options... > anything equivalent available? > - Is there and expected advantage of using the network-latency tuned > profile, as opposed to the ESS default throughput-performance profile? > - Are there other relevant Kernel params? > - Is there an expected advantage of raising the number of threads (NSD > server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha > (NB_WORKER)) for the given workload (single client, single thread, small > files)? > - Are there other relevant GPFS params? > - Impact of Sync replication, disk latency, etc is understood. > - I'm aware that 'the real thing' would be to work with larger files in a > multithreaded manner from multiple nodes - and that this scenario will > scale quite well. > I just want to ensure that I'm not missing something obvious over > reiterating that massage to customers. > > Any help was greatly appreciated - thanks much in advance! 
> Alexander Saupp > IBM Germany > > > Mit freundlichen Gr??en / Kind regards > > *Alexander Saupp* > > IBM Systems, Storage Platform, EMEA Storage Competence Center > ------------------------------ > Phone: +49 7034-643-1512 IBM Deutschland GmbH > Mobile: +49-172 7251072 Am Weiher 24 > Email: alexander.saupp at de.ibm.com 65451 Kelsterbach > Germany > ------------------------------ > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 54993307.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From kums at us.ibm.com Mon Oct 15 23:34:50 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 15 Oct 2018 18:34:50 -0400 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Hi Alexander, 1. >>When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. >>This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server). 2. >> When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second. IMHO #2, writing to the NFS export on the protocol node should be same as #1. Protocol node is also a NSD client and when you write from a protocol node, it will use the NSD protocol to write to the ESS IO nodes. In #1, you cite seeing ~1800 files from protocol node and in #2 you cite seeing ~230 file/sec which seem to contradict each other. >>Writing to the NFS export from another node (now including network latency) gives me ~220 files / second. IMHO, this workload "single client, single thread, small files, single directory - tar xf" is synchronous is nature and will result in single outstanding file to be sent from the NFS client to the CES node. Hence, the performance will be limited by network latency/capability between the NFS client and CES node for small IO size (~5KB file size). Also, what is the network interconnect/interface between the NFS client and CES node? Is the network 10GigE since @220 file/s for 5KiB file-size will saturate 1 x 10GigE link. 220 files/sec * 5KiB file size ==> ~1.126 GB/s. >> I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well. Yes, larger file-size + multiple threads + multiple NFS client nodes will help to scale performance further by having more NFS I/O requests scheduled/pipelined over the network and processed on the CES nodes. >> I just want to ensure that I'm not missing something obvious over reiterating that massage to customers. Adding NFS experts/team, for advise. My two cents. 
Best Regards, -Kums From: "Alexander Saupp" To: gpfsug-discuss at spectrumscale.org Date: 10/15/2018 02:20 PM Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Dear Spectrum Scale mailing list, I'm part of IBM Lab Services - currently i'm having multiple customers asking me for optimization of a similar workloads. The task is to tune a Spectrum Scale system (comprising ESS and CES protocol nodes) for the following workload: A single Linux NFS client mounts an NFS export, extracts a flat tar archive with lots of ~5KB files. I'm measuring the speed at which those 5KB files are written (`time tar xf archive.tar`). I do understand that Spectrum Scale is not designed for such workload (single client, single thread, small files, single directory), and that such benchmark in not appropriate to benmark the system. Yet I find myself explaining the performance for such scenario (git clone..) quite frequently, as customers insist that optimization of that scenario would impact individual users as it shows task duration. I want to make sure that I have optimized the system as much as possible for the given workload, and that I have not overlooked something obvious. When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. This is roughly the same on the protocol nodes (NSD client), as well as on the ESS IO nodes (NSD server). When writing to the NFS export on the protocol node itself (to avoid any network effects) I'm only able to write ~230 files / second. Writing to the NFS export from another node (now including network latency) gives me ~220 files / second. There seems to be a huge performance degradation by adding NFS-Ganesha to the software stack alone. I wonder what can be done to minimize the impact. - Ganesha doesn't seem to support 'async' or 'no_wdelay' options... anything equivalent available? - Is there and expected advantage of using the network-latency tuned profile, as opposed to the ESS default throughput-performance profile? - Are there other relevant Kernel params? - Is there an expected advantage of raising the number of threads (NSD server (nsd*WorkerThreads) / NSD client (workerThreads) / Ganesha (NB_WORKER)) for the given workload (single client, single thread, small files)? - Are there other relevant GPFS params? - Impact of Sync replication, disk latency, etc is understood. - I'm aware that 'the real thing' would be to work with larger files in a multithreaded manner from multiple nodes - and that this scenario will scale quite well. I just want to ensure that I'm not missing something obvious over reiterating that massage to customers. Any help was greatly appreciated - thanks much in advance! Alexander Saupp IBM Germany Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. 
DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: From Kevin.Buterbaugh at Vanderbilt.Edu Mon Oct 15 20:09:19 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Mon, 15 Oct 2018 19:09:19 +0000 Subject: [gpfsug-discuss] mmfileid on 2 NSDs simultaneously? In-Reply-To: References: Message-ID: <4C0E90D1-14DA-44A1-B037-95C17076193C@vanderbilt.edu> Marc, Ugh - sorry, completely overlooked that? Kevin On Oct 15, 2018, at 1:44 PM, Marc A Kaplan > wrote: How about using the -F option? _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Cb6d9700cd6ff4bbed85808d632ce4ff2%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636752259026486137&sdata=mBfANLkK8v2ZEahGumE4a7iVIAcVJXb1Dv2kgSrynrI%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From valdis.kletnieks at vt.edu Tue Oct 16 01:42:14 2018 From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu) Date: Mon, 15 Oct 2018 20:42:14 -0400 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <5824.1539650534@turing-police.cc.vt.edu> On Mon, 15 Oct 2018 18:34:50 -0400, "Kumaran Rajaram" said: > 1. >>When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. > >>This is roughly the same on the protocol nodes (NSD client), as well as > on the ESS IO nodes (NSD server). > > 2. >> When writing to the NFS export on the protocol node itself (to avoid > any network effects) I'm only able to write ~230 files / second. > IMHO #2, writing to the NFS export on the protocol node should be same as #1. > Protocol node is also a NSD client and when you write from a protocol node, it > will use the NSD protocol to write to the ESS IO nodes. In #1, you cite seeing > ~1800 files from protocol node and in #2 you cite seeing ~230 file/sec which > seem to contradict each other. I think he means this: 1) ssh nsd_server 2) cd /gpfs/filesystem/testarea 3) (whomp out 1800 files/sec) 4) mount -t nfs localhost:/gpfs/filesystem/testarea /mnt/test 5) cd /mnt/test 6) Watch the same test struggle to hit 230. Indicating the issue is going from NFS to GPFS (For what it's worth, we've had issues with Ganesha as well...) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 486 bytes Desc: not available URL: From Achim.Rehor at de.ibm.com Tue Oct 16 10:39:14 2018 From: Achim.Rehor at de.ibm.com (Achim Rehor) Date: Tue, 16 Oct 2018 11:39:14 +0200 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 7182 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 45 bytes Desc: not available URL: From diederich at de.ibm.com Tue Oct 16 13:31:20 2018 From: diederich at de.ibm.com (Michael Diederich) Date: Tue, 16 Oct 2018 14:31:20 +0200 Subject: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS In-Reply-To: <5824.1539650534@turing-police.cc.vt.edu> References: <5824.1539650534@turing-police.cc.vt.edu> Message-ID: All NFS IO requires syncing. The client does send explicit fsync (commit). If the NFS server does not sync, a server fail will cause data loss! (for small files <1M it really does not matter if it is sync on write or sync on close/explicit commit) while that may be ok for a "git pull" or similar, in general it violates the NFS spec. The client can decide to cache, and usually NFSv4 does less caching (for better consistency) So the observed factor 100 is realistic. Latencies will make matters worse, so the FS should be tuned for very small random IO (small blocksize - small subblock-size will not help) If you were to put the Linux kernel NFS server into the picture, it will behave very much the same - although Ganesha could be a bit more efficient (by some percent - certainly less then 200%). But hey - this is a GPFS cluster not some NAS box. Run "git pull" on tthe GPFS client. Enjoy the 1800 files/sec (or more). Modify the files on your XY client mounting over NFS. Use a wrapper script to automatically have your AD or LDAP user id SSH into the cluster to perform it. Michael Mit freundlichen Gr??en / with best regards Michael Diederich IBM Systems Group Spectrum Scale Software Development Contact Information IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Koederitz Gesch?ftsf?hrung: Dirk Wittkopp Sitz der Gesellschaft: B?blingen Registergericht: Amtsgericht Stuttgart, HRB 243294 mail: fon: address: michael.diederich at de.ibm.com +49-7034-274-4062 Am Weiher 24 D-65451 Kelsterbach From: valdis.kletnieks at vt.edu To: gpfsug main discussion list Cc: Silvana De Gyves , Jay Vaddi , Michael Diederich Date: 10/16/2018 02:42 AM Subject: Re: [gpfsug-discuss] Tuning: single client, single thread, small files - native Scale vs NFS Sent by: Valdis Kletnieks On Mon, 15 Oct 2018 18:34:50 -0400, "Kumaran Rajaram" said: > 1. >>When writing to GPFS directly I'm able to write ~1800 files / second in a test setup. 
> >>This is roughly the same on the protocol nodes (NSD client), as well as > on the ESS IO nodes (NSD server). > > 2. >> When writing to the NFS export on the protocol node itself (to avoid > any network effects) I'm only able to write ~230 files / second. > IMHO #2, writing to the NFS export on the protocol node should be same as #1. > Protocol node is also a NSD client and when you write from a protocol node, it > will use the NSD protocol to write to the ESS IO nodes. In #1, you cite seeing > ~1800 files from protocol node and in #2 you cite seeing ~230 file/sec which > seem to contradict each other. I think he means this: 1) ssh nsd_server 2) cd /gpfs/filesystem/testarea 3) (whomp out 1800 files/sec) 4) mount -t nfs localhost:/gpfs/filesystem/testarea /mnt/test 5) cd /mnt/test 6) Watch the same test struggle to hit 230. Indicating the issue is going from NFS to GPFS (For what it's worth, we've had issues with Ganesha as well...) [attachment "att4z9wh.dat" deleted by Michael Diederich/Germany/IBM] -------------- next part -------------- An HTML attachment was scrubbed... URL: From KKR at lbl.gov Tue Oct 16 14:20:08 2018 From: KKR at lbl.gov (Kristy Kallback-Rose) Date: Tue, 16 Oct 2018 14:20:08 +0100 Subject: [gpfsug-discuss] Presentations and SC18 Sign Up Message-ID: Quick message, more later. The presentation bundle (zip file) from the September UG meeting at ORNL is now here: https://www.spectrumscaleug.org/presentations/ I'll add more details there soon. If you haven't signed up for SC18's UG meeting yet, you can should do so here: https://ibm.co/2CjZyHG SC18 agenda is being discussed today. Hoping for more details about that soon. Cheers, Kristy -------------- next part -------------- An HTML attachment was scrubbed... URL: From spectrumscale at kiranghag.com Tue Oct 16 17:44:08 2018 From: spectrumscale at kiranghag.com (KG) Date: Tue, 16 Oct 2018 19:44:08 +0300 Subject: [gpfsug-discuss] error compiling IOR on GPFS In-Reply-To: References: Message-ID: Thanks Olaf It worked. On Fri, Oct 12, 2018, 13:43 Olaf Weiser wrote: > I think the step you are missing is this... > > > > > ./configure LIBS=/usr/lpp/mmfs/lib/libgpfs.so > make > > > Mit freundlichen Gr??en / Kind regards > > > Olaf Weiser > > EMEA Storage Competence Center Mainz, German / IBM Systems, Storage > Platform, > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland > IBM Allee 1 > 71139 Ehningen > Phone: +49-170-579-44-66 > E-Mail: olaf.weiser at de.ibm.com > > ------------------------------------------------------------------------------------------------------------------------------------------- > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert > Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > From: KG > To: gpfsug main discussion list > Date: 10/12/2018 12:40 PM > Subject: Re: [gpfsug-discuss] error compiling IOR on GPFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Hi John > > Yes, I am using latest version from this link. > > Do I have to use any additional switches for compilation? 
I used following > sequence > ./bootstrap > ./configure > ./make (fails) > > > On Fri, Oct 12, 2018 at 7:51 AM John Bent <*johnbent at gmail.com* > > wrote: > Kiran, > > Are you using the latest version of IOR? > *https://github.com/hpc/ior* > > Thanks, > > John > > On Thu, Oct 11, 2018 at 10:39 PM KG <*spectrumscale at kiranghag.com* > > wrote: > Hi Folks > > I am trying to compile IOR on a GPFS filesystem and running into following > errors. > > Github forum says that "The configure script does not add -lgpfs to the > CFLAGS when it detects GPFS support." > > Any help on how to get around this? > > mpicc -DHAVE_CONFIG_H -I. -g -O2 -MT aiori-MPIIO.o -MD -MP -MF > .deps/aiori-MPIIO.Tpo -c -o aiori-MPIIO.o aiori-MPIIO.c > aiori-MPIIO.c: In function ?MPIIO_Xfer?: > aiori-MPIIO.c:236:24: warning: assignment from incompatible pointer type > [enabled by default] > Access = MPI_File_write; > ^ > aiori-MPIIO.c:237:27: warning: assignment from incompatible pointer type > [enabled by default] > Access_at = MPI_File_write_at; > ^ > aiori-MPIIO.c:238:28: warning: assignment from incompatible pointer type > [enabled by default] > Access_all = MPI_File_write_all; > ^ > aiori-MPIIO.c:239:31: warning: assignment from incompatible pointer type > [enabled by default] > Access_at_all = MPI_File_write_at_all; > ^ > mv -f .deps/aiori-MPIIO.Tpo .deps/aiori-MPIIO.Po > mpicc -g -O2 -o ior ior.o utilities.o parse_options.o aiori-POSIX.o > aiori-MPIIO.o -lm > aiori-POSIX.o: In function `gpfs_free_all_locks': > /gpfs/Aramco_POC/ior-master/src/aiori-POSIX.c:118: undefined reference to > `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_start': > aiori-POSIX.c:(.text+0x91f): undefined reference to `gpfs_fcntl' > aiori-POSIX.o: In function `gpfs_access_end': > aiori-POSIX.c:(.text+0xa04): undefined reference to `gpfs_fcntl' > collect2: error: ld returned 1 exit status > make[2]: *** [ior] Error 1 > make[2]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make[1]: *** [all] Error 2 > make[1]: Leaving directory `/gpfs/Aramco_POC/ior-master/src' > make: *** [all-recursive] Error 1 > > Kiran > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alexander.Saupp at de.ibm.com Wed Oct 17 12:44:41 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Wed, 17 Oct 2018 13:44:41 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Message-ID: Dear Mailing List readers, I've come to a preliminary conclusion that explains the behavior in an appropriate manner, so I'm trying to summarize my current thinking with this audience. Problem statement: Big performance derivation between native GPFS (fast) and loopback NFS mount on the same node (way slower) for single client, single thread, small files workload. 
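For reference, a minimal single-thread reproducer of this kind of workload could look like the sketch below (the paths, file count and 4 KiB file size are illustrative assumptions, not the original test). Run it once against a native GPFS directory and once against the loopback NFS mount of the same directory and compare the rates; the pattern - write, plain close(), no fsync() - mirrors what tar does when extracting small files.

#!/bin/bash
# Illustrative small-file writer: one thread, N small files, close() only.
# TARGET and COUNT are made-up defaults - point TARGET at a GPFS directory
# first, then at the NFS mount of the same directory, and compare the rates.
TARGET=${1:-/gpfs/fs1/smallfile-test}
COUNT=${2:-2000}
mkdir -p "$TARGET"
start=$(date +%s.%N)
for i in $(seq 1 "$COUNT"); do
  head -c 4096 /dev/zero > "$TARGET/file.$i"   # ~4 KiB per file, no fsync
done
end=$(date +%s.%N)
echo "wrote $COUNT files in $(echo "$end - $start" | bc) seconds"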
Current explanation: tar seems to use close() on files, not fclose(). That is an application choice and common behavior. The ideas is to allow OS write caching to speed up process run time. When running locally on ext3 / xfs / GPFS / .. that allows async destaging of data down to disk, somewhat compromising data for better performance. As we're talking about write caching on the same node that the application runs on - a crash is missfortune but in the same failure domain. E.g. if you run a compile job that includes extraction of a tar and the node crashes you'll have to restart the entire job, anyhow. The NFSv2 spec defined that NFS io's are to be 'sync', probably because the compile job on the nfs client would survive if the NFS Server crashes, so the failure domain would be different NFSv3 in rfc1813 below acknowledged the performance impact and introduced the 'async' flag for NFS, which would handle IO's similar to local IOs, allowing to destage in the background. Keep in mind - applications, independent if running locally or via NFS can always decided to use the fclose() option, which will ensure that data is destaged to persistent storage right away. But its an applications choice if that's really mandatory or whether performance has higher priority. The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down to disk - very filesystem independent. -> single client, single thread, small files workload on GPFS can be destaged async, allowing to hide latency and parallelizing disk IOs. -> NFS client IO's are sync, so the second IO can only be started after the first one hit non volatile memory -> much higher latency The Spectrum Scale NFS implementation (based on ganesha) does not support the async mount option, which is a bit of a pitty. There might also be implementation differences compared to kernel-nfs, I did not investigate into that direction. However, the principles of the difference are explained for my by the above behavior. One workaround that I saw working well for multiple customers was to replace the NFS client by a Spectrum Scale nsd client. That has two advantages, but is certainly not suitable in all cases: - Improved speed by efficent NSD protocol and NSD client side write caching - Write Caching in the same failure domain as the application (on NSD client) which seems to be more reasonable compared to NFS Server side write caching. References: NFS sync vs async https://tools.ietf.org/html/rfc1813 The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way. sync() vs fsync() https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm - An application program makes an fsync() call for a specified file. This causes all of the pages that contain modified data for that file to be written to disk. The writing is complete when the fsync() call returns to the program. - An application program makes a sync() call. This causes all of the file pages in memory that contain modified data to be scheduled for writing to disk. The writing is not necessarily complete when the sync() call returns to the program. - A user can enter the sync command, which in turn issues a sync() call. 
Again, some of the writes may not be complete when the user is prompted for input (or the next command in a shell script is processed). close() vs fclose() A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.) Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19995626.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From janfrode at tanso.net Wed Oct 17 13:24:01 2018 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 17 Oct 2018 08:24:01 -0400 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Do you know if the slow throughput is caused by the network/nfs-protocol layer, or does it help to use faster storage (ssd)? If on storage, have you considered if HAWC can help? I?m thinking about adding an SSD pool as a first tier to hold the active dataset for a similar setup, but that?s mainly to solve the small file read workload (i.e. random I/O ). -jf ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < Alexander.Saupp at de.ibm.com>: > Dear Mailing List readers, > > I've come to a preliminary conclusion that explains the behavior in an > appropriate manner, so I'm trying to summarize my current thinking with > this audience. > > *Problem statement: * > > Big performance derivation between native GPFS (fast) and loopback NFS > mount on the same node (way slower) for single client, single thread, small > files workload. > > > > *Current explanation:* > > tar seems to use close() on files, not fclose(). That is an > application choice and common behavior. The ideas is to allow OS write > caching to speed up process run time. > > When running locally on ext3 / xfs / GPFS / .. that allows async > destaging of data down to disk, somewhat compromising data for better > performance. > As we're talking about write caching on the same node that the > application runs on - a crash is missfortune but in the same failure domain. > E.g. if you run a compile job that includes extraction of a tar and > the node crashes you'll have to restart the entire job, anyhow. 
> > The NFSv2 spec defined that NFS io's are to be 'sync', probably > because the compile job on the nfs client would survive if the NFS Server > crashes, so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and > introduced the 'async' flag for NFS, which would handle IO's similar to > local IOs, allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS > can always decided to use the fclose() option, which will ensure that data > is destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache > down to disk - very filesystem independent. > > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > > The Spectrum Scale NFS implementation (based on ganesha) does not > support the async mount option, which is a bit of a pitty. There might also > be implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write > caching > - Write Caching in the same failure domain as the application (on > NSD client) which seems to be more reasonable compared to NFS Server side > write caching. > > > *References:* > > NFS sync vs async > https://tools.ietf.org/html/rfc1813 > *The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support so > that the NFS server can do unsafe writes.* > Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > *sync() vs fsync()* > > https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted for > input (or the next command in a shell script is processed). > > > *close() vs fclose()* > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). 
(It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > *Alexander Saupp* > > IBM Systems, Storage Platform, EMEA Storage Competence Center > ------------------------------ > Phone: +49 7034-643-1512 IBM Deutschland GmbH > Mobile: +49-172 7251072 Am Weiher 24 > Email: alexander.saupp at de.ibm.com 65451 Kelsterbach > Germany > ------------------------------ > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 19995626.gif Type: image/gif Size: 1851 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: From olaf.weiser at de.ibm.com Wed Oct 17 14:15:12 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Wed, 17 Oct 2018 15:15:12 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Oct 17 14:26:52 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 17 Oct 2018 16:26:52 +0300 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Just to clarify ( from man exports): " async This option allows the NFS server to violate the NFS protocol and reply to requests before any changes made by that request have been committed to stable storage (e.g. disc drive). Using this option usually improves performance, but at the cost that an unclean server restart (i.e. a crash) can cause data to be lost or corrupted." With the Ganesha implementation in Spectrum Scale, it was decided not to allow this violation - so this async export options wasn't exposed. I believe that for those customers that agree to take the risk, using async mount option ( from the client) will achieve similar behavior. Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Olaf Weiser" To: gpfsug main discussion list Date: 17/10/2018 16:16 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Jallo Jan, you can expect to get slightly improved numbers from the lower response times of the HAWC ... but the loss of performance comes from the fact, that GPFS or (async kNFS) writes with multiple parallel threads - in opposite to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. you'll never outperform e.g. 
128 (maybe slower), but, parallel threads (running write-behind) <---> with one single but fast threads, .... so as Alex suggest.. if possible.. take gpfs client of kNFS for those types of workloads.. From: Jan-Frode Myklebust To: gpfsug main discussion list Date: 10/17/2018 02:24 PM Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org Do you know if the slow throughput is caused by the network/nfs-protocol layer, or does it help to use faster storage (ssd)? If on storage, have you considered if HAWC can help? I?m thinking about adding an SSD pool as a first tier to hold the active dataset for a similar setup, but that?s mainly to solve the small file read workload (i.e. random I/O ). -jf ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < Alexander.Saupp at de.ibm.com>: Dear Mailing List readers, I've come to a preliminary conclusion that explains the behavior in an appropriate manner, so I'm trying to summarize my current thinking with this audience. Problem statement: Big performance derivation between native GPFS (fast) and loopback NFS mount on the same node (way slower) for single client, single thread, small files workload. Current explanation: tar seems to use close() on files, not fclose(). That is an application choice and common behavior. The ideas is to allow OS write caching to speed up process run time. When running locally on ext3 / xfs / GPFS / .. that allows async destaging of data down to disk, somewhat compromising data for better performance. As we're talking about write caching on the same node that the application runs on - a crash is missfortune but in the same failure domain. E.g. if you run a compile job that includes extraction of a tar and the node crashes you'll have to restart the entire job, anyhow. The NFSv2 spec defined that NFS io's are to be 'sync', probably because the compile job on the nfs client would survive if the NFS Server crashes, so the failure domain would be different NFSv3 in rfc1813 below acknowledged the performance impact and introduced the 'async' flag for NFS, which would handle IO's similar to local IOs, allowing to destage in the background. Keep in mind - applications, independent if running locally or via NFS can always decided to use the fclose() option, which will ensure that data is destaged to persistent storage right away. But its an applications choice if that's really mandatory or whether performance has higher priority. The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down to disk - very filesystem independent. -> single client, single thread, small files workload on GPFS can be destaged async, allowing to hide latency and parallelizing disk IOs. -> NFS client IO's are sync, so the second IO can only be started after the first one hit non volatile memory -> much higher latency The Spectrum Scale NFS implementation (based on ganesha) does not support the async mount option, which is a bit of a pitty. There might also be implementation differences compared to kernel-nfs, I did not investigate into that direction. However, the principles of the difference are explained for my by the above behavior. One workaround that I saw working well for multiple customers was to replace the NFS client by a Spectrum Scale nsd client. 
That has two advantages, but is certainly not suitable in all cases: - Improved speed by efficent NSD protocol and NSD client side write caching - Write Caching in the same failure domain as the application (on NSD client) which seems to be more reasonable compared to NFS Server side write caching. References: NFS sync vs async https://tools.ietf.org/html/rfc1813 The write throughput bottleneck caused by the synchronous definition of write in the NFS version 2 protocol has been addressed by adding support so that the NFS server can do unsafe writes. Unsafe writes are writes which have not been committed to stable storage before the operation returns. This specification defines a method for committing these unsafe writes to stable storage in a reliable way. sync() vs fsync() https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm - An application program makes an fsync() call for a specified file. This causes all of the pages that contain modified data for that file to be written to disk. The writing is complete when the fsync() call returns to the program. - An application program makes a sync() call. This causes all of the file pages in memory that contain modified data to be scheduled for writing to disk. The writing is not necessarily complete when the sync() call returns to the program. - A user can enter the sync command, which in turn issues a sync() call. Again, some of the writes may not be complete when the user is prompted for input (or the next command in a shell script is processed). close() vs fclose() A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.) Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] [attachment "19995626.gif" deleted by Olaf Weiser/Germany/IBM] [attachment "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From MKEIGO at jp.ibm.com Wed Oct 17 14:34:55 2018 From: MKEIGO at jp.ibm.com (Keigo Matsubara) Date: Wed, 17 Oct 2018 22:34:55 +0900 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 -------------- next part -------------- An HTML attachment was scrubbed... URL: From janfrode at tanso.net Wed Oct 17 14:35:22 2018 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 17 Oct 2018 09:35:22 -0400 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: My thinking was mainly that single threaded 200 files/second == 5 ms/file. Where do these 5 ms go? Is it NFS protocol overhead, or is it waiting for I/O so that it can be fixed with a lower latency storage backend? -jf On Wed, Oct 17, 2018 at 9:15 AM Olaf Weiser wrote: > Jallo Jan, > you can expect to get slightly improved numbers from the lower response > times of the HAWC ... but the loss of performance comes from the fact, that > GPFS or (async kNFS) writes with multiple parallel threads - in opposite > to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. > > you'll never outperform e.g. 128 (maybe slower), but, parallel threads > (running write-behind) <---> with one single but fast threads, .... > > so as Alex suggest.. if possible.. take gpfs client of kNFS for those > types of workloads.. > > > > > > > > > > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 10/17/2018 02:24 PM > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > Do you know if the slow throughput is caused by the network/nfs-protocol > layer, or does it help to use faster storage (ssd)? If on storage, have you > considered if HAWC can help? > > I?m thinking about adding an SSD pool as a first tier to hold the active > dataset for a similar setup, but that?s mainly to solve the small file read > workload (i.e. random I/O ). > > > -jf > ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < > *Alexander.Saupp at de.ibm.com* >: > Dear Mailing List readers, > > I've come to a preliminary conclusion that explains the behavior in an > appropriate manner, so I'm trying to summarize my current thinking with > this audience. > > *Problem statement: * > Big performance derivation between native GPFS (fast) and loopback NFS > mount on the same node (way slower) for single client, single thread, small > files workload. > > > *Current explanation:* > tar seems to use close() on files, not fclose(). 
That is an application > choice and common behavior. The ideas is to allow OS write caching to speed > up process run time. > > When running locally on ext3 / xfs / GPFS / .. that allows async destaging > of data down to disk, somewhat compromising data for better performance. > As we're talking about write caching on the same node that the application > runs on - a crash is missfortune but in the same failure domain. > E.g. if you run a compile job that includes extraction of a tar and the > node crashes you'll have to restart the entire job, anyhow. > > The NFSv2 spec defined that NFS io's are to be 'sync', probably because > the compile job on the nfs client would survive if the NFS Server crashes, > so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and introduced > the 'async' flag for NFS, which would handle IO's similar to local IOs, > allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS can > always decided to use the fclose() option, which will ensure that data is > destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down > to disk - very filesystem independent. > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > The Spectrum Scale NFS implementation (based on ganesha) does not support > the async mount option, which is a bit of a pitty. There might also be > implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write caching > - Write Caching in the same failure domain as the application (on NSD > client) which seems to be more reasonable compared to NFS Server side write > caching. > > *References:* > > NFS sync vs async > *https://tools.ietf.org/html/rfc1813* > > *The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support so > that the NFS server can do unsafe writes.* > Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > *sync() vs fsync()* > > *https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm* > > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. 
The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted for > input (or the next command in a shell script is processed). > > > *close() vs fclose()* > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). (It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > *Alexander Saupp* > > IBM Systems, Storage Platform, EMEA Storage Competence Center > ------------------------------ > Phone: +49 7034-643-1512 IBM Deutschland GmbH > Mobile: +49-172 7251072 Am Weiher 24 > Email: *alexander.saupp at de.ibm.com* 65451 > Kelsterbach > Germany > ------------------------------ > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at *spectrumscale.org* > *http://gpfsug.org/mailman/listinfo/gpfsug-discuss* > *[attachment > "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] [attachment > "19995626.gif" deleted by Olaf Weiser/Germany/IBM] [attachment > "ecblank.gif" deleted by Olaf Weiser/Germany/IBM] * > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From TOMP at il.ibm.com Wed Oct 17 14:41:03 2018 From: TOMP at il.ibm.com (Tomer Perry) Date: Wed, 17 Oct 2018 16:41:03 +0300 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Hi, Without going into to much details, AFAIR, Ontap integrate NVRAM into the NFS write cache ( as it was developed as a NAS product). Ontap is using the STABLE bit which kind of tell the client "hey, I have no write cache at all, everything is written to stable storage - thus, don't bother with commits ( sync) commands - they are meaningless". Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Keigo Matsubara" To: gpfsug main discussion list Date: 17/10/2018 16:35 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. 
a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From stijn.deweirdt at ugent.be Wed Oct 17 14:42:02 2018 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Wed, 17 Oct 2018 15:42:02 +0200 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <5508e483-25ef-d318-0c68-4009cb5871cc@ugent.be> hi all, has anyone tried to use tools like eatmydata that allow the user to "ignore" the syncs (there's another tool that has less explicit name if it would make you feel better ;). stijn On 10/17/2018 03:26 PM, Tomer Perry wrote: > Just to clarify ( from man exports): > " async This option allows the NFS server to violate the NFS protocol > and reply to requests before any changes made by that request have been > committed to stable storage (e.g. > disc drive). > > Using this option usually improves performance, but at the > cost that an unclean server restart (i.e. a crash) can cause data to be > lost or corrupted." > > With the Ganesha implementation in Spectrum Scale, it was decided not to > allow this violation - so this async export options wasn't exposed. > I believe that for those customers that agree to take the risk, using > async mount option ( from the client) will achieve similar behavior. > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Olaf Weiser" > To: gpfsug main discussion list > Date: 17/10/2018 16:16 > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Jallo Jan, > you can expect to get slightly improved numbers from the lower response > times of the HAWC ... but the loss of performance comes from the fact, > that > GPFS or (async kNFS) writes with multiple parallel threads - in opposite > to e.g. tar via GaneshaNFS comes with single threads fsync on each file.. > > > you'll never outperform e.g. 128 (maybe slower), but, parallel threads > (running write-behind) <---> with one single but fast threads, .... > > so as Alex suggest.. if possible.. take gpfs client of kNFS for those > types of workloads.. > > > > > > > > > > > From: Jan-Frode Myklebust > To: gpfsug main discussion list > Date: 10/17/2018 02:24 PM > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Do you know if the slow throughput is caused by the network/nfs-protocol > layer, or does it help to use faster storage (ssd)? If on storage, have > you considered if HAWC can help? 
> > I?m thinking about adding an SSD pool as a first tier to hold the active > dataset for a similar setup, but that?s mainly to solve the small file > read workload (i.e. random I/O ). > > > -jf > ons. 17. okt. 2018 kl. 07:47 skrev Alexander Saupp < > Alexander.Saupp at de.ibm.com>: > Dear Mailing List readers, > > I've come to a preliminary conclusion that explains the behavior in an > appropriate manner, so I'm trying to summarize my current thinking with > this audience. > > Problem statement: > Big performance derivation between native GPFS (fast) and loopback NFS > mount on the same node (way slower) for single client, single thread, > small files workload. > > > Current explanation: > tar seems to use close() on files, not fclose(). That is an application > choice and common behavior. The ideas is to allow OS write caching to > speed up process run time. > > When running locally on ext3 / xfs / GPFS / .. that allows async destaging > of data down to disk, somewhat compromising data for better performance. > As we're talking about write caching on the same node that the application > runs on - a crash is missfortune but in the same failure domain. > E.g. if you run a compile job that includes extraction of a tar and the > node crashes you'll have to restart the entire job, anyhow. > > The NFSv2 spec defined that NFS io's are to be 'sync', probably because > the compile job on the nfs client would survive if the NFS Server crashes, > so the failure domain would be different > > NFSv3 in rfc1813 below acknowledged the performance impact and introduced > the 'async' flag for NFS, which would handle IO's similar to local IOs, > allowing to destage in the background. > > Keep in mind - applications, independent if running locally or via NFS can > always decided to use the fclose() option, which will ensure that data is > destaged to persistent storage right away. > But its an applications choice if that's really mandatory or whether > performance has higher priority. > > The linux 'sync' (man sync) tool allows to sync 'dirty' memory cache down > to disk - very filesystem independent. > > -> single client, single thread, small files workload on GPFS can be > destaged async, allowing to hide latency and parallelizing disk IOs. > -> NFS client IO's are sync, so the second IO can only be started after > the first one hit non volatile memory -> much higher latency > > > The Spectrum Scale NFS implementation (based on ganesha) does not support > the async mount option, which is a bit of a pitty. There might also be > implementation differences compared to kernel-nfs, I did not investigate > into that direction. > > However, the principles of the difference are explained for my by the > above behavior. > > One workaround that I saw working well for multiple customers was to > replace the NFS client by a Spectrum Scale nsd client. > That has two advantages, but is certainly not suitable in all cases: > - Improved speed by efficent NSD protocol and NSD client side write > caching > - Write Caching in the same failure domain as the application (on NSD > client) which seems to be more reasonable compared to NFS Server side > write caching. > > References: > > NFS sync vs async > https://tools.ietf.org/html/rfc1813 > The write throughput bottleneck caused by the synchronous definition of > write in the NFS version 2 protocol has been addressed by adding support > so that the NFS server can do unsafe writes. 
> Unsafe writes are writes which have not been committed to stable storage > before the operation returns. This specification defines a method for > committing these unsafe writes to stable storage in a reliable way. > > > sync() vs fsync() > https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.performance/using_sync_fsync_calls.htm > > - An application program makes an fsync() call for a specified file. This > causes all of the pages that contain modified data for that file to be > written to disk. The writing is complete when the fsync() call returns to > the program. > > - An application program makes a sync() call. This causes all of the file > pages in memory that contain modified data to be scheduled for writing to > disk. The writing is not necessarily complete when the sync() call returns > to the program. > > - A user can enter the sync command, which in turn issues a sync() call. > Again, some of the writes may not be complete when the user is prompted > for input (or the next command in a shell script is processed). > > > close() vs fclose() > A successful close does not guarantee that the data has been successfully > saved to disk, as the kernel defers writes. It is not common for a file > system to flush the buffers when the stream is closed. If you need to be > sure that the data is > physically stored use fsync(2). (It will depend on the disk hardware at > this point.) > > > Mit freundlichen Gr??en / Kind regards > > Alexander Saupp > > IBM Systems, Storage Platform, EMEA Storage Competence Center > > > Phone: > +49 7034-643-1512 > IBM Deutschland GmbH > > Mobile: > +49-172 7251072 > Am Weiher 24 > Email: > alexander.saupp at de.ibm.com > 65451 Kelsterbach > > > Germany > > IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter > Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan > Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt > Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, > HRB 14562 / WEEE-Reg.-Nr. DE 99369940 > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss[attachment "ecblank.gif" > deleted by Olaf Weiser/Germany/IBM] [attachment "19995626.gif" deleted by > Olaf Weiser/Germany/IBM] [attachment "ecblank.gif" deleted by Olaf > Weiser/Germany/IBM] _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > > > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > From janfrode at tanso.net Wed Oct 17 14:50:38 2018 From: janfrode at tanso.net (Jan-Frode Myklebust) Date: Wed, 17 Oct 2018 09:50:38 -0400 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: Also beware there are 2 different linux NFS "async" settings. A client side setting (mount -o async), which still cases sync on file close() -- and a server (knfs) side setting (/etc/exports) that violates NFS protocol and returns requests before data has hit stable storage. 
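To make the two knobs concrete, they look roughly like this (hostnames and paths are invented for illustration; the export example applies to the kernel nfsd only, since CES/Ganesha does not read /etc/exports and does not expose an async export option):

# 1) Client-side async (mount option, the Linux client default): the client
#    may cache writes, but an application close() or fsync() still forces
#    the data to stable storage on the server before it returns.
mount -t nfs -o vers=3,async nfsserver:/gpfs/fs1 /mnt/fs1

# 2) Server-side async (kernel nfsd export option in /etc/exports): the
#    server acknowledges WRITE/COMMIT before the data reaches stable
#    storage - faster, but it violates the NFS protocol and risks data
#    loss if the server crashes.
echo '/gpfs/fs1 *(rw,async)' >> /etc/exports
exportfs -ra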
-jf On Wed, Oct 17, 2018 at 9:41 AM Tomer Perry wrote: > Hi, > > Without going into to much details, AFAIR, Ontap integrate NVRAM into the > NFS write cache ( as it was developed as a NAS product). > Ontap is using the STABLE bit which kind of tell the client "hey, I have > no write cache at all, everything is written to stable storage - thus, > don't bother with commits ( sync) commands - they are meaningless". > > > Regards, > > Tomer Perry > Scalable I/O Development (Spectrum Scale) > email: tomp at il.ibm.com > 1 Azrieli Center, Tel Aviv 67021, Israel > Global Tel: +1 720 3422758 > Israel Tel: +972 3 9188625 > Mobile: +972 52 2554625 > > > > > From: "Keigo Matsubara" > To: gpfsug main discussion list > Date: 17/10/2018 16:35 > Subject: Re: [gpfsug-discuss] Preliminary conclusion: single > client, single thread, small files - native Scale vs NFS > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > I also wonder how many products actually exploit NFS async mode to improve > I/O performance by sacrificing the file system consistency risk: > > gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > > Using this option usually improves performance, but at > > the cost that an unclean server restart (i.e. a crash) can cause > > data to be lost or corrupted." > > For instance, NetApp, at the very least FAS 3220 running Data OnTap > 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to > sync mode. > Promoting means even if NFS client requests async mount mode, the NFS > server ignores and allows only sync mount mode. > > Best Regards, > --- > Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan > TEL: +81-50-3150-0595, T/L: 6205-0595 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oehmes at gmail.com Wed Oct 17 17:22:05 2018 From: oehmes at gmail.com (Sven Oehme) Date: Wed, 17 Oct 2018 09:22:05 -0700 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: Message-ID: <7E9A54A4-304E-42F7-BF4B-06EBC57503FE@gmail.com> while most said here is correct, it can?t explain the performance of 200 files /sec and I couldn?t resist jumping in here :-D lets assume for a second each operation is synchronous and its done by just 1 thread. 200 files / sec means 5 ms on average per file write. Lets be generous and say the network layer is 100 usec per roud-trip network hop (including code processing on protocol node or client) and for visualization lets assume the setup looks like this : ESS Node ---ethernet--- Protocol Node ?ethernet--- client Node . lets say the ESS write cache can absorb small io at a fixed cost of 300 usec if the heads are ethernet connected and not using IB (then it would be more in the 250 usec range). That?s 300 +100(net1) +100(net2) usec or 500 usec in total. So you are a factor 10 off from your number. 
So lets just assume a create + write is more than just 1 roundtrip worth or synchronization, lets say it needs to do 2 full roundtrips synchronously one for the create and one for the stable write that?s 1 ms, still 5x off of your 5 ms. So either there is a bug in the NFS Server, the NFS client or the storage is not behaving properly. To verify this, the best would be to run the following test : Create a file on the ESS node itself in the shared filesystem like : /usr/lpp/mmfs/samples/perf/gpfsperf create seq -nongpfs -r 4k -n 1m -th 1 -dio /sharedfs/test Now run the following command on one of the ESS nodes, then the protocol node and last the nfs client : /usr/lpp/mmfs/samples/perf/gpfsperf write seq -nongpfs -r 4k -n 1m -th 1 -dio /sharedfs/test This will create 256 stable 4k write i/os to the storage system, I picked the number just to get a statistical relevant number of i/os you can change 1m to 2m or 4m, just don?t make it too high or you might get variations due to de-staging or other side effects happening on the storage system, which you don?t care at this point you want to see the round trip time on each layer. The gpfsperf command will spit out a line like : Data rate was XYZ Kbytes/sec, Op Rate was XYZ Ops/sec, Avg Latency was 0.266 milliseconds, thread utilization 1.000, bytesTransferred 1048576 The only number here that matters is the average latency number , write it down. What I would expect to get back is something like : On ESS Node ? 300 usec average i/o On PN ? 400 usec average i/o On Client ? 500 usec average i/o If you get anything higher than the numbers above something fundamental is bad (in fact on fast system you may see from client no more than 200-300 usec response time) and it will be in the layer in between or below of where you test. If all the numbers are somewhere in line with my numbers above, it clearly points to a problem in NFS itself and the way it communicates with GPFS. Marc, myself and others have debugged numerous issues in this space in the past last one was fixed beginning of this year and ended up in some Scale 5.0.1.X release. To debug this is very hard and most of the time only possible with GPFS source code access which I no longer have. You would start with something like strace -Ttt -f -o tar-debug.out tar -xvf ?..? and check what exact system calls are made to nfs client and how long each takes. You would then run a similar strace on the NFS server to see how many individual system calls will be made to GPFS and how long each takes. This will allow you to narrow down where the issue really is. But I suggest to start with the simpler test above as this might already point to a much simpler problem. Btw. I will be also be speaking at the UG Meeting at SC18 in Dallas, in case somebody wants to catch up ? Sven From: on behalf of Jan-Frode Myklebust Reply-To: gpfsug main discussion list Date: Wednesday, October 17, 2018 at 6:50 AM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Also beware there are 2 different linux NFS "async" settings. A client side setting (mount -o async), which still cases sync on file close() -- and a server (knfs) side setting (/etc/exports) that violates NFS protocol and returns requests before data has hit stable storage. -jf On Wed, Oct 17, 2018 at 9:41 AM Tomer Perry wrote: Hi, Without going into to much details, AFAIR, Ontap integrate NVRAM into the NFS write cache ( as it was developed as a NAS product). 
Ontap is using the STABLE bit which kind of tell the client "hey, I have no write cache at all, everything is written to stable storage - thus, don't bother with commits ( sync) commands - they are meaningless". Regards, Tomer Perry Scalable I/O Development (Spectrum Scale) email: tomp at il.ibm.com 1 Azrieli Center, Tel Aviv 67021, Israel Global Tel: +1 720 3422758 Israel Tel: +972 3 9188625 Mobile: +972 52 2554625 From: "Keigo Matsubara" To: gpfsug main discussion list Date: 17/10/2018 16:35 Subject: Re: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS Sent by: gpfsug-discuss-bounces at spectrumscale.org I also wonder how many products actually exploit NFS async mode to improve I/O performance by sacrificing the file system consistency risk: gpfsug-discuss-bounces at spectrumscale.org wrote on 2018/10/17 22:26:52: > Using this option usually improves performance, but at > the cost that an unclean server restart (i.e. a crash) can cause > data to be lost or corrupted." For instance, NetApp, at the very least FAS 3220 running Data OnTap 8.1.2p4 7-mode which I tested with, would forcibly *promote* async mode to sync mode. Promoting means even if NFS client requests async mount mode, the NFS server ignores and allows only sync mount mode. Best Regards, --- Keigo Matsubara, Storage Solutions Client Technical Specialist, IBM Japan TEL: +81-50-3150-0595, T/L: 6205-0595 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 17 22:02:30 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 17 Oct 2018 21:02:30 +0000 Subject: [gpfsug-discuss] Job vacancy @Birmingham Message-ID: We're looking for someone to join our systems team here at University of Birmingham. In case you didn't realise, we're pretty reliant on Spectrum Scale to deliver our storage systems. https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 Such a snappy URL :-) Feel free to email me *OFFLIST* if you have informal enquiries! Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From olaf.weiser at de.ibm.com Thu Oct 18 10:14:51 2018 From: olaf.weiser at de.ibm.com (Olaf Weiser) Date: Thu, 18 Oct 2018 11:14:51 +0200 Subject: [gpfsug-discuss] Job vacancy @Birmingham In-Reply-To: References: Message-ID: An HTML attachment was scrubbed... 
URL: From nathan.harper at cfms.org.uk Thu Oct 18 10:23:44 2018 From: nathan.harper at cfms.org.uk (Nathan Harper) Date: Thu, 18 Oct 2018 10:23:44 +0100 Subject: [gpfsug-discuss] Job vacancy @Birmingham In-Reply-To: References: Message-ID: Olaf - we don't need any reminders of Bre.. this morning On Thu, 18 Oct 2018 at 10:15, Olaf Weiser wrote: > Hi Simon .. > well - I would love to .. .but .. ;-) hey - what do you think, how long a > citizen from the EU can live (and work) in UK ;-) > don't take me too serious... see you soon, consider you invited for a > coffee for my rude comment .. ;-) > olaf > > > > > From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" < > gpfsug-discuss at spectrumscale.org> > Date: 10/17/2018 11:02 PM > Subject: [gpfsug-discuss] Job vacancy @Birmingham > Sent by: gpfsug-discuss-bounces at spectrumscale.org > ------------------------------ > > > > We're looking for someone to join our systems team here at University of > Birmingham. In case you didn't realise, we're pretty reliant on Spectrum > Scale to deliver our storage systems. > > > https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 > *https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117* > > > Such a snappy URL :-) > > Feel free to email me *OFFLIST* if you have informal enquiries! > > Simon > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Nathan Harper* // IT Systems Lead *e: *nathan.harper at cfms.org.uk *t*: 0117 906 1104 *m*: 0787 551 0891 *w: *www.cfms.org.uk CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1 4QP -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.childs at qmul.ac.uk Thu Oct 18 16:32:43 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 Oct 2018 15:32:43 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. 
(New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London From alex at calicolabs.com Thu Oct 18 17:12:42 2018 From: alex at calicolabs.com (Alex Chekholko) Date: Thu, 18 Oct 2018 09:12:42 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: References: Message-ID: The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: > We've just added 9 raid volumes to our main storage, (5 Raid6 arrays > for data and 4 Raid1 arrays for metadata) > > We are now attempting to rebalance and our data around all the volumes. > > We started with the meta-data doing a "mmrestripe -r" as we'd changed > the failure groups to on our meta-data disks and wanted to ensure we > had all our metadata on known good ssd. No issues, here we could take > snapshots and I even tested it. (New SSD on new failure group and move > all old SSD to the same failure group) > > We're now doing a "mmrestripe -b" to rebalance the data accross all 21 > Volumes however when we attempt to take a snapshot, as we do every > night at 11pm it fails with > > sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test > Flushing dirty data for snapshot :test... > Quiescing all file system operations. > Unable to quiesce all nodes; some processes are busy or holding > required resources. > mmcrsnapshot: Command failed. Examine previous error messages to > determine cause. > > Are you meant to be able to take snapshots while re-striping or not? > > I know a rebalance of the data is probably unnecessary, but we'd like > to get the best possible speed out of the system, and we also kind of > like balance. > > Thanks > > > -- > Peter Childs > ITS Research Storage > Queen Mary, University of London > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From Kevin.Buterbaugh at Vanderbilt.Edu  Thu Oct 18 17:13:52 2018
From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L)
Date: Thu, 18 Oct 2018 16:13:52 +0000
Subject: [gpfsug-discuss] Job vacancy @Birmingham
In-Reply-To: 
References: 
Message-ID: <4B78CFBB-6B35-4914-A42D-5A66117DD588@vanderbilt.edu>
Hi Nathan, Well, while I?m truly sorry for what you?re going thru, at least a majority of the voters in the UK did vote for it. Keep in mind that things could be worse. Some of us do happen to live in a country where a far worse thing has happened despite the fact that the majority of the voters were _against_ it?. ;-) Kevin ? Kevin Buterbaugh - Senior System Administrator Vanderbilt University - Advanced Computing Center for Research and Education Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633 On Oct 18, 2018, at 4:23 AM, Nathan Harper > wrote: Olaf - we don't need any reminders of Bre.. this morning On Thu, 18 Oct 2018 at 10:15, Olaf Weiser > wrote: Hi Simon .. well - I would love to .. .but .. ;-) hey - what do you think, how long a citizen from the EU can live (and work) in UK ;-) don't take me too serious... see you soon, consider you invited for a coffee for my rude comment ..
;-) olaf From: Simon Thompson > To: "gpfsug-discuss at spectrumscale.org" > Date: 10/17/2018 11:02 PM Subject: [gpfsug-discuss] Job vacancy @Birmingham Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ We're looking for someone to join our systems team here at University of Birmingham. In case you didn't realise, we're pretty reliant on Spectrum Scale to deliver our storage systems. https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117https://atsv7.wcn.co.uk/search_engine/jobs.cgi?amNvZGU9MTc2MzczOSZ2dF90ZW1wbGF0ZT03Njcmb3duZXI9NTAzMjUyMSZvd25lcnR5cGU9ZmFpciZicmFuZF9pZD0wJmxvY2F0aW9uX2NvZGU9MTU0NDUmb2NjX2NvZGU9Njg3NiZwb3N0aW5nX2NvZGU9MTE3&jcode=1763739&vt_template=767&owner=5032521&ownertype=fair&brand_id=0&location_code=15445&occ_code=6876&posting_code=117 Such a snappy URL :-) Feel free to email me *OFFLIST* if you have informal enquiries! Simon _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -- Nathan Harper // IT Systems Lead e: nathan.harper at cfms.org.uk t: 0117 906 1104 m: 0787 551 0891 w: www.cfms.org.uk CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR [http://cfms.org.uk/images/logo.png] CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1 4QP _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ca552bcbb43b34c316b2808d634db7033%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754514425052428&sdata=tErG6k2dNdqz%2Ffnc8eYtpyR%2Ba1Cb4AZ8n7WA%2Buv3oCw%3D&reserved=0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Thu Oct 18 17:48:54 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Thu, 18 Oct 2018 16:48:54 +0000 Subject: [gpfsug-discuss] Reminder: Please keep discussion focused on GPFS/Scale Message-ID: <2A1399B8-441D-48E3-AACC-0BD3B0780A60@nuance.com> A gentle reminder to not left the discussions drift off topic, thanks. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Oct 18 17:57:18 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 Oct 2018 16:57:18 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: Message-ID: And use QoS Less aggressive during peak, more on valleys. If your workload allows it. ? 
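For illustration, a rough sketch of what that QoS throttling can look like. The filesystem name, node list and IOPS values below are made-up placeholders, and the exact class-spec syntax is worth checking against the mmchqos / mmlsqos / mmrestripefs man pages:

    # cap maintenance traffic (restripe, rebalance, ...) during peak hours
    mmchqos gpfs0 --enable pool=*,maintenance=10000IOPS,other=unlimited
    # run the rebalance in the maintenance class, restricted to a few helper nodes
    mmrestripefs gpfs0 -b --qos maintenance -N nsd1,nsd2
    # see what the maintenance class is actually getting
    mmlsqos gpfs0 --seconds 60
    # relax the cap again overnight, e.g. from cron
    mmchqos gpfs0 --enable pool=*,maintenance=unlimited,other=unlimited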
SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 18 Oct 2018, at 19.13, Alex Chekholko wrote: > > The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. > > One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. > >> On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: >> We've just added 9 raid volumes to our main storage, (5 Raid6 arrays >> for data and 4 Raid1 arrays for metadata) >> >> We are now attempting to rebalance and our data around all the volumes. >> >> We started with the meta-data doing a "mmrestripe -r" as we'd changed >> the failure groups to on our meta-data disks and wanted to ensure we >> had all our metadata on known good ssd. No issues, here we could take >> snapshots and I even tested it. (New SSD on new failure group and move >> all old SSD to the same failure group) >> >> We're now doing a "mmrestripe -b" to rebalance the data accross all 21 >> Volumes however when we attempt to take a snapshot, as we do every >> night at 11pm it fails with >> >> sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test >> Flushing dirty data for snapshot :test... >> Quiescing all file system operations. >> Unable to quiesce all nodes; some processes are busy or holding >> required resources. >> mmcrsnapshot: Command failed. Examine previous error messages to >> determine cause. >> >> Are you meant to be able to take snapshots while re-striping or not? >> >> I know a rebalance of the data is probably unnecessary, but we'd like >> to get the best possible speed out of the system, and we also kind of >> like balance. >> >> Thanks >> >> >> -- >> Peter Childs >> ITS Research Storage >> Queen Mary, University of London >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From luis.bolinches at fi.ibm.com Thu Oct 18 17:57:18 2018 From: luis.bolinches at fi.ibm.com (Luis Bolinches) Date: Thu, 18 Oct 2018 16:57:18 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: Message-ID: And use QoS Less aggressive during peak, more on valleys. If your workload allows it. ? SENT FROM MOBILE DEVICE Yst?v?llisin terveisin / Kind regards / Saludos cordiales / Salutations Luis Bolinches Consultant IT Specialist Mobile Phone: +358503112585 https://www.youracclaim.com/user/luis-bolinches "If you always give you will always have" -- Anonymous > On 18 Oct 2018, at 19.13, Alex Chekholko wrote: > > The re-striping uses a lot of I/O, so if your goal is user-facing performance, the re-striping is definitely hurting in the short term and is of questionable value in the long term, depending on how much churn there is on your filesystem. 
> > One way to split the difference would be to run your 'mmrestripe -b' midnight to 6am for many days; so it does not conflict with your snapshot. Or whatever other time you have lower user load. > >> On Thu, Oct 18, 2018 at 8:32 AM Peter Childs wrote: >> We've just added 9 raid volumes to our main storage, (5 Raid6 arrays >> for data and 4 Raid1 arrays for metadata) >> >> We are now attempting to rebalance and our data around all the volumes. >> >> We started with the meta-data doing a "mmrestripe -r" as we'd changed >> the failure groups to on our meta-data disks and wanted to ensure we >> had all our metadata on known good ssd. No issues, here we could take >> snapshots and I even tested it. (New SSD on new failure group and move >> all old SSD to the same failure group) >> >> We're now doing a "mmrestripe -b" to rebalance the data accross all 21 >> Volumes however when we attempt to take a snapshot, as we do every >> night at 11pm it fails with >> >> sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test >> Flushing dirty data for snapshot :test... >> Quiescing all file system operations. >> Unable to quiesce all nodes; some processes are busy or holding >> required resources. >> mmcrsnapshot: Command failed. Examine previous error messages to >> determine cause. >> >> Are you meant to be able to take snapshots while re-striping or not? >> >> I know a rebalance of the data is probably unnecessary, but we'd like >> to get the best possible speed out of the system, and we also kind of >> like balance. >> >> Thanks >> >> >> -- >> Peter Childs >> ITS Research Storage >> Queen Mary, University of London >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss Ellei edell? ole toisin mainittu: / Unless stated otherwise above: Oy IBM Finland Ab PL 265, 00101 Helsinki, Finland Business ID, Y-tunnus: 0195876-3 Registered in Finland -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Thu Oct 18 18:19:21 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 18 Oct 2018 17:19:21 +0000 Subject: [gpfsug-discuss] Best way to migrate data Message-ID: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Hi, Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 From S.J.Thompson at bham.ac.uk Thu Oct 18 18:44:11 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Thu, 18 Oct 2018 17:44:11 +0000 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 In-Reply-To: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> References: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> Message-ID: Just following up this thread ... We use v4 ACLs, in part because we also export via SMB as well. 
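As a rough illustration of handling those NFSv4 ACLs directly on the Scale side (the filesystem name and path below are invented, and the flags are worth double-checking against the mmgetacl / mmputacl man pages):

    # check which ACL semantics the filesystem allows (posix, nfs4 or all)
    mmlsfs gpfs0 -k
    # dump an existing ACL in NFSv4 form, edit it, then push it back
    mmgetacl -k nfs4 -o /tmp/projX.acl /gpfs/gpfs0/projects/projX
    mmputacl -i /tmp/projX.acl /gpfs/gpfs0/projects/projX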
Note that we do also use the fileset option "chmodAndUpdateAcl" Simon ________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Fabrice.Cantos at niwa.co.nz [Fabrice.Cantos at niwa.co.nz] Sent: 10 October 2018 22:57 To: gpfsug-discuss at spectrumscale.org Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 I would be interested to know what you chose for your filesystems and user/project space directories: * Traditional Posix ACL * NFS V4 ACL What did motivate your choice? We are facing some issues to get the correct NFS ACL to keep correct attributes for new files created. Thanks Fabrice [cid:image4cef17.PNG at 18c66b76.4480e036] Fabrice Cantos HPC Systems Engineer Group Manager ? High Performance Computing T +64-4-386-0367 M +64-27-412-9693 National Institute of Water & Atmospheric Research Ltd (NIWA) 301 Evans Bay Parade, Greta Point, Wellington Connect with NIWA: niwa.co.nz Facebook Twitter LinkedIn Instagram To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image4cef17.PNG Type: image/png Size: 12288 bytes Desc: image4cef17.PNG URL: From frederik.ferner at diamond.ac.uk Thu Oct 18 18:54:32 2018 From: frederik.ferner at diamond.ac.uk (Frederik Ferner) Date: Thu, 18 Oct 2018 18:54:32 +0100 Subject: [gpfsug-discuss] Quick survey: ACL Posix vs NFS V4 In-Reply-To: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> References: <8120950808e344e280ae211ff22ba0bf@welwex02.niwa.local> Message-ID: <595d0584-df41-a731-ac08-6bba81dbdb31@diamond.ac.uk> On 10/10/18 22:57, Fabrice Cantos wrote: > I would be interested to know what you chose for your filesystems and > user/project space directories: > > * Traditional Posix ACL > * NFS V4 ACL We use traditional Posix ACLs almost exclusively. The main exception is some directories on Spectrum Scale where Windows machines with native Spectrum Scale support create files and directories. There our scripts set Posix ACLs which are respected on Windows but automatically converted to NFS V4 ACLs on new files and directories by the file system. > What did motivate your choice? Mainly that our use of ACLs goes back way longer than our use of GPFS/Spectrum Scale and we also have other file systems which do not support NFSv4 ACLs. Keeping knowledge and script on one set of ACLs fresh within the team is easier. Additional headache comes because as we all know Posix ACLs and NFS V4 ACLs don't translate exactly. > We are facing some issues to get the correct NFS ACL to keep correct > attributes for new files created. Is this using kernel NFSd or Ganesha (CES)? Frederik -- Frederik Ferner Senior Computer Systems Administrator (storage) phone: +44 1235 77 8624 Diamond Light Source Ltd. mob: +44 7917 08 5110 Duty Sys Admin can be reached on x8596 (Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.) -- This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. 
If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom From oehmes at gmail.com Thu Oct 18 19:09:56 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 11:09:56 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. 
turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu Oct 18 19:09:56 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 11:09:56 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping Message-ID: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. 
crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? 
I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Kevin.Buterbaugh at Vanderbilt.Edu Thu Oct 18 19:26:10 2018 From: Kevin.Buterbaugh at Vanderbilt.Edu (Buterbaugh, Kevin L) Date: Thu, 18 Oct 2018 18:26:10 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ccca728d2d61f4be06bcd08d6351f3650%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754805507359478&sdata=2YAiqgqKl4CerlyCn3vJ9v9u%2FrGzbfa7aKxJ0PYV%2Fhc%3D&reserved=0 From p.childs at qmul.ac.uk Thu Oct 18 19:50:42 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 Oct 2018 18:50:42 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. 
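Sven's second option can at least be tried temporarily, just around the snapshot itself -- roughly along these lines (the filesystem and snapshot names are placeholders; the -I form only changes the in-memory value without touching the stored configuration, and writes cluster-wide will be slower while the setting is on):

    # make writes stable for a moment so the flush/quiesce can complete
    mmchconfig forceOSyncWrites=yes -I
    mmcrsnapshot gpfs0 nightly_20181018
    # switch back to normal buffered behaviour
    mmchconfig forceOSyncWrites=no -I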
I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. 
I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From p.childs at qmul.ac.uk Thu Oct 18 19:50:42 2018 From: p.childs at qmul.ac.uk (Peter Childs) Date: Thu, 18 Oct 2018 18:50:42 +0000 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. 
Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. 
I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. (New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Paul.Sanchez at deshaw.com Thu Oct 18 19:47:31 2018 From: Paul.Sanchez at deshaw.com (Sanchez, Paul) Date: Thu, 18 Oct 2018 18:47:31 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! 
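To make the sharding idea above a bit more concrete, a minimal sketch of what each client could run -- paths, the number of clients and the parallelism are all placeholders, and this is the coarse per-top-level-directory variant rather than a truly non-recursive one:

    # this client takes every 4th top-level directory (the other three
    # clients use NR%4==1, 2 and 3) and runs up to 8 rsyncs at a time
    cd /gpfs/oldfs/home
    ls -d */ | awk 'NR % 4 == 0' | \
        xargs -P 8 -I{} rsync -a --whole-file {} /gpfs/newfs/home/{}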
Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fgpfsug.org%2Fmailman%2Flistinfo%2Fgpfsug-discuss&data=02%7C01%7CKevin.Buterbaugh%40vanderbilt.edu%7Ccca728d2d61f4be06bcd08d6351f3650%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636754805507359478&sdata=2YAiqgqKl4CerlyCn3vJ9v9u%2FrGzbfa7aKxJ0PYV%2Fhc%3D&reserved=0 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu Oct 18 20:18:37 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 12:18:37 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: <47DF6EDF-CA0C-4EBB-851A-1D3603F8B0C5@gmail.com> I don't know which min FS version you need to make use of -N, but there is this Marc guy watching the mailing list who would know __ Sven ?On 10/18/18, 11:50 AM, "Peter Childs" wrote: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. 
It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. 
(New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Thu Oct 18 20:18:37 2018 From: oehmes at gmail.com (Sven Oehme) Date: Thu, 18 Oct 2018 12:18:37 -0700 Subject: [gpfsug-discuss] Can't take snapshots while re-striping In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: <47DF6EDF-CA0C-4EBB-851A-1D3603F8B0C5@gmail.com> I don't know which min FS version you need to make use of -N, but there is this Marc guy watching the mailing list who would know __ Sven ?On 10/18/18, 11:50 AM, "Peter Childs" wrote: Thanks Sven, that's one of the best answers I've seen and probably closer to why we sometimes can't take snapshots under normal circumstances as well. We're currently running the restripe with "-N " so it only runs on a few nodes and does not disturb the work of the cluster, which is why we hadn't noticed it slow down the storage too much. I've also tried to put some qos settings on it too, I always find the qos a little bit "trial and error" but 30,000Iops looks to be making the rebalance run at about 2/3 iops it was using with no qos limit...... Just out of interest which version do I need to be running for "mmchqos -N" to work? I tried it to limit a set of nodes and it says not supported by my filesystem version. Manual does not look to say. Even with a very, very small value for qos on maintenance tasks, I still can't take snapshots so as Sven says the buffers are getting dirty too quickly. I have thought before that making snapshot taking more reliable would be nice, I'd not really thought it would be possible, I guess its time to write another RFE. Peter Childs Research Storage ITS Research Infrastructure Queen Mary, University of London ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org on behalf of Sven Oehme Sent: Thursday, October 18, 2018 7:09:56 PM To: gpfsug main discussion list; gpfsug-discuss at gpfsug.org Subject: Re: [gpfsug-discuss] Can't take snapshots while re-striping Peter, If the 2 operations wouldn't be compatible you should have gotten a different message. To understand what the message means one needs to understand how the snapshot code works. When GPFS wants to do a snapshot it goes through multiple phases. 
It tries to first flush all dirty data a first time, then flushes new data a 2nd time and then tries to quiesce the filesystem, how to do this is quite complex, so let me try to explain. How much parallelism is used for the 2 sync periods is controlled by sync workers . sync1WorkerThreads 64 . sync2WorkerThreads 64 . syncBackgroundThreads 64 . syncWorkerThreads 64 and if my memory serves me correct the sync1 number is for the first flush, the sync2 for the 2nd flush while syncworkerthreads are used explicit by e.g. crsnapshot to flush dirty data (I am sure somebody from IBM will correct me if I state something wrong I mixed them up before ) : when data is flushed by background sync is triggered by the OS : root at dgx-1-01:~# sysctl -a |grep -i vm.dirty vm.dirty_background_bytes = 0 vm.dirty_background_ratio = 10 vm.dirty_bytes = 0 vm.dirty_expire_centisecs = 3000 vm.dirty_ratio = 20 vm.dirty_writeback_centisecs = 500. <--- this is 5 seconds as well as GPFS settings : syncInterval 5 syncIntervalStrict 0 here both are set to 5 seconds, so every 5 seconds there is a periodic background flush happening . why explain all this, because its very easy for a thread that does buffered i/o to make stuff dirty, a single thread can do 100's of thousands of i/os into memory so making stuff dirty is very easy. The number of threads described above need to clean all this stuff, means stabilizing it onto media and here is where it gets complicated. You already run rebalance, which puts a lot of work on the disk, on top I assume you don't have a idle filesystem , people make stuff dirty and the threads above compete flushing things , so it?s a battle they can't really win unless you have very fast storage or at least very fast and large caches in the storage, so the 64 threads in the example above can clean stuff faster than new data gets made dirty. So your choices are : 1. reduce workerthreads, so stuff gets less dirty. 2. turn writes into stable writes : mmchconfig forceOSyncWrites=yes (you can use -I while running) this will slow all write operations down on your system as all writes are now done synchronous, but because of that they can't make anything dirty, so the flushers actually don't have to do any work. While back at IBM I proposed to change the code to switch into O_SYNC mode dynamically between sync 1 and sync2 , this means for a seconds or 2 all writes would be done synchronous to not have the possibility to make things dirty so the quiesce actually doesn't get delayed and as soon as the quiesce happened remove the temporary enforced stable flag, but that proposal never got anywhere as no customer pushed for it. Maybe that would be worth a RFE __ Btw. I described some of the parameters in more detail here --> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf Some of that is outdated by now, but probably still the best summary presentation out there. Sven ?On 10/18/18, 8:32 AM, "Peter Childs" wrote: We've just added 9 raid volumes to our main storage, (5 Raid6 arrays for data and 4 Raid1 arrays for metadata) We are now attempting to rebalance and our data around all the volumes. We started with the meta-data doing a "mmrestripe -r" as we'd changed the failure groups to on our meta-data disks and wanted to ensure we had all our metadata on known good ssd. No issues, here we could take snapshots and I even tested it. 
(New SSD on new failure group and move all old SSD to the same failure group) We're now doing a "mmrestripe -b" to rebalance the data accross all 21 Volumes however when we attempt to take a snapshot, as we do every night at 11pm it fails with sudo /usr/lpp/mmfs/bin/mmcrsnapshot home test Flushing dirty data for snapshot :test... Quiescing all file system operations. Unable to quiesce all nodes; some processes are busy or holding required resources. mmcrsnapshot: Command failed. Examine previous error messages to determine cause. Are you meant to be able to take snapshots while re-striping or not? I know a rebalance of the data is probably unnecessary, but we'd like to get the best possible speed out of the system, and we also kind of like balance. Thanks -- Peter Childs ITS Research Storage Queen Mary, University of London _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From cblack at nygenome.org Thu Oct 18 20:13:29 2018 From: cblack at nygenome.org (Christopher Black) Date: Thu, 18 Oct 2018 19:13:29 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> Message-ID: <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Other tools and approaches that we've found helpful: msrsync: handles parallelizing rsync within a dir tree and can greatly speed up transfers on a single node with both filesystems mounted, especially when dealing with many small files Globus/GridFTP: set up one or more endpoints on each side, gridftp will auto parallelize and recover from disruptions msrsync is easier to get going but is limited to one parent dir per node. We've sometimes done an additional level of parallelization by running msrsync with different top level directories on different hpc nodes simultaneously. Best, Chris Refs: https://github.com/jbd/msrsync https://www.globus.org/ ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul" wrote: Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? 
um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. From makaplan at us.ibm.com Thu Oct 18 20:30:21 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 Oct 2018 15:30:21 -0400 Subject: [gpfsug-discuss] Can't take snapshots while re-striping - "mmchqos -N" In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: I believe `mmchqos ... -N ... ` is supported at 4.2.2 and later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Thu Oct 18 20:30:21 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Thu, 18 Oct 2018 15:30:21 -0400 Subject: [gpfsug-discuss] Can't take snapshots while re-striping - "mmchqos -N" In-Reply-To: References: <03CE9BF2-F94C-455C-852C-C4BD7212BAE0@gmail.com> Message-ID: I believe `mmchqos ... 
-N ... ` is supported at 4.2.2 and later. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Dwayne.Hart at med.mun.ca Thu Oct 18 21:05:50 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Thu, 18 Oct 2018 20:05:50 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Message-ID: Thank you all for the responses. I'm currently using msrsync and things appear to be going very well. The data transfer is contained inside our DC. I'm transferring a user's home directory content from one GPFS file system to another. Our IBM Spectrum Scale Solution consists of 12 IO nodes connected to IB and the client node that I'm transferring the data from one fs to another is also connected to IB with a possible maximum of 2 hops. [root at client-system]# /gpfs/home/dwayne/bin/msrsync -P --stats -p 32 /gpfs/home/user/ /research/project/user/ [64756/992397 entries] [30.1 T/239.6 T transferred] [81 entries/s] [39.0 G/s bw] [monq 0] [jq 62043] Best, Dwayne -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christopher Black Sent: Thursday, October 18, 2018 4:43 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Other tools and approaches that we've found helpful: msrsync: handles parallelizing rsync within a dir tree and can greatly speed up transfers on a single node with both filesystems mounted, especially when dealing with many small files Globus/GridFTP: set up one or more endpoints on each side, gridftp will auto parallelize and recover from disruptions msrsync is easier to get going but is limited to one parent dir per node. We've sometimes done an additional level of parallelization by running msrsync with different top level directories on different hpc nodes simultaneously. Best, Chris Refs: https://github.com/jbd/msrsync https://www.globus.org/ ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Sanchez, Paul" wrote: Sharding can also work, if you have a storage-connected compute grid in your environment: If you enumerate all of the directories, then use a non-recursive rsync for each one, you may be able to parallelize the workload by using several clients simultaneously. It may still max out the links of these clients (assuming your source read throughput and target write throughput bottlenecks aren't encountered first) but it may run that way for 1/100th of the time if you can use 100+ machines. -Paul -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of Buterbaugh, Kevin L Sent: Thursday, October 18, 2018 2:26 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Hi Dwayne, I?m assuming you can?t just let an rsync run, possibly throttled in some way? If not, and if you?re just tapping out your network, then would it be possible to go old school? We have parts of the Medical Center here where their network connections are ? um, less than robust. So they tar stuff up to a portable HD, sneaker net it to us, and we untar is from an NSD server. HTH, and I really hope that someone has a better idea than that! 
Kevin > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > Hi, > > Just wondering what the best recipe for migrating a user?s home directory content from one GFPS file system to another which hosts a larger research GPFS file system? I?m currently using rsync and it has maxed out the client system?s IB interface. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= ________________________________ This message is for the recipient?s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email. _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mutantllama at gmail.com Thu Oct 18 21:54:42 2018 From: mutantllama at gmail.com (Carl) Date: Fri, 19 Oct 2018 07:54:42 +1100 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <8ec08801d1de486facbd6c3318e62d63@mbxtoa1.winmail.deshaw.com> <6FECF7F6-57E6-4164-BAF4-8ACF39453C59@nygenome.org> Message-ID: It may be overkill for your use case but MPI file utils is very good for large datasets. https://github.com/hpc/mpifileutils Cheers, Carl. On Fri, 19 Oct 2018 at 7:05 am, wrote: > Thank you all for the responses. I'm currently using msrsync and things > appear to be going very well. > > The data transfer is contained inside our DC. 
I'm transferring a user's > home directory content from one GPFS file system to another. Our IBM > Spectrum Scale Solution consists of 12 IO nodes connected to IB and the > client node that I'm transferring the data from one fs to another is also > connected to IB with a possible maximum of 2 hops. > > [root at client-system]# /gpfs/home/dwayne/bin/msrsync -P --stats -p 32 > /gpfs/home/user/ /research/project/user/ > [64756/992397 entries] [30.1 T/239.6 T transferred] [81 entries/s] [39.0 > G/s bw] [monq 0] [jq 62043] > > Best, > Dwayne > > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org [mailto: > gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Christopher Black > Sent: Thursday, October 18, 2018 4:43 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Best way to migrate data > > Other tools and approaches that we've found helpful: > msrsync: handles parallelizing rsync within a dir tree and can greatly > speed up transfers on a single node with both filesystems mounted, > especially when dealing with many small files > Globus/GridFTP: set up one or more endpoints on each side, gridftp will > auto parallelize and recover from disruptions > > msrsync is easier to get going but is limited to one parent dir per node. > We've sometimes done an additional level of parallelization by running > msrsync with different top level directories on different hpc nodes > simultaneously. > > Best, > Chris > > Refs: > https://github.com/jbd/msrsync > https://www.globus.org/ > > ?On 10/18/18, 2:54 PM, "gpfsug-discuss-bounces at spectrumscale.org on > behalf of Sanchez, Paul" behalf of Paul.Sanchez at deshaw.com> wrote: > > Sharding can also work, if you have a storage-connected compute grid > in your environment: If you enumerate all of the directories, then use a > non-recursive rsync for each one, you may be able to parallelize the > workload by using several clients simultaneously. It may still max out the > links of these clients (assuming your source read throughput and target > write throughput bottlenecks aren't encountered first) but it may run that > way for 1/100th of the time if you can use 100+ machines. > > -Paul > -----Original Message----- > From: gpfsug-discuss-bounces at spectrumscale.org < > gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Buterbaugh, Kevin L > Sent: Thursday, October 18, 2018 2:26 PM > To: gpfsug main discussion list > Subject: Re: [gpfsug-discuss] Best way to migrate data > > Hi Dwayne, > > I?m assuming you can?t just let an rsync run, possibly throttled in > some way? If not, and if you?re just tapping out your network, then would > it be possible to go old school? We have parts of the Medical Center here > where their network connections are ? um, less than robust. So they tar > stuff up to a portable HD, sneaker net it to us, and we untar is from an > NSD server. > > HTH, and I really hope that someone has a better idea than that! > > Kevin > > > On Oct 18, 2018, at 12:19 PM, Dwayne.Hart at med.mun.ca wrote: > > > > Hi, > > > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a larger > research GPFS file system? I?m currently using rsync and it has maxed out > the client system?s IB interface. > > > > Best, > > Dwayne > > ? > > Dwayne Hart | Systems Administrator IV > > > > CHIA, Faculty of Medicine > > Memorial University of Newfoundland > > 300 Prince Philip Drive > > St. 
John?s, Newfoundland | A1B 3V6 > > Craig L Dobbin Building | 4M409 > > T 709 864 6631 > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > > https://urldefense.proofpoint.com/v2/url?u=https-3A__na01.safelinks.protection.outlook.com_-3Furl-3Dhttp-253A-252F-252Fgpfsug.org-252Fmailman-252Flistinfo-252Fgpfsug-2Ddiscuss-26amp-3Bdata-3D02-257C01-257CKevin.Buterbaugh-2540vanderbilt.edu-257Ccca728d2d61f4be06bcd08d6351f3650-257Cba5a7f39e3be4ab3b45067fa80faecad-257C0-257C0-257C636754805507359478-26amp-3Bsdata-3D2YAiqgqKl4CerlyCn3vJ9v9u-252FrGzbfa7aKxJ0PYV-252Fhc-253D-26amp-3Breserved-3D0&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=NVJncSq-SKJSPgljdYqLDoy753jhxiKJNI2M8CexJME&e= > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__gpfsug.org_mailman_listinfo_gpfsug-2Ddiscuss&d=DwIGaQ&c=C9X8xNkG_lwP_-eFHTGejw&r=DopWM-bvfskhBn2zeglfyyw5U2pumni6m_QzQFYFepU&m=e-U5zXflwxr0w9-5ia0FHn3tF1rwmM1qciZNrBLwFeg&s=oM0Uo8pPSV5bUj2Hyjzvw1q12Oug_mH-aYsM_R4Zfv4&e= > > > ________________________________ > > This message is for the recipient?s use only, and may contain > confidential, privileged or protected information. Any unauthorized use or > dissemination of this communication is prohibited. If you received this > message in error, please immediately notify the sender and destroy all > copies of this message. The recipient should check this email and any > attachments for the presence of viruses, as we accept no liability for any > damage caused by any virus transmitted by this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan.buzzard at strath.ac.uk Fri Oct 19 10:09:13 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 19 Oct 2018 10:09:13 +0100 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: On 18/10/2018 18:19, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a > larger research GPFS file system? I?m currently using rsync and it > has maxed out the client system?s IB interface. > Be careful with rsync, it resets all your atimes which screws up any hope of doing ILM or HSM. My personal favourite is to do something along the lines of dsmc restore /gpfs/ Minimal impact on the user facing services, and seems to preserve atimes last time I checked. 
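Something along these lines, as a rough sketch only (the paths are made up, and the options are worth checking against your own Spectrum Protect client setup):

# restore one user's tree from the backup server straight onto the new filesystem
dsmc restore "/gpfs/home/someuser/*" /research/project/someuser/ -subdir=yes -replace=no

-subdir=yes walks the whole tree, and -replace=no stops it clobbering anything you have already copied across.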
Sure it tanks your backup server a bit, but that is not user facing. What do users care if the backup takes longer than normal. Of course this presumes you have a backup :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From novosirj at rutgers.edu Thu Oct 18 21:04:36 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Thu, 18 Oct 2018 20:04:36 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 We use parsyncfp. Our target is not GPFS, though. I was really hoping to hear about something snazzier for GPFS-GPFS. Lenovo would probably tell you that HSM is the way to go (we asked something similar for a replacement for our current setup or for distributed storage). On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts > a larger research GPFS file system? I?m currently using rsync and > it has maxed out the client system?s IB interface. > > Best, Dwayne ? Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine Memorial University of Newfoundland 300 > Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L > Dobbin Building | 4M409 T 709 864 6631 > _______________________________________________ gpfsug-discuss > mailing list gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > - -- ____ || \\UTGERS, |----------------------*O*------------------------ ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark `' -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk =dMDg -----END PGP SIGNATURE----- From Dwayne.Hart at med.mun.ca Fri Oct 19 11:15:15 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Fri, 19 Oct 2018 10:15:15 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> Message-ID: Hi JAB, We do not have either ILM or HSM. Thankfully, we have at minimum IBM Spectrum Protect (I recently updated the system to version 8.1.5). It would be an interesting exercise to see how long it would take IBM SP to restore a user's content fully to a different target. I have done some smaller recoveries so I know that the system is in a usable state ;) Best, Dwayne -----Original Message----- From: gpfsug-discuss-bounces at spectrumscale.org [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Jonathan Buzzard Sent: Friday, October 19, 2018 6:39 AM To: gpfsug-discuss at spectrumscale.org Subject: Re: [gpfsug-discuss] Best way to migrate data On 18/10/2018 18:19, Dwayne.Hart at med.mun.ca wrote: > Hi, > > Just wondering what the best recipe for migrating a user?s home > directory content from one GFPS file system to another which hosts a > larger research GPFS file system? I?m currently using rsync and it has > maxed out the client system?s IB interface. > Be careful with rsync, it resets all your atimes which screws up any hope of doing ILM or HSM. 
My personal favourite is to do something along the lines of dsmc restore /gpfs/ Minimal impact on the user facing services, and seems to preserve atimes last time I checked. Sure it tanks your backup server a bit, but that is not user facing. What do users care if the backup takes longer than normal. Of course this presumes you have a backup :-) JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From Dwayne.Hart at med.mun.ca Fri Oct 19 11:37:13 2018 From: Dwayne.Hart at med.mun.ca (Dwayne.Hart at med.mun.ca) Date: Fri, 19 Oct 2018 10:37:13 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca>, <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> Message-ID: Thank you Ryan. I?ll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer. By copying it from GPFS fs to another GPFS fs. Best, Dwayne ? Dwayne Hart | Systems Administrator IV CHIA, Faculty of Medicine Memorial University of Newfoundland 300 Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L Dobbin Building | 4M409 T 709 864 6631 > On Oct 19, 2018, at 7:04 AM, Ryan Novosielski wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > We use parsyncfp. Our target is not GPFS, though. I was really hoping > to hear about something snazzier for GPFS-GPFS. Lenovo would probably > tell you that HSM is the way to go (we asked something similar for a > replacement for our current setup or for distributed storage). > >> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: >> Hi, >> >> Just wondering what the best recipe for migrating a user?s home >> directory content from one GFPS file system to another which hosts >> a larger research GPFS file system? I?m currently using rsync and >> it has maxed out the client system?s IB interface. >> >> Best, Dwayne ? Dwayne Hart | Systems Administrator IV >> >> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 >> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L >> Dobbin Building | 4M409 T 709 864 6631 >> _______________________________________________ gpfsug-discuss >> mailing list gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > > - -- > ____ > || \\UTGERS, |----------------------*O*------------------------ > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Res. Comp. 
- MSB C630, Newark > `' > -----BEGIN PGP SIGNATURE----- > > iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG > p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk > =dMDg > -----END PGP SIGNATURE----- > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From S.J.Thompson at bham.ac.uk Fri Oct 19 11:41:15 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 19 Oct 2018 10:41:15 +0000 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Message-ID: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From knop at us.ibm.com Fri Oct 19 14:05:22 2018 From: knop at us.ibm.com (Felipe Knop) Date: Fri, 19 Oct 2018 09:05:22 -0400 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls In-Reply-To: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> References: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Message-ID: Simon, Depending on what functions are being used in Scale, other ports may also get used, as documented in https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewall.htm On the other hand, I'd initially speculate that you might be hitting a problem in mmnetverify itself. (perhaps some aspect in mmnetverify is not taking into account that ports other than 22, 1191, 60000-61000 may be getting blocked by the firewall) Could you open a PMR for this one? Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/19/2018 06:41 AM Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Sent by: gpfsug-discuss-bounces at spectrumscale.org Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. 
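(For reference, the firewall on these nodes is opened roughly like this, in firewalld syntax and assuming the default zone, so adjust for your own setup:

firewall-cmd --permanent --add-service=ssh
firewall-cmd --permanent --add-port=1191/tcp
firewall-cmd --permanent --add-port=60000-61000/tcp
firewall-cmd --reload
)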
So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Fri Oct 19 14:39:25 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 19 Oct 2018 13:39:25 +0000 Subject: [gpfsug-discuss] Spectrum Scale and Firewalls In-Reply-To: References: <10239ED8-7E0D-4420-8BEC-F17F0606BE64@bham.ac.uk> Message-ID: Yeah we have the perfmon ports open, and GUI ports open on the GUI nodes. But basically this is just a storage cluster and everything else (protocols etc) run in remote clusters. I?ve just opened a ticket ? no longer a PMR in the new support centre for Scale Simon From: on behalf of "knop at us.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Friday, 19 October 2018 at 14:05 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Spectrum Scale and Firewalls Simon, Depending on what functions are being used in Scale, other ports may also get used, as documented in https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewall.htm On the other hand, I'd initially speculate that you might be hitting a problem in mmnetverify itself. (perhaps some aspect in mmnetverify is not taking into account that ports other than 22, 1191, 60000-61000 may be getting blocked by the firewall) Could you open a PMR for this one? Thanks, Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 [Inactive hide details for Simon Thompson ---10/19/2018 06:41:27 AM---Hi, We?re having some issues bringing up firewalls on som]Simon Thompson ---10/19/2018 06:41:27 AM---Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actua From: Simon Thompson To: "gpfsug-discuss at spectrumscale.org" Date: 10/19/2018 06:41 AM Subject: [gpfsug-discuss] Spectrum Scale and Firewalls Sent by: gpfsug-discuss-bounces at spectrumscale.org ________________________________ Hi, We?re having some issues bringing up firewalls on some of our NSD nodes. The problem I was actually trying to diagnose I don?t think is firewall related but still ? We have port 22 and 1191 open and also 60000-61000, we also set: # mmlsconfig tscTcpPort tscTcpPort 1191 # mmlsconfig tscCmdPortRange tscCmdPortRange 60000-61000 https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adv_firewallforinternalcommn.htm Claims this is sufficient ? Running mmnetverify: # mmnetverify all --target-nodes rds-er-mgr01 rds-pg-mgr01 checking local configuration. Operation interface: Success. rds-pg-mgr01 checking communication with node rds-er-mgr01. Operation resolution: Success. Operation ping: Success. Operation shell: Success. Operation copy: Success. Operation time: Success. Operation daemon-port: Success. Operation sdrserv-port: Success. Operation tsccmd-port: Success. Operation data-small: Success. Operation data-medium: Success. Operation data-large: Success. 
Could not connect to port 46326 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. rds-pg-mgr01 checking cluster communications. Issues Found: rds-er-mgr01 could not connect to rds-pg-mgr01 (TCP, port 46326). mmnetverify: Command failed. Examine previous error messages to determine cause. Note that the port number mentioned changes if we run mmnetverify multiple times. The two clients in this test are running 5.0.2 code. If I run in verbose mode I see: Checking network communication with node rds-er-mgr01. Port range restricted by cluster configuration: 60000 - 61000. rds-er-mgr01: connecting to node rds-pg-mgr01. rds-er-mgr01: exchanged 256.0M bytes with rds-pg-mgr01. Write size: 16.0M bytes. Network statistics for rds-er-mgr01 during data exchange: packets sent: 68112 packets received: 72452 Network Traffic between rds-er-mgr01 and rds-pg-mgr01 port 60000 ok. Operation data-large: Success. Checking network bandwidth. rds-er-mgr01: connecting to node rds-pg-mgr01. Could not connect to port 36277 on node rds-pg-mgr01 (10.20.0.56): timed out. This may indicate a firewall configuration issue. Operation bandwidth-node: Fail. So for many of the tests it looks like its using port 60000 as expected, is this just a bug in mmnetverify or am I doing something silly? Thanks Simon_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.gif Type: image/gif Size: 106 bytes Desc: image001.gif URL: From Robert.Oesterlin at nuance.com Fri Oct 19 16:33:04 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Fri, 19 Oct 2018 15:33:04 +0000 Subject: [gpfsug-discuss] SC18 - User Group Meeting - Agenda and Registration Message-ID: <041D6114-8F12-463F-BFB5-ABF1A1834DA1@nuance.com> SC18 is only 3 weeks away! Here is the (more or less) final agenda for the user group meeting. SSUG @ SC18 Sunday, November 11th 12:30PM - 18:00 Omni Dallas Hotel 555 S Lamar Dallas, Texas Please register at the IBM site here: https://www-01.ibm.com/events/wwe/grp/grp305.nsf/Agenda.xsp?locale=en_US&openform=&seminar=2DQMNHES# Looking forward to seeing everyone in Dallas! Bob, Kristy, and Simon Start End Duration Title 12:30 12:45 15 Welcome 12:45 13:15 30 Spectrum Scale Update 13:15 13:30 15 ESS Update 13:30 13:45 15 Service Update 13:45 14:05 20 Lessons learned from a very unusual year (Kevin Buterbaugh, Vanderbilt) 14:05 14:25 20 Implementing a scratch filesystem with E8 Storage NVMe (Tom King, Queen Mary University of London) 14:25 14:45 20 Spectrum Scale and Containers (John Lewars, IBM) 14:45 15:10 25 Break 15:10 15:30 20 Best Practices for Protocol Nodes (Tomer Perry/Ulf Troppens, IBM) 15:30 15:50 20 Network Design Tomer Perry/Ulf Troppens, IBM/Mellanox) 15:50 16:10 20 AI Discussion 16:10 16:30 20 Improving Spark workload performance with Spectrum Conductor on Spectrum Scale (Chris Schlipalius, Pawsey Supercomputing Centre) 16:30 16:50 20 Spectrum Scale @ DDN ? Technical update (Sven Oehme, DDN) 16:50 17:10 20 Burst Buffer (Tom Goodings) 17:10 17:30 20 MetaData Management 17:30 17:45 15 Lenovo Update (Michael Hennecke, Lenovo) 17:45 18:00 15 Ask us anything 18:00 Social Event (at the hotel) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mnaineni at in.ibm.com Mon Oct 22 01:25:50 2018 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Mon, 22 Oct 2018 00:25:50 +0000 Subject: [gpfsug-discuss] Preliminary conclusion: single client, single thread, small files - native Scale vs NFS In-Reply-To: References: , Message-ID: An HTML attachment was scrubbed... URL: From oehmes at gmail.com Mon Oct 22 17:18:43 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 09:18:43 -0700 Subject: [gpfsug-discuss] GPFS, Pagepool and Block size -> Perfomance reduces with larger block size In-Reply-To: <243c5d36-f25e-4ebb-b9f3-6fc47bc6d93c@Spark> References: <6bb509b7-b7c5-422d-8e27-599333b6b7c4@Spark> <013aeb31-ebd2-4cc7-97d1-06883d9569f7@Spark> <243c5d36-f25e-4ebb-b9f3-6fc47bc6d93c@Spark> Message-ID: oops, somehow that slipped my inbox, i only saw that reply right now. its really hard to see from a trace snipped if the lock is the issue as the lower level locks don't show up in default traces. without having access to source code and a detailed trace you won't make much progress here. sven On Thu, Sep 27, 2018 at 12:31 PM wrote: > Thank you Sven, > > Turning of prefetching did not improve the performance, but it did degrade > a bit. > > I have made the prefetching default and took trace dump, for tracectl with > trace=io. Let me know if you want me to paste/attach it here. > > May i know, how could i confirm if the below is true? > > 1. this could be serialization around buffer locks. as larger your >>> blocksize gets as larger is the amount of data one of this pagepool buffers >>> will maintain, if there is a lot of concurrency on smaller amount of data >>> more threads potentially compete for the same buffer lock to copy stuff in >>> and out of a particular buffer, hence things go slower compared to the same >>> amount of data spread across more buffers, each of smaller size. >>> >>> > Will the above trace help in understanding if it is a serialization issue? > > I had been discussing the same with GPFS support for past few months, and > it seems to be that most of the time is being spent at cxiUXfer. They could > not understand on why it is taking spending so much of time in cxiUXfer. I > was seeing the same from perf top, and pagefaults. > > Below is snippet from what the support had said : > > ???????????????????????????? > > I searched all of the gpfsRead from trace and sort them by spending-time. > Except 2 reads which need fetch data from nsd server, the slowest read is > in the thread 72170. It took 112470.362 us. > > > trcrpt.2018-08-06_12.27.39.55538.lt15.trsum: 72165 6.860911319 > rdwr 141857.076 us + NSDIO > > trcrpt.2018-08-06_12.26.28.39794.lt15.trsum: 72170 1.483947593 > rdwr 112470.362 us + cxiUXfer > > trcrpt.2018-08-06_12.27.39.55538.lt15.trsum: 72165 6.949042593 > rdwr 88126.278 us + NSDIO > > trcrpt.2018-08-06_12.27.03.47706.lt15.trsum: 72156 2.919334474 > rdwr 81057.657 us + cxiUXfer > > trcrpt.2018-08-06_12.23.30.72745.lt15.trsum: 72154 1.167484466 > rdwr 76033.488 us + cxiUXfer > > trcrpt.2018-08-06_12.24.06.7508.lt15.trsum: 72187 0.685237501 > rdwr 70772.326 us + cxiUXfer > > trcrpt.2018-08-06_12.25.17.23989.lt15.trsum: 72193 4.757996530 > rdwr 70447.838 us + cxiUXfer > > > I check each of the slow IO as above, and find they all spend much time in > the function cxiUXfer. This function is used to copy data from kernel > buffer to user buffer. I am not sure why it took so much time. This should > be related to the pagefaults and pgfree you observed. Below is the trace > data for thread 72170. 
> > > 1.371477231 72170 TRACE_VNODE: gpfs_f_rdwr enter: fP > 0xFFFF882541649400 f_flags 0x8000 flags 0x8001 op 0 iovec > 0xFFFF881F2AFB3E70 count 1 offset 0x168F30D dentry 0xFFFF887C0CC298C0 > private 0xFFFF883F607175C0 iP 0xFFFF8823AA3CBFC0 name '410513.svs' > > .... > > 1.371483547 72170 TRACE_KSVFS: cachedReadFast exit: > uio_resid 16777216 code 1 err 11 > > .... > > 1.371498780 72170 TRACE_KSVFS: kSFSReadFast: oiP > 0xFFFFC90060B46740 offset 0x168F30D dataBufP FFFFC9003645A5A8 nDesc 64 buf > 200043C0000 valid words 64 dirty words 0 blkOff 0 > > 1.371499035 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate begin ul 0xFFFFC900333F1A40 holdCount 0 > ioType 0x2 inProg 0x15 > > 1.371500157 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate ul 0xFFFFC900333F1A40 holdCount 0 ioType 0x2 > inProg 0x16 err 0 > > 1.371500606 72170 TRACE_KSVFS: cxiUXfer: nDesc 64 1st > dataPtr 0x200043C0000 plP 0xFFFF887F7B90D600 toIOBuf 0 offset 6877965 len > 9899251 > > 1.371500793 72170 TRACE_KSVFS: cxiUXfer: ndesc 0 skip > dataAddrP 0x200043C0000 currOffset 0 currLen 262144 bufOffset 6877965 > > .... > > 1.371505949 72170 TRACE_KSVFS: cxiUXfer: ndesc 25 skip > dataAddrP 0x2001AF80000 currOffset 6553600 currLen 262144 bufOffset 6877965 > > 1.371506236 72170 TRACE_KSVFS: cxiUXfer: nDesc 26 > currOffset 6815744 tmpLen 262144 dataAddrP 0x2001AFCF30D currLen 199923 > pageOffset 781 pageLen 3315 plP 0xFFFF887F7B90D600 > > 1.373649823 72170 TRACE_KSVFS: cxiUXfer: nDesc 27 > currOffset 7077888 tmpLen 262144 dataAddrP 0x20027400000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.375158799 72170 TRACE_KSVFS: cxiUXfer: nDesc 28 > currOffset 7340032 tmpLen 262144 dataAddrP 0x20027440000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.376661566 72170 TRACE_KSVFS: cxiUXfer: nDesc 29 > currOffset 7602176 tmpLen 262144 dataAddrP 0x2002C180000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.377892653 72170 TRACE_KSVFS: cxiUXfer: nDesc 30 > currOffset 7864320 tmpLen 262144 dataAddrP 0x2002C1C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > .... 
> > 1.471389843 72170 TRACE_KSVFS: cxiUXfer: nDesc 62 > currOffset 16252928 tmpLen 262144 dataAddrP 0x2001D2C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.471845629 72170 TRACE_KSVFS: cxiUXfer: nDesc 63 > currOffset 16515072 tmpLen 262144 dataAddrP 0x2003EC80000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B90D600 > > 1.472417149 72170 TRACE_KSVFS: cxiDetachIOBuffer: > dataPtr 0x200043C0000 plP 0xFFFF887F7B90D600 > > 1.472417775 72170 TRACE_LOCK: unlock_vfs: type Data, > key 0000000000000004:000000001B1F24BF:0000000000000001 lock_mode have ro > token xw lock_state old [ ro:27 ] new [ ro:26 ] holdCount now 27 > > 1.472418427 72170 TRACE_LOCK: hash tab lookup vfs: > found cP 0xFFFFC9005FC0CDE0 holdCount now 14 > > 1.472418592 72170 TRACE_LOCK: lock_vfs: type Data key > 0000000000000004:000000001B1F24BF:0000000000000002 lock_mode want ro status > valid token xw/xw lock_state [ ro:12 ] flags 0x0 holdCount 14 > > 1.472419842 72170 TRACE_KSVFS: kSFSReadFast: oiP > 0xFFFFC90060B46740 offset 0x2000000 dataBufP FFFFC9003643C908 nDesc 64 buf > 38033480000 valid words 64 dirty words 0 blkOff 0 > > 1.472420029 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate begin ul 0xFFFFC9005FC0CF98 holdCount 0 > ioType 0x2 inProg 0xC > > 1.472420187 72170 TRACE_LOG: > UpdateLogger::beginDataUpdate ul 0xFFFFC9005FC0CF98 holdCount 0 ioType 0x2 > inProg 0xD err 0 > > 1.472420652 72170 TRACE_KSVFS: cxiUXfer: nDesc 64 1st > dataPtr 0x38033480000 plP 0xFFFF887F7B934320 toIOBuf 0 offset 0 len 6877965 > > 1.472420936 72170 TRACE_KSVFS: cxiUXfer: nDesc 0 > currOffset 0 tmpLen 262144 dataAddrP 0x38033480000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.472824790 72170 TRACE_KSVFS: cxiUXfer: nDesc 1 > currOffset 262144 tmpLen 262144 dataAddrP 0x380334C0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.473243905 72170 TRACE_KSVFS: cxiUXfer: nDesc 2 > currOffset 524288 tmpLen 262144 dataAddrP 0x38024280000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > .... 
> > 1.482949347 72170 TRACE_KSVFS: cxiUXfer: nDesc 24 > currOffset 6291456 tmpLen 262144 dataAddrP 0x38025E80000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483354265 72170 TRACE_KSVFS: cxiUXfer: nDesc 25 > currOffset 6553600 tmpLen 262144 dataAddrP 0x38025EC0000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483766631 72170 TRACE_KSVFS: cxiUXfer: nDesc 26 > currOffset 6815744 tmpLen 262144 dataAddrP 0x38003B00000 currLen 262144 > pageOffset 0 pageLen 4096 plP 0xFFFF887F7B934320 > > 1.483943894 72170 TRACE_KSVFS: cxiDetachIOBuffer: > dataPtr 0x38033480000 plP 0xFFFF887F7B934320 > > 1.483944339 72170 TRACE_LOCK: unlock_vfs: type Data, > key 0000000000000004:000000001B1F24BF:0000000000000002 lock_mode have ro > token xw lock_state old [ ro:14 ] new [ ro:13 ] holdCount now 14 > > 1.483944683 72170 TRACE_BRL: brUnlockM: ofP > 0xFFFFC90069346B68 inode 455025855 snap 0 handle 0xFFFFC9003637D020 range > 0x168F30D-0x268F30C mode ro > > 1.483944985 72170 TRACE_KSVFS: kSFSReadFast exit: > uio_resid 0 err 0 > > 1.483945264 72170 TRACE_LOCK: unlock_vfs_m: type > Inode, key 305F105B9701E60A:000000001B1F24BF:0000000000000000 lock_mode > have ro status valid token rs lock_state old [ ro:25 ] new [ ro:24 ] > > 1.483945423 72170 TRACE_LOCK: unlock_vfs_m: cP > 0xFFFFC90069346B68 holdCount 25 > > 1.483945624 72170 TRACE_VNODE: gpfsRead exit: fast err > 0 > > 1.483946831 72170 TRACE_KSVFS: ReleSG: sli 38 sgP > 0xFFFFC90035E52F78 NotQuiesced vfsOp 2 > > 1.483946975 72170 TRACE_KSVFS: ReleSG: sli 38 sgP > 0xFFFFC90035E52F78 vfsOp 2 users 1-1 > > 1.483947116 72170 TRACE_KSVFS: ReleaseDaemonSegAndSG: > sli 38 count 2 needCleanup 0 > > 1.483947593 72170 TRACE_VNODE: gpfs_f_rdwr exit: fP > 0xFFFF882541649400 total_len 16777216 uio_resid 0 offset 0x268F30D rc 0 > > > ??????????????????????????????????????????? > > > > Regards, > Lohit > > On Sep 19, 2018, 3:11 PM -0400, Sven Oehme , wrote: > > the document primarily explains all performance specific knobs. general > advice would be to longer set anything beside workerthreads, pagepool and > filecache on 5.X systems as most other settings are no longer relevant > (thats a client side statement) . thats is true until you hit strange > workloads , which is why all the knobs are still there :-) > > sven > > > On Wed, Sep 19, 2018 at 11:17 AM wrote: > >> Thanks Sven. >> I will disable it completely and see how it behaves. >> >> Is this the presentation? >> >> http://files.gpfsug.org/presentations/2014/UG10_GPFS_Performance_Session_v10.pdf >> >> I guess i read it, but it did not strike me at this situation. I will try >> to read it again and see if i could make use of it. >> >> Regards, >> Lohit >> >> On Sep 19, 2018, 2:12 PM -0400, Sven Oehme , wrote: >> >> seem like you never read my performance presentation from a few years ago >> ;-) >> >> you can control this on a per node basis , either for all i/o : >> >> prefetchAggressiveness = X >> >> or individual for reads or writes : >> >> prefetchAggressivenessRead = X >> prefetchAggressivenessWrite = X >> >> for a start i would turn it off completely via : >> >> mmchconfig prefetchAggressiveness=0 -I -N nodename >> >> that will turn it off only for that node and only until you restart the >> node. >> then see what happens >> >> sven >> >> >> On Wed, Sep 19, 2018 at 11:07 AM wrote: >> >>> Thank you Sven. >>> >>> I mostly think it could be 1. or some other issue. >>> I don?t think it could be 2. 
, because i can replicate this issue no >>> matter what is the size of the dataset. It happens for few files that could >>> easily fit in the page pool too. >>> >>> I do see a lot more page faults for 16M compared to 1M, so it could be >>> related to many threads trying to compete for the same buffer space. >>> >>> I will try to take the trace with trace=io option and see if can find >>> something. >>> >>> How do i turn of prefetching? Can i turn it off for a single >>> node/client? >>> >>> Regards, >>> Lohit >>> >>> On Sep 18, 2018, 5:23 PM -0400, Sven Oehme , wrote: >>> >>> Hi, >>> >>> taking a trace would tell for sure, but i suspect what you might be >>> hitting one or even multiple issues which have similar negative performance >>> impacts but different root causes. >>> >>> 1. this could be serialization around buffer locks. as larger your >>> blocksize gets as larger is the amount of data one of this pagepool buffers >>> will maintain, if there is a lot of concurrency on smaller amount of data >>> more threads potentially compete for the same buffer lock to copy stuff in >>> and out of a particular buffer, hence things go slower compared to the same >>> amount of data spread across more buffers, each of smaller size. >>> >>> 2. your data set is small'ish, lets say a couple of time bigger than the >>> pagepool and you random access it with multiple threads. what will happen >>> is that because it doesn't fit into the cache it will be read from the >>> backend. if multiple threads hit the same 16 mb block at once with multiple >>> 4k random reads, it will read the whole 16mb block because it thinks it >>> will benefit from it later on out of cache, but because it fully random the >>> same happens with the next block and the next and so on and before you get >>> back to this block it was pushed out of the cache because of lack of enough >>> pagepool. >>> >>> i could think of multiple other scenarios , which is why its so hard to >>> accurately benchmark an application because you will design a benchmark to >>> test an application, but it actually almost always behaves different then >>> you think it does :-) >>> >>> so best is to run the real application and see under which configuration >>> it works best. >>> >>> you could also take a trace with trace=io and then look at >>> >>> TRACE_VNOP: READ: >>> TRACE_VNOP: WRITE: >>> >>> and compare them to >>> >>> TRACE_IO: QIO: read >>> TRACE_IO: QIO: write >>> >>> and see if the numbers summed up for both are somewhat equal. if >>> TRACE_VNOP is significant smaller than TRACE_IO you most likely do more i/o >>> than you should and turning prefetching off might actually make things >>> faster . >>> >>> keep in mind i am no longer working for IBM so all i say might be >>> obsolete by now, i no longer have access to the one and only truth aka the >>> source code ... but if i am wrong i am sure somebody will point this out >>> soon ;-) >>> >>> sven >>> >>> >>> >>> >>> On Tue, Sep 18, 2018 at 10:31 AM wrote: >>> >>>> Hello All, >>>> >>>> This is a continuation to the previous discussion that i had with Sven. >>>> However against what i had mentioned previously - i realize that this >>>> is ?not? related to mmap, and i see it when doing random freads. >>>> >>>> I see that block-size of the filesystem matters when reading from Page >>>> pool. >>>> I see a major difference in performance when compared 1M to 16M, when >>>> doing lot of random small freads with all of the data in pagepool. >>>> >>>> Performance for 1M is a magnitude ?more? 
than the performance that i >>>> see for 16M. >>>> >>>> The GPFS that we have currently is : >>>> Version : 5.0.1-0.5 >>>> Filesystem version: 19.01 (5.0.1.0) >>>> Block-size : 16M >>>> >>>> I had made the filesystem block-size to be 16M, thinking that i would >>>> get the most performance for both random/sequential reads from 16M than the >>>> smaller block-sizes. >>>> With GPFS 5.0, i made use the 1024 sub-blocks instead of 32 and thus >>>> not loose lot of storage space even with 16M. >>>> I had run few benchmarks and i did see that 16M was performing better >>>> ?when hitting storage/disks? with respect to bandwidth for >>>> random/sequential on small/large reads. >>>> >>>> However, with this particular workload - where it freads a chunk of >>>> data randomly from hundreds of files -> I see that the number of >>>> page-faults increase with block-size and actually reduce the performance. >>>> 1M performs a lot better than 16M, and may be i will get better >>>> performance with less than 1M. >>>> It gives the best performance when reading from local disk, with 4K >>>> block size filesystem. >>>> >>>> What i mean by performance when it comes to this workload - is not the >>>> bandwidth but the amount of time that it takes to do each iteration/read >>>> batch of data. >>>> >>>> I figure what is happening is: >>>> fread is trying to read a full block size of 16M - which is good in a >>>> way, when it hits the hard disk. >>>> But the application could be using just a small part of that 16M. Thus >>>> when randomly reading(freads) lot of data of 16M chunk size - it is page >>>> faulting a lot more and causing the performance to drop . >>>> I could try to make the application do read instead of freads, but i >>>> fear that could be bad too since it might be hitting the disk with a very >>>> small block size and that is not good. >>>> >>>> With the way i see things now - >>>> I believe it could be best if the application does random reads of >>>> 4k/1M from pagepool but some how does 16M from rotating disks. >>>> >>>> I don?t see any way of doing the above other than following a different >>>> approach where i create a filesystem with a smaller block size ( 1M or less >>>> than 1M ), on SSDs as a tier. >>>> >>>> May i please ask for advise, if what i am understanding/seeing is right >>>> and the best solution possible for the above scenario. >>>> >>>> Regards, >>>> Lohit >>>> >>>> On Apr 11, 2018, 10:36 AM -0400, Lohit Valleru , >>>> wrote: >>>> >>>> Hey Sven, >>>> >>>> This is regarding mmap issues and GPFS. >>>> We had discussed previously of experimenting with GPFS 5. >>>> >>>> I now have upgraded all of compute nodes and NSD nodes to GPFS 5.0.0.2 >>>> >>>> I am yet to experiment with mmap performance, but before that - I am >>>> seeing weird hangs with GPFS 5 and I think it could be related to mmap. >>>> >>>> Have you seen GPFS ever hang on this syscall? >>>> [Tue Apr 10 04:20:13 2018] [] >>>> _ZN10gpfsNode_t8mmapLockEiiPKj+0xb5/0x140 [mmfs26] >>>> >>>> I see the above ,when kernel hangs and throws out a series of trace >>>> calls. >>>> >>>> I somehow think the above trace is related to processes hanging on GPFS >>>> forever. There are no errors in GPFS however. >>>> >>>> Also, I think the above happens only when the mmap threads go above a >>>> particular number. >>>> >>>> We had faced a similar issue in 4.2.3 and it was resolved in a patch to >>>> 4.2.3.2 . At that time , the issue happened when mmap threads go more than >>>> worker1threads. 
According to the ticket - it was a mmap race condition that >>>> GPFS was not handling well. >>>> >>>> I am not sure if this issue is a repeat and I am yet to isolate the >>>> incident and test with increasing number of mmap threads. >>>> >>>> I am not 100 percent sure if this is related to mmap yet but just >>>> wanted to ask you if you have seen anything like above. >>>> >>>> Thanks, >>>> >>>> Lohit >>>> >>>> On Feb 22, 2018, 3:59 PM -0500, Sven Oehme , wrote: >>>> >>>> Hi Lohit, >>>> >>>> i am working with ray on a mmap performance improvement right now, >>>> which most likely has the same root cause as yours , see --> >>>> http://gpfsug.org/pipermail/gpfsug-discuss/2018-January/004411.html >>>> the thread above is silent after a couple of back and rorth, but ray >>>> and i have active communication in the background and will repost as soon >>>> as there is something new to share. >>>> i am happy to look at this issue after we finish with ray's workload if >>>> there is something missing, but first let's finish his, get you try the >>>> same fix and see if there is something missing. >>>> >>>> btw. if people would share their use of MMAP , what applications they >>>> use (home grown, just use lmdb which uses mmap under the cover, etc) please >>>> let me know so i get a better picture on how wide the usage is with GPFS. i >>>> know a lot of the ML/DL workloads are using it, but i would like to know >>>> what else is out there i might not think about. feel free to drop me a >>>> personal note, i might not reply to it right away, but eventually. >>>> >>>> thx. sven >>>> >>>> >>>> On Thu, Feb 22, 2018 at 12:33 PM wrote: >>>> >>>>> Hi all, >>>>> >>>>> I wanted to know, how does mmap interact with GPFS pagepool with >>>>> respect to filesystem block-size? >>>>> Does the efficiency depend on the mmap read size and the block-size of >>>>> the filesystem even if all the data is cached in pagepool? >>>>> >>>>> GPFS 4.2.3.2 and CentOS7. >>>>> >>>>> Here is what i observed: >>>>> >>>>> I was testing a user script that uses mmap to read from 100M to 500MB >>>>> files. >>>>> >>>>> The above files are stored on 3 different filesystems. >>>>> >>>>> Compute nodes - 10G pagepool and 5G seqdiscardthreshold. >>>>> >>>>> 1. 4M block size GPFS filesystem, with separate metadata and data. >>>>> Data on Near line and metadata on SSDs >>>>> 2. 1M block size GPFS filesystem as a AFM cache cluster, "with all the >>>>> required files fully cached" from the above GPFS cluster as home. Data and >>>>> Metadata together on SSDs >>>>> 3. 16M block size GPFS filesystem, with separate metadata and data. >>>>> Data on Near line and metadata on SSDs >>>>> >>>>> When i run the script first time for ?each" filesystem: >>>>> I see that GPFS reads from the files, and caches into the pagepool as >>>>> it reads, from mmdiag -- iohist >>>>> >>>>> When i run the second time, i see that there are no IO requests from >>>>> the compute node to GPFS NSD servers, which is expected since all the data >>>>> from the 3 filesystems is cached. >>>>> >>>>> However - the time taken for the script to run for the files in the 3 >>>>> different filesystems is different - although i know that they are just >>>>> "mmapping"/reading from pagepool/cache and not from disk. >>>>> >>>>> Here is the difference in time, for IO just from pagepool: >>>>> >>>>> 20s 4M block size >>>>> 15s 1M block size >>>>> 40S 16M block size. 
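A rough sketch of how such a comparison can be scripted: time the cached pass per filesystem, use mmdiag --iohist (as above) as a crude check that no new disk I/O happens during it, and optionally repeat with prefetch disabled as suggested earlier in the thread. The mount points, node name and workload command below are placeholders, not taken from this thread, and the iohist diff is only a rough indicator:

#!/bin/bash
# Time the same read workload twice per filesystem; the second pass should be
# served entirely from the pagepool, so mmdiag --iohist should show no new
# data I/O while it runs. All paths and names below are placeholders.
for fs in /gpfs/fs1m /gpfs/fs4m /gpfs/fs16m; do
    echo "== $fs =="
    /usr/bin/time -p ./read_workload.sh "$fs"     # first pass warms the pagepool
    mmdiag --iohist > /tmp/iohist.before
    /usr/bin/time -p ./read_workload.sh "$fs"     # cached pass: this is the time to compare
    mmdiag --iohist > /tmp/iohist.after
    diff -q /tmp/iohist.before /tmp/iohist.after || echo "new I/O seen during cached pass"
done

# Optional: rerun the cached pass with prefetch disabled on this client only,
# per the suggestion earlier in the thread (reverts when mmfsd restarts):
# mmchconfig prefetchAggressiveness=0 -I -N <clientnode>
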
>>>>> >>>>> Why do i see a difference when trying to mmap reads from different >>>>> block-size filesystems, although i see that the IO requests are not hitting >>>>> disks and just the pagepool? >>>>> >>>>> I am willing to share the strace output and mmdiag outputs if needed. >>>>> >>>>> Thanks, >>>>> Lohit >>>>> >>>>> _______________________________________________ >>>>> gpfsug-discuss mailing list >>>>> gpfsug-discuss at spectrumscale.org >>>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>>> _______________________________________________ >>>> gpfsug-discuss mailing list >>>> gpfsug-discuss at spectrumscale.org >>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From novosirj at rutgers.edu Mon Oct 22 16:21:06 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Mon, 22 Oct 2018 15:21:06 +0000 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> Message-ID: <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> It seems like the primary way that this helps us is that we transfer user home directories and many of them have VERY large numbers of small files (in the millions), so running multiple simultaneous rsyncs allows the transfer to continue past that one slow area. I guess it balances the bandwidth constraint and the I/O constraints on generating a file list. There are unfortunately one or two known bugs that slow it down ? it keeps track of its rsync PIDs but sometimes a former rsync PID is reused by the system which it counts against the number of running rsyncs. It can also think rsync is still running at the end when it?s really something else now using the PID. I know the author is looking at that. For shorter transfers, you likely won?t run into this. I?m not sure I have the time or the programming ability to make this happen, but it seems to me that one could make some major gains by replacing fpart with mmfind in a GPFS environment. 
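A rough sketch of that idea, driving parallel rsyncs from a policy-engine scan instead of fpart. Every path, target host and the degree of parallelism below are placeholders, and an existing LIST rule in /tmp/list.pol is assumed; pathnames containing blanks would need extra care:

#!/bin/bash
# Build the file list with a policy scan, then hand equal slices of it to
# parallel rsyncs. Assumes /tmp/list.pol contains a LIST rule; all names
# below are placeholders.
SRC=/gpfs/source
DEST=desthost:/gpfs/target
mmapplypolicy "$SRC" -P /tmp/list.pol -f /tmp/xfer -I defer
# policy list lines end in " -- /full/path"; keep just the paths, relative to $SRC
awk -F ' -- ' '{print $2}' /tmp/xfer.list.* | sed "s|^$SRC/||" > /tmp/files.all
split -n l/8 /tmp/files.all /tmp/files.chunk.          # 8 roughly equal slices
ls /tmp/files.chunk.* | xargs -P 8 -I{} rsync -av --files-from={} "$SRC/" "$DEST"
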
Generating lists of files takes a significant amount of time and mmfind can probably do it faster than anything else that does not have direct access to GPFS metadata. > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > Thank you Ryan. I?ll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski wrote: >> >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> We use parsyncfp. Our target is not GPFS, though. I was really hoping >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably >> tell you that HSM is the way to go (we asked something similar for a >> replacement for our current setup or for distributed storage). >> >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: >>> Hi, >>> >>> Just wondering what the best recipe for migrating a user?s home >>> directory content from one GFPS file system to another which hosts >>> a larger research GPFS file system? I?m currently using rsync and >>> it has maxed out the client system?s IB interface. >>> >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV >>> >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L >>> Dobbin Building | 4M409 T 709 864 6631 >>> _______________________________________________ gpfsug-discuss >>> mailing list gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> - -- >> ____ >> || \\UTGERS, |----------------------*O*------------------------ >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu >> || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus >> || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark >> `' >> -----BEGIN PGP SIGNATURE----- >> >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk >> =dMDg >> -----END PGP SIGNATURE----- >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From oehmes at gmail.com Mon Oct 22 19:11:06 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 11:11:06 -0700 Subject: [gpfsug-discuss] Best way to migrate data In-Reply-To: <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: i am not sure if that was mentioned already but in some version of V5.0.X based on my suggestion a tool was added by mark on a AS-IS basis (thanks mark) to do what you want with one exception : /usr/lpp/mmfs/samples/ilm/mmxcp -h Usage: /usr/lpp/mmfs/samples/ilm/mmxcp -t target -p strip_count source_pathname1 source_pathname2 ... Run "cp" in a mmfind ... -xarg ... pipeline, e.g. 
mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight DIRECTORY_HASH -xargs mmxcp -t /target -p 2 Options: -t target_path : Copy files to this path. -p strip_count : Remove this many directory names from the pathnames of the source files. -a : pass -a to cp -v : pass -v to cp this is essentially a parallel copy tool using the policy with all its goddies. the one critical part thats missing is that it doesn't copy any GPFS specific metadata which unfortunate includes NFSV4 ACL's. the reason for that is that GPFS doesn't expose the NFSV4 ACl's via xattrs nor does any of the regular Linux tools uses the proprietary interface into GPFS to extract and apply them (this is what allows this magic unsupported version of rsync https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync to transfer the acls and other attributes). so a worth while RFE would be to either expose all special GPFS bits as xattrs or provide at least a maintained version of sync, cp or whatever which allows the transfer of this data. Sven On Mon, Oct 22, 2018 at 10:52 AM Ryan Novosielski wrote: > It seems like the primary way that this helps us is that we transfer user > home directories and many of them have VERY large numbers of small files > (in the millions), so running multiple simultaneous rsyncs allows the > transfer to continue past that one slow area. I guess it balances the > bandwidth constraint and the I/O constraints on generating a file list. > There are unfortunately one or two known bugs that slow it down ? it keeps > track of its rsync PIDs but sometimes a former rsync PID is reused by the > system which it counts against the number of running rsyncs. It can also > think rsync is still running at the end when it?s really something else now > using the PID. I know the author is looking at that. For shorter transfers, > you likely won?t run into this. > > I?m not sure I have the time or the programming ability to make this > happen, but it seems to me that one could make some major gains by > replacing fpart with mmfind in a GPFS environment. Generating lists of > files takes a significant amount of time and mmfind can probably do it > faster than anything else that does not have direct access to GPFS metadata. > > > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > > > Thank you Ryan. I?ll have a more in-depth look at this application later > today and see how it deals with some of the large genetic files that are > generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > > > Best, > > Dwayne > > ? > > Dwayne Hart | Systems Administrator IV > > > > CHIA, Faculty of Medicine > > Memorial University of Newfoundland > > 300 Prince Philip Drive > > St. John?s, Newfoundland | A1B 3V6 > > Craig L Dobbin Building | 4M409 > > T 709 864 6631 <(709)%20864-6631> > > > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski > wrote: > >> > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> We use parsyncfp. Our target is not GPFS, though. I was really hoping > >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably > >> tell you that HSM is the way to go (we asked something similar for a > >> replacement for our current setup or for distributed storage). > >> > >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: > >>> Hi, > >>> > >>> Just wondering what the best recipe for migrating a user?s home > >>> directory content from one GFPS file system to another which hosts > >>> a larger research GPFS file system? 
I?m currently using rsync and > >>> it has maxed out the client system?s IB interface. > >>> > >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV > >>> > >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 > >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L > >>> Dobbin Building | 4M409 T 709 864 6631 <(709)%20864-6631> > >>> _______________________________________________ gpfsug-discuss > >>> mailing list gpfsug-discuss at spectrumscale.org > >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >>> > >> > >> - -- > >> ____ > >> || \\UTGERS, |----------------------*O*------------------------ > >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > >> || \\ University | Sr. Technologist - 973/972.0922 <(973)%20972-0922> > ~*~ RBHS Campus > >> || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark > >> `' > >> -----BEGIN PGP SIGNATURE----- > >> > >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG > >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk > >> =dMDg > >> -----END PGP SIGNATURE----- > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Oct 22 21:08:49 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 22 Oct 2018 16:08:49 -0400 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca><92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu><3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Mon Oct 22 21:15:52 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Mon, 22 Oct 2018 20:15:52 +0000 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca><92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu><3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> , Message-ID: Can you use mmxcp with output from tsbuhelper? Becuase this would actually be a pretty good way of doing incrementals when deploying a new storage system (unless IBM wants to let us add new storage and change the block size.... Someday maybe...) Though until mmxcp supports ACLs, it's still not really a solution I guess. 
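One possible stop-gap, sketched below and untested, is a second pass after the bulk copy that replays the NFSv4 ACLs with mmgetacl/mmputacl. It does not cover other GPFS extended attributes, and the list file and paths are placeholders (e.g. a file holding one source pathname per line from a policy LIST scan):

#!/bin/bash
# Replay ACLs from source to target after the data copy. Assumes
# /tmp/files.all holds one source pathname per line and that the target
# mirrors the source tree layout. All names below are placeholders.
SRC=/gpfs/source
TARGET=/gpfs/target
while IFS= read -r path; do
    rel=${path#"$SRC"/}
    mmgetacl -o /tmp/acl.$$ "$path" && mmputacl -i /tmp/acl.$$ "$TARGET/$rel"
done < /tmp/files.all
rm -f /tmp/acl.$$
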
Simon ________________________________________ From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of makaplan at us.ibm.com [makaplan at us.ibm.com] Sent: 22 October 2018 21:08 To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K From oehmes at gmail.com Mon Oct 22 21:33:17 2018 From: oehmes at gmail.com (Sven Oehme) Date: Mon, 22 Oct 2018 13:33:17 -0700 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca> <92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu> <3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: Marc, The issue with that is that you need multiple passes and things change in between, it also significant increases migration times. You will always miss something or you need to manually correct. The right thing is to have 1 tool that takes care of both, the bulk transfer and the additional attributes. Sven From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Monday, October 22, 2018 at 1:09 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From makaplan at us.ibm.com Mon Oct 22 22:15:17 2018 From: makaplan at us.ibm.com (Marc A Kaplan) Date: Mon, 22 Oct 2018 17:15:17 -0400 Subject: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp In-Reply-To: References: <36FADB96-41AF-48BA-B0A8-8526505B81A2@med.mun.ca><92555fa5-f91a-18fb-2a25-277e7ae9fed6@rutgers.edu><3023B88F-D115-4C0B-90DC-6EF711D858E6@rutgers.edu> Message-ID: Just copy the extra attributes and ACL copy immediately after the cp. The window will be small, and if you think about it, the window of vulnerability is going to be there with a hacked rsync anyhow. There need not be any additional "passes". Once you put it into a single script, you have "one tool". From: Sven Oehme To: gpfsug main discussion list Date: 10/22/2018 04:33 PM Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Sent by: gpfsug-discuss-bounces at spectrumscale.org Marc, The issue with that is that you need multiple passes and things change in between, it also significant increases migration times. You will always miss something or you need to manually correct. 
The right thing is to have 1 tool that takes care of both, the bulk transfer and the additional attributes. Sven From: on behalf of Marc A Kaplan Reply-To: gpfsug main discussion list Date: Monday, October 22, 2018 at 1:09 PM To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... Avoids boiling the ocean and reinventing or hacking rsync. -- marc K _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Tue Oct 23 00:45:05 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Mon, 22 Oct 2018 16:45:05 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> Message-ID: <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. The current agenda is: 8:45 AM 9:00 AM Coffee & Registration Presenter 9:00 AM 9:15 AM Welcome Amy Hirst & Chris Black 9:15 AM 9:45 AM What is new in IBM Spectrum Scale? Piyush Chaudhary 9:45 AM 10:00 AM What is new in ESS? John Sing 10:00 AM 10:20 AM How does CORAL help other workloads? Kevin Gildea 10:20 AM 10:40 AM Break 10:40 AM 11:00 AM Customer Talk ? The New York Genome Center Chris Black 11:00 AM 11:20 AM Spinning up a Hadoop cluster on demand Piyush Chaudhary 11:20 AM 11:40 AM Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione 11:40 AM 12:00 PM AI Reference Architecture Piyush Chaudhary 12:00 PM 12:50 PM Lunch 12:50 PM 1:30 PM Special Talk Joe Dain 1:30 PM 1:50 PM Multi-cloud Transparent Cloud Tiering Rob Basham 1:50 PM 2:10 PM Customer Talk ? Princeton University Curtis W. Hillegas 2:10 PM 2:30 PM Updates on Container Support John Lewars 2:30 PM 2:50 PM Customer Talk ? NYU Michael Costantino 2:50 PM 3:10 PM Spectrum Archive and TS1160 Carl Reasoner 3:10 PM 3:30 PM Break 3:30 PM 4:10 PM IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop 4:10 PM 4:40 PM Service Update Jim Doherty 4:40 PM 5:10 PM Open Forum 5:10 PM 5:30 PM Wrap-Up Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: > > For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. > > Spectrum Scale User Group ? 
NYC > October 24th, 2018 > The New York Genome Center > 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium > > Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 > > 08:45-09:00 Coffee & Registration > 09:00-09:15 Welcome > 09:15-09:45 What is new in IBM Spectrum Scale? > 09:45-10:00 What is new in ESS? > 10:00-10:20 How does CORAL help other workloads? > 10:20-10:40 --- Break --- > 10:40-11:00 Customer Talk ? The New York Genome Center > 11:00-11:20 Spinning up a Hadoop cluster on demand > 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine > 11:40-12:10 Spectrum Scale Network Flow > 12:10-13:00 --- Lunch --- > 13:00-13:40 Special Announcement and Demonstration > 13:40-14:00 Multi-cloud Transparent Cloud Tiering > 14:00-14:20 Customer Talk ? Princeton University > 14:20-14:40 AI Reference Architecture > 14:40-15:00 Updates on Container Support > 15:00-15:20 Customer Talk ? TBD > 15:20-15:40 --- Break --- > 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting > 16:10-16:40 Service Update > 16:40-17:10 Open Forum > 17:10-17:30 Wrap-Up > 17:30- Social Event > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.schlipalius at pawsey.org.au Tue Oct 23 01:01:41 2018 From: chris.schlipalius at pawsey.org.au (Chris Schlipalius) Date: Tue, 23 Oct 2018 08:01:41 +0800 Subject: [gpfsug-discuss] gpfsug-discuss Digest, Vol 81, Issue 44 In-Reply-To: References: Message-ID: <8F05B8A3-B950-46E1-8711-2A5CC6D62BDA@pawsey.org.au> Hi So when we have migrated 1.6PB of data from one GPFS filesystems to another GPFS (over IB), we used dcp in github (with mmdsh). It just can be problematic to compile. I have used rsync with attrib and ACLs?s preserved in my previous job ? aka rsync -aAvz But DCP parallelises better, checksumming files and dirs. works and we used that to ensure nothing was lost. Worth a go! Regards, Chris Schlipalius Team Lead, Data Storage Infrastructure, Data & Visualisation, Pawsey Supercomputing Centre (CSIRO) 13 Burvill Court Kensington WA 6151 Australia Tel +61 8 6436 8815 Email chris.schlipalius at pawsey.org.au Web www.pawsey.org.au On 23/10/18, 4:08 am, "gpfsug-discuss-bounces at spectrumscale.org on behalf of gpfsug-discuss-request at spectrumscale.org" wrote: Send gpfsug-discuss mailing list submissions to gpfsug-discuss at spectrumscale.org To subscribe or unsubscribe via the World Wide Web, visit http://gpfsug.org/mailman/listinfo/gpfsug-discuss or, via email, send a message with subject or body 'help' to gpfsug-discuss-request at spectrumscale.org You can reach the person managing the list at gpfsug-discuss-owner at spectrumscale.org When replying, please edit your Subject line so it is more specific than "Re: Contents of gpfsug-discuss digest..." Today's Topics: 1. Re: Best way to migrate data (Ryan Novosielski) 2. Re: Best way to migrate data (Sven Oehme) 3. Re: Best way to migrate data : mmfind ... 
mmxcp (Marc A Kaplan) ---------------------------------------------------------------------- Message: 1 Date: Mon, 22 Oct 2018 15:21:06 +0000 From: Ryan Novosielski To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Message-ID: <3023B88F-D115-4C0B-90DC-6EF711D858E6 at rutgers.edu> Content-Type: text/plain; charset="utf-8" It seems like the primary way that this helps us is that we transfer user home directories and many of them have VERY large numbers of small files (in the millions), so running multiple simultaneous rsyncs allows the transfer to continue past that one slow area. I guess it balances the bandwidth constraint and the I/O constraints on generating a file list. There are unfortunately one or two known bugs that slow it down ? it keeps track of its rsync PIDs but sometimes a former rsync PID is reused by the system which it counts against the number of running rsyncs. It can also think rsync is still running at the end when it?s really something else now using the PID. I know the author is looking at that. For shorter transfers, you likely won?t run into this. I?m not sure I have the time or the programming ability to make this happen, but it seems to me that one could make some major gains by replacing fpart with mmfind in a GPFS environment. Generating lists of files takes a significant amount of time and mmfind can probably do it faster than anything else that does not have direct access to GPFS metadata. > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > Thank you Ryan. I?ll have a more in-depth look at this application later today and see how it deals with some of the large genetic files that are generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > Best, > Dwayne > ? > Dwayne Hart | Systems Administrator IV > > CHIA, Faculty of Medicine > Memorial University of Newfoundland > 300 Prince Philip Drive > St. John?s, Newfoundland | A1B 3V6 > Craig L Dobbin Building | 4M409 > T 709 864 6631 > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski wrote: >> >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> We use parsyncfp. Our target is not GPFS, though. I was really hoping >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably >> tell you that HSM is the way to go (we asked something similar for a >> replacement for our current setup or for distributed storage). >> >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: >>> Hi, >>> >>> Just wondering what the best recipe for migrating a user?s home >>> directory content from one GFPS file system to another which hosts >>> a larger research GPFS file system? I?m currently using rsync and >>> it has maxed out the client system?s IB interface. >>> >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV >>> >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L >>> Dobbin Building | 4M409 T 709 864 6631 >>> _______________________________________________ gpfsug-discuss >>> mailing list gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >>> >> >> - -- >> ____ >> || \\UTGERS, |----------------------*O*------------------------ >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu >> || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus >> || \\ of NJ | Office of Advanced Res. Comp. 
- MSB C630, Newark >> `' >> -----BEGIN PGP SIGNATURE----- >> >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk >> =dMDg >> -----END PGP SIGNATURE----- >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ------------------------------ Message: 2 Date: Mon, 22 Oct 2018 11:11:06 -0700 From: Sven Oehme To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data Message-ID: Content-Type: text/plain; charset="utf-8" i am not sure if that was mentioned already but in some version of V5.0.X based on my suggestion a tool was added by mark on a AS-IS basis (thanks mark) to do what you want with one exception : /usr/lpp/mmfs/samples/ilm/mmxcp -h Usage: /usr/lpp/mmfs/samples/ilm/mmxcp -t target -p strip_count source_pathname1 source_pathname2 ... Run "cp" in a mmfind ... -xarg ... pipeline, e.g. mmfind -polFlags '-N all -g /gpfs/tmp' /gpfs/source -gpfsWeight DIRECTORY_HASH -xargs mmxcp -t /target -p 2 Options: -t target_path : Copy files to this path. -p strip_count : Remove this many directory names from the pathnames of the source files. -a : pass -a to cp -v : pass -v to cp this is essentially a parallel copy tool using the policy with all its goddies. the one critical part thats missing is that it doesn't copy any GPFS specific metadata which unfortunate includes NFSV4 ACL's. the reason for that is that GPFS doesn't expose the NFSV4 ACl's via xattrs nor does any of the regular Linux tools uses the proprietary interface into GPFS to extract and apply them (this is what allows this magic unsupported version of rsync https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync to transfer the acls and other attributes). so a worth while RFE would be to either expose all special GPFS bits as xattrs or provide at least a maintained version of sync, cp or whatever which allows the transfer of this data. Sven On Mon, Oct 22, 2018 at 10:52 AM Ryan Novosielski wrote: > It seems like the primary way that this helps us is that we transfer user > home directories and many of them have VERY large numbers of small files > (in the millions), so running multiple simultaneous rsyncs allows the > transfer to continue past that one slow area. I guess it balances the > bandwidth constraint and the I/O constraints on generating a file list. > There are unfortunately one or two known bugs that slow it down ? it keeps > track of its rsync PIDs but sometimes a former rsync PID is reused by the > system which it counts against the number of running rsyncs. It can also > think rsync is still running at the end when it?s really something else now > using the PID. I know the author is looking at that. For shorter transfers, > you likely won?t run into this. > > I?m not sure I have the time or the programming ability to make this > happen, but it seems to me that one could make some major gains by > replacing fpart with mmfind in a GPFS environment. Generating lists of > files takes a significant amount of time and mmfind can probably do it > faster than anything else that does not have direct access to GPFS metadata. > > > On Oct 19, 2018, at 6:37 AM, Dwayne.Hart at med.mun.ca wrote: > > > > Thank you Ryan. 
I?ll have a more in-depth look at this application later > today and see how it deals with some of the large genetic files that are > generated by the sequencer. By copying it from GPFS fs to another GPFS fs. > > > > Best, > > Dwayne > > ? > > Dwayne Hart | Systems Administrator IV > > > > CHIA, Faculty of Medicine > > Memorial University of Newfoundland > > 300 Prince Philip Drive > > St. John?s, Newfoundland | A1B 3V6 > > Craig L Dobbin Building | 4M409 > > T 709 864 6631 <(709)%20864-6631> > > > >> On Oct 19, 2018, at 7:04 AM, Ryan Novosielski > wrote: > >> > >> -----BEGIN PGP SIGNED MESSAGE----- > >> Hash: SHA1 > >> > >> We use parsyncfp. Our target is not GPFS, though. I was really hoping > >> to hear about something snazzier for GPFS-GPFS. Lenovo would probably > >> tell you that HSM is the way to go (we asked something similar for a > >> replacement for our current setup or for distributed storage). > >> > >>> On 10/18/2018 01:19 PM, Dwayne.Hart at med.mun.ca wrote: > >>> Hi, > >>> > >>> Just wondering what the best recipe for migrating a user?s home > >>> directory content from one GFPS file system to another which hosts > >>> a larger research GPFS file system? I?m currently using rsync and > >>> it has maxed out the client system?s IB interface. > >>> > >>> Best, Dwayne ? Dwayne Hart | Systems Administrator IV > >>> > >>> CHIA, Faculty of Medicine Memorial University of Newfoundland 300 > >>> Prince Philip Drive St. John?s, Newfoundland | A1B 3V6 Craig L > >>> Dobbin Building | 4M409 T 709 864 6631 <(709)%20864-6631> > >>> _______________________________________________ gpfsug-discuss > >>> mailing list gpfsug-discuss at spectrumscale.org > >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >>> > >> > >> - -- > >> ____ > >> || \\UTGERS, |----------------------*O*------------------------ > >> ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > >> || \\ University | Sr. Technologist - 973/972.0922 <(973)%20972-0922> > ~*~ RBHS Campus > >> || \\ of NJ | Office of Advanced Res. Comp. - MSB C630, Newark > >> `' > >> -----BEGIN PGP SIGNATURE----- > >> > >> iEYEARECAAYFAlvI51AACgkQmb+gadEcsb62SQCfWBAru3KkJd+UftG2BXaRzjTG > >> p/wAn0mpC5XCZc50fZfMPRRXR40HsmEk > >> =dMDg > >> -----END PGP SIGNATURE----- > >> _______________________________________________ > >> gpfsug-discuss mailing list > >> gpfsug-discuss at spectrumscale.org > >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > > gpfsug-discuss mailing list > > gpfsug-discuss at spectrumscale.org > > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ Message: 3 Date: Mon, 22 Oct 2018 16:08:49 -0400 From: "Marc A Kaplan" To: gpfsug main discussion list Subject: Re: [gpfsug-discuss] Best way to migrate data : mmfind ... mmxcp Message-ID: Content-Type: text/plain; charset="us-ascii" Rather than hack rsync or cp ... I proposed a smallish utility that would copy those extended attributes and ACLs that cp -a just skips over. This can be done using the documented GPFS APIs that were designed for backup and restore of files. SMOP and then add it as an option to samples/ilm/mmxcp Sorry I haven't gotten around to doing this ... Seems like a modest sized project... 
Avoids boiling the ocean and reinventing or hacking rsync. -- marc K -------------- next part -------------- An HTML attachment was scrubbed... URL: ------------------------------ _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss End of gpfsug-discuss Digest, Vol 81, Issue 44 ********************************************** From Alexander.Saupp at de.ibm.com Tue Oct 23 06:51:54 2018 From: Alexander.Saupp at de.ibm.com (Alexander Saupp) Date: Tue, 23 Oct 2018 07:51:54 +0200 Subject: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync Message-ID: Hi, I agree, a tool with proper wrapping delivered in samples would be the right approach. No warranty, no support - below a prototype I documented 2 years ago (prior to mmfind availability). The BP used an alternate approach, so its not tested at scale, but the principle was tested and works. Reading through it right now I'd re-test the 'deleted files on destination that were deleted on the source' scenario, that might now require some fixing. # Use 'GPFS patched' rsync on both ends to keep GPFS attributes https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync # Policy - initial & differential (add mod_time > .. for incremental runs. Use MOD_TIME < .. to have a defined start for the next incremental rsync, remove it for the 'final' rsync) # http://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_usngfileattrbts.htm cat /tmp/policy.pol RULE 'mmfind' ??? LIST 'mmfindList' ??? DIRECTORIES_PLUS ??? SHOW( ????????? VARCHAR(MODE) || ' ' || ????????? VARCHAR(NLINK) || ' ' || ????????? VARCHAR(USER_ID) || ' ' || ????????? VARCHAR(GROUP_ID) || ' ' || ????????? VARCHAR(FILE_SIZE) || ' ' || ????????? VARCHAR(KB_ALLOCATED) || ' ' || ????????? VARCHAR(POOL_NAME) || ' ' || ????????? VARCHAR(MISC_ATTRIBUTES) || ' ' || ????????? VARCHAR(ACCESS_TIME) || ' ' || ????????? VARCHAR(CREATION_TIME) || ' ' || ????????? VARCHAR(MODIFICATION_TIME) ??????? ) # First run ??? WHERE MODIFICATION_TIME < TIMESTAMP('2016-08-10 00:00:00') # Incremental runs ??? WHERE MODIFICATION_TIME > TIMESTAMP('2016-08-10 00:00:00') and MODIFICATION_TIME < TIMESTAMP('2016-08-20 00:00:00') # Final run during maintenance, should also do deletes, ensure you to call rsync the proper way (--delete) ??? WHERE TRUE # Apply policy, defer will ensure the result file(s) are not deleted mmapplypolicy? group3fs -P /tmp/policy.pol? -f /ibm/group3fs/pol.txt -I defer # FYI only - look at results, ... not required # cat /ibm/group3fs/pol.txt.list.mmfindList 3 1 0? drwxr-xr-x 4 0 0 262144 512 system D2u 2016-08-25 08:30:35.053057 -- /ibm/group3fs 41472 1077291531 0? drwxr-xr-x 5 0 0 4096 0 system D2u 2016-08-18 21:07:36.996777 -- /ibm/group3fs/ces 60416 842873924 0? drwxr-xr-x 4 0 0 4096 0 system D2u 2016-08-18 21:07:45.947920 -- /ibm/group3fs/ces/ha 60417 2062486126 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-19 15:17:57.428922 -- /ibm/group3fs/ces/ha/.dummy 60418 436745294 0? drwxr-xr-x 4 0 0 4096 0 system D2u 2016-08-18 21:05:54.482094 -- /ibm/group3fs/ces/ces 60419 647668346 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-19 15:17:57.484923 -- /ibm/group3fs/ces/ces/.dummy 60420 1474765985 0? -rw-r--r-- 1 0 0 0 0 system FAu 2016-08-18 21:06:43.133640 -- /ibm/group3fs/ces/ces/addrs/1471554403-node0-9.155.118.69 60421 1020724013 0? 
drwxr-xr-x 2 0 0 4096 0 system D2um 2016-08-18 21:07:37.000695 -- /ibm/group3fs/ces/ganesha cat /ibm/group3fs/pol.txt.list.mmfindList? |awk ' { print $19}' /ibm/group3fs/ces/ha/.dummy /ibm/group3fs/ces/ces/.dummy /ibm/group3fs/ces/ha/nfs/ganesha/v4recov/node3 /ibm/group3fs/ces/ha/nfs/ganesha/v4old/node3 /ibm/group3fs/pol.txt.list.mmfindList /ibm/group3fs/ces/ces/connections /ibm/group3fs/ces/ha/nfs/ganesha/gpfs-epoch /ibm/group3fs/ces/ha/nfs/ganesha/v4recov /ibm/group3fs/ces/ha/nfs/ganesha/v4old # Start rsync - could split up single result file into multiple ones for parallel / multi node runs rsync -av --gpfs-attrs --progress --files-from $ ( cat /ibm/group3fs/pol.txt.list.mmfindList ) 10.10.10.10:/path Be sure you verify that extended attributes are properly replicated. I have in mind that you need to ensure the 'remote' rsync is not the default one, but the one with GPFS capabilities (rsync -e "remoteshell"). Kind regards, Alex Saupp Mit freundlichen Gr??en / Kind regards Alexander Saupp IBM Systems, Storage Platform, EMEA Storage Competence Center Phone: +49 7034-643-1512 IBM Deutschland GmbH Mobile: +49-172 7251072 Am Weiher 24 Email: alexander.saupp at de.ibm.com 65451 Kelsterbach Germany IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter Gesch?ftsf?hrung: Matthias Hartmann (Vorsitzender), Norbert Janzen, Stefan Lutz, Nicole Reimer, Dr. Klaus Seifert, Wolfgang Wendt Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ecblank.gif Type: image/gif Size: 45 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1C800025.gif Type: image/gif Size: 1851 bytes Desc: not available URL: From S.J.Thompson at bham.ac.uk Tue Oct 23 09:31:03 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Tue, 23 Oct 2018 08:31:03 +0000 Subject: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync In-Reply-To: References: Message-ID: I should note, there is a PR there which adds symlink support as well to the patched rsync version ? It is quite an old version of rsync now, and I don?t know if it?s been tested with a newer release. Simon From: on behalf of "Alexander.Saupp at de.ibm.com" Reply-To: "gpfsug-discuss at spectrumscale.org" Date: Tuesday, 23 October 2018 at 06:52 To: "gpfsug-discuss at spectrumscale.org" Subject: Re: [gpfsug-discuss] Best way to migrate data : Plan B: policy engine + rsync # Use 'GPFS patched' rsync on both ends to keep GPFS attributes https://github.com/gpfsug/gpfsug-tools/tree/master/bin/rsync -------------- next part -------------- An HTML attachment was scrubbed... URL: From george at markomanolis.com Wed Oct 24 13:43:23 2018 From: george at markomanolis.com (George Markomanolis) Date: Wed, 24 Oct 2018 08:43:23 -0400 Subject: [gpfsug-discuss] IO500 - Call for Submission for SC18 Message-ID: Dear all, Please consider the submission of results to the new list. Deadline: 10 November 2018 AoE The IO500 is now accepting and encouraging submissions for the upcoming IO500 list revealed at Supercomputing 2018 in Dallas, Texas. We also announce the 10 compute node I/O challenge to encourage submission of small-scale results. The new ranked lists will be announced at our SC18 BOF on Wednesday, November 14th at 5:15pm. We hope to see you, and your results, there. 
The benchmark suite is designed to be easy to run and the community has multiple active support channels to help with any questions. Please submit and we look forward to seeing many of you at SC 2018! Please note that submissions of all size are welcome; the site has customizable sorting so it is possible to submit on a small system and still get a very good per-client score for example. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below. Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017 and published its first list at SC17. The need for such an initiative has long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking. The multi-fold goals of the benchmark suite are as follows: Maximizing simplicity in running the benchmark suite Encouraging complexity in tuning for performance Allowing submitters to highlight their ?hero run? performance numbers Forcing submitters to simultaneously report performance for challenging IO patterns. Specifically, the benchmark suite includes a hero-run of both IOR and mdtest configured, however, possible to maximize performance and establish an upper-bound for performance. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower-bound. Finally, it includes a namespace search as this has been determined to be a highly sought-after feature in HPC storage systems that have historically not been well measured. Submitters are encouraged to share their tuning insights for publication. The goals of the community are also multi-fold: Gather historical data for the sake of analysis and to aid predictions of storage futures Collect tuning information to share valuable performance optimizations across the community Encourage vendors and designers to optimize for workloads beyond ?hero runs? Establish bounded expectations for users, procurers, and administrators 10 Compute Node I/O Challenge At SC, we will announce another IO-500 award for the 10 Compute Node I/O Challenge. This challenge is conducted using the regular IO-500 benchmark, however, with the rule that exactly 10 computes nodes must be used to run the benchmark (one exception is find, which may use 1 node). You may use any shared storage with, e.g., any number of servers. When submitting for the IO-500 list, you can opt-in for ?Participate in the 10 compute node challenge only?, then we won't include the results into the ranked list. Other 10 compute node submission will be included in the full list and in the ranked list. We will announce the result in a separate derived list and in the full list but not on the ranked IO-500 list at io500.org. Birds-of-a-feather Once again, we encourage you to submit [1], to join our community, and to attend our BoF ?The IO-500 and the Virtual Institute of I/O? at SC 2018 [2] where we will announce the third ever IO500 list. The current list includes results from BeeGPFS, DataWarp, IME, Lustre, and Spectrum Scale. We hope that the next list has even more. We look forward to answering any questions or concerns you might have. 
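For anyone who wants a feel for the underlying tools before attempting a submission, IOR and mdtest can be exercised standalone along these lines. This is illustrative only: the official io500 harness supplies its own scripts and fixed parameters for the prescribed phases, and every value below (process count, sizes, paths) is a placeholder:

# Illustrative standalone runs only, not the benchmark itself.
mpirun -np 64 ior -a POSIX -w -r -F -t 1m -b 8g -o /gpfs/fs1/io500_scratch/ior_file
mpirun -np 64 mdtest -F -u -n 10000 -d /gpfs/fs1/io500_scratch/mdt_dir
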
[1] http://io500.org/submission [2] https://sc18.supercomputing.org/presentation/?id=bof134&sess=sess390 -------------- next part -------------- An HTML attachment was scrubbed... URL: From S.J.Thompson at bham.ac.uk Wed Oct 24 21:53:21 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Wed, 24 Oct 2018 20:53:21 +0000 Subject: [gpfsug-discuss] Spectrum Scale User Group@CIUK - call for user speakers Message-ID: Hi All, I know December is a little way off, but as usual we'll be holding a Spectrum Scale user group breakout session as part of CIUK here in the UK in December. As a breakout session its only a couple of hours... We're just looking at the agenda, I have a couple of IBM sessions in and Sven has agreed to give a talk as he'll be there as well. I'm looking for a couple of user talks to finish of the agenda. Whether you are a small deployment or large, we're interested in hearing from you! Note: you must be registered to attend CIUK to attend this user group. Registration is via the CIUK website: https://www.scd.stfc.ac.uk/Pages/CIUK2018.aspxhttps://www.scd.stfc.ac.uk/Pages/CIUK2018.aspx Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.dietrich at desy.de Thu Oct 25 13:12:07 2018 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Thu, 25 Oct 2018 14:12:07 +0200 (CEST) Subject: [gpfsug-discuss] Nested NFSv4 Exports Message-ID: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> Hi, I am currently fiddling around with some nested NFSv4 exports and the differing behaviour to NFSv3. The environment is a GPFS 5.0.1 with enabled CES, so Ganesha is used as the NFS server. Given the following (pseudo) directory structure: /gpfs/filesystem1/directory1 /gpfs/filesystem1/directory1/sub-directory1 /gpfs/filesystem1/directory1/sub-directory2 Now to the exports: /gpfs/filesystem1/directory1 is exported to client1 as read-only. /gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as read-write. client2 is not included in the export for /gpfs/filesystem1/directory1. Mounting /gpfs/filesystem1/directory1 on client1 works as expected. Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work and results in a permission denied. If I change the protocol from NFSv4 to NFSv3, it works. There is a section about nested NFS exports in the mmnfs doc: Creating nested exports (such as /path/to/folder and /path/to/folder/subfolder) is strongly discouraged since this might lead to serious issues in data consistency. Be very cautious when creating and using nested exports. If there is a need to have nested exports (such as /path/to/folder and /path/to/folder/inside/subfolder), NFSv4 client that mounts the parent (/path/to/folder) export will not be able to see the child export subtree (/path/to/folder/inside/subfolder) unless the same client is explicitly allowed to access the child export as well. This is okay as long as the client uses only NFSv4 mounts. The Linux kernel NFS server and other NFSv4 servers do not show this behaviour. Is there a way to bypass this with CES/Ganesha? Or is the only solution to add client2 to /gpfs/filesystem1/directory1? Regards, Stefan -- ------------------------------------------------------------------------ Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems) Ein Forschungszentrum der Helmholtz-Gemeinschaft Notkestr. 
85 phone: +49-40-8998-4696 22607 Hamburg e-mail: stefan.dietrich at desy.de Germany ------------------------------------------------------------------------ From dyoung at pixitmedia.com Thu Oct 25 17:59:08 2018 From: dyoung at pixitmedia.com (Dan Young) Date: Thu, 25 Oct 2018 12:59:08 -0400 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_C?= =?utf-8?q?enter?= In-Reply-To: <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want > to attend, use the link below. > > *The current agenda is:* > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > > 10:20 AM > 10:40 AM > Break > > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > > 12:00 PM > 12:50 PM > Lunch > > 12:50 PM > 1:30 PM > Special Talk Joe Dain > > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > > 2:30 PM > 2:50 PM > Customer Talk ? NYU Michael Costantino > > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > > 3:10 PM > 3:30 PM > Break > > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe > Knop > > 4:10 PM > 4:40 PM > Service Update Jim Doherty > > 4:40 PM > 5:10 PM > Open Forum > > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert < > Robert.Oesterlin at nuance.com> wrote: > > For those of you in the NE US or NYC area, here is the agenda for the NYC > meeting coming up on October 24th. Special thanks to Richard Rupp at IBM > for helping to organize this event. If you can make it, please register at > the Eventbrite link below. > > Spectrum Scale User Group ? NYC > October 24th, 2018 > The New York Genome Center > 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium > > Register Here: > https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 > > 08:45-09:00 Coffee & Registration > 09:00-09:15 Welcome > 09:15-09:45 What is new in IBM Spectrum Scale? > 09:45-10:00 What is new in ESS? > 10:00-10:20 How does CORAL help other workloads? > 10:20-10:40 --- Break --- > 10:40-11:00 Customer Talk ? The New York Genome Center > 11:00-11:20 Spinning up a Hadoop cluster on demand > 11:20-11:40 Customer Talk ? Mt. 
Sinai School of Medicine > 11:40-12:10 Spectrum Scale Network Flow > 12:10-13:00 --- Lunch --- > 13:00-13:40 Special Announcement and Demonstration > 13:40-14:00 Multi-cloud Transparent Cloud Tiering > 14:00-14:20 Customer Talk ? Princeton University > 14:20-14:40 AI Reference Architecture > 14:40-15:00 Updates on Container Support > 15:00-15:20 Customer Talk ? TBD > 15:20-15:40 --- Break --- > 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting > 16:10-16:40 Service Update > 16:40-17:10 Open Forum > 17:10-17:30 Wrap-Up > 17:30- Social Event > > > Bob Oesterlin > Sr Principal Storage Engineer, Nuance > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -- *Dan Young* Solutions Architect, Pixit Media +1-347-249-7413 | dyoung at pixitmedia.com www.pixitmedia.com | Tw:@pixitmedia -- This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kkr at lbl.gov Thu Oct 25 18:01:39 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 25 Oct 2018 10:01:39 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Checking? -Kristy > On Oct 25, 2018, at 9:59 AM, Dan Young wrote: > > Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. > > On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose > wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. > > The current agenda is: > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > 10:20 AM > 10:40 AM > Break > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > 12:00 PM > 12:50 PM > Lunch > 12:50 PM > 1:30 PM > Special Talk Joe Dain > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > 2:30 PM > 2:50 PM > Customer Talk ? 
NYU Michael Costantino > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > 3:10 PM > 3:30 PM > Break > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop > 4:10 PM > 4:40 PM > Service Update Jim Doherty > 4:40 PM > 5:10 PM > Open Forum > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > >> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert > wrote: >> >> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >> >> Spectrum Scale User Group ? NYC >> October 24th, 2018 >> The New York Genome Center >> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >> >> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >> >> 08:45-09:00 Coffee & Registration >> 09:00-09:15 Welcome >> 09:15-09:45 What is new in IBM Spectrum Scale? >> 09:45-10:00 What is new in ESS? >> 10:00-10:20 How does CORAL help other workloads? >> 10:20-10:40 --- Break --- >> 10:40-11:00 Customer Talk ? The New York Genome Center >> 11:00-11:20 Spinning up a Hadoop cluster on demand >> 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine >> 11:40-12:10 Spectrum Scale Network Flow >> 12:10-13:00 --- Lunch --- >> 13:00-13:40 Special Announcement and Demonstration >> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >> 14:00-14:20 Customer Talk ? Princeton University >> 14:20-14:40 AI Reference Architecture >> 14:40-15:00 Updates on Container Support >> 15:00-15:20 Customer Talk ? TBD >> 15:20-15:40 --- Break --- >> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >> 16:10-16:40 Service Update >> 16:40-17:10 Open Forum >> 17:10-17:30 Wrap-Up >> 17:30- Social Event >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Dan Young > Solutions Architect, Pixit Media > +1-347-249-7413 | dyoung at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia > > This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From novosirj at rutgers.edu Fri Oct 26 01:54:13 2018 From: novosirj at rutgers.edu (Ryan Novosielski) Date: Fri, 26 Oct 2018 00:54:13 +0000 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: What they said was ?spectrumscale.org?. I suspect they?ll wind up here: http://www.spectrumscaleug.org/presentations/ -- ____ || \\UTGERS, |---------------------------*O*--------------------------- ||_// the State | Ryan Novosielski - novosirj at rutgers.edu || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark `' > On Oct 25, 2018, at 12:59 PM, Dan Young wrote: > > Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. > > On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: > There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. > > The current agenda is: > > 8:45 AM > 9:00 AM > Coffee & Registration Presenter > 9:00 AM > 9:15 AM > Welcome Amy Hirst & Chris Black > 9:15 AM > 9:45 AM > What is new in IBM Spectrum Scale? Piyush Chaudhary > 9:45 AM > 10:00 AM > What is new in ESS? John Sing > 10:00 AM > 10:20 AM > How does CORAL help other workloads? Kevin Gildea > 10:20 AM > 10:40 AM > Break > 10:40 AM > 11:00 AM > Customer Talk ? The New York Genome Center Chris Black > 11:00 AM > 11:20 AM > Spinning up a Hadoop cluster on demand Piyush Chaudhary > 11:20 AM > 11:40 AM > Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione > 11:40 AM > 12:00 PM > AI Reference Architecture Piyush Chaudhary > 12:00 PM > 12:50 PM > Lunch > 12:50 PM > 1:30 PM > Special Talk Joe Dain > 1:30 PM > 1:50 PM > Multi-cloud Transparent Cloud Tiering Rob Basham > 1:50 PM > 2:10 PM > Customer Talk ? Princeton University Curtis W. Hillegas > 2:10 PM > 2:30 PM > Updates on Container Support John Lewars > 2:30 PM > 2:50 PM > Customer Talk ? NYU Michael Costantino > 2:50 PM > 3:10 PM > Spectrum Archive and TS1160 Carl Reasoner > 3:10 PM > 3:30 PM > Break > 3:30 PM > 4:10 PM > IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop > 4:10 PM > 4:40 PM > Service Update Jim Doherty > 4:40 PM > 5:10 PM > Open Forum > 5:10 PM > 5:30 PM > Wrap-Up > Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) > > >> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: >> >> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >> >> Spectrum Scale User Group ? NYC >> October 24th, 2018 >> The New York Genome Center >> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >> >> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >> >> 08:45-09:00 Coffee & Registration >> 09:00-09:15 Welcome >> 09:15-09:45 What is new in IBM Spectrum Scale? >> 09:45-10:00 What is new in ESS? >> 10:00-10:20 How does CORAL help other workloads? >> 10:20-10:40 --- Break --- >> 10:40-11:00 Customer Talk ? The New York Genome Center >> 11:00-11:20 Spinning up a Hadoop cluster on demand >> 11:20-11:40 Customer Talk ? 
Mt. Sinai School of Medicine >> 11:40-12:10 Spectrum Scale Network Flow >> 12:10-13:00 --- Lunch --- >> 13:00-13:40 Special Announcement and Demonstration >> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >> 14:00-14:20 Customer Talk ? Princeton University >> 14:20-14:40 AI Reference Architecture >> 14:40-15:00 Updates on Container Support >> 15:00-15:20 Customer Talk ? TBD >> 15:20-15:40 --- Break --- >> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >> 16:10-16:40 Service Update >> 16:40-17:10 Open Forum >> 17:10-17:30 Wrap-Up >> 17:30- Social Event >> >> >> Bob Oesterlin >> Sr Principal Storage Engineer, Nuance >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > -- > > Dan Young > Solutions Architect, Pixit Media > +1-347-249-7413 | dyoung at pixitmedia.com > www.pixitmedia.com | Tw:@pixitmedia > > > This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From kkr at lbl.gov Fri Oct 26 04:36:50 2018 From: kkr at lbl.gov (Kristy Kallback-Rose) Date: Thu, 25 Oct 2018 20:36:50 -0700 Subject: [gpfsug-discuss] =?utf-8?q?Still_Time_to_Register!_--_Spectrum_Sc?= =?utf-8?q?ale_User_Group_Meeting_=E2=80=93_NYC_-_New_York_Genome_Center?= In-Reply-To: References: <7E34B1A5-2412-4415-9095-C52EDDCE2A04@nuance.com> <52C08BB3-6740-4CA0-A3C9-D929C78BA9C0@lbl.gov> Message-ID: Yup. Richard is collecting them and we will upload afterwards. Sent from my iPhone > On Oct 25, 2018, at 5:54 PM, Ryan Novosielski wrote: > > What they said was ?spectrumscale.org?. I suspect they?ll wind up here: http://www.spectrumscaleug.org/presentations/ > > -- > ____ > || \\UTGERS, |---------------------------*O*--------------------------- > ||_// the State | Ryan Novosielski - novosirj at rutgers.edu > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus > || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark > `' > >> On Oct 25, 2018, at 12:59 PM, Dan Young wrote: >> >> Did I miss where these presentations were uploaded? People kept saying throughout the day that these would be uploaded somewhere. >> >> On Mon, 22 Oct 2018 at 19:45, Kristy Kallback-Rose wrote: >> There?s still some room left for NYC event on THIS WEDNESDAY if you want to attend, use the link below. >> >> The current agenda is: >> >> 8:45 AM >> 9:00 AM >> Coffee & Registration Presenter >> 9:00 AM >> 9:15 AM >> Welcome Amy Hirst & Chris Black >> 9:15 AM >> 9:45 AM >> What is new in IBM Spectrum Scale? Piyush Chaudhary >> 9:45 AM >> 10:00 AM >> What is new in ESS? 
John Sing >> 10:00 AM >> 10:20 AM >> How does CORAL help other workloads? Kevin Gildea >> 10:20 AM >> 10:40 AM >> Break >> 10:40 AM >> 11:00 AM >> Customer Talk ? The New York Genome Center Chris Black >> 11:00 AM >> 11:20 AM >> Spinning up a Hadoop cluster on demand Piyush Chaudhary >> 11:20 AM >> 11:40 AM >> Customer Talk ? Mt. Sinai School of Medicine Francesca Tartaglione >> 11:40 AM >> 12:00 PM >> AI Reference Architecture Piyush Chaudhary >> 12:00 PM >> 12:50 PM >> Lunch >> 12:50 PM >> 1:30 PM >> Special Talk Joe Dain >> 1:30 PM >> 1:50 PM >> Multi-cloud Transparent Cloud Tiering Rob Basham >> 1:50 PM >> 2:10 PM >> Customer Talk ? Princeton University Curtis W. Hillegas >> 2:10 PM >> 2:30 PM >> Updates on Container Support John Lewars >> 2:30 PM >> 2:50 PM >> Customer Talk ? NYU Michael Costantino >> 2:50 PM >> 3:10 PM >> Spectrum Archive and TS1160 Carl Reasoner >> 3:10 PM >> 3:30 PM >> Break >> 3:30 PM >> 4:10 PM >> IBM Spectrum Scale Network Related Troubleshooting John Lewars & Felipe Knop >> 4:10 PM >> 4:40 PM >> Service Update Jim Doherty >> 4:40 PM >> 5:10 PM >> Open Forum >> 5:10 PM >> 5:30 PM >> Wrap-Up >> Social Event - Mezzanine at the Dominick Hotel (246 Spring Street) >> >> >>> On Sep 27, 2018, at 7:22 AM, Oesterlin, Robert wrote: >>> >>> For those of you in the NE US or NYC area, here is the agenda for the NYC meeting coming up on October 24th. Special thanks to Richard Rupp at IBM for helping to organize this event. If you can make it, please register at the Eventbrite link below. >>> >>> Spectrum Scale User Group ? NYC >>> October 24th, 2018 >>> The New York Genome Center >>> 101 Avenue of the Americas, New York, NY 10013 First Floor Auditorium >>> >>> Register Here: https://www.eventbrite.com/e/2018-spectrum-scale-user-group-nyc-tickets-49786782607 >>> >>> 08:45-09:00 Coffee & Registration >>> 09:00-09:15 Welcome >>> 09:15-09:45 What is new in IBM Spectrum Scale? >>> 09:45-10:00 What is new in ESS? >>> 10:00-10:20 How does CORAL help other workloads? >>> 10:20-10:40 --- Break --- >>> 10:40-11:00 Customer Talk ? The New York Genome Center >>> 11:00-11:20 Spinning up a Hadoop cluster on demand >>> 11:20-11:40 Customer Talk ? Mt. Sinai School of Medicine >>> 11:40-12:10 Spectrum Scale Network Flow >>> 12:10-13:00 --- Lunch --- >>> 13:00-13:40 Special Announcement and Demonstration >>> 13:40-14:00 Multi-cloud Transparent Cloud Tiering >>> 14:00-14:20 Customer Talk ? Princeton University >>> 14:20-14:40 AI Reference Architecture >>> 14:40-15:00 Updates on Container Support >>> 15:00-15:20 Customer Talk ? TBD >>> 15:20-15:40 --- Break --- >>> 15:40-16:10 IBM Spectrum Scale Tuning and Troubleshooting >>> 16:10-16:40 Service Update >>> 16:40-17:10 Open Forum >>> 17:10-17:30 Wrap-Up >>> 17:30- Social Event >>> >>> >>> Bob Oesterlin >>> Sr Principal Storage Engineer, Nuance >>> >>> _______________________________________________ >>> gpfsug-discuss mailing list >>> gpfsug-discuss at spectrumscale.org >>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss >> >> >> -- >> >> Dan Young >> Solutions Architect, Pixit Media >> +1-347-249-7413 | dyoung at pixitmedia.com >> www.pixitmedia.com | Tw:@pixitmedia >> >> >> This email is confidential in that it is intended for the exclusive attention of the addressee(s) indicated. 
If you are not the intended recipient, this email should not be read or disclosed to any other person. Please notify the sender immediately and delete this email from your computer system. Any opinions expressed are not necessarily those of the company from which this email was sent and, whilst to the best of our knowledge no viruses or defects exist, no responsibility can be accepted for any loss or damage arising from its receipt or subsequent use of this email. >> _______________________________________________ >> gpfsug-discuss mailing list >> gpfsug-discuss at spectrumscale.org >> http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From mnaineni at in.ibm.com Fri Oct 26 06:09:45 2018 From: mnaineni at in.ibm.com (Malahal R Naineni) Date: Fri, 26 Oct 2018 05:09:45 +0000 Subject: [gpfsug-discuss] Nested NFSv4 Exports In-Reply-To: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> References: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> Message-ID: An HTML attachment was scrubbed... URL: From stefan.dietrich at desy.de Fri Oct 26 12:18:20 2018 From: stefan.dietrich at desy.de (Dietrich, Stefan) Date: Fri, 26 Oct 2018 13:18:20 +0200 (CEST) Subject: [gpfsug-discuss] Nested NFSv4 Exports In-Reply-To: References: <1497297460.32545018.1540469527448.JavaMail.zimbra@desy.de> Message-ID: <2127020802.32763936.1540552700548.JavaMail.zimbra@desy.de> Hi Malhal, thanks for the input. I did already run Ganesha in debug mode, maybe this snippet I saved from that time might be helpful: 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :Check for address 192.168.142.92 for export id 3 fullpath /gpfs/exfel/d/proc 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] client_match :EXPORT :M_DBG :Match 0x941550, type = HOSTIF_CLIENT, options 0x42302050 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] LogClientListEntry :EXPORT :M_DBG : 0x941550 HOSTIF_CLIENT: 192.168.8.32 (root_squash , R-r-, 34-, ---, TCP, ----, M anage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] client_match :EXPORT :M_DBG :Match 0x940c90, type = HOSTIF_CLIENT, options 0x42302050 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] LogClientListEntry :EXPORT :M_DBG : 0x940c90 HOSTIF_CLIENT: 192.168.8.33 (root_squash , R-r-, 34-, ---, TCP, ----, M anage_Gids , -- Deleg, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :EXPORT ( , , , , , -- Dele g, , ) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :EXPORT_DEFAULTS (root_squash , ----, 34-, ---, TCP, ----, No Manage_Gids, , anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :default options (root_squash , ----, 34-, UDP, TCP, ----, No Manage_Gids, -- Dele g, anon_uid= -2, anon_gid= -2, none, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] export_check_access :EXPORT :M_DBG :Final options (root_squash , ----, 
34-, ---, TCP, ----, No Manage_Gids, -- Dele g, anon_uid= -2, anon_gid= -2, sys) 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] nfs4_export_check_access :NFS4 :INFO :NFS4: INFO: Access not allowed on Export_Id 3 /gpfs/exfel/d/proc for client ::fff f:192.168.142.92 2018-10-16 13:21:11 : epoch 00080017 : ces-001.desy.de : ganesha.nfsd-119406[work-162] nfs4_op_lookup :EXPORT :DEBUG :NFS4ERR_ACCESS Hiding Export_Id 3 Path /gpfs/exfel/d/proc with NFS4ERR_NOENT 192.168.142.92 would be the client2 from my pseudo example, /gpfs/exfel/d/proc resembles /gpfs/filesystem1/directory1 Ganesha never checks anything for /gpfs/filesystem1/directory1/sub-directory1...or rather a subdir of /gpfs/exfel/d/proc Is this what you meant by looking at the real export object? If you think this is a bug, I would open a case in order to get this analyzed. mmnfs does not show me any pseudo options, I think this has been included in 5.0.2. Regards, Stefan ----- Original Message ----- > From: "Malahal R Naineni" > To: gpfsug-discuss at spectrumscale.org > Cc: gpfsug-discuss at spectrumscale.org > Sent: Friday, October 26, 2018 7:09:45 AM > Subject: Re: [gpfsug-discuss] Nested NFSv4 Exports >>> /gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as >>> read-write. >>> client2 is not included in the export for /gpfs/filesystem1/directory1. >>> Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work >>> and results in a permission denied > Any NFSv4 implementation needs to traverse the pseudo path for being able to > mount an export. One would expect "client2" to traverse over > /gpfs/filesystem1/directory1/ but not list its content/other files. I strongly > think this is a bug in Ganesha implementation, it is probably looking at the > real-export object than the pseudo-object for permission checking. > One option is to change the Pseudo file system layout. For example, > "/gpfs/client2" as "Pseudo" option for export with path " > /gpfs/filesystem1/directory1/sub-directory1". This is directly not possible > with Spectrum CLI command mmnfs unless you are using the latest and greatest > ("mmnfs export add" usage would show if it supports Pseudo option). Of course, > you can manually do it (using CCR) as Ganesha itself allows it. > Yes, NFSv3 has no pseudo traversal, it should work. > Regards, Malahal. > > > ----- Original message ----- > From: "Dietrich, Stefan" > Sent by: gpfsug-discuss-bounces at spectrumscale.org > To: gpfsug-discuss at spectrumscale.org > Cc: > Subject: [gpfsug-discuss] Nested NFSv4 Exports > Date: Thu, Oct 25, 2018 5:52 PM > Hi, > > I am currently fiddling around with some nested NFSv4 exports and the differing > behaviour to NFSv3. > The environment is a GPFS 5.0.1 with enabled CES, so Ganesha is used as the NFS > server. > > Given the following (pseudo) directory structure: > > /gpfs/filesystem1/directory1 > /gpfs/filesystem1/directory1/sub-directory1 > /gpfs/filesystem1/directory1/sub-directory2 > > Now to the exports: > /gpfs/filesystem1/directory1 is exported to client1 as read-only. > /gpfs/filesystem1/directory1/sub-directory1 is exported to client2 as > read-write. > > client2 is not included in the export for /gpfs/filesystem1/directory1. > > Mounting /gpfs/filesystem1/directory1 on client1 works as expected. > Mounting /gpfs/filesystem1/directory1/sub-directory1 on client2 does not work > and results in a permission denied. > If I change the protocol from NFSv4 to NFSv3, it works. 
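(As an aside: a hand-edited Ganesha export block along the lines Malahal suggests above could look roughly like the sketch below. This is untested; the Export_Id value and the pseudo path are made up for illustration, and on CES the export file is normally generated and owned by mmnfs/CCR, so treat it purely as a sketch of the idea rather than a supported procedure.

    EXPORT {
        Export_Id = 100;                  # any unused export id (illustrative)
        Path = "/gpfs/filesystem1/directory1/sub-directory1";
        Pseudo = "/client2-rw";           # detached from the parent export's pseudo tree (made-up name)
        Access_Type = None;               # default deny; the CLIENT block below grants access
        Squash = Root_Squash;
        SecType = sys;
        FSAL { Name = GPFS; }
        CLIENT {
            Clients = 192.168.142.92;     # client2 from the trace above
            Access_Type = RW;
        }
    }

With the child export published under its own pseudo root, client2 would mount the pseudo path "/client2-rw" directly and never has to traverse /gpfs/filesystem1/directory1.)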
> > There is a section about nested NFS exports in the mmnfs doc: > Creating nested exports (such as /path/to/folder and /path/to/folder/subfolder) > is strongly discouraged since this might lead to serious issues in data > consistency. Be very cautious when creating and using nested exports. > If there is a need to have nested exports (such as /path/to/folder and > /path/to/folder/inside/subfolder), NFSv4 client that mounts the parent > (/path/to/folder) export will not be able to see the child export subtree > (/path/to/folder/inside/subfolder) unless the same client is explicitly allowed > to access the child export as well. This is okay as long as the client uses > only NFSv4 mounts. > > The Linux kernel NFS server and other NFSv4 servers do not show this behaviour. > Is there a way to bypass this with CES/Ganesha? Or is the only solution to add > client2 to /gpfs/filesystem1/directory1? > > Regards, > Stefan > > -- > ------------------------------------------------------------------------ > Stefan Dietrich Deutsches Elektronen-Synchrotron (IT-Systems) > Ein Forschungszentrum der Helmholtz-Gemeinschaft > Notkestr. 85 > phone: +49-40-8998-4696 22607 Hamburg > e-mail: stefan.dietrich at desy.de Germany > ------------------------------------------------------------------------ > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > [ http://gpfsug.org/mailman/listinfo/gpfsug-discuss | > http://gpfsug.org/mailman/listinfo/gpfsug-discuss ] > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Fri Oct 26 15:24:38 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:24:38 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks Message-ID: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Hello, does anyone know whether there is a chance to use e.g., 10G ethernet together with IniniBand network for multihoming of GPFS nodes? I mean to setup two different type of networks to mitigate network failures. I read that you can have several networks configured in GPFS but it does not provide failover. Nothing changed in this as of GPFS version 5.x? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From S.J.Thompson at bham.ac.uk Fri Oct 26 15:48:48 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 26 Oct 2018 14:48:48 +0000 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: If IB is enabled and is setup with verbs, then this is the preferred network. GPFS will always fail-back to Ethernet afterwards, however what you can't do is have multiple "subnets" defined and have GPFS fail between different Ethernet networks. Simon ?On 26/10/2018, 15:37, "gpfsug-discuss-bounces at spectrumscale.org on behalf of xhejtman at ics.muni.cz" wrote: Hello, does anyone know whether there is a chance to use e.g., 10G ethernet together with IniniBand network for multihoming of GPFS nodes? I mean to setup two different type of networks to mitigate network failures. I read that you can have several networks configured in GPFS but it does not provide failover. Nothing changed in this as of GPFS version 5.x? -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From xhejtman at ics.muni.cz Fri Oct 26 15:52:43 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:52:43 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however what you > can't do is have multiple "subnets" defined and have GPFS fail between > different Ethernet networks. Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back happen only during mmstartup? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From jonathan.buzzard at strath.ac.uk Fri Oct 26 15:52:43 2018 From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard) Date: Fri, 26 Oct 2018 15:52:43 +0100 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> On 26/10/2018 15:48, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however > what you can't do is have multiple "subnets" defined and have GPFS > fail between different Ethernet networks. > If you want mitigate network failures then you need to mitigate it at layer 2. However it won't be cheap. JAB. -- Jonathan A. Buzzard Tel: +44141-5483420 HPC System Administrator, ARCHIE-WeSt. University of Strathclyde, John Anderson Building, Glasgow. G4 0NG From xhejtman at ics.muni.cz Fri Oct 26 15:56:45 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:56:45 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <1cec0eaf-b6d9-77ee-889f-79d57d105615@strath.ac.uk> Message-ID: <20181026145645.qzn24jp26anxayub@ics.muni.cz> On Fri, Oct 26, 2018 at 03:52:43PM +0100, Jonathan Buzzard wrote: > On 26/10/2018 15:48, Simon Thompson wrote: > > If IB is enabled and is setup with verbs, then this is the preferred > > network. GPFS will always fail-back to Ethernet afterwards, however > > what you can't do is have multiple "subnets" defined and have GPFS > > fail between different Ethernet networks. > > > > If you want mitigate network failures then you need to mitigate it at layer > 2. However it won't be cheap. well, I believe this should be exactly what more 'subnets' are used for.. -- Luk?? 
Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From xhejtman at ics.muni.cz Fri Oct 26 15:57:53 2018 From: xhejtman at ics.muni.cz (Lukas Hejtmanek) Date: Fri, 26 Oct 2018 16:57:53 +0200 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> Message-ID: <20181026145753.ijokzwbjh3aznxwr@ics.muni.cz> On Fri, Oct 26, 2018 at 04:52:43PM +0200, Lukas Hejtmanek wrote: > On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > > If IB is enabled and is setup with verbs, then this is the preferred > > network. GPFS will always fail-back to Ethernet afterwards, however what you > > can't do is have multiple "subnets" defined and have GPFS fail between > > different Ethernet networks. > > Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back > happen only during mmstartup? moreover, are verbs used also for cluster management? E.g., node keepalive messages. -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title From S.J.Thompson at bham.ac.uk Fri Oct 26 15:59:08 2018 From: S.J.Thompson at bham.ac.uk (Simon Thompson) Date: Fri, 26 Oct 2018 14:59:08 +0000 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> <20181026145243.jflbns4zfoxgmshi@ics.muni.cz> Message-ID: Yes ... if the IB network goes down ... But it's not really fault tolerant, as you need the admin network for token management, so you could lose IB and have data fail to the Ethernet path, but not lose Ethernet. And it doesn't (or didn't) fail back to IB when IB come live again, though that might have changed with 5.0.2. Simon ?On 26/10/2018, 15:52, "gpfsug-discuss-bounces at spectrumscale.org on behalf of xhejtman at ics.muni.cz" wrote: On Fri, Oct 26, 2018 at 02:48:48PM +0000, Simon Thompson wrote: > If IB is enabled and is setup with verbs, then this is the preferred > network. GPFS will always fail-back to Ethernet afterwards, however what you > can't do is have multiple "subnets" defined and have GPFS fail between > different Ethernet networks. Does it fail-back to Etherenet even in runtime? I mean, doesn't fail-back happen only during mmstartup? -- Luk?? Hejtm?nek Linux Administrator only because Full Time Multitasking Ninja is not an official job title _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss From eric.wonderley at vt.edu Fri Oct 26 15:44:13 2018 From: eric.wonderley at vt.edu (J. Eric Wonderley) Date: Fri, 26 Oct 2018 10:44:13 -0400 Subject: [gpfsug-discuss] Multihomed nodes and failover networks In-Reply-To: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> References: <20181026142438.vou5hhbd27ehtg2o@ics.muni.cz> Message-ID: Multihoming is accomplished by using subnets...see mmchconfig. Failover networks on the other hand are not allowed. Bad network behavior is dealt with by expelling nodes. You must have decent/supported network gear...we have learned that lesson the hard way On Fri, Oct 26, 2018 at 10:37 AM Lukas Hejtmanek wrote: > Hello, > > does anyone know whether there is a chance to use e.g., 10G ethernet > together > with IniniBand network for multihoming of GPFS nodes? 
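(For anyone who wants to try the subnets approach, a rough, untested sketch follows; the subnet address and the verbs device name are made up, and both settings normally only take effect after the daemon is restarted on the affected nodes:

    # register the second (e.g. IPoIB) network as an additional/preferred daemon path
    mmchconfig subnets="192.168.50.0"
    # optionally carry data over the IB fabric with RDMA as well
    mmchconfig verbsRdma=enable,verbsPorts="mlx5_0/1"

As noted above, this gives GPFS an extra preferred network to use, not failover between two ethernet fabrics.)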
> > I mean to setup two different type of networks to mitigate network > failures. > I read that you can have several networks configured in GPFS but it does > not > provide failover. Nothing changed in this as of GPFS version 5.x? > > -- > Luk?? Hejtm?nek > > Linux Administrator only because > Full Time Multitasking Ninja > is not an official job title > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vtarasov at us.ibm.com Fri Oct 26 23:58:16 2018 From: vtarasov at us.ibm.com (Vasily Tarasov) Date: Fri, 26 Oct 2018 22:58:16 +0000 Subject: [gpfsug-discuss] If you're attending KubeCon'18 Message-ID: An HTML attachment was scrubbed... URL: From Robert.Oesterlin at nuance.com Mon Oct 29 00:29:51 2018 From: Robert.Oesterlin at nuance.com (Oesterlin, Robert) Date: Mon, 29 Oct 2018 00:29:51 +0000 Subject: [gpfsug-discuss] Presentations from SSUG Meeting, Oct 24th - NY Genome Center Message-ID: <2CF4E6B3-B39E-4567-91A5-58C39A720362@nuance.com> These are now on the web site under ?Presentations? - single zip file has them all. Bob Oesterlin Sr Principal Storage Engineer, Nuance -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron.s.knister at nasa.gov Mon Oct 29 16:33:35 2018 From: aaron.s.knister at nasa.gov (Aaron Knister) Date: Mon, 29 Oct 2018 12:33:35 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Message-ID: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 From kums at us.ibm.com Mon Oct 29 19:56:09 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 29 Oct 2018 14:56:09 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Message-ID: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. 
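(For reference, the flag itself is an ordinary cluster configuration option; a minimal, untested sketch of switching it on for a set of traditional NSD clients, with a made-up node class name, would be:

    mmchconfig nsdCksumTraditional=yes -N tradClientNodes
    mmlsconfig nsdCksumTraditional

and that non-GNR client/server path is exactly the case the performance warning in the documentation is aimed at.)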
In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Mon Oct 29 20:47:24 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 29 Oct 2018 16:47:24 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> Message-ID: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen > On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram > wrote: > > Hi, > > >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? > >>Why is there such a penalty for "traditional" environments? > > In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). 
This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. > > In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). > > My two cents. > > Regards, > -Kums > > > > > > From: Aaron Knister > > To: gpfsug main discussion list > > Date: 10/29/2018 12:34 PM > Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > Flipping through the slides from the recent SSUG meeting I noticed that > in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. > Reading up on it it seems as though it comes with a warning about > significant I/O performance degradation and increase in CPU usage. I > also recall that data integrity checking is performed by default with > GNR. How can it be that the I/O performance degradation warning only > seems to accompany the nsdCksumTraditional setting and not GNR? As > someone who knows exactly 0 of the implementation details, I'm just > naively assuming that the checksum are being generated (in the same > way?) in both cases and transferred to the NSD server. Why is there such > a penalty for "traditional" environments? > > -Aaron > > -- > Aaron Knister > NASA Center for Climate Simulation (Code 606.2) > Goddard Space Flight Center > (301) 286-2776 > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > > > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From knop at us.ibm.com Mon Oct 29 21:27:41 2018 From: knop at us.ibm.com (Felipe Knop) Date: Mon, 29 Oct 2018 16:27:41 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: Stephen, ESS does perform checksums in the transfer between NSD clients and NSD servers. As Kums described below, the difference between the checksums performed by GNR and those performed with "nsdCksumTraditional" is that GNR checksums are computed in parallel on the server side, as a large FS block is broken into smaller pieces. On non-GNR environments (when nsdCksumTraditional is set), the checksum is computed sequentially on the server. Felipe ---- Felipe Knop knop at us.ibm.com GPFS Development and Security IBM Systems IBM Building 008 2455 South Rd, Poughkeepsie, NY 12601 (845) 433-9314 T/L 293-9314 From: Stephen Ulmer To: gpfsug main discussion list Date: 10/29/2018 04:52 PM Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? 
It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list < gpfsug-discuss at spectrumscale.org> Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? 
-Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: From kums at us.ibm.com Mon Oct 29 21:29:33 2018 From: kums at us.ibm.com (Kumaran Rajaram) Date: Mon, 29 Oct 2018 16:29:33 -0500 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: In non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only. The ESS storage supports end-to-end checksum, NSD client to the ESS IO servers (at the network level) as well as from ESS IO servers to the disk/storage. This is further detailed in the docs (link below): https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm Best, -Kums From: Stephen Ulmer To: gpfsug main discussion list Date: 10/29/2018 04:52 PM Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. -- Stephen On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote: Hi, >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? >>Why is there such a penalty for "traditional" environments? In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). 
This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. In non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by storage RAID controllers) and the checksum is computed serially. This would contribute to increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load etc). My two cents. Regards, -Kums From: Aaron Knister To: gpfsug main discussion list Date: 10/29/2018 12:34 PM Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) Sent by: gpfsug-discuss-bounces at spectrumscale.org Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it it seems as though it comes with a warning about significant I/O performance degradation and increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksum are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments? -Aaron -- Aaron Knister NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center (301) 286-2776 _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss _______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss -------------- next part -------------- An HTML attachment was scrubbed... URL: From ulmer at ulmer.org Tue Oct 30 00:39:35 2018 From: ulmer at ulmer.org (Stephen Ulmer) Date: Mon, 29 Oct 2018 20:39:35 -0400 Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) In-Reply-To: References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> Message-ID: The point of the original question was to discover why there is a warning about performance for nsdChksumTraditional=yes, but that warning doesn?t seem to apply to an ESS environment. Your reply was that checksums in an ESS environment are calculated in parallel on the NSD server based on the physical storage layout used underneath the NSD, and is thus faster. My point was that if there is never a checksum calculated by the NSD client, then how does the NSD server know that it got uncorrupted data? The link you referenced below (thank you!) indicates that, in fact, the NSD client DOES calculate a checksum and forward it with the data to the NSD server. The server validates the data (necessitating a re-calculation of the checksum), and then GNR stores the data, A CHECKSUM[1], and some block metadata to media. So this leaves us with a checksum calculated by the client and then validated (re-calculated) by the server ? IN BOTH CASES. 
For the GNR case, another checksum in calculated and stored with the data for another purpose, but that means that the nsdChksumTraditional=yes case is exactly like the first phase of the GNR case. So why is that case slower when it does less work? Slow enough to merit a warning, no less! I?m really not trying to be a pest, but I have a logic problem with either the question or the answer ? they aren?t consistent (or I can?t rationalize them to be so). -- Stephen [1] The document is vague (I believe intentionally, because it could have easily been made clear) as to whether this is the same checksum or a different one. Presumably the server-side-new-checksum is calculated in parallel and protects the chunklets or whatever they're called. This is all consistent with what you said! > On Oct 29, 2018, at 5:29 PM, Kumaran Rajaram > wrote: > > In non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only. > > The ESS storage supports end-to-end checksum, NSD client to the ESS IO servers (at the network level) as well as from ESS IO servers to the disk/storage. This is further detailed in the docs (link below): > > https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm > > Best, > -Kums > > > > > > From: Stephen Ulmer > > To: gpfsug main discussion list > > Date: 10/29/2018 04:52 PM > Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional) > Sent by: gpfsug-discuss-bounces at spectrumscale.org > > > > So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.) > > I?m asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD ? but that has to be a performance factor for both types if the transfer is protected starting at the client ? which it is in the case of nsdCksumTraditional which is what we are comparing to ESS checksumming. > > If ESS checksumming doesn?t protect on the wire I?d say that marketing has run amok, because that has *definitely* been implied in meetings for which I?ve been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity. > > -- > Stephen > > > > On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram > wrote: > > Hi, > > >>How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? > >>Why is there such a penalty for "traditional" environments? > > In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for a NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with underlying storage and is aware of the vdisk DRAID configuration (strip-size, pdisk constituting vdisk etc.) to perform parallel checksum operations. 
> In the non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by the storage RAID controllers) and the checksum is computed serially. This would contribute to an increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load, etc.).
>
> My two cents.
>
> Regards,
> -Kums
>
> From: Aaron Knister
> To: gpfsug main discussion list
> Date: 10/29/2018 12:34 PM
> Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>
> Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it, it seems as though it comes with a warning about significant I/O performance degradation and an increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksums are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments?
>
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776

From abeattie at au1.ibm.com Tue Oct 30 00:53:06 2018
From: abeattie at au1.ibm.com (Andrew Beattie)
Date: Tue, 30 Oct 2018 00:53:06 +0000
Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To:
References: , <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov><326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org>
Message-ID:

An HTML attachment was scrubbed...
URL:

From jonathan.buzzard at strath.ac.uk Tue Oct 30 09:03:06 2018
From: jonathan.buzzard at strath.ac.uk (Jonathan Buzzard)
Date: Tue, 30 Oct 2018 09:03:06 +0000
Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To: <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org>
References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org>
Message-ID: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk>

On 29/10/2018 20:47, Stephen Ulmer wrote:
[SNIP]
> If ESS checksumming doesn't protect on the wire I'd say that marketing has run amok, because that has *definitely* been implied in meetings for which I've been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity.
Noting that on a TCP/IP network anything passing over a TCP connection is checksummed at the network layer. Consequently any additional checksumming is basically superfluous.

JAB.

--
Jonathan A. Buzzard
Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From daniel.kidger at uk.ibm.com Tue Oct 30 10:56:09 2018
From: daniel.kidger at uk.ibm.com (Daniel Kidger)
Date: Tue, 30 Oct 2018 10:56:09 +0000
Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To:
Message-ID:

Remember too that in a traditional GPFS setup, the NSD servers are effectively merely data routers (since the clients know exactly where the block is going to be written) and as such NSD servers can be previous-generation hardware. By contrast GNR needs CPU cycles and plenty of memory, so ESS nodes are naturally big and fast (as well as benefitting from parallel threads working together on the GNR).

Daniel

Dr Daniel Kidger
IBM Technical Sales Specialist
Software Defined Solution Sales
+44-(0)7818 522 266
daniel.kidger at uk.ibm.com

> On 30 Oct 2018, at 00:53, Andrew Beattie wrote:
>
> Stephen,
>
> I think you also need to take into consideration that IBM does not control what infrastructure users may choose to deploy Spectrum Scale on outside of ESS hardware.
>
> As such it is entirely possible that older or lower-spec hardware, or even virtualised NSD servers with even lower resources per virtual node, will have potential issues when running with the nsdChksumTraditional=yes flag. As such, IBM has a duty of care to provide a warning that you may experience issues if you turn the additional workload on.
>
> Beyond this I'm not seeing why there is an issue: if you turn the flag on in a non-ESS scenario the process is serialised; if you turn it on in an ESS scenario you get to take advantage of the fact that Scale Native RAID does a significant amount of the work in a parallelised method. One is less resource intensive than the other, because the process is handled differently depending on the type of NSD servers doing the work.
>
> Andrew Beattie
> Software Defined Storage - IT Specialist
> Phone: 614-2133-7927
> E-mail: abeattie at au1.ibm.com
>
> ----- Original message -----
> From: Stephen Ulmer
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list
> Cc:
> Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
> Date: Tue, Oct 30, 2018 10:39 AM
>
> The point of the original question was to discover why there is a warning about performance for nsdChksumTraditional=yes, but that warning doesn't seem to apply to an ESS environment.
>
> Your reply was that checksums in an ESS environment are calculated in parallel on the NSD server based on the physical storage layout used underneath the NSD, and are thus faster. My point was that if there is never a checksum calculated by the NSD client, then how does the NSD server know that it got uncorrupted data?
>
> The link you referenced below (thank you!) indicates that, in fact, the NSD client DOES calculate a checksum and forward it with the data to the NSD server. The server validates the data (necessitating a re-calculation of the checksum), and then GNR stores the data, A CHECKSUM[1], and some block metadata to media.
>
> So this leaves us with a checksum calculated by the client and then validated (re-calculated) by the server -- IN BOTH CASES.
> For the GNR case, another checksum is calculated and stored with the data for another purpose, but that means that the nsdChksumTraditional=yes case is exactly like the first phase of the GNR case. So why is that case slower when it does less work? Slow enough to merit a warning, no less!
>
> I'm really not trying to be a pest, but I have a logic problem with either the question or the answer -- they aren't consistent (or I can't rationalize them to be so).
>
> --
> Stephen
>
> [1] The document is vague (I believe intentionally, because it could have easily been made clear) as to whether this is the same checksum or a different one. Presumably the server-side-new-checksum is calculated in parallel and protects the chunklets or whatever they're called. This is all consistent with what you said!
>
>> On Oct 29, 2018, at 5:29 PM, Kumaran Rajaram wrote:
>>
>> In a non-GNR setup, nsdCksumTraditional=yes enables data-integrity checking between a traditional NSD client node and its NSD server, at the network level only.
>>
>> The ESS storage supports end-to-end checksums, NSD client to the ESS IO servers (at the network level) as well as from the ESS IO servers to the disk/storage. This is further detailed in the docs (link below):
>>
>> https://www.ibm.com/support/knowledgecenter/en/SSYSP8_5.3.1/com.ibm.spectrum.scale.raid.v5r01.adm.doc/bl1adv_introe2echecksum.htm
>>
>> Best,
>> -Kums
>>
>> From: Stephen Ulmer
>> To: gpfsug main discussion list
>> Date: 10/29/2018 04:52 PM
>> Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>
>> So the ESS checksums that are highly touted as "protecting all the way to the disk surface" completely ignore the transfer between the client and the NSD server? It sounds like you are saying that all of the checksumming done for GNR is internal to GNR and only protects against bit-flips on the disk (and in staging buffers, etc.).
>>
>> I'm asking because your explanation completely ignores calculating anything on the NSD client and implies that the client could not participate, given that it does not know about the structure of the vdisks under the NSD -- but that has to be a performance factor for both types if the transfer is protected starting at the client, which it is in the case of nsdCksumTraditional, which is what we are comparing to ESS checksumming.
>>
>> If ESS checksumming doesn't protect on the wire I'd say that marketing has run amok, because that has *definitely* been implied in meetings for which I've been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity.
>>
>> --
>> Stephen
>>
>> On Oct 29, 2018, at 3:56 PM, Kumaran Rajaram wrote:
>>
>> Hi,
>>
>> >> How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR?
>> >> Why is there such a penalty for "traditional" environments?
>>
>> In GNR IO/NSD servers (ESS IO nodes), the checksums are computed in parallel for an NSD (storage volume/vdisk) across the threads handling each pdisk/drive (that constitutes the vdisk/volume). This is possible since the GNR software on the ESS IO servers is tightly integrated with the underlying storage and is aware of the vdisk DRAID configuration (strip size, pdisks constituting the vdisk, etc.) to perform parallel checksum operations.
>> In the non-GNR + external storage model, the GPFS software on the NSD server(s) does not manage the underlying storage volume (this is done by the storage RAID controllers) and the checksum is computed serially. This would contribute to an increase in CPU usage and I/O performance degradation (depending on I/O access patterns, I/O load, etc.).
>>
>> My two cents.
>>
>> Regards,
>> -Kums
>>
>> From: Aaron Knister
>> To: gpfsug main discussion list
>> Date: 10/29/2018 12:34 PM
>> Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
>> Sent by: gpfsug-discuss-bounces at spectrumscale.org
>>
>> Flipping through the slides from the recent SSUG meeting I noticed that in 5.0.2 one of the features mentioned was the nsdCksumTraditional flag. Reading up on it, it seems as though it comes with a warning about significant I/O performance degradation and an increase in CPU usage. I also recall that data integrity checking is performed by default with GNR. How can it be that the I/O performance degradation warning only seems to accompany the nsdCksumTraditional setting and not GNR? As someone who knows exactly 0 of the implementation details, I'm just naively assuming that the checksums are being generated (in the same way?) in both cases and transferred to the NSD server. Why is there such a penalty for "traditional" environments?
>>
>> -Aaron
>>
>> --
>> Aaron Knister
>> NASA Center for Climate Simulation (Code 606.2)
>> Goddard Space Flight Center
>> (301) 286-2776

From aaron.s.knister at nasa.gov Tue Oct 30 12:30:20 2018
From: aaron.s.knister at nasa.gov (Knister, Aaron S. (GSFC-606.2)[InuTeq, LLC])
Date: Tue, 30 Oct 2018 12:30:20 +0000
Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk>
References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org>, <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk>
Message-ID: <0765E436-870B-430D-89D3-89CE60E94CCB@nasa.gov>

I'm guessing IBM doesn't generally spend huge amounts of money on things that are superfluous...although *cough*RedHat*cough*.
TCP does of course perform checksumming, but I see the NSD checksums as being at a higher "layer", if you will. The layer at which I believe the NSD checksums operate sits above the complex spaghetti monster of queues, buffers, state machines, and kernel/user space communication inside of GPFS, as well as networking drivers that can suck (looking at you Intel, Mellanox) and high-speed networking hardware, all of which I've seen cause data corruption (even though the data on the wire was in some cases checksummed correctly).

-Aaron

On October 30, 2018 at 05:03:26 EDT, Jonathan Buzzard wrote:

On 29/10/2018 20:47, Stephen Ulmer wrote:
[SNIP]
> If ESS checksumming doesn't protect on the wire I'd say that marketing has run amok, because that has *definitely* been implied in meetings for which I've been present. In fact, when asked if Spectrum Scale provides checksumming for data in-flight, IBM sales has used it as an ESS up-sell opportunity.

Noting that on a TCP/IP network anything passing over a TCP connection is checksummed at the network layer. Consequently any additional checksumming is basically superfluous.

JAB.

--
Jonathan A. Buzzard
Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG

From valdis.kletnieks at vt.edu Tue Oct 30 22:14:00 2018
From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu)
Date: Tue, 30 Oct 2018 18:14:00 -0400
Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To: <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk>
References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk>
Message-ID: <107111.1540937640@turing-police.cc.vt.edu>

On Tue, 30 Oct 2018 09:03:06 -0000, Jonathan Buzzard said:
> Noting that on a TCP/IP network anything passing over a TCP connection
> is checksummed at the network layer. Consequently any additional
> checksumming is basically superfluous.

Note that the TCP checksum is relatively weak, and designed in a day when a 56K leased line was a high-speed long-haul link and 10mbit ethernet was the fastest thing on the planet. When 10 megabytes was a large transfer, it was a reasonable amount of protection. But when you get into moving petabytes of data around, the chances of an undetected error start getting significant.

Pop quiz time: When was the last time you (the reader) checked your network statistics to see what your bit error rate was? Do you even have the ability to do so?
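As a side note on just how weak that 16-bit checksum is, the sketch below (illustrative Python, not GPFS code) implements the RFC 1071-style ones'-complement sum that TCP uses and shows one class of corruption it cannot detect: reordering 16-bit words in a payload leaves the checksum unchanged.

import struct

def internet_checksum(data: bytes) -> int:
    # RFC 1071-style 16-bit ones'-complement checksum, as used by TCP/UDP/IP.
    if len(data) % 2:
        data += b"\x00"                               # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)      # fold the carry back in
    return ~total & 0xFFFF

original  = b"\x12\x34\x56\x78\x9a\xbc"
reordered = b"\x56\x78\x12\x34\x9a\xbc"               # first two 16-bit words swapped
print(hex(internet_checksum(original)))               # 0xfc96
print(hex(internet_checksum(reordered)))              # 0xfc96 -- same value, so the reordering goes undetected

That gap, plus everything that can go wrong before the data ever reaches the TCP stack (as Aaron describes above), is the usual argument for an additional application-level checksum such as the NSD one, despite the extra cost.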
From bbanister at jumptrading.com Tue Oct 30 22:52:35 2018
From: bbanister at jumptrading.com (Bryan Banister)
Date: Tue, 30 Oct 2018 22:52:35 +0000
Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To: <107111.1540937640@turing-police.cc.vt.edu>
References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> <107111.1540937640@turing-police.cc.vt.edu>
Message-ID:

Valdis will also recall how much "fun" we had with network-related corruption due to what we surmised was a TCP offload engine FW defect in a certain 10GbE HCA. Only happened sporadically every few weeks... what a nightmare that was!!
-B

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org On Behalf Of valdis.kletnieks at vt.edu
Sent: Tuesday, October 30, 2018 5:14 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)

[EXTERNAL EMAIL]

On Tue, 30 Oct 2018 09:03:06 -0000, Jonathan Buzzard said:
> Noting that on a TCP/IP network anything passing over a TCP connection
> is checksummed at the network layer. Consequently any additional
> checksumming is basically superfluous.

Note that the TCP checksum is relatively weak, and designed in a day when a 56K leased line was a high-speed long-haul link and 10mbit ethernet was the fastest thing on the planet. When 10 megabytes was a large transfer, it was a reasonable amount of protection. But when you get into moving petabytes of data around, the chances of an undetected error start getting significant.

Pop quiz time: When was the last time you (the reader) checked your network statistics to see what your bit error rate was? Do you even have the ability to do so?
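On the pop quiz: a Linux NSD server or client does at least keep TCP error counters you can look at, although they only show errors the checksum actually caught; a broken offload engine that emits a correct checksum over the wrong data, as in the HCA story above, will not show up in them. A rough sketch, assuming a reasonably recent kernel (counter names vary by kernel version, and InCsumErrors in particular may be absent on older ones):

def tcp_counters(path="/proc/net/snmp"):
    # Parse the two "Tcp:" lines (field names, then values) from the kernel's SNMP stats.
    with open(path) as f:
        tcp_lines = [line.split() for line in f if line.startswith("Tcp:")]
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    return dict(zip(names, map(int, values)))

counters = tcp_counters()
for name in ("InSegs", "RetransSegs", "InErrs", "InCsumErrors"):
    if name in counters:                  # only print the counters this kernel exposes
        print(name, counters[name])

Sampling these periodically (or simply comparing successive runs of "netstat -s") at least tells you whether detected checksum failures and retransmits are climbing; it says nothing about corruption that arrives with a valid checksum.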
From makaplan at us.ibm.com Tue Oct 30 23:15:38 2018
From: makaplan at us.ibm.com (Marc A Kaplan)
Date: Tue, 30 Oct 2018 18:15:38 -0500
Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To:
References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov><326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org><72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk><107111.1540937640@turing-police.cc.vt.edu>
Message-ID:

I confess, I know what checksums are generally and how and why they are used, but I am not familiar with all the various checksums that have been discussed here.

I'd like to see a list or a chart with the following information for each checksum:

  Computed on what data elements, of what (typical) length (e.g. packet, disk block, disk fragment, disk sector).

  Checksum function used, and how many bits of checksum are computed on each data element.

  Computed by what software or hardware entity, at what nodes in the network.

There may be such checksums on each NSD transfer. Lowest layers would be checking data coming off of the disk. Checking network packets coming off ethernet or IB adapters. Higher layer for NSD could be a checksum on a whole disk block and/or on NSD request and response, including message headers AND the disk data...

From valdis.kletnieks at vt.edu Wed Oct 31 01:09:40 2018
From: valdis.kletnieks at vt.edu (valdis.kletnieks at vt.edu)
Date: Tue, 30 Oct 2018 21:09:40 -0400
Subject: [gpfsug-discuss] NSD network checksums (nsdCksumTraditional)
In-Reply-To:
References: <7271fdba-dfac-d198-fa13-60ad0109f683@nasa.gov> <326B0C60-5207-4990-AF45-162EDD453D07@ulmer.org> <72226f21-ed68-51bb-9eca-cc96b4bfe623@strath.ac.uk> <107111.1540937640@turing-police.cc.vt.edu>
Message-ID: <122689.1540948180@turing-police.cc.vt.edu>

On Tue, 30 Oct 2018 22:52:35 -0000, Bryan Banister said:
> Valdis will also recall how much "fun" we had with network-related corruption
> due to what we surmised was a TCP offload engine FW defect in a certain 10GbE
> HCA. Only happened sporadically every few weeks... what a nightmare that was!!

It makes for quite the bar story, as the symptoms pointed everywhere except the network adapter. For the purposes of this thread though, two points to note:

1) The card in question was a spectacularly good price/performer and totally rock solid in 4 NFS servers that we had - in 6 years of trying, I never managed to make them hiccup (the one suspected failure turned out to be a fiber cable that had gotten crimped when the rack door was closed on a loop).

2) Since the TCP offload engine was computing the checksum across the data, but it had gotten confused about which data it was about to transmit, every single packet went out with a perfectly correct checksum.

From rohwedder at de.ibm.com Wed Oct 31 15:33:54 2018
From: rohwedder at de.ibm.com (Markus Rohwedder)
Date: Wed, 31 Oct 2018 16:33:54 +0100
Subject: [gpfsug-discuss] Spectrum Scale Survey
Message-ID:

Hello Spectrum Scale Users,

We have started a survey on how certain Spectrum Scale administrative tasks are performed. The survey focuses on the use of tasks like snapshots or ILM, including monitoring, scheduling, and problem determination of these capabilities. It should take only a few minutes to complete the survey.
Please take a look and let us know how you are using Spectrum Scale and what aspects are important for you.

Here is the survey link:
https://www.surveygizmo.com/s3/4631738/IBM-Spectrum-Scale-Administrative-Management

Mit freundlichen Grüßen / Kind regards

Dr. Markus Rohwedder
Spectrum Scale GUI Development
Phone: +49 7034 6430190
IBM Deutschland Research & Development
E-Mail: rohwedder at de.ibm.com
Am Weiher 24
65451 Kelsterbach
Germany

From kkr at lbl.gov Wed Oct 31 20:10:02 2018
From: kkr at lbl.gov (Kristy Kallback-Rose)
Date: Wed, 31 Oct 2018 13:10:02 -0700
Subject: [gpfsug-discuss] V5 client limit?
Message-ID:

Hi,

Can someone tell me the max # of GPFS native clients under 5.x? Everything I can find is dated.

Thanks
Kristy