Ranger Goldstone Test PRD
NVIDIA Confidential
The controlled copy of this document resides in PDP and printed copies of it are for reference only. Printed Date: 9/28/2021 2:50 PM
Version History
Revised on | Revision | Description | Changed by
1/12/2023 | 0.3 | Added some test items; please see items in this shading color | Jinshui Liu
1/16/2023 | 1.0 | Added some test items; submitted for review & MODS/DIAG development | Jinshui Liu
Contents
1 PURPOSE
2 NOMENCLATURE
3 DOCUMENT STRUCTURE
4 REFERENCE DOCUMENTS
5 DEFINITIONS
6 RANGER SYSTEM & GOLDSTONE SWITCH BOARD OVERVIEW
7 TEST REQUIREMENTS
7.1 FRONT PANEL LED/BUTTON CHECK
7.2 OS
7.3 COME
7.4 LS10 NVLINK4 SWITCH IC
7.5 OSFP PORTS
7.6 COME & LS10 I2C TREE
7.7 GOLDSTONE SWITCHBOARD SENSORS
7.8 USB
7.9 COME UART PORT
7.10 GOLDSTONE SWITCH NODE PDB AND FANS
7.11 SYSTEM THERMAL & POWER STRESS TEST
7.12 PCIE I/O DEVICES
7.13 PCIE ENHANCEMENT
7.14 M.2 SSDS
7.15 FPGA/CPLD DEVICES
7.16 FRU EEPROM
7.17 EROT & EROT-PROTECTED SW/FW
8 SENSOR LIST
9 GOLDSTONE NODE I2C TREES
10 JTAG & BOUNDARY SCAN TEST
11 EQUIPMENT LIST
12 REFERENCES
Table of Figures
FIGURE 1. RANGER CHASSIS FRONT / REAR / SIDE VIEWS
FIGURE 2. RANGER CHASSIS CABLE BACKPLANE CARTRIDGE (CBC)
FIGURE 3. RANGER SYSTEM EXPLODED VIEW
FIGURE 4. GOLDSTONE SWITCH NODE AND SWITCHBOARD
FIGURE 5. GOLDSTONE SWITCHBOARD BLOCK DIAGRAM
FIGURE 6. GOLDSTONE SWITCHBOARD BLOCK DIAGRAM
FIGURE 7. GOLDSTONE NODE VS. GOLDSTONE SWITCHBOARD
FIGURE 8. 1-SLOT GOLDSTONE TESTER FOR MANUFACTURING TEST
FIGURE 9. DGX-GOLDSTONE SYSTEM INTEGRATION MFG TEST FLOW
FIGURE 10. DGX-GOLDSTONE BOM STRUCTURE (692-24262-0000-000)
FIGURE 11. GOLDSTONE SWITCHBOARD & KEYSTONE BASEBOARD TS1A BUILD PLAN
FIGURE 12. GOLDSTONE SWITCHBOARD FACEPLATE
FIGURE 13. NBU P2318 COME BLOCK DIAGRAM
FIGURE 14. COME DETAILED BLOCK DIAGRAM
FIGURE 15. COME TYPE 7 FORM-FACTOR
FIGURE 16. COME TYPE 7 2X220-PIN BOARD-2-BOARD CONNECTORS
FIGURE 17. LS10 NVLINK4 SWITCH IC EXTERNAL INTERFACES
FIGURE 18. GOLDSTONE SWITCHBOARD LS10 PORT ASSIGNMENT
FIGURE 19. TE 2344064-4 OSFP CONNECTOR ON GOLDSTONE
FIGURE 20. OSFP LOOPBACK DONGLE
FIGURE 21. GOLDSTONE SWITCHBOARD I2C TREE
FIGURE 22. SWITCHBOARD I2C DEVICES CONNECTED TO COME LPC2I2C INTERFACE
FIGURE 23. I2C DEVICES ON COME PCH SML1
FIGURE 24. I2C DEVICES & INTERFACES ON COME
FIGURE 25. USB 2.0 TYPE 2.0 PINOUT
FIGURE 26. GOLDSTONE SWITCHBOARD RS-232 RJ-45 UART PORT PINOUT
FIGURE 27. SWITCHBOARD - PDB - FANS INTERCONNECT
FIGURE 28. PCIE CONFIGURATION SPACE LAYOUT, 4KB
FIGURE 29. PCIE CAPABILITY REGISTERS (THE BASE ADDRESS IS TYPICALLY 0X70)
FIGURE 30. GOLDSTONE NODE PCIE DEVICES
FIGURE 31. PCIE LANE MARGINING AT RECEIVER REGISTERS
FIGURE 32. COME PCH FLEX I/O LANE MAPPING
FIGURE 33. SSD INTERFACE MARKET FORECAST (SOURCE: IDC, 2020.12)
FIGURE 34. EXAMPLE OF SMART LOG RETURNED BY NVME-CLI / NVME SMART-LOG
FIGURE 35. DIFFERENCES BETWEEN MACHX03D AND LCMX03D FAMILIES
FIGURE 36. MACHX03D FPGA CONFIGURATION PROCESS
FIGURE 37. MACHX03D FPGA CONFIGURATION PORTS (SYSCONFIG)
FIGURE 38. GOLDSTONE SWITCHBOARD IN-SYSTEM PROGRAMMING PATH
FIGURE 39. MACHX03D INTERNAL FLASH LAYOUT
FIGURE 40. MACHX03D FEATURE ROW ELEMENTS
FIGURE 41. MAIN AND PORT CPLDS CPU ACCESS PATHS
FIGURE 42. GOLDSTONE NODE'S EROT-PROTECTED AP FWS
FIGURE 43. EROT-PROTECTED AP-FW UPDATE FLOW WITH BMC
FIGURE 44. EROT/CEC1736 & EC-FW FLASH
FIGURE 45. GOLDSTONE SWITCHBOARD FW UPDATE PATHS
FIGURE 46. OSFP LOOPBACK DONGLE (2000-2250 MATING CYCLES)
FIGURE 47. EZDUPE M.2 NVME SSD DUPLICATOR (DM-HE0-8V07NTP)
FIGURE 48. PERLE 24-PORT RS-232 TERMINAL SERVER
FIGURE 49. M.2 NVME/SATA DUPLICATOR (PRODUPLICATOR.COM)
FIGURE 50. QSFP-112 PINOUT
FIGURE 51. QSFP-DD800 VS. QSFP112 PINOUT
FIGURE 52. QSFP-DD/QSFP-DD800 CONNECTOR
FIGURE 53. OSFP CONNECTOR & PINOUT IN GOLDSTONE
FIGURE 54. QSFP FROM QSFP+ TO QSFP-DD800
FIGURE 55. QSFP-DD MSA CONNECTOR PCB LAYOUT
FIGURE 56. QSFP-DD VS. OSFP
FIGURE 57. QSFP-DD VS. OSFP IN SIZES
FIGURE 58. CFP VS. QSFP VS. OSFP VS. QSFP-DD
FIGURE 59. QSFP-DD VS. OSFP FOR 400G
FIGURE 60. USB CONNECTOR PINOUT
FIGURE 61. SATA PINOUT
Table of Tables
TABLE 1. DOCUMENT REFERENCE
TABLE 2. FRONT PANEL TEST ITEMS
TABLE 3. OS VERSION
TABLE 4. COME B2B CONNECTOR INTERFACES
TABLE 5. COME TEST ITEMS
TABLE 6. LS10 TEST ITEMS
TABLE 7. OSFP PORTS NON-TRAFFIC TEST ITEMS
TABLE 8. GS SWITCHBOARD AND PDB I2C DEVICES TEST ITEMS
TABLE 9. COME & LS10 I2C DEVICES ON SWITCHBOARD & PDB
TABLE 10. GOLDSTONE SWITCHBOARD SENSOR TEST ITEMS
TABLE 11. USB TEST ITEMS
TABLE 12. BMC & CG1 CPU UARTS
TABLE 13. SUMMARY OF PDB I2C DEVICES (8-BIT I2C ADDRESS)
TABLE 14. SWITCHBOARD-PDB INTERCONNECT
TABLE 15. TEST ITEMS FOR NODE PDB AND FANS’ SIGNALS
TABLE 16. SYSTEM THERMAL & POWER STRESS TEST
TABLE 17. GOLDSTONE SWITCHBOARD PCIE DEVICES
TABLE 18. PCIE DEVICE GENERAL TEST ITEMS
TABLE 19. PCIE ENHANCEMENT TEST ITEMS
TABLE 20. M.2 SATA SSD TEST ITEMS (DEFAULT)
TABLE 21. M.2 & NVME SSD TEST ITEMS (ONLY APPLICABLE WITH M.2 NVME SSD, NOT USED IN CURRENT VERSION)
TABLE 22. MAIN CPLD TEST ITEMS
TABLE 23. FRU EEPROM TEST ITEMS
TABLE 24. EROT TEST ITEMS
TABLE 25. GOLDSTONE NODE SENSORS
TABLE 26. QSFP COMPARISON
1 Purpose
This document defines product test requirements for Ranger Goldstone Switchboard manufacturing testing & diagnostics.
2 Nomenclature
Requirements are numbered with roughly sequential IDs. The terms “shall,” “should,” and “may” are used
interchangeably to specify requirements; the term “will” is not used. Rather than attaching a special meaning to each of these terms,
each requirement carries one of seven descriptors (the requirement type) that states its level of importance. See below.
MANDATORY: Defines features or behaviors which must be included for product viability. These requirements must
be implemented and verified for product release.
REQUIRED: Defines features or behaviors which are intended to be included in the product design, for which both
resources and schedule are to be allocated for implementation, but which may be omitted from a particular
product release for budgetary, scheduling, technical, or commercial reasons.
DESIRED: Defines features or behaviors which are desirable, but not necessary for product use, maintenance,
marketing, or manufacture, and for which no significant resources or schedule allocation are intended during
product development; these requirements are sometimes described as “optional” or “nice-to-have”.
PERFORMANCE: Defines characteristics for which no narrow boundary exists between acceptable and
unacceptable behavior; this type of requirement defines characteristics that are sometimes referred to as
“goals”. These requirements indicate a target performance level and should be treated as aspirational, rather
than as prescriptive. In general, PERFORMANCE indicates that improved performance equates to improved
marketability or usability, and the implicit design requirement is “best possible performance within reasonable
bounds for budget and schedule”. Performance characteristics must be characterized and reported as part of
verification and any shortfalls with respect to the specification must be reviewed and approved prior to product
release.
EXPECTED RESULTS: Defines what the test outcome should be or what the operator needs to look for.
LIMITS: Defines the test limits if any.
GUIDANCE: Defines information that is provided only for clarification, to provide context or justification for other
requirements, as guidance to the system developer, as guidance to the project manager, or to indicate planned
design or implementation strategies; this information is not to be treated as a verifiable requirement. This
designator is provided to ensure there is no ambiguity when guidance statements appear interspersed with
requirements or appear in a form that might be misinterpreted as a verifiable requirement.
3 Document structure
The requirements are structured in sections and presented in tables. Sections can be organized by category (inventory,
functional, stress, monitor), by component, by functional area, or in any other way that is natural to the product. The suggested
requirement enumeration/naming uses three capital letters representing the section, followed by 3 digits. This creates
separate enumerated lists of requirements that are easy to manage, and it allows requirements to be added and deleted as the
understanding of the product increases. The number sequence can be strictly sequential or can have gaps, allowing for insertion
of new requirements into the sequence. The goal is to name requirements, not to maintain an ordered sequence of
requirements.
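As an illustration of the naming scheme, the sketch below splits a requirement ID into its parts. It assumes the form used by the IDs in this document's tables, such as GOLDSTONE-FP001: a product prefix, a hyphen, a section code of two to four capital letters, and 3 digits. The function name `parse_req_id` and the exact pattern are illustrative assumptions, not part of any existing tool.

```python
import re
from typing import NamedTuple


class ReqId(NamedTuple):
    product: str   # e.g. "GOLDSTONE"
    section: str   # e.g. "FP", "COME", "LS"
    number: int    # the 3-digit sequence number


# Assumed pattern: PRODUCT-SECTIONnnn (hypothetical, inferred from the IDs in the tables).
_REQ_ID = re.compile(r"^(?P<product>[A-Z0-9]+)-(?P<section>[A-Z]{2,4})(?P<number>\d{3})$")


def parse_req_id(text: str) -> ReqId:
    """Split a requirement ID into product, section and number."""
    match = _REQ_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid requirement ID: {text!r}")
    return ReqId(match["product"], match["section"], int(match["number"]))


print(parse_req_id("GOLDSTONE-FP001"))
```

Because the number is kept as an integer, gaps in the sequence are harmless: IDs only need to be unique within a section, not contiguous.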
The body of the requirements shall have a “Name:”, which is a short name used for the requirement, followed by one or more
<Requirement Type> stating the requirement. See the following sections for further guidance and examples.
4 Reference Documents
Table 1 Document Reference
Document # | Document Description | Document Link
02 | |
03 | |
5 Definitions
For the purposes of this document, the following definitions apply:
FCT-B: Functional Test at Bench / Assembly Line
FCT-R: Functional testing at Rack
Priority | Description
Must have - P0 | Critical for the release and delivery target date. The release must be delayed if any of the requirements marked P0 are missing or incomplete.
Should have - P1 | Important but not necessary for the delivery target date. While these requirements can be as important as P0, they are often not as time-critical or may be satisfied through other means, so they can be postponed to a later time.
6 Ranger System & Goldstone Switch Board Overview
The term “Ranger System” may refer to a Ranger Cluster (or Ranger POD), a Ranger Rack, or a Ranger Chassis:
✔ Ranger Chassis: a 15-OU-high chassis with 8x Keystone Compute Nodes (Compute Nodes) and 3x Goldstone NVLink4 Switch
Nodes (Switch Nodes) that are interconnected with a cable backplane (called the Cable Backplane Cartridge, CBC). There is no
dedicated management node or Power Supply Unit (PSU) in the chassis. Only 54VDC is delivered to the Compute Nodes and
Switch Nodes via a bus bar from the PSU Node external to the Ranger Chassis.
✔ Ranger Rack: a 21” wide OCP rack with 1-2 Ranger Chassis, a PSU Node, and TOR MGMT & in-band Ethernet / IB
switches; a bus bar connects the Ranger Chassis to its PSU Node.
✔ Ranger Cluster / POD: a system with multiple Ranger Racks interconnected with Kong NVLink4 Switches and Ethernet / IB
switches for GPU scale-up applications.
✔ Cable Backplane Cartridge: a backplane, formed with cables (not PCB traces), that connects the NVLink4 links between the
Compute Nodes and the Switch Nodes within the Ranger Chassis. There is no interconnect between the Switch Nodes.
The Strada Whisper Absolute backplane connectors from TE are used for the Ranger backplane NVLink4 interconnect:
✔ Goldstone Switchboard Node: two 8-Pair x 12 connectors (TE P/N 2416358) are used, for a total of 192 differential pairs (96 lanes),
i.e., 12 lanes (6 NVLink4 links / ports) per Keystone Compute Node.
✔ Keystone Compute Node: two 4-Pair x 9 connectors (TE P/N 2416357) are used, for a total of 72 differential pairs (36 lanes),
for the NVLink4 connections with the Goldstone Nodes.
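The pair and lane counts above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only; it assumes that one lane consists of one TX pair plus one RX pair and that one NVLink4 link has 2 lanes, which matches the figures quoted in this section. The function name `backplane_budget` is hypothetical.

```python
def backplane_budget(pairs_per_column: int, columns: int, connectors: int):
    """Return (differential pairs, lanes, NVLink4 links) for a connector set."""
    diff_pairs = pairs_per_column * columns * connectors
    lanes = diff_pairs // 2   # assumed: 1 lane = 1 TX pair + 1 RX pair
    links = lanes // 2        # assumed: 1 NVLink4 link = 2 lanes
    return diff_pairs, lanes, links


# Goldstone Switchboard: two 8-Pair x 12 connectors.
gs_pairs, gs_lanes, gs_links = backplane_budget(8, 12, 2)
# Keystone Compute Node: two 4-Pair x 9 connectors.
ks_pairs, ks_lanes, ks_links = backplane_budget(4, 9, 2)

COMPUTE_NODES = 8   # Keystone Compute Nodes per Ranger Chassis
SWITCH_NODES = 3    # Goldstone Switch Nodes per Ranger Chassis

print("Goldstone:", gs_pairs, "pairs,", gs_lanes, "lanes,", gs_links, "links")
print("Keystone: ", ks_pairs, "pairs,", ks_lanes, "lanes,", ks_links, "links")
print("Lanes per Compute Node on one Switchboard:", gs_lanes // COMPUTE_NODES)
print("Links per Switch Node on one Compute Node:", ks_links // SWITCH_NODES)
```

Both views agree: each Compute Node has 6 links (12 lanes) to each of the 3 Switch Nodes.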
The Goldstone Node provides NVLink4 protocol switching between the Keystone Compute Nodes for scale-up GPU-accelerated
computing, such as ultra-large AI models with model parallelism (rather than data parallelism on a scale-out GPU cluster interconnected
with IB or RDMA Ethernet switching). These ultra-large AI models are too large to fit into a single node’s GPU
HBM2E/HBM3 memory.
The 699- P/N of Goldstone Switchboard is 699-24262-0000-000, and the P/N for the Switchboard PCBA to be built and tested
at FXN SJ is 692-24262-0000-000.
Please refer to Figure 4. Goldstone Switch Node and Switchboard for the Goldstone Switch Node layout and to Figure 5. Goldstone
Switchboard Block Diagram for the schematic block diagram.
Each LS10 provides 64 NVLink4 links/ports with 2 lanes per link/port. In the Goldstone design, only 56 of the 64 ports on each LS10 are
used: 24 links/ports for NVLink4 interconnects via the backplane and 32 links/ports via 8x OSFP connectors on the front panel. With 2x LS10 per Switchboard, this gives:
✔ A total of 112 NVLink4 ports per Goldstone Switchboard across both OSFP and backplane connectors
✔ 64 NVLink4 ports via OSFP for interconnect to Kong Switches
✔ 48 NVLink4 ports via backplane connectors for intra-chassis interconnects with 8x Keystone Compute Nodes.
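The board-level totals follow directly from the per-LS10 figures (64 ports, of which 24 go to the backplane and 32 to OSFP) and the two LS10 devices per Switchboard. The sketch below is a sanity check only; the constant and function names are illustrative.

```python
LS10_PER_SWITCHBOARD = 2
PORTS_PER_LS10 = 64
BACKPLANE_PORTS_PER_LS10 = 24
OSFP_PORTS_PER_LS10 = 32


def port_budget(chips: int = LS10_PER_SWITCHBOARD) -> dict:
    """Sum the per-LS10 port allocation over all LS10 devices on the board."""
    used_per_chip = BACKPLANE_PORTS_PER_LS10 + OSFP_PORTS_PER_LS10
    return {
        "total": chips * PORTS_PER_LS10,
        "used": chips * used_per_chip,
        "osfp": chips * OSFP_PORTS_PER_LS10,
        "backplane": chips * BACKPLANE_PORTS_PER_LS10,
        "unused": chips * (PORTS_PER_LS10 - used_per_chip),
    }


for name, count in port_budget().items():
    print(f"{name:>9}: {count}")
```

The OSFP and backplane counts add up to the used-port total, which is a quick way to catch a mistyped figure in the list above.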
The Goldstone Switchboard itself is an orderable product for OEM parts, and the BOM structure is shown in
Figure 4 Goldstone Switch Node and Switchboard
Figure 6. Goldstone Switchboard Block Diagram
7 Test Requirements
The NVBUGS for Goldstone/DGX-Goldstone MODS Requirements is Bug 3902273.
These requirements encompass the Ranger Goldstone Switchboard test requirements. The overall Goldstone Switchboard
integration manufacturing test flow is shown in Figure 9. Please note that NVDA is responsible only for manufacturing and
testing the Goldstone Switchboard, not the whole Goldstone Node; the following modules / components are therefore golden parts
for Goldstone Switchboard manufacturing at the CM (FXN SJ):
● Mechanical tray with an opening for JTAG header access for Boundary Scan testing (BSI)
● PDB: power distribution board, from ZT
● Cable set: cables for Goldstone Node internal connections (power cable, fan cables, etc.)
● OSFP Loopback Dongles: 16pcs are required per board and are reused; 96pcs are needed per 2-chassis Rack (16 per board x 3 Switch Nodes x 2 chassis).
● 1-Slot Goldstone Tester: a 1-slot Goldstone test chassis with a 3KW PSU, six cooling fans and 2x loopback cables for the
Goldstone Switchboard backplane interface, as shown in Figure 8.
Figure 9. DGX-Goldstone System Integration MFG Test Flow
The DGX-Goldstone BOM structure is shown in Figure 10, which indicates that the Dragon Chassis (w/ PSUs) is a DOM subsystem.
If the DGX-Goldstone is not built at the factory, the flow shown in Figure 9 can be executed at the lab or data center
where the DGX-Goldstone integration is done, and the Runin and Power cycling tests can be waived.
Figure 11. Goldstone Switchboard & Keystone Baseboard TS1A Build Plan
7.1 Front Panel LED/Button Check
The Goldstone Switchboard/Node faceplate is shown in Figure 12. The following items are on the faceplate:
● One RJ45 for COMe 1000Base_T LAN Port: mounted on Switchboard PCB
● One RJ45 for COMe UART RS232 Port: mounted on Switchboard PCB
● One USB 2.0 Type A Port: Mounted on Switchboard PCB
● 16x OSFP Ports: Mounted on Switchboard PCB, See Sections 7.4 and 7.5 for details.
The following items are also on front panel but are mounted on a Front Panel PCB that is connected to the Switchboard
PCBA via a cable:
● One Power Push Button & LED: Mounted on Front Panel PCB and connected to Switchboard via a cable
● One UID Push Button & LED: Mounted on Front Panel PCB and connected to Switchboard via a cable
● One Reset Push Button: Mounted on Front Panel PCB and connected to Switchboard via a cable
● One System Status LED: Mounted on Front Panel PCB and connected to Switchboard via a cable
● One System Fault LED: Mounted on Front Panel PCB and connected to Switchboard via a cable
● Two NVLink Status LEDs: Mounted on Front Panel PCB and connected to Switchboard via a cable
In addition, a 7-Segment Display on the front panel is connected to the Switchboard PCBA via a cable.
These items and the cable are golden parts, but the following tests are required to test the related circuits and connections
on the Switchboard.
Table 2. Front Panel Test Items
Req-#: GOLDSTONE-FP001
Name: UID Button & UID LED Connections
Location: Power button & LED at Right Front Panel.
Mandatory: Shall check UID Button and UID LED signal path connections.
Expected Result: Default status: blue LED off. Identify server location: blue LED flashing at 1Hz.
Guidance: Manual Check. Controlled by Main CPLD
Test Method: Manual
Priority: P1 EVT
Nvbug tracking: N/A - One Diag cannot implement manual checks
Production Stage: FCT-B

Req-#: GOLDSTONE-FP003
Name: System Status LED Connection
Location: LED at Right Front Panel.
Mandatory: Shall check System Status LED signal path connection.
Expected Result: System Status LED should be ON/OFF as specified by Main CPLD Spec.
Guidance: Manual Check, driven by main CPLD
Test Method: Manual
Priority: P1 EVT
Nvbug tracking: N/A - One Diag cannot implement manual checks
Production Stage: FCT-B

Req-#: GOLDSTONE-FP004
Name: System Fault LED Connection
Location: LED at Right Front Panel.
Mandatory: Shall check System Fault LED signal path connection.
Expected Result: System Fault LED should be ON/OFF as specified by Main CPLD Spec.
Guidance: Manual Check. Driven by main CPLD
Test Method: Manual
Priority: P1 EVT
Nvbug tracking: N/A - One Diag cannot implement manual checks
Production Stage: FCT-B

Req-#: GOLDSTONE-FP005
Name: NVLink LED Connection (2x LEDs)
Location: LED at Right Front Panel.
Mandatory: Shall check NVLink LED signal path connections.
Expected Result: The NVLink LEDs should be ON/OFF as specified by Main CPLD Spec.
Guidance: Manual Check. Driven by Port CPLD
Test Method: Manual
Priority: P1 EVT
Nvbug tracking: N/A - One Diag cannot implement manual checks
Production Stage: FCT-B
Mandatory: Shall check 7-Segment Display signal path connections.
Expected Result: The 7-Segment Display should work as specified by Main CPLD Spec.
Guidance: Manual Check. Driven by Main CPLD
Nvbug tracking: [...] implement manual checks
7.2 OS
Table 3. OS Version
Req-# | Requirement | Test Method | Priority | Nvbug tracking | Production Stage
7.3 COMe
Goldstone Switchboard uses a COMe module as the control and management CPU. The COMe used is developed by
NBU and has the following features:
● Shared folder: CFL, E3684
● Product Code: P2318
● Form-Factor: COMe Type 7, Basic form-factor of 95x125mm
● CPU: Intel Coffee Lake H (CFL-H), I3-8100H, 4-Core, 45W TDP. ECC DRAM is supported.
● Memory: up to 2x 260-Pin ECC SO-DIMMs are supported; only one 8GB DDR4-2666 SO-DIMM is used.
● Connector Type: TE 3-1827231-6, 5mm, 2x220-pin, 0.5mm pitch (the COM Express Type 6 / 7 standard
connector), as a single connector, as shown in Figure 16. COMe Type 7 2x220-Pin Board-2-Board Connectors.
Figure 14. COMe Detailed Block Diagram
For the Goldstone Switchboard, the COMe is a tested Finished Good (FG), so there is no need to run a full functional test on it during
the Goldstone Switchboard manufacturing test. A sanity test is still required to catch component damage and defects introduced during
transportation and storage, and to produce test logs.
As shown in Figure 13. NBU P2318 COMe Block Diagram, the COMe board has the following features:
• CPU: I3-8100H, 4 cores, 6MB cache, 3.0GHz base frequency, 45W TDP
• PCH: Intel FH82CM246 PLATFORM CONTROLLER HUB (PCH)
• DDR Configuration:
✔ 2 channels, One SO-DIMM per channel.
✔ Each channel: up to 2666MT/s, 16GB, DDR4, 1.2V, ECC SO-DIMM
• 256Mb SPI flash for BIOS code (Known Good Image)
• TPM2.0 (SPI bus)
• Die Temperature monitoring mechanism from carrier
• Real Time clock mechanism based on external feed
• COM Express Module VPD/FRU EEPROM
• Voltage monitoring with A2D.
• Interrupt controller (In CPLD)
The COMe 2x220-Pin B2B connector provides the interfaces to Goldstone Switchboard as shown in Table 4.
Table 4. COMe B2B Connector Interfaces
B2B Interface | Device on COMe | Device on Goldstone Switch Board
PCIe Gen3 X16: Lanes PEG[31:16] | CPU: Intel I3-8100H | PEG[23:16] & PEG[31:28] not used; PEG[27:24] for M.2 NVMe SSD (X4). See M.2 NVMe SSD Section
PCIe Gen3 X15: Lanes PCIe[15:8][6:0] | PCH: Intel FH82CM246 | PCIe[15:8] & [6:4] not used, PCIe[1:0] for LS1, PCIe[3:2] for LS2,
1x optional PCIe clock | PCH: Intel FH82CM246 | As Recovery clock, but NC on COMe
1GbE MDI: MDI[3:0]P/N; ACT, LINK, LINK100, LINK1000, CTREF (not used) | Intel I219 1000Base-T Ethernet PHY | J101 RJ45 w/ transformer. ACT, LINK, LINK100, LINK1000 to Main CPLD and Main CPLD to drive RJ45 ACT & LINK LEDs
2x SATA 3.0 | PCH: SATA Port 0 & SATA Port 1 | Only SATA Port 0 is used for M.2 SATA SSD
4x USB 2.0 | PCH: USB 2.0 Ports 1-4 (total 14 for PCH) | Only USB 2.0 Port 1 is used in Goldstone for Front Panel USB 2.0 Port. See USB Section.
1x LPC bus | PCH: LPC, 9 signals | Main CPLD internal registers for COMe
I2C Master from CPLD (LPC2I2C) | CPLD: I2C-CPLD-B2B-SCL/SDA (I2C-B2B-SCL/SDA) | B2B: I2C-B2B-SCL/SDA >> I2C-SW-SCL/SDA >> U66/U67 >> Many I2C devices on Switchboard
Optional Carrier BMC I2C bus on SMBus pins | PCH | N/A as no BMC on Switchboard
SPI0 for Programming | MUX with PCH to program SBIOS Flash: U23 > U53 (MT25QL256) | Connector J60 for use with an SPI programmer
Optional GSPI0 for future usage | PCH: GSPI0, 4 bits: CEX_EROT_OOB_* | LS10 CEC1736 QSPI1 for EC-FW & AP-FW update, selected w/ 1-to-2 MUX controlled by Main CPLD
4 GPI and 4 GPO (CPLD Field Upgrade by SW based on GPO/GPI pins) | CPLD GPIO_JTAG, SW drives GPIOs for JTAG to update GS CPLD FWs | GS Switchboard Main CPLD and Port CPLD JTAG interface
COM Express spec Misc. signals | 1. TYPE2-0: COMe Type | 1. Tied to GND on Switchboard
Table 5. COMe Test Items
Req-#: GOLDSTONE-COME000
Name: COMe Reset by Reset Button
Location: Reset Button on front panel.
Mandatory: Shall check COMe reset button function.
Expected Result: COMe resets after the button is pressed.
Guidance: Front panel S6, Manual Check. As part of front panel Reset Button test
Test Method: Manual
Priority: P1 EVT
Nvbug tracking: N/A - One Diag cannot implement manual checks
Production Stage: FCT-B

Req-#: GOLDSTONE-COME003
Name: COME CPU SO-DIMM Quantity & Capacity
Mandatory: Shall check COMe SO-DIMM Quantity & Capacity
Guidance:
Test Method: Auto
Priority: P1 EVT
Nvbug tracking: Same bug # as GOLDSTONE-COME001
Production Stage: FCT-R

Req-#: GOLDSTONE-COME005
Name: COME LAN Port MAC address
Mandatory: Shall collect & report COMe LAN Port MAC addresses.
Guidance: COMe U9, I219-AT
Test Method: Auto
Priority: P1 EVT
Nvbug tracking: Same bug # as GOLDSTONE-COME001
Production Stage: FCT-B

Req-#: GOLDSTONE-COME006
Name: COME LAN Port Link Speed Check
Mandatory: Shall check link speed of COMe LAN Port
Guidance: COMe U9, I219-AT
Test Method: Auto
Priority: P1 EVT
Nvbug tracking: Kong-like coverage
Production Stage: FCT-B

Req-#: GOLDSTONE-COME007
Name: COME LAN Port Link Up LED
Location: please refer to Figure 12 for location
Mandatory: Shall check LED status of COMe LAN port.
Expected Result: The Link Up LED should be on / GREEN when connected to a TOR or Ethernet Switch
Guidance: Manual Check
Test Method: Manual
Priority: P1 EVT
Nvbug tracking: N/A - One Diag cannot implement manual checks
Production Stage: FCT-B
Req-#: GOLDSTONE-COME008
Name: COME LAN Port Activity LED
Location: please refer to for location
Mandatory: Shall check LED status of COMe LAN port.
Expected Result: The Activity LED should be blinking when connected to a TOR or Ethernet Switch
Guidance: Manual Check
Test Method: Manual
Priority: P1 EVT
Nvbug tracking: N/A - One Diag cannot implement manual checks
Production Stage: FCT-B

Req-#: GOLDSTONE-COME010
Name: COME LAN PORT MISC Signals Checking
Mandatory: Shall check the ACT, LINK, LINK100 & LINK1000 signals from COMe to Switchboard Main CPLD.
Expected Result: With the LAN Port connected to an external working switch, ACT, LINK and LINK1000 should be asserted, LINK100 deasserted.
Guidance: Please see Switchboard Main CPLD Register Definition
Test Method: Auto
Priority: P1
Nvbug tracking: FA enhancement, lower priority request. Jinshui L… to provide what cable is used for manufacturing test to know which signal to check
Production Stage: FCT-R

Req-#: GOLDSTONE-COME012
Name: COME-Main CPLD SYNC Bus Checking
Mandatory: Shall check that the COMe to Switchboard Main CPLD SYNC Bus works successfully and reliably.
Method: Please see COMe CPLD and Switchboard Main CPLD Specs
Guidance: B2B Connector Pins A15, A18, A24, B18
Test Method: Auto
Priority: P1 EVT
Nvbug tracking: Jinshui L… to clarify requirements before filing bug - One Diag team needs more details on how exactly to test SYNC bus and assess effort
Production Stage: FCT-R
Req-#: GOLDSTONE-COME013
Name: Switchboard 3V RTC Power Checking
Mandatory: Shall check that the 3V RTC Power from Switchboard is present.
Method: Please see COMe CPLD Spec
Guidance: B2B Connector Pin A47
Test Method: Auto
Priority: P1 EVT
Nvbug tracking: Jinshui L… to file and post "DGX Diag Tools" bug
Production Stage: FCT-R
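As an illustration of how the expected result of GOLDSTONE-COME010 could be evaluated automatically, the sketch below compares observed signal states against the expected states for a 1000Base-T link. It is a sketch under assumptions: the signal states are passed in as logical values (True = asserted), because the Main CPLD register layout and signal polarity are defined in the Switchboard Main CPLD Register Definition and are not reproduced here. The names `EXPECTED_AT_1G` and `check_lan_misc_signals` are hypothetical.

```python
from typing import Dict, List

# Expected logical states with the LAN port linked to a working 1000Base-T switch,
# taken from the GOLDSTONE-COME010 expected result.
EXPECTED_AT_1G: Dict[str, bool] = {
    "ACT": True,
    "LINK": True,
    "LINK1000": True,
    "LINK100": False,
}


def check_lan_misc_signals(observed: Dict[str, bool]) -> List[str]:
    """Return a list of failure messages; an empty list means the check passed."""
    failures: List[str] = []
    for name, expected in EXPECTED_AT_1G.items():
        if name not in observed:
            failures.append(f"{name}: not reported")
        elif observed[name] != expected:
            want = "asserted" if expected else "deasserted"
            failures.append(f"{name}: expected {want}")
    return failures


# Example with hypothetical readings: LINK100 is wrongly asserted.
sample = {"ACT": True, "LINK": True, "LINK1000": True, "LINK100": True}
print(check_lan_misc_signals(sample))
```

Returning a list of messages rather than a single pass/fail flag keeps every failing signal visible in the test log.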
Figure 16. COMe Type 7 2x220-Pin Board-2-Board Connectors
7.4 LS10 NVLink4 Switch IC
Goldstone Switchboard uses 2pcs LS10 NVLink Switch ICs to implement NVLink4 protocol switching between GPUs.
Each LS10 device has 64 NVLink4 ports. Each port has 2 lanes (2x TX SerDes + 2x RX SerDes) running 50GT/s PAM4
signaling, which gives a 100Gb/s data rate per SerDes and thus 200Gb/s of bandwidth in each of the TX and RX directions, i.e., 25GB/s for TX &
25GB/s for RX.
With 64 ports per LS10, the throughput per chip is 12.8Tb/s x 2 (TX+RX) = 25.6Tb/s.
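The bandwidth figures above can be reproduced step by step. The sketch below is illustrative; it treats the quoted 50GT/s as the symbol rate and uses 2 bits per PAM4 symbol, an assumption that is consistent with the stated 100Gb/s per SerDes. The function name `ls10_bandwidth` is hypothetical.

```python
def ls10_bandwidth(gsym_per_s: float = 50.0, bits_per_symbol: int = 2,
                   lanes_per_port: int = 2, ports: int = 64) -> dict:
    """Derive per-SerDes, per-port and per-chip bandwidth from the signaling rate."""
    serdes_gbps = gsym_per_s * bits_per_symbol            # PAM4 carries 2 bits per symbol
    port_gbps = serdes_gbps * lanes_per_port              # per direction (TX or RX)
    port_gbytes_per_s = port_gbps / 8                     # bits to bytes
    chip_tbps_per_direction = port_gbps * ports / 1000    # Gb/s to Tb/s
    chip_tbps_total = chip_tbps_per_direction * 2         # TX + RX
    return {
        "serdes_gbps": serdes_gbps,
        "port_gbps": port_gbps,
        "port_gbytes_per_s": port_gbytes_per_s,
        "chip_tbps_per_direction": chip_tbps_per_direction,
        "chip_tbps_total": chip_tbps_total,
    }


for name, value in ls10_bandwidth().items():
    print(f"{name}: {value}")
```

These are raw signaling figures; protocol and encoding overhead are not modeled.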
LS10 provides a 2-lane PCIe Gen3 EP to host/CPU (COMe in Goldstone) for control & management via 64MB Memory
space. In addition, an I2C Slave (I2CS) is available for OOB communication to LS10 internal security processor.
LS10 uses a 4-pin SPI interface for LS10 ROM/FW Flash and in Goldstone Switch design, LS10 accesses the ROM/FW
Flash via EROT CEC1736 device for secure boot.
For more details of LS10 interfaces, please refer to Figure 17. LS10 NVLink4 Switch IC External Interfaces.
Figure 18. Goldstone Switchboard LS10 Port Assignment
Goldstone Switchboard uses 2pcs LS10 NVLink4 Switch devices to provide a total of 128 NVLink4 ports, as shown in Figure
18. Goldstone Switchboard LS10 Port Assignment. As shown, for each LS10, 56 of the 64 ports are used and 8 ports are
not used; of the 56 ports used, 24 ports are connected to an 8-Pair x 12 backplane connector from TE and 32 ports are connected to 8
OSFP connectors.
Only limited information is available about the LS10 test capabilities. What is known is listed below:
✔ LS10 supports internal near-end digital loopback for each port
✔ LS10 supports external PHY-level loopback with TX connected to RX on the same port
✔ LS10 supports a traffic generator per port (thus loopback tests could be run on all ports simultaneously
for stress testing).
For Goldstone Switchboard product manufacturing testing, OSFP loopback dongles and Backplane loopback cables are
used to connect each link’s TX to its RX.
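When planning a loopback stress run, it helps to know how many bits must pass error-free before the result supports a given BER limit. Table 6 (GOLDSTONE-LS008) calls for 30 minutes at BER < 1E-13. The sketch below uses the standard zero-error bound N = -ln(1 - CL) / BER; the 95% confidence level and the function names are assumptions for illustration, and the 100Gb/s rate is the per-SerDes figure quoted in this section.

```python
import math


def bits_for_zero_error_bound(ber_limit: float, confidence: float = 0.95) -> float:
    """Bits that must pass with zero errors to claim BER < ber_limit at the given confidence."""
    if ber_limit <= 0.0 or not 0.0 < confidence < 1.0:
        raise ValueError("ber_limit must be > 0 and confidence must be in (0, 1)")
    return -math.log(1.0 - confidence) / ber_limit


def bits_transferred(rate_gbps: float, seconds: float) -> float:
    """Bits carried by one SerDes running at rate_gbps for the given time."""
    return rate_gbps * 1e9 * seconds


needed = bits_for_zero_error_bound(1e-13)        # assumed 95% confidence
per_serdes = bits_transferred(100, 30 * 60)      # 100 Gb/s for 30 minutes
print(f"bits needed:        {needed:.3e}")
print(f"bits in 30 minutes: {per_serdes:.3e}")
print(f"margin:             {per_serdes / needed:.1f}x")
```

Under these assumptions a single SerDes carries roughly six times the required number of bits in 30 minutes, so an error-free run is long enough to support the limit on a per-lane basis.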
Table 6. LS10 Test Items
GOLDSTO Name: LS10 PCIe ID & Version Check Auto P0 Same bug FCT-R
NE-LS000 Mandatory: Shall check each of the 2pcs LS10 Devices’ PCIe ID and TS1B as
Version. GOLDSTON
Guidance: LS10 PCIe Configuration Space E-COME001
GOLDSTO Name: LS10 PCIe I/F width & speed check Auto P0 Same as FCT-R
NE-LS001 Mandatory: Shall check each LS10’s PCIe I/F width & speed TS1B GOLDSTON
Guidance: LS10 PCIe Gen3, X2 E-COME001
GOLDSTO Name: LS10 Memory Address Space Read/Write access check Auto P1 Indirectly FCT-R
NE-LS002 Mandatory: Shall check the read/write accesses to each LS10’s EVT covered by
Scratch memory/register MODS
Guidance: Each LS10 presents 64MB memory space to Host/COMe initialization
GOLDSTO Name: LS10 NVLink4 Port Internal loopback test Auto P0 Kong-like FCT-R
NE-LS005 Required: Shall perform internal loopback test without any error for TS1B coverage
all used NVLink4 Ports.
Guidance: Please refer to Figure 18. Goldstone Switchboard LS10
Port Assignment for used ports on each LS10
Req-#: GOLDSTONE-LS006
Name: LS10 NVLink4 Port OPT loopback test
Required: Shall perform external loopback test without any error for all used NVLink4 Ports using OSFP loopback dongles / backplane loopback.
Guidance: Please refer to Figure 18. Goldstone Switchboard LS10 Port Assignment for used ports on each LS10
Test Method: Auto | Priority: P0, TS1B | Nvbug tracking: Kong-like coverage | Bug # & TS: FCT-R
Req-#: GOLDSTONE-LS007
Name: LS10 NVLink4 Port EOM check w/ loopback
Required: Shall check & report NVLink4 port EOM for all used NVLink4 Ports using OSFP loopback dongles / backplane loopback.
Guidance: Please refer to Figure 18. Goldstone Switchboard LS10 Port Assignment for used ports on each LS10
Test Method: Auto | Priority: P0, TS1B | Nvbug tracking: Kong-like coverage | Bug # & TS: FCT-R
Req-#: GOLDSTONE-LS008
Name: LS10 NVLink4 Port Loopback on all used ports simultaneously
Required: Shall run max traffic test on all used LS10 ports with external loopback simultaneously for 30 minutes to stress SI, PI, power and thermal; the test should finish within the specified error limit (BER < 1E-13).
Guidance: Please refer to Figure 18. Goldstone Switchboard LS10 Port Assignment for used ports on each LS10
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Kong-like coverage. Note limitation in TREX - not all ports can be stressed at the same time as some would need to be dedicated as generator | Bug # & TS: Runin Test / stress test
Req-#: GOLDSTONE-LS008
Name: LS10 PCIe Reset Test
Required: Shall check if each LS10 device’s PCIe Reset (PEX-RST) is working properly during the final test.
Guidance: Set LS10’s PEX-RST to LOW (via U80/PCA9505) and check LS10’s PCIe Configuration Space Command Register Bits[2:0] = 0b000
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Same as GOLDSTONE-PCI001. Same as SBR test | Bug # & TS: Final Test
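The Command Register check named in the PCIe Reset Test guidance can be sketched as below. This is an illustration only, not MODS/DIAG code: it assumes the caller has already read the first bytes of an LS10's PCIe configuration space (for example from a Linux sysfs `config` file), and the function name is hypothetical. It relies on the standard PCI header layout, in which the 16-bit Command register sits at offset 0x04 and bits [2:0] are the I/O Space, Memory Space and Bus Master enables.

```python
import struct

PCI_COMMAND_OFFSET = 0x04         # 16-bit Command register in the PCI config header
PCI_COMMAND_ENABLE_MASK = 0x0007  # bit 0: I/O space, bit 1: memory space, bit 2: bus master


def command_enables_cleared(config_space: bytes) -> bool:
    """Return True when Command register bits [2:0] all read back as 0.

    Hypothetical helper: `config_space` is the raw configuration space
    (at least the first 6 bytes), little-endian as defined by PCI.
    """
    if len(config_space) < PCI_COMMAND_OFFSET + 2:
        raise ValueError("need at least the first 6 bytes of configuration space")
    (command,) = struct.unpack_from("<H", config_space, PCI_COMMAND_OFFSET)
    return (command & PCI_COMMAND_ENABLE_MASK) == 0
```

A test would call this after asserting PEX-RST and expect `True`; how the reset is driven through U80/PCA9505 is outside this sketch.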
Figure 20. OSFP Loopback Dongle
7.5 OSFP Ports
During Switchboard manufacturing testing, every OSFP port is populated with an OSFP loopback dongle for NVLink connectivity
and traffic testing; please refer to Section 7.4 for details. This section lists the other OSFP port items to be tested during
manufacturing testing:
● +3.3V to OSFP: tested indirectly
● GND to OSFP: tested indirectly
● I2C: SCL and SDA, described here.
● INT_RST_N: Active Low Interrupt from OSFP and Active Low Reset to OSFP
● PRN_LPW_N: Active Low Present from OSFP and Active Low Low-Power mode to OSFP
● OSFP_POWER_GOOD: OSFP Port Load Power Switch MP5087 Status, Active High, not from OSFP Port
● OSFP_POWER_ENABLE: OSFP Port Load Power Switch MP5087 Enable Signal, Active High.
OSFP transceivers adopt the “Common Management Interface Specification (CMIS)” for module management over an I2C interface,
and the I2C address used is 0xA0 (8-bit); please refer to CMIS for details.
As shown in Figure 22, the OSFP modules’ I2C management interfaces can be accessed by LS10 via its I2CB (LS1 for OSFP Ports 1-8
and LS2 for OSFP Ports 9-16), or by the COMe CPU via the COMe CPU > PCH > LPC > COMe CPLD > I2C_B2B/I2C_SW interface. By default,
during normal operations LS10 accesses and manages the OSFP modules.
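The OSFP management address above is quoted in 8-bit form (0xA0), while many host-side I2C tools expect the 7-bit form. A minimal sketch of the conversion follows; the helper names are hypothetical and it assumes the 8-bit value is the write-form address (R/W bit = 0).

```python
def to_7bit(addr_8bit: int) -> int:
    """Convert an 8-bit (write-form) I2C address to its 7-bit form."""
    if not 0 <= addr_8bit <= 0xFF:
        raise ValueError("8-bit I2C address must be in 0x00..0xFF")
    return addr_8bit >> 1


def read_form(addr_8bit: int) -> int:
    """Return the 8-bit read-form address (R/W bit set) for the same device."""
    if not 0 <= addr_8bit <= 0xFF:
        raise ValueError("8-bit I2C address must be in 0x00..0xFF")
    return addr_8bit | 0x01


# The CMIS management address quoted above, in both forms.
print(hex(to_7bit(0xA0)), hex(read_form(0xA0)))
```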
Table 7. OSFP Ports Non-traffic test items
Req-# | Requirement | Test Method | Priority | Nvbug tracking | Bug # & TS
Req-#: GOLDSTONE-OSFP01
Name: OSFP Ports’ I2C Access (16x OSFP Ports)
Mandatory: Shall check each OSFP Port’s I2C SCL/SDA signals by reading the OSFP Loopback Dongle’s internal FRU EEPROM
Guidance: Read the OSFP FRU EEPROM only; no write is needed
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Jinshui … to file and post bug # | Bug # & TS: FCT-R
Req-#: GOLDSTONE-OSFP02
Name: OSFP Ports’ INT/RST-N Signals (16x OSFP Ports)
Mandatory: Shall check each OSFP Port’s INT/RST-N signal
Method: The Port CPLD toggles RST-N and reads the INT-N status; the two should be the same. See OSFP Spec & Port CPLD Spec for more details
Guidance: Controlled by Port CPLD
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: FA enhancement | Bug # & TS: FCT-R
Req-#: GOLDSTONE-OSFP03
Name: OSFP Ports’ PRN-LPW-N Signals (16x OSFP Ports)
Mandatory: Shall check each OSFP Port’s PRN-LPW-N signal
Method: The Port CPLD toggles LPW-N and reads the PRN-N status; the two should be the same. Need to check OSFP Spec & Port CPLD Spec for more details
Guidance: Controlled by Port CPLD
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: FA enhancement | Bug # & TS: FCT-R
Req-#: GOLDSTONE-OSFP04
Name: OSFP Ports’ Green LED1 (16x OSFP Ports)
Mandatory: Shall check each OSFP Port’s Green LED1 by toggling the related Port CPLD register bit and checking the LED status
Guidance: Controlled by Port CPLD
Test Method: Manual | Priority: P1, EVT | Nvbug tracking: N/A - One Diag cannot implement manual checks | Bug # & TS: FCT-B
Req-#: GOLDSTONE-OSFP05
Name: OSFP Ports’ Green LED2 (16x OSFP Ports)
Mandatory: Shall check each OSFP Port’s Green LED2 by toggling the related Port CPLD register bit and checking the LED status
Guidance: Controlled by Port CPLD
Test Method: Manual | Priority: P1, EVT | Nvbug tracking: N/A - One Diag cannot implement manual checks | Bug # & TS: FCT-B
Req-#: GOLDSTONE-OSFP06
Name: OSFP Ports’ Power Good Signal Status
Mandatory: Shall check each OSFP Port’s Power Good Signal
Method: Toggle an OSFP port’s Power enable bit and check Power Good signal status
Guidance: Port CPLD Power Enable and Good Register
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: FA enhancement | Bug # & TS: FCT-R
Req-#: GOLDSTONE-OSFP07
Name: OSFP Ports’ Power Enable Signal
Mandatory: Shall check each OSFP Port’s Power Enable Signal
Method: Toggle an OSFP port’s Power enable bit, check the Power Good signal status and read the OSFP Port FRU EEPROM. When enabled, the Power Good signal should be high and the FRU EEPROM should be readable; when disabled, the Power Good signal should be low and the FRU EEPROM should not be readable.
Guidance: Port CPLD Power Enable and Good Register
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: FA enhancement | Bug # & TS: FCT-R
Req-#: GOLDSTONE-OSFP08
Name: OSFP Ports’ Temperature
Mandatory: Shall check each OSFP Port’s Temperature
Method: Read OSFP modules’ Temperature sensor value via its I2C.
Expected Result: All OSFP modules’ temperature reading should be below 70°C.
Guidance: OSFP Module’s CMIS Spec
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Jinshui … to file and post bug # for OSFP telemetry | Bug # & TS: FCT-R
Req-#: GOLDSTONE-OSFP09
Name: OSFP Ports’ Power Voltage
Mandatory: Shall check each OSFP Port’s Power Voltage
Method: Read OSFP modules’ power voltage sensor value via its I2C.
Expected Result: All OSFP modules’ voltage reading should be within spec (nominal 3.3V).
Guidance: OSFP Module’s CMIS Spec
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Same as GOLDSTONE-OSFP08 - include Voltage | Bug # & TS: FCT-R
7.6 COMe & LS10 I2C Tree
Please refer to Figure 21 for the Goldstone I2C tree and Table 9 for detailed information about the I2C devices connected to the
COMe. As shown, these are the related interfaces between the COMe and the Goldstone Switchboard:
● I2C_B2B_SCL/SDA: Main I2C interface from COMe CPLD LPC2I2C for the majority of I2C devices; it is accessed via the COMe PCH
LPC interface.
● LPC: from COMe PCH to Switchboard Main CPLD
● EROT_ATTEST_I2C: not used as EROT_ATTEST_I2C on Switchboard is connected to I2C_B2B_SCL/SDA
● SMB_SCL/SDA: I2C_TESTING_GPIO_SCL/SDA on Switchboard, from COMe PCH SML1
Figure 23. I2C Devices on COMe PCH SML1
Table 8. GS Switchboard and PDB I2C Devices Test Items
Expected Result: The I2C devices found match the I2C device list in
Table 9
Guidance: See Table 9 for I2C devices
Table 9. COMe & LS10 I2C Devices on Switchboard & PDB
U97: CEC1736 I2C06 Attest Interface for LS10-1 (G1_A) See CEC1736 Spec
U124: CEC1736 I2C06 Attest interface for LS10-2 (G1_B) See CEC1736 Spec
PU10: MP8880 LDO P12V_STBY 0x2C
COMe: PCH > SML1 > B2B > I2C_TESTING_GPIO_* | GS Switchboard | U77: Main CPLD | See Main CPLD Spec
COMe: PCH > SML1 > B2B > I2C_TESTING_GPIO_* | GS Switchboard | U80: PCA9505 40 Pins GPIO Device | 0x42
U1_CP1: Port CPLD LS1 I2CA for OSFP Ports 1-8 control & status | See Port CPLD Spec
LS1 (LS10-A) I2CB | U5: PCA9847, G1_A LS1 I2CB fanout buffer | 0xE2: select which port
U1_CP1: Port CPLD LS2 I2CA for Ports 9-16 control & status | See Port CPLD Spec
LS2 (G1_B) I2CA | GS Switchboard | U5_36: MP2975 for LS2 VDD | 0xC4
LS2 (G1_B) I2CA | GS Switchboard | U4_36: MP2975 for LS2 DVDD & HVDD | 0xCA
U7: PCA9847 I2C Fanout buffer for LS2 I2CB | 0xE2: select which port
7.7 Goldstone Switchboard Sensors
Please refer to Section 8, Sensor List, for the full list of sensors. This section covers only the standalone sensors; it does not cover
the built-in sensors of the M.2 SSDs and OSFP Transceivers / Loopback Dongles, which are covered in their own sections. The sensors
on the PDB are covered here so that any PDB error is caught during Switchboard manufacturing testing, even though the PDB is a golden board
for Switchboard manufacturing testing.
Please note that during normal operations, the LS10 devices access their Voltage Regulators and the OSFP modules’ sensors via their
I2CA and I2CB interfaces, but the COMe CPU can also access them under Main CPLD Register control.
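Most sensor items in this section share the same expected result: every reading is present and within its normal range. A minimal sketch of that comparison step follows. The sensor names and limit values are placeholders invented for illustration (the real limits come from the specs referenced in each item), and the I2C register reads themselves are out of scope here.

```python
from typing import Dict, List, Tuple

Limits = Dict[str, Tuple[float, float]]  # sensor name -> (low, high), inclusive


def out_of_range(readings: Dict[str, float], limits: Limits) -> List[str]:
    """Return one message per sensor that is missing or outside its limits."""
    failures = []
    for name, (low, high) in limits.items():
        value = readings.get(name)
        if value is None:
            failures.append(f"{name}: no reading")
        elif not low <= value <= high:
            failures.append(f"{name}: {value} outside [{low}, {high}]")
    return failures


# Placeholder names and limits, not taken from any spec.
example_limits = {"PDB_U8_TEMP_C": (0.0, 70.0), "PDB_U29_TEMP_C": (0.0, 70.0)}
print(out_of_range({"PDB_U8_TEMP_C": 41.5, "PDB_U29_TEMP_C": 38.0}, example_limits))
```

An empty list means the item passes; any entry identifies the sensor to log against the test record.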
Req-#: GOLDSTONE-SNR001
Name: Switchboard PDB Temperature Sensor Checking
Mandatory: Shall check the following 2x TMP75 Temperature sensors on the Goldstone Switchboard PDB: U8, U29.
Method: Read these temperature sensors’ internal registers for current temperature values over the I2C bus.
Expected Result: Both temperature sensors should return valid temperature values that are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address.
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR002
Name: Switchboard PDB 54V HSC Sensor Checking
Mandatory: Shall check PDB LM5066 54V HSC (PU4) Status, Voltage & Current sensors.
Method: Read PU4 LM5066’s internal registers for Input/Output/MOSFET Status, and sensor values for Input Voltage / Current / Power, Output Voltage / Current, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and LM5066 data sheet for internal register description.
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR003
Name: Switchboard PDB P12V DC-DC Converter Sensor Checking
Mandatory: Shall check PDB Switchboard P12V PU7 & PU8 DC-DC Converters’ Status, Voltage & Current sensors.
Method: Read PU7 & PU8 U50SU4P180’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current, Output Voltage / Current, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and U50SU4P180 data sheet for internal register description.
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR004
Name: Switchboard PDB 12V_STBY DC-DC Converter Sensor Checking
Mandatory: Shall check PDB 12V_STBY MP8880 DC-DC Converter (PU10, PU11) Status, Voltage & Current sensor values.
Method: Read PU10 & PU11 MP8880’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current, Output Voltage / Current, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and MP8880 data sheet for internal register description.
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR005
Name: Switchboard PDB 12V_FAN DC-DC Converter Sensor Checking
Mandatory: Shall check PDB 12V_FAN Q54SH12060 DC-DC Converter (PU5) Status, Voltage & Current sensor values.
Method: Read PU5 Q54SH12060’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current / Power, Output Voltage / Current, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and Q54SH12060 data sheet for internal register description.
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR006
Name: Switchboard LS1 VDD DC-DC Converter Sensor Checking
Mandatory: Shall check LS1 VDD MP2975 DC-DC Converter (U5_35) Status, Voltage, Current & Power sensor values.
Method: Read U5_35 MP2975’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current / Power, Output Voltage / Current / Power, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and MP2975 data sheet for internal register description. Only PWM1-PWM4 are used.
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR007
Name: Switchboard LS1 DVDD & HVDD DC-DC Converter Sensor Checking
Mandatory: Shall check LS1 DVDD & HVDD MP2975 DC-DC Converter (U4_35) Status, Voltage, Current & Power sensor values.
Method: Read U4_35 MP2975’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current / Power, Output Voltage / Current / Power, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and MP2975 data sheet for internal register description. MP2975’s PWM1-PWM4 for DVDD & PWM6-PWM8 for HVDD
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR008
Name: Switchboard LS2 VDD DC-DC Converter Sensor Checking
Mandatory: Shall check LS2 VDD MP2975 DC-DC Converter (U5_36) Status, Voltage, Current & Power sensor values.
Method: Read U5_36 MP2975’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current / Power, Output Voltage / Current / Power, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and MP2975 data sheet for internal register description. Only PWM1-PWM4 are used
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR009
Name: Switchboard LS2 DVDD & HVDD DC-DC Converter Sensor Checking
Mandatory: Shall check LS2 DVDD & HVDD MP2975 DC-DC Converter (U4_36) Status, Voltage, Current & Power sensor values.
Method: Read U4_36 MP2975’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current / Power, Output Voltage / Current / Power, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and MP2975 data sheet for internal register description. MP2975’s PWM1-PWM4 for DVDD & PWM6-PWM8 for HVDD
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR010
Name: Switchboard OSFP Ports 3.3V DC-DC Converter Sensor Checking
Mandatory: Shall check OSFP Ports 3.3V MP2975 DC-DC Converter (U1_33) Status, Voltage, Current & Power sensor values.
Method: Read U1_33 MP2975’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current / Power, Output Voltage / Current / Power, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and MP2975 data sheet for internal register description. Only PWM1, PWM2, PWM7 & PWM8 are used.
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR011
Name: COMe VCORE & VCCSA DC-DC Converter Sensor Checking
Mandatory: Shall check COMe VCORE & VCCSA MP2975 DC-DC Converter (U43) Status, Voltage, Current & Power sensor values.
Method: Read U43 MP2975’s internal registers for Input/Output Status, and sensor values for Input Voltage / Current / Power, Output Voltage / Current / Power, and Temperature.
Expected Result: Return valid values without any fault event and Voltage, Current, Power and Temperature sensor values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and MP2975 data sheet for internal register description. Only PWM1, PWM2, PWM3 & PWM8 are used.
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-SNR012
Name: COMe Voltage Sensor Checking
Mandatory: Shall check COMe Voltage Sensor (U40) MAX11603’s values
Method: Read U40 MAX11603’s internal registers for voltage values.
Expected Result: Return valid values without any fault event and Voltage values are within the normal ranges per Spec.
Guidance: See Table 9 & Table 25 for sensors’ I2C Address, and MAX11603 data sheet for internal register description. All 8x input channels are used.
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Same bug # as GOLDSTONE-SNR000 | Bug # & TS: FCT-R
7.8 USB
There is one USB 2.0 Type A Port on Goldstone Switchboard accessible from the Goldstone Node faceplate as shown in Figure 12.
This USB 2.0 Type A Port is connected to COMe CPU USB Port 1.
The USB 2.0 Type A connector pinout on the Goldstone faceplate is shown in Figure 25. The USB 2.0 port’s current is limited by a TI
TPS25200 eFuse device; in addition, the USB 2.0 port can be switched to I2C-B2B-SCL/SDA (to/from the COMe CPLD) under software
control via a register in the Main CPLD. Manufacturing testing will not test this optional USB-I2C MUX feature.
During manufacturing test, the USB 2.0 port will be tested with a USB 2.0 Flash Drive.
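The write, read back and compare sequence used for the USB port check can be sketched as follows. This is a minimal illustration, assuming the flash drive is already mounted and its mount point is passed in by the caller; the function name, file name and default size are hypothetical.

```python
import os
from pathlib import Path


def write_read_compare(mount_point: str, size_bytes: int = 1 << 20) -> bool:
    """Write random data to the drive, read it back, and compare.

    Note: the read may be served from the OS page cache rather than the
    device; a production test would need to defeat caching (for example by
    remounting or using direct I/O), which this sketch does not do.
    """
    payload = os.urandom(size_bytes)
    target = Path(mount_point) / "usb_rw_test.bin"  # hypothetical scratch file
    try:
        with open(target, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device
        with open(target, "rb") as f:
            readback = f.read()
    finally:
        target.unlink(missing_ok=True)  # Python 3.8+; always clean up
    return readback == payload
```

Timing the two `with` blocks separately would give a rough throughput figure for a performance check, subject to the same caching caveat.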
Req-#: GOLDSTONE-USB000
Name: USB 2.0 Port Read / Write Test
Mandatory: Shall check the USB 2.0 Port Read / Write operations with a USB2.0 Flash Drive.
Method: With a USB 2.0 Flash Drive attached to the USB 2.0 Port, perform Write and Read operations to the Flash Drive, compare the read back data against the data written.
Expected result: The read & write operations on the USB 2.0 Flash Drive should finish normally and the readback data should match the data written.
Test Method: Auto - Jins… how ? - USB installation is not automated | Priority: P1, EVT | Nvbug tracking: Jinshui … to confirm if the manufacturing test environment will be able to install a USB drive. | Bug # & TS: FCT-R
Req-#: GOLDSTONE-USB001
Name: USB 2.0 Port Read / Write Performance
Mandatory: Shall check the USB 2.0 Port Read / Write operations performance with a USB2.0 Flash Drive.
Method: With a USB 2.0 Flash Drive attached to the USB 2.0 Port, perform Write and Read operations to the Flash Drive, and measure the performance
Expected result: The read & write operation performance should be at least 75% of the specification.
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Same as GOLDSTONE-USB000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-USB002
Name: USB 2.0 Port Enable Control
Mandatory: Shall check USB 2.0 Port Power Enable Control function.
Method: With a USB 2.0 Flash Drive attached to the USB 2.0 Port, disable the USB Power eFuse TPS25200 via the register of Main CPLD, wait 20s, then read the USB Flash drive.
Expected result: The USB 2.0 Flash Drive Read operation should fail when the eFuse is disabled.
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Same as GOLDSTONE-USB000 | Bug # & TS: FCT-R
Req-#: GOLDSTONE-USB003
Name: USB 2.0 Port eFuse Fault Signal
Mandatory: Shall check USB 2.0 Port eFuse Fault signal status.
Method: With a USB 2.0 Flash Drive attached to the USB 2.0 Port, read the USB 2.0 Port eFuse Fault Signal status from COMe CPLD Register.
Expected result: The eFuse Fault signal should be High, indicating no fault (over-current / temperature / voltage) during normal operations. Please note that a fault condition cannot be generated during normal operations.
Guidance: COMe CPLD Pin A10, Signal GP_USB01_OC_B2B_L, and PCH USB2_OC0# and USB2_OC1# Pins
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: FA enhancement | Bug # & TS: FCT-R
7.9 COMe UART port
The Goldstone Switchboard has one RS-232 UART port with an RJ45 connector accessed via the front panel; please refer to Figure 12 for its
location. The RJ45 pinout for this RS-232 port is shown in Figure 26. This RS-232 UART port is connected to UART0 of the COMe PCH
(FH82CM246-SR40E):
● PCH UART0: SE_UART0_* on B2B Connector, and connected to Front panel RS-232 RJ45 connector below
● PCH UART1: SE_UART1_* on B2B Connector, not used on GS Switchboard.
During manufacturing test, this RS-232 port is connected to a Terminal Server (IOLAN STS24) for remote access over Internet or to a
local computer for COMe Console access.
7.10 Goldstone Switch Node PDB and Fans
The Goldstone Switch Nodes (as well as the Keystone Compute Nodes) are powered by the 54VDC distributed through the backplane
bus bar. On each node, there is a Power Distribution Board (PDB) for converting the 54VDC to the voltages required for the node.
As shown in Figure 27, the Switchboard PDB receives the 54VDC via Bus bar clips J8 & J9, and then the 54VDC goes through a 125A
fuse before feeding a Hot-Swap Controller (HSC). The PDB is designed with multiple HSCs for supply chain considerations.
The default HSC is the LM5066 from TI, and the other two HSC options are the XDP710 and the LTC4285 (the latter from Linear Tech). All 3 HSCs
have an I2C interface to their internal registers for status, control and voltage/current measurement.
The I2C addresses of these HSCs are strapped as follows:
● LM5066: 0x22 (8-bit)
● XDP710: 0x22 (8-bit)
● LTC4285: 0x22 (8-bit)
The 54VDC after the HSC is P54V_HSC, which is used to generate the following voltages:
● P12V: 12VDC to Switchboard via 2pcs U50SU4P180P DC-DC converters. The I2C addresses for the 2pcs DC-DC are:
▪ 0x26 (8-bit) for PU7
▪ 0x2E (8-bit) for PU8
● P12V_STBY: 12VDC Standby Power to Switchboard via one FAN65004 LDO, its power source is P54V_HSC and also enabled
by HSC’s Power Good output. The HSC could be enabled or disabled from Switchboard, and once the P54V_HSC is gone the
P12V_STBY is also unavailable. The design supports two LDO sources: FAN65004 and MP888. The I2C addresses for these
LDOs are:
▪ N/A: PU9, FAN65004
▪ 0x2C: PU10, MP888
▪ 0x24: PU11, MP888
● P12V_FAN: 12VDC to fans before the E-Fuse for each fan via a Q54SH12060 DC-DC converter. The I2C address for
Q54SH12060 is:
▪ 0x36: PU5, Q54SH12060
● P3V3_STBY: Generated from the P12V_STBY with a TPS563211 for the 3.3V circuits on the PDB.
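The addresses listed above lend themselves to a simple expected-versus-found comparison after an I2C bus scan. The sketch below is illustrative only: the dictionary copies the PDB addresses exactly as quoted in this section (assumed here to all be in 8-bit form), the function name is hypothetical, and the scan itself is not shown. Whether PU10 and PU11 respond depends on which LDO option is populated.

```python
from typing import Dict, Iterable, List, Tuple

# Addresses as quoted in this section (assumed 8-bit form throughout).
PDB_HSC_BUS_DEVICES: Dict[int, str] = {
    0x22: "PU4 hot-swap controller",
    0x26: "PU7 U50SU4P180P DC-DC (P12V)",
    0x2E: "PU8 U50SU4P180P DC-DC (P12V)",
    0x36: "PU5 Q54SH12060 DC-DC (P12V_FAN)",
    0x2C: "PU10 LDO (if the I2C-capable option is populated)",
    0x24: "PU11 LDO (if the I2C-capable option is populated)",
}


def compare_scan(found: Iterable[int], expected: Dict[int, str]) -> Tuple[List[int], List[int]]:
    """Return (missing, unexpected) address lists, each sorted ascending."""
    found_set = set(found)
    missing = sorted(set(expected) - found_set)
    unexpected = sorted(found_set - set(expected))
    return missing, unexpected


missing, unexpected = compare_scan([0x22, 0x26, 0x2E, 0x36], PDB_HSC_BUS_DEVICES)
print([hex(a) for a in missing], [hex(a) for a in unexpected])
```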
There are 2pcs of connectors for interconnect between the Switchboard and the PDB with cables as shown in Table 14.
I2C-HSC-SCL/SDA
I2C ADDR | Ref # | I2C Device Description
0x22 | PU4 | LM5066 HSC (co-layout XDP710, LTC4285)
0x26 | PU7 | U50SU4P180P DC-DC for P12V
0x2E | PU8 | U50SU4P180P DC-DC for P12V
0x36 | PU5 | Q54SH12060 DC-DC converter
0x2C | PU10 | MP888 LDO
0x24 | PU11 | MP888 LDO
I2C-Temp-SCL/SDA
0x44 | U43 | PCA9555 I/O Expander
TBD | TBD | Chassis & BP FRU EEPROM
P3V3-STBY-MB: Standby 3.3V from Switchboard (MB), generated from P12V-STBY on MB. Cannot be tested automatically at FCT as it is not used on the PDB.
P3V3-STBY-PDB: Standby 3.3V from PDB, generated from P12V-STBY on PDB. If not available, the I2C interfaces to the PDB will not work.
FAN[X]-Present-N: Active Low PRESENT signal from FAN[X]; check with a Main CPLD register read.
I2C-TEMP-SCL/SDA: I2C interface from Switchboard Main CPLD to the temperature sensors on the PDB.
Global-WP: Active High Write Protection signal from Switchboard Main CPLD to the PDB FRU 24C02 EEPROM.
GP-TEMP-SNS-Alert: Active Low PDB temperature sensor (over-temp) Alert to Switchboard Main CPLD. There are 2pcs TMP75 temperature sensors on the PDB (U8 & U29); either one going over-temperature drives this signal LOW. The Alert signal status can also be read via I2C-TEMP-SCL/SDA.
GND: GND
J69 – J16 FORCED-POWER-OFF: From Switchboard Main CPLD, LOW to force power off (disable HSC) for all power supplies.
HOLD-POWER-ON: From Switchboard CPLD, High to force Power ON (enable HSC), but FORCED-POWER-OFF takes priority.
TRAY-PRESNT-N: Active low when the PDB is attached, from PDB to Switchboard Main CPLD.
PS-ON-N: From Switchboard Main CPLD, Active Low to enable Fans’ 12VDC, Switchboard 12VDC, each fan’s 12VDC, but not the P12V-STBY & P3V3-STBY.
P12V_IBC_PGD: Active High Power Good from PDB. High: both P54V to P12V DC-DC are good; Low: one or both P54V to P12V DC-DC are bad.
I2C_HSC_ALERT_N: Active Low I2C Alert signal for all HSC and DC-DC converters, to Switchboard Main CPLD.
Each Goldstone Node has 6x fans attached to the Goldstone PDB, but the fans are fully controlled by the Goldstone Switchboard Main CPLD
(U77, MACHX03D_9400LUT_BG400). For Goldstone Switchboard manufacturing testing, the Goldstone Node PDB is a finished good, but it
is still required to test the fan control signals between the Goldstone Switchboard and the fans.
Each fan has the following signals to Goldstone Switchboard Main CPLD:
● FAN[X]_PWM: Output from Switchboard Main CPLD to fan for Fan’s PWM control.
● FAN[X]_TACH_FRONT: TACH signal from front FAN of each fan unit (each fan unit has dual fans) to Main CPLD.
● FAN[X]_TACH_REAR: TACH signal from rear FAN of each fan unit (each fan unit has dual fans) to Main CPLD.
● FAN[X]_PRESENT: Active low Fan present signal from fan unit to Main CPLD.
● FAN[X]_LED: Active low Fan LED from Main CPLD to Fan tray
The TACH (tachometer) signal is a square wave with 50% duty cycle that indicates the fan’s speed, and the fan’s RPM is
calculated as follows (Cycles per Rotation is from the fan’s specification):
RPM = (Tach Pulse Frequency / Cycles per Rotation) * 60
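The RPM formula above translates directly into code. In this minimal sketch the function name is hypothetical, the TACH frequency is assumed to be in Hz, and the value of 2 cycles per rotation in the example is a placeholder; the real value comes from the fan's specification.

```python
def fan_rpm(tach_hz: float, cycles_per_rotation: int) -> float:
    """RPM = (Tach Pulse Frequency / Cycles per Rotation) * 60."""
    if cycles_per_rotation <= 0:
        raise ValueError("cycles_per_rotation must be positive")
    if tach_hz < 0:
        raise ValueError("tach_hz must not be negative")
    return tach_hz / cycles_per_rotation * 60.0


# Placeholder numbers: a 200 Hz TACH signal at 2 cycles per rotation.
print(fan_rpm(200.0, 2))
```

A PWM check could compute the RPM before and after a duty-cycle change and confirm that it moves in the same direction.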
The PDB generates the 12V (P12V_FAN) for the fans from the 54V bus bar input using a Q54SH12060RNDH DC-DC converter;
in addition, an NCP81295MNTXG Hot Swap Smart Fuse is used to control & regulate the P12V_FAN for
each fan, but the NCP81295 has no internal registers for software to read or write. The NCP81295’s function is therefore tested
indirectly: if a given fan is working normally, then the corresponding NCP81295 circuits are good.
Table 15. Test Items for Node PDB and Fans’ Signals
Req-#: GOLDSTONE-FAN000
Name: FAN’s PWM Signal and TACH Signals (6x Fans)
Mandatory: Shall check each fan’s PWM signal from Goldstone Switchboard Main CPLD
Method: During a fan’s normal operation, change its PWM duty cycle via Goldstone Switchboard’s Main CPLD internal registers, and then check FAN[X]_TACH_FRONT and FAN[X]_TACH_REAR signals to see if the TACH signals change accordingly.
Guidance: Goldstone Switchboard Main CPLD Registers
Test Method: Auto | Priority: P1, EVT | Nvbug tracking: Jinshui … to file and post bug # as fan sanity check | Bug # & TS: FCT-R
Req-#: GOLDSTONE-FAN002
Name: FAN’s LED Signal
Mandatory: Shall check each fan’s LED Signal from Main CPLD
Method: Toggle each fan’s LED register bit of Main CPLD and manually check the corresponding LED status.
Guidance: Goldstone Switchboard Main CPLD Registers
Test Method: Manual | Priority: P1, EVT | Nvbug tracking: N/A - One Diag cannot implement manual checks | Bug # & TS: FCT-B
7.11 System Thermal & Power Stress Test
The tests specified here verify that the Goldstone Switchboard works reliably with its devices at maximum TDP under
high DC operating ambient temperatures of around 31-35°C; this exercise also tests the power system.
Please also note that while the fans used in the Tester chassis are the same as those used in the Ranger production chassis,
the slot pitch of the Tester chassis is 2 OUs, larger than the Ranger chassis’ 1.36 OUs. The Tester chassis may therefore present
lower or higher thermal resistance and does not reflect the Ranger chassis’ thermal environment.
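The per-port error limits quoted in the stress test items of this section are the product of the BER target, the line rate and the test duration. A minimal sketch of that arithmetic follows, reading "100G" as 100 Gb/s per port (an assumption) and using a hypothetical function name.

```python
def max_bit_errors(ber_limit: float, line_rate_bps: float, duration_s: float) -> float:
    """Expected number of bit errors at the BER limit over the test duration."""
    if min(ber_limit, line_rate_bps, duration_s) < 0:
        raise ValueError("inputs must not be negative")
    return ber_limit * line_rate_bps * duration_s


# 20-minute run and 5-minute run at a BER limit of 1e-13 and 100 Gb/s.
print(round(max_bit_errors(1e-13, 100e9, 20 * 60), 6))
print(round(max_bit_errors(1e-13, 100e9, 5 * 60), 6))
```

The two results correspond to the limits of 12 and 3 errors per port used for the 1200 s and 300 s runs.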
Name: Temperature Read & Log before System Thermal & Power Stress Test
Mandatory: Shall check Goldstone Node temperature sensors before Stress Testing
Method: Read & log all temperature sensors.
Expected result: all temperature sensors in normal range.
Guidance: Please see Section 8 for Temperature sensor list.
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Same as GOLDSTONE-SNR000 | Bug # & TS: RIN
Req-#: GOLDSTONE-SYS001
Name: System Thermal & Power Stress Test
Mandatory: Shall check Goldstone Switchboard working normally under maximum NVLink traffic at 31-35°C ambient temperature
Method: Run maximum NVLink4 traffic on all used ports of the 2pcs LS10 devices for 20 minutes; then check the number of NVLink errors, read temperature and power sensors.
Expected result: the number of errors per port should be no more than 12 (10^-13 x 100G x 1200s = 12); and all sensors in normal range.
Guidance: Please see LS10 for details of PRBS testing
Test Method: Auto | Priority: P0, EVT | Nvbug tracking: Jinshui … to file bug and post bug #. Note that One Diag cannot control ambient temperature | Bug # & TS: RIN
Req-#: GOLDSTONE-SYS002
Name: System Thermal & Power Stress Test w/ Fan-1 Failure
Mandatory: Shall check Goldstone Switchboard working normally under maximum NVLink traffic at 31-35°C ambient temperature w/ Fan-1 failed.
Method: at the end of “GOLDSTONE-SYS001” test, turn off Fan-1 and continue running maximum NVLink4 traffic on all used ports of the 2pcs LS10 devices for 5 minutes; then check the number of NVLink errors, read temperature and power sensors.
Expected result: the number of errors per port should be no more than 3 (10^-13 x 100G x 300s = 3); and all sensors in normal range.
Guidance: Please see LS10 for details of PRBS testing
Test Method: Auto | Priority: P1 | Nvbug tracking: N/A - this should be part of validation instead of manufacturing test | Bug # & TS: RIN
Req-#: GOLDSTONE-SYS003
Name: System Thermal & Power Stress Test w/ Fan-2 Failure
Mandatory: Shall check Goldstone Switchboard working normally under maximum NVLink traffic at 31-35°C ambient temperature w/ Fan-2 failed.
Method: at the end of “GOLDSTONE-SYS002” test, turn on Fan-1 and turn off Fan-2, continue running maximum NVLink4 traffic on all used ports of the 2pcs LS10 devices for 5 minutes; then check the number of NVLink errors, read temperature and power sensors.
Expected result: the number of errors per port should be no more than 3 (10^-13 x 100G x 300s = 3); and all sensors in normal range.
Guidance: Please see LS10 for details of PRBS testing
Test Method: Auto | Priority: P1 | Nvbug tracking: N/A - this should be part of validation instead of manufacturing test | Bug # & TS: RIN

GOLDSTONE-SYS004 | Auto | P1 | RIN
Name: System Thermal & Power Stress Test w/ Fan-3 Failure
Mandatory: Shall check that the Goldstone Switchboard works normally under maximum NVLink traffic at 31-35°C ambient temperature w/ Fan-3 failed.
Method: At the end of the “GOLDSTONE-SYS003” test, turn on Fan-2 and turn off Fan-3, then continue running maximum NVLink4 traffic on all used ports of the 2pcs LS10 devices for 5 minutes; then check the number of NVLink errors and read all temperature and power sensors.
Expected result: The number of errors per port should not exceed 3 (10^-13 x 100G x 300s = 3), and all sensors should be in normal range.
Guidance: Please see the LS10 section for details of PRBS testing.
Note: N/A - this should be part of validation instead of manufacturing test.

GOLDSTONE-SYS005 | Auto | P1 | RIN
Name: System Thermal & Power Stress Test w/ Fan-4 Failure
Mandatory: Shall check that the Goldstone Switchboard works normally under maximum NVLink traffic at 31-35°C ambient temperature w/ Fan-4 failed.
Method: At the end of the “GOLDSTONE-SYS004” test, turn on Fan-3 and turn off Fan-4, then continue running maximum NVLink4 traffic on all used ports of the 2pcs LS10 devices for 5 minutes; then check the number of NVLink errors and read all temperature and power sensors.
Expected result: The number of errors per port should not exceed 3 (10^-13 x 100G x 300s = 3), and all sensors should be in normal range.
Guidance: Please see the LS10 section for details of PRBS testing.
Note: N/A - this should be part of validation instead of manufacturing test.

GOLDSTONE-SYS006 | Auto | P1 | RIN
Name: System Thermal & Power Stress Test w/ Fan-5 Failure
Mandatory: Shall check that the Goldstone Switchboard works normally under maximum NVLink traffic at 31-35°C ambient temperature w/ Fan-5 failed.
Method: At the end of the “GOLDSTONE-SYS005” test, turn on Fan-4 and turn off Fan-5, then continue running maximum NVLink4 traffic on all used ports of the 2pcs LS10 devices for 5 minutes; then check the number of NVLink errors and read all temperature and power sensors.
Expected result: The number of errors per port should not exceed 3 (10^-13 x 100G x 300s = 3), and all sensors should be in normal range.
Guidance: Please see the LS10 section for details of PRBS testing.
Note: N/A - this should be part of validation instead of manufacturing test.

GOLDSTONE-SYS007 | Auto | P1 | RIN
Name: System Thermal & Power Stress Test w/ Fan-6 Failure
Mandatory: Shall check that the Goldstone Switchboard works normally under maximum NVLink traffic at 31-35°C ambient temperature w/ Fan-6 failed.
Method: At the end of the “GOLDSTONE-SYS006” test, turn on Fan-5 and turn off Fan-6, then continue running maximum NVLink4 traffic on all used ports of the 2pcs LS10 devices for 5 minutes; then check the number of NVLink errors and read all temperature and power sensors. At the end of this test, turn on Fan-6 to run all 6pcs fans in normal state.
Expected result: The number of errors per port should not exceed 3 (10^-13 x 100G x 300s = 3), and all sensors should be in normal range.
Guidance: Please see the LS10 section for details of PRBS testing.
Note: N/A - this should be part of validation instead of manufacturing test.

GOLDSTONE-SYS008 | Auto | P1 | RIN
Name: M.2 SSD Test Under System Thermal & Power Stress Testing
Mandatory: Shall check that the Goldstone Switchboard M.2 SSD works normally during System Thermal & Power Stress Testing.
Method: At the end of the “GOLDSTONE-SYS007” test, perform M.2 SSD read & write stress testing.
Expected result: The M.2 SSD read/write stress testing should finish within the error threshold limit.
Guidance: Please see the M.2 SSD section for details.
Note: Jinshui … to clarify this test case. Mandatory states SSD workload during GOLDSTONE-SYS000, but Method states the SSD test should run at the end of the thermal stress test. Is this a separate test from GOLDSTONE-SYS001, or should the SSD test be combined into GOLDSTONE-SYS001?
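
The per-port error limits quoted in the stress test rows above (12 for the 20-minute run, 3 for each 5-minute fan-failure run) can be reproduced with a short calculation. This is a minimal sketch that assumes 10^-13 is the target bit error rate and 100G is the per-port line rate in bits per second; the function name is illustrative and not part of any diag tool.

```python
def max_expected_errors(ber: float, line_rate_bps: float, duration_s: float) -> int:
    """Expected bit errors = bit error rate x bits transferred during the run."""
    return round(ber * line_rate_bps * duration_s)


# 20-minute main stress run (GOLDSTONE-SYS001)
limit_20_min = max_expected_errors(1e-13, 100e9, 20 * 60)
# 5-minute fan-failure runs (GOLDSTONE-SYS002 to GOLDSTONE-SYS007)
limit_5_min = max_expected_errors(1e-13, 100e9, 5 * 60)

print(limit_20_min, limit_5_min)
```

If the run duration or the line rate changes, the limit should be recomputed rather than reusing the fixed values 12 and 3.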
7.12 PCIe I/O Devices
Each PCIe device/function has a 4KB Configuration Space accessible with PCI Configuration Read/Write commands, and the
first 256 bytes are compatible with the PCI Base specification.
The host (CPU) can test whether a PCIe device/function is present by reading the Vendor ID and Device ID registers at offsets 0x0-0x3;
the returned value should be neither all 0s nor all 1s.
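
The presence check described above can be sketched as follows. This is an illustrative example, not the diag implementation: `device_present` and `read_config` are hypothetical helper names, and the sysfs path assumes a Linux host that exposes configuration space under `/sys/bus/pci/devices/<S:B:D.F>/config`.

```python
import struct


def device_present(config_space: bytes) -> bool:
    """True if the first dword (Vendor ID + Device ID) is neither all 0s nor all 1s."""
    if len(config_space) < 4:
        raise ValueError("need at least 4 bytes of configuration space")
    (id_dword,) = struct.unpack_from("<I", config_space, 0)
    return id_dword not in (0x00000000, 0xFFFFFFFF)


def read_config(bdf: str, length: int = 4) -> bytes:
    """Read raw configuration bytes through Linux sysfs (assumed environment)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(length)


# Offline example: Vendor ID 0x10DE with a placeholder Device ID of 0x1234
print(device_present(bytes([0xDE, 0x10, 0x34, 0x12])))
```

On a target system the two helpers would be combined as `device_present(read_config("0000:03:00.0"))`, where the address is a placeholder.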
0x10 – X16,
0x20 – X32.
Please note that a PCIe device starts link training after reset without any software involvement, and software can trigger
a link retrain by setting the Retrain Link bit of the Link Control Register (offset 0x10) to 0x1.
The PCIe Capability Device Status register indicates whether any fatal or non-fatal error has occurred; during normal operation its
error bits should all be 0, indicating no error.
Figure 29. PCIe Capability Registers (the base address is typically 0x70)
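
As an illustration of how the Link Status and Device Status registers can be interpreted, the sketch below decodes raw 16-bit register values. Field positions follow the PCI Express Capability layout in the PCIe Base Specification (Device Status at capability offset 0x0A, Link Status at offset 0x12); the function names are hypothetical.

```python
LINK_SPEEDS = {
    0x1: "Gen1 (2.5 GT/s)",
    0x2: "Gen2 (5 GT/s)",
    0x3: "Gen3 (8 GT/s)",
    0x4: "Gen4 (16 GT/s)",
    0x5: "Gen5 (32 GT/s)",
}


def decode_link_status(reg: int):
    """Return (speed name, negotiated width, link-training-in-progress flag)."""
    speed = reg & 0xF             # bits 3:0  Current Link Speed
    width = (reg >> 4) & 0x3F     # bits 9:4  Negotiated Link Width (0x10 = X16, 0x20 = X32)
    training = bool(reg & (1 << 11))  # bit 11 Link Training
    return LINK_SPEEDS.get(speed, "unknown"), width, training


def decode_device_status(reg: int) -> dict:
    """Return the four error-detected flags of the Device Status register."""
    return {
        "correctable": bool(reg & 0x1),
        "non_fatal": bool(reg & 0x2),
        "fatal": bool(reg & 0x4),
        "unsupported_request": bool(reg & 0x8),
    }


# A Gen3 X2 link, as expected for the PCH - LS10 connection
print(decode_link_status(0x0023))
print(decode_device_status(0x0000))
```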
Starting from PCIe Gen4, the “Lane Margining at the Receiver” Extended Capability is defined as an optional but very useful
feature, with an Extended Capability ID of 0x27. Its registers report the timing and voltage margins of each receiver lane. Please
refer to the PCIe Gen4 or Gen5 Base Specification, Sections 4.2.13 & 7.7.6, for details.
lspci is a standard Linux command that lists all PCIe devices and can also be used for some PCIe testing:
https://opensource.com/article/21/9/lspci-linux-hardware.
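
A minimal sketch of using lspci for a link check is shown below. It extracts the negotiated speed and width from the `LnkSta:` line of `lspci -vv` output. The sample line and the helper names are illustrative, and the exact text layout can vary between pciutils versions, so the pattern should be verified against the version installed on the test system.

```python
import re
import subprocess

# Matches both "Speed 8GT/s (ok), Width x2 (ok)" and "Speed 8GT/s, Width x2, ..."
LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([\d.]+GT/s)[^,]*,\s*Width\s+x(\d+)")


def parse_lnksta(lspci_text: str):
    """Return (speed string, width) from lspci -vv text, or None if no LnkSta line."""
    match = LNKSTA_RE.search(lspci_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def read_lnksta(bdf: str):
    """Run lspci for one device (typically needs root to show capabilities)."""
    out = subprocess.run(["lspci", "-vv", "-s", bdf],
                         capture_output=True, text=True, check=True).stdout
    return parse_lnksta(out)


SAMPLE = "\t\tLnkSta:\tSpeed 8GT/s (ok), Width x2 (ok)\n"
print(parse_lnksta(SAMPLE))
```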
A PCIe endpoint device is identified as S/B/D/F (Segment Number, Bus Number, Device Number and Function Number
within a Device).
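
The S/B/D/F notation is usually written as `SSSS:BB:DD.F` in hexadecimal, for example in lspci output and Linux sysfs paths. The sketch below splits such a string into its four fields; `PciAddress` and `parse_bdf` are illustrative names.

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    segment: int
    bus: int
    device: int
    function: int


_BDF_RE = re.compile(
    r"^(?:([0-9a-fA-F]{4}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_bdf(text: str) -> PciAddress:
    """Parse 'SSSS:BB:DD.F' (segment optional, defaults to 0)."""
    match = _BDF_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid S/B/D/F string: {text!r}")
    seg, bus, dev, fn = match.groups()
    device = int(dev, 16)
    if device > 0x1F:  # the Device Number field is 5 bits wide
        raise ValueError(f"device number out of range: {dev}")
    return PciAddress(int(seg or "0", 16), int(bus, 16), device, int(fn))


print(parse_bdf("0000:03:00.0"))
```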
For the Goldstone Switchboard, the PCIe devices are listed in Table 17 and Figure 30. Please note that while the hardware design
supports an M.2 NVMe SSD, the current configuration only uses an M.2 SATA SSD for backward compatibility with the
standalone Kong Switch.
As PCIe Gen3 is used for the PCH - LS10 PCIe interconnect, advanced features such as “lane margining” are not available, since
they are only available starting from PCIe Gen4.
Table 17 Goldstone Switchboard PCIe Devices
Item | PCIe Device | Description | Note
01 | CPU | Intel Core i3-8100H Mobile CPU, 45W TDP, 4 cores, 3GHz, 6MB shared L3, 2x 64-bit ECC DDR4 channels, 16x PCIe Gen3 (up to 1x16, 2x8, 1x8+2x4) | Root Port
02 | PCH-H Chipset | CM246 Chipset / PCH, 8GT/s DMI3 X4, max 16 ports with 24 lanes PCIe Gen3 in X1/X2/X4, max 14x USB 3.1/2.0 ports, max 6x 6G SATA3 ports, integrated 1GE LAN | Root Port
03 | LS10 (#1) | Nvidia 64 Ports 100G NVLink4 Switch, connected to PCH PCIe[18:17], PCIe Gen3 X2 | EP
04 | LS10 (#2) | Nvidia 64 Ports 100G NVLink4 Switch, connected to PCH PCIe[20:19], PCIe Gen3 X2 | EP
06 | M.2 NVMe SSD | M.2 NVMe SSD, PCIe Gen4, X4, not being used as SATA M.2 SSD is default | EP, N/A
Figure 31. PCIe Lane Margining at Receiver Registers
Table 18. PCIe Device General Test Items
GOLDSTONE-DEV000 | Auto | P0 | TS1B | FCT-R
Name: PCH – LS10 (A) PCIe Link Width and Speed
Mandatory: Should check the PCH – LS10 (A) PCIe Link training result: Link Width and Speed
Method: Read the corresponding PCH Root Port’s PCI Express Link Status Register (PCIE_LS)
Expected Result: PCIe Gen3, X2
Note: Same bug # as GOLDSTONE-LS000

GOLDSTONE-DEV001 | Auto | P0 | TS1B | FCT-R
Name: PCH – LS10 (B) PCIe Link Width and Speed
Mandatory: Should check the PCH – LS10 (B) PCIe Link training result: Link Width and Speed
Method: Read the corresponding PCH Root Port’s PCI Express Link Status Register (PCIE_LS)
Expected Result: PCIe Gen3, X2
Note: Same bug # as GOLDSTONE-LS000

GOLDSTONE-DEV002 | Auto | P0 | TS1B | FCT-R
Name: LS10 (A) – PCH PCIe Link Width and Speed
Mandatory: Should check the LS10 (A) – PCH PCIe Link training result: Link Width and Speed
Method: Read LS10 (A)’s PCI Express Link Status Register (PCIE_LS)
Expected Result: PCIe Gen3, X2
Note: Same bug # as GOLDSTONE-LS000

GOLDSTONE-DEV003 | Auto | P0 | TS1B | FCT-R
Name: LS10 (B) – PCH PCIe Link Width and Speed
Mandatory: Should check the LS10 (B) – PCH PCIe Link training result: Link Width and Speed
Method: Read LS10 (B)’s PCI Express Link Status Register (PCIE_LS)
Expected Result: PCIe Gen3, X2
Note: Same bug # as GOLDSTONE-LS000

GOLDSTONE-DEV004 | Auto | P0 | TS1B | FCT-R
Name: LS10 (A) ID and Revision Checking
Mandatory: Should check LS10 (A) Vendor ID, Device ID and Revision against POR
Method: Read the Vendor ID, Device ID and Revision Registers in LS10 (A)’s PCIe Configuration Space
Expected Result: Nvidia, LS10
Note: Same bug # as GOLDSTONE-LS000

GOLDSTONE-DEV005 | Auto | P0 | TS1B | FCT-R
Name: LS10 (B) ID and Revision Checking
Mandatory: Should check LS10 (B) Vendor ID, Device ID and Revision against POR
Method: Read the Vendor ID, Device ID and Revision Registers in LS10 (B)’s PCIe Configuration Space
Expected Result: Nvidia, LS10
Note: Same bug # as GOLDSTONE-LS000

GOLDSTONE-DEV006 | Auto | P1 | EVT | FIN
Name: LS10 (A) PCIe Reset Function
Mandatory: Should check if LS10 (A)’s PCIe Reset Signal works normally
Method: Assert LS10 (A)’s PCIe Reset signal for 10ms, then deassert it, wait for 1ms, and then read the Command Register in LS10 (A)’s PCIe Configuration Space
Expected Result: The Command Register’s “Bus Master Enable” and “Memory Space Enable” bits should be cleared. Once this test is done, LS10 (A) is disabled.
Note: Same as GOLDSTONE-LS008

GOLDSTONE-DEV007 | Auto | P1 | EVT | FIN
Name: LS10 (B) PCIe Reset Function
Mandatory: Should check if LS10 (B)’s PCIe Reset Signal works normally
Method: Assert LS10 (B)’s PCIe Reset signal for 10ms, then deassert it, wait for 1ms, and then read the Command Register in LS10 (B)’s PCIe Configuration Space
Expected Result: The Command Register’s “Bus Master Enable” and “Memory Space Enable” bits should be cleared. Once this test is done, LS10 (B) is disabled.
Note: Same as GOLDSTONE-LS008

GOLDSTONE-DEV008 | Auto | P0 | EVT only if M.2 NVMe SSD is used | FCT-R
Name: CPU – M.2 NVMe SSD PCIe Link Width and Speed
Mandatory: Should check the CPU – M.2 NVMe SSD PCIe Link training result: Link Width and Speed
Method: Read the corresponding CPU Root Port’s PCI Express Link Status Register (PCIE_LS)
Expected Result: PCIe Gen3, X4
Note: Same as GOLDSTONE-COME001

GOLDSTONE-DEV009 | Auto | P0 | EVT only if M.2 NVMe SSD is used | FCT-R
Name: M.2 NVMe SSD – PCH PCIe Link Width and Speed
Mandatory: Should check the M.2 NVMe SSD – PCH PCIe Link training result: Link Width and Speed
Method: Read the M.2 NVMe SSD’s PCI Express Link Status Register (PCIE_LS)
Expected Result: PCIe Gen3, X4
Note: Same as GOLDSTONE-COME001

GOLDSTONE-DEV010 | Auto | P0 | EVT only if M.2 NVMe SSD is used | FCT-R
Name: M.2 NVMe SSD ID and Revision Checking
Mandatory: Should check M.2 NVMe SSD Vendor ID, Device ID and Revision against POR
Method: Read the Vendor ID, Device ID and Revision Registers in the M.2 NVMe SSD’s PCIe Configuration Space
Expected Result: Per POR
Note: Same as GOLDSTONE-COME001

GOLDSTONE-DEV011 | Auto | P1 | EVT only if M.2 NVMe SSD is used | FIN
Name: M.2 NVMe SSD PCIe Reset Function
Mandatory: Should check if the M.2 NVMe SSD’s PCIe Reset Signal works normally
Method: Assert the M.2 NVMe SSD’s PCIe Reset signal for 10ms, then deassert it, wait for 1ms, and then read the Command Register in the M.2 NVMe SSD’s PCIe Configuration Space
Expected Result: The Command Register’s “Bus Master Enable” and “Memory Space Enable” bits should be cleared. Once this test is done, the M.2 NVMe SSD is disabled.
Note: N/A - OS is installed on the only M.2 on the system. This will crash the system.
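
The expected result of the PCIe reset tests above can be checked with a simple bit mask on the Command Register (configuration space offset 0x04). In the sketch below the bit positions follow the PCI specification (bit 1 = Memory Space Enable, bit 2 = Bus Master Enable); the function name is hypothetical and the register read itself is left to the test environment.

```python
CMD_MEMORY_SPACE_ENABLE = 1 << 1  # Command Register bit 1
CMD_BUS_MASTER_ENABLE = 1 << 2    # Command Register bit 2


def reset_cleared_enables(command_reg: int) -> bool:
    """True if both Memory Space Enable and Bus Master Enable read back as 0."""
    mask = CMD_MEMORY_SPACE_ENABLE | CMD_BUS_MASTER_ENABLE
    return (command_reg & mask) == 0


# Value read 1ms after reset deassertion: both bits cleared means pass
print(reset_cleared_enables(0x0000))
# A device still enabled by the OS would typically read back with both bits set
print(reset_cleared_enables(0x0006))
```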
7.13 PCIe Enhancement
Please refer to the following links about PCI Express Advanced Error Reporting (AER) mechanism that applies to PCIe
Root Port(s), as well as the driver for Linux OS to use:
https://www.kernel.org/doc/html/latest/PCI/pcieaer-howto.html
https://www.design-reuse.com/articles/38374/pcie-error-logging-and-handling-on-a-typical-soc.html
The Linux OS captures and saves PCIe AER error statistical counters at /sys/bus/pci/devices/<dev>/ as documented by:
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
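
A minimal sketch of reading these AER counters is given below. It assumes the sysfs files named in the kernel ABI document (`aer_dev_correctable`, `aer_dev_nonfatal`, `aer_dev_fatal`), each containing one "name count" pair per line; the sample text and helper names are illustrative.

```python
from pathlib import Path


def parse_aer_stats(text: str) -> dict:
    """Parse 'Error Name <count>' lines into a dict; names may contain spaces."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, count = line.rsplit(None, 1)
        stats[name] = int(count)
    return stats


def read_aer_stats(bdf: str, kind: str = "aer_dev_correctable") -> dict:
    """Read one AER statistics file for the device at the given S:B:D.F address."""
    return parse_aer_stats(Path(f"/sys/bus/pci/devices/{bdf}/{kind}").read_text())


SAMPLE = """Receiver Error 2
Bad TLP 0
Bad DLLP 0
TOTAL_ERR_COR 2
"""
print(parse_aer_stats(SAMPLE))
```

Comparing the totals before and after a stress run gives the number of new errors observed during that run.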
Table 19. PCIe Enhancement Test Items
GOLDSTONE-PCI002 | Auto | P1 | EVT only if M.2 NVMe SSD is used | FCT-R
Name: CPU-M.2 NVMe SSD PCIe Interop
Mandatory: Shall re-train the CPU-M.2 NVMe SSD PCIe link 25 times using the Link Control Register (PCIe Base Spec 7.5.3.7, address offset 0x10), and check the link training result. Please refer to PCIe Base Specification 7.5.3.7 for points needing special attention.
Guidance: System.
Note: N/A - Manufacturing test environment would have the OS installed on the only M.2 in Goldstone. Resetting the M.2 would crash the system.
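
The retrain loop described in GOLDSTONE-PCI002 could be structured as in the sketch below. It is only an outline under stated assumptions: `read16` and `write16` are hypothetical callbacks that access the Root Port's configuration space, `cap_base` is the PCI Express Capability base address, and the 1-second timeout is an arbitrary choice. The register offsets and bit positions (Retrain Link = Link Control bit 5, Link Training = Link Status bit 11) follow the PCIe Base Specification.

```python
import time
from typing import Callable

LINK_CONTROL = 0x10        # offset from the PCI Express Capability base
LINK_STATUS = 0x12
RETRAIN_LINK = 1 << 5      # Link Control bit 5
LINK_TRAINING = 1 << 11    # Link Status bit 11


def retrain_and_check(read16: Callable[[int], int],
                      write16: Callable[[int, int], None],
                      cap_base: int,
                      expected_speed: int,
                      expected_width: int,
                      iterations: int = 25,
                      timeout_s: float = 1.0) -> int:
    """Retrain the link `iterations` times; return the number of failed attempts."""
    failures = 0
    for _ in range(iterations):
        ctrl = read16(cap_base + LINK_CONTROL)
        write16(cap_base + LINK_CONTROL, ctrl | RETRAIN_LINK)
        deadline = time.monotonic() + timeout_s
        while True:
            status = read16(cap_base + LINK_STATUS)
            if not status & LINK_TRAINING or time.monotonic() > deadline:
                break
            time.sleep(0.001)
        speed, width = status & 0xF, (status >> 4) & 0x3F
        if status & LINK_TRAINING or (speed, width) != (expected_speed, expected_width):
            failures += 1
    return failures


# Offline demonstration with a fake register file reporting Gen3 (0x3), X4
fake_regs = {0x70 + LINK_CONTROL: 0x0000, 0x70 + LINK_STATUS: 0x0043}
fake_writes = []
result = retrain_and_check(fake_regs.__getitem__,
                           lambda off, val: fake_writes.append((off, val)),
                           cap_base=0x70, expected_speed=0x3, expected_width=4)
print(result)
```

As the Note for this test item points out, retraining the link to the only boot M.2 device can disturb a running system, so a loop like this is better suited to a link that does not carry the OS.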
Guidance: System, parse lspci -vvvs output
https://pcisig.com/faq?field_category_value%5B%5D=pci_express_3.0&keys=
https://mjmwired.net/kernel/Documentation/ABI/testing/sysfs-bus-pci
7.14 M.2 SSDs
Each Goldstone node has 1pcs M.2 SSD for OS and Fabric Manager Software storage. The hardware design supports
both M.2 SATA and M.2 NVMe SSD options and defaults to a SATA 6G M.2 SSD for compatibility with the Kong Switch. As of the writing
of this document, the M.2 SATA SSD Model Number is under selection. As shown in Figure 33, NVMe SSDs have
increasingly dominated the SSD market since 2020 for the following reasons:
● Much higher IOPS & BW performance than SATA SSD
● Minor price premium over SATA SSD (NAND Flash devices dominate the SSD cost)
● Built-in NVMe SSD device driver in OS
● Richer performance benchmarking and debugging software tools
The following software tools are very useful for PCIe & NVMe SSD testing and diagnosis:
Table 20. M.2 SATA SSD Test Items (default)
GOLDSTONE-SATA000 | Auto | P0 | EVT | FCT-R
Name: M.2 SATA SSD Vendor ID, Model No. & Quantity
Mandatory: Shall check the quantity of M.2 SATA SSD devices (1 x M.2).
Method: Use the SATA Identify Device Command to access the M.2 SATA SSD.
Expected Result: The Vendor ID and Model No. should match POR or BOM, and the Qty should be 1.
Guidance: PCH SATA Port 0A, Identify Device Return Word
Note: Same as GOLDSTONE-COME001

GOLDSTONE-SATA003 | Auto | P0 | EVT | FCT-R
Name: M.2 SATA SSD Interface speed
Mandatory: Shall check the M.2 SATA SSD Interface speed.
Method: Use the SATA Identify Device Command to access the M.2 SATA SSD.
Expected Result: SATA 3 (6G)
Guidance: PCH SATA Port 0A, Identify Device Return Word 79.
Note: Same as GOLDSTONE-COME001

GOLDSTONE-SATA004 | Auto | P1 | EVT | FCT-R
Name: M.2 SATA SSD Smart Data Read
Mandatory: Shall read and check M.2 SATA SSD Smart Data for E2E Error Correction Count, Uncorrectable Error Count and Temperature.
Method: Use the SATA Smart Data Read Command.
Expected Result: Log the first read Error Counts; Temperature below 72°C.
Guidance: PCH SATA Port 0A, Smart Data IDs 0xB8, 0xBB, 0xBE
Note: Same as GOLDSTONE-COME001

GOLDSTONE-SATA005 | Auto | P1 | EVT | FCT-R
Name: M.2 SATA SSD Read-Write Operations & Performance
Mandatory: Shall check the M.2 SATA SSD Read-Write Operations and Performance.
Method: Write 1MB Data to the M.2 SATA SSD; read back and compare. May use the IOMeter utility.
Expected Result: The read back data should match the Write Data; I/O Performance should be 75% or higher of the M.2 SSD Specification.
Guidance: PCH SATA Port 0A, IOMeter, SATA Read & Write Commands
Note: Jinshui … to file and post bug #

GOLDSTONE-SATA006 | Auto | P1 | EVT | FCT-R
Name: M.2 SATA SSD Smart Data Read to Check Error Count
Mandatory: Shall read and check M.2 SATA SSD Smart Data for E2E Error Correction Count, Uncorrectable Error Count and Temperature after the 1MB Data Write and Read.
Method: Use the SATA Smart Data Read Command.
Expected Result: Compare the new Error Counts to the value in GOLDSTONE-SATA005; the Error Counts should be within threshold limits; Temperature below 72°C.
Guidance: PCH SATA Port 0A, Smart Data IDs 0xB8, 0xBB, 0xBE
Note: Same as GOLDSTONE-SATA004

GOLDSTONE-SATA007 | Auto | P1 | EVT | FLA
Name: M.2 SATA SSD FW Update
Mandatory: Shall check if the M.2 SATA SSD FW could be updated successfully.
Method: Please refer to M.2 SATA SSD FW Update Tools.
Expected Result: M.2 SATA SSD FW could be updated successfully and reliably.
Guidance: M.2 SATA SSD FW Tools from Vendor
Note: N/A - One Diag does not support FW flashing
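
The SMART checks in GOLDSTONE-SATA004 and GOLDSTONE-SATA006 can be sketched as a before/after comparison of the three attribute IDs listed in the Guidance fields (0xB8, 0xBB, 0xBE). The example below parses text in the attribute-table layout printed by `smartctl -A`; the use of smartctl, the sample values, the attribute names shown in the sample, and the zero-new-errors default threshold are all assumptions, since this document does not name a tool or a threshold value.

```python
ATTR_E2E_ERROR = 0xB8       # E2E Error Correction Count
ATTR_UNCORRECTABLE = 0xBB   # Uncorrectable Error Count
ATTR_TEMPERATURE = 0xBE     # Temperature

SAMPLE_SMARTCTL = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   065   052   000    Old_age   Always       -       35 (Min/Max 20/48)
"""


def parse_smart_attributes(text: str) -> dict:
    """Map attribute ID -> raw value (first token of the RAW_VALUE column)."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                raw[int(fields[0])] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return raw


def check_smart(before: dict, after: dict,
                max_new_errors: int = 0, max_temp_c: int = 72) -> list:
    """Return a list of problem descriptions; an empty list means pass."""
    problems = []
    for attr in (ATTR_E2E_ERROR, ATTR_UNCORRECTABLE):
        delta = after[attr] - before[attr]
        if delta > max_new_errors:
            problems.append(f"attribute 0x{attr:02X} increased by {delta}")
    if after[ATTR_TEMPERATURE] >= max_temp_c:
        problems.append(f"temperature {after[ATTR_TEMPERATURE]}C not below {max_temp_c}C")
    return problems


baseline = parse_smart_attributes(SAMPLE_SMARTCTL)
print(check_smart(baseline, baseline))
```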
Table 21. M.2 & NVMe SSD Test Items (only applicable with M.2 NVMe SSD, not used in current version)
GOLDSTONE-NVM000 | Auto | P0 | EVT | FCT-R
Name: M.2 NVMe SSD Quantity
Mandatory: Shall check the quantity of M.2 NVMe SSD devices (1 x M.2).
Method: Read the Vendor ID and Device ID registers in the M.2 NVMe SSD’s PCIe Configuration Space.
Expected Result: The Vendor ID in the SSD’s PCIe Configuration Space should match POR or BOM; the quantity should be 1.
Guidance: COMe CPU PCIe Lanes [11:8]
Note: Same bug # as GOLDSTONE-COME001

GOLDSTONE-NVM001 | Auto | P0 | EVT | FCT-R
Name: M.2 NVMe SSD Version
Mandatory: Shall check the version of the M.2 NVMe SSD device (1 x M.2).
Method: COMe CPU to access the M.2 NVMe SSD connected to CPU PCIe Lanes [11:8] with PCIe Configuration Read CMD.
Expected Result: The REVID in the SSD’s PCIe Configuration Space should be 0x00 or higher.
Guidance: COMe CPU PCIe Lanes [11:8]
Note: Same bug # as GOLDSTONE-COME001

GOLDSTONE-NVM002 | Auto | P0 | EVT | FCT-R
Name: M.2 NVMe SSD PCIe interface capability and Status
Mandatory: Shall check the PCIe Interface width and speed of the M.2 NVMe SSD device (1 x M.2).
Method: COMe CPU to access the M.2 NVMe SSD connected to CPU PCIe Lanes [11:8] with PCIe Configuration Read CMD.
Expected Result: PCI Express Link Capabilities Register: “Max Link Width” should be 0x4 and “Max Link Speed” should be 0x4 (PCIe Gen4); PCI Express Link Status Register: “Negotiated Link Width” should be 0x4 and “Current Link Speed” should be 0x3 (Gen3); the PCI Express Device Status Register should be 0x00, indicating no error.
Guidance: COMe CPU PCIe Lanes [11:8]
Note: Same bug # as GOLDSTONE-COME001

GOLDSTONE-NVM003 | Auto | P0 | EVT | FCT-R
Name: M.2 NVMe SSD Capacity
Mandatory: Shall check the size of the M.2 NVMe SSD device (1 x M.2).
Method: COMe CPU to access the M.2 NVMe SSD connected to CPU PCIe Lanes [11:8] with the NVMe Identify Command or the nvme-cli utility.
Expected Result: The Total NVM Capacity returned by the NVMe Identify Command should match POR or BOM.
Guidance: COMe CPU PCIe Lanes [11:8] & nvme-cli.
Note: Same bug # as GOLDSTONE-COME001
have SATA
or NVMe
M.2
GOLDSTONE-NVM005 | Auto | P1 | EVT | FCT-R
Name: M.2 read/write operations and performance test
Mandatory: Shall check read/write operations of the M.2 NVMe SSD device (1 x M.2).
Method: COMe CPU to access the M.2 NVMe SSD connected to CPU PCIe Lanes [11:8] with NVMe Read / Write commands or IOMeter software (preferred).
Expected Result: The read / write operations should finish normally, the data read should match the data written, and the sequential and random Read / Write IOPS / Bandwidth performance should not be below 75% of specification.
Guidance: COMe CPU PCIe Lanes [11:8] & IOMeter
Note: Same as GOLDSTONE-SATA005

GOLDSTONE-NVM006 | Auto | P1 | EVT | FCT-R
Name: M.2 SSD SMBus Interface VPD (SMBus address 1010011b)
Mandatory: Shall check read/write operations of the M.2 devices (1 x M.2).
Method: COMe CPU to access the VPD data of each M.2 (1x M.2) with SMBus read CMD.
Expected Result: The Vendor ID & Model Number should match POR or BOM; M.2 NVMe SSD Port 0 max speed and Port 0 max width should be 0x04; the warning threshold should be 0x0480.
Guidance: COMe CPU PCIe Lanes [11:8] & M.2 SSD VPD
Note: Same as GOLDSTONE-SATA004

GOLDSTONE-NVM007 | Auto | P1 | EVT | FCT-R
Name: M.2 SSD SMBus Interface Temperature (SMBus address 1101010b)
Mandatory: Shall check the temperature of the M.2 device (1 x M.2).
Method: COMe CPU to access the VPD data of each M.2 (1x M.2) via COMe I2C with SMBus read and write CMDs; set the high temperature limit to 72°C and the low temperature limit to 5°C.
Expected Result: The read values of the high / low temperature limits should equal the values written; the current ambient temperature should be below 72°C.
Guidance: COMe CPU PCIe Lanes [11:8] & M.2 SSD VPD
Note: Same bug # as GOLDSTONE-SNR000
Figure 34. Example of SMART log returned by nvme-cli / nvme smart-log
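
For automated checking, nvme-cli can also emit the SMART log as JSON (`nvme smart-log <device> -o json`). The sketch below evaluates such output against the 72°C limit used in this section. The JSON field names (`critical_warning`, `temperature`, `media_errors`) and the Kelvin temperature unit are assumptions based on common nvme-cli releases and should be verified against the installed version; the sample values are made up.

```python
import json

KELVIN_OFFSET = 273  # integer Kelvin-to-Celsius conversion (assumed unit)


def check_nvme_smart_log(json_text: str, max_temp_c: int = 72) -> dict:
    """Summarize temperature, critical warnings and media errors from a SMART log."""
    log = json.loads(json_text)
    temp_c = log["temperature"] - KELVIN_OFFSET
    return {
        "temperature_c": temp_c,
        "temperature_ok": temp_c < max_temp_c,
        "critical_warning_ok": log["critical_warning"] == 0,
        "media_errors": log["media_errors"],
    }


SAMPLE_JSON = '{"critical_warning": 0, "temperature": 318, "media_errors": 0}'
print(check_nvme_smart_log(SAMPLE_JSON))
```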
7.15 FPGA/CPLD Devices
There are 2pcs FPGA devices on the Goldstone Switchboard, as follows, even though they may be called CPLDs in some design documents:
● Main CPLD: U77, MACHX03D_9400LUT_BG400 from Lattice Semi, for Switchboard main control functions, accessible to the COMe
CPU via the PCH LPC interface.
● Port CPLD: U1_CP1, LCMXO3D-9400HC-5BG256C from Lattice Semi, for OSFP port related control. Since each LS10 has 8x OSFP
ports, there are 2x I2C interfaces on the Port CPLD, one for each LS10’s I2CA port.
The MachXO3 is an FPGA because it uses internal SRAM to hold the configuration bits that define the user’s chip functions. Upon
power-on or reconfiguration, the MachXO3 internal configuration engine (boot loader) automatically downloads the
configuration bits from internal Flash (dual copies) or from an external device/interface, as determined by the Configuration Mode.
As shown in Figure 35, LCMXO3D and MACHX03D are in the same MachXO3D family, which uses internal dual-boot Flash for
configuration. The MachXO3D CPLD can be configured by one of the following methods, as shown in Figure 37:
● JTAG: Uses the Lattice Semi Diamond Programmer Software and a USB-JTAG Dongle to program the MachXO3D FPGA internal
Configuration Flash. This mode is used in the Goldstone Switchboard for in-system programming by driving JTAG signals from COMe CPLD
Registers; please refer to Figure 38.
● SDM: Self Download Mode. The FPGA downloads the Configuration bits from internal Configuration Flash to configuration
SRAM.
● MSPI: Master SPI Mode. The FPGA acts as a SPI Master and loads Configuration bits from external SPI/QSPI Flash, and FPGA FW
update is done by updating the external SPI Flash. Not used in the Goldstone Switchboard.
● SSPI: Slave SPI Mode. The FPGA acts as a SPI Slave and an external controller acting as SPI master downloads Configuration Bits to the FPGA’s
internal Configuration SRAM or SPI Flash. Not used in the Goldstone Switchboard design.
● I2C: The FPGA acts as an I2C Slave device and an external I2C master downloads Configuration Bits to the FPGA’s internal Configuration
SRAM or SPI Flash. Not used in the Goldstone Switchboard design. The FPGA’s I2C slave address is 0x40 (7b) or 0x3C0 (10b).
As shown in Figure 36, the MachXO3D FPGA’s Configuration Status can be monitored by checking the INIT and DONE pins, but the current
design leaves them just pulled up. If the Main CPLD cannot be configured successfully, the COMe will not be able to boot.
Also as shown in Figure 36, the MachXO3D FPGA can be reconfigured by toggling its PROGRAMN pin, but the current design leaves it just
pulled up.
When the FPGA internal Configuration Flash is being programmed (FPGA FW updated) via the JTAG interface, the FPGA continues running its
previously configured user functions until a “Transfer Refresh” JTAG command is issued.
Goldstone Switchboard supports updating the Main CPLD and Port CPLD (actually FPGAs) through external JTAG USB Dongle and
Diamond Programming Software from Lattice Semi via Connector J67.
Figure 38. Goldstone Switchboard In-System Programming Path
Figure 40. MachXO3D Feature Row Elements
Table 22. MAIN CPLD Test Items
GOLDSTONE-CPLD003 | Auto | P0 | EVT | FLA
Name: Main CPLD GPIO-JTAG Programming Interface
Mandatory: Shall check the Main CPLD GPIO-JTAG interface with the COMe CPLD.
Method: Check the Main CPLD JTAG Device ID from its JTAG interface.
Expected Result: Correct Main CPLD JTAG Device ID.
Guidance: See Main CPLD Spec
Note: Jinshui … to clarify how this is checked - is this checked from CPLD version name?

GOLDSTONE-CPLD005 | Auto | P1 | EVT | FCT-R
Name: Main CPLD and Port CPLD Sync Interface
Mandatory: Shall check the SYNC Interface between the Main CPLD & Port CPLD.
Method: See Main CPLD and Port CPLD Design Specs.
Expected Result: Correct Sync interface communications between the Main CPLD & Port CPLD.
Guidance: See Main CPLD and Port CPLD Design Specs.
Note: Jinshui … to clarify guidance - what can be used to check SYNC bus status similar to GOLDSTONE-COME012

GOLDSTONE-CPLD006 | Auto | P0 | EVT | FLA
Name: Port CPLD GPIO-JTAG Programming Interface
Mandatory: Shall check the Port CPLD GPIO-JTAG interface with the COMe CPLD.
Method: Check the Port CPLD JTAG Device ID from its JTAG interface.
Expected Result: Correct Port CPLD JTAG Device ID.
Guidance: See Port CPLD Spec
Note: Same as GOLDSTONE-CPLD003

GOLDSTONE-CPLD008 | Auto | P0 | EVT | FCT-R
Name: LS10(A) – Port CPLD I2C Interface
Mandatory: Shall check the LS10 (A) – Port CPLD I2C Interface.
Method: Read/Write Port CPLD internal registers via LS10 (A) I2CA.
Expected Result: Read/Write Port CPLD internal register(s) correctly.
Guidance: See LS10 I2CA and Port CPLD Spec
Note: Same as GOLDSTONE-I2C002

GOLDSTONE-CPLD009 | Auto | P0 | EVT | FCT-R
Name: LS10(B) – Port CPLD I2C Interface
Mandatory: Shall check the LS10 (B) – Port CPLD I2C Interface.
Method: Read/Write Port CPLD internal registers via LS10 (B) I2CA.
Expected Result: Read/Write Port CPLD internal register(s) correctly.
Guidance: See LS10 I2CA and Port CPLD Spec
Note: Same as GOLDSTONE-I2C002
7.16 FRU EEPROM
The Goldstone Switchboard has 3pcs FRU EEPROM chips: one for the Goldstone Switchboard FRU, one reserved for the Cable
Backplane & Chassis FRU, and one Cable Backplane & Chassis FRU located in the chassis.
The Cable Backplane & Chassis FRU (CB&C FRU) EEPROM is in the chassis and is accessed via an I2C interface on the
Power connector; it does not move with the Keystone Node or Goldstone Node. There is one CB&C FRU EEPROM per
slot, for a total of 11pcs in the Ranger Chassis. Ranger Chassis-level integration manufacturing testing needs to ensure FRU content
consistency across these 11pcs CB&C FRU EEPROM devices.
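
One way to check consistency across the 11 per-slot CB&C FRU EEPROMs is to hash each EEPROM image and flag the slots that differ from the majority. The sketch below assumes the compared images (or the compared region of each image) are expected to be byte-identical across slots, which this document does not state explicitly; the function name and the sample data are illustrative.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def find_inconsistent_slots(fru_images: Dict[int, bytes]) -> List[int]:
    """Return the slot numbers whose FRU image differs from the majority image."""
    digests = {slot: hashlib.sha256(image).hexdigest()
               for slot, image in fru_images.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(slot for slot, digest in digests.items()
                  if digest != majority_digest)


# 11 slots with identical placeholder content, except a corrupted byte in slot 7
images = {slot: b"CB&C FRU placeholder content" for slot in range(1, 12)}
images[7] = b"CB&C FRU placeholder c0ntent"
print(find_inconsistent_slots(images))
```

If some fields are legitimately slot-specific, only the common region should be passed to the comparison.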
These 2pcs FRU EEPROM devices are connected via the COMe CPU-PCH-(LPC)-CPLD U10 path and are controlled by the U10 CPLD,
as shown in Figure 22.
3 | Cable Backplane & Chassis (CB&C) FRU: in the chassis and accessed via an I2C interface on the Power Signal connector | TBD | CPLD LPCI2C
GOLDSTONE-FRU000 | Auto | P0 | EVT | FLA
Name: Switchboard FRU EEPROM read
Mandatory: Shall check ability to read the Switchboard FRU EEPROM content.
Guidance: U92 (M24512) via COMe CPLD LPC2I2C I2C
Note: Same bug # as GOLDSTONE-COME001

GOLDSTONE-FRU001 | Auto | P0 | EVT | FLA
Name: Switchboard FRU EEPROM Programming
Performance: Shall check ability to program the Switchboard FRU EEPROM content.
Guidance: U92 (M24512) via COMe CPLD LPC2I2C I2C
Note: N/A - One Diag does not support flashing

GOLDSTONE-FRU002 | Auto | P0 | EVT | FLA
Name: Switchboard FRU EEPROM Write Protection
Mandatory: Shall check Write-Protection of the Switchboard FRU EEPROM.
Guidance: U92 (M24512) via COMe CPLD LPC2I2C I2C. Write-Protection controlled by Main CPLD.
Note: Same bug # as GOLDSTONE-COME001

GOLDSTONE-FRU003 | Auto | P0 | EVT | FLA
Name: Switchboard FRU EEPROM with FRU data
Mandatory: Shall report, program and check the Switchboard FRU EEPROM with board-unique data, at an early stage of FCT.
Guidance: Program each Switchboard FRU EEPROM with its unique FRU Data.
Note: Same bug # as GOLDSTONE-COME001. Note that One Diag does not support programming FRU data.

GOLDSTONE-FRU004 | Auto | P1 | DVT | FLA
Name: Reserved FRU EEPROM read
Mandatory: Shall check ability to read the Reserved FRU EEPROM content.
Guidance: U190 (M24512) via COMe CPLD LPC2I2C I2C
Note: Same bug # as GOLDSTONE-COME001

GOLDSTONE-FRU005 | Auto | P1 | EVT | FLA
Name: Reserved FRU EEPROM Programming
Performance: Shall check ability to program the Reserved FRU EEPROM content.
Guidance: U190 (M24512) via COMe CPLD LPC2I2C I2C
Note: Same bug # as GOLDSTONE-COME001. Note that One Diag does not support programming FRU data.

GOLDSTONE-FRU006 | Auto | P1 | EVT | FLA
Name: Reserved FRU EEPROM Write Protection
Mandatory: Shall check Write-Protection of the Reserved FRU EEPROM.
Guidance: U190 (M24512) via COMe CPLD LPC2I2C I2C. Write-Protection controlled by Main CPLD.
Note: Same bug # as GOLDSTONE-COME001

GOLDSTONE-FRU007 | Auto | P0 | Ranger Chassis | FCT
Name: CB&C FRU EEPROM read
Mandatory: Shall check ability to read the CB&C FRU EEPROM content.
Guidance: via COMe CPLD LPCI2C
Note: Same bug # as GOLDSTONE-COME001

GOLDSTONE-FRU008 | Auto | P0 | Ranger Chassis | FCT
Name: CB&C FRU EEPROM Programming
Performance: Shall check ability to program the CB&C FRU EEPROM content.
Guidance: via COMe CPLD LPCI2C
Note: N/A - One Diag does not support programming FRU data

GOLDSTONE-FRU009 | Auto | P0 | Ranger Chassis | FCT
Name: CB&C FRU EEPROM Write Protection
Mandatory: Shall check Write-Protection of the CB&C FRU EEPROM.
Guidance: via COMe CPLD LPCI2C
Note: Same bug # as GOLDSTONE-COME001
Note: No FLA is planned for Ranger Chassis level integration manufacturing testing, only FCT + RIN + ACC (AC Cycling)
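
The Write-Protection items in the table above can be illustrated with a minimal Python sketch. It is not the One Diag or MODS implementation: MockFruEeprom and check_write_protection are hypothetical names, the EEPROM is simulated in memory, and the write_protected attribute stands in for the WP control that the guidance attributes to the Main CPLD. The 64 KB size reflects the 512 Kbit capacity of an M24512.

```python
class MockFruEeprom:
    """In-memory stand-in (hypothetical) for an I2C FRU EEPROM with a WP input."""

    def __init__(self, size: int = 64 * 1024) -> None:  # M24512: 512 Kbit = 64 KB
        self.data = bytearray(b"\xff" * size)
        self.write_protected = True  # assumed to be driven by the Main CPLD

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])

    def write(self, offset: int, payload: bytes) -> None:
        if self.write_protected:
            return  # simulated behaviour: a protected part leaves its contents unchanged
        self.data[offset:offset + len(payload)] = payload


def check_write_protection(eeprom: MockFruEeprom, offset: int = 0) -> bool:
    """Return True if a write is blocked with WP on and accepted with WP off.

    The original byte is restored and WP is re-enabled before returning.
    """
    original = eeprom.read(offset, 1)
    probe = bytes([original[0] ^ 0xFF])  # guaranteed to differ from the original

    eeprom.write_protected = True
    eeprom.write(offset, probe)
    blocked = eeprom.read(offset, 1) == original

    eeprom.write_protected = False
    eeprom.write(offset, probe)
    accepted = eeprom.read(offset, 1) == probe

    eeprom.write(offset, original)  # restore the byte that was probed
    eeprom.write_protected = True
    return blocked and accepted


print(check_write_protection(MockFruEeprom()))  # prints True
```

On real hardware the read and write calls would go through the COMe CPLD I2C path named in the guidance, and the probe offset would need to avoid corrupting valid FRU data if the restore step fails.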
7.17 EROT & EROT-Protected SW/FW
The programming paths for Goldstone Switchboard firmware (FW) updates are shown in Figure 45.
As shown in Figure 42, the FW for the following chips on the Goldstone Node is protected by a CEC1736 EROT
controller:
● LS10 NVLink4 Switch IC: 2 pcs (LS1/G1_A: LS10-A and LS2/G1_B: LS10-B), each with its own EROT/CEC1736 chip.
✔ The FW SPI Flash for each LS10 is a Winbond W25Q32 (4MB/32Mbits)
✔ No EROT-bypass path is provided for LS10’s FW access from LS10
✔ Both EROT/CEC1736 devices have FW_CONF[7:0] strapped as 0x20 (0b0010_0000): for GPU / NVSwitch
For each LS10 (AP), the connections are as follows:
● LS1’s FW SPI Flash U126 is connected to LS1-EROT/CEC1736 U97’s QSPI0_IO Interface
● LS2’s FW SPI Flash U115 is connected to LS2-EROT/CEC1736 U124’s QSPI0_IO interface
● LS1’s QSPI (ROM) interface is connected to LS1-EROT/CEC1736 U97’s QSPI0_IN interface
● LS2’s QSPI (ROM) interface is connected to LS2-EROT/CEC1736 U124’s QSPI0_IN interface
● LS1-EROT/CEC1736 U97’s QSPI1_IN and LS2-EROT/CEC1736 U124’s QSPI1_IN are connected to COMe U2
PCH GSPI0 for Out of Band (OOB) EROT communications and LS10 AP-FW and EROT FW (EC-FW) update.
● LS1-EROT and LS2-EROT’s I2C06 is used as EROT ATTEST I2C Interface and connected to COMe
I2C_B2B_SCL/SDA (COMe CPLD: I2C_CPLD_B2B_SCL/SDA). The I2C addresses used are:
✔ 0xA4 (8b) for Runtime access
✔ 0xA6 (8b) for Recovery access
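
The ATTEST addresses above are given in 8-bit form. Many host-side I2C tools (for example Linux i2c-tools) take 7-bit addresses, so a test script may need to convert them. The sketch below assumes that "(8b)" means the write-form address with the R/W bit in bit 0; the helper name to_7bit and the dictionary names are illustrative only.

```python
def to_7bit(addr_8bit: int) -> int:
    """Convert an 8-bit write-form I2C address (R/W bit = 0) to its 7-bit form."""
    if not 0 <= addr_8bit <= 0xFF:
        raise ValueError("8-bit I2C address must be in 0x00..0xFF")
    if addr_8bit & 1:
        raise ValueError("expected a write-form address with the R/W bit cleared")
    return addr_8bit >> 1


# 8-bit addresses as listed for the EROT ATTEST I2C interface.
EROT_ATTEST_ADDR_8BIT = {"runtime": 0xA4, "recovery": 0xA6}
EROT_ATTEST_ADDR_7BIT = {mode: to_7bit(a) for mode, a in EROT_ATTEST_ADDR_8BIT.items()}

for mode, addr in EROT_ATTEST_ADDR_7BIT.items():
    print(f"{mode}: 0x{addr:02X}")
# runtime: 0x52
# recovery: 0x53
```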
Figure 43. EROT-Protected AP-FW Update Flow with BMC
The EROT/CEC1736 itself has its own FW (EC-FW) stored in its internal built-in flash, as shown in Figure 44. The
EC-FW is pre-programmed with a device programmer at Microchip or the CM (FXSJ) before the EROT/CEC1736 is
assembled onto the Goldstone Switchboard PCB, and the EROT’s JTAG interface is disabled using OTP.
Table 24. EROT Test Items
GOLDSTONE-EROT001 | Auto | P0 / EVT | FLA
Name: EROT RESET Input & AP0 RESET Output Checking
Mandatory: Shall check both EROT/CEC1736 devices’ RESET and AP0 RESET signals.
Expected Result: When the EROT’s RESET input is asserted (LOW), its AP0 RESET output should be LOW to hold the LS10/AP in reset. When the EROT’s RESET input is deasserted (HIGH), its AP0 RESET output should also go HIGH, after a delay (TBD), to release the LS10/AP from reset.
Guidance: Goldstone Switchboard MAIN CPLD Register & U80
Remarks: N/A

GOLDSTONE-EROT002 | Auto | P0 / EVT | PLA
Name: EROT Device EC-FW Boot Checking
Mandatory: Shall check that both EROT/CEC1736 devices have booted successfully.
Expected Result: Both EROT/CEC1736 devices boot their EC-FW successfully.
Guidance: Please refer to CEC1736 Datasheet and NV EROT Design Spec.
Remarks: Same as GOLDSTONE-EROT000

GOLDSTONE-EROT003 | Auto | P0 / EVT | PLA
Name: EROT Device & EC-FW Version Checking
Mandatory: Shall check that both EROT/CEC1736 devices have the correct device ID and EC-FW version.
Expected Result: Correct EROT/CEC1736 Device ID and EC-FW Version matching POR or BOM.
Guidance: Please refer to CEC1736 Datasheet and NV EROT Design Spec.
Remarks: Same as GOLDSTONE-EROT000

GOLDSTONE-EROT006 | Auto | P0 / EVT | PLA
Name: EROT Device EC-FW Update
Mandatory: Shall check that both EROT/CEC1736 devices’ EC-FW can be updated successfully and reliably.
Guidance: NV EROT Design guide & NVFLASH
Remarks: N/A - One Diag does not support flashing

GOLDSTONE-EROT007 | Auto | P0 / EVT | PLA
Name: LS10 Device AP-FW Update
Mandatory: Shall check that both LS10 devices’ AP-FW can be updated successfully and reliably.
Guidance: NV EROT Design guide & NVFLASH
Remarks: N/A - One Diag does not support flashing

GOLDSTONE-EROT008 | Auto | P1 / EVT | PLA
Name: EROT Device ATTEST I2C Interface
Mandatory: Shall check that COMe can communicate with both EROT/CEC1736 devices over the ATTEST I2C Interface at both the Runtime and Recovery I2C addresses.
Guidance: NV EROT Design guide & NVFLASH
Remarks: Jinshui … to clarify the requirement - If LS10 VBIOS is present, this is covered indirectly

GOLDSTONE-EROT009 | Auto | P1 / EVT | FLA
Name: LS10 Device AP-FW Write-Protection Checking
Mandatory: Shall check that both LS10 devices’ AP-FW can be write-protected successfully and reliably.
Guidance: NV EROT Design guide; AP-FW Flash’s WP is from EROT
Remarks: Jinshui … to clarify the requirement - is it checking for WP enabled/disabled? If there is more to it, then this seems like a validation test.

GOLDSTONE-EROT010 | Auto | P1 / EVT | FLA
Name: LS10 Device AP-FW Kill Checking
Mandatory: Shall check that both LS10 devices’ AP-FW can be killed successfully and reliably.
Guidance: NV EROT Design guide; the KILL signal is from EROT, and once KILL is active, the AP-FW Flash has no power.
Remarks: N/A - One Diag does not support flashing, but this also seems like a validation test.
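
The expected result of the EROT RESET / AP0 RESET item in Table 24 can be sketched as a small check, shown here in Python against a simulated device. This is only an illustration under stated assumptions: check_erot_reset and MockErot are hypothetical names, the real signals would be driven and read through the Main CPLD registers named in the guidance, and because the release delay is TBD in this document it is left as a caller-supplied wait function.

```python
from typing import Callable


def check_erot_reset(set_reset_in: Callable[[bool], None],
                     get_ap0_reset_out: Callable[[], bool],
                     wait: Callable[[], None]) -> bool:
    """Check that AP0 RESET follows the EROT RESET input.

    Levels are booleans: False = LOW (asserted), True = HIGH (deasserted).
    """
    set_reset_in(False)              # assert RESET (LOW)
    wait()
    held = get_ap0_reset_out() is False      # LS10/AP must be held in reset

    set_reset_in(True)               # deassert RESET (HIGH)
    wait()                           # delay is TBD in the PRD, so the caller supplies it
    released = get_ap0_reset_out() is True   # LS10/AP must be released

    return held and released


class MockErot:
    """Simulated EROT (hypothetical) whose AP0 RESET output mirrors its RESET input."""

    def __init__(self) -> None:
        self.reset_in = False

    def set_reset_in(self, level: bool) -> None:
        self.reset_in = level

    def get_ap0_reset_out(self) -> bool:
        return self.reset_in


erot = MockErot()
print(check_erot_reset(erot.set_reset_in, erot.get_ap0_reset_out, wait=lambda: None))  # prints True
```

Since the Goldstone Switchboard carries two EROT/CEC1736 devices, the same check would be run once per device.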
8 Sensor List
Please refer to Table 9 for the I2C addresses of the sensors on the Goldstone Switchboard, including the PDB, and to
Figure 24 for the I2C addresses of the sensors on the COMe module.
In addition to the standalone sensors, the SODIMM, M.2 SSD and OSFP modules all include temperature and/or voltage
sensors accessible through their management interfaces (typically I2C).
During normal operation, the LS10 devices manage their power sensors and the OSFP modules’ sensors, but the COMe CPU
can access them under the Main CPLD’s control.
Table 25 Goldstone Node Sensors
Temperature sensor inside M.2 SSD (please see the M.2 SSD section) GS Switchboard 0x3A
U29: TMP75 Temperature sensor PDB & Fan Tray via PDB: I2C_TEMP_* 0x9A
U8: TMP75 Temperature sensor 0x9C
U4_35: MP2975 for LS1 DVDD & HVDD 0xCA
U1_33: MP2975 for OSFP Ports 1-16 P3.3V via MP86975 0x54
J1_01: OSFP Port 1 NVLink transceiver’s Temperature & Voltage Sensors OSFP Module 0xA0
J1_02: OSFP Port 2 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_03: OSFP Port 3 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_04: OSFP Port 4 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_05: OSFP Port 5 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_06: OSFP Port 6 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_07: OSFP Port 7 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_08: OSFP Port 8 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_09: OSFP Port 9 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_10: OSFP Port 10 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_11: OSFP Port 11 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_12: OSFP Port 12 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_13: OSFP Port 13 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_14: OSFP Port 14 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_15: OSFP Port 15 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
J1_16: OSFP Port 16 NVLink transceiver’s Temperature & Voltage Sensors 0xA0
9 Goldstone Node I2C Trees
Please refer to Figure 24 for the I2C tree on the COMe module and to Table 9 for the I2C tree on the Switchboard, including the PDB.
10 JTAG & Boundary Scan Test
To be added in V1.1
11 Equipment List
EQP-001 OSFP Loopback Dongle for NVLink Loopback NVIDIA P/N = MOP4OPT-NFLMD3U2; Amphenol P/N = NLMACE-0001, ~$XX
EQP-002 M.2 Clone Machine to clone OS & Manufacturing Test SW, eSystor.com, P/N = SYSNVME-M2205, ~$5400
2mins/4GB, cloning to
EQP-003 24-Port RS-232 Terminal Server, 1RU, C13 Power Cord Perle.com, IOLAN STS24, P/N = 04030464, ~$3000
EQP-005 Golden parts for EQP-004 (ME Tray, PDB, Cable set, etc.) ZT
EQP-010 CAT5 RJ45 RS232 Cable for Goldstone and IOLAN STS24 Pinout
Figure 47. EZDUPE M.2 NVMe SSD Duplicator (DM-HE0-8V07NTP)
12 References
● SMBPBI High-Speed Proxy Interface - Compute Resman Software - Confluence (nvidia.com), Out of band
GPU management from BMC via a SMBus (I2C Bus).
● OSFP: a newer small form-factor pluggable module with 8 lanes and a 60-pin connector (vs. 76 pins for QSFP-DD)
● QSFP 112 vs. QSFP-DD800:
● QSFP-112: 8x differential pairs (4x TX + 4x RX) on a 38-pin connector, http://qsfp112.com/
● QSFP-DD800: 16x differential pairs (8x TX + 8x RX) on a 76-pin connector,
http://www.qsfp-dd.com/. Goldstone Switchboard uses QSFP-DD800 connectors for NVLink4 links.
Figure 51. QSFP-DD800 vs. QSFP112 Pinout
Figure 52. QSFP-DD/QSFP-DD800 Connector
Differential data rate 10G 10-25G 10-56G 10-112G 10-112G 10-112G
Modulation Type NRZ NRZ PAM4 PAM4 NRZ & PAM4 NRZ & PAM4
Note:
Figure 55. QSFP-DD MSA Connector PCB Layout
Figure 57. QSFP-DD vs. OSFP in Sizes
Figure 59. QSFP-DD vs. OSFP for 400G