Innosilicon T2T model hash board repair guide
2022/12/2
Innosilicon T2T model hash board maintenance manual (V1.3)
If chain loss, low hashrate, frequent hardware errors or similar problems occur while manufacturing or
using the miner, refer to this manual for testing and maintenance.
Note: This manual cannot cover every possible abnormal problem. If you encounter a problem that
cannot be solved with this manual, please contact our support personnel; the manual will be updated
when necessary.
I. Overview
1. The circuit layout of the hash board and the distribution of test points
Take the 3*31 model as an example. For other models, refer to the relevant design documents.
(1) Each group of three adjacent chips in the figure forms one voltage domain: (1,2,3), (4,5,6) ...
(91,92,93). This hash board has 31 voltage domains in total. The three chips in a domain share the
same voltage, and on T-series machines the average voltage of each domain is about 0.45 V at
startup.
(2) The red arrow in the figure shows the transmission direction of the CLK and communication signals.
(3) Test points 1 to 7 are placed between each pair of adjacent chips (the layout differs between
models; refer to the design file for details). Test points 1 to 7 carry the CLK, RST, EN, SCK, CS, DI
and DO signals respectively, as shown below.
(4) Test points and connections between adjacent chips:
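To translate a chip number reported in an error code into its voltage domain and neighbours, the numbering above can be written down in a few lines. This is a minimal sketch assuming the 3*31 layout (93 chips numbered from 1, three per domain); the names `domain_of`, `chips_in_domain` and `TEST_POINTS` are illustrative and not part of the test software.

```python
# Sketch for the 3*31 layout: 93 chips numbered from 1, 3 chips per voltage domain.
CHIPS_PER_DOMAIN = 3
DOMAIN_COUNT = 31

# Test point number -> signal, as listed in item (3) above.
TEST_POINTS = {1: "CLK", 2: "RST", 3: "EN", 4: "SCK", 5: "CS", 6: "DI", 7: "DO"}


def domain_of(chip: int) -> int:
    """Return the 1-based voltage domain of 1-based chip number `chip`."""
    last_chip = CHIPS_PER_DOMAIN * DOMAIN_COUNT
    if not 1 <= chip <= last_chip:
        raise ValueError(f"chip {chip} is outside 1..{last_chip}")
    return (chip - 1) // CHIPS_PER_DOMAIN + 1


def chips_in_domain(domain: int) -> list[int]:
    """Return the chip numbers that share voltage domain `domain`."""
    if not 1 <= domain <= DOMAIN_COUNT:
        raise ValueError(f"domain {domain} is outside 1..{DOMAIN_COUNT}")
    first = (domain - 1) * CHIPS_PER_DOMAIN + 1
    return list(range(first, first + CHIPS_PER_DOMAIN))


if __name__ == "__main__":
    print(domain_of(47))        # 16
    print(chips_in_domain(16))  # [46, 47, 48]
    print(TEST_POINTS[4])       # SCK
```

For other models, only the two constants would change, provided the chips are still grouped consecutively.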
2. Description of the test software
(1) Measuring chain. When used: after SMT, before the heat sinks are attached. Purpose: quickly
checks for soldering problems. It does not run a long functional test; it only checks whether
communication passes through all chips normally.
(2) Before pasting. When used: after the heat sink on the non-chip side is attached. Purpose: detects
faults of a single board in the high-power state as early as possible. Because the chip-side heat sink
is not yet fitted, the chips run at a lower frequency than in normal use.
(3) Binning after pasting. When used: after all heat sinks are attached. Purpose: tests the board at 4
working voltages and grades it by the measured hashrate. Boards of the same grade are installed in
the same machine.
(4) Maintenance. When used: locating problems on a single hash board. Purpose: sends
communication commands indefinitely so that maintenance personnel can check the relevant circuits
with a multimeter and an oscilloscope.
(5) Aging. When used: before the machine leaves the factory. Purpose: runs the official factory
firmware; if an exception occurs, an error code is displayed on the mass production management
interface.
3. Error code list of test software before and after pasting
If no problem is detected, "√" is printed at the end of the log; otherwise "×" is printed.
When problems are detected, the software reports only the error type with the highest priority. The
priority order is E0 > E9 > E6 > E4 > E7 > E5 > E3 > E1 > E2 > E8. Repair or replace the chip
according to the report.
Error code: description (remarks)
E0: Cannot find the chip type (chain failure).
E1: Fewer than 30% of the cores of a single chip are good (counted at the operating frequency).
E2: Fewer than 90% of the cores of a single chip are good (counted at the operating frequency).
E3: All job tests of a single chip are wrong.
E4: The PLL of the chip is not locked.
E5: The chip temperature is abnormal (the software displays 9999 or -9999).
E6: The chip voltage is abnormal.
E7: An error occurred in the command return process, or the frequency increase failed ("E7:0"
indicates that the PLL configuration failed).
E8: The total job-test error rate of the whole board is greater than 10%.
E9: The number of chips read is wrong.
E10: (Reserved)
E11: No suitable grade can be found after the heat sink is attached.
E12: The command returned a CRC error.
E13: Failed to lower the voltage.
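When several logs have to be collated, the priority rule above can be sketched as follows. This assumes error codes are written as `Ex` or `Ex:N` (as in Section IV, e.g. "E0: 1", "E7:0"). E10 to E13 do not appear in the published priority order, so ranking them after E8 is an assumption; `parse_error` and `highest_priority` are hypothetical helpers, not part of the test software.

```python
import re
from typing import Optional

# Published priority order, highest first. Unlisted codes (E10-E13) are
# ranked after E8 in this sketch; that ranking is an assumption.
PRIORITY = ["E0", "E9", "E6", "E4", "E7", "E5", "E3", "E1", "E2", "E8"]

# Matches "E8", "E7:0" or "E0: 1": the code, then an optional ":N" chip number.
_ERROR_RE = re.compile(r"^\s*(E\d+)\s*(?::\s*(\d+))?\s*$")


def parse_error(text: str) -> tuple[str, Optional[int]]:
    """Split an error string into (code, chip number or None)."""
    match = _ERROR_RE.match(text)
    if match is None:
        raise ValueError(f"not an error code: {text!r}")
    code, chip = match.groups()
    return code, (int(chip) if chip is not None else None)


def rank(code: str) -> int:
    """Lower rank means higher priority."""
    return PRIORITY.index(code) if code in PRIORITY else len(PRIORITY)


def highest_priority(errors: list[str]) -> tuple[str, Optional[int]]:
    """Return the parsed error that would be reported first."""
    parsed = [parse_error(e) for e in errors]
    return min(parsed, key=lambda item: rank(item[0]))


if __name__ == "__main__":
    print(highest_priority(["E1: 12", "E4: 30", "E8"]))  # ('E4', 30)
```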
4. Error code list of aging software
No. Problem / Resolution / Notice
1. The IO of the control board is abnormal.
2. Network fault of the control board.
Resolution for 1 and 2: replace the control board. Notice: restore the factory settings after
completion.
3. Hash board failure.
4. Chip failure.
5. The temperature of individual chips is too high.
Resolution for 3 to 5: replace the hash board. Notice: after replacement, be sure to restore the
factory settings or re-run aging.
6. PSU failure. Resolution: replace the power supply. Notice: restoring the factory settings or
re-running aging afterwards is recommended.
7. SPI line interference. Resolution: use a shielded cable.
8. SPI cable is not plugged in properly. Resolution: check and re-plug the SPI ribbon cable.
9. The power consumption of the whole machine is too high. Resolution: re-run aging or reduce the
frequency (efficiency mode).
10. The ambient temperature is too high. Resolution: improve the operating environment.
11. Fan fault. Resolution: check the fan cable connection, check that the fan model matches, and
check that the fan is installed in the correct direction. Notice: refer to the document "Summary of
Frequently Asked Questions about the Control Board".
12. Mining pool setting error. Resolution: check the pool settings or restore the factory settings.
13. The network cable is not plugged in properly. Resolution: check the network cable connection.
14. Network environment failure. Resolution: check the DHCP and DNS configuration of the switch.
Error code: description. Err message. Analysis (likely causes).
0: Normal. Err message: Normal.
21: One or more hash boards are not detected. Err message: the numbers of the hash boards that
were detected, separated by spaces if there is more than one. Analysis: SPI cable not plugged in /
control board IO fault / hash board fault.
22: I2C communication with the power supply is abnormal. Analysis: PSU failure / control board IO
failure.
23: Encore failure on all hash boards. Analysis: control board IO failure / PSU failure / hash board
failure.
24: Encore failure on some hash boards. Err message: the numbers of the hash boards whose encore
is normal, separated by spaces if there is more than one. Analysis: hash board failure / control board
IO failure / PSU failure.
25: Frequency increase (upscaling) failed. Err message: hash board number: failing frequency point.
Analysis: SPI line interference / hash board failure.
26: Failed to set the voltage. Err message: Hash board No.: 1/2. Analysis: SPI line interference / hash
board failure.
27: BIST failed. Err message: Hash board No.: 1/2. Analysis: SPI line interference / hash board
failure.
28: SPI error that cannot be recovered automatically at runtime. Err message: hash board number.
Analysis: SPI line interference / hash board failure / control board IO failure.
29: I2C communication fails during operation and cannot be recovered automatically. Analysis: PSU
failure / control board IO failure.
30: Unable to connect to the mining pool. Analysis: mining pool setting error / network cable not
plugged in properly / network failure of the control board / network environment failure.
31: Individual chips are damaged, resulting in a falsely high hashrate. Err message: damaged chip
number: hash board number, separated by spaces if there is more than one. Analysis: chip failure.
32: Overtemperature. Err message: hash board number. Analysis: ambient temperature too high / fan
failure / temperature of individual chips too high / power consumption of the whole machine too
high.
33: Failed to read the temperature. Err message: hash board number. Analysis: control board IO
failure / hash board failure.
34: SPI cable connection is abnormal. Err message: hash board number. Analysis: SPI cable inserted
into the wrong port of the control board / control board IO failure.
35: Insufficient power supply. Analysis: PSU failure.
36: The number of good cores of a chip is abnormal. Err message: hash board number: chip number.
Analysis: hash board failure.
37: Wrong VID type of the control board. Err message: vidtype, minertype, subtype, chipnum.
Analysis: hash board failure.
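When one machine reports several of these codes, tallying the components named in the Analysis column can suggest where to start. The sketch below encodes only a subset of the table (codes 21 to 29 and 33 to 35) with hypothetical shorthand labels ("PSU", "control board", "hash board", "SPI line"); it is a rough triage aid, not a substitute for the swap tests in Section III.

```python
from collections import Counter

# Hypothetical shorthand for the Analysis column above (subset of codes only).
SUSPECTS = {
    21: {"SPI line", "control board", "hash board"},
    22: {"PSU", "control board"},
    23: {"control board", "PSU", "hash board"},
    24: {"hash board", "control board", "PSU"},
    25: {"SPI line", "hash board"},
    26: {"SPI line", "hash board"},
    27: {"SPI line", "hash board"},
    28: {"SPI line", "hash board", "control board"},
    29: {"PSU", "control board"},
    33: {"control board", "hash board"},
    34: {"SPI line", "control board"},
    35: {"PSU"},
}


def tally_suspects(codes: list[int]) -> list[tuple[str, int]]:
    """Count how often each component is named across the given error codes.

    Codes not in SUSPECTS (including 0, Normal) are ignored. The result is
    sorted by count (descending), then by name so ties have a stable order.
    """
    counts = Counter()
    for code in codes:
        counts.update(SUSPECTS.get(code, set()))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(tally_suspects([22, 29, 35]))  # [('PSU', 3), ('control board', 2)]
```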
II. Preparation of maintenance platform
Tools: serial port board / data cable / TF card / jumper cap / oscilloscope / multimeter
Software:
boot.bin
SecureCRT.exe
1. Software instructions
(1) Instructions for the test software (*.bin)
How to use: with the power off, copy xxx.bin to the TF card and insert the TF card into the slot on the
serial port board. Connect the serial port board to the control board and fit a jumper cap on the J2
interface. Finally, power on.
(2) Instructions for the serial port tool
Install the serial port tool (SecureCRT.exe) on the computer and set the port parameters to 115200, n,
8, 1 (115200 baud, no parity, 8 data bits, 1 stop bit).
The setting method is as follows:
Double-click the serial port tool icon to open it, as shown in the figure below, and click "New Session"
in the red box in the dialog.
In the New Session Wizard, select Serial as the protocol.
Set the baud rate to 115200 and the other options as listed above.
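The shorthand "115200, n, 8, 1" lists baud rate, parity, data bits and stop bits, in that order. The sketch below decodes it; `parse_serial_spec` and `SerialSettings` are hypothetical helpers for illustration, not part of SecureCRT.

```python
from typing import NamedTuple

# Conventional single-letter parity codes used in "baud,parity,data,stop" strings.
PARITY_NAMES = {"n": "none", "e": "even", "o": "odd", "m": "mark", "s": "space"}


class SerialSettings(NamedTuple):
    baud: int
    parity: str
    data_bits: int
    stop_bits: float  # float so that 1.5 stop bits can be represented


def parse_serial_spec(spec: str) -> SerialSettings:
    """Parse the 'baud, parity, data bits, stop bits' shorthand."""
    fields = [field.strip().lower() for field in spec.split(",")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 comma-separated fields, got {spec!r}")
    baud, parity, data_bits, stop_bits = fields
    if parity not in PARITY_NAMES:
        raise ValueError(f"unknown parity {parity!r}")
    return SerialSettings(int(baud), PARITY_NAMES[parity], int(data_bits), float(stop_bits))


if __name__ == "__main__":
    print(parse_serial_spec("115200, n, 8, 1"))
```

If the session is scripted with the third-party pyserial package instead of SecureCRT, the same settings correspond to `serial.Serial(port, baudrate=115200, bytesize=serial.EIGHTBITS, parity=serial.PARITY_NONE, stopbits=serial.STOPBITS_ONE)`, where `port` is the COM port the serial port board appears as.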
(3) Software instructions
① Software before and after pasting
The usage process is:
1) Insert the TF card into the slot, check that the equipment is connected correctly, and power on.
2) After power-on, check in the serial port software that the reported software version is correct.
3) During the test, progress information and prompts for each stage are displayed to support
hardware testing and status monitoring.
4) When the test finishes, the test result is printed. In a multi-chain test, the results of all chains are
printed together at the end.
5) To test again, press the reset button on the control board, or press Enter when the software
prompts you to.
② Maintenance software
1) Insert the TF card into the slot, check that the equipment is connected correctly, and power on.
2) After power-on, check in the serial port software that the reported software version is correct.
3) During the test, the progress of each stage is shown in the log and by the LEDs to support
hardware testing and status monitoring.
4) While running, the software repeatedly sends a fixed command so that voltages and signals can be
measured.
5) When the measurement is complete, press the function key; the program continues to run and
finally prints the test results.
6) To test again, press the reset button on the control board, or press Enter when the software
prompts you to.
Note: the maintenance software can test only one hash board at a time. After pressing the function
key, the key press is known to have been captured only when the corresponding indicator light goes
out.
2. Establish a test environment
Take out the control board of the miner to be tested, place the control board and the serial port board
as shown in the figure, insert the TF card, and insert the jumper cap into the J2 interface. Connect the
serial port board and the computer with a data cable.
III. Maintenance process
1. Basic process for repairing a whole miner that fails aging
(1) Reproduce the aging failure and record the error code. If the case is to be analyzed by our R&D
team, also save the aging log.
(2) Check whether the power output corresponding to the defective board is normal.
(3) If the power supply has multiple controlled output channels, swap the power channels of the
faulty board and a normal board (swapping their data cable connections at the same time), then
observe whether the fault follows the hash board or the power supply. If it follows the power supply,
replace the power supply and run aging again.
(4) Disconnect the power supply and network cable. Check the machine for visible damage, and check
whether the power and data cables are loose or disconnected.
(5) Using the machine's original power supply and the faulty hash board, run the post-paste test with
the board installed in the machine, and record the error code and log. If no abnormality appears after
5 consecutive tests, notify our R&D personnel for analysis.
(6) Using the original power supply and the faulty hash board, run the post-paste test with the board
outside the machine, check whether the fault still occurs, and record the result. If the chip-side heat
sink is fixed with screws, remove it, run the pre-paste test, check whether the fault still occurs, and
record the result.
(7) Continue to analyze in accordance with the faulty board repair process.
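The channel swap in step (3) is an elimination test. A minimal sketch of the decision, assuming you record which board and which power channel failed before the swap and every board/channel pair that failed after it; the board labels and channel numbers are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    HASH_BOARD = "fault follows the hash board"
    POWER_SUPPLY = "fault follows the power supply channel"
    INCONCLUSIVE = "inconclusive: fault not reproduced, or seen on both"


def swap_verdict(before: tuple[str, int], after: list[tuple[str, int]]) -> Verdict:
    """Decide what a power channel swap shows.

    before: (board, channel) that failed before the swap.
    after:  every (board, channel) pair that failed after the swap.
    """
    board, channel = before
    follows_board = any(b == board for b, _ in after)
    follows_channel = any(c == channel for _, c in after)
    if follows_board and not follows_channel:
        return Verdict.HASH_BOARD
    if follows_channel and not follows_board:
        return Verdict.POWER_SUPPLY
    return Verdict.INCONCLUSIVE


if __name__ == "__main__":
    # Board A failed on channel 1; after the swap, A fails on channel 2.
    print(swap_verdict(("A", 1), [("A", 2)]).value)
```

The same logic applies when chips are swapped during single-board repair: substitute chip positions for power channels.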
2. Basic process for repairing a single hash board
Before maintenance, please confirm that the power supply, control board, and various cables are
connected properly.
(1) Run the pre-paste test software and obtain the error code Ex:x. The next step depends on the
error type.
(2) Inspect the board for missing, wrong, or visibly abnormal components, and check for solder balls,
foreign objects, etc. near the chip reported in the error.
(3) Run the maintenance program and use a multimeter to check the input voltage, the crystal
oscillator supply, the tail IO boost circuit, and the LDO output of each stage.
(4) Use an oscilloscope to check the chip input and output signals CLK, SCK, DO, DI, CS, RSTN and
START.
(5) If the output signal of an ASIC chip is abnormal, do not replace the chip straight away. Following
the instructions in the later chapters, first try adding solder, re-soldering, or swapping it with another
chip on the same board.
(6) If chips are swapped, observe whether the problem follows the chip.
(7) If none of the above works, replace the chip. Record the required information, such as the cause
of failure of the removed chip, in detail in the maintenance report, and send the maintenance reports
to our company regularly for analysis.
3. Locating a broken chain with the maintenance program
Copy the provided repair.bin to the TF card and insert the card into the serial port board. Connect the
power and data cables (no fan is required) and power on. Based on the error message from the
pre-paste or post-paste test software, probe the test points of the reported chip and its adjacent
chips.
Function keys and indicators in the maintenance software:
(1) After power-on, the red and green lights next to the reset button on the control board are lit. If
the chain is broken at power-on, the software keeps sending cmd04. When the function key next to
the USB card slot is pressed, the software stops sending cmd04, the program continues to run, and
the green light goes out.
(2) If the chain is connected at power-on, the software also keeps sending cmd04. Likewise, pressing
the function key stops cmd04, and the green light goes out.
(3) If the frequency configuration fails, the software sends cmd04 at the point of failure. Pressing the
function key stops cmd04, the program continues to run, and the red light goes out.
(4) If the frequency configuration succeeds but the chain breaks during continuous reading, the
software sends cmd04 at the break point. Pressing the function key stops the transmission, the red
light goes out, and the program continues.
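Because a function key press counts only once the corresponding indicator goes out, the four cases above reduce to a small lookup table. The stage labels below are hypothetical names for cases (1) to (4).

```python
# Indicator expected to go out once the function key press is captured,
# for each of cases (1) to (4) above. Stage labels are hypothetical.
LED_OFF_AFTER_KEY = {
    "chain broken at power-on": "green",      # case (1)
    "chain connected at power-on": "green",   # case (2)
    "frequency configuration failed": "red",  # case (3)
    "chain broken while reading": "red",      # case (4)
}


def key_press_captured(stage: str, leds_off: set[str]) -> bool:
    """True if the indicator expected for `stage` has gone out."""
    if stage not in LED_OFF_AFTER_KEY:
        raise KeyError(f"unknown stage {stage!r}")
    return LED_OFF_AFTER_KEY[stage] in leds_off


if __name__ == "__main__":
    print(key_press_captured("frequency configuration failed", {"red"}))  # True
```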
IV. Analysis of typical problems
1. E0: 1
This error means that the communication chain is completely broken. In most cases it is caused by
an abnormal peripheral circuit. Known causes are:
(1) The power supply has no output, or its output is abnormal.
(2) The solder joints between the communication interface and the connector pins are shorted.
(3) The data cable is not plugged in properly, has poor contact, or is damaged and causes a short
circuit.
(4) Components between the communication interface and the first chip are falsely soldered,
shorted, burnt, displaced, or missing.
(5) The IO of the first chip has been damaged by static electricity.
(6) The crystal oscillator is abnormal.
(7) Some components are missing.
For this kind of problem, carry out a complete inspection following the checklist in Section V.
2. E0: N
Part of the communication chain is broken, at chip N. Known causes are:
(1) The signal between ASIC chips N-1 and N is abnormal: pins of the two chips are falsely soldered,
lifted (floating high), or short-circuited, or their IO is damaged.
(2) Peripheral components of chip N are falsely soldered, shorted, burnt, displaced, or missing.
Repair steps:
(1) Check the peripheral circuit. If there is no abnormality, go to the next step.
(2) Measure the IO-to-ground resistance of ASIC chip N and of the chips before and after it. If there is
no abnormality, go to the next step. If there is an abnormality, remove the chip and compare its
IO-to-ground resistance with that of a new chip: if there is no obvious difference, go to the next step;
otherwise replace the chip.
(3) Re-solder chips N and N-1. If the abnormality persists, go to the next step.
(4) Otherwise, use the maintenance program to help locate the fault. Check the chips when the
software reaches "Start to send cmd04 endlessly". Measure the voltage of the abnormal chip with a
multimeter (the measurement method is shown in the figure below), and measure the signals of chips
N and N-1 with an oscilloscope. As shown in Figure 14, if the DO/CS/SCK output of chip N-1 is
abnormal (compare it with the waveform of a normal chip upstream of chip N-1; an inconsistent
waveform is abnormal), replace chip N-1. If the output of chip N is abnormal, replace chip N. If the
output of chip N is normal but its DI input is abnormal, replace chip N+1.
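The oscilloscope rules in step (4) are checked in a fixed order, so they can be sketched as a small decision function. The boolean inputs are the technician's judgement of each waveform; `chip_to_replace` is a hypothetical name. Refusing N = 1 reflects that E0:1 is handled as a whole-chain failure.

```python
from typing import Optional


def chip_to_replace(n: int, upstream_output_ok: bool, n_output_ok: bool,
                    n_input_di_ok: bool) -> Optional[int]:
    """Apply the E0:N waveform rules in the order given above.

    upstream_output_ok: DO/CS/SCK output of chip N-1 matches a known-good
        chip further upstream.
    n_output_ok: output signals of chip N look normal.
    n_input_di_ok: DI input of chip N looks normal.
    Returns the chip number to replace, or None if every signal looks normal.
    """
    if n < 2:
        raise ValueError("E0:1 is a whole-chain failure; use the checklist instead")
    if not upstream_output_ok:
        return n - 1
    if not n_output_ok:
        return n
    if not n_input_di_ok:
        return n + 1
    return None


if __name__ == "__main__":
    print(chip_to_replace(10, True, True, False))  # 11
```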
3. E6: N
The voltage of chip N is abnormal.
Maintenance method:
(1) Use a multimeter to confirm whether the chip voltage is abnormal. If the voltage is too low, probe
the SCK test points of the three chips in that voltage domain. If one chip shows SCK frequency jitter,
swap it with a chip from another voltage domain that has a higher voltage. If all SCK signals are
normal, swap chip N with a chip from another voltage domain that has a higher voltage.
(2) If the problem is traced to the chip, replace the chip.
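To decide whether a domain voltage is "too low" and which domain to swap with, the multimeter readings can be compared across domains. This is a minimal sketch assuming one reading per voltage domain; the 0.9 ratio is a hypothetical threshold, since the manual gives no numeric limit.

```python
from statistics import mean


def low_domains(voltages: dict[int, float], ratio: float = 0.9) -> list[int]:
    """Return domains whose voltage is below `ratio` times the mean of the others.

    `ratio` = 0.9 is a hypothetical threshold, not a value from the manual.
    """
    flagged = []
    for domain, volts in voltages.items():
        others = [v for d, v in voltages.items() if d != domain]
        if others and volts < ratio * mean(others):
            flagged.append(domain)
    return sorted(flagged)


def swap_partner(voltages: dict[int, float], exclude: int) -> int:
    """Pick the highest-voltage domain other than `exclude` as the swap partner."""
    return max((d for d in voltages if d != exclude), key=voltages.__getitem__)


if __name__ == "__main__":
    readings = {1: 0.45, 2: 0.46, 3: 0.30, 4: 0.47}
    print(low_domains(readings))      # [3]
    print(swap_partner(readings, 3))  # 4
```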
4. E7: 0
When E7:0 appears, use the maintenance software to locate the fault. The method is the same as for
E0; take the measurements when the program reaches "CRITICAL PLL CONFIGURE ERROR on Board 0 !!!
Begin to Check SPI ...".
5. E7: N
Chip N does not respond and needs to be replaced. Check it in the same way as for E0:N.
6. E1: N
Chip N is missing cores. If this problem occurs on many boards, submit it to our company for R&D
analysis. If only a few hash boards have this problem, replace chip N.
7. E2
The total number of cores on the board is insufficient. Check whether the total voltage of the board is
abnormal (refer to the method for E0 errors). If there is no abnormality, return the board to the factory
for repair.
8. E3: N
The soft BIST error rate of chip N is high. Handle it in the same way as E1:N.
9. E4: N
The PLL of chip N is not locked. Check the CLK output of chip N-1. If it is normal, re-solder chips N-1
and N. If the problem persists, replace chip N.
10. E5: N
The temperature of chip N exceeds the limit; replace the chip. If the problem occurs on many chips or
boards, check the heat sink. If the problem still cannot be solved, return the board to the factory.
11. E8
The soft BIST error rate of the whole board is high. Check whether the board voltage and the clock of
each chip are abnormal; if so, replace the abnormal chip. If nothing abnormal is found, return the
board to the factory.
V. Checklist
This checklist is for maintenance reference.
Check items
(1) Workmanship inspection
CheckPoint 1. Whether the chip solder joints are full and whether there are solder beads
CheckPoint 2. Whether any components have fallen off
CheckPoint 3. Whether the chip is covered with thermal grease or a thermal pad
(2) Error messages of the pre-paste or post-paste test software
CheckPoint 4. Correct identification of chip type
CheckPoint 5. The reading status is normal at the default frequency (Frequency = 60 MHz for all
chips, Main PLL Lock = 1, Temperature and Voltage within a reasonable range)
CheckPoint 6. Successfully raised to operating frequency (PLL frequency)
CheckPoint 7. The reading status is normal under the operating frequency
(Frequency=operating frequency/2, Main PLL Lock=1, Temperature, Voltage of all chips are
within a reasonable range)
CheckPoint 8. The error rate of Soft Bist is within a reasonable range (less than 10%)
CheckPoint 9. The test software result is √
(3) PSU output
CheckPoint 10. The voltage output from the power supply to the hash board is normal (see
the indicators of specific models)
CheckPoint 11. The voltage output from the power supply to the control board is 12V ± 10%
(4) Control signal (measured after the hash board is powered on)
CheckPoint 12. EN_CORE=3.3V±10%
CheckPoint 13. RESET=1.8V±10%
CheckPoint 14. START=1.8V±10%
(5) Voltage of chip of hash board
CheckPoint 15. The total CORE voltage should match the output voltage of the PSU.
If the VID setting is unreasonable or does not take effect, operation will be abnormal or unstable.
If the VID setting does not take effect, check whether the software and hardware of the control board
are correct.
CheckPoint 16. The IO voltage at every level (voltage domain) should always be 1.8 V
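CheckPoints 11 to 15 are tolerance comparisons, which can be sketched as below. This assumes readings in volts and that the core voltage domains are connected in series, so their voltages add up to the PSU output (on the 3*31 board, 31 domains at about 0.45 V give roughly 13.95 V at startup). The 5% tolerance used for CheckPoint 15 is a hypothetical value; use the indicators of the specific model.

```python
# Nominal value and relative tolerance for CheckPoints 11 to 14 above.
CHECKS = {
    "control board supply": (12.0, 0.10),
    "EN_CORE": (3.3, 0.10),
    "RESET": (1.8, 0.10),
    "START": (1.8, 0.10),
}


def within(measured: float, nominal: float, tol: float) -> bool:
    """True if `measured` lies within nominal * (1 +/- tol)."""
    return abs(measured - nominal) <= nominal * tol


def failed_checks(readings: dict[str, float]) -> list[str]:
    """Names of known checks whose reading is out of tolerance."""
    return [name for name, value in readings.items()
            if name in CHECKS and not within(value, *CHECKS[name])]


def core_voltage_matches(domain_volts: list[float], psu_volts: float,
                         tol: float = 0.05) -> bool:
    """CheckPoint 15 sketch: sum of series domain voltages vs PSU output.

    Assumes series-connected domains; `tol` = 5% is hypothetical.
    """
    return within(sum(domain_volts), psu_volts, tol)


if __name__ == "__main__":
    print(failed_checks({"EN_CORE": 3.25, "RESET": 1.2, "START": 1.81}))  # ['RESET']
    print(core_voltage_matches([0.45] * 31, 13.95))                      # True
```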
NOTICE
This is an original piece. Reproduction in whole or part without written permission is prohibited.