The machine of the producer's network io is 100MB/s #1313
Comments
@edenhill This is the producer's debug log when I reboot my broker (10.17.107.11):
[2017-07-12 15:23:57.671260 info] (/home/fenglin/LogAgent/src/biz/sender/Sender.h:39)[thrd:10.17.107.11:9092/711]: 10.17.107.11:9092/711: Sent partial ProduceRequest (v0, 0+69504/987730 bytes, CorrId 6227)
At the same time the producer machine's network io is 103MB/s.
@edenhill Why are there so many ProduceRequests, and why is the ProduceRequest CorrId the same?
The socket is congested (probably due to slow broker) so the client can only send about 50k per attempt, thus the "Sent partial ProduceRequest". |
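For illustration only (this is not librdkafka's actual implementation), a minimal sketch of why a congested non-blocking socket produces "Sent partial ProduceRequest": the kernel send buffer fills up when the broker reads slowly, so each send() accepts only part of the serialized request and the client has to resume from the recorded offset later.

```c
/* Illustration only (not librdkafka code): a non-blocking send loop.
 * When the broker is slow the kernel send buffer fills up, send()
 * accepts only part of the request, and the caller must remember the
 * offset and continue on the next POLLOUT -- the client-side symptom
 * is a log line like "Sent partial ProduceRequest (0+69504/987730 bytes)". */
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stddef.h>

/* Try to push the rest of 'buf' out on non-blocking socket 'fd'.
 * '*sent' is how much has already been transmitted.
 * Returns 1 when the whole buffer is out, 0 if the socket is full
 * (retry on the next writable event), -1 on error. */
int send_remaining (int fd, const char *buf, size_t len, size_t *sent) {
        while (*sent < len) {
                ssize_t r = send(fd, buf + *sent, len - *sent, MSG_NOSIGNAL);
                if (r > 0)
                        *sent += (size_t)r;
                else if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
                        return 0;   /* kernel buffer full: partial request sent */
                else
                        return -1;  /* connection error */
        }
        return 1;
}
```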
I found a phenomenon: when I stop a broker, the producer machine's network io increases:
12/07/17-17:16:19 4.5M 8.6M 8.5K 9.8K 0.00 0.00
When I start the broker I had stopped, the network io does not suddenly increase:
[2017-07-12 17:18:27,222] INFO [Kafka Server 711], starting (kafka.server.KafkaServer)
The debug log is 200MB, which is very big, and the broker's inbound network io also increases.
The only way to know what is going on is checking the protocol debug, try to correlate how many bytes are sent to which broker when you are seeing this rate increase. |
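A minimal sketch of turning that protocol debug on through the librdkafka C API (the `debug` property and log callback are real librdkafka configuration; the callback body and function name here are just an example):

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

/* Route librdkafka debug logs somewhere you can grep and correlate
 * per broker with the network-io spikes. */
static void log_cb (const rd_kafka_t *rk, int level,
                    const char *fac, const char *buf) {
        fprintf(stderr, "%%%d|%s| %s: %s\n", level, rd_kafka_name(rk), fac, buf);
}

rd_kafka_conf_t *make_debug_conf (void) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();

        /* "broker,protocol,msg" shows per-broker request/response sizes,
         * which is what you need to correlate with the rate increase. */
        if (rd_kafka_conf_set(conf, "debug", "broker,protocol,msg",
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "%s\n", errstr);

        rd_kafka_conf_set_log_cb(conf, log_cb);
        return conf;
}
```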
@edenhill I find that rd_kafka_metadata_fast_leader_query is not working as expected. This is my debug log; why?
@edenhill I debugged my program by adding some logs in your library, and I found two problems. My debug output is:
@edenhill Please give me a reply, thank you.
Thank you for your detailed analysis, I will look into this soon. |
@edenhill Have you made any progress on this issue?
Thanks for a great analysis, you are right on point:
The bug is the for-loop conditional, which is:
Can you try to reproduce this issue with the fix in place (get master branch)? |
@edenhill Thank you very much, I will try it now.
Description
When the broker is unstable because a full GC takes too long, the producer machine's network io reaches 100MB/s, which is very high and not reasonable.
My producer config is:
batch.num.messages: 1000
queue.buffering.max.ms: 200
queue.buffering.max.kbytes: 100 * 1024
queue.buffering.max.messages: 100000
metadata.broker.list: broker ip
message.max.bytes: 1024 * 1023
socket.blocking.max.ms: 200
broker.version.fallback: 0.8.2.0
message.send.max.retries: 3
max.in.flight.requests.per.connection: 1000
request.required.acks: 1
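For reference, a sketch of how the configuration above could be applied through the librdkafka C API (property names and values are taken from the list; the function name, broker string handling, and error handling are illustrative):

```c
#include <stdio.h>
#include <librdkafka/rdkafka.h>

/* Apply the producer configuration listed above.  Property names and
 * values come from the issue description; everything else is illustrative. */
rd_kafka_t *create_producer (const char *brokers) {
        char errstr[512];
        rd_kafka_conf_t *conf = rd_kafka_conf_new();
        static const char *props[][2] = {
                { "batch.num.messages",                    "1000" },
                { "queue.buffering.max.ms",                "200" },
                { "queue.buffering.max.kbytes",            "102400" },  /* 100 * 1024 */
                { "queue.buffering.max.messages",          "100000" },
                { "message.max.bytes",                     "1047552" }, /* 1024 * 1023 */
                { "socket.blocking.max.ms",                "200" },
                { "broker.version.fallback",               "0.8.2.0" },
                { "message.send.max.retries",              "3" },
                { "max.in.flight.requests.per.connection", "1000" },
                { "request.required.acks",                 "1" },
        };
        size_t i;

        for (i = 0; i < sizeof(props)/sizeof(props[0]); i++)
                if (rd_kafka_conf_set(conf, props[i][0], props[i][1],
                                      errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                        fprintf(stderr, "%s: %s\n", props[i][0], errstr);

        if (rd_kafka_conf_set(conf, "metadata.broker.list", brokers,
                              errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK)
                fprintf(stderr, "%s\n", errstr);

        /* rd_kafka_new() takes ownership of conf on success. */
        return rd_kafka_new(RD_KAFKA_PRODUCER, conf, errstr, sizeof(errstr));
}
```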
My broker's logs:
controller log:
[2017-07-06 15:11:07,583] INFO [SessionExpirationListener on 723], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
state-change.log
[2017-07-06 15:11:22,629] TRACE Broker 723 cached leader info (LeaderAndIsrInfo:(Leader:726,ISR:726,LeaderEpoch:2,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:726,728) for partition [antispam_trustlevel,0] in response to UpdateMetadata request sent by controller 107049 epoch 19 with correlation id 140127 (state.change.logger)
[2017-07-06 15:11:22,788] TRACE Broker 723 cached leader info (LeaderAndIsrInfo:(Leader:728,ISR:713,728,LeaderEpoch:20,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:728,713) for partition [royalty_info_log,2] in response to UpdateMetadata request sent by controller 107049 epoch 19 with correlation id 140127 (state.change.logger)
[2017-07-06 15:11:22,788] TRACE Broker 723 cached leader info (LeaderAndIsrInfo:(Leader:713,ISR:713,LeaderEpoch:26,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:713,711) for partition [forum_buildcms_log,3] in response to UpdateMetadata request sent by controller 107049 epoch 19 with correlation id 140127 (state.change.logger)
[2017-07-06 15:11:22,788] TRACE Broker 723 cached leader info (LeaderAndIsrInfo:(Leader:724,ISR:724,LeaderEpoch:198,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:723,724) for partition [pay_order_schedule_info,2] in response to UpdateMetadata request sent by controller 107049 epoch 19 with correlation id 140127 (state.change.logger)
[2017-07-06 15:11:22,788] TRACE Broker 723 cached leader info (LeaderAndIsrInfo:(Leader:2081,ISR:2081,729,LeaderEpoch:0,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:2081,729) for partition [acm_etl_order,0] in response to UpdateMetadata request sent by controller 107049 epoch 19 with correlation id 140127 (state.change.logger)
[2017-07-06 15:11:22,788] TRACE Broker 723 cached leader info (LeaderAndIsrInfo:(Leader:107048,ISR:107050,107048,LeaderEpoch:11,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:107048,107050) for partition [spot_http_deviceEvent_log,0] in response to UpdateMetadata request sent by controller 107049 epoch 19 with correlation id 140127 (state.change.logger)
[2017-07-06 15:11:22,788] TRACE Broker 723 cached leader info (LeaderAndIsrInfo:(Leader:726,ISR:729,726,LeaderEpoch:17,ControllerEpoch:19),ReplicationFactor:2),AllReplicas:726,729) for partition [mobisqlstat_log,2] in response to UpdateMetadata request sent by controller 107049 epoch 19 with correlation id 140127 (state.change.logger)
server.log
[2017-07-06 15:11:23,517] WARN [KafkaApi-723] Produce request with correlation id 1399 from client LogAgent_default on partition [trade_timeout_service_stdout,0] failed due to Leader not local for partition [trade_timeout_service_stdout,0] on broker 723 (kafka.server.KafkaApis)
[2017-07-06 15:11:23,517] WARN [KafkaApi-723] Produce request with correlation id 840495808 from client on partition [mario_etl_acm_expose_v1,2] failed due to Leader not local for partition [mario_etl_acm_expose_v1,2] on broker 723 (kafka.server.KafkaApis)
[2017-07-06 15:11:23,517] WARN [KafkaApi-723] Produce request with correlation id 840473960 from client on partition [mario_etl_ump,2] failed due to Leader not local for partition [mario_etl_ump,2] on broker 723 (kafka.server.KafkaApis)
[2017-07-06 15:11:23,517] WARN [KafkaApi-723] Produce request with correlation id 836965689 from client on partition [mario_etl_mobile_page,0] failed due to Leader not local for partition [mario_etl_mobile_page,0] on broker 723 (kafka.server.KafkaApis)
[2017-07-06 15:11:23,517] WARN [KafkaApi-723] Produce request with correlation id 836993146 from client on partition [mario_etl_acm_expose_v1,2] failed due to Leader not local for partition [mario_etl_acm_expose_v1,2] on broker 723 (kafka.server.KafkaApis)
[2017-07-06 15:11:23,517] WARN [KafkaApi-723] Produce request with correlation id 836966136 from client on partition [mario_etl_mobile_page,0] failed due to Leader not local for partition [mario_etl_mobile_page,0] on broker 723 (kafka.server.KafkaApis)
[2017-07-06 15:11:23,517] WARN [KafkaApi-723] Produce request with correlation id 836974316 from client on partition [mario_etl_acm_expose_v1,2] failed due to Leader not local for partition [mario_etl_acm_expose_v1,2] on broker 723 (kafka.server.KafkaApis)
[2017-07-06 15:11:23,517] WARN [Replica Manager on Broker 723]: Fetch request with correlation id 723374 from client cube_for_etl_gmv on partition [order_refer_v6.1,3] failed due to Leader not local for partition [order_refer_v6.1,3] on broker 723 (kafka.server.ReplicaManager)
My machine's network io monitor:
06/07/17-15:14 4.5M 9.1M 8.2K 9.4K 0.00 0.00
06/07/17-15:15 5.1M 39.3M 15.4K 30.0K 0.00 0.00
06/07/17-15:16 5.2M 39.0M 16.0K 29.9K 0.00 0.00
06/07/17-15:17 5.2M 41.2M 16.5K 31.3K 0.00 0.00
06/07/17-15:18 5.6M 74.5M 22.2K 54.0K 0.00 0.00
06/07/17-15:19 6.0M 109.7M 26.9K 77.6K 0.00 0.00
06/07/17-15:20 4.6M 13.7M 8.8K 12.6K 0.00 0.00
06/07/17-15:21 4.6M 8.7M 8.0K 9.2K 0.00 0.00
06/07/17-15:22 4.6M 8.7M 7.9K 9.2K 0.00 0.00
06/07/17-15:23 4.6M 8.6M 7.9K 9.2K 0.00 0.00
06/07/17-15:24 4.7M 8.9M 8.1K 9.4K 0.00 0.00
06/07/17-15:25 4.5M 8.6M 7.8K 9.1K 0.00 0.00
06/07/17-15:26 4.7M 8.9M 8.1K 9.4K 0.00 0.00
06/07/17-15:27 4.5M 8.6M 7.8K 9.1K 0.00 0.00
06/07/17-15:28 4.6M 8.8M 8.0K 9.3K 0.00 0.00
06/07/17-15:29 4.5M 8.5M 7.8K 9.0K 0.00 0.00
06/07/17-15:30 4.6M 8.7M 7.9K 9.2K 0.00 0.00
06/07/17-15:31 4.6M 8.7M 8.0K 9.2K 0.00 0.00
06/07/17-15:32 4.6M 8.8M 8.1K 9.3K 0.00 0.00
How to reproduce
Checklist
Please provide the following information:
Provide logs (with debug=.. as necessary) from librdkafka