Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto.offset.reset=smallest doesn't always work (it does sometimes) #730

Closed
9 tasks
KeithHenrickson-AtNominum opened this issue Jul 13, 2016 · 3 comments
Closed
9 tasks
Labels

Comments

@KeithHenrickson-AtNominum

Description

We're seeing an issue with librdkafka where when using:
auto.offset.reset=smallest

we occasionally get stuck on startup where librdkafka never gives us any messages. It seems to occur when the .offset files exist, but reference an offset lower than is contained in the store.

In this case, the .offset file for partition 0 contained: 2113787406

In the log snippet below, server and topic names have been changed to protect the guilty.

How to reproduce

Create a .offset file that's below (but not too far below), the "lo_offset" in the statistics, and try to consume from that topic/partition with the legacy consumer. We see this very frequently, but not always.

Checklist

Please provide the following information:

  • librdkafka version (release number or git tag):
    librdkafka hash: c8caf05
  • Apache Kafka version:
    kafka-0.9.0.1
  • librdkafka client configuration:
    auto.offset.reset=smallest
    offset.store.method=file
    offset.store.path=kafka-offsets/name
    enable.auto.offset.store=false
  • Operating system:
    Mac OS X 10.11
  • Using the legacy Consumer
    yes
  • Using the high-level KafkaConsumer
    no
  • Provide logs (with debug=.. as necessary) from librdkafka
    my-link: info: kafka multiconsumer 'myconsumer': { "name": "rdkafka#consumer-1", "type": "consumer", "ts":1468450633090828, "time":1468450633, "replyq":0, "msg_cnt":0, "msg_max":100000, "simple_cnt":12, "brokers":{ "spool01.myserver.com:9092/1": { "name":"spool01.myserver.com:9092/1", "nodeid":1, "state":"UP", "stateage":586788831, "outbuf_cnt":0, "outbuf_msg_cnt":0, "waitresp_cnt":0, "waitresp_msg_cnt":0, "tx":4644, "txbytes":337204, "txerrs":0, "txretries":0, "req_timeouts":0, "rx":4644, "rxbytes":227076, "rxerrs":0, "rxcorriderrs":0, "rxpartial":0, "rtt": { "min":2747, "max":9016, "avg":5794, "sum":46352, "cnt":8 }, "throttle": { "min":0, "max":0, "avg":0, "sum":0, "cnt":0 }, "toppars":{ "my-data": { "topic":"my-data", "partition":5} , "my-data": { "topic":"my-data", "partition":1} , "my-data": { "topic":"my-data", "partition":9} } } , "spool02.myserver.com:9092/2": { "name":"spool02.myserver.com:9092/2", "nodeid":2, "state":"UP", "stateage":586756523, "outbuf_cnt":0, "outbuf_msg_cnt":0, "waitresp_cnt":0, "waitresp_msg_cnt":0, "tx":4644, "txbytes":337228, "txerrs":0, "txretries":0, "req_timeouts":0, "rx":4644, "rxbytes":226516, "rxerrs":0, "rxcorriderrs":0, "rxpartial":0, "rtt": { "min":2722, "max":9774, "avg":6177, "sum":49416, "cnt":8 }, "throttle": { "min":0, "max":0, "avg":0, "sum":0, "cnt":0 }, "toppars":{ "my-data": { "topic":"my-data", "partition":2} , "my-data": { "topic":"my-data", "partition":10} , "my-data": { "topic":"my-data", "partition":6} } } , "spool03.myserver.com:9092/3": { "name":"spool03.myserver.com:9092/3", "nodeid":3, "state":"UP", "stateage":586783303, "outbuf_cnt":0, "outbuf_msg_cnt":0, "waitresp_cnt":0, "waitresp_msg_cnt":0, "tx":4640, "txbytes":336912, "txerrs":0, "txretries":0, "req_timeouts":0, "rx":4640, "rxbytes":226880, "rxerrs":0, "rxcorriderrs":0, "rxpartial":0, "rtt": { "min":2721, "max":11627, "avg":7162, "sum":50139, "cnt":7 }, "throttle": { "min":0, "max":0, "avg":0, "sum":0, "cnt":0 }, "toppars":{ "my-data": { "topic":"my-data", "partition":11} , "my-data": { "topic":"my-data", "partition":7} , "my-data": { "topic":"my-data", "partition":3} } } , "spool04.myserver.com:9092/4": { "name":"spool04.myserver.com:9092/4", "nodeid":4, "state":"UP", "stateage":586783295, "outbuf_cnt":0, "outbuf_msg_cnt":0, "waitresp_cnt":0, "waitresp_msg_cnt":0, "tx":4643, "txbytes":337165, "txerrs":0, "txretries":0, "req_timeouts":0, "rx":4643, "rxbytes":226484, "rxerrs":0, "rxcorriderrs":0, "rxpartial":0, "rtt": { "min":2832, "max":9529, "avg":6715, "sum":47007, "cnt":7 }, "throttle": { "min":0, "max":0, "avg":0, "sum":0, "cnt":0 }, "toppars":{ "my-data": { "topic":"my-data", "partition":8} , "my-data": { "topic":"my-data", "partition":4} , "my-data": { "topic":"my-data", "partition":0} } } }, "topics":{ "my-data": { "topic":"my-data", "metadata_age":286922, "partitions":{ "0": { "partition":0, "leader":4, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2113787406, "commited_offset":2113787406, "committed_offset":2113787406, "eof_offset":-1001, "lo_offset":2114209034, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "1": { "partition":1, "leader":1, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2116569138, "commited_offset":2116569138, "committed_offset":2116569138, "eof_offset":-1001, "lo_offset":2117020649, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "2": { "partition":2, "leader":2, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2116730169, "commited_offset":2116730169, "committed_offset":2116730169, "eof_offset":-1001, "lo_offset":2117177542, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "3": { "partition":3, "leader":3, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2116919186, "commited_offset":2116919186, "committed_offset":2116919186, "eof_offset":-1001, "lo_offset":2117368869, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "4": { "partition":4, "leader":4, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2116033757, "commited_offset":2116033757, "committed_offset":2116033757, "eof_offset":-1001, "lo_offset":2116484031, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "5": { "partition":5, "leader":1, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2113222139, "commited_offset":2113222139, "committed_offset":2113222139, "eof_offset":-1001, "lo_offset":2113644021, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "6": { "partition":6, "leader":2, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2115925951, "commited_offset":2115925951, "committed_offset":2115925951, "eof_offset":-1001, "lo_offset":2116347889, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "7": { "partition":7, "leader":3, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2116478558, "commited_offset":2116478558, "committed_offset":2116478558, "eof_offset":-1001, "lo_offset":2116930501, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "8": { "partition":8, "leader":4, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2115560870, "commited_offset":2115560870, "committed_offset":2115560870, "eof_offset":-1001, "lo_offset":2116012482, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "9": { "partition":9, "leader":1, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2115618219, "commited_offset":2115618219, "committed_offset":2115618219, "eof_offset":-1001, "lo_offset":2116069488, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "10": { "partition":10, "leader":2, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2111692915, "commited_offset":2111692915, "committed_offset":2111692915, "eof_offset":-1001, "lo_offset":2112141207, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "11": { "partition":11, "leader":3, "desired":true, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"active", "query_offset":-2, "next_offset":-1001, "app_offset":-1001, "stored_offset":2116825317, "commited_offset":2116825317, "committed_offset":2116825317, "eof_offset":-1001, "lo_offset":2117248751, "hi_offset":-1, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } , "-1": { "partition":-1, "leader":-1, "desired":false, "unknown":false, "msgq_cnt":0, "msgq_bytes":0, "xmit_msgq_cnt":0, "xmit_msgq_bytes":0, "fetchq_cnt":0, "fetchq_size":0, "fetch_state":"none", "query_offset":0, "next_offset":0, "app_offset":-1001, "stored_offset":-1001, "commited_offset":-1001, "committed_offset":-1001, "eof_offset":-1001, "lo_offset":-1001, "hi_offset":-1001, "consumer_lag":-1, "txmsgs":0, "txbytes":0, "msgs": 0, "rx_ver_drops": 0 } } } } }
  • Provide broker log excerpts
  • Critical issue
@edenhill
Copy link
Contributor

Can you reproduce this with debug=topic?
Thanks

@abc100m
Copy link

abc100m commented Sep 8, 2016

How is this issue going? our apps seems like have this same issue.

Thanks

@edenhill
Copy link
Contributor

edenhill commented Sep 8, 2016

@abc100m What librdkafka version/tag are you using?
Can you provide debug logs with debug=topic?

edenhill added a commit that referenced this issue Oct 28, 2016
…730)

 * avoid unwanted non-wait-unassign -> init state transition
 * toppars could be lost dangling between desired and unknown state
 * the previous assignment could get in the way of the new assignment
 * OffsetCommit for current assignment was poorly handling lack of coordinator
 * OffsetCommit without any offsets to commit could cause a wait_commit stall
edenhill added a commit that referenced this issue Nov 3, 2016
…730)

 * avoid unwanted non-wait-unassign -> init state transition
 * toppars could be lost dangling between desired and unknown state
 * the previous assignment could get in the way of the new assignment
 * OffsetCommit for current assignment was poorly handling lack of coordinator
 * OffsetCommit without any offsets to commit could cause a wait_commit stall

(cherry picked from commit 2cb7a29)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants