Skip to content

Sometimes one process node owns 2 tasks #8

@lyogavin

Description

@lyogavin

When using S4, we found sometimes it ends up with one process node owns 2 tasks. I did some investigation, it seems that the handling of ConnectionLossException when creating the ephemeral node is problematic. Sometimes when the response from zookeeper server times out, zookeeper.create() will fail with ConnectionLossException while the creation request might already be sent to server(see http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java line 830). From our logs this is the case we ran into.

Maybe we should handle it in the way that HBase is handling it (http://svn.apache.org/viewvc/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java?view=markup), just simply exit the process when got that exception to let the whole process restart.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions