-
Notifications
You must be signed in to change notification settings - Fork 352
support for multiple namenodes with failover #77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support for multiple namenodes with failover #77
Conversation
This looks like it should work fine, although testing is a concern. The other issue is that we can't change the signature of |
any progress? |
I've recently refactored this to support passing options, just need to do a bit more local testing before pushing it up. |
6ed6a15
to
0e45348
Compare
@colinmarc only took 3 months, but I've updated the PR. Let me know what you think. Cheers! |
} | ||
return namenodes[0], nil | ||
return namenodes, nil | ||
} | ||
|
||
// NewForUser returns a connected Client with the user specified, or an error if | ||
// it can't connect. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably deprecate these, but that can be a separate PR.
rpc/namenode.go
Outdated
func WrapNamenodeConnection(conn net.Conn, user string) (*NamenodeConnection, error) { | ||
// NewNamenodeConnectionWithOptions creates a new connection to a namenode with | ||
// the given options and performs an initial handshake. | ||
func NewNamenodeConnectionWithOptions(options NamenodeConnectionOptions) (*NamenodeConnection, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, because I'm dumb and made this exported way back when I started this library, we can't just remove the old function. Can you leave it and make it deprecated?
rpc/namenode.go
Outdated
} | ||
|
||
if c.conn == nil { | ||
return fmt.Errorf("No available namenodes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No
shouldn't be capitalized. This can also just be errors.New
. Better, though, would be to include the last error, so that this isn't completely opaque in degenerate cases: something like fmt.Errorf("no available namenodes: %s", lastErr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also, for the single-namenode case, that would report the error, which is what we want)
rpc/namenode.go
Outdated
@@ -55,45 +67,96 @@ func (err *NamenodeError) Error() string { | |||
return s | |||
} | |||
|
|||
// NewNamenodeConnection creates a new connection to a Namenode, and preforms an | |||
type host struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namenodeHost
?
rpc/namenode.go
Outdated
} | ||
|
||
return err | ||
c.markFailure(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing, but I think this treats StandbyException
s the same as any non-NamenodeError
error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, the operation will only retry if it's either a non-NamenodeError
or a StandbyException
rpc/namenode.go
Outdated
err = c.writeRequest(method, req) | ||
if err != nil { | ||
c.markFailure(err) | ||
goto R |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere in the code I use for
loops and continue
/break
rather than goto. This is probably a bit shorter, but I think harder to read.
Thanks for the changes! The netcode here looks pretty solid, but I'm a bit nervous about not really having a good test framework for it. Has anyone run this in production? |
@colinmarc: I think I've implemented all of your suggestions now. I understand your concerns about testing this. Locally I've been using a hadoop-ha Docker setup - maybe this would be an option for automated testing too? We've been using this code in production for a distributed service that mirrors files from HDFS. We haven't had any issues - it handled an active namenode switchover as expected. |
Sorry for the delay. I poked around with this locally and it does seem to do the thing, so I'm going to merge it. I wish we had better tests for the different connection failure modes, but I don't really see a way to do that. Thanks so much for the contribution! |
Hello! Here's a basic implementation for multiple namenode support.
NamenodeError
(i.e. EOF) orStandbyException
, the client will close the current connection, try to establish a new connection with another host, and retry the request."No available namenodes"
error.I realise there isn't much in terms of testing for this behaviour - would be happy to hear some suggestions for how this could be done.