Block during write causing Celluloid IO to crash #22

Closed
fionawhim opened this issue Feb 12, 2015 · 7 comments

@fionawhim

I filed this issue over at Celluloid IO, but it came up in Krakow so I wanted to raise it here in case a workaround is in order: celluloid/celluloid-io#132

The summary is that Celluloid IO crashes if the same actor is blocking on both reads and writes simultaneously. Because of Krakow's read loop, it is basically always blocking on reads. It can end up blocking on writes as well when a slow network meets an especially large message, which is how we're seeing it in production.

Not sure if there's a fix outside of Celluloid besides somehow refactoring the Krakow Producer into more actors, or hacking around Celluloid IO to do direct synchronous writes to the socket, sidestepping the problematic code.

@chrisroberts
Owner

There's enough info in the celluloid-io issue that I should be able to throw together something to replicate the issue you're encountering pretty easily. Will throw updates in here once I'm reliably reproducing the behavior.

@chrisroberts
Owner

and thanks for the report!

@fionawhim
Author

I'm not entirely sure I would recommend this for anyone else, but we're working around the problem thusly: https://github.com/crashlytics/krakow/pull/2

I'm not sure that I'm able to fix the root problem in Celluloid IO.

@fionawhim
Author

This actually has a knock-on effect: because Krakow didn't get to write the entire message when the ArgumentError was thrown, NSQ is waiting on its end of the socket for the rest of the data. Unless a timeout occurs, it will incorporate the subsequent commands as part of the message body.

This manifested for us as JSON parse errors due to inexplicable "PUB \n" in the middle of our messages.
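To illustrate the knock-on effect, here is a minimal sketch in plain Ruby with no Krakow or Celluloid involved. It assumes the NSQ TCP framing for PUB (a `PUB <topic>` line, a 4-byte big-endian size, then the body) and simulates a write that stops partway through the body. The helpers `pub_frame` and `read_pub` and the topic name `events` are made up for the illustration.

```ruby
require "stringio"

# Frame a PUB command: "PUB <topic>\n", a 4-byte big-endian body size,
# then the body bytes. (Hypothetical helper, not Krakow's API.)
def pub_frame(topic, body)
  body = body.b
  "PUB #{topic}\n".b + [body.bytesize].pack("N") + body
end

# Read one PUB command from the stream the way a server would:
# command line first, then exactly as many body bytes as the size field says.
def read_pub(io)
  line = io.gets("\n") or return nil
  topic = line.chomp.split(" ", 2).last
  size = io.read(4).unpack1("N")
  { topic: topic, body: io.read(size) }
end

first  = pub_frame("events", '{"id":1,"payload":"' + ("x" * 40) + '"}')
second = pub_frame("events", '{"id":2}')

# Simulate the interrupted write: the last 20 bytes of the first frame never
# reach the wire, and the client then sends the next command on the same socket.
wire = StringIO.new(first[0, first.bytesize - 20] + second)

message = read_pub(wire)
puts message[:body].inspect
```

The reader still expects the full advertised size, so it fills the gap with the start of the second command, and the printed body has the next `PUB events` line embedded in the middle of the JSON.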

Unless you go for the jank workaround I posted above, Krakow may want to have additional exceptions (at least ArgumentError, but maybe everything?) trigger a reconnect! in safe_socket.
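A rough sketch of that idea, assuming nothing about how Krakow's safe_socket is actually written: `FakeConnection` and `with_reconnect` are hypothetical names, and the only point is that any StandardError during a write drops and re-establishes the connection before anything else is sent.

```ruby
# Hypothetical stand-in for a connection; not Krakow's actual class.
class FakeConnection
  attr_reader :reconnects, :written

  def initialize(failures)
    @failures = failures # exceptions to raise on successive writes
    @reconnects = 0
    @written = []
  end

  def write(data)
    error = @failures.shift
    raise error if error
    @written << data
  end

  def reconnect!
    @reconnects += 1
  end
end

# Run the block; on any StandardError, reconnect before retrying so that a
# half-written frame is never followed by new commands on the same socket.
# Gives up and re-raises once the retry budget is used.
def with_reconnect(connection, retries: 2)
  attempts = 0
  begin
    yield connection
  rescue StandardError
    attempts += 1
    raise if attempts > retries
    connection.reconnect!
    retry
  end
end

# The first write raises an ArgumentError, the second one succeeds.
conn = FakeConnection.new([ArgumentError.new("simulated write failure")])
with_reconnect(conn) { |c| c.write("PUB events\n") }
puts "reconnects=#{conn.reconnects} written=#{conn.written.length}"
```

Rescuing everything is the blunt version; narrowing the list to the errors that can actually leave a frame half-written would be the safer long-term choice.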

@chrisroberts
Owner

Alright, I have a nice changeset shoved into develop now that addresses these things if you feel like following along. I'm running through and refactoring all the spec tests, and once I have them properly in place and pushed, I'll get this released.

@fionawhim
Author

Thanks for being so responsive about this. Taking a look.

@chrisroberts
Owner

Released: https://github.com/chrisroberts/krakow/tree/v0.4.0

Thanks for the report!
