Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled 'error' event #76

Open
valeriansaliou opened this issue May 10, 2013 · 34 comments
Open

Unhandled 'error' event #76

valeriansaliou opened this issue May 10, 2013 · 34 comments
Assignees

Comments

@valeriansaliou
Copy link
Contributor

Hello,

I upgraded Jappix.com to latest NXB v0.7.3 from NPM + node v0.6.12, and it appears it is sometimes crashing with the following bug (repeated 2 times there):

Error: 3072534224:error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error:s23_clnt.c:741:

    at CleartextStream._puller (tls.js:501:24)
    at CleartextStream._pull (tls.js:432:19)
    at SecurePair.cycle (tls.js:727:20)
    at EncryptedStream.write (tls.js:130:13)
    at Socket.ondata (stream.js:38:26)
    at Socket.emit (events.js:67:17)
    at TCP.onread (net.js:372:14)

events.js:48
        throw arguments[1]; // Unhandled 'error' event
                       ^
Error: 3072566992:error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error:s23_clnt.c:741:

    at CleartextStream._puller (tls.js:501:24)
    at CleartextStream._pull (tls.js:432:19)
    at SecurePair.cycle (tls.js:727:20)
    at EncryptedStream.write (tls.js:130:13)
    at Socket.ondata (stream.js:38:26)
    at Socket.emit (events.js:67:17)
    at TCP.onread (net.js:372:14)
@ghost ghost assigned dhruvbird May 10, 2013
@dhruvbird
Copy link
Collaborator

@valeriansaliou Any idea which host(s) this happened for (do you have debug logs enabled)? It seems to be a TLS connection to the upstream xmpp server that is causing problems.

@dhruvbird
Copy link
Collaborator

@valeriansaliou By any chance, did you update openssl on that box?
https://bugs.launchpad.net/ubuntu/+source/openssl/+bug/986147

@valeriansaliou
Copy link
Contributor Author

This cannot happen with jappix.com and anonymous.jappix.com since they are defined as "non-TLS hosts" in NXB configuration file. I can enable debug logs but I'm afraid the event we're looking for will get lost in a junk of messages.

About OpenSSL, I'm running OpenSSL 1.0.1e 11 Feb 2013 from Debian 7 Wheezy official repos. System is up-to-date.

@dhruvbird
Copy link
Collaborator

Sorry, so I just realized that node.js comes bundles with openssl and it uses its own version rather than the system default.

Can you try running (from within the node.js_source_dir/deps/openss/)

node-v0.6.17/deps/openssl$ cat README.chromium  | grep Version

@dhruvbird
Copy link
Collaborator

Will try and send you a patch that enables logging only for TLS initiation. That might narrow the problem down.

@dhruvbird
Copy link
Collaborator

@valeriansaliou Can you try running https://github.com/dhruvbird/node-xmpp-bosh/tree/log_TLS_initiation with --logging=WARN and let me know what you see.

@valeriansaliou
Copy link
Contributor Author

Thanks Dhruv! BTW, I'm running node v0.6.21 (latest one of the 0.6 branch).

OpenSSL version is 0.9.8o (result from the cat you gave me on my current node sources).

I'll run this with WARN logging, the log result may take up to a few hours since it doesn't crash a lot (3-4 times a day only - at very random intervals).

@valeriansaliou
Copy link
Contributor Author

One side question: I'm using NPM, any way to update/reinstall NXB from NPM using this log_TLS_initiation branch? (just as ROR does with Gems that are directly fetched from Github).

@dhruvbird
Copy link
Collaborator

So, you can install everything normally from npm, and just apply the diff using this command:

$ cd node-xmpp-bosh-git-directory
$ git pull --rebase
$ git checkout -b log_TLS_initiation --track origin/log_TLS_initiation
$ git diff master # Will show you the changes

It's just a single line change, so you can copy the file over in your npm install directory (i.e. wherever npm has installed NXB).

@valeriansaliou
Copy link
Contributor Author

Done. I'll report you the WARN trace when I'll get one. Thanks ;)

BTW, not in the topic but is that normal that the BOSH send me back this:

<body xmlns="http://jabber.org/protocol/httpbind" message="Timed out"/>

When the wait maximum time has come? I mean, I was getting a blank message with the previous NXB version I was using:

<body xmlns="http://jabber.org/protocol/httpbind">

@dhruvbird
Copy link
Collaborator

@valeriansaliou Thanks for reporting this actually! :)

And yes, we added "message" to some of the responses for better debugability so that when we see logs, it's easier to see what's been happening.

@valeriansaliou
Copy link
Contributor Author

Okay, that's fine. An user of our server thought something went wrong and asked me why he always got this.

BTW, here's the crash log I got 12 minutes ago (crashed 2 times in 14 hours):

Restoring old listeners
Restore::event: connect
Remove all listeners for event: connect
Restore::event: error
Remove all listeners for event: error
Restore::event: timeout
Remove all listeners for event: timeout
[2013-05-15 14:20:28.355] [WARN] [xmpp-proxy.js:XMPPProxy._starttls:143] - startTLS to billard-aktuell.de requested
Restoring old listeners
Restore::event: connect
Remove all listeners for event: connect
Restore::event: error
Remove all listeners for event: error
Restore::event: timeout
Remove all listeners for event: timeout
[2013-05-15 14:20:31.178] [WARN] [xmpp-proxy.js:XMPPProxy._starttls:143] - startTLS to billard-aktuell.de requested
Restoring old listeners
Restore::event: connect
Remove all listeners for event: connect
Restore::event: error
Remove all listeners for event: error
Restore::event: timeout
Remove all listeners for event: timeout
[2013-05-15 14:20:36.637] [WARN] [xmpp-proxy.js:XMPPProxy._starttls:143] - startTLS to billard-aktuell.de requested
+----------------------------------------------------------------------------------------------------------------------+
| Starting BOSH server 'v0.7.3' on 'http://:::5280/^\/http-bind(\/+)?$/' at 'Wed May 15 2013 16:21:02 GMT+0200 (CEST)' |
+----------------------------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------+
| Starting WEBSOCKET server 'v0.7.3' on ws://:::5280' at 'Wed May 15 2013 16:21:02 GMT+0200 (CEST)' |
+---------------------------------------------------------------------------------------------------+
Restoring old listeners
Restore::event: connect
Remove all listeners for event: connect
Restore::event: error
Remove all listeners for event: error
Restore::event: timeout
Remove all listeners for event: timeout
Restoring old listeners
Restore::event: connect
Remove all listeners for event: connect

And the associated error:

events.js:48
        throw arguments[1]; // Unhandled 'error' event
                       ^
Error: 3072362192:error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error:s23_clnt.c:741:

    at CleartextStream._puller (tls.js:501:24)
    at CleartextStream._pull (tls.js:432:19)
    at SecurePair.cycle (tls.js:727:20)
    at EncryptedStream.write (tls.js:130:13)
    at Socket.ondata (stream.js:38:26)
    at Socket.emit (events.js:67:17)
    at TCP.onread (net.js:372:14)

The crash occurred exactly on 2013-05-15 14:20:36.637.

@dhruvbird
Copy link
Collaborator

@valeriansaliou I tried reproducing the crash by connecting to billard-aktuell.de by connecting to it in a loop and doing a starttls, but both node-v0.6.17 and node-v0.6.21 seem to not crash. Do you know if this crash happens only for this host?

@valeriansaliou
Copy link
Contributor Author

@dhruvbird not sure it only happens on this host, but I have checked my logs and it seems to be recurrent (checked 3 different crashes, and the same domain comes out everytime).

Also, the crash actually happens one time over a hundred. I have many TLS logs for this host, but not that many crash after the TLS log.

@dhruvbird
Copy link
Collaborator

@valeriansaliou I've pushed out a commit to try and see if the TLS module is emitting an error event after emitting a close event (that could explain why the error event is unhandled). You should be able to apply it on top of current npm (v0.7.4).
Commit is on branch: https://github.com/dhruvbird/node-xmpp-bosh/tree/log_TLS_initiation

@valeriansaliou
Copy link
Contributor Author

Logging started. I will report you subsequent crashes. ;)

@valeriansaliou
Copy link
Contributor Author

Got this now:

From bosh.err:

events.js:48
        throw arguments[1]; // Unhandled 'error' event
                       ^
Error: 3072407248:error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error:s23_clnt.c:741:

    at CleartextStream._puller (tls.js:501:24)
    at CleartextStream._pull (tls.js:432:19)
    at SecurePair.cycle (tls.js:727:20)
    at EncryptedStream.write (tls.js:130:13)
    at Socket.ondata (stream.js:38:26)
    at Socket.emit (events.js:67:17)
    at TCP.onread (net.js:372:14)

From bosh.log:

starttls::onclose() -> 547
starttls::onclose() -> 550
starttls::onclose() -> 416
starttls::onclose() -> 552
+----------------------------------------------------------------------------------------------------------------------+
| Starting BOSH server 'v0.7.4' on 'http://:::5280/^\/http-bind(\/+)?$/' at 'Sun May 19 2013 13:21:01 GMT+0200 (CEST)' |
+----------------------------------------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------------------+
| Starting WEBSOCKET server 'v0.7.4' on ws://:::5280' at 'Sun May 19 2013 13:21:01 GMT+0200 (CEST)' |
+---------------------------------------------------------------------------------------------------+
starttls::onclose() -> 24
starttls::onclose() -> 25
starttls::onclose() -> 27

I also got some WARN that I assume are normal (apart from the crash, those WARN came maybe half an hour before it happened, unrelated):

starttls::onclose() -> 68
starttls::onclose() -> 69
starttls::onclose() -> 70
[2013-05-19 09:24:29.337] [WARN] [lookup-service.js:give_up_trying_to_connect:118] - Giving up connection attempts to [email protected]
starttls::onclose() -> 71
starttls::onclose() -> 72
[2013-05-19 09:24:36.061] [WARN] [lookup-service.js:give_up_trying_to_connect:118] - Giving up connection attempts to [email protected]
starttls::onclose() -> 73
starttls::onclose() -> 74
[2013-05-19 09:25:08.118] [WARN] [lookup-service.js:give_up_trying_to_connect:118] - Giving up connection attempts to hotmail.com
starttls::onclose() -> 63
[2013-05-19 09:25:31.320] [WARN] [lookup-service.js:give_up_trying_to_connect:118] - Giving up connection attempts to [email protected]
starttls::onclose() -> 75
starttls::onclose() -> 76
starttls::onclose() -> 77
[2013-05-19 09:25:57.199] [WARN] [lookup-service.js:give_up_trying_to_connect:118] - Giving up connection attempts to https://jappix.com/
starttls::onclose() -> 78
starttls::onclose() -> 79

@dhruvbird
Copy link
Collaborator

@valeriansaliou I see a lot of onclose() calls, but no onerror() calls. Were any onerror() calls logged for this duration?

@valeriansaliou
Copy link
Contributor Author

Yeah, actually there's one:

starttls::onerror() -> 29

@valeriansaliou
Copy link
Contributor Author

Check your emails BTW ;)

@valeriansaliou
Copy link
Contributor Author

After restarting my XMPPd today, I noticed NXB crashed just some milliseconds after the restart, the first time in 3 days it did not crash.

Same error log as before.

I do think this is only due to an unhandled error type on socket drop/close, that may have been introduced in nodejs 0.6.21 (well, the last one from 0.6 branch, not sure it is 21). Adding the error handler should do the trick, right?

@dhruvbird
Copy link
Collaborator

I tried adding the error handler in the patch I sent, but it doesn't seem to help. Will need some more investigation.

@valeriansaliou
Copy link
Contributor Author

Oh, I updated NXB to v0.7.4 in between. I was thinking you did not change anything else than logging. Sorry for the misunderstanding.

I'm re-adding your changes to my running v0.7.4, and keep you up-to-date in a week about whether it crashed or not.

@dhruvbird
Copy link
Collaborator

@valeriansaliou
Copy link
Contributor Author

Mhh so you suggest I give a try to nodejs v0.10?

I got some strange issues with my proxy server on front of NXB when first testing with node v0.10 but this may be linked to the fact I was changing a lot of things on the server, so I might have messed up my mind. The issue was that I started to get blank proxied pages after a while. Not a 500 error, blank pages. But anyway I can retry.

Will node v0.10 be more clear on the error message or fix the whole issue?

@dhruvbird
Copy link
Collaborator

Okay, so there were some bug-fixes on an external starttls module that my very old code didn't have.
I've pushed those changes here: https://github.com/dhruvbird/node-xmpp-bosh/tree/log_TLS_initiation
Please could you run from that branch and let me know if that fixes the crash?

@valeriansaliou
Copy link
Contributor Author

Thanks @dhruvbird ;)

I have updated Node to v0.10.8 (it's said it's 2 times faster on I/O than the 0.6 branch I was using).
NXB from log_TLS_initiation branch is now running, I'll report you any further error.

@dhruvbird
Copy link
Collaborator

@valeriansaliou Cool - were you able to it to work for ipv6 addresses?

@valeriansaliou
Copy link
Contributor Author

Absolutely. A telnet on IPv6 :5280 can be opened and my lighttpd frontend proxies on local loopback IPv6 sockets anyway ;)

@valeriansaliou
Copy link
Contributor Author

Mhh, crashed after an hour (running very well on latest Node version before crash):

From bosh.err:

node: ../src/node_crypto.cc:975: void node::crypto::Connection::ClearError(): Assertion `handle_->Get(String::New("error"))->BooleanValue() == false' failed.

It seem to be related to the error we previously had.

This time, the crash followed an overactivity kick from myself. I generated a lot of traffic on my account session, getting a lot of vCards at the same time, and should have been kicked. Instead NXB crashed.

Note that it's really weird I got something coming from node_crypto.css, which is a TLS lib as I understand, but I was not using a TLS c2s connection since connections to my local loopback are set to be plain text (in NXB config file, no TLS domains).

@dhruvbird
Copy link
Collaborator

@valeriansaliou The error seems helpful. I'll post on google groups and see if something comes out of it.

Are you sure the config file was being used? nxb prints out the config. options in debug (or maybe trace) mode.

@valeriansaliou
Copy link
Contributor Author

@dhruvbird great. I'm in FATAL for now, but config is actually used, otherwise NXB won't be listening on an IPv6 socket ;)

@valeriansaliou
Copy link
Contributor Author

Downgrading to v0.8.23, v0.10.x is causing blank pages after w/o making NXB crash (and no error log). I assume this is an HTTP server bug.

@valeriansaliou
Copy link
Contributor Author

Now with v0.8.23 it looks like I don't get blank page issue, but still crashing. Crashes gives that error trace log:

assert.js:102
throw new assert.AssertionError({
^
AssertionError: handle != self._handle
at Object.afterConnect as oncomplete

Seems to match v0.10 error log.

By the way, I got some WARNs during NXB install using NPM, that some piece of third party software were deprecated:

npm WARN unmet dependency /usr/local/lib/node_modules/npm/node_modules/fstream-npm/node_modules/fstream-> ignore requires inherits@'~1.0.0' but will load
npm WARN unmet dependency /usr/local/lib/node_modules/npm/node_modules/fstream-npm/node_modules/inherits,
npm WARN unmet dependency which is version 2.0.0

Otherwise the rest seems to be okay. Related to a stream package, may be related to the crash?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants