Unix Socket Programming Question:

Why does it take so long to detect that the peer died?

Answer:

Because by default, no packets are sent on the TCP connection unless there is data to send or acknowledge.

So, if you are simply waiting for data from the peer, there is no way to tell if the peer has silently gone away, or just isn't ready to send any more data yet. This can be a problem (especially if the peer is a PC, and the user just hits the Big Switch...).

One solution is to use the SO_KEEPALIVE option. This option enables periodic probing of the connection to check that the peer is still present. BE WARNED: the default timeout for this option is AT LEAST 2 HOURS. This timeout can often be altered (in a system-dependent fashion). Historically this was not normally possible on a per-connection basis, though some systems (Linux, for example, with the TCP_KEEPIDLE socket option) now allow it.

RFC 1122 specifies that this timeout (if keepalives are implemented at all) must be configurable. On many Unix variants, this configuration may only be done globally, affecting all TCP connections which have keepalive enabled. The method of changing the value, moreover, is often difficult and/or poorly documented, and in any case differs between just about every version in existence.

If you must change the value, look for something resembling tcp_keepidle in your kernel configuration or network options configuration.
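As a rough illustration, enabling keepalive on a socket might look like the sketch below. The function name enable_keepalive and its idle_secs parameter are made up for this example. Some systems (Linux among them) also expose a per-socket TCP_KEEPIDLE option, so the sketch uses it only when the header defines it and otherwise leaves the system-wide default in place.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: turn on keepalive probing for a TCP socket.
 * idle_secs is the idle time before the first probe; it is applied only
 * where the platform defines TCP_KEEPIDLE (an assumption about the host),
 * otherwise the global default stays in effect.
 * Returns 0 on success, -1 on error with errno set. */
int enable_keepalive(int fd, int idle_secs)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;

#ifdef TCP_KEEPIDLE
    if (idle_secs > 0 &&
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof idle_secs) < 0)
        return -1;
#else
    (void)idle_secs;   /* no per-socket control on this platform */
#endif
    return 0;
}
```

Once the probes go unanswered, a blocked read on the socket should fail with an error such as ETIMEDOUT rather than hang forever.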

If you're sending to the peer, though, you have somewhat better guarantees: since sending data implies receiving ACKs from the peer, you will know after the retransmit timeout whether the peer is still alive. But the retransmit timeout is designed to allow for various contingencies, with the intention that TCP connections are not dropped simply as a result of minor network upsets. So you should still expect a delay of several minutes before being notified of the failure.
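On the sending side, the failure eventually shows up as an error from a later send() call. A minimal sketch of classifying those errors follows; the names send_checked and enum send_status are invented for this example, and the set of errno values treated as "peer gone" is a reasonable guess rather than an exhaustive list. Note that a successful send() only means the data was queued locally, so the error typically arrives on a subsequent call.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0   /* assumption: where the flag is missing, SIGPIPE
                            has to be ignored by some other means */
#endif

enum send_status { SEND_OK, SEND_PEER_GONE, SEND_ERROR };

/* Hypothetical helper: send the whole buffer and report whether a failure
 * looks like a dead or unreachable peer. */
enum send_status send_checked(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            switch (errno) {
            case EPIPE:                 /* connection already shut down */
            case ECONNRESET:            /* peer reset the connection */
            case ETIMEDOUT:             /* retransmissions gave up */
            case EHOSTUNREACH:
            case ENETUNREACH:
                return SEND_PEER_GONE;
            default:
                return SEND_ERROR;
            }
        }
        p += n;
        len -= (size_t)n;
    }
    return SEND_OK;
}
```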

The approach taken by most application protocols currently in use on the Internet (e.g. FTP, SMTP, etc.) is to implement read timeouts on the server end; the server simply gives up on the client if no requests are received within a given time period (often on the order of 15 minutes). Protocols where the connection is maintained even if idle for long periods have two choices:

1. use SO_KEEPALIVE

2. use a higher-level keepalive mechanism (such as sending a null request to the server every so often).
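The read-timeout approach described above can be sketched with poll(). The names recv_with_timeout and RECV_TIMED_OUT are made up for this example, and the timeout value is left to the caller. A client-side heartbeat (the "null request" idea) could be driven by the same helper: when it times out, send the null request instead of giving up.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#define RECV_TIMED_OUT (-2)   /* hypothetical sentinel, distinct from recv() results */

/* Hypothetical helper: wait up to timeout_ms for data, then read it.
 * Returns the number of bytes read, 0 if the peer closed the connection,
 * -1 on error (errno set), or RECV_TIMED_OUT if nothing arrived in time. */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* simplification: a signal restarts
                                             the full timeout */

    if (rc < 0)
        return -1;
    if (rc == 0)
        return RECV_TIMED_OUT;

    return recv(fd, buf, len, 0);
}
```

A server following the pattern above might call this with a timeout of 15 * 60 * 1000 milliseconds and close the connection when RECV_TIMED_OUT comes back.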
