g_poll times out on Windows

I am working on porting the Aravis library to mingw-w64 and am hitting an issue with g_poll not receiving replies during device discovery. I was able to isolate an MWE of the issue; the problem manifests both under Wine and on Windows.

The client sends a special broadcast discovery packet (to every network interface; the MWE sends to one only) and waits for replies, similar to DHCP (broadcast DHCP discover, receive DHCP offer). Replies are processed via g_poll, and while this works fine under Linux, no reply is ever obtained under Windows. I can see in Wireshark that the reply arrives at the machine, but the code won’t pick it up.

The issue was raised previously on GTK+-dev without any definitive resolution, though @lrn (manifestly a guru on the topic) wrote glib/tests/gpoll.c specifically for that purpose.

I am not very familiar with Win32 (or with low-level networking, for that matter), and reading gpoll.c did not give me any hint on how to proceed.

Could someone nail it down? The MWE is only 60 lines of code.

Thanks for any help!

(PS: I can’t post more than two links, sorry.)


Missing links: Aravis library, GTK+-dev post on the topic.

  1. This is not truly an MWE without the server part (the one that sends replies), unless a developer somehow happens to have a GVCP server just lying around.

  2. If the GTK+-dev post that you’ve unearthed wasn’t clear enough, I’ll repeat: Windows socket I/O and POSIX socket I/O are similar, but not identical. The differences get exposed in corner cases, obscure details, or rarely used functionality. I think that broadcast packets belong to one of these categories. There could be 1001 things that go wrong due to subtle differences between the socket implementations. I don’t know anything about broadcast sockets (on either platform), so I can’t even begin to guess where the problem lies. The advice from that post still stands: figure out how to do this on Windows with Windows APIs, then see what GLib is doing wrong. Debugging (with a full MWE) may or may not help. If you’re lucky, g_poll does wake up from WaitForMultipleObjects but then fails to detect that the socket is readable. If you’re unlucky, g_poll just doesn’t wake up from WaitForMultipleObjects at all, and if so, whatchagonnado?

Thanks for point 1. It made me think a bit, and I isolated just the listening part, which is where the problem really is.

The updated, real MWE is just the listener, plus a Python 6-liner that sends (unicast) packets to that UDP port periodically (every 200 ms). It uses localhost:3956 by default (both the script and the binary accept two optional arguments: IP address and port). Linux receives the packets; Wine does not (neither on localhost nor on another local address).
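The Python sender itself is not shown in the thread. If you want to reproduce the setup without Python, a hypothetical C stand-in might look like the sketch below. The `send_probes` name and the 8-byte payload are assumptions (8 bytes matches what listener-win32 reports receiving). The Winsock/POSIX split is there only so that one file builds on both platforms.

```c
/* Hypothetical C stand-in for the Python sender (not the author's script). */
#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0600            /* inet_pton needs Vista+ declarations */
#endif
#include <winsock2.h>
#include <ws2tcpip.h>
typedef SOCKET sock_t;
#define BAD_SOCK INVALID_SOCKET
#define close_sock closesocket
#define sleep_ms(ms) Sleep(ms)
#else
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#define BAD_SOCK (-1)
#define close_sock close
#define sleep_ms(ms) poll(NULL, 0, (ms))   /* millisecond sleep, no fds polled */
#endif

/* Send `count` 8-byte UDP datagrams to ip:port, pausing interval_ms between
   them. Returns the number of datagrams sent, or -1 if setup failed.
   (WSACleanup is omitted for brevity.) */
static int send_probes(const char *ip, unsigned short port, int count, int interval_ms)
{
#ifdef _WIN32
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return -1;
#endif
    struct sockaddr_in to = {0};
    to.sin_family = AF_INET;
    to.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &to.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address */

    sock_t s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (s == BAD_SOCK)
        return -1;

    int sent = 0;
    for (int i = 0; i < count; i++) {
        if (sendto(s, "PROBE123", 8, 0, (struct sockaddr *) &to, sizeof to) == 8)
            sent++;
        if (i + 1 < count)
            sleep_ms(interval_ms);
    }
    close_sock(s);
    return sent;
}
```

For example, `send_probes("127.0.0.1", 3956, 50, 200)` sends 50 datagrams over roughly ten seconds.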

As I see it (you might agree), this changes point 2. Plain and simple listening on a UDP port for incoming packets is hardly a corner case or something obscure. Or is it?

I also added a Winsock-based UDP listener that uses the native API.
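The native listener is not reproduced in the thread either. The Wine trace of listener-win32 shows WSACreateEvent, WSAEventSelect and WSAEnumNetworkEvents, followed by a recv of 8 bytes, and the thread later notes that it waits with WSAWaitForMultipleEvents. A minimal sketch of that pattern could look like this. The helper names are hypothetical, and the `poll()` branch is an addition so that the same file also runs on Linux.

```c
#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0600            /* inet_pton needs Vista+ declarations */
#endif
#include <winsock2.h>
#include <ws2tcpip.h>
typedef SOCKET sock_t;
#define BAD_SOCK INVALID_SOCKET
#define close_sock closesocket
#else
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int sock_t;
#define BAD_SOCK (-1)
#define close_sock close
#endif

/* Hypothetical helper: UDP socket bound to ip:port, or BAD_SOCK on failure. */
static sock_t open_udp_listener(const char *ip, unsigned short port)
{
#ifdef _WIN32
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return BAD_SOCK;
#endif
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return BAD_SOCK;
    sock_t s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (s == BAD_SOCK)
        return BAD_SOCK;
    if (bind(s, (struct sockaddr *) &addr, sizeof addr) != 0) {
        close_sock(s);
        return BAD_SOCK;
    }
    return s;
}

/* Hypothetical helper: wait up to timeout_ms for one datagram and read it.
   Returns bytes received, 0 on timeout, -1 on error. */
static int wait_and_recv(sock_t s, char *buf, int len, int timeout_ms)
{
#ifdef _WIN32
    /* A SOCKET is not a waitable handle, so tie it to an event object. */
    WSAEVENT ev = WSACreateEvent();
    if (ev == WSA_INVALID_EVENT)
        return -1;
    int ready = -1;
    if (WSAEventSelect(s, ev, FD_READ) == 0) {
        DWORD w = WSAWaitForMultipleEvents(1, &ev, FALSE, (DWORD) timeout_ms, FALSE);
        if (w == WSA_WAIT_TIMEOUT) {
            ready = 0;
        } else if (w == WSA_WAIT_EVENT_0) {
            WSANETWORKEVENTS ne;
            /* Resets ev and reports which network events were recorded. */
            if (WSAEnumNetworkEvents(s, ev, &ne) == 0)
                ready = (ne.lNetworkEvents & FD_READ) ? 1 : 0;
        }
        WSAEventSelect(s, NULL, 0);     /* detach; the socket stays non-blocking */
    }
    WSACloseEvent(ev);
    if (ready <= 0)
        return ready;
#else
    struct pollfd pfd = { .fd = s, .events = POLLIN, .revents = 0 };
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;                       /* 0 = timeout, -1 = error */
    if (!(pfd.revents & POLLIN))
        return -1;
#endif
    return (int) recv(s, buf, len, 0);
}
```

Creating and closing the event on every call keeps the sketch short. A long-running listener would more likely create the event once and keep it attached for the lifetime of the socket.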

I have little idea how to debug or trace Winsock (or anything on Windows, in general). I would appreciate pointers on how to actually look at that.

I tried running with WINEDEBUG=trace+winsock. This is the last bit of output from listener-glib (note: there is no call to WSAWaitForMultipleEvents or similar):

```
0010:trace:winsock:WS_inet_pton family 2, addr "127.0.0.1", buffer (0x32f51c)
0010:trace:winsock:WS_setsockopt (socket 0084, level SOL_SOCKET, name SO_REUSEADDR, optval 0x32f4d8 (0), optlen 4)
0010:trace:winsock:WS_inet_ntop family 2, addr (0x32f4f4), buffer (0x32f390), len 16
0010:trace:winsock:WS_bind socket 0084, ptr 0x32f4f0 { family AF_INET, address 127.0.0.1, port 3956 }, length 16
Bound to 127.0.0.1:3956

** (process:15): CRITICAL **: 08:48:08.895: g_poll timed out.
```

whereas this is listener-win32:

```
0010:trace:winsock:WS_inet_ntop family 2, addr (0x32f854), buffer (0x32f5b0), len 16
0010:trace:winsock:WS_bind socket 0034, ptr 0x32f850 { family AF_INET, address 127.0.0.1, port 3956 }, length 16
0010:trace:winsock:WSACreateEvent 
0010:trace:winsock:WSAEventSelect 0034, hEvent 0x38, event 00000003
0010:trace:winsock:WSAEnumNetworkEvents 0034, hEvent 0x38, lpEvent 0x32f710
[repeated about 400×]
0010:trace:winsock:WS2_recv_base socket 0034, wsabuf 0x32f600, nbufs 1, flags 0, from (nil), fromlen -1, ovl (nil), func (nil)
0010:trace:winsock:WS2_recv_base fd=13, options=0
0010:trace:winsock:WS2_recv_base  -> 8 bytes
0010:trace:winsock:DllMain 0x7f2bed160000 0x0 0x1
Bound to 127.0.0.1: 3956
Received 8 bytes
```

Maybe the relevant calls are not in the winsock channel.

I am now running with a local copy of g_poll (with more tracing messages) and finally realized that g_poll internally uses WaitForMultipleObjectsEx, which times out. It does not use WSAWaitForMultipleEvents, which is what the working win32 listener uses. Now, this SO post says:

The problem is that a socket, as implemented in the WinSock library, is not a Windows handle. You can’t put it into the WaitForMultipleObjectsEx array.

Luckily, WinSock provides a function WSAEventSelect that can link a socket to a Windows event object. An event object is the simplest type of synchronization object. In this case, you would ask it to signal the event object when the socket is ready to be read (FD_READ). Then you would put the event object into the array alongside the semaphore.

Can someone knowledgeable confirm that? If that is really so, g_poll will never work directly with socket objects (good to know), and it might be good for GLib to document that (or even check the FD type when g_poll is called).

Should I try wrapping the socket(s) with g_io_channel_unix_new (or g_io_channel_win32_new_socket) and use g_io_create_watch and g_source_set_callback?


Linking glib bug #214, continuing there if needed.

Right, exactly. I implemented this solution in my app (DBKangaroo, built with libgda and GTK3). It worked, and it cost me a lot of time.

```vala
int res = 0;
PollFD[] ssh_socket_pollfds = new PollFD[1];
#if WINDOWS
long lNetworkEvents = 33; /* 0x01(FD_READ) | 0x20(FD_CLOSE) */
ssh_socket_pollfds[0].fd = (int) Windows.WSACreateEvent();
// keep the call outside assert() so it still runs if assertions are disabled
int select_res = Windows.WSAEventSelect (m_ssh_forward_socket.fd, (void*)ssh_socket_pollfds[0].fd, lNetworkEvents);
assert (select_res == 0);
#else
ssh_socket_pollfds[0].events = IOCondition.IN;
ssh_socket_pollfds[0].fd = m_ssh_forward_socket.fd;
#endif

while (m_ssh_tunnel_alive) {
    #if WINDOWS
    Windows.WSAResetEvent((void*)ssh_socket_pollfds[0].fd);
    ssh_socket_pollfds[0].events = IOCondition.IN;
    ssh_socket_pollfds[0].revents = 0;
    #endif

    // poll socket state info
    res = GLib.poll(ssh_socket_pollfds, 1000);
    if ((0 < res) && (IOCondition.IN in ssh_socket_pollfds[0].revents)) {
        ......
    }

    ......
}
```

Thanks, Andy, this confirmed I was on the right track. I have it working now. Aravis PR #442 on GitHub shows the changes (I cannot post that link for some reason?!).

Basically, to work around the issue, there are 3 steps:

  1. set up GPollFDs:

    1. create a new WSAEVENT hEvent = WSACreateEvent() and assign that (cast to guint64) to GPollFD.fd.

    2. call WSAEventSelect(g_socket_get_fd(socket), hEvent, FD_FLAGS), where g_socket_get_fd returns the Winsock handle (a SOCKET). FD_FLAGS is e.g. FD_READ and should correspond to GPollFD.events (e.g. G_IO_IN).

      I saw some posts suggesting that one should also watch for FD_CLOSE, but I don’t see that as necessary for listening UDP (connectionless) sockets.

  2. in the loop: call WSAEnumNetworkEvents(g_socket_get_fd(socket), hEvent, &wsaNetEvents) after every call to g_poll, where socket is the one data will be read from (with g_socket_receive). This resets the event object and clears the socket's record of outstanding network events. As a result, the next g_poll does not report the same event again, which would leave g_socket_receive with no data to read.

    You use WSAResetEvent, which the docs discourage:

    The proper way to reset the state of an event object used with the WSAEventSelect function is to pass the handle of the event object to the WSAEnumNetworkEvents function in the hEventObject parameter. This will reset the event object and adjust the status of active FD events on the socket in an atomic fashion.

  3. finish: call WSACloseEvent(hEvent) to release the event object.

