[FASTCGI] Threaded C fcgiapp implementation problems and questions

Jonathan Gray jgray at streamy.com
Wed Apr 22 19:39:55 EDT 2009


Further testing is now reproducing the issue.

It happens after the 65,563rd connection.  It then stops accepting
connections until all connections are dropped.  Then it continues
normally.

This is suspiciously close to 64 * 1024 = 65,536 (what I have set my Linux
max file descriptors to).  Is lighttpd or fastcgi somehow hanging on to file
descriptors, either to the lighttpd->fastcgi Unix socket, or http/tcp
sockets?  Is there something else tied to ~64k I should look into?
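
One way to test the descriptor theory is to have the process log how many
descriptors it holds next to its soft limit, for example once per accepted
connection.  The following is a minimal sketch, assuming Linux with /proc
mounted; count_open_fds, fd_soft_limit and log_fd_usage are hypothetical
helper names, not part of fcgiapp or lighttpd.

```c
#define _POSIX_C_SOURCE 200809L /* for opendir/readdir under strict -std= modes */
#include <dirent.h>
#include <stdio.h>
#include <sys/resource.h>

/* Count this process's open descriptors by listing /proc/self/fd.
 * Linux-specific assumption. Returns -1 if /proc is unavailable. */
int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.') /* skip "." and ".." */
            count++;
    }
    closedir(dir);

    /* The directory stream itself held one descriptor while we counted. */
    return count - 1;
}

/* Soft RLIMIT_NOFILE for this process, or -1 on error.
 * An unlimited value (RLIM_INFINITY) is returned as cast, not interpreted. */
long fd_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    return (long)rl.rlim_cur;
}

/* Write one line with the current usage, e.g. after each accept. */
void log_fd_usage(FILE *log)
{
    fprintf(log, "open fds: %d, soft limit: %ld\n",
            count_open_fds(), fd_soft_limit());
}
```

If the count climbs steadily toward the limit while load stays around 20-40
connections, descriptors are leaking in this process.  If it stays flat, the
limit is more likely being hit elsewhere (for example in lighttpd), or the
~64k figure has another cause.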

Thanks.

JG

On Wed, April 22, 2009 1:40 pm, Jonathan Gray wrote:
> If I disable keep-alives in lighttpd, I no longer see lots of connections
>  in lighttpd status in the read state.  That seems to have been tied to
> keep-alives.
>
> How does that work with fastcgi and fcgiapp?  Is there anything I need to
>  do to be able to take advantage of keep-alives?  The build-up of extra
> connections in weird states leads me to believe that the keep-alives are
> not working.
>
> On Wed, April 22, 2009 12:54 pm, Jonathan Gray wrote:
>
>> Hello,
>>
>>
>>
>> I have a multithreaded, C FastCGI script using the fcgiapp library
>> running on top of lighttpd.
>>
>> I'm having a recurring problem on my production environment that crops
>> up after a few days straight of load around 20-40 concurrent
>> connections.
>>
>> This script is implementing something called COMET
>> http://en.wikipedia.org/wiki/Comet_(programming)
>>
>>
>>
>> It's basically using AJAX/XHR requests to simulate pushing to the
>> client. The user opens an AJAX request to the script and the server
>> keeps it loading until a message comes in from the server (it connects
>> to a central server which sends messages to clients), or until we time
>> it out.  On the wikipedia page, this is described as Ajax with long
>> polling / XMLHttpRequest long polling.
>>
>>
>>
>> This has been working for a very long time but recently as load has
>> been increasing we started to see a weird behavior.
>>
>> All of a sudden, lighttpd/mod_fastcgi will start to reject all new
>> connections.  The log shows this error:
>>
>> 2009-04-01 12:21:33: (mod_fastcgi.c.3005) got proc: pid: 3664 socket: unix:/home/user/cgi/socks/event.sock-0 load: 25
>> 2009-04-01 12:21:33: (mod_fastcgi.c.2494) unexpected end-of-file (perhaps the fastcgi process died): pid: 3664 socket: unix:/home/user/cgi/socks/event.sock-0
>>
>>
>>
>>
>> The process is not dead; there are 24 other connections that are
>> currently being properly handled.  When these requests come in, the
>> script does not see them at all (i.e. FCGX_Accept_r does not return).
>>
>> After all the existing connections have dropped, it will then continue
>> normal operation and start to accept new connections:
>>
>> 2009-04-01 12:22:00: (mod_fastcgi.c.1515) released proc: pid: 3664 socket: unix:/home/user/cgi/socks/event.sock-0 load: 2
>> 2009-04-01 12:22:01: (mod_fastcgi.c.1515) released proc: pid: 3664 socket: unix:/home/user/cgi/socks/event.sock-0 load: 1
>> 2009-04-01 12:22:03: (mod_fastcgi.c.1515) released proc: pid: 3664 socket: unix:/home/user/cgi/socks/event.sock-0 load: 0
>>
>>
>>
>> and then
>>
>> 2009-04-01 12:22:03: (mod_fastcgi.c.3005) got proc: pid: 3664 socket: unix:/home/user/cgi/socks/event.sock-0 load: 1
>>
>>
>>
>> The same PID (the process never crashed) then does start to see new
>> connections and things go for another few days without problems, then
>> the same thing happens again.
>>
>>
>> The design of my application differs from the example threaded
>> application because I do not keep a thread per connection, rather I use
>> queues, timers, hash tables, etc to track the state of sessions and
>> their FCGX_Request.
>>
>>
>>
>> Since I can't just use a FCGX_Request per thread, as done in the
>> example, I pre-instantiate a large array of FCGX_Requests of size
>> MAX_ALLOC_REQUESTS.  I then loop through this array, sliding down one
>> index each time.  This array is sufficiently large that I do not get
>> anywhere close to reusing a request that was not FCGX_Finish_r'd
>> already.  (This is set to 25,000 right now; in benchmarking I'm trying
>> to get over 10k.  I am nowhere near this in production, where the bug
>> happens.)
>>
>>
>> Is this a sane approach?  Could I be messing something up by
>> allocating so many and doing FCGX_InitRequest on each?
>>
>> for(i=0;i<MAX_ALLOC_REQUESTS;i++) FCGX_InitRequest(&reqs[i],0,0);
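
Whether a slot is ever handed to FCGX_Accept_r again before it was finished
can be checked rather than assumed, by keeping a small state flag next to each
entry of reqs[].  A minimal sketch, independent of fcgiapp: slot_pool,
slot_acquire and slot_release are hypothetical names, and the real code would
need to call them while holding a mutex, since acquire and release happen on
different threads.

```c
#include <stddef.h>

#define MAX_ALLOC_REQUESTS 25000 /* value quoted in the post */

enum slot_state { SLOT_FREE = 0, SLOT_BUSY = 1 };

/* One flag per pre-allocated request, plus the ring cursor.
 * A zero-initialized pool starts with every slot free. */
struct slot_pool {
    unsigned char state[MAX_ALLOC_REQUESTS];
    size_t next;
};

/* Claim the next slot in ring order.
 * Returns its index, or -1 if that slot has not been released yet,
 * which means the ring wrapped onto an unfinished request. */
long slot_acquire(struct slot_pool *p)
{
    size_t idx = p->next;
    if (p->state[idx] != SLOT_FREE)
        return -1;
    p->state[idx] = SLOT_BUSY;
    p->next = (idx + 1) % MAX_ALLOC_REQUESTS;
    return (long)idx;
}

/* Mark a slot free again (to be called right after finishing the request).
 * Returns 0 on success, -1 on a bad index or a double release. */
int slot_release(struct slot_pool *p, size_t idx)
{
    if (idx >= MAX_ALLOC_REQUESTS || p->state[idx] != SLOT_BUSY)
        return -1;
    p->state[idx] = SLOT_FREE;
    return 0;
}
```

A -1 from either function would be worth logging: it points at a request
being reused too early or finished twice, both of which the description above
assumes never happen.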
>>
>>
>> I am locking around the accept, so the accepting of connections is
>> single-threaded:
>>
>>
>>
>> pthread_mutex_lock(&accept_mutex);
>> rc = FCGX_Accept_r(&reqs[curreq]);
>> nextreq = (curreq + 1) % MAX_ALLOC_REQUESTS;
>> pthread_mutex_unlock(&accept_mutex);
>>
>>
>> I have two potential threads that can close the connection.  In all
>> cases, the closing of the connection follows the form:
>>
>> FCGX_FPrintF(request->out,"...");
>> FCGX_Finish_r(request);
>>
>>
>>
>>
>> That request is still part of reqs[] and will be passed again, much
>> later, to FCGX_Accept_r.
>>
>> Again, is this right?  I read that FCGX_Finish_r is thread-safe, so I'm
>> not locking around that; there are potentially two threads running
>> FCGX_FPrintF and FCGX_Finish_r simultaneously (but ALWAYS on different
>> FCGX_Requests).
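
If a timeout and an incoming message could ever race for the same request, a
per-request guard would make sure only one thread prints and finishes it.  A
minimal sketch using C11 atomics; finish_guard and its functions are
hypothetical names and not part of fcgiapp, and the FCGX calls appear only in
a comment.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One guard per pre-allocated request slot. */
struct finish_guard {
    atomic_flag finished;
};

/* Re-arm the guard; call before the slot is handed to accept again. */
void finish_guard_reset(struct finish_guard *g)
{
    atomic_flag_clear(&g->finished);
}

/* Returns true for exactly one caller per arm/reset cycle.
 * Only the thread that gets true should write to and finish the request:
 *
 *     if (finish_guard_claim(&guards[i])) {
 *         FCGX_FPrintF(request->out, "...");
 *         FCGX_Finish_r(request);
 *     }
 */
bool finish_guard_claim(struct finish_guard *g)
{
    return !atomic_flag_test_and_set(&g->finished);
}
```

This does not make concurrent calls on different requests any safer; it only
rules out the case where the "always on different FCGX_Requests" assumption is
accidentally violated.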
>>
>>
>> I have increased all kinds of system limits like file/socket descriptor
>>  limits and memory limits.  I have seen the requests "loop" through the
>> big array of pre-allocated ones, and they are reused without a problem.
>>
>> One thing I'm also not sure about is how keep-alives and pipelining
>> might interact with what I'm doing.  Looking at the lighttpd status
>> page, I sometimes noticed that connections, after they are out of
>> handle-req and the script has returned/finish_r'd them, sit in a 'read'
>> state for some time.  The only pointer I've got w.r.t. that was that
>> lighttpd might be trying to read another request from the client?
>>
>>
>> I'd really appreciate any kind of help.  I'm a bit stuck and in any
>> case could use some best practices advice.
>>
>> Thanks.
>>
>>
>>
>> Jonathan Gray
>> _______________________________________________
>> FastCGI-developers mailing list
>> FastCGI-developers at mailman.fastcgi.com
>> http://mailman.pins.net/mailman/listinfo.cgi/fastcgi-developers


