[FASTCGI] Threaded C fcgiapp implementation problems and questions

Jonathan Gray jgray at streamy.com
Wed Apr 22 16:40:24 EDT 2009


If I disable keep-alives in lighttpd, I no longer see lots of connections
sitting in the read state on the lighttpd status page, so that behavior
seems to have been tied to keep-alives.

How does that work with FastCGI and fcgiapp?  Is there anything I need to
do to take advantage of keep-alives?  The build-up of extra connections in
weird states leads me to believe that the keep-alives are not working.
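
For reference, this is roughly how I disabled them in lighttpd.conf (as I
understand it, setting the maximum number of keep-alive requests to 0
closes the connection after every request):

server.max-keep-alive-requests = 0   # one request per connection, no keep-alive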

On Wed, April 22, 2009 12:54 pm, Jonathan Gray wrote:
> Hello,
>
>
> I have a multithreaded C FastCGI application using the fcgiapp library,
> running on top of lighttpd.
>
> I'm having a recurring problem in my production environment that crops
> up after a few days of sustained load at around 20-40 concurrent
> connections.
>
> The application implements something called Comet:
> http://en.wikipedia.org/wiki/Comet_(programming)
>
>
> It basically uses AJAX/XHR requests to simulate pushing to the client.
> The user opens an AJAX request to the script, and the server keeps it
> loading until a message comes in (the script connects to a central
> server which sends messages to clients), or until we time it out.  On
> the Wikipedia page this is described as Ajax with long polling /
> XMLHttpRequest long polling.
>
>
> This has been working for a very long time, but recently, as load has
> been increasing, we have started to see some strange behavior.
>
> All of a sudden, lighttpd/mod_fastcgi starts rejecting all new
> connections.  The log shows this error:
>
> 2009-04-01 12:21:33: (mod_fastcgi.c.3005) got proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 25
> 2009-04-01 12:21:33: (mod_fastcgi.c.2494) unexpected end-of-file (perhaps
> the fastcgi process died): pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0
>
>
>
> The process is not dead; there are 24 other connections that are still
> being handled properly.  When the new requests come in, the script does
> not see them at all (i.e., FCGX_Accept_r does not return).
>
> After all the existing connections have dropped, it resumes normal
> operation and starts to accept new connections:
>
> 2009-04-01 12:22:00: (mod_fastcgi.c.1515) released proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 2
> 2009-04-01 12:22:01: (mod_fastcgi.c.1515) released proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 1
> 2009-04-01 12:22:03: (mod_fastcgi.c.1515) released proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 0
>
>
> and then
>
> 2009-04-01 12:22:03: (mod_fastcgi.c.3005) got proc: pid: 3664 socket:
> unix:/home/user/cgi/socks/event.sock-0 load: 1
>
>
> The same PID (the process never crashed) then starts to see new
> connections again, and things run for another few days without problems
> before the same thing happens again.
>
>
> The design of my application differs from the example threaded
> application in that I do not keep a thread per connection; instead I use
> queues, timers, hash tables, etc. to track the state of sessions and
> their FCGX_Request, roughly as sketched below.
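>
> The bookkeeping looks something like this (a simplified sketch; the
> names are illustrative and the real structs carry more fields):
>
> #include <time.h>
> #include "fcgiapp.h"
>
> typedef struct session {
>     FCGX_Request   *request;   /* slot in the pre-allocated reqs[] array */
>     time_t          deadline;  /* when to time this long-poll out */
>     struct session *next;      /* hash-bucket / timer-queue linkage */
> } session_t;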
>
>
> Since I can't just use one FCGX_Request per thread, as the example does,
> I pre-instantiate a large array of FCGX_Requests of size
> MAX_ALLOC_REQUESTS and loop through it, advancing one index each time.
> The array is large enough that I never come close to reusing a request
> that has not already been FCGX_Finish_r'd.  (It is set to 25,000 right
> now; in benchmarking I'm trying to get over 10k, and I am nowhere near
> that in production, where the bug happens.)
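>
> Concretely, the setup is along these lines (a sketch):
>
> #define MAX_ALLOC_REQUESTS 25000
>
> static FCGX_Request reqs[MAX_ALLOC_REQUESTS]; /* pre-initialized slots */
> static int          curreq = 0;               /* next slot for Accept_r */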
>
>
> Is this a sane approach?  Could I be messing something up by allocating
> so many and calling FCGX_InitRequest on each?
>
> for (i = 0; i < MAX_ALLOC_REQUESTS; i++)
>     FCGX_InitRequest(&reqs[i], 0, 0);
>
>
> I am locking around the accept, so the accepting of connections is
> single-threaded:
>
>
> pthread_mutex_lock(&accept_mutex);
> rc = FCGX_Accept_r(&reqs[curreq]);
> nextreq = (curreq + 1) % MAX_ALLOC_REQUESTS;
> pthread_mutex_unlock(&accept_mutex);
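>
> For context, the accept side as a whole is roughly this shape (a sketch;
> hand_off_to_workers is a stand-in for my real dispatch code):
>
> for (;;) {
>     pthread_mutex_lock(&accept_mutex);
>     int idx = curreq;
>     rc = FCGX_Accept_r(&reqs[idx]);
>     curreq = (curreq + 1) % MAX_ALLOC_REQUESTS;
>     pthread_mutex_unlock(&accept_mutex);
>     if (rc < 0)
>         break;                        /* accept failed; shut down */
>     hand_off_to_workers(&reqs[idx]);  /* hypothetical: register session, start timer */
> }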
>
>
> Two different threads can potentially close a connection.  In all cases,
> closing follows this form:
>
> FCGX_FPrintF(request->out,"...");
> FCGX_Finish_r(request);
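>
> Wrapped up, the close path is essentially this (a sketch; the real
> response headers and body are elided):
>
> static void close_session(FCGX_Request *request, const char *body)
> {
>     FCGX_FPrintF(request->out, "Content-Type: text/plain\r\n\r\n%s", body);
>     FCGX_Finish_r(request);  /* the slot can later be re-Accept_r'd */
> }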
>
>
>
> That request is still part of reqs[] and will be passed to FCGX_Accept_r
> again, much later.
>
> Again, is this right?  I read that FCGX_Finish_r is thread-safe, so I'm
> not locking around it; there are potentially two threads running
> FCGX_FPrintF and FCGX_Finish_r simultaneously (but ALWAYS on different
> FCGX_Requests).
>
>
> I have increased all kinds of system limits, such as file/socket
> descriptor limits and memory limits.  I have watched requests "loop"
> through the big pre-allocated array, and they are reused without a
> problem.
>
> One thing I'm also not sure about is how keep-alives and pipelining
> might interact with what I'm doing.  Looking at the lighttpd status
> page, I sometimes noticed connections that, after leaving the handle-req
> state and after the script had returned/Finish_r'd them, sat in a 'read'
> state for some time.  The only pointer I've gotten on that is that
> lighttpd might be trying to read another request from the client?
>
>
> I'd really appreciate any kind of help.  I'm a bit stuck, and in any
> case could use some best-practices advice.
>
> Thanks.
>
>
> Jonathan Gray
> _______________________________________________
> FastCGI-developers mailing list
> FastCGI-developers at mailman.fastcgi.com
> http://mailman.pins.net/mailman/listinfo.cgi/fastcgi-developers
>
>
>



More information about the FastCGI-developers mailing list