I'm the sysadmin at Sprend.
In this post I'll expand on Arne's previous post (read that one first) and dive into the technical details.
So consider yourself warned, this is gonna be fairly nerdy stuff :)
(Actually, the Wikipedia article on geeks has a better explanation of the nerd/geek distinction, but when I was a teenager my friends and I here in Sweden always called ourselves nerds, so that's the expression I'm sticking with.)
Regarding the Java threads eating 99% CPU:
This might in part have been caused by us running Linux kernel 2.6.18 or Tomcat 5.5.20, but most likely the reason was Java 5.0.10 (it's hard to know since we were too lazy to do any serious debugging or profiling).
Also, the Java threads regularly allocated more memory than they were assigned, sometimes to the point of starving the machine of memory. At that point the kernel (or rather its dreaded OOM killer) always made the unfortunate choice of killing MySQL, instead of something less important, in order to free up memory.
(I didn't know about the oom_adj setting at the time. Not that it would have helped, considering that Java and MySQL were the only things consuming any significant amount of memory on the server, and both of them had to stay alive.)
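For the curious, here's roughly how the OOM killer can be biased away from a process - a hypothetical sketch, not something we actually ran. On kernels of that era the knob was /proc/<pid>/oom_adj; newer kernels replaced it with oom_score_adj:

```shell
#!/bin/sh
# Hypothetical sketch: tell the OOM killer to leave mysqld alone.
# Lowering the score (protecting a process) requires root.
# Old kernels: /proc/<pid>/oom_adj, range -17..15, -17 = never kill.
# Newer kernels: /proc/<pid>/oom_score_adj, range -1000..1000, -1000 = never kill.
for pid in $(pgrep mysqld); do
    if [ -e "/proc/$pid/oom_score_adj" ]; then
        echo -1000 > "/proc/$pid/oom_score_adj"
    else
        echo -17 > "/proc/$pid/oom_adj"
    fi
done
```

As noted above, this wouldn't have saved us anyway - with only Java and MySQL on the box, protecting one just points the killer at the other.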
Aside from CPU usage being reduced on the new server, memory is not "leaking" anymore and Java & MySQL are using fewer threads.
That's partly due to the faster CPU (dual core Athlon 64 5200+) but also because of more efficient software versions: kernel 2.6.3x, Java 6.x, Tomcat 6.x and MySQL 5.x.
When things were at their worst on the old server, Java grew towards 200 threads and 1 GiB of RAM (the server only had 1 GiB of RAM, but no swap because that would have hurt our disk performance even more). MySQL 4.1.22 behaved more gracefully and stayed below 50 threads and 100 MiB of RAM. On the new server Java stays below 300 MiB of RAM and 120 threads, and MySQL stays below 50 MiB of RAM and 25 threads. Java now seldom occupies more than 100% of one CPU core (often much less than that) and MySQL consumes virtually zero CPU (and that's how it should be).
We had some other minor problems with Java and MySQL as well that disappeared on the new server.
As a consequence, Java and MySQL are roughly an order of magnitude more stable now, which is quite nice for me since I don't need to babysit them anymore.
Regarding moving the db from the USB flash drives to the hard drives:
The reason was that the USB drives are slow when MySQL is doing something that causes heavy and sustained disk IO.
Which is not a surprise considering that USB flash drives typically have IO throughput of merely 5-15 MiB/s.
Also, I separated the system disk (which holds the operating system) from the data disk (which holds the files being uploaded and downloaded to/from sprend.com).
The reason to separate the system disk from the data disk is performance - concurrent reads and writes in particular.
And why is that necessary?
Well, our internet connection is a dedicated 100/100 Mbit/s full duplex ethernet line. This means we can push at most about 12 MiB/s in each direction, or roughly 25 MiB/s in aggregate through the line.
That's nothing for our SATA-300 hard drives which I've measured to push approximately 100 MiB/s of sequential IO per drive at peak performance.
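If you want to reproduce that kind of measurement, a naive dd read test is the usual quick-and-dirty approach. Here's a sketch against a scratch file so it runs anywhere; note that a freshly written file sits in the page cache, so on a real box you'd drop the cache first (or read the raw device, e.g. /dev/sda, as root) to measure the disk rather than RAM:

```shell
#!/bin/sh
# Naive sequential-throughput sketch. On a real server, run as root:
#   echo 3 > /proc/sys/vm/drop_caches    # evict the page cache first
# or read the raw device directly: dd if=/dev/sda of=/dev/null bs=1M count=1024
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null   # create a 64 MiB test file
sync
dd if="$f" of=/dev/null bs=1M 2>&1 | tail -n 1       # last line reports throughput
rm -f "$f"
```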
But, and this is the crux, at peak hours (noon, afternoon and evenings) we typically have something like 30 to 40 simultaneous file transfers in progress.
And while the aggregate bandwidth of those transfers seldom goes beyond 15 MiB/s, they do cause simultaneous reads and writes of 30 to 40 different files on the hard drive. Also known as random IO.
This means that the magnetic head inside the hard drive is jumping around like crazy the whole time to reach the different data blocks belonging to all those files. No matter what you do, the data blocks are going to get spread out over the platter(s) inside the hard drive over time - especially with our high rate of file creations and deletions - and that's why the magnetic head has to jump around so much.
That in turn translates into increased seek times (and increased wear & tear) on the hard drive.
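The two access patterns can be illustrated with dd. This is a sketch, not a benchmark - on a cached scratch file like this both variants are fast, but on a spinning disk with cold caches the scattered small reads are an order of magnitude slower, because every one of them can cost a seek:

```shell
#!/bin/sh
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=32 2>/dev/null   # 32 MiB test file

# Sequential: one linear sweep over the file - the head barely moves.
dd if="$f" of=/dev/null bs=1M 2>/dev/null

# "Random": 32 small 4 KiB reads at shuffled 1 MiB-aligned offsets.
# skip= is in units of bs, so 256 * 4 KiB = 1 MiB per step.
for off in $(seq 0 31 | sort -R); do
    dd if="$f" of=/dev/null bs=4k count=1 skip=$((off * 256)) 2>/dev/null
done
rm -f "$f"
```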
On the old server we had a combined system and data disk, a PATA/100 disk controller and the XFS file system on the hard drives.
That caused the old hard drives to become seriously overworked and slow at peak traffic hours.
Now, there's nothing wrong with XFS. I've done some performance comparisons of the Linux journalling file systems ext3, reiser3, JFS and XFS. All on the same Linux installation on non-enterprise hardware, and XFS was the clear winner.
But the newer generation ext4 (with its extents, pre-allocation, delayed allocation and multiblock allocator) in conjunction with the faster SATA-300 disk subsystem and separated system & data disks proved to be highly effective.
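To make "pre-allocation" concrete: ext4 lets an application reserve a file's blocks up front in large contiguous extents, instead of growing the file write by write, which is exactly what fights fragmentation under our create/delete-heavy workload. A tiny sketch (assuming the fallocate tool from util-linux is installed):

```shell
#!/bin/sh
# Sketch: pre-allocate a 16 MiB file in one reservation. On ext4 this
# maps to the fallocate() syscall and the blocks are handed out as
# contiguous extents, without actually writing any data.
f=$(mktemp)
fallocate -l 16M "$f"
ls -lh "$f"    # a 16M file exists, allocated but never written to
rm -f "$f"
```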
The load on the hard drives can't even be noticed anymore during peak traffic hours.
Of course, ZFS is still the ultimate pr0n when it comes to file systems.
Unfortunately, the CDDL license of ZFS and the GPL license of the Linux kernel are incompatible, preventing ZFS from being incorporated into the Linux kernel.
But the good news is that there is an all new and shiny Linux native file system in full development right now, which is basically an improved clone of ZFS.
It's called Btrfs (sponsored primarily by Oracle) and when it's declared stable we'll switch over to it and get amazing kickass features!
Oh, and the reason that we used USB flash drives is that they're cheap, noiseless, cold, power efficient and small in physical size (the server has room for them, but not for 2 extra hard drives).
Everything on that list except the low price is also true of SSD drives - and the price is exactly why we went with USB flash drives instead.
SSD drives have blazing performance, but they're just too expensive at this point in time for this project.
Also, SSDs still share a serious technical problem with USB flash: after something like 50-100K write cycles, individual memory cells will start to fail (even when utilizing wear levelling).
But that, and write performance, won't be a problem in the next generation of SSD drives.
Other points of interest regarding the new server:
- We've switched from 32-bit to 64-bit Gentoo Linux as the OS
- We're now using NAPI in our NIC driver, which reduces interrupt generated CPU load by 5-10% (estimated) on incoming network traffic
- Security has been tightened. In particular, login security is improved and the number of open network ports is reduced (that number is extremely low now)
- We're utilizing around 400 GiB of our storage capacity (which is well over 1 TiB now)
- We will try using APR (the Apache Portable Runtime), which will enable Tomcat to scale better and seems able to reduce Java CPU usage somewhat
- We will connect a UPS to the server
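On the open-ports point above: the quickest way to verify a claim like that is to list every listening socket on the box. A sketch using netstat (from net-tools; the newer `ss` tool from iproute2 does the same job):

```shell
#!/bin/sh
# List all listening TCP and UDP sockets, i.e. the open ports.
# -t TCP, -u UDP, -l listening only, -n numeric (no DNS lookups).
netstat -tuln 2>/dev/null || ss -tuln
```

Every line of output beyond the header is a port that needs a justification for being open.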
We still have to investigate whether to use clustering, load balancing or failover on the servers.
In conclusion, this is how I imagine a discussion with Homer would summarize things (see the video clip below for why this is funny):
Me: The old server is b0rked! (Not that Homer has any idea what a server is, but let's pretend that he does)
Homer: That's bad.
Me: But the new server is totally sweet!
Homer: That's good.
Edit: FU to Fox for revoking access to the Simpsons video clip on Youtube. Fortunately there are other video sites.