

hard disk failure

The Story of the Dying Disk

I just replaced the primary hard disk in this system. Here’s the story:

I began to notice weeks ago that my system seemed a bit sluggish. This server is a few years old and probably underconfigured for what it is being asked to do. I’m constantly piling more stuff on it and for the most part it hums along without complaint. I attributed the periodic sluggishness to insufficient memory and associated swapping.

A few days BC (before crash) I decided there was something more to it than swapping. I’m not sure exactly why. I think it had to do with some specific operation that I was doing over and over again. Little else was going on on the system, so it didn’t make sense to call the slowness “swapping”; there was no time or reason for this to be swapped out. I also think I noticed that some operations would slow down in the middle: perhaps something was loading, and it would load part of it fast, then get really slow.

I started looking for a problem. Nothing new in the system logs that I noticed. I realize this is all very vague, but I decided it was the disk and that the disk was having trouble. I started trying to monitor the disk health.

I installed S.M.A.R.T. Monitoring Tools (smartmontools). SMART is a standard for reporting the health of a disk. Disks are pretty, ah, smart these days, and they track lots of things that are related to their health. Smartmontools interacts with the disk to report and control this monitoring.

The quick smartmontools tests reported some problems but concluded, I think, that the disk was OK. I asked the disk to perform the more comprehensive tests, which apparently take 30 minutes or more. They never finished. I tried this a few times and never got a complete test; each time the system reported that the test was canceled on request. I did not cancel the tests and I haven’t figured out what did. Some have reported that a system going to sleep will cancel a test, though my system is not configured to sleep.
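
For reference, the checks described above map onto smartctl, the command-line tool that ships with smartmontools. This is only a sketch: the device name /dev/hda is an assumption (the post does not say which device the failing disk was), and smart_cmds is a made-up helper that prints the commands for review rather than running them.

```shell
#!/bin/sh
# DISK is an assumption; substitute the real device before using any of this.
DISK="${DISK:-/dev/hda}"

# Print the smartctl commands for a health check, a long self-test,
# and the self-test log. Nothing is executed against the disk here.
smart_cmds() {
    echo "smartctl -H $1"           # overall health self-assessment
    echo "smartctl -A $1"           # attribute table (reallocated sectors, etc.)
    echo "smartctl -t long $1"      # start the extended (long) self-test
    echo "smartctl -l selftest $1"  # self-test log, including aborted runs
}

smart_cmds "$DISK"
```

Once the printed commands look right, they can be run by hand as root. The self-test log is where an interrupted long test would show up.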

I spent days, perhaps a week, messing with this. I thought I had time. At some point I decided that my disk was hurting and that I needed to replace it, so I bought a new drive and brought it home.

It wasn’t till then that I decided to make a backup.

This was a crucial mistake. What I perceived to be a very gradual slide into a non-optimal but still working state was really a drive plunging off the cliff of its life.

An aside: I may have contributed. Did I accelerate the destruction of this drive? Often you (read “I”) aggravate the situation more than you (I) help. I’m an amateur administrator of these systems and I don’t see failures like this very often, so I’m always learning or relearning, since the last failure happened so long ago. When it was not working well I tried to “tune it up” with hdparm. I tried several settings but none moved the drive performance above “abysmal”.

Of course the backup failed. Now the drive was reporting consistency problems. Files that appeared to be there were unreadable. I made several attempts to salvage pieces and had some success. Unfortunately there were recent files that were not readable.

I’ve put the system back together via a combination of a 6-month-old full backup, fragments that I was able to salvage from the failing drive, and various copies from other systems.

Lessons Learned


* * *


maxloss

Make of it what you will. I had a recent disk failure. Now I have a stack of 3 failed Maxtor disks sitting on my shelf: two from my systems and one from a friend’s system. There are no other failed disk brands on my shelf.

My process for buying disks has been pretty random and I think I have a mix of drives in various systems.

I now have 2 Western Digital disks in this server, one internal and one external FireWire. My friend just installed a Seagate SATA drive with a 5-year warranty. We’ll see how it goes.


* * *


got backups?

The hard drive for my web server died last week. Nothing like losing a drive to cause you to examine your backup strategy.

Backup Suggestions

Backups are a challenge. Here are a few suggestions:

  1. Pay Attention to It: I’m busy. I tend to want to click a few times and not really think about it. I suggest that it is worth your while to think about it, try it a few times, and get it right.
  2. Back Up to Another Hard Drive: Yes. Forget about tapes, CDs or DVDs. Hard drives are relatively inexpensive and backups are much easier. I hope it goes without saying that backing up from one area to another on the same drive is not really backing up. One of the primary threats is a disk failure.
  3. Don’t ‘Backup’ Your Media Files—Sync Them: Your media files are 1) big, and 2) mostly unchanging. Instead of backing them up, sync them with rsync. I have roughly 40GB of mp3s. I now have 2 copies, one on my main drive, and another on my ‘backup’ drive. And if I can’t find it there I can always recreate the mp3 from the original CD.
  4. Automate: Your odds of success increase dramatically when you are not required to remember to do the backup. However, please heed suggestion 1 first: an automated backup that is wrong will be no better than a non-automated backup that is wrong.
  5. Don’t Do Image Backups: My original goal was to be able to restore to a new disk and boot up and be exactly where I was when I did the original backup. I think this is actually difficult, and any difficulty gets in the way of doing it. Instead, think of it as spring cleaning – when your disk fails you now have an opportunity to do a new install and only restore things that you currently care about.
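
Part of "getting it right" is confirming that a backup can actually be read back. A minimal sketch, assuming the backups are gzipped tar archives; verify_archive is a hypothetical helper name, and the demonstration uses a throwaway archive in a temporary directory rather than a real backup:

```shell
#!/bin/sh
# Succeed only if the gzipped tar archive can be listed from start to end.
verify_archive() {
    tar -tzf "$1" > /dev/null 2>&1
}

# Demonstration on a throwaway archive.
tmp=$(mktemp -d)
echo "hello" > "$tmp/file.txt"
tar -czf "$tmp/test.tar.gz" -C "$tmp" file.txt

if verify_archive "$tmp/test.tar.gz"; then
    echo "archive readable"
else
    echo "archive damaged"
fi

rm -rf "$tmp"
```

Listing an archive only proves that it is structurally intact. It does not prove that the right files are in it, so an occasional trial restore is still worth doing.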

My Backup Process

Heeding my own advice, here is my process:

  1. I’m backing up to an external USB hard drive. I have a directory named /wd/nobackup. ‘wd’ is basically the drive and ‘/wd/nobackup’ is, well, ‘don’t back up this stuff’. What goes in ‘nobackup’? Well, your backups. In other words, your backups are copies of things and you don’t need to make copies of your copies. The tar script below excludes the ‘nobackup’ directory.
  2. The backups are done by a shell script that runs daily via cron.
  3. The backup script has 3 parts:
    • Most files get tarred and gzipped. Several useless directories are excluded, as well as the directories for the MP3 files. MySQL is shut down during the backup so that we get consistent MySQL databases.
    • MySQL databases are mysqldumped. This is probably overkill since the MySQL databases are backed up by the previous step. However, the previous step makes a binary copy of the databases. This step produces a portable backup that could be restored on another architecture. Basically these dump files are easier to deal with.
    • MP3s are rsynced.

Here’s my script:


	

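#!/bin/sh

# Stop MySQL so its data files are consistent while tar copies them.
mysqladmin --password=<mysql root pw> shutdown

# Full backup of / as a gzipped tarball. Skips pseudo-filesystems, temp files,
# the backups themselves, and the MP3 directories (those are rsynced below).
tar -cvpzf /wd/nobackup/backups/full-backup-`date '+%Y-%B-%d'`.tar.gz \
  --directory / \
  --anchored \
  --exclude='./mnt/*' \
  --exclude='./dev/*' \
  --exclude='./proc/*' \
  --exclude='./tmp/*' \
  --exclude='./wd/nobackup/*' \
  --exclude='./home/kellyf/music/*' \
  --exclude='./home/kellyf/audio-save/*' \
  . > /wd/nobackup/backups/lastfullbackup.log

# Bring MySQL back up, then take a portable SQL dump of every database.
/etc/init.d/mysql start
mysqldump --all-databases --password=<mysql root pw> > /wd/nobackup/backups/all-databases-`date '+%Y-%B-%d'`.sql

# Mirror the MP3 directories instead of archiving them.
rsync -a --delete /home/kellyf/music /wd/nobackup/
rsync -a --delete /home/kellyf/audio-save /wd/nobackup/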
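
Restoring only the things you still care about means pulling individual paths out of the full backup. Because the archive is created from "." with the directory set to /, member names start with "./", so that prefix has to be given when extracting. A rough sketch under those assumptions; restore_one is a hypothetical helper and the demonstration runs against a throwaway tree, not a real backup:

```shell
#!/bin/sh
# restore_one ARCHIVE MEMBER DEST: extract one path from a gzipped tarball.
restore_one() {
    mkdir -p "$3" && tar -xzpf "$1" -C "$3" "$2"
}

# Throwaway tree standing in for the real filesystem.
work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/src/tmp"
echo "127.0.0.1 localhost" > "$work/src/etc/hosts"
echo "scratch" > "$work/src/tmp/junk"
tar -czpf "$work/full.tar.gz" -C "$work/src" .

# Pull back only ./etc/hosts; ./tmp/junk stays in the archive.
restore_one "$work/full.tar.gz" ./etc/hosts "$work/restore"
cat "$work/restore/etc/hosts"

rm -rf "$work"
```

Extracting into a scratch directory first, rather than straight onto /, makes it easy to inspect a file before copying it into place.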


* * *


home network performance

My home network is a real mix of devices and technologies. I’ve acquired these devices over time to solve various connectivity issues. Overall it’s a significant investment (to me¹).

I use plain old ethernet, HomePlug (powerline), HomePNA (phoneline), and Wi-Fi. This allows me to hook up just about anything in any room.

For the most part this works fine. In the past I’ve been primarily constrained by my external DSL connection, and the internal networks were far faster, so far as I knew. Lately I’ve had problems that I suspected were related to latency or flakiness of my internal network. For example, I have files on a Samba server, and some Windows applications would fail when using these networked files. My workaround was to temporarily copy these files to my local disk. However, that was extraordinarily slow when the files were big (read ‘media’). Thus I decided to fix the network.

I had worked out what I was going to do when a friend at work suggested I test all of the links point to point. Sounded good. I found iperf and started testing. I installed iperf on my server and started it in server mode, then tested from various points on the internal network. Every test below ran from some location to the server; since the server is the last hop in every path, it is omitted from the table.
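
A test of this kind comes down to two commands: a listener on the server and a client run from each test point. The sketch below is an approximation, not a record of the exact invocations used; the host name server.example, the 10-second duration, and the client_cmd helper are all assumptions for illustration.

```shell
#!/bin/sh
# SERVER is a hypothetical host name; use the machine running "iperf -s".
SERVER="${SERVER:-server.example}"

# On the server, start the listener first:
#   iperf -s

# Build the client command: a 10-second run, reported in Mbits/sec.
client_cmd() {
    echo "iperf -c $1 -t 10 -f m"
}

client_cmd "$SERVER"
```

Running the printed command from each location gives one throughput figure per path, which is what the table of tests records.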

My solution was to run a cat5 cable from my server room (the basement) to my bedroom. Now my primary PC can talk to the server at approximately 80 Mbit/s, roughly 24 times faster than before, when that connection ran through the powerline network. Much better.

The Tests

| Test | Path to server | Mbit/s | Notes |
| --- | --- | --- | --- |
| 1 | kelly-enet → kr1-xe102 → sr1-ss2502 → sr-fvs318 | 3.36 | |
| 2 | kelly-enet → kr2-plebr10 → sr1-ss2502 → sr-fvs318 | 0.29 | |
| 3 | kr-voltorb-wifi → lr1-ss2521 → sr1-ss2502 → sr-fvs318 | 0.53 | |
| 4 | lafr-loganpc-usb → lafr-usb200ha → sr-pe102 → sr-fvs318 | 4.50 | |
| 5 | kelly-enet → kr1-plebr10 → sr1-ss2502 → sr-fvs318 | 3.24 | |
| 6 | kelly-enet → kr2-xe102 → sr1-ss2502 → sr-fvs318 | 3.05 | |
| 7 | lr-voltorb-wifi → lr1-ss2521 → sr1-ss2502 → sr-fvs318 | 3.43 | |
| 8 | sr-voltorb-enet → sr-fvs318 | 89.90 | |
| 9 | sr-voltorb-enet | 92.80 | |
| 10 | kelly-enet → kr1-plebr10 → sr1-ss2502 → sr-fvs318 | 4.52 | xe102 and ss2521 unplugged |
| 11 | kelly-enet → kr1-plebr10 → sr1-ss2502 → sr-fvs318 | 0.22 | ss2521 plugged in |
| 12 | kelly-enet → kr1-plebr10 → sr1-ss2502 → sr-fvs318 | 2.74 | ss2521 unplugged |
| 13 | kelly-enet → kr1-xe102 → sr2-plebr10 → sr-fvs318 | 2.63 | ss2521 unplugged |
| 14 | kelly-enet → kr2-xe102 → sr1-plebr10 → sr-fvs318 | 2.21 | ss2521 plugged |
| 14 | kelly-enet → kr2-xe102 → sr3-plebr10 → sr-fvs318 | 2.57 | ss2521 unplugged |
| 15 | kelly-enet → sr-fvs318 | 80.10 | cat5 to server room |
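
The "roughly 24 times faster" figure can be checked against the results. The sketch below assumes the comparison is between test 15 (cat5, 80.10) and test 1 (the same PC over powerline, 3.36); the post does not say which powerline result was used as the baseline, and speedup is a hypothetical helper.

```shell
#!/bin/sh
# speedup NEW OLD: ratio of two throughput figures, to one decimal place.
speedup() {
    awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", new / old }'
}

speedup 80.10 3.36   # test 15 (cat5) vs test 1 (powerline); prints 23.8
```

Against the slower powerline results the ratio would be much larger, so 24 times is on the conservative side.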

The Gear

netgear powerline xe102
netgear phoneline enet bridge pe102
netgear firewall/router fvs318
speedstream powerline ss2502
speedstream powerline wap ss2521
linksys powerline plebr10
linksys phoneline usb usb200ha

The Testing Points

kr kelly’s room center
kr1 kelly’s room internal wall power outlet
kr2 kelly’s room outside wall power outlet
lr1 living room internal wall power outlet
lafr logan’s room
sr server room
sr1 server room, plug 1
sr2 server room, plug 2

1 I used to spend a lot of time staring at electronics at stores like Fry’s, searching for the latest gadget. At some point I noticed my searches had changed and were now about finding the best cables. I think many modern households have massive tangles of cables: behind the tv/entertainment center, at the home computer/dsl connection, etc. I was thrilled the first time I stumbled upon a 3’ ethernet cable and elated by the discovery of a 10” ethernet cable, perfect for that connection between the dsl modem and the router/firewall/hub… Now, if I could only find a 2’ computer power cord.


* * *