Why I’m Blogging and Not Working

April 26, 2007

This morning, a little after 9:00, I got a call from a fellow named Matt Rutherford. Matt is the systems administrator for the SERL (Software Engineering Research Lab) servers, the same servers that we use for our SVN repository. He needed me to go into the machine room and reboot one of them, Gatekeeper, because there was something wrong with it. Now the reason Matt couldn’t do this himself is that he lives in Niwot, or some other such non-Boulder place, and rarely finds himself on campus. You see, gentle reader, Matt is a graduate student on whom the task of administering these computers has been dumped.

So I ankled on over to the machine room and pressed the button on the KVM to switch the input devices over to Gatekeeper. “No Signal,” said the monitor. “OK,” I thought to myself and proceeded to try a few of the other inputs near the one labeled “Gatekeeper.” Nothing. I decided to try the inputs labeled “serl1” and “serl2.” Still nothing, not even a “No Signal.” But I had not been defeated yet, because Matt had said that, failing everything else, I could hard-reboot the computer using the power button on its front; so down I bent in search of the ubiquitous power button. And it was then that I discovered someone had removed all of the face plates from the machines (the face plates with the labels identifying which machine was which) and had left them lying on the floor.

Not to be so easily defeated, I thought, “Fine. I’ll just reboot every machine.” But at this point I failed to find anything resembling a power button on any of the machines. And so I returned to e-mail Matt—having forgotten my phone at home—and apprise him of the situation. Without access to these servers I cannot check out or commit any code, so I am in no big hurry to make enormous changes to my working copy.

“Why,” you may ask, “are you angry with the CS Dept. and not this Matt Rutherford character?” The reason is this: some time ago the CS Dept. decided that in order to save some money they wouldn’t hire systems administrators for their machines and would, instead, handle that task themselves. Which really meant that they would force graduate students to do it. Graduate students who were also matriculating at CU, doing research, and probably even working. This results in a number of problems. One, the servers are often not administered well. If you want, for instance, an SVN repository, the response is often, “Set it up yourself in your home directory and access it via SSH.” Which is fine for personal stuff, but when it’s for a research project that many people are working on, I think it deserves a better setup than that. Two, these poor graduate students have to spend a significant portion of their time dealing with problems when the shit hits the fan; time that should be spent studying or working on their research.

There is a reason that major companies, and oftentimes even smaller start-ups, hire full-time systems administrators: it is a full-time job! Computers are inherently riddled with problems because of their complexity. When you’re talking about a computer that is being accessed by many, many people, it gets worse. There are few things more helpful than a good sysadmin, and there are few things more frustrating than a poor one. So this is not Matt’s fault; SERL should not be his responsibility. This is the fault of every single professor here in the department who decided they’d save a little money by making the lives of their grad students a little more difficult.

I hope that each and every one of you loses years of vital research data because one of your servers crashes due to lack of maintenance.


UPDATE: I got Matt’s phone number from Ken and we managed to get things worked out. It turned out all three machines were off for some reason. Imagine how quick and easy that would have been to fix for someone who knew the machines. Lights were blinking and I heard the sound of fans, so I had no idea that they were off. But who needs professional sysadmins when you can blindly stumble around with equipment you don’t know and try to debug problems over the phone…

FURTHER UPDATE: I think I just heard someone go into the machine room and check to make sure the machines were on. It’s a good thing we’re not duplicating effort by not having real sysadmins.

One comment

  1. That’s like working in a big dumb company. If I forget to change the date on some email I send, the person doing the forwarding has all the time in the world to type out a response saying the date is wrong and would I please fix it, rather than change one damn number. Then I have to change the date and say it has been fixed and resend it to her so she can forward an email that you can’t read without scrolling a page and a half to see what it’s for. Everything is dumb.


