Wednesday, March 15, 2006

Here's what we use (Score:5, Informative)
by MagicMike (7992) on Wednesday March 15, @01:14PM (#14925846) (https://secure.eff.o...?pagename=DON_splash)
We're a small consulting shop, and the guys that do the in-house IT are expected to be full-time billable.
Issue tracking
JIRA (from Atlassian.com). Bugzilla could work, though.
Documentation
Confluence (also from Atlassian). Any Wiki could work.
Communication
Mailman - we have one operator mailing list. All root mail goes there, and we have our discussions there too.
Config Control
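CVS - If you alter anything from the stock install, the change should be in CVS. Subversion would work. Use "activitymail" to send CVS commit messages with diffs to your operator mailing list. Now if a machine dies, you don't care, because everything that made it different from a stock install can be pulled back out of CVS.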
Monitoring
Nagios and MRTG - If I expect a computer to be providing a service, everything that I can observe about that service will be monitored so we can detect failures quickly, fix them, and see patterns over time. Nagios sends alerts to the operator mailing list. MRTG is used to see how bandwidth is trending.
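To make the "monitor everything you can observe" idea concrete, here is a minimal sketch of a custom check written to the Nagios plugin convention (exit status 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, with one line of status text on stdout). This is not one of our actual checks; the thresholds and the function names are made up for illustration.

```python
import socket
import time

# Nagios plugin convention: the exit status carries the service state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(elapsed, warn_after=1.0, crit_after=5.0):
    """Map a connect time in seconds (None = connection failed) to a state.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if elapsed is None or elapsed >= crit_after:
        return CRITICAL
    if elapsed >= warn_after:
        return WARNING
    return OK


def time_connect(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def main(argv):
    """Print one status line and return the exit status for Nagios."""
    if len(argv) != 3 or not argv[2].isdigit():
        print("UNKNOWN - usage: check_tcp_sketch.py HOST PORT")
        return UNKNOWN
    host, port = argv[1], int(argv[2])
    elapsed = time_connect(host, port)
    state = classify(elapsed)
    detail = "no connection" if elapsed is None else f"connected in {elapsed:.3f}s"
    print(f"{LABELS[state]} - {host}:{port} {detail}")
    return state
```

Saved as a script that ends with `sys.exit(main(sys.argv))`, something like this could be wired in as a Nagios check command, and the alerts would then land on the operator list like any other.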
Updates
Yum - we have our own yum repository, with our own packages in there. If I am using something on more than two servers, I package it up for easier maintenance.
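For anyone who hasn't set up a private repository, each client just needs a small `.repo` file pointing at it. A rough sketch of generating one follows; the repository id, URLs, and helper name are hypothetical, and only the option names (`name`, `baseurl`, `enabled`, `gpgcheck`, `gpgkey`) are standard yum settings.

```python
import configparser
import io


def repo_file(repo_id, name, baseurl, gpgkey):
    """Render the text of a yum .repo file for one in-house repository."""
    # interpolation=None so a literal '%' in a URL is written unchanged.
    parser = configparser.ConfigParser(interpolation=None)
    parser[repo_id] = {
        "name": name,
        "baseurl": baseurl,
        "enabled": "1",
        "gpgcheck": "1",  # verify package signatures against gpgkey
        "gpgkey": gpgkey,
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# Hypothetical values; substitute your own repository host and signing key.
text = repo_file(
    "internal",
    "In-house packages",
    "http://yum.example.com/el/",
    "http://yum.example.com/RPM-GPG-KEY-internal",
)
print(text)
```

The result would be dropped into `/etc/yum.repos.d/` on each server, which is itself the kind of non-stock change that belongs in version control.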
VPN links
PPP over SSH - nothing fancy, but it works.
Backups
rsync - we have a cascading backup: cron dumps data on each machine, rsync carries it to a central machine, and then that machine rsyncs over a VPN link to an off-site machine.
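The cascade is easier to see written down. This is only a sketch of the two rsync hops: the hostnames, user, and paths are invented, and in practice each hop would run from cron on a different machine (the first on the server being backed up, the second on the central machine).

```python
import shlex


def rsync_command(source, destination, delete=False):
    """Build an rsync argv list: -a preserves metadata, -z compresses."""
    cmd = ["rsync", "-az"]
    if delete:
        # Mirror deletions too; left off by default so old dumps survive.
        cmd.append("--delete")
    cmd += [source, destination]
    return cmd


def cascade(hops):
    """Turn [(source, destination), ...] into one rsync command per hop."""
    return [rsync_command(source, destination) for source, destination in hops]


# Hypothetical hosts and paths, for illustration only.
HOPS = [
    # Hop 1, run on each server: local cron dumps -> central machine.
    ("/var/backups/dumps/", "backup@central.example.com:/srv/backups/web1/"),
    # Hop 2, run on the central machine: everything -> off-site, over the VPN.
    ("/srv/backups/", "backup@offsite.example.com:/srv/backups/"),
]

for cmd in cascade(HOPS):
    print(shlex.join(cmd))
```

To actually execute a hop rather than print it, each list can be passed to `subprocess.run(cmd, check=True)`, so a failed transfer raises instead of passing silently.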
Secret storage
GPG - we keep passwords in GPG-encrypted files. If you need them, I encrypt a copy with your public key, and you can read it.
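One way to do the hand-off without ever writing the plaintext to disk is to pipe one gpg process into another. The sketch below assumes the requester's public key is already in the keyring; the file names and function names are hypothetical, and only the `--decrypt`, `--encrypt`, `--recipient`, and `--output` options are gpg's own.

```python
import subprocess


def decrypt_command(encrypted_path):
    """gpg writes the decrypted plaintext to stdout."""
    return ["gpg", "--decrypt", encrypted_path]


def encrypt_command(recipient, output_path):
    """With no input file given, gpg reads the plaintext from stdin."""
    return ["gpg", "--encrypt", "--recipient", recipient, "--output", output_path]


def share_secrets(encrypted_path, recipient, output_path):
    """Re-encrypt a secrets file for one recipient via a pipe."""
    decrypt = subprocess.Popen(
        decrypt_command(encrypted_path), stdout=subprocess.PIPE
    )
    encrypt = subprocess.Popen(
        encrypt_command(recipient, output_path), stdin=decrypt.stdout
    )
    # Close our copy of the pipe so decrypt sees a broken pipe
    # if the encrypt side exits early.
    decrypt.stdout.close()
    encrypt_status = encrypt.wait()
    decrypt_status = decrypt.wait()
    if decrypt_status != 0 or encrypt_status != 0:
        raise RuntimeError("gpg failed; check its error output")
```

A call might look like `share_secrets("passwords.gpg", "alice@example.com", "passwords-for-alice.gpg")`, after which only the holder of the matching private key can read the output file.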
Authn/Authz
LDAP - we use pam-ldap for access control everywhere, and mod_auth_ldap on the web stuff. It's not SSO, but it is single-password. That's key.
The combination of these things keeps everything in line. In particular, I'll point out that the parts work together in such a way that there is only one place to check documentation (the wiki), one place to check for a work queue (the issue tracker), and one place to check for state information and discussion (the mailing list). That makes it easy to deal with, easy to delegate, etc.
Also, you'll note that on a day-to-day basis, unless something breaks, there is no work required. That's huge. If the status quo requires any work at all, you'll eventually hit a scaling limit. The only things that should require work are a migration, an upgrade, or an expansion. And of those, upgrades should be easy too (Nagios, yum, and version control help there).