Account setup
- edit /etc/passwd
  - add the user's name (copy the previous line)
  - increment the userid
  - change name
  - change home directory
  - set the desired shell
- edit /etc/shadow
  - add the user's name (copy the previous line)
  - change the password with passwd <username>
- edit /etc/group
  - add the user's name
  - change the group id to match the userid in /etc/passwd
- create /usr/people/<username>
- set permissions
  - chown <username>:<username> /usr/people/<username>
  - chmod 700 /usr/people/<username>
- set quotas
  - edquota <username>
  - fs /usr/people kbytes (soft = 0, hard = 2097152) inodes (soft = 0, hard = 0)
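The account steps above can be sketched as a dry run that only prints values and leaves the real files untouched. The helper names next_uid, passwd_line and kb_to_gib, and the 60000 cutoff used to skip high system UIDs such as nobody, are assumptions for illustration; the /usr/people home layout and the 2097152 KB hard limit come from the notes.

```shell
# Dry-run helpers: nothing here writes to /etc/passwd.

# Print one more than the highest UID below 60000 in a passwd-format file.
# The 60000 cutoff is an assumption; it skips high system UIDs such as nobody.
next_uid() {
    awk -F: '$3 ~ /^[0-9]+$/ && $3 + 0 < 60000 && $3 + 0 > max { max = $3 + 0 }
             END { print max + 1 }' "${1:-/etc/passwd}"
}

# Print a passwd entry for <name> <uid> <full name> <shell>.
# The group id is set equal to the uid, matching chown <username>:<username>.
passwd_line() {
    printf '%s:x:%s:%s:%s:/usr/people/%s:%s\n' "$1" "$2" "$2" "$3" "$1" "$4"
}

# Convert a quota given in KB to whole GiB.
kb_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}
```

Running kb_to_gib 2097152 shows that the hard limit entered in edquota corresponds to 2 GiB per user.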
—-
Enterprise
swmgr (IRIX Package manager.)
edquota <username>
repquota -a
l2cmd --scdev /hw/module/001c10/L1/controller power
l1cmd --scdev /hw/module/001c10/L1/controller serial all
[/] /usr/cpu/firmware/sysco/flashsc --l2 10.17.175.238 /usr/cpu/firmware/sysco/l1.bin 121.20
$ /usr/diad/bin/pandora
(04:56:52 PM) luke.scharf@im.clusterbee.net: It came via the federal HPC Modernization Program.
(04:57:08 PM) luke.scharf@im.clusterbee.net: I think it was initially owned by the Army Research Lab in Louisiana, but the Air Force now holds the deed.
(04:57:22 PM) luke.scharf@im.clusterbee.net: I might be able to look up some of the e-mails from the HPCMP folks.
(05:01:19 PM) luke.scharf@im.clusterbee.net: It looks like the POC for the project at the time was: Maj Edward M. Williams, PhD HPC Program Manager Air Force Office of Scientific Research 703-696-6566 (DSN 426)
(05:04:38 PM) luke.scharf@im.clusterbee.net: Another contact is Bill Reidy (of BAE Systems and the HPCMP) at 703-812-8205
(05:04:51 PM) luke.scharf@im.clusterbee.net: reidy@hpcmo.hpc.mil
(05:05:18 PM) luke.scharf@im.clusterbee.net: I think Bill Reidy was the main POC – it looks like he answered most of the questions.
(05:10:11 PM) luke.scharf@im.clusterbee.net: It looks like Bill Reidy is really the guy who put it together.
Disk Maintenance
See the section “Creating a New System Disk by Cloning” for using xfsdump to copy the disk.
I cloned the root drive from controller 0 to controller 7 on 11/20/2009. First, I ran
xfsdump -l 0 -f /tmp/xfsdump-oldroot.sfsdump /mnt/tmp
of the mounted spare drive, which contained a dump from when Luke helped get the machine back from the /usr/people array trouble.
Luke may have made this copy back in 2007, after the crash:
xfsdump - / | gzip | ssh root@alexandria.aoe.vt.edu 'cat > /home/sysadmin/enterprise.aoe.vt.edu/backup/root_2007-01-29.xfsdump.gz'
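The dated file name in the command above can be generated rather than typed. This is a minimal sketch: the helper name backup_path and the BACKUP_DIR variable are assumptions, the directory and host are copied from the 2007 command, and the pipeline is only printed, not run.

```shell
# Build the remote file name for a dated dump: <label>_<YYYY-MM-DD>.xfsdump.gz
# BACKUP_DIR is the directory used in the 2007 command; adjust as needed.
BACKUP_DIR=/home/sysadmin/enterprise.aoe.vt.edu/backup

backup_path() {
    # $1 = label (e.g. root), $2 = date, defaulting to today
    printf '%s/%s_%s.xfsdump.gz\n' "$BACKUP_DIR" "$1" "${2:-$(date +%Y-%m-%d)}"
}

# Print (do not run) the dump pipeline for the root filesystem.
echo "xfsdump - / | gzip | ssh root@alexandria.aoe.vt.edu 'cat > $(backup_path root)'"
```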
XVM Manager GUI
xvmgr
vol/people
  subvol/people/data
    stripe/stripe0
      slice/disk3s0
      slice/disk4s0
vol/tmp
  subvol/tmp/data
    stripe/stripe1
      slice/disk1s0
      slice/disk2s0
vol/workspace
  subvol/workspace/data
    concat/workspace
      stripe/workspace1
        slice/xvmdisk_0s0
        slice/xvmdisk_1s0
      stripe/workspace2
        slice/xvmdisk_2s0
        slice/xvmdisk_3s0
      stripe/Workspace3
        slice/xvmdisk_4s0
        slice/xvmdisk_5s0
wipe commands
workspace
dd if=/dev/zero of=/hw/rdisk/dks13d66vol bs=1M
dd if=/dev/zero of=/hw/rdisk/dks13d69vol bs=1M
dd if=/dev/zero of=/hw/rdisk/dks13d70vol bs=1M
dd if=/dev/zero of=/hw/rdisk/dks13d73vol bs=1M
dd if=/dev/zero of=/hw/rdisk/dks13d74vol bs=1M
dd if=/dev/zero of=/hw/rdisk/dks13d77vol bs=1M
tmp
dd if=/dev/zero of=/dev/rdsk/dks4d114vol bs=1M
dd if=/dev/zero of=/dev/rdsk/dks4d117vol bs=1M
people
dd if=/dev/zero of=/dev/rdsk/dks4d115vol bs=1M
dd if=/dev/zero of=/dev/rdsk/dks4d116vol bs=1M
Main Disks
/dev/rdsk/dks0d1vol
/dev/rdsk/dks0d2vol
dd if=/dev/zero of=/hw/disk/dks0d2vol bs=1M
/dev/rdsk/dks7d1vol
/dev/rdsk/dks7d2vol
dd if=/dev/zero of=/hw/disk/dks7d2vol bs=1M
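Because a mistyped device name here destroys data, one cautious approach is to generate the dd lines from a device list and read them over before running anything. A minimal sketch, assuming a hypothetical helper named wipe_cmds; it only prints commands and never calls dd itself.

```shell
# Print one dd command per device so the list can be checked before anything is wiped.
# Nothing is executed; paste the reviewed lines into a root shell by hand.
wipe_cmds() {
    for dev in "$@"; do
        echo "dd if=/dev/zero of=$dev bs=1M"
    done
}

# Example: the two disks behind the tmp volume.
wipe_cmds /dev/rdsk/dks4d114vol /dev/rdsk/dks4d117vol
```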
CD-ROM
/dev/rdsk/dks5d1vol
/dev/rdsk/dks8d1vol
SGI O2
xvm command line interface
enterprise 136# xvm
xvm:local> dump *
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
Don't know how to dump object type 9 (unlabeled)
label -name disk1 -volhdrblks 3072 -xvmlabelblks 1024 -uuid 9c694584-0d9e-102f-86cc-08006911d5db /dev/rdsk/dks4d114vol
label -name disk2 -volhdrblks 3072 -xvmlabelblks 1024 -uuid 9c694585-0d9e-102f-86cc-08006911d5db /dev/rdsk/dks4d117vol
label -name disk3 -volhdrblks 3072 -xvmlabelblks 1024 -uuid 9c694586-0d9e-102f-86cc-08006911d5db /dev/rdsk/dks4d116vol
label -name disk4 -volhdrblks 3072 -xvmlabelblks 1024 -uuid 9c694587-0d9e-102f-86cc-08006911d5db /dev/rdsk/dks4d115vol
slice -start 0 -length 71683264 -uuid 9c694588-0d9e-102f-86cc-08006911d5db phys/disk1
slice -start 0 -length 71683264 -uuid 9c694589-0d9e-102f-86cc-08006911d5db phys/disk2
slice -start 0 -length 71683264 -uuid 9c69458a-0d9e-102f-86cc-08006911d5db phys/disk3
slice -start 0 -length 71683264 -uuid 9c69458b-0d9e-102f-86cc-08006911d5db phys/disk4
stripe -pieces 2 -unit 128 -tempname -uuid 9c69458c-0d9e-102f-86cc-08006911d5db
stripe -pieces 2 -unit 128 -tempname -uuid 9c69458d-0d9e-102f-86cc-08006911d5db
subvol -uid 0 -gid 0 -mode 0600 -type 2 -tempname -uuid 9c69458e-0d9e-102f-86cc-08006911d5db
subvol -uid 0 -gid 0 -mode 0600 -type 2 -tempname -uuid 9c69458f-0d9e-102f-86cc-08006911d5db
volume -volname people -uuid 9c694590-0d9e-102f-86cc-08006911d5db
volume -volname tmp -uuid 9c694591-0d9e-102f-86cc-08006911d5db
xvm:local>
xvm:local> dump -topology tmp
volume -volname tmp -uuid 9c6945a0-0d9e-102f-86cc-08006911d5db
subvol -uid 0 -gid 0 -mode 0600 -type 2 -tempname -uuid 9c6945a1-0d9e-102f-86cc-08006911d5db
stripe -pieces 2 -unit 128 -tempname -uuid 9c6945a2-0d9e-102f-86cc-08006911d5db
slice -start 0 -length 71683264 -uuid 9c6945a3-0d9e-102f-86cc-08006911d5db phys/disk1
attach -pos 0 9c6945a3-0d9e-102f-86cc-08006911d5db 9c6945a2-0d9e-102f-86cc-08006911d5db
slice -start 0 -length 71683264 -uuid 9c6945a4-0d9e-102f-86cc-08006911d5db phys/disk2
attach -pos 1 9c6945a4-0d9e-102f-86cc-08006911d5db 9c6945a2-0d9e-102f-86cc-08006911d5db
attach -pos 0 9c6945a2-0d9e-102f-86cc-08006911d5db 9c6945a1-0d9e-102f-86cc-08006911d5db
attach -pos 0 9c6945a1-0d9e-102f-86cc-08006911d5db 9c6945a0-0d9e-102f-86cc-08006911d5db
xvm:local> dump -topology people
volume -volname people -uuid 9c6945a5-0d9e-102f-86cc-08006911d5db
subvol -uid 0 -gid 0 -mode 0600 -type 2 -tempname -uuid 9c6945a6-0d9e-102f-86cc-08006911d5db
stripe -pieces 2 -unit 128 -tempname -uuid 9c6945a7-0d9e-102f-86cc-08006911d5db
slice -start 0 -length 71683264 -uuid 9c6945a8-0d9e-102f-86cc-08006911d5db phys/disk3
attach -pos 0 9c6945a8-0d9e-102f-86cc-08006911d5db 9c6945a7-0d9e-102f-86cc-08006911d5db
slice -start 0 -length 71683264 -uuid 9c6945a9-0d9e-102f-86cc-08006911d5db phys/disk4
attach -pos 1 9c6945a9-0d9e-102f-86cc-08006911d5db 9c6945a7-0d9e-102f-86cc-08006911d5db
attach -pos 0 9c6945a7-0d9e-102f-86cc-08006911d5db 9c6945a6-0d9e-102f-86cc-08006911d5db
attach -pos 0 9c6945a6-0d9e-102f-86cc-08006911d5db 9c6945a5-0d9e-102f-86cc-08006911d5db
xvm:local>
xvm:local> dump disk*
label -name disk1 -volhdrblks 3072 -xvmlabelblks 1024 -uuid 9c6945aa-0d9e-102f-86cc-08006911d5db /dev/rdsk/dks4d114vol
label -name disk2 -volhdrblks 3072 -xvmlabelblks 1024 -uuid 9c6945ab-0d9e-102f-86cc-08006911d5db /dev/rdsk/dks4d117vol
label -name disk3 -volhdrblks 3072 -xvmlabelblks 1024 -uuid 9c6945ac-0d9e-102f-86cc-08006911d5db /dev/rdsk/dks4d116vol
label -name disk4 -volhdrblks 3072 -xvmlabelblks 1024 -uuid 9c6945ad-0d9e-102f-86cc-08006911d5db /dev/rdsk/dks4d115vol
slice -start 0 -length 71683264 -uuid 9c6945ae-0d9e-102f-86cc-08006911d5db phys/disk1
slice -start 0 -length 71683264 -uuid 9c6945af-0d9e-102f-86cc-08006911d5db phys/disk2
slice -start 0 -length 71683264 -uuid 9c6945b0-0d9e-102f-86cc-08006911d5db phys/disk3
slice -start 0 -length 71683264 -uuid 9c6945b1-0d9e-102f-86cc-08006911d5db phys/disk4
prtvtoc
enterprise 2# prtvtoc /dev/rdsk/*vh
* /dev/rdsk/dks0d1vh (bootfile "/unix")
*   512 bytes/sector
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 0         xfs     yes     2101248    33445628   /
 1         raw                4096     2097152
 8         volhdr                0        4096
10         volume                0    35546876

* /dev/rdsk/dks0d2vh (bootfile "/unix")
*   512 bytes/sector
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 0         xfs     yes     8392704    27173776
 1         raw                4096     8388608
 8         volhdr                0        4096
10         volume                0    35566480

* /dev/rdsk/dks13d66vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        4096    71683273
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        4096
10         volume                0    71687369

* /dev/rdsk/dks13d69vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        4096    71683273
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        4096
10         volume                0    71687369

* /dev/rdsk/dks13d70vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        4096   286745392
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        4096
10         volume                0   286749488

* /dev/rdsk/dks13d73vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        4096   286745392
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        4096
10         volume                0   286749488

* /dev/rdsk/dks13d74vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        4096   143370642
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        4096
10         volume                0   143374738

* /dev/rdsk/dks13d77vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        4096   143370642
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        4096
10         volume                0   143374738

* /dev/rdsk/dks4d114vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        3072    71684297
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        3072
10         volume                0    71687369

* /dev/rdsk/dks4d115vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        3072    71684297
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        3072
10         volume                0    71687369

* /dev/rdsk/dks4d116vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        3072    71684297
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        3072
10         volume                0    71687369

* /dev/rdsk/dks4d117vh (bootfile "")
*   512 bytes/sector
* Unallocated space:
*       Start        Size
*        3072    71684297
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 8         volhdr                0        3072
10         volume                0    71687369

prtvtoc: /dev/rdsk/dks5d1vh: I/O error

* /dev/rdsk/dks7d1vh (bootfile "")
*   512 bytes/sector
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 0         xfs     yes     2101248    33445628
 1         raw                4096     2097152
 8         volhdr                0        4096
10         volume                0    35546876

* /dev/rdsk/dks7d2vh (bootfile "/unix")
*   512 bytes/sector
Partition  Type    Fs   Start: sec   Size: sec   Mount Directory
 0         xfs     yes     8392704    27154172
 1         raw                4096     8388608
 8         volhdr                0        4096
10         volume                0    35546876

prtvtoc: /dev/rdsk/dks8d1vh: I/O error
enterprise 3#
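The sizes in the prtvtoc output are counts of 512-byte sectors, so 2048 sectors make one megabyte. A minimal sketch of the conversion, with a hypothetical helper name sectors_to_mb; it rounds to the nearest whole megabyte, and with that rounding 1953525168 blocks gives 953870, the same figure fx prints for the FireWire drive.

```shell
# Convert a count of 512-byte sectors to megabytes (1 MB = 2048 sectors),
# rounding to the nearest whole MB.
sectors_to_mb() {
    echo $(( ($1 + 1024) / 2048 ))
}

# Whole-volume sizes of the three disk models on controller 13.
sectors_to_mb 71687369
sectors_to_mb 143374738
sectors_to_mb 286749488
```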
firewire (1394) Maxtor drive
fx -d /dev/rdsk/1394/10b9021153eeca/lun0vol/c5p0
enterprise 141# fx -d /dev/rdsk/1394/10b9021153eeca/lun0vol/c5p0
fx version 6.5, Jul 1, 2005
...opening /dev/rdsk/1394/10b9021153eeca/lun0vol/c5p0
...drive selftest...OK
fx: Warning: device appears to be a floppy
Scsi drive type == Maxtor
----- please choose one (? for help, .. to quit this menu)-----
[exi]t [d]ebug/ [l]abel/ [b]adblock/ [exe]rcise/ [r]epartition/
fx> r
----- partitions-----
part  type        blocks              Megabytes   (base+size)
  8: volhdr       0 + 4096            0 + 2
 10: volume       0 + 1953525168      0 + 953870
capacity is 1953525168 blocks
----- please choose one (? for help, .. to quit this menu)-----
[ro]otdrive [u]srrootdrive [o]ptiondrive [re]size
fx/repartition> o
fx/repartition/optiondrive: type of data partition = (xfs) ?
----- type of data partition-----
[x]fs [e]fs
fx/repartition/optiondrive: type of data partition = (xfs) x
Warning: you will need to re-install all software and restore user data
from backups after changing the partition layout. Changing partitions
will cause all data on the drive to be lost. Be sure you have the drive
backed up if it contains any user data. Continue? y
fx/repartition/optiondrive: Warning: device appears to be a floppy
----- partitions-----
part  type        blocks              Megabytes   (base+size)
  7: xfs       4096 + 1953521072      2 + 953868
  8: volhdr       0 + 4096            0 + 2
 10: volume       0 + 1953525168      0 + 953870
capacity is 1953525168 blocks
----- please choose one (? for help, .. to quit this menu)-----
[ro]otdrive [u]srrootdrive [o]ptiondrive [re]size
fx/repartition> ..
----- please choose one (? for help, .. to quit this menu)-----
[exi]t [d]ebug/ [l]abel/ [b]adblock/ [exe]rcise/ [r]epartition/
fx> exi
mkfs_xfs /dev/dsk/1394/10b9021153eeca
/etc/fstab
# firewire drive
/dev/dsk/1394/10b9021153eeca/lun0vol/c5p0 /tmp2 xfs rw 0 0
Bricks
down:
001C10 001c16 008C24 013c10 PIMM 0 VCPU Low V @ 1.678
004c35 off
006c21 No 48V
012c32 off
014c29 off

004c35 off (PIMM issues, Luke 7-7-06)
006c21 off
008c24 powered down, display goofy.
012c32 off SGI Defective Part
014c24 off
console msg:
006r19 error: error reading the router control 110 expander
002c10 1.5v low @ 1.368V
001c24 powered off and back on 131p16 power bay 101i20 Lost Redundancy
L2find L2term
002c10 L1 (Voltage regulator Seems like it needs replacing.)
—- 12/22?/2006
13c10 PIMM0 VCPU low/stabilized@1.1748
11c32 Error reading monitor node 1 interrupt status 1: no Ack
101:20 DPS Power Bay 4 predictive fail
NASID 58-59 and 74-75 incomplete memory region
Autoboot failed
dksc(0,1,0)unix: no such file or directory
14c24 was off
#### from phone conversation in early January with Al Duran(?) from SGI:
Load installation tools.
The #1 CD contains installation tools.
esc to quit boot when prompted.
Maintenance Menu #2 Install
Jump into shell
v: /root/etc /old
Might be possible to break into the OS using the copy of the drive made during the installation. (This was not necessary.)
---- to break in ----
passwd delete second field shadow
save
restart single user admin 6.520
esc to quit boot when prompted.
Maintenance Menu #2 Install
- > shell
2002-2004 was end of life of disk box (d-brick??)
Built 94 or 95
Clarion Mass storage
Initial Installation process:
Load OS, Patch and configure
-xfs Dump restore
raid disk box – drop in – information from Joe or Eric. Does not recommend Apple.
####
To update hardware inventory database(re-inventory):
esc to abort boot
5 command monitor
>> update
Al's number: 804 839 8764
####
l2term to enter the l2 terminal control from the laptop connected to ??
008c24 PIMM 1 B
008c24 PIMM 1 A
014c24 PIMM 0 A
002c10 1.5v low
tty erase (backspace)
114 117 1
015c10 015c16
update klconfig_cpuinfo (error??)
load dksc(0,1,0)/unix : no such file or directory
c for continue goes to sash
sash: sash: boot (to boot)
from l1
env
env reset (to clear messages)
log
cfs
from pod → reset
enable all
Off:
011c32 012c32 006c21 004c35
####
1-24-2007 still trying to get enterprise going.
0013c13 12 Bias low fault below 8.0V, auto powerdown
002-L2 can't read packet from connection to 002c35
IRouter: read failed - read error
—-
kill all l2term
startx
don't use the term that comes up.
start xterm
—- from pod, missing memory regions:
30-31 36-37 50-51 58-59 74-75
30 004c32 off
36 013c10 off
59 012c35 off
51 011c35 off
74 006c16 off
Kept turning off the complaining C-bricks until she came up.
ssh enterprise swmgr freeware.sgi.com rsync 2.5.6 pkginfo screen
####
1-25-07 Still trying to get enterprise running. Went down during the night.
008c24 ALERT: Error reading monitor PIMM1 A and B interrupt status 1: "no acknowledge" and "bus busy"
l2term | tee /tmp/l2.log
during boot:
160 objects 116 hubs 44 routers
001c32: IRouter: read failed - read error
004c29 Alert: Error reading monitor PIMM1 A interrupt status 2: arbitration lost.
l2gui (??)
004c24 Membanks 01 disabled
0016c35: CPU disabled, Reason: One or both CPUs took exception.
004c29 missing
008c24 missing
Incomplete memory regions:(NASID's)
92-93 28-29 122-123
016c36
as of 1-25-2007, these c bricks are turned off:
001c24
004c29
004c32
004c35 Luke 7/7/06 PIMM Issues
006c16
006c21
008c24
011c32
011c35
012c32 SGI Defective
012c35
013c10
013c13
016c35
004c24
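Brick IDs are written inconsistently in these notes (13c10, 013c10, 002C10, 0013c13), which makes them awkward to grep or sort. The sketch below normalises them, assuming the ID is a rack number, a one-letter brick type and a slot position; that reading of the format and the helper name norm_brick are assumptions, not something the notes state.

```shell
# Normalise a brick ID such as "13c10" or "002C10" to a three-digit rack,
# a lowercase type letter and a two-digit slot ("013c10", "002c10").
# Input that does not fit the assumed rack/type/slot shape produces no output.
norm_brick() {
    echo "$1" | tr '[:upper:]' '[:lower:]' |
    sed -n 's/^0*\([0-9]\{1,3\}\)\([a-z]\)0*\([0-9]\{1,2\}\)$/\1 \2 \3/p' |
    while read -r rack type slot; do
        # Leading zeros were stripped above so printf does not read them as octal.
        printf '%03d%s%02d\n' "$rack" "$type" "$slot"
    done
}

norm_brick 13c10
norm_brick 002C10
```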
1-29-2007
Rebooted and ran update.
113 hubs 44 routers
452 400MHz IP35 processors
Hardware information:
# hinv -v
# topology (needs original path, not freeware)
incomplete memory regions:
122-123 92-93 4-5
6-18-2007 Turned off
002C10 002C13 ?
Moved power connectors for router over left one position.
Jun 14 03:53:17 enterprise unix: |$(0x160)WARNING: 002c10 ATTN: 1.5V level stabilized @ 1.354V.
March 17, 2008
Commented ftp out of /etc/inetd.conf and sent the inetd process a HUP (kill -HUP) because the logs were filling up with connection attempts.
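The same change can be previewed before touching the live file. A minimal sketch, assuming the service line begins with the word ftp followed by whitespace; the helper name disable_ftp is hypothetical, and the function only prints the edited text.

```shell
# Comment out the ftp service line in an inetd.conf-style file.
# Reads the file named by $1 and prints the edited version; the original is not modified.
disable_ftp() {
    sed 's/^ftp[[:space:]]/#&/' "$1"
}

# Example (review the output before replacing the real file, then HUP inetd):
#   disable_ftp /etc/inetd.conf > /tmp/inetd.conf.new
```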
http://www.pimpworks.org/sgi/issues.html
April 11, 2008
Turned off 03c35 (complaining) and 03c32 (ok?)
Turned off 13c29 (complaining) and 13c24 (ok?)
Turned off 14c32 (complaining) and 14c35 (ok?)
May 19, 2008
Turned off 002c29 (complaining) and 002c24 (ok)
Turned off 012c16 (complaining) and 012c21 (ok)
July 2, 2008
Turned off 016c29 (complaining) and 016c24
Turned off 007c21 (complaining) and 007c16
Nov 3, 2008
Turned off 004r38 (complaining)
Turned off 001c32 (complaining) and 001c35
Dec 15, 2008
Rebooted
Dec 18, 2008
Turned off 003c10(complaining) and 003c13
Dec 19, 2008
011r19 complaining of high voltage
Switched power module with 6r19, and switched 6r19 with 4r38 since 4r38 was complaining worse.
That did not help, so switched the 4r38 main board with 11r19. Now the personalities are reversed.
Tried turning off 11r19 (which now reports as 4r38) and turning on 4r38.
Dec 23, 2008
Switched 4r38(bad one) with 12r19(good one)
4r38 ALERT: Error reading monitor POWER interrupt status 1: no acknowledge
This fixed 4r38, a repeating router and allowed 11r19 to be turned off.
Jan 6, 2009
$(0x167)WARNING: /hw/module/016c13/node/cpubus/1/a: INTR: Scache SBE, failing bit 16 CPU F9C3.K3 SRAM H4A7.P7
Oct 9, 2009
12c24 Error reading monitor Node 2 interrupt status 1: no acknowledge
14c16 PIMM0 High
Oct 16, 2009
12c32 Error reading monitor PIMM0 A interrupt status 1: no acknowledge
12c32 Error reading monitor PIMM0 A interrupt status 2: no acknowledge
Oct 19, 2009
1c24 Fatal PI error interrupt 0x80000000 (from CPU B)