aoe:hpc

linpack

Linpack (HPL) is the standard benchmark for measuring a cluster's floating-point performance.

nis

todo:

  • set hostname and domainname
  • firewall for nis/nfs server
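The first two checks can be scripted. A read-only sanity-check sketch (the FQDN and NIS domain in the comments are this page's example names; substitute your own):

```shell
# check that hostname and NIS domainname are set (read-only, no root needed)
host_fqdn=$(hostname)                              # expect e.g. G4-cluster01.setup.lan
nis_domain=$(domainname 2>/dev/null || echo "(unset)")  # expect e.g. "setup"
echo "host=$host_fqdn nis-domain=$nis_domain"
```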

Some good setup info: http://penguin.triumf.ca/recipes/nis-auto/index.html

nis client

# hostname (does it need the fqdn?)
# domainname (make sure it is set)
# domainname setup (to set it to 'setup')
# ypdomainname (same, but from yp tools)
# portmap (should be running)

edit /etc/yp.conf and add to the end:

  domain setup server G4-cluster01.setup.lan
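The same edit can be scripted. A sketch that appends the binding to a temporary copy (point `conf` at /etc/yp.conf on a real client; the domain and server are the example names above):

```shell
# sketch: append the NIS binding to a copy of /etc/yp.conf
conf=$(mktemp)
cat >> "$conf" <<'EOF'
domain setup server G4-cluster01.setup.lan
EOF
grep '^domain setup server' "$conf"   # confirm the line landed
```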

Create /var/yp if it does not exist

mkdir /var/yp

edit /etc/nsswitch.conf

...

passwd:     files nis
shadow:     files nis
group:      files nis

...

automount:  files nis

...
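The nsswitch edit can also be scripted. A sketch operating on a temporary copy of a stock file (assumes the relevant lines end in `files`; use /etc/nsswitch.conf on a real client):

```shell
# sketch: append "nis" to the passwd/shadow/group/automount lines
nss=$(mktemp)
printf '%s\n' 'passwd:     files' 'shadow:     files' \
              'group:      files' 'automount:  files' > "$nss"
sed -E -i 's/^(passwd|shadow|group|automount):(.*)$/\1:\2 nis/' "$nss"
cat "$nss"
```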

To start the NIS client, otherwise known as ypbind:

# service ypbind start

To test:

# rpcinfo -p localhost

Output similar to:

       program vers proto   port
        100000    2   tcp    111  portmapper
        100000    2   udp    111  portmapper
        100007    2   udp    758  ypbind
        100007    1   udp    758  ypbind
        100007    2   tcp    761  ypbind
        100007    1   tcp    761  ypbind
# rpcinfo -u localhost ypbind

Output similar to:

        program 100007 version 1 ready and waiting 
        program 100007 version 2 ready and waiting
# ypcat passwd

Should output the NIS users from the passwd map.

Enable ypbind at boot and start it

chkconfig ypbind on
service ypbind start

nis server

install:

# yum install ypserv

These should already be installed (install them if not):

yp-tools
ypbind

edit /etc/yp.conf as above

Add normal Unix users, then build the maps:

# /usr/lib/yp/ypinit -m

Optional: on slave NIS servers (not on clients):

# ypinit -s G4-cluster01.setup.lan

Check the nis server:

# rpcinfo -u localhost ypserv
program 100004 version 1 ready and waiting
program 100004 version 2 ready and waiting

If not running,

# service ypserv start

To update the maps after adding or changing users (don't rerun ypinit -m):

(update users)
# cd /var/yp
# make

To allow password changes:

# rpc.yppasswdd
# rpc.yppasswdd -e chfn -e chsh (to allow changing of full name and login shell)

/var/yp/securenets

# allow connections from local host -- necessary
host 127.0.0.1
# same as 255.255.255.255 127.0.0.1
#
# allow connections from any host
# on the 192.168.2.0 network
255.255.255.0   192.168.2.0
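The second rule admits any address that, masked with 255.255.255.0, equals 192.168.2.0. A quick sketch for checking a candidate client address (the address below is a made-up example):

```shell
# check whether an IP matches the "255.255.255.0 192.168.2.0" securenets rule
ip="192.168.2.42"   # candidate client address (example, not from this page)
masked=$(echo "$ip" | awk -F. '{printf "%d.%d.%d.0", $1, $2, $3}')
if [ "$masked" = "192.168.2.0" ]; then allowed=yes; else allowed=no; fi
echo "$ip allowed: $allowed"
```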

start services on boot

chkconfig ypserv on
chkconfig ypbind on
chkconfig ypxfrd on
chkconfig yppasswdd on

nfs

  • edit /etc/exports
/home \
        G4-cluster01.setup.lan(rw,sync,no_root_squash) \
        G4-cluster02.setup.lan(rw,sync,no_root_squash) \
        G4-cluster03.setup.lan(rw,sync,no_root_squash) \
        G4-cluster04.setup.lan(rw,sync,no_root_squash)

/usr \
        G4-cluster01.setup.lan(rw,sync,no_root_squash) \
        G4-cluster02.setup.lan(rw,sync,no_root_squash) \
        G4-cluster03.setup.lan(rw,sync,no_root_squash) \
        G4-cluster04.setup.lan(rw,sync,no_root_squash)

  • Add to startup services
chkconfig portmap on
chkconfig nfs on
  • Start the nfs server
service portmap start
service nfs start
  • Reload any nfs changes to /etc/exports
exportfs -ra

auto mount

If not using automount, add to /etc/fstab:

G4-cluster01:/home      /home                   nfs     defaults        0 0

If using automount, on master:

/etc/auto.master

/home auto.home

/etc/auto.home

steve    G4-cluster01.setup.lan:/home/&

or, using a wildcard key:

*        G4-cluster01.setup.lan:/home/&

/etc/nsswitch.conf

automount:  files nis

Then update the database if yp is already running:

cd /var/yp
make

mpi

openmp

system monitoring

batch system

pbs

http://euler.phys.cmu.edu/cluster/pbs.html

qsub
showq
checkjob

notes from Shinpaugh:

openpbs → torque

The free version of Moab is Maui.

To submit your job to the queuing system use the command qsub:

  qsub ./JobScript.sh

This will return your job id, of the form xxxx.queueserver. Example: 4567.admin01
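A minimal JobScript.sh might look like the sketch below; the job name, resource line, and program are placeholder assumptions, not part of this page:

```shell
#!/bin/bash
#PBS -N example_job
#PBS -l nodes=1:ppn=4,walltime=00:10:00
#PBS -j oe

# Torque sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to . so the script can also be tested by hand
cd "${PBS_O_WORKDIR:-.}"

msg="job started on $(hostname)"
echo "$msg"
# your program goes here, e.g. mpirun ./a.out
```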

To remove a job from the queue, or stop a running job, use the command `qdel <job_number>`. Example: qdel 4567

To see status information about your job, you can use:

  • `qstat -f <job_number>`: a Torque command that provides detailed information about the job.
  • `showstart <job_number>`: a Moab command that reports expected start and finish times.
  • `checkjob -v <job_number>`: a Moab command that provides detailed information about the job.

NOTE: The Moab commands may report an error of the form "ERROR: cannot locate job '<job name>'" if the scheduler has not yet picked up the newly submitted job. If so, just wait a minute and try again.

When your job has finished running, any output to stdout or stderr is placed in the files <jobname>.o<jobid> and <jobname>.e<jobid>. These two files appear in the directory you submitted the job from.

To find information about all your queued or running jobs, use the commands `qstat` and `showq`. The `qstat` command without a <job name> argument shows all ithaca jobs from the Torque resource manager's perspective. The `showq` command without arguments shows all running jobs on all ARC systems from the Moab scheduler's perspective. To view only ithaca jobs with showq, use `showq -p ITHACA`. NOTE: Users generally find showq more useful than qstat.

High Availability Cluster
