Wednesday, October 7, 2009

Can't open boot_archive

During a disaster recovery exercise at one of my customers, I had to deal with this frustrating error and tried almost everything to solve it. Since we didn't have much time for debugging, I opened a software case with Sun.

This is a sample of the message:

 

Boot device: /pci@0/pci@0/pci@2/scsi@0/disk@2,0:a File and args: -s -r

Can't open boot_archive

Evaluating:
The file just loaded does not appear to be executable.
{0} ok  

 

 

After a few hours of debugging the issue with Sun, we came to the conclusion that the disk had a new bootblock (kernel 139555-08), which was looking for the boot archive. The archive wasn't there, hence the error message. We had in fact applied some OS patches prior to the DR exercise, but unfortunately we had to back them out.

 

The mystery in this whole story is that the image applied to the disk a few days earlier used an old kernel version (127111-11), so the bootblock compatible with that kernel version should have been there (but it wasn't :\)

 

Anyway, after all that damn troubleshooting, this is what we did to fix the issue:

 

1) Boot from an old disk (that has an older kernel version - in our case we booted using 127111-11)

2) Install a new boot block:

 

************************************************
# To install boot block in the root disk:
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/cXtXdXs0
************************************************

 

3) Reboot the server and be happy
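
Since a missing space in that command is easy to overlook, step 2 can be wrapped in a small dry-run helper that only prints the command so you can check it before touching the disk. This is just a sketch: the function name bootblk_cmd is made up for illustration, it assumes a UFS root on SPARC as in this post, and cXtXdXs0 still has to be replaced with your real root slice.

```shell
#!/bin/bash
# Sketch only: print the installboot command instead of running it.
# bootblk_cmd is a hypothetical helper name, not a Solaris tool.
bootblk_cmd() {
    platform="$1"   # normally the output of `uname -i`
    slice="$2"      # root slice, e.g. cXtXdXs0 (placeholder)
    echo "installboot /usr/platform/${platform}/lib/fs/ufs/bootblk /dev/rdsk/${slice}"
}

# Review the printed line, then run it as root on the server.
bootblk_cmd "$(uname -i)" cXtXdXs0
```

Printing first and running second costs a few seconds and avoids writing a boot block to the wrong slice.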

 

If you still face issues, you might need to boot in failsafe mode and sync your boot archive. To do that, go to the ok prompt and type:

 

boot -F failsafe

 

A message like this will appear:

 

An out of sync boot archive was detected on /dev/dsk/c1t2d0s0.
The boot archive is a cache of files used during boot and
should be kept in sync to ensure proper system operation.

 

Do you wish to automatically update this boot archive? [y,n,?] y


Updating boot archive on /dev/dsk/c1t2d0s0.
The boot archive on /dev/dsk/c1t2d0s0 was updated successfully.

 

Reboot and you should be all set :)
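
If you answered n to that prompt, or just want to rebuild the archive by hand, bootadm update-archive with -R pointing at the mounted root should do the same job. The sketch below is not what we ran during the exercise: the helper name refresh_boot_archive is hypothetical, and /a as the mount point of the root slice in failsafe mode is an assumption you should verify with df first.

```shell
#!/bin/bash
# Sketch only: rebuild the boot archive by hand from the failsafe shell.
# refresh_boot_archive is a hypothetical helper name; /a as the mount
# point of the root slice is an assumption, so check it with df first.
refresh_boot_archive() {
    altroot="${1:-/a}"
    if command -v bootadm >/dev/null 2>&1; then
        # -R tells bootadm which root file system to update
        bootadm update-archive -R "$altroot"
    else
        echo "bootadm not found; nothing to do on this system" >&2
        return 1
    fi
}

# Usage (from the failsafe shell): refresh_boot_archive /a
```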

 

 

 

Finding the port number of a process (useful when lsof doesn't help).

Hi Folks,

 

I've been away for many months due to work/vacation, but now I'm back, and I'll try to post new things on a regular basis :)

 

Some time ago I had a request from a WebLogic guy to kill a process that was using a specific port, but the real issue was: *he didn't know the PID and lsof was not working*

 

So I started googling around and found a small script that does the job. The script uses one of the proc tools, pfiles, to grep for the port number given on the command line:

 

# ./portfind.sh 161

Greping for your port, please be patient (CTRL+C breaks)

        sockname: AF_INET 10.13.204.20  port: 8161

Is owned by pid 1438

..

        sockname: AF_INET 0.0.0.0  port: 161

Is owned by pid 2474

..

 

 

# ps -ef | egrep '1438|2474'

    root  1438     1  0   Sep 28 ?        0:03 ./snmpmagt /opt/patrol/PATROL/Solaris28-sun4/lib/snmpmagt.cfg NOV

    root  2474     1  0   Sep 28 ?        0:00 /usr/lib/snmp/snmpdx -f 0 -y -c /etc/snmp/conf

    root  2490  2474  0   Sep 28 ?       11:36 mibiisa -r -p 32855
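
Notice that egrep also pulled in PID 2490, simply because its parent PID is 2474. If you only want the exact PIDs, ps -p with a comma-separated list avoids the pattern matching altogether. A minimal sketch, where show_pids is just an illustrative name:

```shell
#!/bin/bash
# Sketch: show only the given PIDs, with no pattern matching
# against the whole process list. show_pids is a hypothetical name.
show_pids() {
    # Join the arguments with commas: 1438 2474 -> 1438,2474
    pidlist=$(IFS=,; echo "$*")
    ps -o pid,ppid,args -p "$pidlist"
}

# Demo: show the current shell. On the server above it would be
# show_pids 1438 2474
show_pids "$$"
```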

 

 

Here is the script code:

 

 

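#!/bin/bash
# $1 is the port we are looking for

if [ $# -lt 1 ]
then
    echo "Please provide a port number parameter for this script"
    echo "e.g. $0 1521"
    exit 1
fi

echo "Greping for your port, please be patient (CTRL+C breaks)"

# Walk every PID under /proc and look for the port in its open sockets.
# Note: this is a substring match, so 161 also matches 8161.
for i in `ls /proc`
do
    # pfiles complains about processes it cannot inspect; hide that noise
    pfiles $i 2>/dev/null | grep AF_INET | grep "$1"
    if [ $? -eq 0 ]
    then
        echo "Is owned by pid $i"
        echo ".."
    fi
done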
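
One thing the sample run shows is that searching for 161 also hits 8161, because the script greps for the number anywhere in the line. If that bothers you, the filter can be tightened so the port has to sit at the end of the line. This is only a sketch of one way to do it: match_port is a made-up helper name, and it assumes the pfiles socket lines end with "port: N" as they do in the output above.

```shell
#!/bin/bash
# Sketch: keep only pfiles-style socket lines whose port is exactly $1.
# match_port is a hypothetical helper; it reads pfiles output on stdin.
match_port() {
    # Anchor the port at the end of the line (trailing spaces allowed)
    grep 'AF_INET' | grep "port: $1 *\$"
}

# Demo with the two socket lines from the sample run: only the
# line for port 161 is printed, the 8161 line is filtered out.
printf '%s\n' \
    '        sockname: AF_INET 10.13.204.20  port: 8161' \
    '        sockname: AF_INET 0.0.0.0  port: 161' | match_port 161
```

Inside the loop it would replace the two greps, i.e. pfiles $i 2>/dev/null | match_port "$1".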

 

Enjoy