uptime madness, or why do you need to reboot just because you replace the harddiscs?

I just got a new HP Microserver for a customer. I only had two 500GB discs available and installed CentOS onto them. But now the 4 × 3TB discs have arrived and I need to move everything from the 2 small discs to the new large discs.

Of course I could do a reboot, boot into a rescue CD and copy the data, but I don’t want to reboot! Why no reboot? Because I can! 🙂

I installed CentOS onto the 2 discs with md0 as a 500MB RAID1 containing /boot, and md1 as a RAID1 spanning the rest of the discs and hosting an LVM Physical Volume.

This configuration is not guaranteed to work with every setup. Booting from a GPT-partitioned disc with a BIOS is not supposed to work. It works on an HP Microserver, but it did not work on an Asus motherboard I tried as well. And as always: if you follow this setup and it breaks, eats your data, your homework or your cat, it is your own fault. Don’t blame me!

1. I started by marking sdb as failed in both RAIDs and removing it from them, so I could pull this disc.

mdadm -f /dev/md0 /dev/sdb1
mdadm -r /dev/md0 /dev/sdb1
mdadm -f /dev/md1 /dev/sdb2
mdadm -r /dev/md1 /dev/sdb2

2. I removed the disc from the machine, put it into a USB/SATA converter, and added it back into the RAID. Nowadays this is very fast because the RAID detects what is still in sync. I feared a long wait to sync 500GB over USB, but it was done in seconds instead. Nice!

mdadm -a /dev/md0 /dev/sdb1
mdadm -a /dev/md1 /dev/sdb2
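
Before pulling the second disc in the next step, it is worth checking that both arrays are really complete again. A minimal sketch of such a check, assuming the usual /proc/mdstat layout where the status line of each array ends in something like [UU] or [U_]; the function name md_degraded is my own invention for this example:

```shell
# Print the names of md arrays that have a missing member.
# Reads /proc/mdstat-formatted text on stdin.
md_degraded() {
    awk '
        # remember the array name from lines like "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # the status line ends in [UU] (complete) or e.g. [U_] (degraded)
        /\[[U_]+\]/ { if ($NF ~ /_/) print name }
    '
}
```

Run it as md_degraded < /proc/mdstat. If it prints nothing, no array is missing a member.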

3. Next I removed the remaining disc from the RAID and removed it from the case. Now you have a backup disc in case something goes wrong from here on!

mdadm -f /dev/md0 /dev/sda1
mdadm -r /dev/md0 /dev/sda1
mdadm -f /dev/md1 /dev/sda2
mdadm -r /dev/md1 /dev/sda2

4. Now I plugged in the 4 new 3TB hard discs. I ran the usual badblocks -v -v -w on them before installing them. Create 2 partitions on every disc and mark them as Linux software RAID.

parted -s -- /dev/sda \
mklabel gpt \
mkpart boot-raid ext2 1M 525M \
toggle 1 raid \
mkpart lvm-raid ext2 525M -1 \
toggle 2 raid

5. Add the 500MB partitions to md0. Remove the old partition on the USB disc and grow the RAID from a 2-disc RAID1 to a 4-disc RAID1.

mdadm -a /dev/md0 /dev/sda1
mdadm -a /dev/md0 /dev/sdc1
mdadm -a /dev/md0 /dev/sdd1
mdadm -a /dev/md0 /dev/sde1
mdadm -f /dev/md0 /dev/sdb1
mdadm -r /dev/md0 /dev/sdb1
mdadm -G -n 4 /dev/md0

6. Create a new RAID. I use a RAID 5 named /dev/md2 and create a Physical Volume on it.

mdadm -C -n 4 -l 5 /dev/md2 /dev/sda2 /dev/sdc2 /dev/sdd2 /dev/sde2
pvcreate /dev/md2

7. Extend the existing Volume Group onto /dev/md2 and move all data from md1 to md2. Remove md1 from the Volume Group when done and destroy md1.

vgextend vg_name /dev/md2
pvmove /dev/md1 /dev/md2
vgreduce vg_name /dev/md1
mdadm -S /dev/md1

8. The hardest step is to make booting possible. You need to get the UUID of the new RAID (md2) and add it to grub.conf. You also need to update your mdadm.conf and recreate your initramfs. Finally you need to install grub again onto the new sda.

mdadm -D /dev/md2 | grep UUID | sed -e 's/UUID : //'
# add the resulting UUID as rd_MD_UUID=<UUID> to every kernel line in grub.conf
mdadm --examine --scan >> /etc/mdadm.conf
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)
grub-install /dev/sda
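
The grep/sed pipeline above leaves the leading whitespace in front of the UUID. Here is a small sketch that prints the finished kernel parameter instead. It assumes the "UUID : ..." line format of mdadm -D, and the function name md_uuid_param is only made up for this example:

```shell
# Turn the output of `mdadm -D /dev/mdX` (read from stdin) into the
# rd_MD_UUID=... kernel parameter for the kernel lines in grub.conf.
md_uuid_param() {
    # keep only the hex-and-colon UUID, drop indentation and anything after it
    sed -n 's/^[[:space:]]*UUID : \([0-9a-fA-F:]*\).*$/rd_MD_UUID=\1/p'
}
```

Usage: mdadm -D /dev/md2 | md_uuid_param, then paste the result onto the kernel lines by hand. I would not let a script edit grub.conf on a box I cannot afford to lose.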

9. reboot.

Wait, why reboot now, when I tried so hard not to reboot? Because sooner or later you have to reboot, and I want to know now whether that will work.

Posted in Enterprise Linux, Fedora, Linux

Thank you Seth Vidal, my first ansible playbook

I was shocked when I heard about Seth Vidal’s death. Of course I use yum daily, but it brought tears to my eyes when I was reading my post “my TODO List after a install” and realized that Seth was one of the two people who responded. Thanks Seth, I will remember you.

So in tribute to him, here is my first Ansible playbook:

---
- hosts: all
  user: root
  tasks:
  - name: make sure eth0 starts at boot
    lineinfile: dest=/etc/sysconfig/network-scripts/ifcfg-eth0 regexp=^ONBOOT= line=ONBOOT=yes backup=yes

  - name: put ssh-key in
    authorized_key: user=root key="{{lookup('file', '~/.ssh/id_dsa.pub') }}" manage_dir=yes

  - name: get epel-repo rpm RHEL6
    get_url: dest=/tmp/epel-release.rpm  url=http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
    when: ansible_os_family == 'RedHat' and ansible_lsb.major_release|int == 6
  - name: get epel-repo rpm RHEL5
    get_url: dest=/tmp/epel-release.rpm  url=http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
    when: ansible_os_family == 'RedHat' and ansible_lsb.major_release|int == 5

  - name: install epel-repo rpm
    yum: pkg=/tmp/epel-release.rpm state=installed

  - name: install my packages
    yum: pkg={{ item }} state=installed
    when: ansible_os_family == 'RedHat' and ansible_lsb.major_release|int == 6
    with_items:
#       - mmv
       - policycoreutils-python
       - mod_ssl
       - screen
       - iotop
       - yum-plugin-ps
       - yum-cron
       - iptraf
       - acpid
       - man
       - bind-utils
       - vim-enhanced
       - nc
       - zip
       - unzip
       - wget
       - etckeeper
       - links
       - yum-utils
       - lsof
       - bash-completion
       - ddrescue
       - dos2unix
       - dstat
       - lftp
       - hdparm
       - smartmontools
       - jwhois
       - kexec-tools
       - mc
       - mcelog
       - memtest86+
       - mtr
       - nmap
       - ntp
       - openssh-server
       - pbzip2
       - rng-tools
       - sysstat
       - vconfig
       - vlock
       - lzop
       - atop
       - mosh

  - name: activate autoupdate
    service: enabled=yes state=started name=yum-cron

  - name: initialize etckeeper
    command: /usr/bin/etckeeper init creates=/etc/.git/description
  - name: make first commit
    command: /usr/bin/etckeeper commit -m "init" creates=/etc/.git/COMMIT_EDITMSG

Posted in Enterprise Linux, Fedora, Uncategorized

my TODO List after a install

I had to reinstall a couple of machines recently and I had to do the same things more than once. So I wrote a script, and for my own future self I post it here as a reference.

sed -e 's:ONBOOT=no:ONBOOT=yes:' -i /etc/sysconfig/network-scripts/ifcfg-eth0
yum install http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-9.noarch.rpm http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-10.noarch.rpm
yum update
yum install policycoreutils-python iotop yum-plugin-ps yum-cron   iptraf acpid man bind-utils vim-enhanced nc zip unzip wget etckeeper links screen yum-utils lsof bash-completion ddrescue dos2unix dstat lftp links hdparm smartmontools jwhois kexec-tools mc mcelog memtest86+ mmv mtr nmap ntp openssh-server pbzip2 rng-tools sysstat vconfig vlock lzop atop mosh
mkdir /root/.ssh/
echo 'ssh-dss AAAAB3NzaC1kc3MAAACBAKk1vlmmXqEEeyrIfvhIXaFy7E8mu39/nXvJ1UVqtwLPJedSszGKtaRGPMw/0D+csOP61mPCNHCzX8EUbrv5DZ0OB1PFaPWC+IAsXcentCO3Ssy0syqNCYYGpumUK1ycsACJbO4oiwyPJTHe2BkI8laXDjRLdrbryPD79h8k9Kd7AAAAFQCWJNXBY3gGZA6lcXZYmaWUqaCTHQAAAIARzJIR6q+GD2nDKA11A0uKrjBnJ2HzBLb9KOr+Psj2jMEmAVosvpw+NIpaCf7yNOTTrl1oTC9ziopKTGe0emxWz00Zrwvu9gxxAa0eBBVzUuVD6fSmNSGTq1mfQASF0Qhwx6BIjgnxjTZVBvuTgCgR2Kk3/tLXDK2rXWC1MTUQNAAAAIAlq8y8nWOJlSWsEctWvNQkb5av0VQo00co3qUAtiGlICwVUpOleaz2c5r7v99JrqUpG/v5IlLBzc/w7Wa4UX10gW8sAurYfIx7LnrsqrprA+yQVwDcTSmolIMAacLgO9C4IWquJKfJiGmve+0OWJ6s9sDK2vip5GnZGN5NxEPw1Q== normal' >> /root/.ssh/authorized_keys
chmod 700 ~/.ssh/
chmod 600 ~/.ssh/authorized_keys
restorecon ~/.ssh/authorized_keys ~/.ssh/
etckeeper init
etckeeper commit
chkconfig yum-cron on

Before people complain: yes, I know puppet and chef. At work I use them daily, but these are play machines and I don’t want to handle puppet here at home.

Posted in Enterprise Linux, Fedora, Linux, Uncategorized

Be careful when updating from RHEL6.1 to RHEL6.4

If you have a RHEL6.1 and try to update to RHEL6.4, you will get some strange errors about missing libraries.

The reason is that ldconfig will no longer run: the /etc/ld.so.conf.d/kernel-2.6.32-* files specify hwcap 0 up to RHEL6.1, while RHEL6.2 and higher specify hwcap 1.

If the settings differ you get this error message:

ldconfig: /etc/ld.so.conf.d/kernel-2.6.32-71.29.1.el6.x86_64.conf:6: hwcap index 1 already defined as nosegneg

Because ldconfig cannot run after new libraries are installed, some symlinks are left broken.
To fix: simply change all hwcap settings to 1 and rerun ldconfig.
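
If you have a lot of old kernels lying around, editing every file by hand gets boring. Here is a rough sketch of how the change could be scripted. It assumes that the offending lines look like "hwcap 0 nosegneg" (I am guessing the exact format from the error message) and the function name fix_hwcap is my own. Make a backup of the directory first (or let etckeeper do it).

```shell
# Rewrite "hwcap 0 ..." to "hwcap 1 ..." in all kernel-*.conf files.
# $1 is the directory to work on, defaulting to /etc/ld.so.conf.d
fix_hwcap() {
    dir=${1:-/etc/ld.so.conf.d}
    tmp=$(mktemp) || return 1
    for f in "$dir"/kernel-*.conf; do
        [ -f "$f" ] || continue
        # only touch lines starting with "hwcap 0", leave the rest alone;
        # write back with cat so owner and permissions stay as they are
        sed 's/^hwcap[[:space:]][[:space:]]*0[[:space:]]/hwcap 1 /' "$f" > "$tmp" &&
            cat "$tmp" > "$f"
    done
    rm -f "$tmp"
}
```

Afterwards run fix_hwcap followed by ldconfig as root.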

Posted in Enterprise Linux, Fedora, Linux

Memory management is harder than commonly known.

Just when I thought I finally understood memory management, this comes along. I see a behavior I can’t understand or explain. Let’s see if anyone can solve the puzzle.

You have a machine, let’s say with 8GB RAM. You only run a small number of processes on that machine and you want to add another service.
You run free -m and it looks like this:

             total       used       free     shared    buffers     cached
Mem:          7980       4814       3165          0          1       4593
-/+ buffers/cache:        220       7759
Swap:         1759          0       1759

And you think: hey, no problem, a lot of free memory available. 220MB “real used” memory and 7759MB “freeable memory”. Let’s bring it on.

You start your application (I use memhog from numactl), which eats 4GB of RAM, and you get an OOM.

Who knows why?

(The solution is attached as a base64 encoded block.) To get a peek, simply execute the block in a shell.

cat << EOF | openssl base64 -d
VGhlIHByb2JsZW0gbGllcyBpbiB0aGUgZmFjdCB0aGF0IHRlbXBmcyBpbiAvZGV2
L3NobSBpcyBmdWxsLgpXaHkgdGhlIGNvbnRlbnQgb2YgL2Rldi9zaG0vIGlzIGNv
bnNpZGVyZWQgY2FjaGUgYW5kIG5vdCB1c2VkIGlzIGJleW9uZCBtZS4gQnV0IGl0
IGlzIHRoaXMgd2F5LgoK
EOF

I would still love an explanation why it is that way.

Posted in Enterprise Linux, Fedora, Linux

RHEL6.4 and NX

Be careful if you update to RHEL6.4 and you are using NX. RHEL6.4 removes the keymap.dir file and this breaks NX at the moment.

To fix this, simply do a touch /usr/share/X11/xkb/keymap.dir. It looks like Red Hat will not fix this, because the file layout of xkeyboard-config is not part of the fixed ABI/API. The CentOS guys are aware of this, and I hope they will fix it instead.

More info:
Centos
FreeDesktop
RedHat BZ

Posted in Enterprise Linux, Fedora, Linux

Todo after a Fedora upgrade

Even with the new fedup tool for upgrading, there are a couple of commands I always use after an upgrade. I suggest you use them too:

  1. yum distro-sync
  2. package-cleanup --problems
  3. package-cleanup --orphans
  4. rpmconf -a -fvimdiff
  5. rpmorphan

This helps to keep your Fedora installation clean.

 

Posted in Fedora, Linux, Uncategorized

python virtualenv

After joining my local LUG for the Python workshop, and because I am using more Python now, here are some hints and tools that I’m using a lot lately.

virtualenvwrapper

Install it with pip install virtualenvwrapper or yum install python-virtualenvwrapper.noarch.

Create a new environment with: mkvirtualenv ENV

Jump into the environment: workon ENV

Jump out of the environment: deactivate

Posted in python, Uncategorized

ArgoEclipse is dead and broken

If you are using ArgoUML and thought “Hey, it’s written with Eclipse, let’s try using it inside Eclipse”, I’m pretty sure your search found ArgoEclipse.

Please be aware that this project looks very dead. I tried to open a perfectly good zargo file with it and it told me the file is broken. The version is from 2010. Please use the standalone version of ArgoUML!

Posted in Fedora, Linux, Uncategorized

Two enterprise Linux boxes are talking to each other, or not

I set up a Red Hat Enterprise box that should serve as an httpd server for a SUSE Enterprise Server zypper repository, and it doesn’t work.

A simple curl doesn’t work either. With curl -1 or curl -3 it works like a charm, but with curl without any option I get the following error message:

Error code: Unrecognized error
error message: error: 14077458:SSL routines:SSL23_GET_SERVER_HELLO:reason(1112)

After some wiresharking I found the solution. You have to set a ServerName in the SSL VirtualHost. Yes, I know the hostname/CommonName is already in the SSL certificate you are using, but you have to repeat it in the config file, otherwise some older versions of openssl (like the one from SLES11) will run into this kind of problem.
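
To find the VirtualHost blocks that still lack a ServerName, something like the following could help. It is only a sketch: it assumes one directive per line, as in the stock config files, and does not follow Include lines. The function name vhosts_without_servername is made up for this example.

```shell
# Print the opening line of every <VirtualHost> block that contains
# no ServerName directive. Reads an Apache config file on stdin.
vhosts_without_servername() {
    awk '
        tolower($1) == "<virtualhost"   { inside = 1; header = $0; found = 0; next }
        inside && tolower($1) == "servername" { found = 1 }
        tolower($1) == "</virtualhost>" { if (inside && !found) print header; inside = 0 }
    '
}
```

On a RHEL box with mod_ssl you would typically run it as vhosts_without_servername < /etc/httpd/conf.d/ssl.conf.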

Posted in Enterprise Linux, Linux, Uncategorized