Gregory Boehnlein
2008-03-08 17:08:45 UTC
Hello,
I just wanted to report back on my success using Round Robin
Multipath Load Balancing with VMware ESX 3.5. My production SAN is running the
Wasabi Systems iSCSI Target, while my LAB setup will be running IET.
I have not yet tested this on IET, so I'm not sure how well it will work,
but I'm very happy with the results on the Wasabi target. I do know that I was
able to get really decent performance out of IET using an 802.3ad LAG and the
Linux bonding driver. The multipath trick will have to wait until I get a
new box for testing.
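For anyone curious, that LAG setup is just the stock Linux bonding driver. A
minimal sketch of that kind of config on CentOS 5 (the interface names and IP
address below are placeholders, and the switch ports have to be configured for
LACP as well):
/etc/modprobe.conf:
alias bond0 bonding
options bond0 mode=802.3ad miimon=100
/etc/sysconfig/network-scripts/ifcfg-bond0:
DEVICE=bond0
IPADDR=192.168.10.10
NETMASK=255.255.255.0
BOOTPROTO=none
ONBOOT=yes
/etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1):
DEVICE=eth0
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes
Then "service network restart" brings the bond up.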
My backend SAN is based on the following hardware:
Intel S5000PSLSATA Dual 771 Intel 5000P SSI EEB 3.6 (Extended ATX) Server
Motherboard
(NewEgg Link:
http://www.newegg.com/product/product.aspx?Item=N82E16813121038)
(This is the same motherboard in the Intel SSRC storage servers)
3ware 9550SXU-16ML Array Controller
(NewEgg Link:
http://www.newegg.com/Product/Product.aspx?Item=N82E16816116059&Tpk=N82E16816116059)
16 x Seagate Barracuda ES.2 ST31000340NS 1TB 7200 RPM 32MB Cache SATA
3.0Gb/s Hard Drives
(NewEgg Link:
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148278&Tpk=ST31000340NS)
I have used both IET on CentOS 5 (latest SVN release) and the Wasabi Systems
target on this hardware mix without issues.
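If anyone wants to mirror the LUN layout on IET, each export is just a stanza
in /etc/ietd.conf. A minimal, hypothetical example (the IQN, backing device and
CHAP credentials here are placeholders, not my actual config):
Target iqn.2008-03.com.example:raid10.lun0
    # Optional CHAP login for the initiator
    IncomingUser iscsiuser secretpass
    # Export the block device directly via blockio
    Lun 0 Path=/dev/sdb1,Type=blockio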
For now, I am using the two onboard NICs, and have plans to add a pair of
Intel PT1000 dual-port cards for 6 NICs total in the future. Each NIC is
plugged into a dedicated Gig-E switch that only carries iSCSI traffic for
the SAN.
After pretty exhaustive testing of different RAID stripe sizes, I settled on
RAID-10 with 64k stripes (the default for the 3ware). This gives me roughly
7.21 TB of usable storage (half of the 16 TB of raw disk goes to mirroring,
and the remaining 8 TB comes out to roughly 7.2 TB once the controller reports
it in binary units).
I've broken it up into 6 LUNs of roughly 1.2 TB each.
On the Wasabi target, I export all of the LUNs through two different target IP
addresses, i.e. every LUN is accessible through BOTH targets. These are
called "NODES" in Wasabi terminology.
10.1.2.254 - Target 1 - NIC 1
10.1.3.254 - Target 2 - NIC 2
My ESX servers have been configured as follows:
vSwitch 1 - Service Console (iSCSI auth) 10.1.2.1
          - VMkernel interface (initiator) 10.1.2.2
vSwitch 2 - Service Console (iSCSI auth) 10.1.3.1
          - VMkernel interface (initiator) 10.1.3.2
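For anyone recreating this from the service console, the vSwitch plumbing can
be done with something along these lines (the vmnic numbers and port group
names below are placeholders for my particular hosts):
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic1 vSwitch1
esxcfg-vswitch -A "SC-iSCSI-1" vSwitch1
esxcfg-vswif -a vswif1 -p "SC-iSCSI-1" -i 10.1.2.1 -n 255.255.255.0
esxcfg-vswitch -A "VMkernel-iSCSI-1" vSwitch1
esxcfg-vmknic -a -i 10.1.2.2 -n 255.255.255.0 "VMkernel-iSCSI-1"
(and the same again on vSwitch2 with the second vmnic and the 10.1.3.x
addresses)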
I have defined two targets for the software iSCSI initiator: 10.1.2.254 /
10.1.3.254
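(The discovery addresses can also be added from the service console; assuming
the software initiator shows up as vmhba32 like it does on my hosts, it is
roughly:)
esxcfg-swiscsi -e                        # enable the software iSCSI initiator
vmkiscsi-tool -D -a 10.1.2.254 vmhba32   # add the first send target
vmkiscsi-tool -D -a 10.1.3.254 vmhba32   # add the second send target
esxcfg-swiscsi -s                        # rescan the software initiator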
When I scan the targets, I end up with 10 paths to the 6 LUNs. Running
esxcfg-mpath -l shows that the LUNs are set up in a Fixed path failover
configuration, where the secondary path is only used if the first path
fails.
Disk vmhba32:0:0 /dev/sdb (1228800MB) has 2 paths and policy of Fixed
 iScsi sw iqn.1998-01.com.vmware:esxhost3-6fd23d7e<->iqn.2000-05.com.wasabisystems.storagebuilder:iscsi-0 vmhba32:0:0 On preferred
 iScsi sw iqn.1998-01.com.vmware:esxhost3-6fd23d7e<->iqn.2000-05.com.wasabisystems.storagebuilder:iscsi-0 vmhba32:1:0 On active
After lots of testing, I decided to use the following settings on my LUN:
esxcfg-mpath --lun=vmhba32:0:0 -p rr
esxcfg-mpath --lun=vmhba32:0:0 -H any -B 64 -C 64 -T any
Basically, this puts that LUN into Round Robin mode, switching paths after
every 64 blocks / 64 commands.
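Since all 6 LUNs want the same treatment, it is easy to loop over them. A
quick sketch, assuming (hypothetically) that they enumerate as vmhba32:0:0
through vmhba32:0:5; check esxcfg-mpath -l for the real IDs on your host:
for LUN in 0 1 2 3 4 5; do
    esxcfg-mpath --lun=vmhba32:0:${LUN} -p rr
    esxcfg-mpath --lun=vmhba32:0:${LUN} -H any -B 64 -C 64 -T any
done
# Confirm the policy has changed from Fixed to round robin
esxcfg-mpath -l | grep -i policy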
Doing so, I am able to get sustained read/write speeds of nearly 150 MB /
second utilizing both paths. The tests were performed using a CentOS 5
virtual machine running "dd if=/dev/zero of=/dev/sdb bs=1024k count=1024". I
ran them for over 48 hours on a single VM.
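(To keep a test like that running for 48 hours, the dd can simply be looped;
a trivial sketch with the same parameters:)
while true; do
    date
    # GNU dd prints its throughput summary on stderr; keep just that line
    dd if=/dev/zero of=/dev/sdb bs=1024k count=1024 2>&1 | tail -1
done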
When I run 8 VMs hammering the storage array concurrently, I am able to see
sustained peak read/write speeds of nearly 180 MB / second.
I spent the better part of the afternoon working on a conference call with
Wasabi and VMware engineers and did identify one gotcha in my setup. I have
a SuperMicro KVM-over-IP card that relies on USB support for its HID
keyboard/mouse emulation. As a result, interrupts for the different NICs in my
VMware box were not being properly distributed across all CPUs. By disabling
USB (and losing the keyboard for the KVM... bummer) I was able to get the IRQs
distributed across the CPUs in the box, which took my speed from 95 MB /
second to 150 MB / second.
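(For anyone wanting to check for the same problem: on a plain Linux box such
as an IET target, the interrupt spread is visible in /proc/interrupts, and as
far as I recall ESX 3.x exposes the vmkernel's view in /proc/vmware/interrupts
on the service console:)
cat /proc/interrupts          # plain Linux, e.g. the IET box
cat /proc/vmware/interrupts   # ESX 3.x service console (vmkernel view)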
I hope this is helpful if anyone else is looking to multipath their storage,
and I'd be very interested in seeing how this works under IET. I don't
expect there would be any issues with it, but I'd love for someone to try it
and report back before I build my LAB SAN.