Tag Archive - SAN

Cutting-edge of Cloud Computing

Just got off the bus in Montreal, Québec. This is a lightning visit: in 48 hours, I’ll be back in my office in Ottawa. But right now, I’m having a drink in one of my favorite downtown coffee shops, and I’m planning.

The next few hours will see little sleep and lots of action. More precisely, I’ll be deploying lots of hardware (2 IBM SANs, 2 core servers, 2 switches, 2 APCs, 5 branch servers supporting up to 20 ‘leaf’/virtual servers), and then some (3 pairs of systems in high redundancy, with wackamole IP ‘fencing’ and shared storage through DRBD). All of that will go in ‘my’ new 48U cage @Hypertec (old Nortel building) to act as a demo for some clients.

Once that’s completed, the true fun starts: a very big part of this infrastructure is going to be self-healing, failure-resistant and high-performance. We are speaking of:

  • automatic & dynamic launch of new ‘branch’ systems (xen dom0), without having to do anything more than to rack them (no OS install needed, can be upgraded by rebooting them),
  • high redundancy at the leaf level (xen domU, automatic migration toward less used dom0),
  • failure resistance through bonded interfaces, multi-path & multi-host Fibre Channel SAN & controllers…

This is going to be solid, scalable, fast: the holy grail for a lot of service providers that are aiming at automating their ‘hosting’ business. The result of a lot of planning and testing; the cutting edge of cloud computing.

    new projects

    There we go. Just got a proposal accepted by one of my Montreal-based clients for a new joint venture in the field of cloud computing. Estimated time before full disclosure of the project is 2 weeks from now. Might not be really cute at first, but it’s going to be very useful. Hardware is pseudo-ready (not yet in rack), but we are speaking of nice stuff.

    And I’m finishing the draft for another proposal. This one, however, would be a lone venture from Les Laboratoires Phoenix for a specialized (yet widely used) service that isn’t readily available at a normal cost. We are speaking of about 100x less, in recurring cost, than what’s currently available. Also a 2-week ETA for this one.

    Might even have found an employee. Things are really moving fast.

    deprecation of md-multipath

    Following a discussion with a potential client about building a ‘truly redundant system’, I thought about warning solution developers of this thread on LKML:

    There are talks of deprecating md-multipath (from the Linux kernel, for those that weren’t really following). Quite a few systems would be moving from md-devices to “something else”.

    The new flavor of the month (or year, depending on your P.O.V.) is dm-multipath. The configuration file is straightforward and the Red Hat documentation is very decent.

    OK, I know: the names are confusing. MD stands for “multiple devices” and is also known as the Linux software RAID solution. DM is the acronym of “Device Mapper” and is better known as the prerequisite for LVM2 (not LVM1, but then who still uses that!?) or as the foundation of dm-crypt, a free software interface allowing block-level encryption through the Linux (v2.6+) kernel cryptoapi framework.

    On the topic of names: this is where I insert this familiar rant. I’ve never quite understood the MD as ‘multiple device’ name. Linux kernel device names, other than the ones directly linked to hardware, are normally named after their function, not a ‘source’. In this instance, MD as ‘meta-device’ would make perfect sense: a device about devices. Anyway… it wasn’t named like that…

    Anyway, no labs planned for either of those two techs (yet) or for the migration from md to dm. You never know; I’ll keep you guys informed.

    mass-storage.org

    In the last couple of days, I’ve been doing a lot of experiments on mass-storage systems. I do not want to saturate this blog with high-end labs when most of my friends and family don’t clearly see the difference between a SAN and a NAS. On the other hand, I still want to publish my research process. “Research” might seem a bit presumptuous in light of what I’ve published so far, but this is really just a side effect of this dichotomy.

    www.mass-storage.org is my answer to this dilemma. As one of my pet projects, it is an oasis (OK: a small wiki) where I (and any like-minded researcher) can publish information related to mass storage. I’ve already published 2 articles about the recent storage labs I’ve concluded (DRBD, OCFSv2, AoE), and more is under way (about labs that are currently in progress [Lustre, AoE, DRBD optimization])…

    I should start posting more insight into my own life here (hey, it was always noted as MY private little place), and move the storage-related (and more "permanent") info to m-s.org.

    If you have any comments, as always, feel free to post.

    Pascal Charest, directly from Camellia Sinensis on an IleSansfil connection.

    AoE + OCFSv2 (storage fun, part 3)

    NOTE: Now on www.mass-storage.org

    I have a running {DRBD 8.2.4 (P/P) + OCFSv2} 2-node cluster. More info here.

    Kinda nice for small workloads (think load-balanced web servers, file servers, SQL servers (careful: Oracle is OK, MySQL needs specific configuration for external locking)) but a bit on the limited side as far as scalability goes.

    Removing the storage aspect from application servers is the way to go. This is what SANs are for. Let’s modify my two-node cluster (ruby and crystal) to allow dynamic growth in terms of application and storage nodes.

    For this test, I’ll be bringing in a third and fourth system: "jade" & "glouton", two Debian-based file servers.

    The setup will be as follows:

    (jade & glouton): SAN targets, exporting devices through AoE
    (ruby & crystal): SAN initiators + application servers

    Lexical info: an initiator is a SAN client, whereas targets are servers.
      
    Exporting through AoE

    (glouton&jade)# apt-get install aoetools vblade
    (glouton)# vblade 0 1 eth0 /dev/sdb1
    (jade)# vblade 1 1 eth0 /dev/sdb1

    Note 1: My current setup makes me use the above configuration. In a true production environment, dual NICs would be preferred (using the Linux bonding module) & the exported device would be an MD array. There is also a lot of fine-tuning that can be done along the way (jumbo frames, multipath algo, scheduling algo, kernel hacking…)

    Note 2: I would advise against going with an integrated list of MAC addresses in the vblade export command. The option is present, but the list is then static. Using ebtables seems to be a valid alternative since it can be dynamically modified.
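
As a rough sketch of the ebtables idea in Note 2 (not something tested in this lab): the function below only prints the rules it would add, so they can be reviewed before being run as root on a target. AoE frames use EtherType 0x88A2. The MAC addresses are placeholders, and the sketch assumes the export interface is attached to a Linux bridge, since ebtables only sees frames that go through the bridge code.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) ebtables rules that restrict
# AoE frames (EtherType 0x88A2) to a list of initiator MAC addresses.
# Assumption: the export interface is a port of a Linux bridge.
# The MAC addresses used below are placeholders, not the real ones.

aoe_acl_rules() {
    # Accept AoE frames from each MAC passed as an argument...
    for mac in "$@"; do
        echo "ebtables -A INPUT -p 0x88A2 -s $mac -j ACCEPT"
    done
    # ...and drop AoE frames coming from anyone else.
    echo "ebtables -A INPUT -p 0x88A2 -j DROP"
}

# Hypothetical MACs standing in for ruby and crystal.
aoe_acl_rules 00:16:3e:00:00:01 00:16:3e:00:00:02
```

Allowing one more initiator later should then be a single `ebtables -I INPUT -p 0x88A2 -s <mac> -j ACCEPT` (inserted ahead of the DROP rule), with no need to restart vblade.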

    Importing through AoE

    (ruby&crystal)# apt-get install aoetools
    (ruby&crystal)# modprobe aoe

    If the devices are already exported (from jade & glouton), they will be automatically available in /dev/etherd; if they are not, run "aoe-discover".
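
A small sketch of how the names line up: the shelf and slot numbers given to vblade on the target show up on the initiator as /dev/etherd/e<shelf>.<slot>. The helper names below (`aoe_dev_path`, `aoe_check`) are made up for illustration; `aoe-discover` and `aoe-stat` come from the aoetools package.

```shell
#!/bin/sh
# Sketch: map a vblade shelf/slot pair to the device node that the aoe
# driver creates on the initiator (/dev/etherd/e<shelf>.<slot>).

aoe_dev_path() {
    # $1 = shelf, $2 = slot, as passed to vblade on the target
    echo "/dev/etherd/e$1.$2"
}

# Report whether the expected device is visible yet; hint at a rescan if not.
aoe_check() {
    dev="$(aoe_dev_path "$1" "$2")"
    if [ -b "$dev" ]; then
        echo "$dev present"
    else
        echo "$dev missing (try aoe-discover, then aoe-stat)"
    fi
}

aoe_check 0 1   # glouton's export (vblade 0 1 ...)
aoe_check 1 1   # jade's export (vblade 1 1 ...)
```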

    Creating MD device for redundancy.

    (ruby&crystal)# apt-get install mdadm
    (ruby)# mdadm --create /dev/md0 -l1 -n2 /dev/etherd/e0.1 /dev/etherd/e1.1
    (crystal)# mdadm --assemble /dev/md0 /dev/etherd/e0.1 /dev/etherd/e1.1

    So at this point, there are two md RAID devices which use the same resources. They aren’t mounted yet. Using OCFSv2 will allow us to control concurrent access.

    Still using the same /etc/ocfs2/cluster.conf file (see previous post), we format the RAID device as OCFS2 (note: I now use a label, which simplifies the creation of identical configuration files):

    (ruby)# mkfs.ocfs2 -L "san" /dev/md0 
    (ruby & crystal)# mount -t ocfs2 -L "san" /storage

    There we go, once again, a shared storage between ruby & crystal.
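
The cluster.conf itself lives in the previous post and is not reproduced here. For readers landing on this page first, here is a minimal sketch that prints a two-node file in the usual O2CB layout (stanza header in column one, parameters indented with a tab). The IP addresses, the cluster name "ocfs2" and port 7777 are assumptions for illustration, not values from my setup; node names have to match each machine's hostname.

```shell
#!/bin/sh
# Sketch: emit an O2CB-style cluster.conf on stdout.
# Cluster name, port and IP addresses are placeholders (assumptions).

CLUSTER=ocfs2

node_stanza() {
    # $1 = node number, $2 = hostname, $3 = IP address
    printf 'node:\n'
    printf '\tip_port = 7777\n'
    printf '\tip_address = %s\n' "$3"
    printf '\tnumber = %s\n' "$1"
    printf '\tname = %s\n' "$2"
    printf '\tcluster = %s\n\n' "$CLUSTER"
}

cluster_stanza() {
    # $1 = total number of nodes in the cluster
    printf 'cluster:\n'
    printf '\tnode_count = %s\n' "$1"
    printf '\tname = %s\n' "$CLUSTER"
}

node_stanza 0 ruby    192.168.0.10
node_stanza 1 crystal 192.168.0.11
cluster_stanza 2
```

Adding an application node later would mean one more `node_stanza` line and a higher `node_count`, with the same file copied to every node.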

    Note 01 : Such a configuration can easily saturate your network. Do not even try if your max speed is 100Mb/s. This would give awful perfs (trust me!). Go for gigabit or even InfiniBand if you can afford it.

    Note 02 : There are a lot of alternative options; you might want to check the md module documentation, under multipath. I know I will ;-)

    But how exactly is this system scalable ?

    Application node : If a system is built with aoetools, md device support and ocfs2 installed, it can be hot-added to the network. No restart of any running system is needed. However, it is still a very good idea to update each cluster.conf file.

    Storage node : A system with devices exported through AoE can be hot-added up to a certain point, depending on the underlying RAID type (md device), but I would advise against it. In any case, you need to take OCFS2 offline to issue a resize command.

    Filesystem size : Currently, due to 32-bit addressing, there seems to be a limit of 16TB per file system. A good reminder, though, is that an AoE target can export more than one device…
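
Where the 16TB figure plausibly comes from, as quick arithmetic: with 32-bit block numbers there are 2^32 addressable blocks, and at an assumed 4 KiB per block that gives 2^44 bytes, i.e. 16 TiB. The block size is my assumption here; the sketch needs a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Sketch: file system size ceiling when block numbers are 32 bits wide.
# Assumption: 4 KiB blocks. Requires 64-bit shell arithmetic.

max_fs_tib() {
    # $1 = block size in bytes; result is in TiB (2^40 bytes)
    echo $(( (1 << 32) * $1 / (1 << 40) ))
}

max_fs_tib 4096   # prints 16
```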
