Tag Archives: cern

What do I do?

Hard to say.. but if you watch this pretty cool 6-minute movie: http://www.phdcomics.com/comics.php?f=1430

you’ll see that they mention a collider.
This thing sends out lots of data from each collision.

I administer a system that gets a little bit (very little) of that data so that the scientists can look at it and maybe find out what’s going on :)

HEPIX Spring 2011 – Day 5

You can tell what day it is by all the suitcases around the room.

Version Control

An overview of the version control used at CERN. Quite cool: they’re not using Git yet, but they are moving away from CVS (which is not updated anymore) to SVN (Subversion). Apparently hard to migrate.

They use DNS load balancing

  • Browse code / logging, revisions, branches: WEBSVN – on the fly tar creation.
  • TRAC – web SVN browsing tool plus: ticketing system, wiki, plug-ins.
  • SVNPlot – generates SVN stats. No need to check out the source code (svnstats, by contrast, does a ‘co’).
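In the same spirit as SVNPlot, commit statistics can be pulled straight from `svn log --xml` output without any working copy. A minimal sketch in Python – the sample log below is fabricated for illustration, not from a real repository:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Sample output in the shape of `svn log --xml <repo-url>` (made-up revisions).
SAMPLE_LOG = """<?xml version="1.0"?>
<log>
  <logentry revision="2"><author>alice</author><msg>fix build</msg></logentry>
  <logentry revision="1"><author>bob</author><msg>initial import</msg></logentry>
</log>"""

def commits_per_author(log_xml):
    """Count commits per author from `svn log --xml` output."""
    root = ET.fromstring(log_xml)
    return Counter(entry.findtext("author") for entry in root.iter("logentry"))

print(commits_per_author(SAMPLE_LOG))
```

From there, plotting per-author or per-week counts is what tools like SVNPlot automate.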

Mercurial was also suggested alongside Git (which was founded by Linus Torvalds).

Cern – VM – FS

CernVM-FS (CVMFS) looked very promising. It is not intended at the moment for images but more for sending applications around. It uses a Squid proxy server and looked really excellent. It gives you a mount point like /cvmfs/ and under there you have the software.


Requirements needed to set it up:

  • RPMs: cvmfs, -init-scripts, -keys, -auto-setup (for tier-3 sites; does some system configs), fuse, fuse-libs, autofs
  • Squid cache – you need at least one, ideally two or more for resilience, configured (at least) to accept traffic from your site to one or more CVMFS repository servers. You could reuse existing frontier-squids.
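The squid side of that setup could look roughly like the fragment below – the ACL names, the network range, and the repository hostname are placeholders for illustration, not actual site values:

```text
# squid.conf fragment – names and networks are made-up placeholders
acl site_lan src 10.0.0.0/8
acl cvmfs_repos dstdomain cvmfs-stratum-one.example.org
http_access allow site_lan cvmfs_repos
http_access deny all
cache_mem 128 MB
maximum_object_size 1024 MB
```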


National Grid Service Cloud

A British cloud.

Good for teaching with a VM – if a machine is messed up it can be reinstalled.

Scalability – ‘cloudbursting’ – users make use of their local systems/clusters until those are full, and then, if they need to, they can do extra work in the cloud. Scalability/cloudbursting is the key feature that users are looking for.
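The cloudbursting idea boils down to: fill the local slots first, overflow to the cloud. A toy sketch, where the slot count and job names are made up for illustration:

```python
# Toy "cloudbursting" scheduler: run jobs locally while slots are free,
# burst the overflow to the cloud.

def schedule(jobs, local_slots):
    """Return (jobs run locally, jobs burst to the cloud)."""
    local, cloud = [], []
    free = local_slots
    for job in jobs:
        if free > 0:
            local.append(job)
            free -= 1
        else:
            cloud.append(job)
    return local, cloud

local, cloud = schedule(["j1", "j2", "j3", "j4", "j5"], local_slots=3)
print(local)  # ['j1', 'j2', 'j3']
print(cloud)  # ['j4', 'j5']
```

Real setups make the burst decision on queue depth or wait time rather than a fixed slot count, but the shape is the same.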

Easy way to test an application on a number of operating systems/platforms.

Two use cases were not suitable – one of them being intensive workloads with a lot of number crunching.

Good: you don’t have to worry about physical assembly or housing. The servers, networking etc. still have to be installed, but usually this is done by somebody else. Images are key to making this easier.

Bad: Eucalyptus stability – not so good. Bottlenecks: networking is important, and more is required of the whole physical server when it’s running VMs.

To put a 5GB VM on a machine you would need 10GB: 5 for the image and 5 for the actual running machine.
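That disk estimate is just the image size times one stored copy plus one per running instance – a trivial helper to make the arithmetic explicit (this is the rule of thumb from the talk, not a general formula for every cloud stack):

```python
def disk_needed_gb(image_gb, running_copies=1):
    """Disk needed: the stored image plus one full copy per running instance."""
    return image_gb * (1 + running_copies)

print(disk_needed_gb(5))  # 10: 5GB image + 5GB running copy
```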
Some were intending to develop images locally on this cloud and then move them on to Amazon.

Previous Days:
Day 4
Day 3
Day 2
Day 1

HEPIX Spring 2011 – Day 3

Day 3 woop!

An evaluation of Gluster: it uses distributed metadata, so there’s no bottleneck from a metadata server, and it can or will do some replication/snapshotting.

Virtualization of mass storage (tapes), using IBM’s TSM (Tivoli Storage Manager) and ERMM, where ERMM manages the libraries so that TSM only sees the link to ERMM. No need to set up specific paths from each agent to each tape drive in each library.
They were also using Oracle/Sun’s T10000C tape drives, which go all the way up to 5TB – quite far ahead of the LTO consortium’s LTO-5, which only goes to 1.5/3TB per tape. There was also some talk about buffered tape marks, which speed up tape operations significantly.

Lustre success story at GSI. They have 105 servers that provide 1.2PB of storage, and the maximum throughput seen is 160Gb/s. Some problems with the Adaptec 5401 controller: it takes longer to boot than the entire Linux system, and it’s not very nice to administer – the controller complains about high temperatures and about missing fans in non-existent enclosures. The advice was to filter out the e-mails with level “ERROR” and look at the ones with “WARNING” instead.
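That mail-filtering advice could be scripted as a simple subject filter – the subject strings below are invented examples, not real Adaptec messages:

```python
# Keep controller mails at WARNING level, drop the noisy ERROR ones,
# as suggested for the Adaptec 5401.

def keep_mail(subject):
    """True for subjects we want to read (WARNING), False for ERROR noise."""
    return "WARNING" in subject and "ERROR" not in subject

mails = [
    "ERROR: enclosure fan missing",       # bogus complaint, filter out
    "WARNING: drive temperature high",    # worth a look
]
print([m for m in mails if keep_mail(m)])  # ['WARNING: drive temperature high']
```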

Benchmarking storage with trace/replay: using strace (shipped by default with most Unixes) to record some operations and then ioreplay to replay them. Proven to give very similar workloads – especially great when you have special applications.
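The record side might be as simple as `strace -ttt -o trace.log <app>`; replaying the trace is then ioreplay’s job. A small sketch of summarizing such a trace in Python – the trace lines below are fabricated in the shape of strace output:

```python
import re
from collections import Counter

# Lines in the shape of `strace -ttt` output (timestamps and fds invented).
TRACE = """\
1304661600.000001 open("/data/run42.dat", O_RDONLY) = 3
1304661600.000123 read(3, "..."..., 4096) = 4096
1304661600.000456 read(3, "..."..., 4096) = 4096
1304661600.000789 close(3) = 0
"""

SYSCALL_RE = re.compile(r"^\d+\.\d+ (\w+)\(")

def syscall_histogram(trace):
    """Count syscalls per name from a trace recorded with `strace -ttt`."""
    return Counter(m.group(1) for line in trace.splitlines()
                   if (m := SYSCALL_RE.match(line)))

print(syscall_histogram(TRACE))  # read twice, open and close once each
```

A histogram like this is a quick sanity check that a replay really resembles the recorded workload.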

IPv6 – we are running out of IPv4 addresses, so when (and will) there be sites that are IPv6? Maybe if a new one comes up? What to do? Maybe collect/share IPv4 addresses?

Presentations about the evolution needed at two data centers to accommodate requirements for more resources/computing power.

Implementing ITIL with Service-Now (SNOW) at CERN.

Scientific Linux presentation. A live CD can be found here: www.livecd.ethz.ch. They might port NFS 4.1, which comes with Linux kernel 2.6.38, to work with SL5. There aren’t many differences between RHEL and SL, but SL ships a tool called Revisor, which can be used to create your own Linux distributions/CDs quite easily.


Errata is their term for security fixes.

Dinner later today!


Next Days:
Day 5
Day 4

Previous Days:
Day 2
Day 1

HEPIX Spring 2011 – Day 1

Got in last night at around 2140 local time.
I should’ve done a little more exact research on how to find my hotel – I had to walk some 30 minutes (parts of it the wrong way) to get to it. But at least I made it in time to see some ice hockey.. too bad Detroit lost.

Today’s another day though!

First stop: breakfast.

Wow. What a day, and it’s not over yet! So much cool stuff talked about.

Site Reports

The first half of the day was site reports from various places.

GSI here in Darmstadt (which is where some of the heaviest elements have been discovered). They have started an initiative to keep Lustre alive, as apparently Oracle is only going to develop it for their own services and hardware. They are running some SuperMicro (SM) servers that have InfiniBand on board – not like the HP ones I’ve seen, which have the Mellanox card as an additional mezzanine card. Nice. They were also running some really cool water-cooling racks that use the pressure in some way to push the hot air out of the racks. They found that their SM file servers had much stronger fans at the back and non-optimized airflow inside, so they had to tape over some (holes?) over the PCI slots on the back of the servers to make it work properly for them. They were also running the servers at around 30°C – altogether they got a PUE of around 1.1, which is quite impressive.

Other reports: Fermilab (lots of storage – their Enstore has, for example, 26PB of data on tape), KIT, Nikhef (moved to ManageEngine for patch and OS deployment, and Brocade for IP routers), CERN (lots of hard drives had to be replaced.. around 7000.. what vendor? HP, Dell, SM?), DESY (replaced Cisco routers with Juniper for better performance), RAL (problems with LSI controllers, replaced with Adaptec), SLAC (FUDForum for communication).


Rest of the day was about:


Some talk about messaging – for signing and encrypting messages. Could be used for sending commands to servers but also for other things. I’ve seen ActiveMQ in EyeOS, and it’s used elsewhere as well. Sounds quite nice, but apparently not many use it; instead they use SSH scripts to run things like that.
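For the signing part, here is a minimal sketch of what a signed command message could look like before it goes onto a broker queue, using Python’s standard hmac module – the key, field names, and command are made up for illustration:

```python
import hashlib
import hmac
import json

# Shared secret between producer and consumer (placeholder value).
KEY = b"shared-secret"

def sign(command):
    """Serialize a command dict and attach an HMAC-SHA256 over the body."""
    body = json.dumps(command, sort_keys=True).encode()
    mac = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return {"body": body.decode(), "mac": mac}

def verify(message):
    """Recompute the HMAC over the body and compare in constant time."""
    expected = hmac.new(KEY, message["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["mac"])

msg = sign({"host": "node01", "cmd": "restart-service"})
print(verify(msg))  # True; any tampering with body or mac makes this False
```

Signing gives integrity and authenticity; for confidentiality you would encrypt the body as well, which is where the messaging talk was heading.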


About various threats that have been in the news lately, plus a presentation of some rootkits and a nice demo of a TTY hack. Basically, the last one consists of a client/Linux computer that has been hacked; from this computer a person with access to a server SSHes there, and then the TTY hack kicks in and gives the hacker access to the remote host. Not easy to defend against.

There was also a lengthier (the longest of the day) 1–1.5h presentation from a French site that went through how they went about replacing their home-grown batch management system with SGE (now Oracle Grid Engine).

*** Updated the post with links to some of the things. Maybe the TTY hack has another name that’s more public.

Next Days:

Day 5
Day 4
Day 3
Day 2

HEPIX Spring 2011

I’m heading to Hepix this whole week!

Looks like there’s some really interesting topics like:

Lustre, Gluster, IPv6, stuff about the CERN IT facilities, a Scientific Linux report, cloud/grid virtualization, Oracle Linux.

I’ll sure be doing a bit of blogging about what’s going down.