Minutes of the IT strategy steering committee, 28/04/11

Present: Martin Hardcastle (chair: MJH), John Atkinson (systems manager: JA), Tim Gledhill (staff rep: TMG), Mark Thompson (staff rep: MAT), Mark Galloway (student rep: MG), John Barnes (PD/fellow rep: JB)

0. Introduction

The meeting opened at 11:10.

1. Minutes of previous meeting

The minutes had previously been circulated and were approved.

There were no actions from the previous minutes.

2. System status report

JA gave a report on central IT facilities.

He reported that over the Christmas period, car-server and star had both been upgraded to Fedora 13, primarily due to the availability of security updates. Initial reorganisation of the server room had begun, allowing for cluster expansion (see below). Over the Easter break, the CAR main file server and the storage had been relocated to the "CAIR rack". This allows for modest expansion of the storage and separates the CAR system from the cluster. Storage and servers are now in a cooler part of the room.

The client desktop freezing problem, discussed in the previous minutes, had been investigated by JA and MJH. JA reported their conclusion that the problem was not directly related to the fibre channel (FC) storage, as they had earlier thought. Basic filesystem tests (bonnie++) show that the FC storage is out-performing the internal disks, as expected. JA and MJH now think that, because the /home filesystem was on the same controller as the /data filesystem, heavy usage of /data was affecting access to /home. Similar behaviour has been noticed on the cluster (which has the same set-up), but is less of a problem there because users do not have graphical desktops reliant on the corresponding /home filesystem. Consequently, /home has been moved to a hardware RAID1 of two internal 2TB SAS discs, and users have been notified of "best practice" use of /local discs for CPU-intensive work. Since these two actions were taken there have been no recurrences of the freezing. The committee discussed the importance of making sure that users, in particular new students, understand that doing extensive I/O over the network is a bad idea.
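
As an illustration of the "best practice" referred to above, the following is a minimal Python sketch of one way a job might stage its data onto a local disc, do its heavy I/O there, and copy only the final result back to network storage. It is a sketch under stated assumptions, not the text of the notice sent to users: the function name run_on_local_scratch, the work callable and the use of /local as the scratch root are all hypothetical choices made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_local_scratch(input_path, output_dir, work, scratch_root="/local"):
    """Stage input onto a local disc, run work() there, copy the result back.

    Assumptions (not from the minutes): scratch_root is a writable local
    directory, and work is a callable taking (local_input, local_output)
    that writes its result to local_output.
    """
    input_path = Path(input_path)
    output_dir = Path(output_dir)

    # Private scratch directory on the local disc; deleted on exit.
    with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
        in_dir = Path(scratch) / "in"
        out_dir = Path(scratch) / "out"
        in_dir.mkdir()
        out_dir.mkdir()

        # One sequential read from network storage.
        local_input = in_dir / input_path.name
        shutil.copy2(input_path, local_input)

        # All intermediate I/O happens on the local disc.
        local_output = out_dir / "result.out"
        work(local_input, local_output)

        # One sequential write back to network storage.
        output_dir.mkdir(parents=True, exist_ok=True)
        final = output_dir / local_output.name
        shutil.copy2(local_output, final)

    return final
```

The intent is that the network filesystem sees one read and one write per job, rather than the many small reads and writes that appear to have been competing with access to /home.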

JA reported that the air conditioning capacity in the server room had been vastly improved. Installation of the two new 12.5 kW units was now complete and we have an effective cooling capacity of ~37 kW. There had recently been a leak from one of the older units (failed pump and blocked drain). No damage had been caused, but JA had opened discussions with Estates on how to prevent a recurrence. The committee discussed ways in which the racks could be moved or protected from water leaks from ceiling-mounted A/C units.

JA reported that Fedora 14 has been installed on ~50% of user desktops (mainly students' PCs). Upgrades to Fedora 14 are continuing.

3. Cluster report

MJH reported on the status of the cluster. As trailed in the previous minutes, there had been a number of changes to the cluster hardware and software since the last meeting. Key points included:

MJH reported that, up to the point of the major upgrade in early April, over 2 million CPU-hours had been used by jobs; the average cluster utilisation had been around 50%.

4. Consolidated grant bid

MJH explained that the committee was being asked to come up with proposals for infrastructure IT funding. Desktops and laptops would be covered by a standard formula, but the group was in a position to ask for other, larger items. MJH reported that funding for a modest increase in (CAR-dedicated) cluster capacity was being proposed as part of Chiaki Kobayashi's bid, possibly supported by internal PAM funds, and so the committee felt that there was no need to discuss this further. At MAT's suggestion the committee discussed the hardware that might become obsolete within the lifetime of the consolidated grant; one obvious requirement in this period will be another server upgrade (allowing the current car-server to take over some of the less demanding web server/backup server tasks). The committee discussed the need for more storage: some users are expecting to have or generate very large amounts of data. The committee agreed that an expansion of the available storage might be useful. TMG asked about the expandability of the backup system, and JA explained that in principle the drive units could be upgraded to the next generation of tapes when those became available. An expansion of UPS capability would also be needed, probably just a single extra battery module. Finally, JA suggested that we upgrade the network switch serving user desktops to Gigabit, since an existing Gigabit switch bought on MJH's Royal Society grant was not sufficient to support all or even many user desktops.

Action: JA to look into pricing for the upgrades to backups, networking and the UPS.

Action: MJH/MAT/TMG to pass on the committee's views at the meeting to discuss the consolidated grant later in the day.

5. AOB

MG said that he would be leaving the committee as he would no longer be a student at the time of the next meeting. The committee discussed a suitable replacement. TMG agreed to ask Kieran Forde if he would be willing to become student rep. Other members of the committee were happy to remain in post.

Action: TMG to invite Kieran to join as student rep. (Since the meeting Kieran has been asked and has agreed.)

6. Date of next meeting

A meeting would be held in September, before the new students arrive at the start of term.

Action: MJH to circulate a Doodle nearer the time.

The meeting closed at 12:10.