Access control management system for the Moscow State University supercomputer
In 2009 the Computing Center of Lomonosov Moscow State University launched its supercomputer, Lomonosov, which by mid-2011 ranked 13th in the global supercomputer ranking. Its access control management system, Octoshell, was developed by Evrone.
The supercomputer was frequently used for computations in large-scale scientific projects and for lab work by MSU students. Before Octoshell, a real-time access control system, was introduced, access was granted through applications sent by paper mail. This channel is, for obvious reasons, far from instant.
While letters traveled from one office or post office to another, user tasks could become irrelevant and deadlines for scientific work could slip. Since the number of supercomputer users was expected to grow dozens of times over, there was a risk of drowning in paperwork and severely slowing down the allocation of computing resources.
On top of that, the use of the supercomputer was not fully transparent to the Computing Center: it was unclear who was using the machine at any given moment, whether users were actually working on the problems stated in their applications, whether the work succeeded, and what challenges arose along the way. Users filed reports as best they could, so there was no guarantee the reports were complete and accurate.
MSU planned to boost the computing capacity, but doing so would have achieved little while these user-management issues remained. This is why the stakeholder decided to commission an automated access system to tackle the problems above, which led Evrone to research how such an access control system should be built.
Open source for supercomputers
What is an access control system, and how does it work? The main idea was to make Octoshell compatible with any supercomputer worldwide, regardless of whether it belongs to a school, a research institution, or a company. This required a modular structure.
Modular means flexible. For instance, if a computing center needs its own application form, registration routine, or report form, one of its employees can modify an existing module or write a new one from scratch so that the system meets the organization's needs. Together, the basic app and its modules provide:
- Fast and easy access to a supercomputer network and its computing capacities
- Exchange of approving documents online
- Automatically granting access details upon approval
- Control over tasks being performed
- Supercomputer workload control
- Automated collection of timely and complete reports from users
- Automated report generation covering supercomputer use efficiency
- A statistical analysis system
- User helpdesk
- Collection of computation failure reports
- Automated processing of emergencies
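One item above, automatically granting access details upon approval, is what replaced the paper-mail round trip. The sketch below is a minimal illustration of that flow, assuming a hypothetical `AccessRequest` class and a made-up login scheme; the article does not describe Octoshell's real models or credential format.

```ruby
require "securerandom"

# Hypothetical access request; Octoshell's actual classes may differ.
class AccessRequest
  attr_reader :user, :project, :state, :credentials

  def initialize(user:, project:)
    @user = user
    @project = project
    @state = :pending
    @credentials = nil
  end

  # Approval issues login details immediately, with no separate paper step.
  # The "user_project" login format is an illustrative assumption.
  def approve!
    raise "request already #{@state}" unless @state == :pending
    @state = :approved
    @credentials = { login: "#{user}_#{project}", token: SecureRandom.hex(8) }
  end

  # A rejected request never receives credentials.
  def reject!
    raise "request already #{@state}" unless @state == :pending
    @state = :rejected
  end
end

req = AccessRequest.new(user: "ivanov", project: "climate")
req.approve!
puts req.credentials[:login] # prints "ivanov_climate"
```

Guarding each transition on the `:pending` state keeps a request from being approved twice or approved after rejection.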
To deliver the modular design, we chose Rails engines, the same mechanism a Rails application itself is built on (`Rails::Application` inherits from `Rails::Engine`). An engine is a self-contained library that can be packaged as a Ruby gem and plugged into the main application. Shopify is a well-known project that uses the same approach.
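In a real Rails app, each module would be a `Rails::Engine` subclass, usually namespaced with `isolate_namespace` and mounted in the host's routes. The plain-Ruby sketch below only mimics that plug-in idea without depending on Rails: a host that knows nothing about specific modules, and hypothetical `Support` and `Reports` modules mounted under their own paths. `HostApp`, `mount`, and `call` here are illustrative names, not Rails' API or Octoshell's code.

```ruby
# Minimal host application: it only keeps a registry of mounted modules.
class HostApp
  def initialize
    @modules = {}
  end

  # Plug a module in under its own path, like mounting an engine in routes.rb.
  def mount(path, mod)
    raise ArgumentError, "#{path} already mounted" if @modules.key?(path)
    @modules[path] = mod
  end

  # Dispatch a request to whichever module owns the path.
  def call(path, request)
    mod = @modules[path]
    return "404: no module at #{path}" unless mod
    mod.handle(request)
  end

  def mounted_paths
    @modules.keys
  end
end

# Hypothetical helpdesk module with its own namespace, like an isolated engine.
module Support
  def self.handle(request)
    "ticket created: #{request}"
  end
end

# A center-specific report module can be swapped in without touching HostApp.
module Reports
  def self.handle(request)
    "report form for #{request}"
  end
end

app = HostApp.new
app.mount("/support", Support)
app.mount("/reports", Reports)
puts app.call("/support", "job failed") # prints "ticket created: job failed"
```

Because the host only depends on the `handle` interface, a computing center could replace `Reports` with its own form without changing the core.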
The basic app provides core functionality such as user profiles, grouping users by access rights, and granting permissions to run tasks. Everything else is implemented as modules. Octoshell runs on JRuby because the system has to handle a lot of concurrent work. Standard Ruby (MRI) has a global interpreter lock that prevents Ruby threads from executing in parallel, so it scales poorly across processor cores, whereas JRuby maps Ruby threads onto JVM threads that can run on all available cores.
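The difference between MRI and JRuby shows up with ordinary Ruby threads. In the sketch below, `parallel_sum_of_squares` is a hypothetical helper, not Octoshell code: it splits CPU-bound work across threads. The same code runs on both interpreters, but only on JRuby can the threads execute on several cores at once; on MRI they still run concurrently, which helps with I/O-bound work, yet only one executes Ruby code at a time.

```ruby
# Splits a CPU-bound computation across threads. On JRuby each Ruby Thread is
# a JVM thread and can run on its own core; on MRI (CRuby) the global VM lock
# lets only one thread execute Ruby code at any moment.
def parallel_sum_of_squares(numbers, threads: 4)
  slice_size = (numbers.size / threads.to_f).ceil
  slice_size = 1 if slice_size.zero? # each_slice requires a positive size
  workers = numbers.each_slice(slice_size).map do |slice|
    Thread.new { slice.sum { |n| n * n } }
  end
  workers.sum(&:value) # Thread#value joins the thread and returns its result
end

puts RUBY_ENGINE # "jruby" on JRuby, "ruby" on MRI
puts parallel_sum_of_squares((1..1_000).to_a) # prints 333833500
```

Timing this with a large input on both interpreters is a quick way to see whether the threads actually use multiple cores.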
Results after Octoshell was introduced:
- Access granting time dropped from several weeks to 17 hours
- More than 600 scientific projects are being worked on simultaneously
- Over 250 scientific organizations and 3,000 users have gained access to the supercomputers
- The helpdesk handles over 1,000 tickets a year
The supercomputer management system is distributed under the MIT License, and its repository is available on GitHub.