Virtualised Storage Assessment Report


RoaDMaP Work Package 6 on virtualised storage aimed to test a single virtualised storage area created from file systems on the central University SAN, Faculty-managed storage and a third-party cloud provider, working with trial equipment loaned by our commercial partner F5: two ARX-2500 file virtualisation devices.

File Virtualisation?  “…is the use of an abstraction layer to decouple access to files from the physical location of those files. Once files are virtualized, organizations can simplify access to data, optimize the use of existing storage resources, and share files across heterogeneous storage devices.” (F5)

Our external commercial partner, F5, specialises in automated, open-architecture IT frameworks that provide scalable, intelligent file virtualisation solutions, intercepting, transforming and directing IT traffic via a rule-based shared product platform.

F5 ARX-2500 Test Plan (Objectives):

  • Identify a series of managed storage services using a range of hardware / platforms e.g. Engineering, Maths and Physical Sciences, Environment
  • Use the ARX-2500 device to create a single virtual namespace from individual storage areas
  • Test migration of data from one physical server to another
  • Get feedback from users on different virtualisation policies (e.g. automated vs. user-controlled)
  • Identify pilot research group (need to look closely at criticality of data) and work with them to test using single namespace, migration of data between tiers, storage performance.
  • Test cloud extender
  • Develop exit strategy (migration of data back to original storage location)

Outlined below are findings to date.  The detailed test plan is available at the end of this report.


Work Package 6 – Virtualised Storage Assessment Report

Author: Adrian Wheway, Senior Computer Officer

October 2012

Background

Scalability of the storage infrastructure is a key issue for many HEIs, particularly those that are research intensive: requirements for physical storage (such as volume, access speed, level of protection and backup) vary considerably, whilst the enterprise-level storage that typically supports central University services (e-mail, corporate systems, staff/student home directories and so on) is both too expensive and too limited in its capacity to meet the combined needs of the research community. It is estimated that up to 80% of the data residing on this enterprise storage infrastructure has not been accessed for over 90 days, which puts further pressure on backup systems because a significant proportion of each backup window is spent backing up multiple copies of static data. The storage needs of the research community have historically been met by a combination of central provision, Faculty-based managed storage and a 'DIY' approach (flash drives, USB hard drives, storage on local discs and so on). The result is that data is stored in a series of silos, reducing the efficiency of service delivery and the ability to manage research data effectively across the institution. The use of cloud storage, at either an individual or a Faculty level, further complicates the picture.

The project worked in partnership with F5, a company which offers an automated storage tiering solution. F5 loaned equipment to us from March to August 2012 and provided technical support for installation and troubleshooting.

Questions we wanted to answer..

  1. Would the virtualized storage integrate with our existing Enterprise storage solution and could it be configured to function in a resilient, secure manner?
  2. Would deployment of virtualized storage provide more efficient and effective storage, backup and retrieval?
  3. Does virtualized storage work with a third party cloud storage provider?
  4. Would deployment of virtualized storage have any implications for researcher experience?

What we did..

The evaluation was based at the University of Leeds; see the Test Plan for F5 Evaluation table below.

F5 provided two of their ARX-2500 switch appliances on loan for a trial period in order for us to evaluate the features of these devices and to determine if they would satisfy the questions we wanted to answer.

We identified broad test objectives (summarised at the start of this report) and created a detailed Test Plan for Evaluation (below).

The test plan we created set out to evaluate how effectively the F5 ARX could virtualise and present the storage we identified; how seamlessly it could move data between different tiers of virtualised storage without any impact on end-user access; and how easily the appliances could integrate with our existing infrastructure in a resilient manner.

For the purpose of the evaluation, the tiered storage layers that the ARX would virtualise were provided by the University's EMC Celerra NAS (connected to an EMC CLARiiON CX4-480 SAN) and external cloud storage hosted by Amazon. Research datasets of 500GB were identified and provided by two faculties at the University to support the testing. An evaluation group was assembled to perform a number of end-user access experience tests, and a server specialist performed tests involving the server-side ARX infrastructure.

The two ARX-2500 appliances were configured in an active/passive role with each appliance located in a different data centre on campus.

Each appliance was connected to a single 10Gb/s port and multiple 1Gb/s networks in each data centre, with the 10Gb/s link acting as the primary route for connectivity. The appliances were integrated with the University's Active Directory domains so that we could control access permissions on the virtualised storage with users/groups from our centralised directory service.

A global name space was configured on the ARX devices. This acted as the virtualised naming layer – effectively a server name hosted on the ARX devices with shared folders associated with it. Each global name space shared folder could be configured to redirect user access to shared storage held on other storage devices such as the EMC Celerra NAS.
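
As a rough illustration of this concept (not the ARX configuration syntax, and with invented server and share names), a global name space can be thought of as a simple mapping from virtual paths to physical shares:

# Minimal illustration of how a global name space maps a single virtual
# server name onto shares hosted on different back-end devices. The server
# and share names are invented for this example, not taken from the trial.

NAMESPACE = {
    # virtual share under \\research -> physical share on a back-end device
    r"\\research\projects": r"\\celerra-nas-1\projects",
    r"\\research\archive":  r"\\celerra-nas-2\archive",
}

def resolve(virtual_path: str) -> str:
    """Translate a path under the virtual name space to its physical location."""
    for prefix, physical_root in NAMESPACE.items():
        if virtual_path.lower().startswith(prefix.lower()):
            return physical_root + virtual_path[len(prefix):]
    raise FileNotFoundError(f"{virtual_path} is not in the global name space")

print(resolve(r"\\research\projects\dataset1\readme.txt"))
# -> \\celerra-nas-1\projects\dataset1\readme.txt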

The evaluation testing was split into 2 main phases:

(i) Scenario testing involving server side operations of ARX-2500 infrastructure;

(ii) Scenario testing of the client user experience provided by the ARX-2500 infrastructure when providing access to tiered, virtualised storage.

Phase i (server side operations) covered the following evaluation tasks, which were to be performed by a server specialist:

  • Evaluation of the resilient features of the ARX-2500 and how these supported continued access to virtualised storage in several failure situations relating to failure of networking, storage unavailability, ARX device failure and disaster recovery.
  • Determine how existing storage access mechanisms using the Microsoft Distributed File System could be integrated with the global name space provided by the ARX-2500.
  • Prove that the ARX-2500 could move files/folders between different tiers of storage (including cloud storage) based on file attributes such as age and name/file extension, and that the migration process between tiers maintained important attributes such as date/time stamps and security/ownership information.
  • Prove that the ARX-2500 infrastructure could integrate with multiple Active Directory domains and internal network firewall configurations.
  • Monitor the pace at which the ARX-2500 appliances could migrate data between tiers and the impact this has on the storage hosting the tiered solution.
  • Explore the capacity management considerations of the ARX-2500 appliances.

Phase ii (client user experience of the tiered, virtualised storage) covered the following evaluation tasks, which were to be performed by the evaluation group over CIFS and NFSv3 from Windows (XP onwards), Linux and Apple Mac (Snow Leopard and Lion) clients:

  • Prove that files/folders hosted on the virtualised storage could be accessed and that a range of file/folder operations could be performed and were correctly constrained by security permissions.
  • Prove that files/folders hosted on different tiers of the virtualised storage could be accessed and that access was not impacted if a file/folder moved between tiers.
  • Determine if snapshot and recovery features of the virtualised, tiered storage could successfully recover files.
  • Prove that quota controls of the virtualised, tiered storage were correctly enforced.
  • Identify whether the 248-character maximum file path limit was correctly enforced from Windows clients.

Each member of the evaluation group was given a test matrix into which to record their results.

What we found..

Our evaluation explored the resilience features of the ARX-2500 appliances and how these supported the virtualised storage they provided. In the event of a networking failure, we found that there was automatic resilience between network ports of the same speed in the same appliance; ports of differing speeds could not act as a resilient option for one another. If the primary networking of the active appliance failed, or in the event of any other device-specific failure, then as long as the quorum and metadata shares (hosted on a separate system, not the ARX) were still up and visible to the passive ARX in the other data centre, the virtualised storage would automatically be brought up on the other appliance.
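
The failover behaviour we observed can be summarised in a small decision sketch; the condition names below are our own shorthand rather than anything exposed by the appliance:

# Simplified sketch of the failover behaviour described above: the passive
# appliance takes over only while the quorum and metadata shares (hosted
# outside the ARX pair) remain visible to it. Purely illustrative logic.

def should_failover(active_primary_network_ok: bool,
                    active_device_ok: bool,
                    passive_sees_quorum_share: bool,
                    passive_sees_metadata_share: bool) -> bool:
    active_failed = not (active_primary_network_ok and active_device_ok)
    passive_ready = passive_sees_quorum_share and passive_sees_metadata_share
    return active_failed and passive_ready

# Example: the active appliance loses its primary 10Gb/s uplink.
print(should_failover(False, True, True, True))   # True  -> passive takes over
print(should_failover(False, True, False, True))  # False -> quorum not visible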

Storage resilience was provided by the system hosting the tiered storage: the storage was provisioned on our EMC Celerra NAS using a RAID 6 configuration provided by our EMC CX4-480 SAN and then shared as virtualised storage using the ARX-2500, so the resilience of the storage itself came from the SAN's RAID 6 configuration.

In the event of a failure of a specific tier of storage, the ARX-2500 would continue to allow access to the virtualised tiers that were still available. The ARX-2500 appliance contains a metadata index of all files that it hosts across all tiers, such that directory listings continue to show entries for a tier even when it is unavailable; should a client try to open a file from a tier that has gone offline, an error is shown.
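
A simple way to picture this behaviour (purely illustrative, with invented file and tier names) is a metadata index that always serves directory listings but refuses to open files whose tier is offline:

# Illustrative model of the behaviour described above: directory listings are
# served from the appliance's metadata index, so entries on an offline tier
# still appear, but opening such a file produces an error.

metadata_index = {
    "report.docx": "tier1",   # primary tier
    "results.csv": "tier2",   # secondary tier
}
online_tiers = {"tier1"}      # tier2 is currently unavailable

def list_directory():
    return sorted(metadata_index)          # listing always works

def open_file(name: str):
    tier = metadata_index[name]
    if tier not in online_tiers:
        raise IOError(f"{name} is on {tier}, which is currently offline")
    return f"contents of {name}"

print(list_directory())          # ['report.docx', 'results.csv']
print(open_file("report.docx"))  # succeeds; open_file("results.csv") would raise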

A number of storage tier recovery scenarios were tested, including partial and full failure of both primary and secondary tier virtualised storage. This process was supported by an F5 consultant and the recovery process was documented. We were able to recover from each of the storage failure scenarios using a combination of recovery from backup and applying a specific set of tasks on the ARX. We successfully performed a full disaster recovery of all elements of the virtualised storage provision, although the appliances did not come fully back online until both the primary and secondary tiered storage were recovered; we had expected that recovery of one of the tiers would be enough to bring the system back online.

Based on the evaluation tests we performed, we determined that the ARX-2500 appliances could successfully migrate files/folders between different tiers of storage based on a flexible range of file-matching criteria and could present this tiered storage in a transparent manner to end users. These criteria included files that had not been modified in the last 30 or 90 days and files with particular extensions.
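
For illustration only, the selection logic behind such criteria might look like the following sketch; the paths, thresholds and extensions are examples rather than the ARX policy syntax:

# Sketch of the kind of file-matching criteria used in the evaluation:
# select files not modified in the last 30/90 days, or matching given
# extensions, as candidates for migration from primary to secondary storage.
import os
import time

def files_to_migrate(root, max_age_days=None, extensions=None):
    cutoff = time.time() - max_age_days * 86400 if max_age_days else None
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if cutoff is not None and os.path.getmtime(path) < cutoff:
                yield path
            elif extensions and os.path.splitext(name)[1].lower() in extensions:
                yield path

# Example: files untouched for 90 days, or any .doc/.jpg files (paths invented).
for path in files_to_migrate("/mnt/primary/dataset1",
                             max_age_days=90,
                             extensions={".doc", ".jpg"}):
    print("candidate for secondary tier:", path)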

We configured Distributed File System links (our primary method to access storage) to point to the ARX-2500 global name space which proved we could integrate our existing access mechanisms with those of the appliance.

The ARX-2500 appliances successfully integrated with our Windows 2008 R2 Active Directory configured over 2 domains in the same forest and were able to successfully utilise security objects from both domains. With assistance from the F5 consultant, we configured multiple global name spaces on the appliance such that we could provision virtualised storage from multiple-VLANs with one of the VLANs located behind an internal network firewall.

File migration activities utilise 8 concurrent copy streams (threads). In our case, the performance of the migration activities was determined by the responsiveness of the target tiered storage.
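
The following sketch shows the general pattern of copying files with a fixed pool of eight concurrent streams; it illustrates the approach rather than the appliance's internal implementation, and the paths are invented:

# Copy files using a fixed number of concurrent copy streams, mirroring the
# eight parallel threads described above. Overall throughput is bounded by
# whichever storage tier responds more slowly.
import shutil
from concurrent.futures import ThreadPoolExecutor

COPY_STREAMS = 8  # number of parallel copy threads

def migrate(files, target_dir):
    """Copy each file to target_dir using a fixed pool of copy streams."""
    with ThreadPoolExecutor(max_workers=COPY_STREAMS) as pool:
        futures = [pool.submit(shutil.copy2, path, target_dir) for path in files]
        for future in futures:
            future.result()   # re-raise any copy errors

# Example (illustrative paths):
# migrate(["/mnt/primary/dataset1/a.dat", "/mnt/primary/dataset1/b.dat"],
#         "/mnt/secondary/dataset1")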

F5 advised that the ARX-2500 appliance could handle 6000 concurrent users with multiple connections and 1.5 billion files.

Our end user experience testing was limited to the following platforms when accessing tiered, virtualised storage stored on the EMC Celerra NAS hosted from a single Active Directory domain on a VLAN located outside of the internal firewall: Windows XP and later using CIFS, Apple Mac Snow Leopard using CIFS. (No UNIX clients or NFS v3 connectivity was tested)

We did not complete testing from clients utilising our Child Domain connecting to virtualised storage located behind our internal firewall.

The tests we performed showed that files could be successfully retrieved and manipulated in a transparent manner from multiple-storage tiers with security constraints enforced and other file/folder attributes being maintained. From an Apple Mac it was noted that folders to which a user did not have access were shown with no contents, compared to Windows clients where an ‘Access is Denied’ error was shown.

The snapshot/file recovery features performed adequately, although by default the naming conventions for snapshots did not include the date/time that they were taken – this can be customised. Quota controls were correctly enforced, although as files were migrated from the primary to the secondary tier, the quota utilisation on the primary tier of storage was reduced.

The 248-character maximum file path limit was correctly enforced when connecting from Windows clients.
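
For reference, the 248-character figure corresponds to the Windows limit on the directory portion of a path (the overall MAX_PATH limit is 260 characters); a trivial check looks like this, with an invented example path:

MAX_DIR_PATH = 248   # Windows limit on the directory portion of a path

def path_ok(path: str) -> bool:
    return len(path) <= MAX_DIR_PATH

example = "\\\\research\\projects\\" + "deeply_nested_folder\\" * 15
print(len(example), path_ok(example))   # e.g. 350 False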

Connectivity to virtualised storage hosted on Amazon cloud storage was configured, but testing was limited to proving that files could be migrated to and from this tier; no user experience testing was performed on this feature. A proxy server is required to support the ARX when transferring files to a cloud provider. The proxy server runs a periodic job that copies files up to the cloud and turns each file left behind into a stub; these stub files reside on the Windows or Linux proxy server.

Should a user open a stub file or change its attributes, the file is pulled back from the cloud and becomes a fully inflated file again; the copy in the cloud is not deleted. All file transfers to the cloud are encrypted. Redundant data in the cloud can be removed if needed using a scrub process.
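
Conceptually, the stub and rehydration behaviour can be sketched as below; this is only a model of the behaviour described above, not the proxy server's actual code, and the file names are invented:

# Conceptual model: a migrated file leaves a small stub behind on the proxy
# server, and opening the stub pulls the full file back from the cloud
# (the cloud copy is retained).

cloud_store = {}   # object key -> file contents
local_files = {}   # path -> either full contents or a stub marker

def migrate_to_cloud(path: str):
    cloud_store[path] = local_files[path]     # encrypted in transit in practice
    local_files[path] = ("STUB", path)        # leave a stub behind locally

def open_file(path: str) -> str:
    if isinstance(local_files[path], tuple):  # stub: rehydrate from the cloud
        local_files[path] = cloud_store[path] # cloud copy is not deleted
    return local_files[path]

local_files["results.csv"] = "col1,col2\n1,2\n"
migrate_to_cloud("results.csv")
print(local_files["results.csv"])   # ('STUB', 'results.csv')
print(open_file("results.csv"))     # full contents restored locally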

Should the stub files be lost, data can be recovered from the cloud; however, if any files have been renamed, they will revert to their original names. Permissions are not stored on the files in the cloud, so if the stub files were lost and everything was downloaded from the cloud for recovery purposes, the files would take on the permissions of the folder into which they were recovered.

Lessons learned..

Having the F5 ARX-2500 equipment loaned to us for a trial period with access to an F5 consultant proved to be an effective way in which to perform this trial and progress through our evaluation testing.

On reflection, our evaluation group could have had more members drawn from a wider audience, which may have enabled us to complete all of the user experience testing.

The initial set up of the equipment took longer than anticipated, illustrating the need for adequate set up and troubleshooting time when planning testing.

What next..

We have the option to download a cost-free VMware virtual machine which can act as an F5 ARX appliance and run it in our VMware environment for a trial period. Loading this virtual machine should allow us to complete some of the user experience testing that we were unable to complete in the main evaluation period, as well as testing the integration of the ARX with other storage platforms. We are progressing these objectives via RoaDMaP project meetings.

The Test Plan for F5 Evaluation

ID Task
1. Identify & satisfy data centre environment pre-reqs needed to host F5 appliance
2. a)      Identify and set up several data sets to be hosted on specific primary storage which will be managed by the F5, ensuring it is shared using CIFS and NFSv3, for the purposes of proof of concept training;

b)      Identify and set up specific secondary storage to be managed by the F5, onto which specific primary storage data will be migrated by the F5, ensuring it is shared using CIFS and NFSv3;

c)      Identify the global name space identifier to be used;

3. Configure successful CBS backups of the primary and secondary storage.
4. a)      Rack and set up the F5 appliance on the UOL network/DS domain with assistance from an F5 consultant;

b)      Setup a global name space;

c)      Configure the F5 to see the specific primary and secondary storage using CIFS/NFSv3.

d)      Configure the required data migration tasks between tiers;

5. F5 consultant to provide any additional, F5 hands-on training to support this test plan.
6. Prove that the F5 can integrate with our Active Directory forest (DS and Admin) and resolve security attributes as required.
7. Identify that the global name space supports up to at least 248 characters in the path from CIFS and NFSv3 connections.
8. Identify what happens to the global name space when the F5 is offline.
9. Identify how the current means of accessing M: and N: drive storage using DFS can be integrated with any new global name space.
10. Configure an F5 task to migrate files from data set 1 that were accessed more than 90 days ago from primary to secondary storage.
11. Configure an F5 task to migrate files from data set 2 that were modified more than 30 days ago from primary to secondary storage.
12. Configure an F5 task to migrate files from data set 3 that comprise only specific file types (*.DOC, *.JPG) from primary to secondary storage.
13. Prove that CIFS ACLs/date time stamps/ownership information is maintained when data is migrated from primary to secondary storage and vice versa.
14. Prove that NFSv3 security bits/date time stamps/ownership information is maintained when data is migrated from primary to secondary storage and vice versa.
15. Identify the impact of migrating data from primary to secondary storage on the Celerra quota assigned to that data set.
16. Prove that it is possible to setup an F5 task to repopulate primary storage with data that has previously been migrated to secondary storage.
17. Copy a large dataset (500GB) that matches with one of the migration criteria onto primary storage such that a large quantity of data is moved from primary to secondary in a short period of time. Monitor the performance of the F5 / Celerra / the SAN / the network.
18. Mimic a failure of the primary storage and determine how to recover the primary storage. Prove that the live data on the primary storage and the data that has been migrated from primary to secondary storage is still accessible after the recovery is complete.
19. Mimic a failure of the secondary storage and determine how to recover the secondary storage. Prove that the live data on the primary storage and the data that has been migrated from primary to secondary storage is still accessible after the recovery is complete.
20. Mimic a failure of both the primary and secondary storage and determine how to recover both tiers. Prove that the live data on the primary storage and the data that has been migrated from primary to secondary storage is still accessible after the recovery is complete.
21. Prove the system’s resilience to partial IP network failure: Remove half of the public/private network connection cables from the node whilst migration jobs are running.
22. Prove the system’s resilience to recovering from a full IP network failure: Remove all of the network connection cables from the node whilst migration jobs are running.
23. Identify means of monitoring the F5: can it send out email notifications when problems occur; does it offer a web-based user interface for troubleshooting and reviewing logs, etc.?
24. Determine whether the capacity of the ARX-2500 is sufficient, given its limits of 6000 users and 1.5 billion files. Seek guidance on whether the ARX-2500 would scale sufficiently if it were to be the common access point for all of the M: and N: drive file storage.
25. Prove that it is possible to integrate the ARX-2500 with secondary storage hosted on a remote, private cloud using the Cloud Extender Module. Identify the relevant tests from this plan that could be repeated when using secondary storage via the Cloud Extender Module.
26. a)      Identify and set up several evaluation data sets to be hosted on specific primary storage which will be managed by the F5, ensuring it is shared using CIFS and NFSv3;

b)      Identify and set up specific secondary storage to be managed by the F5, onto which specific primary storage data will be migrated by the F5, ensuring it is shared using CIFS and NFSv3;

c)      Identify the global name space identifier to be used;

d)      Configure an F5 task to migrate files from evaluation data set that were modified more than 30 days ago from primary to secondary storage.

e)      Configure an F5 task to migrate files from evaluation data set that were modified more than 30 days ago from secondary to primary storage.

27. a)      Prove that data hosted on both primary and secondary storage is accessible to Windows clients (XP, Vista, Windows 7, Windows 2003, Windows 2008) using the global name space over a CIFS connection;

b)      Prove that permissions are enforced when accessing data from primary and secondary storage;

c)      Prove that quotas are enforced when accessing data from primary and secondary storage;

d)      Update a file stored on primary storage;

e)      Update a file stored on secondary storage;

f)       Delete a file stored on primary storage;

g)      Delete a file stored on secondary storage;

h)      Attempt to access/recover from the snapshot of a file still on primary storage.

i)        Attempt to access/recover from the snapshot of a file on secondary storage.

j)        Attempt to access/recover from the snapshot of a file on secondary storage from a primary storage snapshot taken before the file was moved to secondary storage.

28. a)      Prove that data hosted on both primary and secondary storage is accessible to Apple Mac clients (Snow Leopard, Lion) using the global name space over a CIFS connection.

b)      Prove that permissions are enforced when accessing data from primary and secondary storage;

c)      Prove that quotas are enforced when accessing data from primary and secondary storage;

d)      Update a file stored on primary storage;

e)      Update a file stored on secondary storage;

f)       Delete a file stored on primary storage;

g)      Delete a file stored on secondary storage;

h)      Attempt to access/recover from the snapshot of a file still on primary storage.

i)        Attempt to access/recover from the snapshot of a file on secondary storage.

j)        Attempt to access/recover from the snapshot of a file on secondary storage from a primary storage snapshot taken before the file was moved to secondary storage.

29. a)      Prove that data hosted on both primary and secondary storage is accessible to UNIX clients (Red Hat Linux and CentOS) using the global name space over a CIFS connection.

b)      Prove that permissions are enforced when accessing data from primary and secondary storage;

c)      Prove that quotas are enforced when accessing data from primary and secondary storage;

d)      Update a file stored on primary storage;

e)      Update a file stored on secondary storage;

f)       Delete a file stored on primary storage;

g)      Delete a file stored on secondary storage;

h)      Attempt to access/recover from the snapshot of a file still on primary storage.

i)        Attempt to access/recover from the snapshot of a file on secondary storage.

j)        Attempt to access/recover from the snapshot of a file on secondary storage from a primary storage snapshot taken before the file was moved to secondary storage.

30. a)      Prove that data hosted on both primary and secondary storage is accessible to UNIX clients (Red Hat Linux and CentOS) using the global name space over an NFSv3 connection.

b)      Prove that permissions are enforced when accessing data from primary and secondary storage;

c)      Prove that quotas are enforced when accessing data from primary and secondary storage;

d)      Update a file stored on primary storage;

e)      Update a file stored on secondary storage;

f)       Delete a file stored on primary storage;

g)      Delete a file stored on secondary storage;

h)      Attempt to access/recover from the snapshot of a file still on primary storage.

i)        Attempt to access/recover from the snapshot of a file on secondary storage.

j)        Attempt to access/recover from the snapshot of a file on secondary storage from a primary storage snapshot taken before the file was moved to secondary storage.

31. Obtain compatibility matrix showing which NAS devices are supported for use with the ARX.
32. Learn about how to utilise the ARX as a means by which to seamlessly migrate data from one set of storage to another set of storage.
33. Determine if any client-based tools exist which allow an end user to manually migrate files onto secondary storage.

DMP Online Plan Formatting

How can we gain most benefit from using an online DMP tool by introducing time-saving features that make best use of this environment?

Computers can perform certain tasks much faster than humans, but can be foxed (no pun intended) by the seemingly simple job of correctly distinguishing between pictures of cats and dogs.


https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/

In the context of DMPOnline, this got me thinking about what improvements to the system could make best use of the fact this is an online tool rather than a set of templates. Academic staff are often wary of using new systems due to the time that needs to be invested in learning how they work. So, if we can demonstrate that a new tool such as DMPOnline will definitely save time & effort by removing the need for the user to undertake mechanical tasks, then this can only be a good thing.

Computers excel (again, no pun intended) at processing structured data according to a clearly defined set of rules. Humans on the other hand are much better at dealing with unstructured data and open-ended tasks. Within the context of data management planning, I started to look at which of the requirements resulted from clearly defined rules.

The area which jumped out at me was the various requirements for formatting the finished plans. Some examples are as follows:

RCUK requires all ‘attachments’ including DMPs to be formatted with minimum margin sizes of 2cm in all directions. They also suggest the use of ‘Arial’ with a minimum font size of 11.

Je-S Formatting Guidance#1 

Elsewhere on the Je-S website, it states that Arial or Times New Roman are recommended.

Je-S Formatting Guidance#2

This page also defines the accepted file formats, which include:

  • PDF versions 1.3, 1.4, 1.5 and 1.6 (*.pdf)
  • Postscript level 2 (*.ps)
  • Microsoft Word (’97 and later including Word 2007)

ESRC further require that a minimum font size of 12 is used and that DMPs do not exceed 3 pages of A4.

Je-S ESRC Formatting Guidance

MRC’s page length requirements are more complex because the data management plan forms part of a longer ‘Case for Support’ document. The maximum permitted length for the ‘Case for Support’ depends on the scheme being applied to.

Je-S MRC Formatting Guidance

The maximum length of a NERC DMP is one side of A4:

Je-S NERC Formatting Guidance

My personal experience of helping PIs to create DMPs using DMPOnline is that I spend (or rather waste) a lot of time adjusting settings to bring the plan in line with the funder’s requirements for page length, font size etc. This often involves deselecting different options, such as header and footer text on a trial and error basis to reduce the plan down to the required length.

These requirements can easily be expressed as a series of structured rules, as sketched below.
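
As an illustration of what such structured rules might look like (the values are taken from the guidance quoted above where stated, and are otherwise placeholders that would need checking against the current Je-S guidance):

# Illustrative encoding of funder formatting rules as structured data, plus a
# simple check. Values not explicitly quoted above are placeholders.

FUNDER_RULES = {
    # RCUK-wide: 2cm margins, Arial (or Times New Roman) at 11pt minimum
    "RCUK": {"min_margin_cm": 2.0, "fonts": ["Arial", "Times New Roman"],
             "min_font_pt": 11},
    # ESRC: minimum 12pt and no more than 3 pages of A4
    "ESRC": {"min_margin_cm": 2.0, "fonts": ["Arial", "Times New Roman"],
             "min_font_pt": 12, "max_pages": 3},
    # NERC: one side of A4 (font/margin values assumed to follow RCUK)
    "NERC": {"min_margin_cm": 2.0, "fonts": ["Arial", "Times New Roman"],
             "min_font_pt": 11, "max_pages": 1},
}

def check_plan(funder, margin_cm, font, font_pt, pages):
    rules = FUNDER_RULES[funder]
    problems = []
    if margin_cm < rules["min_margin_cm"]:
        problems.append(f"margins must be at least {rules['min_margin_cm']}cm")
    if font not in rules["fonts"]:
        problems.append(f"font should be one of {rules['fonts']}")
    if font_pt < rules["min_font_pt"]:
        problems.append(f"font size must be at least {rules['min_font_pt']}pt")
    if "max_pages" in rules and pages > rules["max_pages"]:
        problems.append(f"plan must not exceed {rules['max_pages']} page(s)")
    return problems

print(check_plan("ESRC", margin_cm=2.0, font="Arial", font_pt=11, pages=4))
# ['font size must be at least 12pt', 'plan must not exceed 3 page(s)']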

Of course, the user should be given the option to override these settings (for example, they may want a version of the plan, not submitted as part of the application, that exceeds the maximum length permitted by the funder), but in the majority of cases, if the system can produce a plan which meets the funder's requirements (and updates these requirements as and when they change), then PIs will be able to spend more of their valuable time concentrating on the content, rather than the formatting.

This suggestion has been posted on the DMPOnline GitHub pages and has received positive feedback.

Tim Banks