Clustered Storage

© Isilon® Systems
www.isilon.com

Proceedings of the Spring Young Researcher's Colloquium On Database and Information Systems SYRCoDIS, St.-Petersburg, Russia, 2007

Abstract

The paper introduces the reader to a new paradigm shift that is currently taking place in the data storage industry: the movement toward Clustered Storage architectures. Clustered Storage architectures are changing the rules of how data is stored and accessed. This paper discusses the trends that clearly define clustered storage architectures as the future of data storage, details the requirements of this new category of storage, and introduces the Isilon® IQ clustered storage solution, which is the first to deliver on the promises of this paradigm shift.

1 Three Macro Trends Driving the Shift to Clustered Storage

The movement toward Clustered Storage architectures is being driven by three macro trends:
• Explosive growth of unstructured data and digital content;
• Paradigm shift to cluster computing;
• Proliferation of cheaper and faster industry-standard enterprise-class hardware.

Today’s competitive companies are facing a tremendous increase in the amounts of data used to conduct their everyday business, driven largely by the explosion of unstructured data. IT managers know that applications using and storing video, audio, images, research sets, and other large digital files and unstructured data are pushing the bounds of traditional storage system capacity and performance.

The second macro trend is the widespread adoption of clustered computing. Enterprise data centers have evolved from the era of “big iron” proprietary mainframes and symmetrical multiprocessing (SMP) servers to that of standards-based (using industry-standard hardware), clustered machines running Linux or Windows.

The third macro trend driving the movement to clustered storage is a dramatic decrease in the price/performance curves of industry-standard hardware components. This trend is part of the continual movement toward the promise of Moore’s Law: over time, companies are getting higher computing power for a lower cost and realizing the economics of commodity hardware. The low cost of commodity hardware components has made the merits of clustered architectures affordable.

These macro trends point to three fundamental implications:
• The storage industry is undergoing a revolution;
• Clustered storage is becoming the dominant new storage architecture;
• Customers are reaping substantial business value and benefits from clustered storage.

From big monolithic boxes to clustered architectures, storage is following the paradigm shift that has already occurred in the server application world.

2 Clustered Storage

When defining clustered storage solutions we find six common characteristics:
• Symmetric Clustered Architecture;
• Scalable Distributed File System;
• Inherent High Availability;
• Single Level of Management;
• Linear Performance Characteristics;
• Enterprise Ready.

Symmetric Clustered Architecture: The key design principle behind distributed clustered storage solutions is symmetry among the nodes, which can be thought of as self-contained storage controller heads, disks, CPU, memory, and network connectivity. The tasks the cluster must perform are distributed uniformly across its members, enhancing scalability, access to data, performance and availability, in contrast to traditional storage architectures deploying master server-based approaches where the storage nodes are not symmetric and are limited in scalability and performance.

Scalable Distributed File System: The enabler of this architectural approach is a distributed file system that can scale to be a very large pool of storage or single network drive. Distributed file systems maintain control of file and data layout across the nodes and employ metadata and locking semantics that are fully distributed and cohesively maintained across the cluster, enabling the creation of a very large global pool of storage. A single network drive and single file system can seamlessly scale to hundreds of terabytes.

Inherent High Availability: A distributed clustered architecture by definition is highly available since each node is a coherent peer to the other. If any node or component fails, the data is still accessible through any other node, and there is no single point of failure as the file system state is maintained across the entire cluster. In fact, fully distributed cluster architectures can sustain multiple simultaneous drive and node failures and still be able to recover and continue operation. Moreover, high availability is “inherent” for distributed cluster architectures, meaning that unlike traditional storage systems, where an IT manager would have to purchase additional software and expensive redundant hardware in order to achieve high availability, clustered storage solutions achieve high availability by the very nature of the fully symmetrical architecture.

Single Level of Management: Distributed clustered storage solutions provide a single level of management regardless of the size of the file system and number of storage nodes added to the cluster, making it as easy to administer a cluster of a few nodes as it is to manage a cluster of several hundred nodes. Complete clustered storage solutions automate traditionally manual tasks, including the load balancing of client connections across nodes in the cluster to ensure optimal performance and the automatic re-balancing of content when new nodes are added to the cluster to scale capacity and performance.

Linear Scalability of Performance: Distributed clustered storage solutions have the unique capability to scale all performance elements in a near linear fashion. When more nodes/controllers of memory, processing, disk spindles and bandwidth are added, the cluster maintains its coherency as one logical system and is able to aggregate across all resources, achieving linear scalability of performance with each additional node. In order to achieve this linear scalability of performance, it is critical for each node to stay in sync with all other nodes in the cluster. As a result, more robust solutions typically employ very high-speed intra-cluster interconnects to ensure low latency between the nodes and real-time synchronization of the cluster.

Enterprise Ready: Distributed clustered storage solutions must be enterprise ready. Historically, clustered architectures were first deployed primarily in non-commercial research labs, not in mainstream commercial enterprises. In order to be part of a paradigm shift, though, the clustered solution must be ready for implementation into a commercial enterprise data center. Specifically, the solution must support standard network protocols and provide the tools that IT managers have come to expect.

3 Isilon® Systems Clustered Storage

Isilon Systems® is now delivering its fourth generation of fully distributed clustered storage solutions and is the clear leader in this emerging category. Isilon’s award-winning family of Isilon IQ products consists of high-performance clustered storage systems that combine an intelligent distributed file system with modular industry-standard hardware to deliver unmatched simplicity and scalability. Isilon IQ was designed for unstructured data and for use in data-intensive markets such as media and entertainment, digital imaging, life sciences, oil and gas, manufacturing and government.

3.1 Isilon IQ: Scalable Distributed File System

At the heart of Isilon’s clustered storage solution is Isilon’s OneFS® patented distributed file system. It combines the three layers of traditional storage architectures (file system, volume manager and RAID) into one unified software layer, creating a single intelligent fully symmetrical file system that spans all nodes within a cluster. OneFS provides a single point of management for large content stores, faster access to large content files, inherent high availability, the ability to easily scale a single cluster’s capacity, up to 10 Gigabytes per second of total throughput and hundreds of terabytes of capacity, all from a single network file system.

OneFS uniquely stripes files and metadata across multiple storage nodes within a cluster, an improvement over the traditional method of striping content across individual disks within a single storage device or volume. This fully distributed approach enables Isilon to deliver breakthrough performance, scalability, availability and manageability.

OneFS provides each node with knowledge of the entire file system layout and where each file and parts of files reside. Accessing any independent node gives a user access to all content in one unified namespace, meaning that there are no volumes or shares, no inflexible volume size limits, no downtime for reconfiguration or expansion of storage and no multiple network drives to manage. Instead, OneFS provides the user with the ease and simplicity of managing a single NAS head with scalability, performance, and flexibility that exceeds SAN systems.

3.2 Isilon IQ: Symmetric Architecture

Each Isilon IQ cluster consists of anywhere from three to 96 Isilon IQ nodes. Each modular, self-contained Isilon IQ node contains disk capacity along with a powerful storage server, CPU, memory and network, all in a compact, 2U rack-mountable system. As additional Isilon IQ nodes are added to a cluster, all aspects of the cluster scale symmetrically, including capacity, throughput, memory, CPU and network connectivity. Isilon IQ nodes automatically work together, harnessing their collective power into a single unified storage system that is tolerant of the failure of ANY piece of hardware, including disks, switches or even entire nodes.

In a fully distributed architecture, it is critical for each node to stay in sync with all other nodes in the cluster. Isilon IQ storage nodes use either Gigabit Ethernet or high-speed, low-latency Infiniband switching fabric for inter-cluster communication, synchronization and all intra-cluster operations. This enables each node to share information with every other node on the system, so that each storage node acts as a fully coherent peer with complete understanding of what the other nodes are doing.

OneFS keeps the nodes synchronized by using a distributed lock manager, coherent caching and a remote block manager that maintains global coherency throughout the entire cluster. It is this global coherency through each node that eliminates any single point of failure for access to the file system. Any node in the cluster can take a write or read request and each node presents the same unified view of the entire file system. All nodes in the cluster are “peers”, so the system is fully symmetrical, eliminating hierarchy and inherent bottlenecks.

3.3 Isilon IQ: Inherent High Availability

Traditional file systems use a master/slave relationship to manage multiple storage resources. Such relationships have intrinsic dependencies and create points of failure within a storage system. The only true way to ensure data integrity and eliminate single points of failure is to make all nodes in a cluster peers. Because each node in an Isilon IQ cluster is a peer, any node can handle a request from any application server to provide the content requested. If any one node were to go down, any other node could fill in, thereby eliminating any single point of failure.

Multi-failure Support: With Isilon IQ, customers can withstand the loss of multiple disks or entire nodes without losing access to any content. OneFS’s unique FlexProtect-AP feature utilizes Reed-Solomon ECC (error correction code), parity striping (from n+1 to n+4) and mirrored file striping (from 2x to 8x) that spans multiple nodes within a cluster. These policies can be set at any level, including cluster, directory, sub-directory, or even at the individual file level. With Isilon, all files are striped across multiple nodes within a cluster, no single node stores 100 percent of any file, and if a node fails, all other nodes in the cluster can still deliver 100 percent of the files without interruption.

Drive Rebuild: In the event of a failure, OneFS automatically rebuilds files across all of the existing distributed free space in the cluster in parallel, eliminating the need to have the dedicated “parity drives” typically required with most traditional storage architectures. OneFS takes advantage of the cluster by leveraging all available free space across all nodes in the cluster to rebuild data. By utilizing this free space while also drawing on the multiple processors and compute power of the cluster, data can be rebuilt five to ten times faster when compared to traditional architectures.

Self-Healing Capabilities: OneFS constantly monitors the health of all files and disks and maintains records of the SMART statistics (e.g. recoverable read errors) available on each drive to anticipate when that drive will fail. When OneFS identifies at-risk components, it preemptively migrates the data off of the “at risk” disk to available free space on the cluster in a manner that is both automatic and transparent to the customer. Once the data is rebuilt, the user is notified to service the suspect drive in advance of actual failure. This feature provides customers with confidence that data written today will be stored 100 percent reliably, bit-for-bit correct, and available whenever it is needed. No other cluster solution today provides this level of data protection reliability.

3.4 Isilon IQ: Single Level of Management

Isilon IQ creates a single, shared pool of all content within the cluster, providing one point of access for users and one point of management for administrators. Today, Isilon has tested and supports growing a single network drive up to 1,000TB (1 PB). Once an Isilon IQ cluster is established, users can connect to any storage node and securely access all of the content within the cluster. This means there is only a single relationship for all applications to connect to and that every application has visibility and access to every file in the entire file system.

As a distributed file system, OneFS eliminates captive server-attached storage and creates substantial improvements in the efficient viewing, sharing, and allocation of resources. Users can enjoy instant access to previously inaccessible content and administrators can dynamically add and reallocate content when capacity needs increase. The result is faster deployment of new business applications and the ability to access and share content anywhere on the network.

One of the key benefits of OneFS is the ease with which it allows users to add both performance and capacity to an Isilon cluster without downtime or application changes. System administrators simply plug in a new Isilon IQ storage node, connect the network cables and turn it on. The cluster automatically detects the newly added storage node and begins to configure it to become a member of the cluster. In less than 60 seconds, a user can grow available capacity and grow the single file system by terabytes.

Isilon’s unique modular approach offers a building block, or “pay-as-you-grow”, solution so customers aren’t forced to buy more storage capacity than is needed up front. Unlike existing systems, the modular design of Isilon IQ also enables customers to incorporate new technologies in the same cluster, such as adding a node with higher-density disk drives or more Gigabit Ethernet ports for higher performance.

Finally, OneFS automates several advanced features that for traditional storage solutions are manually intensive operations. Two of these are Isilon’s AutoBalance and SmartConnect features.

AutoBalance: When a system administrator adds a new storage resource, the common next step is to manually migrate content from an existing storage device to the new one in order to balance capacity across resources. Isilon IQ delivers automated content migration when scaling and totally eliminates the need for business application outages. Using its AutoBalance feature, a new storage node can be added to an Isilon IQ cluster in less than 60 seconds. As soon as the node is turned on and network cables are connected, AutoBalance immediately begins to migrate content from the existing storage nodes to the newly added node across the cluster interconnect back-end switch, re-balancing all of the content across all nodes in the cluster and maximizing utilization.

SmartConnect: Another OneFS automation feature is SmartConnect. The SmartConnect feature enables client connection load balancing and dynamic NFS failover and failback of client connections across storage nodes to provide optimal utilization of the cluster resources. Without the need to install client-side drivers, administrators can easily manage a large and growing number of clients and rest assured that in the event of a system failure, in-flight reads and writes will successfully finish without failing. By providing a single virtual host name, SmartConnect makes it easy for IT administrators to manage client connections. SmartConnect applies intelligent policies (i.e. CPU utilization, connection count, throughput) to simplify the connection management task by automatically distributing the client connections across the cluster based on the defined policies to maximize performance.

3.5 Isilon IQ: Linear Scalability in Performance

One of the key benefits of OneFS is the ease with which it allows users to add both performance and capacity to an Isilon cluster in a near linear fashion. See Graph below.

Unlike other storage systems that communicate below RAID at the physical disk level, OneFS controls the optimal placement of files directly on the disk and dramatically improves performance of the disk subsystem when delivering data. Each addition of an Isilon IQ storage node or Accelerator increases memory, CPU power, journal space and disk spindles. A new Isilon IQ node equips the aggregate of the cluster with approximately 700 megabits per second of available throughput that scales linearly, allowing customers to easily meet increasing bandwidth needs.

The other enabling technology that allows Isilon IQ to reach break-through linear scalability of performance is use of Infiniband as the high-speed, low-latency intra-cluster interconnect. A backend Infiniband switch allows the Isilon cluster to experience nearly zero latency in keeping the nodes in sync, allowing for optimal overall cluster performance. In fact, Isilon testing has shown that this enabling technology allows an Isilon solution to obtain much higher performance, much more quickly, than with a GigE backend interconnect. Isilon is the first and only clustered storage solution to utilize Infiniband as a clustered storage interconnect, and today over 90% of Isilon customers deploy this option.

3.6 Isilon IQ: Enterprise Ready

Now in its fourth generation, Isilon IQ has delivered on many of the features that meet the requirements for integration into the commercial enterprise. Isilon IQ is built to work in a wide array of existing environments without the use of any proprietary tools or protocols. Industry standard file-level network protocols (i.e. NFS, CIFS, FTP, HTTP, SNMP, NDMP) allow Isilon IQ to easily interoperate with existing systems. In short, customers seamlessly deploy Isilon IQ in their existing data centers right next to their traditional storage systems from vendors such as EMC and Network Appliance.

4 Conclusion

There is a revolution well underway in the storage industry: the movement to Clustered Storage architectures. This technology shift is driving huge business benefits:
• Reduces storage costs: Costs 40-60% less than traditional storage solutions to own and operate;
• Increases workflow productivity: Get up to 5x more work done with existing staff and resources;
• Increases IT operating leverage: Manage 10x more storage with existing IT staff;
• Unlocks new revenues: Create and distribute more products, faster.

Adoption of Clustered Storage solutions is increasing at an exponential pace, and Isilon Systems is at the forefront of the paradigm shift to Clustered Storage architectures.
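
As an illustration of the n+1 parity striping mentioned in Section 3.3, the following minimal Python sketch splits a file into blocks, one per data node, adds a single XOR parity block, and rebuilds one lost block from the survivors. It is a simplified stand-in and not Isilon's implementation: the paper states that FlexProtect-AP uses Reed-Solomon codes and supports up to n+4, whereas plain XOR only covers the n+1 case. The function names (`xor_blocks`, `stripe_with_parity`, `rebuild`) and the zero-byte padding are assumptions made for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data: bytes, data_nodes: int) -> List[bytes]:
    """Split data into `data_nodes` equal blocks plus one XOR parity block.

    Hypothetical layout: block i would live on node i, parity on node n.
    """
    block_size = -(-len(data) // data_nodes)  # ceiling division
    padded = data.ljust(block_size * data_nodes, b"\0")  # zero padding (assumption)
    blocks = [padded[i * block_size:(i + 1) * block_size] for i in range(data_nodes)]
    return blocks + [xor_blocks(blocks)]


def rebuild(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Recover the single missing block (marked None) by XOR-ing the survivors."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("n+1 parity can rebuild exactly one missing block")
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


stripe = stripe_with_parity(b"clustered storage", data_nodes=4)
damaged = list(stripe)
damaged[2] = None  # simulate the loss of the node holding block 2
assert rebuild(damaged) == stripe
```

Because every surviving node contributes one block to the XOR, the reconstruction work is naturally spread over the cluster, which is the intuition behind the parallel rebuild described under Drive Rebuild. Tolerating more than one simultaneous failure would require a code such as Reed-Solomon rather than a single XOR parity block.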