Constraints on Applications

The Constraints of Disk Technology on Application Design

When magnetic drum technology was first introduced, access time to data was as fast as the compute technology of its day. The major components were magnetic heads and revolving magnetic media. Compute and persistent storage were balanced, with ten microseconds for compute and ten microseconds for access to persistent storage (see Note 3 of the linked reference for more detail).

Over time, disk technology was driven by two factors: the development of better magnetic heads, and improvements in the movement of the heads and/or the magnetic media. Better magnetic heads reduced the size of the magnetic “blob”, so tracks could be placed closer together, with more data per track. Magnetic heads were fabricated with the same fundamental technology used to fabricate memory and logic chips, so their rate of improvement followed Moore’s Law; until recently, it was 30-35% per year. Data density and the cost per unit of storage have improved in line with compute power.

However, the rate at which magnetic data can be read or written (bandwidth) has improved by only about 10-12% per year, less than the square root of the improvement in data density. The physics of spinning the media faster soon reached a plateau, as the outer edge of disk drives approached the speed of sound. Disks became smaller and head movement between tracks improved, but nothing has had much impact on the time to access data. Magnetic disks are limited by rotation speed: access times are measured in milliseconds, while compute cycles are measured in nanoseconds.
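The square-root relationship and the millisecond-versus-nanosecond gap can be checked with a little arithmetic. The Python sketch below assumes that bandwidth at a fixed rotation speed scales with linear bit density, which grows roughly as the square root of areal density when bits per track and tracks per inch improve at the same rate. It also assumes a 1 ns compute cycle and uses 7,200 and 15,000 RPM as example rotation speeds; none of these parameters come from the original text.

```python
import math

def max_bandwidth_growth(density_growth: float) -> float:
    """Annual bandwidth growth implied if bandwidth tracks linear bit density,
    assumed here to grow as the square root of areal density at fixed RPM."""
    return math.sqrt(1.0 + density_growth) - 1.0

def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in ms."""
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2.0 * 1000.0

# Density growth rates quoted in the text (30-35% per year).
for growth in (0.30, 0.35):
    print(f"density +{growth:.0%}/yr -> bandwidth at most "
          f"+{max_bandwidth_growth(growth):.1%}/yr")

ASSUMED_CYCLE_NS = 1.0  # illustrative compute cycle time, not from the source
for rpm in (7200, 15000):  # example rotation speeds, not from the source
    latency_ms = avg_rotational_latency_ms(rpm)
    cycles = latency_ms * 1e6 / ASSUMED_CYCLE_NS  # 1 ms = 1,000,000 ns
    print(f"{rpm} RPM: {latency_ms:.2f} ms average rotational wait "
          f"= about {cycles:,.0f} compute cycles")
```

Under these assumptions, 30-35% annual density gains would allow at most about 14-16% annual bandwidth gains, so the 10-12% cited above sits below that bound. A single average rotational wait of 2-4 ms also spans millions of nanosecond-scale compute cycles, before any seek time is added.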

The history of storage array development has been the application of technologies to mitigate this disparity in cycle time. Read buffers and smart algorithms attempt to maximize the probability that data required by applications will be found in DRAM. Batteries and capacitors protect DRAM write buffers, allowing high burst rates of data to be written with low latency. Sequential data is striped across multiple disk drives to increase data rates. All these technologies help only if applications are designed in a certain way, with small working sets and limited functionality. An entire industrial market and supply chain has been built around “standard” disks from Seagate, HGST, and others, housed in disk shelves and surrounded by proprietary software running on Intel processors. The proprietary software from EMC, NetApp, HP, IBM, and others has been the “glue” that delivers high functionality, at a very high uplift (10-15:1) over the “Fry’s Price”, the base cost of a single disk drive.
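Why caching helps mainly when working sets are small can be shown with a simple expected-value model. This is a minimal sketch, assuming a DRAM hit costs about 1 microsecond and a disk miss about 5 ms; these are illustrative figures, not taken from the source. The striping function likewise assumes throughput scales linearly with the number of drives, and the 150 MB/s per-drive figure is hypothetical. Write buffering is not modeled.

```python
def effective_read_latency_us(hit_ratio: float, dram_us: float, disk_us: float) -> float:
    """Expected read latency (microseconds) when a fraction `hit_ratio`
    of reads is served from the DRAM cache and the rest go to disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * dram_us + (1.0 - hit_ratio) * disk_us

def striped_sequential_mb_s(per_drive_mb_s: float, drives: int) -> float:
    """Idealized sequential throughput with data striped evenly across drives;
    ignores controller, bus, and stripe-alignment overheads."""
    if drives < 1:
        raise ValueError("drives must be at least 1")
    return per_drive_mb_s * drives

# Illustrative assumptions, not measurements: 1 us DRAM hit, 5 ms (5,000 us) disk miss.
DRAM_US, DISK_US = 1.0, 5000.0
for hit in (0.99, 0.95, 0.80):
    avg = effective_read_latency_us(hit, DRAM_US, DISK_US)
    print(f"hit ratio {hit:.0%}: average read latency {avg:,.1f} us")

# Hypothetical 150 MB/s per drive, striped across 8 drives.
print(f"8-drive stripe: up to {striped_sequential_mb_s(150.0, 8):,.0f} MB/s sequential")
```

With these numbers, dropping the hit ratio from 99% to 80%, as a larger working set would tend to do, raises the average read latency roughly twenty-fold, because the rare disk misses dominate the average.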

The attempts to mitigate the mechanical problems have been significant, but the impact of disk storage limitations on application design has also been stark. As Wikibon has pointed out in previous research, applications have been written and designed with small working sets that fit into small buffers. Write rates have been suppressed to meet write buffer constraints. The number of database calls within a business transaction has been severely limited by the high variance in disk-based storage response times; this is necessary to reduce operational and application maintenance complexity and cost. Large applications are still designed as a series of modules with loosely coupled databases. Data warehouses and analytics are separated from operational transaction systems.
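The effect of storage-latency variance on transactions that make many database calls can be sketched with basic probability. Assuming, purely for illustration, that each call independently has a 1% chance of hitting a slow disk access (for example a queuing delay or a long seek), the chance that a transaction sees at least one slow access is 1 - (1 - p)^n for n calls. The independence assumption and the 1% figure are not from the source.

```python
def prob_any_slow_io(calls: int, p_slow: float) -> float:
    """Probability that at least one of `calls` independent I/Os is slow,
    when each one is slow with probability `p_slow`."""
    if calls < 0 or not 0.0 <= p_slow <= 1.0:
        raise ValueError("calls must be >= 0 and p_slow must be in [0, 1]")
    return 1.0 - (1.0 - p_slow) ** calls

# Illustrative assumption: 1% of disk accesses land in a slow tail.
P_SLOW = 0.01
for n in (1, 10, 100, 500):
    share = prob_any_slow_io(n, P_SLOW)
    print(f"{n:>4} calls per transaction: {share:.1%} see at least one slow I/O")
```

Under that assumption, about 9.6% of transactions with 10 calls and about 63% of transactions with 100 calls would encounter at least one slow I/O, which is one reason designers have kept the number of calls per transaction small.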

The bottom line is that current applications depend on very complex infrastructure software to mitigate the ever-increasing gap between compute and persistent storage cycle times. This complex software is spread between the controllers of storage arrays and increasingly complex databases from Oracle, IBM, Microsoft, and others that protect applications. NoSQL databases offer apparent relief for some applications but in most instances just move the complexity from the database back to the programmer. Previous Wikibon research has shown that the design of application suites such as SAP and many others is also constrained by disk-based storage, and that apparently “simple” projects combining multiple landscapes and integrating business processes are in reality difficult, time-consuming, and risky. The cost of developing, operating, and managing enterprise applications has become very high. Most of the problems in operating current applications are storage related, and almost all the constraints in application design are storage related. Most DBAs and infrastructure specialists assume disk-based storage is guilty until proven innocent.

The business impact of this is immense. The complexity and resulting cost of deploying additional database, storage array, and storage management software to enable moderate or large-scale applications is very high. The disappointment business leaders feel at IT's failure to meet its potential to decrease business costs can be traced squarely to this complexity, caused almost exclusively by the failure of mechanical persistent storage to keep up with compute and network technologies. The magnetic disk drive is the last mechanical device in the path of the electronic data center.