Beating the big data blues

Henrik Pederson, Senior Technical Account Manager, CommVault Systems, explains how an integrated data management strategy can help to manage big data and reduce storage costs. 

Data is the lifeblood of a successful organisation, and effective management of data resources plays a vital role in its smooth operation. The ever-growing number of processes and regulations results in the accumulation of large amounts of both business- and non-business-related content. According to a survey conducted by Gartner, 47 percent of large enterprises identify data growth as their biggest data centre hardware infrastructure challenge. On average, enterprise data capacity is growing at 40 to 60 percent per year. Research further shows that more than 52 percent of an organisation’s digital content is unstructured data such as files, documents, images and video, while just 31 percent is structured.

Over 70 percent of this content is generated by end-users within the organisation. Employees often store personal data on company resources because they know it will be securely maintained and regularly backed up. The data pool therefore contains a mixture of business-critical data and data with little or no business value. Even business-related data can go stale over time, becoming inactive and losing business relevance. The failure to analyse data means that all data is treated in the same manner, which leads to ineffective utilisation of company resources.

The real challenge that organisations face lies not in having to deal with data growth – that is inevitable – but in the effective and strategic management of data. After all, while data growth is projected at 40 to 60 percent per year, growth in IT budgets is estimated at just 2.6 percent.
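To make the arithmetic behind that gap concrete, the following sketch compounds both figures over three years. The starting values are arbitrary indices chosen for illustration, not figures from the survey, and the 40 percent rate is the low end of the quoted range.

```python
# Illustrative compounding of the gap between data growth (40% per year,
# low end of the quoted range) and IT budget growth (2.6% per year).
# Starting values are assumed indices, not survey figures.

data_index = 100.0    # data volume, indexed to 100 (assumed)
budget_index = 100.0  # IT budget, indexed to 100 (assumed)

for year in range(1, 4):
    data_index *= 1.40     # 40% annual data growth
    budget_index *= 1.026  # 2.6% annual budget growth
    print(f"Year {year}: data {data_index:.0f}, budget {budget_index:.1f}")
```

After three years the data index has risen by roughly 174 percent while the budget index has risen by about 8 percent, which is the disproportionality the article goes on to address.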

Factors contributing to unnecessary data growth

Long-term retention is a factor that complicates the overall data management process. Data may be retained for business reasons, for historical reasons, to meet end-user requirements, or to comply with policies and regulations prescribed by the government or the organisation itself. As the number of retention policies, both regulatory and home-grown, adds up, the organisation and storage of data becomes more complex.

Maintaining multiple copies of the same data is both inefficient and expensive. Apart from causing inconsistency and imposing a large overhead, redundancy can affect long-running processes such as backup. Although the cost of storage devices is falling, holding redundant data on them increases the time taken for backup, causing a significant increase in network overhead and bandwidth requirements. Furthermore, most large organisations with multiple locations worldwide generate large volumes of data on a constant basis. Because of this global dispersion, backup windows are constantly shrinking, so only critical, business-relevant data should be identified and selected for regular backup. So what techniques do organisations employ to reduce their data storage requirements and make effective use of resources?

Resource acquisition – the quick fix

The most common tactical reaction to the data growth problem is simply to buy more storage. Given the falling cost of storage, this knee-jerk reaction serves as a quick fix, but it often reflects an inability to carry out predictive capacity planning. The hoarding of data is further compounded by indefinite retention policies, under which data is stored without consideration for its actual content.


Archiving – moving, not reducing

Data archiving, along with data tiering, is considered an effective data reduction technology. But blind archiving – carried out without first gaining insight into the data landscape or applying any governing policy – simply moves data between tiers and does nothing to reduce the total volume of data being managed.

Deduplication – beating the bloat

Deduplication, or dedupe for short, is probably the most talked-about data management strategy, and perhaps the leading data reduction technology, permitting sizeable reductions in data volume. Traditionally, organisations opt for a hardware-based approach to deduplication, which eliminates redundant data on back-end devices. However, this methodology brings an increase in operational management costs and affects network overhead and bandwidth – factors that contribute significantly to the yearly rise in storage management costs.
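As an illustration of the underlying idea, the sketch below implements a minimal block-level deduplication pass using content hashing. The fixed 4 KB block size and the function names are illustrative assumptions, not a description of any vendor's implementation; hardware appliances typically do this at the device level, often with variable-length blocks.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed block size in bytes; an illustrative choice


def deduplicate(data: bytes):
    """Split data into fixed-size blocks and store each unique block once.

    Returns the unique-block store and the ordered list of block hashes
    ("recipe") needed to reconstruct the original stream.
    """
    store = {}   # hash -> block contents, kept once per unique block
    recipe = []  # ordered hashes to rebuild the stream
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Reassemble the original stream from the recipe."""
    return b"".join(store[h] for h in recipe)


# A stream with heavy redundancy: ten copies of the same 4 KB block.
stream = (b"x" * BLOCK_SIZE) * 10
store, recipe = deduplicate(stream)
print(len(stream), sum(len(b) for b in store.values()))  # 40960 vs 4096
assert restore(store, recipe) == stream  # reconstruction is lossless
```

The point the sketch makes is the one in the text: redundancy is removed at the block store, so ten logical copies consume the space of one, but the hashing and lookup work has to happen somewhere, and in a hardware-based approach that work sits on the back end.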

Applying these data management strategies individually and independently will not enable efficient capacity planning, which keeps capacity ahead of demand. Nor will it deliver a reduction in data volume or operating expenses. So if the three most widely used strategies fall short, what is the best solution?

Basics of integrated data reduction

The integrated data reduction approach is the hot topic in the data management world. By applying a combination of the three strategies, organisations can reduce their overall data volume and migrate the retained data to the most appropriate tier of storage, thereby achieving significant reductions in storage costs. An integrated data reduction approach is implemented in the following manner:

  • First, through storage resource management (SRM), the organisation can gain visibility into the data. This visibility is the key to understanding how to reduce the content volume. It helps in making informed decisions based on the business value of the data. This information can then be used to determine what should be deleted and what should be archived, and how tiering of data should be carried out.
  • SRM then makes intelligent archiving possible, turning it into an enabler of data reduction. Once the right candidates for archiving have been identified, granular policies can be applied instead of hoarding large volumes of data. This phase also involves deleting inactive data and freeing up critical primary storage resources.
  • Once the data has been aligned with the appropriate tier and the primary storage pool has been optimised, deduplication is applied globally across the backup and archive pools to reduce the amount of data held on the back end. Deduplication is the key to an integrated data reduction strategy, as it removes redundancy in the backup and archive pools regardless of the back-end storage devices used. While deduplication ratios of 20:1 and higher are not unusual, even a conservative ratio of 5:1 would drastically reduce operational management expenses.
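A back-of-the-envelope calculation shows what even the conservative ratio in the final step implies. The 50 TB pool size here is an assumed figure for illustration only.

```python
# Illustrative savings from the conservative 5:1 deduplication ratio
# mentioned above; the 50 TB pool size is an assumption, not a quoted figure.

pool_tb = 50.0     # combined backup and archive pool, in TB (assumed)
dedup_ratio = 5.0  # conservative 5:1 reduction

stored_tb = pool_tb / dedup_ratio           # physical capacity actually consumed
savings_pct = (1 - stored_tb / pool_tb) * 100
print(f"{pool_tb:.0f} TB shrinks to {stored_tb:.0f} TB ({savings_pct:.0f}% saved)")
```

Even at 5:1, four fifths of the back-end capacity (and the network traffic needed to keep filling it) disappears; at 20:1 the stored footprint drops to a twentieth of the logical pool.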

The disproportion between data growth and budget growth is set to increase, and companies are starting to realise that addressing the problem in an ad hoc manner is ineffective and carries severe long-term implications. When properly implemented, an integrated data management strategy can dramatically reduce inefficiency, enhance manageability and cut operational expenses.
