Together with social, mobile and cloud, analytics and associated data technologies have emerged as core business disruptors in the digital age. As companies began the shift from being data-generating to data-powered organisations in 2017, data and analytics became the centre of gravity for many enterprises. In 2018, these technologies need to start delivering value. Here are the approaches, roles and concerns that will drive data analytics strategies in the year ahead.
Data lakes will need to demonstrate business value or die
Data has been accumulating in the enterprise at a torrid pace for years. The Internet of Things (IoT) will only accelerate the creation of data as data sources move from web to mobile to machines.
“This has created a dire need to scale out data pipelines in a cost-effective way,” says Guy Churchward, CEO of real-time streaming data platform provider DataTorrent.
For many enterprises, buoyed by technologies like Apache Hadoop, the answer was to create data lakes — enterprise-wide data management platforms for storing all of an organisation’s data in native formats. Data lakes promised to break down information silos by providing a single data repository the entire organisation could use for everything from business analytics to data mining. Raw and ungoverned, data lakes have been pitched as a big data catch-all and cure-all.
But while data lakes have proven successful for storing massive quantities of data, gaining actionable insights from that data has proven difficult.
To survive 2018, data lakes will have to start proving their business value, says Ken Hoang, vice president of strategy and alliances at data catalog specialist Alation.
“The new dumping ground of data — data lakes — has gone through experimental deployments over the last few years, and will start to be shut down unless they prove that they can deliver value,” Hoang says. “The hallmark for a successful data lake will be having an enterprise catalog that brings information discovery, AI, and information stewarding together to deliver new insights to the business.”
However, Hoang doesn’t believe all is lost for data lakes. He predicts data lakes and other large data hubs can find a new lease on life with what he calls “super hubs” that can deliver “context-as-a-service” via machine learning.
“Deployments of large data hubs over the last 25 years (e.g., data warehouses, master data management, data lakes, Salesforce and ERP) resulted in more data silos that are not easily understood, related, or shared,” Hoang says. “A hub of hubs will bring the ability to relate assets across these hubs, enabling context-as-a-service. This, in turn, will drive more relevant and powerful predictive insights to enable faster and better operational business results.”
Langley Eide, chief strategy officer of self-service data analytics specialist Alteryx, says IT won't be the only ones on the hook when it comes to making data lakes deliver value: line-of-business (LOB) analysts and chief digital officers (CDOs) will also have to take responsibility in 2018.
“Most analysts have not taken advantage of the vast amount of unstructured resources like clickstream data, IoT data, log data, etc., that have flooded their data lakes — largely because it’s difficult to do so,” Eide says. “But truthfully, analysts aren’t doing their job if they leave this data untouched. It’s widely understood that many data lakes are underperforming assets – people don’t know what’s in there, how to access it, or how to create insights from the data. This reality will change in 2018, as more CDOs and enterprises want better ROI for their data lakes.”
Eide predicts that 2018 will see analysts replacing “brute force” tools like Excel and SQL with more programmatic techniques and technologies, like data cataloging, to discover and get more value out of the data.
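The kind of programmatic cataloging Eide describes can be illustrated with a small sketch: instead of opening each file by hand in Excel, a script profiles every dataset and records its columns, inferred types, and row counts so analysts can discover what the lake actually contains. The function and sample records below are hypothetical, not any vendor's API.

```python
def infer_type(value):
    """Guess a coarse type for a single field value."""
    for cast, name in ((int, "integer"), (float, "float")):
        try:
            cast(value)
            return name
        except ValueError:
            pass
    return "string"

def profile_dataset(name, records):
    """Build a minimal catalog entry: columns, inferred types, row count."""
    columns = {}
    for row in records:
        for key, value in row.items():
            columns.setdefault(key, set()).add(infer_type(value))
    return {
        "dataset": name,
        "rows": len(records),
        "schema": {col: sorted(types) for col, types in columns.items()},
    }

# Hypothetical clickstream extract sitting untouched in a data lake.
clickstream = [
    {"user_id": "42", "page": "/home", "dwell_ms": "310.5"},
    {"user_id": "7", "page": "/pricing", "dwell_ms": "88.0"},
]
entry = profile_dataset("clickstream_2018_01", clickstream)
print(entry["schema"])
```

Run across every dataset in the lake, entries like this form the searchable catalog Hoang and Eide both point to as the difference between a data swamp and a usable asset.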
The CDO will come of age
As part of this new push to get better insights from data, Eide also predicts the CDO role will come into its own in 2018.
“Data is essentially the new oil, and the CDO is beginning to be recognised as the linchpin for tackling one of the most important problems in enterprises today: driving value from data,” Eide says. “Often with a budget of less than $10 million, one of the biggest challenges and opportunities for CDOs is making the much-touted self-service opportunity a reality by bringing corporate data assets closer to line-of-business users. In 2018, the CDOs that work to strike a balance between a centralized function and capabilities embedded in LOB will ultimately land the larger budgets.”
Eide believes CDOs that enable resources, skills, and functionality to shift rapidly between centres of excellence and LOB will find the most success. For this, Eide says, agile platforms and methodologies are key.
Rise of the data curator?
Tomer Shiran, CEO and co-founder of analytics startup Dremio, a driving force behind the open source Apache Arrow project, predicts that enterprises will see the need for a new role: the data curator.
The data curator, Shiran says, sits between data consumers (analysts and data scientists who use tools like Tableau and Python to answer important questions with data) and data engineers (the people who move and transform data between systems using scripting languages, Spark, Hive, and MapReduce). To be successful, data curators must understand the meaning of the data as well as the technologies that are applied to the data.
“The data curator is responsible for understanding the types of analysis that need to be performed by different groups across the organisation, what datasets are well suited for this work, and the steps involved in taking the data from its raw state to the shape and form needed for the job a data consumer will perform,” Shiran says. “The data curator uses systems such as self-service data platforms to accelerate the end-to-end process of providing data consumers access to essential datasets without making endless copies of data.”
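The "without making endless copies" idea can be sketched as a curated view: the curator defines a chain of transformations over the raw source, and nothing is materialised until a consumer actually reads it, so analysts query a view rather than a duplicated extract. The class and sample data below are a hypothetical illustration, not Dremio's or any other platform's API.

```python
class CuratedView:
    """A lazily evaluated view over raw records: transformations are
    recorded when the view is defined and applied only on read."""

    def __init__(self, source, steps=()):
        self._source = source      # raw records (or another view)
        self._steps = list(steps)  # transformations applied on read

    def filter(self, predicate):
        return CuratedView(self._source, self._steps + [("filter", predicate)])

    def select(self, *columns):
        pick = lambda row: {c: row[c] for c in columns}
        return CuratedView(self._source, self._steps + [("map", pick)])

    def rows(self):
        """Materialise nothing until a consumer actually iterates."""
        data = iter(self._source)
        for kind, fn in self._steps:
            data = filter(fn, data) if kind == "filter" else map(fn, data)
        return list(data)

# Hypothetical raw dataset a curator exposes to EU-focused analysts.
raw_orders = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
]
eu_orders = (CuratedView(raw_orders)
             .filter(lambda r: r["region"] == "EU")
             .select("id", "amount"))
print(eu_orders.rows())
```

Because each method returns a new view rather than new data, many teams can share one raw source while each consumes the shape they need.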
Data governance strategies will be key themes for all C-level executives
The European Union’s General Data Protection Regulation (GDPR) is set to go into effect on May 25, 2018, and it looms like a spectre over the analytics field, though not all enterprises are prepared.
The GDPR will apply directly in all EU member states, and it radically changes how companies must seek consent to collect and process the data of EU citizens, explain lawyers from Morrison & Foerster’s Global Privacy + Data Security Group: Miriam Wugmeister, Global Privacy co-chair; Lokke Moerel, European Privacy Expert; and John Carlin, Global Risk and Crisis Management chair (and former Assistant Attorney General for the U.S. Department of Justice’s National Security Division).
“Companies that rely on consent for all their processing operations will no longer be able to do so, and will need other legal bases (i.e., contractual necessity and legitimate interest),” they explain. “Companies will need to implement a whole new ecosystem for notice and consents.”
Even though GDPR fines are potentially massive — the administrative fines can be up to 20 million euros or 4 percent of annual global turnover, whichever is higher — many enterprises, particularly in the U.S., are not prepared.
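The "whichever is higher" rule is worth making concrete, since for large firms the turnover-based figure dwarfs the fixed floor. A quick worked example (the turnover figure below is made up for illustration):

```python
def gdpr_fine_ceiling(annual_global_turnover_eur):
    """Upper tier of GDPR administrative fines: the greater of
    EUR 20 million or 4% of annual global turnover."""
    return max(20_000_000, 0.04 * annual_global_turnover_eur)

# A company with EUR 2 billion in turnover: 4% is EUR 80 million,
# which exceeds the EUR 20 million floor.
print(gdpr_fine_ceiling(2_000_000_000))  # 80000000.0
```

For any company with turnover above 500 million euros, the 4 percent prong sets the ceiling.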
“When the Y2K boom came around, everyone was preparing for odds that they may or may not face,” says Scott Gnau, CTO of Hortonworks. “Today, it seems that barely anyone is properly preparing for the GDPR being enforced in May 2018. Why not? We’re currently in a phase where every organisation is not only trying to plan for ‘what’s next,’ but struggling to maintain and deal with issues that need solving now. Many organisations are likely relying on chief security officers to define the rules, systems, parameters, etc., to help their global system integrators figure out the best course of action. That is not a realistic expectation to put on one individual’s role.”
Complying with the GDPR properly requires a C-suite that is informed, prepared, and communicative with all facets of the organisation, Gnau says. Organisations will need a better handle on the overall governance of their data assets. But large breaches, like the Equifax breach that came to light in 2017, mean they will struggle to balance providing employees self-service access to data while protecting that same data from potential threats.
As a result, Gnau predicts data governance will be a focal point for all organisations in 2018.
“A key goal should be developing a system that balances democratization of data, access, self-service analytics, and regulation,” Gnau says. “The way we architect data safely going forward will have an impact on everyone — customers in the U.S. and overseas, the media, your partners, and more.”
Zachary Bosin, director of solution marketing for multi-cloud data management specialist Veritas Technologies, predicts a U.S. company will be one of the first to be fined under the GDPR.
“Despite the impending deadline, only 31 percent of companies surveyed by Veritas worldwide believe they are GDPR-compliant,” Bosin says. “Penalties for non-compliance are steep, and this regulation will impact every and any company that deals with EU citizens.”