The hybrid cloud offers businesses tremendous flexibility in how they use their computing resources, allowing them to choose the most appropriate and cost-effective infrastructure model for each workload. However, the split nature of hybrid cloud infrastructure means companies must constantly decide where to place each application and each data set, as well as how to manage it. For big data in the hybrid cloud, companies face three major decisions: how to manage the data at rest, the data in motion, and the data in use.

Data at Rest in the Hybrid Cloud

Many companies struggle to decide which data should reside locally and which can go in the public cloud. In most cases, “data gravity” should be the deciding factor: data generated outside the enterprise should remain outside the enterprise.

This external gravity could also pull in data generated by edge devices far from the corporate data center. Data generated inside the data center should generally remain there.
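The data gravity rule of thumb above can be written down as a small placement function. This is only a sketch of the guideline as stated here, not a real policy engine: the names `Origin`, `Placement`, and `place_by_gravity` are hypothetical, and sending edge data to the public cloud is an assumption drawn from the "could also pull in" wording.

```python
from enum import Enum


class Origin(Enum):
    """Where a data set is generated (hypothetical categories)."""
    EXTERNAL = "external"        # generated outside the enterprise
    EDGE = "edge"                # edge devices far from the corporate data center
    DATA_CENTER = "data_center"  # generated inside the corporate data center


class Placement(Enum):
    """Where a data set should live at rest."""
    PUBLIC_CLOUD = "public_cloud"
    PRIVATE_CLOUD = "private_cloud"


def place_by_gravity(origin: Origin) -> Placement:
    """Apply the data gravity guideline: data stays near where it is generated."""
    if origin is Origin.DATA_CENTER:
        # Data generated inside the data center should generally remain there.
        return Placement.PRIVATE_CLOUD
    # Assumption: external and edge data are both pulled toward the public cloud.
    return Placement.PUBLIC_CLOUD


if __name__ == "__main__":
    for origin in Origin:
        print(f"{origin.value} -> {place_by_gravity(origin).value}")
```

In practice, compliance, cost, and latency requirements may override this default, so the function is best read as a starting point for the decision.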

For the data that resides in the private cloud, companies then need to decide how they’ll implement a scalable storage system. True web-scale IT is beyond the technical capability of most enterprises. Instead, clustered NAS, such as that provided by NetApp, provides scalable storage using a familiar technology.

Data in Motion in the Hybrid Cloud

Although data will have a natural home due to data gravity, this doesn’t eliminate the need for data to migrate between environments. An application may be prototyped in the public cloud, with final data and workloads running in the private cloud. An application may generate live data in one environment but need historical data held in the other; conversely, an application that relies on historical data may need to be updated with the latest data that became historical a millisecond ago. Although each data set has a source, there may be a continuous stream of data between public and private clouds.

Making these data transfers work requires companies to address their network bandwidth, evaluate data latency, and ensure that data is in the appropriate format. Tools such as the NetApp Cloud Sync service simplify and streamline transferring data between local storage and Amazon Web Services public cloud instances.
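To make the bandwidth question concrete, a back-of-the-envelope estimate of transfer time is often enough to show whether a migration is realistic. The sketch below is illustrative arithmetic only: the function name `transfer_hours` and the default 80% link efficiency are assumptions, and it is not part of any NetApp or AWS tool.

```python
def transfer_hours(size_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_gb over a link of bandwidth_mbps.

    efficiency is the assumed fraction of nominal bandwidth actually achieved
    after protocol overhead and contention (0.8 is a guess, not a measurement).
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")

    size_megabits = size_gb * 8 * 1000  # decimal gigabytes to megabits
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 10 TB (10,000 GB) over a 1 Gbps link at 80% efficiency: about 27.8 hours.
    print(f"{transfer_hours(10_000, 1_000):.1f} hours")
```

An estimate like this covers bulk transfers only. A continuous stream between clouds also depends on round-trip latency, which this calculation ignores.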

Data in Use in the Hybrid Cloud

For some of the new data formats used by big data in the hybrid cloud, data governance may not be a relevant concept. The data may be unstructured, meaning there’s no metadata or data model to be managed, and there may be no need for a system of record. Tools such as Veritas Information Map can help companies make sense of their unstructured big data.

Even for this data, there are still concerns about data usage, particularly in a public cloud where servers are shared with other companies and accessible to the cloud provider’s employees. In addition, for big data that is structured or has data models, it’s important to determine whether the public or private cloud’s copy of the data (if it’s in both places) holds the official value for that data.
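One simple way to settle which copy is official is to name a system of record for each data set and always read the official value from that location. The sketch below assumes a hypothetical `DatasetCopy` record and `official_value` helper; it illustrates the idea and does not describe how any particular product works.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DatasetCopy:
    """One copy of a data set held in a given environment (hypothetical model)."""
    location: str  # e.g. "public" or "private"
    value: str     # the value held by this copy


def official_value(copies: Iterable[DatasetCopy], system_of_record: str) -> str:
    """Return the value from the copy held in the designated system of record."""
    for copy in copies:
        if copy.location == system_of_record:
            return copy.value
    raise LookupError(f"no copy found in system of record {system_of_record!r}")


if __name__ == "__main__":
    copies = [
        DatasetCopy(location="public", value="balance=100"),
        DatasetCopy(location="private", value="balance=120"),
    ]
    # With the private cloud designated as the system of record,
    # its copy wins even though the public copy disagrees.
    print(official_value(copies, system_of_record="private"))
```

Raising an error when the designated location holds no copy is a deliberate choice here: silently falling back to the other copy would hide exactly the ambiguity this decision is meant to remove.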

To learn more about how NetApp and Veritas products can help you get your big data challenges under control, contact VAST.