In today’s digital age, organizations are challenged to keep up with the unprecedented pace of data generation and the plethora of enterprise systems and digital technologies that collect all types of data. This is coupled with the need to rapidly and efficiently analyze these large volumes of data to generate insights and intelligence in order to maximize their business value. As a result, big data platforms have become an essential foundation for organizations to efficiently deploy data solutions that provide timely data-driven business decisions and competitive advantage.

“Data analytics and intelligence solutions are proliferating across organizations to drive business growth. Organizations should build big data platforms as a solid foundation for deploying data solutions at scale. These data platforms should be purpose-built for the business, as they are only as good as the business insights and intelligence they enable; and they should be future-proof, benefiting from the continuous advances in data infrastructure services and technologies.”
Oussama Ahmad, Data Consulting Partner at Artefact

Key Objectives of a Big Data Platform

Big data platforms aim to break down data silos and integrate the different types of data sources required to implement advanced data analytics and intelligence solutions. They provide a scalable and flexible infrastructure for collecting, storing, and analyzing large volumes of data from different sources. These platforms should leverage best-in-class data management services and technologies and meet three main objectives:

  • Centralize data sources: A big data platform should break down data silos by automatically ingesting and storing data sources of different types and sizes, from both enterprise data systems and third-party data sources. It should become the central data repository and provide a single source of truth for all data sources required for data analytics solutions.

  • Enable data analytics solutions: A big data platform should provide a robust infrastructure for developing, operating, and deploying different types of analytics solutions (from simple reports to advanced machine learning) to meet the business need for information and insights for decision making.

  • Provide compliant and secure access to data and applications: A big data platform should enable organizations to offer consolidated, secure data access to both internal and external stakeholders. It should also store, process, and distribute data in a manner that complies with local data laws and regulations as well as international standards and best practices.

Big Data Platform Infrastructure

There are several infrastructure options for a big data platform: fully on-premise, fully cloud or hybrid cloud/on-premise, each with its own advantages and challenges. Organizations should consider a number of factors when choosing the most appropriate infrastructure option for their big data platform, including data security and residency requirements, data source integrations, functionality and scalability requirements, and cost and time. A fully cloud-based architecture offers lower and more predictable costs, out-of-the-box services and integrations, and rapid scalability, but lacks control over hardware and may not comply with data privacy and residency regulations. A fully on-premise architecture provides full control over hardware and data security, typically complies with privacy and residency regulations, but incurs higher costs and requires long-term planning for scaling. A hybrid cloud/on-premise architecture offers the best of both worlds, facilitating full migration to the cloud at a later date, but may require a more complex setup.

Many organizations choose a hybrid infrastructure for their big data platforms due to organizational requirements to keep highly sensitive data (such as customer and financial data) on their own servers, or due to the lack of government-certified cloud service providers (CSPs) that meet local data privacy and residency requirements. These organizations also prefer to keep cloud-native or non-sensitive data sources in the cloud to optimize storage and compute resource costs and take advantage of out-of-the-box data analytics and machine learning services available from CSPs. Other organizations that have no organizational or regulatory requirements for data residency within the company or country opt for fully cloud-based infrastructure for faster time to implement, optimized costs, and easily scalable resources.
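The placement logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `DataSource` fields, the `choose_location` function, and the example source names are all hypothetical, and a real decision would also weigh cost, integrations, and scalability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of a data source for placement purposes."""
    name: str
    sensitive: bool            # e.g. customer or financial data
    residency_required: bool   # must stay within the company or country


def choose_location(source: DataSource, certified_csp_available: bool) -> str:
    """Return 'on_premise' or 'cloud' for one source (illustrative rule only)."""
    # Highly sensitive data stays on the organization's own servers.
    if source.sensitive:
        return "on_premise"
    # Residency-bound data can only move to the cloud if a certified CSP exists.
    if source.residency_required and not certified_csp_available:
        return "on_premise"
    # Cloud-native or non-sensitive sources go to the cloud.
    return "cloud"


sources = [
    DataSource("customer_accounts", sensitive=True, residency_required=True),
    DataSource("web_clickstream", sensitive=False, residency_required=False),
    DataSource("store_sales", sensitive=False, residency_required=True),
]

# Assumption for this example: no government-certified CSP is available locally.
placement = {s.name: choose_location(s, certified_csp_available=False) for s in sources}
print(placement)
```

With these assumed inputs, only the clickstream source is placed in the cloud. If a certified CSP were available, the residency-bound but non-sensitive `store_sales` source would move to the cloud as well, while the sensitive source would stay on-premise.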

Figure 1: Hybrid Cloud & On-Premise Data Platform Infrastructure

A big data platform typically comprises seven main layers that reflect the data lifecycle from “raw data” to “information” to “insights”. Organizations should carefully consider which services and tools are required for each of these layers to ensure a seamless flow of data and efficient generation of data insights. These services and tools should perform key functions in each layer of the big data platform, as shown in Figure 2: Big Data Platform Data Layers.

Figure 2: Big Data Platform Data Layers

Evolution of the Big Data Platform

The development of a big data platform should evolve through several stages, starting with a minimum viable platform (MVP) and continuing with incremental upgrades. An organization should synchronize the evolution of its big data platform with increased requirements for broader and faster data insights and intelligence for business decisions. These increased requirements affect the complexity of the big data platform in terms of data analytics solutions, data source volumes and types, and internal and external users. The evolution of the big data platform includes the addition of more storage and compute resources, advanced features and functionality, and improvements in platform security and management.

Figure 3: Big Data Platform Evolution

“We have seen that many organizations tend to build big data platforms with advanced and unnecessary features from day one, which increases the technology cost of ownership. Big data platform deployment should start with a minimum viable platform and evolve based on business and technology requirements. In the early stages of building the platform, organizations should implement a robust data governance and management layer that ensures data quality, privacy, security, and compliance with local and regional data laws.”
Anthony Cassab, Data Consulting Director at Artefact

Guidelines for a Future-Proof Big Data Platform

A big data platform should be built according to key architectural guidelines to ensure that it is future-proof, allowing for easy scalability of resources, portability across different on-premise and cloud infrastructures, upgrade and replacement of services, and expansion of data collection and sharing mechanisms.

  • Modular data layers: All platform layers should be well defined and integrated, from the data ingestion layer to the data visualization and BI layer. Each layer should leverage best-in-class services or tools, which typically requires that the architecture doesn’t rely on a “black box” solution and allows for the configuration and integration of standalone tools and services that provide specific functionality.

  • Containerized applications: The platform should containerize data ingestion, processing, and analysis procedures and applications, and manage them with an orchestration platform such as Kubernetes. Containers offer a logical packaging mechanism that abstracts applications from their runtime environment, so that containerized workloads can run on various types of infrastructure. This facilitates portability of platform applications across different on-premise and cloud infrastructures and deployment across multiple clouds.

  • Microservices-based architecture: Platform applications should be broken down into microservices, each serving a specific function and interacting with the others. This makes applications easier to build and maintain, allows independent deployment and scaling of microservices, and enables rapid and frequent delivery of large, complex applications.

  • Standard services and tools: The selection of tools and services for the platform should focus on shared industry standards (open standards) and reduce reliance on those that are specific to any single technology vendor. For example, the platform should include cloud services that are common to multiple cloud service providers. This facilitates migration across different on-premise and cloud infrastructures and multi-cloud deployments, reducing cost and time.

  • Robust data governance: From the outset, the platform should incorporate a robust data governance framework in the form of governance tools, services, processes, controls, and rules that ensure continued monitoring and improvement of data quality, secure access to data and data analytics, privacy protection, compliant storage and processing, and standardized data and metadata management. This facilitates the scaling of platform resources and capabilities, as well as the broad adoption of data analytics solutions and the use of available datasets.
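To make the modular-layer guideline more concrete, here is a minimal Python sketch in which each layer is defined by a small interface and any tool that satisfies it can be swapped in. All names (`IngestionLayer`, `ProcessingLayer`, `InMemoryIngestion`, `DropIncompleteRows`, `run_pipeline`) are hypothetical and chosen for illustration; a real platform would place actual ingestion and processing services behind such interfaces.

```python
from typing import Any, Dict, Iterable, List, Protocol

Record = Dict[str, Any]


class IngestionLayer(Protocol):
    """Any ingestion tool only has to provide read()."""
    def read(self) -> Iterable[Record]: ...


class ProcessingLayer(Protocol):
    """Any processing tool only has to provide transform()."""
    def transform(self, records: Iterable[Record]) -> List[Record]: ...


class InMemoryIngestion:
    """Stand-in for a real ingestion service; serves rows held in memory."""
    def __init__(self, rows: Iterable[Record]) -> None:
        self._rows = list(rows)

    def read(self) -> Iterable[Record]:
        return iter(self._rows)


class DropIncompleteRows:
    """Simple governance-style quality rule: drop rows missing required fields."""
    def __init__(self, required: Iterable[str]) -> None:
        self._required = tuple(required)

    def transform(self, records: Iterable[Record]) -> List[Record]:
        return [
            r for r in records
            if all(r.get(field) is not None for field in self._required)
        ]


def run_pipeline(ingestion: IngestionLayer, processing: ProcessingLayer) -> List[Record]:
    """Wire two layers together through their interfaces only."""
    return processing.transform(ingestion.read())


rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
clean = run_pipeline(InMemoryIngestion(rows), DropIncompleteRows(["id", "amount"]))
print(clean)
```

Because `run_pipeline` depends only on the two interfaces, replacing the in-memory reader with a different ingestion tool, or the quality rule with another processing step, requires no change to the pipeline itself. This is the property the guidelines describe as avoiding a “black box” solution.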

“An adaptable and modular platform that can scale as business needs evolve is preferable to a ‘black box’ platform that is well integrated but allows limited customization. These platform architectures can be built fully or partially in the cloud to leverage the benefits of cloud computing, such as scalability and cost efficiency, while also meeting the privacy and security requirements of data protection regulations.”
Faisal Najmuddin, Data Engineering Manager at Artefact

In summary, a big data platform brings multiple benefits to organizations, such as centralizing data sources, enabling advanced data analytics solutions, and providing enterprise-wide access to data analytics solutions and sources. However, implementing a big data platform entails a number of strategic decisions, such as choosing the right infrastructure(s), adopting a future-proof architecture, selecting standard and “migratable” services, carefully considering data protection regulations, and finally, defining an optimal evolution plan that’s closely linked to business requirements and maximizes return on data investment.