Data Science and Artificial Intelligence

Big Data Storage and Processing

Integrated course, 5.00 ECTS


Course content

Introduction to and basic concepts of big data, cloud computing, hybrid IT infrastructures and as-a-service models; design and setup of complex cloud-based service architectures and integration of existing IT systems to provide a holistic analysis service; requirement-based scaling of storage and computing capacity; storing and analyzing petabytes of data and trillions of objects using cloud-based data lake stores (Azure Data Lake, etc.); parallel data transformation and processing with big data processing languages (U-SQL); development of U-SQL scripts using Data Lake tools; highly scalable big data analytics with cloud-based business analytics services (Azure Data Lake Analytics, HDInsight, etc.); big data frameworks (Hadoop) and distributed file systems (HDFS); integrated, end-to-end identity and access management providing single sign-on (SSO), multi-factor authentication and seamless centralized management of participating digital identities (Azure Active Directory, etc.).

Learning outcomes

Students have in-depth knowledge of requirement-based planning and the implementation of cloud-based service architectures for storing, processing and analyzing big data, regardless of the size, format, structure or speed of the data.
Students understand the relevance and context of hybrid cloud service architectures for delivering highly scalable, cost-effective and secure business analytics platforms. They know the features that developers, data scientists and analysts need in order to store and process highly diverse, complex data of any size. In addition, they can make efficient use of the business agility and continuous scalability that cloud-based analytics services offer.
Students are able to design and implement the required service architectures and their interfaces holistically and according to requirements. They have an end-to-end understanding of the various distributed data stores and of high-level parallel data transformation and processing, which enables them to extract valuable information from complex data quickly.

Recommended or required reading and other learning resources / tools

Books: N. Marz and J. Warren, Big Data: Principles and best practices of scalable realtime data systems, Shelter Island, New York: Manning Publications Co., 2015. T. White, Hadoop: The Definitive Guide (4th edition), Sebastopol, California: O'Reilly Media, Inc., 2015. Z. Tejada, Mastering Azure Analytics: Architecting in the Cloud with Azure Data Lake, HDInsight, and Spark, Sebastopol, California: O'Reilly Media, Inc., 2017.
Journals: various articles and online documentation

Mode of delivery

2 THW Lecture, 2 THW Tutorial

Prerequisites and co-requisites

IT infrastructure and basics

Assessment methods and criteria

Lecture: final exam, Tutorial: continuous appraisal