(VU, 706.520 Data Integration and Large-Scale Analysis)
DIA is a 5 ECTS bachelor and master course, applicable to the bachelor programs computer science or software engineering and management, as well as the master catalog 'Data Science'. This course covers major data integration architectures, key techniques for data integration and cleaning, as well as methods for large-scale, i.e., distributed, data storage and analysis.
In detail, the course covers the following topics, which also reflect the course calendar. All slides will be made available prior to the individual lectures, which take place on Fridays at 3pm in HS-i5 or virtually.
In the first part of this course, we will explore essential techniques and methodologies for preparing and managing large volumes of data. These techniques form the foundation for building reliable AI models by ensuring that the data is clean, well-structured, and ready for advanced analytics.
October 11 - Introduction to data integration concepts and overview of the course.
October 18 - Learn about data warehousing and data preparation techniques.
October 25 - Explore middleware concepts, enterprise application integration, and data replication.
November 08 - Learn about schema matching techniques and data mapping strategies.
November 22 - Techniques for data cleaning and data fusion for integrated systems.
In the second part of this course, we will dive into cutting-edge technologies and frameworks designed to handle heterogeneous data at scale. You will learn how to leverage these tools to generate meaningful insights and analytics from distributed data systems.
November 29 - Introduction to cloud computing principles and technologies.
December 06 - Learn about resource management and scheduling in cloud environments.
December 13 - Explore distributed data storage systems and techniques.
December 20 - Understand distributed and data-parallel computation methods.
January 10 - Learn about real-time distributed stream processing techniques.
January 17 - Study distributed machine learning frameworks and systems.