Data Science and Artificial Intelligence

Data quality and data cleansing

Integrated course, 2.50 ECTS

 

Course content

Part 1: Preparation of data
- Reading in and working with data from different sources (CSV, XML, HTML, JSON, ...)
- Character sets and character set conversion
- Data type conversion and renormalization
- Duplicate detection and deduplication
- Complex transformations of data (especially pivoting and unpivoting)
- Complex filtering and sorting of data
Part 2: Erroneous and incomplete data
- Data quality analysis
- Smoothing discrete data
- Anomaly detection
- Single and multiple imputation
Part 3: Continuous data
- Special features of audio, image and video data (or signal data)
- Transformations and discretization of continuous data
- Convolution and application of filters
- Smoothing continuous data
- Compression of continuous data
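
As a rough illustration of three of the topics listed above (deduplication, single imputation and smoothing), the following minimal Python sketch uses only the standard library. The sample records, the function names and the choice of mean imputation and a simple moving average are assumptions made for illustration; they are not taken from the course material.

```python
from statistics import mean


def deduplicate(rows):
    """Drop exact duplicate rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def impute_mean(values):
    """Single imputation: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def moving_average(values, window=3):
    """Smooth a series with a simple moving average (full windows only)."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


# Hypothetical sample records: (sensor id, reading); None marks a missing reading.
rows = [("a", 1.0), ("b", None), ("a", 1.0), ("c", 4.0), ("d", 7.0)]

unique = deduplicate(rows)                              # second ("a", 1.0) is dropped
readings = impute_mean([value for _, value in unique])  # None -> mean of 1.0, 4.0, 7.0
smoothed = moving_average(readings, window=3)

print(unique)
print(readings)
print(smoothed)
```

Mean imputation is only the simplest form of single imputation; multiple imputation, as named in Part 2, would instead generate several plausible completed data sets and combine the results.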

Learning outcomes

Students are able to read data from various sources and prepare it for further processing. They are also able to recognize erroneous data and to fill in missing data. In addition, students have basic knowledge of how to handle continuous data.

Recommended or required reading and other learning resources / tools

Recommended literature or books:
- Hichert, R. (2019). Solid, outlined, hatched: How visual consistency helps better understand reports, presentations and dashboards. Vahlen, 1st edition.
- Healy, K. (2019). Data Visualization: A Practical Introduction. Princeton University Press, 1st edition.
- Lovelace, R., Nowosad, J., Muenchow, J. (2019). Geocomputation with R. Taylor & Francis Ltd., 1st edition.
- McCandless, D. (2014). Knowledge is beautiful. Harper Collins Publ. UK, 1st edition.
- Ohser, J. (2018). Angewandte Bildverarbeitung und Bildanalyse: Methoden, Konzepte und Algorithmen in der Optotechnik, optischen Messtechnik und industriellen Qualitätskontrolle. Carl Hanser Verlag GmbH & Co. KG, 1st edition.
- Skiena, S. (2017). The Data Science Design Manual (Texts in Computer Science). Springer, 1st edition.
- Squire, M. (2015). Clean Data. Packt Publishing, 1st edition.
- Van Buuren, S. (2018). Flexible Imputation of Missing Data (Chapman & Hall / CRC Interdisciplinary Statistics). Taylor & Francis Ltd., 2nd edition.
- Van der Loo, M., De Jonge, E. (2018). Statistical Data Cleaning with Applications in R. Wiley, 1st edition.
- Wickham, H., Grolemund, G. (2017). R for Data Science. O'Reilly UK Ltd., 1st edition.
- Wiedemann, J. (2018). Understanding the World. The Atlas of Infographics. TASCHEN, 1st edition (multilingual).
- Wiedemann, J. et al. (2018). Information Graphics. TASCHEN, reissue.
- Chang, W. (2018). R Graphics Cookbook: Practical Recipes for Visualizing Data. O'Reilly UK Ltd., 2nd edition.
- Yau, N. (2011). Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. Wiley, 1st edition.
Recommended journals or selected articles:
Relevant journals and articles will be announced in the courses.

Typical software for this module:
R / RStudio, Python / Spyder / PyCharm, Matlab / Octave / Scilab etc.

Mode of delivery

1.25 ECTS lecture, 1.25 ECTS exercise

Prerequisites and co-requisites

Modules 2, 3, 4 and 5

Assessment methods and criteria

Lecture: final exam; Exercise: continuous assessment