Data Science and Artificial Intelligence

Multivariate statistics and data mining

Integrated course, 5.00 ECTS

Course content

Part 1: Structure-discovering methods:
- Principal Component Analysis
- Exploratory factor analysis
- Nearest neighbor classification
- Cluster analysis
- Partial Least Squares regression
- Support vector machines
- Multidimensional scaling
Part 2: Structure-testing methods:
- Multivariate linear, nonlinear and logistic regression
- LASSO (least absolute shrinkage and selection operator)
- Multivariate time series analysis (including structural break analysis)
- Structural equation models
- Discriminant analysis
- Analysis of variance
- Confirmatory factor analysis
Part 3: Text mining
- Word frequencies and correlations
- Grouping / clustering of texts

Learning outcomes

Students have a comprehensive practical understanding of structure-discovering and structure-testing multivariate statistics and are able to apply the essential methods independently. They are also able to analyze texts statistically and to cluster them accordingly.

Recommended or required reading and other learning resources / tools

Recommended literature or books:
- Backhaus, K., Erichson, B., Plinke, W., Weiber, R. (2018). Multivariate analysis methods: An application-oriented introduction. Springer Gabler, 15th ed.
- Backhaus, K., Erichson, B., Weiber, R. (2015). Advanced multivariate analysis methods: An application-oriented introduction. Springer Gabler, 3rd ed.
- Bronstein, I.N., Mühlig, H., Musiol, G., Semendjajew, K.A. (2016). Taschenbuch der Mathematik. Verlag Europa-Lehrmittel, 10th ed.
- Field, A., Miles, J., Field, Z. (2012). Discovering Statistics Using R. Sage Publications.
- Hedderich, J., Sachs, L. (2018). Angewandte Statistik: Methodensammlung mit R. Springer Spektrum, 16th ed.
- Silge, J., Robinson, D. (2017). Text Mining with R: A Tidy Approach. O'Reilly Media.
- Ugarte, M.D., Militino, A.F., Arnholt, A.T. (2015). Probability and Statistics with R. Taylor & Francis Inc., 2nd ed.
- VanderPlas, J. (2017). Data Science with Python: The handbook for using IPython, Jupyter, NumPy, Pandas, Matplotlib and Scikit-Learn. mitp Professional.
- Wickham, H., Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.
Recommended journals or selected articles:
Relevant journals and articles will be announced in the course.

Typical software for this module:
R / RStudio, Python / Spyder / PyCharm etc.

Mode of delivery

2.5 ECTS lecture, 2.5 ECTS exercise

Prerequisites and co-requisites

Modules 3 and 5

Assessment methods and criteria

Lecture: final exam; exercise: continuous assessment