Data Science for Security and Forensics
Study plans 2016-2017 - IMT4133 - 7.5 ECTS

On the basis of

BSc-level basics in statistics and mathematics: students are expected to understand basic statistical methods such as descriptive statistics, probability, sampling distributions, and hypothesis testing, as well as basic analysis and matrix algebra.
 

Expected learning outcomes

Knowledge:

  • Understand the principles by which multidimensional statistical methods differ from one-dimensional methods.
  • Understand the distribution of information in statistical analysis and its meaning in data representation.
  • Extract features from raw, measured values of data to be analyzed.
  • Program some basic classification and clustering methods and test their validity.
  • Program some basic neural network methods and test their validity.
  • Apply basic statistical and data analysis methods to data relevant to information security, forensics and/or color/media technology.

Skills:

  • The students can use relevant scientific methods in independent research and development in machine learning and pattern recognition.
  • The students are capable of carrying out a limited, independent research or development project in machine learning and pattern recognition under supervision, following the applicable ethical rules.

General competence:

  • The students can work independently and are familiar with the terminology of machine learning and pattern recognition as well as their application in the security and forensics domain.

Topic(s)

 

  • Learning, Intelligence, and Machine learning basics: principles, measures, performance evaluation, method combinations.
  • Knowledge representations: discriminant and regression functions, probability distributions, Bayesian classifier.
  • Learning as search: Exhaustive search, heuristic search, genetic algorithms.
  • Attribute quality measures: measures for classification, measures for regression, application of feature-selection measures.
  • Data preprocessing: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA).
  • Supervised symbolic and statistical learning, basics of artificial neural networks.
  • Unsupervised learning and cluster analysis: hierarchical and partitional clustering.
  • Data classification: Bayesian classifier, k-NN classifier, multi-layered perceptron (MBPN), support vector machine (SVM), and Random Forest.
  • Data clustering: k-means clustering, Self-Organizing Map (SOM).
  • Classification and clustering validity testing: leave-one-out, ground truth.
  • Practical tasks may include:
    • Implementing some search methods
    • Implementing some classification methods
    • Implementing some clustering methods
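As an illustration of the kind of practical task listed above, the following is a minimal sketch of a k-NN classifier whose validity is tested with leave-one-out. It is not taken from the course material: the function names, the toy data set, and the class labels "benign" and "malicious" are all made up for this example, and Euclidean distance with majority voting is only one of several reasonable choices.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(points, labels, k=3):
    """Hold out each sample once, classify it from the rest, and return the hit rate."""
    correct = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, points[i], k) == labels[i]:
            correct += 1
    return correct / len(points)


# Hypothetical toy data: two well-separated groups of 2-D feature vectors.
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
LABELS = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]

if __name__ == "__main__":
    for k in (1, 3, 5):
        print(k, leave_one_out_accuracy(POINTS, LABELS, k))
```

On this toy data, k = 1 and k = 3 classify every held-out sample correctly, while k = 5 misclassifies all of them, because removing one sample leaves its own class in the minority among the five remaining neighbours. This shows why the choice of k should itself be validated.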

Teaching Methods

Lectures
Laboratory work
Net-supported learning
Mandatory assignments

Teaching Methods (additional text)

Four graded major assignments covering theoretical and practical aspects of the topics.

Form(s) of Assessment

Exercises
Written exam, 3 hours

Form(s) of Assessment (additional text)

  • Written exam (60%)
  • 4 major assignments (40% total, 10% each)
  • The written exam and all major assignments must be passed

Grading Scale

Alphabetical scale, A (best) – F (fail)

External/internal examiner

Internal examiner for the assignments; both internal and external examiners for the written exam.

Re-sit examination

For the written exam: Ordinary re-sit examination in August. The major assignments, if passed, need not be re-submitted.

Teaching Materials

Books/standards, conference/journal papers and web resources, such as:

  • I. Kononenko, M. Kukar, Machine Learning and Data Mining: Introduction to Principles and Algorithms, Horwood Publishing, Chichester, U.K., 2007, ISBN 1-904275-21-4

Recommended further reading:

  • T. Mitchell, Machine Learning, McGraw Hill, 1997.
  • R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, 2nd edition, Wiley, 2001.
  • S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd edition, Academic Press.

Replacement course for

IMT4612 Machine learning and pattern recognition

Additional information

The course will be made accessible to both campus and remote students. Every student is free to choose the pedagogic arrangement best suited to her/his own needs. The lectures will be given on campus and are open to both categories of students. All lectures will also be available online through the learning management system.