Multimedia Information Retrieval

2020-04-01

An introduction to retrieval from image, video, and audio databases, from a statistical, pattern recognition, and algorithmic point of view.

Models of information retrieval
Nearest neighbors, range queries, hash algorithms
PLSA and topic models vector space models
Text categorization, analysis, tagging, parsing
Content based image and video retrieval
Image retrieval based on color, texture, and shape
Visual bag of words model
Grouping, static and motion segmentation, scene cuts
Geometric indexing, verification, and object recognition
Automatic annotation and categorization
HMM-based methods (audio, video, music, document retrieval)
Generative and discriminative methods
Selected special purpose applications, such as face detection, x-ray image analysis
Performance evaluation and competitions
Applications in consumer imaging, security, forensics, and copyright and plagiarism detection

Materials

Questions

Lecture 1

Name common commercial applications of multimedia information retrieval.
Describe specific, common multimedia information retrieval tasks (what do queries look like? what is stored in the database?).
Write a Python / NumPy / SciPy function to …
Define: precision, recall, 11-point P/R graph, F-measure, mean average precision, true/false positive/negative, ROC curve, DET curve
Describe the vector space model of text information retrieval.
Describe the process of tokenization.
Describe latent semantic analysis.

Lecture 2

Name and describe major image storage formats, their properties, and differences (e.g., PPM, PNG, JPEG, JPEG2000).
Name and describe common command line tools and Python libraries for image processing (e.g., ImageMagick, SciPy, PIL, etc.)
What is EXIF? What kind of information is stored in it?
Given Python expressions to load images, display them, and convert them to grayscale.
Describe JPEG compression.
How do you perform in-memory JPEG compression of an image given as a Python array?
What is an MD5 checksum? What is a common use for checksums in image databases?
How are images stored in relational databases?
What is the HDF5 database format? How does it differ from a relational database?
What is OpenCV?
How would you perform fade-to-black detection in a video?
How would you perform scene cut detection in a video?
What is keyframe detection? What is it used for?
What is optical flow? How does it differ from object motion?

Lecture 3

What can you say about the error rate of nearest neighbor classifiers?
How are nearest neighbor classifiers related to information retrieval applications?
Why are nearest neighbor classifiers particularly suitable for information retrieval applications (compared to other kinds of classifiers)?
What properties do dissimilarity functions commonly satisfy?
What are the requirements for a distance in a metric space?
What Python library functions are useful for performing nearest neighbor classification?
Name some common color spaces.
What is a color histogram?
How are color histograms used for image retrieval?
What is a perceptually uniform color space?
What is image texture?
How is image texture represented?
How do you segment images by their texture?

Lecture 4

What do clustering algorithms do?
Describe commonly used, different approaches to clustering.
Describe the k-means clustering algorithm.
What is Gaussian mixture modeling?
When does GMM work better than k-means clustering?
How can you determine the correct number of clusters in a clustering problem?
What is hierarchical clustering?
What is a dendrogram?
What is single-linkage clustering? What is complete linkage clustering? What is average linkage clustering?
How can you combine k-means clustering and PCA? Why would you?
What is “hierarchical tree VQ”? What is hierarchical clustering?

Lecture 5

What are image patches? How are they used in MMIR?
How are patches commonly represented?
How does compression by vector quantization work?
What is a VQ code histogram? What is the bag of visual words method?
What is interest point detection?
How are interest points used with patch descriptors?
Name commonly used interest point detectors.
Describe how you can tag/label images using a combination of interest point detection, patch descriptors, and logistic regression.
Motivate and describe the Harris corner detector.
Describe corner detection based on (1) median filters, (2) level curves, (3) orientation histograms, (4) morphological operations.
What is scale space? How is scale space used in the SIFT interest point detector?

Lecture 6

Describe the abstraction used for interest point detection and descriptors used in OpenCV.
Describe common applications for interest point detectors and descriptors.
Describe the demands that each of these applications places on interest point detectors / descriptors.
Name some commonly used interest point detectors and feature descriptors.
What are SIFT, SURF, MSER, Harris corners?
Give commonly desired properties for interest point detectors.
Describe how we can benchmark/test/evaluate interest point detectors.
Describe the SIFT feature descriptor.

Lecture 7

Describe the relationship between k-nearest neighbor classification and nonparametric density estimation.
What happens with nearest neighbor classification in high dimensions?
What are approximate nearest neighbor methods? Does such an approximation make sense in high dimensions?
What is the intrinsic dimension of a data set?
How can we determine the intrinsic dimension of a data set?
What is the covering dimension?
Describe a linear method for dimensionality reduction.
Name some common nonlinear dimensionality reduction techniques.
Given a video sequence of an object rotating in 3D, what is the intrinsic dimensionality of the video frames viewed as feature vectors?
Name some commonly used techniques for fast approximate nearest neighbor retrieval.

Lecture 8

Explain the Hough transform for finding lines / the generalized Hough transform.
Explain the RANSAC algorithm for finding lines / for finding general object instances.
Explain the RAST algorithm for finding lines / for finding general object instances.
Explain how 2D coordinates can be represented as complex numbers, and how common geometric transformations can be expressed as complex arithmetic.
Describe how you can perform 2D object recognition using interest point detectors and geometric matching.
Explain how SIFT or SURF descriptors can be used to improve / speed up geometric matching for object recognition.

Lecture 9

Explain segmentation by thresholding / adaptive thresholding.
Discuss how thresholding might be used for color segmentation / texture segmentation.
What is document binarization?
How can the k-means algorithm be used for simple color-based segmentations of images?
Which color space is better for performing color based segmentation, Lab or RGB? Why?
What is the basic idea behind edge-based segmentation?
Explain the watershed segmentation algorithm.
Explain the idea behind the random-walk based segmentation algorithm.
What are superpixels?
Describe a simple k-means based implementation of superpixel segmentation.
Describe the graph cut segmentation algorithm.
What are active contours?
Describe the GVF-based active contour segmentation algorithm.

Lecture 10

How is speech generated? What is the vocal tract? How does it generate sounds?
What are formants?
What is a spectrogram? How is it computed?
What is a window function?
What is the purpose of dynamic time warping?
Describe the dynamic time warping algorithm.
How can you perform speech recognition with dynamic time warping?
What is a Markov chain?
What is an n-gram?
Define: accessibility / communicating states / irreducibility / transience / recurrence / positive recurrence / absorption / ergodicity / reversibility

Lecture 11

What is a Hidden Markov Model? How is it specified?
How can OCR (optical character recognition) be carried out with an HMM?
How can speech recognition be carried out with an HMM?
Describe how a spectrogram or the image of a line of text is transformed (using k-means) into a symbol sequence suitable for modeling with an HMM.
Construct a simple HMM to produce this ____ letter.
What is a durational model?
What is the durational model for a single HMM state with a self-recurrence and one transition to another state?
How can you construct a durational model with an approximately normal distribution of durations out of an HMM?
What does the forward algorithm compute?
Describe the forward algorithm.
Why do we implement most HMM algorithms using log probabilities?
What is Viterbi decoding?
Describe the Viterbi decoding algorithm.
What is Viterbi training?
What does the forward-backward algorithm compute?
Describe the forward-backward algorithm.
Describe Baum-Welch reestimation.
What is the Bakis model?
Why does the Bakis model often give better results than an unconstrained model with the same number of states?
Why does the Bakis model result in better durational models than a model with the minimum number of states necessary to produce the output sequence?