16720

Semester-2 Computer Vision

This course introduces the fundamental techniques of computer vision: the analysis of patterns in visual images to reconstruct and understand the objects and scenes that generated them. Topics covered include image formation and representation, camera geometry and calibration, multi-view geometry, stereo, 3D reconstruction from images, motion analysis, image segmentation, and object recognition.

Major Assignment Topics Covered:
1. Color Channel Alignment
2. Image Warping
3. Spatial pyramid matching for scene classification
4. Augmented reality with planar homography
5. Panorama
6. Lucas-Kanade motion tracking
7. 3D reconstruction
8. Neural network for recognition
9. Photometric stereo
• Calibrated: using albedos and normals for shape estimation
• Uncalibrated: Frankot-Chellappa algorithm for shape reconstruction
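
As an illustration of the calibrated photometric stereo topic, the sketch below recovers the albedo and surface normal at a single pixel. It is not the course's assignment code. It assumes a Lambertian surface and exactly three known, linearly independent light directions, so that the intensities satisfy I = L b with b = albedo × normal. The function names (`det3`, `solve3`, `photometric_stereo_pixel`) are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m @ x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    solution = []
    for col in range(3):
        # Replace one column of m with rhs and take the determinant ratio.
        replaced = [
            list(row[:col]) + [rhs[r]] + list(row[col + 1:])
            for r, row in enumerate(m)
        ]
        solution.append(det3(replaced) / d)
    return solution


def photometric_stereo_pixel(lights, intensities):
    """Return (albedo, unit normal) for one pixel.

    lights: three unit light directions, one per row (assumed known).
    intensities: the pixel's brightness under each light.
    Assumes a Lambertian surface with no shadows or specular highlights.
    """
    b = solve3(lights, intensities)           # b = albedo * normal
    albedo = math.sqrt(sum(v * v for v in b))
    if albedo == 0.0:
        return 0.0, [0.0, 0.0, 0.0]           # no signal at this pixel
    normal = [v / albedo for v in b]
    return albedo, normal


# Synthetic check: a surface facing the camera with albedo 0.5.
s = 1.0 / math.sqrt(2.0)
lights = [[0.0, 0.0, 1.0], [s, 0.0, s], [0.0, s, s]]
intensities = [0.5, 0.5 * s, 0.5 * s]
albedo, normal = photometric_stereo_pixel(lights, intensities)
print(albedo, normal)
```

With more than three lights the same relation would normally be solved per pixel by least squares. The resulting normal field is what an integration method such as Frankot-Chellappa turns into a depth map.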

The model performed well on unseen text from ID types that were part of the training data, regardless of variations in OCR output and image layout. It also generalised well to out-of-sample ID types when finetuned on just 1-5 samples of those cards.

The idea was to build a generic, flexible information retrieval engine that is pretrained to extract important information from the OCR output of any ID card, without rule-based processing and without having been trained on or having seen that specific card type, and that can be easily finetuned on a very small number of samples of any new card type for optimum performance. It was exposed as a REST API and offered as a plug-and-play product: clients finetune the model on their own samples and then use it out of the box to extract information from IDs. Performance was measured using precision and recall.
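
To make the evaluation concrete, here is a minimal sketch of field-level precision and recall for one extracted card. The matching rule (a predicted field counts as correct only when its value exactly equals the ground truth) and the field names are assumptions for illustration; the original evaluation may have been defined differently.

```python
def precision_recall(predicted, expected):
    """Field-level precision and recall for one extracted ID card.

    predicted and expected map a field name to its string value.
    A prediction is correct only if the field exists in the ground
    truth and the values match exactly (an assumed matching rule).
    """
    true_positives = sum(
        1 for field, value in predicted.items()
        if field in expected and expected[field] == value
    )
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical example: field names and values are made up.
expected = {"name": "JANE DOE", "dob": "1990-01-01", "id_number": "X1234567"}
predicted = {"name": "JANE DOE", "dob": "1990-01-10"}

precision, recall = precision_recall(predicted, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here one of the two predicted fields is correct, and one of the three ground-truth fields is recovered, so precision is 1/2 and recall is 1/3. In practice the counts would be accumulated over a whole test set before dividing.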