In this project, we focus on stitching frames from a video into a panorama by computing homographies using SIFT and RANSAC. More information about this project can be found on the course website:
To stitch two frames, we first have to find key points and compute their orientations and descriptors. Fortunately, the function $sift$ in the library $VLfeat$ can compute both of them for us. Then we can compute the homography $H$ between these two frames using the following formula:
Let $x' = Hx$, where \[ x = \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \quad x' = \begin{bmatrix} w'u' \\ w'v' \\ w' \end{bmatrix}, \quad H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \]
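Each pair of matched points $(u, v) \leftrightarrow (u', v')$ contributes two rows, so we can solve the following homogeneous system for $h = [h_1, \dots, h_9]^T$: \[ \begin{bmatrix} -u & -v & -1 & 0 & 0 & 0 & uu' & vu' & u' \\ 0 & 0 & 0 & -u & -v & -1 & uv' & vv' & v' \end{bmatrix} h = 0 \Rightarrow Ah = 0 \] With the rows of at least four matches stacked into $A$, we can apply SVD to $A$ such that $UDV^T = A$ and take $h = V_{smallest}$, the column of $V$ that corresponds to the smallest singular value.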
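As a rough illustration of the system above, here is a minimal NumPy sketch. The function name `estimate_homography` and the final normalization by $h_9$ are assumptions made for this example, not necessarily what the project code does.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from N >= 4 matched points.

    src, dst: arrays of shape (N, 2) holding (u, v) and (u', v').
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (u, v), (up, vp) in zip(src, dst):
        # Two rows of A per correspondence, as in the matrix above.
        rows.append([-u, -v, -1, 0, 0, 0, u * up, v * up, up])
        rows.append([0, 0, 0, -u, -v, -1, u * vp, v * vp, vp])
    A = np.array(rows)
    # Rows of Vt are the right singular vectors, ordered by decreasing
    # singular value, so the last row is the solution of Ah = 0.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    # Assumption: h_9 is non-zero, so H can be scaled to make h_9 = 1.
    return H / H[2, 2]
```

With noisy matches the result is only a least-squares solution, so in practice it is usually combined with RANSAC to discard bad matches first.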
Result of mapping frame 270 onto frame 450:
Frame 270
Frame 450 (Reference Frame)
Mapping the other frames is almost the same as in part 1.
To make a panorama, I am using frames [90 270 450 630 810] as key frames and frame 450 as the reference frame. Here is the result:
Note that I did not use any advanced blending algorithm for this panorama. I just directly replaced blank pixels with pixels from other mapped frames.
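The fill-in strategy described above could look roughly like the following sketch. It assumes that the frames have already been warped onto the reference plane and that "blank" means a pixel that is exactly zero in every channel. The function name `composite_fill_blank` is hypothetical.

```python
import numpy as np

def composite_fill_blank(warped_frames):
    """Combine warped frames by filling blank pixels, without blending.

    warped_frames: list of (H, W, 3) arrays on the reference plane,
    with zeros wherever a frame has no data. The first frame in the
    list takes priority; later frames only fill what is still blank.
    """
    pano = np.array(warped_frames[0], copy=True)
    for frame in warped_frames[1:]:
        # A pixel is treated as blank if all of its channels are zero.
        blank = np.all(pano == 0, axis=-1)
        pano[blank] = frame[blank]
    return pano
```

Because earlier frames win wherever they have data, putting the reference frame first keeps it untouched in the final panorama.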
In this part, we just map all the frames onto the reference plane and then convert them into a video.
However, for some frames, due to camera shake and bad luck in RANSAC, I did not get very good results, so the mapped video looks really shaky.
I tried to improve the RANSAC results by increasing the number of iterations and slightly increasing the threshold. Taking frame 6 as an example, here is the result:
Iteration: 1000 Threshold: 1
Iteration: 4096 Threshold: 1.2
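To show where the two parameters enter, here is a minimal RANSAC sketch in NumPy. It assumes that the threshold is a reprojection error measured in pixels and that each iteration samples four matches; the function names, the `seed` argument and the final refit on all inliers are assumptions made for this example.

```python
import numpy as np

def fit_homography(src, dst):
    """DLT fit of H (up to scale) from N >= 4 point pairs."""
    rows = []
    for (u, v), (up, vp) in zip(src, dst):
        rows.append([-u, -v, -1, 0, 0, 0, u * up, v * up, up])
        rows.append([0, 0, 0, -u, -v, -1, u * vp, v * vp, vp])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def reprojection_error(H, src, dst):
    """Distance between H-mapped src points and dst points."""
    pts = np.c_[src, np.ones(len(src))] @ H.T
    proj = pts[:, :2] / pts[:, 2:3]
    return np.linalg.norm(proj - dst, axis=1)

def ransac_homography(src, dst, n_iters=4096, threshold=1.2, seed=0):
    """Return (H, inlier_mask) for the sample with the most inliers."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        sample = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[sample], dst[sample])
        # Degenerate samples can produce inf/nan errors; those compare
        # as False below and are simply never counted as inliers.
        with np.errstate(divide="ignore", invalid="ignore"):
            inliers = reprojection_error(H, src, dst) < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 4:
        raise ValueError("RANSAC found fewer than 4 inliers")
    # Refit on all inliers of the best sample (H is returned up to scale).
    return fit_homography(src[best], dst[best]), best
```

More iterations raise the chance of drawing at least one all-inlier sample, while a larger threshold accepts more matches as inliers at the cost of tolerating less accurate ones.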
Here is the video result:
I believe the result at the margins of the panorama is not very satisfactory because too few pixels are mapped to those positions, so the median of those pixels becomes 0. Poor mapping of pixels in the left-most and right-most areas also hurt the quality of the result.
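The effect at the margins can be reproduced with a small sketch. It assumes grayscale frames that are already warped onto the panorama, plus a boolean mask marking which pixels each frame actually covers. Taking the median only over covered pixels (via `np.nanmedian`) is one possible remedy shown for illustration, not the method used for the results above; the function name `median_background` is hypothetical.

```python
import warnings
import numpy as np

def median_background(warped, valid):
    """Per-pixel median over frames, ignoring uncovered pixels.

    warped: (T, H, W) float stack of frames on the panorama plane.
    valid:  (T, H, W) bool stack, True where a frame covers the pixel.
    """
    stack = np.where(valid, warped.astype(float), np.nan)
    with warnings.catch_warnings():
        # Pixels covered by no frame yield an all-NaN warning; ignore it.
        warnings.simplefilter("ignore", RuntimeWarning)
        background = np.nanmedian(stack, axis=0)
    # Pixels that no frame covers are left at 0.
    return np.nan_to_num(background, nan=0.0)

# A pixel covered by only 2 of 5 frames: the plain median over all
# frames is dominated by the zeros of the frames that miss it.
warped = np.zeros((5, 1, 1))
warped[0, 0, 0], warped[1, 0, 0] = 0.4, 0.6
valid = warped > 0
plain = np.median(warped, axis=0)          # zeros outvote the data
masked = median_background(warped, valid)  # median of 0.4 and 0.6
```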
In this part, we have the background panorama. To create a background movie from it, we have to determine which region of pixels belongs to which frame. For this process, we create a mask with the size of the original frames. For each frame, we use the homography matrix $H$ to map the mask onto the reference frame, which gives us the region of the background panorama that belongs to that frame. We can then apply the inverse of the homography to that region so that it maps back to the frame's original position. Here is the background movie:
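One way to sketch this step is to walk over every pixel of the original frame, map it through $H$ into the panorama, and copy the panorama value found there, which has the same effect as warping the panorama region back with the inverse homography. The example assumes that $H$ maps frame coordinates to reference coordinates, that the panorama is the reference plane shifted by a known `offset`, and that nearest-neighbor sampling is good enough. The name `background_frame` and its parameters are hypothetical.

```python
import numpy as np

def background_frame(pano, H, frame_shape, offset=(0, 0)):
    """Cut one frame's background out of the panorama.

    pano:        (Hp, Wp) or (Hp, Wp, C) background panorama.
    H:           3x3 homography from frame to reference coordinates.
    frame_shape: (height, width) of the original frame.
    offset:      (x, y) shift from reference to panorama coordinates.
    """
    h, w = frame_shape
    vs, us = np.mgrid[0:h, 0:w]
    pts = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])
    mapped = H @ pts
    # Nearest panorama pixel for every pixel of the original frame.
    xi = np.rint(mapped[0] / mapped[2] + offset[0]).astype(int)
    yi = np.rint(mapped[1] / mapped[2] + offset[1]).astype(int)
    inside = (xi >= 0) & (xi < pano.shape[1]) & (yi >= 0) & (yi < pano.shape[0])
    out = np.zeros((h * w,) + pano.shape[2:], dtype=pano.dtype)
    # Pixels that fall outside the panorama stay zero.
    out[inside] = pano[yi[inside], xi[inside]]
    return out.reshape((h, w) + pano.shape[2:])
```

Running this for every frame with that frame's own $H$ and writing the outputs in order would give the background movie.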
By definition, foreground pixels are pixels that stand out from the background pixels. We can take each frame from the original movie and the corresponding background frame from the background movie, and then compare the two frames pixel by pixel. If the difference between a pixel from the original frame and the same pixel from the background frame is larger than a threshold, the pixel is classified as a foreground pixel. Here is the foreground movie:
Threshold: 0.35
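The comparison can be sketched in a few lines of NumPy. This assumes pixel values scaled to $[0, 1]$ (so that a threshold of 0.35 is meaningful) and measures the difference as the largest absolute difference over the color channels; both are assumptions, since the exact difference measure is not stated here. The function names are hypothetical.

```python
import numpy as np

def foreground_mask(frame, background, threshold=0.35):
    """True where the frame differs from the background by > threshold."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    if diff.ndim == 3:
        # Assumption: use the largest difference across color channels.
        diff = diff.max(axis=-1)
    return diff > threshold

def foreground_frame(frame, background, threshold=0.35):
    """Keep foreground pixels of the frame and zero out the rest."""
    mask = foreground_mask(frame, background, threshold)
    out = np.zeros_like(frame)
    out[mask] = frame[mask]
    return out
```

A lower threshold keeps more of the moving objects but also lets through more noise from imperfect alignment between the frame and its background.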
@TODO