We present an automatic reconstruction pipeline for large-scale urban scenes from aerial images captured by a camera mounted on
an unmanned aerial vehicle. Using state-of-the-art Structure from Motion and Multi-View Stereo algorithms, we first generate a
dense point cloud from the aerial images. Based on a statistical analysis of the footprint grid of the buildings, the point cloud
is classified into different categories (i.e., buildings, ground, trees, and others). Roof structures are extracted for each individual
building using Markov random field optimization. The patch contours are then refined using an algorithm based on pivot point
detection. Finally, polygonal mesh models are extracted from the refined contours. Experiments on various
scenes as well as comparisons with state-of-the-art reconstruction methods demonstrate the effectiveness and robustness of the
proposed method.
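
As a rough illustration of the grid-based classification step, the sketch below bins points into horizontal cells and labels each cell from simple height statistics. It is not the proposed method: the function name `classify_by_footprint_grid`, the thresholds `ground_tol` and `rough_tol`, the flat-terrain assumption, and the decision rules themselves are all hypothetical, and the "others" category is omitted.

```python
from collections import defaultdict
from statistics import mean, pstdev


def classify_by_footprint_grid(points, cell=1.0, ground_tol=0.5, rough_tol=0.3):
    """Label each (x, y, z) point as 'ground', 'building', or 'tree'.

    Hypothetical rules (assumptions, not taken from the paper):
      - a cell whose mean height is near the ground level is ground;
      - an elevated cell with a small height spread is a (flat) roof;
      - an elevated cell with a large height spread is vegetation.
    """
    # Crude ground level: roughly the 5th percentile of all heights.
    # This assumes flat terrain, which real scenes often violate.
    heights_sorted = sorted(p[2] for p in points)
    ground_z = heights_sorted[len(heights_sorted) // 20]

    # Group point indices by the horizontal grid cell they fall into.
    cells = defaultdict(list)
    for idx, (x, y, _z) in enumerate(points):
        cells[(int(x // cell), int(y // cell))].append(idx)

    labels = [None] * len(points)
    for members in cells.values():
        heights = [points[i][2] for i in members]
        if mean(heights) - ground_z < ground_tol:
            label = "ground"
        elif pstdev(heights) < rough_tol:
            label = "building"
        else:
            label = "tree"
        for i in members:
            labels[i] = label
    return labels
```

In a fuller pipeline, labels of this kind would only be a starting point; the roof extraction and contour refinement stages described above operate on the points assigned to each building.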