The goal of the SEMAPOLIS project (1 October 2013 to 30 September 2017) is to develop advanced large-scale image analysis and learning techniques to semantize city images and to produce semantized 3D reconstructions of urban environments, including proper rendering. The Semapolis project is partly funded by the French National Research Agency (ANR).
Geometric 3D models of existing cities have a wide range of applications, such as navigation in virtual environments and realistic scenery for video games and movies. Several major players (Google, Microsoft, Apple) have started to produce such data. However, these models feature only plain surfaces, textured from available pictures. This limits their use in urban studies and in the construction industry, and in practice rules out applications such as diagnosis and simulation. Moreover, geometry and texture are often wrong for invisible or discontinuous parts, e.g., behind occluding foreground objects such as trees, cars or lampposts, which are pervasive in urban scenes.
We aim to go further by producing semantized 3D models, i.e., models that are not bare surfaces but that identify architectural elements such as windows, walls, roofs and doors. The semantic priors we will use to analyze images will also let us reconstruct plausible geometry and rendering for invisible parts. Semantic information is useful in many more scenarios, including diagnosis and simulation for building renovation projects, accurate assessment of shadow impact that takes actual window locations into account, and more general urban planning and studies such as solar cell deployment. Another line of applications concerns improved virtual cities for navigation, with object-specific rendering, e.g., specular surfaces for windows. Models can also be made more compact by encoding object repetition (e.g., of windows) rather than individual instances, and by replacing actual textures with more generic ones according to semantics. This allows cheap and fast transmission over low-bandwidth mobile phone networks, and efficient storage in GPS navigation devices.
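To make the compactness argument concrete, here is a minimal sketch under a deliberately simplified facade model. The `Window` and `WindowGrid` classes, the "glass" label and the use of parameter counts as a proxy for storage size are all hypothetical illustrations, not the project's actual model format. A regular grid of identical windows is stored once as a prototype plus a repetition pattern, and the explicit instances are recovered only when needed.

```python
from dataclasses import dataclass

# Hypothetical, simplified facade model, for illustration only.


@dataclass(frozen=True)
class Window:
    width: float
    height: float
    label: str  # semantic class, used to pick a generic texture or shader (e.g. "glass")


# Explicit encoding: each window instance stores its own position plus its geometry.
Placed = tuple[float, float, Window]


@dataclass(frozen=True)
class WindowGrid:
    """Compact encoding: one prototype window repeated on a regular grid."""
    prototype: Window
    origin: tuple[float, float]   # lower-left corner of the first window (metres)
    spacing: tuple[float, float]  # horizontal and vertical repetition step (metres)
    cols: int
    rows: int

    def expand(self) -> list[Placed]:
        """Recover the explicit list of window instances, row by row."""
        ox, oy = self.origin
        dx, dy = self.spacing
        return [
            (ox + c * dx, oy + r * dy, self.prototype)
            for r in range(self.rows)
            for c in range(self.cols)
        ]

    def param_count(self) -> int:
        # prototype (width, height, label) + origin (2) + spacing (2) + cols, rows
        return 3 + 2 + 2 + 2


def explicit_param_count(instances: list[Placed]) -> int:
    # x, y, width, height, label stored for every instance
    return 5 * len(instances)


if __name__ == "__main__":
    grid = WindowGrid(Window(1.2, 1.8, "glass"), origin=(1.0, 3.0),
                      spacing=(3.0, 3.2), cols=6, rows=4)
    windows = grid.expand()
    print(len(windows), explicit_param_count(windows), grid.param_count())  # 24 120 9
```

In this toy setting, 24 windows cost 120 stored values when listed explicitly but only 9 as a grid, and the gap grows with the number of repetitions. Real facades are rarely perfectly regular, so a practical encoding would also need to store exceptions.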
The primary goal of the project is to make significant contributions and advance the state of the art in the following areas:
- Learning for visual recognition: Novel large-scale machine learning algorithms will be developed to recognize various types of architectural elements and styles in images. These methods will be able to fully exploit very large amounts of image data while requiring minimal user annotation (weakly supervised learning).
- Shape grammar learning: Techniques will be developed to learn stochastic shape grammars from examples, together with the corresponding architectural style. Learnt grammars will be able to adapt rapidly to a wide variety of specific building types without the cost of manual expert design. Learnt grammar parameters will also lead to better parsing: faster, more accurate and more robust.
- Grammar-based inference: Innovative energy minimization approaches will be developed, leveraging bottom-up cues, to cope efficiently with the exponential number of grammar interpretations, in particular for grammars featuring rich architectural elements. A principled aggregation of statistical visual properties will be designed to score candidate parses accurately.
- Semantized 3D reconstruction: Robust, original techniques will be developed to synchronize multiple-view 3D reconstruction with the semantic analysis, preventing inconsistencies such as misaligned roofs and windows where facades meet at an angle.
- Semantic-aware rendering: Image-based rendering techniques will be developed that exploit semantic classification to greatly improve visual quality through better depth synthesis, adaptive warping and blending, and hole filling and region completion.
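As a rough illustration of what a stochastic shape grammar and the scoring of a parse involve, the sketch below uses a toy facade grammar. The symbols, rules and probabilities in `GRAMMAR` are hypothetical and chosen only for the example. This is not the grammar, learning method or inference method of the project. Real facade grammars also carry geometric split parameters (positions, sizes), which are omitted here. `derive` samples a facade top-down, and `log_prob` scores a given derivation by summing the log-probabilities of the rules it uses, which is the kind of quantity a parser compares between candidate interpretations.

```python
import math
import random

# Toy stochastic facade grammar (hypothetical rules and probabilities).
# Each non-terminal maps to (probability, right-hand side) alternatives whose
# probabilities sum to 1; symbols absent from GRAMMAR are terminals (semantic labels).
GRAMMAR = {
    "Facade": [(0.6, ("GroundFloor", "Floor", "Roof")),
               (0.4, ("GroundFloor", "Floor", "Floor", "Roof"))],
    "GroundFloor": [(0.7, ("door", "window", "window")),
                    (0.3, ("window", "door", "window"))],
    "Floor": [(0.8, ("window", "window", "window")),
              (0.2, ("window", "balcony", "window"))],
    "Roof": [(1.0, ("roof",))],
}

Tree = tuple  # (symbol, tuple of child trees); terminals have no children


def derive(symbol: str, rng: random.Random) -> Tree:
    """Sample a derivation tree top-down, choosing rules by their probabilities."""
    if symbol not in GRAMMAR:
        return (symbol, ())
    alternatives = GRAMMAR[symbol]
    _, rhs = rng.choices(alternatives, weights=[p for p, _ in alternatives])[0]
    return (symbol, tuple(derive(child, rng) for child in rhs))


def log_prob(tree: Tree) -> float:
    """Log-probability of a derivation: sum of the log-probabilities of its rules."""
    symbol, children = tree
    if not children:
        return 0.0
    rhs = tuple(child[0] for child in children)
    p = next(p for p, alt in GRAMMAR[symbol] if alt == rhs)
    return math.log(p) + sum(log_prob(child) for child in children)


def leaves(tree: Tree) -> list[str]:
    """Terminal labels, ground floor first and roof last."""
    symbol, children = tree
    if not children:
        return [symbol]
    return [label for child in children for label in leaves(child)]


if __name__ == "__main__":
    tree = derive("Facade", random.Random(0))
    print(leaves(tree), math.exp(log_prob(tree)))
```

In this toy setting, learning the grammar from examples would amount to estimating the rule probabilities, e.g., from how often each rule is used in annotated derivations. Parsing would search for the derivation that best balances this prior score against image evidence.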
To validate our research, we will run experiments on various kinds of data concerning Paris (large-scale panoramas, smaller-scale but denser and georeferenced terrestrial and aerial images, cadastral maps, and a database of construction dates), with the aim of reconstructing and rendering an entire neighborhood.