@@ -29,13 +29,14 @@
<!-- Banner -->
<section id="banner" style="background-attachment:scroll;">
<h2>
- Taming Transformers for High-Resolution Image Synthesis
+ Taming Transformers for High-Resolution Image Synthesis (a.k.a. <a href="https://twitter.com/hashtag/vqgan?src=hashtag_click">#VQGAN</a>)
</h2>
<p>
<a href="https://github.com/pesser">Patrick Esser</a>*,
<a href="https://github.com/rromb">Robin Rombach</a>*,
<a href="https://hci.iwr.uni-heidelberg.de/Staff/bommer">Björn Ommer</a><br/>
- <a href="https://www.iwr.uni-heidelberg.de/">IWR, Heidelberg University</a>
+ <a href="https://www.iwr.uni-heidelberg.de/">IWR, Heidelberg University</a><br/>
+ <a href="http://cvpr2021.thecvf.com/">CVPR 2021 (ORAL)</a>
</p>
</section>
@@ -48,7 +49,7 @@
<img src="paper/teaser.png" alt="" style="border:0px solid black"/>
<strong>TL;DR:</strong>
We combine the efficiency of convolutional approaches with
- the expressivity of transformers by introducing a
+ the expressivity of transformers by introducing the
convolutional <em>VQGAN</em>, which learns a codebook of
context-rich visual parts, whose composition is modeled
with an autoregressive transformer.
@@ -58,6 +59,19 @@
<div class="container 25%">
+ <div class="image fit captioned align-just">
+ <div class="videocontainer">
+ <video controls class="videothing">
+ <source src="images/taming_talk.mp4" type="video/mp4">
+ Your browser does not support the video tag.
+ </video>
+ </div>
+ Our CVPR 2021 Oral Talk
+ </div>
+ </div>
+ <div class="6u$ 12u$(xsmall)">
+
+
<div class="image fit captioned align-center"
style="margin-bottom:0em; box-shadow:0 0">
<a href="paper/paper.pdf">