# data_extraction_byyear.py

from evaluator import *

DESCRIPTION = "Test if the model can extract structured data from (somewhat) unstructured text."
TAGS = ['data']

question = '''
From the following data, extract the best-performing defense each year, in the format {year: robust accuracy}.
So, for example, the answer would be {"2024": 69.71, "2023": ..., ...}; now fill it in for every other year. Return the answer as a JSON dict.
Rank Method Standard
accuracy AutoAttack
robust
accuracy Best known
robust
accuracy AA eval.
potentially
unreliable Extra
data Architecture Venue
1 Robust Principles: Architectural Design Principles for Adversarially Robust CNNs
It uses additional 50M synthetic images in training. 93.27% 71.07% 71.07%
×
× RaWideResNet-70-16 BMVC 2023
2 Better Diffusion Models Further Improve Adversarial Training
It uses additional 50M synthetic images in training. 93.25% 70.69% 70.69%
×
× WideResNet-70-16 ICML 2023
3 MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers
It uses an ensemble of networks. The robust base classifier uses 50M synthetic images. 69.71% robust accuracy is due to the original evaluation (Adaptive AutoAttack) 95.19% 70.08% 69.71%
×
☑ ResNet-152 + WideResNet-70-16 arXiv, Feb 2024
4 Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing
It uses an ensemble of networks. The robust base classifier uses 50M synthetic images. 95.23% 68.06% 68.06%
×
☑ ResNet-152 + WideResNet-70-16 + mixing network SIMODS 2024
5 Decoupled Kullback-Leibler Divergence Loss
It uses additional 20M synthetic images in training. 92.16% 67.73% 67.73%
×
× WideResNet-28-10 arXiv, May 2023
6 Better Diffusion Models Further Improve Adversarial Training
It uses additional 20M synthetic images in training. 92.44% 67.31% 67.31%
×
× WideResNet-28-10 ICML 2023
7 Fixing Data Augmentation to Improve Adversarial Robustness
66.56% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 92.23% 66.58% 66.56%
×
☑ WideResNet-70-16 arXiv, Mar 2021
8 Improving Robustness using Generated Data
It uses additional 100M synthetic images in training. 66.10% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 88.74% 66.11% 66.10%
×
× WideResNet-70-16 NeurIPS 2021
9 Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
65.87% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 91.10% 65.88% 65.87%
×
☑ WideResNet-70-16 arXiv, Oct 2020
10 Revisiting Residual Networks for Adversarial Robustness: An Architectural Perspective 91.58% 65.79% 65.79%
×
☑ WideResNet-A4 arXiv, Dec. 2022
11 Fixing Data Augmentation to Improve Adversarial Robustness
It uses additional 1M synthetic images in training. 64.58% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 88.50% 64.64% 64.58%
×
× WideResNet-106-16 arXiv, Mar 2021
12 Stable Neural ODE with Lyapunov-Stable Equilibrium Points for Defending Against Adversarial Attacks
Based on the model Rebuffi2021Fixing_70_16_cutmix_extra. 64.20% robust accuracy is due to AutoAttack + transfer APGD from Rebuffi2021Fixing_70_16_cutmix_extra 93.73% 71.28% 64.20%
☑ WideResNet-70-16, Neural ODE block NeurIPS 2021
13 Fixing Data Augmentation to Improve Adversarial Robustness
It uses additional 1M synthetic images in training. 64.20% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 88.54% 64.25% 64.20%
×
× WideResNet-70-16 arXiv, Mar 2021
14 Exploring and Exploiting Decision Boundary Dynamics for Adversarial Robustness
It uses additional 10M synthetic images in training. 93.69% 63.89% 63.89%
×
× WideResNet-28-10 ICLR 2023
15 Improving Robustness using Generated Data
It uses additional 100M synthetic images in training. 63.38% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 87.50% 63.44% 63.38%
×
× WideResNet-28-10 NeurIPS 2021
16 Robustness and Accuracy Could Be Reconcilable by (Proper) Definition
It uses additional 1M synthetic images in training. 89.01% 63.35% 63.35%
×
× WideResNet-70-16 ICML 2022
17 Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off 91.47% 62.83% 62.83%
×
☑ WideResNet-34-10 OpenReview, Jun 2021
18 Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
It uses additional 10M synthetic images in training. 87.30% 62.79% 62.79%
×
× ResNest152 ICLR 2022
19 Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
62.76% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 89.48% 62.80% 62.76%
×
☑ WideResNet-28-10 arXiv, Oct 2020
20 Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks
Uses exponential moving average (EMA) 91.23% 62.54% 62.54%
×
☑ WideResNet-34-R NeurIPS 2021
21 Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks 90.56% 61.56% 61.56%
×
☑ WideResNet-34-R NeurIPS 2021
22 Parameterizing Activation Functions for Adversarial Robustness
It uses additional ~6M synthetic images in training. 87.02% 61.55% 61.55%
×
× WideResNet-28-10-PSSiLU arXiv, Oct 2021
23 Robustness and Accuracy Could Be Reconcilable by (Proper) Definition
It uses additional 1M synthetic images in training. 88.61% 61.04% 61.04%
×
× WideResNet-28-10 ICML 2022
24 Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off
It uses additional 1M synthetic images in training. 88.16% 60.97% 60.97%
×
× WideResNet-28-10 OpenReview, Jun 2021
25 Fixing Data Augmentation to Improve Adversarial Robustness
It uses additional 1M synthetic images in training. 60.73% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 87.33% 60.75% 60.73%
×
× WideResNet-28-10 arXiv, Mar 2021
26 Do Wider Neural Networks Really Help Adversarial Robustness?
87.67% 60.65% 60.65% Unknown ☑ WideResNet-34-15 arXiv, Oct 2020
27 Improving Neural Network Robustness via Persistency of Excitation 86.53% 60.41% 60.41%
×
☑ WideResNet-34-15 ACC 2022
28 Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
It uses additional 10M synthetic images in training. 86.68% 60.27% 60.27%
×
× WideResNet-34-10 ICLR 2022
29 Adversarial Weight Perturbation Helps Robust Generalization 88.25% 60.04% 60.04%
×
☑ WideResNet-28-10 NeurIPS 2020
30 Improving Neural Network Robustness via Persistency of Excitation 89.46% 59.66% 59.66%
×
☑ WideResNet-28-10 ACC 2022
31 Geometry-aware Instance-reweighted Adversarial Training
Uses
= 0.031 ≈ 7.9/255 instead of 8/255. 89.36% 59.64% 59.64%
×
☑ WideResNet-28-10 ICLR 2021
32 Unlabeled Data Improves Adversarial Robustness 89.69% 59.53% 59.53%
×
☑ WideResNet-28-10 NeurIPS 2019
33 Improving Robustness using Generated Data
It uses additional 100M synthetic images in training. 58.50% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 87.35% 58.63% 58.50%
×
× PreActResNet-18 NeurIPS 2021
34 Data filtering for efficient adversarial training
86.10% 58.09% 58.09%
×
× WideResNet-34-20 Pattern Recognition 2024
35 Scaling Adversarial Training to Large Perturbation Bounds 85.32% 58.04% 58.04%
×
× WideResNet-34-10 ECCV 2022
36 Efficient and Effective Augmentation Strategy for Adversarial Training 88.71% 57.81% 57.81%
×
× WideResNet-34-10 NeurIPS 2022
37 LTD: Low Temperature Distillation for Robust Adversarial Training
86.03% 57.71% 57.71%
×
× WideResNet-34-20 arXiv, Nov 2021
38 Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off 89.02% 57.67% 57.67%
×
☑ PreActResNet-18 OpenReview, Jun 2021
39 LAS-AT: Adversarial Training with Learnable Attack Strategy
85.66% 57.61% 57.61%
×
× WideResNet-70-16 arXiv, Mar 2022
40 A Light Recipe to Train Robust Vision Transformers 91.73% 57.58% 57.58%
×
☑ XCiT-L12 arXiv, Sep 2022
41 Data filtering for efficient adversarial training
86.54% 57.30% 57.30%
×
× WideResNet-34-10 Pattern Recognition 2024
42 A Light Recipe to Train Robust Vision Transformers 91.30% 57.27% 57.27%
×
☑ XCiT-M12 arXiv, Sep 2022
43 Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
57.14% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 85.29% 57.20% 57.14%
×
× WideResNet-70-16 arXiv, Oct 2020
44 HYDRA: Pruning Adversarially Robust Neural Networks
Compressed model 88.98% 57.14% 57.14%
×
☑ WideResNet-28-10 NeurIPS 2020
45 Decoupled Kullback-Leibler Divergence Loss 85.31% 57.09% 57.09%
×
× WideResNet-34-10 arXiv, May 2023
46 Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off
It uses additional 1M synthetic images in training. 86.86% 57.09% 57.09%
×
× PreActResNet-18 OpenReview, Jun 2021
47 LTD: Low Temperature Distillation for Robust Adversarial Training
85.21% 56.94% 56.94%
×
× WideResNet-34-10 arXiv, Nov 2021
48 Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
56.82% robust accuracy is due to the original evaluation (AutoAttack + MultiTargeted) 85.64% 56.86% 56.82%
×
× WideResNet-34-20 arXiv, Oct 2020
49 Fixing Data Augmentation to Improve Adversarial Robustness
It uses additional 1M synthetic images in training. 83.53% 56.66% 56.66%
×
× PreActResNet-18 arXiv, Mar 2021
50 Improving Adversarial Robustness Requires Revisiting Misclassified Examples 87.50% 56.29% 56.29%
×
☑ WideResNet-28-10 ICLR 2020
51 LAS-AT: Adversarial Training with Learnable Attack Strategy
84.98% 56.26% 56.26%
×
× WideResNet-34-10 arXiv, Mar 2022
52 Adversarial Weight Perturbation Helps Robust Generalization 85.36% 56.17% 56.17%
×
× WideResNet-34-10 NeurIPS 2020
53 A Light Recipe to Train Robust Vision Transformers 90.06% 56.14% 56.14%
×
☑ XCiT-S12 arXiv, Sep 2022
54 Are Labels Required for Improving Adversarial Robustness? 86.46% 56.03% 56.03% Unknown ☑ WideResNet-28-10 NeurIPS 2019
55 Robust Learning Meets Generative Models: Can Proxy Distributions Improve Adversarial Robustness?
It uses additional 10M synthetic images in training. 84.59% 55.54% 55.54%
×
× ResNet-18 ICLR 2022
56 Using Pre-Training Can Improve Model Robustness and Uncertainty 87.11% 54.92% 54.92%
×
☑ WideResNet-28-10 ICML 2019
57 Bag of Tricks for Adversarial Training
86.43% 54.39% 54.39% Unknown × WideResNet-34-20 ICLR 2021
58 Boosting Adversarial Training with Hypersphere Embedding 85.14% 53.74% 53.74%
×
× WideResNet-34-20 NeurIPS 2020
59 Learnable Boundary Guided Adversarial Training
Uses
= 0.031 ≈ 7.9/255 instead of 8/255 88.70% 53.57% 53.57%
×
× WideResNet-34-20 ICCV 2021
60 Attacks Which Do Not Kill Training Make Adversarial Learning Stronger 84.52% 53.51% 53.51%
×
× WideResNet-34-10 ICML 2020
61 Overfitting in adversarially robust deep learning 85.34% 53.42% 53.42%
×
× WideResNet-34-20 ICML 2020
62 Self-Adaptive Training: beyond Empirical Risk Minimization
Uses
= 0.031 ≈ 7.9/255 instead of 8/255. 83.48% 53.34% 53.34% Unknown × WideResNet-34-10 NeurIPS 2020
63 Theoretically Principled Trade-off between Robustness and Accuracy
Uses
= 0.031 ≈ 7.9/255 instead of 8/255. 84.92% 53.08% 53.08% Unknown × WideResNet-34-10 ICML 2019
64 Learnable Boundary Guided Adversarial Training
Uses
= 0.031 ≈ 7.9/255 instead of 8/255 88.22% 52.86% 52.86%
×
× WideResNet-34-10 ICCV 2021
65 Adversarial Robustness through Local Linearization 86.28% 52.84% 52.84% Unknown × WideResNet-40-8 NeurIPS 2019
66 Efficient and Effective Augmentation Strategy for Adversarial Training 85.71% 52.48% 52.48%
×
× ResNet-18 NeurIPS 2022
67 Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning
Uses ensembles of 3 models. 86.04% 51.56% 51.56% Unknown × ResNet-50 CVPR 2020
68 Efficient Robust Training via Backward Smoothing
85.32% 51.12% 51.12% Unknown × WideResNet-34-10 arXiv, Oct 2020
69 Scaling Adversarial Training to Large Perturbation Bounds 80.24% 51.06% 51.06%
×
× ResNet-18 ECCV 2022
70 Improving Adversarial Robustness Through Progressive Hardening
86.84% 50.72% 50.72% Unknown × WideResNet-34-10 arXiv, Mar 2020
71 Robustness library 87.03% 49.25% 49.25% Unknown × ResNet-50 GitHub,
Oct 2019
72 Harnessing the Vulnerability of Latent Layers in Adversarially Trained Models 87.80% 49.12% 49.12% Unknown × WideResNet-34-10 IJCAI 2019
73 Metric Learning for Adversarial Robustness 86.21% 47.41% 47.41% Unknown × WideResNet-34-10 NeurIPS 2019
74 You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle
Focuses on fast adversarial training. 87.20% 44.83% 44.83% Unknown × WideResNet-34-10 NeurIPS 2019
75 Towards Deep Learning Models Resistant to Adversarial Attacks 87.14% 44.04% 44.04% Unknown × WideResNet-34-10 ICLR 2018
76 Understanding and Improving Fast Adversarial Training
Focuses on fast adversarial training. 79.84% 43.93% 43.93% Unknown × PreActResNet-18 NeurIPS 2020
77 Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness 80.89% 43.48% 43.48% Unknown × ResNet-32 ICLR 2020
78 Fast is better than free: Revisiting adversarial training
Focuses on fast adversarial training. 83.34% 43.21% 43.21% Unknown × PreActResNet-18 ICLR 2020
79 Adversarial Training for Free! 86.11% 41.47% 41.47% Unknown × WideResNet-34-10 NeurIPS 2019
80 MMA Training: Direct Input Space Margin Maximization through Adversarial Training 84.36% 41.44% 41.44% Unknown × WideResNet-28-4 ICLR 2020
81 A Tunable Robust Pruning Framework Through Dynamic Network Rewiring of DNNs
Compressed model 87.32% 40.41% 40.41%
×
× ResNet-18 ASP-DAC 2021
82 Controlling Neural Level Sets
Uses
= 0.031 ≈ 7.9/255 instead of 8/255. 81.30% 40.22% 40.22% Unknown × ResNet-18 NeurIPS 2019
83 Robustness via Curvature Regularization, and Vice Versa 83.11% 38.50% 38.50% Unknown × ResNet-18 CVPR 2019
84 Defense Against Adversarial Attacks Using Feature Scattering-based Adversarial Training 89.98% 36.64% 36.64% Unknown × WideResNet-28-10 NeurIPS 2019
85 Adversarial Interpolation Training: A Simple Approach for Improving Model Robustness 90.25% 36.45% 36.45% Unknown × WideResNet-28-10 OpenReview, Sep 2019
86 Adversarial Defense via Learning to Generate Diverse Attacks 78.91% 34.95% 34.95% Unknown × ResNet-20 ICCV 2019
87 Sensible adversarial learning 91.51% 34.22% 34.22% Unknown × WideResNet-34-10 OpenReview, Sep 2019
88 Towards Stable and Efficient Training of Verifiably Robust Neural Networks
Verifiably robust model with 32.24% provable robust accuracy 44.73% 32.64% 32.64% Unknown × 5-layer-CNN ICLR 2020
89 Bilateral Adversarial Training: Towards Fast Training of More Robust Models Against Adversarial Attacks 92.80% 29.35% 29.35% Unknown × WideResNet-28-10 ICCV 2019
90 Enhancing Adversarial Defense by k-Winners-Take-All
Uses
= 0.031 ≈ 7.9/255 instead of 8/255.
7.40% robust accuracy is due to 1 restart of APGD-CE and 30 restarts of Square Attack
Note: this adaptive evaluation (Section 5) reports 0.16% robust accuracy on a different model (adversarially trained ResNet-18). 79.28% 18.50% 7.40%
× DenseNet-121 ICLR 2020
91 Manifold Regularization for Adversarial Robustness 90.84% 1.35% 1.35% Unknown × ResNet-18 arXiv, Mar 2020
92 Adversarial Defense by Restricting the Hidden Space of Deep Neural Networks 89.16% 0.28% 0.28% Unknown × ResNet-110 ICCV 2019
93 Jacobian Adversarially Regularized Networks for Robustness 93.79% 0.26% 0.26% Unknown × WideResNet-34-10 ICLR 2020
94 ClusTR: Clustering Training for Robustness 91.03% 0.00% 0.00% Unknown × WideResNet-28-10 arXiv, Jun 2020
95 Standardly trained model 94.78% 0.0% 0.0% Unknown × WideResNet-28-10 N/A
'''
TestDataYearExtract = question >> LLMRun() >> ExtractJSON() >> JSONSubsetEvaluator({
    "2024": 69.71,
    "2023": 71.07,
    "2022": 65.79,
    "2021": 66.56,
    "2020": 65.87,
    "2019": 59.53,
    "2018": 44.04
})

if __name__ == "__main__":
    print(run_test(TestDataYearExtract))
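
# The pipeline above ends in JSONSubsetEvaluator, imported from the evaluator
# module. As a rough sketch of the check it presumably performs (an assumption
# for illustration, not the actual implementation), every expected
# (year, accuracy) pair must appear in the model's JSON answer with a matching
# value, while any extra keys the model emits are ignored:
def _json_subset_matches(expected, got, tol=1e-6):
    """Hypothetical helper: True iff every item of `expected` appears in `got`.

    Numeric values are compared with a small tolerance; keys present only
    in `got` are ignored, so a model may return more years than required.
    """
    for key, want in expected.items():
        if key not in got:
            return False
        have = got[key]
        if isinstance(want, (int, float)) and isinstance(have, (int, float)):
            if abs(want - have) > tol:
                return False
        elif want != have:
            return False
    return True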