Groundbreaking Results: Setting New Benchmarks


Our research achieves what we believe are unprecedented levels of robustness against AutoAttack, one of the most rigorous ensembles of adversarial attacks, particularly on a critical medical imaging dataset:

BrainTumor-7K (Medical Imaging Dataset - ResNet-18 Model):

  • Baseline AutoAttack Accuracy (ϵ=0.020): 48.29%

  • EDF Evolutionary Model AutoAttack Accuracy (ϵ=0.020): 98.86%

  • EDF Evolutionary Model AutoAttack Accuracy (ϵ=0.050): 97.86%

  • Improvement: A 50.57 percentage-point gain in AutoAttack accuracy over the baseline (from 48.29% to 98.86% at ϵ=0.020).

To our knowledge, nearly 99% accuracy under AutoAttack on a medical imaging dataset is a new state-of-the-art result, with significant implications for deploying reliable AI in healthcare.
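AutoAttack accuracy is a worst-case score. A test sample counts as robust only if the model still classifies it correctly after every attack in the ensemble. The standard AutoAttack run keeps attacking only the points that have survived so far, which in effect applies the same rule. The minimal sketch below illustrates that scoring rule on hypothetical toy data. `ensemble_robust_accuracy` is an illustrative helper and is not part of AutoAttack or the released benchmark code.

```python
from typing import Sequence


def ensemble_robust_accuracy(per_attack_correct: Sequence[Sequence[bool]]) -> float:
    """Worst-case accuracy over an attack ensemble (illustrative helper).

    per_attack_correct[a][i] is True if the model still classifies sample i
    correctly after attack a. A sample counts as robust only if it survives
    every attack. Clean misclassifications can be folded in by passing the
    clean predictions as one extra row.
    """
    if not per_attack_correct:
        raise ValueError("need at least one attack")
    n = len(per_attack_correct[0])
    if n == 0 or any(len(row) != n for row in per_attack_correct):
        raise ValueError("every attack must report the same non-empty sample set")
    # Count samples that are correct under all attacks simultaneously.
    robust = sum(all(row[i] for row in per_attack_correct) for i in range(n))
    return robust / n


# Hypothetical toy results for 4 samples under 3 attacks (not real data).
toy = [
    [True, True, False, True],   # attack A
    [True, True, True, True],    # attack B
    [True, False, True, True],   # attack C
]
print(ensemble_robust_accuracy(toy))  # 0.5: only samples 0 and 3 survive all attacks
```

Each attack alone would report 75% or 100% accuracy here, but the ensemble score is the per-sample minimum. This is why AutoAttack figures are usually lower than single-attack figures.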

Additional Highlights:

  • Phishing Legitimate (Tabular Cybersecurity): Achieved 94.10% AutoAttack Accuracy (up from a 73.25% baseline), potentially the highest reported for tabular data without adversarial training.

  • Overall PGD Robustness (ϵ=0.020): Achieved a 75.78% improvement over baseline.

  • Maintained Clean Accuracy: A negligible change of -0.14 percentage points (from 98.72% to 98.58%), showing that the usual robustness-accuracy trade-off is essentially avoided.
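Several figures above are differences between two accuracies. Such a difference can be read either as percentage points or as a relative change. For example, going from 48.29% to 98.86% is +50.57 percentage points, which is about +104.7% relative to the baseline. The small helper below makes the distinction explicit for the quoted figures. `change` is a hypothetical illustration and is not part of the released code.

```python
def change(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in %).

    Both inputs are accuracies expressed in percent (e.g. 48.29 for 48.29%).
    """
    if before <= 0:
        raise ValueError("baseline accuracy must be positive")
    points = after - before
    relative = 100.0 * points / before
    return points, relative


# Figures quoted in the results above (accuracies in percent).
reported = {
    "BrainTumor-7K AutoAttack (eps=0.020)": (48.29, 98.86),
    "Phishing Legitimate AutoAttack": (73.25, 94.10),
    "Clean accuracy": (98.72, 98.58),
}

for name, (before, after) in reported.items():
    pts, rel = change(before, after)
    print(f"{name}: {pts:+.2f} pp ({rel:+.1f}% relative)")
```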

Impact & Future Vision

This research, forthcoming at IEEE AI-SI 2025 in Kuala Lumpur, provides a foundational technology for building trustworthy, deployable AI systems.

Charanarravindaa Suriess (Co-Founder & CEO)

The evolved ResNet model weights are openly available, so you can run your own stress tests. You can access them at:
https://github.com/Charanarravindaa/Evolutionary_Models_Benchmarks
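When running your own stress tests, a useful sanity check is that every adversarial input stays inside the stated perturbation budget. Otherwise, the resulting accuracy is not comparable to the figures above. The sketch below assumes an L∞ threat model with ϵ = 0.020 on inputs scaled to [0, 1]. The norm and input scaling are not specified here, so both are assumptions. `project_linf` is the projection step that PGD-style attacks apply after each gradient step, and `within_budget` checks a finished adversarial example. Both names are hypothetical. For full evaluations, the reference AutoAttack implementation (fra31/auto-attack on GitHub) provides an `AutoAttack` class with a `run_standard_evaluation` method.

```python
from typing import List, Sequence

EPS = 0.020  # budget quoted above; the L-infinity norm and [0, 1] range are assumptions


def project_linf(x_adv: Sequence[float], x: Sequence[float], eps: float = EPS) -> List[float]:
    """Project x_adv onto the L-infinity ball of radius eps around x, then clip to [0, 1]."""
    if len(x_adv) != len(x):
        raise ValueError("inputs must have the same length")
    out = []
    for a, c in zip(x_adv, x):
        a = min(max(a, c - eps), c + eps)  # stay within the eps-ball around the clean value
        out.append(min(max(a, 0.0), 1.0))  # stay a valid pixel value
    return out


def within_budget(x_adv: Sequence[float], x: Sequence[float],
                  eps: float = EPS, tol: float = 1e-9) -> bool:
    """True if every coordinate is within eps of the clean input and inside [0, 1]."""
    if len(x_adv) != len(x):
        return False
    return all(abs(a - c) <= eps + tol and 0.0 <= a <= 1.0 for a, c in zip(x_adv, x))


# Toy example (hypothetical values): an over-perturbed candidate gets pulled back.
clean = [0.0, 0.5, 0.99]
candidate = [-0.1, 0.6, 1.2]
projected = project_linf(candidate, clean)
print(projected, within_budget(projected, clean))  # roughly [0.0, 0.52, 1.0] True
```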

© 2025 CORTEXA LABS

charanarravindaasuriess@gmail.com
