Timeouts on long lines / bad regexes


We’re getting consistent reports of timeouts when users enable this. The timeouts have two distinct sources:

  • Very long lines or poorly written regexes slow down regex matching (this ticket).

  • Very large binary files (SOTERIA-118).

Steps to reproduce

  1. Download the Rust repo: https://github.com/rust-lang/rust

  2. Check out revision de857bbcf02d192986efc380b4735d8c9bea85ac (the revision I tested with).

  3. Enable the GENERIC_PASSWORD rule.

  4. Start a repository scan.

  5. The scan fails with a timeout error. The cause is the files slowparse-bstring.rs and slowparse-string.rs.

The issue with slowparse-bstring.rs is that it contains a single huge line (hundreds of KB). It even breaks syntax highlighting in Vim, which errors with "pattern uses more memory than maxmempattern".

Implementation details

Two ideas come to mind for addressing this in general:

  • Arbitrarily chop up super long lines into smaller segments to scan.

    • There’s a small correctness issue (a match that spans a segment boundary could be missed), but extremely long lines are likely to be test data or another special case.

  • YACC has a regex extension, TimeLimitedMatcherFactory, which enforces a timeout on each regex match. We could do something similar: explicitly time out regex matching for each line, so we can report a specific error naming the regex that failed to match in a reasonable time.

    • Admins have already requested the ability to see which regexes are slow, see

    • My only concern is that the extra calls to the system clock will slow matching quite a bit in the case where there is no timeout. But maybe it’ll perform ok.
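A rough sketch of both ideas, assuming the scanner uses java.util.regex. All names here (BoundedRegexScan, chunk, findWithin, DeadlineCharSequence, RegexTimeoutException) are hypothetical and this is not how TimeLimitedMatcherFactory is implemented. The chunker overlaps adjacent segments so that matches shorter than the overlap are not lost at a boundary. The timeout wraps the input in a CharSequence that checks the clock only once every 1024 character reads, which should keep the clock overhead small when there is no timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BoundedRegexScan {

    /** Thrown when one regex match exceeds its time budget; the message names the regex. */
    public static final class RegexTimeoutException extends RuntimeException {
        public RegexTimeoutException(String pattern, long budgetMillis) {
            super("regex '" + pattern + "' exceeded " + budgetMillis + " ms");
        }
    }

    /** CharSequence view that checks a deadline on every 1024th charAt() call. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;
        private final String pattern;
        private final long budgetMillis;
        private int calls;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos,
                             String pattern, long budgetMillis) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
            this.pattern = pattern;
            this.budgetMillis = budgetMillis;
        }

        @Override
        public char charAt(int index) {
            // Sample the clock on the first read and then every 1024 reads.
            if ((calls++ & 1023) == 0 && System.nanoTime() - deadlineNanos >= 0) {
                throw new RegexTimeoutException(pattern, budgetMillis);
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(
                    inner.subSequence(start, end), deadlineNanos, pattern, budgetMillis);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Splits a line into segments of at most maxLen chars; neighbours share `overlap` chars. */
    public static List<String> chunk(String line, int maxLen, int overlap) {
        if (overlap < 0 || maxLen <= overlap) {
            throw new IllegalArgumentException("need 0 <= overlap < maxLen");
        }
        List<String> segments = new ArrayList<>();
        int step = maxLen - overlap;
        for (int start = 0; ; start += step) {
            int end = Math.min(line.length(), start + maxLen);
            segments.add(line.substring(start, end));
            if (end == line.length()) {
                break;
            }
        }
        return segments;
    }

    /** Runs find() with a time budget; throws RegexTimeoutException if it is exceeded. */
    public static boolean findWithin(Pattern p, CharSequence input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return p.matcher(
                new DeadlineCharSequence(input, deadline, p.pattern(), budgetMillis)).find();
    }
}
```

The two pieces can be combined: chunk each long line first, then call findWithin on every segment for every enabled rule, and surface the exception message to admins so they can see which regex is slow. The 1024-read sampling interval is a guess and would need measuring against real repositories.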

