We’re getting consistent reports of timeouts when users enable this. There are two different sources of timeouts:
Very long lines, or poorly written regexes, that slow down regex matching (this ticket)
Very large binary files (SOTERIA-118).
Originally reported here:
Downloaded the Rust repo: https://github.com/rust-lang/rust
I tested with revision de857bbcf02d192986efc380b4735d8c9bea85ac
Enabled the GENERIC_PASSWORD rule
Started a repository scan
The scan failed with a timeout error. The cause is the slowparse-bstring.rs and slowparse-string.rs files.
The issue with slowparse-bstring.rs is that it contains a single huge line (hundreds of KB). It also breaks syntax highlighting in Vim (which errors with "pattern uses more memory than maxmempattern").
Discussion copied from this PR:
Two ideas come to mind of how to address this in general:
Arbitrarily chop up super long lines into smaller segments to scan.
There’s a small correctness issue (a match that spans a segment boundary would be missed), but extremely long lines are likely to be test data or another special case.
YACC has a regex extension, TimeLimitedMatcherFactory, which enforces a timeout on each regex match. We could do something similar, explicitly timing out regex matching on each line so we can report exactly which regex failed to match in a reasonable time.
Admins have already requested the ability to see which regexes are slow, see
My only concern is that the extra calls to the system clock will slow matching quite a bit in the common case where no timeout occurs. But maybe it’ll perform OK.
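A per-match timeout along the lines of the second idea could be sketched as follows in Java. This is not YACC's TimeLimitedMatcherFactory. It is a minimal sketch that assumes java.util.regex reads its input through CharSequence.charAt, so a wrapper can check a deadline there. The class names RegexTimeoutException, DeadlineCharSequence and TimeLimitedRegex are hypothetical and exist only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical exception: names the regex that ran out of time. */
class RegexTimeoutException extends RuntimeException {
    RegexTimeoutException(String regex, long timeoutMillis) {
        super("Regex '" + regex + "' did not finish within " + timeoutMillis + " ms");
    }
}

/** Hypothetical wrapper: checks the deadline on every character access. */
class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;
    private final String regex;
    private final long timeoutMillis;

    DeadlineCharSequence(CharSequence inner, long deadlineNanos,
                         String regex, long timeoutMillis) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
        this.regex = regex;
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public char charAt(int index) {
        // Overflow-safe comparison of System.nanoTime() values.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new RegexTimeoutException(regex, timeoutMillis);
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Keep the same deadline for sub-sequences handed out by the matcher.
        return new DeadlineCharSequence(
                inner.subSequence(start, end), deadlineNanos, regex, timeoutMillis);
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

/** Hypothetical helper: builds a Matcher whose input enforces a time budget. */
class TimeLimitedRegex {
    static Matcher matcher(Pattern pattern, CharSequence line, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(
                new DeadlineCharSequence(line, deadline, pattern.pattern(), timeoutMillis));
    }
}
```

A caller would wrap each find() or matches() call in a try/catch for RegexTimeoutException and attribute the failure to the rule being evaluated. The clock read on every charAt call is exactly the overhead in question. Checking the clock only every N calls would be one way to reduce it, at the cost of a slightly late timeout.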
We can also record timed-out validations, with file/rule info, in the error_message field, so admins could see this info right away (as an error message) without additional UI development.
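For the first idea (chopping very long lines into smaller segments), a minimal Java sketch might look like the following. The class name LineChunker and the maxLen and overlap parameters are hypothetical. The overlap is an assumption added here to soften the boundary issue: a match no longer than the overlap cannot be split across two segments, though a match inside the overlap may be reported twice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits an over-long line into scannable segments. */
class LineChunker {
    /**
     * Returns segments of at most maxLen characters.
     * Consecutive segments share `overlap` characters.
     */
    static List<String> chunk(String line, int maxLen, int overlap) {
        if (maxLen <= 0 || overlap < 0 || overlap >= maxLen) {
            throw new IllegalArgumentException("require 0 <= overlap < maxLen");
        }
        List<String> segments = new ArrayList<>();
        if (line.length() <= maxLen) {
            // Short lines are scanned unchanged.
            segments.add(line);
            return segments;
        }
        int step = maxLen - overlap;
        for (int start = 0; start < line.length(); start += step) {
            int end = Math.min(start + maxLen, line.length());
            segments.add(line.substring(start, end));
            if (end == line.length()) {
                break; // the last segment reached the end of the line
            }
        }
        return segments;
    }
}
```

Each segment would then be scanned like an ordinary line. Choosing the segment length and overlap is a trade-off between match accuracy and worst-case matching time.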