Timeouts on long lines / bad regexes

Description

We’re getting consistent reports of timeouts when users enable this. There are two distinct sources of timeouts:

  • Very long lines or poorly written regexes that slow down regex matching (this ticket)

  • Very large binary files (SOTERIA-118).

Originally reported here:

  1. Download the Rust repo: https://github.com/rust-lang/rust

  2. Use revision de857bbcf02d192986efc380b4735d8c9bea85ac (the one I tested with).

  3. Enable the GENERIC_PASSWORD rule.

  4. Start a repository scan.

  5. The scan fails with a timeout error. The cause is the slowparse-bstring.rs and slowparse-string.rs files.

The issue with slowparse-bstring.rs is that it contains a single huge line (hundreds of KB). It also breaks syntax highlighting in Vim, which fails with "pattern uses more memory than maxmempattern".

Implementation details

Discussion copied from this PR:

Two ideas come to mind of how to address this in general:

  • Arbitrarily chop up super long lines into smaller segments to scan.

    • There’s a small correctness issue (a match that spans a segment boundary could be missed), but extremely long lines are likely to be test data or another special case.

  • YACC has a regex extension, TimeLimitedMatcherFactory, which enforces a timeout on each regex match. We could do something similar: explicitly apply a timeout to regex matching for each line, so that we can report exactly which regex failed to match in a reasonable time.

    • Admins have already requested the ability to see which regexes are slow, see

    • My only concern is that the extra calls to the system clock will slow matching quite a bit in the case where there is no timeout. But maybe it’ll perform ok.
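The two ideas above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the existing scanner code and not necessarily how TimeLimitedMatcherFactory is implemented. The names LineScanSketch, chunk, findWithin, TimeLimitedCharSequence and RegexTimeoutException are hypothetical, as is the overlap parameter, which is one possible way to reduce missed matches at segment boundaries. The timeout relies on java.util.regex reading its input through CharSequence.charAt, and it consults the clock only once every 1024 reads to limit the overhead raised in the last bullet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LineScanSketch {

    /** Hypothetical exception raised when a match runs past its deadline. */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) { super(message); }
    }

    /** Wraps the input so the regex engine hits a deadline check while reading characters. */
    static final class TimeLimitedCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;
        private int reads; // number of charAt calls so far

        TimeLimitedCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override public char charAt(int index) {
            // Check the clock on the first read and then every 1024th read.
            if ((reads++ & 0x3FF) == 0 && System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex match exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override public int length() { return inner.length(); }

        @Override public CharSequence subSequence(int start, int end) {
            return new TimeLimitedCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override public String toString() { return inner.toString(); }
    }

    /** Idea 1: split a line into segments of at most maxLen chars; neighbours share `overlap` chars. */
    static List<String> chunk(String line, int maxLen, int overlap) {
        if (overlap < 0 || maxLen <= overlap) {
            throw new IllegalArgumentException("need 0 <= overlap < maxLen");
        }
        List<String> segments = new ArrayList<>();
        int step = maxLen - overlap;
        for (int start = 0; ; start += step) {
            int end = Math.min(line.length(), start + maxLen);
            segments.add(line.substring(start, end));
            if (end == line.length()) break;
        }
        return segments;
    }

    /** Idea 2: run find() under a time budget; throws RegexTimeoutException when it is exceeded. */
    static boolean findWithin(Pattern pattern, String input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        return pattern.matcher(new TimeLimitedCharSequence(input, deadline)).find();
    }
}
```

A caller could catch RegexTimeoutException per rule and per line, which gives it the rule name needed for a specific error. Whether the sampled clock check is cheap enough in the no-timeout case would still need to be measured.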

We can also record timed-out validations, with file and rule info, in the error_message field, so admins can see this info right away (as an error message) without additional UI development.
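As a rough sketch of what recording timed-out validations might look like, the Java snippet below collects file and rule info and renders it as a single string suitable for an error message. The class TimeoutReportSketch, the TimedOutValidation type, its fields and the message layout are all assumptions made for illustration. The ticket does not specify the format of error_message.

```java
import java.util.List;

final class TimeoutReportSketch {

    /** Hypothetical record of one rule that timed out on one file. */
    static final class TimedOutValidation {
        final String rule;
        final String file;
        final long elapsedMillis;

        TimedOutValidation(String rule, String file, long elapsedMillis) {
            this.rule = rule;
            this.file = file;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Renders all timeouts as one multi-line message (layout is an assumption). */
    static String toErrorMessage(List<TimedOutValidation> timeouts) {
        StringBuilder sb = new StringBuilder();
        sb.append(timeouts.size()).append(" validation(s) timed out:");
        for (TimedOutValidation t : timeouts) {
            sb.append("\n  rule=").append(t.rule)
              .append(" file=").append(t.file)
              .append(" elapsedMs=").append(t.elapsedMillis);
        }
        return sb.toString();
    }
}
```

With one entry for GENERIC_PASSWORD on slowparse-bstring.rs, the message would name both the rule and the file, which is the information admins have asked for.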

Environment

None

Assignee

Unassigned

Reporter

George V @Mohami

Labels

None

Github URL

None

Priority

High