Python
Python is a popular language among YARA users. They use Python for all kinds of automation tasks, and the YARA-X ecosystem wouldn’t be complete without the possibility of using it from Python programs.
YARA-X supports Python 3.8 or later on Linux, macOS and Windows.
Installation
Installing the yara-x Python module couldn’t be easier: run pip install yara-x.
After the installation you can check if everything went fine by running the following program:
If the program above runs without errors, everything is ready to start using YARA-X from your Python programs.
API overview
Using YARA-X from Python involves a two-step process: rule compilation and scanning. During the rule compilation phase you transform YARA rules from text into a compiled Rules object. This object is later used for scanning data.
To compile rules, you can either use the yara_x.compile(…) function or a Compiler object. The former is simpler and sufficient for basic scenarios. More complex use-cases that involve namespaces or multiple rule sets require a Compiler.
Once you have a Rules object, you can proceed in two ways: either use the Rules.scan(…) method, or create a Scanner. Again, the former is the easiest way, but the latter gives you more control over the scanning process.
Examples
API reference
compile(…)
Function that takes a string with one or more YARA rules and produces a Rules object representing the rules in compiled form. This is the simplest way to compile YARA rules; for more advanced use-cases you must use a Compiler.
Returns: yara_x.Rules
Raises: yara_x.CompileError
Example
Compiler
Type that represents a YARA-X compiler. It takes one or more sets of YARA rules in text form and compiles them into a Rules object.
.__init__(relaxed_re_syntax=False)
Compiler constructor. The relaxed_re_syntax argument controls whether the compiler should adopt a more relaxed syntax check for regular expressions, allowing constructs that YARA-X doesn’t accept by default.
YARA-X enforces stricter regular expression syntax compared to YARA. For instance, YARA accepts invalid escape sequences and treats them as literal characters (e.g., \R is interpreted as a literal ‘R’). It also allows some special characters to appear unescaped, inferring their meaning from the context (e.g., { and } in /foo{}bar/ are literal, but in /foo{0,1}bar/ they form the repetition operator {0,1}).
Example
.add_source(string, origin=None)
Adds some YARA source code to be compiled. Raises an exception if the source code is not valid.
The optional origin parameter is a string that specifies the origin of the source code. This is usually the path of the file containing the source code, but it can be any arbitrary string conveying information about the source’s origin.
Raises: yara_x.CompileError
Example
.define_global(identifier, value)
Defines a global variable and sets its initial value.
Global variables must be defined before calling Compiler.add_source(…) with any YARA rule that uses the variable. The variable retains its initial value when the Rules are used for scanning data; however, each scanner can change the variable’s value by calling Scanner.set_global(…).
The type of value must be bool, str, bytes, int or float.
Raises: TypeError if the type of value is not one of the supported ones.
Example
.new_namespace(string)
Creates a new namespace. Any further call to Compiler.add_source(…) will put the new rules under the new namespace, isolating them from previously added rules.
Example
.errors()
Returns the errors found during the compilation, across all calls to Compiler.add_source(…). The result is an array of dictionaries, where each dictionary represents an error. This is an example:
.warnings()
Returns the warnings found during the compilation, across all calls to Compiler.add_source(…). The result is an array of dictionaries, where each dictionary represents a warning. This is an example:
.build()
Produces a compiled Rules object that contains all the rules previously added to the compiler with Compiler.add_source(…). Once this method is called the Compiler is reset to its original state, as if it was a newly created compiler.
Rules
Type that represents a set of compiled rules. The compiled rules can be used for scanning data by calling the Rules.scan(…) method or passing the Rules object to a Scanner.
.scan(bytes)
Scans data with the compiled rules. This is the simplest way of using the compiled rules for scanning data. For more advanced use-cases you can use a Scanner.
Returns: yara_x.ScanResults
Raises: yara_x.ScanError, yara_x.TimeoutError
Scanner
Type that represents a YARA-X scanner. When creating the Scanner you must provide a Rules object containing the rules that will be used during the scan operation. The same Rules can be used by multiple scanners simultaneously.
Example
.scan(bytes)
Scans in-memory data.
Returns: yara_x.ScanResults
Raises: yara_x.ScanError, yara_x.TimeoutError
Example
.scan_file(path)
Scans a file given its path.
Returns: yara_x.ScanResults
Raises: yara_x.ScanError, yara_x.TimeoutError
.set_global(identifier, value)
Sets the value of a global variable. The variable must have been previously defined during compilation, for example by calling Compiler.define_global(…), and the type of the new value must match the type used in the definition. The variable retains the new value in subsequent scans, unless this function is called again to set a new value.
Raises: TypeError if the type of value is not one of the supported ones.
.set_timeout(seconds)
Sets a timeout for each scan. A scan is aborted once the specified number of seconds has elapsed.
ScanResults
Type that represents the results of a scan operation.
.matching_rules
Array of Rule objects with every rule that matched during the scan.
.module_outputs
A dictionary containing the information extracted by all YARA-X modules from the file. Keys in the dictionary are module names (e.g., “pe”, “elf”, “dotnet”), and values are dictionaries with the information produced by each module.
Rule
Type that represents an individual YARA rule.
.identifier
A str with the rule’s identifier.
.namespace
A str with the rule’s namespace.
.patterns
A tuple of Pattern with every pattern defined by the rule, matching or not. Each pattern contains information about the matches that were found during the scan, if any.
.metadata
A tuple of pairs (identifier, value) with the metadata associated with the rule.
Pattern
Type that represents a pattern in a Rule. Contains information about the pattern, including its identifier and the matches found for that pattern, if any.
.identifier
A str with the pattern’s identifier (e.g., $a, $foo).
.matches
A tuple of Match objects that contain information about the matches found for this pattern.
Match
Type that represents a match found for a Pattern.
.offset
The file offset where the match occurred.
.length
The length of the match.
.xor_key
If the pattern used the xor modifier, this contains the XOR key (it may be 0). If not, this is None.
CompileError
Exception raised when compilation fails.
Example
ScanError
Exception raised when scanning fails.
TimeoutError
Exception raised when a timeout occurs while scanning.