Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning fast search, fine‑tuned relevancy, and powerful analytics that scale with ease.
Starting in Security Onion 2.4, most data is associated with a data stream, which is an abstraction from traditional indices that leverages one or more backing indices to manage and represent the data within the data stream. The usage of data streams allows for greater flexibility in data management.
Data streams can be targeted directly during search and other operations, similar to how indices are targeted.
For example, a CLI-based query against Zeek connection records would look like the following:
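A minimal sketch of such a query using so-elasticsearch-query (the query body and the event.dataset value of conn are illustrative assumptions; adjust to your environment):

```
# Search the Zeek data stream for connection records
# (query body is a minimal example)
sudo so-elasticsearch-query 'logs-zeek-so*/_search' -d '{"query":{"match":{"event.dataset":"conn"}}}'
```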
When this query is run against the backend data, it is actually targeting one or more backing indices, such as:
.ds-logs-zeek-so-2022-03-07.0001
.ds-logs-zeek-so-2022-03-08.0001
.ds-logs-zeek-so-2022-03-08.0002
Similarly, you can target a single backing index with the following query:
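For instance, a sketch targeting one of the backing indices listed above (same caveats as before — the index name and query body are examples):

```
# Search a single backing index directly
sudo so-elasticsearch-query '.ds-logs-zeek-so-2022-03-08.0001/_search' -d '{"query":{"match":{"event.dataset":"conn"}}}'
```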
You can learn more about data streams at https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html.
Security Onion tries to adhere to the Elastic Common Schema wherever possible. Otherwise, additional fields or slight modifications to native Elastic field mappings may be found within the data.
In Security Onion 2.4, Elasticsearch data lifecycle management is handled by a combination of Curator and Index Lifecycle Management (ILM) (https://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html).
Only Curator performs the following actions:
- closing of open indices
- size-based index deletion
- size-based closed index deletion
Only ILM performs the following actions:
- size-based index rollover
- time-based index rollover
- time-based content tiers
Both Curator and ILM perform the following actions:
- time-based open index deletion
- time-based closed index deletion
Default ILM policies are preconfigured and associated with various data streams and index templates in
You can query Elasticsearch using web interfaces like Alerts, Dashboards, Hunt, and Kibana. You can also query Elasticsearch from the command line using a tool like curl, or using the included so-elasticsearch-query script.
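As a hedged sketch, a direct curl query might look like the following (the host, port, and credentials are placeholders; so-elasticsearch-query wraps the same request and handles authentication for you):

```
# Direct query with curl; substitute your own SOC username and password
curl -k -u 'youruser@example.com:yourpassword' 'https://localhost:9200/_cat/indices?v'

# Equivalent query using the bundled wrapper
sudo so-elasticsearch-query _cat/indices?v
```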
You can authenticate to Elasticsearch using the same username and password that you use for Security Onion Console (SOC).
You can add new user accounts to both Elasticsearch and Security Onion Console (SOC) at the same time as shown in the Adding Accounts section. Please note that if you instead create accounts directly in Elastic, then those accounts will only have access to Elastic and not Security Onion Console (SOC).
- Elasticsearch logs can be found in
- Logging configuration can be found in
Depending on what you’re looking for, you may also need to look at the Docker logs for the container:
sudo docker logs so-elasticsearch
All of the data Elasticsearch collects is stored under
Elasticsearch receives unparsed logs from Logstash or Elastic Agent. Elasticsearch then parses and stores those logs. Parsers are stored in
/opt/so/conf/elasticsearch/ingest/. Custom ingest parsers can be placed in
/opt/so/saltstack/local/salt/elasticsearch/files/ingest/. To make these changes take effect, restart Elasticsearch using
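As an illustration, a custom ingest parser is just an ingest pipeline definition in JSON. The sketch below writes a hypothetical pipeline (the name custom.test and its single set processor are examples, not shipped parsers) to the working directory and validates it; the file would then be copied into /opt/so/saltstack/local/salt/elasticsearch/files/ingest/ before restarting Elasticsearch:

```shell
# Hypothetical custom ingest pipeline definition (example only)
cat > custom.test.json <<'EOF'
{
  "description": "Example custom parser: copy message into a custom field",
  "processors": [
    { "set": { "field": "custom.copy", "copy_from": "message" } }
  ]
}
EOF

# Validate the JSON before deploying it to
# /opt/so/saltstack/local/salt/elasticsearch/files/ingest/
python3 -m json.tool < custom.test.json > /dev/null && echo "valid JSON"
```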
Elastic Agent may pre-parse or act on data before it reaches Elasticsearch, altering the data stream or index to which it is written, the event dataset, or other pertinent characteristics. This configuration is maintained in the agent policy or integration configuration in Elastic Fleet.
Fields are mapped to their appropriate data type using templates. When making changes for parsing, it is necessary to ensure fields are mapped to a data type to allow for indexing, which in turn allows for effective aggregation and searching in Dashboards, Hunt, and Kibana. Elasticsearch leverages both component and index templates.
Component templates are reusable building blocks that configure mappings, settings, and aliases. While you can use component templates to construct index templates, they aren’t directly applied to a set of indices.
An index template is a way to tell Elasticsearch how to configure an index when it is created. Templates are configured prior to index creation. When an index is created - either manually or through indexing a document - the template settings are used as a basis for creating the index. Index templates can contain a collection of component templates, as well as directly specify settings, mappings, and aliases.
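To make the relationship concrete, here is a hedged sketch (the template names, index pattern, and field are hypothetical, and the request bodies are minimal examples): a component template defines a reusable mapping, and an index template composes it and binds it to an index pattern:

```
# Hypothetical component template with one mapped field
sudo so-elasticsearch-query _component_template/custom-mappings -XPUT -d '{"template":{"mappings":{"properties":{"custom.field":{"type":"keyword"}}}}}'

# Hypothetical index template that composes the component template
sudo so-elasticsearch-query _index_template/custom-template -XPUT -d '{"index_patterns":["custom-logs-*"],"composed_of":["custom-mappings"]}'
```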
In Security Onion, component templates are stored in
These templates are specified to be used in the index template definitions in
You can configure Elasticsearch by going to Administration –> Configuration –> elasticsearch.
field expansion matches too many fields
If you get errors like failed to create query: field expansion for [*] matches too many fields, limit: 3500, got: XXXX, then this usually means that you're sending in additional logs and therefore have more fields than our default max_clause_count value allows. To resolve this, go to Administration –> Configuration –> elasticsearch –> config –> indices –> query –> bool –> max_clause_count and adjust the value for any boxes running Elasticsearch in your deployment.
If total available memory is 8GB or greater, Setup configures the heap size to be 33% of available memory, but no greater than 25GB. You may need to adjust the value for heap size depending on your system’s performance. You can modify this by going to Administration –> Configuration –> elasticsearch –> esheap.
Security Onion currently defaults to a field limit of 5000. If you receive error messages from Logstash, or you would simply like to increase this, you can do so by going to Administration –> Configuration –> elasticsearch –> index_settings –> so-INDEX-NAME –> index_template –> template –> settings –> index –> mapping –> total_fields –> limit.
Please note that the change to the field limit will not take effect immediately; it is applied only when a new index is created.
Elasticsearch indices are closed based on the close setting shown at Administration –> Configuration –> elasticsearch –> index_settings –> so-INDEX-NAME –> close. This setting configures Curator to close any index older than the given value. The more indices are open, the more heap is required, and having too many open indices can lead to performance issues. Many factors determine how many days of indices you can keep open, so this is a good setting to tune for your environment.
Size-based Index Deletion
Size-based deletion of Elasticsearch indices occurs based on the value of cluster-wide
elasticsearch.retention.retention_pct, which is derived from the total disk space available for
/nsm/elasticsearch across all nodes in the Elasticsearch cluster. The default value for this setting is
To modify this value, first navigate to Administration -> Configuration. At the top of the page, click the Options menu and enable the Show all configurable settings, including advanced settings option. Then navigate to elasticsearch -> retention -> retention_pct. The change will take effect at the next 15-minute interval. If you would like to make the change immediately, you can click the SYNCHRONIZE GRID button under the Options menu at the top of the page.
If your open indices are using more than retention_pct, then Curator will delete old open indices until disk space is back under retention_pct. If your total Elastic disk usage (both open and closed indices) is above retention_pct, then so-curator-closed-delete will delete old closed indices until disk space is back under retention_pct. so-curator-closed-delete does not use Curator because Curator cannot calculate disk space used by closed indices. For more information, see https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filtertype_space.html.

Both Curator and so-curator-closed-delete run on the same schedule. This might seem like there is a potential to delete open indices before deleting closed indices. However, keep in mind that Curator's delete.yml only sees disk space used by open indices, not closed indices. So if we have both open and closed indices, we may be at retention_pct, but Curator's delete.yml will see disk usage at a value lower than retention_pct and therefore should not delete any open indices.
For example, suppose our retention_pct is 50%, total disk space is 1TB, and we have 30 days of open indices and 300 days of closed indices. We reach retention_pct and both Curator and so-curator-closed-delete execute at the same time. Curator's delete.yml checks disk space used, but it only sees the space used by open indices, so it thinks we haven't reached retention_pct and does not delete anything. so-curator-closed-delete gets a more accurate view of disk space used, sees that we have indeed reached retention_pct, and deletes closed indices until usage drops back below retention_pct. In most cases, Curator deletion should only happen if we have open indices without any closed indices.
Time-based Index Deletion
Time-based deletion occurs through the $data_stream.policy.phases.delete.min_age setting within the lifecycle policy tied to each index and is controlled by ILM. It is important to note that size-based deletion takes priority over time-based deletion: disk usage may reach retention_pct, causing indices to be deleted before the min_age value is reached.

Policies can be edited within the SOC administration interface by navigating to Administration -> Configuration -> elasticsearch -> $index -> policy -> phases -> delete -> min_age. Changes will take effect when a new index is created.
Security Onion supports Elastic clustering, in which multiple Elasticsearch instances join together to form a single cluster. When using Elastic clustering, index deletion is based on the delete settings shown in the global pillar above. The delete settings in the global pillar configure Curator to delete indices older than the given value. For each index, please ensure that the close setting is set to a smaller value than the delete setting.

Let's discuss the process for determining appropriate delete settings. First, check your indices by using so-elasticsearch-query to query _cat/indices. For example:
sudo so-elasticsearch-query _cat/indices | grep 2021.08.26
green open so-zeek-2021.08.26          rEtb1ERqQcyr7bfbnR95zQ 5 0 2514236   0   2.4gb   2.4gb
green open so-ids-2021.08.26           d3ySLbRHSJGRQ2oiS4pmMg 1 0    1385 147   3.3mb   3.3mb
green open so-ossec-2021.08.26         qYf1HWGUSn6fIOlOgFgJOQ 1 0  125333  61 267.1mb 267.1mb
green open so-elasticsearch-2021.08.26 JH8tOgr3QjaQ-EX08OGEXw 1 0   61170   0  32.7mb  32.7mb
green open so-firewall-2021.08.26      Qx6_ZQS3QL6VGwIXIQ8mfQ 1 0  508799   0 297.4mb 297.4mb
green open so-syslog-2021.08.26        3HiYP3fgSPmoV-Nbs3dlDw 1 0  181207   0    27mb    27mb
green open so-kibana-2021.08.26        C6v6sazHSYiwqq5HxfokQg 1 0     745   0 809.5kb 809.5kb
Adding all the index sizes together, plus a little padding, results in roughly 3.5GB per day. We will use this as our baseline.
If we look at our total /nsm size for our search nodes (data nodes in Elastic nomenclature), we can calculate how many days of indices, open or closed, we can store. The equation shown below determines the proper delete timeframe. Note that total usable space depends on replica counts. In the example below, we have 2 search nodes with 140GB each, for 280GB total of /nsm storage. Since we have a single replica, we need to take that into account. The formula for that is:
1 replica = 2 x Daily Index Size
2 replicas = 3 x Daily Index Size
3 replicas = 4 x Daily Index Size
Let’s use 1 replica:
Total Space / copies of data = Usable Space
280 / 2 = 140
Suppose we want a little cushion so let’s make Usable Space = 130
Usable NSM space / Daily Index Size = Days
For our example above, let's fill in the proper values:
130GB / 3.5GB = 37.1428571 days rounded down to 37 days
Therefore, we can set all of our
delete values to 37.
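The arithmetic above can be sketched as a quick shell calculation (the values are taken from the example; substitute your own):

```shell
# Retention math from the example above
total_gb=280   # total /nsm across both search nodes
copies=2       # 1 replica => 2 copies of each index
daily_gb=3.5   # daily index size baseline
usable_gb=130  # usable space (280 / 2 = 140) minus a little cushion
days=$(python3 -c "print(int($usable_gb / $daily_gb))")
echo "delete after $days days"
```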
Re-indexing may need to occur if field data types have changed and conflicts arise. This process can be VERY time-consuming, and we only recommend this if keeping data is absolutely critical.
If you want to clear all Elasticsearch data including documents and indices, you can run the
Elasticsearch 8 no longer includes GeoIP databases by default. We include GeoIP databases for Elasticsearch so that all users will have GeoIP functionality. If your search nodes have Internet access and can reach geoip.elastic.co and storage.googleapis.com, then you can opt in to database updates if you want more recent information. To do this, add the following to your Elasticsearch Salt config:
config:
  ingest:
    geoip:
      downloader:
        enabled: true