Elasticsearch

From https://www.elastic.co/products/elasticsearch:

Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning fast search, fine‑tuned relevancy, and powerful analytics that scale with ease.

Querying

You can query Elasticsearch using web interfaces like Alerts, Dashboards, Hunt, and Kibana. You can also query Elasticsearch from the command line using a tool like curl, or with our so-elasticsearch-query command.
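
For example, the following should return overall cluster health using the standard _cat/health endpoint. This is just a minimal sketch; so-elasticsearch-query is used the same way later in this section to query _cat/indices and _cat/shards.

sudo so-elasticsearch-query '_cat/health?v'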

Authentication

We support Elastic authentication via so-elastic-auth.

Diagnostic Logging

  • Elasticsearch logs can be found in /opt/so/log/elasticsearch/.
  • Logging configuration can be found in /opt/so/conf/elasticsearch/log4j2.properties.

Depending on what you’re looking for, you may also need to look at the Docker logs for the container:

sudo docker logs so-elasticsearch

Storage

All of the data Elasticsearch collects is stored under /nsm/elasticsearch/.

Parsing

In Security Onion 2, Elasticsearch receives unparsed logs from Logstash or Filebeat. Elasticsearch then parses and stores those logs. Parsers are stored in /opt/so/conf/elasticsearch/ingest/. Custom ingest parsers can be placed in /opt/so/saltstack/local/salt/elasticsearch/files/ingest/. To make these changes take effect, restart Elasticsearch using so-elasticsearch-restart.
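
For example, you can list the parsers on disk and review the ingest pipelines currently loaded into Elasticsearch. This is only a sketch; it assumes python3 is available for pretty-printing the JSON response, but any JSON tool will do.

ls /opt/so/conf/elasticsearch/ingest/
sudo so-elasticsearch-query _ingest/pipeline | python3 -m json.tool | less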

Note

For more about Elasticsearch ingest parsing, please see:

Templates

Fields are mapped to their appropriate data types using templates. When making parsing changes, it is necessary to ensure that fields are mapped to a data type that allows them to be indexed, which in turn allows for effective aggregation and searching in Dashboards, Hunt, and Kibana. Elasticsearch leverages both component templates and index templates.

Component Templates

From https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html:

Component templates are reusable building blocks that configure mappings, settings, and aliases. While you can use component templates to construct index templates, they aren’t directly applied to a set of indices.

Also see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-component-template.html.

Index Templates

From https://www.elastic.co/guide/en/elasticsearch/reference/current/index-templates.html:

An index template is a way to tell Elasticsearch how to configure an index when it is created. Templates are configured prior to index creation. When an index is created - either manually or through indexing a document - the template settings are used as a basis for creating the index. Index templates can contain a collection of component templates, as well as directly specify settings, mappings, and aliases.

In Security Onion, component templates are stored in /opt/so/saltstack/default/salt/elasticsearch/templates/component/.

These component templates are referenced by the index template definitions in /opt/so/saltstack/default/salt/elasticsearch/defaults.yml, and those references can be modified in the elasticsearch Salt pillar.
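
To see how those references end up in Elasticsearch, you can list the loaded index templates along with their composed_of values. This is a rough sketch: the so-* name pattern is an assumption about how the template names are prefixed, and the available columns can vary by Elasticsearch version.

# the so-* pattern below is an assumption; drop it to list all templates
sudo so-elasticsearch-query '_cat/templates/so-*?v&h=name,index_patterns,composed_of'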

Custom Templates

To add a custom index template, ensure the custom or modified component templates are copied to /opt/so/saltstack/local/salt/elasticsearch/templates/component/so/.

Next, copy /opt/so/saltstack/default/pillar/elasticsearch/index_templates.sls to /opt/so/saltstack/local/pillar/elasticsearch/.

Edit the file similar to the following example, adding your custom index template details and references to the component templates you wish to associate with the index template:

/opt/so/saltstack/local/pillar/elasticsearch/index_templates.sls

elasticsearch:
  index_settings:
    so-custom:
      index_sorting: False
      index_template:
        index_patterns:
          - so-custom*
        template:
          mappings:
            dynamic_templates:
              - strings_as_keyword:
                  mapping:
                    ignore_above: 1024
                    type: keyword
                  match_mapping_type: string
            date_detection: false
          settings:
            index:
              mapping:
                total_fields:
                  limit: 1500
              sort:
                field: "@timestamp"
                order: desc
              refresh_interval: 30s
              number_of_shards: 1
              number_of_replicas: 0
        composed_of:
          - custom-mappings
          - custom-settings
        priority: 500

Next, apply the Elasticsearch state for the relevant nodes (or wait for the next highstate):

sudo salt-call state.apply elasticsearch

Upon successful application, the resultant index template will be created in /opt/so/conf/elasticsearch/templates/index with a filename that consists of the custom index key value (so-custom in this case) and a static -template.json suffix. We can check that the file exists and review its contents with the following command:

cat /opt/so/conf/elasticsearch/templates/index/so-custom-template.json

We can also check to ensure that both the associated component templates and the index template itself were loaded into Elasticsearch:

so-elasticsearch-component-templates-list | grep custom
so-elasticsearch-index-templates-list | grep custom

Community ID

For logs that don’t naturally include a Community ID, we use the Elasticsearch Community ID processor to compute one.
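
As an illustration of what that processor does, you can run a sample document through it with the ingest simulate API. This is only a sketch to show the behavior, not one of the pipelines Security Onion ships (those live in /opt/so/conf/elasticsearch/ingest/), and depending on your version you may need to authenticate the request.

curl -k -XPOST -H'Content-Type: application/json' https://localhost:9200/_ingest/pipeline/_simulate -d'
{
  "pipeline": {
    "processors": [
      { "community_id": {} }
    ]
  },
  "docs": [
    {
      "_source": {
        "source":      { "ip": "10.0.0.1", "port": 34855 },
        "destination": { "ip": "10.0.0.2", "port": 80 },
        "network":     { "transport": "tcp" }
      }
    }
  ]
}'

The simulated result should include a network.community_id value computed from those fields.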

Configuration

Pillar Files

All configuration changes take place in Salt pillar files. There are two places that hold pillar settings for Elasticsearch. The pillars are:

/opt/so/saltstack/local/pillar/minions/$minion.sls

elasticsearch:
  mainip: 10.66.166.22
  mainint: eth0
  esheap: 4066m
  esclustername: {{ grains.host }}
  node_type: search
  es_port: 9200
  log_size_limit: 3198
  node_route_type: hot

/opt/so/saltstack/local/pillar/global.sls

elasticsearch:
  true_cluster: False
  replicas: 0
  discovery_nodes: 1
  hot_warm_enabled: False
  cluster_routing_allocation_disk_threshold_enabled: true
  cluster_routing_allocation_disk_watermark_low: '95%'
  cluster_routing_allocation_disk_watermark_high: '98%'
  cluster_routing_allocation_disk_watermark_flood_stage: '98%'
  script.painless.regex.enabled: true
  index_settings:
    so-beats:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 30
      delete: 365
    so-endgame:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 30
      delete: 365
    so-firewall:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 30
      delete: 365
    so-flow:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      close: 45
      delete: 365
    so-ids:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 30
      delete: 365
    so-import:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 73000
      delete: 73001
    so-osquery:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 30
      delete: 365
    so-ossec:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 30
      delete: 365
    so-strelka:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 30
      delete: 365
    so-syslog:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 1
      warm: 7
      close: 30
      delete: 365
    so-zeek:
      index_template:
        template:
          settings:
            index:
              number_of_shards: 2

Customization

You can completely customize your Elasticsearch configuration via Salt pillars. This allows elasticsearch.yml customizations to be retained when doing upgrades of Security Onion. Depending on your customization goal, you can specify settings in either the global pillar or the minion pillar. Create the config sub-section if it does not already exist in your pillar and then place your configuration options under that sub-section. For example, to change the node_concurrent_recoveries setting:

elasticsearch:
  config:
    cluster:
      routing:
        allocation:
          node_concurrent_recoveries: 4

Warning

Please be very careful when adding items under the config sub-section to avoid typos and other errors that would interfere with Elasticsearch. After making changes, keep a close eye on Elasticsearch to make sure the change is working as intended.
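
One way to keep that close eye on it is to confirm the setting actually took effect after the Elasticsearch state has been applied. A minimal sketch, using the standard cluster settings API with flat_settings so the value is easy to grep:

sudo so-elasticsearch-query '_cluster/settings?include_defaults=true&flat_settings=true' | grep node_concurrent_recoveries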

field expansion matches too many fields

If you get errors like failed to create query: field expansion for [*] matches too many fields, limit: 3500, got: XXXX, then this usually means that you’re sending in additional logs and therefore have more fields than our default max_clause_count value allows. To resolve this, you can customize the indices.query.bool.max_clause_count value for any boxes running Elasticsearch in your deployment.

elasticsearch:
  config:
    indices.query.bool.max_clause_count: 4000
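
To get a rough idea of how many fields you actually have, you can count the results of the field capabilities API. This is just a sketch; it assumes python3 is available, and the count is an approximation across all indices.

# assumes python3 is available; counts the field names reported by _field_caps
sudo so-elasticsearch-query '_field_caps?fields=*&filter_path=fields' | python3 -c 'import json,sys; print(len(json.load(sys.stdin)["fields"]))'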

Shards

Here are a few tips from https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster:

TIP: Avoid having very large shards as this can negatively affect the cluster’s ability to recover from failure. There is no fixed limit on how large shards can be, but a shard size of 50GB is often quoted as a limit that has been seen to work for a variety of use-cases.

TIP: Small shards result in small segments, which increases overhead. Aim to keep the average shard size between a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size.

TIP: The number of shards you can hold on a node will be proportional to the amount of heap you have available, but there is no fixed limit enforced by Elasticsearch. A good rule-of-thumb is to ensure you keep the number of shards per node below 20 to 25 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600-750 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health.

To see your existing shards, run the following command and the number of shards will be shown in the fifth column:

sudo so-elasticsearch-query _cat/indices

If you want to view the detail for each of those shards:

sudo so-elasticsearch-query _cat/shards

Given the sizing tips above, if any of your indices are averaging more than 50GB per shard, then you should probably increase the shard count until you get below that recommended maximum of 50GB per shard.
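
To see individual shard sizes for a particular index pattern, sorted largest first, you can pass the standard _cat/shards column and sort parameters. A minimal sketch; adjust the so-zeek-* pattern to whichever index you are reviewing.

sudo so-elasticsearch-query '_cat/shards/so-zeek-*?v&h=index,shard,prirep,store&s=store:desc'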

The number of shards for an index is defined in /opt/so/saltstack/local/pillar/global.sls. You can adjust shard counts for each index individually to meet your needs. The next time the node checks in, it will apply the settings automatically.

Please keep in mind that old indices will retain previous shard settings and the above settings will only be applied to newly created indices.

Heap Size

If total available memory is 8GB or greater, Setup configures the heap size to be 33% of available memory, but no greater than 25GB. You may need to adjust the value for heap size depending on your system’s performance. This can be modified in /opt/so/saltstack/local/pillar/minions/$minion.sls.
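
To verify the heap size Elasticsearch is actually running with, you can check the node heap columns via the _cat/nodes API. A minimal sketch:

sudo so-elasticsearch-query '_cat/nodes?v&h=name,heap.max,heap.percent'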

Field limit

Security Onion currently defaults to a field limit of 5000. If you receive error messages from Logstash, or you would simply like to increase this, you can do so with one of the following options.

Temporary

If you only need to increase the field limit temporarily, you can do something like:

curl -k -XPUT -H'Content-Type: application/json' https://localhost:9200/logstash-syslog-*/_settings -d'{ "index.mapping.total_fields.limit": 6000 }'

The above command would increase the field limit for the logstash-syslog-* indices to 6000. Keep in mind that this setting only applies to the current indices, so when an index rolls over and a new one is created, your new setting will not apply to it.
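
You can verify the change by reading the setting back from the affected indices. A minimal sketch using the get index settings API with a setting name filter:

sudo so-elasticsearch-query 'logstash-syslog-*/_settings/index.mapping.total_fields.limit'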

Persistent

If you need this change to be persistent, you can modify the settings stanza for the matched indices in the template:

"settings" : {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "index.refresh_interval" : "5s",
    "index.mapping.total_fields.limit": 6000
},

Then restart Logstash:

sudo so-logstash-restart

Please note that the change to the field limit will not occur immediately; it only takes effect on index creation. Therefore, it is recommended to both run the temporary command mentioned previously and modify the template file.

Closing Indices

Elasticsearch indices are closed based on the close setting shown in the global pillar above. This setting configures Curator to close any index older than the value given. The more indices are open, the more heap is required. Having too many open indices can lead to performance issues. There are many factors that determine the number of days you can have in an open state, so this is a good setting to adjust specific to your environment.
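
To see which of your indices are currently open versus closed, you can include the status column in a _cat/indices query. A minimal sketch:

sudo so-elasticsearch-query '_cat/indices?v&h=index,status,store.size&s=index'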

Deleting Indices

Note

This section describes how Elasticsearch indices are deleted in standalone deployments and in distributed deployments using our default deployment method of cross cluster search. Index deletion is different for deployments using Elastic clustering; that is described in the Elastic Clustering section below.

For standalone deployments and distributed deployments using cross cluster search, Elasticsearch indices are deleted based on the log_size_limit value in the minion pillar. If your open indices are using more than log_size_limit gigabytes, then Curator will delete old open indices until disk space is back under log_size_limit. If your total Elastic disk usage (both open and closed indices) is above log_size_limit, then so-curator-closed-delete will delete old closed indices until disk space is back under log_size_limit. so-curator-closed-delete does not use Curator because Curator cannot calculate disk space used by closed indices. For more information, see https://www.elastic.co/guide/en/elasticsearch/client/curator/current/filtertype_space.html.
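
To see how much disk Elasticsearch is currently using on each node relative to that limit, the _cat/allocation API is handy. A minimal sketch:

sudo so-elasticsearch-query '_cat/allocation?v'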

Curator and so-curator-closed-delete run on the same schedule. This might seem like there is a potential to delete open indices before deleting closed indices. However, keep in mind that Curator’s delete.yml is only going to see disk space used by open indices and not closed indices. So if we have both open and closed indices, we may be at log_size_limit but Curator’s delete.yml is going to see disk space at a value lower than log_size_limit and so it shouldn’t delete any open indices.

For example, suppose our log_size_limit is 1TB and we have 30 days of open indices and 300 days of closed indices. We reach log_size_limit and both Curator and so-curator-closed-delete execute at the same time. Curator’s delete.yml will check disk space used, but it will only see the space used by open indices, perhaps 100GB, so it thinks we haven’t reached log_size_limit and does not delete anything. so-curator-closed-delete gets a more accurate view of disk space used, sees that we have indeed reached log_size_limit, and so it deletes closed indices until we get lower than log_size_limit. In most cases, Curator deletion should really only happen if we have open indices without any closed indices.

Distributed Deployments

For distributed deployments, Security Onion 2 supports two different configurations for deploying Elasticsearch: cross cluster search and Elastic clustering.

Elastic Clustering

For advanced users that require advanced features like shard replicas and hot/warm indices, Security Onion 2 also supports Elastic clustering. In this configuration, Elasticsearch instances join together to create a single cluster. However, please keep in mind that this requires more maintenance, more knowledge of Elasticsearch internals, and more traffic between nodes in the cluster.

Warning

Due to the increased complexity, we only recommend this option if you absolutely need cluster features.

When using Elastic clustering, index deletion is based on the delete settings shown in the global pillar above. The delete settings in the global pillar configure Curator to delete indices older than the value given. For each index, please ensure that the close setting is set to a smaller value than the delete setting.

Let’s discuss the process for determining appropriate delete settings. First, check your indices using so-elasticsearch-query to query _cat/indices. For example:

sudo so-elasticsearch-query _cat/indices | grep 2021.08.26

green open  so-zeek-2021.08.26              rEtb1ERqQcyr7bfbnR95zQ 5 0  2514236      0    2.4gb    2.4gb
green open  so-ids-2021.08.26               d3ySLbRHSJGRQ2oiS4pmMg 1 0     1385    147    3.3mb    3.3mb
green open  so-ossec-2021.08.26             qYf1HWGUSn6fIOlOgFgJOQ 1 0   125333     61  267.1mb  267.1mb
green open  so-elasticsearch-2021.08.26     JH8tOgr3QjaQ-EX08OGEXw 1 0    61170      0   32.7mb   32.7mb
green open  so-firewall-2021.08.26          Qx6_ZQS3QL6VGwIXIQ8mfQ 1 0   508799      0  297.4mb  297.4mb
green open  so-syslog-2021.08.26            3HiYP3fgSPmoV-Nbs3dlDw 1 0   181207      0     27mb     27mb
green open  so-kibana-2021.08.26            C6v6sazHSYiwqq5HxfokQg 1 0      745      0  809.5kb  809.5kb

Adding all the index sizes together plus a little padding results in 3.5GB per day. We will use this as our baseline.

If we look at our total /nsm size for our search nodes (data nodes in Elastic nomenclature), we can calculate how many days of open or closed indices we can store. The equation shown below determines the proper delete timeframe. Note that total usable space depends on replica counts. In the example below we have 2 search nodes with 140GB each, for 280GB total of /nsm storage. Since we have a single replica, we need to take that into account. The formula for that is:

1 replica  = 2 x Daily Index Size
2 replicas = 3 x Daily Index Size
3 replicas = 4 x Daily Index Size

Let’s use 1 replica:

Total Space / copies of data = Usable Space

280 / 2 = 140

Suppose we want a little cushion so let’s make Usable Space = 130

Usable NSM space / Daily Index Size = Days

For our example above, let's fill in the proper values:

130GB / 3.5GB = 37.1428571 days rounded down to 37 days

Therefore, we can set all of our delete values to 37 in the global.sls.

Re-indexing

Re-indexing may need to occur if field data types have changed and conflicts arise. This process can be VERY time-consuming, and we only recommend this if keeping data is absolutely critical.

For more information about re-indexing, please see:

Clearing

If you want to clear all Elasticsearch data including documents and indices, you can run the so-elastic-clear command.

GeoIP

Elasticsearch 8 no longer includes GeoIP databases by default, but we add GeoIP databases during the build process. If your search nodes have Internet access and can reach geoip.elastic.co and storage.googleapis.com, then you can opt in to database updates if you want more recent information. To do this, add the following to your Elasticsearch Salt config:

config:
  ingest:
    geoip:
      downloader:
        enabled: true
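
Once enabled, you can check whether the databases are being downloaded with the GeoIP stats API. A minimal sketch; this endpoint is available in recent Elasticsearch versions.

sudo so-elasticsearch-query _ingest/geoip/stats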

More Information

Note

For more information about Elasticsearch, please see: