Elasticsearch security: Authentication, Encryption, Backup

The recent ransom attacks on public Elasticsearch instances showed that Elasticsearch security is still a hot topic. Elasticsearch was not the only target – tens of thousands of poorly configured MongoDB databases were compromised over the past week as well, with attackers stealing and then deleting data from over 27,000 unpatched or poorly configured servers. The scenario is always the same: insecure instances are “hacked” and the data is replaced with a note telling the owner to send payment to a Bitcoin address and then email the attacker to retrieve the data. Over the last few days we saw more than 4,000 Elasticsearch instances compromised, and that number is still growing. The attacks are rather simple: the attacker scans for services on port 9200 and, once such a service is found, fetches the data, deletes it, and leaves the payment information as a document in the emptied Elasticsearch index. Because many Elasticsearch instances are not protected at all, they are very easy targets.

In this post, we are not just going to tell you what you should or shouldn’t do with Elasticsearch – we are going to show you how to actually secure it, sharing a few simple and free prevention methods.

Let’s start with the general attack ⇒ counter-measures:

  • Port scanning ⇒ minimize exposure:
    • Don’t use the default port 9200
    • Don’t expose Elasticsearch to the public Internet (put Elasticsearch behind a firewall)
    • Don’t forget the Kibana port (5601)
  • Data theft ⇒ secure access:
    • Lock down the HTTP API with authentication
    • Encrypt communication with SSL/TLS
  • Data deletion ⇒ set up backup:
    • Backup your data
  • Log file manipulation ⇒ log auditing and alerting:
    • Hackers might manipulate or delete system log files to cover their tracks. Sending logs to a remote destination increases the chances of discovering intrusion early.

Let’s drill into each of the above items with step-by-step actions to secure Elasticsearch:

Lock Down Open Ports

Firewall: Close the public ports

The first action should be to drop Internet traffic to the relevant ports (the placeholder stands for the server’s public IP address):

iptables -A INPUT -i eth0 -p tcp --destination-port 9200 -d {PUBLIC-IP-ADDRESS-HERE} -j DROP

iptables -A INPUT -i eth0 -p tcp --destination-port 9300 -d {PUBLIC-IP-ADDRESS-HERE} -j DROP

If you run Kibana, note that the Kibana server acts as a proxy to Elasticsearch and thus its port needs to be closed as well:

iptables -A INPUT -i eth0 -p tcp --destination-port 5601 -d {PUBLIC-IP-ADDRESS-HERE} -j DROP
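
Keep in mind that rules added with iptables this way do not survive a reboot. How to persist them depends on your distribution; on Debian/Ubuntu, for example, the iptables-persistent package reads rules from a file you can write like this (the path is that package’s default):

sudo iptables-save > /etc/iptables/rules.v4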

After this you can relax a bit! Elasticsearch won’t be reachable from the public Internet anymore.

Bind Elasticsearch ports only to private IP addresses

Change the configuration in elasticsearch.yml to bind only to private IP addresses or, for single-node instances, to the loopback interface:

network.bind_host: 127.0.0.1
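
After restarting Elasticsearch you can verify the binding: a request against the loopback address should work, while a request against the server’s public address should fail (the public IP below is a placeholder):

curl http://127.0.0.1:9200
curl --connect-timeout 5 http://{PUBLIC-IP-ADDRESS-HERE}:9200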

Add private networking between Elasticsearch and client services

If you need to access Elasticsearch from another machine, connect the two via VPN or any other private network. A quick way to establish a secure tunnel between two machines is an SSH tunnel:

ssh -Nf -L 9200:localhost:9200 user@remote-elasticsearch-server

You can then access Elasticsearch through the SSH tunnel from client machines, e.g.

curl http://localhost:9200/_search
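
If you also run Kibana on the remote machine, the same approach works for its default port; a sketch of the equivalent tunnel:

ssh -Nf -L 5601:localhost:5601 user@remote-elasticsearch-server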

Authentication and SSL/TLS with Nginx

There are several open-source and free solutions that provide Elasticsearch access authentication, but if you want something quick and simple, here is how to do it yourself with just Nginx:

Generate password file

printf "esuser:$(openssl passwd -crypt MySecret)\n" > /etc/nginx/passwords
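
Note that the crypt scheme is weak by today’s standards and truncates passwords to 8 characters; if your openssl build supports it, the apr1 (MD5-based) scheme is a somewhat stronger alternative that Nginx also understands:

printf "esuser:$(openssl passwd -apr1 MySecret)\n" > /etc/nginx/passwords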

Generate self-signed SSL certificates, if you don’t have official certificates …

sudo mkdir /etc/nginx/ssl

sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/nginx/ssl/nginx.key -out /etc/nginx/ssl/nginx.crt

Add the proxy configuration with SSL and activate basic authentication in /etc/nginx/nginx.conf (note we expect the SSL certificate and key file in /etc/nginx/ssl/). Example:

# define the proxy upstream to Elasticsearch (via the loopback interface) in the http context
http {
  upstream elasticsearch {
    server 127.0.0.1:9200;
  }

  server {
    # enable TLS
    listen 0.0.0.0:443 ssl;
    ssl_certificate /etc/nginx/ssl/nginx.crt;
    ssl_certificate_key /etc/nginx/ssl/nginx.key;
    ssl_protocols TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_session_timeout 5m;
    ssl_ciphers "HIGH:!aNULL:!MD5:!3DES";

    # proxy for Elasticsearch with basic authentication
    location / {
      auth_basic "Login";
      auth_basic_user_file /etc/nginx/passwords;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $http_host;
      proxy_set_header X-NginX-Proxy true;
      # use the upstream defined above with the name "elasticsearch"
      proxy_pass http://elasticsearch/;
      proxy_redirect off;
      if ($request_method = OPTIONS) {
        add_header Access-Control-Allow-Origin "*";
        add_header Access-Control-Allow-Methods "GET, POST, PUT, OPTIONS";
        add_header Access-Control-Allow-Headers "Content-Type,Accept,Authorization,x-requested-with";
        add_header Access-Control-Allow-Credentials "true";
        add_header Content-Length 0;
        add_header Content-Type application/json;
        return 200;
      }
    }
  }
}

Restart Nginx and try to access Elasticsearch via https://localhost/_search.
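
For example, using the user created above and -k to accept the self-signed certificate (the reload command depends on your init system):

sudo systemctl reload nginx
curl -k -u esuser:MySecret 'https://localhost/_search'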

Free Security Plugins for Elasticsearch

Alternatively, you could install and configure one of the several free security plugins for Elasticsearch to enable authentication:

  • The HTTP Authentication plugin for Elasticsearch is available on GitHub. It provides Basic HTTP Authentication, as well as IP ACLs.
  • SearchGuard is a free security plugin for Elasticsearch that includes role-based access control, document-level security, and SSL/TLS-encrypted node-to-node communication. Additional enterprise features like LDAP authentication or JSON Web Token authentication are available and licensed per Elasticsearch cluster. Note that SearchGuard support is also included in some Sematext Elasticsearch Support Subscriptions.

Auditing & Alerting

As with any type of system holding sensitive data, you have to monitor it very closely. This means not only monitoring its various metrics (whose sudden changes could be an early sign of trouble), but also watching its logs. Concretely, in the recent Elasticsearch attacks, anyone who had alert rules that trigger when the number of documents in an index suddenly drops would have been notified immediately that something was going on. A number of monitoring vendors have Elasticsearch support, including Sematext (see Elasticsearch monitoring).

Logs should be collected and shipped to a log management service in real time, where alerting needs to be set up to watch for any anomalous or suspicious activity, among other things. The log management service can be on premises or a 3rd-party SaaS, like Logsene. Shipping logs off site has the advantage of preventing attackers from covering their tracks by changing the logs; once logs are off site, attackers won’t be able to get to them. Alerting on metrics and logs means you will become aware of a security compromise early and can take appropriate actions to, hopefully, prevent further damage.
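
As a minimal sketch of that alerting idea (the index name, threshold, and state file below are placeholder assumptions; a real setup would use a proper monitoring and alerting service), a small cron-driven script could compare the current document count of an index with the count from the previous run and warn on a sudden drop:

#!/bin/bash
# Minimal doc-count watchdog sketch: warn if an index loses more than 10% of its
# documents between two runs (e.g. when run from cron every few minutes).
INDEX="myindex"                      # placeholder index name
STATE_FILE="/var/tmp/es_doc_count"   # stores the count from the previous run
CURRENT=$(curl -s "http://localhost:9200/_cat/count/${INDEX}?h=count" | tr -d '[:space:]')
[ -z "$CURRENT" ] && exit 1          # Elasticsearch not reachable
PREVIOUS=$(cat "$STATE_FILE" 2>/dev/null || echo "$CURRENT")
if [ "$CURRENT" -lt $((PREVIOUS * 90 / 100)) ]; then
  # replace this echo with mail, PagerDuty, Slack, etc.
  echo "WARNING: document count of ${INDEX} dropped from ${PREVIOUS} to ${CURRENT}" >&2
fi
echo "$CURRENT" > "$STATE_FILE"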

Backup and restore data

A very handy tool to back up, restore, or re-index data based on Elasticsearch queries is Elasticdump.

To back up complete indices, the Elasticsearch snapshot API is the right tool. It provides operations to create and restore snapshots of whole indices, stored either on the file system or in Amazon S3 buckets.

Let’s have a look at a few examples for Elasticdump and snapshot backups and recovery.

  1. Install elasticdump with the node package manager
    npm i elasticdump -g
  2. Back up by query to a gzipped file:
    elasticdump --input='http://username:password@localhost:9200/myindex' --searchBody '{"query" : {"range" :{"timestamp" : {"lte": 1483228800000}}}}' --output=$ --limit=1000 | gzip > /backups/myindex.gz
  3. Restore from a gzipped file:
    zcat /backups/myindex.gz | elasticdump --input=$ --output=http://username:password@localhost:9200/index_name

Examples of backing up and restoring data with snapshots to Amazon S3 or the file system

First, configure the snapshot destination:

1) S3 example (note that this requires the Elasticsearch S3 repository plugin to be installed)

curl 'localhost:9200/_snapshot/my_repository?pretty' -XPUT -d '{
  "type" : "s3",
  "settings" : {
    "bucket" : "test-bucket",
    "base_path" : "backup-2017-01",
    "max_restore_bytes_per_sec" : "1gb",
    "max_snapshot_bytes_per_sec" : "1gb",
    "compress" : "true",
    "access_key" : "<ACCESS_KEY_HERE>",
    "secret_key" : "<SECRET_KEY_HERE>"
  }
}'

2) Local disk or mounted NFS example

curl 'localhost:9200/_snapshot/my_repository?pretty' -XPUT -d '{
  "type" : "fs",
  "settings" : {
    "location" : "<PATH … for example /mnt/storage/backup>"
  }
}'

3) Trigger snapshot

 curl -XPUT 'localhost:9200/_snapshot/my_repository/<snapshot_name>'
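
By default this call returns immediately while the snapshot runs in the background; if you want it to block until the snapshot has finished (handy in backup scripts), add the wait_for_completion flag:

 curl -XPUT 'localhost:9200/_snapshot/my_repository/<snapshot_name>?wait_for_completion=true'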

4) Show all backups

 curl 'localhost:9200/_snapshot/my_repository/_all'

5) Restore – the most important part of backup is verifying that backup restore actually works!

curl -XPOST 'localhost:9200/_snapshot/my_repository/<snapshot_name>/_restore'
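
Keep in mind that a snapshot cannot be restored while an open index with the same name exists in the cluster; close or delete that index first, for example (the index name is a placeholder):

curl -XPOST 'localhost:9200/myindex/_close'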

What about Hosted Elasticsearch?

There are several hosted Elasticsearch services, with Logsene being a great alternative for time series data like logs.   Each hosted Elasticsearch service is a little different.  The list below shows a few relevant aspects of Logsene:

  • The Logsene API is compatible with Elasticsearch, except for a few security-related exceptions
  • Logsene does not expose management APIs like index listing or global search via /_search
  • Logsene blocks scripting and index deletion operations
  • Logsene users can define and revoke access tokens for read and write access
  • Logsene provides role based access control and SSL/TLS
  • Logsene creates daily snapshots for all customers and stores them securely
  • Logsene supports raw data archiving to Amazon S3

If you have any questions, feel free to contact us. We provide professional Elasticsearch support, training and consulting.

Or, if you are interested in trying Logsene, our hassle-free managed ELK that you don’t need to maintain and scale yourself, here is a short path to a free trial:

SIGN UP – FREE TRIAL
