Elasticsearch to Load Large Custom JSON Datasets

Preface

Elasticsearch is an open-source, widely used, readily scalable, enterprise-grade search engine. Accessible through an extensive API, it supports extremely fast searches over large volumes of data, powering discovery applications.

We built an application to search a large amount of GIS data and highlight geographical features such as roads, paths, and lanes for a given location. The GIS data was made available in GeoJSON, a JSON-based format [for more details refer: ‘http://geojson.org/’]. The GeoJSON data comprises road-specific information (highway, city road, one-way, paved/unpaved, latitude, longitude, etc.). The objective was to import this information into an Elasticsearch instance as efficiently as possible and then power an API library that would return all the coordinates of a road for a search query. For example, if a user enters the name of a road, Elasticsearch returns all the waypoints (coordinates) for that road, and the end application then highlights those coordinates to identify the road on the map and displays the metadata alongside.


The API library was developed using Node.js, and the user interface was built using AngularJS and OpenLayers 3 [for more details refer: ‘https://openlayers.org/’].

Below is a step-by-step explanation of our approach. To understand what we did to develop our application and which challenges we faced during development, the reader should know the basic structure of Elasticsearch and Node.js.

Step 1: Implement Elasticsearch and integrate it with Node.js

The above image shows the comparison between Elasticsearch and a relational database (RDBMS): an index corresponds to a database, a type to a table, a document to a row, and a field to a column. Elasticsearch has a simpler structure, and loading data follows these steps:
Step 1: Create an index (‘my_index’ in the image).
Step 2: Create a type (‘my_type’ in the image).
Step 3: Upload JSON data to the created type as documents, with a separate index entry for each data point (A, B, ..., X, Y in the image).

Elasticsearch thus follows a tree structure: Index > Type > Document > Field. Multiple indexes can be created; each index can contain multiple types; each type can hold multiple documents; and each document consists of multiple fields.

After installation (from https://www.elastic.co), Elasticsearch runs on the default port 9200. We used a Node.js application for index creation, type creation, uploading JSON data in bulk to Elasticsearch, and developing the API that returns all the coordinates of a road via an Elasticsearch search query (in our case). To connect Node.js with Elasticsearch, we installed ‘elasticsearch.js’, the official Elasticsearch client for Node.js, with which we can create indexes and types and upload bulk JSON data.

To install elasticsearch.js into Node.js, refer to ‘https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/about.html’

Step 2: Establish Connection between Elasticsearch and Node.js Application

Since elasticsearch.js is the Elasticsearch client for Node.js, we instantiate the elasticsearch.Client class in the Node.js application to connect to Elasticsearch. We established the connection in a JavaScript file called Connection.js.
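A minimal Connection.js might look like the following sketch; the host and log level are assumptions for a local default setup:

```javascript
// Connection.js -- creates and exports a single Elasticsearch client.
// 'localhost:9200' is Elasticsearch's default address; 'error' keeps logging quiet.
const elasticsearch = require('elasticsearch');

const client = new elasticsearch.Client({
  host: 'localhost:9200',
  log: 'error'
});

// Exporting the client lets every other script reuse the same connection.
module.exports = client;
```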

Using ‘module.exports = client’, Connection.js can be imported anywhere in the Node.js application, keeping a single Elasticsearch connection alive via elasticsearch.Client.

Step 3: Create Index

Indexing in Elasticsearch is not quite like indexing in other databases. In Elasticsearch, an index is a place to store related documents. To create an index from the Node.js application, we import Connection.js and then create the index.
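A sketch of CreateIndex.js along these lines (the index name ‘bulk-import’ is the one used throughout this article):

```javascript
// CreateIndex.js -- creates the 'bulk-import' index via the shared client.
const client = require('./Connection.js');

client.indices.create({ index: 'bulk-import' }, function (err, resp) {
  if (err) {
    console.log('Index creation failed:', err);
  } else {
    console.log('Index created:', resp); // the response acknowledges the new index
  }
});
```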

The JavaScript file ‘CreateIndex.js’ creates an index named ‘bulk-import’ in Elasticsearch; the variable ‘client’ holds the connection imported from Connection.js. To run this script, we opened a command line, navigated to the location of ‘CreateIndex.js’, and ran ‘node CreateIndex.js’. The response indicates that the index ‘bulk-import’ was created successfully in Elasticsearch.

To run any javascript file on Node.js, ‘node’ should be the prefix to the javascript file in the command line.

Step 4: Bulk import nested JSON structures

The sections above give a basic understanding of how Elasticsearch and a Node.js application work together. Now it will be easier to understand and overcome the challenges that are typically faced around bulk imports. While there are multiple mechanisms/tools for uploading data into Elasticsearch, such as Kibana or Logstash, our objective was to use a custom-built Node.js application to upload the data in bulk.

Objective:

Our objective was to upload a large amount of GeoJSON data to Elasticsearch. GeoJSON is a derivative of the JSON format with a complex, nested JSON tree structure.
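A minimal GeoJSON FeatureCollection illustrating this nesting; the property names (‘name’, ‘highway’, ‘oneway’, ‘surface’, ‘ref’) are illustrative, not taken from our actual dataset:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "name": "Main Street",
        "highway": "residential",
        "oneway": "no",
        "surface": "paved",
        "ref": "MS-101"
      },
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [77.5946, 12.9716],
          [77.5950, 12.9720],
          [77.5957, 12.9725]
        ]
      }
    }
  ]
}
```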

Challenges:

  • How do we upload a nested JSON file structure into Elasticsearch using Node.js? Does Elasticsearch support a nested JSON format?
  • How do we create a separate index entry for each data point while avoiding the risk of indexing large quantities of data under a single entry, which is typically what a bulk upload does?

It is easy to load large amounts of flat JSON data into Elasticsearch, where every data row is indexed separately. Our objective, however, was to upload a GeoJSON file (a custom, nested JSON data file) to Elasticsearch using a Node.js application, so we had to develop Node.js code that could upload the nested JSON data in bulk.

As discussed earlier, we had already created an index named ‘bulk-import’. We then needed to create a new document type in that index so that we could upload and index the nested JSON data against it. To automate the creation of the document type and the upload of nested JSON data with a separate index entry per record, we used a JavaScript file called ‘bulkupload.js’. With this script we created the document type ‘bulkdoc’ in our index ‘bulk-import’ and tried to upload the GeoJSON data.

However, we got an error while parsing our GeoJSON file with the command result = JSON.parse('./filename.geojson');. JSON.parse expects a JSON string, not a file path, so passing the path throws a SyntaxError.

Our Approach:

We had to modify our JavaScript code to overcome this issue. One thing was clear: to upload GeoJSON (nested JSON) data to Elasticsearch from a Node.js application, the GeoJSON data must be parsed in an appropriate manner. After considerable searching, we found the ‘jsonfile’ package, which helped us parse the GeoJSON data.

‘jsonfile’ is a Node.js package that must be installed into the Node.js application (for installation and details refer: ‘https://www.npmjs.com/package/jsonfile’). It can handle any type of JSON data, whether flat or complex: it reads the file from disk and parses its contents into a JavaScript object, which can then be sent to Elasticsearch in bulk.

We installed the jsonfile package in our Node.js application, imported ‘jsonfile’, and modified the code of bulkupload.js.
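The core of the modified bulkupload.js can be sketched as follows; the file name ‘roads.geojson’ is an assumption, while the index and type names come from the steps above. The helper builds the alternating action/document lines that the Elasticsearch bulk API expects, so that each GeoJSON feature is indexed as its own document:

```javascript
// Turn a parsed GeoJSON FeatureCollection into an Elasticsearch bulk body.
// The bulk API takes pairs of lines: an action line (here 'index', with no _id,
// so Elasticsearch assigns one per record) followed by the document itself.
function buildBulkBody(geojson, index, type) {
  const body = [];
  geojson.features.forEach(function (feature) {
    body.push({ index: { _index: index, _type: type } }); // action metadata
    body.push(feature);                                   // the document
  });
  return body;
}

// Usage (requires the 'jsonfile' package and the Connection.js client from Step 2):
//   const jsonfile = require('jsonfile');
//   const client = require('./Connection.js');
//   jsonfile.readFile('./roads.geojson', function (err, geojson) {
//     if (err) throw err;
//     client.bulk({ body: buildBulkBody(geojson, 'bulk-import', 'bulkdoc') },
//       function (err, resp) {
//         if (err) console.log('Bulk upload failed:', err);
//         else console.log('Uploaded', resp.items.length, 'documents');
//       });
//   });
```

Unlike JSON.parse, jsonfile.readFile takes a file path, reads the file, and hands the callback an already-parsed object, which is what made the nested GeoJSON upload work.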

By running this script on Node.js, we uploaded a large amount of GeoJSON data into Elasticsearch in bulk, with a separate index entry for each record. We can open the following URL in a browser to cross-check that each record of the GeoJSON data was uploaded: ‘localhost:9200/bulk-import/bulkdoc/_search’.

  • In the URL, localhost:9200 indicates that Elasticsearch is running on your local machine on port 9200.
  • bulk-import is the index you created in Elasticsearch, and bulkdoc is the type you created in that index.
  • _search is a REST API endpoint of Elasticsearch. We can also run a curl command with the _search API to cross-check the same on the command line or in Kibana.

After importing the GeoJSON file into Elasticsearch, we developed an HTTP API that calls the Elasticsearch search query. This was achieved with Node.js and Express.js.

Express.js and Node.js provide back-end functionality, allowing developers to build software with JavaScript on the server side; together, they make it possible to build an entire site with JavaScript. We integrated the API URL into our frontend UI, developed using Angular 5, to retrieve the GeoJSON data from Elasticsearch for a particular search text.
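Such a search endpoint can be sketched as below; the route path and the queried field names (‘properties.name’, ‘properties.ref’) are assumptions about our data layout, not the exact production code:

```javascript
// Build the Elasticsearch search request for a road lookup.
// multi_match lets one query string match either the road name or its
// reference number (field names assumed from the GeoJSON structure).
function buildRoadQuery(searchText) {
  return {
    index: 'bulk-import',
    type: 'bulkdoc',
    body: {
      query: {
        multi_match: {
          query: searchText,
          fields: ['properties.name', 'properties.ref']
        }
      }
    }
  };
}

// Usage with Express (requires 'express' and the Connection.js client):
//   const express = require('express');
//   const client = require('./Connection.js');
//   const app = express();
//   app.get('/roads/:name', function (req, res) {
//     client.search(buildRoadQuery(req.params.name), function (err, resp) {
//       if (err) return res.status(500).send(err);
//       res.json(resp.hits.hits); // each hit carries the road's coordinates
//     });
//   });
//   app.listen(3000);
```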

The image given below is the UI of our application. End users can search for a particular section of a road, or an entire road, by its name or road reference number. For the road reference number entered in the search box shown in the image, the highlighted part of the map is the end result for the required location.

