LATEST ARTICLES

Array enclosed in string to Array in spark dataframe

It is a common scenario when reading data from Kafka that you receive an array, or an array of JSON, enclosed in a string. At first it seems intimidating to convert...

Dockerizing MS-SQL and Python Application

When working on a cloud platform with a microservice architecture, developers generally tend to use a database as a service that serves the application’s database needs. Considering if the...

Developing spark application with eclipse on windows

Developing a Spark Scala application on Windows is a tedious task. Even for a single-line code change, a JAR needs to be built and moved to a cluster, after which...

Reading CSV with comma inside data field

A comma inside a data field is a common scenario when dealing with flat files such as CSV. Before loading, most projects cleanse the data by removing the comma. But what...

Loading JSON data in HIVE (CDH)

JSON is the most popular data exchange format on the web. Your application may interact directly with a live system or get a dump from another application; whatever the problem, being a Data Engineer...

F1 Dataset

The F1 Dataset contains data in JSON, CSV and MySQL table formats. The complete dataset includes all races, circuits, drivers, lap times and much more. The vivid dataset in different formats allows you...

Deciding Hadoop File Formats

Hadoop, being the most popular distributed system on the market, has a ton of features. Among them are file formats. Most organizations and engineers use various file formats but fail...