LATEST ARTICLES

Developing a Spark application with Eclipse on Windows

Developing a Spark Scala application on Windows is a tedious task. Even for a single-line code change, a JAR needs to be built and moved to a cluster, after which...

Reading CSV with a comma inside a data field

A comma inside a data field is a common scenario when dealing with flat files such as CSV. Before loading, most projects cleanse the data by removing the comma. But what...

Loading JSON data in Hive (CDH)

JSON is the most popular data exchange format on the web. Your application may interact directly with a live system or get a dump from another application; whatever the problem, being a Data Engineer...

F1 Dataset

The F1 Dataset contains data in JSON, CSV and MySQL table formats. The complete dataset includes all races, circuits, drivers, lap times and much more. The rich dataset in different formats allows you...

Deciding Hadoop File Formats

Hadoop, being the most popular distributed system in the market, has a ton of features. Among them are file formats. Most organizations and engineers use various file formats but fail...