Big Data Application Development
Big Data
  • Big Data is not just about velocity, volume, veracity and variety. It is about how you identify the right information from data that is growing exponentially, and use it to add business value.
  • Apache Hadoop is an open source framework for storing, analysing and accessing large amounts of data quickly and cost-effectively on clusters of commodity hardware. Its design follows Google's published MapReduce and Google File System papers, and web companies such as Facebook and Yahoo have used Hadoop to store and manage their huge data sets.
  • Hadoop scales from a single server to thousands of machines and provides a low-cost yet dependable way to tackle data management problems.
  • The Hadoop ecosystem includes: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Spark.
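
For readers new to MapReduce, the processing model named above can be sketched in a few lines of plain Python. This is an illustrative in-memory toy, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are our own, and a real cluster would distribute each phase across many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, the step a framework performs between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling word_count(["big data", "Big value"]) counts "big" twice and the other words once. The same three-phase shape carries over to real jobs; only the map and reduce functions change.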


We offer the following services:
  • Loading large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios
  • Reports for BI
  • MapReduce jobs for data cleaning and pre-processing
  • Data visualization
  • Big Data management
  • Big Data analytics
  • Data architecture, including data ingestion pipeline design
  • Data modelling and data mining
  • Machine learning and advanced data processing
  • Optimizing ETL workflows
  • Real-time queries over Big Data
  • Data stream processing
  • Data serialization
  • Data analytics
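
As a rough illustration of the data cleaning and pre-processing work listed above, the sketch below shows a map-only cleaning step in the style of Hadoop Streaming, where a mapper reads text lines and writes tab-separated records. The three-column input layout (user_id, country, amount) and the names clean_record and run_mapper are assumptions made for this example only.

```python
from typing import Optional, TextIO


def clean_record(line: str) -> Optional[str]:
    """Return a normalised 'user_id<TAB>country<TAB>amount' record, or None if the line is invalid."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 3:
        return None  # wrong number of columns
    user_id, country, amount = fields
    if not user_id:
        return None  # the key column must not be empty
    try:
        value = float(amount)
    except ValueError:
        return None  # amount is not numeric
    return f"{user_id}\t{country.upper()}\t{value:.2f}"


def run_mapper(source: TextIO, sink: TextIO) -> None:
    """Write one cleaned record per valid input line and silently drop the rest."""
    for line in source:
        cleaned = clean_record(line)
        if cleaned is not None:
            sink.write(cleaned + "\n")
```

In a streaming job the script would call run_mapper(sys.stdin, sys.stdout); passing the streams as arguments keeps the cleaning logic easy to test locally before it runs on a cluster.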

Areas of Expertise:

  • Real-time analytics: Spark
  • Processing: MapReduce
  • Query engine: Hive, Impala
  • ETL: Pig
  • Resource manager: YARN, Mesos
  • Big Data analytics
  • Distribution: Cloudera, Hortonworks, Apache
  • Data integration: Flume, Sqoop
  • NoSQL: HBase, MongoDB
  • Security: Ranger, Sentry, Kerberos
  • Workflow: Oozie