Big Data Analytics

The volume of data being created, communicated, and used worldwide has grown exponentially over the past ten years and is expected to keep growing for the next ten, reaching 78 yottabytes (78 × 10^24 bytes). Most of this is unstructured, heterogeneous data that is difficult to analyze (data combining simple text files, images, videos, etc.). See Figure 1.
Figure 1. Data Growth
This trend is pervasive across all the business sectors served by D2K and our clients. As one example, a single jet engine can generate over 10 terabytes of data in 30 minutes of flight time. Real-time management of all the data from one four-engine aircraft can overwhelm traditional data analytics, which were typically developed to handle hierarchically structured data.
The characteristics of big data fall into at least four types:
  1. Volume – Size of the data to be analyzed
  2. Variety – The heterogeneous sources and the nature of the data, both structured and unstructured.
  3. Velocity - How fast the data is generated and how quickly it must be processed to meet demand.
  4. Variability - The inconsistency of the data over time.
All of these characteristics, taken together, will help determine the best analytical approaches to be chosen.
The growing maturity of these concepts more starkly delineates the difference between "Big Data" and traditional "Business Intelligence":
  • Business Intelligence uses applied mathematics tools and descriptive statistics on data with high information density to measure things, detect trends, etc.
  • Big Data uses mathematical analysis, optimization, inductive statistics, and concepts from nonlinear system identification to infer laws (regressions, nonlinear relationships, and causal effects) from large data sets with low information density, revealing relationships and dependencies or predicting outcomes and behaviors.
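The contrast can be sketched in a few lines of code. This is a minimal illustration, not a D2K method: the quadratic model and the sample data below are assumptions chosen for the sketch. A summary statistic describes the data (the Business Intelligence view), while a least-squares fit infers a nonlinear relationship from it (the Big Data view).

```python
# Hypothetical noisy observations, roughly following y = 2 * x**2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 7.9, 18.2, 31.8]

# Descriptive statistics (Business Intelligence): summarize the data.
mean_y = sum(ys) / len(ys)

# Inductive statistics (Big Data): least-squares fit of the model
# y = a * x**2, which has the closed form a = sum(x^2 * y) / sum(x^4).
a = sum(x * x * y for x, y in zip(xs, ys)) / sum(x ** 4 for x in xs)

print(round(mean_y, 2), round(a, 3))
```

The mean says nothing about how y depends on x; the fitted coefficient recovers the underlying quadratic law from the same data.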
53% of existing companies are exploring the adoption of Big Data analytics methodology today.
Our mission is to develop and apply technology that helps our client-partners cope with this trend while continuing to meet daily demands: complying with environmental and human-safety regulations while still satisfying stakeholders' economic and efficiency objectives.
To that end we provide best-of-breed software methodologies and tools such as data mining, machine learning, and statistical modeling. Typical D2K Big Data analytics have one or more of the following characteristics:
  • Predictive Analytics: analyzes past data sets or records to forecast what is likely to happen in the future.
  • Prescriptive Analytics: works on a data set to determine what actions should be taken.
  • Descriptive Analytics: analyzes past data to determine what actually happened, and helps visualize that analysis in a dashboard, often as a graphical representation.
  • Diagnostic Analytics: drills down into descriptive results to determine why something happened.
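Predictive analytics is the easiest of the four to show in miniature. The sketch below, with entirely hypothetical data, fits a linear trend to past observations by ordinary least squares and extrapolates one step ahead; real D2K engagements would of course use richer models and far larger data sets.

```python
def fit_linear_trend(ys):
    """Return (slope, intercept) of the least-squares line through
    the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def predict_next(ys):
    """Predict the next value by extrapolating the fitted trend."""
    slope, intercept = fit_linear_trend(ys)
    return slope * len(ys) + intercept

# Hypothetical historical readings (e.g. monthly sensor values).
history = [10.0, 12.1, 13.9, 16.2, 18.0]
print(round(predict_next(history), 2))
```

The same shape generalizes: descriptive analytics would stop at summarizing `history`, while prescriptive analytics would go one step further and recommend an action based on the prediction.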
D2K software tool sets like Apache Hadoop provide the following important Big Data features:
  • Open Source: Source code is freely available.
  • Scalable: Works on a cluster of machines by both horizontal and vertical scaling.
  • Fault-Tolerant: By default, every data block in the HDFS file system has a replication factor of 3.
  • Data Independent: Works on both structured and unstructured data.
  • Support for Multiple Languages: Java, with support for Python, Ruby, etc.
  • Support for Various File Formats: Supports formats such as JSON, XML, Avro, Parquet, etc.
  • Distributed Storage: Stores data across multiple nodes in a cluster.
  • Parallel Processing: Data remains stationary; the code is moved to the data for processing.
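The parallel-processing model behind the last point is MapReduce: a map function runs wherever each data block is stored, and a reduce step merges the partial results. The toy word count below runs in a single process to show the shape of the pattern; on a real Hadoop cluster the blocks and map tasks are distributed, and the block split shown is hypothetical.

```python
from collections import Counter

def map_phase(block):
    """Count words within a single data block (runs next to the data)."""
    return Counter(block.split())

def reduce_phase(partial_counts):
    """Merge the per-block counts into one global result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

# Two "blocks" of a larger file, as HDFS might split it.
blocks = ["big data big analytics", "data data pipeline"]
result = reduce_phase(map_phase(b) for b in blocks)
print(result["data"])  # 3
```

Because each `map_phase` call touches only its own block, the work scales horizontally with the number of blocks, which is exactly why moving code to data beats moving data to code at Big Data volumes.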