I work for an independent software vendor in the Big Data space (a little more on that later). This list contains ten startups that either address the Big Data challenges I hear about most frequently from enterprises or have the most buzz among the people I speak with. To keep the list manageable I limited it to ten; there are many great Big Data startups that are not on it.
Alpine Data Labs – Alpine Chorus is a collaborative, code-free solution for enabling advanced analytics on Big Data. With Alpine, data scientists and business analysts can work with large data sets and develop and collaborate on models at scale, without having to use code or desktop software. Alpine is representative of a new generation of data mining and predictive model building ISVs that are seeing great traction as enterprises look to evolve from using business intelligence tools to integrating predictive analytics into business processes.
Confluent – Confluent is a commercial provider/supporter of Apache Kafka, an open source technology created by the founders of Confluent (who are highly respected LinkedIn alumni). Kafka acts as a real-time, fault-tolerant, highly scalable messaging system. It was designed to solve the problems that arise when different types of data processing systems, each with very different requirements, need to consume and operate on Big Data in real time.
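To make the "many different consumers, one stream" idea concrete, here is a minimal sketch in Python. It is a toy, in-memory stand-in for a single topic and does not use the Kafka API; the class name `TinyLog`, the consumer names, and the sample events are all made up for illustration. The one point it tries to show is that each consumer keeps its own position in a shared, append-only log, so a fast real-time system and a slower batch system can read the same data at their own pace.

```python
from collections import defaultdict


class TinyLog:
    """Toy in-memory append-only log (hypothetical; not the Kafka API)."""

    def __init__(self):
        self.records = []                # the shared, ordered log
        self.offsets = defaultdict(int)  # next position to read, per consumer

    def publish(self, record):
        """Append a record and return its position in the log."""
        self.records.append(record)
        return len(self.records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next batch for this consumer and advance only its offset."""
        start = self.offsets[consumer]
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch


log = TinyLog()
for event in ["click", "purchase", "click"]:
    log.publish(event)

# A real-time consumer reads one event at a time...
realtime_batch = log.poll("alerting", max_records=1)
# ...while a batch consumer reads everything available, independently.
batch_load = log.poll("warehouse")
# The real-time consumer then resumes from where it left off.
realtime_next = log.poll("alerting")
```

A real deployment adds what this sketch leaves out entirely: persistence, partitioning, replication for fault tolerance, and a network protocol.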
Databricks – Databricks was founded by the creators of Apache Spark, which originated in the UC Berkeley AMPLab. It is a commercial provider/supporter of Apache Spark, an open source technology for both analytics and streaming that is commonly run on top of Hadoop. Spark has tremendous momentum within the Hadoop open source community. Databricks provides tools for interactively analyzing, visualizing, and curating data, as well as collaboration and integration tools.
Datameer – Datameer was extremely early in realizing that Hadoop was going to be big and that business analysts in the enterprise needed an Excel-like interface to work with Hadoop more easily and quickly. Datameer provides an end-to-end data analytics application that makes Big Data analytics accessible to business as well as technical end users.
Hadoop – It is impossible to talk about Big Data without talking about Hadoop. It is impossible to talk about Hadoop without talking about Cloudera, Hortonworks, MapR and Pivotal, as well as the cloud versions of Hadoop offered by companies like Altiscale and Qubole and the Hadoop distributions offered by established companies such as Amazon and IBM. However, if I had included all the activity in the database space, including graph databases and NoSQL, that would have been the entire list. So they all get an explicit or implicit mention under “Hadoop”.
Interana – Interana was founded by a former Intel executive and a couple of extremely high-profile Facebook alums who are trying to do for event (or real-time) data what Datameer has done for Hadoop: enable non-technical users to query real-time data through an elegant user interface and generate actionable business intelligence.
Lucidworks – Lucidworks is a commercial provider/supporter of Solr, an open source enterprise search tool. Fusion, Lucidworks’ advanced search platform, provides the enterprise-grade capabilities needed to design, develop and deploy intelligent search apps at any scale.
Paxata and Trifacta – Paxata and Trifacta are direct competitors that focus on data preparation and transformation for Big Data. The goal is to drastically reduce the time needed to prep data for analysis, to enable analysts (as opposed to IT professionals) to do this work, and to solve some of the interoperability issues that arise when a single company uses multiple data platforms.
Zementis – Zementis enables a “write once, deploy anywhere – immediately” capability for sophisticated predictive analytics built by data science teams by leveraging an open industry standard called the Predictive Model Markup Language (PMML). This provides a powerful ROI when compared to the typical six+ month timeframe for deploying predictive models into production information technology systems. In addition, data science teams typically see an impressive increase in productivity after implementing Zementis, which is reflected in the bottom line. (NOTE: I do work for Zementis, so there is a bit of shameless self-promotion here.)
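For readers who have not seen PMML, here is a minimal sketch of why a standard model format helps with deployment. It is not how Zementis works internally; it is a toy scorer I wrote for illustration that handles only a simple linear regression. The field names (`visits`, `spend`) and the coefficients are invented. The point is that the model lives in an XML document, so any system that can read that document can score it, regardless of the tool that built the model.

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written PMML document (illustrative values, hypothetical fields).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="visits"/>
      <MiningField name="spend" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="visits" coefficient="2.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def score(pmml_text, inputs):
    """Score a linear RegressionModel: intercept + sum(coefficient * input).

    Toy evaluator: ignores exponents, categorical predictors, and every
    other model type that PMML can describe.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * inputs[predictor.get("name")]
    return result


prediction = score(PMML_DOC, {"visits": 4.0})
print(prediction)
```

A production scoring engine covers far more of the standard (many model types, data transformations, missing-value handling), but the exchange idea is the same: the data science tool exports the document, and the production system only needs to understand the format.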