Welcome!

Blog Feed Post

Apache Drill + Arrow = Dremio

A new company just emerged from stealth mode yesterday, called Dremio, backed by Redpoint and Lightspeed in a Series A funding of $10m back in 2015. The founders came from MapR, but were active in Apache projects like Drill and Arrow. The same VC’s backed MapR and had the Dremio founders work out of their facilities during the stealth phase. Now the company has around 50 people in their Mountainview, California office.

Apache Drill acts as a single SQL engine that, in turn, can query and join data from among several other systems. Drill can certainly make use of an in-memory columnar data standard. But while Dremio was still in stealth, it wasn’t immediately obvious what Drill’s strong intersection with Arrow might be. But yesterday the company launched a namesake product that also acts as a single SQL engine that can query and join data from among several other systems, and it accelerates those queries using Apache Arrow. So it is a combo of (Drill + Arrow): schema-free SQL for variety of data sources plus a columnar in-memory analytics execution engine.

Dremio believes that BI today involves too many layers. Source systems, via ETL processes, feed into data warehouses, which may then feed into OLAP cubes. BI tools themselves may add another layer, building their own in-memory models in order to accelerate query performance. Dremio thinks that’s a huge mess and disintermediates things by providing a direct bridge between BI tools and the source system they’re querying. The BI tools connect to Dremio as if it were a primary data source, and query it via SQL. Dremio then delegates the query work to the true back-end systems through push-down queries that it issues. Dremio can connect to relational databases (DB2, Oracle, SQL Server, MySQL, PostgreSQL), NoSQL stores (MongoDB, Amazon Redshift, HBase, MapR-FS), Hadoop, cloud blob stores like S3, and ElasticSearch.

Here’s how it works: all data pulled from the back-end data sources is represented in memory using Arrow. Combined with vectorized (in-CPU parallel processing) querying, this design can yield up to a 5x performance improvement over conventional systems (company claims). But a perhaps even more important optimization is Dremio’s use of what it calls “Reflections,” which are materialized data structures that optimize Dremio’s row and aggregation operations. Reflections are sorted, partitioned, and indexed, stored as files on Parquet disk, and handled in-memory as Arrow-formatted columnar data. This sounds similar to ROLAP aggregation tables).

Andrew Brust from ZDNet said, “While Dremio’s approach to this is novel, and may break a performance barrier that heretofore has not been well-addressed, the company is nonetheless entering a very crowded space. The product will need to work on a fairly plug-and-play basis and live up to its performance promises, not to mention build a real community and ecosystem. These are areas where Apache Drill has had only limited success. Dremio will have to have a bigger hammer, not just an Arrow”.


Read the original blog entry...

More Stories By Jnan Dash

Jnan Dash is Senior Advisor at EZShield Inc., Advisor at ScaleDB and Board Member at Compassites Software Solutions. He has lived in Silicon Valley since 1979. Formerly he was the Chief Strategy Officer (Consulting) at Curl Inc., before which he spent ten years at Oracle Corporation and was the Group Vice President, Systems Architecture and Technology till 2002. He was responsible for setting Oracle's core database and application server product directions and interacted with customers worldwide in translating future needs to product plans. Before that he spent 16 years at IBM. He blogs at http://jnandash.ulitzer.com.

Latest Stories
With tough new regulations coming to Europe on data privacy in May 2018, Calligo will explain why in reality the effect is global and transforms how you consider critical data. EU GDPR fundamentally rewrites the rules for cloud, Big Data and IoT. In his session at 21st Cloud Expo, Adam Ryan, Vice President and General Manager EMEA at Calligo, will examine the regulations and provide insight on how it affects technology, challenges the established rules and will usher in new levels of diligence a...
FinTechs use the cloud to operate at the speed and scale of digital financial activity, but are often hindered by the complexity of managing security and compliance in the cloud. In his session at 20th Cloud Expo, Sesh Murthy, co-founder and CTO of Cloud Raxak, showed how proactive and automated cloud security enables FinTechs to leverage the cloud to achieve their business goals. Through business-driven cloud security, FinTechs can speed time-to-market, diminish risk and costs, maintain continu...
When shopping for a new data processing platform for IoT solutions, many development teams want to be able to test-drive options before making a choice. Yet when evaluating an IoT solution, it’s simply not feasible to do so at scale with physical devices. Building a sensor simulator is the next best choice; however, generating a realistic simulation at very high TPS with ease of configurability is a formidable challenge. When dealing with multiple application or transport protocols, you would be...
An increasing number of companies are creating products that combine data with analytical capabilities. Running interactive queries on Big Data requires complex architectures to store and query data effectively, typically involving data streams, an choosing efficient file format/database and multiple independent systems that are tied together through custom-engineered pipelines. In his session at @BigDataExpo at @ThingsExpo, Tomer Levi, a senior software engineer at Intel’s Advanced Analytics ...
Existing Big Data solutions are mainly focused on the discovery and analysis of data. The solutions are scalable and highly available but tedious when swapping in and swapping out occurs in disarray and thrashing takes place. The resolution for thrashing through machine learning algorithms and support nomenclature is through simple techniques. Organizations that have been collecting large customer data are increasingly seeing the need to use the data for swapping in and out and thrashing occurs ...
In the enterprise today, connected IoT devices are everywhere – both inside and outside corporate environments. The need to identify, manage, control and secure a quickly growing web of connections and outside devices is making the already challenging task of security even more important, and onerous. In his session at @ThingsExpo, Rich Boyer, CISO and Chief Architect for Security at NTT i3, discussed new ways of thinking and the approaches needed to address the emerging challenges of security i...
As many know, the first generation of Cloud Management Platform (CMP) solutions were designed for managing virtual infrastructure (IaaS) and traditional applications. But that’s no longer enough to satisfy evolving and complex business requirements. In his session at 21st Cloud Expo, Scott Davis, Embotics CTO, will explore how next-generation CMPs ensure organizations can manage cloud-native and microservice-based application architectures, while also facilitating agile DevOps methodology. He wi...
When you focus on a journey from up-close, you look at your own technical and cultural history and how you changed it for the benefit of the customer. This was our starting point: too many integration issues, 13 SWP days and very long cycles. It was evident that in this fast-paced industry we could no longer afford this reality. We needed something that would take us beyond reducing the development lifecycles, CI and Agile methodologies. We made a fundamental difference, even changed our culture...
Historically, some banking activities such as trading have been relying heavily on analytics and cutting edge algorithmic tools. The coming of age of powerful data analytics solutions combined with the development of intelligent algorithms have created new opportunities for financial institutions. In his session at 20th Cloud Expo, Sebastien Meunier, Head of Digital for North America at Chappuis Halder & Co., discussed how these tools can be leveraged to develop a lasting competitive advantage ...
SYS-CON Events announced today that Dasher Technologies will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Dasher Technologies, Inc. ® is a premier IT solution provider that delivers expert technical resources along with trusted account executives to architect and deliver complete IT solutions and services to help our clients execute their goals, plans and objectives. Since 1999, we'v...
There is only one world-class Cloud event on earth, and that is Cloud Expo – which returns to Silicon Valley for the 21st Cloud Expo at the Santa Clara Convention Center, October 31 - November 2, 2017. Every Global 2000 enterprise in the world is now integrating cloud computing in some form into its IT development and operations. Midsize and small businesses are also migrating to the cloud in increasing numbers. Companies are each developing their unique mix of cloud technologies and service...
For financial firms, the cloud is going to increasingly become a crucial part of dealing with customers over the next five years and beyond, particularly with the growing use and acceptance of virtual currencies. There are new data storage paradigms on the horizon that will deliver secure solutions for storing and moving sensitive financial data around the world without touching terrestrial networks. In his session at 20th Cloud Expo, Cliff Beek, President of Cloud Constellation Corporation, d...
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
Blockchain is a shared, secure record of exchange that establishes trust, accountability and transparency across business networks. Supported by the Linux Foundation's open source, open-standards based Hyperledger Project, Blockchain has the potential to improve regulatory compliance, reduce cost as well as advance trade. Are you curious about how Blockchain is built for business? In her session at 21st Cloud Expo, René Bostic, Technical VP of the IBM Cloud Unit in North America, will discuss th...
SYS-CON Events announced today that Datera, that offers a radically new data management architecture, has been named "Exhibitor" of SYS-CON's 21st International Cloud Expo ®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Datera is transforming the traditional datacenter model through modern cloud simplicity. The technology industry is at another major inflection point. The rise of mobile, the Internet of Things, data storage and Big...