Welcome!

Blog Feed Post

The Apache Software Foundation Announces Apache™ Spark™ as a Top-Level Project

Thursday 27 February, 2014
Forest Hill, MD –27 February 2014– The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 170 Open Source projects and initiatives, announced today that Apache Spark has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Spark is an Open Source cluster computing framework for fast and flexible large-scale data analysis. Dubbed a "Hadoop Swiss Army knife" by The Register, Spark is recognized for its remarkable speed and ease of use, running programs up to 100x faster than Apache Hadoop MapReduce in memory, and with APIs that allow developers to quickly write applications in Java, Python, or Scala.

"It's great to see Apache become Spark’s permanent home," said Matei Zaharia, Vice President of Apache Spark. "Spark has quickly become one of the most active projects in the Hadoop ecosystem, with dozens of organizations contributing, and we look forward to working closely with the rest of the Apache community."

Initially created in 2009 at the University of California at Berkeley's AMPLab (the research center also responsible for the original development of Apache Mesos), the Spark distributed computing framework for advanced analytics in Apache Hadoop can easily be used standalone or on Hadoop YARN, EC2 or Mesos. Integrated with Apache Hadoop, Spark is well suited for machine learning, interactive queries, and stream processing, and can read from HDFS, HBase, Cassandra, as well as any Hadoop data source.

"This is a major milestone for the students and researchers in the AMPLab," said Mike Franklin, Director of the AMPLab at UC Berkeley. "Spark demonstrates the real impact that research can have and validates the support AMPLab has received from our White House-announced NSF Expeditions in Computing Award and our 20+ industrial sponsors and collaborators."

"Through our work on Spark at both AMPLab and Databricks, we’ve focused on making it much easier for organizations to get insights from big data," said Ion Stoica, CEO at Databricks and Professor at UC Berkeley. "We're doing this together with a fantastic open source community. We look forward to continue working with the community to accelerate the development and adoption of Apache Spark."

Since entering the Apache Incubator in June 2013, Apache Spark bolstered its community through code contributions by more than 120 developers from 25 organizations. Apache Spark is in use at an array of global corporations that include Alibaba, Cloudera, Databricks, IBM, Intel, and Yahoo, among others.

Andrew Feng, Distinguished Architect at Yahoo, said "Yahoo has played a leading role in evolving Hadoop and related big-data technologies, including Spark. While Apache Hadoop serves as the foundation of our big-data platform, Spark is an attractive technology for iterative applications such as machine learning. Yahoo has made significant contributions to the development of Spark and we congratulate Spark on becoming an Apache top-level project."

"I'm really proud of the community aspect that has become infectious in Apache Spark and that really grew out of the energy in the project starting in the AMP Lab and through its movement to the ASF," said Chris Mattmann, Apache Spark Incubator Mentor at the ASF, and Chief Architect, Instrument and Science Data Systems Section at NASA JPL. "Matei, Patrick, Reynold, and many of the leaders of the project have really done a tremendous job and I'm excited to see the next generation of Hadoop-style systems have a home at the ASF."

"We have some very exciting features coming in the next months, so stay tuned for even more powerful versions of Spark," added Zaharia.

Availability and Oversight
As with all Apache products, Apache Spark software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. For documentation and ways to become involved with Apache Spark, visit http://spark.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than one hundred and seventy leading Open Source projects, including Apache HTTP Server --the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 400 individual Members and 3,500 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate sponsors including Budget Direct, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco, and Yahoo.
For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

"Apache", "Spark", "Apache Spark", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #



Distributed by http://www.pressat.co.uk/

Read the original blog entry...

Latest Stories
Machine learning provides predictive models which a business can apply in countless ways to better understand its customers and operations. Since machine learning was first developed with flat, tabular data in mind, it is still not widely understood: when does it make sense to use graph databases and machine learning in combination? This talk tackles the question from two ends: classifying predictive analytics methods and assessing graph database attributes. It also examines the ongoing lifecycl...
To Really Work for Enterprises, MultiCloud Adoption Requires Far Better and Inclusive Cloud Monitoring and Cost Management … But How? Overwhelmingly, even as enterprises have adopted cloud computing and are expanding to multi-cloud computing, IT leaders remain concerned about how to monitor, manage and control costs across hybrid and multi-cloud deployments. It’s clear that traditional IT monitoring and management approaches, designed after all for on-premises data centers, are falling short in ...
The deluge of IoT sensor data collected from connected devices and the powerful AI required to make that data actionable are giving rise to a hybrid ecosystem in which cloud, on-prem and edge processes become interweaved. Attendees will learn how emerging composable infrastructure solutions deliver the adaptive architecture needed to manage this new data reality. Machine learning algorithms can better anticipate data storms and automate resources to support surges, including fully scalable GPU-c...
When applications are hosted on servers, they produce immense quantities of logging data. Quality engineers should verify that apps are producing log data that is existent, correct, consumable, and complete. Otherwise, apps in production are not easily monitored, have issues that are difficult to detect, and cannot be corrected quickly. Tom Chavez presents the four steps that quality engineers should include in every test plan for apps that produce log output or other machine data. Learn the ste...
A valuable conference experience generates new contacts, sales leads, potential strategic partners and potential investors; helps gather competitive intelligence and even provides inspiration for new products and services. Conference Guru works with conference organizers to pass great deals to great conferences, helping you discover new conferences and increase your return on investment.
Poor data quality and analytics drive down business value. In fact, Gartner estimated that the average financial impact of poor data quality on organizations is $9.7 million per year. But bad data is much more than a cost center. By eroding trust in information, analytics and the business decisions based on these, it is a serious impediment to digital transformation.
Containers and Kubernetes allow for code portability across on-premise VMs, bare metal, or multiple cloud provider environments. Yet, despite this portability promise, developers may include configuration and application definitions that constrain or even eliminate application portability. In this session we'll describe best practices for "configuration as code" in a Kubernetes environment. We will demonstrate how a properly constructed containerized app can be deployed to both Amazon and Azure ...
Everyone wants the rainbow - reduced IT costs, scalability, continuity, flexibility, manageability, and innovation. But in order to get to that collaboration rainbow, you need the cloud! In this presentation, we'll cover three areas: First - the rainbow of benefits from cloud collaboration. There are many different reasons why more and more companies and institutions are moving to the cloud. Benefits include: cost savings (reducing on-prem infrastructure, reducing data center foot print, redu...
SYS-CON Events announced today that Silicon India has been named “Media Sponsor” of SYS-CON's 21st International Cloud Expo, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Published in Silicon Valley, Silicon India magazine is the premiere platform for CIOs to discuss their innovative enterprise solutions and allows IT vendors to learn about new solutions that can help grow their business.
We are seeing a major migration of enterprises applications to the cloud. As cloud and business use of real time applications accelerate, legacy networks are no longer able to architecturally support cloud adoption and deliver the performance and security required by highly distributed enterprises. These outdated solutions have become more costly and complicated to implement, install, manage, and maintain.SD-WAN offers unlimited capabilities for accessing the benefits of the cloud and Internet. ...
You want to start your DevOps journey but where do you begin? Do you say DevOps loudly 5 times while looking in the mirror and it suddenly appears? Do you hire someone? Do you upskill your existing team? Here are some tips to help support your DevOps transformation. Conor Delanbanque has been involved with building & scaling teams in the DevOps space globally. He is the Head of DevOps Practice at MThree Consulting, a global technology consultancy. Conor founded the Future of DevOps Thought Leade...
Founded in 2000, Chetu Inc. is a global provider of customized software development solutions and IT staff augmentation services for software technology providers. By providing clients with unparalleled niche technology expertise and industry experience, Chetu has become the premiere long-term, back-end software development partner for start-ups, SMBs, and Fortune 500 companies. Chetu is headquartered in Plantation, Florida, with thirteen offices throughout the U.S. and abroad.
Business professionals no longer wonder if they'll migrate to the cloud; it's now a matter of when. The cloud environment has proved to be a major force in transitioning to an agile business model that enables quick decisions and fast implementation that solidify customer relationships. And when the cloud is combined with the power of cognitive computing, it drives innovation and transformation that achieves astounding competitive advantage.
SYS-CON Events announced today that CrowdReviews.com has been named “Media Sponsor” of SYS-CON's 22nd International Cloud Expo, which will take place on June 5–7, 2018, at the Javits Center in New York City, NY. CrowdReviews.com is a transparent online platform for determining which products and services are the best based on the opinion of the crowd. The crowd consists of Internet users that have experienced products and services first-hand and have an interest in letting other potential buye...
DXWorldEXPO LLC announced today that "IoT Now" was named media sponsor of CloudEXPO | DXWorldEXPO 2018 New York, which will take place on November 11-13, 2018 in New York City, NY. IoT Now explores the evolving opportunities and challenges facing CSPs, and it passes on some lessons learned from those who have taken the first steps in next-gen IoT services.