Blog Feed Post

Transforming from Autonomous to Smart: Reinforcement Learning Basics

In the blog “From Autonomous to Smart: Importance of Artificial Intelligence,” we laid out the artificial intelligence (AI) challenges in creating “smart” edge devices:

  • Artificial Intelligence Challenge #1: How do the Artificial Intelligence algorithms handle the unexpected, such as flash flooding, terrorist attacks, earthquakes, tornadoes, police car chases, emergency vehicles, blown tires, a child chasing a ball into the street, etc.?
  • Artificial Intelligence Challenge #2: The more complex the problem state, the more data storage (to retain known state history) and CPU processing power (to find the optimal or best solution) is required in the edge devices in order to create “smart.”

We also talked about how Moore’s Law isn’t going to bail us out of these challenges; that the growth of Internet of Things (IOT) data and the complexity of the problems that we are trying to address at the edge (think “smart” cars) is growing much faster than Moore’s Law can accommodate.

So we are going to use this blog to deep dive into the category of artificial intelligence called reinforcement learning. We are going to see how reinforcement learning might help us to address these challenges; to work smarter at the edge when brute force technology advances will not suffice.

Why Not Brute Force

With the rapid increases in computing power, it’s easy to get seduced into thinking that raw computing power can solve problems like smart edge devices (e.g., cars, trains, airplanes, wind turbines, jet engines, medical devices). But to understand the scope of the challenge, consider the following:

  • Checkers has 500 billion billion (that’s right, billion twice) possible board moves. That’s 500,000,000,000,000,000,000 possible moves (that’s 20 zeros).
  • The number of possible moves in a game of chess is a minimum of 10120 moves (that’s 120 zeros).

Look at the dramatic increase in the number of possible moves between checkers and chess even though the board layout is exactly the same. The only difference between checkers and chess is the types of moves that pieces can make. A checker has only two moves: forward diagonally and jump competitor’s pieces diagonally (once a checker is “kinged”, then it can move diagonally both forward and backwards). In chess, the complexity of the chess piece only increases slightly (rooks can move forward and sideways a variable number of spaces, bishops can move diagonally a variable number of spaces, etc.), but the complexity of the potential solutions exploded (from 20 zeros to 120 zeros). And both checkers and chess operate in a deterministic environment, where all possible moves are known ahead of time and there are no surprises (unless your dog decides that he wants to play at the same time).

Now think about the number and breadth of “moves” or variables that need to be considered when driving a car in a nondeterministic (random) environment:  weather (precipitation, snow, ice, black ice, wind), time of day (day time, twilight, night time, sun rise, sun set), road conditions (pot holes, bumpy, slick), traffic conditions (number of vehicles, types of vehicles, different speeds, different destinations). One can quickly see that the number of possible moves is staggering. We need a better answer than brute force.

Reinforcement Learning to the Rescue

Reinforcement Learning is for situations where you don’t have data sets with explicit known outcomes, but you do have a way to telling whether you are getting closer to your goal (reward function). Reinforcement learning learns through trial-and-error how to map situations to actions so as to maximize rewards. Actions may affect immediate rewards but actions may also affect subsequent or longer-term rewards, so the full extent of rewards must be considered when evaluating the reinforcement learning effectiveness (i.e., balancing short-term rewards like optimizing fuel consumption while driving a car balanced against the long-term rewards of getting to your destination on time and safely).

Reinforcement learning is used to address two general problems:

  • Prediction: How much reward can be expected for every combination of possible future states (e.g., how much can we collect from delinquent accounts based on the following steps?)
  • Control: By moving through all possible combinations of the environment (interacting with the environment or state space), find a combination of actions that maximizes reward and allows for optimal control (e.g., steering an autonomous vehicle, winning a game of chess).

The children’s game of “Hotter or Colder” is a good illustration of reinforcement learning; rather than getting a specific “right or wrong” answer with each data action, you’ll get a delayed reaction and only a hint of whether you’re heading in the right direction (hotter or colder).

Reinforcement Learning and Video Games

Reinforcement learning needs lots and lots of data from which to learn and very powerful compute to support its “trial and error” learning approach. Because it can take a considerable amount of time to gather enough data across enough scenarios in the real world, many of the advances in reinforcement learning are occurring from playing against video games.

One such example is the MarI/O program (‘MarI/O AI Program Learns To Play Super Mario World’). MarI/O is an artificial intelligence application that has learned how to play the video game “Super Mario World” (see Figure 1).

Figure 1: MarI/O Playing “Super Mario World” Video Game

Figure 1: MarI/O Playing “Super Mario World” Video Game


Some key points to learn from MarI/O:

  • MarI/O uses random steps to start its exploration process and to re-start whenever it stalls.
  • MarI/O takes inputs by sensing white boxes (safe landing areas) and black boxes (obstacles).
  • Rewards (Fitness points in the case of Mario Brothers) and punishment (death) guide the learning process (try to maximize rewards while minimizing or eliminating punishments)
  • Sometimes losing (failing) is the only way to learn.

Figure 2 shows the progress that MarI/O made in learning the environment in order to maximize its fitness points and survive.

Figure 2:  MarI/O Learning Curve

Figure 2:  MarI/O Learning Curve


But “Mario Super World” is a deterministic or known environment where the gaming patterns repeat themselves. For example, the chart in Figure 2 doesn’t show any regressions in performance, where the AI model hit a dead end and had to retreat and re-set itself. Retreating is a common behavior that the more advanced AI models must support.

And the game playing strategy in this model is very simple – just survive. There is no strategy to maximize the number of coins captured, which is an equally important part of playing Super Mario World (or at least if you want bragging rights with your friends).

So while the exercise is definitely educational, it’s not terribly application for our smart car example.

Grand Theft Auto to the Rescue!

Applying reinforcement learning to teach a car to drive requires an unbelievably huge quantity of data. Having a bunch of autonomous car tooling around the ‘hood just can’t generate enough data fast enough to optimize the models necessary to safely drive a vehicle. However, autonomous car companies have discovered a much richer training environment – Grand Theft Auto!


The virtual environment within the video game Grand Theft Auto is so realistic that it is being used to generate data that’s nearly as good as that generated by using real-world imagery. The most current version of Grand Theft Auto has 262 types of vehicles, more than 1,000 different unpredictable pedestrians and animals, 14 weather conditions and countless bridges, traffic signals, tunnels and intersections. It’s nearly impossible for an autonomous car manufacturer to operate enough vehicles in enough different situations to generate the amount of data that can be virtually gathered by playing against Grand Theft Auto.

Reinforcement Learning Summary

Ultimately, reinforcement learning model development is going to need to wrestle with real (not virtual) random obstacles that pop up in the normal driving of a vehicle. Grand Theft Auto might be great for teaching vehicles how to operate in an environment with hoodlums, robberies, heists and gratuitous car chases, but more real-world experience is going to be needed in order for autonomous cars to learn to handle the random, life-endangering threats, such as a child chasing a ball into the street, a new pothole caused by some (undocumented) construction or random debris falling off of truck beds.

One recent technology development that might be the key to solving “impossible” problems like autonomous driving is quantum computing. A future blog will explore how combining artificial intelligence with quantum computing might just help us solve the “impossible” problems.


Article Sources:

Reinforcement Learning and the Internet of Things


Ways in Which Machines Learn


Reinforcement Learning and AI


AI Is Learning To Drive In ‘Grand Theft Auto.’ It’s Going … Great


Deep Learning Machine Teaches Itself Chess in 72 Hours



The post Transforming from Autonomous to Smart: Reinforcement Learning Basics appeared first on InFocus Blog | Dell EMC Services.

Read the original blog entry...

More Stories By William Schmarzo

Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business”, is responsible for setting the strategy and defining the Big Data service line offerings and capabilities for the EMC Global Services organization. As part of Bill’s CTO charter, he is responsible for working with organizations to help them identify where and how to start their big data journeys. He’s written several white papers, avid blogger and is a frequent speaker on the use of Big Data and advanced analytics to power organization’s key business initiatives. He also teaches the “Big Data MBA” at the University of San Francisco School of Management.

Bill has nearly three decades of experience in data warehousing, BI and analytics. Bill authored EMC’s Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements, and co-authored with Ralph Kimball a series of articles on analytic applications. Bill has served on The Data Warehouse Institute’s faculty as the head of the analytic applications curriculum.

Previously, Bill was the Vice President of Advertiser Analytics at Yahoo and the Vice President of Analytic Applications at Business Objects.

Latest Stories
In his session at @DevOpsSummit at 20th Cloud Expo, Kelly Looney, director of DevOps consulting for Skytap, showed how an incremental approach to introducing containers into complex, distributed applications results in modernization with less risk and more reward. He also shared the story of how Skytap used Docker to get out of the business of managing infrastructure, and into the business of delivering innovation and business value. Attendees learned how up-front planning allows for a clean sep...
Blockchain is a shared, secure record of exchange that establishes trust, accountability and transparency across supply chain networks. Supported by the Linux Foundation's open source, open-standards based Hyperledger Project, Blockchain has the potential to improve regulatory compliance, reduce cost and time for product recall as well as advance trade. Are you curious about Blockchain and how it can provide you with new opportunities for innovation and growth? In her session at 20th Cloud Exp...
IoT is at the core or many Digital Transformation initiatives with the goal of re-inventing a company's business model. We all agree that collecting relevant IoT data will result in massive amounts of data needing to be stored. However, with the rapid development of IoT devices and ongoing business model transformation, we are not able to predict the volume and growth of IoT data. And with the lack of IoT history, traditional methods of IT and infrastructure planning based on the past do not app...
To get the most out of their data, successful companies are not focusing on queries and data lakes, they are actively integrating analytics into their operations with a data-first application development approach. Real-time adjustments to improve revenues, reduce costs, or mitigate risk rely on applications that minimize latency on a variety of data sources. Jack Norris reviews best practices to show how companies develop, deploy, and dynamically update these applications and how this data-first...
Intelligent Automation is now one of the key business imperatives for CIOs and CISOs impacting all areas of business today. In his session at 21st Cloud Expo, Brian Boeggeman, VP Alliances & Partnerships at Ayehu, will talk about how business value is created and delivered through intelligent automation to today’s enterprises. The open ecosystem platform approach toward Intelligent Automation that Ayehu delivers to the market is core to enabling the creation of the self-driving enterprise.
"At the keynote this morning we spoke about the value proposition of Nutanix, of having a DevOps culture and a mindset, and the business outcomes of achieving agility and scale, which everybody here is trying to accomplish," noted Mark Lavi, DevOps Solution Architect at Nutanix, in this SYS-CON.tv interview at @DevOpsSummit at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Internet-of-Things discussions can end up either going down the consumer gadget rabbit hole or focused on the sort of data logging that industrial manufacturers have been doing forever. However, in fact, companies today are already using IoT data both to optimize their operational technology and to improve the experience of customer interactions in novel ways. In his session at @ThingsExpo, Gordon Haff, Red Hat Technology Evangelist, shared examples from a wide range of industries – including en...
In IT, we sometimes coin terms for things before we know exactly what they are and how they’ll be used. The resulting terms may capture a common set of aspirations and goals – as “cloud” did broadly for on-demand, self-service, and flexible computing. But such a term can also lump together diverse and even competing practices, technologies, and priorities to the point where important distinctions are glossed over and lost.
"We're here to tell the world about our cloud-scale infrastructure that we have at Juniper combined with the world-class security that we put into the cloud," explained Lisa Guess, VP of Systems Engineering at Juniper Networks, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Enterprise architects are increasingly adopting multi-cloud strategies as they seek to utilize existing data center assets, leverage the advantages of cloud computing and avoid cloud vendor lock-in. This requires a globally aware traffic management strategy that can monitor infrastructure health across data centers and end-user experience globally, while responding to control changes and system specification at the speed of today’s DevOps teams. In his session at 20th Cloud Expo, Josh Gray, Chie...
Consumers increasingly expect their electronic "things" to be connected to smart phones, tablets and the Internet. When that thing happens to be a medical device, the risks and benefits of connectivity must be carefully weighed. Once the decision is made that connecting the device is beneficial, medical device manufacturers must design their products to maintain patient safety and prevent compromised personal health information in the face of cybersecurity threats. In his session at @ThingsExpo...
All organizations that did not originate this moment have a pre-existing culture as well as legacy technology and processes that can be more or less amenable to DevOps implementation. That organizational culture is influenced by the personalities and management styles of Executive Management, the wider culture in which the organization is situated, and the personalities of key team members at all levels of the organization. This culture and entrenched interests usually throw a wrench in the work...
"We're a cybersecurity firm that specializes in engineering security solutions both at the software and hardware level. Security cannot be an after-the-fact afterthought, which is what it's become," stated Richard Blech, Chief Executive Officer at Secure Channels, in this SYS-CON.tv interview at @ThingsExpo, held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.
In his session at 20th Cloud Expo, Mike Johnston, an infrastructure engineer at Supergiant.io, discussed how to use Kubernetes to set up a SaaS infrastructure for your business. Mike Johnston is an infrastructure engineer at Supergiant.io with over 12 years of experience designing, deploying, and maintaining server and workstation infrastructure at all scales. He has experience with brick and mortar data centers as well as cloud providers like Digital Ocean, Amazon Web Services, and Rackspace. H...
You know you need the cloud, but you’re hesitant to simply dump everything at Amazon since you know that not all workloads are suitable for cloud. You know that you want the kind of ease of use and scalability that you get with public cloud, but your applications are architected in a way that makes the public cloud a non-starter. You’re looking at private cloud solutions based on hyperconverged infrastructure, but you’re concerned with the limits inherent in those technologies.