
Oracle Artificial Intelligence Blog

Recent Posts

Oracle Labs

Achieve Fast, Scalable Querying for Very Large Graphs with Distributed Parallel Graph AnalytiX (PGX)

Graph data processing is already an integral part of big-data analytics, with applications in many domains including finance, cyber security, compliance, retail, and health sciences. The adoption of graph processing is expected to grow further in the coming years, partly because graphs can naturally represent data that captures fine-grained relationships among entities, and graph analysis can provide valuable insights about such data by examining these relationships. Oracle Labs PGX has been providing graph solutions both for Big Data and for Relational Database customers. In this post, I describe our new distributed graph traversal solution, which significantly improves the performance and memory consumption of Oracle PGX's in-memory distributed graph query engine. That is especially true for very large graph queries, where competing systems either fail to execute due to memory usage (see the performance figures later in the post) or fall back to slow and inefficient disk-based joins.

Typically, graph analysis is performed with two distinct but correlated methods: computational analysis (a.k.a. graph algorithms) and pattern matching queries. Most graph engines nowadays, such as Oracle Labs PGX and Apache Spark GraphX/GraphFrames, support both. With computational analysis, the user executes algorithms that traverse the graph, often repeatedly, and calculate certain (typically numeric) values to get the desired information, e.g., PageRank or shortest paths. Pattern matching queries are given declaratively as graph patterns: the system finds every subgraph of the target graph that is topologically isomorphic/homomorphic to the query graph and satisfies any accompanying filters. For example, a PGQL (Property Graph Query Language) query built around the pattern `MATCH (p1:person)-[:friend]->(p2:person)<-[:friend]-(p3:person) WHERE p1 <> p2 AND p2 <> p3 AND p1.name = "John Doe"` returns the persons p1 (called "John Doe") and p3 who have the largest number of common friends. Such queries can be used, for example, for friend recommendation.

Graph queries are a very challenging workload because they focus on the connections in the data. By following connections, i.e., edges, graph query execution can potentially explore large parts of the graph, generating large intermediate and final result sets with a combinatorial explosion effect. For example, on a very old snapshot of Twitter (known as the "Twitter graph" in academic graph research papers), a single-edge query (e.g., (v0)→(v1)) matches the whole graph, counting 1.4 billion results, and a two-edge query (e.g., (v0)→(v1)→(v2)) returns more than nine trillion matches. Additionally, graph queries can exhibit extremely irregular access patterns and therefore require low-latency data access. For this reason, high-performance graph query engines try to keep data in main memory and scale out to a distributed system in order to handle graphs that exceed the capacity of a single node. There are also commercial competitors such as Amazon Neptune and Neo4j, as well as open source alternatives such as Spark GraphFrames; the approach described in this post provides significant differentiation for Oracle's solution over those systems.
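To see how quickly match counts can grow, consider a tiny adjacency-list sketch in Python (my own toy illustration; the edge list and counts are made up and have nothing to do with the Twitter snapshot): every additional edge in the pattern multiplies each partial match by the out-degree of its last vertex, which is exactly what blows up intermediate and final result sets on real graphs with high-degree hubs.

```python
from collections import defaultdict

# Toy directed graph as an adjacency list (illustrative data only).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (3, 4), (4, 2)]
adj = defaultdict(list)
for src, dst in edges:
    adj[src].append(dst)

# Matches for the single-edge pattern (v0)->(v1): one per edge.
one_hop = [(a, b) for a in adj for b in adj[a]]

# Matches for the two-edge pattern (v0)->(v1)->(v2): every partial match is
# expanded by the out-degree of its last vertex, so counts grow
# combinatorially with the pattern length.
two_hop = [(a, b, c) for a, b in one_hop for c in adj[b]]

print("1-edge matches:", len(one_hop), "| 2-edge matches:", len(two_hop))
```

On this toy graph the growth is mild; on a social graph with hub vertices of millions of followers, the same expansion produces the billions and trillions of matches quoted above.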
Traditional Distributed Graph Traversal Approaches

In a distributed system, graphs are typically partitioned across machines by vertex: each machine stores a partition of the vertices of the overall graph, plus the edges attached to those vertices. For example, in the graph below, machine 0 stores vertices v0, v1, and v2, while machine 1 holds vertices v3 and v4. The edge v0→v1 is local to machine 0, while the edge v2→v3 is remote, as it spans machines 0 and 1. For large distributed graphs, none of the traditional graph exploration/traversal approaches is suitable for distributed queries. Breadth-first traversals and distributed joins quickly explode in terms of intermediate results and pose a performance challenge over the network. Depth-first traversals are hard to parallelize and result in completely random data access patterns. In practice, most engines use breadth-style traversals combined with synchronous, blocking communication across machines.

Breadth-First Traversals

In breadth-first traversals, the execution expands the query in width: the query pattern is matched to the target graph edge after edge. For example, matching the pattern (a)→(b)→(c) to the example graph above could proceed by matching edge (a)→(b) to all graph edges, namely (v0)→(v1), (v0)→(v2), etc., and then expanding these intermediate results to match edge (b)→(c). Typically, the execution proceeds with synchronous traversals, i.e., the first edge is completely matched before moving on to the next edge.

Expanding the query breadth-first is not ideal for a distributed system. First, materializing large sets of intermediate results at every step leads to an intermediate-result explosion. Breadth-first traversals typically have the benefit of locality (i.e., accessing adjacent edges one after the other), but locality in distributed graphs is much more limited, since many of the edges that are followed are remote. As a result, a large part of the intermediate results produced at each step of the query must be sent to remote machines, creating large network bursts.

Distributed Joins

Graph traversals can also be expressed as relational joins. Following the edge (a)→(b) maps to a join (or two) between the "vertex table" (holding the graph vertices) and the "edge table" (holding the edges), as in the sketch below. Distributed joins face the same problems as breadth-first traversals, plus one more important problem: they perform generic table joins instead of graph traversals over specialized graph data structures. Unsurprisingly, graph-specific data structures are much faster than generic joins (see the performance comparison of Oracle Labs PGX Distributed to Apache Spark GraphFrames later in this post).
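To illustrate the mapping from traversals to joins, here is a small pandas sketch (an assumption of mine for illustration; the tables, column names, and the use of pandas are not how PGX.D, GraphFrames, or an RDBMS actually stores or plans the query): the two-edge pattern (a)→(b)→(c) becomes a self-join of the edge table, optionally joined back to the vertex table for properties.

```python
import pandas as pd

# Toy vertex and edge tables (illustrative only).
vertices = pd.DataFrame({"id": [0, 1, 2, 3, 4],
                         "name": ["v0", "v1", "v2", "v3", "v4"]})
edges = pd.DataFrame({"src": [0, 0, 1, 2, 3, 3, 4],
                      "dst": [1, 2, 2, 3, 0, 4, 2]})

# Matching (a)->(b) is a scan of the edge table; matching (a)->(b)->(c)
# becomes a self-join of the edge table on b, i.e., e1.dst == e2.src.
ab = edges.rename(columns={"src": "a", "dst": "b"})
bc = edges.rename(columns={"src": "b", "dst": "c"})
abc = ab.merge(bc, on="b")  # every row is one match of (a)->(b)->(c)

# Optionally join back to the vertex table to pull in vertex properties.
abc = abc.merge(vertices.rename(columns={"id": "a", "name": "a_name"}), on="a")
print(len(abc), "matches of (a)->(b)->(c)")
print(abc.head())
```

In a distributed engine, each of these joins shuffles intermediate rows across the network, which is why join-based execution inherits the intermediate-result explosion described above.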
Depth-First Traversals

In depth-first traversals, the execution expands the query in depth: the query pattern is matched to the target graph as a whole, result by result. For example, matching the pattern (a)→(b)→(c) to the example graph above could proceed by matching (v0)→(v1)→(v2), then (v0)→(v2)→(v3), and so on.

The main advantage of expanding the query depth-first is that intermediate results can be eagerly expanded to final results, thus reducing the memory footprint of query execution. Nevertheless, depth-first traversals have the disadvantages of not leveraging locality and of more complicated parallelism. The lack of locality comes from "edge chasing" – following one edge after the other as dictated by the query pattern – and thus not accessing adjacent edges in order. The complication for parallelism arises because the runtime cannot know in advance whether there is enough work for all threads at any stage of the query. For instance, a query like `MATCH (p1:person)-[:friend]->(p2:person)<-[:friend]-(p3:person) WHERE p1 <> p2 AND p2 <> p3 AND p1.name = "John Doe"`, which I described at the beginning of the post, will probably produce a single match for (p1). If this intermediate result is expanded in a depth-first manner, the number of intermediate results (and hence the parallelism) will grow slowly.

Dynamic Asynchronous Traversals for Distributed Graphs

The Parallel Graph AnalytiX (PGX) toolkit, developed at Oracle Labs, is capable of executing graph analysis in a distributed way (i.e., across multiple servers); we refer to this capability as PGX.D. In PGX.D, we are experimenting with a new hybrid approach to executing graph traversals that offers the best of breadth-first and depth-first traversals. Competing graph engines face the classic trade-off between performance and memory consumption for graph query execution:

- Sacrifice performance: use a fixed memory area (typically several gigabytes) for the execution, but spill intermediate results that do not fit to disk.
- Sacrifice memory: perform the whole computation in memory; if the intermediate results do not fit in memory, the query cannot be computed on that graph.

PGX.D enables the in-memory execution of any-size query without sacrificing memory or performance. In particular, graph queries in PGX.D:

- Operate with a fixed, predefined amount of memory for storing intermediate results;
- Use only this memory for the computation, i.e., they do not spill any intermediate results to disk; and
- Can compute queries of essentially any size, because intermediate results are turned into final results "on demand" to keep memory consumption within limits.

On the technical side, PGX.D achieves these characteristics by deploying:

- Dynamic traversals: depth-first execution when needed, aggressively completing intermediate results and keeping memory consumption within limits, and breadth-first execution when possible, avoiding the performance complexities of depth-first traversals;
- Asynchronous communication of intermediate results from one machine to another, so that local computation is not blocked or delayed by remote edges; and
- Flow control and incremental termination, to keep global memory consumption (including messaging) within limits and to guarantee query termination (i.e., avoid deadlocks).
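To give a flavor of the dynamic decision between breadth and depth, here is a minimal single-machine Python sketch (entirely my own simplification – the `BUDGET` constant, the `hybrid_match` function, and the data are hypothetical, and the real PGX.D adds distribution, asynchronous messaging, and flow control on top of this idea): partial matches are expanded breadth-first while the buffer of intermediate results stays under a budget, and completed depth-first as soon as the buffer fills up.

```python
from collections import defaultdict

# Toy graph (illustrative only).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0), (3, 4), (4, 2)]
adj = defaultdict(list)
for s, d in edges:
    adj[s].append(d)

BUDGET = 4       # max partial matches we buffer (stand-in for a memory limit)
PATTERN_LEN = 3  # number of vertices in the path pattern (a)->(b)->(c)

def complete_depth_first(partial, results):
    """Expand one partial match all the way to final results (depth-first)."""
    if len(partial) == PATTERN_LEN:
        results.append(tuple(partial))
        return
    for nxt in adj[partial[-1]]:
        complete_depth_first(partial + [nxt], results)

def hybrid_match():
    results = []
    frontier = [[v] for v in adj]  # partial matches of length 1
    while frontier:
        if len(frontier) > BUDGET:
            # Buffer too large: finish one partial match depth-first,
            # turning it into final results and freeing buffer space.
            complete_depth_first(frontier.pop(), results)
        else:
            # Buffer within budget: expand breadth-first by one edge.
            partial = frontier.pop(0)
            if len(partial) == PATTERN_LEN:
                results.append(tuple(partial))
                continue
            for nxt in adj[partial[-1]]:
                frontier.append(partial + [nxt])
    return results

# Produces the same matches as a pure BFS or DFS, with a bounded buffer.
print(len(hybrid_match()), "matches of (a)->(b)->(c)")
```

The design point is that the switch is made per worker and per step, so the engine keeps the locality of breadth-first expansion whenever memory allows and only pays the cost of depth-first completion when the buffer is under pressure.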
Example

Consider matching the pattern (a)→(b)→(c) to our example graph above, and consider a worker thread that starts matching from vertex v0. The thread can bind v0 as (a) and then try to expand along the edge (a)→(b); the worker could match (b) with v1. At this point, the dynamic traversal approach in PGX.D, based on how much memory the query already consumes, dictates whether the worker continues matching (a)→(b) edges (breadth) or continues with the (b)→(c) edge (depth). In either case, the worker will eventually match v3 for (b), in which case PGX.D simply buffers the intermediate match into a message destined for machine 1 and continues matching the next (a)→(b) edge in a breadth-expanding manner. Of course, PGX.D controls the number of outgoing messages and intermediate results in order to keep the execution within limits. You can find more details in our GRADES 2017 publication, which describes the main query runtime of PGX.D (it does not cover the local dynamicity, which is described in a follow-up publication currently under submission).

Comparing PGX.D to Open-Source Systems

We use the LDBC social network benchmark graph (scale 100; 283 million vertices, 1.78 billion edges) and queries (we adapted the queries to reflect the current features of PGX.D; the changes mainly include the removal of HAVING clauses, subqueries, and regular path queries). We compare PGX.D to Apache Spark GraphFrames (version 0.7 on top of Spark 2.4.1) and PostgreSQL (version 11.2). Both PGX.D and GraphFrames execute on 8 machines connected with InfiniBand. We perform 15 repetitions and report the median run.

Clearly, PGX.D is significantly faster than both GraphFrames and the traditional PostgreSQL RDBMS: PGX.D executes the total query suite 29.5 and 17.5 times faster than GraphFrames and PostgreSQL, respectively. In addition, PGX.D is configured to use approximately 16GB of runtime memory for intermediate results, while the other two engines are configured to use the whole 756GB of memory available in the underlying machines (times 8 machines for GraphFrames). As I mentioned earlier in this post, GraphFrames implements graph traversals on top of distributed joins on dataframes.

Large-Scale Queries

In this experiment, we evaluate the engines with very large queries. In particular:

- Q1: Simple cycle; pattern (v1)→(v2)→(v1) with (a) a COUNT(*) aggregation and (b) AVG aggregations of vertex data;
- Q2: Two-hop match; pattern (v1)→(v2)→(v3) with (a) a COUNT(*) aggregation and (b) AVG aggregations of vertex data.

We execute these queries on graphs of increasing size:

Graph           # Vertices   # Edges   Description
Livejournal     484K         68.9M     Users and friendships
Uniform Random  100M         1B        Uniform random edges
Twitter         42.6M        1.47B     Tweets and followers
Webgraph-UK     77.7M        2.97B     2006 .uk domains

This experiment highlights the true need for the scalable in-memory distributed graph traversal methodology of PGX.D. As the query exploration size increases, GraphFrames and PostgreSQL cannot keep up with the workload. Even on the two smallest graphs, PGX.D is on average 48 and 115 times faster than GraphFrames and PostgreSQL, respectively; clearly, joins in PostgreSQL are significantly slower than graph traversals in PGX.D. With Q2 on Twitter and Webgraph-UK, we see that even the 8 x 756GB = 6TB of total memory (backed by 1+TB of disk) is not sufficient for GraphFrames. As in the previous experiment, PGX.D completes these queries with approximately 16GB of memory in each machine.

What's Next

We have explored dynamic asynchronous traversals in PGX.D only for graph querying and pattern matching with PGQL. We are currently exploring how to further leverage these fast in-memory explorations for machine learning.
For example, we are developing large-scale random walks on top of PGX.D that will serve as the backbone for graph machine-learning solutions.

Conclusions

I briefly presented our new dynamic asynchronous traversal approach for distributed graphs in PGX distributed mode (PGX.D). Using this approach, PGX.D achieves fast, scalable, fully in-memory distributed graph queries with a small memory footprint, enabling graph processing at a whole new scale of graphs and queries. The hybrid/dynamic traversal functionality is on the PGX.D roadmap, so stay tuned for news on its availability. For more information and to try PGX, you can visit the Oracle Labs PGX Technology Network page.


AI Events

Analytics and Data Summit, March 12-14, 2019

Oracle Conference Center, Redwood Shores, CA. All Analytics, All Data, No Nonsense! Analytics and Data Summit 19 is a once-a-year opportunity for Oracle technology users. Designed for Oracle technical professionals, it is like a mini Oracle OpenWorld or Code One event held on the Oracle campus for customers, users, and partners, but focused on "novel and interesting" use cases of Oracle technologies. The 3-day event is a great way to network with peers and to share and learn from novel and interesting use cases of Oracle's Analytics and Data technologies.

Analytics and Data Summit 19 delivers:

- More than 100 technical sessions delivered by technical experts, customers, partners, product managers, and developers
- Oracle Database, Oracle Autonomous Database, Oracle Machine Learning, Oracle Advanced Analytics, Oracle Analytics Cloud, Oracle Data Visualization Desktop (DVD), Oracle Business Intelligence EE, Oracle Spatial and Graph, and more
- IoT, Python, R, Blockchain, Kafka, Streaming, RDF, and more
- Hands-on Labs training taught by product managers and technical experts
- Networking opportunities with Oracle ACEs, customers and Global Leader customers, product managers, developers, partners, and consultants
- The latest product updates and insider information from Oracle executive management, product managers, and developers
- Partners and consultants who can help you solve your specific challenges

See the 2-minute Analytics and Data Summit overview video on Twitter for more information, and click on the Analytics and Data Summit 19 Schedule for the full detailed 3-day agenda. Analytics and Data Summit 19 is run by an independent user group (BIWA Inc.), not Oracle. Check out #AnD_Summit on Twitter for any current discount codes worth $50-$75 off the then-current registration fee. Please help spread the word and encourage friends, colleagues, and Oracle technical professionals to register. Being run by an independent user group, we rely heavily on word of mouth and grassroots user community initiatives, so we really appreciate the community support! We have another great lineup this year and hope to have our biggest Analytics and Data Summit event ever. Hope to see you there.


AI in Business

How AI is fueling smarter recruitment

Prior planning and preparation prevents poor performance – a phrase that could soon be out of date. With the advent of new technologies, and in particular artificial intelligence, businesses will soon be able to predict the outcome of different scenarios before they even come to fruition. From an HR standpoint, these technologies could transform how we look after team members, find new ones, and initiate HR measures in time to maximize employee retention. According to Deloitte's 2018 Global Human Capital Trends report, 72% of business leaders already recognize that AI, robotics, and automation are important – but less than a third (31%) are ready to act. So, how can HR teams actually use AI and predictive analytics?

Know your teams

Previously, HR professionals checked how happy people were during quarterly, biannual, or even annual reviews. Now, managers and HR teams can monitor mood and attitude each week through chatbots. They can check on training programs to see who is working on their development and who has dropped off, and they can review skillsets and new opportunities in seconds, helping to keep employees engaged and feeling valued. In other words, they can reduce staff turnover. With an intelligent Human Capital Management (HCM) system that uses predictive analytics, managers can draw on dozens of datasets to forecast the likelihood of employees leaving, work out what to do, and then act. And if new team members are needed, it can fuel smarter hiring too.

Anticipate your needs

Through the filter of predictive analytics, data on current talent can help you work out which skills and people you'll need next. You'll be able to do workforce planning far in advance, finding the experience and capabilities that will matter in a year – or three – and molding the teams you know the business will need. Your AI-based system could also reveal where you should look for candidates and the possible impact on the team they'd be joining – not to mention the business as a whole. It could even reach out to the best people, making an automated but personalized first approach.

People are the business

A prerequisite for these insights is full transparency into the skills and personality traits of all employees. This has long been a wish for HR, and today it can be realized thanks to the nearly limitless capacity of the cloud for managing complex data. People are the business: besides the necessary capital, the most important factor in securing the future success of a corporation is having engaged people. AI capabilities like those described above can help identify what your people truly want and assess the risks and opportunities of moving them into the positions they are aiming for. By using AI and predictive analytics to better understand the workforce, HR leaders can drive greater value and manage risk more effectively. But how do they currently view AI? Read the results of our latest survey to find out.


AI News

H2O.ai Driverless AI Cruises on Oracle Cloud Infrastructure GPUs

One of the things I'm most excited about at Oracle Cloud Infrastructure is the opportunity to do cool things with our partners in the artificial intelligence (AI)/machine learning (ML) ecosystem. H2O.ai is doing some really innovative things in the ML space that can help power these sorts of use cases and more. Their open source ML libraries have become the de facto standard in the industry, providing a simple way to run a variety of ML methods, from logistic regression and gradient-boosted trees (GBT) to an AutoML capability that tunes models automatically. H2O.ai has continued to build on this functionality with GPU support in what I think might be the best-named product of all time, Sparkling Water. (Yes, it's H2O running on Spark. Get it?)

The latest H2O.ai product is Driverless AI. The name is perhaps a bit misleading: Driverless AI isn't related to driverless cars. Instead, it's an ML platform that provides a GUI on top of the H2O ML libraries that we already know. The GUI supports a significant chunk of the ML lifecycle:

- Data loading
- Visualization
- Feature engineering
- Model creation
- Model evaluation
- Deployment for scoring

Software to do all this simply wasn't available five years ago; instead, a highly skilled person would have had to put everything together by hand over a period of weeks or months. There are still some gaps. For example, data wrangling is still a mess, even with the time series support and automatic feature generation abilities of Driverless AI. That said, building accurate ML models has never been easier.

So, what does this all have to do with Oracle Cloud Infrastructure? We're building data centers all over the world, and they're being populated with some nifty hardware, including cutting-edge GPU boxes. The new BM.GPU3.8 shape is the top of that range, with 8 NVIDIA Volta cards. It's the perfect machine to handle the compute demands of Driverless AI, and we're pricing it to be significantly less expensive than any competing platform. For our provisioning plane, Oracle Cloud Infrastructure has made an open choice: rather than building a proprietary technology such as Amazon Web Services CloudFormation, we've chosen to adopt the open source industry standard, Terraform. We've joined the Cloud Native Computing Foundation (CNCF) as a Platinum member and contributed our Terraform provider to the open source project. We've partnered with H2O.ai to write some Terraform modules that deploy H2O.ai Driverless AI on Oracle Cloud Infrastructure. The first module deploys on GPU machines. I worked with our team to record this video that demonstrates how to use the module; it also includes a very basic demo.

This is just the beginning of our partnership with H2O.ai. We're working on several activities with them:

- Oguz Pastirmaci from the Oracle Cloud Infrastructure data and AI team is working to enhance the Terraform module. Building a model is fast with 8 GPUs; it's going to be a lot faster with a whole cluster of those machines humming in parallel.
- We're discussing how we might simplify deployment even further, providing a more integrated experience with a higher-level interface.
- We'll be at H2O World San Francisco 2019 on Feb. 4-5. Although the event won't have booths, a number of us should be wandering around the conference. Say hi!

If you're interested in learning more about H2O.ai on Oracle Cloud Infrastructure or about our AI/ML partnerships in general, reach out to me at ben.lackey@oracle.com. You can also follow me on Twitter @benofben.


Types of Machine Learning and Top 10 Algorithms Everyone Should Know

From detecting skin cancer to sorting corn cobs to predicting equipment maintenance needs early, machine learning has granted computer systems entirely new abilities. Algorithms are the methods used to extract patterns from data so that computers can predict and draw inferences. It is worth learning how machine learning really works under the hood, so let's walk through a few examples and use them as an excuse to talk about the process of getting answers from your data using machine learning. Here are the top 10 machine learning algorithms that everyone involved in data science, machine learning, and AI should know about.

Before we go further, it is worth explaining the taxonomy. Machine learning algorithms are divided into three broad categories:

- Supervised learning
- Unsupervised learning
- Reinforcement learning

Supervised Learning

Supervised learning is the task of inferring a function from training data, where the training data consists of a set of observations together with their outcomes. It is used when you have labeled data sets available for training, e.g., a set of medical images of human cells or organs that are labeled as malignant or benign. Supervised learning can be further subdivided into regression analysis and classification analysis.

Regression Analysis

Regression analysis is used to predict numerical values. The top regression algorithms are:

Linear Regression
Linear regression models the relationship between observations and the outcome using a straight line. Root mean squared error and gradient descent are used to fit the best possible line. The methodology also provides insight into which factors have the greatest influence on the outcome; for example, the color of an automobile may not correlate strongly with its chances of breaking down, but the make and model may have a much stronger correlation.

Polynomial Regression
Polynomial regression is a form of regression analysis in which the relationship between the observations and the outcome is modeled as an nth-degree polynomial. The method is more appropriate when the observations follow a curve or a series of humps rather than a straight line, and it is more reliable when the curve is fit on a large number of observations.

Classification Analysis

Classification analysis is a set of techniques used to predict categorical values, i.e., to assign data points to categories, e.g., spam vs. non-spam email, or red vs. blue vs. green objects. The top classification algorithms are:

Logistic Regression
Logistic regression has a somewhat misleading name: although the name suggests regression, it is in fact a classification technique. It is used to estimate the probability of a binary (1 or 0) response, e.g., malignant or benign. It can also be generalized to predict more than two categorical values, e.g., whether an object is an animal, a human, or a car.

K-Nearest Neighbors
K-nearest neighbors is a classification technique in which an object is classified by a majority vote of its neighbors: the observation is assigned to the class that is most common among its K nearest neighbors. Suppose you are trying to classify the image of a flower as either a sunflower or a rose; if K is chosen as 3, then at least 2 of the 3 nearest pre-classified neighbors must belong to the same flower class for the test image to be assigned that class. Nearness is measured along each dimension used for classification, for example, how close the color of the test sample is to the color of other pre-classified flower images. The best choice of K generally depends on the data; a larger value of K reduces the effect of noise on the classification.
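To make the supervised techniques above concrete, here is a minimal scikit-learn sketch on synthetic data (my own illustration; the data, variable names, and parameter choices are assumptions, not part of the original post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification

rng = np.random.default_rng(0)

# --- Regression: fit a straight line and an nth-degree polynomial ---
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(0, 0.3, size=200)  # curved relationship

linear = LinearRegression().fit(X, y)                  # straight-line fit
poly = PolynomialFeatures(degree=2).fit_transform(X)   # add an x^2 feature
quadratic = LinearRegression().fit(poly, y)            # polynomial regression
print("linear R^2:", linear.score(X, y), "| quadratic R^2:", quadratic.score(poly, y))

# --- Classification: logistic regression and K-nearest neighbors ---
Xc, yc = make_classification(n_samples=300, n_features=4, random_state=0)
logit = LogisticRegression().fit(Xc, yc)               # probability of a binary response
knn = KNeighborsClassifier(n_neighbors=3).fit(Xc, yc)  # majority vote of 3 nearest neighbors
print("logistic accuracy:", logit.score(Xc, yc), "| KNN accuracy:", knn.score(Xc, yc))
```

On the curved synthetic data, the quadratic fit should score noticeably higher than the straight line, which is exactly the distinction between linear and polynomial regression made above.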
Decision Trees
A decision tree is a decision-support tool that uses a tree-like model of decisions and their possible consequences. Decision trees aim to create a model that makes predictions by learning simple decision rules from the training data.

Unsupervised Learning

Unsupervised learning is a set of algorithms used to draw inferences from data sets consisting of input data without labeled outcomes. The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data. The popular unsupervised learning algorithms are:

K-Means Clustering
K-means clustering aims to partition observations into K clusters; for instance, the items in a supermarket can be clustered into categories like butter, cheese, and milk – a group of dairy products. The k-means algorithm does not necessarily find the optimal configuration, so it is usually run multiple times to reduce this effect.

Principal Component Analysis
Principal component analysis (PCA) is a technique for feature extraction when faced with too many features or variables. Say you want to predict the GDP of the United States: you have many variables to consider – inflation, stock data for index funds as well as individual stocks, interest rates, ISM, jobless claims, the unemployment rate, and the list goes on. Working with too many variables is problematic for machine learning, as there can be a risk of overfitting, a lack of suitable data for each variable, and varying degrees of correlation between each variable and the outcome. The first principal component has the largest possible variance, accounting for as much of the variability in the data as possible; each succeeding component, in turn, has the highest possible variance under the constraint that it is orthogonal to the preceding components.

Reinforcement Learning

Reinforcement learning is different from both supervised and unsupervised learning. The goal in supervised learning is to find the best label based on a history of labeled data, and the goal in unsupervised learning is to find a logical grouping of the data in the absence of outcomes or labels. In reinforcement learning, the goal is to reward good behavior, similar to rewarding pets for good behavior in order to reinforce that behavior. Reinforcement learning tackles the difficult problem of correlating immediate actions with the delayed outcomes they create. Like humans, reinforcement learning algorithms sometimes have to contend with delayed gratification before they see the outcome of actions or decisions made in the past – for example, the reward for a win in a game of chess, or maximizing the points won over many moves in a game of Go, as with AlphaGo. Top reinforcement learning algorithms include Q-Learning, State–Action–Reward–State–Action (SARSA), Deep Q-Networks (DQN), and Deep Deterministic Policy Gradient (DDPG). The explanation of these algorithms gets fairly involved and is worthy of its own dedicated blog post in the future.

Oracle offers complete data science and machine learning frameworks and algorithms in its data science platform, as well as embedded in its SaaS applications and database. Click here to learn more about Oracle's AI and Machine Learning offerings.
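As a closing illustration of the unsupervised techniques described above, here is a minimal scikit-learn sketch on synthetic data (again my own example; the dataset and parameter values are assumptions, not from the original post):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.datasets import make_blobs

# Synthetic data: 300 points drawn around 3 centers in 5 dimensions.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# K-means: partition the observations into K=3 clusters. n_init=10 reruns the
# algorithm from several random starts, since a single run may not find the
# best configuration.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))

# PCA: project the 5 features onto the 2 orthogonal directions with the
# largest variance, a common way to condense many correlated variables.
pca = PCA(n_components=2).fit(X)
reduced = pca.transform(X)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio shows how much of the original variability the first two components retain, which is the property PCA exploits when reducing a long list of correlated predictors.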
