Oracle AI & Data Science Blog
Learn AI, ML, and data science best practices

Recent Posts

AI in Business

3 Ways to Apply Emerging Technology to Your Company

In our last blog [1], we covered several housekeeping items for your business applications that you and your team could tackle during this stay-at-home period. One of the suggestions was learning about an emerging technology, such as AI, IoT, or digital assistants. In this blog, I’m going to dig deeper into specific emerging technologies and how you can apply them to add extra value to your company.

First, learn about your company

My first recommendation is to gain a better understanding of your company. If you’re working for a publicly traded company, a good place to start is the 10-K. It’s a document that your company is required to file every year with the SEC. It details financial performance, earnings per share, the organizational structure, subsidiaries, executive compensation, and much, much more information. The purpose of the 10-K is to make investors aware of the inner workings of a company so they can make timely buy/sell decisions. It’s a great place for you to gain a big-picture understanding of your company over and above what you may know about your specific line of business, geographical region, or the particular division in which you work.

Next, learn more about your company’s new objectives

Most likely, these will be internal documents, and they may be readily available. Be aware that the strategic goals may have shifted. These documents will help you understand the path forward: how your company plans to grow, new markets the leadership wants to enter, and possibly new products on the horizon. These strategic priorities may also give you an indication of challenges, struggles, and points of pain that your company needs to address during this time.
Apply emerging technologies to your company’s objectives

Once you have an understanding of your company’s growth strategies and challenges, you can apply the information you learn about emerging technologies to help solve a challenge, streamline a process, or improve the employee or customer experience. This is your opportunity to think outside of the box, as they say, but be sure that your ideas are tied to strategic objectives. Here are some examples of emerging technologies in action that may get you thinking about ideas for your company.

Artificial Intelligence for Sales and Marketing

The current crisis has shifted markets in a matter of days. Some segments of the economy went from tremendous growth to double-digit losses in a week. Your company may need to shift its sales strategy in the short term to find ways to recover lost revenue or tap into new markets to keep pace with economic shifts. AI solutions apply machine learning algorithms to your customer data to identify an ideal customer profile and then match that profile to trusted 3rd-party data to create a target list of prospects for your sales and marketing teams.

AI for Supply Chain

Just as entire markets shifted quickly, suppliers all over the world found themselves dealing with shipping and receiving restrictions that may have prevented them from fulfilling orders. That situation may have put your company in scramble mode as you tried to find new suppliers. Depending on your industry, component parts may have spiked in price, impacting sales projections and profit margins. An AI application can analyze your ERP data, including current suppliers, POs, invoices, payables, etc., and compare company information with 3rd-party data to give you insight into your supplier ecosystem. It can also help you rank suppliers, figure out where you can negotiate better discounts, and create models to help your company optimize procurement processes.
Digital Assistants/Chatbots for Workforce or Customer Experience

This conversational interface uses machine learning and natural language processing to guide people to the right source of information or the appropriate live support person, or to walk someone through a process. Digital assistants can be a relatively quick and easy way to fill a gap as your company shifts staff with people working from home. Chatbots can also greatly improve the user experience for both internal and external use cases. For example, your company can add a digital assistant to your employee portal to help people find specific HCM benefits information, or you can add a digital assistant to the home page of your website to help people get to the right resource for a specific question or issue.

Learning More

Please read my last blog, referenced at the end of this article, for more information on emerging technologies. There is a lot of free content online, from YouTube tutorials to online classes. According to a recent cnbc.com article, many 4-year colleges are offering free online classes, so people can learn while staying safe at home. You can search for the online classes about emerging technologies that fit your interests and company objectives. I hope this gives you inspiration to learn something new so you can apply emerging technologies to develop professionally and to help tackle the priorities and challenges in your company.

For more information on a complete suite of cloud-based applications that includes emerging technologies, go to www.oracle.com/applications

[1] https://blogs.oracle.com/saas/tackling-your-cloud-applications-to-do-list

AI in Business

Bridging the gap for remote workers through digital assistants

By Suhas Uliyar, Vice President, AI and Digital Assistant

AI-based chatbots or digital assistants stand to change the way we interact with business applications, not just consumer ones. The main benefit is the ability to get immediate responses to queries via natural local language, without having to download apps or get training. While we have the freedom to engage in user-friendly experiences in our personal lives – such as Alexa and Siri – there have been few options for people in their professional lives. But that’s changing. As Steve Miranda, Oracle’s executive vice president of application development, remarked, “In HR, every common question or transaction has lent itself nicely to digital assistants. Within the next year, we will be calling HTML our ‘old UI.’ Every transaction you have will be through a digital assistant UI.”

Work-at-home requirements associated with the spread of COVID-19 have made it all the more important to give employees easy access to ever-changing information – on company policies, insurance coverage, and public health guidance, in addition to the usual cadence of questions on vacation balances, status of expenses, and IT workarounds. Here are a few key ways in which chatbots and digital assistants can help.

An assistant for every employee

Finding answers to simple questions can be a frustrating experience if there is no easy way to do so. Take, for example, basic questions like “how many vacation days do I have left?” or “what do I do if I have a change in marital status?” In some cases, employees need to log into their VPN to find the policy document, a web page, or the application, which they then need to navigate further to find answers to these straightforward questions. With a digital assistant, employees can simply speak the question out loud in a natural way or type it in, instead of having to navigate multiple screens or interfaces, and they will receive an immediate response.

Not only that, the digital assistant can help them further by recommending or taking action as a follow-up to their original interaction, acting as a true assistant for the employees. For example, rather than just informing the employee what to do to change their marital status, the digital assistant can actually trigger the change process by gathering the necessary information and then updating the relevant systems with that information.

Answering general policy questions

With rapidly evolving governmental directives such as sheltering-in-place and social distancing, most organizations are quickly adapting their HR policies and guidelines. At the same time, employees need help and answers from their organizations more than ever. Questions may range widely, from policies on employment, travel guidelines, and health and safety instructions to guidelines on dealing with and working during the pandemic. In some cases, the information is very dynamic and changes by the minute. Digital assistants give employees a consistent channel, available 24x7, to ask their questions so they can get an immediate response – while freeing up the HR and IT/support teams to manage the more complex challenges they are facing today. In fact, you can also use digital assistants to send proactive alerts and notifications, such as changes in policies, so that employees don’t need to keep checking or searching for the latest information time and again.

Supporting employee health and safety

Practicing social distancing has also had an impact on recruitment, onboarding, and training processes for organizations. In effect, these processes provide resources and support that most organizations may seriously need in these uncertain times. Using a digital assistant, businesses can drive candidates’ pre-screening and interview scheduling online, across any messaging channel.

You can drive virtual onboarding by enabling easy remote online access to relevant trainings, policies, and materials, all via a digital assistant. Data can also be safely recorded to keep track of employee health status based on the organization’s health policy and guidelines. A digital assistant can also save the employee from the time-consuming task of completing forms or reporting on any health-related issues at work.

Employee self-service

Whether working remotely or on-site as needed, employees may need access to both information and processes beyond just the HR systems. From submitting expenses to filing IT support tickets to making changes to travel plans, we touch a number of systems or applications as employees. Some processes even span multiple systems, like role- or location-based expense reimbursement policies, where the system requires role information from the HR system before interacting with the finance/ERP system for reimbursement. A digital assistant is one common interaction point for employees, contractors, or partners across multiple applications and can provide a quick, consistent, and concise response.

Leading an organization through this unprecedented time has put an increased demand on the HR function. As a result, organizations would be wise to leverage AI-powered technologies such as digital assistants to scale their functions, create online connection and engagement, and provide dynamic updates on policies and safety guidance without bogging down human communication channels – which need to be available for essential tasks.
A digital assistant can support an organization by providing benefits such as:

- Lowering operational costs via online self-service & automation
- Expanding HR availability 24x7 across different channels
- Enabling easy access to information and processes delivered via text or natural language
- Delivering consistent information and maintaining employee engagement
- Enabling proactive HR outreach

Digital assistants can support the functions employees may need now while creating efficiencies for the long term. For more information or to discuss how a digital assistant can support your needs, email us here. Stay well, and be safe.

AI in Business

Oracle Digital Assistant Named a Leader in Ovum Decision Matrix for Intelligent Virtual Assistants

Ovum, a leading analyst firm and part of the global technology research organization, Omdia, has recognized Oracle Digital Assistant as a leader in the market in its latest research report, "Ovum Decision Matrix: Selecting an Intelligent Virtual Assistant Solution, 2020–21." The report analyzes the evolution of virtual intelligent assistants, the increasing scope of use cases, and the market landscape. It evaluates 10 niche and large technology vendors and names Oracle Digital Assistant as one of the leaders in this market.

Oracle Digital Assistant is a comprehensive, AI-powered conversational interface for business applications. It interprets the user’s intent so it can automate processes and deliver contextual responses to their voice or text commands to enrich the user experience, eliminate helpdesk and support overhead, and enable scale for communications and engagement.

The Ovum report specifically highlights Oracle Digital Assistant as an easy-to-build solution, thanks to its no-code, design-by-example Conversational Design Interface, which is intended to be used by non-developers to build, train, test, deploy, and monitor AI-powered digital assistants on channels of choice. Ovum also noted Oracle Digital Assistant’s advanced linguistic and deep learning-based natural language processing (NLP) models as a key strength that enables the Digital Assistant to better understand domain-specific vocabulary and respond with contextual information and best next-step actions accordingly.

Oracle Digital Assistant also received kudos in the report for providing an “enterprise-ready” solution. Organizations leveraging Oracle Digital Assistant know that their data is their own, stored securely in Oracle Cloud or via Cloud@Customer for organizations wanting to keep their data within their own boundaries. Furthermore, because it is a comprehensive platform, Oracle Digital Assistant can integrate with existing processes, routing rules, and contact center agents to support enterprises’ unique business needs. In fact, Ovum noted that “A differentiator for ODA [Oracle Digital Assistant] is that a business process engine sits beneath it and is tightly integrated to perform tasks emerging from the conversation. For example, when an end user informs the ODA of a change of address, several relevant processes kick in. Oracle's ODA and business process management R&D teams are also tightly integrated because of the overlap in functions.”

Oracle also offers out-of-the-box chatbot skills for Oracle Cloud HCM, Cloud ERP, and Cloud CX, as well as integration with Oracle CX Service to speed up deployment and provide seamless engagement for Oracle Cloud Applications customers – a point that was noted as a strength in the Ovum report.

With no apps to download and no training needed to use Oracle Digital Assistant, the use of intelligent assistants has picked up quite significantly in the industry. Over the past years, more and more organizations – both public sector and commercial – have come to rely on Oracle Digital Assistant for their needs. Common use cases include enabling easy, 24x7 access to employee HR self-service functions and employee expense and finance functions, and offering customer or employee FAQs and information lookup. This enables Oracle Digital Assistant to be the first line of customer/employee helpdesk and to drive seamless bot-agent handoff only where needed. These use cases present massive sales and ROI opportunities, freeing up human resources to take on the more complex challenges while at the same time improving the user experience.

Oracle’s leadership position in the Ovum report is a testament to the significant R&D investments in its AI- and NLP-powered Cloud service over recent years.
For more information on how your organization can leverage Oracle Digital Assistant, please visit our website. And to download the full report, click here.

Oracle AI

Achieve Fast, Scalable Querying for Very Large Graphs with Distributed Parallel Graph AnalytiX (PGX)

Graph data processing is already an integral part of big-data analytics with many applications in various domains including Finance, Cyber Security, Compliance, Retail, and Health Sciences. The adoption of graph processing is expected to grow further in the upcoming years. This is partially because graphs can naturally represent data that captures fine-grained relationships among entities. Graph analysis can provide valuable insights about such data by examining these relationships. Oracle Labs PGX has been providing graph solutions both for Big Data and for Relational Database customers.

In this post, I will describe our new distributed graph traversal solution that significantly improves the performance and memory consumption of Oracle PGX's in-memory distributed graph query engine. That is especially true for very large graph queries, where our competitors either fail to execute due to memory usage (see the performance figures later in the post) or fall back to slow and inefficient disk-based joins.

Typically, graph analysis is performed with two distinct but correlated methods, namely computational analysis (a.k.a. graph algorithms) and pattern matching queries. Most graph engines nowadays, such as Oracle Labs PGX and Apache Spark GraphX/GraphFrames, support both graph algorithms and graph queries. With computational analysis, the user executes various algorithms that traverse the graph, often repeatedly, and calculate certain (numeric) values to get the desired information, e.g., PageRank or shortest paths. Pattern matching queries are declaratively given as graph patterns. The system finds every subgraph of the target graph that is topologically isomorphic/homomorphic to the query graph and satisfies any accompanying filters. For example, a PGQL (Property Graph Query Language) query over the pattern `MATCH (p1:person)-[:friend]->(p2:person)<-[:friend]-(p3:person) WHERE p1 <> p2 AND p2 <> p3 AND p1.name = "John Doe"` returns the persons p1 (called "John Doe") and p3 who have the largest number of common friends. Such queries can be used, for example, for friend recommendation.
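To make the friend-recommendation pattern more concrete, here is a minimal Python sketch of what such a query computes on a tiny in-memory graph. It is not PGQL and does not use any PGX API; the `friends` dictionary, the names in it, and the `common_friend_counts` helper are all hypothetical, and excluding p1 itself as a candidate for p3 is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical toy data: directed "friend" edges, person -> set of friends.
friends = {
    "John Doe": {"Ann", "Bob", "Cid"},
    "Eve": {"Ann", "Bob"},
    "Dan": {"Bob"},
    "Ann": {"John Doe"},
}

def common_friend_counts(friends, p1):
    """Count, for every other person p3, the friends p2 shared with p1.

    Mirrors the pattern (p1)-[:friend]->(p2)<-[:friend]-(p3) with
    p1 != p2 and p2 != p3. Skipping p3 == p1 is an assumption of this
    sketch, since a person trivially shares all friends with themselves.
    """
    counts = Counter()
    for p3, p3_friends in friends.items():
        if p3 == p1:
            continue
        # p2 ranges over the friends that p1 and p3 have in common.
        for p2 in friends.get(p1, set()) & p3_friends:
            if p2 != p1 and p2 != p3:
                counts[p3] += 1
    return counts

# Rank candidates by the number of common friends, largest first.
ranking = common_friend_counts(friends, "John Doe").most_common()
print(ranking)
```

A real graph engine evaluates the same pattern declaratively and over far larger data; the sketch only shows which subgraphs are being counted.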
Graph queries are a very challenging workload because they focus on the connections in the data. By following connections, i.e., edges, graph query execution can explore large parts of the graph, generating large intermediate and final result sets with a combinatorial explosion effect. For example, on a very old snapshot of Twitter (known as the "Twitter graph" in academic graph-related research papers), a single-edge query (e.g., (v0)→(v1)) matches the whole graph, counting 1.4 billion results. A two-edge query (e.g., (v0)→(v1)→(v2)) returns more than nine trillion matches. Additionally, graph queries can exhibit extremely irregular access patterns and therefore require low-latency data accesses. For this reason, high-performance graph query engines try to keep data in main memory and scale out to a distributed system in order to handle graphs that exceed the capacity of a single node.

Graph data processing and querying is a growing market with many applications in various domains including Finance, Cyber Security, Compliance, Retail, and Health Sciences. These applications often require querying very large graph data in a fast and efficient manner. Oracle has been providing graph solutions for both Big Data and Relational Database audiences, while there are also commercial competitors like Amazon Neptune and Neo4J, as well as open source alternatives including Spark GraphFrame. Our invention can provide significant differentiation for Oracle's solution over those competitors.

Our invention significantly improves the performance and memory consumption of Oracle's current in-memory distributed graph query engine, especially on very large graph queries where our competitors either fail to execute due to memory usage (e.g., Spark GraphFrame) or fall back to slow and inefficient disk-based joins (Neptune or Neo4J).
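The jump from single-edge to two-edge match counts quoted above for the Twitter graph can be reproduced in miniature. The sketch below assumes homomorphic matching (v0, v1, and v2 need not be distinct), in which case the number of (v0)→(v1)→(v2) matches is the sum over all vertices of in-degree times out-degree. The `count_matches` helper and the hub-shaped toy graph are hypothetical and only meant to illustrate the effect.

```python
from collections import defaultdict

def count_matches(edges):
    """Return (#one-edge matches, #two-edge matches) for a directed edge list.

    Assumes homomorphic matching, i.e. v0, v1, v2 need not be distinct.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    # Every edge is one match of (v0)->(v1).
    one_edge = len(edges)
    # Every incoming edge of v1 pairs with every outgoing edge of v1.
    two_edge = sum(in_deg[v] * out_deg[v] for v in in_deg)
    return one_edge, two_edge

# Hypothetical hub-like graph: 1000 accounts follow "hub",
# and "hub" follows 1000 other accounts.
edges = [(f"u{i}", "hub") for i in range(1000)]
edges += [("hub", f"w{i}") for i in range(1000)]

one_edge, two_edge = count_matches(edges)
print(one_edge, two_edge)
```

With only 2,000 edges the two-hop pattern already has a million matches, because high-degree vertices multiply their incoming and outgoing edges. Social graphs with many such vertices show the same effect at a much larger scale.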
Traditional Distributed Graph Traversal Approaches

In a distributed system, graphs are typically partitioned across machines by vertices, meaning that each machine stores a partition of the vertices of the overall graph, plus the edges corresponding to those vertices. For example, in the example graph, machine 0 stores vertices v0, v1, and v2, while machine 1 holds vertices v3 and v4. The edge v0→v1 is local to machine 0, while the edge v2→v3 is remote, as it spans machines 0 and 1.

For large distributed graphs, none of the traditional graph exploration/traversal approaches is suitable for distributed queries. Breadth-first traversals and distributed joins quickly explode in terms of intermediate results and pose a performance challenge over the network. Depth-first traversals are challenging to parallelize and result in completely random data access patterns. In practice, most engines use breadth-style traversals combined with synchronous/blocking communication across machines.

Breadth-First Traversals

In breadth-first traversals, the execution expands the query in width. The query pattern is matched to the target graph edge after edge. For example, matching pattern (a)→(b)→(c) to the example graph above could proceed by matching edge (a)→(b) to all graph edges, namely (v0)→(v1), (v0)→(v2), etc., and then expanding these intermediate results to match edge (b)→(c). Typically, the execution proceeds with synchronous traversals, i.e., the first edge is completely matched before moving to the next edge to match.

Expanding the query breadth-first is not ideal for a distributed system. First, materializing large sets of intermediate results at every step leads to an intermediate-result explosion. Second, breadth-first traversals typically have the benefit of locality (i.e., accessing adjacent edges one after the other), but locality in distributed graphs is much more limited, since many of the edges that are followed are remote. As a result, a large part of the intermediate results produced at each part of the query must be sent to the remote machines, creating large network bursts.

Distributed Joins

Graph traversals can also be expressed as relational joins. Following the edge (a)→(b) can be mapped to a join (or two) between the "vertex table" (holding the graph vertices) and the "edge table" (holding the edges).

Distributed joins face the same problems as breadth-first traversals, plus an additional important problem: they perform table joins instead of graph traversals on top of specialized graph data structures. Unsurprisingly, graph-specific data structures are much faster than generic joins (see the performance comparison of Oracle Labs PGX Distributed to Apache Spark GraphFrames later in this post).

Depth-First Traversals

In depth-first traversals, the execution expands the query in depth. The query pattern is matched to the target graph as a whole, result by result. For example, matching the pattern (a)→(b)→(c) to the example graph above could proceed by matching (v0)→(v1)→(v2), then (v0)→(v2)→(v3), etc.

The main advantage of expanding the query depth-first is that intermediate results can be eagerly expanded to final results, thus reducing the memory footprint of query execution. Nevertheless, depth-first traversals have the disadvantages of not leveraging locality and of more complicated parallelism. The lack of locality in depth-first traversals results in "edge chasing" – i.e., following one edge after the other as dictated by the query pattern – thus not accessing adjacent edges in order. The complication for parallelism arises because the runtime cannot know in advance whether there is enough work for all threads at any stage during the query. For instance, a query like the `MATCH (p1:person)-[:friend]->(p2:person)<-[:friend]-(p3:person) WHERE p1 <> p2 AND p2 <> p3 AND p1.name = "John Doe"` that I described at the beginning of the post will probably produce a single match for (p1). If this intermediate result is expanded in a depth-first manner, the number of intermediate results (and hence the parallelism) will grow slowly.

Dynamic Asynchronous Traversals For Distributed Graphs

The Parallel Graph AnalytiX (PGX) toolkit, developed at Oracle Labs, is capable of executing graph analysis in a distributed way (i.e., across multiple servers); we refer to this capability as PGX.D. We are experimenting in PGX.D with a new hybrid approach to executing graph traversals that offers the best of breadth-first and depth-first traversals.

Competing graph engines face the classic trade-off between performance and memory consumption for graph query execution:

- Sacrifice performance: One option is to use a fixed memory area (typically several gigabytes) for the execution, but spill intermediate results that do not fit to disk.
- Sacrifice memory: Another option is to perform the whole computation in memory. If the intermediate results do not fit in memory, this approach cannot compute this query on that graph.

PGX.D enables the in-memory execution of any-size query without sacrificing memory or performance. In particular, graph queries in PGX.D:

- Can operate with a fixed, predefined amount of memory for storing intermediate results;
- Only use this memory for computations, i.e., do not spill any intermediate results to disk; and
- Can essentially calculate queries of any size, because intermediate results are turned into final results "on demand" to keep memory consumption within limits.

On the technical side, PGX.D achieves the aforementioned characteristics by deploying:

- Dynamic traversals, using depth-first execution when needed, thus aggressively completing intermediate results and keeping the memory consumption within limits, and breadth-first execution when possible, thus removing the performance complexities of depth-first traversals.
- Asynchronous communication of intermediate results from one machine to the other, thus not blocking/delaying local computation due to remote edges.
- Flow control and incremental termination to keep global memory consumption (including messaging) within limits and guarantee query termination (i.e., avoid deadlocks).

Example

Consider matching the pattern (a)→(b)→(c) to our example graph presented above, and consider a worker thread currently starting to match from vertex v0. The thread can bind v0 as (a) and then try to expand to the edge (a)→(b). The worker could match (b) with v1. At this point, the dynamic traversal approach in PGX.D decides, based on how much memory the query already consumes, whether the worker will continue matching (a)→(b) edges (breadth) or should continue with the (b)→(c) edge (depth). In either case, the worker will eventually match v3 for (b), in which case PGX.D simply buffers the intermediate match in a message destined for machine 1 and continues matching the next (a)→(b) edge in a breadth-expanding manner. Of course, PGX.D controls the number of outgoing messages/intermediate results in order to keep the execution within limits.

You can find more details in our GRADES 2017 publication, which describes the main query runtime of PGX.D (it does not cover the local dynamic traversal, which is described in a follow-up publication currently under submission).

Comparing PGX.D to Open-Source Systems

We use the LDBC social network benchmark graph (scale 100, 283 million vertices, 1.78 billion edges) and queries. We adapted the queries to reflect the current features of PGX.D; these changes mainly include the removal of the HAVING clause, subqueries, and regular path queries. We compare PGX.D to Apache Spark GraphFrames (version 0.7 on top of Spark 2.4.1) and PostgreSQL (version 11.2). Both PGX.D and GraphFrames execute on 8 machines connected with InfiniBand. We perform 15 repetitions and report the median run.
Clearly, PGX.D is significantly faster than both GraphFrames and the traditional PostgreSQL RDBMS. PGX.D executes the total query suite 29.5 and 17.5 times faster than GraphFrames and PostgreSQL, respectively. In addition, PGX.D is configured to use approximately 16GB of runtime memory for intermediate results, while the other two engines are configured to use the whole 756GB (8x for GraphFrames) of memory available in the underlying machines. As I mentioned earlier in this post, GraphFrames implements graph traversals on top of distributed joins on dataframes.

Large-Scale Queries

In this experiment, we evaluate the engines with very large queries. In particular:

- Q1: Simple cycle; pattern (v1)→(v2)→(v1) with (a) a COUNT(*) aggregation and (b) AVG aggregations of vertex data;
- Q2: Two-hop match; pattern (v1)→(v2)→(v3) with (a) a COUNT(*) aggregation and (b) AVG aggregations of vertex data.

We execute these queries on graphs of increasing size:

| Graph | # Vertices | # Edges | Description |
|---|---|---|---|
| Livejournal | 484K | 68.9M | Users and friendships |
| Uniform Random | 100M | 1B | Uniform random edges |
| Twitter | 42.6M | 1.47B | Tweets and followers |
| Webgraph-UK | 77.7M | 2.97B | 2006 .uk domains |

This experiment highlights the true need for the scalable in-memory distributed graph traversal methodology of PGX.D. As the query exploration size increases, GraphFrames and PostgreSQL cannot keep up with the workload. Even with the two smallest graphs, PGX.D is on average 48 and 115 times faster than GraphFrames and PostgreSQL, respectively. Clearly, joins in PostgreSQL are significantly slower than graph traversals in PGX.D. With Q2 on Twitter and Webgraph-UK, we see that even the 8 x 756GB = 6TB of total memory (backed by 1+TB of disk) is not sufficient for GraphFrames. As in the previous experiment, PGX.D completes these queries with approximately 16GB of memory in each machine.

What's Next

We have explored dynamic asynchronous traversals in PGX.D only for graph querying and pattern matching with PGQL. We are currently exploring how to further leverage these fast in-memory explorations for machine learning. For example, we are developing large-scale random walks on top of PGX.D that will serve as the backbone for graph machine-learning solutions.

Conclusions

I briefly presented our new dynamic asynchronous traversal approach for distributed graphs in PGX distributed mode. Using this approach, PGX.D achieves fast, scalable, fully in-memory distributed graph queries with a small memory footprint, thus enabling graph processing on a whole new scale of graphs and queries. The hybrid/dynamic traversal functionality is on the PGX.D roadmap, so stay tuned for new information on its availability. For more information and to try PGX, you can visit the Oracle Labs PGX Technology Network Page.
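For readers who want a feel for the dynamic traversal idea described in this post, the following single-machine Python sketch matches the pattern (a)→(b)→(c) while holding only a bounded number of intermediate (a, b) matches. It is not the PGX.D implementation and leaves out distribution, asynchronous messaging, and flow control entirely. The `two_hop_matches` function, the `budget` parameter, and the small adjacency list are hypothetical; the edges only loosely follow the example graph mentioned in the text.

```python
def two_hop_matches(adj, budget):
    """Return all (a, b, c) with edges a->b and b->c, plus the peak
    number of buffered intermediate (a, b) matches.

    At most `budget` intermediate matches are held at any time
    (budget is assumed to be at least 1).
    """
    pending = []   # buffered intermediate results for (a)->(b)
    results = []   # completed (a, b, c) matches
    peak = 0

    def drain():
        # Depth step: complete buffered intermediate results into
        # final results, which frees the buffer.
        while pending:
            a, b = pending.pop()
            for c in adj.get(b, ()):
                results.append((a, b, c))

    for a, neighbors in adj.items():
        for b in neighbors:
            pending.append((a, b))          # breadth step
            peak = max(peak, len(pending))
            if len(pending) >= budget:      # memory limit reached: go deep
                drain()
    drain()                                 # finish whatever is left
    return results, peak

# Hypothetical example graph as an adjacency list.
adj = {
    "v0": ["v1", "v2"],
    "v1": ["v2"],
    "v2": ["v3"],
    "v3": ["v4"],
    "v4": [],
}

res_small, peak_small = two_hop_matches(adj, budget=2)
res_big, peak_big = two_hop_matches(adj, budget=100)
print(sorted(res_small), peak_small, peak_big)
```

Both runs find the same matches, but the small budget never buffers more than two intermediate results, while the large budget behaves like a pure breadth-first expansion and buffers every edge first. The memory limit decides when the traversal switches from breadth to depth.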
