Python Applied to Machine Learning and Data

An overview by Travis E. Oliphant.

Most tech companies today are struggling to figure out how they can best work with their data. Travis E. Oliphant, the founder of the startups Quansight and Anaconda and the primary developer of the NumPy and SciPy packages for Python, gave a talk at the North Star AI conference, powered by Proekspert, on how Python can be used effectively for machine learning and data, the heart of ML-driven technology.

Oliphant’s connection to Python dates as far back as 1997, when he was working with version 1.4. The very first problem he focused on was actually a data problem, and this led him to understand that in order to do anything with machine learning you have to get the data right.

While data seems to be everywhere these days, said Oliphant, the biggest problem is how you gain access to and utilize that information.

Applying Python to machine learning problems

In Oliphant’s mind, Python is one of the best languages to apply to machine learning problems. During his talk, he gave a thorough overview as to why and presented how he imagines artificial intelligence might develop in the future.

Not Artificial but Augmented Intelligence

When Oliphant thinks of AI, he thinks of “augmented” rather than “artificial intelligence.” For at least the next fifty years this technology will be more about “empowering people rather than replacing them,” he said. To be sure, AI might take over some tasks that you are doing today, but then you will shift to doing something more important, while the machine takes care of more mundane assignments.

What can AI be used for? Oliphant noted a multitude of possibilities.

Any time you have a complex function of many variables for which you want an understandable input and output, you can apply AI. Self-driving cars, medicine, and geophysics are a few of the fields where AI technology can make a big difference.

Big companies using Python in machine learning

In order to apply ML and be successful with your applications, Oliphant noted, you will need to work with people who have domain expertise, who know the business. The good news is that every major tech company is now involved in the AI field. Microsoft, Google, Apple, IBM, and Amazon are pioneering machine learning and artificial intelligence research.

These AI applications, said Oliphant, are on the verge of broad usability. Moreover, they are all written in Python.

Obstacles in the Industry

There is amazing promise in the AI and machine learning industry, but we have a very long way to go, said Oliphant. The current landscape contains challenges such as organizational infrastructures that make data-sharing difficult, and out-of-date regulatory structures that were created for a different era. Technology is changing faster than education can keep up, and software is lagging behind hardware advances; programmers are not yet tapping into the full potential of the hardware that is available to them.

As quickly as things progress, there remain many silos of technological advancement and a general lack of integration when it comes to methodology. “Can’t we figure out frameworks that everyone can use?” asked Oliphant.

AI exists in some basic forms now, but the dream is something bigger. There is still so much that needs to be done in order for the promise of AI to become actual capability, said Oliphant.

Anaconda, a Possible Solution

Launched by Oliphant himself, Anaconda is an open-source distribution of the Python and R programming languages that simplifies package management and deployment. It can be used to great effect for data science and machine learning applications. Moreover, its open-source package and environment manager, Conda, is language-agnostic and can distribute software for any language.

Ecosystem frameworks

When it comes to AI, Python is not enough, explained Oliphant. You need ecosystem frameworks to solve your problems as well as machine learning tools—and Anaconda can bring all of these technologies together. “One of the key things we need is AI integrators bringing people these capabilities in everyday applications.”

Everybody loves modeling, predicting, classifying, and visualizing in AI, and these are fairly easy tasks to complete. The harder things are feature labeling, data cleaning, data extraction, deploying, reproducing, and scaling, and this is where Anaconda can help.

Oliphant discussed two other Anaconda tools, Numba and Dask, that he believes are crucial for anyone working in the realm of machine learning. Numba, he explained, is designed to help with scaling up. It is an open-source just-in-time compiler for Python that comes with a CUDA simulator. It can compile for both the CPU and the GPU, and it makes array processing easy. Most importantly, according to Oliphant, Numba can execute code about 2.7 times faster than NumPy, although the actual speedup depends on the workload.
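
To make the Numba description more concrete, here is a minimal sketch of compiling an explicit loop over a NumPy array. It is an illustration rather than an example from the talk: the function name sum_of_squares and the test data are made up, and the fallback that runs the loop as plain Python when Numba is not installed is an assumption added so the snippet stays runnable everywhere.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without Numba, run the loop as plain Python.
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    """Explicit loop; with Numba it is compiled to machine code on first call."""
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(1000, dtype=np.float64)
result = sum_of_squares(data)

# The compiled loop should agree with the vectorized NumPy computation.
print(result, float(np.dot(data, data)))
```

Whether a compiled loop like this beats the NumPy version, and by how much, depends on the array size and the hardware, so it is worth timing both on your own data.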

Dask is a parallel computation library for scaling NumPy arrays and pandas DataFrames. With Dask you can build collections of arrays or dataframes that are larger than memory and use them in distributed environments. Its task scheduler is optimized for computation and helps run your custom algorithms on distributed nodes. Dask also has diagnostic dashboards that give users insight into performance.
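
The chunked-array idea can be sketched in a few lines. The example below is illustrative and not taken from the talk: the function name mean_of_squares and the sizes are hypothetical, and the NumPy loop used when Dask is not installed is an assumption that mimics, by hand, the block-by-block processing Dask schedules automatically.

```python
import numpy as np

try:
    import dask.array as da
except ImportError:
    da = None  # Assumption for this sketch: fall back to a manual chunk loop.


def mean_of_squares(n, chunk_size):
    """Mean of i**2 for i in 0..n-1, computed one chunk at a time."""
    if da is not None:
        # Dask splits the array into chunks and builds a lazy task graph;
        # nothing is computed until .compute() is called.
        x = da.arange(n, chunks=chunk_size, dtype="float64")
        return float((x ** 2).mean().compute())
    # Manual equivalent: only one chunk is held in memory at any time.
    total = 0.0
    for start in range(0, n, chunk_size):
        block = np.arange(start, min(start + chunk_size, n), dtype=np.float64)
        total += float((block ** 2).sum())
    return total / n


print(mean_of_squares(1_000_000, 100_000))
```

The array here easily fits in memory; the point is only that the same code shape keeps working when it does not, because each chunk is processed independently.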

You can find more information about these tools on the Anaconda website.

Using AI in Your Organization

To end his presentation, Oliphant discussed how these tools should be applied within an organization or company. How do you actually go about integrating AI into your technology?

Using machine learning and artificial intelligence in any way requires a process, he said. First, you have to bring your data together. Here, he suggested using visualization tools because the best way to understand your data is to look at it. Next, it is important to do AI brainstorming and consult with people who have real experience in the field. Once you have found the “right” features for your work, it is time to build and validate your model, repeat that for many models, and then publish the models and manage them over time.
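
The "build and validate, then repeat for many models" step can be sketched as follows. This is a toy illustration under stated assumptions, not Oliphant's workflow: the data is synthetic (a noisy quadratic), the candidate models are polynomials of different degrees, and the helper name validation_error is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bring the data together (synthetic here: a noisy quadratic relationship).
x = rng.uniform(-3.0, 3.0, size=200)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, 0.3, size=200)

# Hold out a validation set that the models never see during fitting.
x_train, x_val = x[:150], x[150:]
y_train, y_val = y[:150], y[150:]


def validation_error(degree):
    """Fit a polynomial of the given degree; return its validation MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    predictions = np.polyval(coeffs, x_val)
    return float(np.mean((predictions - y_val) ** 2))


# Build and validate several candidate models, then keep the best one.
scores = {degree: validation_error(degree) for degree in (1, 2, 3, 4)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

Scoring every candidate on data it was not fitted to is what makes the comparison meaningful; the straight-line model underfits the curved data and should score clearly worse than the others.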

Following such a rigorous approach to machine learning and armed with tools such as Anaconda, we can start making the transition from the AI we have today to a more seamless AI of the future, concluded Oliphant.

Save the date – March 7th, 2019!
North Star AI powered by Proekspert is coming again and tickets are now available.
More info: aiconf.tech

