KDnuggets has been around in one form or another for decades. It has witnessed every trend, first in data mining and now in data science. Matthew Mayo, a Machine Learning Researcher, has served as Editor of KDnuggets for the past four years. Here, he talks to Justin Charness, Director of Product Marketing for Oracle AI, about quenching the growing thirst for knowledge about data science.
1. What are the top topics trending on KDnuggets right now, and why?
The overarching trends are toward beginner-to-intermediate technical articles and tutorials on a variety of technical topics, from data science software to deep learning concepts to algorithm overviews to project implementations. We also see considerable interest in introductory opinion pieces on the data science profession, such as how to gain the skills necessary to be effective and how to break into the field.
As for specific topics, natural language processing, data preparation, neural networks, and the Python data science ecosystem have all been trending recently. Why are these topics currently trending? The simplest (and least interesting) answer is that our trends seem to follow general industry trends. The more honest answer takes into account our team's biases and areas of knowledge; for instance, collectively we have much stronger Python skills and knowledge than, say, R skills, and so this is reflected editorially.
2. What have been the biggest ways the KDnuggets community has changed in the past five years?
Speaking purely in terms of community membership, KDnuggets has dramatically increased its website readership, newsletter subscribers, and social media reach. This obviously correlates with having more practicing data scientists, machine learning engineers, and data professionals now than five years ago. There are also many more people aspiring to these roles, and this is a sector that KDnuggets has started to cater to more.
Five or more years ago, given the smaller number of people involved in data science, those who sought out a resource like KDnuggets likely did so knowing what they were in for career-wise. Today, we often have folks in the aspirational stages of their careers, not yet quite knowing what they are getting themselves into. And that's OK; KDnuggets is here to help.
But we certainly have material for the more seasoned data scientists, too. While there is help for those looking to get into the field, there are also a number of resources for those looking to up their game.
Keep in mind that data science has changed quite dramatically in the past five years, in both the concept of what data science is and the practical aspects of what data scientists do. We have gone from what was probably more generally considered a "big data" community to a "data science" community in that time, reflecting a shift in audience as well as a shift in the industry at large.
3. How important are community-style, open-knowledge resources to the continuing advancement of data science?
I'm a proponent of open everything. I think open-source tools, open-knowledge resources, and community-based approaches to learning and problem-solving are crucial in any field, not just data science. The more information we all have access to, the better.
Specific to data science, these resources will continue to be immensely helpful. At KDnuggets, we take a community-style approach to information exchange, frequently publishing quality articles by independent, third-party contributors. Finding great material from the community and passing it along to others truly is what KDnuggets was founded upon.
But there are all sorts of other great community-style resources available. I see such cooperation on Twitter all the time. My LinkedIn feed is full of individuals seeking, providing, and otherwise exchanging information, which is fantastic. The data science community in general is very open, welcoming, and forgiving, which is beneficial for everyone involved. I have yet to have a bad experience when asking someone for insight or assistance, whether or not the individual was able to help. In this respect, the community is great.
4. What has your personal journey as a Machine Learning Researcher been like, and how has your work with KDnuggets influenced it?
Prior to KDnuggets, I worked on my master's in computer science concurrently with a graduate diploma in data mining. I tailored my coursework as much as possible toward machine learning, including some independent research and my thesis. I went from there directly to KDnuggets, and I was able to continue on in research mode indefinitely from that point onward, as the environment provided me with a breadth of topics to dive into and get myself up to speed on. From there, my individual interests allowed me to explore certain topics in depth.
KDnuggets is a great place for me. I get to investigate the work of others—be it academic papers, blog posts, or tutorials—while simultaneously writing and sharing information with the community, and I’m able to perform some research on my own as well.
I hope to soon expand on this and share more of what I'm doing with the community, and I definitely have identified areas in which I will be pursuing research in the near future.
5. Looking ahead, how do you think training of data professionals will evolve in the near future?
"Data professional" seems to have become synonymous with "data scientist" in my view, whether or not a given data professional is, in fact, a scientist. I will answer from this perspective.
Data science means everything and nothing at once, which is why I'm not an advocate of the term. I think that narrower professional descriptions such as machine learning engineer, data analyst, data engineer, or what have you—using “data scientist” only when truly warranted—are beneficial. Even then, what one data scientist does may have no overlap with what another does, given the multitude of possible approaches to asking questions and getting answers from data.
That said, I think technical training for these various data professionals is becoming especially effective given the different niches and roles that data professionals play. We can't expect everyone in a field as broad as "data" to learn all they need from monolithic training routes, and so the evolution of training with narrower, more specific expected outcomes will fill this gap. This has already started with many different types of online learning and courses, and I expect the trend will gain steam in the near term.
This doesn't mean that many of these professions don't require traditional or advanced degrees as a foundation. I don't believe that a data scientist can be crafted via a 10-week online course, but I certainly believe that such courses can augment traditional education, and in some cases provide the training necessary to make transitions between niches.
The takeaway is that short, narrow, focused training options will increase in importance for data professionals looking to upgrade their technical skills.
6. What are some knowledge resources that you use that you recommend to data scientists?
Resources that I use shift over time, but when it comes to what I'd recommend to data scientists, I would certainly start with a little site called KDnuggets.
Sticking with the beginner-to-intermediate theme, if you are looking to round out your blog roll, Towards Data Science is another top-notch community-based resource for data scientists, both practicing and aspiring. The Open Data Science blog is also a good one, as is Kaggle's.
When it comes to more in-depth learning resources, I recommend fast.ai to anyone looking to learn deep learning or traditional machine learning. Another set of resources I have enjoyed recently is the projects helmed by Rachael Tatman of Kaggle. She livestreams an academic paper reading group (often related to natural language processing) weekly on Kaggle's YouTube channel. She also frequently livecodes tutorials and projects she undertakes, and has casual discussions with other data scientists in her "Kaggle Coffee Chat" segments.
Also, feel free to connect with me on LinkedIn for a steady supply of data science resources from KDnuggets and around the web.
To learn more about AI and data science, check out the Oracle AI page.