An interview with The Data Charmer.
By Giuseppe Maxia on Jan 15, 2009
The Data Charmer, a.k.a. The Wizard, is a freelance database consultant with a long career in several IT fields. He is well known for his Perl and SQL expertise, although he is proficient in several other languages, such as C++, shell scripts, and Italian.
He has a split personality, one of which lives in virtual space and time, floating around UTC+1. The other (or the others, as there is a dispute about how many they are) is less documented and some people believe it to be fictional. He teaches Creative Biography at the University of Euphoria, CA (also known as Euphoric State).
G.M. Hello, D.C. Thanks for agreeing to be interviewed. I'll start with a question that most people ask. Who are you?
D.C. This is not really a question I'm willing to answer. Besides, the answer would be misleading. In the Internet age, I can be several people at once. Even presenting myself with the same name, I would be known as a different kind of individual in each place I appear. If I discuss philatelic matters in a specialized forum, they won't be interested in my involvement with the Perl community, thus I keep separate names for different places.
G.M. Ah! You're a philatelist, then?
D.C. No. That was just an example.
G.M. You have been associated with a few names in the IT field. Some have even said that you and I are the same person. What is your comment on this allegation?
D.C. Believe it or not, this question can't be answered in full. I may well be a different person when this interview is over. I could even be you, even though I recall many times when you have come to me for help, but the matter of identity is not compelling. I'm content with my fuzzy definition of a hyperspace entity.
G.M. OK. Let's move on to more definite matters. What's your involvement with MySQL?
D.C. I started using it about eight years ago. I had a problem to solve quickly, and MySQL was available, easy to install, and it fit the bill wonderfully. Then I found out that it was the right tool for a whole range of medium- to high-level problems, and I started using it instead of other more famous databases, despite the wide criticism from most Cargo Cult programmers, who claim that MySQL is not a real DBMS because it lacks this and that feature.
G.M. Whoa! Hold on. That's a mouthful. What's a Cargo Cult programmer?
D.C. It's a funny, but very effective concept introduced by Richard Feynman in regard to science. He recalls that during WWII, on some Pacific islands, the inhabitants observed American soldiers gesturing on an airstrip when military cargo planes were landing, carrying every sort of goods. When the war ended and the troops left, the natives tried to recreate the conditions for the cargo to land. A man with half coconuts on his ears wandered the airstrip, waving wooden paddles, while others inspected the horizon from the height of makeshift control towers. But of course no cargo arrived as a result of their efforts. Likewise, there are many programmers who solve their problems with cut-and-paste, without having any idea of why the original code was structured in that way.
G.M. Are you telling me that all MySQL critics are Cargo Cult programmers?
D.C. No. But a good share of them are. Why? It's a matter of statistics. The Internet has tens of millions of dynamic sites. Behind each of them there is some sort of database, but that does not mean that all their users are database experts. On the contrary, most web developers, in my experience, know very little about databases, and they use databases through wrappers. If you ask those people which database they are using, they may tell you it's 'Java' or some trendy CMS brand. I came across many programmers who were using Oracle through one such wrapper, and they spent much time explaining to me why Oracle was the right choice for the task, straight from some marketing ad. However, I was auditing their code to find the reason for a performance bottleneck. They wanted to show me the interaction with the Java classes that handled the database. Instead, I looked at the database logs, and I found an impressive number of commits and no join clauses in their select statements. It turned out that they were not using database transactions at all, and they were emulating transactions with client code. And what's worse, their code was emulating joins as well.
G.M. This looks too ugly to be a generalized case. Surely the majority of database programmers are not like that.
D.C. I would like to share your optimism, but my personal experience tells me exactly the opposite. Most database programmers have little grasp of relational theory, and thus most performance and scalability issues come down to a lack of basics.
G.M. Perhaps we have gone too far from the main subject of this interview. Let's try to get back on track. How did you get involved with databases?
D.C. I had my first encounter with SQL about twenty years ago. I attended an Oracle course and I started using it at my employer's. A few years later I was introduced to formal relational theory during a long course in structured analysis. Before that, I knew about relational theory as a necessary complement of SQL, not the other way around. Once I got acquainted with formal relational theory, I found out that it suited me quite well. I can design data in 3NF off the top of my head. Thus, when I see a beginner struggling with 1NF or even breaking it, I can see immediately what's wrong.
G.M. After your initial acquaintance with Oracle, have you used it a lot?
D.C. Not really. At my employer's it was ruled out after a few months, in favor of a homemade solution. I'm talking about mainframe applications, which were still common at that time. Oracle was supposed to replace a huge non-relational database that was designed in the 1960s and was showing its age. Personal computers were not as ubiquitous as they became after 1995, and the idea of using a PC for a database was considered sort of bizarre by the few professionals in this field. There was no Linux and no widespread open source yet, and MSDOS was for all practical purposes the only choice of OS on the market. In those years I managed to migrate the mainframe database to a relational one, using a PC-based API for an RDBMS written in C. There was no MySQL in sight yet, and that wonderful API did not have a SQL interface. I wrote a simple wrapper that sounded like SQL, and with that I converted the existing financial procedures to the new system. Since then, I have always been curious about the internals of database systems. When I found MySQL, with its open code, it was love at first sight.
G.M. Let's skip to a related subject. You said that at the end of the 1980s MSDOS was the only choice. What's your take now? Which is your OS of choice?
D.C. As an MSDOS user, I became quite an expert at circumventing its limitations, the biggest of which was the lack of multitasking. Although multitasking processors had been available for a few years (80286 and 80386), there was no way of exploiting that. When Windows 3.1 hit the shelves, offering simple multitasking, I grudgingly embraced the new system. It was clear from the beginning that most of the knowledge accumulated during the years of MSDOS usage was nearly useless with Windows 3.1, which had its problems as well. And so I waited for the next major miracle, Windows 95, announced as the ultimate problem solver. What was ultimately true, though, was the realization that most of the experience I had with MSDOS and Windows 3.1 was now going down the drain. It seemed that Microsoft was making a point of discarding those who had invested time and money to become proficient with its products. In the meantime, in 1993 I had a serendipitous encounter with an alternative operating system. Its name was Linux, and it could do multitasking much more efficiently than Windows. I started using it for some projects, and I appreciated its powers. In 1999, when it became clear that the successors of Windows 95 were going to make me discard my previous knowledge yet again, I began to use dual-boot computers with Linux and Windows. In 2001, I abandoned every surviving hope of seeing a usable OS from Microsoft, and since then I have had only full Linux installations on my machines. This year I started using a Mac laptop, and to my delight I could apply most of my Linux expertise to this OS. I am currently using three Linux boxes for sheer power and development, and the Mac for mobility.
G.M. What are your tools of the trade? Editors, and so on?
D.C. I'm a command line guy. I do my text manipulation from the shell prompt. Between shell built-in commands and Perl I do most of my work. My editor of choice is vim. I know that there are more powerful ones, but vim (or at least 'vi') is ubiquitous. If you can use it, you can work everywhere. Vim is to editors what MySQL is to databases.
G.M. You have been called a wizard, a guru, a hacker. How do you describe yourself?
D.C. I am an experienced user of tools and I'm always curious about how things work and eager to learn new things. I don't call myself a hacker, but I don't object to being called that. About wizard and guru, well, it makes me smile when I realize that I can still surprise someone with my old tricks.
G.M. If you could put one piece of advice into the universal bag of tricks for technology newbies, what would your advice be?
D.C. Be curious.