MusicBrainz: A Semantic Web Service

This is an article I'm working on about MusicBrainz, the Semantic Web, Web Services, agoric economies, and other stuff. Feel free to send me feedback.

Introduction

Music has always caught the imagination of the public. From dreams of a giant "jukebox in the sky" over the Information Superhighway [DET] to the recent debate over Napster, music has always been the "killer app" used to describe new technologies. Of course, these dreams have never quite come about as planned. Instead of a "smart" machine seeking out music tuned to my tastes, I still have only the small number of choices on my radio dial. And ever since Napster started their filtering, music sharing on the Internet has become ever more difficult.

One of the things that underlies these ideas is their dependency on "metadata," or data about data. Metadata provides information about the artists, the titles, etc. All that information is attached to the music, but isn't part of it. The music world suffers from a lack of standardization in metadata formats, and also from a paucity of public metadata.

About MusicBrainz

The MusicBrainz project hopes to change this situation. It's a large database of music metadata and, even though it's only in beta testing right now, it already has almost 300,000 tracks in the database. MusicBrainz information is all user-contributed, providing what some have termed the "cornucopia of the commons" [BRI]. Unlike many situations, where each user decreases the value of the shared space (the so-called "tragedy of the commons" [HAR]), the easy duplication of electronic information creates a situation where each user makes the system more valuable.

Let's take an example: When you purchase a new CD and insert it into your computer, your audio player will probably come up with a generic name for it: "Audio CD 47" perhaps, complete with "Track 1," "Track 2," and so on. If the player used MusicBrainz, it would instead connect to the MusicBrainz server to see whether metadata about the CD was available. If it was, the player would rename the CD ("Amnesiac") and the tracks ("Pyramid Song," etc.). If it was not, you would likely fill in the track names yourself, if only for your own benefit. The player would then ask whether you wanted to share the information and, if you did, send it to MusicBrainz for everyone else to use. This kind of functionality is currently implemented in the FreeAmp player, and MusicBrainz plugins will hopefully be released for other players soon.
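To make that flow concrete, here is a minimal sketch in Python. It is not the MusicBrainz client library: the MetadataServer class, the identify_disc function and the disc ID string are all names I made up for illustration, and the "server" is just a dictionary in memory. The point is only the order of the steps: look up, fall back to the listener, then contribute.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Album:
    title: str
    tracks: List[str]


@dataclass
class MetadataServer:
    """Hypothetical in-memory stand-in for a shared metadata server."""
    albums: Dict[str, Album] = field(default_factory=dict)

    def lookup(self, disc_id: str) -> Optional[Album]:
        # Return the stored album, or None if nobody has contributed it yet.
        return self.albums.get(disc_id)

    def submit(self, disc_id: str, album: Album) -> None:
        # Store a contribution so that later lookups succeed.
        self.albums[disc_id] = album


def identify_disc(
    server: MetadataServer,
    disc_id: str,
    track_count: int,
    ask_user: Callable[[int], Album],
    share: bool = True,
) -> Album:
    """Return metadata for a disc, contributing it if the server lacks it."""
    album = server.lookup(disc_id)
    if album is not None:
        return album
    # Nothing on the server: fall back to what the listener types in.
    album = ask_user(track_count)
    if share:
        server.submit(disc_id, album)
    return album
```

The first listener to insert a given disc pays the cost of typing; every listener after that gets the names for free, which is the "cornucopia" effect in miniature.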

MusicBrainz wasn't the first to implement this idea. Back in 1996, the Internet Compact Disc Database (CDDB) was created using a very similar system. It grew incredibly fast, with its users contributing track and title information for 800 new CDs each day. However, its biggest problem was that it had no moderation system and, thus, soon became filled with typos, misspellings and duplicate data. Worse yet, it was later bought by Gracenote, who imposed severe restrictions on the use of the service. The many contributors who had helped build up the database were outraged that they could no longer use it freely.

To replace CDDB, many projects sprang up. One of them, the CDIndex, later became MusicBrainz. To prevent a CDDB-style takeover, a number of safeguards are being put in place. The database is freely available to the public under the OpenContent license, ensuring that no one can take control of the information without giving back to the community. Also, MusicBrainz is setting up a distributed network of servers, so that no one server has full control of the database.

Of course, MusicBrainz is adding features that weren't available with CDDB. Using audio fingerprinting software from Relatable, it can also get music metadata for an MP3 file. In addition, to prevent the problems that CDDB had, MusicBrainz includes a moderation system allowing people to correct mistakes in the database.

Semantic Web Services

MusicBrainz is one of the first of what might be called Semantic Web Services. These are a combination of the ideas behind the Semantic Web and Web Services. The Semantic Web is the project to add some machine-processable information to the largely human-language content currently on the Web. Web Services is a similar concept aimed at sending machine-processable information between organizations in an attempt to automate processes.

MusicBrainz does both. It uses RDF, the Resource Description Framework, which is the foundational language of the Semantic Web. In addition to providing an HTML version of its content, MusicBrainz provides all of its information in RDF. It gives all the major items in its database (Artists, Albums, Tracks, etc.) URIs, so that they can be referred to by others and used in numerous applications. All of this is done using open protocols, so that anyone can easily access the data for any purpose. It also provides an RDF-based service API that applications can use to query the database and submit new information. An open source library is available so that programmers can easily call the MusicBrainz API and add this functionality to their applications.
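For readers who haven't met RDF, the core idea is that every statement is a (subject, property, value) triple, and that subjects and properties are named by URIs. The Python sketch below models that with plain tuples. Every URI in it is an example.org placeholder that I invented, not a real MusicBrainz identifier or term, and the match function is a toy pattern query rather than the actual service API.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# All URIs below are made-up placeholders, not real MusicBrainz identifiers.
EX = "http://example.org/"
ARTIST = EX + "artist/1"
ALBUM = EX + "album/1"
TRACK = EX + "track/1"

NAME = EX + "terms/name"
TITLE = EX + "terms/title"
CREATOR = EX + "terms/creator"
HAS_TRACK = EX + "terms/track"

graph: Set[Triple] = {
    (ARTIST, NAME, "Radiohead"),
    (ALBUM, TITLE, "Amnesiac"),
    (ALBUM, CREATOR, ARTIST),   # the value is another item's URI
    (ALBUM, HAS_TRACK, TRACK),
    (TRACK, TITLE, "Pyramid Song"),
}


def match(
    graph: Set[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in graph:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


# Which items name this artist as their creator?
albums = [subject for subject, _, _ in match(graph, p=CREATOR, o=ARTIST)]
```

Because the artist is referred to by URI rather than by a string like "Radiohead", anyone else's data can point at the same artist without ambiguity.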

Since the MusicBrainz format is open and uses RDF, the industry standard for metadata, it can be repurposed for numerous applications. For example, other people can build on top of the data provided by MusicBrainz. Music vendors can add information about where to legally purchase the music, and use the MusicBrainz metadata to enhance the information available on their websites.

File sharing systems (like Napster, Freenet or Audiogalaxy) can use the metadata to provide more information about the MP3s that are available for download, or to make it easier to search for the song you're looking for. Artists can provide links to a "tip jar" where appreciative fans can donate money if they like the music. In fact, the Espra project is doing something like this, building a decentralized architecture on top of Freenet and music metadata to provide a direct link between the artists and their fans. Its aim is a gift economy, similar to the patronage system which funded some of the greatest artists of history.

Other projects, which may not want to use the MusicBrainz database, can still use the RDF "terms" which it has defined (things like "artist", "album", etc.) for their own databases. This provides them with a "semantic bootstrap" -- terms that already have a well-defined meaning and are widely used.
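Here is a rough Python sketch of why shared terms matter. Two unrelated databases, a radio station and a fan site (both invented for this example, along with every URI in it), describe their records with the same two terms. Because the terms match, their data can be combined with a plain set union and queried with one function, with no translation step.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical shared vocabulary; these URIs are placeholders.
TITLE = "http://example.org/terms/title"
ARTIST = "http://example.org/terms/artist"

# Two independently maintained databases that reuse the same terms.
radio_station: Set[Triple] = {
    ("http://radio.example/rec/9", TITLE, "Pyramid Song"),
    ("http://radio.example/rec/9", ARTIST, "Radiohead"),
}
fan_site: Set[Triple] = {
    ("http://fans.example/song/42", TITLE, "Knives Out"),
    ("http://fans.example/song/42", ARTIST, "Radiohead"),
}


def titles_by_artist(graph: Set[Triple], artist_name: str) -> List[str]:
    """Return the sorted titles of every record credited to artist_name."""
    subjects = {s for s, p, o in graph if p == ARTIST and o == artist_name}
    return sorted(o for s, p, o in graph if p == TITLE and s in subjects)


# Merging is just set union, because both sides speak the same vocabulary.
merged = radio_station | fan_site
```

Had each site invented its own word for "artist", the query would need a mapping table for every pair of sites; the shared terms are what make the union meaningful.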

Because of the flexibility of the RDF format, all of this additional data can be added without breaking backwards-compatibility with clients that don't expect it or use it. Since these are all open protocols, this means that clients have a choice of which server they want to receive their data from, and may want to choose servers that provide them with higher-quality information, or more information. All the servers are compatible, so users can easily switch.
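The backwards-compatibility point can be sketched in a few lines of Python. An older client only knows about titles; a richer server adds purchase links and tip jars. Since the client picks out the properties it understands and skips the rest, the extra data does no harm. The property URIs and the describe function here are my own placeholders, not part of any real MusicBrainz vocabulary or API.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]

# Placeholder URIs, invented for this sketch.
TRACK = "http://example.org/track/1"
TITLE = "http://example.org/terms/title"
BUY = "http://vendor.example/terms/purchaseLink"
TIP = "http://artist.example/terms/tipJar"

# What a basic server returns, and what a richer one returns.
basic = [(TRACK, TITLE, "Pyramid Song")]
enriched = basic + [
    (TRACK, BUY, "http://vendor.example/buy/1"),
    (TRACK, TIP, "http://artist.example/tips"),
]


def describe(
    triples: Iterable[Triple],
    subject: str,
    known: Tuple[str, ...] = (TITLE,),
) -> Dict[str, str]:
    """Collect only the properties this client understands."""
    return {p: o for s, p, o in triples if s == subject and p in known}


old_client_view = describe(enriched, TRACK)                     # ignores extras
new_client_view = describe(enriched, TRACK, known=(TITLE, BUY)) # uses one of them
```

The old client gets the same answer from either server, so switching servers never breaks it; a newer client simply sees more.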


Figure 1: A graphical representation of some MusicBrainz RDF data about a "Portishead" album.

Next Stop: The Semantic Web

Semantic Web Services like MusicBrainz are an exciting part of the growing number of Semantic Web tools and applications. The key to the power of the Semantic Web is to start building apps like these that publish their information. Once we get such systems to begin talking with each other, the possibilities are endless. Their combinations can reveal potential that we wouldn't have thought of before, and empower people, like music fans long separated from artists by the record companies, to take control of their world. It's an exciting ride!

More Info

Aaron Swartz is a contributor to the MusicBrainz project, especially its metadata initiative. He is also a member of the W3C's RDF Core Working Group and a co-founder of SWAG: The Semantic Web Agreement Group. You'll likely find him in the RDF IRC channel, working on some interesting new Semantic Web software. His website is at http://www.aaronsw.com/ and you can email him at me@aaronsw.com.

MusicBrainz is developed by programmers around the world, led by our Mayhem and Chaos Coordinator, Robert Kaye, with the source code available as free software. The project is sponsored in part by Bitzi, Relatable, EMusic, O'Reilly and Associates, and users like you. Its website is at http://www.musicbrainz.org/.

FreeAmp is a free audio music player which uses the MusicBrainz database. Grab a copy at http://freeamp.org/.

Espra is an open source file-sharing client which attempts to produce a decentralized system for the distribution of information and the creation of a gift economy. More information (and a free download) is available at their website: http://www.espra.net/.

[DET] What Will Be by Michael Dertouzos.

[BRI] The Cornucopia of the Commons by Dan Bricklin. http://www.bricklin.com/cornucopia.htm

[HAR] The Tragedy of the Commons by Garrett Hardin, 1968. http://dieoff.com/page95.htm

Part of LogicError.