Saturday, May 31, 2008

What’s next for copyright?

Everyone knows there is a copyright war going on right now. Big companies own the copyrights to virtually all music, movies, and television shows, and they want to make large profits off those rights. People, on the other hand, want to be able to play their media when and where they want, and ideally they'd like to get it for free. This conflict shows up in a number of guises – battles over internet piracy, digital rights management, and BitTorrent, to name a few – and it shows no signs of being resolved any time soon. I do think there is hope for a solution, but the industry needs to accept that it can't control everything.

To see what that might look like, let's take a look at one website that ignores copyright and gets away with it: YouTube. Users can upload and watch virtually anything, and due to the sheer number of uploads, copyright holders can't keep up. It survives because there is a big company protecting it from lawsuits, but that isn't why it's popular. Its biggest feature is ease of use – users go to YouTube because it is the best way to get the content they're looking for. YouTube already makes lots of money; if publishers were to learn from this, I think they'd have a winning strategy.

Of course, the challenge is to turn this example into a general solution. I would say that big publishers can do it, but they need to change their business model. They should aim to own the brand and be the best at distribution – to make it as easy as possible for people to get the content they want. I'll give a couple of examples of how I think this could go.

I picture a website much like YouTube, but controlled by a publisher. Users can visit it to watch movies and TV shows for free, at their leisure. To fund this, it uses the tried and true internet model: advertising. Most of it is general advertising, keyed to the content of the movie, and the rest promotes the publisher itself – collector's edition DVDs, upcoming movies by the same director, etcetera. Sure, people will still pass movies around on BitTorrent, but the official site is quick and convenient. Prosecuting people who share files then isn't really necessary, since many of them will make it to the publisher's site on their own. Similarly, people won't fight as hard to protect other sites hosting the content, since there is an easy, legal, and free channel for the same thing. Naturally this wouldn't be a replacement for physical media, but the shows will be on the internet regardless of what publishers want, so they may as well get a slice of it.

For music, this could go even further. A music company could run its own download site or tracker, putting up the content themselves: high quality, official, and free. They could encourage users to pay if they like the music, buy physical copies, etcetera. Remind you of anything? Radiohead has already done this. Once this is in place, they could go after other distributors, to keep people coming to the 'official' site. Again, people will share the music. But who cares? It's free unless they choose to pay, and let's face it, they are going to share anyways.

So why would this kind of scheme work? A number of reasons:

  • It is easy to do. Laws don't need to be changed, and people won't fight it.
  • Profits are still good. Maybe not as high as they once were, but the old model is dying anyways. And compared to physical media, distribution costs are virtually non-existent, so people wouldn't need to pay nearly as much per piece of media for this to work.
  • Publishers have already lost control; they just don't know it yet. If you can't control your users, make them want to come to you.

There are lots of signs that this kind of model is coming. The biggest problem is that it isn't being done by the publishers: either artists are breaking off and trying it themselves, or new companies are springing up to sign deals and try it for them. If we can somehow get the publishers on board, I think the whole problem of draconian copyright enforcement will slowly disappear. Or at least, it won't be the users themselves getting slapped with lawsuits.

Thursday, May 29, 2008

NOT Puzzle Solution

Edited 2012/08/16: Updated problem attribution.

Last week, I closed with a puzzle: "Invert three inputs using only two NOT gates." I should start off by pointing out that I cannot take credit for this problem; it dates back to at least the 70s and was featured in HAKMEM by Richard Schroeppel. I myself learned about it via Clive Maxfield. I'm also told that Guy Steele beat me to the generalization to the n-input case; however, I cannot verify this, and the solution presented here is entirely my own.

The solutions that follow are just two of many possible ways of looking at this. Structurally, they are by no means the simplest solutions, but I find them to be conceptually the simplest. If you haven't tried to solve the puzzle yet, I heartily recommend it; knowing the solution is so much more rewarding if you have.

Solution for 3 inputs

My solution to this puzzle involves reasoning about the number of inputs with the value '1'. If none of the inputs are 1, then all of the outputs must be, and vice versa. To do this, we create a set of temporary variables with the goal of knowing exactly how many inputs were 1.

3 := A ∧ B ∧ C
2or3 := (A ∧ B) ∨ (B ∧ C) ∨ (A ∧ C)
0or1 := ¬ 2or3 (the first NOT)
1 := 0or1 ∧ (A ∨ B ∨ C)
0or2 := ¬ (1 ∨ 3) (the second NOT)
0 := 0or1 ∧ 0or2
2 := 0or2 ∧ 2or3

From here we can just build our outputs directly:

X = 0 ∨ (1 ∧ (B ∨ C)) ∨ (2 ∧ (B ∧ C))
Y = 0 ∨ (1 ∧ (A ∨ C)) ∨ (2 ∧ (A ∧ C))
Z = 0 ∨ (1 ∧ (A ∨ B)) ∨ (2 ∧ (A ∧ B))
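
If you want to convince yourself this works, here is a quick brute-force check of the construction – my own Python sketch, with & and | standing in for AND and OR gates, and 1 - x for the two allowed NOTs:

for A in (0, 1):
    for B in (0, 1):
        for C in (0, 1):
            three = A & B & C
            two_or_three = (A & B) | (B & C) | (A & C)
            zero_or_one = 1 - two_or_three           # NOT #1
            one = zero_or_one & (A | B | C)
            zero_or_two = 1 - (one | three)          # NOT #2
            zero = zero_or_one & zero_or_two
            two = zero_or_two & two_or_three
            X = zero | (one & (B | C)) | (two & (B & C))
            Y = zero | (one & (A | C)) | (two & (A & C))
            Z = zero | (one & (A | B)) | (two & (A & B))
            assert (X, Y, Z) == (1 - A, 1 - B, 1 - C)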

Solution for n inputs

This proof sketch is for odd n. If n is even, just invert the last input directly with a NOT and use this pattern for the first n-1 inputs; the total is still ⌊n/2⌋ + 1 NOT gates.

As before, we are going to reason about how many of the inputs (I1 to In) were 1. For this we will use a bit more syntax: {a-b} means anywhere from a to b inputs (inclusive) were 1, and {a, b} means either a or b inputs were 1, with no other possibilities allowed. First, we can generate all the ranges ending in n directly:

{1-n} := (I1 ∨ I2 ∨ … ∨ In)
{2-n} := ((I1 ∧ I2) ∨ (I1 ∧ I3) ∨ … ∨ (In-1 ∧ In))
⋮
{n-n} := (I1 ∧ I2 ∧ … ∧ In)

Clearly, this doesn't need any NOTs. Next, we are going to make all the ranges starting at 0 and ending at an odd number:

{0-1} := ¬ {2-n}
{0-3} := ¬ {4-n}
⋮
{0-(n-2)} := ¬ {(n-1)-n}

This requires (n-1)/2 NOT gates. Using these together, we can get variables for all the odd numbers of inputs directly:

1 := {0-1} ∧ {1-n}
3 := {0-3} ∧ {3-n}
⋮
n-2 := {0-(n-2)} ∧ {(n-2)-n}
n := {n-n}

Again, this doesn't require any NOTs. Lastly, we use one more NOT to let us get all the even values:

{0,2,4,…,(n-1)} := ¬ (1 ∨ 3 ∨ … ∨ n)

0 := {0-1} ∧ {0,2,4,…,(n-1)}
2 := {0-3} ∧ {0,2,4,…,(n-1)} ∧ {2-n}
4 := {0-5} ∧ {0,2,4,…,(n-1)} ∧ {4-n}
⋮
n-1 := {0,2,4,…,(n-1)} ∧ {(n-1)-n}

There! Now we have a variable for each possible number of inputs with value 1, so we can just plug in all possibilities to get our outputs:

O1 = 0 ∨ (1 ∧ (I2 ∨ I3 ∨ … ∨ In)) ∨ … ∨ (n-1 ∧ (I2 ∧ I3 ∧ … ∧ In))
O2 = 0 ∨ (1 ∧ (I1 ∨ I3 ∨ … ∨ In)) ∨ … ∨ (n-1 ∧ (I1 ∧ I3 ∧ … ∧ In))
⋮
On = 0 ∨ (1 ∧ (I1 ∨ I2 ∨ … ∨ In-1)) ∨ … ∨ (n-1 ∧ (I1 ∧ I2 ∧ … ∧ In-1))

As you have seen, this took a total of (n-1)/2 + 1 NOT gates. Since n is odd, (n-1)/2 + 1 = ⌊n/2⌋ + 1, and we're done! Whew, I wouldn't want to implement that.
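
Famous last words – here is a sketch of the whole construction in Python anyway, mostly to check the NOT count. It treats bits as 0/1 ints, builds each {k-n} range as an OR over the ANDs of all k-subsets, and routes every negation through a counted NOT helper. The names are my own invention, not standard anything:

from itertools import combinations, product

def invert_all(inputs):
    n = len(inputs)
    assert n % 2 == 1, "this sketch handles odd n only"
    nots_used = 0

    def NOT(x):
        nonlocal nots_used
        nots_used += 1
        return 1 - x

    def at_least(k, bits):
        # {k-n}: 1 iff at least k of `bits` are 1, using AND/OR only.
        return int(any(all(c) for c in combinations(bits, k)))

    # {0-k} for each odd k < n: one NOT apiece, (n-1)/2 in total.
    at_most = {k: NOT(at_least(k + 1, inputs)) for k in range(1, n, 2)}

    # Exact odd counts: no NOTs needed.
    exact = {k: at_most[k] & at_least(k, inputs) for k in range(1, n, 2)}
    exact[n] = at_least(n, inputs)

    # The final NOT: "the number of 1s is even".
    even = NOT(int(any(exact[k] for k in range(1, n + 1, 2))))

    # Exact even counts. {0-n} is trivially true, hence the .get(..., 1).
    exact[0] = at_most[1] & even
    for k in range(2, n, 2):
        exact[k] = at_most.get(k + 1, 1) & even & at_least(k, inputs)

    # Output i is 1 iff, for the actual count k, at least k of the
    # *other* inputs are 1 (which forces input i to have been 0).
    outputs = []
    for i in range(n):
        others = inputs[:i] + inputs[i + 1:]
        outputs.append(int(any(exact[k] & at_least(k, others)
                               for k in range(n))))
    return outputs, nots_used

# Exhaustive check for n = 5: correct outputs, floor(n/2) + 1 NOTs.
for bits in product((0, 1), repeat=5):
    outs, nots = invert_all(list(bits))
    assert outs == [1 - b for b in bits] and nots == 5 // 2 + 1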

Saturday, May 24, 2008

Boolean Logic

Today we discuss Boolean Logic. It may be the first in a series on different kinds of logic, or it may not. We'll see.

A concept that should be intimately familiar to any programmer is Boolean logic. True or False, 1 or 0, Yes or No – these are all different representations of a Boolean variable. Invented by George Boole in the 1800s, Boolean logic forms the basis of modern computing, used in everything from transistor-level logic gates to bitmasks and search queries. Wikipedia has a good introduction to the theory for anyone who needs a refresher. Rather than re-teach it, I am going to draw your attention to a few of the more interesting features of Boolean logic.

First off, everything can be defined from only one basic operation: alternative denial (aka NAND, ↑). Alternatively, the same can be done with NOR, albeit with different patterns. For example, NOT can be created by feeding the same input into both sides of a NAND gate. For this reason – and many others – NAND is often used as a building block of transistor circuits. It is a bit awkward for general use, however, so for higher level Boolean logic we usually deal in three main operations: negation (NOT, or ¬), conjunction (AND, or ∧), and disjunction (OR, or ∨). There are others, but these are good enough for now. Take a look at a few examples:

A ∨ B = ¬(¬A ∧ ¬B) (De Morgan's Law)
0 = A ∧ ¬A
1 = ¬(A ∧ ¬A)
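
Incidentally, the NAND claim from above is easy to check in code. A minimal sketch in Python, building NOT, AND, and OR out of NAND alone:

def NAND(a, b):
    return 1 - (a & b)

def NOT(a):
    return NAND(a, a)            # both inputs tied together

def AND(a, b):
    return NOT(NAND(a, b))

def OR(a, b):
    return NAND(NOT(a), NOT(b))  # De Morgan's Law in action

for a in (0, 1):
    for b in (0, 1):
        assert NOT(a) == 1 - a
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)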

Even just using these three operators, there are multiple ways to express any given function. For example, which is the 'best' way to represent XOR (the ⊕ operator)?

A ⊕ B = (¬A ∧ B) ∨ (A ∧ ¬B)
A ⊕ B = (A ∨ B) ∧ ¬(A ∧ B)

The first is easier for most people to understand ("A is true and B is false, or the other way around"), while the second has fewer logical operations. If we wanted to implement XOR in software that might matter, although in hardware it would usually be implemented as a custom unit such as the CMOS 4070, or failing that, four NAND gates:

N := A ↑ B
A ⊕ B = (A ↑ N) ↑ (N ↑ B)
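
And a quick check of that four-gate construction, if you don't feel like chasing the algebra (Python again, with ^ as the reference XOR):

def NAND(a, b):
    return 1 - (a & b)

for A in (0, 1):
    for B in (0, 1):
        N = NAND(A, B)                                # gate 1
        assert NAND(NAND(A, N), NAND(N, B)) == A ^ B  # gates 2-4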

Another interesting feature of hardware-level Boolean logic is propagation delay. We think of logical operations as instantaneous, but in hardware they take a measurable time to switch. The propagation delay is the total time it takes for a change in the inputs to be reflected as a change in the outputs of a function. This gives us a tradeoff between time and transistor count – it can be better to use more transistors to implement a function, structured in parallel. For example, ORing eight signals as a balanced tree of seven OR gates is only three gates deep, while chaining the same seven gates is seven deep. This tradeoff is sometimes reflected in software: if your computer can run multiple instructions in parallel, it is sometimes possible to use more instructions to perform a function in fewer clock cycles. The compiler usually handles this kind of micro-optimization, however.
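
Here's a toy model of that tree-versus-chain difference, assuming a uniform one-unit delay per gate (a sketch – real gate delays vary). Each 'signal' is just the time at which it settles, and a gate's output settles one unit after its slowest input:

def OR_gate(a, b):
    return max(a, b) + 1      # we track settle times, not values

signals = [0] * 8             # eight inputs, all ready at t = 0

# Chain: OR each signal into a running result, one after another.
chain = signals[0]
for s in signals[1:]:
    chain = OR_gate(chain, s)

# Balanced tree: pair signals up until one result remains.
level = signals
while len(level) > 1:
    level = [OR_gate(level[i], level[i + 1]) for i in range(0, len(level), 2)]

print(chain, level[0])        # 7 vs. 3 gate delays, same 7 OR gates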

One other interesting tidbit: take a look at Karnaugh maps, a technique for simplifying expressions and finding race hazards. Essentially, you draw out the truth table for an expression and then cover the '1's with as few overlapping rectangles as possible, optionally adding extra rectangles to prevent race conditions.


Lastly, I'll leave you with a puzzle: invert three inputs using only two NOT gates. To be specific, you have a function with 3 inputs, A, B, and C, and you want three outputs, X, Y, and Z, such that

X = ¬A
Y = ¬B
Z = ¬C

However, you only have two NOT gates (although you can use as many AND or OR gates as you'd like). Note: to solve this in written form (rather than by drawing a circuit diagram), you may find that you need temporary variables, such as 'N := ¬A ∧ B'.

Bonus points if you can generalize it to an algorithm for inverting n inputs using ⌊n/2⌋ + 1 NOT gates.

Saturday, May 17, 2008

Risk

I, like many people these days, keep my important information on the internet. The big companies I entrust my privacy to are all as secure as I could ask for, and they keep getting better. For example, Google (who is going to be my example for the brunt of this article) has never been hacked or lost any user information (as far as I know). But security doesn't lie entirely in the cloud; a key challenge is getting information in and out, and the security on that really hasn't changed.

So how do we protect our online identities? With passwords, of course. Now, there have been a few advances in password security – such as guidelines on what kind of passwords to use – but for 40 years of innovation, that really doesn't feel like much. Of course, that could mean that passwords are Good Enough™, but considering that someone could just watch you type one in, I don't think that's true. It's like building a walled garden with a moat, barbed wire, and lasers: if someone steals the key, they can just walk in the front gate.

To be fair, we've had this same problem for years. But recently I've noticed a trend: more and more information is ending up on the internet, and quite a lot of it is going to the same places. To see why this is a problem, let's take a look at the engineering definition of Risk:

Risk = (probability of an accident) × (losses per accident)
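
To make that concrete with entirely made-up numbers (a sketch, not real data):

p = 0.01            # yearly chance of someone getting my password (invented)
loss_then = 100     # damage when an account was just email (invented units)
loss_now = 600      # email + calendar + payments + photos + history + health
print(p * loss_then, p * loss_now)   # 1.0 vs. 6.0: same p, six times the risk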

The probability of an accident is equivalent to someone getting my password, and doesn't really change. But the losses per accident are going up and up and up. Take a look at my Google account right now: I use a fair number of services – probably more than most – but the trend is the same for everyone. Hypothetically, someone could get access to:

  • Every email I have sent or received (Gmail)
  • Where and when I will be (Calendar)
  • My address and credit information (Google Checkout)
  • Pictures of me and my friends (Picasa)
  • Everything I have searched for or clicked on on Google (Web History)
  • My health information

Wait, health information? Yep! Coming soon to a Google near you: personal health information1. All the time we are moving more information into the digital world, and for the most part, that's a good thing. But the risk to us is increasing at the same rate, and that makes me a tad nervous. I have to ask: what would it take for stealing passwords to become a lucrative business?

I'm not the only one who has noticed the problem. Some research is going into building a better lock, using technologies like fingerprint or iris scanning, and there are other approaches too. One possibility being thrown around is using usage patterns to detect when someone malicious is in the account, cutting them off from any information until they confirm their identity2. Another idea I had was partitioning the information within the account, to limit the damage if someone does get access. For example, I'd use a different password for Gmail and Google Reader than for the rest of my services, since those are the most likely to be compromised. But regardless of what, if anything, gets implemented, a word of advice: keep your passwords safe!

Footnotes

  1. It's not just Google – Microsoft has a product of its own in development for storing health information.
  2. The parallel is credit card companies. If you start to purchase more than usual, or in different locations than usual, you may find your credit card cut off until you make a phone call.

Saturday, May 10, 2008

Scaling User Content

Scalability. One little word, and yet it continues to be a major problem plaguing programmers to this day. From Wiktionary:

2. (computing) The ability to support the required quality of service as the system load increases without changing the system.

A desirable property in any system, to be sure, and to some extent it's easy enough to achieve. But there is always a limit, and once it's hit, the system needs to change. A lot of research has been done on how to scale just about anything – a party, a business, a network – and I don't really want to talk about those. But there is one issue I will talk about: user content.

Let's take a look at one of the earliest examples: the book. At first there were only a couple of books to be found, laboriously copied out by hand and hugely expensive. This in itself wasn't too much of a problem, as very few people knew how to read anyways. But the effort involved meant that only the best content would do, and only the elite would ever see it. Reading and writing grew slowly, and over time libraries were built. But as the price of admission fell and the literacy rate rose, more people started reading and writing. And then in 1439, Johannes Gutenberg invented the printing press. Within a very short period of time the problem had changed – rather than getting access to books, people now had to choose between them, and the quality bar had been lowered. People have been solving this problem ever since: Oprah's Book Club, the New York Times best-seller list, and the Hugo Awards are all ways to separate the wheat from the chaff.

In the internet age, it is so easy to get your words out there that websites need to have a strategy for scaling content beyond the purely technical problems of storage and retrieval, to keep from being drowned in the chaff. Let's take a look at a few popular websites and see how they do it.

Google

Born in an era when most companies were only interested in creating portal sites and search engines were merely 'good enough', Google was a very different kind of company. Focused on providing high quality search results, and unwilling to let ad companies 'buy' position in the index, the Google search engine was generations better than its competitors. It used innovative algorithms (notably PageRank) and a team of first-class programmers to stay ahead of spammers and keep the search results fresh and relevant. Even when branching out into other tools, Google has always used specialized algorithms to solve its problems and provide quality content for users.

  • Approach: Custom algorithms for filtering content
  • Drawback: Spammers can find flaws in the algorithms, and exploit them for gain

Wikipedia

Originally conceived as a feeder project for Nupedia, an expert-written online encyclopedia (complete with a full-time editor), Wikipedia was born in early 2001. It was to be a more 'open' complement to Nupedia where anyone could edit articles, but it was not well received by Nupedia's editors and reviewers. Only five days after creation it was given its own name, Wikipedia, and moved to its own site. A popular idea from the very beginning, Wikipedia had over 20,000 articles within its first year. To keep the content clean of spam, a whole set of tools was created, along with a hierarchy of moderators to use them. The goal was to make it easier to undo vandalism than to create it in the first place, and looking at the current state of Wikipedia, it was highly successful. There is also a series of guidelines for acceptable content, and a court-style system for resolving complaints.

  • Approach: Content moderated by the community itself, with sophisticated tools and guidelines to assist this
  • Drawback: A lot of time and effort goes into keeping the content clean, usually done by a small subset of the site's users

Facebook

Created by Mark Zuckerberg as a way to connect with friends at Harvard University, Facebook rapidly expanded to other universities, high schools, and eventually anyone who wanted to join. It works on the principle that people are grouped by the networks they are part of, and these can be used to find people you know. Users 'friend' each other, identifying that they do indeed know one another and giving permission to access each other's profiles. This keeps the content remarkably relevant and spam free, since users only see the actions of people they personally know.

  • Approach: Content associated with its author, and users manually select which authors they want to see content by
  • Drawback: Not well suited to finding new associations; better for maintaining ones created externally

Digg

Digg was one of the first widely used tools for people to share interesting content they found on the internet. Users can 'digg' an article, picture, or video they find, and subsequent users can 'digg' or 'bury' it depending on whether or not they like it. This kind of social voting brings the most interesting articles to the top and leaves the rest behind. The same voting system is used on the submissions' comments, highlighting interesting ones and hiding the rest.

  • Approach: Submitted content is voted up or down by the rest of the community, filtering out what's interesting
  • Drawback: Using community opinion leads to a groupthink mentality, promoting content that reinforces the community's opinions

This is by no means all of the approaches out there, just a few that I think are worth noting. For example, Slashdot uses a more sophisticated version of comment voting that takes into account how 'correct' a user's opinions usually are. Each one has its own set of drawbacks, however they are all better than just doing nothing. I am excited to see what innovative solutions people will come up with in the coming years. I mentioned in my last post that I thought blog comments had a systemic flaw; they just don't scale with one moderator and n users. I'm sure someone will find a good solution for this problem soon, whether it is comment voting à la Digg or something entirely new. But until then, manual moderation it is.

Saturday, May 3, 2008

About Me

Details updated June 7, 2010

I've always hated writing 'About Me' sections. In my view writing should be about the content, not the author, but since when did the world work the way I want it to? And I have to admit, it does make sense for a first post.

Who are you?

I'm Eric Burnett. Currently I am a software engineer at Google, working on Ad Exchange. This blog is not about my work, however, so don't expect much in that direction. Prior to this I was at the University of Waterloo earning a bachelor's in Computer Science1, a topic on which I would be happy to field questions :).

I have always been fascinated by computers, and since elementary school I knew I would end up in a career working with them. My first exposure to programming actually came quite late, when I was 15 or 16. I took a summer school course that had us making games in Macromedia Director (now Adobe Director) using its custom scripting language. That was also my first exposure to the realities of our industry: the game I was making didn't get completed on time, even with features cut. Despite that, I immediately found myself fascinated by programming and understanding the magical world inside the computer, and over the next year I taught myself C++. The rest, as they say, is history.

Since then I have learned that it doesn't matter what language or tools you are using; programming is just a realization of the thought process of the programmer. From my point of view it doesn't matter what you know – any programmer worth their salt can learn what they need to know in short order. What matters more is how you think, and that's the part that fascinates me. This blog is part of that: in part, it's the story of my quest to know what makes us tick.

Why start a blog?

That's the real question, isn't it? I've been reading blogs for a couple years now, but recently I've found myself with ideas that I want to share, and nobody to share them with. I don't know if anyone will ever read this, but if nothing else the act of writing these ideas down forces me to think them through in a way that pure contemplation never quite does (more on that later). Posting this is both a challenge and a commitment to myself to keep thinking, questioning, and finding new insights that are worth sharing.

The strongest inspiration for this blog has come from reading Coding Horror, one of the most widely read programming blogs out there. If you haven't heard of it, I suggest you go take a look. I have a long list of other blogs that I read, but it was this one that showed me the power of a blog – Jeff Atwood consistently made me think. It also happens to be well written and humorous – hell, even the format of this post was borrowed from Jeff's About Me page.

Where can I find you?

Feel free to drop me an email at ericburnett@gmail.com. I can't promise I'll respond, but it will always get read.

Footnotes

  1. Technically, I earned an Honours Bachelor's of Mathematics in Computer Science Co-op degree, with a minor in Combinatorics and Optimization. But don't hold that against me.
  2. This doesn't hold for blog comments. On small blogs comments can work fine, but the more popular a blog gets, the harder its comments are to manage. Part of this is real spam, though technology is helping mitigate that. A larger part is just the nature of blogs: a blog is an open discourse, person to world, but its comments are a two-way conversation, and relatively quickly there are too many participants for any one person to follow. It doesn't help that the signal-to-noise ratio of blog comments is usually quite low, but I think the problem is systemic.