Thursday, June 12, 2008

/dev/random (literally and figuratively)

1. So I'm trying to figure out a way to construct an extractor (in the sense of Barak and Halevi; I'm still unclear in exactly what way this is related to the usual definition of an extractor) from a block cipher.

A construction like this should work pretty well: Determine an m-bit key K from a long string with high min-entropy. Then, use block cipher B with that key to encrypt the all-0 block; then, take consecutive substrings A_i of length m (padding the last if necessary) and compute a sequence defined by E_(i+1) = B(K, E_i xor A_i). The last term of this sequence is the output of the extractor.

Assuming the key is chosen from some distribution with reasonably high min-entropy, then E_i and A_i should be mostly independent. If E_i is "near uniform" (i.e. has min-entropy close to the Shannon entropy) and small, and A_i and E_i are uncorrelated, then the min-entropy is almost additive. The problem is, it's not obvious how to make this rigorous, or even under what conditions it actually holds.

What does this have to do with /dev/random? Well, it's well-known how to build a pseudo-random generator from a block cipher in counter mode. So if we can also construct an extractor from a secure block cipher, we can obtain a robust, secure random generator which has the added bonuses of being 1) small and 2) fast. Which makes it a perfect candidate for replacing/supplementing /dev/random, CryptGenRandom, etc.

2. Anyone know a good resource for learning about expanders? (Other than Luca Trevisan's blog, of course; his series on the subject is pretty much just epic. I need an introduction to the topic, though, which is shallower and broader).

3. How on earth do four Supreme Court justices believe that Gitmo detainees aren't entitled to challenge their detention in civil court? If Scalia and Thomas are such "strict constructionists," why don't they note that the relevant clause of the Constitution reads:
The privilege of the writ of habeas corpus shall not be suspended, unless when in cases of rebellion or invasion, the public safety may require it.
Which, you might notice, says nothing at all about "enemy combatants," or "acceptable alternatives to habeas corpus." Unless they personally amended the Constitution themselves?

Sunday, March 23, 2008

Coloring lattices

A problem:

You have an infinite (2-dimensional) chessboard. Can you 2-color the squares of the chessboard so that a chess king, starting on any one square, can move to at most n distinct squares of the same color for some fixed n? (I.e., that all the connected monochromatic regions are of bounded size?)

If you think about it for a few minutes, it should become obvious that the answer is no. Proving it is a bit more difficult. I don't wanna cut this post (also I'm not sure how to in Blogger), so I'll just put the proof outline in rot13 below.

Fhccbfr bgurejvfr. Vg'f pyrne gung n zbabpuebzngvp ertvba arrqf gb or pbzcyrgryl fheebhaqrq ol n ertvba bs gur bccbfvgr pbybe; vg'f nyfb boivbhf gung gur nern bs gur bhgre ertvba cyhf gur rairybcrq fdhnerf vf fgevpgyl terngre guna gur nern bs gur rairybcrq fdhnerf. Abj, vg'f rnfl gb fubj gung nf gur nern bs n pbaarpgrq ertvba nccebnpurf vasvavgl, fb qbrf gur nern bs na rairybcvat ertvba. Chggvat gurfr snpgf gbtrgure, gur gurberz rnfvyl sbyybjf.

Not that bad, but what happens if we increase the number of colors (and the number of dimensions)? Using an inductive argument, it's possible to color Z^d with (d+1) colors and get regions of bounded size; but I don't see any easy combinatorial argument that gives any (non-constant) lower bound. Any ideas?

Saturday, December 29, 2007

It runs in the family

So at dinner on Wednesday, I noticed that my younger brother (age 14, non-Honors Algebra II, smart but unmotivated, pretty much what I would have been had I had an academically overachieving older sibling) was filling out 16-man brackets. But these weren't NCAA basketball brackets (what he would have been doing with those in December is beyond me anyway), or fantasy NCAA football playoffs, or any of the above. They were 16 competitors labeled from "A" to "P."

So I asked him: What are you doing?
Him: Filling out brackets.
Me: Why?
(Silence).
(A little later)
Me: Chris, why are you filling out brackets all over your paper?
(Silence).
(Later still)
Him: Harrison, I have a question.
Me: Shoot.
Him: If you have 16 people and you know their initial positions in a bracket, what's the minimum number of spots you have to fill in so you can fill in the rest of the bracket?

I'm pretty sure I've been more proud of a member of my family than I was of him at that moment. But I'm having trouble thinking of an example.

[PS: I told him he should guest-post the problem on this blog. He politely declined.]

Wednesday, December 19, 2007

P, BPP, VDW, and all that jazz: A Complex Conversation in Three Parts

BACKGROUND

Louis and I were discussing the problem of finding an arithmetic progression of "special elements" in a general series. It all started like this:

Louis: Awesome question:
Me: ?
Louis: Find the longest arithmetic sequence of awesome XKCD comics
Me: XD
Me: 1. that's totally subjective, 2. that's AWESOME
Louis: well, it's different for every person of coures
Louis: *course
Me: well, of course, since they do have positive density, if xkcd were to continue forever
Me: there'd be arbitrarily long APs...
[...]
Louis: I'm talking a computational
Louis: problem
Me: oh
Me: hm.
Louis: not VDW :P
Me: well that's technically szemeredi...
Me: wait, is finding long APs...
Louis: VDW.
Me: szemeredi, and is it in P?
Louis: VDW.
Louis: And yes, I believe so
Louis: given that there are only n^2 possible AP'sMe: oh, d'oh. gp.
Me: lol
Louis: ;-)
Me: (incidentally, there's more like n^3 partial APs, but still polynomial)
Louis: ah, partial AP's.
Louis: well, we needn't consider only partial AP's
Louis: for each a_i, for each a_j, search the arithmetic sequence going a_i, a_j, a_i + 2(a_j - a_i),...
Louis: as far as you can go until you reach something not in the set.
Me: yeah, yeah
Me: and, btw, it is about n^2 log n [Exercise for the reader: Why?]

I'm many things, but I'm not an Algorithms person, and this was now, to me, an Algorithms question and therefore dead. (It didn't help that I couldn't find any way of checking for an AP in o(n^2) time). But then:

AN EPIPHANY

Me: I'm wondering if this might
Me: if this could provide an oracle separation between P and BPP
Louis: XD
Me: seriously!
Louis: how?
Louis: it's in P!
Me: yeah, but
Me: let's take sets of size
Me: well, let's consider the problem of finding an AP of length at least n in the range [1, superpoly(n)], 2-colored according to some cuh-razy random oracle.
Louis: okay
Me: (actually, we should probably specify that it has to be one specific color)
Louis: you do realize you're now reminding me of the quantum database search problem, too?
Me: let's say that all of the elements of the AP have to be yellow, and the oracle colors a number yellow with some fairly high probability
Louis: okay
Me: actually, let's not make it length n, let's make it length [O(log(superpoly(n)))]
Louis: how will BPP help us?
Me: for almost all oracles, all we have to do is pick a random AP and there's positive probability it's yellow.
Louis: but wait
Louis: if the numbers are colored randomly
Louis: then why would randomized search be any better than brute force?
Me: on average? duh, it wouldn't. worst-case, though...
Me: it seems less hard to show brute-force won't work well in some fraction of oracle universes

Mathematical backing-up of my assertions in the conversation: Consider a random coloring of [1, superpoly(n)] s.t. the probability of a given number being colored yellow is \delta. There are approximately superpoly(n)^2 APs of length log(superpoly(n)) [I'm totally fudging the numbers there, but the basic argument should still hold.] If the implied constant of the log(superpoly(n)) is sufficiently small, then the probability of a randomly chosen AP of that length being yellow is at least \delta^log(superpoly(n)) > 1/superpoly(n)^2.

N.B., in fact, that by doing nothing more than messing around with a few constants, we can get the probability to be > k/superpoly(n)^2 for any constant k we want to work with. (Or even, for that matter, > k/superpoly(n).) Then the expected value of the number of APs of the...oh, screw it, of the right length is k.

So! In retrospect, the functions I picked don't work very well, but it's not difficult to change them around a bit so that our above argument (after throwing in some stuff about the Chernoff bound) actually ends up telling us the following:

1. For almost all oracles we've considered here, there exists f(n) superpolynomial, g(n) really slow-growing such that a BPP algorithm can pick polynomially many APs of length g(n) in the range [1, f(n)] s.t. with positive, constant probability, at least one will be yellow.

And here's the crux: There's no obvious way that a deterministic algorithm can do the same thing. Actually, with all these oracle universes floating around, it should be obvious to even the most dimwitted individual who holds an advanced degree in hyperbolic topology, n'hey [sorry], that some of them won't.

CONCLUSION

The problem is, though, that like most lower bounds, this fact is easier noticed than proved. And since there are already known oracle separations of P from BPP, one more wouldn't be particularly exciting. Still, though, it's fun to see two of my (and some other occasional readers') main interests -- computational complexity and additive combinatorics -- work together in a way I can understand, even if I can't quite finish the proof.

Friday, November 30, 2007

10 Combinatorics Results to See Before You Die

OK, this is my personal list of the 10 most beautiful, useful, or interesting results in combinatorics. (Note: I'm leaving out complexity theory, since I'll hopefully cover that in a future post). These are, of course, heavily biased toward my interests and specialties, but I hope that anyone interested in the subject will be able to appreciate the list. So, in no particular order:


1. Erdos' lower bound for the Ramsey numbers R(k,k).
WHY? It's one of the best examples of the probabilistic method in action. Furthermore, it showcases the often-seen phenomenon in combinatorics where we can prove the existence of a certain object (a complete graph with 2^(k/2) vertices and no monochromatic k-clique), but can't come close to constructing it.

2. Szemeredi's regularity lemma.
WHY? It's a highly unintuitive proof of a highly unintuitive result, but both the proof and the lemma have turned out to be incredibly useful. If you're a graph theorist, you have no excuse for not knowing the statement of the Lemma; if you do additive combinatorics or theoretical CS, you have no reason not to know its proof. And if you're a mathematical aficionado, you should at least look at both.

3. Shelah's proof of the Hales-Jewett theorem.
WHY? Even though Hales and Jewett's original proof is shorter and cleaner, Shelah's strikes at the heart of "why" HJT is true. It manages to sidestep the Ackermann-type bounds that arose in all virtually all earlier proofs of Ramsey-type theorems on the integers, and it does it in such an incredibly simple way, you'll be left dumbstruck.

4. Sum-product inequalities from the crossing number inequality.
WHY? The crossing number inequality, the Szemeredi-Trotter theorem, and good sum-product estimates are all examples of highly nontrivial, very interesting results in combinatorics. And yet all three can be derived in just a couple of pages from what's essentially high-school level mathematics. A great example of how interesting combinatorial results often have simple and beautiful "Book proofs."

5. The Marcus-Tardos proof of the Stanley-Wilf conjecture.
WHY? OK, it would be wrong not to give a shout-out to my semi-official mentor, Adam Marcus. But there's more than that; this is another Book proof of a long-standing and unobvious conjecture. Just more evidence that combinatorics is the most level of mathematical playing fields.

6. Thomassen's proof of the five-list-color theorem.
WHY? This is the four-color theorem for list colorings, except, well, it doesn't take fifty pages and 200 hours of computer time to lay out. More like half a page -- pretty much as simple as Kempe's proof of the regular 5CT. And a wonderful illustration of the general principle that, with induction, it's sometimes easier to prove a more restrictive result.

7. Arrow's impossibility theorem.
WHY? OK, OK, this isn't usually considered to be part of combinatorics. But the statement of the theorem, as well as the proof, are heavily combinatorial. The theorem has (gasp!) a real-world interpretation, in terms of (duh) voting methods, and the proof is canonical combinatorial contradiction.

8. Kuperberg's proof of the alternating sign matrix conjecture.
WHY? Zeilberger's original proof is a tour de force of combinatorial reasoning, but Kuperberg's argument is unsurpassed in its originality. Statistical mechanics and its offshoots (mathematical physics, ergodic theory) have surprisingly many applications in combinatorics, and Kuperberg's masterful use of the Yang-Baxter equation is a novel example of this trend.

9. The "necklace proof" of Fermat's Little Theorem.
WHY? First of all, the fact that FLT, often considered a "number-theoretical" result, permits a strictly combinatorial argument shows combinatorics' applicability to other branches of math. Second, the proof itself shows that symmetry groups are useful for more than just enumeration. (They're also good for enumeration (mod p).

10. Add your own! What's one of the most original, informative, clever, or beautiful combinatorial results of all time? Comments are more than welcome.

Wednesday, November 21, 2007

The chromatic number of n-space

Consider the infinite graph G defined as follows:

The vertex set of G is the set of points of n-dimensional Euclidean space R^n. Two vertices are adjacent if the corresponding points are unit distance apart. The question is, what is the chromatic number of G?

First note that a simple compactness argument shows that this is equivalent to finding the maximum chromatic number of a unit distance graph in R^n; i.e., if we can show that any finite subgraph of G is k-chromatic, then G is k-chromatic.

For n = 2, this is a fairly famous problem. But for higher dimensions, although a lot of work has been done, the known results are (as often happens in combinatorics!) sometimes woefully poor.

Just as one example: The best known general upper bound for the chromatic number of R^n is (3+o(1))^n. But consider the c x c x ... x c hypercube, where c < 3, subdivided into 3^n hypercubes of edge length c/3. If we color each of these hypercubes distinctly, we may tessellate n-space with them, so \Chi(R^n) \leq 3^n is practically trivial!

This is one of the many examples in combinatorics where the best known results are only marginally better than easy, or even trivial, results. (Another good one: upper bounds for the Ramsey numbers R(n,n).) The interesting thing here is that it derives from a geometric fact: namely, that the only regular polytope that tiles n-space for n \geq 5 is the hypercube. The best known bound (7) for 2-space derives from a hexagonal tiling; the best known bound for 3-space is 16 -- I don't know where that comes from. The best bound for 4-space is 49 -- does this arise directly from the tessellation of hyperspace by 24-cells? I'd be greatly indebted to anyone who can shed light on that.

Incidentally, the best known lower bound for \Chi(R^n) is something like (1.239+o(1))^n. I don't see any obvious way to construct even a superlinear unit distance graph in R^n. Anyone know the details of the proof of that bound?

Wednesday, November 14, 2007

The (Complexity) Theory of NCLB

So, one of the biggest complaints we hear these days about the No Child Left Behind Act is the possibility, with "high-stakes testing," of teachers "teaching to the test." If students aren't actually learning the material, critics argue, then NCLB is hurting, not helping, these kids.

But the Atlanta Journal-Constitution yesterday wrote the following:

But why's that ["teaching to the test"] so terrible? If a test is an accurate measure of what should have been learned, teaching to the test is fine.


The AJC's editorial board makes a good point, but I thought I'd go ahead and state it a little more formally. So here we go:

A state Department of Education ("Dean") wishes to write a test satisfying NCLB requirements. Furthermore, Dean has to write the test in limited (i.e., polynomial) time; after all, we have to test new students at least every year! An overworked teacher ("Olivia") has to pass all her students, but wants to spend as little time teaching as possible. So if she can save time and effort by "teaching to the test," she'll do so.

Now, let's say that the time complexity for Olivia of actually teaching the material is O(f(n)) (where n is the length of the test, in problems). (She can talk as fast or as slow as she likes, which is why the implicit constant doesn't make a difference in this case.) Then Dean's goal is as follows: He wants to write a test T such that, for any probabilistic "teaching algorithm" with o(f(n)) time complexity that Olivia might use and any constant $\epsilon$ > 0, there exists a constant N such that, for a test T of length n > N,

Pr[J. Random Student passes test T after being taught by Olivia] < $\epsilon$.


If this reminds you of the formal definition of a one-way function, well, you should probably look up the formal definition of a one-way function, 'cause there's some pretty key differences. But actually, the analogy isn't all that bad. We certainly have some good candidates for OWFs, and it's not unreasonable to think that similar methods exist for creating tests that can't be "taught to."

OK, so my analogy's pretty weak. But I don't think it's unsaveable. Therefore, I'm willing to shell out $15 to anyone who can suitably formalize high-stakes testing in such a way that they can show that the existence of (trapdoor) one-way functions (perhaps relative to some sort of "curriculum oracle?") implies the existence of tests that aren't "teachable to."

If, on the other hand, you can convince me that it's always possible to cheat the system in any good formalization of HST, I'll pay you $25 (yeah, I'm cheap. I'm a student, get over it) and write a letter to my Congressman.


What I'm more concerned about with NCLB -- and, by the way, I'm shocked that complexity theorists aren't already up in arms about this -- is the requirement that, by 2014, all children will test at the "proficient" level on state tests. Look, the "pencil drop" is a classic element of standardized testing, and while I'm happy that Congress is so convinced that P = BPP, I think mandating full derandomization by 2014 -- while simultaneously providing so little funding -- is just insane.