So at dinner on Wednesday, I noticed that my younger brother (age 14, non-Honors Algebra II, smart but unmotivated, pretty much what I would have been had I had an academically overachieving older sibling) was filling out 16-man brackets. But these weren't NCAA basketball brackets (what he would have been doing with those in December is beyond me anyway), or fantasy NCAA football playoff brackets, or anything of the sort. They were just 16 competitors labeled "A" through "P."

So I asked him: What are you doing?

Him: Filling out brackets.

Me: Why?

(Silence).

(A little later)

Me: Chris, why are you filling out brackets all over your paper?

(Silence).

(Later still)

Him: Harrison, I have a question.

Me: Shoot.

Him: If you have 16 people and you know their initial positions in a bracket, what's the minimum number of spots you have to fill in so you can fill in the rest of the bracket?

I'm pretty sure I've been more proud of a member of my family than I was of him at that moment. But I'm having trouble thinking of an example.

[PS: I told him he should guest-post the problem on this blog. He politely declined.]

## Saturday, December 29, 2007

## Wednesday, December 19, 2007

### P, BPP, VDW, and all that jazz: A Complex Conversation in Three Parts

BACKGROUND

Louis and I were discussing the problem of finding an arithmetic progression of "special elements" in a general sequence. It all started like this:

Louis: Awesome question:

Me: ?

Louis: Find the longest arithmetic sequence of awesome XKCD comics

Me: XD

Me: 1. that's totally subjective, 2. that's AWESOME

Louis: well, it's different for every person of coures

Louis: *course

Me: well, of course, since they do have positive density, if xkcd were to continue forever

Me: there'd be arbitrarily long APs...

[...]

Louis: I'm talking a computational

Louis: problem

Me: oh

Me: hm.

Louis: not VDW :P

Me: well that's technically szemeredi...

Me: wait, is finding long APs...

Louis: VDW.

Me: szemeredi, and is it in P?

Louis: VDW.

Louis: And yes, I believe so

Louis: given that there are only n^2 possible AP's

Me: oh, d'oh. gp.

Me: lol

Louis: ;-)

Me: (incidentally, there's more like n^3 partial APs, but still polynomial)

Louis: ah, partial AP's.

Louis: well, we needn't consider only partial AP's

Louis: for each a_i, for each a_j, search the arithmetic sequence going a_i, a_j, a_i + 2(a_j - a_i),...

Louis: as far as you can go until you reach something not in the set.

Me: yeah, yeah

Me: and, btw, it is about n^2 log n [Exercise for the reader: Why?]

I'm many things, but I'm not an Algorithms person, and this was now, to me, an Algorithms question and therefore dead. (It didn't help that I couldn't find any way of checking for an AP in o(n^2) time). But then:
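Louis's pair-and-extend search drops straight into code (a minimal sketch; the function name is mine, and I use a hash set for membership, which gives expected O(n^2) time -- binary search on a sorted list is what gives the n^2 log n mentioned above):

```python
def longest_ap(elements):
    """Length of the longest arithmetic progression within a set of integers.

    For each ordered pair (a, b) with common difference d = b - a > 0,
    extend b, b + d, b + 2d, ... as far as membership holds. The maximal
    AP is found when (a, b) is its first two terms.
    """
    s = set(elements)
    items = sorted(s)
    if len(items) <= 2:
        return len(items)
    best = 2
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            d = b - a
            length = 2
            nxt = b + d
            while nxt in s:
                length += 1
                nxt += d
            best = max(best, length)
    return best
```

For example, `longest_ap([1, 3, 5, 7, 10])` finds the progression 1, 3, 5, 7 and returns 4.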

AN EPIPHANY

Me: I'm wondering if this might

Me: if this could provide an oracle separation between P and BPP

Louis: XD

Me: seriously!

Louis: how?

Louis: it's in P!

Me: yeah, but

Me: let's take sets of size

Me: well, let's consider the problem of finding an AP of length at least n in the range [1, superpoly(n)], 2-colored according to some cuh-razy random oracle.

Louis: okay

Me: (actually, we should probably specify that it has to be one specific color)

Louis: you do realize you're now reminding me of the quantum database search problem, too?

Me: let's say that all of the elements of the AP have to be yellow, and the oracle colors a number yellow with some fairly high probability

Louis: okay

Me: actually, let's not make it length n, let's make it length [O(log(superpoly(n)))]

Louis: how will BPP help us?

Me: for almost all oracles, all we have to do is pick a random AP and there's positive probability it's yellow.

Louis: but wait

Louis: if the numbers are colored randomly

Louis: then why would randomized search be any better than brute force?

Me: on average? duh, it wouldn't. worst-case, though...

Me: it seems less hard to show brute-force won't work well in some fraction of oracle universes

Mathematical backing-up of my assertions in the conversation: Consider a random coloring of [1, superpoly(n)] s.t. the probability of a given number being colored yellow is \delta. There are approximately superpoly(n)^2 APs of length log(superpoly(n)) [I'm totally fudging the numbers there, but the basic argument should still hold.] If the implied constant of the log(superpoly(n)) is sufficiently small, then the probability of a randomly chosen AP of that length being yellow is at least \delta^log(superpoly(n)) > 1/superpoly(n)^2.

N.B., in fact, that by doing nothing more than messing around with a few constants, we can get the probability to be > k/superpoly(n)^2 for any constant k we want to work with. (Or even, for that matter, > k/superpoly(n).) Then the expected value of the number of APs of the...oh, screw it, of the right length is k.
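That expected-count claim is easy to sanity-check numerically (a toy sketch with parameters of my own choosing; the function name is mine): color [1, M] yellow with probability \delta and count the all-yellow APs of length L, whose expectation is exactly (number of APs) * \delta^L.

```python
import random

def yellow_ap_count(M, L, delta, seed=0):
    """Color 1..M yellow independently with probability delta, then count
    arithmetic progressions a, a+d, ..., a+(L-1)d inside [1, M] that are
    entirely yellow. Returns (all_yellow_count, total_ap_count); the
    expectation of the first is total_ap_count * delta**L."""
    rng = random.Random(seed)
    yellow = [False] + [rng.random() < delta for _ in range(M)]  # 1-indexed
    all_yellow = 0
    total = 0
    for d in range(1, (M - 1) // (L - 1) + 1):
        for a in range(1, M - (L - 1) * d + 1):
            total += 1
            if all(yellow[a + i * d] for i in range(L)):
                all_yellow += 1
    return all_yellow, total
```

With M superpolynomial and L a small multiple of log M, total grows like M^2 while delta**L shrinks only polynomially in M, which is the whole point of the estimate above.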

So! In retrospect, the functions I picked don't work very well, but it's not difficult to change them around a bit so that our above argument (after throwing in some stuff about the Chernoff bound) actually ends up telling us the following:

1. For almost all oracles we've considered here, there exists f(n) superpolynomial, g(n) really slow-growing such that a BPP algorithm can pick polynomially many APs of length g(n) in the range [1, f(n)] s.t. with positive, constant probability, at least one will be yellow.

And here's the crux: There's no obvious way that a deterministic algorithm can do the same thing. Actually, with all these oracle universes floating around, it should be obvious to even the most dimwitted individual who holds an advanced degree in hyperbolic topology, n'hey [sorry], that some of them won't.

CONCLUSION

The problem is, though, that like most lower bounds, this fact is easier noticed than proved. And since there are already known oracle separations of P from BPP, one more wouldn't be particularly exciting. Still, though, it's fun to see two of my (and some other occasional readers') main interests -- computational complexity and additive combinatorics -- work together in a way I can understand, even if I can't quite finish the proof.

## Friday, November 30, 2007

### 10 Combinatorics Results to See Before You Die

OK, this is my personal list of the 10 most beautiful, useful, or interesting results in combinatorics. (Note: I'm leaving out complexity theory, since I'll hopefully cover that in a future post). These are, of course, heavily biased toward my interests and specialties, but I hope that anyone interested in the subject will be able to appreciate the list. So, in no particular order:

1. Erdos' lower bound for the Ramsey numbers R(k,k).

WHY? It's one of the best examples of the probabilistic method in action. Furthermore, it showcases the often-seen phenomenon in combinatorics where we can prove the existence of a certain object (a complete graph with 2^(k/2) vertices and no monochromatic k-clique), but can't come close to constructing it.
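The calculation behind that bound takes only a few lines to verify (a sketch; the function name is mine): a uniformly random 2-coloring of K_n has expected number of monochromatic k-cliques C(n,k) * 2^(1-C(k,2)), and whenever that expectation is below 1, some coloring avoids them entirely, so R(k,k) > n.

```python
from math import comb

def erdos_lower_bound(k):
    """Largest n for which C(n, k) * 2**(1 - C(k, 2)) < 1, i.e. a value
    with R(k, k) > n by the probabilistic method (union bound over the
    C(n, k) potential monochromatic k-cliques)."""
    n = k
    while comb(n + 1, k) * 2 ** (1 - comb(k, 2)) < 1:
        n += 1
    return n
```

Already for k = 10 this certifies R(10, 10) > 100: the coloring is guaranteed to exist, but nobody knows how to exhibit such colorings explicitly as k grows.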

2. Szemeredi's regularity lemma.

WHY? It's a highly unintuitive proof of a highly unintuitive result, but both the proof and the lemma have turned out to be incredibly useful. If you're a graph theorist, you have no excuse for not knowing the statement of the Lemma; if you do additive combinatorics or theoretical CS, you have no reason not to know its proof. And if you're a mathematical aficionado, you should at least look at both.

3. Shelah's proof of the Hales-Jewett theorem.

WHY? Even though Hales and Jewett's original proof is shorter and cleaner, Shelah's strikes at the heart of "why" HJT is true. It manages to sidestep the Ackermann-type bounds that arose in virtually all earlier proofs of Ramsey-type theorems on the integers, and it does it in such an incredibly simple way that you'll be left dumbstruck.

4. Sum-product inequalities from the crossing number inequality.

WHY? The crossing number inequality, the Szemeredi-Trotter theorem, and good sum-product estimates are all examples of highly nontrivial, very interesting results in combinatorics. And yet all three can be derived in just a couple of pages from what's essentially high-school level mathematics. A great example of how interesting combinatorial results often have simple and beautiful "Book proofs."
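For the record, here are the statements being chained together (standard forms, quoted from memory):

```latex
% Crossing number inequality (simple graphs with e \ge 4v edges):
\operatorname{cr}(G) \ge \frac{e^3}{64 v^2}
% Szemeredi--Trotter: m points and n lines determine at most
% O\left(m^{2/3} n^{2/3} + m + n\right) incidences.
% Elekes' sum-product consequence, for finite A \subset \mathbb{R}:
\max\left(|A+A|,\, |A \cdot A|\right) \gg |A|^{5/4}
```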

5. The Marcus-Tardos proof of the Stanley-Wilf conjecture.

WHY? OK, it would be wrong not to give a shout-out to my semi-official mentor, Adam Marcus. But there's more than that; this is another Book proof of a long-standing and unobvious conjecture. Just more evidence that combinatorics is the most level of mathematical playing fields.

6. Thomassen's proof of the five-list-color theorem.

WHY? This is the four-color theorem for list colorings, except, well, it doesn't take fifty pages and 200 hours of computer time to lay out. More like half a page -- pretty much as simple as Kempe's proof of the regular 5CT. And a wonderful illustration of the general principle that, with induction, it's sometimes easier to prove a more restrictive result.

7. Arrow's impossibility theorem.

WHY? OK, OK, this isn't usually considered to be part of combinatorics. But both the statement of the theorem and its proof are heavily combinatorial. The theorem has (gasp!) a real-world interpretation, in terms of (duh) voting methods, and the proof is a canonical combinatorial contradiction.

8. Kuperberg's proof of the alternating sign matrix conjecture.

WHY? Zeilberger's original proof is a tour de force of combinatorial reasoning, but Kuperberg's argument is unsurpassed in its originality. Statistical mechanics and its offshoots (mathematical physics, ergodic theory) have surprisingly many applications in combinatorics, and Kuperberg's masterful use of the Yang-Baxter equation is a novel example of this trend.

9. The "necklace proof" of Fermat's Little Theorem.

WHY? First of all, the fact that FLT, often considered a "number-theoretical" result, admits a strictly combinatorial proof shows combinatorics' applicability to other branches of math. Second, the proof itself shows that symmetry groups are useful for more than just enumeration. (They're also good for enumeration mod p.)
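The counting in the necklace proof is easy to replay in code (a sketch; the function name is mine): the a^p strings of length p over an a-letter alphabet split under rotation into a constant strings plus orbits of size exactly p, so p divides a^p - a.

```python
from itertools import product

def necklace_orbit_sizes(a, p):
    """Group all length-p strings over an a-letter alphabet into rotation
    orbits and return {orbit_size: number_of_orbits}. For prime p, every
    orbit has size 1 (a constant string) or size p, which is exactly why
    p divides a**p - a."""
    seen = set()
    sizes = {}
    for s in product(range(a), repeat=p):
        if s in seen:
            continue
        orbit = {s[i:] + s[:i] for i in range(p)}  # all rotations of s
        seen |= orbit
        sizes[len(orbit)] = sizes.get(len(orbit), 0) + 1
    return sizes
```

For a = 2, p = 5: two constant strings plus six full orbits of size 5 account for all 2^5 = 32 strings, so 5 | 2^5 - 2.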

10. Add your own! What's one of the most original, informative, clever, or beautiful combinatorial results of all time? Comments are more than welcome.

## Wednesday, November 21, 2007

### The chromatic number of n-space

Consider the infinite graph G defined as follows:

The vertex set of G is the set of points of n-dimensional Euclidean space R^n. Two vertices are adjacent if the corresponding points are unit distance apart. The question is, what is the chromatic number of G?

First note that a simple compactness argument (the De Bruijn-Erdos theorem) shows that this is equivalent to finding the maximum chromatic number of a finite unit distance graph in R^n; i.e., if every finite subgraph of G is k-colorable, then G itself is k-colorable.

For n = 2, this is a fairly famous problem. But for higher dimensions, although a lot of work has been done, the known results are (as often happens in combinatorics!) sometimes woefully poor.

Just as one example: The best known general upper bound for the chromatic number of R^n is (3+o(1))^n. But consider the c x c x ... x c hypercube, where c < 3, subdivided into 3^n hypercubes of edge length c/3. If we color each of these hypercubes distinctly, we may tessellate n-space with them, so \chi(R^n) \leq 3^n is practically trivial!
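The coloring in that tessellation can be written down explicitly (a sketch; the function name and the particular value c = 1.5 are my own choices, with c < 3 as in the text): each point gets the index of its small cube, taken mod 3 in every coordinate, for 3^n colors in all.

```python
from math import floor

def cube_color(point, c=1.5):
    """Color under the 3^n-coloring of R^n by a grid of cubes of side c/3:
    each coordinate's cell index modulo 3. Same-colored cells repeat with
    period c in every direction."""
    return tuple(floor(3 * x / c) % 3 for x in point)
```

Translating a point by c in any coordinate leaves its color unchanged, and each small cube is monochromatic.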

This is one of the many examples in combinatorics where the best known results are only marginally better than easy, or even trivial, results. (Another good one: upper bounds for the Ramsey numbers R(n,n).) The interesting thing here is that it derives from a geometric fact: namely, that the only regular polytope that tiles n-space for n \geq 5 is the hypercube. The best known bound (7) for 2-space derives from a hexagonal tiling; the best known bound for 3-space is 16 -- I don't know where that comes from. The best bound for 4-space is 49 -- does this arise directly from the tessellation of hyperspace by 24-cells? I'd be greatly indebted to anyone who can shed light on that.

Incidentally, the best known lower bound for \chi(R^n) is something like (1.239+o(1))^n. I don't see any obvious way to construct even a superlinear unit distance graph in R^n. Anyone know the details of the proof of that bound?

## Wednesday, November 14, 2007

### The (Complexity) Theory of NCLB

So, one of the biggest complaints we hear these days about the No Child Left Behind Act is the possibility, with "high-stakes testing," of teachers "teaching to the test." If students aren't actually learning the material, critics argue, then NCLB is hurting, not helping, these kids.

But the Atlanta Journal-Constitution yesterday wrote the following:

> But why's that ["teaching to the test"] so terrible? If a test is an accurate measure of what should have been learned, teaching to the test is fine.

The AJC's editorial board makes a good point, but I thought I'd go ahead and state it a little more formally. So here we go:

A state Department of Education ("Dean") wishes to write a test satisfying NCLB requirements. Furthermore, Dean has to write the test in limited (i.e., polynomial) time; after all, we have to test new students at least every year! An overworked teacher ("Olivia") has to pass all her students, but wants to spend as little time teaching as possible. So if she can save time and effort by "teaching to the test," she'll do so.

Now, let's say that the time complexity for Olivia of actually teaching the material is O(f(n)) (where n is the length of the test, in problems). (She can talk as fast or as slow as she likes, which is why the implicit constant doesn't make a difference in this case.) Then Dean's goal is as follows: He wants to write tests T such that, for any probabilistic "teaching algorithm" with o(f(n)) time complexity that Olivia might use and any constant $\epsilon$ > 0, there exists a constant N such that, whenever T has length n > N,

Pr[J. Random Student passes test T after being taught by Olivia] < $\epsilon$.

If this reminds you of the formal definition of a one-way function, well, you should probably look up the formal definition of a one-way function, 'cause there's some pretty key differences. But actually, the analogy isn't all that bad. We certainly have some good candidates for OWFs, and it's not unreasonable to think that similar methods exist for creating tests that can't be "taught to."

OK, so my analogy's pretty weak. But I don't think it's unsalvageable. Therefore, I'm willing to shell out $15 to anyone who can suitably formalize high-stakes testing in such a way that they can show that the existence of (trapdoor) one-way functions (perhaps relative to some sort of "curriculum oracle?") implies the existence of tests that aren't "teachable to."

If, on the other hand, you can convince me that it's always possible to cheat the system in any good formalization of HST, I'll pay you $25 (yeah, I'm cheap. I'm a student, get over it) and write a letter to my Congressman.

What I'm more concerned about with NCLB -- and, by the way, I'm shocked that complexity theorists aren't already up in arms about this -- is the requirement that, by 2014, all children will test at the "proficient" level on state tests. Look, the "pencil drop" is a classic element of standardized testing, and while I'm happy that Congress is so convinced that P = BPP, I think mandating full derandomization by 2014 -- while simultaneously providing so little funding -- is just insane.

## Tuesday, October 23, 2007

### Math drug

PROBABILISTIC METHOD

Take 50 mg. once per day with food and water. Do not operate heavy machinery while under the influence of this proof method. Side effects may include: Loss of respect for constructive proofs, inability to exhibit examples, and heavy reliance on linearity of expectation. Talk to your research advisor if more severe side effects occur.

As with all probability theory, there is some risk of dependence, but this may be ignored as long as 4dp \leq 1 (Erdos and Lovasz).

SURGEON GENERAL'S WARNING: Applying this method to certain research may give trivial O(n^2) bounds when an O(n) bound is known and a bound of (1+o(1))n is sought.

X[

## Wednesday, September 19, 2007

### Y'arr! Cryptography and piracy.

Ahoy, me mathematical mateys. Seein' as how today be the International Talk Like a Pirate Day, I decided to make a piratey post. Y'arr.

Y'arr! Here be a rough formal definition o' what ye can use for public-key encryption. Arr. A function f: Z x (Z/nZ) -> Z/nZ be a trapdoor function if:

1. Ye may compute f(m,k) smartly (in polynomial time).

2. The problem o' findin' a preimage o' k \in Z/nZ, given fixed m, is in NP.

3. There be some t not dependin' on k so, given t, findin' a preimage o' k with fixed m is in P.

Aye, I claim that these be the functions from which ye can construct a public-key cryptosystem. If ye disagree with me, ye can walk the plank. Otherwise, we'll continue:

Call the complexity class which consists o' findin' preimages under trapdoor functions PK (for public-key. Or plank, which be like a trapdoor, but better suited for seafarin' folk.). Is PK contained in FBQP?

If the answer be "aye," then pirates rejoice! With a quantum computer, ye can crack any classical public-key system, which would allow ye to commandeer credit card information and much, much more, without the risks of scurvy and losin' limbs that make pirates' health-insurance premiums so #@#$in' high.

However, I believe that the answer be "nay." Since P is in BQP, this would imply P < NP, so I'm not expectin' any o' ye landlubbers to solve it. But if ye can give me evidence of why my hunch is correct, this ol' scurvy dog'll be forever indebted. Y'arr.
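A toy instance o' such a trapdoor function be textbook RSA (a sketch with wee illustrative primes o' me own choosin' -- at this size any cabin boy can factor n, so don't ye dare use it to guard real booty):

```python
# Toy RSA-style trapdoor: the forward direction is easy for everyone,
# while inversion is easy only with the trapdoor t = d, the private
# exponent. Tiny primes for illustration only -- utterly insecure.
p, q = 61, 53
n = p * q                          # 3233, the public modulus
e = 17                             # public exponent, coprime to (p-1)(q-1)
d = pow(e, -1, (p - 1) * (q - 1))  # the trapdoor: private exponent

def f(m):
    """Easy direction: anyone can compute m^e mod n in polynomial time."""
    return pow(m, e, n)

def invert_with_trapdoor(k):
    """Given the trapdoor d, findin' a preimage o' k be easy."""
    return pow(k, d, n)
```

Without d, inverting f means (as far as anyone knows) factoring n -- which is why pirates with quantum computers and Shor's algorithm be such a menace to classical public-key systems.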

## Tuesday, September 18, 2007

### The Riemann Hypothesis for Dummies; or, How Bad Music can Make You Obscenely Rich

No, this blog post isn't about 50 Cent.

Here's the scenario:

I'm an ultra-modern, avant-garde composer. I...compose music. One day, I decide, "Hey! I'm gonna write the weirdest piece EVER!"

Here's what I do:

For strings, I write a waltz (which, as you probably know, is in 3/4 time.) But actually, I lied. I don't write a waltz; I only write one measure of a waltz. Which seems like a weird thing to do, but you'll see my plan.

Then, for horns, I write, say, one measure of 4/4 time. And for piano, one measure of, let's say, 7/4 time. And so on.

Then I instruct each section to play their one single measure over and over and over again until they get tired, and then to keep playing it until all the sections finish their respective measures at the same moment.

The musicians will gripe about this arrangement, as musicians often do, but the overall sound won't repeat for a very long time! For example: If, as above, the strings play in 3/4, the horns in 4/4, and the piano in 7/4, then the sections only line up again after the least common multiple of their measure lengths, and since 3, 4, and 7 share no common factors, the piece will last a total of 3*4*7 = 84 beats.

But there's a catch! Yes, I'm an edgy and avant-garde composer, but I'm also environmentally conscious. Which means I want to make this piece as long as I possibly can (without repeating itself, of course) while only using a fixed number of beats, so that I won't waste paper. (Don't think too hard about that part. It's avant-garde composer logic.)

So, here's my question: Exactly how long can I make my piece last without writing more than, oh, n total beats?

Believe it or not, this question is actually unsolved! More than 100 years ago, in 1902, a German mathematician named Edmund Landau proved that when n is really big, I can't make it go on for much longer than e^\sqrt{n ln n} beats (where e is the base of the natural logarithms, or 2.71828...) This is a weird function! It grows a LOT slower than any exponential function (e.g. 1,2,4,8,...) but a LOT faster than any polynomial (for example, n^2, or 1,4,9,16...).

But mathematicians think they can do better than Landau. In fact, they think that something similar to Landau's answer -- exactly the same, in fact, except that n ln n has been replaced by something called "Li^-1(n)," which behaves very similarly -- provides an upper bound for big enough n, not just a close-to-upper bound.

But no one's been able to prove their conjecture! In fact, mathematicians have shown that this conjecture is precisely equivalent to the famous Riemann Hypothesis.

And the RH is considered to be one of the hardest problems in mathematics. It's so hard, in fact, that the Clay Mathematics Institute is offering a million-dollar reward to the first person who can publish a correct proof. In other words, my composer's idle question is actually worth seven figures.


## Sunday, September 16, 2007

### Random graph labelings

OK. Here's what I think I maybe've got:

Let G be the random graph G(n,M) on n vertices and M edges. Then RS(G) = O(M*\delta(G)).

Proof (outline): Set the implied constant in the big-O notation as k, to be determined later.

Base Case: M = 0; trivial.

For M>0, choose a vertex v of minimum degree \delta(G), and remove an edge incident to it to obtain the graph G' (so \delta(G') = \delta(G)-1). Then, by the IH, RS(G') \leq k*(M-1)*(\delta(G)-1) = k*M*\delta(G) - k*(M+\delta(G)-1). For suitable k (I suspect), with probability 1 there exists some number in [0, k*(M+\delta(G)-1)] which we can add to the label on v so that the labeling remains range-relaxed graceful.

...Yeah. I guess I hope that this is workable? I dunno though. Eh.


## Thursday, September 6, 2007

### Graph labellings and coding theory

So, here's what I did today (among, of course, other stuff):

Call a graph G (n,k)Hamming-representable if there is a labeling f: V(G) -> (F_2)^n such that two vertices v, w are adjacent iff the Hamming distance between f(v) and f(w) is at most k. If G is (n,k)Hamming-representable for some choice of n and k, we'll just call it Hamming-representable.

In addition, say that G is (n,k)Hamming-regular when two vertices are adjacent iff the Hamming distance between their labels is k. Same deal with just "Hamming-regular."

So it's not hard to show that all trees on n edges are (n,1)Hamming-regular; root the tree, assign each vertex apart from the root a number from 1 to n, assign the root the label (0,0,0,...,0), and assign a vertex v a label which has a 1 in the mth spot iff the path from the root to v passes through the vertex assigned m. (If you don't follow that, comment me; I'll attempt to make a diagram so it'll hopefully be clearer.)
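Here's that tree construction spelled out as code (my own quick sketch of the labeling just described, on a toy example; the vertex names and the DFS traversal are incidental choices):

```python
def hamming_labels(tree, root):
    """Label a tree's vertices with bit vectors so that two vertices
    are adjacent iff their labels differ in exactly one coordinate.
    tree: adjacency dict {vertex: [neighbors]}."""
    index = {v: i for i, v in enumerate(v for v in tree if v != root)}
    labels = {root: tuple([0] * len(index))}
    stack, seen = [root], {root}
    while stack:
        u = stack.pop()
        for w in tree[u]:
            if w not in seen:
                seen.add(w)
                lab = list(labels[u])
                lab[index[w]] = 1  # path to w = path to u, then w itself
                labels[w] = tuple(lab)
                stack.append(w)
    return labels

# Path a-b-c with an extra leaf d hanging off b:
T = {'a': ['b'], 'b': ['a', 'c', 'd'], 'c': ['b'], 'd': ['b']}
L = hamming_labels(T, 'a')
ham = lambda x, y: sum(p != q for p, q in zip(x, y))
# adjacency holds exactly when the Hamming distance is 1
assert all((w in T[u]) == (ham(L[u], L[w]) == 1)
           for u in T for w in T if u != w)
```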

Some conjectures/questions:

1. Are all Hamming-regular graphs also Hamming-representable? Trees are (duh), but what about others?

2. Are all graphs Hamming-representable? (Obviously, this is a stronger statement in the affirmative than #1...)

(Edit: So, a thought. Assign an index i to each vertex; set the "bit" in the ith place to 1 if the vertex you're labelling is either vertex i or adjacent to i. Does this work? I'm too tired to prove/find a counterexample; any takers?

Edit^2: Grr, doesn't work. Certain trees, for example. So it's trickier than I thought.)

3. What's the growth rate of the smallest n needed, in the worst case over all Hamming-representable graphs of a given size? In particular, can we construct an (n,k)Hamming-labelling of a tree where n is logarithmic in the number of vertices, or is our current construction (with n equal to the number of edges) optimal?

I think I had more. But I don't remember them. Yeah.

As always, any thoughts/comments/opinions are greatly valued. Comment, people!


## Monday, September 3, 2007

### Revenge of the Knapsack

Bram Cohen describes a modified knapsack system and suggests it could be used as a public-key cryptosystem. My thoughts:

1. Let's call the elements of the public key a_i, for 1 \leq i \leq k. Consider the sets B_n = {b_i: b_i = a_i (mod n)} for arbitrary n.

2. Note that, if gcd(p,n) is sufficiently large (i.e., > p/2k), then the elements of B_n will follow a very specific distribution. Specifically, each will fall into one of the intervals [a*gcd(p,n), a*gcd(p,n)+p/2k) for some integer a.

3. For p << n << kp with gcd(p,n) large, there will necessarily be more than 1 residue in at least some of the intervals, and, with high probability, each interval will contain at least one residue.

4. Trivially: gcd(p,n) = gcd(p,n+p).

So let's posit a hypothetical function f(n) that will give us a good probabilistic estimate of whether the residues of the a_i (mod n) really do "clump together" in the intervals of size p/2k. This should be close to 0 when gcd(p,n) is low, larger when gcd(p,n) is greater than about p/k, and close to 1 when gcd(p,n) = p. Then f(n) is periodic (or "close to it") with period p. So if we can construct f as a quantum function, we may be able to use a QFT to estimate its period, which would recover the private key.
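To make the clumping in (2) concrete, here's a toy numerical model. I'm assuming key elements of the form a_i = r_i*p + s_i with small offsets s_i < p/(2k) -- an assumption of mine chosen to be consistent with the intervals described above, not necessarily the structure of Cohen's actual scheme:

```python
from math import gcd
import random

# Toy model: a_i = r_i*p + s_i with offsets s_i < p/(2k), so the
# residues a_i mod n fall near multiples of gcd(p, n) when that gcd
# is large. Parameters are illustrative only.
random.seed(0)
p, k = 10007, 50
a = [random.randrange(1, 1000) * p + random.randrange(p // (2 * k))
     for _ in range(k)]

def clumped(n):
    """Do all residues a_i mod n lie within p/(2k) of a multiple of
    gcd(p, n)? (Only meaningful when gcd(p, n) > p/(2k).)"""
    g = gcd(p, n)
    return all((x % n) % g < p / (2 * k) for x in a)

print(clumped(2 * p), clumped(3 * p))  # -> True True (gcd = p: residues clump)
# And, trivially, gcd(p, n) = gcd(p, n + p), so this structure is
# periodic in n with period p -- the hook for quantum period-finding.
```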

This is shaky on several levels; first, I don't know whether you can use a QFT to measure functions that are "close to periodic." However, this could perhaps be rectified by defining a new function g: N -> {0, 1/2, 1}, where g(n) is whichever of those three f(n) is nearest to. Then g(n) might be periodic with positive probability.

More crucially: Can we actually devise a function f with the properties outlined above? I would venture to say "yes," since a human can (with high probability) tell whether 300 numbers fall within just three separate, small intervals, and what humans can do, computers can generally do too. The problem is actually exhibiting a function which we can then prove to work.

Any thoughts on the matter would be appreciated.

Harrison


## Thursday, August 30, 2007

### Things I Believe, Part 1: 13.7*10^9 B.C.E - c. 1 B.C.E.

So, just for the hell of it, I've decided to attempt a complete inventory of all the things that I believe. All my beliefs, if you will. This'll be a three-part series: Part 1 will cover events from the Beginning of the Universe 'til around the time of some guy's birth in Judea (since that's how people tend to reckon their calendars these days); Part 2 will cover around that dude's birth until the present day; and Part 3 will cover what I believe to be Mathematical and Other Truths (and the Future?). Capitalized. So, without further ado, here is What Harrison Believes.

1. I believe that the Universe as we know it exploded out of a small, hot, dense region approximately 13.7 billion years ago, plus or minus 200 million;

2. That an infinitesimal fraction of a second later, space expanded at an exponential rate far beyond c, and that this led to a "flattening" of local spacetime, and that quantum fluctuations were magnified to macroscopic scales;

3. That a fraction of a second later, the electromagnetic force separated from the weak force, and fundamental particles acquired mass;

4. That under a second after this, quarks were bound together by the strong nuclear force into hadrons and anti-hadrons (most of which annihilated each other immediately);

5. That about 100-300 seconds later, temperatures were right for some protons and neutrons left over to fuse into deuterium nuclei and those of a few other light elements (mostly helium);

6. That hadron/anti-hadron and lepton/anti-lepton annihilation reactions created an abundance of photons, which reacted over the next ~400,000 years with other nuclei, leptons, and hadrons wandering the cosmos;

7. That, as nuclei and electrons combined to form atoms (at around 400,000 years after the Big Bang), the Universe became transparent to radiation, and we see this radiation today as the cosmic microwave background;

8. That, around 200-500 million years later, the first stars (made almost entirely of lightweight elements) began to form, as did the first galaxies;

9. That, around this time, a very special galaxy began forming, one that certain sentient beings living in it would later call the "Milky Way"...

More to come - hold on tight.


## Sunday, August 19, 2007

### Stuff.

What with school starting and all, I unfortunately will not be able to post new and interesting (heh) problems at the rate I'd hoped. But don't worry, dear reader (and I'm not using the plural, since I know there's only one of you) - I haven't fallen off the face of the Tubes. I'll be back with interesting problems. Just, right now, I have to make easy interesting problems for a math tournament. Yeah.

Harrison


## Friday, August 10, 2007

### Random graphs, logic, and math, oh my!

So a random graph on n vertices is, (semi-)formally, a probability space on the set of all graphs on n vertices. Normally, we consider pretty "vanilla" probability spaces; for example, the one defined by setting the probability of there being an edge between u and v (which we'll denote by P(E(u,v)) for the remainder of this blog post) to be some constant 0 < c < 1. (Technically, yes, we can set c=0 and c=1, but those are really boring probability spaces.) However, as is well known, I'm not normal. So I'd like to investigate whether we can even construct probability spaces, in general, that have weird properties. Here's what I want to know:

Given a sentence Q of first-order logic with two free variables; say, u and v (where the variables will represent vertices of our graph, and we'll restrict ourselves to the predicate E(w,x) which, in short, means "there is an edge between w and x"), and a constant k>1, does there exist a (nontrivial*) probability space on the set of graphs with n vertices which satisfies:

P(E(u,v)|Q(u,v)) \geq k*P(E(u,v))

for all vertices u,v?

*"Nontrivial" in this context basically means "the probability of the empty graph isn't 1," or maybe "for any k, there's some n for which the probability of choosing a graph with at least k edges is positive." I'm not really sure.

By studying these distributions, we can look at "random" graphs which nonetheless are expected to have some preassigned structure (e.g., by letting Q be "there exists w: E(u,w) and E(w,v)", we're looking at graphs that are in some sense "likelier" to contain triangles than a random graph. Alternatively, by setting k very large and constructing rather intricate FOL sentences, we might be able to get some insight into Ramsey-theoretic questions. This is definitely true if we move to SOL; I'm not quite as sure with this weaker model.) You know, just in case you're the kind of weird person who wants some sort of "motivation" for studying a problem.
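As a sanity check that such distributions exist for at least some Q and k, take Q(u,v) = "there exists w: E(u,w) and E(w,v)" and a 50/50 mixture of a sparse and a dense G(n,p) (all parameters below are mine, purely illustrative). Conditioning on Q shifts weight toward the dense regime, so P(E(u,v)|Q(u,v)) lands well above P(E(u,v)). The Monte-Carlo sketch below simulates only the edges relevant to one fixed pair (u,v) on 6 vertices:

```python
import random

random.seed(0)

def sample():
    """One draw from the mixture: pick a regime, then report whether
    the edge uv is present and whether Q(u,v) holds (some w among the
    4 other vertices adjacent to both u and v)."""
    p = random.choice([0.05, 0.6])
    edge_uv = random.random() < p
    q = any(random.random() < p and random.random() < p
            for _ in range(4))
    return edge_uv, q

N = 20000
draws = [sample() for _ in range(N)]
p_edge = sum(e for e, _ in draws) / N
p_edge_given_q = sum(e for e, q in draws if q) / sum(q for _, q in draws)
print(p_edge, p_edge_given_q)  # the conditional probability is much larger
```

So a crude mixture already achieves k well above 1 for this Q; whether one can do this for *every* first-order Q and every k is the actual question.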

Apologies for the awkward wording of my problem; I'm just not sure how else to express it.


## Wednesday, August 8, 2007

### Problem #1 Redux

So, Problem #1 was maybe a bit too easy, as I "solved" it (cough, cough) while trying to fall asleep last night.

Heuristically: Choose about sqrt(n) elements from L, spaced more or less uniformly. Sort these elements in their places; this can be done in O(sqrt(n)ln(n)) time. Then (heuristically!) the smallest of these elements will on average have size ~sqrt(n)/2, the next-smallest will have size around ~3sqrt(n)/2, and so on. Counting the largest possible number of out-of-order pairs, we have that there are at most:

sqrt(n)+3sqrt(n)+5sqrt(n)+...+(2sqrt(n)-1)sqrt(n) ~ n^{3/2} pairs which are out of order (the sum has sqrt(n) terms, one per sampled element).

But this is o(n^2), so for large enough n, this will, on average, produce a (1-\epsilon)-sorted list.

We can easily extend this construction to produce an algorithm running in O(n^\epsilon) time for any positive \epsilon that gives (on average) arbitrarily good partial sorting (but the scaling gets worse and worse...)
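Here's the heuristic in executable form -- a sketch under my reading of "sort these elements in their places," namely that the ~sqrt(n) sampled elements are sorted among themselves while everything else stays put -- together with a function measuring the sortedness fraction |S|/C(n,2):

```python
import random

def partial_sort(lst):
    """Sort ~sqrt(n) evenly spaced elements among themselves,
    leaving all other positions untouched: O(sqrt(n) log n) work."""
    n = len(lst)
    k = max(1, int(n ** 0.5))
    idx = [i * n // k for i in range(k)]      # ~sqrt(n) spaced positions
    vals = sorted(lst[i] for i in idx)
    out = list(lst)
    for i, v in zip(idx, vals):
        out[i] = v
    return out

def sortedness(lst):
    """|S|/C(n,2): the fraction of pairs in correct relative order."""
    n = len(lst)
    good = sum(lst[i] < lst[j] for i in range(n) for j in range(i + 1, n))
    return good / (n * (n - 1) / 2)

random.seed(42)
L = random.sample(range(10000), 1000)
print(sortedness(L), sortedness(partial_sort(L)))
```

On random inputs the sampled-element pairs all end up in order; how close this gets to a (1-\epsilon)-sorting overall is exactly the part of the argument that needs to be made rigorous.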

So that part seems to be done (so long as the above argument can be made rigorous, which doesn't seem prohibitively difficult.) But the question remains: What about algorithms


*guaranteed* to produce a c-sorting? My hunch: There do not exist algorithms with worst-case time complexity o(n) guaranteed to c-sort a list of size n. (Average-case complexity, I'm not so sure.) Anyone who can prove this gets my everlasting respect and admiration.

## Tuesday, August 7, 2007

### Problems #1

1. Given an unsorted list L with n elements, let f(x) denote x's position in the list, and define the set S = {(a, b): a, b \in L, a < b and f(a) < f(b)}. Call a list c-sorted (0 < c \leq 1) if |S|/\binom{n}{2} \geq c. For what 0 < c < 1 do there exist sublinear algorithms to c-sort any list L? (We can consider either algorithms that are guaranteed to produce a c-sorting of a list, or those for which the expected value of |S|/\binom{n}{2} is at least c.)

2. Define a candy-passing game on a graph G as follows: For the initial state s_0, we assign a number of "chips" to each vertex v. We go from state s_i to state s_(i+1) as follows: if a vertex v with degree d has at least d chips on it, then it "passes" one chip to each adjacent vertex. Denote the maximum degree of a vertex in G by \Delta(G); is it true that, for any initial configuration with at least (\Delta(G)+1)*|V(G)| total chips, there is some k for which s_k = s_(k+1)? Can we place an upper bound on k?

3. Anyone know how to do LaTeX (or any sort of mathematical notation stuff) in Blogger?
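A minimal simulation of the candy-passing game in problem 2, on a toy example of my choosing: the cycle C_6, where \Delta(G) = 2, so the threshold is (\Delta(G)+1)*|V(G)| = 18 chips.

```python
def step(chips, adj):
    """One parallel round: every vertex holding at least deg(v) chips
    passes one chip to each neighbor."""
    out = list(chips)
    for v, c in enumerate(chips):
        d = len(adj[v])
        if c >= d:
            out[v] -= d
            for w in adj[v]:
                out[w] += 1
    return out

n = 6
adj = [[(v - 1) % n, (v + 1) % n] for v in range(n)]  # the cycle C_6
chips = [5, 1, 3, 3, 3, 3]        # 18 = (Delta(G)+1)*|V(G)| total chips
for k in range(200):
    nxt = step(chips, adj)
    if nxt == chips:              # reached a fixed state s_k = s_(k+1)
        break
    chips = nxt
print(k, chips)  # -> 1 [4, 3, 2, 3, 3, 3]
```

Here the game freezes after a single round (once every vertex has at least its degree in chips, all vertices fire and the state is fixed); the conjecture is that some such k always exists once the chip count reaches the threshold.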

