
Re: Python 3 optimizations continued...

From:  Guido van Rossum <guido-AT-python.org>
To:  stefan brunthaler <stefan-AT-brunthaler.net>
Subject:  Re: Python 3 optimizations continued...
Date:  Wed, 31 Aug 2011 10:31:13 -0700
Message-ID:  <CAP7+vJJ6kK9HtVQTn2CJ1qZp8FZn0xzsbpMk8i7gw+nUDG4QaQ@mail.gmail.com>
Cc:  Stefan Behnel <stefan_ml-AT-behnel.de>, python-dev-AT-python.org
Archive-link:  Article

On Wed, Aug 31, 2011 at 10:08 AM, stefan brunthaler
<stefan@brunthaler.net> wrote:
> Well, my code has primarily been a vehicle for my research in that
> area and thus is not immediately suited to adoption [...].

But if you want to be taken seriously as a researcher, you should
publish your code! Without publication of your *code* research in your
area cannot be reproduced by others, so it is not science. Please stop
being shy and open up what you have. The software engineering issues
can be dealt with separately!

-- 
--Guido van Rossum (python.org/~guido)




Re: Python 3 optimizations continued...

Posted Sep 1, 2011 16:05 UTC (Thu) by paulj (subscriber, #341) [Link]

This isn't quite true. For a result to become accepted, it has to be reproducible. For a result to be reproduced, the methodology behind it needs to be described adequately, so that others can recreate the experiment. Fairly obviously, that does *not* mean that code must be released, even if it'd be nice to have. A more succinct description of the methodology in the paper can suffice.

Re: Python 3 optimizations continued...

Posted Sep 1, 2011 17:14 UTC (Thu) by njs (subscriber, #40338) [Link]

This is highly controversial, though. In practice it turns out to be very, very common that the methodology described in the paper is not the same as the actual methodology used in the experiment...

http://boscoh.com/protein/a-sign-a-flipped-structure-and-...
http://projecteuclid.org/DPubS?service=UI&version=1.0...
http://www.aclweb.org/anthology/J04-4004.pdf

Re: Python 3 optimizations continued...

Posted Sep 2, 2011 10:04 UTC (Fri) by paulj (subscriber, #341) [Link]

Sure, but the fact that methodology sections are often terribly incomplete is a different point. That can be fixed by releasing the code, and/or by increasing the quality of methodology sections in papers.

Personally, I'd prefer a succinct, non-working-code description of the methodology. Working code tends to have many many extraneous details that can confuse the issue.

Re: Python 3 optimizations continued...

Posted Sep 2, 2011 17:14 UTC (Fri) by njs (subscriber, #40338) [Link]

> That can be fixed by releasing the code, and/or by increasing the quality of methodology sections in papers.

I'm all for having good, high-level, detailed methodology sections. But empirically it's clear that there are often critical details that the experimenter considers extraneous, or doesn't even know about in the first place (i.e., what they implemented is not what they thought they implemented, thanks to bugs). So we could put a huge effort into pressuring editors and educating reviewers to get some marginal improvement in methodology sections, or we could just make a rule that you have to release the code that you wrote anyway. Less work, easier to enforce, and more effective...

Re: Python 3 optimizations continued...

Posted Sep 2, 2011 1:53 UTC (Fri) by gdt (subscriber, #6284) [Link]

But how can we tell that your code implemented your methodology? You are simply making an argument for "source code is part of the data of your experiment", and just like data it should be available to other scientists so they can check your analysis.

Re: Python 3 optimizations continued...

Posted Sep 2, 2011 10:08 UTC (Fri) by paulj (subscriber, #341) [Link]

You can tell by using the methodological description to recreate the experiment, and see if the results match. If the description was incomplete, that's a flaw in the original paper (which can be improved on with further papers - not necessarily from the original authors).

Code no more MUST be released to recreate a software-based experiment than chemistry-lab-experiment papers must come with the original bottles, stands & chemicals used. Indeed, scientifically, it's *preferable* that the results be validated from *scratch*, to avoid risk of "contaminating" recreations with any mistakes made in the original experiment.

Would it be nice to have code? Would it help speed some things up? Sure. Is it *required* for good science: no.

Re: Python 3 optimizations continued...

Posted Sep 2, 2011 16:34 UTC (Fri) by daglwn (guest, #65432) [Link]

> You can tell by using the methodological description to recreate the experiment, and see if the results match. If the description was incomplete, that's a flaw in the original paper (which can be improved on with further papers - not necessarily from the original authors).

Half of my Ph.D. dissertation was spent systematically analyzing all of the undocumented assumptions made in previous publications related to compilers and microarchitecture. Let me tell you, it's a colossal waste of time. We should not spend half of a graduate term deciphering what some other group tried to do a few years ago.

I learned a lot, but I also became completely disillusioned with the quality of research in the computing field. In short, it's worthless.

> Code no more MUST be released to recreate a software-based experiment

That is completely wrong. Code is central to the experiment. It is the authoritative document on all sorts of assumptions. Bottles, stands and chemicals are standardized. Code is not.

> Indeed, scientifically, it's *preferable* that the results be validated from *scratch*, to avoid risk of "contaminating" recreations with any mistakes made in the original experiment.

Ideally, perhaps, but we do not live in a world of ideals. In the real world, researchers often do not have the time to reproduce the code, in which case the publication goes without challenge (this is the most frequent outcome in the computing field). Other times the researchers want to reproduce the code but cannot, due to lack of documentation. A really ambitious researcher may spend a ton of time figuring out assumptions and discovering major flaws in the previous work, none of which can be published (in the computing field) because no conference will accept papers that do not show "improvement."

In the computing field, that last point is the Charybdis to the Scylla of unstated assumptions and privately-held code. Either one is grounds to seriously question the validity of our research methods, and therefore results, in computing.

I once talked to a few higher-ups at IBM Research. They flat out stated that they will not accept any computing publication as true until they have verified it themselves, and that they find 99% of it worthless, either because it cannot be reproduced or because the papers are flat-out lies.

Damning stuff, indeed.

Re: Python 3 optimizations continued...

Posted Sep 2, 2011 17:09 UTC (Fri) by deater (subscriber, #11746) [Link]

I wish to heartily agree with this.

Also wanted to add, never believe *any* computer research done entirely in a simulator. Believe it even less if they claim the simulator was "heavily modified" but refuse to give out source code for the changes they made.

Unfortunately, the above statements discount roughly 95% of all computer architecture conference publications from the past 10 years.

Re: Python 3 optimizations continued...

Posted Sep 2, 2011 20:59 UTC (Fri) by daglwn (guest, #65432) [Link]

> Also wanted to add, never believe *any* computer research done entirely in a simulator. Believe it even less if they claim the simulator was "heavily modified" but refuse to give out source code for the changes they made.

Right on. Simulators in general are notoriously unreliable predictors of actual performance. I have yet to encounter one that was not at least 20% off when compared to actual hardware, and those are the really good ones. In a field where a 5% gain will get you a publication, that kind of error makes the numbers meaningless. No, it's worse than that: it's downright misleading.

> Unfortunately, the above statements discount roughly 95% of all computer architecture conference publications from the past 10 years.

Yep. Probably more like 20 years. It's a tough problem to tackle. Computer architecture research is not simple and never was. Again, I have seen performance swings of 20% in either direction based simply on the heuristic the compiler used to decide which registers to spill. There's so much noise in the system that it is impossible to isolate the effect of a small microarchitectural change. Things that work well on one processor generation may be exactly the wrong things to do in the next.

That's why I am in favor of publishing ideas rather than results and frankly, that's how papers should be judged. Is the idea in the paper something truly novel or is it simply a tweak on some other idea that can in no way be measured meaningfully? Sure, some general numbers to indicate the idea has some merit are necessary but they should not be presented as proof of the quality of the idea.

Re: Python 3 optimizations continued...

Posted Sep 9, 2011 13:36 UTC (Fri) by fuhchee (guest, #40059) [Link]

"That's why I am in favor of publishing ideas rather than results and frankly, that's how papers should be judged. Is the idea in the paper something truly novel or is it simply a tweak on some other idea that can in no way be measured meaningfully?"

But ideas are cheap. Workable, useful ideas are valuable. Without results, how do you objectively tell them apart?

Re: Python 3 optimizations continued...

Posted Sep 2, 2011 17:22 UTC (Fri) by njs (subscriber, #40338) [Link]

> Let me tell you, it's a colossal waste of time. We should not spend half of a graduate term deciphering what some other group tried to do a few years ago.

That's not the worst of it. In one of the links I gave above, there was a analysis that purported to show that certain drugs were good cancer treatment candidates. People were skeptical, but it took ages to reverse engineer the analysis and show that it was hopelessly incorrect. (This reverse engineering included techniques like "take this accurate implementation of the algorithm they purportedly used, and then do a brute force search on all possible off-by-one errors until you find an algorithm that reproduces the published figure".)

By the time they had managed it, the original researchers had already started giving the drugs to patients in a clinical trial (!!!). The trial was canceled, huge scandal.
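
For readers wondering what "a brute force search on all possible off-by-one errors" looks like in practice, here is a hypothetical sketch of that kind of forensic re-analysis. The algorithm, names, and data are invented for illustration and are not the actual code from the case: the idea is simply to run the documented method under each plausible indexing bug and see which bug reproduces the published result.

    # Invented illustration: try the documented method under each candidate
    # indexing bug and keep whichever one reproduces the published figure.

    def top_genes(expression, n, offset=0):
        """The 'documented' method: rank genes by expression and report the
        top n, with an index shift simulating an off-by-one labelling bug."""
        ranked = sorted(range(len(expression)),
                        key=lambda i: expression[i], reverse=True)
        return [(i + offset) % len(expression) for i in ranked[:n]]

    def offsets_reproducing(expression, published, candidates=(-1, 0, 1)):
        """Return the index shifts under which the documented method
        reproduces the published gene list."""
        return [off for off in candidates
                if top_genes(expression, len(published), off) == published]

    expression = [0.1, 0.9, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]   # made-up data
    published = [2, 3, 5, 7, 0]                  # made-up "published" list
    print(offsets_reproducing(expression, published))   # -> [1]: off by one

If only the shifted version reproduces the figure, you have concrete evidence of where the original analysis went wrong, and a concrete reason why the written methodology section alone could not settle the argument.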

Re: Python 3 optimizations continued...

Posted Sep 3, 2011 11:32 UTC (Sat) by raven667 (subscriber, #5198) [Link]

That's kind of the point the other poster was making: that knowing the intended methodology is the important part in recreating an analysis. If you had just run the original software used to produce the paper, you would have made the same analysis errors. In this case, as soon as you tried to implement it independently using the published method, you knew the result was wrong. Figuring out how it went wrong without the source code is just gravy, although it would have been easier with the code.

Do you think that you would have gone through the trouble of reimplementing the analysis if the code had been available?

Re: Python 3 optimizations continued...

Posted Sep 3, 2011 18:01 UTC (Sat) by njs (subscriber, #40338) [Link]

Perhaps I wasn't clear. They were skeptical of the results because they had failed to replicate them, as you say. (Though IIRC they tried to replicate by running a similar experiment from scratch, not just re-analyzing the data -- in practice "replication" rarely attempts to be exact. This is good because any robust, real effect should show up even if you change around the details of the experiment, so doing non-exact replication is a stronger test than doing an exact replication.)

Now there was one lab that got one set of results, and another lab that got contradictory results. What next? You need to know who's right. Maybe the second lab just screwed up some reagent or something, and that's why they failed to replicate -- that happens all the time. And the first lab had a lot of money and prestige riding on their results being right, so they weren't going to withdraw those results without some real proof.

Figuring out how it went wrong wasn't gravy -- it was the key prerequisite to understanding the actual facts about these drugs, and it was the key prerequisite to stopping the misguided clinical trial.

Re: Python 3 optimizations continued...

Posted Sep 3, 2011 13:43 UTC (Sat) by paulj (subscriber, #341) [Link]

I'm sorry, but experiments in other fields may and do use custom-built tools, which will not be provided with the papers describing the results - not even design documents. Software in CS is just such a custom laboratory tool.

Complaining that results cannot be believed because the software wasn't released is missing the point somewhat: science requires that results be *independently* recreated. If you simply re-run the same piece of code on the same data, you're just hitting "replay" on the original experiment to an extent - that's NOT an independent recreation. Further, without that independent recreation, you should NOT believe a result - even if it DOES come with software you can re-run.

That's because, exactly as you say yourself, the code may have all kinds of assumptions baked in, some of which the original authors didn't even realise they made. These are things which need to be teased out by from-*scratch* recreations of the experiment. While code may be nice to have for some reasons, I doubt that the progress of science is served by shifting the burden from authors describing all methodology properly, in concise natural language in the paper, onto the rest of the community picking through $DEITY-knows what amount of experimental (i.e. perhaps hastily written and not very readable) code.

As for challenging the publication, an unvalidated result SHOULD be published, presuming it's sound otherwise (including a decent methodology ;) ). That's the beginning of the path towards having it scientifically validated, not the end, surely?

NB: I completely agree with everyone here who says 1 result from 1 specific simulation is not to be believed. I also agree there's a lot of research that does not adequately describe its methodology, and that needs to be fixed. I disagree though that releasing code is sufficient to address that problem, in terms of the scientific process (even if it could help other things).

Re: Python 3 optimizations continued...

Posted Sep 4, 2011 1:54 UTC (Sun) by deater (subscriber, #11746) [Link]

If you don't trust your own code enough to release your code for scrutiny, why should I trust it?

In any case, in the computer architecture field the only real way to prove something wrong is to design and fab a chip with the proposed changes. These days that runs upward of a few million dollars. So in the end pretty much all the research is not producible in the first place, let alone reproducible.

It is true that you could spend 3 years writing a simulator from scratch to try to refute someone, but all that gets you is yet another broken simulator with a different set of wrong assumptions.

But when people do release their code, you can sometimes find flaws within a few minutes of looking at the config file, or in a day or so of looking at the source code.

So the primary reason people refuse to release their code isn't some sort of "that's how science should be" ideal, but rather just a case of "boy, I can check off another publication, and it's likely no one will ever take the effort to prove me wrong."

Re: Python 3 optimizations continued...

Posted Sep 7, 2011 21:31 UTC (Wed) by daglwn (guest, #65432) [Link]

> So in the end pretty much all the research is not producible in the first place, let alone reproducible.

Right. That is why ideas should be of primary concern, not numbers. If we took this approach we could publish papers on the truly innovative ideas (branch prediction, register renaming, two-level branch prediction, value prediction, specialized caches, etc.) and forget about the minor tweaks (gshare and the rest of the never-ending list of insignificant branch predictor improvements, address prediction, etc.).
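
To make the ideas-vs-tweaks distinction concrete, here is a hypothetical toy sketch, not taken from any of the papers being criticised: a classic two-level global predictor indexes a table of two-bit saturating counters with the global branch-history register, and gshare's refinement is essentially the one-method change of XORing the branch address into that index to reduce aliasing.

    # Hypothetical toy predictors, written for illustration only.

    TABLE_BITS = 12
    TABLE_SIZE = 1 << TABLE_BITS

    class TwoLevelGlobal:
        """Two-level global predictor: the global branch-history register
        alone indexes a table of 2-bit saturating counters."""
        def __init__(self):
            self.history = 0
            self.counters = [1] * TABLE_SIZE   # start weakly not-taken

        def index(self, pc):
            return self.history & (TABLE_SIZE - 1)

        def predict(self, pc):
            return self.counters[self.index(pc)] >= 2   # True means "taken"

        def update(self, pc, taken):
            i = self.index(pc)
            if taken:
                self.counters[i] = min(3, self.counters[i] + 1)
            else:
                self.counters[i] = max(0, self.counters[i] - 1)
            self.history = ((self.history << 1) | int(taken)) & (TABLE_SIZE - 1)

    class GShare(TwoLevelGlobal):
        """The 'tweak': identical structure, but the branch address is
        XORed into the index to reduce aliasing between branches."""
        def index(self, pc):
            return (self.history ^ pc) & (TABLE_SIZE - 1)

The entire difference between the two is the index() method; whether a change of that size deserves its own paper is exactly the question being argued here.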

We really must get away from the accepted notion that Least Publishable Unit is at all acceptable.

