Re: Python 3 optimizations continued...
From: Guido van Rossum <guido-AT-python.org>
To: stefan brunthaler <stefan-AT-brunthaler.net>
Subject: Re: Python 3 optimizations continued...
Date: Wed, 31 Aug 2011 10:31:13 -0700
Message-ID: <CAP7+vJJ6kK9HtVQTn2CJ1qZp8FZn0xzsbpMk8i7gw+nUDG4QaQ@mail.gmail.com>
Cc: Stefan Behnel <stefan_ml-AT-behnel.de>, python-dev-AT-python.org
On Wed, Aug 31, 2011 at 10:08 AM, stefan brunthaler <stefan@brunthaler.net> wrote:
> Well, my code has primarily been a vehicle for my research in that
> area and thus is not immediately suited to adoption [...].

But if you want to be taken seriously as a researcher, you should publish your code! Without publication of your *code*, research in your area cannot be reproduced by others, so it is not science. Please stop being shy and open up what you have. The software engineering issues can be dealt with separately!

--
--Guido van Rossum (python.org/~guido)
Re: Python 3 optimizations continued...
Posted Sep 1, 2011 16:05 UTC (Thu) by paulj (subscriber, #341) [Link]
Re: Python 3 optimizations continued...
Posted Sep 1, 2011 17:14 UTC (Thu) by njs (subscriber, #40338) [Link]
http://boscoh.com/protein/a-sign-a-flipped-structure-and-...
http://projecteuclid.org/DPubS?service=UI&version=1.0...
http://www.aclweb.org/anthology/J04-4004.pdf
Re: Python 3 optimizations continued...
Posted Sep 2, 2011 10:04 UTC (Fri) by paulj (subscriber, #341) [Link]
Personally, I'd prefer a succinct, non-working-code description of the methodology. Working code tends to have many extraneous details that can confuse the issue.
Re: Python 3 optimizations continued...
Posted Sep 2, 2011 17:14 UTC (Fri) by njs (subscriber, #40338) [Link]
I'm all for having good, high-level, detailed methodology sections. But empirically it's clear that there are often critical details that the experimenter considers extraneous, or doesn't even know about in the first place (i.e., what they implemented is not what they thought they implemented, thanks to bugs). So we could put a huge effort into pressuring editors and educating reviewers to get some marginal improvement in methodology sections, or we could just make a rule that you have to release the code that you wrote anyway. Less work, easier to enforce, and more effective...
Re: Python 3 optimizations continued...
Posted Sep 2, 2011 1:53 UTC (Fri) by gdt (subscriber, #6284) [Link]
Re: Python 3 optimizations continued...
Posted Sep 2, 2011 10:08 UTC (Fri) by paulj (subscriber, #341) [Link]
Code need no more be released to recreate a software-based experiment than chemistry papers must come with the original bottles, stands, and chemicals used. Indeed, scientifically, it's *preferable* that the results be validated from *scratch*, to avoid the risk of "contaminating" recreations with any mistakes made in the original experiment.
Would it be nice to have code? Would it help speed some things up? Sure. Is it *required* for good science: no.
Re: Python 3 optimizations continued...
Posted Sep 2, 2011 16:34 UTC (Fri) by daglwn (guest, #65432) [Link]
Half of my Ph.D. dissertation was spent systematically analyzing all of the undocumented assumptions made in previous publications related to compilers and microarchitecture. Let me tell you, it's a colossal waste of time. We should not spend half of a graduate term deciphering what some other group tried to do a few years ago.
I learned a lot, but I also became completely disillusioned with the quality of research in the computing field. In short, it's worthless.
> Code no more MUST be released to recreate a software-based experiment
That is completely wrong. Code is central to the experiment. It is the authoritative document on all sorts of assumptions. Bottles, stands and chemicals are standardized. Code is not.
> Indeed, scientifically, it's *preferable* that the results be validated from *scratch*, to avoid risk of "contaminating" recreations with any mistakes made in the original experiment.
Ideally, perhaps, but we do not live in a world of ideals. In the world of reality, researchers often do not have the time to reproduce the code, in which case the publication goes without challenge (this is the most frequent outcome in the computing field). Other times the researchers will want to reproduce the code but cannot, due to lack of documentation. A really ambitious researcher may spend a ton of time figuring out assumptions and discovering major flaws in the previous work, none of which can be published (in the computing field) because no conference will accept papers that do not show "improvement."
In the computing field, that last point is the Charybdis to the Scylla of unstated assumptions and privately-held code. Either one is grounds to seriously question the validity of our research methods, and therefore results, in computing.
I once talked to a few higher-ups at IBM research. They flat out stated that they will not accept any computing publication as true until they verify it themselves and they find that 99% of it is worthless either because it cannot be reproduced or the papers are flat-out lies.
Damning stuff, indeed.
Re: Python 3 optimizations continued...
Posted Sep 2, 2011 17:09 UTC (Fri) by deater (subscriber, #11746) [Link]
Also wanted to add, never believe *any* computer research done entirely in a simulator. Believe it even less if they claim the simulator was "heavily modified" but refuse to give out source code for the changes they made.
Unfortunately the above statements discount roughly 95% of all computer architecture conference publications from the past 10 years.
Re: Python 3 optimizations continued...
Posted Sep 2, 2011 20:59 UTC (Fri) by daglwn (guest, #65432) [Link]
Right on. Simulators in general are notoriously unreliable predictors of actual performance. I have yet to encounter one that was not at least 20% off when compared to actual hardware and those are the really good ones. In a field where a 5% gain will get you a publication, it's meaningless. No, it's worse than that. It's downright misleading.
> Unfortunately the above statements discount roughly about 95% of all computer architecture conference publications from the past 10 years.
Yep. Probably more like 20 years. It's a tough problem to tackle. Computer architecture research is not simple and never was. Again, I have seen swings of 20% performance changes either way simply based on the heuristic the compiler used to determine which registers to spill. There's so much noise in the system that it is impossible to isolate a small microarchitectural change. Things that work well on one processor generation may be exactly the wrong things to do in the next.
That's why I am in favor of publishing ideas rather than results and frankly, that's how papers should be judged. Is the idea in the paper something truly novel or is it simply a tweak on some other idea that can in no way be measured meaningfully? Sure, some general numbers to indicate the idea has some merit are necessary but they should not be presented as proof of the quality of the idea.
Re: Python 3 optimizations continued...
Posted Sep 9, 2011 13:36 UTC (Fri) by fuhchee (guest, #40059) [Link]
But ideas are cheap. Workable, useful ideas are valuable. Without results, how do you objectively tell them apart?
Re: Python 3 optimizations continued...
Posted Sep 2, 2011 17:22 UTC (Fri) by njs (subscriber, #40338) [Link]
That's not the worst of it. In one of the links I gave above, there was an analysis that purported to show that certain drugs were good cancer treatment candidates. People were skeptical, but it took ages to reverse engineer the analysis and show that it was hopelessly incorrect. (This reverse engineering included techniques like "take this accurate implementation of the algorithm they purportedly used, and then do a brute force search on all possible off-by-one errors until you find an algorithm that reproduces the published figure".)
By the time they had managed it, the original researchers had already started giving the drugs to patients in a clinical trial (!!!). The trial was canceled, huge scandal.
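The brute-force trick described above — enumerating off-by-one variants of a known algorithm until one reproduces a published number — can be sketched in a few lines of Python. Everything here is hypothetical: the windowed-mean statistic, the data, and the "published" value are invented for illustration, and the real analysis in the linked case was far more involved.

```python
from itertools import product

def windowed_means(data, w, start_off=0, end_off=0):
    """Reference sliding-window mean, with optional off-by-one
    perturbations of the window boundaries."""
    out = []
    for i in range(len(data) - w + 1):
        lo = max(0, i + start_off)
        hi = min(len(data), i + w + end_off)
        if hi > lo:
            out.append(sum(data[lo:hi]) / (hi - lo))
    return out

data = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]

# Pretend this result appeared in the paper: it was actually produced
# by a buggy variant whose window end is short by one element.
published = windowed_means(data, 3, end_off=-1)

# Brute-force search over all off-by-one perturbations of the
# reference implementation until one reproduces the published figure.
for s, e in product((-1, 0, 1), repeat=2):
    if windowed_means(data, 3, start_off=s, end_off=e) == published:
        print(f"matched with start_off={s}, end_off={e}")
```

The point of the exercise is that once the candidate space of plausible bugs is enumerable, matching a published figure against every variant is mechanical; without the original code, even this toy search only works if the reference algorithm is known.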
Re: Python 3 optimizations continued...
Posted Sep 3, 2011 11:32 UTC (Sat) by raven667 (subscriber, #5198) [Link]
Do you think that you would have gone through the trouble of reimplementing the analysis if the code was available?
Re: Python 3 optimizations continued...
Posted Sep 3, 2011 18:01 UTC (Sat) by njs (subscriber, #40338) [Link]
Now there was one lab that gets one set of results, and another lab that gets contradictory results. What next? You need to know who's right. Maybe the second lab just screwed up some reagent or something, and that's why they failed to replicate -- that happens all the time. And the first lab had a lot of money and prestige riding on their results being right, so they weren't going to withdraw those results without some real proof.
Figuring out how it went wrong wasn't gravy -- it was the key prerequisite to understanding the actual facts about these drugs, and it was the key prerequisite to stopping the misguided clinical trial.
Re: Python 3 optimizations continued...
Posted Sep 3, 2011 13:43 UTC (Sat) by paulj (subscriber, #341) [Link]
Complaining that results can not be believed because the software wasn't released is missing the point somewhat: science requires that results be *independently* recreated. If you simply re-run the same piece of code on the same data, you're just hitting "replay" on the original experiment to an extent - that's NOT an independent recreation. Further, without that independent recreation, you should NOT believe a result - even if it DOES come with software you can rerun.
That's because, exactly as you say yourself, the code may have all kinds of assumptions. Some which the original authors didn't even realise they made. These are things which need to be teased out by from-*scratch* recreations of the experiment. While code may be nice to have for some reasons, I doubt that the progress of science is served by shifting the burden from authors describing all methodology properly, in concise natural language in the paper, on to the rest of the community to pick through $DEITY-knows what amount of experimental (i.e. perhaps hastily-written and not very readable) code.
As for challenging the publication, an unvalidated result SHOULD be published, presuming it's sound otherwise (including a decent methodology ;) ). That's the beginning of the path towards having it scientifically validated, not the end, surely?
NB: I completely agree with everyone here who says 1 result from 1 specific simulation is not to be believed. I also agree there's a lot of research that does not adequately describe its methodology, and that needs to be fixed. I disagree though that releasing code is sufficient to address that problem, in terms of the scientific process (even if it could help other things).
Re: Python 3 optimizations continued...
Posted Sep 4, 2011 1:54 UTC (Sun) by deater (subscriber, #11746) [Link]
In any case, in the computer architecture field the only real way to prove something wrong is to design and fab a chip with the proposed changes. These days that runs upward of a few million dollars at least. So in the end pretty much all the research is not producible in the first place, let alone reproducible.
It is true that you could spend 3 years writing a simulator from scratch to try to refute someone, but all that gets you is yet another broken simulator with a different set of wrong assumptions.
But if they had released their code, you can sometimes find flaws within a few minutes of looking at the config file, or in a day or so of looking at the source code.
So the primary reason people refuse to release their code isn't some sort of "that's how science should be" ideal, but rather just a case of "I can check off another publication, and it's likely no one will ever take the effort to prove me wrong."
Re: Python 3 optimizations continued...
Posted Sep 7, 2011 21:31 UTC (Wed) by daglwn (guest, #65432) [Link]
Right. That is why ideas should be of primary concern, not numbers. If we took this approach we could publish papers on the truly innovative ideas (branch prediction, register renaming, two-level branch prediction, value prediction, specialized caches, etc.) and forget about the minor tweaks (gshare and the rest of the never-ending list of insignificant branch predictor improvements, address prediction, etc.).
We really must get away from the notion that the Least Publishable Unit is at all acceptable.