Request: use semantic versioning #10156
Could numpy be considered to be using semantic versioning, but with a leading "1."?
Note that almost every core scientific Python project does what NumPy does: remove deprecated code after a couple of releases unless that's very disruptive, and only bump the major version number for, well, major things. Not sure if you're proposing a change to the deprecation policy, or if you think we should be at version 14.0.0 instead of 1.14.0 now.
The latter: NumPy should be roughly at version 14 by now. But I propose to adopt this convention only for future releases. BTW: NumPy's predecessor, Numeric, did use semantic versioning and got to version 24 over roughly a decade. I don't know why this was changed in the transition to NumPy.
My impression is that the vast majority of Python projects do not use semantic versioning. For example, Python itself does not use semantic versioning. (I'm also not aware of any mainstream operating systems or compilers that use semver -- do you have some in mind?) I agree that semver proponents have done a great job of marketing it, leading many developers into thinking that it's a good idea, but AFAICT it's essentially unworkable in the real world for any project larger than left-pad, and I strongly dispute the idea that the semver folks now "own" the traditional MAJOR.MINOR.MICRO format and everyone else has to switch to something else.

Can you give an example of what you mean by a "release labelling scheme that cannot be mistaken for semantic versioning"? Using names instead of numbers? You cite date-based versioning, but the most common scheme for this that I've seen is the one used by e.g. Twisted and PyOpenSSL, which are currently at 17.9.0 and 17.5.0, respectively. Those look like totally plausible semver versions to me...

And can you elaborate on what benefit this would have to users? In this hypothetical future, every release would have some breaking changes that are irrelevant to the majority of users, just like now. What useful information would we be conveying by bumping the major number every few months? "This probably breaks someone, but probably doesn't break you"? Should we also bump the major version on bugfix releases, given the historical inevitability that a large proportion of them will break at least 1 person's code?

Can you give any examples of "software developers, software users, and managers of software distributions" who have actually been confused?
Note that the mailing list is a more appropriate venue for this discussion, and probably we would have to have a discussion there before actually making any change, but the comments here should be useful in getting a sense of what kind of issues you'd want to address in that discussion.
@njsmith It seems that the only factual point we disagree on is whether or not semantic versioning is the default assumption today. This requires a clearer definition of the community in which it is (or is not) the default. The levels of software management I care about are distribution management and systems administration, which is where people decide which version is most appropriate in their context. The informal inquiry that led me to the conclusion that semantic versioning is the default consisted of talking to administrators of scientific computing installations.

A comment from one systems administrator particularly struck me as relevant: he said that for the purposes of deciding which version to install, any convention other than semantic versioning is useless. Systems administrators can neither explore each package in detail (they lack the time and the competence) nor consult all their users (too many of them). They have to adopt a uniform policy, and this tends to be based on the assumption of semantic versioning. For example, an administrator of a computing cluster told me that he checks with a few "power users" he knows personally before applying an update with a change in the major version number.

As for examples of people who have actually been confused, specifically concerning scientific Python users, I have plenty of them: colleagues at work, people I meet at conferences, people who ask for advice by e-mail, students in my classes. This typically starts with "I know you are a Python expert, can you help me with a problem?" That problem turns out to be a script that works on one computer but not on another. Most of these people don't consider dependency issues at all, but a few did actually compare the version numbers of the two installations, finding only "small differences".
As @eric-wieser and @rgommers noted, my request is almost synonymous with requesting that the initial "1." be dropped from NumPy versions. In other words, NumPy de facto already uses semantic versioning, even though it is not the result of a policy decision and therefore probably not done rigorously. However, it does suggest that NumPy could adopt semantic versioning with almost no change to the current development workflow.
Unfortunately, semantic versioning is also useless for this :-(. I don't mean to split hairs or exaggerate; I totally get that it's a real problem. But just because a problem is real doesn't mean that it has a solution. You fundamentally cannot boil down the question "should I upgrade this software?" to a simple mechanical check. It's a fantasy. Projects that use semver regularly make major releases that all their users ought to immediately upgrade to, and regularly make breaking changes in minor releases.
I like this part though :-). I doubt we'll agree about the philosophy of semver, but it's much easier to have a discussion about the concrete effects of different versioning schemes, and which outcome we find most desirable. I don't think the concept of semver has much to do with this policy -- does the system admin you talked to actually check every project to see if they're using semver? Most projects don't, as you said, it's hard to even tell which ones do. And the policy is the same one that sysadmins have been using since long before semver even existed. I think a better characterization of this policy would be: "follow the project's recommendation about how careful to be with an upgrade", along with the ancient tradition that major releases are "big" and minor releases are "little". The NumPy project's recommendation is that system administrators should upgrade to new feature releases, so what I take from this anecdote is that our current numbering scheme is accurately communicating what we want it to, and that switching to semver would not...
@njsmith OK, let's turn away from philosophy and towards practicalities: What is the role of software version numbers in the communication between software developers, system maintainers, and software users? Again it seems that we have a major difference of opinion here. For you, it's the developers who give instructions to system maintainers and users, and use the version numbers to convey their instructions. For me, every player should decide according to his/her criteria, and the version number should act as a means of factual communication at the coarsest level. Given that NumPy has no security implications, I don't see how and why the NumPy project should give universal recommendations. People and institutions have different needs. That's why we have both ArchLinux and CentOS, with very different updating policies.
@khinsen The
Perhaps this could be your proposed "stable numpy," where the interface to numpy is restricted to Python/Cython and nothing is ever changed. Of course, writing code with oldnumeric is very arcane, but you can't have it both ways.
@xoviat True, but that's a different issue. My point here is not software preservation, but communication between the different players in software management. Question: As a systems administrator (even just on your personal machine), would you expect a package to drop a complete API layer from version 1.8 to version 1.9? For those who replied "yes", second question: can you name any software other than numpy that ever did this? BTW, I can assure you that many people were bitten by this, because I got a lot of mails asking me why MMTK stopped working from one day to the next. All these people had done routine updates of their software installations, without expecting any serious consequences. But dropping
BTW, since hardly anyone knows the story:
Which layer are you referring to? |
SciPy dropped
That change was about 10 years in the making, and there is no way that a different versioning scheme would have made a difference here. Dropping deprecated features is a tradeoff between breaking a small fraction of (older) code, and keeping the codebase easy to maintain. Overall, if we're erring, then we're likely erring on the side of being conservative. As someone who also has had to deal with many-years-old large corporate code bases that use numpy I feel your pain, but you're arguing for something that is absolutely not a solution (and in general there is no full solution; educating users about things like pinning versions and checking for deprecation warnings is the best we can do).
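A minimal sketch of the "pinning versions and checking for deprecation warnings" practice mentioned above; the version bounds are illustrative, not a recommendation:

```python
# requirements.txt-style pin (illustrative bounds):
#     numpy>=1.13,<1.14
#
# In a test run, promote DeprecationWarning to an error so that features
# scheduled for removal surface before the release that actually drops them.
import warnings

import numpy as np

warnings.simplefilter("error", DeprecationWarning)

a = np.arange(10)
print(a.sum())  # any deprecated NumPy usage above this point would have raised
```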
numeric/numarray support I assume |
@rgommers Sorry, I should have said "another example outside the SciPy ecosystem". Also, I am not complaining about dropping the support for numeric/numarray as such.

What difference would a different versioning scheme have made? It would have made people hesitate to update without reading the release notes. Everyone using (but not developing) Python code would have taken this as a sign to be careful. Don't forget that the SciPy ecosystem has an enormous number of low-profile users who are not actively following developments. Python and NumPy are infrastructure items of the same nature as gcc or gnu-coreutils.
Just edited my reply with a link to the Python release notes, that's outside the SciPy ecosystem.
This will simply not be the case. If instead of 1.12, 1.13, 1.14, etc. we have 12.0, 13.0, 14.0, then users get used to that and will use the same upgrade strategy as before. The vast majority will not all of a sudden become much more conservative.
All true, and all not magically fixable by a version number. If they ran
Other downsides of changing the versioning scheme now:
My baseline reference is not Python, but a typical software installation. As I said, for many (perhaps most) users, NumPy is infrastructure like gnu-coreutils or gcc. They do not interpret version numbers specifically in the context of the SciPy ecosystem.

I did a quick check on a Debian 9 system with about 300 installed packages. 85% of them have a version number starting with an integer followed by a dot. The most common integer prefixes are 1 (30%), 2 (26%), 0 (14%) and 3 (13%). If NumPy adopted a version numbering scheme conforming to common expectations (i.e. semantic versioning or a close approximation), it definitely would stand out and be treated with caution. Note also that the only updates in Debian-installed software that ever broke things for me were in the SciPy ecosystem, with the sole exception of an Emacs update that brought changes in org-mode which broke a home-made org-mode extension. The overall low version number prefixes thus do seem to indicate that most widely used software is much more stable than NumPy and friends.

Uniformity across the SciPy ecosystem is indeed important, but I would prefer that the whole ecosystem adopt a versioning scheme conforming to the outside world's expectations. I am merely starting with NumPy because I see it as the most basic part: it is even more of an infrastructure item than anything else in the ecosystem.

Finally, I consider a change in a function's semantics a much more important change than a change in the ABI. The former can cause debugging nightmares for hundreds of users, and make programs produce undetected wrong results for years. The latter leads to error messages that clearly indicate the need to fix something. By those standards, NumPy is not even following Python's lead, because the only changes in semantics I am aware of in the Python language happened from 2 to 3.
This we try really hard not to do. Clear breakage when some feature is removed can happen, silently changing numerical results should not. That's one thing we learned from the diagonal view change - that was a mistake in hindsight.
I still disagree. Even on Debian, which is definitely not "a typical software installation" for our user base (that'd be something like Anaconda on Windows). You also seem to ignore my argument above that a user doesn't even get to see a version number normally (neither with
Also, your experience that everything else never breaks is likely because you're using things like OS utilities and GUI programs, not other large dependency chains. E.g. the whole JavaScript/NodeJS ecosystem is probably more fragile than the Python one.
This is a good example of the subtleties here. As far as I know, MMTK and your other projects are the only ones still extant that were affected by the removal of the numeric/numarray compatibility code. How many users would you estimate you have? 100? 1000? NumPy has millions, so maybe 0.1% of our users were affected by this removal? This is definitely not zero, and the fact that it's small doesn't mean that it doesn't matter – I wish we could support 100% of users forever in all ways. And I understand that it's particularly painful for you, receiving 100% of the complaints from your users.

But if we bump our major version number for this, it means to 99.9% of our users, we've just cried wolf. It's a false positive. OTOH for that 0.1% of users, it was really important. Yet it's not uncommon that we break more than 0.1% of users in micro releases, despite our best efforts. So what do we do? It's simply not possible to communicate these nuances through the blunt instrument of a version number.

Everyone wants a quick way to tell whether an upgrade will break their code, for good reasons. Semver is popular because it promises to do that. It's popular for the same reason that it's popular to think that fad diets can cure cancer. I wish semver lived up to its promises too. But it doesn't, and if we want to be good engineers we need to deal with the complexities of that reality.
We give universal recommendations because we only have 1 version number, so by definition whatever we do with it is a universal recommendation. That's not something we have any control over.
IIRC we have literally not received a single complaint about this from someone saying that it broke their code. (Maybe one person?) I'm not saying that means no-one was affected, obviously the people who complain about a change are in general only a small fraction of those affected, but if you use complaints as a rough proxy for real-world impact then I don't think this makes the top 50. And BTW I'm pretty sure if you go searching through deep history you can find far more egregious changes than that :-).
Respectfully, I think this says more about how you use NumPy vs Debian than it does about NumPy versus Debian. I love Debian, I've used it for almost 20 years now, and I can't count how many times it's broken things. Just in the last week, some bizarre issue with the new gnome broke my login scripts and some other upgrade broke my trackpoint. (Both are fixed now, but still.) I'll also note that Debian's emacs was set up to download and run code over unencrypted/insecure channels for years, because of backwards compatibility concerns about enabling security checks. I don't think there's such a thing as a gcc release that doesn't break a few people, if only because people do things like use
The overall low version number prefixes are because most widely used software does not use semver.
Yes, that's why we're extremely wary of such changes. There is some disconnect in perspectives here: you seem to think that we change things willy-nilly all the time, don't care about backwards compatibility, etc. I can respect that; I understand it reflects your experience. But our experience is that we put extreme care into such changes, and I would say that when I talk to users, it's ~5% who have your perspective, and ~95% who feel that numpy is either doing a good job at stability, or that it's doing too good a job and should be more willing to break things. Perhaps you can take comfort in knowing that even if we disappoint you, we are also disappointing that last group :-).
Well, to go off topic, that does serve as an example of the other side of stability. Emacs was static for years due to Stallman's resistance to change, and that resulted in the xEmacs fork. My own path went Emacs -> xEmacs -> to heck with it -> Vim ;) Premature fossilization is also why I stopped using Debian back in the day. For some things, change simply isn't needed or even wanted, and I expect there are people running ancient versions of BSD on old hardware hidden away in a closet. But I don't expect there are many such places.

Apropos the current problem, I don't think a change in the versioning scheme would really make any difference. A more productive path might be to address the modernization problem. @khinsen Do you see your way to updating your main projects? If so, I think we should explore ways in which we can help you do it.
I am attempting to update the projects at https://github.com/ScientificPython. It requires updating Python code that used the old C API (and I mean old; some functions such as Py_PROTO were from 2000). PRs are of course welcome, but I'm not sure whether anyone wants to spend their time on that.

The bigger issue that I think he brought up is that there are "many projects" (I don't know where exactly they are, because all the projects that I've seen support Python 3) that also need updating; how is it determined which projects are allocated NumPy developer time? And I also don't think his central claim was invalid: SciPy greatly benefits from the fact that it could simply copy and paste old Fortran projects (such as fftpack) with little or no modification. If these had been written in, say, "Fortran 2" and new compilers only compiled "Fortran 3", there would have been significant issues.

That said, these issues aren't really NumPy's fault. Despite what he has said, with NumPy 1.13 installed, oldnumeric still passed all of the tests, indicating that NumPy is not the culprit here. Since the oldnumeric API is literally over a decade old (maybe approaching two decades!), and it still works on the latest NumPy, I think that the NumPy API is probably stable enough.
@charris I fully agree with you that "never change anything" is not a productive attitude in computing. My point is that the SciPy ecosystem has become so immensely popular that no single approach to managing change can suit everyone. It depends on how quickly methods and their implementations evolve in a given field, on the technical competences of practitioners, on other software they depend on, on the resources they can invest into code, etc.

The current NumPy core team cares more about progress (in a direction that matters for some fields but is largely irrelevant to others) than about stability. That is fine - in the Open Source world, the people who do the work decide what they want to work on. However, my impression is that they do not realize that lots of people whose work depends on NumPy have different needs, feel abandoned by the development team, and are starting to move away from SciPy towards more traditional and stable technology such as C and Fortran (and, in one case I know, even to Matlab). I have no idea what percentage of NumPy users are sufficiently unhappy with the current state of affairs, and I don't think anyone else has. Once a software package becomes infrastructure, you cannot easily estimate who depends on it. Many who do are not even aware of it, and much code that depends on NumPy (directly or indirectly) is not public and/or not easily discoverable.

If we want to keep everyone happy in the SciPy community, we need to find a way to deal with diverse needs. The very first step, in my opinion, is to shift the control over the rate of change in a specific installation from the developers to someone who is closer to the end user. That could be the end users themselves, or systems administrators, or packagers, or whoever else - again I don't think there is a universal answer to this question. What this requires from the developers is information at the right level, and that is why I started this thread. Of course version numbers cannot save the world, but I see them as a first step towards establishing a distributed responsibility for change management.

Finally, some of you seem to believe that I am fighting a personal battle about my own code. It may surprise you that my personal attitude is not the one I am defending here. My own sweet spot for the rate of change is somewhere in between what is common in my field and what seems to be prevalent in the NumPy team. Most of my work today uses Python 3 and NumPy > 1.10. MMTK is 20 years old and I do many things differently today. Quite often I take pieces of code from MMTK that I need for a specific project and adapt them to "modern SciPy", but that's something I can do with confidence only because I wrote the original code. I have been maintaining a stable MMTK as a service to the community, not for my own use, which explains why I have been doing maintenance in a minimalistic way, avoiding large-scale changes in the codebase. Both funding for software and domain-competent developers are very hard to find, so MMTK has always remained a one-maintainer-plus-occasional-contributors project. I am not even sure that porting all of MMTK to "modern SciPy" would do anyone any good, because much of the code that depends on MMTK is completely unmaintained. But then, that's true for most of the Python code I see around me, even code completely unrelated to MMTK. It's the reality of a domain of research where experiments rather than computation and coding are the focus of attention.
@xoviat The number of tests in … The C extension modules that you have been looking at are literally 20 years old and were written for Python 1.4. Back then, they were among the most sophisticated examples of Python-C combos and in fact shaped the early development of Numeric (pre-NumPy) and even CPython itself: CObjects (pre-Capsules) were introduced based on the needs of ScientificPython and MMTK. I am the first to say that today's APIs and support tools are much better, and I expect they will still improve in the future. But some people simply want to use software for doing research, no matter how old-fashioned it is, and I think they have a right to exist as well.
@rgommers I am not ignoring your argument that a user doesn't even get to see a version number. It's simply not true for the environments I see people use all around me. The people who decide about updates (which are not always end users) do see it. They don't just do "pip install --upgrade" once a week. They would even consider this a careless attitude. If people around you mainly use Anaconda under Windows, that just confirms that we work in very different environments. In the age of diversity, I hope we can agree that each community may adopt the tools and conventions that work well for it. And yes, NodeJS is worse, I agree. Fortunately, I can easily ignore it.
Just got an e-mail from a colleague who follows this thread but wouldn't dare to chime in. With an excellent analogy: "I love it when I get the chance to buy a new microscope and do better science with it. But I would hate to see someone replacing my microscope overnight without consulting with me." It's all about having control over one's tools.
RE: fancy indexing. Indeed, this could use a dedicated function. This is what was done in TensorFlow, for example, with a dedicated gather function. Another feature that can help is type annotations, a feature that was partially motivated by the difficulty of the Python 2 to 3 transition. I'm not saying this would be easy. In my mind, the community consequences are a bigger deal. This would indeed take a lot of energy to implement and then push downstream into projects like SciPy.
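For illustration, a small sketch of the "dedicated function instead of fancy indexing" idea using NumPy's existing np.take; the TensorFlow counterpart alluded to above is presumably its gather-style API (e.g. tf.gather):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
idx = np.array([3, 0, 3])

# Fancy indexing: concise, but its exact semantics are what the discussion
# above considers hard to evolve.
print(a[idx])           # [40 10 40]

# The same gather expressed through a named function; an explicit entry point
# is easier to document, annotate, and deprecate independently.
print(np.take(a, idx))  # [40 10 40]
```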
@khinsen I've been following the discussion all week and I think I have a practical test problem to test your take on it. This might be a good item to see how your perspective would handle such conflicts, instead of the slightly abstract discussion so far.

Currently, thanks to the Apple Accelerate framework, the minimum required LAPACK version is 3.1-ish, which is from more than a decade ago. Currently LAPACK is at 3.8.0. In the meantime they have discarded quite a number of routines (deprecated and/or removed), fixed a lot of bugs, and most importantly introduced new routines that are needed to fill the gap between commercial software and Python scientific software. The end result is summarized here. I have been constantly annoying mainly @rgommers and others for the last 6 months about this 😃 and I can assure you, if they were the kind of people that you, maybe unwillingly, portrayed here, this would have happened by now and broken the code of many people. Instead they have been patiently explaining why it is not that easy to drop the support for Accelerate.

Now there is an undisputed need for newer versions. That is not the discussion and we can safely skip that part. There is a significant portion of users of NumPy and SciPy that would benefit from this. But we can't just simply drop it because of arguments that you have already presented. How would you resolve this? I'm not asking this in a snarky fashion, but since all the devs seemingly think alike (and I have to say I agree with them), maybe your outlook can give a fresh idea. Should we keep Accelerate and create a new NumPy/SciPy package every time such a thing happens? If we drop the support in order to innovate, what is the best way you think to go here?
@xoviat: Let's have this discussion on that issue |
@ilayn Thanks for nudging this discussion towards the concrete and constructive! There are in fact many similarities between that situation and the ones that motivated me to start this thread. The main common point: there are different users/communities that have different needs. Some want Accelerate, others want the new LAPACK features. Both have good reasons for their specific priorities. There may even be people who want both Accelerate and the new LAPACK features, though this isn't clear to me.

In the Fortran/C world, there is no such problem because the software stacks are shallower. There's Fortran, LAPACK, and the application code, without additional intermediates. What happens is that each application code chooses a particular version of LAPACK depending on its priorities. Computing centres typically keep several LAPACK versions in parallel, each in its own directory, the choice being made by modifying the application code's build configuration.

The lesson that we can and should take over into the SciPy ecosystem is that choosing software versions is not the task of software developers, but of the people who assemble application-specific software bundles. In our world, that's the people who work on Anaconda, Debian, and other software distributions, but also systems managers at various levels and end users with the right competence and motivation.

So my proposal for the SciPy/LAPACK dilemma is to keep today's SciPy using Accelerate, but put it into minimal maintenance mode (possibly taken over by different people). People who want Accelerate can then choose "SciPy 2017" and be happy. They won't get the new LAPACK features, but presumably that's fine with most of them. Development continues in a new namespace (say, scipy2018). The overall idea is that developers propose new stuff (and concentrate on its development), but don't advertise it as "better" in a general sense, nor as a universal replacement. Choosing the right combination of software versions for a particular task is not their job, it's somebody else's.

The general idea that development and assembly are done independently and by different people also suggests that today's mega-packages should be broken up into smaller units that can progress at different rates. There is no reason today for NumPy to contain a small LAPACK interface and tools like f2py.

The main obstacle to adopting such an approach is traditional Linux distributions such as Debian or Fedora with their "one Python installation per machine" approach. I think they could switch to multiple system-wide virtual environments with reasonable effort, but I haven't thought much about this. For me, the future of software packaging is environment-based systems such as conda or Guix.
(This has become a free-for-all discussion, so I'll go ahead and jump in). The problem with keeping support for accelerate is not that it lacks newer LAPACK APIs. If that were the problem, we could ship newer LAPACK shims and be done. The problem is that there are basic functions that return incorrect results in certain scenarios. There is no way to work around that other than to write our own BLAS functions. And if we're doing that, we might as well require OpenBLAS or MKL.
@xoviat These have been all discussed in scipy/scipy#6051. It's as usual never that simple. But the point is not to discuss the Accelerate drop but to use it as a use case for the actual dev cycle for new versions.
One could argue that a feature (or limitation) of the Python ecosystem is that you get one version of a library, without the horrible hack of name mangling. This happens in core Python. This is why there are libraries named lib and lib2 which have the same purpose but API differences. Even core Python works this way. It isn't possible to mix standard libraries across versions, even if both are technically usable on the modern Python, without someone ripping one out and putting it on PyPI. There are plenty of StackOverflow questions on this, all with the same conclusion.
@ilayn If for some reason you want to have all possible combinations of all versions of everything on your machine, yes, that's a mess. But why would you want that? If you limit yourself to the combinations you actually need for your application scenarios, I bet it's going to be less. As an example, I keep exactly two Python environments on my machine: one with Python 2 + NumPy 1.8.2 for running my 20-year-old code, and one representing the state of the art of about two years ago for everything else (two years ago because I set it up two years ago, and never saw a reason to upgrade after that).

As for granularity, I was perhaps not quite clear in my proposition. What I advocate is more granularity in packaging, not in development. I would expect development of, say, f2py and SciPy to continue in close coordination. f2py-2018 and SciPy-2018 should work together. That doesn't mean they have to be packed as a single entity. The goal is to provide more freedom for software distribution managers to do their work.

I definitely don't want to make Anaconda or any other distribution a dependency. It's more like the "abundance of somebody else's", although I don't expect the number of distributions to grow to "abundance", given that assembling them is a lot of work.

I have no idea what workflow "the user base" wants. I see lots of different user bases with different requirements. Personally I'd go for multiple environments, but if there is a significant user base that wants a single environment per machine, some distribution will take care of that. But virtual environments were invented for a reason, they solve a real problem. System-level distributions like Nix or Guix take them to another level. I don't expect them to go away. BTW, I am actually following the mailing list of one Linux distribution (Guix). Not much fun, but a lot of down-to-earth grunt work. I am happy there are people doing this.
@xoviat I didn't suggest to "keep Accelerate support". I merely suggest to keep a SciPy variant (pretty much the current one) around, not as an outdated release for the museum, but as a variant of interest for a particular user group: those for whom using Accelerate is more important than solving the problems that Accelerate creates for others. The "Accelerate first" people will have to live with the consequences of their choice. Some problems will never be fixed for them. That's probably fine with them ("known bugs are better than unknown bugs"), so why force them into something different?

It's really all about labelling and communication. I want to get away from the idealized image of software following a linear path of progress, with newer versions being "better" as indicated by "higher" version numbers. I want to replace this image with one that I consider more realistic: there is no obvious order relation between software releases. Those produced by a long-lived coherent developer community have a temporal order, but that doesn't imply anything about quality or suitability for any given application. If the idealized image were right, we wouldn't see forks, and we wouldn't have virtual environments. Nor projects such as VersionClimber. What I am proposing is that software developers should embrace this reality rather than denying it. They should develop (and, most of all, package and label) their products for a world of diversity.
@khinsen If you're okay with incorrect results from linear algebra functions, then we can keep accelerate support (note to others: I know how to do this). However, the main problem is that you might be the only person who wants this. And even if you are not, what happens when someone down the road blames SciPy for a problem with accelerate? What happens when someone wants to have their cake and eat it too? I can just see that happening.
@xoviat No, I am not OK with incorrect results from linear algebra functions. But I am sure that there are plenty of SciPy users who don't need the affected functions at all. In the thread you referred to, someone suggested removing/deactivating the affected functions when Accelerate is detected, which I think is a good solution (note: I cannot judge the effort required to implement this). In a way this is part of the mega-package problem. With a more granular distribution, it would be easier to pick the stuff that works, both at the development and the distribution assembly level. One could even imagine a distribution assembler composing a domain- and platform-specific SciPy distribution in which different subpackages use different LAPACK versions, e.g. for use in HPC contexts.
There's minimal evidence for this statement and I would in fact bet on the opposite. The functions are widely used but only fail in certain scenarios; in other words, your results are probably correct but may not be. Yes, this probably applies to the SciPy that you currently have installed if using OSX. Yes, this needs to be fixed. As far as maintaining a separate branch, I don't think that anyone would be opposed to giving you write access to a particular branch for you to maintain. But this is open source software and people work on what they want to; I am skeptical that many people would be interested in maintaining that branch.
Actually, I think the anaconda SciPy is compiled with MKL, so you wouldn't be affected in that case. But then why would you care about accelerate support?
@xoviat It seems there's a big misunderstanding here. I have no personal stakes at all in this specific issue. I don't use any linear algebra routines from SciPy. You pointed to a thread on a SciPy issue and asked how I would handle that kind of situation. The thread clearly shows reluctance to simply drop Accelerate support, from which I deduced that there is a significant user group that would be affected by such a change. If that user group doesn't exist, then where is the problem? Why hasn't SciPy already dropped Accelerate support?
@xoviat Maintaining a separate branch is easy for anyone. There is no need for it to be hosted in the same GitHub repository. In other words, branches are not the issue. The issue is namespacing, in order to make the parallel existence of separate SciPy versions transparent to users (and distribution assemblers).

Today, when you see code saying "import scipy", you have no idea for which range of SciPy versions it is supposed to work (i.e. has been tested to some degree). In the best case, there is a README saying "SciPy >= 0.8" or something like that. This habit is based on the assumption that "higher" versions are always "better" and never degrade (break, slow down, ...) anything. And that assumption is quite simply wrong. If, on the other hand, the code says "import scipy2017 as scipy", then it is clear to every reader that using it with earlier or later versions might lead to bad surprises. And if old SciPy versions disappear (effectively, for lack of maintenance), then such code will fail with an error message, rather than continuing to work unreliably.

This is the one point I am trying to make in this thread. The coexistence of different versions is a reality. The idea that higher is better is a dream. Let's be realistic and organize ourselves for the real world, by acknowledging a multiple-version universe and adjusting everybody's communication to prevent misunderstandings.
Well, dunno… in my opinion, when it comes to warnings, a specific version import is not a warning, it is prohibitive of using a different version, since the users having problems as you describe will not dare to change your code. A warning would be if you printed a warning on install/runtime that it is untested for all but specific numpy versions. I suppose creating that type of extra package is possible. I also expect it will just create a different type of hell. Much might survive, but type checking for example will not and cannot when you mix two versions, so basically you won't know if it can or cannot work until you try (and no one will test this!). The specific import makes sense if you have major prohibitive changes (a bit like py2/py3), and we already saw that we have different opinions on where or on what time scale that "major" line lies.
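A runnable sketch of the install/runtime warning idea discussed in the last two comments, assuming the third-party packaging library is available; the "tested" range below is made up for illustration:

```python
import warnings

import numpy as np
from packaging.version import Version

# Range this (hypothetical) downstream package was actually tested against.
TESTED_MIN, TESTED_MAX = Version("1.13"), Version("1.15")

installed = Version(np.__version__)
if not (TESTED_MIN <= installed < TESTED_MAX):
    warnings.warn(
        "untested numpy version %s; tested range is >=%s,<%s"
        % (installed, TESTED_MIN, TESTED_MAX),
        RuntimeWarning,
    )
# A stricter variant, closer to the versioned-namespace idea above, would
# raise ImportError here instead of merely warning.
```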
The backward compatibility NEP #11596 has been submitted, can we close this?
Yes we can close this. Independent of that NEP (which explicitly mentions semver as a rejected alternative), the consensus of the core devs here is that we don't want to change to semver. Hence closing as wontfix. Thanks for the discussion everyone.
Semantic versioning is a widely used convention in software development, distribution, and deployment. In spite of a long-lasting discussion about its appropriateness (Google knows where to find it), it is today the default. Projects that consciously decide not to use semantic versioning tend to choose release numbering schemes that make this immediately clear, such as using dates instead of versions.
NumPy is one of the rare examples of widely used software that uses a version numbering scheme that looks like semantic versioning but isn't, because breaking changes are regularly introduced with a change only in the minor version number. This practice creates false expectations among software developers, software users, and managers of software distributions.
This is all the more important because NumPy is infrastructure software in the same way as operating systems or compilers. Most people who use NumPy (as developers or software users) get and update NumPy indirectly through software distributions like Anaconda or Debian. Often it is a systems administrator who makes the update decision. Neither the people initiating updates nor the people potentially affected by breaking changes follow the NumPy mailing list, and most of them do not even read the release notes.
I therefore propose that NumPy adopt the semantic versioning conventions for future releases. If there are good reasons for not adopting this convention, NumPy should adopt a release labelling scheme that cannot be mistaken for semantic versioning.
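For concreteness, a sketch of the mechanical check that semantic versioning is meant to enable, using the packaging library; the version numbers are illustrative:

```python
from packaging.version import Version

def semver_flags_breaking(old, new):
    """Under strict semver, only a MAJOR bump signals potential breakage."""
    return Version(new).major > Version(old).major

# With the current scheme, a release that removes functionality looks "safe":
print(semver_flags_breaking("1.13.3", "1.14.0"))   # False

# Under the proposal, the same release would be numbered so the check works:
print(semver_flags_breaking("1.13.3", "14.0.0"))   # True
```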