
strange behaviour when factorizing (new '0' issue with nodeset) #293

Closed

Description

@nicolasdeferrieres

Hi,

Recently I had an issue with factorization (#288, which was a duplicate of #286), which I fixed by installing the latest version (from the master branch).

I now have a new one (after eating my '0', it is adding an extra '0'):

$ cat liste_failed_for_clush
firstgroupa001
firstgroupa002
secondgroup001
secondgroup002
secondgroup003
secondgroup005
secondgroup006
secondgroup008
namewithissue007
namewithissue008
namewithissue009
namewithissue01
namewithissue02
namewithissue03
namewithissue04
namewithissue10
$ paste -sd, liste_failed_for_clush | nodeset -f
firstgroupa[001-002],namewithissue[001-004,007-010],secondgroup[001-003,005-006,008]

It is adding a 0 to the last five servers!
namewithissue[001-004,007-010] should be namewithissue[01-04,10],namewithissue[007-009]

If the servers are not in the same order (starting with the 2-digit server names instead of the 3-digit ones):

$ cat liste_failed_for_clush2
namewithissue01
namewithissue02
namewithissue03
namewithissue04
namewithissue10
firstgroupa001
firstgroupa002
secondgroup001
secondgroup002
secondgroup003
secondgroup005
secondgroup006
secondgroup008
namewithissue007
namewithissue008
namewithissue009
$ paste -sd, liste_failed_for_clush2 | nodeset -f
firstgroupa[001-002],namewithissue[01-04,07-10],secondgroup[001-003,005-006,008]

This time, it's eating a '0'.

As I saw that the latest version had just been released (which should be the same as the one I installed 10 days before), I installed it instead, and the behaviour is the same:

$ rpm -qa | grep clustershell
clustershell-1.7.1-1.el6.noarch
$ /usr/bin/nodeset --version
nodeset 1.7.1

I even tried rolling back to my initial version (1.7-1), and it is the same.

Is it a new bug?

Thanks

Activity

thiell (Collaborator) commented on Mar 3, 2016

Hi @nicolasdeferrieres!

Thanks for reporting. I would say it's a known limitation. The current behavior is even documented:

http://clustershell.readthedocs.org/en/latest/tools/nodeset.html#zero-padding

nodeset will always try to coalesce node names by their numerical index first (without taking care of any zero-padding), and then will use the first zero-padding rule encountered.

Indeed, internally nodeset will recognize "namewithissue10" and "namewithissue008" as the same pattern "namewithissue%s" and will apply the first zero-padding rule encountered (here the 3-digit padding) to all indexes.
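
For what it's worth, the same rule can be reproduced through the ClusterShell Python API that nodeset is built on; a minimal sketch, with the outputs being the 1.7.x behaviour discussed in this issue:

from ClusterShell.NodeSet import NodeSet

# Both names reduce to the same pattern "bar%s" and share one rangeset;
# on 1.7.x the zero-padding of the first index encountered wins.
print(NodeSet("bar01,bar002"))   # bar[01-02]
print(NodeSet("bar002,bar01"))   # bar[001-002]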

That is clearly a limitation we can work on. You're the first to report it, but this behavior has been there for a long time now. For me (I'm a cluster admin), this is a tricky case that will lead to errors (human errors and/or weird behavior from other tools).

Let us know what you think. I'm open to hearing arguments :)

nicolasdeferrieres (Author) commented on Mar 3, 2016

Hi @thiell,

Thanks for the answer.

Indeed, this behavior is documented. I tried with an older version (1.6), and the behavior is the same.

However, this behavior troubles me a lot.

I'm giving a list of servers, and nodeset will output a different one!

In my case, I was fortunate, because the names altered by nodeset don't exist in my infrastructure (I know it is dangerous to have one server named bar01 and another named bar001, but it can still happen, during a naming-convention migration for example).

Furthermore, you can see that the behavior is different when the first server name has 2 digits than when it has 3:

$ nodeset -f bar01,bar002
bar[01-02]
$ nodeset -f bar002,bar01
bar[001-002]

For me (in an ideal world!), nodeset should be there to make groups of servers from an input list, not to change the meaning of that input.

I would expect nodeset to behave like this:

$ nodeset -f bar01,bar002
bar0[1,02]
$ nodeset -f bar002,bar01
bar0[1,02]

Or even

$ nodeset -f bar01,bar002
bar01,bar002
$ nodeset -f bar002,bar01
bar01,bar002

And in rare cases:

$ nodeset -f bar01,bar001
bar01,bar001 (or bar0[1,01])

Either way, I need to be sure that the servers on which my commands will be launched are the ones I gave!
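
One possible safety check, sketched here with the Python API and a hypothetical input list, is to expand the folded set back and compare it with the original input:

from ClusterShell.NodeSet import NodeSet

nodes = ["bar01", "bar002"]           # hypothetical input list
folded = NodeSet.fromlist(nodes)
# Expanding the folded set should return exactly the input names;
# on 1.7.x this check fails because the zero-padding was rewritten.
assert sorted(folded) == sorted(nodes), "folding altered the node names!"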

From talking to other users who have encountered this strange behavior, I know they usually work around it by making two (or more) lists when it happens.

But when working on a list of several thousand servers, a human may well not notice that the servers are not the ones given as input (commands run on unwanted servers, and other servers forgotten).

Those are some of my arguments! :-) I'm open to hearing arguments too.

degremont (Collaborator) commented on Mar 3, 2016

Hi @nicolasdeferrieres

From talking to other users who have encountered this strange behavior, I know they usually work around it by making two (or more) lists when it happens.

Could you give us more details about that? Do you know a lot of people doing this kind of thing? :)

nicolasdeferrieres (Author) commented on Mar 4, 2016

Hi @degremont

It's just a simple manual workaround: when they encounter this bad factorization, they split the list to avoid the "limitation":
list1:

firstgroupa001
firstgroupa002
secondgroup001
secondgroup002
secondgroup003
secondgroup005
secondgroup006
secondgroup008
namewithissue007
namewithissue008
namewithissue009

list2:

namewithissue01
namewithissue02
namewithissue03
namewithissue04
namewithissue10

And they run them separately:

clush -bw $(paste -sd, list1) "command"
clush -bw $(paste -sd, list2) "command"

self-assigned this on Mar 5, 2016
degremont (Collaborator) commented on Mar 21, 2016

We will probably have to support padding per pattern and no longer per rangeset. This is a non-trivial change that could go into 1.8 or 1.9...
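
To illustrate the idea only (a hypothetical sketch of grouping indexes by padding width, not the planned implementation; fold_by_padding is an invented helper):

import re
from collections import defaultdict

def fold_by_padding(names):
    # Group trailing indexes by (prefix, padding width) so that
    # differently padded names never share a rangeset.
    groups = defaultdict(list)
    for name in names:
        m = re.match(r"(.*?)(\d+)$", name)
        if not m:
            print(name)              # no trailing index: emit as-is
            continue
        prefix, digits = m.group(1), m.group(2)
        groups[(prefix, len(digits))].append(int(digits))
    for (prefix, width), indexes in sorted(groups.items()):
        body = ",".join("%0*d" % (width, i) for i in sorted(indexes))
        print("%s[%s]" % (prefix, body))

fold_by_padding(["bar01", "bar002", "bar02"])
# bar[01,02]
# bar[002]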

added this to the 1.8 milestone on May 7, 2016
modified the milestones: 1.8, 1.9 on Sep 17, 2017

16 remaining items
