Skip to content

Z-Stack rejects joins through routers unless coordinator is also permitting joins #53

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
puddly opened this issue Dec 24, 2020 · 17 comments

Comments

@puddly
Copy link
Collaborator

puddly commented Dec 24, 2020

@TheJulianJES reported that joins done via specific devices do not work. I've confirmed this occurs with release 0.3.0 with multiple coordinators. The new device successfully associates through the target router but the coordinator kicks the new device off the network with an "APS remove device (0x07)" command instead of sending it the transport key.

Comparing the behavior of zigpy-znp and Z2M shows that the only difference is the following at startup:

  • zigpy-znp sends: ZDO.MgmtPermitJoinReq.Req(AddrMode=<AddrMode.NWK: 2>, Dst=0x0000, Duration=0, TCSignificance=1)
  • Z2M sends: ZDO.MgmtPermitJoinReq.Req(AddrMode=<AddrMode.Broadcast: 3>, Dst=0xFFFC, Duration=0, TCSignificance=0)

The first command is sent on startup by zigpy-znp to fix the CC2531 Z-Stack Home 1.2 permanently permitting joins glitch. Unfortunately, it appears that sending any unicast permit join request to 0x0000 causes it to stop permitting joins later on. Sending a broadcast to all routers and the coordinator does not trigger this behavior. I've confirmed this on all of my Texas Instruments coordinators so it's not just a Z-Stack Home 1.2 bug.

Since Home Assistant includes buttons to permit joins through specific routers as well as just the coordinator, I see two possible solutions:

  1. Overload ControllerApplication.permit to always call ControllerApplication.permit_ncp as well. This has the unfortunate side effect of allowing new devices to join the coordinator directly as well as the desired router, which is not intuitive behavior.
  2. Make ControllerApplication.permit_ncp a no-op in zigpy-znp and somehow disable the "add devices via this device" button in Home Assistant for the coordinator. Removing functionality is not ideal but all the visible buttons will do exactly what they say.
@Adminiuga
Copy link
Contributor

So have you tried sending the same broadcast instead of unicast at startup? I see no harm sending MgmtPermitJoinReq as a broadcast with duration 0 at startup.

@puddly
Copy link
Collaborator Author

puddly commented Dec 24, 2020

Yes, that works. It's the unexpected behavior of permit_ncp that's really the issue, since either way some functionality will have to be removed to ensure joining through specific routers remains functional.

@Adminiuga
Copy link
Contributor

so are you saying that if at some time you call permit_ncp(duration=60) it would stop accepting devices through other routers?

@Adminiuga
Copy link
Contributor

and is this problem with both 3.0 and 1.2 or only 12 firmwares?

@puddly
Copy link
Collaborator Author

puddly commented Dec 24, 2020

Yep. Happens with the CC2531 and the CC2652R. Haven't tried Z-Stack 3 on the CC2531 but I don't think it'll behave differently.

If permit_ncp is not called, the coordinator properly sends the transport key. After permit_ncp(60) (or permit_ncp(0)) is called and 60s elapse, the coordinator instead begins sending "APS remove device".

@puddly
Copy link
Collaborator Author

puddly commented Dec 24, 2020

Maybe it's possible to add the coordinator to some random group and send the MgmtPermitJoinReq as a multicast (or are only unicast and broadcast allowed?)?

I'll do more thorough testing in a few days.

@puddly
Copy link
Collaborator Author

puddly commented Dec 24, 2020

I don't have the low-level hardware or firmware access to test this, but it may be that Z-Stack will still send the transport key if a device tries to joins its network, even if the coordinator's beacons say that joins are not permitted.

I'm unable to find any way to tell Z-Stack to permit joins without also advertising this in the coordinator's beacons. Is this what the EZSP permitJoining command is supposed to do?

@Adminiuga
Copy link
Contributor

I belive EZSP permitJoinin is similar as sending unicast ZDO to coordinator. At least in earlier versions it was responding to broad cast loopbacks and allowing joins too.

@Adminiuga
Copy link
Contributor

Send broadcast on startup and don't do anything on permit_ncp() ???

@puddly
Copy link
Collaborator Author

puddly commented Jan 10, 2021

Hmm, looks like this isn't as simple as I thought. Zigbee2MQTT suffers from the same problem with my freshly erased and flashed LAUNCHXL-CC26X2R1 and an IKEA dimmer joining through an IKEA bulb. Z2M doesn't send any unicast permit join requests to the coordinator yet this still happens:

image

@puddly
Copy link
Collaborator Author

puddly commented Jan 10, 2021

Good news: this does seem to be possible with firmware modifications and a new MT command. From what I understand:

  • All broadcast TC policy changes are explicitly re-sent as unicast via a loopback interface so the whole unicast/broadcast thing is probably nonsense.
  • When a new device joins, the ZDSecMgrPermitJoiningEnabled global determines whether the device is sent the network key or if it is kicked off the network.
  • The NLME_PermitJoining global determines what's in the beacon and thus, whether or not new devices will try connecting to the coordinator.

I modified the permit joining command in the firmware to only set ZDSecMgrPermitJoiningEnabled and never touch NLME_PermitJoining. Wireshark confirmed that only the bulb's beacons had Association Permit: True. The dimmer then joined through bulb and the coordinator sent it the network key, as expected.

The new command would have to be something like Util.SecMgrPermitJoining, which just internally enables/disables sending the network key to newly joined devices, regardless of the permit state of the coordinator.

Before I continue down this rabbit hole: @Adminiuga, is the purpose of permit_ncp just to help coordinators that don't properly handle zigpy requests sent to 0x0000? Is the expected behavior of app.permit(node=...) is to permit joins on both the other node and on the coordinator? My SiLabs stick is missing at the moment so I can't test this.

@Adminiuga
Copy link
Contributor

Yes, permit_ncp() is solely for opening only coordinator for joining new devices

@puddly
Copy link
Collaborator Author

puddly commented Jan 12, 2021

Alright, so since bellows has permit_ncp call permitJoining, which "tells the stack to allow other nodes to join the network with this node as their parent", this means that the current behavior of app.permit(node=router) is to permit joins on both the coordinator and the router. I was expecting it to force joins to happen only through router and never the coordinator but this isn't easy to do.

@puddly
Copy link
Collaborator Author

puddly commented Jan 12, 2021

On the other hand, if dev is not the coordinator, only dev.zdo.permit(time_s) will be called. app.permit_ncp(time_s) will never be called by zigpy: https://github.com/zigpy/zigpy/blob/5ed167206059018e25ac9bfd7c8ff0116e420270/zigpy/application.py#L351-L368

Maybe permit_ncp needs to be called no matter what?

@Adminiuga
Copy link
Contributor

permit_ncp() at the time was implemented to open only the coordinator for joining, in case someone wanted to join a device specifically through coordinator. This was for xiaomi targeted joining. Since I didn't know how different coordinator behave, it was implemented as a separate method to allow coordinator specific implementation.

If we always call permit_ncp() then it would interfere with joining new devices via specified router.
Should we just add a new method? like pre_permit() which would allow radio libs to prepare network for joining? Otherwise, just override permit() method???

@puddly puddly changed the title Z-Stack disallows joins through specific routers after coordinator receives a unicast Permit Join request Z-Stack rejects joins through routers unless coordinator is also permitting joins Jan 21, 2021
@puddly
Copy link
Collaborator Author

puddly commented Feb 19, 2021

I'm going to close this issue for now. The root cause is known and the only software fix has been released.

@puddly puddly closed this as completed Feb 19, 2021
@puddly
Copy link
Collaborator Author

puddly commented Jul 20, 2021

Fixed in a Z-Stack patch: Koenkk/Z-Stack-firmware@efac5ee (thanks @TheJulianJES)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants