@geohot geohot commented Nov 21, 2020

I haven't root-caused why, but this PR disables appnope on arm64 chips. Otherwise, on M1, the kernels segfault on startup.
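
For readers following along, the guard described above could look roughly like the sketch below. The helper name `should_use_appnope` and its parameters are hypothetical and are not taken from the PR; the actual diff may be structured differently.

```python
import platform
import sys


def should_use_appnope(system=None, machine=None):
    """Hypothetical helper: decide whether to enable appnope on this host."""
    # Fall back to the running interpreter's platform when no override is given.
    system = sys.platform if system is None else system
    machine = platform.machine() if machine is None else machine
    # appnope only matters on macOS; skip it on arm64 (Apple Silicon),
    # where the kernels were reported to segfault on startup.
    return system == "darwin" and machine != "arm64"
```

Passing `system` and `machine` explicitly makes the decision easy to check on any machine, for example `should_use_appnope("darwin", "arm64")`.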

@mwidjaja1

Can confirm, this change fixes Jupyter for me on an ARM MacBook Air!

@erykoff

erykoff commented Nov 27, 2020

Note that this solution is just a band-aid on a much deeper wound; see #562. Something touched by appnope is not working with ctypes calls on Apple Silicon.

@geohot
Author

geohot commented Nov 28, 2020

As far as I can tell, it's just appnope itself that's broken, nothing deeper. It has to do with whatever argtypes ctypes assumes implicitly. Also, objc_getClass was returning None because Foundation wasn't loaded. Fix here:

minrk/appnope#7
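
To illustrate the two points in the comment above (explicit ctypes prototypes, and loading Foundation before looking up a class), here is a minimal sketch. It is not the code from the linked fix; the helper name `get_objc_class` is hypothetical, and the function simply returns None when not running on macOS.

```python
import ctypes
import ctypes.util
import sys


def get_objc_class(name):
    """Hypothetical helper: look up an Objective-C class by name.

    Returns the class pointer as an int, or None if the class is not
    registered or if we are not on macOS.
    """
    if sys.platform != "darwin":
        return None
    objc = ctypes.cdll.LoadLibrary(ctypes.util.find_library("objc"))
    # Load Foundation first so that its classes are registered with the runtime.
    ctypes.cdll.LoadLibrary(ctypes.util.find_library("Foundation"))
    # Declare the prototype explicitly rather than relying on ctypes defaults;
    # the default restype is a C int, which would truncate a 64-bit pointer.
    objc.objc_getClass.argtypes = [ctypes.c_char_p]
    objc.objc_getClass.restype = ctypes.c_void_p
    return objc.objc_getClass(name.encode("utf-8"))
```

With `restype` set to `ctypes.c_void_p`, a NULL result comes back as None, which matches the symptom described when Foundation has not been loaded.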

@mwidjaja1

I don't think this change is necessary anymore because, as noted in the issue @geohot linked, the problem has been resolved in appnope. I updated appnope today, reverted this PR's change in my copy of IPython, and was able to run JupyterLab natively.

minrk/appnope#7 (comment)

@geohot
Author

geohot commented Dec 1, 2020

Yes, should be good with appnope 0.1.1, which will already be installed. Closing PR.
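
A quick way to check whether an environment already has a new enough appnope is sketched below. The helper names `parse_version` and `appnope_is_fixed` are hypothetical, and the parser assumes plain dotted numeric versions such as 0.1.1.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain dotted version such as '0.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def appnope_is_fixed(minimum=(0, 1, 1)):
    """Return True/False if appnope is installed, or None if it is missing."""
    try:
        installed = metadata.version("appnope")
    except metadata.PackageNotFoundError:
        return None
    # Tuples compare element by element, so (0, 1, 2) >= (0, 1, 1) is True.
    return parse_version(installed) >= minimum
```

If `appnope_is_fixed()` returns False, upgrading appnope in that environment should be enough, per the comment above.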

@mtoseef99

@geohot This update isn't working for me anymore. I tried it many times after creating new envs.

Here is sample output showing the installed packages and GPU setup:

  • macOS Big Sur 11.6, M1 MacBook Pro

  • Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:24:02)

  • GPU is available

  • Num GPUs Available: 1

  • Metal device set to: Apple M1

  • systemMemory: 16.00 GB

  • maxCacheSize: 5.33 GB

2021-10-20 22:11:58.637125: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2021-10-20 22:11:58.637250: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
Epoch 1/12
2021-10-20 22:11:58.825292: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-20 22:11:58.826147: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2021-10-20 22:11:58.826201: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:112] Plugin optimizer for device_type GPU is enabled.
2021-10-20 22:11:58.856 python[44484:5804473] -[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:maximumVelocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x168eff7d0
2021-10-20 22:11:58.869 python[44484:5804473] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '-[MPSGraph adamUpdateWithLearningRateTensor:beta1Tensor:beta2Tensor:epsilonTensor:beta1PowerTensor:beta2PowerTensor:valuesTensor:momentumTensor:velocityTensor:maximumVelocityTensor:gradientTensor:name:]: unrecognized selector sent to instance 0x168eff7d0'
*** First throw call stack:
(
	0   CoreFoundation                      0x000000019bb8b838 __exceptionPreprocess + 240
	1   libobjc.A.dylib                     0x000000019b8b50a8 objc_exception_throw + 60
	2   CoreFoundation                      0x000000019bc1c694 -[NSObject(NSObject) __retain_OA] + 0
	3   CoreFoundation                      0x000000019baeccd4 ___forwarding___ + 1444
	4   CoreFoundation                      0x000000019baec670 _CF_forwarding_prep_0 + 96
	5   libmetal_plugin.dylib               0x0000000150b82290 _ZN12metal_plugin14MPSApplyAdamOpIfEC2EPNS_20OpKernelConstructionE + 656
	6   libmetal_plugin.dylib               0x0000000150b81ebc _ZN12metal_pluginL14CreateOpKernelINS_14MPSApplyAdamOpIfEEEEPvP23TF_OpKernelConstruction + 52
	7   libtensorflow_framework.2.dylib     0x00000001221685d4 _ZN10tensorflow12_GLOBAL__N_120KernelBuilderFactory6CreateEPNS_20OpKernelConstructionE + 88
	8   libtensorflow_framework.2.dylib     0x00000001221ea158 _ZN10tensorflow14CreateOpKernelENS_10DeviceTypeEPNS_10DeviceBaseEPNS_9AllocatorEPNS_22FunctionLibraryRuntimeEPNS_11ResourceMgrERKNSt3__110shared_ptrIKNS_14NodePropertiesEEEiPPNS_8OpKernelE + 784
	9   libtensorflow_framework.2.dylib     0x00000001223c52b8 _ZN10tensorflow21CreateNonCachedKernelEPNS_6DeviceEPNS_22FunctionLibraryRuntimeERKNSt3__110shared_ptrIKNS_14NodePropertiesEEEiPPNS_8OpKernelE + 272
	10  libtensorflow_framework.2.dylib     0x000000012236fc20 _ZN10tensorflow26FunctionLibraryRuntimeImpl12CreateKernelERKNSt3__110shared_ptrIKNS_14NodePropertiesEEEPNS_22FunctionLibraryRuntimeEPPNS_8OpKernelE + 600
	11  libtensorflow_framework.2.dylib     0x00000001223da430 _ZN10tensorflow22ImmutableExecutorState10InitializeERKNS_5GraphE + 1192
	12  libtensorflow_framework.2.dylib     0x00000001223c5064 _ZN10tensorflow16NewLocalExecutorERKNS_19LocalExecutorParamsERKNS_5GraphEPPNS_8ExecutorE + 304
	13  libtensorflow_framework.2.dylib     0x00000001223d2e6c _ZN10tensorflow12_GLOBAL__N_124DefaultExecutorRegistrar7Factory11NewExecutorERKNS_19LocalExecutorParamsERKNS_5GraphEPNSt3__110unique_ptrINS_8ExecutorENS9_14default_deleteISB_EEEE + 48
	14  libtensorflow_framework.2.dylib     0x00000001223d37e8 _ZN10tensorflow11NewExecutorERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEERKNS_19LocalExecutorParamsERKNS_5GraphEPNS0_10unique_ptrINS_8ExecutorENS0_14default_deleteISG_EEEE + 92
	15  libtensorflow_framework.2.dylib     0x0000000122372278 _ZN10tensorflow26FunctionLibraryRuntimeImpl10CreateItemEPPNS0_4ItemE + 2676
	16  libtensorflow_framework.2.dylib     0x000000012237306c _ZN10tensorflow26FunctionLibraryRuntimeImpl3RunERKNS_22FunctionLibraryRuntime7OptionsEyN4absl12lts_202103244SpanIKNS_6TensorEEEPNSt3__16vectorIS8_NSB_9allocatorIS8_EEEENSB_8functionIFvRKNS_6StatusEEEE + 676
	17  libtensorflow_framework.2.dylib     0x00000001223810c0 _ZNK10tensorflow29ProcessFunctionLibraryRuntime14RunMultiDeviceERKNS_22FunctionLibraryRuntime7OptionsEyPNSt3__16vectorIN4absl12lts_202103247variantIJNS_6TensorENS_11TensorShapeEEEENS5_9allocatorISC_EEEEPNS6_INS5_10unique_ptrINS0_11CleanUpItemENS5_14default_deleteISI_EEEENSD_ISL_EEEENS5_8functionIFvRKNS_6StatusEEEENSP_IFSQ_RKNS0_21ComponentFunctionDataEPNS0_12InternalArgsEEEE + 2640
	18  libtensorflow_framework.2.dylib     0x0000000122384098 _ZNK10tensorflow29ProcessFunctionLibraryRuntime3RunERKNS_22FunctionLibraryRuntime7OptionsEyN4absl12lts_202103244SpanIKNS_6TensorEEEPNSt3__16vectorIS8_NSB_9allocatorIS8_EEEENSB_8functionIFvRKNS_6StatusEEEE + 2012
	19  libtensorflow_framework.2.dylib     0x0000000122384868 _ZNK10tensorflow29ProcessFunctionLibraryRuntime7RunSyncERKNS_22FunctionLibraryRuntime7OptionsEyN4absl12lts_202103244SpanIKNS_6TensorEEEPNSt3__16vectorIS8_NSB_9allocatorIS8_EEEE + 160
	20  _pywrap_tensorflow_internal.so      0x0000000134175554 _ZN10tensorflow19KernelAndDeviceFunc3RunEPNS_19ScopedStepContainerERKNS_15EagerKernelArgsEPNSt3__16vectorIN4absl12lts_202103247variantIJNS_6TensorENS_11TensorShapeEEEENS6_9allocatorISD_EEEEPNS_19CancellationManagerERKNS9_8optionalINS_25EagerRemoteFunctionParamsEEERKNSK_INS_17ManagedStackTraceEEE + 516
	21  _pywrap_tensorflow_internal.so      0x000000013413fd60 _ZN10tensorflow18EagerKernelExecuteEPNS_12EagerContextERKN4absl12lts_2021032413InlinedVectorIPNS_12TensorHandleELm4ENSt3__19allocatorIS6_EEEERKNS3_8optionalINS_25EagerRemoteFunctionParamsEEERKNS7_10unique_ptrINS_15KernelAndDeviceENS_4core15RefCountDeleterEEEPNS_14GraphCollectorEPNS_19CancellationManagerENS3_4SpanIS6_EERKNSD_INS_17ManagedStackTraceEEE + 372
	22  _pywrap_tensorflow_internal.so      0x00000001341463c4 _ZN10tensorflow11ExecuteNode3RunEv + 396
	23  _pywrap_tensorflow_internal.so      0x0000000134481764 _ZN10tensorflow13EagerExecutor11SyncExecuteEPNS_9EagerNodeE + 172
	24  _pywrap_tensorflow_internal.so      0x000000013413f89c _ZN10tensorflow12_GLOBAL__N_117EagerLocalExecuteEPNS_14EagerOperationEPPNS_12TensorHandleEPi + 1976
	25  _pywrap_tensorflow_internal.so      0x000000013413da44 _ZN10tensorflow12EagerExecuteEPNS_14EagerOperationEPPNS_12TensorHandleEPi + 296
	26  _pywrap_tensorflow_internal.so      0x0000000133da2ba4 _ZN10tensorflow14EagerOperation7ExecuteEN4absl12lts_202103244SpanIPNS_20AbstractTensorHandleEEEPi + 192
	27  _pywrap_tensorflow_internal.so      0x000000013417b92c _ZN10tensorflow21CustomDeviceOpHandler7ExecuteEPNS_27ImmediateExecutionOperationEPPNS_30ImmediateExecutionTensorHandleEPi + 468
	28  _pywrap_tensorflow_internal.so      0x00000001309c7f38 TFE_Execute + 80
	29  _pywrap_tensorflow_internal.so      0x0000000130944ac0 _Z24TFE_Py_ExecuteCancelableP11TFE_ContextPKcS2_PN4absl12lts_2021032413InlinedVectorIP16TFE_TensorHandleLm4ENSt3__19allocatorIS7_EEEEP7_objectP23TFE_CancellationManagerPNS5_IS7_Lm2ESA_EEP9TF_Status + 616
	30  _pywrap_tfe.so                      0x000000012540641c _ZN10tensorflow32TFE_Py_ExecuteCancelable_wrapperERKN8pybind116handleEPKcS5_S3_S3_PNS_19CancellationManagerES3_ + 160
	31  _pywrap_tfe.so                      0x0000000125437208 _ZZN8pybind1112cpp_function10initializeIZL25pybind11_init__pywrap_tfeRNS_7module_EE4$_44NS_6objectEJRKNS_6handleEPKcSA_S8_S8_S8_EJNS_4nameENS_5scopeENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE_8__invokeESR_ + 184
	32  _pywrap_tfe.so                      0x00000001254190e0 _ZN8pybind1112cpp_function10dispatcherEP7_objectS2_S2_ + 3216
	33  python                              0x0000000104b73398 cfunction_call + 80
	34  python                              0x0000000104b1f1e8 _PyObject_MakeTpCall + 340
	35  python                              0x0000000104c2f6ac call_function + 724
	36  python                              0x0000000104c2bd44 _PyEval_EvalFrameDefault + 29268
	37  python                              0x0000000104c244a8 _PyEval_EvalCode + 2968
	38  python                              0x0000000104b1fe64 _PyFunction_Vectorcall + 240
	39  python                              0x0000000104c2f614 call_function + 572
	40  python                              0x0000000104c2be40 _PyEval_EvalFrameDefault + 29520
	41  python                              0x0000000104c244a8 _PyEval_EvalCode + 2968
	42  python                              0x0000000104b1fe64 _PyFunction_Vectorcall + 240
	43  python                              0x0000000104b22cf0 method_vectorcall + 164
	44  python                              0x0000000104c2f614 call_function + 572
	45  python                              0x0000000104c2be40 _PyEval_EvalFrameDefault + 29520
	46  python                              0x0000000104c244a8 _PyEval_EvalCode + 2968
	47  python                              0x0000000104b1fe64 _PyFunction_Vectorcall + 240
	48  python                              0x0000000104b22cf0 method_vectorcall + 164
	49  python                              0x0000000104c2f614 call_function + 572
	50  python                              0x0000000104c2be40 _PyEval_EvalFrameDefault + 29520
	51  python                              0x0000000104c244a8 _PyEval_EvalCode + 2968
	52  python                              0x0000000104b1fe64 _PyFunction_Vectorcall + 240
	53  python                              0x0000000104b1f468 _PyObject_FastCallDictTstate + 320
	54  python                              0x0000000104b201e0 _PyObject_Call_Prepend + 164
	55  python                              0x0000000104b9735c slot_tp_call + 376
	56  python                              0x0000000104b1fc34 _PyObject_Call + 156
	57  python                              0x0000000104c2c078 _PyEval_EvalFrameDefault + 30088
	58  python                              0x0000000104c244a8 _PyEval_EvalCode + 2968
	59  python                              0x0000000104b1fe64 _PyFunction_Vectorcall + 240
	60  python                              0x0000000104b22e50 method_vectorcall + 516
	61  python                              0x0000000104c2c078 _PyEval_EvalFrameDefault + 30088
	62  python                              0x0000000104c244a8 _PyEval_EvalCode + 2968
	63  python                              0x0000000104b1fe64 _PyFunction_Vectorcall + 240
	64  python                              0x0000000104b1f468 _PyObject_FastCallDictTstate + 320
	65  python                              0x0000000104b201e0 _PyObject_Call_Prepend + 164
	66  python                              0x0000000104b9735c slot_tp_call + 376
	67  python                              0x0000000104b1f1e8 _PyObject_MakeTpCall + 340
	68  python                              0x0000000104c2f6ac call_function + 724
	69  python                              0x0000000104c2bd44 _PyEval_EvalFrameDefault + 29268
	70  python                              0x0000000104c244a8 _PyEval_EvalCode + 2968
	71  python                              0x0000000104b1fe64 _PyFunction_Vectorcall + 240
	72  python                              0x0000000104b22cf0 method_vectorcall + 164
	73  python                              0x0000000104c2f614 call_function + 572
	74  python                              0x0000000104c2be40 _PyEval_EvalFrameDefault + 29520
	75  python                              0x0000000104c244a8 _PyEval_EvalCode + 2968
	76  python                              0x0000000104c87834 pyrun_file + 376
	77  python                              0x0000000104c86d48 PyRun_SimpleFileExFlags + 816
	78  python                              0x0000000104ca9e84 Py_RunMain + 2916
	79  python                              0x0000000104cab018 pymain_main + 1272
	80  python                              0x0000000104ac5ddc main + 56
	81  libdyld.dylib                       0x000000019ba2d430 start + 4
)
libc++abi: terminating with uncaught exception of type NSException

@erykoff

erykoff commented Oct 20, 2021

@mtoseef99 That appears to be a bug in TensorFlow or somewhere else; it has no relation to IPython or appnope.

@mtoseef99

mtoseef99 commented Oct 21, 2021

@erykoff I tried different solutions, including reinstalling miniforge and TensorFlow, but nothing worked for me except changing the optimizer from adam to sgd.

Someone just talked about this on Stack Exchange today:
https://datascience.stackexchange.com/questions/102435/could-not-identify-numa-node-of-platform-gpu-id-0-on-m1-macbook/103338#103338

But using sgd is not helpful at all; accuracy is quite low in my case. I will test further with different optimizers and architectures.
