Skip to content

Initial refactoring #119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 23 commits into
base: master
Choose a base branch
from
Open

Conversation

ispielma
Copy link
Collaborator

@ispielma ispielma commented Feb 24, 2024

This is a real pull request that adds no functionality. Instead it is a first step for any large-scale refactoring of the lyse code base. All that was really done was pulling out the functional code from __main__.py into main.py. Even this is inherently useful because other files / modules can import from 'main.py', but owing to shamefully poor language design __main__.py cannot be imported from. For may larger aggregate pull request, this is why I refactored.

Doing this was slightly non trivial because __main__.py made liberal use of global variables that had to be transformed to method / function arguments and class variables.

Note that this is not intended to be a complete refactor, but any next steps should be pretty easy with the global scope variables banished.

This was tested both on a Mac and a live laboratory deployment of labscript.

@philipstarkey and @dihm : Is this about the scope you are looking for in a more-easy-to-audit pull request?

@dihm
Copy link
Contributor

dihm commented Feb 27, 2024

Thanks for getting this broken down! This is definitely the right size and scope Before doing a full review, I have a few higher level comments.

While I'd normally say to each their own, I don't think the amount of shade thrown on python is (fully) warranted. Now I'm probably equally biased the other way here, but I think some of your judgements fly in the face of standard python conventions and will make this code harder to work on longer term.

  1. Since python has made the choice that __main__.py is the only entry point for a stand-alone module, it actually is excellent design to prevent imports from it because it prevents all but inevitable circular imports. After all, why would something import from __main__.py and not ultimately need to be reimported back for actual use? Only separate API code would satisfy that assumption, but it should be in a separate file anyway.
    • I think wholesale moving everything from __main__.py to main.py is not as much of an improvement as it could be. Really, I'd like to see the GUI code broken up into descriptive modules (ie Analysis Worker stuff in one, dataframe stuff in another, the webserver on its own, etc).
    • Surely some of these items don't actually need to be moved from __main__.py, right? Do you really intend to import Lyse or LyseMainWindow somewhere else? I'd say anything that is only going to be used in __main__.py (long term) should stay there.
  2. This is more of a personal opinion, but I think "avoid globals at all costs" is scaremongering propaganda from the functional programming gang, especially in python. Because everything is an object, a "global" is just a module level attribute and in many cases is functionally equivalent to a class attribute. I personally don't feel a ton of motivation to manually track and pass around a handle to a singleton instance that is only used in a single module that is going to be modified in place via side effects anyway.
    • Ultimately moot since splitting up functionality between files breaks an assumption making this change necessary. Mostly just wanted to caution against casually going against python conventions since they are often conventions for a valid reason. And at the end of the day, PRs are easier to get reviewed and merged safely if we can agree on conventions and instead focus attention on the hard stuff.

@ispielma
Copy link
Collaborator Author

@dihm In terms of the actionable part of this review I can put the MainWindow back in __main__.py since it is unlikely to be imported. I am more than happy to break the new main.py into smaller files, I didn't do that before because I was trying to keep the commit modest in size.

@ispielma
Copy link
Collaborator Author

Regarding the other topics I don't want to start a philosophical war.

  1. The inability to import from __main__.py : perhaps this is a Python documentation or error reporting issue. The core problem is that the line from lyse.__main__ import ... from within a .py file in the lyse module doesn't behave as expected, and no error or warning is raised.
  2. My position is that could should be as easy as possible to understand by a 3rd party.
    a. So globals such as I_LOVE_GLOBALS = False are great. The common standard of all caps denotes them as some sort of global and they by convention are defined the top of a file, making them easy to find.
    b. The problem is hidden globals. For example in __main__.py the global qapplication was defined at the bottom of the script and inside the if __name__ == "__main__": block. This both makes it hard to understand the operation of the individual classes that reference qapplication in isolation without a wholistic understanding of the program and it is a bug because qapplication is not defined in the global scope if __name__ != "__main__"
  3. Thinking of amusing bugs. Run python and type import lyse.__main__ at the prompt. You will get the lyse spash window, but nothing else (because __name__ != "__main__").

@ispielma
Copy link
Collaborator Author

ispielma commented Feb 28, 2024

Update: I put Lyse and LyseMainWindow back into _main__.py and also fixed the amusing bug (3.) above. In doing so I realized that this combination actually defeats the splash window (because modules are loaded before the splash window opens as I had to move that to the if __name__ == "__main__" code block), so I removed them again. By having __main__.py nearly empty there are no required imports outside of if __name__ == "__main__" so we can be assured that splash does what we expect it to.

@ispielma
Copy link
Collaborator Author

And split "mostly gui" code into more files. This should pretty much complete the refactoring of what was __main__.py.

Copy link
Contributor

@dihm dihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying some things for me. I hadn't looked closely enough to realize how qapplication was defined. I agree that isn't great. It's a good thing to fix here while we are moving things around anyway.

As for import lyse.__main__, while I have little doubt documentation and behavior could be improved here at a language level, I still stipulate that it should never be done. So I wouldn't really classify unusual things happening when one does it as a bug and I don't think we should structure our code to protect against it. It shouldn't be done in the first place. In fact, I see the whole purpose of this PR is to ensure that action never has to happen.

This is relevant to the only real concern I have with the PR, namely that all these imports in __main__.py are decoupled from their usage. It means tooling can't ensure imports are used/missing and therefore ensuring that import list is accurate is a purely manual process (lyse will still work fine even if we add extra imports or take needed ones off). I really don't like it.

I'd much rather move current main.py stuff back into __main__.py and continue to rely on the import side effect to handle the splash. I don't see us losing any functionality, and it conforms to standard conventions so we are less likely to surprise future developers. Unless I'm missing something else, the only thing the current implementation is solving is allowing import lyse.__main__, but I don't think we should support that anyway.

Now, perhaps I am missing something. I am assuming that Lyse and LyseMainWindow do not need to be imported elsewhere for any reason in the future. My imagination may be limited, but I just don't see a valid reason for needing to do that that doesn't also entail a major structural change to how lyse works. And if that is the case, we should discuss it (ideally as another PR/Issue/etc, so this can move along).

Pretty sure my only other meaningful change request is some minor adjustments to the file structure. I think analysis.py is kind-of vague given analsis_subprocess.py exists. It also is only used in widgets.py for use with in RoutineBox (via an undeclared import, which is incidentally leading to a circular dependency since analysis.py has to import RoutineBox). I'd advocate for having a routines.py file that has RoutineBox and AnalysisRoutine in it. I'd also advocate moving more of the QT widgets from filebox.py into widgets.py (basically any class that only has QT dependencies and no specific lyse logic baked in should move to widgets.py).

Finally, I've noticed there are a few lingering unused imports in the files. Given that I'm asking for some changes here I won't list them all out, but an example is lyse.analysis and qtutils.icons in filebox.py. Annoyingly, my tooling is not catching the lyse.analysis unused import, but maybe that means something subtle is going on. Tread with caution I suppose.

spielman added 2 commits March 3, 2024 06:52
…ere are some other places with interprocess / thread communication is defined and getting that together seems wise. This commit gets a place for it ready.
@ispielma
Copy link
Collaborator Author

ispielma commented Mar 4, 2024

I am going to decouple some of these comments. First regarding the imports and SplashWindow. The SplashWindow is pure eyecandy that gives the user something to see while Lyse starts. Prior to this pull request pretty much everything that lyse was going to import was being also imported in __main__.py as a result the SplashWindow could give some indication about the progress of importing all of these imports. By pulling most/all of the code out of main.py there were very few imports and the SplashWindow therefore had few updates. My solution was to recover the old behavior by manually importing everything that was imported in __main.__.py prior to the refactoring. Personally I don't care about this behavior, but perhaps somebody likes it.

@ispielma
Copy link
Collaborator Author

ispielma commented Mar 4, 2024

Unfortunately import qtutils.icons needs to be present as it performs some sort of magic that allows the .ui files to reference that fugue icon set without defining the exact path in QtDesigner. I would like to add an explicit option to the ui loader such as

self.ui = loader.load(os.path.join(LYSE_DIR, 'user_interface/main.ui'), LyseMainWindow(self), fugue_icons=True)

to make this behavior explicit and avoid the magical import.

@ispielma
Copy link
Collaborator Author

ispielma commented Mar 5, 2024

The last set of changes should fully resolve @dihm's points.

Copy link
Contributor

@dihm dihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ispielma This is really good. Have two very small comments then I suspect it is ready to go.

I would like to stress test it a little more than the dummy shots on my home rig, but I won't have time until later this week. Hopefully that will give @philipstarkey sufficient time to look things over if he wants to weigh in. Otherwise, I think we'll be good to merge by Friday.

…ame_utilities as rangeindex_to_multiindex.
@dihm
Copy link
Contributor

dihm commented Mar 11, 2024

I also forgot, there is probably a little work that needs to be done on the documentation build to track the new changes. If you wanted to sort those out, I would appreciate it. If not, I'll get them sorted this week and make a PR to your branch with the changes.

@ispielma
Copy link
Collaborator Author

ispielma commented Mar 11, 2024

I am not that familiar with the documentation system (in terms of what is automatic and what is manual), but I will have a look.... OK so I have it generating the API in a way that is no worse than before, meaning that many functions / classes are not documented, but I think that this should be its own pull request.

Copy link
Contributor

@dihm dihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found another small issue testing the refactored code in the lab.

@dihm
Copy link
Contributor

dihm commented Mar 12, 2024

I am not that familiar with the documentation system (in terms of what is automatic and what is manual), but I will have a look.... OK so I have it generating the API in a way that is no worse than before, meaning that many functions / classes are not documented, but I don't think that will be its own pull request.

Thanks for sorting that out. Docs are definitely spotty, but at some level most people don't need to see the GUI docstrings anyway. In fact, we should probably consider splitting the lyse autosummary from all the others (ie divide up API and GUI docstrings) just to make it clearer what is actually important.

spielman and others added 3 commits March 13, 2024 09:47
…here and in dataframe_utilities.py to avoid circular dependencies, I moved these into utis.py and renamed them LYSE_PORT and LABCONFIG respectivly to denote their role as system wide constants. I also moved LYSE_PATH there as well for consistency, but re-exported it in __init__.py so it will still be accessible when lyse is imported.
@ispielma
Copy link
Collaborator Author

In working with __init__.py I realized that its exports are somewhat uncontroled. Meaning that everything in the namespace will be imported with import lyse or from lyse import *. To be clear, an example of this is

from lyse.dataframe_utilities import get_series_from_shot as _get_singleshot
from labscript_utils.dict_diff import dict_diff

in this case when one does import lyse the function dict_diff will be entered into the name space and lyse.dict_diff will be defined. Clearly the authors knew about this which is why we see ... as _get_singleshot.

I strongly suggest that I amend this pull request to also create a file lyse_api.py and move almost the whole content of __init__.py there and have __init__.py instead be more like

from lyse.lyse_api import ...
__all__ = [...]

this will make explicit what is being exported and provide better control over the namespace, and the use of __all__=[...] would define what we want from a * import.

Copy link
Contributor

@dihm dihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for moving the API to another file, I'm not totally sold on the benefit. We should definitely define a __all__, but we can just do that in place. The incidental imports still show up under import lyse (like sys, os, etc), but I'm less concerned about that in that situation.

In any case, given the docs have recommended from lyse import * for forever, this would be a breaking change, so we should save it for another PR.

dihm
dihm previously approved these changes Mar 18, 2024
Copy link
Contributor

@dihm dihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with where this. I'll give @philipstarkey a little more time to look it over before merging, now that the bugs have been sorted out.

@dihm dihm requested a review from philipstarkey March 27, 2024 06:58
Copy link
Member

@philipstarkey philipstarkey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just reviewed the changes for now. I'm also hoping to find time to check out the new code and give it a look over to see if there is anything that stands out about the structure (which can be hard to see with all of the noise from the diff)


# lyse imports
import lyse.dataframe_utilities
import lyse.utils
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little bit hesitant about this lyse.utils import. lyse.utils is importing Qt packages. But the lyse package (e.g. this file) can be imported in either the GUI, or the worker process. We are probably tied to a Qt matplotlib backed in the worker process anyway? But it would be nice if we were not, or at least minimally so.

One solution would be to make a utils dir, with three files (__init__.py, gui.py, worker.py - or whatever names you like) so that the utils and imports can be split between only for gui, only for worker, or for both. That allows more specific imports to be made (note that the contents of the init file will be imported if either of the others are so that one should try to stay clean of Qt stuff as well).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a go at implementing this and did end up settling on your suggested divisions. All said and done, I think there is a decent argument for moving dataframe_utilities into the submodule or flattening the structure again and having utils, gui_utils and worker_utils at the top level. I am leaning towards the latter, but I think there is a more fundamental issue to fix first (I'll comment the code elsewhere) before sorting out stylistic preferences like this one.

lyse.figure_manager.install()

from matplotlib.backends.backend_qt5agg import NavigationToolbar2QT as NavigationToolbar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit nervous about relocating this import to before the figure manager is installed. Any idea if it has consequences?

There is a brief line in figure_manager.py about needing to patch matplotlib before importing pylab. I think maybe this import order dependency needs investigating a little bit more, and then documenting (or we revert the change of import location and log an issue to investigate it later)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think this is OK. The comment specifically calls out that the install needs to occur before importing matplotlib.pyplot. I don't think there are issues with importing parallel submodules from elsewhere in matplotlib.

Comment on lines -30 to -32
splash.update_text('importing h5_lock and h5py')
import labscript_utils.h5_lock
import h5py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the h5_lock import happening now? Until it's imported, anything that uses h5py could access h5 files without locking them, which could lead to corrupt files. Worse, if something imports h5py.File explicitly (e.g. from h5py import File) before h5_lock is installed, then it won't ever get the patched version of h5py. To be honest I'm wondering if it should happen even before numpy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the answer is the main GUI does not perform any direct accesses to h5 files. These accesses are provided by the main API, filebox, and dataframe utilities. The import is therefore handled within the core Lyse modules portion.

I did, however, find an upatched h5py import in filebox that will be fixed.

Comment on lines -2340 to -2343
# Start the web server:
splash.update_text('starting analysis server')
server = WebServer(app.port)
splash.update_text('done')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relocation of the WebServer into the Lyse class, and instantiating it before the GUI is set up, means that any messages received immediately on start could crash lyse when the WebServer.handler method tries to access something about the UI that isn't instantiated yet.

The previous implementation worked such that messages could be received, and placed in appropriate event queues before the Qt loop was even started.

I suspect, given the auto-retry behaviour of BLACS, that there would be scenarios where the crash will occur (it's a bit of a race condition though so may not be easily replicable)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I understand the issue here. Would moving the start of the WebServer to the end of the GUI init be sufficient? I believe that would recover the older behavior. I don't think we can go back to the old behavior exactly since it relied on globals-like behavior which is a bit difficult to reason about.

lyse/utils.py Outdated
Comment on lines 27 to 32
try:
LABCONFIG = LabConfig(required_params={"ports": ["lyse"]})
LYSE_PORT = int(LABCONFIG.get('ports', 'lyse'))
except Exception:
LABCONFIG = None
LYSE_PORT = 42519
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super comfortable with this. Instantiating a LabConfig just because you import lyse.utils has a bad code smell to it. It also isn't used by anything in this file. Seems like it was just move to fix some sort of circular dependency issue?

I'm sure there are a few possible solutions. One that may be the simplest is just to move the import of the labconfig inside the function that uses it, so that it can import it from lyse. That's perfectly acceptable to do, and a good way to break circular dependencies if one part only needs something at run time, not load time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBF, the LABCONFIG is used to define the LYSE_PORT in this file.

In any case, I think the goal was to not hold multiple references to the LABCONFIG instance, while needing different outputs from it in both __init__ and dataframe_utilities. And given lyse.utils is imported in __init__ anyway, I'm not sure the import side-effect is really all that meaningful.

Of course, we could extract LYSE_PORT and integer_indexing in a way that doesn't hold the reference to the labconfig at all, and even move that into their used locations (since they are only needed once each). But that feels a little overly defensive to me, especially considering that the labconfig is something that should be fairly universal. But maybe I'm missing the more subtle point here.

@dihm
Copy link
Contributor

dihm commented Mar 14, 2025

@ispielma I believe I have finally cleared off some time from my calendar, and this PR is well beyond over due to be finished and merged. If at all possible, I'd like to get it done in the next week or two. If that timing doesn't work for you, I can take over (by pushing to your branch or creating a new PR based on this one) and try to address Phil's review so this can finally be done. Let me know what your preferences are.

@dihm
Copy link
Contributor

dihm commented Apr 14, 2025

@ispielma I have started working on some commits to address Phil's review. Are you OK with me pushing to your branch directly? I think it would be best to keep the work in a single PR (if possible, github rules for pushing to forks can be odd sometimes).

@@ -55,13 +61,6 @@
# a flag to determine whether we should wait for the delay event
_delay_flag = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables are causing quite a bit of grief for me. We have a decent number of places that need them across the modules, and I think it is generally considered a bad code smell to import from the module __init__. Places these global-like values are used:

  • analysis_subprocess
    • _delay_flag
    • delay_event
    • spinning_top: redefinition
    • _updated_data
    • path: redefinition
    • plots: redefinition
    • Plot: redefinition
  • figure_manager
    • spinning_top
  • utils.worker: I've moved stuff here from __init__ on my working branch; I am slowly realizing that I don't really understand their purpose
    • spinning_top: register_plot_class, get_plot_class, and delay_results_return
    • _plot_classes: register_plot_class and get_plot_class
    • _delay_flag: delay_results_return
  • __init__
    • path
    • spinning_top in Run.save_result
    • _updated_data in Run.save_result

I'm still trying to grok exactly how they are used and what their purpose is, as it isn't obvious to me why they are defined in this way. Most of them only appear to be necessary if you are using a script within the lyse GUI, and they wouldn't even be called with the plain API. I suspect I am unaware of some standard workflow. Do people commonly run their lyse scripts in the GUI, then also run them as stand-alone batch scripts unmodified (hence why register_plot_class, get_plot_class and delay_results_return are defined in the API)?

Right now, I am inclined to move these global-like variables down to utils.worker. Ideally I'd also like to consider other ways for the GUI to communicate to the analysis subprocesses that doesn't rely on passing through module variables. I'm probably not the man for that job, but I feel like there must be a better way. Perhaps manual injection into the AnalysisWorker.routine_module.__dict__?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spending a little more time with this, I don't see the purpose of lyse.Plot or lyse.plots. They do not appear to be accessed anywhere, and they are only set to something meaningful within AnalysisWorker.do_analysis. I am fairly confident they can just be removed. Am I missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are part of the next step of my Lyse changes that seem to have leaked into this refactor. Their function is associated with making the lyse plot windows "first class" windows whose positions are when lyse shuts down. This is the highest priority Lyse request in my group.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, though they appear to be on mainline already. Doesn't really matter where they came from I suppose, I'll plan on keeping them in.

Would it be possible to implement figure position tracking independent of lyse globals? Chunking out distinct functionality, where feasible, goes a long way in ensuring these reviews don't drag on into absurdity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I have not visited this in about a year, I will use my old code as reference, but rewrite the implementation based on the current state of Lyse.

@dihm
Copy link
Contributor

dihm commented Apr 15, 2025

OK, I've gone ahead and pushed minor edits directly to the refactor branch. I've also created a PR to Ian's branch (ispielma#1) that contains more disruptive changes. Hopefully it makes it easier to see/discuss what they are relative to these changes as they presently exist.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants