Skip to content

Enable static linking of OS dependencies on Linux #6169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gkhanna79 opened this issue Jun 17, 2016 · 56 comments
Closed

Enable static linking of OS dependencies on Linux #6169

gkhanna79 opened this issue Jun 17, 2016 · 56 comments
Assignees
Labels
area-Infrastructure-coreclr blocking Marks issues that we want to fast track in order to unblock other important work os-linux Linux OS (any supported distro)
Milestone

Comments

@gkhanna79
Copy link
Member

No description provided.

@davidfowl
Copy link
Member

👏

@davzucky
Copy link
Contributor

@gkhanna79 anbd @janvorli
I started the implementation on CoreFx which was checked in over the week end. The implementation of the CoreClr is on work in progress on my side as I had some problem with the ICU library that I solved last week. I hope to send the push request in the coming week.
After that we will need to add library as incremental. You can refer as well to that issue about the same topic https://github.com/dotnet/coreclr/issues/4436

@davzucky
Copy link
Contributor

Licensing question, findICU.cmake is not part of the basic cmake support to find library, however they are a few available on github which could be quite nice to add in the project. What is the policy to use some of file like ?

Or could as well start to build our own version. Any comment ?

@gkhanna79
Copy link
Member Author

CC @jkotas

@jkotas
Copy link
Member

jkotas commented Jun 21, 2016

I have run https://github.com/julp/FindICU.cmake through the approval process. It is ok to use it. Could you please add it in a separate PR that will:

  • Add FindICU.cmake file
  • Add license to THIRD-PARTY-NOTICES

@sergiy-k sergiy-k assigned gkhanna79 and unassigned sergiy-k Sep 1, 2016
@eyoung100
Copy link

Darn, @shahid-pk you beat me to a reference but THANK YOU!!!

@janvorli
Copy link
Member

I have experimented with using libcoreclr.so built on Ubuntu 14.04 and it seems it works fine on SUSE 14.2 and CentOS 7.1. That build of libcoreclr.so depends on most functions from GLIBC 2.2.5, then there are three 2.3, one 2.3.2, one 2.4, one 2.6, two 2.12 symbols and the newest is memcpy with version 2.15. The GLIBC version 2.2.5 was released on 2002-01-20 and version 2.15 on 2012-03-21. Since this seems like this build of libcoreclr.so depends on a reasonably old GLIBC, it may be possible to use that build as a universal libcoreclr.so for all supported Linux systems.
All the libXXXX.so built in the coreclr repo are all the same story and the tools like corerun, coreconsole, ilasm, ildasm too.
The System.Globalization.Native.so is a different story though, since it links to libicui18n.so and libicuuc.so. Since these don't seem to be on the back compatibility plan as the GLIBC, I guess we would need to link these statically.

@janvorli
Copy link
Member

Similarly for GLIBCXX versions - the newest symbol referenced from that library by any of the coreclr components is from version 3.4.15, which was released on 2011-03-25.

@gkhanna79
Copy link
Member Author

@janvorli Is GLIBCXX 3.4.15 the lowest common denominator?

@janvorli
Copy link
Member

@gkhanna79 do you mean lowest common denominators for the currently supported Linux distros?

@gkhanna79
Copy link
Member Author

@janvorli Sort of. What all set of distros, which we support today, would this work for and how does it map to the set of distros we can support using GLIBC version in question? In principle, we want them to be consistently supported and move in tandem.

@janvorli
Copy link
Member

@gkhanna79 here is a list of distros and the versions of glibc / glibcxx they ship with. The glibcxx version represents the highest version of a symbol in the glibcxx. I don't have Redhat 7.2 installed and there is no docker container with it to enable me to give it a quick try. But I assume it will be the same or higher than CentOS 7.1.

Distro GLIBC GLIBCXX
CentOS 7.1 2.17 3.4.19
Debian 8.4 2.19 3.4.20
openSUSE 13.2 2.19 3.4.19
openSUSE 42.1 2.19 3.4.21
Red Hat 7.2 ? ?
Fedora 23 2.22 3.4.21
Ubuntu 14.04 2.19 3.4.19
Ubuntu 16.04 2.23 3.4.21
Ubuntu 16.10 2.23 3.4.22

@obadz
Copy link

obadz commented Oct 11, 2016

Possibly related to dotnet/coreclr#7573 ?

@davzucky
Copy link
Contributor

@janvorli The first think I tried to statically link is the libicu library that are used by System.Globalization.native.so that you pointed, however I keep getting into the wall with this library.
I tried a lot of distribribution and keep getting the same error. From the docuemntation it look like you can, however I'm getting an error from the linker about position independent. I recompile libicu with the option -fPIC but still getting the same error.

Would you have more knowledge about that ?

@janvorli
Copy link
Member

@davzucky I assume when you said you have compiled the libicu, you have done that for all the three icu libraries we depend on (libicuuc, libicui18n and libicudata), right? Are you sure that our build was picking the ones you have built instead of the ones installed by the package manager? Are you sure you have added -fPIC to all necessary places in their build files? How did you modify the CMakeLists.txt to use the static libraries?
And if the library contains some asm code, then it might need to be tweaked to be position independent.

@gkhanna79
Copy link
Member Author

Thank you @janvorli. Looks like the following the least common denominator (excluding RHEL) for now:

GLIBC: 2.17
GLIBCXX: 3.4.19

@ellismg Do you have access to a RHEL machine that @janvorli can use for his investigation?

@davzucky
Copy link
Contributor

@janvorli

  • I assume when you said you have compiled the libicu, you have done that for all the three icu libraries we depend on (libicuuc, libicui18n and libicudata), right? Yes I mean all the three
  • Are you sure that our build was picking the ones you have built instead of the ones installed by the package manager? I installed the lib after I build them. So that replace the one I have from the install and I shoult not have any problem. I checked the command line that the linker was using and it was referencing the right path
  • Are you sure you have added -fPIC to all necessary places in their build files? I expect yes. I will try to buikd it again and test
  • How did you modify the CMakeLists.txt to use the static libraries? I updated the current script to take the + adding extra flag. like I did in corefx before.
  • And if the library contains some asm code, then it might need to be tweaked to be position independent. Thank you for the information I wasn't aware about that. I will check that and let you know.

I will check all your point over the week end and come back to you. Thank you

@janvorli
Copy link
Member

I have made an experiment to enable System.Globalization.Native.so to use any version of ICU libraries that is present on the system ("any" means within some limits). I got the idea from the ICU doc where they describe building ICU libraries as OS components where an application compiled against one version of the ICU can still work when the ICU gets upgraded.
They have a concept of stable APIs - those APIs that are marked as stable are guaranteed to have the same signature / semantics in the future versions of the libicu. And for the usage of ICU as OS libraries, you can disable the renaming of the APIs. So instead of u_tolower_52 in version 52 of the ICU, you'd get u_tolower. When consumers of the APIs define #define U_DISABLE_RENAMING 1, the header files provide prototypes of the functions without the version number.
You can see the details here: http://userguide.icu-project.org/design#TOC-ICU-Binary-Compatibility:-Using-ICU-as-an-Operating-System-Level-Library
Fortunately, all the APIs we are using are stable. So, here is what I did:

  1. I have defined U_DISABLE_RENAMING as 1 for the build of System.Globalization.Native.so. That compiles everything referencing the unversioned symbols.
  2. I have added an initialization code that finds the installed libcuuc.so.xx and libcui18n.so.xx libraries, calls dlopen to load them dynamically and then calls dlsym for each API name with added version number that it has found and assigns that to a global variable of a type of a pointer to the API function and named by the API with _ptr added.
  3. Add a header that redefines (using #define) each API name to the name with _ptr added at the end.
  4. Include that header after all the unicode/xxxx headers are included.

This essentially makes the code to use the function pointers extracted dynamically using the dlsym.

I have ran the System.Globalization.Tests and System.Runtime.Tests with this modified System.Globalization.Native.so and all of them have passed.

So it seems like a feasible way, we just need to figure out the minimum version of ICU that we can support. Then we don't need to try to link ICU statically.

cc: @gkhanna79, @Petermarcu, @stephentoub. Do you guys have any concerns?

@stephentoub
Copy link
Member

I like it. That approach also would make it easy to add fallback implementations if ICU can't be found, e.g. just by pointing the function pointers to a local implementation that provides some ASCII/invariant/etc. implementation.

@stephentoub
Copy link
Member

Though, if we did want to statically link it once we moved to this scheme, how would we do it? Include a shim that calls each relevant function and just assign those shims to the pointers?

@janvorli
Copy link
Member

For static linking, we would just ifdef-out the body of the new header and exclude the new shim cpp that creates the xxx_ptr variables and uses dlopen / dlsym to set them from the build.

@ellismg
Copy link
Contributor

ellismg commented Oct 20, 2016

@janvorli We will need to move away from some of our build time customization where we configure to light up if features are present in ICU. Specifically, in this mode, https://github.com/dotnet/coreclr/blob/32f0f9721afb584b4a14d69135bea7ddc129f755/src/corefx/System.Globalization.Native/calendarData.cpp#L550 should always be defined to FALSE.

We will need to support ICU 50 at least, that's what's present on RHEL 7.0

The approach seems fine to me. When I wrote the code I made sure that we didn't take a dependency on things that were not outside the stable C surface area, so we could directly link against the OSX version of this library.

Note we will want this to be configurable at build time, so we can link against a system ICU when a distro owner is building us from source.

@janvorli
Copy link
Member

@ellismg That makes sense. Fortunately, making this configurable at build time is as easy as what I have described for static linking to @stephentoub.

@gkhanna79
Copy link
Member Author

The approach seems very reasonable to me. And if such investigations deem that we can be successful with dynamic linking in the goal of building binaries that run consistently across distros, then we may want to reevaluate the value static of static linking.

@stephentoub
Copy link
Member

if such investigations deem that we can be successful with dynamic linking in the goal of building binaries that run consistently across distros, then we may want to reevaluate the value static of static linking

Dynamic linking in this regard still doesn't help a case where libicu is not available on the target. In such a case, with dynamic linking we could fall back to a default implementation, but if we actually wanted true globalization support, we would still need to statically link for such cases.

@gkhanna79
Copy link
Member Author

In such a case, with dynamic linking we could fall back to a default implementation,

Precisely - and for such cases, we will need to evaluate if such scenarios also constitute "lightup" for us since, functionally, the experience will just work.

@bartonjs
Copy link
Member

Statically linking OpenSSL is a very, very bad idea. Especially if it is going to a customer. (If it's an opt-in or bootstrap-only situation that's different). Static inclusion of a frequently-updating security component is a very hard train to follow... because every OpenSSL CVE is also a dotnet CVE through static inclusion.

I don't know what the costs of the preloading would be. If it's effectively writing code as LoadLibrary/GetProcAddress it would take quite a lot of convincing for me to believe it was anywhere near justified. Safe compilation is a must-have. If the preload stuff is tooling done after (or with the assistance of) compilation then I don't know what other costs it would have.

And if the point is to avoid needing to build a per-distro version of the shim, we can't really do that at all. There are lots of #defines that we need to respect on a per-distro level, like where the root certificate store is.

@janvorli
Copy link
Member

@ellismg regarding the full static linking, I was investigating it in parallel while thinking about the options we have. The issue is that none of the static versions of dependencies we have can be used to create a shared library. I've tried to statically link with libcurl, libssl, libcrypto and ICU libraries and in all cases, as I have already said before, I have hit the problem that they were not built with -fPIC compiler option (or possibly contains ASM code that is not position independent). So in all of these cases, we would need to build the libraries ourselves, modifying the build options to make their code position independent.
Moreover, libcurl has quite a long list of other dependencies that we would also need to link in statically. Running ldd on libcurl shows 37 libraries it depends on.

Regarding your preference to implement it the same way as for the ICU, it would of course work. It just adds overhead of having to iterate over all the used symbols, calling dlsym to get their addresses and set the pointers which is unnecessary if we can just preload the libraries. There are also many more APIs that we use from the libssl / libcrypto than the number of those in ICU (284 vs 84).
However, I have just realized there is possibly one more way in this case - if I used #pragma weak on all the symbols, then the preload can happen in the System.Security.Cryptography.Native.so itself similar to what I did in the other library and the weak symbols would kind of represent the shim. But I haven't played with that yet.

@janvorli
Copy link
Member

@bartonjs thank you for pointing out the distro specific details like the cert store location. Are those really distro specific or is it possible to say that e.g. Fedora derived distros use one way, some other group of distros use another way etc?
Would you be able to put together a list of distro specific defines that you've mentioned or point me to a place where they are?
Does it sound unrealistic to think that even those distro specific details can be provided by the shim depending on the distro it is running on?

@bartonjs
Copy link
Member

@janvorli locate opensslconf.h. Though, the two that I were really worried about do seem to be queried via real functions (X509_get_default_cert_file and X509_get_default_cert_dir) (as opposed to macros pretending to be functions). I don't know how preload would handle it if the distro turned off big features which make function exports disappear entirely... so that may still be a concern.

For an example of the feature switch: Ubuntu 16.04 defines OPENSSL_NO_RC5, so the function EVP_rc5_32_12_16_cbc isn't compiled, or exported. But if we'd called it in our "portable" shim something would happen, and we'd need to be prepared for it. Right now we'd get a compile error on that distro and we'd presumably make our shim version of that function return an error indication under that #if.

I don't know that we can make generalizations about a distro's configuration (bucketing as families or even across major versions)... and I'd really, really hate to be wrong. But if everything that's in opensslconf.h (which are the "expected" tuning parameters) is queried via functions or we can identify when the functions are gone... well, maybe it's a doable plan.

@janvorli
Copy link
Member

@bartonjs Identifying functions that are not present is easy. If I create a shim that uses dlsym to get all the symbol addresses, the missing one would get NULL and we can check for that in the code. If we used #pragma weak, it would be the same case - again, testing a function symbol to NULL checks whether the function was resolved or not.

@gkhanna79
Copy link
Member Author

Platform Agnostic Linux Binaries are now available from DailyBuilds section of https://github.com/dotnet/core-setup/blob/master/README.md.

@tmds
Copy link
Member

tmds commented Jan 26, 2017

@janvorli @ellismg I think this feature has been tied to the linux-x64 rid. Does that imply something for libraries with native components that provide binaries for linux-x64 (e.g. libuv, sqllite). Should they be linked with a specific version of glibc? For example, SQLite has a GLIBC_2.4 symbol:

0000000000000000      DF *UND*	0000000000000000  GLIBC_2.4   __stack_chk_fail

Would it make sense to include the c library in the rid to make this more explicit (e.g. linux-x64-glibc)?

@tmds
Copy link
Member

tmds commented Jan 27, 2017

Would it make sense to include the c library in the rid to make this more explicit (e.g. linux-x64-glibc)?

Taking it even a step further, the glibc version could also be in the rid. This would allow to express the version supported by the different distributions and the glibc backward compatibility. Libraries can specify the version they link against.

The table by @janvorli can then be expressed as:

linux-x64
^-linux-x64-glibc.2.17        <- centos.7-x64
  ^-linux-x64-glibc.2.18
    ^-linux-x64-glibc.2.19    <- debian.8-x64, opensuse13.2-x64, opensuse42.1-x64, ubuntu.14.04-x64
      ^-linux-x64-glibc.2.20
	...

@janvorli
Copy link
Member

janvorli commented Jan 27, 2017

@tmds I don't think putting in the glibc version would be beneficial. We now depend on minimum glibc version 2.14 and it is very unlikely that we would ever have a version where we would have a build requiring e.g. version 2.14 and another build requiring version 2.20. If we ever change the requirement to a higher version, it would be because we would add some feature that requires higher version, so we could not just have a build that would still work with the older one.

@tmds
Copy link
Member

tmds commented Jan 27, 2017

@tmds it is very unlikely that we would ever have a version where we would have a build requiring e.g. version 2.14 and another build requiring version 2.20

Makes sense.

Where does version 2.14 come from? Was this the version on rhel? I had expected that to be similar as centos.

I guess this also means libraries providing native components under the linux-x64 rid preferably link with 2.14 or less?

Would it make sense to have a linux-x64-glibc rid (without the version) to be able to differentiate with systems using a different libc (e.g. containers)?

@janvorli
Copy link
Member

@tmds The version 2.14 actually reflects the highest version of a symbol we use from glibc when building on the oldest distros we support.

As for the libraries providing native components under the linux-x64 rid, they don't necessarily have to link against the glibc 2.14 or less, but they should not use symbols of newer versions if they want to work on the same range of distros as the dotnet itself.

Regarding the linux-x64-glibc rid, I have an ambivalent feeling about it. On one hand, it makes sense and if we wanted to create similar package for musl based Linux distros, we could name them in a similar manner. On the other hand, now that we already have rids for specific Linux distros that don't contain the glibc in them, it seems little strange to not to have glibc in the distro specific ones, but have it in the portable one.

Although, I am starting to think that it actually might make a sense if I start seeing the glibc on the same level as the distro name in the rid.
@gkhanna79 what do you think about it? What if we viewed the rid hierarchy so that linux-x64-glibc is on the same level in the hierarchy as e.g. ubuntu.16.04-x64. Or actually, it should rather be linux-glibc-x64.

@jkotas
Copy link
Member

jkotas commented Jan 27, 2017

I believe we are moving away from capturing the Linux versions hierarchy in the RID graph because of we have realized it is way too complex and it does not work well...

@tmds
Copy link
Member

tmds commented Jan 27, 2017

Regarding the linux-x64-glibc rid, I have an ambivalent feeling about it. On one hand, it makes sense and if we wanted to create similar package for musl based Linux distros, we could name them in a similar manner.

You or someone else. Anywho, he/she will have to pick an rid and a parent rid. It would make sense for this parent to be 'linux-x64' but then it shouldn't be tied to glibc.

On the other hand, now that we already have rids for specific Linux distros that don't contain the glibc in them, it seems little strange to not to have glibc in the distro specific ones, but have it in the portable one.

I believe we are moving away from capturing the Linux versions hierarchy in the RID graph because of we have realized it is way too complex and it does not work well...

I don't see the point of some of these rids like ubuntu, ubuntu-x64, ubuntu.14.04.

I think an RID should either be a runtime endpoint or a meaningful target of a library. For it to be a meaningful target of a library, an API/implementation must be available compared to the parent (e.g. glibc).

@tmds
Copy link
Member

tmds commented Jan 30, 2017

The rid graph for ubuntu.14.04-x64 looks like this now:

                               base
                                 |
                                any
                                 |
                                unix
                               |     |
                          linux     unix-x64
                         |      |       |
                   debian       linux-x64
                  |      |        | 
            ubuntu       debian-x64       
          |        |        |
ubuntu14.04        ubuntu-x64
      |              |
      ubuntu.14.04-x64

Are these RIDs used in practice (by .NET Core or any package on NuGet): unix, debian, ubuntu, ubuntu14.04, ubuntu-x64, debian-x64?

A much simplified version could look like this:

base <- any <- linux                                       // pure managed libraries
                 ^- linux-x64   <- linux-glibc-x64         // library with native bits
                                       ^- ubuntu.14.04-x64 // runtime end-point

@hqueue
Copy link
Member

hqueue commented Mar 29, 2017

/cc: @jyoungyun @hseok-oh @chunseoklee

@gkhanna79 gkhanna79 removed their assignment Mar 30, 2017
@jyoungyun
Copy link
Contributor

@gkhanna79 Do you have a plan to remove all dynamic link in coreclr including glibc, libunwind and the other all library? This decision, however, makes the binary size large. Should I be able to make a choice?

@janvorli
Copy link
Member

@jyoungyun no, that's not the plan. Definitely not for glibc, glibcxx or ICU. We may statically link libunwind. The current plan to enable portable dotnet core is to enable people to add local copies of shared libraries that would otherwise have to be installed on the target distro if they want to. Again, glibc and glibcc should never be included, since it could cause a lot of trouble (e.g. dotnet having one version of glibc interoping with other native code built with other version of it, etc.).

@jyoungyun
Copy link
Contributor

jyoungyun commented Mar 31, 2017

@janvorli Thanks for the friendly explanation. I understood glibc or any other libraries as an atempt to static link. But I'm glad you did not do all the library. I know about the portable dotnet core, and I know that in this case the difference in the library version of the host env. can cause problems. Howerver, static link can cause another problem as you said. I know that coreclr already has a static link to the ICU. I think it is also part of this issue. Everything is clear, Thank you.

@janvorli
Copy link
Member

@jyoungyun the ICU is not statically linked to coreclr (or to the System.Globalization.Native.so, to be precise). It is dynamically linked and a recent change in coreclr even removed the need to have it installed at all. It is now optional. If it is not installed, then a simple managed version of globalization supporting just en-US is used. That allows users to not to install ICU in case the apps they are using don't care about supporting other locales.

@jyoungyun
Copy link
Contributor

@janvorli Thank you. I understood that you removed the dependency on the fixed icu version at build time and if we use System.Globalization.Native.so in an environment without icu, you give a chance to use the pre-built library, which provides only en-US. In fact, Tizen uses a different version of icu than ubuntu, so a fixed version check at the time of build was causing issues. However, recent fixes are good to us because they solve these problems. Thank you for kindly explanation.

@gkhanna79
Copy link
Member Author

@janvorli Should this be closed?

@janvorli
Copy link
Member

After the investigations and various experiments, it turned out that static linking is not feasible from various points of view and that a better approach to achieve portability over Linux distros is to change linking of some dependencies from build time to run time and to enable bundling of local copies of shared libraries. This has proven to work well, so I am closing this issue.

@radu-matei
Copy link

a better approach to achieve portability over Linux distros is to change linking of some dependencies from build time to run time and to enable bundling of local copies of shared libraries. This has proven to work well

@janvorli - could you please link to how you can do this?
Thanks!

@janvorli
Copy link
Member

janvorli commented Sep 1, 2017

@radu-matei please see https://github.com/dotnet/core/blob/master/Documentation/self-contained-linux-apps.md

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Infrastructure-coreclr blocking Marks issues that we want to fast track in order to unblock other important work os-linux Linux OS (any supported distro)
Projects
None yet
Development

No branches or pull requests