What will be recommended HW setup for Realtime face Detection ? #1004

Closed
MyraBaba opened this issue Dec 9, 2017 · 15 comments
Comments

@MyraBaba

MyraBaba commented Dec 9, 2017

Hi,

I assume this could be helpful for all of us :)

What would be the best setup (NVIDIA 1080 Ti, NVIDIA Jetson TX2, etc.) in terms of both performance and price?

The Jetson is ARM-based, so I suspect there could be problems, and a 1080 Ti could be too expensive for my demo project.

I need at least 30 fps real-time face detection + recognition (multiple faces in the camera view).

Based on your experience, you may have an idea.

Many thanks for the library. I hope some day there is a solid Java/Scala port :) I started to learn C++ because of the great dlib.

Cheers

@davisking
Owner

davisking commented Dec 9, 2017 via email

@MyraBaba
Author

I will do both soon and post the results here for comparison.

Meanwhile, how can we use all available cores and CPUs in the server with dlib? (It's only using one core.)

I read about OpenBLAS and Intel MKL (paid) and installed both. I didn't see a significant improvement; still only 1 or 2 cores are busy. How can I check whether a dlib example is using BLAS or Intel MKL? I am using CLion, by the way (Mac OS X).

A blog post explaining how to get dlib's full power from all CPUs with OpenBLAS etc. would be very useful.

Many thanks..

@davisking
Owner

davisking commented Dec 10, 2017 via email

@MyraBaba
Author

Yes, I saw the output below. Even though it says it found BLAS, it is still using only one core.

This may be a very stupid question, but I couldn't find a clear explanation of how to use the full CPU power.

best...

cmake .. -DUSE_AVX_INSTRUCTIONS=1
-- The C compiler identification is AppleClang 9.0.0.9000038
-- The CXX compiler identification is AppleClang 9.0.0.9000038
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Enabling AVX instructions
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE
-- Looking for png_create_read_struct
-- Looking for png_create_read_struct - found
-- Looking for jpeg_read_header
-- Looking for jpeg_read_header - found
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Found PkgConfig: /usr/local/bin/pkg-config (found version "0.29.2")
-- Checking for module 'cblas'
-- No package 'cblas' found
-- Checking for module 'lapack'
-- No package 'lapack' found
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- Found LAPACK library
-- Found CBLAS library
-- Looking for cblas_ddot
-- Looking for cblas_ddot - found
-- Looking for sgesv
-- Looking for sgesv - found
-- Looking for sgesv_
-- Looking for sgesv_ - found
CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (Required is at least version "7.5")
-- Disabling CUDA support for dlib. DLIB WILL NOT USE CUDA
-- Building a C++11 test project to see if your compiler supports C++11
-- C++11 activated.
-- Building a C++11 test project to see if your compiler supports C++11
-- C++11 activated.
-- Configuring done
-- Generating done

@davisking
Owner

davisking commented Dec 10, 2017 via email

@MyraBaba
Author

I will investigate.

I am using the face detection, face landmark, and face recognition parts.

Do these benefit from multiple cores?

@davisking
Owner

davisking commented Dec 10, 2017 via email

@MyraBaba
Author

I am using the code below (Python):

face_locations = face_recognition.face_locations(frame)
or
face_locations = face_recognition.face_locations(frame, number_of_times_to_upsample=0, model="cnn")

face_encodings = face_recognition.face_encodings(frame, face_locations)

and then the encoding comparison...

When I profile it, 56% of the time is consumed by:

face_detector = dlib.get_frontal_face_detector()

Is this caused by Python not allowing dlib to benefit from multiple cores, so that I should switch to bare C++? Or is BLAS not multicore-aware?

I have 8 cores and can only use 1 of them... (MacBook Pro 2017)

@davisking
Owner

davisking commented Dec 10, 2017 via email

@mcourteaux
Contributor

Technical note: if your application allows latency (like 5 frames delay), you don't need to perform face detection on every frame. Just detect faces every 5 frames, and interpolate face positions between them. Just saying, in case this might be something that will do for your scenario.
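
A minimal sketch of the interpolation idea above, assuming a single face that has already been matched between two detections taken `stride` frames apart. The names `Box`, `interpolate_box`, and `fill_between` are hypothetical helpers for illustration, not part of dlib or face_recognition, and the boxes are made-up values.

```python
from typing import List, Tuple

# A face box as (left, top, right, bottom) in pixels.
Box = Tuple[float, float, float, float]


def interpolate_box(a: Box, b: Box, t: float) -> Box:
    """Linearly blend two boxes: t=0 returns a, t=1 returns b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def fill_between(a: Box, b: Box, stride: int) -> List[Box]:
    """Boxes for the stride-1 frames strictly between two detections."""
    return [interpolate_box(a, b, k / stride) for k in range(1, stride)]


# Made-up detections on frame 0 and frame 5; frames 1..4 are interpolated.
first = (100.0, 50.0, 200.0, 150.0)
later = (150.0, 50.0, 250.0, 150.0)
for offset, box in enumerate(fill_between(first, later, 5), start=1):
    print(offset, box)
```

Because the boxes for the skipped frames can only be computed once the later detection exists, this approach delays output by up to `stride` frames, which is the latency trade-off mentioned above. Matching several faces across detections is a separate problem that this sketch does not address.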

@davisking
Owner

That's a decent idea as well. But the deeper issue is that you shouldn't be calling code inside your processing loop that doesn't need to be there. Case in point, model loading code like get_frontal_face_detector has no business being called more than once, let alone on every frame.
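
The point above can be sketched as follows: build the detector once, then reuse it for every frame. `detect_all` and `fake_factory` are hypothetical names used only for illustration; the stand-in factory exists so the sketch runs without dlib or a camera and so the number of constructions can be counted.

```python
from typing import Any, Callable, Iterable, List


def detect_all(
    frames: Iterable[Any],
    make_detector: Callable[[], Callable[[Any], Any]],
) -> List[Any]:
    """Run detection on every frame with a detector that is built only once."""
    detector = make_detector()  # constructed once, outside the per-frame loop
    return [detector(frame) for frame in frames]


# Hypothetical stand-in for a detector factory; it records how often it is called.
calls = {"built": 0}


def fake_factory() -> Callable[[Any], list]:
    calls["built"] += 1
    return lambda frame: []  # pretend no faces were found


results = detect_all(["frame0", "frame1", "frame2"], fake_factory)
print(calls["built"], len(results))
```

With dlib installed, passing `dlib.get_frontal_face_detector` as `make_detector` should fit the same shape, since the object it returns is callable on an image array.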

@MyraBaba
Author

MyraBaba commented Dec 12, 2017 via email

@xhuvom

xhuvom commented Dec 18, 2017

I have installed dlib with AVX_INSTRUCTIONS and CUDA + cuDNN. But running a real-time face detector (5-point) from a webcam lags about 1-2 seconds per frame when OpenCV capture is used in the code. In theory the code should run smoothly at about 30 fps on my GTX 1080 GPU, but I am not sure whether dlib is using the GPU at all. Checking GPU memory at runtime shows only 15 MB of consumption. Any idea what's happening?

@ariel415el

Hi,
Any new ideas about how to verify that dlib uses the GPU?

@davisking
Owner

CMake tells you if it's going to use cuda when you install it. I also recently added the dlib.DLIB_USE_CUDA variable that you can look at.
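
A small sketch of reading that variable from Python. The helper name `dlib_cuda_status` is hypothetical, and the fallbacks for a missing module or a missing attribute are assumptions added so the snippet also runs on machines without dlib or with an older build.

```python
import importlib


def dlib_cuda_status() -> str:
    """Report whether the installed dlib build was compiled with CUDA."""
    try:
        dlib = importlib.import_module("dlib")
    except ImportError:
        return "dlib not installed"
    # Older builds may not expose the variable, so look it up defensively.
    flag = getattr(dlib, "DLIB_USE_CUDA", None)
    if flag is None:
        return "this dlib build does not expose DLIB_USE_CUDA"
    return "CUDA enabled" if flag else "CUDA disabled"


print(dlib_cuda_status())
```

The flag describes how dlib was built. It does not by itself show that a particular call ran on the GPU; for example, the HOG-based detector returned by `get_frontal_face_detector` runs on the CPU even in a CUDA build.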
