Applying some changes #2

Merged
merged 1 commit into from
Feb 5, 2022
6 changes: 3 additions & 3 deletions CMakeLists.txt
@@ -3,11 +3,11 @@
#
cmake_minimum_required (VERSION 3.8)

project ("entropy")
project (entropy VERSION 0.0.2)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED True)
add_executable (entropy "entropy.cpp" )
set(CMAKE_CXX_STANDARD_REQUIRED ON)
add_executable (entropy entropy.cpp)

if (WIN32)
set_target_properties(entropy PROPERTIES LINK_FLAGS "/link setargv.obj")
17 changes: 12 additions & 5 deletions README.md
@@ -15,8 +15,7 @@ this tool, which also supports Linux and macOS.

## Download

Windows releases are available [here](https://github.com/merces/entropy/releases). In order to run them,
you need the latest [Microsoft Visual C++ Redistributable](https://docs.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist)
Windows releases are available [here](https://github.com/merces/entropy/releases). In order to run them, you need the latest [Microsoft Visual C++ Redistributable](https://docs.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist)
installed.

For Linux and macOS see the [Building](#Building) section.
@@ -25,19 +24,23 @@ For Linux and macOS see the [Building](#Building) section.

Calculating the entropy of a single file:

```bash
$ ./entropy /bin/ls
5.85 /bin/ls
```

Shell expansion is supported too:

```powershell
PS C:\> .\entropy.exe C:\Users\User\Downloads\*
7.92 C:\Users\User\Downloads\1.jpeg
8.00 C:\Users\User\Downloads\setup.exe
7.58 C:\Users\User\Downloads\nov.pptx
4.66 C:\Users\User\Downloads\data.bin
7.99 C:\Users\User\Downloads\pic.png
4.07 C:\Users\User\Downloads\budget.xls

```

From the above output one could say `/bin/ls` is not packed, `1.jpeg` uses compression,
`setup.exe` is compressed, `nov.pptx` is compressed (yup, these modern MS Office files are all
ZIP files indeed), `data.bin` is not compressed, etc. Is that guaranteed? No, it's just math. :nerd_face:
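
To make the "it's just math" remark concrete: the value printed per file is the Shannon entropy of its byte distribution, in bits per byte, so it ranges from 0 (one byte value repeated) to 8 (all 256 byte values equally likely). The following is a minimal sketch of that calculation, not code from this PR; the helper name `shannon_entropy` and the use of `std::string` as a byte container are assumptions made for illustration.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the tool): Shannon entropy of a byte
// string, in bits per byte. H = -sum(p_i * log2(p_i)) over byte values
// that occur at least once.
double shannon_entropy(const std::string& data) {
    if (data.empty()) {
        return 0.0;  // avoid dividing by zero on empty input
    }

    // Histogram of the 256 possible byte values.
    std::array<std::size_t, 256> counts{};
    for (unsigned char c : data) {
        counts[c]++;
    }

    double h = 0.0;
    for (std::size_t n : counts) {
        if (n == 0) {
            continue;  // a byte that never occurs contributes nothing
        }
        const double p = static_cast<double>(n) / data.size();
        h -= p * std::log2(p);
    }
    return h;
}
```

With this sketch, a run of one repeated byte scores 0, two equally frequent byte values score 1, and four equally frequent values score 2. Compressed or encrypted data tends toward 8 because its bytes are close to uniformly distributed, which is why a high value suggests compression without proving it.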
@@ -48,6 +51,7 @@ ZIP files indeed), `data.bin` is not compressed, etc. Is that guaranteed? No, it

Clone the repo:

```bash
$ git clone https://github.com/merces/entropy.git
$ cd entropy

@@ -57,11 +61,14 @@ If you have CMake installed, build with:
$ cd build
$ cmake ..
$ make

```

Or if you don't, just use `g++`:

```bash
$ g++ -std=c++20 -o entropy entropy.cpp

```

### Windows

If you use a recent Visual Studio version, you can clone this repository and open the `CMakeLists.txt` here
48 changes: 26 additions & 22 deletions entropy.cpp
@@ -6,9 +6,10 @@

double calculate_entropy(const unsigned int counted_bytes[256], const std::streamsize total_length) {
double entropy = 0.;
double temp;

for (int i = 0; i < 256; i++) {
double temp = (double)counted_bytes[i] / total_length;
temp = static_cast<double>(counted_bytes[i]) / total_length;

if (temp > 0.) {
entropy += temp * fabs(log2(temp));
@@ -19,9 +20,9 @@ double calculate_entropy(const unsigned int counted_bytes[256], const std::strea
}

void usage() {
std::cout << "entropy calculates the entropy of files, but you need to provide it with a file. :)\n\n" <<
"Usage:\n\tentropy FILE\n\n" <<
"Examples:\n\tentropy image.png\n\tentropy music.mp3 document.xls\n\tentropy *.exe\n\n" <<
std::cout << "entropy calculates the entropy of files, but you need to provide it with a file. :)\n\n"
"Usage:\n\tentropy FILE\n\n"
"Examples:\n\tentropy image.png\n\tentropy music.mp3 document.xls\n\tentropy *.exe\n\n"
"For more information and bug reporting, refer to https://github.com/merces/entropy\n";
}

@@ -35,40 +36,43 @@ int main(int argc, char *argv[])
// Entropy will have two decimal places
std::cout << std::fixed << std::setprecision(2);

// 16KB chunks
std::vector<char> buff(1024*16, 0);
std::streamsize total_bytes_read = 0;
std::streamsize bytes_read;
unsigned char count;

// Count occurrence of each possible byte, from zero to 255.
unsigned int counted_bytes[256] = {};

for (int i = 1; i < argc; i++) {
// Skip directories, symlinks, etc
if (!std::filesystem::is_regular_file(argv[i])) {
std::cerr << "\"" << argv[i] << "\"" << " isn't a regular file, skipping." << std::endl;
continue;
}

// Open the file
std::ifstream f(argv[i], std::ios::binary);
if (f.fail()) {
std::cerr << "Could not open \"" << argv[i] << "\" for reading.\n";
std::ifstream input_file(argv[i], std::ios::binary);
if (input_file.fail()) {
std::cerr << "Couldn't open \"" << argv[i] << "\" for reading." << std::endl;
continue;
}

// 16KB chunks
std::vector<char> buff(1024*16, 0);
std::streamsize total_bytes_read = 0;

// Count occurrence of each possible byte, from zero to 255.
unsigned int counted_bytes[256] = { 0 };

// Read file in chunks and count the occurrences of each possible byte (0-255)
while (!f.eof()) {
f.read(buff.data(), buff.size());
auto bytes_read = f.gcount();
while (!input_file.eof()) {
input_file.read(buff.data(), buff.size());
bytes_read = input_file.gcount();
total_bytes_read += bytes_read;

for (int j = 0; j < bytes_read; j++) {
unsigned char c = buff[j];
counted_bytes[c]++;
count = static_cast<unsigned char> (buff[j]);
counted_bytes[count]++;
}
}

f.close();
input_file.close();
std::cout << calculate_entropy(counted_bytes, total_bytes_read) << " " << argv[i] << "\n";
}
return 0;
}
}