4 changes: 2 additions & 2 deletions democlient/democlient.cpp
@@ -66,8 +66,8 @@ class CppcheckExecutor : public ErrorLogger {
, cppcheck(settings, supprs, *this, false, nullptr)
{}

-void run(const char code[]) {
-cppcheck.check(FileWithDetails("test.cpp", Standards::Language::CPP, 0), code);
+void run(const char* code) {
+cppcheck.checkBuffer(FileWithDetails("test.cpp", Standards::Language::CPP, 0), reinterpret_cast<const uint8_t*>(code), strlen(code));
}

void reportOut(const std::string & /*outmsg*/, Color /*c*/) override {}
7 changes: 5 additions & 2 deletions gui/mainwindow.cpp
@@ -729,8 +729,11 @@ void MainWindow::analyzeCode(const QString& code, const QString& filename)
checkLockDownUI();
clearResults();
mUI->mResults->checkingStarted(1);
-// TODO: apply enforcedLanguage?
-cppcheck.check(FileWithDetails(filename.toStdString(), Path::identify(filename.toStdString(), false), 0), code.toStdString());
+{
+const std::string code_s = code.toStdString();
+// TODO: apply enforcedLanguage?
+cppcheck.checkBuffer(FileWithDetails(filename.toStdString(), Path::identify(filename.toStdString(), false), 0), reinterpret_cast<const std::uint8_t*>(code_s.data()), code_s.size());
+}
analysisDone();

// Expand results
59 changes: 27 additions & 32 deletions lib/cppcheck.cpp
@@ -799,10 +799,9 @@ unsigned int CppCheck::check(const FileWithDetails &file)
return returnValue;
}

-unsigned int CppCheck::check(const FileWithDetails &file, const std::string &content)
+unsigned int CppCheck::checkBuffer(const FileWithDetails &file, const uint8_t* data, std::size_t size)
Owner

I dislike this interface. One parameter is replaced by two, and not because we want to separate them.
Don't the modern C++ guidelines advocate using span/string_view instead of a raw pointer plus size? This is a step in the opposite direction.

Collaborator Author

It's a rather internal interface, and this removes all unnecessary wrappers and shows the actual intention.

As usual, the changes are incremental. The next step after this is to merge danmar/simplecpp#496 and get it downstream so further cleanups can be made. With the code clean and more simplecpp interfaces available, it should be clearer how to proceed.

Owner

I think what you want to have is performance, getting rid of memory allocation and data copy, right?

What I want is a function interface that does not have a pointer+size pair and I would like a type that makes it as clear as possible that it is the raw code to process.

In my humble opinion we do not have conflicting requests. How about this:

struct FileData {
    const char* data;
    std::size_t dataSize;
};

unsigned int check(const FileWithDetails &file, const FileData &fileData);

Owner
@danmar danmar Aug 29, 2025

I would not be against FileData having an interface similar to string_view (with only the methods that we use). We would then have the option (using #ifdef) to just use a string_view alias instead:

using FileData = std::string_view;

And I guess that could lead to improved static analysis etc.
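
A minimal sketch of what such a `FileData` type could look like, assuming C++17 is available for the alias branch. The macro name `USE_STD_STRING_VIEW` and the helper `countLines` are hypothetical and only stand in for the real consumer; nothing here is taken from the cppcheck code base.

```cpp
#include <cstddef>
#include <string>

// Hypothetical switch: the thread only says "using #ifdef", the macro name is an assumption.
#ifdef USE_STD_STRING_VIEW
#include <string_view>
using FileData = std::string_view;
#else
// Minimal non-owning view that exposes only data() and size().
class FileData {
public:
    FileData(const char* data, std::size_t size) : mData(data), mSize(size) {}
    // Implicit on purpose, mirroring how std::string converts to std::string_view.
    FileData(const std::string& s) : mData(s.data()), mSize(s.size()) {}
    const char* data() const { return mData; }
    std::size_t size() const { return mSize; }
private:
    const char* mData;
    std::size_t mSize;
};
#endif

// Hypothetical consumer standing in for check(file, fileData): it only counts newlines.
inline std::size_t countLines(const FileData& fd) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < fd.size(); ++i) {
        if (fd.data()[i] == '\n')
            ++n;
    }
    return n;
}
```

Because the consumer only uses `data()` and `size()`, the same call sites should compile with either branch, which is the property the alias idea relies on.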

Collaborator Author

That is rather awkward, and I would like to have something that can be used without adjusting the code if we ever switch. But I do not want to do this in this PR. Without a clean base that cannot be properly tested, and we also need a clean baseline to profile against.

As mentioned above that will be done in follow ups.

Collaborator Author

This PR does more than just adjust that single interface, and the interface will be changed in the future. We need clean code to move to something else instead of some intermediate mishmash. Moving to another intermediate form would only complicate things if I want to compare the implementations. In fact, a comparison would not be possible at all, since there would always be a mix of interfaces if the code was never clean to begin with.

Owner
@danmar danmar Aug 29, 2025

> it will be changed in the future

I would like to understand what the plan really is. I don't want to have char buffers all over the place. I suggest we remove the buffer inputs from simplecpp directly.

string_view is safer than a plain old char pointer. The reason string_view was invented was to improve safety and expressiveness.

Owner

And I don't understand why we should convert to char buffers everywhere first and then change to something else later. That sounds like "intermediate mishmash" to me :-(

Owner

> I suggest we remove the buffer inputs from simplecpp directly

Ah, I see that there is work in progress in danmar/simplecpp#496 👍

It would feel much better if we merged 496 first and then used the safe interface.

Collaborator Author

> I would like to understand what the plan is really.

The plan is to get rid of all unnecessary (inefficient) wrappers so data can be passed through directly and the interfaces can be cleaned up.

This PR, for instance, gets rid of the std::istream input, which was never necessary.

As outlined above, the next steps are getting the modern simplecpp::TokenList interface in so we can clean up more things, and cleaning up more of the tests.

The production code would then be clean and only the tests might still have some mishmash, but the interfaces for that can then be moved there and the production code can get some thin wrappers which allow modern stuff to be dropped in. (As using Qt implicitly switches to C++17, that could already be used and tested.)

Doing all of this at once would be hard to review, would make profiling the various implementations impossible, and might also make it harder to pinpoint potential issues.

> I don't want to have char buffers all over the place.

In production code we do not have any actual char buffers at all; we only pass data in that form. The buffers exist where data comes from memory (fuzzer, democlient) or from string literals in tests. In the latter case we can deduce the size via a template and can mostly pass it on.

> I suggest we remove the buffer inputs from simplecpp directly

That is essentially what danmar/simplecpp#496 is doing.

> the reason string_view was invented was to improve safety and expressiveness.

You mainly gain bounds safety and simplified interfaces. But it is still a massive footgun: people think it is all safe, but if you use it wrong (lifetime dependencies, for example) it might be even harder to figure out what is wrong than before. But let's stay on topic...

> and I don't understand why we should convert to char buffers everywhere first and then change to something else later. that sounds like "intermediate mishmash" to me :-(

We no longer do that in production code after this PR. The buffer goes straight into simplecpp::TokenList.
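
The lifetime caveat raised in this thread can be sketched with a small example, assuming C++17. The function `firstLine` is hypothetical and exists only for illustration; the commented-out lines show the misuse rather than executing it.

```cpp
#include <string>
#include <string_view>

// Returns the first line of `text` as a view into the caller's storage (no copy is made).
inline std::string_view firstLine(std::string_view text) {
    // find() yields npos when there is no newline; substr(0, npos) is then the whole view.
    return text.substr(0, text.find('\n'));
}

// Fine: the owning string outlives the view.
//   std::string code = "int x;\nint y;";
//   std::string_view line = firstLine(code);
//
// Dangling: the temporary std::string is destroyed at the end of the full expression,
// so `bad` refers to freed memory even though the code compiles without complaint.
//   std::string_view bad = firstLine(std::string("int x;\nint y;"));
```

The view gives bounds information but does not own the bytes, so the same lifetime rule applies to a raw pointer plus size: whoever owns the buffer has to keep it alive for the duration of the call.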

{
-std::istringstream iss(content);
-return checkFile(file, "", 0, &iss);
+return checkBuffer(file, "", 0, data, size);
}

unsigned int CppCheck::check(const FileSettings &fs)
@@ -851,14 +850,6 @@ unsigned int CppCheck::check(const FileSettings &fs)
return returnValue;
}

-static simplecpp::TokenList createTokenList(const std::string& filename, std::vector<std::string>& files, simplecpp::OutputList* outputList, std::istream* fileStream)
-{
-if (fileStream)
-return {*fileStream, files, filename, outputList};
-
-return {filename, files, outputList};
-}
-
std::size_t CppCheck::calculateHash(const Preprocessor& preprocessor, const simplecpp::TokenList& tokens, const std::string& filePath) const
{
std::ostringstream toolinfo;
@@ -880,7 +871,23 @@ std::size_t CppCheck::calculateHash(const Preprocessor& preprocessor, const simp
return preprocessor.calculateHash(tokens, toolinfo.str());
}

-unsigned int CppCheck::checkFile(const FileWithDetails& file, const std::string &cfgname, int fileIndex, std::istream* fileStream)
+unsigned int CppCheck::checkBuffer(const FileWithDetails &file, const std::string &cfgname, int fileIndex, const uint8_t* data, std::size_t size)
+{
+const auto f = [&file, data, size](std::vector<std::string>& files, simplecpp::OutputList* outputList) {
+return simplecpp::TokenList{data, size, files, file.spath(), outputList};
+};
+return checkInternal(file, cfgname, fileIndex, f);
+}
+
+unsigned int CppCheck::checkFile(const FileWithDetails& file, const std::string &cfgname, int fileIndex)
+{
+const auto f = [&file](std::vector<std::string>& files, simplecpp::OutputList* outputList) {
+return simplecpp::TokenList{file.spath(), files, outputList};
+};
+return checkInternal(file, cfgname, fileIndex, f);
+}
+
+unsigned int CppCheck::checkInternal(const FileWithDetails& file, const std::string &cfgname, int fileIndex, const CreateTokenListFn& createTokenList)
{
// TODO: move to constructor when CppCheck no longer owns the settings
if (mSettings.checks.isEnabled(Checks::unusedFunction) && !mUnusedFunctionsCheck)
@@ -931,24 +938,13 @@ unsigned int CppCheck::checkFile(const FileWithDetails& file, const std::string
std::size_t hash = 0;
// markup files are special and do not adhere to the enforced language
TokenList tokenlist{mSettings, Standards::Language::C};
-if (fileStream) {
-std::vector<std::string> files;
-simplecpp::TokenList tokens(*fileStream, files, file.spath());
-if (analyzerInformation) {
-const Preprocessor preprocessor(mSettings, mErrorLogger, Standards::Language::C);
-hash = calculateHash(preprocessor, tokens);
-}
-tokenlist.createTokens(std::move(tokens));
-}
-else {
-std::vector<std::string> files;
-simplecpp::TokenList tokens(file.spath(), files);
-if (analyzerInformation) {
-const Preprocessor preprocessor(mSettings, mErrorLogger, file.lang());
-hash = calculateHash(preprocessor, tokens);
-}
-tokenlist.createTokens(std::move(tokens));
+std::vector<std::string> files;
+simplecpp::TokenList tokens = createTokenList(files, nullptr);
+if (analyzerInformation) {
+const Preprocessor preprocessor(mSettings, mErrorLogger, file.lang());
+hash = calculateHash(preprocessor, tokens);
}
+tokenlist.createTokens(std::move(tokens));
// this is not a real source file - we just want to tokenize it. treat it as C anyways as the language needs to be determined.
Tokenizer tokenizer(std::move(tokenlist), mErrorLogger);
mUnusedFunctionsCheck->parseTokens(tokenizer, mSettings);
@@ -967,7 +963,7 @@ unsigned int CppCheck::checkFile(const FileWithDetails& file, const std::string

simplecpp::OutputList outputList;
std::vector<std::string> files;
-simplecpp::TokenList tokens1 = createTokenList(file.spath(), files, &outputList, fileStream);
+simplecpp::TokenList tokens1 = createTokenList(files, &outputList);

// If there is a syntax error, report it and stop
const auto output_it = std::find_if(outputList.cbegin(), outputList.cend(), [](const simplecpp::Output &output){
@@ -1075,8 +1071,7 @@ unsigned int CppCheck::checkFile(const FileWithDetails& file, const std::string
code += "#line " + std::to_string(dir.linenr) + " \"" + dir.file + "\"\n" + dir.str + '\n';
}
TokenList tokenlist(mSettings, file.lang());
-std::istringstream istr2(code);
-tokenlist.createTokens(istr2); // TODO: check result?
+tokenlist.createTokensFromBuffer(code.data(), code.size()); // TODO: check result?
executeRules("define", tokenlist);
}
#endif
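
The refactoring in this file replaces the optional `std::istream*` parameter with a callback that builds the token list. A minimal sketch of that pattern follows, using hypothetical stand-in types and names (`FakeTokenList`, `CreateFn`, `checkShared`, `checkFromBuffer`, `checkFromName`) rather than the real cppcheck or simplecpp classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in for simplecpp::TokenList: it just stores the raw text.
struct FakeTokenList {
    std::string text;
};

// The shared routine only knows how to ask for a token list, not where the data comes from.
using CreateFn = std::function<FakeTokenList()>;

inline std::size_t checkShared(const CreateFn& create) {
    const FakeTokenList tokens = create();
    return tokens.text.size(); // placeholder for the real analysis
}

// Entry point for in-memory data: the lambda captures the pointer and the size.
inline std::size_t checkFromBuffer(const std::uint8_t* data, std::size_t size) {
    return checkShared([data, size]() {
        return FakeTokenList{std::string(reinterpret_cast<const char*>(data), size)};
    });
}

// Entry point for a named source: the lambda captures the name.
// A real implementation would read the file; this sketch only embeds the name.
inline std::size_t checkFromName(const std::string& name) {
    return checkShared([&name]() {
        return FakeTokenList{"// " + name};
    });
}
```

Both entry points share one code path, and the decision about the input source is made once in the caller instead of being re-tested inside the shared routine.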
35 changes: 30 additions & 5 deletions lib/cppcheck.h
@@ -45,7 +45,10 @@ class Settings;
struct Suppressions;
class Preprocessor;

-namespace simplecpp { class TokenList; }
+namespace simplecpp {
+class TokenList;
+struct Output;
+}

/// @addtogroup Core
/// @{
@@ -100,12 +103,13 @@ class CPPCHECKLIB CppCheck {
* the disk but the content is given in @p content. In errors the @p path
* is used as a filename.
* @param file The file to check.
-* @param content File content as a string.
+* @param data File content as a buffer.
+* @param size Size of buffer.
* @return amount of errors found or 0 if none were found.
* @note You must set settings before calling this function (by calling
* settings()).
*/
-unsigned int check(const FileWithDetails &file, const std::string &content);
+unsigned int checkBuffer(const FileWithDetails &file, const uint8_t* data, std::size_t size);

/**
* @brief Returns current version number as a string.
@@ -174,14 +178,35 @@ class CPPCHECKLIB CppCheck {
*/
std::size_t calculateHash(const Preprocessor &preprocessor, const simplecpp::TokenList &tokens, const std::string& filePath = {}) const;

+/**
+* @brief Check a file
+* @param file the file
+* @param cfgname cfg name
+* @return number of errors found
+*/
+unsigned int checkFile(const FileWithDetails& file, const std::string &cfgname, int fileIndex);
+
+/**
+* @brief Check a file using buffer
+* @param file the file
+* @param cfgname cfg name
+* @param data the data to be read
+* @param size the size of the data to be read
+* @return number of errors found
+*/
+unsigned int checkBuffer(const FileWithDetails& file, const std::string &cfgname, int fileIndex, const uint8_t* data, std::size_t size);
+
+// TODO: should use simplecpp::OutputList
+using CreateTokenListFn = std::function<simplecpp::TokenList (std::vector<std::string>&, std::list<simplecpp::Output>*)>;
+
/**
* @brief Check a file using stream
* @param file the file
* @param cfgname cfg name
-* @param fileStream stream the file content can be read from
+* @param createTokenList a function to create the simplecpp::TokenList with
* @return number of errors found
*/
-unsigned int checkFile(const FileWithDetails& file, const std::string &cfgname, int fileIndex, std::istream* fileStream = nullptr);
+unsigned int checkInternal(const FileWithDetails& file, const std::string &cfgname, int fileIndex, const CreateTokenListFn& createTokenList);

/**
* @brief Check normal tokens
3 changes: 1 addition & 2 deletions lib/importproject.cpp
@@ -533,8 +533,7 @@ namespace {
// TODO: improve evaluation
const Settings s;
TokenList tokenlist(s, Standards::Language::C);
-std::istringstream istr(c);
-tokenlist.createTokens(istr); // TODO: check result
+tokenlist.createTokensFromBuffer(c.data(), c.size()); // TODO: check result
// TODO: put in a helper
// generate links
{
5 changes: 2 additions & 3 deletions lib/library.cpp
@@ -35,7 +35,6 @@
#include <iostream>
#include <list>
#include <memory>
-#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_set>
@@ -178,8 +177,8 @@ static std::vector<std::string> getnames(const char *names)

static void gettokenlistfromvalid(const std::string& valid, TokenList& tokenList)
{
-std::istringstream istr(valid + ',');
-tokenList.createTokens(istr); // TODO: check result?
+const std::string str(valid + ',');
+tokenList.createTokensFromBuffer(str.data(), str.size()); // TODO: check result?
for (Token *tok = tokenList.front(); tok; tok = tok->next()) {
if (Token::Match(tok,"- %num%")) {
tok->str("-" + tok->strAt(1));
3 changes: 1 addition & 2 deletions lib/programmemory.cpp
@@ -1822,8 +1822,7 @@ static std::shared_ptr<Token> createTokenFromExpression(const std::string& retur
std::shared_ptr<TokenList> tokenList = std::make_shared<TokenList>(settings, cpp ? Standards::Language::CPP : Standards::Language::C);
{
const std::string code = "return " + returnValue + ";";
-std::istringstream istr(code);
-if (!tokenList->createTokens(istr))
+if (!tokenList->createTokensFromBuffer(code.data(), code.size()))
return nullptr;
}

8 changes: 4 additions & 4 deletions lib/symboldatabase.cpp
@@ -7756,8 +7756,8 @@ void SymbolDatabase::setValueTypeInTokenList(bool reportDebugWarnings, Token *to
if (!typestr.empty()) {
ValueType valuetype;
TokenList tokenList(mSettings, tok->isCpp() ? Standards::Language::CPP : Standards::Language::C);
-std::istringstream istr(typestr+";");
-tokenList.createTokens(istr); // TODO: check result?
+const std::string str(typestr+";");
+tokenList.createTokensFromBuffer(str.data(), str.size()); // TODO: check result?
tokenList.simplifyStdType();
if (parsedecl(tokenList.front(), &valuetype, mDefaultSignedness, mSettings)) {
valuetype.originalTypeName = typestr;
@@ -7846,8 +7846,8 @@ void SymbolDatabase::setValueTypeInTokenList(bool reportDebugWarnings, Token *to
continue;
}
TokenList tokenList(mSettings, tok->isCpp() ? Standards::Language::CPP : Standards::Language::C);
-std::istringstream istr(typestr+";");
-if (tokenList.createTokens(istr)) {
+const std::string str(typestr+";");
+if (tokenList.createTokensFromBuffer(str.data(), str.size())) {
ValueType vt;
tokenList.simplifyPlatformTypes();
tokenList.simplifyStdType();
8 changes: 4 additions & 4 deletions lib/tokenlist.cpp
@@ -321,17 +321,17 @@ void TokenList::insertTokens(Token *dest, const Token *src, nonneg int n)

//---------------------------------------------------------------------------

-bool TokenList::createTokens(std::istream &code)
+bool TokenList::createTokensFromBuffer(const uint8_t* data, size_t size)
{
-return createTokensInternal(code, mFiles.empty() ? "" : *mFiles.cbegin());
+return createTokensFromBufferInternal(data, size, mFiles.empty() ? "" : *mFiles.cbegin());
}

//---------------------------------------------------------------------------

-bool TokenList::createTokensInternal(std::istream &code, const std::string& file0)
+bool TokenList::createTokensFromBufferInternal(const uint8_t* data, size_t size, const std::string& file0)
{
simplecpp::OutputList outputList;
-simplecpp::TokenList tokens(code, mFiles, file0, &outputList);
+simplecpp::TokenList tokens(data, size, mFiles, file0, &outputList);

createTokens(std::move(tokens));

13 changes: 10 additions & 3 deletions lib/tokenlist.h
@@ -98,9 +98,16 @@ class CPPCHECKLIB TokenList {
* - multiline strings are not handled.
* - UTF in the code are not handled.
* - comments are not handled.
-* @param code input stream for code
*/
-bool createTokens(std::istream &code);
+bool createTokensFromBuffer(const uint8_t* data, size_t size);
+bool createTokensFromBuffer(const char* data, size_t size) {
+return createTokensFromBuffer(reinterpret_cast<const uint8_t*>(data), size);
+}
+template<size_t size>
+// cppcheck-suppress unusedFunction - used in tests only
+bool createTokensFromString(const char (&data)[size]) {
+return createTokensFromBuffer(reinterpret_cast<const uint8_t*>(data), size-1);
+}

void createTokens(simplecpp::TokenList&& tokenList);

@@ -208,7 +215,7 @@ class CPPCHECKLIB TokenList {
}

private:
-bool createTokensInternal(std::istream &code, const std::string& file0);
+bool createTokensFromBufferInternal(const uint8_t* data, std::size_t size, const std::string& file0);

/** Token list */
std::shared_ptr<TokensFrontBack> mTokensFrontBack;
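
The `createTokensFromString` overload in this header deduces the literal's length from its array type. A minimal sketch of that technique, with hypothetical names (`consume`, `consumeLiteral`) standing in for the real member functions:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical pointer+size consumer; it simply reports how many bytes it was given.
inline std::size_t consume(const std::uint8_t* /*data*/, std::size_t size) {
    return size;
}

// N is deduced from the array type of the argument and includes the terminating '\0',
// so N - 1 is the number of payload bytes.
template<std::size_t N>
std::size_t consumeLiteral(const char (&data)[N]) {
    return consume(reinterpret_cast<const std::uint8_t*>(data), N - 1);
}
```

Unlike a `strlen`-based wrapper, the deduced size also covers embedded NUL bytes, so `consumeLiteral("a\0b")` reports 3 bytes. The overload only binds to arrays, so a plain `const char*` has to go through the pointer+size function.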
8 changes: 3 additions & 5 deletions lib/valueflow.cpp
@@ -124,7 +124,6 @@
#include <memory>
#include <numeric>
#include <set>
-#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
@@ -1997,8 +1996,7 @@ static bool isNotEqual(std::pair<const Token*, const Token*> x, std::pair<const
static bool isNotEqual(std::pair<const Token*, const Token*> x, const std::string& y, bool cpp, const Settings& settings)
{
TokenList tokenList(settings, cpp ? Standards::Language::CPP : Standards::Language::C);
-std::istringstream istr(y);
-tokenList.createTokens(istr); // TODO: check result?
+tokenList.createTokensFromBuffer(y.data(), y.size()); // TODO: check result?
return isNotEqual(x, std::make_pair(tokenList.front(), tokenList.back()));
}
static bool isNotEqual(std::pair<const Token*, const Token*> x, const ValueType* y, bool cpp, const Settings& settings)
@@ -7117,8 +7115,8 @@ static bool getMinMaxValues(const std::string& typestr,
MathLib::bigint& maxvalue)
{
TokenList typeTokens(settings, cpp ? Standards::Language::CPP : Standards::Language::C);
-std::istringstream istr(typestr + ";");
-if (!typeTokens.createTokens(istr))
+const std::string str(typestr + ";");
+if (!typeTokens.createTokensFromBuffer(str.data(), str.size()))
return false;
typeTokens.simplifyPlatformTypes();
typeTokens.simplifyStdType();
8 changes: 4 additions & 4 deletions oss-fuzz/main.cpp
@@ -60,11 +60,11 @@ static const Settings s_settings = create_settings();
static DummyErrorLogger s_errorLogger;
static const FileWithDetails s_file("test.cpp", Standards::Language::CPP, 0);

-static void doCheck(const std::string& code)
+static void doCheck(const uint8_t *data, size_t dataSize)
{
Suppressions supprs;
CppCheck cppcheck(s_settings, supprs, s_errorLogger, false, nullptr);
-cppcheck.check(s_file, code);
+cppcheck.checkBuffer(s_file, data, dataSize);
}

#ifndef NO_FUZZ
@@ -74,7 +74,7 @@ int LLVMFuzzerTestOneInput(const uint8_t *data, size_t dataSize)
{
if (dataSize < 10000) {
const std::string code = generateCode2(data, dataSize);
-doCheck(code);
+doCheck(reinterpret_cast<const unsigned char*>(code.data()), code.size());
}
return 0;
}
@@ -98,7 +98,7 @@ int main(int argc, char * argv[])

const std::string code = oss.str();
for (int i = 0; i < cnt; ++i)
-doCheck(code);
+doCheck(reinterpret_cast<const unsigned char*>(code.data()), code.size());

return EXIT_SUCCESS;
}