Skip to content

Conversation

rohitner
Copy link
Contributor

@rohitner rohitner commented May 10, 2018

This PR integrates the gem request-log-analyzer to convert rails logs to Daru::DataFrame.

Copy link
Collaborator

@zverok zverok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to what said in comments, changes without tests are unverifiable :)

daru-io.gemspec Outdated

spec.add_development_dependency 'guard-rspec' if RUBY_VERSION >= '2.2.5'

spec.add_runtime_dependency 'request-log-analyzer', '~> 1.13.4'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that no other file format gem is required unconditionally: daru-io relies on user installing the proper gems. Otherwise, anybody who wants daru-io just for CSV import, will install TON of other gems for pdf, excel and whatnot. For example, look here: https://github.com/SciRuby/daru-io/blob/master/lib/daru/io/importers/excelx.rb#L12 — we have optional_gem construct for requiring something ONLY when it is used.

module IO
module Importers
class Rails < Base
Daru::DataFrame.register_io_module :read_rails, self
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

read_rails_log is better name, read_rails is too vague.

def initialize
require 'request_log_analyzer'
RequestLogAnalyzer::Source::LogParser.class_eval do
def initialize(format, options = {})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better extract it all into importers/rails_logs/request_log_analyzer_patch.rb, create there module RailsLogAnalyzerPatch, and here just include that into RailsLogAnalyzer.

Copy link
Contributor Author

@rohitner rohitner May 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current initialize,parse and read methods are going to be the same for all the importers that will be added. Shall I instead create module RequestLogAnalyzerPatch in importers/request_log_analyzer.rb? Then I will pass :rails3, :apache and :amazon_s3 to the module in rails.rb, apache.rb and amazon_s3.rb.

end
end
end
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, here you are copy-pasting and slightly changing code from other gem. It is totally unreliable: if the gem changes a slightest bit, you'll never follow this with your code. Why do you need this rewriting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to come up with a single method instead of changing the code.

@rohitner
Copy link
Contributor Author

rohitner commented May 12, 2018

Please review the changes so that I can start the documentation and fix rubocop tests.

Copy link
Collaborator

@zverok zverok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please don't ignore Rubocop's suggestions.

new.read(path)
end

def self.parse_log(file,format)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you think it is appropriate to add method parse_log, related to a very specific task, to Base importer?.. All code related to some importer should be totally local to this importer: either in importers/rails_log.rb, or in importers/rails_log/request_log_analyzer_patch.rb (the latter is preferable, as well as extracting patch to a module, instead of doing class_eval)

Copy link
Contributor Author

@rohitner rohitner May 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_log is a helper method as apache and amazon_s3 importers will also use it. Making this part local will later require adding the same code for the remaining two importers. I feel the method is not that specific to be ineligible for the base class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the method is not that specific to be ineligible for the base class.

In fact, it is. The base class provides the common base all importers need, not "some 3 of 20 importers may need that method, so we'll just drop it here". It requires specific external gem. If several different importers will need it, the module with this "patch" could be stored in, like importers/shared/request_log_analyzer_patch.rb, and required/included where it is necessary.

def call()
ind = [:method, :path, :ip, :timestamp, :line_type, :lineno, :source,
:controller, :action, :format, :params, :rendered_file,
:partial_duration, :status, :duration, :view, :db]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That definitely should be a constant.

row = []
ind.each do |attr|
row << hash[attr]
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be map, at least.
But if I am reading it correctly, Hash#values_at should be totally enough.

# initializes the instance variables @path and @file_data to nil
def initialize
@path = nil
@file_data = nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this vars here?

include RequestLogAnalyzerPatch
Daru::DataFrame.register_io_module :read_rails_log, self

# initializes the instance variables @path and @file_data to nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should not just retell the code, code should speak for itself.
For such simple methods, comments aren't necessary at all.

# @example Reading from plaintext file
# instance = Daru::IO::Importers::RailsLog.read("rails_test.log")
def read(path)
@path = path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

# instance = Daru::IO::Importers::RailsLog.read("rails_test.log")
def read(path)
@path = path
@file_data = RequestLogAnalyzerPatch.parse_log(@path,:rails3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why :rails3, couldn't it be another log format?..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the log format as another argument (with maybe :rails3 as default) would be a more generic way to do this.

@@ -0,0 +1,73 @@
module RequestLogAnalyzerPatch
require 'request_log_analyzer'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file now is required always when daru-io is required. It should be made optional_gem in importer's initialize, like other importers do.

@@ -0,0 +1,73 @@
module RequestLogAnalyzerPatch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I proposed this module name, I've meant the following: You define ONLY parse_hash in that module, and then, in importer (not here in separate method) you do the following:

parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format))
parser.extend(RequestLogAnalyzerPatch) # only this instance has method replaced
parser.parse_hash(path)

This way it is easy to see what you "inserting" into the third-party gem, and keep it local.

# @example Reading from rails log file
# format = RequestLogAnalyzer::FileFormat.load(:rails3)
# instance = RequestLogAnalyzer::Source::LogParser.new(format)
# list = instance.parse_hash('test_rails.log')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is internal to implementation, so does NOT need any comment about "how it can be used".
Instead, the comment that is absolutely necessary here is: explain that this method only copy-pasted (from where exactly; from which version exactly) and this and that lines changed (with "how it was, why it was changed" in comments). This way, it would be possible to follow request_log_analyzer's development, if it would change the method significantly.


private

def attempt_sqlite3_connection(db)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db is totally good name. Please add this to Naming/UncommunicativeMethodParamName: cop's AllowedNames: settings.

In general, if Rubocop is starting to fail on code unrelated to your PR, it is better to notify maintainer (@athityakumar) than judge yourself.

order: %i[method path ip timestamp line_type lineno source
controller action format params rendered_file
partial_duration status duration view db],
:'timestamp.to_a' => [20180312174118], # rubocop:disable Style/NumericLiterals
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't disable here, just do what Rubocop suggests.

@file_data.each do |hash|
data.add_row(INDEX.map { |attr| hash[attr] })
end
data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think df is more readable than data, and has either been consistently named so in other Importers or rather just returned a Daru::DataFrame directly.

# # 1 GET / 127.0.0.1 2018022716 completed 12 /home/roh Rails...
# # ... ... ... ... ... ... ... ... ...
def call
data = Daru::DataFrame.new({},index: INDEX).transpose
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, INDEX is actually supposed to be the order of the DataFrame and has been weirdly named so just due to transpose operation? 🤔

data = Daru::DataFrame.new({},index: INDEX).transpose
@file_data.each do |hash|
data.add_row(INDEX.map { |attr| hash[attr] })
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't the whole transpose + add_row logic be replaced by Daru::DataFrame.rows(array_of_arrays, order: ORDER)?

Copy link
Contributor Author

@rohitner rohitner May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be, but then the array_of_arrays will need further processing as all vectors don't have the same length due to missing attributes in some of them.

@rohitner
Copy link
Contributor Author

@zverok @athityakumar please review and suggest changes for rubocop failure.

@rohitner
Copy link
Contributor Author

rohitner commented May 17, 2018

How about adding importers of all three formats in a single file? It would eliminate the additional separate module for the patch. The log.rb file will look like this

module Importers
  class Log < Base
    Daru::Dataframe.register_io_module :read_log, self
    ORDER_RAILS  = %i[...].freeze
    ORDER_APACHE = %i[...].freeze
    ORDER_S3     = %i[...].freeze
    ...
    def read(path, format: :rails3) # default format is :rails3
      case format
      when :rails3    then order = ORDER_RAILS
      when :apache    then order = ORDER_APACHE
      when :amazon_s3 then order = ORDER_S3
      else raise ArgumentError, 'Invalid format provided.'
      ...
      @file_data = parser.parse_hash(path, order)
      self
    end
    ...
    private
    module RequestLogAnalyzerPatch
    ...
      def parse_hash(file, order)
      ...
          parsed_list[i] = order.map ...
      ...
      end
    end
  end
end

In the above method, I will have need to modify the line containing register_io_module to take one more argument specifying the format. WDYT?

@zverok
Copy link
Collaborator

zverok commented May 17, 2018

How about adding importers of all three formats in a single file? It would eliminate the additional separate module for the patch. The log.rb file will look like this

That may work. Just the code can be simpler:

ORDERS = {
  rails: %i[...].freeze,
  apache: %i[...].freeze,
  s3: %i[...].freeze
}
# and then...
order = ORDERS.fetch(format)

Daru::DataFrame.register_io_module :read_rails_log, self

# Checks for required gem dependencies of RailsLog importer
# and requires the patch for request-log-analyzer gem
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remove the comment. Please don't repeat what method code says clearly (even if other importers do that :)).


# Reads data from a rails log file
#
# @!method self.read(path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document format parameter too.

# instance = Daru::IO::Importers::RailsLog.read("rails_test.log")
def read(path, format: :rails3)
parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format))
parser.extend(RequestLogAnalyzerPatch)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means, that RequestLogAnalyzerPatch should look like this:

module RequestLogAnalyzerPatch
  # look, no other nested modules!
  def parse_hash(file)
  end
end

This way, when you do this extend, parse_hash becomes redefined.

# parse_io and parse_line of the LogParser class. Assigns all the necessary instance
# variables defined in the above specified methods. Creates a request for each line of
# the file stream and stores the hash of parsed information in raw_list. Each element of
# parsed_list is an array of one parsed entry in log file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, now, I see a problem here (I thought that LogParser already has method parse_hash, you just fixing something in it):

combine the methods parse_file, parse_io and parse_line of the LogParser class

Why???? What's the point of combining three methods in one 40-lines mess?.. Why can't you use original methods as they were, just calling them?

Copy link
Contributor Author

@rohitner rohitner May 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_file returns nil and so does parse_io. That's why I have tried fixing these methods in the earlier commits instead of making a new single method. But then it was a lot of rewriting. It was a hard time finding points to tap out the parsed hash from the gem as it is only meant for CLI.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. You should have spend a bit more time on understanding LogParser logic. From reading the code, I believe what you need is (without ANY patching) just call LogParser#each_request:

Reads the input, which can either be a file, sequence of files or STDIN to parse lines specified in the FileFormat. This lines will be combined into Request instances, that will be yielded.

...and, as it is aliased as each, and Enumerable is included, ALL you really need is:

RequestLogAnalyzer::Source::LogParser
  .new(RequestLogAnalyzer::FileFormat.load(format))
  .map { converting Request into line for dataframe }

...and that would be completely it.

@rohitner rohitner changed the title add importer for rails log add importer for log files May 19, 2018
@rohitner
Copy link
Contributor Author

@zverok please review

Copy link
Collaborator

@zverok zverok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

# # ... ... ... ... ... ... ... ... ...
def call
Daru::DataFrame.rows(@file_data, order: ORDERS.fetch(@format)
.map { |attr| attr == :path ? :resource_path : attr })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say RENAME_FIELDS.fetch(attr, attr) with

RENAME_FIELDS = {
  path: :resource_path
}

It would be more regular (even if for now used only for one field).

@zverok
Copy link
Collaborator

zverok commented May 28, 2018

(Just turn off the Lint/SplatKeywordArguments, it is honestly meaningless and already removed in the master).

@zverok zverok merged commit dce4be1 into SciRuby:master May 30, 2018
@zverok
Copy link
Collaborator

zverok commented May 30, 2018

Merged! Congrats 🎉

@rohitner rohitner mentioned this pull request Jun 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants