-
Notifications
You must be signed in to change notification settings - Fork 9
add importer for log files #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to what said in comments, changes without tests are unverifiable :)
daru-io.gemspec
Outdated
|
||
spec.add_development_dependency 'guard-rspec' if RUBY_VERSION >= '2.2.5' | ||
|
||
spec.add_runtime_dependency 'request-log-analyzer', '~> 1.13.4' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that no other file format gem is required unconditionally: daru-io relies on user installing the proper gems. Otherwise, anybody who wants daru-io just for CSV import, will install TON of other gems for pdf, excel and whatnot. For example, look here: https://github.com/SciRuby/daru-io/blob/master/lib/daru/io/importers/excelx.rb#L12 — we have optional_gem
construct for requiring something ONLY when it is used.
lib/daru/io/importers/rails.rb
Outdated
module IO | ||
module Importers | ||
class Rails < Base | ||
Daru::DataFrame.register_io_module :read_rails, self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_rails_log
is better name, read_rails
is too vague.
lib/daru/io/importers/rails.rb
Outdated
def initialize | ||
require 'request_log_analyzer' | ||
RequestLogAnalyzer::Source::LogParser.class_eval do | ||
def initialize(format, options = {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better extract it all into importers/rails_logs/request_log_analyzer_patch.rb
, create there module RailsLogAnalyzerPatch
, and here just include that into RailsLogAnalyzer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current initialize
,parse
and read
methods are going to be the same for all the importers that will be added. Shall I instead create module RequestLogAnalyzerPatch
in importers/request_log_analyzer.rb
? Then I will pass :rails3
, :apache
and :amazon_s3
to the module in rails.rb
, apache.rb
and amazon_s3.rb
.
lib/daru/io/importers/rails.rb
Outdated
end | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, here you are copy-pasting and slightly changing code from other gem. It is totally unreliable: if the gem changes a slightest bit, you'll never follow this with your code. Why do you need this rewriting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to come up with a single method instead of changing the code.
Please review the changes so that I can start the documentation and fix rubocop tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, please don't ignore Rubocop's suggestions.
lib/daru/io/importers/base.rb
Outdated
new.read(path) | ||
end | ||
|
||
def self.parse_log(file,format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you think it is appropriate to add method parse_log
, related to a very specific task, to Base
importer?.. All code related to some importer should be totally local to this importer: either in importers/rails_log.rb
, or in importers/rails_log/request_log_analyzer_patch.rb
(the latter is preferable, as well as extracting patch to a module, instead of doing class_eval
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parse_log
is a helper method as apache and amazon_s3 importers will also use it. Making this part local will later require adding the same code for the remaining two importers. I feel the method is not that specific to be ineligible for the base class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel the method is not that specific to be ineligible for the base class.
In fact, it is. The base class provides the common base all importers need, not "some 3 of 20 importers may need that method, so we'll just drop it here". It requires specific external gem. If several different importers will need it, the module with this "patch" could be stored in, like importers/shared/request_log_analyzer_patch.rb
, and required/included where it is necessary.
lib/daru/io/importers/rails_log.rb
Outdated
def call() | ||
ind = [:method, :path, :ip, :timestamp, :line_type, :lineno, :source, | ||
:controller, :action, :format, :params, :rendered_file, | ||
:partial_duration, :status, :duration, :view, :db] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That definitely should be a constant.
lib/daru/io/importers/rails_log.rb
Outdated
row = [] | ||
ind.each do |attr| | ||
row << hash[attr] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be map
, at least.
But if I am reading it correctly, Hash#values_at
should be totally enough.
lib/daru/io/importers/rails_log.rb
Outdated
# initializes the instance variables @path and @file_data to nil | ||
def initialize | ||
@path = nil | ||
@file_data = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this vars here?
lib/daru/io/importers/rails_log.rb
Outdated
include RequestLogAnalyzerPatch | ||
Daru::DataFrame.register_io_module :read_rails_log, self | ||
|
||
# initializes the instance variables @path and @file_data to nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments should not just retell the code, code should speak for itself.
For such simple methods, comments aren't necessary at all.
lib/daru/io/importers/rails_log.rb
Outdated
# @example Reading from plaintext file | ||
# instance = Daru::IO::Importers::RailsLog.read("rails_test.log") | ||
def read(path) | ||
@path = path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
lib/daru/io/importers/rails_log.rb
Outdated
# instance = Daru::IO::Importers::RailsLog.read("rails_test.log") | ||
def read(path) | ||
@path = path | ||
@file_data = RequestLogAnalyzerPatch.parse_log(@path,:rails3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why :rails3
, couldn't it be another log format?..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the log format as another argument (with maybe :rails3
as default) would be a more generic way to do this.
@@ -0,0 +1,73 @@ | |||
module RequestLogAnalyzerPatch | |||
require 'request_log_analyzer' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file now is required always when daru-io is required. It should be made optional_gem
in importer's initialize
, like other importers do.
@@ -0,0 +1,73 @@ | |||
module RequestLogAnalyzerPatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I proposed this module name, I've meant the following: You define ONLY parse_hash
in that module, and then, in importer (not here in separate method) you do the following:
parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format))
parser.extend(RequestLogAnalyzerPatch) # only this instance has method replaced
parser.parse_hash(path)
This way it is easy to see what you "inserting" into the third-party gem, and keep it local.
# @example Reading from rails log file | ||
# format = RequestLogAnalyzer::FileFormat.load(:rails3) | ||
# instance = RequestLogAnalyzer::Source::LogParser.new(format) | ||
# list = instance.parse_hash('test_rails.log') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is internal to implementation, so does NOT need any comment about "how it can be used".
Instead, the comment that is absolutely necessary here is: explain that this method only copy-pasted (from where exactly; from which version exactly) and this and that lines changed (with "how it was, why it was changed" in comments). This way, it would be possible to follow request_log_analyzer's development, if it would change the method significantly.
|
||
private | ||
|
||
def attempt_sqlite3_connection(db) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db
is totally good name. Please add this to Naming/UncommunicativeMethodParamName:
cop's AllowedNames:
settings.
In general, if Rubocop is starting to fail on code unrelated to your PR, it is better to notify maintainer (@athityakumar) than judge yourself.
order: %i[method path ip timestamp line_type lineno source | ||
controller action format params rendered_file | ||
partial_duration status duration view db], | ||
:'timestamp.to_a' => [20180312174118], # rubocop:disable Style/NumericLiterals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't disable here, just do what Rubocop suggests.
lib/daru/io/importers/rails_log.rb
Outdated
@file_data.each do |hash| | ||
data.add_row(INDEX.map { |attr| hash[attr] }) | ||
end | ||
data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think df
is more readable than data
, and has either been consistently named so in other Importers or rather just returned a Daru::DataFrame
directly.
lib/daru/io/importers/rails_log.rb
Outdated
# # 1 GET / 127.0.0.1 2018022716 completed 12 /home/roh Rails... | ||
# # ... ... ... ... ... ... ... ... ... | ||
def call | ||
data = Daru::DataFrame.new({},index: INDEX).transpose |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, INDEX
is actually supposed to be the order of the DataFrame and has been weirdly named so just due to transpose operation? 🤔
lib/daru/io/importers/rails_log.rb
Outdated
data = Daru::DataFrame.new({},index: INDEX).transpose | ||
@file_data.each do |hash| | ||
data.add_row(INDEX.map { |attr| hash[attr] }) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't the whole transpose
+ add_row
logic be replaced by Daru::DataFrame.rows(array_of_arrays, order: ORDER)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be, but then the array_of_arrays
will need further processing as all vectors don't have the same length due to missing attributes in some of them.
@zverok @athityakumar please review and suggest changes for rubocop failure. |
How about adding importers of all three formats in a single file? It would eliminate the additional separate module for the patch. The log.rb file will look like this module Importers
class Log < Base
Daru::Dataframe.register_io_module :read_log, self
ORDER_RAILS = %i[...].freeze
ORDER_APACHE = %i[...].freeze
ORDER_S3 = %i[...].freeze
...
def read(path, format: :rails3) # default format is :rails3
case format
when :rails3 then order = ORDER_RAILS
when :apache then order = ORDER_APACHE
when :amazon_s3 then order = ORDER_S3
else raise ArgumentError, 'Invalid format provided.'
...
@file_data = parser.parse_hash(path, order)
self
end
...
private
module RequestLogAnalyzerPatch
...
def parse_hash(file, order)
...
parsed_list[i] = order.map ...
...
end
end
end
end In the above method, I will have need to modify the line containing |
That may work. Just the code can be simpler: ORDERS = {
rails: %i[...].freeze,
apache: %i[...].freeze,
s3: %i[...].freeze
}
# and then...
order = ORDERS.fetch(format) |
lib/daru/io/importers/rails_log.rb
Outdated
Daru::DataFrame.register_io_module :read_rails_log, self | ||
|
||
# Checks for required gem dependencies of RailsLog importer | ||
# and requires the patch for request-log-analyzer gem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just remove the comment. Please don't repeat what method code says clearly (even if other importers do that :)).
lib/daru/io/importers/rails_log.rb
Outdated
|
||
# Reads data from a rails log file | ||
# | ||
# @!method self.read(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document format
parameter too.
lib/daru/io/importers/rails_log.rb
Outdated
# instance = Daru::IO::Importers::RailsLog.read("rails_test.log") | ||
def read(path, format: :rails3) | ||
parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format)) | ||
parser.extend(RequestLogAnalyzerPatch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means, that RequestLogAnalyzerPatch
should look like this:
module RequestLogAnalyzerPatch
# look, no other nested modules!
def parse_hash(file)
end
end
This way, when you do this extend, parse_hash
becomes redefined.
# parse_io and parse_line of the LogParser class. Assigns all the necessary instance | ||
# variables defined in the above specified methods. Creates a request for each line of | ||
# the file stream and stores the hash of parsed information in raw_list. Each element of | ||
# parsed_list is an array of one parsed entry in log file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, now, I see a problem here (I thought that LogParser
already has method parse_hash
, you just fixing something in it):
combine the methods parse_file, parse_io and parse_line of the LogParser class
Why???? What's the point of combining three methods in one 40-lines mess?.. Why can't you use original methods as they were, just calling them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parse_file
returns nil
and so does parse_io
. That's why I have tried fixing these methods in the earlier commits instead of making a new single method. But then it was a lot of rewriting. It was a hard time finding points to tap out the parsed hash from the gem as it is only meant for CLI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. You should have spend a bit more time on understanding LogParser
logic. From reading the code, I believe what you need is (without ANY patching) just call LogParser#each_request
:
Reads the input, which can either be a file, sequence of files or STDIN to parse lines specified in the FileFormat. This lines will be combined into Request instances, that will be yielded.
...and, as it is aliased as each
, and Enumerable
is included, ALL you really need is:
RequestLogAnalyzer::Source::LogParser
.new(RequestLogAnalyzer::FileFormat.load(format))
.map { converting Request into line for dataframe }
...and that would be completely it.
@zverok please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
lib/daru/io/importers/log.rb
Outdated
# # ... ... ... ... ... ... ... ... ... | ||
def call | ||
Daru::DataFrame.rows(@file_data, order: ORDERS.fetch(@format) | ||
.map { |attr| attr == :path ? :resource_path : attr }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say RENAME_FIELDS.fetch(attr, attr)
with
RENAME_FIELDS = {
path: :resource_path
}
It would be more regular (even if for now used only for one field).
(Just turn off the |
Merged! Congrats 🎉 |
This PR integrates the gem
request-log-analyzer
to convert rails logs toDaru::DataFrame
.