@@ -225,285 +225,10 @@ object
### Football Job
- This is an (American) Football statistics loading job. We gave it the
- id of `footballJob` in our configuration file. Before diving into the
- batch job, we'll examine the two input files that need to be loaded.
- The first is `player.csv`, which can be found in the samples project
- under src/main/resources/data/footballjob/input/. Each line within this
- file represents a player, with a unique id, the player's name,
- position, etc:
-
-     AbduKa00,Abdul-Jabbar,Karim,rb,1974,1996
-     AbduRa00,Abdullah,Rabih,rb,1975,1999
-     AberWa00,Abercrombie,Walter,rb,1959,1982
-     AbraDa00,Abramowicz,Danny,wr,1945,1967
-     AdamBo00,Adams,Bob,te,1946,1969
-     AdamCh00,Adams,Charlie,wr,1979,2003
-     ...
-
- One of the first noticeable characteristics of the file is that each
- data element is separated by a comma, the familiar format known as
- 'CSV'. Other separators, such as pipes or semicolons, could just as
- easily be used to delimit the individual elements. In general, the
- file falls into one of the two common flat file formats: delimited or
- fixed length. (The fixed length case was covered in the
- `fixedLengthImportJob`.)
-
- The second file, 'games.csv', is formatted the same way as the
- previous example and resides in the same directory:
-
-     AbduKa00,1996,mia,10,nwe,0,0,0,0,0,29,104,,16,2
-     AbduKa00,1996,mia,11,clt,0,0,0,0,0,18,70,,11,2
-     AbduKa00,1996,mia,12,oti,0,0,0,0,0,18,59,,0,0
-     AbduKa00,1996,mia,13,pit,0,0,0,0,0,16,57,,0,0
-     AbduKa00,1996,mia,14,rai,0,0,0,0,0,18,39,,7,0
-     AbduKa00,1996,mia,15,nyg,0,0,0,0,0,17,96,,14,0
-     ...
-
- Each line in the file represents an individual player's performance
- in a particular game, containing such statistics as passing yards,
- receptions, rushes, and total touchdowns.
-
- Our example batch job is going to load both files into a database and
- then combine them to summarise how each player performed in a
- particular year. Although this example is fairly trivial, it shows
- multiple types of input, and the general style is a common batch
- scenario: summarising a very large dataset so that it can be more
- easily manipulated or viewed by an online web-based application. In an
- enterprise solution the third step, the reporting step, could be
- implemented through the use of Eclipse BIRT or one of the many Java
- reporting engines. Given this description, we can easily divide our
- batch job into three 'steps': one to load the player data, one to load
- the game data, and one to produce a summary report:
-
- **Note:** One of the nice features of Spring is a project called
- Spring IDE. When you download the project you can install Spring IDE
- and add the Spring configurations to the IDE project. This is not a
- tutorial on Spring IDE, but the visual view into Spring beans is
- helpful in understanding the structure of a Job configuration. Spring
- IDE produces the following diagram:
-
- ![Spring Batch Football Object Model](src/site/resources/images/spring-batch-football-graph.jpg "Spring Batch Football Object Model")
-
- This corresponds exactly with the `footballJob.xml` job configuration
- file, which can be found in the jobs folder under
- `src/main/resources`. When you drill down into the football job you
- will see that the configuration has a list of steps:
-
-     <property name="steps">
-       <list>
-         <bean id="playerload" parent="simpleStep" .../>
-         <bean id="gameLoad" parent="simpleStep" .../>
-         <bean id="playerSummarization" parent="simpleStep" .../>
-       </list>
-     </property>
-
- A step is run until there is no more input to process, which in this
- case means that each file has been completely processed. To describe
- it in a more narrative form: the first step, playerLoad, begins
- executing by grabbing one line of input from the file and parsing it
- into a domain object. That domain object is then passed to a DAO,
- which writes it out to the PLAYERS table. This action is repeated
- until there are no more lines in the file, causing the playerLoad
- step to finish. Next, the gameLoad step does the same for the games
- input file, inserting into the GAMES table. Once it has finished, the
- playerSummarization step can begin. Unlike the first two steps,
- playerSummarization's input comes from the database, using an SQL
- statement to combine the GAMES and PLAYERS tables. Each returned row
- is packaged into a domain object and written out to the
- PLAYER_SUMMARY table.
-
- Now that we've discussed the entire flow of the batch job, we can
- dive deeper into the first step, playerLoad:
-
- <bean id="playerload" parent="simpleStep">
322
- <property name="commitInterval" value="${job.commit.interval}" />
323
- <property name="startLimit" value="100" />
324
- <property name="itemReader"
325
- ref="playerFileItemReader" />
326
- <property name="itemWriter">
327
- <bean
328
- class="org.springframework.batch.sample.domain.football.internal.internal.PlayerItemWriter">
329
- <property name="playerDao">
330
- <bean
331
- class="org.springframework.batch.sample.domain.football.internal.internal.JdbcPlayerDao">
332
- <property name="dataSource"
333
- ref="dataSource" />
334
- </bean>
335
- </property>
336
- </bean>
337
- </property>
338
- </bean>
339
-
340
- The root bean in this case is a `SimpleStepFactoryBean`, which can be
- considered a 'blueprint' of sorts that tells the execution
- environment basic details about how the batch job should be
- executed. It contains four properties (others have been removed for
- clarity): commitInterval, startLimit, itemReader and itemWriter.
- After performing all necessary startup, the framework will
- periodically delegate to the reader and writer. In this way, the
- developer can remain solely concerned with their business logic.
-
- * *ItemReader* – the item reader is the source of the information
-   pipe. At the most basic level, input is read in from an input
-   source, parsed into a domain object and returned. In this way, the
-   good batch architecture practice of ensuring all data has been read
-   before beginning processing can be enforced, along with providing a
-   possible avenue for reuse.
-
- * *ItemWriter* – this is the business logic. At a high level, the
-   item writer takes the item returned from the reader and 'processes'
-   it. In our case it is a data access object that is simply
-   responsible for inserting a record into the PLAYERS table. As you
-   can see, the developer does very little.
-
- The application developer simply provides a job configuration with a
- configured number of steps, an ItemReader associated with some type
- of input source, an ItemWriter associated with some type of output
- source, and a little mapping of data from flat records to objects,
- and the pipe is wired and ready for processing.
-
- Another property in the step configuration, the commitInterval, gives
- the framework vital information about how to control transactions
- during the batch run. Due to the large amount of data involved in
- batch processing, it is often advantageous to 'batch' together
- multiple logical units of work into one transaction, since starting
- and committing a transaction is extremely expensive. For example, in
- the playerLoad step, the framework calls read() on the item reader.
- The item reader reads one record from the file and returns a domain
- object representation, which is passed to the writer. The writer then
- writes the one record to the database. It can then be said that one
- iteration = one call to `ItemReader.read()` = one line of the file.
- Therefore, setting your commitInterval to 5 would result in the
- framework committing a transaction after 5 lines have been read from
- the file, with 5 resultant entries in the PLAYERS table.
-
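To make the commit-interval behaviour concrete, the following is a conceptual sketch in plain Java of how reads, writes and commits interleave. The interfaces and names here are hypothetical stand-ins for illustration only, not the framework's internal code or the Spring Batch API.

    // Conceptual sketch: with a commit interval of 5, a transaction is
    // committed after every 5 items, or earlier when the input runs out.
    public class CommitIntervalSketch {

        interface Reader { Object read() throws Exception; }          // returns null when input is exhausted
        interface Writer { void write(Object item) throws Exception; }
        interface Tx     { void begin(); void commit(); }

        public static void runStep(Reader reader, Writer writer, Tx tx, int commitInterval) throws Exception {
            boolean exhausted = false;
            while (!exhausted) {
                tx.begin();
                for (int i = 0; i < commitInterval; i++) {
                    Object item = reader.read();   // one iteration = one call to read() = one line of the file
                    if (item == null) {            // no more input: finish this chunk early
                        exhausted = true;
                        break;
                    }
                    writer.write(item);            // e.g. one row inserted into PLAYERS
                }
                tx.commit();                       // at most commitInterval items per transaction
            }
        }
    }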
- Following the general flow of the batch job, the next thing to
- describe is how each line of the file is parsed from its string
- representation into a domain object. The first thing needed is an
- `ItemReader`, which is provided as part of the Spring Batch
- infrastructure. Because the input is flat-file based, a
- `FlatFileItemReader` is used:
-
- <bean id="playerFileItemReader"
393
- class="org.springframework.batch.item.file.FlatFileItemReader">
394
- <property name="resource"
395
- value="classpath:data/footballjob/input/${player.file.name}" />
396
- <property name="lineTokenizer">
397
- <bean
398
- class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
399
- <property name="names"
400
- value="ID,lastName,firstName,position,birthYear,debutYear" />
401
- </bean>
402
- </property>
403
- <property name="fieldSetMapper">
404
- <bean
405
- class="org.springframework.batch.sample.domain.football.internal.internal.PlayerFieldSetMapper" />
406
- </property>
407
- </bean>
408
-
409
- There are three required dependencies of the item reader. The first
- is a resource to read in, which is the file to process. The second is
- a `LineTokenizer`. The interface for a `LineTokenizer` is very
- simple: given a string, it will return a `FieldSet` that wraps the
- results of splitting the provided string. A `FieldSet` is Spring
- Batch's abstraction for flat file data. It allows developers to work
- with file input in much the same way as they would work with database
- input. All the developer needs to provide is a `FieldSetMapper`
- (similar to a Spring `RowMapper`) that will map the provided
- `FieldSet` into an `Object`. Simply by providing the names of each
- token to the `LineTokenizer`, the `ItemReader` can pass the
- `FieldSet` into our `PlayerMapper`, which implements the
- `FieldSetMapper` interface. There is a single method, `mapLine()`,
- which maps `FieldSet`s in the same way that developers are
- comfortable mapping `ResultSet`s into Java `Object`s, either by index
- or by field name. This behaviour is, by intention and design, similar
- to the `RowMapper` passed into a `JdbcTemplate`. You can see this
- below:
-
-     public class PlayerMapper implements FieldSetMapper {
-
-         public Object mapLine(FieldSet fs) {
-
-             if (fs == null) {
-                 return null;
-             }
-
-             Player player = new Player();
-             player.setID(fs.readString("ID"));
-             player.setLastName(fs.readString("lastName"));
-             player.setFirstName(fs.readString("firstName"));
-             player.setPosition(fs.readString("position"));
-             player.setDebutYear(fs.readInt("debutYear"));
-             player.setBirthYear(fs.readInt("birthYear"));
-
-             return player;
-         }
-     }
-
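For reference, here is a hypothetical sketch of the Player domain object implied by the setters used in the mapper above; the actual class in the sample may carry additional fields and behaviour.

    // Hypothetical Player value object matching the columns of player.csv.
    public class Player {

        private String id, lastName, firstName, position;
        private int birthYear, debutYear;

        public void setID(String id)               { this.id = id; }
        public String getId()                      { return id; }
        public void setLastName(String lastName)   { this.lastName = lastName; }
        public String getLastName()                { return lastName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getFirstName()               { return firstName; }
        public void setPosition(String position)   { this.position = position; }
        public String getPosition()                { return position; }
        public void setBirthYear(int birthYear)    { this.birthYear = birthYear; }
        public int getBirthYear()                  { return birthYear; }
        public void setDebutYear(int debutYear)    { this.debutYear = debutYear; }
        public int getDebutYear()                  { return debutYear; }
    }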
- The flow of the `ItemReader`, in this case, starts with a call to
- read the next line from the file. This is passed into the provided
- `LineTokenizer`. The `LineTokenizer` splits the line at every comma,
- and creates a `FieldSet` using the created `String` array and the
- array of names passed in.
-
- **Note:** it is only necessary to provide the names to create the
- `FieldSet` if you wish to access the field by name, rather than by
- index.
-
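To make the tokenizing step concrete, here is a small, hedged sketch of using a `DelimitedLineTokenizer` and reading the resulting `FieldSet` by name. The class and method names follow the configuration and mapper shown above, but exact package locations and signatures vary between Spring Batch versions, so treat this as illustrative rather than definitive.

    // Assumed imports; the FieldSet package in particular differs across versions.
    import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
    import org.springframework.batch.item.file.transform.FieldSet;

    public class TokenizerSketch {

        public static void main(String[] args) {
            DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
            // Same field names as the playerFileItemReader configuration above
            tokenizer.setNames(new String[] {
                "ID", "lastName", "firstName", "position", "birthYear", "debutYear" });

            FieldSet fs = tokenizer.tokenize("AbduKa00,Abdul-Jabbar,Karim,rb,1974,1996");

            System.out.println(fs.readString("lastName")); // Abdul-Jabbar
            System.out.println(fs.readInt("birthYear"));   // 1974
        }
    }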
- Once the domain representation of the data has been returned by the
- provider (i.e. a `Player` object in this case), it is passed to the
- `ItemWriter`, which is essentially a DAO that uses a Spring
- `JdbcTemplate` to insert a new row in the PLAYERS table.
-
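A minimal sketch of such a DAO follows, in the spirit of the sample's `JdbcPlayerDao`. The column names in the INSERT statement are assumptions for illustration only; the actual sample schema may name them differently. It relies on the `Player` sketch shown earlier.

    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;

    // Illustrative DAO: inserts one Player row using a JdbcTemplate.
    public class PlayerDaoSketch {

        private final JdbcTemplate jdbcTemplate;

        public PlayerDaoSketch(DataSource dataSource) {
            this.jdbcTemplate = new JdbcTemplate(dataSource);
        }

        public void savePlayer(Player player) {
            jdbcTemplate.update(
                "INSERT INTO PLAYERS (player_id, last_name, first_name, pos, birth_year, debut_year) "
                    + "VALUES (?, ?, ?, ?, ?, ?)",
                player.getId(), player.getLastName(), player.getFirstName(),
                player.getPosition(), player.getBirthYear(), player.getDebutYear());
        }
    }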
- The next step, gameLoad, works almost exactly the same as the
- playerLoad step, except the games file is used.
-
- The final step, playerSummarization, is much like the previous two
- steps, in that it reads from a reader and returns a domain object to
- a writer. However, in this case, the input source is the database,
- not a file:
-
- <bean id="playerSummarizationSource" class="org.springframework.batch.item.database.JdbcCursorItemReader">
473
- <property name="dataSource" ref="dataSource" />
474
- <property name="mapper">
475
- <bean
476
- class="org.springframework.batch.sample.domain.football.internal.internal.PlayerSummaryMapper" />
477
- </property>
478
- <property name="sql">
479
- <value>
480
- SELECT games.player_id, games.year_no, SUM(COMPLETES),
481
- SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD),
482
- SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS),
483
- SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD)
484
- from games, players where players.player_id =
485
- games.player_id group by games.player_id, games.year_no
486
- </value>
487
- </property>
488
- </bean>
+ This is an (American) Football statistics loading job. It loads two files containing player and game
+ data into a database, and then combines them to summarise how each player performed in a particular year.
- The `JdbcCursorItemReader` has three dependencies (a programmatic
- wiring sketch follows the list):
-
- * A `DataSource`
- * The `RowMapper` to use for each row.
- * The SQL statement used to create the cursor.
-
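The sketch below wires the same reader programmatically, mirroring the XML above. The setter names are assumed to match the XML property names for the version used by this sample (for example `setMapper` for the `mapper` property) and may differ in other Spring Batch versions.

    import javax.sql.DataSource;
    import org.springframework.batch.item.database.JdbcCursorItemReader;
    import org.springframework.jdbc.core.RowMapper;

    // Hypothetical programmatic equivalent of the playerSummarizationSource bean.
    public class SummarizationReaderSketch {

        public static JdbcCursorItemReader createReader(DataSource dataSource, RowMapper rowMapper) {
            JdbcCursorItemReader reader = new JdbcCursorItemReader();
            reader.setDataSource(dataSource);
            reader.setSql(
                "SELECT games.player_id, games.year_no, SUM(COMPLETES), SUM(ATTEMPTS), "
                    + "SUM(PASSING_YARDS), SUM(PASSING_TD), SUM(INTERCEPTIONS), SUM(RUSHES), "
                    + "SUM(RUSH_YARDS), SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD) "
                    + "FROM games, players WHERE players.player_id = games.player_id "
                    + "GROUP BY games.player_id, games.year_no");
            reader.setMapper(rowMapper); // e.g. the sample's PlayerSummaryMapper; a mapper sketch appears further below
            return reader;
        }
    }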
- When the step is first started, a query is run against the database
- to open a cursor, and each call to `itemReader.read()` moves the
- cursor to the next row, using the provided `RowMapper` to return the
- correct object. As with the previous two steps, each record returned
- by the provider is written out to the database, in the PLAYER_SUMMARY
- table. Finally, to run this sample application you can execute the
- JUnit test `FootballJobFunctionalTests`, and you'll see output
- showing each of the records as they are processed. Please keep in
- mind that AOP is used to wrap the `ItemWriter` and log each record as
- it is processed, which may impact performance.
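
For completeness, here is a hedged sketch of a `RowMapper` in the spirit of the sample's `PlayerSummaryMapper`, written against the pre-generics `RowMapper` signature. The holder class and field names are assumptions made for illustration; the real sample maps into its own PlayerSummary domain object.

    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.springframework.jdbc.core.RowMapper;

    // Illustrative mapper: turns one row of the summarization query into a value object.
    public class PlayerSummaryRowMapperSketch implements RowMapper {

        // Minimal holder for the summary row; the real sample uses a PlayerSummary class.
        public static class SummaryRow {
            public String playerId;
            public int year;
            public int completes, attempts, passingYards, passingTd, interceptions;
            public int rushes, rushYards, receptions, receptionYards, totalTd;
        }

        public Object mapRow(ResultSet rs, int rowNum) throws SQLException {
            SummaryRow row = new SummaryRow();
            // Columns are read by position, in the order of the SELECT list above
            row.playerId       = rs.getString(1);  // games.player_id
            row.year           = rs.getInt(2);     // games.year_no
            row.completes      = rs.getInt(3);
            row.attempts       = rs.getInt(4);
            row.passingYards   = rs.getInt(5);
            row.passingTd      = rs.getInt(6);
            row.interceptions  = rs.getInt(7);
            row.rushes         = rs.getInt(8);
            row.rushYards      = rs.getInt(9);
            row.receptions     = rs.getInt(10);
            row.receptionYards = rs.getInt(11);
            row.totalTd        = rs.getInt(12);
            return row;
        }
    }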
+ [Football Job](./src/main/java/org/springframework/batch/sample/football/README.md)
### Header Footer Sample