Commit e693879

Add java configuration for the Football Job sample
Issue #3663
1 parent 4f6f31d commit e693879

30 files changed: 569 additions and 359 deletions

spring-batch-samples/README.md

Lines changed: 3 additions & 278 deletions
@@ -225,285 +225,10 @@ object
225225

226226
### Football Job
227227

228-
This is a (American) Football statistics loading job. We gave it the
229-
id of `footballJob` in our configuration file. Before diving
230-
into the batch job, we'll examine the two input files that need to
231-
be loaded. First is `player.csv`, which can be found in the
232-
samples project under
233-
src/main/resources/data/footballjob/input/. Each line within this
234-
file represents a player, with a unique id, the player’s name,
235-
position, etc:
236-
237-
AbduKa00,Abdul-Jabbar,Karim,rb,1974,1996
238-
AbduRa00,Abdullah,Rabih,rb,1975,1999
239-
AberWa00,Abercrombie,Walter,rb,1959,1982
240-
AbraDa00,Abramowicz,Danny,wr,1945,1967
241-
AdamBo00,Adams,Bob,te,1946,1969
242-
AdamCh00,Adams,Charlie,wr,1979,2003
243-
...
244-
245-
One of the first noticeable characteristics of the file is that each
246-
data element is separated by a comma, a format most are familiar
247-
with known as 'CSV'. Other separators such as pipes or semicolons
248-
could just as easily be used to delineate between unique
249-
elements. In general, it falls into one of two types of flat file
250-
formats: delimited or fixed length. (The fixed length case was
251-
covered in the `fixedLengthImportJob`.
252-
253-
The second file, 'games.csv' is formatted the same as the previous
254-
example, and resides in the same directory:
255-
256-
AbduKa00,1996,mia,10,nwe,0,0,0,0,0,29,104,,16,2
257-
AbduKa00,1996,mia,11,clt,0,0,0,0,0,18,70,,11,2
258-
AbduKa00,1996,mia,12,oti,0,0,0,0,0,18,59,,0,0
259-
AbduKa00,1996,mia,13,pit,0,0,0,0,0,16,57,,0,0
260-
AbduKa00,1996,mia,14,rai,0,0,0,0,0,18,39,,7,0
261-
AbduKa00,1996,mia,15,nyg,0,0,0,0,0,17,96,,14,0
262-
...
263-
264-
Each line in the file represents an individual player's performance
265-
in a particular game, containing such statistics as passing yards,
266-
receptions, rushes, and total touchdowns.
267-
268-
Our example batch job is going to load both files into a database,
269-
and then combine each to summarise how each player performed for a
270-
particular year. Although this example is fairly trivial, it shows
271-
multiple types of input, and the general style is a common batch
272-
scenario. That is, summarising a very large dataset so that it can
273-
be more easily manipulated or viewed by an online web-based
274-
application. In an enterprise solution the third step, the reporting
275-
step, could be implemented through the use of Eclipse BIRT or one of
276-
the many Java Reporting Engines. Given this description, we can then
277-
easily divide our batch job up into 3 'steps': one to load the
278-
player data, one to load the game data, and one to produce a summary
279-
report:
280-
281-
**Note:** One of the nice features of Spring is a project called
282-
Spring IDE. When you download the project you can install Spring
283-
IDE and add the Spring configurations to the IDE project. This is
284-
not a tutorial on Spring IDE but the visual view into Spring beans
285-
is helpful in understanding the structure of a Job
286-
Configuration. Spring IDE produces the following diagram:
287-
288-
![Spring Batch Football Object Model](src/site/resources/images/spring-batch-football-graph.jpg "Spring Batch Football Object Model")
289-
290-
This corresponds exactly with the `footballJob.xml` job
291-
configuration file which can be found in the jobs folder under
292-
`src/main/resources`. When you drill down into the football job
293-
you will see that the configuration has a list of steps:
294-
295-
<property name="steps">
296-
<list>
297-
<bean id="playerload" parent="simpleStep" .../>
298-
<bean id="gameLoad" parent="simpleStep" .../>
299-
<bean id="playerSummarization" parent="simpleStep" .../>
300-
</list>
301-
</property>
302-
303-
A step is run until there is no more input to process, which in
304-
this case would mean that each file has been completely
305-
processed. To describe it in a more narrative form: the first step,
306-
playerLoad, begins executing by grabbing one line of input from the
307-
file, and parsing it into a domain object. That domain object is
308-
then passed to a dao, which writes it out to the PLAYERS table. This
309-
action is repeated until there are no more lines in the file,
310-
causing the playerLoad step to finish. Next, the gameLoad step does
311-
the same for the games input file, inserting into the GAMES
312-
table. Once finished, the playerSummarization step can begin. Unlike
313-
the first two steps, playerSummarization input comes from the
314-
database, using a Sql statement to combine the GAMES and PLAYERS
315-
table. Each returned row is packaged into a domain object and
316-
written out to the PLAYER_SUMMARY table.
317-
318-
Now that we've discussed the entire flow of the batch job, we can
319-
dive deeper into the first step: playerLoad:
320-
321-
<bean id="playerload" parent="simpleStep">
322-
<property name="commitInterval" value="${job.commit.interval}" />
323-
<property name="startLimit" value="100" />
324-
<property name="itemReader"
325-
ref="playerFileItemReader" />
326-
<property name="itemWriter">
327-
<bean
328-
class="org.springframework.batch.sample.domain.football.internal.internal.PlayerItemWriter">
329-
<property name="playerDao">
330-
<bean
331-
class="org.springframework.batch.sample.domain.football.internal.internal.JdbcPlayerDao">
332-
<property name="dataSource"
333-
ref="dataSource" />
334-
</bean>
335-
</property>
336-
</bean>
337-
</property>
338-
</bean>
339-
340-
The root bean in this case is a `SimpleStepFactoryBean`, which
341-
can be considered a 'blueprint' of sorts that tells the execution
342-
environment basic details about how the batch job should be
343-
executed. It contains four properties: (others have been removed for
344-
greater clarity) commitInterval, startLimit, itemReader and
345-
itemWriter . After performing all necessary startup, the framework
346-
will periodically delegate to the reader and writer. In this way,
347-
the developer can remain solely concerned with their business
348-
logic.
349-
350-
* *ItemReader* – the item reader is the source of the information
351-
pipe. At the most basic level input is read in from an input
352-
source, parsed into a domain object and returned. In this way, the
353-
good batch architecture practice of ensuring all data has been
354-
read before beginning processing can be enforced, along with
355-
providing a possible avenue for reuse.
356-
357-
* *ItemWriter* – this is the business logic. At a high level,
358-
the item writer takes the item returned from the reader
359-
and 'processes' it. In our case it's a data access object that is
360-
simply responsible for inserting a record into the PLAYERS
361-
table. As you can see the developer does very little.
362-
363-
The application developer simply provides a job configuration with a
364-
configured number of steps, an ItemReader associated to some type
365-
of input source, and ItemWriter associated to some type of
366-
output source and a little mapping of data from flat records to
367-
objects and the pipe is ready wired for processing.
368-
369-
Another property in the step configuration, the commitInterval,
370-
gives the framework vital information about how to control
371-
transactions during the batch run. Due to the large amount of data
372-
involved in batch processing, it is often advantageous to 'batch'
373-
together multiple logical units of work into one transaction, since
374-
starting and committing a transaction is extremely expensive. For
375-
example, in the playerLoad step, the framework calls read() on the
376-
item reader. The item reader reads one record from the file, and
377-
returns a domain object representation which is passed to the
378-
processor. The writer then writes the one record to the database. It
379-
can then be said that one iteration = one call to
380-
`ItemReader.read()` = one line of the file. Therefore, setting
381-
your commitInterval to 5 would result in the framework committing a
382-
transaction after 5 lines have been read from the file, with 5
383-
resultant entries in the PLAYERS table.
384-
385-
Following the general flow of the batch job, the next step is to
386-
describe how each line of the file will be parsed from its string
387-
representation into a domain object. The first thing the provider
388-
will need is an `ItemReader`, which is provided as part of the Spring
389-
Batch infrastructure. Because the input is flat-file based, a
390-
`FlatFileItemReader` is used:
391-
392-
<bean id="playerFileItemReader"
393-
class="org.springframework.batch.item.file.FlatFileItemReader">
394-
<property name="resource"
395-
value="classpath:data/footballjob/input/${player.file.name}" />
396-
<property name="lineTokenizer">
397-
<bean
398-
class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
399-
<property name="names"
400-
value="ID,lastName,firstName,position,birthYear,debutYear" />
401-
</bean>
402-
</property>
403-
<property name="fieldSetMapper">
404-
<bean
405-
class="org.springframework.batch.sample.domain.football.internal.internal.PlayerFieldSetMapper" />
406-
</property>
407-
</bean>
408-
409-
There are three required dependencies of the item reader; the first
410-
is a resource to read in, which is the file to process. The second
411-
dependency is a `LineTokenizer`. The interface for a
412-
`LineTokenizer` is very simple, given a string; it will return a
413-
`FieldSet` that wraps the results from splitting the provided
414-
string. A `FieldSet` is Spring Batch's abstraction for flat file
415-
data. It allows developers to work with file input in much the same
416-
way as they would work with database input. All the developers need
417-
to provide is a `FieldSetMapper` (similar to a Spring
418-
`RowMapper`) that will map the provided `FieldSet` into an
419-
`Object`. Simply by providing the names of each token to the
420-
`LineTokenizer`, the `ItemReader` can pass the
421-
`FieldSet` into our `PlayerMapper`, which implements the
422-
`FieldSetMapper` interface. There is a single method,
423-
`mapLine()`, which maps `FieldSet`s the same way that
424-
developers are comfortable mapping `ResultSet`s into Java
425-
`Object`s, either by index or field name. This behaviour is by
426-
intention and design similar to the `RowMapper` passed into a
427-
`JdbcTemplate`. You can see this below:
428-
429-
public class PlayerMapper implements FieldSetMapper {
430-
431-
public Object mapLine(FieldSet fs) {
432-
433-
if(fs == null){
434-
return null;
435-
}
436-
437-
Player player = new Player();
438-
player.setID(fs.readString("ID"));
439-
player.setLastName(fs.readString("lastName"));
440-
player.setFirstName(fs.readString("firstName"));
441-
player.setPosition(fs.readString("position"));
442-
player.setDebutYear(fs.readInt("debutYear"));
443-
player.setBirthYear(fs.readInt("birthYear"));
444-
445-
return player;
446-
}
447-
}
448-
449-
The flow of the `ItemReader`, in this case, starts with a call
450-
to read the next line from the file. This is passed into the
451-
provided `LineTokenizer`. The `LineTokenizer` splits the
452-
line at every comma, and creates a `FieldSet` using the created
453-
`String` array and the array of names passed in.
454-
455-
**Note:** it is only necessary to provide the names to create the
456-
`FieldSet` if you wish to access the field by name, rather
457-
than by index.
458-
459-
Once the domain representation of the data has been returned by the
460-
provider, (i.e. a `Player` object in this case) it is passed to
461-
the `ItemWriter`, which is essentially a Dao that uses a Spring
462-
`JdbcTemplate` to insert a new row in the PLAYERS table.
463-
464-
The next step, gameLoad, works almost exactly the same as the
465-
playerLoad step, except the games file is used.
466-
467-
The final step, playerSummarization, is much like the previous two
468-
steps, in that it reads from a reader and returns a domain object to
469-
a writer. However, in this case, the input source is the database,
470-
not a file:
471-
472-
<bean id="playerSummarizationSource" class="org.springframework.batch.item.database.JdbcCursorItemReader">
473-
<property name="dataSource" ref="dataSource" />
474-
<property name="mapper">
475-
<bean
476-
class="org.springframework.batch.sample.domain.football.internal.internal.PlayerSummaryMapper" />
477-
</property>
478-
<property name="sql">
479-
<value>
480-
SELECT games.player_id, games.year_no, SUM(COMPLETES),
481-
SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD),
482-
SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS),
483-
SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD)
484-
from games, players where players.player_id =
485-
games.player_id group by games.player_id, games.year_no
486-
</value>
487-
</property>
488-
</bean>
228+
This is a (American) Football statistics loading job. It loads two files containing players and games
229+
data into a database, and then combines them to summarise how each player performed for a particular year.
489230

490-
The `JdbcCursorItemReader` has three dependences:
491-
492-
* A `DataSource`
493-
* The `RowMapper` to use for each row.
494-
* The Sql statement used to create the cursor.
495-
496-
When the step is first started, a query will be run against the
497-
database to open a cursor, and each call to `itemReader.read()`
498-
will move the cursor to the next row, using the provided
499-
`RowMapper` to return the correct object. As with the previous
500-
two steps, each record returned by the provider will be written out
501-
to the database in the PLAYER_SUMMARY table. Finally to run this
502-
sample application you can execute the JUnit test
503-
`FootballJobFunctionalTests`, and you'll see an output showing
504-
each of the records as they are processed. Please keep in mind that
505-
AoP is used to wrap the `ItemWriter` and output each record as it
506-
is processed to the logger, which may impact performance.
231+
[Football Job](./src/main/java/org/springframework/batch/sample/football/README.md)
507232

508233
### Header Footer Sample
509234
