This repository was archived by the owner on Jul 3, 2024. It is now read-only.

DEV-667: stage item in repo & index with catalog & full-text #1

Closed
wants to merge 28 commits into from
0348fde
DEV-667: WIP: stage & index item
aelkiss Mar 22, 2023
b627221
Reorganize docker compose for staging item
aelkiss Mar 22, 2023
a1bfc62
actually index the item into solr
aelkiss Mar 23, 2023
59b05bf
Update README & docker-compose (provisional)
aelkiss Mar 23, 2023
5b3d9d9
setup script to clone needed repos
aelkiss Mar 23, 2023
9fd1c16
README for stage item; fix path in docker-compose
aelkiss Mar 23, 2023
57edd7a
DEV-667: reconcile this and imgsrv
aelkiss Mar 24, 2023
eddf216
DEV-663: pt, ssd config
aelkiss Mar 27, 2023
e0f13c1
remove branch for imgsrv-sample-data
aelkiss Mar 27, 2023
b8c741b
Ensure solr core is writable
aelkiss Mar 27, 2023
af9b381
usage & pairtree creation with stage-item
aelkiss Mar 27, 2023
a66feeb
use geoip branch for imgsrv
aelkiss Mar 27, 2023
80fbcab
(WIP, untested) check out and build everything here
aelkiss Mar 28, 2023
64d520e
use branches for pt, ssd, imgsrv
aelkiss Mar 28, 2023
98d9d81
fixup over-aggressive search & replace in setup.sh
aelkiss Mar 28, 2023
01877a9
Fixes for getting everything under this directory
aelkiss Mar 28, 2023
6fe655e
Use released version of ht-pairtree
aelkiss Mar 29, 2023
a4a8b71
Remove branches from slip & imgsrv
aelkiss Mar 29, 2023
4d69341
Try to run solr as current user
aelkiss Mar 29, 2023
278c28a
Run apache as current user
aelkiss Mar 29, 2023
b831a1c
Indicate pt is available and ls is not
aelkiss Mar 30, 2023
871ef4b
correct port in README
aelkiss Mar 30, 2023
580d9b6
ensure cache directory is present
aelkiss Mar 30, 2023
9d0d5b4
Update instructions for sample item
aelkiss Mar 30, 2023
5bf4344
link parent dir env file
aelkiss Mar 30, 2023
718cb42
update TODO
aelkiss Mar 30, 2023
5719d79
Add additional TODOs to README
aelkiss Mar 30, 2023
3c3e05b
continue taking stuff off of branches
aelkiss Mar 31, 2023
21 changes: 21 additions & 0 deletions .gitignore
@@ -0,0 +1,21 @@
vendor
.bundle
.env
stage-item/*.xml
stage-item/*.zip

# other repositories

catalog/
common/
hathitrust_catalog_indexer/
ht-pairtree/
imgsrv-sample-data/
imgsrv/
lss_solr_configs/
pt/
sample-data/
slip/
ssd/
logs/
cache/
219 changes: 0 additions & 219 deletions Dockerfile

This file was deleted.

110 changes: 79 additions & 31 deletions README.md
@@ -8,64 +8,112 @@ Clone all the repositories in a working directory.
We're going to be running docker from this working directory,
so `babel-local-dev` has access to the other repositories.

There's a lot, because we're replicating running on the
dev servers with `debug_local=1` enabled.

```
$ mkdir workdir
$ cd workdir
$ git clone git@github.com:hathitrust/babel-local-dev.git
$ git clone git@github.com:hathitrust/catalog.git
$ git clone git@github.com:hathitrust/common.git
$ git clone git@github.com:hathitrust/imgsrv.git
$ git clone git@github.com:hathitrust/pt.git
$ git clone git@github.com:hathitrust/mdp-lib.git
$ git clone git@github.com:hathitrust/slip-lib.git
$ git clone git@github.com:hathitrust/plack-lib.git
$ git clone git@github.com:hathitrust/imgsrv-sample-data.git
# more to come
First clone this repository:
```bash
git clone git@github.com:hathitrust/babel-local-dev.git babel
```

## Step 2: initialize all the submodules
Then run:

*Insert fancy one liner if available.*
```bash
cd babel
./setup.sh
```

This will check out the other repositories along with their submodules.
There's a lot, because we're replicating running on the dev servers with
`debug_local=1` enabled.
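
As a rough illustration of the checkout step, the sketch below prints one clone command per repository instead of running it. It is not the actual `setup.sh`: the function names `clone_command` and `plan_checkout` are hypothetical, the repository list is a partial guess taken from the directory names in `.gitignore`, and it assumes each directory corresponds to a repository of the same name under the `hathitrust` organization.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real setup.sh may work differently.
# ASSUMPTION: each name below is a repository of the same name under
# the hathitrust GitHub organization (names taken from .gitignore).
REPOS=(catalog common imgsrv pt slip ssd)

# Print the clone command for one repository without running it.
clone_command() {
  local repo="$1"
  printf 'git clone --recurse-submodules git@github.com:hathitrust/%s.git %s\n' "$repo" "$repo"
}

# Print a clone command for every repository not yet checked out
# in the current directory.
plan_checkout() {
  local repo
  for repo in "${REPOS[@]}"; do
    [ -d "$repo" ] || clone_command "$repo"
  done
}
```

Calling `plan_checkout` from the `babel` directory lists what would still be cloned; `--recurse-submodules` is the standard `git clone` option for fetching submodules in the same step.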

## Step 3: build the `babel-local-dev` environment

In your workdir:

```
docker-compose -f ./babel-local-dev/docker-compose.yml build
docker-compose build
```

## Step 4: run `babel-local-dev`:

In your workdir:

```
docker-compose -f ./babel-local-dev/docker-compose.yml up
docker-compose up
```

In your browser:

* http://localhost:8080/Search/Home
* http://localhost:8080/cgi/pt?id=test.pd_open
* catalog: `http://localhost:8080/Search/Home`
* catalog solr: `http://localhost:9033`
* full-text solr: `http://localhost:8983`

PageTurner & imgsrv:

* `http://localhost:8080/cgi/pt?id=test.pd_open`
* `http://localhost:8080/cgi/imgsrv/cover?id=test.pd_open`
* `http://localhost:8080/cgi/imgsrv/image?id=test.pd_open&seq=1`
* `http://localhost:8080/cgi/imgsrv/html?id=test.pd_open&seq=1`
* `http://localhost:8080/cgi/imgsrv/download/pdf?id=test.pd_open&seq=1&attachment=0`
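
To check these endpoints without opening each one in a browser, something like the following could be used once the containers are up. This is a hypothetical helper, not part of the repository: `smoke_urls`, `smoke_test`, `BASE_URL` and `ITEM_ID` are names invented for the sketch, and it assumes `curl` is installed on the host.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test; not part of babel-local-dev.
BASE_URL="${BASE_URL:-http://localhost:8080}"
ITEM_ID="${ITEM_ID:-test.pd_open}"

# Print the PageTurner and imgsrv URLs listed above for one item.
smoke_urls() {
  local base="$1" id="$2"
  printf '%s\n' \
    "$base/cgi/pt?id=$id" \
    "$base/cgi/imgsrv/cover?id=$id" \
    "$base/cgi/imgsrv/image?id=$id&seq=1" \
    "$base/cgi/imgsrv/html?id=$id&seq=1" \
    "$base/cgi/imgsrv/download/pdf?id=$id&seq=1&attachment=0"
}

# Request each URL and print its HTTP status code next to it.
smoke_test() {
  local url status
  while IFS= read -r url; do
    status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    printf '%s %s\n' "$status" "$url"
  done < <(smoke_urls "$BASE_URL" "$ITEM_ID")
}
```

Running `smoke_test` after `docker-compose up` prints one status line per URL; setting `ITEM_ID` to another staged item checks that item instead.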

MySQL is exposed at `127.0.0.1:3307`. The default username and password with
write access are `mdp-admin` / `mdp-admin` (needless to say, do not use this
image in production!)

```bash
mysql -h 127.0.0.1 -P 3307 -u mdp-admin -p
```
Huzzah!

Not yet configured:
* `http://localhost:8080/cgi/mb`
* `http://localhost:8080/cgi/ls`
* `http://localhost:8080/cgi/whoami`
* `http://localhost:8080/cgi/ping`
* etc

## How this works (for now)

The `docker-compose` provides a custom catalog configuration to the `nginx` service to
proxy `babel` CGI requests to the `apache-cgi` service, and serve `common` requests from
the local `common` checkout.
* catalog runs nginx + php
* babel cgi apps run under apache in a single container
* imgsrv plack/psgi process runs in its own container

## Staging an Item

`apache-cgi` is there because `nginx` can only speak FastCGI/HTTP and running *all* the babel
apps under FastCGI/HTTP is still aspirational.
First, get a HathiTrust ZIP and METS. The easiest way to do this is probably by
using the [Data API client](https://babel.hathitrust.org/cgi/htdc) to download
a public domain item unencumbered by any contractual restrictions, for example
`uc2.ark:/13960/t4mk66f1d`. Select "Download", then select "Item METS file" and
submit the form to download the METS; repeat with "entire item" to download the
ZIP.

Running the stage item script requires a Ruby runtime. The script puts the
item in the appropriate location under `imgsrv-sample-data`, fetches the
bibliographic data, and extracts and indexes the full text.

First make sure all the dependencies are running:

```bash
docker-compose build
docker-compose up
```

Then, install dependencies for the `stage-item` script and run it with the
downloaded zip and METS:

```bash
cd stage-item
bundle config set --local path 'vendor/bundle'
bundle install
bundle exec ruby stage_item.rb uc2.ark:/13960/t4mk66f1d ark+=13960=t4mk66f1d.zip ark+=13960=t4mk66f1d.mets.xml
```

Note that the zip and METS must be named as they are in the actual
repository -- if you name them "foo.zip" or "foo.xml" they will not be renamed,
and full-text indexing and PageTurner will not be able to find the item.
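
The expected file names can be derived from the item ID. The sketch below is a hypothetical helper (`htid_to_filenames` is not part of `stage-item`) that applies only the single-character pairtree substitutions (`/` to `=`, `:` to `+`, `.` to `,`) to the part of the ID after the namespace prefix. It assumes the ID contains no characters that pairtree would hex-encode, which holds for the example ID above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ZIP and METS file names expected
# for a HathiTrust item ID such as uc2.ark:/13960/t4mk66f1d.
# ASSUMPTION: the ID needs only the single-character pairtree
# substitutions; hex-encoding of other special characters is skipped.
htid_to_filenames() {
  local htid="$1"
  # Drop the namespace, i.e. everything up to and including the first dot.
  local object_id="${htid#*.}"
  # Apply the substitutions: '/' -> '=', ':' -> '+', '.' -> ','.
  local cleaned
  cleaned=$(printf '%s' "$object_id" | tr '/:.' '=+,')
  printf '%s.zip\n%s.mets.xml\n' "$cleaned" "$cleaned"
}
```

For `uc2.ark:/13960/t4mk66f1d` this yields the two file names used in the `stage_item.rb` command above.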

## TODO

- [ ] merge the `imgsrv` DEV-231-grok branch and update the `Dockerfile`s to include `grok`
- [ ] update `slip-lib/Searcher.pm` to set `wt=xml` because the new solr defaults return JSON
- [ ] adding `pt` requires filling out more of the `ht_web` tables (namely `mb_*`)
- [ ] add `mb` and `ls`
- [ ] ensure database user can write to relevant tables
- [ ] link to documentation for important tasks - e.g. running apps under debugging, updating css/js, etc
- [ ] easy mechanism to generate placeholder volumes in `imgsrv-sample-data` that correspond to the records in the catalog

- [ ] make it easier to fetch real volumes
Empty file added cache/.keep
Empty file.