Commit 71b4c4ba authored by Jun Matsushita

More detailed proposal with api gateway URLs instead of direct posting to couch. #11

parent 1c7db818
## watch-url
1. watch-url periodically (and on startup) queries its config (the list of URLs to watch, with xpath): `GET https://api.openintegrity.org/url/config/:agent-id`
2. watch-url follows the `config` instructions to watch the URLs periodically. For each URL:
- Fetch the latest ETag we have in the event store: `GET https://api.openintegrity.org/url/etag/https://guardianproject.info/home/data-usage-and-protection-policies/`
1. Periodically (and on startup) query its config (the list of URLs to watch, with xpath): `GET https://api.openintegrity.org/url/config/:agent-id`
2. Follow the `config` instructions to watch the URLs periodically. For each URL:
- Fetch the latest ETag we have in the event store: `GET https://api.openintegrity.org/urls/etag/https://guardianproject.info/home/data-usage-and-protection-policies/`
- Query the ETag header from the web server.
- Compare the two, and only proceed if the ETag has changed.
3. watch-url asks the `fetch-url` microservice to fetch the URL.
3. Update the latest ETag: `POST https://api.openintegrity.org/urls/etag/https://guardianproject.info/home/data-usage-and-protection-policies/` with
```
{
e: "https://guardianproject.info/home/data-usage-and-protection-policies/",
a: "etag",
v: "E43A29F57...",
t: {
id: "watch-url",
agent: $AGENT_ID,
timestamp: $AGENT_TIMESTAMP
}
}
```
> e/a/v/t stands for entity/attribute/value/transaction (from [Datomic](http://docs.datomic.com/glossary.html#sec-44)), which is equivalent to a linked data quad: subject/predicate/object/context.
4. Call the `fetch-url` microservice to fetch the URL (with `:url` and `:etag` parameters).
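
The watch-url steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the project's implementation: the function names `etag_changed` and `build_event` are hypothetical, the two ETags are passed in rather than fetched over HTTP, and the payload shape simply mirrors the e/a/v/t example above.

```python
import time
from typing import Optional


def etag_changed(stored_etag: Optional[str], current_etag: Optional[str]) -> bool:
    """Step 2: proceed only if the web server's ETag differs from the stored one.

    Assumption: a missing ETag on either side counts as "changed",
    so the URL is fetched rather than silently skipped.
    """
    if stored_etag is None or current_etag is None:
        return True
    return stored_etag != current_etag


def build_event(url: str, etag: str, agent_id: str,
                timestamp: Optional[float] = None) -> dict:
    """Step 3: build the e/a/v/t body to POST to the event store."""
    return {
        "e": url,       # entity: the watched URL
        "a": "etag",    # attribute
        "v": etag,      # value: the new ETag
        "t": {          # transaction metadata
            "id": "watch-url",
            "agent": agent_id,
            "timestamp": time.time() if timestamp is None else timestamp,
        },
    }
```

Keeping the comparison and the payload construction free of network calls makes both easy to test; the actual `GET` and `POST` requests would wrap around them.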
## fetch-url
1. fetch-url receives a GET request to retrieve the HTML policy
2. fetch-url sends a GET request for the policy to the website (e.g. GET https://chatsecure.org/privacy/)
3. fetch-url stores the policy HTML in the fs (e.g. policies/theguardianproject/chatsecure/privacy_sha.html)
4. fetch-url sends a GET request for the policy to analize_policy (e.g. GET https://xxx.iilab.org/analize/policies/theguardianproject/chatsecure)
1. Receive `:url` and `:etag` parameters to retrieve a web page.
2. Send a GET request for the policy to the website (e.g. GET https://chatsecure.org/privacy/)
- Use [Conditional GET](https://spaces.internet2.edu/display/InCFederation/HTTP+Conditional+GET) (if HTTP/1.1)
3. Store the policy HTML in the file system (evidence archive?) with its SHA256 as the file name (/data/urls/https://guardianproject.info/home/data-usage-and-protection-policies/E43A29F57....html)
- If the SHA256 is already in the file system then do not proceed.
4. Call the `analyse-policy` microservice (with `:url` and `:sha` parameters)
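
As a rough sketch of steps 2 and 3 of fetch-url, the snippet below builds the header for a conditional GET and stores a page under its SHA256. The helper names `conditional_headers` and `store_page` are hypothetical, and percent-encoding the URL into a single directory name is an assumption made here for portability; the proposal shows the raw URL in the path.

```python
import hashlib
from pathlib import Path
from typing import Optional
from urllib.parse import quote


def conditional_headers(etag: Optional[str]) -> dict:
    """Headers for a conditional GET.

    With If-None-Match set, the server can answer 304 Not Modified when its
    current ETag matches. Assumption: the ETag is passed exactly as received
    from the server, including any surrounding quotes.
    """
    return {"If-None-Match": etag} if etag else {}


def store_page(root: Path, url: str, body: bytes) -> Optional[Path]:
    """Store body as <root>/<encoded url>/<sha256>.html.

    Returns the new path, or None when a file with the same SHA256 already
    exists, in which case the caller should not proceed.
    """
    sha = hashlib.sha256(body).hexdigest()
    # Assumption: encode the URL so it becomes one safe directory name.
    target = root / quote(url, safe="") / f"{sha}.html"
    if target.exists():
        return None
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(body)
    return target
```

A `None` result from `store_page` corresponds to the "do not proceed" branch, so `analyse-policy` would only be called when a path is returned.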
## analyse-policy
1. analyze_policy receives the GET policy request
2. analize_policy checks whether the SHA of the HTML file is in the store (e.g. policies/theguardianproject/chatsecure/privacy_sha.html)
3. If the HTML is different, analize_policy converts the HTML to markdown (policies/theguardianproject/chatsecure/privacy_sha.md)
4. // send HTTP GET for the last markdown to CouchDB (e.g. GET https://oii-db.iilab.org/_utils/document.html?policies/theguardianproject/chatsecure/privacy_sha.md)
5. // compare markdown with last_markdown
6. If there are changes, analize_policy sends a POST with the markdown to CouchDB (e.g. POST https://oii-db.iilab.org/_utils/document.html?policies/theguardianproject/chatsecure/privacy_sha.md)
\ No newline at end of file
1. Receive `:url` and `:sha` parameters
2. Load `:url` from the file system with `:sha` ( /data/urls/https://guardianproject.info/home/data-usage-and-protection-policies/E43A29F57....html )
3. Convert the HTML to markdown in the fs (/data/policies/https://guardianproject.info/home/data-usage-and-protection-policies/29FE43A57....md)
4. Retrieve the last known markdown SHA from the event store (`GET https://api.openintegrity.org/policies/https://guardianproject.info/home/data-usage-and-protection-policies/`)
5. Compare the markdown SHA with the last markdown SHA
- If they are the same then do not proceed.
6. Post the policy markdown: `POST https://api.openintegrity.org/policies/https://guardianproject.info/home/data-usage-and-protection-policies/` with
```
{
e: "https://guardianproject.info/home/data-usage-and-protection-policies/",
a: "markdown",
v: "### Policy\n * blah\n",
m: {
id: "analyse-policy",
agent: $AGENT_ID,
timestamp: $AGENT_TIMESTAMP
}
}
```
\ No newline at end of file
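
Steps 5 and 6 of analyse-policy could look something like the sketch below. It is illustrative only: `markdown_sha` and `policy_event` are hypothetical names, SHA256 is assumed as the hash (the proposal only says "SHA"), and the metadata key `m` is copied from the example payload above, which differs from the `t` key used in the watch-url payload.

```python
import hashlib
from typing import Optional


def markdown_sha(markdown: str) -> str:
    """Hash the markdown text. Assumption: SHA256 over its UTF-8 bytes."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def policy_event(url: str, markdown: str, last_sha: Optional[str],
                 agent_id: str, timestamp: float) -> Optional[dict]:
    """Return the body to POST, or None when the markdown is unchanged.

    last_sha is the last known markdown SHA from the event store
    (None if the policy has never been recorded).
    """
    if markdown_sha(markdown) == last_sha:
        return None  # step 5: same SHA, do not proceed
    return {
        "e": url,
        "a": "markdown",
        "v": markdown,
        "m": {  # key name taken from the example payload above
            "id": "analyse-policy",
            "agent": agent_id,
            "timestamp": timestamp,
        },
    }
```

Returning `None` for the unchanged case keeps the "do not proceed" decision in one place, so the caller only issues the `POST` when it receives a payload.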