HTTP API Reference
Table of content:
- Status Codes
- POST /dir
- POST /file
- POST /cp
- POST /dataset
- POST /update
- POST /find
- GET /dir/{cid}
- GET /file/{cid}
- GET /schema/{cid}
- POST /extract/{cid}
Status Codes
HTTP status codes used by the core API are simple:
- 200 OK: The request was processed or is being processed (streaming).
- 400 Bad Request: Malformed request was received.
- 404 Not Found: The endpoint does not exist.
POST /dir
Create an empty directory.
Request
No input is required.
Example
POST /dir
Response
The call to this endpoint will return a body in JSON containing either the cid
(content ID) of the data on success or the error
message upon errors.
Example
HTTP/1.1 200 OK
Content-Type: application/json
{"cid":"QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn"}
POST /file
Add the file to the underlying file system.
Request
The entire request body shall be added as-is to the underlying storage. The following headers are required:
- Content-Length: Size of the data in the body
- Content-Type: MIME type of the data
Example
POST /file HTTP/1.1
Accept: application/json
Content-Length: 1284
Content-Type: text/plain
I'd just like to interject for a moment. What you're referring to as Linux,
is in fact, GNU/Linux, or as I've recently taken to calling it, GNU plus Linux.
Linux is not an operating system unto itself, but rather another free component
of a fully functioning GNU system made useful by the GNU corelibs, shell
utilities and vital system components comprising a full OS as defined by POSIX.
Many computer users run a modified version of the GNU system every day,
without realizing it. Through a peculiar turn of events, the version of GNU
which is widely used today is often called "Linux", and many of its users are
not aware that it is basically the GNU system, developed by the GNU Project.
There really is a Linux, and these people are using it, but it is just a
part of the system they use. Linux is the kernel: the program in the system
that allocates the machine's resources to the other programs that you run.
The kernel is an essential part of an operating system, but useless by itself;
it can only function in the context of a complete operating system. Linux is
normally used in combination with the GNU operating system: the whole system
is basically GNU with Linux added, or GNU/Linux. All the so-called "Linux"
distributions are really distributions of GNU/Linux.
Response
The call to this endpoint will return a body in JSON containing either the cid
(content ID) of the data on success or the error
message upon errors.
Example
HTTP/1.1 200 OK
Content-Type: application/json
{"cid":"QmbwXK2Wg6npoAusr9MkSduuAViS6dxEQBNzqoixanVtj5"}
POST /cp
Copy file or directory inside a directory.
Request
The request body must be a JSON object with the following fields:
src
: CID of source file or directorydest
: CID of destination directorypath
: relative path inside ofdest
Example
Copy QmbwXK2Wg6npoAusr9MkSduuAViS6dxEQBNzqoixanVtj5
to QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/interjection
:
POST /cp HTTP/1.1
Accept: application/json
Content-Type: application/json
{"src": "QmbwXK2Wg6npoAusr9MkSduuAViS6dxEQBNzqoixanVtj5",
"dest": "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG",
"path": "interjection"}
Response
The call to this endpoint will return a body in JSON containing either the cid
of the resulting directory on success or the error
message upon errors.
Example
HTTP/1.1 200 OK
Content-Type: application/json
{"cid":"QmPao7zTNvuqH2pAVUgquYXgEhqoiTBpdjU7AwgZvsta9r"}
POST /dataset
Add the dataset to the lake.
Request
The request body must be a JSON object. The following fields are required:
file
: ID of the data file or directorydescription
: Breif description of the datsetsource
: Preferably an URI to the original datasettopics
: Array of keywords related to the dataset
Extra fields will be ingested and used for later indexing.
Example
POST /dataset HTTP/1.1
Accept: application/json
Content-Type: application/json
{"file": "QmbwXK2Wg6npoAusr9MkSduuAViS6dxEQBNzqoixanVtj5",
"description": "Interjection",
"source": "https://wiki.installgentoo.com/index.php/Interjection",
"topics": ["Natural language", "copypasta"]}
Response
The call to this endpoint will return a body in JSON containing either the automatically generated dataset id
on success or the error
message upon errors.
Example
HTTP/1.1 200 OK
Content-Type: application/json
{"id":"42"}
POST /update
Add the updated dataset to the lake.
Request
The request body must be a JSON object containing the field parent
. Other fields with be merged with the parent dataset entry.
Example
POST /dataset HTTP/1.1
Accept: application/json
Content-Type: application/json
{"parent": "42", "language": "English"}
Response
The call to this endpoint will return a body in JSON containing either the automatically generated dataset id
on success or the error
message upon errors.
Example
HTTP/1.1 200 OK
Content-Type: application/json
{"id":"69"}
POST /find
Find the data according to the given predicate.
Request
The predicate must be a valid query AST, represented in JSON.
Example
Find all data smaller than 4 KiB:
POST /find HTTP/1.1
Accept: application/json
Content-Type: application/json
["<", [".", ["$"], "length"], 4096]
Response
The server must respond in JSON with an array of objects, each representing a datum. In case of an error, the response would be an JSON object with an error
field.
Example
HTTP/1.1 200 OK
Content-Type: application/json
[{"cid":"QmbwXK2Wg6npoAusr9MkSduuAViS6dxEQBNzqoixanVtj5","id":"589836fe-c2f7-4d21-a521-688439bc74a4","language":"English","length":1284,"name":"Interjection","source":"https:\/\/wiki.installgentoo.com\/index.php\/Interjection","topics":["Natural language","copypasta"],"type":"application\/x-www-form-urlencoded"}]
GET /dir/{cid}
List content of a file system directory.
Request
In the URI, cid
specifies the content identifier of the wanted file.
Example
List the directory of CID QmSnuWmxptJZdLJpKRarxBMS2Ju2oANVrgbr2xWbie9b2D
:
GET /dir/QmSnuWmxptJZdLJpKRarxBMS2Ju2oANVrgbr2xWbie9b2D
Response
The server must respond with a JSON object of name: cid
upon success or the error
message in case of an error.
Example
HTTP/1.1 200 OK
Content-Type: application/json
{"albums":"QmUh6QSTxDKX5qoNU1GoogbhTveQQV9JMeQjfFVchAtd5Q","README.txt":"QmP8jTG1m9GSDJLCbeWhVSVgEzCPPwXRdCRuJtQ5Tz9Kc9","build_frontend_index.py":"QmRSxRRu9AoJ23bxb2pFeoAUFXMAdki7RZu2T7e6zHRdu6","_Metadata.json":"QmWXShtJXt6Mw3FH7hVCQvR56xPcaEtSj4YFSGjp2QxA4v","apolloarchivr.py":"QmU7gJi6Bz3jrvbuVfB7zzXStLJrTHf6vWh8ZqkCsTGoRC","frontend":"QmeQtZfwuq6aWRarY9P3L9MWhZ6QTonDe9ahWECGBZjyEJ"}
GET /file/{cid}
Stream content from underlying file system.
Request
In the URI, cid
specifies the content identifier of the wanted file.
Example
Stream the file of content ID QmbwXK2Wg6npoAusr9MkSduuAViS6dxEQBNzqoixanVtj5
:
GET /file/QmbwXK2Wg6npoAusr9MkSduuAViS6dxEQBNzqoixanVtj5
Response
The server should respond with a octet stream, transferred in chunks. In case of an error, the response would be an JSON object with field error
.
Example
HTTP/1.1 200 OK
Content-Type: application/octet-stream
I'd just like to interject for a moment. What you're referring to as Linux,
is in fact, GNU/Linux, or as I've recently taken to calling it, GNU plus Linux.
Linux is not an operating system unto itself, but rather another free component
of a fully functioning GNU system made useful by the GNU corelibs, shell
utilities and vital system components comprising a full OS as defined by POSIX.
Many computer users run a modified version of the GNU system every day,
without realizing it. Through a peculiar turn of events, the version of GNU
which is widely used today is often called "Linux", and many of its users are
not aware that it is basically the GNU system, developed by the GNU Project.
There really is a Linux, and these people are using it, but it is just a
part of the system they use. Linux is the kernel: the program in the system
that allocates the machine's resources to the other programs that you run.
The kernel is an essential part of an operating system, but useless by itself;
it can only function in the context of a complete operating system. Linux is
normally used in combination with the GNU operating system: the whole system
is basically GNU with Linux added, or GNU/Linux. All the so-called "Linux"
distributions are really distributions of GNU/Linux.
GET /schema/{cid}
JSON schema of (semi-)structured content (JSON or CSV).
Request
In the URI, cid
specifies the content identifier of the wanted JSON or CSV.
Example
Get schema of content of ID QmPVydGNAbc7t4CEf3qxETRNjYkXotABEeN2WBXkkGNc5H
:
GET /schema/QmPVydGNAbc7t4CEf3qxETRNjYkXotABEeN2WBXkkGNc5H
Response
The server should respond with a JSON schema. In case of an error, the response would be an JSON object with field error
.
Example
HTTP/1.1 200 OK
Content-Type: application/json
{
"$schema": "http://json-schema.org/draft-07/schema#"
"title": "QmPVydGNAbc7t4CEf3qxETRNjYkXotABEeN2WBXkkGNc5H",
"type": "array",
"items": {
"type": "object",
"properties": {
"country_code": { "type": "string" },
"country_name": { "type": "string" },
"indicator_code": { "type": "string" },
"indicator_name": { "type": "string" }
"year_1969": { "type": "number" },
...
}
},
}
POST /extract/{cid}
Extract specific rows from (semi-)structured data.
Request
In the URI, cid
specifies the content identifier of the wanted JSON or CSV. The body must be a valid query AST predicate, represented in JSON.
Example
Extract the rows with country_name
of Vietnam
from QmPVydGNAbc7t4CEf3qxETRNjYkXotABEeN2WBXkkGNc5H
:
POST /extract/QmPVydGNAbc7t4CEf3qxETRNjYkXotABEeN2WBXkkGNc5H
Content-Type: application/json
["==", [".", ["$"], "country_name"], "Vietnam"]
Response
The server’s response must be in JSON. In case of an error, it shall be explained in the field error
.
Example
HTTP/1.1 200 OK
Content-Type: application/json
[
{
"country_code": "VNM",
"country_name": "Vietnam",
"indicator_code": "SP.POP.TOTL",
"indicator_name": "Population, total",
"year_1969": "42307146",
...
}
]