Python client
This page describes the Rubrix Python client, which is divided into two basic modules:
Methods: These methods make up the interface to interact with Rubrix’s REST API.
Models: You need to wrap your data in these data models for Rubrix to understand it.
Methods
This module contains the interface to access Rubrix’s REST API.
- rubrix.copy(dataset, name_of_copy)
Creates a copy of a dataset, including its tags and metadata.
- Parameters
dataset (str) – Name of the source dataset
name_of_copy (str) – Name of the copied dataset
Examples
>>> import rubrix as rb
>>> rb.copy("my_dataset", name_of_copy="new_dataset")
>>> dataframe = rb.load("new_dataset")
- rubrix.delete(name)
Delete a dataset.
- Parameters
name (str) – The dataset name.
- Return type
None
Examples
>>> import rubrix as rb
>>> rb.delete(name="example-dataset")
- rubrix.init(api_url=None, api_key=None, timeout=60)
Initialize the Python client.
Passing an api_url disables reading of the environment variables that would otherwise provide default values.
- Parameters
api_url (Optional[str]) – Address of the REST API. If None (default) and the env variable RUBRIX_API_URL is not set, it will default to http://localhost:6900.
api_key (Optional[str]) – Authentication key for the REST API. If None (default) and the env variable RUBRIX_API_KEY is not set, it will default to rubrix.apikey.
timeout (int) – Number of seconds to wait before the connection times out. Default: 60.
- Return type
None
Examples
>>> import rubrix as rb
>>> rb.init(api_url="http://localhost:9090", api_key="4AkeAPIk3Y")
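The fallback order described for api_url and api_key can be sketched in plain Python. This is an illustration only, not the client's implementation: resolve_settings and its env argument are hypothetical names, and the sketch covers only the per-parameter fallback (explicit argument, then environment variable, then default). It does not model whether passing api_url also changes how api_key is resolved.

```python
import os
from typing import Mapping, Optional, Tuple

# Default values quoted from the parameter descriptions above.
DEFAULT_API_URL = "http://localhost:6900"
DEFAULT_API_KEY = "rubrix.apikey"


def resolve_settings(
    api_url: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: explicit argument, then env variable, then default."""
    env = os.environ if env is None else env
    url = api_url or env.get("RUBRIX_API_URL", DEFAULT_API_URL)
    key = api_key or env.get("RUBRIX_API_KEY", DEFAULT_API_KEY)
    return url, key


# Nothing passed and nothing set in the environment: the defaults apply.
print(resolve_settings(env={}))

# An environment variable fills in for a missing argument.
print(resolve_settings(env={"RUBRIX_API_URL": "http://rubrix.example:6900"}))
```

Passing env explicitly keeps the sketch independent of the machine it runs on; leaving it out reads from os.environ.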
- rubrix.load(name, ids=None, limit=None)
Load a dataset into a pandas DataFrame.
- Parameters
name (str) – The dataset name.
ids (Optional[List[Union[str, int]]]) – If provided, load dataset records with given ids.
limit (Optional[int]) – The number of records to retrieve.
- Returns
The dataset as a pandas DataFrame.
- Return type
pandas.core.frame.DataFrame
Examples
>>> import rubrix as rb
>>> dataframe = rb.load(name="example-dataset")
- rubrix.log(records, name, tags=None, metadata=None, chunk_size=500)
Log records to Rubrix.
- Parameters
records (Union[rubrix.client.models.TextClassificationRecord, rubrix.client.models.TokenClassificationRecord, rubrix.client.models.Text2TextRecord, Iterable[Union[rubrix.client.models.TextClassificationRecord, rubrix.client.models.TokenClassificationRecord, rubrix.client.models.Text2TextRecord]]]) – The record or an iterable of records.
name (str) – The dataset name.
tags (Optional[Dict[str, str]]) – A dictionary of tags related to the dataset.
metadata (Optional[Dict[str, Any]]) – A dictionary of extra info for the dataset.
chunk_size (int) – The number of records sent in each data bulk.
- Returns
Summary of the response from the REST API.
- Return type
rubrix.client.models.BulkResponse
Examples
>>> import rubrix as rb
>>> record = rb.TextClassificationRecord(
...     inputs={"text": "my first rubrix example"},
...     prediction=[('spam', 0.8), ('ham', 0.2)]
... )
>>> response = rb.log(record, name="example-dataset")
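To give a feel for what chunk_size controls, the sketch below splits an iterable of records into lists of at most chunk_size items. The chunked helper is hypothetical and written only for illustration; how the client actually batches records internally is not described here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(records: Iterable[T], chunk_size: int = 500) -> Iterator[List[T]]:
    """Hypothetical helper: yield successive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# 1200 records with a chunk size of 500 give three bulks.
sizes = [len(chunk) for chunk in chunked(range(1200))]
print(sizes)  # [500, 500, 200]
```

A smaller chunk_size means more, smaller requests; a larger one means fewer, larger requests.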
Models
This module contains the data models for the interface.
- class rubrix.client.models.BulkResponse(*, dataset, processed, failed=0)
Summary response when logging records to the Rubrix server.
- Parameters
dataset (str) – The dataset name.
processed (int) – Number of records in bulk.
failed (Optional[int]) – Number of failed records.
- Return type
None
- class rubrix.client.models.Text2TextRecord(*args, text, prediction=None, annotation=None, prediction_agent=None, annotation_agent=None, id=None, metadata=None, status=None, event_timestamp=None)
Record for a text-to-text task.
- Parameters
text (str) – The input of the record
prediction (Optional[List[Union[str, Tuple[str, float]]]]) – A list of strings or tuples containing predictions for the input text. If tuples, the first entry is the predicted text, the second entry is its corresponding score.
annotation (Optional[str]) – A string representing the expected output text for the given input text.
prediction_agent (Optional[str]) – Name of the prediction agent. By default, this is set to the hostname of your machine.
annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.
id (Optional[Union[int, str]]) – The id of the record. By default (None), we will generate a unique ID for you.
metadata (Dict[str, Any]) – Metadata for the record. Defaults to {}.
status (Optional[str]) – The status of the record. Options: ‘Default’, ‘Edited’, ‘Discarded’, ‘Validated’. If an annotation is provided, this defaults to ‘Validated’, otherwise ‘Default’.
event_timestamp (Optional[datetime.datetime]) – The timestamp of the record.
- Return type
None
Examples
>>> import rubrix as rb
>>> record = rb.Text2TextRecord(
...     text="My name is Sarah and I love my dog.",
...     prediction=["Je m'appelle Sarah et j'aime mon chien."]
... )
- classmethod prediction_as_tuples(prediction)
Preprocesses the predictions and wraps them in tuples if needed.
- Parameters
prediction (Optional[List[Union[str, Tuple[str, float]]]]) – The predictions to preprocess, given as strings or (text, score) tuples.
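The kind of normalization that prediction_as_tuples describes can be sketched as follows. This is not the library's code: as_tuples is a hypothetical stand-in, and the score attached to bare strings (default_score, set to 1.0 here) is an assumption, since the documentation above does not say which score is used.

```python
from typing import List, Optional, Tuple, Union

Prediction = Union[str, Tuple[str, float]]


def as_tuples(
    prediction: Optional[List[Prediction]],
    default_score: float = 1.0,  # assumption: the score for bare strings is not documented
) -> Optional[List[Tuple[str, float]]]:
    """Hypothetical sketch: wrap bare strings as (text, score) tuples."""
    if prediction is None:
        return None
    return [
        item if isinstance(item, tuple) else (item, default_score)
        for item in prediction
    ]


mixed = ["Je m'appelle Sarah.", ("Mon nom est Sarah.", 0.3)]
print(as_tuples(mixed))
```

After this step every prediction has the same (text, score) shape, so downstream code does not need to distinguish the two input forms.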
- class rubrix.client.models.TextClassificationRecord(*args, inputs, prediction=None, annotation=None, prediction_agent=None, annotation_agent=None, multi_label=False, explanation=None, id=None, metadata=None, status=None, event_timestamp=None)
Record for text classification.
- Parameters
inputs (Union[str, List[str], Dict[str, Union[str, List[str]]]]) – The inputs of the record
prediction (Optional[List[Tuple[str, float]]]) – A list of tuples containing the predictions for the record. The first entry of the tuple is the predicted label, the second entry is its corresponding score.
annotation (Optional[Union[str, List[str]]]) – A string or a list of strings (multilabel) corresponding to the annotation (gold label) for the record.
prediction_agent (Optional[str]) – Name of the prediction agent. By default, this is set to the hostname of your machine.
annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.
multi_label (bool) – Is the prediction/annotation for a multi label classification task? Defaults to False.
explanation (Optional[Dict[str, List[rubrix.client.models.TokenAttributions]]]) – A dictionary containing the attributions of each token to the prediction. The keys map the input of the record (see inputs) to the TokenAttributions.
id (Optional[Union[int, str]]) – The id of the record. By default (None), we will generate a unique ID for you.
metadata (Dict[str, Any]) – Metadata for the record. Defaults to {}.
status (Optional[str]) – The status of the record. Options: ‘Default’, ‘Edited’, ‘Discarded’, ‘Validated’. If an annotation is provided, this defaults to ‘Validated’, otherwise ‘Default’.
event_timestamp (Optional[datetime.datetime]) – The timestamp of the record.
- Return type
None
Examples
>>> import rubrix as rb
>>> record = rb.TextClassificationRecord(
...     inputs={"text": "my first rubrix example"},
...     prediction=[('spam', 0.8), ('ham', 0.2)]
... )
- classmethod input_as_dict(inputs)
Preprocesses the record inputs and wraps them in a dictionary if needed.
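A minimal sketch of the wrapping that input_as_dict describes: dictionaries pass through unchanged, anything else is placed under a single key. The helper name as_input_dict and the key "text" are assumptions made for illustration; the documentation above does not state which key the client uses.

```python
from typing import Dict, List, Union

InputValue = Union[str, List[str]]
Inputs = Union[str, List[str], Dict[str, InputValue]]


def as_input_dict(inputs: Inputs, default_key: str = "text") -> Dict[str, InputValue]:
    """Hypothetical sketch: leave dicts untouched, wrap everything else."""
    if isinstance(inputs, dict):
        return inputs
    # Assumption: a bare string or list of strings is stored under default_key.
    return {default_key: inputs}


print(as_input_dict("my first rubrix example"))
print(as_input_dict({"subject": "hello", "body": "my first rubrix example"}))
```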
- class rubrix.client.models.TokenAttributions(*, token, attributions=None)
Attribution of the token to the predicted label.
In the Rubrix app this is only supported for TextClassificationRecord and the multi_label=False case.
- Parameters
token (str) – The input token.
attributions (Dict[str, float]) – A dictionary containing label-attribution pairs.
- Return type
None
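To make the shape of an explanation concrete, the sketch below builds one with a stand-in dataclass that mirrors the two documented fields, token and attributions. The stand-in class, the key "text", and all attribution values are made up for illustration; with the client installed, rubrix.client.models.TokenAttributions would take the place of the stand-in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TokenAttributionsSketch:
    """Stand-in mirroring the documented fields; not the Rubrix class itself."""

    token: str
    attributions: Dict[str, float] = field(default_factory=dict)


# The keys of the explanation match the keys of the record's inputs ("text" here).
# All attribution values below are invented for the example.
explanation: Dict[str, List[TokenAttributionsSketch]] = {
    "text": [
        TokenAttributionsSketch("my", {"spam": 0.1, "ham": -0.1}),
        TokenAttributionsSketch("first", {"spam": 0.0, "ham": 0.0}),
        TokenAttributionsSketch("rubrix", {"spam": -0.5, "ham": 0.5}),
        TokenAttributionsSketch("example", {"spam": -0.25, "ham": 0.25}),
    ]
}

# Find the token that contributes most to the "ham" label.
top = max(explanation["text"], key=lambda item: item.attributions["ham"])
print(top.token)  # rubrix
```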
- class rubrix.client.models.TokenClassificationRecord(*args, text, tokens, prediction=None, annotation=None, prediction_agent=None, annotation_agent=None, id=None, metadata=None, status=None, event_timestamp=None)
Record for a token classification task.
- Parameters
text (str) – The input of the record
tokens (List[str]) – The tokenized input of the record. We use this to guide the annotation process and to cross-check the spans of your prediction/annotation.
prediction (Optional[List[Union[Tuple[str, int, int], Tuple[str, int, int, float]]]]) – A list of tuples containing the predictions for the record. The first entry of the tuple is the name of the predicted entity, the second and third entries correspond to the start and stop character index of the entity. EXPERIMENTAL: The fourth entry is optional and corresponds to the score of the entity.
annotation (Optional[List[Tuple[str, int, int]]]) – A list of tuples containing annotations (gold labels) for the record. The first entry of the tuple is the name of the entity, the second and third entries correspond to the start and stop character index of the entity.
prediction_agent (Optional[str]) – Name of the prediction agent. By default, this is set to the hostname of your machine.
annotation_agent (Optional[str]) – Name of the annotation agent. By default, this is set to the hostname of your machine.
id (Optional[Union[int, str]]) – The id of the record. By default (None), we will generate a unique ID for you.
metadata (Dict[str, Any]) – Metadata for the record. Defaults to {}.
status (Optional[str]) – The status of the record. Options: ‘Default’, ‘Edited’, ‘Discarded’, ‘Validated’. If an annotation is provided, this defaults to ‘Validated’, otherwise ‘Default’.
event_timestamp (Optional[datetime.datetime]) – The timestamp of the record.
- Return type
None
Examples
>>> import rubrix as rb
>>> record = rb.TokenClassificationRecord(
...     text = "Michael is a professor at Harvard",
...     tokens = ["Michael", "is", "a", "professor", "at", "Harvard"],
...     prediction = [('NAME', 0, 7), ('LOC', 26, 33)]
... )
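Since entity spans are given as character indices, it can help to derive them from the tokens rather than count by hand. The sketch below does this in plain Python; token_spans is a hypothetical helper, not part of the client, and it assumes the tokens appear in the text in order. The stop index is exclusive, which matches the spans in the example above.

```python
from typing import List, Tuple


def token_spans(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Hypothetical helper: locate each token in text, scanning left to right."""
    spans = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after position {cursor}")
        stop = start + len(token)
        spans.append((start, stop))
        cursor = stop
    return spans


text = "Michael is a professor at Harvard"
tokens = ["Michael", "is", "a", "professor", "at", "Harvard"]
spans = token_spans(text, tokens)

# Entity tuples built from the offsets of the first and last token.
prediction = [("NAME", *spans[0]), ("LOC", *spans[5])]
print(prediction)  # [('NAME', 0, 7), ('LOC', 26, 33)]

# Cross-check: every span slices back to its token.
assert all(text[start:stop] == tok for (start, stop), tok in zip(spans, tokens))
```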