The `Cpt` class

class cpt.cpt.Cpt

Compact Prediction Tree class.

Attributes

split_length: int, default 0 (all elements are considered): Limits the length of training sequences.
noise_ratio: float, default 0 (no noise): Frequency threshold below which an element is considered noise.
MBR: int, default 0 (at least one update): Minimum number of similar sequences needed to compute predictions.
alphabet: Alphabet: Used to encode values for Cpt. It should not be used directly.
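
The effect of `split_length` can be pictured with a few lines of plain Python. This is a minimal sketch, not the library's code: the helper `truncate_sequence` is hypothetical, and it assumes that truncation keeps the last `split_length` elements of each training sequence, which the description above does not state.

```python
def truncate_sequence(sequence, split_length=0):
    """Hypothetical helper: limit a training sequence to split_length elements.

    Assumption: the last (most recent) elements are the ones kept.
    A split_length of 0 keeps every element, matching the documented default.
    """
    if split_length == 0:
        return list(sequence)
    return list(sequence[-split_length:])


full = ['a', 'b', 'c', 'd']
print(truncate_sequence(full))     # ['a', 'b', 'c', 'd']
print(truncate_sequence(full, 2))  # ['c', 'd']
```

Under that assumption, a small `split_length` makes the model focus on the most recent elements of each sequence.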

Methods

`compute_noisy_items`	Compute noisy elements.
`find_similar_sequences`	Find similar sequences.
`fit`	Train the model with a list of sequences.
`predict`	Predict the next element of each sequence of the parameter `sequences`.
`predict_k`	Predict the next elements of each sequence of the parameter `sequences`, sorted by descending confidence.
`retrieve_sequence`	Retrieve sequence from the training data.

fit(sequences)

Train the model with a list of sequences.

The model can be retrained to add new sequences. `model.fit(seq1); model.fit(seq2)` is equivalent to `model.fit(seq1 + seq2)`, where `seq1` and `seq2` are lists of sequences.

Parameters

sequences: list: A list of sequences of any hashable type.

Returns

None

Examples

>>> model.fit([['hello', 'world'], ['hello', 'cpt']])

predict(sequences, multithreading=True)

Predict the next element of each sequence of the parameter sequences.

Parameters

sequences: list: A list of sequences of any hashable type.
multithreading: bool, default True: Whether to use multithreading for predictions.

Returns

predictions: list of length len(sequences): The predicted elements.

Raises

ValueError: Raised if noise_ratio is not between 0 and 1, or if MBR is negative.

Examples

>>> from cpt.cpt import Cpt
>>> model = Cpt()
>>> model.fit([['hello', 'world'],
...            ['hello', 'this', 'is', 'me'],
...            ['hello', 'me']])
>>> model.predict([['hello'], ['hello', 'this']])
['me', 'is']

predict_k(sequences, k, multithreading=True)

Predict the next elements of each sequence of the parameter sequences, sorted by descending confidence.

Parameters

sequences: list: A list of sequences of any hashable type.
k: int: Number of predictions to make per sequence, ordered by descending confidence.
multithreading: bool, default True: Whether to use multithreading for predictions.

Returns

predictions: List[List[Any]] of dimension len(sequences) * k: The predicted elements.

Raises

ValueError: Raised if noise_ratio is not between 0 and 1, or if MBR is negative.

Examples

>>> from cpt.cpt import Cpt
>>> model = Cpt()
>>> model.fit([['hello', 'world'],
...            ['hello', 'this', 'is', 'me'],
...            ['hello', 'me']])
>>> model.predict_k([['hello']], 2)
[['me', 'this']]

compute_noisy_items(noise_ratio)

Compute noisy elements.

An element is considered noise if the fraction of training sequences in which it appears at least once is below noise_ratio.

Parameters

noise_ratio: float: Frequency threshold below which an element is considered noise.

Returns

noisy_items: list: The noisy items.

Raises

ValueError: Raised if noise_ratio is not between 0 and 1.
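
The noise criterion described for `compute_noisy_items` can be sketched in plain Python. This is an illustration, not the library's implementation: the function name `noisy_items` is hypothetical, it works on raw sequences rather than on a trained model, and treating "below" as a strict comparison is an assumption.

```python
from collections import Counter


def noisy_items(sequences, noise_ratio):
    """Hypothetical stand-in: return elements that appear in too few sequences."""
    if not 0 <= noise_ratio <= 1:
        raise ValueError("noise_ratio should be between 0 and 1")
    if not sequences:
        return []

    # Count, for each element, the number of sequences containing it at least once.
    # dict.fromkeys removes duplicates within a sequence while keeping order.
    sequence_counts = Counter()
    for sequence in sequences:
        sequence_counts.update(list(dict.fromkeys(sequence)))

    total = len(sequences)
    # Assumption: "below" is read as strictly less than noise_ratio.
    return [item for item, count in sequence_counts.items()
            if count / total < noise_ratio]


training = [['a', 'b'], ['a', 'c'], ['a', 'b', 'd'], ['a']]
print(noisy_items(training, 0.3))  # ['c', 'd'] (each appears in 1 of 4 sequences)
```

With `noise_ratio` set to 0, no element can fall below the threshold, which matches the documented "no noise" default.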

find_similar_sequences(sequence)

Find similar sequences.

A sequence X is similar to a sequence S if every element of S appears in X.

Parameters

sequence: list

Returns

similar_sequences: list: The list of similar sequences.
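
The similarity definition used by `find_similar_sequences` amounts to a subset test. The sketch below is only an illustration over plain lists: the function `similar_sequences` and its `training_sequences` argument are hypothetical, and the form in which the library returns its results is not shown here.

```python
def similar_sequences(training_sequences, sequence):
    """Hypothetical stand-in: keep training sequences containing every element of `sequence`."""
    target = set(sequence)
    # X is similar to S when the set of S's elements is a subset of X's elements.
    # Order and repetition of elements are ignored by this test.
    return [candidate for candidate in training_sequences
            if target.issubset(candidate)]


training = [['hello', 'world'],
            ['hello', 'this', 'is', 'me'],
            ['hello', 'me']]
print(similar_sequences(training, ['hello', 'me']))
# [['hello', 'this', 'is', 'me'], ['hello', 'me']]
```

Note that the test ignores element order, so `['me', 'hello']` would match the same training sequences.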

retrieve_sequence(index)

Retrieve sequence from the training data.

Parameters

index: int: Index of the sequence to retrieve.

Returns

sequence: list

Examples

>>> model = Cpt()
>>> model.fit([['sample', 'data'], ['should', 'not', 'be', 'retrieved']])
>>> model.retrieve_sequence(0)
['sample', 'data']
