The Cpt class
- class cpt.cpt.Cpt
Compact Prediction Tree class.
- Attributes
- split_length : int, default 0 (all elements are considered)
The split length is used to delimit the length of training sequences.
- noise_ratio : float, default 0 (no noise)
The threshold of frequency to consider elements as noise.
- MBR : int, default 0 (at least one update)
Minimum number of similar sequences needed to compute predictions.
- alphabet : Alphabet
The alphabet is used to encode values for Cpt. alphabet should not be used directly.
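The split_length attribute is only described briefly. The sketch below assumes (not confirmed by this page) that splitting keeps the last split_length elements of each training sequence and that the default 0 keeps every element. The function split_sequence is hypothetical and is not part of the library.

```python
from typing import Hashable, List, Sequence


def split_sequence(sequence: Sequence[Hashable], split_length: int) -> List[Hashable]:
    """Hypothetical helper: keep only the last `split_length` elements.

    Assumption: 0 means "keep everything", mirroring the documented default.
    """
    if split_length < 0:
        # Added for this sketch; the library's behavior for negative values is not documented here.
        raise ValueError("split_length should be non-negative")
    if split_length == 0:
        return list(sequence)
    return list(sequence[-split_length:])


training = [["a", "b", "c", "d"], ["e", "f"]]
print([split_sequence(s, 3) for s in training])  # [['b', 'c', 'd'], ['e', 'f']]
```

Sequences shorter than split_length are kept whole, since slicing past the start of a list simply returns the entire list.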
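Methods
- compute_noisy_items(noise_ratio): Compute noisy elements.
- find_similar_sequences(sequence): Find similar sequences.
- fit(sequences): Train the model with a list of sequences.
- predict(sequences, multithreading=True): Predict the next element of each sequence in sequences.
- predict_k(sequences, k, multithreading=True): Predict the k most likely next elements of each sequence in sequences, sorted by descending confidence.
- retrieve_sequence(index): Retrieve a sequence from the training data.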
- fit(sequences)
Train the model with a list of sequences.
The model can be retrained to add new sequences: model.fit(seq1); model.fit(seq2) is equivalent to model.fit(seq1 + seq2), where seq1 and seq2 are lists of sequences.
- Parameters
- sequences : list
A list of sequences of any hashable type.
- Returns
- None
Examples
>>> model = Cpt()
>>> model.fit([['hello', 'world'], ['hello', 'cpt']])
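The equivalence between incremental and batch fitting holds for any model whose training simply accumulates sequences in order. As an illustration only, the sketch below uses a hypothetical ToyIndex class (not the library's internals) that stores sequences and an inverted index from each element to the ids of the sequences containing it, and checks that two fit calls produce the same state as one.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set


class ToyIndex:
    """Hypothetical stand-in for a model that can be fitted incrementally."""

    def __init__(self) -> None:
        self.sequences: List[List[Hashable]] = []
        # element -> ids of the training sequences that contain it
        self.inverted_index: Dict[Hashable, Set[int]] = defaultdict(set)

    def fit(self, sequences: List[List[Hashable]]) -> None:
        for sequence in sequences:
            seq_id = len(self.sequences)  # ids keep counting across fit calls
            self.sequences.append(list(sequence))
            for element in sequence:
                self.inverted_index[element].add(seq_id)


seq1 = [["hello", "world"]]
seq2 = [["hello", "cpt"]]

incremental = ToyIndex()
incremental.fit(seq1)
incremental.fit(seq2)

batch = ToyIndex()
batch.fit(seq1 + seq2)

print(incremental.sequences == batch.sequences)            # True
print(incremental.inverted_index == batch.inverted_index)  # True
```

Because sequence ids continue from the current size of the store, the second call picks up exactly where the first left off.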
- predict(sequences, multithreading=True)
Predict the next element of each sequence in sequences.
- Parameters
- sequences : list
A list of sequences of any hashable type.
- multithreading : bool, default True
Whether to use multithreading for predictions.
- Returns
- predictions : list of length len(sequences)
The predicted elements.
- Raises
- ValueError
noise_ratio should be between 0 and 1. MBR should be non-negative.
Examples
>>> model = Cpt()
>>> model.fit([['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me'] ])
>>> model.predict([['hello'], ['hello', 'this']])
['me', 'is']
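How predict ranks candidates is not described on this page. As a rough illustration only, the sketch below uses a simplified frequency score: every training sequence similar to the query (one containing all of its elements) votes once for each distinct element that follows the point where all query elements have been seen. The names consequent and predict_one, the vote-once rule, and returning None when nothing can be predicted are assumptions; MBR, noise filtering, and the library's tie-breaking are not modeled.

```python
from collections import Counter
from typing import Hashable, List, Optional


def consequent(candidate: List[Hashable], query: List[Hashable]) -> List[Hashable]:
    """Return the part of `candidate` after every query element has been seen."""
    remaining = set(query)
    if not remaining:
        return list(candidate)
    for position, element in enumerate(candidate):
        remaining.discard(element)
        if not remaining:
            return list(candidate[position + 1:])
    return []  # candidate does not contain every query element


def predict_one(training: List[List[Hashable]], query: List[Hashable]) -> Optional[Hashable]:
    """Hypothetical scorer: each similar sequence votes once per distinct consequent element."""
    scores: Counter = Counter()
    for candidate in training:
        if set(query) <= set(candidate):  # candidate is similar to the query
            # dict.fromkeys deduplicates while keeping first-seen order.
            scores.update(list(dict.fromkeys(consequent(candidate, query))))
    if not scores:
        return None  # assumption: no similar sequence had a non-empty consequent
    return scores.most_common(1)[0][0]


training = [['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']]
print(predict_one(training, ['hello']))  # me
```

For ['hello'], 'me' follows 'hello' in two similar sequences while every other element follows it in only one, so it gets the highest score.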
- predict_k(sequences, k, multithreading=True)
Predict the k most likely next elements of each sequence in sequences, sorted by descending confidence.
- Parameters
- sequences : list
A list of sequences of any hashable type.
- k : int
Number of predictions to make per sequence, ordered by descending confidence.
- multithreading : bool, default True
Whether to use multithreading for predictions.
- Returns
- predictions : List[List[Any]] of dimension len(sequences) * k
The predicted elements.
- Raises
- ValueError
noise_ratio should be between 0 and 1. MBR should be non-negative.
Examples
>>> model = Cpt()
>>> model.fit([['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me'] ])
>>> model.predict_k([['hello']], 2)
[['me', 'this']]
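Turning per-candidate scores into k predictions ordered by descending confidence is a top-k selection. The sketch below is a hypothetical helper, top_k, applied to made-up score tables; it breaks ties by insertion order (Python's sort is stable), which need not match the library's tie-breaking, and it returns fewer than k items when fewer candidates exist, which is also an assumption.

```python
from collections import Counter
from typing import Hashable, List


def top_k(scores: Counter, k: int) -> List[Hashable]:
    """Return up to k candidates, highest score first; ties keep insertion order."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [element for element, _ in ranked[:k]]


# Made-up scores for two query sequences, one table per query.
per_query_scores = [
    Counter({"me": 2, "world": 1, "this": 1}),
    Counter({"is": 1, "me": 1}),
]
predictions = [top_k(scores, 2) for scores in per_query_scores]
print(predictions)  # [['me', 'world'], ['is', 'me']]
```

With one list of length k per query, the result has the len(sequences) * k shape described in Returns.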
- compute_noisy_items(noise_ratio)
Compute noisy elements.
An element is considered noise if the fraction of training sequences in which it appears at least once is below noise_ratio.
- Parameters
- noise_ratio : float
The threshold of frequency to consider elements as noise.
- Returns
- noisy_items : list
The noisy items.
- Raises
- ValueError
noise_ratio should be between 0 and 1.
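The noise rule can be stated directly: count, for each element, the number of training sequences containing it at least once, divide by the number of sequences, and flag the element when that fraction is strictly below noise_ratio. The function noisy_items_sketch below is a hypothetical stand-alone version that takes the training sequences explicitly; it is not the library's method.

```python
from collections import Counter
from typing import Hashable, List


def noisy_items_sketch(sequences: List[List[Hashable]], noise_ratio: float) -> List[Hashable]:
    """Return elements whose sequence frequency is below noise_ratio."""
    if not 0 <= noise_ratio <= 1:
        raise ValueError("noise_ratio should be between 0 and 1")
    if not sequences:
        return []
    support: Counter = Counter()
    for sequence in sequences:
        # dict.fromkeys counts each element once per sequence, in first-seen order.
        for element in dict.fromkeys(sequence):
            support[element] += 1
    total = len(sequences)
    return [element for element, count in support.items() if count / total < noise_ratio]


training = [["a", "b"], ["a", "c"], ["a", "b"]]
print(noisy_items_sketch(training, 0.5))  # ['c']
```

Here 'a' appears in 3/3 sequences, 'b' in 2/3 and 'c' in 1/3, so only 'c' falls below 0.5. With the default noise_ratio of 0 nothing is ever flagged, since no fraction is below 0.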
- find_similar_sequences(sequence)
Find similar sequences.
A sequence X is similar to a sequence S if every element of S is in X.
- Parameters
- sequence : list
The sequence S whose similar sequences are searched for in the training data.
- Returns
- similar_sequences : list
The list of similar sequences.
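The similarity definition is a subset test on the sets of elements, so element order in X does not matter. The sketch below applies it to an explicit list of training sequences; similar_sequences_sketch is a hypothetical name, and the library may search its own data structures rather than scanning a list.

```python
from typing import Hashable, List


def similar_sequences_sketch(
    training: List[List[Hashable]], sequence: List[Hashable]
) -> List[List[Hashable]]:
    """Return every training sequence X such that each element of `sequence` is in X."""
    wanted = set(sequence)
    return [candidate for candidate in training if wanted <= set(candidate)]


training = [["a", "b", "c"], ["c", "a"], ["b"]]
print(similar_sequences_sketch(training, ["a", "c"]))  # [['a', 'b', 'c'], ['c', 'a']]
```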
- retrieve_sequence(index)
Retrieve a sequence from the training data.
- Parameters
- index : int
Index of the sequence to retrieve.
- Returns
- sequence : list
The retrieved sequence.
Examples
>>> model = Cpt()
>>> model.fit([['sample', 'data'], ['should', 'not', 'be', 'retrieved']])
>>> model.retrieve_sequence(0)
['sample', 'data']
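Since values are encoded through the alphabet, retrieving a sequence plausibly means decoding stored codes back into the original values. The sketch below assumes exactly that; ToyAlphabet and ToyStore are hypothetical classes, not the library's Alphabet or storage, and they only illustrate an encode-on-fit, decode-on-retrieve round trip.

```python
from typing import Dict, Hashable, List


class ToyAlphabet:
    """Hypothetical bidirectional mapping between values and integer codes."""

    def __init__(self) -> None:
        self._codes: Dict[Hashable, int] = {}
        self._values: List[Hashable] = []

    def add(self, value: Hashable) -> int:
        # Assign the next free code the first time a value is seen.
        if value not in self._codes:
            self._codes[value] = len(self._values)
            self._values.append(value)
        return self._codes[value]

    def value(self, code: int) -> Hashable:
        return self._values[code]


class ToyStore:
    """Hypothetical store that keeps training sequences as lists of codes."""

    def __init__(self) -> None:
        self.alphabet = ToyAlphabet()
        self.encoded: List[List[int]] = []

    def fit(self, sequences: List[List[Hashable]]) -> None:
        for sequence in sequences:
            self.encoded.append([self.alphabet.add(v) for v in sequence])

    def retrieve_sequence(self, index: int) -> List[Hashable]:
        return [self.alphabet.value(code) for code in self.encoded[index]]


store = ToyStore()
store.fit([['sample', 'data'], ['should', 'not', 'be', 'retrieved']])
print(store.encoded)               # [[0, 1], [2, 3, 4, 5]]
print(store.retrieve_sequence(0))  # ['sample', 'data']
```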