The Cpt class
- class cpt.cpt.Cpt
Compact Prediction Tree class.
- Attributes
- split_length : int, default 0 (all elements are considered)
The split length limits the length of the training sequences.
- noise_ratio : float, default 0 (no noise)
The frequency threshold below which elements are considered noise.
- MBR : int, default 0 (at least one update)
Minimum number of similar sequences needed to compute predictions.
- alphabet : Alphabet
The alphabet is used to encode values for Cpt.
The alphabet attribute should not be used directly.
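The alphabet is described only as an encoder for Cpt values. As a rough illustration, the sketch below shows one common way such an encoder can work: each distinct hashable value gets a consecutive integer code. It also includes a split helper. ToyAlphabet and split are hypothetical names, not part of the cpt package, and the assumption that split_length keeps the most recent elements of a sequence is ours; this page only says it limits training sequence length.

```python
from typing import Dict, Hashable, List, Sequence


class ToyAlphabet:
    """Hypothetical encoder: assigns consecutive integer codes to hashable values."""

    def __init__(self) -> None:
        self._codes: Dict[Hashable, int] = {}
        self._values: List[Hashable] = []

    def encode(self, value: Hashable) -> int:
        # Register unseen values on first use, then reuse their code.
        if value not in self._codes:
            self._codes[value] = len(self._values)
            self._values.append(value)
        return self._codes[value]

    def decode(self, code: int) -> Hashable:
        return self._values[code]


def split(sequence: Sequence[Hashable], split_length: int) -> List[Hashable]:
    # Assumption: a positive split_length keeps only the last split_length
    # elements; 0 keeps the whole sequence (the documented default).
    if split_length > 0:
        return list(sequence[-split_length:])
    return list(sequence)


alphabet = ToyAlphabet()
codes = [alphabet.encode(v) for v in ['hello', 'world', 'hello']]
print(codes)  # [0, 1, 0]
print(split(['a', 'b', 'c', 'd'], 2))  # ['c', 'd']
```

Encoding values as small integers lets the model store and compare elements cheaply regardless of their original type.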
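Methods
- compute_noisy_items(noise_ratio): Compute noisy elements.
- find_similar_sequences(sequence): Find similar sequences.
- fit(sequences): Train the model with a list of sequences.
- predict(sequences[, multithreading]): Predict the next element of each sequence in sequences.
- predict_k(sequences, k[, multithreading]): Predict the next elements of each sequence in sequences, sorted by descending confidence.
- retrieve_sequence(index): Retrieve a sequence from the training data.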
- fit(sequences)
Train the model with a list of sequences.
The model can be retrained to add new sequences: model.fit(seq1); model.fit(seq2) is equivalent to model.fit(seq1 + seq2), where seq1 and seq2 are lists of sequences.
- Parameters
- sequences : list
A list of sequences of any hashable type.
- Returns
- None
Examples
>>> model = Cpt()
>>> model.fit([['hello', 'world'], ['hello', 'cpt']])
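The equivalence between repeated fit calls and a single call on the concatenated list follows naturally if training only appends sequences and indexes them in arrival order. The sketch below illustrates that idea with a hypothetical ToyIndex class that keeps the raw sequences plus an inverted index (item to sequence ids). It is a simplified illustration, not the cpt implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set


class ToyIndex:
    """Hypothetical stand-in that stores training sequences and an inverted index."""

    def __init__(self) -> None:
        self.sequences: List[List[Hashable]] = []
        # item -> ids of the training sequences that contain it
        self.inverted_index: Dict[Hashable, Set[int]] = defaultdict(set)

    def fit(self, sequences: Sequence[Sequence[Hashable]]) -> None:
        for seq in sequences:
            # Ids continue from the previous call, so training is incremental.
            seq_id = len(self.sequences)
            self.sequences.append(list(seq))
            for item in seq:
                self.inverted_index[item].add(seq_id)


seq1 = [['hello', 'world']]
seq2 = [['hello', 'cpt']]

incremental = ToyIndex()
incremental.fit(seq1)
incremental.fit(seq2)

batch = ToyIndex()
batch.fit(seq1 + seq2)

print(incremental.sequences == batch.sequences)  # True
print(dict(incremental.inverted_index) == dict(batch.inverted_index))  # True
```

Because sequence ids depend only on arrival order, both training paths produce identical state.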
- predict(sequences, multithreading=True)
Predict the next element of each sequence in sequences.
- Parameters
- sequences : list
A list of sequences of any hashable type.
- multithreading : bool, default True
Whether to use multithreading to compute predictions.
- Returns
- predictions : list of length len(sequences)
The predicted elements.
- Raises
- ValueError
noise_ratio should be between 0 and 1. MBR should be non-negative.
Examples
>>> model = Cpt()
>>> model.fit([['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']])
>>> model.predict([['hello'], ['hello', 'this']])
['me', 'is']
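This page does not describe how predictions are scored. As a simplified illustration only, the hypothetical predict_one below collects the training sequences that contain every query item. In each of them it takes the elements that follow the point where all query items have been seen, and it returns the most frequent such element. This is not the library's actual scoring, which this sketch does not reproduce and which also depends on noise_ratio and MBR.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def consequent(training_seq: Sequence[Hashable], query: Sequence[Hashable]) -> List[Hashable]:
    """Elements that follow the point where every query item has been seen."""
    remaining = set(query)
    if not remaining:
        return list(training_seq)
    for i, item in enumerate(training_seq):
        remaining.discard(item)
        if not remaining:
            return list(training_seq[i + 1:])
    return []


def predict_one(training: Sequence[Sequence[Hashable]], query: Sequence[Hashable]) -> Optional[Hashable]:
    query_items = set(query)
    scores: Counter = Counter()
    for seq in training:
        # A training sequence is similar if it contains every query item.
        if query_items <= set(seq):
            scores.update(consequent(seq, query))
    if not scores:
        return None
    # most_common keeps first-encountered order among equal counts.
    return scores.most_common(1)[0][0]


training = [['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']]
print([predict_one(training, q) for q in [['hello'], ['hello', 'this']]])  # ['me', 'is']
```

On the training data of the example above this sketch happens to return ['me', 'is'], but its tie-breaking is not guaranteed to agree with the library in general.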
- predict_k(sequences, k, multithreading=True)
Predict the next elements of each sequence in sequences, sorted by descending confidence.
- Parameters
- sequences : list
A list of sequences of any hashable type.
- k : int
Number of predictions to make per sequence, ordered by descending confidence.
- multithreading : bool, default True
Whether to use multithreading to compute predictions.
- Returns
- predictions : List[List[Any]] of dimension len(sequences) * k
The predicted elements.
- Raises
- ValueError
noise_ratio should be between 0 and 1. MBR should be non-negative.
Examples
>>> model = Cpt()
>>> model.fit([['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']])
>>> model.predict_k([['hello']], 2)
[['me', 'this']]
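Once each candidate next element has a confidence score, ordering by descending confidence and keeping the first k is a plain sort. The hypothetical top_k helper below assumes the scores are already available as a dict. This page does not specify how the library computes scores, breaks ties, or handles fewer than k candidates.

```python
from typing import Dict, Hashable, List


def top_k(scores: Dict[Hashable, float], k: int) -> List[Hashable]:
    """Return up to k elements sorted by descending score."""
    if k < 0:
        raise ValueError("k should be non-negative")
    # sorted() is stable even with reverse=True, so elements with equal
    # scores keep the order in which they were inserted into `scores`.
    return sorted(scores, key=scores.__getitem__, reverse=True)[:k]


scores = {'a': 1.0, 'b': 3.0, 'c': 3.0, 'd': 2.0}
print(top_k(scores, 3))  # ['b', 'c', 'd']
```

When fewer than k candidates exist, this sketch simply returns a shorter list.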
- compute_noisy_items(noise_ratio)
Compute noisy elements.
An element is considered noise if the fraction of sequences in which it appears at least once is below noise_ratio.
- Parameters
- noise_ratio : float
The frequency threshold below which elements are considered noise.
- Returns
- noisy_items : list
The noisy items.
- Raises
- ValueError
noise_ratio should be between 0 and 1.
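The noise rule above can be written directly. The stand-alone function below is a sketch that takes the training sequences as an explicit argument, an assumed signature for illustration, whereas the real method works on the fitted model. It returns the elements whose fraction of containing sequences is strictly below noise_ratio, counting each element at most once per sequence.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def compute_noisy_items(sequences: Sequence[Sequence[Hashable]], noise_ratio: float) -> List[Hashable]:
    if not 0 <= noise_ratio <= 1:
        raise ValueError("noise_ratio should be between 0 and 1")
    n_sequences = len(sequences)
    # Count each item at most once per sequence: the number of sequences containing it.
    support = Counter(item for seq in sequences for item in set(seq))
    return [item for item, count in support.items() if count / n_sequences < noise_ratio]


sequences = [['a', 'b'], ['a', 'c', 'c'], ['a', 'b']]
print(sorted(compute_noisy_items(sequences, 0.5)))  # ['c']
```

Here 'c' appears in one sequence out of three (1/3 < 0.5), so it is noise even though it occurs twice in that sequence.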
- find_similar_sequences(sequence)
Find similar sequences.
A sequence X is similar to a sequence S if every element of S is in X.
- Parameters
- sequence : list
The sequence for which to find similar sequences.
- Returns
- similar_sequences : list
The list of similar sequences.
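A common way to find every training sequence that contains all elements of a query is to intersect, for each query element, the set of sequences containing it. The sketch below does this with an inverted index built by build_inverted_index. Both helpers and their signatures are hypothetical illustrations, not the cpt API.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set


def build_inverted_index(sequences: Sequence[Sequence[Hashable]]) -> Dict[Hashable, Set[int]]:
    index: Dict[Hashable, Set[int]] = defaultdict(set)
    for seq_id, seq in enumerate(sequences):
        for item in seq:
            index[item].add(seq_id)
    return index


def find_similar_sequences(
    sequences: Sequence[Sequence[Hashable]],
    index: Dict[Hashable, Set[int]],
    query: Sequence[Hashable],
) -> List[List[Hashable]]:
    if not query:
        # Every sequence trivially contains all elements of an empty query.
        return [list(seq) for seq in sequences]
    # Keep only the ids that appear in the id set of every query item.
    ids = set.intersection(*(index.get(item, set()) for item in set(query)))
    return [list(sequences[i]) for i in sorted(ids)]


training = [['hello', 'world'], ['hello', 'this', 'is', 'me'], ['hello', 'me']]
index = build_inverted_index(training)
print(find_similar_sequences(training, index, ['me', 'hello']))
# [['hello', 'this', 'is', 'me'], ['hello', 'me']]
```

Order in the query does not matter under this definition, which is why the query is turned into a set.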
- retrieve_sequence(index)
Retrieve a sequence from the training data.
- Parameters
- index : int
Index of the sequence to retrieve.
- Returns
- sequence : list
The retrieved sequence.
Examples
>>> model = Cpt()
>>> model.fit([['sample', 'data'], ['should', 'not', 'be', 'retrieved']])
>>> model.retrieve_sequence(0)
['sample', 'data']
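A compact prediction tree is usually described as a prefix tree in which each training sequence is a path from the root, plus a lookup table mapping each sequence index to the last node of its path. Retrieval then walks parent links back to the root. The ToyPredictionTree class below is a hypothetical sketch of that idea, not the cpt implementation.

```python
from typing import Dict, Hashable, List, Optional, Sequence


class Node:
    def __init__(self, item: Optional[Hashable], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.children: Dict[Hashable, "Node"] = {}


class ToyPredictionTree:
    """Hypothetical prefix tree with a lookup table from sequence index to last node."""

    def __init__(self) -> None:
        self.root = Node(None, None)
        self.lookup: List[Node] = []

    def add(self, sequence: Sequence[Hashable]) -> None:
        node = self.root
        for item in sequence:
            # Reuse an existing child so shared prefixes are stored once.
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
        self.lookup.append(node)

    def retrieve_sequence(self, index: int) -> List[Hashable]:
        node = self.lookup[index]
        items: List[Hashable] = []
        # Walk parent links up to the root, then reverse the collected items.
        while node.parent is not None:
            items.append(node.item)
            node = node.parent
        return items[::-1]


tree = ToyPredictionTree()
for seq in [['sample', 'data'], ['should', 'not', 'be', 'retrieved'], ['sample', 'code']]:
    tree.add(seq)
print(tree.retrieve_sequence(0))  # ['sample', 'data']
```

Sequences 0 and 2 share the 'sample' node, so the common prefix is stored only once while each sequence stays recoverable through its lookup entry.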