LOLA refresher
LOLA requires comparing sets of intervals
Can we improve the efficiency to enable faster, larger-scale analysis?
If the subject list has no containment, identifying overlaps is fast:
binary search on interval starts, followed by backward steps
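A minimal sketch of this search, assuming half-open intervals [start, end) stored as two parallel lists sorted by start. The function name `overlaps_no_containment` is hypothetical and chosen for illustration only. Without containment the ends are sorted too, so the backward walk can stop at the first miss.

```python
from bisect import bisect_left

def overlaps_no_containment(starts, ends, q_start, q_end):
    """Return indices of intervals overlapping the query [q_start, q_end).

    Assumes intervals are sorted by start and none contains another,
    which implies the ends are sorted as well.
    """
    # Rightmost interval whose start lies before the query end.
    i = bisect_left(starts, q_end) - 1
    hits = []
    # Step backward; the first interval ending at or before q_start
    # proves that every earlier interval misses the query too.
    while i >= 0 and ends[i] > q_start:
        hits.append(i)
        i -= 1
    return hits[::-1]

starts = [1, 5, 10, 20]
ends = [4, 9, 15, 25]
print(overlaps_no_containment(starts, ends, 8, 12))  # [1, 2]
```

The early stop is only valid because of the no-containment assumption; a long interval hidden earlier in the list would be missed.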
The problem arises with contained interval overlaps
How can we improve efficiency without guaranteeing no containment?
Many approaches to solve the ‘containment’ issue:
- Nested Containment Lists (GRanges) (Alekseyenko and Lee, 2007; Aboyoun, Pages and Lawrence, 2012)
- R-trees (bedtools) (Kent et al., 2002; Quinlan and Hall, 2010)
- Augmented interval trees (Cormen et al., 2001)
These methods try to structure the data to provide non-containment guarantees
Methods provide non-containment guarantees
R-trees
Annotates tree nodes with a minimum bounding rectangle of elements. A query that does not intersect the bounding rectangle will not intersect any child element.
Nested Containment Lists
Augmented Interval List
Augment the list with the running maximum end value: solves the problem for lists with little containment
Decompose the list to minimize containment: extends the solution to highly contained lists
Augment with the running maximum end value, maxE
Provides a local guarantee of no containment.
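The augmentation can be sketched as follows, again assuming half-open intervals sorted by start. The names `build_max_e` and `query_augmented` are illustrative, not taken from the AIList code. The backward walk stops on maxE instead of on the individual end, so a containing interval earlier in the list is still found.

```python
from bisect import bisect_left
from itertools import accumulate

def build_max_e(ends):
    """Running maximum of end values: max_e[i] = max(ends[0..i])."""
    return list(accumulate(ends, max))

def query_augmented(starts, ends, max_e, q_start, q_end):
    """Return indices of intervals overlapping [q_start, q_end)."""
    i = bisect_left(starts, q_end) - 1
    hits = []
    # max_e[i] <= q_start guarantees no interval at or before i
    # reaches the query, so the walk can stop there.
    while i >= 0 and max_e[i] > q_start:
        if ends[i] > q_start:
            hits.append(i)
        i -= 1
    return hits[::-1]

# Interval 0 = [1, 20) contains intervals 1 and 2.
starts = [1, 3, 4, 10]
ends = [20, 5, 6, 12]
max_e = build_max_e(ends)
print(max_e)                                          # [20, 20, 20, 20]
print(query_augmented(starts, ends, max_e, 7, 9))     # [0]
```

In this example maxE stays at 20 for the whole list, so the walk visits every earlier element even though only one of them overlaps.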
AIList works on contained lists
But long containment runs are problematic
Decompose long runs with constant maxE
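One simplified way to sketch the decomposition, under stated assumptions: an interval is moved to a separate component when it contains at least `min_cov` of the next `lookahead` intervals. The criterion, the parameter names and their defaults are hypothetical choices for illustration and may differ from the actual AIList implementation. Each component gets its own maxE and is queried independently.

```python
from bisect import bisect_left
from itertools import accumulate

def decompose(intervals, min_cov=2, lookahead=4, max_components=10):
    """Split start-sorted (start, end) tuples into components.

    Intervals that contain many of their successors are pushed into
    later components, which shortens runs of constant maxE.
    """
    components, current = [], list(intervals)
    while current and len(components) < max_components - 1:
        keep, moved = [], []
        for i, (s, e) in enumerate(current):
            window = current[i + 1 : i + 1 + lookahead]
            # Starts are sorted, so end2 <= e means "contained in (s, e)".
            covered = sum(1 for _, end2 in window if end2 <= e)
            (moved if covered >= min_cov else keep).append((s, e))
        components.append(keep)
        current = moved
    if current:
        components.append(current)
    return components

def query_all(components, q_start, q_end):
    """Query every component with its own running maximum end."""
    hits = []
    for comp in components:
        starts = [s for s, _ in comp]
        max_e = list(accumulate((e for _, e in comp), max))
        i = bisect_left(starts, q_end) - 1
        while i >= 0 and max_e[i] > q_start:
            if comp[i][1] > q_start:
                hits.append(comp[i])
            i -= 1
    return sorted(hits)

data = [(1, 20), (3, 5), (4, 6), (10, 12), (11, 30), (13, 14)]
parts = decompose(data)
print(parts)                    # [[(3, 5), (4, 6), (10, 12), (11, 30), (13, 14)], [(1, 20)]]
print(query_all(parts, 7, 9))   # [(1, 20)]
```

With (1, 20) moved out, the query [7, 9) stops after a single step in the first component instead of walking back through the whole run. For brevity the sketch rebuilds `starts` and `max_e` on every query; a real index would compute them once per component.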
Datasets
How does it compare to existing approaches?
How does it scale as the subject list grows?
Conclusion and Directions
AIList is best-in-class for one-to-one interval comparisons