@startuml
' TODO: Discussion
' 1. Fingerprint: use Pair<Integer, Integer>, or create a new Fingerprint type so that shinglesList becomes List<Fingerprint>

' Entities
entity Document {
    ' Key
    String url;
    ' The rest are values
    ' Each element is <shingle_hash, shingle_pos>
    List<Pair<Integer, Integer>> shinglesList;
    Date publishDate;
}

' Controllers
class FingerprintController {
    CrawlerService crawlerService;
    FingerprintService fingerprintService;
    ' Endpoints: the Double in each response is the similarity ratio
    ResponseEntity<Double> checkURL(@PathVariable String url);
    ResponseEntity<Double> compareURLs(@PathVariable String url1, @PathVariable String url2);
    ResponseEntity<Double> checkText(@RequestBody String text);
    ' Note: Spring binds only one @RequestBody per handler, so text1 and text2 would need a single wrapper object
    ResponseEntity<Double> compareTexts(@RequestBody String text1, @RequestBody String text2);
}

' Services
class CrawlerService {
    ' Python: from newsplease import NewsPlease
    NewsPlease newsPlease;
    NewsPlease.Article crawl(String url);
}

class SimilarityService {
    ' The result must be in [0, 1]
    Double computeSimilarity(List<Pair<Integer, Integer>> fp1, List<Pair<Integer, Integer>> fp2);
}

class FingerprintService {
    DocumentRepository documentRepository;
    SimilarityService similarityService;
    ' Each result is <URL, Similarity>
    Pair<String, Double> findMaxSim(String url);
    List<Pair<String, Double>> findMaxKSims(String url, int k);
}

' Repositories -> what interacts with our database
interface DocumentRepository {
    ' Documents are keyed by url
    Document findById(String url);
    List<Document> findAll();
    Document save(Document document);
}
@enduml
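
The diagram leaves open how `computeSimilarity` and `findMaxKSims` would actually work. The Java sketch below is one possible reading, not the author's method: it assumes similarity is the Jaccard index over shingle hashes (positions are ignored), and it takes up the TODO's option of a dedicated `Fingerprint` type. The class name `SimilaritySketch` and the in-memory `stored` map standing in for `DocumentRepository` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SimilaritySketch {

    /** Hypothetical stand-in for the <shingle_hash, shingle_pos> pair in the diagram. */
    static final class Fingerprint {
        final int hash;
        final int pos;

        Fingerprint(int hash, int pos) {
            this.hash = hash;
            this.pos = pos;
        }
    }

    /**
     * Jaccard index over the sets of shingle hashes (an assumption, the
     * diagram does not name a measure). The result is always in [0, 1].
     */
    static double computeSimilarity(List<Fingerprint> fp1, List<Fingerprint> fp2) {
        Set<Integer> a = hashes(fp1);
        Set<Integer> b = hashes(fp2);
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0; // convention chosen for this sketch: no shingles, no evidence of overlap
        }
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    /** Collects the distinct shingle hashes of a fingerprint list. */
    private static Set<Integer> hashes(List<Fingerprint> fingerprints) {
        Set<Integer> out = new HashSet<>();
        for (Fingerprint f : fingerprints) {
            out.add(f.hash);
        }
        return out;
    }

    /**
     * Returns up to k <url, similarity> entries, most similar first.
     * The stored map (url -> fingerprints) stands in for the repository.
     */
    static List<Map.Entry<String, Double>> findMaxKSims(
            List<Fingerprint> query, Map<String, List<Fingerprint>> stored, int k) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, List<Fingerprint>> e : stored.entrySet()) {
            scored.add(Map.entry(e.getKey(), computeSimilarity(query, e.getValue())));
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        int limit = Math.min(Math.max(k, 0), scored.size());
        return new ArrayList<>(scored.subList(0, limit));
    }
}
```

With this reading, `findMaxSim` is just `findMaxKSims` with `k = 1`. Scoring every stored document is linear in the size of the collection, which is fine for a sketch but would likely need an index on shingle hashes for a large corpus.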