Indywiki Developer Notes
indywiki.py : GUI logic and algorithms
qrc_resources.py contains the images (encoded), so it is not necessary to ship them as separate files
config.py has the user options (language of last visit, geometry of the application plus some other variables)
web1/ contains wikipedia.py, which is part of the nodebox web library for python. wikipedia.py does many of the queries to the wikipedia API.
def main(): creates and displays the GUI (through Ui_MainWindow()). It sets up:
toolbar with buttons (lineedit, combobox, etc)
other widgets (textBrowsers, etc)
SIGNAL-SLOT connections (eg when a button is pressed, what function is run)
Most of the widgets are created in their own classes (subclasses of the main Qt widgets), which are found in the second half of indywiki.py (eg MYLINEEDIT(), MyTextBrowser2() etc)
wikipedia.search is part of the nodebox web library for python. Indywiki uses it to search for an article. It returns: the article's text (unicode), the article's links (list of unicode strings), the article's translations (list of unicode strings), and the categories the article belongs to (list of unicode strings). More specifically, it gives the article's text either as a whole or split into paragraphs.
When the user enters a query, first_click() is called. It stores the value plus the language (Wikipedia site) on a list (the Back button needs this), sets the mouse pointer to busy, does a quick sanitization (replace(' ', '_')) and calls the class first_click_zero().
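The bookkeeping part of first_click() can be sketched in plain Python as follows. This is only an illustration: the names history and prepare_query are hypothetical and not taken from indywiki.py, and the Qt parts (busy cursor, starting first_click_zero()) are left out.

```python
# Hypothetical sketch; these names are not taken from indywiki.py.
history = []  # (query, language) pairs, kept so a Back button can step back


def prepare_query(query, language):
    """Remember the raw query and return the sanitized article name."""
    history.append((query, language))
    # The quick sanitization mentioned in the notes: spaces become underscores.
    return query.replace(' ', '_')
```

For example, prepare_query('Samos Island', 'en') returns 'Samos_Island' and leaves ('Samos Island', 'en') as the last entry of history.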
first_click_zero() performs the article search through wikipedia.search. If it gets a reply, it emits a signal and goes to first_clicka(). If it doesn't find any result, it tries to get the article's name by requesting Special:Search/article in the url and following the redirect (Wikipedia redirects a misspelled article name to the actual article, if it exists). If this fails, it emits a signal and goes to no_results(). If there is a wikipedia.search error, or there are network connection problems, it emits a signal for no_results_two(), which shows a related message on the statusbar. no_results(), on the other hand, performs a full search on Wikipedia and displays a list of the first results, so the user can choose which article she was looking for.
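The order of the fallbacks above might look roughly like this. It is a sketch under assumptions, not the code in indywiki.py: search and follow_redirect are hypothetical callables standing in for wikipedia.search and the Special:Search redirect lookup, IOError stands in for whatever errors the real code catches, and searching again with the redirected name is a guess at how the redirect is used.

```python
def find_article(name, search, follow_redirect):
    """Return (outcome, article) following the fallback order in the notes."""
    try:
        article = search(name)
        if article is not None:
            return ('found', article)          # would go on to first_clicka()
        # No direct hit: ask for the name Wikipedia redirects to, if any.
        redirected = follow_redirect(name)
        if redirected is not None:
            article = search(redirected)       # assumption: search again
            if article is not None:
                return ('found', article)
    except IOError:
        # Search or network failure (assumed exception type).
        return ('error', None)                 # would go on to no_results_two()
    return ('no_results', None)                # would go on to no_results()
```

In the application each outcome is delivered as a Qt signal rather than a return value; the tuple here only makes the three branches easy to see.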
first_clicka() saves the application geometry in the user's config file, clears the textBrowsers, increases the progressBar and starts displaying content: the paragraph titles, the first paragraph's text (a sort of summary of the article, according to Wikipedia's article guidelines) and the links found on the article's page (the first 500). It then emits a signal to go on with display_images().
display_images() uses getImages() to get a dict with info on the article's images (title, url, width and height), if there are any images. If there are ten images or more, it calls Download_native() (shown later) in ten threads to start downloading and displaying them, since we have ten buttons on the GUI. If 0<n<10 images are found, Download_native() is called n times. Then it emits a signal to go on with display_images_2() and displays a message on the statusbar.
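The "ten threads, or n threads if fewer images" rule can be sketched like this. threading.Thread is used as a stand-in for the thread classes in indywiki.py, and download_one is a hypothetical placeholder for Download_native().

```python
import threading

BUTTONS = 10  # the notes mention ten image buttons on the GUI


def start_native_downloads(images, download_one):
    """Start one thread per image, for at most BUTTONS images."""
    count = min(len(images), BUTTONS)
    threads = []
    for x in range(count):
        # download_one(x) is a hypothetical stand-in for Download_native().
        t = threading.Thread(target=download_one, args=(x,))
        t.start()
        threads.append(t)
    return threads
```

With 14 images this starts threads for x = 0..9 only; with 3 images it starts exactly 3.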
display_images_2() sends a query to the Wikipedia API and gets the list of backlinks (articles that link back to our article). This is used to estimate the weight of each link, so that links with bigger weights are more closely related to our initial article. After it gets the backlinks, it calls filter_links(), which estimates these weights. If the article has fewer than 10 images, get_first_ten() is called, which tries to display the most closely related links until the ten buttons on the GUI are all filled. Otherwise, the program waits for the user to act before doing anything else.
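For reference, a backlinks query against the MediaWiki API can be built as below. The notes do not say which parameters indywiki actually sends, so the limit of 500 and the JSON format are assumptions, and backlinks_url is a hypothetical helper. The sketch uses the Python 3 standard library (Python 2 has urllib.urlencode instead) and only builds the URL, it does not fetch it.

```python
from urllib.parse import urlencode  # Python 3; Python 2 has urllib.urlencode


def backlinks_url(title, language='en', limit=500):
    """Build a MediaWiki API URL that lists pages linking to title."""
    params = [
        ('action', 'query'),
        ('list', 'backlinks'),
        ('bltitle', title),     # page whose backlinks we want
        ('bllimit', limit),     # assumed limit, not taken from indywiki.py
        ('format', 'json'),     # assumed output format
    ]
    return 'https://%s.wikipedia.org/w/api.php?%s' % (language, urlencode(params))
```

The language argument selects the Wikipedia site, in the same way the application remembers a language next to each query.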
get_first_ten() tries to fetch and display the most closely related links, in order to fill the ten buttons on the GUI. Each time, it takes the first article from the list sorted_list, which was prepared by filter_links(), and starts a thread with Download(). It keeps doing this until all ten buttons are filled, and then waits for the user to act.
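Leaving the threads aside, the "keep going until the ten buttons are full" logic can be sketched sequentially. fill_buttons and has_image are hypothetical names; has_image stands for the check that Download() does through getImages().

```python
def fill_buttons(sorted_list, has_image, already_filled=0, buttons=10):
    """Pick articles from sorted_list, best first, until all buttons are filled."""
    chosen = []
    for article in sorted_list:
        if already_filled + len(chosen) >= buttons:
            break                    # every button has something to show
        if has_image(article):       # hypothetical check, e.g. via getImages()
            chosen.append(article)   # articles without a usable image are skipped
    return chosen
```

If the article itself supplied 4 native images, already_filled=4 makes the loop stop after 6 related articles with images have been found.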
Download() uses getImages() to see if the article has any images that fulfill our criteria. If it finds one, it downloads it (urllib2.urlopen()) and calls download(), which sends a signal that the image is ready to display. Download() itself can't display the image (although this would save many lines of code), because the program would crash: a golden rule of Qt programming is that changes to the GUI are permitted only from the main thread, otherwise the GUI crashes with Xlib errors. On the other hand, if Download() weren't started as a separate thread and were called within a for loop, the program wouldn't search for and display the images asynchronously. Moreover, the GUI would hang while the data was being downloaded, misleading users into thinking the application had crashed. This behavior doesn't happen now. Download() does some other things too: for example, it makes sure an article doesn't get displayed more than once, and that we don't end up with what are more or less lists (small and incomplete articles that have very few links and backlinks). We keep feeding the initial list.
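The "workers fetch, only the main thread displays" rule can be shown without Qt. In indywiki the hand-off is done with signals; in this sketch a queue.Queue plays that role (Python 3 module name; Python 2 calls it Queue). worker, run, fetch and show are hypothetical names, and error handling in the workers is left out.

```python
import queue
import threading

ready = queue.Queue()  # workers put results here; only the main thread reads


def worker(button, fetch):
    """Runs in a background thread: fetch the data, never touch the GUI."""
    ready.put((button, fetch(button)))


def run(buttons, fetch, show):
    """Start one worker per button and display results from the calling thread."""
    threads = [threading.Thread(target=worker, args=(b, fetch)) for b in buttons]
    for t in threads:
        t.start()
    for _ in threads:
        # The calling (main) thread is the only place where show() is called.
        button, data = ready.get()
        show(button, data)
    for t in threads:
        t.join()
```

Results are shown in whatever order the downloads finish, which is the asynchronous behavior the notes describe.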
download() emits a signal and calls show_nonnative()
show_nonnative() is called when Download(), with the help of download(), has found and downloaded the correct image (as far as show_nonnative() cares, an image plus the number of the button it will be displayed on). It shows the image (QtGui.QPixmap()), sets its tooltip to the image's title and increases the progressBar.
Download_native(), which was called by display_images(), downloads image number x, where x is passed as an argument. Download_native() expects the image to be described by a list (that includes the image's url, size etc) if it is a native image, or by a tuple if it is a non_native image. The second was added to facilitate the Back button (it stores the images' info in the images' dictionary, plus the names of the articles they belong to). Again, when this class has downloaded the image, it calls another one, Downloadd_native() this time, which emits the signal.
Downloadd_native() emits the signal that image x is ready to be displayed (and calls show_native())
show_native(), which was called by Downloadd_native() (which Download_native() called), shows the image (QtGui.QPixmap()), sets its tooltip to the image's title and increases the progressBar.
filter_links(), which was called by display_images_2(), filters the links and calculates the weights. It first tests whether it has what it needs (the list of articles found in each category the article belongs to, the list of links and the list of backlinks). It then assigns the weights using an algorithm that is documented within the code. It returns a list of links.
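The real weighting algorithm lives in the comments of indywiki.py and is not reproduced here. Purely to show how the three inputs could be combined, here is a made-up scoring scheme: rank_links and both scoring rules are assumptions, not the indywiki algorithm.

```python
def rank_links(links, backlinks, category_members):
    """Return links sorted by a made-up relatedness score, highest first.

    category_members holds one list of article titles per category the
    current article belongs to.
    """
    backlink_set = set(backlinks)
    scored = []
    for link in links:
        weight = 0
        if link in backlink_set:
            weight += 2      # assumption: a mutual link suggests a close relation
        # assumption: one point per category shared with the current article
        weight += sum(1 for members in category_members if link in members)
        scored.append((weight, link))
    # The sort is stable, so links with equal weight keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [link for weight, link in scored]
```

The returned list plays the role of sorted_list: the first entries are the ones tried first when filling the buttons.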
get_next_ten() can be called after the first ten images are displayed. If there are native images still to be displayed, it calls Download_native(), which takes care of them. Otherwise it calls Download(), passing as argument the number (in the dictionary) of the image with the biggest weight. Download() will try to find and display the first image of that article, if it exists, or it will go on to the next entry. get_next_ten() can be called as many times as we like.
genButtonHandler() is called every time a button is pressed (an image button). It calls display_native_size() when a button that has a native image is clicked, otherwise it clears the lineEdit, inserts the new article and then calls first_click(). genButtonHandler() knows that an image is native if it is a list, or that it is a non_native if it is a tuple. For example,
['Samos City', 'http://upload.wikimedia.org/wikipedia/commons/thumb/a/ad/Samos.jpg/250px-Samos.jpg', '250', '188'] is a list (a native image, found on the article 'Samos Island'). The entries are: image title, image url, image width, image height.
(u'Icaria', ['The village of Armenistis in the north between Nas and Evdilos', 'http://upload.wikimedia.org/wikipedia/commons/thumb/8/8a/Armenist%C3%ADs_ikar%C3%ADa.JPG/250px-Armenist%C3%ADs_ikar%C3%ADa.JPG', '250', '188']) on the other hand is a tuple and thus a non_native image (when we are searching at 'Samos Island' article and click on this, we get directed to article 'Icaria'). The entries are: article, and then a list with info for the article's first image.
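The list-versus-tuple convention can be checked with isinstance. describe_button is a hypothetical helper, not a function from indywiki.py; it treats the last two list entries as width and height, as in the getImages() description.

```python
def describe_button(entry):
    """Classify a button entry by its type: list = native, tuple = non_native."""
    if isinstance(entry, list):
        title, url, width, height = entry
        return ('native', title, url)          # would open the full-size image
    if isinstance(entry, tuple):
        article, image_info = entry
        return ('non_native', article, image_info[1])  # would load that article
    raise TypeError('unexpected entry: %r' % (entry,))
```

For the two examples above, the first one comes back as ('native', 'Samos City', ...) and the second as ('non_native', u'Icaria', ...).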
display_native_size(), which is called by genButtonHandler() when a button gets clicked, takes the image's url and the image's title as arguments. It then downloads the image, creates a QDialog() and displays the image in its own window at normal size (not as a thumbnail).
showTranslations() is called when the relevant button on the toolbar is pressed. It displays a message if the article we are looking at is found on other Wikipedia sites. If it exists, it gives us the option to choose a Wikipedia site, through a combobox and get redirected there.
WIKIPEDIA DATA
We get most Wikipedia data using the mediawiki API:
-number_of_ten = 0 after the first ten images are shown. Each time the next button is pressed, it gets increased by one.