Where do I start editing on Kannada Wikipedia?

Many people keep asking this question on Facebook, at meetups, and elsewhere. I have compiled a quick list of pointers for getting started with editing on Kannada Wikipedia. I hope it helps.

To start editing Wikipedia, first create an account on Kannada Wikipedia (http://kn.wikipedia.org). An account lets you track all your modifications and contributions on the wiki.

If you already have an account, start by searching for information on any subject or topic that interests you. If you find an article on it, well and good; see whether you can improve it. If you don’t find one, you will be prompted to create the article. This is the simplest way to start editing on Kannada Wikipedia.

If you have already started editing on Wikipedia, you can help us (the Kannada Wikipedia community) with a few of our existing projects, listed below.

  • We are also in the process of starting WikiProject Karnataka 1000. This project used to be on English Wikipedia; this time we are trying to make it a multilingual wiki project so that Wikipedians working in other languages can coordinate too. The project aims to improve the organisation of articles related to the Indian state of Karnataka and give them a standardised look and feel. See Wikipedia:WikiProject Karnataka 1000 on English Wikipedia.

These are a few projects through which you can kick-start your editing on Kannada Wikipedia. If you need any help, please drop a message here, on the Kannada Wikipedia Facebook group, or on the Kannada Wikipedia mailing list. You can also attend our meetups, workshops, or Google Hangout sessions, which happen at least once a month.

Analysis of Indian language Wikipedias for 2012

Having been an ardent believer in the power of Wikipedia to spread knowledge, I have tracked its metrics with keen interest over the past few years. The Wikipedia software and servers log each and every action, resulting in a huge mine of metrics. While WMF produces monthly reports of various metrics, I was unhappy at not having a single number to represent the impact of the different actions connected with Wikipedia. Metrics based on edits or page views alone are useful, but they give only a partial, short-term view. As each of us intuitively understands that the number of page views and the number of edits are correlated, I derived a metric called ‘Activity’ as their product. As edits could come from humans as well as bots, I have used total database edits for the calculation. The annual change in this Activity metric is a good way to understand the status of a Wikipedia. I did the analysis for the top 8 Indian language Wikipedias (excluding English), ranked by December 2012 page views, to understand their status for the year 2012. The charts below show them in the same order, and the data and change highlights follow.

 

Language    2012 page views (M)   2012 DB edits (k)   2011 page views (M)   2011 DB edits (k)
Hindi       83.7                  167.0               78.1                  558.3
Tamil       57.4                  238.0               44.2                  231.0
Marathi     41.0                  131.7               47.4                  190.0
Malayalam   34.8                  188.0               34.6                  186.0
Bengali     32.9                  120.8               30.0                  119.5
Telugu      28.8                  72.5                29.9                  84.0
Urdu        19.5                  116.0               16.3                  81.0
Kannada     16.9                  45.6                16.2                  50.2

Indian language Wikipedias: change in Activity (page views × DB edits)

Hindi, Marathi, Telugu and Kannada recorded negative growth in Activity, while Urdu, Tamil, Bengali and Malayalam recorded positive growth. Hindi fell by more than 67.4%, and Urdu grew by more than 71.32%. We can gain better insight by looking at how each parameter changed, as shown in the next picture.
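The Activity figures can be recomputed from the table. The following is a minimal Python sketch, assuming the rounded table values are the only input; the names `DATA`, `activity` and `activity_change_percent` are made up for this illustration, and results computed from rounded data may differ slightly from the percentages quoted in the text.

```python
# Page views in millions and database edits in thousands, copied from the table.
# Tuple layout: (2012 page views, 2012 DB edits, 2011 page views, 2011 DB edits)
DATA = {
    "Hindi": (83.7, 167.0, 78.1, 558.3),
    "Tamil": (57.4, 238.0, 44.2, 231.0),
    "Marathi": (41.0, 131.7, 47.4, 190.0),
    "Malayalam": (34.8, 188.0, 34.6, 186.0),
    "Bengali": (32.9, 120.8, 30.0, 119.5),
    "Telugu": (28.8, 72.5, 29.9, 84.0),
    "Urdu": (19.5, 116.0, 16.3, 81.0),
    "Kannada": (16.9, 45.6, 16.2, 50.2),
}


def activity(page_views, db_edits):
    """Activity metric as defined in this post: page views times DB edits."""
    return page_views * db_edits


def activity_change_percent(row):
    """Percentage change in Activity from 2011 to 2012 for one table row."""
    pv_2012, edits_2012, pv_2011, edits_2011 = row
    old = activity(pv_2011, edits_2011)
    new = activity(pv_2012, edits_2012)
    return (new - old) / old * 100.0


changes = {lang: activity_change_percent(row) for lang, row in DATA.items()}

# Print the languages from the largest fall to the largest rise.
for lang, pct in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{lang:10s} {pct:+.1f}%")
```

Because the metric is a product, a large swing in either factor dominates the result, which is why the per-parameter view is still needed to interpret it.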

 

Tamil logged the highest increase in page views at 29.86%, while Marathi fell by 13.5%. Hindi edits fell drastically, by around 70%, while Kannada recorded the smallest fall at 9.16%. Urdu edits grew the most at 43.21%, while Malayalam recorded the smallest positive growth at 1.08%.
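The per-parameter changes follow from the same table with a simple percentage-change formula. This is a small Python sketch under the assumption that the rounded table values are used as-is; the dictionary and function names are hypothetical.

```python
# (2012 value, 2011 value) per language, copied from the table in this post.
PAGE_VIEWS_M = {
    "Hindi": (83.7, 78.1), "Tamil": (57.4, 44.2), "Marathi": (41.0, 47.4),
    "Malayalam": (34.8, 34.6), "Bengali": (32.9, 30.0), "Telugu": (28.8, 29.9),
    "Urdu": (19.5, 16.3), "Kannada": (16.9, 16.2),
}
DB_EDITS_K = {
    "Hindi": (167.0, 558.3), "Tamil": (238.0, 231.0), "Marathi": (131.7, 190.0),
    "Malayalam": (188.0, 186.0), "Bengali": (120.8, 119.5), "Telugu": (72.5, 84.0),
    "Urdu": (116.0, 81.0), "Kannada": (45.6, 50.2),
}


def percent_change(new, old):
    """Annual change of one metric, as a percentage of the earlier value."""
    return (new - old) / old * 100.0


pv_change = {lang: percent_change(*pair) for lang, pair in PAGE_VIEWS_M.items()}
edit_change = {lang: percent_change(*pair) for lang, pair in DB_EDITS_K.items()}

# Languages with the extreme changes for each parameter.
print("Largest page view rise:", max(pv_change, key=pv_change.get))
print("Largest page view fall:", min(pv_change, key=pv_change.get))
print("Largest edit rise:", max(edit_change, key=edit_change.get))
print("Largest edit fall:", min(edit_change, key=edit_change.get))
```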

 

Indic WP page views and DB edits (annual change)

As these metrics cover a year, we need to look at the events of 2011 and 2012 that could have caused these changes. Looking at the edits, Hindi recorded the largest drop. I think that a lot of the Hindi edits in 2011 were made by bots, unlike in 2012.

Over these two years, there was a good amount of activity by the Wikimedia Foundation and the Wikimedia Chapter to increase awareness. Wiki Conference India, which took place in November 2011, was a major event. “Wiki Loves Monuments” was a major event in 2012, though it may not affect Wikipedia metrics as it is a photography-focused event. Tamil Wikipedians ran a photography contest and Malayalam Wikipedians held a conference in 2012. Google ran a Wikipedia translation program in 2010 and 2011 in Hindi, Tamil, Telugu and Kannada, with Hindi being a main target. Tamil had a Wikipedia-focused event in 2010 itself as part of the World Tamil Conference. We had monthly meetups in three to four cities and held several wiki academies in 2012. The education initiative of WMF in India was focussed on English Wikipedia and hence can be ignored for our study.

In general, Indian language Wikipedias had a small number of active and very active editors, much below the critical point for any useful collaboration. Based on my knowledge of the Wikipedia communities, Telugu and Kannada had insignificant collaboration among Wikipedians. The Hindi and Marathi Wikipedia communities had conflicts that were escalated to Stewards or others for resolution. Hindi, Tamil, Bengali, Urdu and Malayalam can be considered to receive a lot more contribution from across the world, either because they are official languages or because a significant population of their speakers resides abroad.

Another factor to consider is the popularity of social media such as Twitter, Facebook and blogs over the last two years, which can work against the growth of Wikipedia. But this is a major factor at the global level as well and can be ignored for the current study.

According to the recent IAMAI survey reports on computer and Internet use in India, we have over 224 million computer literates in a population of 1.2 billion, and 150 million Internet users (12%). Among them, 45 million people use native languages, accounting for 64% of rural users and 25% of urban users. Indic language input cannot be a main reason either, as phonetic and native input methods were launched by Google and Microsoft way back in 2007.

The above assessment and analysis lead me to the following views.

a) Lack of computers and computer literacy is no longer a major deterrent to Indian language Wikipedia growth. At the present level of computer and Internet access, around 50 million middle-class Indians already have access.

b) The increasing use of English as the medium of education from the primary level is a major problem for the growth of Indian language Wikipedias. As per the ASER 2012 report, the number of primary and secondary school children studying in English medium is going to cross the halfway mark by 2018.

c) Engineers and professionals who have access to computers and the Internet have not been able to drive the growth. People from the humanities and social sciences need to get access to computers, smartphones and the Internet. This can increase the diversity required for Wikipedia. What we need is the availability of low-cost tablet computers for every school and college, and a revised curriculum that utilises computers and Wikipedia in all disciplines.

Do you agree? Share your views by commenting…

Kerala hosts WikiSangamolsavam: first Indic Wikiconference!

Of the 20 Indic language Wikipedia projects, the Malayalam Wikipedia (ml.wikipedia.org) is one of the most vibrant. With about 35 million Malayalam speakers, it is the biggest Indic language community with over 100 Wikipedians. The latest feather in their cap is the recently concluded WikiSangamolsavam conference on April 28-29, 2012. WikiSangamolsavam, a two day event organized in the city of Kollam, was the first Indic language Wikipedia conference ever and witnessed over 100 participants from different parts of the state and country.

A veteran Malayalam Wikipedian, Viswa Prabha, recalls, “Every year, many editors from Malayalam Wikimedia Community attend Wikimania, the annual conference of Wikimedians. Inspired by the activities at Wikimania a few active Malayalam Wikimedians thought of planning a similar conference in Kerala.” Many Malayalam Wikimedians also participated in WikiConference, 2011 Mumbai, after which the idea of organizing a conference was put forward in mailing lists, the Facebook group, and other discussion forums. Since Wikimedians from Kollam took up the initiative, Kollam was chosen as the venue.

Over 30 Malayalee Wikipedians were involved in different stages of organizing the conference, such as managing the venue, food, accommodation, financial resources and registration. What was proposed as an idea in 2009 took 3 years to materialize, but it turned into a wonderful experience! “The event celebrated the achievements of the Malayalam community, planned new projects as a community and welcomed more Malayalees to the community. E-malayalam, free and open knowledge, copyright and cyber-freedom were the highlights of the conference this year”, said Kannan Shanmugam, a teacher based in Kollam.

Takeaways from the conference? As Netha Hussain, a medical student and Wikipedian points out, “The high point for me was the parallel Wiki Vidyarthi Sagaman (Wiki students’ meet) where school students were taught to edit Wikipedia. Among the 100+ participants of the conference, a few new editors got valuable insights about Wikiprojects from the paper presentations and discussions during the conference. The existing editors got to meet their friends/fellow Wikimedians whom they had only known online. The paper presentations and discussions brought up new ideas that could be worked upon in the future to enrich ml-Wikipedia’s content. At a larger level, partnerships were explored with IT@School (a government initiative) and Wiki activities were highlighted in the local as well as national media.”

Barry Newstead, Chief Global Development Officer at the Wikimedia Foundation, who also attended the conference, wrote in his blog post, “What was encouraging about my visit was that I saw that this isn’t some naive dream…The Malayalam community served as a real inspiration. Over the past 4 years, they have built a passionate community that has expanded their Wikipedia from 5,700 to 23,000 articles.”

The journey hardly rests at the conference. In the week immediately after the conference, there were meet-ups in 3 different towns – Thrissur, Palakkad and Thiruvananthapuram. Also, community members have been working on initiatives around GLAM and education – and have collaboratively developed proposals for both. Preliminary meetings have already started with a number of museums and a proposal has been submitted to the Keralam – Museum of History and Heritage. Discussions have been initiated with the IT@Schools department of the Kerala government and a formal proposal to introduce Wikipedia as a teaching and learning tool in the 7th – 8th standard will be submitted shortly.

As Shiju Alex, Indic language consultant for Wikimedia Foundation articulates, “These are people who contribute to Wikipedia to share free knowledge but also to keep traditions alive and preserve the language they love. This movement requires young and old, teachers, doctors, engineers, linguists, researchers, writers, bloggers, lawyers, photographers and students. I hope what has started with the conference infuses new enthusiasm in the community and takes it to new heights!”

Numerals in Indic Languages & Indic Wikipedias

As all of you know, most Indic languages use unique scripts for writing the language while some share a common script, such as Devanagari (for Hindi, Marathi, Sanskrit, Nepali and Bhojpuri).

A very interesting phenomenon in Indic languages is the usage of numerals. Every Indic language uses its own script for text, but the situation is very different when someone needs to write numerals. Even though most Indic scripts have their own unique glyphs for numerals (see the following table), many writers use Arabic numerals (or Indo-Arabic numerals) instead of the script’s own numeral glyphs. (For those who do not know, the official name for the 0, 1, 2, 3…9 that we use in our daily lives is Arabic numerals! They go by many names: Arabic numerals, West Arabic numerals, Hindu numerals, Indo-Arabic numerals and Hindu-Arabic numerals, to name a few :) , but many of us refer to them as English, Roman or international numerals.)

Here are the numerals of Indic languages from the most popularly used numeral systems.

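Arabic/West Arabic/Indo-Arabic   0 1 2 3 4 5 6 7 8 9
Asomiya (Assamese)               ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
Bangla                           ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
Devanagari                       ० १ २ ३ ४ ५ ६ ७ ८ ९
Gujarati                         ૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯
Gurmukhi                         ੦ ੧ ੨ ੩ ੪ ੫ ੬ ੭ ੮ ੯
Kannada                          ೦ ೧ ೨ ೩ ೪ ೫ ೬ ೭ ೮ ೯
Malayalam                        ൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯
Oriya                            ୦ ୧ ୨ ୩ ୪ ୫ ୬ ୭ ୮ ୯
Tamil                            ௦ ௧ ௨ ௩ ௪ ௫ ௬ ௭ ௮ ௯
Telugu                           ౦ ౧ ౨ ౩ ౪ ౫ ౬ ౭ ౮ ౯
Urdu                             ۰ ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹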

Those of you familiar with any one of these languages will quickly realize that many of us do not use these numerals in our daily life, as Arabic numerals are now the norm in many languages. There are some exceptions, though; the next few sections give more details.
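In Unicode, each of these scripts has its ten digits at consecutive code points, so converting between Arabic numerals and a script's own numerals is a fixed offset. The following Python sketch illustrates the idea; the table name `ZERO_CODE_POINT` and the function `to_native_digits` are hypothetical names chosen for this example, and the Assamese and Bangla rows share one entry because both use the Bengali digit block.

```python
# Unicode code point of the digit zero in each script; digits 1-9 follow it.
ZERO_CODE_POINT = {
    "Bangla/Asomiya": 0x09E6,
    "Devanagari": 0x0966,
    "Gujarati": 0x0AE6,
    "Gurmukhi": 0x0A66,
    "Kannada": 0x0CE6,
    "Malayalam": 0x0D66,
    "Oriya": 0x0B66,
    "Tamil": 0x0BE6,
    "Telugu": 0x0C66,
    "Urdu": 0x06F0,  # Extended Arabic-Indic digits
}


def to_native_digits(text, script):
    """Replace ASCII digits 0-9 in text with the digits of the given script."""
    zero = ZERO_CODE_POINT[script]
    table = {ord("0") + d: chr(zero + d) for d in range(10)}
    return text.translate(table)


# Print the ten digits of every script in the table.
for script in ZERO_CODE_POINT:
    print(f"{script:15s} {to_native_digits('0123456789', script)}")
```

Python's built-in `int()` accepts these digits directly, so a string produced by `to_native_digits("2012", "Kannada")` still parses back to the integer 2012.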

Current status of numerals in Indic languages

All Dravidian languages except Kannada (to some extent) have transitioned to Arabic numerals, with the media, the printing industry and entertainment almost entirely using them. Even school textbooks have gone down this road. Tamil, Telugu, and Malayalam now use Arabic numerals almost exclusively for everything. The majority of the current generation of speakers of these languages cannot even identify the numerals of their own mother tongue. The Wikipedias in these languages (Tamil, Telugu, and Malayalam) have also moved completely to Arabic numerals.

Kannada is an exception in this regard. Having lived in Bangalore for many years, I have seen Kannada numerals used for bus numbers and elsewhere. (It is actually thanks to this use of Kannada along with English in public spaces that I learned to read and write Kannada quickly, numerals included!) Some Kannada textbooks, such as the 10th standard textbooks, also use Kannada numerals. (Incidentally, isn’t it wonderful that you can download these for free from an official website!) However, the situation is different in the media (print, online or TV), where most outlets use Arabic numerals.

So while the Malayalam, Tamil and Telugu Wikipedias use Arabic numerals, Kannada Wikipedia uses its own numerals. However, I have observed that new editors coming to Kannada Wikipedia mostly tend to start off with Arabic numerals, as it takes a bit of time for them to realise that the preferred numerals on Kannada Wikipedia are the Kannada ones. But it is good to note that most Kannadigas are familiar with Kannada numerals, which is not the case for the other Dravidian languages. On Kannada Wikipedia, a few community members are of the opinion that Arabic numerals should be used for articles related to science, mathematics, and technology.

Interestingly, Bengali and Assamese speakers also use their own numerals everywhere, and their Wikipedias use them too. I got the opportunity to see Assamese numerals in use in newspapers, books, and elsewhere when I visited Assam for the Assamese Wikipedia workshops. Wikipedians from these 2 languages are making major efforts to make sure that all complex Wikimedia templates also support their numerals. This is even more laudable when one considers that support for non-Arabic numerals in complex programs is currently very limited. The work they are doing will benefit all languages that use non-Arabic numerals.

The case is similar for Gujarati, Odia, and Punjabi, whose speakers use their respective numerals in most places, even though TV channels and newspapers in some cases use Arabic numerals.

 

Devanagari Languages

By Devanagari languages I mean the languages that use the Devanagari script. Some major ones are Hindi, Marathi, Nepali, Sanskrit and Bhojpuri. The majority of speakers of Devanagari languages prefer Arabic numerals over Devanagari numerals when writing the language. This is widely prevalent in movies, newspapers, books, online, and so on.

Devanāgarī numerals
० १ २ ३ ४ ५ ६ ७ ८ ९
0 1 2 3 4 5 6 7 8 9

However, in the Wikimedia world, the Marathi, Nepali and Sanskrit communities have decided to stick to Devanagari numerals. So, except for Hindi, all the other Wikipedias follow Devanagari numerals. The Hindi community uses both kinds of numerals simultaneously, which complicates the situation.

The Debate in Hindi

The situation in the Hindi wiki community is a bit complex. The community is divided over which numerals to use: Devanagari or Arabic.

Some say that since we use Arabic numerals everywhere else, the same should be followed on Wikipedia too. They quote official communication from the Government of India which suggests that Arabic numerals should be used (though they refer to them as the “international form of Indian numerals”), and refer to a Government notification in this regard. They also point to a proposal from the Government further reinforcing this, and ask: if South Indian languages can use Arabic numerals, why can’t Hindi? There have also been a few Government decisions in favor of Arabic numerals (and Romanization of Hindi).

Apart from this, the Hindi film industry has almost completely moved to Roman letters for most film publicity. For example, I can’t remember the last time I saw a Hindi movie poster in Devanagari. A search for Hindi film posters shows posters only in the Roman alphabet, and Hindi film credits are also now in English.

So the main argument is that most people who have access to media and the internet prefer Arabic numerals over Devanagari ones, and so the former should be adopted as the standard.

An Indian Railways bedsheet with Devanagari numerals printed on it

The counterpoint from Wikipedians who argue for Devanagari numerals is that, in spite of the official stance, neither the people nor indeed Government authorities have completely abandoned Devanagari numerals, and they are still commonly used across many Hindi-speaking areas of North India. For instance, official Government bodies like Indian Railways or Delhi Metro, and even some book publishers, follow Devanagari numerals. According to them, unlike urban populations, the majority of the Hindi-speaking rural population in UP and Bihar prefers Devanagari numerals when writing Hindi. They further point out that a language or script is not owned by the Government but by the speakers of that language. To that extent, they suggest that even if the Government has come up with an order that affects the growth of a language, it is the duty of the speakers of the language to stand up and defend their language and script.

Moving forward:

While both arguments are solid, there are unique complications that arise in the Wikimedia world. Using both kinds of numerals on the same project, as on Hindi Wikipedia (for content, article titles, and so on), creates difficulties, as it adversely affects hyperlinking as well as search. There are many other complications arising out of the simultaneous use of both kinds of numerals everywhere. A decision based on community consensus is urgently needed to resolve what could potentially spiral into a much larger issue, given that it is already a 1 lakh article project.
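To see why mixed numerals hurt linking and search: a title written with Devanagari digits and the same title written with Arabic digits are different strings, so an exact match fails. The sketch below shows one possible way to compare such strings by normalizing every decimal digit to its ASCII form. It is an illustration only, not a description of how MediaWiki or Hindi Wikipedia handles this, and the function names are hypothetical.

```python
import unicodedata


def normalize_digits(text):
    """Replace every Unicode decimal digit in text with its ASCII equivalent."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )


def same_title(a, b):
    """Compare two titles while ignoring which numeral system they use."""
    return normalize_digits(a) == normalize_digits(b)


# Build the Devanagari form of "1947" from code points
# (U+0966 is the Devanagari digit zero).
devanagari_1947 = "".join(chr(0x0966 + int(d)) for d in "1947")

print("Exact match:", "1947" == devanagari_1947)
print("Match after normalization:", same_title("1947", devanagari_1947))
```

A normalization like this only papers over the problem at comparison time; the article titles themselves remain inconsistent until the community settles on one system.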

Bengali, Kannada, Assamese, Sanskrit and a few other Wikipedias show that it is perfectly fine to use a language’s own numerals everywhere, while a few other Indic Wikipedias such as Tamil, Telugu, and Malayalam have gone with Arabic numerals. So technically both options are possible in the Wikimedia world, but the community needs to reach consensus on sticking to one type of numeral.

A lot of discussion on this has taken place among Hindi Wikipedians, both on-wiki and off-wiki. One of the on-wiki discussions is at the following link.

http://hi.wikipedia.org/wiki/%E0%A4%B5%E0%A4%BF%E0%A4%95%E0%A4%BF%E0%A4%AA%E0%A5%80%E0%A4%A1%E0%A4%BF%E0%A4%AF%E0%A4%BE:%E0%A4%A6%E0%A5%87%E0%A4%B5%E0%A4%A8%E0%A4%BE%E0%A4%97%E0%A4%B0%E0%A5%80_%E0%A4%85%E0%A4%82%E0%A4%95

In the past, the community has not been able to reach consensus, but it is important that the Hindi community urgently agrees on one numeral system and moves forward.

Enhancing Wikipedia Quality using readily available bot code

Bot can help maintain Wiki

As Indian language Wikipedias grow, the focus needs to go beyond creating new articles and cleaning up content to maintenance tasks such as updating and organising categories and making edits across the whole of Wikipedia. These tasks are not easy to do manually, and the special tools built for English Wikipedia do not support Indian language Wikipedias yet. Luckily, Pywikipedia, a bot software package for MediaWiki wikis, comes with pre-built programs that a basic computer user comfortable with typing commands can use. This post looks at a few best practices based on the personal experience of the author.

During the early days (2007-2009) of Telugu Wikipedia, several projects were kick-started to improve content on topics of importance to the state of Andhra Pradesh. A major portion of this content covers districts, mandals, villages, major towns, pilgrimage centres, assembly constituencies and various maps down to the level of the mandal (district subdivision), with templates supporting its organisation. Reading through the archives of the project page, I realised that most of the work was done by a few individuals with little coordination. The project never got a proper closure with an assessment of its status and of what the next steps should be. As part of its 2012 goals, the Telugu Wikipedia community agreed to focus on one project and make a determined bid to improve quality. After considering the various choices, the Andhra Pradesh Districts project was selected, and 5-6 people joined it. We identified the key tasks before us as follows.

  • Improve the pages to reflect the outline planned for them and bring them up to date.
    • Update 2011 census data
    • Add links to the government websites for the districts
    • Add links to a major Telugu newspaper site, which had substantial information on each district
    • Update the name change of one district
  • Separate the district pages from the town pages
  • Have a peer review mechanism so that article quality can be improved based on the reviews
  • Capture the lessons learnt so that future wikiprojects can leverage them

Of the above tasks, the most difficult is the change of name of the district. This district has close to 1000 villages, and most villages had stub articles containing the name of the district. There were also templates to ease navigation across villages and mandals. These changes are difficult to make manually. The Pywikipedia bot is a great aid for making them, as there are no special GUI tools that support Indian language Wikipedias.

The replace program (replace.py) from Pywikipedia was put to good use for this. Basically, it allows text replacements on any page in Wikipedia. To be effective, the pages to be processed should be a small subset of the entire Wikipedia. The recommended steps are as follows. Please note that you need to create a separate bot account, if you do not already have one, and obtain the bot flag from your Bureaucrat or a Steward, so that the large number of resulting changes is hidden from the Recent changes list of Wikipedia.

a) Understand the wiki category tree related to the district pages. Consider transcluded templates (used for infobox, navigation boxes)

An example of the category tree for a district that needs a name change:

[−] కడప జిల్లా‎ (7 వ, 2 పే)
[×] కడప జిల్లా గ్రామాలు‎ (991 పే)
[×] కడప జిల్లా పటములు‎ (51 ద)
[×] కడప జిల్లా పుణ్యక్షేత్రాలు‎ (13 పే)
[×] కడప జిల్లా ప్రముఖులు‎ (10 పే)
[×] కడప జిల్లా మండలాలు‎ (51 పే)
[×] కడప జిల్లా రైల్వేస్టేషన్లు‎ (19 పే)

b) Understand how the old name is used in Wikipedia. For example, Kadapa zilla (the native form is కడప జిల్లా) could appear as plain text, as a wiki link, as a wiki link on the first part only, or as part of the name of a village or mandal for disambiguation. Identify the string pairs (old, new) based on this analysis. For example, to change Kadapa zilla to YSR zilla, where a separate page has been created with the new name, the following are the required changes:

కడప జిల్లా  >>  ‌వైఎస్ఆర్ జిల్లా

[[కడప]] జిల్లా >> [[ ‌వైఎస్ఆర్ జిల్లా]]

district = కడప >>  district = వైఎస్ఆర్

c) Use the replace program with the appropriate category for the district, and select the option to descend into its subcategories and make changes. The program can be run interactively, in which case it prompts for the source and destination string pairs, the edit summary, and whether to confirm each update or apply them all as a batch. Please note that making changes to 1000 pages could take 3 hours or more, as the program goes into a wait state after each change. In the following transcript, xyzbot is the name of the bot account and only one string pair is given as input.

$python replace.py -catr:కడప_జిల్లా
Please enter the text that should be replaced: district = కడప

Please enter the new text: district = వైఎస్ఆర్

Please enter another text that should be replaced, or press Enter to start:

The summary message will default to: Robot: Automated text replacement (- district =కడప+ district =వైఎస్ఆర్)
Press Enter to use this default message, or enter a description of the
changes your bot will make:

Getting [[వర్గం:కడప జిల్లా గ్రామాలు]]…
Getting 1 pages from wikipedia:te…
WARNING: Family file wikipedia contains version number 1.18wmf1, but it should be 1.20wmf2

>>> పోలి <<<
- |district =కడప
+ |district =వైఎస్ఆర్

Do you want to accept these changes? ([y]es, [N]o, [e]dit, open in [b]rowser, [a]ll, [q]uit) a

Password for user xyzbot on wikipedia:te:******
Logging in to wikipedia:te as xyzbot via API.
Should be logged in now
Updating page [[పోలి]] via API
1 page was changed.

d) If the templates are being changed, make sure that the updated template is correct in all aspects.

e) Redirect the pages which used the old district name in their page titles.

f) Once these are complete, use the replace program with the search parameter to collect the list of pages to modify (those that the category-based maintenance could have left out), and then rerun the replace program to actually make the changes.

$python replace.py -search:"కడప జిల్లా" -save:kadapazillaarticles.txt
$python replace.py -file:../kadapazillaarticles.txt "[[కడప జిల్లా]]" "[[వైఎస్ఆర్ జిల్లా]]"

g) Announce the change to the community and ask for feedback in case you have left out any changes.

h) Please note that category tree updates may take time or may not happen soon (as there are a few bugs in MediaWiki related to this).
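The string pairs from step b) can be tried on a sample of wikitext before the bot touches any live page. The following is a minimal sketch in plain Python; it is not part of Pywikipedia and does not reproduce how replace.py works internally. The sample text, the `Infobox` template name and the function `apply_pairs` are made up for this illustration.

```python
import difflib

OLD_NAME = "కడప"        # Kadapa
NEW_NAME = "వైఎస్ఆర్"    # YSR
ZILLA = "జిల్లా"         # "district"

# (old, new) string pairs from step b), applied in this order.
PAIRS = [
    (f"[[{OLD_NAME}]] {ZILLA}", f"[[{NEW_NAME} {ZILLA}]]"),
    (f"{OLD_NAME} {ZILLA}", f"{NEW_NAME} {ZILLA}"),
    (f"district = {OLD_NAME}", f"district = {NEW_NAME}"),
]


def apply_pairs(wikitext, pairs):
    """Apply each (old, new) replacement to the text, one pair after another."""
    for old, new in pairs:
        wikitext = wikitext.replace(old, new)
    return wikitext


# Hypothetical stub article text covering the three usages of the old name.
sample = (
    "{{Infobox\n"
    f"|district = {OLD_NAME}\n"
    "}}\n"
    f"[[{OLD_NAME}]] {ZILLA}\n"
    f"{OLD_NAME} {ZILLA}\n"
)

updated = apply_pairs(sample, PAIRS)

# Show the changes as a diff, similar in spirit to what the bot displays.
for line in difflib.unified_diff(
    sample.splitlines(), updated.splitlines(), lineterm="", n=0
):
    print(line)
```

A dry run like this makes it easy to spot a pair that matches too much or too little before committing to a batch run over hundreds of pages.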

By following the above, it is easy to make large changes across Wikipedia in a matter of a few days. The replace script has many more options to make the selection of pages easier. If a certain procedure needs to be run repeatedly, building a special-purpose program and hosting it on a local or tool server will be most useful.