Keeping the network alive:
Thoughts about Ways to Improve the Availability and Trouble-free Operation of Long Range WiFi Mesh Networks in Rural Areas

By: Yahel-Ben-David, CTO, AirJaldi (Wiki page: http://wiki.AirJaldi.Org/ )

Introduction:

During the last three years of building, operating and maintaining a large-scale WiFi network in the Indian Himalayas, I held numerous discussions and spent countless hours in thought and experiment on the challenges of keeping such a network operational over time, with a very limited budget and using local human resources. I accumulated many pages of notes, which I never had the time to make public.

Yesterday (Sept 9th, 2007), I was fortunate enough to read the paper by Sonesh Surana et al.: http://www.dritte.org/nsdr07/2007/07/session_1_paper_1.html which he presented at the SIGCOMM workshop in Kyoto, Japan. In this paper, the authors elegantly articulate their experiences with operating WiFi networks in rural India (and elsewhere). As could be expected, the challenges they face are very similar to what we experience with AirJaldi networks, although ours is on a much larger scale. Moreover, some of the solutions the authors suggest have already been implemented and tested, to some degree, by AirJaldi. Some proved successful and some less so. Many others still need to be further researched, developed and implemented.

Thanks to this enlightening paper by Sonesh et al., I was encouraged to collect, refine and organize my thoughts and documents on these issues. As the authors point out, it is somewhat sad that these issues are commonly overlooked by researchers and planners who do not work in the field, which makes the efforts to deal with them especially important. I plan to turn these thoughts into a wiki page, in the hope that it will encourage a lively discussion among as many readers as possible, hopefully leading to a joint, collaborative research effort.
My thoughts are presented in three main sections: 1. Human factor. 2. Improving the reliability and robustness of network components. 3. Early identification of potential failures before they occur. I hope readers will find this paper useful, and I look forward to comments, suggestions and a lively discussion!
1. Human factor

1.1. Training and capacity building

However advanced and innovative a technology may be, it will never be successfully utilized without quality training and capacity building of local administrators. I cannot envisage a network that is completely maintenance free, even if the most expensive products are used. It is therefore essential that R&D efforts be accompanied by the development of training materials and programs that complement the innovative products and methodologies and help put them to the best use.

The AirJaldi Network Academy, with its two campuses and plans for further outreach, was established to address these needs. The initial curriculum used is that of the Cisco Networking Academy. Concurrently with the operation of the Academy and the work of the R&D section, our team is incorporating new products and lessons from the field to develop a practically focused, hands-on set of AirJaldi training programs.

While not yet delivered in a formalized fashion, the on-the-job training we gave in Dharamsala during the last two years to our team members, volunteers and interns proved to be effective. As I write this paper in Berkeley, California, our team in Dharamsala not only maintains and manages the existing network, but also connects new subscribers at an average rate of one new router every other working day. All of AirJaldi’s team members are from Dharamsala and its rural surroundings; none had previous training in networking, and most had little more than basic computer skills when they joined AirJaldi.
As our technology is at present far from refined (it has no GUI and hardly any documentation, and it is constantly changing and evolving), our local team had to become very proficient in Unix OS management, gain a good understanding of IP routing and the debugging of networking problems, and develop the skills needed for the mechanical side of installations: use of power tools, climbing and safety measures, electricity and measurement tools, etc. While some of these skills are essential and must be taught in the future, others could be replaced, to a large extent, by good GUI tools, intelligent automation and proper documentation. This work, together with the development of more formalized training programs, will substantially shorten the training time required for network operators.

Other than the technical skills, some of which were mentioned above, business management skills and experience are also essential for the operation and maintenance of a growing and economically successful network. Such skills are just as hard to find and develop in rural areas, and should therefore also be addressed by the AirJaldi training and capacity-building work.

Although training is a critical ingredient, essential for the operation of rural networks, it is all too often overlooked by research groups and activists, a fact which has led to project failures and a bad image of the technology itself. The AirJaldi Networking Academy is actively seeking collaboration with academic institutes and activists who could assist with curriculum writing and production of training materials, as well as with instruction in Dharamsala and training of trainers (ToT).

1.2. Collaboration with academic institutes and research groups:

“Student power” is one of the best available resources for enabling rural ICT projects in developing countries.
Most projects will have great difficulty finding and recruiting such highly trained and often very motivated individuals, let alone affording their costs. Although on the face of it volunteers and interns might appear to be a no-cost resource, this is hardly the case. Travel expenses and hosting can quickly accumulate to very high figures. Considerable human resources often need to be allocated to guiding volunteers and dealing with various professional and personal issues. Without careful planning, mutual alignment of expectations and a clear scope of work, volunteer services and internships can end up being a costly disappointment for both sides. Proper planning, ample time for it, and due diligence on both sides are therefore necessary conditions for a successful internship.

In addition, both sides need to decide on the most efficient use of the volunteer’s time. Climbing towers in rural India or Africa, while it could be fun, is not necessarily the best use of a student’s time, nor the most efficient way of conducting research. Using local organizations that are already active “on the ground” to test and implement the latest developments, as well as to collect the data needed for further research, can greatly increase productivity, allowing researchers to focus on the work itself rather than on the often unfamiliar and difficult surrounding conditions, constraints and practices.

Having said that, building such fruitful collaborations is often difficult due to technical, cultural and linguistic gaps. AirJaldi is in a unique position to help bridge some of these gaps and offer the needed “research extension” facility, which would benefit all parties involved, ensuring effective collaboration while maintaining the highest academic standards and ensuring focused, speedy and practical research and product development.

Potential collaborations with academic and research institutes can be divided into two broad categories:

1.2.1.
Onsite internship - mostly of undergraduate students:

Past experiences of western undergraduate students coming to Dharamsala for a long-term (5-6 months) internship program have proven very beneficial for both AirJaldi and the students. Although the student interns anticipated difficulties and a relatively slow pace of work arising from dealing with a new, unknown and challenging environment, they were positively surprised at how short the acclimatization and integration process into our team was. This allowed them to spend much of their time on their actual tasks and to develop their own areas of expertise during their relatively short visit. AirJaldi benefited immensely from having such skilled and highly motivated individuals on board. We were able to pair our local team members with the visiting interns, yielding not only fruitful work results but also strong friendships and increased future involvement and collaboration with the individual and the academic institute to which he or she belongs.

However, we feel that such programs should be handled cautiously and planned well in advance. Our experience has taught us that it is often difficult for visitors to adapt to local conditions. Without proper preparation, ample time and sufficient information prior to arrival, precious time can be wasted getting the student "acclimatized". As a general guideline, we try to discourage internship periods of less than 5-6 months. Especially when working with undergraduates, fairly close supervision and guidance is needed. It is not uncommon for a student to spend substantial amounts of time learning or re-learning some of the subjects directly affecting their work or research tasks.

1.2.2. High-level research by graduate students, mostly remote:

There is much room for very high-level academic research and development that can make a genuine long-term impact. Projects such as altering the MAC layer of WiFi, while still using the low-cost WiFi
radios, are a good example. It is very unlikely that AirJaldi would make much progress with such research if it were done from Dharamsala alone. A supportive academic environment is a priceless resource for such advanced research, not to mention the need for rather expensive lab equipment. Such higher-level research can become the subject of a PhD-level dissertation, yielding additional benefits to the project leader, such as motivation, support and often funding. Helping researchers focus their time and efforts, by saving unneeded travel and countless hours wasted in the field waiting for the rain to stop or for a road-blocking landslide to clear, could save considerable resources and reduce frustration and the chance of failure, although it does not completely replace the need for site visits by the project leaders.

1.3. Focused, well targeted research - led and prioritized from the field:

Contrary to what is often assumed, not all researchers and developers believe they understand local realities, conditions and needs better than local people. More often than not, researchers simply do not have anyone on the ground with sufficient familiarity with local conditions and a good understanding of technology, with whom they can consult and collaborate. Researchers are therefore often forced to rely on their own experience and intuition, both in the technological research and in trying to understand local needs and conditions. The latter often takes much longer than expected and, lagging behind the technological development, puts the attainment of overall project targets at risk. We plan to use AirJaldi’s R&D division as a resource and partner for such collaborations.
While being on the ground in rural India and overseeing the day-to-day operation of numerous rural networks, our team of experts is multi-national and multi-disciplinary, making the division ideally suited to coordinate the R&D efforts of diverse research groups from around the world.

Wide gaps often exist between a proven technology and a working product. For example, the ability to use a low-cost radio to establish a long-range, high-bandwidth link (the technology) could be rendered useless if a common virus consumes the whole bandwidth, leaving nothing for desired traffic. Any product developed will have to take these real-life extremes into account and provide a solution to them, often by integrating a number of technologies. (For this example, QoS may be a suitable technology; but while it is known, tested and proven, it is extremely difficult to integrate into a manageable working product.)

To help bridge this gap, the AirJaldi team will strive to ensure practically oriented research by helping researchers identify and prioritize such challenges, as well as areas where research is needed and stands the best chance of yielding benefit and making an impact. In addition, we will provide the logistical and technological support needed for carrying out research projects, by helping to integrate research work into deployable, ready-for-testing products, thereby allowing the researchers to focus on the more complex technological issues.

In summary, we plan to position AirJaldi as the leading field expert, responsible for coordinating R&D efforts among multiple world-leading research groups while overseeing the integration, deployment and testing of new developments. We are willing to work hard for that recognition and feel confident that we will earn it through our proven successes and hard work.

1.4.
Network “critical mass psychology” - building capacities, empowering and supporting entrepreneurs, and promoting the technology:

By using a somewhat unified set of field-tested and proven components, with relatively straightforward methods for replication, network operators can expect to overcome some of the initial growing pains often associated with young and small rural networks. The stability and sustainability of a rural network improve substantially as it increases in scale. Arguably, the psychological burden on inexperienced operators attempting to oversee such a complex and totally new enterprise is overwhelming. The daily struggle to keep a single link running can be very demoralizing. However, should the same link be one of a hundred links, the whole approach changes, and the same task becomes minor and less scary, with much better chances of a quick solution.

Keeping this understanding in mind, we could provide for a smoother initial deployment of new networks. The same local operators, with the same basic level of training, might actually find it easier to maintain a larger network than a tiny one, due to this psychological pressure effect. In other words, I suggest there should be a minimum “critical mass” for a new deployment, and that failure to meet this initial scale could quickly lead to failures, or to continuous attempts to run a network which might never become self-sustaining technically and economically.

The nature of deploying a larger network would, most likely, imply a much better planned and designed system, with less of an experimental nature. A larger initial deployment is likely to be planned by a more experienced designer and carried out by an experienced team, to be better placed to meet the deployment timetable, and to enjoy much higher chances of success.
As stated earlier, one problematic link in such a network becomes just a small part of a generally working network, one which can easily prove and demonstrate its overall viability.

Having said that, we have seen cases where smaller networks have been very successful, especially when operators and administrators felt they were part of a larger-scale operation and team. The knowledge that exactly the same technology works well in similar environments, and that support is available from people operating such networks, is at times all it takes to resolve problems in a calm and methodical fashion, without entering the “panic mode” which is all too common without such guidance.

We see many failed attempts by local entrepreneurs trying to experiment with similar technologies on their own. I argue that trust in the technology, and belief in one’s ability to deal with it, are as critical to success as the technology itself. One of AirJaldi’s main goals is to play the role of enabler for rural networks: to “be there” for any daring local entrepreneur who wishes to embark on the grassroots effort of building a rural network, by providing support and assistance in planning and creating networks while enhancing trust and belief in the technology.

2. Improving the reliability and robustness of network components:

This section is a collection of tips, ideas and thoughts about ways to reduce problems throughout the various stages of building, operating and maintaining a wireless network. While some of these tips might seem trivial, others might seem extremely complex. We often learn that it is the simplest and most trivial elements which produce the most substantial improvements, yet they are overlooked completely by many operators. I’ll begin with the more obvious tips:

2.1.
Router’s weather-proof enclosures - ODU (OutDoor Units):

Our enclosures are low-cost and locally fabricated. They are deliberately not completely sealed: sealing increases cost and manufacturing complexity, and seals often fail with age and use, allowing humidity to penetrate and condense with no way to escape. We therefore keep a small groove open at the bottom, which lets water escape and also eases the passing of cables, without the need to cut and re-crimp them when replacing a board or an antenna. We mount the board on long spacers, away from all walls of the box, and make a slanted top roof to ensure condensation does not drip onto the board but rather slides down the walls (the front wall is safer) and escapes from the bottom.

We found sheet metal to be the cheapest material and easy to use for manufacturing. It also offers substantial thermal mass, ideal for keeping the unit cool or warm as needed, provided it is properly powder-coated in a light (off-white) color. Solid pole-mounting with clamps is important, as without it the vibrations caused by strong winds are harmful. Spacers should be made of a somewhat flexible plastic to further soften the vibrations, and wires should be routed in a way that ensures flexibility while eliminating any outside forces on the connectors.

In some locations it is wise to mount a power-indicator LED at the bottom of the box, visible from below. However, this could also attract unwanted attention to the device at night, from animals or humans. It might be wise to include additional visible indicators, such as for signal strength.

2.2. Masts and methodologies for their erection:

This is an area of ongoing study in which we make constant improvements. It is important to keep things simple and to establish good working relations with local welders and workers. Generally, we are moving away from water pipes supported by guy-wires, especially where higher-gain antennas are used for longer distances.
We aim for more solid, self-supporting, climbable masts and/or wall-mounted arms. In especially exposed locations, we ensure proper grounding (earthing) and run a thick copper cable from the grounding pit to above the top of the mast, where it serves as a lightning arrester. Proper grounding is also critical where gas-discharge lightning protectors are used, and for PoE surge protection (see later).

2.3. IDU (Indoor Units):

We use low-cost tin boxes (common in India, where they are referred to as "trunks") to house the power supplies, batteries, charge controllers and often the network switches. These boxes must be locked (sometimes only the big manager has a key, and at times the local network manager, an AirJaldi employee, has the only key). The boxes must be mounted and fixed to a wall, ideally high up, just below the ceiling. The power cable feeding the box should not have a plug, but should instead be fixed to the wall socket or power source, making it extremely difficult for anyone to disconnect. It is best to route a separate power cable from the point where power enters the building, commonly connecting it directly to the power meter and bypassing any fuses or switches (we have our own fuses inside the box).

The IDU should have a clearly visible LED indicating that power is on, and a multicolor LED indicating the battery charge or discharge state. In the future we might include additional visual indications for various problems, and possibly an audible alarm. During installation of the IDU, the installer must ensure proper grounding (earthing), which is essential for the protection of the PoE (see later) and should also be connected to the box chassis.

In remote, solar-powered relay sites, the box is made of steel and is welded onto the mast above head height, with strong locks, ensuring weather-proofing on the one hand and ventilation of battery fumes on the other (these sites commonly use large 120Ah lead-acid batteries).

2.4.
LVD (Low Voltage Disconnect) and good charger:

LVD circuits, commonly built into battery chargers, are designed to prevent over-discharge of batteries by disconnecting the load when the battery voltage drops below a certain level. As a positive side effect, a good LVD ensures that the load (the router) is never powered at low voltage, which often causes it to hang and stay stuck even when the proper voltage level resumes. Before using good LVD circuits we had daily problems with routers that needed to be rebooted manually. The cheap chargers we used had very low-quality LVD circuits which would oscillate, bringing the load up and down frequently and thereby contributing to damage to the boards and flash memory (see later: flash writes during reboot), while further depleting the batteries and shortening their life span. The LVD used in the new AirJaldi chargers functions well and ensures that power is not restored to the load until the battery is well charged, while also maintaining very low power consumption of its own during the “off” state.

2.5. Robust power supplies

To this day, the main cause of network outages in the Dharamsala network is burnt power supplies. As confirmed by the power logger used by TIER, power fluctuations are common in rural India, with spikes of up to 1000 V and long periods of very low voltage (we found 70-90 V or even less to be common for many hours, especially in winter from 7pm to 10pm). While the power supplies we have used recently continue to function well over a very wide input voltage range (80-240 V), we still struggle to make them immune to spikes and surges. We also found that locations very far from the transformer are subject to more frequent and more extreme power fluctuations. While distance easily explains the low voltage, its relation to high-voltage spikes at these far locations is a bit harder to understand, although in-depth study of the subject confirms and explains it.
It is not uncommon in the mountains of Himachal to have a small village fed by a 5-10km long, single-phase pair of wires. We learned that power surges in these far villages are so frequent and so extreme that it is not practical to use this source at all. (We have a large pile of burnt power supplies of all sorts, mostly from two such small villages.) In such locations, although power is available, we opted to use solar power and not touch the local power grid. I feel that with proper research we should be able to produce an affordable power supply which will be immune to these power surges while offering an even wider input voltage range. This is a very high priority task on our list!

2.6. Hardware watchdogs:

While some of the router boards we use include a hardware watchdog component, many do not. In addition, interfacing with these board-specific hardware elements and ensuring their proper functionality lengthens the adaptation time for each particular board and adds complexity. Moreover, it is difficult to ensure proper operation of the on-board hardware watchdog, as there are cases where it too gets stuck and does not function, depending on each manufacturer’s implementation and on environmental issues (such as low power). I suggest multiple approaches to this problem. While similar in nature, they have some major differences, and each solution will best fit different environments and applications:

2.6.1. Software-less “stupid” charger:

For most CPE applications and some less remote relay stations, we wish to keep costs down and are willing to give up intelligent features of the charge controller, such as remote reporting capabilities. For such cases, simple chargers based on analog circuitry seem more attractive than software-driven, microcontroller-based chargers. We are experimenting with two main approaches to implementing a hardware watchdog when using such chargers:

2.6.1.1.
Integrated into the charge-controller:

A very simple and low-cost delay circuit is added to the charger, such as this: [figure: delay circuit]

There are many ways of interfacing the reset pin with the router in order to perform this function: via a GPIO line, or via a limited RS232 interface. There was also one very innovative idea of mounting a phototransistor onto the router’s wireless-activity LED, so that every blink of activity resets the watchdog circuit. This would bypass the need for any electrical interfacing or software writing and should be explored further. The main limitation of this approach is the need to dedicate a wire between the router and the charger for the reset/heartbeat signal.

2.6.1.2. Tiny watchdog add-on unit for each router:

To overcome the need for a dedicated wire between the charger and the router for the reset/heartbeat signal, we can install the tiny watchdog circuit on the router itself, interfacing it in the same ways as described above. Such a unit can be built into a power plug of the kind used by most routers, and can double as a PoE enabler for routers without PoE support, as well as a voltage step-down unit where needed (see later: higher voltage PoE feeds). The footprint of the watchdog circuit is quite small thanks to the small capacitor (there is no need for the trimmer). I’m afraid that a simpler circuit might call for a larger capacitor, increasing the size and cost. The current size is determined by the size of the IC; an SMD version would be much smaller than the 16-pin DIP package we currently use. The circuit can work over a wide voltage range.

2.6.2. Intelligent charge controller:

In remote, solar-powered relay stations, we prefer to use more expensive charge controllers that provide valuable remote reporting of system status and environmental variables.
Such chargers will no doubt be microcontroller-based and will often include an Ethernet controller for interfacing with the router. In these situations the router and charger already have a communication channel, so no additional wiring is needed. The charger can monitor a heartbeat signal produced by software on the router and reboot the load (and itself) if the signal is absent for a given time. For the unlikely case where the charger itself gets stuck (which did happen numerous times while testing the TIER prototypes), I would suggest adding the simple watchdog circuit described above, only now the charger’s software would be responsible for resetting the watchdog circuit, substantially increasing the overall availability of the system.

2.7. Software watchdog

While the hardware watchdogs described above are very effective at automatically recovering from most hangs, we still need a way to detect network-down conditions and attempt to recover from them automatically. All AirJaldi routers use such a watchdog, which is run by cron every 4 minutes. The watchdog system is a combination of a binary and a configuration file. The configuration file lists which elements to monitor, such as IP reachability of any of a number of remote hosts; channel, SSID and BSSID changes; wireless operation mode (client, adhoc, master); and even local Ethernet link state, as well as a list of processes that need to be running on the node. The configuration file also lists which actions to take upon failure of any of the tests, and how often a test is allowed to fail before an action is taken. Actions could be anything which might remedy the failure, such as bringing the wireless interface down and up again, unloading and reloading kernel modules, or rebooting the system. In practice, most recent routers are configured to run only IP reachability tests and to perform a complete system reboot in case of failure.
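The core logic of such a watchdog can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual AirJaldi binary or its configuration format, neither of which is shown on this page. The `CONFIG` layout, the host addresses, the function names and the failure threshold are all assumptions made for the example.

```python
import subprocess

# Hypothetical configuration: test name -> hosts to probe, allowed
# consecutive failures, and the action to request once that limit is hit.
CONFIG = {
    "ip_reachability": {
        "hosts": ["10.0.0.1", "10.0.0.2"],  # assumed example addresses
        "max_failures": 3,
        "action": "reboot",
    },
}


def host_reachable(host, timeout_s=2):
    """Return True if a single ping to the host succeeds (system ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_once(config, state, probe=host_reachable):
    """Run every test once, as a single cron invocation would.

    state maps test name -> consecutive failure count from earlier runs.
    Returns the updated state and the list of actions that are now due.
    """
    new_state = dict(state)
    actions = []
    for name, test in config.items():
        # The test passes if ANY of the listed hosts answers.
        passed = any(probe(host) for host in test["hosts"])
        failures = 0 if passed else new_state.get(name, 0) + 1
        new_state[name] = failures
        if failures >= test["max_failures"]:
            actions.append(test["action"])
    return new_state, actions
```

In a real deployment the failure counters would have to be stored between cron runs (for example in a small file), and each returned action name would be mapped to a command such as restarting the wireless interface or rebooting. Passing the probe as a parameter keeps the decision logic testable without touching the network.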
The watchdog is also handy when changing global network parameters, for example when a master station changes its operating channel (in a point-to-multipoint topology). Since all routers are configured to find the operating channel automatically, all that needs to be done is to change the channel on the master; within 4 minutes all routers will reboot themselves as they lose connectivity, and will quickly re-appear on the correct channel.

2.8. Higher voltage PoE feeds:

It is not uncommon to use long PoE feeds of up to 90m to connect routers to local LANs. In many places, the ideal location for the antenna is on the high roof of a neighboring building, while the computer class is quite far away. When the power source is 12-14 V and the Ethernet cable is this long, we often see a substantial voltage drop on the cable (depending on the load of the router or routers, as well as on the quality of the cable). To remedy this problem, we use a DC-to-DC step-up converter at the source and a wide-range voltage regulator at the router’s end. We found that a good voltage for such long PoE feeds is 48-56 V. While a lower voltage (such as 18-24 V) is often enough to power a single router without substantial loss, it is not good enough where multiple routers or multiple high-power radios are in use. Going higher than 56 V is not necessary even for the longest cables of poor quality, and the cable insulation is not rated for such voltages, which might introduce problems.

While the DC-to-DC converters, and especially the regulator on the router’s side, are not very efficient and waste some energy, this is hardly an issue where mains power is in use. For solar-powered installations, where every inefficiency means much higher costs for larger solar panels, we do not increase the voltage. Fortunately, in such installations, typically at remote relay stations, there is no need for long PoE feeds, as the battery is commonly installed no more than 10m below the router, so 12 V suffices.

2.9.
Surge protectors on PoE feeds:

A common source of damage in the stormy weather of the Himalayas was surges through the Ethernet ports. Often, after a stormy night, we had numerous fried Ethernet ports. Routers soon ran out of working ports once all of them had been fried (there are 5 in most routers). By late 2006 this was the most pressing problem we faced. The solution came in two ways:

2.9.1. PoE surge protection circuit:

We had to design a field-proven surge-suppressor circuit for the Ethernet feeds. The design was based on research into leading, widely deployed products with proven field results. Details can be seen in the diagrams and design details for the AirJaldi charger: http://drupal.airjaldi.com/system/files/Jaldi_Charger_design_1.6.3.pdf

An important element is the proper grounding of the circuit. Without excellent grounding this circuit will not function well. Grounding is very unlikely to be found in most rural buildings in India. In the rare cases where it does exist, it is usually done very poorly and is often disconnected. It is therefore advisable to include a good-quality ground pit as part of the installation process. This work must be budgeted accordingly (it is not cheap) and is time consuming and labor intensive (digging a big pit in the ground, etc.).

2.9.2. PoE feeds using shielded Ethernet cables:

Additionally, we have replaced the long Ethernet cables in places prone to lightning with higher-quality shielded cables. We connect the shield of these cables to a good ground at both ends (it is assumed that in such locations the tower side will also need proper grounding, as mentioned above). A good outdoor-rated Ethernet cable is advisable, especially one with a UV-resistant external plastic coating. Such cables are hard to find in India (although they are available) and are not cheap. Some lower-cost cables exhibit very good resistance to UV while not being outdoor rated. The outer plastic shell of these cables often breaks, and water penetrates.
However, we found that these cables provide years of flawless operation even when the cable is filled with water. We make sure to form a loop in the cable before it enters through a wall and to puncture the outer shell at the bottom of the loop, so that all the water drips outside and does not enter the building. It is also important to use supporting cables to carry the weight and tension of the cables. It is advisable to connect the supporting cables to good grounding in order to enhance the surge resistance of the system.
In some locations, we use conduit pipes of the kind used for electrical installation or irrigation to protect the wires. This is advisable mostly for remote relay stations. The common black plastic pipes are designed to withstand many years of strong UV exposure as well as temperature extremes. It’s also advisable to use such a pipe to protect the RF cable from radio to antenna, while ensuring proper mechanical mounting and routing of the pipe, as its wind load is higher.
2.10. Antennas/Radios surge protectors:
Dharamsala is one of the world’s stormiest locations. It is not uncommon to see more than one lightning strike per second, lasting for many hours continuously over the course of a night. We quickly learned that all radios using an omni antenna got burnt in every storm, while those with directional antennas commonly survived. Our explanation was that omnis are more “attractive” to lightning, being mounted high above the mast and having a sharper pole shape, while directional antennas are often below the maximum height of the mast. We then installed good gas-discharge lightning protectors for all the omni antennas, with proper and solid grounding. Soon we learned that many of the gas tubes (fuses) within these lightning protectors burn out during storms; while they save the radios, we still need to send a climber up the towers to replace these (sometimes hard to get) gas tubes. Other than that, we did not like the high cost of good lightning protectors, not to mention the signal loss introduced by the protector and by the extra cables and connectors, which at times took too heavy a toll on the already very low-power radios.
To solve the problem, we now install ALL antennas about 50cm below the top of the mast. While this is hardly a problem with directional antennas, for omnis it creates a dead zone where the mast blocks the signal from the antenna. I assume it also produces some unwanted reflections, yet we could not notice any such degradation in the field. So whenever we install an omni, we have to decide on a small area which will not get good service as it is hidden by the mast. To reduce this negative effect we use an arm to extend the omni antenna away from the mast, reducing both the dead zone and the unwanted reflections. The concept is to ensure that the mast, which should be properly grounded, is higher than the antenna.
It is advisable to extend the high-quality copper wire used for grounding further above the mast (about 50cm) to serve as an additional lightning protector. In the Dharamsala network, we do not have a single directional antenna with a lightning protector. In some rare and extreme cases (mostly old installations) we do have such protectors on omni antennas. I don’t recall a single burnt radio (due to lightning) for over 18 months. It is often a bold decision to install a new repeater site on a high and exposed mountain, using very expensive radio cards, without lightning protectors. However, we are not paying for these costly radios only to damage their superior RF qualities by adding lightning protectors. So far these bold decisions have proved 100% correct and we have no burnt radios.

2.11. No writes to flash, especially during reboot
A major contributor to damaged or corrupt flash memory is unneeded writes. While every write to flash shortens its life, it is the incomplete writes, interrupted by power failures, which are the real source of trouble. We have found that the boot-loaders used by many hardware vendors write to flash during boot! (Some Linksys versions and most Buffalo routers, for example.) Sometimes this is a single bit of data, but it’s enough to corrupt the flash. Imagine a situation where you have an oscillating LVD, bringing the router up and down quickly. In such a situation it’s not uncommon for a flash write during boot to be interrupted by a power failure before completion. We learned about these boot-loaders the hard way, after discovering many routers with corrupt memories. The remedy was to replace the boot-loaders with our own or to find a version which does not write to flash during boot. In addition, it’s wise to ensure that flash file systems are mounted read-only and that no write operations occur throughout the normal life cycle of the router.

2.12. Remote access via “out of band” (OOB), alternate channels:
Often, when it comes to a critical relay station on a remote mountain or tower, we wish we had an alternate way of accessing it when the main WiFi link to it is down. In fact, I built such a solution a while ago, using an intelligent (yet low-cost) router connected to a mobile phone which serves as a GPRS terminal. In India GPRS services are very cheap, at Rs.400 per month with unlimited duration and bandwidth. I therefore used a Netgear WGT634U router, interfaced through its USB 2.0 port with a mobile phone. The Netgear (running one of the earliest Kamikaze versions) would “dial” the PPP over GPRS and then set up an OpenVPN tunnel via that link to a remote server. The Netgear also had its Ethernet interfaces connected to the other routers of the site, with two optional serial console connections as well. I even had a GPIO of the Netgear connected to a solid-state relay to allow for remote power-cycling of the other routers. In addition, the built-in WiFi radio with built-in antenna on the Netgear (a miniPCI slot hosting an Atheros card) allowed us to remotely scan the local air for interference and/or low-power RF signal from the other routers (often the case after a lightning strike). The mobile phone has its own battery, and we added an additional battery to power the Netgear. Both would be charged from the main power source of that station. This enabled us to maintain access to the site (via GPRS) even if the main power source was down (typically for up to a day or two).
Nevertheless, and although time was invested in the design and building of that system, I have found it to be counterproductive and mostly useless. The cases where such a system would help recover the network were few.
Only in cases of configuration errors by the administrators, when attempting to make changes to the network remotely, or after failed attempts at remote firmware upgrades, was the backup access system helpful for recovery. In all other cases, sending a serviceman to the station could not be avoided, making the whole OOB system useless, expensive and a complex impediment.
Sonesh Surana et al. suggest the possible use of SMS as a method for OOB connectivity to the router at times when the main link is down. For cases of a failed remote upgrade, console access could be very helpful (provided the boot-loader is not damaged), and it could easily be provided over GPRS; implementing console access via SMS, however, seems very complex and expensive, if at all practical. As for the very important task of data collection, that should be done via the main wireless links. When the main link is down, it is hard to imagine cases where SMS connectivity will contribute to reviving the link. Remote reboot via SMS is arguably the most beneficial feature for reviving a downed system, yet a low-cost hardware watchdog could handle that task well.
We have learned that there is no room for such an alternate access method in production systems. In a production environment, we must ensure that no experiments or untested changes are made (see later for ways to prevent such accidents and ways to recover), which further increases the availability of the system while making such a backup access system useless. I would estimate that the costs involved in such a system would double (or more) the cost of the whole station, while offering no improvement to overall network availability. It is therefore not suitable for production environments at all.
In test sites, where experiments are planned, we sometimes continue to use that GPRS access system, although for most test sites we simply add an additional radio and antenna on an alternative band/channel for remote access. I would therefore suggest that R&D of such backup access methods for production environments is pointless (it might have some value for experiments and testing on small networks).

2.13. Limiting configuration errors and keeping “back doors”:
While this mostly falls under the human factor, reducing the risk of configuration errors on production routers is desirable. In AirJaldi, once tests are completed for a newly installed router, it is defined as a “production router”. Thereafter all technicians and sysadmins are forbidden from altering anything on that device without prior written permission, following their submission of a detailed work plan and supporting explanations on our internal WiKi/mailing list. Operators will often log in to the router (SSH) or use its web GUI for measuring and testing, but will not alter any configuration.
While not yet implemented in Dharamsala, I feel it would be wise to allow additional levels of access to a router other than unrestricted root level. The lowest access level (via GUI) could be given to the subscriber, in order to perform diagnostic tests, while root access would be restricted to senior admins as soon as the device enters production state. A medium level of access (including shell access) could be given to all operators, yet in read-only mode, without allowing them to mount any file system in RW mode. I feel that such restrictions would substantially reduce, if not eliminate, configuration errors.
Arguably, as also suggested by Sonesh, there might be some benefit in configuring a “fail-safe” access to the router, possibly a virtual wireless NIC (on devices supporting multiple SSIDs) with a pre-programmed IP address, etc.
The GUI will not be able to alter such interfaces, ensuring we always have a backup way to access the device in case of configuration error. As long as the majority of the configuration work for a router is done only via the GUI, such a solution might indeed offer an added level of safety. However, it does add some complexity to the system and to the initial configuration or automation thereof. As a way to further evaluate the pros and cons of this idea, I would suggest beginning a joint discussion of which parameters should be configurable via the GUI in the first place. Arguably, the fewer configurable parameters the better, which further reduces the potential benefit of having a “back door” relative to the complexity it adds. I would argue that while there is room for further research into access restrictions and methods to recover from operator errors, these will never replace good training and detailed work processes and methodologies.

2.14. Network security and QoS:
Not all network downtime is due to hardware failures or operator errors. It’s not uncommon for a virus, a Trojan, other malware, or even a hacker to bring down a network or parts of it. Without proper measures, an overloaded wireless link or high CPU load on a router, caused by DoS attacks or simply unrestricted excessive traffic, could easily bring down the network. At times, the load is so high that it prevents SSH access, or even pings, to the affected router. It is therefore essential to enforce intelligent QoS policies and access control rules on every interface of every router. In addition, a security policy of blocking everything which is not explicitly permitted is advisable for most rural installations. Nevertheless, should we manage to enforce an extremely well tuned and effective QoS policy, we could afford to reverse the security policy into blocking only explicitly listed services.
Such a methodology is desirable as it allows the users to experiment with new protocols and empowers them to learn and improve the uses of the network. There is no need to contact the administrators for every new service or protocol. Nevertheless, a bullet-proof QoS policy, which allows the best use of shared bandwidth and networking resources, is somewhat hard to reach. Even getting a “good-enough” QoS policy in place is extremely challenging: it involves L7 classification of packets, at times forcing the buffering of numerous packets in order to make a decision, as well as other technologies that are often very demanding on large servers, not to mention on $50 routers.
In the AirJaldi network, we use a mix of security policies (in some places explicitly allowing and in others explicitly denying) as well as a wide arsenal of QoS enforcement methods, both on the edge routers and relays and on our main Internet gateway, where stronger servers are in use. While we experimented with L7 classifiers on the low-end routers, we never managed to reach a well balanced system, although I feel it’s doable given sufficient research resources. At present each router uses four priority queues (in each direction), with our own VoIP ports and SSH assigned to the top queue, HTTP to the second queue, everything unknown to the third queue, and some unwanted protocols to the lowest queue. In practice, there is very little traffic on the third queue (unknown ports), while the second queue carries many things other than web surfing. Most P2P applications, which are the real network killers, masquerade as HTTP. Therefore, without L7 classifiers we cannot shape these at the CPE level. In Dharamsala, we use a transparent HTTP caching proxy for all this traffic, which further limits the use of P2P protocols and lets the proxy itself perform content filtering (such as porn filters, due to the high traffic load of this content).
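The four-queue scheme described above can be sketched in a few lines of Python. This is only an illustration of port-based classification, not the actual AirJaldi rule set: the port numbers, the names (`classify`, `VOIP_PORTS`, etc.) and the inclusion of port 443 alongside HTTP are all assumptions made for the example.

```python
# Queue indices for this sketch: 0 is served first, 3 last.
QUEUE_TOP, QUEUE_HTTP, QUEUE_UNKNOWN, QUEUE_UNWANTED = 0, 1, 2, 3

# Hypothetical port sets; the real port lists are not given in the text.
VOIP_PORTS = frozenset({5060, 4569})     # SIP and IAX2, as examples only
SSH_PORTS = frozenset({22})
HTTP_PORTS = frozenset({80, 443})        # 443 is an assumption of this sketch
UNWANTED_PORTS = frozenset({135, 445})   # placeholder for "unwanted protocols"


def classify(dst_port: int) -> int:
    """Map a destination port to one of the four priority queues."""
    if dst_port in VOIP_PORTS or dst_port in SSH_PORTS:
        return QUEUE_TOP
    if dst_port in HTTP_PORTS:
        return QUEUE_HTTP
    if dst_port in UNWANTED_PORTS:
        return QUEUE_UNWANTED
    # Anything not recognized goes to the third queue.
    return QUEUE_UNKNOWN


if __name__ == "__main__":
    for port in (22, 5060, 80, 6881, 445):
        print(port, "-> queue", classify(port))
```

The sketch also shows the weakness noted in the text: a P2P client that talks over port 80 is indistinguishable from web surfing here and lands in the HTTP queue, which is why port-based rules alone cannot shape such traffic.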
Furthermore, on the Internet gateway we use policy routing to send the heavy P2P traffic via less expensive lines (with no cap on monthly traffic, yet less bandwidth). In addition to this level of QoS and packet filtering, which is critical to the lower-level health of the network, we enforce per-interface and per-subscriber traffic-shaping policies on each router, representing the service level for each customer. These policies further aid in limiting over-utilization of bandwidth, although we currently enforce them only in the downstream direction (there is no individual restriction on subscribers’ upload rates). A major task of our R&D department is to further research this field and come up with better ways of managing and enforcing system-wide QoS, shaping and security policies. IMHO, this is a good area for joint research with academic institutes.

2.14.1 OpenVPN tunnels for remote access and NAT traversal:
The issue of remote access for administration via the Internet has always been a challenging one. We opted for a client-server solution in which the clients are the routers, initiating the connections to the servers on the external Internet, thereby bypassing NAT issues and non-routable address spaces. In addition, we decided on full and non-restrictive IP-over-IP tunneling, in order to allow full IP routing to machines on the various LANs as well, without restrictions on protocols and ports. It was critical to find a robust, widely used, well tested and supported solution which maintains connectivity when Internet gateways change IP addresses, supports multiple servers and offers good performance over slow lines with high packet loss, latency and jitter, often with limited ability to send large packets and other restrictions. In addition, the security of this channel is of major concern. It was clear that a VPN over UDP would be the ideal solution.
Several solutions were evaluated, with OpenVPN the clear winner, mostly for its simplicity and ability to run on low-end routers, its wide use and its advanced features. Initially, we used SSH port tunneling, which was very limited and also had performance problems; here is one reason why: http://sites.inka.de/~W1011/devel/tcp-tcp.html
While we don’t need to run OpenVPN on every router, we do on many. In fact, OpenVPN works so well on these low-end routers that we use it to solve another problem. On the routers with Broadcom radios, due to the binary-only wireless driver, there are limitations on the use of multiple SSIDs. We needed a single radio which is used both as a HotSpot for roaming users (with captive portal) and as a station to which other CPE routers can connect securely (with encryption over the air). While this is rather easy to implement with Atheros radios, it is rather limited with Broadcom. We therefore implemented only the unencrypted hotspot solution on the Broadcom radios, while allowing connectivity to a given set of IPs that bypass the captive portal. These IPs were those of our local OpenVPN servers. The CPE client associates with the Broadcom hotspot and opens an OpenVPN tunnel through it to the server. As soon as the OpenVPN tunnel comes up, the router changes its default route to go via the tunnel, thereby providing connectivity to the users behind it via the VPN. OpenVPN also uses efficient compression, greatly improving the performance of many protocols.

2.15. Remote upgrades:
A very desirable feature is the ability to remotely upgrade a router’s firmware. While we have been doing this in AirJaldi, for all the routers, since the very first days, the process is not without dangers and pitfalls. IMHO, no technical solution can replace good practice. When re-flashing routers we always test the upgrade process on identical routers in the lab, then on easily accessible, non-critical routers, and only then on remote routers.
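The staged order just described (lab units first, then easily accessible non-critical routers, then remote routers) could be enforced by the upgrade scripts themselves. The following Python sketch is one hypothetical way to do that; the `Router` type, the stage names and the `upgrade` callback are all invented for illustration and do not describe the scripts actually used in Dharamsala.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Stages in increasing order of risk (names are assumptions of this sketch).
STAGES = ("lab", "accessible", "remote")


@dataclass(frozen=True)
class Router:
    name: str
    stage: str  # one of STAGES


def staged_upgrade(
    routers: Iterable[Router],
    upgrade: Callable[[Router], bool],
) -> Tuple[List[str], List[str]]:
    """Upgrade routers stage by stage and stop after the first failing stage.

    `upgrade` is a caller-supplied function that flashes one router and
    returns True on success. Returns (upgraded names, failed names).
    """
    fleet = list(routers)
    upgraded: List[str] = []
    failed: List[str] = []
    for stage in STAGES:
        for router in (r for r in fleet if r.stage == stage):
            (upgraded if upgrade(router) else failed).append(router.name)
        if failed:
            break  # never move on to a riskier stage after a failure
    return upgraded, failed
```

The design choice here is simply that a failure anywhere in an earlier stage prevents any router in a later stage from being touched, so a bad image is caught on hardware that a technician can still reach.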
Having a large variety of router platforms and firmware versions complicates the process. We therefore attempt to reduce the number of platforms used and to ensure an identical firmware version on all.
In their recent paper, Sonesh Surana et al. suggest that a failed upgrade could be reverted to a safe OS state by combining the onboard hardware watchdog (WRAP boards) with the LILO boot loader. While this might be possible on some well designed platforms where the boot loader is in ROM (or on a separate flash memory from the one you flash), with enough flash to host two complete images together, it cannot be done on low-end routers with 4 MB (or less) of total flash. I could envision some difficulties in recovering from such a failed upgrade even on the best routers. (This is a scary and error-prone process even when attempting to remotely upgrade some of the world’s largest and most expensive routers {Cisco, Juniper, etc.}, and I would love to learn how this is done on a WRAP board.) Nevertheless, for the low-end routers used in Dharamsala, and for those I expect to dominate rural installations elsewhere, it’s not an option. Arguably, the only way to recover from a bad kernel image is through console access, or via TFTP over a directly connected Ethernet. However, it’s not uncommon for a failed upgrade to damage the boot-loader as well, as it is on the same flash memory chip, rendering the unit useless until a lab with a JTAG programmer is reached (not recommended as a field exercise).
I therefore argue that it’s important to have an identical testing environment of a substantial scale before embarking on a network-wide firmware upgrade process. No automation can replace proper QA and a well tested process. Arguably, once the new firmware is well tested, automatic upgrade can begin.
Today, in Dharamsala, we initiate this process from our operations center using automated scripts, and therefore miss routers which happen to be down at the time of the upgrade. We should look into having the routers automatically pull and self-install upgrades once these become available on a remote server. However, the large variety of platforms greatly complicates this automation and increases the potential for failures.

2.16 Bandwidth maximizer and peering optimizer:
The general quality standards of Internet service providers in India are very low. Total service outages lasting many hours are frequent, with hardly a day of uninterrupted service. When service resumes, unacceptable congestion is common, making the service slow, limited and erratic. There are no service level agreements (SLAs), and the common concepts of service and customer care are very poor. Arguably, as demand is often higher than what ISPs can supply, and demand is growing faster than supply, this gap is about to widen further in the near future, and it is unlikely that service quality and care will improve soon. As we move deeper into rural India, the situation worsens, with costs increasing as service quality decreases. It is therefore impossible to provide good Internet service to your subscribers while depending on a single upstream provider for the whole network.
Many rural networks, even those which were initially designed to provide intra-network services, evolve to use common Internet connectivity substantially more than their initially planned services, turning them into local rural ISPs. It is therefore surprising how often the issue of improving upstream Internet connectivity is overlooked by researchers and activists.
One of the more unique and special components of the AirJaldi network in Dharamsala is what we refer to as the “Bandwidth Maximizer”. This element is responsible for ensuring uninterrupted Internet connectivity while maximizing the utilization of available bandwidth (without over-utilizing it), taking into account the financial variations and restrictions of the available packages from multiple ISPs. My personal experience of establishing Internet-related setups in developing countries suggests that there is a huge demand for such a solution, ideally in the form of a plug-n-play appliance. In fact, earlier this week I attended a talk about using the Internet for disaster relief, where one of the presenters was Mr. Jonathan Thompson, the president of Humanlink ( http://www.hlink.org/ ), who seems to be focusing most of his operation on making similar technologies available to aid operators on the ground. To be specific, here are some of the issues addressed by the Bandwidth Maximizer:

2.16.1. Load balancing:
Using multiple connections and multiple packages and services from multiple upstream ISPs greatly reduces overall downtime, while offering substantial cost savings if managed well. Each mode of connectivity comes with a different set of costs, service levels and restrictions. To name two examples:
Leased lines, while commonly costly, generally offer somewhat increased availability (although not much in rural India) and also provide symmetric bandwidth (upstream as well as downstream) without a monthly restriction on the volume of transferred data. This makes them ideal for VoIP and up-streaming of video as well as most other outbound traffic, while their cost often limits the bandwidth we can afford to buy on these lines.
ADSL lines are generally cheaper and offer good bandwidth in the inbound direction but hardly any in the outbound, and they also restrict the monthly data volume; such lines and plans are mostly suitable for web-browsing traffic and little else.
It is therefore not enough to load-balance the traffic equally across all lines; we must classify the many types of traffic and make wise routing decisions (policy routing) based both on these administrative and cost factors and on the real-time quality and load of each line at a given moment. Line conditions change dynamically over time; although you might buy a 2 Mbps line, it is unlikely that you’ll be able to push (or receive) nearly as much through it. Also, routing loops and “holes” are common, often blocking connectivity to large address blocks. Other line qualities, such as the routable MTU, also change dynamically, adding to the complexity of monitoring and of deciding which line to use and when. While we make most of these routing decisions manually and statically, I feel they could be automated and optimized to a large degree. By constantly monitoring line conditions, we could give a technical quality score to each line and weigh it against the administrative costs assigned to the lines, allowing automated best-choice routing, dynamically optimized to the changing conditions. I feel there is much room for research into these problems. While many out-of-the-box load-balancers exist, there are none that I’m aware of which could offer any improvement for this particular problem, as they were not designed for such an environment and its specific needs. We encourage interested developers to become involved and contribute to the improvement of that system and of the overall “Bandwidth Maximizer”; there is surely much room for further R&D around this.

2.16.2. QoS
QoS should be performed in tune with the load balancing and policy routing. In fact, the intelligent load-balancer described above already ensures a basic level of QoS by routing delay-sensitive applications via the faster lines and delay-tolerant applications via cheaper lines. However, the load-balancer and policy routing are no substitute for intelligent classification of packets and the use of priority queues.
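As a rough illustration of the line-scoring idea from 2.16.1, and of how it yields the basic split between delay-sensitive and delay-tolerant traffic just mentioned, here is a minimal Python sketch. The scoring formula, the field names and the sample figures are all assumptions made for this example; the text does not specify how a quality score should be computed or how administrative costs are expressed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Line:
    name: str
    loss: float        # measured packet loss, 0.0 to 1.0
    latency_ms: float  # measured round-trip time
    admin_cost: float  # operator-assigned cost weight, must be > 0
    up: bool = True


def quality(line: Line) -> float:
    """Hypothetical technical score in [0, 1]; higher is better."""
    if not line.up:
        return 0.0
    return (1.0 - line.loss) / (1.0 + line.latency_ms / 100.0)


def best_line(lines: Iterable[Line], delay_sensitive: bool) -> Optional[Line]:
    """Pick a line for a traffic class, or None if no line is usable."""
    usable = [l for l in lines if quality(l) > 0.0]
    if not usable:
        return None
    if delay_sensitive:
        # VoIP and similar traffic: technical quality only.
        return max(usable, key=quality)
    # Bulk traffic: quality weighed against administrative cost.
    return max(usable, key=lambda l: quality(l) / l.admin_cost)
```

With a costly but clean leased line and a cheap but slower ADSL line, this picks the leased line for delay-sensitive traffic and the ADSL line for bulk traffic, and falls back to whichever line is still up when one of them fails. A real implementation would feed `loss` and `latency_ms` from continuous probes and would also have to account for monthly volume caps, which this sketch ignores.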
Other than classifying packets based on L4 port information and IP addresses, we face the more complex challenge of identifying protocols disguised as others. Most damaging are P2P applications, which are known to hog all available bandwidth while appearing as HTTP traffic and getting smoothly tunneled via proxies, etc. At present, we use the “Application Layer Packet Classifier for Linux” ( http://l7-filter.sourceforge.net/ ) to identify these sneaky protocols and divert them to slower, unlimited, cheaper lines. However, I feel that experimenting with tools aimed at intrusion detection, such as Snort ( http://snort.org/ ), to classify packets might prove to be a better overall approach.

2.16.3. Transparent content caching:
Transparent HTTP caching proxies greatly improve the user experience while reducing traffic on the expensive uplinks. As could be expected, we have found web-surfing patterns to be very similar among our users, which greatly improves the hit rate of the cache. We use a Squid proxy ( http://www.squid-cache.org/ ) with some very special configuration tweaks to optimize performance for our unique local needs.

2.16.4. Advertisements removal
A substantial part of the traffic associated with web surfing originates from ads, banners, pop-ups and other junk. It is not uncommon for the loading of a web site to be delayed because most of it consists of these unwanted objects, which are often much larger than the wanted parts of the site. While they are a bit challenging to remove without hurting the layout of the page, there are products which achieve this task well. We also wish to allow users to manually and specifically request an object missing from such a filtered page. At present we use a product called “Privoxy” ( http://www.privoxy.org/ ). There are other open-source solutions available which might be more suitable; however, it’s been long since I reviewed the progress of these tools. Privoxy has been around and working well for many years.

2.16.5. Viruses and SPAM filtering
Viruses are a very dominant source of network failures in developing countries. At present we scan and filter for viruses only on incoming email (already at our main email relay in California), stopping them long before they reach India. The same goes for SPAM email, where we use a combination of advanced techniques, namely grey-listing, DNS black-lists, SpamAssassin (an advanced scoring system), etc. However, filtering and scanning of incoming email is not enough: it is not uncommon for SPAM-sending viruses and Trojans to penetrate the computers of our subscribers, turning our network into a SPAM source. It is therefore important to scan for viruses and SPAM in outgoing email through our relays as well. In fact, this issue has further complexities regarding which outgoing SMTP server our users should use. As most of our external IP addresses are dynamic and belong to Indian ISPs it is very rare that any email sent from our

Created by: admin last modification: Sunday 25 of November, 2007 [12:51:27 UTC] by admin