Deployment of Conference XP over Wireless Network:
Summarized Test Results
Adam Eck, LD Miller, and Leen-Kiat Soh
{aeck, lmille, lksoh}@cse.unl.edu
256 Avery Hall, Lincoln, NE 68588-0115
Tel: (402) 472-6738 Fax: (402) 472-7767
1. Brief Introduction
In the following, we report on a series of tests on deploying Conference XP (CXP) over a wireless network here at the University of Nebraska. There are four phases of tests:
- Phase I: Preliminary tests, in April 2006, were conducted to determine the drop rate of the existing wireless network when using Conference XP to chat and send presentation slides.
- Phase II: PGM Implementation tests, in July-August 2006, were conducted while implementing the PGM protocol to replace the UDP protocol in Conference XP.
- Phase III: PGM Deployment tests, in September 2006, were conducted to determine how the PGM-powered Conference XP would work in a real classroom, with 30+ laptops competing for wireless access points.
- Phase IV: I-MINDS Deployment tests, in October 2006, were conducted to determine how the CXP-powered I-MINDS would work in a real classroom over wireless, again with 30+ laptops. Note that we have extensively tested I-MINDS over wired networks; indeed, I-MINDS has been deployed at the online Bellevue University since May 2006.
Overall, the above tests were successful in terms of providing us with insights to continue refining and revising our designs and implementations. Phase I tests confirmed Microsoft Research's observations of the wireless CXP deployment's drop-rate performance. Phase II tests validated our PGM implementation, albeit in a small wireless environment. Phase III tests verified the correctness of our PGM implementation in a classroom-size wireless environment, with 30+ venue participants simultaneously. However, Phase IV tests were not as successful as initially hoped: we realized that we still have to first address specific installation problems with I-MINDS to minimize potential user errors and database tracking problems. From analyzing the logs, we have some solution ideas in mind. We will continue to revise our solutions and continue with the Phase IV tests in the near future.
2. Phase I: Preliminary Tests
These tests confirmed the poor drop rate of CXP over the wireless network, the better performance of 802.11g over 802.11b, and that chat messages suffer fewer drops than PowerPoint slides.
In summary, the Chat capability performed much better than the Classroom Presenter in terms of drop rates. The Classroom Presenter did fine with just one slide; with two or more slides, the transmission became significantly worse.
In summary, the G-connection had better results, rarely encountering dropped slide pages. The B-connection fared worse: five runs dropped at least one slide page. We do not know how to explain Run 4, where things seemed to go awry. We have sent the event log to Tim Chou for review.
In summary, we see that the B-connection fared worse.
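A drop rate here is simply the fraction of transmitted items (chat messages or slide pages) that never arrived. As a minimal illustration, and not our actual analysis scripts, the sketch below tallies per-run drop rates from sent/received counts; the comma-separated layout and file name are assumptions, not the real CXP event-log format.

# Minimal sketch (not our actual analysis script): tally per-run drop rates for
# chat messages and slide pages from sent/received counts. The CSV layout and
# file name below are illustrative assumptions, not the real CXP event-log format.

def drop_rate(sent: int, received: int) -> float:
    """Fraction of transmitted items (chat messages or slide pages) that never arrived."""
    return 0.0 if sent == 0 else (sent - received) / sent

def summarize(log_path: str) -> None:
    with open(log_path) as log:
        for line in log:
            run, kind, sent, received = line.strip().split(",")
            print(f"run {run}: {kind} drop rate = {drop_rate(int(sent), int(received)):.1%}")

if __name__ == "__main__":
    summarize("phase1_runs.csv")  # hypothetical per-run summary file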
3. Phase II: PGM Implementation Tests
These tests were conducted while PGM was being implemented. We used these tests to inform the design of our PGM implementation and to evaluate our late-join and statistics-logging features.
- Overall this system seems to work pretty
well. It still doesn’t have the advantage of the old system where
all receiving could be done on one socket, but at least I do not have to
manage multiple threads anymore. With a little bit of error
handling, the test application seems pretty robust.
- Message delivery seemed to be pretty
reliable. None were dropped.
- Interference didn’t seem to be too much of a
problem either. At one point, interference seemed to make it so the
Tablet PC could only receive messages from itself. However, about a
second later it suddenly received all the messages it missed from the
other machines. They obviously weren’t in the correct order, but at
least they were received! This is a huge improvement over
UDP.
- This was a small test of only 3 machines on
my local router, but the results are promising.
- The biggest concern is still scalability,
since the asynchronous methods still do a little thread handling in the
runtime environment.
- Chat didn’t drop any messages, Local Screen
Streaming worked, and Shared Browser worked.
- We also loaded up Presenter and ran a test similar to what we initially did back in March/April. But now, all 3 machines received the inks without dropping a single one! The latency was very low as well.
- Finally, our I-MINDS capabilities were loaded and
they worked flawlessly as well.
- Obviously, this test scenario is pretty
isolated since it was only 3 machines (2 over wireless) behind a simple
home router, but the results are very promising.
- During the ~3 minutes (185.475 seconds) of
testing, 2030 PGM packets were sent back and forth between the three
machines.
- Of these 2030 packets, 48 were NAK packets (negative acknowledgements, meaning the machine recognized that it missed a packet).
- Each NAK was responded to with an NCF (NAK confirmation) packet from the original sender, letting the NAK sender know that the packet would be resent, and then an RDATA (repair data) packet was sent containing the original message.
- There were absolutely no NAKs that did not
have a resulting NCF and RDATA packet, meaning every time someone thought
they lost a packet, they received it later.
- There were a couple of instances where a machine had to send several NAKs for a lost packet, but it always received the packet in the end, and each NAK was responded to (some of the responses were simply lost, like the original packets).
- Given these numbers, there were 943 ODATA (original data) packets sent between the machines, resulting in only 48 NAKs, which is an initial success rate of roughly 94.9%.
- The NAK packets themselves had a 100% rate of
resolution. These are very good numbers. Of course, they will
be lower in an implementation with more interference and machines, but
hopefully the drop off isn’t too severe.
- After analyzing the data for dropped packets, we
looked for trends in how the machines received the packets.
- First of all, while PGM does not guarantee
overall ordering of packets, it does preserve the order from each
individual sender. For example, say the three computers each sent
out packets in the following order (desktop1, laptop1, tablet1, desktop2,
laptop2, tablet2, desktop3, laptop3, tablet3). The packets might
not show up in that exact order at each receiver, but the packets from
each sender will show up in order (meaning desktop2 will never come
before desktop1, but tablet3 might come before laptop3).
- The other notable point is that this ordering scheme produces a “catch-up” effect. For example, say my tablet dropped a packet from my laptop. While it was waiting to receive that packet after sending a NAK, it would continue receiving from itself and my desktop, as one might expect. What is interesting is that once it finally received the dropped packet, it suddenly printed out and logged all the messages sent by the laptop since the dropped packet. In other words, the logs looked something like this: desktop47, laptop46 (last packet received before the dropped one), tablet50, desktop48, tablet51, desktop49, desktop50, tablet52, laptop47 (received copy of the dropped packet), laptop48, laptop49, laptop50, laptop51. This sudden rush of packets from one sender was common after every single NAK. This is nice because it preserves order for that sender without penalizing the sender by processing every subsequent packet later; it tries to “catch up” (a small sketch of this buffering behavior follows this list).
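To make the catch-up behavior concrete, here is a small sketch (in Python, not CXP or PGM code) of a per-sender reorder buffer: packets from a sender are held back until the missing sequence number arrives and are then released all at once, producing exactly the burst of laptop47 through laptop51 seen in the log excerpt above.

# Sketch only (Python, not CXP/PGM code): per-sender in-order delivery with the
# "catch-up" burst described above. Packets from one sender are held until the
# missing sequence number arrives, then everything queued behind it is released.
from collections import defaultdict

class PerSenderReorderBuffer:
    def __init__(self):
        self.next_seq = defaultdict(lambda: 1)   # next expected sequence per sender
        self.held = defaultdict(dict)            # sender -> {sequence: payload}

    def receive(self, sender, seq, payload):
        """Record an arrival; return whatever can now be delivered in order."""
        self.held[sender][seq] = payload
        delivered = []
        while self.next_seq[sender] in self.held[sender]:
            n = self.next_seq[sender]
            delivered.append((sender, n, self.held[sender].pop(n)))
            self.next_seq[sender] += 1
        return delivered

buf = PerSenderReorderBuffer()
buf.next_seq["laptop"] = 46
print(buf.receive("laptop", 46, "msg"))  # delivered immediately: laptop46
print(buf.receive("laptop", 48, "msg"))  # held back: laptop47 was dropped
print(buf.receive("laptop", 49, "msg"))  # held back as well
print(buf.receive("laptop", 47, "msg"))  # retransmission arrives: laptop47-49 rush out together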
In summary, we are ready to begin testing in
a larger environment with more machines.
4. Phase III: PGM Deployment Tests
Now that we were comfortable with our local tests, we wanted to try the PGM deployment in a real classroom. We asked the IS (Information Services, our IT group) department to enable multicast for us in Kauffman Hall, where the class JDE183H is held. The results of these tests were mixed. We observed several problems that had not appeared in our previous tests. We also observed some erratic user behaviors that caused the deployment to fail. We also went through a difficult time installing the PGM functionality, the I-MINDS capabilities, and a database driver, as student laptops each had their own different setups. We also had to work around security updates issued by Windows. Please see Appendix A for our installation guides for the students in JDE183H. But in the end, we collected enough data to say that PGM has been successfully deployed.
This was our first “real” test. We asked the students to install the PGM functionality on their laptops (see Appendix A1). We tried running a test with 30+ laptops in the classroom using PGM, but we ran into a big problem:
- Many of the students were in the venue, but not everyone could see each other. Worse yet, there were no distinct groups of people visible across machines (i.e., it was not the case that half the people could see one another and the other half could see one another).
We found out later, after analyzing the logs and thorough off-site testing, that a Windows security update (KB919007) caused the PGM implementation to fail. After testing again with a packet sniffer, we determined that NAKs were never being responded to by senders with NCFs or RDATA packets. This caused the receivers to become stuck in an infinite loop requesting the dropped packet, which in turn caused the participants to time out. Uninstalling the security update fixed the problem. However, thanks to Tim, we have been in contact with two members of the Windows security team, and they are working on a solution.
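The check over the capture amounts to pairing every NAK with an NCF and an RDATA for the same sender and sequence number; with KB919007 installed, no such pairs appeared. Below is a minimal sketch of that check; the (type, sender, sequence) record layout is a simplifying assumption, not the sniffer's real output format.

# Sketch of the pairing check (simplified record layout, not the real sniffer output):
# every NAK should be matched by an NCF and an RDATA for the same sender/sequence.
from collections import Counter

def check_nak_resolution(packets):
    """packets: iterable of (ptype, sender, seq) tuples from the capture."""
    packets = list(packets)
    counts = Counter(ptype for ptype, _, _ in packets)
    unresolved = []
    for ptype, sender, seq in packets:
        if ptype != "NAK":
            continue
        ncf = any(p == ("NCF", sender, seq) for p in packets)
        rdata = any(p == ("RDATA", sender, seq) for p in packets)
        if not (ncf and rdata):
            unresolved.append((sender, seq))      # NAK never answered by the sender
    odata, naks = counts["ODATA"], counts["NAK"]
    rate = 1.0 if odata == 0 else 1.0 - naks / odata
    print(f"ODATA={odata} NAK={naks} initial success rate={rate:.2%}")
    print("unresolved NAKs:", unresolved or "none")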
In summary, things are looking very promising and we are looking forward to running more extensive tests. One problem we encountered during this test was student laptops going into hibernation. When this happened, the PGM sockets were closed by the operating system, and it was difficult to restore them when the laptop came out of hibernation. In fact, the students had to close CXP completely and restart it to get back into the venue. We asked the students to change their power-save options, but it is difficult to coordinate such a change in an environment this large.
5. Phase IV: I-MINDS Deployment Tests
We were not too sure about how I-MINDS would work
over wireless networks. I-MINDS makes
extensive use of a database, has numerous tracking/logging points, and involves
a large volume of message exchanges.
Thus, we built a simple simulator to test message sending and
receiving. After we were comfortable
with the design, we tried to deploy I-MINDS in the same JDE183H class. The deployment was a failure. We report here our findings.
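The simulator itself was written against the CXP/PGM stack; the sketch below only illustrates the idea of the traffic pattern it generated (numbered messages multicast from each machine and logged on arrival), redone here with plain UDP multicast. The group address, port, and message rate are placeholder values.

# Illustration only: the traffic pattern our simple simulator generated, redone
# here with plain UDP multicast (the real simulator ran on the CXP/PGM stack).
# The multicast group, port, and message rate are placeholder values.
import socket, struct, sys, time

GROUP, PORT = "239.1.2.3", 5000

def sender(name: str, count: int = 100) -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
    for seq in range(1, count + 1):
        sock.sendto(f"{name}:{seq}".encode(), (GROUP, PORT))  # numbered message
        time.sleep(0.05)

def receiver() -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, addr = sock.recvfrom(1024)
        print(addr[0], data.decode())  # log every arrival for later drop/order analysis

if __name__ == "__main__":
    receiver() if sys.argv[1:] == ["recv"] else sender(socket.gethostname())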
To test the 3.0 version of our capability, we installed it on eight borrowed laptops. Six of the laptops were six-year-old IBM ThinkPads with 802.11b, while the other two were newer Toshiba Satellite tablets with 802.11g. The laptops were split into two groups. Messages sent by participants in the first group were only received by other participants in the first group. The same was true of the second group; there was no overlap between groups.
On the teacher version only, we also automatically started the I-MINDS capability.
- More than
thirty of the students were able to run the script and connect to the UNL
CXP Venue.
- Unfortunately, at this point in the test we started to encounter problems. Around one quarter of the students received an exception when CXP tried to load the capability instance on their computers.
- When we consulted the error logs for the ConferenceXP API, we discovered that these students were unable to connect to the I-MINDS database. In all but two cases, it appeared that the students had properly configured the database connection through a system DSN on their machines. However, deleting the DSN and creating a new one solved the capability crash problem on all but one machine. We are not sure whether this was just user error or a more insidious problem involving our database connection over a saturated wireless network.
- We also
had problems with the capability being started more than once. This of course caused the capability
form to appear multiple times on all machines connected to the Venue. Right now we have two possible
explanations for why this happened:
- The first is that one or more curious students decided to start the capability to see what would happen. While this might have happened once or twice, it does not explain the six-plus instances of the capability running at the same time!
- The second explanation is a problem with late join in PGM. Basically, whenever a user joins a venue that already has a capability running, the machine that began the capability sends out information to the new user to help them start running the capability as well. However, late join might conflict with this approach. Here is how late join works: all senders keep track of their last X messages (where X is configurable) in a send window. When a new user joins, their receiver asks each sender for a certain percentage of the messages in that window (this percentage ranges from 0 to 75). If the original capability start is one of the messages in the send window, the new user will automatically receive the original start as well as the start the sender transmits when the new user joins, causing two instances of the capability to begin on the new user’s machine. Since they now have two, the second one might cause the new user to become a capability sender themselves, which would result in everyone else loading a second instance as well. Obviously, as more users join the venue, this effect will cause even more capability instances to be loaded. This behavior was observed in our last test in Soh’s class, but we have been unable to confirm whether it was caused by late join or whether students simply started up new instances after being told not to (a sketch of this scenario follows this list).
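The sketch below is a toy model of the late-join hypothesis from the last bullet, along with one possible guard: ignoring a capability-start message whose instance is already running. It is not actual CXP code; the message names and instance ID are made up, and the 75% replay figure is simply the configurable maximum mentioned above.

# Toy model of the late-join hypothesis (not CXP code; message names are made up).
# A late joiner receives a replayed tail of the send window plus the fresh start
# message sent on join; if the original capability start is still in that window,
# the joiner sees two start messages. Keying on an instance ID suppresses the duplicate.

SEND_WINDOW = ["chat:41", "start-capability:iminds-1", "chat:42", "chat:43"]
REPLAY_FRACTION = 0.75  # late joiners may request up to 75% of the send window

def messages_seen_by_late_joiner(window, fraction):
    replayed = window[-int(len(window) * fraction):]   # tail of the send window
    fresh_start = "start-capability:iminds-1"          # sent again when the user joins
    return replayed + [fresh_start]

running = set()
for msg in messages_seen_by_late_joiner(SEND_WINDOW, REPLAY_FRACTION):
    if msg.startswith("start-capability:"):
        instance = msg.split(":", 1)[1]
        if instance in running:
            continue                                   # duplicate start suppressed
        running.add(instance)
        print("starting capability instance", instance)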
In summary, we need to address the above problems, which occur when we scale up and when we deal with users in uncontrolled environments.