[Faccus] follow-up from the Waterloo LEARN outage

chappell at uwaterloo.ca chappell at uwaterloo.ca
Mon Feb 11 11:50:06 EST 2013

[Mon., Feb. 11, 2013]
Hello all,
Earlier this morning I sent this update to the Deans (or those who were
their designates for the communication updates during the LEARN outage).

Since the outage of LEARN that started just under 2 weeks ago, we have
continued to gather information from Desire2Learn about the causes, and to
work with them to ensure no such outage occurs again. We have received
from them a technical description of the root cause of the problem, and we
need to work with them further to create a partnership that puts us in the
decision path for such significant changes to our system. Locally we are
continuing work on contingency plans and communications plans, based on
our lessons learned during the outage. Once we have solid results from
both of those activities, we will provide further information to you, the
professors, and the students on actions taken, and changes in practice and

I will also post an article to the Bulletin based on the information in
the attached document.

If you have any questions or concerns, please feel free to contact me.
Thank you,

> [Update Sun., Feb. 3, 8:15pm]
> Hello,
> Desire2Learn reported today that "all systems have operated above expected
> performance and we have seen no indication of recurring issues." They also
> report significant use (up to 80% of peak usage) and yet the system is not
> loaded (operating at 20% of its capacity). I have had no problem reports
> from staff who were checking this weekend. Everything looks good, and
> seems to have returned to normal ever since Friday at 12:30pm.
> There will be a lot of "clean up" work for ITMS LEARN support staff, the
> Faculty Liaisons from CTE, support staff in CEL, and other support areas,
> and of course for instructors who need to deal with changes in their
> course schedules due to the disruption.
> We will work with D2L to get further details on the event, and also to set
> up an agreement whereby they work with us to plan infrastructure or other
> major changes, so that we are part of the plan, understand the risks, and
> participate in deciding when significant changes can take place. We will
> also work, on our part, to improve our communications plan and contingency
> plans based on lessons learned during this event.
> Unless there is a change, this will be the last update on the system
> itself but I'll keep you updated on what comes out of the event review.
> Thanks!
> Andrea
>> [Update Fri., Feb. 1, 4:30pm]
>> Hello,
>> This is the email I've been wanting to send since about 3:26pm on
>> Tuesday.
>> The Waterloo LEARN system has been running since about 12:30pm. We have
>> done local testing and people are on the system, and it appears to be
>> performing normally. We feel the system is stable and that it can go back
>> into normal use. Desire2Learn continues to monitor their hosting site and
>> we will continue to check for any alerts. A thorough investigation into
>> events will begin next week, in collaboration with D2L.
>> Thank you for your patience during this difficult time. And a huge thank
>> you from both me and Dave Wallace to everyone who pitched in to help
>> support students and professors, to communicate messages out, and to
>> support each other in these 3 very long days. We've learned a lot, and
>> while I never want to have to put it into action again, we will have much
>> better contingency plans in future!
>> Andrea
>>> [Update Fri., Feb. 1, 1:40pm]
>>> We had a phone call with the chief operating officer from Desire2Learn a
>>> short time ago. You may have already noted that our system is back
>>> online. They believe the problem is resolved for us, have performed their
>>> functional testing, and are asking us to test as well. Our first
>>> priority is making sure we have a fully functioning system. We are
>>> going through verification testing and hope to be able to report back by
>>> 3:00 pm on how well the system is performing. D2L will be monitoring the
>>> systems closely and will send us any updates from other universities who
>>> are in the same situation as well.
>>> Over the next week, we will undergo reviews with D2L and work with them
>>> to improve the planning process for changes that involve our system, so
>>> that we are aware of timing and potential risks, and participate in the
>>> decisions around planned changes. We also will work locally on improving
>>> contingency plans so we are better prepared to deal with such
>>> extraordinary circumstances.
>>> Thank you for your patience and we will report again when we have more
>>> information on the system's stability now that it is back online.
>>> Andrea
>>>> Hello everyone,
>>>> According to an update this morning (not the official general one from
>>>> D2L, which had no news, but one from one of our reps), D2L has moved the
>>>> majority of clients to the upgraded data infrastructure put in place
>>>> last fall, and those clients are performing normally now.
>>>> While they continue to move data, they also are attempting to make
>>>> improvements to the older infrastructure to allow remaining clients,
>>>> including us, to return to normal, but that will require verification
>>>> and testing.
>>>> Presumably one of these two approaches will be in place first and
>>>> we will be able to run.
>>>> We have had several promises for a time line to completion but as of
>>>> 8:45 D2L does not have a resolution time for us.
>>>> We have assurances there is no loss of data; they are verifying at every
>>>> stage, which takes longer.
>>>> We will update you as soon as we have substantive information. There was
>>>> to have been a "material status update" in the official general updates
>>>> from D2L, at about 7:30am, but that has not arrived.
>>>>> Hello everyone,
>>>>> Sorry for the long lapse, but there were no definitive answers or time
>>>>> lines to report. We have been in touch with many of you as we focused
>>>>> on getting messages to the Deans to send to their professors (sent to
>>>>> Deans at the end of the day and won't be forwarded until morning in
>>>>> some cases).
>>>>> Also, we were providing information for a message to students (posted
>>>>> on the LEARN Help site, thanks to Daspina):
>>>>> https://uwaterloo.ca/learn-help/news/message-provost-regarding-learn
>>>>> As of a 9:22pm update, D2L has confirmed that they know the solution to
>>>>> the problem, and it involves moving our data (and that of about 25% of
>>>>> their customers) so it is not behind a device that is causing the
>>>>> problems. They are working towards that now. In parallel, they are
>>>>> calculating estimates for how long the solution will take, and will
>>>>> send that information when it is available. They have said they will
>>>>> also update the LEARN redirect site with information on progress,
>>>>> presumably once they have established the expected time to completion.
>>>>> Otherwise, the next update is at midnight.
>>>>> You may have noted already the article in the Record.
>>>>> http://www.therecord.com/news/business/article/878997--desire2learn-s-cloud-data-centre-issues-have-uw-wlu-offline
>>>>> Thank you, everyone, for your support and patience in this, and for all
>>>>> that you are doing to try to support faculty and students during this
>>>>> very frustrating time.
>>>>> Andrea
>>>>>> Hello,
>>>>>> There has been no significant news on a recovery time. This update is
>>>>>> to let you know there is a memo going to the Deans, meant as a message
>>>>>> to send to all instructors. In it will be a pointer to a support site,
>>>>>> showing instructors how to generate a mailing list to contact their
>>>>>> students by email. Instructions are also available for how to give
>>>>>> access to large files, rather than sending them through email.
>>>>>> https://uwaterloo.ca/learn-help/while-learn-down
>>>>>> News on LEARN is at the same site.
>>>>>> https://uwaterloo.ca/learn-help/news
>>>>>> We're not sure when the message will be sent to instructors, but this
>>>>>> is a heads up of the support they may be looking for.
>>>>>> More news as we get it.
>>>>>> Andrea
>>>>>>> [Thu., Jan. 31, 10:45am]
>>>>>>> Hello,
>>>>>>> The latest news indicates that other clients are also experiencing
>>>>>>> the slowness, and that specific clients (and we don't yet know if we
>>>>>>> are one of them) will have data migrated to a different storage
>>>>>>> device. Later I will send an update from a phone call Dave Wallace
>>>>>>> and I had with some D2L executives. We will have short calls with
>>>>>>> the Deans over lunch hour to discuss the situation.
>>>>>>> Andrea
>>>>>>>> [Thu., Jan. 31, 8:55am]
>>>>>>>> Hello again,
>>>>>>>> We reported to D2L the HTTP 500 errors and also that our system was
>>>>>>>> very slow. They know there is a problem and are working on it. We
>>>>>>>> agreed it was better to have the system unavailable than run with
>>>>>>>> those problems. We do not have a recovery time as yet, sorry.
>>>>>>>> Andrea
>>>>>>>>> [Thu., Jan. 31, D2L 7:45am update]
>>>>>>>>> Hello,
>>>>>>>>> Desire2Learn made our system available again and said we can
>>>>>>>>> "assume that all work has been completed and that we have finished
>>>>>>>>> our verification testing of your site". However, the support team
>>>>>>>>> is seeing errors (HTTP 500) and has had reports from students and
>>>>>>>>> faculty members of problems logging in. The system is very slow.
>>>>>>>>> We will be talking to D2L about this and we will keep you updated
>>>>>>>>> as soon as we know more.
>>>>>>>>> Andrea
>>>>>>>>>> [Thu., Jan. 31, D2L 4:25am update]
>>>>>>>>>> Hello,
>>>>>>>>>> D2L sent an update at midnight, indicating that the process they
>>>>>>>>>> had anticipated finishing by then was 98% done, and that once it
>>>>>>>>>> completed, they would start verification.
>>>>>>>>>> The 4:25am update indicates that a 2-3 hour process is needed. It
>>>>>>>>>> is a bit confusing as to what this process is, except that it
>>>>>>>>>> involves migration of our data and files to the new
>>>>>>>>>> configuration. They will provide a further status update upon
>>>>>>>>>> completion, which could be 6:30 - 7:30am based on their estimate.
>>>>>>>>>> Andrea
>>>>>>>>>>> [Wed. Jan. 30, 10:00pm update]
>>>>>>>>>>> Hello,
>>>>>>>>>>> The latest message from D2L, at 10:12pm, says they still expect
>>>>>>>>>>> to have the recovery by midnight. They have mentioned the need
>>>>>>>>>>> for verification but have not indicated how long that might
>>>>>>>>>>> take. They have said they'll take off the "maintenance" message
>>>>>>>>>>> when the recovery and verification are complete. I assume that
>>>>>>>>>>> means they will remove the redirect then.
>>>>>>>>>>> Questions are arising individually about what will happen with
>>>>>>>>>>> assignments and tests. I think instructors are altering due
>>>>>>>>>>> dates for assignments, but mid-terms are likely more difficult.
>>>>>>>>>>> Hopefully instructors will post announcements in their courses
>>>>>>>>>>> to indicate any changes, so students see those when the system
>>>>>>>>>>> is back up.
>>>>>>>>>>> Next update is at midnight and I may not be awake for that!
>>>>>>>>>>> Andrea
>>>>>>>>>>>> [Wed. Jan. 30, 5:45pm update]
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> The changes anticipated to be complete by 6:00pm are not
>>>>>>>>>>>> complete. The revised finish time is estimated for between
>>>>>>>>>>>> 10:00pm and midnight, followed by a verification process.
>>>>>>>>>>>> The Waterloo LEARN system remains inaccessible during this
>>>>>>>>>>>> time. Access to the barely responsive system created many more
>>>>>>>>>>>> problems for our faculty and students (long wait times to
>>>>>>>>>>>> access the system, if at all, interruptions to most course
>>>>>>>>>>>> functions making those functions unusable, inability to access
>>>>>>>>>>>> files, etc.).
>>>>>>>>>>>> We expect the next update at about 10pm. At this time the
>>>>>>>>>>>> system has been unavailable for 27 hours. We will be seeking
>>>>>>>>>>>> detailed explanations from D2L, as well as their actions to
>>>>>>>>>>>> prevent this from happening again.
>>>>>>>>>>>> Andrea
>>>>>>>>>>>>> [Wed. Jan. 30, 3:30pm update]
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>> The message from D2L remains the same, that the expected
>>>>>>>>>>>>> up-time is 6pm.
>>>>>>>>>>>>> Thanks to Pat Lafranier for enlisting the help of Jonathan
>>>>>>>>>>>>> Woodcock to put, for now, a "Help with LEARN" button on the
>>>>>>>>>>>>> pathways pages on the Waterloo website, to point to our LEARN
>>>>>>>>>>>>> update page, maintained by Daspina Fefekos.
>>>>>>>>>>>>> https://uwaterloo.ca/learn-help/
>>>>>>>>>>>>> Our next update from D2L is expected at 6pm and I will send a
>>>>>>>>>>>>> message again then.
>>>>>>>>>>>>> Andrea
>>>>>>>>>>>>>> [Wed. Jan. 30, 12:30pm update]
>>>>>>>>>>>>>> (Apologies if you receive a second copy of this message.
>>>>>>>>>>>>>> There appeared to be a problem when I sent it the first
>>>>>>>>>>>>>> time, and I want to make sure it gets out.)
>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>> I'm afraid the news is not good. D2L reports that they have
>>>>>>>>>>>>>> received new devices at the hosted site, to fix the problem.
>>>>>>>>>>>>>> Configuration changes that were expected to finish at noon
>>>>>>>>>>>>>> have not completed.
>>>>>>>>>>>>>> They expect it to be completed by 6pm ET, which seems to
>>>>>>>>>>>>>> imply that is when we will be back up. Our next update from
>>>>>>>>>>>>>> D2L will be at 3:00pm. Unless we hear anything more before
>>>>>>>>>>>>>> then, that will be the time of our next update.
>>>>>>>>>>>>>> Andrea
>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>> We asked D2L to redirect our site to a page that says the
>>>>>>>>>>>>>>> system is unavailable. We have not had an update since the
>>>>>>>>>>>>>>> message at 8:07am, so we still anticipate the system to be
>>>>>>>>>>>>>>> back by noon. We have asked that the redirect page provide
>>>>>>>>>>>>>>> updated information, but cannot guarantee it.
>>>>>>>>>>>>>>> We will let you know as more information comes in.
>>>>>>>>>>>>>>> Andrea
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>> We have heard from Desire2Learn (8:07am) and have done
>>>>>>>>>>>>>>>> some testing of our own. According to D2L, they have
>>>>>>>>>>>>>>>> "partially implemented" their solution. They caution that
>>>>>>>>>>>>>>>> some sites may still see impacts. They "anticipate steady
>>>>>>>>>>>>>>>> improvement to normal performance" by noon today.
>>>>>>>>>>>>>>>> From our tests, Waterloo LEARN seems to be very slow for
>>>>>>>>>>>>>>>> login and for accessing files. (At about 7am it was
>>>>>>>>>>>>>>>> performing well, but is not now.)
>>>>>>>>>>>>>>>> We will keep you apprised as we hear more. Our sincere
>>>>>>>>>>>>>>>> apologies for this significant interruption. We will be
>>>>>>>>>>>>>>>> pursuing detailed explanations from D2L.
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Andrea
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LEARN_outatge_update_Deans_11feb2013.pdf
Type: application/pdf
Size: 202935 bytes
Desc: not available
URL: <http://lists.uwaterloo.ca/pipermail/faccus/attachments/20130211/ee657601/attachment-0001.pdf>
