Using Duplicity with Microsoft SharePoint/OneDrive for Business

For a long time I’ve been using Duplicity as my primary backup tool, mostly because of its space-efficient incremental backups. Since I use Linux both at work and at home, this also applies to my corporate system.

Since Linux is not very well supported in my company, a lot of set-up is done in a do-it-yourself way. In a way I like that, as it gives me freedom of choice as to the tools used. As far as backup is concerned, Duplicity was the obvious choice. What I found slightly difficult was finding the right place to store my backups, as a private USB HDD or Dropbox is obviously not the right choice for corporate data.

Duplicity comes with a wide choice of storage back-ends, ranging from the local filesystem and network file services such as FTP, SFTP or WebDAV to cloud storage systems like Amazon S3 and popular consumer services such as Google Drive, Dropbox or OneDrive.

SharePoint/OneDrive for Business as a storage back-end

Initially I became interested in the last option, as my company uses Microsoft Office 365 Enterprise, which comes with 1TB of personal OneDrive storage. Unfortunately, after some quick research I learned that the corporate OneDrive has little in common with the regular (private) OneDrive, besides the name of course. The corporate OneDrive storage is essentially a variant of Microsoft SharePoint with a more user-friendly frontend. This means that Duplicity’s personal OneDrive back-end will not work with the corporate OneDrive service.

SharePoint has a RESTful API to access its contents, which is not overly complicated in my opinion, and I was thinking of developing a Duplicity back-end module for it. Fortunately, after some more research I discovered that besides its native REST API SharePoint also offers access over WebDAV, which is somewhat less well documented. Since Duplicity has a native WebDAV back-end, this sounded like a way forward.

Authentication

The immediate problem that I faced was authentication. SharePoint offers a wide range of authentication methods, from basic username/password to SAML-based ones. My company uses the latter, which Duplicity lacks support for. An unauthenticated WebDAV request is therefore rejected with a 403:

$ wget -S https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/ --method=PROPFIND
--2017-02-24 09:29:03-- https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/Backup/Duplicity/
Resolving xxx.sharepoint.com... 104.146.250.25
Connecting to xxx.sharepoint.com|104.146.250.25|:443... connected.
HTTP request sent, awaiting response...
 HTTP/1.1 403 FORBIDDEN
 Content-Type: text/plain; charset=utf-8
 Server: Microsoft-IIS/8.5
 X-SharePointHealthScore: 0
 SPRequestGuid: 9b16d89d-9030-3000-b8ce-0084932e613c
 request-id: 9b16d89d-9030-3000-b8ce-0084932e613c
 X-Forms_Based_Auth_Required: https://xxx.sharepoint.com/_forms/default.aspx?ReturnUrl=/_layouts/15/error.aspx&Source=/personal/john_doe_xxx_com
 X-Forms_Based_Auth_Return_Url: https://xxx.sharepoint.com/_layouts/15/error.aspx
 X-MSDAVEXT_Error: 917656; Access+denied.+Before+opening+files+in+this+location%2c+you+must+first+browse+to+the+web+site+and+select+the+option+to+login+automatically.
 X-IDCRL_AUTH_PARAMS_V1: IDCRL Type="BPOSIDCRL", EndPoint="/personal/john_doe_xxx_com/_vti_bin/idcrl.svc/", RootDomain="sharepoint.com", Policy="MBI"
 X-Powered-By: ASP.NET
 MicrosoftSharePointTeamServices: 16.0.0.6216
 X-Content-Type-Options: nosniff
 X-MS-InvokeApp: 1; RequireReadOnly
 P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
 Date: Fri, 24 Feb 2017 08:29:02 GMT
 Content-Length: 13
2017-02-24 09:29:03 ERROR 403: FORBIDDEN.

This type of authentication requires a set of cookies to be passed. The basic idea is that the browser should open the authentication site, which will ask for credentials, such as username, password or PIN/OTP. Once the user is authenticated a ticket will be generated and set as a cookie and a redirect is issued to the original site, which now grants access based on the cookie.
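As a rough illustration of how a client could recognise this situation, the sketch below inspects a response's status code and headers and returns the login URL advertised by the server. The function name forms_auth_login_url is hypothetical; the header name and values are taken from the wget output above.

```python
def forms_auth_login_url(status, headers):
    """Return the login URL if the response asks for forms-based auth, else None."""
    if status != 403:
        return None
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("x-forms_based_auth_required")


# Header values copied from the wget output above.
headers = {
    "X-Forms_Based_Auth_Required": "https://xxx.sharepoint.com/_forms/default.aspx"
                                   "?ReturnUrl=/_layouts/15/error.aspx"
                                   "&Source=/personal/john_doe_xxx_com",
    "X-Forms_Based_Auth_Return_Url": "https://xxx.sharepoint.com/_layouts/15/error.aspx",
}
print(forms_auth_login_url(403, headers))
```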

The cookie can be obtained by visiting the site mentioned in the X-Forms_Based_Auth_Required header. However, there is another way to obtain the cookie, which doesn’t involve displaying a webpage. You need to send a SOAP request to https://login.microsoftonline.com/extSTS.srf containing your username and password. Upon successful completion the response will contain a token that has to be passed to your SharePoint site login service, which is available at the URL http://your-sharepoint-site.com/_forms/default.aspx?wa=wsignin1.0. If everything checks out, the response will contain some HTML, which can be safely ignored. What matters are the two cookies set in return: FedAuth and rtFa. These can be used to authenticate with the SharePoint site from now on, including the native REST API and WebDAV.

Automating authentication

In order to automate the above authentication tasks I’ve written a Python script that authenticates against the Microsoft login site and retrieves the necessary cookies from SharePoint.

#!/usr/bin/python

from __future__ import print_function

try:
    from http.cookiejar import CookieJar
except ImportError:
    from cookielib import CookieJar

try:
    from urllib.error import URLError
except ImportError:
    from urllib2 import URLError

try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

try:
    from urllib.request import urlopen, build_opener, HTTPCookieProcessor, Request
except ImportError:
    from urllib2 import urlopen, build_opener, HTTPCookieProcessor, Request

import sys
import xml.etree.ElementTree as ET


authXml = """<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:a="http://www.w3.org/2005/08/addressing"
xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">
<s:Header>
<a:Action s:mustUnderstand="1">http://schemas.xmlsoap.org/ws/2005/02/trust/RST/Issue</a:Action>
<a:ReplyTo>
<a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>
</a:ReplyTo>
<a:To s:mustUnderstand="1">https://login.microsoftonline.com/extSTS.srf</a:To>
<o:Security s:mustUnderstand="1"
xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
<o:UsernameToken>
<o:Username>{0}</o:Username>
<o:Password>{1}</o:Password>
</o:UsernameToken>
</o:Security>
</s:Header>
<s:Body>
<t:RequestSecurityToken xmlns:t="http://schemas.xmlsoap.org/ws/2005/02/trust">
<wsp:AppliesTo xmlns:wsp="http://schemas.xmlsoap.org/ws/2004/09/policy">
<a:EndpointReference>
<a:Address>{2}</a:Address>
</a:EndpointReference>
</wsp:AppliesTo>
<t:KeyType>http://schemas.xmlsoap.org/ws/2005/05/identity/NoProofKey</t:KeyType>
<t:RequestType>http://schemas.xmlsoap.org/ws/2005/02/trust/Issue</t:RequestType>
<t:TokenType>urn:oasis:names:tc:SAML:1.0:assertion</t:TokenType>
</t:RequestSecurityToken>
</s:Body>
</s:Envelope>
"""

def main():
    # Three arguments plus the script name are required.
    if len(sys.argv) < 4:
        print("Usage: get-sharepoint-auth-cookie.py endpointURL username password", file=sys.stderr)
        sys.exit(1)

    endpoint = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]

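    # Escape XML special characters so that credentials containing "&" or "<"
    # do not produce a malformed SOAP envelope.
    from xml.sax.saxutils import escape
    authReq = authXml.format(escape(username), escape(password), escape(endpoint))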
    try:
        request = urlopen("https://login.microsoftonline.com/extSTS.srf", authReq.encode('utf-8'))
    except URLError:
        print("Failed to send login request.", file=sys.stderr)
        exit(1)

    ns = {"soap": "http://www.w3.org/2003/05/soap-envelope",
          "wssec": "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" }
 
    authRespTree = ET.parse(request)
    authToken = None
    fault = authRespTree.find(".//soap:Fault", ns)
    if fault is not None:
        reason = fault.find("soap:Reason/soap:Text", ns)
        if reason is not None:
            reason = reason.text
        else:
            reason = "*Unknown reason*"
        print("Failed to retrieve authentication token: {}".format(reason), file=sys.stderr)
        exit(1)

    tokenElm = authRespTree.find(".//wssec:BinarySecurityToken", ns)
    if tokenElm is None:
        print("Failed to retrieve authentication token.", file=sys.stderr)
        exit(1)
    else:
        authToken = tokenElm.text

    endpointUrl = urlparse(endpoint)
    if endpointUrl.scheme not in ["http", "https"] or not endpointUrl.netloc:
        print("Invalid endpoint URL: {}".format(endpoint), file=sys.stderr)
        exit(1)

    cookiejar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(cookiejar))
    try:
        request = Request("{0}://{1}/_forms/default.aspx?wa=wsignin1.0".format(
            endpointUrl.scheme, endpointUrl.netloc))
        response = opener.open(request, data=authToken.encode('utf-8'))
        cookieStr = ""
        cookiesFound = []
        for cookie in cookiejar:
            if cookie.name in ("FedAuth", "rtFa"):
                cookieStr += cookie.name + "=" + cookie.value + "; "
                cookiesFound.append(cookie.name)

        if "FedAuth" not in cookiesFound or "rtFa" not in cookiesFound:
            print("Incomplete cookies retrieved.", file=sys.stderr)
            exit(1)
        print(cookieStr)
    except URLError as x:
        print("Failed to log in to SharePoint site: {}".format(x.reason), file=sys.stderr)
        exit(1)

if __name__ == '__main__':
    main()

The complete script can also be downloaded from here as WordPress seems to mess up Python formatting.

The script accepts three arguments:

  • endpointURL – the URL of your personal SharePoint site (e.g. https://xxx.sharepoint.com/personal/john_doe_xxx_com/)
  • username – your SharePoint username, usually the company e-mail address (e.g. john.doe@xxx.com)
  • password – your account password. If your company uses Azure Active Directory two-factor authentication you will need to create a dedicated application password. Otherwise you can try your regular Active Directory password.

The script will print the string of cookies on the standard output. This string is ready to be put as a value of the Cookie HTTP header.
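To check that the cookies actually work before involving Duplicity, the wget test from earlier can be repeated with the Cookie header set. The following is only a sketch for Python 3: build_propfind and list_folder are hypothetical helper names, and the Depth: 1 header (list the immediate children of the folder) is my assumption about what is useful here.

```python
from urllib.request import Request, urlopen


def build_propfind(url, cookie_string):
    """Build a WebDAV PROPFIND request that carries the authentication cookies."""
    # The script output ends with "; " and a newline; trim both.
    cookies = cookie_string.strip().rstrip(";")
    return Request(url, method="PROPFIND",
                   headers={"Cookie": cookies, "Depth": "1"})


def list_folder(url, cookie_string):
    """Send the request and return the raw WebDAV multistatus XML (network call)."""
    with urlopen(build_propfind(url, cookie_string)) as response:
        return response.read()
```

Calling list_folder("https://xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/", cookies) should then return an XML listing instead of raising a 403 error, provided the cookies are still valid.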

Whether the script works in your environment or not may depend on the actual authentication configuration. I’m sure that there are ways to block such access to SharePoint and some administrators may have chosen to do so.

Making Duplicity work with cookies

Unfortunately, retrieving the cookies alone will not help, as by default Duplicity has no way to pass them to its HTTP connection code. A change in the Duplicity code is needed for that. You need to edit the backends/webdavbackend.py file, which implements the WebDAV back-end, and add the three lines that read AUTH_COOKIES and set the Cookie header:

class WebDAVBackend(duplicity.backend.Backend):
    [...]

    def __init__(self, parsed_url):
        duplicity.backend.Backend.__init__(self, parsed_url) 
        self.headers = {'Connection': 'keep-alive'} 
        auth_cookies = os.getenv('AUTH_COOKIES') 
        if auth_cookies is not None: 
            self.headers['Cookie'] = auth_cookies 
        self.parsed_url = parsed_url

I have chosen to pass the cookie in an environment variable called AUTH_COOKIES. Just set this variable to the output of the above Python script and it should work.
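If you prefer to drive the whole thing from Python rather than from the shell, a wrapper along these lines could set the variable only for the Duplicity child process. This is a sketch: duplicity_env and run_backup are hypothetical names, and the plain "duplicity SOURCE_DIR TARGET_URL" invocation is assumed to match your Duplicity version.

```python
import os
import subprocess


def duplicity_env(cookie_string, base_env=None):
    """Return a copy of the environment with AUTH_COOKIES set."""
    env = dict(os.environ if base_env is None else base_env)
    # Drop the trailing newline that print() adds to the script output.
    env["AUTH_COOKIES"] = cookie_string.strip()
    return env


def run_backup(source_dir, target_url, cookie_string):
    """Run Duplicity with the cookies visible only to the child process."""
    return subprocess.call(["duplicity", source_dir, target_url],
                           env=duplicity_env(cookie_string))
```

Keeping the cookies out of the parent environment means they are not inherited by unrelated processes started from the same shell.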

Getting it all together

All that’s left now is to pass the correct URL to Duplicity. In my case (personal folder on OneDrive for Business) the URL is: webdavs://user:pass@xxx.sharepoint.com/personal/john_doe_xxx_com/Documents/Backup/Duplicity/Hostname/

Note the user:pass string – if you don’t pass an explicit username and password to Duplicity, it will ask for them on the command line. You can pass any strings you like; they don’t matter, as the real authentication is based on the cookie.
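The URL above can be assembled programmatically. The sketch below assumes, based only on the example in this post, that the personal site name is the e-mail address with every non-alphanumeric character replaced by an underscore; your tenant may differ. The function name onedrive_webdav_url is hypothetical.

```python
import re


def onedrive_webdav_url(tenant_host, email, subpath):
    """Build a Duplicity WebDAV target URL for a personal OneDrive for Business folder."""
    # Assumption: john.doe@xxx.com maps to the site name john_doe_xxx_com.
    site = re.sub(r"[^A-Za-z0-9]", "_", email)
    # "user:pass" is a placeholder; the real authentication uses the cookie.
    return "webdavs://user:pass@{0}/personal/{1}/Documents/{2}/".format(
        tenant_host, site, subpath.strip("/"))


print(onedrive_webdav_url("xxx.sharepoint.com", "john.doe@xxx.com",
                          "Backup/Duplicity/Hostname"))
```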

Akonadi EWS Resource – first bugfix release

In the initial release announcement I promised to focus on calendar support. Unfortunately I haven’t had as much time for development as I had anticipated. In the meantime some early adopters have uncovered a number of issues, which have since been fixed. In order to deliver those fixes I have decided to issue a bugfix release (0.8.1).

This release doesn’t bring any significant new features to the Akonadi EWS Resource. The most important changes include:

  • Improved collection tree retrieval:
    • Fixed full collection tree synchronisation for some servers (bug #2).
    • Fixed incremental collection tree synchronisation.
    • Reduced the number of full collection tree retrievals by keeping the state persistent.
  • Fixed Qt 5.7 compatibility.
  • Fixed crash upon encountering a distribution list item (bug #7).
  • Improved behaviour in case of temporary server connection loss.
  • Added option to customise User Agent string sent to the server.
  • Added option to enable NTLMv2 authentication.
  • Allow specifying an empty domain name (bug #3). Requires KF 5.28 to have any effect.
  • Fixed dates when copying items from other resources to the Sent Items folder (bug #5).
  • Improved batch item retrieval performance (will be effective with KDE Applications 16.12).
  • Added synchronisation of the collection tree upon connection. This fixes the collection tree staying empty after adding a new resource until a manual mail fetch is triggered (bug #15).

As promised before I intend to focus on calendar support and hope to get it running in the next release.

Akonadi Resource for Microsoft Exchange Web Services (EWS)

Whether you are a Microsoft hater or a lover, if you have ever had a chance to work for a medium or large corporation, you have probably stumbled upon the Microsoft Exchange mail server. While it can be made to talk to regular e-mail clients using standards such as IMAP, POP3 and SMTP, some corporate admins choose not to enable any of the standard mail protocols, leaving the user with no choice other than to use Microsoft Outlook. Even where it is possible to use regular e-mail clients, they will not be able to exploit the full potential of Exchange, as it is not only a mail server but rather a groupware server, which includes support for calendar, tasks, contacts and much more.

Why not talk to Exchange in its own language?

Exchange actually knows three of them:

  • MAPI
  • ActiveSync
  • Exchange Web Services (EWS)

MAPI was the primary way to communicate with Exchange until version 2007. It is now considered legacy, and for a good reason: it’s a complex protocol and not an easy one to work with. ActiveSync is aimed at mobile applications.

Exchange Web Services were introduced with Exchange 2007 and since then have become the new primary standard to communicate with Exchange.

There are not many e-mail clients that are able to use native Exchange protocols. Evolution is able to work with both MAPI and EWS, but the support is buggy. An extension exists for Thunderbird (ExQuilla), but it requires a license and also has some bugs.

The situation on the Akonadi front

Akonadi has so far seen support for two of the Exchange native protocols. A resource exists for MAPI (akonadi-exchange). It started off as a read-only resource to access the Global Address List and the calendar, and has since evolved to allow retrieval of e-mail. Unfortunately, because of MAPI’s complexity it is buggy and pulls in some Samba4 dependencies.

Another resource exists for ActiveSync. Unfortunately access to Exchange using this standard is often heavily restricted or even completely blocked by corporate admins due to security concerns.

Exchange Web Services

When I first started exploring Exchange Web Services I noticed that it is relatively easy to use and quite well documented by Microsoft itself (big thanks to the EU for forcing MS to document its standards). Given the lack of fully-featured Exchange support, I decided to try developing an Akonadi resource that would use EWS in order to allow KDE PIM to access the full potential of Exchange.

After nearly six months of development the day has come to publish the initial release of this resource.

Current status

The initial version (0.8.0) focuses on e-mail support. It is possible to both send and receive e-mail without the use of IMAP, POP3 or SMTP (yes, by default Outlook doesn’t send e-mail through SMTP, but uses the native protocol to do it). Full control over mailbox folders is also possible. You can copy and delete e-mail and manage all folders.

One notable feature is server-side tag support. Thanks to Exchange support for custom properties it is possible to save Akonadi tags into Exchange so that they can be retrieved on another machine or after a system reinstall.

Support for other items is partial. Calendar is read-only: events are retrieved and reminders work, but it is not possible to add, modify or delete events, invite attendees or respond to invitations from others.

Personal address book is also read-only. There is no support for the Global Address List and tasks.

Development of all the missing features will require more time and will be the subject of subsequent releases. It is worth noting, however, that a number of changes will have to be introduced to Akonadi, as in some cases it is based on assumptions that are not true for Exchange.

Can I try it out?

Of course, you’re welcome to do so. The project is hosted on GitHub, where you can retrieve the latest bleeding edge state of development or – in case you prefer a stable solution – one of the available releases (currently there is only one).

I have prepared packages for Fedora 23 and Ubuntu 16.04, but since I’m not a packaging expert please bear with me in case there are some issues.

Being a Gentoo user I have obviously prepared an overlay.

Note: While I have done my best to develop good quality software I am only a human and humans make mistakes. In this case mistakes could even cause your e-mail to be lost, so please use with caution.

Next steps

Once e-mail support is stable I intend to focus on calendar support, which should make the resource much more useful.

Apart from the resource itself I also intend to work on the KDE PIM packages themselves in order to better integrate them with the EWS resource. I already had to introduce some workarounds that I would like to get rid of.

In the long term I hope to make this resource part of KDE PIM so that people can use Exchange with it without the need to install any third-party software and rely on my poor packaging 😉

Welcome

Welcome to my software development blog, where I intend to write about interesting open-source projects that I’m working on. Soon I hope to write some more about the actual projects so stay tuned!